1. 05 Aug 2020 (4 commits)
  2. 04 Aug 2020 (3 commits)
  3. 03 Aug 2020 (7 commits)
    • 9f305aa8
    • Change the unit of the GUC from kb to mb · 3ef5e267
      Committed by Gang Xiong
    • Make max_slot_wal_keep_size work on 6X · ea69506b
      Committed by Gang Xiong
      1. Change the GUC unit from MB to KB, as 6X doesn't have GUC_UNIT_MB.
      2. The upstream commit added 3 fields to the system view
         'pg_replication_slots'; this commit removes that change since we cannot
         make catalog changes on 6X.
      3. Upstream uses 'slot->active_pid' to identify the process that acquired
         the replication slot; this commit adds a 'walsnd' field to
         'ReplicationSlot' to do the same.
      4. Upstream uses a condition variable to wait for the walsender to exit;
         this commit uses WalSndWaitStoppingOneWalSender as we don't have
         condition variables on 6X.
      5. Add test cases.
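      For illustration only (not part of the commit message), a minimal sketch of
      inspecting the backported GUC on a 6X cluster; per point 1 above, the value
      is stored in kB:
      ```
      -- Show the backported GUC; on 6X the unit is kB rather than MB.
      SELECT name, setting, unit, boot_val
      FROM pg_settings
      WHERE name = 'max_slot_wal_keep_size';
      -- The setting itself is normally changed in postgresql.conf (or via
      -- gpconfig) and applied with a configuration reload.
      ```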
    • Allow users to limit storage reserved by replication slots · 7a274622
      Committed by Alvaro Herrera
      Replication slots are useful to retain data that may be needed by a
      replication system.  But experience has shown that allowing them to
      retain excessive data can lead to the primary failing because of running
      out of space.  This new feature allows the user to configure a maximum
      amount of space to be reserved using the new option
      max_slot_wal_keep_size.  Slots that overrun that space are invalidated
      at checkpoint time, enabling the storage to be released.
      
      Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
      Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
      Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
      Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
      Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
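      As an illustrative sketch of how the upstream feature is used (assumes
      upstream PostgreSQL syntax; not part of the commit message):
      ```
      -- Reserve at most 10 GB of WAL for replication slots; -1 (the default)
      -- keeps the old unlimited behaviour.
      ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
      SELECT pg_reload_conf();

      -- Slots that overrun the limit are invalidated at checkpoint time.
      SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
      ```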
    • Add "FILL_MISSING_FIELDS" option for gpload. · 7afdd72c
      Committed by Wen Lin
      This reverts commit 7118e8ac.
    • Resolve high `CacheMemoryContext` usage for `ANALYZE` on large partition table. (#10554) · f8c8265a
      Committed by (Jerome)Junfeng Yang
      In some cases, the merge-stats logic for a root partition table may consume
      a very large amount of memory in CacheMemoryContext.
      This may lead to `Canceling query because of high VMEM usage` when
      partition tables are ANALYZEd concurrently.

      For example, consider several root partition tables, each with thousands of
      leaf tables, all of them wide tables with hundreds of columns.
      When analyze()/auto_stats() runs on leaf tables concurrently,
      `leaf_parts_analyzed` consumes lots of memory (catalog cache entries for
      pg_statistic and pg_attribute) under CacheMemoryContext in each backend,
      which may hit the protective VMEM limit.
      In `leaf_parts_analyzed`, a single backend analyzing the leaf tables of one
      root partition table may add cache entries for up to
      number_of_leaf_tables * number_of_columns tuples from pg_statistic and
      number_of_leaf_tables * number_of_columns tuples from pg_attribute.
      Setting the GUC `optimizer_analyze_root_partition` or
      `optimizer_analyze_enable_merge_of_leaf_stats` to false skips the stats
      merge for the root table, so `leaf_parts_analyzed` is not executed.
      
      To resolve this issue:
      1. When checking in `leaf_parts_analyzed` whether merged stats can be built
      for a root table, first check whether all leaf tables are ANALYZEd; if any
      leaf table is still un-ANALYZEd, return quickly to avoid touching each
      column's pg_attribute and pg_statistic entries per leaf table (this saves a
      lot of time). Also, don't rely on the system catalog cache; use an index
      scan to fetch the stats tuple, avoiding one-off cache entries (in common cases).

      2. When merging stats in `merge_leaf_stats`, don't rely on the system
      catalog cache; use an index scan to fetch the stats tuple.
      
      There are side effects of not relying on the system catalog cache (all of
      them **rare** situations):
      1. Inserting/updating/copying into several leaf tables under the **same
      root partition** table in the **same session**, when all leaf tables are
      already **analyzed**, is much slower, since auto_stats calls
      `leaf_parts_analyzed` each time a leaf table gets updated and we no longer
      rely on the system catalog cache.
      (`set optimizer_analyze_enable_merge_of_leaf_stats=false` avoids this.)

      2. ANALYZEing the same root table several times in the same session is much
      slower than before, since we don't rely on the system catalog cache.

      This solution improves ANALYZE performance, and ANALYZE no longer hits the
      memory issue.
      
      (cherry picked from commit 533a47dd)
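      For reference, a minimal sketch of the workaround GUCs mentioned above
      (the table name is hypothetical):
      ```
      -- Skip merging leaf stats into root-partition stats, so that
      -- leaf_parts_analyzed is not executed.
      SET optimizer_analyze_enable_merge_of_leaf_stats = false;
      -- Or skip collecting root partition statistics altogether.
      SET optimizer_analyze_root_partition = false;

      ANALYZE my_partitioned_table;  -- hypothetical table
      ```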
    • ic-proxy: handle early coming BYE correctly · bd8959f6
      Committed by Ning Yu
      In a query that contains multiple init/sub plans, the packets of the
      second subplan might be received while the first is still being processed
      in ic-proxy mode; this is because ic-proxy mode uses a local host
      handshake instead of the global one.

      To distinguish the packets of different subplans, especially the
      early-arriving ones, we must stop handling on the BYE immediately and pass
      any unhandled early packets to the successor or the placeholder.

      This fixes the random hang in the ICW parallel group containing
      qp_functions_in_from.  No new test is added.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      (cherry picked from commit 79ff4e62)
  4. 01 Aug 2020 (1 commit)
    • gpinitsystem: use new 6-field ARRAY format internally for QD and QEs · 27038bd4
      Committed by bhuvnesh chaudhary
      The initialization file (passed as gpinitsystem -I <file>) can have two
      formats: legacy (5-field) and new (6-field, that has the HOST_ADDRESS).
      
      This commit fixes a bug in which an internal sorting routine that matched
      a primary with its corresponding mirror assumed that <file> was always
      in the new format.  The fix is to convert any input <file> to the new
      format via re-writing the QD_ARRAY, PRIMARY_ARRAY and MIRROR_ARRAY to
      have 6 fields.  We also always use '~' as the separator instead of ':'
      for consistency.
      
      The bug fixed is that a 5-field <file> was being sorted numerically,
      causing either the hostname (on a multi-host cluster) or the port (on
      a single-host cluster) to be used as the sort key instead of the content.
      This could result in the primary and its corresponding mirror being
      created on different contents, which fortunately hit an internal error
      check.
      
      Unit tests and a behave test have been added as well.  The behave test
      uses a demo cluster to validate that a legacy gpinitsystem initialization
      file format (i.e. one that has 5 fields) successfully creates a
      Greenplum database.
      Co-authored-by: David Krieger <dkrieger@vmware.com>
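      Not part of the commit, but a quick way to sanity-check the resulting
      primary/mirror pairing per content id (assumes the standard
      gp_segment_configuration catalog in GPDB 6):
      ```
      -- Each content id should list one primary (p) and one mirror (m),
      -- normally on different hosts.
      SELECT content, preferred_role, hostname, address, port, datadir
      FROM gp_segment_configuration
      WHERE content >= 0
      ORDER BY content, preferred_role;
      ```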
  5. 31 Jul 2020 (4 commits)
    • Correct and stabilize some replication tests · 15dd8027
      Committed by Ashwin Agrawal
      Add pg_stat_clear_snapshot() to functions looping over
      gp_stat_replication / pg_stat_replication so the result is refreshed every
      time the query is run within the same transaction. Without
      pg_stat_clear_snapshot(), the query result is not refreshed for
      pg_stat_activity nor for the xx_stat_replication functions across multiple
      invocations inside a transaction, so in its absence the tests become
      flaky.

      Also, the tests commit_blocking_on_standby and dtx_recovery_wait_lsn were
      initially committed with wrong expectations and hence failed to test the
      intended behavior. They now reflect the correct expectations.
      
      (cherry picked from commit c565e988)
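      A minimal illustrative sketch of the pattern described above, using a
      hypothetical PL/pgSQL polling function (not taken from the commit):
      ```
      -- Wait until some walsender reports the 'streaming' state.
      -- pg_stat_clear_snapshot() makes each iteration see fresh stats even
      -- though the whole loop runs inside a single transaction.
      CREATE OR REPLACE FUNCTION wait_for_streaming(timeout_sec int DEFAULT 60)
      RETURNS boolean AS $$
      DECLARE
          i int := 0;
      BEGIN
          LOOP
              PERFORM pg_stat_clear_snapshot();
              IF EXISTS (SELECT 1 FROM pg_stat_replication
                         WHERE state = 'streaming') THEN
                  RETURN true;
              END IF;
              IF i >= timeout_sec * 10 THEN
                  RETURN false;  -- give up after the timeout
              END IF;
              PERFORM pg_sleep(0.1);
              i := i + 1;
          END LOOP;
      END;
      $$ LANGUAGE plpgsql;
      ```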
    • Add mirror_replay test to greenplum_schedule · 29ca99ee
      Committed by Ashwin Agrawal
      This was missed in commit 96b332c0.
      
      (cherry picked from commit 8ef5d722)
    • Add knowledge of partition selectors to Orca's DPv2 algorithm (#10263) (#10558) · d3886cf2
      Committed by Chris Hajas
      Orca's DP algorithms currently generate logical alternatives based only on cardinality; they do not take into account motions/partition selectors as these are physical properties handled later in the optimization process. Since DPv2 doesn't generate all possible alternatives for the optimization stage, we end up generating alternatives that do not support partition selection or can only place poor partition selectors.
      
      This PR introduces partition knowledge into the DPv2 algorithm. If there is a possible partition selector, it will generate an alternative that considers it, in addition to the previous alternatives.
      
      We introduce a new property, m_contain_PS, to indicate whether an SExpressionInfo contains a PS for a particular expression. We consider an expression to have a possible partition selector if the join expression columns and the partition table's partition key overlap. If they do, we mark this expression as containing a PS for that particular PT.
      
      We consider a good PS to be one that is selective, e.g.:
      ```
      - DTS
      - PS
        - TS
          - Pred
      ```
      
      would be selective. However, if there is no selective predicate, we do not consider this as a promising PS.
      
      For now, we add just a single alternative that satisfies this property and only consider linear trees.
      
      This is a backport of commit 9c445321.
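      To make the "join columns overlap the partition key" condition concrete, a
      hypothetical example (table and column names are made up for illustration):
      ```
      -- 'sales' is range-partitioned on sale_date. Joining on the partition key
      -- with a selective predicate on the other side lets Orca place a partition
      -- selector that prunes partitions of 'sales'.
      CREATE TABLE dates (d date, is_holiday bool) DISTRIBUTED BY (d);
      CREATE TABLE sales (id int, sale_date date, amount numeric)
      DISTRIBUTED BY (id)
      PARTITION BY RANGE (sale_date)
      ( START (date '2020-01-01') INCLUSIVE
        END (date '2021-01-01') EXCLUSIVE
        EVERY (INTERVAL '1 month') );

      SELECT sum(s.amount)
      FROM sales s
      JOIN dates d ON s.sale_date = d.d
      WHERE d.is_holiday;   -- selective predicate feeding the partition selector
      ```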
    • Improve cardinality for joins using distribution columns in ORCA · 4b473948
      Committed by Ashuka Xue
      This commit only affects cardinality estimation in ORCA when the user
      sets `optimizer_damping_factor_join = 0`. It improves the square root
      algorithm first introduced by commit ce453cf2.
      
      In the original square root algorithm, we assumed that distribution
      column predicates would have some correlation with other predicates in
      the join and therefore would be damped accordingly when calculating join
      cardinality.

      However, distribution columns are ideally unique in order to get the
      best performance from GPDB. Under this assumption, distribution columns
      should not be correlated and thus need to be treated as independent
      when calculating join cardinality. This is a best guess since we do not
      have a way to support correlated columns at this time.
      Co-authored-by: Ashuka Xue <axue@vmware.com>
      Co-authored-by: Chris Hajas <chajas@vmware.com>
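      For context, a small sketch of enabling the behaviour described above (the
      GUC name comes from the commit; tables and columns are hypothetical):
      ```
      -- 0 selects the square root damping algorithm; with this change,
      -- predicates on distribution columns are treated as independent.
      SET optimizer_damping_factor_join = 0;

      EXPLAIN SELECT *
      FROM fact f
      JOIN dim d ON f.dist_key = d.dist_key AND f.x = d.x;  -- hypothetical tables
      ```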
  6. 30 Jul 2020 (1 commit)
    • Add Orca support for index only scan · 93c9829a
      Committed by David Kimura
      This commit allows Orca to select plans that leverage the IndexOnlyScan
      node. A new GUC, 'optimizer_enable_indexonlyscan', is used to enable or
      disable this feature. Index-only scan is disabled by default until the
      following issues are addressed:

        1) Implement a cost comparison model for index-only scans. Currently,
           the cost is hard coded for testing purposes.
        2) Support index-only scans using GiST and SP-GiST where allowed.
           Currently, the code only supports index-only scans on B-tree indexes.
      Co-authored-by: Chris Hajas <chajas@vmware.com>
      (cherry picked from commit 3b72df18)
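      A minimal sketch of turning the feature on for experimentation (the GUC
      name is from the commit message; table and index are hypothetical):
      ```
      -- Disabled by default until costing and GiST/SP-GiST support are done.
      SET optimizer_enable_indexonlyscan = on;

      CREATE TABLE t (a int, b int) DISTRIBUTED BY (a);
      CREATE INDEX t_a_idx ON t USING btree (a);
      VACUUM ANALYZE t;  -- index-only scans need the visibility map to be set

      EXPLAIN SELECT a FROM t WHERE a < 100;
      ```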
  7. 29 Jul 2020 (20 commits)