1. 10 Jun, 2020 1 commit
  2. 09 Jun, 2020 2 commits
  3. 08 Jun, 2020 1 commit
  4. 06 Jun, 2020 5 commits
  5. 05 Jun, 2020 5 commits
  6. 04 Jun, 2020 5 commits
    • H
      Fix plan difference in gporca regression test · ed9b9eea
      Committed by Hans Zeller
      ed9b9eea
    • W
      Add "FILL_MISSING_FIELDS" option for gpload. · 87fef901
      Committed by Wen Lin
      87fef901
    • H
      Support "NDV-preserving" function and op property · 392c2e97
      Committed by Hans Zeller
      Orca uses this property for cardinality estimation of joins.
      For example, a join predicate foo join bar on foo.a = upper(bar.b)
      will have a cardinality estimate similar to foo join bar on foo.a = bar.b.
      
      Other functions, like foo join bar on foo.a = substring(bar.b, 1, 1)
      won't be treated that way, since they are more likely to have a greater
      effect on join cardinalities.
      
      Since this is specific to ORCA, we use logic in the translator to determine
      whether a function or operator is NDV-preserving. Right now we consider a
      very limited set of operators; we may add more at a later time.
      
      Let's assume that we join tables R and S and that f is a function or
      expression that refers to a single column and does not preserve
      NDVs. Let's also assume that p is a function or expression that also
      refers to a single column and that does preserve NDVs:
      
      join predicate       card. estimate                         comment
      -------------------  -------------------------------------  -----------------------------
      col1 = col2          |R| * |S| / max(NDV(col1), NDV(col2))  build an equi-join histogram
      f(col1) = p(col2)    |R| * |S| / NDV(col2)                  use NDV-based estimation
      f(col1) = col2       |R| * |S| / NDV(col2)                  use NDV-based estimation
      p(col1) = col2       |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      p(col1) = p(col2)    |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
      otherwise            |R| * |S| * 0.4                        this is an unsupported pred
      Note that adding casts to these expressions is OK, as is switching the left and right sides.
      
      Here is a list of expressions that we currently treat as NDV-preserving:
      
      coalesce(col, const)
      col || const
      lower(col)
      trim(col)
      upper(col)
      
      One more note: We need the NDVs of the inner side of Semi and
      Anti-joins for cardinality estimation, so only normal columns and
      NDV-preserving functions are allowed in that case.
      
      This is a port of these GPDB 5X and GPOrca PRs:
      https://github.com/greenplum-db/gporca/pull/585
      https://github.com/greenplum-db/gpdb/pull/10090
      
      (cherry picked from commit 3ccd1ebfa1ea949ac77ed3b5d8f5faadfa87affd)
      
      Also updated join.sql expected files with minor motion changes.
      392c2e97
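      The estimation rules in the table above can be sketched as follows. This is a simplified, hypothetical model for illustration only; the function and set names are not Orca's API.

```python
# Hypothetical sketch (not Orca code) of the NDV-based join cardinality
# rules described above. An expression "preserves NDVs" if it keeps the
# number of distinct values roughly unchanged, e.g. upper(col).

NDV_PRESERVING = {"col", "upper", "lower", "trim"}  # "col" = bare column

def estimate_join_card(card_r, card_s, ndv1, ndv2, left="col", right="col"):
    """Estimate |R join S| for the predicate left(col1) = right(col2)."""
    left_p = left in NDV_PRESERVING
    right_p = right in NDV_PRESERVING
    if left_p and right_p:
        # both sides preserve NDVs: classic equi-join formula
        return card_r * card_s / max(ndv1, ndv2)
    if right_p:
        return card_r * card_s / ndv2   # f(col1) = p(col2): use NDV(col2)
    if left_p:
        return card_r * card_s / ndv1   # mirror case: use NDV(col1)
    return card_r * card_s * 0.4        # unsupported predicate
```

      For example, with |R| = |S| = 1000, NDV(col1) = 100 and NDV(col2) = 50, the plain equi-join estimate is 10,000 rows, while f(col1) = col2 gives 20,000.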
    • D
      Docs - update PXF, gpss versions · 235ab4ff
      Committed by David Yozie
      235ab4ff
    • D
      Docs - update versioning & build for 6.8 release · c1f648ae
      Committed by David Yozie
      c1f648ae
  7. 03 Jun, 2020 5 commits
    • S
      Remove unnecessary projections from duplicate sensitive Distribute(s) in ORCA · 3f84adff
      Committed by Shreedhar Hardikar
      Duplicate sensitive HashDistribute Motions generated by ORCA get
      translated to Result nodes with hashFilter cols set. However, if the
      Motion needs to distribute based on a complex expression (rather than
      just a Var), the expression must be added into the targetlist of the
      Result node and then referenced in hashFilterColIdx.
      
      However, this can affect other operators above the Result node. For
      example, a Hash operator expects the targetlist of its child node to
      contain only elements that are to be hashed. Additional expressions here
      can cause issues with memtuple bindings that can lead to errors.
      
      (E.g. the attached test case, when run without this fix, gives the
      error: "invalid input syntax for integer:")
      
      This PR fixes the issue by adding an additional Result node on top of
      the duplicate sensitive Result node to project only the elements from
      the original targetlist in such cases.
      3f84adff
    • J
      Fixed the \dm empty output error · 387bf9cd
      Committed by Jinbao Chen
      The psql client ignored relstorage when it built the \dm command, so the
      output of \dm was empty. Add the correct relstorage check to the command.
      387bf9cd
    • X
      Revert "Upgrade pgbouncer to 1.13" · 0da92fbc
      Committed by Xiaoran Wang
      This reverts commit 412493b0.
      
      Failed to compile pgbouncer on centos6: can't find libevent.
      pgbouncer 1.13 uses pkg-config to look up libevent instead of using
      --with-libevent.
      
      Another issue: pgbouncer 1.13 does not support libevent 1.x, but we use
      libevent 1.4 on centos6.
      0da92fbc
    • X
      Upgrade pgbouncer to 1.13 · 412493b0
      Committed by Xiaoran Wang
      412493b0
    • H
      Refactoring the DbgPrint and OsPrint methods (#10149) · d9b16e34
      Committed by Hans Zeller
      * Make DbgPrint and OsPrint methods on CRefCount
      
      Create a single DbgPrint() method on the CRefCount class. Also create
      a virtual OsPrint() method, making some objects derived from CRefCount
      easier to print from the debugger.
      
      Note that not all the OsPrint methods had the same signature; some
      additional OsPrintxxx() methods have been generated to handle that.
      
      * Making print output easier to read, print some stuff on demand
      
      Required columns in required plan properties are always the same
      for a given group. Also, equivalent expressions in required distribution
      properties are important in certain cases, but in most cases they
      disrupt the display and make it harder to read.
      
      Added two traceflags, EopttracePrintRequiredColumns and
      EopttracePrintEquivDistrSpecs that have to be set to print this
      information. If you want to go back to the old display, use these
      options when running gporca_test: -T 101016 -T 101017
      
      * Add support for printing alternative plans
      
      A new method, CEngine::DbgPrintExpr() can be called from
      COptimizer::PexprOptimize, to allow printing of the best plan
      for different contexts. This is only enabled in debug builds.
      
      To use this:
      
      - run an MDP using gporca_test, using a debug build
      - print out memo after optimization (-T 101006 -T 101010)
      - set a breakpoint near the end of COptimizer::PexprOptimize()
      - if, after looking at the contents of memo, you want to see
        the optimal plan for context c of group g, do the following:
        p eng.DbgPrintExpr(g, c)
      
      You could also get the same info from the memo printout, but it
      would take a lot longer.
      
      (cherry picked from commit b3fdede6)
      d9b16e34
  8. 02 Jun, 2020 5 commits
    • J
      Add missing field skipData in RefreshClause serialization. (#10219) · 7c47e862
      Committed by Jinbao Chen
      The skipData flag should only short-circuit in transientrel_receive on the QE.
      
      We should still do the begin/end work, e.g. remove the newly
      created temp file, or we will have a file leak.
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      7c47e862
    • J
      Make concurrent refresh check early that there is a unique index on matview. (#10215) · 37d05b30
      Committed by Jinbao Chen
      In REFRESH MATERIALIZED VIEW command, CONCURRENTLY option is only
      allowed if there is at least one unique index with no WHERE clause on
      one or more columns of the matview. Previously, concurrent refresh
      checked the existence of a unique index on the matview after filling
      the data to new snapshot, i.e., after calling refresh_matview_datafill().
      So, when there was no unique index, we could need to wait a long time
      before we detected that and got the error. It was a waste of time.
      
      To avoid wasting this time, this commit changes concurrent refresh
      so that it checks the existence of a unique index at the beginning of
      the refresh operation, i.e., before starting any time-consuming jobs.
      If CONCURRENTLY option is not allowed due to lack of a unique index,
      concurrent refresh can immediately detect it and emit an error.
      
      Author: Masahiko Sawada
      Reviewed-by: Michael Paquier, Fujii Masao
      Co-authored-by: Fujii Masao <fujii@postgresql.org>
      37d05b30
    • R
      Fix how must_gather is determined with LIMIT ALL · cb9c2aa4
      Committed by Richard Guo
      If there is ORDER BY or DISTINCT in the query, we need to bring all the
      data to a single node by setting must_gather to be true. An exception is
      when there's a LIMIT or OFFSET clause, which would be handled later when
      inserting Limit node. Here to tell if there is any LIMIT or OFFSET
      clause, we should use limit_needed, instead of checking limitCount or
      limitOffset directly.
      
      Fixes issue #9746.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ekta Khanna <ekhanna@pivotal.io>
      cb9c2aa4
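      The subtlety is that LIMIT ALL still parses into a non-null limit expression whose value is a constant NULL, so testing limitCount for mere presence misclassifies the query. A rough Python model of the idea behind limit_needed (simplified; the real planner function also treats a constant OFFSET 0 as a no-op):

```python
# Simplified model of why checking limitCount directly is wrong:
# LIMIT ALL parses to a constant-NULL limit expression, which means
# "no limit" even though the field itself is not None.

class Const:
    def __init__(self, value):
        self.value = value  # None models a SQL NULL constant

def limit_needed(limit_count, limit_offset):
    """Return True only if the query actually restricts rows."""
    for node in (limit_count, limit_offset):
        if node is None:
            continue                      # clause absent
        if isinstance(node, Const) and node.value is None:
            continue                      # LIMIT ALL / OFFSET NULL: no-op
        return True                       # a real limit expression
    return False
```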
    • J
      pg_upgrade: speed up AO segment queries · fddcc10a
      Committed by Jacob Champion
      The query to obtain AO auxiliary catalog names is relatively expensive
      compared to the other aux table queries, and it was being performed once
      for every AO table. Consolidate all calls into a single query, and
      manually join the results with the other relation info by relid.
      
      Also improve the "couldn't find aux tables" FATAL message for easier
      debugging (it needs to include the dbname).
      fddcc10a
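      The change is the classic N+1-query consolidation: one catalog query for all AO tables, joined in memory by relid. A hypothetical sketch of the pattern (the function names are illustrative, not pg_upgrade's):

```python
def fetch_aux_names_per_table(relids, query_one):
    """Old approach: one catalog round trip per AO table."""
    return {relid: query_one(relid) for relid in relids}

def fetch_aux_names_single_query(relids, query_all):
    """New approach: one round trip, then a manual join by relid."""
    by_relid = dict(query_all())        # query_all() -> [(relid, name), ...]
    return {relid: by_relid.get(relid) for relid in relids}
```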
    • J
      pg_upgrade: fix gpdb5_schedule flake · 19a8740d
      Committed by Jacob Champion
      The large_objects test invokes pg_upgrade and must not run concurrently
      with the setup DDL. This led to intermittent failures, since some setup
      scripts temporarily add objects that aren't upgradable.
      19a8740d
  9. 01 Jun, 2020 1 commit
    • H
      Refactor the resource management in INITPLAN function · 3506e0e5
      Committed by Hubert Zhang
      We introduced functions that run on an INITPLAN in commit a21ff2.
      INITPLAN functions are designed to support "CTAS select * from udf();".
      Since udf() runs on the EntryDB, and the EntryDB is always a read gang
      that cannot do dispatch work, the query would fail if the function
      contains a DDL statement, etc.
      
      The idea of an INITPLAN function is to run the function on an INITPLAN,
      which is in fact the QD, and store the result in a tuplestore. Later,
      the FunctionScan on the EntryDB just reads tuples from the tuplestore
      instead of running the real function.
      
      But the life cycle management is a little tricky. In the original
      commit, we hacked around this by closing the tuplestore in the INITPLAN
      without deleting the file, and let the EntryDB reader delete the file
      after finishing the tuple fetch. This leaks the file if the transaction
      aborts before the EntryDB runs.
      
      This commit adds a postprocess_initplans in ExecutorEnd() of the main
      plan to clean up the tuplestore created by preprocess_initplans in
      ExecutorStart() of the main plan. Note that postprocess_initplans must
      be placed after the dispatch work is finished, i.e. after
      mppExecutorFinishup(). Upstream doesn't need this function since it
      always uses scalar PARAMs to communicate between an INITPLAN and the
      main plan.
      
      cherry-pick from: f669acf7
      3506e0e5
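      The lifecycle fix can be modeled as tying the tuplestore's cleanup to the end of the main plan rather than to the EntryDB reader, so the resource is released even on abort. A toy Python sketch (not GPDB code; the class and function names are invented for illustration):

```python
# Toy model of the lifecycle fix: the store created before the main
# plan runs is always released when the main plan ends, even if
# execution aborts partway through.

class InitPlanStore:
    def __init__(self):
        self.open = True
    def close(self):
        self.open = False

def run_main_plan(work):
    store = InitPlanStore()       # like preprocess_initplans in ExecutorStart
    try:
        return work(store)        # dispatch and execute the main plan
    finally:
        store.close()             # like postprocess_initplans in ExecutorEnd
```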
  10. 30 May, 2020 3 commits
    • W
      Fix ccnt overflow in gpperfmon (#10205) · 6fbca676
      Committed by Wang Hao
      For some reason, the gpmon_qexeckey_t structure used int16 for ccnt
      while all other GP code operates on int32. This can cause ccnt to
      overflow in gpperfmon packets.
      
      This problem doesn't affect the master branch, as the gpperfmon code
      has been removed from it, but it appears to affect the 6X_STABLE and
      5X_STABLE branches.
      
      Authored-by: Denis Smirnov <darthunix@gmail.com>
      Reviewed-by: Hao Wang <haowang@pivotal.io>
      6fbca676
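      The wraparound is easy to demonstrate: a signed 16-bit counter overflows past 32767 while a 32-bit counter keeps counting. A standalone illustration (not gpperfmon code):

```python
# Demonstrate why an int16 ccnt field overflows while the int32
# counters used elsewhere in GP code do not.

import ctypes

def as_int16(value):
    # what a packet's int16 ccnt field would carry after truncation
    return ctypes.c_int16(value).value

def as_int32(value):
    # what the rest of the code computes with int32
    return ctypes.c_int32(value).value
```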
    • C
      Penalize cross products in Orca's DPv2 algorithm more accurately (#10029) · 7b034832
      Committed by Chris Hajas
      Previously in the DPv2 transform (exhaustive2), while we penalized cross
      joins for the remaining joins in the greedy phase, we did not do so for
      the first join, which in some cases selected a cross join. This often
      produced a poor join order and went against the intent of the
      alternative being generated, which is to minimize cross joins.
      
      We also increase the default cross-join penalty from 5 to 1024, which is
      the value we use in the cost model during the optimization stage.
      
      The greedy alternative also wasn't kept in the heap, so we include that
      now too.
      
      (cherry picked from commit 457bb928)
      7b034832
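      The effect of the change can be sketched as a cost function that applies the cross-join penalty uniformly, including to the first join picked by the greedy phase (an illustrative model, not Orca's actual cost model):

```python
# Illustrative sketch: penalize cross joins uniformly, including the
# very first join chosen by the greedy phase.

CROSS_JOIN_PENALTY = 1024  # raised from 5 to match the cost model

def join_cost(card_left, card_right, has_join_pred):
    cost = card_left * card_right
    if not has_join_pred:
        cost *= CROSS_JOIN_PENALTY  # cross product: heavily penalized
    return cost

def pick_first_join(pairs):
    """Greedy first step: choose the cheapest pair, penalty included."""
    return min(pairs, key=lambda p: join_cost(p[0], p[1], p[2]))
```

      With the penalty applied, a small cross product (10 x 10) now costs more than a much larger join with a real predicate (100 x 100), so greedy no longer starts with the cross join.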
    • C
      Ensure Material nodes generated by Orca always materialize · 389f08a4
      Committed by Chris Hajas
      In cases where Orca generates a NLJ with a parameter on the inner side,
      the executor will not pass the EXEC_FLAG_REWIND flag down, as it assumes
      the inner side will always need to be rescanned. The Material node will
      therefore not have its rewind flag set and can act as a no-op.
      
      This is not always correct. While the executor will set EXEC_FLAG_REWIND
      if the Materialize is directly above a motion, it does not recognize
      the case where the Materialize is on the inner side with other nodes
      between it and the motion, even though the Materialize serves to
      prevent a rescan of the underlying Motion node.
      
      This causes the execution to fail with:
      `Illegal rescan of motion node: invalid plan (nodeMotion.c:1623)` as it
      would attempt to rescan a motion.
      
      Since Orca only produces Materialize when necessary, either for
      performance reasons or to prevent rescan of an underlying Motion,
      EXEC_FLAG_REWIND should be set for any Materialize generated by Orca.
      
      Below is a valid plan generated by Orca:
      
      ```
       Result  (cost=0.00..3448.01 rows=1 width=4)
         ->  Nested Loop  (cost=0.00..3448.01 rows=1 width=1)
               Join Filter: true
               ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=2 width=4)
                     ->  Seq Scan on foo1  (cost=0.00..431.00 rows=1 width=4)
               ->  Result  (cost=0.00..431.00 rows=1 width=1)
                     Filter: (foo1.a = foo2.a)
                     ->  Materialize  (cost=0.00..431.00 rows=1 width=4)
                           ->  Hash Semi Join  (cost=0.00..431.00 rows=1 width=4)
                                 Hash Cond: (foo2.b = foo3.b)
                                 ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..0.00 rows=1 width=8)
                                       ->  Bitmap Heap Scan on foo2  (cost=0.00..0.00 rows=1 width=8)
                                             Recheck Cond: (c = 3)
                                             ->  Bitmap Index Scan on f2c  (cost=0.00..0.00 rows=0 width=0)
                                                   Index Cond: (c = 3)
                                 ->  Hash  (cost=431.00..431.00 rows=1 width=4)
                                       ->  Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..431.00 rows=2 width=4)
                                             ->  Seq Scan on foo3  (cost=0.00..431.00 rows=1 width=4)
       Optimizer: Pivotal Optimizer (GPORCA)
       ```
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
      389f08a4
  11. 28 May, 2020 3 commits
    • L
      docs - gpinitsystem -I format addition (#10167) · 9aaf570d
      Committed by Lena Hunter
      * clarifying pg_upgrade note
      
      * gpinitsystem -I second format
      
      * gpinitsystem edits
      
      * edits from review
      9aaf570d
    • X
      Fix flaky resgroup test · 2b58ab75
      Committed by xiong-gang
      pg_resgroup_move_query is an asynchronous operation; its completion
      doesn't mean the query has already moved to the destination group.
      2b58ab75
    • S
      Log fewer errors (#10100) · d6f92610
      Committed by Sambitesh Dash
      This is a continuation of commit 456b2b31 in GPORCA, adding more errors
      to the list that doesn't get logged in the log file. We also remove the
      code that writes to std::cerr, which generated a rather ugly log
      message; instead, we add whether the error was unexpected to another
      log message that we already generate.
      
      The original commit on the master branch is fba77702.
      d6f92610
  12. 27 May, 2020 1 commit
  13. 26 May, 2020 1 commit
    • P
      Fix a hung issue caused by gp_interconnect_id disorder · 7a35ed6a
      Committed by Pengzhou Tang
      This issue was exposed by an experiment to remove the special
      "eval_stable_functions" handling in evaluate_function(): the
      qp_functions_in_* test cases would sometimes get stuck, and it turned
      out to be a gp_interconnect_id ordering issue.
      
      Under the UDPIFC interconnect, gp_interconnect_id is used to distinguish
      executions of MPP-fied plans in the same session. On the receiver side,
      packets with a smaller gp_interconnect_id are treated as 'past' packets,
      and the receiver tells the sender to stop sending them.
      
      The root cause of the hang is:
      1. The QD calls InitSliceTable() to advance gp_interconnect_id and
      stores it in the slice table.
      2. In CdbDispatchPlan->exec_make_plan_constant(), the QD finds a stable
      function that needs to be simplified to a const, so it executes this
      function first.
      3. The function contains SQL, so the QD initializes another slice table
      and advances gp_interconnect_id again, then dispatches and executes the
      new plan.
      4. After the function is simplified to a const, the QD continues to
      dispatch the previous plan; however, its gp_interconnect_id is now the
      older one. When a packet arrives before the receiver has set up the
      interconnect, the packet is handled by handleMismatch(), treated as a
      'past' packet, and the senders are stopped early by the receiver. When
      the receiver later finishes setting up the interconnect, it cannot get
      any packets from the senders and gets stuck.
      
      To resolve this, we advance gp_interconnect_id when a plan is actually
      dispatched; plans are dispatched sequentially, so a later-dispatched
      plan will have a higher gp_interconnect_id.
      
      We also limit the usage of gp_interconnect_id to the rx thread of
      UDPIFC; we prefer to use sliceTable->ic_instance_id in the main thread.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      7a35ed6a
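      The fix can be modeled as moving the id assignment from plan initialization to dispatch time, so sequentially dispatched plans always carry increasing ids (a toy sketch, not GPDB's actual structures):

```python
# Toy model: assigning the id when the plan is actually dispatched
# guarantees ids follow dispatch order, even when planning of an outer
# query triggers dispatch of an inner query first.

import itertools

_ic_counter = itertools.count(1)

def dispatch(plan):
    plan["ic_instance_id"] = next(_ic_counter)  # advance at dispatch time
    return plan["ic_instance_id"]
```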
  14. 25 May, 2020 2 commits