1. 07 Sep 2020 (2 commits)
  2. 04 Sep 2020 (1 commit)
  3. 03 Sep 2020 (5 commits)
    • [gpopt] Return a resource-safe type from gpdb::GetRelation · e754018a
      Jesse Zhang committed
      This commit takes advantage of the resource-safety afforded by
      RelationWrapper by using it as the return type of gpdb::GetRelation().
      
      This allows us to write code like this:
      
      auto rel = GetRelation(...);
      if (!RelIsSupported(rel)) {
      	return -1;
      }
      do_stuff(rel);
      
      Instead of code like this before the patch:
      
      Relation rel = GetRelation(...);
      
      if (!RelIsSupported(rel)) {
      	CloseRelation(rel);
      	return -1;
      }
      
      GPOS_TRY {
      	do_stuff(rel);
      	CloseRelation(rel);
      } GPOS_CATCH_EX(ex) {
      	CloseRelation(rel);
      	GPOS_RETHROW(ex);
      } GPOS_CATCH_END;
      e754018a
    • [gpopt] Add a simple RAII wrapper for Relation · fb83fa30
      Jesse Zhang committed
      fb83fa30
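      This commit carries no description in the log. As a rough illustration of the idea behind such a wrapper (a minimal sketch only, not the actual gpdb code from fb83fa30; the stand-in GetRelation/CloseRelation declarations and the member names are assumptions), an RAII type along these lines closes the relation in its destructor and is move-only:

      ```cpp
      #include <utility>

      // Stand-ins for the backend type and the gpdb:: wrappers mentioned above.
      struct RelationData;
      using Relation = RelationData *;
      Relation GetRelation(unsigned oid);
      void CloseRelation(Relation rel);

      // Move-only RAII wrapper: the relation is closed exactly once, when the
      // wrapper goes out of scope, even if an exception propagates through.
      class RelationWrapper
      {
      public:
      	explicit RelationWrapper(Relation rel) : m_rel(rel) {}
      	RelationWrapper(RelationWrapper &&other) noexcept
      		: m_rel(std::exchange(other.m_rel, nullptr)) {}
      	RelationWrapper(const RelationWrapper &) = delete;
      	RelationWrapper &operator=(const RelationWrapper &) = delete;
      	~RelationWrapper()
      	{
      		if (m_rel)
      			CloseRelation(m_rel);
      	}

      	operator Relation() const { return m_rel; }  // usable wherever a Relation is expected
      	explicit operator bool() const { return m_rel != nullptr; }

      private:
      	Relation m_rel;
      };
      ```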
    • [gpopt] clean up code calling GetRelation · 62d3d2ab
      Jesse Zhang committed
      We're about to introduce a different return type to gpdb::GetRelation in
      a forthcoming commit. To ease that transition, change Yoda conditions
      for pointer non-null comparison to the more idiomatic C++ style of using
      pointers in a boolean context. Also remove one redundant fallback
      exception.
      62d3d2ab
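      For illustration, the kind of mechanical change this refers to looks roughly like the following (a hypothetical hunk, not taken from the commit):

      ```cpp
      #include <cstddef>

      struct RelationData;
      using Relation = RelationData *;

      void Example(Relation rel)
      {
      	// Before: Yoda-style non-null comparison
      	if (NULL != rel)
      	{
      		/* ... */
      	}

      	// After: the pointer used directly in a boolean context
      	if (rel)
      	{
      		/* ... */
      	}
      }
      ```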
    • Handle the PROCSIG_NOTIFY_INTERRUPT signal · 6d155238
      Adam Lee committed
      This will make LISTEN and NOTIFY work on the QD node.
      6d155238
    • Export MASTER_DATA_DIRECTORY while calling gpconfig · c6467d2c
      Bhuvnesh Chaudhary committed
      For the GPDB appliance, certain GUCs are set using gpconfig, but this
      currently fails because MASTER_DATA_DIRECTORY is not exported. This
      commit exports MASTER_DATA_DIRECTORY so that gpconfig succeeds.
      
      This commit also allows setting DCA_VERSION_FILE to enable testing, and
      adds a test to ensure that the DCA configuration GUCs are set properly
      in the environment.
      c6467d2c
  4. 02 Sep 2020 (4 commits)
    • Correct reset condition in sessionResetSlot · b4cec6c9
      Hubert Zhang committed
      b4cec6c9
    • Fix flaky test 'gangsize' · 2771a51a
      Hubert Zhang committed
      The test case contains a query like 'insert into t select i from
      generate_series(1,10) i'. The slice running 'generate_series' has a
      General locus, so it might be executed on any segment depending on the
      session id, which makes the test flaky. To make it deterministic, we
      change generate_series to a regular table and filter the data with
      gp_segment_id. This commit also removes the alternative expected files.
      Co-authored-by: Gang Xiong <gangx@vmware.com>
      2771a51a
    • Fix compile error for missing brackets · b2d32cb9
      Hubert Zhang committed
      b2d32cb9
    • Using lwlock to protect resgroup slot in session state · a4cb06b4
      Hubert Zhang committed
      Resource groups used to access the resGroupSlot in SessionState without
      a lock. That is correct as long as each session only accesses its own
      resGroupSlot. But since we introduced the runaway feature, we need to
      traverse the session array to find the top consumer session when the
      red zone is reached. This requires:
      1. The runaway detector must hold the shared resgroup lock, so that a
      resGroupSlot cannot be detached from a session concurrently while the
      red zone is being handled.
      2. A normal session must hold the exclusive lock when modifying the
      resGroupSlot in its SessionState.
      
      Also fix a compile warning.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      a4cb06b4
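      A sketch of the locking discipline described above, using std::shared_mutex purely as a stand-in; the actual code uses a backend LWLock and the real SessionState layout, so the types and field names here are assumptions:

      ```cpp
      #include <cstddef>
      #include <mutex>
      #include <shared_mutex>
      #include <vector>

      struct SessionState
      {
      	int sessionId;
      	void *resGroupSlot;     // slot attached to this session, if any
      	std::size_t memUsage;
      };

      std::shared_mutex resGroupSlotLock;      // stand-in for the shared resgroup LWLock
      std::vector<SessionState> sessionArray;  // stand-in for the shared session array

      // Runaway detector (red zone reached): shared lock while traversing, so no
      // slot can be detached underneath it.
      const SessionState *FindTopConsumer()
      {
      	std::shared_lock<std::shared_mutex> guard(resGroupSlotLock);
      	const SessionState *top = nullptr;
      	for (const auto &s : sessionArray)
      		if (s.resGroupSlot != nullptr && (top == nullptr || s.memUsage > top->memUsage))
      			top = &s;
      	return top;
      }

      // Normal session: exclusive lock while attaching/detaching its own slot.
      void DetachSlot(SessionState &self)
      {
      	std::unique_lock<std::shared_mutex> guard(resGroupSlotLock);
      	self.resGroupSlot = nullptr;
      }
      ```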
  5. 01 Sep 2020 (8 commits)
    • Allow direct dispatch in Orca if predicate on column gp_segment_id (#10679) · 10e2b2d9
      David Kimura committed
      This approach special-cases gp_segment_id just enough to include the
      column as a distributed-column constraint. It also updates the direct
      dispatch info to be aware of gp_segment_id, which holds the raw value of
      the segment where the data resides. This is different from other columns,
      which hash the datum value to decide where the data resides.
      
      After this change, the following example shows a Gather Motion from 2
      segments on a 3-segment demo cluster.
      
      ```
      CREATE TABLE t(a int, b int) DISTRIBUTED BY (a);
      EXPLAIN SELECT gp_segment_id, * FROM t WHERE gp_segment_id=1 or gp_segment_id=2;
                                        QUERY PLAN
      -------------------------------------------------------------------------------
       Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=12)
         ->  Seq Scan on t  (cost=0.00..431.00 rows=1 width=12)
               Filter: ((gp_segment_id = 1) OR (gp_segment_id = 2))
       Optimizer: Pivotal Optimizer (GPORCA)
      (4 rows)
      
      ```
      10e2b2d9
    • Have row and cost estimates in planner represent per-node row counts. · c5f6dbbe
      Heikki Linnakangas committed
      This is more in line with upstream parallel plans, where the estimates
      also mean "per worker".
      
      NOTE: The rows/tuples/pages in RelOptInfo still represent whole-rel
      values. That's the only thing that makes sense for join rels, which
      could have Paths with different locus.
      
      This doesn't change the row counts displayed in EXPLAIN output, because
      previously we divided the row counts stored on the plan nodes by the
      number of segments, for display purposes. With this patch, that's no
      longer necessary. You can see the difference in the cost estimates,
      however.
      
      This doesn't affect GPORCA's cost model, and the GPORCA translator has
      been modified to divide row count estimates in the final plan by the number
      of segments, to keep the row counts shown in EXPLAIN comparable with the
      Postgres planner's numbers, and unchanged from previous versions.
      
      This includes some changes to GPORCA output files too. Most of the real
      changes that are not just to plans in queries where GPORCA falls back
      are because I added an "ANALYZE int8_tbl" to the int8 test. That affects
      many test queries that used the int8_tbl table. I added the "ANALYZE
      int8_tbl" command to make one of the planner tests produce the same
      plan as before (I forget which one, unfortunately).
      
      Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/cGZsAFiRfBE/m/aq6PKj23AwAJ
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
      c5f6dbbe
    • Add comments to 'gp_aggregates_costs' test. · 765a526b
      Heikki Linnakangas committed
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      765a526b
    • ic-proxy: Quit proxy bgworker when postmaster is dead · 9ce59d1a
      Hubert Zhang committed
      The proxy bgworker would become an orphan process after the postmaster
      dies, because it did not watch the pipe
      postmaster_alive_fds[POSTMASTER_FD_WATCH]. Epoll this pipe inside the
      proxy bgworker main loop as well.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      9ce59d1a
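      As a rough sketch of the check being described (Linux-only and illustrative; the actual ic-proxy main loop and its event handling differ), the read end of the postmaster-death pipe can be added to the same epoll set the worker already waits on:

      ```cpp
      #include <sys/epoll.h>
      #include <cstdlib>

      // Stand-ins for the backend's postmaster-death pipe; in the server these
      // are postmaster_alive_fds[] and POSTMASTER_FD_WATCH from postmaster.h.
      extern int postmaster_alive_fds[2];
      constexpr int POSTMASTER_FD_WATCH = 0;

      void ProxyMainLoop(int epfd, int listen_fd)
      {
      	epoll_event ev{};
      	ev.events = EPOLLIN;

      	ev.data.fd = listen_fd;
      	epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

      	// Watch the death pipe too: it becomes readable (EOF) once the
      	// postmaster exits and the write end is closed.
      	ev.data.fd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
      	epoll_ctl(epfd, EPOLL_CTL_ADD, postmaster_alive_fds[POSTMASTER_FD_WATCH], &ev);

      	for (;;)
      	{
      		epoll_event events[8];
      		int n = epoll_wait(epfd, events, 8, -1 /* block */);
      		for (int i = 0; i < n; i++)
      		{
      			if (events[i].data.fd == postmaster_alive_fds[POSTMASTER_FD_WATCH])
      				std::exit(1);  // postmaster is gone: quit instead of lingering as an orphan
      			/* ... otherwise handle proxy traffic ... */
      		}
      	}
      }
      ```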
    • Fix resource group runaway rounding issue · 757184f9
      Hubert Zhang committed
      When calculating the runaway safeChunksThreshold in resource groups, we
      used to divide by 100 up front to get the number of safe chunks. This
      can cause small chunk numbers to be rounded down to zero. Fix it by
      storing safeChunksThreshold100 (100 times the real safe chunk count) and
      doing the division on the fly.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      757184f9
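      A small self-contained illustration of the rounding problem and the shape of the fix (the numbers and the exact comparison are made up; the real resource-group formula lives elsewhere):

      ```cpp
      #include <cstdint>
      #include <iostream>

      int main()
      {
      	int32_t chunks = 3;           // a small per-group chunk budget
      	int32_t redZonePercent = 80;  // runaway red-zone ratio, in percent

      	// Old behaviour: dividing by 100 up front truncates small values,
      	// e.g. 1 * 80 / 100 == 0, leaving no safe headroom at all.
      	int32_t safeChunks = chunks * redZonePercent / 100;               // 3 * 80 / 100 = 2

      	// New behaviour: keep the threshold 100x larger and scale the other
      	// side of the comparison on the fly, so nothing is truncated.
      	int32_t safeChunksThreshold100 = chunks * redZonePercent;         // 240
      	int32_t usedChunks = 2;
      	bool aboveThreshold = usedChunks * 100 > safeChunksThreshold100;  // 200 > 240 -> false

      	std::cout << safeChunks << ' ' << aboveThreshold << '\n';         // prints "2 0"
      	return 0;
      }
      ```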
    • Decorate assert-only variables with GPOS_ASSERTS_ONLY · 082ea4c5
      Jesse Zhang committed
      This is in no way exhaustive; I'm only changing what seems abundantly
      obvious and greppable.
      082ea4c5
    • Add GPOS_UNUSED attribute · 26ae898a
      Jesse Zhang committed
      While we're at it, also add another attribute
      GPOS_ASSERTS_ONLY. This should help us eliminate a lot of
      clutter around the code that looks like this:
      
          BOOL result =
                  m_cte_consumer_info->Insert(key, GPOS_NEW(m_mp) SCTEConsumerInfo(cte_plan));
      26ae898a
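      As an illustration of how such annotations typically work (a sketch; the actual definitions in GPORCA may differ), the attribute silences unused-variable warnings for variables that are only read inside assertions:

      ```cpp
      #include <cassert>

      // Sketch of the two attributes; the real GPORCA definitions may differ.
      #if defined(__GNUC__) || defined(__clang__)
      #define GPOS_UNUSED __attribute__((unused))
      #else
      #define GPOS_UNUSED
      #endif
      #define GPOS_ASSERTS_ONLY GPOS_UNUSED

      bool InsertSomething() { return true; }  // stand-in for m_cte_consumer_info->Insert(...)

      void Example()
      {
      	// In NDEBUG builds assert() compiles away, so without the attribute
      	// 'result' would trigger an unused-variable warning.
      	bool result GPOS_ASSERTS_ONLY = InsertSomething();
      	assert(result);
      }
      ```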
    • [gpopt] Remove dead variables from translator · b6959a6a
      Jesse Zhang committed
      With this patch, the whole translator compiles warning-free.
      
      null_ndv was orphaned in commit 25479cf1 ("Fix num_distinct
      calculation in relcache translator").
      
      coercePathType was dead on arrival in commit cc799db4 ("Fix Relcache
      Translator to send CoercePath info (#2842)").
      b6959a6a
  6. 31 Aug 2020 (7 commits)
    • Fix a 'VACUUM FULL' bug · 082f39d5
      xiong-gang committed
      When doing 'VACUUM FULL', 'swap_relation_files' updates the pg_class
      entry but does not increment the command counter, so the later
      'vac_update_relstats' updates the 'relfrozenxid' and 'relhasindex' of
      the old tuple in place. When the transaction is then interrupted and
      aborted on the QE, the old entry is corrupted.
      This problem was partially fixed by commit 7f7fa498; this commit
      separates out the code that sends stats to the QD and calls it in
      `vac_update_relstats`, instead of updating the stats on the QE.
      082f39d5
    • Fix crash when planner chose an Index Only Scan on a bitmap index. · 0b2f53d5
      Heikki Linnakangas committed
      Index Only Scans have not been implemented on bitmap indexes, but in
      certain circumstances, when the query doesn't need any of the attributes
      from the index, as in "SELECT count(*) FROM table", the planner may
      still choose an Index Only Scan. It's debatable whether that's actually
      a planner bug, but we can easily support that limited case.
      Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
      0b2f53d5
    • Fix DISTINCT plans created on top of pre-sorted inputs. · a9810725
      Heikki Linnakangas committed
      If you have a pre-sorted input, like an Index Scan, and a DISTINCT
      clause, the planner would create an invalid plan. A Redistribute Motion
      node breaks the ordering of its input, so such a plan cannot be used as
      input to a Unique node.
      
      This is possibly unreachable at the moment, because parse analysis
      transforms simple DISTINCT queries to GROUP BY (see the call to
      transformDistinctToGroupBy() in transformSelectStmt()). I have not been
      able to come up with a query that would exercise this codepath; any
      simple query is transformed to a GROUP BY, and anything more complicated,
      with window functions or aggregates, doesn't yield sorted input to the
      DISTINCT stage. But if you disable the DISTINCT -> GROUP BY transformation
      in parse analysis, this query caused an assertion failure before this commit:
      
          postgres=# create table distincttest (i int, j int) distributed by (i);
          CREATE TABLE
          postgres=# create index on distincttest (j);
          CREATE INDEX
          postgres=# set gp_enable_multiphase_agg =off; set enable_hashagg=off; set enable_seqscan=off; set enable_bitmapscan=off;
          SET
          SET
          SET
          SET
          postgres=# explain select distinct j from distincttest;
          FATAL:  Unexpected internal error (createplan.c:6871)
          DETAIL:  FailedAssertion("!(numCols >= 0 && numCols <= list_length(pathkeys))", File: "createplan.c", Line: 6871)
          server closed the connection unexpectedly
          	This probably means the server terminated abnormally
          	before or while processing the request.
          The connection to the server was lost. Attempting reset: Succeeded.
      a9810725
    • Disable strxfrm for mk_sort at compile time · 2d523e9e
      Denis Smirnov committed
      Glibc implementations are known to return inconsistent results for
      strcoll() and strxfrm() on many platforms, which can cause unpredictable
      bugs. Because of that, PostgreSQL has disabled strxfrm() by default
      since 9.5, gated at compile time behind the TRUST_STRXFRM definition.
      Greenplum has its own mk_sort implementation that can also use
      strxfrm(), so mk_sort can likewise be affected by the
      strcoll()/strxfrm() inconsistency (it breaks merge joins). That is why
      strxfrm() should be disabled by default for mk_sort as well, behind a
      TRUST_STRXFRM_MK_SORT definition. We don't reuse PostgreSQL's
      TRUST_STRXFRM definition because many users ran Greenplum with strxfrm()
      enabled for mk_sort while it was disabled in the PostgreSQL core.
      Keeping TRUST_STRXFRM_MK_SORT as a separate definition allows those
      users to avoid reindexing after a version upgrade.
      Reviewed-by: Asim R P <pasim@vmware.com>
      Reviewed-by: Heikki Linnakangas <linnakangash@vmware.com>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      2d523e9e
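      As a sketch of the compile-time gate being described (illustrative only; the real mk_sort code is considerably more involved and manages its transformed keys differently):

      ```cpp
      #include <cstring>

      // Compare two NUL-terminated strings under the current collation.
      int CollatedCompare(const char *a, const char *b)
      {
      #ifdef TRUST_STRXFRM_MK_SORT
      	// Only when explicitly trusted: pre-transform with strxfrm() and compare
      	// the transformed keys bytewise (fixed buffers here just for illustration).
      	char ka[256], kb[256];
      	strxfrm(ka, a, sizeof(ka));
      	strxfrm(kb, b, sizeof(kb));
      	return strcmp(ka, kb);
      #else
      	// Default: rely on strcoll(), the behaviour PostgreSQL trusts everywhere.
      	return strcoll(a, b);
      #endif
      }
      ```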
    • fix gpload multi-level partition table and special char in columns issue (#10686) · d80ec3a5
      xiaoxiao committed
      Fix the match-column condition to resolve primary key conflicts when
      using gpload merge mode to import data into a multi-level partitioned
      table. Also fix a failure when column names contain special characters
      or capital letters.
      Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
      d80ec3a5
    • Fix bug for ANALYZE inherited tables. (#10723) · 289ebe68
      (Jerome)Junfeng Yang committed
      When the QD acquires sample rows from the QEs, a QE should only collect
      the requested samples, either for the parent table alone or for all
      inherited tables. Otherwise, the QD may get wrong results for the parent
      table, since the inherited tables will overwrite the expected values,
      and we end up with incorrect results in pg_class and pg_statistic.
      
      `gp_acquire_sample_rows` takes three inputs, but somehow the code lost
      the usage of the last `inherited` argument. This argument is what
      distinguishes whether the QD needs samples for the parent only or for
      all inherited tables.
      
      On the QE, when receiving an ANALYZE request through
      gp_acquire_sample_rows, we should perform do_analyze_rel for either the
      parent table only or for all of its child tables, because the QD will
      send two acquire-sample-rows requests to the QE.
      To distinguish the two requests, we check the ctx->inherited value.
      289ebe68
    • Enable autoanalyze in Greenplum (#10515) · 0b17968e
      盏一 committed
      The basic idea for enabling auto-ANALYZE only through the Master's
      autovacuum daemon is to collect pgstat info on the Master when executing
      queries. Start the Master's autovacuum launcher process, and fire an
      autovacuum worker process for a database on the Master when the naptime
      elapses. The autovacuum worker then iterates through all
      tables/materialized views in the specified database and executes ANALYZE
      for tables which have reached the analyze threshold. Note that the
      ANALYZE statement issued by the autovacuum worker on the Master is the
      same as executing it through a query on the QD, i.e. auto-ANALYZE is
      coordinated by the master, and segments do not start their own
      autovacuum launcher and autovacuum workers.
      
      For more details, refer to src/backend/postmaster/README.auto-ANALYZE.
      Co-authored-by: Junfeng(Jerome) Yang <jeyang@pivotal.io>
      0b17968e
  7. 28 Aug 2020 (4 commits)
    • Fix gathering statistics sample from segments. · e878e2e8
      Heikki Linnakangas committed
      Commit 0c27e42a changed the way that the gp_acquire_sample_rows()
      function, called by ANALYZE, collects the sample rows. With the commit,
      the sample size was not chosen correctly. The sample size is passed to
      gp_acquire_sample_rows() as an argument, 'targrows', but the function did
      not pass it down to the do_analyze_rel() function that actually collects
      the sample. As a result, do_analyze_rel() collected a larger sample, but
      gp_acquire_sample_rows() only returned the first 'targrows' rows of it
      to the caller.
      
      For example, if you have three segments and the total desired sample size
      is 3000 rows, gp_acquire_sample_rows() is called with targrows=1000. But
      do_analyze_rel() nevertheless collected a sample of 3000 rows, and
      only the first 1000 rows of it were returned to the QD. The end result was
      that the sample was highly biased towards the physical beginning of the table.
      
      This adds a test case, which creates and ANALYZEs a table with values
      0-99, with 100 copies of each distinct value. The table is populated in
      order, so there is perfect correlation between the physical order and the
      values. Before this patch, ANALYZE built a histogram like this for it:
      
      regression=# select histogram_bounds from pg_stats s where tablename = 'uniformtest';
              histogram_bounds
      ---------------------------------
       {0,3,6,10,13,17,20,24,27,34,40}
      (1 row)
      
      After this fix:
      
               histogram_bounds
      ----------------------------------
       {0,8,21,32,42,51,60,71,81,89,99}
      (1 row)
      
      Commit 0c27e42a updated the plan in expected output of
      'gp_aggregates_costs' test. This reverts it back; the reason it changed was
      that the statistics were bogus, and now they're good again. I'm not sure
      which plan actually is better for that query. The cost estimates are not
      very accurate in either case, but they're inaccurate in different ways. The
      query actually returns 300000 rows, the estimate with the bogus stats was
      463756 rows and with the correct stats it's 103613.
      e878e2e8
    • Increase default value of guc gp_snapshotadd_timeout · 7c6c1b76
      Paul Guo committed
      This is used to avoid the "writer segworker group shared snapshot
      collision on id 153871" kind of error. Pengzhou and I saw this in a real
      production environment on GPDB 5. Pengzhou suspected that the writer
      gang exits due to gp_vmem_idle_resource_timeout but exits slowly because
      of ProcArrayLock contention, so the collision happens when a new gang is
      created. The theory was roughly verified with process core dumps taken
      when the issue happened: ProcArrayLock contention was found in those
      core files.
      
      Increase the default gp_snapshotadd_timeout value to better tolerate
      this case. We have been optimizing ProcArrayLock, but we cannot avoid
      the contention 100% of the time.
      7c6c1b76
    • Disable changing distribution keys implicitly when creating unique index (#10510) · 84d2a23f
      Hao Wu committed
      In previous GPDB versions, the distribution keys could be changed implicitly
      when creating a unique index on a hash-distributed empty table.
      ```SQL
      create table foo(a int, b int) distributed by(a);
      create unique index on foo(b);
      -- now, foo is hash distributed by b, not by a
      ```
      It might be useful (maybe) to avoid having to change the distribution
      keys by hand. On the other hand, it is all too easy for the user to miss
      the NOTICE message, "NOTICE:  updating distribution policy to match new
      UNIQUE index".
      
      What's worse, this behavior can lead to data inconsistency:
      ```SQL
      create table foo(a int, b int) distributed by(a);
      insert into foo select i,i from generate_series(1,5)i;
      
      create table foopart (i int4, j int4) distributed by (i) partition by
              range (i) (start (1) end (3) every (1));
      create unique index on foopart_1_prt_1 (j);
      insert into foopart values(1,2),(2,1);
      ```
      The data inconsistency looks like this:
      ```
      gpadmin=# select gp_segment_id, * from foopart_1_prt_1;
       gp_segment_id | i | j
       ---------------+---+---
                   1 | 1 | 2
       (1 row)
      
      gpadmin=# select * from foo f, foopart_1_prt_1 p where f.a = p.j;
       a | b | i | j
       ---+---+---+---
       (0 rows)
      ```
      
      Implicitly changing the distribution keys is not very useful, but it is harmful.
      This PR disables changing the distribution keys when creating a unique index.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      84d2a23f
  8. 27 Aug 2020 (1 commit)
  9. 26 Aug 2020 (3 commits)
    • Refactor code to sort/redistribute input to Agg nodes. · 15e1341c
      Heikki Linnakangas committed
      This introduces two new functions cdb_prepare_path_for_sorted_agg() and
      cdb_prepare_path_for_hashed_agg(), to sort and/or redistribute the input
      to an Agg node, in single-phase aggregation. Previously, the logic was in
      the callers in planner.c. This is a nice cleanup now, but is particularly
      helpful with the PostgreSQL v12 merge which will introduce more codepaths
      that create Agg nodes. Encapsulating the logic in functions reduces the
      duplication.
      
      Parallel grouping is currently disabled altogether, but if it wasn't, we
      should be using these functions when creating parallel grouping paths,
      too.
      
      There's one almost user-visible change here, which explains the change in
      'gp_aggregates' expected output. If a sorted Gather Motion is created, we
      now use the path keys needed for the grouping (root->grouped_pathkeys),
      rather than the pathkeys of the subpath (subpath->pathkeys), as the merge
      key for the Gather Motion. The grouped_pathkeys must be a subset of the
      subpath's keys, but the subpath might have extra keys that are not needed
      for the Agg. Don't bother to preserve the order of those extra keys,
      mostly because it's more convenient in the code to not bother with it, but
      in principle it also saves some CPU cycles.
      Reviewed-by: Gang Xiong <gxiong@pivotal.io>
      15e1341c
    • Fix url_curl on MacOS (#10261) · 89a1211c
      Xiaoran Wang committed
      * Fix url_curl on MacOS
      
      Fix libcurl not being able to read data from gpfdist on macOS.
      
      Note that gpfdist with a pipe still cannot work on macOS, because
      flock(2), which is used in gfile.c, is not supported on macOS.
      89a1211c
    • Don't try to generate generic plans with GPORCA. · 09aa23d3
      Heikki Linnakangas committed
      If you have plan_cache_mode=auto, which is the default, never try to
      generate "generic" plans. GPORCA doesn't support Param nodes, so it will
      always fall back to the Postgres planner. What happened without this patch
      was that the backend code would compare the cost of the custom plan
      generated with GPORCA with the cost of a generic plan generated with the
      Postgres planner, and that doesn't make much sense because GPORCA has
      a very different cost model from the Postgres planner.
      
      No test, because it would be quite tedious and fragile to write one, and
      the code change seems simple enough.
      
      I bumped into this while hacking on PR #10676, which changes the Postgres
      planner's cost model. There's a test in 'direct_dispatch' for the generic
      plan generation, and it started to fail because with the planner cost
      model changes, the Postgres planner's generic plan started to look cheaper
      than the custom plan generated with GPORCA. So we do have some test
      coverage for this, although accidental.
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
      09aa23d3
  10. 25 Aug 2020 (5 commits)
    • Docs - add new postgis packaging revision · 9f7aed4e
      David Yozie committed
      9f7aed4e
    • docs - add information on upgrading to PostGIS 2.5.4 (#10550) · cf020c44
      Mel Kiyama committed
      * docs - add information on upgrading to PostGIS 2.5.4
      
      Upgrade instructions 2.1.5 to different versions of 2.5.4
      
      * docs - upgrade to PostGIS 2.5.4 review comments
      
      * docs - more review comment updates.
      Reorder upgrade sections.
      Clarify that removing the PostGIS package means removing the gppkg.
      
      * docs - minor edit
      
      * docs - review updates - more emphasis on the fact that removing PostGIS from a database deletes objects.
      - Create separate paragraph in Upgrading section.
      - Add warning in Removing PostGIS section
      
      * docs - minor review comment update
      
      * small edits
      Co-authored-by: David Yozie <dyozie@pivotal.io>
      cf020c44
    • Harden analyzedb further against dropped/recreated tables (#10669) · 4bbbb381
      Chris Hajas committed
      Commit 445fc7cc hardened some parts of analyzedb. However, it missed a
      couple of cases.
      
      1) When the statement to get the modcount from the pg_aoseg table failed
      due to a dropped table, the transaction was also terminated. This caused
      further modcount queries to fail; those tables were still analyzed, but
      analyzedb would error out and not properly record the modcount.
      Therefore, we now restart the transaction when it errors.
      
      2) If the table is dropped and then recreated while analyzedb is running
      (or some other mechanism that results in the table being successfully
      analyzed, but the pg_aoseg table did not exist during the initial
      check), the logic to update the modcount may fail. Now, we skip the
      update for the table if this occurs. In this case, the modcount would
      not be recorded and the next analyzedb run will consider the table
      modified (or dirty) and re-analyze it, which is the desired behavior.
      4bbbb381
    • Fix flaky 'combocid' test. · 5cbb2282
      Heikki Linnakangas committed
      It would sometimes fail like this:
      
      --- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/combocid.out	2020-08-25 03:14:48.314831054 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/combocid.out	2020-08-25 03:14:48.326832158 +0000
      @@ -66,7 +66,7 @@
       FETCH ALL FROM c;
        ctid  | cmin | foobar | distkey
       -------+------+--------+---------
      - (0,1) |    0 |      1 |
      + (0,1) |    1 |      1 |
        (0,2) |    1 |      2 |
        (0,5) |    0 |    333 |
       (3 rows)
      
      I was able to reproduce that locally, by inserting a random delay in the
      SeqNext() function.
      5cbb2282
    • Allow setting direct dispatch info if predicate on gp_segment_id for planner. · 13b38eb8
      Zhenghua Lyu committed
      This commit implements the same feature for the planner as PR
      https://github.com/greenplum-db/gpdb/pull/10679.
      
      This commit does not implement the group-by feature of PR 10679.
      The following commit message is almost the same as that of PR 10679.
      
      This approach special-cases gp_segment_id just enough to include the
      column as a distributed-column constraint. It also updates the direct
      dispatch info to be aware of gp_segment_id, which holds the raw value of
      the segment where the data resides. This is different from other columns,
      which hash the datum value to decide where the data resides.
      
      After this change, the following example shows a Gather Motion from 2
      segments on a 3-segment demo cluster.
      
      ```
      CREATE TABLE t(a int, b int) DISTRIBUTED BY (a);
      EXPLAIN SELECT gp_segment_id, * FROM t WHERE gp_segment_id=1 or gp_segment_id=2;
                                          QUERY PLAN
      -----------------------------------------------------------------------------------
       Gather Motion 2:1  (slice1; segments: 2)
         ->  Seq Scan on t
               Filter: ((gp_segment_id = 1) OR (gp_segment_id = 2))
       Optimizer: Postgres query optimizer
      (4 rows)
      ```
      13b38eb8