提交 · 6.10.0 · Greenplum / Gpdb

05 8月, 2020 4 次提交

由 xiong-gang 提交于 8月 05, 2020

In commit 3ef5e267, we changed the value of some macro in guc.h, that would
introduce ABI change if 3rd-party libaraies are using the macros.

cced3a78

Add a test for replication lag · 964851d5

由 xiong-gang 提交于 8月 05, 2020

When the mirror is down and the master is reset, the 'dtx recovery' process
will hang because the primary can't sync the WAL to the mirror. The FTS
should be able to probe and 'sync off' the mirror.

964851d5

D

Docs - update versioning to 6.10 · 87e5a9d0
由 David Yozie 提交于 8月 04, 2020

87e5a9d0
L
docs - note how oss users get gpbackup (#10478) · 7e08fb15
由 Lisa Owen 提交于 8月 04, 2020
```
* docs - note how oss users get gpbackup

* small edit
```
7e08fb15

04 8月, 2020 3 次提交

Fix flaky test max_slot_wal_keep_size · 4561146e

由 xiong-gang 提交于 8月 04, 2020

In the test, check 'sync_state' in pg_stat_replication is unnecessary and it
makes the test flaky, so remove it.

4561146e

ic-proxy: correct SIGHUP handler · cf0c2fcd

由 Ning Yu 提交于 8月 04, 2020

Fixed the bug that the SIGHUP handler was installed for SIGINT by
mistake, so the ic-proxy bgworkers would die on SIGHUP.

By correcting the signal name, now we could let the ic-proxy bgworkers
reload the postgresql.conf by executing "gpstop -u".
Reviewed-by: NHubert Zhang <hzhang@pivotal.io>
(cherry picked from commit a181655b)

cf0c2fcd

D

Docs - fix problem in xml xrefs · f03cf59f
由 David Yozie 提交于 8月 03, 2020

f03cf59f

03 8月, 2020 7 次提交

D
Revert "Revert "docs - add config file parameter fill_missing_fields (#10404)"" · 9f305aa8
由 David Yozie 提交于 8月 03, 2020
```
This reverts commit bd5b35b1.
```
9f305aa8
G

Change the unit of the GUC from kb to mb · 3ef5e267
由 Gang Xiong 提交于 7月 28, 2020

3ef5e267

Make max_slot_wal_keep_size work on 6X · ea69506b

由 Gang Xiong 提交于 6月 17, 2020

1. change the GUC unit from MB to KB as 6X doesn't have GUC_UNIT_MB.
2. the upstream commit added 3 fields in the system view
   'pg_replication_slots', this commit remove that change since we cannot make
   catalog change on 6X.
3. upstream uses 'slot->active_pid' to identify the process that acquired the
   replication slot, this commit added 'walsnd' in 'ReplicationSlot' to do the
   same.
4. upstream uses condition variable to wait the walsender exit, this commit
   uses WalSndWaitStoppingOneWalSender as we don't have condition variable on 6X.
5. add test cases.

ea69506b

Allow users to limit storage reserved by replication slots · 7a274622

由 Alvaro Herrera 提交于 4月 07, 2020

Replication slots are useful to retain data that may be needed by a
replication system. But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space. This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size. Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: NMasahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: NJehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: NÁlvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp

7a274622

W
Add "FILL_MISSING_FIELDS" option for gpload. · 7afdd72c
由 Wen Lin 提交于 8月 03, 2020
```
This reverts commit 7118e8ac.
```
7afdd72c

(

Resolve high `CacheMemoryContext` usage for `ANALYZE` on large partition table. (#10554) · f8c8265a

由 (Jerome)Junfeng Yang 提交于 8月 03, 2020

In some cases, merge stats logic for root partition table may consume
very high memory usage in CacheMemoryContext.
This may lead to `Canceling query because of high VMEM usage` when
concurrently ANALYZE partition tables.

For example, there are several root partition tables and they both have
thousands of leaf tables. And these tables are all wide tables that may
contain hundreds of columns.
So when analyze()/auto_stats() leaf tables concurrently,
`leaf_parts_analyzed` will consume lots of memory(catalog catch for
pg_statistic and pg_attribute) under
CacheMemoryContext for each backend, which may hit the protect VMEM
limit.
In `leaf_parts_analyzed`, a single backend's leaf table analysis for a
root partition table, it may add cache entries up to
number_of_leaf_tables * number_of_columns tuples from pg_statistic and
number_of_leaf_tables * number_of_columns tuples from pg_arrtibute.
Set guc `optimizer_analyze_root_partition` or
`optimizer_analyze_enable_merge_of_leaf_stats` to false could skip merge
stats for root table and `leaf_parts_analyzed` will not execute.

To resolve this issue:
1. When checking whether merge stats are available for a root table in
`leaf_parts_analyzed`, check whether all leaf tables are ANALYZEd first,
if they're still un-ANALYZE leaf table exists, return quickly to avoid touch
columns' pg_attribute and pg_statistic per leaf table(this will save lots of time).
And also don't rely on system catalog cache and use the
index to fetch the stats tuple to avoid one-time cache usage(in common cases).

2. When merging a stats in `merge_leaf_stats`, don't rely on system
catalog cache and use the index to fetch the stats tuple.

There are side-effects for not rely on system catalog cache(which are all **rare** situations).
1. If insert/update/copy several leaf tables which under **same
root partition** table in **same session** and all leaf tables are **analyzed**
will be much slower since auto_stats will call `leaf_parts_analyzed` once the leaf
table gets updated, and we don't rely on system catalog cache now.
(`set optimizer_analyze_enable_merge_of_leaf_stats=false` could avoid
this)

2. ANALYZE the same root table several times in the same session is much
slower than before since we don't rely on system catalog cache.

Seems this solution improves the performance for ANALYZE, and
it also makes ANALYZE won't hit the memory issue anymore.

(cherry picked from commit 533a47dd)

f8c8265a

ic-proxy: handle early coming BYE correctly · bd8959f6

由 Ning Yu 提交于 8月 02, 2020

In a query that contains multiple init/sub plans, the packets of the
second subplan might be received while the first is still being
processed in the ic-proxy mode, this is because in ic-proxy mode a local
host handshake is used instead of the global one.

To distinguish the packets of different subplans, especially for the
early coming ones, we must stop handling on the BYE immediately, and
pass any unhandled early coming pkts to the successor or the
placeholder.

This fixes the random hanging during the ICW parallel group of
qp_functions_in_from.  No new test is added.
Co-authored-by: NHubert Zhang <hzhang@pivotal.io>
Co-authored-by: NNing Yu <nyu@pivotal.io>
(cherry picked from commit 79ff4e62)

bd8959f6

01 8月, 2020 1 次提交

gpinitsystem: use new 6-field ARRAY format internally for QD and QEs · 27038bd4

由 bhuvnesh chaudhary 提交于 7月 28, 2020

The initialization file (passed as gpinitsystem -I <file>) can have two
formats: legacy (5-field) and new (6-field, that has the HOST_ADDRESS).

This commit fixes a bug in which an internal sorting routine that matched
a primary with its corresponding mirror assumed that <file> was always
in the new format.  The fix is to convert any input <file> to the new
format via re-writing the QD_ARRAY, PRIMARY_ARRAY and MIRROR_ARRAY to
have 6 fields.  We also always use '~' as the separator instead of ':'
for consistency.

The bug fixed is that a 5-field <file> was being sorted numerically,
causing either the hostname (on a multi-host cluster) or the port (on
a single-host cluster) to be used to sort instead or the content.
This could result in the primary and its corresponding mirror being
created on different contents, which fortunately hit an internal error
check.

Unit tests and a behave test have been added as well.  The behave test
uses a demo cluster to validate a legacy gpinitsystem initialization
file format (e.g. one that has 5 fields) successfully creates a
Greenplum database.
Co-authored-by: NDavid Krieger <dkrieger@vmware.com>

27038bd4

31 7月, 2020 4 次提交

Correct and stabilize some replication tests · 15dd8027

由 Ashwin Agrawal 提交于 7月 23, 2020

Adding pg_stat_clear_snapshot() in functions looping over
gp_stat_replication / pg_stat_replication to refresh result everytime
the query is run as part of same transaction. Without
pg_stat_clear_snapshot() query result is not refreshed for
pg_stat_activity neither for xx_stat_replication functions on multiple
invocations inside a transaction. So, in absence of it the tests
become flaky.

Also, tests commit_blocking_on_standby and dtx_recovery_wait_lsn were
initially committed with wrong expectations, hence were missing to
test the intended behavior. Now reflect the correct expectation.

(cherry picked from commit c565e988)

15dd8027

A
Add mirror_replay test to greenplum_schedule · 29ca99ee
由 Ashwin Agrawal 提交于 7月 30, 2020
```
This was missed in commit 96b332c0.

(cherry picked from commit 8ef5d722)
```
29ca99ee

Add knowledge of partition selectors to Orca's DPv2 algorithm (#10263) (#10558) · d3886cf2

由 Chris Hajas 提交于 7月 30, 2020

Orca's DP algorithms currently generate logical alternatives based only on cardinality; they do not take into account motions/partition selectors as these are physical properties handled later in the optimization process. Since DPv2 doesn't generate all possible alternatives for the optimization stage, we end up generating alternatives that do not support partition selection or can only place poor partition selectors.

This PR introduces partition knowledge into the DPv2 algorithm. If there is a possible partition selector, it will generate an alternative that considers it, in addition to the previous alternatives.

We introduce new properties, m_contain_PS  to indicate whether a SExpressionInfo contains a PS for a particular expression. We consider an expression to have a possible partition selector if the join expression columns and the partition table's partition key overlap. If they do, we mark this expression as containing a PS for a particular PT.

We consider a good PS one which is selective. Eg:
```
- DTS
- PS
   -TS
     - Pred
```

would be selective. However, if there is no selective predicate, we do not consider this as a promising PS.

For now, we add just a single alternative that satisfies this property and only consider linear trees.

This is a backport of 9c445321

d3886cf2

Improve cardinality for joins using distribution columns in ORCA · 4b473948

由 Ashuka Xue 提交于 7月 13, 2020

This commit only affects cardinality estimation in ORCA when the user
sets `optimizer_damping_factor_join = 0`. It improves the square root
algorithm first introduced by commit ce453cf2.

In the original square root  algorithm, we assumed that distribution
column predicates would have some correlation with other predicates in
the join and therefore would be accordingly damped when calculating join
cardinality.

However, distribution columns are ideally unique in order to gain the
best performance for GPDB. Under this assumption, distribution columns
should not be correlated and thus needed to be treated as independent
when calculating join cardinality. This is a best guess since we do not
have a way to support correlated columns at this time.
Co-authored-by: NAshuka Xue <axue@vmware.com>
Co-authored-by: NChris Hajas <chajas@vmware.com>

4b473948

30 7月, 2020 1 次提交

Add Orca support for index only scan · 93c9829a

由 David Kimura 提交于 6月 30, 2020

This commit allows Orca to select plans that leverage IndexOnlyScan
node. A new GUC 'optimizer_enable_indexonlyscan' is used to enable or
disable this feature. Index only scan is disabled by default, until the
following issues are addressed:

  1) Implement cost comparison model for index only scans. Currently,
     cost is hard coded for testing purposes.
  2) Support index only scan using GiST and SP-GiST as allowed.
     Currently, code only supports index only scans on b-tree index.
Co-authored-by: NChris Hajas <chajas@vmware.com>
(cherry picked from commit 3b72df18)

93c9829a

29 7月, 2020 20 次提交

N
Revert "ic-proxy: get libuv on pr pipeline" · a9705b21
由 Ning Yu 提交于 7月 29, 2020
```
This reverts commit 7f44bfa8.
```
a9705b21

ic-proxy: include postmaster pid in the domain socket path · fb16decf

由 Ning Yu 提交于 7月 29, 2020

We used to store them under /tmp/, we include the postmaster port number
in the file name in the hope that two clusters will not conflict with
each other on this file.

However the conflict still happen in the test src/bin/pg_basebackup.
And it can also happen if a second cluster is missed configured by
accident.  So to make things safe we also include the postmaster pid in
the domain socket path, there is no chance for two postmasters to share
the same pids.
Reviewed-by: NPaul Guo <pguo@pivotal.io>
(cherry picked from commit 5c5a358a)

fb16decf

Partial revert ic-proxy: enable ic-proxy in gpdb packages · 00521b2d

由 Tyler Ramer 提交于 7月 27, 2020

The libuv source is in the build container, rather thank in a gcp
bucket. This commit is a partial revert from getting source from a
gcp bucket and instead uses the libuv available in the build
container image.
Co-authored-by: NTyler Ramer <tramer@vmware.com>
Co-authored-by: NKris Macoskey <kmacoskey@vmware.com>
Co-authored-by: NJamie McAtamney <jmcatamney@vmware.com>

(cherry picked from commit 88d7d969)

00521b2d

A
Allow isolation2 to take additional pg_regress options. · a2bcfe85
由 Adam Berlin 提交于 3月 29, 2019
```
(cherry picked from commit 3b7d6b85)
```
a2bcfe85

ic-proxy: get libuv on pr pipeline · 7f44bfa8

由 Ning Yu 提交于 7月 24, 2020

We must install libuv on the PR pipeline to compile with ic-proxy
enabled.  ICW tests are still run in ic-udpifc mode.

(cherry picked from commit ef887cfe)

7f44bfa8

ic-proxy: reload addresses on SIGHUP · 5c4106c0

由 Ning Yu 提交于 7月 23, 2020

We used to mark the GUC gp_interconnect_proxy_addresses as
PGC_POSTMASTER, so the cluster must be restarted to reload this setting,
this can be a problem during gpexpand: the cluster expansion itself is
online, but to configure the proxy addresses for the new segments a
restart is needed.

Now we changed it to PGC_SIGHUP, so the setting can be reloaded on
SIGHUP.

Also changed the setting from a developer option to a normal one.

(cherry picked from commit c2523232)

5c4106c0

N
ic-proxy: do not generate too many messages · f8d97936
由 Ning Yu 提交于 7月 11, 2020
```
(cherry picked from commit c6c36cc8)
```
f8d97936

Enable libuv and ic-proxy on Travis · e88c1b69

由 Ning Yu 提交于 3月 27, 2020

The code will be compiled with ic-proxy enabled, but the tests are still
ran in the default ic-udpifc mode.
Authored-by: NDaniel Gustafsson <dgustafsson@pivotal.io>
(cherry picked from commit cdf4cdeb)

e88c1b69

N
ic-proxy: add ic-proxy ICW jobs · aa37a8e0
由 Ning Yu 提交于 7月 10, 2020
```
(cherry picked from commit 190ac2d0)
```
aa37a8e0

ic-proxy: enable ic-proxy in gpdb packages · 4de88554

由 Ning Yu 提交于 7月 10, 2020

Build gpdb binaries with --enable-ic-proxy, and include libuv in the
gpdb packages.

(cherry picked from commit 87e1d9ee)

4de88554

ic-proxy: enable ic-proxy with --enable-ic-proxy · ffcbff26

由 Ning Yu 提交于 6月 15, 2020

We used to use the option --with-libuv to enable ic-proxy, it is not
staightforward to understand the purpose of that option, though.  So we
renamed it to --enable-ic-proxy, and the default setting is changed to
"disable".

Suggested by Kris Macoskey <kmacoskey@pivotal.io>

(cherry picked from commit 81810a20)

ffcbff26

ic-proxy: let backends connect to the proxy bgworker · cc6063d4

由 Ning Yu 提交于 5月 18, 2020

Only in proxy mode, of course.  Currently the ic-proxy mode shares most
of the backend logic with ic-tcp mode, so instead of copying the code we
actually embed the ic-proxy specific logic in ic_tcp.c .

(cherry picked from commit 94c9d996)

cc6063d4

N
ic-proxy: launch as a bgworker · 2b3aaba2
由 Ning Yu 提交于 5月 18, 2020
```
(cherry picked from commit 5b60069c)
```
2b3aaba2
N
ic-proxy: new value "proxy" in GUC gp_interconnect_type · e0d07d8d
由 Ning Yu 提交于 5月 18, 2020
```
It is for the ic-proxy mode.

(cherry picked from commit 245ca266)
```
e0d07d8d
N
ic-proxy: make gp_interconnect_proxy_addresses a GUC · 4cc5bc93
由 Ning Yu 提交于 5月 18, 2020
```
(cherry picked from commit 3140a44f)
```
4cc5bc93

ic-proxy: implement the core logic · 7b8bedec

由 Ning Yu 提交于 5月 18, 2020

The interconnect proxy mode, a.k.a. ic-proxy, is a new interconnect
mode, all the backends communicate via a proxy bgworker, all the
backends on the same segment share the same proxy bgworker, so every two
segments only need one network connection between them, which reduces
the network flows as well the ports.

To enable the proxy mode we need to first configure the guc
gp_interconnect_proxy_addresses, for example:

    gpconfig \
      -c gp_interconnect_proxy_addresses \
      -v "'1:-1:10.0.0.1:2000,2:0:10.0.0.2:2001,3:1:10.0.0.3:2002'" \
      --skipvalidation

Then restart to take effect.

(cherry picked from commit 6188fb1f)

7b8bedec

Store dbid in CdbProcess · 87a66baf

由 Ning Yu 提交于 5月 18, 2020

It is a preparation for the ic-proxy mode, we need this information to
distinguish a primary segment with its mirror.

(cherry picked from commit 8804bf39)

87a66baf

Upgrade pgbouncer to 1.13 (#10405) · 72b40812

由 Xiaoran Wang 提交于 7月 29, 2020

* Upgrade pgbouncer to 1.13

Make pgbouncer 1.13 to work on centos6

Use pgbouncer master branch

72b40812

D

Docs - add GreenplumR 1.1.0 to component version table · 12b98731
由 David Yozie 提交于 7月 28, 2020

12b98731

docs - add GUC gp_add_column_inherits_table_setting 6.x only (#10517) · cc057f2a

由 Mel Kiyama 提交于 7月 28, 2020

* docs - add GUC gp_add_column_inherits_table_setting 6.x only

-Add GUC
-Also, updated ADD COLUMN clause of  ALTER TABLE command with GUC information

* docs - update based on review comment.

* docs - review comment update.

cc057f2a