1. 17 April 2018, 1 commit
    • Fix 'distribution_policy' issue on replicated table by gpcheckcat · 5001751a
      Authored by Pengzhou Tang
      The 'distribution_policy' check performs constraint checks on
      randomly-distributed tables. However, attrnums = null in
      gp_distribution_policy is no longer enough to identify a
      randomly-distributed table after the new replicated distribution
      policy was introduced, so an additional filter is added to make the
      check correct.
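
      A hedged sketch of the tightened filter, assuming the replicated-table
      marker added by commit 7efe3204 is the policytype column used in later
      GPDB releases:

          # Hypothetical query shape: attrnums IS NULL alone no longer means
          # "randomly distributed"; replicated tables must be excluded too.
          RANDOM_DIST_SQL = """
              SELECT localoid
              FROM gp_distribution_policy
              WHERE attrnums IS NULL
                AND policytype <> 'r'  -- assumed column/value for replicated
          """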
  2. 03 April 2018, 1 commit
    • Get rid of pg_exttable.fmterrtbl · 8f6fe2d6
      Authored by Adam Lee
      The pg_exttable.fmterrtbl column stored the OID of the error table,
      but without an error table it was just set to the OID of the external
      table itself. That is unnecessary; other columns already indicate
      whether error logging is enabled, so this column can be removed.
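
      A hedged sketch of inferring error-logging state without fmterrtbl,
      assuming the reject limit is among the "other columns" mentioned
      (single-row error handling is configured through a reject limit):

          # Hypothetical probe: external tables with single-row error
          # handling configured, detected via rejectlimit instead of
          # fmterrtbl.
          ERRLOG_EXTTABLES_SQL = """
              SELECT reloid::regclass
              FROM pg_exttable
              WHERE rejectlimit IS NOT NULL
          """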
  3. 29 March 2018, 1 commit
    • Support replicated table in GPDB · 7efe3204
      Authored by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash
      or randomly. There is a requirement for a new table type, called a
      replicated table, where every segment holds a full copy of the
      table's data.
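
      A minimal example of the new table type, using the DISTRIBUTED
      REPLICATED clause (shown as an embedded SQL string; the table
      definition is illustrative):

          # A replicated table keeps a full copy of its data on every
          # segment.
          CREATE_REPLICATED_SQL = """
              CREATE TABLE settings (key text, value text)
              DISTRIBUTED REPLICATED
          """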
      
      To implement it, we added a new distribution policy named
      POLICYTYPE_REPLICATED to mark a replicated table, and a new locus type
      named CdbLocusType_SegmentGeneral to describe the distribution of a
      replicated table's tuples. CdbLocusType_SegmentGeneral implies that
      the data is generally available on all segments but not on the qDisp,
      so a plan node with this locus type can be flexibly planned to execute
      on either a single QE or all QEs. It is similar to
      CdbLocusType_General; the only difference is that a
      CdbLocusType_SegmentGeneral node cannot be executed on the qDisp. To
      guarantee this, we try our best to add a gather motion on top of a
      CdbLocusType_SegmentGeneral node when planning motions for a join,
      even when the other rel has a bottleneck locus type. Such a motion may
      be redundant if the single QE is ultimately not promoted to execute on
      the qDisp, so we detect that case and omit the redundant motion at the
      end of apply_motion(). We do not reuse CdbLocusType_Replicated, since
      it always implies a broadcast motion below it, and it is not easy to
      plan such a node as a direct dispatch to avoid fetching duplicate
      data.
      
      We don't support replicated tables with an INHERITS or PARTITION BY
      clause yet; the main problem is that UPDATE/DELETE on multiple result
      relations doesn't work correctly yet. We can fix this later.
      
      * Allow spi_* to access replicated table on QE
      
      Previously, GPDB didn't allow a QE to access non-catalog tables,
      because the data on a single QE is incomplete. We can remove this
      limitation when only replicated tables are accessed.

      One problem is that a QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog,
      so we now pass the policy info to QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated table
      
      Previously, we used the magic number -128 in the
      gp_distribution_policy table to identify a replicated table, which was
      quite a hack, so we add a new column to gp_distribution_policy to
      identify replicated and partitioned tables.

      This commit also abandons the old convention of using a 1-length NULL
      list and a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and
      DISTRIBUTED FULLY clauses.

      Besides, this commit refactors the code to make the decision-making
      around distribution policy clearer.
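
      A hedged sketch of classifying a table under the new schema, assuming
      the new column is the single-character policytype ('r' for replicated)
      that later GPDB releases use:

          # Hypothetical classification; the column name and values are
          # assumptions, not confirmed by this commit message.
          def classify_policy(policytype, attrnums):
              if policytype == 'r':
                  return 'replicated'
              if attrnums is None:
                  return 'randomly distributed'
              return 'hash distributed on columns %s' % (attrnums,)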
      
      * Support COPY for replicated table
      
      * Disable the rowid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to handle
        queries like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t2), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
               ->  Hash Join
                     Hash Cond: t2.c2 = t1.c2
                     ->  Seq Scan on t2
                     ->  Hash
                           ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because
        ctid + gp_segment_id can't identify a logical tuple there: in a
        replicated table, the same logical row may have a different ctid
        and gp_segment_id on each segment. So we disable such plans for
        replicated tables temporarily. This is not ideal, because the rowid
        unique path may be cheaper than a normal hash semi-join, so we left
        a FIXME for later optimization.
      
      * ORCA-related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated
        tables.
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog, so do
        the same check as for other catalogs.
      
      * Support gpexpand on replicated tables, and altering the distribution
        policy of a replicated table
  4. 13 January 2018, 2 commits
    • Remove filespaces. · 5a3a39bc
      Authored by Heikki Linnakangas
      Remove the concept of filespaces, revert tablespaces to work the same as
      in upstream.
      
      There are some leftovers in the management tools. I don't know how to test all
      that, and I was afraid of touching things I can't run. Also, we may need
      to create replacements for some of those things on top of tablespaces, to
      make the management of tablespaces easier, and it might be easier to modify
      the existing tools than write them from scratch. (Yeah, you could always
      look at the git history, but still.)
      
      Per the discussion on gpdb-dev mailing list, the plan is to cherry-pick
      commit 16d8e594 from PostgreSQL 9.2, to make it possible to have a
      different path for a tablespace in the primary and its mirror. But that's
      not included in this commit yet.
      
      TODO: Make temp_tablespaces work.
      TODO: Make pg_dump do something sensible, when dumping from a GPDB 5 cluster
      that uses filespaces. Same with pg_upgrade.
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/sON4lraPEqg/v3lkM587BAAJ
    • Remove checks related to persistent tables from gpcheckcat. · 126a7935
      Authored by Heikki Linnakangas
      Because persistent tables are no more.
      
      NOTE: It would still be nice to check for consistency between pg_class
      and files on disk, to check that there are no extra data files, and no
      data files missing that have a pg_class entry. Same with AO seg files,
      I suppose. But that's a significantly different query than what we have
      here.
  5. 12 November 2017, 1 commit
  6. 30 October 2017, 2 commits
    • Use string representation of segment ids in CatMissingIssue object. · 792a9b43
      Authored by Heikki Linnakangas
      In commit 226e8867, I changed the CatMissingIssue object to hold the
      content IDs of segments where an entry is missing in a Python list, instead
      of the string representation of a PostgreSQL array (e.g. "{1,2,-1}") that
      was used before. That was a nice simplification, but it turns out that
      there was more code that accessed the CatMissingIssue.segids field that I
      missed. It would make sense to change the rest of the code, IMHO, but to
      make the CI pipeline happy quickly, this commit just changes the code back
      to using a string representation of a PostgreSQL array again.
      
      This hopefully fixes the MM_gpcheckcat behave test failures.
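
      A hedged sketch of the conversion back to the array-literal form (the
      helper name is illustrative):

          def segids_to_pg_array(segids):
              # [1, 2, -1] -> "{1,2,-1}", the PostgreSQL array literal form
              # that the remaining callers still expect.
              return "{" + ",".join(str(s) for s in segids) + "}"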
    • Fix gpcheckcat error reporting of missing entries. · 8199a402
      Authored by Heikki Linnakangas
      In commit 226e8867, I changed the shape of the result set passed to the
      processMissingDuplicateEntryResult() function, removing the "exists" column.
      But I failed to update the line that extracts the primary key columns from
      the result set for that change. Fix.
      
      This should fix the failures in the gpcheckcat behave tests.
  7. 28 October 2017, 1 commit
    • Don't use a temp table in gpcheckcat, when checking for missing entries. · 226e8867
      Authored by Heikki Linnakangas
      The new query is simpler. There was a comment about using the temp
      table to avoid gathering all the data to the master, but I don't
      think that is a good tradeoff. Creating a temp table is pretty
      expensive, and even with the temp table, the master needs to
      broadcast all of the master's entries to the segments. For
      comparison, with the Gather node, all the segments need to send their
      entries to the master. Isn't that roughly the same amount of traffic?
      
      A long time ago, the query was made to use the temp table, after a report
      from a huge cluster with over 1000 segments, where the total size of
      pg_attribute, across all the nodes, was over 200 GB. So the catalogs can
      be large. But even then, I don't think this query can get much better than
      this.
      
      The new query moves some of the logic from SQL to the Python code. Seems
      simpler that way.
      
      The real reason to do this right now is that in the next commit, I'm
      going to change the way snapshots are dispatched with a query, and
      that change will alter the visibility of a temp table created in the
      same command. In a nutshell, currently, if you do "CREATE TABLE mytemp
      AS SELECT oid FROM pg_class WHERE relname='mytemp'", the oid of the
      table being created is included. On PostgreSQL, and after the snapshot
      changes I'm working on, it will not be, and that would confuse this
      gpcheckcat query.
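
      A hedged sketch of the self-visibility quirk described above (the
      query is quoted from the commit message; the constant is
      illustrative):

          # Before the snapshot change in GPDB, the CTAS sees its own
          # pg_class row, so the new table holds one row; on PostgreSQL
          # (and after the change) it holds zero rows.
          SELF_VISIBILITY_SQL = """
              CREATE TABLE mytemp AS
              SELECT oid FROM pg_class WHERE relname = 'mytemp'
          """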
  8. 18 October 2017, 1 commit
  9. 24 June 2017, 3 commits
    • Update gpcheckcat dependency check · 6375e9bc
      Authored by Jimmy Yih
      The gpcheckcat dependency check previously only looked for extra
      pg_depend entries, where a pg_depend entry's objid or refobjid does
      not exist as an OID in any catalog table with hasoids set. We also
      need to check the reverse scenario, where a catalog entry is missing
      its entry in pg_depend. That scenario is difficult to flag, because
      catalog entries can have multiple unique pg_depend references, or
      dependencies may be added later by a separate query (e.g. granting
      ownership of a database to a certain user). Therefore, we add a very
      basic check only against catalog tables whose creating query
      immediately creates dependencies (see the sketch below).
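
      A hedged sketch of the extra-entry side for a single catalog (the
      real check iterates over every catalog table with OIDs):

          # Hypothetical probe for "extra" pg_depend rows pointing into
          # pg_class: dependency rows whose objid has no matching pg_class
          # row.
          EXTRA_DEPEND_SQL = """
              SELECT d.classid, d.objid, d.refobjid
              FROM pg_depend d
              WHERE d.classid = 'pg_class'::regclass
                AND NOT EXISTS
                    (SELECT 1 FROM pg_class c WHERE c.oid = d.objid)
          """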
    • Change gpcheckcat missing/extra check to include pg_depend. · 4bdc7e45
      Authored by Jimmy Yih
      We did not check for missing or extra pg_depend entries across the
      cluster during gpcheckcat, so we would be unaware of scenarios where
      a pg_depend entry went missing and the object that used that
      dependency was then dropped. Those scenarios can leave behind
      leftover catalog entries and prevent some simple CREATE statements.
    • Fix gpcheckcat output · 843b2109
      Authored by Jimmy Yih
      As gpcheckcat builds its mapping of catalog issues, it can flag
      objects whose parents no longer exist (e.g. a toast table left over
      after dropping a table). When these were caught, gpcheckcat would
      unfortunately error out during the reporting step. To prevent that,
      we now just check for None in the RelationObject's vars during
      reporting.
      
      Another fixed issue is the repeated reporting of issues against the
      database currently being tested, after a different database had
      already been tested. The catalog issues reported were invalid for the
      current database; they were actually issues from the previously
      checked database. This was caused by improperly resetting the
      GPObjects and GPObjectGraph global dictionaries. To fix it, we now
      use the clear() function so the global variables are reused in place
      (see the sketch below).
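
      A hedged sketch of the two fixes (the names come from the message;
      the code shape is illustrative):

          GPObjects = {}
          GPObjectGraph = {}

          def report_relation(rel):
              for name, value in vars(rel).items():
                  if value is None:      # parent object no longer exists
                      continue           # skip instead of erroring out
                  print("%s: %s" % (name, value))

          def reset_globals():
              # clear() empties the dicts in place, so every module holding
              # a reference sees the reset; rebinding with "= {}" would
              # leave stale state visible elsewhere.
              GPObjects.clear()
              GPObjectGraph.clear()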
  10. 31 March 2017, 1 commit
  11. 17 March 2017, 1 commit
    • Fix gpcheckcat for pg_amop, pg_amproc and pg_index. · 0f0faf9e
      Authored by Ashwin Agrawal
      pg_amop, pg_amproc oids are not synchronized from master and segments, hence
      adding them to known differences. There are more tables which needs to be
      cleaned-up here based on function RelationNeedsSynchronizedOIDs() in catalog.c
      based leaving that for separate commit.
      
      Also, since there exist simpler way to skip performing check for column use the
      same for `indcheckxmin` column of pg_index instead of what was added as part of
      commit 79caf1c0
      
      Also, cleanup some checks existing for versions prior to 4.1.
  12. 09 March 2017, 1 commit
    • Skip pg_index.indcheckxmin column in gpcheckcat. · 79caf1c0
      Authored by Shoaib Lari
      The gpcheckcat 'inconsistent' check should skip the
      pg_index.indcheckxmin column for now: due to the HOT feature, the
      value of this column can differ between the master and the segments.

      A more long-term fix will resolve the underlying issue that makes
      the indcheckxmin column value differ between the master and the
      segments.
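
      A hedged sketch of a per-column exclusion of this kind (the structure
      is illustrative, not gpcheckcat's actual table):

          # Hypothetical skip-list: columns the 'inconsistent' check should
          # not compare between the master and the segments.
          SKIP_COLUMNS = {
              'pg_index': ['indcheckxmin'],  # may differ due to HOT
          }

          def columns_to_compare(catalog, all_columns):
              skipped = set(SKIP_COLUMNS.get(catalog, []))
              return [c for c in all_columns if c not in skipped]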
  13. 10 February 2017, 1 commit
  14. 25 January 2017, 1 commit
  15. 30 November 2016, 1 commit
  16. 17 November 2016, 1 commit
    • Fix gpcheckcat extraneous entry repair to use primary keys if oids not consistent across segments. (#1288) · dab6fe18
      Authored by Chumki Roy
      
      gpcheckcat missing/extraneous repair assumed that oids are consistent
      across segments for a given catalog entry. This is not always
      guaranteed for all the catalog tables defined in gpcheckcat. Instead,
      we need to generate the repair based on the primary keys (the unique
      identifier in this case), as sketched below.
      
      Authors: Marbin, Larry, Chris, Chumki and Karen
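
      A hedged sketch of a primary-key-based repair statement (the helper
      and key names are illustrative):

          # Hypothetical repair builder keyed on primary-key columns
          # instead of oid, e.g.
          # build_repair_delete('pg_attribute',
          #                     {'attrelid': 16384, 'attnum': 2})
          def build_repair_delete(catalog, pkey_values):
              cond = " AND ".join("%s = %s" % (col, val)
                                  for col, val in sorted(pkey_values.items()))
              return "DELETE FROM %s WHERE %s;" % (catalog, cond)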
  17. 21 October 2016, 1 commit
    • add gpcheckcat check for "mirroring_matching" (#1195) · bf0ab799
      Authored by Larry Hamel
      * Add gpcheckcat check for "mirroring_matching"
      
      This check compares the configuration setting for mirroring against each
      segment's report of its current mirror state (whether mirrors are enabled
      or not) and reports any mismatch.
      
      Authors: Larry Hamel, Karen Huddleston, Chumki Roy
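
      A hedged sketch of the comparison's core (the names are illustrative;
      the real check queries the cluster's configuration and each segment):

          # Hypothetical mismatch detector: configured mirroring vs. what
          # each segment reports. segment_reports maps content id -> bool.
          def mirroring_mismatches(configured_mirroring, segment_reports):
              return sorted(cid for cid, enabled in segment_reports.items()
                            if enabled != configured_mirroring)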
  18. 03 September 2016, 1 commit
    • Enhance foreign key check detection for gpcheckcat (#1061) · 92cc0187
      Authored by Marbin Tan
      * Enhance foreign key check detection for gpcheckcat
      
      We will now do a bidirectional foreign-key check against pg_class for
      the following catalogs:
          pg_attribute, pg_index, and pg_appendonly
      
      This is accomplished by doing a full join, based on the foreign keys
      and primary keys, between pg_class and each catalog mentioned above.
      Once a mismatch is detected, the missing catalog entries are reported
      on stderr (see the sketch below).
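
      A hedged sketch of one catalog pair's bidirectional probe; a NULL key
      on either side of the full join flags a row missing its counterpart
      (the query shape is illustrative):

          # Hypothetical full-join probe between pg_class and pg_attribute.
          FK_FULL_JOIN_SQL = """
              SELECT c.oid AS class_oid, a.attrelid AS attribute_relid
              FROM pg_class c
              FULL OUTER JOIN pg_attribute a ON c.oid = a.attrelid
              WHERE c.oid IS NULL OR a.attrelid IS NULL
          """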
      
      * Add unit test for foreign_key check in gpcheckcat
      
      * Update the behave test to account for gpcheckcat creating a repair
        for foreign_key.

      * Add new module for foreign key check in gpcheckcat
      * Get the unit tests into a working state
      * Refactor the unit tests to be "smaller"
      
      Update behave and unit test for gpcheckcat foreign_key_check:
      
      * We are keeping gp_distribution_policy checked as part of the left
        join, instead of skipping it altogether just because it's a
        master-only catalog table.
      * Add behave test that deletes from segments instead of master for
        foreign key check
      * Fix unit tests to be able to mock out the full and left join
        queries for foreign key check
      
      Add unit test for gpcheck -C option
      
      * Remove pg_constraint check
      
      Because of pg_class limitations, some flags (relhaspkey) are
      maintained lazily (updates to some pg_class columns are not
      enforced), there is no good way to create a one-to-one mapping
      between pg_class and pg_constraint. So it is removed/commented out
      for now.
      
      Authors: Nikhil Kak, Chris Hajas, Chumki Roy, and Marbin Tan
  19. 21 July 2016, 1 commit
  20. 07 July 2016, 1 commit
    • Refactor gpcheckcat to modularize the repair code and add repair for the missing_extraneous module. (#873) · 20ccb77a
      Authored by kaknikhil
      
      * Refactor gpcheckcat to modularize the repair code and add repair
      for missing_extraneous:
      1. Moved all the generic repair code to its own module.
      2. New module for missing_extraneous repair.
      3. Make gpcheckcat do repairs per check, instead of doing them after
      all the checks have finished.
      4. Fix gpcheckcat so that it now creates a repair dir with the -R
      option.
      5. Unit and behave tests for the above mentioned modules.
      6. Update gpcheckcat behave tests to use consistent db names.
      7. Behave tests for a few other checks, like foreign_key,
      constraint_db, part_integrity, and also for the -g option.
  21. 06 July 2016, 1 commit
  22. 30 April 2016, 1 commit
    • Allow gpcheckcat to determine its default batch size at runtime · 15b605c3
      Authored by Marbin Tan
      A batch size of 8 is too low, and each cluster may have a different
      system configuration, so we would like to determine a default batch
      size at runtime, before running the gpcheckcat checks.
      
      * Add unittest for batch size
      
      * Truncate the batch size to be, at maximum, the number of primaries.
        The batch size can no longer be larger than the number of primaries
        (see the sketch at the end of this entry).
      
      * Refactor: create a main() method for gpcheckcat.
        In order to improve unit testing, move functionality from
        '__main__' into a method.
      
      Authors: Marbin Tan, Larry Hamel, Nikhil Kak
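
      A hedged sketch of the runtime decision referenced above (the names
      are illustrative):

          # Hypothetical batch-size computation: default to the number of
          # primary segments, and never exceed it.
          def determine_batch_size(num_primaries, requested=None):
              if requested is None:
                  requested = num_primaries  # runtime default, not a fixed 8
              return min(requested, num_primaries)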
  23. 29 April 2016, 1 commit
  24. 26 April 2016, 1 commit
  25. 16 April 2016, 1 commit
    • Use leaked schema dropper in gpcheckcat · 9dfaf11e
      Authored by Stephen Wu
      - gpcheckcat should return a return code of 0 if leaked schemas are
        found and dropped
      - Backfilled tests for leaked schema logging
      - Also cleaned up a typo in the Makefile
  26. 15 April 2016, 1 commit
    • Add behave test for gpcheckcat: catalog problems for conflicting db ownership will generate repair scripts that have timestamped names (to avoid overwriting) · 8793ffe5
      Authored by Larry Hamel
      - Add behave steps for removing, and also validating, directories
        that are relative to the current working directory (allows *
        globbing).
      - Criteria for determining that the repair file was not overwritten:
        run gpcheckcat twice, and expect 1 and then 2 repair scripts (see
        the sketch below).
      - Verify that stdout reports the "owner" error.
      - Change the criteria for gpcheckcat ownership: relax the pg_type
        check so that both 4.3STABLE and master report an ownership error
        for pg_type.
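
      A hedged sketch of timestamped repair-script naming (the function and
      format are illustrative):

          import time

          # Hypothetical naming scheme: a timestamp suffix keeps a second
          # gpcheckcat run from overwriting the first run's repair script.
          def repair_script_name(check_name):
              return "%s_%s.sql" % (check_name,
                                    time.strftime("%Y%m%d%H%M%S"))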
  27. 14 April 2016, 1 commit
    • Modified gpcheckcat to display tables having missing attributes in a more readable format · f569c1d1
      Authored by Christopher Hajas
      
      Previously, gpcheckcat identified tables with missing attributes but
      displayed them in various locations in the output in a
      non-standardized format. This commit summarizes all the tables with missing
      attributes in one section at the end of the output in the format
      "[database].[schema].[table].[segment id]".
      
      Authors: Chris Hajas and Jamie McAtamney
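
      A hedged sketch of the summary-line format described above:

          # Hypothetical formatter for the missing-attribute summary
          # section.
          def missing_attribute_line(database, schema, table, segment_id):
              return "[%s].[%s].[%s].[%s]" % (database, schema, table,
                                              segment_id)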
  28. 13 April 2016, 7 commits
  29. 08 April 2016, 2 commits
    • gpcheckcat refactoring and nomenclature changes · b57da2ea
      Authored by Stephen Wu
      - use consistent spacing
      - use spaces instead of tabs
      - update nomenclature in gpcheckcat - rename 'test' to 'check'
        This is to be more consistent and to avoid confusion with
        unit/integration tests in the future. We purposely did not change
        'test' in stdout and logs, because integration tests for other
        parts of the system that call gpcheckcat currently fail if our
        output changes. A future commit may change 'test' to 'check' in
        stdout, in addition to fixing the aforementioned integration tests.
      
      Authors: Chumki Roy and Stephen Wu
    • drop leaked/orphan schemas before running gpcheckcat catalog checks (#595) · 41dfd82b
      Authored by kaknikhil
      * drop leaked schemas before running gpcheckcat tests
      
        1. drop any leaked/orphaned schemas before running any of the gpcheckcat tests
        2. add unit and behave tests
        3. move gpcheckcat from gpMgmt/bin/lib to gpMgmt/bin
        4. misc refactoring
      
      Orphan/leaked schemas are temp schemas that are not associated with
      any session id. There used to be a check for leaked temp schemas in
      gpcheckcat, which ended up creating a repair script.
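
      A hedged sketch of probing for such orphans (GPDB's pg_stat_activity
      sess_id column is assumed; the query shape is illustrative):

          # Hypothetical probe: pg_temp_<sess_id> schemas whose session id
          # no longer belongs to any live backend.
          LEAKED_SCHEMA_SQL = """
              SELECT nspname
              FROM pg_namespace
              WHERE nspname ~ '^pg_temp_[0-9]+$'
                AND substring(nspname FROM 'pg_temp_([0-9]+)')::int NOT IN
                    (SELECT sess_id FROM pg_stat_activity)
          """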
      
      * drop the database at the end of the behave test
      
      * move the gpcheckcat bin to lib symlink before the copy to /Users/nikhilkak/git/gpdb/gpAux/greenplum-db-devel
      
      * fix the if check before symlinking gpcheckcat from bin to lib
      
      closes #595  