- 17 April 2018 (1 commit)

Committed by Pengzhou Tang
The 'distribution_policy' test performs constraint checks on randomly distributed tables. However, attrnums = null in gp_distribution_policy is no longer sufficient to identify a randomly distributed table now that the new replicated distribution policy has been introduced, so add an additional filter to make things right.
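A hedged sketch of the sharpened filter, assuming the post-commit gp_distribution_policy schema of that era (the policy-type column is the one this line of work introduced; names may differ by version):

```sql
-- A sketch: NULL attrnums alone no longer means "randomly distributed",
-- because replicated tables also have no distribution key columns.
SELECT localoid::regclass
FROM gp_distribution_policy
WHERE attrnums IS NULL        -- no distribution key: random OR replicated
  AND policytype = 'p';       -- 'p' = distributed; excludes 'r' = replicated
```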
- 03 April 2018 (1 commit)

Committed by Adam Lee
The pg_exttable.fmterrtbl column stored the OID of the error table, but without an error table it was simply set to the OID of the external table itself. That is not necessary; there are other columns which indicate whether error logging is enabled. Therefore this column can be removed.
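A sketch of the point being made, assuming the `logerrors` boolean among pg_exttable's remaining columns (column names per the GPDB catalog of that era):

```sql
-- Whether LOG ERRORS was specified is visible without fmterrtbl,
-- so the self-referencing OID column is redundant.
SELECT reloid::regclass AS ext_table, rejectlimit, rejectlimittype
FROM pg_exttable
WHERE logerrors;   -- assumption: true when error logging is enabled
```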
- 29 March 2018 (1 commit)

Committed by Pengzhou Tang
* Support replicated tables in GPDB

Currently, tables in GPDB are distributed across all segments by hash or randomly. There is a requirement for a new table type, called a replicated table, where every segment holds a full, identical copy of the table's data.

To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to describe the distribution of a replicated table's tuples. CdbLocusType_SegmentGeneral implies the data is generally available on all segments but not on the QD, so a plan node with this locus type can be flexibly planned to execute on either a single QE or all QEs. It is similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral node can't be executed on the QD. To guarantee this, we try our best to add a gather motion on top of a CdbLocusType_SegmentGeneral node when planning motion for a join, even if the other rel has a bottleneck locus type. Such a motion may be redundant if the single QE is ultimately not promoted to execute on the QD, so we detect that case and omit the redundant motion at the end of apply_motion().

We don't reuse CdbLocusType_Replicated, since it always implies a broadcast motion below it, and it is not easy to plan such a node as direct dispatch without getting duplicate data.

We don't support replicated tables with an inherit/partition by clause yet; the main problem is that update/delete on multiple result relations can't work correctly. We can fix this later.

* Allow spi_* to access replicated tables on QEs

Previously, GPDB didn't allow a QE to access non-catalog tables because their data there is incomplete. We can lift this limitation when only replicated tables are accessed. One problem is that the QE needs to know whether a table is replicated; QEs previously didn't maintain the gp_distribution_policy catalog, so we now pass the policy info to the QE for replicated tables.

* Change the schema of gp_distribution_policy to identify replicated tables

Previously, we used a magic number, -128, in the gp_distribution_policy table to identify a replicated table, which was quite a hack, so we add a new column to gp_distribution_policy that identifies replicated and partitioned tables. This commit also abandons the old scheme of using a 1-length NULL list and a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED FULLY clauses. Besides that, this commit refactors the code to make the decision-making for distribution policies clearer.

* Support COPY for replicated tables

* Disable the row-ctid unique path for replicated tables

Previously, GPDB used a special Unique path on row id to handle queries like "x IN (subquery)". For example, for

    select * from t1 where t1.c2 in (select c2 from t2);

the plan looks like:

    -> HashAggregate
         Group By: t1.ctid, t1.gp_segment_id
         -> Hash Join
              Hash Cond: t2.c2 = t1.c2
              -> Seq Scan on t2
              -> Hash
                   -> Seq Scan on t1

Obviously, this plan is wrong if t1 is a replicated table, because ctid + gp_segment_id can't identify a tuple there: in a replicated table, a logical row may have a different ctid and gp_segment_id on each segment. So we disable such plans for replicated tables temporarily. This is not ideal, because the row-id unique path may be cheaper than a normal hash semi-join, so we left a FIXME for later optimization.

* ORCA-related fix

Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>. Fall back to the legacy query optimizer for queries over replicated tables.

* Adapt pg_dump/gpcheckcat to replicated tables

gp_distribution_policy is no longer a master-only catalog; do the same checks as for other catalogs.

* Support gpexpand on replicated tables && altering the distribution policy of a replicated table
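For orientation, a minimal sketch of the user-visible feature this commit enables, using the DISTRIBUTED REPLICATED syntax (`fact_sales` is a hypothetical hash-distributed table):

```sql
-- Every segment stores a full copy of the replicated table.
CREATE TABLE dim_country (
    code text,
    name text
) DISTRIBUTED REPLICATED;

-- A join against it needs no motion: the CdbLocusType_SegmentGeneral
-- side is already available wherever the distributed side is scanned.
SELECT f.amount, d.name
FROM fact_sales f                 -- hypothetical, DISTRIBUTED BY (id)
JOIN dim_country d ON d.code = f.country_code;
```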
- 13 January 2018 (2 commits)

Committed by Heikki Linnakangas
Remove the concept of filespaces, reverting tablespaces to work the same as in upstream. There are some leftovers in the management tools: I don't know how to test all of that, and I was afraid of touching things I can't run. Also, we may need to create replacements for some of those things on top of tablespaces, to make the management of tablespaces easier, and it might be easier to modify the existing tools than to write them from scratch. (Yeah, you could always look at the git history, but still.)

Per the discussion on the gpdb-dev mailing list, the plan is to cherry-pick commit 16d8e594 from PostgreSQL 9.2, to make it possible to have a different path for a tablespace on the primary and its mirror. But that's not included in this commit yet.

TODO: Make temp_tablespaces work.
TODO: Make pg_dump do something sensible when dumping from a GPDB 5 cluster that uses filespaces. Same with pg_upgrade.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/sON4lraPEqg/v3lkM587BAAJ
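As context, a minimal sketch of the upstream-style tablespace workflow this restores, with no filespace layer in between (path and names illustrative):

```sql
-- Upstream PostgreSQL semantics: a tablespace is just a directory path.
CREATE TABLESPACE fastspace LOCATION '/data/fast_disk';
CREATE TABLE hot_data (id int, payload text) TABLESPACE fastspace;
```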
Committed by Heikki Linnakangas
Because persistent tables are no more. NOTE: It would still be nice to check consistency between pg_class and the files on disk, to verify that there are no extra data files, and that no data files are missing for relations that have a pg_class entry. Same with AO seg files, I suppose. But that's a significantly different query from what we have here.
- 12 November 2017 (1 commit)

Committed by Heikki Linnakangas
Unfortunately, pg_upgrade still barfs on this. That was fixed upstream in commit a2385cac13, in PostgreSQL 9.1, so we'll revisit this once we merge up to that.
- 30 October 2017 (2 commits)

Committed by Heikki Linnakangas
In commit 226e8867, I changed the CatMissingIssue object to hold, in a Python list, the content IDs of segments where an entry is missing, instead of the string representation of a PostgreSQL array (e.g. "{1,2,-1}") that was used before. That was a nice simplification, but it turns out there was more code accessing the CatMissingIssue.segids field that I missed. It would make sense to change the rest of the code, IMHO, but to make the CI pipeline happy quickly, this commit just changes the code back to using a string representation of a PostgreSQL array again. This hopefully fixes the MM_gpcheckcat behave test failures.
Committed by Heikki Linnakangas
In commit 226e8867, I changed the shape of the result set passed to the processMissingDuplicateEntryResult() function, removing the "exists" column. But I failed to update the line that extracts the primary key columns from the result set for that change. Fix. This should fix the failures in the gpcheckcat behave tests.
- 28 October 2017 (1 commit)

Committed by Heikki Linnakangas
The new query is simpler. There was a comment about using the temp table to avoid gathering all the data to the master, but I don't think that is a good tradeoff. Creating a temp table is pretty expensive, and even with the temp table, the master needs to broadcast all of the master's entries to the segments. For comparison, with the Gather node, all the segments need to send their entries to the master. Isn't that roughly the same amount of traffic?

A long time ago, the query was made to use the temp table after a report from a huge cluster with over 1000 segments, where the total size of pg_attribute across all the nodes was over 200 GB. So the catalogs can be large. But even then, I don't think this query can get much better than this. The new query moves some of the logic from SQL to the Python code; it seems simpler that way.

The real reason to do this right now is that in the next commit, I'm going to change the way snapshots are dispatched with a query, and that change will alter the visibility of a temp table created in the same command. In a nutshell: currently, if you do "CREATE TABLE mytemp AS SELECT oid FROM pg_class WHERE relname='mytemp'", the oid of the table being created is included. On PostgreSQL, and after the snapshot changes I'm working on, it will not be. And that would confuse this gpcheckcat query.
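The visibility quirk from the last paragraph, sketched (the behavior labels in the comments are the ones the message itself claims):

```sql
-- In GPDB before the snapshot change, the table being created was
-- visible to its own CREATE TABLE AS query, so this returned one row;
-- on upstream PostgreSQL (and after the change) it returns zero rows.
CREATE TABLE mytemp AS
SELECT oid FROM pg_class WHERE relname = 'mytemp';
```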
- 18 October 2017 (1 commit)

Committed by Brendan Stephens
Causes catalog tests to fail, as we resort to oid, which does not exist. Also causes an exception in SUMMARY and terminates.
- 24 June 2017 (3 commits)

Committed by Jimmy Yih
The current gpcheckcat dependency check only looked for extra pg_depend entries, where a pg_depend entry's objid or refobjid does not exist as an OID in any catalog table with hasoids set. We also need to check the reverse scenario, where a catalog entry is missing its entry in pg_depend. This scenario is difficult to flag, because catalog entries may have multiple unique pg_depend references, or the dependencies may be created later by a query that adds one (e.g. granting ownership of a database to a certain user). Therefore, we add a very basic check only against catalog tables whose entries immediately create dependencies as part of the query that creates them.
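A sketch of the existing "extra entry" direction for a single referencing catalog (the real check iterates over every catalog with oids):

```sql
-- pg_depend rows claiming a pg_class dependent that no longer exists
SELECT d.classid, d.objid, d.refclassid, d.refobjid, d.deptype
FROM pg_depend d
LEFT JOIN pg_class c ON c.oid = d.objid
WHERE d.classid = 'pg_class'::regclass
  AND c.oid IS NULL;
```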
Committed by Jimmy Yih
We did not check for missing or extra pg_depend entries across the cluster during gpcheckcat, so we would be unaware of scenarios where a pg_depend entry went missing and the object that used that dependency was dropped. Those scenarios can lead to leftover catalog entries and prevent some simple CREATE statements.
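A sketch of what "across the cluster" means here, assuming gp_dist_random() to read the catalog from every segment (the real check also runs the reverse direction, for rows a segment has but the master lacks):

```sql
-- master pg_depend rows that no segment has
SELECT classid, objid, refclassid, refobjid, deptype FROM pg_depend
EXCEPT
SELECT classid, objid, refclassid, refobjid, deptype
FROM gp_dist_random('pg_depend');
```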
Committed by Jimmy Yih
As gpcheckcat builds its mapping of catalog issues, it can flag objects whose parents no longer exist (e.g. a toast table left over after dropping a table). When these get caught, gpcheckcat unfortunately errors out during the reporting step. To prevent erroring out, we simply check for None in the RelationObject's vars during reporting.

Another issue fixed here is the repeated reporting of issues against the database currently being tested after a different database has been tested: the catalog issues reported were invalid for the current database and actually belonged to the previously checked database. This was caused by improper resetting of the GPObjects and GPObjectGraph global dictionaries. To fix the issue, we properly use the clear() function to reuse the global variables.
- 31 March 2017 (1 commit)

Committed by Larry Hamel
Signed-off-by: Chumki Roy <croy@pivotal.io>
- 17 March 2017 (1 commit)

Committed by Ashwin Agrawal
pg_amop and pg_amproc oids are not synchronized between master and segments, hence add them to the known differences. There are more tables which need to be cleaned up here, based on the function RelationNeedsSynchronizedOIDs() in catalog.c, but that is left for a separate commit. Also, since there is a simpler way to skip a check for a single column, use it for the `indcheckxmin` column of pg_index instead of what was added as part of commit 79caf1c0. Also, clean up some checks that existed for versions prior to 4.1.
- 09 March 2017 (1 commit)

Committed by Shoaib Lari
The gpcheckcat 'inconsistent' check should skip the pg_index.indcheckxmin column for now, because due to the HOT feature, the value of this column can differ between the master and the segments. A more long-term fix will resolve the underlying issue that makes the indcheckxmin column value differ between the master and the segments.
- 10 February 2017 (1 commit)

Committed by Chumki Roy
* It was reporting minutes as hours

Signed-off-by: Marbin Tan <mtan@pivotal.io>
- 25 January 2017 (1 commit)

Committed by Ashwin Agrawal
Commit 8fe321af added the tablespace OID to gp_relation_node to correctly reflect the unique relfilenode. As a result, the gpcheckcat query needs to be modified to include the tablespace OID when validating gp_relation_node's correctness against gp_persistent_relation_node.
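A hedged sketch of the adjusted validation, with tablespace OID now part of the join key (column names per the GPDB 4/5-era persistent catalogs; treat them as assumptions):

```sql
-- gp_relation_node rows with no matching persistent entry
SELECT r.*
FROM gp_relation_node r
LEFT JOIN gp_persistent_relation_node p
  ON  p.tablespace_oid   = r.tablespace_oid
  AND p.relfilenode_oid  = r.relfilenode_oid
  AND p.segment_file_num = r.segment_file_num
WHERE p.relfilenode_oid IS NULL;
```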
- 30 November 2016 (1 commit)

Committed by Jimmy Yih
The segidxid column is no longer available. It was removed in https://github.com/greenplum-db/gpdb/commit/6580341ab62242f72738bfa865a690fcda4b5dee.
- 17 November 2016 (1 commit)

Committed by Chumki Roy
Fix gpcheckcat extraneous entry repair to use primary keys when oids are not consistent across segments. (#1288)

The gpcheckcat missing/extraneous repair assumed oids are consistent across segments for a given catalog entry. This is not always guaranteed for all the catalog tables defined in gpcheckcat. Instead, we need to generate the repair based on the primary keys (the unique identifier in this case).

Authors: Marbin, Larry, Chris, Chumki and Karen
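A sketch of the shape of repair this implies: the DELETE is keyed on the catalog's unique columns rather than its oid, so it targets the right row on the affected segment (table, column, and the GUC spelling are illustrative assumptions):

```sql
-- run against the segment holding the extraneous row;
-- the allow_system_table_mods spelling varies across GPDB versions
SET allow_system_table_mods = 'dml';
DELETE FROM pg_attribute
WHERE attrelid = 'public.some_table'::regclass   -- hypothetical table
  AND attname  = 'stale_column';                 -- hypothetical column
```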
- 21 October 2016 (1 commit)

Committed by Larry Hamel
* Add gpcheckcat check for "mirroring_matching"

This check compares the configuration setting for mirroring against each segment's report of its current mirror state (whether mirrors are enabled or not) and reports any mismatch.

Authors: Larry Hamel, Karen Huddleston, Chumki Roy
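A sketch of the configuration half of that comparison (the per-segment state, which the check reads from each segment's own bookkeeping, is not shown):

```sql
-- mirrors configured per content id, from the master's view
SELECT content,
       SUM(CASE WHEN role = 'm' THEN 1 ELSE 0 END) AS configured_mirrors
FROM gp_segment_configuration
GROUP BY content;
```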
- 03 September 2016 (1 commit)

Committed by Marbin Tan
* Enhance foreign key check detection for gpcheckcat

We now do a bidirectional foreign key check against pg_class for the following catalogs: pg_attribute, pg_index, and pg_appendonly. This is accomplished by doing a full join between pg_class and each catalog mentioned above, based on their foreign keys and primary keys. Once a problem is detected, we output the missing catalog at the front of our stderr. A sketch of one such probe follows below.

* Add unit test for the foreign_key check in gpcheckcat
* Update the behave test to account for gpcheckcat creating a repair for foreign_key
* Add a new module for the foreign key check in gpcheckcat
* Get the unit tests into a working state
* Refactor the unit tests to be "smaller"

Update behave and unit tests for the gpcheckcat foreign_key check:

* We keep gp_distribution_policy checked as part of the left join instead of skipping it altogether just because it's a master-only catalog table.
* Add a behave test that deletes from segments instead of the master for the foreign key check
* Fix the unit tests to be able to mock out the full and left join queries for the foreign key check

* Add unit test for the gpcheckcat -C option

* Remove the pg_constraint check

Due to pg_class limitations where some flags (relhaspkey) are maintained lazily (updates to some pg_class columns are not enforced), there is no good way to create a one-to-one mapping between pg_class and pg_constraint, so remove/comment it out for now.

Authors: Nikhil Kak, Chris Hajas, Chumki Roy, and Marbin Tan
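The sketch referenced in the first bullet: a FULL JOIN surfaces orphans on either side of one catalog relationship (a simplified sketch, not the exact gpcheckcat query):

```sql
-- rows present on exactly one side of the pg_class <-> pg_attribute
-- relationship indicate a missing or extraneous catalog entry
SELECT c.oid AS class_oid, a.attrelid AS attribute_relid, a.attname
FROM pg_class c
FULL JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.oid IS NULL OR a.attrelid IS NULL;
```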
- 21 July 2016 (1 commit)

Committed by Christopher Hajas
The missing and extraneous repair in gpcheckcat should not generate a repair file unless the -E flag is provided, as this repair does not cover all cases.

Authors: Chris Hajas and Karen Huddleston
- 07 July 2016 (1 commit)

Committed by kaknikhil
Refactor gpcheckcat to modularize the repair code and add repair for the missing_extraneous module. (#873)

1. Moved all the generic repair code to its own module.
2. New module for missing_extraneous repair.
3. Make gpcheckcat do repairs per check instead of doing it after all the checks have finished.
4. Fix gpcheckcat so that it now creates a repair dir with the -R option.
5. Unit and behave tests for the above-mentioned modules.
6. Update gpcheckcat behave tests to use consistent db names.
7. Behave tests for a few other checks like foreign_key, constraint_db, and part_integrity, and also for the -g option.
- 06 July 2016 (1 commit)

Committed by Daniel Gustafsson
The gpverify code was removed in 8afc1dd1 and one subsequent commit. Clean out the last few trivial mentions of it.
- 30 April 2016 (1 commit)

Committed by Marbin Tan
A batch size of 8 is too low, and each cluster may have a different system configuration, so we would like to determine a default batch size before running gpcheckcat.

* Add unit test for batch size
* Truncate the batch size to be, at maximum, the number of primaries. The batch size can no longer be larger than the number of primaries.
* Refactor: create a main() method for gpcheckcat. In order to improve unit testing, move functionality from '__main__' into a method.

Authors: Marbin Tan, Larry Hamel, Nikhil Kak
- 29 April 2016 (1 commit)

Committed by Chumki Roy
In a previous commit, f569c1d1, gpcheckcat was modified to display a list of tables with missing attributes. This commit adds the ability to list tables with extraneous attributes.

Authors: Chumki Roy and James McAtamney
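A sketch of what "extraneous attributes" means here, assuming gp_dist_random() to read pg_attribute from every segment (a simplified sketch, not the exact gpcheckcat query):

```sql
-- attribute rows a segment has but the master lacks
SELECT s.gp_segment_id, s.attrelid, s.attname
FROM gp_dist_random('pg_attribute') s
LEFT JOIN pg_attribute m
  ON m.attrelid = s.attrelid AND m.attnum = s.attnum
WHERE m.attrelid IS NULL;
```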
- 26 April 2016 (1 commit)

Committed by Nikhil Kak
- 16 April 2016 (1 commit)

Committed by Stephen Wu
- gpcheckcat should return a return code of 0 if schemas are found/dropped
- Backfilled tests for leaked schema logging
- Also cleaned up a typo in the Makefile
- 15 April 2016 (1 commit)

Committed by Larry Hamel
Add behave test for gpcheckcat: catalog problems for conflicting db ownership will generate repair scripts that have timestamped names (to avoid overwriting).

- add behave steps for removing and also validating directories that are relative to the current working directory (allows * globbing)
- criteria for determining that the repair file was not overwritten: run gpcheckcat twice, and expect 1 and then 2 repair scripts
- verify that stdout reports an "owner" error
- change criteria for gpcheckcat ownership: relax the pg_type check so that both 4.3STABLE and master report an ownership error for pg_type
- 14 April 2016 (1 commit)

Committed by Christopher Hajas
Summarize tables with missing attributes in a more readable format.

Previously, gpcheckcat identified tables with missing attributes but displayed them in various locations in the output in a non-standardized format. This commit summarizes all the tables with missing attributes in one section at the end of the output, in the format "[database].[schema].[table].[segment id]".

Authors: Chris Hajas and Jamie McAtamney
- 13 April 2016 (7 commits)
- print out all information about violated indexes in the logs
- change the unique index check to also return the list of violated segments in order to support this
- more accurate function names in GPObject
- remove duplication of query strings
- make the file import for the test subject independent of the current working directory
- update test names to match the casing convention
- additionally, minor refactoring in gpcheckcat and unique index check
- Before this, we were adding issues to the report but not actually setting the checkStatus variable to False
- includes unit tests for gpcheckcat covering everything after option parsing and everything before writing to stdout/logs
- in the process of rebasing, removes the tests for dropping leaked schemas; these tests will be added back in a later commit
- adds a behave integration test for this check
- 08 April 2016 (2 commits)

Committed by Stephen Wu
- use consistent spacing
- use spaces instead of tabs
- update nomenclature in gpcheckcat
- rename 'test' to 'check'

This is to be more consistent and to avoid confusion with unit/integration tests in the future. We purposely did not change 'test' in stdout and logs, because integration tests for other parts of the system that call gpcheckcat currently fail if our output changes. A future commit may change 'test' to 'check' in stdout in addition to fixing the aforementioned integration tests.

Authors: Chumki Roy and Stephen Wu
Committed by kaknikhil
* drop leaked schemas before running gpcheckcat tests

1. drop any leaked/orphaned schemas before running any of the gpcheckcat tests
2. add unit and behave tests
3. move gpcheckcat from gpMgmt/bin/lib to gpMgmt/bin
4. misc refactoring

Orphan/leaked schemas are temp schemas that are not associated with any session id. There used to be a check for leaked temp schemas in gpcheckcat which ended up creating a repair script.

* drop the database at the end of the behave test
* move the gpcheckcat bin to lib symlink before the copy to /Users/nikhilkak/git/gpdb/gpAux/greenplum-db-devel
* fix the if check before symlinking gpcheckcat from bin to lib

closes #595