1. 17 April 2018, 1 commit
    • Fix 'distribution_policy' issue on replicated table by gpcheckcat · 5001751a
      Authored by Pengzhou Tang
      The 'distribution_policy' check performs constraint checks on
      randomly-distributed tables. However, attrnums = null in
      gp_distribution_policy is no longer enough to identify a
      randomly-distributed table after the new replicated distribution
      policy was introduced, so an additional filter is added to make the
      check correct.
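
      A hedged sketch of the tightened filter, assuming the replicated-table
      marker added by commit 7efe3204 is the policytype column used in later
      GPDB releases:

          # Hypothetical query shape: attrnums IS NULL alone no longer means
          # "randomly distributed"; replicated tables must be excluded too.
          RANDOM_DIST_SQL = """
              SELECT localoid
              FROM gp_distribution_policy
              WHERE attrnums IS NULL
                AND policytype <> 'r'  -- assumed column/value for replicated
          """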
  2. 03 April 2018, 1 commit
    • Get rid of pg_exttable.fmterrtbl · 8f6fe2d6
      Authored by Adam Lee
      The pg_exttable.fmterrtbl column stored the OID of the error table,
      but without an error table it was just set to the OID of the external
      table itself. That is unnecessary; other columns already indicate
      whether error logging is enabled, so this column can be removed.
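
      A hedged sketch of inferring error-logging state without fmterrtbl,
      assuming the reject limit is among the "other columns" mentioned
      (single-row error handling is configured through a reject limit):

          # Hypothetical probe: external tables with single-row error
          # handling configured, detected via rejectlimit instead of
          # fmterrtbl.
          ERRLOG_EXTTABLES_SQL = """
              SELECT reloid::regclass
              FROM pg_exttable
              WHERE rejectlimit IS NOT NULL
          """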
  3. 29 March 2018, 1 commit
    • Support replicated table in GPDB · 7efe3204
      Authored by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash
      or randomly. There is a requirement for a new table type, called a
      replicated table, where every segment holds a full copy of the
      table's data.
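
      A minimal example of the new table type, using the DISTRIBUTED
      REPLICATED clause (shown as an embedded SQL string; the table
      definition is illustrative):

          # A replicated table keeps a full copy of its data on every
          # segment.
          CREATE_REPLICATED_SQL = """
              CREATE TABLE settings (key text, value text)
              DISTRIBUTED REPLICATED
          """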
      
      To implement it, we added a new distribution policy named
      POLICYTYPE_REPLICATED to mark a replicated table, and a new locus type
      named CdbLocusType_SegmentGeneral to describe the distribution of a
      replicated table's tuples. CdbLocusType_SegmentGeneral implies that
      the data is generally available on all segments but not on the qDisp,
      so a plan node with this locus type can be flexibly planned to execute
      on either a single QE or all QEs. It is similar to
      CdbLocusType_General; the only difference is that a
      CdbLocusType_SegmentGeneral node cannot be executed on the qDisp. To
      guarantee this, we try our best to add a gather motion on top of a
      CdbLocusType_SegmentGeneral node when planning motions for a join,
      even when the other rel has a bottleneck locus type. Such a motion may
      be redundant if the single QE is ultimately not promoted to execute on
      the qDisp, so we detect that case and omit the redundant motion at the
      end of apply_motion(). We do not reuse CdbLocusType_Replicated, since
      it always implies a broadcast motion below it, and it is not easy to
      plan such a node as a direct dispatch to avoid fetching duplicate
      data.
      
      We don't support replicated tables with an INHERITS or PARTITION BY
      clause yet; the main problem is that UPDATE/DELETE on multiple result
      relations doesn't work correctly yet. We can fix this later.
      
      * Allow spi_* to access replicated table on QE
      
      Previously, GPDB didn't allow a QE to access non-catalog tables,
      because the data on a single QE is incomplete. We can remove this
      limitation when only replicated tables are accessed.

      One problem is that a QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog,
      so we now pass the policy info to QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated table
      
      Previously, we used the magic number -128 in the
      gp_distribution_policy table to identify a replicated table, which was
      quite a hack, so we add a new column to gp_distribution_policy to
      identify replicated and partitioned tables.

      This commit also abandons the old convention of using a 1-length NULL
      list and a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and
      DISTRIBUTED FULLY clauses.

      Besides, this commit refactors the code to make the decision-making
      around distribution policy clearer.
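
      A hedged sketch of classifying a table under the new schema, assuming
      the new column is the single-character policytype ('r' for replicated)
      that later GPDB releases use:

          # Hypothetical classification; the column name and values are
          # assumptions, not confirmed by this commit message.
          def classify_policy(policytype, attrnums):
              if policytype == 'r':
                  return 'replicated'
              if attrnums is None:
                  return 'randomly distributed'
              return 'hash distributed on columns %s' % (attrnums,)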
      
      * Support COPY for replicated table
      
      * Disable the rowid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to handle
        queries like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t2), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
               ->  Hash Join
                     Hash Cond: t2.c2 = t1.c2
                     ->  Seq Scan on t2
                     ->  Hash
                           ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because
        ctid + gp_segment_id can't identify a logical tuple there: in a
        replicated table, the same logical row may have a different ctid
        and gp_segment_id on each segment. So we disable such plans for
        replicated tables temporarily. This is not ideal, because the rowid
        unique path may be cheaper than a normal hash semi-join, so we left
        a FIXME for later optimization.
      
      * ORCA-related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated
        tables.
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog, so do
        the same check as for other catalogs.
      
      * Support gpexpand on replicated tables, and altering the distribution
        policy of a replicated table
  4. 13 January 2018, 2 commits
    • Remove filespaces. · 5a3a39bc
      Authored by Heikki Linnakangas
      Remove the concept of filespaces, revert tablespaces to work the same as
      in upstream.
      
      There are some leftovers in the management tools. I don't know how to test all
      that, and I was afraid of touching things I can't run. Also, we may need
      to create replacements for some of those things on top of tablespaces, to
      make the management of tablespaces easier, and it might be easier to modify
      the existing tools than write them from scratch. (Yeah, you could always
      look at the git history, but still.)
      
      Per the discussion on gpdb-dev mailing list, the plan is to cherry-pick
      commit 16d8e594 from PostgreSQL 9.2, to make it possible to have a
      different path for a tablespace in the primary and its mirror. But that's
      not included in this commit yet.
      
      TODO: Make temp_tablespaces work.
      TODO: Make pg_dump do something sensible, when dumping from a GPDB 5 cluster
      that uses filespaces. Same with pg_upgrade.
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/sON4lraPEqg/v3lkM587BAAJ
    • Remove checks related to persistent tables from gpcheckcat. · 126a7935
      Authored by Heikki Linnakangas
      Because persistent tables are no more.
      
      NOTE: It would still be nice to check for consistency between pg_class
      and files on disk, to check that there are no extra data files, and no
      data files missing that have a pg_class entry. Same with AO seg files,
      I suppose. But that's a significantly different query than what we have
      here.
  5. 12 November 2017, 1 commit
  6. 30 October 2017, 2 commits
    • Use string representation of segment ids in CatMissingIssue object. · 792a9b43
      Authored by Heikki Linnakangas
      In commit 226e8867, I changed the CatMissingIssue object to hold the
      content IDs of segments where an entry is missing in a Python list, instead
      of the string representation of a PostgreSQL array (e.g. "{1,2,-1}") that
      was used before. That was a nice simplification, but it turns out that
      there was more code that accessed the CatMissingIssue.segids field that I
      missed. It would make sense to change the rest of the code, IMHO, but to
      make the CI pipeline happy quickly, this commit just changes the code back
      to using a string representation of a PostgreSQL array again.
      
      This hopefully fixes the MM_gpcheckcat behave test failures.
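
      A hedged sketch of the conversion back to the array-literal form (the
      helper name is illustrative):

          def segids_to_pg_array(segids):
              # [1, 2, -1] -> "{1,2,-1}", the PostgreSQL array literal form
              # that the remaining callers still expect.
              return "{" + ",".join(str(s) for s in segids) + "}"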
    • Fix gpcheckcat error reporting of missing entries. · 8199a402
      Authored by Heikki Linnakangas
      In commit 226e8867, I changed the shape of the result set passed to the
      processMissingDuplicateEntryResult() function, removing the "exists" column.
      But I failed to update the line that extracts the primary key columns from
      the result set for that change. Fix.
      
      This should fix the failures in the gpcheckcat behave tests.
  7. 28 October 2017, 1 commit
    • Don't use a temp table in gpcheckcat, when checking for missing entries. · 226e8867
      Authored by Heikki Linnakangas
      The new query is simpler. There was a comment about using the temp
      table to avoid gathering all the data to the master, but I don't
      think that is a good tradeoff. Creating a temp table is pretty
      expensive, and even with the temp table, the master needs to
      broadcast all of the master's entries to the segments. For
      comparison, with the Gather node, all the segments need to send their
      entries to the master. Isn't that roughly the same amount of traffic?
      
      A long time ago, the query was made to use the temp table, after a report
      from a huge cluster with over 1000 segments, where the total size of
      pg_attribute, across all the nodes, was over 200 GB. So the catalogs can
      be large. But even then, I don't think this query can get much better than
      this.
      
      The new query moves some of the logic from SQL to the Python code. Seems
      simpler that way.
      
      The real reason to do this right now is that in the next commit, I'm
      going to change the way snapshots are dispatched with a query, and
      that change will alter the visibility of a temp table created in the
      same command. In a nutshell, currently, if you do "CREATE TABLE mytemp
      AS SELECT oid FROM pg_class WHERE relname='mytemp'", the oid of the
      table being created is included. On PostgreSQL, and after the snapshot
      changes I'm working on, it will not be, and that would confuse this
      gpcheckcat query.
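
      A hedged sketch of the self-visibility quirk described above (the
      query is quoted from the commit message; the constant is
      illustrative):

          # Before the snapshot change in GPDB, the CTAS sees its own
          # pg_class row, so the new table holds one row; on PostgreSQL
          # (and after the change) it holds zero rows.
          SELF_VISIBILITY_SQL = """
              CREATE TABLE mytemp AS
              SELECT oid FROM pg_class WHERE relname = 'mytemp'
          """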
  8. 18 October 2017, 1 commit
  9. 24 June 2017, 3 commits
    • Update gpcheckcat dependency check · 6375e9bc
      Authored by Jimmy Yih
      The gpcheckcat dependency check previously only looked for extra
      pg_depend entries, where a pg_depend entry's objid or refobjid does
      not exist as an OID in any catalog table with hasoids set. We also
      need to check the reverse scenario, where a catalog entry is missing
      its entry in pg_depend. That scenario is difficult to flag, because
      catalog entries can have multiple unique pg_depend references, or
      dependencies may be added later by a separate query (e.g. granting
      ownership of a database to a certain user). Therefore, we add a very
      basic check only against catalog tables whose creating query
      immediately creates dependencies (see the sketch below).
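
      A hedged sketch of the extra-entry side for a single catalog (the
      real check iterates over every catalog table with OIDs):

          # Hypothetical probe for "extra" pg_depend rows pointing into
          # pg_class: dependency rows whose objid has no matching pg_class
          # row.
          EXTRA_DEPEND_SQL = """
              SELECT d.classid, d.objid, d.refobjid
              FROM pg_depend d
              WHERE d.classid = 'pg_class'::regclass
                AND NOT EXISTS
                    (SELECT 1 FROM pg_class c WHERE c.oid = d.objid)
          """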
    • Change gpcheckcat missing/extra check to include pg_depend. · 4bdc7e45
      Authored by Jimmy Yih
      We did not check for missing or extra pg_depend entries across the
      cluster during gpcheckcat, so we would be unaware of scenarios where
      a pg_depend entry went missing and the object that used that
      dependency was then dropped. Those scenarios can leave behind
      leftover catalog entries and prevent some simple CREATE statements.
    • Fix gpcheckcat output · 843b2109
      Authored by Jimmy Yih
      As gpcheckcat builds its mapping of catalog issues, it can flag
      objects whose parents no longer exist (e.g. a toast table left over
      after dropping a table). When these were caught, gpcheckcat would
      unfortunately error out during the reporting step. To prevent that,
      we now just check for None in the RelationObject's vars during
      reporting.
      
      Another fixed issue is the repeated reporting of issues against the
      database currently being tested, after a different database had
      already been tested. The catalog issues reported were invalid for the
      current database; they were actually issues from the previously
      checked database. This was caused by improperly resetting the
      GPObjects and GPObjectGraph global dictionaries. To fix it, we now
      use the clear() function so the global variables are reused in place
      (see the sketch below).
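
      A hedged sketch of the two fixes (the names come from the message;
      the code shape is illustrative):

          GPObjects = {}
          GPObjectGraph = {}

          def report_relation(rel):
              for name, value in vars(rel).items():
                  if value is None:      # parent object no longer exists
                      continue           # skip instead of erroring out
                  print("%s: %s" % (name, value))

          def reset_globals():
              # clear() empties the dicts in place, so every module holding
              # a reference sees the reset; rebinding with "= {}" would
              # leave stale state visible elsewhere.
              GPObjects.clear()
              GPObjectGraph.clear()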
  10. 31 March 2017, 1 commit
  11. 17 March 2017, 1 commit
    • Fix gpcheckcat for pg_amop, pg_amproc and pg_index. · 0f0faf9e
      Authored by Ashwin Agrawal
      pg_amop, pg_amproc oids are not synchronized from master and segments, hence
      adding them to known differences. There are more tables which needs to be
      cleaned-up here based on function RelationNeedsSynchronizedOIDs() in catalog.c
      based leaving that for separate commit.
      
      Also, since there exist simpler way to skip performing check for column use the
      same for `indcheckxmin` column of pg_index instead of what was added as part of
      commit 79caf1c0
      
      Also, cleanup some checks existing for versions prior to 4.1.
  12. 09 March 2017, 1 commit
    • Skip pg_index.indcheckxmin column in gpcheckcat. · 79caf1c0
      Authored by Shoaib Lari
      The gpcheckcat 'inconsistent' check should skip the
      pg_index.indcheckxmin column for now: due to the HOT feature, the
      value of this column can differ between the master and the segments.

      A more long-term fix will resolve the underlying issue that makes
      the indcheckxmin column value differ between the master and the
      segments.
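
      A hedged sketch of a per-column exclusion of this kind (the structure
      is illustrative, not gpcheckcat's actual table):

          # Hypothetical skip-list: columns the 'inconsistent' check should
          # not compare between the master and the segments.
          SKIP_COLUMNS = {
              'pg_index': ['indcheckxmin'],  # may differ due to HOT
          }

          def columns_to_compare(catalog, all_columns):
              skipped = set(SKIP_COLUMNS.get(catalog, []))
              return [c for c in all_columns if c not in skipped]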
  13. 10 February 2017, 1 commit
  14. 25 January 2017, 1 commit
  15. 30 November 2016, 1 commit
  16. 17 November 2016, 1 commit
    • Fix gpcheckcat extraneous entry repair to use primary keys if oids not consistent across segments. (#1288) · dab6fe18
      Authored by Chumki Roy
      
      gpcheckcat missing/extraneous repair assumed that oids are consistent
      across segments for a given catalog entry. This is not always
      guaranteed for all the catalog tables defined in gpcheckcat. Instead,
      we need to generate the repair based on the primary keys (the unique
      identifier in this case), as sketched below.
      
      Authors: Marbin, Larry, Chris, Chumki and Karen
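
      A hedged sketch of a primary-key-based repair statement (the helper
      and key names are illustrative):

          # Hypothetical repair builder keyed on primary-key columns
          # instead of oid, e.g.
          # build_repair_delete('pg_attribute',
          #                     {'attrelid': 16384, 'attnum': 2})
          def build_repair_delete(catalog, pkey_values):
              cond = " AND ".join("%s = %s" % (col, val)
                                  for col, val in sorted(pkey_values.items()))
              return "DELETE FROM %s WHERE %s;" % (catalog, cond)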
  17. 21 October 2016, 1 commit
    • add gpcheckcat check for "mirroring_matching" (#1195) · bf0ab799
      Authored by Larry Hamel
      * Add gpcheckcat check for "mirroring_matching"
      
      This check compares the configuration setting for mirroring against each
      segment's report of its current mirror state (whether mirrors are enabled
      or not) and reports any mismatch.
      
      Authors: Larry Hamel, Karen Huddleston, Chumki Roy
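
      A hedged sketch of the comparison's core (the names are illustrative;
      the real check queries the cluster's configuration and each segment):

          # Hypothetical mismatch detector: configured mirroring vs. what
          # each segment reports. segment_reports maps content id -> bool.
          def mirroring_mismatches(configured_mirroring, segment_reports):
              return sorted(cid for cid, enabled in segment_reports.items()
                            if enabled != configured_mirroring)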
  18. 03 September 2016, 1 commit
    • Enhance foreign key check detection for gpcheckcat (#1061) · 92cc0187
      Authored by Marbin Tan
      * Enhance foreign key check detection for gpcheckcat
      
      We will now do a bidirectional foreign-key check against pg_class for
      the following catalogs:
          pg_attribute, pg_index, and pg_appendonly
      
      This is accomplished by doing a full join, based on the foreign keys
      and primary keys, between pg_class and each catalog mentioned above.
      Once a mismatch is detected, the missing catalog entries are reported
      on stderr (see the sketch below).
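
      A hedged sketch of one catalog pair's bidirectional probe; a NULL key
      on either side of the full join flags a row missing its counterpart
      (the query shape is illustrative):

          # Hypothetical full-join probe between pg_class and pg_attribute.
          FK_FULL_JOIN_SQL = """
              SELECT c.oid AS class_oid, a.attrelid AS attribute_relid
              FROM pg_class c
              FULL OUTER JOIN pg_attribute a ON c.oid = a.attrelid
              WHERE c.oid IS NULL OR a.attrelid IS NULL
          """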
      
      * Add unit test for foreign_key check in gpcheckcat
      
      * Update the behave test to account for gpcheckcat creating a repair
        for foreign_key.

      * Add new module for foreign key check in gpcheckcat
      * Get the unit tests into a working state
      * Refactor the unit tests to be "smaller"
      
      Update behave and unit test for gpcheckcat foreign_key_check:
      
      * We are keeping gp_distribution_policy checked as part of the left
        join, instead of skipping it altogether just because it's a
        master-only catalog table.
      * Add behave test that deletes from segments instead of master for
        foreign key check
      * Fix unit tests to be able to mock out the full and left join
        queries for foreign key check
      
      Add unit test for gpcheck -C option
      
      * Remove pg_constraint check
      
      Because of pg_class limitations, some flags (relhaspkey) are
      maintained lazily (updates to some pg_class columns are not
      enforced), there is no good way to create a one-to-one mapping
      between pg_class and pg_constraint. So it is removed/commented out
      for now.
      
      Authors: Nikhil Kak, Chris Hajas, Chumki Roy, and Marbin Tan
  19. 21 July 2016, 1 commit
  20. 07 July 2016, 1 commit
    • Refactor gpcheckcat to modularize the repair code and add repair for the missing_extraneous module. (#873) · 20ccb77a
      Authored by kaknikhil
      
      * Refactor gpcheckcat to modularize the repair code and add repair
      for missing_extraneous:
      1. Moved all the generic repair code to its own module.
      2. New module for missing_extraneous repair.
      3. Make gpcheckcat do repairs per check, instead of doing them after
      all the checks have finished.
      4. Fix gpcheckcat so that it now creates a repair dir with the -R
      option.
      5. Unit and behave tests for the above mentioned modules.
      6. Update gpcheckcat behave tests to use consistent db names.
      7. Behave tests for a few other checks, like foreign_key,
      constraint_db, part_integrity, and also for the -g option.
  21. 06 July 2016, 1 commit
  22. 30 April 2016, 1 commit
    • Allow gpcheckcat to determine its default batch size at runtime · 15b605c3
      Authored by Marbin Tan
      A batch size of 8 is too low, and each cluster may have a different
      system configuration, so we would like to determine a default batch
      size at runtime, before running the gpcheckcat checks.
      
      * Add unittest for batch size
      
      * Truncate the batch size to be, at maximum, the number of primaries.
        The batch size can no longer be larger than the number of primaries
        (see the sketch at the end of this entry).
      
      * Refactor: create a main() method for gpcheckcat.
        In order to improve unit testing, move functionality from
        '__main__' into a method.
      
      Authors: Marbin Tan, Larry Hamel, Nikhil Kak
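
      A hedged sketch of the runtime decision referenced above (the names
      are illustrative):

          # Hypothetical batch-size computation: default to the number of
          # primary segments, and never exceed it.
          def determine_batch_size(num_primaries, requested=None):
              if requested is None:
                  requested = num_primaries  # runtime default, not a fixed 8
              return min(requested, num_primaries)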
  23. 29 April 2016, 1 commit
  24. 26 April 2016, 1 commit
  25. 16 April 2016, 1 commit
    • Use leaked schema dropper in gpcheckcat · 9dfaf11e
      Authored by Stephen Wu
      - gpcheckcat should return a return code of 0 if leaked schemas are
        found and dropped
      - Backfilled tests for leaked schema logging
      - Also cleaned up a typo in the Makefile
  26. 15 April 2016, 1 commit
    • Add behave test for gpcheckcat: catalog problems for conflicting db ownership will generate repair scripts that have timestamped names (to avoid overwriting) · 8793ffe5
      Authored by Larry Hamel
      - Add behave steps for removing, and also validating, directories
        that are relative to the current working directory (allows *
        globbing).
      - Criteria for determining that the repair file was not overwritten:
        run gpcheckcat twice, and expect 1 and then 2 repair scripts (see
        the sketch below).
      - Verify that stdout reports the "owner" error.
      - Change the criteria for gpcheckcat ownership: relax the pg_type
        check so that both 4.3STABLE and master report an ownership error
        for pg_type.
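
      A hedged sketch of timestamped repair-script naming (the function and
      format are illustrative):

          import time

          # Hypothetical naming scheme: a timestamp suffix keeps a second
          # gpcheckcat run from overwriting the first run's repair script.
          def repair_script_name(check_name):
              return "%s_%s.sql" % (check_name,
                                    time.strftime("%Y%m%d%H%M%S"))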
  27. 14 April 2016, 1 commit
    • Modified gpcheckcat to display tables having missing attributes in a more readable format · f569c1d1
      Authored by Christopher Hajas
      
      Previously, gpcheckcat identified tables with missing attributes but
      displayed them in various locations in the output in a
      non-standardized format. This commit summarizes all the tables with missing
      attributes in one section at the end of the output in the format
      "[database].[schema].[table].[segment id]".
      
      Authors: Chris Hajas and Jamie McAtamney
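
      A hedged sketch of the summary-line format described above:

          # Hypothetical formatter for the missing-attribute summary
          # section.
          def missing_attribute_line(database, schema, table, segment_id):
              return "[%s].[%s].[%s].[%s]" % (database, schema, table,
                                              segment_id)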
  28. 13 April 2016, 7 commits
  29. 08 April 2016, 2 commits
    • gpcheckcat refactoring and nomenclature changes · b57da2ea
      Authored by Stephen Wu
      - use consistent spacing
      - use spaces instead of tabs
      - update nomenclature in gpcheckcat - rename 'test' to 'check'
        This is to be more consistent and to avoid confusion with
        unit/integration tests in the future. We purposely did not change
        'test' in stdout and logs, because integration tests for other
        parts of the system that call gpcheckcat currently fail if our
        output changes. A future commit may change 'test' to 'check' in
        stdout, in addition to fixing the aforementioned integration tests.
      
      Authors: Chumki Roy and Stephen Wu
    • drop leaked/orphan schemas before running gpcheckcat catalog checks (#595) · 41dfd82b
      Authored by kaknikhil
      * drop leaked schemas before running gpcheckcat tests
      
        1. drop any leaked/orphaned schemas before running any of the gpcheckcat tests
        2. add unit and behave tests
        3. move gpcheckcat from gpMgmt/bin/lib to gpMgmt/bin
        4. misc refactoring
      
      Orphan/leaked schemas are temp schemas that are not associated with
      any session id. There used to be a check for leaked temp schemas in
      gpcheckcat, which ended up creating a repair script.
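
      A hedged sketch of probing for such orphans (GPDB's pg_stat_activity
      sess_id column is assumed; the query shape is illustrative):

          # Hypothetical probe: pg_temp_<sess_id> schemas whose session id
          # no longer belongs to any live backend.
          LEAKED_SCHEMA_SQL = """
              SELECT nspname
              FROM pg_namespace
              WHERE nspname ~ '^pg_temp_[0-9]+$'
                AND substring(nspname FROM 'pg_temp_([0-9]+)')::int NOT IN
                    (SELECT sess_id FROM pg_stat_activity)
          """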
      
      * drop the database at the end of the behave test
      
      * move the gpcheckcat bin to lib symlink before the copy to /Users/nikhilkak/git/gpdb/gpAux/greenplum-db-devel
      
      * fix the if check before symlinking gpcheckcat from bin to lib
      
      closes #595  