1. 14 Nov 2020, 1 commit
    • gpstart logs commands' stderr · 717f0c47
      Committed by Adam Lee
      I have seen too many "[CRITICAL]:-gpstart failed. (Reason='')
      exiting..." errors with nothing in the log. The cause could be the
      SSH PATH, missing Python modules, or other issues.

      Log the stderr to save debugging effort.
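      For illustration, a minimal sketch of the idea using only the standard library (the helper name is hypothetical, not gpstart's actual code):
      ```python
      # Sketch: run a command and, if it fails, log whatever it printed to
      # stderr instead of surfacing an empty Reason=''.
      import logging
      import subprocess

      logging.basicConfig(level=logging.INFO)
      logger = logging.getLogger("gpstart")

      def run_and_log(argv):
          result = subprocess.run(argv, capture_output=True, text=True)
          if result.returncode != 0:
              for line in result.stderr.splitlines():
                  logger.error("stderr: %s", line)
          return result.returncode
      ```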
  2. 07 Nov 2020, 1 commit
  3. 23 Oct 2020, 1 commit
    • gprecoverseg: log the error if pg_rewind fails · 57756cc0
      Committed by Adam Lee
      Previously, the error message was not logged when pg_rewind failed; fix
      that to make life easier for DBAs, field engineers, and developers.
      
      Before this:
      ```
      20201022:15:19:10:011118 gprecoverseg:earth:adam-[INFO]:-Running pg_rewind on required mirrors
      20201022:15:19:12:011118 gprecoverseg:earth:adam-[WARNING]:-Incremental recovery failed for dbid 2. You must use gprecoverseg -F to recover the segment.
      20201022:15:19:12:011118 gprecoverseg:earth:adam-[INFO]:-Starting mirrors
      20201022:15:19:12:011118 gprecoverseg:earth:adam-[INFO]:-era is 0406b847bf226356_201022151031
      ```
      
      After this:
      ```
      20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-Running pg_rewind on required mirrors
      20201022:15:33:31:019577 gprecoverseg:earth:adam-[WARNING]:-pg_rewind: fatal: could not find common ancestor of the source and target cluster's timelines
      20201022:15:33:31:019577 gprecoverseg:earth:adam-[WARNING]:-Incremental recovery failed for dbid 2. You must use gprecoverseg -F to recover the segment.
      20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-Starting mirrors
      20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-era is 0406b847bf226356_201022151031
      ```
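      A rough sketch of the logging pattern, assuming pg_rewind is shelled out and its stderr captured (the helper and connection string below are assumptions, not the gprecoverseg source):
      ```python
      import logging
      import subprocess

      logger = logging.getLogger("gprecoverseg")

      def run_pg_rewind(source_host, source_port, target_datadir):
          cmd = ["pg_rewind",
                 "--target-pgdata=%s" % target_datadir,
                 "--source-server=host=%s port=%d dbname=postgres"
                 % (source_host, source_port)]
          result = subprocess.run(cmd, capture_output=True, text=True)
          if result.returncode != 0:
              # Surface pg_rewind's own message (e.g. "could not find common
              # ancestor ...") before the generic incremental-recovery warning.
              for line in result.stderr.splitlines():
                  logger.warning(line)
          return result.returncode == 0
      ```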
  4. 21 Oct 2020, 1 commit
  5. 01 Oct 2020, 1 commit
    • Delete logic to cleanup shared memory on unclean shutdown · 44d90150
      Committed by Ashwin Agrawal
      Postgres already has logic to reuse or clean up shared memory left
      behind by a previous unclean shutdown. In addition, starting with
      b0fc0df9, System V shared memory consumption was dramatically reduced.
      Hence, there is no need for the utilities to carry this shared-memory
      cleanup logic.

      The main reason to make this change now is that the postmaster.pid file
      format changed: the postmaster status is now recorded on the last line.
      CleanSharedMem() was coded with the expectation that the last line is
      always the shared memory key, which no longer holds true. If we had to
      keep this logic, it would need to read line 7 of the file rather than
      the last line. Given the need doesn't exist, just delete the logic
      instead of fixing it.
      
      Based on inputs from Heikki Linnakangas and Asim R P.
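      For context, a small illustration of the postmaster.pid layout issue described above (illustrative only; the utility code was deleted rather than patched):
      ```python
      def read_shmem_key(datadir):
          # With the newer postmaster.pid format the last line carries the
          # postmaster status, while the shared memory key sits on line 7,
          # so "take the last line" no longer returns the key.
          with open("%s/postmaster.pid" % datadir) as f:
              lines = f.read().splitlines()
          return lines[6] if len(lines) >= 7 else None
      ```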
  6. 25 Sep 2020, 6 commits
    • Replace unix.InterfaceAddrs with gp.IfAddrs · c3167d17
      Committed by Tyler Ramer
      This was an outstanding TODO, and it has the added benefit of removing
      yet another extensive and fragile shell command.
      Authored-by: Tyler Ramer <tramer@vmware.com>
    • Update utilities test code for Python 3 · ab965ba5
      Committed by Tyler Ramer
      This commit makes several broad changes to address conversion issues common to
      multiple test files:
      
      - Several built-in functions have been deprecated or renamed, or now need to
        use bytestrings (and associated encoding and decoding) instead of strings
      
      - There is a "test case" run when ComputeCatalogUpdate is executed as a
        standalone program, but this should not be present in shipped code, so we
        remove it
      
      - Some shelled-out commands in test code have been simplified due to changes
        to shell escaping, file redirection, and string manipulation, moving string
        parsing logic from shell commands to internal Python logic wherever possible
      Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
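      A generic example of the bytestring handling this refers to (not taken from the test files):
      ```python
      import subprocess

      # In Python 3, subprocess returns bytes by default, so test code must
      # decode before comparing against ordinary strings (or pass text=True).
      raw = subprocess.check_output(["echo", "hello"])        # b'hello\n'
      assert raw.decode("utf-8").strip() == "hello"

      # Likewise, strings fed to stdin must be encoded first.
      proc = subprocess.run(["cat"], input="data".encode("utf-8"),
                            stdout=subprocess.PIPE)
      assert proc.stdout == b"data"
      ```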
    • Replace pickling with json and shlex in utilities · b248209e
      Committed by Jamie McAtamney
      Pickling was previously used in several utilities when shelling out commands
      and/or executing commands remotely, in order to avoid needing to escape strings
      when passing them back to the master.  The actual string contents were largely
      or wholly ASCII, so pickling was overkill for that purpose.
      
      The semantics of byte strings in Python 3 break the pickling logic, so we've
      taken the opportunity to simplify that whole logic stack.  Code that formerly
      pickled strings now uses shlex.quote() to escape strings where possible and
      serializes strings with json where that is insufficient, removing any helper
      functions that are no longer necessary.
      Authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Authored-by: Tyler Ramer <tramer@vmware.com>
      
      Removed unused or unnecessary helper functions from gppylib.

      The shell escape function was unused, and the Python 3 shlex.quote()
      function should be used anyway.
      
      canStringBeParsedAsInt was a trivial helper function, and it also failed
      to actually complete the cast from the string.
      Authored-by: Tyler Ramer <tramer@vmware.com>
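      A hedged sketch of the replacement approach, shlex.quote() for shell escaping and json where structure is needed (generic example, not the utilities' exact code):
      ```python
      import json
      import shlex

      # Escape a single argument destined for a remote shell command.
      path = "/data/primary/gpseg0; rm -rf /"          # hostile input stays inert
      remote_cmd = "ls -l %s" % shlex.quote(path)

      # Where a quoted flat string is not enough, serialize with json, not pickle.
      payload = json.dumps({"dbid": 2, "datadir": "/data/mirror/gpseg0"})
      assert json.loads(payload)["dbid"] == 2
      ```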
    • Update utilities code to work with Python 3 · 78f5cf43
      Committed by Jamie McAtamney
      This commit makes several broad changes to address conversion issues common to
      multiple utilities:
      
      - The input and output of subprocess in Python 3 are now bytestrings instead
        of strings. Thus, some sanitizing of inputs and outputs is necessary
      
      - Many built-in functions like raw_input and __cmp__ are deprecated in Python 3,
        and as a side effect list sorting and hashing work differently, requiring a
        different set of helper functions
      
      - Implicit relative imports no longer work, so dbconn (in utilities code) and
        mgmt_utils (in test code) must be added to the search path and imported using
        a full path instead
      
      - File objects require flush methods in Python 3, and popen2 has been deprecated
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
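      As one concrete example of the __cmp__-related change, Python 3 sorting relies on rich comparisons or a key function instead (illustrative classes, not the actual utilities code):
      ```python
      from functools import total_ordering

      @total_ordering
      class Segment:
          def __init__(self, content, dbid):
              self.content, self.dbid = content, dbid
          def __eq__(self, other):
              return (self.content, self.dbid) == (other.content, other.dbid)
          def __lt__(self, other):            # replaces the removed __cmp__
              return (self.content, self.dbid) < (other.content, other.dbid)

      segs = [Segment(1, 3), Segment(0, 2), Segment(-1, 1)]
      segs.sort()                                     # uses __lt__
      segs.sort(key=lambda s: (s.content, s.dbid))    # or an explicit key function
      ```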
    • Allow GPDB to build and test with Python 3 · 7306abea
      Committed by Tyler Ramer
      - Update Python file shebangs to use python3 and update gp_replicate_check and
        gpversion.py to allow running under Python 3
      
      - Use Centos 7 dev containers with Python 3 and pip3 installed for testing, as
        prod containers do not yet work with Python 3, and update Travis with Python 3
      
      - Install dependencies with pip3 to get Python 3-compatible versions
      
      - Copy the Python 3 version of .so files, don't unset PYTHONHOME and PYTHONPATH,
        and don't remove built files from install locations, so that the Python 2 and
        Python 3 versions of various files can coexist
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
      Co-authored-by: Kris Macoskey <kmacoskey@vmware.com>
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
    • Run 2to3 against Python code · 54a65573
      Committed by Jamie McAtamney
      The 2to3 utility is an officially-supported script to automatically convert
      Python 2 code to Python 3.  It's not a complete fix by any means, but it
      handles most basic syntax transformations and similar mechanical changes.
      
      This commit is the result of running 2to3 against every Python file in the
      gpMgmt directory, so it's quite large and fairly scattershot.  Manual updates
      to any code that 2to3 can't handle will come in later commits.
  7. 13 Aug 2020, 1 commit
  8. 28 Jul 2020, 1 commit
    • Remove gphostcache · f61d35cd
      Committed by Tyler Ramer
      Gphostcache has numerous issues, and has been a pain point for some
      time. For this reason, we are removing it.
      
      This commit moves the useful function of gphostcache - the hostname
      deduping - to the gparray class, where a list of deduped hostnames is
      returned from gpArray.get_hostlist().
      
      There is a FIXME about correctly adding the hostname to a newly added
      or recovered mirror. Resolving the hostname from the address was
      faulty logic: an IP address never requires a hostname to be associated
      with it. However, the hostname field in gp_segment_configuration should
      be populated somehow; we recommend adding a "hostname" field to any
      configuration files that require it. For now, we simply set the
      "hostname" to "address", which ultimately delivers the same functionality
      as the gphostcache implementation.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
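      The deduping itself is simple; a sketch of the idea (per the commit, the real home is gpArray.get_hostlist(); the dict-based segments below are just an illustration):
      ```python
      def dedupe_hostnames(segments):
          # Deduplicate hostnames while preserving first-seen order.
          seen = {}
          for seg in segments:
              seen.setdefault(seg["hostname"], True)
          return list(seen)

      print(dedupe_hostnames([{"hostname": "sdw1"},
                              {"hostname": "sdw2"},
                              {"hostname": "sdw1"}]))   # ['sdw1', 'sdw2']
      ```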
  9. 21 Jul 2020, 1 commit
    • Use postgres database for pg_rewind cleanly shutdown execution to avoid potential pg_rewind hang. · 288908f3
      Committed by Paul Guo
      During testing, I encountered an incremental gprecoverseg hang issue.
      Incremental gprecoverseg is based on pg_rewind. If the postgres instance
      was not cleanly shut down, pg_rewind launches a single-mode postgres
      process that quits after crash recovery; this is used to ensure that
      postgres is in a consistent state before doing incremental recovery. I
      found that the single-mode postgres hangs with the stack below.
      
      #1  0x00000000008cf2d6 in PGSemaphoreLock (sema=0x7f238274a4b0, interruptOK=1 '\001') at pg_sema.c:422
      #2  0x00000000009614ed in ProcSleep (locallock=0x2c783c0, lockMethodTable=0xddb140 <default_lockmethod>) at proc.c:1347
      #3  0x000000000095a0c1 in WaitOnLock (locallock=0x2c783c0, owner=0x2cbf950) at lock.c:1853
      #4  0x0000000000958e3a in LockAcquireExtended (locktag=0x7ffde826aa60, lockmode=3, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', locallockp=0x0) at lock.c:1155
      #5  0x0000000000957e64 in LockAcquire (locktag=0x7ffde826aa60, lockmode=3, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:700
      #6  0x000000000095728c in LockSharedObject (classid=1262, objid=1, objsubid=0, lockmode=3) at lmgr.c:939
      #7  0x0000000000b0152b in InitPostgres (in_dbname=0x2c769f0 "template1", dboid=0, username=0x2c59340 "gpadmin", out_dbname=0x0) at postinit.c:1019
      #8  0x000000000097b970 in PostgresMain (argc=5, argv=0x2c51990, dbname=0x2c769f0 "template1", username=0x2c59340 "gpadmin") at postgres.c:4820
      #9  0x00000000007dc432 in main (argc=5, argv=0x2c51990) at main.c:241
      
      It tries to take the lock for template1 on pg_database with lockmode 3,
      but that conflicts with a lockmode 5 lock held by a dtx transaction
      recovered during startup in RecoverPreparedTransactions(). Typically the
      dtx transaction comes from "create database" (by default the template
      database is template1).

      Fix this by using the postgres database for single-mode postgres
      execution. The postgres database is already used by many background
      worker backends such as dtx recovery, gdd, and ftsprobe. With this
      change, we do not need to worry about "create database" with template
      postgres, etc., since that won't succeed, thus avoiding the lock
      conflict.
      
      We may be able to fix this in InitPostgres() by bypassing the locking
      code in single mode, but the current fix seems safer. Note that
      InitPostgres() locks/unlocks some other catalog tables as well, but
      almost all of them use lock mode 1 (except pg_resqueuecapability, which
      uses mode 3 per debugging output). It seems unusual in a real scenario
      for a dtx transaction to lock a catalog table with mode 8, which is what
      would conflict with mode 1. If we encounter that later, we will need to
      think out a better (possibly non-trivial) solution. For now let's fix
      the issue we actually encountered.

      Note that the code fixes in buildMirrorSegments.py and twophase.c are not
      related to the main change in this patch. They do not seem to be strict
      bugs, but we'd better fix them to avoid potential issues in the future.
      Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
      Reviewed-by: Asim R P <pasim@vmware.com>
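      A sketch of the fix's idea (not the exact command line gprecoverseg builds): run the single-user postgres against the postgres database rather than template1.
      ```python
      # Sketch: run crash recovery via single-user mode against the "postgres"
      # database so startup does not collide with locks on template1 held by
      # recovered prepared (dtx) transactions.
      import subprocess

      def ensure_clean_shutdown(datadir):
          with open("/dev/null") as devnull:
              subprocess.run(["postgres", "--single", "-D", datadir, "postgres"],
                             stdin=devnull, check=True)
      ```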
  10. 09 Jul 2020, 1 commit
  11. 17 Jun 2020, 3 commits
    • Close short lived connections · bc35b6b2
      Committed by Tyler Ramer
      Due to the refactor of dbconn and newer versions of pygresql, using
      `with dbconn.connect() as conn` no longer attempts to close the
      connection, even if it did previously. Instead, this syntax uses the
      connection itself as a context manager and, as noted in execSQL,
      overrides execSQL's autocommit functionality.

      Therefore, close the connection manually to ensure that execSQL is
      auto-committed and the connection is closed.
      Co-authored-by: Tyler Ramer <tramer@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
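      A minimal sketch of the resulting pattern, assuming gppylib's dbconn module with the connect()/execSQL() functions named in this commit:
      ```python
      # Sketch only: close the connection explicitly instead of relying on
      # "with dbconn.connect() as conn", which no longer closes it.
      from contextlib import closing

      from gppylib.db import dbconn

      def run_one_statement(sql):
          with closing(dbconn.connect(dbconn.DbURL())) as conn:
              dbconn.execSQL(conn, sql)   # execSQL auto-commits; closing() closes the connection
      ```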
    • Refactor dbconn · 330db230
      Committed by Tyler Ramer
      One reason pygresql was previously modified was that it did not handle
      closing a connection very gracefully. In the process of updating
      pygresql, we've wrapped the connection it provides with a
      ClosingConnection function, which should handle gracefully closing the
      connection when the "with dbconn.connect as conn" syntax is used.
      
      This did, however, illustrate issues where a cursor might have been
      created as the result of a dbconn.execSQL() call, which seems to hold
      the connection open if not specifically closed.
      
      It is therefore necessary to remove the ability to get a cursor from
      dbconn.execSQL(). To highlight this difference, and to ensure that
      future use of this library is easy, I've cleaned up and clarified the
      dbconn execution code to include the following features.
      
      - dbconn.execSQL() closes the cursor as part of the function. It returns
        no rows
      - the function dbconn.query() is added, which behaves like dbconn.execSQL()
        except that it returns a cursor
      - the function dbconn.execQueryforSingleton() is renamed
        dbconn.querySingleton()
      - the function dbconn.execQueryforSingletonRow() is renamed
        dbconn.queryRow()
      Authored-by: Tyler Ramer <tramer@pivotal.io>
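      A hedged usage sketch of the renamed API (function names per this commit; the connection details and exact signatures are assumptions):
      ```python
      # Usage sketch of the refactored dbconn API (names from the commit message).
      from contextlib import closing

      from gppylib.db import dbconn

      with closing(dbconn.connect(dbconn.DbURL())) as conn:
          dbconn.execSQL(conn, "SET application_name = 'example'")          # returns no rows
          cursor = dbconn.query(conn, "SELECT datname FROM pg_database")    # returns a cursor
          databases = [row[0] for row in cursor.fetchall()]
          cursor.close()
          seg_count = dbconn.querySingleton(
              conn, "SELECT count(*) FROM gp_segment_configuration")
          first_seg = dbconn.queryRow(
              conn, "SELECT dbid, content FROM gp_segment_configuration LIMIT 1")
      ```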
    • Update PyGreSQL from 4.0.0 to 5.1.2 · f5758021
      Committed by Tyler Ramer
      This commit updates pygresql from 4.0.0 to 5.1.2, which requires
      numerous changes to take advantage of the major result-syntax change
      that pygresql 5 implemented. Of note, cursors and query objects now
      automatically cast returned values to appropriate Python types - a list
      of ints, for example, instead of a string like "{1,2}". This is the bulk
      of the changes.
      
      Updating to pygresql 5.1.2 provides numerous benefits, including the
      following:
      
      - CVE-2018-1058 was addressed in pygresql 5.1.1
      
      - We can save notices in the pgdb module, rather than relying on importing
      the pg module, thanks to the new "set_notices()"
      
      - pygresql 5 supports Python 3
      
      - Thanks to a change in the cursor, using a "with" syntax guarantees a
        "commit" on the close of the with block.
      
      This commit is a starting point for additional changes, including
      refactoring the dbconn module.
      
      Additionally, since isolation2 uses pygresql, some pl/python scripts
      were updated, and isolation2 SQL output is further decoupled from
      pygresql. The output of a psql command should be similar enough to
      isolation2's pg output that minimal or no modification is needed to
      ensure gpdiff can recognize the output.
      Co-authored-by: Tyler Ramer <tramer@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
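      A small, hedged illustration of the typed results mentioned above, using PyGreSQL's DB-API module pgdb (connection parameters are placeholders):
      ```python
      # Placeholder connection settings; adjust for a real cluster.
      import pgdb

      conn = pgdb.connect(database="postgres", host="localhost", user="gpadmin")
      cur = conn.cursor()
      cur.execute("SELECT ARRAY[1, 2, 3]")
      (value,) = cur.fetchone()
      print(value)   # PyGreSQL 5 returns a Python list [1, 2, 3], not the string "{1,2,3}"
      cur.close()
      conn.close()
      ```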
  12. 17 Feb 2020, 1 commit
    • Fix dependencies issue in GPPKG utility · 497305b7
      Committed by Haozhou Wang
      1. When two gppkg packages have the same dependencies, the gppkg utility
         will refuse to install the second gppkg package and throw an error.
         This patch fixes the issue so that the second gppkg package installs
         successfully.
      
      2. Fix an install/uninstall issue when the master and standby master use
         the same node address.
  13. 11 Feb 2020, 1 commit
    • Incremental recovery and rebalance should run pg_rewind in parallel · 43ad9d05
      Committed by Asim R P
      Incremental recovery and rebalance operations involve running
      pg_rewind against failed primaries.  This patch changes gprecoverseg
      such that pg_rewind is invoked in parallel, using the WorkerPool
      interface, for each affected segment in the cluster.  There is no
      reason to rewind segments one after the other.
      
      Fixes Github issue #9466
      
      Reviewed by: Mark Sliva and Paul Guo
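      An illustration of the parallel idea only: gprecoverseg itself uses gppylib's WorkerPool, but the effect can be sketched with the standard library:
      ```python
      # Standard-library sketch of rewinding segments concurrently.
      from concurrent.futures import ThreadPoolExecutor
      import subprocess

      def rewind(segment):
          # segment is assumed to be a dict carrying the fields used below
          cmd = ["pg_rewind",
                 "--target-pgdata=%s" % segment["datadir"],
                 "--source-server=host=%s port=%d dbname=postgres"
                 % (segment["source_host"], segment["source_port"])]
          return segment["dbid"], subprocess.run(cmd, capture_output=True, text=True)

      failed_primaries = []   # one dict per segment needing incremental recovery
      with ThreadPoolExecutor(max_workers=16) as pool:
          for dbid, result in pool.map(rewind, failed_primaries):
              if result.returncode != 0:
                  print("pg_rewind failed for dbid %d: %s" % (dbid, result.stderr.strip()))
      ```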
  14. 31 Dec 2019, 1 commit
  15. 07 Aug 2019, 1 commit
  16. 05 Aug 2019, 1 commit
    • Assorted gpMgmt/bin Makefile fixups · 9f707e1e
      Committed by Daniel Gustafsson
      The tree I was working off clearly had stale files, which led me to
      include two utils which were removed some time ago: gpcheckutil.py
      and gpcheck.py. Remove these two from their respective Makefiles.
      
      Also fix a Bash error in the Stream symlink test: the logical AND
      requires [[ .. ]]; rather than [ .. ];.
      
      Both of these spotted while repeatedly running make install with
      trees in various states.
  17. 31 Jul 2019, 1 commit
    • Convert bin, sbin and doc in gpMgmt to recursive targets · b5aba18b
      Committed by Daniel Gustafsson
      Installing the Management utilities used to be a pretty brute-force
      operation which copied more or less everything over blindly and then
      tried to remove what shouldn't be installed. This is clearly not a
      terribly clean or sustainable solution, as subsequent issues with it
      have proven (editor save files, patch .rej/.orig files, etc. were
      routinely copied and never purged).
      
      This takes a first stab at turning installation of gpMgmt/bin, sbin
      and doc into proper recursive make targets which only install the
      files that were intended to be installed.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/8179
      Reviewed by Bradford Boyle, Kalen Krempely, Jamie McAtamney and
      many more
  18. 05 Jul 2019, 1 commit
    • Re-enable gppkg behave test for both centos and ubuntu (#8066) · af73aca6
      Committed by Hao Wu
      * Revert "CI: skip CLI behave tests that currently fail on ubuntu18.04"
      
      This reverts commit 6a0cb2c6.
      This re-enables the behave tests for centos/ubuntu.
      
      * Output the same error message format
      
      gppkg for rpm and deb outputs error messages when consistency is broken,
      but the messages are not the same, which makes a pipeline fail.
      
      * Update gppkg for the behave tests and fix some minor errors in gppkg
      
      The binary sample.gppkg is now removed; instead we add sample-*.rpm and
      sample-*.deb, because sample.gppkg is platform specific and sensitive to
      the GP major version. The uploaded rpm/deb files are usable unless the
      rpm/deb file is incompatible with the specific platform. GP_MAJORVERSION
      is retrieved dynamically from a makefile installed in the gpdb folder, so
      the GP major version in the gppkg will always be correct. sample.gppkg is
      only generated when the gppkg tag is provided.
  19. 07 Jun 2019, 1 commit
  20. 28 May 2019, 1 commit
    • Optimize explicit transactions · b43629be
      Committed by xiong-gang
      Currently, an explicit 'BEGIN' creates a full-size writer gang and starts a
      transaction on it; the following 'END' commits the transaction in a two-phase
      manner. This can be optimized for some cases:
      case 1:
      BEGIN;
      SELECT * FROM pg_class;
      END;
      
      case 2:
      BEGIN;
      SELECT * FROM foo;
      SELECT * FROM bar;
      END;
      
      case 3:
      BEGIN;
      INSERT INTO foo VALUES(1);
      INSERT INTO bar VALUES(2);
      END;
      
      For case 1, it's unnecessary to create a gang, and there is no need for
      two-phase commit.
      For case 2, two-phase commit is unnecessary because the executors don't
      write any XLOG.
      For case 3, there is no need to create a full-size writer gang or to do
      two-phase commit on a full-size gang.
      Co-authored-by: Jialun Du <jdu@pivotal.io>
  21. 12 Apr 2019, 2 commits
  22. 05 Apr 2019, 1 commit
    • UtilsTestCase: improve test reporting for flake · 0fe07f49
      Committed by Jacob Champion
      test_RemoteOperation_logger_debug() has been flaking out on the CI
      pipeline, and there's no indication of what is going wrong. Replace the
      assertTrue() call, which gives no indication of the difference between
      actual and expected, with mock.assert_has_calls(), which will tell us
      exactly what the calls were in case of failure.
      
      It's possible that this will fix the flake entirely. The previous test
      implementation depended on logger.debug() to be called *first* with our
      expected output, but given the poor isolation of our global logger
      system, it's entirely possible that some other code occasionally calls
      debug(). (That this is an issue at all indicates that this isn't really
      a unit test, but that's not something to tackle here.) assert_has_calls()
      doesn't mind how many other calls happen as long as the one we're
      looking for is eventually made, and I think that matches the intent of
      the test better anyway.
      
      Backport to 6X_STABLE.
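      A self-contained sketch of the assertion change (the function under test here is a stand-in, not the real RemoteOperation code):
      ```python
      from unittest import mock

      def do_remote_operation(logger):
          # Stand-in for the code under test; the real test targets RemoteOperation.
          logger.debug("some unrelated message")
          logger.debug("expected debug output")

      def test_logs_expected_debug_message():
          logger = mock.Mock()
          do_remote_operation(logger)
          # Unlike asserting on the first call only, assert_has_calls() passes as
          # long as the expected call appears anywhere, and on failure it reports
          # the calls that were actually made.
          logger.debug.assert_has_calls([mock.call("expected debug output")])

      test_logs_expected_debug_message()
      ```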
  23. 22 Mar 2019, 3 commits
  24. 21 Mar 2019, 2 commits
  25. 19 Mar 2019, 1 commit
  26. 14 Mar 2019, 2 commits
  27. 09 Mar 2019, 2 commits
    • Revert recent changes to gpinitstandby and gprecoverseg · 659f0ee5
      Committed by Jacob Champion
      One of these changes appears to have introduced a serious
      performance regression in the master pipeline. To avoid destabilizing
      work over the weekend, I'm reverting for now and we can investigate more
      fully next week.
      
      This reverts the following commits:
      "gprecoverseg: Show progress of pg_basebackup on each segment"
          1b38c6e8
      "Add gprecoverseg -s to show progress sequentially"
          9e89b5ad
      "gpinitstandby: guide the user on single-host systems"
          c9c3c351
      "gpinitstandby: rename -F to -S and document it"
          ba3eb5b4
    • Add gprecoverseg -s to show progress sequentially · 9e89b5ad
      Committed by Kalen Krempely
      When -s is present, show pg_basebackup progress sequentially instead
      of in place. Useful when writing to a file, or if a tty does not support
      escape sequences. Defaults to showing the progress in place.