1. 19 Apr 2018, 1 commit
    • Speed up dispatcher detection of segment state changes · 85101317
      David Kimura committed
      The dispatcher uses DISPATCH_WAIT_TIMEOUT_MSEC (currently 2000) as its
      poll timeout. It used to wait for 30 poll timeouts before checking
      segment status, and then initiated an FTS probe before the check. As a
      result it took about a minute for a query to fail in case of a segment
      failure.
      
      This commit instead checks segment status on every poll timeout. It also
      uses the FTS version to decide whether a check is needed at all: rather
      than performing an FTS probe itself, it relies on FTS running at regular
      intervals and serving cached results (see the sketch below).
      
      With this change, the test time for twophase_tolerance_with_mirror_promotion
      was cut down by about 2 minutes.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
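      A minimal sketch of the new wait loop; everything except
      DISPATCH_WAIT_TIMEOUT_MSEC (helper names, types, the shape of the FTS
      version API) is a hypothetical illustration, not the actual gpdb code:

          #include <poll.h>
          #include <stdbool.h>
          #include <stdint.h>

          #define DISPATCH_WAIT_TIMEOUT_MSEC 2000

          /* Hypothetical helpers standing in for the real dispatcher/FTS code. */
          extern int      handleReadyConnections(struct pollfd *fds, int nfds);
          extern uint64_t getFtsVersion(void);   /* cached version, no probe */
          extern bool     anySegmentDown(void);  /* reads cached FTS results */
          extern void     reportSegmentFailure(void);

          static void
          dispatchWaitLoop(struct pollfd *fds, int nfds)
          {
              uint64_t lastFtsVersion = getFtsVersion();

              for (;;)
              {
                  int n = poll(fds, nfds, DISPATCH_WAIT_TIMEOUT_MSEC);

                  if (n > 0)
                  {
                      if (handleReadyConnections(fds, nfds) == 0)
                          return;             /* all results received */
                      continue;
                  }

                  /*
                   * Timeout. Previously: wait ~30 timeouts, then run a full
                   * FTS probe. Now: on every timeout, compare the cached FTS
                   * version and re-check segments only when FTS has published
                   * new results.
                   */
                  uint64_t ftsVersion = getFtsVersion();
                  if (ftsVersion != lastFtsVersion)
                  {
                      lastFtsVersion = ftsVersion;
                      if (anySegmentDown())
                          reportSegmentFailure();
                  }
              }
          }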
  2. 23 Mar 2018, 1 commit
  3. 07 Dec 2017, 1 commit
    • Resend a cancel/finish signal if a QE didn't respond for a long time · 07ee8008
      Pengzhou Tang committed
      Previously, the dispatcher sent the cancel/finish signal to QEs only
      once, so if the signal arrived before the query, or was swallowed by
      secure_read(), a QE might never get the chance to quit, e.g. when it was
      assigned to execute a MOTION node whose peer had already been canceled
      (see the resend sketch below).
      
      This fixes issue #3950
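      A hedged sketch of periodic resending on top of libpq's public cancel
      API; the QEConn bookkeeping struct and the CANCEL_RESEND_SEC interval
      are assumptions for illustration:

          #include <time.h>
          #include <libpq-fe.h>

          #define CANCEL_RESEND_SEC 10    /* assumed resend interval */

          /* Hypothetical per-QE bookkeeping. */
          typedef struct
          {
              PGconn *conn;
              time_t  cancelSentAt;       /* 0 if no cancel sent yet */
              int     done;
          } QEConn;

          /*
           * Previously the cancel was sent exactly once; if it raced ahead of
           * the query or was swallowed in secure_read(), the QE could block
           * in a Motion forever.  Resend after a timeout until it responds.
           */
          static void
          maybeResendCancel(QEConn *qe)
          {
              time_t now = time(NULL);

              if (qe->done || qe->cancelSentAt == 0 ||
                  now - qe->cancelSentAt < CANCEL_RESEND_SEC)
                  return;

              char errbuf[256];
              PGcancel *cancel = PQgetCancel(qe->conn);

              if (cancel != NULL)
              {
                  (void) PQcancel(cancel, errbuf, sizeof(errbuf));
                  PQfreeCancel(cancel);
              }
              qe->cancelSentAt = now;
          }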
  4. 09 Nov 2017, 1 commit
  5. 08 Nov 2017, 2 commits
  6. 02 Nov 2017, 1 commit
    • Wake up faster, if a segment returns an error. · 3bbedbe9
      Heikki Linnakangas committed
      Previously, if a segment reported an error after the interconnect had
      started up, it could take up to 250 ms for the main thread in the QD
      process to wake up, poll the dispatcher connections, and see the error.
      Shorten that time by waking up immediately if the QD->QE libpq socket
      becomes readable while we're waiting for data to arrive in a Motion node.

      This isn't a complete solution, because we only wake up if one
      arbitrarily chosen connection becomes readable, and we still rely on
      polling for the others. But it greatly speeds up many common scenarios.
      In particular, the "qp_functions_in_select" test now runs in under
      5 seconds on my laptop, where it took about 60 seconds before.
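      A rough illustration of the idea, with hypothetical names: while waiting
      for Motion data, also poll one dispatcher (QD->QE) libpq socket, so an
      incoming error wakes us immediately instead of on the next polling tick:

          #include <poll.h>
          #include <libpq-fe.h>

          #define MOTION_POLL_TIMEOUT_MSEC 250   /* the old fixed wakeup interval */

          /*
           * Wait until either Motion data or QD->QE libpq traffic is
           * readable, or the timeout elapses.  The caller re-checks the
           * dispatcher state, so an error from the segment is seen at once.
           */
          static void
          waitForMotionData(int motionFd, PGconn *qeConn)
          {
              struct pollfd fds[2];

              fds[0].fd = motionFd;              /* interconnect data */
              fds[0].events = POLLIN;
              fds[1].fd = PQsocket(qeConn);      /* one arbitrarily chosen QE */
              fds[1].events = POLLIN;

              (void) poll(fds, 2, MOTION_POLL_TIMEOUT_MSEC);
          }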
  7. 30 Oct 2017, 2 commits
    • Retire gp_libpq_fe part 2, changing include paths · 974c414e
      Adam Lee committed
      Signed-off-by: Adam Lee <ali@pivotal.io>
    • Retire gp_libpq_fe part 1, libpq itself · 510a20b6
      Adam Lee committed
          commit b0328d5631088cca5f80acc8dd85b859f062ebb0
          Author: mcdevc <a@b>
          Date:   Fri Mar 6 16:28:45 2009 -0800
      
              Separate our internal libpq front end from the client libpq library
              upgrade libpq to the latest to pick up bug fixes and support for more
              client authentication types (GSSAPI, KRB5, etc)
              Upgrade all files dependent on libpq to handle new version.
      
      Above is the initial commit of gp_libpq_fe; there seems to be no good
      reason to keep it.

      Key things this PR does:

      1. Remove the gp_libpq_fe directory.
      2. Build the libpq sources into two versions, frontend and backend,
      selected by the FRONTEND macro (sketched below).
      3. libpq for the backend still bypasses local authentication, SSL, and
      some environment variables; these are the only differences.
      Signed-off-by: Adam Lee <ali@pivotal.io>
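      A hedged sketch of the FRONTEND idiom, in the style PostgreSQL's libpq
      sources already use; the function and its error paths are illustrative,
      not lines from this diff:

          /*
           * Compiled twice from the same source file: once with -DFRONTEND
           * for the client library, once without for the backend build.
           */
          #ifdef FRONTEND
          #include "postgres_fe.h"
          #else
          #include "postgres.h"
          #endif

          static void
          report_connection_error(const char *msg)
          {
          #ifdef FRONTEND
              /* Frontend: no elog/ereport machinery, write to stderr. */
              fprintf(stderr, "libpq: %s\n", msg);
          #else
              /* Backend: use the server's error reporting. */
              ereport(ERROR, (errmsg("libpq: %s", msg)));
          #endif
          }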
  8. 10 Oct 2017, 1 commit
  9. 01 Sep 2017, 1 commit
  10. 09 Aug 2017, 1 commit
    • Do not include gp-libpq-fe.h and gp-libpq-int.h in cdbconn.h · cf7cddf7
      Pengzhou Tang committed
      The whole cdb directory is shipped to end users, and every header file
      that cdb*.h includes also needs to be shipped for checkinc.py to pass.
      However, exposing gp_libpq_fe/*.h would confuse customers, because those
      headers are almost identical to libpq/*. Per Heikki's suggestion, we
      keep gp_libpq_fe/* unchanged and instead include gp-libpq-fe.h and
      gp-libpq-int.h directly in the C files that need them.
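      Illustratively (file contents hypothetical), the pattern keeps the
      private headers out of the shipped header and pulls them into the
      implementation instead:

          /* cdbconn.h -- shipped to users; no private libpq headers here,
           * only an opaque forward declaration. */
          struct pg_conn;

          /* cdbconn.c -- private headers included only where implemented. */
          #include "gp-libpq-fe.h"
          #include "gp-libpq-int.h"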
  11. 31 Jul 2017, 1 commit
    • Implement "COPY ... FROM ... ON SEGMENT" · e254287e
      Ming LI committed
      Support COPY statements that import data files on the segments directly,
      in parallel. This can be used to import the data files generated by
      "COPY ... TO ... ON SEGMENT".

      This commit also supports every data file format that "COPY ... TO"
      supports, honors the reject limit, and logs errors accordingly.
      
      Key workflow (QE side sketched below):
         a) For COPY FROM, nothing changes with this commit: dispatch the
         modified COPY command to the segments first, then read the data file
         on the master and dispatch the data to the relevant segment for
         processing.

         b) For COPY FROM ON SEGMENT, the QD reads a dummy data file and the
         other parts stay unchanged; each QE first processes the (empty) data
         stream dispatched from the QD, then re-runs the same workflow to read
         and process its local segment data file.
      Signed-off-by: Ming LI <mli@pivotal.io>
      Signed-off-by: Adam Lee <ali@pivotal.io>
      Signed-off-by: Haozhou Wang <hawang@pivotal.io>
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
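      A schematic of the QE-side flow for ON SEGMENT; the helper names stand
      in for the real COPY machinery and are purely illustrative:

          /* Hypothetical helpers; the real COPY code is far more involved. */
          extern void processDispatchedCopyStream(void);  /* empty stream from QD */
          extern void copyFromLocalFile(const char *path);
          extern const char *segmentDataFilePath(void);   /* this segment's file */

          static void
          qeCopyFromOnSegment(void)
          {
              /* Step 1: drain the (empty) data stream the QD dispatched. */
              processDispatchedCopyStream();

              /* Step 2: re-run the normal COPY FROM workflow, but reading the
               * local segment data file instead of QD-dispatched data. */
              copyFromLocalFile(segmentDataFilePath());
          }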
  12. 24 Jul 2017, 1 commit
    • Use non-blocking recv() in internal_cancel() · 23e5a5ee
      xiong-gang committed
      Hangs on recv() in internal_cancel() have been reported several times:
      the socket shows 'ESTABLISHED' on the master while the peer process on
      the segment has already exited. We are not sure exactly how this
      happens, but we can reproduce the hang by dropping packets or rebooting
      the system on the segment.

      This patch uses poll() to make the recv() in internal_cancel()
      non-blocking. The poll() timeout is set to the maximum value of
      authentication_timeout, to make sure the process on the segment has
      already exited before attempting another retry; we expect the retry on
      connect() to detect the network issue.
      Signed-off-by: Ning Yu <nyu@pivotal.io>
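      A minimal sketch of the pattern (the timeout value and error handling
      are illustrative): wait on poll() with a bounded timeout before calling
      recv(), so a silently dead peer can no longer hang the caller forever:

          #include <poll.h>
          #include <sys/types.h>
          #include <sys/socket.h>

          /*
           * Bounded read: returns recv()'s result, or -1 if no data arrives
           * within timeout_ms.
           */
          static ssize_t
          recv_with_timeout(int sock, char *buf, size_t len, int timeout_ms)
          {
              struct pollfd pfd;

              pfd.fd = sock;
              pfd.events = POLLIN;

              if (poll(&pfd, 1, timeout_ms) <= 0)
                  return -1;          /* timeout or poll error */

              return recv(sock, buf, len, 0);
          }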
  13. 14 Feb 2017, 1 commit
    • Fix dispatch and interconnect defects when the postmaster is not alive · e28c84b2
      Pengzhou Tang committed
      Even when the postmaster of a segment is killed, its QEs are still
      running, and with some defects a query may hang. Improvements in this
      commit:
      1. The interconnect motion receiver and sender check segment status when
      no data has been available for a long time, to avoid query hangs
      (sketched below).
      2. Add segment status checking to the gang sanity test.
      3. Do not reuse gangs whose postmaster is not alive; recreate them.
      4. Check segment status when creating a gang fails.
      5. Close a connection if its peer is down.
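      A hedged sketch of item 1; the threshold and helper names are
      assumptions for illustration:

          #include <stdbool.h>
          #include <time.h>

          #define NO_DATA_CHECK_SEC 30              /* assumed threshold */

          extern bool motionDataAvailable(void);    /* hypothetical */
          extern bool allSegmentsAlive(void);       /* hypothetical status check */
          extern void reportLostSegment(void);

          static void
          motionReceiveLoop(void)
          {
              time_t lastData = time(NULL);

              for (;;)
              {
                  if (motionDataAvailable())
                  {
                      lastData = time(NULL);
                      /* ... consume tuples ... */
                      continue;
                  }

                  /*
                   * No data for a long time: make sure we aren't waiting on a
                   * segment whose postmaster has died.
                   */
                  if (time(NULL) - lastData > NO_DATA_CHECK_SEC)
                  {
                      if (!allSegmentsAlive())
                          reportLostSegment();
                      lastData = time(NULL);   /* don't re-check immediately */
                  }
              }
          }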
  14. 10 Feb 2017, 1 commit
    • Remove unused atomic functions. · 2cd519d3
      Heikki Linnakangas committed
      None of the source files that #included gp_atomic.h actually needed the
      declarations from gp_atomic.h itself; they needed the definitions from
      port/atomics.h, which gp_atomic.h in turn #included.
  15. 14 Nov 2016, 1 commit
    • Use a nonblocking mechanism to send data in the async dispatcher · 2516eac6
      xiong-gang committed
      pqFlush sends data synchronously even though the socket is set
      O_NONBLOCK, which degrades performance. This commit uses
      pqFlushNonBlocking instead, and synchronizes on the completion of
      dispatching to all gangs before query execution.

      Signed-off-by: Kenan Yao <kyao@pivotal.io>
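      pqFlushNonBlocking is gpdb-internal; expressed with libpq's public API,
      the same pattern looks roughly like this, draining every connection's
      send buffer without blocking on any single one:

          #include <stdbool.h>
          #include <libpq-fe.h>

          /*
           * PQflush() returns 0 when the send buffer is empty, 1 if it could
           * not send everything yet (the socket would block), -1 on error.
           */
          static bool
          flushAllConnections(PGconn **conns, int n)
          {
              bool pending = true;

              while (pending)
              {
                  pending = false;
                  for (int i = 0; i < n; i++)
                  {
                      int rc = PQflush(conns[i]);

                      if (rc == -1)
                          return false;     /* connection error */
                      if (rc == 1)
                          pending = true;   /* revisit this connection */
                  }
                  /* A real implementation would poll() for writability here
                   * rather than spinning. */
              }
              return true;
          }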
  16. 13 Sep 2016, 2 commits
    • Remove duplicate checks that already exist within PQsendGpQuery_shared() · d6a5c7a8
      Pengzhou Tang committed
      Before dispatching a command, we assume the connection is either newly
      created or reused. A newly created connection must be idle, and a reused
      connection should already have been cleaned up. Meanwhile,
      PQsendGpQuery_shared() itself performs the busy check and the
      bad-connection check during dispatch, so the pre-checks are unnecessary.
    • Speed up QE cancel when one or more QEs got errors · 39ed6031
      Pengzhou Tang committed
      The QD needs to cancel the QEs when
      1) the QD gets an error, or
      2) one or more QEs got an error and cancelOnError is set to true.

      We want to cancel QEs as soon as possible once either condition is met,
      but since cancelling QEs is expensive, we first want to process as many
      pending-finish QEs as possible. The original interval before cancelling
      was 2 seconds, long enough that users saw an obvious delay before errors
      were reported; this commit lowers the interval to 100 ms to speed up
      cancellation (see the sketch below).
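      Schematically (constant and helper names hypothetical), the change is a
      shorter grace interval before the cancels go out:

          #define CANCEL_DELAY_MSEC_OLD 2000   /* users saw a ~2 s delay */
          #define CANCEL_DELAY_MSEC_NEW  100   /* errors surface almost at once */

          extern int  cancelRequested(void);    /* QD error, or QE error with
                                                 * cancelOnError set */
          extern void drainFinishedQEs(void);   /* reap QEs that finished */
          extern void cancelRemainingQEs(void);
          extern void waitMsec(int msec);

          static void
          finishDispatch(void)
          {
              if (!cancelRequested())
                  return;

              /* Give in-flight QEs a short window to finish on their own
               * before paying the cost of cancelling them. */
              waitMsec(CANCEL_DELAY_MSEC_NEW);  /* was CANCEL_DELAY_MSEC_OLD */
              drainFinishedQEs();
              cancelRemainingQEs();
          }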
  17. 29 Aug 2016, 1 commit
    • Fix a few dispatch-related bugs · eb40e073
      Pengzhou Tang committed
      1. Fix a primary writer gang leak: PrimaryWriterGang was accidentally
         set to NULL, so disconnectAndDestroyAllGangs() could not destroy the
         primary writer gang.
      2. Fix a gang leak: when creating a gang, if the retry count exceeded
         the limit, we forgot to destroy the failed gang.
      3. Remove a duplicate sanity check before dispatchCommand().
      4. Remove an unnecessary error-out when a broken gang is no longer needed.
      5. Fix a thread leak.
      6. Improve error handling in cdbdisp_finishCommand().
  18. 17 Jul 2016, 1 commit