1. 09 Nov 2020 (5 commits)
    • Revert "ic-proxy: refresh peers on demand" · 7f558913
      Committed by Hubert Zhang
      This reverts commit 9265ea6a.
    • ic-proxy: refresh peers on demand · 9265ea6a
      Committed by Ning Yu
      The user can adjust the ic-proxy peer addresses at runtime and reload
      them by sending SIGHUP. If an address is modified or removed, the
      corresponding peer connection must be closed or reestablished. The same
      applies to the peer listener: if the listener port is changed, the
      listener must be set up again.
    • ic-proxy: classify peer addresses · 854c4b84
      Committed by Ning Yu
      The peer addresses are specified with the GUC
      gp_interconnect_proxy_addresses, which can be reloaded on SIGHUP. We
      used to care only about newly added addresses, but the user can also
      modify them, or even remove some of them.

      So we now add logic to classify the addresses after parsing the GUC,
      so that we can tell whether an address was added, removed, or modified
      (a sketch of such a pass is shown below).

      The handling of the classified addresses will be done in the next
      commit.
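      A minimal sketch of what such a classification pass could look like,
      using hypothetical "proxy_addr" and find_addr() helpers rather than the
      real ic-proxy types:

          /* Hypothetical types/helpers for illustration; the real ic-proxy
           * structures (ICProxyAddr etc.) differ. */
          #include <string.h>

          typedef struct proxy_addr
          {
              int  dbid;          /* which segment this address belongs to */
              char host[64];
              int  port;
          } proxy_addr;

          static proxy_addr *
          find_addr(proxy_addr *list, int n, int dbid)
          {
              for (int i = 0; i < n; i++)
                  if (list[i].dbid == dbid)
                      return &list[i];
              return NULL;
          }

          /* Compare the old and new lists: added, removed, or modified. */
          static void
          classify_addrs(proxy_addr *oldl, int nold, proxy_addr *newl, int nnew)
          {
              for (int i = 0; i < nnew; i++)
              {
                  proxy_addr *prev = find_addr(oldl, nold, newl[i].dbid);

                  if (prev == NULL)
                      ;    /* added: set up a new peer connection */
                  else if (strcmp(prev->host, newl[i].host) != 0 ||
                           prev->port != newl[i].port)
                      ;    /* modified: reconnect with the new address */
              }

              for (int i = 0; i < nold; i++)
                  if (find_addr(newl, nnew, oldl[i].dbid) == NULL)
                      ;    /* removed: close the stale peer connection */
          }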
    • ic-proxy: optimize looking up of my addr · 40facdb1
      Committed by Ning Yu
      We used to scan the whole addr list to find my addr; now we record it
      directly while parsing the addresses.
    • ic-proxy: rename ICProxyAddr.addr to sockaddr · 2c2ca626
      Committed by Ning Yu
      An ICProxyAddr variable is usually named "addr", so the attribute is
      referred to as "addr->addr", which is confusing and sometimes
      ambiguous.

      So the attribute is renamed to "sockaddr", and the function
      ic_proxy_extract_addr() is also renamed to ic_proxy_extract_sockaddr().
  2. 28 Oct 2020 (1 commit)
    • mask all signals in the udp pthreads · 54451fc0
      Committed by 盏一
      In some cases, signals (like SIGQUIT) that should only be processed by
      the main thread of the postmaster may be dispatched to rxThread. So we
      should, and it is safe to, block all signals in the udp pthreads.

      Fix #11006
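      A common way to do this is to block every signal before the rx/tx
      pthreads are created, so they inherit a fully blocked mask; a minimal
      sketch (not the actual ic_udpifc code, the thread name is illustrative):

          #include <pthread.h>
          #include <signal.h>

          static void *rx_thread_main(void *arg);   /* hypothetical thread body */

          static void
          start_rx_thread(void)
          {
              sigset_t   all;
              sigset_t   saved;
              pthread_t  tid;

              /* Block every signal before creating the thread; the new thread
               * inherits the mask, so SIGQUIT etc. stay with the main thread. */
              sigfillset(&all);
              pthread_sigmask(SIG_BLOCK, &all, &saved);

              pthread_create(&tid, NULL, rx_thread_main, NULL);

              /* Restore the original mask in the creating thread. */
              pthread_sigmask(SIG_SETMASK, &saved, NULL);
          }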
  3. 28 Sep 2020 (1 commit)
  4. 10 Sep 2020 (1 commit)
    • ic-proxy: support hostname as proxy addresses · 2a1794bc
      Committed by Ning Yu
      The GUC gp_interconnect_proxy_addresses is used to set the listener
      addresses and ports of all the proxy bgworkers. Previously only IP
      addresses were supported, which is inconvenient to use.

      Now we also support hostnames; IP addresses are still accepted.

      Note that if a hostname is bound to a different IP at runtime, we must
      reload the setting with the "gpstop -u" command.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
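      Resolving either form can be done with getaddrinfo(), which accepts both
      hostnames and numeric addresses; a minimal sketch, not the actual
      ic-proxy parsing code:

          #include <netdb.h>
          #include <string.h>
          #include <sys/socket.h>

          /* Resolve a hostname or numeric IP plus port into a sockaddr.
           * Returns 0 on success, non-zero on failure. */
          static int
          resolve_proxy_addr(const char *host, const char *port,
                             struct sockaddr_storage *out)
          {
              struct addrinfo hints;
              struct addrinfo *res;
              int ret;

              memset(&hints, 0, sizeof(hints));
              hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
              hints.ai_socktype = SOCK_STREAM;

              ret = getaddrinfo(host, port, &hints, &res);
              if (ret != 0)
                  return ret;

              /* Take the first result; real code may iterate over all of them. */
              memcpy(out, res->ai_addr, res->ai_addrlen);
              freeaddrinfo(res);
              return 0;
          }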
  5. 02 Sep 2020 (2 commits)
    • Fix compile error for missing brackets · b2d32cb9
      Committed by Hubert Zhang
    • Using lwlock to protect resgroup slot in session state · a4cb06b4
      Committed by Hubert Zhang
      Resource group used to access resGroupSlot in SessionState without a
      lock. This is correct as long as each session only accesses its own
      resGroupSlot. But since we introduced the runaway feature, we need to
      traverse the session array to find the top consumer session when the
      red zone is reached. This requires:
      1. the runaway detector holds the resgroup lock in shared mode, so that
      a resGroupSlot cannot be detached from a session concurrently while the
      red zone is being handled;
      2. a normal session holds the lock in exclusive mode when modifying the
      resGroupSlot in its SessionState.

      Also fix a compile warning.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
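      In PostgreSQL terms this is the usual shared-versus-exclusive LWLock
      pattern; a simplified sketch, with the lock name and the slot handling
      used for illustration rather than taken from the real resgroup code:

          #include "postgres.h"
          #include "storage/lwlock.h"

          /* Illustrative lock pointer; the real resgroup code differs. */
          extern LWLock *ResGroupLock;

          /* Runaway detector: only reads slots, a shared lock is enough. */
          static void
          scan_sessions_for_top_consumer(void)
          {
              LWLockAcquire(ResGroupLock, LW_SHARED);
              /* ... walk the session array and inspect each resGroupSlot ... */
              LWLockRelease(ResGroupLock);
          }

          /* Normal session: detaching/attaching its slot modifies shared
           * state, so it must take the lock exclusively. */
          static void
          detach_resgroup_slot(void)
          {
              LWLockAcquire(ResGroupLock, LW_EXCLUSIVE);
              /* ... unlink the resGroupSlot from this SessionState ... */
              LWLockRelease(ResGroupLock);
          }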
  6. 01 Sep 2020 (1 commit)
  7. 14 Aug 2020 (1 commit)
    • Using libuv 1.18 API in ic-proxy · ab36eb90
      Committed by Hubert Zhang
      ic-proxy is developed with libuv; the minimal supported libuv version
      is 1.18.0. But commit 608514 introduced APIs that are new in libuv
      1.19, which breaks compatibility on OSes like Ubuntu 18.04, whose
      default libuv version is 1.18.

      We should keep our code base aligned with libuv 1.18 and replace the
      libuv 1.19 API calls. The change is mainly about how the data field of
      a uv handle or uv loop is accessed: the new API uses accessor functions
      such as `uv_handle_set_data` and `uv_handle_get_data`, while the old
      1.18 API accesses the data field directly. Note that the latest libuv
      release (1.38.2) supports both styles, and libuv is stable enough to
      keep supporting the old style for a long time.
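      For illustration, the two styles of accessing the per-handle data field
      (a sketch; uv_timer_t is just an example handle type):

          #include <uv.h>

          /* Illustrative only: attach a per-handle context pointer. */
          static void
          attach_context(uv_timer_t *timer, void *ctx)
          {
          #if UV_VERSION_HEX >= 0x011300
              /* libuv >= 1.19: accessor functions (the style being replaced). */
              uv_handle_set_data((uv_handle_t *) timer, ctx);
          #else
              /* libuv 1.18: access the data field directly. */
              timer->data = ctx;
          #endif
          }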
  8. 12 Aug 2020 (2 commits)
    • Fix compilation without libuv's uv.h header. · 7858128f
      Committed by Heikki Linnakangas
      ic_proxy_backend.h includes libuv's uv.h header, and ic_proxy_backend.h
      was being included in ic_tcp.c even when compiling with
      --disable-ic-proxy.
    • ic-proxy: support parallel backend registration to proxy · 608514c5
      Committed by Hubert Zhang
      Previously, when backends connected to a proxy, we had to set up the
      domain socket pipe and send the HELLO message (and receive the ACK
      message) in a blocking, non-parallel way. This made it hard to check
      for interrupts during backend registration in ic-proxy.

      By utilizing the libuv loop, we can register backends in parallel. Note
      that this is one step towards replacing the ic_tcp backend logic that
      ic_proxy currently reuses. In the future, we should use libuv for all
      of the backend logic, from registration to sending/receiving data.
      Co-authored-by: Ning Yu <nyu@pivotal.io>
  9. 10 Aug 2020 (1 commit)
    • ic-proxy: type checking in ic_proxy_new() · a3ef623d
      Committed by Ning Yu
      A typical mistake when allocating typed memory looks like this:

          int64 *ptr = malloc(sizeof(int32));

      To prevent this, ic_proxy_new() is now a typed allocator: it always
      returns a pointer of the specified type, for example:

          int64 *p1 = ic_proxy_new(int64); /* good */
          int64 *p2 = ic_proxy_new(int32); /* bad, gcc will raise a warning */
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
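      The warning falls out of ordinary C pointer-type checking once the
      allocator casts its result to the requested type; a minimal sketch of
      such a macro (not necessarily the real ic_proxy_new() definition):

          #include <stdlib.h>

          /* Allocate one object of the given type and return a correctly
           * typed pointer; assigning it to a mismatched pointer type makes
           * the compiler warn (-Wincompatible-pointer-types in gcc/clang). */
          #define ic_proxy_new(type)  ((type *) calloc(1, sizeof(type)))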
  10. 06 Aug 2020 (1 commit)
  11. 04 Aug 2020 (1 commit)
    • ic-proxy: correct SIGHUP handler · a181655b
      Committed by Ning Yu
      Fixed the bug that the SIGHUP handler was installed for SIGINT by
      mistake, so the ic-proxy bgworkers would die on SIGHUP.

      By using the correct signal name, the ic-proxy bgworkers can now reload
      postgresql.conf when "gpstop -u" is executed.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
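      For context, the usual background-worker pattern: install the handler
      with pqsignal() and re-read postgresql.conf from the main loop; a
      simplified sketch with illustrative handler, flag, and entry-point
      names:

          #include "postgres.h"

          #include <signal.h>

          #include "postmaster/bgworker.h"
          #include "utils/guc.h"

          static volatile sig_atomic_t got_sighup = false;

          static void
          handle_sighup(SIGNAL_ARGS)
          {
              got_sighup = true;
          }

          void
          ic_proxy_worker_main(Datum arg)   /* hypothetical bgworker entry */
          {
              pqsignal(SIGHUP, handle_sighup);   /* the bug: this was SIGINT */
              BackgroundWorkerUnblockSignals();

              for (;;)
              {
                  if (got_sighup)
                  {
                      got_sighup = false;
                      ProcessConfigFile(PGC_SIGHUP);  /* re-read postgresql.conf */
                  }
                  /* ... run the proxy event loop ... */
              }
          }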
  12. 03 Aug 2020 (1 commit)
    • ic-proxy: handle early coming BYE correctly · 79ff4e62
      Committed by Ning Yu
      In a query that contains multiple init/sub plans, the packets of the
      second subplan might be received while the first is still being
      processed in ic-proxy mode; this is because in ic-proxy mode a local
      host handshake is used instead of the global one.

      To distinguish the packets of different subplans, especially the early
      coming ones, we must stop handling packets as soon as the BYE arrives,
      and pass any unhandled early coming packets on to the successor or the
      placeholder.

      This fixes the random hanging during the ICW parallel group of
      qp_functions_in_from.  No new test is added.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
  13. 29 Jul 2020 (1 commit)
    • ic-proxy: include postmaster pid in the domain socket path · 5c5a358a
      Committed by Ning Yu
      We used to store the domain socket files under /tmp/ and include the
      postmaster port number in the file name, in the hope that two clusters
      would not conflict with each other on this file.

      However, the conflict still happens in the test src/bin/pg_basebackup,
      and it can also happen if a second cluster is misconfigured by
      accident. So to make things safe we also include the postmaster pid in
      the domain socket path; two postmasters can never share the same pid.
      Reviewed-by: Paul Guo <pguo@pivotal.io>
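      Building such a path boils down to formatting the port and the pid into
      the socket file name; a sketch with an illustrative name pattern (the
      real ic-proxy path format may differ):

          #include <stdio.h>
          #include <unistd.h>

          /* Compose a per-cluster, per-postmaster domain socket path, e.g.
           * "/tmp/.s.ic_proxy.5432.12345"; both the port and the pid go into
           * the name so two clusters (or a stale postmaster) cannot collide. */
          static void
          build_proxy_socket_path(char *buf, size_t buflen, int postmaster_port)
          {
              snprintf(buf, buflen, "/tmp/.s.ic_proxy.%d.%d",
                       postmaster_port, (int) getpid());
          }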
  14. 23 Jul 2020 (2 commits)
    • ic-proxy: reload addresses on SIGHUP · c2523232
      Committed by Ning Yu
      We used to mark the GUC gp_interconnect_proxy_addresses as
      PGC_POSTMASTER, so the cluster had to be restarted to reload this
      setting. This can be a problem during gpexpand: the cluster expansion
      itself is online, but configuring the proxy addresses for the new
      segments required a restart.

      Now the GUC is changed to PGC_SIGHUP, so the setting can be reloaded on
      SIGHUP.

      Also changed the setting from a developer option to a normal one.
    • ic-proxy: do not generate too many messages · c6c36cc8
      Committed by Ning Yu
  15. 10 Jul 2020 (3 commits)
    • ic-proxy: enable ic-proxy with --enable-ic-proxy · 81810a20
      Committed by Ning Yu
      We used to use the configure option --with-libuv to enable ic-proxy,
      but the purpose of that option was not straightforward to understand.
      So we renamed it to --enable-ic-proxy, and the default is changed to
      "disabled".

      Suggested by Kris Macoskey <kmacoskey@pivotal.io>
    • ic-proxy: let backends connect to the proxy bgworker · 94c9d996
      Committed by Ning Yu
      Only in proxy mode, of course.  Currently the ic-proxy mode shares most
      of the backend logic with the ic-tcp mode, so instead of copying the
      code we embed the ic-proxy specific logic in ic_tcp.c.
    • ic-proxy: implement the core logic · 6188fb1f
      Committed by Ning Yu
      The interconnect proxy mode, a.k.a. ic-proxy, is a new interconnect
      mode: all the backends communicate via a proxy bgworker, and all the
      backends on the same segment share the same proxy bgworker, so every
      pair of segments needs only one network connection between them, which
      reduces the network flows as well as the number of ports.

      To enable the proxy mode we need to first configure the GUC
      gp_interconnect_proxy_addresses, for example:

          gpconfig \
            -c gp_interconnect_proxy_addresses \
            -v "'1:-1:10.0.0.1:2000,2:0:10.0.0.2:2001,3:1:10.0.0.3:2002'" \
            --skipvalidation

      Then restart the cluster for the setting to take effect.
  16. 05 Jun 2020 (1 commit)
  17. 03 Jun 2020 (2 commits)
    • Squash me: address concerns in code review · bf36fb3b
      Committed by Asim R P
      Remember if the select call was interrupted.  Act on it after emitting
      debug logs and checking cancel requests from the dispatcher.
    • Check errno as early as possible · 9fd138da
      Committed by Asim R P
      Previously, the result of the select() system call and the errno set by
      it were checked only after performing several function calls, including
      checking for interrupts and checkForCancelFromQD.  That made it very
      likely for errno to be clobbered, losing the original value set by
      select().

      This patch fixes it so that errno is checked immediately after the
      system call.  This should address intermittent failures in CI with
      error messages like this:

          ERROR","58M01","interconnect error: select: Success"
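      The safe pattern is to capture errno into a local variable immediately
      after the call returns, before anything else can overwrite it; a minimal
      sketch (the wrapper and its error reporting are illustrative, not the
      actual interconnect code):

          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/select.h>

          /* Illustrative wrapper: select() on one fd and report errors using
           * the errno value saved right after the call. */
          static int
          wait_for_data(int fd, struct timeval *timeout)
          {
              fd_set  rset;
              int     n;
              int     saved_errno;

              FD_ZERO(&rset);
              FD_SET(fd, &rset);

              n = select(fd + 1, &rset, NULL, NULL, timeout);
              saved_errno = errno;        /* save before any other call runs */

              /* ... interrupt/cancel checks may run here and change errno ... */

              if (n < 0 && saved_errno != EINTR)
                  fprintf(stderr, "interconnect error: select: %s\n",
                          strerror(saved_errno));
              return n;
          }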
  18. 29 May 2020 (1 commit)
  19. 25 May 2020 (1 commit)
    • Fix a hang caused by gp_interconnect_id disorder · 644bde25
      Committed by Pengzhou Tang
      This issue was exposed by an experiment that removes the special
      "eval_stable_functions" handling in evaluate_function(): the
      qp_functions_in_* test cases sometimes got stuck, and the cause turned
      out to be a gp_interconnect_id disorder issue.

      Under the UDPIFC interconnect, gp_interconnect_id is used to
      distinguish the executions of MPP-fied plans in the same session; on
      the receiver side, packets with a smaller gp_interconnect_id are
      treated as 'past' packets and the receiver tells the sender to stop
      sending them.

      The root cause of the hang is:
      1. The QD calls InitSliceTable() to advance the gp_interconnect_id and
      stores it in the slice table.
      2. In CdbDispatchPlan->exec_make_plan_constant(), the QD finds a
      stable function that needs to be simplified to a const, so it executes
      this function first.
      3. The function contains SQL, so the QD inits another slice table and
      advances the gp_interconnect_id again, dispatches the new plan and
      executes it.
      4. After the function is simplified to a const, the QD continues to
      dispatch the previous plan, but its gp_interconnect_id is now the older
      one. When a packet arrives before the receiver has set up the
      interconnect, the packet is handled by handleMismatch(): it is treated
      as a 'past' packet and the senders are stopped prematurely by the
      receiver. When the receiver later finishes setting up the interconnect,
      it can no longer get any packets from the senders and gets stuck.

      To resolve this, we advance the gp_interconnect_id when a plan is
      actually dispatched; plans are dispatched sequentially, so a later
      dispatched plan always has a higher gp_interconnect_id.

      Also limit the usage of gp_interconnect_id in the rx thread of UDPIFC;
      we prefer to use sliceTable->ic_instance_id in the main thread.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
  20. 22 May 2020 (1 commit)
    • Monitor dispatcher connection when receiving from TCP interconnect · c1d45e9e
      Committed by Pengzhou Tang
      This is mainly to resolve slow responses to sequence requests under the
      TCP interconnect. Sequence requests are sent through libpq connections
      from QEs to the QD (we call them dispatcher connections). In the past,
      under the TCP interconnect, the QD checked for events on dispatcher
      connections only every 2 seconds, which is obviously inefficient.

      Under UDPIFC mode, the QD also monitors the dispatcher connections
      while receiving tuples from QEs, so the QD can process sequence
      requests in time; this commit applies the same logic to the TCP
      interconnect.
      Reviewed-by: Hao Wu <gfphoenix78@gmail.com>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
  21. 27 Apr 2020 (1 commit)
    • Fix a race condition in flushBuffer · 51c1bf91
      Committed by Pengzhou Tang
      flushBuffer() is used to send packets through the TCP interconnect.
      Before sending, it first checks whether the receiver has stopped or
      torn down the interconnect. However, there is a window between the
      check and the send: the receiver may tear down the interconnect and
      close the peer in between, so send() reports an error. To resolve this,
      we recheck whether the receiver stopped or tore down the interconnect
      when that happens, and do not error out in that case.
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
      Reviewed-by: Hao Wu <hawu@pivotal.io>
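      In pseudo-C the check/send/recheck sequence looks roughly like this (the
      helper functions are illustrative, not the actual flushBuffer() code):

          #include <stdbool.h>
          #include <stddef.h>
          #include <sys/socket.h>

          /* Illustrative helpers: has the receiver sent a STOP or torn down
           * the interconnect, and how an interconnect error is reported. */
          extern bool receiver_stopped_or_torn_down(int sockfd);
          extern void report_interconnect_error(const char *op);

          static bool
          flush_buffer(int sockfd, const char *buf, size_t len)
          {
              /* First check: nothing to do if the receiver is already gone. */
              if (receiver_stopped_or_torn_down(sockfd))
                  return false;

              if (send(sockfd, buf, len, 0) < 0)
              {
                  /* The receiver may have torn down the interconnect between
                   * the check above and the send(); recheck and treat that
                   * case as a clean stop rather than an error. */
                  if (receiver_stopped_or_torn_down(sockfd))
                      return false;

                  report_interconnect_error("send");
              }
              return true;
          }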
  22. 24 Apr 2020 (1 commit)
    • Remove TUPLE_CHUNK_ALIGN. · 30ca2852
      Committed by Heikki Linnakangas
      It was set to 1 on all supported platforms, and I'm almost certain it
      would be broken if you tried to set it to anything else, because it
      hasn't been tested for a long time.

      As far as I can see, the alignment was only needed because, on the
      receiving side, we cast the buffer into a TupSerHeader pointer, and
      there was otherwise no guarantee that the buffer was suitably aligned
      for TupSerHeader. That's easy to fix by memcpy()ing the TupSerHeader
      into a local variable that's properly aligned.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
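      The memcpy() trick in a nutshell (a sketch; the TupSerHeader fields here
      are placeholders, not the real definition):

          #include <stdint.h>
          #include <string.h>

          /* Placeholder header layout, for illustration only. */
          typedef struct TupSerHeader
          {
              uint32_t  tuplen;
              uint16_t  natts;
              uint16_t  infomask;
          } TupSerHeader;

          static uint32_t
          read_tuple_length(const char *buf)
          {
              TupSerHeader hdr;

              /* Instead of casting buf (which may be misaligned) to
               * TupSerHeader *, copy the bytes into a properly aligned
               * local variable. */
              memcpy(&hdr, buf, sizeof(hdr));
              return hdr.tuplen;
          }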
  23. 23 Apr 2020 (1 commit)
    • Remove forceEos mechanism for TCP interconnect · 041d9399
      Committed by Pengzhou Tang
      In the TCP interconnect, the sender used to force an EOS message to the
      receiver in two cases:
      1. cancelUnfinished is true in mppExecutorFinishup.
      2. an error occurs.

      For case 1, the comment says: to finish a cursor, the QD used to send
      a cancel to the QEs, the QEs then set the cancelUnfinished flag and did
      a normal executor finish-up. We now use the QueryFinishPending
      mechanism to stop a cursor, so the case 1 logic has been invalid for a
      long time.

      For case 2, the purpose is: when an error occurs, we force an EOS to
      the receiver so the receiver does not report an interconnect error; the
      QD will then check the dispatch results and report the errors from the
      QEs. From the view of the interconnect, we have reached the end of the
      query with no error in the interconnect. This logic has two problems:
      1. it doesn't work for initplans: an initplan does not check the
      dispatch results and throw the errors, so when an error occurs in the
      QEs for the initplan, the QD cannot notice it.
      2. it doesn't work for cursors, for example:
         DECLARE c1 cursor for select i from t1 where i / 0 = 1;
         FETCH all from c1;
         FETCH all from c1;
      None of the FETCH commands report errors, which is not expected.

      This commit removes the forceEos mechanism. For case 2 the receiver
      will now report an interconnect error without forceEos; this is ok
      because when multiple errors are reported from the QEs, the QD is
      inclined to report the non-interconnect error.
  24. 20 Apr 2020 (1 commit)
    • Use a unicast IP address for interconnection (#9696) · 790c7bac
      Committed by Hao Wu
      * Use a unicast IP address for interconnection on the primary

      Currently, interconnect/UDP always binds the wildcard address to the
      socket, which makes all QEs on the same node share the same port space
      (up to 64k). For dense deployments the UDP ports could run out, even if
      there are multiple IP addresses.
      To increase the total number of available ports for QEs on a node, we
      bind a single/unicast IP address to the socket for interconnect/UDP
      instead of the wildcard address, so segments with different IP
      addresses have different port spaces.
      To fully benefit from this patch and alleviate running out of ports, it
      is better to assign a different ADDRESS
      (gp_segment_configuration.address) to each segment, although this is
      not mandatory.

      Note: the QD/mirror uses the primary's address value in
      gp_segment_configuration as the destination IP to connect to the
      primary, so the primary returns the ADDRESS as its local address by
      calling `getsockname()`.

      * Fix the origin of the source IP address for backends

      The destination IP address uses the listenerAddr of the parent slice,
      but the source IP address to bind is harder to determine, because it is
      not stored on the segment, and the slice table is sent to the QEs after
      they have already bound the address and port. The origin of the source
      IP address differs by role:
      1. QD: by calling `cdbcomponent_getComponentInfo()`
      2. QE on master: from the qdHostname dispatched by the QD
      3. QE on segment: from the local address of the QE's TCP connection
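      The essence of the change is binding a concrete local address instead of
      INADDR_ANY, and then reading back the bound address with getsockname();
      a minimal sketch (error handling and IPv6 omitted):

          #include <arpa/inet.h>
          #include <netinet/in.h>
          #include <string.h>
          #include <sys/socket.h>

          /* Bind a UDP socket to a specific local IP (instead of the wildcard
           * address) with an ephemeral port, then report the chosen
           * address/port via getsockname(). */
          static int
          bind_interconnect_socket(const char *local_ip,
                                   struct sockaddr_in *bound)
          {
              int fd = socket(AF_INET, SOCK_DGRAM, 0);
              struct sockaddr_in addr;
              socklen_t len = sizeof(*bound);

              memset(&addr, 0, sizeof(addr));
              addr.sin_family = AF_INET;
              addr.sin_port = 0;                      /* kernel picks a port */
              inet_pton(AF_INET, local_ip, &addr.sin_addr);  /* unicast addr */

              bind(fd, (struct sockaddr *) &addr, sizeof(addr));
              getsockname(fd, (struct sockaddr *) bound, &len);
              return fd;
          }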
  25. 08 Apr 2020 (2 commits)
    • Remove redundant 'hasError' flag in TeardownTCPInterconnect · a6ae448d
      Committed by Pengzhou Tang
      This flag duplicates 'forceEOS'; 'forceEOS' can also tell whether an
      error occurred or not.
    • Fix interconnect hang issue · ec1d9a70
      Committed by Pengzhou Tang
      We have hit the interconnect hang issue many times in many cases, all
      with the same pattern: the downstream interconnect motion senders keep
      sending tuples, blind to the fact that the upstream nodes have already
      finished and quit the execution, while the QD has received enough
      tuples and waits for all QEs to quit, which causes a deadlock.

      Many nodes may quit execution early, e.g. LIMIT, HashJoin, Nest Loop;
      to resolve the hang they need to stop the interconnect stream
      explicitly by calling ExecSquelchNode(). However, we cannot do that for
      rescan cases, in which data might be lost, e.g. commit 2c011ce4. For
      rescan cases we tried using QueryFinishPending to stop the senders in
      commit 02213a73 and let the senders check this flag and quit, but that
      commit has its own problems: firstly, QueryFinishPending can only be
      set by the QD, so it doesn't work for INSERT or UPDATE cases; secondly,
      that commit only lets the senders detect the flag and quit the loop in
      a rude way (without sending the EOS to the receiver), so the receiver
      may still be stuck receiving tuples.

      This commit first reverts the QueryFinishPending method.

      To resolve the hang, we move TeardownInterconnect ahead of
      cdbdisp_checkDispatchResult, so the interconnect stream is guaranteed
      to be stopped before waiting for and checking the status of the QEs.

      For UDPIFC, TeardownInterconnect() removes the ic entries, so any
      packets for this interconnect context are treated as 'past' packets and
      acked with the STOP flag.

      For TCP, TeardownInterconnect() closes all connections with its
      children, and the children treat any readable data in the connection,
      including the closure itself, as a STOP message.

      A test case is not included; both commits 2c011ce4 and 02213a73
      contain one.
  26. 23 Mar 2020 (2 commits)
    • Revert "Make fault injector configurable (#9532)" (#9795) · 495343e1
      Committed by Hao Wu
      This reverts commit 7f5c7da1.
    • Make fault injector configurable (#9532) · 7f5c7da1
      Committed by Hao Wu
      The FAULT_INJECTOR definition was hardcoded in a header file
      (pg_config_manual.h). The fault injector is useful, but it may
      introduce issues in production, such as runtime cost and security
      problems. It is better to enable this feature in development and
      disable it in release builds.

      To achieve this, we add a configure option that makes the fault
      injector configurable. When the fault injector is disabled, tests that
      use this feature should be skipped in ICW. There are many tests under
      isolation2 and regress, so all tests under isolation2 and regress that
      depend on the fault injector are moved to new schedule files named with
      the pattern XXX_faultinjector_schedule.

      **NOTE**
      All tests that depend on the fault injector are kept in the
      XXX_faultinjector_schedule files. With this rule, we only run tests
      that don't depend on the fault injector when the fault injector is
      disabled.

      The schedule files used for the fault injector are:
      src/test/regress/greenplum_faultinjector_schedule
      src/test/isolation2/isolation2_faultinjector_schedule
      src/test/isolation2/isolation2_resgroup_faultinjector_schedule
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  27. 02 Mar 2020 (2 commits)