Commits · 7118e8aca825b743dd9477d19406fcc06fa53852 · Greenplum / Gpdb

10 Jun, 2020 1 commit
- Revert "Add "FILL_MISSING_FIELDS" option for gpload." (#10280) · 7118e8ac
  Authored by Wen Lin on Jun 10, 2020
```
This reverts commit 87fef901.
```
  7118e8ac
04 Jun, 2020 1 commit
- Add "FILL_MISSING_FIELDS" option for gpload. · 87fef901
  Authored by Wen Lin on Jun 04, 2020
  
  87fef901
21 May, 2020 1 commit

Fix the gpload error that have no attribute "staging_table" or "fast_path". · e5595d5d

Authored by Wen Lin on May 21, 2020

While gpload is loading data, if the configuration file contains "error_table" but not "preload", an error about a missing "staging_table" or "fast_path" attribute occurs.

e5595d5d

12 May, 2020 1 commit

[skip-ci][6X_STABLE] gpload: improve error message (#10077) · 56e1e09c

Authored by Peifeng Qiu on May 12, 2020

gpload in the latest windows client package requires VS redistributable
package. Output more meaningful message if pg.py fails to load.

56e1e09c

02 Mar, 2020 1 commit

Add max_retries flag for gpload (#9606) · 1bb52ea8

Authored by Huiliang.liu on Feb 28, 2020

Add max_retries flag for gpload. It sets the maximum number of retries when connecting to GPDB times out.
The default value of max_retries is 0, which means no retry.
If max_retries is -1 or any other negative value, gpload retries forever.
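
A minimal sketch of the retry semantics described above, not gpload's actual code. The `connect` callable, the `delay` parameter, and the use of `TimeoutError` are assumptions made for illustration.

```python
import time


def connect_with_retries(connect, max_retries=0, delay=0.0):
    """Call connect() until it succeeds or the retry budget is spent.

    max_retries == 0: one attempt, no retry
    max_retries  > 0: up to max_retries extra attempts after the first
    max_retries  < 0: retry forever
    """
    retries_used = 0
    while True:
        try:
            return connect()
        except TimeoutError:
            # A non-negative budget that is already used up ends the loop.
            if 0 <= max_retries <= retries_used:
                raise
            retries_used += 1
            time.sleep(delay)
```

With `max_retries=2` the callable is invoked at most three times: the initial attempt plus two retries.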

Test has been done manually.

(cherry-picked from master commit b891b85b)

1bb52ea8

03 Jan, 2020 2 commits
- catch ImportError for gpversion (#9346) · 00f6b533
  Authored by Huiliang.liu on Jan 03, 2020
```
Cherry-picked from gpdb master.
gpload runs in GPDB6 compatibility mode if importing gpversion fails.
```
  00f6b533
- GPload: change metadata query SQL to improvement performance (#8904) · 770412c3
  Authored by Huiliang.liu on Nov 01, 2019
```
GPload: change the metadata query SQL to improve performance.
The old query may take a long time if the catalog is large.
```
  770412c3
24 Sep, 2019 1 commit

Ship subprocess32 and replace subprocess with it in python code (#8658) · 7e44dbf1

Authored by Paul Guo on Sep 20, 2019

* Ship modified python module subprocess32 again

subprocess32 is preferred over subprocess according to the Python documentation.
In addition, we long ago modified the code to use vfork() instead of fork() to
avoid "Cannot allocate memory" errors (false alarms, since memory is actually
sufficient) in GPDB production environments, which usually run with memory
overcommit disabled. We compiled and shipped the module back then, but at some
point it was only compiled and no longer shipped because of a makefile change
(possibly a regression). Let's ship it again.

* Replace subprocess with our own subprocess32 in python code.

Cherry-picked 9c4a885b and
              da724e8d and
              a8090c13 and
              4354f28c

7e44dbf1

26 Aug, 2019 2 commits
- Fix gpload unit test case (#8498) · 8602a57b
  Authored by Huiliang.liu on Aug 26, 2019
```
(cherry picked from commit a905a1eb)
```
  8602a57b
- GPload supports GPDB5 and GPDB6 with the same gpload.py file (#8483) · 3fcceca4
  Authored by Huiliang.liu on Aug 26, 2019
```
* Get gpdb version and support gpdb5 and gpdb6
* add gpversion.py into windows package

(cherry picked from commit 2fdc38ff)
```
  3fcceca4
11 Apr, 2019 1 commit

Remove references to SunOS and HP-UX (#7356) · dc13bbd8

Authored by Ben Christel on Apr 09, 2019

We don't support Greenplum on these platforms.

Some files (e.g. Makefile.{hpux,solaris}) have been left in place
because they are upstream postgres files. Removing them isn't
worth the headache it would cause when merging commits from
postgres.

Cherry-picked from 52c37372
Authored-by: Ben Christel <bchristel@pivotal.io>

dc13bbd8

01 Feb, 2019 1 commit

Rename gp_distribution_policy.attrnums to distkey, and make it int2vector. · 69ec6926

Authored by Heikki Linnakangas on Feb 01, 2019

This is in preparation for adding operator classes as a new column
(distclass) to gp_distribution_policy. This naming is consistent with
pg_index.indkey/indclass. Change the datatype to int2vector, also for
consistency with pg_index, and some other catalogs that store attribute
numbers, and because int2vector is slightly more convenient to work with
in the backend. Move the column to the end of the table, so that all the
variable-length and nullable columns are at the end, which makes it
possible to reference the other columns directly in Form_gp_policy.

Add a backend function, pg_get_table_distributedby(), to deparse the
DISTRIBUTED BY definition of a table into a string. This is similar to
pg_get_indexdef_columns(), pg_get_functiondef() etc. functions that we
have. Use the new function in psql and pg_dump, when connected to a GPDB6
server.
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Peifeng Qiu <pqiu@pivotal.io>
Co-authored-by: Adam Lee <ali@pivotal.io>

69ec6926

17 Jan, 2019 1 commit

Remove duplicate import and unused vars from gpload · d9bc848a

Authored by Daniel Gustafsson on Jan 17, 2019

This removes a duplicate import and a few set, but never used, vars
from the gpload.py code as well as the including_defaults token as
it was clearly unused.

Also fixes a few typos while in there, one of which is a user facing
error message.
Reviewed-by: Jacob Champion <pchampion@pivotal.io>

d9bc848a

13 Dec, 2018 1 commit

Reporting cleanup for GPDB specific errors/messages · 56540f11

Authored by Daniel Gustafsson on Dec 13, 2018

The Greenplum specific error handling via ereport()/elog() calls was
in need of a unification effort, as some parts of the code were using a
different messaging style from others (and from upstream). This aims at
bringing many of the GPDB error calls in line with the upstream error
message writing guidelines, and thus making the user experience of
Greenplum more consistent.

The main contributions of this patch are:

* errmsg() messages shall start with a lowercase letter, and not end
  with a period. errhint() and errdetail() shall be complete sentences
  starting with capital letter and ending with a period. This attempts
  to fix this on as many ereport() calls as possible, with too detailed
  errmsg() content broken up into details and hints where possible.

* Reindent ereport() calls to be more consistent with the common style
  used in upstream and most parts of Greenplum:

	ereport(ERROR,
			(errcode(<CODE>),
			 errmsg("short message describing error"),
			 errhint("Longer message as a complete sentence.")));

* Avoid breaking messages due to long lines since it makes grepping
  for error messages harder when debugging. This is also the de facto
  standard in upstream code.

* Convert a few internal error ereport() calls to elog(). There are
  no doubt more that can be converted, but the low hanging fruit has
  been dealt with. Also convert a few elog() calls which are user
  facing to ereport().

* Update the testfiles to match the new messages.

Spelling and wording is mostly left for a follow-up commit, as this was
getting big enough as it was. The most obvious cases have been handled
but there is work left to be done here.

Discussion: https://github.com/greenplum-db/gpdb/pull/6378
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

56540f11

30 Nov, 2018 1 commit

Remove unused local variable · 9347d895

Authored by Daniel Gustafsson on Nov 30, 2018

Reviewed-by: Jacob Champion <pchampion@pivotal.io>
Reviewed-by: Jimmy Yih <jyih@pivotal.io>

9347d895

29 Nov, 2018 1 commit

Compare with None using the is operator · e39047b5

Authored by Daniel Gustafsson on Nov 29, 2018

While == None works for comparison, it's a wasteful operation as it
performs type conversion and expansion. Instead move to using the
"is" operator which is the documented best practice for Python code.

Reviewed-by: Jacob Champion
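
A small illustration of the difference, assuming a hypothetical class `AlwaysEqual` that is not part of gpload. In Python, `==` dispatches to the operand's `__eq__` method, which a class may override, while `is` compares object identity and cannot be overridden. PEP 8 recommends `is` for comparisons with `None` for that reason.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# Dispatches to AlwaysEqual.__eq__, so the result is misleading.
equals_none = (obj == None)  # noqa: E711

# Identity check: obj is clearly not the None singleton.
is_none = (obj is None)
```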

e39047b5

14 Nov, 2018 1 commit

Add encoding option as condition of finding reusable table (#6151) (#6205) · 2c6567f2

Authored by Huiliang.liu on Nov 14, 2018

* Add the external table encoding option as a condition for finding a reusable table

Get the database default encoding if ENCODING is not set in the config file.
Look up the encoding code from the encoding string, then add the encoding code
as one of the conditions for finding a reusable table.

2c6567f2

30 Jul, 2018 1 commit

gpload: exit with os._exit to prevent hang (#5335) · f3e5c093

Authored by Peifeng Qiu on Jul 30, 2018

The gpload test case runs gpload with subprocess, reads stdout
and stderr from it, and waits for it to exit. sys.exit in gpload performs
some cleanup that may cause a deadlock between the test and gpload.
os._exit exits immediately, but we need to flush stdout and stderr
before that.
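
A minimal sketch of the flush-then-`os._exit` pattern, run in a child interpreter so it can be observed from outside. The child script and its `atexit` handler are illustrative assumptions, not gpload code.

```python
import subprocess
import sys

# Child script: registers a cleanup handler, prints, flushes, then hard-exits.
CHILD_SCRIPT = """
import atexit, os, sys
atexit.register(lambda: print("cleanup handler ran"))
print("work done")
sys.stdout.flush()   # os._exit does not flush buffered output
sys.stderr.flush()
os._exit(3)          # skips atexit handlers and other interpreter cleanup
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD_SCRIPT],
    capture_output=True,
    text=True,
)
```

Because `os._exit` bypasses interpreter cleanup, the `atexit` handler never runs, and output that was not flushed explicitly could be lost.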

f3e5c093

24 Jul, 2018 1 commit

Fix external schema bug in fast_match (#5324) · b2e38f17

Authored by Huiliang.liu on Jul 24, 2018

- The results of the fast_match SQL don't include the schema name, so we
need to add the schema name to extSchemaTable for fast_match
- Remove locationStr, which is unused.

b2e38f17

23 Jul, 2018 1 commit

support fast_match option in gpload config file (#5310) · d240a284

Authored by Huiliang.liu on Jul 23, 2018

- Add a fast_match option to the gpload config file. If both reuse_tables
and fast_match are true, gpload tries to fast-match the external
table (without checking columns). If reuse_tables is false and
fast_match is true, it prints a warning message.
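
The option combinations above can be sketched as a small decision function. The function name, the mode labels, and the warning text are hypothetical; the behavior with `reuse_tables` alone (a full match that checks columns) is an assumption inferred from the commit message.

```python
def resolve_match_mode(reuse_tables, fast_match):
    """Return (mode, warning) for a pair of config flags.

    mode is one of:
      "fast" - reuse an external table without checking columns
      "full" - reuse an external table after checking columns (assumed)
      "none" - do not look for a reusable external table
    """
    if reuse_tables and fast_match:
        return "fast", None
    if fast_match:
        # fast_match only makes sense when table reuse is enabled.
        return "none", "fast_match is ignored because reuse_tables is false"
    if reuse_tables:
        return "full", None
    return "none", None
```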

d240a284

23 Apr, 2018 1 commit
- Remove customer names from gpload.py · 9c4c7158
  Authored by Peifeng Qiu on Apr 23, 2018
  
  9c4c7158
03 Apr, 2018 1 commit

Get rid of pg_exttable.fmterrtbl · 8f6fe2d6

Authored by Adam Lee on Mar 12, 2018

The pg_exttable.fmterrtbl column stored the OID of the error table, but
without an error table it is just set to the OID of the external table.
That is not necessary, there are other columns which indicate if error
logging is enabled. Therefore this column can be removed.

8f6fe2d6

27 Mar, 2018 1 commit

Use hard kill in gpload to avoid unexpected gpfdist hang (#4765) · 83fb63c0

Authored by Peifeng Qiu on Mar 27, 2018

When gpload finishes its query, it sends SIGTERM to gpfdist.
gpfdist handles SIGTERM with exit(1), which invokes the registered
apr handlers and cleans up all apr resources, including apr_pool. If
this happens during the normal destruction of apr_pool in
do_close, gpfdist hangs.

Call _exit in gpfdist to avoid any cleanup handlers, and let gpload
send SIGKILL to perform hard kill.
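
A rough sketch of a hard kill from the Python side, using a sleeping child interpreter as a stand-in for gpfdist. How gpload actually tracks and kills its gpfdist processes is not shown in this commit message, so the child command here is purely an assumption.

```python
import subprocess
import sys

# Stand-in for a long-running gpfdist process (hypothetical child command).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Popen.kill() sends SIGKILL on POSIX (TerminateProcess on Windows), so the
# child gets no chance to run exit handlers that might hang.
child.kill()
exit_status = child.wait(timeout=10)
```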

83fb63c0

26 Feb, 2018 1 commit

fix gpload bug about handling nullas option (#4583) · 2e330960

Authored by huiliang-liu on Feb 26, 2018

- if the data file contains "\N" as the delimiter, gpload does not
recognize it properly
- root cause: gpload escapes quotes in the nullas option and also
replaces '\' with '\\'
- solution: add a quote_no_slash function to handle the nullas option
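
A sketch of the two quoting behaviors described above. The name `quote_no_slash` comes from the commit message; the function bodies and the `quote` helper are assumptions written to match the description, not a copy of gpload.py.

```python
def quote(value):
    """Quote a SQL literal, doubling single quotes and backslashes."""
    return "'" + value.replace("'", "''").replace("\\", "\\\\") + "'"


def quote_no_slash(value):
    """Quote a SQL literal, doubling single quotes but keeping backslashes."""
    return "'" + value.replace("'", "''") + "'"


null_marker = r"\N"
escaped = quote(null_marker)             # backslash doubled: the marker changes
preserved = quote_no_slash(null_marker)  # backslash kept: the marker survives
```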

2e330960

19 Jan, 2018 1 commit

Support EXT_STAGING_TABLE option in configuration file for user (#4356) · d0446107

Authored by huiliang-liu on Jan 19, 2018

to define the external table to reuse themselves, instead of having
gpload search for it, which may perform badly if there are
too many external tables.

d0446107

27 Dec, 2017 1 commit

Fix gpload count error bug with EXTERNAL.SCHEMA (#4211) · 11934719

Authored by Jialun on Dec 27, 2017

- Fix gpload.py, add schema prefix to every external table name
  when EXTERNAL.SCHEMA is set
- Add new test cases

11934719

13 Nov, 2017 1 commit

fix error_table (#3848) · d0b6e344

Authored by Jialun on Nov 12, 2017

The gpload ERROR_TABLE configuration is forbidden in GPDB5, where
LOG_ERRORS is used instead. That's why the Informatica 9.x connector
can't load data into GPDB5.

So we now accept ERROR_TABLE: if it is set, we do not create the
error table as before, but set LOG_ERRORS and REUSE_TABLE to true
instead.
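
A minimal sketch of this compatibility mapping over a plain dictionary of options. The function name and the lowercase key names (`error_table`, `log_errors`, `reuse_tables`) are assumptions for illustration; gpload's real config handling is not shown here.

```python
def apply_error_table_compat(options):
    """Return a copy of options where a set ERROR_TABLE enables error logging.

    The error table itself is not created; its presence only switches on
    the two flags that replace it.
    """
    result = dict(options)
    if result.get("error_table"):
        result["log_errors"] = True
        result["reuse_tables"] = True
    return result
```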

d0b6e344

30 Oct, 2017 1 commit

fix bug gpload error count (#3629) · c9977c1d

Authored by Jialun on Oct 30, 2017

The gpload error count is incorrect when more than one segment has format
errors, because the cmdtime differs between segments and only errors with
the newest cmdtime are counted.

So we add startTime, which is used to count all the errors that
occurred during the same gpload operation.
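
The idea can be sketched as counting every error row at or after the load's start time, rather than only rows sharing the newest cmdtime. The row layout, the function name, and the sample timestamps below are made up for illustration.

```python
from datetime import datetime


def count_errors_since(error_rows, start_time):
    """Count error rows whose cmdtime is at or after start_time."""
    return sum(1 for cmdtime, _message in error_rows if cmdtime >= start_time)


# Illustrative data: two segments report errors at slightly different times.
start_time = datetime(2017, 10, 30, 12, 0, 0)
error_rows = [
    (datetime(2017, 10, 30, 11, 59, 0), "error from an earlier load"),
    (datetime(2017, 10, 30, 12, 0, 1), "format error on segment 0"),
    (datetime(2017, 10, 30, 12, 0, 3), "format error on segment 1"),
]
total = count_errors_since(error_rows, start_time)
```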

c9977c1d

07 Aug, 2017 1 commit
- Revert "gpload: log gpfdist outputs by default" · 77918a73
  Authored by Adam Lee on Aug 07, 2017
```
This reverts commit 64d150e9.
```
  77918a73
27 Jul, 2017 1 commit
- Log gpload threads' terminating · 4b71d480
  Authored by Adam Lee on Jul 25, 2017
```
It's useful and important for debugging.
```
  4b71d480
11 Jul, 2017 2 commits

gpload: log gpfdist outputs by default · 64d150e9
Authored by Adam Lee on Jun 23, 2017
```
Which is important for debugging customers' issues. (log level still
matters)
```
64d150e9

Fixed gpload freezed when logging some non-unicode data · 769836cc

Authored by Ming LI on Jul 05, 2017

1. Log the raw string if it can't be decoded as unicode.
2. If a similar exception is raised in log(), continue processing the remaining log with a warning.
3. If another exception is raised in CatThread, the log thread exits without blocking the worker process,
and reports the warning "gpfdist log halt because Log Thread got an exception:".
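
A minimal sketch of the first point, written for Python 3. The function name and the choice of UTF-8 are assumptions; the commit message does not say which codec gpload tries.

```python
def to_log_text(raw):
    """Decode bytes for logging, falling back to the raw representation."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Keep the undecodable bytes visible instead of failing the log call.
        return repr(raw)
```

For example, `to_log_text(b"ok")` gives `"ok"`, while bytes that are not valid UTF-8 come back as their `repr`, so logging can continue.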

769836cc

30 Jun, 2017 1 commit

Fix gpload exiting before cleaning up · a69008c4

Authored by Adam Lee on Jun 22, 2017

gpload.cleanupSql() in the `finally` block may throw a `SystemExit`
exception, which is not caught by `except Exception:`, so gpload exits
before cleaning up other resources such as gpfdist processes.

This commit catches `SystemExit` and calls stop_gpfdists() before
cleanupSql() to avoid this situation.

Signed-off-by: Adam Lee <ali@pivotal.io>
Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
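
The underlying Python behavior is that `SystemExit` derives from `BaseException`, not `Exception`. The sketch below uses hypothetical stand-ins `cleanup_sql` and `stop_gpfdists` rather than the real gpload methods, and records what happens in a list.

```python
events = []


def cleanup_sql():
    # Stand-in for a cleanup step that ends up calling sys.exit().
    raise SystemExit(2)


def stop_gpfdists():
    events.append("gpfdist stopped")


def run():
    try:
        events.append("load finished")
    finally:
        # Stop the helper processes first so they are never skipped.
        stop_gpfdists()
        try:
            cleanup_sql()
        except Exception:
            events.append("caught by except Exception")
        except SystemExit:
            events.append("caught SystemExit")


run()
```

Only the explicit `except SystemExit` branch fires; the `except Exception` branch is never reached.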

a69008c4

05 May, 2017 1 commit
- Fix gpload error when format option escape specify to "\" · 3a983b11
  Authored by Ning Wu on May 02, 2017
  
  3a983b11
18 Apr, 2017 2 commits

Remove tailing spaces in gpload · 2298a165
Authored by Ning Wu on Apr 10, 2017

2298a165

Refactor gpload create_external_table() function · 2c808650

Authored by Ning Wu on Mar 31, 2017

1. Fix an error that occurs if the quote and escape options are not
specified in the CSV format.

2. Add code for the delimiter, escape and quote options to process an
ASCII, unicode-encoded or escaped single character, in order to find
the correct table to reuse when loading external data.

2c808650

03 Mar, 2017 1 commit

Fix gpload reuse table issues (#1693) · 051f456e

Authored by Wu Ning on Mar 03, 2017

* Fix gpload reuse table issues

This commit fixes the issues below:

1. an unexpected error is reported when reuse_table is enabled
2. GPDB delimiter syntax like E'\t' is not recognized
3. column names are unnecessarily case sensitive

Signed-off-by: Adam Lee <ali@pivotal.io>
Signed-off-by: Ning Wu <nwu@pivotal.io>

* delimiter '\t' handled

* Fixed warnings being reported wrongly when a reusable table exists

* remove the trailing space and simplify sql

051f456e

22 Feb, 2017 1 commit

Update the catalog query in gpload · 60b89ebe

Authored by Adam Lee on Feb 22, 2017

Due to changes in the external table "on master" feature.

Signed-off-by: Haozhou Wang <hawang@pivotal.io>

60b89ebe

18 Jan, 2017 1 commit
- Close the db connection at the end. · 242ec7bd
  Authored by laixiong on Jan 06, 2017
  
  242ec7bd
10 Jan, 2017 1 commit

gpload local_hostname var should be list instead of string · d8aaf459

Authored by Jasper Li on Apr 19, 2016

This was fixed in GPDB 4.3 immediately after it was brought up but was
never ported to 5.0. Issue was introduced in
https://github.com/greenplum-db/gpdb/commit/232eb64ad9f93dce8941f7b124a98f0c21c3350b
which has the initial discussion.

d8aaf459