diff --git a/README.md b/README.md index b2f8853f24925b4f7eb9951e5bc67345fce0ec85..33d9bf8d39cab2df1802a9371bb6ae003a90b23a 100644 --- a/README.md +++ b/README.md @@ -170,7 +170,8 @@ make distclean ### Building GPDB with PXF PXF is an extension framework for GPDB to enable fast access to external hadoop datasets. -Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information. +Refer to [PXF extension](gpAux/extensions/pxf/README.md) for more information. + Currently, GPDB is built with PXF by default (--enable-pxf is on). In order to build GPDB without pxf, simply invoke `./configure` with additional option `--disable-pxf`. PXF requires curl, so `--enable-pxf` is not compatible with the `--without-libcurl` option. diff --git a/gpAux/extensions/pxf/README.md b/gpAux/extensions/pxf/README.md index d1f1c3ca253d50ac8d72a841feb3bec477e8c125..e4be2207946780fe1f2a633638c111b851fea01a 100644 --- a/gpAux/extensions/pxf/README.md +++ b/gpAux/extensions/pxf/README.md @@ -5,15 +5,6 @@ PXF consists of a server side JVM based component and a C client component which This module only includes the PXF C client and the build instructions only builds the client. Using the 'pxf' protocol with external table, GPDB can query external datasets via PXF service that runs alongside GPDB segments. -## Table of Contents - -* Usage -* Initialize and start GPDB cluster -* Enable PXF extension -* Run unit tests -* Run regression tests -======= - ## Usage ### Enable PXF extension in GPDB build process. @@ -54,23 +45,25 @@ Additional instructions on building and starting a GPDB cluster can be found in the top-level [README.md](../../../README.md) ("_Build the database_" section). -## Create and use PXF external table -If you wish to simply test GPDB and PXF without hadoop, you can use the Demo Profile. -The Demo profile demonstrates how GPDB can parallely the external data via the PXF agents. 
The data served is -static data from the PXF agents themselves. +### Install PXF Server +Please refer to [PXF Development](https://github.com/greenplum-db/pxf/blob/master/README.md) for instructions to set up PXF. +You will need one PXF server agent per segment host. + +### Create and use PXF external table +If you simply wish to test drive the PXF extension without touching an external data source, you can skip starting the Hadoop components (while installing the PXF server) and use the Demo profile. + +The Demo profile demonstrates how GPDB, using its segments, can access static data served by the PXF service(s) in parallel. ``` # CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) \ LOCATION ('pxf://localhost:5888/tmp/dummy1' \ -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' \ -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' \ -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') \ +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' \ +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' \ +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') \ FORMAT 'TEXT' (DELIMITER ','); ``` -Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to setup PXF. -Once you install and run PXF server alongside the GPDB segments, you can select data from the demo PXF profile: ``` # SELECT * from pxf_read_test order by a; @@ -90,11 +83,11 @@ If you wish to use PXF with Hadoop, you will need to integrate with Hdfs or Hive -## Run regression tests +### Run regression tests ``` make installcheck ``` This will connect to the running database, and run the regression -tests located in the `regress` directory. +tests located in the `regress` directory. 
\ No newline at end of file diff --git a/gpAux/extensions/pxf/expected/setup.out b/gpAux/extensions/pxf/expected/setup.out index bee92950eb49273aff217bdf909a181f7f6d9380..8d81c3468eab5e9f1f50d8ef25d90ccee9a44b3c 100644 --- a/gpAux/extensions/pxf/expected/setup.out +++ b/gpAux/extensions/pxf/expected/setup.out @@ -9,20 +9,20 @@ DROP EXTENSION pxf; CREATE EXTENSION pxf; CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT) LOCATION ('pxf:///tmp/pxf?' 
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a); CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a); INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; diff --git a/gpAux/extensions/pxf/sql/setup.sql b/gpAux/extensions/pxf/sql/setup.sql index 3b75c616ee2788d9e47f92ef5aa89db244594369..426cef659a0fcffedeeecd42da05d6e1e6c714b3 100644 --- a/gpAux/extensions/pxf/sql/setup.sql +++ b/gpAux/extensions/pxf/sql/setup.sql @@ -13,23 +13,23 @@ CREATE EXTENSION pxf; CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT) LOCATION ('pxf:///tmp/pxf?' 
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a); CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a); -INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; \ No newline at end of file +INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; diff --git a/gpdb-doc/markdown/common/gpdb-features.html.md.erb b/gpdb-doc/markdown/common/gpdb-features.html.md.erb index 46e8248dd892c86153630c823fdc40294ef5c00b..92b7b2257ce49b4ba4cf051a513b033ca133e7e5 100644 --- a/gpdb-doc/markdown/common/gpdb-features.html.md.erb +++ b/gpdb-doc/markdown/common/gpdb-features.html.md.erb @@ -52,6 +52,6 @@ the Greenplum hosts. - The deprecated gpcheck management utility and its replacement gpsupport are only supported with Pivotal Greenplum Database. -- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and Apache HAWQ (incubating) repositories. +- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and PXF repositories. - Suggestions to contact Pivotal Technical Support in this documentation are intended only for Pivotal Greenplum Database customers. 
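The bulk of the changes above are a mechanical namespace rename (`org.apache.hawq.pxf` to `org.greenplum.pxf`) across SQL fixtures and expected output. A quick way to confirm no stale references survive such a rename is to grep the tree for the old package prefix; the sketch below is illustrative (the `check_stale_refs` helper and the throwaway fixture are not part of this change), and it runs against a temporary directory rather than a real checkout:

```shell
#!/bin/sh
# check_stale_refs DIR: list any files under DIR that still mention the
# old org.apache.hawq.pxf namespace; succeeds (exit 0) when none remain.
check_stale_refs() {
    ! grep -rl 'org\.apache\.hawq\.pxf' "$1"
}

# Demonstrate on a throwaway fixture instead of a real gpAux checkout.
dir=$(mktemp -d)
printf "LOCATION ('pxf://tmp/dummy1'\n'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter')\n" > "$dir/setup.sql"
if check_stale_refs "$dir" >/dev/null; then
    echo "rename complete"   # prints "rename complete"
fi
rm -rf "$dir"
```

Pointing the helper at `gpAux/extensions/pxf` in an actual checkout would flag any fixture this diff missed.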
diff --git a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb index e8e942d75b9c5a1d1247c98af8afb13bf45aebb4..87e046a3f6535e93805c85a08d7afaaafff6f254 100644 --- a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb @@ -21,7 +21,7 @@ specific language governing permissions and limitations under the License. --> -The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from Apache HAWQ (incubating). +The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. PXF has its roots in the Apache HAWQ project. - **[PXF Architecture](intro_pxf.html)** diff --git a/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb index 4e688af1c338b78d0a3f2ed547d982f11712ca6b..f14f1d1750ce4cb0daf79208ad585581bfefd948 100644 --- a/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb @@ -111,14 +111,14 @@ Before building the *Demo* connector, ensure that you have: Perform the following procedure to create a local copy of the *Demo* connector source code, update package names, configure compile-time dependencies, and use `gradle` to build the connector. -1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Apache HAWQ (incubating) `incubator-hawq` `github` repository. For example: +1. Download the PXF *Demo* connector source code from the Greenplum PXF git repo. 
You can obtain the PXF source code from the Greenplum PXF `github` repository. For example: ``` shell user@devsystem$ cd $PXFDEV_BASE - user@devsystem$ git clone https://github.com/apache/incubator-hawq.git + user@devsystem$ git clone https://github.com/greenplum-db/pxf.git ``` - The `clone` operation creates a directory named `incubator-hawq/` in your current working directory. + The `clone` operation creates a directory named `pxf/` in your current working directory. 2. Create a project directory for your copy of the source code and navigate to that directory. For example: @@ -134,24 +134,18 @@ Perform the following procedure to create a local copy of the *Demo* connector s user@devsystem$ cp $PXFDEV_BASE/pxf-api-.jar libs/ ``` -4. The source code for the PXF *Demo* connector is located in the `incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example: +4. The source code for the PXF *Demo* connector is located in the `pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example: ``` shell user@devsystem$ mkdir -p src/main/java/org/greenplum/pxf/example/demo user@devsystem$ cd src/main/java/org/greenplum/pxf/example/demo - user@devsystem$ cp $PXFDEV_BASE/incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples/* . + user@devsystem$ cp $PXFDEV_BASE/pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples/* . ``` -5. The original PXF *Demo* connector resides in the `org.apache.hawq.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example: +5. The original PXF *Demo* connector resides in the `org.greenplum.pxf.api.examples` package. 
Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example: ``` shell - user@devsystem$ sed -i.bak s/"org.apache.hawq.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g *.java - ``` - The `sed` command above creates a backup of each file. Remove the backup files. For example: - ``` shell - user@devsystem$ rm *.bak + user@devsystem$ find . -name '*.java' -exec sed -i '' s/"org.greenplum.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g {} + ``` 6. Initialize a `gradle` Java library project for your *Demo* connector. For example: @@ -200,7 +194,7 @@ Perform the following procedure to create a local copy of the *Demo* connector s testCompile 'junit:junit:4.12' compile 'commons-logging:commons-logging:1.1.3' - compile 'org.apache.hawq.pxf.api:pxf-api:3.3.0.0' + compile 'org.greenplum.pxf.api:pxf-api:4.0.0' } diff --git a/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb index 6f00da4700697884f243c9afd1544c3e96c9b375..95cc130ee84337a704357d5b039bdbd762d39d16 100644 --- a/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb @@ -28,7 +28,7 @@ The profile \ provides a simple mapping to the \ classes. The G ### Specifying the Plug-in Class Names -The profile \ identify the fully-qualified names of the Java classes that PXF will use to split (\), read and/or write (\), and deserialize/serialize (\) the external data. +The profile \ identify the fully-qualified names of the Java classes that PXF will use to split (\), read and/or write (\), and deserialize/serialize (\) the external data. When you define a profile that supports a read operation from an external data store, you must provide one each of \, \, and \ plug-in class names. 
You must provide both an \ and a \ plug-in class name for a profile that supports a write operation to an external data store. @@ -43,9 +43,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For delimited single line records from plain text files on HDFS - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.LineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver @@ -56,9 +56,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For It is not splittable (non parallel) and slower than HdfsTextSimple. - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver ``` @@ -154,7 +154,7 @@ Perform the following procedure to define and register a read profile and a writ ``` shell gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart" ``` - + 8. Verify that you correctly deployed the *Demo* connector profiles by creating and accessing Greenplum Database external tables: 1. Connect to a database in which you created the PXF extension as the `gpadmin` user. For example, to connect to a database named `pxf_exampledb`: @@ -162,7 +162,7 @@ Perform the following procedure to define and register a read profile and a writ ``` shell gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin ``` - + 2. Create a readable Greenplum external table specifying the `DemoReadLocalFS` profile name. 
For example: ``` sql @@ -171,12 +171,12 @@ Perform the following procedure to define and register a read profile and a writ FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE ``` - + 3. Query the `demo_tbl_read_wp` table: ``` sql pxf_exampledb=# SELECT * from demo_tbl_read_wp; - a | b | c + a | b | c ----------------+--------+-------- fragment2 row1 | value1 | value2 fragment2 row2 | value1 | value2 @@ -195,7 +195,7 @@ Perform the following procedure to define and register a read profile and a writ FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE ``` - + 5. Write some text data into the `demo_tbl_write_wp` table. For example: ``` sql diff --git a/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb b/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb index 0670b125b1026beb44baec5ba8f48d0cc07a1188..4e180935c528158fe508a22f0502ae7a17677e82 100644 --- a/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb @@ -3,7 +3,7 @@ title: Using the PXF Java SDK --- The Greenplum Platform Extension Framework (PXF) SDK provides the Java classes and interfaces that you use to create connectors to new external data sources, data formats, and data access APIs from Greenplum Database. You can extend PXF functionality *without changing Greenplum Database* when you use the PXF Java SDK. -PXF in Greenplum Database is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from the open source Apache HAWQ (incubating) project. You can contribute to PXF development via the [Greenplum Database](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) and the [Apache HAWQ (incubating)](https://github.com/apache/incubator-hawq/tree/master/pxf) open source `github` repositories. +PXF in Greenplum Database has its roots in the Apache HAWQ project. 
You can contribute to Greenplum PXF development via the open source github repositories for [PXF Server Plugins](https://github.com/greenplum-db/pxf) and the [Greenplum PXF Extension/Client](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf). ## Topic Overview diff --git a/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb b/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb index 113ce65bc46e850d6ba9523238141317bb1448a1..402e60fa7a652e47ddebfba6efa1e7e68692a127 100644 --- a/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb @@ -29,19 +29,17 @@ The PXF API exposes the following interfaces: | `WriteAccessor` | Writes `OneRow` records to the external data source. | -Refer to the [PXF API JavaDocs](http://hawq.incubator.apache.org/docs/pxf/javadoc/) for detailed information about the classes and interfaces exposed by the API. - ## General PXF API Information ### Package Name -The PXF API base package name is `org.apache.hawq.pxf.api`. All PXF API classes and interfaces reside in this package. +The PXF API base package name is `org.greenplum.pxf.api`. All PXF API classes and interfaces reside in this package. ### JAR File You need the PXF API JAR file to develop with the PXF SDK. This file is named `pxf-api-.jar`, where `` is a dot-separated 4 digit version number. For example: ``` shell -pxf-api-3.3.0.0.jar +pxf-api-4.0.0.jar ``` *PXF JAR files are not currently available from a remote repository.* You can obtain the PXF API JAR file from your Greenplum Database installation here: diff --git a/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb b/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb index 2af377833da47ca3b899e452ab409f3fc7101d82..08b1e21e2bb88870499ea2383bbb398edb5e7d2c 100644 --- a/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb @@ -45,7 +45,7 @@ PXF utilizes `log4j` for service-level logging. 
PXF-service-related log messages PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`: ``` shell -#log4j.logger.org.apache.hawq.pxf=DEBUG +#log4j.logger.org.greenplum.pxf=DEBUG ``` Copy the `pxf-log4j.properties` file to each segment host and restart the PXF service on *each* Greenplum Database segment host. For example: @@ -74,7 +74,7 @@ dbname=> SELECT * FROM hdfstest; Examine/collect the log messages from `pxf-service.log`. **Note**: `DEBUG` logging is quite verbose and has a performance impact. Remember to turn off PXF service `DEBUG` logging after you have collected the desired information. - + ### Client-Level Logging diff --git a/gpdb-doc/markdown/pxf/using_pxf.html.md.erb b/gpdb-doc/markdown/pxf/using_pxf.html.md.erb index 60bda934e7ebf57622d404962df73997b814ac25..fe533240e27b3a3f2bebda46be8ccf6bde56e77e 100644 --- a/gpdb-doc/markdown/pxf/using_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/using_pxf.html.md.erb @@ -34,7 +34,7 @@ You must explicitly enable the PXF extension in each Greenplum Database in which **Note**: You must have Greenplum Database administrator privileges to create an extension. - + ### Enable Procedure Perform the following procedure for **_each_** database in which you want to use PXF: @@ -50,11 +50,11 @@ Perform the following procedure for **_each_** database in which you want to use ``` sql database-name=# CREATE EXTENSION pxf; ``` - + Creating the `pxf` extension registers the `pxf` protocol and the call handlers required for PXF to access external data. ### Disable Procedure - + When you no longer want to use PXF on a specific database, you must explicitly disable the PXF extension for that database: 1. Connect to the database as the `gpadmin` user: @@ -62,30 +62,30 @@ When you no longer want to use PXF on a specific database, you must explicitly d ``` shell gpadmin@gpmaster$ psql -d -U gpadmin ``` - + 2. 
Drop the PXF extension: ``` sql database-name=# DROP EXTENSION pxf; ``` - + The `DROP` command fails if there are any currently defined external tables using the `pxf` protocol. Add the `CASCADE` option if you choose to forcibly remove these external tables. ## Granting Access to PXF -To read external data with PXF, you create an external table with the `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `SELECT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. +To read external data with PXF, you create an external table with the `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `SELECT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. To grant a specific role access to the `pxf` protocol, use the `GRANT` command. For example, to grant the role named `bill` read access to data referenced by an external table created with the `pxf` protocol: ``` sql -GRANT SELECT ON PROTOCOL pxf TO bill; +GRANT SELECT ON PROTOCOL pxf TO bill; ``` To write data to an external data store with PXF, you create an external table with the `CREATE WRITABLE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `INSERT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. For example: ``` sql -GRANT INSERT ON PROTOCOL pxf TO bill; +GRANT INSERT ON PROTOCOL pxf TO bill; ``` ## Configuring Filter Pushdown @@ -99,7 +99,7 @@ SHOW gp_external_enable_filter_pushdown; SET gp_external_enable_filter_pushdown TO 'on'; ``` -**Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. 
If a query accesses a data source that does not support filter push-down for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database). +**Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter pushdown for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database). PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown: @@ -157,13 +157,13 @@ A PXF profile definition includes the name of the profile, a description, and th ``` xml HdfsTextSimple - This profile is suitable for using when reading + This profile is suitable for using when reading delimited single line records from plain text files on HDFS - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.LineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver ``` @@ -175,7 +175,7 @@ A PXF profile definition includes the name of the profile, a description, and th You use PXF to access data stored on external systems. Depending upon the external data store, this access may require that you install and/or configure additional components or services for the external data store. For example, to use PXF to access a file stored in HDFS, you must install a Hadoop client on each Greenplum Database segment host. -PXF depends on JAR files and other configuration information provided by these additional components. 
The `$GPHOME/pxf/conf/pxf-private.classpath` and `$GPHOME/pxf/conf/pxf-public.classpath` configuration files identify PXF JAR dependencies. In most cases, PXF manages the `pxf-private.classpath` file, adding entries as necessary based on your Hadoop distribution and optional Hive and HBase client installations. +PXF depends on JAR files and other configuration information provided by these additional components. The `$GPHOME/pxf/conf/pxf-private.classpath` and `$GPHOME/pxf/conf/pxf-public.classpath` configuration files identify PXF JAR dependencies. In most cases, PXF manages the `pxf-private.classpath` file, adding entries as necessary based on your Hadoop distribution and optional Hive and HBase client installations. Should you need to add additional JAR dependencies for PXF, for example a JDBC driver JAR file, you must add them to the `pxf-public.classpath` file on each segment host, and then restart PXF on each host. @@ -193,11 +193,11 @@ FORMAT '[TEXT|CSV|CUSTOM]' (); The `LOCATION` clause in a `CREATE EXTERNAL TABLE` statement specifying the `pxf` protocol is a URI that identifies the path to, or other information describing, the location of the external data. For example, if the external data store is HDFS, the \ would identify the absolute path to a specific HDFS file. If the external data store is Hive, \ would identify a schema-qualified Hive table name. -Use the query portion of the URI, introduced by the question mark (?), to identify the PXF profile name. +Use the query portion of the URI, introduced by the question mark (?), to identify the PXF profile name. -You will provide profile-specific information using the optional &\=\ component of the `LOCATION` string and formatting information via the \ component of the string. The custom options and formatting properties supported by a specific profile are identified later in usage documentation. 
+You will provide profile-specific information using the optional &\=\ component of the `LOCATION` string and formatting information via the \ component of the string. The custom options and formatting properties supported by a specific profile are identified later in usage documentation. -Greenplum Database passes the parameters in the `LOCATION` string as headers to the PXF Java service. +Greenplum Database passes the parameters in the `LOCATION` string as headers to the PXF Java service. Table 1. Create External Table Parameter Values and Descriptions
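The `using_pxf` page above describes the anatomy of a `pxf://` `LOCATION` string: the path portion names the data, and the query portion (after `?`) carries the profile and custom options that Greenplum Database forwards as headers to the PXF Java service. A small, GPDB-independent shell sketch of that split follows; the `COMPRESSION_CODEC` option is just an illustrative key, not something this diff defines:

```shell
#!/bin/sh
# Illustrative only (not GPDB code): split a pxf:// LOCATION string into
# its path and query parts, mirroring the URI layout the docs describe.
location="pxf://tmp/dummy1?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec"

rest=${location#pxf://}          # drop the protocol prefix
path=${rest%%\?*}                # everything before '?' locates the data
query=${rest#*\?}                # everything after '?' holds profile/options

echo "path:  $path"
# Each &-separated pair is one option; GPDB passes these as request headers.
printf '%s\n' "$query" | tr '&' '\n' | while IFS='=' read -r key value; do
    echo "option: $key = $value"
done
```

Running this prints `path:  tmp/dummy1` followed by one `option:` line per key/value pair, which matches how the Demo-profile tables earlier in this diff pack `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` into the query portion.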