Unverified commit 54dee6ce, authored by Shivram Mani, committed by GitHub

Use PXF server from the new greenplum/pxf repo instead of apache/hawq (#5798)

The PXF client in gpdb uses PXF libraries from the apache/hawq repo. These PXF libraries will continue to be developed in the new greenplum-db/pxf repo, which is in the process of being open sourced in the next few days. The PXF extension and the gpdb PXF client code will remain in the gpdb repo.

The following changes are included in this PR:

Transition from the old PXF namespace org.apache.hawq.pxf to org.greenplum.pxf
(there is a separate PR in the PXF repo, greenplum-db/pxf#5, that addresses the package namespace refactor)

Doc updates to reflect the new PXF repo and the new package namespace
Parent fc008690
@@ -170,7 +170,8 @@ make distclean
### Building GPDB with PXF
PXF is an extension framework for GPDB to enable fast access to external Hadoop datasets.
Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information.
Refer to [PXF extension](gpAux/extensions/pxf/README.md) for more information.
Currently, GPDB is built with PXF by default (`--enable-pxf` is on).
To build GPDB without PXF, simply invoke `./configure` with the additional option `--disable-pxf`.
PXF requires curl, so `--enable-pxf` is not compatible with the `--without-libcurl` option.
@@ -5,15 +5,6 @@ PXF consists of a server-side JVM-based component and a C client component which
This module includes only the PXF C client, and these build instructions build only the client.
Using the `pxf` protocol with external tables, GPDB can query external datasets via the PXF service that runs alongside the GPDB segments.
## Table of Contents
* Usage
* Initialize and start GPDB cluster
* Enable PXF extension
* Run unit tests
* Run regression tests
## Usage
### Enable PXF extension in GPDB build process.
@@ -54,23 +45,25 @@ Additional instructions on building and starting a GPDB cluster can be
found in the top-level [README.md](../../../README.md) ("_Build the
database_" section).
## Create and use PXF external table
If you wish to simply test GPDB and PXF without Hadoop, you can use the Demo Profile.
The Demo profile demonstrates how GPDB can access external data in parallel via the PXF agents. The data served is
static data from the PXF agents themselves.
### Install PXF Server
Please refer to [PXF Development](https://github.com/greenplum-db/pxf/blob/master/README.md) for instructions to set up PXF.
You will need one PXF server agent per segment host.
### Create and use PXF external table
If you wish to simply test-drive the PXF extension without hitting any external data source, you can avoid starting any of the Hadoop components (while installing the PXF server) and simply use the Demo Profile.
The Demo profile demonstrates how GPDB, using its segments, can access static data served by the PXF service(s) in parallel.
```
# CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) \
LOCATION ('pxf://localhost:5888/tmp/dummy1' \
'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' \
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' \
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') \
'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' \
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' \
'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') \
FORMAT 'TEXT' (DELIMITER ',');
```
Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to set up PXF.
Once you install and run PXF server alongside the GPDB segments, you can select data from the demo PXF profile:
```
# SELECT * from pxf_read_test order by a;
@@ -90,11 +83,11 @@ If you wish to use PXF with Hadoop, you will need to integrate with HDFS or Hive
## Run regression tests
### Run regression tests
```
make installcheck
```
This will connect to the running database, and run the regression
tests located in the `regress` directory.
@@ -9,20 +9,20 @@ DROP EXTENSION pxf;
CREATE EXTENSION pxf;
CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver')
'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT)
LOCATION ('pxf:///tmp/pxf?'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a);
CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a);
INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i;
@@ -13,23 +13,23 @@ CREATE EXTENSION pxf;
CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver')
'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT)
LOCATION ('pxf:///tmp/pxf?'
'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor'
'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor'
'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a);
CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a);
INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i;
@@ -52,6 +52,6 @@ the Greenplum hosts.
- The deprecated <codeph>gpcheck</codeph> management utility and its replacement <codeph>gpsupport</codeph> are only supported with Pivotal Greenplum Database.
- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and Apache HAWQ (incubating) repositories.
- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and PXF repositories.
- Suggestions to contact Pivotal Technical Support in this documentation are intended only for Pivotal Greenplum Database customers.
@@ -21,7 +21,7 @@ specific language governing permissions and limitations
under the License.
-->
The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from Apache HAWQ (incubating).
The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. PXF has its roots in the Apache HAWQ project.
- **[PXF Architecture](intro_pxf.html)**
@@ -111,14 +111,14 @@ Before building the *Demo* connector, ensure that you have:
Perform the following procedure to create a local copy of the *Demo* connector source code, update package names, configure compile-time dependencies, and use `gradle` to build the connector.
1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Apache HAWQ (incubating) `incubator-hawq` `github` repository. For example:
1. Download the PXF *Demo* connector source code from the Greenplum PXF `github` repository. For example:
``` shell
user@devsystem$ cd $PXFDEV_BASE
user@devsystem$ git clone https://github.com/apache/incubator-hawq.git
user@devsystem$ git clone https://github.com/greenplum-db/pxf.git
```
The `clone` operation creates a directory named `incubator-hawq/` in your current working directory.
The `clone` operation creates a directory named `pxf/` in your current working directory.
2. Create a project directory for your copy of the source code and navigate to that directory. For example:
@@ -134,24 +134,18 @@ Perform the following procedure to create a local copy of the *Demo* connector s
user@devsystem$ cp $PXFDEV_BASE/pxf-api-<version>.jar libs/
```
4. The source code for the PXF *Demo* connector is located in the `incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example:
4. The source code for the PXF *Demo* connector is located in the `pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example:
``` shell
user@devsystem$ mkdir -p src/main/java/org/greenplum/pxf/example/demo
user@devsystem$ cd src/main/java/org/greenplum/pxf/example/demo
user@devsystem$ cp $PXFDEV_BASE/incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples/* .
user@devsystem$ cp $PXFDEV_BASE/pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples/* .
```
5. The original PXF *Demo* connector resides in the `org.apache.hawq.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example:
5. The original PXF *Demo* connector resides in the `org.greenplum.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example:
``` shell
user@devsystem$ sed -i.bak s/"org.apache.hawq.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g *.java
```
The `sed` command above creates a backup of each file. Remove the backup files. For example:
``` shell
user@devsystem$ rm *.bak
user@devsystem$ find . -name '*.java' -exec sed -i '' s/"org.greenplum.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g {} +
```
6. Initialize a `gradle` Java library project for your *Demo* connector. For example:
@@ -200,7 +194,7 @@ Perform the following procedure to create a local copy of the *Demo* connector s
testCompile 'junit:junit:4.12'
<b>compile 'commons-logging:commons-logging:1.1.3'
compile 'org.apache.hawq.pxf.api:pxf-api:3.3.0.0'</b>
compile 'org.greenplum.pxf.api:pxf-api:4.0.0'</b>
}
</pre>
@@ -28,7 +28,7 @@ The profile \<name\> provides a simple mapping to the \<plugins\> classes. The G
### <a id="select_classes"></a>Specifying the Plug-in Class Names
The profile \<plugins\> identify the fully-qualified names of the Java classes that PXF will use to split (\<fragmenter\>), read and/or write (\<accessor\>), and deserialize/serialize (\<resolver\>) the external data.
When you define a profile that supports a read operation from an external data store, you must provide one each of \<fragmenter\>, \<accessor\>, and \<resolver\> plug-in class names. You must provide both an \<accessor\> and a \<resolver\> plug-in class name for a profile that supports a write operation to an external data store.
@@ -43,9 +43,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For
delimited single line records from plain text files on HDFS
</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
@@ -56,9 +56,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For
It is not splittable (non parallel) and slower than HdfsTextSimple.
</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
```
@@ -154,7 +154,7 @@ Perform the following procedure to define and register a read profile and a writ
``` shell
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```
8. Verify that you correctly deployed the *Demo* connector profiles by creating and accessing Greenplum Database external tables:
1. Connect to a database in which you created the PXF extension as the `gpadmin` user. For example, to connect to a database named `pxf_exampledb`:
@@ -162,7 +162,7 @@ Perform the following procedure to define and register a read profile and a writ
``` shell
gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin
```
2. Create a readable Greenplum external table specifying the `DemoReadLocalFS` profile name. For example:
``` sql
@@ -171,12 +171,12 @@ Perform the following procedure to define and register a read profile and a writ
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE
```
3. Query the `demo_tbl_read_wp` table:
``` sql
pxf_exampledb=# SELECT * from demo_tbl_read_wp;
       a        |   b    |   c
----------------+--------+--------
fragment2 row1 | value1 | value2
fragment2 row2 | value1 | value2
@@ -195,7 +195,7 @@ Perform the following procedure to define and register a read profile and a writ
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE
```
5. Write some text data into the `demo_tbl_write_wp` table. For example:
@@ -3,7 +3,7 @@ title: Using the PXF Java SDK
---
The Greenplum Platform Extension Framework (PXF) SDK provides the Java classes and interfaces that you use to create connectors to new external data sources, data formats, and data access APIs from Greenplum Database. You can extend PXF functionality *without changing Greenplum Database* when you use the PXF Java SDK.
PXF in Greenplum Database is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from the open source Apache HAWQ (incubating) project. You can contribute to PXF development via the [Greenplum Database](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) and the [Apache HAWQ (incubating)](https://github.com/apache/incubator-hawq/tree/master/pxf) open source `github` repositories.
PXF in Greenplum Database has its roots in the Apache HAWQ project. You can contribute to Greenplum PXF development via the open source `github` repositories for the [PXF Server Plugins](https://github.com/greenplum-db/pxf) and the [Greenplum PXF Extension/Client](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf).
## <a id="prereqs"></a>Topic Overview
@@ -29,19 +29,17 @@ The PXF API exposes the following interfaces:
| `WriteAccessor` | Writes `OneRow` records to the external data source. |
Refer to the [PXF API JavaDocs](http://hawq.incubator.apache.org/docs/pxf/javadoc/) for detailed information about the classes and interfaces exposed by the API.
## <a id="api_info"></a>General PXF API Information
### <a id="pkg_name"></a>Package Name
The PXF API base package name is `org.apache.hawq.pxf.api`. All PXF API classes and interfaces reside in this package.
The PXF API base package name is `org.greenplum.pxf.api`. All PXF API classes and interfaces reside in this package.
### <a id="jar_file"></a>JAR File
You need the PXF API JAR file to develop with the PXF SDK. This file is named `pxf-api-<version>.jar`, where `<version>` is a dot-separated version number. For example:
``` shell
pxf-api-3.3.0.0.jar
pxf-api-4.0.0.jar
```
*PXF JAR files are not currently available from a remote repository.* You can obtain the PXF API JAR file from your Greenplum Database installation here:
@@ -45,7 +45,7 @@ PXF utilizes `log4j` for service-level logging. PXF-service-related log messages
PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`:
``` shell
#log4j.logger.org.apache.hawq.pxf=DEBUG
#log4j.logger.org.greenplum.pxf=DEBUG
```
Copy the `pxf-log4j.properties` file to each segment host and restart the PXF service on *each* Greenplum Database segment host. For example:
@@ -74,7 +74,7 @@ dbname=> SELECT * FROM hdfstest;
Examine/collect the log messages from `pxf-service.log`.
**Note**: `DEBUG` logging is quite verbose and has a performance impact. Remember to turn off PXF service `DEBUG` logging after you have collected the desired information.
### <a id="pxfdblogmsg"></a>Client-Level Logging
@@ -34,7 +34,7 @@ You must explicitly enable the PXF extension in each Greenplum Database in which
**Note**: You must have Greenplum Database administrator privileges to create an extension.
### <a id="enable-pxf-steps"></a>Enable Procedure
Perform the following procedure for **_each_** database in which you want to use PXF:
@@ -50,11 +50,11 @@ Perform the following procedure for **_each_** database in which you want to use
``` sql
database-name=# CREATE EXTENSION pxf;
```
Creating the `pxf` extension registers the `pxf` protocol and the call handlers required for PXF to access external data.
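If you want to confirm the registration, one option (our suggestion, not a step from the original procedure) is to query the standard `pg_extension` catalog:

``` sql
-- Returns one row for the pxf extension if CREATE EXTENSION succeeded
database-name=# SELECT extname, extversion FROM pg_extension WHERE extname = 'pxf';
```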
### <a id="disable-pxf-steps"></a>Disable Procedure
When you no longer want to use PXF on a specific database, you must explicitly disable the PXF extension for that database:
1. Connect to the database as the `gpadmin` user:
@@ -62,30 +62,30 @@ When you no longer want to use PXF on a specific database, you must explicitly d
``` shell
gpadmin@gpmaster$ psql -d <database-name> -U gpadmin
```
2. Drop the PXF extension:
``` sql
database-name=# DROP EXTENSION pxf;
```
The `DROP` command fails if there are any currently defined external tables using the `pxf` protocol. Add the `CASCADE` option if you choose to forcibly remove these external tables.
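For example, a minimal sketch of the forced removal (standard `DROP EXTENSION ... CASCADE` syntax):

``` sql
-- Also drops any external tables that were created with the pxf protocol
database-name=# DROP EXTENSION pxf CASCADE;
```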
## <a id="access_pxf"></a>Granting Access to PXF
To read external data with PXF, you create an external table with the `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `SELECT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access.
To grant a specific role access to the `pxf` protocol, use the `GRANT` command. For example, to grant the role named `bill` read access to data referenced by an external table created with the `pxf` protocol:
``` sql
GRANT SELECT ON PROTOCOL pxf TO bill;
```
To write data to an external data store with PXF, you create an external table with the `CREATE WRITABLE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `INSERT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. For example:
``` sql
GRANT INSERT ON PROTOCOL pxf TO bill;
```
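To illustrate the write path that this permission enables, here is a hedged sketch; the column list and Demo connector options are borrowed from this repository's regression examples, and the table name is ours:

``` sql
-- Writable external table backed by the Demo connector classes
CREATE WRITABLE EXTERNAL TABLE pxf_write_sketch (a int, b TEXT)
LOCATION ('pxf:///tmp/pxf?'
          '&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor'
          '&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a);

-- With INSERT granted on the pxf protocol, the role can now write rows
INSERT INTO pxf_write_sketch VALUES (1, 'data_1');
```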
## <a id="filter-pushdown"></a>Configuring Filter Pushdown
@@ -99,7 +99,7 @@ SHOW gp_external_enable_filter_pushdown;
SET gp_external_enable_filter_pushdown TO 'on';
```
**Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter pushdown for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database).
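For instance, a sketch of a query whose constraint could be pushed down; the external table `sales_ext` is hypothetical, and whether the predicate is actually pushed depends on the connector:

``` sql
-- With pushdown enabled, the WHERE constraint may be evaluated by PXF at
-- the external source instead of after transfer to Greenplum Database
SET gp_external_enable_filter_pushdown TO 'on';
SELECT * FROM sales_ext WHERE amount > 100;
```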
PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown:
@@ -157,13 +157,13 @@ A PXF profile definition includes the name of the profile, a description, and th
``` xml
<profile>
<name>HdfsTextSimple</name>
<description>This profile is suitable for using when reading
delimited single line records from plain text files on HDFS
</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
```
@@ -175,7 +175,7 @@ A PXF profile definition includes the name of the profile, a description, and th
You use PXF to access data stored on external systems. Depending upon the external data store, this access may require that you install and/or configure additional components or services for the external data store. For example, to use PXF to access a file stored in HDFS, you must install a Hadoop client on each Greenplum Database segment host.
PXF depends on JAR files and other configuration information provided by these additional components. The `$GPHOME/pxf/conf/pxf-private.classpath` and `$GPHOME/pxf/conf/pxf-public.classpath` configuration files identify PXF JAR dependencies. In most cases, PXF manages the `pxf-private.classpath` file, adding entries as necessary based on your Hadoop distribution and optional Hive and HBase client installations.
Should you need to add additional JAR dependencies for PXF, for example a JDBC driver JAR file, you must add them to the `pxf-public.classpath` file on each segment host, and then restart PXF on each host.
@@ -193,11 +193,11 @@ FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
The `LOCATION` clause in a `CREATE EXTERNAL TABLE` statement specifying the `pxf` protocol is a URI that identifies the path to, or other information describing, the location of the external data. For example, if the external data store is HDFS, the \<path-to-data\> would identify the absolute path to a specific HDFS file. If the external data store is Hive, \<path-to-data\> would identify a schema-qualified Hive table name.
Use the query portion of the URI, introduced by the question mark (?), to identify the PXF profile name.
You will provide profile-specific information using the optional &\<custom-option\>=\<value\> component of the `LOCATION` string and formatting information via the \<formatting-properties\> component of the string. The custom options and formatting properties supported by a specific profile are identified later in usage documentation.
Greenplum Database passes the parameters in the `LOCATION` string as headers to the PXF Java service.
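Putting these pieces together, a hedged sketch of a complete external table definition; the HDFS path is hypothetical, and `HdfsTextSimple` is the profile shown earlier on this page:

``` sql
-- PROFILE selects the HdfsTextSimple plug-in classes; the path names an HDFS file
CREATE EXTERNAL TABLE pxf_hdfs_sketch (name TEXT, amount NUMERIC)
LOCATION ('pxf://data/example/expenses.csv?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```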
<caption><span class="tablecap">Table 1. Create External Table Parameter Values and Descriptions</span></caption>