Unverified commit 54dee6ce, authored by: S Shivram Mani, committed by: GitHub

Use PXF server from the new greenplum/pxf repo instead of apache/hawq (#5798)

The PXF client in gpdb uses PXF libraries from the apache/hawq repo. These libraries will continue to be developed in the new greenplum-db/pxf repo, which is in the process of being open sourced in the next few days. The PXF extension and the gpdb PXF client code will remain in the gpdb repo.

The following changes are included in this PR:

* Transition from the old PXF namespace org.apache.hawq.pxf to org.greenplum.pxf (a separate PR in the PXF repo, greenplum-db/pxf#5, addresses the package namespace refactor)
* Doc updates to reflect the new PXF repo and the new package namespace
Parent fc008690
@@ -170,7 +170,8 @@ make distclean
### Building GPDB with PXF
PXF is an extension framework for GPDB to enable fast access to external Hadoop datasets.
-Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information.
+Refer to [PXF extension](gpAux/extensions/pxf/README.md) for more information.
Currently, GPDB is built with PXF by default (--enable-pxf is on).
To build GPDB without PXF, simply invoke `./configure` with the additional option `--disable-pxf`.
PXF requires curl, so `--enable-pxf` is not compatible with the `--without-libcurl` option.
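For concreteness, a minimal sketch of the two build configurations described above (flag names taken from this README; all other configure options omitted):

``` shell
# Default build: PXF is enabled (--enable-pxf is on), so libcurl is required.
./configure

# Build without PXF; only in this mode is --without-libcurl a valid combination.
./configure --disable-pxf --without-libcurl
```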
@@ -5,15 +5,6 @@ PXF consists of a server side JVM based component and a C client component which
This module only includes the PXF C client, and the build instructions below build only the client.
Using the 'pxf' protocol with an external table, GPDB can query external datasets via the PXF service that runs alongside GPDB segments.
-## Table of Contents
-* Usage
-  * Initialize and start GPDB cluster
-  * Enable PXF extension
-  * Run unit tests
-  * Run regression tests
## Usage
### Enable PXF extension in GPDB build process.
@@ -54,23 +45,25 @@ Additional instructions on building and starting a GPDB cluster can be
found in the top-level [README.md](../../../README.md) ("_Build the
database_" section).
-## Create and use PXF external table
-If you wish to simply test GPDB and PXF without hadoop, you can use the Demo Profile.
-The Demo profile demonstrates how GPDB can parallely the external data via the PXF agents. The data served is
-static data from the PXF agents themselves.
+### Install PXF Server
+Please refer to [PXF Development](https://github.com/greenplum-db/pxf/blob/master/README.md) for instructions to set up PXF.
+You will need one PXF server agent per segment host.
+### Create and use PXF external table
+If you wish to simply test drive the PXF extension without hitting any external data source, you can avoid starting any of the Hadoop components (while installing the PXF server) and simply use the Demo profile.
+The Demo profile demonstrates how GPDB, using its segments, can access static data served by the PXF service(s) in parallel.
```
# CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) \
LOCATION ('pxf://localhost:5888/tmp/dummy1' \
-'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' \
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' \
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') \
+'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' \
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' \
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') \
FORMAT 'TEXT' (DELIMITER ',');
```
-Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to setup PXF.
Once you install and run the PXF server alongside the GPDB segments, you can select data from the demo PXF profile:
```
# SELECT * from pxf_read_test order by a;
@@ -90,7 +83,7 @@ If you wish to use PXF with Hadoop, you will need to integrate with Hdfs or Hive
-## Run regression tests
+### Run regression tests
```
make installcheck
@@ -9,20 +9,20 @@ DROP EXTENSION pxf;
CREATE EXTENSION pxf;
CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
-'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
+'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
-'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver')
+'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT)
LOCATION ('pxf:///tmp/pxf?'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a);
CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a);
INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i;
@@ -13,22 +13,22 @@ CREATE EXTENSION pxf;
CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
-'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
+'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',');
CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT)
LOCATION ('pxf://tmp/dummy1'
-'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver')
+'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter'
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');
CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT)
LOCATION ('pxf:///tmp/pxf?'
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor'
-'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver')
+'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor'
+'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver')
FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a);
CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a);
@@ -52,6 +52,6 @@ the Greenplum hosts.
- The deprecated <codeph>gpcheck</codeph> management utility and its replacement <codeph>gpsupport</codeph> are only supported with Pivotal Greenplum Database.
-- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and Apache HAWQ (incubating) repositories.
+- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and PXF repositories.
- Suggestions to contact Pivotal Technical Support in this documentation are intended only for Pivotal Greenplum Database customers.
@@ -21,7 +21,7 @@ specific language governing permissions and limitations
under the License.
-->
-The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from Apache HAWQ (incubating).
+The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. PXF has its roots in the Apache HAWQ project.
- **[PXF Architecture](intro_pxf.html)**
@@ -111,14 +111,14 @@ Before building the *Demo* connector, ensure that you have:
Perform the following procedure to create a local copy of the *Demo* connector source code, update package names, configure compile-time dependencies, and use `gradle` to build the connector.
-1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Apache HAWQ (incubating) `incubator-hawq` `github` repository. For example:
+1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Greenplum PXF `github` repository. For example:
``` shell
user@devsystem$ cd $PXFDEV_BASE
-user@devsystem$ git clone https://github.com/apache/incubator-hawq.git
+user@devsystem$ git clone https://github.com/greenplum-db/pxf.git
```
-The `clone` operation creates a directory named `incubator-hawq/` in your current working directory.
+The `clone` operation creates a directory named `pxf/` in your current working directory.
2. Create a project directory for your copy of the source code and navigate to that directory. For example:
@@ -134,24 +134,18 @@ Perform the following procedure to create a local copy of the *Demo* connector s
user@devsystem$ cp $PXFDEV_BASE/pxf-api-<version>.jar libs/
```
-4. The source code for the PXF *Demo* connector is located in the `incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example:
+4. The source code for the PXF *Demo* connector is located in the `pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example:
``` shell
user@devsystem$ mkdir -p src/main/java/org/greenplum/pxf/example/demo
user@devsystem$ cd src/main/java/org/greenplum/pxf/example/demo
-user@devsystem$ cp $PXFDEV_BASE/incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples/* .
+user@devsystem$ cp $PXFDEV_BASE/pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples/* .
```
-5. The original PXF *Demo* connector resides in the `org.apache.hawq.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example:
+5. The original PXF *Demo* connector resides in the `org.greenplum.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example:
``` shell
-user@devsystem$ sed -i.bak s/"org.apache.hawq.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g *.java
-```
-The `sed` command above creates a backup of each file. Remove the backup files. For example:
-``` shell
-user@devsystem$ rm *.bak
+user@devsystem$ find . -name '*.java' -exec sed -i '' s/"org.greenplum.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g {} +
```
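As a quick sanity check after the rename (an illustrative command, not part of the original procedure), you can confirm that no source file still references the old package:

``` shell
# Should print no file names once every .java file uses the new package.
grep -rl "org.greenplum.pxf.api.examples" .
```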
6. Initialize a `gradle` Java library project for your *Demo* connector. For example:
@@ -200,7 +194,7 @@ Perform the following procedure to create a local copy of the *Demo* connector s
testCompile 'junit:junit:4.12'
<b>compile 'commons-logging:commons-logging:1.1.3'
-compile 'org.apache.hawq.pxf.api:pxf-api:3.3.0.0'</b>
+compile 'org.greenplum.pxf.api:pxf-api:4.0.0'</b>
}
</pre>
@@ -43,9 +43,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For
delimited single line records from plain text files on HDFS
</description>
<plugins>
-<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
-<accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
-<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
+<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
+<accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
+<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
@@ -56,9 +56,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For
It is not splittable (non-parallel) and is slower than HdfsTextSimple.
</description>
<plugins>
-<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
-<accessor>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
-<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
+<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
+<accessor>org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
+<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
```
@@ -3,7 +3,7 @@ title: Using the PXF Java SDK
---
The Greenplum Platform Extension Framework (PXF) SDK provides the Java classes and interfaces that you use to create connectors to new external data sources, data formats, and data access APIs from Greenplum Database. You can extend PXF functionality *without changing Greenplum Database* when you use the PXF Java SDK.
-PXF in Greenplum Database is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from the open source Apache HAWQ (incubating) project. You can contribute to PXF development via the [Greenplum Database](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) and the [Apache HAWQ (incubating)](https://github.com/apache/incubator-hawq/tree/master/pxf) open source `github` repositories.
+PXF in Greenplum Database has its roots in the Apache HAWQ project. You can contribute to Greenplum PXF development via the open source github repositories for [PXF Server Plugins](https://github.com/greenplum-db/pxf) and the [Greenplum PXF Extension/Client](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf).
## <a id="prereqs"></a>Topic Overview
@@ -29,19 +29,17 @@ The PXF API exposes the following interfaces:
| `WriteAccessor` | Writes `OneRow` records to the external data source. |
-Refer to the [PXF API JavaDocs](http://hawq.incubator.apache.org/docs/pxf/javadoc/) for detailed information about the classes and interfaces exposed by the API.
## <a id="api_info"></a>General PXF API Information
### <a id="pkg_name"></a>Package Name
-The PXF API base package name is `org.apache.hawq.pxf.api`. All PXF API classes and interfaces reside in this package.
+The PXF API base package name is `org.greenplum.pxf.api`. All PXF API classes and interfaces reside in this package.
### <a id="jar_file"></a>JAR File
You need the PXF API JAR file to develop with the PXF SDK. This file is named `pxf-api-<version>.jar`, where `<version>` is a dot-separated version number. For example:
``` shell
-pxf-api-3.3.0.0.jar
+pxf-api-4.0.0.jar
```
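If you want to verify the base package for yourself, one option (assuming the JAR is in your current directory; the version in the file name may differ on your system) is to list its contents:

``` shell
# All API classes and interfaces should appear under org/greenplum/pxf/api/.
jar tf pxf-api-4.0.0.jar | grep '^org/greenplum/pxf/api/'
```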
*PXF JAR files are not currently available from a remote repository.* You can obtain the PXF API JAR file from your Greenplum Database installation here:
@@ -45,7 +45,7 @@ PXF utilizes `log4j` for service-level logging. PXF-service-related log messages
PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`:
``` shell
-#log4j.logger.org.apache.hawq.pxf=DEBUG
+#log4j.logger.org.greenplum.pxf=DEBUG
```
Copy the `pxf-log4j.properties` file to each segment host and restart the PXF service on *each* Greenplum Database segment host. For example:
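The example itself is elided in this hunk. Purely as an illustrative sketch, assuming a `seghostfile` listing your segment hosts and a hypothetical PXF install location (the actual conf path and restart command depend on your installation):

``` shell
# Hypothetical paths -- adjust to your PXF installation.
gpscp -v -f seghostfile pxf-log4j.properties =:/usr/local/greenplum-db/pxf/conf/pxf-log4j.properties
gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```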
@@ -161,9 +161,9 @@ A PXF profile definition includes the name of the profile, a description, and th
delimited single line records from plain text files on HDFS
</description>
<plugins>
-<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
-<accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
-<resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
+<fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
+<accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
+<resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
</plugins>
</profile>
```
......