This connector provides a Sink that writes partitioned files to filesystems supported by the [Flink `FileSystem` abstraction](//ci.apache.org/projects/flink/flink-docs-release-1.7/ops/filesystems.html).
Important Note: For S3, the `StreamingFileSink` supports only the [Hadoop-based](https://hadoop.apache.org/) FileSystem implementation, not the implementation based on [Presto](https://prestodb.io/). If your job uses the `StreamingFileSink` to write to S3 but you want to use the Presto-based implementation for checkpointing, it is advised to explicitly use _“s3a://”_ (for Hadoop) as the scheme for the sink’s target path and _“s3p://”_ (for Presto) for checkpointing. Using _“s3://”_ for both the sink and checkpointing may lead to unpredictable behavior, as both implementations “listen” to that scheme.
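A sketch of this setup, assuming a Flink 1.7-style job with both S3 filesystem implementations on the classpath (the bucket name and paths are placeholders):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoints go through the Presto-based filesystem ("s3p://").
env.setStateBackend(new FsStateBackend("s3p://my-bucket/checkpoints"));

// The sink writes through the Hadoop-based filesystem ("s3a://").
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("s3a://my-bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
    .build();
```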
Since in streaming the input is potentially infinite, the streaming file sink writes data into buckets. The bucketing behaviour is configurable but a useful default is time-based bucketing where we start writing a new bucket every hour and thus get individual files that each contain a part of the infinite output stream.
Within a bucket, we further split the output into smaller part files based on a rolling policy. This is useful to prevent individual bucket files from getting too big. This is also configurable, but the default policy rolls files based on file size and an inactivity timeout, _i.e._, a part file is rolled if no new data was written to it for some time.
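The default policy’s thresholds can be tuned through the `DefaultRollingPolicy` builder. A sketch, assuming the Flink 1.7 builder API (the threshold values are illustrative):

```java
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

// Roll a part file once it reaches 128 MB, after 15 minutes of activity,
// or after 5 minutes without any new data (sizes in bytes, intervals in milliseconds).
DefaultRollingPolicy<String, String> rollingPolicy = DefaultRollingPolicy
    .create()
    .withMaxPartSize(128 * 1024 * 1024)
    .withRolloverInterval(15 * 60 * 1000)
    .withInactivityInterval(5 * 60 * 1000)
    .build();
```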
The `StreamingFileSink` supports both row-wise encoding formats and bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
#### Using Row-encoded Output Formats
The only required configuration options are the base path where we want to output our data and an [Encoder](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/api/common/serialization/Encoder.html) that is used for serializing records to the `OutputStream` for each file.
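A minimal sketch of such a sink (`outputPath` is a placeholder for the target directory):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

DataStream<String> input = ...;

// Write each record as a UTF-8 encoded line into hourly buckets under outputPath.
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
    .build();

input.addSink(sink);
```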
This will create a streaming sink that creates hourly buckets and uses a default rolling policy. The default bucket assigner is [DateTimeBucketAssigner](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/bucketassigners/DateTimeBucketAssigner.html) and the default rolling policy is [DefaultRollingPolicy](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.html). You can specify a custom [BucketAssigner](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner.html) and [RollingPolicy](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy.html) on the sink builder. Please check out the JavaDoc for [StreamingFileSink](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.html) for more configuration options and more documentation about the workings and interactions of bucket assigners and rolling policies.
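For example, a sketch wiring a custom bucket assigner and rolling policy on the builder (assuming the Flink 1.7 API; the date pattern and thresholds are illustrative):

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
    // Daily buckets instead of the default hourly "yyyy-MM-dd--HH" pattern.
    .withBucketAssigner(new DateTimeBucketAssigner<>("yyyy-MM-dd"))
    .withRollingPolicy(
        DefaultRollingPolicy.create()
            .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
            .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
            .withMaxPartSize(128 * 1024 * 1024)
            .build())
    .build();
```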
#### Using Bulk-encoded Output Formats
In the above example we used an `Encoder` that can encode or serialize each record individually. The streaming file sink also supports bulk-encoded output formats such as [Apache Parquet](http://parquet.apache.org). To use these, instead of `StreamingFileSink.forRowFormat()` you would use `StreamingFileSink.forBulkFormat()` and specify a `BulkWriter.Factory`.
[ParquetAvroWriters](//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.html) has static methods for creating a `BulkWriter.Factory` for various types.
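For example, a sketch writing Parquet via Avro reflection (`MyRecord` and `outputPath` are placeholders; this assumes the `flink-parquet` dependency is on the classpath):

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

DataStream<MyRecord> input = ...;

// The Avro schema for the Parquet files is derived from MyRecord via reflection.
StreamingFileSink<MyRecord> sink = StreamingFileSink
    .forBulkFormat(new Path(outputPath), ParquetAvroWriters.forReflectRecord(MyRecord.class))
    .build();

input.addSink(sink);
```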
**IMPORTANT:** Bulk-encoding formats can only be combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress part file on every checkpoint.