diff --git a/docs/1.7/23.md b/docs/1.7/23.md index 2e185ce22d5726409fd6a4519ea7b11eaa8dd407..8c58a40fe75e847fb8dee3bd0a46f305327497b6 100644 --- a/docs/1.7/23.md +++ b/docs/1.7/23.md @@ -1,41 +1,41 @@ -# Queryable State Beta +# Queryable State Beta 可查询状态Beta -**Note:** The client APIs for queryable state are currently in an evolving state and there are **no guarantees** made about stability of the provided interfaces. It is likely that there will be breaking API changes on the client side in the upcoming Flink versions. +**Note:** 可查询状态的客户端API当前处于不断变化的状态,**没有提供关于所提供接口的稳定性的保证**。在即将到来的FLink版本中,可能会在客户端上中断API更改。 -In a nutshell, this feature exposes Flink’s managed keyed (partitioned) state (see [Working with State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html)) to the outside world and allows the user to query a job’s state from outside Flink. For some scenarios, queryable state eliminates the need for distributed operations/transactions with external systems such as key-value stores which are often the bottleneck in practice. In addition, this feature may be particularly useful for debugging purposes. +简而言之,这个特性公开了Flink的托管键控(分区)状态(请参见[与State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html))一起工作到外部世界,并允许用户从外部Flink查询作业的状态)。对于某些场景,Queryable状态消除了对具有外部系统(例如键值存储)的分布式操作/事务的需求,而这往往是实际中的瓶颈。此外,此特性对于调试目的可能特别有用。 -**Attention:** When querying a state object, that object is accessed from a concurrent thread without any synchronization or copying. This is a design choice, as any of the above would lead to increased job latency, which we wanted to avoid. Since any state backend using Java heap space, _e.g._ `MemoryStateBackend` or `FsStateBackend`, does not work with copies when retrieving values but instead directly references the stored values, read-modify-write patterns are unsafe and may cause the queryable state server to fail due to concurrent modifications. The `RocksDBStateBackend` is safe from these issues. +**注意:** 当查询状态对象时,可以从并发线程访问该对象,而不需要进行任何同步或复制。这是一个设计选择,因为上面的任何一个都会导致工作延迟的增加,这是我们想要避免的。由于使用Java堆空间的任何状态后端,例如_`MemoryStateBackend`或`FsStateBackend`,在检索值时都不会与副本一起工作,而是直接引用存储的值,因此读-修改-写入模式是不安全的,并且可能导致可查询的状态服务器由于并发修改而失败。`RocksDBStateBackend` 不受这些问题的影响。 -## Architecture +## Architecture 建筑学 -Before showing how to use the Queryable State, it is useful to briefly describe the entities that compose it. The Queryable State feature consists of three main entities: +在演示如何使用“可查询状态”之前,简要描述组成该状态的实体非常有用。“可查询状态”功能由三个主要实体组成: -1. the `QueryableStateClient`, which (potentially) runs outside the Flink cluster and submits the user queries, -2. the `QueryableStateClientProxy`, which runs on each `TaskManager` (_i.e._ inside the Flink cluster) and is responsible for receiving the client’s queries, fetching the requested state from the responsible Task Manager on his behalf, and returning it to the client, and -3. the `QueryableStateServer` which runs on each `TaskManager` and is responsible for serving the locally stored state. +1. `QueryableStateClient`(可能)在FLink集群外部运行并提交用户查询, +2. `QueryableStateClientProxy`,它运行在每个`TaskManager`(_在Flink集群内)上,负责接收客户端的查询,代表他从Responsible Task Manager获取所请求的状态,并将其返回给客户端,以及 +3. `QueryableStateServer`运行在每个`TaskManager`上,负责为本地存储的状态提供服务。 -The client connects to one of the proxies and sends a request for the state associated with a specific key, `k`. As stated in [Working with State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html), keyed state is organized in _Key Groups_, and each `TaskManager` is assigned a number of these key groups. To discover which `TaskManager` is responsible for the key group holding `k`, the proxy will ask the `JobManager`. Based on the answer, the proxy will then query the `QueryableStateServer` running on that `TaskManager` for the state associated with `k`, and forward the response back to the client. +客户端连接到代理之一,并发送与特定密钥“k”关联的状态请求。正如[使用State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html),键控状态组织在_KEY组中]中所述,每个`TaskManager`都被分配了许多这些关键组。为了发现哪个`TaskManager`负责持有`k`的密钥组,代理将询问 `JobManager`。根据答案,代理将查询在该 `TaskManager` 上运行的 `QueryableStateServer`,以获得与 `k`关联的状态,并将响应转发回客户端。 -## Activating Queryable State +## Activating Queryable State 激活可查询状态 -To enable queryable state on your Flink cluster, you just have to copy the `flink-queryable-state-runtime_2.11-1.7.1.jar` from the `opt/` folder of your [Flink distribution](https://flink.apache.org/downloads.html "Apache Flink: Downloads"), to the `lib/` folder. Otherwise, the queryable state feature is not enabled. +要在Flink集群上启用可查询状态,只需将`flink-queryable-state-runtime_2.11-1.7.1.jar` 从您[Flink distribution](https://flink.apache.org/downloads.html "Apache Flink: Downloads"),文件夹中复制到`lib/`文件夹。否则,无法启用可查询状态功能。 -To verify that your cluster is running with queryable state enabled, check the logs of any task manager for the line: `"Started the Queryable State Proxy Server @ ..."`. +要验证您的群集是否已启用Queryable状态,请检查任何任务管理器的日志:`"Started the Queryable State Proxy Server @ ..."`。 -## Making State Queryable +## Making State Queryable 使状态Queryable -Now that you have activated queryable state on your cluster, it is time to see how to use it. In order for a state to be visible to the outside world, it needs to be explicitly made queryable by using: +现在您已经激活了集群上的Queryable状态,现在是看看如何使用它的时候了。为了使状态对外部世界可见,需要使用以下方法显式地使其可查询: -* either a `QueryableStateStream`, a convenience object which acts as a sink and offers its incoming values as queryable state, or -* the `stateDescriptor.setQueryable(String queryableStateName)` method, which makes the keyed state represented by the state descriptor, queryable. +* `QueryableStateStream`是一个方便的对象,充当接收器,并以可查询状态提供传入值,或者 +* stateDescriptor.setQueryable(String queryableStateName)`方法,它使由状态描述符表示的键状态是可查询的。 -The following sections explain the use of these two approaches. +以下各节介绍了这两种方法的使用情况。 -### Queryable State Stream +### Queryable State Stream 可查询状态流 -Calling `.asQueryableState(stateName, stateDescriptor)` on a `KeyedStream` returns a `QueryableStateStream` which offers its values as queryable state. Depending on the type of state, there are the following variants of the `asQueryableState()` method: +在`KeyedStream` 上调用`.asQueryableState(stateName, stateDescriptor)`返回一个 `QueryableStateStream` ,该 `QueryableStateStream` 提供其值为Queryable状态。根据状态的类型,`asQueryableState()` 方法有以下变体: @@ -61,9 +61,9 @@ QueryableStateStream asQueryableState( -**Note:** There is no queryable `ListState` sink as it would result in an ever-growing list which may not be cleaned up and thus will eventually consume too much memory. +**Note:** 没有可查询的`ListState` 接收器,因为它会导致一个不断增长的列表,该列表可能不会被清理,因此最终会消耗过多的内存。 -The returned `QueryableStateStream` can be seen as a sink and **cannot** be further transformed. Internally, a `QueryableStateStream` gets translated to an operator which uses all incoming records to update the queryable state instance. The updating logic is implied by the type of the `StateDescriptor` provided in the `asQueryableState` call. In a program like the following, all records of the keyed stream will be used to update the state instance via the `ValueState.update(value)`: +返回的`QueryableStateStream` 可视为sink,**不能进一步转换。在内部, `QueryableStateStream` 被转换为使用所有传入记录来更新可查询状态实例的运算符。 `asQueryableState` 调用中提供的`StateDescriptor`类型暗示了更新逻辑。在如下程序中,键控流的所有记录都将用于通过`ValueState.update(value)`更新状态实例: @@ -73,11 +73,11 @@ stream.keyBy(0).asQueryableState("query-name") -This acts like the Scala API’s `flatMapWithState`. +这类似于ScalaAPI的`flatMapWithState`。 -### Managed Keyed State +### Managed Keyed State 受管理的键控状态 -Managed keyed state of an operator (see [Using Managed Keyed State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html#using-managed-keyed-state)) can be made queryable by making the appropriate state descriptor queryable via `StateDescriptor.setQueryable(String queryableStateName)`, as in the example below: +操作符的托管键控状态(请参见[使用托管键控State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html#using-managed-keyed-state))可以通过`StateDescriptor.setQueryable(String queryableStateName)`使适当的状态描述符可查询),如下例所示: @@ -91,15 +91,15 @@ descriptor.setQueryable("query-name"); // queryable state name -**Note:** The `queryableStateName` parameter may be chosen arbitrarily and is only used for queries. It does not have to be identical to the state's own name. +**Note:** `queryableStateName`参数可以任意选择,仅用于查询。它不一定要和国家的名字相同。 -This variant has no limitations as to which type of state can be made queryable. This means that this can be used for any `ValueState`, `ReduceState`, `ListState`, `MapState`, `AggregatingState`, and the currently deprecated `FoldingState`. +该变型没有限制可以使哪种类型的状态是可查询的。这意味着,这可用于任何`ValueState`, `ReduceState`, `ListState`, `MapState`, `AggregatingState`和当前已过时的 `FoldingState`。 -## Querying State +## Querying State 查询状态 -So far, you have set up your cluster to run with queryable state and you have declared (some of) your state as queryable. Now it is time to see how to query this state. +到目前为止,您已经将集群设置为使用Queryable状态运行,并且声明(有些)您的状态为queryable。现在是了解如何查询此状态的时候了。 -For this you can use the `QueryableStateClient` helper class. This is available in the `flink-queryable-state-client` jar which must be explicitly included as a dependency in the `pom.xml` of your project along with `flink-core`, as shown below: +为此,您可以使用 `QueryableStateClient` 辅助类。这可以在`flink-queryable-state-client`jar中提供,它必须明确包含在项目 `pom.xml` 和 `flink-core`之间的依赖关系中,如下所示: @@ -118,9 +118,9 @@ For this you can use the `QueryableStateClient` helper class. This is available -For more on this, you can check how to [set up a Flink program](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/linking_with_flink.html). +有关此问题的详细信息,您可以检查如何[设置FLink程序](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/linking_with_flink.html)。 -The `QueryableStateClient` will submit your query to the internal proxy, which will then process your query and return the final result. The only requirement to initialize the client is to provide a valid `TaskManager` hostname (remember that there is a queryable state proxy running on each task manager) and the port where the proxy listens. More on how to configure the proxy and state server port(s) in the [Configuration Section](#Configuration). +`QueryableStateClient`将将您的查询提交给内部代理,然后内部代理将处理您的查询并返回最终结果。初始化客户机的唯一要求是提供一个有效的`TaskManager`主机名(请记住,每个任务管理器上都运行着一个可查询的状态代理)和代理侦听的端口。更多关于如何配置[Configuration节](#Configuration)中的代理和状态服务器端口的信息。 @@ -130,7 +130,7 @@ QueryableStateClient client = new QueryableStateClient(tmHostname, proxyPort); -With the client ready, to query a state of type `V`, associated with a key of type `K`, you can use the method: +在客户端准备好的情况下,要查询与类型`K`的密钥相关联的类型 `V`的状态,您可以使用以下方法: @@ -145,15 +145,15 @@ CompletableFuture getKvState( -The above returns a `CompletableFuture` eventually holding the state value for the queryable state instance identified by `queryableStateName` of the job with ID `jobID`. The `key` is the key whose state you are interested in and the `keyTypeInfo` will tell Flink how to serialize/deserialize it. Finally, the `stateDescriptor` contains the necessary information about the requested state, namely its type (`Value`, `Reduce`, etc) and the necessary information on how to serialize/deserialize it. +以上返回`CompletableFuture`,最终保存了ID为`jobID`的作业的 `queryableStateName` 标识的可查询状态实例的状态值。`key` 是您感兴趣的状态,而`keyTypeInfo`将告诉flink如何序列化/反序列化它。最后,`stateDescriptor`包含有关所请求状态的必要信息,即它的类型(`Value`, `Reduce`等)以及有关如何序列化/反序列化它的必要信息。 -The careful reader will notice that the returned future contains a value of type `S`, _i.e._ a `State` object containing the actual value. This can be any of the state types supported by Flink: `ValueState`, `ReduceState`, `ListState`, `MapState`, `AggregatingState`, and the currently deprecated `FoldingState`. +仔细的读者会注意到,返回的未来包含一个类型为 `S`、a `State` 对象的值,其中包含实际值。这可以是Flink支持的任何状态类型:`ValueState`, `ReduceState`, `ListState`, `MapState`, `AggregatingState`,以及当前废弃的`FoldingState`。 -**Note:** These state objects do not allow modifications to the contained state. You can use them to get the actual value of the state, _e.g._ using `valueState.get()`, or iterate over the contained `` entries, _e.g._ using the `mapState.entries()`, but you cannot modify them. As an example, calling the `add()` method on a returned list state will throw an `UnsupportedOperationException`.**Note:** The client is asynchronous and can be shared by multiple threads. It needs to be shutdown via `QueryableStateClient.shutdown()` when unused in order to free resources. +**Note:** 这些状态对象不允许对包含的状态进行修改。您可以使用它们获得状态的实际值,例如使用`valueState.get()`,或者迭代包含的 `` 条目,__使用 `mapState.entries()`,但是不能修改它们。例如,在返回的列表状态上调用‘add()’方法将抛出一个 `UnsupportedOperationException`. **注意:** 客户端是异步的,可以由多个线程共享。它需要在未使用时通过 `QueryableStateClient.shutdown()`关闭,以便释放资源。 -### Example +### Example 例子 -The following example extends the `CountWindowAverage` example (see [Using Managed Keyed State](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html#using-managed-keyed-state)) by making it queryable and shows how to query this value: +下面的示例扩展了“Counterdinaverage”示例(参见[使用受管理的键状态](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state.html#using-managed-keyed-state)),通过它可以查询并显示如何查询此值: @@ -189,7 +189,7 @@ public class CountWindowAverage extends RichFlatMapFunction, -Once used in a job, you can retrieve the job ID and then query any key’s current state from this operator: +一旦在作业中使用,就可以检索作业ID,然后从此运算符查询任何键的当前状态: @@ -217,25 +217,25 @@ resultFuture.thenAccept(response -> { -## Configuration +## Configuration 布局,构造 -The following configuration parameters influence the behaviour of the queryable state server and client. They are defined in `QueryableStateOptions`. +以下配置参数影响可查询状态服务器和客户端的行为。它们在“QueryableStateOptions”中定义。 -### State Server +### State Server 状态服务器 -* `query.server.ports`: the server port range of the queryable state server. This is useful to avoid port clashes if more than 1 task managers run on the same machine. The specified range can be: a port: “9123”, a range of ports: “50100-50200”, or a list of ranges and or points: “50100-50200,50300-50400,51234”. The default port is 9067. -* `query.server.network-threads`: number of network (event loop) threads receiving incoming requests for the state server (0 => #slots) -* `query.server.query-threads`: number of threads handling/serving incoming requests for the state server (0 => #slots). +* `query.server.ports`: 可查询状态服务器的服务器端口范围。如果超过1个任务管理器在同一台计算机上运行,则这可用于避免端口冲突。指定范围可以是:端口:“9123”、一系列端口:“50100-50200”或范围和/或点列表:“50100-50200,50300-50400,51234”。默认端口为9067。 +* `query.server.network-threads`: 接收状态服务器传入请求的网络(事件循环)线程数(0=>;#时隙) +* `query.server.query-threads`: 处理/服务状态服务器传入请求的线程数(0=>;#槽)。 -### Proxy +### Proxy 代理服务器 -* `query.proxy.ports`: the server port range of the queryable state proxy. This is useful to avoid port clashes if more than 1 task managers run on the same machine. The specified range can be: a port: “9123”, a range of ports: “50100-50200”, or a list of ranges and or points: “50100-50200,50300-50400,51234”. The default port is 9069. -* `query.proxy.network-threads`: number of network (event loop) threads receiving incoming requests for the client proxy (0 => #slots) -* `query.proxy.query-threads`: number of threads handling/serving incoming requests for the client proxy (0 => #slots). +* `query.proxy.ports`: 可查询状态代理的服务器端口范围。如果超过1个任务管理器在同一台计算机上运行,则这可用于避免端口冲突。指定范围可以是:端口:“9123”、一系列端口:“50100-50200”或范围和/或点列表:“50100-50200,50300-50400,51234”。默认端口为9069。 +* `query.proxy.network-threads`: 接收客户端代理传入请求的网络(事件循环)线程数(0=>;#时隙) +* `query.proxy.query-threads`: 处理/服务客户端代理传入请求的线程数(0=>;#槽)。 -## Limitations +## Limitations 限制,边界 -* The queryable state life-cycle is bound to the life-cycle of the job, _e.g._ tasks register queryable state on startup and unregister it on disposal. In future versions, it is desirable to decouple this in order to allow queries after a task finishes, and to speed up recovery via state replication. -* Notifications about available KvState happen via a simple tell. In the future this should be improved to be more robust with asks and acknowledgements. -* The server and client keep track of statistics for queries. These are currently disabled by default as they would not be exposed anywhere. As soon as there is better support to publish these numbers via the Metrics system, we should enable the stats. +* 可查询状态生命周期被绑定到作业的生命周期,_例如_TATES在启动时注册Queryable状态,并在处理时注销它。在未来的版本中,需要将其解耦,以便在任务完成后允许查询,并通过状态复制加快恢复速度。 +* 关于可用的KvState的通知是通过一个简单的Tell进行的。在未来,这一点应该得到改进,以便在请求和确认方面更加健壮。 +* 服务器和客户端跟踪查询的统计信息。它们当前在默认情况下被禁用,因为它们不会在任何地方被曝光。只要有更好的支持通过度量系统发布这些数字,我们就应该启用统计信息。