Unverified commit 0e3a0d96, authored by Wing, committed by GitHub

Refine backend (#1) (#7031)

Parent 9701bd09
# Open Fetcher
Fetcher is a concept in SkyWalking backend. It uses pulling mode rather than [receiver](backend-receivers.md), which
read the data from the target systems. This mode is typically in some metrics SDKs, such as Prometheus.
Fetcher is a concept in SkyWalking backend. Unlike a [receiver](backend-receivers.md), it uses pull mode: it reads the data from the target systems. This mode is typically found in metrics SDKs, such as Prometheus.
## Prometheus Fetcher
Suppose you want to enable some `metric-custom.yaml` files stored at `fetcher-prom-rules`, append their names to `enabledRules` of
`prometheus-fetcher` as below:
`prometheus-fetcher` as follows:
```yaml
prometheus-fetcher:
......@@ -17,7 +16,7 @@ prometheus-fetcher:
Prometheus fetcher is configured via a configuration file. The configuration file defines everything related to fetching
services and their instances, as well as which rule files to load.
OAP can load the configuration at bootstrap. If the new configuration is not well-formed, OAP fails to start up. The files
The OAP can load the configuration at bootstrap. If the new configuration is not well-formed, the OAP fails to start up. The files
are located at `$CLASSPATH/fetcher-prom-rules`.
The file is written in YAML format, defined by the scheme described below. Brackets indicate that a parameter is optional.
......@@ -26,13 +25,13 @@ A full example can be found [here](../../../../oap-server/server-bootstrap/src/m
Generic placeholders are defined as follows:
* `<duration>`: a duration This will parse a textual representation of a duration. The formats accepted are based on
* `<duration>`: A textual representation of a duration, which is parsed into a duration value. The formats accepted are based on
the ISO-8601 duration format `PnDTnHnMn.nS` with days considered to be exactly 24 hours.
* `<labelname>`: a string matching the regular expression \[a-zA-Z_\]\[a-zA-Z0-9_\]*
* `<labelvalue>`: a string of unicode characters
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number
* `<path>`: a valid URL path
* `<string>`: a regular string
* `<labelname>`: A string matching the regular expression \[a-zA-Z_\]\[a-zA-Z0-9_\]*.
* `<labelvalue>`: A string of unicode characters.
* `<host>`: A valid string consisting of a hostname or IP followed by an optional port number.
* `<path>`: A valid URL path.
* `<string>`: A regular string.
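To make the `<duration>` placeholder concrete, here is a minimal Python sketch of how a `PnDTnHnMn.nS` string maps to a length of time, with a day counted as exactly 24 hours. It is only an illustration and not the parser the OAP uses; the function name `parse_duration` is hypothetical, and the sketch ignores signs and other edge cases.

```python
import re
from datetime import timedelta

# Simplified pattern for the PnDTnHnMn.nS form described above.
# Assumption: no sign prefixes and no week/month/year designators.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$",
    re.IGNORECASE,
)


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as 'PT15S' or 'P1DT2H' (hypothetical helper)."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"not a PnDTnHnMn.nS duration: {text!r}")
    parts = match.groupdict()
    # Reject inputs with no component at all ('P', 'PT') or a dangling 'T'.
    if all(value is None for value in parts.values()) or text.upper().endswith("T"):
        raise ValueError(f"duration has an empty section: {text!r}")
    return timedelta(
        days=int(parts["days"] or 0),  # a day is exactly 24 hours
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )
```

For example, `parse_duration("PT15S")` yields a 15-second interval, and `parse_duration("P1DT2H")` yields 26 hours.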
```yaml
# How frequently to fetch targets.
......@@ -76,16 +75,16 @@ name: <string>
exp: <string>
```
More about MAL, please refer to [mal.md](../../concepts-and-designs/mal.md)
To learn more about MAL, please refer to [mal.md](../../concepts-and-designs/mal.md).
## Kafka Fetcher
Kafka Fetcher pulls messages from Kafka Broker(s) what is the Agent delivered. Check the agent documentation about the details. Typically Tracing Segments, Service/Instance properties, JVM Metrics, and Meter system data are supported. Kafka Fetcher can work with gRPC/HTTP Receivers at the same time for adopting different transport protocols.
The Kafka Fetcher pulls the messages that agents deliver to the Kafka Broker(s). Check the agent documentation for details. Typically, tracing segments, service/instance properties, JVM metrics, and meter system data are supported. The Kafka Fetcher can work with gRPC/HTTP Receivers at the same time to support different transport protocols.
Kafka Fetcher is disabled in default, and we configure as following to enable.
Kafka Fetcher is disabled by default. To enable it, configure as follows.
namespace aims to isolate multi OAP cluster when using the same Kafka cluster.
if you set a namespace for Kafka fetcher, OAP will add a prefix to topic name. you should also set namespace in `agent.config`, the property named `plugin.kafka.namespace`.
The namespace isolates multiple OAP clusters that use the same Kafka cluster.
If you set a namespace for the Kafka Fetcher, the OAP will add a prefix to the topic names. You should also set the namespace in the `plugin.kafka.namespace` property in `agent.config`.
```yaml
kafka-fetcher:
......@@ -97,9 +96,9 @@ kafka-fetcher:
`skywalking-segments`, `skywalking-metrics`, `skywalking-profile`, `skywalking-managements`, `skywalking-meters`, `skywalking-logs`
and `skywalking-logs-json` topics are required by `kafka-fetcher`.
If they do not exist, Kafka Fetcher will create them in default. Also, you can create them by yourself before the OAP server started.
If they do not exist, the Kafka Fetcher will create them by default. You can also create them yourself before the OAP server starts.
When using the OAP server automatic creation mechanism, you could modify the number of partitions and replications of the topics through the following configurations:
When using the OAP server's automatic creation mechanism, you can modify the number of partitions and the replication factor of the topics using the following configurations:
```yaml
kafka-fetcher:
......@@ -114,9 +113,9 @@ kafka-fetcher:
consumePartitions: ${SW_KAFKA_FETCHER_CONSUME_PARTITIONS:""}
```
In cluster mode, all topics have the same number of partitions. Then we have to set `"isSharding"` to `"true"` and assign the partitions to consume for OAP server. The OAP server can use commas to separate multiple partitions.
In the cluster mode, all topics have the same number of partitions. Set `"isSharding"` to `"true"` and assign the partitions to consume for the OAP server. Use commas to separate multiple partitions for the OAP server.
Kafka Fetcher allows to configure all the Kafka producers listed [here](http://kafka.apache.org/24/documentation.html#consumerconfigs) in property `kafkaConsumerConfig`. Such as:
The Kafka Fetcher allows you to set all the Kafka consumer configurations listed [here](http://kafka.apache.org/24/documentation.html#consumerconfigs) in the property `kafkaConsumerConfig`. For example:
```yaml
kafka-fetcher:
selector: ${SW_KAFKA_FETCHER:default}
......@@ -133,7 +132,7 @@ kafka-fetcher:
...
```
When use Kafka MirrorMaker 2.0 to replicate topics between Kafka clusters, you can set the source Kafka Cluster alias(mm2SourceAlias) and separator(mm2SourceSeparator) according to your Kafka MirrorMaker [config](https://github.com/apache/kafka/tree/trunk/connect/mirror#remote-topics).
When using Kafka MirrorMaker 2.0 to replicate topics between Kafka clusters, you can set the source Kafka Cluster alias (mm2SourceAlias) and separator (mm2SourceSeparator) according to your Kafka MirrorMaker [config](https://github.com/apache/kafka/tree/trunk/connect/mirror#remote-topics).
```yaml
kafka-fetcher:
selector: ${SW_KAFKA_FETCHER:default}
......
# Health Check
Health check intends to provide a unique approach to check the healthy status of OAP server. It includes the health status
of modules, GraphQL and gRPC services readiness.
Health check intends to provide a unique approach to check the health status of the OAP server. It includes the health status
of modules, GraphQL, and gRPC services readiness.
## Health Checker Module
Health Checker module could solute how to observe the health status of modules. We can active it by below:
The Health Checker module helps observe the health status of modules. You may activate it as follows:
```yaml
health-checker:
selector: ${SW_HEALTH_CHECKER:default}
default:
checkIntervalSeconds: ${SW_HEALTH_CHECKER_INTERVAL_SECONDS:5}
```
Notice, we should enable `telemetry` module at the same time. That means the provider should not be `-` and `none`.
Note: The `telemetry` module should be enabled at the same time. This means that the provider must be neither `-` nor `none`.
After that, we can query OAP server health status by querying GraphQL:
After that, we can check the OAP server health status by querying GraphQL:
```
query{
......@@ -38,7 +38,7 @@ If the OAP server is healthy, the response should be
}
```
Once some modules are unhealthy, for instance, storage H2 is down. The result might be like below:
If some modules are unhealthy (e.g. storage H2 is down), then the result may look as follows:
```json
{
......@@ -50,16 +50,16 @@ Once some modules are unhealthy, for instance, storage H2 is down. The result mi
}
}
```
You could refer to [checkHealth query](https://github.com/apache/skywalking-query-protocol/blob/master/common.graphqls)
Refer to [checkHealth query](https://github.com/apache/skywalking-query-protocol/blob/master/common.graphqls)
for more details.
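The GraphQL check above can also be scripted. The following Python sketch is a rough illustration only. It assumes that the `checkHealth` query exposes `score` and `details` fields (as defined in the linked query protocol, not shown in full here) and that a score of `0` means healthy. The helper names and any endpoint URL are hypothetical and depend on your deployment.

```python
import json
import urllib.request

# Assumption: field names follow the checkHealth query in the query protocol.
CHECK_HEALTH_QUERY = "query { checkHealth { score details } }"


def build_request(url: str) -> urllib.request.Request:
    """Build the HTTP POST request that carries the GraphQL query."""
    body = json.dumps({"query": CHECK_HEALTH_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(response: dict) -> bool:
    """Interpret a decoded response; assumes score 0 means healthy."""
    return response["data"]["checkHealth"]["score"] == 0


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Send the query to the given GraphQL endpoint and report health."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```

Calling `check_health` with the GraphQL endpoint of your OAP server returns `True` when the reported score is `0`. When it returns `False`, the `details` field of the response indicates which modules are unhealthy.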
## The readiness of GraphQL and gRPC
We could opt to above query to check the readiness of GraphQL.
Use the query above to check the readiness of GraphQL.
OAP has implemented [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
We could use [grpc-health-probe](https://github.com/grpc-ecosystem/grpc-health-probe) or any other tools to check the
OAP has implemented the [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
You may use the [grpc-health-probe](https://github.com/grpc-ecosystem/grpc-health-probe) or any other tools to check the
health of OAP gRPC services.
## CLI tool
Please follow the [CLI doc](https://github.com/apache/skywalking-cli#checkhealth) to get the health status score directly through the `checkhealth` command.
\ No newline at end of file
Please follow the [CLI doc](https://github.com/apache/skywalking-cli#checkhealth) to get the health status score directly through the `checkhealth` command.
# Init mode
SkyWalking backend supports multiple storage implementors. Most of them could initialize the storage,
such as Elastic Search, Database automatically when the backend startup in first place.
The SkyWalking backend supports multiple storage implementors. Most of them automatically initialize the storage,
such as Elasticsearch or a database, when the backend first starts up.
But there are some unexpected happens based on the storage, such as
`When create Elastic Search indexes concurrently, because of several backend instances startup at the same time.`,
there is a change, the APIs of Elastic Search would be blocked without any exception.
And this has more chances happen in container management platform, like k8s.
But unexpected events may occur depending on the storage. For example,
when several backend instances start up at the same time and create Elasticsearch indexes concurrently,
there is a chance that the Elasticsearch APIs would be blocked without reporting any exception.
This is more likely to happen on container management platforms, such as k8s.
That is where you need **Init mode** startup.
This is where you need the **Init mode** startup.
## Solution
Only one single instance should run in **Init mode** before other instances start up.
Only one single instance should run in the **Init mode** before other instances start up.
This instance will exit gracefully after all initialization steps are done.
Use `oapServiceInit.sh`/`oapServiceInit.bat` to start up backend. You should see the following logs
Use `oapServiceInit.sh`/`oapServiceInit.bat` to start up the backend. You should see the following logs:
> 2018-11-09 23:04:39,465 - org.apache.skywalking.oap.server.starter.OAPServerStartUp -2214 [main] INFO [] - OAP starts up in init mode successfully, exit now...
## Kubernetes
Initialization in this mode would be included in our Kubernetes scripts and Helm.
\ No newline at end of file
Initialization in this mode would be included in our Kubernetes scripts and Helm.
# IP and port setting
Backend is using IP and port binding, in order to support the OS having multiple IPs.
The binding/listening IP and port are specified by core module
The backend uses IP and port binding in order to allow the OS to have multiple IPs.
The binding/listening IP and port are specified by the core module:
```yaml
core:
default:
......@@ -10,21 +10,19 @@ core:
gRPCHost: 0.0.0.0
gRPCPort: 11800
```
There are two IP/port pair for gRPC and HTTP rest services.
There are two IP/port pairs for gRPC and HTTP REST services.
- Most agents and probes use gRPC service for better performance and code readability.
- Few agent use rest service, because gRPC may be not supported in that language.
- UI uses rest service, but data in GraphQL format, always.
- Some agents use REST service, because gRPC may not be supported in that language.
- The UI uses REST service, but the data is always in GraphQL format.
## Notice
## Note
### IP binding
In case some users are not familiar with IP binding, you should know, after you did that,
the client could only use this IP to access the service. For example, bind `172.09.13.28`, even you are
in this machine, must use `172.09.13.28` rather than `127.0.0.1` or `localhost` to access the service.
For users who are not familiar with IP binding, note that once a specific IP is bound, clients can only use this IP to access the service. For example, if `172.09.13.28` is bound, even if you are
on this machine, you must use `172.09.13.28`, rather than `127.0.0.1` or `localhost`, to access the service.
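The effect of binding can be tried out with plain TCP sockets. The Python sketch below is a generic illustration and is not SkyWalking code; the helper names `listen_on` and `can_connect` are hypothetical. A listener bound to one address only accepts connections addressed to it, while a listener bound to `0.0.0.0` accepts connections on every local address.

```python
import socket


def listen_on(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listener bound to the given host (port 0 picks a free port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    return server


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If you call `listen_on` with one specific address of a multi-homed machine, `can_connect("127.0.0.1", port)` is expected to fail on that same machine, whereas a listener created with `listen_on("0.0.0.0")` stays reachable through the loopback address.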
### Module provider specified IP and port
The IP and port in core are only default provided by core. But some module provider may provide other
IP and port settings, this is common. Such as many receiver modules provide this.
The IP and port in the core module are provided by default. But it is common for some module providers, such as receiver modules, to provide other IP and port settings.