Unverified commit 41bc528a, authored by Wing, committed by GitHub

Refine concepts and designs (#6677)

Parent 573c3359
# Observability Analysis Language
OAL (Observability Analysis Language) serves to analyze incoming data in streaming mode.
OAL focuses on metrics in Service, Service Instance and Endpoint. Therefore, the language is easy to learn and use.

Since 6.3, the OAL engine has been embedded in the OAP server runtime as `oal-rt` (OAL Runtime).
OAL scripts are now found in the `/config` folder, and users can simply change the scripts and reboot the server for the changes to take effect.
However, OAL is still a compiled language, and the OAL Runtime generates Java code dynamically.

You can set `SW_OAL_ENGINE_DEBUG=Y` as a system environment variable to see which classes are generated.
## Grammar
Scripts should be named `*.oal`.
```
// Declare the metrics.
METRICS_NAME = from(SCOPE.(* | [FIELD][,FIELD ...]))
...
disable(METRICS_NAME);
```
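For instance, a minimal declaration following this grammar could look like the sketch below (the metric name is illustrative; `longAvg` is one of the aggregation functions documented later on this page):

```
// Average response time per service: aggregate the latency field of the
// Service scope with the longAvg function.
service_resp_time = from(Service.latency).longAvg();
```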
## Scope
Primary **SCOPE**s are `All`, `Service`, `ServiceInstance`, `Endpoint`, `ServiceRelation`, `ServiceInstanceRelation`, and `EndpointRelation`.
There are also some secondary scopes which belong to a primary scope.
See [Scope Definitions](scope-definitions.md), where you can find all existing Scopes and Fields.
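For example, `ServiceInstanceJVMMemory` (used in the function examples below) is a secondary scope that belongs to the primary scope `ServiceInstance`, so a metric declared from it is still aggregated per service instance:

```
// ServiceInstanceJVMMemory is a secondary scope under the ServiceInstance
// primary scope; the resulting metric is grouped per service instance.
instance_jvm_memory_max = from(ServiceInstanceJVMMemory.max).longAvg();
```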
## Filter
Use filter to build conditions for the values of fields by using the field name and an expression.

The expressions support linking by `and`, `or` and `(...)`.
The OPs support `==`, `!=`, `>`, `<`, `>=`, `<=`, `in [...]`, `like %...`, `like ...%`, `like %...%`, `contain` and `not contain`, with type detection based on the field type. In the event of incompatibility, compile or code generation errors may be triggered.
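A small sketch of filter usage (the metric name is illustrative and not part of the official script); chained `.filter(...)` calls narrow the input step by step:

```
// Count endpoint calls that failed and whose endpoint name starts with "serv".
endpoint_failed_serv_sum = from(Endpoint.*).filter(status == false).filter(name like "serv%").sum();
```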
## Aggregation Function
The default functions are provided by the SkyWalking OAP core, and it is possible to implement additional functions.

Functions provided:
- `longAvg`. The avg of all input per scope entity. The input field must be a long.
> instance_jvm_memory_max = from(ServiceInstanceJVMMemory.max).longAvg();
In this case, the input represents the request of each ServiceInstanceJVMMemory scope, and avg is based on the field `max`.
- `doubleAvg`. The avg of all input per scope entity. The input field must be a double.
> instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg();
In this case, the input represents the request of each ServiceInstanceJVMCPU scope, and avg is based on the field `usePercent`.
- `percent`. The number or ratio expressed as a fraction of 100, for the inputs that match the condition.
> endpoint_percent = from(Endpoint.*).percent(status == true);
In this case, all input represents requests of each endpoint, and the condition is `endpoint.status == true`.
- `rate`. The rate expressed as a fraction of 100, based on the inputs that match the `numerator` and `denominator` conditions.
> browser_app_error_rate = from(BrowserAppTraffic.*).rate(trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR, trafficCategory == BrowserAppTrafficCategory.NORMAL);
In this case, all input represents requests of each browser app traffic; the `numerator` condition is `trafficCategory == BrowserAppTrafficCategory.FIRST_ERROR` and the `denominator` condition is `trafficCategory == BrowserAppTrafficCategory.NORMAL`.
Parameter (1) is the `numerator` condition.
Parameter (2) is the `denominator` condition.
- `sum`. The sum of calls per scope entity.
> service_calls_sum = from(Service.*).sum();
In this case, the number of calls of each service.
- `histogram`. See [Heatmap in WIKI](https://en.wikipedia.org/wiki/Heat_map).
> all_heatmap = from(All.latency).histogram(100, 20);
In this case, the thermodynamic heatmap of all incoming requests.
Parameter (1) is the precision of the latency calculation; in the above case, 113ms and 193ms fall into the same 101-200ms group.
Parameter (2) is the group amount. In the above case, there are 21 (param value + 1) groups: 0-100ms, 101-200ms, ..., 1901-2000ms, and 2000+ms.
- `apdex`. See [Apdex in WIKI](https://en.wikipedia.org/wiki/Apdex).
> service_apdex = from(Service.latency).apdex(name, status);
In this case, the apdex score of each service.
Parameter (1) is the service name, which determines the Apdex threshold value loaded from service-apdex-threshold.yml in the config folder.
Parameter (2) is the status of this request. The status (success/failure) affects the Apdex calculation.
- `p99`, `p95`, `p90`, `p75`, `p50`. See [percentile in WIKI](https://en.wikipedia.org/wiki/Percentile).
> all_percentile = from(All.latency).percentile(10);
**percentile** is the first multiple-value metric, introduced in 7.0.0. As a metric with multiple values, it can be queried through the `getMultipleLinearIntValues` GraphQL query.
In this case, `p99`, `p95`, `p90`, `p75`, and `p50` of all incoming requests are calculated. The parameter is the precision of the latency calculation; in the above case, 120ms and 124ms are considered the same.
Before 7.0.0, the `p99`, `p95`, `p90`, `p75` and `p50` functions were used to calculate these metrics separately. They are still supported in 7.x, but are no longer recommended and are not included in the official OAL script.
> all_p99 = from(All.latency).p99(10);
In this case, the p99 value of all incoming requests. The parameter is the precision of the p99 latency calculation; in the above case, 120ms and 124ms are considered the same.
## Metrics name
The metrics name is used by the storage implementor, alarm, and query modules. Type inference is supported by the core.
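For example (a sketch reusing the grammar above), the identifier on the left-hand side is the metrics name that those modules refer to:

```
// service_sla is the metrics name used by the storage, alarm, and query
// modules; the metric is the percentage of successful calls per service.
service_sla = from(Service.*).percent(status == true);
```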
## Group
All metrics data will be grouped by Scope.ID and min-level TimeBucket.
- In the `Endpoint` scope, the Scope.ID is the same as the Endpoint ID (i.e. the unique ID based on the service and its endpoint).
## Disable
`Disable` is an advanced statement in OAL, which is only used in certain cases.
Some aggregations and metrics are defined through core hard codes; examples include `segment` and `top_n_database_statement`.
This `disable` statement is designed to render them inactive.
By default, none of them are disabled.
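A minimal sketch of the statement, using the two hard-coded metrics named above:

```
// Render the hard-coded segment and top_n_database_statement metrics inactive.
disable(segment);
disable(top_n_database_statement);
```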
## Examples
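The following sketch simply recombines the constructs documented above; the metric names are illustrative rather than a copy of the official script:

```
// Average response time of each service.
service_resp_time = from(Service.latency).longAvg();

// Percentage of successful calls of each endpoint.
endpoint_success_percent = from(Endpoint.*).percent(status == true);

// Number of endpoint calls slower than 1000ms.
endpoint_slow_sum = from(Endpoint.*).filter(latency > 1000).sum();

// De-activate a hard-coded metric.
disable(segment);
```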