# Observability Analysis Language
OAL (Observability Analysis Language) is provided to analyze incoming data in streaming mode.

OAL focuses on metrics in Service, Service Instance and Endpoint. Because of this narrow focus, the language is easy to
learn and use.

For the sake of performance, readability and debugging, OAL is defined as a compiled language.
OAL scripts are compiled to normal Java code in the packaging stage.

## Grammar
Scripts should be named `*.oal`.
```

METRIC_NAME = from(SCOPE.(* | [FIELD][,FIELD ...]))
[.filter(FIELD OP [INT | STRING])]
.FUNCTION([PARAM][, PARAM ...])
```
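
To make the shape of a statement concrete, here is a minimal Python sketch that splits one OAL statement into the parts named in the grammar above. The regular expression and the `parse_statement` helper are hypothetical illustrations; they are not the parser SkyWalking uses, and they only cover single-line statements.

```python
import re

# Hypothetical pattern that mirrors the grammar above:
# METRIC_NAME = from(SCOPE.(* | FIELD[, FIELD ...]))[.filter(...)].FUNCTION([PARAM ...])
STATEMENT = re.compile(
    r"^\s*(?P<metric>\w+)\s*=\s*"
    r"from\(\s*(?P<scope>\w+)\.(?P<fields>\*|\w+(?:\s*,\s*\w+)*)\s*\)"
    r"(?:\.filter\((?P<filter>.*)\))?"
    r"\.(?P<function>\w+)\((?P<params>[^()]*)\)\s*$"
)


def parse_statement(line):
    """Split one OAL statement into metric, scope, fields, filter, function, params."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not a valid OAL statement: {line!r}")
    parts = match.groupdict()
    # Turn the comma-separated field and parameter lists into Python lists.
    parts["fields"] = [field.strip() for field in parts["fields"].split(",")]
    parts["params"] = [p.strip() for p in parts["params"].split(",") if p.strip()]
    return parts


print(parse_statement("Endpoint_avg = from(Endpoint.latency).avg()"))
```

The `filter` entry is `None` when the optional `.filter(...)` clause is absent; its content is returned as raw text and is not parsed further in this sketch.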

## Scope
Primary **SCOPE**s are `All`, `Service`, `ServiceInstance`, `Endpoint`, `ServiceRelation`, `ServiceInstanceRelation`, `EndpointRelation`.
There are also some secondary scopes, each of which belongs to one primary scope.

## Field
When an aggregation function is used, requests are grouped by time and by the **Group Key(s)** of each scope.

### SCOPE `All`

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| endpoint | Represents the endpoint path of each request. | | string |
| latency | Represents the time each request takes. | | int (in ms) |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call, e.g. 200, 404, 302. | | int |


### SCOPE `Service`

Calculate the metric data from each request of the service. 

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the service. | yes | int |
| name | Represents the name of the service. | | string |
| serviceInstanceName | Represents the name of the referenced service instance. | | string |
| endpointName | Represents the name of the endpoint, such as the full path of an HTTP URI. | | string |
| latency | Represents the time each request takes. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |

### SCOPE `ServiceInstance`

Calculate the metric data from each request of the service instance. 

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the service instance, usually a number. | yes | int |
| name | Represents the name of the service instance, such as `ip:port@Service Name`. **Notice**: the current native agent uses `processId@Service name` as the instance name, which is not useful when you want to set up a filter in aggregation. | | string |
| serviceName | Represents the name of the service. | | string |
| endpointName | Represents the name of the endpoint, such as the full path of an HTTP URI. | | string |
| latency | Represents the time each request takes. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |

#### Secondary scopes of `ServiceInstance` 

Calculate the metric data if the service instance is a JVM and the data is collected by the Java agent.

1. SCOPE `ServiceInstance_JVM_CPU`

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the service instance, usually a number. | yes | int |
| name | Represents the name of the service instance, such as `ip:port@Service Name`. **Notice**: the current native agent uses `processId@Service name` as the instance name, which is not useful when you want to set up a filter in aggregation. | | string |
| serviceName | Represents the name of the service. | | string |
| use_percent | Represents the percentage of CPU time used. | | double |

2. SCOPE `ServiceInstance_JVM_Memory`

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the service instance, usually a number. | yes | int |
| name | Represents the name of the service instance, such as `ip:port@Service Name`. **Notice**: the current native agent uses `processId@Service name` as the instance name, which is not useful when you want to set up a filter in aggregation. | | string |
| serviceName | Represents the name of the service. | | string |
| isHeap | Represents whether the memory metric values are for heap memory or not. | | bool |
| init | See the JVM documentation. | | long |
| max | See the JVM documentation. | | long |
| used | See the JVM documentation. | | long |
| committed | See the JVM documentation. | | long |

3. SCOPE `ServiceInstance_JVM_Memory_Pool`

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the service instance, usually a number. | yes | int |
| name | Represents the name of the service instance, such as `ip:port@Service Name`. **Notice**: the current native agent uses `processId@Service name` as the instance name, which is not useful when you want to set up a filter in aggregation. | | string |
| serviceName | Represents the name of the service. | | string |
| poolType | One of CODE_CACHE_USAGE, NEWGEN_USAGE, OLDGEN_USAGE, SURVIVOR_USAGE, PERMGEN_USAGE, METASPACE_USAGE, depending on the JVM version. | | enum |
| init | See the JVM documentation. | | long |
| max | See the JVM documentation. | | long |
| used | See the JVM documentation. | | long |
| committed | See the JVM documentation. | | long |

### SCOPE `Endpoint`

Calculate the metric data from each request of the endpoint in the service. 

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| id | Represents the unique id of the endpoint, usually a number. | yes | int |
| name | Represents the name of the endpoint, such as the full path of an HTTP URI. | | string |
| serviceName | Represents the name of the service. | | string |
| serviceInstanceName | Represents the name of the referenced service instance. | | string |
| latency | Represents the time each request takes. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |

### SCOPE `ServiceRelation`

Calculate the metric data from each request between one service and another service.

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| sourceServiceId | Represents the id of the source service. | yes | int |
| sourceServiceName | Represents the name of the source service. | | string |
| sourceServiceInstanceName | Represents the name of the source service instance. | | string |
| destServiceId | Represents the id of the destination service. | yes | string |
| destServiceName | Represents the name of the destination service. | | string |
| destServiceInstanceName | Represents the name of the destination service instance. | | string |
| endpoint | Represents the endpoint used in this call. | | string |
| latency | Represents the time each request takes. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |
| detectPoint | Represents where the relation is detected. Values: client, server, proxy. | yes | enum |


### SCOPE `ServiceInstanceRelation`

Calculate the metric data from each request between one service instance and another service instance.

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| sourceServiceInstanceId | Represents the id of the source service instance. | yes | int |
| sourceServiceName | Represents the name of the source service. | | string |
| sourceServiceInstanceName | Represents the name of the source service instance. | | string |
| destServiceName | Represents the name of the destination service. | | string |
| destServiceInstanceId | Represents the id of the destination service instance. | yes | int |
| destServiceInstanceName | Represents the name of the destination service instance. | | string |
| endpoint | Represents the endpoint used in this call. | | string |
| latency | Represents the time each request takes. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |
| detectPoint | Represents where the relation is detected. Values: client, server, proxy. | yes | enum |

### SCOPE `EndpointRelation`

Calculate the metric data of the dependency between one endpoint and another endpoint.
This relation is hard to detect, and it depends on the tracing library to propagate the previous endpoint.
So `EndpointRelation` scope aggregation takes effect only for services traced by SkyWalking native agents,
including auto-instrumentation agents (like Java, .NET), the OpenCensus SkyWalking exporter implementation, or others that propagate the tracing context according to the SkyWalking spec.

| Name | Remarks | Group Key | Type | 
|---|---|---|---|
| endpointId | Represents the id of the endpoint acting as the parent in the dependency. | yes | int |
| endpoint | Represents the endpoint acting as the parent in the dependency. | | string |
| childEndpointId | Represents the id of the endpoint used by the parent endpoint (`endpointId`). | yes | int |
| childEndpoint | Represents the endpoint used by the parent endpoint (`endpoint`). | | string |
| rpcLatency | Represents the latency of the RPC from code in the endpoint to the childEndpoint, excluding the latency caused by the parent endpoint itself. | | int |
| status | Represents whether the request succeeded or failed. | | bool (true for success) |
| responseCode | Represents the response code of the HTTP response, if the request is an HTTP call. | | int |
| type | Represents the type of each request, such as Database, HTTP, RPC, gRPC. | | enum |
| detectPoint | Represents where the relation is detected. Values: client, server, proxy. | yes | enum |

## Filter
Use a filter to build conditions on the values of fields, using the field name and an expression.

Expressions can be combined with `and`, `or` and `(...)`.
The supported OPs are `=`, `!=`, `>`, `<`, `in (v1, v2, ...)` and `like "%..."`, with type detection based on the field type.
An incompatible comparison triggers a compile or code generation error.
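
As an illustration of these operators, the following Python sketch evaluates a single `FIELD OP VALUE` condition against one request. The `check` and `like` helpers and the dictionary-based request are assumptions made for this example; note also that the sketch rejects incompatible types when the condition is evaluated, whereas OAL reports them at compile or code generation time.

```python
import re


def like(value, pattern):
    """SQL-style match: each "%" in the pattern matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


# One Python callable per OP listed above.
OPS = {
    "=": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    ">": lambda actual, expected: actual > expected,
    "<": lambda actual, expected: actual < expected,
    "in": lambda actual, expected: actual in expected,
    "like": like,
}


def check(request, field, op, expected):
    """Evaluate one condition against a request given as a dict (hypothetical shape)."""
    actual = request[field]
    # Type detection: every compared value must have the same type as the field.
    candidates = expected if op == "in" else [expected]
    if any(type(candidate) is not type(actual) for candidate in candidates):
        raise TypeError(f"{field} is {type(actual).__name__}, got {expected!r}")
    return OPS[op](actual, expected)


request = {"name": "serv/list", "latency": 120, "status": True, "responseCode": 200}
# `and`, `or` and parentheses map directly onto Python's boolean operators.
print(check(request, "name", "like", "serv%") and check(request, "latency", ">", 100))
```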

## Aggregation Function
The default functions are provided by the SkyWalking OAP core, and more can be implemented.

Provided functions:
- `avg()`. The average value. The field type must be number.
- `p99()`. The value that 99% of the given values are less than or equal to. The field type must be number.
- `p90()`. The value that 90% of the given values are less than or equal to. The field type must be number.
- `p75()`. The value that 75% of the given values are less than or equal to. The field type must be number.
- `p50()`. The value that 50% of the given values are less than or equal to. The field type must be number.
- `percent()`. The percentage of the data selected by the filter out of all the given data. No type requirement.
- `histogram(start, step)`. Group the given values by the given step, beginning with the start value.
- `sum()`. The total count of the data selected by the filter. No type requirement.
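
The functions above can be sketched in plain Python as follows. This is only an illustration of what each function computes, not the OAP core implementation: the nearest-rank percentile method, the handling of values below `start` in `histogram`, and all function names are assumptions made for this example.

```python
import math


def avg(values):
    """Average of numeric values."""
    return sum(values) / len(values)


def percentile(values, rank):
    """Nearest-rank percentile: smallest value that `rank` percent of values do not exceed."""
    ordered = sorted(values)
    index = max(math.ceil(rank * len(ordered) / 100) - 1, 0)
    return ordered[index]


def percent(rows, matches):
    """Percentage of rows selected by the `matches` predicate out of all rows."""
    selected = sum(1 for row in rows if matches(row))
    return 100 * selected / len(rows)


def histogram(values, start, step):
    """Count values per bucket; each bucket is keyed by its lower bound."""
    buckets = {}
    for value in values:
        # Assumption: values below `start` are counted in the first bucket.
        offset = max(value - start, 0) // step
        lower_bound = start + offset * step
        buckets[lower_bound] = buckets.get(lower_bound, 0) + 1
    return buckets


latencies = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(avg(latencies))               # 55.0
print(percentile(latencies, 99))    # 100
print(histogram(latencies, 0, 50))  # {0: 4, 50: 5, 100: 1}
```

In this sketch `p99()`, `p90()`, `p75()` and `p50()` correspond to `percentile` with ranks 99, 90, 75 and 50, and `sum()` corresponds to counting the selected rows with `len`.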

## Metric name
The metric name is used by the storage implementor, alarm and query modules. Type inference is supported by the core.

## Group
All metric data will be grouped by Scope.ID and min-level TimeBucket. 

- In `Endpoint` scope, the Scope.ID = Endpoint id (the unique id based on service and its Endpoint)
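
The grouping rule can be sketched in Python as below. The sketch assumes that "min-level" means minute-level and that a time bucket is written as a `yyyyMMddHHmm` number in UTC; the `time_bucket` and `group` helpers, the `timestamp_ms` field and the dictionary-based requests are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_bucket(timestamp_ms):
    """Minute-level bucket as a yyyyMMddHHmm number (assumed format, UTC)."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return int(moment.strftime("%Y%m%d%H%M"))


def group(requests):
    """Collect latencies per (Scope.ID, TimeBucket) pair."""
    groups = defaultdict(list)
    for request in requests:
        key = (request["id"], time_bucket(request["timestamp_ms"]))
        groups[key].append(request["latency"])
    return dict(groups)


requests = [
    {"id": 1, "timestamp_ms": 1546300800000, "latency": 100},  # 2019-01-01 00:00:00 UTC
    {"id": 1, "timestamp_ms": 1546300830000, "latency": 200},  # same minute
    {"id": 1, "timestamp_ms": 1546300860000, "latency": 300},  # next minute
    {"id": 2, "timestamp_ms": 1546300800000, "latency": 50},   # another endpoint id
]
print(group(requests))
```

An aggregation function would then be applied to each list of values independently, producing one metric value per id and time bucket.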

## Examples
```
// Calculate p99 of both Endpoint1 and Endpoint2
Endpoint_p99 = from(Endpoint.latency).filter(name in ("Endpoint1", "Endpoint2")).p99()

// Calculate p99 of Endpoints whose name starts with `serv`
serv_Endpoint_p99 = from(Endpoint.latency).filter(name like "serv%").p99()

// Calculate the avg response time of each Endpoint
Endpoint_avg = from(Endpoint.latency).avg()

// Calculate the histogram of each Endpoint by 50 ms steps.
// The thermodynamic diagram in the UI always matches this metric.
Endpoint_histogram = from(Endpoint.latency).histogram(50)

// Calculate the percent of responses whose status is true, for each endpoint.
Endpoint_success = from(Endpoint.*).filter(status = "true").percent()

// Calculate the percent of response codes in [200, 299], for each endpoint.
Endpoint_200 = from(Endpoint.*).filter(responseCode like "2%").percent()

// Calculate the percent of response codes in [500, 599], for each endpoint.
Endpoint_500 = from(Endpoint.*).filter(responseCode like "5%").percent()

// Calculate the sum of calls for each endpoint.
EndpointCalls = from(Endpoint.*).sum()
```