Unverified commit 54b72c88, authored by wu-sheng, committed by GitHub

Enhance documents about the data report and query protocols. (#8041)

Parent: 149b359c
......@@ -61,6 +61,8 @@ Release Notes.
#### Documentation
* Enhance documents about the data report and query protocols.
All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/101?closed=1)
------------------
......
# Protocols
There are two different types of protocols.
- [**Probe Protocol**](#probe-protocols). It includes descriptions and definitions on how agents send collected metrics data and traces, as well as the format of each entity.
- [**Query Protocol**](#query-protocol). The backend enables the query function in SkyWalking's own UI and other UIs. These queries are based on GraphQL.
# Probe Protocol
It includes descriptions and definitions on how agents send collected metrics, logs, traces and events, as well as the format of each entity.
## Probe Protocols
They are also related to the probe groups. For more information, see [Concepts and Designs](../concepts-and-designs/overview.md).
These groups are **language-based native agent protocol**, **service mesh protocol** and **3rd-party instrument protocol**.
### Language-based native agent protocol
There are two types of protocols that help language agents work in distributed environments.
1. **Cross Process Propagation Headers Protocol** and **Cross Process Correlation Headers Protocol** use an in-wire data format. The agent/SDK usually uses HTTP/MQ/HTTP2 headers
to carry the data with the RPC request. The remote agent receives this data in the request handler and binds the context to this specific request.
1. **Trace Data Protocol** is an out-of-wire data format. The agent/SDK uses it to send traces and metrics to SkyWalking or other compatible backends.
### Tracing
There are two types of protocols that help language agents work in distributed tracing.
- **Cross Process Propagation Headers Protocol** and **Cross Process Correlation Headers Protocol** use an in-wire data format. The agent/SDK usually uses HTTP/MQ/HTTP2 headers
to carry the data with the RPC request. The remote agent receives this data in the request handler and binds the context to this specific request.
[Cross Process Propagation Headers Protocol v3](Skywalking-Cross-Process-Propagation-Headers-Protocol-v3.md) has been the new protocol for in-wire context propagation since the 8.0.0 release.
[Cross Process Correlation Headers Protocol v1](Skywalking-Cross-Process-Correlation-Headers-Protocol-v1.md) is a new, additional, and optional in-wire context propagation protocol.
This protocol defines the data format for transporting custom data with the `Cross Process Propagation Headers Protocol`.
It has been supported by the SkyWalking Java agent since 8.0.0.
Please read the SkyWalking language agents documentation to see whether it is supported.
- **Trace Data Protocol** is an out-of-wire data format. The agent/SDK uses it to send traces to the SkyWalking OAP server.
[SkyWalking Trace Data Protocol v3](Trace-Data-Protocol-v3.md) defines the communication method and format between the agent and backend.
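To make the in-wire idea concrete, the following Python sketch encodes and decodes an `sw8` propagation header. It assumes the field layout described in the linked v3 document: a sample flag, then trace ID, parent segment ID, parent span ID, parent service, parent service instance, parent endpoint, and the target address used by the client, joined by `-`, with every field except the sample flag and the span ID Base64-encoded. Verify the layout against that document before relying on it. `Sw8Context`, `encode_sw8`, and `decode_sw8` are hypothetical helpers, not part of any SkyWalking agent API.

```python
import base64
from dataclasses import dataclass


def _b64(text: str) -> str:
    # Standard Base64 never emits '-', so '-' is safe as the field separator.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def _unb64(text: str) -> str:
    return base64.b64decode(text).decode("utf-8")


@dataclass
class Sw8Context:
    sampled: bool
    trace_id: str
    parent_segment_id: str
    parent_span_id: int
    parent_service: str
    parent_instance: str
    parent_endpoint: str
    target_address: str  # address the client used to reach the server


def encode_sw8(ctx: Sw8Context) -> str:
    """Build the `sw8` header value (assumed layout: eight '-'-separated fields)."""
    return "-".join([
        "1" if ctx.sampled else "0",
        _b64(ctx.trace_id),
        _b64(ctx.parent_segment_id),
        str(ctx.parent_span_id),  # plain integer, not Base64
        _b64(ctx.parent_service),
        _b64(ctx.parent_instance),
        _b64(ctx.parent_endpoint),
        _b64(ctx.target_address),
    ])


def decode_sw8(value: str) -> Sw8Context:
    """Parse an incoming `sw8` header value in the server-side request handler."""
    parts = value.split("-")
    if len(parts) != 8:
        raise ValueError(f"sw8 expects 8 fields, got {len(parts)}")
    return Sw8Context(
        sampled=parts[0] == "1",
        trace_id=_unb64(parts[1]),
        parent_segment_id=_unb64(parts[2]),
        parent_span_id=int(parts[3]),
        parent_service=_unb64(parts[4]),
        parent_instance=_unb64(parts[5]),
        parent_endpoint=_unb64(parts[6]),
        target_address=_unb64(parts[7]),
    )


# Client side: attach the header to the outgoing RPC request.
ctx = Sw8Context(True, "trace-1", "segment-1", 3, "order-svc", "order-1",
                 "/orders", "10.0.0.8:8080")
headers = {"sw8": encode_sw8(ctx)}
# Server side: rebuild the parent context from the received header.
restored = decode_sw8(headers["sw8"])
```

Base64-encoding the string fields is what lets values such as `trace-1`, which themselves contain `-`, travel safely inside a `-`-separated header.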
### Logging
- **Log Data Protocol** is an out-of-wire data format. The agent/SDK and collectors use it to send logs to the SkyWalking OAP server.
[SkyWalking Log Data Protocol](Log-Data-Protocol.md) defines the communication method and format between the agent and backend.
### Metrics
SkyWalking has a native metrics format and supports widely used metric formats, such as Prometheus, OpenCensus, and Zabbix.
The native metrics format definition can be found [here](https://github.com/apache/skywalking-data-collect-protocol/blob/master/language-agent/Meter.proto).
Typically, an agent meter plugin (e.g. [Java Meter Plugin](https://skywalking.apache.org/docs/skywalking-java/latest/en/setup/service-agent/java-agent/java-plugin-development-guide/#meter-plugin)) and the
Satellite [Prometheus fetcher](https://skywalking.apache.org/docs/skywalking-satellite/latest/en/setup/plugins/fetcher_prometheus-metrics-fetcher/)
convert metrics into the native format and forward them to the SkyWalking OAP server.
For receiving metrics in 3rd-party formats, read the [Meter receiver](../setup/backend/backend-meter.md) and [OpenTelemetry receiver](../setup/backend/backend-receivers.md#opentelemetry-receiver) docs for more details.
### Browser probe protocol
The browser probe, such as [skywalking-client-js](https://github.com/apache/skywalking-client-js), could use this protocol to send data to the backend. This service is provided by gRPC.
[SkyWalking Browser Protocol](Browser-Protocol.md) defines the communication method and format between `skywalking-client-js` and backend.
### Service Mesh probe protocol
The probe in a sidecar or proxy can use this protocol to send data to the backend. This service is provided by gRPC and requires
the following key information:
1. Service Name or ID on both sides.
1. Service Instance Name or ID on both sides.
1. Endpoint. URI in HTTP, service method full signature in gRPC.
1. Latency. In milliseconds.
1. Response code in HTTP.
1. Status. Success or fail.
1. Protocol. HTTP or gRPC.
1. DetectPoint. In a Service Mesh sidecar, `client` or `server`. In a normal L7 proxy, the value is `proxy`.
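As an illustration of the key information listed above, here is a small Python validator for one mesh call record. The `MeshCall` class, its field names, and `validate_mesh_call` are hypothetical; the real gRPC message definitions live in the SkyWalking data collect protocol repository and may name or type these fields differently. Requiring a response code only for HTTP calls is an interpretation of the "Response code in HTTP" item, not a documented backend rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DetectPoint(Enum):
    CLIENT = "client"
    SERVER = "server"
    PROXY = "proxy"


@dataclass
class MeshCall:
    # Hypothetical field names, one per item in the list above.
    source_service: str
    source_instance: str
    dest_service: str
    dest_instance: str
    endpoint: str                 # HTTP URI or full gRPC method signature
    latency_ms: int
    response_code: Optional[int]  # meaningful for HTTP only
    success: bool
    protocol: str                 # "HTTP" or "gRPC"
    detect_point: DetectPoint


def validate_mesh_call(call: MeshCall, from_sidecar: bool) -> List[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    for name in ("source_service", "source_instance", "dest_service",
                 "dest_instance", "endpoint"):
        if not getattr(call, name):
            problems.append(f"{name} is empty")
    if call.latency_ms < 0:
        problems.append("latency_ms must be >= 0")
    if call.protocol not in ("HTTP", "gRPC"):
        problems.append(f"unsupported protocol {call.protocol!r}")
    if call.protocol == "HTTP" and call.response_code is None:
        problems.append("HTTP calls need a response code")
    # A sidecar reports client/server; a normal L7 proxy reports proxy.
    allowed = ({DetectPoint.CLIENT, DetectPoint.SERVER} if from_sidecar
               else {DetectPoint.PROXY})
    if call.detect_point not in allowed:
        problems.append(f"detect point {call.detect_point.value} is not valid for this probe")
    return problems


call = MeshCall("frontend", "frontend-1", "orders", "orders-2", "/orders",
                12, 200, True, "HTTP", DetectPoint.CLIENT)
```

With `call` as above, `validate_mesh_call(call, from_sidecar=True)` reports nothing, while the same record reported by a plain proxy is flagged because of its `client` detect point.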
### Events Report Protocol
The protocol is used to report events to the backend. The [doc](../concepts-and-designs/event.md) introduces the definition of an event, and [the protocol repository](https://github.com/apache/skywalking-data-collect-protocol/blob/master/event) defines gRPC services and message formats of events.
......@@ -69,12 +64,3 @@ JSON event record example:
}
]
```
### 3rd-party instrument protocol
3rd-party instrument protocols are not defined by SkyWalking. They are just protocols/formats with which SkyWalking is compatible, and SkyWalking can receive them from their existing libraries. SkyWalking started with support for the Zipkin v1 and v2 data formats.
The backend has a modular design, so it is easy to extend it with a new receiver to support a new protocol/format.
## Query Protocol
The query protocol follows GraphQL grammar and provides data query capabilities, which depend on your analysis metrics.
Read [query protocol doc](query-protocol.md) for more details.
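Because the queries are plain GraphQL, any HTTP client can issue them by POSTing a JSON body with `query` and `variables` fields, which is the common GraphQL-over-HTTP convention. The Python sketch below only builds the request with the standard library. The endpoint URL, the `id`/`name` fields selected on `Service`, and the `"yyyy-MM-dd HHmm"` time format for a `MINUTE` step are assumptions; check them against the query protocol doc and your OAP configuration.

```python
import json
import urllib.request

# Assumed OAP GraphQL endpoint; adjust host, port, and path for your deployment.
OAP_GRAPHQL_URL = "http://127.0.0.1:12800/graphql"

LIST_SERVICES = """
query ListServices($duration: Duration!) {
  getAllServices(duration: $duration) {
    id
    name
  }
}
"""


def build_graphql_request(url: str, query: str, variables: dict) -> urllib.request.Request:
    """Wrap a GraphQL document and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(
    OAP_GRAPHQL_URL,
    LIST_SERVICES,
    # Assumed Duration format for a MINUTE step: "yyyy-MM-dd HHmm".
    {"duration": {"start": "2021-10-21 1000", "end": "2021-10-21 1030", "step": "MINUTE"}},
)
# Sending it needs a running OAP server, e.g.:
#   with urllib.request.urlopen(request, timeout=10) as resp:
#       print(json.load(resp))
```

Passing values through `variables` rather than formatting them into the query string keeps the GraphQL document static and avoids quoting problems.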
......@@ -42,3 +42,183 @@ See [Cross Process Propagation Headers Protocol v3](Skywalking-Cross-Process-Pro
4. `Span#skipAnalysis` may be TRUE, if this span doesn't require backend analysis.
### Protocol Definition
```protobuf
// The segment is a collection of spans. It includes all collected spans in a single request context, such as an HTTP request process.
//
// We recommend that the agent/SDK report all tracked data of one request at once.
// Typically, such as in Java, one segment represents all tracked operations (spans) of one request context in the same thread.
// In some languages without a clear thread concept, such as Golang, it could represent all tracked operations of one request context.
message SegmentObject {
// A string id represents the whole trace.
string traceId = 1;
// A unique id represents this segment. Other segments could use this id to reference as a child segment.
string traceSegmentId = 2;
// Span collections included in this segment.
repeated SpanObject spans = 3;
// **Service**. Represents a set/group of workloads which provide the same behaviours for incoming requests.
//
// The logic name represents the service. This would show as a separate node in the topology.
// The metrics analyzed from the spans, would be aggregated for this entity as the service level.
string service = 4;
// **Service Instance**. Each individual workload in the Service group is known as an instance. Like `pods` in Kubernetes, it
// doesn't need to be a single OS process, however, if you are using instrument agents, an instance is actually a real OS process.
//
// The logic name represents the service instance. This would show as a separate node in the instance relationship.
// The metrics analyzed from the spans, would be aggregated for this entity as the service instance level.
string serviceInstance = 5;
// Whether the segment includes all tracked spans.
// In a production environment, some tasks could include too many spans for one request context, such as a batch update for a cache, or an async job.
// The agent/SDK could optimize or ignore some tracked spans for better performance.
// In this case, the value should be flagged as TRUE.
bool isSizeLimited = 6;
}
// Segment reference represents the link between two existing segments.
message SegmentReference {
// Represent the reference type. It could be across thread or across process.
// Across process means there is a downstream RPC call for this.
// Typically, refType == CrossProcess means SpanObject#spanType = entry.
RefType refType = 1;
// A string id represents the whole trace.
string traceId = 2;
// Another segment id as the parent.
string parentTraceSegmentId = 3;
// The span id in the parent trace segment.
int32 parentSpanId = 4;
// The service logic name of the parent segment.
// If refType == CrossThread, this name is the same as that of the current trace segment.
string parentService = 5;
// The logic name of the service instance of the parent segment.
// If refType == CrossThread, this name is the same as that of the current trace segment.
string parentServiceInstance = 6;
// The endpoint name of the parent segment.
// **Endpoint**. A path in a service for incoming requests, such as an HTTP URI path or a gRPC service class + method signature.
// In a trace segment, the endpoint name is the name of first entry span.
string parentEndpoint = 7;
// The network address, including ip/hostname and port, which is used in the client side.
// Such as Client --> use 127.0.11.8:913 -> Server
// then, in the reference of entry span reported by Server, the value of this field is 127.0.11.8:913.
// This plays the important role in the SkyWalking STAM(Streaming Topology Analysis Method)
// For more details, read https://wu-sheng.github.io/STAM/
string networkAddressUsedAtPeer = 8;
}
// Span represents an execution unit in the system, with duration and many other attributes.
// A span could be a method, an RPC, or an MQ message produce or consume.
// In practice, a span should be added only when it is really necessary, to avoid payload overhead.
// We recommend creating spans only in cross-process (client/server of RPC/MQ) and cross-thread cases.
message SpanObject {
// The number id of the span. Should be unique in the whole segment.
// Starting at 0.
int32 spanId = 1;
// The number id of the parent span in the whole segment.
// -1 represents no parent span.
// This is also known as the root/first span of the segment.
int32 parentSpanId = 2;
// Start timestamp in milliseconds of this span,
// measured between the current time and midnight, January 1, 1970 UTC.
int64 startTime = 3;
// End timestamp in milliseconds of this span,
// measured between the current time and midnight, January 1, 1970 UTC.
int64 endTime = 4;
// <Optional>
// In the cross-thread and cross-process cases, these references target the parent segments.
// The references usually have only one element, but in batch consumer case, such as in MQ or async batch process, it could be multiple.
repeated SegmentReference refs = 5;
// A logic name represents this span.
//
// We don't recommend including parameters, such as HTTP request parameters, as a part of the operation name, especially when this is the name of the entry span.
// All statistics for the endpoints are aggregated based on this name. Those parameters should be added in the tags if necessary.
// If, in some cases, they have to be a part of the operation name,
// users should use the Group Parameterized Endpoints capability at the backend to get meaningful metrics.
// Read https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/endpoint-grouping-rules.md
string operationName = 6;
// Remote address of the peer in RPC/MQ case.
// This is required when spanType = Exit, as it is a part of the SkyWalking STAM(Streaming Topology Analysis Method).
// For more details, read https://wu-sheng.github.io/STAM/
string peer = 7;
// Span type represents the role in the RPC context.
SpanType spanType = 8;
// Span layer represents the component tech stack, related to the network tech.
SpanLayer spanLayer = 9;
// Component id is a predefined number id in SkyWalking.
// It represents the framework, tech stack used by this tracked span, such as Spring.
// All IDs are defined in https://github.com/apache/skywalking/blob/master/oap-server/server-bootstrap/src/main/resources/component-libraries.yml
// Send a pull request if you want to add languages, components or mapping definitions,
// all public components could be accepted.
// Follow this doc for more details, https://github.com/apache/skywalking/blob/master/docs/en/guides/Component-library-settings.md
int32 componentId = 10;
// The status of the span. False means the tracked execution ends in the unexpected status.
// This affects the successful rate statistic in the backend.
// Exception or error code happened in the tracked process doesn't mean isError == true, the implementations of agent plugin and tracing SDK make the final decision.
bool isError = 11;
// String key, String value pair.
// Tags provide more information, including parameters.
//
// In the OAP backend analysis, some special tag or tag combination could provide other advanced features.
// https://github.com/apache/skywalking/blob/master/docs/en/guides/Java-Plugin-Development-Guide.md#special-span-tags
repeated KeyStringValuePair tags = 12;
// String key, String value pair with an accurate timestamp.
// Logs events that happen within the span's duration.
repeated Log logs = 13;
// Force the backend to skip analysis if the value is TRUE.
// The backend has its own configurations to follow or override this.
//
// This exists mostly because the agent/SDK may know more about the context of the service's role.
bool skipAnalysis = 14;
}
message Log {
// The timestamp in milliseconds of this event,
// measured between the current time and midnight, January 1, 1970 UTC.
int64 time = 1;
// String key, String value pair.
repeated KeyStringValuePair data = 2;
}
// Map to the type of span
enum SpanType {
// Server side of RPC. Consumer side of MQ.
Entry = 0;
// Client side of RPC. Producer side of MQ.
Exit = 1;
// A common local code execution.
Local = 2;
}
// An ID could be represented by multiple string sections.
message ID {
repeated string id = 1;
}
// Type of the reference
enum RefType {
// Map to the reference targeting the segment in another OS process.
CrossProcess = 0;
// Map to the reference targeting the segment in the same process of the current one, just across thread.
// This is only used when the coding language has the thread concept.
CrossThread = 1;
}
// Map to the layer of span
enum SpanLayer {
// Unknown layer. Could be anything.
Unknown = 0;
// A database layer, used in tracing the database client component.
Database = 1;
// An RPC layer, used on both the client and server sides of the RPC component.
RPCFramework = 2;
// HTTP is a more specific RPCFramework.
Http = 3;
// An MQ layer, used on both the producer and consumer sides of the MQ component.
MQ = 4;
// A cache layer, used in tracing the cache client component.
Cache = 5;
}
// The segment collections for trace report in batch and sync mode.
message SegmentCollection {
repeated SegmentObject segments = 1;
}
```
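The comments in the definition above carry several structural rules: span IDs are unique within a segment and start at 0, `parentSpanId == -1` marks the root span, an exit span needs a `peer`, a cross-process reference typically sits on an entry span, and the segment's endpoint is the name of its first entry span. The Python sketch below turns those rules into client-side sanity checks. The class and function names are hypothetical, only a subset of the message fields is modelled, and two points are assumptions: that a segment has exactly one root span, and that "first entry span" means the entry span with the lowest `spanId`. The OAP backend does not necessarily enforce any of these checks.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class SpanType(IntEnum):
    Entry = 0
    Exit = 1
    Local = 2


class RefType(IntEnum):
    CrossProcess = 0
    CrossThread = 1


@dataclass
class SegmentReference:
    ref_type: RefType
    parent_trace_segment_id: str
    parent_span_id: int


@dataclass
class Span:
    span_id: int
    parent_span_id: int
    operation_name: str
    span_type: SpanType
    start_time: int  # epoch milliseconds
    end_time: int    # epoch milliseconds
    peer: str = ""
    refs: List[SegmentReference] = field(default_factory=list)


def check_segment(spans: List[Span]) -> List[str]:
    """Check a few rules stated in the SpanObject/SegmentReference comments."""
    problems = []
    ids = [s.span_id for s in spans]
    if len(ids) != len(set(ids)):
        problems.append("spanId must be unique within the segment")
    roots = [s for s in spans if s.parent_span_id == -1]
    if len(roots) != 1:  # assumption: exactly one root span per segment
        problems.append(f"expected one root span, found {len(roots)}")
    known = set(ids)
    for s in spans:
        if s.parent_span_id != -1 and s.parent_span_id not in known:
            problems.append(f"span {s.span_id}: unknown parent {s.parent_span_id}")
        if s.end_time < s.start_time:
            problems.append(f"span {s.span_id}: endTime before startTime")
        if s.span_type is SpanType.Exit and not s.peer:
            problems.append(f"span {s.span_id}: exit span needs a peer")
        for ref in s.refs:
            # The proto only says this is "typically" the case, so treat it as a hint.
            if ref.ref_type is RefType.CrossProcess and s.span_type is not SpanType.Entry:
                problems.append(f"span {s.span_id}: cross-process ref on a non-entry span")
    return problems


def segment_endpoint(spans: List[Span]) -> Optional[str]:
    """Endpoint name = name of the first entry span (assumed: lowest spanId)."""
    entries = [s for s in spans if s.span_type is SpanType.Entry]
    return min(entries, key=lambda s: s.span_id).operation_name if entries else None


segment = [
    Span(0, -1, "/orders", SpanType.Entry, 1000, 1050,
         refs=[SegmentReference(RefType.CrossProcess, "parent-seg", 2)]),
    Span(1, 0, "SELECT orders", SpanType.Exit, 1010, 1040, peer="db:3306"),
]
```

Here `check_segment(segment)` finds no problems and `segment_endpoint(segment)` yields `/orders`; dropping the `peer` from the exit span would be reported.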
......@@ -9,17 +9,17 @@ Metadata contains concise information on all services and their instances, endpo
You may query the metadata in different ways.
```graphql
extend type Query {
getGlobalBrief(duration: Duration!): ClusterBrief
# Normal service related metainfo
getAllServices(duration: Duration!): [Service!]!
# Normal service related meta info
getAllServices(duration: Duration!, group: String): [Service!]!
searchServices(duration: Duration!, keyword: String!): [Service!]!
searchService(serviceCode: String!): Service
# Fetch all services of Browser type
getAllBrowserServices(duration: Duration!): [Service!]!
searchBrowserServices(duration: Duration!, keyword: String!): [Service!]!
searchBrowserService(serviceCode: String!): Service
# Service intance query
# Service instance query
getServiceInstances(duration: Duration!, serviceId: ID!): [ServiceInstance!]!
# Endpoint query
......@@ -127,12 +127,51 @@ extend type Query {
}
```
### Others
The following queries are for specific features, including trace, alarm, and profile.
1. Trace. Query distributed traces by this.
1. Alarm. Through alarm query, you can find alarm trends and their details.
### Logs
```graphql
extend type Query {
# Return true if the current storage implementation supports fuzzy query for logs.
supportQueryLogsByKeywords: Boolean!
queryLogs(condition: LogQueryCondition): Logs
# Test the logs and get the results of the LAL output.
test(requests: LogTestRequest!): LogTestResponse!
}
```
Log query implementations differ slightly across storage options. Search engines, e.g. ElasticSearch and OpenSearch, support
full-text fuzzy queries on log content. Other storage options do not, considering the performance impact and end-user experience.
The `test` API is provided for the debugging tool of native LAL (Log Analysis Language) parsing.
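Since fuzzy keyword search depends on the storage, a client can first ask `supportQueryLogsByKeywords` and only then put keywords into the `queryLogs` condition. The Python sketch below builds the condition variables under that flow. The `LogQueryCondition` field names it uses (`paging` with `pageNum`/`pageSize`, `serviceId`, `keywordsOfContent`) are assumptions to verify against the log GraphQL schema in the `query-protocol` repository, and `build_log_condition` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def build_log_condition(
    supports_keywords: bool,
    keywords: List[str],
    service_id: Optional[str] = None,
    page_num: int = 1,
    page_size: int = 20,
) -> Dict[str, object]:
    """Build the `condition` variable for queryLogs(condition: LogQueryCondition).

    supports_keywords is the Boolean returned by supportQueryLogsByKeywords.
    Field names here are assumptions, not a verified schema.
    """
    condition: Dict[str, object] = {
        "paging": {"pageNum": page_num, "pageSize": page_size},
    }
    if service_id:
        condition["serviceId"] = service_id
    if keywords:
        if not supports_keywords:
            # Storage without full-text search cannot run fuzzy content queries.
            raise ValueError("fuzzy log queries are not supported by this storage")
        condition["keywordsOfContent"] = list(keywords)
    return condition


with_keywords = build_log_condition(True, ["timeout"], service_id="svc-1")
plain = build_log_condition(False, [])
```

Failing fast when keywords are requested against a non-search-engine storage avoids silently returning an unfiltered log list.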
### Trace
```graphql
extend type Query {
queryBasicTraces(condition: TraceQueryCondition): TraceBrief
queryTrace(traceId: ID!): Trace
}
```
The trace query fetches the trace segment list and the spans of a given trace ID.
### Alarm
```graphql
extend type Query {
getAlarmTrend(duration: Duration!): AlarmTrend!
getAlarm(duration: Duration!, scope: Scope, keyword: String, paging: Pagination!, tags: [AlarmTag]): Alarms
}
```
The alarm query fetches detected alerting messages along with their related events.
### Event
```graphql
extend type Query {
queryEvents(condition: EventQueryCondition): Events
}
```
The actual query GraphQL scripts can be found in the `query-protocol` folder [here](../../../oap-server/server-query-plugin/query-graphql-plugin/src/main/resources).
The event query fetches the event list according to the given sources and time range conditions.
## Condition
### Duration
......
......@@ -172,7 +172,11 @@ catalog:
- name: "Compiling Guide"
path: "/en/guides/How-to-build"
- name: "Protocols"
path: "/en/protocols/readme"
catalog:
- name: "Data Report(Probe/Agent) Protocol"
path: "/en/protocols/readme"
- name: "Query Protocol (GraphQL)"
path: "/en/protocols/query-protocol"
- name: "FAQs"
path: "/en/FAQ/readme"
- name: "Changelog"
......