OpenTelemetry collector troubleshooting: telemetry service vs logging exporter
Upstream OTel has a short docs entry on how troubleshooting works for the collector. There’s only one problem, something I get asked on a regular basis and I’d like to address here: what’s the difference between the telemetry service and the logging exporter?
Let’s see it in action!
To follow along, make sure you’ve got the following installed: otel-cli, the OpenTelemetry collector (otelcol), and otel-desktop-viewer.
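For reference, a minimal install sketch, assuming a Go toolchain is on your PATH and that the module paths below are still current (check each project’s README if in doubt):
# otel-cli (the repo load.sh below is taken from)
go install github.com/equinix-labs/otel-cli@latest
# otel-desktop-viewer -- module path is an assumption, see its README
go install github.com/CtrlSpice/otel-desktop-viewer@latest
# the collector itself: grab an otelcol release binary for your platform from
# https://github.com/open-telemetry/opentelemetry-collector-releases/releases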
The setup is as follows:
[load.sh (otel-cli)] --OTLP:43180--> [OTel collector] --OTLP:4317--> [otel-desktop-viewer]
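Start otel-desktop-viewer first so the collector has something to export to; a sketch, assuming it listens for OTLP/gRPC on its default port 4317:
# by default accepts OTLP gRPC on 4317 (assumption; check --help) and serves a
# local web UI for browsing the received traces
otel-desktop-viewer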
In order to see some telemetry in the pipeline, we’re using the following script, which produces a few spans:
#!/bin/bash
# an otel-cli demo of span background
#
# This demo shows span background functionality with events added to the span
# while it's running in the background, then a child span is created and
# the background span is ended gracefully.
# Source: https://github.com/equinix-labs/otel-cli/blob/main/demos/15span-background-layered.sh
set -e
set -x
carrier=$(mktemp) # traceparent propagation via tempfile
sockdir=$(mktemp -d) # a unix socket will be created here
# start the span background server, set up trace propagation, and
# time out after 20 seconds (which shouldn't be reached)
otel-cli span background \
    --tp-carrier $carrier \
    --sockdir $sockdir \
    --tp-print \
    --service $0 \
    --name "$0 script execution" \
    --timeout 20 &
data1=$(uuidgen)
# add an event to the span running in the background, with an attribute
# set to the uuid we just generated
otel-cli span event \
    --name "did a thing" \
    --sockdir $sockdir \
    --attrs "data1=$data1"
# waste some time
sleep 1
# add an event that says we wasted some time
otel-cli span event --name "slept 1 second" --sockdir $sockdir
# run a shorter sleep inside a child span, also note that this is using
# --tp-required so this will fail loudly if there is no traceparent
# available
otel-cli exec \
    --service $0 \
    --name "sleep 0.2" \
    --tp-required \
    --tp-carrier $carrier \
    --tp-print \
    sleep 0.2
# finally, tell the background server we're all done and it can exit
otel-cli span end --sockdir $sockdir
Download load.sh.
Before you execute load.sh, make sure to set OTEL_EXPORTER_OTLP_ENDPOINT so that otel-cli knows to talk to our collector:
## Bash, zsh:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:43180
## Fish:
set -x OTEL_EXPORTER_OTLP_ENDPOINT http://localhost:43180
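Once one of the collector configs below is up and running, executing the script is simply (make it executable first; we’ll re-run it against each config):
chmod +x load.sh
./load.sh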
We start with the following OTel collector config (note that because otel-desktop-viewer is already running and using port 8888 to expose Prometheus metrics, I had to override the port the collector uses for its own metrics by setting service.telemetry.metrics.address to :8989):
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
service:
  telemetry:
    metrics:
      address: ":8989"
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
Download otel-collector-config_0.yaml.
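As an aside, you can have the collector check a config without starting the pipelines; a sketch, assuming your collector build ships the validate subcommand:
# parses and validates the config file, then exits
otelcol_0.84.0 validate --config otel-collector-config_0.yaml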
Running the collector with this config gives us:
$ otelcol_0.84.0 --config otel-collector-config_0.yaml
2023-09-07T12:50:58.671+0100 info service/telemetry.go:84 Setting up own telemetry...
2023-09-07T12:50:58.671+0100 info service/telemetry.go:201 Serving Prometheus metrics {"address": ":8989", "level": "Basic"}
2023-09-07T12:50:58.672+0100 info service/service.go:138 Starting otelcol... {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T12:50:58.672+0100 info extensions/extensions.go:31 Starting extensions...
2023-09-07T12:50:58.674+0100 warn internal@v0.84.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T12:50:58.675+0100 info otlpreceiver@v0.84.0/otlp.go:101 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T12:50:58.675+0100 info service/service.go:161 Everything is ready. Begin running and processing data.
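This is the telemetry service at work: the collector reporting on itself. The metrics half is the Prometheus endpoint we moved to :8989, which you can scrape directly; a sketch (metric names can vary across versions, and the per-pipeline span counters only appear once load.sh has actually sent data):
curl -s http://localhost:8989/metrics | grep '^otelcol_'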
Now let’s enable verbose logging for the collector itself, i.e. the logging half of the telemetry service:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
service:
  telemetry:
    metrics:
      address: ":8989"
    logs:
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
Download otel-collector-config_1.yaml.
Which gives us:
$ otelcol_0.84.0 --config otel-collector-config_1.yaml
2023-09-07T12:59:37.627+0100 info service/telemetry.go:84 Setting up own telemetry...
2023-09-07T12:59:37.628+0100 info service/telemetry.go:201 Serving Prometheus metrics {"address": ":8989", "level": "Basic"}
2023-09-07T12:59:37.628+0100 debug exporter@v0.84.0/exporter.go:273 Stable component. {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-09-07T12:59:37.628+0100 debug receiver@v0.84.0/receiver.go:294 Stable component. {"kind": "receiver", "name": "otlp", "data_type": "traces"}
2023-09-07T12:59:37.628+0100 info service/service.go:138 Starting otelcol... {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T12:59:37.628+0100 info extensions/extensions.go:31 Starting extensions...
2023-09-07T12:59:37.628+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel created {"grpc_log": true}
2023-09-07T12:59:37.628+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] original dial target is: "localhost:4317" {"grpc_log": true}
2023-09-07T12:59:37.628+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] parsed dial target is: {URL:{Scheme:localhost Opaque:4317 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}} {"grpc_log": true}
2023-09-07T12:59:37.628+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] fallback to scheme "passthrough" {"grpc_log": true}
2023-09-07T12:59:37.628+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] parsed dial target is: {URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:4317 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}} {"grpc_log": true}
2023-09-07T12:59:37.629+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel authority set to "localhost:4317" {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Resolver state updated: {
"Addresses": [
{
"Addr": "localhost:4317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses) {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel switches to new LB policy "pick_first" {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel created {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel Connectivity change to CONNECTING {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel picks a new address "localhost:4317" to connect {"grpc_log": true}
2023-09-07T12:59:37.630+0100 info zapgrpc/zapgrpc.go:178 [core] pickfirstBalancer: UpdateSubConnState: 0x1400061e5b8, {CONNECTING <nil>} {"grpc_log": true}
2023-09-07T12:59:37.630+0100 warn internal@v0.84.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T12:59:37.631+0100 info otlpreceiver@v0.84.0/otlp.go:101 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T12:59:37.631+0100 info service/service.go:161 Everything is ready. Begin running and processing data.
2023-09-07T12:59:37.635+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY {"grpc_log": true}
2023-09-07T12:59:37.635+0100 info zapgrpc/zapgrpc.go:178 [core] pickfirstBalancer: UpdateSubConnState: 0x1400061e5b8, {READY <nil>} {"grpc_log": true}
2023-09-07T12:59:37.635+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel Connectivity change to READY {"grpc_log": true}
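A quick aside: instead of editing the config file, the same override can be passed on the command line via the collector’s --set flag; a sketch, assuming --set behaves as documented in your version:
# same effect as otel-collector-config_1.yaml, without touching the file
otelcol_0.84.0 --config otel-collector-config_0.yaml --set "service.telemetry.logs.level=debug"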
The debug-level output is nice: it gives us some insight into what the collector itself is doing, but the data flowing through the pipeline(s) is still opaque. The logging exporter to the rescue! Using:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
  logging:
    verbosity: detailed
service:
  telemetry:
    metrics:
      address: ":8989"
    logs:
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, logging]
Download otel-collector-config_2.yaml.
Which, after running ./load.sh against it in a second terminal, gives us:
$ otelcol_0.84.0 --config otel-collector-config_2.yaml
2023-09-07T13:03:38.777+0100 info service/telemetry.go:84 Setting up own telemetry...
2023-09-07T13:03:38.777+0100 info service/telemetry.go:201 Serving Prometheus metrics {"address": ":8989", "level": "Basic"}
2023-09-07T13:03:38.777+0100 info exporter@v0.84.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-09-07T13:03:38.777+0100 debug exporter@v0.84.0/exporter.go:273 Stable component. {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-09-07T13:03:38.777+0100 debug receiver@v0.84.0/receiver.go:294 Stable component. {"kind": "receiver", "name": "otlp", "data_type": "traces"}
2023-09-07T13:03:38.777+0100 info service/service.go:138 Starting otelcol... {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T13:03:38.777+0100 info extensions/extensions.go:31 Starting extensions...
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel created {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] original dial target is: "localhost:4317" {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] parsed dial target is: {URL:{Scheme:localhost Opaque:4317 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}} {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] fallback to scheme "passthrough" {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] parsed dial target is: {URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:4317 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}} {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel authority set to "localhost:4317" {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Resolver state updated: {
"Addresses": [
{
"Addr": "localhost:4317",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses) {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel switches to new LB policy "pick_first" {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel created {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel Connectivity change to CONNECTING {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING {"grpc_log": true}
2023-09-07T13:03:38.777+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel picks a new address "localhost:4317" to connect {"grpc_log": true}
2023-09-07T13:03:38.777+0100 warn internal@v0.84.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T13:03:38.777+0100 info otlpreceiver@v0.84.0/otlp.go:101 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T13:03:38.778+0100 info service/service.go:161 Everything is ready. Begin running and processing data.
2023-09-07T13:03:38.778+0100 info zapgrpc/zapgrpc.go:178 [core] pickfirstBalancer: UpdateSubConnState: 0x14000a73320, {CONNECTING <nil>} {"grpc_log": true}
2023-09-07T13:03:38.781+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY {"grpc_log": true}
2023-09-07T13:03:38.782+0100 info zapgrpc/zapgrpc.go:178 [core] pickfirstBalancer: UpdateSubConnState: 0x14000a73320, {READY <nil>} {"grpc_log": true}
2023-09-07T13:03:38.782+0100 info zapgrpc/zapgrpc.go:178 [core] [Channel #1] Channel Connectivity change to READY {"grpc_log": true}
2023-09-07T13:03:43.742+0100 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 1, "spans": 1}
2023-09-07T13:03:43.742+0100 info ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.17.0
Resource attributes:
-> service.name: Str(./load.sh)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.17.0
InstrumentationScope github.com/equinix-labs/otel-cli 0.4.0 5f1223a809007b088af2ea9b9c534a798589f1f4 2023-08-09T17:22:15Z
Span #0
Trace ID : 94403f2d2ac83d79800ab1b2821134a4
Parent ID : 7ce093a916eef527
ID : 1c2e8d9a61cd67b4
Name : sleep 0.2
Kind : Client
Start time : 2023-09-07 12:03:43.525865 +0000 UTC
End time : 2023-09-07 12:03:43.735383 +0000 UTC
Status code : Unset
Status message :
Attributes:
-> command: Str(sleep)
-> arguments: Str(0.2
)
{"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-09-07T13:03:43.759+0100 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 1, "spans": 1}
2023-09-07T13:03:43.759+0100 info ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.17.0
Resource attributes:
-> service.name: Str(./load.sh)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.17.0
InstrumentationScope github.com/equinix-labs/otel-cli 0.4.0 5f1223a809007b088af2ea9b9c534a798589f1f4 2023-08-09T17:22:15Z
Span #0
Trace ID : 94403f2d2ac83d79800ab1b2821134a4
Parent ID :
ID : 7ce093a916eef527
Name : ./load.sh script execution
Kind : Client
Start time : 2023-09-07 12:03:42.46539 +0000 UTC
End time : 2023-09-07 12:03:43.754638 +0000 UTC
Status code : Unset
Status message :
Events:
SpanEvent #0
-> Name: did a thing
-> Timestamp: 2023-09-07 12:03:42.465333 +0000 UTC
-> DroppedAttributesCount: 0
-> Attributes::
-> data1: Str(772EAC3E-E700-4D55-B6BB-599FF72E1F2C)
SpanEvent #1
-> Name: slept 1 second
-> Timestamp: 2023-09-07 12:03:43.516306 +0000 UTC
-> DroppedAttributesCount: 0
{"kind": "exporter", "data_type": "traces", "name": "logging"}