OpenTelemetry collector troubleshooting: telemetry service vs logging exporter

Upstream OTel has a short docs entry on how troubleshooting works for the collector. There’s just one gap in it, something I get asked about on a regular basis and would like to address here: what’s the difference between the telemetry service and the logging exporter?

Let’s see it in action!

To follow along, make sure you’ve got the following installed:

- otel-cli
- the OpenTelemetry collector (otelcol)
- otel-desktop-viewer

The setup is as follows:

[load.sh (otel-cli)] --OTLP/HTTP:43180--> [OTel collector] --OTLP/gRPC:4317--> [otel-desktop-viewer]
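
Both ends of the pipeline are plain local binaries, so the whole setup is two terminals plus the script. A minimal sketch (the binary names match what I happen to have installed locally, adjust to taste):

## terminal 1: otel-desktop-viewer receives OTLP on 4317 and serves its UI in the browser
otel-desktop-viewer

## terminal 2: the collector, with one of the config files we'll build up below
otelcol_0.84.0 --config otel-collector-config_0.yaml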

To get some telemetry flowing through the pipeline, we’re using the following script, which produces a handful of spans:

#!/bin/bash
# an otel-cli demo of span background
#
# This demo shows span background functionality with events added to the span
# while it's running in the background, then a child span is created and
# the background span is ended gracefully.
# Source: https://github.com/equinix-labs/otel-cli/blob/main/demos/15span-background-layered.sh

set -e
set -x

carrier=$(mktemp)    # traceparent propagation via tempfile
sockdir=$(mktemp -d) # a unix socket will be created here

# start the span background server, set up trace propagation, and
# time out after 20 seconds (which shouldn't be reached)
otel-cli span background \
    --tp-carrier $carrier \
    --sockdir $sockdir \
    --tp-print \
    --service $0 \
    --name "$0 script execution" \
    --timeout 20 &

data1=$(uuidgen)

# add an event to the span running in the background, with an attribute
# set to the uuid we just generated
otel-cli span event \
    --name "did a thing" \
    --sockdir $sockdir \
    --attrs "data1=$data1"

# waste some time
sleep 1

# add an event that says we wasted some time
otel-cli span event --name "slept 1 second" --sockdir $sockdir

# run a shorter sleep inside a child span, also note that this is using
# --tp-required so this will fail loudly if there is no traceparent
# available
otel-cli exec \
    --service $0 \
    --name "sleep 0.2" \
    --tp-required \
    --tp-carrier $carrier \
    --tp-print \
    sleep 0.2

# finally, tell the background server we're all done and it can exit
otel-cli span end --sockdir $sockdir

Download load.sh.

Before executing load.sh, make sure to set OTEL_EXPORTER_OTLP_ENDPOINT so that otel-cli knows to send its spans to our collector:

## Bash, zsh:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:43180
## Fish:
set -x OTEL_EXPORTER_OTLP_ENDPOINT http://localhost:43180
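
With the endpoint exported, firing off the script against whichever collector config is currently running is just (it only needs to be made executable once):

chmod +x load.sh
./load.sh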

We start with the following OTel collector config (note that because otel-desktop-viewer is already running and using port 8888 to expose its own Prometheus metrics, I had to override the port the collector uses for its metrics by setting service.telemetry.metrics.address to :8989):

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180

exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true

service:
  telemetry:
    metrics:
      address: ":8989"
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

Download otel-collector-config_0.yaml.

Which gives us:

$ otelcol_0.84.0 --config otel-collector-config_0.yaml
2023-09-07T12:50:58.671+0100    info    service/telemetry.go:84 Setting up own telemetry...
2023-09-07T12:50:58.671+0100    info    service/telemetry.go:201        Serving Prometheus metrics      {"address": ":8989", "level": "Basic"}
2023-09-07T12:50:58.672+0100    info    service/service.go:138  Starting otelcol...     {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T12:50:58.672+0100    info    extensions/extensions.go:31     Starting extensions...
2023-09-07T12:50:58.674+0100    warn    internal@v0.84.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T12:50:58.675+0100    info    otlpreceiver@v0.84.0/otlp.go:101        Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T12:50:58.675+0100    info    service/service.go:161  Everything is ready. Begin running and processing data.
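
That address we set under service.telemetry.metrics is the other half of the collector’s self-monitoring: it serves the collector’s own metrics in Prometheus format. Once load.sh has run, scraping it is a quick way to confirm the pipeline is actually counting data; the metric names below are what I’d expect from a 0.84.0 build, so grep more loosely if they don’t match:

## the collector's own metrics, exposed by the telemetry service
curl -s http://localhost:8989/metrics | grep -E 'otelcol_(receiver_accepted|exporter_sent)_spans'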

Now let’s enable verbose logs in the collector:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180

exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true

service:
  telemetry:
    metrics:
      address: ":8989"
    logs:
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

Download otel-collector-config_1.yaml.
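
As an aside: for a one-off debugging session you shouldn’t even need a separate config file. The collector’s --set flag accepts dotted config paths, so something along these lines should have the same effect (untested here, double-check it against your collector version):

otelcol_0.84.0 --config otel-collector-config_0.yaml --set=service.telemetry.logs.level=debug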

Running the collector with debug logs enabled gives us:

$ otelcol_0.84.0 --config otel-collector-config_1.yaml
2023-09-07T12:59:37.627+0100    info    service/telemetry.go:84 Setting up own telemetry...
2023-09-07T12:59:37.628+0100    info    service/telemetry.go:201        Serving Prometheus metrics      {"address": ":8989", "level": "Basic"}
2023-09-07T12:59:37.628+0100    debug   exporter@v0.84.0/exporter.go:273        Stable component.       {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-09-07T12:59:37.628+0100    debug   receiver@v0.84.0/receiver.go:294        Stable component.       {"kind": "receiver", "name": "otlp", "data_type": "traces"}
2023-09-07T12:59:37.628+0100    info    service/service.go:138  Starting otelcol...     {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T12:59:37.628+0100    info    extensions/extensions.go:31     Starting extensions...
2023-09-07T12:59:37.628+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel created     {"grpc_log": true}
2023-09-07T12:59:37.628+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] original dial target is: "localhost:4317"   {"grpc_log": true}
2023-09-07T12:59:37.628+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] parsed dial target is: {URL:{Scheme:localhost Opaque:4317 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}  {"grpc_log": true}
2023-09-07T12:59:37.628+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] fallback to scheme "passthrough"    {"grpc_log": true}
2023-09-07T12:59:37.628+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] parsed dial target is: {URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:4317 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}     {"grpc_log": true}
2023-09-07T12:59:37.629+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel authority set to "localhost:4317"   {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "localhost:4317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)     {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel switches to new LB policy "pick_first"      {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel created    {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel Connectivity change to CONNECTING   {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING  {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel picks a new address "localhost:4317" to connect {"grpc_log": true}
2023-09-07T12:59:37.630+0100    info    zapgrpc/zapgrpc.go:178  [core] pickfirstBalancer: UpdateSubConnState: 0x1400061e5b8, {CONNECTING <nil>} {"grpc_log": true}
2023-09-07T12:59:37.630+0100    warn    internal@v0.84.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T12:59:37.631+0100    info    otlpreceiver@v0.84.0/otlp.go:101        Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T12:59:37.631+0100    info    service/service.go:161  Everything is ready. Begin running and processing data.
2023-09-07T12:59:37.635+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY       {"grpc_log": true}
2023-09-07T12:59:37.635+0100    info    zapgrpc/zapgrpc.go:178  [core] pickfirstBalancer: UpdateSubConnState: 0x1400061e5b8, {READY <nil>}      {"grpc_log": true}
2023-09-07T12:59:37.635+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel Connectivity change to READY        {"grpc_log": true}

That’s nice. It gives us some insight into what the collector itself is doing, but the data flowing through the pipeline(s) is still opaque. The logging exporter to the rescue! Using:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:43180

exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
  logging:
    verbosity: detailed

service:
  telemetry:
    metrics:
      address: ":8989"
    logs:
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, logging]

Download otel-collector-config_2.yaml.

Which gives us:

$ otelcol_0.84.0 --config otel-collector-config_2.yaml
2023-09-07T13:03:38.777+0100    info    service/telemetry.go:84 Setting up own telemetry...
2023-09-07T13:03:38.777+0100    info    service/telemetry.go:201        Serving Prometheus metrics      {"address": ":8989", "level": "Basic"}
2023-09-07T13:03:38.777+0100    info    exporter@v0.84.0/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-09-07T13:03:38.777+0100    debug   exporter@v0.84.0/exporter.go:273        Stable component.       {"kind": "exporter", "data_type": "traces", "name": "otlp"}
2023-09-07T13:03:38.777+0100    debug   receiver@v0.84.0/receiver.go:294        Stable component.       {"kind": "receiver", "name": "otlp", "data_type": "traces"}
2023-09-07T13:03:38.777+0100    info    service/service.go:138  Starting otelcol...     {"Version": "0.84.0", "NumCPU": 10}
2023-09-07T13:03:38.777+0100    info    extensions/extensions.go:31     Starting extensions...
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel created     {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] original dial target is: "localhost:4317"   {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] parsed dial target is: {URL:{Scheme:localhost Opaque:4317 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}  {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] fallback to scheme "passthrough"    {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] parsed dial target is: {URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:4317 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}     {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel authority set to "localhost:4317"   {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Resolver state updated: {
  "Addresses": [
    {
      "Addr": "localhost:4317",
      "ServerName": "",
      "Attributes": null,
      "BalancerAttributes": null,
      "Type": 0,
      "Metadata": null
    }
  ],
  "ServiceConfig": null,
  "Attributes": null
} (resolver returned new addresses)     {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel switches to new LB policy "pick_first"      {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel created    {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel Connectivity change to CONNECTING   {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING  {"grpc_log": true}
2023-09-07T13:03:38.777+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel picks a new address "localhost:4317" to connect {"grpc_log": true}
2023-09-07T13:03:38.777+0100    warn    internal@v0.84.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks     {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-09-07T13:03:38.777+0100    info    otlpreceiver@v0.84.0/otlp.go:101        Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:43180"}
2023-09-07T13:03:38.778+0100    info    service/service.go:161  Everything is ready. Begin running and processing data.
2023-09-07T13:03:38.778+0100    info    zapgrpc/zapgrpc.go:178  [core] pickfirstBalancer: UpdateSubConnState: 0x14000a73320, {CONNECTING <nil>} {"grpc_log": true}
2023-09-07T13:03:38.781+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY       {"grpc_log": true}
2023-09-07T13:03:38.782+0100    info    zapgrpc/zapgrpc.go:178  [core] pickfirstBalancer: UpdateSubConnState: 0x14000a73320, {READY <nil>}      {"grpc_log": true}
2023-09-07T13:03:38.782+0100    info    zapgrpc/zapgrpc.go:178  [core] [Channel #1] Channel Connectivity change to READY        {"grpc_log": true}
2023-09-07T13:03:43.742+0100    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 1, "spans": 1}
2023-09-07T13:03:43.742+0100    info    ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.17.0
Resource attributes:
     -> service.name: Str(./load.sh)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.17.0
InstrumentationScope github.com/equinix-labs/otel-cli 0.4.0 5f1223a809007b088af2ea9b9c534a798589f1f4 2023-08-09T17:22:15Z
Span #0
    Trace ID       : 94403f2d2ac83d79800ab1b2821134a4
    Parent ID      : 7ce093a916eef527
    ID             : 1c2e8d9a61cd67b4
    Name           : sleep 0.2
    Kind           : Client
    Start time     : 2023-09-07 12:03:43.525865 +0000 UTC
    End time       : 2023-09-07 12:03:43.735383 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> command: Str(sleep)
     -> arguments: Str(0.2
)
        {"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-09-07T13:03:43.759+0100    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "logging", "resource spans": 1, "spans": 1}
2023-09-07T13:03:43.759+0100    info    ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.17.0
Resource attributes:
     -> service.name: Str(./load.sh)
ScopeSpans #0
ScopeSpans SchemaURL: https://opentelemetry.io/schemas/1.17.0
InstrumentationScope github.com/equinix-labs/otel-cli 0.4.0 5f1223a809007b088af2ea9b9c534a798589f1f4 2023-08-09T17:22:15Z
Span #0
    Trace ID       : 94403f2d2ac83d79800ab1b2821134a4
    Parent ID      :
    ID             : 7ce093a916eef527
    Name           : ./load.sh script execution
    Kind           : Client
    Start time     : 2023-09-07 12:03:42.46539 +0000 UTC
    End time       : 2023-09-07 12:03:43.754638 +0000 UTC
    Status code    : Unset
    Status message :
Events:
SpanEvent #0
     -> Name: did a thing
     -> Timestamp: 2023-09-07 12:03:42.465333 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes::
          -> data1: Str(772EAC3E-E700-4D55-B6BB-599FF72E1F2C)
SpanEvent #1
     -> Name: slept 1 second
     -> Timestamp: 2023-09-07 12:03:43.516306 +0000 UTC
     -> DroppedAttributesCount: 0
        {"kind": "exporter", "data_type": "traces", "name": "logging"}
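
One closing note on the logging exporter: detailed is its chattiest setting and dumps every span in full, as shown above. For a busier pipeline that is usually too much; the exporter also takes lower verbosity levels (normal and basic, if I remember the names correctly) that only emit the summary lines such as the TracesExporter counts:

exporters:
  logging:
    # summary lines only; switch back to detailed to dump full spans
    verbosity: normal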