Steven's Knowledge

Getting Started

Deploy Vector and the OpenTelemetry Collector locally, route logs through transforms, ship to multiple backends, and observe with Prometheus.

This page sets up two pipelines side by side, Vector and the OpenTelemetry Collector, so you can compare how each one is structured. Both ingest logs from a noisy source, transform them, and ship the results to more than one destination (a file plus a Prometheus endpoint).

Setup

# docker-compose.yml
version: '3.8'
services:
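  vector:
    image: timberio/vector:0.40.0-debian
    # Point Vector at the TOML file explicitly instead of relying on the image default
    command: ['--config', '/etc/vector/vector.toml']
    volumes:
      - ./vector.toml:/etc/vector/vector.toml
      - ./logs:/logs
      # The docker_logs source reads container output through the Docker socket
      - /var/run/docker.sock:/var/run/docker.sock
    ports: ['8686:8686', '9598:9598']  # API, Prometheus exporter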

  otel:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
      - ./logs:/logs
    ports: ['4317:4317', '4318:4318', '8888:8888']  # OTLP gRPC/HTTP, metrics

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ['9090:9090']

  log-source:
    image: chentex/random-logger
    command: ['100', '1000']  # log every 100-1000ms
    volumes:
      - ./logs:/logs

prometheus.yml:

global: { scrape_interval: 15s }
scrape_configs:
  - job_name: vector
    static_configs: [{ targets: ['vector:9598'] }]
  - job_name: otel
    static_configs: [{ targets: ['otel:8888'] }]

Vector: Hello, Pipeline

vector.toml:

# Source: tail Docker logs
[sources.docker_logs]
type = "docker_logs"

# Transform: drop healthchecks
[transforms.filter_healthcheck]
type = "filter"
inputs = ["docker_logs"]
condition = "!match(string!(.message), r'healthcheck|/ping')"

# Transform: parse log level
[transforms.parse_level]
type = "remap"
inputs = ["filter_healthcheck"]
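source = '''
# match() only returns true or false, so test each level explicitly
msg = downcase(string!(.message))
.level = if match(msg, r'\berror\b') {
  "error"
} else if match(msg, r'\bwarn(ing)?\b') {
  "warn"
} else if match(msg, r'\binfo\b') {
  "info"
} else if match(msg, r'\bdebug\b') {
  "debug"
} else {
  "unknown"
}
'''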

# Transform: count events per level as a metric
[transforms.errors_to_metric]
type = "log_to_metric"
inputs = ["parse_level"]

[[transforms.errors_to_metric.metrics]]
field = "level"
type = "counter"
name = "log_events_by_level_total"
tags = { level = "{{level}}" }

# Sink 1: write filtered logs to file
[sinks.archive_file]
type = "file"
inputs = ["parse_level"]
path = "/logs/filtered-%Y-%m-%d.log"
encoding.codec = "json"

# Sink 2: expose Prometheus metrics
[sinks.prom_metrics]
type = "prometheus_exporter"
inputs = ["errors_to_metric"]
address = "0.0.0.0:9598"

# Pipeline self-monitoring
[api]
enabled = true
address = "0.0.0.0:8686"
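
To make the intent of the three transforms concrete, here is a minimal Python sketch of the same flow: drop healthcheck lines, attach a level, and count events per level. It is an illustration only and not how Vector works internally; the function names and sample messages are hypothetical.

```python
import re
from collections import Counter

# Same patterns the transforms are meant to express
HEALTHCHECK = re.compile(r"healthcheck|/ping")
LEVEL = re.compile(r"\b(error|warn|info|debug)\b", re.IGNORECASE)


def parse_level(message: str) -> str:
    """Return the first level keyword found, lowercased, or "unknown"."""
    found = LEVEL.search(message)
    return found.group(1).lower() if found else "unknown"


def run_pipeline(messages):
    """Filter healthchecks, tag each event with a level, count per level."""
    events = []
    counts = Counter()
    for message in messages:
        if HEALTHCHECK.search(message):
            continue  # the filter transform drops these
        event = {"message": message, "level": parse_level(message)}
        events.append(event)         # what the file sink would receive
        counts[event["level"]] += 1  # what the counter metric would record
    return events, counts


# Hypothetical sample input
sample = [
    "2024-01-01T00:00:00Z ERROR db timeout",
    "2024-01-01T00:00:01Z INFO GET /ping 200",
    "2024-01-01T00:00:02Z INFO request served",
    "plain line without a level",
]
events, counts = run_pipeline(sample)
print(dict(counts))
```

The `/ping` line never reaches either sink, which is why the filter sits first in the chain: everything downstream only pays for events that survive it.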

Bring it up:

docker-compose up -d vector log-source prometheus
sleep 5

# Watch Vector's API
curl http://localhost:8686/health

# See metrics flowing
curl http://localhost:9598/metrics | grep log_events_by_level_total

Open Prometheus at http://localhost:9090 and query log_events_by_level_total. You're now generating metrics from logs.
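
If you would rather check the counter from a script than from the Prometheus UI, the exporter output can be parsed directly. The sketch below is a simplified reader for the text exposition format (it assumes label values contain no `}` or escaped quotes); `counts_by_level` is a hypothetical helper and the sample text is made up for illustration.

```python
import re

# metric_name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def counts_by_level(exposition: str, metric: str = "log_events_by_level_total") -> dict:
    """Sum the samples of one counter, keyed by its `level` label."""
    totals = {}
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_LINE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        level = labels.get("level", "")
        totals[level] = totals.get(level, 0.0) + float(m.group("value"))
    return totals


# Hypothetical exporter output
sample = """# HELP log_events_by_level_total log_events_by_level_total
# TYPE log_events_by_level_total counter
log_events_by_level_total{level="error"} 3 1700000000000
log_events_by_level_total{level="info"} 12 1700000000000
other_metric 1
"""
print(counts_by_level(sample))
```

Against the running stack, the input would come from `urllib.request.urlopen("http://localhost:9598/metrics").read().decode()` instead of the sample string.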

OpenTelemetry Collector: Same Thing

otel-config.yaml:

receivers:
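  filelog:
    include: ["/logs/*.log"]
    # Skip files written by the pipelines themselves so they are not re-ingested
    exclude: ["/logs/otel-filtered.log", "/logs/filtered-*.log"]
    start_at: beginning
    operators:
      # Parse "<timestamp> <level> <message>" lines
      - type: regex_parser
        regex: '^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+(?P<message>.*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout_type: gotime
          # RFC 3339 layout, so a trailing "Z" or numeric offset parses
          layout: '2006-01-02T15:04:05Z07:00'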

processors:
  # Drop healthchecks
  filter/healthcheck:
    logs:
      log_record:
        - 'IsMatch(body, "healthcheck|/ping")'

  # Resource attrs
  resource:
    attributes:
      - { key: environment, value: dev, action: insert }
      - { key: pipeline, value: otel, action: insert }

  # Keep a 10% sample of log records (defined here; add it to the pipeline's processors to enable)
  probabilistic_sampler:
    sampling_percentage: 10

  # Batch for efficiency
  batch:
    send_batch_size: 1000
    timeout: 5s

exporters:
  file:
    path: /logs/otel-filtered.log

  prometheus:
    endpoint: '0.0.0.0:8889'

  debug:
    verbosity: basic

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter/healthcheck, resource, batch]
      exporters: [file, debug]
  telemetry:
    metrics: { address: '0.0.0.0:8888' }

Bring it up:

docker-compose up -d otel

# OTel exposes its own metrics
curl http://localhost:8888/metrics | grep otelcol_
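
To see what the `regex_parser` operator is expected to extract, the same regular expression can be tried in Python, which shares the `(?P<name>...)` capture syntax. This is only a sketch of the parsing step, not the Collector's implementation; `parse_line` and the sample line are hypothetical.

```python
import re
from datetime import datetime, timezone

# Same pattern as the regex_parser operator in otel-config.yaml
LINE = re.compile(r'^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+(?P<message>.*)$')


def parse_line(line: str):
    """Return the captured attributes and a parsed time, or None if no match."""
    m = LINE.match(line)
    if m is None:
        return None
    attrs = m.groupdict()
    try:
        # Older Python versions reject a trailing "Z", so normalise it first
        ts = datetime.fromisoformat(attrs["timestamp"].replace("Z", "+00:00"))
    except ValueError:
        ts = None  # first token was not an ISO 8601 timestamp
    return {"time": ts, "attributes": attrs}


record = parse_line("2024-05-01T12:00:00Z INFO request id=1 status=200")
print(record["time"], record["attributes"]["level"])
```

Lines that do not have three whitespace-separated parts, such as a single-token JSON object, return `None` here; that is the case to watch for when the receiver's glob picks up files in another format.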

Add Realistic Sources

Replace the random log generator with a real-app pattern. A Python service emitting OTLP:

# app.py
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
import time, random

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
tracer = trace.get_tracer("my-app")

while True:
    with tracer.start_as_current_span("handle_request") as span:
        # gauss() can return a negative value, which time.sleep() rejects
        latency = max(0.0, random.gauss(50, 30))
        span.set_attribute("http.status_code", random.choice([200, 200, 200, 500]))
        span.set_attribute("latency_ms", latency)
        time.sleep(latency / 1000)

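The app sends OTLP spans to port 4317. The Collector config above only defines a logs pipeline, so it needs an otlp receiver wired into a traces pipeline before it will accept them. Once that is in place, add a tail-sampling processor (see Patterns) to keep just the interesting traces.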

Multi-Backend Routing

A common need: send logs to S3 (cheap archive), Datadog (live search), and Splunk (security). In Vector:

[sources.docker_logs]
type = "docker_logs"

[sinks.s3_archive]
type = "aws_s3"
inputs = ["docker_logs"]
bucket = "logs-archive"
region = "us-east-1"
encoding.codec = "json"
compression = "gzip"
batch.timeout_secs = 60

[sinks.datadog]
type = "datadog_logs"
inputs = ["docker_logs"]
default_api_key = "${DD_API_KEY}"

# Only errors to Splunk
# docker_logs events carry no .level field, so match on the message text
[transforms.errors_only]
type = "filter"
inputs = ["docker_logs"]
condition = "match(string!(.message), r'(?i)error')"

[sinks.splunk_hec]
type = "splunk_hec_logs"
inputs = ["errors_only"]
endpoint = "https://splunk.example.com:8088"
default_token = "${SPLUNK_HEC_TOKEN}"

Same source, three sinks, with cost-conscious routing.
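
The routing rule is small enough to state as a function. The Python sketch below assumes each event already carries a `level` field and uses the sink names from the config as plain strings; `route` is a hypothetical helper, shown only to make the fan-out explicit.

```python
from collections import Counter


def route(event: dict) -> list:
    """Return the sinks an event is delivered to."""
    sinks = ["s3_archive", "datadog"]  # every event is archived and searchable
    if event.get("level") == "error":
        sinks.append("splunk_hec")     # only errors reach the security backend
    return sinks


# Hypothetical traffic mix: three info events and one error
events = [{"level": "info"}] * 3 + [{"level": "error"}]
volume = Counter(sink for event in events for sink in route(event))
print(dict(volume))
```

Counting deliveries per sink this way gives a rough feel for where volume, and therefore cost, lands: the archive and search sinks see every event, while Splunk sees only the error fraction.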

Test It

# Generate some load
for i in {1..1000}; do
  echo "$(date -u +%FT%TZ) INFO request id=$i status=200" >> ./logs/app.log
done

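# See it flow: the OTel filelog receiver tails ./logs/app.log
docker-compose logs otel | tail
# Vector's counters come from container output (docker_logs), not from app.log
curl http://localhost:9598/metrics | grep log_events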

Cleanup

docker-compose down -v

What's Next

  • Patterns — multi-backend, sampling, PII masking, logs-to-metrics, edge vs aggregator
  • Best Practices — reliability, capacity, monitoring the pipeline, pitfalls
