essay entry no. 004 · May 05, 2026

Observability in a hybrid microservices stack: Spring Boot, FastAPI, and LangChain on Kubernetes

In distributed systems, a single request can cross multiple services and runtimes. When something breaks, we need to easily reconstruct the flow of that request through logs, metrics, and traces, instead of manually navigating unrelated application logs with no correlation.

Distributed systems rarely fail in clean ways. A 500 from order-service might actually come from a timeout or connection issue several services downstream, wrapped and re-thrown before it reaches the client. By the time we’re debugging it, we’re looking at multiple log streams with no shared context.

Without request correlation, debugging is manual: line up timestamps, grep through logs, guess which requests belong together, repeat.

Observability fixes that by connecting logs, metrics, and traces around a shared identifier that follows the request across every service boundary. When a request fails, we can move from the log entry to the full distributed trace and then to the related metrics in a couple of clicks.

This article walks through an observability setup for a mixed Java/Python stack running on Kubernetes:

  • Spring Boot
  • FastAPI
  • LangChain
  • OpenTelemetry
  • Grafana LGTM (Loki, Grafana, Tempo, with Prometheus standing in for Mimir on the metrics side)

The reference project includes three deployable services plus a Makefile that boots the entire environment on Minikube with a single command.


The three pillars and the thread between them

Logs, metrics, and traces each solve a different part of the debugging problem. The thing that actually changes how we debug is the shared identifier that connects all three.

Logs capture detailed events inside a service: requests, errors, downstream calls, retries, decisions. They’re essential, but in a microservice architecture every service writes independently. Once a request crosses multiple services, reconstructing the full story from logs alone gets painful fast.

Metrics give us aggregated system behavior over time: throughput, latency, error rates, saturation. They tell us that something is wrong and often where to start looking, but not which request triggered the problem.

Traces follow a single request end-to-end through the system. A distributed trace records each service hop, downstream HTTP call, and internal span, together with timing and metadata. This is where the correlation ID lives.

In this stack, OpenTelemetry generates a unique trace_id for every incoming request. That ID is:

  • propagated automatically to every downstream HTTP call via the W3C traceparent header
  • injected into the logging context of every log line written during that request
  • attached to every span in the distributed trace stored in Tempo

A Loki query like:

{namespace="apps"} | json | traceId="4bf92f35..."

returns every log line, from every service, for that specific request. A single click on the traceId field opens the full Tempo trace. The investigation that used to take twenty minutes takes thirty seconds.
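
The same lookup can be scripted outside Grafana. The following is a minimal sketch, assuming Loki's HTTP API is reachable at a hypothetical base URL (http://localhost:3100 here) and using its query_range endpoint. The helper name loki_trace_query_url is made up for illustration and is not part of the reference project.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def loki_trace_query_url(base_url: str, trace_id: str, namespace: str = "apps", limit: int = 1000) -> str:
    """Build a Loki query_range URL that selects every log line carrying trace_id."""
    # Same LogQL as above: select by label, parse JSON, filter on the traceId field.
    logql = f'{{namespace="{namespace}"}} | json | traceId="{trace_id}"'
    query = urlencode({"query": logql, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{query}"


# Hypothetical base URL; the trace ID is an example value.
url = loki_trace_query_url("http://localhost:3100", "4bf92f3577b34da6a3ce929d0e0e4736")
print(url)
# Round-trip check: decoding the query string gives back the LogQL expression.
print(parse_qs(urlsplit(url).query)["query"][0])
```

The URL is only built here, not fetched, so the sketch runs without a cluster. Passing it to any HTTP client would return the matching log streams as JSON.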


Stack choice: LGTM + OpenTelemetry

The Elastic Stack was the standard and is still a solid option, but for Kubernetes workloads the Grafana LGTM stack tends to be lighter and easier to operate.

Two aspects are particularly relevant here.

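First, licensing. Elasticsearch left Apache 2.0 in 2021 (v7.11) for a dual SSPL / Elastic License model, neither of which is OSI-approved, and only added an AGPLv3 option in 2024. On this side, Prometheus and OpenTelemetry are Apache 2.0, while Grafana, Loki, and Tempo have been AGPLv3 since 2021, so the licence terms are worth checking against how the stack will be hosted and distributed.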

Second, Loki indexes labels rather than the full log content, significantly reducing resource consumption for structured logging workloads. In use cases where queries are primarily based on fields such as traceId or level, this approach provides a much smaller operational footprint without meaningful limitations.

LGTM + OpenTelemetry (this stack)

  • licence: AGPLv3 for Grafana, Loki, and Tempo; Apache 2.0 for Prometheus and OpenTelemetry
  • log indexing: labels only, lightweight on Kubernetes
  • instrumentation: vendor-neutral, OpenTelemetry SDK
  • agentic / LLM tracing: OpenInference spans → Tempo (same trace)
  • unified UI: Grafana, with logs ↔ traces ↔ metrics correlated

Elastic Stack (alternative)

  • licence: SSPL / Elastic License since v7.11, with an AGPLv3 option since 2024
  • log indexing: full-text, better unstructured search, heavier
  • instrumentation: Elastic APM, vendor-specific agents
  • agentic / LLM tracing: no native OpenInference integration
  • unified UI: Kibana, requires X-Pack for full correlation

The four components of this stack:

  • OpenTelemetry Collector: receives OTLP spans from all services, batches and forwards them to Tempo. Deployed as a Kubernetes Deployment in the observability namespace.
  • Grafana Tempo: stores and queries distributed traces. Accepts OTLP gRPC on port 4317.
  • Grafana Loki + Promtail: Promtail runs as a DaemonSet and tails Kubernetes pod logs; a pipeline stage extracts the traceId field as a Loki label on ingest, enabling direct correlation with Tempo.
  • Prometheus + Grafana: Prometheus scrapes /actuator/prometheus (Spring Boot) and /metrics (FastAPI) via pod annotations. Grafana is the unified UI for all three backends.
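
Annotation-based scraping can be pictured as a small function from pod metadata to a scrape URL. This is a sketch under assumptions: prometheus.io/scrape comes from this setup, while prometheus.io/path and prometheus.io/port follow the common community convention and may not match the reference project's scrape config. The defaults, pod IPs, and the function name scrape_url are hypothetical.

```python
from typing import Mapping, Optional


def scrape_url(pod_ip: str, annotations: Mapping[str, str]) -> Optional[str]:
    """Return the URL Prometheus would scrape for a pod, or None if the pod has not opted in."""
    # Only pods that explicitly opt in are scraped.
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    # Hypothetical fallbacks for when the path or port annotation is absent.
    path = annotations.get("prometheus.io/path", "/metrics")
    port = annotations.get("prometheus.io/port", "8080")
    return f"http://{pod_ip}:{port}{path}"


# Illustrative annotation sets; IPs and ports are made up.
spring = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/actuator/prometheus",
    "prometheus.io/port": "8080",
}
fastapi = {"prometheus.io/scrape": "true", "prometheus.io/port": "8081"}

print(scrape_url("10.0.0.5", spring))   # http://10.0.0.5:8080/actuator/prometheus
print(scrape_url("10.0.0.6", fastapi))  # http://10.0.0.6:8081/metrics
print(scrape_url("10.0.0.7", {}))       # None
```

In a real cluster this mapping is done by relabeling rules inside the Prometheus scrape job, not by application code.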

Architecture

request path

  POST /orders            ──▶ order-service          Spring Boot  :8080
  GET /inventory/{id}     ──▶ inventory-service      FastAPI      :8081
  POST /recommendations   ──▶ recommendation-agent   LangChain    :8082
                              └─ LangChain chain     FakeListLLM / OpenAI · OpenInference spans

  all three services share the same trace_id via W3C traceparent

telemetry pipeline

  Traces    every service ── OTLP ──▶ otel-collector ──▶ Tempo
  Metrics   Prometheus ◀── scrape ── /actuator/prometheus · /metrics
  Logs      Promtail ── pod stdout ──▶ Loki (traceId as label on ingest)
  Grafana   logs ↔ traces ↔ metrics · unified view

The OTEL Collector is the only component the application services need to reach. Prometheus scrapes the services’ metrics endpoints and Promtail reads pod logs from each node, so no sidecar is required.

One detail worth remembering: Spring Boot’s OTLP exporter uses HTTP/protobuf on port 4318, not gRPC on 4317. The Python services use gRPC on 4317. The Collector accepts both, so mixed runtimes work without additional configuration.
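
The two endpoint shapes are easy to mix up, so here is a small sketch of the difference. The function name is hypothetical. The protocol strings follow the values OpenTelemetry uses for OTEL_EXPORTER_OTLP_PROTOCOL, and the host is the in-cluster Collector address used in this setup.

```python
def otlp_traces_endpoint(host: str, protocol: str) -> str:
    """Return the Collector endpoint a trace exporter should use for the given OTLP protocol."""
    if protocol == "http/protobuf":
        # OTLP over HTTP needs the scheme and the signal-specific path.
        return f"http://{host}:4318/v1/traces"
    if protocol == "grpc":
        # OTLP over gRPC takes host:port only, with no path.
        return f"{host}:4317"
    raise ValueError(f"unsupported OTLP protocol: {protocol!r}")


COLLECTOR = "otel-collector.observability.svc.cluster.local"
print(otlp_traces_endpoint(COLLECTOR, "http/protobuf"))  # Spring Boot side
print(otlp_traces_endpoint(COLLECTOR, "grpc"))           # Python side
```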


The reference project

Reference project: observability-demo (vmcforge/observability-demo)

  • services: order-service (Spring Boot), inventory-service (FastAPI), recommendation-agent (LangChain)
  • Helm charts: kube-prometheus-stack, loki-stack, tempo, otel-collector
  • tooling: minikube, kubectl, helm, docker, k9s (optional)
  • make all: cluster + infra + build + deploy in one shot

The cluster setup follows the pattern from the Minikube profiles article: each project gets its own named Minikube profile, its own kubectl context, and an exportable standalone kubeconfig. The make cluster-start target creates the observability-demo profile and automatically exports its context to ~/.kube/observability-demo.yaml.

# Full bootstrap — cluster + infra + build + deploy
make all

# Forward all UIs and service ports to localhost
make port-forward        # Grafana :3000  Prometheus :9090  services :8080–8082

# Generate traffic to produce traces, logs, and metrics
make seed                # 10 orders through the full chain

# Open k9s on this cluster
make k9s
# k9s --context observability-demo
# k9s --kubeconfig ~/.kube/observability-demo.yaml

# Per-session isolation
export KUBECONFIG=~/.kube/observability-demo.yaml

Every kubectl and helm command in the Makefile uses --context=observability-demo and --kube-context=observability-demo respectively, so the Makefile is safe to run regardless of which context is currently active in ~/.kube/config.


Instrumentation: Spring Boot

Spring Boot 3.x and Micrometer Tracing handle most of this automatically. On top of spring-boot-starter-actuator, three dependencies are all that’s needed:

<!-- Bridges Micrometer's tracing API to the OpenTelemetry SDK -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

<!-- Exports spans via OTLP HTTP/protobuf to the OTEL Collector -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

<!-- Exposes /actuator/prometheus for Prometheus scraping -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

With these on the classpath, application.yml points the exporter at the OTEL Collector and enables 100% sampling:

spring:
  application:
    name: order-service
  jackson:
    property-naming-strategy: SNAKE_CASE

management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://${OTEL_COLLECTOR_HOST:otel-collector.observability.svc.cluster.local}:4318/v1/traces
  endpoints:
    web:
      exposure:
        include: health,prometheus

The port matters: 4318 is OTLP over HTTP/protobuf. Spring Boot’s exporter uses HTTP, not gRPC. Pointing it at 4317 produces silent connection failures: no exception, no timeout, spans just stop arriving in Tempo with nothing in the logs to explain why.

For structured JSON logs with the trace ID, logstash-logback-encoder reads the traceId and spanId keys that Micrometer Tracing writes to MDC automatically:

<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
        <customFields>{"service":"${appName}"}</customFields>
        <includeMdcKeyName>traceId</includeMdcKeyName>
        <includeMdcKeyName>spanId</includeMdcKeyName>
    </encoder>
</appender>

Every log line produced during a request now includes "traceId":"4bf92f35..." as a JSON field. Promtail extracts this as a Loki label on ingest, with no extra configuration on the application side.

HTTP trace propagation is also automatic. When order-service calls inventory-service via Spring’s RestClient, the W3C traceparent header is injected by the auto-configured observation infrastructure. No manual header passing.


Instrumentation: FastAPI

The Python side needs a bit more setup, but it’s still straightforward. Five packages cover tracing, metrics, and structured logging:

opentelemetry-sdk>=1.25.0
opentelemetry-instrumentation-fastapi>=0.46b0
opentelemetry-exporter-otlp-proto-grpc>=1.25.0
prometheus-fastapi-instrumentator>=6.1.0
structlog>=24.2.0

At startup, the tracer provider is initialized and the FastAPI app is instrumented:

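import os

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Assumed values: the service name and the env-var default mirror the Spring Boot config above
SERVICE_NAME = "inventory-service"
OTEL_COLLECTOR_HOST = os.getenv(
    "OTEL_COLLECTOR_HOST", "otel-collector.observability.svc.cluster.local"
)

# Tag every exported span with the service name
resource = Resource.create({"service.name": SERVICE_NAME})
tracer_provider = TracerProvider(resource=resource)
# Batch spans and send them to the Collector over OTLP gRPC (plaintext inside the cluster)
tracer_provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint=f"{OTEL_COLLECTOR_HOST}:4317", insecure=True)
    )
)
trace.set_tracer_provider(tracer_provider)

app = FastAPI()
# Creates a server span per request and continues any incoming traceparent header
FastAPIInstrumentor().instrument_app(app)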

Unlike Spring Boot, the Python exporter uses gRPC on port 4317. The OTEL Collector accepts both; services don’t need to align on transport.

For structured logging with the correlation ID, structlog needs one custom processor that reads the active OTEL span context at log time:

import structlog
from opentelemetry import trace

def add_otel_context(logger, method, event_dict):
    span_context = trace.get_current_span().get_span_context()
    if span_context.is_valid:
        event_dict["traceId"] = format(span_context.trace_id, "032x")
        event_dict["spanId"]  = format(span_context.span_id,  "016x")
    return event_dict

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        add_otel_context,
        structlog.processors.JSONRenderer(),
    ],
    ...
)

The field name traceId must match what the Promtail pipeline stage extracts, and what Spring Boot’s logstash-logback-encoder writes. Same camelCase across all services. If any service drifts to trace_id or TraceId, Loki ends up with multiple label names and the Grafana correlation link breaks.
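
Field-name drift is cheap to catch before it reaches Loki. Here is a minimal sketch of a check that could run in CI against captured log output. The function name and the list of drifted spellings are hypothetical and not part of the reference project.

```python
import json

EXPECTED_FIELD = "traceId"
# Spellings that would break correlation; a hypothetical, non-exhaustive list.
DRIFTED_FIELDS = {"trace_id", "TraceId", "traceID", "traceid"}


def check_log_line(line: str) -> str:
    """Classify one JSON log line as 'ok', 'drifted', or 'missing'."""
    record = json.loads(line)
    if EXPECTED_FIELD in record:
        return "ok"
    if DRIFTED_FIELDS.intersection(record):
        return "drifted"
    # Lines written outside a request (startup, shutdown) carry no trace ID.
    return "missing"


print(check_log_line('{"traceId": "4bf92f3577b34da6a3ce929d0e0e4736", "message": "order_created"}'))  # ok
print(check_log_line('{"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "message": "lookup"}'))        # drifted
print(check_log_line('{"message": "startup"}'))                                                       # missing
```

A "missing" result is not necessarily a bug, since startup and background logs have no active span. A "drifted" result always is.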

Custom spans are simple:

tracer = trace.get_tracer(SERVICE_NAME)

@app.get("/inventory/{product_id}")
async def get_inventory(product_id: str):
    with tracer.start_as_current_span("inventory.lookup") as span:
        span.set_attribute("product.id", product_id)
        # ... logic

The child span appears nested under the parent HTTP request span in Tempo, with its own duration and attributes.


Instrumentation: LangChain and agentic flows

LangChain operations (chain invocations, LLM calls, tool executions, agent steps) don’t appear in OpenTelemetry traces by default. OpenInference from Arize AI fills that gap with OTEL-compatible instrumentation for the LangChain execution graph:

openinference-instrumentation-langchain>=0.1.19

One call after setting the global tracer provider instruments everything:

from openinference.instrumentation.langchain import LangChainInstrumentor

trace.set_tracer_provider(tracer_provider)     # must come first
LangChainInstrumentor().instrument()

No other changes needed. Every chain invocation now produces child spans under the parent request span. In Tempo, a trace for a /recommendations call looks like this:

HTTP POST /recommendations  (345ms)
  └─ langchain.recommend  (340ms)        ← custom span from the handler
       └─ langchain.chain  (335ms)       ← OpenInference
            ├─ langchain.llm  (310ms)    ← LLM call: prompt + completion + tokens
            └─ langchain.output_parser   (1ms)

With a real LLM (OpenAI, Anthropic), the langchain.llm span includes the full prompt, completion text, and token counts, all visible in Tempo without adding instrumentation code. The FakeListLLM in the reference project produces the same span structure, so the wiring is identical regardless of which model is plugged in.
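
The nesting in that waterfall comes entirely from each span recording its parent's span ID. A minimal sketch of how a flat list of spans becomes the tree above follows. The Span type and the span IDs are hypothetical stand-ins, while the names and durations come from the example trace.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: int


def render_tree(spans: List[Span]) -> List[str]:
    """Group spans by parent ID and walk them depth-first, indenting by depth."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children[parent_id]:
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Flat list, as a backend would store it; IDs are made up.
spans = [
    Span("a", None, "HTTP POST /recommendations", 345),
    Span("b", "a", "langchain.recommend", 340),
    Span("c", "b", "langchain.chain", 335),
    Span("d", "c", "langchain.llm", 310),
    Span("e", "c", "langchain.output_parser", 1),
]
print("\n".join(render_tree(spans)))
```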

The agent_trace_id field in the /recommendations response returns the current trace ID as a string, so clients can include it in bug reports or pass it to a support dashboard:

def _current_trace_id() -> str:
    span_context = trace.get_current_span().get_span_context()
    return format(span_context.trace_id, "032x") if span_context.is_valid else "0" * 32
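
The "032x" format matters because OpenTelemetry holds the trace ID as an integer, and any leading zeros must survive the conversion to text. A small sketch of the difference, using a made-up ID whose upper 64 bits are zero (the helper name is hypothetical):

```python
def trace_id_hex(trace_id: int) -> str:
    """Render a 128-bit trace ID as a 32-character lowercase hex string, zero-padded."""
    return format(trace_id, "032x")


# Illustrative ID with leading zero bits.
trace_id = 0x4BF92F3577B34DA6

padded = trace_id_hex(trace_id)
print(padded, len(padded))  # always 32 characters
print(hex(trace_id))        # built-in hex() adds "0x" and drops the leading zeros
```

A string produced with hex() or an unpadded format would not match the ID in the traceparent header or in Tempo, so lookups by that value would come back empty.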

How the correlation ID propagates

Every HTTP request entering order-service gets a trace_id generated at the edge. When order-service calls inventory-service, the outgoing request carries:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
              │  │                               │               └─ flags
              │  └─ trace_id (128-bit hex)       └─ parent span_id (64-bit)
              └─ version

The receiving service reads this header automatically, extracts the trace_id, and continues the trace as a child span. The LangChain spans created by OpenInference are children of the FastAPI span. One trace_id runs from the first HTTP entry point through every service and every LLM call.
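
The instrumentation libraries do this parsing internally. The sketch below only makes the header layout concrete. It handles version 00 of the W3C format strictly and rejects the all-zero IDs that the spec defines as invalid. All names are illustrative, and the child span ID is made up.

```python
import re
from typing import NamedTuple

# version - trace_id - parent span_id - flags, all lowercase hex
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str

    @property
    def sampled(self) -> bool:
        # Bit 0 of the flags byte is the "sampled" flag.
        return bool(int(self.flags, 16) & 0x01)


def parse_traceparent(header: str) -> TraceParent:
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    parsed = TraceParent(*match.groups())
    if parsed.version == "ff" or set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    return parsed


def child_traceparent(parent: TraceParent, new_span_id: str) -> str:
    """Header for an outgoing call: same trace_id, the caller's own span ID as parent."""
    return f"{parent.version}-{parent.trace_id}-{new_span_id}-{parent.flags}"


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp.trace_id, tp.parent_id, tp.sampled)
print(child_traceparent(tp, "b7ad6b7169203331"))
```

Only the span ID changes from hop to hop. The trace_id is copied through untouched, which is what makes it usable as a correlation ID.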

Structlog reads the trace_id from the active OTEL span context at the moment each log call is made. Every logger.info(...) inside a request handler includes it automatically, with no need to pass the ID through function arguments or manage thread-local state by hand. Spring Boot’s MDC integration gives the same result on the Java side.
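
The mechanism behind this is ambient context. OpenTelemetry's Python context is built on contextvars, and the logging processor simply reads from it. The sketch below reproduces the idea with the standard library only, with a plain ContextVar standing in for the OTEL span context and stdlib logging standing in for structlog. All names are illustrative.

```python
import contextvars
import io
import json
import logging

# Stand-in for the active span context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Read the ambient trace ID at log time and attach it to the record.
        record.traceId = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname.lower(), "message": record.getMessage()}
        if record.traceId:
            payload["traceId"] = record.traceId
        return json.dumps(payload)


def build_logger(stream: io.StringIO) -> logging.Logger:
    logger = logging.getLogger("inventory-sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def lookup_stock(logger: logging.Logger) -> None:
    # No trace ID parameter: the filter picks it up from the context.
    logger.info("inventory_lookup")


def handle_request(logger: logging.Logger, trace_id: str) -> None:
    token = current_trace_id.set(trace_id)
    try:
        lookup_stock(logger)
    finally:
        current_trace_id.reset(token)


buffer = io.StringIO()
log = build_logger(buffer)
handle_request(log, "4bf92f3577b34da6a3ce929d0e0e4736")
log.info("startup")  # outside any request, so no traceId field
print(buffer.getvalue(), end="")
```

Because contextvars are scoped per asyncio task as well as per thread, concurrent requests in an async handler do not see each other's IDs.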


With make port-forward running, open http://localhost:3000 (admin / admin). Run make seed before exploring so there are traces and logs to look at.

Following a request end to end:

1. Explore → Loki: click the compass icon in the left sidebar and select the Loki datasource. In the query field, type {namespace="apps", app="order-service"} and run the query. The log stream from the order service shows up.
2. Expand a log entry: click any log line to open the fields panel. A traceId field appears in the structured JSON. Click the Tempo link icon next to it to jump directly to that trace.
3. Read the trace in Tempo: the waterfall view shows the full span tree for the request. The header gives the trace ID, start time, total duration, and number of services involved. Each bar in the waterfall is one span; width represents duration.
4. Inspect a span: click any row in the waterfall. A detail panel opens below showing the span's attributes: HTTP method, status code, URL, duration, and any custom tags added by the instrumentation.
5. Find slow requests: Explore → Tempo → TraceQL tab → run {name="http post /orders"}. The table lists every order trace with its duration, making it easy to spot outliers at a glance.
6. Verify Prometheus scraping: in Grafana Explore, select the Prometheus datasource and run up{kubernetes_namespace="apps"}. The result should show Result series: 3, one per service, each returning 1 (UP).

Useful Prometheus queries once metrics are flowing:

# Request rate per service (last 5 minutes)
sum(rate(http_server_requests_seconds_count{job="kubernetes-pods-annotation"}[5m])) by (app)

# P95 latency across all services
histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le, app))

# Error rate (non-2xx responses)
sum(rate(http_server_requests_seconds_count{status!~"2.."}[5m])) by (app)
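
The P95 query is an estimate, not an exact percentile: histogram_quantile finds the bucket containing the target rank and interpolates linearly inside it. A simplified sketch of that calculation follows, with made-up bucket counts. Prometheus itself handles more edge cases, such as negative bounds and q = 0.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper bound, cumulative count) pairs that include +Inf."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    top_le, total = buckets[-1]
    if not math.isinf(top_le) or total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan


# Hypothetical latency buckets in seconds, with cumulative counts.
buckets = [(0.1, 60), (0.25, 85), (0.5, 97), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 3))  # 0.458
```

With these counts, rank 95 falls in the 0.25 to 0.5 bucket, so the estimate is 0.25 + 0.25 × (95 − 85) / (97 − 85). The result can never be more precise than the bucket boundaries allow, which is worth remembering when an SLO threshold sits in the middle of a wide bucket.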

Useful Loki queries:

# All logs for a specific trace
{namespace="apps"} | json | traceId="<paste-trace-id>"

# Errors across all services
{namespace="apps"} | json | level="error"

# LangChain recommendation calls with output
{app="recommendation-agent"} | json | message="recommendation_response"

Quick reference

Makefile targets

target               what it does
make all             cluster-start + infra-up + build + deploy
make infra-up        Helm install: kube-prometheus-stack + loki + tempo + otel-collector
make build           build all images into Minikube Docker daemon
make deploy          kubectl apply all service manifests + wait for rollout
make port-forward    Grafana :3000 · Prometheus :9090 · services :8080–8082
make seed            10 POST /orders → full chain; generates traces + logs + metrics
make k9s             k9s --context observability-demo
make redeploy        rebuild images + rollout restart (use after code changes)
make logs SVC=name   kubectl logs --follow for the specified service
make destroy         minikube delete -p observability-demo + cleanup

Cluster context — Minikube profile isolation

pattern               command
create profile        minikube start -p observability-demo --cpus 4 --memory 8192
per-command           kubectl --context=observability-demo get pods -A
export kubeconfig     make kubeconfig-export → ~/.kube/observability-demo.yaml
per-session           export KUBECONFIG=~/.kube/observability-demo.yaml
k9s (merged config)   k9s --context observability-demo
k9s (isolated)        k9s --kubeconfig ~/.kube/observability-demo.yaml

Instrumentation — transport and port by runtime

service                 traces (OTEL)                       metrics (Prometheus)
order-service           HTTP/protobuf → :4318/v1/traces     /actuator/prometheus
inventory-service       gRPC → :4317                        /metrics
recommendation-agent    gRPC → :4317 + OpenInference        /metrics
log correlation field   traceId (camelCase, all services)   annotation: prometheus.io/scrape=true

This setup covers a single Kubernetes cluster, which is enough for most local development and many production workloads. Cross-cluster tracing is where things get more interesting: traceparent crossing network boundaries managed by a service mesh, multiple OTEL Collectors routing spans across backends, Grafana picking up mesh telemetry as a fourth datasource. That’s where observability stops being a tooling problem and starts being a distributed systems problem on its own.

The setup is small once it’s wired in; the bigger investment is the discipline to use the same correlation field across every service. Once that’s in place, debugging stops being archaeology and starts looking like a single query.

V. M. Casale

backend / cloud / things that go bump in the night

I keep an engineering notebook of the small fixes, environment tricks, and infrastructure patterns that quietly make my work-week better.
