essay entry no. 007 · June 22, 2026

Re-evaluating runtime choices when integration is no longer the bottleneck

Adopting or migrating to a new language used to be prohibitive, not because of the language itself, but because of everything around it. Agentic AI reduces much of that integration overhead, turning what was a closed decision by default into one worth reconsidering.

The obstacle to adopting a new language has never been the language itself. It has been the surrounding stack: framework, ORM, HTTP server, migration tooling, metrics wiring, and the effort to make all of it work together in a production system. Inconsistent documentation and configuration that diverged from reality consumed days before a service could serve its first request.

That integration cost was what ended most runtime migration discussions before they started. Agentic AI can reduce that cost, which makes previously closed decisions worth reopening.


What AI changes, and what it does not

In enterprise contexts, AI does not replace teams, design architectures, or make migration decisions; at least, I don’t believe those are its most valuable capabilities today. What it handles well is integration: assembling frameworks, resolving dependencies, pinning versions, and producing working configurations. That layer is what historically made adoption uneconomical.

The question becomes whether a different runtime justifies the change, and that is a question answered by measurement, not opinion.


The benchmark setup

Four microservices share a common baseline: REST endpoints, PostgreSQL, and Prometheus metrics. Each implements the same compute-heavy business logic, so load differences isolate runtime behavior rather than application complexity.

All four run on a single Minikube cluster against a shared PostgreSQL instance, each in its own schema. Prometheus scrapes all four; Grafana builds a comparison dashboard at startup. make seed sends 500 orders per service in parallel across all discount tiers, generating enough load for meaningful metrics without a dedicated load-testing tool.
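As a rough sketch of what a seeding step like this could look like, the following Python spreads 500 orders evenly across discount tiers and posts them to every service in parallel. The ports are the ones listed in the stack overview; the /orders path, the payload fields, and the tier names are assumptions made for illustration, not the repository's actual contract.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle, islice

# Ports as listed in the stack overview; everything else here is hypothetical.
SERVICES = {"java": 8081, "python": 8082, "node": 8083, "rust": 8084}
TIERS = ["none", "bronze", "silver", "gold"]  # assumed tier names
ORDERS_PER_SERVICE = 500


def build_orders(count=ORDERS_PER_SERVICE, tiers=TIERS):
    """Return `count` order payloads, cycling through the tiers evenly."""
    return [
        {"order_id": i, "discount_tier": tier, "quantity": 1 + i % 5}
        for i, tier in enumerate(islice(cycle(tiers), count))
    ]


def post_order(port, order):
    """POST one order to a service on localhost (assumed /orders endpoint)."""
    request = urllib.request.Request(
        f"http://localhost:{port}/orders",
        data=json.dumps(order).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def seed(send=post_order, workers=8):
    """Send the same orders to every service through one shared thread pool."""
    orders = build_orders()
    jobs = [(port, order) for port in SERVICES.values() for order in orders]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: send(*job), jobs))
```

Calling seed() against a running cluster would issue 2,000 requests in total. Keeping the sender injectable makes it possible to dry-run the distribution logic without any service listening.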


Stack overview

java-service · Spring Boot 3.4
port: 8081
ORM: Spring Data JPA + Hibernate
migrations: Flyway (schema: java_bench)
metrics: Micrometer + /actuator/prometheus
runtime: eclipse-temurin:21-jre

python-service · FastAPI
port: 8082
ORM: SQLAlchemy 2.x async + asyncpg
migrations: Alembic (search_path: python_bench)
metrics: prometheus-fastapi-instrumentator
runtime: python:3.12-slim

node-service · Fastify 4
port: 8083
ORM: Prisma (schema: node_bench)
migrations: Prisma migrate deploy (on startup)
metrics: prom-client with custom middleware
runtime: node:20-slim (multi-stage)

rust-service · Axum 0.7
port: 8084
ORM: SQLx with postgres feature
migrations: sqlx::migrate! embedded (on startup)
metrics: metrics + metrics-exporter-prometheus
runtime: debian:bookworm-slim (statically linked)

Measurements

CPU under load

During make stress, Java exhibits the classic JIT warmup curve: CPU rises during profiling, then falls below the scripting runtimes once optimized code runs. Rust stays flat and low from the first request. Python and Node show the highest sustained usage throughout.

cpu usage per service · % during seed load
node.js: 21.1% · python: 15.7% · java: 8.36% · rust: 4.36%

Python and Node execute each iteration through a general-purpose runtime with boxing, dynamic dispatch, and garbage collection. Rust compiles directly to native instructions with no runtime indirection. Java, once warmed up, approaches Rust rather than the scripting runtimes. Python could partially close the gap with C extensions like NumPy, but the core logic here is kept pure to expose the baseline cost of the runtime model.
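To make the "pure" constraint concrete, here is a minimal sketch of the kind of loop that exposes interpreter overhead. The function and its arithmetic are hypothetical stand-ins, not the benchmark's actual business logic: the point is only that every iteration works on boxed integers and dynamically dispatched operators, with no C extension in the hot path.

```python
import time


def score_order(quantity, unit_price_cents, rounds=200_000):
    """Hypothetical compute-heavy step: a pure-Python integer mixing loop.

    No C extensions are involved, so each iteration pays for boxed
    integers and dynamically dispatched operators.
    """
    acc = quantity * unit_price_cents
    for i in range(rounds):
        # Linear congruential step, kept below 2**31.
        acc = (acc * 1_103_515_245 + 12_345 + i) % 2_147_483_648
    return acc


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Something like time_call(score_order, 3, 1999) gives a feel for the per-request cost on a given machine; the absolute number matters less than how it compares with the same loop in a compiled runtime.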

Memory

Under load, most services drift 10-20% above their idle baseline; Rust remains flat throughout.

resident memory at idle · baseline = rust
rust: 7.34 MB (baseline)
node: ~80 MB (×11)
python: ~85 MB (×12)
java: ~191 MB (×26)
process RSS before any seed traffic · lower is better

Rust maps to what the service uses, nothing more. Node and Python carry an interpreter and a module graph. Java initializes a virtual machine, a JIT compiler, the Spring context, and the Hibernate entity graph before serving a single request. Rust produces a single statically linked binary in the single-digit megabyte range.

The fleet math. Across 30 services, an 11x to 26x memory ratio translates directly into node count and monthly cost. A 10x to 20x reduction in infrastructure spend makes a rewrite look attractive. Whether it justifies one depends on migration cost.
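The arithmetic behind that ratio can be sketched as follows, using the idle RSS figures reported above. It assumes one replica per service and ignores load-time growth, node overhead, and bin-packing, so the totals are a lower bound rather than a sizing estimate.

```python
# Idle resident memory per process in MB, as reported in this article.
IDLE_RSS_MB = {"rust": 7.34, "node": 80.0, "python": 85.0, "java": 191.0}
FLEET_SIZE = 30  # services, one replica each (assumption)


def fleet_memory_mb(runtime, services=FLEET_SIZE):
    """Total idle RSS in MB if every service ran on `runtime`."""
    return IDLE_RSS_MB[runtime] * services


def ratio_to_rust(runtime):
    """How many times more idle memory `runtime` uses than Rust."""
    return IDLE_RSS_MB[runtime] / IDLE_RSS_MB["rust"]


for name in IDLE_RSS_MB:
    print(f"{name:6s} {fleet_memory_mb(name):7.1f} MB  x{ratio_to_rust(name):.0f}")
```

For 30 services this works out to roughly 220 MB at idle for Rust, against about 2,400 MB for Node, 2,550 MB for Python, and 5,730 MB for Java.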

Artifact size

Rust’s binary is a few MB, but the image ships a debian:bookworm-slim base and lands at 116 MB. Python sits at 243 MB on python:3.12-slim. Java and Node both reach approximately 400 MB: Java from the JRE layer beneath the fat jar, Node from node_modules and the bundled Prisma query engine.

container image size · baseline = rust
rust: 116 MB (baseline)
python: 243 MB (×2.1)
java: 399 MB (×3.4)
node: 408 MB (×3.5)
uncompressed image on disk · lower is better

What the benchmark does not capture

Performance data alone does not determine whether to migrate. In an enterprise context, the runtime is rarely the deciding factor: the benchmark shows runtime differences, not the integration cost and ecosystem dependencies that drive the actual decision.

Migration is determined by what the service touches: authentication, internal systems, data contracts, observability, and deployment model. Reduced integration cost matters here, not as a reason to migrate, but as a way to make the decision concrete. A fully wired prototype can be evaluated against a migration estimate with running software rather than diagrams.

The underlying constraint is why a technology was chosen and where its advantage comes from. Some ecosystems depend on library depth and maturity that cannot be generated: replacing the equivalent of a mature Apache project is a hard boundary regardless of tooling. For general-purpose services with a standard shape, repository plus API layer plus common integrations, runtimes are increasingly interchangeable, and migration becomes an engineering trade-off rather than a default rejection.


When migration is and is not worth it

Worth it. Migration makes sense when the service has no ecosystem-specific dependencies and the target runtime provides a framework of equivalent maturity. A REST API over a relational database with standard observability is that case: Spring Boot, FastAPI, Fastify, and Axum are interchangeable at that level. Migrating to Rust produces a measurable reduction in CPU usage relative to any scripting runtime and an absolute reduction in resident memory that holds against Java regardless of load profile.

Not worth it. Migration does not pay off for services where the runtime and the library stack are inseparable. Python ML inference pipelines depend on PyTorch, NumPy, or JAX, none of which have production-equivalent implementations outside CPython. Agentic AI systems built on LangGraph carry the same constraint: the graph execution model, stateful memory abstractions, and integration ecosystem are Python-native and have no equivalent maturity elsewhere. Domain-heavy codebases are poor candidates for the same structural reason: re-encoding business logic costs more than any runtime savings can offset, and that cost does not decrease because scaffolding is faster to generate.


What changed

None of the runtimes have changed. Java, Python, Node, and Rust exhibit the same characteristics they always have. What changed is the cost of verifying those differences in a representative environment.

Integration effort was the real barrier to that verification. With it reduced, the question shifts from whether a migration is feasible in principle to whether a specific service’s runtime profile justifies the cost. That is a narrower and more tractable question.
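One way to frame that narrower question is as a payback period: the one-off migration cost divided by the monthly infrastructure saving. The sketch below is illustrative only; the function and every number passed to it are placeholders, not measurements from the benchmark.

```python
import math


def payback_months(migration_cost, monthly_cost_before, monthly_cost_after):
    """Months until cumulative savings cover the one-off migration cost.

    Returns math.inf when the target runtime is not cheaper to run.
    """
    monthly_saving = monthly_cost_before - monthly_cost_after
    if monthly_saving <= 0:
        return math.inf
    return migration_cost / monthly_saving


# Placeholder inputs in any single currency: rewrite effort vs hosting bill.
months = payback_months(
    migration_cost=40_000, monthly_cost_before=2_000, monthly_cost_after=400
)
print(f"payback in {months:.0f} months")
```

With these placeholder values the rewrite pays for itself after 25 months. A domain-heavy service pushes the numerator up quickly, which is why the same runtime profile can justify migrating one service and not another.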


Run it yourself

The full project is available at github.com/ValerioMC/runtime-bench. All four services, the Kubernetes manifests, the Prometheus scrape configuration, and the Grafana dashboard are included.

The only prerequisites are Docker, Minikube, and kubectl. A single command provisions the cluster, builds the images, deploys all services, and starts port-forwards:

make up

Once running, make stress sends sustained concurrent load to all four services for 120 seconds with 8 workers each. The duration and concurrency are configurable:

make stress                     # 120 s default
DURATION=300 CONCURRENCY=16 make stress
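For readers who want a feel for what a stress target of this shape does, a time-bounded worker pool in Python might look like the following. It only mirrors the DURATION and CONCURRENCY knobs and the defaults described above; the send callable is a hypothetical placeholder for one HTTP request, and the repository's actual implementation may differ.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Defaults match the values described in the text: 120 s, 8 workers.
DURATION = float(os.environ.get("DURATION", "120"))
CONCURRENCY = int(os.environ.get("CONCURRENCY", "8"))


def worker(send, deadline):
    """Call `send` repeatedly until the deadline; return the request count."""
    sent = 0
    while time.monotonic() < deadline:
        send()
        sent += 1
    return sent


def stress(send, duration=DURATION, concurrency=CONCURRENCY):
    """Run `concurrency` workers for `duration` seconds; return total requests."""
    deadline = time.monotonic() + duration
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker, send, deadline) for _ in range(concurrency)]
        return sum(future.result() for future in futures)
```

Bounding the run by wall-clock time rather than by request count keeps the load window the same for all four services, so a faster runtime simply serves more requests within it.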

Grafana is available at http://localhost:3000 (user: admin, password: admin) and opens a pre-built dashboard showing CPU usage, memory consumption, and request throughput for all four services side by side. The curves from the charts in this article are produced by that dashboard during a standard stress run.


V. M. Casale

backend / cloud / things that go bump in the night

I keep an engineering notebook of the small fixes, environment tricks, and infrastructure patterns that quietly make my work-week better.
