Re-evaluating runtime choices when integration is no longer the bottleneck

The obstacle to adopting a new language has never been the language itself. It has been the surrounding stack: framework, ORM, HTTP server, migration tooling, metrics wiring, and the effort to make all of it work together in a production system. Inconsistent documentation and configuration that diverged from reality consumed days before a service could serve its first request.

That integration cost was what ended most runtime migration discussions before they started. Agentic AI can reduce that cost, which makes previously closed decisions worth reopening.

What AI changes, and what it does not

In enterprise contexts, AI does not replace teams, design architectures, or make migration decisions; at least, I don’t believe those are its most valuable capabilities today. What it handles well is integration: assembling frameworks, resolving dependencies, pinning versions, and producing working configurations. That layer is what historically made adoption uneconomical.

The question becomes whether a different runtime justifies the change, and that is a question answered by measurement, not opinion.

The benchmark setup

Four microservices share a common baseline: REST endpoints, PostgreSQL, and Prometheus metrics. Each implements the same compute-heavy business logic, so load differences isolate runtime behavior rather than application complexity.

All four run on a single Minikube cluster against a shared PostgreSQL instance, each in its own schema. Prometheus scrapes all four; Grafana builds a comparison dashboard at startup. Running make seed sends 500 orders per service in parallel across all discount tiers, generating enough load for meaningful metrics without a dedicated load-testing tool.
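The seeding step can be pictured roughly as follows. This is a minimal sketch, not the repository's actual make seed target: the tier names, the payload fields, and the helpers build_seed_orders and seed_all are assumptions introduced for illustration. Only the count of 500 orders per service and the service names and ports come from the article.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle, islice

# Hypothetical tier names; the article only says "all discount tiers".
DISCOUNT_TIERS = ["none", "bronze", "silver", "gold"]

# Service names and ports as listed in the stack overview.
SERVICES = {
    "java-service": 8081,
    "python-service": 8082,
    "node-service": 8083,
    "rust-service": 8084,
}


def build_seed_orders(count=500, tiers=DISCOUNT_TIERS):
    """Return `count` order payloads, cycling evenly through the tiers."""
    return [
        {"order_id": i, "discount_tier": tier, "quantity": 1 + i % 5}
        for i, tier in enumerate(islice(cycle(tiers), count))
    ]


def seed_all(send, services=SERVICES, count=500):
    """Call send(name, port, order) for every order, one thread per service.

    Returns a mapping of service name to the number of accepted orders.
    """
    orders = build_seed_orders(count)

    def seed_one(item):
        name, port = item
        return name, sum(1 for order in orders if send(name, port, order))

    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        return dict(pool.map(seed_one, services.items()))
```

In a real run, send would issue an HTTP POST to the service's order endpoint and return whether it succeeded; passing it in as a parameter keeps the sketch independent of any particular HTTP client.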

Stack overview

| | java-service (Spring Boot 3.4) | python-service (FastAPI) | node-service (Fastify 4) | rust-service (Axum 0.7) |
| --- | --- | --- | --- | --- |
| port | 8081 | 8082 | 8083 | 8084 |
| ORM | Spring Data JPA + Hibernate | SQLAlchemy 2.x async + asyncpg | Prisma (schema: node_bench) | SQLx with postgres feature |
| migrations | Flyway (schema: java_bench) | Alembic (search_path: python_bench) | Prisma migrate deploy (on startup) | sqlx::migrate! embedded (on startup) |
| metrics | Micrometer + /actuator/prometheus | prometheus-fastapi-instrumentator | prom-client with custom middleware | metrics + metrics-exporter-prometheus |
| runtime | eclipse-temurin:21-jre | python:3.12-slim | node:20-slim (multi-stage) | debian:bookworm-slim (statically linked) |

Measurements

CPU under load

During make stress, Java exhibits the classic JIT warmup curve: CPU rises during profiling, then falls below the scripting runtimes once optimized code runs. Rust stays flat and low from the first request. Python and Node show the highest sustained usage throughout.

CPU usage per service (% during seed load): Node.js 21.1%, Python 15.7%, Java 8.36%, Rust 4.36%.

Python and Node execute each iteration through a general-purpose runtime with boxing, dynamic dispatch, and garbage collection. Rust compiles directly to native instructions with no runtime indirection. Java, once warmed up, approaches Rust rather than the scripting runtimes. Python could partially close the gap with C extensions like NumPy, but the core logic here is kept pure to expose the baseline cost of the runtime model.
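To make "pure" concrete, the sketch below shows the kind of loop where interpreter overhead dominates. The function discounted_total and its pricing rule are hypothetical stand-ins, not the benchmark's actual business logic. Every iteration performs arithmetic on boxed integers through dynamic dispatch, which is the per-iteration cost described above.

```python
def discounted_total(quantities, unit_price_cents, discount_percent):
    """Sum line totals in integer cents, applying a percentage discount per line.

    Hypothetical stand-in for compute-heavy business logic: a plain loop
    with no C extensions, so the interpreter executes every operation.
    """
    total = 0
    for quantity in quantities:
        line = quantity * unit_price_cents
        # Integer division keeps the result exact and avoids float rounding.
        total += line - (line * discount_percent) // 100
    return total
```

A vectorized NumPy version would push the same arithmetic into compiled code, which is exactly the shortcut the benchmark avoids.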

Memory

Under load, most services drift 10-20% above their idle baseline; Rust remains flat throughout.

Resident memory at idle (process RSS before any seed traffic; baseline = rust; lower is better)

| service | idle RSS | vs. rust |
| --- | --- | --- |
| rust | 7.34 MB | baseline |
| node | ~80 MB | ×11 |
| python | ~85 MB | ×12 |
| java | ~191 MB | ×26 |

Rust’s resident memory reflects what the service itself uses, nothing more. Node and Python carry an interpreter and a module graph. Java initializes a virtual machine, a JIT compiler, the Spring context, and the Hibernate entity graph before serving a single request. Rust produces a single statically linked binary in the single-digit megabyte range.

The fleet math. Across 30 services, an 11x to 26x memory ratio translates directly into node count and monthly cost. A 10x to 20x reduction in infrastructure spend makes a rewrite look attractive. Whether it justifies one depends on migration cost.
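The arithmetic behind the ratios and the fleet figure can be sketched as follows. The RSS values are the idle measurements above; the assumption of one replica per service and the helper names ratio_to_baseline and fleet_memory_mb are illustrative only. Real node counts also depend on CPU requests, replica counts, and headroom.

```python
# Idle resident memory per service in MB, taken from the measurements above.
IDLE_RSS_MB = {"rust": 7.34, "node": 80.0, "python": 85.0, "java": 191.0}


def ratio_to_baseline(rss_mb, baseline="rust"):
    """Return each runtime's idle RSS as a rounded multiple of the baseline."""
    base = rss_mb[baseline]
    return {name: round(value / base) for name, value in rss_mb.items()}


def fleet_memory_mb(rss_mb, services=30, replicas=1):
    """Total idle RSS for a fleet, assuming identical services (a simplification)."""
    return {name: value * services * replicas for name, value in rss_mb.items()}


ratios = ratio_to_baseline(IDLE_RSS_MB)
fleet = fleet_memory_mb(IDLE_RSS_MB)
```

With one replica each, 30 Rust services idle at roughly 220 MB in total, against about 5.7 GB for the same fleet on Java.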

Artifact size

Rust’s binary is a few MB, but the image ships a debian:bookworm-slim base and lands at 116 MB. Python sits at 243 MB on python:3.12-slim. Java and Node both reach approximately 400 MB: Java from the JRE layer beneath the fat jar, Node from node_modules and the bundled Prisma query engine.

Container image size (uncompressed image on disk; baseline = rust; lower is better)

| service | image size | vs. rust |
| --- | --- | --- |
| rust | 116 MB | baseline |
| python | 243 MB | ×2.1 |
| java | 399 MB | ×3.4 |
| node | 408 MB | ×3.5 |

What the benchmark does not capture

Performance data alone does not determine whether to migrate. In an enterprise context, the runtime is rarely the deciding factor: the benchmark shows runtime differences, not the integration cost and ecosystem dependencies that drive the actual decision.

Migration is determined by what the service touches: authentication, internal systems, data contracts, observability, and deployment model. Reduced integration cost matters here, not as a reason to migrate, but as a way to make the decision concrete. A fully wired prototype can be evaluated against a migration estimate with running software rather than diagrams.
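One way to make that comparison concrete is a simple payback calculation. This is a sketch under stated assumptions: the function payback_months is hypothetical, and any figures passed to it would come from a team's own migration estimate and infrastructure bill, not from this benchmark.

```python
import math


def payback_months(migration_cost, monthly_cost_before, monthly_cost_after):
    """Months until cumulative infrastructure savings cover the migration cost.

    Returns math.inf when the target runtime is not cheaper to run.
    """
    monthly_saving = monthly_cost_before - monthly_cost_after
    if monthly_saving <= 0:
        return math.inf
    return math.ceil(migration_cost / monthly_saving)
```

The calculation deliberately ignores everything the benchmark cannot see, such as retraining, dual maintenance during the cutover, and lost library support, so it gives a lower bound on the real payback period.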

The underlying constraint is why a technology was chosen and where its advantage comes from. Some ecosystems depend on library depth and maturity that cannot be generated: replacing the equivalent of a mature Apache project is a hard boundary regardless of tooling. For general-purpose services with a standard shape, repository plus API layer plus common integrations, runtimes are increasingly interchangeable, and migration becomes an engineering trade-off rather than a default rejection.

When migration is and is not worth it

Worth it. Migration makes sense when the service has no ecosystem-specific dependencies and the target runtime provides a framework of equivalent maturity. A REST API over a relational database with standard observability is that case: Spring Boot, FastAPI, Fastify, and Axum are interchangeable at that level. Migrating to Rust produces a measurable reduction in CPU usage relative to any scripting runtime and an absolute reduction in resident memory that holds against Java regardless of load profile.

Not worth it. Migration does not pay off for services where the runtime and the library stack are inseparable. Python ML inference pipelines depend on PyTorch, NumPy, or JAX, none of which have production-equivalent implementations outside CPython. Agentic AI systems built on LangGraph carry the same constraint: the graph execution model, stateful memory abstractions, and integration ecosystem are Python-native and have no equivalent maturity elsewhere. Domain-heavy codebases are poor candidates for the same structural reason: re-encoding business logic costs more than any runtime savings can offset, and that cost does not decrease because scaffolding is faster to generate.

What changed

None of the runtimes have changed. Java, Python, Node, and Rust exhibit the same characteristics they always have. What changed is the cost of verifying those differences in a representative environment.

Integration effort was the real barrier to that verification. With it reduced, the question shifts from whether a migration is feasible in principle to whether a specific service’s runtime profile justifies the cost. That is a narrower and more tractable question.

Run it yourself

The full project is available at github.com/ValerioMC/runtime-bench. All four services, the Kubernetes manifests, the Prometheus scrape configuration, and the Grafana dashboard are included.

The only prerequisites are Docker, Minikube, and kubectl. A single command provisions the cluster, builds the images, deploys all services, and starts port-forwards:

make up

Once running, make stress sends sustained concurrent load to all four services for 120 seconds with 8 workers each. The duration and concurrency are configurable:

make stress                     # 120 s default
DURATION=300 CONCURRENCY=16 make stress
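The load pattern behind make stress can be approximated in a few lines. The sketch below is not the repository's script: run_stress and the injected request callable are hypothetical, and only the DURATION and CONCURRENCY variable names and their defaults (120 seconds, 8 workers per service) come from the text above.

```python
import os
import threading
import time


def run_stress(request, duration_s=None, concurrency=None):
    """Call `request()` in a loop from `concurrency` threads for `duration_s` seconds.

    Defaults mirror the article: DURATION=120, CONCURRENCY=8.
    Returns the total number of completed calls.
    """
    if duration_s is None:
        duration_s = float(os.environ.get("DURATION", "120"))
    if concurrency is None:
        concurrency = int(os.environ.get("CONCURRENCY", "8"))

    deadline = time.monotonic() + duration_s
    counts = [0] * concurrency  # one slot per worker, so no lock is needed

    def worker(slot):
        while time.monotonic() < deadline:
            request()
            counts[slot] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(concurrency)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(counts)
```

Each call to run_stress covers one service; running it once per service, each in its own thread or process, would reproduce the "all four at once" pattern.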

Grafana is available at http://localhost:3000 (user: admin, password: admin) and opens a pre-built dashboard showing CPU usage, memory consumption, and request throughput for all four services side by side. The curves from the charts in this article are produced by that dashboard during a standard stress run.
