Beyond CPU Autoscaling

CPU tells you how busy your consumers are. Kafka lag tells you how much work is waiting. Reliable event platforms need both.

CPU-based autoscaling works well for HTTP services because more traffic usually means more CPU usage.

Kafka consumers are different.

A consumer can sit at 20% CPU while hundreds of thousands of messages keep piling up in Kafka.

That means CPU alone is not a reliable scaling signal for event-driven workloads.

A better strategy is to scale using two signals:

| Signal | What it tells you |
| --- | --- |
| Kafka Lag | How much work is waiting |
| CPU | How much compute pressure exists |

This article explains why single-trigger autoscaling is incomplete and how dual-trigger scaling with KEDA (Kubernetes Event-driven Autoscaling) gives better control over throughput, latency, and infrastructure cost.

The Problem

Most teams start with CPU-based autoscaling.

Traffic increases
        ↓
CPU increases
        ↓
HPA adds more pods

This model works for request-response systems.

But Kafka consumers do not always behave like HTTP APIs.

A consumer may spend most of its time waiting on:

database calls
downstream APIs
network IO
disk writes
external systems

In those cases, CPU remains low.

But Kafka lag keeps increasing.

From Kubernetes’ point of view, the service looks healthy.

From the business point of view, the system is falling behind.
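
A quick back-of-the-envelope sketch makes the gap concrete. All numbers are hypothetical, and the model assumes a single-threaded consumer that handles one message at a time; the split between CPU time and wait time is picked so CPU lands on the 20% figure used earlier.

```python
# Hypothetical workload: each message needs a little CPU and a long blocking call.
cpu_ms_per_msg = 10       # time spent computing
wait_ms_per_msg = 40      # time blocked on a database or downstream API
consumers = 3
arrival_rate = 100.0      # messages per second written to the topic

service_ms = cpu_ms_per_msg + wait_ms_per_msg
per_consumer_rate = 1000 / service_ms             # messages/s one consumer can finish
cpu_utilization = cpu_ms_per_msg / service_ms     # share of time on CPU when fully busy
lag_growth = arrival_rate - consumers * per_consumer_rate

print(f"CPU per consumer: {cpu_utilization:.0%}")
print(f"Group throughput: {consumers * per_consumer_rate:.0f} msg/s")
print(f"Lag growth:       {lag_growth:.0f} msg/s")
```

Every consumer is working flat out, CPU never passes 20%, and the backlog still grows by 40 messages every second. A CPU-only autoscaler has no reason to act.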

Architecture

The important part is not KEDA itself.

The important part is choosing the right scaling signals.

Why CPU Alone Is Not Enough

CPU answers:

How busy are my consumers?

Kafka lag answers:

How much work is waiting?

Those are not the same question.

| Situation | CPU | Kafka Lag | Should scale? |
| --- | --- | --- | --- |
| Slow database | Low | High | Yes |
| Consumer waiting on IO | Low | High | Yes |
| Heavy processing | High | Low | Maybe |
| Normal traffic | Medium | Low | No |
| Traffic spike | Medium | High | Yes |

CPU is a resource metric.

Kafka lag is a demand metric.

A platform that only watches CPU is blind to queue pressure.
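
The table collapses into a very small rule. The sketch below is only an illustration: `should_scale` is a hypothetical helper, and the coarse `low` / `medium` / `high` labels stand in for thresholds you would have to pick for your own workload.

```python
def should_scale(cpu: str, lag: str) -> str:
    """Map coarse CPU and lag levels to the verdicts in the table above."""
    if lag == "high":
        return "yes"      # work is waiting, however idle the pods look
    if cpu == "high":
        return "maybe"    # compute pressure, but no backlog yet
    return "no"

situations = {
    "slow database": ("low", "high"),
    "consumer waiting on IO": ("low", "high"),
    "heavy processing": ("high", "low"),
    "normal traffic": ("medium", "low"),
    "traffic spike": ("medium", "high"),
}

for name, (cpu, lag) in situations.items():
    print(f"{name:24s} cpu={cpu:6s} lag={lag:4s} -> {should_scale(cpu, lag)}")
```

Lag decides every "yes" row. CPU only matters in the one case where lag has nothing to say.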

Why Kafka Lag Alone Is Also Not Enough

The opposite mistake is scaling only on Kafka lag.

That looks better initially.

Kafka lag increases
        ↓
KEDA adds more consumers
        ↓
Lag should reduce

But this also breaks in production.

Imagine the database is already slow.

Lag increases
        ↓
KEDA adds more pods
        ↓
More consumers hit the same database
        ↓
More DB connections
        ↓
More contention
        ↓
Latency gets worse

The platform scaled out.

But throughput did not improve.

Scaling consumers does not automatically fix a downstream bottleneck.
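
A deliberately simple model of that ceiling, with made-up numbers: each consumer can finish 20 messages per second, and the shared database can absorb 100 writes per second. `throughput` is a hypothetical helper, and the model is optimistic because it treats the database limit as flat, while real contention usually makes things worse as connections pile up.

```python
def throughput(consumers: int, per_consumer_rate: float, downstream_capacity: float) -> float:
    """Messages/s the group can finish when a shared dependency caps total work."""
    return min(consumers * per_consumer_rate, downstream_capacity)

PER_CONSUMER_RATE = 20.0   # hypothetical messages/s per consumer
DB_CAPACITY = 100.0        # hypothetical writes/s the database can sustain

for consumers in (2, 5, 10, 20):
    rate = throughput(consumers, PER_CONSUMER_RATE, DB_CAPACITY)
    print(f"{consumers:2d} consumers -> {rate:.0f} msg/s")
```

Past five consumers the line goes flat. Every extra pod costs money and database connections and returns nothing.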

Dual-Trigger Autoscaling

A stronger strategy is to combine both signals.

In KEDA, this typically means configuring a ScaledObject with:

Kafka trigger based on consumer lag
CPU trigger based on resource utilization
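
A ScaledObject is normally written as YAML. To keep the examples in one language, the sketch below builds the same structure as a Python dict and prints it as JSON, which Kubernetes also accepts as a manifest format. The field names follow the KEDA `kafka` and `cpu` scaler documentation as I understand it; the resource name, broker address, topic, consumer group, and every number are placeholders, not recommendations.

```python
import json

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "orders-consumer"},             # hypothetical name
    "spec": {
        "scaleTargetRef": {"name": "orders-consumer"},   # hypothetical Deployment
        "pollingInterval": 30,    # seconds between metric checks
        "cooldownPeriod": 300,    # seconds; in KEDA this governs scaling back to zero
        "minReplicaCount": 2,
        "maxReplicaCount": 24,    # assumed to equal the topic's partition count
        "triggers": [
            {
                "type": "kafka",
                "metadata": {
                    "bootstrapServers": "kafka:9092",    # placeholder address
                    "consumerGroup": "orders",           # placeholder group
                    "topic": "orders",                   # placeholder topic
                    "lagThreshold": "1000",              # target lag per replica
                },
            },
            {
                "type": "cpu",
                "metricType": "Utilization",
                "metadata": {"value": "70"},             # target CPU percent
            },
        ],
    },
}

print(json.dumps(scaled_object, indent=2))
```

Note that KEDA passes scale-down between non-zero replica counts to the HPA it creates, so damping there is tuned through the HPA behavior settings rather than `cooldownPeriod`.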

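Conceptually, KEDA hands both metrics to the Horizontal Pod Autoscaler it manages, and the HPA follows whichever metric asks for more replicas.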
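
A rough sketch of the arithmetic, assuming the standard HPA rule: compute a replica count per metric, keep the largest, then clamp to the configured bounds. The function names are hypothetical, and the sketch leaves out the HPA tolerance band, pod readiness handling, and scaling policies.

```python
import math

def replicas_for_lag(total_lag: int, lag_threshold: int) -> int:
    """Average-value target: enough pods that lag per pod stays at the threshold."""
    return math.ceil(total_lag / lag_threshold)

def replicas_for_cpu(current: int, cpu_percent: int, cpu_target: int) -> int:
    """Utilization target: ceil(current replicas * observed / target)."""
    return math.ceil(current * cpu_percent / cpu_target)

def desired_replicas(current, total_lag, lag_threshold, cpu_percent, cpu_target,
                     min_replicas, max_replicas):
    """Take the larger of the two proposals, then clamp to the allowed range."""
    wanted = max(replicas_for_lag(total_lag, lag_threshold),
                 replicas_for_cpu(current, cpu_percent, cpu_target))
    return max(min_replicas, min(max_replicas, wanted))

# IO-bound backlog: CPU is quiet, lag drives the decision.
print(desired_replicas(4, 12_000, 1_000, 20, 70, 2, 24))
# CPU-heavy batch: lag is small, CPU drives the decision.
print(desired_replicas(4, 500, 1_000, 90, 70, 2, 24))
```

One consequence worth noticing: the triggers combine as "either", not "both". The CPU trigger adds a second reason to scale out. It does not veto lag-driven scaling, so downstream protection still has to come from `maxReplicaCount`.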

Sequence Flow:

New consumers joining the group are not free.

They trigger rebalancing.

Hidden Cost: Consumer Group Rebalancing

Every scale event changes the consumer group.

During a rebalance, some or all consumers in the group pause consumption until partitions are reassigned.

If autoscaling is too aggressive, the system may spend too much time rebalancing and not enough time processing.

That is why cooldown periods matter.
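
The damping idea can be sketched in a few lines. The class below is a hypothetical illustration modelled on how I understand the HPA scale-down stabilization window: scale up immediately, but scale down only to the highest recommendation seen inside the window. It is not KEDA's or Kubernetes' actual code.

```python
from collections import deque

class ScaleDownStabilizer:
    """Scale up at once; scale down only to the highest recent recommendation."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self.history = deque()   # (timestamp_seconds, recommended_replicas)

    def decide(self, now: int, current: int, recommended: int) -> int:
        self.history.append((now, recommended))
        # Forget recommendations that have aged out of the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        if recommended >= current:
            return recommended
        return min(current, max(r for _, r in self.history))

# Hypothetical flapping signal, sampled every 30 seconds.
recommendations = [12, 4, 12, 4, 12, 4]
stabilizer = ScaleDownStabilizer(window_seconds=300)
replicas = 4
applied = []
for step, recommended in enumerate(recommendations):
    replicas = stabilizer.decide(now=step * 30, current=replicas, recommended=recommended)
    applied.append(replicas)
print(applied)
```

Followed blindly, the raw signal would change the group size six times in three minutes. With the window in place the group changes size once and then holds at 12.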

Production Tuning Decisions

| Decision | Why it matters |
| --- | --- |
| `minReplicaCount` | Prevents cold starts during normal traffic |
| `maxReplicaCount` | Protects Kafka, DB, and downstream systems |
| Lag threshold | Controls when queue pressure becomes meaningful |
| CPU threshold | Prevents uncontrolled scale-out |
| Polling interval | Controls responsiveness |
| Cooldown period | Prevents scale oscillation |
| Partition count | Defines real parallelism limit |

The most important one is often ignored:

The number of useful consumers in a group cannot exceed the number of partitions in the topic.

If a topic has 24 partitions, creating 80 consumers does not give 80-way parallelism.

56 of them will sit idle.

A healthy autoscaling system does not instantly jump to maximum replicas.

It scales enough to recover lag without destabilizing the platform.
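
One way to put a number on "enough" is to work backwards from a recovery target. The helper below is a hypothetical sketch: it assumes a steady arrival rate and a fixed per-consumer rate, and it caps the answer at the partition count. The 24 partitions and 80 consumers come from the example above; every other figure is made up.

```python
import math

def replicas_to_recover(backlog: int, arrival_rate: float, per_consumer_rate: float,
                        recovery_seconds: float, partitions: int) -> int:
    """Smallest consumer count that drains the backlog in time, capped at partitions."""
    required_rate = arrival_rate + backlog / recovery_seconds
    needed = math.ceil(required_rate / per_consumer_rate)
    return min(needed, partitions)

PARTITIONS = 24

# Drain 120,000 queued messages in 10 minutes while 100 msg/s keep arriving.
print(replicas_to_recover(120_000, 100.0, 20.0, 600, PARTITIONS))
# Ask for the same recovery in 1 minute: the partition count becomes the ceiling.
print(replicas_to_recover(120_000, 100.0, 20.0, 60, PARTITIONS))
# Consumers left without a partition if 80 pods join a 24-partition topic.
print(max(0, 80 - PARTITIONS))
```

The ten-minute target needs 15 consumers, well short of the maximum. The one-minute target would need 105, which the topic cannot use, so the only honest options are a longer recovery window or more partitions.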

Engineering Tradeoffs

| You Gain | You Pay |
| --- | --- |
| Faster backlog recovery | More tuning |
| Better latency control | More metrics |
| Better resource utilization | Operational complexity |
| Lower idle infrastructure cost | Rebalance overhead |
| Better production visibility | More failure modes to monitor |

This is the real engineering decision.

Dual-trigger autoscaling is not “more advanced YAML”.

It is a tradeoff between simplicity and operational control.

Failure Modes to Watch

Scaling beyond partition count - More pods do not help if partitions are already fully assigned.
Scaling into downstream bottlenecks - If DB latency is the bottleneck, more consumers can make it worse.
Aggressive scale up/down - Frequent replica changes can cause repeated consumer rebalances.
Lag threshold too low - The system scales for short-lived spikes that would have recovered naturally.
Cooldown too short - The platform oscillates instead of stabilizing.

Production Checklist

Before using dual-trigger KEDA autoscaling, validate this:

| Check | Done |
| --- | --- |
| Max replicas <= Kafka partition count | ✅ |
| Database capacity tested at max replicas | ✅ |
| Lag threshold load-tested | ✅ |
| CPU threshold load-tested | ✅ |
| Cooldown period configured | ✅ |
| Consumer rebalance duration monitored | ✅ |
| Lag dashboard available | ✅ |
| Scale events visible in monitoring | ✅ |
| Downstream latency monitored | ✅ |
| Failure recovery tested | ✅ |
The Engineering Decision

The core decision is simple:

Do not scale Kafka consumers only because pods look busy. Scale because business work is waiting and the system has enough capacity to process more of it.

CPU tells one side of the story.

Kafka lag tells the other.

Together, they give a more accurate picture of system health.

Final Takeaway

Autoscaling event-driven systems is not just an infrastructure problem. It is a feedback-control problem. If the feedback signal is wrong, the scaling decision will be wrong.

CPU-based autoscaling works when CPU represents demand.

Kafka consumers often break that assumption.

For adaptive event platforms, lag and CPU should be treated as complementary signals:

Kafka lag shows demand. CPU shows compute pressure. Cooldown protects stability. Partition count defines the scaling ceiling. Downstream capacity decides whether scaling actually helps.

The best autoscaling strategy is not the one that creates the most pods.

It is the one that keeps throughput stable without creating unnecessary operational pressure.
