Beyond CPU Autoscaling

Designing Adaptive Kafka Consumers with KEDA


CPU tells you how busy your consumers are. Kafka lag tells you how much work is waiting. Reliable event platforms need both.


CPU-based autoscaling works well for HTTP services because more traffic usually means more CPU.

Kafka consumers are different.

A consumer can sit at 20% CPU while hundreds of thousands of messages keep piling up in Kafka.

That means CPU alone is not a reliable scaling signal for event-driven workloads.

A better strategy is to scale using two signals:

Signal       What it tells you
Kafka Lag    How much work is waiting
CPU          How much compute pressure exists

This article explains why single-trigger autoscaling is incomplete and how dual-trigger KEDA scaling gives better control over throughput, latency, and infrastructure cost.


The Problem

Most teams start with CPU-based autoscaling.

Traffic increases
        ↓
CPU increases
        ↓
HPA adds more pods

This model works for request-response systems.

But Kafka consumers do not always behave like HTTP APIs.

A consumer may spend most of its time waiting on:

  • database calls

  • downstream APIs

  • network IO

  • disk writes

  • external systems

In those cases, CPU remains low.

But Kafka lag keeps increasing.

From Kubernetes’ point of view, the service looks healthy.

From the business point of view, the system is falling behind.


Architecture

The important part is not KEDA itself.

The important part is choosing the right scaling signals.


Why CPU Alone Is Not Enough

CPU answers:

How busy are my consumers?

Kafka lag answers:

How much work is waiting?

Those are not the same question.

Situation                 CPU       Kafka Lag    Should scale?
Slow database             Low       High         Yes
Consumer waiting on IO    Low       High         Yes
Heavy processing          High      Low          Maybe
Normal traffic            Medium    Low          No
Traffic spike             Medium    High         Yes

CPU is a resource metric.

Kafka lag is a demand metric.

A platform that only watches CPU is blind to queue pressure.
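
To make "lag as a demand metric" concrete, here is a minimal sketch of how lag is usually derived: for each partition, the log end offset minus the consumer group's committed offset, summed over the topic. The offsets are made-up numbers and `total_lag` is a hypothetical helper, not part of any Kafka client. Treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
from typing import Dict


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum per-partition lag: log end offset minus committed offset."""
    lag = 0
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        done = committed.get(partition, 0)
        # Clamp at zero so a stale end offset never produces negative lag.
        lag += max(end - done, 0)
    return lag


# Illustrative offsets for a three-partition topic.
end_offsets = {0: 1_500, 1: 2_200, 2: 900}
committed = {0: 1_450, 1: 1_200, 2: 900}

print(total_lag(end_offsets, committed))  # 1050
```

None of these numbers depend on CPU. A consumer blocked on a slow database can report low utilization while this sum keeps growing.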


Why Kafka Lag Alone Is Also Not Enough

The opposite mistake is scaling only on Kafka lag.

That looks better initially.

Kafka lag increases
        ↓
KEDA adds more consumers
        ↓
Lag should reduce

But this also breaks in production.

Imagine the database is already slow.

Lag increases
        ↓
KEDA adds more pods
        ↓
More consumers hit the same database
        ↓
More DB connections
        ↓
More contention
        ↓
Latency gets worse

The platform scaled out.

But throughput did not improve.

Scaling consumers does not automatically fix a downstream bottleneck.
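
The same point as a toy model, not a measurement: throughput is capped by whichever is smaller, the combined consumer rate or what the downstream system can absorb. The function name, the per-consumer rate, and the database capacity are all assumed values for illustration.

```python
def effective_throughput(
    consumers: int,
    per_consumer_rate: float,
    downstream_capacity: float,
) -> float:
    """Messages per second the system can actually process (toy model)."""
    # Consumers add capacity linearly until the downstream system saturates.
    return min(consumers * per_consumer_rate, downstream_capacity)


# Assumed numbers: 250 msg/s per consumer, database absorbs 2,000 msg/s.
for n in (4, 8, 16, 32):
    print(n, effective_throughput(n, 250.0, 2_000.0))
```

In this model the curve goes flat after 8 consumers. Real systems are often worse than flat, because extra connections add contention, which is the failure described above.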


Dual-Trigger Autoscaling

A stronger strategy is to combine both signals.

In KEDA, this typically means configuring a ScaledObject with:

  • Kafka trigger based on consumer lag

  • CPU trigger based on resource utilization
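
As a rough sketch, a ScaledObject with both triggers could look like the manifest built below. It is written as a Python dict and printed as JSON, which `kubectl apply -f` accepts just like YAML. The resource names, broker address, topic, and every threshold are placeholders chosen for illustration, and `build_scaled_object` is a hypothetical helper.

```python
import json


def build_scaled_object(deployment: str, topic: str, group: str,
                        bootstrap: str, lag_threshold: int, cpu_target: int,
                        min_replicas: int, max_replicas: int) -> dict:
    """Build a KEDA ScaledObject manifest with a Kafka and a CPU trigger."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler"},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "pollingInterval": 30,   # seconds between trigger checks
            "cooldownPeriod": 300,   # seconds before scaling back to zero
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [
                {   # Demand signal: consumer group lag on the topic.
                    "type": "kafka",
                    "metadata": {
                        "bootstrapServers": bootstrap,
                        "consumerGroup": group,
                        "topic": topic,
                        "lagThreshold": str(lag_threshold),
                    },
                },
                {   # Resource signal: average CPU utilization in percent.
                    "type": "cpu",
                    "metricType": "Utilization",
                    "metadata": {"value": str(cpu_target)},
                },
            ],
        },
    }


# All values below are illustrative placeholders.
manifest = build_scaled_object(
    deployment="orders-consumer", topic="orders", group="orders-group",
    bootstrap="kafka:9092", lag_threshold=5000, cpu_target=70,
    min_replicas=2, max_replicas=24,
)
print(json.dumps(manifest, indent=2))
```

Two details are worth knowing before relying on this. KEDA hands both triggers to one HPA, and the HPA acts on whichever metric asks for the most replicas, so the CPU trigger is a second reason to scale rather than a veto on the lag trigger. The CPU trigger also only works when the consumer pods declare CPU requests.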

[Figure: conceptual view of dual-trigger scaling]

[Figure: sequence flow]

New consumers joining the group are not free.

They trigger rebalancing.


Hidden Cost: Consumer Group Rebalancing

Every scale event changes the consumer group.

During a rebalance, consumption on the partitions being reassigned pauses briefly.

If autoscaling is too aggressive, the system may spend too much time rebalancing and not enough time processing.

That is why cooldown periods matter.
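
A small simulation shows the effect. It is loosely modeled on the HPA scale-down stabilization window, which acts on the highest recommendation seen within a recent window instead of the latest one. The function name and the sample recommendations are assumptions, and the window is counted in evaluation steps rather than seconds to keep the sketch short.

```python
from collections import deque
from typing import Iterable, List


def stabilized_replicas(recommendations: Iterable[int], window: int) -> List[int]:
    """Apply the highest recommendation seen in the last `window` steps."""
    recent = deque(maxlen=window)  # old recommendations fall out automatically
    applied = []
    for desired in recommendations:
        recent.append(desired)
        # Scale-up is immediate (the newest value is in the window);
        # scale-down waits until every value in the window agrees.
        applied.append(max(recent))
    return applied


# A noisy signal that flips between 6 and 3 replicas, then settles at 3.
raw = [6, 3, 6, 3, 6, 3, 3, 3, 3]
print(stabilized_replicas(raw, window=3))  # [6, 6, 6, 6, 6, 6, 6, 3, 3]
```

Without the window, the group would change size five times in six steps, and each change would trigger a rebalance. With it, the group shrinks once. In KEDA terms, the `cooldownPeriod` field covers scaling back to zero, while scale-down between running replica counts is governed by the HPA behavior settings under `advanced.horizontalPodAutoscalerConfig`.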

Production Tuning Decisions

Decision                                         Why it matters
minReplicaCount                                  Prevents cold starts during normal traffic
maxReplicaCount                                  Protects Kafka, DB, and downstream systems
Lag threshold                                    Controls when queue pressure becomes meaningful
CPU threshold                                    Sets the point at which compute pressure adds replicas
Polling interval                                 Controls responsiveness
Cooldown period and scale-down stabilization     Prevents scale oscillation
Partition count                                  Defines real parallelism limit

The most important one is often ignored:

Within one consumer group, the number of useful consumers cannot exceed the number of partitions on the topic.

If a topic has 24 partitions, creating 80 consumers does not give 80-way parallelism.

At most 24 of them are assigned a partition. The other 56 sit idle.

A healthy autoscaling system does not instantly jump to maximum replicas.

It scales enough to recover lag without destabilizing the platform.
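
The arithmetic behind that can be sketched as follows. It assumes the usual HPA-style calculations: the lag trigger asks for total lag divided by the lag threshold, the CPU trigger asks for current replicas scaled by observed over target utilization, the larger request wins, and the result is clamped by the replica limits and the partition count. `desired_replicas` and all input numbers are hypothetical, and the real HPA also applies a tolerance band that this sketch ignores.

```python
import math


def desired_replicas(total_lag: int, lag_threshold: int,
                     current_replicas: int, cpu_utilization: float,
                     cpu_target: float, min_replicas: int,
                     max_replicas: int, partitions: int) -> int:
    """Combine lag and CPU recommendations into one bounded replica count."""
    by_lag = math.ceil(total_lag / lag_threshold)
    by_cpu = math.ceil(current_replicas * cpu_utilization / cpu_target)
    # Replicas beyond the partition count would sit idle, so cap there too.
    ceiling = min(max_replicas, partitions)
    return max(min_replicas, min(max(by_lag, by_cpu), ceiling))


common = dict(lag_threshold=5_000, current_replicas=6, cpu_target=70,
              min_replicas=2, max_replicas=80, partitions=24)

# Moderate backlog, idle CPU: lag drives the decision.
print(desired_replicas(total_lag=40_000, cpu_utilization=20, **common))     # 8
# Huge backlog: the 24-partition ceiling wins over the 200 that lag asks for.
print(desired_replicas(total_lag=1_000_000, cpu_utilization=20, **common))  # 24
# No backlog, hot CPU: the CPU trigger drives the decision.
print(desired_replicas(total_lag=0, cpu_utilization=90, **common))          # 8
```

A moderate backlog produces a moderate scale-out, not a jump to the maximum. The lag threshold is what sets that proportion, which is why it deserves load testing rather than a guess.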

Engineering Tradeoffs

You Gain                           You Pay
Faster backlog recovery            More tuning
Better latency control             More metrics
Better resource utilization        Operational complexity
Lower idle infrastructure cost     Rebalance overhead
Better production visibility       More failure modes to monitor

This is the real engineering decision.

Dual-trigger autoscaling is not “more advanced YAML”.

It is a tradeoff between simplicity and operational control.

Failure Modes to Watch

  1. Scaling beyond partition count - More pods do not help if partitions are already fully assigned.

  2. Scaling into downstream bottlenecks - If DB latency is the bottleneck, more consumers can make it worse.

  3. Aggressive scale up/down - Frequent replica changes can cause repeated consumer rebalances.

  4. Lag threshold too low - The system scales for short-lived spikes that would have recovered naturally.

  5. Cooldown too short - The platform oscillates instead of stabilizing.


Production Checklist

Before using dual-trigger KEDA autoscaling, validate this:

[ ] Max replicas <= Kafka partition count
[ ] Database capacity tested at max replicas
[ ] Lag threshold load-tested
[ ] CPU threshold load-tested
[ ] Cooldown period configured
[ ] Consumer rebalance duration monitored
[ ] Lag dashboard available
[ ] Scale events visible in monitoring
[ ] Downstream latency monitored
[ ] Failure recovery tested
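
A few of these checks are plain arithmetic and can be automated before a config ships. The sketch below covers three of them. `ScalingPlan`, its field names, and the connection-budget rule are assumptions made for illustration. The load-testing and monitoring items still need real traffic and real dashboards.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingPlan:
    max_replicas: int
    partitions: int
    db_max_connections: int
    connections_per_pod: int
    scale_down_window_seconds: int


def preflight(plan: ScalingPlan) -> List[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if plan.max_replicas > plan.partitions:
        problems.append("max replicas exceeds partition count")
    if plan.max_replicas * plan.connections_per_pod > plan.db_max_connections:
        problems.append("database connection budget exceeded at max replicas")
    if plan.scale_down_window_seconds <= 0:
        problems.append("no scale-down stabilization configured")
    return problems


# Illustrative plan: 80 replicas against 24 partitions and 500 DB connections.
risky = ScalingPlan(max_replicas=80, partitions=24, db_max_connections=500,
                    connections_per_pod=10, scale_down_window_seconds=0)
for problem in preflight(risky):
    print("FAIL:", problem)
```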

The Engineering Decision

The core decision is simple:

Do not scale Kafka consumers only because pods look busy. Scale because business work is waiting and the system has enough capacity to process more of it.

CPU tells one side of the story.

Kafka lag tells the other.

Together, they give a more accurate picture of system health.


Final Takeaway

Autoscaling event-driven systems is not just an infrastructure problem. It is a feedback-control problem. If the feedback signal is wrong, the scaling decision will be wrong.

CPU-based autoscaling works when CPU represents demand.

Kafka consumers often break that assumption.

For adaptive event platforms, lag and CPU should be treated as complementary signals:

Kafka lag shows demand. CPU shows compute pressure. Cooldown protects stability. Partition count defines the scaling ceiling. Downstream capacity decides whether scaling actually helps.

The best autoscaling strategy is not the one that creates the most pods.

It is the one that keeps throughput stable without creating unnecessary operational pressure.
