Kafka Partitioning Strategy in Production

A Framework That Actually Scales


Partition count is the most consequential — and most irreversible — decision you make when designing a Kafka topic.

Get it right and your system scales gracefully for years. Get it wrong and you're either stuck with a rebalancing disaster at 3 AM or watching idle consumers sit on empty partitions while one thread drowns.

After designing Kafka topologies for a real-time event distribution platform that processes 10M+ messages/day across multiple tenants and hotel properties globally, this is the framework I'd use again.


Why Partitions Matter More Than You Think

A partition is the unit of parallelism in Kafka. Each partition is consumed by exactly one consumer within a consumer group at any point in time.

More partitions give you:

  • Higher consumer parallelism ceiling

  • Better throughput distribution across brokers

  • Finer-grained rebalancing granularity

More partitions cost you:

  • More file handles on each broker (each partition = a directory of segment files)

  • More metadata for the controller to track

  • Longer, more disruptive consumer group rebalances

  • Higher end-to-end latency if over-partitioned (each broker has more partitions to replicate, and producer batches and broker memory are spread thin)

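Partition count is a one-way door. You can increase it, but doing so triggers a consumer group rebalance and changes the key-to-partition mapping: the default partitioner assigns a keyed message to hash(key) mod partition count, so after the increase an existing key can land on a different partition than its earlier messages, and per-key ordering is no longer guaranteed across the change. You can never decrease the count without deleting and recreating the topic.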
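To make the key-routing problem concrete, here is a minimal sketch of what happens to keys when a topic grows from 12 to 16 partitions. It assumes a hash-mod-N partitioner; Kafka's default partitioner hashes key bytes with murmur2, and CRC32 stands in here only to keep the example dependency-free. The key names and the function names partition_for and moved_fraction are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a hash-mod-N partitioner. Kafka's default uses murmur2;
    # CRC32 is an assumption made to keep this sketch stdlib-only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    # Fraction of keys whose target partition changes when the count grows.
    moved = sum(
        1 for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


keys = [f"property-{i}" for i in range(10_000)]  # hypothetical entity IDs
fraction = moved_fraction(keys, old_count=12, new_count=16)
print(f"{fraction:.0%} of keys map to a different partition after 12 -> 16")
```

With a uniform hash, a key keeps its partition across a 12 to 16 change only when hash mod 48 is below 12, so roughly three quarters of keys should be expected to move. That is why an increase is best treated as a planned migration rather than a routine tuning step.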


The Four Decisions You Actually Need to Make


Decision 1 — The Partition Key

This is the single most important choice. The key determines:

  • Which partition a message lands in

  • What ordering guarantees you get

  • Whether you'll have hotspots

The Selection Framework

| Requirement | Key to choose | Risk to watch |
| --- | --- | --- |
| Per-entity ordering (e.g., all events for entity X in order) | Entity ID | Skew if some entities are far more active |
| Global ordering | Single constant key | All messages → 1 partition, which kills parallelism |
| Max throughput, ordering not needed | Null key (round-robin, or sticky batching in newer clients) | No ordering guarantee whatsoever |
| Multi-tenant, per-entity ordering | Entity ID (with per-tenant topics) | Topic proliferation |
| Multi-tenant, shared topics | Composite: TenantID + EntityID | Custom partitioner needed |

The Hotspot Check

Before committing to any key, sample 24 hours of production traffic and ask:

Does the top 1% of key values account for > 20% of total message volume?

If yes — you have a hotspot risk. Mitigation options:

  • Add a bucket suffix to the key: entityId + "-" + (random % N) — trades ordering granularity for even distribution

  • Pre-partition at the producer level

  • Choose a higher-cardinality dimension of the same entity
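The bucket-suffix option from the list above can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the helper names bucketed_key and all_bucket_keys and the example entity ID are hypothetical, and N (buckets) is something you would choose from your own skew measurements.

```python
import random


def bucketed_key(entity_id: str, buckets: int, rng=random) -> str:
    # Spread one hot entity across `buckets` sub-keys. Ordering then holds
    # only within each sub-key, not across the whole entity.
    return f"{entity_id}-{rng.randrange(buckets)}"


def all_bucket_keys(entity_id: str, buckets: int) -> list:
    # Every sub-key a consumer must treat as belonging to the same entity.
    return [f"{entity_id}-{b}" for b in range(buckets)]


key = bucketed_key("hotel-42", buckets=4)  # hypothetical entity ID
print(key, "is one of", all_bucket_keys("hotel-42", buckets=4))
```

The trade-off is visible in the second helper: anything downstream that needs a complete per-entity view now has to merge N streams, so keep N as small as the skew allows.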

In our platform, we partitioned by property ID (a high-cardinality entity identifier). Traffic distribution analysis showed no single property drove more than 0.1% of total volume — safe from hotspots.
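A traffic distribution analysis like the one described above can be approximated with a few lines over a sample of message keys. This is a minimal sketch assuming the sample fits in memory; the function names top_share and has_hotspot_risk and the synthetic data are hypothetical, while the 1% and 20% thresholds come from the hotspot check.

```python
import math
from collections import Counter


def top_share(keys, top_fraction: float = 0.01) -> float:
    # Share of total volume produced by the most active `top_fraction` of keys.
    counts = Counter(keys)
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    top_volume = sum(count for _, count in counts.most_common(n_top))
    return top_volume / sum(counts.values())


def has_hotspot_risk(keys, top_fraction: float = 0.01,
                     threshold: float = 0.20) -> bool:
    # The hotspot check: does the top 1% of keys exceed 20% of volume?
    return top_share(keys, top_fraction) > threshold


# Synthetic samples: 1,000 keys with 10 messages each, then one key made hot.
even = [f"p{i}" for i in range(1000)] * 10
skewed = even + ["p0"] * 5000

print(has_hotspot_risk(even), has_hotspot_risk(skewed))
```

In practice the input would be the key column from a 24-hour sample of production traffic rather than synthetic lists.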


Decision 2 — Partition Count

The formula everyone uses:

Partitions = max(throughput-based count, consumer-parallelism-based count)

Throughput-Based Count

Partitions = Target throughput (MB/s)
             ─────────────────────────────────────────────────────
             min(Producer throughput per partition,
                 Consumer throughput per partition)

A single Kafka partition typically handles 10–50 MB/s depending on message size, compression, and consumer processing speed. For most event-driven workloads processing KB-sized messages, throughput alone rarely drives you above 10–20 partitions.
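As a worked example of the throughput-based formula, the sketch below divides the target throughput by the slower of the two per-partition rates and rounds up. The function name and the input numbers are hypothetical; real per-partition rates should come from your own benchmarks.

```python
import math


def throughput_partitions(target_mb_s: float,
                          producer_mb_s_per_partition: float,
                          consumer_mb_s_per_partition: float) -> int:
    # The slower side (producer or consumer) bounds what one partition can do.
    per_partition = min(producer_mb_s_per_partition,
                        consumer_mb_s_per_partition)
    if per_partition <= 0:
        raise ValueError("per-partition throughput must be positive")
    return math.ceil(target_mb_s / per_partition)


# Hypothetical: 120 MB/s target, 30 MB/s per partition on the producer side,
# 10 MB/s per partition on the consumer side.
print(throughput_partitions(120, 30, 10))  # 12
```

Note that the consumer side is the limit in this example, which is common when consumers do real work per message.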

Consumer-Parallelism-Based Count

This is usually the binding constraint:

Partitions ≥ Max consumers you will ever want to run simultaneously

Take your target pod count at peak load, add a growth multiplier, and round up to a clean number. A 10:1 partition-to-consumer ratio is a reasonable starting point — it gives you headroom to scale consumers up to 10x without repartitioning.
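The consumer-based sizing and the final max() can be sketched the same way. The function names, the rounding step, and the pod counts below are assumptions for illustration; the 10x multiplier mirrors the 10:1 starting ratio suggested above.

```python
import math


def consumer_partitions(peak_pods: int, growth_multiplier: float,
                        round_to: int = 10) -> int:
    # Peak consumer count times growth headroom, rounded up to a clean number.
    wanted = peak_pods * growth_multiplier
    return math.ceil(wanted / round_to) * round_to


def partition_count(throughput_based: int, consumer_based: int) -> int:
    # Partitions = max(throughput-based count, consumer-parallelism-based count)
    return max(throughput_based, consumer_based)


# Hypothetical: 12 pods at peak with 3x headroom -> 36, rounded up to 40.
consumer_based = consumer_partitions(peak_pods=12, growth_multiplier=3)
print(partition_count(throughput_based=12, consumer_based=consumer_based))  # 40
```

With the 10:1 ratio, a hypothetical pool of 50 pods would give consumer_partitions(50, 10) = 500.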

The Rule of Thumb Table

| Daily message volume | Typical peak TPS | Partition range |
| --- | --- | --- |
| < 1M / day | < 50 | 10–30 |
| 1M–10M / day | 50–500 | 30–100 |
| 10M–100M / day | 500–5,000 | 100–500 |
| > 100M / day | > 5,000 | 500+ (measure carefully) |

At 10M+ messages/day with peaks hitting ~1,000 messages/second per processing pipeline, our high-volume topics sat comfortably in the 500-partition range with a 10:1 partition-to-consumer ratio.

Match Partition Count to Topic Role

Not every topic in your pipeline carries the same volume or needs the same parallelism ceiling. A common mistake is assigning the same partition count everywhere.

| Topic role | Relative volume | Partition guidance |
| --- | --- | --- |
| High-volume fanout | High | Match consumer pool |
| Pre-processing | Low-medium | 4–48, don't over-shard |
| Downstream push | Matches fanout | Same as fanout topic |
| Burst/resync | Spiky | High; headroom first |

In our platform, a pre-processing topic deliberately ran at a fraction of our main topic's partition count — over-partitioning it would have created unnecessary rebalancing overhead for a step that didn't need parallelism.


Decision 3 — Replication Factor

For production: always 3.

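| Config | Value | Why |
| --- | --- | --- |
| replication.factor | 3 | Keeps three copies of every partition; with min.insync.replicas = 2, acks=all writes survive 1 broker failure |
| min.insync.replicas | 2 | Rejects acks=all writes when fewer than 2 replicas are in sync, which prevents silent data loss when 1 broker is behind |
| acks (producer) | all | Waits for all in-sync replicas (ISRs) to acknowledge |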
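The arithmetic behind these three settings is simple enough to check in code. This is a small sketch, assuming producers use acks=all; the function name tolerated_failures is hypothetical, and the dictionary only mirrors the settings listed above rather than calling any Kafka API.

```python
def tolerated_failures(replication_factor: int,
                       min_insync_replicas: int) -> int:
    # Replicas that can drop out of the ISR while acks=all writes still succeed.
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication.factor")
    return replication_factor - min_insync_replicas


settings = {"replication.factor": 3, "min.insync.replicas": 2, "acks": "all"}
writable = tolerated_failures(settings["replication.factor"],
                              settings["min.insync.replicas"])
print(writable)  # 1
```

Setting min.insync.replicas equal to the replication factor drops the result to 0, meaning any single broker outage blocks writes; that is the reason for pairing 3 with 2.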

Watch Broker Leader Balance

With 3 brokers and many partitions, leaders distribute across brokers. Over time — especially after broker restarts — they drift.

Monitor replication bytes-in per broker. If one broker is handling 5x the replication traffic of another, you have a leader imbalance. Run kafka-leader-election.sh with --election-type PREFERRED and a partition selector such as --all-topic-partitions to move leadership back to the preferred replicas.

In our cluster, post-incident broker restarts regularly caused temporary leader imbalance. Making preferred leader election part of our post-incident runbook solved it.
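A runbook step like this is easier to trigger when imbalance is measured directly. The sketch below counts leaders per broker from a partition-to-leader mapping, which in practice you might build from topic describe output; the mapping, broker IDs, and function names here are hypothetical.

```python
from collections import Counter


def leaders_per_broker(partition_leaders: dict, broker_ids: list) -> dict:
    # Count how many partitions each broker currently leads (0 if none).
    counts = Counter(partition_leaders.values())
    return {broker: counts.get(broker, 0) for broker in broker_ids}


def imbalance_ratio(leader_counts: dict) -> float:
    # Busiest broker's leader count relative to a perfectly even share.
    ideal = sum(leader_counts.values()) / len(leader_counts)
    return max(leader_counts.values()) / ideal


# Hypothetical state after a restart: broker 1 leads 8 of 12 partitions.
partition_leaders = {p: 1 for p in range(8)}
partition_leaders.update({8: 2, 9: 2, 10: 3, 11: 3})

counts = leaders_per_broker(partition_leaders, broker_ids=[1, 2, 3])
ratio = imbalance_ratio(counts)
print(counts, ratio)
```

A ratio near 1.0 means leadership is evenly spread; the alerting threshold above that is a judgment call for your cluster.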


Decision 4 — Consumer Group Mapping

Partition count means nothing without a clean consumer group design.

Rules that hold at scale:

  1. One consumer group per topic — shared consumer groups across teams/services create invisible coupling. When one consumer falls behind, the group's lag aggregation hides where the problem is.

  2. Consumer count ≤ partition count — consumers beyond the partition count sit idle. The partition is the ceiling, not the consumer count.

  3. Consistent naming convention: {ENV}.{SERVICE}.consume-{topic-short-name}. This keeps Grafana queries, Splunk searches, and KEDA triggers consistent across environments.

  4. Monitor lag per partition, not just aggregate — a single hot partition drowning in lag is invisible in an aggregate view.
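Rule 4 is worth illustrating, because the aggregate number really does hide the problem. The sketch below computes lag per partition as log end offset minus committed offset and flags outliers; the offsets are made up, and the function names and the 5x-median threshold are assumptions rather than a standard.

```python
from statistics import median


def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    # Lag per partition = log end offset - last committed offset.
    return {p: end_offsets[p] - committed_offsets[p] for p in end_offsets}


def hot_partitions(lag: dict, factor: float = 5.0) -> list:
    # Partitions whose lag exceeds `factor` times the median lag.
    threshold = factor * median(lag.values())
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical offsets for a 4-partition topic.
end_offsets = {0: 1_000, 1: 1_000, 2: 1_000, 3: 50_000}
committed_offsets = {0: 990, 1: 995, 2: 985, 3: 1_000}

lag = partition_lag(end_offsets, committed_offsets)
hot = hot_partitions(lag)
print(sum(lag.values()), hot)
```

The total lag alone could be read as four mildly slow partitions; the per-partition view shows that nearly all of it sits on partition 3.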


Common Mistakes and Their Symptoms

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Too few partitions | Consumers max out CPU, lag grows despite healthy consumers | Increase partitions (causes rebalance) |
| Too many partitions on low-volume topic | Slow controller failover, excessive file handles, slow rebalances | Match partitions to actual need |
| Wrong partition key (e.g., timestamp) | Random distribution, ordering lost, unpredictable consumer load | Redesign the key; topic recreation is unavoidable |
| Hotspot key | One partition holds ~90% of the lag, others are empty | Compound key with bucket suffix |
| Leader imbalance | One broker at 5x replication bytes vs others | Run preferred leader election |
| Shared topics in multi-tenant system | Can't isolate tenant incidents | Migrate to per-tenant topic model |

The Pre-Launch Checklist

□ Partition key chosen — ordering requirements documented
□ Key distribution analyzed — top 1% of keys < 20% of volume
□ Partition count ≥ max concurrent consumers planned
□ 2–3x growth headroom added
□ Replication factor = 3, min.insync.replicas = 2
□ Consumer group naming convention defined and documented
□ Per-partition lag monitoring configured before go-live
□ Broker leader distribution verified post-topic creation
□ Producer acks=all configured
□ Topic retention policy set (time + size-based)

Summary

| Decision | Recommendation | Key risk to avoid |
| --- | --- | --- |
| Partition key | Entity ID for per-entity ordering | Hotspot from low-cardinality key |
| Partition count | max(throughput-based, consumer pool size) with 2–3x headroom | Under-partitioning (harder to fix than over) |
| Topic design | Match partitions to topic role, not one-size-fits-all | Over-partitioning low-volume topics |
| Multi-tenancy | Per-tenant topics over shared topics | Shared topics hide blast radius |
| Replication | RF=3, min.insync.replicas=2, acks=all | Under-replication tolerating silent data loss |
| Consumer groups | 1 per topic, monitor per-partition lag | Aggregate lag metrics hiding hot partitions |

Partition design is upstream of every other Kafka optimization. KEDA autoscaling, consumer lag tuning, disaster recovery — all of them build on top of the foundation you set here. Get this right before you tune anything else.