Installing and configuring Apache Kafka
Also available as:
PDF

Topic Settings

For each topic, Kafka maintains a structured commit log with one or more partitions. These topic partitions form the basic unit of parallelism in Kafka. In general, the more partitions there are in a Kafka cluster, the more parallel consumers can be added, resulting in higher throughput.

You can calculate the number of partitions based on your throughput requirements. If throughput from a producer to a single partition is P and throughput from a single partition to a consumer is C, and if your target throughput is T, the minimum number of required partitions is

max (T/P, T/C).

Note also that more partitions can increase latency:

  • End-to-end latency in Kafka is defined as the difference in time from when a message is published by the producer to when the message is read by the consumer.

  • Kafka only exposes a message to a consumer after it has been committed, after the message is replicated to all in-sync replicas.

  • Replication of one thousand partitions from one broker to another can take up 20ms. This is too long for some real-time applications.

  • In the new Kafka producer, messages are accumulated on the producer side; producers buffer the message per partition. This approach allows users to set an upper bound on the amount of memory used for buffering incoming messages. After enough data is accumulated or enough time has passed, accumulated messages are removed and sent to the broker. If you define more partitions, messages are accumulated for more partitions on the producer side.

  • Similarly, the consumer fetches batches of messages per partition. Consumer memory requirements are proportional to the number of partitions that the consumer subscribes to.

Important Topic Properties

Review the following settings in the Advanced kafka-broker category, and modify as needed:

auto.create.topics.enable

Enable automatic creation of topics on the server. If this property is set to true, then attempts to produce, consume, or fetch metadata for a nonexistent topic automatically create the topic with the default replication factor and number of partitions. The default is enabled.

default.replication.factor

Specifies default replication factors for automatically created topics. For high availability production systems, you should set this value to at least 3.

num.partitions

Specifies the default number of log partitions per topic, for automatically created topics. The default value is 1. Change this setting based on the requirements related to your topic and partition design.

delete.topic.enable

Allows users to delete a topic from Kafka using the admin tool, for Kafka versions 0.9 and later. Deleting a topic through the admin tool will have no effect if this setting is turned off.

By default this feature is turned off (set to false).