Disk Drive Considerations
For throughput, we recommend dedicating multiple drives to Kafka data. More drives typically perform better with Kafka than fewer. Do not share these Kafka drives with any other application or use them for Kafka application logs.
You can configure multiple drives by specifying a comma-separated list of directories for the log.dirs property in the server.properties file. Kafka uses a round-robin approach to assign partitions to directories specified in log.dirs; the default value is /tmp/kafka-logs.
The num.io.threads property should be set to a value equal to or greater than the number of disks dedicated for Kafka. Recommendation: start by setting this property equal to the number of disks.
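As a rough illustration of how the two settings relate, the following Python sketch parses a sample server.properties fragment and checks that num.io.threads is at least the number of directories listed in log.dirs. The directory paths, the sample values, and the helper names (parse_properties, check_io_threads) are assumptions made for this example, not recommended settings or part of Kafka.

```python
# Hypothetical server.properties fragment: three dedicated data drives,
# each mounted at its own (assumed) path.
SAMPLE_PROPERTIES = """
log.dirs=/data/disk1/kafka,/data/disk2/kafka,/data/disk3/kafka
num.io.threads=3
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_io_threads(props):
    """Return (disk_count, io_threads, ok) for the parsed properties."""
    dirs = [d.strip() for d in props.get("log.dirs", "").split(",") if d.strip()]
    io_threads = int(props["num.io.threads"])
    # ok is True when there is at least one I/O thread per data directory.
    return len(dirs), io_threads, io_threads >= len(dirs)


if __name__ == "__main__":
    disks, threads, ok = check_io_threads(parse_properties(SAMPLE_PROPERTIES))
    print(f"disks={disks} num.io.threads={threads} ok={ok}")
```

This sketch assumes one log.dirs entry per physical drive; if several directories share a drive, count the drives rather than the entries.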
Depending on how you configure flush behavior (see "Log Flush Management"), a faster disk drive is beneficial if the log.flush.interval.messages property is set to flush the log file after every 100,000 messages.
Kafka performs best when data access loads are balanced among partitions, leading to balanced loads across disk drives. In addition, data distribution across disks is important. If one disk becomes full and other disks have available space, this can cause performance issues. To avoid slowdowns or interruptions to Kafka services, you should create usage alerts that notify you when available disk space is low.
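One possible shape for such a usage alert is sketched below in Python, using the standard library's shutil.disk_usage. The function name low_space_dirs, the 15% free-space threshold, and the use of "." as a stand-in directory are assumptions for illustration; a production setup would more likely rely on an existing monitoring system.

```python
import shutil


def low_space_dirs(log_dirs, min_free_fraction=0.15, usage_fn=shutil.disk_usage):
    """Return (directory, free_fraction) pairs that fall below the threshold."""
    alerts = []
    for directory in log_dirs:
        usage = usage_fn(directory)  # exposes .total, .used, .free in bytes
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            alerts.append((directory, free_fraction))
    return alerts


if __name__ == "__main__":
    # "." stands in for the real Kafka data directories (assumed here).
    for directory, free in low_space_dirs(["."]):
        print(f"WARNING: {directory} has only {free:.0%} free space")
```

Checking each directory separately, rather than total free space, matters here because a single full disk can cause problems even when the other disks have room.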
RAID can potentially improve load balancing among the disks, but it can become a performance bottleneck because of slower writes. In addition, it reduces available disk space. Although RAID can tolerate disk failures, rebuilding a RAID array is I/O-intensive and effectively disables the server while the rebuild runs. Therefore, RAID does not provide substantial improvements in availability.