
How Uber builds reliable redeliveries and dead letter queues with Apache Kafka

Conference

Big Data & AI

Room 2

Wednesday from 2:00 PM to 2:50 PM

In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.

Given the scope and pace at which Uber operates, our systems must be fault-tolerant and uncompromising about failing intelligently. In stream processing and event-driven architectures in particular, reliable redeliveries backed by dead letter queues are a common requirement of many real-time applications and services at Uber. To accomplish this, we leverage Apache Kafka, a popular open-source distributed pub/sub messaging platform that has been industry-tested for delivering high performance at scale. We build competing-consumption semantics with dead letter queues on top of the existing Kafka APIs and provide interfaces to ack or nack messages out of order, with retry and in-process fanout features.
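
The talk covers Uber's in-house implementation; purely as an illustration of the general retry-topic and dead-letter-queue pattern it builds on, here is a minimal sketch using the plain Kafka Java client. A failed message is republished to a retry topic, and after a bounded number of attempts it is routed to a DLQ topic instead of being dropped. The topic names, the attempts header, the retry budget, and the serializer settings are assumptions for the example, not details of Uber's system.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;

public class RetryDlqConsumer {

    // Assumed names and limits, for illustration only.
    private static final String MAIN_TOPIC = "orders";
    private static final String RETRY_TOPIC = "orders.retry";
    private static final String DLQ_TOPIC = "orders.dlq";
    private static final String ATTEMPTS_HEADER = "attempts";
    private static final int MAX_ATTEMPTS = 3;

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "order-processor");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("enable.auto.commit", "false");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            // One consumer group reads both the main topic and its retry topic.
            consumer.subscribe(List.of(MAIN_TOPIC, RETRY_TOPIC));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        process(record);                          // "ack" path: processing succeeded
                    } catch (Exception e) {
                        redeliverOrDeadLetter(producer, record);  // "nack" path: retry topic or DLQ
                    }
                }
                // Offsets are committed only after every record in the batch has either
                // been processed or republished, so failed messages are never lost.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Application-specific handling goes here; throwing triggers redelivery.
    }

    private static void redeliverOrDeadLetter(KafkaProducer<String, String> producer,
                                              ConsumerRecord<String, String> record) {
        int attempts = readAttempts(record.headers()) + 1;
        String target = attempts >= MAX_ATTEMPTS ? DLQ_TOPIC : RETRY_TOPIC;
        ProducerRecord<String, String> out =
                new ProducerRecord<>(target, record.key(), record.value());
        out.headers().add(ATTEMPTS_HEADER,
                Integer.toString(attempts).getBytes(StandardCharsets.UTF_8));
        producer.send(out);
    }

    private static int readAttempts(Headers headers) {
        Header header = headers.lastHeader(ATTEMPTS_HEADER);
        return header == null ? 0 : Integer.parseInt(new String(header.value(), StandardCharsets.UTF_8));
    }
}
```

This sketch republishes failures immediately; the abstract's mention of out-of-order ack/nack and in-process fanout implies richer bookkeeping on top of the Kafka APIs than shown here.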

Stream Processing · Big Data - How Streaming · Data Streaming API
Mingmin Chen

Mingmin Chen is the tech lead and senior software engineer with streaming data team at Uber, primarily focusing on building Apache Kafka pipeline and scaling Uber's real-time infrastructure. Prior to that he was a software engineer with Twitter and Oracle, working on big data infrastructure, storage server and database technologies. He got his PhD in computer science from UC Davis.