Apache Hudi is a data Lake technology that has been in use since 2016. Originally built by Uber, Hudi is the first of the three data platforms that we’re going to examine in detail. The series has in-depth reviews of Delta Lake and Apache Iceberg, and will end with a comparison between those three popular technologies.
This is the third post in our series, Architectures Of A Modern Data Platform
In this post, we are going to provide an overview of Apache Hudi, before diving in and examining its key strengths and how they match up against the requirements we set out in our previous article.
Hudi was originally developed at Uber in the early 2010s and went into production in 2016. Uber had a requirement for a cloud-based data lake to power and manage the 100PB of data that underpin key business functions for trips, riders, and customers. Uber’s data engineering team had a specific need to feed changes from relational SQL databases via either change data capture (CDC) or binlogs into a data lake.
Uber submitted Hudi to the Apache software foundation in 2...