These days everyone talks about open-source, however still not common in the Data Warehouse (DWH) field. Why is this? In my recent blog, I researched OLAP technologies and what’s coming next, in this blog I choose one of the open-source technologies and build it together to have a full data architecture for DWH use-cases based on modern open-source technology. As the title says it all, I went for Apache Druid, querying with Apache Superset and using the Apache Airflow as an Orchestrator.
Druid- the data store
What is Druid
Druid is an open-source, column-oriented, distributed data store written in Java. It’s designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.
What druid.io is
Druid has many Key Features as Sub-second OLAP Queries, Real-time Streaming Ingestion, Scalable, Highly Available, Power Analytic Applications and Cost Effective.
With the Comparison of modern OLAP Technologies in mind, I choose Druid over ClickHouse, Pinot and…