Tag: Airflow

Open-Source Data Warehousing – Druid, Apache Airflow & Superset


These days, everyone talks about open-source. However, this is still not common in the Data Warehouse (DWH) field. Why is this?

In my recent blog, I researched OLAP technologies, for this post I chose some open-source technologies and used them together to build a full data architecture for a Data Warehouse system.

I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Druid – the data store

Druid is an open-source, column-oriented, distributed data store written in Java. It’s designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.



What druid.io is



Why Druid?

Druid has many key features, including sub-second OLAP queries, real-time streaming ingestion, scalability, and cost-effectiveness.

With the comparison of modern OLAP Technologies in mind, I chose Druid over ClickHouse, Pinot and Apache Kylin. Recently, Microsoft announced they will add Druid to their Azure HDInsight 4.0.

Why not Druid?

Carter Shanklin wrote a detailed post about Druid’s… Read more

These days, everyone talks about open-source. However, this is still not common in the Data Warehouse (DWH) field. Why is this?

In my recent blog, I researched OLAP technologies, for this post I chose some open-source technologies and used them together to build a full data architecture for a Data Warehouse system.

I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Druid – the data store

Druid is an open-source, column-oriented, distributed data store written in Java. It’s designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.

What druid.io is

Why Druid?

Druid has many key features, including sub-second OLAP queries, real-time streaming ingestion, scalability, and cost-effectiveness.

With the comparison of modern OLAP Technologies in mind, I chose Druid over ClickHouse, Pinot and Apache Kylin. Recently, Microsoft announced they will add Druid to their Azure HDInsight 4.0.

Why not Druid?

Carter Shanklin wrote a detailed post about Druid’s…

Read more