Tag: OLAP

Open-Source Data Warehousing – Druid, Apache Airflow & Superset


These days, everyone talks about open source. However, open-source tooling is still uncommon in the Data Warehouse (DWH) field. Why is this?

In my recent blog post, I researched OLAP technologies. For this post, I chose several open-source technologies and combined them to build a full data architecture for a Data Warehouse system.

I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Druid – the data store

Druid is an open-source, column-oriented, distributed data store written in Java. It’s designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.



What druid.io is



Why Druid?

Druid has many key features, including sub-second OLAP queries, real-time streaming ingestion, scalability, and cost-effectiveness.

With the comparison of modern OLAP technologies in mind, I chose Druid over ClickHouse, Pinot and Apache Kylin. Recently, Microsoft announced that it will add Druid to Azure HDInsight 4.0.

Why not Druid?

Carter Shanklin wrote a detailed post about Druid’s… Read more

OLAP, what’s coming next?


Are you on the lookout for a replacement for Microsoft Analysis Cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want your analytics updated in real time? In this post, I want to show you possible solutions that are ready for the future and fit into an existing data architecture.



What is OLAP?



OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling. An OLAP cube is a multidimensional database optimised for data warehouse and OLAP applications; it stores data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorised by dimensions. To query and manipulate OLAP cubes, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many… Read more
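
To make "measures categorised by dimensions" concrete, here is a minimal sketch in plain Python. The `SALES` rows, the `region` and `year` dimensions, the `revenue` measure and the `roll_up` helper are all hypothetical names invented for illustration. They are not taken from any OLAP product, and a real cube engine would precompute and index these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (region, year) and one measure (revenue).
SALES = [
    {"region": "EU", "year": 2017, "revenue": 120.0},
    {"region": "EU", "year": 2018, "revenue": 150.0},
    {"region": "US", "year": 2017, "revenue": 200.0},
    {"region": "US", "year": 2018, "revenue": 180.0},
]


def roll_up(rows, dimensions, measure):
    """Sum `measure` for every combination of values of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        # The tuple of dimension values identifies one cell of the cube.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Aggregate the same measure at three levels of detail.
by_region = roll_up(SALES, ["region"], "revenue")
by_region_year = roll_up(SALES, ["region", "year"], "revenue")
grand_total = roll_up(SALES, [], "revenue")

print(by_region)
print(grand_total)
```

Each tuple key is one cell of the cube. Dropping a dimension from the list rolls the measure up to a coarser level, which is roughly what a cube query does when it slices or aggregates along a dimension.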
