Category: data engineering

Open-Source Data Warehousing – Druid, Apache Airflow & Superset


These days everyone talks about open source, yet it is still uncommon in the Data Warehouse (DWH) field. Why is this? In my recent blog, I researched OLAP technologies and what's coming next. In this blog, I pick one of those open-source technologies and combine it with others into a full data architecture for DWH use cases based on modern open-source technology. As the title says, I went for Apache Druid as the data store, Apache Superset for querying and Apache Airflow as the orchestrator.
Druid – the data store
What is Druid?
Druid is an open-source, column-oriented, distributed data store written in Java. It is designed to quickly ingest massive quantities of event data and to provide low-latency queries on top of that data.
Figure: What druid.io is
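To make "ingest massive quantities of event data" concrete, here is a minimal sketch of submitting a native batch ingestion task to Druid over HTTP. It assumes a local Druid quickstart in which the Router on port 8888 forwards task submissions to the Overlord's `/druid/indexer/v1/task` endpoint. The datasource name `web_events`, the source URL, the column names and the helpers `build_ingestion_spec` and `submit_task` are hypothetical placeholders, and the spec covers only a small subset of the options Druid supports.

```python
import json
import urllib.request

# Assumption: a local Druid quickstart where the Router (port 8888) proxies task requests.
DRUID_ROUTER = "http://localhost:8888"
TASK_ENDPOINT = "/druid/indexer/v1/task"


def build_ingestion_spec(datasource, source_uri, timestamp_column, dimensions):
    """Build a native batch (index_parallel) ingestion spec for newline-delimited JSON events."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Druid partitions every datasource by time, so a timestamp column is required.
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "day",  # one set of segments per day
                    "queryGranularity": "none",   # keep full timestamp precision
                    "rollup": False,              # store raw events, no pre-aggregation
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "http", "uris": [source_uri]},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def submit_task(spec, base_url=DRUID_ROUTER):
    """POST the spec to Druid's task endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url + TASK_ENDPOINT,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    spec = build_ingestion_spec(
        datasource="web_events",                      # hypothetical datasource name
        source_uri="https://example.com/events.json",  # hypothetical source file
        timestamp_column="timestamp",
        dimensions=["country", "page", "user_id"],
    )
    print(json.dumps(spec, indent=2))
    # submit_task(spec)  # uncomment only against a running Druid cluster
```

In the architecture described here, an orchestrator such as Airflow would be the natural place to trigger a call like `submit_task` on a schedule.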
Why Druid?
Druid's key features include sub-second OLAP queries, real-time streaming ingestion, scalability, high availability, the ability to power analytic applications and cost effectiveness.
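Sub-second OLAP queries are typically issued to Druid through Druid SQL. The sketch below wraps a query in a POST request to Druid's SQL HTTP endpoint (`/druid/v2/sql`), again assuming a local quickstart Router on port 8888 and the hypothetical `web_events` datasource; `build_sql_request` and `run_sql` are illustrative helpers, not part of any Druid client library. A BI tool such as Superset sends roughly this kind of SQL to Druid behind its charts.

```python
import json
import urllib.request

# Assumption: a local Druid quickstart Router; adjust for a real cluster.
DRUID_ROUTER = "http://localhost:8888"
SQL_ENDPOINT = "/druid/v2/sql"

# Hypothetical datasource "web_events"; __time is Druid's built-in timestamp column.
TOP_COUNTRIES_SQL = """
SELECT country, COUNT(*) AS events
FROM web_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY country
ORDER BY events DESC
LIMIT 10
"""


def build_sql_request(sql, base_url=DRUID_ROUTER):
    """Wrap a Druid SQL statement in an HTTP POST request (built here, not sent)."""
    # resultFormat "object" asks Druid to return one JSON object per row.
    payload = {"query": sql.strip(), "resultFormat": "object"}
    return urllib.request.Request(
        base_url + SQL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, base_url=DRUID_ROUTER):
    """Send the query to a running cluster and return the rows as a list of dicts."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as response:
        return json.load(response)
```

Against a running cluster, `run_sql(TOP_COUNTRIES_SQL)` would return a list of `{"country": ..., "events": ...}` rows.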
With the comparison of modern OLAP technologies in mind, I chose Druid over ClickHouse, Pinot and…


OLAP, what’s coming next?


Are you looking for a replacement for Microsoft Analysis Services cubes? Do you need a big data OLAP system that scales ad libitum, or do you want your analytics updated in real time? In this blog, I want to show you possible solutions that are ready for the future and fit into an existing data architecture.



What is OLAP?



OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis and sophisticated data modelling. An OLAP cube is a multidimensional database optimised for data warehouse and OLAP applications; it stores data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorised by dimensions. To manage and query OLAP cubes, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many…
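The idea of measures categorised by dimensions can be illustrated without any OLAP engine. This toy Python sketch uses made-up fact rows and implements two basic cube operations: a roll-up that sums a measure for every combination of dimension members (much like a SQL `GROUP BY`) and a slice that fixes a dimension to one member. It is neither MDX nor how any particular product stores cubes; real engines pre-aggregate and index these cells instead of scanning rows on every query.

```python
from collections import defaultdict

# Made-up fact rows: three dimensions (year, region, product) and one measure (revenue).
FACTS = [
    {"year": 2019, "region": "EU", "product": "A", "revenue": 100.0},
    {"year": 2019, "region": "EU", "product": "B", "revenue": 50.0},
    {"year": 2019, "region": "US", "product": "A", "revenue": 70.0},
    {"year": 2020, "region": "EU", "product": "A", "revenue": 120.0},
    {"year": 2020, "region": "US", "product": "B", "revenue": 30.0},
]


def roll_up(facts, dimensions, measure):
    """Sum `measure` for every combination of the given dimensions (one cube cell per key)."""
    cells = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        cells[key] += row[measure]
    return dict(cells)


def slice_cube(facts, **fixed):
    """Keep only the rows whose dimension values match every fixed member."""
    return [row for row in facts if all(row[d] == v for d, v in fixed.items())]


print(roll_up(FACTS, ["year", "region"], "revenue"))
# {(2019, 'EU'): 150.0, (2019, 'US'): 70.0, (2020, 'EU'): 120.0, (2020, 'US'): 30.0}

print(roll_up(slice_cube(FACTS, region="EU"), ["product"], "revenue"))
# {('A',): 220.0, ('B',): 50.0}
```

Rolling up over an empty dimension list yields the grand total, the top cell of the cube.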


Data Engineering, the future of Data Warehousing?



According to stitchdata.com, 6,500 people on LinkedIn currently call themselves data engineers, while San Francisco alone has 6,600 job listings for the same title. The number of data engineers has doubled in the past year, yet engineering leaders still face a significant shortage of data engineering talent. So is data engineering really the future of data warehousing? And what is data engineering in the first place? I want to answer these questions and more in this blog post.

Data engineers are employed mostly in America, at tech giants such as Facebook, Google and Apple, where data is the fuel of the company. In Europe, the job title barely exists outside startup hubs such as Berlin and Munich; elsewhere, data engineers go by other titles or are folded into roles such as software engineer, big data engineer, business analyst, data analyst, data scientist or business intelligence engineer. I myself started as a…
