Open-Source Data Warehousing – Druid, Apache Airflow & Superset

These days everyone talks about open source, yet it is still not common in the Data Warehouse (DWH) field. Why is this? In my recent blog post, I researched OLAP technologies and what’s coming next. In this post, I pick one of those open-source technologies and build a full data architecture around it for DWH use cases, based on modern open-source technology. As the title says, I went for Apache Druid as the data store, Apache Superset for querying and Apache Airflow as the orchestrator.
Druid – the data store
What is Druid?
Druid is an open-source, column-oriented, distributed data store written in Java. It’s designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data.
What druid.io is
Why Druid?
Druid’s key features include sub-second OLAP queries, real-time streaming ingestion, scalability, high availability, the ability to power analytic applications and cost-effectiveness.
With the comparison of modern OLAP technologies in mind, I chose Druid over ClickHouse, Pinot and…

Read more

OLAP, what’s coming next?


Are you on the lookout for a replacement for Microsoft Analysis Cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want your analytics updated in real time? In this post, I want to show you possible solutions that are ready for the future and fit into an existing data architecture.

What is OLAP?

OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis and sophisticated data modelling. An OLAP cube is a multidimensional database that is optimised for data warehouse and OLAP applications; it stores data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorised by dimensions. To manage and run operations on an OLAP cube, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many…
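
To make "measures categorised by dimensions" a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the fact rows, the dimension names (`year`, `region`, `product`) and the helpers `roll_up` and `slice_cube` are hypothetical and not taken from any OLAP product or from MDX.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension values (year, region, product)
# plus one measure (sales). All names and numbers are made up.
FACTS = [
    {"year": 2018, "region": "EU", "product": "A", "sales": 100.0},
    {"year": 2018, "region": "EU", "product": "B", "sales": 50.0},
    {"year": 2018, "region": "US", "product": "A", "sales": 70.0},
    {"year": 2019, "region": "EU", "product": "A", "sales": 120.0},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum a measure grouped by the given dimensions (a roll-up)."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_cube(facts, **fixed):
    """Keep only rows where each named dimension has the given value (a slice)."""
    return [r for r in facts if all(r[d] == v for d, v in fixed.items())]


# Sales per (year, region), with the product dimension aggregated away.
by_year_region = roll_up(FACTS, ["year", "region"])

# Sales per year for the EU slice only.
eu_by_year = roll_up(slice_cube(FACTS, region="EU"), ["year"])

print(by_year_region)
print(eu_by_year)
```

A real OLAP engine pre-aggregates or indexes such combinations so that queries stay fast on large data; the sketch only shows the logical model.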

Read more

Today’s Office – The Location Independent Lifestyle


Palawan - Today's Office

Today’s Office is something I started while working remotely on the road. Whenever it suited, I took a picture of my «Office». Here I’m sharing my best pictures from the last few months, also because the location-independent lifestyle resembles the way many of us work these days. If you want to go directly to the images, click here.
What is a location-independent lifestyle?
Firstly, what is location-independent work or a nomadic lifestyle? I would describe it as not being tied to a particular location for any reason. And why would you want this? There are several answers besides the obvious ones that you travel the world, learn from other cultures and meet amazing people. For me, it is also about productivity: when on the road, I get into deep focus. While having a delicious coffee, I can get things done for 2-3 hours. Additionally, I get natural breaks during travelling or…

Read more

Tools I Use – Part III


Tools I Use - sspaeti.com

Tools I Use part III focuses on increasing your productivity in communication, grammar, deep work or study, little notes and Google Chrome. You might already know some of the tools, but hopefully not all of their features, as I try to elaborate on the most helpful ones. Without further ado, please enjoy the tools below.
Instant language Translation
Around you
Google Translate – play.google.com and itunes.apple.com

While this is an obvious choice for translating text, many might not know that the app offers on-the-fly translation using your camera. See how it works in the video.

https://youtu.be/Ro-HfETpzhc?t=22s

The same goes for pictures. For example, I live in Denmark without speaking Danish at all, yet with the native Google Translate I have never had any problems reading my letters or anything in the shop. It sounds like a small thing, but if you travel or live in a foreign country, it can save your life!
Call or Video
Skype – Skype.com

Same goes…

Read more

Data Engineering, the future of Data Warehousing?


Data Engineering, the future of Data Warehousing

Today, there are 6,500 people on LinkedIn who call themselves data engineers according to stitchdata.com. In San Francisco alone, there are 6,600 job listings for this same title. The number of data engineers has doubled in the past year, but engineering leaders still find themselves faced with a significant shortage of data engineering talent. So is it really the future of data warehousing? What is data engineering? I want to answer these questions and more in this blog post.

Data engineers are mostly found in unicorn companies like Facebook, Google and Apple, largely in America, where data is the fuel of the company. In Europe, the job title barely exists outside startup meccas such as Berlin and Munich. Instead, the role goes by, or is folded into, job titles like software engineer, big data engineer, business analyst, data analyst, data scientist and also business intelligence engineer. Myself, I started as a…

Read more

Data Warehouse vs Data Lake | ETL vs ELT


Data Lake vs Data Warehouse

There is a bit of confusion around Data Warehouse vs Data Lake and ETL vs ELT. I hear that Data Warehouses are not used anymore and that they are being replaced by Data Lakes altogether, but is that true? And why do we need Data Warehouses anyway? I will go into that as well as the definitions of both, plus explain the differences between them.
Data Warehouse vs Data Lake
Data Warehouse definition
A Data Warehouse, DWH for short and also known as an Enterprise Data Warehouse (EDW), is the traditional way of collecting data, as we have done for 31 years. The DWH serves as the integration point for data from many different sources, as the single point of truth and as the place for data management, meaning cleaning, historising and joining data. It provides greater executive insight into corporate performance through management dashboards, reports or ad-hoc analyses.

Various types of business data are analysed with Data Warehouses. The need for it often…

Read more

Tools I Use – Microsoft OneNote – Part II


As promised in part I, here is part II of Tools I Use. In this part, I focus entirely on one of my most used and favourite tools, Microsoft OneNote. I use it for almost everything. You may ask, “Why?! What is the big deal? I can use Microsoft Word, Google Keep, a paper notepad or anything else, so why MS OneNote?” Yes, that is true, but then you lack the fundamental structure and essential features that those tools don’t offer.

Maybe the biggest feature is the organisation and structure inside OneNote itself: it keeps your work, university and private material perfectly organised, so you can get things done. If your project grows bigger than expected, you can quickly restructure your notes by adding a section and turning the existing section into a section group, which allows you to split notes into different segments like releases, parts,…

Read more

What Data Warehouse Automation tools are on the market


A Google Trends comparison of the five most searched Data Warehouse Automation tools on the market shows WhereScape in first place over the last year, ahead of TimeXtender and BiReady (now Attunity Compose). See the picture in full size or go directly to the Google Trends comparison and adjust it to your own needs.

Although the analysis is not representative, it still gives some insight and a good overview of the tools’ size and presumable usage compared to each other, worldwide. Please consider that WhereScape and TimeXtender get more search results because the company name is the same as the product name, meaning some of the searches are aimed at the company rather than the Data Warehouse Automation (DWA) tool itself. BimlFlex has just published its first release and biGENiUS has only recently started to market its product actively, so they will probably increase slightly in the near future.
Data Warehouse Automation Tools on the market
As you can imagine, there are plenty of…

Read more

Why automate? What does Data Warehouse Automation do for us?


This article is for you if you are considering Data Warehouse Automation (DWA) and asking yourself why you should use Data Warehouse Automation tools and what they do for you. After explaining in my previous post why Data Warehouse Automation is not more popular, I cover the why and what of Data Warehouse Automation in this second post of the series.
Why automate your Data Warehouse?
Every industry has used automation to increase productivity, reduce manual effort, improve quality and consistency, and speed delivery. Henry Ford introduced the assembly line to produce automobiles, and today Uber and countless other startups use the Internet and digital processing to reduce friction in commercial transactions. Thus, the time has come to introduce automation to data warehousing.
Pointed out by Eckerson Group.

I would put it like this: in a society where time flies remarkably fast and data has become the new gold, it’s crucial to have proper analyses…

Read more

Why Data Warehouse Automation is not more popular


Data Warehouse Automation

I worked with a Data Warehouse Automation (DWA) tool for a little more than a year, and I have to say I loved it. As a BI developer, you could focus on the challenges of dimensional modelling, on what granularity your fact tables should have and on going crazy with the business requirements, with everything fast, consistent and tested!
But why is Data Warehouse Automation not used more often, and why is it not more popular? I’m asking myself that more and more. That’s why I’m writing a series of blog posts all about DWA. In this first post, I try to find possible reasons behind it and also argue for DWA and why we should use it more often.
Everyone needs to make data-driven decisions faster, so why not use a generator which gives you answers in days instead of months?
Losing control
What do I mean by that? Many people and therefore many companies fear…

Read more