Data Engineering Blog

Genuine News about the Data Ecosystem.
Topics: #dataengineering #bigdata #python #opensource #etl

Open-Source Data Warehousing – Druid, Apache Airflow & Superset

These days, everyone talks about open source. However, it is still not common in the Data Warehouse (DWH) field. Why is this? In my previous post, I researched OLAP technologies; for this post, I chose some open-source technologies and combined them to build a full data architecture for a Data Warehouse system. I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

OLAP, what’s coming next?

Are you on the lookout for a replacement for Microsoft Analysis Services cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want your analytics updated even in real time? In this post, I want to show you possible solutions that are ready for the future and fit into an existing data architecture. What is OLAP? OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling.
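To make "multidimensional analysis" a bit more concrete, here is a minimal sketch in plain Python. It is not how any particular OLAP engine works internally; the sample rows, the dimension names and the helper functions `roll_up` and `slice_rows` are all made up for illustration. It only shows the two basic ideas: aggregating a measure along chosen dimensions (roll-up) and fixing one dimension to a value (slice).

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, revenue).
SALES = [
    ("EU", "laptop", 2017, 1200.0),
    ("EU", "phone", 2017, 800.0),
    ("EU", "laptop", 2018, 1500.0),
    ("US", "laptop", 2017, 2000.0),
    ("US", "phone", 2018, 950.0),
]
# Assumed dimension names, in the same order as the row fields.
DIMENSIONS = ("region", "product", "year")


def roll_up(rows, by):
    """Sum the revenue measure, grouped by the given dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[pos] for pos in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    pos = DIMENSIONS.index(dimension)
    return [row for row in rows if row[pos] == value]


# Revenue per region, then per region and year, then per product within EU.
by_region = roll_up(SALES, ["region"])
by_region_year = roll_up(SALES, ["region", "year"])
eu_by_product = roll_up(slice_rows(SALES, "region", "EU"), ["product"])
```

A real OLAP system answers the same kind of question, but over far larger data and typically with pre-aggregation or columnar storage, so that such group-bys stay interactive.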

Data Engineering, the future of Data Warehousing?

Today, there are 6,500 people on LinkedIn who call themselves data engineers, according to stitchdata.com. In San Francisco alone, there are 6,600 job listings for this same title. The number of data engineers has doubled in the past year, but engineering leaders still find themselves faced with a significant shortage of data engineering talent. So is it really the future of data warehousing? What is data engineering? I want to answer these questions and many more in this blog post.

Data Warehouse vs Data Lake | ETL vs ELT

There is a bit of confusion about Data Warehouse vs Data Lake and ETL vs ELT. I hear that Data Warehouses are not used anymore, that they have been replaced by Data Lakes altogether, but is that true? And why do we need Data Warehouses anyway? I will go into that, give the definitions of both, and explain the differences between them. Data Warehouse vs Data Lake: Data Warehouse definition. A Data Warehouse, DWH for short and also known as an Enterprise Data Warehouse (EDW), is the traditional way of collecting data, as we have been doing for 31 years.

What Data Warehouse Automation tools are on the market

A Google Trends comparison of the five most-searched Data Warehouse Automation tools on the market shows WhereScape in first place, ahead of TimeXtender and BiReady (now Attunity Compose), over the last year. See the picture in full size or go directly to the Google Trends comparison and adjust it to your own needs. Although the analysis is not representative, it still gives some insight and a good overview of the tools' size, and presumably their usage, relative to each other worldwide.

Why automate? What does Data Warehouse Automation do for us?

This article is for you if you are considering Data Warehouse Automation (DWA) and asking yourself why you should use Data Warehouse Automation tools and what they do for you. After I explained in my previous post why Data Warehouse Automation is not more popular, you will find the why and what of Data Warehouse Automation in this second post of the series. Why automate your Data Warehouse? Every industry has used automation to increase productivity, reduce manual effort, improve quality and consistency, and speed up delivery.