Are you on the lookout for a replacement for Microsoft Analysis cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want your analytics updated even in real time? In this blog, I want to show you possible solutions that are ready for the future and fit into an existing data architecture.
What is OLAP?
OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling. An OLAP cube is a multidimensional database optimised for data warehouse and online analytical processing (OLAP) applications; it is a way of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorised by dimensions. In order to manage and query an OLAP cube, Microsoft developed a query language known as Multidimensional Expressions (MDX) in the late 1990s. Many…
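The idea of aggregating measures over every combination of dimensions can be illustrated even in plain T-SQL, without a cube engine. This is a minimal sketch with an invented Sales table, using GROUP BY CUBE to produce subtotals for each dimension combination:

```sql
-- Hypothetical Sales table: Amount is the measure,
-- Region and ProductGroup are the dimensions.
SELECT Region, ProductGroup, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY CUBE (Region, ProductGroup);
-- Returns one row per (Region, ProductGroup) pair plus subtotal rows
-- (NULL in a column marks the "all values" aggregate for that dimension).
```

A real OLAP cube precomputes and stores such aggregates, which is what makes slicing and dicing fast at query time.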
There is a bit of confusion between Data Warehouse vs Data Lake, or ETL vs ELT. I hear that Data Warehouses are not used anymore, that they have been replaced by Data Lakes altogether, but is that true? And why do we need Data Warehouses anyway? I will go into that, as well as the definitions of both, plus explain the differences between them.
Data Warehouse vs Data Lake
Data Warehouse definition
A Data Warehouse, DWH for short and also known as an Enterprise Data Warehouse (EDW), is the traditional way of collecting data, as we have done for 31 years. The DWH serves as the integration point for data from many different sources and as the single point of truth, and handles data management, meaning cleansing, historising and joining data together. It provides greater executive insight into corporate performance through management dashboards, reports or ad-hoc analyses.
Various types of business data are analysed with Data Warehouses. The need for it often…
Comparing the five most-searched Data Warehouse Automation tools on the market with Google Trends shows that WhereScape comes first, ahead of TimeXtender and BiReady (now Attunity Compose), over the last year. See the picture in full size or go directly to the Google Trends comparison and adapt it to your own needs.
Although the analysis is not representative, it still gives some insight and a good overview of the relative size and presumable usage of the tools, worldwide. Please consider that WhereScape and TimeXtender get more search results because the company name is the same as the product name, meaning some of those searches are for the company rather than the Data Warehouse Automation (DWA) tool itself. And BimlFlex just published their first release, and biGENiUS is rather new to marketing their product actively, so they will probably increase slightly in the near future.
Data Warehouse Automation Tools on the market
As you can imagine, there are plenty of…
This article is for you if you are considering Data Warehouse Automation (DWA) and asking yourself why you should use DWA tools and what they can do for you. After I explained in my previous blog why Data Warehouse Automation is not more popular, you will find the why and what of Data Warehouse Automation in this second post of the series.
Why automate your Data Warehouse?
Every industry has used automation to increase productivity, reduce manual effort, improve quality and consistency, and speed delivery. Henry Ford introduced the assembly line to produce automobiles, and today Uber and countless other startups use the Internet and digital processing to reduce friction in commercial transactions. Thus, the time has come to introduce automation to data warehousing.
Pointed out by Eckerson Group.
I would say it like this: in a society where time flies remarkably fast and data has become the new gold, it's crucial to have proper analyses…
I worked with a Data Warehouse Automation (DWA) tool for a little more than a year, and I have to say I loved it. As a BI developer, you could focus on the real challenges: the dimensional modelling, what granularity the fact tables should have, and going deep into the business requirements, with everything fast, consistent and tested!
But why is Data Warehouse Automation not used more often, and why is it not more popular? I'm asking myself that more and more. That's why I'm writing a series of blog posts all about DWA. In this first blog, I try to find possible reasons behind it and also argue for DWA, and why we should use it more often.
Everyone needs to make data-driven decisions faster, so why not use a generator which gives you answers in days instead of months..?
What do I mean by that? Many people and therefore many companies fear…
A very long time ago, in 1986, 31 years ago to be precise, IBM in Europe created the very first architecture of a data warehouse. And it seems to be a masterpiece, as it hasn't changed much since. So how can we improve or bring some innovation into the Data Warehouse business at a time when everyone is talking about big data, data lakes, the Internet of Things (IoT), predictive analytics, Data Vault, etc.?
No matter how we want to improve the architecture, it has to be automated as much as possible. Doing it the traditional way has become too slow to serve the business needs. However, I don't think the DWH will go away anytime soon (see more in DWH vs Data Lake). I strongly believe that DWA tools are the future and will boost Data Warehousing's reputation back to that of earlier years.
That’s why I wrote this blog series all about DWA. You can start with the first blog…
Continued from Migrate from Oracle to Microsoft (Views) – Part II, here is part III, with the main focus on the very interesting topic of testing ;-).
When I had implemented all of the roughly 20 views from Oracle on SQL Server, I had to test the performance and, not least important, the content of the views, to check whether they still returned the same results. And what is the easiest way to do that?
For measuring the performance, you simply run a SELECT COUNT(*) on all views in both databases. Don't forget to clear the cache before each run.
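On the SQL Server side, a single timing run could look roughly like this (the view name is a placeholder, and clearing the caches requires elevated rights, so never do this on a production server):

```sql
-- Flush dirty pages, then empty the buffer pool and plan cache
-- so every run starts cold and the timings are comparable.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;

SET STATISTICS TIME ON;
SELECT COUNT(*) AS cnt FROM dbo.V_MY_VIEW;  -- placeholder view name
SET STATISTICS TIME OFF;
```

On the Oracle side the equivalent would be flushing the buffer cache and shared pool before each run.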
To generate all the COUNT statements, I created a little script, just to make life easier, right ;-):
--Generate Queries - Performance Test
SELECT
    --stuff(c.list,1,1,'') as cols,
    'PRINT ''' + v.Table_name + ''' ' +
    'SELECT ''' + v.Table_name + ''' as Tbl_name, NULL as cnt UNION ALL ' +
    'SELECT ''' + v.Table_name + ''' as Tbl_name, count(*) as cnt FROM [dbo].[' + v.Table_name + '] ' --WHERE 1=2
FROM INFORMATION_SCHEMA.VIEWS v;
Continued from Migrate from Oracle to Microsoft (Views) – Part I, here is part II, with more focus on the performance part.
Performance issues recursive queries
When it comes to performance, I had one big issue with the built-in function SYS_CONNECT_BY_PATH, which is used for recursive queries in Oracle. This function is well known to the Oracle optimizer, which makes it unbelievably fast, and there is no similar function in SQL Server. What made it even more complex was that these queries were used in nested views, so the SQL was quite complicated.
I started to translate it into a CTE view, which is the counterpart in SQL Server and the answer to recursive queries there. But I ended up having big performance issues because the queries were too complex. I had good performance measures on a simple query, but effectively infinite response times on the more complex ones.
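For reference, the shape of such a translation, a recursive CTE rebuilding the path string that SYS_CONNECT_BY_PATH produces in Oracle, looks roughly like this (table and column names are invented for illustration):

```sql
-- Hypothetical employee hierarchy; the CTE builds '/a/b/c'-style paths
-- the way SYS_CONNECT_BY_PATH(EmployeeName, '/') would in Oracle.
WITH hierarchy AS (
    -- Anchor: top-level rows (CONNECT BY ... START WITH ManagerId IS NULL)
    SELECT EmployeeId, ManagerId,
           CAST('/' + EmployeeName AS varchar(4000)) AS path
    FROM dbo.Employees
    WHERE ManagerId IS NULL

    UNION ALL

    -- Recursive step: append each child to its parent's path
    SELECT e.EmployeeId, e.ManagerId,
           CAST(h.path + '/' + e.EmployeeName AS varchar(4000))
    FROM dbo.Employees e
    JOIN hierarchy h ON e.ManagerId = h.EmployeeId
)
SELECT EmployeeId, path FROM hierarchy;
```

Functionally equivalent, but as described above, on deeply nested views the optimizer handled this far worse than Oracle handles CONNECT BY.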
So what’s the solution to that?
Inline User-Defined Functions. Of course it…
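A minimal sketch of an inline table-valued function, with invented names, looks like this; unlike a multi-statement UDF, the optimizer can expand it inline into the calling query, much like a parameterised view:

```sql
-- Inline TVF: single RETURN of a SELECT, no BEGIN/END body,
-- so the optimizer can merge it into the outer query plan.
CREATE FUNCTION dbo.ufn_OrdersByCustomer (@CustomerId int)
RETURNS TABLE
AS
RETURN
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
GO

-- Used like a table:
SELECT * FROM dbo.ufn_OrdersByCustomer(42);
```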
Sometimes you have to migrate from an Oracle database to a Microsoft SQL Server. I'm not going into the reasons that could be behind that. I had different projects where I had to achieve this goal, and the issues that I faced are what I would like to share with you.
When it comes to rewriting code from Ora-SQL to MS-SQL, in my case Oracle views that had to be transferred to Microsoft views, the first thing you figure out is that the built-in functions are not the same on both sides. For example, the well-known TO_NUMBER() or TO_CHAR() in Oracle doesn't exist in Microsoft SQL Server. So what do you do?
Below is a list of common functions compared between the two technologies:

Oracle                       MS SQL Server           Description
CEIL(n)                      CEILING(n)              Smallest integer >= n
GREATEST(...) / LEAST(...)   no direct equivalent    Max or min number or string in list
NVL(expr, n)                 ISNULL(expr, n)         Translate NULL to n
NULLIF(a, b)                 NULLIF(a, b)            Return NULL if two values are equal
str1 || str2                 str1 + str2             Concatenate two strings
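For TO_CHAR() and TO_NUMBER() specifically, the usual replacements are CONVERT/CAST. These are hedged examples with illustrative values, not a complete mapping, since Oracle format masks have no one-to-one T-SQL equivalent:

```sql
-- TO_CHAR(sysdate, 'DD.MM.YYYY')  ->  CONVERT with style 104 (German, dd.mm.yyyy)
SELECT CONVERT(varchar(10), GETDATE(), 104);

-- TO_NUMBER('42')  ->  CAST to a numeric type
SELECT CAST('42' AS int);

-- TRY_CAST (SQL Server 2012+) returns NULL instead of raising an error,
-- closer to how you often want invalid input handled after a migration.
SELECT TRY_CAST('abc' AS int);
```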
There is no easy way to generate partitions for SSAS cubes by default. So you have to build an SSIS package (here is a way you can do it: dynamic-cube-partitioning-in-ssas-2008) or write a SQL script which generates XMLA executables.
As shown in the statement below, you can dynamically generate month partitions. The trick is to create a linked server so that you can execute XMLA statements with relational T-SQL, so I created a linked server first.
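Once the linked server exists, an XMLA command can be sent from T-SQL with EXEC ... AT. This is a hedged sketch; the database ID is a placeholder, and a real partition script would generate a CreateObject/Process command per month:

```sql
-- Send an XMLA ProcessFull command to the SSAS linked server.
-- 'MyOlapDb' is a placeholder for the actual SSAS database ID.
DECLARE @xmla nvarchar(max) = N'
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>MyOlapDb</DatabaseID>
  </Object>
</Process>';

EXEC (@xmla) AT SSAS;  -- requires the ''rpc out'' server option set below
```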
Thanks for any advice or improvements in the comment section:
EXEC master.dbo.sp_addlinkedserver
      @server = 'SSAS'
    , @srvproduct = ''
    , @provider = 'MSOLAP'
    , @datasrc = '[DB-NAME]';
EXEC master.dbo.sp_serveroption @server='SSAS', @optname='rpc out', @optvalue='true';
GO
CREATE SCHEMA ssas;
GO
CREATE PROCEDURE ssas.usp_Execute (