Today, there are 6,500 people on LinkedIn who call themselves data engineers according to stitchdata.com. In San Francisco alone, there are 6,600 job listings for this same title. The number of data engineers has doubled in the past year, but engineering leaders still find themselves faced with a significant shortage of data engineering talent. So is it really the future of data warehousing? What is data engineering? These questions and much more I want to answer in this blog post.
In unicorn companies like Facebook, Google, Apple where data is the fuel for the company, mostly in America, is where data engineers are largely used. In Europe, the job title does not completely exist besides the startup mecca Berlin, Munich, etc. They are called or included in jobs like software engineer, big data engineer, business analyst, data analyst, data scientist and also the business intelligence engineer. Myself, I started as a…
This article is for you if you considering to use Data Warehouse Automation (DWA) and asking yourself why you should use Data Warehouse Automation tools what does it do for you. After I explained in my previous blog Why Data Warehouse Automation is not more popular, you will find the why and what of Data Warehouse Automation in this second post of the series.
Why automate your Data Warehouse?
Every industry has used automation to increase productivity, reduce manual effort, improve quality and consistency, and speed delivery. Henry Ford introduced the assembly to produce automobiles, and today Uber and countless other startups use the Internet and digital processing to reduce friction in commercial transactions. Thus, the time has come to introduce automation to data warehousing.
Pointed out by Eckerson Group.
I would say it like this. In a society where time flys remarkably fast and data became the new gold, it’s crucial to have proper analyses…
I was working with a Data Warehouse Automation (DWA) tool for a little more than a year, and I have to say I loved it. As a BI developer you could focus on the challenges you had in dimensional modelling, what granularity should you have the fact tables and going crazy with the business requirements and everything fast, consistent and tested!
But why is Data Warehouse Automation not used more often and more popular? I’m asking that myself more and more. That’s why I’m writing a series of blog posts all about DWA. In this first blog, I’m trying to find possible reasons behind and also argue for DWA, and why we should use it more often.
Every one needs to make data driven decision faster, why not use a generator which gives you answers in days instead of months..?
Losing control
What do I mean by that? Many people and therefore many companies fear…
Continued from Migrate from Oracle to Microsoft (Views) – Part II here part III with the main focus on very interesting topic testing ;-).
Testing
When I implemented all about 20 views from Oracle to SQL Server, I had to test the performance and not least important, the content of the views if their still contain the same. And how do you do that the easiest way?
Measure performance
For measuring the performance you simply select alls Views with a SELECT COUNT(*) on both databases. Don’t forget to clear the cache before each run.
To generate all the COUNT-Statements I created a little script, just to make life easier right ;-):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
--Generate Queries - Performance Test... SELECT v.TABLE_NAME, --stuff(c.list,1,1,'') as cols, 'PRINT ''' + v.Table_name + ''' ' + 'SELECT '''+v.Table_name+''' as Tbl_name, NULL as cnt UNION ALL ' + 'SELECT '''+v.Table_name+''' as Tbl_name, count(*) as cnt FROM [dbo].['+v.Table_name +'] '--WHERE 1=2 ' + CHAR(13)+CHAR(10) FROM INFORMATION_SCHEMA.VIEWS v --where v.TABLE_NAME LIKE '%What you like%' ORDER BY v.TABLE_NAME --To the generated output you just add below in front and you're ready to go: SET STATISTICS TIME ON PRINT 'My_View' SELECT 'My_View' as Tbl_name, NULL as cnt UNION ALL SELECT 'My_View' as Tbl_name, count(*) as cnt FROM [dbo].[My_View] ... |
Compare content
First you create a database linked server on your SQL Server database. If you don’t know how to do this, please read the following How-to.
If you have only a few comparison you easily create EXCEPT-SELECT-Query and compare your Oracle Views over the database link (linked-Server)…
Continued from Migrate from Oracle to Microsoft (Views) – Part I here part II with more focus on the performance part.
Performance issues recursive queries
When it comes to performance I had one big issue with the built-in function SYS_CONNECT_BY_PATH which is used for rekursiv queries in Oracle. This function is well know by the Oracle optimizer which makes this function unbelievable fast and there is no similar function on SQL Server. Even more complex was, that this queries was used on nested views, so the SQL was quite complex.
I started to translate it to an CTE view which is the pendant in SQL server and the answer to recursive queries in SQL Server. But I ended up having big performance issues because the queries where to complex. I had good performance measures on an easy simple query but invinitive response with more complex.
So what’s the solution to that?
Inline User-Defined Functions. Of course it…
Sometimes you have to migrate from an Oracle database to a Microsoft SQL Server. I’m not going into the reason that could be behind. I had different projects where I had to achieve this goal. The issues that I faced is what I would like to share with you.
Different functions
When it comes to rewrite code from Ora-SQL to MS-SQL, in my case it were Oracle Views that had to be transfered to Microsoft Views, then the first thing you figure out is, that there’s not the same built in functions on both sides. For example the well know TO_NUMBER() or TO_CHAR() in Oracle dosen’t exist in Microsoft. So what you do?
Below you see a list of the common functions compared between the two technologies:
Description
Oracle
MS SQL Server
Smallest integer >= n
CEIL
CEILING
Modulus
MOD
%
Truncate number
TRUNC
<none>
Max or min number or string in list
GREATEST,LEAST
<none>
Translate NULL to n
NVL
ISNULL
Return NULL if two values are equal
DECODE
NULLIF
String concatenation
CONCAT(str1,str2)
str1 + str2
Convert ASCII…
There is no easy way to generate partition for SSAS Cubes by default. So you have to do a SSIS-Package (here is a way you can do it dynamic-cube-partitioning-in-ssas-2008) or write a SQL Script which generates XMLA executables.
As shown in the Statement, you can dynamically generate month partitions. The trick is, to create a linked server, that you can execute XMLA statements with relational T-SQL, so i created a Linked Server first.
Thanks for any advice or improvements in the comment section:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 |
EXEC master.dbo.sp_addlinkedserver @server = 'SSAS' , @srvproduct = '' , @provider = 'MSOLAP' , @datasrc = '[DB-NAME]'; EXEC master.dbo.sp_serveroption @server='SSAS', @optname='rpc out', @optvalue='true'; CREATE SCHEMA ssas; CREATE PROCEDURE ssas.usp_Execute ( @XMLA XML ) AS DECLARE @Command VARCHAR(MAX) = CONVERT(VARCHAR(MAX), @XMLA); EXEC (@Command) AT SSAS; CREATE PROCEDURE ssas.usp_ProcessDatabase ( @CubeName NVARCHAR(50) , @CubeID NVARCHAR(50) , @MeasureGroupID NVARCHAR(50) , @DataSourceID NVARCHAR(50) , @StartDT DATETIME --'YYYY-MM-DD' , @EndDT DATETIME , @PartitionPreFix NVARCHAR(20) , @SqlQuery nvarchar(max) ='' /*'SELECT FK_AS_Produkt_ID, FK_AS_Produkt_productId_BK, FK_AS_Vertrag_ID, FK_AS_Vertrag_bookingId_BK, FK_AS_Inserent_ID, FK_AS_Inserent_accountId_BK, FK_AS_Objekt_ID, FK_AS_Objekt_vehicleId_BK, FK_CD_Zeit_Zeit_ID, FK_CD_Zeit_Zeit_DIM_ZEIT_KEY_BK, [Datum Start], [Datum Ende], [Datum Start Booking], [Status ID], [Anzahl Plätze], [Anzahl verwendete Plätze], Preis, MwSt, privateInserters_distict_cnt_PD, privateInsertersReturners1_distict_cnt, privateInsertersReturners12_distict_cnt, privateInsertersReturners48_distict_cnt, soldMemberSlotsFixdate_sum_PD, privateInsertersMoto_sum_PD, vehiclesPrivatInsertersAuto_sum_PD, revenuePrivateInsertersListingsAuto_sum_PD, revenuePrivateInsertersAdditionalsAuto_sum_PD, vehiclesPrivatInsertersMoto_sum_PD, revenuePrivateInsertersAdditionalsMoto_sum_PD, SA_SourceSystem_ID, SA_Load_ID, CA_Load_ID, DW_Load_ID, SCD_TransactionDate, DS_Load_ID FROM DM.AS_BookingStats WHERE FK_CD_Zeit_Zeit_ID BETWEEN <#=dateFrom#> AND <#=dateTo#>'*/ ) AS DECLARE @myXMLA XML, @myXMLA_before XML, @myXMLA_delete XML declare @value nvarchar(30),@dateFrom varchar(10),@dateTo varchar(10),@year nvarchar(4) declare @replSqlQuery nvarchar(max) set @dateFrom=FORMAT(DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDT), 0),'yyyyMMdd') --First day of month select @value=@PartitionPreFix+FORMAT (@StartDT, 'yyyyMM')+'_before' SELECT @replSqlQuery= REPLACE(@SqlQuery,'WHERE FK_CD_Zeit_Zeit_ID BETWEEN <#=dateFrom#> AND <#=dateTo#>', 'WHERE FK_CD_Zeit_Zeit_ID < ' + @dateFrom) --delete standard set @myXMLA_delete = N'<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>'+@CubeName+'</DatabaseID> <CubeID>'+@CubeID+'</CubeID> <MeasureGroupID>'+@MeasureGroupID+'</MeasureGroupID> <PartitionID>'+@MeasureGroupID+'</PartitionID> </Object> </Delete>' PRINT CONVERT(nvarchar(max),@myXMLA_delete) EXEC ssas.usp_Execute @myXMLA_delete; PRINT ' ' --delete first partition before set @myXMLA_delete = N'<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>'+@CubeName+'</DatabaseID> <CubeID>'+@CubeID+'</CubeID> <MeasureGroupID>'+@MeasureGroupID+'</MeasureGroupID> <PartitionID>'+@value+'</PartitionID> </Object> </Delete>' PRINT CONVERT(nvarchar(max),@myXMLA_delete) EXEC ssas.usp_Execute @myXMLA_delete; PRINT ' ' Set @myXMLA =N'<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <ParentObject> <DatabaseID>'+@CubeName+'</DatabaseID> <CubeID>'+@CubeID+'</CubeID> <MeasureGroupID>'+@MeasureGroupID+'</MeasureGroupID> </ParentObject> <ObjectDefinition> <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400"> <ID>'+@value+'</ID> <Name>'+@value+'</Name> <Source xsi:type="QueryBinding"> <DataSourceID>'+@DataSourceID+'</DataSourceID> <QueryDefinition> '+ @replSqlQuery +' </QueryDefinition> </Source> <StorageMode>Molap</StorageMode> <ProcessingMode>Regular</ProcessingMode> <ProactiveCaching> <SilenceInterval>-PT1S</SilenceInterval> <Latency>-PT1S</Latency> <SilenceOverrideInterval>-PT1S</SilenceOverrideInterval> <ForceRebuildInterval>-PT1S</ForceRebuildInterval> <Source xsi:type="ProactiveCachingInheritedBinding" /> </ProactiveCaching> </Partition> </ObjectDefinition> </Create>' PRINT CONVERT(nvarchar(max),@myXMLA) EXEC ssas.usp_Execute @myXMLA; PRINT ' ' WHILE @StartDT < @EndDT BEGIN set @dateFrom=FORMAT(DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDT), 0),'yyyyMMdd') --First day of month set @dateTo =FORMAT(DATEADD(MONTH, DATEDIFF(MONTH, -1, @StartDT), -1),'yyyyMMdd') --Last Day of month select @value=@PartitionPreFix+FORMAT (@StartDT, 'yyyyMM') SELECT @replSqlQuery= REPLACE(REPLACE(@SqlQuery,'<#=dateTo#>', @dateTo), '<#=dateFrom#>', @dateFrom) set @myXMLA_delete = N'<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>'+@CubeName+'</DatabaseID> <CubeID>'+@CubeID+'</CubeID> <MeasureGroupID>'+@MeasureGroupID+'</MeasureGroupID> <PartitionID>'+@value+'</PartitionID> </Object> </Delete>' PRINT CONVERT(nvarchar(max),@myXMLA_delete) EXEC ssas.usp_Execute @myXMLA_delete; PRINT ' ' Set @myXMLA =N'<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <ParentObject> <DatabaseID>'+@CubeName+'</DatabaseID> <CubeID>'+@CubeID+'</CubeID> <MeasureGroupID>'+@MeasureGroupID+'</MeasureGroupID> </ParentObject> <ObjectDefinition> <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400"> <ID>'+@value+'</ID> <Name>'+@value+'</Name> <Source xsi:type="QueryBinding"> <DataSourceID>'+@DataSourceID+'</DataSourceID> <QueryDefinition> '+ @replSqlQuery +' </QueryDefinition> </Source> <StorageMode>Molap</StorageMode> <ProcessingMode>Regular</ProcessingMode> <ProactiveCaching> <SilenceInterval>-PT1S</SilenceInterval> <Latency>-PT1S</Latency> <SilenceOverrideInterval>-PT1S</SilenceOverrideInterval> <ForceRebuildInterval>-PT1S</ForceRebuildInterval> <Source xsi:type="ProactiveCachingInheritedBinding" /> </ProactiveCaching> </Partition> </ObjectDefinition> </Create>' PRINT CONVERT(nvarchar(max),@myXMLA) EXEC ssas.usp_Execute @myXMLA; PRINT ' ' SET @StartDT = DATEADD(MONTH,1,@StartDT) END GO |
…
Like you would think ORDER BY is the same in Oracle as in the Microsoft environment, it isn’t. It depends on the Collation of the database. In the following example you see, that two manually added text in Oracle and Microsoft don’t order the same. After I added the Collation for SQL_Latin1_General_CP850_BIN2, it ordered the same.
ORDER BY Oracle:
ORDER BY Microsoft – normal and with collation:
…