Friday, March 19, 2010

SSIS Traps

Okay, so our 8 year old sql server 2000 database server died..
At least, the SCSI RAID failed / all databases were migrated so IT Infra turned it off... There was only one little DTS which was turned off with it..

So, after a few weeks I got a call: I put my excell in the specified folder, but the website doesn't get updated ..

Off to the server room, turning on the server, going back to desk; still no reply on my ping.. back to server room again .. finding a monitor & keyboard: RAID FAILURE, press F1 to continue.. okay: F1 :P one drive disabled, the fun still continued :)

Win 2000 started, I Immediatly started the sql managment studio, found the DTS and started to taking pictures (server had no network connection) with my SLR :)

Okay, easy peasy; just a little DTS reading an CSV and putting the information in the database (luckly another SQL server, not this broken one).

5 mins work, at most....... right...............

Ofcourse, it's 2010 now, DTS is no more ! It's SSIS !! New & Funstuff!!
Lets quickly re-create this little DTS in SSIS :) :)
Oh, need Visual Studio Business Development Studio (1000 MB) to do this.. Okay, have SQL 2008 R2 CTP right here !! Lets install !! (see install issues previous blog.. )

So, I installed SQL 2008 R2, seeing NOTHING in the program files, no management studio, no BI Development Studio no nothing;
Starting install again ... checked all neccesary components again, it began installing .. again.. After install I got the tools !!

So lets get started ! (5 mins are WAAAAAAAAYYYYY gone already at this phase.. ;)

First trap:
Connections ->Data Sources to create new data source and re-use
(using new connection from datasource in connection manager (bottom screen))

Using a Excute SQL Task & Data Flow Task

Making connection with the CSV:
After double click on the Data Flow Task choose 'flat file source' from the toolbox and drag it on the main screen. Double click it and fill in the path and create new Flat File Connection.

Here comes the second trap

DTS did by default a truncation of data which did not fit into a column. (Or maybe it didn't care about column with at all); SSIS DOES care about column with, so the default with of 50 is maybe a little tight and easy overlooked.
In my case I had one column with 689 chars and so SSIS turned red for the first time real soon giving a truncation error.
At first I changed all the column widths to 255 / 689 exact according to the destination dbase data-types. The truncation errors stayed as if nothing happened.
So, apperantly I had a column somewhere which was > 255, nice..

I wasn't feeling like checking 4000 rows what the max column with was..
So, googling.. found you can change SSIS's behaviour on checking / failing on this. I'ts configured in double click on the flat-file choose error output ->change the truncation column from 'Fail component' to 'Ignore failure'. Second problem fixed.

Third trap
As I'm importing the data into a SQL Server, it might look obvious to choose 'SQL Server Destination' as a destination for the CSV file. This is another trap :)
As it turns out to be, the name 'SQL Server Destination' implies that you want to do a 'BULK INSERT' which is damn fast blablabla BUT not supported on remote sql server's. So you SSIS must be on the same server as the destination dbase.
Which, in my case insn't so. Luckly SSIS gives an very descriptive error about that :) So I choose a OLE DB Destination :)
Third trap fixed :)

Fourth trap
Column mapping. Remembering the good old days where my DTS just used to discover the dbase columns and csv file columns and automatically map those.. SSIS doesn't seem to understand col001 - Column 01 mapping ... So, changing those names 34 times ... and again in the mappings screen, 34 times.. sigh.. did I say 5 mins ??

Okay, renamed the columns, automap works ! (no need to draw 34 lines... :P )

Pressing play.... GREEN ! It actually turned green !! omg !!
Only one problem.. my data is in beteween " .. "bla1","bla2" etc
And my SSIS nicely places it in dbase with "bla1"
:) ooh that's an easy one, I saw 'text qualifier' option in the Flat File Source options screen; Placing an " as a text qualifier, press play again -->RED

resulting in Fourth Trap
if you use the " as text qualifier the SSIS gives an error saying it can't qualify the last column. I fixed this using the Flat File Connection Manager Editor.
In the 'advanced' pane, go to the last column and change the value for the last column from TextQualified true to false; Problem fixed :)
Pressing play, GREEN again :) but already noticed Fifth Trap.

As i'm doing full import everytime I need to truncate the table before filling it with fresh data. So I already put an truncate table SQL task in my SSIS with a nice green arrow to the DataFlow task. And this works fine, only because of the Fourth Trap I noticed this is not really best practice, resulting in an empty table when something goes wrong in the DataFlow task. So I need a transaction based execution.
In the 'Execute SQL Task' I changed the property 'TransactionOption' from Supported to Required. This resulted in the Sixth Trap

Transactions
changing this option immediatly resulted into a RED SSIS with the error
[Execute SQL Task] Error: Failed to acquire connection "Bla". Connection may not be configured correctly or you may not have the right permissions on this connection.

(oh by the way, we're entering the second day in this '5 min' time-path now..)
Googling for this error and for enabling transactions in SSIS; One option you should enable is 'RetainSameConnection' = True in the shared datasource properties
This didn't help. So googling futher..

After few hrs of googling, trying I think I must look into the DTC issues.
I looked ito the book 'Microsoft SQL Server 2008 Integration Services, page 510; Which gives a sample without the use of the DTC; Implemented this and.. it works !
I even didn't specify a failure path, it just works out of the box; defining an TSQL Statment 'BEGIN TRANSACTION' and 'COMMIT TRANSACTION'.
Just to be safe I also defined a 'ROLLBACK' on the failure paths :)
BE AWARE that all tasks use the same Connection Manager. In addition to this, you must make sure that the RetainSameConnection property on the Connection Manager is set to True ! (page 510).

Seventh Trap
The failure paths are default on an 'Logical AND. All constraints must evaluate to True' setting. Which implies there can only be one failure path. So I changed this to the OR mode, which makes all red lines go dotted and works great :)

Eight Trap
Now this thing works on my dev-machine, got to get this on production sql ! Trying to publish to a sql 2005 server is NOT going to work. So, publish it to a sql 2008 production SSIS enabled server.. euh.. we ain't got those.. sigh..
So, I just publish to my own dev-sql2008 and asked IT infra to create a production version :)

Last trap
Okay, just before publishing I discovered an error. As it seems, my solution in 'Fourth Trap' isn't working.. The last column still has data surrounded by "blabla" in the resultset. I spent the rest of the afternoon googling for solutions but only found same issues. Now asking for a change in the export format, for a different column or data delimiter..

[Flat File Source [1]] Error: The column delimiter for column "Column 31" was not found.
-->prob. I'm having this issue: http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/a2ed5d71-b749-47ed-abb2-d745b0a2020a

To be continued.....
Resources:
http://ssisblog.replicationanswers.com/2008/01/10/escaping-double-quotes-in-flat-files-ssis-is-different-to-dts.aspx