Saving time with text data sources in SSIS

Text data sources are among the most common sources used with SQL Server Integration Services. Understanding how they work can help you avoid falling into the usual traps.

Read the full transcript from this video below:  

Saving time with text data sources in SSIS

Eric Johnson: Hello. Welcome to this screencast on using text data sources in SSIS. My
name is Eric Johnson. I am a SQL Server MVP and a consultant with
Consortio Services in Colorado Springs. We are going to take a look at
text data sources today because they are probably one of the more common
sources you will use, especially in simpler SSIS packages, but there are a
couple of little things about them that can bite you. You can save yourself
time if you go into them understanding how SSIS works with a text data
source. Let us take a look at a sample package and walk through a couple of
imports.

If you look on the hard drive, I have two contact lists, and we are going
to take a look at both of them by the end of this screencast. The first
one is a list of contacts: first and last name, email address, phone
number, and a comment field to the right for some longer comments. In this
first file, the header row is present, so you have the column names. In the
ContactList2 file, I do not have that header row, so there are no column
names; we will take a look at the differences there.
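To make the difference concrete, here is a minimal Python sketch, with hypothetical file contents and column names modeled on the two contact lists described above, of reading a delimited file with and without a header row:

```python
import csv
import io

# ContactList1-style data: the first row holds the column names.
with_header = io.StringIO(
    "First Name,Last Name,Email,Phone,Description\n"
    "Jane,Doe,jdoe@example.com,555-0100,Met at the conference\n"
)

# ContactList2-style data: the same row, but no header row.
without_header = io.StringIO(
    "Jane,Doe,jdoe@example.com,555-0100,Met at the conference\n"
)

# With a header row, the reader picks up real column names...
rows1 = list(csv.DictReader(with_header))

# ...without one, you must supply the names yourself; otherwise you
# are stuck with generic names like Column 0, Column 1, and so on.
names = ["First Name", "Last Name", "Email", "Phone", "Description"]
rows2 = list(csv.DictReader(without_header, fieldnames=names))

print(rows1[0]["First Name"])  # Jane
print(rows2[0]["First Name"])  # Jane
```

Either way, the data that comes through is identical; only the source of the column names differs.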

We are going to start by setting up an import of ContactList1. It is
pretty straightforward: you just create a new flat file connection
inside your SSIS package. I am going to call this one "Contact List". You
browse to the location of the file, and for ContactList1, remember, the
column names were in that first row of data, so I need to check the
"Column names in the first data row" box. If I do not, when I come over to
look at the columns, First Name, Last Name, and Email show up as a data
row, and I get Column 0, Column 1, Column 2. With the box checked, when I
come down to the Columns tab, I see First Name, Last Name, and Email as the
column names, and there is my description text. I am going to click OK.
Actually, I already have an OLE DB destination set up for the table. This
is where a lot of people will stop and see errors, and I will show you
what I mean.

We will start by dropping into this data flow task that we have. We will
drop a flat file source, then we will define this flat file source as that
contact list connection we just created. Then we will go ahead and come
all the way down to Destinations and drop in an OLE DB Destination, and
drag a path, and in our OLE DB destination it is going to be the Eric
Johnson, EJohnson Mobile Adventure Works Database. I am going to do just a
standard table or view load, and it is this DBO contact list table, which
as you will see, is completely empty.

Go to the Mappings page, and it will automatically map First Name and
Last Name. Email does not get mapped because the name does not match, so we
will make sure Email goes to Email Address, and then map Phone and
Description. As you can see, we are not quite ready to run this because we
are seeing an error on our OLE DB destination. If you hover over it, the
error pops up,
and you can also see it in our error list. This is a test package that I
use, so there are actually a lot of errors from other pieces and parts. The
error we are seeing here is, 'The column cannot convert between
Unicode and non-Unicode.' That is because, by default, when you set up a
flat file data source, the columns are set up as non-Unicode strings. Let
me flip over to Management Studio for just a moment, and you can see under
this ContactList table that the columns are almost all nvarchar. Not quite
all of them: the Description column is a varchar, but the rest are all
nvarchar. A lot of people would be tempted to go ahead and drop in a Data
Conversion transformation to convert all the data from the flat file from
string to Unicode string, but we can actually fix it at the file source.
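The error comes down to storage formats: an nvarchar column holds Unicode (UTF-16) data, what SSIS calls DT_WSTR, while a varchar column holds single-byte code-page data, SSIS's DT_STR, and SSIS will not convert between the two implicitly. A rough Python analogy of why the two families are not interchangeable:

```python
# nvarchar / DT_WSTR stores UTF-16 code units: two bytes per
# character (for characters in the Basic Multilingual Plane).
name = "Renée"
utf16 = name.encode("utf-16-le")
assert len(utf16) == 2 * len(name)

# varchar / DT_STR is tied to a single-byte code page. A character
# that exists in one code page may not exist in another, which is
# why an implicit conversion could silently lose data -- so SSIS
# makes you be explicit about it instead.
fits_latin1 = True
try:
    name.encode("latin-1")   # é exists in this code page
except UnicodeEncodeError:
    fits_latin1 = False

fits_ascii = True
try:
    name.encode("ascii")     # ...but not in plain ASCII
except UnicodeEncodeError:
    fits_ascii = False

print(fits_latin1, fits_ascii)  # True False
```

This is only an analogy for the storage difference, not how SSIS performs the conversion internally.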

I am going to open the Contact List Connection Manager that we set up, and
go to the Advanced page. Here you can see the data types that have been
defined for First Name, Last Name, Email, and the rest; the matching table
columns are all nvarchar, except for Description. I am going to go to
First Name and pick Unicode string [DT_WSTR] from the dropdown, and I am
going to do the same, and you can do this by multi-selecting, for Last
Name, Email, and Phone. We are going to change all of those to the
Unicode string type. We click OK, and this gives me a metadata warning on
the source, so all you have to do is double-click it, and it says, 'The
metadata no longer matches.' Click Yes and click OK, and it updates the
metadata from the source. Now you will see that I am getting a truncation
warning. Let me show you that one: 'Truncation may occur because the Phone
column has a length of 50 but the database column only has a length of
25.' Again, for all the columns in this flat file, and I will open this
back up, not only did it default the data type, it also defaulted the data
length; as you can see if you click through, every column width is set to
50.

Let us flip back over to Management Studio again and take a look. First
Name, Last Name, and Email are all fine at 50. Phone is actually 25;
Description is 1,000. I am going to go ahead and set the Phone column width
to 25. I am going to leave Description alone for a second, at 50, because
I want you to see what happens if this is wrong. You will see the warning,
and again, double-click to update your metadata, then click Yes and click
OK. Notice that you get a warning when the column in the flat file is
defined longer than the column in the database, but you do not see one
when the flat file column is defined shorter than the database column, as
is the case for Description. And if you look at our data, even though the
Description column is defined as 50 characters, if you scroll over, that
last column of data is in some cases much longer than 50 characters. What
is going to happen when we try to execute this package? Let's find out.

I am going to come over here; ignore these Build Cache and Import Contacts
control flows -- we are just looking at this dataflow. You can see that I am
still getting an exclamation point. Sometimes you will get that until you
have opened and clicked OK on your destination, so now it should be clear,
so it is. I am going to right-click and select Execute Task. You would
expect this to work, but I suspect that we are going to have a failure,
and we do. To see the reason, scroll to the task that failed; let me
make this a little bit bigger. Here is our error: 'Data conversion
failed. The output column Description failed because truncation occurred.'
We did not get a warning at design time because SSIS only compares the
declared column widths: it can warn you when a column you defined is too
long for the database, but it has no idea that the actual data in the text
file is longer than the text file's own 50-character definition. It sounds
a little confusing, but the result is an error at runtime where there was
none at design time.
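Since the designer only compares declared widths, a pre-load scan of the actual data can catch this class of failure before the package runs. A hypothetical Python sketch, with file contents and widths invented to mirror the demo:

```python
import csv
import io

# Hypothetical flat-file data; the last field is well over 50 chars.
data = io.StringIO(
    "First Name,Last Name,Email,Phone,Description\n"
    "Jane,Doe,jdoe@example.com,555-0100," + "x" * 80 + "\n"
)

# Declared output widths, mirroring the Connection Manager defaults.
declared = {"First Name": 50, "Last Name": 50, "Email": 50,
            "Phone": 50, "Description": 50}

# Flag any column whose longest actual value exceeds its declared
# width -- exactly the condition that only surfaces at runtime.
too_long = {}
for row in csv.DictReader(data):
    for col, value in row.items():
        if len(value) > declared[col]:
            too_long[col] = max(too_long.get(col, 0), len(value))

print(too_long)  # {'Description': 80}
```

A scan like this over a representative sample file tells you which declared widths to raise before the first failed run.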

Again, just come back into that Contact List connection. You can do all of
this the first time through; you do not have to keep coming back in like I
have, I have just done it this way for demo purposes. I am going to set the
Description output column width to 1,000 so that, as you can see, it
matches the Description column in the database at 1,000. Again, I have to
refresh the metadata and click OK; I will just double-click this one and
click OK as well. Everything is nice and refreshed, and I do not see any
warnings. Then, just so you know I am not running any smoke and mirrors,
here is a SELECT * FROM ContactList statement, and as you can see, my
table is completely empty. When I come back here and execute this data
flow, it should run yellow, then green. If you look at the data flow
itself, it is done: 38 rows got imported from the flat file to the
destination. Flip back over, run the SELECT again, and there is my data,
complete with the Description column on the records that had it.

Pretty simple, but those are the things you need to be aware of. Real
quick, and then we will wrap this up, I just want to show you what happens
if you are working with a text file that does not have the column names
defined. You have two choices, really. I am going to open the same Contact
List connection and just change the file from ContactList1 to
ContactList2. In this case, the column names are not in the first row of
data, so I am going to uncheck that box and hit Reset Columns. Basically,
it has read the data back in as though it has never seen this text file
before, and you will see I have Column 0, Column 1, Column 2, Column 3,
and Column 4. My first option is to leave the generic names and just fix
the types and lengths: if you remember, Column 3 was our phone number, so
I change its width back to 25; the first four columns need to be Unicode
strings instead of strings; and Column 4 was our description, so its width
is 1,000. I can do that, and my flat file source just needs to be updated,
so I click OK. My mappings are going to be completely gone now, so I am
going to delete those invalid column references, and when I reopen the
destination, I will get fresh mappings, but it is no longer able to map
the columns automatically.

I could come through the mappings and say Column 0 is First Name, Column 1
is Last Name, Column 2 is Email Address, but this gets tedious, especially
across a large package, so you want to make sure you have friendly names.
The second option: on the Advanced page of our Connection Manager, while we
are going through and changing data types and lengths, you can also change
the column names. You can define each of these as First Name, Last Name,
Email, and so on. If you have a file coming in that is not well defined,
this gives you the ability to define it yourself, and later on in your
packages you will have nice friendly names and will be able to reference
everything by name, just as though the first row did contain your column
headers. Double-click the source again to refresh its metadata, and then
you can redo your mapping. It should map most of the columns automatically,
except for Email because the name is not the same, and now you are back
where you were with the file that had the column header row.
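The rename-at-the-source idea can be sketched in Python as well, with generic and friendly names invented for illustration: give each positional column a real name once, up front, and everything downstream refers to it by name:

```python
import csv
import io

# A headerless, ContactList2-style row.
headerless = io.StringIO("Jane,Doe,jdoe@example.com,555-0100,Old friend\n")

# The generic names a headerless file gets by default...
generic = ["Column 0", "Column 1", "Column 2", "Column 3", "Column 4"]
# ...and the friendly names you would assign on the Advanced page.
friendly = ["First Name", "Last Name", "Email", "Phone", "Description"]
rename = dict(zip(generic, friendly))

rows = [dict(zip(generic, fields)) for fields in csv.reader(headerless)]
renamed = [{rename[col]: val for col, val in row.items()} for row in rows]

print(renamed[0]["Email"])  # jdoe@example.com
```

After the one-time rename, no downstream step needs to know the file ever lacked a header.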

Real simple stuff, but the bottom line is: make sure when you are reading
data in from a text file that, in your Connection Manager, you have defined
all the right values under Advanced for the column names, data types, and
column widths. It will save you a whole lot of headache later as you
develop larger and more complicated packages. I hope this has been
helpful. Thank you for listening.
