The bulk copy tool (BCP) is one of SQL Server’s mainstay command-line tools -- a very handy and somewhat underrated
way to import and export massive amounts of data into and out of SQL Server. Unfortunately, it has a few limitations -- even in SQL Server 2008 R2 and SQL Server 2012 -- that are worth calling out, since they can be the source of gotchas along the way. Here are some of the major ones I’ve run into.
There’s no UTF-8 support. SQL Server has native support for Unicode, as anyone who’s worked with the nvarchar and ntext field types can attest. It deals with Unicode internally by mapping every character to a two-byte entity. This isn’t a problem if you’re just dealing with data that’s shipped between instances of SQL Server, since they all store the same way.
Get more information on SQL bulk copy
Importing and exporting data with SQL bulk copy
Using SQL bulk copy for data moving, sqldiag and troubleshooting
Things get complicated, though, if you try to use BCP to import data from a source that exported Unicode data as UTF-8. This is a subvariant of Unicode designed to allow backward compatibility with 8-bit ASCII text, so that webpages, emails and other formats that use 8-bit ASCII encoding by default can be used to store Unicode data.
If you export data from a source in UTF-8, don’t expect to use it with BCP; it doesn’t support UTF-8, period. You have to take the data in question and export it as full 2-byte Unicode to make it acceptable. The irony is that one other commonplace coding is accepted by BCP via the -C switch: ISO 1252 (ANSI/Microsoft Windows). On the whole, though, you would be better off exporting your data as 2-byte Unicode for maximum compatibility with BCP, especially if you’re dealing with data that may contain characters which aren’t comfortably handled as ASCII.
Watch out for row order on export. Data exported with BCP using a query follows the same rules about export order that would apply to the query under any other circumstances. In other words, if your query has no explicit ORDER BY clause, you’ll get the data out in what may seem to be an entirely arbitrary order. It’s typically based on the order of the underlying index for the table or tables in question, but I’ve learned not to trust even that as a rule of thumb -- especially if the query has a JOIN or some other aggregation between multiple tables.
What order the data is exported in is usually not important, but what order the data is imported in can be crucial. If you have a database in which the validity of later imported rows is contingent on earlier rows and you’re importing the data in batches that enforce this, then the order of the export is important, and you need to construct your BCP statements accordingly. This may seem obvious, but I’m constantly surprised at how many people, including some longtime SQL Server experts, are not aware of this.
Stored procedures invoked from BCP can’t take parameters. If you use a stored procedure that takes parameters as part of the Transact-SQL (T-SQL) statement in a BCP action, it almost certainly won’t work and will throw a function sequence error at the command line.
When a T-SQL statement is passed to BCP, it’s analyzed using the SET FMTONLY ON mechanism to determine the columnar format of the result set. This means dynamically constructed statements, such as stored procedures that take parameters, aren’t analyzed correctly and won’t compile under BCP.
If you want to get around this, there are several ways to do it:
- Create a one-off stored procedure that doesn’t accept any parameters but invokes the stored procedure in question and passes any needed parameters (maybe by accepting them from a source other than the command line).
- Use sqlcmd instead of BCP.
- A trick described on MSDN’s blog involves using what’s called the openrowset trick. If you run a SELECT statement on the OPENROWSET function, you can pass along a T-SQL statement to be run on the server in an ad-hoc way, and thus work around the limitations involved in passing a stored procedure with parameters. This trick has limits, though: for one, it should not be used in a conjunction with a statement which when run makes destructive changes to the database, since the statement may need to be run more than once.
Watch out for table definitions on import. When you use BCP to export data from one SQL Server source and then import it into another, the definitions of the columns you’re exporting from and importing to must match. This includes things like NULL or NOT NULL definitions, the lack of which on the target table can cause silent data corruption.
Triggers in the target database aren’t fired with BCP. BCP’s native behavior is to disable triggers on the target database whenever it is run for an import operation. Since BCP import operations are often big, having triggers enabled by default could get messy. Consequently, you need to use the command option -h "FIRE_TRIGGERS" with BCP to cause triggers to fire.
Note that when enabled, triggers run once for each batch operation -- that is, once for each time you run BCP. Also note that in SQL Server 2005 and later releases, triggers use row versioning, with tempdb used to store the row version information during the import operation. If your tempdb can’t accommodate the influx of data generated by the trigger, the operation could abort unexpectedly.
BCP cannot natively append file output. If you use BCP to export data to a file, that file has to be created anew. You can’t pick an existing file and append the results of the export to it. Fortunately, a workaround isn’t difficult; simply export to any number of files, then use the COPY shell command to concatenate the results:
COPY export1.dat + export2.dat export.dat
ABOUT THE AUTHOR
Serdar Yegulalp has been writing about computers and IT for more than 15 years for a variety of publications, including InformationWeekand Windows Magazine.