As a database developer, administrator and consultant, I often receive questions about the business benefits of...
full-text searching in SQL Server. As one of the most comprehensive options, full-text searching offers various functional approaches, and each can have a positive impact on business objectives. Though this method requires searching through vast amounts of data while ranking it, proper use can return results in a reasonable time frame.
This article outlines some of the capabilities of SQL Server's full-text search, including predicates and functions. It also explores how to use hybrid approaches to full-text searching to enable more comprehensive searches and to meet specific business needs.
Searching columns with LIKE and PATINDEX
When you search text columns using either the LIKE predicate or PATINDEX function results in a table scan that, depending on the size of the text column and the table size, can be time-intensive. Searching with these two options uses internal SQL Server processing rather than an external process.
Using a LIKE predicate offers specificity but limits you to only the most basic search capabilities, such as exact matches of the word or phrase.
Searching columns using the CONTAINS or FREETEXT predicates
The CONTAINS and FREETEXT predicates encompass Microsoft's full-text search capabilities, along with the full-text search service in SQL Server. Any search performed with these predicates requires that both the full-text search service is running and that the user has the correct permissions to access the full-text files stored on the hard drive The T-SQL code must also interface with another process or service outside the one on which it natively runs.
The following options utilize full-text searching:
FREETEXT - This feature allows you to search on word meaning rather than exact wording. It separates strings into individual words and then generates inflectional forms of the words, as well as using expansions and offering the option to replace the words based on thesaurus matches. Parameter sniffing does not work across conversion, so using it will necessitate passing in a string as an nvarchar data type.
FREETEXTTABLE - This feature offers the same basic functionality as FREETEXT, but it returns a full table. This function allows you to use the top_n_by_rank parameter to only return the top N matches based on the rank assigned to the results.
CONTAINS - This feature searches for a word, phrase, prefixes of the word or phrase, a word near another word, inflection stems and word synonyms.
CONTAINSTABLE - This feature has same basic functionality as CONTAINS, however this parameter also returns a table and allows you to return records based on rank. You can also specify weighting criteria as parameters and use the weights on the resulting table.
With full-text searches, one issue is that you lose control of the content selection offered with LIKE and PATINDEX. It becomes difficult to determine how inflection and word synonyms will be used, especially when some companies request a more customized solution.
Hybrid search solutions
SQL Server 2005 allows you to implement the C# language through Common Language Runtime (CLR) stored procedures. These stored procedures can then implement several search methods to provide comprehensive text searching. CLR stored procedures are slightly less efficient at text processing on tables, because they require the SQL Server to initiate and run code through the .NET runtime engine. But the implementation of C# allows a company to tweak its full-text implementation as needed to support business directives rather than being forced to accept its limitations.
The complexity of this approach should not be underestimated, because it normally requires most of the full-text searching functionality Microsoft provides, such as word, phrase, prefix, inflection, word synonyms, word weighting and ranking. As the complexities of your search expand, a typically rapid solution may slow down significantly. This approach also requires that you take into account international character sets rather than work off Microsoft's foundation.
In some scenarios, an acceptable compromise is to implement Microsoft's full-text functionality and then use CLR stored procedures to refine the result set as needed.
Pattern matching-based modifications
Another solution to refine your search is to write full-text functionality into tables written with T-SQL code. If you shred text columns into tables where the primary key is a word linked to the original record, word searches can use clustered index seeks and provide a performance boost to your server. Also, this method eliminates the need to interface with CLR, the .NET engine or the Microsoft full-text engine. But with this approach, you lose a lot of functionality, such as inflection stems, word synonyms and prefix matching. Comparing the weight and rank of words also has to be customized and added, which can further slow processing.
An acceptable compromise is to implement T-SQL code with a shredded text column and, using the seeks on the clustered index, to narrow the results returned to the best matches. This reduces the records that have to be examined using either the full-text search or a CLR stored procedure.
Other scenarios use full-text searching, and a unique T-SQL system to cache the results. In most scenarios, this allows the cached search results to be used while also being cleared periodically to freshen results. A high-use server can benefit when it is relieved from constantly performing text searches, and users get near-instant results to text searches that are not significantly customized. This approach fits many user patterns that emerge as part of regular search engines.
Tuning your full-text search functionality pays off
It is difficult to top the out-of-the-box functionality that Microsoft has provided for SQL Server full-text searching. It is possible, however, to tune these capabilities by augmenting them with cached searches and shredded text columns utilizing clustered index seeks. Another issue involves acquiring sufficient functionality and speed from a CLR stored procedure, but with the right business requirements, it is a feasible option.
ABOUT THE AUTHOR
Matthew Schroeder is a senior software engineer who works on SQL Server database systems ranging in size from 2 GB to 3+ TB, with between 2k and 40+k trans/sec. He specializes in OLTP/OLAP DBMS systems as well as highly scalable processing systems written in .NET. Matthew is a Microsoft certified MCITP, Database Developer, has a master's degree in computer science and more than 12 years of experience in SQL Server/Oracle. He can be reached at firstname.lastname@example.org.