SharePoint Search, Synonyms, Thesaurus, and You

 

A little-known and interesting feature of SharePoint search is the ability to create customized thesaurus word sets. These word sets can contain either synonyms or word replacements, which augment search functionality. They are not limited to single words; they can also be extended to specific phrases.

Replacement Words

Search can replace a queried keyword with a replacement keyword. A typical use is a commonly misspelled word or phrase. SharePoint search already handles common misspellings such as ‘thier’ for ‘their’, but you might have terms that are specific to your own circumstances. Let’s say we had a SharePoint site serving a botanical company that specializes in flowers. In our search Web Analytics reports, the ‘No Results Queries’ report showed that people were searching for the term ‘anniemone’ and getting zero results. After asking around, we found that they were actually trying to search for ‘anemone’.

[Screenshot: a search for ‘anniemone’ returning results for ‘anemone’]

http://sharepointvaquero.com/sites/search/Pages/results.aspx?k=anniemone

As the screenshot and link above show, the actual search term was ‘anniemone’, but the results returned were for ‘anemone’.

Expansion Words

Expansion words are easiest to understand as synonyms: expansion augments the search query with the designated synonyms. Consider our botanical SharePoint site again. We know that ‘Birthroot’ is another name for ‘Trillium’. With an expansion set, a search for either ‘Birthroot’ or ‘Trillium’ returns results for both ‘Birthroot’ and ‘Trillium’.

[Screenshot: a search for ‘birthroot’ displaying both ‘Birthroot’ and ‘Trillium’ results]

http://sharepointvaquero.com/sites/test/_layouts/OSSSearchResults.aspx?k=birthroot&cs=This%20Site&u=http%3A%2F%2Fsharepointvaquero.com%2Fsites%2Ftest

Note from the search string that I searched for ‘Birthroot’, yet the search results page displays both ‘Birthroot’ and ‘Trillium’ results.

How-To

The custom thesaurus file is a simple XML file on the SharePoint server(s) located at:

%ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Applications\GUID-query-0\Config

Here, GUID stands for the identifier of your search service application. If you have only one search service application, there should be only one ‘GUID-query-0’ directory. Make the following changes on every SharePoint server that runs the SharePoint search query service. The English thesaurus file is tsenu.xml, shown below with its default contents.

Make a backup of any files you modify!

<XML ID="Microsoft Search Thesaurus">

<!--  Commented out

    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE5</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>

The commented-out XML shows an example structure. With the entries from my earlier examples added, my updated tsenu.xml looks like this:

<XML ID="Microsoft Search Thesaurus">

<!--  Commented out

<thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
         <sub>Internet Explorer</sub>
         <sub>IE</sub>
         <sub>IE5</sub>
    </expansion>
    <replacement>
         <pat>NT5</pat>
         <pat>W2K</pat>
         <sub>Windows 2000</sub>
    </replacement>
    <expansion>
         <sub>run</sub>
         <sub>jog</sub>       
    </expansion>
</thesaurus>

-->

<thesaurus xmlns="x-schema:tsSchema.xml">
     <diacritics_sensitive>0</diacritics_sensitive>
     <expansion>
          <sub>birthroot</sub>
          <sub>trillium</sub>
     </expansion>
     <replacement>
          <pat>anniemone</pat>
          <sub>anemone</sub>
     </replacement>
</thesaurus>

</XML>

To create a replacement set, enclose it in a <replacement> node. The <pat> node holds the term to be replaced, and the <sub> node holds the term that will be substituted into the search query.

<replacement>   
      <pat>anniemone</pat>   
      <sub>anemone</sub>
</replacement>

In this example, ‘anniemone’ is replaced with ‘anemone’ in the search query.
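A single replacement set can also list several patterns; the default tsenu.xml does this with NT5 and W2K, both mapped to Windows 2000. As a sketch, assuming the Web Analytics reports had also turned up a second misspelling (the ‘anenome’ pattern below is hypothetical, not taken from real report data), both patterns could share one substitution:

```xml
<replacement>
    <!-- Misspelling found in the 'No Results Queries' report -->
    <pat>anniemone</pat>
    <!-- Hypothetical second misspelling, added for illustration only -->
    <pat>anenome</pat>
    <!-- Every <pat> above is replaced with this term -->
    <sub>anemone</sub>
</replacement>
```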

Place expansion (synonym) words in an <expansion> node. A search for any <sub> term in an <expansion> node also searches for all the other <sub> terms in that node.

<expansion>
    <sub>birthroot</sub>
    <sub>trillium</sub>
</expansion>

In this example a search for either ‘birthroot’ or ‘trillium’ would return results for both ‘birthroot’ and ‘trillium’.
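Expansion sets are not limited to single words. As a sketch, assuming you also wanted the multi-word common name ‘wake robin’ to be treated as a synonym (a hypothetical addition, not part of the original example), the phrase goes in its own <sub> node, the same way ‘Internet Explorer’ appears in the default file:

```xml
<expansion>
    <sub>birthroot</sub>
    <sub>trillium</sub>
    <!-- Hypothetical multi-word entry, shown for illustration -->
    <sub>wake robin</sub>
</expansion>
```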

Each language has its own thesaurus XML file. The reference linked below lists the file for every language.

The <diacritics_sensitive> node controls whether SharePoint search ignores or accounts for the diacritical marks of that particular language: 0 ignores them, and 1 takes them into account.
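As a minimal sketch, a diacritics-sensitive version of the botanical thesaurus would change only that one value inside the <XML> wrapper. Treat the setting as an illustration and test it against your own content before relying on it:

```xml
<thesaurus xmlns="x-schema:tsSchema.xml">
    <!-- 1 = diacritical marks are significant; 0 (the shipped default) = they are ignored -->
    <diacritics_sensitive>1</diacritics_sensitive>
    <expansion>
        <sub>birthroot</sub>
        <sub>trillium</sub>
    </expansion>
</thesaurus>
```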

When you have finished editing the appropriate XML file, save and close it.

For the modified XML file to take effect, you only need to restart the SharePoint Server Search 14 service on each SharePoint server you changed:

Start>Administrative Tools>Services>SharePoint Server Search 14>Restart

–Javi

(As fate would have it, I found the official Microsoft page after experimenting on my system.)

Reference: Manage thesaurus files (SharePoint Server 2010)
