Next: Internals, Previous: Cache tuning, Up: Top [Contents][Index]
The key uniquely identifies each word occurrence inserted in the inverted index. A key is made of a string (the word being indexed) and a document identifier (really a list of numbers), as discussed above.
The exact structure of the inverted index key must be specified in the configuration parameter "wordlist_wordkey_description". See the WordKeyInfo(3) manual page for more information on the format.
We will focus on three examples that illustrate common usage.
First example: a very simple inverted index associates each word occurrence with a URL (coded as a 32-bit number). The key description would be:
Word 8/URL 32
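As a rough illustration, a description of this shape can be split into (name, number) pairs. This is a sketch in Python, not the library's own parser: the function name parse_key_description is hypothetical, and it assumes only the "Name number/Name number" layout used in the examples on this page. The WordKeyInfo(3) manual page remains the authority on the actual format.

```python
def parse_key_description(description):
    """Split a description such as 'Word 8/URL 32' into (name, number) pairs.

    Hypothetical helper: it only understands the 'Name number' fields,
    separated by '/', that appear in the examples of this section.
    """
    fields = []
    for part in description.split("/"):
        name, number = part.split()
        fields.append((name, int(number)))
    return fields


fields = parse_key_description("Word 8/URL 32")
print(fields)  # [('Word', 8), ('URL', 32)]
```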
Second example: if building a full text index of the content of a database, you need to know in which field, table and record the word appeared. This makes three numbers for the document id.
Only a few bits are needed to encode the field and table names: with a maximum of 16 field names and 16 table names, 4 bits each are enough. The record number uses 24 bits because we know we won’t have more than 16 M (2^24) records.
The structure of the key would then be:
Word 8/Table 4/Field 4/Record 24
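The bit counts reasoned out above (4 bits for the table, 4 for the field, 24 for the record) can be checked with a little arithmetic. The sketch below is in Python and only verifies the value ranges. The helper pack_document_id is hypothetical, and packing the three numbers into one integer is an assumption made for illustration, not a description of how the library stores keys.

```python
TABLE_BITS, FIELD_BITS, RECORD_BITS = 4, 4, 24


def pack_document_id(table, field, record):
    """Pack three numbers into one integer, most significant field first.

    Hypothetical helper; raises ValueError if a value does not fit
    in the number of bits reserved for it.
    """
    for name, value, bits in (
        ("table", table, TABLE_BITS),
        ("field", field, FIELD_BITS),
        ("record", record, RECORD_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (table << (FIELD_BITS + RECORD_BITS)) | (field << RECORD_BITS) | record


max_tables = 1 << TABLE_BITS     # 16 distinct table numbers
max_records = 1 << RECORD_BITS   # 16777216 distinct record numbers (16 M)
total_bits = TABLE_BITS + FIELD_BITS + RECORD_BITS  # 32
```

With these sizes the three numbers of the document identifier add up to 32 bits, and a record number of 16777216 or more is rejected.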
When more than one field is involved in a key, you must choose the order in which they appear. It is mandatory that the Word comes first. It is the part of the key that has the highest precedence when sorting. The fields that follow have lower and lower precedence.
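The effect of field order on sorting can be imitated with Python tuples, which compare element by element from left to right. This is only an analogy, assuming ascending order on each field. The sample keys are invented and follow the Word/Table/Field/Record layout of the second example.

```python
# Each tuple stands for one key: (word, table, field, record).
keys = [
    ("banana", 1, 2, 7),
    ("apple", 2, 1, 9),
    ("apple", 1, 3, 4),
    ("apple", 1, 3, 2),
]

# Tuples sort on the first element, then the second, and so on,
# so the word has the highest precedence and the record the lowest.
sorted_keys = sorted(keys)
for key in sorted_keys:
    print(key)
```

All keys for "apple" come before those for "banana". Within "apple", table 1 comes before table 2, and the record number only decides the order when every field before it is equal.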
Third example: we go back to the first example and imagine we have a relevance ranking function that calculates a value for each word occurrence. By inserting this relevance ranking value in the inverted index key, ahead of the URL, all the occurrences of a word will be sorted with the most relevant first. The key description would be:
Word 8/Rank 5/URL 32
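A small sketch of what placing Rank ahead of URL buys. It is written in Python and rests on two assumptions that the text does not state: keys sort in ascending order, and the ranking function gives smaller values to more relevant occurrences. The sample data is invented. The 5-bit width limits rank values to the range 0 to 31.

```python
RANK_BITS = 5

# Each tuple stands for one key: (word, rank, url number).
occurrences = [
    ("index", 12, 1001),
    ("index", 3, 1002),
    ("index", 3, 1000),
    ("index", 31, 1003),
]

# A 5-bit field can only hold values from 0 to 31.
for _, rank, _ in occurrences:
    if not 0 <= rank < (1 << RANK_BITS):
        raise ValueError(f"rank {rank} does not fit in {RANK_BITS} bits")

# Rank is compared before URL, so occurrences of the same word come out
# in rank order; the URL only breaks ties between equal ranks.
ordered = sorted(occurrences)
urls_by_relevance = [url for _, _, url in ordered]
print(urls_by_relevance)  # [1000, 1002, 1001, 1003]
```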