In this second part of an introduction to indexes, I am going to look at a very familiar type of index... it's even helpfully called "index" just in case it isn't obvious enough!
Turn to the back (or front if you're Argos) of almost any non-fiction book, and you'll find an index. There are indexes in history books, technical books, shopping catalogues, cookery books, and school text books, to name but a few. Even some fiction books will have an index, to allow you to find references to people or places within them.
So why do we need the index? What does it help us to do that we wouldn't be able to otherwise?
Well quite simply, it allows to to find references to items within the book. In the case of a cookery book, it might allow us to look up recipes containing coriander, or beef; history books might have references to significant people (such as Henry VIII) or places (such as The Tower of London); shopping catalogues might have references to actual objects such as chairs, trousers, or toys.
In each of those examples, the items are different (a history book is unlikely to contain references to coriander), but the idea is the same: the index allows us to find where items are mentioned in the main body of the book or catalogue.
Why do we need an index to do this? Well if we didn't, we'd have to read the whole book to find where the things we were interested in were mentioned. I don't know about you, but I don't want to look through every single recipe in a cookery book every time I want to find one that uses beef or lamb; I would also not want to look at every item in a shopping catalogue if I were only looking for a chair, or a pair of shoes.
The crucial difference between this type of index, and the one found in a telephone directory, is that the main body of the book or catalogue does not have to be in an order that it useful to our search. Let me give a few examples:
Let's take a quick look at how we use this type of index.
Suppose I indeed wanted to find recipes that contain coriander in my cookery book. We know that indexes are usually arranged in alphabetical order, so I would scan through the index until I found an entry for coriander like so:
Confit of cranberries 48
chickpea, chilli and coriander soup 13
oven-roasted vegetables with garlic and coriander 122
Coulbriac, salmon 73
Obviously, we could pick which of the recipes we were interested in, and then follow the relevant page reference (13 or 122) to find it.
Here's another example, this time from a shopping catalogue. Suppose I was looking for a new pair of shoes. In one particular catalogue, the index entry for shoes is as follows:
Shoes 22, 34, 42, 51-53, 68, 70-79, 86, 101, 120-129
In this case, we would probably keep our place on the index page by holding it with one hand, and flick backwards and forwards between the index page and all the pages listed under "Shoes" until we found the one we wanted. Suppose, however, that we were only interested in men's shoes? With the index the way it is above, we would still have to flick backwards and forwards between the index and the main catalogue until we found them. Suppose the index entry looked like this instead:
Now we only have to look through pages 70 - 79. Putting the extra information in the index has saved us having to flick backwards and forwards between the index, and the main body of the catalogue (i.e. the data). In relational database terms, the situation is very similar: data and indexes are also stored on pages, and if the information that we need (in this case the type of shoe) is not held in the index, we have to 'flick' between index and data pages to perform our search. This 'flicking' is an overhead for database engines, just as it is tedious for us to do by hand. So putting extra information into a database index can often save a lot of effort when running database queries, and make them run much faster, especially when there are lots of matching index entries.
This type of index is similar to a type of database index known as a non-clustered index. Non-clustered indexes differ from clustered indexes in that the information in the main part of the book (the data) is not in the order of the index key. In the case of the telephone directory, that order was surname, first name, and address; for books and catalogues, the order is a combination of categories, and standard English language order.
If we were to change the order of the book to match the index at the back, it would suddenly become a very uninteresting book - we would end up with an alphabetical list of words; the book would become both meaningless and useless.
Although it is obvious, I should point out that the main body of a book or catalogue can only be in one order: a telephone directory can only be in one order; a cookery book can only be in one order etc. In the relational database world, a similar constraint applies: the data can only be in one order. Since the order of the data is part of what makes a clustered index, the result is that a particular set of data can only have one clustered index.
You can, however, have as many non-clustered indexes as you like. In the examples we have been using so far, it is difficult to see why we would want to do this. The index in the back of a cookery book contains anything that is significant, whether that be ingredients or cooking techniques. So why would we want more than one?
To answer that question, let's go back to the telephone directory.
We already talked about the fact that a telephone directory is great for performing searches when we know the surname, but useless for any searches when we don't: if we only know the address and/or telephone number then we always have to look at every entry on every page to find the ones we are interested in.
But what if we want to be able to find an entry when all we have is the telephone number?
If we were to re-order the directory in telephone number order, then this would suddenly become possible. Brilliant! Unfortunately, as we have already seen, the directory can only be in one order. So when it is in telephone number order, it suddenly becomes useless for finding entries based on their name (surname and first name).
What we can do, however, is create a separate (non-clustered) index that is in telephone number order.
What about if we also want to be able to search on just an address, i.e. find all the people who are living in a particular house? This is also no problem, because we can create a second (also non-clustered) index that uses the address as its key (remember we can have as many non-clustered indexes as we like).
So what we are left with is a telephone directory in its original order (surname, first name, and address) that we can use for normal searches, but with two additional indexes: one with the telephone number as its key, and the other with the address. If we were doing this for real in a relational database, we would create a clustered index and two non-clustered indexes. This would allow us to perform all three types of searches, and everyone would be happy!