A user on Reddit says:
I’m not a DB ninja like a lot of developers as I started out more front end and moved towards full stack. I understand the point of indexing is to speed up SQL queries on large data sets, but I feel like I’m just not sure when to use them and when not to.
What is the current best practice with rails?
Uncertainty often stems from a lack of understanding, and indexes in your Rails app can, if you’re not familiar with them, be akin to those murky backwaters you rarely venture into. But it needn’t be. Indexing done right can make a huge difference to the performance and speed of your app, which more often than not, will translate into increased sales, conversions or whatever it is you measure to determine if your app’s ultimate purpose is being fulfilled.
An index is a database construct that makes it easier for the database to look up a given piece of information. An oft cited example, which brings the point home, is that of a phone book index. Imagine you wanted to look up one Rick Sanchez’s phone number – what would be faster? Looking through each page of the book one by one, or flipping back to the index which conveniently lists in alphabetical order the page number Rick’s phone number might be found in? Of course, the index would be way faster. And the positive performance effects multiply the more times you have to look up phone numbers.
Now, when someone moves in or out of Rick’s locality (which the phone book covers), or someone’s name or phone number changes, the index at the back will also have to be updated. This might end up costing you performance if you’re not careful.
If we represented the phone book example in database terms, we’d probably have a PhoneBook
table, with a name
and phone_number
column. Adding an index to the table on the name
column, makes it easier and faster for the database to look up a phone number for a given name (say when you do something like PhoneBook.where(name: 'Morty Smith')
), especially if there are a large number of rows to look through.
So the main reason to use an index on a given column is increased speed, which will more often than not, translate into more sales, conversions or whatever it is you measure to determine if your app’s ultimate purpose is being fulfilled.
Note that although we’re just talking about indexing on one column, it is actually possible to index on multiple columns too. I’ll list out resources at the end of this article which will go more in depth into topics like this.
If you’re with me so far, it’s clear to you that an index on a column makes it faster for the database to look up a row or a set of rows which satisfy a given condition on that column. With that, we can come up with a rule of thumb:
You should add an index on a column if most (or many) of the queries you write for the table involve that column.
In a typical Rails application, there’s one type of column that is pretty much guaranteed to be involved in most queries for tables that it is in, and that is the foreign key column.
Let’s consider an example.
Say you have a User
and a Post
model, and the User
has_many :posts
. This of course means that the :posts
table has a foreign key column user_id
, and that whenever we do something like @user.posts
or @post.user
, we are asking the database to look up a given user_id
for one or many Post
records. Without an index, the database would have to look up each row in the posts
table, one by one, to see if their user_id
matched what was given. If there are many Post
records (in other words, if Post.count
returns a huge number), then this would take a lot of time. So, we would add an index to the posts
table on user_id
.
Right now, you’ll have to take my word for it that this is an improvement. But I’ll cover how you can verify this for yourself very soon.
So, add indexes by default to your foreign key columns. This is considered a standard Rails practice.
“Non-foreign key” columns can also be good candidates for adding indexes on. If they’re referred to a lot in your table queries, and you have a sizable number of rows in your table, then it would probably benefit you to add an index on them. Common examples I’ve come across in my experience are the date_of_birth
, email
, name
columns in a User
table. Remember to apply the rule of thumb discussed above if you’re unsure.
You know what an index is and when to use it. Here’s where we cover how you can begin to actually use them in your Rails app.
If you have two models which you want to associate, you know that you have to create a foreign key column on one of them. It turns out that in Rails 4 and beyond, you can create an index at the same time (which you would want to do in most cases), with the add_reference
method.
Say you’ve created a User
and Post
model, and you want to set up a has_many
relationship between them in your DB, you could do this in your console:
rails g migration AddUserToPosts user:references
which will generate a migration which uses the ‘add_reference` method like so:
add_reference :posts, :user, index: true
The add_reference
method will not only create a user_id
column on your Post
table, but also add an index on user_id
. Nifty. You can also pass in foreign_key: true
to this method to add in a database constraint
If for some reason, you don’t have access to the add_reference
method, or you’ve already created a user_id
column without adding an index to it, you can use the add_index
method in your migration like so:
add_index :posts, :user_id
You can use the add_index
method to add an index on any column in the table.
So far, you’ve taken my word that indexing is a good thing which speeds up your app in ways that are relevant to business goals. But you don’t have to, because you can use ActiveRecord's
#explain
method to verify if adding an index made a difference to your query time or not. This will also help you determine the situations where indexing doesn’t give you much of a performance benefit.
The #explain
method in ActiveRecord
maps to the EXPLAIN
command in SQL
, and one of the things this command does is tell you how long the database thinks the query in question will take. The specifics of how the database formats this “explanation” and the units it measures “execution time” in depend on the database you’re using. Two common databases in the Rails world are Postgres and MySQL, so I’ll give you the basics on those.
To see the output of the #explain
method, you simply send the #explain
message to the query you’re writing, like so:
Post.where(user_id: 23).explain
If you’re on a Postgres database, and you’ve run explain on a query, you’ll want to look at the cost numbers. The higher this number is, the longer the database thinks your query is going to take. You’ll typically see a range being output for cost.
If you’re on a MySQL database, you’ll want to look at the rows number, which tells you how many rows the database thinks its going to have to go through to get you a result. The higher this number, the longer the database thinks your query is going to take.
In short, by using ‘#explain’ before and after you’ve added an index, you can see for yourself how much of a difference adding the index has made.
This article, and this one, written by Tom Ward, a Rails contributor and developer at Basecamp, are fantastic reads which go deeper into indexing and show you how you can make sense of the explain
method’s output (he uses MySQL in his examples).
Tom also has this handy script, which you can use to list any foreign keys you might have missed indexing.
Indexing is a vast topic, and if you want to dig deeper, here are some starting points:
This book covers many things I didn’t cover here. Definitely worth a read.
Cheers, and until next time!