3 ways to work with time in Postgres (& ActiveRecord)

Let’s say you’re developing a project management app with users and projects and the product manager asks you:

  1. How much time does it take a given user to create their first project?
  2. Give me a list of users who took longer than 1 month to create their first project.
  3. What is the average amount of time that it takes for a user to create their first project?
  4. What is the average amount of time per user between project creation?

You could approach solving the problems above purely in Ruby and ActiveRecord. However the data we want to work with resides in a database. The database is smart, capable & fast; with SQL, we can make the database retrieve arbitrary rows and perform complex calculations. It therefore stands to reason that making the database “do the work” as much as possible before sending information to Ruby-land will net us performance gains.

In this article, we’re going to explore how we can solve the above problems in Rails using SQL and a bit of ActiveRecord. The SQL used in this article applies only to Postgres(version 9.4.1), though the ideas in general can be translated to other database management systems.

First, a brief primer on SELECT

A SELECT statement retrieves zero or more rows from one or more database tables or database views.

So something like:

SELECT * FROM users

will retrieve all available rows from the users table.

You can also pass in “aggregator” functions like MIN(), COUNT(), MEAN() etc to SELECT. When these functions are passed in to SELECT, the return value is a single piece of data (an integer, string, timestamp etc).

So for example this:

SELECT MIN(created_at) FROM users

will return the earliest created_at (which is a timestamp) in the users table.

SELECT functions can also be nested within themselves, as we’ll see soon enough.

In ActiveRecord the select method can be used to achieve the same result. Also, calls to methods such as .all, .where, .find etc translate to SELECT functions.

I’ve prepared a handy 3-page PDF cheatsheet containing the salient points of this article. If you’d prefer downloading and printing it out, fill in your email address below and I’ll send it to you.

Download a 3-page Cheatsheet for working with time in Postgres



For the sake of this article, assume our models look like this:

class User < ActiveRecord::Base
  has_many :projects
end

class Project < ActiveRecord::Base
  belongs_to :user
end

Arithmetic operators (-, +, * and /)

The first problem we have to tackle is figuring out how much time it takes a user to create their first project. This time equates to the difference between the user’s earliest Project#created_at and User#created_at.

In Postgres, we can use the arithmetic difference operator - to calculate the difference between two timestamps or dates. The data type matters – if we calculate the difference between two timestamps, the return value is an “interval”, and if we calculate the difference between two dates, the return value is an integer representing the number of days between the two dates.

To make it easy to display the data, we’ll plan to use the select method with something like:

users = User.select('users.email, (projects.created_at - users.created_at) as time_to_first_project')

This will return an ActiveRecord::Relation which we can then iterate over and call #time_to_first_project on:

<% users.each do |user| %>
  <%= user.email %>
  <%= user.time_to_first_project %>
<% end %>

However, the above select won’t work for us for two reasons. First, ActiveRecord will complain that there is no FROM clause for the projects table. We can solve this with a call to joins:

User.joins(:projects).select('users.email, (projects.created_at - users.created_at)....')

Second, the above statement compares the creation times of all of a given users’ projects to User#created_at. We only care about the earliest project, so we can narrow this down with a call to where:

User.joins(:projects)
    .where('projects.created_at = (SELECT MIN(projects.created_at) FROM projects WHERE projects.user_id = users.id)')
    .select("users.email, (projects.created_at - users.created_at) as time_to_first_project")

Because a user’s earliest project’s created_at varies with the user, we retrieve this timestamp by passing in the aggregator function MIN() to a nested SELECT function (nested in this case within the first select). This nested SELECT function is also known as a “sub-query”.

This will return a collection of ActiveRecord objects with the attributes email and time_to_first_project. Because we’re subtracting two timestamps, time_to_first_project will be an “interval”.

Intervals in Postgres are the largest datatype available for storing time and consequently contain a lot of detail. If you inspect time_to_first_project for a few records, you’ll notice that they look something like: "00:18:43.082321" or "9 days 12:48:48.220725", which means you might have to do a bit of parsing and/or decorating before you present the information to the user.

Postgres also makes available the AGE(..) function, to which you can pass in 2 timestamps and get an interval. If you pass in one timestamp to AGE(), you’ll get back the difference between the current date (at midnight) and the passed-in timestamp.

If you have two timestamps and you want find the number of days between them, then you can use the CAST function to take advantage of the fact that when you subtract two dates you get the days between them. We’ll come back to CAST in a little bit.

Comparison and Filtering

Next, we want a list of users who took longer than 1 month to create a project. Ideally, we’d want to be able to get a list of users who took longer than any arbitrary period of time to create a project. This operation boils down to a ‘greater-than’ comparison between time_to_first_project and the time period we pass in (1 month for example). Postgres supports the comparison of dates and times to each other. In this case, since time_to_first_project is an interval, we have to make sure that we compare it to an interval in our call to where. This means that if we do:

users.where("(projects.created_at - users.created_at) > 30")

Postgres will complain that the operator to compare an interval and integer doesn’t exist. To get around this, we have to ensure that the right hand side of the comparison is an interval. We do this with the INTERVAL type keyword. We’d use it in this way:

User.joins(:projects)
    .where('projects.created_at = (SELECT MIN(projects.created_at) FROM projects WHERE projects.user_id = users.id)')
    .where("(projects.created_at - users.created_at) > INTERVAL '1 month'")
    .select(...)

As you can see, we can pass in a human readable string like “1 month” to INTERVAL, which is nice.

Aggregations

Next on the list, we want to calculate the average amount of time it takes to create a project. Given what know so far, this won’t require much explanation. Our query will look like this:

User.joins(:projects)
    .where('projects.created_at = (SELECT MIN(projects.created_at) FROM projects WHERE projects.user_id = users.id)')
    .select("AVG(projects.created_at - users.created_at) as average_time_to_first_project")

We’re using the AVG() function and our query will return one User object on which we can call average_time_to_first_project.

ActiveRecord also has available the average function, which we can call like this:

User.joins(...)
    .where(...)
    .average('projects.created_at - users.created_at)

This query’s return value will be a BigDecimal, and because of this we might lose some information. For example, if the true average is INTERVAL "1 day 23:00:01.2234", the BigDecimal value will be 1.

Finally, what if we want to calculate the average time between project creation for each user?

The average time between projects for a user is the average of the difference between consecutive Project#created_at values. So for example, if there are three projects, the average time between projects is:

"((project3#created_at - project2#created_at) + (project2#created_at - project1#created_at))/2"

This is equal to (project3#created_at - project1#created_at) / 2. For N projects, the average time between projects is:

"(projectN#created_at - project1#created_at) / (N - 1)"

In our example, “projectN” corresponds to the user’s latest project and therefore its creation date is equal to MAX(projects.created_at). Similarly, the user’s earlier project creation date corresponds to MIN(projects.created_at). The number of projects is calculated with COUNT(projects).

For our output, let’s say we want the average time between projects to be reported in number of days. Since subtracting two dates in Postgres returns an integer number of days, we can obtain the average time between projects in days by first casting MAX(projects.created_at) and MIN(projects.created_at) to dates, subtracting them and then dividing by COUNT(projects) - 1. In Postgres, we can cast our timestamps with the CAST function. For example:

CAST(MIN(projects.created_at) as date)

will convert the created_at timestamp to a date.

Finally, we want this calculation to be performed once for every user. For this, we have to use the group method, which corresponds to the GROUP BY clause. We need this clause because if we didn’t have it, the calculation will be performed only once for the entire set of projects, rather than for every user.

Our query will look like this:

User.joins(:projects)
    .group('users.email')
    .having('COUNT(projects) > 1')
    .select("(CAST(MAX(projects.created_at) as date) - CAST(MIN(projects.created_at) as date))/(COUNT(projects) - 1) as avg_time_between_projects, users.email as email")

You’ll also notice that I’ve included a call to .having('COUNT(projects) > 1'). This ensures that only users who have more than one project are considered.

Where should we put this stuff in our codebase?

In the database

It can be beneficial to have this SQL reside in the DB instead of in your application layer. You can make this happen with views. A “view” is a stored query which we can query just as we would a table. As far as ActiveRecord is concerned, there is no difference between a view and a table.

The idea is that by creating a view with the user’s email and the fields we’re interested in, like average time between projects and time taken to create the first project, our application code will be cleaner because we can now say something like UserReport.where('average_time_between_projects > ?', 30.days) instead of “all the SQLs”.

To create a view, first create a migration like so:

def change
  sql = <<-SQL
    CREATE VIEW user_reports AS
    SELECT (CAST(MAX(packages.created_at) as date) - CAST(MIN(packages.created_at) as date))/(COUNT(packages)-1) as time_between_projects, COUNT(packages) as count, users.email as email FROM \"users\" INNER JOIN \"packages\" ON \"packages\".\"user_id\" = \"users\".\"id\" WHERE \"users\".\"role\" = 'client' GROUP BY users.email HAVING COUNT(packages) > 1    
  SQL
  execute(sql)
end

The SQL I’ve used was generated by calling the to_sql method on the query we ran previously. Run the migration and then create a UserReport model class which inherits from ActiveRecord::Base. You should now be able do things like UserReport.average('time_between_projects'). You might also notice that results come back faster.

There is one issue with this approach and that is that your schema file is not going to show this view. So if you want your CI to build correctly, you might have to change your schema option to generate sql instead. It’s pretty simple to do this with Rails. The other caveat is that this introduces a cost to switching to a different database backend, because the SQL we now have in our migration is Postgres-specific.

Here are some good resources on views:

  1. https://blog.pivotal.io/labs/labs/database-views-performance-rails
  2. http://blog.roberteshleman.com/2014/09/17/using-postgres-views-with-rails/
  3. The Scenic gem for views, which gets around the schema problem

In the application

If you don’t want to go the views route, you might consider organizing your queries with query objects and scopes. Have a look at this for a brief introduction to query objects.

Conclusion

With the information covered in this article, we can now begin to craft an answer for our hypothetical product manager, making the database server bear the brunt of the work. I’d now like to hear from you. Have you worked with time in ActiveRecord and your DB? If yes, how was the experience and what did you learn from it? Leave a comment below!

I’ve prepared a handy 3-page PDF cheatsheet containing the salient points of this article. If you’d prefer downloading and printing it out, fill in your email address below and I’ll send it to you.

Download a 3-page Cheatsheet for working with time in Postgres



How To Keep Your Controllers Thin with Form Objects

Typically in Rails, forms which post data to a #create or #update action are concerned with one particular resource. For example, a form to update a user’s profile. The sequence looks something like this:

  1. The form submits some data to the #create action in UsersController.
  2. The #create action, with something like User.create(params) creates the record in the User table.

This is easy to reason about. We have one form to create one user. It submits user data to a users controller, which then creates a record in the User table with a call to the User model.

Though some percentage of the forms in our app can be described this simply, most forms we find ourselves needing to build are not that simple. They might need to save multiple records, or update more than one table, and/or perform some additional processing (like sending an email). There might even be more than one context under which a record can be updated (for example a multi-step application form).

Trying to stuff a bunch of logic in your controller can turn out to be quite painful in the long run, as they become longer and harder to understand. Moreover, you might find yourself having to perform gymnastics with your views to get them to pass in parameters correctly.

For example, imagine you’re building a form for a project management app. This form creates a user like in our first example, but in addition, has to do a few other things. First, it has to create a record in the Message table. Then it has to create a record in the Task table. Finally it has to send an email to the project manager.

Some definitions and assumptions before we go forward:

  1. The Message table is used for internal messaging amongst various members of the project.
  2. The Task table is used to store and manage tasks assigned to members of the project.
  3. When a new user is created, we want to send an internal message and assign a new task to the project manager.
  4. We also want to send an email notification to the project manager when a new user is created.

Compared to the first example, it’s obvious that this form requires a flow that doesn’t quite as neatly fit in to the conventional form flow that we saw earlier. There are a few ways we can handle this:

  1. Add in the code to UsersController to accomplish the extra stuff.
  2. If your models are associated with each other (via has_many or belongs_to), build a nested form using either Rails’ built-in fields_for helper, or Ryan Bates’ nested_form gem. You’ll still have to send the email in the controller.
  3. Create a new controller and corresponding “form object” that encapsulates what you want to do.

What is a Form Object

A form object is a Plain Old Ruby Object (PORO). It takes over from the controller wherever it needs to talk to the database and other parts of your app like mailers, service objects and so on. A form object typically functions together with a dedicated controller and a view.

Why use a Form Object

Now, while there are good reasons to go with either option 1 or 2, I’m going to elaborate on option 3. There are a few benefits to using a form object in a situation like this:

  1. Your app will be easy to change.

    If your app is a decent sized web-app, you will likely have a multitude of paths and views through which data gets saved and/or retrieved from your Users table. It’s worth thinking about if UsersController or the User model should be where these paths meet, because if you’re not careful your controller can devolve into a mess of hard-to-change conditional code.

  2. Your app will be easy to reason about

    Conventionally in Rails, the simplest way to reason about a controller is to have it concerned with the seven RESTful actions and views; these actions would only interact with one model and ideally the interaction would be a one-liner like @user.update(params). The closer your controller is to this pattern, the easier your controller will be to reason about.

  3. Your view will likely be simpler to write (read: no deeply nested forms) and contain less logic.

What a Form Object looks like in practice

First things first, decide on the name of your new controller. This will give you quick feedback on if your abstractions will make things easier to reason about.

I always ask myself this, what happens when the form is submitted? In our example above, a user is created and the rest of the project team is notified. So a reasonable name might be ‘UserRegistrationsController`. You can double check by trying to apply the seven RESTful actions to this controller:

  • Can I “create” a user_registration?
  • Can I “update” a user_registration?
  • Can I “destroy” a user_registration?
  • … and so on

All seven actions might not always make sense, which is why I find it handy to define my routes like this, for example:

resource :user_registrations, only: [:create, :update, :new, :edit] 

In my Rails apps, the convention I follow is to always name the form object class FormObject, and namespace them with something context dependent. So in this case, my form object would be UserRegistration::FormObject.

So your form, which resides at user_registrations/new.html.erb, posts some data to the #create action of the UserRegistrationsController, which calls #save on UserRegistration::FormObject with the params you pass in.

Form Object parameters

To be able to specify an input in your form, you need to expose the related attribute in your form object. In our example, one of the inputs we want is the user’s name. So in our form object, we’d do:

module UserRegistration
  class FormObject
    include ActiveModel::Model
    attr_accessor :name
    ...
    ...
    def self.model_name
      ActiveModel::Name.new(self, nil, 'UserRegistration')
    end
  end
end

Because we’ve said attr_accessor :name, we can now in our view say something like <%= f.text_field :name %>.

You’ll also notice a couple of other things:

  1. We said include ActiveModel::Model. This is cool, because we can now use methods like validates and perform any validations we want. An advantage of using validations in the form object is that they are specific to the form object and won’t clutter up your model(s).

  2. We also defined the model_name class method. The Rails form builder methods (form_for and the rest) need this to be defined.

Form Object Initialize and Save

The behavior of our form object will be governed by two methods. #initialize and #save. This is because in our controller, we want to be able define our create action like so:

def create
  @form_object = UserRegistration::FormObject.new(params)
  if @form_object.save
    ... #typically we flash and redirect here
  else
    render :new
  end
end

You can see above how the form object directly replaces the model, allowing us to keep our controller clean and short.

Your initialize method would look something like:

def initialize(params)
  self.name = params.fetch(:name, '')
  ... # and so on and so forth
end

And save:

...
def save
  return false unless valid? #valid? comes from ActiveModel::Model
  User.create(name: name)
  notify_project_manager
  assign_task_to_project_manager
  true
end

private

def notify_project_manager
  ... # here we talk to the Message model
end

def assign_task_to_project_manager
  ... # here we talk to the Task model
end
...
# and so on and so forth

The important thing with the #save method is to return false and true correctly depending on if the form submission meets your criteria, so that your controller can make the right decision.

Recap

I’m a big fan of form objects, and a general rule of thumb for me is to use them whenever I feel things are getting complicated in the controller and view. They can even be used in situations where you’re not dealing directly with the database (like for example interacting with a third party API).

I encourage you, if you haven’t already, to consider how form objects might fit into your app. They will go a long way in ensuring your controllers and models are thin and keeping your code maintainable.

Have you used form objects in your Rails apps? Have they helped or hindered you? How else do you keep your controllers thin? Let me know in the comments section, I’d love to hear what you think.

If you would like to learn more about ways to make your Rails app enjoyable to work on, I’ve put together a free 4-part email course, which will walk you through the steps of refactoring a legacy app.


When is indexing in a model appropriate?

A user on Reddit says:

I’m not a DB ninja like a lot of developers as I started out more front end and moved towards full stack. I understand the point of indexing is to speed up SQL queries on large data sets, but I feel like I’m just not sure when to use them and when not to.

What is the current best practice with rails?

Uncertainty often stems from a lack of understanding, and indexes in your Rails app can, if you’re not familiar with them, be akin to those murky backwaters you rarely venture into. But it needn’t be. Indexing done right can make a huge difference to the performance and speed of your app, which more often than not, will translate into increased sales, conversions or whatever it is you measure to determine if your app’s ultimate purpose is being fulfilled.

What is an index?

An index is a database construct that makes it easier for the database to look up a given piece of information. An oft cited example, which brings the point home, is that of a phone book index. Imagine you wanted to look up one Rick Sanchez’s phone number – what would be faster? Looking through each page of the book one by one, or flipping back to the index which conveniently lists in alphabetical order the page number Rick’s phone number might be found in? Of course, the index would be way faster. And the positive performance effects multiply the more times you have to look up phone numbers.

Now, when someone moves in or out of Rick’s locality (which the phone book covers), or someone’s name or phone number changes, the index at the back will also have to be updated. This might end up costing you performance if you’re not careful.

Why use an index?

If we represented the phone book example in database terms, we’d probably have a PhoneBook table, with a name and phone_number column. Adding an index to the table on the name column, makes it easier and faster for the database to look up a phone number for a given name (say when you do something like PhoneBook.where(name: 'Morty Smith') ), especially if there are a large number of rows to look through.

So the main reason to use an index on a given column is increased speed, which will more often than not, translate into more sales, conversions or whatever it is you measure to determine if your app’s ultimate purpose is being fulfilled.

Note that although we’re just talking about indexing on one column, it is actually possible to index on multiple columns too. I’ll list out resources at the end of this article which will go more in depth into topics like this.

When to use an index

If you’re with me so far, it’s clear to you that an index on a column makes it faster for the database to look up a row or a set of rows which satisfy a given condition on that column. With that, we can come up with a rule of thumb:

You should add an index on a column if most (or many) of the queries you write for the table involve that column.

In a typical Rails application, there’s one type of column that is pretty much guaranteed to be involved in most queries for tables that it is in, and that is the foreign key column.

Let’s consider an example.

Say you have a User and a Post model, and the User has_many :posts. This of course means that the :posts table has a foreign key column user_id, and that whenever we do something like @user.posts or @post.user, we are asking the database to look up a given user_id for one or many Post records. Without an index, the database would have to look up each row in the posts table, one by one, to see if their user_id matched what was given. If there are many Post records (in other words, if Post.count returns a huge number), then this would take a lot of time. So, we would add an index to the posts table on user_id.

Right now, you’ll have to take my word for it that this is an improvement. But I’ll cover how you can verify this for yourself very soon.

So, add indexes by default to your foreign key columns. This is considered a standard Rails practice.

“Non-foreign key” columns can also be good candidates for adding indexes on. If they’re referred to a lot in your table queries, and you have a sizable number of rows in your table, then it would probably benefit you to add an index on them. Common examples I’ve come across in my experience are the date_of_birth, email, name columns in a User table. Remember to apply the rule of thumb discussed above if you’re unsure.

How to Index

You know what an index is and when to use it. Here’s where we cover how you can begin to actually use them in your Rails app.

When you’re associating two models with no prior connection

If you have two models which you want to associate, you know that you have to create a foreign key column on one of them. It turns out that in Rails 4 and beyond, you can create an index at the same time (which you would want to do in most cases), with the add_reference method.

Say you’ve created a User and Post model, and you want to set up a has_many relationship between them in your DB, you could do this in your console:

rails g migration AddUserToPosts user:references

which will generate a migration which uses the ‘add_reference` method like so:

add_reference :posts, :user, index: true

The add_reference method will not only create a user_id column on your Post table, but also add an index on user_id. Nifty. You can also pass in foreign_key: true to this method to add in a database constraint

Adding an index to an existing association

If for some reason, you don’t have access to the add_reference method, or you’ve already created a user_id column without adding an index to it, you can use the add_index method in your migration like so:

add_index :posts, :user_id

You can use the add_index method to add an index on any column in the table.

How to check if your index made a difference

So far, you’ve taken my word that indexing is a good thing which speeds up your app in ways that are relevant to business goals. But you don’t have to, because you can use ActiveRecord's #explain method to verify if adding an index made a difference to your query time or not. This will also help you determine the situations where indexing doesn’t give you much of a performance benefit.

The #explain method in ActiveRecord maps to the EXPLAIN command in SQL, and one of the things this command does is tell you how long the database thinks the query in question will take. The specifics of how the database formats this “explanation” and the units it measures “execution time” in depend on the database you’re using. Two common databases in the Rails world are Postgres and MySQL, so I’ll give you the basics on those.

To see the output of the #explain method, you simply send the #explain message to the query you’re writing, like so:

Post.where(user_id: 23).explain

If you’re on a Postgres database, and you’ve run explain on a query, you’ll want to look at the cost numbers. The higher this number is, the longer the database thinks your query is going to take. You’ll typically see a range being output for cost.

If you’re on a MySQL database, you’ll want to look at the rows number, which tells you how many rows the database thinks its going to have to go through to get you a result. The higher this number, the longer the database thinks your query is going to take.

In short, by using ‘#explain’ before and after you’ve added an index, you can see for yourself how much of a difference adding the index has made.

This article, and this one, written by Tom Ward, a Rails contributor and developer at Basecamp, are fantastic reads which go deeper into indexing and show you how you can make sense of the explain method’s output (he uses MySQL in his examples).

Tom also has this handy script, which you can use to list any foreign keys you might have missed indexing.

Advanced Topics

Indexing is a vast topic, and if you want to dig deeper, here are some starting points:

A free book on indexing

This book covers many things I didn’t cover here. Definitely worth a read.

When not to use an index

  1. Check this out, and
  2. this.

Optimizing Indexes

  1. Tips on optimizing indexes

Terms to Google

  1. Partial Indexes
  2. Redundant Indexes

As always, hit me up in the comments section if you’re stuck on anything in particular or if there’s a topic you want me to write more about. Cheers, and until next time!

All About Foreign Keys

The concept of “foreign keys” in the Rails world has quite the potential to confuse, because, depending on the context of use, the intended meaning might actually differ! In addition, you might find yourself wondering how things like “references”, “foreign keys”, “associations” and “joins” are different, and why you should use one or not use the other.

You’ve probably reached a point in your Rails life where you want to understand more about how ActiveRecord works with your database. As you’ve become familiar with the ActiveRecord API, you have built up an intuition about how things work, but you know that to build efficient and elegant web applications with Ruby and Rails, it will benefit you greatly to deepen your understanding.

Foreign Keys

A foreign key is a mechanism that you can use to associate one table to another. Typically, it takes the form of a column with the name <other_table_name>_id. Let’s say you have a User table and a Project table, and you’ve used a has_many on User, you’ll see that a statement like user.projects when translated into SQL by ActiveRecord will automatically use user_id as a foreign key. Check it out yourself and see, try:

user.projects.to_sql

in your console. You should see an SQL statement that is searching for all projects with a user_id corresponding to user‘s ID.

ActiveRecord relies on this column to figure out which projects of all in the projects table are “related” or “associated” to the given user.

You probably know that doing has_many is not enough. You will also have to add a column to the Project table with the name user_id. Your migration might look like:

add_column :projects, :user_id, :integer

Newer versions of ActiveRecord also provide the #references and #add_reference method, which take care of adding the foreign key column for you. So, instead of using the add_column method, you can use the add_reference method, like so:

add_reference :projects, :user

This will add a user_id column to the projects table. The advantage with using the reference method is that you can add an index and a database constraint on the foreign key painlessly as well.

Check out this doc and this doc for more info.

Referential Integrity

If you were wondering what I meant when I said ‘database constraint on the foreign key’, rest assured, because I’m going to cover what that means here.

It depends on who you’re talking to, but in many contexts, a foreign key in the Rails world is understood plainly as the column itself. Depending on your use case, this could be enough. Keep in mind though, that the problem with this approach is that when we only have a foreign key column, we have no way of ensuring that for a given value of user_id, a corresponding User record actually exists. This has the potential to lead to confusion down the line.

To know what I mean, fire up your console and try the following (assuming you have the same setup as above, and that we have one User in the system with ID = 1)

project = Project.create(name: 'hit song', user_id: 2)

You should be able to create the project, but when you do project.user, you’ll get back nil. AKA an orphaned project. You can solve this with a validates in your model to ensure uniqueness and/or presence of the user, but you might find yourself in a situation where you’re bypassing validation callbacks somehow (probably with a bulk operation), and yet again creating orphans.

What solves this problem, is getting the database to ensure that orphaned records are not created.

Enter the add_foreign_key method, which you’ll notice is sort of a second meaning for the term ‘foreign key’. Using this method in your migration tells the database (in the way only ActiveRecord knows how) to enforce referential integrity by adding a foreign key constraint.

This very good article from the robots at thoughtbot goes into many of the finer details of this concept, and why it’s important.

You’d use it like so:

add_column :projects, :user_id
add_foreign_key :projects, :users

This Rails doc gives some additional details.

If you’re using the add_reference method, you can pass in a boolean to indicate if a foreign key constraint is to be added, like so:

add_reference :projects, :user, foreign_key: true

The .joins method

If you came to Rails without a background in SQL (like me), you’ve probably wondered about the .joins method, how it is related to the concept of associations and foreign keys, and when you should use it.

The rule of thumb with the .joins method is that you need to use it when you need to query more than one table at the same time.

In other words, if the query you’re writing for a given table/model needs to reference attributes in another table (or more), you will need to invoke .joins to make this happen. This holds true regardless of if the tables are associated to each other with a foreign key column or constraint.

So for example, if we want to query the projects table for all projects that are owned by a user of role ‘manager’ (assuming we have a ‘role’ column on the users table), we’d do:

Project.joins(:user).where(user: { role: 'manager' })

You’ll notice that if you try this without the .joins, ActiveRecord will complain.

For some background on joins, check out this nice summary on the different types of joins you can do in SQL. You can even use the .joins method to query multiple tables that are not associated to one another if you so choose. I encourage you to fire up the console and use the .to_sql method for your queries whenever you’re stuck to get a better handle on what ActiveRecord is doing behind the scenes.

Hope this helped, and as always, please leave a comment below if you’re stuck on something that you need help with or if something above was unclear in any way. I’d love to help out!

Using scopes with has_many and belongs_to

Implementing scopes on your associations can cause much confusion and frustration; especially when you see hard to interpret SQL-y errors being returned, (you’ve probably seen error messages which start with PG::UndefinedTable: ERROR: missing FROM-clause entry for table..., right?) and have no idea how to go about fixing them.

When you have a belongs_to or a has_one/has_many (with or without :through) association defined on your model, you will in many situations benefit from defining scopes in your model that reference your association. Defining a custom scope which references attributes in your association table(s) has the potential to DRY up your code and make code in other parts of your application easier to understand.

Defining scopes directly on the association

ActiveRecord's has_many and belongs_to methods give you the ability to define scopes directly on the association. This way, you can customize the SQL that is generated when you access the association.

Say you have a User model which has_many posts, and when you do @user.posts, you only want to return the posts which were created in the last year. Your User model would look like:

class User < ActiveRecord::Base
  has_many :posts, -> { where('created_at > ?', Time.current - 1.year) }
end

Because you’ve defined the scope this way, when you do @user.posts, the generated query will include the where condition above so that any posts returned will have been created within the past year.

You can also pass in an argument to this scope, which allows you to further customize the generated query. Say instead of defaulting to 1 year you wanted to pass in the timeframe, you could do something like:

class User < ActiveRecord::Base
  has_many :posts, ->(timeframe) { where('created_at > ?', timeframe }
end

One gotcha to keep in mind, especially when using columns like created_at which are common to most tables, is that you might have to specify the table name explicitly in your scope if your DB complains.

So the above would look like:

class User < ActiveRecord::Base
  has_many :posts, ->(timeframe) { where('posts.created_at > ?', timeframe) }
end

Referencing attributes of the association table in your scope

You might also sometimes benefit from defining scopes in the parent model which reference attributes of your association. Adding to the example above, let’s say you wanted to define a scope on the User model which returns all users who have pending posts (posts which haven’t been “approved” yet).

class User < ActiveRecord::Base
  has_many :posts
  scope :with_pending_posts, -> { joins(:posts).where('posts.pending = true') }
end

Couple of things to note from the scope definition above:

  1. You have to use the joins method in your scope so that your DB knows which table you’re referring to.
  2. To reference association attributes in the where method, you have to use the table name of the association followed by the column name (posts.pending in this case) to prevent your DB from complaining, and to have it return the expected result.

A corollary from the points above is that you can technically reference attributes of any table in your scope, even if they are not explicitly defined as an association in your model. You’d of course have to set up your joins correctly – refer to the ActiveRecord documentation for more info, and if you’re still stuck, ping me in the comments below!

Update: Kevin pointed out in the comments section that instead of doing:

scope :with_pending_posts, -> { joins(:posts).where('posts.pending = true') }

you could do:

scope :with_pending_posts, -> { joins(:posts).merge(Post.pending) }

This is definitely worth a consideration as it makes your code cleaner and easier to read. Not only that, you will probably be able to make use of the scope you define on your Post model in other areas of your system.

If you haven’t seen the merge method before, take a look at the following resources to get acquainted and some ideas on how it could be useful:

  1. ActiveRecord documentation on merge
  2. A useful article via gorails.com on using merge with scopes
  3. An article I wrote on 4 ways to filter has_many associations, one of which includes using the merge method

Troubleshooting error messages

When the scope you’re building is complex, you’ll probably run into errors, especially SQL errors. It will benefit you greatly to learn how to read these errors and understand what your DB is complaining about. Typical complaints include tables which are not defined and columns which don’t exist because the query is looking for them in the wrong table.

Remember that a custom scope is the same as defining a class method. So if you’re stuck, ask yourself how you would implement the scope as a class method.

Keep in mind that if you want your scope to do calculations based on association attributes, you might not be able to use Ruby, and will have to brush up on your SQL (more importantly, SQL that works with your DB of choice) to accomplish what you need.

For example, check out the scope below, where for an Availability model, we’re returning records whose start_time is greater than an associated Venue's notice_time, which in this case happens to be stored as an integer. As you’ll notice, Now() and 1 hour::interval are Postgres specific SQL.

class Availablity < ActiveRecord::Base
  has_one :venue
  scope :after_notice_time, -> { joins(:venue).where("start_time >= now() + venues.notice_time * '1 hour'::interval") }
end

Check out this Reddit thread for more details on the above example.

Also, if you haven’t seen this before, this Railscast is an amazing resource for getting deeper into association scopes.

As always, let me know in the comments how I can help you – cheers!

I love sharing tips, tricks and insights to make the craft and business of web development with Ruby and Rails faster, easier and more enjoyable.

Sign up to receive an update when I post a new article.

I also like sharing helpful and thought provoking articles by other smart people. Check out my most recent newsletter to see an example

Should I create a model if it will only have 1 row?

A post on Reddit says:

I’m not sure how best to do this. I would just set up a model, but that seems wrong to have a model with just 1 row in it (it’s a single charity). Is there a better way to do this? STI? something else?

It can seem like overkill to create a database table and a corresponding ActiveRecord model if you know for a fact that your table will only have one row. This is a pretty common situation, and there are a couple of ways to deal with this.

Note: Pat Maddox’s amazing comment on Reddit, which addresses this question, was the primary inspiration for this post.

First, if non-developers need the ability to update information that this “model” would contain, then yes, you should create an ActiveRecord model. This will of course give you all the benefits of ActiveRecord as well, like being able to use has_many, belongs_to and everything else.

If only developers are going to be concerned with and using this model, you don’t have to tie it to the database and can implement it as a “Plain Old Ruby Object” aka a “PORO”.

The advantage with PORO’s is that they are cheap to instantiate because you won’t be making a call to the database when you do so. Setting up association-like behavior can be as simple as defining either instance or class methods on the PORO. As Pat advises in his comments however, you should always ask yourself what benefit you’d get from implementing association-like behavior in your PORO to ensure that you’re doing it for the right reasons, and to keep your code easy to understand.

So, going with the example in the Reddit post above, where we have a Charity model (this is the model that will only have one “row” and the one we will be writing a PORO for) which needs to define a has_many type association with the Donation model, you’d have something that looks like:

class Charity #this is our PORO
  def name
    'The Human Fund'
  end

  def donations
    Donation.all
  end
end

class Donation < ActiveRecord::Base #this corresponds to a table in the DB
  validates :amount, presence: true

  def charity
    Charity.new
  end
end

As you can see, we’re manually defining the methods (#donations and #charity) that we’d get automatically if we used has_many. This allows us to get around the fact that a Charity object does not correspond to a table in the database, and lets other parts of our code use Charity objects as they would any other ActiveRecord object.

For more association type goodness on POROs, also check out the activemodel-associations gem.

So if you decide to go with a PORO, you next question is probably: Where should this file reside?

There are a couple of common options – in the models directory with the rest of the models in /app/models, or in the /lib directory. Pat Maddox recommends the models directory, and I concur, because if you’re going to treat it like a model in other parts of your code, it’s natural that other developers (or you in the future) will look for the class in the models directory. That being said, this is just a convention, and either way works.

As always, let me know in the comments below if you have a question that is not covered by this article (maybe polymorphic associations?), and I’ll get back to you ASAP.

4 ways to filter has_many associations

You have a model that has_many associations, and you’d like to get a collection of these models with their associations that satisfies a list of conditions both on the model and the association records. You’ve tried a few things, but whatever you’ve tried only “sort of” works (meaning, not really) and leaves you feeling like you need a push in the right direction.

Filtering has_many associations can be one of those problems which makes you scratch your head and want to do some extra reading to brush up on your SQL and ActiveRecord knowledge.

For example, let’s say you have User and Project models in your system, and you want to write a query that retrieves a collection of users with projects that were created in a given date range.

Depending on what you want your query to return, here are 4 different ways to approach this problem:

  1. The simplest thing you can do is to combine the joins and where methods that ActiveRecord gives you, like so:

    User.joins(:projects).where(projects: { zipcode: 30332 })
    

    This technique is a good choice when you need to filter your records by one or more attributes. This will give you back a collection of User records which all have projects with a zipcode of 30332.

    One gotcha to keep in mind is that since you’re doing a INNER JOIN with the joins method, this can return duplicates. Read on to point 2 to see how you can get around this.

  2. More often than not, you might have scopes defined on your association models which you want to re-use. So when you want to use these scopes to do your filtering, get acquainted with ActiveRecord's merge function. According to the ActiveRecord documentation, merge returns an array representing the intersection of the ActiveRecord::Relation that you call merge on, and the ActiveRecord::Relation that you pass in.

    So, for example, if you have a scope in your Project model called opened_recently, which returns all projects which have been created within the last 10 days, you can do something like:

    User.joins(:projects).merge(Project.opened_recently)
    

    This will return to you a list of User objects which all satisfy the condition of having projects which were opened recently.

    Like I said before, one thing to keep in mind when you’re using joins for a has_many association is that since you are doing an INNER JOIN, it is possible that you will get back duplicate User (in our example) objects – basically one object for every ‘match’. It’s easy enough to get around this with a call to the uniq method.

    User.joins(:projects).merge(Project.opened_recently).uniq
    

    The uniq method works by changing your query from something like SELECT `users` from ... to something like SELECT DISTINCT `users` from...

  3. If you’re going to be needing to retrieve information from the records you filter, especially information stored in the associations of these records, another option to consider is eager loading and the includes method. As you probably know, the benefit of eager loading is that you get to avoid the N + 1 queries problem.

    It’s pretty straightforward to filter by attributes or scopes on the association:

    User.includes(:projects).where(projects: { zipcode: '30332' })
    

    This will give us all users who have projects in Atlanta, and also eager load their projects.

    If in your .where you want you use an SQL fragment, then you need to call the references method:

    User.includes(:projects).where('projects.deleted_at IS NOT NULL').references(:projects)
    

    You can also use the merge method in conjunction with includes and references to make use of any scopes you might have defined in your association models, like so:

    User.includes(:projects).merge(Project.opened_recently).references(:projects)
    

    You won’t need to use uniq when you use includes because ActiveRecord will take care of it for you automatically.

  4. Now, what should you do if you want your collection to only have those associations which meet your filter criteria? This situation arose for me when I had to write an API endpoint which let app users pass in parameters to filter associations. There are a couple of ways you can approach this. In my case, since I had to return an object that would eventually be converted to JSON, the simplest way was to manually construct a hash which contained the record and the filtered associations.

So I did something like this in my controller:

def show 
  user_attributes.merge(projects: filtered_projects) 
end


def user_attributes 
  # returns a hash containing attributes for the given user 
  # something like this: # { id: 1, first_name: 'Rick', last_name: 'Sanchez' } 
end

def filtered_projects 
  # here's where I used any Project related params to filter down my projects 
  # something like Project.opened_after(params[:project_date]).as_json 
  # I use as_json because I need this method to return a hash 
end

Note that the merge method used here is the one for Hash, and not the ActiveRecord method. You should be able to extend this to return multiple users as well.

Another way to do this is by defining a scoped has_many association. If you’re interested in seeing how this might be done, ping me in the comment section below and I’ll do a post on this topic specifically.

Hope this was helpful. As always, if you have a question about this topic that isn’t specifically addressed above, leave a comment below, and I’ll do my best to help out!

I love sharing tips, tricks and insights to make the craft and business of web development with Ruby and Rails faster, easier and more enjoyable.

Sign up to receive an update when I post a new article.

I also like sharing helpful and thought provoking articles by other smart people. Check out my most recent newsletter to see an example

How to use has_many :through with additional attributes on the join table

A common area where developers can get hung up is when using has_many :through. has_many :through essentially allows you to link two models together with a “join table”. Though it can be pretty simple to work with, things can get confusing when you find yourself having to pass and store additional attributes in this join table.

“I have read every post on the internet pertaining to has_many through with additional attributes on the join table and I am still not getting it”

“how do you store extra info in the join model and extract that out later?”

If you followed along my previous post, at this point you should have a good idea of how to write your models, controllers and views such that you can create, display and edit records which use a has_many :through association. If you’re still stuck on that, leave a comment below and I’ll try to sort you out. Make sure to check out Ryan Bates screencast on the basics, and also checkout Rahoul’s article on when you should has_many in the first place.

So, what do you do when you want to store/access additional attributes in the join table?

You already know how to set up your view form so that it passes in parameters to your controller and correctly creates a new record in the join table. If you’re using checkboxes in your view, the checkbox code probably looks like this:

<% Group.all.each do |group| %>
  <%= check_box_tag "user[group_ids][]", group.id, @user.group_ids.include?(group.id) %>
  <%= group.name %>
  <br />
<% end %>

…assuming that you have a User and Group models and a UserGroup which functions as the join.

And your controller #create action should be very similar to this:

def create
  User.create!(user_params)
end

def user_params
  params.require(:user).permit(group_ids: [])
end

As you can see, we don’t have to do anything special in the controller because group_ids will be accepted as a parameter (it is part of the dynamic programming which happens when you call has_many). You do have to correctly permit the group_ids param though.

Now, let’s say for a given user you want to specify if they are an admin of the group or not, via checkbox.

I’m going to assume that we will have an edit page for each group record, and on this page, we will see all the users that belong to this group along with a checkbox next to each user indicating if they are an admin or not.

The View

In my group edit page, I want to show the users that are in the group and next to each user show a checkbox allowing me to select if the user is going to be an admin.

<%= form_for @group do |f| %>
  <%= @group.inspect %>
  <br />
  <% if @group.user_groups.present? %>
    <%= f.fields_for :user_groups do |ugf| %>
      <% user = ugf.object.user %>
      <%= user.name %>
      Admin?
      <%= ugf.check_box :admin %>
      <br />
    <% end %>
  <% else %>
    No Users in this group yet
  <% end %>
  <%= f.submit 'Update group' %>
<% end %>

Couple of things to note here:

1) I’m using the user_groups association. By using fields_for on this association, I can treat it like any other association of group and build a custom form for it.

2) When submitting this form, the user_groups parameters will be passed in under the user_groups_attributes key in the params hash.

The Model

To be able to pass in a hash with user_groups_attributes to the Group model and call update or save on it, we need to use the accepts_nested_attributes_for method. This method tells Rails and ActiveRecord how to correctly deal with user_groups_attributes being in the params hash.

Our Group would look like this:

class Group < ActiveRecord::Base
  has_many :user_groups
  has_many :users, through: :groups
  accepts_nested_attributes_for :user_groups
end

So the parameter we pass in to accepts_nested_attributes_for is the model/association we want to accept nested attributes for.

The Controller

Because of the setup we did above in the Group model, we can now use the usual update method with the params that we get from the view/form.

def update
  @group = Group.find(params[:id])
  @group.update(group_params)
end

def group_params
  params.require(:group).permit(user_groups_attributes: [:admin, :id])
end

If you’re using Rails 4 and strong_parameters, you will have to make sure you permit the correct parameters.

And that’s it! You should now be able to update this admin attribute on the UserGroup join table. You can follow the same approach for different types of data as well (like a text field for example). I encourage you to look deeper into what the params hash looks like once it gets to the controller so that you get more comfortable with it. Play with this idea in Rails console as well to increase your confidence.

I’ve posted an example app on github – check it out if you need more info about how exactly to get this to work. If you’re still stuck, or are dealing with a use case which is not similar to this, post in the comments section below and I’ll try to help.

How a has_many :through association works in practice

When you’re beginning to work with Rails and ActiveRecord, it can be tough to wrap your head around the concept of has_many :through associations, and more importantly how they can be used in your Rails app.

“lets say i have group, member, memberships…how do i then actually USE this in my models, controllers and views?”

“I’m using a has_many :through relationship which appears to be working. I now need a form that I can render to have users join or leave the group.”

First off, you should be reasonably confident that a has_many :through association is a good choice for your app design. Assuming you are, here’s how you can put it into practice.

Let’s say you’re building an app with Users and Groups, and you want to build a form where a given user can join or leave any group. You’ve chosen to associate your users and groups through the table Membership.

Your models are structured like so:

class User < ActiveRecord::Base
  has_many :memberships
  has_many :groups, through: :memberships
end

class Group < ActiveRecord::Base
  has_many :memberships
  has_many :users, through: :memberships
end

class Membership < ActiveRecord::Base
  belongs_to :user
  belongs_to :group
end

I always find it helpful before I start writing my controllers or views, to think through what I want to accomplish, and use the Rails console to ensure I’m on the right path. In other words, I try to come up a with an end-goal, and then work backwards with the help of the Rails console to figure out my solution.

So, given that we want to render a form that allows a user to join or leave a group, we should ask ourselves, “what does it mean to have a user join a group?”

As far as the database is concerned, it essentially means that the Membership table will have a row with the user_id and group_id columns corresponding to our given user and group. Our task then is to orchestrate things such when the user clicks ‘submit’ on their form, the Membership table ends up with a new row (or rows) with the correct user_id and group_id values.

In the form/view that we render for the user, we need a way whereby the user can select which groups they want to join. And in the controller, we want to extract the choices that the user made in the form and use those choices to create new rows in the Membership table.

First, imagine you had a user_id and a group_id, how would you go about creating that new row in the Membership table? ActiveRecord makes it easy. What we want to do is, for a given User record, add a ‘membership’ to it, so that when we say @user.groups, we get back a list of groups which includes the group we just added. If you do this:

#assuming @user is your User record
@user.groups << Group.find(group_id)

#if you have a list of group_ids(something like [1, 2]), you can also do:
@user.group_ids << group_ids

This piece of code will automagically create a new row in the Membership table with the right group_id and user_id. And now whenever you use @user.groups, you will get back a list of groups that you added. For added confidence, try the above in your Rails console.

Check out the Rails guide below for more details on what methods ActiveRecord provides you when you use has_many :through.

I’m going to assume you have some type of authentication set up in your app, which gives you access to the current_user method. We will need this method to get our user_id, via a simple current_user.id call. Once you have your user (with something like @user = User.find(current_user.id)), you can use the code above to add groups.

The next question is, how do you write your view such that it passes the group_id values to the controller?

How you write your view will depend wholly on the flow you expect your users to go through and other factors in your app. Let’s say you decide to provide your user with a list of checkboxes listing the groups they can join or leave. Here’s a simple way to accomplish that:

<% Group.all.each do |group| %>
  <%= check_box_tag "user[group_ids][]", group.id, @user.group_ids.include?(group.id) %>
  <%= group.name %>
  <br />
<% end %>

The usage of check_box_tag probably needs some explanation.

The first parameter "user[group_ids][]" tells Rails to group all the checkboxes with this ID together and pass it in to the controller as an Array. What this means is in your controller, you can do params[:user][:group_ids] and get a nice Ruby Array of all the group_ids that the user chose.

The last parameter just ensures that any Group which the user currently belongs to is checked when the form is rendered.

The above should get you going with using has_many :through in your Rails app, but depending on what your app does I’m pretty sure you will run into issues which are not covered in this post. So if you’re still stuck, check out the following:

1) Check out this amazing Rails cast which takes you through the entire gamut of using has_many :through. If you haven’t already, I’d highly recommend you get yourself an account on RailsCasts.

2) Rails ActiveRecord Guide on has_many :through

If you need more help, hit reply in the comment section below, and I’ll do my best to get you going.

Retrieving Random Records in Rails

What is the best way to retrieve one or more random records in Rails?

Given that there are multiple ways to retrieve random records in Rails and Ruby (take a look at this SO post), it can be confusing and frustrating to decide which course of action to take.

“Is there a technique to retrieve random records that is correct/general/Rails way/cross-database compatible? “

In this post, I’ll cover a couple of simple approaches that you can take to figure out what the best way might be for you.

The first priority that you likely have to keep in mind is speed. Most applications have a threshold for what is considered slow – in web apps for example, anything that takes longer than a few hundred milliseconds to load is considered slow. So you don’t want to be choosing a method that will have your users waiting around forever for your app to do something. That being said, there is also no point in sacrificing code readability, ease of re-use and just time in general, to get faster than you have to.

The second thing you have to consider is the size (number of rows) of the database table you are looking to retrieve a random record from. In general, the bigger your table, the more knowledgable you’ll have to be about your database platform (typically Postgres or MySQL) to get to your desired speed.

Take this common solution to retrieve a random record in Rails:  

random_user = User.all.sample

To use the code above requires no internal knowledge of the database you’re running on. The #all method is a well-known ActiveRecord method and #sample is a Ruby method on Array that retrieves a random value from a given array.

The flip side of this approach is that if your database has a sizable number of rows, it will take a long time to execute. Try it out on your Rails console and see! The reason it does so is because User.all retrieves all the users from the database, and then builds ActiveRecord objects from them. So if you have let’s say 10,000 users, building 10,000 ActiveRecord objects takes a while.  If you’re reasonably certain that the table in question will not grow much in size over the lifetime of your app, and the speed at which the above code executes is agreeable to you, then by all means go for it.

Now, if your table is HUGE (has a million+ rows for example), the above method is not going to work for you. Consider using the ActiveRecord #offset and the Ruby rand methods. The #offset method in ActiveRecord specifies how many rows to skip before returning the rest of the rows. The rand method, as expected, generates a random number less than a maximum value – so when you say rand(10), it will generate a random number less than 10.

So something like this:

count = User.count
random_offset = rand(count)
random_user = User.offset(random_offset).first`

…works better (faster) for big tables, because you’re not making the database scan through every single row in the table.

Resources: