
Everyday performance rules for Ruby on Rails developers

Delve into best practices for HTTP, Ruby on Rails, and databases, and discover when to adhere to the rules or when breaking them could set your code apart.

This post takes inspiration from Visual design rules you can safely follow every time by Anthony Hobday.

Here, we’re talking about performance rules you can safely follow every time. As with any rules, you are free to break them, but you’ll need a good reason to do so.

We are covering some best practices for HTTP, Ruby, and the database layers so that most applications with decent traffic can improve their response time. Some of these practices can be obvious to experienced developers. Your applications are on good Rails if you already know and use them all.

HTTP

Use a CDN

Serve all resources from a CDN. It will reduce latency for your visitors and the number of requests to your server. What’s more, CDNs provide more bandwidth than your servers.

CDNs aren’t expensive, and their pricing scales with usage. They are also simple to set up:

config.action_controller.asset_host = "cdn.application.example"

We can’t think of any good reason to do without them except for an application running solely on a private network.

Enable HTTP compression

Compression saves bandwidth at a modest CPU cost. Most web servers, like Apache and Nginx, enable compression by default. To be sure, verify the presence of Content-Encoding: gzip in the response headers.
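
To check from Ruby rather than from the browser’s developer tools, here is a minimal sketch with Net::HTTP (the application.example host is a placeholder): request the page with gzip allowed and inspect the response header.

require "net/http"

uri = URI("https://application.example/")
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  # Setting Accept-Encoding ourselves prevents Net::HTTP from decompressing
  # the body, so the Content-Encoding header stays visible.
  request = Net::HTTP::Get.new(uri, "Accept-Encoding" => "gzip")
  response = http.request(request)
  puts response["Content-Encoding"] # => "gzip" when compression is enabled
end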

Enable HTTP cache

A cached resource means one less request for the client and server and, therefore, a faster loading time.
It goes without saying that caching should be enabled for all resources passing through the CDN.

The Cache-Control header gives instructions to browsers and CDNs. Check that the response contains a header that looks like Cache-Control: max-age=86400, public. max-age is the duration in seconds, so 24 hours in this case. It’s up to you to decide whether you need a more aggressive cache.

For private resources, prevent shared caches such as CDNs from storing them with the Cache-Control: private header.
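
In a Rails application, you rarely have to write these headers by hand. Here is a minimal sketch (the controller name is hypothetical): public_file_server.headers covers static assets served through the CDN, and expires_in sets Cache-Control on a dynamic response.

# config/environments/production.rb
# Long-lived cache for fingerprinted assets served through the CDN
config.public_file_server.headers = { "Cache-Control" => "public, max-age=31536000" }

# In a controller action
class ReportsController < ApplicationController
  def show
    expires_in 24.hours, public: true # Cache-Control: max-age=86400, public
    # ...
  end
end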

Enable keep-alive connections

Keep-alive connections are reusable. They avoid re-establishing a TCP connection and renegotiating SSL for every request. They reduce latency for pages made up of several resources.

Web servers often activate them by default. You can verify the presence of a header such as Keep-Alive: timeout=5, max=100. In this example, the connection is closed after 5 seconds of inactivity and can be reused for up to 100 requests.

Ruby

Run in the background as much as possible

Any heavy or latency-prone task should be run in the background whenever possible. Sending emails is a case in point.
It’s a relatively long task compared with the duration of an HTTP request, and its duration is unpredictable since it depends on a network connection. On the other hand, there’s no obligation to send the email during the HTTP request, so using the deliver_later method from Rails controllers is a good habit.
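
A minimal sketch of the habit (controller and mailer names are hypothetical): the only change is calling deliver_later instead of deliver_now.

class InvoicesController < ApplicationController
  def create
    @invoice = Invoice.create!(invoice_params)
    # deliver_now would block the request; deliver_later enqueues a background job
    InvoiceMailer.with(invoice: @invoice).created.deliver_later
    redirect_to @invoice
  end
end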

As well as reducing response time, it frees up a web process or thread to handle the next request, so the application can handle a larger volume of traffic. It will also be less vulnerable to a denial-of-service attack.

Know count, size, and length to save on SQL queries

It’s important to know the differences between these three methods to ensure you’re triggering the fewest or most optimized SQL queries possible.

The count method always triggers a SELECT count(*) FROM table query. The length method loads the relation if needed and counts the records in memory. The size method adapts: it triggers a count query if the relation hasn’t been loaded, or counts in memory if it has. Here’s a summary table:

       | Records loaded             | Records not loaded
count  | SELECT count(*) FROM table | SELECT count(*) FROM table
size   | Count in memory            | SELECT count(*) FROM table
length | Count in memory            | SELECT * FROM table

When both counting and enumerating the same records, the aim is to trigger a single query: call size only once the relation has been loaded, or use length, which loads it.

# Bad: 2 queries instead of 1
users = User.all
users.size # SELECT COUNT(*) FROM "users"
users.each { } # SELECT "users".* FROM "users"

# Good
users = User.all
users.length # SELECT "users".* FROM "users"
users.each { } # No queries

# Good
users = User.all
users.each { } # SELECT "users".* FROM "users"
users.size # No queries

# Good
users = User.all.load # SELECT "users".* FROM "users"
users.size # No queries
users.each { } # No queries

# Bad: 2 queries instead of 1
users = User.all
users.each { } # SELECT "users".* FROM "users"
users.count # SELECT COUNT(*) FROM "users"

Know exists, any/empty, and present/blank to save on SQL queries

As with count, size, and length, it’s important to know the subtleties of these methods to trigger the fewest and most efficient queries possible.

The exists? method always triggers a query. It is optimized because it stops as soon as it finds a row. The present? and blank? methods load the relation if needed, then check for presence or absence in memory. Finally, the any? and empty? methods adapt to whether the relation has already been loaded. Here’s a summary table:

                  | Loaded                      | Not loaded
exists?           | SELECT 1 FROM table LIMIT 1 | SELECT 1 FROM table LIMIT 1
any? / empty?     | In memory                   | SELECT 1 FROM table LIMIT 1
present? / blank? | In memory                   | SELECT * FROM table

From this, we can deduce the good and bad patterns when conditionally displaying something based on the presence of records.

# Bad: 2 queries instead of 1
users = User.all
if users.exists? # SELECT 1 FROM users LIMIT 1
  users.each { } # SELECT * FROM users
end

# Good
users = User.all
if users.present? # SELECT * FROM users
  users.each { } # No queries
end

# Bad: 2 queries instead of 1
users = User.all
if users.any? # SELECT 1 FROM users LIMIT 1
  users.each { } # SELECT * FROM users
end

# Good
users = User.all.load # SELECT * FROM users
if users.any? # No queries
  users.each { } # No queries
end

Use pluck instead of loading ActiveRecord instances when possible

Pluck retrieves the raw result of an SQL query and thus avoids instantiating ActiveRecord objects. Since there is less work to do, it’s inevitably faster and less memory-hungry. On the other hand, you no longer benefit from the full functionality of an ActiveRecord model. So it’s a good idea to use it when you don’t need your model’s methods, typically for a CSV or text export of several thousand rows.

# Slow
CSV.generate do |csv|
  User.all.each { |user| csv << [user.id, user.name, user.email] }
end

# Fast
CSV.generate do |csv|
  User.pluck(:id, :name, :email).each { |row| csv << row }
end

When you need to retrieve a large number of records, try to find a solution with pluck.

Use symbols or frozen string literals

By default, Ruby creates a new instance for any string literal, but not for symbols.

"Ruby".object_id #=> 60
"Ruby".object_id #=> 80
"Ruby".object_id #=> 100

:ruby.object_id # => 710748
:ruby.object_id # => 710748
:ruby.object_id # => 710748

The same string was created three times, whereas the symbol was reused, so symbols are more efficient. A magic comment at the beginning of each file tells Ruby to freeze and reuse string literals:

# frozen_string_literal: true
"Ruby".object_id #=> 60
"Ruby".object_id #=> 60
"Ruby".object_id #=> 60

:ruby.object_id # => 710748
:ruby.object_id # => 710748
:ruby.object_id # => 710748

However, the string can no longer be modified, as it has been frozen. Passing a frozen string to a method you don’t control may result in a FrozenError exception. You’ll need to duplicate it explicitly.

# frozen_string_literal: true
"Ruby".concat(" on Rails") # FrozenError: can't modify frozen String
"Ruby".dup.concat(" on Rails") # => "Ruby on Rails"

Store function results in local variables if needed more than once

Even if this sounds obvious, it’s not uncommon to come across this kind of code:

# Bad
if object.expensive_compute
  puts object.expensive_compute
end

# Good
if result = object.expensive_compute
  puts result
end

Even with relatively quick methods, it’s a shame to call them repeatedly:

# Bad
puts array.first.method1
puts array.first.method2
puts array.first.method3

# Good
object = array.first
puts object.method1
puts object.method2
puts object.method3

This doesn’t make the code more complicated, and sometimes simplifies it a little.

Reuse HTTP connections

For the reasons explained in the Enable keep-alive connections section, it’s more efficient to reuse the same HTTP connection to execute multiple requests. Each time, you save the time needed to establish the connection, as well as the SSL negotiation. This is a significant saving.

# Slow, creates 5 connections
Net::HTTP.get(url)
Net::HTTP.get(url)
Net::HTTP.get(url)
Net::HTTP.get(url)
Net::HTTP.get(url)

# Fast, by re-using the same connection 5 times
Net::HTTP.start(url.host) do |http|
  http.get(url.path)
  http.get(url.path)
  http.get(url.path)
  http.get(url.path)
  http.get(url.path)
end

# Fast, but without a block
http = Net::HTTP.new(url.host, url.port)
http.start
http.get(url.path)
http.get(url.path)
http.get(url.path)
http.get(url.path)
http.get(url.path)
http.finish

The easiest way is to make all requests in a block passed to Net::HTTP.start, so you can’t forget to close the connection. If you can’t group all your requests in one block, you can always open and close the connection manually with start and finish.

Database

Tune your database settings

By default, most databases are not optimally configured for your workload and your server’s capabilities. If you manage your own database, tuning it is crucial.
If your database is managed by a third party, it’s just as important to check that this has been done properly.

Fortunately, there are tools available to help you do just that. For PostgreSQL, we recommend PGTune. For MySQL, there is MySQLTuner, but we have yet to gain experience with it.

Give it the amount of RAM and the number of CPUs, and PGTune suggests the best settings for your server. It’s effortless.
Then, you only have to copy the settings to the config file.

We like to store those settings in a dedicated file such as /etc/postgresql/16/main/conf.d/pgtune.conf. If we have to change them in the future, we just replace the whole file, which makes maintenance easier.
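
For the record, the generated file is just a list of settings. The values below are purely illustrative, for a hypothetical 4 GB RAM / 2 CPU web server; use the output PGTune produces for your own hardware.

# /etc/postgresql/16/main/conf.d/pgtune.conf
# Illustrative values only; replace with PGTune's output for your server
max_connections = 200
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 256MB
work_mem = 2MB
wal_buffers = 16MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1
min_wal_size = 1GB
max_wal_size = 4GB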

For SQLite, Rails ships with good defaults since version 7.1. Otherwise, you can set these parameters yourself:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA journal_size_limit = 67108864; -- 64 megabytes
PRAGMA mmap_size = 134217728; -- 128 megabytes
PRAGMA cache_size = 2000;
PRAGMA busy_timeout = 5000;
-- Source: Stephen Margheim
-- https://fractaledmind.github.io/2023/09/21/enhancing-rails-sqlite-performance-metrics

SQL will always be faster than your code

If a task can be performed by the database or your code, let the database take care of it. It’s faster because all the work is done where the data is. This means less bandwidth consumption and less latency. Moreover, your code is unlikely to be better than the database.

Invoice.pluck(:amount).sum # Slower
Invoice.sum(:amount) # Faster

Index all foreign keys

The odds that a foreign key appears in a where clause are very high. If it never does, the foreign key is probably useless. So creating an index whenever you add a foreign key is a no-brainer.

The disadvantage of indexes is that they slow down table writes. But very often, the number of reads far exceeds the number of writes. On the other hand, if an index is never used, it should be deleted. This information is available in PostgreSQL’s internal tables, which are a bit confusing to dig into; fortunately, tools such as PgHero make it very easy.

So, by default, add an index to each foreign key, then delete the few that are never used.
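
In a Rails migration, there is nothing extra to remember. A minimal sketch (table names are hypothetical): add_reference creates the column, the foreign key constraint, and the index in a single step.

class AddUserToInvoices < ActiveRecord::Migration[7.1]
  def change
    # Creates invoices.user_id, its foreign key constraint, and its index
    add_reference :invoices, :user, foreign_key: true, index: true
  end
end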

Exclude nulls from indexes

A database index is a B-tree structure. It is very efficient when the data has high cardinality. However, when a column allows nulls, null often becomes the most frequent value. The index is then less efficient and takes up more space. Unless null is an infrequent value, there are only disadvantages to indexing it.
Exclude nulls when creating the index with a where clause:

add_index :table, :column, where: "(column IS NOT NULL)"

Or in pure SQL:

CREATE INDEX name ON table (column) WHERE column IS NOT NULL;

Do not index columns with low cardinality, such as booleans

The reason is the same as in the previous paragraph: B-tree indexes work best when cardinality is high, so a boolean is the worst column you can index. Don’t index booleans.

As with other types, if you have very repetitive values that are not significant from a business point of view, it’s probably a good idea to exclude them from the index. We’re thinking in particular of default values:

add_column :accounts, :balance, :decimal, default: 0, null: false
add_index :accounts, :balance, where: "(balance != 0)"

Or in pure SQL:

ALTER TABLE accounts ADD COLUMN balance decimal DEFAULT 0 NOT NULL;
CREATE INDEX index_accounts_balance ON accounts (balance) WHERE balance != 0;

Conclusion

These rules are by no means exhaustive. Feel free to share your own.

RorVsWild monitors your Ruby on Rails applications.
