How to keep your data consistent with foreign key constraints

Datacenter

We all have that co-worker (and have been that co-worker) who SSHs into a server and runs SQL statements against live data. On staging servers this can be a minor issue if things go wrong, but in production it can be disastrous. At WillCall we often have to spend time putting our staging data into different configurations for testing purposes and have felt the pain of inconsistent data more than once.

Today we’re going to talk about data consistency, why keeping your data consistent is always a challenge, and what you can do about it.

Integrity issues can creep in unintentionally if you do not have a deep understanding of how ActiveRecord callbacks and validations work and which methods skip them entirely!

For example if you have a Post model and each Post had many Comments, when you call post.destroy with the dependent: :destroy option, it will destroy the associated comments as well.

class Post < ActiveRecord::Base
  has_many :comments, dependent: :destroy
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

This keeps your data consistent. You have no orphaned comments floating around waiting to explode when you call comment.post.title.

However if someone were to mistakenly use post.delete instead of post.destroy the callbacks that destroy the associated comments would never be run!

The Rails guides have a list of dangerous methods that skip callbacks for various reasons and should probably be avoided by junior most developers. They include:

  • decrement
  • decrement_counter
  • delete
  • delete_all
  • increment
  • increment_counter
  • toggle
  • touch
  • update_column
  • update_columns
  • update_all
  • update_counters

Solution: Foreign key constraints.

Foreign key constraints are the database’s solution to this data integrity problem. They are basically a way to tell the database to only allow actual Post IDs in the comments post_id column. This means if you tried to add a comment to a non-existent or deleted post, or delete a post that has comments, it would raise an ActiveRecord::InvalidForeignKey exception.

Rails does not have support for foreign key constraints because DHH is not a fan of them, but there are several gems that will add this ability for you. Foreigner is the most popular, but I have been having good results with the more full featured SchemaPlus. I’ll show how to use both:

Foreigner

Foreigner is a simple gem that just adds methods to create foreign keys in mysql, postgres, and sqlite. To install Foreigner, add the following to your Gemfile:

gem 'foreigner', '~> 1.6.1'

And then install with $ bundle install.

Now you can use the foreign_key method in your migrations and the foreign key constraint will be added for you.

create_table :comments do |t|
  t.string :body
  t.foreign_key :posts
end

And that’s it! You can also add null: false to ensure that the post_id is not nil.

SchemaPlus

SchemaPlus is another option. Although being slightly less popular it adds some conveniences that foreigner does not like column default expressions which I might talk about in another post. Another benefit is that it will automatically add foreign key constraints to all t.references and t.belongs_to methods in your migrations for you. To install schema_plus add the following to your Gemfile:

gem 'schema_plus', '~> 1.4.1'

And then install with $ bundle install. And you’re done! When you start your next project this is definitely something I would try playing with. There are also many options you can pass that customize your foreign key constraints described here.

Testing Integrity

Finally adding a few tests is always a good thing, so if you want to test that you cannot accidentally violate your data integrity you could do something like this in rspec.

require 'spec_helper'

describe 'Post Integrity' do
  it 'should not allow posts to be destroyed if they have comments' do
    post = create(:post)
    create(:comment, post: post)
    expect { post.destroy! }.to raise_error(ActiveRecord::DeleteRestrictionError)
  end

  it 'should not allow posts to be deleted if they have comments' do
    post = create(:post)
    create(:comment, post: post)
    expect { post.delete }.to raise_error(ActiveRecord::InvalidForeignKey)
  end
end

How to set up a Rails 4.1 app on AWS with Elastic Beanstalk and PostgreSQL

AWS Elastic Beanstalk

Update: This post is now a bit outdated. There is a NEW VERSION of this post that has updated instructions for the new Elastic Beanstalk and Rails 4.2.

Managing ops on a small team can be a real pain. At WillCall our infrastructure is pretty simple but we still have to spin up and down lots of servers every day to handle the varying traffic across our services. Three years ago when we had just started the company we were doing this manually with chef, but it could be error-prone and started to feel like a full-time job.

For simple projects, tools like heroku can be perfect. Great for prototyping apps and testing out different ideas without a lot of hassle. But when your project gets more complicated and you want to have more control of your servers, load balancers, workers, auto-scaling conditions, etc, you will feel this pain as well.

There are many services that provide the ability to get a Rails app up and running quickly while still keeping full control of your infrastructure, and my current favorite is Amazon’s Elastic Beanstalk. The service is aptly described by Amazon as follows:

AWS Elastic Beanstalk makes it even easier for developers to quickly deploy and manage applications in the AWS cloud. Developers simply upload their application, and Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring.

Now that Amazon supports PostgreSQL via RDS having a fully-managed postgres-backed Rails app has never been easier!

You can find all of the code for this post at github.com/jtescher/example-rails-elasticbeanstalk-blog

If you get stuck or have other issues the documentation for Elastic Beanstalk is pretty good.

Getting Started:

Sign up for an AWS account via the instructions at console.aws.amazon.com/elasticbeanstalk and then download the Elastic Beanstalk Command Line Tools via Homebrew (or here for PC)

$ brew update
$ brew install aws-elasticbeanstalk

Create the app:

The most current version of rails at the time of this writing is 4.1.0.rc1 so that’s what we will use now.

$ gem install rails -v 4.1.0.rc1
$ rails new blog --database=postgresql
$ cd blog
$ git init && git add -A && git commit -m "Initial commit"

Initialize beanstalk environment

$ eb init

I would choose the following settings, but for a description of each option see the AWS example here

Enter your AWS Access Key ID (current value is "AKIAIOSEXAMPLE"): 

Enter your AWS Secret Access Key (current value is "PfiCYEXAMPLE"): 

Select an AWS Elastic Beanstalk service region.
2) US West (Oregon)

Enter an AWS Elastic Beanstalk application name 
(auto-generated value is "blog"): 

Enter an AWS Elastic Beanstalk environment name 
(current value is "blog-env"): 

Select an environment tier.
1) WebServer::Standard::1.0

Select a solution stack.
44) 64bit Amazon Linux 2013.09 running Ruby 1.9.3

Select an environment type.
1) LoadBalanced
 
Create an RDS DB Instance? [y/n]: y

Create an RDS BD Instance from (current value is "[No snapshot]"): 1

Enter an RDS DB master password: 
Retype password to confirm: 

If you terminate your environment, your RDS DB Instance will be deleted and you will lose your data.
Create snapshot? [y/n]: y

Attach an instance profile (current value is "[Create a default instance profile]"): 1

Then create an optionsettings file in the newly created .elasticbeanstalk directory called optionsettings.blog-env (or your custom env name as the extension if you customized it)

# .elasticbeanstalk/optionsettings.blog-env
[aws:autoscaling:asg]
Custom Availability Zones=
MaxSize=4
MinSize=1

[aws:autoscaling:launchconfiguration]
EC2KeyName=
InstanceType=t1.micro

[aws:autoscaling:updatepolicy:rollingupdate]
RollingUpdateEnabled=false

[aws:ec2:vpc]
ELBScheme=public
ELBSubnets=
Subnets=
VPCId=

[aws:elasticbeanstalk:application]
Application Healthcheck URL=

[aws:elasticbeanstalk:application:environment]
BUNDLE_WITHOUT=test:development
RAILS_SKIP_ASSET_COMPILATION=false
RAILS_SKIP_MIGRATIONS=false

[aws:elasticbeanstalk:hostmanager]
LogPublicationControl=false

[aws:elasticbeanstalk:monitoring]
Automatically Terminate Unhealthy Instances=true

[aws:elasticbeanstalk:sns:topics]
Notification Endpoint=
Notification Protocol=email

[aws:rds:dbinstance]
DBDeletionPolicy=Snapshot
DBEngine=postgres
DBInstanceClass=db.t1.micro
DBSnapshotIdentifier=
DBUser=ebroot

Then you can test that everything worked with:

$ eb status
Environment "blog-env" is not running.

Starting the app

To start the environment run: (we will start the app without the rails app first to make sure it’s all set up properly)

$ eb start
Would you like to deploy the latest Git commit to your environment? [y/n]: n
Waiting for environment "blog-env" to launch.         
INFO	createEnvironment is starting.                          
INFO	Using elasticbeanstalk-us-west-2-whatever as Amazon S3 s...
INFO	Created security group named: sg-aebgroupid             
INFO	Created load balancer named: awseb-e-m-AWSEBLoa-xloadid ...
INFO	Created security group named: sg-cebgroupid             
INFO	Created Auto Scaling launch configuration named: awseb-e...
INFO	Creating RDS database named: apdbnamez. This may take a ...
INFO	Created RDS database named: apdbnamez                   
INFO	Waiting for EC2 instances to launch. This may take a few...
INFO	Created Auto Scaling group named: awseb-e-dse-stack-AWSE...
INFO	Created Auto Scaling group policy named: arn:aws:autosca...
INFO	Created Auto Scaling group policy named: arn:aws:autosca...
INFO	Created CloudWatch alarm named: awseb-e-dv-stack-AWSEBCl...
INFO	Created CloudWatch alarm named: awseb-e-dv-stack-AWSEBCl...

The environment should now be running. To see the status and url:

$ eb status --verbose
Retrieving status of environment "blog-env".
URL		: blog-env-hkdpsewdfsd.elasticbeanstalk.com
Status		: Ready
Health		: Green
Environment Name: blog-env
Environment ID	: e-jlksjdf8d8s
Environment Tier: WebServer::Standard::1.0
Solution Stack	: 64bit Amazon Linux 2013.09 running Ruby 1.9.3
Version Label	: Sample Application
Date Created	: 2014-01-20 18:45:40
Date Updated	: 2014-01-20 19:03:13
Description	: 

RDS Database: AWSEBRDSDatabase | sdfjlsdkflsk.sdfkjlsdkjf.us...
Database Engine:	postgres 9.3.1
Allocated Storage:	10
Instance Class:		db.t1.micro
Multi AZ:		False
Master Username:	ebroot
Creation Time:		2014-01-20 18:53:06
DB Instance Status:	available

If you open the URL that is returned by this command ( should be URL: your-app-env.elasticbeanstalk.com ) you should see the Elastic Beanstalk welcome page:

Elastic Beanstalk Welcome Page

Getting the pg gem to work on elasticbeanstalk

The postgresql-devel yum package is required to install the pg gem on amazon linux and configuring packages on Elastic Beanstalk instances is as simple as dropping a YAML formatted .config file in a top level .ebextensions folder.

# .ebextensions/packages.config
packages:
  yum:
    postgresql-devel: []

Configuring database.yml to work with RDS postgres

Database credentials should never be hard coded and Elastic Beanstalk makes managing environment variables quite simple. RDS variables are configured and updated automatically so your production section of database.yml can be updated to be the following:

production:
  <<: *default
  database: <%= ENV['RDS_DB_NAME'] %>
  username: <%= ENV['RDS_USERNAME'] %>
  password: <%= ENV['RDS_PASSWORD'] %>
  host: <%= ENV['RDS_HOSTNAME'] %>
  port: <%= ENV['RDS_PORT'] %>

Configuring secrets.yml

By default Rails now includes a secrets.yml file and your production keys and settings should be set there as environment variables. The default environment only requires a SECRET_KEY_BASE variable to be defined, so let’s add that to your environment. First run:

$ bundle exec rake secret
5387fca6a0a69d945bb57e8be5bce3a119292216dfc95b270275c518641aa3e8db14a5ead261b1d50332a92ed860c544ef24356100ccb391507b1c5b8d46f39b

Then go to console.aws.amazon.com/elasticbeanstalk/home and in blog-env on the left under configuration click on the cog next to ‘Software Configuration’. At the bottom, enter SECRET_KEY_BASE on the left and the output of rake secret on the right.

After you save, it should look like this:

Elastic Beanstalk Environment Variable Config

Adding some scaffolding

To finish up let’s add a post resource so we can start blogging:

$ rails generate scaffold post title:string body:text
$ bundle exec rake db:migrate

Deploying the app to AWS

Commit all the changes we just made:

$ git add -A && git commit -am "Add post resource, configure database.yml for production"

Then deploy with:

$ git aws.push

If you visit https://console.aws.amazon.com/elasticbeanstalk/home you should see the progress of your app as it deploys.

Once that finishes you have your functional app running Rails and postgres! Go to your-app-env.elasticbeanstalk.com/posts and you’re done!:

Done: List of posts