SQL is the best!

I gave a SQL tutorial at PyLadies Boston last night and it was pretty fun. We used sqlite3 (which is definitely my least favorite DBMS, but it does come installed on pretty much every Linux/Unix machine by default and is the default for Django so I decided it was the best tool for this particular job. Giving a tutorial on something I used daily and have used consistently for 7 years was a bit weird because I did forget a few things because it didn't even cross my mind that people wouldn't know. For example: I initially neglected to mention that every statement needs a semicolon at the end and that you can't mix quotes (no " with '). Consider that was the bulk of all the issues, I'm feeling pretty successful right now! Take a look at the full tutorial below and let me know what you think.

ActiveRecord Joins

Random thing I missed while turning all those selects into regular queries using joins: if you use the names of associations to join tables (ex. recipes.joins(:ingredients)), it creates an INNER join. Why is this important? Here’s my example:

I have recipes with ingredients, and each ingredient might have a subingredient (gluten, soy, etc). Every recipe has ingredients, so an inner join is fine there. However, every recipe does not have subingredients. Since I was doing a recipes.joins(ingredients: :subingredients), I was only getting recipes that did not have subingredients... definitely not what I wanted. Here’s the code:

Recipe.joins(ingredients: :subingredients).joins(:ingredients)

Resulting SQL
SELECT "recipes".* FROM "recipes" INNER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" INNER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" INNER JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" INNER JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"

Recipe.joins(:ingredients).joins('LEFT JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"')

Resulting SQL
SELECT "recipes".* FROM "recipes" INNER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" INNER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" LEFT JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"

UPDATE: As Darren (manvsmachine) mentions, you can also use .eager_load to create LEFT OUTER joins. However, that does give you a different query. It seems to have the same-ish results, but the SQL output is slightly different. It looks something like this:


Resulting SQL
SELECT "recipes"."id" AS t0_r0, "recipes"."name" AS t0_r1, "recipes"."instructions" AS t0_r2, ... "recipes"."image" AS t0_r14, "subingredients"."id" AS t1_r0, "subingredients"."name" AS t1_r1, "subingredients"."created_at" AS t1_r2, "subingredients"."updated_at" AS t1_r3 FROM "recipes" LEFT OUTER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" LEFT OUTER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" LEFT OUTER JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT OUTER JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"

Seriously, you should have a staging environment

 Having a good staging environment for a production app is vital. Ideally, it is almost identical to production, with the exception being test credentials if using a service like Stripe. When I came in, we had a staging environment. Ok, great! Except that we had no data in it, so we couldn’t actually test anything but the most basic interactions. It was somewhere between very difficult to impossible to test if bugs had been fixed. It was also using the production environment variable, so it was actually indistinguishable from production… but luckily, we were using test Stripe keys!

How we fixed it (in our Rails app):

  1. Copied production.rb to staging.rb and replaced   config.action_mailer.smtp_settings with the settings from Mailtrap. If you aren’t up for Mailtrap (it’s pretty rad, I would definitely recommend it), you can also use just plain old gmail.
  2. 2) If you are using the rails_12factor gem, don’t forget to add it to the staging environment in the Gemfile, like so:
    group :staging, :production do
      gem 'rails_12factor'
    If you don’t do this, you will (like me) end up with your staging server being unable to find any of your assets and getting a ton of 404s in your console. Whomp whomp.
  3. Set a cron job to regularly grab data from production. Here’s an example of how we are doing it:
    heroku maintenance:on --app STAGING_APP_NAME
    heroku pg:backups capture --app PROD_APP_NAME
    heroku pg:backups restore `heroku pg:backups public-url --app PROD_APP_NAME | cat` DB_NAME --app STAGING_APP_NAME --confirm STAGING_APP_NAME
    heroku run rake db:sanitize_passwords --app STAGING_APPNAME
    heroku run rake db:migrate --app STAGING_APP_NAME
    heroku maintenance:off --app STAGING_APP_NAME 
    where DB_NAME is something like HEROKU_POSTGRESQL_JADE. This type of script is only relevant if using heroku and postgres, but you can do something similar if using other providers/databases. This post goes over a different way to do it if you are using mysql. ‘rake db:sanitize_passwords’ would be a task of your own creation to sanitize all the passwords of the users (very important) and also (less important, but still good for privacy) remove identifying data.
  4. Add staging to your database.yml. This should basically be a copy of production, but the database name should be something like #{product}_staging.

I’ve probably missed a few small things, but after that, basically just look for 'production’ in your app and add matching bits for 'staging’.

UPDATE: I did miss something after all. One of my former coworkers brought this to my attention: '3a) Rake task to sanitize/de-identify data, reset passwords?’. In the script I’m actually running, I am updating the credit card information so each customer just has a Stripe test account with a atest card instead of linked to their actual account, but I definitely did forget to reset passwords. Another reason this is useful is that, if you reset all your staging passwords to the same thing, you can login as any user on your test environment to debug any issues they may be having.