Getting SQL Results That Are Distinct Across Two Columns

So this is a weird issue I just came across. Here's an example table schema:

mysql> describe queues;
+--------------+---------------+
| Field        | Type          |
+--------------+---------------+
| id           | int(11)       |
| customer_id  | mediumint(9)  |
| request_time | decimal(12,0) |
| item_id      | smallint(6)   |
+--------------+---------------+

mysql> select * from queues;
+------+--------------+--------------+--------+
| id   | customer_id | request_time | item_id |
+------+-------------+--------------+---------+
| 6829 |       15066 | 201704161118 |       1 |
| 6872 |       15066 | 201704161118 |       2 |
| 6875 |       15066 | 201704161118 |      26 |
| 6880 |       15066 | 201704161118 |       8 |
| 6881 |       15066 | 201704161118 |      15 |
| 6930 |       15077 | 201704161942 |       6 |
| 8683 |       14625 | 201704171412 |      10 |
+------+-------------+--------------+---------+

In my example, I might have the same customer requesting multiple items at the same time. I want to display all the items they have requested in the same line. That means I want to get a list of all the unique customers and request times combined. Yes, this isn't the *greatest* example because this table should probably be designed in a different way, but stick with me!

If I only want customer_id and request_time, that is pretty simple.

mysql> SELECT DISTINCT customer_id, request_time FROM queues;
+-------------+--------------+
| customer_id | request_time |
+-------------+--------------+
|       15066 | 201704161118 |
|       15077 | 201704161942 |
|       14625 | 201704171412 |
+-------------+--------------+

However, in my case, I need the queue id to do additional queries. That's where it gets just a smidge bit more complicated! Instead of just a simple DISTINCT, I've got to count the distinct records and then use HAVING to actually limit it.

mysql> SELECT *, 
COUNT(DISTINCT customer_id, request_time) as unique_orders 
FROM queues 
GROUP BY customer_id, request_time 
HAVING unique_orders >= 1;
+------+-------------+--------------+---------+
| id   | customer_id | request_time | item_id |
+------+-------------+--------------+---------+
| 6829 |       15066 | 201704161118 |       1 |
| 6930 |       15077 | 201704161942 |       6 |
| 8683 |       14625 | 201704171412 |      10 |
+------+-------------+--------------+---------+

Not too difficult, but I did go through a few different variations before getting to this result. I wanted it to work, but SELECT id, DISTINCT(customer_id, request_time) definitely does not!

SQL is the best!

I gave a SQL tutorial at PyLadies Boston last night and it was pretty fun. We used sqlite3 (which is definitely my least favorite DBMS, but it does come installed on pretty much every Linux/Unix machine by default and is the default for Django so I decided it was the best tool for this particular job. Giving a tutorial on something I used daily and have used consistently for 7 years was a bit weird because I did forget a few things because it didn't even cross my mind that people wouldn't know. For example: I initially neglected to mention that every statement needs a semicolon at the end and that you can't mix quotes (no " with '). Consider that was the bulk of all the issues, I'm feeling pretty successful right now! Take a look at the full tutorial below and let me know what you think.

ActiveRecord Joins

Random thing I missed while turning all those selects into regular queries using joins: if you use the names of associations to join tables (ex. recipes.joins(:ingredients)), it creates an INNER join. Why is this important? Here’s my example:

I have recipes with ingredients, and each ingredient might have a subingredient (gluten, soy, etc). Every recipe has ingredients, so an inner join is fine there. However, every recipe does not have subingredients. Since I was doing a recipes.joins(ingredients: :subingredients), I was only getting recipes that did not have subingredients... definitely not what I wanted. Here’s the code:

BEFORE:
Code
Recipe.joins(ingredients: :subingredients).joins(:ingredients)

Resulting SQL
SELECT "recipes".* FROM "recipes" INNER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" INNER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" INNER JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" INNER JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"

AFTER:
Code
Recipe.joins(:ingredients).joins('LEFT JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"')

Resulting SQL
SELECT "recipes".* FROM "recipes" INNER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" INNER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" LEFT JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"

UPDATE: As Darren (manvsmachine) mentions, you can also use .eager_load to create LEFT OUTER joins. However, that does give you a different query. It seems to have the same-ish results, but the SQL output is slightly different. It looks something like this:

Code
Recipe.eager_load(:subingredients)

Resulting SQL
SELECT "recipes"."id" AS t0_r0, "recipes"."name" AS t0_r1, "recipes"."instructions" AS t0_r2, ... "recipes"."image" AS t0_r14, "subingredients"."id" AS t1_r0, "subingredients"."name" AS t1_r1, "subingredients"."created_at" AS t1_r2, "subingredients"."updated_at" AS t1_r3 FROM "recipes" LEFT OUTER JOIN "ingredient_measurements" ON "ingredient_measurements"."recipe_id" = "recipes"."id" LEFT OUTER JOIN "ingredients" ON "ingredients"."id" = "ingredient_measurements"."ingredient_id" LEFT OUTER JOIN "ingredients_subingredients" ON "ingredients_subingredients"."ingredient_id" = "ingredients"."id" LEFT OUTER JOIN "subingredients" ON "subingredients"."id" = "ingredients_subingredients"."subingredient_id"