Find latest entry in a many-to-many relationship

Credit

The solution here has helped me with this problem many times. This is another answer with an explanation by the same author.

Introduction

While I have years of experience optimizing SQL queries, I still find it difficult to think in a declarative programming style. Most programmers are familiar with the imperative style, where we tell the computer what to do in each step. Or procedural, where we wrap up imperative statements in procedures that can be called over and over again, or from multiple places (for all practical purposes, think of functions that allow for code re-use).

Having learned programming using Lisp, I’m also more familiar with a functional programming style than most programmers. But thinking in a declarative programming style can still be elusive for me. I’ve gotten a lot better at formulating SQL queries over the years, but one common question that I always spend 30 minutes googling around for is the following.

Query

I have two tables, alpha and beta. The relationship between them is that many beta can and do belong to a single alpha. I would like a list of all alpha associated with a single beta of each alpha. The particular beta I want should be selected according to some criteria that maximizes or minimizes some attribute of the beta.

Lets run through a few examples.

Return all alpha and their newest beta

  select alpha.id, b1.id
  from alpha
  join beta as b1 on b1.alpha_id = alpha.id
  left join beta as b2 on b2.alpha_id = alpha.id
    and b2.created_at > b1.created_at
  where b2.id is null

Return all alpha and the beta with the lowest value of a particular field

  select alpha.id, b1.id
  from alpha
  join beta as b1 on b1.alpha_id = alpha.id
  left join beta as b2 on b2.alpha_id = alpha.id
    and (b2.foo < b1.foo OR b2.foo = b1.foo AND b2.id > b1.id)
  where b2.id is null

This form (with the “=” condition) allows for selecting a particular “b1” in the case where its “foo” is equal to the “foo” of “b2”.

Conclusion

This is a classic case of declarative instead of procedural thought. Instead of getting the list of all alphas, then the list of all the betas for each alpha, and then cycling through the betas to find the ones I want, I’m:

  1. performing a self-join with a constraint, which exponentially expands the result-set for each alpha, and
  2. then adding a constraint to the main query, which collapses the result-set down to one beta for each alpha.

Note that the un-collapsed result set is 562 rows for an alpha that has 34 betas. Recall that premature performance optimization is the root of all evil; YMMV; always profile your code for your particular data-set and client requirements; etc, etc.

Leave a Reply