On this page:
1.1 Appendable Queries
1.2 Reusable Joins
1.3 Inline Joins
1.4 Aggregates
1.5 Putting it All Together
7.7

1 Introduction

I love relational databases, but SQL usually leaves me disappointed. Plisqin is an SQL generator written in Racket. It is intended to demonstrate some shortcomings of SQL and explore how a better database query language could be designed.

Code is available on GitHub.

Reading Racket Code
This document will contain a lot of Racket code:
> (if (integer? 12.3) "an int" "not an int")

"not an int"

Notice how if and integer? are hyperlinked. Even if you have no Racket/Scheme/Lisp experience, you might be able to follow along by using these links when you are not sure what some code is doing.

The Video Rental Example Schema
Here is a query in Plisqin:
(from r "Rental"
      (where r".RentalId > 100"))

Here is an (almost) equivalent query:
(from r Rental
      (where (RentalId r)" > 100"))

When you see upper-case identifiers such as Rental and RentalId, they almost certainly refer to part of the Video Rental Example Schema. This schema models a brick-and-mortar video rental store (think pre-Netflix). Keep this database diagram handy while reading this document.

1.1 Appendable Queries

The from macro is used to create queries. Here new-releases-1 is defined as a procedure that creates and returns a query. Then we render it to SQL:
> (define (new-releases-1)
    (from i "Item"
          (where i".ReleaseDate > '2018-01-01'")
          (select i".ItemName")))
> (display (to-sql (new-releases-1)))

select

  i.ItemName

from Item i

where i.ReleaseDate > '2018-01-01'

This query has 2 clauses: one where clause and one select clause. It is easy to add more clauses to an existing query. Here, we create new-releases-2 which appends 3 more clauses to new-releases-1:
> (define (new-releases-2)
    (from i (new-releases-1)
          (select i".ReleaseDate")
          (where i".ItemTypeId = 1")
          (order-by i".ReleaseDate asc")))
> (display (to-sql (new-releases-2)))

select

  i.ItemName

  , i.ReleaseDate

from Item i

where i.ReleaseDate > '2018-01-01'

and i.ItemTypeId = 1

order by i.ReleaseDate asc

This isn’t very exciting alone, but combined with the rest of Plisqin and the full power of Racket, we can get great composability that SQL lacks.

1.2 Reusable Joins

Joins can be defined as Racket procedures and reused. For example, I know that every Rental has exactly one Item. I have defined this join in the Item-of procedure:

> (define (recent-rentals)
    (from rental "Rental"
          (join item (Item-of rental))
          (where item".ReleaseDate >= '2018-11-22'")))
> (display (to-sql (recent-rentals)))

select rental.*

from Rental rental

inner join Copy _copy

on _copy.CopyId = rental.CopyId

inner join Item _item

on _item.ItemId = _copy.ItemId

where _item.ReleaseDate >= '2018-11-22'

You can see in the generated SQL that I glossed over some details. I said that every Rental has exactly one Item. To be more precise, I could have said that every Rental has exactly one Copy (joined using CopyId), and every Copy has exactly one Item (joined using ItemId). But those are details I don’t want to think about at this level. I would much rather just say (Item-of rental).

1.3 Inline Joins

Joins are allowed to appear pretty much anywhere inside a query. This has some powerful results. For starters, I could have defined the previous example like this instead:
(define (recent-rentals)
  (from rental "Rental"
        (where (Item-of rental)".ReleaseDate >= '2018-11-22'")))

The join is "inline" in the where clause. But we can do even better. The following example also contains an inline join, can you guess where it is?

> (define (recent-rentals)
    (from rental "Rental"
          (where (ReleaseDate rental)" >= '2018-11-22'")))
> (display (to-sql (recent-rentals)))

select rental.*

from Rental rental

inner join Copy _copy

on _copy.CopyId = rental.CopyId

inner join Item _item

on _item.ItemId = _copy.ItemId

where _item.ReleaseDate >= '2018-11-22'

The inline join comes from (ReleaseDate rental), which returns

(scalar (Item-of rental)".ReleaseDate")

Putting joins inside scalars can be very useful. It is nice to be able to say "the ReleaseDate of a Rental" without repeating the details of any join(s) that might be involved.

1.4 Aggregates

This concept is a little more complex, so let’s start with some hand-written SQL. Let’s imagine that I need to write a query that returns Customers with 1) their number of Rentals and 2) their number of movie Rentals. (Our store rents things other than movies, such as video games.) This is the SQL I might write:

select customer.*

  , rentalSummary.MovieRentalCount

  , rentalSummary.TotalRentalCount

from Customer customer

inner join (

    select checkout.CustomerId

      , sum(case when item.ItemTypeId = 1 then 1

                 else 0 end) as MovieRentalCount

      , count(*) as TotalRentalCount

    from Rental rental

    inner join Copy copy on copy.CopyId = rental.CopyId

    inner join Item item on item.ItemId = copy.ItemId

    inner join Checkout checkout on checkout.CheckoutId = rental.CheckoutId

    group by checkout.CustomerId

) rentalSummary

on rentalSummary.CustomerId = customer.CustomerId

This is a pretty common pattern (for me at least):
  1. Group a query (group Rentals by CustomerId)

  2. Join it to the main query (join on rentalSummary.CustomerId ...)

  3. Select the aggregates (select MovieRentalCount and TotalRentalCount)

But this hand-written SQL suffers from major composability problems. Notice how the aggregate expressions (MovieRentalCount and TotalRentalCount) are "sealed inside" the grouped join. Ideally I want to be able to define the grouping and joining (parts 1 and 2) independently of the aggregate operations (part 3). Plisqin allows me to do this:
> (define (my-query)
    (from cust Customer
          (join rentals (grouped-Rentals-of cust))
          (select cust".*")
          (select (count rentals)
                  " as TotalRentalCount")
          (select (sum "case when "(ItemTypeId rentals)" = 1 then 1 else 0 end")
                  " as MovieRentalCount")))
> (display (to-sql (my-query)))

select

  cust.*

  , _rental.__INJECT2 as TotalRentalCount

  , _rental.__INJECT3 as MovieRentalCount

from Customer cust

inner join (

    select

      _checkout.CustomerId as __INJECT1

      , count(*) as __INJECT2

      , sum(case when _item.ItemTypeId = 1 then 1 else 0 end) as __INJECT3

    from Rental _rental

    inner join Checkout _checkout

    on _checkout.CheckoutId = _rental.CheckoutId

    inner join Copy _copy

    on _copy.CopyId = _rental.CopyId

    inner join Item _item

    on _item.ItemId = _copy.ItemId

    group by _checkout.CustomerId) _rental

on _rental.__INJECT1 = cust.CustomerId

In the above example, (grouped-Rentals-of cust) returns a grouped join. This join has no select clauses of its own, but (count rentals) gets "injected into" this grouped join and exposed via the synthetic name __INJECT2. Similarly, the larger sum expression also gets "injected into" the grouped join as __INJECT3. (Don’t worry about __INJECT1 for now.)

Notice how the aggregate expressions live outside of the grouped join. In the following example two more aggregate expressions are added and new injections appear, while (grouped-Rentals-of cust) does not change.
> (define (my-query)
    (from cust Customer
          (join rentals (grouped-Rentals-of cust))
          ; Local definitions can help readability:
          (define (is-movie rental)
            (bool (ItemTypeId rental)" = 1"))
          (define (is-recent rental)
            (bool (CheckoutTime rental)" > '2018-01-01'"))
          (define (count-when predicate)
            (sum "case when "predicate" then 1 else 0 end"))
          (select cust".*")
          (select (count rentals)
                  " as TotalRentalCount")
          (select (count-when (is-movie rentals))
                  " as MovieRentalCount")
          (select (count-when (is-recent rentals))
                  " as RecentRentalCount")
          (select (count-when (bool (is-movie rentals)
                                    " and "
                                    (is-recent rentals)))
                  " as RecentMovieRentalCount")))
> (display (to-sql (my-query)))

select

  cust.*

  , _rental.__INJECT2 as TotalRentalCount

  , _rental.__INJECT3 as MovieRentalCount

  , _rental.__INJECT4 as RecentRentalCount

  , _rental.__INJECT5 as RecentMovieRentalCount

from Customer cust

inner join (

    select

      _checkout.CustomerId as __INJECT1

      , count(*) as __INJECT2

      , sum(case when _item.ItemTypeId = 1 then 1 else 0 end) as __INJECT3

      , sum(case when _checkout.CheckoutTime > '2018-01-01' then 1 else 0 end) as __INJECT4

      , sum(case when _item.ItemTypeId = 1 and _checkout.CheckoutTime > '2018-01-01' then 1 else 0 end) as __INJECT5

    from Rental _rental

    inner join Checkout _checkout

    on _checkout.CheckoutId = _rental.CheckoutId

    inner join Copy _copy

    on _copy.CopyId = _rental.CopyId

    inner join Item _item

    on _item.ItemId = _copy.ItemId

    group by _checkout.CustomerId) _rental

on _rental.__INJECT1 = cust.CustomerId

1.5 Putting it All Together

Let’s imagine that we want a report of Frequently-Rented Items. But we are going to want many variations on this report, such as

We might use a table-valued function in standard SQL, but this will have at least three problems. First, every variation we introduce might require additional parameters. For example, to handle "customers older than X" we would need to add a "customerAgeMinimum" parameter and then we would have to update all the call sites. Second, we might end up paying a performance cost for joins that don’t get used. For example, when we add that "customerAgeMinimum" parameter, we are going to have to join the Customer table even if the caller indicates "no minimum" by passing null or zero. Third, passing a list (such as a list of StoreIds) is painful at best.

In Plisqin, we can handle all these variations with only one parameter.

> (define (frequently-rented-items [rental-query Rental])
    (from i Item
          (join rentals rental-query
                (group-by (ItemId rentals))
                (join-on (ItemId rentals)" = "(ItemId i)))
          (select i".*")
          (select (count rentals)" as NumRentals")
          (order-by (count rentals)" desc")))

The rental-query parameter is optional, and it defaults to Rental which will just create a new, unfiltered query of the Rental table. But since queries append by default, we can pass in an arbitrarily-filtered Rental query and our grouped join will append to it. This will allow us to handle all the variations we listed. First, let’s do the "Frequently rented at certain stores" variation:

> (define (at-certain-stores)
    (from r Rental
          (where (StoreId r)" in (42, 43, 44)")))
> (display (to-sql (frequently-rented-items (at-certain-stores))))

select

  i.*

  , r.__INJECT2 as NumRentals

from Item i

inner join (

    select

      _copy.ItemId as __INJECT1

      , count(*) as __INJECT2

    from Rental r

    inner join Checkout _checkout

    on _checkout.CheckoutId = r.CheckoutId

    inner join Store _store

    on _store.StoreId = _checkout.StoreId

    inner join Copy _copy

    on _copy.CopyId = r.CopyId

    where _store.StoreId in (42, 43, 44)

    group by _copy.ItemId) r

on r.__INJECT1 = i.ItemId

order by r.__INJECT2 desc

Next, frequently rented in a certain month:
> (define (this-month)
    (from r Rental
          (where (CheckoutTime r)" >= '2018-11-01'")
          (where (CheckoutTime r)" < '2018-12-01'")))
> (display (to-sql (frequently-rented-items (this-month))))

select

  i.*

  , r.__INJECT2 as NumRentals

from Item i

inner join (

    select

      _copy.ItemId as __INJECT1

      , count(*) as __INJECT2

    from Rental r

    inner join Checkout _checkout

    on _checkout.CheckoutId = r.CheckoutId

    inner join Copy _copy

    on _copy.CopyId = r.CopyId

    where _checkout.CheckoutTime >= '2018-11-01'

    and _checkout.CheckoutTime < '2018-12-01'

    group by _copy.ItemId) r

on r.__INJECT1 = i.ItemId

order by r.__INJECT2 desc

OK, we see where this is headed. Let’s just combine them all and declare victory:

> (define (combined-query)
    (frequently-rented-items
     (from r Rental
           ; at certain stores:
           (where (StoreId r)" in (42, 43, 44)")
           ; in a certain month:
           (where (CheckoutTime r)" >= '2018-11-01'")
           (where (CheckoutTime r)" < '2018-12-01'")
           ; having an 'Action' genre:
           (join g (Genres-of r)
                 (join-on g".GenreName = 'Action'"))
           ; by customers over a certain age:
           (where (CustomerBirthDate r)" < '1988-11-01'"))))
> (display (to-sql (combined-query)))

select

  i.*

  , r.__INJECT2 as NumRentals

from Item i

inner join (

    select

      _copy.ItemId as __INJECT1

      , count(*) as __INJECT2

    from Rental r

    inner join Copy _copy

    on _copy.CopyId = r.CopyId

    inner join Item _item

    on _item.ItemId = _copy.ItemId

    inner join ItemGenre _itemGenre

    on _itemGenre.ItemId = _item.ItemId

    inner join Genre _genre

    on _genre.GenreId = _itemGenre.GenreId

    and _genre.GenreName = 'Action'

    inner join Checkout _checkout

    on _checkout.CheckoutId = r.CheckoutId

    inner join Store _store

    on _store.StoreId = _checkout.StoreId

    inner join Customer _customer

    on _customer.CustomerId = _checkout.CustomerId

    where _store.StoreId in (42, 43, 44)

    and _checkout.CheckoutTime >= '2018-11-01'

    and _checkout.CheckoutTime < '2018-12-01'

    and _customer.CustomerBirthDate < '1988-11-01'

    group by _copy.ItemId) r

on r.__INJECT1 = i.ItemId

order by r.__INJECT2 desc

This concludes the sales pitch. Hopefully you can see that Plisqin allows you to do things that are impossible in SQL. Continue reading at Getting Started.