1 Introduction

7.7

1 Introduction

I love relational databases, but SQL usually leaves me disappointed. Plisqin is an SQL generator written in Racket. It is intended to demonstrate some shortcomings of SQL and explore how a better database query language could be designed.

Code is available on GitHub.

Reading Racket Code

This document will contain a lot of Racket code:

> (if (integer? 12.3) "an int" "not an int")
"not an int"

Notice how if and integer? are hyperlinked. Even if you have no Racket/Scheme/Lisp experience, you might be able to follow along by using these links when you are not sure what some code is doing.

The Video Rental Example Schema

Here is a query in Plisqin:

(from r "Rental"
(where r".RentalId > 100"))

Here is an (almost) equivalent query:

(from r Rental
(where (RentalId r)" > 100"))

When you see upper-case identifiers such as Rental and RentalId, they almost certainly refer to part of the Video Rental Example Schema. This schema models a brick-and-mortar video rental store (think pre-Netflix). Keep this database diagram handy while reading this document.

1.1 Appendable Queries

The from macro is used to create queries. Here new-releases-1 is defined as a procedure that creates and returns a query. Then we render it to SQL:

> (define (new-releases-1)
    (from i "Item"
          (where i".ReleaseDate > '2018-01-01'")
          (select i".ItemName")))
> (display (to-sql (new-releases-1)))
select
  i.ItemName
from Item i
where i.ReleaseDate > '2018-01-01'

This query has 2 clauses: one where clause and one select clause. It is easy to add more clauses to an existing query. Here, we create new-releases-2 which appends 3 more clauses to new-releases-1:

> (define (new-releases-2)
    (from i (new-releases-1)
          (select i".ReleaseDate")
          (where i".ItemTypeId = 1")
          (order-by i".ReleaseDate asc")))
> (display (to-sql (new-releases-2)))
select
  i.ItemName
  , i.ReleaseDate
from Item i
where i.ReleaseDate > '2018-01-01'
and i.ItemTypeId = 1
order by i.ReleaseDate asc

This isn’t very exciting alone, but combined with the rest of Plisqin and the full power of Racket, we can get great composability that SQL lacks.

1.2 Reusable Joins

Joins can be defined as Racket procedures and reused. For example, I know that every Rental has exactly one Item. I have defined this join in the Item-of procedure:

> (define (recent-rentals)
    (from rental "Rental"
          (join item (Item-of rental))
          (where item".ReleaseDate >= '2018-11-22'")))
> (display (to-sql (recent-rentals)))
select rental.*
from Rental rental
inner join Copy _copy
on _copy.CopyId = rental.CopyId
inner join Item _item
on _item.ItemId = _copy.ItemId
where _item.ReleaseDate >= '2018-11-22'

You can see in the generated SQL that I glossed over some details. I said that every Rental has exactly one Item. To be more precise, I could have said that every Rental has exactly one Copy (joined using CopyId), and every Copy has exactly one Item (joined using ItemId). But those are details I don’t want to think about at this level. I would much rather just say (Item-of rental).

1.3 Inline Joins

Joins are allowed to appear pretty much anywhere inside a query. This has some powerful results. For starters, I could have defined the previous example like this instead:

(define (recent-rentals)
(from rental "Rental"
(where (Item-of rental)".ReleaseDate >= '2018-11-22'")))

The join is "inline" in the where clause. But we can do even better. The following example also contains an inline join, can you guess where it is?

> (define (recent-rentals)
(from rental "Rental"
(where (ReleaseDate rental)" >= '2018-11-22'")))
> (display (to-sql (recent-rentals)))
select rental.*
from Rental rental
inner join Copy _copy
on _copy.CopyId = rental.CopyId
inner join Item _item
on _item.ItemId = _copy.ItemId
where _item.ReleaseDate >= '2018-11-22'

The inline join comes from (ReleaseDate rental), which returns

(scalar (Item-of rental)".ReleaseDate")

Putting joins inside scalars can be very useful. It is nice to be able to say "the ReleaseDate of a Rental" without repeating the details of any join(s) that might be involved.

1.4 Aggregates

This concept is a little more complex, so let’s start with some hand-written SQL. Let’s imagine that I need to write a query that returns Customers with 1) their number of Rentals and 2) their number of movie Rentals. (Our store rents things other than movies, such as video games.) This is the SQL I might write:

select customer.*

, rentalSummary.MovieRentalCount

, rentalSummary.TotalRentalCount

from Customer customer

inner join (

select checkout.CustomerId

, sum(case when item.ItemTypeId = 1 then 1

else 0 end) as MovieRentalCount

, count(*) as TotalRentalCount

from Rental rental

inner join Copy copy on copy.CopyId = rental.CopyId

inner join Item item on item.ItemId = copy.ItemId

inner join Checkout checkout on checkout.CheckoutId = rental.CheckoutId

group by checkout.CustomerId

) rentalSummary

on rentalSummary.CustomerId = customer.CustomerId

This is a pretty common pattern (for me at least):

Group a query (group Rentals by CustomerId)
Join it to the main query (join on rentalSummary.CustomerId ...)
Select the aggregates (select MovieRentalCount and TotalRentalCount)

But this hand-written SQL suffers from major composability problems. Notice how the aggregate expressions (MovieRentalCount and TotalRentalCount) are "sealed inside" the grouped join. Ideally I want to be able to define the grouping and joining (parts 1 and 2) independently of the aggregate operations (part 3). Plisqin allows me to do this:

> (define (my-query)
    (from cust Customer
          (join rentals (grouped-Rentals-of cust))
          (select cust".*")
          (select (count rentals)
                  " as TotalRentalCount")
          (select (sum "case when "(ItemTypeId rentals)" = 1 then 1 else 0 end")
                  " as MovieRentalCount")))
> (display (to-sql (my-query)))
select
  cust.*
  , _rental.__INJECT2 as TotalRentalCount
  , _rental.__INJECT3 as MovieRentalCount
from Customer cust
inner join (
    select
      _checkout.CustomerId as __INJECT1
      , count(*) as __INJECT2
      , sum(case when _item.ItemTypeId = 1 then 1 else 0 end) as __INJECT3
    from Rental _rental
    inner join Checkout _checkout
    on _checkout.CheckoutId = _rental.CheckoutId
    inner join Copy _copy
    on _copy.CopyId = _rental.CopyId
    inner join Item _item
    on _item.ItemId = _copy.ItemId
    group by _checkout.CustomerId) _rental
on _rental.__INJECT1 = cust.CustomerId

In the above example, (grouped-Rentals-of cust) returns a grouped join. This join has no select clauses of its own, but (count rentals) gets "injected into" this grouped join and exposed via the synthetic name __INJECT2. Similarly, the larger sum expression also gets "injected into" the grouped join as __INJECT3. (Don’t worry about __INJECT1 for now.)

Notice how the aggregate expressions live outside of the grouped join. In the following example two more aggregate expressions are added and new injections appear, while (grouped-Rentals-of cust) does not change.

> (define (my-query)
    (from cust Customer
          (join rentals (grouped-Rentals-of cust))
          ; Local definitions can help readability:
          (define (is-movie rental)
            (bool (ItemTypeId rental)" = 1"))
          (define (is-recent rental)
            (bool (CheckoutTime rental)" > '2018-01-01'"))
          (define (count-when predicate)
            (sum "case when "predicate" then 1 else 0 end"))
          (select cust".*")
          (select (count rentals)
                  " as TotalRentalCount")
          (select (count-when (is-movie rentals))
                  " as MovieRentalCount")
          (select (count-when (is-recent rentals))
                  " as RecentRentalCount")
          (select (count-when (bool (is-movie rentals)
                                    " and "
                                    (is-recent rentals)))
                  " as RecentMovieRentalCount")))
> (display (to-sql (my-query)))
select
  cust.*
  , _rental.__INJECT2 as TotalRentalCount
  , _rental.__INJECT3 as MovieRentalCount
  , _rental.__INJECT4 as RecentRentalCount
  , _rental.__INJECT5 as RecentMovieRentalCount
from Customer cust
inner join (
    select
      _checkout.CustomerId as __INJECT1
      , count(*) as __INJECT2
      , sum(case when _item.ItemTypeId = 1 then 1 else 0 end) as __INJECT3
      , sum(case when _checkout.CheckoutTime > '2018-01-01' then 1 else 0 end) as __INJECT4
      , sum(case when _item.ItemTypeId = 1 and _checkout.CheckoutTime > '2018-01-01' then 1 else 0 end) as __INJECT5
    from Rental _rental
    inner join Checkout _checkout
    on _checkout.CheckoutId = _rental.CheckoutId
    inner join Copy _copy
    on _copy.CopyId = _rental.CopyId
    inner join Item _item
    on _item.ItemId = _copy.ItemId
    group by _checkout.CustomerId) _rental
on _rental.__INJECT1 = cust.CustomerId

1.5 Putting it All Together

Let’s imagine that we want a report of Frequently-Rented Items. But we are going to want many variations on this report, such as

Frequently rented at certain stores
Frequently rented in a certain month
Frequently rented action movies
Frequently rented by customers over a certain age

We might use a table-valued function in standard SQL, but this will have at least three problems. First, every variation we introduce might require additional parameters. For example, to handle "customers older than X" we would need to add a "customerAgeMinimum" parameter and then we would have to update all the call sites. Second, we might end up paying a performance cost for joins that don’t get used. For example, when we add that "customerAgeMinimum" parameter, we are going to have to join the Customer table even if the caller indicates "no minimum" by passing null or zero. Third, passing a list (such as a list of StoreIds) is painful at best.

In Plisqin, we can handle all these variations with only one parameter.

> (define (frequently-rented-items [rental-query Rental])
    (from i Item
          (join rentals rental-query
                (group-by (ItemId rentals))
                (join-on (ItemId rentals)" = "(ItemId i)))
          (select i".*")
          (select (count rentals)" as NumRentals")
          (order-by (count rentals)" desc")))

The rental-query parameter is optional, and it defaults to Rental which will just create a new, unfiltered query of the Rental table. But since queries append by default, we can pass in an arbitrarily-filtered Rental query and our grouped join will append to it. This will allow us to handle all the variations we listed. First, let’s do the "Frequently rented at certain stores" variation:

> (define (at-certain-stores)
    (from r Rental
          (where (StoreId r)" in (42, 43, 44)")))
> (display (to-sql (frequently-rented-items (at-certain-stores))))
select
  i.*
  , r.__INJECT2 as NumRentals
from Item i
inner join (
    select
      _copy.ItemId as __INJECT1
      , count(*) as __INJECT2
    from Rental r
    inner join Checkout _checkout
    on _checkout.CheckoutId = r.CheckoutId
    inner join Store _store
    on _store.StoreId = _checkout.StoreId
    inner join Copy _copy
    on _copy.CopyId = r.CopyId
    where _store.StoreId in (42, 43, 44)
    group by _copy.ItemId) r
on r.__INJECT1 = i.ItemId
order by r.__INJECT2 desc

Next, frequently rented in a certain month:

> (define (this-month)
    (from r Rental
          (where (CheckoutTime r)" >= '2018-11-01'")
          (where (CheckoutTime r)" < '2018-12-01'")))
> (display (to-sql (frequently-rented-items (this-month))))
select
  i.*
  , r.__INJECT2 as NumRentals
from Item i
inner join (
    select
      _copy.ItemId as __INJECT1
      , count(*) as __INJECT2
    from Rental r
    inner join Checkout _checkout
    on _checkout.CheckoutId = r.CheckoutId
    inner join Copy _copy
    on _copy.CopyId = r.CopyId
    where _checkout.CheckoutTime >= '2018-11-01'
    and _checkout.CheckoutTime < '2018-12-01'
    group by _copy.ItemId) r
on r.__INJECT1 = i.ItemId
order by r.__INJECT2 desc

OK, we see where this is headed. Let’s just combine them all and declare victory:

> (define (combined-query)
    (frequently-rented-items
     (from r Rental
           ; at certain stores:
           (where (StoreId r)" in (42, 43, 44)")
           ; in a certain month:
           (where (CheckoutTime r)" >= '2018-11-01'")
           (where (CheckoutTime r)" < '2018-12-01'")
           ; having an 'Action' genre:
           (join g (Genres-of r)
                 (join-on g".GenreName = 'Action'"))
           ; by customers over a certain age:
           (where (CustomerBirthDate r)" < '1988-11-01'"))))
> (display (to-sql (combined-query)))
select
  i.*
  , r.__INJECT2 as NumRentals
from Item i
inner join (
    select
      _copy.ItemId as __INJECT1
      , count(*) as __INJECT2
    from Rental r
    inner join Copy _copy
    on _copy.CopyId = r.CopyId
    inner join Item _item
    on _item.ItemId = _copy.ItemId
    inner join ItemGenre _itemGenre
    on _itemGenre.ItemId = _item.ItemId
    inner join Genre _genre
    on _genre.GenreId = _itemGenre.GenreId
    and _genre.GenreName = 'Action'
    inner join Checkout _checkout
    on _checkout.CheckoutId = r.CheckoutId
    inner join Store _store
    on _store.StoreId = _checkout.StoreId
    inner join Customer _customer
    on _customer.CustomerId = _checkout.CustomerId
    where _store.StoreId in (42, 43, 44)
    and _checkout.CheckoutTime >= '2018-11-01'
    and _checkout.CheckoutTime < '2018-12-01'
    and _customer.CustomerBirthDate < '1988-11-01'
    group by _copy.ItemId) r
on r.__INJECT1 = i.ItemId
order by r.__INJECT2 desc

This concludes the sales pitch. Hopefully you can see that Plisqin allows you to do things that are impossible in SQL. Continue reading at Getting Started.

top ← prev up next →

1	Introduction
2	Getting Started
3	Walkthrough
4	Reference
5	Video Rental Example Schema
6	For Developers
7	Preview of Plisqin 0.2

1.1	Appendable Queries
1.2	Reusable Joins
1.3	Inline Joins
1.4	Aggregates
1.5	Putting it All Together