3.2 Layer 1 - Schema as Procedures
We’ve already seen two ways to deal with our DB schema. One is to use strings, like this:
(from r "Rental" (where r".RentalId > 100"))
The other is to use procedures, like this:
(from r Rental (where (RentalId r)" > 100"))
But we haven’t really looked at how these procedures are defined. This section will walk you through how to recreate the definitions of the Video Rental Example Schema. (Procedures like RentalId are not hyperlinked in this section because we are not using those definitions; we are recreating them.) The code we write will be heavily influenced by my personal convention for naming and organizing the procedures that represent the database schema.
We will be constructing a file called "video-rental-schema.rkt". If you are following along, start a new file now. The first line of this file should be #lang racket. Next, add this line to your file:
(require plisqin)
Whenever you see a code block formatted like the above, you should add that code to your file.
This format, on the other hand, shows a REPL interaction:
> (to-sql (scalar "hello world")) "hello world"
You can try that code in your REPL and verify that the result matches this guide. If you get errors, revisit Environment Setup.
3.2.1 Tables
First we use table to define our tables. We can use the information schema to generate this code. For example, running this query in SQL Server:
select '(table '+TABLE_NAME+')' |
from INFORMATION_SCHEMA.TABLES |
where TABLE_TYPE='BASE TABLE' |
order by TABLE_NAME |
gives us the following Racket code:
I like to use CamelCase for tables and fields. This helps avoid name clashes and enhances readability for me.
(table Account) (table Checkout) (table Copy) (table Customer) (table District) (table Employee) (table Genre) (table Item) (table ItemGenre) (table ItemType) (table Rental) (table Store)
> (displayln (to-sql (from r (Rental) (join c (Copy) (join-on c".CopyId = "r".RentalId")))))
select _rental.*
from Rental _rental
inner join Copy _copy
on _copy.CopyId = _rental.RentalId
> (and (Rental? (Rental)) (Rental? (from r (Rental))) (Rental? (join r (Rental)))) #t
> (Rental? (Copy)) #f
3.2.2 Columns
To define the columns, we can use the field-cases macro. For example, to define a column Bar on table Foo we would write
(field-cases (Bar x) [(Foo? x) (scalar x".Bar")])
But this a little too verbose for column information that we can pull from the information schema. So let’s define a helper macro called column:
(define-syntax-rule (column table? column-name) (field-cases (column-name x) [(table? x) (scalar x (format ".~a" 'column-name))]))
Now we can define columns using (column Foo? Bar). This makes it a lot easier to use the information schema again. This query works for SQL Server:
select '(column '+TABLE_NAME+'? '+COLUMN_NAME+')' |
from INFORMATION_SCHEMA.COLUMNS |
order by TABLE_NAME, COLUMN_NAME |
And produces the following code:
(column Account? AccountId) (column Account? Address) (column Account? PhoneNumber) (column Checkout? CheckoutId) (column Checkout? CheckoutTime) (column Checkout? CustomerId) (column Checkout? EmployeeId) (column Checkout? StoreId) (column Copy? Barcode) (column Copy? CopyId) (column Copy? ItemId) (column Customer? AccountId) (column Customer? CustomerBirthDate) (column Customer? CustomerId) (column Customer? CustomerName) (column District? DistrictId) (column District? DistrictName) (column Employee? EmployeeId) (column Employee? EmployeeName) (column Employee? PrimaryStoreId) (column Genre? GenreId) (column Genre? GenreName) (column Item? ItemId) (column Item? ItemName) (column Item? ItemTypeId) (column Item? ReleaseDate) (column ItemGenre? GenreId) (column ItemGenre? ItemId) (column ItemType? ItemTypeId) (column ItemType? ItemTypeName) (column Rental? CheckoutId) (column Rental? CopyId) (column Rental? PricePaid) (column Rental? RentalId) (column Store? Address) (column Store? DistrictId) (column Store? StoreId)
3.2.3 Joins
A singular join is a join that will not increase the number of rows in the result set. It gets a name like SingularNoun-of/s. This means that (SingularNoun-of/s x) should join at most 1 matching SingularNoun for each x.
A plural join is a join that is allowed to increase the number of rows in the result set. It gets a name like PluralNouns-of/p. This means that (PluralNouns-of/p x) can join any number of matching PluralNouns for each x.
A grouped join is a join that contains at least one group-by clause. It gets a name like PluralNouns-of/g but it is actually a special case of singular join. For example, (Rentals-of/g item) means "group Rentals by Item and join those groups to Item".
Note that this is just my personal convention. (See also: of/s, of/p, and of/g.) Plisqin does not know or care whether a join is singular or plural.
It might be possible have the information schema help us here, but I just write joins by hand.
3.2.3.1 Singular Join Examples
Let’s define Copy-of/s which should return a valid value for everything that has at most 1 Copy:
(field-cases (Copy-of/s x) [(Rental? x) (join c (Copy) (join-on (CopyId c)" = "(CopyId x)))] [(Copy? x) x])
Using field-cases is kind of like Racket’s built-in cond, but one difference is that the conditions are checked in reverse order (bottom to top). So our definition of (Copy-of/s x) means "If x is already a Copy, just return it. If x is a Rental return a join using CopyId. Otherwise, none of the conditions match and it is an error."
My convention says that singular joins are allowed to contain other singular joins, but not plural joins. To demonstrate, let’s do one more singular join:
(field-cases (Item-of/s x) [(Copy? x) (join i (Item) (join-on (ItemId i)" = "(ItemId x)))] [(Rental? x) (Item-of/s (Copy-of/s x))] [(Item? x) x])
The use of recursion in the Rental? condition is a common pattern. We have (Item-of/s copy) working already. We have previously defined (Copy-of/s rental). So we can use these two definitions to define (Item-of/s rental) as (Item-of/s (Copy-of/s rental)).
3.2.3.2 Plural Join Examples
Plural joins work pretty much the same way as singular joins. Let’s look at Genres. The Item:Genre relationship is Many:Many so we have the ItemGenre mapping table between them. That means Genre:ItemGenre is One:Many, and Item:ItemGenre is also One:Many. First we define ItemGenres-of/p
(field-cases (ItemGenres-of/p x) [(Item? x) (join ig (ItemGenre) (join-on (ItemId ig)" = "(ItemId x)))] [(Genre? x) (join ig (ItemGenre) (join-on (GenreId ig)" = "(GenreId x)))] [(or (Copy? x) (Rental? x)) (ItemGenres-of/p (Item-of/s x))])
But in most of our queries, we won’t care about the ItemGenre table. We will only be thinking about Items and Genres. For example, join all the Genres for a given Item/Copy/Rental. We can define Genres-of/p like this:
(field-cases (Genres-of/p x) [(Item? x) (join g (Genre) (join-on (GenreId g)" = "(GenreId (ItemGenres-of/p x))))] [(or (Copy? x) (Rental? x)) (Genres-of/p (Item-of/s x))])
Let’s also define Copies-of/p. There isn’t really much to explain here, but we will need these definitions later.
(field-cases (Copies-of/p x) [(Item? x) (join c (Copy) (join-on (ItemId c)" = "(ItemId x)))])
3.2.3.3 Grouped Join Examples
A grouped join is a join that contains a group-by clause. The definition of (Rentals-of/g x) should return the correct grouped join for any x that has a group of Rentals:
(field-cases (Rentals-of/g x) [(Copy? x) (join r (Rental) (join-on (CopyId r)" = "(CopyId x)) (group-by (CopyId r)))] [(Item? x) (join r (Rental) (join c (Copy-of/s r)) (join-on (ItemId c)" = "(ItemId x)) (group-by (ItemId c)))])
> (define (rental-summary item-or-copy) (from x item-or-copy (join r (Rentals-of/g x)) (select (count r)" as NumRentals") (select x".*") (order-by (count r)" desc"))) > (define copy-sql (to-sql (rental-summary (Copy)))) > (define item-sql (to-sql (rental-summary (Item)))) > (displayln copy-sql)
select
_rental.__INJECT2 as NumRentals
, _copy.*
from Copy _copy
inner join (
select
_rental.CopyId as __INJECT1
, count(*) as __INJECT2
from Rental _rental
group by _rental.CopyId) _rental
on _rental.__INJECT1 = _copy.CopyId
order by _rental.__INJECT2 desc
> (displayln item-sql)
select
_rental.__INJECT2 as NumRentals
, _item.*
from Item _item
inner join (
select
_copy.ItemId as __INJECT1
, count(*) as __INJECT2
from Rental _rental
inner join Copy _copy
on _copy.CopyId = _rental.CopyId
group by _copy.ItemId) _rental
on _rental.__INJECT1 = _item.ItemId
order by _rental.__INJECT2 desc
If you don’t understand the occurrences of __INJECT, revisit the Injections section.
3.2.4 Derived Scalars
By "derived scalar" I mean any scalar expression that is not a direct access of some database column. For example, an SQL expression like (lineItem.UnitPrice * lineItem.Quantity) would be a derived scalar. My convention dictates that scalars must be singular, meaning that they do not contain plural joins. It would be very confusing if accessing a scalar could increase the number of rows in the result set.
This is a very common type of derived scalar. It says that the ItemName of a Copy or Rental is the ItemName of its (singular) Item:
(field-cases (ItemName x) [(or (Copy? x) (Rental? x)) (ItemName (Item-of/s x))])
But didn’t we already define ItemName earlier, in the Columns section? Yes, but field-cases allows you to append to previously existing definitions, even across files:
> (field-cases (twice x) [(integer? x) (* x 2)])
> (field-cases (twice x) [(string? x) (format "~a ~a" x x)]) > (twice 3) 6
> (twice "pizza") "pizza pizza"
The conditions are evaluated in reverse order. (This allows you to save generated code in one file and manual overrides in another file, which can be handy.) To prove this, let’s override the integer? case of twice, and notice that the string? case still works the same:
> (field-cases (twice x) [(integer? x) (format "Overridden: ~a" (* x 2))]) > (twice 3) "Overridden: 6"
> (twice "pizza") "pizza pizza"
Now let’s do one more derived scalar. This one will involved a grouped join. Remember that grouped joins are singular, so they are allowed in scalars.
(field-cases (NumRentals x) [(or (Item? x) (Copy? x)) (count (Rentals-of/g x))])
Now we can get the number of Rentals for a given Item or Copy. Let’s test both NumRentals and the extended definition of ItemName.
> (define (my-query) (from c (Copy) (select (NumRentals c)) (select (ItemName c)) (select c".*"))) > (displayln (to-sql (my-query)))
select
_rental.__INJECT2
, _item.ItemName
, _copy.*
from Copy _copy
inner join (
select
_rental.CopyId as __INJECT1
, count(*) as __INJECT2
from Rental _rental
group by _rental.CopyId) _rental
on _rental.__INJECT1 = _copy.CopyId
inner join Item _item
on _item.ItemId = _copy.ItemId
It works as expected.
Imagine that as our business grows, we notice that queries including (NumRentals item) are performing poorly, even with proper indexing. So we decide to denormalize, and we add a "NumRentals" column to the Item table, knowing that it might be a few hours out of date. We can update our definition of NumRentals to use this new column, and none of the calling code has to change.