7.7
7 Preview of Plisqin 0.2
> | | (require (for-label plisqin/scribblings/racket | plisqin/private/lang/default-require)) |
|
|
|
> | | (current-connection 'cities-example) |
|
|
> | | {if 41.add1 = 21 * 1.add1 | "as it should" | "what is wrong?"} |
|
|
"as it should" |
7.1 Setup and Schema
Plisqin doesn’t require you to define your schema up front,
but this is the fastest way to demonstrate the coolest features.
TODO explain.
|
|
|
|
> | | (def-fields-of Country | CountryId | CountryName | CountryPopulation) |
|
|
> | | (def-fields-of City | CityId | CountryId | CityName | CityPopulation) |
|
|
> | | (def-fields-of Organization | OrgId | OrgShortName) |
|
|
> | | (def-fields-of CountryOrganization | CountryId | OrgId) |
|
|
7.2 Query Fundamentals
The
from macro is used to create a query.
The following code defines a function called
big-cities which
takes no arguments and returns a query:
Love S-expressions? Don’t worry, the syntactic sugar is optional.
You can write (CityName city) instead.
The above query has 4 clauses: an order-by clause, a limit
clause, and 2 select clauses.
You can pass a query into show-table and it will print the results:
> | | (show-table (big-cities)) |
|
|
Show TableShow SQL |
select | city.CityName | , city.CityPopulation | from City city | order by city.CityPopulation desc | limit 10 |
|
|
CityName | CityPopulation | Shanghai | 23390000 | Beijing | 21710000 | Lagos | 21324000 | Tokyo | 13784212 | Tianjin | 13245000 | Guangzhou | 13080500 | Mumbai | 12442373 | São Paulo | 12106920 | Nanyang | 12010000 | Shenzhen | 11908400 |
|
Queries are appendable. This means you can add more clauses to
a query you already defined elsewhere.
For example, let’s start with big-cities and append a
where clause that excludes cities with a population over 20 million:
> | | (show-table | {from city (big-cities) | {where city.CityPopulation <= 20 * 1000 * 1000}}) |
|
|
Show TableShow SQL |
select | city.CityName | , city.CityPopulation | from City city | where (city.CityPopulation <= 20000000) | order by city.CityPopulation desc | limit 10 |
|
|
CityName | CityPopulation | Tokyo | 13784212 | Tianjin | 13245000 | Guangzhou | 13080500 | Mumbai | 12442373 | São Paulo | 12106920 | Nanyang | 12010000 | Shenzhen | 11908400 | Wuhan | 11895000 | Kinshasa | 11855000 | Chengdu | 11050000 |
|
It’s important to understand that the above query has 5 clauses in total:
the where clause plus the 4 clauses from big-cities.
You might be able to foresee that appendable queries can greatly reduce code duplication.
7.3 Joins
Every City has exactly one Country.
Relationships between tables are represented using join.
The following query appends to big-cities, joins the Country table,
and adds a select clause.
> | | (define (explicit-join-example) | {from city (big-cities) | {join country Country | {join-on country.CountryId = city.CountryId}} | {select country.CountryName}}) |
|
|
The above query would work just fine, but joining the Country
of a City is something we will be doing often.
So let’s make it reusable:
> | | (def/append! (Country x) ; the Country of x ... | [(City? x) ; if x is a City ... | ; ... is this join: | {join country Country | {join-on country.CountryId = x.CountryId}}]) |
|
|
The definition of Country has been expanded.
In addition to being a table, Country can now be used as
a function that takes a City and returns a join to the City’s Country.
So now we can use {city.Country} in our queries:
> | | (define (big-cities-with-country) | {from city (big-cities) | {select city.Country.CountryName}}) |
|
|
> | | (show-table (big-cities-with-country)) |
|
|
Show TableShow SQL |
select | city.CityName | , city.CityPopulation | , country.CountryName | from City city | inner join Country country | on (country.CountryId = city.CountryId) | order by city.CityPopulation desc | limit 10 |
|
|
CityName | CityPopulation | CountryName | Shanghai | 23390000 | People's Republic of China | Beijing | 21710000 | People's Republic of China | Lagos | 21324000 | Nigeria | Tokyo | 13784212 | Japan | Tianjin | 13245000 | People's Republic of China | Guangzhou | 13080500 | People's Republic of China | Mumbai | 12442373 | India | São Paulo | 12106920 | Brazil | Nanyang | 12010000 | People's Republic of China | Shenzhen | 11908400 | People's Republic of China |
|
That’s pretty nice, but we can do even better.
Do we really want to say {city.Country.CountryName} every time?
Can we just say {city.CountryName} instead?
Yes, by expanding the definition of CountryName:
> | | (def/append! (CountryName x) ; the CountryName of x ... | [(City? x) ; if x is a City ... | ; ... is this expression: | {x.Country.CountryName}]) |
|
|
And that should do it. Let’s try it out:
> | | (define (big-cities-outside-china) | {from city (big-cities-with-country) | {where city.CountryName not-like "%china"}}) |
|
|
> | | (show-table (big-cities-outside-china)) |
|
|
Show TableShow SQL |
select | city.CityName | , city.CityPopulation | , country.CountryName | from City city | inner join Country country | on (country.CountryId = city.CountryId) | where (country.CountryName not like '%china') | order by city.CityPopulation desc | limit 10 |
|
|
CityName | CityPopulation | CountryName | Lagos | 21324000 | Nigeria | Tokyo | 13784212 | Japan | Mumbai | 12442373 | India | São Paulo | 12106920 | Brazil | Kinshasa | 11855000 | Democratic Republic of the Congo | Lahore | 10665000 | Pakistan | Chennai | 10265000 | India | Seoul | 9806538 | South Korea | Jakarta | 9769000 | Indonesia | Hyderabad | 9305000 | India |
|
7.4 Timeout for more Schema
Let’s take a peek at the Organization table:
> | | (show-table {from org Organization}) |
|
|
Show TableShow SQL |
select org.* | from Organization org |
|
|
OrgId | OrgShortName | 1 | EU | 2 | NATO | 3 | OPEC |
|
A Country can have multiple Organizations.
For example, Belgium is a member of both the EU and NATO.
Obviously, an Organization can have multiple Countries.
This many-to-many relationship requires a mapping table, CountryOrganization.
> | | (def/append! (CountryOrganizations x) | [(Country? x) | {join co CountryOrganization | {join-on co.CountryId = x.CountryId}}] | [(Organization? x) | {join co CountryOrganization | {join-on co.OrgId = x.OrgId}}]) |
|
|
> | | (def/append! (Organizations x) | [(Country? x) | {join org Organization | {join-on org.OrgId = x.CountryOrganizations.OrgId}}]) |
|
|
> | | (def/append! (Countries x) | [(Organization? x) | {join country Country | {join-on country.CountryId = x.CountryOrganizations.CountryId}}]) |
|
|
Now we are able to use {country.Organizations} and
{organization.Countries} in our queries.
Let me just prove what I said about Belgium is true:
> | | (define (belgium-orgs) | {from co Country | {where co.CountryName = "Belgium"} | {select co.CountryName} | {select co.Organizations.OrgShortName}}) |
|
|
> | | (show-table (belgium-orgs)) |
|
|
Show TableShow SQL |
select | co.CountryName | , org.OrgShortName | from Country co | inner join CountryOrganization co1 | on (co1.CountryId = co.CountryId) | inner join Organization org | on (org.OrgId = co1.OrgId) | where (co.CountryName = 'Belgium') |
|
|
CountryName | OrgShortName | Belgium | EU | Belgium | NATO |
|
7.5 Grouped Joins and Aggregates
Consider this code:
CitiesG means "a Group of Cities."
The general naming convention is PluralNounsG meaning "a Group of Plural Nouns."
> | | (def/append! (CitiesG x) | [(Country? x) | {join city City | {group-by city.CountryId} | {join-on city.CountryId = x.CountryId}}]) |
|
|
I call CitiesG a "grouped join."
An important quality of a grouped join is that it does not increase the cardinality
of the result set with respect to its argument(s).
In the above example, the argument x is tested to be a Country.
The join is grouped by CountryId.
Therefore, there will be at most 1 group per Country.
Grouped joins are used with aggregate functions.
In the following example, count and sum are the aggregates.
Notice that CitiesG always occurs inside an aggregate:
TODO it would probably be good to mention that grouping and aggregating
is compatible with the SQL way of doing it.
> | | (show-table (city-stats-by-country)) |
|
|
Show TableShow SQL |
select | co.CountryName | , city.__INJECT2 as CountCitiesG | , city.__INJECT3 as SumCityPopulation | from Country co | inner join ( | select | city.CountryId as __INJECT1 | , count(*) as __INJECT2 | , sum(city.CityPopulation) as __INJECT3 | from City city | group by city.CountryId) city | on (city.__INJECT1 = co.CountryId) | order by city.__INJECT2 desc | limit 10 |
|
|
CountryName | CountCitiesG | SumCityPopulation | United States of America | 3096 | 166081290 | Germany | 910 | 46732208 | Japan | 797 | 131330481 | Brazil | 457 | 72116443 | People's Republic of China | 292 | 1207807867 | India | 242 | 162615627 | Canada | 185 | 21393821 | Czech Republic | 182 | 5965467 | Hungary | 177 | 6099159 | Romania | 173 | 7136976 |
|
OK, so we are seeing aggregated City information by Country.
Could we not also see aggregated City information by Organization?
Yes, because just like a Country, an Organization also has a group of Cities.
So first, let’s extend the definition of CitiesG:
> | | (def/append! (CitiesG x) | [(Organization? x) | {join ct City | {group-by ct.Country.Organizations.OrgId} | {join-on ct.Country.Organizations.OrgId = x.OrgId}}]) |
|
|
The above join is a valid grouped join because it will return at most one
group of Cities per Organization. (Note that a single City might belong to
more than one Organization, but that is fine.)
Now if we review
city-stats-by-country, we see that the only
Country-specific clause is
{select co.CountryName}.
Let’s remove that clause to make a reusable version:
Now city-stats should work on any X for which CitiesG is defined!
Which makes it easy to implement the "by Country" and "by Organization" variants:
|
> | | (define (city-stats-by-org) | {from org (city-stats Organization) | {select org.OrgShortName}}) |
|
|
> | | (show-table (city-stats-by-country)) |
|
|
Show TableShow SQL |
select | city.__INJECT2 as CountCitiesG | , city.__INJECT3 as SumCityPopulation | , x.CountryName | , x.CountryId | from Country x | inner join ( | select | city.CountryId as __INJECT1 | , count(*) as __INJECT2 | , sum(city.CityPopulation) as __INJECT3 | from City city | group by city.CountryId) city | on (city.__INJECT1 = x.CountryId) | order by city.__INJECT2 desc | limit 10 |
|
|
CountCitiesG | SumCityPopulation | CountryName | CountryId | 3096 | 166081290 | United States of America | 3 | 910 | 46732208 | Germany | 20 | 797 | 131330481 | Japan | 12 | 457 | 72116443 | Brazil | 5 | 292 | 1207807867 | People's Republic of China | 1 | 242 | 162615627 | India | 2 | 185 | 21393821 | Canada | 42 | 182 | 5965467 | Czech Republic | 91 | 177 | 6099159 | Hungary | 100 | 173 | 7136976 | Romania | 68 |
|
> | | (show-table (city-stats-by-org)) |
|
|
Show TableShow SQL |
select | ct.__INJECT2 as CountCitiesG | , ct.__INJECT3 as SumCityPopulation | , x.OrgShortName | from Organization x | inner join ( | select | org.OrgId as __INJECT1 | , count(*) as __INJECT2 | , sum(ct.CityPopulation) as __INJECT3 | from City ct | inner join Country country | on (country.CountryId = ct.CountryId) | inner join CountryOrganization co | on (co.CountryId = country.CountryId) | inner join Organization org | on (org.OrgId = co.OrgId) | group by org.OrgId) ct | on (ct.__INJECT1 = x.OrgId) | order by ct.__INJECT2 desc | limit 10 |
|
|
CountCitiesG | SumCityPopulation | OrgShortName | 5609 | 364134810 | NATO | 2446 | 165607188 | EU | 275 | 142308308 | OPEC |
|
TODO guide what to read next.
> | | (define (case-example) | {from co Country | {define (million x) | {x * 1000 * 1000}} | {define (Size co) | {case-when | ; TODO need string literal handling here | {when co.CountryPopulation > 100.million then "'huge'"} | {when co.CountryPopulation > 30.million then "'big'"} | {when co.CountryPopulation > 1.million then "'medium'"} | {else "'small'"}}} | {group-by co.Size} | {select co.Size" as Size"} | {select co.count" as Count"}}) |
|
|
> | | (show-table (case-example)) |
|
|
Show TableShow SQL |
select | case when (co.CountryPopulation > 100000000) then 'huge' when (co.CountryPopulation > 30000000) then 'big' when (co.CountryPopulation > 1000000) then 'medium' else 'small' end as Size | , count(*) as Count | from Country co | group by case when (co.CountryPopulation > 100000000) then 'huge' when (co.CountryPopulation > 30000000) then 'big' when (co.CountryPopulation > 1000000) then 'medium' else 'small' end |
|
|
Size | Count | big | 36 | huge | 15 | medium | 123 | small | 43 |
|