SQL to Blaze¶
This page maps SQL expressions to blaze expressions.
Note
The following SQL expressions are somewhat specific to PostgreSQL, but blaze itself works with any database for which a SQLAlchemy dialect exists.
Prerequisites¶
If you’re interested in testing these against a PostgreSQL database, make sure
you’ve executed the following code in psql
session:
CREATE TABLE df (
id BIGINT,
amount DOUBLE PRECISION,
name TEXT
);
On the blaze side of things, the table below assumes the following code has been executed:
>>> from blaze import symbol, by, join, concat
>>> df = symbol('df', 'var * {id: int64, amount: float64, name: string}')
Note
Certain SQL constructs such as window functions don’t directly correspond to
a particular Blaze expression. Map
expressions are the closest
representation of window functions in Blaze.
Computation | SQL | Blaze |
---|---|---|
Column Arithmetic | select amount * 2 from df
|
df.amount * 2
|
Multiple Columns | select id, amount from df
|
df[['id', 'amount']]
|
Selection | selelct * from df where amount > 300
|
df[df.amount > 300]
|
Group By | select avg(amount) from df group by name
|
by(df.name, amount=df.amount.mean())
|
select avg(amount) from df group by name, id
|
by(merge(df.name, df.id),
amount=df.amount.mean())
|
|
Join | select * from
df inner join df2
on df.name = df2.name
|
join(df, df2, 'name')
|
Map | select amount + 1 over () from df
|
df.amount.map(lambda x: x + 1,
'int64')
|
Relabel Columns | select
id,
name as alias,
amount as dollars
from df
|
df.relabel(name='alias',
amount='dollars')
|
Drop duplicates | select distinct * from df
|
df.distinct()
|
select distinct(name) from df
|
df.name.distinct()
|
|
/* postgresql only */
select distinct on (name) * from
df order by name
|
||
Reductions | select avg(amount) from df
|
df.amount.mean()
|
select amount, count(amount)
from df group by amount
|
df.amount.count_values()
|
|
Concatenate | select * from df
union all
select * from df
|
concat(df, df)
|
Column Type Information | select
column_name,
data_type,
character_maximum_length
from
information_schema.columns
where
table_name = 'df'
|
df.dshape
|
select
column_name,
data_type,
character_maximum_length
from
information_schema.columns
where
table_name = 'df'
and
column_name = 'amount'
|
df.amount.dshape
|