SQL to Blaze¶

This page maps SQL expressions to blaze expressions.

Note

The following SQL expressions are somewhat specific to PostgreSQL, but blaze itself works with any database for which a SQLAlchemy dialect exists.

Prerequisites¶

If you’re interested in testing these against a PostgreSQL database, make sure you’ve executed the following code in psql session:

CREATE TABLE df (
    id BIGINT,
    amount DOUBLE PRECISION,
    name TEXT
);

On the blaze side of things, the table below assumes the following code has been executed:

>>> from blaze import symbol, by, join, concat
>>> df = symbol('df', 'var * {id: int64, amount: float64, name: string}')

Note

Certain SQL constructs such as window functions don’t directly correspond to a particular Blaze expression. Map expressions are the closest representation of window functions in Blaze.

Computation	SQL	Blaze
Column Arithmetic	select amount * 2 from df	df.amount * 2
Multiple Columns	select id, amount from df	df[['id', 'amount']]
Selection	selelct * from df where amount > 300	df[df.amount > 300]
Group By	select avg(amount) from df group by name	by(df.name, amount=df.amount.mean())
Group By	select avg(amount) from df group by name, id	by(merge(df.name, df.id), amount=df.amount.mean())
Join	select * from df inner join df2 on df.name = df2.name	join(df, df2, 'name')
Map	select amount + 1 over () from df	df.amount.map(lambda x: x + 1, 'int64')
Relabel Columns	select id, name as alias, amount as dollars from df	df.relabel(name='alias', amount='dollars')
Drop duplicates	select distinct * from df	df.distinct()
	select distinct(name) from df	df.name.distinct()
	/* postgresql only / select distinct on (name) from df order by name
Reductions	select avg(amount) from df	df.amount.mean()
Reductions	select amount, count(amount) from df group by amount	df.amount.count_values()
Concatenate	select * from df union all select * from df	concat(df, df)
Column Type Information	select column_name, data_type, character_maximum_length from information_schema.columns where table_name = 'df'	df.dshape
Column Type Information	select column_name, data_type, character_maximum_length from information_schema.columns where table_name = 'df' and column_name = 'amount'	df.amount.dshape