SQL to Blaze
============

This page maps SQL expressions to blaze expressions.

.. note::

   The following SQL expressions are somewhat specific to PostgreSQL, but blaze
   itself works with any database for which a SQLAlchemy dialect exists.

Prerequisites
-------------

If you're interested in testing these against a PostgreSQL database, make sure
you've executed the following code in ``psql`` session:

.. code-block:: sql

   CREATE TABLE df (
       id BIGINT,
       amount DOUBLE PRECISION,
       name TEXT
   );

On the blaze side of things, the table below assumes the following code has
been executed:

.. code-block:: python

   >>> from blaze import symbol, by, join, concat
   >>> df = symbol('df', 'var * {id: int64, amount: float64, name: string}')

.. note::

   Certain SQL constructs such as window functions don't directly correspond to
   a particular Blaze expression. ``Map`` expressions are the closest
   representation of window functions in Blaze.


+-----------------+-------------------------------------------------+-----------------------------------------+
| Computation     | SQL                                             | Blaze                                   |
+=================+=================================================+=========================================+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
| Column          |                                                 |                                         |
| Arithmetic      |    select amount * 2 from df                    |    df.amount * 2                        |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
| Multiple        |                                                 |                                         |
| Columns         |    select id, amount from df                    |    df[['id', 'amount']]                 |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Selection       |    selelct * from df where amount > 300         |    df[df.amount > 300]                  |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|  Group By       |                                                 |                                         |
|                 |    select avg(amount) from df group by name     |    by(df.name, amount=df.amount.mean()) |
|                 |                                                 |                                         |
|                 +-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
|                 |    select avg(amount) from df group by name, id |    by(merge(df.name, df.id),            |
|                 |                                                 |       amount=df.amount.mean())          |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
| Join            |                                                 |                                         |
|                 |    select * from                                |    join(df, df2, 'name')                |
|                 |        df inner join df2                        |                                         |
|                 |    on df.name = df2.name                        |                                         |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Map             |    select amount + 1 over () from df            |    df.amount.map(lambda x: x + 1,       |
|                 |                                                 |                  'int64')               |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Relabel Columns |    select                                       |    df.relabel(name='alias',             |
|                 |        id,                                      |               amount='dollars')         |
|                 |        name as alias,                           |                                         |
|                 |        amount as dollars                        |                                         |
|                 |     from df                                     |                                         |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Drop duplicates |    select distinct * from df                    |    df.distinct()                        |
|                 +-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
|                 |    select distinct(name) from df                |    df.name.distinct()                   |
|                 +-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block::python                   |
|                 |                                                 |                                         |
|                 |    /* postgresql only */                        |    # postgresql only                    |
|                 |    select distinct on (name) * from             |    df.sort(df.name).distinct(df.name)   |
|                 |    df order by name                             |    df.sort('name').distinct('name')     |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Reductions      |    select avg(amount) from df                   |    df.amount.mean()                     |
|                 +-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
|                 |    select amount, count(amount)                 |    df.amount.count_values()             |
|                 |    from df group by amount                      |                                         |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
| Concatenate     |    select * from df                             |    concat(df, df)                       |
|                 |    union all                                    |                                         |
|                 |    select * from df                             |                                         |
+-----------------+-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
|                 |    select                                       |    df.dshape                            |
|                 |        column_name,                             |                                         |
|                 |        data_type,                               |                                         |
|                 |        character_maximum_length                 |                                         |
|                 |    from                                         |                                         |
|                 |        information_schema.columns               |                                         |
| Column Type     |    where                                        |                                         |
| Information     |        table_name = 'df'                        |                                         |
|                 +-------------------------------------------------+-----------------------------------------+
|                 | .. code-block:: sql                             | .. code-block:: python                  |
|                 |                                                 |                                         |
|                 |    select                                       |    df.amount.dshape                     |
|                 |        column_name,                             |                                         |
|                 |        data_type,                               |                                         |
|                 |        character_maximum_length                 |                                         |
|                 |    from                                         |                                         |
|                 |        information_schema.columns               |                                         |
|                 |    where                                        |                                         |
|                 |        table_name = 'df'                        |                                         |
|                 |            and                                  |                                         |
|                 |        column_name = 'amount'                   |                                         |
+-----------------+-------------------------------------------------+-----------------------------------------+