==============================
Interacting with SQL Databases
==============================

How to
------

Typically one provides a SQL connection string to the ``data`` constructor

.. code-block:: python

   >>> db = data('postgresql:///user:pass@hostname')  # doctest: +SKIP

   or

   >>> t = data('postgresql://user:pass@hostname::my-table-name')  # doctest: +SKIP

Alternatively users familiar with SQLAlchemy can pass any SQLAlchemy engine,
metadata, or Table objects to ``data``.  This can be useful if you need to
specify more information that does not fit comfortably into a URI (like a
desired schema.)

.. code-block:: python

   >>> import sqlalchemy  # doctest: +SKIP
   >>> engine = sqlalchemy.create_engine('postgreqsql://hostname')  # doctest: +SKIP

   >>> db = data(engine)  # doctest: +SKIP

How does it work?
-----------------

As you manipulate a Blaze expression Blaze in turn manipulates a SQLAlchemy
expression.  When you ask for a result SQLAlchemy generates the SQL appropriate
for your database and sends the query to the database to be run.


What databases does Blaze support?
----------------------------------

Blaze derives all SQL support from SQLAlchemy so really one should ask, *What
databases does SQLAlchemy support?*.  The answer is *quite a few* in the main
SQLAlchemy project and *most* when you include third party libraries.

However, URI support within Blaze is limited to a smaller set.  For exotic
databases you may have to create a ``sqlalchemy.engine`` explicitly as shown
above.


What operations work on SQL databases?
--------------------------------------

Most tabular operations, but not all.  SQLAlchemy translation is a high
priority. Failures include array operations like slicing and dot products don't
make sense in SQL.  Additionally some operations like datetime access are not
yet well supported through SQLAlchemy.  Finally some databases, like SQLite,
have limited support for common mathematical functions like ``sin``.


How can I try this out?
-----------------------

The easiest way to play with SQL is to download a SQLite database.  We
recommend the `Lahman baseball statistics database`_.  After downloading one could connect blaze
to that database with the following code

.. code-block:: python

   >>> from blaze import data
   >>> db = data('sqlite:///Downloads/lahman2013.sqlite') # doctest: +SKIP
   >>> db.<tab>  # see available tables  # doctest: +SKIP
   db.AllstarFull          db.FieldingOF           db.Schools              db.fields
   db.Appearances          db.FieldingPost         db.SchoolsPlayers       db.isidentical
   db.AwardsManagers       db.HallOfFame           db.SeriesPost           db.like
   db.AwardsPlayers        db.Managers             db.Teams                db.map
   db.AwardsShareManagers  db.ManagersHalf         db.TeamsFranchises      db.relabel
   db.AwardsSharePlayers   db.Master               db.TeamsHalf            db.schema
   db.Batting              db.Pitching             db.apply                db.temp
   db.BattingPost          db.PitchingPost         db.data
   db.Fielding             db.Salaries             db.dshape
   >>> db.Teams.peek()  # view one particular database  # doctest: +SKIP
       yearID lgID teamID franchID divID  Rank   G  Ghome   W   L     ...       \
   0     1871   NA    BS1      BNA  None     3  31    NaN  20  10     ...
   1     1871   NA    CH1      CNA  None     2  28    NaN  19   9     ...
   2     1871   NA    CL1      CFC  None     8  29    NaN  10  19     ...
   3     1871   NA    FW1      KEK  None     7  19    NaN   7  12     ...

       DP    FP                     name                               park  \
   0  NaN  0.83     Boston Red Stockings                South End Grounds I
   1  NaN  0.82  Chicago White Stockings            Union Base-Ball Grounds
   2  NaN  0.81   Cleveland Forest Citys       National Association Grounds
   3  NaN  0.80     Fort Wayne Kekiongas                     Hamilton Field

       attendance  BPF  PPF  teamIDBR  teamIDlahman45  teamIDretro
   0          NaN  103   98       BOS             BS1          BS1
   1          NaN  104  102       CHI             CH1          CH1
   2          NaN   96  100       CLE             CL1          CL1
   3          NaN  101  107       KEK             FW1          FW1
   ...

One can then query and compute results as with a normal blaze workflow.


Connecting to a Schema Other than ``public`` with PostgreSQL
------------------------------------------------------------

To connect to a non-default schema, one may pass a ``sqlalchemy.MetaData``
object to ``data``. For example:


.. code-block:: python

   >>> from blaze import data
   >>> from sqlalchemy import MetaData
   >>> ds = data(MetaData('postgresql://localhost/test', schema='my_schema'))
   >>> ds.dshape  # doctest: +SKIP
   dshape("{table_a: var * {a: ?int32}, table_b: var * {b: ?int32}}")

.. _`Lahman baseball statistics database`: https://github.com/jknecht/baseball-archive-sqlite/raw/master/lahman2013.sqlite


Foreign Keys and Automatic Joins
--------------------------------

Often times one wants to access the columns of a table into which we have a foreign key.

For example, given a ``products`` table with this schema:

  .. code-block:: sql

     create table products (
         id integer primary key,
         name text
     )

and an ``orders`` table with this schema:

  .. code-block:: sql

     create table orders (
         id integer primary key,
         product_id integer references (id) products,
         quantity integer
     )

we want to get the name of the products in every order. In SQL, you would write the following join:

  .. code-block:: sql

     select
         o.id, p.name
     from
         orders o
             inner join
         products p
             on o.product_id = p.id


This is fairly straightforward. However, when you have more than two joins the SQL
gets unruly and hard to read. What we really want is a syntactically simply way to
follow the chain of foreign key relationships and be able to access columns in
foreign tables without having to write a lot of code. This is where blaze comes in.

Blaze can generate the above joins for you, so instead of writing a bunch of joins in
SQL you can simply access the columns of a foreign table as if they were columns on
the foreign key column.

The previous example in blaze looks like this:

  .. code-block:: python

     >>> from blaze import data, compute
     >>> d = data('postgresql://localhost/db')  # doctest: +SKIP
     >>> d.fields  # doctest: +SKIP
     ['products', 'orders']
     >>> expr = d.orders.product_id.name  # doctest: +SKIP
     >>> print(compute(expr))  # doctest: +SKIP
     SELECT orders.id, p.name
     FROM orders as o, products as p
     WHERE o.product_id = p.id


.. warning::

   The above feature is very experimental right now. We would
   appreciate bug reports and feedback on the API.