=============
Basic Queries
=============

Here we give a quick overview of some of the more common query functionality.

We use the well known iris dataset

.. code-block:: python

   >>> from blaze import data
   >>> from blaze.utils import example
   >>> iris = data(example('iris.csv'))
   >>> iris.peek()
       sepal_length  sepal_width  petal_length  petal_width      species
   0            5.1          3.5           1.4          0.2  Iris-setosa
   1            4.9          3.0           1.4          0.2  Iris-setosa
   2            4.7          3.2           1.3          0.2  Iris-setosa
   3            4.6          3.1           1.5          0.2  Iris-setosa
   ...


Column Access
-------------

Select individual columns using attributes

.. code-block:: python

   >>> iris.species
           species
   0   Iris-setosa
   1   Iris-setosa
   2   Iris-setosa
   3   Iris-setosa
   ...

Or item access

.. code-block:: python

   >>> iris['species']
           species
   0   Iris-setosa
   1   Iris-setosa
   2   Iris-setosa
   3   Iris-setosa
   ...

Select many columns using a list of names

.. code-block:: python

   >>> iris[['sepal_length', 'species']]
       sepal_length      species
   0            5.1  Iris-setosa
   1            4.9  Iris-setosa
   2            4.7  Iris-setosa
   3            4.6  Iris-setosa
   ...


Mathematical operations
-----------------------

Use mathematical operators and functions as normal

.. code-block:: python

   >>> from blaze import log
   >>> log(iris.sepal_length * 10)
       sepal_length
   0       3.931826
   1       3.891820
   2       3.850148
   3       3.828641
   ...


Note that mathematical functions like ``log`` should be imported from ``blaze``.
These will translate to ``np.log``, ``math.log``, ``sqlalchemy.sql.func.log``,
etc. based on the backend.


Reductions
----------

As with many Blaze operations reductions like ``sum`` and ``mean`` may be used
either as methods or as base functions.

.. code-block:: python

   >>> iris.sepal_length.mean()  # doctest: +ELLIPSIS
   5.84333333333333...

   >>> from blaze import mean
   >>> mean(iris.sepal_length)  # doctest: +ELLIPSIS
   5.84333333333333...


Split-Apply-Combine
-------------------

The ``by`` operation expresses split-apply-combine computations.  It has the
general format

.. code-block:: python

   >>> by(table.grouping_columns, name_1=table.column.reduction(),
   ...                            name_2=table.column.reduction(),
   ...                            ...)  # doctest: +SKIP

Here is a concrete example.  Find the shortest, longest, and average petal
length by species.

.. code-block:: python

   >>> from blaze import by
   >>> by(iris.species, shortest=iris.petal_length.min(),
   ...                   longest=iris.petal_length.max(),
   ...                   average=iris.petal_length.mean())
              species  average  longest  shortest
   0      Iris-setosa    1.462      1.9       1.0
   1  Iris-versicolor    4.260      5.1       3.0
   2   Iris-virginica    5.552      6.9       4.5

This simple model can be extended to include more complex groupers and more
complex reduction expressions.


Add Computed Columns
--------------------

Add new columns using the ``transform`` function

.. code-block:: python

   >>> transform(iris, sepal_ratio = iris.sepal_length / iris.sepal_width,
   ...                 petal_ratio = iris.petal_length / iris.petal_width)  # doctest: +SKIP
       sepal_length  sepal_width  petal_length  petal_width      species  \
   0            5.1          3.5           1.4          0.2  Iris-setosa
   1            4.9          3.0           1.4          0.2  Iris-setosa
   2            4.7          3.2           1.3          0.2  Iris-setosa
   3            4.6          3.1           1.5          0.2  Iris-setosa

       sepal_ratio  petal_ratio
   0      1.457143     7.000000
   1      1.633333     7.000000
   2      1.468750     6.500000
   3      1.483871     7.500000
   ...


Text Matching
-------------

Match text with glob strings, specifying columns with keyword arguments.

.. code-block:: python

   >>> iris[iris.species.like('*versicolor')]  # doctest: +SKIP
       sepal_length  sepal_width  petal_length  petal_width          species
   50           7.0          3.2           4.7          1.4  Iris-versicolor
   51           6.4          3.2           4.5          1.5  Iris-versicolor
   52           6.9          3.1           4.9          1.5  Iris-versicolor


Relabel Column names
--------------------

.. code-block:: python

   >>> iris.relabel(petal_length='PETAL-LENGTH', petal_width='PETAL-WIDTH')  # doctest: +SKIP
       sepal_length  sepal_width  PETAL-LENGTH  PETAL-WIDTH      species
   0            5.1          3.5           1.4          0.2  Iris-setosa
   1            4.9          3.0           1.4          0.2  Iris-setosa
   2            4.7          3.2           1.3          0.2  Iris-setosa

========
Examples
========

Blaze can help solve many common problems that data analysts and scientists encounter. Here are a few examples of common issues that can be solved using  blaze.

Combining separate, gzipped csv files.
--------------------------------------

.. code-block:: python

   >>> from blaze import odo
   >>> from pandas import DataFrame
   >>> odo(example('accounts_*.csv.gz'), DataFrame)
      id      name  amount
   0   1     Alice     100
   1   2       Bob     200
   2   3   Charlie     300
   3   4       Dan     400
   4   5     Edith     500


Split-Apply-Combine
-------------------

.. code-block:: python

   >>> from blaze import data, by
   >>> t = data('sqlite:///%s::iris' % example('iris.db'))
   >>> t.peek()
       sepal_length  sepal_width  petal_length  petal_width      species
   0            5.1          3.5           1.4          0.2  Iris-setosa
   1            4.9          3.0           1.4          0.2  Iris-setosa
   2            4.7          3.2           1.3          0.2  Iris-setosa
   3            4.6          3.1           1.5          0.2  Iris-setosa
   4            5.0          3.6           1.4          0.2  Iris-setosa
   5            5.4          3.9           1.7          0.4  Iris-setosa
   6            4.6          3.4           1.4          0.3  Iris-setosa
   7            5.0          3.4           1.5          0.2  Iris-setosa
   8            4.4          2.9           1.4          0.2  Iris-setosa
   9            4.9          3.1           1.5          0.1  Iris-setosa
   ...
   >>> by(t.species, max=t.petal_length.max(), min=t.petal_length.min())
              species  max  min
   0      Iris-setosa  1.9  1.0
   1  Iris-versicolor  5.1  3.0
   2   Iris-virginica  6.9  4.5