Quickstart =========== This quickstart is here to show some simple ways to get started created and manipulating Blaze Symbols. To run these examples, import blaze as follows. .. code-block:: python >>> from blaze import * Blaze Interactive Data ~~~~~~~~~~~~~~~~~~~~~~ Create simple Blaze expressions from nested lists/tuples. Blaze will deduce the dimensionality and data type to use. .. code-block:: python >>> t = data([(1, 'Alice', 100), ... (2, 'Bob', -200), ... (3, 'Charlie', 300), ... (4, 'Denis', 400), ... (5, 'Edith', -500)], ... fields=['id', 'name', 'balance']) >>> t.peek() id name balance 0 1 Alice 100 1 2 Bob -200 2 3 Charlie 300 3 4 Denis 400 4 5 Edith -500 Simple Calculations ~~~~~~~~~~~~~~~~~~~ Blaze supports simple computations like column selection and filtering with familiar Pandas getitem or attribute syntax. .. code-block:: python >>> t[t.balance < 0] id name balance 0 2 Bob -200 1 5 Edith -500 >>> t[t.balance < 0].name name 0 Bob 1 Edith Stored Data ~~~~~~~~~~~ Define Blaze expressions directly from storage like CSV or HDF5 files. Here we operate on a CSV file of the traditional `iris dataset`_. .. code-block:: python >>> from blaze.utils import example >>> iris = data(example('iris.csv')) >>> iris.peek() sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 Iris-setosa 1 4.9 3.0 1.4 0.2 Iris-setosa 2 4.7 3.2 1.3 0.2 Iris-setosa 3 4.6 3.1 1.5 0.2 Iris-setosa 4 5.0 3.6 1.4 0.2 Iris-setosa 5 5.4 3.9 1.7 0.4 Iris-setosa 6 4.6 3.4 1.4 0.3 Iris-setosa 7 5.0 3.4 1.5 0.2 Iris-setosa 8 4.4 2.9 1.4 0.2 Iris-setosa 9 4.9 3.1 1.5 0.1 Iris-setosa ... Use remote data like SQL databases or Spark resilient distributed data-structures in exactly the same way. Here we operate on a SQL database stored in a `sqlite file`_. .. code-block:: python >>> iris = data('sqlite:///%s::iris' % example('iris.db')) >>> iris.peek() sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 Iris-setosa 1 4.9 3.0 1.4 0.2 Iris-setosa 2 4.7 3.2 1.3 0.2 Iris-setosa 3 4.6 3.1 1.5 0.2 Iris-setosa 4 5.0 3.6 1.4 0.2 Iris-setosa 5 5.4 3.9 1.7 0.4 Iris-setosa 6 4.6 3.4 1.4 0.3 Iris-setosa 7 5.0 3.4 1.5 0.2 Iris-setosa 8 4.4 2.9 1.4 0.2 Iris-setosa 9 4.9 3.1 1.5 0.1 Iris-setosa ... More Computations ~~~~~~~~~~~~~~~~~ Common operations like Joins and split-apply-combine are available on any kind of data .. code-block:: python >>> by(iris.species, # Group by species ... min=iris.petal_width.min(), # Minimum of petal_width per group ... max=iris.petal_width.max()) # Maximum of petal_width per group species max min 0 Iris-setosa 0.6 0.1 1 Iris-versicolor 1.8 1.0 2 Iris-virginica 2.5 1.4 Finishing Up ~~~~~~~~~~~~ Blaze computes only as much as is necessary to present the results on screen. Fully evaluate the computation, returning an output similar to the input type by calling ``compute``. .. code-block:: python >>> t[t.balance < 0].name # Still an Expression name 0 Bob 1 Edith >>> list(compute(t[t.balance < 0].name)) # Just a raw list ['Bob', 'Edith'] Alternatively use the ``odo`` operation to push your output into a suitable container type. .. code-block:: python >>> result = by(iris.species, avg=iris.petal_width.mean()) >>> result_list = odo(result, list) # Push result into a list >>> odo(result, DataFrame) # Push result into a DataFrame species avg 0 Iris-setosa 0.246 1 Iris-versicolor 1.326 2 Iris-virginica 2.026 >>> odo(result, example('output.csv')) # Write result to CSV file .. _`iris dataset`: https://raw.githubusercontent.com/blaze/blaze/master/blaze/examples/data/iris.csv .. _`sqlite file`: https://raw.githubusercontent.com/blaze/blaze/master/blaze/examples/data/iris.db