Quickstart¶
This quickstart is here to show some simple ways to get started created and manipulating Blaze Symbols. To run these examples, import blaze as follows.
>>> from blaze import *
Blaze Interactive Data¶
Create simple Blaze expressions from nested lists/tuples. Blaze will deduce the dimensionality and data type to use.
>>> t = data([(1, 'Alice', 100),
... (2, 'Bob', -200),
... (3, 'Charlie', 300),
... (4, 'Denis', 400),
... (5, 'Edith', -500)],
... fields=['id', 'name', 'balance'])
>>> t.peek()
id name balance
0 1 Alice 100
1 2 Bob -200
2 3 Charlie 300
3 4 Denis 400
4 5 Edith -500
Simple Calculations¶
Blaze supports simple computations like column selection and filtering with familiar Pandas getitem or attribute syntax.
>>> t[t.balance < 0]
id name balance
0 2 Bob -200
1 5 Edith -500
>>> t[t.balance < 0].name
name
0 Bob
1 Edith
Stored Data¶
Define Blaze expressions directly from storage like CSV or HDF5 files. Here we operate on a CSV file of the traditional iris dataset.
>>> from blaze.utils import example
>>> iris = data(example('iris.csv'))
>>> iris.peek()
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
8 4.4 2.9 1.4 0.2 Iris-setosa
9 4.9 3.1 1.5 0.1 Iris-setosa
...
Use remote data like SQL databases or Spark resilient distributed data-structures in exactly the same way. Here we operate on a SQL database stored in a sqlite file.
>>> iris = data('sqlite:///%s::iris' % example('iris.db'))
>>> iris.peek()
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
8 4.4 2.9 1.4 0.2 Iris-setosa
9 4.9 3.1 1.5 0.1 Iris-setosa
...
More Computations¶
Common operations like Joins and split-apply-combine are available on any kind of data
>>> by(iris.species, # Group by species
... min=iris.petal_width.min(), # Minimum of petal_width per group
... max=iris.petal_width.max()) # Maximum of petal_width per group
species max min
0 Iris-setosa 0.6 0.1
1 Iris-versicolor 1.8 1.0
2 Iris-virginica 2.5 1.4
Finishing Up¶
Blaze computes only as much as is necessary to present the results on screen.
Fully evaluate the computation, returning an output similar to the input type
by calling compute
.
>>> t[t.balance < 0].name # Still an Expression
name
0 Bob
1 Edith
>>> list(compute(t[t.balance < 0].name)) # Just a raw list
['Bob', 'Edith']
Alternatively use the odo
operation to push your output into a suitable
container type.
>>> result = by(iris.species, avg=iris.petal_width.mean())
>>> result_list = odo(result, list) # Push result into a list
>>> odo(result, DataFrame) # Push result into a DataFrame
species avg
0 Iris-setosa 0.246
1 Iris-versicolor 1.326
2 Iris-virginica 2.026
>>> odo(result, example('output.csv')) # Write result to CSV file
<odo.backends.csv.CSV object at ...>