Release Notes¶
Release 0.9.1¶
Release: | 0.9.1 |
---|---|
Date: | December 17th, 2015 |
New Expressions¶
Improved Expressions¶
New Backends¶
- Initial support for dask.dataframe has been added, see (#1317). Please send feedback via an issue or pull request if we missed any expressions you need.
Improved Backends¶
- Blaze Server now supports dynamically adding datasets (#1329).
- Two new keyword-only arguments are added to compute() for use when computing against a Client object:
  - compute_kwargs: This is a dictionary to send to the server to use as keyword arguments when calling compute on the server.
  - odo_kwargs: This is a dictionary to send to the server to use as keyword arguments when calling odo on the server.

  This extra information is completely optional and will have different meanings based on the backend of the data on the server (#1342).
Experimental Features¶
- There is now support for joining tables from multiple sources. This is very experimental right now, so use it at your own risk. It currently only works with things that fit in memory (#1282).
- Foreign columns in database tables that have foreign key relationships can now be accessed with a more concise syntax (#1192).
API Changes¶
- Removed support for Python 2.6 (#1267).
- Removed support for Python 3.3 (#1270).
- When a CSV file consists of all strings, you must pass has_header=True when using the Data constructor (#1254).
- Comparing date and datetime datashaped things to the empty string now raises a TypeError (#1308).
- Like expressions behave like a predicate, and operate on columns, rather than performing the selection for you on a table (#1333, #1340).
- blaze.server.Server.run() no longer retries binding to a new port by default. Also, positional arguments are no longer forwarded to the inner flask app’s run method. All keyword arguments not consumed by the blaze server run are still forwarded (#1316).
- Server represents datashapes in a canonical form with consistent linebreaks for use by non-Python clients (#1361).
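The change to port binding described above can be pictured with a small standalone sketch. This is not the blaze implementation: bind_server_socket is a hypothetical helper written only for illustration, and it assumes that "retrying" means trying the next port number. With retry=False an occupied port raises an error; with retry=True the helper warns and moves on.

```python
import errno
import socket
import warnings


def bind_server_socket(host, port, retry=False):
    """Bind a TCP socket to (host, port) and return (socket, port).

    Hypothetical helper, not part of blaze. With retry=False an
    occupied port raises OSError. With retry=True the next port
    number is tried, and a warning is issued for each failed attempt.
    """
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            # Only "address already in use" is worth retrying.
            if not retry or exc.errno != errno.EADDRINUSE:
                raise
            warnings.warn(
                "port %d is in use, trying port %d" % (port, port + 1)
            )
            port += 1
        else:
            return sock, port
```

Defaulting retry to False makes a port conflict visible immediately instead of silently serving on a different port than the one requested.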
Bug Fixes¶
- Fixed a bug where Merge expressions would unpack option types in their fields. This could cause you to have a table where expr::{a: int32} but expr.a::?int32. Note that the dotted access is an option (#1262).
- Explicitly set Node.__slots__ and Expr.__slots__ to (). This ensures instances of slotted subclasses like Join do not have a useless empty __dict__ attribute (#1274 and #1268).
- Fixed a bug that prevented creating an InteractiveSymbol that wrapped nan if the dshape was datetime. This now correctly coerces to NaT (#1272).
- Fixed an issue where blaze client/server could not use isin expressions because the frozenset failed to serialize. This also added support for rich serialization over json for things like datetimes (#1255).
- Fixed a bug where len would fail on an interactive expression whose resources were sqlalchemy objects (#1273).
- Use aliases instead of common table expressions (CTEs) because MySQL doesn’t have CTEs (#1278).
- Fixed a bug where we couldn’t display an empty string identifier in interactive mode (#1279).
- Fixed a bug where comparisons with optionals that should have resulted in optionals did not (#1292).
- Fixed a bug where Join.schema would not always be instantiated (#1288).
- Fixed a bug where comparisons to an empty string magically converted the empty string to None (#1308).
- Fix the retry kwarg to the blaze server. When retry is False, an exception is now properly raised if the port is in use (#1316).
- Fixed a bug where leaves that only appeared in the predicate of a selection would not be in scope in time to compute the predicate. This would cause whole expressions like a[a > b] to fail because b was not in scope (#1275).
- Fix a broken test on machines that don’t allow postgres to read from the local filesystem (#1323).
- Updated a test to reflect changes from odo #366 (#1323).
- Fixed pickling of blaze expressions with interactive symbols (#1319).
- Fixed repring partials in blaze expression to show keyword arguments (#1319).
- Fixed a memory leak that would preserve the lifetime of any blaze expression that had cached an attribute access (#1335).
- Fixed a bug where common_subexpression() gave the wrong answer (#1325, #1338).
- BinaryMath operations without numba installed were failing (#1343).
- win32 tests were failing for hypot and atan2 due to slight differences in numpy vs numba implementations of those functions (#1343).
- Only start up a ThreadPool when using the h5py backend (#1347, #1331).
- Fix return type for sum and mean reductions whose children have a Decimal dshape.
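The __slots__ fix in the list above relies on a general Python rule: an instance only goes without a __dict__ when every class in its inheritance chain defines __slots__. The sketch below shows that rule with stand-in classes; Node, Expr and Join here are simplified placeholders that borrow the names from the note, not the blaze classes, and UnslottedBase and SlottedChild are made up for contrast.

```python
class Node(object):
    # Empty slots on the base keep subclasses free of a per-instance __dict__.
    __slots__ = ()


class Expr(Node):
    __slots__ = ()


class Join(Expr):
    # Stand-in for a slotted subclass; not the blaze Join.
    __slots__ = ('lhs', 'rhs')

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


class UnslottedBase(object):
    # No __slots__ here, so every subclass instance gets a __dict__.
    pass


class SlottedChild(UnslottedBase):
    __slots__ = ('lhs', 'rhs')


join_has_dict = hasattr(Join('a', 'b'), '__dict__')
child_has_dict = hasattr(SlottedChild(), '__dict__')
print(join_has_dict)   # False: every class in the chain is slotted
print(child_has_dict)  # True: the unslotted base reintroduces __dict__
```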
Miscellaneous¶
- blaze.server.Server.run() now uses warnings.warn() instead of print when it fails to bind to a port and is retrying (#1316).
- Make expressions (subclasses of Expr) weakly referenceable (#1319).
- Memoize dshape and schema methods (#1319).
- Use pandas.DataFrame.sort_values() with pandas version >= 0.17.0 (#1321).
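For context on the weak-reference item above: in Python, instances of a class that defines __slots__ cannot be weakly referenced unless '__weakref__' is one of the slots. The following is a generic sketch of that rule and of a weak-value cache that does not keep its entries alive. The class names are made up, and nothing here is taken from the blaze source.

```python
import gc
import weakref


class PlainSlotted(object):
    # Slotted and without '__weakref__': weak references are refused.
    __slots__ = ('value',)


class Referenceable(object):
    # Adding '__weakref__' to the slots re-enables weak references.
    __slots__ = ('value', '__weakref__')


def can_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


# A weak-value cache holds entries only while something else keeps the
# cached object alive, so the cache itself cannot extend its lifetime.
cache = weakref.WeakValueDictionary()
expr = Referenceable()
cache['key'] = expr
alive_before = 'key' in cache
del expr
gc.collect()  # make collection explicit for non-refcounting interpreters
alive_after = 'key' in cache
```

Here alive_before is True and alive_after is False, because dropping the last strong reference lets the cached object be collected.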
Release 0.8.3¶
Release: | 0.8.3 |
---|---|
Date: | September 15, 2015 |
New Expressions¶
Improved Expressions¶
- Distinct expressions now support an on parameter to allow distinct on a subset of columns (#1159).
- Reduction instances are now named as their class name if their _child attribute is named '_' (#1198).
- Join expressions now promote the types of the fields being joined on. This allows us to join things like int32 and int64 and have the result be an int64. This also allows us to join any type a with ?a (#1193, #1218).
New Backends¶
Improved Backends¶
API Changes¶
- Serialization format in blaze server is now passed in as a mimetype (#1176).
- We only allow and use HTTP POST requests when sending a computation to Blaze server for consistency with the HTTP spec (#1172).
- Allow Client objects to explicitly disable verification of ssl certificates by passing verify_ssl=False (#1170).
- Enable basic auth for the blaze server. The server now accepts an authorization keyword which must be a callable that accepts an object holding the username and password, or None if no auth was given, and returns a bool indicating if the request should be allowed. Client objects can pass an optional auth keyword which should be a tuple of (username, password) to send to the server (#1175).
- We now allow Distinct expressions on ColumnElement to be more general and let things like sa.sql.elements.Label objects through (#1212).
- Methods now take priority over field names when using attribute access for Field instances to fix a bug that prevented accessing the method at all (#1204). Here’s an example of how this works:

      >>> from blaze import symbol
      >>> t = symbol('t', 'var * {max: float64, isin: int64, count: int64}')
      >>> t['count'].max()
      t.count.max()
      >>> t.count()  # calls the count method on t
      t.count()
      >>> t.count.max()  # AttributeError
      Traceback (most recent call last):
        ...
      AttributeError: ...
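As a rough illustration of the authorization callable described in the basic auth item above, the sketch below accepts either None or an object with username and password attributes and returns a bool. The attribute names, the Credentials stand-in and the in-memory user table are assumptions made for this example; the note itself only says the object "holds" the username and password.

```python
import hmac
from collections import namedtuple

# Hypothetical stand-in for the object handed to the callable.
Credentials = namedtuple('Credentials', ['username', 'password'])

# Made-up user table for the example; a real deployment would not
# keep plain-text passwords in source code.
_USERS = {'alice': 's3cret'}


def authorization(auth):
    """Return True if the request should be allowed, False otherwise."""
    if auth is None:
        # No credentials were sent with the request.
        return False
    expected = _USERS.get(auth.username)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected.encode('utf-8'), auth.password.encode('utf-8')
    )


# The matching client-side value would be a (username, password) tuple.
client_auth = ('alice', 's3cret')
```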
Bug Fixes¶
- Upgrade versioneer so that our version string is now PEP 440 compliant (#1171)
- Computed columns (e.g., the result of a call to transform()) can now be accessed via standard attribute access when using the SQL backend (#1201).
- Fixed a bug where blaze server was depending on an implementation detail of CPython regarding builtins (#1196).
- Fixed incorrect SQL generated by count on a subquery (#1202).
- Fixed an ImportError generated by an API change in dask.
- Fixed an issue where columns were getting trampled if there were column name collisions in a sql join (#1208).
- Fixed an issue where arithmetic in a Merge expression wouldn’t work because arithmetic wasn’t defined on sa.sql.Select objects (#1207).
- Fixed a bug where the wrong value was being passed into time() (#1213).
- Fixed a bug in sql relabel that prevented relabeling anything that generated a subselect (#1216).
- Fixed a bug where methods weren’t accessible on fields with the same name (#1204).
- Fixed a bug where optimized expressions going into a pandas group by were incorrectly assigning extra values to the child DataFrame (#1221).
- Fixed a bug where multiple same-named operands caused incorrect scope to be constructed, ultimately resulting in incorrect results on expressions like x + x + x (#1227). Thanks to @llllllllll and @jcrist for discussion around solving the issue.
- Fixed a bug where minute() and Minute were not being exported, which made them unusable from the blaze server (#1232).
- Fixed a bug where repr was being called on data resources rather than string, which caused massive slowdowns on largish expressions running against blaze server (#1240, #1247).
- Skip a test on Win32 + Python 3.4 and PyTables until this gets sorted out on the library side (#1251).
Release 0.8.1¶
Release: | 0.8.1 |
---|---|
Date: | July 7th, 2015 |
New Expressions¶
- String arithmetic is now possible across the numpy and pandas backends via the + (concatenation) and * (repeat) operators (#1058).
- Datetime arithmetic is now available (#1112).
- Add a Concat expression that implements Union-style operations (#1128).
- Add a Coerce expression that casts expressions to a different datashape. This maps to astype in numpy and cast in SQL (#1137).
Improved Expressions¶
New Backends¶
None
Improved Backends¶
Blaze Server¶
- Tie blueprint registration to data registration (#1061).
- Don’t catch import error when flask doesn’t exist, since blaze does this in its __init__.py (#1087).
- Multiple serialization formats including JSON, pickle, and msgpack are now available. Additionally, one can add custom serialization formats with this implementation (#1102, #1122).
- Add a 'names' field to the response of the compute.<format> route for Bokeh compatibility (#1129).
- Add cross origin resource sharing for Bokeh compatibility (#1134).
- Add a command line interface (#1115).
- Add a way to tell the blaze server command line interface what to serve via a YAML file (#1115).
SQL¶
- Use aliases to allow expressions on the SQL backend that involve a multiple step reduction operation (#1066, #1126).
- Fix unary not operator ~ (#1091).
- Postgres uses == to compare NaN so we do it that way as well for the postgresql backend (#1123).
- Find table inside non-default schema when serving off a SQLAlchemy MetaData object (#1145).
API Changes¶
Bug Fixes¶
- Handle SQLAlchemy API churn around reference of ColumnElement objects in the 1.0.x series (#1071, #1076).
- Fixed an obscure hashing bug that occurred when passing in both a pandas Timestamp and a datetime.datetime object. Both objects hash to the same value but don’t necessarily compare equal; this makes Python call __eq__, which caused an Eq expression to be constructed (#1097).
- Properly handle And expressions that involve the same field in MongoDB (#1099).
- Handle Dask API changes (#1114).
- Use the date function in SQLAlchemy when getting the date attribute of a datetime dshaped expression. Previously this was calling extract, which is incorrect for the postgres backend (#1120).
- Fix API compatibility with different versions of psutil (#1136).
- Use explicit int64 comparisons on Windows, since the default values may be different (#1148).
- Fix name attribute propagation in pandas Series objects (#1152).
- Raise a more informative error when trying to subset with an unsupported expression in the MongoDB backend (#1155).
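The mechanism behind the hashing bug in the list above can be reproduced in plain Python: when two distinct objects have the same hash, a dict lookup falls back to __eq__, and if __eq__ builds an expression object instead of returning a bool, that object is created as a side effect and then judged by its truthiness. The classes below are made-up stand-ins for illustration, not the blaze or pandas types involved in the original report.

```python
class Eq(object):
    """Stand-in for an expression node created by an overloaded ==."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs


constructed = []  # records every Eq built during the demo


class Stamp(object):
    """Made-up value type whose == builds an expression, not a bool."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Same hash as the wrapped value, so Stamp(1) collides with 1.
        return hash(self.value)

    def __eq__(self, other):
        node = Eq(self, other)
        constructed.append(node)
        return node


table = {Stamp(1): 'stored'}

# The key 1 hashes like Stamp(1) but is a different object, so the dict
# has to call __eq__ to decide whether the two keys match.
found = table[1]
```

After the lookup, constructed holds one Eq node, and because an ordinary object is truthy the lookup reports a match and found is 'stored', even though no real equality test took place.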
Release 0.7.3¶
- General maturation of many backends through use.
- Renamed into to odo
Release 0.7.0¶
- Pull out data migration utilities to the into project
- Out-of-core CSV support now depends on chunked pandas computation
- h5py and bcolz backends support multi-threading/processing
- Remove data directory including SQL, HDF5 objects. Depend on standard types within other projects instead (e.g. sqlalchemy.Table, h5py.Dataset, ...)
- Better support SQL nested queries for complex queries
- Support databases, h5py files, servers as first class datasets
Release 0.6.6¶
- Not intended for public use, mostly for internal build systems
- Bugfix
Release 0.6.5¶
- Improve uri string handling #715
- Various bug fixes #715
Release 0.6.4¶
- Back CSV with pandas.read_csv. Better performance and more robust unicode support but less robust missing value support (some regressions) #597
- Much improved SQL support #626 #650 #652 #662
- Server supports remote execution of computations, not just indexing #631
- Better PyTables and datetime support #608 #639
- Support SparkSQL #592
Release 0.6.3¶
- by takes only two arguments, the grouper and apply; child is inferred using common_subexpression
- Better handling of pandas Series object
- Better printing of empty results in interactive mode
- Regex dispatched resource function bound to Table, e.g. Table('/path/to/file.csv')
Release 0.6.2¶
- Efficient CSV to SQL migration using native tools #454
- Dispatched drop and create_index functions #495
- DPlyr interface at blaze.api.dplyr. #484
- Various bits borrowed from that interface
  - transform function adopted to main namespace
  - Summary object for named reductions
  - Keyword syntax in by and merge e.g. by(t, t.col, label=t.col2.max(), label2=t.col2.min())
- New Computation Server #527
- Better PyTables support #487 #496 #526
Release 0.6.1¶
- More consistent behavior of into
- bcolz backend
- Control namespace leakage
Release 0.6¶
- Nearly complete rewrite
- Add abstract table expression system
- Translate expressions onto a variety of backends
- Support Python, NumPy, Pandas, h5py, sqlalchemy, pyspark, PyTables, pymongo
Release 0.5¶
- HDF5 in catalog.
- Reductions like any, all, sum, product, min, max.
- Datetime design and some initial functionality.
- Change how Storage and ddesc works.
- Some preliminary rolling window code.
- Python 3.4 now in the test harness.
Release 0.4.2¶
- Fix bug for compatibility with numba 0.12
- Add sql formats
- Add hdf5 formats
- Add support for numpy ufunc operators
Release 0.4.1¶
- Fix bug with compatibility for numba 0.12
Release 0.4¶
- Split the datashape and blz modules out.
- Add catalog and server for blaze arrays.
- Add remote arrays.
- Add csv and json persistence formats.
- Add python3 support
- Add scidb interface
Release 0.3¶
- Solidifies the execution subsystem around an IR based on the pykit project, as well as a ckernel abstraction at the ABI level.
- Supports ufuncs running on ragged array data.
- Cleans out previous low level data descriptor code, the data descriptor will have a higher level focus.
- Example out of core groupby operation using BLZ.
Release 0.2¶
- Brings in dynd as a required dependency for in-memory data.
Release 0.1¶
- Initial preview release