Additional Persistence Techniques

Embedding SQL Insert/Update Expressions into a Flush

This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It’s especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:

class SomeClass(object):
    pass
mapper(SomeClass, some_table)

someobject = session.query(SomeClass).get(5)

# set 'value' attribute to a SQL expression adding one
someobject.value = some_table.c.value + 1

# issues "UPDATE some_table SET value=value+1"
session.commit()

This technique works both for INSERT and UPDATE statements. After the flush/commit operation, the value attribute on someobject above is expired, so that when next accessed the newly generated value will be loaded from the database.
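
As a self-contained illustration of the technique above, the following sketch runs against an in-memory SQLite database. The Counter class, its counter table, and the starting value of 10 are assumptions made for this example, and declarative mapping is used in place of the classical mapper() call:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Counter(Base):
    # hypothetical table, used only for this illustration
    __tablename__ = "counter"
    id = Column(Integer, primary_key=True)
    value = Column(Integer, nullable=False)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Counter(id=5, value=10))
session.commit()

counter = session.query(Counter).get(5)

# assign a SQL expression instead of a Python value; the flush
# renders it into the SET clause of the UPDATE statement
counter.value = Counter.__table__.c.value + 1
session.commit()

# the attribute was expired by the commit, so this access
# reloads the value that the database computed
new_value = counter.value
```

Because the increment is evaluated by the database rather than in Python, the application never needs to read the old value before writing the new one.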

Using SQL Expressions with Sessions

SQL expressions and strings can be executed via the Session within its transactional context. This is most easily accomplished using the execute() method, which returns a ResultProxy in the same manner as an Engine or Connection:

Session = sessionmaker(bind=engine)
session = Session()

# execute a string statement
result = session.execute("select * from table where id=:id", {'id':7})

# execute a SQL expression construct
result = session.execute(select([mytable]).where(mytable.c.id==7))

The current Connection held by the Session is accessible using the connection() method:

connection = session.connection()
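
The following sketch puts execute() and connection() together against an in-memory SQLite database. The table t and its contents are assumptions made for illustration, and the statements are wrapped in text(), which is optional for Session.execute() in this version but keeps the example unambiguous:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")  # in-memory database
Session = sessionmaker(bind=engine)
session = Session()

# hypothetical table, created only for this illustration
session.execute(
    text("CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(20))")
)
session.execute(
    text("INSERT INTO t (id, name) VALUES (:id, :name)"),
    {"id": 7, "name": "seven"},
)

# bound parameters are passed as a dictionary
rows = session.execute(
    text("SELECT id, name FROM t WHERE id = :id"), {"id": 7}
).fetchall()

# the Connection taking part in the Session's current transaction;
# it sees the row inserted above even though nothing is committed yet
connection = session.connection()
count = connection.execute(text("SELECT count(*) FROM t")).scalar()

session.commit()
```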

The examples above deal with a Session that’s bound to a single Engine or Connection. To execute statements using a Session that is bound to multiple engines, or to none at all (i.e. one that relies upon bound metadata), pass the mapper keyword argument to execute() or connection(). Its value is a mapped class or Mapper instance, which the Session uses to locate the proper engine:

Session = sessionmaker()
session = Session()

# need to specify mapper or class when executing
result = session.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass)

result = session.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass)

connection = session.connection(MyMappedClass)

Partitioning Strategies

Simple Vertical Partitioning

Vertical partitioning places different kinds of objects, or different tables, across multiple databases:

engine1 = create_engine('postgresql://db1')
engine2 = create_engine('postgresql://db2')

Session = sessionmaker(twophase=True)

# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})

session = Session()

Above, operations against either class will use the Engine linked to that class. Upon a flush operation, similar rules apply to ensure each class is written to the right database.

The transactions among the multiple databases can optionally be coordinated via two phase commit, if the underlying backend supports it. See Enabling Two-Phase Commit for an example.
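
A runnable sketch of the binds configuration is shown here, assuming two separate in-memory SQLite databases and hypothetical users and accounts tables. The twophase flag is left out because SQLite does not support two-phase commit:

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    owner = Column(String(50))


# two independent in-memory databases (an assumption for this sketch)
engine1 = create_engine("sqlite://")
engine2 = create_engine("sqlite://")

# each table is created only in the database that will hold it
User.__table__.create(engine1)
Account.__table__.create(engine2)

Session = sessionmaker(binds={User: engine1, Account: engine2})
session = Session()

session.add_all([User(name="u1"), Account(owner="u1")])
session.commit()

# the Session resolves a different engine for each class
user_bind = session.get_bind(User)
account_bind = session.get_bind(Account)

tables_on_engine1 = inspect(engine1).get_table_names()
tables_on_engine2 = inspect(engine2).get_table_names()
user_count = session.query(User).count()
```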

Custom Vertical Partitioning

More comprehensive rule-based class-level partitioning can be built by overriding the Session.get_bind() method. Below we illustrate a custom Session which delivers the following rules:

Flush operations are delivered to the engine named master.
Operations on objects that subclass MyOtherClass all occur on the other engine.
Read operations for all other classes occur on a random choice of the slave1 or slave2 database.

engines = {
    'master':create_engine("sqlite:///master.db"),
    'other':create_engine("sqlite:///other.db"),
    'slave1':create_engine("sqlite:///slave1.db"),
    'slave2':create_engine("sqlite:///slave2.db"),
}

from sqlalchemy.orm import Session, sessionmaker
import random

class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None):
        if mapper and issubclass(mapper.class_, MyOtherClass):
            return engines['other']
        elif self._flushing:
            return engines['master']
        else:
            return engines[
                random.choice(['slave1','slave2'])
            ]

The above Session class is plugged in using the class_ argument to sessionmaker:

Session = sessionmaker(class_=RoutingSession)
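
To see the routing rules in action, the sketch below rebuilds the example with in-memory SQLite engines and two hypothetical mapped classes, MyOtherClass and Thing. The extra **kw on get_bind() is an addition of this sketch, not part of the example above; it is there so that the override tolerates releases that pass further keyword arguments:

```python
import random

from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, sessionmaker

Base = declarative_base()


class MyOtherClass(Base):
    __tablename__ = "other_things"
    id = Column(Integer, primary_key=True)


class Thing(Base):
    __tablename__ = "things"
    id = Column(Integer, primary_key=True)


# in-memory stand-ins for the four databases (an assumption)
engines = {
    name: create_engine("sqlite://")
    for name in ("master", "other", "slave1", "slave2")
}
Thing.__table__.create(engines["master"])


class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None, **kw):
        if mapper and issubclass(mapper.class_, MyOtherClass):
            return engines["other"]
        elif self._flushing:
            return engines["master"]
        else:
            return engines[random.choice(["slave1", "slave2"])]


session = sessionmaker(class_=RoutingSession)()

# outside of a flush: MyOtherClass goes to 'other', the rest to a slave
other_bind = session.get_bind(inspect(MyOtherClass))
read_bind = session.get_bind(inspect(Thing))

# the flush performed by commit() writes the new row to 'master'
session.add(Thing(id=1))
session.commit()

with engines["master"].connect() as conn:
    rows_on_master = conn.execute(Thing.__table__.select()).fetchall()
```

Note that in this sketch nothing replicates the row from master to the slave databases; a real deployment would rely on database-level replication for that.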

This approach can be combined with multiple MetaData objects, for example by using the declarative __abstract__ keyword, described at __abstract__.

Horizontal Partitioning

Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.

See the “sharding” example: Horizontal Sharding.

Bulk Operations

Note

Bulk Operations mode is a new series of operations made available on the Session object for the purpose of invoking INSERT and UPDATE statements with greatly reduced Python overhead, at the expense of much less functionality, automation, and error checking. As of SQLAlchemy 1.0, these features should be considered as “beta”, and additionally are intended for advanced users.

New in version 1.0.0.

Bulk operations on the Session include Session.bulk_save_objects(), Session.bulk_insert_mappings(), and Session.bulk_update_mappings(). The purpose of these methods is to directly expose internal elements of the unit of work system, such that facilities for emitting INSERT and UPDATE statements given dictionaries or object states can be utilized alone, bypassing the normal unit of work mechanics of state, relationship and attribute management. The advantage of this approach is strictly one of reduced Python overhead:

The flush() process, including the survey of all objects, their state, their cascade status, the status of all objects associated with them via relationship(), and the topological sort of all operations to be performed is completely bypassed. This reduces a great amount of Python overhead.
The objects as given have no defined relationship to the target Session, even when the operation is complete, meaning there’s no overhead in attaching them or managing their state in terms of the identity map or session.
The Session.bulk_insert_mappings() and Session.bulk_update_mappings() methods accept lists of plain Python dictionaries, not objects; this further reduces a large amount of overhead associated with instantiating mapped objects and assigning state to them, which normally is also subject to expensive tracking of history on a per-attribute basis.
The process of fetching primary keys after an INSERT also is disabled by default. When performed correctly, INSERT statements can now more readily be batched by the unit of work process into executemany() blocks, which perform vastly better than individual statement invocations.
UPDATE statements can similarly be tailored such that all attributes are subject to the SET clause unconditionally, again making it much more likely that executemany() blocks can be used.

The performance behavior of the bulk routines should be studied using the Performance example suite. This is a series of example scripts which illustrate Python call-counts across a variety of scenarios, including bulk insert and update scenarios.

Usage

The methods each work in the context of the Session object’s transaction, like any other:

s = Session()
objects = [
    User(name="u1"),
    User(name="u2"),
    User(name="u3")
]
s.bulk_save_objects(objects)

For Session.bulk_insert_mappings() and Session.bulk_update_mappings(), dictionaries are passed:

s.bulk_insert_mappings(User,
  [dict(name="u1"), dict(name="u2"), dict(name="u3")]
)
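
A complete round trip with the dictionary-based methods might look like the sketch below, which assumes an in-memory SQLite database and a hypothetical User mapping. Each dictionary passed to Session.bulk_update_mappings() includes the primary key so that the target row can be identified:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class User(Base):
    # hypothetical mapping, used only for this illustration
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
s = sessionmaker(bind=engine)()

# INSERT three rows from plain dictionaries
s.bulk_insert_mappings(
    User,
    [dict(id=1, name="u1"), dict(id=2, name="u2"), dict(id=3, name="u3")],
)

# UPDATE two of them; the primary key selects the row to change
s.bulk_update_mappings(
    User,
    [dict(id=1, name="u1 renamed"), dict(id=3, name="u3 renamed")],
)

# both operations ran inside the Session's transaction
s.commit()

names = [name for (name,) in s.query(User.name).order_by(User.id)]
```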

Comparison to Core Insert / Update Constructs

The bulk methods offer performance that under particular circumstances can be close to that of using the core Insert and Update constructs in an “executemany” context (for a description of “executemany”, see Executing Multiple Statements in the Core tutorial). In order to achieve this, the Session.bulk_insert_mappings.return_defaults flag should be disabled so that rows can be batched together. The example suite in Performance should be carefully studied in order to gain familiarity with how fast the bulk operations can be made to run.
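
For reference, the Core “executemany” form that the bulk methods are being compared against can be sketched as follows. The users table, the in-memory SQLite database, and the bind parameter names target_id and new_name are assumptions made for this illustration:

```python
from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    bindparam,
    create_engine,
)

metadata = MetaData()
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

with engine.begin() as conn:
    # a list of dictionaries makes this an "executemany" INSERT
    conn.execute(
        users.insert(),
        [{"name": "u1"}, {"name": "u2"}, {"name": "u3"}],
    )

    # "executemany" UPDATE: explicit bind parameters take their
    # values from each dictionary in the list
    stmt = (
        users.update()
        .where(users.c.id == bindparam("target_id"))
        .values(name=bindparam("new_name"))
    )
    conn.execute(stmt, [{"target_id": 1, "new_name": "u1 renamed"}])

    rows = [
        tuple(row)
        for row in conn.execute(users.select().order_by(users.c.id))
    ]
```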

ORM Compatibility

The bulk insert / update methods lose a significant amount of functionality versus traditional ORM use. The following is a listing of features that are not available when using these methods:

persistence along relationship() linkages
sorting of rows within order of dependency; rows are inserted or updated directly in the order in which they are passed to the methods
Session-management on the given objects, including attachment to the session and identity map management.
Functionality related to primary key mutation, ON UPDATE cascade
SQL expression inserts / updates (e.g. Embedding SQL Insert/Update Expressions into a Flush)
ORM events such as MapperEvents.before_insert(), etc. The bulk session methods have no event support.

Features that are available include:

INSERTs and UPDATEs of mapped objects
Version identifier support
Multi-table mappings, such as joined-inheritance. However, an object to be inserted across multiple tables either needs to have its primary key identifiers fully populated ahead of time, or else the Session.bulk_save_objects.return_defaults flag must be used, which will greatly reduce the performance benefits
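
The two options for multi-table mappings can be sketched with a small joined-inheritance hierarchy. The Person and Engineer classes and the in-memory SQLite database are assumptions made for this example, and the polymorphic discriminator column is omitted to keep it short:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class Engineer(Person):
    # joined-table inheritance: each Engineer is stored as one
    # row in "person" plus one row in "engineer"
    __tablename__ = "engineer"
    id = Column(Integer, ForeignKey("person.id"), primary_key=True)
    language = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
s = sessionmaker(bind=engine)()

# option 1: the primary key is populated ahead of time
s.bulk_save_objects([Engineer(id=1, name="e1", language="python")])

# option 2: no primary key is given, so return_defaults=True is used
# to fetch the key generated for "person" before "engineer" is written
generated = Engineer(name="e2", language="c")
s.bulk_save_objects([generated], return_defaults=True)
s.commit()

rows = [
    tuple(row)
    for row in s.query(Engineer.id, Engineer.name, Engineer.language)
    .order_by(Engineer.id)
]
```

With return_defaults=True the generated key is also written back to the object, at the cost of the batching that makes the bulk methods fast.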

SQLAlchemy 1.0 Documentation