Chapter 35. Batch Insertion

Neo4j has a batch insertion facility intended for initial imports, which bypasses transactions and other checks in favor of performance. This is useful when you have a big dataset that needs to be loaded once.

Batch insertion is included in the neo4j-kernel component, which is part of all Neo4j distributions and editions.

Be aware of the following points when using batch insertion:

  • Batch insertion is intended for the initial import of data, but you can use it on an existing database if that database is shut down first.
  • Batch insertion is not thread safe.
  • Batch insertion is non-transactional.
  • Batch insertion does not enforce constraints on the data while it is being inserted.
  • Batch insertion will re-populate all existing indexes and indexes created during batch insertion on shutdown.
  • Batch insertion will verify all existing constraints and constraints created during batch insertion on shutdown.
  • Unless shutdown is successfully invoked at the end of the import, the database files will be corrupt.

Warning

Always perform batch insertion in a single thread (or use synchronization to make only one thread at a time access the batch inserter) and invoke shutdown when finished.
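
The pattern in this warning can be sketched as follows. This is a minimal illustration, not the Neo4j API: the `Inserter` interface, the `InMemoryInserter` class, and the `runImport` method are hypothetical stand-ins introduced only to make the example runnable. The sketch shows a wrapper that lets only one thread at a time reach the inserter, and a `finally` block that invokes shutdown even if the import fails part-way.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedBatchImport {

    /** Hypothetical stand-in for a batch inserter; NOT the Neo4j API. */
    interface Inserter {
        long createNode(String name);
        void shutdown();
    }

    /** In-memory stand-in (an assumption) used only to make the sketch runnable. */
    static class InMemoryInserter implements Inserter {
        final List<String> nodes = new ArrayList<>();
        boolean shutDown = false;

        @Override
        public long createNode(String name) {
            if (shutDown) {
                throw new IllegalStateException("inserter already shut down");
            }
            nodes.add(name);
            return nodes.size() - 1;
        }

        @Override
        public void shutdown() {
            shutDown = true;
        }
    }

    /** Wrapper that allows only one thread at a time to access the delegate. */
    static class SynchronizedInserter implements Inserter {
        private final Inserter delegate;

        SynchronizedInserter(Inserter delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized long createNode(String name) {
            return delegate.createNode(name);
        }

        @Override
        public synchronized void shutdown() {
            delegate.shutdown();
        }
    }

    /** Inserts from several threads through the guard, then always shuts down. */
    static void runImport(Inserter inserter, int threads, int perThread) {
        Inserter guarded = new SynchronizedInserter(inserter);
        try {
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < perThread; i++) {
                        guarded.createNode("n" + id + "-" + i);
                    }
                });
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join(); // wait for every worker before shutting down
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("import interrupted", e);
        } finally {
            guarded.shutdown(); // invoked on success and on failure
        }
    }

    public static void main(String[] args) {
        InMemoryInserter backing = new InMemoryInserter();
        runImport(backing, 4, 1000);
        System.out.println(backing.nodes.size() + " nodes, shut down: " + backing.shutDown);
    }
}
```

With a real batch inserter the same shape would apply: funnel every call through one thread or one lock, and place the shutdown call in a `finally` block. Running the whole import in a single thread avoids the need for the wrapper altogether and is usually the simpler choice.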

Warning

Because batch insertion does not enforce constraints while loading data, the batch inserter will fail on shutdown if the inserted data violates any constraint, and the database will be left inconsistent.