File sizes
Neo4j relies on Java’s Non-blocking I/O subsystem for all file handling. Furthermore, while the storage file layout is optimized for interconnected data, Neo4j does not require raw devices. Thus, file sizes are limited only by the underlying operating system’s capacity to handle large files. Neo4j itself imposes no built-in limit on file handling capacity.
Neo4j has a built-in page cache that caches the contents of the storage files. If there is not enough RAM to keep the storage files resident, Neo4j pages parts of the files in and out as necessary, while keeping the most popular parts of the files resident at all times. Thus, the speed of ACID operations degrades gracefully as RAM becomes the limiting factor.
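The paging behaviour described above can be illustrated with a toy cache. This is a minimal sketch, not Neo4j's implementation: the class name `ToyPageCache`, the 8192-byte page size, and the least-recently-used eviction policy are all assumptions chosen for illustration, since the eviction policy Neo4j actually uses is not stated here.

```python
from collections import OrderedDict

PAGE_SIZE = 8192  # assumed page size in bytes, for illustration only


class ToyPageCache:
    """Hypothetical cache holding at most `capacity` pages of one file.

    Eviction is least-recently-used, an assumption standing in for
    "keep the most popular parts resident".
    """

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> page bytes
        self.faults = 0             # reads that had to go to disk

    def read_page(self, page_no):
        if page_no in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]

        # Cache miss: page the data in from the file.
        self.faults += 1
        with open(self.path, "rb") as f:
            f.seek(page_no * PAGE_SIZE)
            data = f.read(PAGE_SIZE)
        self.pages[page_no] = data

        # Page out the least recently used page when over capacity.
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return data
```

With a capacity smaller than the file, frequently read pages stay in memory and only the less popular ones incur disk reads, which is the sense in which throughput falls off gradually rather than abruptly as RAM runs short.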
Read speed
Enterprises want to optimize the use of hardware to deliver the maximum business value from available resources. Neo4j’s approach to reading data is designed to make full use of the available hardware. Neo4j does not block or lock any read operations; thus, there is no danger of deadlocks in read operations and no need for read transactions. With threaded read access to the database, queries can run simultaneously on as many processors as are available. This allows Neo4j to scale up well on bigger servers.
Write speed
Write speed is a consideration for many enterprise applications. However, there are two different scenarios:
- sustained continuous operation and
- bulk access (e.g., backup, initial or batch loading).
To support the disparate requirements of these scenarios, Neo4j supports two modes of writing to the storage layer.
In transactional, ACID-compliant normal operation, the isolation level is maintained and read operations can occur at the same time as the writing process. At every commit, the data is persisted to disk and can be recovered to a consistent state after a system failure. This requires disk write access and a real flush of the data. Thus, the write speed of Neo4j on a single server in continuous mode is limited by the I/O capacity of the hardware. Consequently, the use of fast SSDs is highly recommended for production scenarios.
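To see why a real flush at every commit ties write speed to the storage device, consider the following minimal sketch of a durable append. It does not reproduce Neo4j's transaction log format or API; the function name `commit` and the one-record-per-line layout are hypothetical.

```python
import os


def commit(log_path, record):
    """Append one record and force it to stable storage before returning.

    Hypothetical helper: illustrates the cost of a durable commit,
    not Neo4j's actual logging code.
    """
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # push Python's buffer to the operating system
        os.fsync(log.fileno())  # ask the OS to write its cache to the device
```

Each call returns only after `os.fsync` completes, so the number of commits per second is bounded by how quickly the device can acknowledge a flush. This is the reason faster storage translates directly into higher sustained write throughput.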
Neo4j has a Batch Inserter that operates directly on the store files. This mode does not provide transactional security, so it can only be used when there is a single write thread. Because data is written sequentially, and never flushed to the logical logs, huge performance boosts are achieved. The Batch Inserter is optimized for non-transactional bulk import of large amounts of data.
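The trade-off made by a non-transactional bulk load can be sketched as follows. This is not the Batch Inserter's API, and the function name `bulk_load` and record layout are hypothetical. The sketch only shows the general idea of a single writer streaming records sequentially and flushing once at the end instead of once per record.

```python
import os


def bulk_load(store_path, records):
    """Write all records sequentially from a single writer.

    Hypothetical helper: there is no per-record flush and no recovery
    log, so a crash part-way through leaves an unusable file.
    """
    count = 0
    with open(store_path, "wb") as store:
        for record in records:
            store.write(record + b"\n")  # buffered, sequential write
            count += 1
        store.flush()
        os.fsync(store.fileno())         # one flush for the whole batch
    return count
```

Because nothing is made durable until the final flush, the load must either complete or be restarted from scratch, which is acceptable for an initial import but not for normal operation.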
Data size
In Neo4j, data size is mainly limited by the address space of the primary keys for Nodes, Relationships, Properties and RelationshipTypes. Currently, the address space is as follows:
nodes | 2^35 (∼ 34 billion) |
relationships | 2^35 (∼ 34 billion) |
properties | 2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion) |
relationship types | 2^16 (∼ 65 000) |
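The approximate figures in the table follow from reading each entry as a power of two (for example, 35 bits for node identifiers). The short calculation below reproduces them; the dictionary name `ID_BITS` and the helper `capacity` are introduced here for illustration only.

```python
# Exponents as listed in the table, read as powers of two.
ID_BITS = {
    "nodes": 35,
    "relationships": 35,
    "properties (minimum)": 36,
    "properties (maximum)": 38,
    "relationship types": 16,
}


def capacity(bits):
    """Number of distinct identifiers in an address space of `bits` bits."""
    return 2 ** bits


for name, bits in ID_BITS.items():
    print(f"{name}: 2^{bits} = {capacity(bits):,}")
```

For instance, `capacity(35)` is 34,359,738,368, which is the "∼ 34 billion" quoted for nodes and relationships.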