24.6. Memory mapped IO settings

Introduction

Each file in the Neo4j store is accessed through the Neo4j page cache, when reading from, or writing to, the store files. Since there is only one page cache, there is only one setting for specifying how much memory Neo4j is allowed to use for page caching. The shared page cache ensures that memory is split across the various store files in the most optimal manner [1], depending on how the database is used and what data is popular.

The memory for the page cache is allocated outside the normal Java heap, so you need to take both the Java heap, and the page cache, into consideration in your capacity planning. Other processes running on the OS will impact the availability of such memory. Neo4j will require all of the heap memory of the JVM, plus the memory to be used for the page cache, to be available as physical memory. Other processes may thus not use more than what is available after the configured memory allocation is made for Neo4j.

[Important]Important

Make sure that your system is configured such that it will never need to swap. If memory belonging to the Neo4j process gets swapped out, it can lead to considerable performance degradation.

The amount of memory available to the page cache is configured using the dbms.memory.pagecache.size setting. With that setting, you specify the number of bytes available to the page cache, e.g. 150m og 4g. The default page memory setting is 50% of the machines memory, after subtracting the memory that is reserved for the Java heap.

For optimal performance, you will want to have as much of your data fit in the page cache as possible. You can sum up the size of all the *store.db* files in your store file directory, to figure out how big a page cache you need to fit all your data. For instance, on a posix system you can look at the total of running $ du -hc *store.db* in your data/databases/graph.db directory. Obviously the store files will grow as you add more nodes, relationships and properties, so configuring more page cache memory than you have data, is recommended when possible.

Configuration

Parameter Possible values Effect

dbms.memory.pagecache.size

The maximum amount of memory to use for the page cache, either in bytes, or greater byte-like units, such as 100m for 100 mega-bytes, or 4g for 4 giga-bytes.

The amount of memory to use for mapping the store files, in a unit of bytes. This will automatically be rounded down to the nearest whole page. This value cannot be zero. For extremely small and memory constrained deployments, it is recommended to still reserve at least a couple of megabytes for the page cache.

unsupported.dbms.report_configuration

true or false

If set to true the current configuration settings will be written to the default system output, mostly the console or the logfiles.

When configuring the amount of memory allowed for the page cache and the JVM heap, make sure to also leave room for the operating systems page cache, and other programs and services the system might want to run. It is important to configure the memory usage, such that the Neo4j JVM process won’t need to use any swap memory, as this will cause a significant drag on the performance of the system.

When reading the configuration parameters on startup Neo4j will automatically configure the parameters that are not specified. The cache size will be configured based on the available memory on the computer, with the assumption that the machine is dedicated to running Neo4j. Specifically, Neo4j will look at how much memory the machine has, subtract the JVM heap allocation from that, and then use 50% of what is left for the page cache. This is the default configuration when nothing else is specified.

Batch insert example

Read general information on batch insertion in Chapter 35, Batch Insertion.

The configuration should suit the data set you are about to inject using BatchInsert. Lets say we have a random-like graph with 10M nodes and 100M relationships. Each node (and maybe some relationships) have different properties of string and Java primitive types. The important thing is that the page cache has enough memory to work with, that it doesn’t slow down the BatchInserter:

dbms.memory.pagecache.size=4g

The configuration above will more or less fit the entire graph in memory. A rough formula to calculate the memory needed can look like this:

bytes_needed = number_of_nodes * 15
             + number_of_relationships * 34
             + number_of_properties * 64

Note that the size of the individual property very much depends on what data it contains. The numbers given in the above formula are only a rough estimate.



[1] This is an informal comparison to the store-specific memory mapping settings of previous versions of Neo4j. We are not claiming that our page replacement algorithms are optimal in the formal sense. Truly optimal page replacement algorithms require knowledge of events arbitrarily far into the future.