3.7. Loading Data

As you’ve seen, you can not only query data expressively with Cypher statements but also create data with them.

Naturally, in most cases you wouldn’t want to write or generate huge statements to create your data. Instead, you would use an existing data source that you pass into your statement and that drives the graph creation process.

That process includes not only creating completely new data, but also integrating with existing structures and updating your graph.

Parameters

In general, we recommend passing varying literal values from the outside as named parameters. This allows Cypher to reuse existing execution plans for the statements.
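For example, a statement that looks up a movie by title can take the value as a named parameter and be re-run with different titles while reusing the cached execution plan. A minimal sketch (the parameter name title is only an illustration):

MATCH (m:Movie { title: {title} })
RETURN m.released;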

Of course, you can also pass in parameters for data to be imported. These can be scalar values, maps, lists, or even lists of maps.

In your Cypher statement, you can then iterate over these values (e.g. with UNWIND) to create your graph structures.

For instance, to create a movie graph from JSON data structures pulled from an API, you could use:

{
  "movies" : [ {
    "title" : "Stardust",
    "released" : 2007,
    "cast" : [ {
      "actor" : {
        "name" : "Robert de Niro",
        "born" : 1943
      },
      "characters" : [ "Captain Shakespeare" ]
    }, {
      "actor" : {
        "name" : "Michelle Pfeiffer",
        "born" : 1958
      },
      "characters" : [ "Lamia" ]
    } ]
  } ]
}
UNWIND {movies} AS movie
MERGE (m:Movie {title:movie.title}) ON CREATE SET m.released = movie.released
FOREACH (role IN movie.cast |
   MERGE (a:Person {name:role.actor.name}) ON CREATE SET a.born = role.actor.born
   MERGE (a)-[:ACTED_IN {roles:role.characters}]->(m)
)
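
To try this interactively, such a parameter can be set before running the statement, for example with the :param command of Neo4j Browser (shown here with a shortened value; the full movies list from the JSON above would be passed the same way). From application code, the same list of maps would simply be passed as a statement parameter through your driver:

:param movies => [{title: "Stardust", released: 2007, cast: [{actor: {name: "Robert de Niro", born: 1943}, characters: ["Captain Shakespeare"]}]}]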

Importing CSV

Cypher provides an elegant built-in way to import tabular CSV data into graph structures.

The LOAD CSV clause parses a local or remote file into a stream of rows, which represent either maps (when used with headers) or lists. You can then apply whatever Cypher operations you want to these rows, for instance to create nodes and relationships or to merge them with existing graph structures.

As CSV files usually represent either node lists or relationship lists, you typically run multiple passes to create nodes and relationships separately.

For more details, see Section 11.6, “Load CSV”.

movies.csv 

id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994

LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/movies.csv" AS line
CREATE (m:Movie { id:line.id, title:line.title, released:toInt(line.year)});
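
If a file has no header line, you can omit the WITH HEADERS option; each row is then returned as a list and fields are accessed by position. A minimal sketch reading the same file (note that its header line would also be returned as a regular row):

LOAD CSV FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/movies.csv" AS line
RETURN line[0] AS id, line[1] AS title;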

persons.csv 

id,name
1,Charlie Sheen
2,Oliver Stone
3,Michael Douglas
4,Martin Sheen
5,Morgan Freeman

LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/persons.csv" AS line
MERGE (a:Person { id:line.id })
ON CREATE SET a.name=line.name;

roles.csv 

personId,movieId,role
1,1,Bud Fox
4,1,Carl Fox
3,1,Gordon Gekko
4,2,A.J. MacInerney
3,2,President Andrew Shepherd
5,3,Ellis Boyd 'Red' Redding

LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/roles.csv" AS line
MATCH (m:Movie { id:line.movieId })
MATCH (a:Person { id:line.personId })
CREATE (a)-[:ACTED_IN { roles: [line.role]}]->(m);
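
After these three passes, a quick check that the data has been connected as expected can be done with a simple query, for example:

MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
RETURN a.name, r.roles, m.title
ORDER BY m.title, a.name;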

If your file contains denormalized data, you can either run multiple passes over the same file with simple operations as shown above, or you can use MERGE to create the entities uniquely in a single pass.

For our use case, we can import the data using a CSV structure like this:

movie_actor_roles.csv 

title;released;actor;born;characters
Back to the Future;1985;Michael J. Fox;1961;Marty McFly
Back to the Future;1985;Christopher Lloyd;1938;Dr. Emmet Brown

LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/movie_actor_roles.csv"
AS line FIELDTERMINATOR ";"
MERGE (m:Movie { title:line.title })
ON CREATE SET m.released = toInt(line.released)
MERGE (a:Person { name:line.actor })
ON CREATE SET a.born = toInt(line.born)
MERGE (a)-[:ACTED_IN { roles:split(line.characters,",")}]->(m)

If you are importing a large amount of data (more than 10,000 rows), it is recommended to prefix your LOAD CSV clause with the USING PERIODIC COMMIT hint. This allows Neo4j to commit the import in batches, avoiding the memory pressure of building up one very large transaction state.
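
A minimal sketch, reusing the roles import from above (the batch size of 1000 is only an example; if omitted, the default is used):

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "http://neo4j.com/docs/3.1.0-SNAPSHOT/csv/intro/roles.csv" AS line
MATCH (m:Movie { id:line.movieId })
MATCH (a:Person { id:line.personId })
CREATE (a)-[:ACTED_IN { roles: [line.role]}]->(m);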