Creating Data
We’ll start by looking into the clauses that allow us to create data.
To add data, we just use the patterns we already know. By providing patterns we can specify what graph structures, labels and properties we would like to make part of our graph.
Obviously the simplest clause is called CREATE
.
It will just go ahead and directly create the patterns that you specify.
For the patterns we’ve looked at so far this could look like the following:
CREATE (:Movie { title:"The Matrix",released:1997 })
If we execute this statement, Cypher returns the number of changes, in this case adding 1 node, 1 label and 2 properties.
Nodes created: 1 |
---|
Properties set: 2 |
Labels added: 1 |
|
As we started out with an empty database, we now have a database with a single node in it:
If case we also want to return the created data we can add a RETURN
clause, which refers to the variable we’ve assigned to our pattern elements.
CREATE (p:Person { name:"Keanu Reeves", born:1964 }) RETURN p
This is what gets returned:
p |
---|
1 row |
Nodes created: 1 |
Properties set: 2 |
Labels added: 1 |
|
If we want to create more than one element, we can separate the elements with commas or use multiple CREATE
statements.
We can of course also create more complex structures, like an ACTED_IN
relationship with information about the character, or DIRECTED
ones for the director.
CREATE (a:Person { name:"Tom Hanks", born:1956 })-[r:ACTED_IN { roles: ["Forrest"]}]->(m:Movie { title:"Forrest Gump",released:1994 }) CREATE (d:Person { name:"Robert Zemeckis", born:1951 })-[:DIRECTED]->(m) RETURN a,d,r,m
This is the part of the graph we just updated:
In most cases, we want to connect new data to existing structures. This requires that we know how to find existing patterns in our graph data, which we will look at next.
Matching Patterns
Matching patterns is a task for the MATCH
statement.
We pass the same kind of patterns we’ve used so far to MATCH
to describe what we’re looking for.
It is similar to query by example, only that our examples also include the structures.
Note A |
To find the data we’ve created so far, we can start looking for all nodes labeled with the Movie
label.
MATCH (m:Movie) RETURN m
Here’s the result:
This should show both The Matrix and Forrest Gump.
We can also look for a specific person, like Keanu Reeves.
MATCH (p:Person { name:"Keanu Reeves" }) RETURN p
This query returns the matching node:
Note that we only provide enough information to find the nodes, not all properties are required. In most cases you have key-properties like SSN, ISBN, emails, logins, geolocation or product codes to look for.
We can also find more interesting connections, like for instance the movies titles that Tom Hanks acted in and the roles he played.
MATCH (p:Person { name:"Tom Hanks" })-[r:ACTED_IN]->(m:Movie) RETURN m.title, r.roles
m.title | r.roles |
---|---|
1 row | |
|
|
In this case we only returned the properties of the nodes and relationships that we were interested in.
You can access them everywhere via a dot notation identifer.property
.
Of course this only lists his role as Forrest in Forrest Gump because that’s all data that we’ve added.
Now we know enough to connect new nodes to existing ones and can combine MATCH
and CREATE
to attach structures to the graph.
Attaching Structures
To extend the graph with new information, we first match the existing connection points and then attach the newly created nodes to them with relationships. Adding Cloud Atlas as a new movie for Tom Hanks could be achieved like this:
MATCH (p:Person { name:"Tom Hanks" }) CREATE (m:Movie { title:"Cloud Atlas",released:2012 }) CREATE (p)-[r:ACTED_IN { roles: ['Zachry']}]->(m) RETURN p,r,m
Here’s what the structure looks like in the database:
Tip It is important to remember that we can assign variables to both nodes and relationships and use them later on, no matter if they were created or matched. |
It is possible to attach both node and relationship in a single CREATE
clause.
For readability it helps to split them up though.
Important A tricky aspect of the combination of |
Completing Patterns
Whenever we get data from external systems or are not sure if certain information already exists in the graph, we want to be able to express a repeatable (idempotent) update operation.
In Cypher MERGE
has this function.
It acts like a combination of MATCH
or CREATE
, which checks for the existence of data first before creating it.
With MERGE
you define a pattern to be found or created.
Usually, as with MATCH
you only want to include the key property to look for in your core pattern.
MERGE
allows you to provide additional properties you want to set ON CREATE
.
If we wouldn’t know if our graph already contained Cloud Atlas we could merge it in again.
MERGE (m:Movie { title:"Cloud Atlas" }) ON CREATE SET m.released = 2012 RETURN m
m |
---|
1 row |
|
We get a result in any both cases: either the data (potentially more than one row) that was already in the graph or a single, newly created Movie
node.
Note A |
So foremost MERGE
makes sure that you can’t create duplicate information or structures, but it comes with the cost of needing to check for existing matches first.
Especially on large graphs it can be costly to scan a large set of labeled nodes for a certain property.
You can alleviate some of that by creating supporting indexes or constraints, which we’ll discuss later.
But it’s still not for free, so whenever you’re sure to not create duplicate data use CREATE
over MERGE
.
Tip
|
MATCH (m:Movie { title:"Cloud Atlas" }) MATCH (p:Person { name:"Tom Hanks" }) MERGE (p)-[r:ACTED_IN]->(m) ON CREATE SET r.roles =['Zachry'] RETURN p,r,m
In case the direction of a relationship is arbitrary, you can leave off the arrowhead.
MERGE
will then check for the relationship in either direction, and create a new directed relationship if no matching relationship was found.
If you choose to pass in only one node from a preceding clause, MERGE
offers an interesting functionality.
It will then only match within the direct neighborhood of the provided node for the given pattern, and, if not found create it.
This can come in very handy for creating for example tree structures.
CREATE (y:Year { year:2014 }) MERGE (y)<-[:IN_YEAR]-(m10:Month { month:10 }) MERGE (y)<-[:IN_YEAR]-(m11:Month { month:11 }) RETURN y,m10,m11
This is the graph structure that gets created:
Here there is no global search for the two Month
nodes; they are only searched for in the context of the 2014 Year
node.