3.3. Patterns in Practice

Creating Data

We’ll start by looking into the clauses that allow us to create data.

To add data, we just use the patterns we already know. By providing patterns we can specify what graph structures, labels and properties we would like to make part of our graph.

Obviously the simplest clause is called CREATE. It will just go ahead and directly create the patterns that you specify.

For the patterns we’ve looked at so far this could look like the following:

CREATE (:Movie { title:"The Matrix",released:1997 })

If we execute this statement, Cypher returns the number of changes, in this case adding 1 node, 1 label and 2 properties.

Nodes created: 1
Properties set: 2
Labels added: 1

(empty result)

As we started out with an empty database, we now have a database with a single node in it:

If case we also want to return the created data we can add a RETURN clause, which refers to the variable we’ve assigned to our pattern elements.

CREATE (p:Person { name:"Keanu Reeves", born:1964 })
RETURN p

This is what gets returned:

p
1 row
Nodes created: 1
Properties set: 2
Labels added: 1

Node[1]{name:"Keanu Reeves",born:1964}

If we want to create more than one element, we can separate the elements with commas or use multiple CREATE statements.

We can of course also create more complex structures, like an ACTED_IN relationship with information about the character, or DIRECTED ones for the director.

CREATE (a:Person { name:"Tom Hanks",
  born:1956 })-[r:ACTED_IN { roles: ["Forrest"]}]->(m:Movie { title:"Forrest Gump",released:1994 })
CREATE (d:Person { name:"Robert Zemeckis", born:1951 })-[:DIRECTED]->(m)
RETURN a,d,r,m

This is the part of the graph we just updated:

In most cases, we want to connect new data to existing structures. This requires that we know how to find existing patterns in our graph data, which we will look at next.

Matching Patterns

Matching patterns is a task for the MATCH statement. We pass the same kind of patterns we’ve used so far to MATCH to describe what we’re looking for. It is similar to query by example, only that our examples also include the structures.

[Note]Note

A MATCH statement will search for the patterns we specify and return one row per successful pattern match.

To find the data we’ve created so far, we can start looking for all nodes labeled with the Movie label.

MATCH (m:Movie)
RETURN m

Here’s the result:

This should show both The Matrix and Forrest Gump.

We can also look for a specific person, like Keanu Reeves.

MATCH (p:Person { name:"Keanu Reeves" })
RETURN p

This query returns the matching node:

Note that we only provide enough information to find the nodes, not all properties are required. In most cases you have key-properties like SSN, ISBN, emails, logins, geolocation or product codes to look for.

We can also find more interesting connections, like for instance the movies titles that Tom Hanks acted in and the roles he played.

MATCH (p:Person { name:"Tom Hanks" })-[r:ACTED_IN]->(m:Movie)
RETURN m.title, r.roles
m.titler.roles
1 row

"Forrest Gump"

["Forrest"]

In this case we only returned the properties of the nodes and relationships that we were interested in. You can access them everywhere via a dot notation identifer.property.

Of course this only lists his role as Forrest in Forrest Gump because that’s all data that we’ve added.

Now we know enough to connect new nodes to existing ones and can combine MATCH and CREATE to attach structures to the graph.

Attaching Structures

To extend the graph with new information, we first match the existing connection points and then attach the newly created nodes to them with relationships. Adding Cloud Atlas as a new movie for Tom Hanks could be achieved like this:

MATCH (p:Person { name:"Tom Hanks" })
CREATE (m:Movie { title:"Cloud Atlas",released:2012 })
CREATE (p)-[r:ACTED_IN { roles: ['Zachry']}]->(m)
RETURN p,r,m

Here’s what the structure looks like in the database:

[Tip]Tip

It is important to remember that we can assign variables to both nodes and relationships and use them later on, no matter if they were created or matched.

It is possible to attach both node and relationship in a single CREATE clause. For readability it helps to split them up though.

[Important]Important

A tricky aspect of the combination of MATCH and CREATE is that we get one row per matched pattern. This causes subsequent CREATE statements to be executed once for each row. In many cases this is what you want. If that’s not intended, please move the CREATE statement before the MATCH, or change the cardinality of the query with means discussed later or use the get or create semantics of the next clause: MERGE.

Completing Patterns

Whenever we get data from external systems or are not sure if certain information already exists in the graph, we want to be able to express a repeatable (idempotent) update operation. In Cypher MERGE has this function. It acts like a combination of MATCH or CREATE, which checks for the existence of data first before creating it. With MERGE you define a pattern to be found or created. Usually, as with MATCH you only want to include the key property to look for in your core pattern. MERGE allows you to provide additional properties you want to set ON CREATE.

If we wouldn’t know if our graph already contained Cloud Atlas we could merge it in again.

MERGE (m:Movie { title:"Cloud Atlas" })
ON CREATE SET m.released = 2012
RETURN m
m
1 row

Node[5]{title:"Cloud Atlas",released:2012}

We get a result in any both cases: either the data (potentially more than one row) that was already in the graph or a single, newly created Movie node.

[Note]Note

A MERGE clause without any previously assigned variables in it either matches the full pattern or creates the full pattern. It never produces a partial mix of matching and creating within a pattern. To achieve a partial match/create, make sure to use already defined variables for the parts that shouldn’t be affected.

So foremost MERGE makes sure that you can’t create duplicate information or structures, but it comes with the cost of needing to check for existing matches first. Especially on large graphs it can be costly to scan a large set of labeled nodes for a certain property. You can alleviate some of that by creating supporting indexes or constraints, which we’ll discuss later. But it’s still not for free, so whenever you’re sure to not create duplicate data use CREATE over MERGE.

[Tip]Tip

MERGE can also assert that a relationship is only created once. For that to work you have to pass in both nodes from a previous pattern match.

MATCH (m:Movie { title:"Cloud Atlas" })
MATCH (p:Person { name:"Tom Hanks" })
MERGE (p)-[r:ACTED_IN]->(m)
ON CREATE SET r.roles =['Zachry']
RETURN p,r,m

In case the direction of a relationship is arbitrary, you can leave off the arrowhead. MERGE will then check for the relationship in either direction, and create a new directed relationship if no matching relationship was found.

If you choose to pass in only one node from a preceding clause, MERGE offers an interesting functionality. It will then only match within the direct neighborhood of the provided node for the given pattern, and, if not found create it. This can come in very handy for creating for example tree structures.

CREATE (y:Year { year:2014 })
MERGE (y)<-[:IN_YEAR]-(m10:Month { month:10 })
MERGE (y)<-[:IN_YEAR]-(m11:Month { month:11 })
RETURN y,m10,m11

This is the graph structure that gets created:

Here there is no global search for the two Month nodes; they are only searched for in the context of the 2014 Year node.