Perform Quorum Reads on Replica Sets

New in version 3.2.

Overview

When reading from the primary of a replica set, it is possible to read data that is stale or not durable, depending on the read concern used [1]. With a read concern level of "local", a client can read data before it is durable; that is, before it has propagated to enough replica set members to avoid a rollback. A read concern level of "majority" guarantees durable reads but may return stale data that has been overwritten by another write operation.
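
As a point of comparison, the sketch below builds two find command documents that differ only in their read concern level. The helper name findWithReadConcern, the collection name, and the filter are illustrative assumptions, not part of the procedure.

```javascript
// Hypothetical helper: build a `find` command document with an explicit
// read concern level. It only constructs the document and does not
// contact a server.
function findWithReadConcern(collectionName, filter, level) {
  return {
    find: collectionName,
    filter: filter,
    readConcern: { level: level }
  };
}

// May return data that has not yet propagated to a majority of members.
var localRead = findWithReadConcern("products", { sku: "abc123" }, "local");

// Returns only majority-committed data, which may already be overwritten.
var majorityRead = findWithReadConcern("products", { sku: "abc123" }, "majority");
```

In the shell, either document could be passed to db.runCommand(). Neither level alone rules out both staleness and rollback, which is the gap the procedure in this tutorial addresses.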

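This tutorial outlines a procedure that uses db.collection.findAndModify() to read data that is not stale and cannot be rolled back. To do so, the procedure uses the findAndModify() method with a write concern to modify a dummy field in a document. Specifically, the procedure requires that findAndModify() use an exact match query supported by a unique index, that it actually modify the matched document, and that it use the write concern { w: "majority" }.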

Important

The “quorum read” procedure has a substantial cost over simply using a read concern of "majority" because it incurs write latency rather than read latency. This technique should only be used if staleness is absolutely intolerable.

Prerequisites

This tutorial reads from a collection named products. Initialize the collection using the following operation.

db.products.insert( [
   {
     _id: 1,
     sku: "xyz123",
     description: "hats",
     available: [ { quantity: 25, size: "S" }, { quantity: 50, size: "M" } ],
     _dummy_field: 0
   },
   {
     _id: 2,
     sku: "abc123",
     description: "socks",
     available: [ { quantity: 10, size: "L" } ],
     _dummy_field: 0
   },
   {
     _id: 3,
     sku: "ijk123",
     description: "t-shirts",
     available: [ { quantity: 30, size: "M" }, { quantity: 5, size: "L" } ],
     _dummy_field: 0
   }
] )

The documents in this collection contain a dummy field named _dummy_field that will be incremented by the db.collection.findAndModify() operation in the tutorial. If the field does not exist, the db.collection.findAndModify() operation will add the field to the document. The purpose of the field is to ensure that the db.collection.findAndModify() operation results in a modification to the document.
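
Because _dummy_field carries no application data, code that consumes documents from this collection may prefer to drop it before use. The following is a minimal sketch of one way to do so; the helper name withoutDummyField is hypothetical, and it makes only a shallow copy.

```javascript
// Hypothetical helper: return a shallow copy of a document without the
// bookkeeping field used by the quorum read procedure.
function withoutDummyField(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (key) {
    if (key !== "_dummy_field") {
      copy[key] = doc[key];
    }
  });
  return copy;
}

// Example input shaped like one of the documents inserted above.
var sample = { _id: 2, sku: "abc123", description: "socks", _dummy_field: 0 };
var cleaned = withoutDummyField(sample);
```

The original document is left untouched, so the stored field can still be incremented on later reads.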

Procedure

1

Create a unique index.

Create a unique index on the fields that will be used to specify an exact match in the db.collection.findAndModify() operation.

This tutorial will use an exact match on the sku field. As such, create a unique index on the sku field.

db.products.createIndex( { sku: 1 }, { unique: true } )

2

Use findAndModify to read committed data.

Use the db.collection.findAndModify() method to make a trivial update to the document you want to read and return the modified document. A write concern of { w: "majority" } is required. To specify the document to read, you must use an exact match query that is supported by a unique index.

The following findAndModify() operation specifies an exact match on the uniquely indexed field sku and increments the field named _dummy_field in the matching document. While not necessary, the write concern for this command also includes a wtimeout value of 5000 milliseconds to prevent the operation from blocking forever if the write cannot propagate to a majority of voting members.

var updatedDocument = db.products.findAndModify(
   {
     query: { sku: "abc123" },
     update: { $inc: { _dummy_field: 1 } },
     new: true,
     writeConcern: { w: "majority", wtimeout: 5000 }
   }
);

Even in situations where two nodes in the replica set believe that they are the primary, only one will be able to complete the write with w: "majority". As such, the findAndModify() method with "majority" write concern will be successful only when the client has connected to the true primary to perform the operation.

Since the quorum read procedure only increments a dummy field in the document, you can safely repeat invocations of findAndModify(), adjusting the wtimeout as necessary.
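
A minimal sketch of such a retry loop follows. It issues the findAndModify command through runCommand() and tries each wtimeout value in turn. The helper name quorumRead and the list of timeouts are hypothetical. The sketch also assumes that a write not acknowledged by a majority is reported either as ok: 0 or through a writeConcernError field in the command result; error reporting differs between shell versions and drivers (some throw instead), so verify this against your environment before relying on it.

```javascript
// Hypothetical helper: repeat the quorum read with successive wtimeout values.
// `database` is any object exposing runCommand(), such as the shell's `db`.
function quorumRead(database, collectionName, query, timeoutsMs) {
  var lastError = null;
  for (var i = 0; i < timeoutsMs.length; i++) {
    var res = database.runCommand({
      findAndModify: collectionName,
      query: query,
      update: { $inc: { _dummy_field: 1 } },
      new: true,
      writeConcern: { w: "majority", wtimeout: timeoutsMs[i] }
    });
    // Assumption: failure to reach a majority appears as ok: 0 or as a
    // writeConcernError field on the result document.
    if (res.ok && !res.writeConcernError) {
      return res.value; // null when no document matched the query
    }
    lastError = res.writeConcernError || res;
  }
  throw new Error(
    "quorum read was not acknowledged by a majority: " + JSON.stringify(lastError)
  );
}

// Possible use in the shell:
// var doc = quorumRead(db, "products", { sku: "abc123" }, [ 5000, 10000 ]);
```

Each failed attempt may still have incremented _dummy_field on the node that accepted the write, which is harmless because the field has no meaning beyond forcing a modification.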

[1] In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.