[mongodb-user] Compound indexes for date range queries

Kevin Adistambha

2016-11-02 03:21:11 UTC

Hi John

Itâs difficult to say for sure what indexes is optimal without knowing for
sure the cardinality of your data. However, I tried to make an educated
guess regarding your data, and populated a test collection using 50,000
randomly generated documents to check out your queries.

I tried with the arguably simpler second query you posted:

{start: {$gte: 12345, $lte: 5678}, amount_sent: 88}

For this query, you need an index having the fields start and amount_sent.
You suggested using {start:-1, amount_sent:1}. However, this is not
optimal, which we can see in the explain() output. Note the use of hint()
<https://docs.mongodb.com/manual/reference/method/cursor.hint/> to force

db.test.explain('executionStats').find({amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({start:-1, amount_sent:1})

...
"nReturned": 253,
"executionTimeMillis": 38,
"totalKeysExamined": 24947,
"totalDocsExamined": 253,
...

From the explain('executionStats') output, MongoDB needs to scan ~25,000
keys to return 253 documents matching the query.

A better index is {amount_sent:1, start:1}. This index is structured by
first having an exact value of amount_sent, where for each value of
amount_sent, sorted values of start is stored. Hereâs an illustration of
the index structure:
amount_sent 1 2 3 âŠ
start 1 2 3 âŠ 1 2 3 âŠ 1 2 3 âŠ âŠ

Since you need an exact value of amount_sent and a range of start (and
possibly sort using start), you can verify from the illustration above that
the index facilitates the query perfectly.

db.test.explain('executionStats').find({amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({amount_sent:1, start:1})

...
"nReturned": 253,
"executionTimeMillis": 1,
"totalKeysExamined": 253,
"totalDocsExamined": 253,
...

Now MongoDB examines 253 keys (instead of ~25,000) and returns the same
number of documents. One takeaway from this is that an index fits the query
if the numbers in nReturned, totalKeysExamined, and totalDocsExamined are
the same.

Your first query is more involved:

{$or: [{sender: âBobâ, receiver: âBob}], amount_sent: 30, start: {$gte:
12345, $lte: 5678}}

db.test.explain('executionStats').find({$or:[{sender:"Oscar"}, {receiver:"Oscar"}], amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}).sort({start:-1}).hint({sender:1, amount_sent:1, start:1})

...
"nReturned": 4,
"executionTimeMillis": 118,
"totalKeysExamined": 50000,
"totalDocsExamined": 50000,
...
"stage": "SORT_KEY_GENERATOR",
...

This is not great at all. MongoDB looks at 50,000 keys and documents
(basically the whole collection), only to return 4 of them. Also, note the
stage called âSORT_KEY_GENERATORâ there. The existence of this stage means
that MongoDB cannot use the index to present a sorted result, and thus is
doing an in-memory sort instead, which is limited to 32 MB. If your query
returns a large result set that exceeds 32 MB, the query will fail. See Use
Indexes to Sort Query Results
<https://docs.mongodb.com/v3.2/tutorial/sort-results-with-indexes/>.

We can improve this by refactoring your query (by unwrapping the $or
statement) and creating a couple of indexes to support both terms of the
$or query.

The refactored query looks like:

{$or:[
{sender:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}},
{receiver:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}
]}

and the two indexes are {sender:1, amount_sent:1, start:1} and {receiver:1,
amount_sent:1, start:1}, to satisfy both terms of the $or query. The

db.test.explain('executionStats').find({$or:[{sender:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}, {receiver:"Oscar", amount_sent:10, start: {$gte: 1467295200, $lte: 1483189200}}]}).sort({start:-1})

...
"nReturned": 4,
"executionTimeMillis": 1,
"totalKeysExamined": 4,
"totalDocsExamined": 4,
...
"stage": "SORT_MERGE",
...

The refactored query and the two indexes appear to work together well. Now
MongoDB returns 4 documents, and examined exactly the same number of index
keys and documents. Also note the existence of a âSORT_MERGEâ stage. This
is basically the merge stage of a merge sort, which is not bound to the 32
MB limit. This stage merges the two (already sorted) results for each of
the $or terms. Also, execution time of the query drops from 118 ms to just
1 ms.

Please note that the analysis I did is mostly an educated guess since I
donât know the exact nature of your data. To optimize your use case, I
would encourage you to fully understand the explain() result:
https://docs.mongodb.com/v3.2/reference/explain-results/

Best regards,
Kevin
â
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/3c58f4f2-32b8-4889-8941-5d67a47accc2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.