[mongodb-user] Mongo queries are slow and yielding/waiting for locks thousand times

Discussion:

Abhishek Somwanshi

2018-11-20 17:46:44 UTC

Hello

We are using mongo setup with replica sets on AWS.
Setup Details:
1 Primary node: r3.8xlarge
4 secondary nodes: r5.xlarge
Mongo version: 3.0.8 (WiredTiger)
Database size: 358GB

We have configured mongo cloud monitoring for this setup. Here are few
stats from it:

Mongo Primary Network: 350MB/S (In: 40MB/S, Out: 310MB/S, Num of requests:
5.32KB/S)
1.1K Connections on average on primary node
Tickets available on primary: Reads: 125, Write: 100
Queues: Total: 15-20, Read : 0-1, Write: 15-20

Other Points:
CPU and memory stats on the instance look pretty much under control.
We are using 800gb GP2 EBS volume(2400 IOPS) and we are consuming around
2000 IOPS. The burst balance is almost available to full capacity which
means that also is not exhausted.
Primary node is of r3.8xlarge type, so it has 10gig network.
Ulimits are set as follows:

Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 64000 64000
processes
Max open files 64000 64000 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 1967994 1967994 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us

Problem:
We are accessing mongo using Java application and we are seeing that a lot
of our queries are taking considerable amount of time such as 2-4 seconds.
We enabled profiling and listed slowest queries. In that we found that a
sample query for update object, acquired intent lock 17000 times and had to
yield those many times.
We are not able to understand if it is intent lock then why does it need to
yield so many times? Does intent lock also not allow other operations to
proceed?
If the query has to yield so many times, how can we see the advantage of
document level lock?
We have also seen some of our queries getting timed out(probably waiting
for the lock too long and eventually dying).
Could you please guide us on how to debug this problem and improve the
performance of our mongo ?

Here's the profiler output of a sample query:
{
"op" : "update",
"ns" : "backend.Scores",
"query" : {
"channel.id" : "hkxj",
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")
},
"updateobj" : {
"$set" : {
"channel" : {
"id" : "hkxj",
"failures" : 39,
"items" : [

{
"_id" : ObjectId(
"5bf3e434800075956f1"),
"image" :
"Loading Image...

",
"b_time" : ISODate(
"2018-11-26T19:24:00Z"),
"title" : "What's New ",
"id" : "fq969"
},
{
"_id" : ObjectId(
"5bf3e43f800075956f0"),
"image" :
"Loading Image...

",
"broadcast_time" : ISODate(
"2018-11-26T20:24:00Z"),
"title" : "Let's End This",
"id" : "fsfgd"
}
]
},
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")
},
"$setOnInsert" : {
"__v" : 0
}
},
"nscanned" : 0,
"nscannedObjects" : 2209900,
"nMatched" : 1,
"nModified" : 1,
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 17264,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(17266),
"w" : NumberLong(17266)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(17266)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(17265)
}
},
"oplog" : {
"acquireCount" : {
"w" : NumberLong(1)
}
}
},
"millis" : 3864,
"execStats" : {

},
"ts" : ISODate("2018-11-20T10:40:25.104Z"),
"client" : "172.2.3.52",
"allUsers" : [ ],
"user" : ""
}
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/3af012f8-9d3f-4402-8e7e-69be77dd3457%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Cochran

2018-11-20 23:13:48 UTC

Permalink

I know this does not answer your issue -- I understand that -- but I want
to say that my understanding is that MongoDB version 3.0.8 reached
end-of-life in February 2018. You should upgrade to a supported version.

Thanks so much

Bob

Post by Abhishek Somwanshi
Hello
We are using mongo setup with replica sets on AWS.
1 Primary node: r3.8xlarge
4 secondary nodes: r5.xlarge
Mongo version: 3.0.8 (WiredTiger)
Database size: 358GB
We have configured mongo cloud monitoring for this setup. Here are few
5.32KB/S)
1.1K Connections on average on primary node
Tickets available on primary: Reads: 125, Write: 100
Queues: Total: 15-20, Read : 0-1, Write: 15-20
CPU and memory stats on the instance look pretty much under control.
We are using 800gb GP2 EBS volume(2400 IOPS) and we are consuming around
2000 IOPS. The burst balance is almost available to full capacity which
means that also is not exhausted.
Primary node is of r3.8xlarge type, so it has 10gig network.
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 64000 64000
processes
Max open files 64000 64000 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 1967994 1967994 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
We are accessing mongo using Java application and we are seeing that a lot
of our queries are taking considerable amount of time such as 2-4 seconds.
We enabled profiling and listed slowest queries. In that we found that a
sample query for update object, acquired intent lock 17000 times and had to
yield those many times.
We are not able to understand if it is intent lock then why does it need
to yield so many times? Does intent lock also not allow other operations to
proceed?
If the query has to yield so many times, how can we see the advantage of
document level lock?
We have also seen some of our queries getting timed out(probably waiting
for the lock too long and eventually dying).
Could you please guide us on how to debug this problem and improve the
performance of our mongo ?
{
"op" : "update",
"ns" : "backend.Scores",
"query" : {
"channel.id" : "hkxj",
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")
},
"updateobj" : {
"$set" : {
"channel" : {
"id" : "hkxj",
"failures" : 39,
"items" : [
{
"_id" : ObjectId(
"5bf3e434800075956f1"),
"image" : "
http://images.abcd.com/something/wscoob.jpg",
"b_time" : ISODate(
"2018-11-26T19:24:00Z"),
"title" : "What's New ",
"id" : "fq969"
},
{
"_id" : ObjectId(
"5bf3e43f800075956f0"),
"image" : "
http://images.abcd.com/something/w4citv2.jpg",
"broadcast_time" : ISODate
("2018-11-26T20:24:00Z"),
"title" : "Let's End This"
,
"id" : "fsfgd"
}
]
},
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")
},
"$setOnInsert" : {
"__v" : 0
}
},
"nscanned" : 0,
"nscannedObjects" : 2209900,
"nMatched" : 1,
"nModified" : 1,
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 17264,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(17266),
"w" : NumberLong(17266)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(17266)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(17265)
}
},
"oplog" : {
"acquireCount" : {
"w" : NumberLong(1)
}
}
},
"millis" : 3864,
"execStats" : {
},
"ts" : ISODate("2018-11-20T10:40:25.104Z"),
"client" : "172.2.3.52",
"allUsers" : [ ],
"user" : ""
}

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/86616928-cda0-4dfc-aac4-25668796f27f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

'Kevin Adistambha' via mongodb-user

2018-11-23 01:33:03 UTC

Permalink

Hi Abhishek,

I think this part of the output you posted showed what went wrong:

"nscanned" : 0,
"nscannedObjects" : 2209900,
"nMatched" : 1,
"nModified" : 1,
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 17264,

So the query scanned 2,209,900 documents to find the single one to update.
Since nscanned is zero, I think the collection is not indexed, thus MongoDB
was forced to load all documents from disk and check if any of them matches
the update criteria. As you can imagine, this is very inefficient. The high
number of yields was due to MongoDB waiting for the document to be loaded
from disk.

In short, the slow performance has nothing to do with locks. It was due to
an unindexed update command in a relatively large collection.

I believe this is the query involved in the update:

"query" : {
"channel.id" : "hkxj",
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")

You might want to check if adding an index such as:

db.collection.createIndex({channel.id: 1, metric: 1, date: 1})

would help with this query. See Indexes
<https://docs.mongodb.com/manual/indexes/> and Indexing Strategies
<https://docs.mongodb.com/manual/applications/indexes/> for more
information.

Please try this in a development environment before doing this in
production. Also remember to take backups before you do any admin work such
as this in your prod environment.

I would also like to reiterate Bobâs comment that MongoDB 3.0 is very old
and is out of support. Please consider upgrading to a newer version (4.0.4
is the newest currently) for bugfixes and improvements. For upgrading,
please see Upgrade to the Latest Revision of MongoDB
<https://docs.mongodb.com/manual/tutorial/upgrade-revision/index.html>,
noting that MongoDB would have to be upgraded by following major versions
progression, e.g. 3.0 -> 3.2 -> 3.4 -> 3.6 -> 4.0.

Best regards,
Kevin
â
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/905853e0-ce6a-4f5b-95a9-1be2247e4d78%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Abhishek Somwanshi

2018-11-26 05:30:03 UTC

Permalink

Thank you Robert and Kevin. I was able to get rid of the query which was
scanning million records to return one. Now when I move to next set of
slowest queries, there are few update queries which are waiting for lock.
What is strange is they are waiting for intent lock(w) for 2.5 seconds. Can
you tell me what could be the possible reason for this?
Here's the execution details for one such query:
"writeConflicts" : 0,
"numYield" : 1,
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(3),
"w" : NumberLong(3)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(3)
},
"acquireWaitCount" : {
"w" : NumberLong(1)
},
"timeAcquiringMicros" : {
"w" : NumberLong(2497196)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(2)
}
},
"oplog" : {
"acquireCount" : {
"w" : NumberLong(1)
}
}
},
"millis" : 2500,

Post by 'Kevin Adistambha' via mongodb-user
Hi Abhishek,
"nscanned" : 0,
"nscannedObjects" : 2209900,
"nMatched" : 1,
"nModified" : 1,
"keyUpdates" : 0,
"writeConflicts" : 0,
"numYield" : 17264,
So the query scanned 2,209,900 documents to find the single one to update.
Since nscanned is zero, I think the collection is not indexed, thus
MongoDB was forced to load all documents from disk and check if any of them
matches the update criteria. As you can imagine, this is very inefficient.
The high number of yields was due to MongoDB waiting for the document to be
loaded from disk.
In short, the slow performance has nothing to do with locks. It was due to
an unindexed update command in a relatively large collection.
"query" : {
"channel.id" : "hkxj",
"metric" : "YVR",
"date" : ISODate("2018-11-20T10:00:00Z")
db.collection.createIndex({channel.id: 1, metric: 1, date: 1})
would help with this query. See Indexes
<https://docs.mongodb.com/manual/indexes/> and Indexing Strategies
<https://docs.mongodb.com/manual/applications/indexes/> for more
information.
Please try this in a development environment before doing this in
production. Also remember to take backups before you do any admin work such
as this in your prod environment.
I would also like to reiterate Bobâs comment that MongoDB 3.0 is very old
and is out of support. Please consider upgrading to a newer version (4.0.4
is the newest currently) for bugfixes and improvements. For upgrading,
please see Upgrade to the Latest Revision of MongoDB
<https://docs.mongodb.com/manual/tutorial/upgrade-revision/index.html>,
noting that MongoDB would have to be upgraded by following major versions
progression, e.g. 3.0 -> 3.2 -> 3.4 -> 3.6 -> 4.0.
Best regards,
Kevin
â

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/a15df9b1-434a-43f8-9d46-82ea46c04aab%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

'Kevin Adistambha' via mongodb-user

2018-11-28 03:29:05 UTC

Permalink

Hi Abhishek

According to the What type of locking does MongoDB use
<https://docs.mongodb.com/manual/faq/concurrency/#faq-concurrency-locking>,
intent lock provides a signal that a thread is intending to do a read or
write. This is different from an exclusive lock, which signals the
intention to take exclusive access of a resource.

What I understand from the output you posted is that the thread to update
yielded 1 time. Typically a thread yields when itâs waiting for something
slow to finish, e.g. fetching the needed data from disk. From the output, I
see that itâs waiting for an intent lock. Based on this output and the
content of the linked page above, I can think of one possibility: the
thread is trying to write the update, but itâs forced to wait behind an
exclusive lock on that document that was obtained by another thread. Of
course, this is probably not the only reason why itâs waiting.

Is this operation generally slow and itâs detrimental to your overall
operation? Is this operation always slow, or is it only slow some time?
Itâs hard to pinpoint what exactly happened when looking at a single
occurrence of an event. This could be caused by many things, e.g. slow
disk, overburdened server, etc. You might be able to gain a more holistic
view of the state of your deployment using tools such as mongostat
<https://docs.mongodb.com/manual/reference/program/mongostat/> or mongotop
<https://docs.mongodb.com/manual/reference/program/mongotop/>. Itâs also
best to observe the recommendations in the Production Notes
<https://docs.mongodb.com/manual/administration/production-notes/> and
check if you have any suboptimal setting.

Best regards,
Kevin
â
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/46e61c3c-05e7-409e-95c7-7e17615747d9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.