Discussion:
Bad performance during updates
Barış Soner Uşaklı
2014-10-13 21:11:05 UTC
Some info about the setup: a single mongodb instance on a 16 core server.

There are 14 nodejs processes, each with a connection pool of 20 (the driver
default was 5), so mongostat displays 281 connections.

I attached the output of mongostat showing the problem: as soon as the update
count goes above 100 per second, readers start queueing up and the app becomes
unresponsive. This can last anywhere from 4 seconds to more than 20 seconds
depending on the updates.



<https://lh4.googleusercontent.com/-xjjgbDCNJRg/VDw-noyB2yI/AAAAAAAAAL0/kHEcmtY_PXY/s1600/Capture.PNG>


insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 250 3 *0 0 32|0 0 2.03g 5.11g 552m 0 0:0.4% 0 0|0 0|0 53k 131k 281 16:59:20
*0 586 21 3 15 43|0 0 2.03g 5.11g 564m 0 0:2.5% 0 0|0 0|0 95k 394k 281 16:59:21
*0 600 8 *0 9 19|0 0 2.03g 5.11g 554m 0 0:4.0% 0 0|0 0|0 102k 263k 281 16:59:22
*0 641 36 4 15 36|0 0 2.03g 5.11g 559m 0 0:6.9% 0 0|0 0|0 127k 358k 281 16:59:23
*0 348 123 1 16 94|0 0 2.03g 5.11g 563m 0 0:9.8% 0 207|0 0|1 73k 211k 281 16:59:24
*0 739 79 1 35 49|0 0 2.03g 5.11g 556m 0 0:11.2% 0 154|0 1|0 145k 551k 281 16:59:26
*0 738 54 *0 72 43|0 0 2.03g 5.11g 561m 0 0:6.8% 0 143|0 1|0 373k 1m 281 16:59:27
*0 693 28 *0 70 67|0 0 2.03g 5.11g 549m 0 0:5.7% 0 116|0 13|0 349k 1m 281 16:59:28
*0 1719 13 *0 103 67|0 0 2.03g 5.11g 566m 0 0:1.9% 0 0|0 1|0 957k 2m 281 16:59:29
*0 1035 40 5 42 86|0 0 2.03g 5.11g 589m 0 0:4.5% 0 0|0 2|0 301k 1m 281 16:59:30
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 509 27 4 17 55|0 0 2.03g 5.11g 591m 0 0:2.9% 0 0|0 0|0 111k 445k 281 16:59:31
*0 405 18 3 7 42|0 0 2.03g 5.11g 555m 0 0:3.9% 0 0|0 0|0 95k 265k 281 16:59:32
*0 325 20 4 5 21|0 0 2.03g 5.11g 558m 0 0:2.5% 0 0|0 0|1 74k 194k 281 16:59:33
*0 629 20 *0 26 44|0 0 2.03g 5.11g 565m 0 0:3.7% 0 0|0 0|0 141k 573k 281 16:59:34
*0 378 19 1 6 34|0 0 2.03g 5.11g 553m 0 0:3.7% 0 0|0 0|0 86k 234k 281 16:59:35
*0 374 9 2 6 15|0 0 2.03g 5.11g 558m 0 0:1.0% 0 0|0 0|0 63k 186k 281 16:59:36
*0 583 20 4 11 64|0 1 2.03g 5.11g 563m 0 0:2.6% 0 0|0 2|0 130k 380k 281 16:59:37
*0 234 11 1 5 11|0 0 2.03g 5.11g 551m 0 0:1.2% 0 0|0 0|0 61k 174k 281 16:59:38
*0 193 2 *0 7 11|0 0 2.03g 5.11g 556m 0 0:0.4% 0 0|0 0|0 43k 160k 281 16:59:39


A little bit about the schema and structure of the app.

There are about 2.6 million documents. Each document has a _key field, and
most of them also have value and score fields, so there are lots of
documents like this:

...
{
    _id: {},
    _key: 'key1',
    value: 'somestringvalue',
    score: 12345
},
{
    _id: {},
    _key: 'key1',
    value: 'someotherstringvalue',
    score: 123456
}
...

There are 2 compound indices, {_key: 1, value: -1} and {_key: 1, score: 1}. I
sort on score, and I have queries that check whether certain documents exist
using _key and value. There are no queries that don't use indices; idx miss %
is always 0.
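
For reference, here is roughly what those indexes and query shapes look like
from the mongo shell (a minimal sketch; it assumes the collection is named
objects, and the sort direction and limit are just examples):

// the two compound indexes described above
db.objects.ensureIndex({ _key: 1, value: -1 });
db.objects.ensureIndex({ _key: 1, score: 1 });

// existence check on a key/value pair
db.objects.findOne({ _key: 'key1', value: 'somestringvalue' });

// documents for one key, sorted by score
db.objects.find({ _key: 'key1' }).sort({ score: -1 }).limit(20);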

I can provide more info if needed.

Thanks.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/b847cde0-0fd1-4aa8-a33b-e9cd4f3c4ed1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Rob Moore
2014-10-14 03:41:17 UTC
What are the updates doing? Anything in the mongod log about what the slow
operation is?

Rob.
Barış Soner Uşaklı
2014-10-14 04:00:24 UTC
I see a lot of these in the slow logs.


2014-10-13T23:21:17.639-0400 [conn605] update 0.objects query: { _key:
"uid:1921:tids_read", value: "11343" } update: { $set: { score:
1413256875817.0 } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1
keyUpdates:1 numYields:0 locks(micros) w:1550 1170ms
2014-10-13T23:21:17.639-0400 [conn605] command 0.$cmd command: update {
update: "objects", updates: [ { q: { _key: "uid:1921:tids_read", value:
"11343" }, u: { $set: { score: 1413256875817.0 } }, multi: false, upsert:
true } ], ordered: false, writeConcern: { w: 1 } } keyUpdates:0 numYields:0
reslen:55 1170ms

And another version of it that doesn't use value but updates an embedded
members array.

2014-10-13T22:45:10.194-0400 [conn558] update 0.objects query: { _key:
"cid:3:read_by_uid" } update: { $addToSet: { members: { $each: [ "2886" ] }
} } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0
numYields:0 locks(micros) w:11993 356ms
2014-10-13T22:45:10.195-0400 [conn558] command 0.$cmd command: update {
update: "objects", updates: [ { q: { _key: "cid:3:read_by_uid" }, u: {
$addToSet: { members: { $each: [ "2886" ] } } }, multi: false, upsert: true
} ], ordered: false, writeConcern: { w: 1 } } keyUpdates:0 numYields:0
reslen:55 356ms
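
For context, these entries correspond to upserts issued from the Node.js
driver, roughly like the sketch below (the query and update values are copied
from the first log line; the callback, and the idea that score is a
millisecond timestamp, are assumptions):

// sketch of the upsert behind the first log entry above
db.collection('objects').update(
    { _key: 'uid:1921:tids_read', value: '11343' },
    { $set: { score: Date.now() } },   // score looks like a millisecond timestamp
    { upsert: true, w: 1 },            // matches upsert: true, writeConcern { w: 1 } in the log
    function (err) {
        if (err) { console.error(err); }
    }
);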
Rob Moore
2014-10-14 04:17:13 UTC
Given that the lock time for each of those is much smaller than the operation
length (1.55ms holding the write lock vs. 1170ms for the entire operation), I
am going to guess that they were also blocked by whatever operation triggers
the slowdown.

Can you look in the logs for the first slow operation around 16:59:23? It
could also appear later in the log, but in that case it will likely have a
high numYields value.

You might also want to try using the mtools package to visualize the logs and
find the triggering operation.

https://github.com/rueckstiess/mtools/wiki (look at mlogvis).

I did not see any faulting occurring and the mapped size was small, so I
don't think you are running into a disk I/O problem, which makes me think
this is still related to a slow operation.

Can you turn on the profiler on this system so we can query for the slow
operation after it happens again?
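
For reference, a minimal sketch of what that looks like from the mongo shell
(the 100ms threshold is just an example):

// profile operations slower than 100ms on the current database
db.setProfilingLevel(1, 100);

// after the next slowdown, pull the slowest captured operations
db.system.profile.find().sort({ millis: -1 }).limit(10).pretty();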

Rob.
Barış Soner Uşaklı
2014-10-14 04:39:41 UTC
I got an HTML file after running it through mlogvis, although it's really
slow since the log file was 100MB. What should I be looking at in this, the
high numYields?

[screenshot attached]
Barış Soner Uşaklı
2014-10-14 05:25:36 UTC
<https://lh4.googleusercontent.com/-1lU_13d_rdw/VDyyS28MEnI/AAAAAAAAAMU/GNmdVf52eaE/s1600/Capture.PNG>

So this seems to show something useful, I think.

The following update is called many times, which I think is what causes the
lockup:

2014-10-14T00:20:30.317-0400 [conn1574] update 0.objects query: { _key:
"uid:2848:tids_read", value: "11343" } update: { $set: { score:
1413260429824.0 } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1
keyUpdates:1 numYields:0 locks(micros) w:10044 491ms
Rob Moore
2014-10-14 13:53:30 UTC
That is another case where the operation ran quickly once it got the lock
(10,044 microseconds, about 10ms) but the total time for the operation was
large (491ms). That implies the operation waited about 481ms to obtain the
write lock.

To be honest, even 10ms is a bit long for a simple $set and index update. Are
the _key fields fairly unique, e.g., are there on the order of a million
unique _key values? I am wondering if the score values are not as unique as I
would have expected and there is a large group of entries with the same
_key/score index value. When MongoDB goes to update that index it ends up
having to scan a large number of documents to find the actual record to
update.
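
One way to check that from the shell is a quick aggregation over _key,
something like this sketch (it scans the whole collection, so it will not be
instant on 2.6 million documents):

// how many documents share the most common _key values?
db.objects.aggregate([
    { $group: { _id: '$_key', count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: 10 }
]);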

Back to the graph: Can you zoom in on one of the periods that has a large
spike in the operation time and look for an operation that is the outlier
in the graph. The "bad" operation could be a read, write, or even a count.
I think there is something that is grabbing and not releasing the lock and
that is what is blocking all of the updates, or only allowing a relatively
small number of writes to run per unit of time.

Rob.
Barış Soner Uşaklı
2014-10-14 17:04:29 UTC
In the case of this query, { _key: "uid:2848:tids_read", value: "11343" }
update: { $set: { score: 1413260429824.0 } }: out of 2.9 million documents
there are 60 with _key: "uid:2848:tids_read", and that number grows as the
user reads more topics. 11343 is a topic id, so it is not unique globally
either, but together the _key/value pair is unique in the collection. I
didn't create a unique compound index on _key/value, though; maybe I should.
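
If the _key/value pair really is unique, the index build would look roughly
like the sketch below. Two caveats, both assumptions worth verifying: the
existing { _key: 1, value: -1 } index would have to be dropped first, since
two indexes cannot share the same key pattern, and documents without a value
field index value as null, so more than one of those under a single _key
would make the unique build fail.

// sketch: replace the existing key/value index with a unique one
db.objects.dropIndex({ _key: 1, value: -1 });
db.objects.ensureIndex({ _key: 1, value: -1 }, { unique: true });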

I zoomed into the graph and there seem to be some large read queries that use
$in. Here are the two with the largest duration (a driver-level sketch of
this query shape follows the second log excerpt).


2014-10-14T00:26:45.259-0400 [conn1825] query 0.objects query: { _key: {
$in: [ "topic:11367", "topic:11240", "topic:11376", "topic:11313",
"topic:11375", "topic:11374", "topic:11370", "topic:11323", "topic:11357",
"topic:11373", "topic:11105", "topic:11092", "topic:10843", "topic:11307",
"topic:11363", "topic:11368", "topic:11365", "topic:11362", "topic:11350",
"topic:11371", "topic:8439", "topic:11351", "topic:11359", "topic:11361",
"topic:11369", "topic:11304", "topic:11334", "topic:11366", "topic:11356",
"topic:11076", "topic:4890", "topic:11364", "topic:11277", "topic:11283",
"topic:11342", "topic:11260", "topic:11186", "topic:11346", "topic:11360",
"topic:11358", "topic:11203", "topic:11285", "topic:11354", "topic:11287",
"topic:5593", "topic:11353", "topic:10929", "topic:11280", "topic:11330",
"topic:11309", "topic:11347", "topic:11355", "topic:11096", "topic:11349",
"topic:11352", "topic:11348", "topic:11198", "topic:11344", "topic:11327",
"topic:10784", "topic:11252", "topic:11341", "topic:11339", "topic:11340",
"topic:11336", "topic:11039", "topic:11335", "topic:10993", "topic:11338",
"topic:11337", "topic:11314", "topic:11324", "topic:11329", "topic:11306",
"topic:2942", "topic:11333", "topic:9831", "topic:11318", "topic:10431",
"topic:11332", "topic:11325", "topic:11328", "topic:11159", "topic:11190",
"topic:11233", "topic:10667", "topic:11326", "topic:11242", "topic:11303",
"topic:7536", "topic:11191", "topic:8695", "topic:11315", "topic:11321",
"topic:5550", "topic:11322", "topic:10144", "topic:11320", "topic:11319",
"topic:10069", "topic:11024", "topic:11312", "topic:11140", "topic:10816",
"topic:11223", "topic:11317", "topic:10984", "topic:11316", "topic:10455",
"topic:10211", "topic:11298", "topic:11238", "topic:11249", "topic:11310",
"topic:11250", "topic:11224", "topic:11028", "topic:11048", "topic:11284",
"topic:11300", "topic:11278", "topic:11212", "topic:11305", "topic:11295",
"topic:4867", "topic:7877", "topic:11259", "topic:11120", "topic:11114",
"topic:11279", "topic:11302", "topic:8646", "topic:11301", "topic:11125",
"topic:11290", "topic:11297", "topic:11294", "topic:11085", "topic:11299",
"topic:9943", "topic:11293", "topic:10992", "topic:11103", "topic:11291",
"topic:11228", "topic:10524", "topic:11209", "topic:11266", "topic:10383",
"topic:11289", "topic:11292", "topic:11133", "topic:11286", "topic:11288",
"topic:11282", "topic:8872", "topic:7890", "topic:11272", "topic:9588",
"topic:11276", "topic:11281", "topic:7937", "topic:11256", "topic:11095",
"topic:11183", "topic:10135", "topic:11097", "topic:11271", "topic:11134",
"topic:11151", "topic:11274", "topic:11275", "topic:11027", "topic:11227",
"topic:11270", "topic:11179", "topic:11107", "topic:11222", "topic:4059",
"topic:11267", "topic:11273", "topic:10557", "topic:11269", "topic:11264",
"topic:11268", "topic:11064", "topic:5580", "topic:11263", "topic:11262",
"topic:11234", "topic:11257", "topic:11189", "topic:11248", "topic:11258",
"topic:11232", "topic:11193", "topic:11208", "topic:9525", "topic:11245",
"topic:11251", "topic:7730", "topic:11153", "topic:11254", "topic:11219",
"topic:11244", "topic:11255", "topic:11253", "topic:11247", "topic:11200",
"topic:11230", "topic:11243", "topic:11042", "topic:11197", "topic:11226",
"topic:11239", "topic:11235", "topic:11236", "topic:11187", "topic:10659",
"topic:11210", "topic:11231", "topic:11165", "topic:10762", "topic:11225",
"topic:11229", "topic:11204", "topic:11206", "topic:9433", "topic:10592",
"topic:11218", "topic:11213", "topic:11221", "topic:11201", "topic:11214",
"topic:11215", "topic:11216", "topic:11211", "topic:10916", "topic:4400",
"topic:11217", "topic:11192", "topic:9509", "topic:11207", "topic:9545",
"topic:11177", "topic:7611", "topic:11109", "topic:11045", "topic:11123",
"topic:11202", "topic:11205", "topic:11194", "topic:11199", "topic:11185",
"topic:11172", "topic:11195", "topic:10626", "topic:11196", "topic:11012",
"topic:11091", "topic:11163", "topic:11168", "topic:10385", "topic:11188",
"topic:11073", "topic:11184", "topic:11178", "topic:6742", "topic:6622",
"topic:11182", "topic:11174", "topic:11181", "topic:11170", "topic:11154",
"topic:11167", "topic:11176", "topic:11175", "topic:11171", "topic:11135",
"topic:11160", "topic:11162", "topic:11128", "topic:9542", "topic:11166",
"topic:11161", "topic:11137", "topic:11164", "topic:11148", "topic:10743",
"topic:11080", "topic:11146", "topic:10721", "topic:11155", "topic:11158",
"topic:11157", "topic:11131", "topic:11156", "topic:11152", "topic:11150",
"topic:11149", "topic:11147", "topic:11145", "topic:11093", "topic:11143",
"topic:10872", "topic:11144", "topic:3716", "topic:11142", "topic:11056",
"topic:11141", "topic:11139", "topic:11121", "topic:11138", "topic:11136",
"topic:11126", "topic:11124", "topic:11129", "topic:11132", "topic:11130",
"topic:3320", "topic:11127", "topic:11113", "topic:11118", "topic:11116",
"topic:11117", "topic:11115", "topic:11041", "topic:10981", "topic:11110",
"topic:11111", "topic:11112", "topic:11108", "topic:11100", "topic:11005",
"topic:11079", "topic:11029", "topic:8798", "topic:11104", "topic:11054",
"topic:10362", "topic:10533", "topic:11106", "topic:10734", "topic:11057",
"topic:11098", "topic:11102", "topic:11101", "topic:11099", "topic:6552",
"topic:10991", "topic:11033", "topic:11094", "topic:11075", "topic:11089",
"topic:11084", "topic:10955", "topic:5911", "topic:11090", "topic:11088",
"topic:11087", "topic:10954", "topic:11067", "topic:11037", "topic:10792",
"topic:11078", "topic:11077", "topic:11025", "topic:11003", "topic:9804",
"topic:11069", "topic:11063", "topic:11074", "topic:10632", "topic:10878",
"topic:11071", "topic:11050", "topic:11023", "topic:11022", "topic:11072",
"topic:11065", "topic:11070", "topic:11062", "topic:11061", "topic:1923",
"topic:10983", "topic:11068", "topic:11066", "topic:3484", "topic:10963",
"topic:11055", "topic:10949", "topic:11059" ] } } planSummary: IXSCAN {
_key: 1, score: -1 } cursorid:30595555047 ntoreturn:0 ntoskip:0
nscanned:162 nscannedObjects:101 keyUpdates:0 numYields:3 locks(micros)
r:322748 nreturned:101 reslen:4807 331ms


And

2014-10-14T00:28:36.878-0400 [conn1712] warning: log line attempted (57k)
over max size (10k), printing beginning and end ... query 0.objects query:
{ _key: { $in: [ "notifications:post:196440:uid:1215",
"notifications:post:196440:uid:1609", "notifications:post:196440:uid:2217",
"notifications:post:196440:uid:891", "notifications:post:196440:uid:1534",
"notifications:post:196440:uid:673", "notifications:post:196440:uid:1302",
"notifications:post:196440:uid:766", "notifications:post:196440:uid:221",
"notifications:post:196440:uid:1067", "notifications:post:196440:uid:2226",
"notifications:post:196440:uid:817", "notifications:post:196440:uid:1477",
"notifications:post:196440:uid:1963",
"notifications:tid:10294:pid:196969:uid:74",
"notifications:post:196440:uid:1661", "notifications:post:196440:uid:361",
"notifications:post:196440:uid:1255",
"notifications:tid:10223:pid:196989:uid:1197",
"notifications:tid:10223:pid:196998:uid:1197",
"notifications:post:196440:uid:236",
"notifications:tid:10223:pid:197019:uid:1197",
"notifications:post:196440:uid:1274",
"notifications:tid:10223:pid:197023:uid:1197",
"notifications:post:196440:uid:1108", "notifications:post:196440:uid:1508",
"notifications:post:196440:uid:1028",
"notifications:tid:10296:pid:197068:uid:361",
"notifications:post:196440:uid:2319", "notifications:post:196440:uid:4390",
"notifications:post:196440:uid:3231", "notifications:post:196440:uid:1428",
"notifications:tid:9621:pid:197108:uid:956",
"notifications:post:196440:uid:3095", "notifications:post:196440:uid:1618",
"notifications:follow:7:uid:3095",
"notifications:tid:10299:pid:197130:uid:3035",
"notifications:post:196440:uid:2539", "notifications:post:196440:uid:1058",
"notifications:post:196440:uid:670", "notifications:post:196440:uid:1509",
"notifications:post:196440:uid:628",
"notifications:tid:10223:pid:197218:uid:1197",
"notifications:post:196440:uid:2048", "notifications:post:196440:uid:3286",
"notifications:post_flag:196446:uid:3095",
"notifications:post:196440:uid:4357", "notifications:post:196440:uid:1665",
"notifications:post:196440:uid:3035", "notifications:post:196440:uid:1807",
"notifications:post:196440:uid:1411", "notifications:post:196440:uid:1535",
"notifications:post:196440:uid:1054",
"notifications:tid:10315:pid:197408:uid:139",
"notifications:tid:10320:pid:197416:uid:1325",
"notifications:post:196440:uid:2612", "notifications:post:197467:uid:1325",
"notifications:post:197467:uid:1637", "notifications:post:197467:uid:1963",
"notifications:post:197467:uid:1040", "notifications:post:197467:uid:1072",
"notifications:post:197467:uid:817", "notifications:post:197467:uid:1349",
"notifications:post:197467:uid:1067", "notifications:post:197467:uid:1373",
"notifications:post:197467:uid:4179", "notifications:post:197467:uid:691",
"notifications:tid:9929:pid:197513:uid:2306",
"notifications:post:197467:uid:1508", "notifications:post:197467:uid:3038",
"notifications:post:197467:uid:3619", "notifications:post:197467:uid:139",
"notifications:post:197467:uid:3010", "notifications:follow:7:uid:4297",
"notifications:post:197467:uid:3231", "notifications:post:197467:uid:3436",
"notifications:post:197467:uid:74", "notifications:post:197467:uid:1274",
"notifications:post:197467:uid:925", "notifications:post:197467:uid:2244",
"notifications:post:197467:uid:1345", "notifications:post:197467:uid:2972",
"notifications:post:196440:uid:830", "notifications:post:197467:uid:45",
"notifications:post:197467:uid:830", "notifications:post:196440:uid:2568",
"notifications:post:1974 .......... post:222550:uid:4253",
"notifications:post:222544:uid:4253", "notifications:post:217836:uid:2558",
"notifications:post:217791:uid:3286", "notifications:post:222597:uid:836",
"notifications:post:222597:uid:1830", "notifications:post:222597:uid:4253",
"notifications:post:222544:uid:1878", "notifications:post:222597:uid:1770",
"notifications:post:222594:uid:2294", "notifications:post:222544:uid:2795",
"notifications:post:222549:uid:2795", "notifications:post:222594:uid:2795",
"notifications:post:222597:uid:1195", "notifications:post:222594:uid:2317",
"notifications:post:222614:uid:2795", "notifications:post:222614:uid:94",
"notifications:post:222614:uid:250", "notifications:post:222614:uid:1696",
"notifications:post:222614:uid:4570",
"notifications:post_flag:222655:uid:1770",
"notifications:post:222614:uid:3661", "notifications:post:222614:uid:999",
"notifications:post:222614:uid:468", "notifications:post:222549:uid:1611",
"notifications:post:222597:uid:1611", "notifications:post:222738:uid:468",
"notifications:post:222738:uid:1246", "notifications:post:222738:uid:1477",
"notifications:post:222682:uid:45", "notifications:post:222738:uid:45",
"notifications:follow:7:uid:4644", "notifications:post:222614:uid:1611",
"notifications:follow:7:uid:4645", "notifications:post:222614:uid:1731",
"notifications:post:222682:uid:593", "notifications:post:222614:uid:593",
"notifications:post:222614:uid:591", "notifications:post:222614:uid:3918",
"notifications:tid:2942:pid:222777:uid:1871",
"notifications:post:222692:uid:1936", "notifications:post:222614:uid:1411",
"notifications:tid:11186:pid:222834:uid:512",
"notifications:post:222682:uid:3327", "notifications:post:222614:uid:1971",
"notifications:post:222692:uid:3883", "notifications:post:217474:uid:94",
"notifications:tid:11076:pid:222922:uid:94",
"notifications:tid:11076:pid:222927:uid:2244",
"notifications:post_flag:222238:uid:1325",
"notifications:post:222614:uid:1928", "notifications:post:222614:uid:1217",
"notifications:tid:11076:pid:223004:uid:4647",
"notifications:post_flag:223004:uid:558",
"notifications:post:222682:uid:2015",
"notifications:post_flag:222998:uid:16",
"notifications:post_flag:222994:uid:16",
"notifications:post:223018:uid:983", "notifications:post:223018:uid:1411",
"notifications:post:223018:uid:1530",
"notifications:post_flag:223004:uid:2244",
"notifications:post:223018:uid:1878", "notifications:post:223018:uid:1611",
"notifications:tid:11076:pid:223059:uid:3537",
"notifications:tid:11076:pid:223068:uid:1876",
"notifications:post:223018:uid:94", "notifications:post:223018:uid:1061",
"notifications:tid:11076:pid:223095:uid:1876",
"notifications:post:223018:uid:147", "notifications:post:223018:uid:1217",
"notifications:post:222614:uid:1058", "notifications:post:223018:uid:691",
"notifications:post:223018:uid:2795", "notifications:post:223018:uid:591",
"notifications:post:223018:uid:3918", "notifications:post:223018:uid:2319",
"notifications:post:223018:uid:1441", "notifications:post:223141:uid:1058",
"notifications:post:223018:uid:2458", "notifications:post:223018:uid:1058",
"notifications:post:223018:uid:1936",
"notifications:post_flag:223167:uid:350",
"notifications:post:223018:uid:593" ] } } planSummary: IXSCAN { _key: 1,
score: -1 } cursorid:30669589930 ntoreturn:0 ntoskip:0 nscanned:129
nscannedObjects:101 keyUpdates:0 numYields:2 locks(micros) r:232144
nreturned:101 reslen:160463 220ms
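
Driver-side, these reads are basically keyed multi-gets, roughly like this
sketch (the keys array and callback are placeholders; the batching comment
reflects the nreturned:101 and cursorid values in the logs above):

// sketch: fetch many objects by _key in one round trip
var keys = topicIds.map(function (tid) { return 'topic:' + tid; });  // topicIds is a placeholder
db.collection('objects').find({ _key: { $in: keys } }).toArray(function (err, docs) {
    if (err) { return console.error(err); }
    // the server returns the first batch (101 docs in the logs above) and the
    // rest via getmore, which matches the getmore column in the mongostat output
});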
Barış Soner Uşaklı
2014-10-14 17:08:35 UTC
I've attached the mlogvis file.
Post by Barış Soner Uşaklı
In case of this query { _key: "uid:2848:tids_read", value: "11343" }
update: { $set: { score: 1413260429824.0 } }
Out of 2.9 million documents there are 60 documents with
_key:"uid:2848:tids_read" and it can grow as the user reads more topics.
11343 is the topic id so that is also not unique in the global scheme of
this but together with _key the _key/value pair is unique in the
collection. Although I didn't create a unique compound index on _key/value
maybe I should.
I zoomed into the graph and there seems to be some large read queryies
that use $in. Here are two that have the largest duration.
2014-10-14T00:26:45.259-0400 [conn1825] query 0.objects query: { _key: {
$in: [ "topic:11367", "topic:11240", "topic:11376", "topic:11313",
"topic:11375", "topic:11374", "topic:11370", "topic:11323", "topic:11357",
"topic:11373", "topic:11105", "topic:11092", "topic:10843", "topic:11307",
"topic:11363", "topic:11368", "topic:11365", "topic:11362", "topic:11350",
"topic:11371", "topic:8439", "topic:11351", "topic:11359", "topic:11361",
"topic:11369", "topic:11304", "topic:11334", "topic:11366", "topic:11356",
"topic:11076", "topic:4890", "topic:11364", "topic:11277", "topic:11283",
"topic:11342", "topic:11260", "topic:11186", "topic:11346", "topic:11360",
"topic:11358", "topic:11203", "topic:11285", "topic:11354", "topic:11287",
"topic:5593", "topic:11353", "topic:10929", "topic:11280", "topic:11330",
"topic:11309", "topic:11347", "topic:11355", "topic:11096", "topic:11349",
"topic:11352", "topic:11348", "topic:11198", "topic:11344", "topic:11327",
"topic:10784", "topic:11252", "topic:11341", "topic:11339", "topic:11340",
"topic:11336", "topic:11039", "topic:11335", "topic:10993", "topic:11338",
"topic:11337", "topic:11314", "topic:11324", "topic:11329", "topic:11306",
"topic:2942", "topic:11333", "topic:9831", "topic:11318", "topic:10431",
"topic:11332", "topic:11325", "topic:11328", "topic:11159", "topic:11190",
"topic:11233", "topic:10667", "topic:11326", "topic:11242", "topic:11303",
"topic:7536", "topic:11191", "topic:8695", "topic:11315", "topic:11321",
"topic:5550", "topic:11322", "topic:10144", "topic:11320", "topic:11319",
"topic:10069", "topic:11024", "topic:11312", "topic:11140", "topic:10816",
"topic:11223", "topic:11317", "topic:10984", "topic:11316", "topic:10455",
"topic:10211", "topic:11298", "topic:11238", "topic:11249", "topic:11310",
"topic:11250", "topic:11224", "topic:11028", "topic:11048", "topic:11284",
"topic:11300", "topic:11278", "topic:11212", "topic:11305", "topic:11295",
"topic:4867", "topic:7877", "topic:11259", "topic:11120", "topic:11114",
"topic:11279", "topic:11302", "topic:8646", "topic:11301", "topic:11125",
"topic:11290", "topic:11297", "topic:11294", "topic:11085", "topic:11299",
"topic:9943", "topic:11293", "topic:10992", "topic:11103", "topic:11291",
"topic:11228", "topic:10524", "topic:11209", "topic:11266", "topic:10383",
"topic:11289", "topic:11292", "topic:11133", "topic:11286", "topic:11288",
"topic:11282", "topic:8872", "topic:7890", "topic:11272", "topic:9588",
"topic:11276", "topic:11281", "topic:7937", "topic:11256", "topic:11095",
"topic:11183", "topic:10135", "topic:11097", "topic:11271", "topic:11134",
"topic:11151", "topic:11274", "topic:11275", "topic:11027", "topic:11227",
"topic:11270", "topic:11179", "topic:11107", "topic:11222", "topic:4059",
"topic:11267", "topic:11273", "topic:10557", "topic:11269", "topic:11264",
"topic:11268", "topic:11064", "topic:5580", "topic:11263", "topic:11262",
"topic:11234", "topic:11257", "topic:11189", "topic:11248", "topic:11258",
"topic:11232", "topic:11193", "topic:11208", "topic:9525", "topic:11245",
"topic:11251", "topic:7730", "topic:11153", "topic:11254", "topic:11219",
"topic:11244", "topic:11255", "topic:11253", "topic:11247", "topic:11200",
"topic:11230", "topic:11243", "topic:11042", "topic:11197", "topic:11226",
"topic:11239", "topic:11235", "topic:11236", "topic:11187", "topic:10659",
"topic:11210", "topic:11231", "topic:11165", "topic:10762", "topic:11225",
"topic:11229", "topic:11204", "topic:11206", "topic:9433", "topic:10592",
"topic:11218", "topic:11213", "topic:11221", "topic:11201", "topic:11214",
"topic:11215", "topic:11216", "topic:11211", "topic:10916", "topic:4400",
"topic:11217", "topic:11192", "topic:9509", "topic:11207", "topic:9545",
"topic:11177", "topic:7611", "topic:11109", "topic:11045", "topic:11123",
"topic:11202", "topic:11205", "topic:11194", "topic:11199", "topic:11185",
"topic:11172", "topic:11195", "topic:10626", "topic:11196", "topic:11012",
"topic:11091", "topic:11163", "topic:11168", "topic:10385", "topic:11188",
"topic:11073", "topic:11184", "topic:11178", "topic:6742", "topic:6622",
"topic:11182", "topic:11174", "topic:11181", "topic:11170", "topic:11154",
"topic:11167", "topic:11176", "topic:11175", "topic:11171", "topic:11135",
"topic:11160", "topic:11162", "topic:11128", "topic:9542", "topic:11166",
"topic:11161", "topic:11137", "topic:11164", "topic:11148", "topic:10743",
"topic:11080", "topic:11146", "topic:10721", "topic:11155", "topic:11158",
"topic:11157", "topic:11131", "topic:11156", "topic:11152", "topic:11150",
"topic:11149", "topic:11147", "topic:11145", "topic:11093", "topic:11143",
"topic:10872", "topic:11144", "topic:3716", "topic:11142", "topic:11056",
"topic:11141", "topic:11139", "topic:11121", "topic:11138", "topic:11136",
"topic:11126", "topic:11124", "topic:11129", "topic:11132", "topic:11130",
"topic:3320", "topic:11127", "topic:11113", "topic:11118", "topic:11116",
"topic:11117", "topic:11115", "topic:11041", "topic:10981", "topic:11110",
"topic:11111", "topic:11112", "topic:11108", "topic:11100", "topic:11005",
"topic:11079", "topic:11029", "topic:8798", "topic:11104", "topic:11054",
"topic:10362", "topic:10533", "topic:11106", "topic:10734", "topic:11057",
"topic:11098", "topic:11102", "topic:11101", "topic:11099", "topic:6552",
"topic:10991", "topic:11033", "topic:11094", "topic:11075", "topic:11089",
"topic:11084", "topic:10955", "topic:5911", "topic:11090", "topic:11088",
"topic:11087", "topic:10954", "topic:11067", "topic:11037", "topic:10792",
"topic:11078", "topic:11077", "topic:11025", "topic:11003", "topic:9804",
"topic:11069", "topic:11063", "topic:11074", "topic:10632", "topic:10878",
"topic:11071", "topic:11050", "topic:11023", "topic:11022", "topic:11072",
"topic:11065", "topic:11070", "topic:11062", "topic:11061", "topic:1923",
"topic:10983", "topic:11068", "topic:11066", "topic:3484", "topic:10963",
"topic:11055", "topic:10949", "topic:11059" ] } } planSummary: IXSCAN {
_key: 1, score: -1 } cursorid:30595555047 ntoreturn:0 ntoskip:0
nscanned:162 nscannedObjects:101 keyUpdates:0 numYields:3 locks(micros)
r:322748 nreturned:101 reslen:4807 331ms
And
2014-10-14T00:28:36.878-0400 [conn1712] warning: log line attempted (57k)
{ _key: { $in: [ "notifications:post:196440:uid:1215",
"notifications:post:196440:uid:1609", "notifications:post:196440:uid:2217",
"notifications:post:196440:uid:891", "notifications:post:196440:uid:1534",
"notifications:post:196440:uid:673", "notifications:post:196440:uid:1302",
"notifications:post:196440:uid:766", "notifications:post:196440:uid:221",
"notifications:post:196440:uid:1067", "notifications:post:196440:uid:2226",
"notifications:post:196440:uid:817", "notifications:post:196440:uid:1477",
"notifications:post:196440:uid:1963",
"notifications:tid:10294:pid:196969:uid:74",
"notifications:post:196440:uid:1661", "notifications:post:196440:uid:361",
"notifications:post:196440:uid:1255",
"notifications:tid:10223:pid:196989:uid:1197",
"notifications:tid:10223:pid:196998:uid:1197",
"notifications:post:196440:uid:236",
"notifications:tid:10223:pid:197019:uid:1197",
"notifications:post:196440:uid:1274",
"notifications:tid:10223:pid:197023:uid:1197",
"notifications:post:196440:uid:1108", "notifications:post:196440:uid:1508",
"notifications:post:196440:uid:1028",
"notifications:tid:10296:pid:197068:uid:361",
"notifications:post:196440:uid:2319", "notifications:post:196440:uid:4390",
"notifications:post:196440:uid:3231", "notifications:post:196440:uid:1428",
"notifications:tid:9621:pid:197108:uid:956",
"notifications:post:196440:uid:3095", "notifications:post:196440:uid:1618",
"notifications:follow:7:uid:3095",
"notifications:tid:10299:pid:197130:uid:3035",
"notifications:post:196440:uid:2539", "notifications:post:196440:uid:1058",
"notifications:post:196440:uid:670", "notifications:post:196440:uid:1509",
"notifications:post:196440:uid:628",
"notifications:tid:10223:pid:197218:uid:1197",
"notifications:post:196440:uid:2048", "notifications:post:196440:uid:3286",
"notifications:post_flag:196446:uid:3095",
"notifications:post:196440:uid:4357", "notifications:post:196440:uid:1665",
"notifications:post:196440:uid:3035", "notifications:post:196440:uid:1807",
"notifications:post:196440:uid:1411", "notifications:post:196440:uid:1535",
"notifications:post:196440:uid:1054",
"notifications:tid:10315:pid:197408:uid:139",
"notifications:tid:10320:pid:197416:uid:1325",
"notifications:post:196440:uid:2612", "notifications:post:197467:uid:1325",
"notifications:post:197467:uid:1637", "notifications:post:197467:uid:1963",
"notifications:post:197467:uid:1040", "notifications:post:197467:uid:1072",
"notifications:post:197467:uid:817", "notifications:post:197467:uid:1349",
"notifications:post:197467:uid:1067", "notifications:post:197467:uid:1373",
"notifications:post:197467:uid:4179", "notifications:post:197467:uid:691",
"notifications:tid:9929:pid:197513:uid:2306",
"notifications:post:197467:uid:1508", "notifications:post:197467:uid:3038",
"notifications:post:197467:uid:3619", "notifications:post:197467:uid:139",
"notifications:post:197467:uid:3010", "notifications:follow:7:uid:4297",
"notifications:post:197467:uid:3231", "notifications:post:197467:uid:3436",
"notifications:post:197467:uid:74", "notifications:post:197467:uid:1274",
"notifications:post:197467:uid:925", "notifications:post:197467:uid:2244",
"notifications:post:197467:uid:1345", "notifications:post:197467:uid:2972",
"notifications:post:196440:uid:830", "notifications:post:197467:uid:45",
"notifications:post:197467:uid:830", "notifications:post:196440:uid:2568",
"notifications:post:1974 .......... post:222550:uid:4253",
"notifications:post:222544:uid:4253", "notifications:post:217836:uid:2558",
"notifications:post:217791:uid:3286", "notifications:post:222597:uid:836",
"notifications:post:222597:uid:1830", "notifications:post:222597:uid:4253",
"notifications:post:222544:uid:1878", "notifications:post:222597:uid:1770",
"notifications:post:222594:uid:2294", "notifications:post:222544:uid:2795",
"notifications:post:222549:uid:2795", "notifications:post:222594:uid:2795",
"notifications:post:222597:uid:1195", "notifications:post:222594:uid:2317",
"notifications:post:222614:uid:2795", "notifications:post:222614:uid:94",
"notifications:post:222614:uid:250", "notifications:post:222614:uid:1696",
"notifications:post:222614:uid:4570",
"notifications:post_flag:222655:uid:1770",
"notifications:post:222614:uid:3661", "notifications:post:222614:uid:999",
"notifications:post:222614:uid:468", "notifications:post:222549:uid:1611",
"notifications:post:222597:uid:1611", "notifications:post:222738:uid:468",
"notifications:post:222738:uid:1246", "notifications:post:222738:uid:1477",
"notifications:post:222682:uid:45", "notifications:post:222738:uid:45",
"notifications:follow:7:uid:4644", "notifications:post:222614:uid:1611",
"notifications:follow:7:uid:4645", "notifications:post:222614:uid:1731",
"notifications:post:222682:uid:593", "notifications:post:222614:uid:593",
"notifications:post:222614:uid:591", "notifications:post:222614:uid:3918",
"notifications:tid:2942:pid:222777:uid:1871",
"notifications:post:222692:uid:1936", "notifications:post:222614:uid:1411",
"notifications:tid:11186:pid:222834:uid:512",
"notifications:post:222682:uid:3327", "notifications:post:222614:uid:1971",
"notifications:post:222692:uid:3883", "notifications:post:217474:uid:94",
"notifications:tid:11076:pid:222922:uid:94",
"notifications:tid:11076:pid:222927:uid:2244",
"notifications:post_flag:222238:uid:1325",
"notifications:post:222614:uid:1928", "notifications:post:222614:uid:1217",
"notifications:tid:11076:pid:223004:uid:4647",
"notifications:post_flag:223004:uid:558",
"notifications:post:222682:uid:2015",
"notifications:post_flag:222998:uid:16",
"notifications:post_flag:222994:uid:16",
"notifications:post:223018:uid:983", "notifications:post:223018:uid:1411",
"notifications:post:223018:uid:1530",
"notifications:post_flag:223004:uid:2244",
"notifications:post:223018:uid:1878", "notifications:post:223018:uid:1611",
"notifications:tid:11076:pid:223059:uid:3537",
"notifications:tid:11076:pid:223068:uid:1876",
"notifications:post:223018:uid:94", "notifications:post:223018:uid:1061",
"notifications:tid:11076:pid:223095:uid:1876",
"notifications:post:223018:uid:147", "notifications:post:223018:uid:1217",
"notifications:post:222614:uid:1058", "notifications:post:223018:uid:691",
"notifications:post:223018:uid:2795", "notifications:post:223018:uid:591",
"notifications:post:223018:uid:3918", "notifications:post:223018:uid:2319",
"notifications:post:223018:uid:1441", "notifications:post:223141:uid:1058",
"notifications:post:223018:uid:2458", "notifications:post:223018:uid:1058",
"notifications:post:223018:uid:1936",
"notifications:post_flag:223167:uid:350",
"notifications:post:223018:uid:593" ] } } planSummary: IXSCAN { _key: 1,
score: -1 } cursorid:30669589930 ntoreturn:0 ntoskip:0 nscanned:129
nscannedObjects:101 keyUpdates:0 numYields:2 locks(micros) r:232144
nreturned:101 reslen:160463 220ms
Post by Rob Moore
That is another case where the operation ran quickly once it got the lock
(10,044 microseconds ≈ 10ms) but the total time for the operation was large
(491ms). That implies that the operation waited about 481ms to obtain the
write lock.
To be honest, even the 10ms is a bit long for a simple set and index update.
Are the _key fields fairly unique, e.g., are there on the order of a million
unique _key values? I am wondering if the score values are not as unique as
I would have expected and there is a large group of entries with the same
_key/score index value. When MongoDB goes to update that index it ends up
having to scan a large number of documents to find the actual record to
update.
Back to the graph: can you zoom in on one of the periods that has a large
spike in the operation time and look for an operation that is the outlier
in the graph? The "bad" operation could be a read, write, or even a count.
I think there is something that is grabbing and not releasing the lock and
that is what is blocking all of the updates, or only allowing a relatively
small number of writes to run per unit of time.
Rob.
Post by Barış Soner Uşaklı
<https://lh4.googleusercontent.com/-1lU_13d_rdw/VDyyS28MEnI/AAAAAAAAAMU/GNmdVf52eaE/s1600/Capture.PNG>
So this seems to show something useful I think.
The query 2014-10-14T00:20:30.317-0400 [conn1574] update 0.objects
query: { _key: "uid:2848:tids_read", value: "11343" } update: { $set: {
score: 1413260429824.0 } } nscanned:1 nscannedObjects:1 nMatched:1
nModified:1 keyUpdates:1 numYields:0 locks(micros) w:10044 491ms
is called many times, which I think causes the lockup.
Rob Moore
2014-10-15 01:00:39 UTC
Permalink
Thanks for the mlogvis file. It helped.

I am not an expert in reading these graphs, but here is a screenshot with
it zoomed in on one of the periods where operations are slow. Notice how the
length of time to complete operations increases with time?

<https://lh5.googleusercontent.com/-MX7IVYoD920/VD3BOgm91jI/AAAAAAAADbc/T_v1o4lUsXY/s1600/slowdown.png>
At the end of the slow period are 3 green dots. Those are all "killcursor"
commands, which mark the end of queries. I then looked at the same period of
time, this time focusing on the yield time:

<https://lh6.googleusercontent.com/-J8T1D3XXOGw/VD3EAgwGJXI/AAAAAAAADbo/CuBpy6x0h2k/s1600/slowdown2.png>


All of the operations are counts with the longest being the one shown:

2014-10-14T00:11:24.906-0400 [conn1623] command 0.$cmd command: count {
count: "objects", query: { _key: "categories:2:tid" }, fields: null }
planSummary: COUNT { _key: 1, score: -1 } keyUpdates:0 numYields:42
locks(micros) r:13361 reslen:48 509ms
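
That log line corresponds to an application-level call along these lines
(shown in mongo-shell form as a sketch, not NodeBB's exact code):

db.objects.count({ _key: "categories:2:tid" })  // walks every index entry for this _key, so it slows down as the category grows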


I think the combination of the counts and the large $in queries you posted
is blocking the writes and causing the slowdowns.

What are the queries and counts used for? Can you make sure that only one
of them is running at a time? Have you considered using a replica set and
moving these expensive queries to a secondary?
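
If a replica set is introduced, a minimal mongo-shell sketch of routing the
heavy reads to a secondary would look like this (illustration only; it
assumes secondaries exist, which the current single-instance setup does not
have):

db.getMongo().setReadPref('secondaryPreferred')  // this shell session now prefers secondaries
db.objects.count({ _key: "categories:2:tid" })   // runs on a secondary, away from the primary's write lock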

Rob
Barış Soner Uşaklı
2014-10-15 01:39:53 UTC
Permalink
The count is used to get the number of topics inside a category; for
example, each topic id (tid) is added to categories:2:tid. So documents look
like this.

{
    _id: ...,
    _key: 'categories:2:tid',
    value: '5',
    score: 1413336469123
},
{
    _id: ...,
    _key: 'categories:2:tid',
    value: '6',
    score: 1413336469123
}

In this case category 2 has 5000 topics, so there are 5000 documents with
_key = 'categories:2:tid'; value is unique for every
_key = 'categories:2:tid', and score is just the timestamp of the topic's
creation time. I guess the count is getting slower since it scans all the
documents with that key. I might have to change it so that I store the
topic count in the app instead of counting it every time.

I have already limited the number of keys I pass into the $in operator and
that seemed to help too.
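
The batching looks roughly like this (a sketch only — the helper name and
batch size are assumptions, not the actual NodeBB code):

function findByKeys(collection, keys, batchSize, callback) {
    // Split one huge $in into several smaller ones and run them in sequence.
    var batches = [];
    for (var i = 0; i < keys.length; i += batchSize) {
        batches.push(keys.slice(i, i + batchSize));
    }
    var results = [];
    (function next(idx) {
        if (idx === batches.length) {
            return callback(null, results);
        }
        collection.find({ _key: { $in: batches[idx] } }).toArray(function (err, docs) {
            if (err) {
                return callback(err);
            }
            results = results.concat(docs);
            next(idx + 1);
        });
    })(0);
}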

The problem comes from the fact that most of these operations are trying to
mimic redis, where certain operations are much faster. For example, that
count is actually O(1) in redis. The actual source code is here if anyone is
interested: https://github.com/NodeBB/NodeBB/blob/master/src/database/mongo/sorted.js.
The above `categories:2:tid` documents simulate a sorted set in redis.
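
For illustration, the mapping for the slow update from the earlier log is
roughly the following (a sketch; the upsert flag is my assumption about what
sorted.js does, not a quote of it):

// Redis sorted set:  ZADD uid:2848:tids_read 1413260429824 11343
// MongoDB version used here: one document per set member, addressed by _key/value.
db.objects.update(
    { _key: 'uid:2848:tids_read', value: '11343' },
    { $set: { score: 1413260429824 } },
    { upsert: true }
)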
Rob Moore
2014-10-15 17:11:05 UTC
Permalink
I wonder if making the _key / value index "unique" will help the count
since it will then know it does not need to dedup all of the documents it
finds via the B-Tree.
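
For reference, a minimal mongo-shell sketch of that suggestion (2.6-era
syntax; the rollout caveats in the comments are assumptions):

// A unique build fails if any duplicate _key/value pair already exists,
// and dropping/rebuilding leaves a window with no index on these fields.
db.objects.dropIndex({ _key: 1, value: -1 })
db.objects.ensureIndex({ _key: 1, value: -1 }, { unique: true, background: true })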

Rob
Post by Barış Soner Uşaklı
The count is used to get the number of topics inside a category, for
example each topic id(tid) is added to categories:2:tid. So documents look
like this.
{
_key: 'categories:2:tid'
value: '5'
score: 1413336469123
},
{
_key: 'categories:2:tid',
value: '6',
score: 1413336469123
}
In this case category 2 has 5000 topics so there are 5000 documents with
_key = categories:2:tid, value is unique for every
_key='categories:2:tid'. And score is just the timestamp of the topic
creation time. I guess count is getting slower since it scans all the
documents with that key. I might have to change it so that I store the
topic count in the app instead of counting it all the time.
I have already limited the number of keys I pass into the $in operator and
that seemed to help too.
The problem comes from the fact that most of these operations are trying
to mimic redis where certain operations are much faster. For example that
count is actually O(1) in redis. The actual source code is here if anyone
is interested
https://github.com/NodeBB/NodeBB/blob/master/src/database/mongo/sorted.js.
The above `categories:2:cid` documents simulate a sorted set in redis.
Post by Rob Moore
Thanks for the mongovis file. It helped.
I am not an expert in reading these graphs but here is a screen shot with
it zoomed in on one of the periods were operations are slow. Notice how the
length of time to complete operations increases with time?
<https://lh5.googleusercontent.com/-MX7IVYoD920/VD3BOgm91jI/AAAAAAAADbc/T_v1o4lUsXY/s1600/slowdown.png>
At the end of the slow period are 3 green dots. Those are all
"killcursor" commands which represent the end of queries. I then looked at
<https://lh6.googleusercontent.com/-J8T1D3XXOGw/VD3EAgwGJXI/AAAAAAAADbo/CuBpy6x0h2k/s1600/slowdown2.png>
2014-10-14T00:11:24.906-0400 [conn1623] command 0.$cmd command: count {
count: "objects", query: { _key: "categories:2:tid" }, fields: null }
planSummary: COUNT { _key: 1, score: -1 } keyUpdates:0 numYields:42
locks(micros) r:13361 reslen:48 509ms
I think the combination of the counts and the large $in queries you
posted are blocking the writes and causing the slow downs.
What are the queries and counts used for? Can you make sure that only one
of them is running at a time? Have you considered using a replica set and
moving these expensive queries to a secondary?
Rob
Post by Barış Soner Uşaklı
Attached the mlogvis file
Post by Barış Soner Uşaklı
In case of this query { _key: "uid:2848:tids_read", value: "11343" }
update: { $set: { score: 1413260429824.0 } }
Out of 2.9 million documents there are 60 documents with
_key:"uid:2848:tids_read" and it can grow as the user reads more topics.
11343 is the topic id so that is also not unique in the global scheme of
this but together with _key the _key/value pair is unique in the
collection. Although I didn't create a unique compound index on _key/value
maybe I should.
I zoomed into the graph and there seems to be some large read queryies
that use $in. Here are two that have the largest duration.
{ $in: [ "topic:11367", "topic:11240", "topic:11376", "topic:11313",
"topic:11375", "topic:11374", "topic:11370", "topic:11323", "topic:11357",
"topic:11373", "topic:11105", "topic:11092", "topic:10843", "topic:11307",
"topic:11363", "topic:11368", "topic:11365", "topic:11362", "topic:11350",
"topic:11371", "topic:8439", "topic:11351", "topic:11359", "topic:11361",
"topic:11369", "topic:11304", "topic:11334", "topic:11366", "topic:11356",
"topic:11076", "topic:4890", "topic:11364", "topic:11277", "topic:11283",
"topic:11342", "topic:11260", "topic:11186", "topic:11346", "topic:11360",
"topic:11358", "topic:11203", "topic:11285", "topic:11354", "topic:11287",
"topic:5593", "topic:11353", "topic:10929", "topic:11280", "topic:11330",
"topic:11309", "topic:11347", "topic:11355", "topic:11096", "topic:11349",
"topic:11352", "topic:11348", "topic:11198", "topic:11344", "topic:11327",
"topic:10784", "topic:11252", "topic:11341", "topic:11339", "topic:11340",
"topic:11336", "topic:11039", "topic:11335", "topic:10993", "topic:11338",
"topic:11337", "topic:11314", "topic:11324", "topic:11329", "topic:11306",
"topic:2942", "topic:11333", "topic:9831", "topic:11318", "topic:10431",
"topic:11332", "topic:11325", "topic:11328", "topic:11159", "topic:11190",
"topic:11233", "topic:10667", "topic:11326", "topic:11242", "topic:11303",
"topic:7536", "topic:11191", "topic:8695", "topic:11315", "topic:11321",
"topic:5550", "topic:11322", "topic:10144", "topic:11320", "topic:11319",
"topic:10069", "topic:11024", "topic:11312", "topic:11140", "topic:10816",
"topic:11223", "topic:11317", "topic:10984", "topic:11316", "topic:10455",
"topic:10211", "topic:11298", "topic:11238", "topic:11249", "topic:11310",
"topic:11250", "topic:11224", "topic:11028", "topic:11048", "topic:11284",
"topic:11300", "topic:11278", "topic:11212", "topic:11305", "topic:11295",
"topic:4867", "topic:7877", "topic:11259", "topic:11120", "topic:11114",
"topic:11279", "topic:11302", "topic:8646", "topic:11301", "topic:11125",
"topic:11290", "topic:11297", "topic:11294", "topic:11085", "topic:11299",
"topic:9943", "topic:11293", "topic:10992", "topic:11103", "topic:11291",
"topic:11228", "topic:10524", "topic:11209", "topic:11266", "topic:10383",
"topic:11289", "topic:11292", "topic:11133", "topic:11286", "topic:11288",
"topic:11282", "topic:8872", "topic:7890", "topic:11272", "topic:9588",
"topic:11276", "topic:11281", "topic:7937", "topic:11256", "topic:11095",
"topic:11183", "topic:10135", "topic:11097", "topic:11271", "topic:11134",
"topic:11151", "topic:11274", "topic:11275", "topic:11027", "topic:11227",
"topic:11270", "topic:11179", "topic:11107", "topic:11222", "topic:4059",
"topic:11267", "topic:11273", "topic:10557", "topic:11269", "topic:11264",
"topic:11268", "topic:11064", "topic:5580", "topic:11263", "topic:11262",
"topic:11234", "topic:11257", "topic:11189", "topic:11248", "topic:11258",
"topic:11232", "topic:11193", "topic:11208", "topic:9525", "topic:11245",
"topic:11251", "topic:7730", "topic:11153", "topic:11254", "topic:11219",
"topic:11244", "topic:11255", "topic:11253", "topic:11247", "topic:11200",
"topic:11230", "topic:11243", "topic:11042", "topic:11197", "topic:11226",
"topic:11239", "topic:11235", "topic:11236", "topic:11187", "topic:10659",
"topic:11210", "topic:11231", "topic:11165", "topic:10762", "topic:11225",
"topic:11229", "topic:11204", "topic:11206", "topic:9433", "topic:10592",
"topic:11218", "topic:11213", "topic:11221", "topic:11201", "topic:11214",
"topic:11215", "topic:11216", "topic:11211", "topic:10916", "topic:4400",
"topic:11217", "topic:11192", "topic:9509", "topic:11207", "topic:9545",
"topic:11177", "topic:7611", "topic:11109", "topic:11045", "topic:11123",
"topic:11202", "topic:11205", "topic:11194", "topic:11199", "topic:11185",
"topic:11172", "topic:11195", "topic:10626", "topic:11196", "topic:11012",
"topic:11091", "topic:11163", "topic:11168", "topic:10385", "topic:11188",
"topic:11073", "topic:11184", "topic:11178", "topic:6742", "topic:6622",
"topic:11182", "topic:11174", "topic:11181", "topic:11170", "topic:11154",
"topic:11167", "topic:11176", "topic:11175", "topic:11171", "topic:11135",
"topic:11160", "topic:11162", "topic:11128", "topic:9542", "topic:11166",
"topic:11161", "topic:11137", "topic:11164", "topic:11148", "topic:10743",
"topic:11080", "topic:11146", "topic:10721", "topic:11155", "topic:11158",
"topic:11157", "topic:11131", "topic:11156", "topic:11152", "topic:11150",
"topic:11149", "topic:11147", "topic:11145", "topic:11093", "topic:11143",
"topic:10872", "topic:11144", "topic:3716", "topic:11142", "topic:11056",
"topic:11141", "topic:11139", "topic:11121", "topic:11138", "topic:11136",
"topic:11126", "topic:11124", "topic:11129", "topic:11132", "topic:11130",
"topic:3320", "topic:11127", "topic:11113", "topic:11118", "topic:11116",
"topic:11117", "topic:11115", "topic:11041", "topic:10981", "topic:11110",
"topic:11111", "topic:11112", "topic:11108", "topic:11100", "topic:11005",
"topic:11079", "topic:11029", "topic:8798", "topic:11104", "topic:11054",
"topic:10362", "topic:10533", "topic:11106", "topic:10734", "topic:11057",
"topic:11098", "topic:11102", "topic:11101", "topic:11099", "topic:6552",
"topic:10991", "topic:11033", "topic:11094", "topic:11075", "topic:11089",
"topic:11084", "topic:10955", "topic:5911", "topic:11090", "topic:11088",
"topic:11087", "topic:10954", "topic:11067", "topic:11037", "topic:10792",
"topic:11078", "topic:11077", "topic:11025", "topic:11003", "topic:9804",
"topic:11069", "topic:11063", "topic:11074", "topic:10632", "topic:10878",
"topic:11071", "topic:11050", "topic:11023", "topic:11022", "topic:11072",
"topic:11065", "topic:11070", "topic:11062", "topic:11061", "topic:1923",
"topic:10983", "topic:11068", "topic:11066", "topic:3484", "topic:10963",
"topic:11055", "topic:10949", "topic:11059" ] } } planSummary: IXSCAN {
_key: 1, score: -1 } cursorid:30595555047 ntoreturn:0 ntoskip:0
nscanned:162 nscannedObjects:101 keyUpdates:0 numYields:3 locks(micros)
r:322748 nreturned:101 reslen:4807 331ms
And
2014-10-14T00:28:36.878-0400 [conn1712] warning: log line attempted
(57k) over max size (10k), printing beginning and end ... query 0.objects
query: { _key: { $in: [ "notifications:post:196440:uid:1215",
"notifications:post:196440:uid:1609", "notifications:post:196440:uid:2217",
"notifications:post:196440:uid:891", "notifications:post:196440:uid:1534",
"notifications:post:196440:uid:673", "notifications:post:196440:uid:1302",
"notifications:post:196440:uid:766", "notifications:post:196440:uid:221",
"notifications:post:196440:uid:1067", "notifications:post:196440:uid:2226",
"notifications:post:196440:uid:817", "notifications:post:196440:uid:1477",
"notifications:post:196440:uid:1963",
"notifications:tid:10294:pid:196969:uid:74",
"notifications:post:196440:uid:1661", "notifications:post:196440:uid:361",
"notifications:post:196440:uid:1255",
"notifications:tid:10223:pid:196989:uid:1197",
"notifications:tid:10223:pid:196998:uid:1197",
"notifications:post:196440:uid:236",
"notifications:tid:10223:pid:197019:uid:1197",
"notifications:post:196440:uid:1274",
"notifications:tid:10223:pid:197023:uid:1197",
"notifications:post:196440:uid:1108", "notifications:post:196440:uid:1508",
"notifications:post:196440:uid:1028",
"notifications:tid:10296:pid:197068:uid:361",
"notifications:post:196440:uid:2319", "notifications:post:196440:uid:4390",
"notifications:post:196440:uid:3231", "notifications:post:196440:uid:1428",
"notifications:tid:9621:pid:197108:uid:956",
"notifications:post:196440:uid:3095", "notifications:post:196440:uid:1618",
"notifications:follow:7:uid:3095",
"notifications:tid:10299:pid:197130:uid:3035",
"notifications:post:196440:uid:2539", "notifications:post:196440:uid:1058",
"notifications:post:196440:uid:670", "notifications:post:196440:uid:1509",
"notifications:post:196440:uid:628",
"notifications:tid:10223:pid:197218:uid:1197",
"notifications:post:196440:uid:2048", "notifications:post:196440:uid:3286",
"notifications:post_flag:196446:uid:3095",
"notifications:post:196440:uid:4357", "notifications:post:196440:uid:1665",
"notifications:post:196440:uid:3035", "notifications:post:196440:uid:1807",
"notifications:post:196440:uid:1411", "notifications:post:196440:uid:1535",
"notifications:post:196440:uid:1054",
"notifications:tid:10315:pid:197408:uid:139",
"notifications:tid:10320:pid:197416:uid:1325",
"notifications:post:196440:uid:2612", "notifications:post:197467:uid:1325",
"notifications:post:197467:uid:1637", "notifications:post:197467:uid:1963",
"notifications:post:197467:uid:1040", "notifications:post:197467:uid:1072",
"notifications:post:197467:uid:817", "notifications:post:197467:uid:1349",
"notifications:post:197467:uid:1067", "notifications:post:197467:uid:1373",
"notifications:post:197467:uid:4179", "notifications:post:197467:uid:691",
"notifications:tid:9929:pid:197513:uid:2306",
"notifications:post:197467:uid:1508", "notifications:post:197467:uid:3038",
"notifications:post:197467:uid:3619", "notifications:post:197467:uid:139",
"notifications:post:197467:uid:3010", "notifications:follow:7:uid:4297",
"notifications:post:197467:uid:3231", "notifications:post:197467:uid:3436",
"notifications:post:197467:uid:74", "notifications:post:197467:uid:1274",
"notifications:post:197467:uid:925", "notifications:post:197467:uid:2244",
"notifications:post:197467:uid:1345", "notifications:post:197467:uid:2972",
"notifications:post:196440:uid:830", "notifications:post:197467:uid:45",
"notifications:post:197467:uid:830", "notifications:post:196440:uid:2568",
"notifications:post:1974 .......... post:222550:uid:4253",
"notifications:post:222544:uid:4253", "notifications:post:217836:uid:2558",
"notifications:post:217791:uid:3286", "notifications:post:222597:uid:836",
"notifications:post:222597:uid:1830", "notifications:post:222597:uid:4253",
"notifications:post:222544:uid:1878", "notifications:post:222597:uid:1770",
"notifications:post:222594:uid:2294", "notifications:post:222544:uid:2795",
"notifications:post:222549:uid:2795", "notifications:post:222594:uid:2795",
"notifications:post:222597:uid:1195", "notifications:post:222594:uid:2317",
"notifications:post:222614:uid:2795", "notifications:post:222614:uid:94",
"notifications:post:222614:uid:250", "notifications:post:222614:uid:1696",
"notifications:post:222614:uid:4570",
"notifications:post_flag:222655:uid:1770",
"notifications:post:222614:uid:3661", "notifications:post:222614:uid:999",
"notifications:post:222614:uid:468", "notifications:post:222549:uid:1611",
"notifications:post:222597:uid:1611", "notifications:post:222738:uid:468",
"notifications:post:222738:uid:1246", "notifications:post:222738:uid:1477",
"notifications:post:222682:uid:45", "notifications:post:222738:uid:45",
"notifications:follow:7:uid:4644", "notifications:post:222614:uid:1611",
"notifications:follow:7:uid:4645", "notifications:post:222614:uid:1731",
"notifications:post:222682:uid:593", "notifications:post:222614:uid:593",
"notifications:post:222614:uid:591", "notifications:post:222614:uid:3918",
"notifications:tid:2942:pid:222777:uid:1871",
"notifications:post:222692:uid:1936", "notifications:post:222614:uid:1411",
"notifications:tid:11186:pid:222834:uid:512",
"notifications:post:222682:uid:3327", "notifications:post:222614:uid:1971",
"notifications:post:222692:uid:3883", "notifications:post:217474:uid:94",
"notifications:tid:11076:pid:222922:uid:94",
"notifications:tid:11076:pid:222927:uid:2244",
"notifications:post_flag:222238:uid:1325",
"notifications:post:222614:uid:1928", "notifications:post:222614:uid:1217",
"notifications:tid:11076:pid:223004:uid:4647",
"notifications:post_flag:223004:uid:558",
"notifications:post:222682:uid:2015",
"notifications:post_flag:222998:uid:16",
"notifications:post_flag:222994:uid:16",
"notifications:post:223018:uid:983", "notifications:post:223018:uid:1411",
"notifications:post:223018:uid:1530",
"notifications:post_flag:223004:uid:2244",
"notifications:post:223018:uid:1878", "notifications:post:223018:uid:1611",
"notifications:tid:11076:pid:223059:uid:3537",
"notifications:tid:11076:pid:223068:uid:1876",
"notifications:post:223018:uid:94", "notifications:post:223018:uid:1061",
"notifications:tid:11076:pid:223095:uid:1876",
"notifications:post:223018:uid:147", "notifications:post:223018:uid:1217",
"notifications:post:222614:uid:1058", "notifications:post:223018:uid:691",
"notifications:post:223018:uid:2795", "notifications:post:223018:uid:591",
"notifications:post:223018:uid:3918", "notifications:post:223018:uid:2319",
"notifications:post:223018:uid:1441", "notifications:post:223141:uid:1058",
"notifications:post:223018:uid:2458", "notifications:post:223018:uid:1058",
"notifications:post:223018:uid:1936",
"notifications:post_flag:223167:uid:350",
"notifications:post:223018:uid:593" ] } } planSummary: IXSCAN { _key: 1,
score: -1 } cursorid:30669589930 ntoreturn:0 ntoskip:0 nscanned:129
nscannedObjects:101 keyUpdates:0 numYields:2 locks(micros) r:232144
nreturned:101 reslen:160463 220ms
Post by Rob Moore
That is another case were the operation ran quickly once it got the
lock (10044 nanoseconds = 10ms) but the total time for the operation was
large (491ms). That implies that the operation waited about 481ms to obtain
the write lock.
To be honest even the 10ms a bit long for a simple set and index
update. Are the _key fields fairly unique e.g., are there on the order of a
million unique _key values? I am wondering if the score values are not as
unique as I would have expected and there is a large group of entries with
the same _key/score index value. When MongoDB goes to update that index it
ends up having to scan a large number of documents to find the actual
record to update.
Back to the graph: Can you zoom in on one of the periods that has a
large spike in the operation time and look for an operation that is the
outlier in the graph. The "bad" operation could be a read, write, or even
a count. I think there is something that is grabbing and not releasing the
lock and that is what is blocking all of the updates, or only allowing a
relatively small number of writes to run per unit of time.
Rob.
On Tuesday, October 14, 2014 1:25:37 AM UTC-4, Barış Soner Uşaklı
Post by Barış Soner Uşaklı
<https://lh4.googleusercontent.com/-1lU_13d_rdw/VDyyS28MEnI/AAAAAAAAAMU/GNmdVf52eaE/s1600/Capture.PNG>
So this seems to show something useful I think.
The query 2014-10-14T00:20:30.317-0400 [conn1574] update 0.objects
query: { _key: "uid:2848:tids_read", value: "11343" } update: { $set: {
score: 1413260429824.0 } } nscanned:1 nscannedObjects:1 nMatched:1
nModified:1 keyUpdates:1 numYields:0 locks(micros) w:10044 491ms
is called many times, which I think causes the lockup.
Barış Soner Uşaklı
2014-10-16 16:35:11 UTC
Permalink
I haven't tried making the _key / value index unique, as I read it might
make inserts/upserts slower; I'm not sure how true that is.
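If I do end up trying it, I'd expect it to look something like this (a
minimal sketch in the mongo shell, assuming the existing _key/value index is
{ _key: 1, value: -1 } and that no duplicate _key/value pairs are left in the
collection; untested on my data):

// sketch: rebuild the _key/value compound index as unique; the build will
// fail if any duplicate _key/value pairs still exist
db.objects.dropIndex({ _key: 1, value: -1 });
db.objects.ensureIndex({ _key: 1, value: -1 }, { unique: true, background: true });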

However, I got rid of most of the counts and started using a field that I
maintain in the app. I.e., instead of doing
db.objects.count({"_key":"category:2:tids"}); to find the number of topics
in a category, I just increment a field when a topic is posted and decrement
it when a topic is deleted.
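Roughly what that looks like (an illustrative sketch; the topic_count field
name and the category key here are just examples, not the actual NodeBB
schema):

// on topic create: bump the per-category counter (upsert in case it doesn't exist yet)
db.objects.update({ _key: "category:2" }, { $inc: { topic_count: 1 } }, { upsert: true });
// on topic delete: decrement it again
db.objects.update({ _key: "category:2" }, { $inc: { topic_count: -1 } });
// reading the count is now a single indexed lookup instead of a scan over
// every document that shares the _key
db.objects.findOne({ _key: "category:2" }, { topic_count: 1 });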

I also changed some of the code that was firing lots of updates at the same
time so that those updates are spread out over time. After all these small
changes I am not seeing the long read queues any more, and the mlogvis
output looks better as well, although there is still some room for improvement.
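The spreading out is nothing fancy, just something along these lines (a
rough sketch, assuming db is an already-connected node.js driver handle; the
queue and the 50ms interval are illustrative, not the actual code):

var updateQueue = [];

function queueScoreUpdate(key, value, score) {
    updateQueue.push({ key: key, value: value, score: score });
}

// drain at most ~20 updates per second instead of letting them all fire in a burst
setInterval(function () {
    var item = updateQueue.shift();
    if (!item) { return; }
    db.collection('objects').update(
        { _key: item.key, value: item.value },
        { $set: { score: item.score } },
        function (err) { if (err) { console.error(err); } }
    );
}, 50);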

Thanks for the help so far! mlogvis has been really useful for diagnosing
the mongod.log.
Post by Rob Moore
I wonder if making the _key / value index "unique" will help the count
since it will then know it does not need to dedup all of the documents it
finds via the B-Tree.
Rob
Post by Barış Soner Uşaklı
The count is used to get the number of topics inside a category; for
example, each topic id (tid) is added to categories:2:tid, so the documents
look like this.
{
_key: 'categories:2:tid',
value: '5',
score: 1413336469123
},
{
_key: 'categories:2:tid',
value: '6',
score: 1413336469123
}
In this case category 2 has 5000 topics, so there are 5000 documents with
_key = 'categories:2:tid', and value is unique for every such document.
score is just the timestamp of the topic's creation time. I guess the count
is getting slower because it scans all the documents with that key. I might
have to change it so that I store the topic count in the app instead of
counting it all the time.
I have already limited the number of keys I pass into the $in operator
and that seemed to help too.
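One way to keep each $in small is to chunk the key array, e.g. (a simplified
sketch, not the actual NodeBB code; the batch size of 100 and the helper name
are just examples, and db is assumed to be an open driver connection):

// run one find() per batch of keys instead of a single huge $in
function findByKeysInBatches(keys, batchSize, callback) {
    var results = [];
    (function next(start) {
        if (start >= keys.length) { return callback(null, results); }
        var batch = keys.slice(start, start + batchSize);
        db.collection('objects')
            .find({ _key: { $in: batch } })
            .toArray(function (err, docs) {
                if (err) { return callback(err); }
                results = results.concat(docs);
                next(start + batchSize);
            });
    })(0);
}

findByKeysInBatches(topicKeys, 100, function (err, docs) { /* ... */ });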
The problem comes from the fact that most of these operations are trying to
mimic redis, where certain operations are much faster; for example, that
count is actually O(1) in redis (ZCARD). The actual source code is here if
anyone is interested:
https://github.com/NodeBB/NodeBB/blob/master/src/database/mongo/sorted.js.
The above `categories:2:tid` documents simulate a sorted set in redis.
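The mapping is roughly this (an illustrative sketch of the idea only, not
the actual sorted.js implementation; db is assumed to be an open driver
connection):

// redis ZADD key score value -> upsert one document per (key, value) pair
function sortedSetAdd(key, score, value, callback) {
    db.collection('objects').update(
        { _key: key, value: value.toString() },
        { $set: { score: score } },
        { upsert: true },
        callback
    );
}

// redis ZCARD key -> count of documents sharing the _key; O(1) in redis but
// an index scan over every member here, which is what makes it expensive
function sortedSetCard(key, callback) {
    db.collection('objects').count({ _key: key }, callback);
}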
Post by Rob Moore
Thanks for the mongovis file. It helped.
I am not an expert in reading these graphs, but here is a screenshot
zoomed in on one of the periods where operations are slow. Notice
how the length of time to complete operations increases with time?
<https://lh5.googleusercontent.com/-MX7IVYoD920/VD3BOgm91jI/AAAAAAAADbc/T_v1o4lUsXY/s1600/slowdown.png>
At the end of the slow period are 3 green dots. Those are all
"killcursor" commands which represent the end of queries. I then looked at
<https://lh6.googleusercontent.com/-J8T1D3XXOGw/VD3EAgwGJXI/AAAAAAAADbo/CuBpy6x0h2k/s1600/slowdown2.png>
2014-10-14T00:11:24.906-0400 [conn1623] command 0.$cmd command: count {
count: "objects", query: { _key: "categories:2:tid" }, fields: null }
planSummary: COUNT { _key: 1, score: -1 } keyUpdates:0 numYields:42
locks(micros) r:13361 reslen:48 509ms
I think the combination of the counts and the large $in queries you
posted is blocking the writes and causing the slowdowns.
What are the queries and counts used for? Can you make sure that only
one of them is running at a time? Have you considered using a replica set
and moving these expensive queries to a secondary?
Rob
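For the replica set route the idea would be something like this (a sketch
only; it assumes the shell is connected to a replica set, and reads from a
secondary can be slightly stale):

// send the expensive count to a secondary so it does not compete with the
// update traffic for the primary's lock
db.getMongo().setReadPref("secondaryPreferred");
db.objects.count({ _key: "categories:2:tid" });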
Post by Barış Soner Uşaklı
Attached the mlogvis file
Post by Barış Soner Uşaklı
In the case of this query { _key: "uid:2848:tids_read", value: "11343" }
update: { $set: { score: 1413260429824.0 } }
Out of 2.9 million documents there are 60 documents with
_key:"uid:2848:tids_read", and that number can grow as the user reads more
topics. 11343 is the topic id, so that is also not unique in the global
scheme of things, but together with _key the _key/value pair is unique in
the collection. Although I didn't create a unique compound index on
_key/value, maybe I should.
I zoomed into the graph and there seem to be some large read queries
that use $in. Here are the two with the largest duration.
{ $in: [ "topic:11367", "topic:11240", "topic:11376", "topic:11313",
"topic:11375", "topic:11374", "topic:11370", "topic:11323", "topic:11357",
"topic:11373", "topic:11105", "topic:11092", "topic:10843", "topic:11307",
"topic:11363", "topic:11368", "topic:11365", "topic:11362", "topic:11350",
"topic:11371", "topic:8439", "topic:11351", "topic:11359", "topic:11361",
"topic:11369", "topic:11304", "topic:11334", "topic:11366", "topic:11356",
"topic:11076", "topic:4890", "topic:11364", "topic:11277", "topic:11283",
"topic:11342", "topic:11260", "topic:11186", "topic:11346", "topic:11360",
"topic:11358", "topic:11203", "topic:11285", "topic:11354", "topic:11287",
"topic:5593", "topic:11353", "topic:10929", "topic:11280", "topic:11330",
"topic:11309", "topic:11347", "topic:11355", "topic:11096", "topic:11349",
"topic:11352", "topic:11348", "topic:11198", "topic:11344", "topic:11327",
"topic:10784", "topic:11252", "topic:11341", "topic:11339", "topic:11340",
"topic:11336", "topic:11039", "topic:11335", "topic:10993", "topic:11338",
"topic:11337", "topic:11314", "topic:11324", "topic:11329", "topic:11306",
"topic:2942", "topic:11333", "topic:9831", "topic:11318", "topic:10431",
"topic:11332", "topic:11325", "topic:11328", "topic:11159", "topic:11190",
"topic:11233", "topic:10667", "topic:11326", "topic:11242", "topic:11303",
"topic:7536", "topic:11191", "topic:8695", "topic:11315", "topic:11321",
"topic:5550", "topic:11322", "topic:10144", "topic:11320", "topic:11319",
"topic:10069", "topic:11024", "topic:11312", "topic:11140", "topic:10816",
"topic:11223", "topic:11317", "topic:10984", "topic:11316", "topic:10455",
"topic:10211", "topic:11298", "topic:11238", "topic:11249", "topic:11310",
"topic:11250", "topic:11224", "topic:11028", "topic:11048", "topic:11284",
"topic:11300", "topic:11278", "topic:11212", "topic:11305", "topic:11295",
"topic:4867", "topic:7877", "topic:11259", "topic:11120", "topic:11114",
"topic:11279", "topic:11302", "topic:8646", "topic:11301", "topic:11125",
"topic:11290", "topic:11297", "topic:11294", "topic:11085", "topic:11299",
"topic:9943", "topic:11293", "topic:10992", "topic:11103", "topic:11291",
"topic:11228", "topic:10524", "topic:11209", "topic:11266", "topic:10383",
"topic:11289", "topic:11292", "topic:11133", "topic:11286", "topic:11288",
"topic:11282", "topic:8872", "topic:7890", "topic:11272", "topic:9588",
"topic:11276", "topic:11281", "topic:7937", "topic:11256", "topic:11095",
"topic:11183", "topic:10135", "topic:11097", "topic:11271", "topic:11134",
"topic:11151", "topic:11274", "topic:11275", "topic:11027", "topic:11227",
"topic:11270", "topic:11179", "topic:11107", "topic:11222", "topic:4059",
"topic:11267", "topic:11273", "topic:10557", "topic:11269", "topic:11264",
"topic:11268", "topic:11064", "topic:5580", "topic:11263", "topic:11262",
"topic:11234", "topic:11257", "topic:11189", "topic:11248", "topic:11258",
"topic:11232", "topic:11193", "topic:11208", "topic:9525", "topic:11245",
"topic:11251", "topic:7730", "topic:11153", "topic:11254", "topic:11219",
"topic:11244", "topic:11255", "topic:11253", "topic:11247", "topic:11200",
"topic:11230", "topic:11243", "topic:11042", "topic:11197", "topic:11226",
"topic:11239", "topic:11235", "topic:11236", "topic:11187", "topic:10659",
"topic:11210", "topic:11231", "topic:11165", "topic:10762", "topic:11225",
"topic:11229", "topic:11204", "topic:11206", "topic:9433", "topic:10592",
"topic:11218", "topic:11213", "topic:11221", "topic:11201", "topic:11214",
"topic:11215", "topic:11216", "topic:11211", "topic:10916", "topic:4400",
"topic:11217", "topic:11192", "topic:9509", "topic:11207", "topic:9545",
"topic:11177", "topic:7611", "topic:11109", "topic:11045", "topic:11123",
"topic:11202", "topic:11205", "topic:11194", "topic:11199", "topic:11185",
"topic:11172", "topic:11195", "topic:10626", "topic:11196", "topic:11012",
"topic:11091", "topic:11163", "topic:11168", "topic:10385", "topic:11188",
"topic:11073", "topic:11184", "topic:11178", "topic:6742", "topic:6622",
"topic:11182", "topic:11174", "topic:11181", "topic:11170", "topic:11154",
"topic:11167", "topic:11176", "topic:11175", "topic:11171", "topic:11135",
"topic:11160", "topic:11162", "topic:11128", "topic:9542", "topic:11166",
"topic:11161", "topic:11137", "topic:11164", "topic:11148", "topic:10743",
"topic:11080", "topic:11146", "topic:10721", "topic:11155", "topic:11158",
"topic:11157", "topic:11131", "topic:11156", "topic:11152", "topic:11150",
"topic:11149", "topic:11147", "topic:11145", "topic:11093", "topic:11143",
"topic:10872", "topic:11144", "topic:3716", "topic:11142", "topic:11056",
"topic:11141", "topic:11139", "topic:11121", "topic:11138", "topic:11136",
"topic:11126", "topic:11124", "topic:11129", "topic:11132", "topic:11130",
"topic:3320", "topic:11127", "topic:11113", "topic:11118", "topic:11116",
"topic:11117", "topic:11115", "topic:11041", "topic:10981", "topic:11110",
"topic:11111", "topic:11112", "topic:11108", "topic:11100", "topic:11005",
"topic:11079", "topic:11029", "topic:8798", "topic:11104", "topic:11054",
"topic:10362", "topic:10533", "topic:11106", "topic:10734", "topic:11057",
"topic:11098", "topic:11102", "topic:11101", "topic:11099", "topic:6552",
"topic:10991", "topic:11033", "topic:11094", "topic:11075", "topic:11089",
"topic:11084", "topic:10955", "topic:5911", "topic:11090", "topic:11088",
"topic:11087", "topic:10954", "topic:11067", "topic:11037", "topic:10792",
"topic:11078", "topic:11077", "topic:11025", "topic:11003", "topic:9804",
"topic:11069", "topic:11063", "topic:11074", "topic:10632", "topic:10878",
"topic:11071", "topic:11050", "topic:11023", "topic:11022", "topic:11072",
"topic:11065", "topic:11070", "topic:11062", "topic:11061", "topic:1923",
"topic:10983", "topic:11068", "topic:11066", "topic:3484", "topic:10963",
"topic:11055", "topic:10949", "topic:11059" ] } } planSummary: IXSCAN {
_key: 1, score: -1 } cursorid:30595555047 ntoreturn:0 ntoskip:0
nscanned:162 nscannedObjects:101 keyUpdates:0 numYields:3 locks(micros)
r:322748 nreturned:101 reslen:4807 331ms
And
2014-10-14T00:28:36.878-0400 [conn1712] warning: log line attempted
(57k) over max size (10k), printing beginning and end ... query 0.objects
query: { _key: { $in: [ "notifications:post:196440:uid:1215",
"notifications:post:196440:uid:1609", "notifications:post:196440:uid:2217",
"notifications:post:196440:uid:891", "notifications:post:196440:uid:1534",
"notifications:post:196440:uid:673", "notifications:post:196440:uid:1302",
"notifications:post:196440:uid:766", "notifications:post:196440:uid:221",
"notifications:post:196440:uid:1067", "notifications:post:196440:uid:2226",
"notifications:post:196440:uid:817", "notifications:post:196440:uid:1477",
"notifications:post:196440:uid:1963",
"notifications:tid:10294:pid:196969:uid:74",
"notifications:post:196440:uid:1661", "notifications:post:196440:uid:361",
"notifications:post:196440:uid:1255",
"notifications:tid:10223:pid:196989:uid:1197",
"notifications:tid:10223:pid:196998:uid:1197",
"notifications:post:196440:uid:236",
"notifications:tid:10223:pid:197019:uid:1197",
"notifications:post:196440:uid:1274",
"notifications:tid:10223:pid:197023:uid:1197",
"notifications:post:196440:uid:1108", "notifications:post:196440:uid:1508",
"notifications:post:196440:uid:1028",
"notifications:tid:10296:pid:197068:uid:361",
"notifications:post:196440:uid:2319", "notifications:post:196440:uid:4390",
"notifications:post:196440:uid:3231", "notifications:post:196440:uid:1428",
"notifications:tid:9621:pid:197108:uid:956",
"notifications:post:196440:uid:3095", "notifications:post:196440:uid:1618",
"notifications:follow:7:uid:3095",
"notifications:tid:10299:pid:197130:uid:3035",
"notifications:post:196440:uid:2539", "notifications:post:196440:uid:1058",
"notifications:post:196440:uid:670", "notifications:post:196440:uid:1509",
"notifications:post:196440:uid:628",
"notifications:tid:10223:pid:197218:uid:1197",
"notifications:post:196440:uid:2048", "notifications:post:196440:uid:3286",
"notifications:post_flag:196446:uid:3095",
"notifications:post:196440:uid:4357", "notifications:post:196440:uid:1665",
"notifications:post:196440:uid:3035", "notifications:post:196440:uid:1807",
"notifications:post:196440:uid:1411", "notifications:post:196440:uid:1535",
"notifications:post:196440:uid:1054",
"notifications:tid:10315:pid:197408:uid:139",
"notifications:tid:10320:pid:197416:uid:1325",
"notifications:post:196440:uid:2612", "notifications:post:197467:uid:1325",
"notifications:post:197467:uid:1637", "notifications:post:197467:uid:1963",
"notifications:post:197467:uid:1040", "notifications:post:197467:uid:1072",
"notifications:post:197467:uid:817", "notifications:post:197467:uid:1349",
"notifications:post:197467:uid:1067", "notifications:post:197467:uid:1373",
"notifications:post:197467:uid:4179", "notifications:post:197467:uid:691",
"notifications:tid:9929:pid:197513:uid:2306",
"notifications:post:197467:uid:1508", "notifications:post:197467:uid:3038",
"notifications:post:197467:uid:3619", "notifications:post:197467:uid:139",
"notifications:post:197467:uid:3010", "notifications:follow:7:uid:4297",
"notifications:post:197467:uid:3231", "notifications:post:197467:uid:3436",
"notifications:post:197467:uid:74", "notifications:post:197467:uid:1274",
"notifications:post:197467:uid:925", "notifications:post:197467:uid:2244",
"notifications:post:197467:uid:1345", "notifications:post:197467:uid:2972",
"notifications:post:196440:uid:830", "notifications:post:197467:uid:45",
"notifications:post:197467:uid:830", "notifications:post:196440:uid:2568",
"notifications:post:1974 .......... post:222550:uid:4253",
"notifications:post:222544:uid:4253", "notifications:post:217836:uid:2558",
"notifications:post:217791:uid:3286", "notifications:post:222597:uid:836",
"notifications:post:222597:uid:1830", "notifications:post:222597:uid:4253",
"notifications:post:222544:uid:1878", "notifications:post:222597:uid:1770",
"notifications:post:222594:uid:2294", "notifications:post:222544:uid:2795",
"notifications:post:222549:uid:2795", "notifications:post:222594:uid:2795",
"notifications:post:222597:uid:1195", "notifications:post:222594:uid:2317",
"notifications:post:222614:uid:2795", "notifications:post:222614:uid:94",
"notifications:post:222614:uid:250", "notifications:post:222614:uid:1696",
"notifications:post:222614:uid:4570",
"notifications:post_flag:222655:uid:1770",
"notifications:post:222614:uid:3661", "notifications:post:222614:uid:999",
"notifications:post:222614:uid:468", "notifications:post:222549:uid:1611",
"notifications:post:222597:uid:1611", "notifications:post:222738:uid:468",
"notifications:post:222738:uid:1246", "notifications:post:222738:uid:1477",
"notifications:post:222682:uid:45", "notifications:post:222738:uid:45",
"notifications:follow:7:uid:4644", "notifications:post:222614:uid:1611",
"notifications:follow:7:uid:4645", "notifications:post:222614:uid:1731",
"notifications:post:222682:uid:593", "notifications:post:222614:uid:593",
"notifications:post:222614:uid:591", "notifications:post:222614:uid:3918",
"notifications:tid:2942:pid:222777:uid:1871",
"notifications:post:222692:uid:1936", "notifications:post:222614:uid:1411",
"notifications:tid:11186:pid:222834:uid:512",
"notifications:post:222682:uid:3327", "notifications:post:222614:uid:1971",
"notifications:post:222692:uid:3883", "notifications:post:217474:uid:94",
"notifications:tid:11076:pid:222922:uid:94",
"notifications:tid:11076:pid:222927:uid:2244",
"notifications:post_flag:222238:uid:1325",
"notifications:post:222614:uid:1928", "notifications:post:222614:uid:1217",
"notifications:tid:11076:pid:223004:uid:4647",
"notifications:post_flag:223004:uid:558",
"notifications:post:222682:uid:2015",
"notifications:post_flag:222998:uid:16",
"notifications:post_flag:222994:uid:16",
"notifications:post:223018:uid:983", "notifications:post:223018:uid:1411",
"notifications:post:223018:uid:1530",
"notifications:post_flag:223004:uid:2244",
"notifications:post:223018:uid:1878", "notifications:post:223018:uid:1611",
"notifications:tid:11076:pid:223059:uid:3537",
"notifications:tid:11076:pid:223068:uid:1876",
"notifications:post:223018:uid:94", "notifications:post:223018:uid:1061",
"notifications:tid:11076:pid:223095:uid:1876",
"notifications:post:223018:uid:147", "notifications:post:223018:uid:1217",
"notifications:post:222614:uid:1058", "notifications:post:223018:uid:691",
"notifications:post:223018:uid:2795", "notifications:post:223018:uid:591",
"notifications:post:223018:uid:3918", "notifications:post:223018:uid:2319",
"notifications:post:223018:uid:1441", "notifications:post:223141:uid:1058",
"notifications:post:223018:uid:2458", "notifications:post:223018:uid:1058",
"notifications:post:223018:uid:1936",
"notifications:post_flag:223167:uid:350",
"notifications:post:223018:uid:593" ] } } planSummary: IXSCAN { _key: 1,
score: -1 } cursorid:30669589930 ntoreturn:0 ntoskip:0 nscanned:129
nscannedObjects:101 keyUpdates:0 numYields:2 locks(micros) r:232144
nreturned:101 reslen:160463 220ms