Discussion:
mongodb 3.4.10 memory growth
Daniel Doyle
2018-01-09 14:48:47 UTC
Hi all,

We are running a MongoDB 3.4.10 instance on CentOS 7 whose RES usage appears to
grow constantly. Poking around a little, I believe this is because the "page
heap freelist" in tcmalloc keeps growing. Eventually the process steals all of
the filesystem cache and performance degrades to the point where we have to
restart, usually on the order of every 3-4 days. The machine has 128G of memory.

I grabbed a section of the serverStatus output from just before the last
restart. My reading of it is that tcmalloc has allocated a ton of memory but is
never (or at least not often enough) unmapping it back to the OS.

db.serverStatus().wiredTiger.cache
...........
    "bytes currently in the cache" : 42629567194,
    "maximum bytes configured" : 56408145920,


db.serverStatus().tcmalloc

    "generic" : {
        "current_allocated_bytes" : NumberLong("72999682360"),
        "heap_size" : NumberLong("99688194048")
    },
    "tcmalloc" : {
        "pageheap_free_bytes" : NumberLong("22580260864"),
        "pageheap_unmapped_bytes" : NumberLong("2185707520"),
        "max_total_thread_cache_bytes" : NumberLong(1073741824),
        "current_total_thread_cache_bytes" : 27332768,
        "total_free_bytes" : NumberLong(1922543304),
        "central_cache_free_bytes" : NumberLong(1894941896),
        "transfer_cache_free_bytes" : 268640,
        "thread_cache_free_bytes" : 27332768,
        "aggressive_memory_decommit" : 0,
        "formattedString" :
"------------------------------------------------
MALLOC:      72999682360 (69617.9 MiB) Bytes in use by application
MALLOC: +    22580260864 (21534.2 MiB) Bytes in page heap freelist
MALLOC: +     1894941896 ( 1807.2 MiB) Bytes in central cache freelist
MALLOC: +         268640 (    0.3 MiB) Bytes in transfer cache freelist
MALLOC: +       27332768 (   26.1 MiB) Bytes in thread cache freelists
MALLOC: +      558653632 (  532.8 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    98061140160 (93518.4 MiB) Actual memory used (physical + swap)
MALLOC: +     2185707520 ( 2084.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   100246847680 (95602.8 MiB) Virtual address space used
MALLOC:
MALLOC:           7267027              Spans in use
MALLOC:               177              Thread heaps in use
MALLOC:              4096              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory."
    }
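
To spell that accounting out, here is a quick mongo shell sketch against the same fields (the gib() helper is just local pretty-printing; NumberLong values get coerced to doubles, which is fine for rough numbers):

// mongo shell sketch: pull the tcmalloc counters out of serverStatus and show
// where the gap between heap_size and application usage is sitting.
var s = db.serverStatus({tcmalloc: true}).tcmalloc;
var GiB = 1024 * 1024 * 1024;
function gib(x) { return (x / GiB).toFixed(1) + " GiB"; }
print("in use by application : " + gib(s.generic.current_allocated_bytes));
print("page heap freelist    : " + gib(s.tcmalloc.pageheap_free_bytes));
print("other freelists       : " + gib(s.tcmalloc.total_free_bytes));
print("released to OS        : " + gib(s.tcmalloc.pageheap_unmapped_bytes));
print("heap_size (sum of all): " + gib(s.generic.heap_size));
// RES roughly tracks heap_size minus the "released to OS" portion (plus malloc
// metadata), so a growing page heap freelist shows up directly as growing RES.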




Based on reading issues like https://jira.mongodb.org/browse/SERVER-20104, I
tried enabling aggressive decommit via the environment variable, but the
overhead of doing that resulted in a massive overall performance drop, so we
reverted fairly quickly. There are a few other tcmalloc environment variables,
such as TCMALLOC_RELEASE_RATE or TCMALLOC_HEAP_LIMIT_MB, that may be relevant,
but after trying a few in a dev environment and managing to segfault mongod
(with the max heap size setting, for example), I figured it would be good to
ask around first.
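
For reference, the runtime equivalent would presumably be the tcmallocAggressiveMemoryDecommit server parameter; I'm treating that name as an assumption to be verified against the build, but toggling it would look something like this:

// Assumed parameter name; verify it exists on this build before relying on it.
db.adminCommand({ setParameter: 1, tcmallocAggressiveMemoryDecommit: 1 });
// Confirm the change via serverStatus:
db.serverStatus({tcmalloc: true}).tcmalloc.tcmalloc.aggressive_memory_decommit;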


Attached is the current full output of serverStatus if that is helpful. As
you can see the heap size is considerably lower right now, but it will grow
over time until it looks like the above.


A few misc details -

1. The server is running in a replica set, but at the moment the secondary
has been removed while we do some hardware changes.

2. The server is currently a single shard. This is an artifact from the 2.x
days, when we had to shard data to get around the single-threaded limitations
of MMAP, but it is a peculiarity, so I wanted to point it out.

3. Our workflow is very heavily update-based. We're storing time-series data at
a fairly regular interval, and documents contain "blocks" of time. One
document, for example, represents a few hours' worth of 60s samples, so the
workflow each minute is to update an array in that document to add the new
value. These arrays have a fixed size, at which point a new doc is created, but
I would think that internally the documents grow from their initial size.
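
To illustrate the shape of that per-minute pattern, here is a minimal mongo shell sketch; the collection and field names are invented for illustration rather than taken from our real schema. If the array is not preallocated when the block document is created, each $set grows the document in place; preallocating the slots up front would keep the document a fixed size:

// Invented names, for illustration only.
// Preallocate one "block" document covering e.g. 3 hours of 60-second slots.
var values = [];
for (var i = 0; i < 180; i++) { values.push(null); }
db.datapoints.insert({
    sensor_id: 1234,
    block_start: ISODate("2018-01-09T14:00:00Z"),
    values: values
});

// Per-minute update: write the newly collected sample into its slot.
var slot = 42;                      // e.g. minutes since block_start
var setDoc = {};
setDoc["values." + slot] = 17.3;    // the collected value
db.datapoints.update(
    { sensor_id: 1234, block_start: ISODate("2018-01-09T14:00:00Z") },
    { $set: setDoc }
);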


I am happy to provide any additional information if I can. Thanks.
Daniel Doyle
2018-01-16 14:28:11 UTC
As a follow-up to this, we tried turning off replication to see if that helped
at all. Judging by our memory usage monitoring, it appears to have had no
appreciable impact. I did verify via rs.printReplicationInfo() that the oplog
is no longer updating, though I still have a "WT RecordStoreThread:
local.oplog.rs" operation running; I'm not sure what to make of that.
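
For anyone curious, a quick shell sketch that lists that operation out of the currentOp output:

// mongo shell sketch: show long-running background ops such as the
// "WT RecordStoreThread: local.oplog.rs" operation mentioned above.
db.currentOp(true).inprog.filter(function (op) {
    return op.desc && /RecordStoreThread/.test(op.desc);
}).forEach(printjson);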

The WiredTiger cache size was set to 50G, which it seems to be holding to based
on the output of db.serverStatus(), but the RES of the mongod process has
reached 62G as of this morning and, based on the last couple of weeks, will
likely keep climbing until we restart it.

This system was upgraded from 3.0.12 to 3.4.10, including the steps along the
way such as converting the config servers to WiredTiger. We did not experience
this behavior on the 3.0 line, and we will try downgrading to 3.2 to see if
that helps anything.

Any help or insights would be greatly appreciated.
'Kevin Adistambha' via mongodb-user
2018-01-17 00:40:02 UTC
Hi Daniel

Could you add a few more details, please:

- Are you running with the --enableMajorityReadConcern setting
<https://docs.mongodb.com/v3.4/reference/program/mongod/#cmdoption-mongod-enablemajorityreadconcern>?
- Could you post the output of db.serverCmdLineOpts(), rs.conf(), and the content of any config files?
- Do you have big databases? How many collections do you have in the system? The output of db.stats() per database would help to determine this (see the sketch after this list).
- Could you post the directory listing of your dbpath, including file sizes, e.g. the output of ls -l?
- Do you have big aggregation queries running all the time?
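
For the db.stats() part, a loop along these lines (just a sketch) collects it for every database in one go:

// mongo shell sketch: run db.stats() against every database, sizes in MB.
db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    var s = db.getSiblingDB(d.name).stats(1024 * 1024);
    printjson({ db: d.name, collections: s.collections,
                dataSizeMB: s.dataSize, indexSizeMB: s.indexSize });
});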

This system was upgraded from 3.0.12 to 3.4.10 including the steps along
the way such as converting config servers to wiredtiger.

Did you upgrade to MongoDB 3.2 in between? Note that the supported upgrade
path involves upgrading through every intermediate major version.

Also, did you change the replica set protocol version
<https://docs.mongodb.com/v3.4/reference/replica-set-protocol-versions/#modify-replica-set-protocol-version>
during the upgrade process?

we will be trying to downgrade to 3.2 to see if that helps anything.

Without knowing more about your use case, I would venture to say that it's
likely better to upgrade to the current version, 3.6.2, instead of moving down
to the 3.2 series.

Best regards
Kevin
​
Daniel Doyle
2018-01-17 19:26:08 UTC
Kevin,

Certainly, thanks for replying. Responses inline...
Post by 'Kevin Adistambha' via mongodb-user
Hi Daniel
- Are you running with the --enableMajorityReadConcern setting
<https://docs.mongodb.com/v3.4/reference/program/mongod/#cmdoption-mongod-enablemajorityreadconcern>?
I do not believe so. The command line options are just the config file and
"--quiet", as you will see below.
Post by 'Kevin Adistambha' via mongodb-user
- Could you post the output of db.serverCmdLineOpts(), rs.conf(), and
the content of any config files?
db.serverCmdLineOpts();
{
    "argv" : [
        "/usr/bin/mongod",
        "--quiet",
        "-f",
        "/etc/mongo/mongod-shard2.conf"
    ],
    "parsed" : {
        "config" : "/etc/mongo/mongod-shard2.conf",
        "net" : {
            "port" : 27025
        },
        "processManagement" : {
            "fork" : true,
            "pidFilePath" : "/run/mongodb/mongod-shard2.pid"
        },
        "security" : {
            "authorization" : "enabled",
            "keyFile" : "/etc/mongo/keyfile-tsds.bldc"
        },
        "sharding" : {
            "clusterRole" : "shardsvr"
        },
        "storage" : {
            "dbPath" : "/mongo/shard2",
            "engine" : "wiredTiger",
            "syncPeriodSecs" : 60,
            "wiredTiger" : {
                "engineConfig" : {
                    "cacheSizeGB" : 30,
                    "journalCompressor" : "snappy"
                }
            }
        },
        "systemLog" : {
            "destination" : "file",
            "logAppend" : true,
            "logRotate" : "reopen",
            "path" : "/var/log/mongodb/mongod-shard2.log",
            "quiet" : true
        }
    },
    "ok" : 1
}


$ cat /etc/mongo/mongod-shard2.conf
# THIS FILE IS MAINTAINED BY PUPPET

systemLog:
  destination: file
  path: "/var/log/mongodb/mongod-shard2.log"
  logAppend: true
  logRotate: reopen

storage:
  engine: "wiredTiger"
  syncPeriodSecs: 60
  dbPath: "/mongo/shard2"
  wiredTiger:
    engineConfig:
      journalCompressor: "snappy"
      cacheSizeGB: 30

processManagement:
  fork: true
  pidFilePath: "/run/mongodb/mongod-shard2.pid"

net:
  port: 27025


#replication:
#  replSetName: "tsds.bldc"

sharding:
  clusterRole: shardsvr

security:
  authorization: "enabled"
  keyFile: "/etc/mongo/keyfile-tsds.bldc"



I can't currently show the output of rs.conf() since it's not running as a
replica set at present (see my second message), but it was only a single mongod
data instance and a single mongod arbiter instance on separate hosts. There is
another mongod data instance not currently in the set, as noted in my original
message. As far as I know we haven't done anything special with replication;
it's a 50G oplog with defaults.
Post by 'Kevin Adistambha' via mongodb-user
- Do you have big databases? How many collections do you have in the
system? The output of db.stats() per database would help to determine
this.
We have about 9 databases with any substantial data in them, and each of those
has mostly the same set of 8 collections. "interface" is the obvious dominator
in terms of size. All queries should be fully indexed.

I have attached a txt file with that output, db_stats.txt.
Post by 'Kevin Adistambha' via mongodb-user
- Could you post the directory listing of your dbpath that includes
the file sizes, e.g. output of ls -l
I have attached a txt file with that output, db_path_list.txt.
Post by 'Kevin Adistambha' via mongodb-user
- Do you have big aggregation queries running all the time?
I don't know what qualifies as "big" aggregation queries, but most of what we
do is aggregate and update queries. The workflow across a given time period is
pretty consistent. There are jobs at the top of the hour and at midnight that
do bigger crunches, but they run across the same dataset (data received over
the last hour/day respectively, which is effectively a constant). These bigger
crunches typically run aggregate and/or find queries that match tens of
thousands of results and page through them in blocks.
Post by 'Kevin Adistambha' via mongodb-user
This system was upgraded from 3.0.12 to 3.4.10 including the steps along
the way such as converting config servers to wiredtiger.
Did you upgrade to MongoDB 3.2 in-between? Note that the supported upgrade
path involves upgrading through every intermediate major versions.
Yes, we upgraded to 3.2 in between. If memory serves, that is where the
conversion of the config servers to WiredTiger had to happen.
Post by 'Kevin Adistambha' via mongodb-user
Also, did you change the replica set protocol version
<https://www.google.com/url?q=https%3A%2F%2Fdocs.mongodb.com%2Fv3.4%2Freference%2Freplica-set-protocol-versions%2F%23modify-replica-set-protocol-version&sa=D&sntz=1&usg=AFQjCNEy7bVUwIKQAgZ7Ep-u8J2YsHXG7g>
during the upgrade process?
Yes.
Post by 'Kevin Adistambha' via mongodb-user
we will be trying to downgrade to 3.2 to see if that helps anything.
Without knowing more about your use case, I would venture to say that it’s
likely to be better to upgrade to the current version of 3.6.2 instead of
moving downward to the 3.2 series.
Unfortunately, we cannot currently go to 3.6.2 because the driver for the
language we are using (Perl) has not yet been released with official support
for that version. I expect that will change in the near-ish future, but for
today we can only try going backwards.
https://jira.mongodb.org/projects/PERL/issues/PERL-754?filter=allopenissues

As noted earlier, this behavior did not occur in the 3.0 line so it seems
it was an unintended result of a change made between there and 3.4.10.
While rolling back versions is not an optimal or desired solution, we are
trying to get to a stable environment as a first priority. I am more than
happy to help provide more information and stay on 3.4.10 for the time
being if it helps track anything down.

As an aside, while poking around the JIRA tracker, we did discover this, which
seems like it may be related. I looked at the diagnostics directory but didn't
see an obvious way to turn it into something useful; I'm guessing it's for
internal team tools only.


Thanks again for taking a look, we appreciate it.
-Dan
'Kevin Adistambha' via mongodb-user
2018-01-19 03:01:33 UTC
Hi Daniel

Unfortunately, nothing in the serverStatus output that you posted points to a
definitive issue. There are a number of databases and indexes, but I think it's
still a relatively manageable number. None of the other information you posted
pointed to any obvious cause either.

It would be helpful if you could post 3-4 outputs of db.serverStatus(), taken
within a minute or so of each other, during the height of the issue. Hopefully
this could point to a reason.

In the meantime, you might want to check out the Production Notes
<https://docs.mongodb.com/manual/administration/production-notes/> and see
if there are any settings that were missed/non-optimal.

Best regards
Kevin
​
Daniel Doyle
2018-01-25 14:47:53 UTC
Kevin,

Thanks for reviewing, and apologies for the delayed response, as I have been
traveling. I have reviewed the Production Notes and confirmed that, to the best
of my knowledge, we are following all of the best practices.

Attached is a 10-minute span of db.serverStatus() output collected from a
once-a-minute crontab. I hope this helps.

-Dan
Daniel Doyle
2018-02-08 15:34:13 UTC
I hate to bump this, but we are still seeing this issue. We have tried
upgrading to the most recent 3.4.11 to see if that would change anything, but
it seems to be the same behavior.

I am attaching several graphs from a monitoring system that show the overall
memory usage pattern in a few different dimensions. The far left side is where
it was when we restarted it last time (note: we adjusted the cache size from
30G to 40G during that restart to see if there was any difference). The
sawtoothing in the graphs lines up pretty well with when the hourly crunch jobs
run, which I think is to be expected: it's doing a fairly large aggregate query
during that time, so memory balloons slightly and then should fall when it's
finished. The stuff around midnight is some more large jobs running. The
problem is that the overall trend line is clearly going upwards over time, but
I can't see anything obvious causing it.

The graphs labeled "mongodb_*" are from tracking the output of
db.serverStatus(); the other is just something tracking the memory usage of the
whole server.

I can also open up a ticket in the Jira tracker if this is the wrong place.
'Kevin Adistambha' via mongodb-user
2018-02-09 02:26:39 UTC
Hi Daniel

From the mongodb_WT_cache graph you posted, the WiredTiger cache seems to be
holding steady at ~32 GB, which is expected if you set the WiredTiger cache to
40 GB: WiredTiger will use up to 80% of the configured cache size and then try
to keep cache utilization around that level for performance reasons.
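
As a quick sanity check, that ratio can be read straight from serverStatus, for example:

// mongo shell: how full the WiredTiger cache currently is, as a percentage.
var c = db.serverStatus().wiredTiger.cache;
print((c["bytes currently in the cache"] * 100 / c["maximum bytes configured"]).toFixed(1) + "% of configured cache in use");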

Other than the mongodb_current_allocated graph, everything else looks as
expected. I would like to reconfirm: other than this constantly climbing
allocation, is everything else in the database running as expected? E.g. no
stalling, drops in throughput, unexpected shutdowns, etc.?

In the past, have you seen the mongodb_current_allocated graph plateau at some
point?

It’s doing a fairly large aggregate query during that time
The stuff around midnight is some more large jobs running

It is entirely possible that the job you’re doing simply needs that much
memory to run.

Could you elaborate on the large aggregate queries? How large are we talking
about, e.g. the number of documents, the average size of the documents, whether
it uses the $out stage
<https://docs.mongodb.com/v3.4/reference/operator/aggregation/out/>, etc.?
It might be useful to see the aggregation query and some example documents to
get a sense of the scale. A bit more insight into the other "large jobs" would
be useful to know as well; could you elaborate on those, too?

Finally, if possible, could you perform these steps to collect more
information?

1. Run mongod with --setParameter heapProfilingEnabled=true, or add this entry
to your config file:

setParameter:
  heapProfilingEnabled: true

This will run mongod in a heap profiling mode, which is a troubleshooting
mode.

2. Run this script on the server to collect serverStatus data every minute:

mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:true}))); sleep(60*1000)}" > serverStatus.out

Ideally, the script should be run right after a restart and left running for ~2
days (the length of the graphs you posted). Longer would definitely be better.
It will produce a serverStatus() output
<https://docs.mongodb.com/manual/reference/command/serverStatus/> every minute.
Note that there is no identifiable information in the serverStatus() output; it
only contains various metrics of what the server is doing at that point in
time.

Please post the resulting serverStatus.out. It’s a text file, so you can
compress it with zip or similar.

Best regards
Kevin
​
Daniel Doyle
2018-02-09 15:05:33 UTC
Hi Kevin,

Thanks for sticking with me here, we really appreciate it.
Post by 'Kevin Adistambha' via mongodb-user
Hi Daniel
From the mongodb_WT_cache graph you posted, the WiredTiger cache seems to
be holding steady at ~32 GB, which is expected if you set the WiredTiger
cache to 40 GB. WiredTiger will use up to 80% of the configured cache size,
then will try to keep cache utilization at this level for performance
reasons.
Yes, it would seem that the WT engine may not be the culprit here as that
part is solid.
Post by 'Kevin Adistambha' via mongodb-user
Other than the mongodb_current_allocated graph, everything else looks
expected. I would like to reconfirm that other than this constantly
climbing allocation, is everything else in the database is running as
expected? E.g. no stalling, drop in throughput, unexpected shutdowns, etc.?
In the past, have you seen the mongodb_current_allocated graph to plateau
at some point?
Yes, everything else appears to be fine. We have never had a crash that wasn't
OOM-induced, and throughput seems more or less consistent based on the output
of mongostat. There is a 1-3s drop in write throughput following a "flush", but
that doesn't seem unreasonable, and it doesn't drop to 0. During the ~2 days it
takes to reach the restart point, the application behaves just fine and is
performant. Eventually performance starts to degrade as the FS cache is eaten
up, at which point we restart things and it all goes back to being happy again
for another few days.
Post by 'Kevin Adistambha' via mongodb-user
It’s doing a fairly large aggregate query during that time
The stuff around midnight is some more large jobs running
It is entirely possible that the job you’re doing simply needs that much
memory to run.
Could you elaborate on the large aggregate queries? How large are we
talking about, e.g. the number of documents, the average size of the
documents, whether it uses the $out stage
<https://docs.mongodb.com/v3.4/reference/operator/aggregation/out/>, etc.
It might be useful to see the aggregation query and some example documents
to get a sense of the scale. A bit more insight into the other “large jobs”
would be useful to know as well. Could you elaborate on those large jobs?
The large jobs are cyclical in nature. We allocate a document that stores an
array of datapoints, and that array gets updated during the hour with the stuff
being collected. This array is finite in size, so for a given "top of the hour"
the size and workflow are constant even if the overall dataset is growing.
Today we are collecting on about 75k things, so to keep things simple we have
75k documents every hour that need summarizing.

At the top of the hour, an aggregation pipeline runs across about 75k metadata
documents that (according to stats() output) have an avgObjSize of 741 bytes.
For each block of this metadata (our block size is 5), it then queries the
actual datapoint collection for the documents it needs to summarize. The
datapoint collection's avgObjSize is 79,618 bytes according to stats(). This
"join" happens in memory in our code, and there is no $out stage. The
summarized data is crunched and then written back to a third collection with an
avgObjSize of 554,680 bytes.
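
Roughly, in mongo shell terms, the hourly job looks like the sketch below; the real code is Perl, every name here is invented, and summarize() is just a stand-in for our app-side crunching:

// Sketch of the hourly flow: walk the ~75k metadata docs in blocks of 5,
// fetch the matching raw datapoint docs, "join" and summarize them in app
// memory, then write the hourly rollups to a third collection (no $out stage).
var hourStart = ISODate("2018-02-09T14:00:00Z");

function summarize(metaBlock, points) {
    // stand-in for the application-side crunching
    return metaBlock.map(function (m) {
        return { series_id: m.series_id, hour: hourStart, count: points.length };
    });
}

var meta = db.metadata.aggregate([ { $match: { active: true } } ]).toArray();

for (var i = 0; i < meta.length; i += 5) {
    var block = meta.slice(i, i + 5);
    var ids = block.map(function (m) { return m.series_id; });

    var points = db.datapoints.find(
        { series_id: { $in: ids }, block_start: hourStart }
    ).toArray();

    summarize(block, points).forEach(function (doc) {
        db.hourly_summaries.insert(doc);
    });
}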

At the top of the day there is an additional stage where it does the exact
same thing but for daily summaries instead of hourly (based on the
hourlies, not the original data, so again 75k total docs).

I am inclined to think that total required memory isn't the problem here,
because it runs fine for several days and on restart does not immediately climb
back to where it was, but at this point I don't feel completely sure of
anything. We have tried letting it run through the entire box's memory just to
see if it would eventually settle, but it got OOM-killed when it hit something
like 106 or 107GB (with a 30GB WT cache size).
Post by 'Kevin Adistambha' via mongodb-user
Finally, if possible, could you perform these steps to collect more
information?
1. Run mongod with mongod --setParameter heapProfilingEnabled=true, or
heapProfilingEnabled: true
This will run mongod in a heap profiling mode, which is a troubleshooting
mode.
mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:true}))); sleep(60*1000)}" > serverStatus.out
Ideally, the script should be run after restart, and let running for ~2
days (the length of the graphs you posted). Longer would definitely be
better. It will produce a serverStatus() output
<https://docs.mongodb.com/manual/reference/command/serverStatus/> every
minute. Note that there are no identifiable information in the
serverStatus() output, it only contains various metrics of what the
server is doing at that point in time.
Please post the resulting serverStatus.out. It’s a text file, so you can
compress it with zip or similar.
Thanks, I will add this to the server when we restart it today and get you
the output.
Daniel Doyle
2018-02-12 14:20:55 UTC
Here is a .xz of the output from the command you posted above, starting just a
minute or so after server startup and running until we had to restart it, which
I think was a little less than 2 days.

The original file size is around 534M. I had to use xz to get it to compress
down enough for Google Groups to allow me to attach it here; even something
like gzip -9 resulted in a file it wouldn't accept. If that format is a
problem, let me know and I can send it some other way.

Thanks again.
-Dan
'Kevin Adistambha' via mongodb-user
2018-02-13 01:49:03 UTC
Hi Daniel

Thanks for the data. From what you posted, it appears that your workload
actually requires all that memory.

By default (in MongoDB), tcmalloc is not set to aggressively decommit
allocated memory. This was done to ensure that if this memory is required
quite soon in the future, tcmalloc doesn’t have to go through a
deallocate-reallocate cycle that would be detrimental to performance. The
side effect of this setting is that memory is not promptly released back to
the OS, which is what you are seeing in your case.

By plotting the tcmalloc pageheap_free_bytes metric, it can be observed
that there is a large portion of memory (26 GB at peak) that was allocated,
but was not released back to the OS. However, the point below illustrates a
good reason for not releasing it:

[graph: tcmalloc pageheap_free_bytes over time]


Note in the graph above that at point B, 2018-02-11T14:51:19Z, there was 17 GB
considered "free". However, by point C, 2018-02-11T14:52:24Z, this number drops
to just 69 MB; in about one minute, almost all of that 17 GB was completely
reused:

[graph: pageheap_free_bytes dropping between points B and C]


The fact that turning on aggressive decommit leads to poor performance seems to
confirm that the process actually needs all of those resources: the poor
performance means that mongod was deallocating memory that it then had to
reallocate again shortly afterwards.

There are some things that you could try to alleviate this situation:

1. If you’re able to provision more memory, you might be able to observe
memory usage plateau at some point.
2. Given the current hardware, the workload seems to require more resources
than are available. You might be able to shard and distribute the load across
more servers.

Best regards
Kevin
​