Discussion:
Flush to Disk decreases operation throughput
Francesco Rivola
2018-07-09 15:19:11 UTC
Hi All,

We are running MongoDB 3.4.9 with the WiredTiger storage engine.

- Windows Server 2016 in Azure
- 64 GB of RAM
- Disk: Premium SSD, 5000 IOPS, 200 MB/s, 1 TB


A couple of weeks ago we started to have more traffic in our production
application and also started to experience the following issue:

- in mongostat we discovered that, right after a disk flush, we got a
high peak in disk I/O and for 10 or more seconds the stats showed 0
operations (0 inserts, 0 queries, 0 updates, 0 deletes, etc.)

After some reading and investigation we decided to try decreasing the
WiredTiger cache from 31GB to 1GB. The result was impressive and the
operation throughput issue has been almost completely resolved.
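
For reference, a minimal sketch of one way such a change can be applied
from the mongo shell (the exact mechanism we used may differ, and the value
is just the size we settled on):

    // Sketch: resize the WiredTiger cache at runtime. The change does not
    // survive a restart; storage.wiredTiger.engineConfig.cacheSizeGB in
    // mongod.conf is the equivalent permanent setting.
    db.adminCommand({
        setParameter: 1,
        wiredTigerEngineRuntimeConfig: "cache_size=1G"
    })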

The dirty cache now stays around 7-10%, whereas before it was constantly
increasing, reaching 2.5-3% at the moment of the disk flush.

We would like to ask the following:

- What could be the side effect of our change to the WiredTiger cache?
- Do we have other alternatives to avoid the high disk I/O at flush time?
  For example (a rough sketch of both is included after this list):
  - Increase the page eviction thread min and max configuration
  - Change the checkpoint interval to be lower than 60 seconds
- Is this normal behavior, or could it be due to some specific application
  requirements? e.g. heavy write load vs. read load (I have attached a
  screenshot of mongostat after our WiredTiger change to 1GB to give an
  understanding of our load)
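
A rough sketch of those two alternatives as we understand them, run from
the mongo shell (the values are placeholders we have not tested):

    // Sketch only, untested placeholder values:
    // 1) raise the WiredTiger eviction worker thread counts
    db.adminCommand({
        setParameter: 1,
        wiredTigerEngineRuntimeConfig: "eviction=(threads_min=4,threads_max=8)"
    })
    // 2) lower the checkpoint/flush interval from the default 60 seconds
    db.adminCommand({ setParameter: 1, syncdelay: 30 })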


Please, let me know if you need more information or if you need further
clarification.

Thank you so much.

Best Regards,
Francesco Rivola
'Kevin Adistambha' via mongodb-user
2018-07-19 06:34:33 UTC
Hi Francesco

It seems to me that the hardware can barely cope with the workload you’re
asking it to do. It appears that reducing the size of the WiredTiger cache
allows you to basically throttle your workload so that the disk can handle
the load.

Specifically to answer your questions:

What could be the side effect of our change in WiredTiger cache?

A smaller WiredTiger cache means that there are fewer documents and indexes
that WiredTiger can handle within a particular time. The WiredTiger cache
forms the working memory of WiredTiger, containing the uncompressed indexes
and documents that comprise your working set.

Thus, a small WiredTiger cache will typically be detrimental to a
mostly-read workload. This is because WiredTiger needs to load and
uncompress documents not in its cache before it is able to process them.
This means that you’ll be hitting disk a lot more often.

However, this depends on your use case. In a write-heavy workload, this
usually has less impact, since your bottleneck would be how fast your disk
can process the writes you’re telling it to do.

Increase page eviction thread config min and max

Since it appears that your workload is disk-bound, increasing the number of
eviction threads would have minimal impact. The additional threads would
just be sitting idle waiting for disk and not being productive.

Change the checkpoint interval to be lower than 60 seconds

This may help to spread out the disk writes and allow WiredTiger to write
less data but more often to disk (again, depending on your specific use
case). I would encourage you to do your own testing regarding this
parameter.

Is this normal behavior or could it be due to some specific application
requirements?

This is normal behaviour. As mentioned above, I believe your disk is not
fast enough for your use case. More concurrency (e.g. more clients) could
make this worse. Having said that, one thing you can try is to separate
your dbPath and your journal onto separate physical disks. This way,
WiredTiger journal writes would not compete with data writes. Please
see the Production Notes section
<https://docs.mongodb.com/manual/administration/production-notes/#separate-components-onto-different-storage-devices>
for recommendations.

Finally, for a production environment, it is strongly recommended to deploy
a replica set with a minimum of 3 data-bearing nodes. Please see the Production
Notes <https://docs.mongodb.com/manual/administration/production-notes/>
for more details.

Best regards
Kevin
Francesco Rivola
2018-07-19 10:51:02 UTC
Hi Kevin,

Thank you very much for your response. Really appreciated.

This helps me a lot in clarifying the issue, and we will study how to
approach your suggestions.

I have just a few more questions related to your answer:

"*reducing the WiredTiger cache is throttling our workflow*". Does this
happen as the default 5% eviction_dirty_target is now reached, so the
eviction thread starts to write to disk reducing the amount of work that
need to be done in the checkpoint? Is it right?
If it is right, could be tuning the eviction_dirty_target and trigger
parameters another approach to minimize the checkpoint issue?

What could be the factor that limits the disk in the checkpoint scenario?
The 200 MB/s throughput or the 5000 IOPS? Or both? I am asking this in case
we could vertically scale the hardware.

Thank you so much.

Best Regards,
Francesco Rivola
'Kevin Adistambha' via mongodb-user
2018-07-23 02:31:43 UTC
Hi Francesco

“reducing the WiredTiger cache is throttling our workflow”. Does this
happen because the default 5% eviction_dirty_target is now reached, so the
eviction threads start to write to disk, reducing the amount of work that
needs to be done in the checkpoint? Is that right?

Correct. The full explanation of the tunables is described on the Cache and
eviction tuning page <http://source.wiredtiger.com/develop/tune_cache.html>,
specifically under the heading Eviction tuning. By default (in MongoDB 3.4)
this value is set to 5% of the WiredTiger cache size (see
https://github.com/mongodb/mongo/blob/r3.4.9/src/third_party/wiredtiger/dist/api_data.py#L426
)

I believe the mechanism at work here is that lowering the WiredTiger cache
from 31GB to 1GB allows the disk to keep up with writing dirty data. 5% of
31GB is ~1.5GB, while 5% of 1GB is ~50MB, a much smaller amount of data to
write. In other words, when your cache size is 31GB, you write faster to
memory but also wait longer for the writes to be flushed to disk, leading
to “stalls”. When the cache is lowered to 1GB, the flushes are smaller but
more regular and faster, so the “stalls” are smoothed out over time. The
overall time it takes to write the data should be the same, since the
workload appears to be disk-bound.

What could be the factor that limits the disk in the checkpoint scenario?
The 200 MB/s throughput or the 5000 IOPS? Or both?

I think it’s both, since what you have described so far sounds like the
disk is struggling to fulfil the work required of it. Larger throughput and
IOPS should generally provide you with better performance.

You may be able to find a setting for the eviction_dirty_target parameter
that is optimal for your workload and your provisioned hardware. However,
this is advanced tuning that is best reserved for when all other options
are exhausted. Please make sure that you have recent backups before doing
advanced maintenance on your deployment, and that you have tested the new
parameters on a test deployment before implementing them in production.
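
As an illustration only (the numbers below are placeholders, not a
recommendation), this kind of tuning can be applied at runtime through the
wiredTigerEngineRuntimeConfig parameter:

    // Illustrative sketch with placeholder values: have eviction start
    // writing dirty pages earlier by lowering the dirty-cache target and
    // trigger percentages.
    db.adminCommand({
        setParameter: 1,
        wiredTigerEngineRuntimeConfig:
            "eviction_dirty_target=2,eviction_dirty_trigger=10"
    })

Keep in mind that a runtime change like this does not persist across a
restart, so the equivalent setting would also need to go into your startup
configuration if you decide to keep it.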

Best regards
Kevin
Francesco Rivola
2018-07-23 07:08:42 UTC
Hi Kevin,

Thank you so much for your help and your detailed answers. I think everything is clear now :). As you suggested, we will address the issue by testing the possible solutions in our dev and staging environments.

Thank you again.

Best Regards,
Francesco Rivola
Francesco Rivola
2018-11-06 11:25:34 UTC
Hi Kevin,

We have been playing with the WiredTiger cache settings (eviction tuning
and cache size) in order to minimize the impact of the problem. However, we
haven't found a setup that allows us to fully avoid the disk flush stall.

Finally, we are considering scaling the disk to one with more IOPS and MB/s.

We are running MongoDB 3.4.9 (standalone) with the WiredTiger storage
engine.

- Windows Server 2016 in Azure
- 64 GB of RAM
- Disk: Premium SSD (Azure disk type P30), 5000 IOPS, 200 MB/s, 1 TB


We plan to resize the disk to a P40 (7500 IOPS, 250 MB/s, 2 TB), which is
the next Azure disk size available
(https://docs.microsoft.com/en-us/azure/virtual-machines/windows/premium-storage#premium-storage-disk-limits).

Before performing this resize, we have tried to confirm that we are I/O
bound. We have been monitoring the disk with PerfMon and these are the
results:

- maximum IOPS is around 1000 (DiskTransfersSec.png)
- maximum throughput is around 12 MB/s (DiskBytesSec.png)

Those values are far below the promised limits.

With Iometer (iometer.org; see
https://blogs.technet.microsoft.com/andrewc/2016/09/09/understanding-azure-virtual-machine-iops-throughput-and-disk-latency/)
we have stressed the disk and we did reach the promised disk limits in IOPS
and throughput.

Reviewing MongoDB, we found that our maximum dirty bytes in cache is around
500 MB right before the flush (I have attached the result of
db.serverStatus().wiredTiger taken right before the flush to disk,
ServerStatusWiredTiger.txt). Finally, in mongostat we can see that the
flush provokes a drop in performance for around 10 seconds.
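
For completeness, this is roughly how we read those numbers from the mongo
shell (a small sketch; the metric names are the standard serverStatus cache
fields):

    // Sketch: the WiredTiger cache metrics we watch in db.serverStatus()
    var cache = db.serverStatus().wiredTiger.cache;
    print("configured max cache bytes: " + cache["maximum bytes configured"]);
    print("bytes currently in cache  : " + cache["bytes currently in the cache"]);
    print("tracked dirty bytes       : " + cache["tracked dirty bytes in the cache"]);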

See the attached image MongoStat.png (note: used% is not at 80% in the
screenshot because we recently increased the WiredTiger cache back to the
default and the memory is slowly growing towards 80%).

We feel we are missing something. With the provided data, do you still
think the problem is the disk and that we are currently I/O bound? If we
are I/O bound, why don't we see values near the disk limits in PerfMon?

Finally, do you know somebody who could help us with this? We are thinking
of a one-hour remote meeting that of course would be paid. In that case,
please contact me in private at francesco.rivola @ xepient.com.

Thank you in advance,

Best Regards,
Francesco Rivola
'Kevin Adistambha' via mongodb-user
2018-11-08 01:22:56 UTC
Hi Francesco,

We are feeling we are missing some point, with the provided data do you
still think the problem is the Disk and that we are currently I/O bound?

There is a lot of information there, but the apparent performance cap
you’re seeing is curious. Is it possible that it’s being throttled in some
way by Azure? The Throttling section
<https://docs.microsoft.com/en-us/azure/virtual-machines/windows/premium-storage#throttling>
of the Azure Premium Storage page says:

Throttling might occur, if your application IOPS or throughput exceeds the
allocated limits for a premium storage disk. Throttling also might occur if
your total disk traffic across all disks on the VM exceeds the disk
bandwidth limit available for the VM.

So my understanding is, throttling can occur not only from disk limits, but
also from VM limits. If my understanding is correct, this implies that a
single disk is only part of the story.

Looking at the mongostat output you provided, the flush finished at
11:54:01 (in mongostat, flushes are recorded after they’re done) and the
stall appears at 11:54:11, a full 10 seconds after the flush. The stall
lasted until 11:54:25. To me this seems curious, since if the disk were
struggling to fulfil the flush, it should stall immediately, not 10 seconds
later. This *may* imply that you’re being throttled.

Finally, do you know somebody who could help us with this? We are thinking
of a one-hour remote meeting that of course would be paid.

Since you’re already on Azure, have you considered looking at Atlas Pro
<https://www.mongodb.com/collateral/mongodb-atlas-professional-datasheet>?
Atlas is configured to avoid these situations, and if it happens for some
reason, there are avenues you can use to get help.

Best regards,
Kevin
Francesco Rivola
2018-11-08 17:23:17 UTC
Hi Kevin,

First of all, thank you so much for your quick reply.

We are using an L8 Azure VM. Based on its documentation
(https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-storage#ls-series)
the machine should be able to handle 10K IOPS and 250 MB/s. We will open a
ticket with Microsoft Azure Support to get help confirming whether the VM
is throttling our disk operations.

BTW: the L-series VMs offer a 40K IOPS disk that is temporary (on reboot
all data is lost) (see the top blue note on
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-storage).
They suggest using this disk to store the database data for high
throughput, and then having a replica set in place, so even if the VM
restarts, the high availability of the replica set guarantees no data loss.

- What do you think about this kind of setup? Would the fact that the disk
is volatile be fine when a replica set is in place?
- What could be the impact of re-syncing a secondary from scratch in case
one VM goes down?
- Backups: that disk does not support Azure disk snapshot backups, so I
guess backups would have to be done by us using mongodump or other similar
tools. I have read in the MongoDB documentation that this is not the most
recommended way to back up data. What are your thoughts on this?


Finally, we are already customers of MongoDB Atlas; our app is running on
MongoDB Atlas. But for business reasons we currently have this specific
environment hosted outside Atlas.

Thank you so much again, as always your help in this group is awesome and
really appreciated :).

Best Regards,
Francesco Rivola
'Kevin Adistambha' via mongodb-user
2018-11-21 03:34:23 UTC
Hi Francesco,

Sorry for the delay in responding. Have you had a chance to confirm if this
is a throttling issue?

Regarding your questions:

What do you think about this kind of setup? Would the fact that the disk is
volatile be fine when a replica set is in place?

I’m not sure I can recommend for or against this setup, since it depends on
your use case and your goal. If volatile disk use is one of your primary
concerns, I would suggest you take a look at MongoDB Enterprise Server
<https://www.mongodb.com/lp/download/mongodb-enterprise>, which contains
the in-memory storage engine
<https://docs.mongodb.com/manual/core/inmemory/> designed for this use
case.

What could be the impact of re-syncing a secondary from scratch in case one
VM goes down?

An initial sync will require a workload similar to a collection scan on all
databases on the sync source (typically this is the primary). This could
impact your primary since it would require it to (eventually) load all
documents into its cache, which is a change in workload that could
potentially be disruptive for your queries. During this period, I would
expect your typical queries to be slow since MongoDB needs to juggle
between servicing your queries and the initial sync. For some use cases,
this might result in an unacceptable dip in performance.

Backups: that disk does not support Azure disk snapshot backups, so I guess
backups would have to be done by us using mongodump or other similar tools.
I have read in the MongoDB documentation that this is not the most
recommended way to back up data. What are your thoughts on this?

Backing up with mongodump and mongorestore is a perfectly acceptable method
as mentioned in the MongoDB Backup Methods page
<https://docs.mongodb.com/manual/core/backups/index.html>. However, this
method would require a change in workload similar to an initial sync; that
is, MongoDB would need to load the documents from disk to its cache to be
able to dump them. Thus the tradeoff would be similar to your second
question above. You might want to experiment with the methods outlined on
the page above to see which one best suits your needs, since there is
no correct answer to this question.

Best regards,
Kevin
Francesco Rivola
2018-11-21 14:33:13 UTC
Hi Kevin,

Don't worry for the delay, your answer is welcome at any time :)

We haven't yet confirmed with Azure Support that it is a throttling issue;
we will do that shortly.

Finally, we have set up a replica set using the L8 disk and the
flush-to-disk problem is gone. So I guess the issue was, as you said, that
we were I/O bound on that VM with that disk.

The new replica set is composed of two L8 VMs using the fast volatile disk,
plus the old server with the old disk as a low-priority replica member. The
good news is that the old server is able to keep in sync with the fast
primary. We guess that the I/O bottleneck was coming from the number of
writes plus the number of page faults (reads).

Having a member with that persistent disk allows us to create backups from
disk snapshots and use a snapshot to add new replica members, avoiding a
full re-sync.

I have to say that converting the standalone to a replica set was a very
smooth process with almost no downtime. Kudos to MongoDB :)
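
For anyone else following the thread, the conversion boiled down to roughly
the following (hostnames and the priority value below are illustrative, not
our real ones):

    // Sketch: after restarting each mongod with --replSet <name>, from the
    // mongo shell:
    rs.initiate()                             // on the first fast-disk (L8) member
    rs.add("fast-2.example.net:27017")        // second L8 member, volatile disk
    rs.add({ host: "old-1.example.net:27017", // old server with the persistent disk;
             priority: 0.5 })                 // low priority so it rarely becomes primary

    // Before taking an Azure disk snapshot of the persistent-disk member,
    // flushing and locking writes keeps the snapshot consistent:
    db.fsyncLock()
    // ... take the snapshot ...
    db.fsyncUnlock()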

Thank you for your help, really appreciated.

Best Regards,
Francesco
Robert Cochran
2018-11-21 15:05:26 UTC
Hi!

I have no special expertise in this area -- in fact I'm learning from both
of you. I do want to add my vote for Kevin's advice that you should back up
all your data and test any changes before doing them in a production
environment. I am a Tier 1 developer on IBM mainframes, and the
administrators of one application that all the developers use -- it is not
MongoDB -- didn't perform any backups of the data for that application.
Then someone made some untested changes and put them in production. There
was loss of data that could not be recovered. It impacts every developer in
our enterprise. While I'm not discussing a MongoDB installation, this does
underline the need for good data backups.

I hope you take the advice to back up very seriously and do that before
making even tiny changes to your MongoDB infrastructure. It is important,
and backups are worth the time and money invested.

Thanks

Bob
Francesco Rivola
2018-11-21 20:30:29 UTC
Hi Robert,

Thank you for your advice.

I am with you and Kevin 100%. Indeed, backups are very important before
performing any change or maintenance operation on the database
infrastructure (no matter which database you are working on).

In fact, all the changes noted in this thread were first tested in our dev
and staging environments, and before applying them in production we created
a backup of our data (if you are on Azure I recommend
https://docs.microsoft.com/en-us/azure/backup/backup-introduction-to-azure-backup).


Thank you so much.

Best Regards,
Francesco Rivola