2018-12-05 18:12:31 UTC
New to Mongo and still learning:
- I work in a department of a huge company as a data scientist.
- I have taken on some architecture/engineering work and am putting a
data pipeline together (I guess because of my 'hacking' skills; I don't
know why I'm the guy to do it ;) )
- I have a data flow going through Mongo. Using Mongo was a necessity
because of the open source software we use to write to the database.
- I am doing all of this in Azure. I spun up a single VM (16 vCPUs, 56 GB
RAM) and attached a single managed disk that supports 5000 IOPS and a
throughput of 250 Mbps.
- Each machine runs IIS, our open source software that processes packets,
and MongoDB; yes, all on a single node.
- These machines (I have two, for different purposes) run at a sustained
CPU usage of about 20% and memory usage of about 50%. The disk queue
length stays below 0.5.
- My data in Mongo seems to take up about 150 GB on disk (based on the size
of the folders on the managed disk). When we write the same data to a data
lake as non-Mongo flat files, it ends up closer to 1 TB.
- I run a 3rd party synchronizer to pull in data. I don't have a good
idea of the throughput coming in, but the current resource usage is the
- We don't need multi-region redundancy beyond backups (none of this is
that mission-critical).
- Currently this setup seems to be meeting our needs, with two caveats:
- 1) We occasionally receive errors from our synchronizer that, at the
time, looked like deadlock errors. Those errors have since stopped, but we
still get one-off errors now and then, and I can't be totally sure they
are unrelated to my naive provisioning.
- 2) There is some concern that more requests will be coming in and we
might have to scale up further.
- I have searched around and watched videos on how to properly provision
this machine, in vain:
- I ran into some videos, but they were at too low or too high a level.
- I tried a few books, and they don't seem to discuss this provisioning
step.
- We don't currently have a strong Mongo consultant in our organization,
and I was referred to this forum.
- MongoDB Atlas and other managed solutions are unfortunately not an
option; what I can install from the Azure Marketplace (or manually install
on bare metal) is my only current option (it's a billing source thing).
- Is it absurd for me to run all this on a single node? Should this be
multi-node? How can I decide? I have read about 'replica sets', and
although the 'replica' part is not essential for us, better disk
throughput may be what we end up needing.
- Is there a good place to look to see if my machine is
under-provisioned or if I should have a different setup?
- Any other tips for tuning this machine?
- Is there a way to get a consultation directly with MongoDB for something
like this?
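On the question of where to look for signs of under-provisioning: as far as I know, the `serverStatus` command reports WiredTiger cache usage, global-lock queues and concurrency tickets, and `mongostat` shows similar counters live. Below is a minimal Python sketch that summarizes one `serverStatus` snapshot, assuming you have already fetched it (e.g. with pymongo's `admin.command("serverStatus")`). The function names and warning thresholds are hypothetical choices of mine, not MongoDB recommendations, and the field names are from 3.x/4.x-era output.

```python
"""Rough provisioning check over a MongoDB serverStatus document (sketch).

ASSUMPTION: `status` is the dict returned by the serverStatus command, for
example pymongo.MongoClient(uri).admin.command("serverStatus"). The field
names match WiredTiger-era (3.x/4.x) output; verify them on your version.
"""


def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache (3.4+): larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


def provisioning_report(status: dict) -> dict:
    """Summarize cache pressure, lock queueing and ticket usage from one snapshot."""
    cache = status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"] / max_bytes
    dirty = cache["tracked dirty bytes in the cache"] / max_bytes

    queue = status["globalLock"]["currentQueue"]
    queued = queue["readers"] + queue["writers"]
    tickets = status["wiredTiger"]["concurrentTransactions"]

    warnings = []
    # ASSUMED rule-of-thumb thresholds, loosely mirroring WiredTiger's default
    # eviction targets (80% of cache in use, 5% dirty); not official limits.
    if used > 0.80:
        warnings.append("cache over 80% full: working set may not fit in RAM")
    if dirty > 0.05:
        warnings.append("dirty data over 5% of cache: disk may lag behind writes")
    # Queued ops in a single snapshot can be normal; worry if it persists.
    if queued > 0:
        warnings.append("operations queued on the global lock")
    if min(tickets["read"]["available"], tickets["write"]["available"]) == 0:
        warnings.append("no read or write tickets left: concurrency saturated")

    return {
        "cache_used_ratio": round(used, 3),
        "cache_dirty_ratio": round(dirty, 3),
        "queued_ops": queued,
        "warnings": warnings,
    }
```

The default-cache helper also gives a sanity check on the memory figure: for a 56 GB VM, 0.5 x (56 - 1) = 27.5 GB, about half of RAM, so the ~50% memory usage may largely be MongoDB's cache rather than a sign of pressure. It also means the cache holds only a fraction of the ~150 GB on disk, so whether that is enough depends on how much of the data is actively read. Sampling repeatedly over a busy period says more than one snapshot.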
Basically, I'm fine if you just point me to particular resources and I'll
read them; I just haven't found them after about 5 hours of Google
searching and video watching. Yes, the Mongo docs talk about sharding and
vertical/horizontal scaling, but the main problem is that I don't know how
to tell whether that's relevant to my case. (Maybe I have room to grow, or
maybe my setup is silly. I don't have the experience to know.)
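On whether vertical or horizontal scaling is relevant, one rough way to decide is to measure how close each resource is to its provisioned limit and estimate how much load growth fits before the tightest one saturates. As far as I know, every member of a replica set applies every write, so a replica set adds redundancy and can spread reads but does not add write or disk throughput; spreading writes across machines is what sharding does. The sketch below is a hypothetical helper, not a MongoDB tool; the 70% utilization cap and the disk IOPS figure are assumptions.

```python
"""Back-of-envelope scaling headroom (hypothetical helper, not a MongoDB tool).

Idea: for each resource, compare sustained usage with its provisioned
ceiling and ask how much load growth fits before usage reaches an
ASSUMED comfortable cap (70% by default). The tightest resource decides.
"""
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Resource:
    name: str
    observed: float      # sustained usage you measured (IOPS, MB/s, CPU %, ...)
    ceiling: float       # provisioned limit, in the same unit as `observed`
    target: float = 0.7  # ASSUMED utilization cap; pick your own

    def headroom(self) -> float:
        """Load growth factor before usage reaches target * ceiling."""
        if self.observed <= 0:
            return math.inf
        return (self.target * self.ceiling) / self.observed


def tightest_resource(resources: Sequence[Resource]) -> Tuple[str, float]:
    """Return the resource that saturates first and its growth factor."""
    worst = min(resources, key=lambda r: r.headroom())
    return worst.name, worst.headroom()


if __name__ == "__main__":
    # The CPU figure is from the post; the disk IOPS figure is HYPOTHETICAL,
    # replace it with a measured value (e.g. from Azure disk metrics).
    name, growth = tightest_resource([
        Resource("cpu_percent", observed=20, ceiling=100),
        Resource("disk_iops", observed=2500, ceiling=5000),
    ])
    print(f"{name} saturates first, after ~{growth:.1f}x more load")
```

With the post's 20% CPU and a 70% cap, CPU alone allows roughly 3.5x more load. Memory is left out on purpose, because the WiredTiger cache grows to its configured size regardless of load. One rule-of-thumb reading: if projected growth stays below the smallest headroom, the single node is fine; if not, scale up the VM or disk first, and consider sharding only when the largest node you can provision still would not cover it.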
Thanks for any help you can give!
You received this message because you are subscribed to the Google Groups "mongodb-user"
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
To unsubscribe from this group and stop receiving emails from it, send an email to email@example.com.
To post to this group, send email to firstname.lastname@example.org.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/7af16cc2-f6cf-46cc-8042-aa71116a7b58%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.