Discussion:
[mongodb-user] newbie provisioning: should I have a multi-node setup
james m
2018-12-05 18:12:31 UTC
Apologies if any of these questions sound naive. I am only getting into
Mongo and still learning:

BACKGROUND

- I work in a department of a huge company as a data scientist.
- I have taken on the work of doing some architecture/engineering and
putting a data pipeline together (I guess because of my 'hacking' skills;
I don't know why I'm the guy to do it ;) )
- I have a data flow going through Mongo, which was a necessity based on
the open source software we were using to write to the database.
- I am doing this all in Azure. I spun up a single VM (16 vCPUs, 56GB
RAM) and attached a single managed drive which can support 5000 IOPS and a
throughput of 250Mbps.
- These machines all have IIS, our open source software which processes
packets, and our MongoDB -- yes, all on a single node.
- These (I have two for different purposes) end up with a sustained CPU
usage of 20% and memory usage of about 50%. The disk queue length is below 0.5.
- My data on disk in Mongo (based on the size of the folders on the
managed disk) seems to be 150GB; when we write it to a data lake as
non-Mongo flat files, it ends up being closer to a TB.
- I run a 3rd party synchronizer to pull in data. I don't have a good
idea of the throughput coming in, but the current resource usage is as
described above.
- We don't need multi-region redundancy outside of backups (none of this
is that mission-critical).
- Currently this setup seems to be meeting our needs, with two caveats:
  - 1) We do receive errors once in a while with our synchronizer that
  at the time seemed to be deadlock errors. Those errors have since passed,
  but we still get one-off errors once in a while that I can't be totally
  sure are unrelated to my naive provisioning.
  - 2) There is some concern that more requests will be coming in
  and we might have to scale up more.
- I have tried in vain to search around and watch videos on how to
properly provision this machine:
  - I ran into some videos, but they were at too low or too high of a
  level.
  - I tried searching books (although only a couple), and they don't
  seem to discuss this provisioning step.
- We don't currently have a great Mongo consultant in our organization,
and I was referred to this forum.
- Mongo Atlas and other managed solutions are not available to me. What I
can install from the Azure marketplace (or manually install on bare metal)
are my only current options, unfortunately (it's a billing source thing).

Questions:

- Is it absurd for me to run all this on a single node? Should this be
multi-node? How can I decide? I have read about 'replica sets' and
although the 'replica' portion is not essential for us, the ability to have
better disk throughput may be the thing that we will need
- Is there a good place to look to see if my machine is
under-provisioned or if I should have a different setup?
- Any other tips for tuning this machine?
- Is there a way to get a consult directly with mongo for something like
this?

Basically I'm fine if you want to point me to particular resources for me
to read; I just haven't run into those resources after about 5 hours of
Google searching and video watching. Yes, the Mongo docs talk about sharding
and vertical/horizontal scaling, but the biggest thing is I don't know how
to know if that's relevant for my case. (Maybe I have room to grow, or maybe
my setup is silly. I don't have the experience to know.)

Thanks for any help you can give!
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/7af16cc2-f6cf-46cc-8042-aa71116a7b58%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Robert Cochran
2018-12-06 01:19:03 UTC
Hi!

What version of MongoDB are you using?

All my experience (except for some quick exercises in a few MongoDB online
classes) has been with standalone single nodes of the server. That is, I
have not implemented replica sets on separate nodes. It would seem to me
that your resources are much greater than mine, and you really should start
using replica sets on several nodes. Even on commodity hardware in the
office.

Because of my lack of experience and the fact that I'm just someone out on
the Internet, I'll just point to better resources than me: MongoDB staff,
who often answer posts of this nature; and also the online courses you can
and should take from MongoDB University. I think any one of the M101* type
courses would be really helpful to you. And since you already have a
considerable MongoDB database resource at your firm, you really should take
additional training.

I also highly recommend you obtain paid support from the company. I think
it would be well worth it to your company to do that.

Thanks so much

Bob
MH
2018-12-06 01:28:46 UTC
Like Robert says, it sounds like your company has the resources to engage
MongoDB staff directly, so try to convince them to invest in it. Depending
on how your schema is structured, you might benefit from sharding. I've
begun modifying the way my company deploys the database servers for our
software because a replica set is invaluable for saving time on maintenance
and backups. Being able to compact and reclaim disk space on one member of
the replica set at a time, with only a small window of downtime while they
fail over, is very important. Not to mention that the secondaries can be
used for backups, and we can point our reporting software at them so that
the primary is not affected.
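
To make the "one member at a time" ordering concrete, here is a minimal sketch in plain Python. It does not talk to MongoDB at all; it only works out the order of steps. The `Member` class, the host names, and the action labels are made up for illustration. In a real deployment "maintain" would stand for whatever you actually do on that node (for example running the compact command), and "step_down" would correspond to asking the primary to step down (`rs.stepDown()` in the mongo shell) so that a secondary takes over before the old primary is serviced.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str   # hypothetical host name, for illustration only
    state: str  # "PRIMARY" or "SECONDARY"


def rolling_maintenance_plan(members):
    """Return an ordered list of (action, host) steps.

    Secondaries are serviced first, one at a time. The primary goes
    last, and only after a step-down so a secondary can take over.
    """
    primaries = [m for m in members if m.state == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    secondaries = [m for m in members if m.state == "SECONDARY"]
    if not secondaries:
        raise ValueError("need at least one secondary to fail over to")

    plan = [("maintain", m.host) for m in secondaries]
    plan.append(("step_down", primaries[0].host))
    plan.append(("maintain", primaries[0].host))
    return plan


if __name__ == "__main__":
    example = [
        Member("db1", "PRIMARY"),
        Member("db2", "SECONDARY"),
        Member("db3", "SECONDARY"),
    ]
    for action, host in rolling_maintenance_plan(example):
        print(action, host)
```

The point of the ordering is that only the step-down causes a brief failover window; the rest of the work happens on nodes that are not serving as primary at the time. A standalone node has nothing to fail over to, which is why the sketch refuses to plan without at least one secondary.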
james m
2018-12-06 02:34:27 UTC
Thank you both for your replies.

A direct Mongo consult is not possible in the short term, unfortunately.
It has to do with who pays for what in my department vs the company for
this development work. But it is something I will try to push for going
forward.

In the meantime I will look into the MongoDB classes. If you know any
other provisioning-specific resources, let me know.

Thanks for your help,
james