Discussion:
MongoDB replica set across AWS regions, subnets, or data centers
Archanaa Panda
2015-04-28 20:10:47 UTC
Hi,

I seem to have run into a blocking problem. I have a collection (e.g. a
global lookup table) that must be replicated across multiple AWS
regions/subnets/data centers, i.e. all data replicated to every region and
updatable from the application layer in any region. To do that, do I need
to assign public IP addresses, and therefore put all my MongoDB EC2 nodes
on a public network, so that replication can run across regions and so
that, on a write from the application, a primary node in a different
region/subnet can be reached by my application layer?

Thanks and Regards,
Archanaa Panda
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/472d8e52-cea7-45a6-ac19-e27feae3c414%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
s.molinari
2015-04-29 07:42:19 UTC
I believe this won't work, at least not the way you are describing it with
"it can be updated by application layer of any region" or with "a master
node in a different region". With MongoDB replication
<http://docs.mongodb.org/manual/replication/> there is always only a single
primary (the master) in any replica set configuration
<http://docs.mongodb.org/manual/core/replica-set-architectures/>, so the
primary will always live in one (network) region. You can, however, have
replica set nodes (the secondaries) with copies of your data spread all
over the world and read from them, with the caveat that the data may be
stale or not yet up-to-date, which you'll need to account for in your
application.
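To make that single-primary layout concrete, here is a minimal sketch of initiating such a replica set from the mongo shell. The set name and hostnames are invented; the `priority: 0` members in the remote regions keep full copies of the data but can never be elected primary, so writes always land in the primary's region:

```javascript
// Hypothetical layout: primary + one electable secondary in us-east-1,
// read-only copies in eu-west-1 and ap-southeast-1.
rs.initiate({
  _id: "globalLookup",
  members: [
    { _id: 0, host: "mongo-use1-a.example.net:27017", priority: 2 },
    { _id: 1, host: "mongo-use1-b.example.net:27017", priority: 1 },
    // priority 0: these members can never become primary
    { _id: 2, host: "mongo-euw1-a.example.net:27017", priority: 0 },
    { _id: 3, host: "mongo-apse1-a.example.net:27017", priority: 0 }
  ]
});
```

In practice you would also want an odd number of voting members (e.g. a third node or an arbiter in the primary's region) so that elections resolve cleanly.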

A feature you might be able to use is tag aware sharding
<http://docs.mongodb.org/manual/core/tag-aware-sharding/>, where you can
basically have data saved on certain shards (locations) and have your
clients in those same locations. However, I am uncertain whether this would
be an efficient use of tag aware sharding. The idea of sharding is to
distribute data over multiple replica sets in order to increase compute,
storage and I/O / network resources. It isn't really made to offer an
(efficient) world-wide database, because you'll still need centrally
located config servers in one region, which means the latency of
communicating with any "long distance" replica sets is still there (and
that latency is why you want a multi-master system in the first place,
right?). So, if efficiency/performance isn't a major concern, tag aware
sharding could be a possible solution.
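As a rough sketch of what tag aware sharding looks like operationally (the database, collection, shard names, shard key and tag ranges below are all invented for illustration):

```javascript
// Assumes a sharded cluster where shards "shardUS" and "shardEU" already exist.
sh.enableSharding("appdb");
sh.shardCollection("appdb.users", { region: 1, userId: 1 });

// Tag each shard with its location.
sh.addShardTag("shardUS", "US");
sh.addShardTag("shardEU", "EU");

// Pin shard key ranges to the matching location; the balancer then moves
// chunks so "US" documents live on shardUS and "EU" documents on shardEU.
sh.addTagRange("appdb.users",
  { region: "US", userId: MinKey }, { region: "US", userId: MaxKey }, "US");
sh.addTagRange("appdb.users",
  { region: "EU", userId: MinKey }, { region: "EU", userId: MaxKey }, "EU");
```

Note that this pins where data lives; it does not remove the dependency on the centrally located config servers mentioned above.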

Scott
Archanaa Panda
2015-04-29 08:45:29 UTC
Hi, thanks for your reply. Say I am not too bothered about latency across
regions and want the data replicated. I am more concerned about whether I
would have to allocate a static public IP address for each mongod process
on each node - since if the primary fails in one region and a secondary in
another region becomes the primary, applications in every other region
still need to be able to reach the new primary.

In other words, read local, write global, as shown in the diagrams below.
[Diagram 1: one primary with secondaries distributed across regions - image unavailable]

[Diagram 2: writes accepted in every region ("multi-master") - image unavailable]
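The "read local, write global" pattern in the first diagram maps onto driver read preferences rather than any special server configuration; a mongo shell sketch (collection and field names invented):

```javascript
// Route reads to the nearest member - possibly a local, slightly stale
// secondary - while writes always go to the single primary, wherever it is.
db.getMongo().setReadPref("nearest");
db.lookup.find({ key: "currency-rates" });        // may be served by a local secondary
db.lookup.insert({ key: "fx-eur", value: 1.08 }); // always routed to the primary
```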
s.molinari
2015-04-29 09:52:16 UTC
I am no expert, but the first image, with the primary and distributed
secondaries, would be a normal replica set: you would make the replicas
that are not in the same DC as the primary non-voting members
<http://docs.mongodb.org/manual/tutorial/configure-a-non-voting-replica-set-member/>,
meaning they would never be able to become the primary. You would also have
secondaries alongside your primary in the same DC; those would be voting
members, and only they could become primary.

Your second image can only work with the tag aware sharding I mentioned,
since MongoDB doesn't have multi-master replication. (I am not sure it is a
proper solution either.)
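For reference, making the out-of-DC members non-voting is a replica set reconfiguration; a minimal sketch from the mongo shell, assuming (hypothetically) that members[3] and members[4] are the remote secondaries. Non-voting members should also carry priority 0 so they can never stand for election:

```javascript
// Strip votes (and primary eligibility) from the remote members.
cfg = rs.conf();
cfg.members[3].votes = 0;
cfg.members[3].priority = 0;
cfg.members[4].votes = 0;
cfg.members[4].priority = 0;
rs.reconfig(cfg);
```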

Maybe someone smarter can chime in?

Scott
Archanaa Panda
2015-04-29 12:10:35 UTC
Hi,

Just clarifying my question and my understanding further: even if I do
enable tag aware sharding, or keep the replicas in a different DC as
permanently hidden or non-voting members, it is the network access aspect
that bothers me...
1. For diagram 1, if my applications are in Asia or the UK, they would
still need to write to the primary, which is on the US east coast. So the
primary would have to be reachable over the public network (write global).
Even if tunnelling / port forwarding can get the writes to the primary, I
would still want to know what to use as the replica set member names when
forming the replica set.
2. Same for diagram 2: I understand it can be achieved via tag aware
sharding. However, how will I -
a) form a replica set by listing all the members, including those in a
remote DC?
b) let my mongos and config servers, which have to be in each DC, know
about and reach the replica set members of each shard? Best practice is to
run mongos on every application server machine, or at least locally in
each geography.
c) handle a customer who travels from the US to the UK while his primary
shard replica is in the US? He won't be able to write any data - e.g. he
can't change his credentials, or make reservations for places in the UK or
back home in the US, while he is in a different DC.

The only straightforward way I can think of is to give a public EIP
address to each and every node in every shard and replica set, so that
mongos or the application tier can route requests to the appropriate
location - and public EIP addresses for the config servers as well.
If I grow to hundreds or thousands of mongod instances, it doesn't seem
right to need that many EIPs.
The other straightforward way is to put everything - all mongod, mongos
and application servers - in one DC / geography, in one large VPC. But
then I am not really taking advantage of multiple DCs and high
availability - and how would I decide whether requests from mobile users
in Singapore should go to us-east-1a or us-east-1b? Where do I draw the
boundary between a continent / geographical region and a country?

I am not quite sure how large installations like Foursquare, which have
grown beyond the boundaries of a single VPC and a single region, solve
this problem.
Asya Kamsky
2015-04-30 00:32:02 UTC
Any node in your cluster that a client may need to "talk" to must have a
reachable IP address.

What's the issue with having thousands of addresses if you have thousands
of nodes? Aren't they all supposed to have different names/addresses
anyway?

Asya
Archanaa Panda
2015-04-30 06:46:51 UTC
Yes they are, but on AWS public IP addresses change whenever a machine
restarts. I believe that to form a replica set I need static addresses, so
that they don't change on restart and each member can rejoin the cluster
after its machine comes back up.
So the things I will need to try out are:
a) VPC peering and NAT port forwarding - whether they work with multiple
machines on either side, and whether they allow access/integration from
our MMS account for monitoring, backup, etc.
b) If all else fails, what it would cost the business to apply for as many
EIPs as there are nodes and config servers. There is currently a limit of
only 5 per account.
c) Keeping everything in one VPC and region.
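One note on (a)/(b): MongoDB configuration generally uses hostnames rather than raw IPs, so a stable DNS name (for example a Route 53 record, or split-horizon DNS that resolves to the private IP inside the VPC and the public address elsewhere) can stand in for a static IP. A hypothetical reconfiguration from the mongo shell (the hostname is invented):

```javascript
// Replace a member's raw IP with a stable DNS name that survives restarts.
cfg = rs.conf();
cfg.members[2].host = "mongo-use1-a.example.net:27017";
rs.reconfig(cfg);
```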

I wanted to know the best practices followed for large installations of
MongoDB on AWS, but there doesn't seem to be enough information available,
so we will have to find out for ourselves.
Archanaa Panda
2015-04-30 06:58:15 UTC
Also, public static IP addresses on every node would not be best practice
from a security standpoint either...