Discussion:
MongoS server not starting also not balancing Server's sharding metadata manager failed asking for instance is manually reset
Virendra Agarwal
2016-05-02 12:51:56 UTC
My mongos servers are not starting; they are logging this error:

SHARDING [Balancer] caught exception while doing balance: Server's
sharding metadata manager failed to initialize and will remain in this
state until the instance is manually reset :: caused by :: HostNotFound:
unable to resolve DNS for host confserv_1.indiatimes.com

2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] about to log
metadata event into actionlog: { _id:
"iBeatDB2255-2016-05-02T17:57:06.611+0530-5727479aa1051c5fb04fcc49",
server: "mongoS1", clientAddr: "", time: new Date(1462192026611), what:
"balancer.round", ns: "", details: { executionTimeMillis: 35, errorOccured:
true, errmsg: "Server's sharding metadata manager failed to initialize and
will remain in this state until the instance is manually reset :: caused by
:: HostNotFoun..." } }

When I connect to the config server using its hostname, it works fine.
I tried to restart the mongos server, but it does not come up.
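To check name resolution from the mongos host itself, rather than from wherever the manual test was run, a minimal Python sketch like the following resolves each config server name through the system resolver via `socket.getaddrinfo`. The helper name `resolve_hosts` is hypothetical, the default port 27017 is taken from the `--configdb` setting shown later in this thread, and it assumes mongos uses the same system resolver, which is likely but not confirmed in this thread.

```python
import socket


def resolve_hosts(hosts, port=27017):
    """Resolve each hostname with the system resolver (getaddrinfo).

    Returns a dict mapping each hostname to a sorted list of IP address
    strings, or to a "FAILED: ..." message if resolution raised an error.
    """
    results = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            # info[4] is the sockaddr tuple; its first element is the IP string.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = f"FAILED: {exc}"
    return results


# Example (placeholder names from this thread; run it on the mongos machine):
# print(resolve_hosts(["confserv_1.xyz.com", "confserv_2.xyz.com", "confserv_3.xyz.com"]))
```

Running it several times in a row on each mongos host would also show whether failures come and go.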

Please help me.

Thanks
Viren
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: https://docs.mongodb.org/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/029a0c8b-e7e2-4b6b-a7d2-b259a3db4e92%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Virendra Agarwal
2016-05-02 15:08:19 UTC
My mongos servers are not starting; they are logging this error:

SHARDING [Balancer] caught exception while doing balance: Server's
sharding metadata manager failed to initialize and will remain in this
state until the instance is manually reset :: caused by :: HostNotFound:
unable to resolve DNS for host confserv_1.xyz.com

2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] about to log
metadata event into actionlog: { _id:
"DB2255-2016-05-02T17:57:06.611+0530-5727479aa1051c5fb04fcc49", server:
"mongoS1", clientAddr: "", time: new Date(1462192026611), what:
"balancer.round", ns: "", details: { executionTimeMillis: 35, errorOccured:
true, errmsg: "Server's sharding metadata manager failed to initialize and
will remain in this state until the instance is manually reset :: caused by
:: HostNotFoun..." } }

When I connect to the config server using its hostname, it works fine.
I tried to restart the mongos server, but it does not come up.

Please help me.

Thanks
Viren
Virendra Agarwal
2016-05-03 05:17:30 UTC
I checked the MongoDB source code and found this error in
https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/sharding_state.cpp


    // TODO: remove after v3.4.
    // This is for backwards compatibility with old style initialization through metadata
    // commands/setShardVersion. As well as all assignments to _initializationStatus and
    // _setInitializationState_inlock in this method.
    if (_getInitializationState() == InitializationState::kInitializing) {
        auto waitStatus = _waitForInitialization_inlock(deadline, lk);
        if (!waitStatus.isOK()) {
            return waitStatus;
        }
    }

    if (_getInitializationState() == InitializationState::kError) {
        return {ErrorCodes::ManualInterventionRequired,
                str::stream() << "Server's sharding metadata manager failed to initialize and will "
                                 "remain in this state until the instance is manually reset"
                              << causedBy(_initializationStatus)};
    }

But it does not say what manual intervention is required.
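Judging only from the excerpt above (taken from the master branch at the time, so not necessarily identical to 3.2.3), once initialization fails the state stays `kError` and every later call returns `ManualInterventionRequired`; nothing in the excerpt clears it, so the "manual reset" plausibly means restarting the affected process after the underlying DNS problem is fixed. The file lives under `src/mongo/db/s/`, which holds shard-side (mongod) code, so the latched state may well sit on a shard mongod rather than on mongos. That would fit restarting mongos alone not helping while a full cluster restart did, but this is an inference, not something the thread confirms. The toy Python model below only illustrates the latching behaviour; `ShardingStateModel`, `initialize` and `check` are hypothetical names, not MongoDB's real class.

```python
from enum import Enum


class InitState(Enum):
    NEW = "new"
    INITIALIZING = "initializing"
    INITIALIZED = "initialized"
    ERROR = "error"


class ManualInterventionRequired(Exception):
    """Stands in for ErrorCodes::ManualInterventionRequired."""


class ShardingStateModel:
    """Toy model of a sticky initialization error (not MongoDB's real code)."""

    def __init__(self):
        self.state = InitState.NEW
        self.init_status = None

    def initialize(self, connect_to_config_servers):
        """Run the one-time initialization; later calls do nothing."""
        if self.state is not InitState.NEW:
            return
        self.state = InitState.INITIALIZING
        try:
            connect_to_config_servers()
            self.state = InitState.INITIALIZED
        except OSError as exc:  # e.g. a DNS lookup failure
            self.init_status = str(exc)
            self.state = InitState.ERROR  # latched: no code path in this model clears it

    def check(self):
        """Mirror of the excerpt: an ERROR state is reported on every call."""
        if self.state is InitState.ERROR:
            raise ManualInterventionRequired(
                "Server's sharding metadata manager failed to initialize and will "
                "remain in this state until the instance is manually reset"
                " :: caused by :: " + self.init_status
            )
        return "ok"
```

In this model, a failed first attempt followed by a working DNS lookup still leaves `check()` raising; only a fresh `ShardingStateModel()`, the analogue of a process restart, recovers.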

Thanks
Viren
Kevin Adistambha
2016-05-09 03:41:24 UTC
Hi Viren,

SHARDING [Balancer] caught exception while doing balance: Server’s sharding
metadata manager failed to initialize and will remain in this state until
the instance is manually reset :: caused by :: HostNotFound: unable to
resolve DNS for host confserv_1.xyz.com

I believe the main issue is the inability of the mongos process to connect
to the config server confserv_1.xyz.com due to DNS issues. Is this a
constant issue, or is it intermittent?

When I connect to the config server using its hostname, it works fine.

Did you try to connect to confserv_1.xyz.com from the machine that is
hosting the mongos process? Also, how did you determine that the connection
between the two machines is fine (e.g. using ping, connecting with the
mongo shell, etc.)?
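Ping only shows that ICMP gets through; it does not show that the mongod port is reachable by name from that host. A minimal Python sketch that resolves the name and opens a TCP connection, roughly as a client would, assuming the default port 27017 used in this deployment; `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host, port=27017, timeout=3.0):
    """Try to resolve `host` and open a TCP connection to host:port.

    Returns (True, None) on success, or (False, reason) on failure.
    """
    try:
        # create_connection() performs the DNS lookup and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers socket.gaierror, refused and timed-out connections
        return False, f"{type(exc).__name__}: {exc}"


# Example, run on the mongos host:
# print(can_connect("confserv_1.xyz.com", 27017))
```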

I tried to restart the mongos server, but it does not come up.

Are there any error messages in the mongos log that show why it
cannot start?

If you are still having issues, could you please provide:

- your MongoDB version
- your deployment topology (i.e. how many config servers, how many mongos,
whether all mongos instances have this issue, etc.)
- the output of db.serverCmdLineOpts() from the mongos processes
- the output of sh.status()
- any error messages in the logs (mongod and mongos)

Best regards,
Kevin
Virendra Agarwal
2016-05-09 04:28:20 UTC
Hi Kevin,
Thanks for responding to this thread; I really appreciate your kind response.
Post by Kevin Adistambha
I believe the main issue is the inability of the mongos process to
connect to the config server confserv_1.xyz.com due to DNS issues. Is
this a constant issue, or is it intermittent?
This issue is not consistent; sometimes we see it on mongos and sometimes on
a replica set.
Post by Kevin Adistambha
Did you try to connect to confserv_1.xyz.com from the machine that is
hosting the mongos process? Also, how did you determine that the
connection between the two machines is fine (e.g. using ping, connecting
with the mongo shell, etc.)?
Yes. I tried ping, and then I connected to confserv_1.xyz.com from the same
machine that hosts the mongos server.
Post by Kevin Adistambha
Are there any error messages in the mongos log that show why it
cannot start?
The same error was there. After confirming the connection was fine, I tried
to restart the server, but it gave me the same host-not-resolved error.
We also checked the dbhash of all config servers, and all were fine.
We restarted the whole cluster and then the error was gone. But now we are
occasionally seeing the mongo service go down on the config servers.

- your MongoDB version
  3.2.3
- your deployment topology (i.e. how many config servers, how many mongos,
whether all mongos instances have this issue, etc.)
  3 config servers and 4 mongos; yes, all servers showed the same issue.
- the output of db.serverCmdLineOpts() from the mongos processes
    mongos> db.serverCmdLineOpts();
    {
        "argv" : [
            "/opt/mongodb/bin/mongos",
            "--config",
            "/opt/mongodb.conf",
            "--configdb",
            "confserv_1.xyz.com:27017,confserv_2.xyz.com:27017,confserv_3.xyz.com:27017",
            "--maxConns=20000",
            "--logpath=/opt/mongolog/log/mongodb.log",
            "--logappend"
        ],
        "parsed" : {
            "config" : "/opt/mongodb.conf",
            "net" : {
                "http" : {
                    "enabled" : true
                },
                "maxIncomingConnections" : 20000
            },
            "sharding" : {
                "configDB" : "confserv_1.xyz.com:27017,confserv_2.xyz.com:27017,confserv_3.xyz.com:27017"
            },
            "systemLog" : {
                "destination" : "file",
                "logAppend" : true,
                "path" : "/opt/mongolog/log/mongodb.log"
            }
        },
        "ok" : 1
    }
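In MongoDB 3.2, `sharding.configDB` accepts two forms: a plain comma-separated list of three mirrored config servers (the form used above), or `<replSetName>/host:port,...` when the config servers run as a replica set. A small Python sketch that splits either form into host/port pairs, for feeding into DNS or connectivity checks; `parse_configdb` is a hypothetical name, and the 27017 fallback for entries without a port is an assumption.

```python
def parse_configdb(configdb, default_port=27017):
    """Split a configDB string into (replica_set_name, [(host, port), ...]).

    replica_set_name is None for the legacy mirrored (comma-separated) form.
    default_port is an assumption used only for entries without ":port".
    """
    rs_name = None
    if "/" in configdb:
        rs_name, configdb = configdb.split("/", 1)
    members = []
    for entry in configdb.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:  # no ":" at all, so no explicit port
            host, port = entry, default_port
        members.append((host, int(port)))
    return rs_name, members


# Example with the value from the output above:
# parse_configdb("confserv_1.xyz.com:27017,confserv_2.xyz.com:27017,confserv_3.xyz.com:27017")
# -> (None, [("confserv_1.xyz.com", 27017), ("confserv_2.xyz.com", 27017), ("confserv_3.xyz.com", 27017)])
```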
- the output of sh.status()
  Attached output.
- any error messages in the logs (mongod and mongos)
  - [ReplicationExecutor] Error in heartbeat request to
    secondary-rep2:27017; ExceededTimeLimit: Couldn't get a connection within
    the time limit
  - Server's sharding metadata manager failed to initialize and will remain in
    this state until the instance is manually reset :: caused by ::
    HostNotFound: unable to resolve DNS for host confserv_1.xyz.com
Virendra Agarwal
2016-05-09 04:44:15 UTC
One more thing: we saw this issue again on one of our shard replica sets.
The network connection between the primary and a secondary was down for some
time, and we restored it. But the primary could not reconnect to the restored
secondary.

rs.status() kept showing the secondary as not reachable until I manually
stepped the primary down, restarted its mongod process, and then made it
primary again.
Kevin Adistambha
2016-05-09 06:32:38 UTC
Hi Viren,

This issue is not consistent; sometimes we see it on mongos and sometimes on
a replica set.

3 config servers and 4 mongos; yes, all servers showed the same issue.

The network connection between the primary and a secondary was down for some
time, and we restored it. But the primary could not reconnect to the restored
secondary.

We restarted the whole cluster and then the error was gone

From what I have seen so far, I believe the underlying issue is network
connectivity/configuration within your deployment. My understanding so far
is:

- all servers are experiencing the same issue intermittently
- the issue seems to be spread across the whole cluster (i.e. sometimes
on mongos and other times on individual mongod)
- network connectivity issues within a replica set
- error messages in the form of “HostNotFound: unable to resolve DNS for
host” or “Couldn’t get a connection within the time limit”
- restarting the cluster seems to solve the problem for a while (likely
due to the refresh of the DNS cache)

All these signs seem to indicate that the issue is in your network setup
(e.g. DNS setup, network hardware issues, etc.) and not in your MongoDB
deployment. The output of sh.status() doesn’t seem to show any notable
issue, and:

We also checked the dbhash of all config servers, and all were fine.

seems to indicate that the cluster is operating normally, the config
servers are consistent with each other (which is vital to the operation of
a sharded cluster), and cluster balancing seems to operate normally as well.

Is there a pattern to this issue? For example, did you observe these
network-related errors happening more often at particular times, under
particular load, etc.?
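To look for a time pattern, one option is to count the network-related errors per hour in the mongos and mongod logs. A minimal Python sketch, assuming each log line starts with an ISO-8601 timestamp as in the excerpts quoted in this thread (`2016-05-02T17:57:06.612+0530 I SHARDING ...`); the function name and the two search strings are choices made for this example, not anything MongoDB defines.

```python
import re
from collections import Counter

# Captures "YYYY-MM-DDTHH" from a line such as
# "2016-05-02T17:57:06.612+0530 I SHARDING [Balancer] ..."
TIMESTAMP_HOUR = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+\S*\s")

# Error strings seen in this thread; extend as needed.
PATTERNS = ("HostNotFound", "ExceededTimeLimit")


def errors_per_hour(lines, patterns=PATTERNS):
    """Count log lines containing any of `patterns`, bucketed by hour."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_HOUR.match(line)
        if match and any(p in line for p in patterns):
            counts[match.group(1)] += 1
    return counts


# Example, using the log path from the serverCmdLineOpts output in this thread:
# with open("/opt/mongolog/log/mongodb.log") as log:
#     for hour, n in sorted(errors_per_hour(log).items()):
#         print(hour, n)
```

Running it over logs from every member and comparing the busiest hours across machines would show whether the errors cluster in time.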

On another note, I would recommend upgrading to the latest release in the 3.2
series, which is currently 3.2.6
<https://docs.mongodb.com/manual/release-notes/3.2/#apr-28-2016>, for
bugfixes and improvements.

Best regards,
Kevin
Virendra Agarwal
2016-05-09 06:42:52 UTC
Thanks Kevin.

Yes, the upgrade is planned; we will most probably do it today, since it is a
drop-in upgrade.

As far as I can tell, there is no fixed pattern to the network issues.
My question is: after a network issue occurred, the servers stopped
communicating, and then everything came back online, why did they still show
communication errors?
For example, yesterday evening the secondary of one shard was unreachable and
there were heartbeat errors.
But once the secondary was up and reachable again, rs.status() showed
different results on the primary and the secondary.
The primary still showed the secondary as not available, while rs.status()
on the secondary looked fine.
We tried connecting to the secondary's port from the primary, and it worked.
The issue was fixed only when we stepped down the primary, restarted it, and
then made it primary again (surprisingly).

Similarly, the last time, the issue went away after we restarted the whole
cluster.
I have attached the rs.status() output from both servers.
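Since the primary and the secondary reported different views, one way to pinpoint the disagreement is to compare the `members[].health` values from both `rs.status()` documents. A minimal Python sketch that works on the documents as plain dicts (for example as returned by PyMongo's `client.admin.command("replSetGetStatus")`); the helper names and the `primary-rep2` host name are hypothetical, and the documents are trimmed to the fields used.

```python
def health_by_member(rs_status):
    """Map member name -> health (1 = reachable, 0 = not) from one rs.status() doc."""
    return {m["name"]: m.get("health") for m in rs_status.get("members", [])}


def disagreements(view_a, view_b):
    """Member names whose health differs between two nodes' rs.status() views."""
    a, b = health_by_member(view_a), health_by_member(view_b)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))


# Example with trimmed, hypothetical documents:
primary_view = {"members": [
    {"name": "primary-rep2:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "secondary-rep2:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
]}
secondary_view = {"members": [
    {"name": "primary-rep2:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "secondary-rep2:27017", "health": 1, "stateStr": "SECONDARY"},
]}
print(disagreements(primary_view, secondary_view))  # ['secondary-rep2:27017']
```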

Thanks
Virendra Agarwal
'Matthieu Rigal' via mongodb-user
2016-06-28 16:34:16 UTC
Hi Virendra,

I just ran into this problem while trying to harden the security
configuration. As in your case, I was able to connect to the config servers
from all mongos instances.

In my case I was also testing a setup with replica set members in
different datacenters, and I had the problem only after stepping down some
primaries.

The instability you may have in your network is probably not responsible
for these routing issues, though it may well be causing the stepdowns. More
likely, your mongo machines have different name-resolution configurations,
and that is the root of the problem.

In the end I noticed that, contrary to what the error message suggests, the
issue was happening on some primaries in one datacenter, which were not able
to route back to the config server. After fixing the routing problem
(eventually in /etc/hosts), no more problems occurred on the mongo side.
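Because the fix turned out to involve `/etc/hosts`, it can help to compare how each machine's hosts file maps the config server names. A minimal Python sketch that parses hosts-file text (collected from each machine however is convenient) and reports the addresses per machine. It only covers the hosts file, which on typical Linux systems is consulted only if `/etc/nsswitch.conf` lists `files` for `hosts`; the function names and machine names are hypothetical.

```python
def hosts_file_addresses(hosts_text, name):
    """Return the addresses an /etc/hosts-style text maps `name` to, in file order."""
    wanted = name.lower()
    addresses = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        address, *names = line.split()
        if wanted in (n.lower() for n in names):
            addresses.append(address)
    return addresses


def compare_machines(hosts_files, name):
    """Map machine -> addresses for `name`; differing values point to inconsistent setups."""
    return {machine: hosts_file_addresses(text, name) for machine, text in hosts_files.items()}


# Example (hypothetical machine names; read each machine's /etc/hosts into a string first):
# hosts_files = {"mongoS1": text_from_mongoS1, "shard-primary": text_from_primary}
# print(compare_machines(hosts_files, "confserv_1.xyz.com"))
```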

Have fun fixing :)

Best, Matthieu
Virendra Agarwal
2016-07-06 06:24:59 UTC
Thanks Rigal.

Regards
Virendra Agarwal
Post by 'Matthieu Rigal' via mongodb-user
Hi Virendra,
I just ran into this problem while trying to harden the security
configuration. As in your case, I was able to connect to the config servers
from all mongos instances.
I was also testing a setup with replica set members in different
datacenters, and I only hit the problem after stepping down some
primaries.
The instability you may have in your network architecture is actually not
responsible for these routing issues, but probably for the stepdowns. Your
different mongo machines may have different name-resolution configurations,
which is likely the root of the problem.
In the end I noticed that, contrary to what the error message suggests, the
issue was happening on some primaries in one datacenter, which were not able
to route back to the config server. After fixing the routing problem
(eventually via /etc/hosts), no more problems occurred on the mongo side.
Have fun fixing :)
Best, Matthieu
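A quick way to check the routing Matthieu describes is to try a plain TCP connection from each shard member (especially primaries after a stepdown) to every config server. The sketch below assumes Python 3 on the hosts; `CONFIG_SERVERS` is a placeholder list and `can_connect` is a hypothetical helper. Port 27019 is only the conventional config server port, so substitute the host:port pairs your mongos actually uses. A failure from one machine while the same check succeeds from another would point at per-host routing or name resolution rather than at MongoDB.

```python
import socket
from typing import List, Tuple

# Placeholder config server addresses; use the host:port pairs from your
# mongos --configdb setting (27019 is the conventional config server port).
CONFIG_SERVERS: List[Tuple[str, int]] = [("localhost", 27019)]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


if __name__ == "__main__":
    for host, port in CONFIG_SERVERS:
        status = "ok" if can_connect(host, port) else "FAILED"
        print(f"{host}:{port} {status}")
```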
Post by Virendra Agarwal
Thanks Kevin.
Yes, the upgrade is planned; we will most probably do it today, as it is a
drop-in update.
As far as I could find, there is no fixed pattern to the network issue.
My question was: after the network issue occurred and the servers were not
communicating, why was it still showing a communication error once
everything came back online?
For example, yesterday evening we found that the secondary of one shard was
not reachable and there was a heartbeat issue.
But once the secondary was up and reachable, rs.status() gave different
results on the primary and the secondary.
The primary was still showing the secondary as not available, while the
secondary's replica set status looked fine.
We tried connecting to the port from the primary to the secondary and it worked.
The issue was fixed when we stepped down the primary, restarted it, and then
made it primary again (to our surprise).
Similarly, last time the issue went away after we restarted the whole cluster.
I have attached the rs.status() output from both servers here.
Thanks
Virendra Agarwal
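When the primary and the secondary disagree about a member's state, as described above, it can help to diff the two rs.status() views member by member. This sketch assumes both outputs have been exported as plain JSON (for example via pymongo's `client.admin.command("replSetGetStatus")`, which returns the same document as rs.status()); it only compares the `name`, `health` and `stateStr` fields of each entry in `members`, and `member_view`/`diff_views` and the file names are hypothetical.

```python
import json
import sys
from typing import Dict, List, Optional, Tuple

# (health, stateStr) as reported for one member; None if the member is absent.
MemberState = Optional[Tuple[float, str]]


def member_view(status: dict) -> Dict[str, Tuple[float, str]]:
    """Map each member's host:port name to its (health, stateStr) pair."""
    return {
        m["name"]: (m.get("health"), m.get("stateStr"))
        for m in status.get("members", [])
    }


def diff_views(status_a: dict, status_b: dict) -> List[Tuple[str, MemberState, MemberState]]:
    """List members whose health/state differs between two rs.status() outputs."""
    a, b = member_view(status_a), member_view(status_b)
    diffs = []
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs.append((name, a.get(name), b.get(name)))
    return diffs


if __name__ == "__main__":
    # Usage: python diff_rs_status.py primary_status.json secondary_status.json
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, on_a, on_b in diff_views(json.load(fa), json.load(fb)):
            print(f"{name}: first view {on_a}, second view {on_b}")
```

A member reported as `(not reachable/healthy)` in one view and `SECONDARY` in the other reproduces the asymmetric picture described above, which may suggest that only one direction of the connection between those two hosts is failing.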