Discussion:
Force replica set reconfig from pymongo?
David Paulsen
2015-03-10 17:49:16 UTC
New to MongoDB, investigating a 2-node (no arbiters) replica set
reconfiguration when one node goes down (panic, power outage, etc.). I'm
starting with MongoDB 2.6.8 and may consider 3.0 soon.

From the mongo client, one can forcefully remove a replica set member
according to the procedure described in
http://docs.mongodb.org/manual/tutorial/remove-replica-set-member/

For example, say the secondary crashes, panics, whatever, one can do this:
rs100:PRIMARY>
2015-03-10T03:29:17.271-0700 DBClientCursor::init call() failed
2015-03-10T03:29:17.273-0700 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2015-03-10T03:29:17.274-0700 reconnect 127.0.0.1:27017 (127.0.0.1) ok
rs100:SECONDARY> cfg = rs.conf()
{
    "_id" : "rs100",
    "version" : 1039759,
    "members" : [
        {
            "_id" : 13,
            "host" : "sasp-dev-100:27017"
        },
        {
            "_id" : 14,
            "host" : "sasp-dev-101:27017"
        }
    ]
}
rs100:SECONDARY> cfg.members = [cfg.members[0]]
[ { "_id" : 13, "host" : "sasp-dev-100:27017" } ]
rs100:SECONDARY> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }
rs100:SECONDARY> rs.conf()
{
    "_id" : "rs100",
    "version" : 1118210,
    "members" : [
        {
            "_id" : 13,
            "host" : "sasp-dev-100:27017"
        }
    ]
}
rs100:PRIMARY>

Later on, when the secondary is repaired and back online, I can go ahead
and do rs.add("sasp-dev-101:27017") and we're back to normal operating
state.

Picked up a hint about doing this from pymongo in the post
https://groups.google.com/forum/#!topic/mongodb-user/BpPc9nlS6nY

And I hacked up a little test script:
#!/usr/local/bin/python3.4
import sys
import pymongo
import pprint as pp

if len(sys.argv) < 2:
    print('Specify replica set host to remove: {0} "host:port"'.format(sys.argv[0]))
    sys.exit(1)

client = pymongo.MongoClient()

cfgDict = client.local.system.replset.find_one()

pp.pprint(cfgDict)

ndx = 0
for mem in cfgDict['members']:
    if mem.get('host') == sys.argv[1]:
        print("FOUND MATCH ... RECONFIGURING!")
        del cfgDict['members'][ndx]
        cfgDict['version'] = cfgDict['version'] + 1
        print("New Replica Set Config:")
        pp.pprint(cfgDict)
        input("HIT ENTER to try it :) ")
        try:
            client.admin.command({'replSetReconfig': cfgDict}, {'force': True})
        except pymongo.errors.ConnectionFailure:
            pass
        break
    ndx = ndx + 1
print("Done...")

What seems to be lacking is the effect of "{force: true}" in the
replSetReconfig command issued to the admin db. This little script works
fine as long as the secondary is up, but if the secondary is down and the
primary has degraded into a secondary state, it raises an
OperationFailure. Output from the script:
New Replica Set Config:
{'_id': 'rs100',
 'members': [{'_id': 13, 'host': 'sasp-dev-100:27017'}],
 'version': 1118214}
HIT ENTER to try it :)
Traceback (most recent call last):
  File "./rm2.py", line 28, in <module>
    client.admin.command({'replSetReconfig': cfgDict}, {'force': True})
  File "/usr/local/lib/python3.4/site-packages/pymongo/database.py", line 439, in command
    uuid_subtype, compile_re, **kwargs)[0]
  File "/usr/local/lib/python3.4/site-packages/pymongo/database.py", line 345, in _command
    msg, allowable_errors)
  File "/usr/local/lib/python3.4/site-packages/pymongo/helpers.py", line 182, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: command {'replSetReconfig': {'version': 1118214, 'members': [{'host': 'sasp-dev-100:27017', '_id': 13}], '_id': 'rs100'}} on namespace admin.$cmd failed: replSetReconfig command must be sent to the current replica set primary.

My gut this morning says that on line 28, in the client.admin.command(...)
I'm not getting the same 'force:true' effect that I can get in the mongo
shell, that I'm not passing it through properly or that the pymongo command
method won't do what I'm wanting. Going to pdb and step through this, but
any insight is welcome!
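
The config edit in the script above can be separated from the network call, which makes it easy to check on its own. A minimal sketch, assuming the config is a plain dict shaped like the rs.conf() output earlier in this post; remove_member is a hypothetical helper name, not part of pymongo:

```python
import copy


def remove_member(cfg, host):
    """Return a copy of a replica set config without the member at `host`.

    The version is bumped by one, as in the script above. Raises ValueError
    if no member matches, so an unchanged config is never returned.
    """
    new_cfg = copy.deepcopy(cfg)
    remaining = [m for m in new_cfg["members"] if m.get("host") != host]
    if len(remaining) == len(new_cfg["members"]):
        raise ValueError("no member with host {0!r}".format(host))
    new_cfg["members"] = remaining
    new_cfg["version"] = new_cfg["version"] + 1
    return new_cfg


# Sample config copied from the shell session in this post.
cfg = {
    "_id": "rs100",
    "version": 1039759,
    "members": [
        {"_id": 13, "host": "sasp-dev-100:27017"},
        {"_id": 14, "host": "sasp-dev-101:27017"},
    ],
}
new_cfg = remove_member(cfg, "sasp-dev-101:27017")
print(new_cfg["members"])
print(new_cfg["version"])
```

Working on a copy leaves the original dict untouched, so nothing is lost if the reconfig is never sent.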
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/545b3cf9-ece1-4e91-99d8-22d9c4437dc3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Will Berkeley
2015-03-10 18:10:03 UTC
Wait, why are you ejecting a replica set member when it goes down? You
should allow it to come back up (or intervene to bring it back up) and wait
for it to reconnect to the set and recover.

-Will
David Paulsen
2015-03-10 18:22:15 UTC
Yes, I may be off in the weeds here. What I *really* want is for the
(still living) primary to be able to perform updates (inserts, whatever)
while the secondary is recovering, and I'm assuming an outage could be
anywhere from minutes to hours to days, understanding that we're vulnerable
as long as we're in this degraded state. Forcefully evicting the
non-functioning member seemed to be a way to get that behavior. (Again,
I'm living with a limit of a 2-node system ... if I had a 3rd available,
arbiters could, I realize, make life wonderful.)
Will Berkeley
2015-03-10 18:39:04 UTC
Ah, I misunderstood what you meant when you said "2-node (no arbiters)",
somehow. You really want to run with at least three nodes - otherwise, you
don't benefit from the automated failover. Can you colocate an arbiter with
a data-bearing node? Obviously, there are still problems if the machine
holding both processes goes down, but it's better than a 2 member replica
set. If you can't have enough nodes for a fault-tolerant system, why are
you running a replica set at all? To try to keep a copy of the data?

-Will
David Paulsen
2015-03-10 18:47:12 UTC
I'd tried out adding arbiters on each node, but saw complaints in the log
to the effect, "I can't elect myself..." At any rate, reading through the
excellent docstrings in pymongo (pymongo/database.py method command) I
sorted out the ability to pass force=True as a kwarg and get the wanted
effect
client.admin.command({'replSetReconfig': cfgDict}, force=True)

LOTS of testing to see if this will play out.
Will Berkeley
2015-03-10 19:06:07 UTC
You would just add 1 arbiter. You want an odd number of members/votes in a
replica set.

-Will
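
The arithmetic behind the odd-number advice can be sketched as follows, assuming a primary can only be elected with a strict majority of the voting members; votes_needed and tolerated_failures are illustrative names, not MongoDB or pymongo functions:

```python
def votes_needed(voting_members):
    """Smallest number of votes that is a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many voting members can be down while a majority remains."""
    return voting_members - votes_needed(voting_members)


# Columns: voting members, votes needed, failures tolerated.
for n in (2, 3, 4, 5):
    print(n, votes_needed(n), tolerated_failures(n))
```

With 2 voting members a majority is 2, so losing either node leaves no electable primary. A third vote (for example, a single arbiter) raises the tolerated failures to 1, and a fourth adds nothing beyond that.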
David Paulsen
2015-03-11 21:32:35 UTC
In practice, my little snippet here proves to be unreliable.
client.admin.command({'replSetReconfig': cfgDict}, force=True)

Sometimes it works, other times I'll end up getting stuff like:
'errmsg': 'no such cmd: force', 'code': 59

Something to do with the way pymongo is constructing a SON object...

If the command that reaches def _command in database.py is:
SON([('replSetReconfig', {'version': 2668533, 'members': [{'host': 'sasp-dev-100:27017', '_id': 15}], '_id': 'rs100'}), ('force', True)])
then it works.

However sometimes (now that I want to post about it, I can't reproduce it!)
I've seen errors such as:
{'ok': 0.0, 'bad cmd': {'force': True, 'replSetReconfig': {'_id': 'rs100',
'members': [{'_id': 15, 'host': 'sasp-dev-100:27017'}], 'version':
2668533}}, 'errmsg': 'no such cmd: force', 'code': 59}
... which leads me to believe that the lower levels of pymongo are seeing
the first dict key as "force", an unknown command, as opposed to seeing the
first dict key as "replSetReconfig".
Bernie Hackett
2015-05-27 19:48:42 UTC
The reason you are having this problem is that you are passing a python
dict to command. The dict type does not preserve order. The API of
Database.command is meant to work around this problem. This should be how
you call it:

client.admin.command('replSetReconfig', cfgDict, force=True)

The driver will create a SON under the covers. bson.son.SON is very similar
to collections.OrderedDict, but works back to python 2.4.
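
The ordering point can be illustrated without a server. A rough sketch, assuming the server treats the first key of the command document as the command name (which is what the "no such cmd: force" error earlier in this thread suggests); build_command is a hypothetical stand-in for what the driver does internally, and collections.OrderedDict stands in for bson.son.SON so the sketch runs without pymongo installed:

```python
from collections import OrderedDict


def build_command(name, value=1, **kwargs):
    """Build a command document with the command name as the first key."""
    cmd = OrderedDict([(name, value)])
    # Extra options such as force=True are appended after the command name.
    cmd.update(kwargs)
    return cmd


# Config values copied from the error output earlier in this thread.
cfg = {
    "_id": "rs100",
    "version": 2668533,
    "members": [{"_id": 15, "host": "sasp-dev-100:27017"}],
}
cmd = build_command("replSetReconfig", cfg, force=True)
print(next(iter(cmd)))
```

Because the name is inserted before any options, the first key is always the command name, regardless of how a plain dict would have ordered the keys.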
David Paulsen
2015-05-28 19:11:26 UTC
Thanks Bernie! Looking back at my code, that's exactly what I ended up
doing:
self.client.admin.command('replSetReconfig', cfgDict, force=True)
Though it's still an open issue whether we're going to take this approach
at all.