Discussion:
2.4.1 & 2.4.3 Replica set primary becomes corrupt after a few hours every day. Assertions thrown
Daniel Hodgin
2013-05-01 19:33:42 UTC
I've been running MongoDB in production for about 6 months as a single
instance, on the same machine as our web app. We've reached the point
where it's time to move to a replica set.

I've set up a replica set in a development environment, with 3 instances
on one Windows Server 2008 R2 machine:
1 primary, priority 2, on port 27017
1 secondary, priority 1, on port 27018
1 secondary, priority 0, hidden on port 27019

rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 4,
"members" : [
{
"_id" : 0,
"host" : "192.168.0.10:27017",
"priority" : 2
},
{
"_id" : 1,
"host" : "192.168.0.10:27018"
},
{
"_id" : 2,
"host" : "192.168.0.10:27019",
"priority" : 0,
"hidden" : true
}
]
}
rs0:PRIMARY>
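
For readers who want to reproduce this layout, here is a minimal sketch of how a configuration document like the one above could be assembled. The function name buildReplSetConfig and its parameters are illustrative only, not part of MongoDB, and the sketch assumes exactly three ports. In the mongo shell, an object of this shape is what rs.initiate() accepts; a member without an explicit priority defaults to 1.

```javascript
// Sketch only: buildReplSetConfig is a hypothetical helper, not a MongoDB API.
// It assumes exactly three ports: preferred primary, normal secondary, hidden secondary.
function buildReplSetConfig(setName, host, ports) {
  var members = ports.map(function (port, index) {
    return { _id: index, host: host + ":" + port };
  });
  members[0].priority = 2;   // preferred primary
  members[2].priority = 0;   // never eligible to become primary
  members[2].hidden = true;  // hidden members must have priority 0
  return { _id: setName, members: members };
}

var config = buildReplSetConfig("rs0", "192.168.0.10", [27017, 27018, 27019]);
console.log(JSON.stringify(config, null, 2));
```

The "version" field is omitted on purpose, since the server maintains it across reconfigurations.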

My application is built on Node.js and uses Mongoose to connect to
MongoDB.

I start up the application and run an initialization script to fill the
database with about 200 sample records in 25 collections in total. It takes
about 3 seconds to run.

Nothing intense.

This works when the database on the replica set is freshly created. I
can run our initialization script 10 times in a row with no issues when the
replica set is freshly initialized.

The problem is that a few hours later, when I run the same initialization
script again, I start getting assertions thrown and "bad file number value"
errors.

To fix it I have to shut down the primary, delete its entire database
directory, and then turn it back on and have it sync from the secondary.
Then I can run my initialization script again just fine.

Here is a log of what shows up in my mongod console on the primary when I
run the script when it fails:

Wed May 01 11:08:54.425 [conn15] CMD: drop mydb.courses
Wed May 01 11:08:55.183 [conn18] getFile(): n=-2
Wed May 01 11:08:55.185 [conn18] Assertion: 10295:getFile(): bad file
number value (corrupt db?): run repair
Wed May 01 11:08:55.806 [conn18] mongod.exe
...\src\mongo\util\stacktrace.cpp(189)
mongo::printStackTrace+0x3e
Wed May 01 11:08:55.807 [conn18] mongod.exe
...\src\mongo\util\assert_util.cpp(159)
mongo::msgasserted+0xc1
Wed May 01 11:08:55.809 [conn18] mongod.exe
...\src\mongo\db\database.cpp(285)
mongo::Database::getFile+0x4ca
Wed May 01 11:08:55.811 [conn18] mongod.exe
...\src\mongo\db\pdfile.h(654)
mongo::DataFileMgr::getRecord+0x7f
Wed May 01 11:08:55.813 [conn18] mongod.exe
...\src\mongo\db\pdfile.h(592)
mongo::DiskLoc::obj+0x16
Wed May 01 11:08:55.814 [conn18] mongod.exe
...\src\mongo\db\namespace_details-inl.h(89)
mongo::NamespaceDetails::findIndexByName+0xc4
Wed May 01 11:08:55.816 [conn18] mongod.exe
...\src\mongo\db\index.cpp(372)
mongo::prepareToBuildIndex+0x8f4
Wed May 01 11:08:55.818 [conn18] mongod.exe
...\src\mongo\db\pdfile.cpp(1638)
mongo::DataFileMgr::insert+0x3cc
Wed May 01 11:08:55.819 [conn18] mongod.exe
...\src\mongo\db\pdfile.cpp(1328)
mongo::DataFileMgr::insertWithObjMod+0x55
Wed May 01 11:08:55.821 [conn18] mongod.exe
...\src\mongo\db\instance.cpp(796)
mongo::checkAndInsert+0x10b
Wed May 01 11:08:55.822 [conn18] mongod.exe
...\src\mongo\db\instance.cpp(869)
mongo::receivedInsert+0xb0c
Wed May 01 11:08:55.824 [conn18] mongod.exe
...\src\mongo\db\instance.cpp(441)
mongo::assembleResponse+0x57a
Wed May 01 11:08:55.825 [conn18] mongod.exe
...\src\mongo\db\db.cpp(194)
mongo::MyMessageHandler::process+0xfa
Wed May 01 11:08:55.827 [conn18] mongod.exe
...\src\mongo\util\net\message_server_port.cpp(207)
mongo::PortMessageServer::handleIncomingMsg+0x56a
Wed May 01 11:08:55.829 [conn18] mongod.exe
...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180)
boost::`anonymous namespace'::thread_start_function+0x21
Wed May 01 11:08:55.832 [conn18] mongod.exe
f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)
_callthreadstartex+0x17
Wed May 01 11:08:55.833 [conn18] mongod.exe
f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)
_threadstartex+0x7f
Wed May 01 11:08:55.835 [conn18]
kernel32.dll
BaseThreadInitThunk+0xd
Wed May 01 11:08:55.837 [conn18] insert mydb.system.indexes keyUpdates:0
exception: getFile(): bad file number value (corrupt db?): run repair
code:10295 locks(micros) w:653752 653ms

Sometimes I also get this assertion in tandem with the above:
[repl prefetch worker] Assertion: 10334:BSONObj size: 0 (0x00000000) is
invalid. Size must be between 0 and 16793600(16MB) First element: EOO

There is also another Assertion which is sometimes thrown:
[conn1025] mydb warning assertion failure n == 1 src\mongo\db\index.cpp 221
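
When comparing several failed runs, it can help to pull just the assertion codes out of the log. The following is a rough sketch, assuming log lines of the "Assertion: <code>:<message>" shape seen above; extractAssertions is a hypothetical helper name, and the sample text is the log from this post with the email line wrapping rejoined and the second message abbreviated.

```javascript
// Sketch only: collect assertion codes and messages from mongod log text.
// extractAssertions is a hypothetical helper, not part of any MongoDB tool.
function extractAssertions(logText) {
  var pattern = /Assertion: (\d+):([^\r\n]*)/g;
  var found = [];
  var match;
  while ((match = pattern.exec(logText)) !== null) {
    found.push({ code: Number(match[1]), message: match[2].trim() });
  }
  return found;
}

var sample = [
  "Wed May 01 11:08:55.183 [conn18] getFile(): n=-2",
  "Wed May 01 11:08:55.185 [conn18] Assertion: 10295:getFile(): bad file number value (corrupt db?): run repair",
  "[repl prefetch worker] Assertion: 10334:BSONObj size: 0 (0x00000000) is invalid."
].join("\n");

console.log(extractAssertions(sample));
```

Note that the "warning assertion failure n == 1" line has no numeric code, so this pattern does not match it; it would need a separate check.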


The replica set seems to work fine when I first create it: my application
runs on another machine and can connect and send data back and forth OK.
The issue is that the primary becomes corrupt after just a few hours and
needs to be wiped, brought back up empty, and resynced.

Has anyone else run into this problem with replica sets on Windows Server
2008R2?

I've tried running the mongod processes as a service and in a console
window, and I have the same problem either way. The dev machine has 16GB of
RAM, which is more than enough, as each mongod is using about 40-45MB.
--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongodb-user-/***@public.gmane.org
To unsubscribe from this group, send email to
mongodb-user+unsubscribe-/***@public.gmane.org
See also the IRC channel -- freenode.net#mongodb

---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit https://groups.google.com/groups/opt_out.
Asya Kamsky
2013-05-01 20:27:55 UTC
When you initially create it, is it empty and then you insert data? Or are
you starting with an existing dataset/db directory?
Asya Kamsky
2013-05-01 20:30:08 UTC
Sorry, I just realized that your description of data load is in the
original message.

Just to make sure I didn't miss anything: you run the insert script several
times against a blank/new mongod, then you run it again later (with no
intervening inserts/updates), and you get this corruption? What version of
the driver are you using?


Asya
Daniel Hodgin
2013-05-01 20:38:04 UTC
I initially create it with an empty database.

I add an admin user to the admin database for authentication; otherwise
the local database and my db are empty. Then I add my sample data.

When I run my initialization script a second time, it drops all my
collections and re-adds the data. When the uptime is low (under
about 4 hours) everything works OK.

After a certain amount of time, everything is corrupt on the primary in
mydb. (The admin and local dbs are fine, from what I can tell examining the
data in MongoVUE.)

This is a 64-bit install, by the way. Here are the contents of my config file:
bind_ip = 192.168.0.10,localhost
port = 27017
oplogSize = 2048
dbpath = C:\mongodbs\mydb_1
logappend = true
replSet = rs0
keyFile = /Dropbox/mongodb_conf/keyfile
auth = true
rest = true
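
Since all three instances run on one machine, each needs its own port and dbpath, so it can be handy to load the config files and compare them. Below is a simplified sketch of reading the "key = value" format shown above. parseMongodConfig is a hypothetical helper, not MongoDB's own parser, and it assumes one setting per line with "#" marking comments.

```javascript
// Sketch only: a simplified reader for "key = value" config lines.
// parseMongodConfig is a hypothetical helper, not MongoDB's own config parser.
function parseMongodConfig(text) {
  var settings = {};
  text.split(/\r?\n/).forEach(function (line) {
    var trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") {
      return; // skip blank lines and comments
    }
    var eq = trimmed.indexOf("=");
    if (eq === -1) {
      return; // skip lines that are not key = value pairs
    }
    settings[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  });
  return settings;
}

var primaryConfig = parseMongodConfig([
  "port = 27017",
  "dbpath = C:\\mongodbs\\mydb_1",
  "replSet = rs0"
].join("\n"));

console.log(primaryConfig);
```

All values come back as strings, so a port comparison between members would compare "27017" with "27018" rather than numbers.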
Asya Kamsky
2013-05-01 20:33:49 UTC
One more question (possibly most important one :) ).

Where is the data/db file located? Is it on a local drive or some sort of
network mounted file system?
Daniel Hodgin
2013-05-01 20:46:31 UTC
Thanks for your replies.

The data/db folder is on the local C: drive, same as the keyfile and config
file. The initialization script always drops all the collections in my db
(except system.users), then adds my sample data back in and recreates the
unique indexes on specific keys.

I can run the app for a while, saving or creating other records, and then
initialize the db again to wipe it and reset it to my clean state. Then,
after a few hours, the initialization starts throwing the bad file
assertions. The disk subsystem on that machine is a RAID 1 mirrored HDD set
presented as one drive (C:). Everything for this MongoDB install is on C:,
with no network involvement.
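
To make the order of operations in the initialization script concrete, here is a rough sketch that only plans the steps and does not touch a database. planReset, the seed structure, and the "code" key are hypothetical names for illustration; the real script goes through Mongoose. The sketch also assumes every system.* collection is skipped, not just system.users.

```javascript
// Sketch only: plan the reset steps without touching a database.
// planReset, the seed layout and the "code" key are hypothetical illustrations.
function planReset(existingCollections, seed) {
  var steps = [];
  existingCollections.forEach(function (name) {
    // Assumption: all system.* collections (such as system.users) are left alone.
    if (name.indexOf("system.") !== 0) {
      steps.push({ op: "drop", collection: name });
    }
  });
  Object.keys(seed).forEach(function (name) {
    steps.push({ op: "insert", collection: name, count: seed[name].documents.length });
    seed[name].uniqueKeys.forEach(function (key) {
      steps.push({ op: "uniqueIndex", collection: name, key: key });
    });
  });
  return steps;
}

var steps = planReset(
  ["courses", "system.users", "system.indexes"],
  { courses: { documents: [{ code: "A" }, { code: "B" }], uniqueKeys: ["code"] } }
);
console.log(steps);
```

Writing the plan out this way makes it easy to confirm that every drop is issued before any insert or index creation for the same collection.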

My Node app uses the following versions of the mongodb and mongoose modules:
mongodb: >= 1.1.4
mongoose: 3.6.5

I'm not sure whether, now that we use Mongoose, it pulls in its own mongodb
driver as a dependency or uses our mongodb 1.1.4 driver. I'll look into
that. The mongodb driver 1.1.4 is probably quite old now! Now that I think
of it, I haven't updated it in several months.
Daniel Hodgin
2013-05-01 20:52:44 UTC
Mongoose 3.6.5 has a dependency on mongodb driver version 1.2.14, which is
what Mongoose uses, so it appears we are not even using my 1.1.4 version
right now, since we have switched over to making connections to MongoDB
through Mongoose in our app.

Mongoose 3.6.8 now uses mongodb driver 1.3.0, so I will try updating soon.
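
One way to check which driver copy a project would actually pick up is to read the installed package.json files. This is a sketch under the assumption that npm nests Mongoose's own driver under node_modules/mongoose/node_modules, with a top-level mongodb install as the fallback; findDriverVersion is a hypothetical helper name.

```javascript
var fs = require("fs");
var path = require("path");

// Sketch only: report which mongodb driver copy sits closest to mongoose.
// Assumes a nested node_modules layout; findDriverVersion is a hypothetical helper.
function findDriverVersion(projectDir) {
  var candidates = [
    path.join(projectDir, "node_modules", "mongoose", "node_modules", "mongodb", "package.json"),
    path.join(projectDir, "node_modules", "mongodb", "package.json")
  ];
  for (var i = 0; i < candidates.length; i++) {
    if (fs.existsSync(candidates[i])) {
      var pkg = JSON.parse(fs.readFileSync(candidates[i], "utf8"));
      return { version: pkg.version, path: candidates[i] };
    }
  }
  return null; // no driver found in either location
}

console.log(findDriverVersion(process.cwd()));
```

The nested copy is checked first because that is the one Mongoose would load when both are present.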
Asya Kamsky
2013-05-02 18:33:46 UTC
Permalink
Since you can reproduce this, can you wait till you get corruption symptoms
and then run validate(true) on the DB?
http://docs.mongodb.org/manual/reference/command/validate/
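
Since validate runs per collection, a small helper that loops over every collection may save some typing. This is only a sketch: it is written for the mongo shell, where `db` is already defined, and the helper name is made up. It uses only the shell's `getCollectionNames()`, `getCollection()` and `validate(true)` calls.

```javascript
// Run a full validate on every collection in the given database and
// return one entry per collection with the raw validate output.
function validateAllCollections(db) {
  return db.getCollectionNames().map(function (name) {
    // validate(true) requests the full (slower, more thorough) scan.
    var result = db.getCollection(name).validate(true);
    return { collection: name, result: result };
  });
}
```

In the shell, `printjson(validateAllCollections(db))` prints the whole report. Collections that have already disappeared will simply not be listed.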
Daniel Hodgin
2013-05-02 20:31:07 UTC
Permalink
When I log in to the mongo shell and run show collections after the
corruption starts, I see that half my collections are gone completely, so
running validate on them is not possible as they don't exist anymore.

I also realized that when I wipe the primary and restore from the secondary,
half my data is missing. It does not restore the database to its
uncorrupted state. Data is lost.

However, once wiped, the primary can be written to again.

Since I'm only working with test data I haven't lost anything yet, but there
is no way I can roll a replica set out to production without knowing what is
causing this.

We are investigating our initialization script and how it creates indexes
to see if we can isolate the problem further.

Our initialization script ensures some indexes, but since those are created
asynchronously, our script sometimes ends before the indexes have finished building.

When we shut down the server and restart it, the indexes then get rebuilt
because we have a function that ensures all our indexes exist when we first
make a connection to the db in the app on startup.
Asya Kamsky
2013-05-03 01:45:44 UTC
Permalink
Rerunning ensureIndex() won't reindex if the index already exists; it's
just a no-op.
I'm curious about what your test script does, though. What writeConcern are
you using? Safe or unsafe writes?
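
To make the safe/unsafe distinction concrete, here is a rough sketch of an insert that passes an explicit write concern. It assumes a collection object with the 1.x Node driver's callback-style insert(docs, options, callback) and the `w` option (0 for unacknowledged, 1 for acknowledged by the primary, 'majority' for a majority of the set). The wrapper name is made up, and the actual script may do this differently.

```javascript
// Insert `docs` with an explicit write concern and settle only once the
// driver reports the outcome through its callback.
function insertWithWriteConcern(collection, docs, w) {
  return new Promise(function (resolve, reject) {
    collection.insert(docs, { w: w }, function (err, result) {
      if (err) {
        reject(err);   // e.g. a duplicate key error on a unique index
      } else {
        resolve(result);
      }
    });
  });
}
```

With `w` set to 1 or 'majority', a failed write should reach the script as a rejected promise instead of passing silently.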

You might need to open a SUPPORT ticket (in community private project in
Jira) to upload your script and logs to be able to track this down...

Asya
Daniel Hodgin
2013-05-06 22:13:17 UTC
Permalink
I've made some progress on this issue today.

We updated our initialization script on Friday so that it creates our indexes
first, then, once they are complete, inserts our sample data, and then finishes up.

Before, we were calling ensureIndex asynchronously on all our collections,
and we were also calling it when we started the server. Our script would
sometimes end without actually finishing creating all the indexes. When we
restarted the server, the remaining missing indexes would be created
because we had an ensureIndex call in our db connection constructor.

We have removed the ensureIndex call on server startup and made sure the code
that ensures indexes in our initialization script finishes before we move on to
inserting data.
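
The new ordering looks roughly like the sketch below. This is not our actual script: the `plan` structure and function names are made up for illustration, and it assumes collection objects with the driver's callback-style ensureIndex(keys, options, callback) and insert(docs, options, callback).

```javascript
// Turn a Node-style callback method into a Promise-returning call.
function callAsPromise(target, method, args) {
  return new Promise(function (resolve, reject) {
    target[method].apply(target, args.concat(function (err, result) {
      if (err) { reject(err); } else { resolve(result); }
    }));
  });
}

// `plan` is an array of { collection, indexes: [{ keys, options }], docs }.
function seedDatabase(plan) {
  // Phase 1: request every index and wait until all callbacks have fired.
  var indexBuilds = [];
  plan.forEach(function (entry) {
    entry.indexes.forEach(function (index) {
      indexBuilds.push(
        callAsPromise(entry.collection, 'ensureIndex', [index.keys, index.options])
      );
    });
  });

  return Promise.all(indexBuilds).then(function () {
    // Phase 2: only now insert the sample documents (acknowledged writes).
    return Promise.all(plan.map(function (entry) {
      return callAsPromise(entry.collection, 'insert', [entry.docs, { w: 1 }]);
    }));
  });
}
```

The point is just that no insert is issued until every ensureIndex callback has returned, and the script should not exit until the returned promise settles.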

We have not been able to reproduce the corruption issues on Friday or all day
today with this new script version.

I reverted to a commit from Thursday, before the script fixes, and
within an hour or so was able to reproduce the corruption issues again.

When I have time in the next few days I will try to narrow it down further.