Discussion:
[mongodb-user] MongoDB Performance Issue During Background Flush on CentOS
Rhys Domingo
2014-10-31 07:21:47 UTC
Permalink
Hello,

A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down to the
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've seen
reports online with this happening on a Windows server, but it doesn't
apply in our case since we're running on CentOS 6.5. Any insight would be
greatly appreciated.

Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver

Logs captured during event:
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP) at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()

Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info XX.XXX.XXX.XX3:27017
thinks that we are down

Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus query:
{ _id: "XXX" } update: { _id: "XXX", _t: "AgentLegLockableItem", StateLogs:
{}, Client: { ClientId: "XXX", ProviderId: null, UseFakeProvider: false,
TwilioAccountSid: "XXX", TwilioAuthToken: "XXX", CallRecordingEnabled:
false, GhostNumber: "XXX", GhostNumberEnabled: false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
AgentId: "XXX", ConferenceId: "XXX", ExternalRefId: null, DurationSeconds:
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
Date(-62135596800000), Type: 1, CallType: 1, Disavowed: false, PhoneNumber:
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
LastSavedState: 0, ResponseToProvider: null, RecordingEnabled: false,
Agent: { AgentId: "XXX", _id: null, Phones: [ { Phone: "XXX",
PhoneExtension: "", PhoneType: "Dialer" } ], CallerId: "XXX", RingTimeout:
16, CallLegId: "XXX", ExternalRefId: null, CallPersistenceSeconds:16,
SoftPhone: null }, PromptTimeout: 0, WhisperMessage: null, ResultMessage:
null, PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null } idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms

Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Asya Kamsky
2014-11-03 05:33:54 UTC
Permalink
In the logs there would also be a line logged for slow flush operation -
was that there in the vicinity of these slow operations?

Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down to the
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've seen
reports online with this happening on a Windows server, but it doesn't
apply in our case since we're running on CentOS 6.5. Any insight would be
greatly appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP) at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info XX.XXX.XXX.XX3:27017
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus
query: { _id: "XXX" } update: { _id: "XXX", _t: "AgentLegLockableItem",
StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null,
UseFakeProvider: false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX", GhostNumberEnabled: false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
LastSavedState: 0, ResponseToProvider: null, RecordingEnabled: false,
Agent: { AgentId: "XXX", _id: null, Phones: [ { Phone: "XXX",
16, CallLegId: "XXX", ExternalRefId: null, CallPersistenceSeconds:16,
null, PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null } idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com
<https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com?utm_medium=email&utm_source=footer>
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/CAOe6dJBdZuYer7NuUjDwx%3DgAWBGi91Qa%3DHHCubXYQNrArcQT8w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Rhys Domingo
2014-11-08 03:17:53 UTC
Permalink
Hi Asya,

I do see the slow flush operation. Here's the info from the log:

Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms for 77
files

Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush operation -
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down to the
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've seen
reports online with this happening on a Windows server, but it doesn't
apply in our case since we're running on CentOS 6.5. Any insight would be
greatly appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP) at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info XX.XXX.XXX.XX3:27017
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus
query: { _id: "XXX" } update: { _id: "XXX", _t: "AgentLegLockableItem",
StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null,
UseFakeProvider: false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX", GhostNumberEnabled: false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
LastSavedState: 0, ResponseToProvider: null, RecordingEnabled: false,
Agent: { AgentId: "XXX", _id: null, Phones: [ { Phone: "XXX",
16, CallLegId: "XXX", ExternalRefId: null, CallPersistenceSeconds:16,
null, PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null } idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
<javascript:>.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com
<https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com?utm_medium=email&utm_source=footer>
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Asya Kamsky
2014-11-08 13:58:50 UTC
Permalink
19449ms or 19seconds is very slow given that mongod flushes data files
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this happens
periodically - it could be that some unusual write pattern was
happening at that time?)

Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms for 77
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush operation -
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down to the
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've seen reports
online with this happening on a Windows server, but it doesn't apply in our
case since we're running on CentOS 6.5. Any insight would be greatly
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) at
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info XX.XXX.XXX.XX3:27017
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus
query: { _id: "XXX" } update: { _id: "XXX", _t: "AgentLegLockableItem",
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX", GhostNumberEnabled: false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null },
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0, PromptMessageToAccept: null
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/CAOe6dJAxC6Gj01ynk2gVo%3D86ZzG1KFpQvVmPp9stsQKjXNU%3D-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Rhys Domingo
2014-11-10 18:52:04 UTC
Permalink
That's why I'm at a loss here. A long write operation would be logged, and
there's nothing unusually except the length of time these queries took to
execute. We are not even close to hitting any thresholds on the SAN. Just
to make sure, I had the logs checked by the network engineers & storage
vendor, and nothing unusual can be identified.
Post by Asya Kamsky
19449ms or 19seconds is very slow given that mongod flushes data files
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this happens
periodically - it could be that some unusual write pattern was
happening at that time?)
Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms for
77
Post by Rhys Domingo
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush operation
-
Post by Rhys Domingo
Post by Asya Kamsky
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and
logins
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
took longer than a minute and the primary server appeared as down to
the
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was
no
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
unusual database activity and none of the physical resources were
stressed
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
on the server. I checked network logs and SAN activity with our
storage
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
vendor and nothing appeared out of the norm. All evidence indicates
that
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
this was caused by the MongoDB background flush operation. I've seen
reports
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
online with this happening on a Windows server, but it doesn't apply
in our
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
case since we're running on CentOS 6.5. Any insight would be greatly
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node
replca
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not
properly
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
respond after a period of time, or established connection failed
because
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info
XX.XXX.XXX.XX2:27017
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info
XX.XXX.XXX.XX3:27017
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
1
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus
"AgentLegLockableItem",
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null,
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
false,
null,
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
AgentId: "XXX", ConferenceId: "XXX", ExternalRefId: null,
new
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Date(-62135596800000), Type: 1, CallType: 1, Disavowed: false,
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
0, ResponseToProvider: null, RecordingEnabled: false, Agent: {
"XXX", _id: null, Phones: [ { Phone: "XXX", PhoneExtension: "",
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null },
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" }
ntoreturn:1
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an
<javascript:>.
Post by Rhys Domingo
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/acb30ada-4c46-4376-bab4-21a845901f42%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Asya Kamsky
2014-11-10 23:53:00 UTC
Permalink
I don't mean a long running write, I mean more random writes than
normal - you can have some number of updates that (for example) cause
a move of several documents which have large indexed arrays, which
means many index entries have to be updated and that's more random
writes than a "typical" update...

Asya
Post by Rhys Domingo
That's why I'm at a loss here. A long write operation would be logged, and
there's nothing unusually except the length of time these queries took to
execute. We are not even close to hitting any thresholds on the SAN. Just to
make sure, I had the logs checked by the network engineers & storage vendor,
and nothing unusual can be identified.
Post by Asya Kamsky
19449ms or 19seconds is very slow given that mongod flushes data files
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this happens
periodically - it could be that some unusual write pattern was
happening at that time?)
Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms for 77
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush operation -
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down to the
secondary replicas. Performance in MongoDB was so bad that our client
application was not able to connect to the MongoDB database. There was no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've seen reports
online with this happening on a Windows server, but it doesn't apply in our
case since we're running on CentOS 6.5. Any insight would be greatly
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) at
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info
XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info
XX.XXX.XXX.XX3:27017
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" } ntoreturn: 1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update DialIQ.AdapterLegStatus
"AgentLegLockableItem",
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX", GhostNumberEnabled: false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
AgentId: "XXX", ConferenceId: "XXX", ExternalRefId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null },
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/acb30ada-4c46-4376-bab4-21a845901f42%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/CAOe6dJA_YwB%2B7iacurmp%3DYP3gpcJheWyEYKjki-gT94MQGqLXg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Rhys Domingo
2014-11-11 01:53:42 UTC
Permalink
Nothing different than normal on that day. We do have a lot of updates
happening all the time. MongoDB is used in a highly transactional manner.
Post by Asya Kamsky
I don't mean a long running write, I mean more random writes than
normal - you can have some number of updates that (for example) cause
a move of several documents which have large indexed arrays, which
means many index entries have to be updated and that's more random
writes than a "typical" update...
Asya
Post by Rhys Domingo
That's why I'm at a loss here. A long write operation would be logged,
and
Post by Rhys Domingo
there's nothing unusually except the length of time these queries took
to
Post by Rhys Domingo
execute. We are not even close to hitting any thresholds on the SAN.
Just to
Post by Rhys Domingo
make sure, I had the logs checked by the network engineers & storage
vendor,
Post by Rhys Domingo
and nothing unusual can be identified.
Post by Asya Kamsky
19449ms or 19seconds is very slow given that mongod flushes data files
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this happens
periodically - it could be that some unusual write pattern was
happening at that time?)
Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms
for
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
77
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush
operation
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
-
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down
to
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
the
secondary replicas. Performance in MongoDB was so bad that our
client
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
application was not able to connect to the MongoDB database. There
was
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence indicates that
this was caused by the MongoDB background flush operation. I've
seen
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
reports
online with this happening on a Windows server, but it doesn't
apply
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
in our
case since we're running on CentOS 6.5. Any insight would be
greatly
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP)
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
at
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info
XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info
XX.XXX.XXX.XX3:27017
thinks that we are down
Mon Oct 13 17:52:09.125 [conn1384510] command admin.$cmd command: {
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" }
1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update
DialIQ.AdapterLegStatus
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
"AgentLegLockableItem",
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX",
false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034), EndTime: new
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null },
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
Mon Oct 13 17:52:09.124 [conn2224348] command admin.$cmd command: {
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an
<javascript:>.
Post by Rhys Domingo
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/acb30ada-4c46-4376-bab4-21a845901f42%40googlegroups.com.
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/8d3f3062-9901-4fb5-8451-9a7b0fc8ecb8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Rob Moore
2014-11-11 02:18:13 UTC
Permalink
Not sure this is what you are seeing but since you said "highly transaction
manner"...

We do the same thing and have seen performance crashes. What we found
happening was that the normal write back was occurring but the number of
dirty pages on the system was still growing. There is a memory setting in
the kernel (/proc/sys/vm/dirty_background_ratio) that once you trip the
kernel starts to write dirty pages back to disk, effectively, as fast as it
can. For small memory systems the write back is not that big a deal. For
big(ger) memory systems like yours the additional write back of even a few
gigs of memory can exhaust all of your remaining disk bandwidth.

Your storage team will say that the SAN and storage controller are asleep
because (they are and) the bottle neck is either at the machines SAN
interface or (in my case) they pulled all of the replica set member's disk
partitions from the same small set of spindles. They don't look at the
drive utilization unless you specifically ask and then sometimes don't have
a granular enough view to see the individual drives.

Long story short I agree with Asya that you were probably starved for disk
I/O. Does your MMS have the platform statistics? You should be able to see
the disk I/O in there if I remember correctly...

If that is the case the only real fix is to get a better disk subsystem or
spread across more machines/disks. I would also go to the storage team to
get a setup for random reads/writes. Make sure the members are hitting
different sets of spindles and they have a reasonable RAID setting for the
workload.

We ended giving up on disk entirely and decided to just stay in memory. I
don't recommend it unless you think through all of your failure modes but
it is fast and has less performance jitter.

HTH,
Rob
Post by Rhys Domingo
Nothing different than normal on that day. We do have a lot of updates
happening all the time. MongoDB is used in a highly transactional manner.
Post by Asya Kamsky
I don't mean a long running write, I mean more random writes than
normal - you can have some number of updates that (for example) cause
a move of several documents which have large indexed arrays, which
means many index entries have to be updated and that's more random
writes than a "typical" update...
Asya
Post by Rhys Domingo
That's why I'm at a loss here. A long write operation would be logged,
and
Post by Rhys Domingo
there's nothing unusually except the length of time these queries took
to
Post by Rhys Domingo
execute. We are not even close to hitting any thresholds on the SAN.
Just to
Post by Rhys Domingo
make sure, I had the logs checked by the network engineers & storage
vendor,
Post by Rhys Domingo
and nothing unusual can be identified.
Post by Asya Kamsky
19449ms or 19seconds is very slow given that mongod flushes data files
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this happens
periodically - it could be that some unusual write pattern was
happening at that time?)
Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms
for
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
77
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush
operation
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
-
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes and logins
took longer than a minute and the primary server appeared as down
to
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
the
secondary replicas. Performance in MongoDB was so bad that our
client
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
application was not able to connect to the MongoDB database. There
was
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
no
unusual database activity and none of the physical resources were stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence
indicates
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
that
this was caused by the MongoDB background flush operation. I've
seen
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
reports
online with this happening on a Windows server, but it doesn't
apply
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
in our
case since we're running on CentOS 6.5. Any insight would be
greatly
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3 node replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP)
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
at
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info
XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info
XX.XXX.XXX.XX3:27017
thinks that we are down
{
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" }
1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update
DialIQ.AdapterLegStatus
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
"AgentLegLockableItem",
StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null,
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX",
false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034),
new
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null },
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
{
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an
Post by Rhys Domingo
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/acb30ada-4c46-4376-bab4-21a845901f42%40googlegroups.com.
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/34a60210-5bed-44d0-8157-221102d32241%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Rhys Domingo
2014-11-11 18:35:56 UTC
Permalink
Thanks Rob. This is very helpful information. I've already started to
pursue re-evaluating the disk IO situation and found some things that can
be improved. Since this doesn't seem to be ringing a bell with the MongoDB
(thank you Asya), I'll continue down this path.

~Rhys
Post by Rob Moore
Not sure this is what you are seeing but since you said "highly
transaction manner"...
We do the same thing and have seen performance crashes. What we found
happening was that the normal write back was occurring but the number of
dirty pages on the system was still growing. There is a memory setting in
the kernel (/proc/sys/vm/dirty_background_ratio) that once you trip the
kernel starts to write dirty pages back to disk, effectively, as fast as it
can. For small memory systems the write back is not that big a deal. For
big(ger) memory systems like yours the additional write back of even a few
gigs of memory can exhaust all of your remaining disk bandwidth.
Your storage team will say that the SAN and storage controller are asleep
because (they are and) the bottle neck is either at the machines SAN
interface or (in my case) they pulled all of the replica set member's disk
partitions from the same small set of spindles. They don't look at the
drive utilization unless you specifically ask and then sometimes don't have
a granular enough view to see the individual drives.
Long story short I agree with Asya that you were probably starved for disk
I/O. Does your MMS have the platform statistics? You should be able to see
the disk I/O in there if I remember correctly...
If that is the case the only real fix is to get a better disk subsystem or
spread across more machines/disks. I would also go to the storage team to
get a setup for random reads/writes. Make sure the members are hitting
different sets of spindles and they have a reasonable RAID setting for the
workload.
We ended giving up on disk entirely and decided to just stay in memory. I
don't recommend it unless you think through all of your failure modes but
it is fast and has less performance jitter.
HTH,
Rob
Post by Rhys Domingo
Nothing different than normal on that day. We do have a lot of updates
happening all the time. MongoDB is used in a highly transactional manner.
Post by Asya Kamsky
I don't mean a long running write, I mean more random writes than
normal - you can have some number of updates that (for example) cause
a move of several documents which have large indexed arrays, which
means many index entries have to be updated and that's more random
writes than a "typical" update...
Asya
Post by Rhys Domingo
That's why I'm at a loss here. A long write operation would be logged,
and
Post by Rhys Domingo
there's nothing unusually except the length of time these queries took
to
Post by Rhys Domingo
execute. We are not even close to hitting any thresholds on the SAN.
Just to
Post by Rhys Domingo
make sure, I had the logs checked by the network engineers & storage
vendor,
Post by Rhys Domingo
and nothing unusual can be identified.
Post by Asya Kamsky
19449ms or 19seconds is very slow given that mongod flushes data
files
Post by Rhys Domingo
Post by Asya Kamsky
(asynchronously) every 60 seconds - this suggests your disk IOPS are
not sufficient for the amount of writes you are doing (if this
happens
Post by Rhys Domingo
Post by Asya Kamsky
periodically - it could be that some unusual write pattern was
happening at that time?)
Asya
Post by Rhys Domingo
Hi Asya,
Mon Oct 13 17:52:09.546 [DataFileSync] flushing mmaps took 19449ms
for
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
77
files
Thanks,
Rhys
Post by Asya Kamsky
In the logs there would also be a line logged for slow flush
operation
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
-
was that there in the vicinity of these slow operations?
Asya
Post by Rhys Domingo
Hello,
A few days ago we experienced a strange issue. Database writes
and
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
logins
took longer than a minute and the primary server appeared as down
to
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
the
secondary replicas. Performance in MongoDB was so bad that our
client
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
application was not able to connect to the MongoDB database.
There was
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
no
unusual database activity and none of the physical resources were
stressed
on the server. I checked network logs and SAN activity with our storage
vendor and nothing appeared out of the norm. All evidence
indicates
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
that
this was caused by the MongoDB background flush operation. I've
seen
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
reports
online with this happening on a Windows server, but it doesn't
apply
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
in our
case since we're running on CentOS 6.5. Any insight would be
greatly
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
appreciated.
Our production configuration is as follows. MongoDB 2.4.10, 3
node
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
replca
set, 24 logical CPUs, 144GB RAM, RAID 5, CentOS 6.5, C# driver
A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond XX.XXX.XXX.XX1:27017 at
System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot,
SocketAddress
socketAddress) at System.Net.Sockets.Socket.Connect(EndPoint
remoteEP)
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
at
System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP) at
MongoDB.Driver.Internal.MongoConnection.Open()
Mon Oct 13 17:51:21.139 [rsHealthPoll] replset info
XX.XXX.XXX.XX2:27017
thinks that we are down
Mon Oct 13 17:51:45.148 [rsHealthPoll] replset info
XX.XXX.XXX.XX3:27017
thinks that we are down
{
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
authenticate: 1, nonce: "XXX", user: "master", key: "XXX" }
1
keyUpdates:0 locks(micros) r:126 reslen:72 24473ms
Mon Oct 13 17:52:09.122 [conn2222382] update
DialIQ.AdapterLegStatus
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
"AgentLegLockableItem",
StateLogs: {}, Client: { ClientId: "XXX", ProviderId: null,
false, TwilioAccountSid: "XXX", TwilioAuthToken: "XXX",
CallRecordingEnabled: false, GhostNumber: "XXX",
false,
MaxSimultaneousTwilioRequests: 0 }, CallId : "XXX", ParentCallId: null,
AgentId: "XXX", ConferenceId: "XXX", ExternalRefId: null,
0, RecordingUrl: null, StartTime: new Date(1413222610034),
new
Date(-62135596800000), Type: 1, CallType: 1, Disavowed: false,
"XXX", PhoneExtension: "", CallerId: "XXX", RingTimeout: 16,
"XXX", _id: null, Phones: [ { Phone: "XXX", PhoneExtension: "",
"Dialer" } ], CallerId: "XXX", RingTimeout: 16, CallLegId: "XXX",
ExternalRefId: null, CallPersistenceSeconds:16, SoftPhone: null
},
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
PromptTimeout: 0, WhisperMessage: null, ResultMessage: null,
PromtpToAcceptTransfer: false, TransferStyle: 0,
PromptMessageToAccept: null
} idhack:1 nupdated
:1 upsert:1 keyUpdates:0 locks(micros) w:67122003 67122ms
{
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
authenticate: 1, user: "master", nonce: "XXX", key: "XXX" } ntoreturn:1
keyUpdates:0 locks(micros) r:186 reslen:72 64216ms
--
You received this message because you are subscribed to the
Google
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the
Google
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/97898ab6-a15b-41f9-857d-9bbe32c55de8%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it,
send
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
an
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/87fcc293-7a3c-4104-9e2a-2180bf0428b7%40googlegroups.com.
Post by Rhys Domingo
Post by Asya Kamsky
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user"
group.
http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google
Groups
Post by Rhys Domingo
"mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an
Post by Rhys Domingo
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mongodb-user/acb30ada-4c46-4376-bab4-21a845901f42%40googlegroups.com.
Post by Rhys Domingo
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/e16354fe-01ad-46af-a675-94111845c84a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Loading...