Discussion:
How far to push the document nesting?
Lloyd Cledwyn
2012-02-17 07:11:06 UTC
I am relatively new to MongoDB, and so far am really impressed. I am
struggling with the best way to set up my document stores, though. I am
trying to do some summary analytics using Twitter data and I am not sure
whether to put the tweets into the user document, or to keep those as
a separate collection. It seems like putting the tweets inside the user
model would quickly hit the document size limit. If that is the case,
then what is a good way to run MapReduce across a group of
users' tweets?

I hope I am not being too vague but I don't want to get too specific and
too far down the wrong path as far as setting up my domain model.

As I am sure you are all bored of hearing, I am used to an RDB layout:

|USER___|
---------
|ID
|Name
|Etc.

|TWEET__|
---------
|ID
|UserID
|Etc

It seems like

User
|-Tweet (0..3000)
  |-Entities
    |-Hashtags (0..10+)
    |-urls (0..5)
    |-user_mentions (0..12)
  |-GeoData (0..20)
|-somegroupID

would quickly bloat the User document beyond capacity. But I would like to
run analysis on tweets belonging to users with similar *somegroupID*. It
conceptually makes sense to lay out the model as above, but at what point
is that too unwieldy? And what are viable alternatives?
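
To put a rough number on "too unwieldy", here is a back-of-the-envelope sketch. The 16 MB figure is MongoDB's maximum BSON document size; the per-tweet and profile sizes are assumptions made up for illustration, so measure real documents before relying on the result.

```python
# Rough size estimate for a user document that embeds all of its tweets.
# ASSUMPTION: about 2.5 KB per tweet (text, entities, geo data) and about
# 1 KB of profile data. Both figures are illustrative, not measured.
BSON_DOC_LIMIT_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size


def embedded_user_doc_bytes(tweet_count, avg_tweet_bytes=2500, profile_bytes=1000):
    """Estimate the size of a user document that embeds tweet_count tweets."""
    return profile_bytes + tweet_count * avg_tweet_bytes


def max_embedded_tweets(avg_tweet_bytes=2500, profile_bytes=1000):
    """Estimate how many tweets fit before the document reaches the limit."""
    return (BSON_DOC_LIMIT_BYTES - profile_bytes) // avg_tweet_bytes


if __name__ == "__main__":
    size = embedded_user_doc_bytes(3000)
    print(f"3000 embedded tweets: about {size / 1024 / 1024:.1f} MB")
    print(f"tweets that fit under the limit: about {max_embedded_tweets()}")
```

Under these assumed sizes, 3000 tweets come to roughly 7 MB, which still fits but is already a large fraction of the limit, and every added tweet grows a single document that has to be read and rewritten as a whole.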
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/6UaV3E6xhfoJ.
To post to this group, send email to mongodb-user-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to mongodb-user+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
Nat
2012-02-17 11:37:16 UTC
I would not store tweet data inside the user document; it's better to keep them separate. If you need to run analytics based on user profile fields such as age, sex, etc., you might store those fields together with the tweet data, which will make it easier to run map/reduce or aggregation on it.
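
A minimal sketch of that layout, using plain Python dicts in memory rather than a live MongoDB collection. The field names (user_id, somegroupID, hashtags) and the sample data are assumptions for illustration; the map and reduce functions only imitate the shape of a map/reduce job and are not driver calls.

```python
from collections import defaultdict

# Hypothetical tweet documents, one per tweet, kept in their own collection.
# The user's somegroupID is copied (denormalized) onto every tweet.
tweets = [
    {"_id": 1, "user_id": "u1", "somegroupID": "g1", "hashtags": ["mongodb", "nosql"]},
    {"_id": 2, "user_id": "u1", "somegroupID": "g1", "hashtags": ["mongodb"]},
    {"_id": 3, "user_id": "u2", "somegroupID": "g1", "hashtags": []},
    {"_id": 4, "user_id": "u3", "somegroupID": "g2", "hashtags": ["python"]},
]


def map_tweet(doc):
    """Map step: emit ((group, hashtag), 1) for every hashtag in one tweet."""
    for tag in doc["hashtags"]:
        yield (doc["somegroupID"], tag), 1


def reduce_counts(values):
    """Reduce step: sum the counts emitted for a single key."""
    return sum(values)


def hashtag_counts_by_group(docs):
    """Run the map and reduce steps over an in-memory list of documents."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_tweet(doc):
            emitted[key].append(value)
    return {key: reduce_counts(values) for key, values in emitted.items()}


if __name__ == "__main__":
    for (group, tag), count in sorted(hashtag_counts_by_group(tweets).items()):
        print(group, tag, count)
```

Because each tweet carries its own somegroupID, the map step never has to look anything up in the user collection.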
Lloyd Cledwyn
2012-02-18 05:48:59 UTC
Interesting. Of course. So counterintuitive from a "normalized" mindset.
It may add a [duplicated] data element across thousands of documents, but if
that helps performance where I'm hoping for it, it could just work.
Nat
2012-02-18 05:52:33 UTC
Like many other NoSQL databases, MongoDB doesn't offer join operations. Denormalizing can give better performance than repeatedly fetching from another collection to simulate joins, especially when you only do it as a one-off for analytical purposes.
Lloyd Cledwyn
2012-02-18 05:56:14 UTC
Do I need to worry about update/insert performance if, say, I add a
"somegroupID" element to all the tweets associated with a user, thus
updating thousands of documents (and, of course, repeating that for each
user in the "somegroupID")? I can appreciate that once that element is
present in all those documents, doing a mapReduce / analysis across them
is straightforward.
Chris Winslett
2012-02-18 12:10:02 UTC
Lloyd,

You will find this video interesting:

http://www.10gen.com/presentations/mongosv-2011/schema-design-at-scale

Essentially, in one document, store one day's worth of tweets for one
person. The reasoning:

- Querying typically consists of days and users

Therefore, you can have the following index:

{user_id: 1, date: 1}  # date needs to be last because you will range and sort on the date

Have fun!

Chris
MongoHQ
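
A rough in-memory sketch of the one-document-per-user-per-day idea described above. The document shape and sample data are assumptions for illustration, and a sorted Python list stands in for the {user_id: 1, date: 1} index; nothing here talks to a real server.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical bucket documents: one per user per day, tweets embedded in an array.
buckets = [
    {"user_id": "u2", "date": date(2012, 2, 16), "tweets": [{"text": "c"}]},
    {"user_id": "u1", "date": date(2012, 2, 17), "tweets": [{"text": "b"}]},
    {"user_id": "u1", "date": date(2012, 2, 15), "tweets": [{"text": "a"}, {"text": "a2"}]},
    {"user_id": "u1", "date": date(2012, 2, 18), "tweets": [{"text": "d"}]},
]

# Stand-in for the compound index: entries ordered by user_id first, then date.
index = sorted(((b["user_id"], b["date"]), pos) for pos, b in enumerate(buckets))
keys = [key for key, _ in index]


def buckets_for(user_id, start, end):
    """Return one user's buckets with start <= date <= end, in date order."""
    lo = bisect_left(keys, (user_id, start))
    hi = bisect_right(keys, (user_id, end))
    return [buckets[pos] for _, pos in index[lo:hi]]


if __name__ == "__main__":
    for bucket in buckets_for("u1", date(2012, 2, 15), date(2012, 2, 17)):
        print(bucket["date"], len(bucket["tweets"]), "tweets")
```

With user_id first and date last, all of one user's days sit next to each other in date order, so an equality match on the user followed by a date range reads one contiguous slice that is already sorted.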