Lloyd Cledwyn
2012-02-17 07:11:06 UTC
I am relatively new to MongoDB, and so far I am really impressed. I am
struggling with the best way to set up my document stores, though. I am
trying to do some summary analytics using Twitter data, and I am not sure
whether to put the tweets into the user document or to keep them in
a separate collection. It seems like putting the tweets inside the user
document would quickly hit the document size limit. If that is the case,
what is a good way to run MapReduce across a group of
users' tweets?
I hope I am not being too vague, but I don't want to get too specific and go
too far down the wrong path in setting up my domain model.
As I am sure you are all bored of hearing, I am used to a relational database (RDB) layout:
|USER___|
---------
|ID
|Name
|Etc.
|TWEET__|
---------
|ID
|UserID
|Etc
It seems like
User
|-somegroupID
|-Tweet (0..3000)
  |-Entities
    |-Hashtags (0..10+)
    |-urls (0..5)
    |-user_mentions (0..12)
  |-GeoData (0..20)
would quickly bloat the User document beyond capacity. But I would like to
run analysis on tweets belonging to users with similar *somegroupID*. It
conceptually makes sense to lay out the model as above, but at what point
does that become too unwieldy? And what are viable alternatives?
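To make the alternative I have in mind concrete, here is a rough sketch in plain Python (no MongoDB driver involved) of keeping tweets in their own collection, with the user's ID and a copy of *somegroupID* stored on each tweet so a map/reduce pass can group by it directly. All field names, the sample data, and the map_reduce helper are my own placeholders for illustration, not anything MongoDB provides, and I am not sure this is the right design.

```python
from collections import defaultdict

# Hypothetical "users" collection: one small document per user.
users = [
    {"_id": 1, "name": "alice", "somegroupID": "g1"},
    {"_id": 2, "name": "bob", "somegroupID": "g1"},
    {"_id": 3, "name": "carol", "somegroupID": "g2"},
]

# Hypothetical "tweets" collection: one document per tweet, referencing the
# user by ID and carrying a denormalized copy of the user's somegroupID.
tweets = [
    {"_id": 101, "user_id": 1, "somegroupID": "g1",
     "entities": {"hashtags": ["mongodb", "nosql"], "urls": [], "user_mentions": []}},
    {"_id": 102, "user_id": 2, "somegroupID": "g1",
     "entities": {"hashtags": ["mongodb"], "urls": [], "user_mentions": []}},
    {"_id": 103, "user_id": 3, "somegroupID": "g2",
     "entities": {"hashtags": ["python"], "urls": [], "user_mentions": []}},
    {"_id": 104, "user_id": 1, "somegroupID": "g1",
     "entities": {"hashtags": [], "urls": [], "user_mentions": []}},
]


def map_tweet(tweet):
    """Map step: emit ((group, hashtag), 1) for every hashtag in one tweet."""
    for tag in tweet["entities"]["hashtags"]:
        yield (tweet["somegroupID"], tag), 1


def reduce_counts(values):
    """Reduce step: add up the counts emitted for a single key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """In-memory stand-in for a map/reduce pass over a collection."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


hashtag_counts = map_reduce(tweets, map_tweet, reduce_counts)

for (group, tag), count in sorted(hashtag_counts.items()):
    print(group, tag, count)
```

The trade-off, as far as I can tell, is that each tweet document stays small and the User document no longer grows with every tweet, but *somegroupID* has to be kept in sync on the tweets if a user ever changes group.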
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/6UaV3E6xhfoJ.
To post to this group, send email to mongodb-user-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To unsubscribe from this group, send email to mongodb-user+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.