Discussion:
No more patterns / performance solutions: a limit of MongoDB's capabilities?!
petit curieux
2015-01-09 12:15:41 UTC
This relates to the post "Why shouldn't I embed large arrays in my documents?"
<http://askasya.com/post/largeembeddedarrays>

It is very clear that embedding data in the same collection just to make
queries "easy" is not a solution.


I have a big profile collection; basically, it is a collection for finding
people with common interests. Each document also holds some long lists,
such as a blacklist, favorites, etc.


My current (bad) structure looks like this:


{
    "_id" : ObjectId("5491ac5752c5c30b15bdd8b7"),
    "name" : "Eberbach-Woerth",
    "blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....],
    "favorites" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....]
}


Most queries are of the form "find x = 3 and whatever = 5 where none of
the profiles are in my blacklist".


In theory, the only alternative to these bad embedded arrays seems to be to
split "blacklist" and "favorites" into other collections that reference the
profile collection, a kind of SQL normalisation with foreign keys. Great.

I can accept that this method needs multiple steps (at least 2 queries) to
retrieve the elements.


But what I can't accept is having tons of items listed in the second query;
that is not realistic in performance terms.

Here is the flow after the split:

query 1) First you get all the _id references from the new blacklist
collection (let's say 4500 items; no pb, that's fast).

query 2) With the previous result, you can now perform the second query
and fetch all the profiles that are not in the blacklist match list.

The query looks something like this:

query : {
    idprofile : {$in : [75010,75020,75011,75006,75007, with, thousands, of, _id,
        in, the, list, I'm, a, little, concerned, about, the, post, size, for,
        performance, reasons]}
}



Here is the problem.

Queries like this are not acceptable, at least for performance reasons; it
makes no sense to post thousands of $in references.

That produces a heavy request payload and, realistically, it is yet another
anti-pattern, at least at the TCP level.
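
To put a rough number on the payload concern, here is a minimal sketch, assuming the ids are 32-bit integers and the `$in` list is sent as a standard BSON array. BSON stores an array as a document whose keys are the decimal indexes "0", "1", ..., so each int32 element costs one type byte, the key plus its NUL terminator, and four value bytes. The function name `bsonInt32ArraySize` is made up for this illustration.

```javascript
// Estimate the encoded size, in bytes, of a BSON array holding `n` int32 values.
// Assumption: every id fits in a 32-bit integer (true for the sample ids in this thread).
function bsonInt32ArraySize(n) {
  let size = 4 + 1; // int32 length prefix + trailing 0x00 terminator of the array document
  for (let i = 0; i < n; i++) {
    const keyBytes = String(i).length + 1; // index as a decimal string + NUL
    size += 1 + keyBytes + 4;              // type byte + key + int32 value
  }
  return size;
}

// The 4500-id list from the example flow.
console.log(bsonInt32ArraySize(4500));
```

Under these assumptions the 4500-id list comes to roughly 44 KB per request, well below the 16 MB BSON document limit; whether that is acceptable for every search is a separate question.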


So what can you do now? What is the correct pattern in this common case?
I'm starting to think we have hit the limits of MongoDB's capabilities,

and perhaps of the NoSQL document model.


Why not use SQL for my kind of social need?

Well, SQL doesn't scale to billions of geo searches with many joins. That is
why we chose Mongo: it is fast at geo search and covers 80% of what we need,
so this is a big pain for us.
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/b833dbaa-252b-484c-8ccc-32243078a983%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Asya Kamsky
2015-01-10 04:44:31 UTC
I don't understand how this query relates to your sample document:

Most queries are of the form "find x = 3 and whatever = 5 where none of
the profiles are in my blacklist".

{
    "_id" : ObjectId("5491ac5752c5c30b15bdd8b7"),
    "name" : "Eberbach-Woerth",
    "blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....],
    "favorites" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....]
}

What is "x", what is "whatever", and what does "none of the profiles are in
my blacklist" mean?

It would help if you explained what this represents or gave a more complete example.

I see no reason why you wouldn't be able to represent this via a
different schema and probably avoid the $in (list of thousands) type
of query (although I know of plenty of users who make such queries -
they actually perform quite well when appropriately indexed).

What are the entities in your application? And what are the queries
that you have to run? Please be complete.

Right now I know that you have people/profiles and you know something
about their favorites and something about their blacklist - can you
explain those in a bit more detail (I don't understand why they are
represented by an array of numbers, for example).

Asya
petit curieux
2015-01-10 13:26:03 UTC
My current structure is a single collection for finding people to travel
with: people search for others with common interests and characteristics
within some distance of themselves. The blacklist lets you exclude people
from the next search, which is necessary so that you do not always see the
same profiles. Favorites is a list of... favorite people.
The blacklist can grow without limit; in theory you can have 10000 people
in your blacklist. So for each search you have to exclude the blacklisted
people from the search results.

The structure:

{
    "_id" : 1656,
    "Region1" : "Alsace",
    "code_postal" : 67110,
    "ville" : "Eberbach-Woerth",
    "work" : "finance",
    "travel" : "london",
    "age" : "34",
    "origin" : "france",
    "blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....],
    "favorites" : [18,1982,939,1982,98716,7611,983838, and thousands of others ....],
    "hobbies" : ["dancing","dreaming"],
    "loc" : {
        "type" : "Point",
        "coordinates" : [7.72, 48.91]
    }
}

Each number in the blacklist and favorites lists references the _id of
another profile.
When you blacklist someone, you push *your _id* into the blacklist of the
person you are blacklisting. It would be far simpler to put the blacklisted
person *in your own* blacklist,
but the aggregation framework doesn't allow that kind of semi-join query.
The most logical query would be:

Find all _id in the profile collection where the _id is not in my blacklist.
But as far as I know, you can't do that in one query with Mongo.

So this is why the blacklist can get so big: everyone who wants to
blacklist Bob has to put their own _id in Bob's blacklist.
Imagine what happens if 134000 people don't like Bob.
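
The write path described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: the helper name `buildBlacklistUpdate` is hypothetical, and using `$addToSet` (rather than `$push`) is an assumption made here to avoid duplicate ids. The helper only builds the filter and update documents, which could then be passed to something like `db.tablegui.update(filter, update)`.

```javascript
// Build the filter/update pair for "user `myId` blacklists user `targetId`"
// under the inverted schema: my _id is stored in the *target's* blacklist.
function buildBlacklistUpdate(myId, targetId) {
  return {
    filter: { _id: targetId },                 // the profile being blacklisted
    update: { $addToSet: { blacklist: myId } } // record who blacklisted them (assumed $addToSet)
  };
}

// User 14 blacklists profile 1656.
const op = buildBlacklistUpdate(14, 1656);
console.log(JSON.stringify(op));
```

With this layout every blacklisting is a write to the target's document, which is why a widely disliked profile ends up with a very large array.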

I know this logic is very borderline, but it is the only way I have found to
avoid the 2 step query ($in [134000, _id, references, in, this, kind, of,
query, is, unacceptable]).

With my borderline, bad schema I can do it in 1 query and exclude the
people who have my _id in their blacklist. See here: my _id is 14, and I
can exclude 134000 blacklisted people with only one number in the query
payload, like this:


db.tablegui.aggregate([
    {
        $geoNear: {
            near: { type: "Point", coordinates: [2.29963, 48.84167] },
            distanceField: "dist.calculated",
            // GeoJSON distances are in meters; report them in km
            distanceMultiplier: 1/1000,
            // 25 km radius, expressed in meters
            maxDistance: 25 * 1000,
            query: {
                favorites: { $in: [75010,75020,75011,75006,75007] },
                // skip profiles whose blacklist contains my _id (14)
                blacklist: { $nin: [14] }
            },
            limit: 2000,
            spherical: true
        }
    }
])



I hope you understand my schema better now. I just made the only move
possible with Mongo's capabilities.
Now, after some reading, I understand why embedding is not a good idea,
but do you have any alternative?
This is such a common case. I think many people have my bad schema, and
when the problems come as their project grows,
they could be very angry.. :)

I'm very open to your advice or alternative solutions. But please don't
tell me it's not a pb to put 134000 _id values in an $in list for a search;
making a query in 2 steps is not a pb at all,
the pb is the anti-pattern form of the second query. I propose a feature
request to find a solution to this problem; you have to consider that it is
not *my* problem,
but perhaps more of a Mongo problem?!

Mongo is great for many things, and we loved it at first, but if there is no
alternative/solution, we will unfortunately be forced to look for another
option.
Asya Kamsky
2015-01-10 16:54:37 UTC
I think I see what you are trying to do, and I think I finally understand
what the abbreviation "pb" means :) (is it "problem"?)
But please don't tell me it's not a pb to put 134000 _id values in an $in
list for a search; making a query in 2 steps is not a pb at all,
the pb is the anti-pattern form of the second query.
I think maybe you are thinking about two queries somewhat backwards (at
least to the way I'm thinking about it).

I'm not sure whether it's necessary to query for a long list of ids in a
collection of blacklisted or favorite people, but before we even consider
the best schema for this, I would like to draw an analogy first - a
social graph. If you have a Twitter-like service, you will have millions
of people. Some are following a few people, some are following thousands.
Some are being followed by a few people, some are being followed by
millions (celebrities, companies, presidents of big countries, etc).

A colleague of mine and I gave a three part talk about an open source
implementation of a social status feed application at MongoDB World last
year and the recording of part two might be helpful
to you - it's the one that talks about how to represent and query a large
social graph in MongoDB - not surprisingly, embedding a long array of
followers and following into each user profile is *terrible* as it breaks
for the super-users *and* it's not very efficient as on every update you
have to push to an array and you have to have the array indexed and all of
those are horrible for performance (I cover an example of such a
performance problem happening in real life in my other talk I gave at
MongoDB World on performance debugging, this case starts at 22:00).

So if you have listened to the talk on handling a social graph in MongoDB
you can see that your use case is not that different - your users or user
profiles are "nodes" of the graph and your "favorites" relationships are
edges, as are your "blacklist" entries - though to be honest I don't
understand how you can blacklist someone forever if they have been shown -
is this the feature you wanted TTL for? Where you don't show the same
person to me for some period of time after you've already shown them to me?

The thing in your case that makes it more complex is that you have
additional attributes and that's location, interests and some other
characteristics (presumably those are more limited in how many each profile
can have?)

So your person/profile/node is a person, they have a location (geo
coordinates of some kind), array of hobbies/interests, ok, so now you can
have an efficient query by having a compound 2dsphere index on loc,
hobbies.

But your sample query just queries on location and favorites/blacklist -
why isn't hobbies part of the initial filter? Wouldn't that significantly
reduce your initial result set?
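
As a sketch of the suggestion above: the index specification below is a compound 2dsphere index on `loc` and `hobbies`, and the helper adds a hobbies filter to the `$geoNear` stage from the earlier post. The helper name `buildSearchStage` and its parameters are hypothetical, and matching on hobbies with `$in` is an assumption about what "common interests" means in this application.

```javascript
// Compound index spec: geo field plus hobbies, e.g. db.tablegui.createIndex(profileIndex).
const profileIndex = { loc: "2dsphere", hobbies: 1 };

// Build a $geoNear stage that filters on hobbies as well as the blacklist.
// `me` is the searching user's profile document (already loaded by the application).
function buildSearchStage(me, maxKm) {
  return {
    $geoNear: {
      near: me.loc,                      // GeoJSON point of the searching user
      distanceField: "dist.calculated",
      distanceMultiplier: 1 / 1000,      // report distances in km instead of meters
      maxDistance: maxKm * 1000,         // meters, because `near` is GeoJSON
      query: {
        hobbies: { $in: me.hobbies },    // assumed meaning: at least one hobby in common
        blacklist: { $nin: [me._id] }    // inverted blacklist, as in the thread
      },
      spherical: true
    }
  };
}

const me = {
  _id: 14,
  hobbies: ["dancing"],
  loc: { type: "Point", coordinates: [2.29963, 48.84167] }
};
const stage = buildSearchStage(me, 25);
console.log(JSON.stringify(stage, null, 2));
```

The stage would be used as the first element of the pipeline passed to `db.tablegui.aggregate([...])`; how much the hobbies filter narrows the result set depends on the data.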

The other thing that is not clear to me: you say the blacklist is based on
who you already showed them - does it require a click from them (don't show
me this person again) or do you automatically blacklist someone once you
showed them and the user didn't pick them? And for favorites, your user
presumably clicks on something to add someone to their favorites - does
that mean you want to show them next time always, or only if the search
criteria match?

The answers to these questions may suggest a different schema that would be
performant - but you can see, I hope, that without understanding all the
requirements and interactions of the system it's impossible to propose a
"best" schema solution, yes?

Asya
P.S. could you also include the number of users/profiles you need to
support as well as frequency of searches, updates of profiles, etc?
Asya Kamsky
2015-01-10 17:00:29 UTC
it is the only way I have found to avoid the 2 step query ($in [134000,
_id, references, in, this, kind, of, query, is, unacceptable]).

I don't think you need two queries if you store "blacklist" in the document
of the user who blacklisted someone. I.e. if I blacklist 10 people from some
search, you have an array in my document called "blacklist". You don't
have to query for it for the same reason you don't need to query for my
hobbies or userId when you are doing a search for me - you already read my
document when I started my interaction with your application. You read it
in, and it's now available, including my hobbies, my blacklist (which you
can be updating in "real time" as I interact with your site) and my
location, etc.
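
One way to read this, sketched under assumptions: the blacklist lives in the searching user's own document, so the application already holds it and can pass it as `_id: { $nin: ... }` in a single query. The helper name `buildSearchQuery` is hypothetical, and the hobbies filter is only a placeholder for whatever search criteria apply.

```javascript
// Build the search filter from the already-loaded profile of the searching user.
// Assumption: `me.blacklist` holds the _ids of the people *I* blacklisted.
function buildSearchQuery(me) {
  return {
    _id: { $nin: me.blacklist },   // skip everyone I have blacklisted
    hobbies: { $in: me.hobbies }   // placeholder search criterion
  };
}

const me = { _id: 14, hobbies: ["dancing", "dreaming"], blacklist: [18, 1982, 939] };
const query = buildSearchQuery(me);
console.log(JSON.stringify(query));
```

No separate lookup of the blacklist is needed here, because the list arrives with the profile document that the application reads anyway.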

I still don't understand how favorites intersects with query results - once
I mark favorites won't I only see favorites no matter what else matches?
And until I have favorites isn't it the case that no one matches - i.e. no
one satisfies the query of having me in their favorites list??? Anyway,
those are all side points, I just wanted to point out that if you limit the
size of blacklist to most recent 100 or whatever, they will be available as
soon as you read my profile document and don't have to be queried for.
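
Keeping only the most recent entries can be done at write time. A minimal sketch, assuming the blacklist is stored in the blacklisting user's own document: MongoDB's `$push` with the `$each` and `$slice` modifiers trims the array to the last N elements on each update. The helper name `buildCappedBlacklistUpdate` and the default of 100 are illustrative only.

```javascript
// Build an update that appends `targetId` to my blacklist and keeps only the newest `max` ids.
function buildCappedBlacklistUpdate(myId, targetId, max = 100) {
  return {
    filter: { _id: myId },
    update: {
      $push: {
        blacklist: {
          $each: [targetId], // $slice can only be used together with $each
          $slice: -max       // a negative value keeps the last `max` elements
        }
      }
    }
  };
}

// User 14 blacklists profile 1656; at most 100 ids are retained.
const op = buildCappedBlacklistUpdate(14, 1656);
console.log(JSON.stringify(op));
```

The filter and update documents could then be passed to something like `db.tablegui.update(op.filter, op.update)`; the array never grows past the chosen bound, so the profile document stays small.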

Asya
Post by petit curieux
My current structure is a single collection for finding travel companions:
people search for other people with common interests and characteristics
within some distance of themselves. The blacklist excludes people from the
next search, which is necessary so that you don't always see the same
profiles. Favorites is a list of favorite people.
The blacklist can grow without limit; in theory you can have 10000 people
in your blacklist. So for each search you have to exclude the blacklisted
people from the search results.
{
  "_id" : 1656,
  "Region1" : "Alsace",
  "code_postal" : 67110,
  "ville" : "Eberbach-Woerth",
  "work" : "finance",
  "travel" : "london",
  "age" : "34",
  "origin" : "france",
  "blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands others ....],
  "favorites" : [18,1982,939,1982,98716,7611,983838, and thousands others ....],
  "hobbies" : ["dancing","dreaming"],
  "loc" : {
    "type" : "Point",
    "coordinates" : [7.72, 48.91]
  }
}
Each number in the blacklist and favorites lists is a profile _ID.
When you blacklist someone, you push *your _ID* into the blacklist of the
person you blacklisted. It would be far simpler to put the blacklisted
person *in your* blacklist, but the aggregation framework doesn't allow a
semi-join query of the kind "find all _id in the profile collection where
_id is not in my blacklist". As far as I know, you can't do that in one
query with Mongo.
This is why the blacklist can get so big: everyone who wants to blacklist
Bob has to put their own _ID in Bob's blacklist.
Imagine what happens if 134000 people don't like Bob.
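
To make the two layouts concrete, here is a small sketch of the update each one would issue when user 14 blocks user 1656. The helper names blockInverted and blockDirect are hypothetical; $addToSet is used so the same _ID is not stored twice.

```javascript
// Inverted scheme described above: my _id is stored in the blocked
// person's document, so a widely blocked profile accumulates a huge array.
function blockInverted(myId, blockedId) {
  return {
    filter: { _id: blockedId },
    update: { $addToSet: { blacklist: myId } }
  };
}

// Direct scheme: the blocked _id is stored in my own document, so the
// array size depends only on how many people I have blocked myself.
function blockDirect(myId, blockedId) {
  return {
    filter: { _id: myId },
    update: { $addToSet: { blacklist: blockedId } }
  };
}

const inverted = blockInverted(14, 1656);
const direct = blockDirect(14, 1656);
// In the mongo shell, for example:
//   db.tablegui.update(inverted.filter, inverted.update)
```

In the inverted scheme Bob's array has one entry per person who blocked him (134000 in the example above); in the direct scheme that growth is spread over the documents of the people doing the blocking.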
I know this logic is very borderline, but it is the only way I found to
avoid the 2-step query ($in [134000, _ID, reference, in, this, kind, of,
query, is, unacceptable]).
With my borderline, bad schema I can do it in one query and exclude the
blacklisted people whose blacklist contains my _ID. See here: my _ID is
14, and I can exclude 134000 blacklisted guys with only one number in the
query:
db.tablegui.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [2.29963, 48.84167] },
      distanceField: "dist.calculated",
      distanceMultiplier: 1/1000,   // metres -> kilometres
      maxDistance: 25 * 1000,       // 25 km, expressed in metres
      query: {
        favorites: { $in: [75010,75020,75011,75006,75007] },
        blacklist: { $nin: [14] }   // skip profiles whose blacklist holds my _ID
      },
      limit: 2000,
      spherical: true
    }
  }
])
I hope you understand my schema better now. I just made the only move
possible with Mongo's capabilities.
After some reading, I understand why embedding is not a good idea, but do
you have an alternative?
This is such a common case. I think many people have my bad schema, and
when problems come as their project grows, they could be very angry.. :)
I'm very open to your advice or alternative solutions. But please don't
tell me it's not a problem to put 134000 _IDs in an $in list for a search.
Making a query in 2 steps is not a problem at all; the problem is the
unpatterned form of the second query. I'm proposing a feature request to
find a solution to this problem. You have to consider that it's not *my*
problem, but perhaps more of a Mongo problem?!
Mongo is great for many things and we loved it at first, but if there is
no alternative or solution, we will unfortunately be forced to look
elsewhere.
petit curieux
2015-01-10 17:41:23 UTC
Permalink
Thanks Asya, but before I answer I'd like to see your talk. The link seems
broken; can you give the correct link, please?
petit curieux
2015-01-10 22:37:20 UTC
Permalink
OK, I finally found your talk by myself. It's interesting, but your
alternatives are more or less hacks rather than real solutions. As you
pointed out, MongoDB is not a graph database.
I'm not going to say more about my pattern problem now, because in the end
you will not be able to offer a solution.
To be clear, the problem is not you: you tried to help, and many thanks for
that. The problem is MongoDB's limits.

On this point, from my point of view, you should be clearer and admit that
there are many things we can't do with Mongo (currently, at least). It's
not always a client problem of a bad pattern or a bad choice. The other
possibility is simply that Mongo is not mature.

I finally found a complete solution with Elasticsearch, which provides
exactly what is needed: a path to retrieve array items stored in another
'collection'. That resolves two problems at once, the 2-step query and the
fragmentation (write concern). Please look at this feature and seriously
consider adding it to your aggregation framework:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-ids-filter.html

Honestly, I have to say I didn't much like the developers' attitude toward
some feature requests and problem reports. For this reason I will not post
this feature request in the MongoDB Jira.
For example, in the Jira feature list the bitwise operator feature request
is essential, yet it is tagged trivial, and trivial means easy to add.
Now you're fighting with the new tiger in your garden rather than
implementing the easy bitwise operator that tons of people are waiting for.

Anyway, thanks for your help (the talk was great). I will keep an eye on
how Mongo evolves.

Regards
petit curieux
2015-01-10 22:41:23 UTC
Permalink
My link was wrong. Take a look at this one, it's just fantastic:
http://www.elasticsearch.org/blog/terms-filter-lookup/
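
For readers who don't follow the link, a rough sketch of what a terms lookup looks like: the list of values to match is read from a field of another document, so the client never sends the long list itself. The request body below uses the terms query syntax of recent Elasticsearch versions, which differs from the filter syntax of the linked 2013 post. The index name blacklists and the field names profile_id, blocked and loc are assumptions made for this example only.

```javascript
// Hypothetical helper: builds an Elasticsearch search body that finds
// profiles near `center` and excludes those listed in my blacklist document.
function buildEsSearchBody(myId, center, maxKm) {
  return {
    query: {
      bool: {
        filter: [
          {
            geo_distance: {
              distance: maxKm + "km",
              loc: { lat: center.lat, lon: center.lon }
            }
          }
        ],
        must_not: [
          {
            // Terms lookup: the values come from the `blocked` array of the
            // document with this id in the (assumed) `blacklists` index.
            terms: {
              profile_id: {
                index: "blacklists",
                id: String(myId),
                path: "blocked"
              }
            }
          }
        ]
      }
    }
  };
}

const body = buildEsSearchBody(14, { lat: 48.84167, lon: 2.29963 }, 25);
// `body` would be sent as the JSON payload of a search request.
```

The request stays the same size no matter how many _IDs the blacklist document holds, which is the property asked for earlier in this thread.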
s.molinari
2015-01-11 18:51:06 UTC
Permalink
Even with my limited knowledge, I think you are denormalizing when it
isn't proper to do so.

Also, the MongoDB staff have to be one of the most honest groups of folks I
know, especially when it comes to their own technology. If you want to do
something improper with Mongo and it simply won't work, they will be the
first to tell you.

Scott
Asya Kamsky
2015-01-12 00:36:15 UTC
Permalink
Sorry this ended up being long, but I wanted to address a number of points
you brought up.

First, you are certainly entitled to your opinion, but I happen to strongly
disagree with this statement:

" ur alternatives are more or less some hack rather real solutions."

I think you have a preconceived notion of what the solution *should* look
like and anything that looks different you label as a "hack".

That's not what makes something a hack. A hack would be using something
*not* the way it was designed or intended to work - MongoDB is a database -
it's meant to support many different use cases, and different schemas and
approaches will work better or worse depending on requirements.

It's a common misconception that things must work a specific way or else
MongoDB isn't being used correctly, but while there are definitely
anti-patterns, there are almost always multiple ways to solve a problem.

And while there are plenty of use cases MongoDB is not a good match for,
that's not necessarily because it's immature, but because it's the wrong
tool. Oracle is mature, but it's not the right tool for some use cases
either!

And commenting on Jira tickets is precisely how we know what capabilities
are important to users and *why*.

But I'm glad you brought up the $bitwise operator ticket - you mention in
it storing 30 values turned into numbers and crammed into a single number
as bits. *That* is a hack, in my opinion. If you have 30 possible values,
a non-hack way would be to store them in an array and then you can index
it, you can query it, you *don't* have to store a separate lookup for it,
and it's actually readable and maintainable. Performance optimization of
collapsing it into a single number that you then have to set and query the
bits of seems like the hack here.
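
To make the contrast concrete, here is a minimal sketch in plain JavaScript
(runnable in Node, no MongoDB server needed). The hobby names, the
`HOBBY_BITS` table and the helper names are hypothetical and only there for
illustration. The one MongoDB-specific piece is the `$all` query operator
applied to an array field.

```javascript
// Hypothetical lookup table that the bitmask approach forces you to maintain.
const HOBBY_BITS = { dancing: 0, dreaming: 1, hiking: 2, chess: 3 };

// Bitmask encoding: compact, but unreadable without HOBBY_BITS.
function encodeHobbies(hobbies) {
  return hobbies.reduce((mask, h) => mask | (1 << HOBBY_BITS[h]), 0);
}

// Checking "has all of these hobbies" on the bitmask needs a bitwise test.
function maskHasAll(mask, hobbies) {
  const wanted = encodeHobbies(hobbies);
  return (mask & wanted) === wanted;
}

// Array approach: the document stores the values themselves ...
const profile = { _id: 1656, hobbies: ["dancing", "dreaming"] };

// ... and the query is an ordinary filter document using $all.
function hobbiesFilter(hobbies) {
  return { hobbies: { $all: hobbies } };
}

const mask = encodeHobbies(profile.hobbies);
console.log(mask);                            // 3
console.log(maskHasAll(mask, ["dancing"]));   // true
console.log(JSON.stringify(hobbiesFilter(["dancing", "dreaming"])));
// {"hobbies":{"$all":["dancing","dreaming"]}}
```

In the mongo shell the filter document would be passed to `find()`, and an
ordinary index on `hobbies` becomes a multikey index because the field holds
an array.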

And speaking of things that may seem like a good idea but sometimes end up
not so good: you link to a feature in Elasticsearch that looks a lot like
supporting a join to me - you are making a query of one table but your
condition is based on a value in another table, so you do a sub-query of
sorts, on the server side.

There is no problem with this UNTIL YOU NEED TO SCALE. I don't know how ES
scales up, but MongoDB scales via sharding and the problem with an approach
like this is that it won't work sharded - your "server-side" lookup may
suddenly need to query a completely different shard than the one you are
querying (or multiple shards, each of which then needs to look up this list
of terms!)
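
The sharding concern can be illustrated with a toy model. Everything in this
sketch is an assumption made for illustration (placement by `_id` modulo the
shard count, three shards, the sample ids); it is not how a real cluster
routes queries. It only shows that the document holding the list usually
lives on a different shard than most of the documents being filtered.

```javascript
// Toy model: documents are placed on a shard by _id modulo the shard count.
// Real MongoDB clusters use ranged or hashed shard keys; this is only an illustration.
const SHARD_COUNT = 3;
const shardOf = (id) => id % SHARD_COUNT;

// Hypothetical profiles spread over the shards.
const profileIds = [10, 11, 12, 13, 14, 15];

// The user whose blacklist would be looked up "server-side".
const me = { _id: 14, blacklist: [11, 12] };

// Shards that hold candidate profiles and therefore must run the query.
const queriedShards = [...new Set(profileIds.map(shardOf))].sort();

// Shards that must fetch the list from another shard before they can filter.
const homeShard = shardOf(me._id);
const remoteLookups = queriedShards.filter((s) => s !== homeShard);

console.log("list lives on shard", homeShard);                     // 2
console.log("shards needing a cross-shard lookup:", remoteLookups); // [ 0, 1 ]
```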

One of MongoDB's architecture/design principles says everything we add to
the system *must* work sharded, because there is nothing worse than using a
database and then, when you need to scale it because things are going great,
you find out that a bunch of features you are relying on suddenly stop
working! Or they work but are suddenly dog-slow!

Funny thing is, that we _do_ have a future plan to support joins (in the
long term), but it *has* to be support that works sharded, which means it
will be restricted in certain ways (like you can only join things on the
same shard) and that means we have to implement a number of features as a
prerequisite to being able to support joins in a sharded MongoDB cluster.

Asya

P.S. the $bitwise query operator request was filed as "P5 Trivial" by the
original creator of the Jira ticket.
petit curieux
2015-01-14 01:57:04 UTC
Permalink
Well, I guess we could debate for years about what is or isn't a hack.

My arguments about the value of a bitwise operator, which I posted here
https://jira.mongodb.org/browse/SERVER-3518?jql=text%20~%20%22bitwise%22
have not changed, and any serious developer will tell you the same.
Say you have 50 fields with 50 choices each: if posting a query with 2500
values is OK for you, I have nothing to add.

This feature has 110 votes, 73 watchers and many comments. It is an
important feature, yet it is planned but not scheduled
<https://jira.mongodb.org/browse/SERVER/fixforversion/10110>!

Anyway, focusing on the real problem, the fact is:
*Mongo cannot scale my use case, and it is a common (and very interesting)
use case:*

Embedding a growing array doesn't scale, OK. My blacklist has 5000 items and
needs to be part of every profile search query, OK. Posting 5000 items in
the list of the second query is unacceptable, OK.
So is that it? Do I have to choose another product?
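
For reference, the two-step flow being objected to can be sketched as
follows. This is plain JavaScript that only builds the query documents, so it
runs in Node without a server. The field names are taken from the sample
documents in this thread; `buildSearchPipeline` and its parameters are
hypothetical. The `$geoNear` options mirror the pipeline posted earlier,
except that the result cap is expressed as a separate `$limit` stage.

```javascript
// Step 1 (assumed): the caller has already read the user's own document,
// e.g. { _id: 14, blacklist: [18, 1982, 939] }.

// Step 2: build a $geoNear pipeline that excludes every blacklisted _id.
function buildSearchPipeline(user, center, maxMeters, criteria) {
  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: center },
        distanceField: "dist.calculated",
        maxDistance: maxMeters,
        spherical: true,
        // The whole blacklist travels with the query; this is the payload
        // size being debated in this thread.
        query: Object.assign({}, criteria, { _id: { $nin: user.blacklist } }),
      },
    },
    { $limit: 2000 },
  ];
}

const user = { _id: 14, blacklist: [18, 1982, 939] };
const pipeline = buildSearchPipeline(user, [2.29963, 48.84167], 25 * 1000, {
  travel: "london",
});
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell the result would be passed to
`db.tablegui.aggregate(pipeline)`, `tablegui` being the collection name used
earlier in the thread.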

Well, when a problem comes up and it affects many use cases, we generally
try to find a solution.
Here are some alternatives that could solve, or at least attenuate, the
problem:

1) A capped collection at the field level, meaning the blacklist array
trims itself; an array restricted to a fixed number of items, if you prefer.
2) A TTL at the field level, which is basically the same kind of thing as 1).
3) A batching process that performs the two steps locally and avoids
sending 5000 items in a post; a kind of stored procedure.
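
Proposal 1 resembles something the documented update operators can already
express: `$push` combined with the `$each` and `$slice` modifiers keeps only
the last N elements of an array. A minimal sketch follows. Both helper names
are hypothetical, the cap values are arbitrary examples, and
`applyCappedPush` is only a local simulation of the effect, not a MongoDB
call.

```javascript
// Build an update document that appends one id and keeps only the newest
// `cap` entries (cap is assumed to be >= 1).
function cappedPushUpdate(field, id, cap) {
  return { $push: { [field]: { $each: [id], $slice: -cap } } };
}

// Local simulation of the effect on an in-memory array (not a MongoDB call).
function applyCappedPush(arr, id, cap) {
  return arr.concat([id]).slice(-cap);
}

console.log(JSON.stringify(cappedPushUpdate("blacklist", 7611, 100)));
// {"$push":{"blacklist":{"$each":[7611],"$slice":-100}}}
console.log(applyCappedPush([18, 1982, 939], 7611, 3));
// [ 1982, 939, 7611 ]
```

In the mongo shell this would be used as
`db.tablegui.update({ _id: 14 }, cappedPushUpdate("blacklist", 7611, 100))`.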

I am proposing valuable feature requests for a real problem, and you are
simply not considering them...

PS: the path feature of Elasticsearch that solves the problem is not a real
join; it just points at an array. This kind of filter could be part of the
first filter of the aggregation framework, for example.
You only need to know on which shard to look for the array. :)
Post by Asya Kamsky
Sorry this ended up being long, but I wanted to address a number of points
you brought up.
First, you are certainly entitled to your opinion, but I happen to
" ur alternatives are more or less some hack rather real solutions."
I think you have a preconceived notion of what the solution *should* look
like and anything that looks different you label as a "hack".
That's not what makes something a hack. A hack would be using something
*not* the way it was designed or intended to work - MongoDB is a database -
it's meant to support many different use cases, and different schemas and
approaches will work better or worse depending on requirements.
It's a common misconception that things must work a specific way or else
MongoDB isn't being used correctly, but while there are definitely
anti-patterns, there are almost always multiple ways to solve a problem.
And while there are plenty of use cases MongoDB is not a good match for,
it's not necessarily because it's immature, but because it's the wrong
tool. Oracle is mature but it's not the right tool for some use cases either!
And commenting on Jira tickets is precisely how we know what capabilities
are important to users and *why*.
But I'm glad you brought up the $bitwise operator ticket - you mention in
it storing 30 values turned into numbers and crammed into a single number
as bits. *That* is a hack, in my opinion. If you have 30 possible values,
a non-hack way would be to store them in an array and then you can index
it, you can query it, you *don't* have to store a separate lookup for it,
and it's actually readable and maintainable. Performance optimization of
collapsing it into a single number that you then have to set and query the
bits of seems like the hack here.
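To make the contrast concrete, here is a minimal JavaScript sketch of the two representations discussed above. The option names and the helpers `bitFor`, `packFlags` and `hasFlag` are hypothetical, made up for illustration; they are not taken from the Jira ticket.

```javascript
// Hypothetical option names; the ticket does not list the real values.
const OPTIONS = ["hiking", "museums", "food", "nightlife"];

// Array form: store the chosen values directly in the document.
const asArray = { _id: 14, interests: ["hiking", "food"] };

// Map an option name to its bit; unknown names are rejected.
function bitFor(value) {
  const index = OPTIONS.indexOf(value);
  if (index < 0) throw new Error("unknown option: " + value);
  return 1 << index;
}

// Bit-packed form: one bit per option, collapsed into a single number.
function packFlags(values) {
  return values.reduce((mask, v) => mask | bitFor(v), 0);
}

// Reading a packed value back needs the lookup table and bit arithmetic.
function hasFlag(mask, value) {
  return (mask & bitFor(value)) !== 0;
}

const asBits = { _id: 14, interests: packFlags(["hiking", "food"]) };

console.log(asArray.interests.includes("food"));   // plain membership check
console.log(asBits.interests, hasFlag(asBits.interests, "food"));
```

The array form can be matched with an ordinary equality query on `interests` and covered by a multikey index, which is the readability and maintainability argument made above; the packed form needs the separate `OPTIONS` lookup to mean anything.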
And speaking of things that may seem like a good idea but sometimes end up
not so good: you link to a feature in Elasticsearch that looks a lot like
supporting a join to me - you are making a query of one table but your
condition is based on a value in another table, so you do a sub-query of
sorts, on the server side.
There is no problem with this UNTIL YOU NEED TO SCALE. I don't know how
ES scales up, but MongoDB scales via sharding and the problem with an
approach like this is that it won't work sharded - your "server-side"
lookup may suddenly need to query a completely different shard than the one
you are querying (or multiple shards, each of which then needs to look up
this list of terms!)
One of MongoDB's architecture/design principles says everything we add to
the system *must* work sharded, because there is nothing worse than using a
database and then when you need to scale it because things are going great
you find out that a bunch of features you are relying on suddenly stop
working! Or they work but are suddenly dog-slow!
Funny thing is, that we _do_ have a future plan to support joins (in the
long term), but it *has* to be support that works sharded, which means it
will be restricted in certain ways (like you can only join things on the
same shard) and that means we have to implement a number of features as a
prerequisite to being able to support joins in a sharded MongoDB cluster.
Asya
P.S. the $bitwise query operator request was filed as "P5 Trivial" by the
original creator of the Jira ticket.
Post by petit curieux
OK, I finally found your talk by myself. It's interesting, but your alternatives
are more or less hacks rather than real solutions. As you pointed out,
MongoDB is not a graph database.
I'm not going to say more about my pattern problem now, because in the end you
will not be able to offer solutions.
I mean, the problem is certainly not you, you tried to help and many thanks
for that; the problem is MongoDB's limits.
On this point, from my point of view, you should be clearer and
admit that there are many things we can't do with Mongo (currently, at least). It's not
always a client problem of bad patterns and bad choices. Another possibility
may simply be that Mongo is not mature.
I finally found a complete solution with Elasticsearch, which provides exactly
what is needed: a path to retrieve array items in another 'collection'.
That simply resolves two problems, the two-step query and the fragmentation (write
concern). Please look at this feature and seriously think about adding this to your
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-ids-filter.html
Honestly, I have to say that I didn't much like the developers'
attitude toward some feature requests and problem reports. For
this reason, I will not post this feature request in the MongoDB
Jira.
You know, look at the Jira feature list: for example, the bitwise operator
feature request is essential and is tagged as trivial, yes, trivial, which means
easy to add.
Now you are fighting with the new tiger in your garden rather than implementing
the easy bitwise operator that tons of people are waiting for.
Anyway, thanks for your help (and the talk was great). I will keep an
eye on how Mongo evolves in the future.
Regards
Post by petit curieux
Thanks Asya, but before I answer I'd like to see your talk. The link seems
broken, can you give the correct link please?
this is the only one I found to avoid the two-step query ($in [134000,
_ID, reference, in, this, kind, of, query, is, unacceptable]).
I don't think you need two queries if you store "blacklist" in the
user's document who blacklisted someone. I.e. if I blacklist 10 people
from some search, you have an array in my document called "blacklist".
You don't have to query for it for the same reason you don't need to query
for my hobbies or userId when you are doing a search for me - you already
read my document when I started my interaction with your application. You
read it in, and it's now available, including my hobbies, my blacklist
(which you can be updating in "real time" as I interact with your site) and
my location, etc.
I still don't understand how favorites intersects with query results -
once I mark favorites won't I only see favorites no matter what else
matches? And until I have favorites isn't it the case that no one matches
- i.e. no one satisfies the query of having me in their favorites list???
Anyway, those are all side points, I just wanted to point out that if you
limit the size of blacklist to most recent 100 or whatever, they will be
available as soon as you read my profile document and don't have to be
queried for.
Asya
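A minimal sketch of the schema suggested above, assuming each profile stores the ids *it* has blacklisted and that the application has already loaded the current user's document. The function name `buildSearchFilter`, the sample values, and the choice to also exclude the user's own `_id` are illustrative assumptions, not something stated in the thread.

```javascript
// The current user's profile, already read when the session started.
const me = { _id: 14, hobbies: ["dancing"], blacklist: [18, 1982, 939] };

// Build the search filter from data already in memory: no extra query
// is needed to fetch the blacklist.
function buildSearchFilter(user, criteria) {
  return Object.assign({}, criteria, {
    // exclude everyone I blacklisted (and, as an assumption, myself)
    _id: { $nin: user.blacklist.concat([user._id]) },
  });
}

const filter = buildSearchFilter(me, { travel: "london", work: "finance" });
console.log(JSON.stringify(filter));
// In the mongo shell this object could be passed to db.tablegui.find(filter)
// or used as the `query` option of a $geoNear stage.
```

With this layout the array grows with the number of people one user blocks, not with the number of users who block one profile.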
My current structure is one collection for finding people to travel with,
so people search for people with common interests and characteristics within some
distance of themselves. The blacklist lets you exclude people from the next
search, which is necessary so you do not always see the same profiles. Favorites
is a list of... favorite people.
The blacklist has no growth limit; in theory you can have 10000
people in your blacklist. So for each search you have to exclude blacklisted
people from the search result.
{
"_id" : 1656,
"Region1" : "Alsace",
"code_postal" : 67110,
"ville" : "Eberbach-Woerth",
"work" : "finance",
"travel" : "london",
"age" : "34" ,
"origin" : "france",
"blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands
others ....],
"favorites" : [18,1982,939,1982,98716,7611,983838, and thousands
others ....],
"hobbies" : ["dancing","dreaming"],
"loc" : {
"type" : "Point",
"coordinates" : [
7.72,
48.91
]
}
}
Each number in the blacklist and favorites lists is the _ID of
a blacklisted (or favorite) person.
When you blacklist someone, you push *your _ID* into the blacklist
of the blacklisted person. It would be far simpler to put the
blacklisted person *in your* blacklist,
but the aggregation framework doesn't allow that kind of semi-join:
"find all _id in the profile collection where _id is not in my blacklist".
As far as I know, you can't do that in one query with Mongo.
So this is why the blacklist can get so big:
everyone who wants to blacklist Bob has to put their own _ID in
Bob's blacklist.
Imagine what happens now if 134000 people don't like Bob.
I know this logic is very borderline, but it is the only way I found
to avoid the two-step query ($in [134000, _ID, reference, in, this,
kind, of, query, is, unacceptable]).
With my borderline bad schema I can search in one query and exclude
blacklisted people where my _ID is in their blacklist. See here: my _ID is
14, and I can exclude 134000 blacklisted people with only one number in the
db.tablegui.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [2.29963, 48.84167] },
      distanceField: "dist.calculated",
      distanceMultiplier: 1/1000,  // report distances in km instead of metres
      maxDistance: 25 * 1000,      // 25 km, expressed in metres
      query: {
        favorites: { $in: [75010, 75020, 75011, 75006, 75007] },
        blacklist: { $nin: [14] }  // skip profiles that have my _ID (14) in their blacklist
      },
      limit: 2000,
      spherical: true
    }
  }
])
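The inverted layout described above can be imitated in plain JavaScript to see where the array growth comes from. This is only an in-memory sketch: the sample profiles and the helpers `blacklistProfile` and `search` are hypothetical, and the `$addToSet` update in the comment is one possible way to write the push, not necessarily what the poster uses.

```javascript
// Tiny in-memory stand-in for the profile collection (illustrative data).
const profiles = [
  { _id: 1656, ville: "Eberbach-Woerth", blacklist: [] },
  { _id: 1700, ville: "Strasbourg", blacklist: [] },
];

// "I blacklist them": my _id is pushed into THEIR blacklist array.
// A shell update along these lines would do the same on the server:
//   db.tablegui.update({ _id: theirId }, { $addToSet: { blacklist: myId } })
function blacklistProfile(myId, theirId) {
  const target = profiles.find((p) => p._id === theirId);
  if (target && !target.blacklist.includes(myId)) target.blacklist.push(myId);
}

// Search: keep profiles whose blacklist does not contain my _id,
// mirroring the { blacklist: { $nin: [14] } } clause.
function search(myId) {
  return profiles.filter((p) => !p.blacklist.includes(myId)).map((p) => p._id);
}

blacklistProfile(14, 1656);
console.log(search(14)); // profile 1656 is no longer returned for user 14
console.log(search(99)); // other users still see both profiles
```

Each call to `blacklistProfile` grows the *target's* array, so a profile that many people exclude accumulates one entry per excluding user.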
I hope you understand my schema better now. I just made the only move
possible with Mongo's capabilities.
Now, after some reading, I understand why it's not a good idea to embed,
but do you have any alternative?
This is such a common case. I think many people have my bad schema, and when
problems come as their project grows,
they could be very angry. :)
I'm very open to your advice or alternative solutions. But please
don't tell me it's not a problem to put 134000 _IDs in the $in list of a search. Making a
query in two steps is not a problem at all;
the problem is the anti-pattern form of the second query. I am proposing some
feature requests to find a solution to this problem; you have to consider that
it's not *my* problem,
but perhaps more of a Mongo problem?!
Mongo is great for many things and first of all we love it, but if there is no
alternative/solution, we will unfortunately be forced to look for another
one.
Post by petit curieux
Most queries are "find x = 3 and whatever = 5 where the profiles are
not in my blacklist"
{
"_id" : ObjectId("5491ac5752c5c30b15bdd8b7"),
"name" : "Eberbach-Woerth",
"blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands
others ....],
"favorites" : [18,1982,939,1982,98716,7611,983838, and thousands others ....]
}
What is "x", what is "whatever" and what are "profile are not in my blacklist"?
Maybe if you explain what this represents or give a more complete example.
I see no reason why you wouldn't be able to represent this via a
different schema and probably avoid the $in (list of thousands) type
of query (although I know of plenty of users who make such queries -
they actually perform quite well when appropriately indexed).
What are the entities in your application? And what are the queries
that you have to run? Please be complete.
Right now I know that you have people/profiles and you know something
about their favorites and something about their blacklist - can you
explain those in a bit more detail (I don't understand why they are
represented by an array of numbers, for example).
Asya
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/e046d7b3-d39a-4d32-9f06-5ce2502411e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
s.molinari
2015-01-14 05:23:54 UTC
Permalink
But if the array is on a single shard, and you need that array for the
majority of queries going to all shards, you've lost at least some of the
reason why you've sharded in the first place.

I'd personally like to understand the requirement of the blacklist. Can you
explain that more? Why can't you leave it in memory/ your application and
just programmatically compare the two lists?

Scott
petit curieux
2015-01-15 10:53:52 UTC
Permalink
@Scott I already explained what the blacklist does: you get the result of a
profile search, you click on profiles to blacklist them, and they should not
appear in the next search.
I have a Redis layer to avoid updating every time.
You cannot compare a result array with an in-memory list, because items must
be excluded at the search level.
Imagine you have blacklisted profile ids 1 to 300 and each search is limited to
500 results: with compare-the-lists logic you can lose up to 300 ids from your
result, so the search is incorrect, etc.
Blacklisted items must be excluded in real time, during the search.
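The difference between the two orders of operation can be shown with a small simulation. The candidate list of 1000 ids is an assumption added for illustration; the blacklist of ids 1 to 300 and the limit of 500 are the numbers from the example above.

```javascript
// 1000 candidate profile ids, in the order the search would return them.
const candidates = Array.from({ length: 1000 }, (_, i) => i + 1);
// Ids 1..300 are blacklisted.
const blacklisted = new Set(Array.from({ length: 300 }, (_, i) => i + 1));
const LIMIT = 500;

// Client-side comparison: apply the limit first, then drop blacklisted ids.
const filteredAfterLimit = candidates
  .slice(0, LIMIT)
  .filter((id) => !blacklisted.has(id));

// Search-level exclusion: drop blacklisted ids first, then apply the limit.
const filteredBeforeLimit = candidates
  .filter((id) => !blacklisted.has(id))
  .slice(0, LIMIT);

console.log(filteredAfterLimit.length);  // 200
console.log(filteredBeforeLimit.length); // 500
```

Filtering after the limit leaves a short page even though 700 valid candidates exist, which is why the exclusion has to be part of the query itself.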

The only thing you can do if you want to avoid two queries is to cache the
blacklist array in memcached or Redis, but if your list has 2000 items this
is not worthwhile; a second query
is better from a performance point of view.
s.molinari
2015-01-15 13:32:27 UTC
Permalink
How about a "mark as read function"? The user gets shown the "old" profiles
and the "new" profiles, which are denoted differently in some way, like
make the name of the user in the profile list bold. This way, you don't
have the issue of having to not show the "blacklisted" or "read" profiles.
You just show them differently.

Scott
petit curieux
2015-01-15 18:42:30 UTC
Permalink
Thanks for the suggestion, but that would change the functionality. I still
hope Asya or another Mongo expert has a solution to this issue.
Asya Kamsky
2015-01-16 01:29:51 UTC
Permalink
Post by petit curieux
Embed growing array doesn't scale, ok.
True.
Post by petit curieux
My blacklist have 5000 items and need to be a part of each profiles query
search, ok. Posting 5000 items in a list of the second query is
unacceptable, ok .

Why is it unacceptable? It's actually not a big deal at all if it's
properly indexed.
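For a rough sense of scale, the sketch below estimates the BSON size of an array of 5000 ids. It assumes the ids are encoded as 32-bit integers (numbers typed directly in the mongo shell are doubles by default, which take 8 bytes each instead of 4), and the function name `bsonInt32ArraySize` is made up for this illustration. It follows the BSON layout in which an array is a document whose keys are the element indexes written as strings.

```javascript
function bsonInt32ArraySize(n) {
  let bytes = 4 + 1; // int32 length prefix + trailing 0x00 terminator
  for (let i = 0; i < n; i++) {
    bytes += 1;                    // element type byte (0x10 = int32)
    bytes += String(i).length + 1; // array index as a NUL-terminated key
    bytes += 4;                    // the int32 value itself
  }
  return bytes;
}

console.log(bsonInt32ArraySize(5000)); // 48895 bytes, i.e. under 50 KB
```

So the list itself is on the order of tens of kilobytes per query; whether that is acceptable depends on query rate and network conditions, which this sketch does not measure.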
Post by petit curieux
1) a capped collection at field level , thats mean the blacklist array
reduce by itself , a restrict numbers items array if you prefer

This feature already exists. It's called capped arrays but when it was
pointed out to you, I thought you said it was a hack and not a real
solution. It *is* a real solution and it was meant to be used in exactly
the type of case you describe - you have an array you keep adding to but
you only need to keep the last 10, 100, 1000, however many elements.
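As a sketch of the capped-array pattern described above: `$push` accepts the `$each` and `$slice` modifiers, and a negative `$slice` keeps only the last N elements. The cap of 1000, the `cappedPush` helper, and `pushCapped` (which only imitates the effect in plain JavaScript) are assumptions for illustration.

```javascript
const CAP = 1000; // hypothetical limit on the blacklist length

// Update document one could send from the mongo shell, for example:
//   db.tablegui.update({ _id: 14 }, cappedPush(1656))
function cappedPush(blockedId) {
  return { $push: { blacklist: { $each: [blockedId], $slice: -CAP } } };
}

// Plain-JavaScript imitation of what that update does to the array:
// append the new values, then keep only the last `cap` elements.
function pushCapped(array, values, cap) {
  if (cap <= 0) return [];
  return array.concat(values).slice(-cap);
}

let blacklist = Array.from({ length: 1000 }, (_, i) => i); // already at the cap
blacklist = pushCapped(blacklist, [1656], CAP);
console.log(blacklist.length, blacklist[0], blacklist[blacklist.length - 1]);
```

The oldest entry falls off as the new one is appended, so the array (and the document holding it) stops growing once it reaches the cap.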

And again, 3) can't work sharded so that's not really an option since it
will limit scaling.

Asya
Post by petit curieux
Well , i guess we can debate several years to define whats is or not a
hack ..
My arguments i answer here
https://jira.mongodb.org/browse/SERVER-3518?jql=text%20~%20%22bitwise%22
concerning the interest of bitwise operator doesn't change , and any
serious devellopers will tell you the same..
You have 50 fields with 50 choices , if posting a query with 2500 values
is ok for you i have nothing to add..
This feature have 110 votes, 73 watchers , many comments.. thats an
important feature but its planned but not scheduled
<https://jira.mongodb.org/browse/SERVER/fixforversion/10110> !
* mongo cannot scale my problematic and its a common ( and very
interesting ) problematic:*
Embed growing array doesn't scale, ok. My blacklist have 5000 items and
need to be a part of each profiles query search, ok. Posting 5000 items in
a list of the second query is unacceptable, ok .
So its finish? i have to choose another product ?
well , when a pb comes and it touch many cases , generally we try to find
a solution..
1) a capped collection at field level , thats mean the blacklist array
reduce by itself , a restrict numbers items array if you prefer
2) TTL at field level if basicly the same kind of 1)
3) a batching process to perform the 2 step localy and avoid the the yield
of 5000 items in a post.. a kind of stored procedure
I propose you some valuable feature request for a real problem and you
simply not concidering them...
PS: the path feature of elasticsearch who solve the pb is not a real join
, i mean that just point on an array, this kind of filter could be a part
of the first filter of the agregation framework for example..
you only need to know on witch shard you have to search the array ..:)
Post by Asya Kamsky
Sorry this ended up being long, but I wanted to address a number of
points you brought up.
First, you are certainly entitled to your opinion, but I happen to
" ur alternatives are more or less some hack rather real solutions."
I think you have a preconceived notion of what the solution *should* look
like and anything that looks different you label as a "hack".
That's not what makes something a hack. A hack would be using something
*not* the way it was designed or intended to work - MongoDB is a database -
it's meant to support many different use cases, and different schemas and
approaches will work better or worse depending on requirements.
It's a common misconception that things must work a specific way or else
MongoDB isn't being used correctly, but while there are definitely
anti-patterns, there are almost always multiple ways to solve a problem.
And while there are plenty of use case MongoDB is not a good match for
it's not necessarily because it's immature, but because it's the wrong
tool. Oracle is mature but it's not the right tool for some use cases too!
And commenting on Jira tickets is precisely how we know what capabilities
are important to users and *why*.
But I'm glad you brought up the $bitwise operator ticket - you mention in
it storing 30 values turned into numbers and crammed into a single number
as bits. *That* is a hack, in my opinion. If you have 30 possible values,
a non-hack way would be to store them in an array and then you can index
it, you can query it, you *don't* have to store a separate lookup for it,
and it's actually readable and maintainable. Performance optimization of
collapsing it into a single number that you then have to set and query the
bits of seems like the hack here.
And speaking of things that may seem like a good idea but sometimes end
up not so good: you link to feature in Elastic Search that looks a lot like
supporting a join to me - you are making a query of one table but your
condition is based on a value in another table, so you do a sub-query of
sorts, on the server side.
There is no problem with this UNTIL YOU NEED TO SCALE. I don't know how
ES scales up, but MongoDB scales via sharding and the problem with an
approach like this is that it won't work sharded - your "server-side"
lookup may suddenly need to query a completely different shard than the one
you are querying (or multiple shards, each of which then needs to look up
this list of terms!)
One of MongoDB architecture/design principles says everything we add to
the system *must* work sharded, because there is nothing worse than using a
database and then when you need to scale it because things are going great
you find out that a bunch of features you are relying on suddenly stop
working! Or they work but are suddenly they are dog-slow!
Funny thing is, that we _do_ have a future plan to support joins (in the
long term), but it *has* to be support that works sharded, which means it
will be restricted in certain ways (like you can only join things on the
same shard) and that means we have to implement a number of features as a
prerequisite to being able to support joins in a sharded MongoDB cluster.
Asya
P.S. the $bitwise query operator request was filed as "P5 Trivial" by the
original creator of the Jira ticket.
Post by petit curieux
Ok i finaly find by myself ur talk, thats interesting but ur
alternatives are more or less some hack rather real solutions.. As you
pointed out mongodb isnot a graph databse..
I'm not gonna say more about my pattern pb now , because i finally you
will not able to have solutions .
I mean problem is not you for sure , you tried to help and many thanks
for that , problem is the mongodb limit..
On this point from my point of vue , you should be more clear about, and
admit we can.t do manythings with mongo (actualy at least ) , thats not
always a cleint pb with bad pattern and bad choice .., other possibility
may be mongo is not mature, simply..
I finaly find a total solution with elasticsearch who provide exaclty
what is needed , a path to retrieve array items in other 'collection' ,
thats simply resolve 2problesm , the 2step query and the fragmetation(write
concern), please see this feature and seriously think to add this to ur
http://www.elasticsearch.org/guide/en/elasticsearch/
reference/current/query-dsl-ids-filter.html
Honestly , i have to say that i didn't like to much the developpers
attitudes concerning some feature request and problematic exposure.. For
this reason , i will not post and propose this feature request in jira
mongo .
You know, see in jira feature list, for example the bitwise operator
feature request is essential and tagged by trivial , yes trivial thats mean
easy to add..
Now you 're fighting with the new tiger in your garden rather implement
the easy bitwise operator were tons of peoples are waiting for it..
Anyway , thanks for your help (and the talk was great ) , i will get an
eyes of the mongo evolution in some future
regards
Post by petit curieux
thans Asya but before i answer, i'd like to see your talk , link seems
broken , can you give the correct link please
this is the only one i find to avoid the 2 step query ( $in [134000
, _ID , reference , in , this , kind , of , query , is , unnacceptable]).
I don't think you need two queries if you store "blacklist" in the
user's document who blacklisted someone. I.e. if I blacklist 10 people
from some search, you have an array in my document called "blacklist".
You don't have to query for it for the same reason you don't need to query
for my hobbies or userId when you are doing a search for me - you already
read my document when I started my interaction with your application. You
read it in, and it's now available, including my hobbies, my blacklist
(which you can be updating in "real time" as I interact with your site) and
my location, etc.
I still don't understand how favorites intersects with query results -
once I mark favorites won't I only see favorites no matter what else
matches? And until I have favorites isn't it the case that no one matches
- i.e. no one satisfies the query of having me in their favorites list???
Anyway, those are all side points, I just wanted to point out that if you
limit the size of blacklist to most recent 100 or whatever, they will be
available as soon as you read my profile document and don't have to be
queried for.
Asya
my actual structure is one collection for finding people for travel ,
so poeple search people with common interests and caracteristics in some
distance from himself, the blacklist allow to exclude people for the next
search , thats necessary to not see always the same profils.., faveorites
is to have a list of..favorites people .
The blacklist list has no limit grow , you can have in theory 10000
poeple in your blacklist. So for each search you have to exclud blacklisted
people for the search result..
{
"_id" : 1656,
"Region1" : "Alsace",
"code_postal" : 67110,
"ville" : "Eberbach-Woerth",
"work" : "finance",
"travel" : "london",
"age" : "34" ,
"origin" : "france",
"blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands
others ....],
"favorites" : [18,1982,939,1982,98716,7611,983838, and thousands
others ....],
"hobbies" : ["dancing","dreaming"]
"loc" : {
"type" : "Point",
"coordinates" : [
7.72,
48.91
]
}
}
Each number in the blacklist and favorites lists is the _id of a profile.
When you blacklist someone, you push *your _id* into the blacklist of the
person you are blacklisting. It would be far simpler to put the blacklisted
person in *your* own blacklist, but the aggregation framework doesn't allow
any kind of semi-join such as "find all _id in collection profile where _id
is not in my blacklist". As far as I know you can't do that in one query
with Mongo.
So this is why the blacklist can get so big: everyone who wants to
blacklist Bob has to put their own _id in Bob's blacklist.
Imagine what happens if 134000 people don't like Bob.
I know this logic is very borderline, but it is the only one I found to
avoid the 2-step query ($in [134000, _id, references, in, this, kind, of,
query, is, unacceptable]).
With my borderline bad schema I can, in one query, exclude the people who
have my _id in their blacklist. See here: my _id is 14, and I can exclude
134000 blacklisted guys with only one number in the query:
db.tablegui.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [2.29963, 48.84167] },
      distanceField: "dist.calculated",
      distanceMultiplier: 1/1000,   // report distances in km
      maxDistance: 25 * 1000,       // 25 km, expressed in meters
      query: {
        favorites: { $in: [75010,75020,75011,75006,75007] },
        blacklist: { $nin: [14] }   // skip profiles whose blacklist contains 14
      },
      limit: 2000,
      spherical: true
    }
  }
])
I hope you understand my schema better now. I just made the only move
possible with Mongo's capabilities.
After some reading, I understand why it's not a good idea to embed, but do
you have any alternative?
This is such a common case. I think many people have my bad schema, and
when problems come as their project grows, they could be very angry.. :)
I'm very open to your advice or alternative solutions. But please don't
tell me it's not a problem to put 134000 _ids in an $in list for a search.
Making a query in 2 steps is not a problem at all; the problem is the
anti-pattern form of the second query. I'd propose a feature request to
find a solution to this problem. Consider that it's not *my* problem, but
perhaps more of a Mongo problem?!
Mongo is great for many things and we love it, but if there is no
alternative/solution, we will unfortunately be forced to look elsewhere.
Post by petit curieux
Most of the queries are "find x = 3 and whatever = 5 where all profiles are
not in my blacklist".
{
"_id" : ObjectId("5491ac5752c5c30b15bdd8b7"),
"name" : "Eberbach-Woerth",
"blacklist" : [18,1982,939,1982,98716,7611,983838, and thousands of
others ....],
"favorites" : [18,1982,939,1982,98716,7611,983838, and thousands of
others ....]
}
What is "x", what is "whatever" and what are "profile are not in my blacklist"?
Maybe if you explain what this represents or give a more complete example.
I see no reason why you wouldn't be able to represent this via a
different schema and probably avoid the $in (list of thousands) type
of query (although I know of plenty of users who make such queries -
they actually perform quite well when appropriately indexed).
What are the entities in your application? And what are the queries
that you have to run? Please be complete.
Right now I know that you have people/profiles and you know something
about their favorites and something about their blacklist - can you
explain those in a bit more detail (I don't understand why they are
represented by an array of numbers, for example).
Asya
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+***@googlegroups.com.
To post to this group, send email to mongodb-***@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/CAOe6dJDpYd46eYXtOVh7UroyFtrLpnE_EvA3dDa5RLbXeye4MA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
s.molinari
2015-01-16 05:44:14 UTC
Permalink
I just noticed that the $in operator used to take a maximum of 4000000
array elements, and that limit was actually removed in 2.6. Are there
really use cases where even 4000000 elements were needed and could be used
efficiently? That number, and the fact that it isn't even a limit anymore,
is sort of blowing my mind right now. :D It makes passing a relatively
small array like petit's (5-10k elements) look like not that big a deal.

Scott
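To put a rough number on the "post size" worry, here is a small sketch that estimates how many bytes a BSON array of ids occupies. It assumes the ids are sent as 32-bit integers (plain numbers typed in the mongo shell go out as 8-byte doubles unless wrapped in NumberInt), and it counts only the array itself, not the rest of the query message. The function name is hypothetical.

```javascript
// A BSON array is encoded as a document whose keys are "0", "1", "2", ...
// Each element costs: 1 type byte + key digits + 1 NUL byte + the value.
function bsonInt32ArraySize(n) {
  var size = 4 + 1; // int32 length prefix + trailing 0x00 terminator
  for (var i = 0; i < n; i++) {
    size += 1 + String(i).length + 1 + 4; // 4 bytes for an int32 value
  }
  return size;
}

bsonInt32ArraySize(10000);  // 98895 bytes, just under 100 KB
bsonInt32ArraySize(134000); // 1496895 bytes, roughly 1.4 MiB
```

Under these assumptions a 5-10k element list is well under 100 KB per query, while the 134000-element case is about 1.4 MiB.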
Asya Kamsky
2015-01-11 19:36:03 UTC
Permalink
This link doesn't work?

http://www.mongodb.com/presentations/socialite-open-source-status-feed-part-2-managing-social-graph

You can google for words
mongodb world socialite user graph
And this presentation comes up as the first hit for me.

Asya
Post by petit curieux
Thanks Asya, but before I answer I'd like to see your talk. The link seems
broken, can you give the correct link please?
Post by Asya Kamsky
this is the only one I found to avoid the 2-step query ($in [134000,
_ID, reference, in, this, kind, of, query, is, unacceptable]).
I don't think you need two queries if you store "blacklist" in the user's
document who blacklisted someone. I.e. if I blacklist 10 people from some
search, you have an array in my document called "blacklist". You don't
have to query for it for the same reason you don't need to query for my
hobbies or userId when you are doing a search for me - you already read my
document when I started my interaction with your application. You read it
in, and it's now available, including my hobbies, my blacklist (which you
can be updating in "real time" as I interact with your site) and my
location, etc.
I still don't understand how favorites intersects with query results -
once I mark favorites won't I only see favorites no matter what else
matches? And until I have favorites isn't it the case that no one matches
- i.e. no one satisfies the query of having me in their favorites list???
Anyway, those are all side points, I just wanted to point out that if you
limit the size of blacklist to most recent 100 or whatever, they will be
available as soon as you read my profile document and don't have to be
queried for.
Asya
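A minimal sketch of the single-query search described above, assuming each profile stores its own blacklist and the searcher's document is already loaded in the application. The function name buildSearchPipeline and the choice to also exclude the searcher's own _id are assumptions; the $geoNear options mirror the query quoted earlier in the thread (newer server versions may expect a separate $limit stage instead of the limit option).

```javascript
// Hypothetical helper: builds the aggregation pipeline for "people near me
// who are not in my blacklist" from the searcher's already-loaded profile.
function buildSearchPipeline(me, maxKm, limit) {
  // Assumption: never return the searcher's own profile either.
  var excluded = [me._id].concat(me.blacklist || []);
  return [
    {
      $geoNear: {
        near: me.loc,                      // GeoJSON point from the profile
        distanceField: "dist.calculated",
        distanceMultiplier: 1 / 1000,      // report distances in km
        maxDistance: maxKm * 1000,         // meters
        query: { _id: { $nin: excluded } },
        limit: limit,                      // option copied from the quoted query
        spherical: true
      }
    }
  ];
}

// Usage in the mongo shell (collection name taken from the thread):
// var me = db.tablegui.findOne({ _id: 14 });
// db.tablegui.aggregate(buildSearchPipeline(me, 25, 2000));
```

The exclusion list is now bounded by how many people one user blacklists, not by how many people blacklist one user.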