一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写

JavaScript代寫-COMP5338
時間:2021-08-27
Dr. Ying Zhou
School of Computer Science
COMP5338 – Advanced Data Models
Week 3: MongoDB – Aggregation Framework
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Outline
nNull type
nMongoDB Data Modelling
nAggregation
?Single collection aggregation
?Aggregation pipeline with multiple collection
03-2
Null, empty string and related
operators
n Null (or null) is a special data type
? Similar to None, Null or Nil in any programming language
? It has a singleton value expressed as null
? Indicating no value is here
n The interpretation of null is different depending on where it
appears
n It might represents
? The field exists, but has no value
? The field does not exits
? Or both
n This is different to giving a field an empty string “” as value
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/tutorial/query-for-null-fields/index.html
03-3
Null and empty string example
n Collection tweets has many optional fields
? Retweet_id, retweet_user_id, hash_tags, and more
n Collection users uses empty string for fields with no value
? location, description
n The schema was based on the source JSON file. Making
location and description as optional field is a better
option.
n Examples
? db.tweets.find({retweet_id:{$exists:true}})
? db.users.find({location:{$ne: “”}})
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/tutorial/query-for-null-fields/
03-4
Querying for null or field existence
n Queries
? db.tweets.find({retweet_id:{$exists:true}})
? Find all documents that has a field called retweet_id
? db.tweets.find({retweet_id: {$ne: null}})
? Find all documents whose retweet_id field’s value is not null
? db.tweets.find({retweet_id:null})
? Find all documents that do not have a retweet_id field or the value of
retweet_id field is null
? db.tweets.find({retweet_id:{$exists:false}})
? Find all documents that does not have a retweet_id field
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-5
It is possible to set the value to null
db.users.updateMany({location:"", description:""},
{$set:{location: null}})
db.users.updateMany({location:""},{$unset:{location:1}})
db.users.find({location: null}) would return all documents with either
location field with null value or no location field
db.users.find({location:{$exists:false}}) only returns the documents that
do not have the location field
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-6
Outline
nNull type
nMongoDB Data Modelling
nAggregation
?Single collection aggregation
?Aggregation pipeline with multiple collection
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-7
“Schema” Design Example
n A fully normalized relational
model would have the
following tables:
?User
?Post
?Comment
?PostLink
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.org/manual/applications/data-models/
03-8
MongoDB schema design
n Using three collections
? User collection
? Post collection (with links to User and Post itself)
? Comment Collection(with links to User and Post )
n Using two collections
? User collection
? Post collection (with embedded Comment object and links to User and
Post itself
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-9
Two Collections Schema
n Two collections
? User collection
? Post collection (with embedded Comment object and links to User and
Post itself )
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ _id: “p1”,
author: “u1” ,
title: “A nice day”,
date: 2012-09-10,
comments: [
{ author: “u2”,
content: “nice here too”,
date: 2012-09-11,
}
]
backlinks: [“p2”]
}
{ _id: “u1”,
name: “user1”,
password: “bq7e0dx…”,
email: “user1@gmail.com”
}
{ _id: “u2”,
name: “user2”,
password: “mb8xfv…”,
email: “user2@gmail.com”
}
{ _id: “p2”,
author: “u2”
title: “NoSQL is dead”,
date: 2012-09-11,
tags: [“MongoDB”, “HBase”],
comments: [
{ author: “u1”,
content: “nonsense”
date: 2012-09-11
}
]
}
User collection:
Post collection:
This post does not have tags, so no “tags” field This post does not have links pointing to it, so no “backlink” field
Each user profile is saved as a JSON document
An array of Comment objects
Tags and backlinks are stored as
array
03-10
Three Collections Schema
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
n Three collections
? User collection
? Post collection (with links to User and Post itself)
? Comment Collection(with links to User and Post)
{ _id: “u1”,
name: “user1”,
password: “bq7e0dx…”,
email: “user1@gmail.com”
}
{ _id: “u2”,
name: “user2”,
password: “mb8xfv…”,
email: “user2@gmail.com”
}
User collection: Post collection:
{ _id: “p1”,
author: “u1” ,
title: “A nice day”,
date: 2012-09-10,
backlinks: [“p2”]
}
{ _id: “p2”,
author: “u2”
title: “NoSQL is dead”,
date: 2012-09-11,
tags: [“MongoDB”, “HBase”],
}
Comment collection:
{ _id: “c2”,
author: “u1” ,
post: p2
content: “nonsense”,
date: 2012-09-11,
}
{ _id: “c1”,
author: “u2” ,
post: p1
content: “nice here too”,
date: 2012-09-11,
}
03-11
Two Collections vs. Three Collections
n Which one is better?
? Hard to tell by schema itself, we need to look at the actual application to
understand
? Typical data feature
? What would happen if a post attracts lots of comments?
? Typical queries
? Do we want to show all comments when showing a post, or only the latest few, or not at
all?
? Are most comments made in a short period of time?
? Atomicity consideration
? Is there “all or nothing” update requirement with respect to post and comment
n Other design variation?
? In three collection schema, store post-comment link information in Post collection
instead of Comment collection?
? Embed the recent comments in Post?
? One User collection with embedded Post and Comment objects?
? One User collection with user, post and comment documents?
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-12
General Schema Design Guideline
n Depends on data and intended use cases
? “independent” object should have its own collection
? composition relationship are generally modelled as embedded
relation
? Eg. ShoppingOrder and LineItems, Polygon and Points belonging to it
? BUT, other features need to be considered
? Post and Comment have a composition relationship, but it might be
beneficial to model them as separate documents
? aggregation relationship are generally modelled as links
(references) with the link data modelled in the ‘part’ object.
? Eg. Department and Employee
?Many-to-Many relationship are generally modelled as links
(references)
? Eg. Course and Students enrolled in a course
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-13
The Schema of Lab Data Set
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-14
users collection:
tweets collection:
{
"_id" : 142
"created_at" : 2021-08-11T12:09:55.00,
"text" : "…",
"user_id" : 266,
"retweet_id" :297
"retweet_user_id" : 151
}
{
"_id" : 544,
"created_at" : 2021-08-11T12:09:53
"text" : …",
"user_id" : 220,
"replyto_id" : 297
"replyto_user_id" : 151
}
{
"_id" : 297,
"created_at" : 2021-08-11T12:09:00
"text" : …",
"user_id" :151,
}
{
"_id" : 266
"name" : "D",
"screen_name" : "DThinksIt",
"created_at" : 2014-08-31T13:48:54.00,
"location" : "",
"description" : "",
"followes_count" : 45,
"friends_count" : 409,
"favourites_count" : 5068,
"listed_count" : 12
}
Outline
nNull type
nMongoDB Data Modelling
nAggregation
?Single collection aggregation pipeline
?Aggregation pipeline with multiple collection
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-15
Aggregation
n Simple and relatively standard data analytics can be
achieved through aggregation
? Grouping, summing up value, counting, sorting, etc
? Running on the DB engine instead of application layer
n Several options
? Aggregation Pipeline
?MapReduce
? Through JavaScript Functions
? Is able to do customized aggregations
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-16
Aggregation Pipeline
n Aggregation pipeline consists of multiple stages
? Each stage transforms the incoming documents as expressed in the
stage object
? The stage object is enclosed in a pair of curly braces
? The pipeline is an array of many stage objects.
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
db.collection.aggregate( [
{ stage name: {expression,…, expression} },
{ stage name: {expression,…, expression} },
...
] )
03-17
Aggregation Example
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
select cust_id as _id, SUM(amount) as total
from orders
where status = “A”
group by cust_id
03-18
Typical aggregation stages
n $match
n $group
n $project
n $unwind
n $sort
n $skip
n $limit
n $count
n $sample
n $out
n $lookup
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-19
$match stage
n $match: filters the incoming documents based on given
conditions
n Format:
{$match: {}}
? The query document is the same as those in the find query
n Example:
db.users.aggregate([{$match:{friends_count :{$lt: 250000 }}}])
Has the same effect as
db.users.find({friends_count :{$lt: 250000 }})
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-20
$group stage
n $group: groups incoming documents by some specified expression
and outputs to the next stage a document for each distinct group
n Format:
{ $group:
{_id:, // Group By Expression
:{accumulator: },
…}
}
? The _id field of the output document has the value of the group key
for each group
? The other fields usually represent the statistics you want to produce
for each group
? One statistics per field
? Total amount, average price, group size
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-21
$group stage (cont’d)
n in {_id:,
? null value, to specify the whole collection as a group
? field path to to specify one or many fields as grouping key
? Field name prefixed with $ sign in a pair of quotes
? “$title”, or “$address.street”
n {accumulator: }
? There are predefined accumulators: $sum, $avg, $first, $last, etc
? User defined accumulators can be used as well
? Field path will be used in the if the accumulator returns
value based on field values in the incoming document
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-22
$group stage example
n Find the earliest registered user in the whole collection
db.users.find({},{created_at:1, _id:0})
.sort({created_at:1})
.limit(1)
db.users.aggregate([
{$group: {_id:null, earliest: {$min: "$created_at"}}}
])
n Find the earliest retweet time of tweet in the collection
db.tweets.aggregate([
{$match: {retweet_user_id: {$exists: true}}},
{$group: {_id:"$retweet_id",
earliest: {$min: "$created_at"}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Accumulator
field path
Returns a single document
Returns a document for each distinct retweet_id
Sort in ascending order of created_at
Field path as expression
03-23
$group stage example (cont’d)
n Find the number of times one user retweeted another user’s
tweet
? This would require grouping based on two fields: retweet_user_id
and user_id
?We need to specify these two as the _id field of the output
document
db.tweets.aggregate([
{$match: {retweet_user_id: {$exists: true}}},
{$group: {_id:{user:"$retweet_user_id",retweet_user:"$user_id"},
rt_count: {$sum: 1}}},
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Composite type as _id
accumulator number literal
Same effect as count
03-24
$group by more than one field
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{_id:.. user_id: 123, retweet_user_id:333, created_at:…, … }
{_id:.. user_id: 234, retweet_user_id:444, created_at:…, … }
{_id:.. user_id: 345, retweet_user_id:333, created_at:…, … }
{_id:.. user_id: 123, retweet_user_id:333, created_at:…, … }
{_id:.. User_id: 567, retweet_user_id:444, created_at:…, … }
{_id: {user: 333, retweet_user:123}, rt_count: 2}
{_id: {user: 333, retweet_user:345}, rt_count: 1}
{_id: {user: 444, retweet_user:234}, rt_count: 1}
{_id: {user: 444, retweet_user:567}, rt_count: 1}
{$group: {_id:{
user:"$retweet_user_id",
retweet_user:"$user_id"},
rt_count: {$sum: 1}}},
03-25
$group examples (cont’d)
n Accumulators do not just return a single value, we can use
accumulators to create an array to hold data from incoming
documents
n Example of two commands:
db.tweets.aggregate([
{$group: {_id:"$replyto_id",
rp_users: {$push: "$user_id" }}}
}])
db.tweets.aggregate([
{$group: {_id:"$replyto_id",rp_users: {$addToSet:"$user_id"}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
They have the same group key: $replyto_id
They have another field in addition to the group key
The other field is created with different accumulators
03-26
$push accumulator
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
db.tweets.aggregate([
{$group: {_id:"$replyto_id",
rp_users: {$push: "$user_id" }}}
}])
{ _id: 3333,
rp_users:[123,345,123]
}
{ _id:4444,
rp_users:[234,567]
}
{_id:.. user_id: 123, replyto_id:3333, created_at:…, … }
{_id:.. user_id: 234, replyto_id:4444, created_at:…, … }
{_id:.. user_id: 345, replyto_id:3333, created_at:…, … }
{_id:.. user_id: 123, replyto_id:3333, created_at:…, … }
{_id:.. User_id: 567, replyto_id:4444, created_at:…, … }
03-27
$addToSet accumulator
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
db.tweets.aggregate([
{$group: {_id:"$replyto_id",
rp_users: {$addToSet: "$user_id" }}}
}])
{ _id: 3333,
rp_users:[123,345]
}
{ _id:4444,
rp_users:[234,567]
}
{_id:.. user_id: 123, replyto_id:3333, created_at:…, … }
{_id:.. user_id: 234, replyto_id:4444, created_at:…, … }
{_id:.. user_id: 345, replyto_id:3333, created_at:…, … }
{_id:.. user_id: 123, replyto_id:3333, created_at:…, … }
{_id:.. User_id: 567, replyto_id:4444, created_at:…, … }
03-28
$project stage
n $project
? Restructure the document by including/excluding field, adding new
fields, resetting the value of existing field
?More powerful than the project argument in find query
? Format
{$project: {? The specification can be an existing field name followed by a single
value indicating the inclusion (1) or exclusion (0) of fields
? Or it can be a field name (existing or new) followed by an expression
to compute the value of the field
:
? In the expression, existing field from incoming document can be
accessed using field path: “$fieldname”
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-29
$project examples
n Find the active retweet period of each tweet in the
collection, where the active retweet period is defined as
the duration between the last and the first retweets of that
tweet, assuming the timestamp is of ISODate type
db.tweets.aggregate([
{$match: {retweet_id: {$exists: true}}},
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {actp: {$subtract:["$last","$first"]}}},
{$match: {actp:{$gt: 0}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/reference/operator/aggregation/#arithmetic-expression-operators
Arithmetic expression operator, part of a large group of Aggregation pipeline operator
03-30
$group then $project
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-01 00:03:46.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 4444, timestamp:“2016-07-01 00:55:44.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-15 12:22:35.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 4444, timestamp:“2016-07-28 00:03:58.000Z”,… }
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-28 00:20:19.000Z”, … }
{_id:3333, first:“2016-07-01 00:03:46.000Z”, last:“2016-07-28 00:20:19.000Z”}
{_id:4444, first:“2016-07-01 00:55:44.000Z”, last:“2016-07-28 00:03:58.000Z”}
{title: 3333, age:2333793000}
{title: 4444, age:2329694000}
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {
actp: {$subtract:["$last","$first"]}
}}
03-31
We can combine multiple operators
db.tweets.aggregate([
{$match: {retweet_user_id: {$exists: true}}},
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {
actp:{$divide:
[{$subtract:["$last","$first"]},
86400000]}}}
actp_unit: {$literal:"day"}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
($last-$first)/86400000
03-32
Dealing with data of array type
n To aggregate (e.g. grouping) values in an array field, it is possible to
flatten the array to access individual value
n $unwind stage flattens an array field from the input documents to output
a document for each element. Each output document is the input
document with the value of the array field replaced by the element.
? { $unwind: } or
? {
$unwind:
{
path: ,
includeArrayIndex: ,
preserveNullAndEmptyArrays:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-33
$unwind example
n Default behaviour
? Input document:
{ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }
? After $unwind:"$sizes"
? Becomes 3 output documents:
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
n Find the number of items that are available in each size
db.inventory.aggregate( [
{ $unwind : "$sizes" },
{ $group:{_id: “$sizes”, item_count: {$sum:1}} }
] )
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-34
$unwind then $group
{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "ABC", "sizes": "S"}
{ "_id" : 1, "item" : “ABC", "sizes": "M"}
{ "_id" : 1, "item" : “ABC", "sizes": "L"}
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ $group:{_id: “$sizes”,
item_count: {$sum:1}}
{ "_id" : "S", "item_count": 1}
{ "_id" : "M", "item_count": 2}
{ "_id" : "L", "item_count": 1}
{ $unwind : "$sizes" },
03-35
$sort, $skip, $limit and $count stages
n $sort stage sorts the incoming documents based on specified field(s)
in ascending or descending order
? The function and format is similar to the sort modifier in find query
? { $sort: { : , : ...
} }
n $skip stage skips over given number of documents
? The function and format is similar to the skip modifier in find query
? { $skip: }
n $limit stage limits the number of documents passed to the next stage
? The function and format is similar to the limit modifier in find query
? { $limit: }
n $count stage counts the number of documents passing to this stage
? The function and format is similar to the count modifier in find query
? { $count: }
? String is the name of the field representing the count
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-36
$sample and $out stages
n The $sample stage randomly selects given number of
documents from the previous stage
? { $sample: { size: } }
? Different sampling approaches depending on the location of the
stage and the size of the sample and the collection
?May fail due to memory constraints
n The $out stage writes the documents in a given collection
? should be the last one in the pipeline
? { $out: "" }
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-37
Aggregation Operators
n A few aggregation stages allow us to add new fields or to
give existing fields new values based on expression
? In $group stage we can use various operators or accumulators to
compute values for new fields
? In $project stage we can use operators to compute values for new
or exiting fields
n There are many predefined operators for various data types
to carry out common operations in that data type
? Arithmetic operators: $mod, $log, $sqrt, $subtract, …
? String operators: $concat, $split, $indexofBytes, …
? Comparison operators: $gt, $gte, $lt, $lte,…
? Set operators: $setEquals, $setIntersection, …
? Boolean operators: $and, $or, $not, …
? Array operators: $in, $size, ..
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-38
Aggregation vs. Query operators
n There is another set of operators that can be used in
find/update/delete queries or the $match stage of an
aggregation
? E.g. $gt, $lt, $in, $all….
n The set is smaller and are different to the operators used in
$group or $project stage
n Some operators look the same but have different syntax and
slightly different interpretation in query and in aggregation.
? E.g. $gt in find query looks like
{age: {$gt:18}}
? $gt in $project stage looks like:
{over18: {$gt:[“$age”, 18]}}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Returns true or false
03-39
Outline
nNull type
nAggregation
?Single collection aggregation pipeline
?Aggregation pipeline with multiple
collections
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-40
$lookup stage
n $lookup stage is added since 3.2 to perform left outer join
between two collections
? The collection already in the pipeline (maybe after a few stages)
? Another collection (could be the same one)
n For each incoming document from the pipeline, the $lookup
stage adds a new array field whose elements are the
matching documents from the other collection.
n A few different forms
? Equality match
? Join with other conditions
? Join with uncorrelated sub-queries
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-41
$lookup stage properties
n The output of $lookup stage has the same number of
documents as the previous stage
n Each document is augmented with an array field storing
matching document(s) from the other collection
n The array could contain any number of documents
depending on the match, including zero
n Missing local or foreign field is treated as having null value
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-42
Equality Match $lookup
{$lookup:
{ from: ,
localField: ,
foreignField: ,
as:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-43
Equality match $lookup example
db.orders.aggregate([
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{"_id":1, "item":"abc", "price":12,"quantity":2 }
{"_id":2, "item":"nosku", "price":20,"quantity":1 }
{"_id":3 }
{"_id":1, "sku":"abc", description:"product 1", "instock":120}
{"_id":2, "sku":"def", description:"product 2", "instock":80 }
{"_id":3, "sku":"ijk", description:"product 3", "instock":60}
{"_id":4, "sku":"jkl", description:"product 4", "instock":70 }
{"_id":5, "sku":null, description:"Incomplete" }
{"_id":6}
orders
inventory
http://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#pipe._S_lookup
A document with no item field
A document with sku field
equals null
A document with no sku field
03-44
Equality match $lookup example (cont’d)
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{"_id":1, "item":"abc", "price":12,"quantity":2 }
{"_id":2, "item":"nosku", "price":20,"quantity":1 }
{"_id":3 }
{"_id":1, "sku":"abc", description:"product 1", "instock":120}
{"_id":2, "sku":"def", description:"product 2", "instock":80 }
{"_id":3, "sku":"ijk", description:"product 3", "instock":60}
{"_id":4, "sku":"jkl", description:"product 4", "instock":70 }
{"_id":5, "sku":null, description:"Incomplete" }
{"_id":6}
{"_id":1, "item":"abc", "price":12,"quantity":2,
"inventory_docs": [
{ "_id":1, "sku":"abc", description:"product 1", "instock":120 }] }
{"_id":2, "item":“nosku", "price":20,"quantity":1,
"inventory_docs" : [] }
{"_id":3, "inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }]}
An empty array for no matching from other collection
Non exists field matches null and non exists field
local
foreign
output
03-45
Other format of $lookup
{
$lookup:
{
from: ,
let: { : , …, : },
pipeline: [ ],
as:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
let: Optionally specifies variables to use in the pipeline field
stages. Most likely the variable(s) may refer to field(s) in the
local collection already in the pipeline
pipeline: Specifies the pipeline to run on the joined collection.
The pipeline determines the resulting documents from the
joined collection.
03-46
Multiple Join Condition Example
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
orders collection
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
warehouses collection
An ordered item may be stocked in multiple warehouses;
We want to find for each ordered item the warehouse with
sufficient stock to cover the order
03-47
Insufficient stock
Multiple Joint Condition
n This query involves comparing two fields of the local and
foreign documents:
? “item” in orders should match ”stock_item” in warehouses
? “ordered” in orders should be less than or equal to “instock” in
warehouses
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
orders collection
warehouses collection
03-48
Multiple Joint Condition $lookup
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
db.orders.aggregate([
{
$lookup:
{
from: "warehouses",
let: { order_item: "$item", order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$stock_item", "$$order_item" ] },
{ $gte: [ "$instock", "$$order_qty" ] }
]
}
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
])
This is the way to let the pipeline access local fields:
use variable order_item to access the local
document’s item field; use variable order_qty to
access the local document’s ordered field
This is the
way to
specify
multiple
condition
variables are accessed
using “$$” prefix
$lookup by default
includes the entire matched
foreign document in the
array, we can use $project
stage to get rid of some
fieldsMatching document after
the pipeline stage will be
stored in this variable
03-49
Results
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1,
"item" : "almonds",
"price" : 12,
"ordered" : 2,
"stockdata" : [
{ "warehouse" : "A", "instock" : 120 },
{ "warehouse" : "B", "instock" : 60 }
]
}
{ "_id" : 2,
"item" : "pecans",
"price" : 20,
"ordered" : 1,
"stockdata" : [
{ "warehouse" : "A", "instock" : 80 }
]
}
{ "_id" : 3,
"item" : "cookies",
"price" : 10,
"ordered" : 60,
"stockdata" : [
{ "warehouse" : "A", "instock" : 80 }
]
}
03-50
Concise syntax introduced in version 5.0
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-51
$lookup:
{
from: ,
localField: ,
foreignField: ,
let : { : , …, : },
pipeline: [ ],
as:
}
Keep the simple expression of the equality condition
Equivalent Concise Syntax
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-52
db.orders.aggregate([
{
$lookup:
{
from: "warehouses",
localField: "item",
foreignField: "stock_item",
let: {order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $gte: [ "$instock", "$$order_qty" ] }
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
])
Uncorrelated Subquery Example
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{
"_id" : 1,
"student" : "Ann Aardvark",
sickdays: [ "2018-05-01", 2018-08-23"]
}
{
"_id" : 2,
"student" : "Zoe Zebra",
sickdays: [“2018-02-01", 2018-05-23") ]
}
absences collection
{ "_id" : 1, year: 2018, name: "New Years", date: "2018-01-01" }
{ "_id" : 2, year: 2018, name: "Pi Day", date: 2018-03-14" }
{ "_id" : 3, year: 2018, name: "Ice Cream Day", date: "2018-07-15"}
{ "_id" : 4, year: 2017, name: "New Years", date: "2017-01-01" }
{ "_id" : 5, year: 2017, name: "Ice Cream Day", date: "2017-07-16”}
holidays collection
We want to include all 2018 public holidays in the absences
collection
03-53
Uncorrelated Subquery $lookup
db.absences.aggregate([
{
$lookup:
{
from: "holidays",
pipeline: [
{ $match: { year: 2018 } },
{ $project: { _id: 0,
date: { name: "$name", date: "$date" } } },
{ $replaceRoot: { newRoot: "$date" } }
],
as: "holidays"
}
}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
The inner pipeline selects documents
from holidays collection based on a
condition unrelated with the local
collection, ‘let’ field is not needed
The $project and $replaceRoot change the
structure of the inner pipeline output documents
Has the same effect as: { $project: { _id: 0, year:0 } } in this case
$replaceRoot and similar “project” like stage are useful for promoting an
embedded document at root level
03-54
Results
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{
"_id" : 1,
"student" : "Ann Aardvark",
sickdays: [ "2018-05-01", 2018-08-23"],
"holidays" : [
{ "name" : "New Years", "date" : " 2018-01-01") },
{ "name" : "Pi Day", "date" : "2018-03-14") },
{ "name" : "Ice Cream Day", "date" : "2018-07-15"}
]
}
{
"_id" : 2,
"student" : "Zoe Zebra",
sickdays: [“2018-02-01", 2018-05-23") ],
"holidays" : [
{ "name" : "New Years", "date" : " 2018-01-01") },
{ "name" : "Pi Day", "date" : "2018-03-14") },
{ "name" : "Ice Cream Day", "date" : "2018-07-15"}
]
}
03-55
References
n BSON types
? http://docs.mongodb.com/manual/reference/bson-types/
n Querying for Null or Missing Field
? http://docs.mongodb.com/manual/tutorial/query-for-null-
fields/index.html
n Aggregation Pipelines
? http://docs.mongodb.com/manual/core/aggregation-pipeline/
n Aggregation operators
? http://docs.mongodb.com/manual/reference/operator/aggregation/
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-56

學霸聯盟

在線客服

售前咨詢
售后咨詢
微信號
Essay_Cheery
微信
专业essay代写|留学生论文,作业,网课,考试|代做功課服務-PROESSAY HKG 专业留学Essay|Assignment代写|毕业论文代写-rushmyessay,绝对靠谱负责 代写essay,代写assignment,「立减5%」网课代修-Australiaway 代写essay,代写assignment,代写PAPER,留学生论文代写网 毕业论文代写,代写paper,北美CS代写-编程代码,代写金融-第一代写网 作业代写:CS代写|代写论文|统计,数学,物理代写-天天论文网 提供高质量的essay代写,Paper代写,留学作业代写-天才代写 全优代写 - 北美Essay代写,Report代写,留学生论文代写作业代写 北美顶级代写|加拿大美国论文作业代写服务-最靠谱价格低-CoursePass 论文代写等留学生作业代做服务,北美网课代修领导者AssignmentBack