MongoDB学习笔记(四)- 聚合框架
文章目录
Mongo Aggregation Framework聚合框架是一个计算框架,由管道(pipeline)和步骤(stage)组成,如下图所示:
- 原始的文档作为原始的输入数据
- 每个步骤对输入数据进行处理,处理的结果作为下一个处理步骤的输入
- 最终的结果也就是管道处理的结果,是由多个步骤处理完成
定义
聚合框架命令如下:
1pipeline = [$stage1, $stage2, ...$stageN];
2
3db.<COLLECTION>.aggregate(pipeline, options);
具体的每个步骤支持的操作非常丰富,需要查阅相应的文档。支持的步骤列表详见:Aggregation Pipeline Stages — MongoDB Manual。常见的的步骤如下:
Stage | Description | ~ SQL |
---|---|---|
$match | 过滤 | WHERE |
$project | 投影 | AS |
$sort | 排序 | ORDER BY |
$group | 分组 | GROUP BY |
$skip/$limit | 结果限制 | SKIP / LIMIT |
$lookup | 左外连接 | LEFT OUTER JOIN |
$unwind | 展开数组,将数组的每一个元素都展开到父文档里 | |
$graphLookup | 图搜索 | |
$facet / $bucket | 分面搜索 |
$match:
1{ $match: { <query> } }
query的格式跟db.collection.find
的格式一样
$project格式:
将所需的字段投影出来传递给下一个步骤。字段可以是文档中存在的,也可以是通过表达式生成的自定义字段
1{ $project: { <specification(s)> } }
specification(s):
格式 | 说明 |
---|---|
<field>: <1 or true> | 包含字段。除了使用1或者true,非0的整数都看作是true |
_id: <0 or false> | 排除_id 字段,默认是包含 |
<field>: <expression> | 生成一个新的字段,或者对现有字段通过表达式赋值 |
<field>:<0 or false> | 排除字段 |
$sort格式:
1{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
按字段进行排序,根据sort order
的值不同判定是升序还是降序:1代表升序,-1代表降序
$group格式:
1{
2 $group:
3 {
4 _id: <expression>, // Group By Expression
5 <field1>: { <accumulator1> : <expression1> },
6 ...
7 }
8 }
_id
:必填字段。该字段指明分组的条件 - 可以按照文档中某个字段分组或者干脆不分组。按照文档某个字段分组时,通过$
引用文档字段,然后按组对文档做聚合。如果不分组,那么_id
指定为null
,这样对符合条件的所有文档做聚合
filedN
:可选字段。存储聚合计算的结果。mongo支持的聚合计算运算符参考$group (aggregation) — MongoDB Manual
$skip/$limit格式:
1{ $limit: <positive 64-bit integer> } // 从结果集中选取top N传递给下一个步骤
2{ $skip: <positive 64-bit integer> } // 跳过结果集中的top N把剩余的结果传递给下一个步骤
示例
以下示例基于mock数据库,单个文档示例如下:
1{
2 "_id": {"$oid": "5dbe7a545368f69de2b4d36e"},
3 "city": "Champlinberg",
4 "country": "Malaysia",
5 "name": "Destinee Schneider",
6 "orderDate": {"$date": "2019-03-26T03:20:08.805Z"},
7 "orderLines": [
8 {
9 "product": "Refined Fresh Tuna",
10 "sku": "2057",
11 "qty": 25,
12 "price": {"$numberDecimal": 56.00},
13 "cost": {"$numberDecimal": 46.48}
14 },
15 {
16 "product": "Refined Concrete Ball",
17 "sku": "1738",
18 "qty": 61,
19 "price": {"$numberDecimal": 47.00},
20 "cost": {"$numberDecimal": 47}
21 },
22 {
23 "product": "Rustic Granite Towels",
24 "sku": "500",
25 "qty": 62,
26 "price": {"$numberDecimal": 74.00},
27 "cost": {"$numberDecimal": 62.16}
28 },
29 {
30 "product": "Refined Rubber Salad",
31 "sku": "1400",
32 "qty": 73,
33 "price": {"$numberDecimal": 93.00},
34 "cost": {"$numberDecimal": 87.42}
35 },
36 {
37 "product": "Intelligent Wooden Towels",
38 "sku": "5674",
39 "qty": 72,
40 "price": {"$numberDecimal": 84.00},
41 "cost": {"$numberDecimal": 68.88}
42 },
43 {
44 "product": "Refined Steel Bacon",
45 "sku": "5009",
46 "qty": 8,
47 "price": {"$numberDecimal": 53.00},
48 "cost": {"$numberDecimal": 50.35}
49 }
50 ],
51 "phone": "425.956.7743 x4621",
52 "shippingFee": {"$numberDecimal": 8.00},
53 "state": "Texas",
54 "status": "created",
55 "street": "493 Hilll Curve",
56 "total": {"$numberDecimal": 407},
57 "userId": 3573,
58 "zip": "24344-1715"
59 }
计算总销量
计算到目前为止的总销售额
- 无论订单状态
- 不限制时间范围
- 不算运费
1// 查询命令
2db.orders.aggregate(
3 [
4 {$group: {_id: null, totalAmout: {$sum: "$total"}}}
5 ]
6)
7// 查询结果
8[
9 {
10 "_id": null,
11 "totalAmout": {"$numberDecimal": 44019609}
12 }
13]
计算Q1已完成的订单金额
查询2019年第一季度(1月1日~3月31日)订单中已完成(completed)状态的总金额和总数量
1// 查询命令
2db.orders.aggregate(
3 [
4 {$match: {"orderDate": {$gte: new Date("2019-01-01"), $lt: new Date("2019-04-01")}, "status": "completed"}},
5 {$group: {_id: null, total: {$sum: "$total"}, shippingFee: {$sum: "$shippingFee"}, orderCount: {$count: {}}}},
6 {$project: {_id: 0, count: "$orderCount", grandTotal: {$add: ["$total", "$shippingFee"]}}}
7 ]
8)
9// 查询结果
10[
11 {
12 "count": 5875,
13 "grandTotal": {"$numberDecimal": 2636376.00}
14 }
15]
计算月销量
计算前半年每个月的销售额和总订单数。
- 不算运费
- 不算取消(cancelled)状态的订单
1// 查询命令
2db.orders.aggregate(
3 [
4 {$match: {"orderDate": {$lt: new Date("2019-07-01")}, "status": {$ne: "cancelled"}}},
5 {$group: {_id: {$month: "$orderDate"}, count: {$count: {}}, total: {$sum: "$total"}}},
6 {$sort: {"_id": 1}},
7 {$project: {"month": "$_id", _id: 0, "count": 1, "total": 1}}
8 ]
9)
10// 查询结果
11[
12 {
13 "count": 8249,
14 "month": 1,
15 "total": {"$numberDecimal": 3620936}
16 },
17 {
18 "count": 7387,
19 "month": 2,
20 "total": {"$numberDecimal": 3258201}
21 },
22 {
23 "count": 8167,
24 "month": 3,
25 "total": {"$numberDecimal": 3574185}
26 },
27 {
28 "count": 8038,
29 "month": 4,
30 "total": {"$numberDecimal": 3551291}
31 },
32 {
33 "count": 8163,
34 "month": 5,
35 "total": {"$numberDecimal": 3590503}
36 },
37 {
38 "count": 7942,
39 "month": 6,
40 "total": {"$numberDecimal": 3496645}
41 },
42 ...
43]
地区销量Top1
计算第一季度每个州(state)销量最多的sku
第一名。
- 只算
complete
订单
1db.orders.aggregate(
2 [
3 {$match: {"status": "completed", "orderDate": {$lt: new Date("2019-04-01")}}},
4 {$unwind: "$orderLines"},
5 {$group: {_id: {state: "$state", sku: "$orderLines.sku"}, count: {$sum: "$orderLines.qty"}}},
6 {$sort: {"_id.state": 1, count: -1}},
7 {$group: {_id: "$_id.state", sku: {$first: "$_id.sku"}, count: {$first: "$count"}}},
8 {$sort: {_id: 1}}
9 ]
10)
统计SKU销售件数
统计每个sku
在第一季度销售的次数
- 不算取消(cancelled)状态的订单
- 按销售数量降序排列
1// 查询命令
2db.orders.aggregate(
3 [
4 {$match: {"orderDate": {$lt: new Date("2019-04-01")}, "status": {$ne: "cancelled"}}},
5 {$unwind: "$orderLines"},
6 {$group: {_id: "$orderLines.sku", count: {$sum: "$orderLines.qty"}}},
7 {$sort: {count: -1}}
8 ]
9)
10// 查询结果
11[
12 {
13 "_id": "4751",
14 "count": 2115
15 },
16 {
17 "_id": "798",
18 "count": 1945
19 },
20 {
21 "_id": "3863",
22 "count": 1913
23 },
24 {
25 "_id": "2558",
26 "count": 1896
27 },
28 {
29 "_id": "2049",
30 "count": 1859
31 },
32 ...
33]