MongoDB学习笔记(四)- 聚合框架

文章目录

Mongo Aggregation Framework聚合框架是一个计算框架,由管道(pipeline)和步骤(stage)组成,如下图所示:

Pipeline/Stage

  1. 原始的文档作为原始的输入数据
  2. 每个步骤对输入数据进行处理,处理的结果作为下一个处理步骤的输入
  3. 最终的结果也就是管道处理的结果,是由多个步骤处理完成

定义

聚合框架命令如下:

1pipeline = [$stage1, $stage2, ...$stageN];
2
3db.<COLLECTION>.aggregate(pipeline, options);

具体的每个步骤支持的操作非常丰富,需要查阅相应的文档。支持的步骤列表详见:Aggregation Pipeline Stages — MongoDB Manual。常见的的步骤如下:

StageDescription~ SQL
$match过滤WHERE
$project投影AS
$sort排序ORDER BY
$group分组GROUP BY
$skip/$limit结果限制SKIP / LIMIT
$lookup左外连接LEFT OUTER JOIN
$unwind展开数组,将数组的每一个元素都展开到父文档里
$graphLookup图搜索
$facet / $bucket分面搜索

$match:

1{ $match: { <query> } }

query的格式跟db.collection.find的格式一样

$project格式:

将所需的字段投影出来传递给下一个步骤。字段可以是文档中存在的,也可以是通过表达式生成的自定义字段

1{ $project: { <specification(s)> } }

specification(s):

格式说明
<field>: <1 or true>包含字段。除了使用1或者true,非0的整数都看作是true
_id: <0 or false>排除_id字段,默认是包含
<field>: <expression>生成一个新的字段,或者对现有字段通过表达式赋值
<field>:<0 or false>排除字段

$sort格式:

1{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

按字段进行排序,根据sort order的值不同判定是升序还是降序:1代表升序,-1代表降序

$group格式:

1{
2  $group:
3    {
4      _id: <expression>, // Group By Expression
5      <field1>: { <accumulator1> : <expression1> },
6      ...
7    }
8 }

_id:必填字段。该字段指明分组的条件 - 可以按照文档中某个字段分组或者干脆不分组。按照文档某个字段分组时,通过$引用文档字段,然后按组对文档做聚合。如果不分组,那么_id指定为null,这样对符合条件的所有文档做聚合

filedN:可选字段。存储聚合计算的结果。mongo支持的聚合计算运算符参考$group (aggregation) — MongoDB Manual

$skip/$limit格式:

1{ $limit: <positive 64-bit integer> } // 从结果集中选取top N传递给下一个步骤
2{ $skip: <positive 64-bit integer> } // 跳过结果集中的top N把剩余的结果传递给下一个步骤

示例

以下示例基于mock数据库,单个文档示例如下:

 1{
 2    "_id": {"$oid": "5dbe7a545368f69de2b4d36e"},
 3    "city": "Champlinberg",
 4    "country": "Malaysia",
 5    "name": "Destinee Schneider",
 6    "orderDate": {"$date": "2019-03-26T03:20:08.805Z"},
 7    "orderLines": [
 8      {
 9        "product": "Refined Fresh Tuna",
10        "sku": "2057",
11        "qty": 25,
12        "price": {"$numberDecimal": 56.00},
13        "cost": {"$numberDecimal": 46.48}
14      },
15      {
16        "product": "Refined Concrete Ball",
17        "sku": "1738",
18        "qty": 61,
19        "price": {"$numberDecimal": 47.00},
20        "cost": {"$numberDecimal": 47}
21      },
22      {
23        "product": "Rustic Granite Towels",
24        "sku": "500",
25        "qty": 62,
26        "price": {"$numberDecimal": 74.00},
27        "cost": {"$numberDecimal": 62.16}
28      },
29      {
30        "product": "Refined Rubber Salad",
31        "sku": "1400",
32        "qty": 73,
33        "price": {"$numberDecimal": 93.00},
34        "cost": {"$numberDecimal": 87.42}
35      },
36      {
37        "product": "Intelligent Wooden Towels",
38        "sku": "5674",
39        "qty": 72,
40        "price": {"$numberDecimal": 84.00},
41        "cost": {"$numberDecimal": 68.88}
42      },
43      {
44        "product": "Refined Steel Bacon",
45        "sku": "5009",
46        "qty": 8,
47        "price": {"$numberDecimal": 53.00},
48        "cost": {"$numberDecimal": 50.35}
49      }
50    ],
51    "phone": "425.956.7743 x4621",
52    "shippingFee": {"$numberDecimal": 8.00},
53    "state": "Texas",
54    "status": "created",
55    "street": "493 Hilll Curve",
56    "total": {"$numberDecimal": 407},
57    "userId": 3573,
58    "zip": "24344-1715"
59  }

计算总销量

计算到目前为止的总销售额

  • 无论订单状态
  • 不限制时间范围
  • 不算运费
 1// 查询命令
 2db.orders.aggregate(
 3    [
 4        {$group: {_id: null, totalAmout: {$sum: "$total"}}}
 5    ]
 6)
 7// 查询结果
 8[
 9  {
10    "_id": null,
11    "totalAmout": {"$numberDecimal": 44019609}
12  }
13]

计算Q1已完成的订单金额

查询2019年第一季度(1月1日~3月31日)订单中已完成(completed)状态的总金额和总数量

 1// 查询命令
 2db.orders.aggregate(
 3    [
 4        {$match: {"orderDate": {$gte: new Date("2019-01-01"), $lt: new Date("2019-04-01")}, "status": "completed"}},
 5        {$group: {_id: null, total: {$sum: "$total"}, shippingFee: {$sum: "$shippingFee"}, orderCount: {$count: {}}}},
 6        {$project: {_id: 0, count: "$orderCount", grandTotal: {$add: ["$total", "$shippingFee"]}}}
 7    ]
 8)
 9// 查询结果
10[
11  {
12    "count": 5875,
13    "grandTotal": {"$numberDecimal": 2636376.00}
14  }
15]

计算月销量

计算前半年每个月的销售额和总订单数。

  • 不算运费
  • 不算取消(cancelled)状态的订单
 1// 查询命令
 2db.orders.aggregate(
 3    [
 4        {$match: {"orderDate": {$lt: new Date("2019-07-01")}, "status": {$ne: "cancelled"}}},
 5        {$group: {_id: {$month: "$orderDate"}, count: {$count: {}}, total: {$sum: "$total"}}},
 6        {$sort: {"_id": 1}},
 7        {$project: {"month": "$_id", _id: 0, "count": 1, "total": 1}}
 8    ]
 9)
10// 查询结果
11[
12  {
13    "count": 8249,
14    "month": 1,
15    "total": {"$numberDecimal": 3620936}
16  },
17  {
18    "count": 7387,
19    "month": 2,
20    "total": {"$numberDecimal": 3258201}
21  },
22  {
23    "count": 8167,
24    "month": 3,
25    "total": {"$numberDecimal": 3574185}
26  },
27  {
28    "count": 8038,
29    "month": 4,
30    "total": {"$numberDecimal": 3551291}
31  },
32  {
33    "count": 8163,
34    "month": 5,
35    "total": {"$numberDecimal": 3590503}
36  },
37  {
38    "count": 7942,
39    "month": 6,
40    "total": {"$numberDecimal": 3496645}
41  },
42  ...
43]

地区销量Top1

计算第一季度每个州(state)销量最多的sku第一名。

  • 只算complete订单
 1db.orders.aggregate(
 2    [
 3        {$match: {"status": "completed", "orderDate": {$lt: new Date("2019-04-01")}}},
 4        {$unwind: "$orderLines"},
 5        {$group: {_id: {state: "$state", sku: "$orderLines.sku"}, count: {$sum: "$orderLines.qty"}}},
 6        {$sort: {"_id.state": 1, count: -1}},
 7        {$group: {_id: "$_id.state", sku: {$first: "$_id.sku"}, count: {$first: "$count"}}},
 8        {$sort: {_id: 1}}
 9    ]
10)

统计SKU销售件数

统计每个sku在第一季度销售的次数

  • 不算取消(cancelled)状态的订单
  • 按销售数量降序排列
 1// 查询命令
 2db.orders.aggregate(
 3    [
 4        {$match: {"orderDate": {$lt: new Date("2019-04-01")}, "status": {$ne: "cancelled"}}},
 5        {$unwind: "$orderLines"},
 6        {$group: {_id: "$orderLines.sku", count: {$sum: "$orderLines.qty"}}},
 7        {$sort: {count: -1}}
 8    ]
 9)
10// 查询结果
11[
12  {
13    "_id": "4751",
14    "count": 2115
15  },
16  {
17    "_id": "798",
18    "count": 1945
19  },
20  {
21    "_id": "3863",
22    "count": 1913
23  },
24  {
25    "_id": "2558",
26    "count": 1896
27  },
28  {
29    "_id": "2049",
30    "count": 1859
31  },
32  ...
33]