系列十三、MongoDB聚合查询
一、概述
MongoDB聚合框架(Aggregation Framework)是一个计算框架,为集合文档数据提供各种处理数据的方法,并返回计算结果。MongoDB提供了三种方式来执行聚合(Aggregation)命令,即:聚合管道方法、map-reduce方法、单一目标聚合方法。常见功能:
①:作用在一个或者几个集合上;
②:对集合中的数据进行一系列的运算;
③:将这些数据转化为期望的形式;
从效果而言,聚合框架相当于SQL中的 group by、left outer join、as等。
1.1、聚合管道方法
管道在Unix或者Linux中一般用于将当前指令的输出结果作为下一个命令的参数。MongoDB的聚合管道将MongoDB文档在一个管道处理完毕后将结果传递给下一个管道处理。
二、聚合
2.1、管道(Pipeline)和步骤(Stage)
整个聚合运算过程称为管道(Pipeline),它是由多个步骤(Stage)组成的,每个管道:
①:接受一系列文档(原始数据);
②:每个步骤对这些文档进行一系列运算;
③:结果文档输出给下一个步骤;
2.2、基本格式
pipline = [$stage1,$stage2,...$stageN]
db.<COLLECTION>.aggregate(
pipline,
{ options }
);
2.3、常见步骤
2.3.1、常见步骤中的运算符
2.4、少见步骤
2.5、使用场景
聚合查询可以用于OLAP和OLTP场景,例如:
2.6、MQL常用步骤与SQL对比
2.6.1、vs1
2.6.2、vs2
2.6.3、特有步骤$unwind
2.6.4、特有步骤$bucket
2.6.5、特有步骤$facet
三、实验一
3.1、初始化数据
db.orders.insert(
[
{
"street": "西兴街道",
"city": "杭州",
"state": "浙江省",
"country": "中国",
"zip": "24344-1715",
"phone": "18866668888",
"name": "李白",
"userId": "3573",
"orderDate": "2019-01-02 03:20:08.805",
"status": "completed",
"shippingFee": 8.00,
"orderLines": [{
"product": "iPhone5",
"sku": "2001",
"qty": 1,
"price": 100.00,
"cost": 100.00
},
{
"product": "iPhone5s",
"sku": "2002",
"qty": 2,
"price": 200.00,
"cost": 400.00
},
{
"product": "iPhone6",
"sku": "2003",
"qty": 1,
"price": 300.00,
"cost": 300.00
},
{
"product": "iPhone6s",
"sku": "2004",
"qty": 2,
"price": 400.00,
"cost": 800.00
},
{
"product": "iPhone8",
"sku": "2005",
"qty": 2,
"price": 500.00,
"cost": 1000.00
}
],
"total": 2600
},
{
"street": "长河街道",
"city": "杭州",
"state": "浙江省",
"country": "中国",
"zip": "24344-1716",
"phone": "18866668881",
"name": "杜甫",
"userId": "3574",
"orderDate": "2019-02-02 13:20:08.805",
"status": "completed",
"shippingFee": 5.00,
"orderLines": [{
"product": "iPhone5",
"sku": "2001",
"qty": 1,
"price": 100.00,
"cost": 100.00
},
{
"product": "iPhone5s",
"sku": "2002",
"qty": 2,
"price": 200.00,
"cost": 400.00
},
{
"product": "iPhone6",
"sku": "2003",
"qty": 1,
"price": 300.00,
"cost": 300.00
},
{
"product": "iPhone6s",
"sku": "2004",
"qty": 2,
"price": 400.00,
"cost": 800.00
},
{
"product": "iPhone8",
"sku": "2005",
"qty": 2,
"price": 500.00,
"cost": 1000.00
}
],
"total": 2600
},
{
"street": "浦沿街道",
"city": "杭州",
"state": "浙江省",
"country": "中国",
"zip": "24344-1717",
"phone": "18866668882",
"name": "王安石",
"userId": "3575",
"orderDate": "2019-03-02 14:20:08.805",
"status": "completed",
"shippingFee": 20.00,
"orderLines": [{
"product": "iPhone5",
"sku": "2001",
"qty": 1,
"price": 100.00,
"cost": 100.00
},
{
"product": "iPhone5s",
"sku": "2002",
"qty": 2,
"price": 200.00,
"cost": 400.00
},
{
"product": "iPhone6",
"sku": "2003",
"qty": 1,
"price": 300.00,
"cost": 300.00
},
{
"product": "iPhone6s",
"sku": "2004",
"qty": 2,
"price": 400.00,
"cost": 800.00
},
{
"product": "iPhone12 ProMax",
"sku": "2006",
"qty": 1,
"price": 1500.00,
"cost": 1500.00
}
],
"total": 3100
},
{
"street": "长庆街道",
"city": "杭州",
"state": "浙江省",
"country": "中国",
"zip": "24344-1717",
"phone": "18866668883",
"name": "苏东坡",
"userId": "3576",
"orderDate": "2019-04-02 15:20:08.805",
"status": "completed",
"shippingFee": 10.00,
"orderLines": [
{
"product": "iPhone6s",
"sku": "2004",
"qty": 2,
"price": 400.00,
"cost": 800.00
},
{
"product": "iPhone12 ProMax",
"sku": "2006",
"qty": 1,
"price": 1500.00,
"cost": 1500.00
}
],
"total": 2300
}
]
)
3.2、案例一:计算到目前为止的所有订单的总销售额
db.orders.aggregate(
[
{
$group:{
_id: null,
total: {$sum:"$total"}
}
}
]
)
3.3、案例二:查询2019年第一季度(1月1日-3月31日)已完成订单(completed)的订单总金额和订单总数
db.orders.aggregate(
[
{
'$match': {
'status': 'completed',
'orderDate': {
'$gte': '2019-01-01',
'$lt': '2019-04-01'
}
}
}, {
'$group': {
'_id': null,
'total': {
'$sum': '$total'
},
'shippingFee': {
'$sum': '$shippingFee'
},
'count': {
'$sum': 1
}
}
}, {
'$project': {
'grandTotal': {
'$add': [
'$total', '$shippingFee'
]
},
'_id': 0
}
}
]
)
四、实验二
4.1、$project实例
db.article.aggregate({
$project: {
title: 1,
author: 1
}
})
说明:执行结果中将包含_id、title、author三个字段,默认情况下_id字段是被包含的。1:显示、0:不显示
4.2、$match实例
db.article.aggregate([
{
$match: {
score: {$gt: 70,$lte: 90}
}
},
{
$group: {
_id: null,count: {$sum:1}
}
}
])
说明:$match用于获取分数大于70小于等于90的记录,然后将符合条件的记录送到下一阶段$group管道操作符进行处理
4.3、$skip实例
db.article.aggregate(
{$skip: 5}
)
说明:经过$skip管道操作符处理后,前边5个文档被"过滤"掉
4.4、集合分类统计实例
# 创建索引(db)
use goodsdb
# 插入数据
db.order_detail.insert([
{goodsid:"1001",amount:2,price:10.2,sale:false},
{goodsid:"1001",amount:3,price:14.8,sale:false},
{goodsid:"1002",amount:10,price:50,sale:false},
{goodsid:"1003",amount:2,price:10,sale:true}
])
# 执行查询
db.order_detail.aggregate(
[
{$match:{sale:false}},
{$group:
{
_id:"$goodsid",
total:{$sum:"$amount"}
}
}
]
)
4.5、综合案例
4.5.1、初始化数据
# 需求1:在task数据库的courses集合中插入如下数据
use task
# 中文乱码
db.courses.insert(
[
{course:"隐私保护基础",teacher:"吴娟",classperiod:32,experimental_lessons:0,classnum:1},
{course:"网络安全管理",teacher:"吴娟",classperiod:32,experimental_lessons:0,classnum:1},
{course:"NoSQL数据库技术",teacher:"陈雨婕",classperiod:48,experimental_lessons:15,classnum:1},
{course:"操作系统" ,teacher:"陈雨婕",classperiod:64,experimental_lessons:15,classnum:1},
{course:"大数据处理技术",teacher:"陈雨婕" ,classperiod:48,experimental_lessons:15,classnum:3},
{course:"人工智能",teacher:"邓敏娜",classperiod:48,experimental_lessons:15,classnum:1},
{course:"算法分析与设计",teacher:"邓敏娜",classperiod:48,experimental_lessons:0,classnum:2},
{course:"统计分析技术",teacher:"段红叶",classperiod:32,experimental_lessons:8,classnum:1},
{course:"非结构化大数据分析",teacher:"段红叶" ,classperiod:32,experimental_lessons:0,classnum:1},
{course:"计算机网络" ,teacher:"段红叶",classperiod:48,experimental_lessons:0,classnum:2},
{course:"数据结构与算法课程实践",teacher:"段红叶",classperiod:32,experimental_lessons:0,classnum:1},
{course:"三维动画模型与渲染",teacher:"韩战壕",classperiod:48,experimental_lessons:15,classnum:1},
{course:"面向对象程序设计",teacher:"李贝贝",classperiod:64,experimental_lessons:15,classnum:2},
{course:"面向对象程序设计课程实践",teacher:"李贝贝" ,classperiod:24,experimental_lessons:0,classnum:2},
{course:"数字信号处理",teacher:"刘欢欢",classperiod:48,experimental_lessons:8,classnum:1},
{course:"操作系统",teacher:"刘欢欢",classperiod:64,experimental_lessons:15,classnum:2},
{course:"云计算技术" ,teacher:"王磊",classperiod:48,experimental_lessons:0,classnum:1},
{course:"智能科学与技术导论",teacher:"王磊",classperiod:48,experimental_lessons:0,classnum:1},
{course:"虚拟现实与可视化",teacher:"王磊" ,classperiod:48,experimental_lessons:15,classnum:1},
{course:"系统设计与分析",teacher:"王磊",classperiod:48,experimental_lessons:0,classnum:2},
{course:"数据库技术课程实践",teacher:"王磊",classperiod:32,experimental_lessons:0,classnum:1},
{course:"面向对象程序设计",teacher:"韦茜妤",classperiod:64,experimental_lessons:15,classnum:2},
{course:"面向对象程序设计课程实践",teacher:"韦茜妤" ,classperiod:24,experimental_lessons:0,classnum:2},
{course:"HTML5开发技术",teacher:"韦茜妤",classperiod:48,experimental_lessons:15,classnum:2},
{course:"数据结构与算法课程实践",teacher:"韦茜妤" ,classperiod:32,experimental_lessons:0,classnum:1}
]
)
# 无中文
db.courses.insert(
[
{course:"Privacy protection foundation",teacher:"wu juan",classperiod:32,experimental_lessons:0,classnum:1},
{course:"Network Security Management",teacher:"wu juan",classperiod:32,experimental_lessons:0,classnum:1},
{course:"NoSQL database technology",teacher:"chen yu jie",classperiod:48,experimental_lessons:15,classnum:1},
{course:"operating system" ,teacher:"chen yu jie",classperiod:64,experimental_lessons:15,classnum:1},
{course:"Big data Processing Technology",teacher:"chen yu jie" ,classperiod:48,experimental_lessons:15,classnum:3},
{course:"artificial intelligence",teacher:"deng min na",classperiod:48,experimental_lessons:15,classnum:1},
{course:"Algorithm Analysis and Design",teacher:"deng min na",classperiod:48,experimental_lessons:0,classnum:2},
{course:"Statistical analysis techniques",teacher:"duan ye hong",classperiod:32,experimental_lessons:8,classnum:1},
{course:"Unstructured Big data analysis",teacher:"duan ye hong" ,classperiod:32,experimental_lessons:0,classnum:1},
{course:"computer network" ,teacher:"duan ye hong",classperiod:48,experimental_lessons:0,classnum:2},
{course:"Practice of Data Structure and Algorithms Course",teacher:"duan ye hong",classperiod:32,experimental_lessons:0,classnum:1},
{course:"3D Animation Model and Rendering",teacher:"han zhan hao",classperiod:48,experimental_lessons:15,classnum:1},
{course:"Object-Oriented Programming",teacher:"li bei bei",classperiod:64,experimental_lessons:15,classnum:2},
{course:"Object Oriented Programming Course Practice",teacher:"li bei bei" ,classperiod:24,experimental_lessons:0,classnum:2},
{course:"Digital signal processing",teacher:"liu huan huan",classperiod:48,experimental_lessons:8,classnum:1},
{course:"operating system",teacher:"liu huan huan",classperiod:64,experimental_lessons:15,classnum:2},
{course:"Cloud computing technology" ,teacher:"wang lei",classperiod:48,experimental_lessons:0,classnum:1},
{course:"Introduction to Intelligent Science and Technology",teacher:"wang lei",classperiod:48,experimental_lessons:0,classnum:1},
{course:"Virtual Reality and Visualization",teacher:"wang lei" ,classperiod:48,experimental_lessons:15,classnum:1},
{course:"system design and analysis",teacher:"wang lei",classperiod:48,experimental_lessons:0,classnum:2},
{course:"Database Technology Course Practice",teacher:"wang lei",classperiod:32,experimental_lessons:0,classnum:1},
{course:"Object-Oriented Programming",teacher:"wei qian yu",classperiod:64,experimental_lessons:15,classnum:2},
{course:"Object Oriented Programming Course Practice",teacher:"wei qian yu" ,classperiod:24,experimental_lessons:0,classnum:2},
{course:"HTML5 development technology",teacher:"wei qian yu",classperiod:48,experimental_lessons:15,classnum:2},
{course:"Practice of Data Structure and Algorithms Course",teacher:"wei qian yu" ,classperiod:32,experimental_lessons:0,classnum:1}
]
)
4.5.2、统计每个老师上课课时共多少节课
# 格式化前
db.courses.aggregate({$group:{_id:"$teacher",lesson_num:{$sum:{$multiply:["$classperiod","$classnum"]}}}})
# 格式化后
db.courses.aggregate({
$group:{
_id:"$teacher",
lesson_num:{
$sum:{
$multiply:["$classperiod","$classnum"]
}
}
}
})
4.5.3、课时超过32节课的老师有谁,他们分别有多少节超过32课时的课
# 格式化前
db.courses.aggregate([{$match: {classperiod: {$gt:32}}},{$group: {_id: "$teacher",classnum: {$sum:1}}}])
# 格式化后
db.courses.aggregate([
{
$match: {
classperiod: {$gt:32}
}
},
{
$group: {
_id: "$teacher",
classnum: {$sum:1}
}
}
])