GROUP BY 子句
2025年1月3日大约 5 分钟
GROUP BY 子句
1 语法概览
GROUP BY expression (',' expression)*
- GROUP BY 子句用于将 SELECT 语句的结果集按指定的列值进行分组计算。这些分组列的值在结果中保持原样,其他列中具备相同分组列值的所有记录通过指定的聚合函数(例如 COUNT、AVG)进行计算。
2 注意事项
- 在 SELECT 子句中的项必须包含聚合函数或由出现在 GROUP BY 子句中的列组成。
合法示例:
SELECT concat(device_id, model_id), avg(temperature)
FROM table1
GROUP BY device_id, model_id; -- 合法
执行结果如下:
+-----+-----+
|_col0|_col1|
+-----+-----+
| 100A| 90.0|
| 100C| 86.0|
| 100E| 90.0|
| 101B| 85.0|
| 101D| 85.0|
| 101F| 90.0|
+-----+-----+
Total line number = 6
It costs 0.094s
不合法示例1:
SELECT device_id, temperature
FROM table1
GROUP BY device_id;-- 不合法
执行结果如下:
Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 701:
'temperature' must be an aggregate expression or appear in GROUP BY clause
不合法示例2:
SELECT device_id, avg(temperature)
FROM table1
GROUP BY model; -- 不合法
执行结果如下:
Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 701:
Column 'model' cannot be resolved
- 如果没有 GROUP BY 子句,则 SELECT 子句中的所有项要么都包含聚合函数,要么都不包含聚合函数。
合法示例:
SELECT COUNT(*), avg(temperature)
FROM table1; -- 合法
执行结果如下:
+-----+-----------------+
|_col0| _col1|
+-----+-----------------+
| 18|87.33333333333333|
+-----+-----------------+
Total line number = 1
It costs 0.094s
不合法示例:
SELECT humidity, avg(temperature) FROM table1; -- 不合法
执行结果如下:
Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 701:
'humidity' must be an aggregate expression or appear in GROUP BY clause
- group by子句可以使用从 1 开始的常量整数来引用 SELECT 子句中的项,如果常量整数小于1或大于选择项列表的大小,则会抛出错误。
SELECT date_bin(1h, time), device_id, avg(temperature)
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
GROUP BY 1, device_id;
执行结果如下:
+-----------------------------+---------+-----+
| _col0|device_id|_col2|
+-----------------------------+---------+-----+
|2024-11-28T08:00:00.000+08:00| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 100| null|
|2024-11-28T10:00:00.000+08:00| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 100| 88.0|
|2024-11-27T16:00:00.000+08:00| 101| 85.0|
+-----------------------------+---------+-----+
Total line number = 5
It costs 0.092s
- 不支持在 group by 子句中使用 select item 的别名。以下 SQL 将抛出错误,可以使用上述 SQL 代替。
SELECT date_bin(1h, time) AS hour_time, device_id, avg(temperature)
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-29 00:00:00
GROUP BY date_bin(1h, time), device_id;
执行结果如下:
+-----------------------------+---------+-----+
| hour_time|device_id|_col2|
+-----------------------------+---------+-----+
|2024-11-28T08:00:00.000+08:00| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 100| null|
|2024-11-28T10:00:00.000+08:00| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 100| 88.0|
|2024-11-27T16:00:00.000+08:00| 101| 85.0|
+-----------------------------+---------+-----+
Total line number = 5
It costs 0.092s
- 只有 COUNT 函数可以与星号(*)一起使用,用于计算表中的总行数。其他聚合函数与
*
一起使用,将抛出错误。
SELECT count(*) FROM table1;
执行结果如下:
+-----+
|_col0|
+-----+
| 18|
+-----+
Total line number = 1
It costs 0.047s
3 示例数据
在示例数据页面中,包含了用于构建表结构和插入数据的SQL语句,下载并在IoTDB CLI中执行这些语句,即可将数据导入IoTDB,您可以使用这些数据来测试和执行示例中的SQL语句,并获得相应的结果。
示例 1:降采样时间序列数据
对设备 101 下述时间范围的温度进行降采样,每小时返回一个平均温度。
SELECT date_bin(1h, time) AS hour_time, AVG(temperature) AS avg_temperature
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-30 00:00:00
AND device_id='101'
GROUP BY 1;
执行结果如下:
+-----------------------------+---------------+
| hour_time|avg_temperature|
+-----------------------------+---------------+
|2024-11-29T10:00:00.000+08:00| 85.0|
|2024-11-27T16:00:00.000+08:00| 85.0|
+-----------------------------+---------------+
Total line number = 2
It costs 0.054s
对每个设备过去一天的温度进行降采样,每小时返回一个平均温度。
SELECT date_bin(1h, time) AS hour_time, device_id, AVG(temperature) AS avg_temperature
FROM table1
WHERE time >= 2024-11-27 00:00:00 and time <= 2024-11-30 00:00:00
GROUP BY 1, device_id;
执行结果如下:
+-----------------------------+---------+---------------+
| hour_time|device_id|avg_temperature|
+-----------------------------+---------+---------------+
|2024-11-29T11:00:00.000+08:00| 100| null|
|2024-11-29T18:00:00.000+08:00| 100| 90.0|
|2024-11-28T08:00:00.000+08:00| 100| 85.0|
|2024-11-28T09:00:00.000+08:00| 100| null|
|2024-11-28T10:00:00.000+08:00| 100| 85.0|
|2024-11-28T11:00:00.000+08:00| 100| 88.0|
|2024-11-29T10:00:00.000+08:00| 101| 85.0|
|2024-11-27T16:00:00.000+08:00| 101| 85.0|
+-----------------------------+---------+---------------+
Total line number = 8
It costs 0.081s
有关date_bin函数的更多详细信息可以参见 date_bin (时间分桶规整)函数功能定义
示例 2:查询每个设备的最新数据点
SELECT device_id, LAST(temperature), LAST_BY(time, temperature)
FROM table1
GROUP BY device_id;
执行结果如下:
+---------+-----+-----------------------------+
|device_id|_col1| _col2|
+---------+-----+-----------------------------+
| 100| 90.0|2024-11-29T18:30:00.000+08:00|
| 101| 90.0|2024-11-30T14:30:00.000+08:00|
+---------+-----+-----------------------------+
Total line number = 2
It costs 0.078s
示例 3:计算总行数
计算所有设备的总行数:
SELECT COUNT(*) FROM table1;
执行结果如下:
+-----+
|_col0|
+-----+
| 18|
+-----+
Total line number = 1
It costs 0.060s
计算每个设备的总行数:
SELECT device_id, COUNT(*) AS total_rows
FROM table1
GROUP BY device_id;
执行结果如下:
+---------+----------+
|device_id|total_rows|
+---------+----------+
| 100| 8|
| 101| 10|
+---------+----------+
Total line number = 2
It costs 0.060s
示例 4:没有 group by 子句的聚合
查询所有设备中的最大温度:
SELECT MAX(temperature)
FROM table1;
执行结果如下:
+-----+
|_col0|
+-----+
| 90.0|
+-----+
Total line number = 1
It costs 0.086s
示例 5:对子查询的结果进行聚合
查询在指定时间段内平均温度超过 80.0 且至少有两次记录的设备和工厂组合:
SELECT plant_id, device_id
FROM (
SELECT date_bin(10m, time) AS time, plant_id, device_id, AVG(temperature) AS temp FROM table1 WHERE time >= 2024-11-26 00:00:00 AND time <= 2024-11-29 00:00:00
GROUP BY 1, plant_id, device_id
)
WHERE temp > 80.0
GROUP BY plant_id, device_id
HAVING COUNT(*) > 1;
执行结果如下:
+--------+---------+
|plant_id|device_id|
+--------+---------+
| 1001| 101|
| 3001| 100|
+--------+---------+
Total line number = 2
It costs 0.073s