窗口函数


rows between ... and ... 子句(滑动窗口)

  • 语法:rows between ... and ...

    • unbounded preceding
      • 组内第一行数据
    • n preceding
      • 组内当前行,向前数第 n 行数据
    • current row
      • 当前行数据
    • n following
      • 组内当前行,先后数第 n 行数据
    • unbounded following
      • 组内最后一行数据
    • 093
  • 示例:第一行直到当前行的和

    • select
          name, salary, dept,
          rank() over(
              order by salary
          )as rank,
          sum(salary) over(
              order by salary
          ) as sum_over_without_rows,
          sum(salary) over(
              order by salary
              rows between unbounded preceding and current row
          ) as sum_over_with_rows
      from test.test;
      
    • 094

    • 可以看到(参考

      • 不使用 rows between ... and ... 子句:对于并列排序,并列执行聚合函数
      • 使用了 rows between ... and ... 子句:严格逐行执行聚合函数
  • 举例:

    • -- 组内,第一行到最后一行的和
      select
          name, salary, dept,
          sum(salary) over(
              partition by dept 
              order by salary
              rows between unbounded preceding and unbounded following
          )
      from test.test;
      
      -- 组内,上一行、当前行、下一行的和
      select
          name, salary, dept,
          sum(salary) over(
              partition by dept
              order by salary
              rows between 1 preceding and 1 following
          )
      from test.test;
      
    • 095


序列函数

  • LAG (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

    • 返回当前行的上一行(也可以指定偏移量)数据

    • 如果不存在上一个值,那么使用 default 值,如果也没有指定 default 值,则使用 null

    • 对于记录中本身的 null 值,不会使用 default 值代替

    • drop table if exists test.test;
      create table IF NOT EXISTS test.test(
          name string,
          salary int,
          dept string
      );
      insert into test.test values
      ('刘一', 5000, 'dept1'),
      ('陈二', 6000, 'dept1'),
      ('张三', 5000, 'dept1'),
      ('李四', 4500, 'dept1'),
      (null, 12000, 'dept2'),
      ('赵六', 15000, 'dept2'),
      ('孙七', 13000, 'dept2'),
      ('周八', 8000, 'dept2'),
      ('吴九', 12000, 'dept2'),
      ('郑十', 12000, 'dept2');
      select * from test.test;
      
      select
          dept, salary, name,
          lag(name) over(partition by dept order by salary),
          lag(name,0) over(partition by dept order by salary), -- 使用原值
          lag(name,1) over(partition by dept order by salary), -- 相当于不指定偏移
          lag(name,2) over(partition by dept order by salary),
          lag(name,2,'财神爷') over(partition by dept order by salary)
      from test.test;
      
    • 096

  • LEAD (scalar_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

    • lag 相反,返回当前行的下一行(或指定偏移量)数据

    • select
          dept, salary, name,
          lead(name) over(order by salary),
          lead(name) over(partition by dept order by salary)
      from test.test;
      
    • 097

  • first_value

    • 取分组内排序后的第一个值
  • last_value

    • 分组内排序后,(截止到当前行的,)最后一个值

    • select
          dept, salary, name,
          first_value(name) over(),
          last_value(name) over(),
          last_value(name) over(partition by dept)
      from test.test;
      
      select
          dept, salary, name,
          first_value(name) over(
              partition by dept
              order by salary, name
          ),
          last_value(name) over(
              partition by dept
              order by salary, name
          ),
          last_value(name, true) over(
              partition by dept
              order by salary, name
          ),
          last_value(name, true) over(
              partition by dept
              order by salary, name
              rows between unbounded preceding and unbounded following
          )
      from test.test;
      
    • 098