至今指标总结

wang-possible
wang-possible
wang-possible
9
文章
0
评论
2020-09-0313:09:00 评论 86 5203字
摘要

本文首先向大家介绍什么是至今指标,包括年至今、月至今、周至今指标。在掌握至今指标的概念后,本文将对如何对至今指标进行计算,以及对不同的计算方法(全量、增量)的优缺点进行分析总结。

至今指标是什么

至今指标是什么呢?请看下面的图片:

至今指标总结

在上图中有月至今销售额,月至今销售额是从每天所在月的1号开始将销售额累加到当前日期。例如,在上图中,在 2020-01-11 这行上,月至今单量=01号销售额+02号销售额+…+ 11号销售额。

说完,月至今的,再来说说,年至今的销售额=1月1号销售额+1月2号销售额+…+1月31号销售额+…+2月1号销售额+13月31号销售额。

周至今的销售额=周一销售额+周二销售额+周三销售额+周四销售额+周五销售额+周六销售额+周日销售额

这就是所谓的至今指标。

全量如何计算

说完来至今指标是什么。我们接下来说说用什么办法计算出至今指标。

首先,我们来谈谈使用开窗函数计算至今值。请看:

with detail as (             select "2019-06-06" as sale_date , "shop1" as shop_id , 10 as sale_amt   union all select "2019-06-05" as sale_date , "shop1" as shop_id , 14 as sale_amt   union all select "2019-06-03" as sale_date , "shop1" as shop_id , 17 as sale_amt   union all select "2019-06-02" as sale_date , "shop1" as shop_id , 18 as sale_amt    union all select "2019-06-01" as sale_date , "shop1" as shop_id , 13 as sale_amt    union all select "2019-06-06" as sale_date , "shop2" as shop_id , 11 as sale_amt   union all select "2019-06-05" as sale_date , "shop2" as shop_id , 15 as sale_amt   union all select "2019-06-03" as sale_date , "shop2" as shop_id , 18 as sale_amt   union all select "2019-06-02" as sale_date , "shop2" as shop_id , 19 as sale_amt    union all select "2019-06-01" as sale_date , "shop2" as shop_id , 16 as sale_amt), dim_date as (    select "2019-06-01" as date_d , "2019-06" as month_code    union all select "2019-06-02" as date_d , "2019-06" as month_code    union all select "2019-06-03" as date_d , "2019-06" as month_code    union all select "2019-06-04" as date_d , "2019-06" as month_code    union all select "2019-06-05" as date_d , "2019-06" as month_code    union all select "2019-06-06" as date_d , "2019-06" as month_code    union all select "2019-06-07" as date_d , "2019-06" as month_code    union all select "2019-06-08" as date_d , "2019-06" as month_code            union all select "2019-06-09" as date_d , "2019-06" as month_code         union all select "2019-06-10" as date_d , "2019-06" as month_code         union all select "2019-06-11" as date_d , "2019-06" as month_code         union all select "2019-06-12" as date_d , "2019-06" as month_code            )select a.sale_date      , a.shop_id      , a.sale_amt     , sum(a.sale_amt) over(partition by b.month_code , a.shop_id order by a.sale_date) as agg_sale_amtfrom detail as a left join dim_date as b on a.sale_date = b.date_d 

上面这段 query 使用开窗函数来实现的。

我们还会遇到下面的问题。

至今指标总结

从图中,我们可看到,在2020-01-03中 B 没有营业,所以没有销售额了。在2020-01-04月份,可能因为疫情的原因,A、B两家店都没有营业。那我们在2020-01-03的月至今指标的时候,使用开窗函数的话,就不能计算出 B 门店的销售金额了。

应该怎么做呢?一个先做一个表,此表包含了日期、门店的信息,大概就是下面的样子吧。

至今指标总结

我们管这个表叫做full_dim吧,然后,我们可以像下面这样写了。

with full_dim as (  select "2019-01-01" as date_d, "2019-01" as month_code , "shop1" as shop_code   union all select "2019-01-02" as date_d, "2019-01" as month_code , "shop1" as shop_code   union all select "2019-01-03" as date_d , "2019-01" as month_code , "shop1" as shop_code  union all select "2019-01-04" as date_d , "2019-01" as month_code , "shop1" as shop_code  union all select "2019-01-05" as date_d , "2019-01" as month_code , "shop1" as shop_code  union all select "2019-01-06" as date_d , "2019-01" as month_code , "shop1" as shop_code  union all select "2019-01-02" as date_d , "2019-01" as month_code , "shop2" as shop_code   union all select "2019-01-03" as date_d , "2019-01" as month_code , "shop2" as shop_code  union all select "2019-01-04" as date_d , "2019-01" as month_code , "shop2" as shop_code  union all select "2019-01-05" as date_d , "2019-01" as month_code , "shop2" as shop_code  union all select "2019-01-06" as date_d , "2019-01" as month_code , "shop2" as shop_code  )select a.date_d       ,a.shop_code      ,sum(b.sale_amt) over(partiton by a.shop_code order by a.date_d) as sale_amtfrom full_dim as aleft join fct_sale_d as b on a.shop_code = b.shop_code and a.date_d = b.date_d 

full_dim的作用是补全所有的维度值。这样补一下,上面说的补连续的问题就解决了。

full_dim的生成方式,可以使用维度表拼接而成,另外,可以使用事实表来做。

维度表拼接:

select a.date_d        ,b.shop_code        ,b.shop_name from dim_date as a-- 日期维表cross join dim_store as b -- 门店维表

事实表的方式:

select shop_code , shop_name , sale_datefrom fct_sale_d where ts between "start_date" and "end_date" group by shop_code , shop_name , sale_date

增量如何计算

开窗函数可以实现全量的方式来计算至今值。接下来是说说增量的方式了。

我归纳了下面几个步骤:

  • 计算T的日期门店销售额
  • 计算出T+1的日期门店销售额
  • T 和 T+1 的数据做 full join
  • 使用 coalesce() 实现相同门店日期下的值相加,不同日期门店的值设置初始值。
  • 使用流程判断语句来判断,周至今、月至今、年至今的开始。

第一、计算T的日期门店销售额:

select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T" 

第二、计算出T+1的日期门店销售额

select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1" 

第三、T 和 T+1 的数据做 full join

with table_t as (select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T" ), table_t_1 as (select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1" )select "T+1" as sale_date      , coalesce(a.shop_code ,b.shop_code) as shop_code       , coalesce(a.shop_name ,b.shop_name) as shop_name       , coalesce(a.sale_amt ,0)+coalesce(b.sale_amt ,0) as sale_amt from table_t as a full join table_t_1 as b on a.shop_code = b.shop_code 

第四、使用流程判断语句来判断,周至今、月至今、年至今的开始

with table_t as (select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T" ), table_t_1 as (select sale_date       ,shop_code       ,shop_name       ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1" ) , dim_date as (    select date_d , month_code , month_first_day_ind    from dim_date     where date_d between "T+1" and "T+1" )select "T+1" as sale_date      , coalesce(a.shop_code ,b.shop_code) as shop_code       , coalesce(a.shop_name ,b.shop_name) as shop_name       -- 如果sale_date为所在月的第一天,那不用向sale_amt字段中加入 T 的销售额      , if(c.month_first_day_ind= 1 , 0 , coalesce(a.sale_amt ,0))+coalesce(b.sale_amt ,0) as sale_amt from table_t as a full join table_t_1 as b on a.shop_code = b.shop_codeleft join dim_date as c on c.date_d = a.sale_date 

增量和全量计算的优缺点分析

总结如下所示:

  • 全量的方式适用于跑历史数据。例如,要求跑 1 年的历史数据,我们可以跑一个时间段里面的数据。
  • 增量的方式适用于每天更新情况,这中方式天然的解决维度在日期上连续的缺点。但是在跑历史数据的时候,需要每天执行一次,大家知道,一个计算平台为一个查询任务准备运行资源过程都是一样的,一个任务真正执行可能用不了多长时间,但是启动和停止任务用的时间是差不多的,所以这不利于跑历史数据。

End.

爱数据网专栏作者:wang-possible

作者介绍:6年零售大数据工作经验,技能持续精进CSDN个人主页:bluedraam_pp

  • 我的微信公众号
  • 微信扫一扫
  • weinxin
  • 我的微信公众号
  • 微信扫一扫
  • weinxin
匿名

发表评论

匿名网友 填写信息

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: