至今指标是什么
至今指标是什么呢?请看下面的图片:
在上图中有月至今销售额,月至今销售额是从每天所在月的1号开始将销售额累加到当前日期。例如,在上图中,在 2020-01-11 这行上,月至今单量=01号销售额+02号销售额+…+ 11号销售额。
说完,月至今的,再来说说,年至今的销售额=1月1号销售额+1月2号销售额+…+1月31号销售额+…+2月1号销售额+13月31号销售额。
周至今的销售额=周一销售额+周二销售额+周三销售额+周四销售额+周五销售额+周六销售额+周日销售额
这就是所谓的至今指标。
全量如何计算
说完来至今指标是什么。我们接下来说说用什么办法计算出至今指标。
首先,我们来谈谈使用开窗函数计算至今值。请看:
with detail as ( select "2019-06-06" as sale_date , "shop1" as shop_id , 10 as sale_amt union all select "2019-06-05" as sale_date , "shop1" as shop_id , 14 as sale_amt union all select "2019-06-03" as sale_date , "shop1" as shop_id , 17 as sale_amt union all select "2019-06-02" as sale_date , "shop1" as shop_id , 18 as sale_amt union all select "2019-06-01" as sale_date , "shop1" as shop_id , 13 as sale_amt union all select "2019-06-06" as sale_date , "shop2" as shop_id , 11 as sale_amt union all select "2019-06-05" as sale_date , "shop2" as shop_id , 15 as sale_amt union all select "2019-06-03" as sale_date , "shop2" as shop_id , 18 as sale_amt union all select "2019-06-02" as sale_date , "shop2" as shop_id , 19 as sale_amt union all select "2019-06-01" as sale_date , "shop2" as shop_id , 16 as sale_amt), dim_date as ( select "2019-06-01" as date_d , "2019-06" as month_code union all select "2019-06-02" as date_d , "2019-06" as month_code union all select "2019-06-03" as date_d , "2019-06" as month_code union all select "2019-06-04" as date_d , "2019-06" as month_code union all select "2019-06-05" as date_d , "2019-06" as month_code union all select "2019-06-06" as date_d , "2019-06" as month_code union all select "2019-06-07" as date_d , "2019-06" as month_code union all select "2019-06-08" as date_d , "2019-06" as month_code union all select "2019-06-09" as date_d , "2019-06" as month_code union all select "2019-06-10" as date_d , "2019-06" as month_code union all select "2019-06-11" as date_d , "2019-06" as month_code union all select "2019-06-12" as date_d , "2019-06" as month_code )select a.sale_date , a.shop_id , a.sale_amt , sum(a.sale_amt) over(partition by b.month_code , a.shop_id order by a.sale_date) as agg_sale_amtfrom detail as a left join dim_date as b on a.sale_date = b.date_d
上面这段 query 使用开窗函数来实现的。
我们还会遇到下面的问题。
从图中,我们可看到,在2020-01-03中 B 没有营业,所以没有销售额了。在2020-01-04月份,可能因为疫情的原因,A、B两家店都没有营业。那我们在2020-01-03的月至今指标的时候,使用开窗函数的话,就不能计算出 B 门店的销售金额了。
应该怎么做呢?一个先做一个表,此表包含了日期、门店的信息,大概就是下面的样子吧。
我们管这个表叫做full_dim吧,然后,我们可以像下面这样写了。
with full_dim as ( select "2019-01-01" as date_d, "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-02" as date_d, "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-03" as date_d , "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-04" as date_d , "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-05" as date_d , "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-06" as date_d , "2019-01" as month_code , "shop1" as shop_code union all select "2019-01-02" as date_d , "2019-01" as month_code , "shop2" as shop_code union all select "2019-01-03" as date_d , "2019-01" as month_code , "shop2" as shop_code union all select "2019-01-04" as date_d , "2019-01" as month_code , "shop2" as shop_code union all select "2019-01-05" as date_d , "2019-01" as month_code , "shop2" as shop_code union all select "2019-01-06" as date_d , "2019-01" as month_code , "shop2" as shop_code )select a.date_d ,a.shop_code ,sum(b.sale_amt) over(partiton by a.shop_code order by a.date_d) as sale_amtfrom full_dim as aleft join fct_sale_d as b on a.shop_code = b.shop_code and a.date_d = b.date_d
full_dim的作用是补全所有的维度值。这样补一下,上面说的补连续的问题就解决了。
full_dim的生成方式,可以使用维度表拼接而成,另外,可以使用事实表来做。
维度表拼接:
select a.date_d ,b.shop_code ,b.shop_name from dim_date as a-- 日期维表cross join dim_store as b -- 门店维表
事实表的方式:
select shop_code , shop_name , sale_datefrom fct_sale_d where ts between "start_date" and "end_date" group by shop_code , shop_name , sale_date
增量如何计算
开窗函数可以实现全量的方式来计算至今值。接下来是说说增量的方式了。
我归纳了下面几个步骤:
- 计算T的日期门店销售额
- 计算出T+1的日期门店销售额
- T 和 T+1 的数据做 full join
- 使用 coalesce() 实现相同门店日期下的值相加,不同日期门店的值设置初始值。
- 使用流程判断语句来判断,周至今、月至今、年至今的开始。
第一、计算T的日期门店销售额:
select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T"
第二、计算出T+1的日期门店销售额
select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1"
第三、T 和 T+1 的数据做 full join
with table_t as (select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T" ), table_t_1 as (select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1" )select "T+1" as sale_date , coalesce(a.shop_code ,b.shop_code) as shop_code , coalesce(a.shop_name ,b.shop_name) as shop_name , coalesce(a.sale_amt ,0)+coalesce(b.sale_amt ,0) as sale_amt from table_t as a full join table_t_1 as b on a.shop_code = b.shop_code
第四、使用流程判断语句来判断,周至今、月至今、年至今的开始
with table_t as (select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T" and "T" ), table_t_1 as (select sale_date ,shop_code ,shop_name ,sum(sale_amt) as sale_amt from fct_sale_d where ts between "T+1" and "T+1" ) , dim_date as ( select date_d , month_code , month_first_day_ind from dim_date where date_d between "T+1" and "T+1" )select "T+1" as sale_date , coalesce(a.shop_code ,b.shop_code) as shop_code , coalesce(a.shop_name ,b.shop_name) as shop_name -- 如果sale_date为所在月的第一天,那不用向sale_amt字段中加入 T 的销售额 , if(c.month_first_day_ind= 1 , 0 , coalesce(a.sale_amt ,0))+coalesce(b.sale_amt ,0) as sale_amt from table_t as a full join table_t_1 as b on a.shop_code = b.shop_codeleft join dim_date as c on c.date_d = a.sale_date
增量和全量计算的优缺点分析
总结如下所示:
- 全量的方式适用于跑历史数据。例如,要求跑 1 年的历史数据,我们可以跑一个时间段里面的数据。
- 增量的方式适用于每天更新情况,这中方式天然的解决维度在日期上连续的缺点。但是在跑历史数据的时候,需要每天执行一次,大家知道,一个计算平台为一个查询任务准备运行资源过程都是一样的,一个任务真正执行可能用不了多长时间,但是启动和停止任务用的时间是差不多的,所以这不利于跑历史数据。
End.
爱数据网专栏作者:wang-possible
作者介绍:6年零售大数据工作经验,技能持续精进CSDN个人主页:bluedraam_pp
- 我的微信公众号
- 微信扫一扫
- 我的微信公众号
- 微信扫一扫
评论