大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含3459字,纯文字阅读大概需要5分钟。
内容图文
1)为什么要建宽表
需求目标,把每个用户单日的行为聚合起来组成一张多列宽表,以便之后关联用户维度信息后进行,不同角度的统计分析
1 创建用户行为宽表
hive (gmall)> drop table if exists dws_user_action; create external table dws_user_action ( user_id string comment ‘用户 id‘, order_count bigint comment ‘下单次数 ‘, order_amount decimal(16,2) comment ‘下单金额 ‘, payment_count bigint comment ‘支付次数‘, payment_amount decimal(16,2) comment ‘支付金额 ‘, comment_count bigint comment ‘评论次数‘ ) COMMENT ‘每日用户行为宽表‘ PARTITIONED BY (`dt` string) stored as parquet location ‘/warehouse/gmall/dws/dws_user_action/‘;
2 向用户行为宽表导入数据
1)导入数据
hive (gmall)> with tmp_order as ( select user_id, count(*) order_count, sum(oi.total_amount) order_amount from dwd_order_info oi where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ) , tmp_payment as ( select user_id, sum(pi.total_amount) payment_amount, count(*) payment_count from dwd_payment_info pi where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ), tmp_comment as ( select user_id, count(*) comment_count from dwd_comment_log c where date_format(c.dt,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ) insert overwrite table dws_user_action partition(dt=‘2019-02-10‘) select user_actions.user_id, sum(user_actions.order_count), sum(user_actions.order_amount), sum(user_actions.payment_count), sum(user_actions.payment_amount), sum(user_actions.comment_count) from ( select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order union all select user_id, 0, 0, payment_count, payment_amount, 0from tmp_payment union all select user_id, 0, 0, 0, 0, comment_count from tmp_comment ) user_actions group by user_id;
2)查询导入结果
hive (gmall)> select * from dws_user_action;
3 用户行为数据宽表导入脚本
1)在/home/atguigu/bin目录下创建脚本dws_db_wide.sh
[atguigu@hadoop102 bin]$ vim dws_db_wide.sh
在脚本中填写如下内容
#!/bin/bash
# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive
# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
with
tmp_order as
(
select
user_id,
sum(oi.total_amount) order_amount,
count(*) order_count
from "$APP".dwd_order_info oi
where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
) ,
tmp_payment as
(
select
user_id,
sum(pi.total_amount) payment_amount,
count(*) payment_count
from "$APP".dwd_payment_info pi
where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
),
tmp_comment as
(
select
user_id,
count(*) comment_count
from "$APP".dwd_comment_log c
where date_format(c.dt,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
)
Insert overwrite table "$APP".dws_user_action partition(dt=‘$do_date‘)
select
user_actions.user_id,
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount),
sum(user_actions.comment_count)
from
(
select
user_id,
order_count,
order_amount,
0 payment_count,
0 payment_amount,
0 comment_count
from tmp_order
union all
select
user_id,
0,
0,
payment_count,
payment_amount,
0
from tmp_payment
union all
select
user_id,
0,
0,
0,
0,
comment_count
from tmp_comment
) user_actions
group by user_id;
"
$hive -e "$sql"
2)增加脚本执行权限
[atguigu@hadoop102 bin]$ chmod 777 dws_db_wide.sh
3)执行脚本导入数据
[atguigu@hadoop102 bin]$ dws_db_wide.sh 2019-02-11
4)查看导入数据
hive (gmall)>
select * from dws_user_action where dt=‘2019-02-11‘ limit 2;
原文:https://www.cnblogs.com/qiu-hua/p/13547364.html
内容总结
以上是互联网集市为您收集整理的大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表全部内容,希望文章能够帮你解决大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表所遇到的程序开发问题。 如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
扫描二维码推送至手机访问。