11-20 HIVE

2022-11-04 20:30:38 Cyberspace Security

　　HDFS

　　HIVE

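　　Hive itself holds no data. Analysis runs as MapReduce (MR) jobs, and direct reads and writes go straight to HDFS.

　　Hive metadata is stored in a relational database. Metadata here means the mapping between the rows and columns of files and relational tables.

　　Metadata is used to translate SQL statements. Hive can be viewed as a client of HDFS.

　　Hive: a data warehouse.

　　Hive: includes an interpreter, a compiler, an optimizer, and so on.

　　While Hive is running, its metadata lives in the relational database.

　　The SQL is really a SQL-like language, also called HQL (Hive Query Language).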

　　Help documentation:

　　Steps:

　　Start the Hadoop cluster:

　　First start ZooKeeper: /usr/zookeeper-3.4.6/bin/zkServer.sh start

　　Start everything: start-all.sh

　　Start the standby ResourceManager:

　　yarn-daemon.sh start resourcemanager

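　　Install a relational database

　　(MySQL): yum install mysql-server

　　Check whether it is installed: rpm -qa | grep mysql

　　Uninstall: rpm -e mysql-server-5.1.73-5.el6_6.x86_64

　　Start the MySQL service: service mysqld start (the default is: service mysql start)

　　Create a user: mysql> grant all on *.* to 'bob'@'%' identified by '123456'; ('%' allows remote login)

　　To create a user that is allowed to log in locally instead:

　　mysql> grant all on *.* to 'root'@'localhost' identified by '123456';

　　Log in as a remote user: mysql -uroot -p123456

　　List all available databases: mysql> show databases;

　　Switch database: use mysql;

　　Show all tables: show tables;

　　Unpack the installation package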

　　tar -zxvf apache-hive-1.2.1-bin.tar.gz

　　Configure environment variables:

　　export HIVE_HOME=/usr/apache-hive-1.2.1-bin

　　export PATH=$PATH:$HIVE_HOME/bin

　　Update the jline jar used by Hadoop:

　　Delete the old one first

　　rm -rf /usr/hadoop-2.5.1/share/hadoop/yarn/lib/jline-0.9.94.jar (to locate it: find ./ -name 'jline*.jar')

　　Then copy:

　　cp -a /home/apache-hive-1.2.1-bin/lib/jline-2.12.jar /home/hadoop-2.5.2/share/hadoop/yarn/lib/

　　Copy the MySQL driver

　　cp -a mysql-connector-java-5.1.32-bin.jar /usr/apache-hive-1.2.1-bin/lib/

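　　hive-site.xml configuration:

　　Rename hive-default.xml.template under conf to hive-site.xml

　　Note: if you create the database yourself in advance, the MySQL database character set should be latin1

　　Change the system:java.io.tmpdir directory to an absolute path

　　Shortcut: use vi search to jump to the places that need changes, for example /ConnectionURL

　　When changing system:java.io.tmpdir to an absolute path, a quick substitution in vi does the job:

　　:1,$s/${system:user.name}/hive/g

　　:1,$s#${system:java.io.tmpdir}#/tmp#g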
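
　　The same substitution can be scripted instead of typed into vi. The following is a minimal sketch, not part of the original notes: the function name fill_placeholders and the sample XML fragment (including its property name) are assumptions made for illustration.

```python
def fill_placeholders(xml_text, replacements):
    """Replace each literal ${...} placeholder with its configured value."""
    for placeholder, value in replacements.items():
        # str.replace treats the placeholder literally, so "$", "{" and "."
        # need no escaping (unlike a regex-based substitution).
        xml_text = xml_text.replace(placeholder, value)
    return xml_text


# Hypothetical fragment in the style of hive-site.xml; the property name is
# only an example.
sample = (
    "<property>\n"
    "  <name>hive.exec.local.scratchdir</name>\n"
    "  <value>${system:java.io.tmpdir}/${system:user.name}</value>\n"
    "</property>\n"
)

result = fill_placeholders(
    sample,
    {
        "${system:java.io.tmpdir}": "/tmp",
        "${system:user.name}": "hive",
    },
)
print(result)
```

　　To apply it to a real file, read hive-site.xml into a string, pass it through the function, and write the result back; keeping a backup copy first is advisable.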

　　Run the CLI

　　The relational database service must be running first: service mysqld start

　　hive

　　Files are stored under /user/hive/warehouse by default; they become visible after a table is created

　　Create a table:

　　Sample file contents:

　　123,CJH,baseball-pingpang-run,{pro:HB}-{city:HS}
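
　　To make the idea of mapping a file line onto table columns concrete, here is a small Python model of a delimited row format. It is only a sketch: the column names (id, name, hobbies, address) and the choice of "," for fields, "-" for collection items and ":" for map keys are assumptions read off the sample line, not taken from the original table definition.

```python
def parse_row(line, field_sep=",", item_sep="-", key_sep=":"):
    """Split one text line the way a delimited row format describes it."""
    row_id, name, hobbies, address = line.split(field_sep)
    return {
        "id": int(row_id),
        "name": name,
        # Collection items: one array element per item_sep-separated piece.
        "hobbies": hobbies.split(item_sep),
        # Map entries: each item has the form key<key_sep>value.
        "address": dict(
            item.split(key_sep, 1) for item in address.split(item_sep)
        ),
    }


row = parse_row("123,CJH,baseball-pingpang-run,{pro:HB}-{city:HS}")
print(row)
```

　　In this sketch the braces stay in the map keys and values as literal characters, because a pure delimiter split gives them no special meaning.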

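　　File mapping:

　　Map a local file to a table:

　　Map a file in HDFS to a table:

　　partition: split data into directories

　　Different data can be stored in different directories, which speeds up queries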
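
　　Why separate directories speed up queries can be shown with a toy model on the local filesystem. This is an illustrative sketch only, not how Hive is implemented: the helper names, the data.txt file name and the part_flag=<value> directory naming are assumptions chosen to mirror the partition examples later in these notes.

```python
import os
import tempfile


def write_partition(table_dir, part_flag, lines):
    """Store rows under <table_dir>/part_flag=<value>/data.txt."""
    part_dir = os.path.join(table_dir, f"part_flag={part_flag}")
    os.makedirs(part_dir, exist_ok=True)
    with open(os.path.join(part_dir, "data.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")


def read_rows(table_dir, part_flag=None):
    """Read all rows, or only one partition directory if part_flag is given."""
    if part_flag is None:
        part_dirs = sorted(os.listdir(table_dir))
    else:
        part_dirs = [f"part_flag={part_flag}"]
    rows = []
    for part_dir in part_dirs:
        with open(os.path.join(table_dir, part_dir, "data.txt")) as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows


table_dir = tempfile.mkdtemp()
write_partition(table_dir, "part1", ["1,a", "2,b"])
write_partition(table_dir, "part2", ["3,c"])

print(read_rows(table_dir))           # scans every partition directory
print(read_rows(table_dir, "part2"))  # opens only part_flag=part2
```

　　A filter on the partition column lets the reader skip whole directories, so the amount of data scanned shrinks with the number of partitions excluded.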

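　　Table creation statement:

　　Loading a file:

　　Query:

　　Start HiveServer

　　Once the service is running, you can connect to Hive over JDBC and process data programmatically

　　Note: start the database service before starting this service

　　Start the service

　　In the bin directory: ./hiveserver2 &>>/tmp/hive.log &

　　Check whether HiveServer is listening on port 10000

　　netstat -ntpl | grep java | grep 10000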
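
　　The same check can be made from a script by trying to open a TCP connection to the port. This is a minimal sketch using only the Python standard library; the function name is_listening is an assumption, and a successful connection shows only that something is listening, not that it is HiveServer.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 10000 is the port the notes check for HiveServer.
print(is_listening("localhost", 10000))
```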

　　Test with a client

　　Open the client: beeline

　　Connect to the database

　　node2 is the machine where Hive runs

　　!connect jdbc:hive2://localhost:10000/default;

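　　Enter a user name; no password is needed

　　Java code:

　　There are three ways to run an HQL script

　　1. hive -e "hql"

　　2. hive -f hql.file

　　3. Run the script from code through Hive JDBC

　　Hive has two kinds of functions: UDF and UDAF

　　1. UDF: takes a single row as input and produces a single row as output

　　2. UDAF: takes multiple rows as input, as aggregate functions such as count, avg, and min do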
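
　　The difference between the two kinds of function can be modelled in a few lines of plain Python. This is a conceptual sketch only, not Hive's UDF or UDAF API; the function names and the sample values are invented for illustration.

```python
def upper_udf(value):
    """UDF-style: one input value in, one output value out."""
    return value.upper()


def count_udaf(values):
    """UDAF-style: many input values in, a single aggregate out."""
    return sum(1 for _ in values)


def avg_udaf(values):
    """UDAF-style average over all input values."""
    values = list(values)
    return sum(values) / len(values)


names = ["tom", "amy", "bob"]
ages = [20, 30, 40]

# A UDF is applied row by row, so the output has as many rows as the input.
upper_names = [upper_udf(n) for n in names]
print(upper_names)

# A UDAF collapses the whole column (or group) into one value.
print(count_udaf(names), avg_udaf(ages))
```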

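　　Custom UDF:

　　Write a class and package it as a jar

　　add jar /root/udf.jar;

　　CREATE TEMPORARY FUNCTION my_date_format AS 'com.cjh.Upper';

　　CREATE FUNCTION my_date_format2 AS 'com.cjh.Upper' USING JAR 'hdfs://laoxiao/usr/udf.jar';

　　Java code:

　　UDFs are either temporary or permanent; the jar for a permanent UDF must be uploaded to HDFS

　　Java code

　　Example

　　Find the 10 base stations with the highest call drop rate

　　Relevant fields

　　record_time: call time

　　imei: base station ID

　　call_num: number of calls dialed

　　drop_num: number of dropped calls

　　Drop rate = dropped calls / dialed calls

　　Create the table:

　　Load the file

　　SQL statement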
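
　　The listings for this example are not present in the notes, so the following is only a sketch of what the query could look like. It uses Python's built-in sqlite3 as a stand-in for Hive so that it can run anywhere; the table name cell_records and the sample rows are invented, and only the four field names come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell_records ("
    " record_time TEXT, imei TEXT, call_num INTEGER, drop_num INTEGER)"
)
# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO cell_records VALUES (?, ?, ?, ?)",
    [
        ("2011-07-13 00:00:00", "A", 100, 5),
        ("2011-07-13 00:01:00", "A", 100, 15),
        ("2011-07-13 00:00:00", "B", 50, 1),
        ("2011-07-13 00:00:00", "C", 10, 4),
    ],
)

# Drop rate per station = total dropped calls / total dialed calls.
# The "* 1.0" forces floating-point division in SQLite.
TOP_DROP_RATE = """
SELECT imei,
       SUM(drop_num) * 1.0 / SUM(call_num) AS drop_rate
FROM cell_records
GROUP BY imei
ORDER BY drop_rate DESC
LIMIT 10
"""

top = conn.execute(TOP_DROP_RATE).fetchall()
for imei, drop_rate in top:
    print(imei, drop_rate)
```

　　The sums are taken per station before dividing, so a station with many records is ranked by its overall rate rather than by the average of per-record rates.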

　　A collection of basic HQL statements

　　#The hive.cli.print.header parameter controls whether the CLI shows the table's column names

　　set hive.cli.print.header=true;

　　CREATE TABLE page_view(viewTime INT, userid BIGINT,

　　page_url STRING, referrer_url STRING,

　　ip STRING COMMENT 'IP Address of the User')

　　COMMENT 'This is the page view table'

　　PARTITIONED BY(dt STRING, country STRING)

　　ROW FORMAT DELIMITED

　　FIELDS TERMINATED BY '\001'

　　STORED AS SEQUENCEFILE; -- or TEXTFILE

　　//sequencefile

　　create table tab_ip_seq(id int,name string,ip string,country string)

　　row format delimited

　　fields terminated by ','

　　stored as sequencefile;

　　insert overwrite table tab_ip_seq select * from tab_ext;

　　//create & load

　　create table tab_ip(id int,name string,ip string,country string)

　　row format delimited

　　fields terminated by ','

　　stored as textfile;

　　load data local inpath '/home/hadoop/ip.txt' into table tab_ext;

　　//external

　　CREATE EXTERNAL TABLE tab_ip_ext(id int, name string,

　　ip STRING,

　　country STRING)

　　ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

　　STORED AS TEXTFILE

　　LOCATION '/external/hive';

　　// CTAS: creates temporary tables to store intermediate results

　　CREATE TABLE tab_ip_ctas

　　AS SELECT id new_id, name new_name, ip new_ip,country new_country

　　FROM tab_ip_ext

　　SORT BY new_id;

　　//insert from select: appends intermediate result data to a temporary table

　　create table tab_ip_like like tab_ip;

　　insert overwrite table tab_ip_like

　　select * from tab_ip;

　　//CLUSTER <-- somewhat more advanced; you can leave it until you have the energy to study it>

　　create table tab_ip_cluster(id int,name string,ip string,country string)

　　clustered by(id) into 3 buckets;

　　load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;

　　set hive.enforce.bucketing=true;

　　insert into table tab_ip_cluster select * from tab_ip;

　　select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
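
　　What the bucket sample selects can be modelled in a few lines of Python. This is a conceptual sketch rather than Hive's implementation: it assumes that a non-negative integer id hashes to itself and that buckets are numbered from 1, and the helper names and sample rows are invented.

```python
def bucket_of(row_id, num_buckets):
    """1-based bucket number, assuming an integer hashes to itself."""
    return row_id % num_buckets + 1


def sample_bucket(rows, bucket, num_buckets):
    """Keep only the rows whose id falls into the requested bucket."""
    return [row for row in rows if bucket_of(row["id"], num_buckets) == bucket]


# Invented rows with ids 1..9.
rows = [{"id": i, "name": f"user{i}"} for i in range(1, 10)]

# Mirrors "bucket 2 out of 3 on id" under the assumptions above.
sampled = sample_bucket(rows, bucket=2, num_buckets=3)
print([row["id"] for row in sampled])
```

　　Because the table is already stored in 3 buckets on id, a sample like this only needs to read one bucket file instead of scanning the whole table.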

　　//PARTITION

　　create table tab_ip_part(id int,name string,ip string,country string)

　　partitioned by (part_flag string)

　　row format delimited fields terminated by ',';

　　load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_part

　　partition(part_flag='part1');

　　load data local inpath '/home/hadoop/ip_part2.txt' overwrite into table tab_ip_part

　　partition(part_flag='part2');

　　select * from tab_ip_part;

　　select * from tab_ip_part where part_flag='part2';

　　select count(*) from tab_ip_part where part_flag='part2';

　　alter table tab_ip change id id_alter string;

　　ALTER TABLE tab_cts ADD PARTITION (partCol = 'dt') location '/external/hive/dt';

　　show partitions tab_ip_part;

　　//write to hdfs

　　insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';

　　insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';

　　//array

　　create table tab_array(a array<int>,b array<string>)

　　row format delimited

　　fields terminated by '\t'

　　collection items terminated by ',';

　　Sample data

　　tobenbrone,laihama,woshishui 13866987898,13287654321

　　abc,iloveyou,itcast 13866987898,13287654321

　　select a[0] from tab_array;

　　select * from tab_array where array_contains(b,'word');

　　insert into table tab_array select array(0),array(name,ip) from tab_ext t;

　　//map

　　create table tab_map(name string,info map<string,string>)

　　row format delimited

　　fields terminated by '\t'

　　collection items terminated by ';'

　　map keys terminated by ':';

　　Sample data:

　　fengjie age:18;size:36A;addr:usa

　　furong age:28;size:39C;addr:beijing;weight:180KG

　　load data local inpath '/home/hadoop/hivetemp/tab_map.txt' overwrite into table tab_map;

　　insert into table tab_map select name,map('name',name,'ip',ip) from tab_ext;

　　//struct

　　create table tab_struct(name string,info struct<age:int,tel:string,addr:string>)

　　row format delimited

　　fields terminated by '\t'

　　collection items terminated by ',';

　　load data local inpath '/home/hadoop/hivetemp/tab_st.txt' overwrite into table tab_struct;

　　insert into table tab_struct select name,named_struct('age',id,'tel',name,'addr',country) from tab_ext;

　　//cli shell

　　hive -S -e 'select country,count(*) from tab_ext group by country' > /home/hadoop/hivetemp/e.txt

　　This execution mechanism lets us use a scripting language (bash shell, Python) to run HQL statements in batches
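
　　A batch runner along these lines might look like the sketch below. It relies only on the -S, -e and -f options already used in these notes; the helper names hive_command and run_batch are assumptions, and actually running a batch requires the hive executable on the PATH.

```python
import subprocess


def hive_command(query=None, script=None, silent=True):
    """Build the argv list for `hive -e <query>` or `hive -f <script>`."""
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    argv = ["hive"]
    if silent:
        argv.append("-S")
    argv += ["-e", query] if query is not None else ["-f", script]
    return argv


def run_batch(queries, runner=subprocess.run):
    """Run each query in turn and collect its standard output."""
    outputs = []
    for query in queries:
        done = runner(
            hive_command(query=query),
            capture_output=True,
            text=True,
            check=True,  # raise if hive exits with a non-zero status
        )
        outputs.append(done.stdout)
    return outputs


# Only builds the command here; nothing is executed.
print(hive_command(query="select count(*) from tab_ext"))
```

　　Passing the arguments as a list avoids shell quoting problems with HQL that contains quotes or asterisks, and the runner parameter allows the function to be exercised without a Hive installation.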

　　select * from tab_ext sort by id desc limit 5;

　　select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);

　　//UDF

　　select if(id=1,'first','no-first'),name from tab_ext;

　　hive>add jar /home/hadoop/myudf.jar;

　　hive>CREATE TEMPORARY FUNCTION my_lower AS 'org.dht.Lower';

　　select my_lower(name) from tab_ext;
