A hands-on test of Hive, with data imported via Sqoop.

Environment:

Master (192.168.31.30)

Slave38 (192.168.31.38)

Slave39 (192.168.31.39)

Slave36 (192.168.31.36)

Slave37 (192.168.31.37)

Slave34 (192.168.31.34)

Slave35 (192.168.31.35)

Importing the data

Import method: Sqoop

MySQL

./bin/sqoop import --connect jdbc:mysql://X.X.X.X:3306/XXXXXX --username XXXXXX --password XXXXXX \
--table XXXXXX \
--target-dir /tmp/tablename \
--delete-target-dir \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--hive-import \
--hive-table XXXXXX

Oracle

./bin/sqoop import --connect jdbc:oracle:thin:@X.X.X.X:1521:XXXXXX --username XXXXXX --password XXXXXX \
--target-dir /tmp/tablename \
--delete-target-dir \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--hive-import \
--hive-table oracle.XXXXXX \
--query 'select * from XXXXXX where ROWNUM < 10000001 and $CONDITIONS order by PK_NUM_IID asc' \
--split-by PK_NUM_IID

Option reference

--split-by: column used to split the import across mappers (typically the primary key)

--query: free-form SELECT to import; it must contain the literal token $CONDITIONS, which Sqoop replaces with each mapper's split predicate

--table: name of the source table (mutually exclusive with --query)

--where: WHERE clause applied to the source table

--fields-terminated-by: field (column) delimiter in the generated files

--lines-terminated-by: record (row) delimiter

--hive-import: load the imported data into Hive

--hive-table: name of the target Hive table

--target-dir: HDFS directory the data is imported into
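As a sketch of how the --table/--where/--split-by options combine (instead of the free-form --query used above), the snippet below only assembles and prints the command rather than running it; the connection details are the same X placeholders as in the commands above, and the WHERE filter is an illustrative assumption.

```shell
# Illustrative only: assemble (without executing) a Sqoop import that uses
# --table/--where/--split-by instead of --query. All X values are placeholders.
SQOOP_CMD="./bin/sqoop import \
--connect jdbc:mysql://X.X.X.X:3306/XXXXXX \
--username XXXXXX --password XXXXXX \
--table XXXXXX \
--where 'PK_NUM_IID < 10000001' \
--split-by PK_NUM_IID \
--target-dir /tmp/tablename \
--delete-target-dir \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
--hive-import \
--hive-table XXXXXX"
printf '%s\n' "$SQOOP_CMD"
```

Note that --where and --query cannot be mixed: with --query, the filter goes into the query text itself, alongside $CONDITIONS.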

Test table

Data files already imported into HDFS.

Test data: 2.7 G + 3.12 G + 3.34 G + 3.32 G = 12.48 G
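The quoted total checks out; a quick arithmetic sanity check:

```shell
# Sum of the four imported file sizes (GB figures quoted above).
awk 'BEGIN { printf "%.2f\n", 2.7 + 3.12 + 3.34 + 3.32 }'   # prints 12.48
```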

Counting rows (note that the first attempt below misspells the table name, hence the SemanticException):

select count(*) from taboa_item_info;

FAILED: SemanticException [Error 10001]: Line 1:21 Table not found 'taboa_item_info'
hive> select count(*) from taobao_item_info;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160310021521_09139db0-7c2c-4605-8049-d9b3b0246fe3
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0002, Tracking URL = http://Master:8088/proxy/application_1457575610003_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0002
Hadoop job information for Stage-1: number of mappers: 50; number of reducers: 1
2016-03-10 02:42:00,259 Stage-1 map = 0%,  reduce = 0%
2016-03-10 02:43:01,122 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 150.29 sec
2016-03-10 02:44:01,579 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 626.45 sec
2016-03-10 02:45:01,894 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 732.74 sec
2016-03-10 02:46:02,730 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 833.42 sec
2016-03-10 02:47:03,569 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 944.86 sec
2016-03-10 02:48:03,962 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1032.26 sec
2016-03-10 02:49:04,881 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1121.68 sec
2016-03-10 02:50:05,253 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1209.3 sec
2016-03-10 02:50:45,384 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 1263.81 sec
2016-03-10 02:51:37,347 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 1332.3 sec
2016-03-10 02:51:40,504 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 1336.6 sec
2016-03-10 02:51:51,113 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1348.75 sec
2016-03-10 02:52:34,584 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 1413.15 sec
2016-03-10 02:52:53,601 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 1442.64 sec
2016-03-10 02:52:58,878 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 1449.7 sec
2016-03-10 02:53:03,122 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 1454.42 sec
2016-03-10 02:53:04,176 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 1455.63 sec
2016-03-10 02:53:08,409 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 1465.09 sec
.......
2016-03-10 03:04:17,086 Stage-1 map = 90%,  reduce = 27%, Cumulative CPU 2423.38 sec
2016-03-10 03:04:18,162 Stage-1 map = 91%,  reduce = 29%, Cumulative CPU 2425.34 sec
2016-03-10 03:05:07,889 Stage-1 map = 92%,  reduce = 29%, Cumulative CPU 2388.43 sec
2016-03-10 03:05:13,196 Stage-1 map = 92%,  reduce = 30%, Cumulative CPU 2389.35 sec
2016-03-10 03:06:13,628 Stage-1 map = 92%,  reduce = 30%, Cumulative CPU 2398.9 sec
2016-03-10 03:06:18,915 Stage-1 map = 93%,  reduce = 30%, Cumulative CPU 2399.67 sec
2016-03-10 03:06:19,983 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 2399.9 sec
2016-03-10 03:07:20,373 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 2406.84 sec
2016-03-10 03:08:20,725 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 2413.96 sec
2016-03-10 03:08:28,132 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 2417.71 sec
2016-03-10 03:08:30,254 Stage-1 map = 95%,  reduce = 31%, Cumulative CPU 2420.51 sec
2016-03-10 03:08:48,246 Stage-1 map = 97%,  reduce = 31%, Cumulative CPU 2422.35 sec
2016-03-10 03:08:51,417 Stage-1 map = 97%,  reduce = 32%, Cumulative CPU 2422.62 sec
2016-03-10 03:09:06,212 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 2423.8 sec
2016-03-10 03:09:10,579 Stage-1 map = 99%,  reduce = 33%, Cumulative CPU 2425.47 sec
2016-03-10 03:09:17,183 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2429.34 sec
2016-03-10 03:09:19,364 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 2432.34 sec
2016-03-10 03:09:20,419 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2434.06 sec
MapReduce Total cumulative CPU time: 40 minutes 34 seconds 60 msec
Ended Job = job_1457575610003_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 50  Reduce: 1   Cumulative CPU: 2434.06 sec   HDFS Read: 13406416600 HDFS Write: 9 SUCCESS
Total MapReduce CPU Time Spent: 40 minutes 34 seconds 60 msec
OK
38801720
Time taken: 1658.858 seconds, Fetched: 1 row(s)

SELECT COUNT runtime

About 28 minutes.
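For context, the job summary above reports 50 mappers over roughly 13.4 GB of HDFS reads, i.e. about one mapper per 256 MB of input. This is a back-of-the-envelope check only; the actual split size depends on the HDFS block size and MapReduce configuration.

```shell
# HDFS Read bytes and mapper count taken from the job summary above.
HDFS_READ=13406416600
MAPPERS=50
echo $(( HDFS_READ / MAPPERS / 1024 / 1024 ))   # MB of input per mapper, prints 255
```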

Performance of a complex SQL statement

SELECT
    COUNT(0)
FROM
    (
        SELECT
            FK_category_id,
            FK_BRAND_ID,
            FK_SHOP_ID,
            SUM(price),
            AVG(price),
            COUNT(0) AS count_s
        FROM
            (
                SELECT
                    *
                FROM
                    taobao_item_info
                LIMIT 10000000
            ) A
        GROUP BY
            FK_category_id,
            FK_BRAND_ID,
            FK_SHOP_ID
    ) b;

Three levels of nested SELECTs:

hive> select count(0) from ( select FK_category_id ,FK_BRAND_ID,FK_SHOP_ID,sum(price),avg(price),count(0) as count_s from ( select * from taobao_item_info limit 10000000  ) A  group by FK_category_id ,FK_BRAND_ID,FK_SHOP_ID ) b  ;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160310091308_4da2d3f3-846b-4718-94fb-26a0c58f5fea
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0004, Tracking URL = http://Master:8088/proxy/application_1457575610003_0004/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0004
Hadoop job information for Stage-1: number of mappers: 50; number of reducers: 1
2016-03-10 09:14:56,551 Stage-1 map = 0%,  reduce = 0%
2016-03-10 09:15:57,969 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 163.65 sec
2016-03-10 09:16:58,474 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 731.19 sec
2016-03-10 09:17:59,423 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 882.38 sec
2016-03-10 09:19:00,197 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1017.52 sec
2016-03-10 09:20:00,860 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1124.38 sec
2016-03-10 09:21:01,496 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1233.33 sec
2016-03-10 09:22:02,109 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1323.32 sec
2016-03-10 09:23:02,477 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1423.48 sec
2016-03-10 09:23:27,848 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 1456.21 sec
2016-03-10 09:23:53,308 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 1512.65 sec
2016-03-10 09:23:56,481 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 1518.25 sec
2016-03-10 09:24:02,803 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1532.6 sec
2016-03-10 09:24:05,956 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 1539.29 sec
2016-03-10 09:24:09,122 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 1546.96 sec
2016-03-10 09:24:12,315 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 1551.68 sec
2016-03-10 09:24:15,489 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 1556.53 sec

........

2016-03-10 09:36:54,677 Stage-1 map = 90%,  reduce = 27%, Cumulative CPU 2869.93 sec
2016-03-10 09:36:56,049 Stage-1 map = 92%,  reduce = 27%, Cumulative CPU 2876.1 sec
2016-03-10 09:36:57,119 Stage-1 map = 94%,  reduce = 27%, Cumulative CPU 2885.87 sec
2016-03-10 09:36:59,275 Stage-1 map = 96%,  reduce = 27%, Cumulative CPU 2902.68 sec
2016-03-10 09:37:00,334 Stage-1 map = 97%,  reduce = 28%, Cumulative CPU 2906.44 sec
2016-03-10 09:37:05,716 Stage-1 map = 97%,  reduce = 29%, Cumulative CPU 2919.38 sec
2016-03-10 09:37:09,032 Stage-1 map = 97%,  reduce = 31%, Cumulative CPU 2924.19 sec
2016-03-10 09:37:17,655 Stage-1 map = 98%,  reduce = 31%, Cumulative CPU 2934.75 sec
2016-03-10 09:37:18,714 Stage-1 map = 99%,  reduce = 32%, Cumulative CPU 2922.13 sec
2016-03-10 09:37:21,900 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2924.06 sec
2016-03-10 09:37:30,454 Stage-1 map = 100%,  reduce = 34%, Cumulative CPU 2930.15 sec
2016-03-10 09:37:33,599 Stage-1 map = 100%,  reduce = 36%, Cumulative CPU 2931.76 sec
2016-03-10 09:37:36,733 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 2934.88 sec
2016-03-10 09:37:39,864 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 2937.93 sec
2016-03-10 09:37:43,030 Stage-1 map = 100%,  reduce = 46%, Cumulative CPU 2939.41 sec
2016-03-10 09:37:52,515 Stage-1 map = 100%,  reduce = 49%, Cumulative CPU 2942.91 sec
2016-03-10 09:37:55,683 Stage-1 map = 100%,  reduce = 52%, Cumulative CPU 2945.48 sec
2016-03-10 09:37:58,839 Stage-1 map = 100%,  reduce = 56%, Cumulative CPU 2948.5 sec
2016-03-10 09:38:01,993 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 2951.46 sec
2016-03-10 09:38:04,086 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 2954.52 sec
2016-03-10 09:38:07,236 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 2957.55 sec
2016-03-10 09:39:07,347 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3021.85 sec
2016-03-10 09:40:07,383 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3087.12 sec
2016-03-10 09:41:08,401 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3153.06 sec
2016-03-10 09:42:02,966 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3209.14 sec
MapReduce Total cumulative CPU time: 53 minutes 29 seconds 140 msec
Ended Job = job_1457575610003_0004
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0005, Tracking URL = http://Master:8088/proxy/application_1457575610003_0005/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0005
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 2
2016-03-10 09:42:23,192 Stage-2 map = 0%,  reduce = 0%
2016-03-10 09:42:56,329 Stage-2 map = 17%,  reduce = 0%, Cumulative CPU 60.93 sec
2016-03-10 09:43:02,686 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 75.78 sec
2016-03-10 09:43:08,007 Stage-2 map = 56%,  reduce = 0%, Cumulative CPU 85.74 sec
2016-03-10 09:43:11,197 Stage-2 map = 62%,  reduce = 0%, Cumulative CPU 97.21 sec
2016-03-10 09:43:13,313 Stage-2 map = 67%,  reduce = 0%, Cumulative CPU 99.26 sec
2016-03-10 09:43:20,771 Stage-2 map = 83%,  reduce = 0%, Cumulative CPU 108.79 sec
2016-03-10 09:43:27,144 Stage-2 map = 86%,  reduce = 8%, Cumulative CPU 116.44 sec
2016-03-10 09:43:29,249 Stage-2 map = 86%,  reduce = 17%, Cumulative CPU 118.42 sec
2016-03-10 09:43:30,318 Stage-2 map = 90%,  reduce = 17%, Cumulative CPU 121.67 sec
2016-03-10 09:43:32,440 Stage-2 map = 94%,  reduce = 17%, Cumulative CPU 125.3 sec
2016-03-10 09:43:35,684 Stage-2 map = 97%,  reduce = 17%, Cumulative CPU 128.74 sec
2016-03-10 09:43:38,837 Stage-2 map = 100%,  reduce = 17%, Cumulative CPU 132.0 sec
2016-03-10 09:43:42,012 Stage-2 map = 100%,  reduce = 42%, Cumulative CPU 136.88 sec
2016-03-10 09:43:44,114 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 142.29 sec
2016-03-10 09:43:45,163 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 148.67 sec
2016-03-10 09:43:47,256 Stage-2 map = 100%,  reduce = 74%, Cumulative CPU 153.29 sec
2016-03-10 09:43:48,302 Stage-2 map = 100%,  reduce = 78%, Cumulative CPU 156.42 sec
2016-03-10 09:43:50,417 Stage-2 map = 100%,  reduce = 83%, Cumulative CPU 159.56 sec
2016-03-10 09:43:51,469 Stage-2 map = 100%,  reduce = 89%, Cumulative CPU 162.64 sec
2016-03-10 09:43:53,554 Stage-2 map = 100%,  reduce = 94%, Cumulative CPU 165.74 sec
2016-03-10 09:43:54,632 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 171.65 sec
MapReduce Total cumulative CPU time: 2 minutes 51 seconds 650 msec
Ended Job = job_1457575610003_0005
Launching Job 3 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0006, Tracking URL = http://Master:8088/proxy/application_1457575610003_0006/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0006
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2016-03-10 09:44:13,870 Stage-3 map = 0%,  reduce = 0%
2016-03-10 09:44:23,533 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.5 sec
2016-03-10 09:44:36,349 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 10.26 sec
MapReduce Total cumulative CPU time: 10 seconds 260 msec
Ended Job = job_1457575610003_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 50  Reduce: 1   Cumulative CPU: 3209.14 sec   HDFS Read: 13406392259 HDFS Write: 407010956 SUCCESS
Stage-Stage-2: Map: 2  Reduce: 2   Cumulative CPU: 171.65 sec   HDFS Read: 407027303 HDFS Write: 234 SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 10.26 sec   HDFS Read: 7231 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 56 minutes 31 seconds 50 msec
OK
5294076
Time taken: 1804.504 seconds, Fetched: 1 row(s)

One SQL statement was split into three jobs:

Job 1: SELECT * FROM taobao_item_info LIMIT 10000000 — about 28 minutes to pull ten million rows off the data nodes.

Job 2: the GROUP BY over the three columns — about 2 minutes over the intermediate data set.

Job 3: counting the grouped rows — about 37 seconds.

Clearly, fetching the ten million rows is what dominates the runtime. (On physical machines the I/O, and with it the overall performance, would likely be much better.)
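To see how Hive will split a query into stages without paying for the run, the statement can be prefixed with EXPLAIN, which compiles the plan but launches no jobs. The sketch below only prints the statement one would feed to the CLI, since no cluster is assumed here:

```shell
# Printed for reference, not executed; `hive -e "$HQL"` would show the
# stage plan (the Stage-1..Stage-3 breakdown) without running the jobs.
HQL='EXPLAIN
SELECT count(0) FROM (
  SELECT FK_category_id, FK_BRAND_ID, FK_SHOP_ID,
         sum(price), avg(price), count(0) AS count_s
  FROM (SELECT * FROM taobao_item_info LIMIT 10000000) A
  GROUP BY FK_category_id, FK_BRAND_ID, FK_SHOP_ID
) b;'
printf '%s\n' "$HQL"
```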

A simple SQL statement

select * from (select * from taobao_item_info limit 10) a order by fk_shop_id desc;

hive> select * from (select * from taobao_item_info limit 10  ) a order by fk_shop_id desc ; 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160310091308_4da2d3f3-846b-4718-94fb-26a0c58f5fea
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0008, Tracking URL = http://Master:8088/proxy/application_1457575610003_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0008
Hadoop job information for Stage-1: number of mappers: 50; number of reducers: 1
2016-03-11 04:16:54,732 Stage-1 map = 0%,  reduce = 0%
2016-03-11 04:17:54,117 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 36.9 sec
2016-03-11 04:17:57,446 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 39.1 sec
2016-03-11 04:17:58,517 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 41.09 sec
2016-03-11 04:17:59,572 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 45.47 sec
.......
2016-03-11 04:21:50,668 Stage-1 map = 90%,  reduce = 29%, Cumulative CPU 276.97 sec
2016-03-11 04:21:53,863 Stage-1 map = 92%,  reduce = 31%, Cumulative CPU 281.66 sec
2016-03-11 04:22:07,722 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 286.75 sec
2016-03-11 04:22:22,825 Stage-1 map = 96%,  reduce = 31%, Cumulative CPU 291.83 sec
2016-03-11 04:22:24,977 Stage-1 map = 96%,  reduce = 32%, Cumulative CPU 291.91 sec
2016-03-11 04:23:25,500 Stage-1 map = 96%,  reduce = 32%, Cumulative CPU 293.29 sec
2016-03-11 04:23:30,797 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 296.99 sec
2016-03-11 04:23:31,850 Stage-1 map = 98%,  reduce = 33%, Cumulative CPU 297.08 sec
2016-03-11 04:23:57,299 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 301.86 sec
2016-03-11 04:23:59,455 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 305.8 sec
MapReduce Total cumulative CPU time: 5 minutes 5 seconds 800 msec
Ended Job = job_1457575610003_0008
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0009, Tracking URL = http://Master:8088/proxy/application_1457575610003_0009/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0009
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2016-03-11 04:24:25,185 Stage-2 map = 0%,  reduce = 0%
2016-03-11 04:24:36,085 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.42 sec
2016-03-11 04:25:01,588 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 6.96 sec
MapReduce Total cumulative CPU time: 6 seconds 960 msec
Ended Job = job_1457575610003_0009
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 50  Reduce: 1   Cumulative CPU: 305.8 sec   HDFS Read: 1098144 HDFS Write: 4016 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 6.96 sec   HDFS Read: 17243 HDFS Write: 4042 SUCCESS
Total MapReduce CPU Time Spent: 5 minutes 12 seconds 760 msec
OK
7.561175312E9   2.0 18.0    0.0 20595.0 6.2816906E7 5.0012322E7 5.0010728E7 500355.0    0.0 2013-05-28 09:30:54.0   0.0 2.0 2013-12-12 20:16:34.0   18.0    2013-12-12 20:16:34.0   Yonex 尤尼克斯羽毛球10 36.0    36.0    http://img04.taobaocdn.com/bao/uploaded/i4/T1L72RXeFgXXX4.isU_015200.jpg    辰灭舞 3.44273149E8    0.0 0001-01-01 00:00:00.0
7.561178308E9   262.0   5.4 0.0 3.3856229E7 6.2765949E7 5.0011039E7 5.0010404E7 36183.0 0.0 2013-05-25 23:35:06.0   0.0 266.0   2013-08-08 11:08:58.0   6.0 2013-08-08 11:08:58.0   硅胶磨脚贴 后跟帖 随意小圆贴 随意贴 磨脚贴 超值6颗装后跟贴    1436.4  1128.0  http://img02.taobaocdn.com/bao/uploaded/i2/14752022939039717/T1vqwZXa8aXXXXXXXX_!!0-item_pic.jpg    默客服饰专营店 5.12924752E8    262.0   2013-08-10 14:28:25.0
7.561176669E9   2.0 25.0    0.0 0.0 6.080252E7  5.0008159E7 25.0    0.0 0.0 0001-01-01 00:00:00.0   0.0 2.0 2013-09-14 23:55:41.0   25.0    2013-09-14 23:55:41.0   matchbox 绝版 火柴盒 上海环球 PK-118  P-38  闪电战斗机    50.0    50.0 http://img02.taobaocdn.com/bao/uploaded/i6/T1O8xLXaJdXXc9IAza_122024.jpg   西安誉秦    1.35742565E8    0.0 0001-01-01 00:00:00.0
7.561173937E9   3.0 75.0    0.0 3683806.0   6.0424775E7 213202.0    5.0025705E7 14426.0 1.0 0001-01-01 00:00:00.0   0.0 3.0 0001-01-01 00:00:00.0   75.0    2013-04-09 16:23:01.0   美国直邮Nature's Gate天然之扉红石榴多酚滋养亮泽洗发露532ml  225.0   225.0   http://img04.taobaocdn.com/bao/uploaded/i4/T1SqjkXh4iXXc9S9AY_030523.jpg    null    0.0 0.0 0001-01-01 00:00:00.0
7.561171414E9   1.0 1.8 0.0 6887002.0   5.9810857E7 5.0009361E7 5.0007216E7 22226.0 0.0 2013-05-03 13:44:31.0   0.0 1.0 2013-05-03 13:44:31.0   1.8 2013-05-03 13:44:31.0   花盆 双色花盆 摔不坏的花盆 13cm*11.5cm  1.8 1.8  http://img01.taobaocdn.com/bao/uploaded/i1/T1Th8LXdXFXXbM8oZ._113714.jpg   null    0.0 0.0 0001-01-01 00:00:00.0
7.561175162E9   1.0 1900.0  0.0 -1.0    5.9441723E7 -1.0    -1.0    0.0 0.0 2014-06-06 08:19:58.0   1900.0  1.0 2014-06-15 01:07:07.0   1900.0  2014-06-15 01:07:07.0   冲四钻!星星冰柜冷柜星星陈列柜星星岛柜SD/SC-276BP 发票+联保    1900.0  1900.http://img02.taobaocdn.com/bao/uploaded/i2/T1U4RtXmVyXXXhRh3W_024008.jpg   cdhhdq8 2.89292505E8    NULL    null
7.561177195E9   1.0 1.85    0.0 0.0 3.6764973E7 5.0022301E7 5.0020485E7 0.0 0.0 2013-04-30 19:35:51.0   0.0 1.0 2013-04-30 19:35:51.0   1.85    2013-04-30 19:35:51.0   微型轴承681XZZ,618/1.5或10008/1.5,内径1.5外径4厚度2mm  1.85 1.85   http://img04.taobaocdn.com/bao/uploaded/i4/T1P0ptXhFlXXXXQ.Z._083713.jpg    null    0.0 0.0 0001-01-01 00:00:00.0
7.561180686E9   2.0 15.7    0.0 10858.0 3.646117E7  5.0013402E7 5.0020579E7 79.0    0.0 2013-07-11 03:03:47.0   0.0 2.0 2013-08-28 03:31:51.0   15.7    2013-08-28 03:31:51.0   TCL开关插座 118型 Q系列 一插 多功能三孔  实体店 正品特价     31.4    31.4    http://img02.taobaocdn.com/bao/uploaded/i6/T1bjNLXfXwXXchaEs__081007.jpg    安顺国际2008    1.36040206E8    0.0 0001-01-01 00:00:00.0
7.561175709E9   1.0 33.0    0.0 0.0 3.4260183E7 2311.0  23.0    0.0 1.0 0001-01-01 00:00:00.0   0.0 1.0 0001-01-01 00:00:00.0   33.0    2013-04-09 16:23:01.0   蔡枫华 IQ成熟时 LP黑胶唱片    33.0    33.0    http://img02.taobaocdn.com/bao/uploaded/i2/T1Bu0mXn0LXXaAYSna_092246.jpg    null    0.0 0.0 0001-01-01 00:00:00.0
7.561175086E9   1.0 25.0    -50.0   29534.0 58040.0 5.0011981E7 1801.0  31.0    0.0 2013-07-10 16:48:21.0   0.0 1.0 2014-01-28 21:20:42.0   25.0    2014-01-28 21:20:42.0   甜唇制皂-- 蚕丝蛋白美白睡眠面膜 免洗 50G    25.0    25.0    http://img04.taobaocdn.com/bao/uploaded/i4/T1wYWZXddlXXcfDLs0_035200.jpg    温室效应    58040.0 0.0 0001-01-01 00:00:00.0
Time taken: 529.649 seconds, Fetched: 10 row(s)

I had expected ( select * from taobao_item_info limit 10 ) to just grab 10 arbitrary rows, yet wrapped in the outer ORDER BY, Hive still launched MapReduce jobs for it, which struck me as odd. Run on its own, the LIMIT query creates no job at all.
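A plausible explanation (my assumption, not verified on this cluster): with fetch-task conversion enabled via hive.fetch.task.conversion (the Hive 2 default is "more"), Hive answers a bare SELECT * ... LIMIT by reading HDFS directly, with no job; the outer ORDER BY cannot be satisfied by a fetch task, so MapReduce jobs are launched. The snippet only prints the statements one would try in the CLI:

```shell
# Printed for reference, not executed against a cluster.
HQL='set hive.fetch.task.conversion=more;
select * from taobao_item_info limit 10;
select * from (select * from taobao_item_info limit 10) a order by fk_shop_id desc;'
printf '%s\n' "$HQL"
```

The first SELECT should come back as a plain fetch; only the second should show a "Launching Job" line.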

Error log

hive> SELECT * FROM ( SELECT FK_category_id, FK_BRAND_ID, FK_SHOP_ID, SUM (price), AVG (price), COUNT (0) AS count_s FROM ( SELECT * FROM taobao_item_info ORDER BY FK_SHOP_ID ASC limit 10000000  ) A GROUP BY FK_category_id, FK_BRAND_ID, FK_SHOP_ID ) b WHERE FK_SHOP_ID >= 33979700 AND FK_SHOP_ID < 33979800 ; 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160310090858_e8aa927c-ea9c-4b14-8305-35920c7491eb
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0013, Tracking URL = http://Master:8088/proxy/application_1457575610003_0013/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0013
Hadoop job information for Stage-1: number of mappers: 50; number of reducers: 1
2016-03-11 07:13:14,139 Stage-1 map = 0%,  reduce = 0%
2016-03-11 07:14:15,390 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 100.93 sec
2016-03-11 07:15:16,213 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 881.08 sec
2016-03-11 07:16:16,936 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 1016.38 sec
....
2016-03-11 08:32:50,636 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 3623.28 sec
2016-03-11 08:32:52,719 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 2521.09 sec
2016-03-11 08:33:08,713 Stage-1 map = 76%,  reduce = 11%, Cumulative CPU 2540.45 sec
2016-03-11 08:33:14,579 Stage-1 map = 76%,  reduce = 16%, Cumulative CPU 2554.78 sec
2016-03-11 08:33:16,766 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2545.08 sec
MapReduce Total cumulative CPU time: 42 minutes 25 seconds 80 msec
Ended Job = job_1457575610003_0013 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1457575610003_0013_m_000033 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000010 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000031 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000044 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000014 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000048 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000020 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000048 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000015 (and more) from job job_1457575610003_0013
Examining task ID: task_1457575610003_0013_m_000014 (and more) from job job_1457575610003_0013

Task with the most failures(8): 
-----
Task ID:
  task_1457575610003_0013_m_000014

URL:
  http://Master:8088/taskdetails.jsp?jobid=job_1457575610003_0013&tipid=task_1457575610003_0013_m_000014
-----
Diagnostic Messages for this Task:
Container launch failed for container_1457575610003_0013_01_000114 : java.lang.IllegalArgumentException: java.net.UnknownHostException: Slave42
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:353)
    at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:231)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:266)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
    at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: Slave42
    ... 12 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 50  Reduce: 1   Cumulative CPU: 2545.08 sec   HDFS Read: 10274437068 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 42 minutes 25 seconds 80 msec

java.lang.IllegalArgumentException: java.net.UnknownHostException: Slave42

The error looks like a failure to reach one of my worker nodes. Judging from the stack trace, YARN tried to launch a container on Slave42, a hostname that is not in the cluster list above and cannot be resolved, so a stale entry in the slaves file or a missing /etc/hosts mapping seems the likely cause. I searched for quite a while without pinning it down, so I am setting it aside for now.
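A quick way to confirm the suspicion is to check the failing hostname against the cluster's host list. The snippet below uses a placeholder copy of the host entries from the environment section rather than reading the live /etc/hosts:

```shell
# Placeholder copy of the cluster host list from the environment section.
HOSTS='192.168.31.30 Master
192.168.31.34 Slave34
192.168.31.35 Slave35
192.168.31.36 Slave36
192.168.31.37 Slave37
192.168.31.38 Slave38
192.168.31.39 Slave39'
# YARN tried to place a container on Slave42, which is not listed:
echo "$HOSTS" | grep -q 'Slave42' || echo 'Slave42 is not a known host'
```

On the real cluster the equivalent checks would be against /etc/hosts on the master and the Hadoop slaves file.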

This statement only succeeded on my third attempt.

hive> SELECT * FROM ( SELECT FK_category_id, FK_BRAND_ID, FK_SHOP_ID, SUM (price), AVG (price), COUNT (0) AS count_s FROM ( SELECT * FROM taobao_item_info ORDER BY FK_SHOP_ID ASC limit 10000000  ) A GROUP BY FK_category_id, FK_BRAND_ID, FK_SHOP_ID ) b WHERE FK_SHOP_ID >= 33979700 AND FK_SHOP_ID < 33979800 ;

Query ID = hadoop_20160310090858_e8aa927c-ea9c-4b14-8305-35920c7491eb
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0014, Tracking URL = http://Master:8088/proxy/application_1457575610003_0014/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0014
Hadoop job information for Stage-1: number of mappers: 50; number of reducers: 1
2016-03-11 08:45:27,262 Stage-1 map = 0%,  reduce = 0%
2016-03-11 08:46:28,849 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 74.8 sec
2016-03-11 08:47:29,233 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 976.73 sec
....
2016-03-11 09:07:19,752 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 3229.32 sec
2016-03-11 09:07:21,859 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3231.46 sec
MapReduce Total cumulative CPU time: 53 minutes 51 seconds 460 msec
Ended Job = job_1457575610003_0014
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1457575610003_0015, Tracking URL = http://Master:8088/proxy/application_1457575610003_0015/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1457575610003_0015
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2016-03-11 09:07:57,626 Stage-2 map = 0%,  reduce = 0%
2016-03-11 09:08:18,319 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 3.36 sec
2016-03-11 09:08:30,064 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 8.53 sec
MapReduce Total cumulative CPU time: 8 seconds 530 msec
Ended Job = job_1457575610003_0015
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 50  Reduce: 1   Cumulative CPU: 3231.46 sec   HDFS Read: 13406394152 HDFS Write: 7964 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 8.53 sec   HDFS Read: 16602 HDFS Write: 5704 SUCCESS
Total MapReduce CPU Time Spent: 53 minutes 59 seconds 990 msec
OK
-1.0    -1.0    3.3979768E7 8.0 8.0 1
0.0 -1.0    3.397979E7  259.0   259.0   1
1402.0  10752.0 3.3979709E7 9330.25 3110.0833333333335  3
1403.0  10752.0 3.3979709E7 580.0   580.0   1
1403.0  21459.0 3.3979709E7 1950.0  1950.0  1
1403.0  21462.0 3.3979709E7 4580.34 4580.34 1
1403.0  21465.0 3.3979709E7 700.0   700.0   1
1403.0  81147.0 3.3979709E7 2100.02 2100.02 1
140116.0    10752.0 3.3979709E7 500.0   500.0   1
140116.0    21465.0 3.3979709E7 480.0   480.0   1
140116.0    81155.0 3.3979709E7 6750.0  1125.0  6
211104.0    20314.0 3.3979749E7 824.0   412.0   2
.....
5.0050366E7 0.0 3.3979768E7 43.0    6.142857142857143   7
Time taken: 1402.137 seconds, Fetched: 122 row(s)

This query was split into two jobs.

Running heavy workloads like these repeatedly inside the virtual machines can make them stop responding, which is a real nuisance.