



[Apache (Jakarta)] Hadoop Learning (1)
Category: Software Technology

Posted by lhwork on 2006/12/13 15:26:12

My demo, Statistic.java:

1. Initialize the configuration, a temporary directory for intermediate files, and the Job itself.

        Configuration defaults = new Configuration();
        File tempDir = new File("tmp/stat-temp-" + Integer.toString(
                new Random().nextInt(Integer.MAX_VALUE)));
        JobConf statJob = new JobConf(defaults, Statistic.class);

2. Set the Job's parameters.

        statJob.setJobName("StatTestJob");
        statJob.setInputDir(new File("tmp/input/"));
        statJob.setMapperClass(StatMapper.class);
        statJob.setReducerClass(StatReducer.class);
        statJob.setOutputDir(tempDir);

3. Run the Job, then delete the temporary files.

        JobClient.runJob(statJob);
        new JobClient(defaults).getFs().delete(tempDir);

4. Running the demo prints the following (repeated configuration-parsing lines of the same form trimmed):

        060328 151414 parsing jar:file:/E:/workground/opensource/hadoop-nightly/hadoop-nightly.jar!/hadoop-default.xml
        060328 151414 parsing file:/E:/workground/opensource/hadoop-nightly/bin/mapred-default.xml
        060328 151414 parsing file:/E:/workground/opensource/hadoop-nightly/bin/hadoop-site.xml
        Key: 0, Value: For the latest information about Hadoop, please visit our website at:
        Key: 70, Value:
        Key: 71, Value:    http://lucene.apache.org/hadoop/
        Key: 107, Value:
        Key: 108, Value: and our wiki, at:
        Key: 126, Value:
        Key: 127, Value:    http://wiki.apache.org/hadoop/
        060328 151414 parsing build\test\mapred\local\job_lck1iq.xml\localRunner
        060328 151414 Running job: job_lck1iq
        060328 151414 E:\workground\opensource\hadoop-nightly\tmp\input\README.txt:0+161
        060328 151415 reduce > reduce
        060328 151415  map 100%  reduce 100%
        060328 151415 Job complete: job_lck1iq
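As an aside, the Key/Value pairs in the log above can be reproduced without Hadoop. This is my own library-free Python illustration (not Hadoop's actual implementation): for text input, each record's key is the byte offset where the line starts and the value is the line's content.

```python
def line_records(text):
    """Yield (byte_offset, line) pairs the way a text input format would."""
    offset = 0
    for line in text.splitlines(True):  # keepends=True, so offsets advance past '\n'
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))

# First three lines of the README.txt shown in the log above.
sample = ("For the latest information about Hadoop, please visit our website at:\n"
          "\n"
          "   http://lucene.apache.org/hadoop/\n")
for key, value in line_records(sample):
    print("Key: %d, Value: %s" % (key, value))
```

The printed offsets (0, 70, 71) match the keys in the real job log, which is a good sign the framework is simply counting bytes.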
5. Analyze the output. Hadoop starts by loading a pile of configuration files; ignore that for now. The program then parses README.txt under tmp/input/ and calls my StatMapper.map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter), printing each key and value. Evidently the key is the current byte offset into the file, and the value is the content of the current line. More xml-parsing log lines show up in between, presumably because the framework starts several threads to work in parallel for efficiency. Because my StatMapper only prints the key-value pair and does nothing else, the reduce step is skipped.

6. Get reduce to run; that means tinkering with StatMapper. StatMapper.java:

    public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
            throws IOException
    {
        String tokenLength = String.valueOf(value.toString().split(" ").length);
        output.collect(new UTF8(tokenLength), new LongWritable(1));
    }

With each line's word count as the key and 1 as the value handed to output.collect(), this should give a frequency count of words-per-line over the file.

7. Statistic.java needs matching changes:

        statJob.setOutputDir(tempDir);
        statJob.setOutputFormat(SequenceFileOutputFormat.class);
        statJob.setOutputKeyClass(UTF8.class);
        statJob.setOutputValueClass(LongWritable.class);

8. As does StatReducer.java:

    public void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
            throws IOException
    {
        long sum = 0;
        while (values.hasNext())
        {
            sum += ((LongWritable) values.next()).get();
        }
        System.out.println("Length: " + key + ", Count: " + sum);
        output.collect(key, new LongWritable(sum));
    }
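The modified job's logic can be checked without Hadoop at all. Below is a minimal in-memory stand-in (my own sketch; stat_map, stat_reduce, and run_job are hypothetical names, not the Hadoop API) that maps each line to (words-per-line, 1), groups by key as the shuffle would, and sums in reduce:

```python
from collections import defaultdict

def stat_map(line):
    # Mirrors StatMapper: the number of space-separated tokens becomes the key.
    token_length = len(line.split(" "))
    return (str(token_length), 1)

def stat_reduce(key, values):
    # Mirrors StatReducer: sum the 1s collected for one key.
    return (key, sum(values))

def run_job(lines):
    # Stand-in for JobClient.runJob: map, shuffle (group by key), reduce.
    groups = defaultdict(list)
    for line in lines:
        key, value = stat_map(line)
        groups[key].append(value)
    return dict(stat_reduce(k, vs) for k, vs in sorted(groups.items()))

for length, count in run_job(["a b c", "hello world", "one", "x y z"]).items():
    print("Length: %s, Count: %d" % (length, count))
```

Running it prints one "Length: …, Count: …" line per distinct word count, in the same format the real reducer prints below.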
9. Put a pile of .java files into the input directory and run it again.

        (countless xml-parsing log lines omitted)
        Length: 0, Count: 359
        Length: 1, Count: 3474
        Length: 10, Count: 1113
        Length: 11, Count: 1686
        Length: 12, Count: 1070
        Length: 13, Count: 1725
        Length: 14, Count: 773
        Length: 15, Count: 707
        Length: 16, Count: 490
        Length: 17, Count: 787
        Length: 18, Count: 348
        Length: 19, Count: 303
        Length: 2, Count: 1543
        Length: 20, Count: 227
        Length: 21, Count: 421
        Length: 22, Count: 155
        Length: 23, Count: 143
        Length: 24, Count: 109
        Length: 25, Count: 219
        Length: 26, Count: 83
        Length: 27, Count: 70
        Length: 28, Count: 55
        Length: 29, Count: 107
        Length: 3, Count: 681
        Length: 30, Count: 53
        Length: 31, Count: 43
        Length: 32, Count: 38
        Length: 33, Count: 66
        Length: 34, Count: 36
        Length: 35, Count: 26
        Length: 36, Count: 42
        Length: 37, Count: 52
        Length: 38, Count: 32
        Length: 39, Count: 33
        Length: 4, Count: 236
        Length: 40, Count: 17
        Length: 41, Count: 40
        Length: 42, Count: 15
        Length: 43, Count: 23
        Length: 44, Count: 14
        Length: 45, Count: 27
        Length: 46, Count: 15
        Length: 47, Count: 30
        Length: 48, Count: 2
        Length: 49, Count: 18
        Length: 5, Count: 1940
        Length: 50, Count: 8
        Length: 51, Count: 11
        Length: 52, Count: 2
        Length: 53, Count: 5
        Length: 54, Count: 2
        Length: 55, Count: 1
        Length: 57, Count: 4
        Length: 58, Count: 1
        Length: 59, Count: 3
        Length: 6, Count: 1192
        Length: 60, Count: 1
        Length: 61, Count: 4
        Length: 62, Count: 1
        Length: 63, Count: 3
        Length: 66, Count: 1
        Length: 7, Count: 1382
        Length: 8, Count: 1088
        Length: 9, Count: 2151
        060328 154741 reduce > reduce
        060328 154741  map 100%  reduce 100%
        060328 154741 Job complete: job_q618hy

Cool, the counts came out. But what do these numbers actually mean... Looks like it's time for a part (2) that tinkers with something more meaningful.

