



[Apache (Jakarta)] Hadoop Learning (1)
Category: Software Technology

Posted by lhwork on 2006/12/13 15:26:12

My demo, Statistic.java:

1. Initialize the configuration, a temporary directory for intermediate files, and the Job itself.

        Configuration defaults = new Configuration();
        File tempDir = new File("tmp/stat-temp-" + Integer.toString(
                new Random().nextInt(Integer.MAX_VALUE)));
        JobConf statJob = new JobConf(defaults, Statistic.class);

2. Set the Job's parameters.

        statJob.setJobName("StatTestJob");
        statJob.setInputDir(new File("tmp/input/"));
        statJob.setMapperClass(StatMapper.class);
        statJob.setReducerClass(StatReducer.class);
        statJob.setOutputDir(tempDir);

3. Run the Job, then delete the temporary files.

        JobClient.runJob(statJob);
        new JobClient(defaults).getFs().delete(tempDir);

4. Running the demo prints the following (repeated configuration-parsing lines of the same form trimmed):

        060328 151414 parsing jar:file:/E:/workground/opensource/hadoop-nightly/hadoop-nightly.jar!/hadoop-default.xml
        060328 151414 parsing file:/E:/workground/opensource/hadoop-nightly/bin/mapred-default.xml
        060328 151414 parsing file:/E:/workground/opensource/hadoop-nightly/bin/hadoop-site.xml
        Key: 0, Value: For the latest information about Hadoop, please visit our website at:
        Key: 70, Value:
        Key: 71, Value:    http://lucene.apache.org/hadoop/
        Key: 107, Value:
        Key: 108, Value: and our wiki, at:
        Key: 126, Value:
        Key: 127, Value:    http://wiki.apache.org/hadoop/
        060328 151414 parsing build\test\mapred\local\job_lck1iq.xml\localRunner
        060328 151414 Running job: job_lck1iq
        060328 151414 E:\workground\opensource\hadoop-nightly\tmp\input\README.txt:0+161
        060328 151415 reduce > reduce
        060328 151415  map 100%  reduce 100%
        060328 151415 Job complete: job_lck1iq
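As an aside, the Key/Value pairs in the log above can be reproduced without Hadoop. This is my own library-free Python illustration (not Hadoop's actual implementation): for text input, each record's key is the byte offset where the line starts and the value is the line's content.

```python
def line_records(text):
    """Yield (byte_offset, line) pairs the way a text input format would."""
    offset = 0
    for line in text.splitlines(True):  # keepends=True, so offsets advance past '\n'
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))

# First three lines of the README.txt shown in the log above.
sample = ("For the latest information about Hadoop, please visit our website at:\n"
          "\n"
          "   http://lucene.apache.org/hadoop/\n")
for key, value in line_records(sample):
    print("Key: %d, Value: %s" % (key, value))
```

The printed offsets (0, 70, 71) match the keys in the real job log, which is a good sign the framework is simply counting bytes.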
5. Analyze the output. Hadoop starts by loading a pile of configuration files; ignore that for now. The program then parses README.txt under tmp/input/ and calls my StatMapper.map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter), printing each key and value. Evidently the key is the current byte offset into the file, and the value is the content of the current line. More xml-parsing log lines show up in between, presumably because the framework starts several threads to work in parallel for efficiency. Because my StatMapper only prints the key-value pair and does nothing else, the reduce step is skipped.

6. Get reduce to run; that means tinkering with StatMapper. StatMapper.java:

    public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter)
            throws IOException
    {
        String tokenLength = String.valueOf(value.toString().split(" ").length);
        output.collect(new UTF8(tokenLength), new LongWritable(1));
    }

With each line's word count as the key and 1 as the value handed to output.collect(), this should give a frequency count of words-per-line over the file.

7. Statistic.java needs matching changes:

        statJob.setOutputDir(tempDir);
        statJob.setOutputFormat(SequenceFileOutputFormat.class);
        statJob.setOutputKeyClass(UTF8.class);
        statJob.setOutputValueClass(LongWritable.class);

8. As does StatReducer.java:

    public void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
            throws IOException
    {
        long sum = 0;
        while (values.hasNext())
        {
            sum += ((LongWritable) values.next()).get();
        }
        System.out.println("Length: " + key + ", Count: " + sum);
        output.collect(key, new LongWritable(sum));
    }
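The modified job's logic can be checked without Hadoop at all. Below is a minimal in-memory stand-in (my own sketch; stat_map, stat_reduce, and run_job are hypothetical names, not the Hadoop API) that maps each line to (words-per-line, 1), groups by key as the shuffle would, and sums in reduce:

```python
from collections import defaultdict

def stat_map(line):
    # Mirrors StatMapper: the number of space-separated tokens becomes the key.
    token_length = len(line.split(" "))
    return (str(token_length), 1)

def stat_reduce(key, values):
    # Mirrors StatReducer: sum the 1s collected for one key.
    return (key, sum(values))

def run_job(lines):
    # Stand-in for JobClient.runJob: map, shuffle (group by key), reduce.
    groups = defaultdict(list)
    for line in lines:
        key, value = stat_map(line)
        groups[key].append(value)
    return dict(stat_reduce(k, vs) for k, vs in sorted(groups.items()))

for length, count in run_job(["a b c", "hello world", "one", "x y z"]).items():
    print("Length: %s, Count: %d" % (length, count))
```

Running it prints one "Length: …, Count: …" line per distinct word count, in the same format the real reducer prints below.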
9. Put a pile of .java files into the input directory and run it again.

        (countless xml-parsing log lines omitted)
        Length: 0, Count: 359
        Length: 1, Count: 3474
        Length: 10, Count: 1113
        Length: 11, Count: 1686
        Length: 12, Count: 1070
        Length: 13, Count: 1725
        Length: 14, Count: 773
        Length: 15, Count: 707
        Length: 16, Count: 490
        Length: 17, Count: 787
        Length: 18, Count: 348
        Length: 19, Count: 303
        Length: 2, Count: 1543
        Length: 20, Count: 227
        Length: 21, Count: 421
        Length: 22, Count: 155
        Length: 23, Count: 143
        Length: 24, Count: 109
        Length: 25, Count: 219
        Length: 26, Count: 83
        Length: 27, Count: 70
        Length: 28, Count: 55
        Length: 29, Count: 107
        Length: 3, Count: 681
        Length: 30, Count: 53
        Length: 31, Count: 43
        Length: 32, Count: 38
        Length: 33, Count: 66
        Length: 34, Count: 36
        Length: 35, Count: 26
        Length: 36, Count: 42
        Length: 37, Count: 52
        Length: 38, Count: 32
        Length: 39, Count: 33
        Length: 4, Count: 236
        Length: 40, Count: 17
        Length: 41, Count: 40
        Length: 42, Count: 15
        Length: 43, Count: 23
        Length: 44, Count: 14
        Length: 45, Count: 27
        Length: 46, Count: 15
        Length: 47, Count: 30
        Length: 48, Count: 2
        Length: 49, Count: 18
        Length: 5, Count: 1940
        Length: 50, Count: 8
        Length: 51, Count: 11
        Length: 52, Count: 2
        Length: 53, Count: 5
        Length: 54, Count: 2
        Length: 55, Count: 1
        Length: 57, Count: 4
        Length: 58, Count: 1
        Length: 59, Count: 3
        Length: 6, Count: 1192
        Length: 60, Count: 1
        Length: 61, Count: 4
        Length: 62, Count: 1
        Length: 63, Count: 3
        Length: 66, Count: 1
        Length: 7, Count: 1382
        Length: 8, Count: 1088
        Length: 9, Count: 2151
        060328 154741 reduce > reduce
        060328 154741  map 100%  reduce 100%
        060328 154741 Job complete: job_q618hy

Cool, the counts came out. But what do these numbers actually mean... Looks like it's time for a part (2) that tinkers with something more meaningful.

