[Hadoop in Action] Chapter 6: Programming Practices
- Tricks of the trade for developing Hadoop programs
- Debugging programs in local, pseudo-distributed, and fully distributed modes
- Sanity checking and regression testing of program output
- Logging and monitoring
- Performance tuning
- Sanity checking
- Regression testing
- Consider using long rather than int
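The listing below is a complete old-API MapReduce job, AveragingWithCombiner, which computes the average number of claims per country from a comma-separated input file. It illustrates two of the practices above: the mapper uses Reporter counters (ClaimsCounters.MISSING and QUOTED) to tally malformed records instead of failing on them, and the driver follows the Configured/Tool/ToolRunner pattern so generic command-line options are handled for free.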
```java
import java.io.IOException;
import java.util.regex.PatternSyntaxException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AveragingWithCombiner extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

        static enum ClaimsCounters { MISSING, QUOTED };

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {

            String fields[] = value.toString().split(",", -20);
            String country = fields[4];
            String numClaims = fields[8];
            if (numClaims.length() == 0) {
                reporter.incrCounter(ClaimsCounters.MISSING, 1);
            } else if (numClaims.startsWith("\"")) {
                reporter.incrCounter(ClaimsCounters.QUOTED, 1);
            } else {
                output.collect(new Text(country), new Text(numClaims + ",1"));
            }
        }
    }

    public static class Combine extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {

            double sum = 0;
            int count = 0;
            while (values.hasNext()) {
                String fields[] = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            output.collect(key, new Text(sum + "," + count));
        }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, DoubleWritable> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, DoubleWritable> output,
                           Reporter reporter) throws IOException {

            double sum = 0;
            int count = 0;
            while (values.hasNext()) {
                String fields[] = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            output.collect(key, new DoubleWritable(sum / count));
        }
    }

    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, AveragingWithCombiner.class);

        // Process custom command-line options
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        // Specify various job-specific parameters
        job.setJobName("AveragingWithCombiner");
        job.setMapperClass(MapClass.class);
        job.setCombinerClass(Combine.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Submit the job, then poll for progress until the job is complete
        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new AveragingWithCombiner(), args);

        System.exit(res);
    }
}
```
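The counters incremented through Reporter show up in the JobTracker web UI and in the job's console output, but they can also be read back in the driver. A minimal sketch, not part of the original listing: if run() kept the RunningJob handle returned by JobClient.runJob(), it could print the two ClaimsCounters after the job completes.

```java
// Sketch only: replaces the bare JobClient.runJob(job) call in run() above.
RunningJob running = JobClient.runJob(job);   // blocks until the job is complete
Counters counters = running.getCounters();
long missing = counters.getCounter(MapClass.ClaimsCounters.MISSING);
long quoted = counters.getCounter(MapClass.ClaimsCounters.QUOTED);
System.out.println("records with missing claim count: " + missing);
System.out.println("records with quoted claim count:  " + quoted);
```

Both RunningJob and Counters live in org.apache.hadoop.mapred, so the wildcard import in the listing already covers them.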
| SkipBadRecords method | JobConf property |
| --- | --- |
| setAttemptsToStartSkipping() | mapred.skip.attempts.to.start.skipping |
| setMapperMaxSkipRecords() | mapred.skip.map.max.skip.records |
| setReducerMaxSkipGroups() | mapred.skip.reduce.max.skip.groups |
| setSkipOutputPath() | mapred.skip.out.dir |
| setAutoIncrMapperProcCount() | mapred.skip.map.auto.incr.proc.count |
| setAutoIncrReducerProcCount() | mapred.skip.reduce.auto.incr.proc.count |
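Each of these properties is normally set through the corresponding static setter on org.apache.hadoop.mapred.SkipBadRecords rather than by property name. A minimal sketch of a helper that could be applied to the JobConf built in AveragingWithCombiner.run(); the class name and the threshold values are illustrative, not values prescribed by the book:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingConfig {
    // Hypothetical helper: enable skipping of bad records on a JobConf.
    public static void enableSkipping(JobConf job) {
        // Enter skipping mode after 2 failed attempts of the same task
        SkipBadRecords.setAttemptsToStartSkipping(job, 2);
        // Skip at most 1 bad record per map task and 1 bad key group per reduce task
        SkipBadRecords.setMapperMaxSkipRecords(job, 1);
        SkipBadRecords.setReducerMaxSkipGroups(job, 1);
        // Write the skipped records to HDFS for later inspection (example path)
        SkipBadRecords.setSkipOutputPath(job, new Path("/tmp/skipped-records"));
    }
}
```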
| Property | Description |
| --- | --- |
| mapred.compress.map.output | Boolean property; whether the mapper output is compressed |
| mapred.map.output.compression.codec | Class property; which CompressionCodec is used to compress the mapper output |
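JobConf also exposes typed setters for these two properties. A minimal sketch, assuming GzipCodec is an acceptable codec for the intermediate data (the helper class name is illustrative):

```java
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompressionConfig {
    // Hypothetical helper: compress intermediate map output on an old-API JobConf.
    public static void enableMapOutputCompression(JobConf job) {
        // Typed setters provided by JobConf...
        job.setCompressMapOutput(true);
        job.setMapOutputCompressorClass(GzipCodec.class);

        // ...which correspond to setting the raw properties directly:
        // job.setBoolean("mapred.compress.map.output", true);
        // job.setClass("mapred.map.output.compression.codec",
        //              GzipCodec.class, CompressionCodec.class);
    }
}
```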
| Property | Description |
| --- | --- |
| mapred.map.tasks.speculative.execution | Boolean property; whether speculative execution is enabled for map tasks |
| mapred.reduce.tasks.speculative.execution | Boolean property; whether speculative execution is enabled for reduce tasks |
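JobConf likewise has typed setters for these two properties. A minimal sketch of turning speculative execution off, e.g. for tasks that have external side effects (the helper class name is illustrative):

```java
import org.apache.hadoop.mapred.JobConf;

public class SpeculativeExecutionConfig {
    // Hypothetical helper: disable speculative execution for both task types.
    public static void disableSpeculation(JobConf job) {
        // Typed setters on JobConf...
        job.setMapSpeculativeExecution(false);
        job.setReduceSpeculativeExecution(false);

        // ...equivalent to the raw boolean properties:
        // job.setBoolean("mapred.map.tasks.speculative.execution", false);
        // job.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    }
}
```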
Original post: http://www.cnblogs.com/zhengrunjian/p/4994969.html