
Spring Hadoop Quickstart

Since SpringSource announced spring-hadoop (http://blog.springsource.org/2012/02/29/introducing-spring-hadoop/), let's run through a quick practice. This is definitely not the canonical way to use spring-hadoop: I made some changes because I ran into dependency and IPC issues.

  • Prerequisite: Hadoop 0.20.2+

  If you don't have a Hadoop environment yet, please refer to this document to install one: http://trac.nchc.org.tw/cloud/wiki/Hadoop_Lab1
  Of course, you can also get a free one from NCHC: http://hadoop.nchc.org.tw/

  • Step1. Get spring-hadoop. You can fetch it with git or download it from the website.

  /home/evanshsu/springhadoop git init
  /home/evanshsu/springhadoop git pull "git://github.com/SpringSource/spring-hadoop.git"

  • Step2. Build spring-hadoop.jar.

  /home/evanshsu/springhadoop ./gradlew jar
  /home/evanshsu/springhadoop mkdir lib
  /home/evanshsu/springhadoop cp build/libs/spring-data-hadoop-1.0.0.BUILD-SNAPSHOT.jar lib/

  • Step3. Get spring-framework.

  /home/evanshsu/spring wget "http://s3.amazonaws.com/dist.springframework.org/release/SPR/spring-framework-3.1.1.RELEASE.zip"
  /home/evanshsu/spring unzip spring-framework-3.1.1.RELEASE.zip
  /home/evanshsu/spring cp spring-framework-3.1.1.RELEASE/dist/*.jar /home/evanshsu/springhadoop/lib/

  • Step4. Change the build file. We will assemble all jars into a single jar file.

  /home/evanshsu/spring/samples/wordcount vim build.gradle

  description = 'Spring Hadoop Samples - WordCount'

  apply plugin: 'base'
  apply plugin: 'java'
  apply plugin: 'idea'
  apply plugin: 'eclipse'

  repositories {
      flatDir(dirs: '/home/evanshsu/springhadoop/lib/')
      // Public Spring artefacts
      maven { url "http://repo.springsource.org/libs-release" }
      maven { url "http://repo.springsource.org/libs-milestone" }
      maven { url "http://repo.springsource.org/libs-snapshot" }
  }

  dependencies {
      compile fileTree('/home/evanshsu/springhadoop/lib/')
      compile "org.apache.hadoop:hadoop-examples:$hadoopVersion"
      // see HADOOP-7461
      runtime "org.codehaus.jackson:jackson-mapper-asl:$jacksonVersion"
      testCompile "junit:junit:$junitVersion"
      testCompile "org.springframework:spring-test:$springVersion"
  }

  jar {
      from configurations.compile.collect {
          it.isDirectory() ? it : zipTree(it).matching {
              exclude 'META-INF/spring.schemas'
              exclude 'META-INF/spring.handlers'
          }
      }
  }

  • Step5. Change the HDFS hostname and the wordcount paths (wordcount.input.path, wordcount.output.path, hd.fs). If you use NCHC, set hd.fs to hd.fs=hdfs://gm2.nchc.org.tw:8020

  /home/evanshsu/spring/samples/wordcount vim src/main/resources/hadoop.properties

  wordcount.input.path=/user/evanshsu/input.txt
  wordcount.output.path=/user/evanshsu/output
  hive.host=localhost
  hive.port=12345
  hive.url=jdbc:hive://${hive.host}:${hive.port}
  hd.fs=hdfs://localhost:9000
  mapred.job.tracker=localhost:9001
  path.cat=bin${file.separator}stream-bin${file.separator}cat
  path.wc=bin${file.separator}stream-bin${file.separator}wc
  input.directory=logs
  log.input=/logs/input/
  log.output=/logs/output/
  distcp.src=${hd.fs}/distcp/source.txt
  distcp.dst=${hd.fs}/distcp/dst
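  Note that the ${...} tokens in this file are resolved against the other keys by Spring's property placeholder support, so hive.url above ends up as jdbc:hive://localhost:12345. As a rough standalone illustration of that idea (a single-pass sketch, not Spring's actual implementation):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    private static final Pattern P = Pattern.compile("\\$\\{([^}]+)}");

    // Replace each ${key} with its value from the map (one pass; assumes no nested placeholders).
    static String resolve(String value, Map<String, String> props) {
        Matcher m = P.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Unknown keys are left as-is, like an unresolved placeholder.
            String replacement = props.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("hive.host", "localhost", "hive.port", "12345");
        System.out.println(resolve("jdbc:hive://${hive.host}:${hive.port}", props));
        // → jdbc:hive://localhost:12345
    }
}
```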

  • Step6. Set up the Spring context. This is the most important part of spring-hadoop.

  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring/context.xml

  <?xml version="1.0" encoding="UTF-8"?>
  <beans xmlns="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:context="http://www.springframework.org/schema/context"
      xmlns:hdp="http://www.springframework.org/schema/hadoop"
      xmlns:p="http://www.springframework.org/schema/p"
      xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
      http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
      http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

      <context:property-placeholder location="hadoop.properties"/>

      <hdp:configuration>
          fs.default.name=${hd.fs}
      </hdp:configuration>

      <hdp:job id="wordcount-job"
          input-path="${wordcount.input.path}" output-path="${wordcount.output.path}"
          mapper="org.springframework.data.hadoop.samples.wordcount.WordCountMapper"
          reducer="org.springframework.data.hadoop.samples.wordcount.WordCountReducer"
          jar-by-class="org.springframework.data.hadoop.samples.wordcount.WordCountMapper" />

      <hdp:script id="setup-script" language="javascript">
          // 'hack' default permissions to make Hadoop work on Windows
          if (java.lang.System.getProperty("os.name").startsWith("Windows")) {
              // 0655 = -rwxr-xr-x
              org.apache.hadoop.mapreduce.JobSubmissionFiles.JOB_DIR_PERMISSION.fromShort(0655)
              org.apache.hadoop.mapreduce.JobSubmissionFiles.JOB_FILE_PERMISSION.fromShort(0655)
          }

          inputPath = "${wordcount.input.path}"
          outputPath = "${wordcount.output.path}"

          if (fsh.test(inputPath)) { fsh.rmr(inputPath) }
          if (fsh.test(outputPath)) { fsh.rmr(outputPath) }

          // copy using the streams directly (to be portable across envs)
          inStream = cl.getResourceAsStream("data/nietzsche-chapter-1.txt")
          org.apache.hadoop.io.IOUtils.copyBytes(inStream, fs.create(inputPath), cfg)
      </hdp:script>
  </beans>

  • Step7. Add your own mapper and reducer.

  /home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountMapper.java

  package org.springframework.data.hadoop.samples.wordcount;

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, one);
          }
      }
  }

  /home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountReducer.java

  package org.springframework.data.hadoop.samples.wordcount;

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class WordCountReducer extends
          Reducer<Text, IntWritable, Text, IntWritable> {

      private IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
              sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
      }
  }
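  The mapper simply tokenizes each line with StringTokenizer and emits (word, 1), and the reducer sums the counts per word. That core logic can be sanity-checked off-cluster with plain Java before packaging (a standalone sketch using ordinary collections instead of Hadoop's Writable types):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountCheck {
    // Mirrors the mapper (tokenize) + reducer (sum) pipeline on a plain String.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer the mapper uses
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum); // the reducer's summing step
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
        // → {to=2, be=2, or=1, not=1}
    }
}
```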

  • Step8. Add spring.schemas and spring.handlers. Step4's jar task excluded each library's own copies of these files (so they would not overwrite one another inside the merged jar), so we supply combined versions by hand.

  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.schemas

  http\://www.springframework.org/schema/context/spring-context.xsd=org/springframework/context/config/spring-context-3.1.xsd
  http\://www.springframework.org/schema/hadoop/spring-hadoop.xsd=/org/springframework/data/hadoop/config/spring-hadoop-1.0.xsd

  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.handlers

  http\://www.springframework.org/schema/p=org.springframework.beans.factory.xml.SimplePropertyNamespaceHandler
  http\://www.springframework.org/schema/context=org.springframework.context.config.ContextNamespaceHandler
  http\://www.springframework.org/schema/hadoop=org.springframework.data.hadoop.config.HadoopNamespaceHandler

  • Step9. Build and run.

  /home/evanshsu/spring/samples/wordcount ../../gradlew jar
  /home/evanshsu/spring/samples/wordcount hadoop jar build/libs/wordcount-1.0.0.M1.jar org.springframework.data.hadoop.samples.wordcount.Main

  • Step10. Confirm it works.

  /home/evanshsu/spring/samples/wordcount hadoop fs -cat /user/evanshsu/output/*