How to Build an Inverted Index With MapReduce

MapReduce is a programming tool for analyzing large volumes of data.

MapReduce is a parallel programming model, developed at Google, for processing large data sets. Instead of working through the data in sequential order, it splits the input into chunks and processes them at the same time on many machines. A map function turns each input record into intermediate key/value pairs. The framework then groups those pairs by key and hands each key, with all of its values, to a reduce function that combines them into a result. The two functions give the model its name. An inverted index, the data structure behind keyword search, maps each word to the documents that contain it. To build one with MapReduce, the map function emits a (word, document) pair for every word it reads, and the reduce function collects all the documents for each word into a single list.
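The flow described above can be sketched in plain Java, without Hadoop, to show what each phase produces. This is a single-process illustration, not how Hadoop actually runs a job. The class name LocalInvertedIndex, the sample documents, and the use of a TreeMap to stand in for Hadoop's shuffle-and-sort phase are all hypothetical choices made for the example. It needs Java 9 or later for Map.entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process sketch of the map -> shuffle -> reduce flow.
// It does not use Hadoop; it only mimics the data flow in memory.
public class LocalInvertedIndex {

    // Map phase: emit one (word, documentName) pair per token,
    // lowercased and split on whitespace.
    static List<Map.Entry<String, String>> map(String docName, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(text.toLowerCase());
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), docName));
        }
        return pairs;
    }

    // Shuffle phase: group every value under its key. The TreeMap keeps
    // keys sorted, similar to the sorted keys a Hadoop reducer receives.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: join all document names for one word into one string.
    // The word parameter is unused here but mirrors the (key, values) shape.
    static String reduce(String word, List<String> docs) {
        return String.join(", ", docs);
    }

    // Runs all three phases over a map of document name -> document text.
    public static Map<String, String> buildIndex(Map<String, String> documents) {
        List<Map.Entry<String, String>> intermediate = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            intermediate.addAll(map(doc.getKey(), doc.getValue()));
        }
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> g : shuffle(intermediate).entrySet()) {
            index.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "MapReduce builds an index");
        docs.put("b.txt", "an inverted index");
        // Print each word, a tab, then the files that contain it.
        buildIndex(docs).forEach((word, files) -> System.out.println(word + "\t" + files));
    }
}
```

With the two sample documents, "index" maps to both a.txt and b.txt, while "inverted" maps only to b.txt. Hadoop does not guarantee the order in which a reducer sees its values, so on a real cluster the file names for a word may come out in a different order.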

    Difficulty:
    Moderate

    Instructions

      • 1

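        Type the following imports and map function. The imports cover the classes used in all three steps, which come from Hadoop's older org.apache.hadoop.mapred API:

        import java.io.IOException;
        import java.util.Iterator;
        import java.util.StringTokenizer;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.*;

        // Emits a (word, file name) pair for every word in every input line.
        public static class InvertedIndexerMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text>
        {
            // Reused across calls to avoid allocating new objects per token.
            private final Text word = new Text();
            private final Text location = new Text();

            // key is the line's byte offset in the file and is not used.
            public void map(LongWritable key, Text val,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException
            {
                // The name of the file this line came from is the "location".
                FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
                String fileName = fileSplit.getPath().getName();
                location.set(fileName);

                // Split the line on whitespace and emit each lowercased word.
                String line = val.toString();
                StringTokenizer itr = new StringTokenizer(line.toLowerCase());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    output.collect(word, location);
                }
            }
        }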

      • 2

        Type the following code for the reduce function:

        // Joins every file name emitted for a word into one comma-separated list.
        public static class InvertedIndexerReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text>
        {
            public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException
            {
                boolean first = true;
                StringBuilder toReturn = new StringBuilder();
                while (values.hasNext()) {
                    // Separate entries with ", " (no separator before the first).
                    if (!first) {
                        toReturn.append(", ");
                    }
                    first = false;
                    toReturn.append(values.next().toString());
                }
                output.collect(key, new Text(toReturn.toString()));
            }
        }

      • 3

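        Type the following main method, which configures the job and submits it to Hadoop. Put the code from all three steps inside a single public class named InvertedIndex (in InvertedIndex.java); the main method refers to that class when it creates the JobConf:

        public static void main(String[] args) throws IOException
        {
            if (args.length < 2) {
                System.out.println("Usage: InvertedIndex <input path> <output path>");
                System.exit(1);
            }
            // JobConf uses this class to locate the jar that contains the job.
            JobConf conf = new JobConf(InvertedIndex.class);
            conf.setJobName("InvertedIndex");

            // Final output types; because the map output types are not set
            // separately, Hadoop uses these for the mapper's output as well.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            conf.setMapperClass(InvertedIndexerMapper.class);
            conf.setReducerClass(InvertedIndexerReducer.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            try {
                JobClient.runJob(conf);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }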
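Two behaviors of the job above are easy to miss. StringTokenizer splits only on whitespace, so "index" and "index," are indexed as different words. The reducer also appends one file name per occurrence, so a word that appears three times in a file lists that file three times. The plain-Java sketch below shows one possible way to handle both. The class IndexRefinements and its methods are hypothetical, and the logic would still have to be moved into the Hadoop mapper and reducer to change the job's output. It needs Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helpers illustrating two optional refinements; they are not
// part of the Hadoop job. Each could be moved into the map or reduce method.
public class IndexRefinements {

    // Tokenize more strictly than StringTokenizer: lowercase the line and
    // split on any run of characters that are not letters or digits,
    // so "Index," and "index" become the same word.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split() yields "" when the line starts with a separator
                words.add(token);
            }
        }
        return words;
    }

    // Join the locations for one word, dropping duplicates and sorting them,
    // instead of appending one entry per occurrence.
    static String joinUnique(Iterator<String> locations) {
        TreeSet<String> unique = new TreeSet<>();
        while (locations.hasNext()) {
            unique.add(locations.next());
        }
        return String.join(", ", unique);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("An index, an INDEX!"));
        System.out.println(joinUnique(List.of("b.txt", "a.txt", "b.txt").iterator()));
    }
}
```

If you move joinUnique into the reducer, convert each value with toString() before storing it, because Hadoop can reuse the same Text object for every value the iterator returns.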

    • Photo Credit Comstock/Comstock/Getty Images
