The project linked at the bottom of this article provides a simple running example of an Eclipse Hadoop 0.23.1 MapReduce project. It includes
You'll need the following software to use this project:
The project uses a Maven pom.xml file to define the Hadoop MapReduce dependencies, and an Ant build.xml file that automates jar building and output-directory cleanup. These replace the manual chores of creating jar files and deleting output directories between debug cycles.
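As a sketch, a Hadoop 0.23.1 MapReduce dependency in the pom.xml looks something like the following (the artifact id is an assumption; the project's actual pom.xml may pull in different Hadoop modules):

```xml
<!-- Client-side Hadoop MapReduce dependency; version matches the article,
     artifact id is assumed -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>0.23.1</version>
</dependency>
```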
The HadoopMapReduceTemplate class in this project includes a single line of code that causes the MapReduce job to run in "local mode". Local mode makes the Eclipse debugger usable for the map and reduce classes; in non-local mode the map and reduce code runs in a separate container, even on the local machine, so the debugger cannot attach to it. Before deploying any project based on this template, you should remove the following line.
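The template's exact line is not reproduced here, but in Hadoop 0.23 a job is typically switched to local mode by setting the `mapreduce.framework.name` property to `local`, for example with `conf.set("mapreduce.framework.name", "local")` in the run method (an assumption about this template's code). The equivalent mapred-site.xml fragment is:

```xml
<!-- Run MapReduce jobs in-process, so a debugger can attach;
     remove before deploying to a real cluster -->
<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>
</property>
```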
The HadoopMapReduceTemplate class includes a simple job that uses TextInputFormat to map over the lines in the input/test.txt file. The reduce step writes those lines back out with each line's starting byte offset prepended. The run method in this class includes all of the configuration setup the job needs, though it's overkill for this simple example.
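TextInputFormat keys each record by the byte offset at which the line begins in the file, which is where the prepended numbers come from. A Hadoop-free sketch of that keying (the class name is illustrative, not part of the template):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteOffsetDemo {
    // Mimic TextInputFormat: pair each line with the byte offset where it starts.
    static List<String> offsetLines(String text) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.add(offset + "\t" + line);
            // advance past the line's bytes plus its newline separator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : offsetLines("first line\nsecond line\nthird")) {
            System.out.println(s);
        }
    }
}
```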
Hadoop's GenericOptionsParser is used to parse out Hadoop-specific command line arguments. This allows the job to be configured from the command line, which is a best practice for production Hadoop jobs.
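GenericOptionsParser pulls options such as `-D property=value` out of the command line, applies them to the job Configuration, and hands the remaining arguments back to the application. A simplified, Hadoop-free sketch of that behavior (not the real parser, which handles many more options):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DashDParser {
    // Stands in for the job Configuration that GenericOptionsParser populates.
    static Map<String, String> conf = new HashMap<>();

    // Consume "-D key=value" pairs into conf; pass everything else through.
    static String[] parse(String[] args) {
        conf.clear();
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }
}
```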
Both the ExampleMapper and ExampleReducer classes override the setup and cleanup methods, which are the recommended places to instantiate reusable Writable instances and to open and close outside connections.
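The idea behind the pattern is to allocate reusable objects once per task in setup and release resources in cleanup, rather than constructing new objects for every record in map. A Hadoop-free sketch of the pattern (class and method names are illustrative, not the template's actual code; a StringBuilder stands in for a reusable Writable):

```java
public class SetupCleanupPattern {
    // Stands in for a reusable Writable: allocated once, mutated per record.
    private StringBuilder reusableValue;
    public int allocations = 0;

    // setup(): create reusable instances and open outside connections, once per task.
    public void setup() {
        reusableValue = new StringBuilder();
        allocations++;
    }

    // map(): reset and refill the reusable instance instead of allocating a new one.
    public String map(String record) {
        reusableValue.setLength(0);
        reusableValue.append(record.length()).append('\t').append(record);
        return reusableValue.toString();
    }

    // cleanup(): close connections and release resources, once per task.
    public void cleanup() {
        reusableValue = null;
    }
}
```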