This post covers the installation of Hadoop 2.4.1 (the latest stable version) as a Single Node Cluster on Ubuntu 14.04. The Single Node Cluster installation is suitable only for beginners' practice. A Multi Node Cluster is more suitable for a production environment, since Hadoop is meant for distributed setups. The Multi Node Cluster installation steps are almost the same as the Single Node Cluster steps, with a few configuration changes. My next post will cover the Multi Node Cluster installation.

The Hadoop installation requires a few prerequisite steps: installing Java and SSH, creating a dedicated Linux user account, disabling IPv6, and generating an SSH key pair for the user account. These prerequisites are needed because Hadoop nodes communicate over the Secure Shell protocol (SSH). IPv6 must be disabled since Hadoop does not support it.

Step 1: Install Oracle Java JDK7 on Ubuntu.
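  • One common way to install Oracle JDK 7 on Ubuntu 14.04 was the WebUpd8 PPA; this is a sketch, and if you install this way your JAVA_HOME is typically /usr/lib/jvm/java-7-oracle rather than the /usr/java path used in Steps 7 and 8.
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java7-installer
  • Verify the installation
    java -version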

Step 2: Install SSH-Server
  • To install the OpenSSH server, execute the following command in a terminal
    sudo apt-get install openssh-server
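  • To confirm the SSH service is running
    sudo service ssh status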
Step 3: Create "hduser" user account under "hadoop" user group.
  • Execute the following commands to create the user group and user account. Provide the required information when creating the account and set the password to "hduser".
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hduser
    sudo adduser hduser sudo
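  • To verify the new account and its group membership
    id hduser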
Step 4: Disable IPv6
  • Open the "/etc/sysctl.conf" file with gedit as sudo user.
    sudo gedit /etc/sysctl.conf
  • Disable IPv6 by adding/modifying the following lines at the end of the file.
    #disable ipv6
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
  • Save & close the file then reboot the machine.
    sudo shutdown -r now
  • Check the configuration
    sudo sysctl -p
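  • You can also confirm IPv6 is really off; the following should print 1
    cat /proc/sys/net/ipv6/conf/all/disable_ipv6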
Step 5: Generate key-gen for hduser.
  • Execute the following commands, pressing Enter to leave the passphrase empty when generating the key pair
    su - hduser
    ssh-keygen -t rsa -P ""
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Now test SSH to localhost
    su - hduser
    ssh localhost
  • If the SSH login above does not work, revisit Steps 4 and 5 and redo them carefully.
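  • If SSH still prompts for a password, wrong permissions on the key files are a common cause; tightening them usually helps
    chmod 700 $HOME/.ssh
    chmod 600 $HOME/.ssh/authorized_keys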

Step 6: Download, Extract and Move Hadoop to hduser Home.
  • Execute the following commands in a terminal to download Hadoop, extract it in the hduser home directory, and rename the extracted folder
    su - hduser
    cd /home/hduser/
    wget http://apache.osuosl.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
    tar -zxvf hadoop-2.4.1.tar.gz
    mv hadoop-2.4.1 hadoop
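  • As a quick sanity check, the extracted directory should contain bin, etc, sbin and share folders, among others
    ls /home/hduser/hadoop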
Step 7: Configure Environment Variable.
  • Open  "$HOME/.bashrc" file with gedit as hduser.
    gedit $HOME/.bashrc
  • Add/Modify the following Environment Variables.
    export HADOOP_PREFIX=/home/hduser/hadoop
    export JAVA_HOME=/usr/java/jdk1.7.0_51
    export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
    
  • Save & Close the file then Execute the bash file:
    exec bash
  • Check the Path environment variable:
    $PATH
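  • With the PATH set, the hadoop command should resolve from any directory and report version 2.4.1
    hadoop version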
Step 8: Configure Hadoop Environment variable
  • Open "/home/hduser/hadoop/etc/hadoop/hadoop-env.sh" file with gedit as hduser.
    gedit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
  • Add/modify the following environment variable
    export JAVA_HOME=/usr/java/jdk1.7.0_51
  • Save & Close the file
Step 9: Configure "core-site.xml" file for working temporary directory and name for File System.
  • Create a temp directory at hduser home
    mkdir /home/hduser/tmp
  • Open "/home/hduser/hadoop/etc/hadoop/core-site.xml" flle with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/core-site.xml
  • Add the following configurations in "core-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hduser/tmp</value>
            <description>Base temporary directories</description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:54310</value>
            <description>Default file system name</description>
        </property>
    </configuration>
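  • Note: fs.default.name still works in Hadoop 2.4.1 but is deprecated; the current name is fs.defaultFS, so an equivalent property would be
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:54310</value>
    </property>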
    
Step 10: Configure "hdfs-site.xml" file for data directory and replication.
  • Create a data directory (the -p flag creates the missing parent directories)
    mkdir -p /home/hduser/tmp/dfs/data
  • Open "/home/hduser/hadoop/etc/hadoop/hdfs-site.xml" flle with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/hdfs-site.xml
  • Add the following configurations in "hdfs-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/home/hduser/tmp/dfs/data</value>
        </property>
    </configuration>
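  • Note: dfs.data.dir is likewise a legacy name; Hadoop 2.x prefers dfs.datanode.data.dir, and both are honored in 2.4.1
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hduser/tmp/dfs/data</value>
    </property>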
Step 11: Configure "mapred-site.xml" file for Host and Port number of Job Tracker.
  • Open "/home/hduser/hadoop/etc/hadoop/mapred-site.xml" file with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/mapred-site.xml
  • Add the following configurations in "mapred-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:54311</value>
            <description>Host and port for the MapReduce JobTracker</description>
        </property>
    </configuration>
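  • Note: mapred.job.tracker is a Hadoop 1.x setting; if you enable YARN in Step 12, MapReduce runs on YARN instead, which is normally selected with
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>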
    
Step 12: Configure "yarn-site.xml" file for node manager and resource manager configurations.(Configuring yarn is optional)
  • Open "/home/hduser/hadoop/etc/hadoop/yarn-site.xml" file with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/yarn-site.xml
  • Add the following configurations in "yarn-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>localhost:8025</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>localhost:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>localhost:8050</value>
        </property>
    </configuration>
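  • If you configured YARN, you can start its daemons right after HDFS in Step 13
    /home/hduser/hadoop/sbin/start-yarn.sh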
Step 13: Format the NameNode and start HDFS
  • Execute the following commands to format the NameNode and start HDFS
    /home/hduser/hadoop/bin/hdfs namenode -format
    /home/hduser/hadoop/sbin/start-dfs.sh
  • List the running Hadoop daemons
    jps
    It should list the running daemons such as NameNode, DataNode and SecondaryNameNode (plus ResourceManager and NodeManager if you also started YARN in Step 12)
The Hadoop installation is now complete, and we can start using Hadoop.
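
As a quick smoke test, assuming the daemons started cleanly, create a home directory in HDFS and list the root; the NameNode web UI should also be reachable at http://localhost:50070.
    /home/hduser/hadoop/bin/hdfs dfs -mkdir -p /user/hduser
    /home/hduser/hadoop/bin/hdfs dfs -ls /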