Running Hadoop in Standalone Mode

This section explains how to install Hadoop on Ubuntu. It is a quick-start tutorial, the shortest route to a working setup: below you will find every command needed to install Hadoop in standalone mode (single node), along with a description of each.


Install Java:
    sudo apt-get install sun-java6-jdk

If you don't have the Hadoop bundle, download Hadoop, then extract the bundle:
    sudo tar xzf file_name.tar.gz

Edit the configuration file hadoop-env.sh:
    vi conf/hadoop-env.sh

In that file, set JAVA_HOME to the root of your Java installation (e.g. /usr/lib/jvm/java-6-sun):
    export JAVA_HOME=/usr/lib/jvm/java-6-sun

Go to your Hadoop installation directory (HADOOP_HOME) and type:
    bin/hadoop
This displays the usage documentation for the hadoop command.

Congratulations, your Hadoop setup is complete. Now let's run some examples.

Run the pi example:
    bin/hadoop jar hadoop-*-examples.jar pi 10 100

Run the grep example:
    mkdir input
    cp conf/*.xml input
    bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
    cat output/*

Run the word count example:
    mkdir inputwords
    cp conf/*.xml inputwords
    bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords

If you get any errors while running the examples, visit Hadoop Troubleshooting.
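
The JAVA_HOME step is the one that is easiest to get wrong when done by hand, so here is a minimal sketch of how it could be scripted. The helper names is_java_home and set_java_home are hypothetical (they are not part of Hadoop), and the sketch assumes hadoop-env.sh is a plain shell file in which JAVA_HOME is set by a line starting with "export JAVA_HOME=".

```shell
# Sketch only: these helper functions are illustrative, not part of Hadoop.

# Succeeds if the given directory looks like the root of a Java installation,
# i.e. it contains an executable bin/java.
is_java_home() {
    [ -x "$1/bin/java" ]
}

# Sets JAVA_HOME in a hadoop-env.sh style file.
# Usage: set_java_home ENV_FILE JAVA_HOME_DIR
# Existing uncommented "export JAVA_HOME=" lines are dropped and one new line
# is appended; all other lines (including commented ones) are kept as they are.
set_java_home() {
    sjh_env_file="$1"
    sjh_java_home="$2"
    sjh_tmp_file="$sjh_env_file.tmp"

    # Refuse to continue if the configuration file does not exist.
    [ -f "$sjh_env_file" ] || return 1

    # grep -v exits non-zero when it prints nothing, which is fine here.
    grep -v '^export JAVA_HOME=' "$sjh_env_file" > "$sjh_tmp_file" || true
    printf 'export JAVA_HOME=%s\n' "$sjh_java_home" >> "$sjh_tmp_file"

    # Write back through the original file so its permissions are preserved.
    cat "$sjh_tmp_file" > "$sjh_env_file"
    rm -f "$sjh_tmp_file"
}
```

From your Hadoop installation directory you could then run something like the following, adjusting the path to your own Java installation:
    is_java_home /usr/lib/jvm/java-6-sun && set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-sun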

7 comments:

  1. Any feedback and suggestions are invited.
    Thanx for visiting my blog.:):):)

  2. I cannot edit the hadoop-env.sh file: permission denied.

  3. Are you working as the root user?
    If not, you must explicitly give that user permission.
    Run this command as root:
    chown -R YOUR-USER-NAME PATH-TO-HADOOP-DIR

    Or you can run the command from the tutorial above with sudo:
    sudo vi conf/hadoop-env.sh

  4. This comment has been removed by the author.

  5. Can you post steps for Windows users? I am an amateur in Hadoop and would like to set up a single node and run the pi and word count examples. Thanks.

  6. I have Hadoop running in pseudo-distributed mode. Could you please suggest the changes needed to make it run in standalone mode?

    Replies
    1. I think you should go from pseudo-distributed to distributed.
      The simplest path is to deploy standalone first, then pseudo-distributed, then distributed.

      Anyway, if you want to go from pseudo-distributed back to standalone, just remove the entries from the configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml).
