Running Hadoop in Pseudo Distributed Mode

This section contains instructions for installing Hadoop on Ubuntu. It is a quickstart tutorial to get Hadoop running quickly: the shortest possible walkthrough, listing all the commands (with a short description of each) required to install Hadoop in pseudo-distributed mode (a single-node cluster).


Commands and their descriptions:

sudo apt-get install sun-java6-jdk
    Install Java.

If you don't have a Hadoop bundle, download Hadoop from the Apache Hadoop releases page.

sudo tar xzf file_name.tar.gz
    Extract the Hadoop bundle.

Go to your Hadoop installation directory (HADOOP_HOME).

vi conf/hadoop-env.sh
    Edit the configuration file hadoop-env.sh and set JAVA_HOME to the root of your Java installation, for example:
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
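If you are not sure where Java lives on your machine, these commands usually reveal the path (a quick check, assuming a Debian/Ubuntu-style layout):

    # list the JVMs installed in the usual Ubuntu location
    ls /usr/lib/jvm/
    # or resolve the real path of the java binary currently on the PATH
    readlink -f $(which java)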
vi conf/core-site.xml
    Edit the configuration file core-site.xml and type:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
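Optionally, you can also set hadoop.tmp.dir in core-site.xml to a directory your user can write to; several of the permission problems discussed in the comments below trace back to the default temporary directory. The path shown here is only an example, not part of the original setup:

    <property>
      <name>hadoop.tmp.dir</name>
      <!-- hypothetical location: use any directory owned by your user -->
      <value>/home/your_user/hadoop-tmp</value>
    </property>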
vi conf/hdfs-site.xml
    Edit the configuration file hdfs-site.xml and type:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
vi conf/mapred-site.xml
    Edit the configuration file mapred-site.xml and type:

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
sudo apt-get install openssh-server openssh-client
    Install ssh.

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
    Set up passwordless ssh.
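If ssh localhost still asks for a password, the usual culprit is file permissions; tightening them generally fixes it (assuming a standard OpenSSH setup):

    chmod 700 $HOME/.ssh
    chmod 600 $HOME/.ssh/authorized_keys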
bin/hadoop namenode -format
    Format the new distributed filesystem.
    During this operation:
    the NameNode starts,
    the NameNode is formatted,
    the NameNode stops.
    Format only once; reformatting later erases everything stored in HDFS.
bin/start-all.sh
    Start the Hadoop daemons.

jps
    It should give output like this:
    14799 NameNode
    14977 SecondaryNameNode
    15183 DataNode
    15596 JobTracker
    15897 TaskTracker
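If one of the daemons is missing from the jps output, its log file is the first place to look. The logs live in the logs directory of your Hadoop installation and are named after the daemon and host (the exact file names below are illustrative):

    ls logs/
    # for example, inspect the tail of the NameNode log
    tail -n 50 logs/hadoop-*-namenode-*.log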
Congratulations, the Hadoop setup is complete.

http://localhost:50070/    web-based interface for the NameNode
http://localhost:50030/    web-based interface for the JobTracker
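Right after startup the NameNode stays in safe mode for a short while, and jobs submitted during that window fail; you can check its state before running anything:

    bin/hadoop dfsadmin -safemode get
    # or block until the NameNode leaves safe mode on its own
    bin/hadoop dfsadmin -safemode wait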
Now let's run some examples.

bin/hadoop jar hadoop-*-examples.jar pi 10 100
    Run the pi example.

bin/hadoop dfs -mkdir input
bin/hadoop dfs -put conf input
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
bin/hadoop dfs -cat output/*
    Run the grep example.

bin/hadoop dfs -mkdir inputwords
bin/hadoop dfs -put conf inputwords
bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords
bin/hadoop dfs -cat outputwords/*
    Run the wordcount example.
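If you rerun an example, it will refuse to start because its output directory already exists in HDFS; remove it first (the directory names match the examples above):

    bin/hadoop dfs -rmr output
    bin/hadoop dfs -rmr outputwords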
bin/stop-all.sh
    Stop the Hadoop daemons.

89 comments:

  1. Hi,
    I am trying to run hadoop in pseudo distributed
    mode using cloudera as vm.
    I have copied the files into hdfs via hue and using the job browser I am trying to run it as a job.
    But it dies with a permission denied error.
    Could you help me with this?
    Thanks,
    Sayali

  2. Hi Sayali,
    When we run jobs, some files are created.
    I think you have not given permission to create files at that location (hadoop.tmp.dir).
    Please give all the permissions to the current user,
    or you can also install as the root user.
    For Cloudera you can refer to:
    http://cloudera-tutorial.blogspot.com/
    For Hue you can refer to:
    http://hivetutorial.wordpress.com/
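    If the job scratch space is the problem, and your setup uses the default hadoop.tmp.dir of /tmp/hadoop-<username> (an assumption; check your core-site.xml), taking ownership of that directory usually clears the error:
    sudo chown -R $USER:$USER /tmp/hadoop-$USER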

  3. Hi, I am getting the following errors. Can you please help?

    11/02/21 00:23:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s).
    11/02/21 00:23:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s).
    11/02/21 00:23:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 2 time(s).
    11/02/21 00:23:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 3 time(s).

  4. Hi,
    Please run the jps command to ensure all the daemons are running.
    Also, if you run a job just after bin/start-all.sh, you may get this error because the NameNode is in safe mode; wait for 10-30 seconds and try to run the same job again.
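    For example, the following command blocks until the NameNode has left safe mode:
    bin/hadoop dfsadmin -safemode wait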

  5. We are trying to set up a database in Hive. Every time we create tables, insert data, and end our session by stopping all the Hadoop daemons, we lose the previously inserted data in Hive, because we execute hadoop namenode -format every time, and that seems to wipe out all the data. Is there any way to retain the data created in Hive?

  6. You do not need to execute "hadoop namenode -format" every time you start the cluster.
    Just execute start-all.sh to start the Hadoop daemons.
    If we format the NameNode, all the data is lost.

    Replies
    1. This is a great point. I kept wondering why I could not see the namenode when everything else was working properly. Thanks pal!

    2. Please post your namenode logs, so that I can help you on this..

  7. Thanks Rahul for your prompt reply. What you said is true only while we keep Ubuntu running. But when we restart our machine, we have to format the NameNode, otherwise http://localhost:50070/ won't work. So maybe we need to start the NameNode in some other manner, right?

  8. When you restart your machine, just restart the Hadoop daemons by executing "start-all.sh".
    After starting the daemons, http://localhost:50070/ will work.

  9. Hi, can you tell me how to install Hadoop and run it in pseudo-distributed mode on Windows 7?

  10. Hi,
    I would suggest you install Hadoop on Linux.
    If you want to install it on Windows:
    first install Cygwin from http://www.cygwin.com/ ; it will provide a Linux-like environment.
    Then you can follow the above tutorial for the Hadoop installation.

  11. Hi, I get the following errors when I try to run the wordcount example and many other examples provided by Hadoop itself. Can you help me?

    11/02/21 00:23:23 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s).
    11/02/21 00:23:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s).
    11/02/21 00:23:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 2 time(s).
    11/02/21 00:23:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 3 time(s).

  12. Please ensure all the Hadoop daemons are running by running the jps command.
    Also ensure that the NameNode is not in safe mode.

    Please see logs in case of any error

  13. Hi, when I run the command bin/start-all.sh I get an error. Can you please help me sort it out? Here is the error message:

    localhost: starting secondarynamenode, logging to /data/hadoop/hadoop-0.20.2/bin/../logs/hadoop-waqas-secondarynamenode-trinity.out

    localhost: Exception in thread "main" java.lang.NumberFormatException: For input string: "localhost:9000"

    localhost: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

    localhost: at java.lang.Integer.parseInt(Integer.java:492)

    localhost: at java.lang.Integer.parseInt(Integer.java:527)

    localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:146)

    localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)

    localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)

    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:131)

    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:115)

    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)

  14. There might be a mistake in your configuration files,
    like core-site.xml, hdfs-site.xml, mapred-site.xml, etc.
    Please post the contents of your configuration files.

  15. I edited my configuration files exactly the same as you suggested up here in this blog.

  16. Please post your configuration files so I can help you with this.

  17. Ohh, yeah, I rechecked and there was an error in my configuration files. Thanks for your time and suggestion.

  18. Hi,
    I have configured Hadoop and Hive on Windows through Cygwin. But I am facing some problems: in the Hive terminal (CLI), when I enter a query, the query does not execute and the terminal remains busy.

    If i enter the query like: bin/hive -e 'LOAD DATA INPATH 'kv1.txt' OVERWRITE INTO TABLE pokes;'

    The Output is like this:
    Hive history file=/tmp/Bhavesh.Shah/hive_job_log_Bhavesh.Shah_201111301549_1377455380.txt FAILED: Parse Error: line 1:17 mismatched input 'kv1' expecting StringLiteral near 'INPATH' in load statement

    What could be the problem? Pls suggest me

  19. You need to create the file "kv1.txt" and provide the path to it.

    If the error persists, please post the contents of the log file (history file=/tmp/Bhavesh.Shah/hive_job_log_Bhavesh.Shah......)

  20. This comment has been removed by the author.

  21. I have already put that file in the same directory; that's why I have written kv1.txt.
    The StringLiteral error is actually causing the problem, and I don't know why.

    And one more thing: I am not finding that particular directory, i.e. /tmp/Bhavesh.Shah/...

    Now what to do?......:(

  22. hi,
    I just found the error log file.
    CONTENT IS:
    ---------
    SessionStart SESSION_ID="Bhavesh.Shah_201111301549" TIME="1322648344557"

    Sorry for the multiple posts.

  23. Change the outer single quotes to double quotes,
    and also add the LOCAL keyword.
    The correct query would be:

    bin/hive -e "LOAD DATA LOCAL INPATH 'kv1.txt' OVERWRITE INTO TABLE pokes;"

    following link would be useful:
    http://hivebasic.blogspot.com/

  24. I have one more doubt that,
    When I enter the query in Hive CLI, I get the error as:

    $ bin/hive -e "insert overwrite table pokes select a.* from invites a where a.ds='2008-08-15';"
    bin/hive -e "insert overwrite table pokes select a.* from invites a where a.ds='2008-08-15';"
    Hive history file=/tmp/Bhavesh.Shah/hive_job_log_Bhavesh.Shah_201112021007_2120318983.txt
    Total MapReduce jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201112011620_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201112011620_0004
    Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2\/bin/hadoop job -Dmapred.job.tracker=localhost:9101 -kill job_201112011620_0004
    2011-12-02 10:07:30,777 Stage-1 map = 0%, reduce = 0%
    2011-12-02 10:07:57,796 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201112011620_0004 with errors
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

    I think the map-reduce job is not started and hence is not executed.
    So what could be the solution?
    Thanks.

  25. Please scan your error logs and post the detailed exception.

  26. 2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
    2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
    2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
    2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
    2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
    2011-12-02 12:29:19,275 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
    2011-12-02 12:29:23,011 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2011-12-02 12:29:58,749 ERROR exec.MapRedTask (SessionState.java:printError(343)) - Ended Job = job_201112011620_0006 with errors
    2011-12-02 12:29:58,858 ERROR ql.Driver (SessionState.java:printError(343)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

  27. Hello Rahul,
    Now this time I have configured Hadoop in Linux (Ubuntu) and trying for Hive. But there is one problem while configuring Hive that:

    After successfully building the package through ant, when I try to launch the Hive CLI from the Hive directory, I get errors like:
    "Missing Hive Builtins Jar: /home/hadoop/hive-0.7.1/hive/lib/hive-builtins-*.jar"

    What could be the problem in configuration? Pls suggest me as soon as possible.

  28. For configuring Hive you don't need to build it;
    I didn't build it.

    You can follow this approach:
    1. Install Hadoop
    2. Set HADOOP_HOME
    3. Untar hive*.tar.gz
    4. Go to HIVE_HOME and type bin/hive
    The Hive shell should open.
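    As a concrete sketch (the version number and paths are only examples; adjust them to your download):
    tar -xzf hive-0.7.1-bin.tar.gz
    export HADOOP_HOME=/path/to/hadoop    # wherever Hadoop is extracted
    cd hive-0.7.1-bin
    bin/hive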

  29. I am getting this error:

    hdfs://127.0.0.1:9100/tmp/hadoop-DEFTeam-N5/mapred/system/jobtracker.info is missing!
    ...my NameNode is running, however the JobTracker is not running.

    Replies
    1. This comment has been removed by the author.

    2. On what operation are you getting this error?
      Please check your JobTracker logs and post the error.

  30. when I searched in /tmp folder, there is no directory called Mapred/system/jobtracker.info

    Replies
    1. It's reporting the error in HDFS, not in the local filesystem.
      Please scan and post the logs.

  31. INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting JobTracker
    STARTUP_MSG: host = DEFTeam-N5-PC/192.168.2.104
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.0
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
    ************************************************************/
    INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=9101
    INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
    INFO org.mortbay.log: jetty-6.1.14
    INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
    INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9101
    INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
    INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
    WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager. The Recovery manager failed to access the system files in the system dir (hdfs://127.0.0.1:9100/tmp/hadoop-DEFTeam-N5/mapred/system).
    WARN org.apache.hadoop.mapred.JobTracker: It might be because the JobTracker failed to read/write system files (hdfs://127.0.0.1:9100/tmp/hadoop-DEFTeam-N5/mapred/system/jobtracker.info / hdfs://127.0.0.1:9100/tmp/hadoop-DEFTeam-N5/mapred/system/jobtracker.info.recover) or the system file hdfs://127.0.0.1:9100/tmp/hadoop-DEFTeam-N5/mapred/system/jobtracker.info is missing!
    WARN org.apache.hadoop.mapred.JobTracker: Bailing out...
    WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create file /tmp/hadoop-DEFTeam-N5/mapred/system/jobtracker.info on client 127.0.0.1.
    Requested replication 0 is less than the required minimum 1

    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:238)
    at org.apache.hadoop.mapred.JobTracker$RecoveryManager.updateRestartCount(JobTracker.java:1168)
    at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1657)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:174)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3528)

    2012-04-16 18:48:39,832 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to /127.0.0.1:9101 : Address already in use: bind

    Replies
    1. There are multiple errors, so I would suggest you start from scratch: please remove your old installation and install it again.

      Talking about the above error:
      one exception is that the required port is already in use;
      also, your DataNode daemon might not be running.

      After installation, run the jps command to ensure that all services are running.

    2. Thanx Rahul, I re-installed and working fine now :)

  32. Hi

    When I run the jps command in Cygwin, it says "command not found". Any idea how to run the jps command in Cygwin?

    Thanks,
    Nathan

    Replies
    1. I found it. It should be executed like /cygdrive/c/Program\ Files/Java/jdk1.6.0_29/bin/jps.exe

      it works...

      Thanks,
      Nathan

  33. Hi,
    A few days back, I installed Hadoop 0.20.2 on Windows 7 through Cygwin, and it's working fine; I ran the wordcount example on the command prompt and everything works.

    I would like to know about Cloudera. I am going through the Cloudera videos; is there any difference between the Cloudera installation and installing Hadoop through Cygwin? If I want to learn Cloudera Hadoop in detail, shall I set up Hadoop through Cloudera? Can we install it on Windows? Please help me.

    Replies
    1. Installation of both Apache and Cloudera Hadoop through a tarball is the same.
      Yes, you can install Hadoop on Windows through Cygwin, but it is not recommended.
      To install Cloudera Hadoop you need to download the tarball from cloudera.com.

      There are also Debian and RPM packages of Hadoop on both the Cloudera and Apache sites whose installation steps are different, but I recommend downloading the tarball and installing from that.

  34. Thanks for the information.
    I am going to use the Pentaho BI tool with Apache Hadoop. I found a document on Pentaho which says to create a virtual operating system by installing VMware Player and then Ubuntu for the Hadoop installation... will that be useful?

  35. Hello. I am a newcomer to Hadoop. I followed the instructions on http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/. When I run the test
    "bin/hadoop jar hadoop-*examples*.jar grep input output 'dfs[a-z.]+'"
    I get an exception following a set of errors
    "Retrying connect to server: /127.0.0.1:9101. Already tried"

    I checked that Safe Mode is off. Any thoughts on what I can try to make sure I can get this working?

    Replies
    1. One of the Hadoop daemons (NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker) is not running; please check the error logs.

  36. Hi Rahul, can we have common storage for both HBase and Hive? I want to retrieve data from both HBase and Hive, so I would like to know whether I can make common storage for data coming from both HBase and Hive. Please help.

    Replies
    1. HBase and Hive already share the same storage; the data of both is saved in HDFS.

      If you want to query HBase's data using Hive (using its SQL), you need to integrate Hive and HBase.

  37. Hi Rahul,
    I am trying to configure Hadoop 1.0. When I run the command for the pi example, I get the following error:

    Number of Maps = 10
    Samples per Map = 100
    java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: java.io.EOFException
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
    at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:265)
    at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
    ... 17 more
    Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)

    What might be the problem? Thanks

    waqas

    Replies
    1. Please verify:
      Are all daemons of hadoop running ?
      Did you format namenode ?

    2. Yes, I formatted it, but the log says it is not formatted.

    3. org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
      2012-05-14 15:39:39,586 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
      java.io.IOException: NameNode is not formatted.
      at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:315)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:386)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:360)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:496)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
      2012-05-14 15:39:39,587 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
      at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:315)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:386)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:360)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:496)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
      at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)

      2012-05-14 15:39:39,588 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down NameNode at test/127.0.0.1
      ************************************************************/

  38. I got rid of this problem and now I am getting this error at pi example

    error message part1(due to limit of 4096 words)

    Number of Maps = 10
    Samples per Map = 100
    12/05/14 16:21:47 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/waqas/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

    at org.apache.hadoop.ipc.Client.call(Client.java:1066)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)

    Replies
    1. What was the exact problem in the previous case?

      This error comes up for the following reasons:
      1. The DataNode daemon might not be running (check by running the jps command).

      2. Check the number of live nodes (it should not be zero) at http://localhost:50070

      3. The NameNode may be in safe mode; check by running this command: bin/hadoop dfsadmin -safemode get

    2. It looks like datanode is not running.
      Here is output of datanode.out

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGFPE (0x8) at pc=0x00002ab1d93d368f, pid=14357, tid=1074792768
      #
      # JRE version: 7.0_01-b08
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.1-b02 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C [ld-linux-x86-64.so.2+0x868f] do_lookup_x+0xcf
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /data/hadoop/hadoop-1.0.0/hs_err_pid14357.log
      #
      # If you would like to submit a bug report, please visit:
      # http://bugreport.sun.com/bugreport/crash.jsp
      # The crash happened outside the Java Virtual Machine in native code.
      # See problematic frame for where to report the bug.
      #

      I also want to mention that I already have Hadoop 0.20 running on this system, but I am having this problem with Hadoop 1.0... Can you please suggest how I can solve this problem? Thanks
      waqas

    3. also when i try to run ultimate -c unlimited then it says no such command. I am not use to all this command line stuff so please guide me in this as well

    4. Why are you running "ultimate -c unlimited"?
      It looks like you are using Java 7 (with Hadoop, Java 6 is recommended).
      If Hadoop 0.20 is already running on your machine, first stop it, otherwise you will get port-related issues.

    5. I am not running both simultaneously. Also, I used ports 8020 and 8021 for Hadoop 1.0 instead of 9000 and 9001, which I assigned to Hadoop 0.20 as mentioned in your blog.
      Hadoop 0.20 is also running with Java 7. I tried ulimit -c unlimited because it was mentioned in the datanode.out file that I posted.

  39. Hi Rahul, I am using Hadoop 0.20.2 and it's working fine; however, sometimes the NameNode stops working. To resolve the issue, I have to delete the Hadoop image stored in the tmp directory and do the namenode format step. That resolves the issue, but all the data is lost. Is there any other way to resolve it?

    Replies
    1. Please scan your error logs and post them, so that I can find the root cause.

  40. There is no error log. I did reformat and it is working fine now... thanks.

  41. I have 1 more doubt :

    I want to retrieve data from HBase and Hive. My dimension tables are stored in HBase and my fact tables in Hive, so using Hive, how do I integrate them and retrieve data from HBase? I am using Hive version 0.9.0 and HBase version 0.92.0. I heard that from Hive 0.9.0 onwards we can retrieve existing data from HBase, but I don't know how to. Please help.

    Replies
    1. Hi Karry, you need to integrate HBase and Hive; with that you can fulfill your requirements. You can find all the details here (https://cwiki.apache.org/Hive/hbaseintegration.html#HBaseIntegration-HiveHBaseIntegration)

      For a quicker response, please post comments on http://www.technology-mania.com/

  42. My issue is: taking a simple example, if the dimension table is in HBase, e.g. (prodid, prodname, date), and the fact table is in Hive, e.g. (prodid, sales), then I would like to know how to do the integration if I want to print the output (prodname, sales, date).

    The link provided, "https://cwiki.apache.org/Hive/hbaseintegration", says that the HBase table is to be created through Hive... However, in my case the HBase table is not created through Hive... I am using Hive 0.9 and HBase 0.92.1... please help.

    Replies
    1. The steps for your requirement are also mentioned on that page.
      If you want to give Hive access to an existing HBase table, use CREATE EXTERNAL TABLE:

      CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
      TBLPROPERTIES("hbase.table.name" = "some_existing_table");
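      Once the external table is mapped, a plain HiveQL join does the rest; a rough sketch with made-up table and column names (hbase_dim mapped from the HBase dimension table, fact_sales being the existing Hive fact table):

      SELECT d.prodname, f.sales, d.order_date
      FROM fact_sales f
      JOIN hbase_dim d ON (f.prodid = d.prodid);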

  43. Hi Rahul, I have installed hadoop-0.20 using Cygwin on my Windows 7 machine, and the NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker are working fine.
    I have set the configuration in the Eclipse IDE and am running the wordcount example; it is also working fine.
    The problem is, if I stop all the daemons with stop-all.sh and run my wordcount example,
    it runs without any error and produces the output file...
    I don't know how it is working... any ideas, please?
    thanks,
    Nitai

    Replies
    1. After you have stopped all the services, the wordcount example is probably running in standalone (local) mode; search for the output directory on your local filesystem.
      One more check: when you run the wordcount example after stopping the services, it should not print the % of map completion.
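      A quick way to tell the two modes apart (the directory name is whatever your job used; outputwords is only an example):
      bin/hadoop dfs -ls outputwords    # exists only if the job really wrote to HDFS
      ls outputwords                    # exists if the job ran in standalone/local mode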

  44. Hi Rahul, I am new to Hadoop and I set up my first multi-node cluster, but the problem I am facing is that everything worked except the jps command. It shows the error "-bash: jps: command not found". Please tell me where I am going wrong. I am using CentOS 6.

    Replies
    1. The jps command ships with the JDK, so on CentOS it is often not on the PATH; the jps usage above assumes the Ubuntu setup described in this post.
      If there is no error in the logs, then everything is fine.
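      If you do want to run it, calling it through the JDK usually works (the JDK location is an assumption; adjust it to yours):
      $JAVA_HOME/bin/jps
      # or put the JDK's bin directory on the PATH
      export PATH=$PATH:$JAVA_HOME/bin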

  45. Hi, I have changed JAVA_HOME in the hadoop-env.sh file to usr/lib/jvm/java-6-openjdk,
    but the terminal shows an error that JAVA_HOME is not set.
    What should I do?

    Replies
    1. Hi,
      You should set the complete path; I think the leading / is missing.

      In my current case it looks like this:
      JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64/

    2. Thanks for answering... but the terminal still shows the same error.
      May I know whether Hadoop 1.0.4 works on OpenJDK or not?
      And I have java-6-openjdk, not java-6-openjdk-amd64... is that for 64-bit?

    3. Yes, Hadoop 1.x works with OpenJDK.
      Maybe your path is not correct; please check.

      In my case it is "java-6-openjdk-amd64" as I am working on a 64-bit machine.

    4. thanks for answering.....i will check the path

  46. Hi,
    I have downloaded Hadoop 1.0.4 and extracted it into a folder.
    How do I install it?

    Replies
    1. The steps above explain how to install Hadoop in pseudo-distributed mode.

  47. How do I go to the Hadoop installation directory (HADOOP_HOME)?

    Replies
    1. It's the directory where you extracted Hadoop; in this directory you will find all the other directories like bin, conf, src, etc.

    2. what is the next step after that?

    3. Is vi conf/hadoop-env.sh the next step?
      Is there nothing to do before that?

  48. This comment has been removed by the author.

  49. Hi,
    when I run bin/hadoop jar hadoop-*-examples.jar pi 10 100 it gives an error:
    cannot unzip the zip file... please help.

  50. When executing start-all.sh, some of the daemons are not started.
    For the DataNode I see this backtrace:

    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [ld-linux-x86-64.so.2+0x868f] double+0xcf
    C [ld-linux-x86-64.so.2+0xa028] _dl_relocate_object+0x588
    C [ld-linux-x86-64.so.2+0x102d5] double+0x3d5
    C [ld-linux-x86-64.so.2+0xc1f6] _dl_catch_error+0x66
    C [libdl.so.2+0x11fa] double+0x6a

    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    j java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;)V+0
    j java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+300
    j java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+347
    j java.lang.Runtime.loadLibrary0(Ljava/lang/Class;Ljava/lang/String;)V+54
    j java.lang.System.loadLibrary(Ljava/lang/String;)V+7
    j org.apache.hadoop.util.NativeCodeLoader.()V+25
    v ~StubRoutines::call_stub
    j org.apache.hadoop.io.nativeio.NativeIO.()V+13
    v ~StubRoutines::call_stub
    j org.apache.hadoop.fs.FileUtil.setPermission(Ljava/io/File;Lorg/apache/hadoop/fs/permission/FsPermission;)V+22
    j org.apache.hadoop.fs.RawLocalFileSystem.setPermission(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/permission/FsPermission;)V+6
    j org.apache.hadoop.fs.FilterFileSystem.setPermission(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/permission/FsPermission;)V+6
    j org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(Lorg/apache/hadoop/fs/LocalFileSystem;Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/permission/FsPermission;)Z+40
    j org.apache.hadoop.util.DiskChecker.checkDir(Lorg/apache/hadoop/fs/LocalFileSystem;Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/permission/FsPermission;)V+3
    j org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance([Ljava/lang/String;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hdfs/server/datanode/SecureDataNodeStarter$SecureResources;)Lorg/apache/hadoop/hdfs/server/datanode/DataNode;+74
    j org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode([Ljava/lang/String;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hdfs/server/datanode/SecureDataNodeStarter$SecureResources;)Lorg/apache/hadoop/hdfs/server/datanode/DataNode;+99
    j org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode([Ljava/lang/String;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hdfs/server/datanode/SecureDataNodeStarter$SecureResources;)Lorg/apache/hadoop/hdfs/server/datanode/DataNode;+3


    Any idea on this?

  51. I am trying to connect from a client machine to the Hive server. I installed Hive and Hadoop on the client and I am able to run Hive. I have copied hive-site.xml from the server. But whenever I run any query it gives me this error:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGFPE (0x8) at pc=0x00002aaaaaab368f, pid=25697, tid=1076017472
    #
    # JRE version: 6.0_31-b04
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C [ld-linux-x86-64.so.2+0x868f] double+0xcf
    #
    # An error report file with more information is saved as:
    # /usr/local/hcat/hs_err_pid25697.log
    #
    # If you would like to submit a bug report, please visit:
    # http://java.sun.com/webapps/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #

  52. I have a problem when I run the wordcount example. I have checked /etc/hosts and the config files (core-site.xml, hdfs-site.xml and mapred-site.xml). Could you please check it for me?

    hadoop@Hadoop hadoop]$ bin/hadoop jar hadoop-examples-1.1.1.jar wordcount input output
    Warning: $HADOOP_HOME is deprecated.

    13/01/06 13:27:18 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:19 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 1 time(s); retry policy is
    13/01/06 13:27:22 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:23 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:24 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:25 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:26 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:27 INFO ipc.Client: Retrying connect to server: Hadoop/10.57.250.186:6868. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    13/01/06 13:27:27 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.net.ConnectException: Call to Hadoop/10.57.250.186:6868 failed on connection exception: java.net.ConnectException: Connection refused
    java.net.ConnectException: Call to Hadoop/10.57.250.186:6868 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1136)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    ...

    Replies
    1. Run the jps command to ensure all the Hadoop daemons are running.

    2. mtech11@cse-desktop:~/hadoop/bin$ hadoop jar hadoop-*-examples.jar pi 10 100
      Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-*-examples.jar
      at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
      Caused by: java.util.zip.ZipException: error in opening zip file
      at java.util.zip.ZipFile.open(Native Method)
      at java.util.zip.ZipFile.(ZipFile.java:127)
      at java.util.jar.JarFile.(JarFile.java:135)
      at java.util.jar.JarFile.(JarFile.java:72)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:128)


      This is the result I get when I run an example map-reduce job. I have tried almost all solutions, but there is no result.
      Kindly help me.

    3. Please correct the name of the jar file.
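      In Hadoop 1.x the examples jar is named hadoop-examples-<version>.jar and sits in the top-level Hadoop directory, not in bin/, so something along these lines should work (the version number is only an example):
      cd $HADOOP_HOME
      bin/hadoop jar hadoop-examples-1.1.1.jar pi 10 100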
