Install Hadoop (Single Node)
10 Sep 2015 #hadoop #bigdata #ubuntu
On a starting note:
I am assuming that you have a fresh Ubuntu install on your system, as this will cut down a lot of the frustration of debugging why Hadoop is not running.
I am using nano as the text editor for this article, but you can use any other text editor such as vi, Sublime or Atom.
Next article : Install Hadoop Multi Node 1.0.3 on Ubuntu 14.04.2
Installation
Create a dedicated user
Create a separate user named hadoop to keep the configuration files and the installation files separate from the rest of the system
sys8@sys8:~$ sudo adduser hadoop
You will then be prompted to set the new UNIX password and user details
Add this user to the sudo group
sys8@sys8:~$ sudo adduser hadoop sudo
Switch to the newly created user
sys8@sys8:~$ su - hadoop
Install Java, which is available in Ubuntu's default repositories
hadoop@sys8:~$ sudo apt-get install default-jdk
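As a quick sanity check (my addition, not strictly required), confirm that Java installed correctly and note where the JDK actually lives, since that path feeds into JAVA_HOME later:
hadoop@sys8:~$ java -version
hadoop@sys8:~$ readlink -f $(which java)
The second command prints the real path of the java binary, which should sit under /usr/lib/jvm/.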
Configure SSH:
Enabling SSH to localhost is the next step
Install openssh-server
hadoop@sys8:~$ sudo apt-get install openssh-server
Generate the keys
hadoop@sys8:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
68:b0:84:c0:3f:16:41:38:d9:7e:d6:63:a3:a0:28:f5 hadoop@sys8
The key's randomart image is:
+--[ RSA 2048]----+
|o =o.            |
| * +             |
|  = + .          |
|  .B = *         |
|..o.* = S        |
|o.  Eo           |
|.                |
|                 |
|                 |
+-----------------+
hadoop@sys8:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
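If passwordless SSH still prompts for a password later, the usual culprit is file permissions: sshd ignores keys in group- or world-writable files. Tightening them does no harm (an extra step I have added):
hadoop@sys8:~$ chmod 700 $HOME/.ssh
hadoop@sys8:~$ chmod 600 $HOME/.ssh/authorized_keys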
Copy the key over to enable passwordless SSH
hadoop@sys8:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 17:f4:fc:aa:88:4d:51:b1:08:ae:df:75:2f:07:37:26.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
Test the SSH connection to localhost
hadoop@sys8:~$ ssh hadoop@localhost
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-28-generic x86_64)
 * Documentation:  https://help.ubuntu.com/
60 packages can be updated.
24 updates are security updates.
hadoop@sys8:~$ exit
logout
Connection to localhost closed.
hadoop@sys8:~$
Apache Hadoop Installation:
Download Hadoop from Apache's archive, but first change to hadoop's home directory
hadoop@sys8:~$ pwd
/home/hadoop
hadoop@sys8:~$ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
--2015-09-12 18:51:49--  https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Resolving archive.apache.org (archive.apache.org)... 140.211.11.131, 192.87.106.229, 2001:610:1:80bc:192:87:106:229
Connecting to archive.apache.org (archive.apache.org)|140.211.11.131|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63851630 (61M) [application/x-gzip]
Saving to: ‘hadoop-1.2.1.tar.gz’
100%[======================================================================================================>] 6,38,51,630 1.82MB/s   in 42s    
2015-09-12 18:52:32 (1.45 MB/s) - ‘hadoop-1.2.1.tar.gz’ saved [63851630/63851630]
Extract and rename the file
hadoop@sys8:~$ sudo tar xzf hadoop-1.2.1.tar.gz
hadoop@sys8:~$ sudo mv hadoop-1.2.1 hadoop
Set the user permissions
hadoop@sys8:~$ sudo chown -R hadoop:hadoop hadoop
Configuration
Update /etc/profile
hadoop@sys8:~$ sudo nano /etc/profile
Add the following lines at the end
NOTE:
This assumes you installed default-jdk from the repos. Change the path accordingly for any other Java version
export HADOOP_HOME="/home/hadoop/hadoop/"
export JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-amd64"
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH="$PATH:$HADOOP_HOME/bin"
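/etc/profile is only read at login, so either log out and back in, or source it to apply the changes to the current shell. As a quick verification (my addition), the hadoop binary should then resolve and report its version:
hadoop@sys8:~$ source /etc/profile
hadoop@sys8:~$ hadoop version
The first line of the output should read Hadoop 1.2.1.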
Edit the configuration files for Hadoop
Edit hadoop-env.sh
Change to the hadoop folder and add the JAVA_HOME path
hadoop@sys8:~$ cd ~/hadoop
hadoop@sys8:~/hadoop$ ls
bin        CHANGES.txt  docs                     hadoop-core-1.2.1.jar         hadoop-test-1.2.1.jar   ivy.xml  LICENSE.txt  sbin   webapps
build.xml  conf         hadoop-ant-1.2.1.jar     hadoop-examples-1.2.1.jar     hadoop-tools-1.2.1.jar  lib      NOTICE.txt   share
c++        contrib      hadoop-client-1.2.1.jar  hadoop-minicluster-1.2.1.jar  ivy                     libexec  README.txt   src
hadoop@sys8:~/hadoop$ sudo nano conf/hadoop-env.sh
Add the following
# The java implementation to use.  Required.
export JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-amd64"
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Note:
The second line disables IPv6 specifically for Hadoop rather than for the whole system
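If you would rather disable IPv6 for the whole system instead (a common alternative in other Hadoop guides; not required for this setup), the standard Ubuntu approach is via sysctl. Append the following to /etc/sysctl.conf and run sudo sysctl -p:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1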
Create the tmp folder for Hadoop
hadoop@sys8:~/hadoop$ sudo mkdir -p /app/hadoop/tmp
Add the permissions
hadoop@sys8:~/hadoop$ sudo chown hadoop:hadoop /app/hadoop/tmp
hadoop@sys8:~/hadoop$ sudo chmod 750 /app/hadoop/tmp
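You can confirm the result with ls (another small check of mine); the directory should show mode drwxr-x--- with owner and group hadoop:
hadoop@sys8:~/hadoop$ ls -ld /app/hadoop/tmp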
Edit conf/core-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/core-site.xml
The file ships with an empty <configuration> block; add the properties between those tags
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
 </property>
</configuration>
Edit conf/mapred-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/mapred-site.xml
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
 </property>
</configuration>
Edit conf/hdfs-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/hdfs-site.xml
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
</configuration>
Format the HDFS filesystem
Note: Run this command only once, i.e. now; reformatting the NameNode later would wipe everything stored in HDFS
hadoop@sys8:~/hadoop$ bin/hadoop namenode -format
15/09/12 19:01:48 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = sys8/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_79
************************************************************/
15/09/12 19:01:49 INFO util.GSet: Computing capacity for map BlocksMap
15/09/12 19:01:49 INFO util.GSet: VM type       = 64-bit
15/09/12 19:01:49 INFO util.GSet: 2.0% max memory = 932184064
15/09/12 19:01:49 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/09/12 19:01:49 INFO util.GSet: recommended=2097152, actual=2097152
15/09/12 19:01:49 INFO namenode.FSNamesystem: fsOwner=hadoop
15/09/12 19:01:49 INFO namenode.FSNamesystem: supergroup=supergroup
15/09/12 19:01:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/09/12 19:01:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/09/12 19:01:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/09/12 19:01:49 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/09/12 19:01:49 INFO namenode.NameNode: Caching file names occuring more than 10 times 
15/09/12 19:01:49 INFO common.Storage: Image file /app/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
15/09/12 19:01:49 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/app/hadoop/tmp/dfs/name/current/edits
15/09/12 19:01:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/app/hadoop/tmp/dfs/name/current/edits
15/09/12 19:01:49 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
15/09/12 19:01:49 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sys8/127.0.1.1
************************************************************/
Start the Single Node
NOTE:
bin/start-all.sh is deprecated, so we will use bin/start-dfs.sh and then bin/start-mapred.sh
hadoop@sys8:~/hadoop$ bin/start-dfs.sh 
starting namenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-sys9.out
localhost: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-sys9.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-sys9.out
hadoop@sys8:~/hadoop$ bin/start-mapred.sh 
starting jobtracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-sys9.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-sys9.out
Check if everything is running fine:
Run jps; the output should look something like this
hadoop@sys8:~/hadoop$ jps
11508 TaskTracker
10938 NameNode
11868 Jps
11250 SecondaryNameNode
11347 JobTracker
11085 DataNode
hadoop@sys8:~/hadoop$
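Before shutting down, two quick ways to convince yourself the cluster is healthy (my additions): open the NameNode web UI at http://localhost:50070/ and the JobTracker web UI at http://localhost:50030/ (the Hadoop 1.x default ports), and run a tiny MapReduce job using the examples jar visible in the ls output earlier, for instance the pi estimator with 2 maps and 10 samples per map:
hadoop@sys8:~/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
The job should finish quickly and print a (rough, at this sample size) estimate of Pi.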
Stopping Hadoop
hadoop@sys8:~/hadoop$ bin/stop-all.sh
That's all Folks!