Install Hadoop (Single Node)
10 Sep 2015 #hadoop #bigdata #ubuntu
On a starting note:
I am assuming that you have a fresh Ubuntu install on your system, as this will cut down a lot of the frustration of debugging why Hadoop is not running. I am using nano as the text editor for this article, but you can use any other text editor, such as vi, sublime, or atom.
Next article: Install Hadoop Multi Node 1.0.3 on Ubuntu 14.04.2
Installation
Create a dedicated user
Create a separate user named hadoop to keep Hadoop's configuration and installation files isolated from the rest of the system
sys8@sys8:~$ sudo adduser hadoop
You will then be prompted to set a new UNIX password and user details.
Add this user to the sudo group
sys8@sys8:~$ sudo adduser hadoop sudo
Switch to the newly created user
sys8@sys8:~$ su - hadoop
Install Java, which is available in the default Ubuntu repositories
hadoop@sys8:~$ sudo apt-get install default-jdk
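You can confirm which JDK was installed and resolve its real install path; this is the path that goes into JAVA_HOME later in the configuration. One way to find it (the exact directory may differ on your system):
hadoop@sys8:~$ java -version
hadoop@sys8:~$ readlink -f "$(which javac)"
Strip the trailing /bin/javac from the second command's output to get the JAVA_HOME directory.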
Enabling ssh for localhost is the next step
Install openssh-server
hadoop@sys8:~$ sudo apt-get install openssh-server
Generate the keys
hadoop@sys8:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
68:b0:84:c0:3f:16:41:38:d9:7e:d6:63:a3:a0:28:f5 hadoop@sys8
The key's randomart image is:
+--[ RSA 2048]----+
|o =o. |
| * + |
| = + . |
| .B = * |
|..o.* = S |
|o. Eo |
|. |
| |
| |
+-----------------+
hadoop@sys8:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Copy the keys over to enable passwordless ssh
hadoop@sys8:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 17:f4:fc:aa:88:4d:51:b1:08:ae:df:75:2f:07:37:26.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
Test the ssh connection to the localhost
hadoop@sys8:~$ ssh hadoop@localhost
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-28-generic x86_64)
* Documentation: https://help.ubuntu.com/
60 packages can be updated.
24 updates are security updates.
hadoop@sys8:~$ exit
logout
Connection to localhost closed.
hadoop@sys8:~$
Apache Hadoop Installation:
Download Hadoop from Apache's archive site,
but first change to the hadoop user's home directory
hadoop@sys8:~$ pwd
/home/hadoop
hadoop@sys8:~$ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
--2015-09-12 18:51:49-- https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Resolving archive.apache.org (archive.apache.org)... 140.211.11.131, 192.87.106.229, 2001:610:1:80bc:192:87:106:229
Connecting to archive.apache.org (archive.apache.org)|140.211.11.131|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63851630 (61M) [application/x-gzip]
Saving to: ‘hadoop-1.2.1.tar.gz’
100%[======================================================================================================>] 6,38,51,630 1.82MB/s in 42s
2015-09-12 18:52:32 (1.45 MB/s) - ‘hadoop-1.2.1.tar.gz’ saved [63851630/63851630]
Extract the archive and rename the resulting directory
hadoop@sys8:~$ sudo tar xzf hadoop-1.2.1.tar.gz
hadoop@sys8:~$ sudo mv hadoop-1.2.1 hadoop
Give the hadoop user ownership of the files
hadoop@sys8:~$ sudo chown -R hadoop:hadoop hadoop
Configuration
Update /etc/profile
hadoop@sys8:~$ sudo nano /etc/profile
Add the following lines at the end
NOTE:
This assumes that you installed the default-jdk from the repositories. Change the path accordingly for any other Java version.
# Hadoop install location and Java home
export HADOOP_HOME="/home/hadoop/hadoop"
export JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-amd64"
# Convenient aliases for the hadoop fs commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH="$PATH:$HADOOP_HOME/bin"
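Reload the profile so the changes take effect in the current shell (new login shells will pick them up automatically), and sanity-check that the hadoop binary is on the PATH:
hadoop@sys8:~$ source /etc/profile
hadoop@sys8:~$ hadoop version
The second command should report version 1.2.1. The fs and hls aliases will only be useful once the daemons are running.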
Edit the configuration files for Hadoop
Edit hadoop-env.sh
Change to the hadoop folder and add the JAVA home path
hadoop@sys8:~$ cd ~/hadoop
hadoop@sys8:~/hadoop$ ls
bin CHANGES.txt docs hadoop-core-1.2.1.jar hadoop-test-1.2.1.jar ivy.xml LICENSE.txt sbin webapps
build.xml conf hadoop-ant-1.2.1.jar hadoop-examples-1.2.1.jar hadoop-tools-1.2.1.jar lib NOTICE.txt share
c++ contrib hadoop-client-1.2.1.jar hadoop-minicluster-1.2.1.jar ivy libexec README.txt src
hadoop@sys8:~/hadoop$ sudo nano conf/hadoop-env.sh
Add the following
# The java implementation to use. Required.
export JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-amd64"
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Note:
The second line disables IPv6 for Hadoop specifically, rather than for the whole system.
Create the tmp folder for Hadoop
hadoop@sys8:~/hadoop$ sudo mkdir -p /app/hadoop/tmp
Add the permissions
hadoop@sys8:~/hadoop$ sudo chown hadoop:hadoop /app/hadoop/tmp
hadoop@sys8:~/hadoop$ sudo chmod 750 /app/hadoop/tmp
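A quick check that the directory has the expected owner and mode; the NameNode and DataNode will fail to start if they cannot write here:
hadoop@sys8:~/hadoop$ ls -ld /app/hadoop/tmp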
Edit conf/core-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/core-site.xml
The file initially contains an empty <configuration> block between the tags
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
Edit conf/mapred-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/mapred-site.xml
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
Edit conf/hdfs-site.xml
hadoop@sys8:~/hadoop$ sudo nano conf/hdfs-site.xml
It should look something like this after editing
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
Note: Run this command only once, i.e. now; formatting the NameNode again later would erase all data stored in HDFS.
hadoop@sys8:~/hadoop$ bin/hadoop namenode -format
15/09/12 19:01:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = sys8/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
15/09/12 19:01:49 INFO util.GSet: Computing capacity for map BlocksMap
15/09/12 19:01:49 INFO util.GSet: VM type = 64-bit
15/09/12 19:01:49 INFO util.GSet: 2.0% max memory = 932184064
15/09/12 19:01:49 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/09/12 19:01:49 INFO util.GSet: recommended = 2097152, actual = 2097152
15/09/12 19:01:49 INFO namenode.FSNamesystem: fsOwner = hadoop
15/09/12 19:01:49 INFO namenode.FSNamesystem: supergroup = supergroup
15/09/12 19:01:49 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/12 19:01:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit= 100
15/09/12 19:01:49 INFO namenode.FSNamesystem: isAccessTokenEnabled = false accessKeyUpdateInterval = 0 min(s), accessTokenLifetime = 0 min(s)
15/09/12 19:01:49 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/09/12 19:01:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/12 19:01:49 INFO common.Storage: Image file /app/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
15/09/12 19:01:49 INFO namenode.FSEditLog: closing edit log: position = 4, editlog = /app/hadoop/tmp/dfs/name/current/edits
15/09/12 19:01:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog = /app/hadoop/tmp/dfs/name/current/edits
15/09/12 19:01:49 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
15/09/12 19:01:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sys8/127.0.1.1
************************************************************/
Start the single node
NOTE:
bin/start-all.sh is deprecated, so we will use bin/start-dfs.sh and then bin/start-mapred.sh
hadoop@sys8:~/hadoop$ bin/start-dfs.sh
starting namenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-sys9.out
localhost: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-sys9.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-sys9.out
hadoop@sys8:~/hadoop$ bin/start-mapred.sh
starting jobtracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-sys9.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-sys9.out
Check if everything is running fine:
Run jps; the output should look something like this
hadoop@sys8:~/hadoop$ jps
11508 TaskTracker
10938 NameNode
11868 Jps
11250 SecondaryNameNode
11347 JobTracker
11085 DataNode
hadoop@sys8:~/hadoop$
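If any of the five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) is missing, check its log file under the logs directory shown in the startup messages. In Hadoop 1.x you can also browse the NameNode web UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030. As a final smoke test, you can run one of the bundled example jobs; the pi estimator needs no input data (the two arguments are the number of maps and the number of samples per map):
hadoop@sys8:~/hadoop$ bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
The job should finish by printing an estimated value of Pi, confirming that both HDFS and MapReduce are working.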
Stopping Hadoop
hadoop@sys8:~/hadoop$ bin/stop-all.sh
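Since start-all.sh/stop-all.sh are deprecated, you can equivalently stop the daemons in the reverse of the order they were started, MapReduce first and then HDFS:
hadoop@sys8:~/hadoop$ bin/stop-mapred.sh
hadoop@sys8:~/hadoop$ bin/stop-dfs.sh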
That’s all Folks!