Install Hadoop (Multi-Node)
13 Sep 2015 #hadoop #bigdata #ubuntu
Intro:
In this article, I will describe the required steps for setting up a distributed, multi-node Apache Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux.
I am using nano as the text editor for this article, but you can use any other text editor like vi, sublime, or atom for doing the same.
I suggest you set up Hadoop on a single node before going straight to multi-node, as it will be easier to debug any errors. I have explained how to set up single-node Hadoop on Ubuntu in a separate article.
Note: I will be demonstrating with 3 nodes, but you can add as many nodes as you like.
Installation
Install Hadoop on each of the three nodes using the single-node tutorial mentioned above.
Done? Let’s continue then!
The very first step is to stop Hadoop on all three machines. For that, you just need to run the stop script on each of the nodes.
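A minimal sketch of this step; bin/stop-all.sh is the standard stop script shipped with Hadoop 1.x, assuming Hadoop lives in ~/hadoop as in the shell prompts later in this article:

hadoop@sys8:~/hadoop$ bin/stop-all.sh
# run the same command on sys9 and sys10 as well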
Now check whether the Hadoop processes are actually stopped on all the nodes or not! Run jps on sys9, sys10, and sys8.
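On each node the output should list no Hadoop daemons, only jps itself; something like the following (the PID is illustrative and will differ on your machines):

hadoop@sys9:~$ jps
4523 Jps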
Now we have to choose which node will be the master, so that the other two nodes can be the slaves. In my case, I will choose sys8 to be my master node and the others to be slaves.
Master node configuration:
Add the IP addresses of all the nodes to /etc/hosts on sys8. It should look something like this after you have added it. Do the same on sys9 and sys10.
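A sketch of the relevant /etc/hosts entries, assuming the IP addresses that appear in the logs later in this article (192.168.103.24 for sys8, 192.168.103.26 for sys9, 192.168.103.28 for sys10):

192.168.103.24    sys8
192.168.103.26    sys9
192.168.103.28    sys10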
Changing the Hadoop configurations:
Shift to the hadoop directory first.
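Assuming the installation path visible in the shell prompts later in this article (/home/hadoop/hadoop):

hadoop@sys8:~$ cd ~/hadoop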
Editing conf/core-site.xml (all nodes)
After editing, it should look like:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.103.24:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
Do the same for nodes sys9 and sys10.
Note: here 192.168.103.24 is the IP address of the master node; we have simply replaced localhost from the single-node setup with this IP address.
Editing conf/mapred-site.xml (all nodes)
After editing, the file should be:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.103.24:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>
Do the same for nodes sys9 and sys10.
Note: the file conf/hdfs-site.xml remains the same as in the single-node setup on all the nodes.
Editing the conf/masters file (master node only)
It should look like:
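The file contents are not shown in the original, but based on the start-all.sh output later in this article (the secondary namenode is started via 192.168.103.24), the masters file holds the master's IP:

192.168.103.24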
Editing the conf/slaves file (master node only)
It should look like:
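Since the master sys8 also doubles as a worker here (the start-all.sh output below shows a datanode and a tasktracker starting on 192.168.103.24 too), the slaves file lists all three IPs:

192.168.103.24
192.168.103.26
192.168.103.28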
If you have additional slaves to add, just append them to the conf/slaves file, one per line. So we have basically added the IPs of all the nodes to the slaves file.
Generate the SSH keys for the master again
Add this key to all the nodes (including the master): for sys9, for sys10, and for sys8 itself!
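A sketch of this step with standard OpenSSH tooling (ssh-keygen and ssh-copy-id), assuming the hadoop user and the IP addresses used throughout this article:

hadoop@sys8:~$ ssh-keygen -t rsa -P ""
hadoop@sys8:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.103.26   # sys9
hadoop@sys8:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.103.28   # sys10
hadoop@sys8:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.103.24   # sys8 itself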
Clear the /app/hadoop/tmp/ directory contents (master node)
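One way to do this, assuming the hadoop.tmp.dir value set in core-site.xml above:

hadoop@sys8:~$ rm -rf /app/hadoop/tmp/*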
Format the namenode (master node)
hadoop@sys8:~/hadoop$ bin/hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
15/09/13 23:11:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = sys8/192.168.103.24
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
15/09/13 23:11:22 INFO util.GSet: Computing capacity for map BlocksMap
15/09/13 23:11:22 INFO util.GSet: VM type = 64-bit
15/09/13 23:11:22 INFO util.GSet: 2.0% max memory = 932184064
15/09/13 23:11:22 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/09/13 23:11:22 INFO util.GSet: recommended=2097152, actual=2097152
15/09/13 23:11:22 INFO namenode.FSNamesystem: fsOwner=hadoop
15/09/13 23:11:22 INFO namenode.FSNamesystem: supergroup=supergroup
15/09/13 23:11:22 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/09/13 23:11:22 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/09/13 23:11:22 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/09/13 23:11:22 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/09/13 23:11:22 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/13 23:11:22 INFO common.Storage: Image file /app/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
15/09/13 23:11:22 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/app/hadoop/tmp/dfs/name/current/edits
15/09/13 23:11:22 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/app/hadoop/tmp/dfs/name/current/edits
15/09/13 23:11:23 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
15/09/13 23:11:23 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sys8/192.168.103.24
************************************************************/
hadoop@sys8:~/hadoop$
Start all the Hadoop daemons (master node)
hadoop@sys8:~/hadoop$ bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-sys8.out
192.168.103.24: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-sys8.out
192.168.103.28: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-sys10.out
192.168.103.26: starting datanode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-sys9.out
192.168.103.24: starting secondarynamenode, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-sys8.out
starting jobtracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-sys8.out
192.168.103.26: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-sys9.out
192.168.103.28: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-sys10.out
192.168.103.24: starting tasktracker, logging to /home/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-sys8.out
hadoop@sys8:~/hadoop$
Check jps (on all the systems)
Run jps on sys8, sys9, and sys10 to verify that the daemons are up.
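Given the start-all.sh output above, the master sys8 should show all five daemons, while sys9 and sys10 show only the worker daemons (the PIDs here are illustrative):

hadoop@sys8:~$ jps
5432 NameNode
5621 DataNode
5810 SecondaryNameNode
5903 JobTracker
6098 TaskTracker
6200 Jps

hadoop@sys9:~$ jps
3311 DataNode
3478 TaskTracker
3560 Jps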
Check the web interface in your browser
You can also check the Hadoop web interface in your browser using the URL http://192.168.103.24:50030/ (the JobTracker UI). The NameNode UI, if you need it, runs on port 50070 by default.