Install Hadoop on Windows: Part 3

Before reading this article, I highly recommend reading my previous article.

After Ubuntu is installed successfully, log in to Ubuntu with your credentials.

After logging in, we need to refresh Ubuntu's package lists to pick up any updates. Write the following command.

sudo apt-get update

If you see "Unable to locate package update", you have typed sudo apt-get install update instead; update is not a package that can be installed, it is a subcommand of apt-get itself, so use the form above.
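After the lists are refreshed, you can optionally apply any pending updates as well:

sudo apt-get upgrade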

Install the JDK using the following command.

sudo apt-get install default-jdk
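To confirm the JDK was installed, you can check the Java version; the exact version string depends on what your mirror ships:

java -version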

Let's create a dedicated hadoop group and a Hadoop user called hduser.

sudo addgroup hadoop

It returns an error that the hadoop group already exists; we had created that group when we installed Ubuntu on the VM. Now let's add the user.

sudo adduser --ingroup hadoop hduser

After entering a password, leave the other fields at their defaults and answer Y.

Now let's add hduser as an administrator (a sudoer).

sudo adduser hduser sudo

Now let's install the OpenSSH server. Wikipedia says: “OpenSSH, also known as OpenBSD Secure Shell, is a suite of security-related network-level utilities based on the SSH protocol, which help to secure network communications via the encryption of network traffic over multiple authentication methods and by providing secure tunneling capabilities.”
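On Ubuntu the server is installed with apt-get, just like the packages above:

sudo apt-get install openssh-server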

Now let's log in as hduser, generate a key for hduser, and add the key to the authorized keys.

su hduser
ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
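If ssh later keeps asking for a password instead of using the key, the usual cause is loose file permissions; SSH expects the key files to be readable only by their owner:

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys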

Now let's try to log in to localhost.

ssh localhost

Don't worry about the messages you see. Now type logout to close the localhost connection.

Now let's install Hadoop. Download Hadoop from “http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.1.tar.gz”.
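From inside the VM you can fetch the archive on the command line with wget:

wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.1.tar.gz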

Once the download completes you will see a message like “hadoop-2.7.1.tar.gz saved”. For me it was “hadoop-2.7.1.tar.gz.2 saved”: my connection dropped twice and the download only finished on the third attempt, so wget saved the file under a numbered name instead of overwriting the earlier attempts.

tar xvzf hadoop-2.7.1.tar.gz

Don't be confused if you see tar xvzf hadoop-2.7.1.tar.gz.2 in my screenshots. My downloaded file was named
“hadoop-2.7.1.tar.gz.2”, which is why I wrote hadoop-2.7.1.tar.gz.2.

Now let's move hadoop-2.7.1 to the directory /usr/local/hadoop.

sudo mv hadoop-2.7.1 /usr/local/hadoop
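You can confirm the move worked by listing the directory; you should see folders such as bin, etc, and sbin:

ls /usr/local/hadoop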

Let's give hduser ownership of the directory. After that, edit the .bashrc file and append the Hadoop paths to the end of the file.

sudo chown -R hduser /usr/local/hadoop
sudo nano ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

When you are done, press Ctrl+X, then answer Y to save the file. Then reload .bashrc so the new variables take effect:

source ~/.bashrc
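You can verify that the new variables took effect; hadoop version prints the release number when the PATH entries are correct:

echo $HADOOP_HOME
hadoop version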

Now let's give Hadoop the Java path it needs to run.

sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

After scrolling a little we find the line export JAVA_HOME=${JAVA_HOME}.

Replace ${JAVA_HOME} with /usr/lib/jvm/java-7-openjdk-amd64 (your Java location) and save.
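The edited line should end up looking like this (adjust the path if your JDK lives somewhere else):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64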
Now let's configure the following XML files. For each file, write the code shown inside the <configuration> tag and save.

core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>

yarn-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Let's copy the mapred-site.xml template and then edit the copy.

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

After copying the file, let's make the following change.

mapred-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Now create the folders where Hadoop will keep the HDFS data; these are the paths we pointed to in hdfs-site.xml.

sudo mkdir -p /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode

Now assign hduser ownership of the folder, format the NameNode, and start the HDFS and YARN daemons. Run all of the following commands.

sudo chown -R hduser /usr/local/hadoop_tmp
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps
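If everything started cleanly, jps should list daemons along these lines; the process IDs on the left are only illustrative and will differ on your machine:

2849 NameNode
2985 DataNode
3176 SecondaryNameNode
3329 ResourceManager
3458 NodeManager
3782 Jps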

Your single-node Hadoop cluster is now installed, and you can start writing programs for it.
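As a quick smoke test, you can create a home directory for hduser in HDFS and list it back:

hdfs dfs -mkdir -p /user/hduser
hdfs dfs -ls /user

You can also open the NameNode web UI at http://localhost:50070 in the VM's browser to check the cluster status.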

Hope this article is helpful.
Happy coding!
