Sunday, September 27, 2015

Hadoop Installation


This post covers installation of Hadoop 1.2.1 and Hadoop 2.2.0. After downloading the tarball and extracting it (tar xzf hadoop.tar.gz), apply the following changes.
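For reference, the unpack-and-environment step might look like the sketch below (the install path and version directory are illustrative, not from the post; adjust to wherever you extracted the tarball):

```shell
# Unpack the release and point the usual environment variables at it.
# /usr/local and the version directory are examples, not from the post.
cd /usr/local
tar xzf hadoop.tar.gz                  # unpacks to e.g. hadoop-1.2.1/
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export HADOOP_PREFIX=$HADOOP_HOME      # the Hadoop 2 scripts below use HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin
```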

HADOOP 1.2.1

On Master

conf/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:59310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:59311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>

conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

hadoop-env.sh

Add this:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0

conf/masters

add:

master

conf/slaves

master
slave
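The start scripts on the master ssh into every host listed in conf/slaves, so passwordless SSH from the master to each slave (and to itself) must be in place first. A typical one-time setup, assuming the hduser account that appears later in the post:

```shell
# Generate a key once on the master and push it to every cluster node.
# Host and user names here are assumptions; match them to your own cluster.
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id hduser@master
ssh-copy-id hduser@slave
ssh hduser@slave true   # should log in without a password prompt
```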

On SLAVE

conf/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:59310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

conf/hadoop-env.sh

Add:

export JAVA_HOME=/usr/java/jdk1.6.0_23

conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:59311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>

conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

On master, jps output (with HBase running):

30338 NameNode
30613 SecondaryNameNode
14416 HRegionServer
30856 TaskTracker
14119 HQuorumPeer
30467 DataNode
11112 Jps
30715 JobTracker
10542 RunJar
14248 HMaster

On slave, jps output:

[hduser@tomcattest conf]$ /usr/java/jdk1.6.0_23/bin/jps
28709 Jps
15090 DataNode
15198 TaskTracker
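To check a node without eyeballing the jps listing, a small helper like the sketch below (a convenience I'm adding, not from the original post) can report which expected daemons are missing:

```shell
# check_daemons OUTPUT DAEMON...
# Reports each daemon as running/MISSING based on a captured jps listing.
check_daemons() {
  local out="$1"
  shift
  local d
  for d in "$@"; do
    if printf '%s\n' "$out" | grep -qw "$d"; then
      echo "$d: running"
    else
      echo "$d: MISSING"
    fi
  done
}

# Usage on the master (daemon names taken from the listing above):
# check_daemons "$(jps)" NameNode SecondaryNameNode DataNode JobTracker TaskTracker
```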

Some helpful scripts (these use the Hadoop 2 directory layout):

myprestart.sh

rm -rf $HADOOP_HOME/hdfs
rm $HADOOP_HOME/logs/*
$HADOOP_PREFIX/bin/hdfs namenode -format

mystartall.sh

$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager

mystartdfs.sh (note: YARN was introduced in Hadoop 2)

#rm -rf $HADOOP_PREFIX/hdfs
#$HADOOP_PREFIX/bin/hdfs namenode -format
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/bin/hdfs dfs -mkdir -p /user/hduser
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager

mystopall.sh

$HADOOP_PREFIX/sbin/hadoop-daemon.sh stop namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh stop secondarynamenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh stop datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh stop nodemanager
$HADOOP_PREFIX/sbin/stop-dfs.sh

mystopdfs.sh

$HADOOP_PREFIX/sbin/stop-dfs.sh
$HADOOP_PREFIX/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh stop nodemanager

The following scripts are used on the SLAVE:

myprestart.sh

rm -rf hdfs
rm logs/*
$HADOOP_PREFIX/bin/hdfs namenode -format

mystartdfs.sh

#rm -rf hdfs
#$HADOOP_PREFIX/bin/hdfs namenode -format
#./myprestart.sh
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager

mystopdfs.sh

$HADOOP_PREFIX/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh stop nodemanager

Hadoop 2.2.0

MASTER: $HADOOP_HOME/etc/hadoop/slaves

master
slave

MASTER: $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
    <description>NameNode URI</description>
  </property>
</configuration>

MASTER: $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop-2.2.0/hdfs/datanode</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop-2.2.0/hdfs/namenode</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
  </property>
</configuration>

MASTER: $HADOOP_HOME/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx768m</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework.</description>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each map task.</description>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each reduce task.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for maps.</description>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of maps.</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of reduces.</description>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>master:8021</value>
  </property>
</configuration>

SLAVE: $HADOOP_HOME/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx768m</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework.</description>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each map task.</description>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
    <description>The number of virtual cores required for each reduce task.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for maps.</description>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of maps.</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx768m</value>
    <description>Heap-size for child jvms of reduces.</description>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>master:8021</value>
  </property>
</configuration>

SLAVE: $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop-2.2.0/hdfs/datanode</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop-2.2.0/hdfs/namenode</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
  </property>
</configuration>
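With both nodes configured and the daemons started, a quick end-to-end check is to run one of the MapReduce examples bundled with the release (the jar path below matches a stock 2.2.0 layout; adjust it if yours differs):

```shell
# Smoke-test the cluster with the bundled pi estimator (2 maps, 5 samples each),
# then confirm HDFS answers.
$HADOOP_PREFIX/bin/hadoop jar \
  $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
$HADOOP_PREFIX/bin/hdfs dfs -ls /
```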