Cluster Trilogy (Part 2): Building the Perfect Hadoop Cluster

    Following the successful Zookeeper cluster setup last time, it is Hadoop's turn. Enough small talk, let's start today's adventure.

    1. Environment: CentOS 7 virtual machines with a complete base environment; make sure the JDK is installed and you have the Hadoop package (I am using 2.9.0). The nodes are still the two clones from last time (together with the original, that gives slave01 through slave03); we start by setting up one of them.

    2. Hadoop configuration (extract the tarball yourself)

    A few words before configuring (important, so take notes): first, get familiar with the NameNode, DataNode and ResourceManager roles. Here slave01 will act as the NameNode, and slave02 and slave03 as DataNodes; everything below assumes this layout, so review it before moving on or the later steps will not make much sense. Showtime:

    After extraction the directory layout looks roughly like this:

[hadoop@slave01 hadoop]$ pwd
/usr/local/hadoop
[hadoop@slave01 hadoop]$ ls
bin  include  libexec      logs        README.txt  share
etc  lib      LICENSE.txt  NOTICE.txt  sbin        tmp
[hadoop@slave01 hadoop]$ cd etc/hadoop/
[hadoop@slave01 hadoop]$ ls
capacity-scheduler.xml      httpfs-env.sh            mapred-env.sh
configuration.xsl           httpfs-log4j.properties  mapred-queues.xml.template
container-executor.cfg      httpfs-signature.secret  mapred-site.xml.template
core-site.xml               httpfs-site.xml          slaves
hadoop-env.cmd              kms-acls.xml             ssl-client.xml.example
hadoop-env.sh               kms-env.sh               ssl-server.xml.example
hadoop-metrics2.properties  kms-log4j.properties     yarn-env.cmd
hadoop-metrics.properties   kms-site.xml             yarn-env.sh
hadoop-policy.xml           log4j.properties         yarn-site.xml
hdfs-site.xml               mapred-env.cmd

    Turn off the firewall; get acquainted with these commands:

systemctl stop firewalld      # takes effect only for the current session; repeat after every reboot
or
systemctl disable firewalld   # takes effect from the next boot and keeps the firewall off permanently
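
    If you want to double-check that the firewall really is off, a quick look at its status is enough (both commands are standard on CentOS 7):

systemctl status firewalld    # should show "inactive (dead)" after the stop
firewall-cmd --state          # prints "not running" once firewalld is stopped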

    (1) Add Hadoop to the environment variables

vim /etc/profile

    Add HADOOP_HOME so the file ends up like this:

JAVA_HOME=/usr/java/jdk1.8.0_161
JRE_HOME=/usr/java/jdk1.8.0_161/jre
SCALA_HOME=/usr/local/scala
HADOOP_HOME=/usr/local/hadoop
ZOOKEEPER_HOME=/usr/local/zookeeper
KAFKA_HOME=/usr/local/kafka
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME SCALA_HOME HADOOP_HOME ZOOKEEPER_HOME KAFKA_HOME PATH CLASSPATH
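
    After saving, reload the profile and make sure the hadoop command is found; a quick sanity check (the version string will match whatever you installed) looks like this:

source /etc/profile
echo $HADOOP_HOME    # should print /usr/local/hadoop
hadoop version       # should report Hadoop 2.9.0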

    Then check that etc/hadoop/hadoop-env.sh under the Hadoop installation directory is set up correctly:

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_161

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

    Change JAVA_HOME to your own JDK installation path.
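
    If you are not sure where the JDK actually lives, one way to find it (assuming java is already on the PATH) is to resolve the symlink and strip the trailing bin/java part:

readlink -f "$(which java)"    # e.g. /usr/java/jdk1.8.0_161/jre/bin/java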

    (2) Modify the core-site.xml file

    Find the file in the directory tree shown above; before our changes it reads:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

    Modify it as follows:

<configuration>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <!-- RPC address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://slave01:9000</value>
    </property>
</configuration>
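
    Note that fs.defaultFS now uses the hostname slave01, so every node must be able to resolve slave01, slave02 and slave03. If that is not already the case, add the mappings to /etc/hosts on each machine; the IP addresses below are only placeholders (the logs later show slave03 at 192.168.89.131), so substitute your own:

sudo tee -a /etc/hosts <<'EOF'
192.168.89.129 slave01
192.168.89.130 slave02
192.168.89.131 slave03
EOF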

    (3) Modify the hdfs-site.xml file

    Find the file in the directory tree shown above; before our changes it reads:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

    Modify it as follows:

<configuration>
    <!-- NameNode HTTP (web UI) address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>slave01:50070</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- NameNode metadata storage path -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <!-- DataNode block storage path -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

    Other configuration parameters can be added here; they are not covered in this post, so look them up as needed. Do make sure the storage directories configured above exist, and create them if they do not.
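
    For example, on slave01 the two paths used above can be created and checked like this (the DataNodes get them automatically once the hadoop directory is copied over in step (7)):

mkdir -p /usr/local/hadoop/tmp/dfs/name    # NameNode metadata directory
mkdir -p /usr/local/hadoop/tmp/dfs/data    # DataNode block directory
ls -ld /usr/local/hadoop/tmp/dfs/*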

    (4) Modify the mapred-site.xml file

    First copy the provided template and rename it:

cp mapred-site.xml.template mapred-site.xml

    Its initial content is empty; after the change it looks like this:

<configuration>
      <!-- run MapReduce on the YARN framework -->
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
</configuration>

    (5) Modify the yarn-site.xml file

    Its initial content is empty; after the change it looks like this:

<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- which node runs the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave01</value>
    </property>
    <!-- reducers fetch intermediate data via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

    (6) Modify the slaves file (just add the hostnames of the DataNode servers)

slave02
slave03

    (7) Copy the configuration to the other servers

    If you would rather not configure each machine by hand, copy the finished files to the other servers. After the copy succeeds, remember to run source /etc/profile on each of them so the environment takes effect:

# copy /etc/profile from slave01 to slave02
scp /etc/profile slave02:/etc/profile
# copy the entire /usr/local/hadoop directory from slave01 to slave02
scp -r /usr/local/hadoop slave02:/usr/local/
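
    If you prefer, a small loop covers both DataNodes in one go (run it from slave01; the hostnames are the ones configured above):

for node in slave02 slave03; do
    scp /etc/profile "$node":/etc/profile
    scp -r /usr/local/hadoop "$node":/usr/local/
done
# afterwards, log in to each node and run: source /etc/profile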

    (8) Format HDFS

    On the NameNode server, i.e. slave01, run:

hdfs namenode -format

    A successful format ends with output like this:

18/03/16 15:40:23 INFO namenode.FSDirectory: XAttrs enabled? true
18/03/16 15:40:23 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/03/16 15:40:23 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
18/03/16 15:40:23 INFO util.GSet: Computing capacity for map cachedBlocks
18/03/16 15:40:23 INFO util.GSet: VM type       = 64-bit
18/03/16 15:40:23 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
18/03/16 15:40:23 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/03/16 15:40:23 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/03/16 15:40:23 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/03/16 15:40:23 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/03/16 15:40:23 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/03/16 15:40:23 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/03/16 15:40:23 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/03/16 15:40:23 INFO util.GSet: VM type       = 64-bit
18/03/16 15:40:23 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
18/03/16 15:40:23 INFO util.GSet: capacity      = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /usr/local/hadoop/tmp/dfs/name ? (Y or N) y
18/03/16 15:40:26 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1094714660-127.0.0.1-1521186026480
18/03/16 15:40:26 INFO common.Storage: Will remove files: [/usr/local/hadoop/tmp/dfs/name/current/VERSION, /usr/local/hadoop/tmp/dfs/name/current/seen_txid, /usr/local/hadoop/tmp/dfs/name/current/fsimage_0000000000000000000.md5, /usr/local/hadoop/tmp/dfs/name/current/fsimage_0000000000000000000, /usr/local/hadoop/tmp/dfs/name/current/edits_0000000000000000001-0000000000000000004, /usr/local/hadoop/tmp/dfs/name/current/fsimage_0000000000000000004.md5, /usr/local/hadoop/tmp/dfs/name/current/fsimage_0000000000000000004, /usr/local/hadoop/tmp/dfs/name/current/edits_0000000000000000005-0000000000000000005, /usr/local/hadoop/tmp/dfs/name/current/edits_inprogress_0000000000000000006]
18/03/16 15:40:26 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
18/03/16 15:40:26 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/03/16 15:40:26 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
18/03/16 15:40:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/03/16 15:40:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave01/127.0.0.1
************************************************************/
[hadoop@slave01 hadoop]$ 
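
    A quick way to confirm the format really wrote fresh metadata is to look at the newly created VERSION file, which records the cluster and block pool IDs:

cat /usr/local/hadoop/tmp/dfs/name/current/VERSION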

    Where the format asks for input, just follow the prompt. Now, a note on an error that many people probably hit at this point and lose a fair bit of time over:

18/03/16 15:54:14 WARN namenode.NameNode: Encountered exception during format: 
java.io.IOException: Cannot remove current directory: /usr/local/hadoop/tmp/dfs/name/current
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:358)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:571)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:592)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:166)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1172)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1614)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
18/03/16 15:54:14 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Cannot remove current directory: /usr/local/hadoop/tmp/dfs/name/current
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:358)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:571)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:592)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:166)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1172)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1614)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
18/03/16 15:54:14 INFO util.ExitUtil: Exiting with status 1: java.io.IOException: Cannot remove current directory: /usr/local/hadoop/tmp/dfs/name/current
18/03/16 15:54:14 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave02/127.0.0.1
************************************************************/
[hadoop@slave02 hadoop]$ 

    The cause is simply that write permission was never granted on the /usr/local/hadoop/tmp directory. So much grief over so little; I have no words. The fix, of course, is to give the directory the right permissions with chmod; run this on all three servers:

sudo chmod -R a+w /usr/local/hadoop
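
    An alternative (and arguably tidier) fix, assuming the daemons run as the hadoop user shown in the shell prompts, is to make that user own the whole tree instead:

sudo chown -R hadoop:hadoop /usr/local/hadoop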

    With that, the basic configuration is done. Of course, plenty of options have been left untouched; this is only a simple starter cluster.

    3. Starting the Hadoop cluster

    (1) Run the following commands on slave01

start-dfs.sh
start-yarn.sh
jps

    A successful start produces output like this:

[hadoop@slave01 sbin]$ ./start-dfs.sh 
Starting namenodes on [slave01]
slave01: namenode running as process 26894. Stop it first.
The authenticity of host 'slave03 (192.168.89.131)' can't be established.
ECDSA key fingerprint is SHA256:AJ/rhsl+I6zFOYitxSG1CuDMEos0Oue/u8co7cF5L0M.
ECDSA key fingerprint is MD5:75:eb:3c:52:df:9b:35:cb:b3:05:c4:1a:20:13:73:01.
Are you sure you want to continue connecting (yes/no)? slave02: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave02.out
yes
slave03: Warning: Permanently added 'slave03,192.168.89.131' (ECDSA) to the list of known hosts.
slave03: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-slave03.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:AJ/rhsl+I6zFOYitxSG1CuDMEos0Oue/u8co7cF5L0M.
ECDSA key fingerprint is MD5:75:eb:3c:52:df:9b:35:cb:b3:05:c4:1a:20:13:73:01.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-slave01.out
[hadoop@slave01 sbin]$ jps
27734 Jps
27596 SecondaryNameNode
26894 NameNode
[hadoop@slave01 sbin]$ ./start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-slave01.out
slave03: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave03.out
slave02: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave02.out
[hadoop@slave01 sbin]$ jps
28080 Jps
27814 ResourceManager
27596 SecondaryNameNode
26894 NameNode
[hadoop@slave01 sbin]$ 

    Where input is required, just follow the prompts. From the jps output you can see that the NameNode, ResourceManager and the other daemons are all running. Very gratifying.

    (2) Run jps on slave02 and slave03

#slave02
[hadoop@slave02 hadoop]$ jps
12296 DataNode
13226 Jps
12446 NodeManager
[hadoop@slave02 hadoop]$ 

#slave03
[hadoop@slave03 hadoop]$ jps
12122 NodeManager
11978 DataNode
12796 Jps
[hadoop@slave03 hadoop]$

    You can see that the DataNode and NodeManager started normally on both machines. If they fail to start, shut down the processes on slave01 and look for the cause. I ran into this problem myself; deleting the data folder under /usr/local/hadoop/tmp/dfs on all three servers and starting again fixed it, but treat that as situational.
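
    Besides jps, the cluster can be checked from slave01 with the standard reporting commands; both DataNodes and both NodeManagers should show up:

hdfs dfsadmin -report    # should list slave02 and slave03 as live datanodes
yarn node -list          # should list the two NodeManagers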

    (3) Check the Hadoop web UIs

    Open http://slave01:8088 in a browser to reach the YARN ResourceManager UI (8088 is the default port; change it if it is already taken).

    Then open http://slave01:50070 to see the HDFS NameNode page.
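
    If you would rather check from the command line (curl assumed to be installed on the VM), a HEAD request against each port confirms the web UIs are up:

curl -sI http://slave01:50070 | head -n 1    # NameNode web UI
curl -sI http://slave01:8088  | head -n 1    # ResourceManager web UI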

    And that's it, the Hadoop cluster setup is complete. O(∩_∩)O

Source:

Author: 海岸线的曙光
Link: https://my.oschina.net/u/3747963/blog/1636026