########################################################
Configure hosts
On the NameNode machine, /etc/hosts needs the IPs of every machine in the cluster.
Edit /etc/hosts:
10.10.236.190   master  
10.10.236.191   slave-A  
10.10.236.193   slave-B
On the other DataNodes, /etc/hosts only needs the NameNode's IP and the local machine's own IP:
10.10.236.190   master  
10.10.236.191   slave-A
#######################################################


1. Topology

Directory layout:
/usr/local/hadoop2.2               -- Hadoop home directory
/data/hdfs                         -- HDFS data directory
Architecture: the NameNode's storage is configured as RAID 5
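The directories under /data/hdfs referenced later (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir in step 7) have to exist and belong to the hadoop user. A minimal sketch, run as root on every node once the hadoop user from step 2 exists:

mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp
chown -R hadoop:hadoop /data/hdfs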


2. Create a dedicated hadoop user
useradd -u 1001 hadoop
passwd hadoop
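A quick check that the account was created as intended (run as root; the exact output format varies by distribution):

id hadoop        # expect uid=1001(hadoop)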

3. Download and install the software
   1) JDK 1.6 or later (/opt/jdk1.6)
   2) hadoop-xx.tar.gz

mkdir -p /usr/local/hadoop2.2
tar -zxvf hadoop-2.2.0.tar.gz -C /usr/local/hadoop2.2

chown hadoop:hadoop -R /usr/local/hadoop2.2
chmod 755 -R /usr/local/hadoop2.2
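Note: extracting with -C as above leaves the files under /usr/local/hadoop2.2/hadoop-2.2.0/, while step 5 sets HADOOP_HOME=/usr/local/hadoop2.2. One way to put the contents directly under the chosen home, assuming GNU tar (a sketch; alternatively point HADOOP_HOME at the nested directory):

tar -zxf hadoop-2.2.0.tar.gz -C /usr/local/hadoop2.2 --strip-components=1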

4. Set up passwordless SSH trust between the hadoop users


4.1) Generate a key pair on each node

--node1:
mkdir -p ~/.ssh
chmod 755 ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
--node2:
mkdir -p ~/.ssh
chmod 755 ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

--node1: collect the public keys of both nodes into authorized_keys
cd ~/.ssh
ssh hadoop1 cat ~/.ssh/id_dsa.pub >> authorized_keys
ssh hadoop1 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh hadoop2 cat ~/.ssh/id_dsa.pub >> authorized_keys
ssh hadoop2 cat ~/.ssh/id_rsa.pub >> authorized_keys

4.2) Distribute authorized_keys to the other nodes
Copy node1's authorized_keys to the remote node2:
scp authorized_keys hadoop2:~/.ssh/
--node2:
chmod 600 ~/.ssh/authorized_keys

4.3) Test
--node1
[hadoop@hadoop1 .ssh]$ ssh hadoop1 date
Mon Apr 21 18:37:09 PDT 2014
[hadoop@hadoop1 .ssh]$ ssh hadoop2 date
Mon Apr 21 18:48:02 PDT 2014

--node2
ssh hadoop1 date
The authenticity of host 'hadoop1 (192.168.0.201)' can't be established.
RSA key fingerprint is a1:60:f5:71:da:5a:ca:75:f8:e5:8a:d5:eb:84:95:60.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop1,192.168.0.201' (RSA) to the list of known hosts. -- node2 has no known_hosts entry for hadoop1 yet, so the host key must be accepted on the first connection

[hadoop@hadoop2 .ssh]$ ssh hadoop1 date
Mon Apr 21 18:37:37 PDT 2014                                                                               -- no password prompt, the key was added successfully

[hadoop@hadoop2 .ssh]$ ssh hadoop2 date
The authenticity of host 'hadoop2 (192.168.0.202)' can't be established.
RSA key fingerprint is 7c:06:f6:12:09:ce:33:1b:8b:ad:88:94:f5:14:f5:15.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop2,192.168.0.202' (RSA) to the list of known hosts.  --- added to known_hosts
Mon Apr 21 18:48:32 PDT 2014
[hadoop@hadoop2 .ssh]$ ssh hadoop2 date                                 
Mon Apr 21 18:48:34 PDT 2014                                                                                -- no password prompt, the key was added successfully



5. Edit the hadoop user's default configuration in .bash_profile

export JAVA_HOME=/opt/jdk1.6
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/usr/local/hadoop2.2
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
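A quick way to reload and verify the environment; `hadoop version` is provided by $HADOOP_HOME/bin, so it also confirms the PATH entry works:

source ~/.bash_profile
echo $HADOOP_HOME
hadoop version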

6. Cluster node roles
hadoop1 is the NameNode
hadoop2 .. hadoopN are the DataNodes

7. Configure Hadoop
cd $HADOOP_HOME/etc/hadoop
7.1.1) Edit hadoop-env.sh and set the JDK path (setting JAVA_HOME only in the user's environment may not be seen by daemons started over ssh, so set it here as well):
export JAVA_HOME=/opt/jdk1.6                    // must be changed
-- If the ssh port is not the default 22, also change it in hadoop-env.sh, e.g.:
--export HADOOP_SSH_OPTS="-p 1234"              // not needed if the port is the default 22
7.1.2) Edit yarn-env.sh and set:
export JAVA_HOME=/opt/jdk1.6
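A non-interactive way to apply the same change to both files; appending at the end overrides any earlier JAVA_HOME line, so this is only a convenience sketch:

cd $HADOOP_HOME/etc/hadoop
for f in hadoop-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/opt/jdk1.6' >> "$f"
done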

7.2) Edit core-site.xml and add the following:
fs.default.name    hdfs://master:54321                    // this is what actually determines which node is the NameNode
hadoop.tmp.dir     /data/hdfs/tmp                         // a base for other temporary directories; these temp files can be deleted when there are problems
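The entries above are property name/value pairs; in the real core-site.xml they must be wrapped in <property> elements inside <configuration>. A minimal sketch written from the shell (the same wrapping applies to hdfs-site.xml and mapred-site.xml below):

cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54321</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hdfs/tmp</value>
  </property>
</configuration>
EOF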
7.3) Edit hdfs-site.xml and add the following:
dfs.name.dir   /data/hdfs/name                            // local path where the NameNode persists the namespace and transaction logs
dfs.data.dir   /data/hdfs/data                            // path where the DataNodes store their data blocks

-- optional:
--dfs.datanode.max.xcievers  4096
--dfs.replication  1                                      // number of block replicas, default is 3
--mapred.job.tracker  master:54311                        // JobTracker host (this property belongs in mapred-site.xml, see 7.4)


7.4) Edit mapred-site.xml and add the following:
mapred.job.tracker  hadoop1:54320

7.5) Edit masters; this file decides which node runs the SecondaryNameNode      // create the file manually
hadoop1
7.6) Edit slaves; this file lists all the DataNode machines                     // create the file manually
hadoop2
hadoop3 (append a line for each additional node)
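Both files are plain lists of hostnames, one per line; a short sketch creating them with the hostnames used in this guide:

cd $HADOOP_HOME/etc/hadoop
echo hadoop1 > masters          # SecondaryNameNode host
printf 'hadoop2\n' > slaves     # one DataNode hostname per line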

7.7) Configure YARN resource management (yarn-site.xml)
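The original note leaves this step empty. A minimal yarn-site.xml sketch, assuming hadoop1 runs the ResourceManager; yarn.resourcemanager.hostname and yarn.nodemanager.aux-services are the standard Hadoop 2.2 property names, but verify them against the yarn-default.xml shipped with your distribution:

cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

To make MapReduce jobs (such as the pi example in step 14) run on YARN rather than in local mode, mapred-site.xml usually also sets mapreduce.framework.name to yarn.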


7.8) Copy the finished Hadoop configuration files to all DataNodes
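A sketch using scp, assuming HADOOP_HOME is the same path on every node and the passwordless ssh from step 4 is in place (repeat for each DataNode):

scp -r $HADOOP_HOME/etc/hadoop/* hadoop2:$HADOOP_HOME/etc/hadoop/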
 



8. Format the HDFS filesystem on the NameNode
hadoop namenode -format
y                                              // answer the confirmation prompt if asked to re-format
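In Hadoop 2.x the hadoop namenode form still works but is reported as deprecated; the equivalent command is:

hdfs namenode -format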

9. Start the Hadoop cluster
--$HADOOP_HOME/sbin/start-all.sh       -- deprecated
$HADOOP_HOME/sbin/start-dfs.sh

The script above amounts to starting the following daemons on the appropriate nodes (it wraps them with hadoop-daemon.sh so they run in the background):
 hdfs namenode
 hdfs secondarynamenode
 hdfs datanode
10. Verification:
--name node
[hadoop@hadoop1 hadoop]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop2.2/logs/yarn-hadoop-resourcemanager-hadoop1.out
hadoop2: starting nodemanager, logging to /usr/local/hadoop2.2/logs/yarn-hadoop-nodemanager-hadoop2.out

[hadoop@hadoop1 hadoop]$ jps 
32318 Jps
32054 SecondaryNameNode
31885 NameNode
32254 ResourceManager

--datanode
[hadoop@hadoop2 sbin]$ jps 
29711 DataNode
29822 Jps


11. Web UI verification (default ports)
http://hadoop1:8088 
http://192.168.0.201:8088       // ResourceManager web UI, view MapReduce jobs
http://192.168.0.201:50070      // NameNode web UI, view HDFS and node information
//http://192.168.0.201:50030    // JobTracker web UI (Hadoop 1.x only, not used in 2.2)

12. HDFS operations

Create a directory
$HADOOP_HOME/bin/hdfs dfs -mkdir /testdir                    // note: this '/' is the HDFS root directory, not the local filesystem root
List existing files
$HADOOP_HOME/bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-04-21 23:19 /testdir
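A further sanity check: copy a local file into the new directory and read it back (the choice of /etc/hosts is arbitrary):

$HADOOP_HOME/bin/hdfs dfs -put /etc/hosts /testdir/
$HADOOP_HOME/bin/hdfs dfs -cat /testdir/hosts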


13. Stop HDFS
$HADOOP_HOME/sbin/stop-dfs.sh
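YARN was started separately with start-yarn.sh in step 10, so stop it separately as well:

$HADOOP_HOME/sbin/stop-yarn.sh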

 

14. Test a Hadoop MapReduce job
 hadoop jar  /usr/local/hadoop2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar    pi   2 5
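Another example from the same jar: a word count over the file uploaded in step 12 (the input path is an assumption from this guide; the output directory must not already exist):

hadoop jar /usr/local/hadoop2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /testdir/hosts /wc-out
hdfs dfs -cat /wc-out/part-r-00000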