[Research] Installing the cloud software Hadoop 0.20.2 (CentOS 5.5 x86)
Hadoop is a system for building cloud infrastructure. Modeled on the Google File System, it is developed in Java and provides HDFS and the MapReduce APIs.
Official site
http://hadoop.apache.org/common/releases.html
http://apache.ntu.edu.tw/hadoop/core/
Quick Start
http://hadoop.apache.org/common/docs/current/quickstart.html#Supported+Platforms
I. Preparation
1. Install the basic packages
Code:
[root@localhost ~]# yum -y install openssh rsync
[root@localhost ~]# chmod +x jre-6u21-linux-i586-rpm.bin
[root@localhost ~]# ./jre-6u21-linux-i586-rpm.bin
[root@localhost ~]# find / -name java
/etc/java
/etc/alternatives/java
/usr/java
/usr/java/jre1.6.0_21/bin/java
/usr/bin/java
/usr/share/java
/usr/lib/java
/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre/bin/java
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/bin/java
/usr/lib/openoffice.org/ure/share/java
/usr/lib/openoffice.org/basis3.1/share/Scripts/java
/var/lib/alternatives/java
[root@localhost ~]#
2. Create a hadoop account, set its password, and switch to the hadoop user
Code:
[root@centos1 ~]# useradd hadoop
[root@centos1 ~]# passwd hadoop
[root@centos1 ~]# su hadoop
[hadoop@localhost root]$ cd
[hadoop@localhost ~]$
3. Set up passwordless ssh login
Code:
[hadoop@localhost ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
2c:8a:b6:f5:99:e9:2d:a2:43:23:8c:44:a8:14:be:ee
[email protected]
[hadoop@localhost ~]$
[hadoop@localhost ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[hadoop@localhost ~]$ chmod 600 .ssh/authorized_keys
Test it. The first connection may still prompt for host-key confirmation:
Code:
[hadoop@localhost ~]$ ssh hadoop@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 2b:90:90:a7:f4:c0:31:df:06:ec:51:99:7f:f0:39:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[hadoop@localhost ~]$ exit
Connection to localhost closed.
The second attempt should connect without prompting:
Code:
[hadoop@localhost ~]$ ssh hadoop@localhost
Last login: Fri Jul 9 01:50:05 2010 from localhost.localdomain
[hadoop@localhost ~]$
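If passwordless login still fails at this point, the usual culprit is file permissions: sshd ignores authorized_keys unless ~/.ssh is mode 700 and the key file is 600. A minimal sketch of that requirement, run against a throwaway scratch directory standing in for ~/.ssh so it is safe to try anywhere:

```shell
# Demo on a scratch directory standing in for ~/.ssh: sshd (with the
# default StrictModes) refuses authorized_keys that are group/world
# accessible, so the directory must be 700 and the file 600.
DEMO=$(mktemp -d)
mkdir -p "$DEMO/.ssh"
touch "$DEMO/.ssh/authorized_keys"
chmod 700 "$DEMO/.ssh"
chmod 600 "$DEMO/.ssh/authorized_keys"
stat -c '%a %n' "$DEMO/.ssh" "$DEMO/.ssh/authorized_keys"
rm -rf "$DEMO"
```

Apply the same two chmod commands to the real ~/.ssh if the second ssh test above still asks for a password.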
4. Download and unpack hadoop
Code:
[hadoop@localhost ~]$ wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
[hadoop@localhost ~]$ tar xzvf hadoop-0.20.2.tar.gz
[hadoop@localhost ~]$ cd /home/hadoop/hadoop-0.20.2
[hadoop@localhost hadoop-0.20.2]$ vim /home/hadoop/hadoop-0.20.2/conf/hadoop-env.sh
Add the following line:
Code:
export JAVA_HOME=/usr
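JAVA_HOME=/usr works here because /usr/bin/java resolves to the installed JRE. If you would rather point JAVA_HOME at the Sun JRE directly, one way (a sketch; the path comes from the `find` output earlier, so substitute your own install path) is to strip the trailing bin/java off the binary's path:

```shell
# Sketch: derive JAVA_HOME from the full path of the java binary.
# The path below is the one found by `find` earlier; adjust as needed.
JAVA_BIN=/usr/java/jre1.6.0_21/bin/java
JAVA_HOME=${JAVA_BIN%/bin/java}   # strip the trailing /bin/java
echo "JAVA_HOME=$JAVA_HOME"
```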
II. Testing
1. Test whether the hadoop command runs
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME <src>* <dest> create a hadoop archive
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
2. Test Local (Standalone) Mode
Code:
[hadoop@localhost hadoop-0.20.2]$ mkdir input
[hadoop@localhost hadoop-0.20.2]$ cp conf/*.xml input
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
10/07/09 01:56:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/07/09 01:56:24 INFO mapred.FileInputFormat: Total input paths to process : 5
10/07/09 01:56:25 INFO mapred.JobClient: Running job: job_local_0001
10/07/09 01:56:25 INFO mapred.FileInputFormat: Total input paths to process : 5
10/07/09 01:56:25 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:25 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:25 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:25 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:26 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:26 INFO mapred.MapTask: Finished spill 0
10/07/09 01:56:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/07/09 01:56:26 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/input/hadoop-policy.xml:0+4190
10/07/09 01:56:26 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/07/09 01:56:26 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:26 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:26 INFO mapred.JobClient: map 100% reduce 0%
10/07/09 01:56:26 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:26 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:26 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
10/07/09 01:56:26 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/input/capacity-scheduler.xml:0+3936
10/07/09 01:56:26 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
10/07/09 01:56:26 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:26 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:26 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:26 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:26 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
10/07/09 01:56:26 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/input/hdfs-site.xml:0+178
10/07/09 01:56:26 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
10/07/09 01:56:26 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:26 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:27 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:27 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:27 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:27 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
10/07/09 01:56:27 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/input/mapred-site.xml:0+178
10/07/09 01:56:27 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
10/07/09 01:56:27 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:27 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:27 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:27 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:27 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:27 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
10/07/09 01:56:27 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/input/core-site.xml:0+178
10/07/09 01:56:27 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
10/07/09 01:56:27 INFO mapred.LocalJobRunner:
10/07/09 01:56:27 INFO mapred.Merger: Merging 5 sorted segments
10/07/09 01:56:27 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
10/07/09 01:56:27 INFO mapred.LocalJobRunner:
10/07/09 01:56:27 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
10/07/09 01:56:27 INFO mapred.LocalJobRunner:
10/07/09 01:56:27 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/07/09 01:56:27 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/hadoop/hadoop-0.20.2/grep-temp-635509426
10/07/09 01:56:27 INFO mapred.LocalJobRunner: reduce > reduce
10/07/09 01:56:27 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/07/09 01:56:28 INFO mapred.JobClient: map 100% reduce 100%
10/07/09 01:56:28 INFO mapred.JobClient: Job complete: job_local_0001
10/07/09 01:56:28 INFO mapred.JobClient: Counters: 13
10/07/09 01:56:28 INFO mapred.JobClient: FileSystemCounters
10/07/09 01:56:28 INFO mapred.JobClient: FILE_BYTES_READ=981793
10/07/09 01:56:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1030090
10/07/09 01:56:28 INFO mapred.JobClient: Map-Reduce Framework
10/07/09 01:56:28 INFO mapred.JobClient: Reduce input groups=1
10/07/09 01:56:28 INFO mapred.JobClient: Combine output records=1
10/07/09 01:56:28 INFO mapred.JobClient: Map input records=219
10/07/09 01:56:28 INFO mapred.JobClient: Reduce shuffle bytes=0
10/07/09 01:56:28 INFO mapred.JobClient: Reduce output records=1
10/07/09 01:56:28 INFO mapred.JobClient: Spilled Records=2
10/07/09 01:56:28 INFO mapred.JobClient: Map output bytes=17
10/07/09 01:56:28 INFO mapred.JobClient: Map input bytes=8660
10/07/09 01:56:28 INFO mapred.JobClient: Combine input records=1
10/07/09 01:56:28 INFO mapred.JobClient: Map output records=1
10/07/09 01:56:28 INFO mapred.JobClient: Reduce input records=1
10/07/09 01:56:28 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
10/07/09 01:56:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/07/09 01:56:28 INFO mapred.FileInputFormat: Total input paths to process : 1
10/07/09 01:56:28 INFO mapred.JobClient: Running job: job_local_0002
10/07/09 01:56:28 INFO mapred.FileInputFormat: Total input paths to process : 1
10/07/09 01:56:28 INFO mapred.MapTask: numReduceTasks: 1
10/07/09 01:56:28 INFO mapred.MapTask: io.sort.mb = 100
10/07/09 01:56:29 INFO mapred.MapTask: data buffer = 79691776/99614720
10/07/09 01:56:29 INFO mapred.MapTask: record buffer = 262144/327680
10/07/09 01:56:29 INFO mapred.MapTask: Starting flush of map output
10/07/09 01:56:29 INFO mapred.MapTask: Finished spill 0
10/07/09 01:56:29 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
10/07/09 01:56:29 INFO mapred.LocalJobRunner: file:/home/hadoop/hadoop-0.20.2/grep-temp-635509426/part-00000:0+111
10/07/09 01:56:29 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
10/07/09 01:56:29 INFO mapred.LocalJobRunner:
10/07/09 01:56:29 INFO mapred.Merger: Merging 1 sorted segments
10/07/09 01:56:29 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
10/07/09 01:56:29 INFO mapred.LocalJobRunner:
10/07/09 01:56:29 INFO mapred.TaskRunner: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
10/07/09 01:56:29 INFO mapred.LocalJobRunner:
10/07/09 01:56:29 INFO mapred.TaskRunner: Task attempt_local_0002_r_000000_0 is allowed to commit now
10/07/09 01:56:29 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/home/hadoop/hadoop-0.20.2/output
10/07/09 01:56:29 INFO mapred.LocalJobRunner: reduce > reduce
10/07/09 01:56:29 INFO mapred.TaskRunner: Task 'attempt_local_0002_r_000000_0' done.
10/07/09 01:56:30 INFO mapred.JobClient: map 100% reduce 100%
10/07/09 01:56:30 INFO mapred.JobClient: Job complete: job_local_0002
10/07/09 01:56:30 INFO mapred.JobClient: Counters: 13
10/07/09 01:56:30 INFO mapred.JobClient: FileSystemCounters
10/07/09 01:56:30 INFO mapred.JobClient: FILE_BYTES_READ=640315
10/07/09 01:56:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=683829
10/07/09 01:56:30 INFO mapred.JobClient: Map-Reduce Framework
10/07/09 01:56:30 INFO mapred.JobClient: Reduce input groups=1
10/07/09 01:56:30 INFO mapred.JobClient: Combine output records=0
10/07/09 01:56:30 INFO mapred.JobClient: Map input records=1
10/07/09 01:56:30 INFO mapred.JobClient: Reduce shuffle bytes=0
10/07/09 01:56:30 INFO mapred.JobClient: Reduce output records=1
10/07/09 01:56:30 INFO mapred.JobClient: Spilled Records=2
10/07/09 01:56:30 INFO mapred.JobClient: Map output bytes=17
10/07/09 01:56:30 INFO mapred.JobClient: Map input bytes=25
10/07/09 01:56:30 INFO mapred.JobClient: Combine input records=0
10/07/09 01:56:30 INFO mapred.JobClient: Map output records=1
10/07/09 01:56:30 INFO mapred.JobClient: Reduce input records=1
[hadoop@localhost hadoop-0.20.2]$
[hadoop@localhost hadoop-0.20.2]$ cat output/*
1 dfsadmin
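The `1 dfsadmin` result means the regex `dfs[a-z.]+` matched exactly one distinct string across the conf/*.xml inputs, occurring once. The counting the example job performs can be emulated with plain POSIX tools (a sketch over an inline sample, not the real conf files):

```shell
# Emulate the Hadoop grep example with grep/sort/uniq on a tiny sample:
# extract every match of the pattern, then count occurrences per match.
printf '%s\n' '<value>dfsadmin</value>' 'nothing to see here' \
  | grep -Eo 'dfs[a-z.]+' \
  | sort | uniq -c | sort -rn
```

The one matching line yields `dfsadmin`, so the pipeline prints a count of 1 for it, mirroring the job's output above.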
3. Test Pseudo-Distributed Mode
In this mode, each Hadoop daemon runs in a separate Java process.
Code:
[hadoop@localhost hadoop-0.20.2]$ vim conf/core-site.xml
Change it to:
Code:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Code:
[hadoop@localhost hadoop-0.20.2]$ vim conf/hdfs-site.xml
Change it to:
Code:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Code:
[hadoop@localhost hadoop-0.20.2]$ vim conf/mapred-site.xml
Change it to:
Code:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
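The three edits above can also be scripted in one pass, which is handy when rebuilding a node. A sketch using here-docs (CONF_DIR is an assumption: point it at your hadoop-0.20.2/conf; here it defaults to a scratch directory so the sketch is safe to run as-is):

```shell
# Sketch: write the three pseudo-distributed config files with here-docs.
# CONF_DIR defaults to a scratch dir; set it to hadoop-0.20.2/conf for real use.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
EOF

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
EOF

ls "$CONF_DIR"
```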
Format the distributed filesystem
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop namenode -format
10/07/09 01:58:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/09 01:58:43 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
10/07/09 01:58:43 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/09 01:58:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/09 01:58:43 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/07/09 01:58:44 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
10/07/09 01:58:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[hadoop@localhost hadoop-0.20.2]$
Start the hadoop daemons
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
[hadoop@localhost hadoop-0.20.2]$
Logs are written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the NameNode and JobTracker web interfaces; the defaults are
NameNode
http://localhost:50070/

JobTracker
http://localhost:50030/


Copy files into the distributed filesystem
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop fs -put conf input
Run one of the example jobs (the same grep job as in the standalone test, now reading from HDFS):
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesystem and inspect them
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop fs -get output output
[hadoop@localhost hadoop-0.20.2]$ cat output/*
cat: output/output: Is a directory
1 dfsadmin
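The "cat: output/output: Is a directory" message appears because a local directory named output already existed (left over from the standalone test), so `fs -get output output` placed the fetched copy inside it, at output/output. A small sketch reproducing the same nesting with ordinary shell commands (scratch directory, no HDFS needed):

```shell
# Sketch: reproduce the nesting that `fs -get` caused, with plain mkdir/cat.
cd "$(mktemp -d)"
mkdir output                          # the local output/ that already existed
mkdir output/output                   # where the fetched HDFS copy landed
printf '1\tdfsadmin\n' > output/part-00000   # a part file from the earlier run
cat output/* 2>&1 || true             # the directory triggers the error;
                                      # regular part files still print
```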
Or inspect the output files directly on the distributed filesystem
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/hadoop fs -cat output/*
cat: File does not exist: output/output
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
When you are done, stop the daemons with:
Code:
[hadoop@localhost hadoop-0.20.2]$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
4. Test Fully-Distributed Mode
See Hadoop Cluster Setup
http://hadoop.apache.org/common/docs/current/cluster_setup.html

References
[Research] Installing the cloud software Hadoop 0.20.2 (CentOS 5.5 x86)
http://forum.icst.org.tw/phpbb/viewtopic.php?t=18513
[Research] Installing the cloud software Hadoop 0.20.2 (CentOS 5.4 x86)
http://forum.icst.org.tw/phpbb/viewtopic.php?f=10&t=17974
(To be continued)