CH 10 Setting up a Hadoop Cluster
How large should your cluster be? There isn't an exact answer to this question, but the beauty of Hadoop is that you can start with a small cluster (say, 10 nodes) and grow it as your storage and computational needs grow.In many ways, a better question is this:
how fast does your cluster need to grow? You can get a good feel for this by considering storage capacity.
For example, if your data grows by 1 TB a day and you have three-way HDFS replication, you need an additional 3 TB of raw storage per day.
Master node scenarios
Depending on the size of the cluster, there are various configurations for running the master daemons:
the namenode, secondary namenode, resource manager, and history server.
The namenode has high memory requirements, as it holds file and block metadata for the entire namespace in memory. The secondary namenode, although idle most of the time, has a comparable memory footprint to the primary when it create a checkpoint. For filesystems with a large number of files, there may not be enough physical memory on one machine to run both the primary and secondary namenode.
Cluster Setup and Installation
Creating UNIX User Accounts
cd /usr/local sudo tar xzf hadoop-x.y.z.tar.gz sudo chown -R hadoop:hadoop hadoop-x.y.z export HADOOP_HOME=/usr/local/hadoop-x.y.z export PATH=$PATH:$HADOOP/bin:$HADOOP/sbin
Formatting the HDFS Filesystem
a brand-new HDFS installation needs to be formatted.The formatting process creates an empty filesystem by creating the storage directories and the initial versions of the namenode's persistent data structures. Datanodes are not involved in the initial formatting process, since the namenode manage all of the filesystem's metadata, and datanodes can join or leave the cluster dynamically.
Formatting HDFS is a fast operation. Run:
hdfs namenode -format
Starting and Stopping the Daemons
The machine (or machines) that the namenode and secondary namenode run on is determined by interrogating the Hadoop configuration for their hostnames. For example, the script finds the namenode's hostname by executing the following:
% hdfs getconf -namenodes
By default, this finds the namenode's hostname from fs.defaultFSthe start-dfs.sh does the following:
Starts a namenode on each machine returned by excuting
hdfs getconf -namenodes
- Starts a datanode on each machine listed in the slaves file.
Starts a secondary namenode on each machine returned by executing
hdfs getconf -secondarynamenodes
The YARN daemons are started in a similar way, by running the following command as the yarn user on the machine hosting the resource manager:
More specifically, the script:
- Starts a resource manager on the local machine
- Starts a node manager on each machine listed in the slaves file
scripts to stop the daemons started by the corresponding start scripts.
Creating USER Directories
% hadoop fs -mkdir /user/steve% hadoop fs -chown steve:steve /user/steve
Hadoop configuration files
- hadoop-env.sh Environment variables that are used in the scripts to run Hadoop
- mapred-env.sh Environment variables that are used in the scripts to run MapReduce (overrides variables set in hadoop-env.sh )
- yarn-env.sh Environment variables that are used in the scripts to run YARN (overrides variable set in hadoop-env.sh)
- core-site.xml Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS, MapReduce, and YARN
- hdfs-site.xml Configuration settings for HDFS daemons: the name node, the secondary namenode, and the datanodes
- mapred-site.xml Configuration settings for MapReduce daemons: job history server
- yarn-site.xml Configuration settings for YARN daemons: the resource manager, the web app proxy server, and the node managers
- slaves A list of machines (one per line) that each run a datanode and a node manager.
Important Hadoop Daemon Properties
<property> <name>fs.defaultFS</name> <value>hdfs://namenode/</value> </property>
<property> <name>dfs.namenode.name.dir</name> <value>/disk1/hdfs/name,/remote/hdfs/name</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>/disk1/hdfs/data,/disk2/hdfs/data</value> </property> <property> <name>dfs.namenode.checkpoint.dir</name> <value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value> </property>
<property> <name>yarn.resourcemanager.hostname</name> <value>resourcemanager</value> </property> <property> <name>yarn.nodemanager.local-dirs</name> <value>/disk1/nm-local-dir,/disk2/nm-local-dir</value> </property> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce.shuffle</value> </property> <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>16384</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>16</value> </property>
to run HDFS, you need to designate one machine as a namenode. In this case, the property fs.defaultFS is an HDFS filesystem URI whose host is the namenode's hostname or IP address and whose port is the port that namenode will listen on for RPCs. If no port is specified, the default port of 8020 used.
Benchmarking a Hadoop Cluster
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop...
CH 11 Administering Hadoop
The filesystem image and edit log
When a filesystem client performs a write operation (such as creating or moving a file), the transaction first record in the edit log. The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified. The in-memory metadata is used to serve read requests.
Conceptually the edit log is a single entity, but it is represented as a number of files on disk. Each file is called a segment, and has the prefix edits and a suffix that indicates the transaction IDs contained in it. Only one file is open for writes at any one time (edits_inprogress_0000000000000000020 in the preceding example), and it is flushed and synced after every transaction before a success code is returned to the client. For namenodes that write to multiple directories, the write must be flushed and synced to every copy before returning successfully. This ensures that no transaction is lost due to machine failure.
As described, the edit log would grow without bound. Though this state of affairs would have no impact on the system while the namenode is running, if the namenode were restarted, it would take a long time to apply each of the transactions in its (very long ) edit log. During this time, the filesystem would be offline, which is generally undesirable.
Filesystem check (fsck)
Hadoop provides an *fsck*utility for checking the health of files in HDFS. The tool looks for blocks that are missing from all datanodes, as well as under- or over-replciated blocks. Here is an example of check the whole filesystem for a small cluster.
% hdfs fsck / ......................Status: HEALTHY Total size: 511799225 B Total dirs: 10 Total files: 22 Total blocks (validated): 22 (avg. block size 23263601 B) Minimally replicated blocks: 22 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0 Corrupt blocks: 0 Missing replicas: 0 (0.0 %) Number of data-nodes: 4 Number of racks: 1 The filesystem under path '/' is HEALTHY
fsck recursively walks the filesystem namespace, starting at given path (/), and checks the file it finds. It prints a dot for every file it checks. To check a file, fsck retrieves the metadata for the file's blocks and looks for problems or inconsistencies. Note that fsck retrieves all of its information from the namenode; it does not communicate with any datanodes to actually retrieve any block data.