Saturday, March 17, 2012

Hadoop NameNode Failure

Scenario 1:

The HDFS fsimage and edit log are written to multiple locations, including an NFS mount.
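
As a rough illustration, a layout like this could be configured in hdfs-site.xml along the following lines. Both paths are made-up placeholders, not taken from a real cluster. dfs.name.dir accepts a comma-separated list of directories, and the NameNode writes its metadata to each of them.

```xml
<!-- hdfs-site.xml (sketch; both paths are hypothetical) -->
<property>
  <name>dfs.name.dir</name>
  <!-- one local disk plus one NFS mount; metadata is written to every entry -->
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```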

A) NameNode daemon crash:
Solution:
Just restart the NameNode process.

B) The host where the NameNode is running is down.

Solution:

1. Start the NameNode on a different host with an empty dfs.name.dir.
2. Point dfs.name.dir to the NFS mount where we have a copy of the metadata.
OR
3. Use the -importCheckpoint option when starting the NameNode, after pointing fs.checkpoint.dir to the checkpoint directory from the Secondary NameNode.
4. Change fs.default.name to the backup host's URI and restart the cluster with all the slave IPs in the slaves file.

Note - We may miss the edits that happened after the last checkpoint.
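
For the NFS variant of the steps above, the settings on the replacement host might look roughly like this. The mount path, host name and port are placeholders chosen for illustration, so substitute whatever your cluster actually uses.

```xml
<!-- hdfs-site.xml on the replacement host (hypothetical path) -->
<property>
  <name>dfs.name.dir</name>
  <!-- the NFS mount that holds the copy of the metadata -->
  <value>/mnt/nfs/dfs/nn</value>
</property>

<!-- core-site.xml on every node (hypothetical host name and port) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://backup-nn.example.com:8020</value>
</property>
```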



Scenario 2:

The HDFS fsimage is written into a single directory.

A) NameNode daemon crash:
Solution: Unknown

B) The host where the NameNode is running is down.

Solution:


1. Create an empty directory and point dfs.name.dir to it.
2. Start the NameNode with -importCheckpoint after pointing fs.checkpoint.dir to the checkpoint directory from the Secondary NameNode.
3. Change fs.default.name to the backup host's URI and restart the cluster with all the slave IPs in the slaves file.

This way we would again lose any changes made after the last checkpoint.
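
A minimal sketch of the settings for the -importCheckpoint route, assuming the Secondary NameNode's checkpoint directory has been copied to (or mounted on) the new host. Both paths are hypothetical. The NameNode loads the image from fs.checkpoint.dir and saves it into dfs.name.dir, so the latter must not already contain a valid image.

```xml
<!-- core-site.xml (typically): where the copied checkpoint lives (hypothetical path) -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/1/dfs/snn</value>
</property>

<!-- hdfs-site.xml: new, empty metadata directory (hypothetical path) -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn</value>
</property>
```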

Instead of changing fs.default.name to the backup host's URI, it would be easier to fix if we could give the new NameNode the old NameNode's IP.

HA Namenode

More info on this can be found here: https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster
