Wednesday, June 22, 2011

File Append in HDFS

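HDFS files are write-once: after a file has been written and closed, its contents cannot be modified in place. Appending to an existing file is available only as an experimental feature.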


For experimental purposes, follow these two simple steps:

1. Add a new property to hdfs-site.xml:
dfs.support.append = true. By default it is false.
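
As a rough illustration of step 1, the property could be written as follows. This is a minimal sketch that assumes the standard Hadoop configuration file layout, with the property placed inside the existing <configuration> element of hdfs-site.xml. The HDFS daemons will most likely need a restart before the change takes effect.

```xml
<!-- hdfs-site.xml: enable the experimental append feature -->
<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
```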

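2. Write to the file through the FileSystem API, appending if the file already exists and creating it otherwise:

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopFileWriter {
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: HadoopFileWriter <line to write>");
            return;
        }
        // Placeholder: replace with the HDFS URL of the target file.
        URI uri = new URI("hadoop path data url");
        Path pt = new Path(uri);
        // Resolve the file system from the URI so the path and the FS match.
        FileSystem fs = FileSystem.get(uri, new Configuration());
        BufferedWriter br = null;
        try {
            if (fs.isFile(pt)) {
                // File exists: append to it, starting on a new line.
                br = new BufferedWriter(new OutputStreamWriter(fs.append(pt)));
                br.newLine();
            } else {
                // File does not exist: create it (overwrite flag set to true).
                br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
            }
            String line = args[0];
            System.out.println(line);
            br.write(line);
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Could not write to " + pt);
        } finally {
            // Always close the writer so buffered data is flushed.
            if (br != null) {
                br.close();
            }
        }
    }
}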

However, the current state of append support in each Hadoop version is:

- 0.20.x (includes release 0.20.2)
There are known bugs in append. The bugs may cause data loss.

- 0.20-append
There was an effort to fix the known append bugs, but there are no releases. I
heard Facebook was using it (with additional patches?) in production, but I do
not have the details.

- 0.21
It has a new append design (HDFS-265). However, the 0.21.0 release is only a
minor release. It has not undergone testing at scale and should not be
considered stable or suitable for production. Also, 0.21 development has been
discontinued. Newly discovered bugs may not be fixed.

- 0.22, 0.23
Not yet released.

Hope this helps.
