Sunday, July 31, 2011

Hadoop Pig Job Name from the Command Line

By default, the Pig job name shown in the job portal is derived from the name of the Pig script.
To give the job a custom name, set job.name in the script from a parameter, as below:

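-- $job_name is a Pig parameter, supplied with -param on the command line
set job.name '$job_name'
-- file concatenation settings used by this merge script
set pig.optimistic.files.concatenation 'true'
set pig.files.concatenation.threshold '100'
-- load the tab-separated raw files and store them under the merged directory
A = LOAD '/home/hadoop/work/ABC/raw/' USING PigStorage('\t');
STORE A INTO '/home/hadoop/work/ABC/merged/' USING PigStorage('\t');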

Then invoke the Pig script as follows:

pig -param job_name='Hello World' -stop_on_failure -x mapreduce merger.pig
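If the script should also run when no -param is given, one option is to declare a fallback value with Pig's %default preprocessor statement. This is only a minimal sketch: the fallback name 'ABC file merger' is a made-up example, and it assumes the quoted value is substituted into job.name the same way a -param value would be.

```pig
-- Fallback job name, used only when -param job_name=... is not supplied.
-- 'ABC file merger' is a hypothetical name chosen for illustration.
%default job_name 'ABC file merger'

set job.name '$job_name'
```

A value passed with -param on the command line should take precedence over the %default value, so the invocation above keeps working unchanged.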