Friday, 6 November 2009

More Hadoop Mysteries - order of initialisation


Hey out there! Still not tired of my Hadoop experiments? Not yet? Then here’s another one for you!


What do you think the difference is between these two snippets of code? Say, this:


Configuration conf = getConf();
SomeCodeWhichChangesConfig.initialise(conf); // conf is changed BEFORE the job is created
Job job = new Job(conf, "MyHadoopJob");

// ... setting the job details

if (!job.waitForCompletion(true))
{
    System.err.println("FAILED, cannot continue");
}


… and this:


Configuration conf = getConf();
Job job = new Job(conf, "MyHadoopJob");

// ... setting the job details

SomeCodeWhichChangesConfig.initialise(conf); // conf is changed AFTER the job is created
if (!job.waitForCompletion(true))
{
    System.err.println("FAILED, cannot continue");
}


No difference, you say? Not quite right, sir: whatever you do to conf after creating the job has no effect on that job. That is, the Job constructor apparently copies all the data and doesn’t link your Configuration object with its copy, so in the second snippet the changes made by initialise() never reach the job. Brilliant, no?


(And I spent a couple of hours trying to understand why the distributed cache works properly in one app and doesn’t work at all in another.) So now you know. Be warned.
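
For the curious, here is a minimal sketch of the copy-on-construction behaviour described above, written in plain Java so it runs without a cluster. SketchConf and SketchJob are hypothetical stand-ins I made up for illustration; they are not Hadoop classes and only mimic what the Job constructor appears to do with the Configuration it is given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a bag of string properties.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    SketchConf() {}

    // Copy constructor: takes a snapshot of the other object's properties.
    SketchConf(SketchConf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical stand-in for Job: keeps its own copy of the configuration.
class SketchJob {
    private final SketchConf conf;

    SketchJob(SketchConf conf) {
        this.conf = new SketchConf(conf); // a snapshot, not a shared reference
    }

    SketchConf getConfiguration() { return conf; }
}

class ConfigCopyDemo {
    public static void main(String[] args) {
        SketchConf conf = new SketchConf();
        conf.set("set.before", "yes");       // made before the job exists
        SketchJob job = new SketchJob(conf);
        conf.set("set.after", "yes");        // made after: the job never sees it

        System.out.println("before: " + job.getConfiguration().get("set.before"));
        System.out.println("after:  " + job.getConfiguration().get("set.after"));

        // Writing to the job's own copy is what actually reaches the job.
        job.getConfiguration().set("set.after", "yes");
        System.out.println("fixed:  " + job.getConfiguration().get("set.after"));
    }
}
```

In real Hadoop code the equivalent workaround, as far as I can tell, is either to finish all changes to conf before calling new Job(conf, ...), or to apply late changes to the object returned by job.getConfiguration(), which is the job’s own copy.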