Monday, June 9, 2014

Hive 0.13 Hadoop 2.4.0 getTaskLogUrl error

The org.apache.hadoop.mapreduce.util.HostUtil API changed between Hadoop 2.3.0 and Hadoop 2.4.0.

In 2.3.0, getTaskLogUrl takes three String parameters and reads the scheme from HttpConfig:

public static String getTaskLogUrl(String taskTrackerHostName,
    String httpPort, String taskAttemptID) {
  return (HttpConfig.getSchemePrefix() + taskTrackerHostName + ":" +
      httpPort + "/tasklog?attemptid=" + taskAttemptID);
}

In 2.4.0 it takes four String parameters, with the scheme now passed in by the caller as the first argument:

public static String getTaskLogUrl(String scheme, String taskTrackerHostName,
    String httpPort, String taskAttemptID) {
  return (scheme + taskTrackerHostName + ":" +
      httpPort + "/tasklog?attemptid=" + taskAttemptID);
}


So when you compile Hive 0.13 against Hadoop 2.4.0, Hive's call, which still passes only three arguments, fails with a compile-time error about the wrong number of arguments.

The Hive developers are aware of this.

But if you want to benchmark Hive 0.13 on Hadoop 2.4 and can't wait for an official fix, you can patch it yourself.

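My quick hack was to add a null argument to the call in Hadoop23Shims.java. Be aware, though, that the new scheme parameter comes first: appending null at the end puts the host name in the scheme slot, the port in the host slot and the attempt ID in the port slot, and the attempt ID in the URL becomes the literal string "null". That compiles, which is enough for benchmarking, but the task log links are garbage. Passing the tracker URL's own scheme as the first argument compiles just as well and builds a well-formed URL:

// Hadoop 2.4.0 order: (scheme, host, port, attemptId); the scheme must include "://"
return HostUtil.getTaskLogUrl(taskTrackerHttpURL.getProtocol() + "://",
        taskTrackerHttpURL.getHost(),
        Integer.toString(taskTrackerHttpURL.getPort()),
        taskAttemptId);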
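To see concretely what a trailing null does to the 2.4.0 signature, here is a minimal, self-contained sketch. It uses a local copy of the 2.4.0 method body quoted above so it runs without Hadoop on the classpath; the class and helper names (NullArgumentDemo, nullAppended, schemeFirst) and the host, port and attempt ID values are made up for illustration.

```java
public class NullArgumentDemo {

    // Local copy of the Hadoop 2.4.0 HostUtil.getTaskLogUrl body, so the demo
    // runs without Hadoop on the classpath.
    public static String getTaskLogUrl(String scheme, String taskTrackerHostName,
            String httpPort, String taskAttemptID) {
        return scheme + taskTrackerHostName + ":" + httpPort
                + "/tasklog?attemptid=" + taskAttemptID;
    }

    // The quick hack: old argument order plus a trailing null. Every argument
    // lands one slot too early, and Java string concatenation turns null into "null".
    public static String nullAppended(String host, String port, String attemptId) {
        return getTaskLogUrl(host, port, attemptId, null);
    }

    // The order the 2.4.0 signature expects: scheme first.
    public static String schemeFirst(String scheme, String host, String port, String attemptId) {
        return getTaskLogUrl(scheme, host, port, attemptId);
    }

    public static void main(String[] args) {
        System.out.println(nullAppended("tracker01", "50060", "attempt_1"));
        System.out.println(schemeFirst("http://", "tracker01", "50060", "attempt_1"));
    }
}
```

Running it prints tracker0150060:attempt_1/tasklog?attemptid=null for the null hack and http://tracker01:50060/tasklog?attemptid=attempt_1 for the scheme-first call.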

A better idea would be to write a unit test for this function, make the change properly, and submit it as a patch. But it's hard to tell whether the Hive devs have already done this.
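If you want a patch that builds against both Hadoop versions, one option is to look up whichever getTaskLogUrl overload exists at runtime via reflection. This is my own assumption, not necessarily what the Hive developers did. The sketch below uses two hypothetical stand-in classes, HostUtil23 and HostUtil24, that mirror the signatures quoted above (the 2.3.0 stand-in hardcodes "http://" where the real method calls HttpConfig.getSchemePrefix()); in a real shim you would pass org.apache.hadoop.mapreduce.util.HostUtil.class instead. Deriving the scheme from the tracker URL with getProtocol() + "://" is likewise an assumption.

```java
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;

public class TaskLogUrlCompat {

    // Hypothetical stand-in for the Hadoop 2.3.0 signature. The real method
    // prepends HttpConfig.getSchemePrefix(); "http://" is assumed here.
    public static class HostUtil23 {
        public static String getTaskLogUrl(String taskTrackerHostName,
                String httpPort, String taskAttemptID) {
            return "http://" + taskTrackerHostName + ":" + httpPort
                    + "/tasklog?attemptid=" + taskAttemptID;
        }
    }

    // Hypothetical stand-in for the Hadoop 2.4.0 signature (scheme first).
    public static class HostUtil24 {
        public static String getTaskLogUrl(String scheme, String taskTrackerHostName,
                String httpPort, String taskAttemptID) {
            return scheme + taskTrackerHostName + ":" + httpPort
                    + "/tasklog?attemptid=" + taskAttemptID;
        }
    }

    // Calls the 4-argument (scheme-first) overload if hostUtil has one,
    // otherwise falls back to the 3-argument overload.
    public static String taskLogUrl(Class<?> hostUtil, String trackerHttpAddress,
            String attemptId) {
        URL url;
        try {
            url = new URL(trackerHttpAddress);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad tracker address: " + trackerHttpAddress, e);
        }
        String host = url.getHost();
        String port = Integer.toString(url.getPort()); // -1 if the address has no explicit port
        try {
            try {
                Method m = hostUtil.getMethod("getTaskLogUrl",
                        String.class, String.class, String.class, String.class);
                return (String) m.invoke(null, url.getProtocol() + "://", host, port, attemptId);
            } catch (NoSuchMethodException e) {
                Method m = hostUtil.getMethod("getTaskLogUrl",
                        String.class, String.class, String.class);
                return (String) m.invoke(null, host, port, attemptId);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable getTaskLogUrl on " + hostUtil.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(taskLogUrl(HostUtil24.class, "http://tracker01:50060", "attempt_1"));
        System.out.println(taskLogUrl(HostUtil23.class, "http://tracker01:50060", "attempt_1"));
    }
}
```

Both calls print http://tracker01:50060/tasklog?attemptid=attempt_1, which is the kind of check a unit test for the shim could assert on. The cost of this design is that a typo in the method name only shows up at runtime rather than at compile time.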

