Thursday, May 15, 2014

Spark build error while loading RpcResponseHeaderProto

This error is caused by a mismatch between the protocol buffer versions specified by Spark and by Hadoop 2.2.X.

Hadoop 2.2.X uses protobuf 2.5, while Spark specifies protocol buffer version 2.4.1 to maintain compatibility with Hadoop 1.1.X, which is the default if a Hadoop version is not specified.

You see this problem if you invoke:
[dc@unknown002314cd9054 spark-0.9.1]$ mvn -Dhadoop.version=2.3.0 -DskipTests clean package

[ERROR] error while loading RpcResponseHeaderProto, class file '/home/dc/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0/hadoop-common-2.3.0.jar(org/apache/hadoop/ipc/protobuf/RpcHeaderProtos$RpcResponseHeaderProto.class)' is broken
(class java.lang.NullPointerException/null)
[WARNING] one warning found
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [2.997s]
[INFO] Spark Project Core ................................ FAILURE [36.867s]
[INFO] Spark Project Bagel ............................... SKIPPED
[INFO] Spark Project GraphX .............................. SKIPPED
[INFO] Spark Project ML Library .......................... SKIPPED
[INFO] Spark Project Streaming ........................... SKIPPED
[INFO] Spark Project Tools ............................... SKIPPED
[INFO] Spark Project REPL ................................ SKIPPED
[INFO] Spark Project Assembly ............................ SKIPPED
[INFO] Spark Project External Twitter .................... SKIPPED
[INFO] Spark Project External Kafka ...................... SKIPPED
[INFO] Spark Project External Flume ...................... SKIPPED
[INFO] Spark Project External ZeroMQ ..................... SKIPPED
[INFO] Spark Project External MQTT ....................... SKIPPED
[INFO] Spark Project Examples ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 40.588s
[INFO] Finished at: Thu May 15 10:56:50 PDT 2014
[INFO] Final Memory: 37M/504M



If you change the protobuf.version property in the pom.xml from 2.4.1 to 2.5.0, this error goes away.
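For reference, the property lives in the properties section of Spark's top-level pom.xml; the change is a one-line edit. The fragment below is a sketch of just that section, not the full file:

```xml
<!-- In the top-level pom.xml of the Spark source tree -->
<properties>
  <!-- was 2.4.1; bump to match the protobuf shipped with Hadoop 2.2+ -->
  <protobuf.version>2.5.0</protobuf.version>
</properties>
```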

Or you can override the protobuf.version value on the mvn command line, like this:
[dc@unknown002314cd9054 spark-0.9.1]$ mvn -Dhadoop.version=2.3.0 -DskipTests -Dprotobuf.version=2.5.0 clean package


Or you can use SBT:
[dc@unknown002314cd9054 spark-0.9.1]$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly


SBT is able to figure out the different protobuf versions and reconcile them; the Maven build is not, as this comment in Spark's pom.xml explains:

           In theory we need not directly depend on protobuf since Spark does not directly
           use it. However, when building with Hadoop/YARN 2.2 Maven doesn't correctly bump
           the protobuf version up from the one Mesos gives. For now we include this variable
           to explicitly bump the version when building with YARN. It would be nice to figure
           out why Maven can't resolve this correctly (like SBT does).


To verify the build is correct, look under lib_managed and make sure the correct Hadoop jars are there:

hadoop-annotations-2.3.0.jar               jetty-util-6.1.26.jar                        velocity-1.7.jar
hadoop-auth-2.3.0.jar                      jetty-util-7.6.8.v20121106.jar               xmlenc-0.52.jar
hadoop-client-1.0.4.jar                    jline-0.9.94.jar                             xz-1.0.jar
hadoop-client-2.3.0.jar                    jline-2.10.3.jar                             zeromq-scala-binding_2.10-0.0.7.jar
hadoop-common-2.3.0.jar                    jna-3.0.9.jar                                zkclient-0.1.jar
hadoop-core-1.0.4.jar                      jnr-constants-0.8.2.jar                      zkclient-0.3.jar
hadoop-hdfs-2.3.0.jar                      jruby-complete-1.6.5.jar                     zookeeper-3.4.5.jar
hadoop-mapreduce-client-app-2.3.0.jar      json-simple-1.1.jar
hadoop-mapreduce-client-common-2.3.0.jar

Here we have the hadoop-2.3.0 jars. If you only see hadoop-1.X.X jars, then your build did not pick up the Hadoop 2.x.x option.
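As a quick sanity check, you can count the Hadoop 2.x jars under lib_managed from the shell. The snippet below simulates a lib_managed directory purely for illustration; in a real checkout you would run the same ls/grep against lib_managed/jars in the Spark source root:

```shell
# Simulate a lib_managed/jars directory (illustration only; a real SBT
# build populates this under the Spark source root).
mkdir -p /tmp/libcheck/lib_managed/jars
touch /tmp/libcheck/lib_managed/jars/hadoop-common-2.3.0.jar
touch /tmp/libcheck/lib_managed/jars/hadoop-client-2.3.0.jar
touch /tmp/libcheck/lib_managed/jars/hadoop-core-1.0.4.jar

cd /tmp/libcheck
# Count Hadoop 2.x jars; zero would mean the build fell back to Hadoop 1.x.
count=$(ls lib_managed/jars | grep -c '^hadoop-.*-2\.')
echo "hadoop 2.x jars: $count"
```

With a real build the count should match the number of hadoop-*-2.3.0.jar entries shown in the listing above.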




