Hadoop 2.2.x uses protobuf 2.5, but Spark specifies Protocol Buffers version 2.4.1 to maintain compatibility with Hadoop 1.x, which is the default if a Hadoop version is not specified.
You see this problem if you invoke:
[dc@unknown002314cd9054 spark-0.9.1]$ mvn -Dhadoop.version=2.3.0 -DskipTests clean package
[ERROR] error while loading RpcResponseHeaderProto, class file '/home/dc/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0/hadoop-common-2.3.0.jar(org/apache/hadoop/ipc/protobuf/RpcHeaderProtos$RpcResponseHeaderProto.class)' is broken
(class java.lang.NullPointerException/null)
[WARNING] one warning found
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [2.997s]
[INFO] Spark Project Core ................................ FAILURE [36.867s]
[INFO] Spark Project Bagel ............................... SKIPPED
[INFO] Spark Project GraphX .............................. SKIPPED
[INFO] Spark Project ML Library .......................... SKIPPED
[INFO] Spark Project Streaming ........................... SKIPPED
[INFO] Spark Project Tools ............................... SKIPPED
[INFO] Spark Project REPL ................................ SKIPPED
[INFO] Spark Project Assembly ............................ SKIPPED
[INFO] Spark Project External Twitter .................... SKIPPED
[INFO] Spark Project External Kafka ...................... SKIPPED
[INFO] Spark Project External Flume ...................... SKIPPED
[INFO] Spark Project External ZeroMQ ..................... SKIPPED
[INFO] Spark Project External MQTT ....................... SKIPPED
[INFO] Spark Project Examples ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 40.588s
[INFO] Finished at: Thu May 15 10:56:50 PDT 2014
[INFO] Final Memory: 37M/504M
If you change the protobuf.version property in the pom.xml to 2.5.0, this error goes away.
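For reference, a minimal sketch of that edit in Spark's top-level pom.xml (the surrounding properties block is abbreviated here; only the protobuf.version property matters):

<properties>
  ...
  <protobuf.version>2.5.0</protobuf.version>  <!-- was 2.4.1 -->
  ...
</properties>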
Or you can override the protobuf.version value on the mvn command line:
[dc@unknown002314cd9054 spark-0.9.1]$ mvn -Dhadoop.version=2.3.0 -DskipTests -Dprotobuf.version=2.5.0 clean package
Or you can use SBT: [dc@unknown002314cd9054 spark-0.9.1]$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly
SBT is able to figure out the different protobuf versions and reconcile them; Maven is not, as this comment from Spark's pom.xml explains:
In theory we need not directly depend on protobuf since Spark does not directly
use it. However, when building with Hadoop/YARN 2.2 Maven doesn't correctly bump
the protobuf version up from the one Mesos gives. For now we include this variable
to explicitly bump the version when building with YARN. It would be nice to figure
out why Maven can't resolve this correctly (like SBT does).
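One way to see which protobuf version Maven actually resolves is the standard Maven dependency plugin; the exact flags below are my suggestion, not something from the Spark docs:

[dc@unknown002314cd9054 spark-0.9.1]$ mvn -Dhadoop.version=2.3.0 -Dprotobuf.version=2.5.0 dependency:tree -Dincludes=com.google.protobuf:protobuf-java

If the override took effect, the tree should list protobuf-java 2.5.0 rather than 2.4.1.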
To verify the build is correct, look under lib_managed and make sure the correct Hadoop jars are there:
hadoop-annotations-2.3.0.jar jetty-util-6.1.26.jar velocity-1.7.jar
hadoop-auth-2.3.0.jar jetty-util-7.6.8.v20121106.jar xmlenc-0.52.jar
hadoop-client-1.0.4.jar jline-0.9.94.jar xz-1.0.jar
hadoop-client-2.3.0.jar jline-2.10.3.jar zeromq-scala-binding_2.10-0.0.7.jar
hadoop-common-2.3.0.jar jna-3.0.9.jar zkclient-0.1.jar
hadoop-core-1.0.4.jar jnr-constants-0.8.2.jar zkclient-0.3.jar
hadoop-hdfs-2.3.0.jar jruby-complete-1.6.5.jar zookeeper-3.4.5.jar
hadoop-mapreduce-client-app-2.3.0.jar json-simple-1.1.jar
hadoop-mapreduce-client-common-2.3.0.jar
Here we have the hadoop-2.3.0 jars. If you only see hadoop-1.x.x jars, then your build did not pick up the Hadoop 2.x option.
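A quick sanity check from the shell (the find pattern is just a suggestion, assuming the lib_managed layout shown above):

[dc@unknown002314cd9054 spark-0.9.1]$ find lib_managed -name 'hadoop-*2.3.0.jar'

This should list the hadoop-*-2.3.0 jars shown above; an empty result means the build fell back to the default Hadoop 1.x dependencies.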