Friday, December 9, 2011

Apache Bigtop Build using Ubuntu 10.04 LTS

Apache BigTop Lab #2: Building BigTop from Source

This build uses the make all command to build sdeb, tar.gz and deb package files for the Hadoop components included in Apache BigTop.

The BigTop build is driven by make, and its build logic is split across three files: Makefile, bigtop.mk and package.mk.

Before we talk about Make files, a small digression. Linux distributions come in different flavors, and each has its own package management system. In general there are three package formats:
1) tar.gz. These archive files can be installed on any Linux distribution. Typically they are simply unpacked, giving the user a directory tree without any additional configuration. The other two formats, rpm and deb, work with separate installation and dependency management tools.
2) rpm. Created by Red Hat and used in Fedora and openSUSE. openSUSE uses YaST and the Zypper command-line program to manage dependencies, so when you install package A, its dependencies are downloaded and installed as well. The package manager that resolves dependencies relies on the RPM database, which lives in /var/lib/rpm on the system you are installing on.
3) deb. Used in Debian and Ubuntu. Debian-type systems install packages from *.deb files; to install one you would type sudo dpkg -i programname.deb.
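To make the tar.gz point concrete, here is a minimal shell sketch: it packs a small directory tree into a tar.gz and then "installs" it by unpacking it somewhere else. The component name example-component and its run.sh script are hypothetical placeholders; nothing here touches a package database or resolves dependencies, which is exactly what rpm and deb tooling adds.

```shell
#!/bin/sh
# Sketch: a tar.gz "package" is just a compressed directory tree.
# example-component and run.sh are hypothetical placeholders.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a small directory tree and pack it into a tar.gz archive.
mkdir -p "$workdir/src/example-component/bin"
printf '#!/bin/sh\necho "example-component ran"\n' > "$workdir/src/example-component/bin/run.sh"
chmod 755 "$workdir/src/example-component/bin/run.sh"
tar -czf "$workdir/example-component.tar.gz" -C "$workdir/src" example-component

# "Installing" means unpacking: no package database, no dependency resolution.
mkdir -p "$workdir/install"
tar -xzf "$workdir/example-component.tar.gz" -C "$workdir/install"

# The unpacked tree is usable as-is.
result=$(sh "$workdir/install/example-component/bin/run.sh")
echo "$result"
```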

You can use Alien to convert between the different package formats, e.g. from rpm to deb and vice versa. I don't know whether converting with Alien is equivalent to using the native rpm and deb packages that BigTop generates for the Hadoop components.

OK, back to Make files. A Makefile is a collection of variable definitions and rules.
A rule consists of a target, its prerequisites and a set of commands.
The target and its prerequisites are separated by a colon, conventionally followed by a space. Each command line must start with a TAB character:
target: prerequisites
[TAB]command
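A minimal sketch of the rule syntax, assuming GNU make is installed: the script writes a two-rule Makefile into a scratch directory, using printf's \t to emit the literal TAB each command line needs, and then runs it. The targets hello and greeting.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a two-rule Makefile and run it with GNU make.
# The targets hello and greeting.txt are made up for illustration.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Rule 1: hello depends on greeting.txt; its command prints the file.
# Rule 2: greeting.txt has no prerequisites; its commands create the file.
# printf turns each \t into the literal TAB that must start a command line.
printf 'hello: greeting.txt\n\t@cat greeting.txt\n\ngreeting.txt:\n\t@echo "built greeting.txt"\n\t@echo "hello from make" > greeting.txt\n' \
  > "$workdir/Makefile"

# -C runs make in the scratch directory; --no-print-directory hides the
# "Entering directory" messages that -C would otherwise print.
output=$(make -C "$workdir" --no-print-directory hello)
echo "$output"
```

Because hello lists greeting.txt as a prerequisite, make builds greeting.txt first. If a command line starts with spaces instead of a TAB, GNU make stops with a "missing separator" error.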

Here is a sample target, deb, from the Makefile:

deb: $(TARGETS_DEB)

This rule runs when you type make deb. The target is deb and the prerequisites are $(TARGETS_DEB). There are no commands, so make deb simply builds every target listed in $(TARGETS_DEB).

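The variable TARGETS_DEB is built up in package.mk by the line
TARGETS_DEB += $(1)-deb

The += operator appends $(1)-deb to the list each time this line is expanded, where $(1) is the component name passed in as the first argument. To see the fully expanded list of targets, print it with @echo $(TARGETS_DEB) from a rule: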

TARGETS_DEB=hadoop-deb zookeeper-deb hbase-deb pig-deb hive-deb oozie-deb whirr-deb mahout-deb flume-deb bigtop-utils-deb
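Here is a simplified, hypothetical stand-in for how a template like the one in package.mk can build up TARGETS_DEB, assuming GNU make. $(call PACKAGE,name) substitutes the component name for $(1), and $(eval ...) parses the result as Makefile text, so each component appends one name-deb entry and gets its own rule. The PACKAGE template body and the three-component list are my assumptions, not BigTop's actual code.

```shell
#!/bin/sh
# Sketch of a template that appends to TARGETS_DEB, assuming GNU make.
# PACKAGE and the three-component list are hypothetical, not BigTop's code.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Quoted 'EOF' keeps the shell from touching the $(...) make syntax.
# Recipes sit after ';' on the rule line, so no literal TABs are needed.
cat > "$workdir/Makefile" <<'EOF'
# $(1) is the first argument passed to $(call PACKAGE,...).
define PACKAGE
TARGETS_DEB += $(1)-deb
$(1)-deb: ; @echo "building $(1) deb"
endef

COMPONENTS := hadoop zookeeper hbase
# Expand the template once per component and parse it as Makefile text.
$(foreach c,$(COMPONENTS),$(eval $(call PACKAGE,$(c))))

.PHONY: deb show $(TARGETS_DEB)
# Same shape as the deb rule above: prerequisites only, no commands.
deb: $(TARGETS_DEB)

show: ; @echo $(TARGETS_DEB)
EOF

targets=$(make -C "$workdir" --no-print-directory show)
build_log=$(make -C "$workdir" --no-print-directory deb)
echo "$targets"
echo "$build_log"
```

make show prints the expanded list, the same @echo $(TARGETS_DEB) trick described above, and make deb builds each name-deb prerequisite from left to right.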

This target expands into a series of 10 subtargets, ranging from hadoop-deb to bigtop-utils-deb. Searching for hadoop-deb leads to:

Make is recursive. When make processes a target, it first looks at that target's prerequisites; if a prerequisite is itself a target with prerequisites, make processes those first, and so on down the chain. The first commands make runs belong to a target with no prerequisites, and make then works back up the chain. In the example above, we have to look for the target hadoop-deb and see what prerequisites it has; following that chain tells us which target make will execute first.
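A small sketch of that recursion, assuming GNU make. The chain here, where hadoop-deb depends on hadoop-sdeb, which depends on hadoop-tar, is hypothetical (it is not BigTop's real prerequisite chain), but it shows make running the commands of the target with no prerequisites first and then working back up.

```shell
#!/bin/sh
# Sketch of make's recursive prerequisite handling, assuming GNU make.
# The chain hadoop-deb -> hadoop-sdeb -> hadoop-tar is hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/Makefile" <<'EOF'
.PHONY: hadoop-deb hadoop-sdeb hadoop-tar
# hadoop-deb needs hadoop-sdeb, which needs hadoop-tar.
hadoop-deb: hadoop-sdeb ; @echo "3. hadoop-deb"
hadoop-sdeb: hadoop-tar ; @echo "2. hadoop-sdeb"
# No prerequisites: this is where make starts running commands.
hadoop-tar: ; @echo "1. hadoop-tar"
EOF

order=$(make -C "$workdir" --no-print-directory hadoop-deb)
echo "$order"
```

On a real Makefile, make -n hadoop-deb (a dry run) prints the commands make would run, in order, without running them, which can be a handy way to explore the chain.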

One Linux skill that is particularly useful here is recursively searching inside files for a string:

find "path to search" -type f -exec grep -i "phrase to find" {} \; -print

To find the string hadoop-deb use:

find . -type f -exec grep -i hadoop-deb {} \; -print
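The command above prints each matching line followed by the file name. A variant: adding -q to grep suppresses the matched lines, so find prints only the names of matching files, and grep -ril does the same job on its own (-r recurses, -i ignores case, -l lists file names). The sketch builds a throwaway directory with made-up files so it can run anywhere.

```shell
#!/bin/sh
# Sketch: two ways to list only the files that contain a string.
# The directory and file contents are made up so the example is self-contained.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/sub"
printf 'TARGETS_DEB=hadoop-deb zookeeper-deb\n' > "$workdir/a.mk"
printf 'Hadoop-DEB in mixed case\n' > "$workdir/sub/notes.txt"
printf 'nothing relevant here\n' > "$workdir/sub/other.txt"

cd "$workdir"

# find + grep -q: grep prints nothing, and its exit status decides
# whether -print runs, so only matching file names appear.
find_matches=$(find . -type f -exec grep -qi 'hadoop-deb' {} \; -print | sort)

# grep alone: -r recurses, -i ignores case, -l lists matching file names.
grep_matches=$(grep -ril 'hadoop-deb' . | sort)

echo "$grep_matches"
```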

Download BigTop from the download link on the public incubator website, http://incubator.apache.org/bigtop/, and make sure to pick the latest stable release. The release can be built by following its README.

1) Start an AMI on AWS. I used

I also chose a large instance type to get more than 2 GB of memory, so that I could repeat Lab 1 on the generated .deb files as a verification step.


2) Download the Apache BigTop (incubating) stable release
3) Install rpmbuild
sudo apt-get install rpm
4) Install Apache Forrest 0.8. If you use the current 0.9 version, you will get a build error.
5) Install JDK 6 and JDK 5. Set JAVA32, JAVA64, JAVA_HOME and JAVA5_HOME. JAVA5_HOME is used by Apache Forrest. JAVA_HOME can be set to the same path as JAVA64.

Here is a copy of my .bashrc.


Run source ~/.bashrc after modifying the environment variables, and type echo $JAVA_HOME if you want to be extra sure they are really set. All of the environment variables in that .bashrc are set for a BigTop build on the AMI above.
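The following is a hypothetical sketch, not a copy of my file: a ~/.bashrc fragment that sets the variables from step 5. Every JDK path is a placeholder to replace with wherever your JDKs are installed, and the warning loop is an addition, not something the build requires.

```shell
# ~/.bashrc fragment (sketch). All JDK paths are hypothetical placeholders.
export JAVA32=/opt/jdk1.6.0_32bit
export JAVA64=/opt/jdk1.6.0_64bit
# JAVA5_HOME is the JDK 5 used by Apache Forrest.
export JAVA5_HOME=/opt/jdk1.5.0
# JAVA_HOME can point at the same JDK as JAVA64.
export JAVA_HOME="$JAVA64"
export PATH="$JAVA_HOME/bin:$PATH"

# Warn (without failing) if any variable points at a missing directory.
for var in JAVA32 JAVA64 JAVA5_HOME JAVA_HOME; do
  eval "dir=\${$var}"
  if [ ! -d "$dir" ]; then
    echo "warning: $var=$dir does not exist" >&2
  fi
done
```

After source ~/.bashrc, echo $JAVA_HOME should print the same path as JAVA64.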


6) Add execute permission to the Forrest scripts: chmod 555 bin/forrest, chmod 555 bin/ant, chmod 555 ant, etc. Otherwise you get errors with Hive.
7) Remove Maven 2: sudo apt-get remove maven2
8) Install Maven 3.0.1 instead of Maven 2.2.1, or you get errors with Oozie. Either remove Maven 2.2.1, install Maven 3.0.x over it, or order your PATH so that Maven 3.0.x is found before Maven 2.2.1.
9) sudo apt-get install liblzo2-dev sharutils libfuse-dev
10) Install screen if you want the build to keep running without keeping the terminal window open.
11) Run make all
The BigTop output files (tar.gz, .rpm and .deb) are stored under the output directory of the BigTop source tree.
Ubuntu is a Debian-based distribution, so building .deb files on an Ubuntu instance makes sense.
12) Verify that the BigTop archive files were created using:
sudo updatedb
locate '*.deb'; locate '*.tar.gz'; locate '*.rpm'
You should see the bigtop debian files:

The other files created by make all are SRPMs and tar.gz files.
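locate only sees files that existed the last time updatedb ran, so find, which scans a directory directly and needs no sudo, can be handier for this check. In the sketch below, output_dir is a scratch stand-in filled with placeholder file names; on a real build, point it at BigTop's output directory instead.

```shell
#!/bin/sh
# Sketch: count build artifacts by extension with find.
# output_dir and the placeholder files are hypothetical stand-ins.
set -eu

output_dir=$(mktemp -d)
trap 'rm -rf "$output_dir"' EXIT
mkdir -p "$output_dir/hadoop" "$output_dir/zookeeper"
touch "$output_dir/hadoop/hadoop-placeholder.deb" \
      "$output_dir/hadoop/hadoop-placeholder.tar.gz" \
      "$output_dir/zookeeper/zookeeper-placeholder.deb"

# Count regular files ending in .EXT under output_dir.
# Quoting "*.$1" stops the shell from expanding the glob before find sees it.
count_by_ext() {
  find "$output_dir" -type f -name "*.$1" | wc -l | tr -d ' '
}

for ext in deb tar.gz rpm; do
  echo "$ext: $(count_by_ext "$ext")"
done
```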

13) Create an SSH key with an empty passphrase: ssh-keygen -t dsa -P ''
14) Install the Hadoop .deb package using sudo dpkg -i xxx.deb
15) Modify the hadoop-env.sh file to include JAVA_HOME
16) Configure core-site.xml, hdfs-site.xml and mapred-site.xml. There are multiple copies of these three files on your system, so make sure you modify the set that corresponds to the start-all.sh command you are going to invoke to start the cluster.
17) Format the NameNode.
18) Start HDFS (start-dfs.sh) and test that you can read and write to the Hadoop namespace.
19) Start the JobTracker and TaskTracker daemons for MapReduce using start-mapred.sh.
20) Run the pi example from the earlier lab to verify that MapReduce works.
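For step 16, here is a minimal sketch of a single-node configuration written by a small shell helper. The property names fs.default.name, dfs.replication and mapred.job.tracker are the Hadoop 0.20-era names; the localhost ports 9000 and 9001 are assumptions, and write_site and the scratch conf_dir are hypothetical. On a real install, point conf_dir at the configuration directory that matches the start scripts you will run.

```shell
#!/bin/sh
# Sketch: write single-node *-site.xml files into a scratch directory.
# conf_dir, write_site and the ports 9000/9001 are hypothetical choices.
set -eu

conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT

# write_site FILE NAME VALUE: write a *-site.xml containing one property.
write_site() {
  cat > "$conf_dir/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_site core-site.xml   fs.default.name    hdfs://localhost:9000
write_site hdfs-site.xml   dfs.replication    1
write_site mapred-site.xml mapred.job.tracker localhost:9001

cat "$conf_dir/core-site.xml"
```

With the files in place, the NameNode format in step 17 is done with hadoop namenode -format before starting the daemons.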
