Friday, October 26, 2012

Twitter Storm Installation Example

As of October 2012, the install instructions that come up in a web search for Twitter Storm didn't work for me.

You are supposed to use JDK 1.6.x, but I initially left 1.7.x in place. Apache Hadoop components have known bugs on JDKs other than 1.6.x, and the 1.6.x requirement here is mostly for Apache ZooKeeper compatibility. I have no information on how the other Storm components, or the different versions of Clojure, behave on different JDKs.

 [dc@vivian-y1639vf3 ~]$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)



Remove JDK 7 and OpenJDK. While these JDKs may work, there are known issues with OpenJDK and Hadoop. Since ZooKeeper is a Hadoop component, it wasn't worth the time investment to figure out whether they were compatible. The same goes for JDK 7.


Download java-6u34-linux-x64.bin, then make it executable and run it:
>sudo chmod 777 java-6u34-linux-x64.bin
>./java-6u34-linux-x64.bin

Most Hadoop components only run on JDK 6, not the more recent JDK 7. ZooKeeper is a Hadoop-related component and may work on JDK 7, but the core Hadoop programs like HDFS haven't been debugged on JDK 7 yet. Most Hadoop-related components also aren't guaranteed to run on OpenJDK; there are known issues with OpenJDK that nobody is actively working on debugging.

Once the Java 6 JDK is installed, edit .bashrc to set the environment variables:
>cd
>nano .bashrc
If nano isn't installed:
>sudo yum install nano
Edit .bashrc to add JAVA_HOME and add $JAVA_HOME/bin to PATH.

My .bashrc file looks like:
[dc@vivian-y1639vf3 ~]$ cat .bashrc

# .bashrc



# Source global definitions

if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export MONGO_HOME=/home/dc/mongodb-linux-x86_64-2.2.0
export JAVA_HOME=/home/dc/jdk1.6.0_34
#export STORM_HOME=/home/dc/storm-0.8.1
export PATH=$PATH:$PATH/bin:$JAVA_HOME/bin

Read the Google Groups for updates on Storm installs. Fortunately there was a recent post. Let's see if the Storm packaging works.

The post says to install a ZooKeeper cluster from CDH 4. I use Apache Bigtop instead, which is an alternate way to install ZooKeeper: use the bigtop-0.5.0 directions.

Run this command:

>sudo wget -O /etc/yum.repos.d/bigtop.repo http://archive.apache.org/dist/bigtop/bigtop-0.5.0/repos/centos6/bigtop.repo

After the command runs you should see the new repo file at /etc/yum.repos.d/bigtop.repo.


To check that we can now install ZooKeeper, run:
> sudo yum update
You don't have to answer Y to the update prompts; answer N and bigtop.repo is still registered with the yum service. Then:
>yum search zookeeper

============================ N/S Matched: zookeeper ============================
zookeeper-server.noarch : The Hadoop Zookeeper server
zookeeper.noarch : A high-performance coordination service for distributed
                 : applications.

Install zookeeper using: 

>sudo yum install zookeeper-server.noarch zookeeper.noarch



On CentOS we can install with either RPM or YUM. Use YUM, since it takes care of any dependencies.

YUM works from files ending in .repo. The convention on CentOS is to store the repo files for all software packages under /etc/yum.repos.d/. Each repo file records the URL where the software can be downloaded; when you run sudo yum update, yum reads every repo file in this directory and pulls in updates from the repo URLs when they are available. Following the CDH 4 link above, the repo file for CDH4 (of which ZooKeeper is one component) should look like the one below.

On that page, click Add the CDH 4 repository under the heading On Red Hat-compatible Systems, then click the RedHat/CentOS 6 link under the heading To add the CDH4 repository. It shows this repo file:
[cloudera-cdh4]
name=Cloudera's Distribution for Hadoop, Version 4
baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
gpgkey = http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera    
gpgcheck = 1

If you can't reach that page, just cut and paste the above contents into a file and store it as /etc/yum.repos.d/cdh.repo.

Then do:
> sudo yum update
The system will download and update a lot of packages; just answer Y to the prompts.

After it finishes (it can take many minutes), verify zookeeper-server is available using:

>yum search zookeeper-server

You should see it listed:

[dc@vivian-y1639vf3 ~]$ yum search zookeeper-server
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
* base: mirror.nwresd.org
* extras: centos.mirror.lstn.net
* updates: centos.mirrors.hoobly.com
======================== N/S Matched: zookeeper-server =========================
zookeeper-server.noarch : The Hadoop Zookeeper server
Name and summary matches only, use "search all" for everything.
[dc@vivian-y1639vf3 ~]$



OK, now install zookeeper-server:

>sudo yum install zookeeper-server

and answer Y when it prompts. You should see:

Is this ok [y/N]: y

Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : bigtop-utils-0.4+352-1.cdh4.1.0.p0.28.el6.noarch 1/3
Installing : zookeeper-3.4.3+25-1.cdh4.1.0.p0.28.el6.noarch 2/3
Installing : zookeeper-server-3.4.3+25-1.cdh4.1.0.p0.28.el6.noarch 3/3
Verifying : zookeeper-3.4.3+25-1.cdh4.1.0.p0.28.el6.noarch 1/3
Verifying : bigtop-utils-0.4+352-1.cdh4.1.0.p0.28.el6.noarch 2/3
Verifying : zookeeper-server-3.4.3+25-1.cdh4.1.0.p0.28.el6.noarch 3/3



Installed:
zookeeper-server.noarch 0:3.4.3+25-1.cdh4.1.0.p0.28.el6
Dependency Installed:
bigtop-utils.noarch 0:0.4+352-1.cdh4.1.0.p0.28.el6
zookeeper.noarch 0:3.4.3+25-1.cdh4.1.0.p0.28.el6
Complete!

[dc@vivian-y1639vf3 ~]$


OK, let's test ZooKeeper first. Usually these Hadoop components work out of the box, but they may need some configuration parameters modified first.

One of the advantages of using Cloudera's CDH is that they do the compatibility testing for us: we know ZooKeeper works with the other components like Hadoop, Bigtop, etc. That doesn't matter much here, because we are just going to use ZooKeeper by itself with Storm.

One of the confusing parts of CDH, or any YUM-based installer, is that there are certain conventions nobody really explains. The config files for a YUM-installed component usually go under /etc; for ZooKeeper they are under /etc/zookeeper/conf.

There are 4 files:

[dc@vivian-y1639vf3 conf]$ ls
configuration.xsl log4j.properties zoo.cfg zoo_sample.cfg
[dc@vivian-y1639vf3 conf]$


OK, this is good. log4j.properties is the standard logging config for Java programs; it controls the log format and location, and we don't need to mess with it. zoo.cfg is the important one. The last line in that file specifies port 2181 as the default client port. We need to connect to it to make sure our ZooKeeper server works, and we may have to configure Storm to connect there as well. This is for one ZooKeeper instance; I'm not sure if Storm is happy with only one. Usually ZooKeeper servers run in a group, so if one fails the others still provide state data for the rest of the cluster. It is designed to avoid being a single point of failure while keeping administration of the cluster easy.

If storm requires a cluster of zookeeper servers in an ensemble then we can set that up using these instructions: http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html This is out of scope for now.

OK, let's try starting ZooKeeper. The convention for CDH installs is:
>sudo service zookeeper-server start

OK, I get an error like this:

[dc@vivian-y1639vf3 ~]$ sudo service zookeeper-server start
[sudo] password for dc:
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
ZooKeeper data directory is missing at /var/lib/zookeeper fix the path or run initialize


Time for a web search: you have to run the server init first. Didn't know that.

[dc@vivian-y1639vf3 ~]$ sudo service zookeeper-server init
No myid provided, be sure to specify it in /var/lib/zookeeper/myid if using non-standalone

OK, another error, and another web search. The results were kind of vague, so just create a myid file and see what happens.

[dc@vivian-y1639vf3 ~]$ sudo nano /var/lib/zookeeper/myid


I entered the integer 1 and saved the file. This integer corresponds to a server.N entry in the ZooKeeper config when running in distributed mode, where you run more than one ZooKeeper. Here is an example config for a cluster of 4 ZooKeepers:


maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.4=172.16.144.252:2888:3888
server.3=172.16.144.251:2888:3888
server.2=172.16.144.250:2888:3888
server.1=172.16.144.249:2888:3888


NOTE: The server.1 setting refers to the 1 in the myid setting you just created. 

Start again:
[dc@vivian-y1639vf3 ~]$ sudo service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED



OK, the server looks to be up. We should do 2 things for verification, which apply to all Hadoop components (plus a quick connection check, sketched after the list):
  1. Look at the logs to make sure there is nothing funky going on.
  2. See if there is an HTTP port with an administration interface showing everything is good.
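Before digging into logs, there is also a quick built-in health check: ZooKeeper answers the four-letter command ruok on its client port with imok when it is healthy. Here is a minimal Java sketch of that check (my own addition, not part of the CDH docs; any tool that can open a socket works just as well):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class ZkRuok {
    public static void main(String[] args) throws Exception {
        // open a raw socket to the clientPort from zoo.cfg
        Socket s = new Socket("localhost", 2181);
        s.getOutputStream().write("ruok".getBytes());
        s.getOutputStream().flush();
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
        // a healthy server replies "imok" and closes the connection
        System.out.println(in.readLine());
        s.close();
    }
}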

By convention, all logs for a YUM-installed component are under /var/log. In ZooKeeper's case we have 2 logs, zookeeper.log and zookeeper.out:

[dc@vivian-y1639vf3 ~]$ ls /var/log/zookeeper

zookeeper.log zookeeper.out

[dc@vivian-y1639vf3 ~]$


Let's just see what is in them:

[dc@vivian-y1639vf3 ~]$ cat /var/log/zookeeper/zookeeper.log
2012-10-05 22:38:29,743 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
2012-10-05 22:38:29,758 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2012-10-05 22:38:29,758 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2012-10-05 22:38:29,759 [myid:] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2012-10-05 22:38:29,759 [myid:] - WARN [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
2012-10-05 22:38:29,775 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /etc/zookeeper/conf/zoo.cfg
2012-10-05 22:38:29,776 [myid:] - INFO [main:ZooKeeperServerMain@100] - Starting server
2012-10-05 22:38:29,792 [myid:] - INFO [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.0--1, built on 09/29/2012 17:54 GMT
2012-10-05 22:38:29,794 [myid:] - INFO [main:Environment@100] - Server environment:host.name=vivian-y1639vf3
2012-10-05 22:38:29,795 [myid:] - INFO [main:Environment@100] - Server environment:java.version=1.7.0_07
2012-10-05 22:38:29,795 [myid:] - INFO [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
2012-10-05 22:38:29,795 [myid:] - INFO [main:Environment@100] - Server environment:java.home=/usr/java/jre1.7.0_07
2012-10-05 22:38:29,796 [myid:] - INFO [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.0.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/zookeeper-3.4.3-cdh4.1.0.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/lib/netty-3.2.2.Final.jar
2012-10-05 22:38:29,796 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2012-10-05 22:38:29,797 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2012-10-05 22:38:29,797 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=
2012-10-05 22:38:29,798 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2012-10-05 22:38:29,798 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64
2012-10-05 22:38:29,799 [myid:] - INFO [main:Environment@100] - Server environment:os.version=2.6.32-279.9.1.el6.x86_64
2012-10-05 22:38:29,799 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2012-10-05 22:38:29,800 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/var/run/zookeeper
2012-10-05 22:38:29,800 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/
2012-10-05 22:38:29,808 [myid:] - INFO [main:ZooKeeperServer@726] - tickTime set to 2000
2012-10-05 22:38:29,808 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2012-10-05 22:38:29,809 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2012-10-05 22:38:29,844 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2012-10-05 22:38:29,859 [myid:] - INFO [main:FileTxnSnapLog@270] - Snapshotting: 0x0 to /var/lib/zookeeper/version-2/snapshot.0

[dc@vivian-y1639vf3 ~]$


Looks OK: no ERROR messages, just INFO. The Hadoop convention is to log at ERROR level if something is wrong.

How about the other one?

[dc@vivian-y1639vf3 ~]$ cat /var/log/zookeeper/zookeeper.out
[dc@vivian-y1639vf3 ~]$


Blank, so the logs are good. Now let's see if we can find an admin interface.


It looks like there is no admin web page, but there is a client called zkCli.sh you can use to test the connection, as described in the link above:

cd to /usr/lib/zookeeper

>bin/zkCli.sh -server localhost:2181

OK, this makes sense because port 2181 was the last entry in the zoo.cfg file we saw earlier. Let's try the above command. Note there is no bin directory under the home directory because we installed with YUM instead of downloading ZooKeeper directly; the scripts live under /usr/lib/zookeeper.

[dc@vivian-y1639vf3 ~]$ /usr/lib/zookeeper/bin/zkCli.sh -server localhost:2181

Connecting to localhost:2181

2012-10-05 22:52:17,904 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.0--1, built on 09/29/2012 17:54 GMT
2012-10-05 22:52:17,908 [myid:] - INFO [main:Environment@100] - Client environment:host.name=vivian-y1639vf3
2012-10-05 22:52:17,908 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.6.0_34
2012-10-05 22:52:17,909 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Sun Microsystems Inc.
2012-10-05 22:52:17,909 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/home/dc/jdk1.6.0_34/jre
2012-10-05 22:52:17,909 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.0.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
2012-10-05 22:52:17,910 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/home/dc/jdk1.6.0_34/jre/lib/amd64/server:/home/dc/jdk1.6.0_34/jre/lib/amd64:/home/dc/jdk1.6.0_34/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2012-10-05 22:52:17,910 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp

2012-10-05 22:52:17,911 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=
2012-10-05 22:52:17,911 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2012-10-05 22:52:17,912 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2012-10-05 22:52:17,912 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-279.9.1.el6.x86_64
2012-10-05 22:52:17,912 [myid:] - INFO [main:Environment@100] - Client environment:user.name=dc
2012-10-05 22:52:17,913 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/dc
2012-10-05 22:52:17,913 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/dc
2012-10-05 22:52:17,915 [myid:] - INFO [main:ZooKeeper@433] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3dac2f9c
Welcome to ZooKeeper!

2012-10-05 22:52:18,143 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@958] - Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

JLine support is enabled

2012-10-05 22:52:18,213 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@850] - Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session

[zk: localhost:2181(CONNECTING) 0] 2012-10-05 22:52:18,588 [myid:] - INFO [main-SendThread(localhost.localdomain:2181):ClientCnxn$SendThread@1187] - Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x13a3494fb6a0000, negotiated timeout = 30000



WATCHER::



WatchedEvent state:SyncConnected type:None path:null



[zk: localhost:2181(CONNECTED) 0]

OK, type help to make sure the client interface to the server works:

ZooKeeper -server host:port cmd args

connect host:port
get path [watch]
ls path [watch]
set path data [version]
rmr path
delquota [-n|-b] path
quit
printwatches on|off
create [-s] [-e] path data acl
stat path [watch]
close
ls2 path [watch]
history
listquota path
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
setquota -n|-b val path
[zk: localhost:2181(CONNECTED) 1]
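The same smoke test can be done programmatically. Here is a minimal sketch using the ZooKeeper Java client (my own example, not from the original walkthrough; it assumes the zookeeper jar from /usr/lib/zookeeper plus its slf4j/log4j dependencies are on the classpath):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkSmokeTest {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // connect to the clientPort we saw in zoo.cfg
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();
        // list the root znode; a fresh standalone server shows [zookeeper]
        System.out.println(zk.getChildren("/", false));
        zk.close();
    }
}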



OK, good enough I think. On to Storm.

Back to the original link for Storm: download the storm package and follow step 2) on that page.

OK, this is strange: you have to become the root user. For correctly built packages on Linux you shouldn't have to do this. It looks like this isn't a polished packager like RPM or YUM; there are a bunch of scripts you are supposed to run to change users, groups, etc. Something's off, but we can try it his way. The problem with a root install is that only root can access these programs.

Install uuid first.

[dc@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]$ yum search uuid
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
* base: mirror.nwresd.org
* extras: centos.mirror.lstn.net
* updates: centos.mirrors.hoobly.com
============================== N/S Matched: uuid ===============================
uuidd.x86_64 : Helper daemon to guarantee uniqueness of time-based UUIDs
libuuid.i686 : Universally unique ID library
libuuid.x86_64 : Universally unique ID library
libuuid-devel.i686 : Universally unique ID library
libuuid-devel.x86_64 : Universally unique ID library
uuid.i686 : Universally Unique Identifier library
uuid.x86_64 : Universally Unique Identifier library
uuid-c++.i686 : C++ support for Universally Unique Identifier library
uuid-c++.x86_64 : C++ support for Universally Unique Identifier library
uuid-c++-devel.i686 : C++ development support for Universally Unique Identifier
: library
uuid-c++-devel.x86_64 : C++ development support for Universally Unique
: Identifier library
uuid-dce.i686 : DCE support for Universally Unique Identifier library
uuid-dce.x86_64 : DCE support for Universally Unique Identifier library
uuid-dce-devel.i686 : DCE development support for Universally Unique Identifier
: library
uuid-dce-devel.x86_64 : DCE development support for Universally Unique
: Identifier library
uuid-devel.i686 : Development support for Universally Unique Identifier library
uuid-devel.x86_64 : Development support for Universally Unique Identifier
: library
uuid-perl.x86_64 : Perl support for Universally Unique Identifier library
uuid-pgsql.x86_64 : PostgreSQL support for Universally Unique Identifier library
uuid-php.x86_64 : PHP support for Universally Unique Identifier library



Name and summary matches only, use "search all" for everything.

[dc@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]$ sudo yum install uuid.x86_64
[sudo] password for dc:
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
* base: mirrors.arpnetworks.com
* extras: mirror.stanford.edu
* updates: mirrors.xmission.com
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package uuid.x86_64 0:1.6.1-10.el6 will be installed
--> Finished Dependency Resolution



Dependencies Resolved
===============================================================================================
Package Arch Version Repository Size
===============================================================================================

Installing:
uuid x86_64 1.6.1-10.el6 base 54 k
Transaction Summary
===============================================================================================

Install 1 Package(s)



Total download size: 54 k
Installed size: 113 k
Is this ok [y/N]: y
Downloading Packages:
uuid-1.6.1-10.el6.x86_64.rpm | 54 kB 00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : uuid-1.6.1-10.el6.x86_64 1/1
Verifying : uuid-1.6.1-10.el6.x86_64 1/1



Installed:
uuid.x86_64 0:1.6.1-10.el6

Complete!
[dc@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_
>su -
We are now in root's home directory; go back to where the downloads are:
>cd /home/dc/Downloads
>cd storm-installer-0.8.0_1.el6.x86_64
OK, following the instructions from the web page:
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# rpm -ivh zeromq-2.1.7-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:zeromq ########################################### [100%]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# rpm -ivh zeromq-devel-2.1.7-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:zeromq-devel ########################################### [100%]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# rpm -ivh jzmq-2.1.0-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:jzmq ########################################### [100%]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# rpm -ivh storm-0.8.0-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:storm ########################################### [100%]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# rpm -ivh storm-service-0.8.0-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:storm-service ########################################### [100%]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# sudo updatedb
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# ls /opt/storm/conf/storm.yaml
/opt/storm/conf/storm.yaml
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# nano /opt/storm/conf/storm.yaml
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# service storm-nimbus start
Starting storm nimbus...
Storm nimbus is running. [ OK ]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# service storm-ui start
Starting storm ui...
Storm ui is running. [ OK ]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]# service storm-supervisor start
Starting storm supervisor...
Storm supervisor is running. [ OK ]
[root@vivian-y1639vf3 storm-installer-0.8.0_1.el6.x86_64]#


Verifying Storm
Once the daemons are started, verify using ps.

>ps -ef | grep storm

[root@vivian-y1639vf3 ~]# ps -ef | grep storm

root 5826 1 0 Oct05 pts/1 00:00:48 java -server -Xmx768m -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64 -Dstorm.options= -Dstorm.home=/opt/storm -Dlogfile.name=nimbus.log -Dlog4j.configuration=storm.log.properties -cp /opt/storm/lib/*:/opt/storm/storm-0.8.0.jar:/opt/storm/conf:/opt/storm/log4j backtype.storm.daemon.nimbus

root 5859 1 0 Oct05 pts/1 00:00:34 java -server -Xmx768m -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64 -Dstorm.options= -Dstorm.home=/opt/storm -Dlogfile.name=ui.log -Dlog4j.configuration=storm.log.properties -cp /opt/storm/lib/*:/opt/storm/storm-0.8.0.jar:/opt/storm/conf:/opt/storm/log4j:/opt/storm backtype.storm.ui.core

root 5901 1 0 Oct05 pts/1 00:01:02 java -server -Xmx1024m -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64 -Dstorm.options= -Dstorm.home=/opt/storm -Dlogfile.name=supervisor.log -Dlog4j.configuration=storm.log.properties -cp /opt/storm/lib/*:/opt/storm/storm-0.8.0.jar:/opt/storm/conf:/opt/storm/log4j backtype.storm.daemon.supervisor

root 9124 9046 0 02:26 pts/1 00:00:00 grep storm


You can also look at the config files for more clues about how Storm works. These show the default values for Storm; storm.yaml is used to override them. We may have to change the parameter storm.cluster.mode from distributed to local. The admin UI is on port 8080, from the ui.port setting.
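To see the merged configuration from code, Storm has a helper that loads the built-in defaults and applies the storm.yaml overrides. A small sketch (my own, assuming the storm jar and its lib/ jars are on the classpath, and that the 0.8-era Utils.readStormConfig() helper behaves as described):

import java.util.Map;
import backtype.storm.utils.Utils;

public class ShowStormConf {
    public static void main(String[] args) {
        // readStormConfig() merges defaults.yaml with storm.yaml overrides
        Map conf = Utils.readStormConfig();
        System.out.println("storm.cluster.mode = " + conf.get("storm.cluster.mode"));
        System.out.println("ui.port = " + conf.get("ui.port"));
    }
}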

OK, let's test the Maven setup by compiling and running the Java sample code in storm-starter.

The instructions at https://github.com/nathanmarz/storm-starter say in the last section:

To compile and run WordCountTopology in local mode, use this command:
mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.WordCountTopology

Let's try setting storm.yaml to local mode first.
root>updatedb
root>locate storm.yaml
[root@vivian-y1639vf3 storm]# locate storm.yaml

/home/dc/storm-0.8.1/conf/storm.yaml

/opt/storm-0.8.0/conf/storm.yaml


root@vivian-y1639vf3> nano /opt/storm-0.8.0/conf/storm.yaml



########### These MUST be filled in for a storm configuration

storm.zookeeper.servers:

- "localhost"



nimbus.host: "localhost"



#

# ##### These may optionally be filled in:

#

## List of custom serializations

# topology.kryo.register:

# - org.mycompany.MyType

# - org.mycompany.MyType2: org.mycompany.MyType2Serializer

#

## Locations of the drpc servers

# drpc.servers:

# - "server1"

# - "server2"



java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64"



storm.local.dir: "/opt/storm"

#storm.cluster.mode: "local"




OK, I had to comment out the local-mode line above; it didn't work. I found this in /var/log/storm/nimbus.log:
2012-10-06 02:49:07 NIOServerCnxn [ERROR] Thread Thread[main,5,main] died
java.lang.IllegalArgumentException: Cannot start server in local mode!
at backtype.storm.daemon.common$validate_distributed_mode_BANG_.invoke(common.clj:77)
at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1078)
at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1110)
at backtype.storm.daemon.nimbus$_main.invoke(nimbus.clj:1134)
at clojure.lang.AFn.applyToHelper(AFn.java:159)
at clojure.lang.AFn.applyTo(AFn.java:151)
at backtype.storm.daemon.nimbus.main(Unknown Source)

Back to distributed mode. Crappy docs. In hindsight the error makes sense: the nimbus daemon refuses to start in local mode, because local mode is meant to run inside a single JVM from your topology code, not via the cluster daemons.
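For reference, local mode is something you get by using LocalCluster in your own topology code rather than by reconfiguring the daemons. A minimal sketch along those lines (my own example; TestWordSpout ships with the storm jar in backtype.storm.testing):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class LocalModeExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 1);
        Config conf = new Config();
        conf.setDebug(true);
        // LocalCluster simulates a whole cluster inside this JVM --
        // no nimbus/supervisor daemons and no storm.yaml edits needed
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("local-test", conf, builder.createTopology());
        Thread.sleep(10000); // let it emit for ten seconds
        cluster.shutdown();
    }
}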



Restart the daemons. I forgot what they were called; by convention the init scripts are under /etc/init.d, so let's ls and grep for them:

[root@vivian-y1639vf3 storm]# ls /etc/init.d | grep storm
storm-nimbus
storm-supervisor
storm-ui
[root@vivian-y1639vf3 storm]#

Now let's restart them:

[root@vivian-y1639vf3 storm]# service storm-nimbus start
Starting storm nimbus...
Storm nimbus is running. [ OK ]

[root@vivian-y1639vf3 storm]# service storm-supervisor restart
Stopping storm supervisor...
Storm supervisor is stopped. [ OK ]
Starting storm supervisor...
Storm supervisor is running. [ OK ]

[root@vivian-y1639vf3 storm]# service storm-ui restart
Stopping storm ui...
Storm ui is stopped. [ OK ]
Starting storm ui...
Storm ui is running. [ OK ]
[root@vivian-y1639vf3 storm]#



Test the Storm CLI (the command-line interface); this isn't well documented. It lives at /opt/storm-0.8.0/bin/storm. Run this program with no arguments and you should get something like this:

[root@vivian-y1639vf3 storm-starter]# /opt/storm-0.8.0/bin/storm

Commands:
activate
classpath
deactivate
dev-zookeeper
drpc
help
jar
kill
list
localconfvalue
nimbus
rebalance
remoteconfvalue
repl
shell
supervisor
ui
version
Help:
help
help <command>



Documentation for the storm client can be found at https://github.com/nathanmarz/storm/wiki/Command-line-client



Configs can be overridden using one or more -c flags, e.g. "storm list -c nimbus.host=nimbus.mycompany.com"



[root@vivian-y1639vf3 storm-starter]#



OK, back to Maven to try to run the sample starter word count program.

Do a web search for Apache Maven and download it.

Unzip and untar the download:
>gunzip apache-maven-3.0.4.tar.gz
>tar -xvf apache-maven-3.0.4.tar

cd into the newly extracted directory, apache-maven-3.0.4, and run pwd to get the full path.

[dc@vivian-y1639vf3 apache-maven-3.0.4]$ pwd

/home/dc/apache-maven-3.0.4

[dc@vivian-y1639vf3 apache-maven-3.0.4]$


cd back to your home directory, set MAVEN_HOME to the pwd path /home/dc/apache-maven-3.0.4, and add $MAVEN_HOME/bin to PATH, following the same convention as the Java setup. Your .bashrc file should look like this:


[dc@vivian-y1639vf3 apache-maven-3.0.4]$ cat ~/.bashrc

# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

# User specific aliases and functions
export MONGO_HOME=/home/dc/mongodb-linux-x86_64-2.2.0
export JAVA_HOME=/home/dc/jdk1.6.0_34
#export STORM_HOME=/home/dc/storm-0.8.1
export MAVEN_HOME=/home/dc/apache-maven-3.0.4
export PATH=$PATH:$PATH/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin




Run >source .bashrc to put the Maven bin directory on your PATH.

Test that you have Maven by running >mvn -v:

[dc@vivian-y1639vf3 apache-maven-3.0.4]$ mvn -v
Apache Maven 3.0.4 (r1232337; 2012-01-17 00:44:56-0800)
Maven home: /home/dc/apache-maven-3.0.4
Java version: 1.6.0_34, vendor: Sun Microsystems Inc.
Java home: /home/dc/jdk1.6.0_34/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-279.9.1.el6.x86_64", arch: "amd64", family: "unix"
[dc@vivian-y1639vf3 apache-maven-3.0.4]$



OK we are good to go. Now we can try the maven storm-starter instructions here: https://github.com/nathanmarz/storm-starter


cd into storm-starter; you should be in the same directory as the m2-pom.xml file.

Since everything above was installed as root, become root before building:

>su -

[dc@vivian-y1639vf3 storm-starter]$ ls

LICENSE m2-pom.xml multilang project.clj README.markdown src target
[dc@vivian-y1639vf3 storm-starter]$


Let's build the source and create the jar files for storm-starter first:
>mvn -f m2-pom.xml package

You should see long output:
[root@vivian-y1639vf3 storm-starter]# mvn -f m2-pom.xml package

The output is too big to include here, but you should see a lot of lines like:
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.jar

Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/surefire/surefire-junit3/2.10/surefire-junit3-2.10.jar (26 KB at 171.3 KB/sec)


At the end you should see a build success message like this:

[root@vivian-y1639vf3 storm-starter]# mvn -f m2-pom.xml package
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for storm.starter:storm-starter:jar:0.0.1-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 127, column 12
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building storm-starter 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ storm-starter ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 4 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ storm-starter ---
[INFO] Compiling 2 source files to /home/dc/storm-starter/target/classes
[INFO]
[INFO] --- clojure-maven-plugin:1.3.8:compile (compile) @ storm-starter ---
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ storm-starter ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/dc/storm-starter/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ storm-starter ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ storm-starter ---
[INFO] No tests to run.
[INFO] Surefire report directory: /home/dc/storm-starter/target/surefire-reports
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0


[INFO]
[INFO] --- clojure-maven-plugin:1.3.8:test (test) @ storm-starter ---



Testing com.theoryinpractise.clojure.testrunner



Ran 0 tests containing 0 assertions.
0 failures, 0 errors.
[INFO]
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ storm-starter ---
[INFO]
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @ storm-starter ---
[INFO] META-INF/ already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] twitter4j/ already added, skipping
[INFO] META-INF/LICENSE.txt already added, skipping
[INFO] META-INF/maven/ already added, skipping
[INFO] META-INF/maven/org.twitter4j/ already added, skipping
[INFO] META-INF/ already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] Building jar: /home/dc/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] META-INF/ already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] twitter4j/ already added, skipping
[INFO] META-INF/LICENSE.txt already added, skipping
[INFO] META-INF/maven/ already added, skipping
[INFO] META-INF/maven/org.twitter4j/ already added, skipping
[INFO] META-INF/ already added, skipping
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.059s
[INFO] Finished at: Sat Oct 06 04:11:49 PDT 2012
[INFO] Final Memory: 17M/270M
[INFO] -------------------------------




If you get a build failure with "permission denied", you aren't running as root.

Once the jars are built you can run the word count program. (Note the instructions on the web page list these two steps in the reverse order.)

You should still be in the storm-starter directory.

[root@vivian-y1639vf3 storm-starter]# mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.WordCountTopology


You should see a BUILD SUCCESS message at the end and output showing words being counted:
11505 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["four"]
11505 [Thread-21] INFO backtype.storm.daemon.executor - Processing received message source: split:5, stream: default, id: {}, ["four"]
11506 [Thread-21] INFO backtype.storm.daemon.task - Emitting: count default [four, 58]
11506 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["score"]
11506 [Thread-21] INFO backtype.storm.daemon.executor - Processing received message source: split:5, stream: default, id: {}, ["score"]
11506 [Thread-21] INFO backtype.storm.daemon.task - Emitting: count default [score, 58]
11506 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["and"]
11507 [Thread-23] INFO backtype.storm.daemon.executor - Processing received message source: split:5, stream: default, id: {}, ["and"]
11507 [Thread-23] INFO backtype.storm.daemon.task - Emitting: count default [and, 103]
11507 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["seven"]
11507 [Thread-19] INFO backtype.storm.daemon.executor - Processing received message source: split:5, stream: default, id: {}, ["seven"]
11508 [Thread-19] INFO backtype.storm.daemon.task - Emitting: count default [seven, 103]
11508 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["years"]
11508 [Thread-19] INFO backtype.storm.daemon.executor - Processing received message source: split:5, stream: default, id: {}, ["years"]
11508 [Thread-19] INFO backtype.storm.daemon.task - Emitting: count default [years, 58]
11508 [Thread-25] INFO backtype.storm.daemon.task - Emitting: split default ["ago"]



For example, "years" has a count of 58 occurrences so far.
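The counting itself is a simple bolt that keeps a HashMap from word to count and emits the updated pair on every tuple, which is what produces the Emitting: count default [word, n] lines above. A sketch along the lines of the storm-starter code:

import java.util.HashMap;
import java.util.Map;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        // emits the running total, e.g. [years, 58]
        collector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}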




