Tuesday, December 09, 2014

Configuring single node Storm Cluster.


Apache Storm is a distributed real-time computation system for processing large, fast-moving streams of data. It adds real-time data processing to Apache Hadoop.
Storm was developed at BackType, a social analytics company that was later acquired by Twitter. You can read more about it in this tutorial.

The Storm installation can be separated into three parts (sections 1 to 3 below). Sections 4 and 5 then cover starting the cluster and uploading a topology.


1. ZOOKEEPER CLUSTER INSTALLATION 

Zookeeper is the coordinator for the Storm cluster. All interaction between nimbus and the worker nodes goes through Zookeeper.

Download the Zookeeper release and extract it:

$ wget http://www.eng.lsu.edu/mirrors/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz
$ tar -xvf zookeeper-3.4.6.tar.gz
$ mv zookeeper-3.4.6 zookeeper

Optionally:
    a)  Add ZOOKEEPER_HOME to .bashrc
    b)  Add ZOOKEEPER_HOME/bin to the PATH

export ZOOKEEPER_HOME=/home/hduser/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH


Create the data folder and update conf/zoo.cfg to point to it. By default dataDir is set to a location under /tmp, which is cleared on every boot. The rest of the default settings are good enough.

$ mkdir zookeeper-data/
$ cd zookeeper/conf
$ sudo mv zoo_sample.cfg zoo.cfg
$ sudo nano zoo.cfg


The main settings in zoo.cfg are:

tickTime: the basic time unit in milliseconds used by Zookeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.

dataDir: the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.

clientPort: the port to listen on for client connections.


Set dataDir to the folder created above:

dataDir=/home/hduser/zookeeper-data

Now your Zookeeper cluster is ready to start.
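For reference, here is a minimal sketch of what a standalone zoo.cfg could look like, written by a small shell snippet into a scratch directory so you can compare it with your edited file. The tickTime and clientPort values (2000 and 2181) are what I would expect to find in zoo_sample.cfg and are assumptions here; OUT_DIR is a made-up name used only for illustration.

```shell
#!/bin/sh
# Sketch: generate a minimal standalone zoo.cfg in a scratch directory.
# DATA_DIR matches the path used in this post; adjust it for your user.
DATA_DIR="/home/hduser/zookeeper-data"
OUT_DIR="$(mktemp -d)"

cat > "$OUT_DIR/zoo.cfg" <<EOF
# Basic time unit in milliseconds (heartbeats, session timeouts)
tickTime=2000
# Where snapshots (and, by default, transaction logs) are stored
dataDir=$DATA_DIR
# Port that clients such as Storm connect to
clientPort=2181
EOF

echo "Wrote $OUT_DIR/zoo.cfg"
cat "$OUT_DIR/zoo.cfg"
```

Nothing is copied into the Zookeeper installation; the file only serves as a reference for checking conf/zoo.cfg.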


Verify that you can start the Zookeeper server:

$ cd ..
$ bin/zkServer.sh start

JMX enabled by default
Using config: /home/hduser/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... ./zkServer.sh: line 109: ./zookeeper.out: Permission denied
STARTED

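I resolved this error by telling Zookeeper where to place its log file. As the message shows, the start script tried to write zookeeper.out to the current directory, which was not writable here.

$ sudo nano bin/zkEnv.sh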

Add this assignment at the top of the file:

ZOO_LOG_DIR=/var/log/zookeeper

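Then create that directory and hand it to the user that runs Zookeeper (the commands below assume a user named zookeeper; substitute your own, for example hduser):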

$ sudo mkdir /var/log/zookeeper
$ sudo chown zookeeper /var/log/zookeeper

$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/hduser/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

The problem is resolved!!!

$ jps
4117 Jps
3982 QuorumPeerMain
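
jps only shows that the process exists. A quick way to confirm that the server is also answering on its client port is Zookeeper's four-letter command ruok, which a healthy server answers with imok. A minimal sketch, assuming nc (netcat) is installed and the default clientPort 2181; zk_ruok is a helper name I made up for this post.

```shell
#!/bin/sh
# zk_ruok HOST PORT: succeed only if the server answers "imok" to "ruok".
# Assumes nc (netcat) is available; host and port default to localhost:2181.
zk_ruok() {
  host="${1:-localhost}"
  port="${2:-2181}"
  # -w 2 gives up after two seconds if nothing answers
  reply="$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)"
  [ "$reply" = "imok" ]
}

if zk_ruok localhost 2181; then
  echo "Zookeeper is answering on port 2181"
else
  echo "Zookeeper is not answering on port 2181"
fi
```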



2. INSTALL NATIVE DEPENDENCIES

Storm internally uses ZeroMQ. Download the ZeroMQ source, then compile and install it. Be careful with the versions.

$ wget http://download.zeromq.org/zeromq-4.0.5.tar.gz
$ tar -xzf zeromq-4.0.5.tar.gz
$ mv zeromq-4.0.5 zeromq
$ cd zeromq
$ ./configure
$ make
$ sudo make install

Install the git and libtool packages from the terminal. These are prerequisites for the next step.

$ sudo apt-get install libtool git

Download the code for jzmq. These are the Java bindings for zeromq. Compile and install it.

$ git clone https://github.com/nathanmarz/jzmq.git
$ cd jzmq

$ sed -i 's/classdist_noinst.stamp/classnoinst.stamp/g' src/Makefile.am
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
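
Both builds above use the default ./configure prefix, so the libraries should end up under /usr/local/lib. The sketch below simply checks for them there. LIB_DIR and lib_installed are names I made up for illustration, and the location is an assumption if you passed a different --prefix. On many Linux systems you may also need to run sudo ldconfig after make install so that the loader picks up the new libraries.

```shell
#!/bin/sh
# Sketch: check that the ZeroMQ and jzmq libraries were installed.
# LIB_DIR is assumed to be the default install location for ./configure.
LIB_DIR="${LIB_DIR:-/usr/local/lib}"

# lib_installed NAME: succeed if a file lib<NAME>.* exists in LIB_DIR.
lib_installed() {
  ls "$LIB_DIR"/lib"$1".* >/dev/null 2>&1
}

for lib in zmq jzmq; do
  if lib_installed "$lib"; then
    echo "lib$lib: found in $LIB_DIR"
  else
    echo "lib$lib: not found in $LIB_DIR"
  fi
done
```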


3. STORM INSTALLATION


Now we are all set to install Storm. Download the latest Storm release and extract it.


$ wget 'http://people.apache.org/~ptgoetz/apache-storm-0.9.3-rc1/apache-storm-0.9.3-rc1.tar.gz'
$ tar -xzvf apache-storm-0.9.3-rc1.tar.gz

$ mv apache-storm-0.9.3-rc1 storm

Next, configure Storm by editing its configuration file, ‘storm.yaml’, which is in the ‘conf’ folder of the extracted Storm root folder.

$ cd storm
$ nano conf/storm.yaml

storm.zookeeper.servers:
    - "localhost"

storm.local.dir: "/home/hduser/storm/data"

nimbus.host: "localhost"

supervisor.slots.ports:
    - 6700
    - 6701


4. RUN STORM TOPOLOGY 

Start the cluster


A) Start the Zookeeper cluster

To start the Zookeeper server, go to the ‘bin’ directory of the Zookeeper installation and execute the following command.

$ sudo sh zkServer.sh start



B) Start the Storm daemons

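The nimbus service is similar to the JobTracker and the supervisor service is similar to the TaskTracker in Hadoop. More details about the Storm terminology are specified here.

Run the following commands from the Storm root folder. Each one stays in the foreground, so start each daemon in its own terminal.

$ bin/storm nimbus

$ bin/storm supervisor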



C) Start the Storm UI

$ bin/storm ui

Use the Web UI to check the logs for any exceptions. Go to the Storm UI at http://localhost:8080.
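
The storm commands stay in the foreground, so a common alternative to one terminal per daemon is to start them in the background with nohup. A minimal sketch; STORM_HOME, the .out file names, and the helper start_storm_daemon are assumptions for illustration and not part of Storm itself.

```shell
#!/bin/sh
# Sketch: start a Storm daemon in the background and capture its console output.
# STORM_HOME is assumed to be the extracted Storm folder used in this post.
STORM_HOME="${STORM_HOME:-/home/hduser/storm}"

# start_storm_daemon NAME: NAME is nimbus, supervisor or ui.
start_storm_daemon() {
  name="$1"
  if [ ! -x "$STORM_HOME/bin/storm" ]; then
    echo "storm launcher not found under $STORM_HOME/bin" >&2
    return 1
  fi
  mkdir -p "$STORM_HOME/logs"
  # nohup keeps the daemon alive after the terminal is closed
  nohup "$STORM_HOME/bin/storm" "$name" > "$STORM_HOME/logs/$name.out" 2>&1 &
  echo "$name started in the background (PID $!)"
}
```

With this loaded into the shell, calling start_storm_daemon nimbus, then start_storm_daemon supervisor, then start_storm_daemon ui brings up the three daemons from a single terminal. Zookeeper must already be running.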



5. UPLOAD TOPOLOGY

To upload a topology to the Storm cluster, go to the ‘bin’ directory of the Storm installation and execute the following command:

$ storm jar <path-to-topology-jar> <class-with-the-main> <arg1> <arg2> <argN>

where:

<path-to-topology-jar>: the complete path to the compiled jar that contains your topology code and all of its libraries.

<class-with-the-main>: the class in the jar file whose main method calls StormSubmitter.

<arg1> <arg2> <argN>: the remaining arguments, which are passed to the main method as its parameters.


With the Storm running on a single node, now it's time to execute the sample code. Get the sample code from Git.

$ git clone https://github.com/nathanmarz/storm-starter.git

$ cd storm-starter

Package the code. After a successful build, storm-starter-*.jar is created in the target folder.

$ mvn -f m2-pom.xml package

Execute the WordCountTopology example from the Storm root folder. The job is submitted to Storm and control returns immediately. The last parameter, WordCount, is the topology name, which can be seen in the Storm UI. Check the logs for any exceptions.

$ bin/storm jar /home/hduser/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount
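
To confirm from the command line that the topology was accepted, the storm client's list command can be used, and kill removes the topology again. A sketch, assuming Storm lives in /home/hduser/storm; topology_is_listed is a made-up helper that just looks for the name as a whole word anywhere in the list output.

```shell
#!/bin/sh
# Sketch: check whether a topology shows up in `storm list`.
# STORM_HOME is assumed to be the extracted Storm folder used in this post.
STORM_HOME="${STORM_HOME:-/home/hduser/storm}"

# topology_is_listed NAME: succeed if NAME appears as a whole word in the output.
topology_is_listed() {
  "$STORM_HOME/bin/storm" list 2>/dev/null | grep -qw -- "$1"
}

if topology_is_listed WordCount; then
  echo "WordCount is running"
else
  echo "WordCount is not listed"
fi

# To stop the topology later:
#   "$STORM_HOME/bin/storm" kill WordCount
```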

You can use the following command to look at the source code of the word count topology:

~/storm-starter/src/jvm/storm/starter$ less WordCountTopology.java
