Monday, December 01, 2014

Apache Mahout: Scalable machine learning library

Apache Mahout is a project of the Apache Software Foundation that builds scalable machine learning algorithms, that is, algorithms that learn from input data. Mahout offers algorithms in three major areas: clustering, classification (categorization) and recommender systems.

  • Taste
Taste is the recommender-system part of Mahout: a consistent and flexible collaborative filtering engine. It provides a rich set of components from which you can build a customized recommender system, choosing among several algorithms. The package defines the following interfaces:
  1. DataModel
  2. UserSimilarity
  3. ItemSimilarity
  4. UserNeighborhood
  5. Recommender




This diagram shows the relationship between various Mahout components in a user-based recommender.


  • Installation (Ubuntu)

Here follows a step-by-step guide to installing and testing the Mahout recommender system.

1. Make sure you have the Java JDK installed.


$ java -version
java version "1.6.0_33"
OpenJDK Runtime Environment (IcedTea6 1.13.5) (6b33-1.13.5-1ubuntu0.12.04)
OpenJDK Client VM (build 23.25-b01, mixed mode, sharing)

2. Install the Maven build tool and check that it is on your PATH:

$ mvn -version

Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.6.0_33, vendor: Sun Microsystems Inc.
Java home: /usr/lib/jvm/java-6-openjdk-i386/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.8.0-44-generic", arch: "i386", family: "unix"
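
Steps 1 and 2 can also be checked in one go. The following is only a small sketch: check_tools is a hypothetical helper name (it is not part of Java, Maven or Mahout) that reports whether each tool you pass it is on the PATH.

```shell
# check_tools: hypothetical helper, not part of any of the tools above.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_tools java mvn` before going further shows which of the two still needs to be installed.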

3. Download a Hadoop version

I downloaded version 1.2.1. Be careful here: with Hadoop 2 you can run into compatibility problems with this version of Mahout.


$ tar xfz hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 /usr/local/hadoop


4. Download the Mahout package

I downloaded version 0.9: mahout-distribution-0.9-src.tar.gz

5. Unpack mahout-distribution-0.9-src.tar.gz and build it (this assumes the archive was downloaded to /opt)

$ cd /opt/
$ tar -xvzf mahout-distribution-0.9-src.tar.gz
$ cd mahout-distribution-0.9
$ sudo mvn install

This compiles Mahout's code and runs the unit tests that come with it, to make sure everything works.

If the build was successful, you should see something like:

[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 55 minutes 23 seconds
[INFO] Finished at: Tue Dec 01 10:15:02 BRT 2014
[INFO] Final Memory: 60M/275M
[INFO] ------------------------------------------------------------------------


6. Now use gedit (or your favorite editor) to edit ~/.bashrc:

$ gedit ~/.bashrc

This opens the .bashrc file in a text editor. Go to the end of the file and add the following content:

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"
#HADOOP VARIABLES END
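
After saving .bashrc, it is worth confirming that HADOOP_INSTALL really points at the unpacked Hadoop directory, since a wrong path only shows up later as a confusing error. A minimal sketch, assuming the hadoop launcher lives in bin/ under that directory; verify_hadoop_install is a hypothetical helper name:

```shell
# verify_hadoop_install: hypothetical helper, not part of Hadoop.
# Checks the directory given as $1, or $HADOOP_INSTALL if no argument is given.
verify_hadoop_install() {
    dir="${1:-${HADOOP_INSTALL:-}}"
    if [ -z "$dir" ]; then
        echo "HADOOP_INSTALL is not set"
        return 1
    fi
    if [ -x "$dir/bin/hadoop" ]; then
        echo "ok: $dir/bin/hadoop"
    else
        echo "not found: $dir/bin/hadoop"
        return 1
    fi
}
```

Reload the file with `. ~/.bashrc` (or open a new terminal), then run `verify_hadoop_install`. If it reports ok, `hadoop version` should also work from any directory.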

7. Run the recommender

Put your data in the following format, one rating per line:

userid,itemid,rating

$ cd /opt/mahout-distribution-0.9

For example, save the following data as mydata.dat in the directory where you installed Mahout:

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
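
Before running the job it can help to check that every line of the input really has the userid,itemid,rating shape, because stray whitespace or a missing field is easy to overlook. The sketch below assumes integer IDs and non-negative decimal ratings; validate_ratings is a hypothetical helper name, not a Mahout command.

```shell
# validate_ratings: hypothetical helper, not part of Mahout.
# Prints every line that is not "integer,integer,decimal" and
# returns non-zero if at least one such line was found.
validate_ratings() {
    awk -F',' '
        BEGIN { bad = 0 }
        NF != 3 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Run it as `validate_ratings mydata.dat`; no output means every line passed the check.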

Now create the file users.dat in the same folder, with one user ID per line.
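
If you want recommendations for every user in the data set, users.dat can be derived from the ratings file instead of being typed by hand. A minimal sketch, assuming the user ID is the first comma-separated field; list_users is a hypothetical helper name:

```shell
# list_users: hypothetical helper, not part of Mahout.
# Prints each distinct user ID from a userid,itemid,rating file, in numeric order.
list_users() {
    cut -d',' -f1 "$1" | sort -n -u
}
```

For example, `list_users mydata.dat > users.dat` writes all user IDs, while `printf '1\n3\n' > users.dat` would ask only for users 1 and 3.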

$ chmod 777 -R /opt/mahout-distribution-0.9

Now, run:

$ bin/mahout recommenditembased --input mydata.dat --usersFile users.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION

The usersFile lists the users you want recommendations for. You can change numRecommendations to the number of recommendations you want per user.
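
When the job finishes, the recommendations should be in the output directory. The sketch below rests on two assumptions that are worth checking against your own run: that the result file is output/part-r-00000, and that each line has the form userid, a tab, then a bracketed list of itemid:score pairs. flatten_recommendations is a hypothetical helper name that turns such lines into one "user item score" row per recommendation.

```shell
# flatten_recommendations: hypothetical helper, not part of Mahout.
# Assumed input line format: userid<TAB>[itemid:score,itemid:score,...]
flatten_recommendations() {
    awk -F'\t' '
        {
            list = $2
            gsub(/^\[|\]$/, "", list)        # strip the surrounding brackets
            n = split(list, pairs, ",")      # one entry per itemid:score pair
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")
                print $1, kv[1], kv[2]       # user, item, predicted score
            }
        }
    ' "$@"
}
```

Under those assumptions, `flatten_recommendations output/part-r-00000` prints the result in a form that is easy to sort or load into a spreadsheet. If Hadoop is configured to use HDFS, the file has to be copied out of HDFS first.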







1 comment:

  1. http://stackoverflow.com/questions/18767843/how-can-i-compile-using-mahout-for-hadoop-2-0