Thursday 14 April 2016

Install Apache Spark on Ubuntu 14.04


Install Scala

Scala and Spark run on the JVM, so first install Java on the Ubuntu machine:
apt-add-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java7-installer
java -version

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Next, download Scala 2.11.8 and unpack it under /usr/local/src/scala:

cd /usr/local/src/
wget http://www.scala-lang.org/files/archive/scala-2.11.8.tgz
sudo mkdir /usr/local/src/scala
sudo tar xvf scala-2.11.8.tgz -C /usr/local/src/scala/

Open ~/.bashrc (for example with vi ~/.bashrc) and append:

export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

Reload .bashrc so the changes take effect:

. ~/.bashrc

scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80).
Type in expressions for evaluation. Or try :help.
scala>
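
As a quick sanity check, evaluate an expression at the prompt; res0 is just the name the REPL assigns to the result, and :quit exits the REPL:

scala> 1 + 1
res0: Int = 2

scala> :quit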

Install Spark

cd /usr/local/src/
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1.tgz
tar xvf spark-1.6.1.tgz
cd spark-1.6.1/


Spark is built with SBT (Simple Build Tool), which is bundled with the distribution. To compile the code:
 
sbt/sbt assembly
NOTE: The sbt/sbt script has been relocated to build/sbt.
      Please update references to point to the new location.

      Invoking 'build/sbt assembly' now ...

Attempting to fetch sbt
Launching sbt from build/sbt-launch-0.13.7.jar
Getting org.scala-sbt sbt 0.13.7 ...
.....................................
[info] Packaging /usr/local/src/spark-1.6.1/assembly/target/scala-2.10/spark-assembly-1.6.1-hadoop2.2.0.jar ...
[info] Done packaging.
[success] Total time: 2597 s, completed Apr 14, 2016 5:01:34 PM

Test the build with a sample program:

./bin/run-example SparkPi 10
Pi is roughly 3.139988
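
SparkPi arrives at that figure by the Monte Carlo method: it scatters random points over a square and counts how many land inside the inscribed circle, whose area ratio is Pi/4. A minimal sketch of the same idea, runnable later in spark-shell where sc is predefined (the variable names and sample size here are mine, not the bundled example's):

// Monte Carlo estimate of Pi, in the spirit of the SparkPi example
val n = 1000000                               // assumed sample size
val inside = sc.parallelize(1 to n).filter { _ =>
  val x = math.random * 2 - 1                 // random point in [-1, 1] x [-1, 1]
  val y = math.random * 2 - 1
  x * x + y * y < 1                           // keep points inside the unit circle
}.count()
println("Pi is roughly " + 4.0 * inside / n)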

You can also run Spark interactively through the Scala shell:
./bin/spark-shell

scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:27

scala> textFile.count()
res0: Long = 95
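
The same RDD API handles the classic word count. A small sketch, still inside the shell (the whitespace split and the top-five display are my additions, not from the original post):

scala> val words = sc.textFile("README.md").flatMap(_.split("\\s+")).filter(_.nonEmpty)
scala> val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
scala> counts.sortBy(-_._2).take(5).foreach(println)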

scala> exit

You can also exercise particular Spark modules from the shell. For example, to use MQTT streaming interactively: the MQTT connector lives under external/, so package it and put the resulting jar on spark-shell's driver classpath:

build/sbt "streaming-mqtt/package"
bin/spark-shell --driver-class-path external/mqtt/target/scala-2.10/spark-streaming-mqtt_2.10-1.6.1.jar
scala> import org.apache.spark.streaming.mqtt._
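
With the import in place, a minimal streaming sketch looks like this; the broker URL and topic name are assumptions (any MQTT broker, such as Mosquitto, will do), so replace them with your own:

scala> import org.apache.spark.streaming.{Seconds, StreamingContext}
scala> val ssc = new StreamingContext(sc, Seconds(10))   // 10-second micro-batches
scala> val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "test-topic")   // assumed broker and topic
scala> lines.print()   // dump received messages to the console
scala> ssc.start()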

Ref : http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
