Create First Spark Project in Scala in Eclipse

Datetime: 2016-08-23 04:17:23          Topic: Spark, Scala


This step-by-step tutorial explains how to create a Spark project in Scala in Eclipse without Maven, and how to submit the application after building its jar. The guide also briefly covers installing the Scala plugin in Eclipse and setting up the Spark environment there, so you can learn how to configure a development environment for Spark applications in Scala. To learn more about Apache Spark, follow this introductory guide.

Platform Used / Required

  • Operating System: Windows / Linux / Mac
  • Java: Oracle Java 7
  • Scala: 2.11
  • Eclipse: Eclipse Luna, Mars or later

Install Eclipse plugin for Scala:

Open Eclipse Marketplace (Help >> Eclipse Marketplace), search for “scala ide”, and install the Scala IDE. Alternatively, you can download Eclipse for Scala.

Create a New Spark Scala Project

To create a new Spark Scala project, click on File >> New >> Other

Select Scala Project

Supply Project Name

Create New Package

After creating the project, create a new package.

Supply Package Name

Create a New Scala Object

Now create a new Scala object in which to develop the Scala program for the Spark application.

Select Scala Object

Supply Object Name

New Scala Object in Editor

The Scala object is ready, so we can now develop our Spark wordcount code in Scala.

Copy the Spark Scala wordcount code below into the editor

package com.dataflair.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Wordcount {
  def main(args: Array[String]): Unit = {
    // Check whether sufficient params are supplied before starting Spark
    if (args.length < 2) {
      println("Usage: ScalaWordCount <input> <output>")
      sys.exit(1)
    }

    // Create conf object (the application name is shown in the Spark UI)
    val conf = new SparkConf().setAppName("WordCount")
    // Create spark context object
    val sc = new SparkContext(conf)

    // Read file and create RDD
    val rawData = sc.textFile(args(0))
    // Convert the lines into words using flatMap operation
    val words = rawData.flatMap(line => line.split(" "))
    // Count the individual words using map and reduceByKey operation
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Save the result
    wordCount.saveAsTextFile(args(1))

    // Stop the spark context
    sc.stop()
  }
}

You will see many errors at this point because the Spark libraries are not yet on the project's build path.

Add Spark Libraries

Configure the Spark environment in Eclipse: right click on the project name >> Build Path >> Configure Build Path

Add the External Jars

Select Spark Jars and insert

You need a Spark installation available in your development environment, because the Spark libraries are taken from it.

Go to “Spark-Home >> jars” and select all the jars

Import the selected jars

Spark Scala Wordcount Program

After importing the libraries, the errors should disappear.
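
Before packaging, it can help to check the word-count logic on its own. The following is a minimal sketch, not part of the original tutorial, that mirrors the flatMap, map and reduceByKey steps using ordinary Scala collections, so it runs without Spark. The object name WordCountLocal and the sample sentences are hypothetical, and groupBy followed by a sum stands in for what reduceByKey does on an RDD.

```scala
// Plain-Scala sketch of the word-count pipeline; no Spark required.
// WordCountLocal and its method names are hypothetical, chosen for illustration.
object WordCountLocal {

  // Mirrors flatMap -> map -> reduceByKey using ordinary collections.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // split each line into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or not to be", "be quick")
    // Print the counts in alphabetical order of the words.
    count(sample).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Because the sketch splits on a single space just like the Spark program, any surprises in tokenization (for example, empty strings produced by repeated spaces) should show up here as well.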

We have successfully set up the Spark environment in Eclipse and developed a Spark Scala program. Now let's deploy the Spark job on Linux. Before deploying and running the application you must have Spark installed; follow this installation guide for Spark installation.

Create the Spark Scala Program Jar File

Before running the Spark wordcount application we have to create a jar file. Right click on the project >> Export

Select Jar-file Option to Export:

Create the Jar file

The jar file for the Spark Scala application has been created. Now we need to run it.

Go to Spark Home Directory

Log in to Linux and open a terminal. We will use Ubuntu Linux to run the Spark Scala application. Copy the jar file to Ubuntu and create a text file to use as input for the Spark Scala wordcount job.

Submit Spark Application using spark-submit script

To submit the application, use the command below. The first line shows the general form and the second line a concrete example:

bin/spark-submit --class <Qualified-Class-Name> --master <Master> <Path-Of-Jar-File> <Input-Path> <Output-Path> 
bin/spark-submit --class com.dataflair.spark.Wordcount --master local ../sparkJob.jar ../wc-data output

Let’s understand the above command:

  • bin/spark-submit: The script used to submit a Spark application
  • --class: The fully qualified name of the class to execute
  • --master: The master to run against (local, yarn, or a cluster master URL)
  • <Path-Of-Jar-File>: The jar file of the application
  • <Input-Path>: Location from where input data will be read
  • <Output-Path>: Location where the Spark application will write its output

Once the application has completed successfully, browse the result.

Browse the result

Browse the output directory and open the file named part-xxxxx, which contains the output of the application.
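
The result can also be read back programmatically. The sketch below is an illustration rather than part of the original tutorial: it assumes the job wrote (word, count) pairs with saveAsTextFile, so each line has the default tuple form (word,count), and that the files sit in a local directory. The object name ReadWordCounts, its helpers, and the default directory "output" (taken from the example command) are assumptions.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

// Hypothetical helper for inspecting the job's output; not part of the tutorial's project.
object ReadWordCounts {

  // Parses one line such as "(spark,3)" into ("spark", 3).
  // Returns None when the line does not have the expected shape.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    if (trimmed.length >= 2 && trimmed.startsWith("(") && trimmed.endsWith(")")) {
      val body = trimmed.substring(1, trimmed.length - 1)
      val idx = body.lastIndexOf(',') // the word itself may contain commas
      if (idx < 0) None
      else Try(body.substring(idx + 1).toInt).toOption.map(n => (body.substring(0, idx), n))
    } else None
  }

  // Reads every part-* file in the given directory and collects the parsed pairs.
  def readAll(outputDir: String): Seq[(String, Int)] = {
    val files = Option(new File(outputDir).listFiles()).getOrElse(Array.empty[File])
    files.filter(_.getName.startsWith("part-")).sortBy(_.getName).toSeq.flatMap { f =>
      val src = Source.fromFile(f)
      try src.getLines().flatMap(line => parseLine(line).toList).toList
      finally src.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed default: the "output" directory used in the example command.
    val dir = if (args.nonEmpty) args(0) else "output"
    // Print the ten most frequent words, highest count first.
    readAll(dir).sortBy(-_._2).take(10).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Splitting on the last comma is a deliberate choice here: the count is always the final field, while a token produced by splitting on spaces may itself contain commas.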

We have successfully developed a Spark job in Scala and deployed it on Ubuntu. To learn about Spark operations such as transformations and actions, follow this guide.
