Apache Spark – Top Interview Questions and Answers

Datetime: 2016-08-23 01:32:28          Topic: Spark

Objective

This blog lists commonly asked, important Apache Spark interview questions and answers that you should prepare. Each question comes with a detailed answer to help you face Apache Spark interviews with confidence. The guide also includes tips for cracking the interview; to learn more about Apache Spark, follow this introductory guide.

Q. What features and characteristics of Apache Spark make it superior to other Big Data solutions such as Hadoop MapReduce?

View Answer >>

Q. What is a Resilient Distributed Dataset (RDD) in Apache Spark? How does it provide abstraction in Spark and make Spark operator-rich?

View Answer >>

Q. What is the RDD lineage graph (or lineage operation) in Apache Spark? Explain the lineage graph operator and how it enables fault tolerance in Spark.

View Answer >>

Q. Explain the Apache Spark ecosystem components: Spark SQL, Spark Streaming, Spark MLlib and GraphX.

In which scenarios can we use these components? What types of problems can be solved using them?

View Answer >>

Q. What is the difference between RDDs and DataFrames?

View Answer >>

Q. What exactly is the difference between the reduce and fold operations in Spark?

View Answer >>

Q. How is data processed using transformation operations in Spark? Why are transformations needed in Spark? Provide a list of the transformations available in Spark.

View Answer >>

Q. Briefly explain RDDs in Apache Spark. Why are RDDs used to process data? What are the major features and characteristics of RDDs (Resilient Distributed Datasets)?

View Answer >>

Q. Explain briefly what an action is in Apache Spark. How are actions used to generate final results? Provide some examples of actions.

View Answer >>

Q. What is the role of the Spark driver, and where does it execute on the cluster?

View Answer >>

Q. What is the Parquet file format? Where should the Parquet format be used? How do you convert data to Parquet format?

View Answer >>

Q. What are the benefits of Spark over MapReduce (Spark vs. MapReduce)?

View Answer >>

Q. How do you split a single HDFS block into multiple RDD partitions?

View Answer >>

Q. What are the roles and responsibilities of worker nodes in an Apache Spark cluster? Is a worker node in Spark the same as a slave node?

View Answer >>

Q. What are broadcast variables?

View Answer >>

Q. What is PageRank?

View Answer >>

Q. What is standalone mode in Spark?

View Answer >>

Q. What is the difference between caching and persistence?

View Answer >>

Q. What is a DStream?

View Answer >>




