This blog lists commonly asked and important Apache Spark interview questions and answers that you should prepare. Each question comes with a detailed answer to help you face Apache Spark interviews with confidence. The guide also includes tips for cracking the interview. To learn more about Apache Spark, follow this introductory guide.
Q. What features and characteristics of Apache Spark make it superior to other Big Data solutions such as Hadoop MapReduce?
Q. What is a Resilient Distributed Dataset (RDD) in Apache Spark? How does it provide abstraction in Spark and make Spark operator-rich?
Q. What is an RDD lineage graph (or lineage operation) in Apache Spark? Explain the lineage graph operator and how it enables fault tolerance in Spark.
Q. Explain the Apache Spark ecosystem components: Spark SQL, Spark Streaming, Spark MLlib, and GraphX.
In which scenarios can we use these components? What types of problems can be solved using them?
Q. What is the difference between RDDs and DataFrames?
Q. What exactly are the differences between the reduce and fold operations in Spark?
Q. How is data processed using transformation operations in Spark? Why are transformations needed in Spark? List the transformations available in Spark.
Q. Briefly explain RDDs in Apache Spark. Why are RDDs used to process data? What are the major features and characteristics of RDDs (Resilient Distributed Datasets)?
Q. Briefly explain what an action is in Apache Spark. How are actions used to generate final results? Give some examples of actions.
Q. What is the role of the Spark driver, and where does it run on the cluster?
Q. What is the Parquet file format? Where should the Parquet format be used? How do you convert data to Parquet format?
Q. What are the benefits of Spark over MapReduce (Spark vs. MapReduce)?
Q. How do you split a single HDFS block into multiple RDD partitions?
Q. What are the roles and responsibilities of worker nodes in an Apache Spark cluster? Is a worker node in Spark the same as a slave node?
Q. What is a broadcast variable?
Q. What is PageRank?
Q. What is Spark standalone mode?
Q. What is the difference between caching and persistence?
Q. What is a DStream?