The Apache Software Foundation released Spark 2.0 as generally available last week, and you can now leverage Spark’s new performance enhancements, better SQL support, the Structured Streaming API, and better SparkR support. Hive 2.1 has improved support for the Apache Parquet file format, various performance optimizations, and increased SQL support. For information on how Hive 2.1 differs from Hive 1.0 on Amazon EMR, click here. Zeppelin 0.6.1 (Snapshot) now has authentication and authorization support for notebooks, and Hue 3.10 has many UI improvements, including a notebook interface and an updated Apache Oozie workflow editor for visually creating complex workflows.
You can create an Amazon EMR cluster with release 5.0 by choosing release label “emr-5.0.0” from the AWS Management Console, AWS CLI, or SDK. You can specify the set of applications to install on your cluster, and previous sandbox applications are now specified without the “-sandbox” suffix. Enhanced debugging information is available in the console or directly in the step description, and it is automatically enabled for clusters with release 5.0. Please visit the Amazon EMR documentation for more information about release 5.0, Spark 2.0, Hive 2.1, Presto 0.150, Tez 0.8.4, Zeppelin 0.6.1 (Snapshot), Hue 3.10, and enhanced debugging. You can also join our live webinar, Introducing Amazon EMR Release 5.0, at 9AM PDT on Tuesday, August 23 for more details about release 5.0.
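
As an illustration of the SDK path, here is a minimal sketch in Python using boto3's `run_job_flow` call. The helper names (`normalize_app_name`, `build_cluster_request`, `launch_cluster`), the instance type and count, the region, and the use of the default EMR roles are all assumptions made for this example, not values from the release itself; adjust them for your account.

```python
from typing import Dict, List

SANDBOX_SUFFIX = "-sandbox"


def normalize_app_name(name: str) -> str:
    """Drop a trailing "-sandbox" suffix (any letter case), since release 5.0
    specifies former sandbox applications without it."""
    if name.lower().endswith(SANDBOX_SUFFIX):
        return name[: -len(SANDBOX_SUFFIX)]
    return name


def build_cluster_request(
    cluster_name: str,
    applications: List[str],
    instance_type: str = "m3.xlarge",  # assumed instance type
    instance_count: int = 3,           # assumed cluster size
) -> Dict:
    """Build the keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": cluster_name,
        "ReleaseLabel": "emr-5.0.0",
        "Applications": [{"Name": normalize_app_name(a)} for a in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(request: Dict, region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster (job flow) ID.
    Requires boto3 and valid AWS credentials; this starts billable resources."""
    import boto3  # imported here so the request can be built without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

For example, `build_cluster_request("emr-5-demo", ["Spark", "Hive", "Presto-Sandbox"])` produces a request that lists Presto without the suffix; passing that request to `launch_cluster` would create the cluster. Keeping the request builder separate from the API call lets you inspect the payload before launching anything.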