Announcing Apache Pig 0.12…The Community Breeds a More Powerful Pig

Datetime: 2016-08-23 02:38:44         Topic: Apache Pig

Today we are proud to announce the general availability of Apache Pig 0.12!

If you are a Pig user who has been wishing for additional languages, more data validation tools, and more expressions, operators and data types, then read on. Version 0.12 includes all of those additions, and Pig now runs on Windows without Cygwin.

This was a great team effort over the past six months, with more than 30 engineers from Twitter, Yahoo, LinkedIn, Mortar Data, Cloudera, Microsoft and several others (including Hortonworks, of course). Between Pig 0.11 and Pig 0.12, we resolved 305 JIRA issues.

Improvements in Apache Pig 0.12

Assert operator

An assert operator can be used for data validation. For example, the following script fails if any value of a0 is zero or negative, because the condition requires a0 > 0:

a = load 'something' as (a0:int, a1:int);
assert a by a0 > 0, 'a cant be negative for reasons';

Streaming UDF

Users can now write UDFs in languages that have no JVM implementation. In particular, this version implements CPython UDFs. Users can write Python UDFs that rely on CPython extensions, which is not possible in Jython.
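As a rough illustration of what such a UDF can look like, here is a minimal sketch. The file name udfs.py, the function clean_token and its schema string are hypothetical. The sketch assumes the pig_util helper module that Pig provides to streaming Python UDFs, and falls back to a stand-in decorator so the file can also be imported outside Pig.

```python
# udfs.py (hypothetical file name)
try:
    # Assumption: pig_util is importable when the script runs under Pig.
    from pig_util import outputSchema
except ImportError:
    # Stand-in decorator so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("token:chararray")
def clean_token(text):
    """Trim surrounding whitespace and lower-case a string; pass nulls through."""
    if text is None:
        return None
    return text.strip().lower()
```

In a Pig script, a file like this would typically be registered with something along the lines of REGISTER 'udfs.py' USING streaming_python AS myfuncs; and then called as myfuncs.clean_token(field). Check the Pig documentation for your version for the exact syntax.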

Rewrite of AvroStorage

We completely revamped AvroStorage. It is now one of Pig's built-in functions. It uses the latest version of Avro and is significantly faster, with many bug fixes.

IN operator

Previously, Pig had no IN operator. To mimic one, users had to chain several equality tests together with OR, as in this example:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY
   (i == 1) OR
   (i == 22) OR
   (i == 333) OR
   (i == 4444) OR
   (i == 55555);

Now this type of expression can be rewritten more compactly using the IN operator:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);

CASE expression

Previously, Pig had no support for a CASE expression. To mimic it, users often resorted to nested bincond operators, which became unreadable when there were multiple levels of nesting.

Here’s an example of the type of CASE expression that Pig now supports:

  CASE i % 3
     WHEN 0 THEN '3n'
     WHEN 1 THEN '3n+1'
     ELSE '3n+2'
  END

BigInteger/BigDecimal data types

Some applications require calculations with a higher degree of precision than the fixed-width numeric types can provide. In these cases the new BigInteger and BigDecimal data types can be used instead.
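To see why this matters, the short sketch below uses Python's decimal module and arbitrary-precision integers as an analogy. It is not Pig code, and the variable names are illustrative only. It shows binary floating point accumulating a rounding error that exact decimal arithmetic avoids, and an integer that no longer fits in 64 bits.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so errors accumulate.
float_total = sum([0.1] * 10)

# Decimal keeps exact base-10 digits, similar in spirit to BigDecimal.
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

# Python integers grow as needed, similar in spirit to BigInteger.
big_product = 2 ** 64 * 3

print(float_total == 1.0)         # the float sum is not exactly 1.0
print(decimal_total == Decimal("1.0"))
print(big_product > 2 ** 63 - 1)  # exceeds the largest signed 64-bit value
```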

Support for Microsoft Windows™

Changes that enable Apache Pig to run on Windows without Cygwin have now been committed to the trunk.

Parquet Support

Pig now wraps ParquetLoader/ParquetStorer in built-in functions. Users are able to load/store Parquet data easily.


