Pig is a framework built on top of Hadoop that abstracts away much of the work of writing MapReduce apps. It’s a surprisingly versatile platform, and it’s even used by Netflix in their development cycle.
Learning Pig from scratch can be intimidating. But if you have the right learning materials and enough drive to keep practicing then you can pick it up in no time.
To help you get started I’ve cataloged the 5 best books on Apache Pig and MapReduce. Some books are more beginner-friendly than others but they can all take you to the level of an expert Pig/Hadoop developer.
Best Beginner’s Pig Book
If you’re new to Pig and don’t know where to start, I would absolutely recommend a copy of Programming Pig. It’s published by O’Reilly and now in its 2nd edition, and it covers everything you need to know about launching and scaling a Pig app. It also works as a handy reference guide and it’s the most up-to-date book on Pig/Hadoop development.
Beginning Apache Pig
I’ve been pleasantly surprised with the quality coming from Apress books. Beginning Apache Pig is a newer title first published at the very end of 2016.
It covers all the basics of Pig from setup to customization over the course of 270 pages. The author Balaswamy Vaddeman is a big data evangelist with almost a decade of practical experience working with big data environments.
Apache Pig is just one more tool for your big data toolbelt. This book covers everything from MapReduce to the more customized features of Pig. You’ll learn how to write your own Pig code using Pig Latin, the default language for Pig development.
This is a brilliant book for beginners and it should be a fun read cover to cover.
Programming Pig
This is the most detailed Pig book on the market and it’s perfect for complete beginners or intermediate users who want to advance their skillset.
Programming Pig is currently in its second edition spanning 350 pages of Pig development tips & techniques. This is also the most comprehensive guide to building Pig apps with the Pig Latin programming language.
You’ll learn how to write properly structured Pig code, how to connect into databases, and how to write your own User-Defined Functions to expand the capabilities of Pig. Each chapter covers different techniques using both theory and a hands-on approach.
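If you’ve never seen a User-Defined Function before, here’s a rough idea of what one can look like. This is only a minimal sketch, assuming you write the UDF in Python and let Pig load it through Jython. The file name `udfs.py` and the function `email_domain` are made up for illustration and don’t come from the book.

```python
# udfs.py: a hypothetical Pig UDF module (file and function names are made up).
try:
    outputSchema  # Pig supplies this decorator when it loads the file through Jython
except NameError:
    # Stand-in decorator so the file also runs under plain Python for testing.
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema
            return func
        return wrap


@outputSchema("domain:chararray")
def email_domain(address):
    """Return the lowercased domain of an email address, or None if malformed."""
    if address is None or "@" not in address:
        return None
    # Split on the last "@" so odd local parts don't break the result.
    return address.rsplit("@", 1)[1].lower()
```

In a Pig script you would typically register a file like this with `REGISTER 'udfs.py' USING jython AS myfuncs;` and then call `myfuncs.email_domain(...)` inside a `FOREACH ... GENERATE` statement. The fallback decorator is just there so you can test the function on your own machine without a Pig install.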
Beginners should have no trouble picking up this book and following it through to completion.
Along the way you’re sure to run into a few confusing topics. But Pig is easy enough to troubleshoot and these exercises will really build your confidence working with Apache Pig in any environment.
Pig Design Patterns
It doesn’t take long to understand the basics of Pig in practice. The difficult part is mastering Pig development to create lightning-fast apps that are easy to edit & simple to maintain.
Pig Design Patterns covers all the top Pig development features that professionals use on a day-to-day basis. It’s a fairly large book covering 300 pages with 7 large chapters on data transformations, validations, and data reduction patterns with Pig.
Depending on your level of expertise this book may be fairly simple or somewhat confusing.
You should already have a good understanding of Hadoop and maybe a basic understanding of Pig. While a novice Pig developer could work their way through this book, it’d be better to start with a simpler title.
This is an intermediate-to-advanced level book teaching best practices for Pig in the real world.
Hadoop For Dummies
I’m rarely a fan of the Dummies books, especially for technical/programming subjects. However, Hadoop For Dummies offers a look into Hadoop along with many intro chapters on Pig.
This could be an excellent book for developers who don’t know anything about Hadoop. You do need a Hadoop environment to build Pig applications, so this book could be an excellent beginner’s guide to both.
However some of the Hadoop information is a bit outdated so it’s not the absolute best book you can get.
If you’re looking for more Hadoop-focused titles take a peek at our related article covering the best Hadoop books.
Many of those books do have sections on Pig, although you’ll learn much more from a focused book like Programming Pig.
Hadoop MapReduce v2 Cookbook
Although this isn’t strictly a Pig book it can help you understand the underlying framework behind Pig.
The Hadoop MapReduce v2 Cookbook comes with 90+ different recipes built on Hadoop with Pig solutions mixed in. You’ll learn all about the MapReduce programming model and how to properly manage big data through a Hadoop environment.
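For anyone new to MapReduce, the idea is easier to see in a toy example. The sketch below is plain Python that runs on a single machine, not Hadoop code and not taken from the cookbook. The function names are my own, and it only mimics the map, shuffle, and reduce steps that a real cluster would spread across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the grouped values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["pig pig hadoop", "Hadoop"])` counts each word across both lines. Pig’s appeal is that a few lines of Pig Latin with `GROUP` and `COUNT` can express this kind of job without you writing the map and reduce steps by hand.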
I still think this is worth buying just for the Pig source code. Every recipe has its own step-by-step approach so they all work like mini tutorials. You’ll learn how to connect into different databases, how to connect with an AWS instance, and so much more.
Keep in mind this is a technical book made for practical dev projects. You should already be familiar with the basics of Pig, and ideally have a solid grasp of it, before buying a copy of this book.
But if you frequently work with MapReduce and need better strategies then this cookbook will not disappoint.
This is a pretty small list of books, but the ones listed here are pure gold.
Complete beginners should start with a copy of Programming Pig to get a solid grasp on the technology. It’s the most in-depth book on this subject and it’s also my personal favorite.
But if you want something a little simpler I would also recommend Beginning Apache Pig. This covers much of the same material but it holds your hand more throughout the process.
And once you start building your own projects you’ll probably want a copy of Pig Design Patterns. This book teaches you all the best practices for structuring and scaling your own Pig-supported applications.
Hadoop is full of little tools like Pig so it can be overwhelming to get started. But with these books by your side you can become an expert Pig Latin coder in no time.