Lipwig for Hive Is the Greatest!

Datetime:2016-08-22 21:52:43          Topic: Hive           Share

Ok, this is the coolest thing this Hive user has seen all day.

As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects such variations as code changes, the use of  analyze , turning on/off the cost-based optimizer (CBO), and so on. It’s an essential tool for optimizing Hive.

The output of  EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.

By the way, I’m doing this on the Hortonworks Sandbox for HDP 2.3 , but you can do exactly the same thing for any Linux installation of Hive/Hadoop. The Sandbox is a free VM image of a single-node installation the Hortonworks Hadoop distribution. Mine is running on VirtualBox, but they have the same thing for VMware.

  1. First, you’ll use standard Hive to generate  EXPLAIN  output in JSON format.
  2. Then you’ll use a Python script called  lipwig.py  that is most courteously supplied to us by the   t3rmin4t0r .  His program reads the JSON and emits something called a  dot  file.
  3. The  dot  file is a generic, humanly readable and portable description of a graphical object. Such a file is the input to a program called  dot , which knows how to output graphics for the described image in a form that any computer can understand. Read more about  dot   here .
  4. In this case,  dot  will output a file in SVG format. SVG stands for Scalable Vector Graphics, and most machines will know what to do with it.

I describe below the sequence of steps used to generate the graphical representation of EXPLAIN’s output shown above. Before you try it out on your own code, you’ll need to get  lipwig.py from github.com.  I assume you know what  git is, but if you don’t, check  this out first.

Use cd to go to a convenient workspace directory and pull down the lipwig project using the command below. This will “clone,” i.e., get a complete copy of the source code tree for a program and put it in its own directory within the directory you are working in. In this case, it’s just one file plus a  readme.txt file.

git clone  https://github.com/t3rmin4t0r/lipwig.git

You’ll also need the dot program, which is part of the  graphviz package from AT&T.  You can install this on Linux by running the following:

sudo yum install graphviz

Now you should be ready to rock and roll.

First tack the keywords “ EXPLAIN FORMATTED ” in front of the query you want to analyze. My source is in a file called  myprog.sql and it looks like this:


EXPLAIN FORMATTED SELECT
     br.browser, ct.connection_type, co.country, date_time, evar3, evar28, evar29, first_hit_time_gmt, geo_dma,
     javascript, last_hit_time_gmt, post_purchaseid, last_purchase_time_gmt, new_visit, post_browser_height,
     post_browser_width, post_cookies, post_currency, post_cust_hit_time_gmt, post_evar3, post_evar28,
     post_evar29, post_event_list, post_persistent_cookie, post_product_list, post_visid_high, post_visid_low,
     post_zip, prev_page, visit_num, visit_page_num, visit_start_time_gmt
FROM
     hit_data hd join browser br join connection_type ct join country co
ON
     hd.browser=br.id and hd.connection_type=ct.id and hd.country=co.id
WHERE
     hd.post_product_list like "%hash::448163969%"
AND 
     hd.post_product_list like "%hash::1256898673%"
AND 
      hd.post_product_list not like "%hash::125680000%" 

Run it as follows to put the output of EXPLAIN FORMATTED into a file.  The  ‘>’ sign puts the output into a file instead of dumping it on the screen.

hive -f myprog.sql > explain.json

The contents of  explain.json is the raw JSON-formatted  EXPLAIN output. The JSON is human-readable, but only for specially gifted humans, unless you have set up your browser to display it.  If you’re using Chrome, try  this page for a plugin. Something similar is available for almost any browser.

Use lipwig.py to create a dot file from the JSON.  If you’re running Hive, then you already have Python, so all you need to do is tell it where  lipwig.py is. For me, the installation directory for  lipwig.py is one directory up, hence the double dots, but yours is wherever you installed it. The  lipwig.py program dumps the graphical specificiation into a file called  explain.dot.

python ../lipwig/lipwig.py explain.json > explain.dot

Now, all you need to do is create the .svg file. My dot file was called explain.dot , so I tell dot to create a type  .svg file called  explain.svg , using  explain.dot as input.

dot -Tsvg -o explain.svg  explain.dot

That’s it.  If you double click on explain. svg you’ll see your graphical output.  Otherwise, you could use something like GhostScript  (which a freeware  PostScript  utility) to display the output directly out of  dot as follows:

dot -Tsvg explain.dot | gs –

SVG files work fine for most purposes, but as it happens, the WordPress editor doesn’t support that format. If you also have some other reason to want an alternate format, -Tjpg and   -Tpng , etc. will give you alternate formats.





About List