Replicating Mixpanel Analytics with the ELK Stack and Logz.io

Datetime: 2016-08-23 03:45:55    Topic: JavaScript

Web analytics is the measurement, collection, analysis, and reporting of website and web application usage data, with the goal of optimizing user experience and engagement. Beyond the classic aggregate metrics, advanced web analytics solutions can provide real-time data on your users and help identify specific actions along their user journeys.

The web analytics market includes both self-hosted and SaaS solutions, the most popular of which are Google Analytics, Adobe Analytics, Mixpanel, and Kissmetrics. These platforms track specific user flows, define and report on goals, and provide insights that help product managers, marketers, and developers optimize their sites.

However, companies that depend on web analytics sometimes prefer to build their own solutions because of the limited flexibility, the risk of vendor lock-in, and the liability that comes with sharing data with third parties. These customized analytics platforms are often based on open source technologies (Wikipedia is one such example).

In this post, we describe one specific solution: replicating Mixpanel's analytics capabilities with the Logz.io ELK Stack for data collection, visualization, and analysis.

Data format, tracking and shipping

There are many open source solutions that can help you track, collect, and analyze usage and event data. For example, you can use JavaScript trackers such as Open Web Analytics or Snowplow.

There are also several data formats that can be used to track user activity. Below is a simple example of an event object passed to a tracker before it is sent to Elasticsearch:

{
   "type": "page_visit",
   "timestamp": "2016-08-02 12:01:15",
   "User": { "name": "John", "last_name": "Doe" }
}
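An event like this could be produced in the browser by a small helper. The sketch below is a minimal illustration: `buildEvent` is a hypothetical name, and it writes the timestamp as an ISO 8601 string (for example `2016-08-02T12:01:15.000Z`) on the assumption that your index relies on Elasticsearch's default dynamic date detection, which recognizes ISO 8601 but not a space-separated date and time such as the one in the example above. If your index has an explicit date mapping, any format that mapping accepts will do.

```javascript
// Hypothetical helper that builds a tracking event in the format shown above.
// The timestamp is ISO 8601 (UTC) so that Elasticsearch's default date
// detection can recognize it (an assumption about your index settings).
function buildEvent(type, user, date = new Date()) {
  return {
    type: type,
    timestamp: date.toISOString(),
    User: { ...user },
  };
}

// Example: a page visit by John Doe
const pageVisit = buildEvent("page_visit", { name: "John", last_name: "Doe" });
console.log(JSON.stringify(pageVisit));
```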

Deploying the logger

Here, we use JavaScript tracker code that both creates the event data and ships it to the ELK Stack. To ship to the Logz.io ELK Stack, you can use a premade client-side logger from our GitHub repository, which also includes the source code and configuration of the JavaScript tracker.

Setting up the tracker is the first step to take before you add tracking for the events that occur on your website pages. The snippet below loads the tracker into the site and creates a Logz.io logger object that is responsible for sending event data to the server. Place it just before the closing </body> tag:

<script type="text/javascript">
   (function() {
       // Cache-busting value that changes every 30 minutes
       var d = Math.floor(Date.now() / 1000 / 60 / 30);
       var l = document.createElement("script"); l.type = "text/javascript"; l.async = true;
       l.src = "//cdn.logz.io/logger.min.js?d=" + d;
       // The script loads asynchronously, so create the logger only after it has loaded
       l.onload = function() {
           window.LogzioLogger = new LogzioLogger('__YOUR_API_KEY__');
       };
       document.getElementsByTagName("head")[0].appendChild(l);
   })();
</script>

The next step is to log in to your Logz.io account (if you don’t have one, you can start a free trial) and open the Settings page to find your Logz.io token, which we need to complete the JavaScript tracker settings.

Settings page with the token field

Initiating a Tracking Event

After you replace the __YOUR_API_KEY__ placeholder in the LogzioLogger constructor with your token, the Logz.io logger instance will be ready to send logs to the servers.

You can then call the LogzioLogger object anywhere in your code to track events, passing either a plain message string or an object of event properties:

   LogzioLogger.log('Log message');
 
   LogzioLogger.log({
       type: 'page_visit',
       value: '123456',
       message: 'Page visit'
   });
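Because the logger script is loaded asynchronously, events can fire before a logger object exists. One way to handle this, sketched below, is a small wrapper that stamps each event with a type and timestamp and queues it until a logger is available. `createTracker`, `track`, and `getLogger` are hypothetical names; the only assumption about the logger is that it exposes the `log()` method used above.

```javascript
// Hypothetical wrapper around the log() method shown above. It adds a type and
// an ISO 8601 timestamp to every event and queues events until getLogger()
// returns an object with a log() method (e.g. while the script is loading).
function createTracker(getLogger) {
  const queue = [];

  // Send all queued events, oldest first. Returns false if no logger yet.
  function flush() {
    const logger = getLogger();
    if (!logger || typeof logger.log !== "function") {
      return false;
    }
    while (queue.length > 0) {
      logger.log(queue.shift());
    }
    return true;
  }

  return {
    // Record an event; properties cannot override type or timestamp.
    track(type, properties = {}) {
      queue.push({ ...properties, type: type, timestamp: new Date().toISOString() });
      return flush();
    },
    flush: flush,
    pending() {
      return queue.length;
    },
  };
}

// Usage with a stand-in logger that prints to the console:
const tracker = createTracker(() => ({ log: (e) => console.log(JSON.stringify(e)) }));
tracker.track("page_visit", { value: "123456", message: "Page visit" });
```

In a page, `getLogger` would be any function that returns the logger instance once it has been created, and `null` or `undefined` before that; calling `flush()` from the script's load handler would then send anything queued in the meantime.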

Note:

If you are new to ELK, we strongly suggest that you read our complete guide to the ELK Stack, which covers installation and configuration. If you are already familiar with ELK, move on to the next section, which covers how to replicate specific Mixpanel dashboard components using the stack.

Replicating Mixpanel Key Capabilities

Once your data is tracked and indexed in the ELK Stack, the next step is to build a dashboard with the same functionality as Mixpanel. The following sections demonstrate how to replicate some of Mixpanel's key features in ELK: segmentation, formulas, and live view.

Segmentation

Mixpanel’s segmentation chart lets analysts see which events (such as the use of a specific feature in an application) result in the highest and lowest user engagement. When one event produces far more data than the others, users can switch the Y axis to a logarithmic scale, which helps large sites compare the trends of different events.

The segmentation chart’s configuration box lets you select event properties and build a query:

The segmentation box in Mixpanel

With ELK, you can use Lucene search syntax in Kibana to run complex queries. For example, imagine that you want to filter all of your event logs by a specific song and artist: the song is Song1 and the artist name is John Doe. This is the search query you would use in Kibana:

song: "Song1" AND artist_name: "John Doe"
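When searches like this are generated from code (for example, from a set of filters chosen in a UI), values that contain quotes must be escaped. The sketch below uses hypothetical helper names to build the same kind of query string and to wrap it in an Elasticsearch `query_string` request body. It assumes field names are plain identifiers that need no escaping.

```javascript
// Hypothetical helper: ANDs exact-phrase matches into one Lucene query string.
// Field names are assumed to be plain identifiers that need no escaping.
function buildLuceneQuery(filters) {
  return Object.entries(filters)
    .map(([field, value]) => {
      // Inside a quoted phrase, Lucene only requires escaping \ and "
      const escaped = String(value).replace(/[\\"]/g, "\\$&");
      return `${field}:"${escaped}"`;
    })
    .join(" AND ");
}

// The same search expressed as an Elasticsearch query_string request body.
function toSearchBody(luceneQuery) {
  return { query: { query_string: { query: luceneQuery } } };
}

const q = buildLuceneQuery({ song: "Song1", artist_name: "John Doe" });
console.log(q); // song:"Song1" AND artist_name:"John Doe"
console.log(JSON.stringify(toSearchBody(q)));
```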

You can then use this search as the basis for a line chart visualization: open the Visualize tab in Kibana and choose the Line Chart visualization type.

Next, configure the visualization's aggregations in the configuration pane on the left. In the example below, the Y axis uses the Count aggregation, while the X axis uses a date histogram on the @timestamp field.

Line chart with filtered values per song and artist
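Under the hood, a chart like this corresponds roughly to an Elasticsearch `date_histogram` aggregation whose per-bucket document count is the Y value. The sketch below builds such a request body and reads the series out of a response. The function and aggregation names are hypothetical, and it assumes Elasticsearch 7.2 or later, which uses `fixed_interval`; older versions, including those current in 2016, used `interval` instead.

```javascript
// Sketch of a search body equivalent to the line chart: count matching events
// per time bucket on @timestamp. Assumes Elasticsearch 7.2+ ("fixed_interval").
function lineChartBody(luceneQuery, bucket = "1h") {
  return {
    size: 0, // only the aggregation results are needed, not the hits
    query: { query_string: { query: luceneQuery } },
    aggs: {
      events_over_time: {
        date_histogram: { field: "@timestamp", fixed_interval: bucket },
      },
    },
  };
}

// Each bucket in the response carries the Y value as doc_count.
function toSeries(response) {
  return response.aggregations.events_over_time.buckets.map((b) => ({
    time: b.key_as_string,
    count: b.doc_count,
  }));
}

console.log(JSON.stringify(lineChartBody('song:"Song1" AND artist_name:"John Doe"'), null, 2));
```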

Formulas

Mixpanel’s analytics tools let you create chart-based formulas. Formulas allow users to combine events and run arithmetic functions on them. For example, you can use division to calculate the ratio between two event counts that are reported within the same time interval.
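The arithmetic behind such a formula is simple once both event counts are bucketed by the same time interval. The hypothetical `ratioByInterval` below divides one series by another bucket by bucket; it assumes each series is an array of `{ key, doc_count }` objects, the shape of Elasticsearch `date_histogram` buckets, and returns `null` for intervals where the denominator is missing or zero rather than dividing by zero.

```javascript
// Hypothetical sketch of a Mixpanel-style formula (A / B), computed per bucket.
// Intervals missing from B, or with a zero count in B, yield null.
function ratioByInterval(seriesA, seriesB) {
  const countsB = new Map(seriesB.map((b) => [b.key, b.doc_count]));
  return seriesA.map((a) => {
    const denominator = countsB.get(a.key);
    return {
      key: a.key,
      ratio: denominator ? a.doc_count / denominator : null,
    };
  });
}

// Example: signups per page visit for two hourly buckets (epoch milliseconds)
const visits = [{ key: 1470139200000, doc_count: 200 }, { key: 1470142800000, doc_count: 0 }];
const signups = [{ key: 1470139200000, doc_count: 10 }, { key: 1470142800000, doc_count: 0 }];
console.log(ratioByInterval(signups, visits)); // ratios: 0.05 and null
```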

To approximate this type of analysis in Kibana, create an area chart with aggregations on both the X and Y axes.

For the Y axis, select the Count aggregation; for the X axis, use a date histogram on the timestamp field. Then add a Split Area and set its aggregation to Terms on the song name field, so that each song gets its own area.

The resulting chart shows the relative volume of each song's events within each time interval:

An area chart with the splitting option
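The split area chart maps onto a date histogram with a `terms` sub-aggregation. The sketch below builds those aggregations and then turns the nested response buckets into each song's share of the events counted in each interval, which is the ratio the chart conveys. The aggregation names are hypothetical, and it assumes Elasticsearch 7.2 or later (`fixed_interval`) and a `song` field mapped as a keyword, since `terms` aggregations need a non-analyzed field.

```javascript
// Sketch of the area chart's aggregations: a date histogram (X axis) split
// into one area per song by a terms sub-aggregation. The Y value is each
// bucket's doc_count. Field names and sizes are assumptions.
function areaChartAggs(interval = "1h", topSongs = 5) {
  return {
    over_time: {
      date_histogram: { field: "@timestamp", fixed_interval: interval },
      aggs: { by_song: { terms: { field: "song", size: topSongs } } },
    },
  };
}

// Each song's fraction of the events counted for the top songs in an interval.
function songSharesByInterval(overTimeBuckets) {
  return overTimeBuckets.map((interval) => {
    const songs = interval.by_song.buckets;
    const total = songs.reduce((sum, s) => sum + s.doc_count, 0);
    const shares = {};
    for (const s of songs) {
      shares[s.key] = total > 0 ? s.doc_count / total : 0;
    }
    return { key: interval.key, shares: shares };
  });
}

console.log(JSON.stringify({ size: 0, aggs: areaChartAggs() }, null, 2));
```

Note that the shares are relative to the top songs returned by the `terms` aggregation, not to every event in the interval.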

Live view

One of the core capabilities of comprehensive web analytics tools such as Mixpanel is their ability to report on specific user activities in near real time. Analyzing usage information this quickly lets websites learn how new features affect the end-user experience.

Mixpanel’s live view displays events as they take place. When combined with Mixpanel’s activity feed, it becomes a powerful analytical tool.


Kibana’s static table visualization type offers similar functionality, presenting events and their properties in detail.

Creating this type of table is simple: select the Visualize tab in Kibana and then Static table. Next, configure the view options in the configuration pane on the left. You can decide which fields to use to filter the event log and how many row entries to display.

As you can see, this table displays a list of events with a detailed view of their key-value pairs, much like Mixpanel’s live view table of events:

A static table with a live feed of the latest events
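The same live feed can be approximated outside Kibana by repeatedly asking Elasticsearch for events newer than the last one shown. The sketch below builds such a request body (a `range` query on `@timestamp`, newest first) and flattens a nested event into dotted key-value rows like those in the table. Both function names are hypothetical, and the page size of 20 is arbitrary.

```javascript
// Sketch of a "live view" poll: the newest events after the last one shown.
// Assumes an @timestamp date field and ISO 8601 values for sinceIso.
function latestEventsBody(sinceIso, size = 20) {
  return {
    size: size,
    sort: [{ "@timestamp": { order: "desc" } }],
    query: { range: { "@timestamp": { gt: sinceIso } } },
  };
}

// Flatten a nested event into dotted [key, value] rows, e.g. User.name -> "John".
function flattenEvent(obj, prefix = "", rows = []) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flattenEvent(value, path, rows);
    } else {
      rows.push([path, value]);
    }
  }
  return rows;
}

console.log(flattenEvent({ type: "page_visit", User: { name: "John", last_name: "Doe" } }));
```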

Building the dashboard

Once you have a series of visualizations, the next step is to combine them into a comprehensive Kibana dashboard. To do this, simply select the Dashboard tab in Kibana, create a new dashboard, and add your saved visualizations.

Having all of your data on one screen lets you correlate individual events with other points of reference in your data. The most obvious example is tracking conversion rates while looking at traffic growth and new-user signups side by side.

The ability to create a centralized monitoring dashboard such as the one below, with charts displayed side by side, is not supported by Mixpanel out of the box. It will be interesting to see whether they add this functionality in the future.

An example of an ELK-based Mixpanel dashboard

A Final Note

People who use proprietary web analytics tools can become locked into those platforms for years, and they often lose their historical data when they decide to switch to other software. Proprietary software also tends to be costly and, in some cases, inflexible.

This is one reason why open source is winning the war with proprietary software.

Of course, building a web analytics system entirely from scratch is not usually feasible. Building on open source solutions is often a reasonable way to meet your business needs while keeping the considerations listed above (cost, flexibility, and vendor lock-in) in mind.

When using open source technologies such as the ELK Stack, you not only avoid the risk of getting locked in but also remain free to build, customize, and continuously optimize the way that you use it to analyze your website visitors.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used for log analysis, application monitoring, business intelligence, and more. Start your free trial today!

Asaf Yigal

Asaf Yigal is co-founder and VP Product at Logz.io. Prior to Logz.io, Asaf co-founded Currensee, a social-trading platform, which was later acquired by OANDA in 2013. Prior to Currensee, Asaf played executive roles at Akorri in developing an end-to-end performance monitoring platform and at Onaro in developing a storage resource management platform. Both Akorri and Onaro were acquired by NetApp. Prior to Onaro, Asaf headed a research team in the Israeli Navy, taking an artificial intelligence system to military deployment. Asaf holds a B.S. from the Technion and is an Instrument-rated private pilot.




