Monitoring Apache Cassandra Metrics With Graphite and Grafana

Datetime: 2016-08-23 00:43:28    Topic: Cassandra

1. Overview

Although Apache Cassandra exposes a large number of metrics through the popular Metrics library, it does not provide an out-of-the-box solution for monitoring them. The command-line nodetool utility can be used to inspect some of Cassandra's internal metrics, but it is by nature not designed for monitoring. For many users, DataStax OpsCenter has therefore been the only viable, ready-to-use solution for monitoring their Cassandra clusters. The bad news is that, starting with OpsCenter v6.0, OpsCenter is only available to DataStax Enterprise (DSE) users; open source Cassandra users can no longer use it.

In this post, I will explore an open source Cassandra monitoring solution based on Cassandra's pluggable metrics reporting, Graphite, and Grafana. The post starts with the high-level architecture of the solution, followed by step-by-step instructions for setting it up on an Ubuntu 14.04 VM. The Cassandra version used in this setup is 2.1.14.

2. Architecture Overview

The diagram below shows a high-level, logical view of the proposed solution. Its main components are as follows; I'll go through each of them in more detail in later sections.

  • Cassandra cluster (with Metrics-Graphite reporter enabled): source of monitoring metrics
  • Graphite server: receiver and aggregator of Cassandra metrics
  • Grafana server: metrics dashboard provider
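  • Apache web server: web host for the metrics dashboard
  • PostgreSQL database server: storage for Graphite-web and Grafana metadata such as users and dashboard definitions (the metric time series themselves are stored by Graphite in Whisper files)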

Please note that, for simplicity, the diagram shows only one Cassandra node, but the same idea extends to a whole Cassandra cluster: each node in the cluster sends its internal metrics to a central Graphite server (specifically, to its Graphite-carbon sub-component). The metrics are then aggregated and stored by Graphite and displayed via Grafana, a web-based dashboard solution.

Please also note that the web server and database server in the diagram are not limited to Apache and PostgreSQL. Other web servers and database servers supported by Graphite can be used as well.

3. Configure Cassandra with Graphite metrics reporter

Since version 2.0.2, Cassandra has provided a built-in Pluggable Metrics Reporting feature that can expose internal Cassandra metrics on the fly to different metrics reporters such as CSV, console, Graphite, Ganglia, and so on. In this solution, we use the Graphite reporter. Consequently, the solution discussed in this post requires Cassandra 2.0.2 or later. For earlier versions such as 1.2, a custom metrics collection agent needs to be deployed on each Cassandra node to collect Cassandra JMX metrics and send them to the target Graphite server.

To configure the Cassandra service to work with the Graphite metrics reporter, the following steps are required:

1). Download Graphite metrics reporter jar file (metrics-graphite-2.2.0.jar) from here

2). Put the downloaded jar file in the Cassandra library folder, e.g. /usr/share/cassandra/lib/ (the default Cassandra library folder for a packaged installation on Ubuntu 14.04).

3). Create a metrics reporter configuration file (e.g. metrics_reporter_graphite.yaml) and put it in the same folder as the cassandra.yaml file, e.g. /etc/cassandra/ (the default Cassandra configuration folder for a packaged installation on Ubuntu 14.04):

graphite:
  -
    period: 30
    timeunit: 'SECONDS'
    prefix: 'cassandra-clustername-node1'
    hosts:
     - host: 'localhost'
       port: 2003
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.+'
        - '^jvm.+'

4). Modify the cassandra-env.sh file to include the following JVM option:

METRICS_REPORTER_CFG="metrics_reporter_graphite.yaml"
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=$METRICS_REPORTER_CFG"

5). Restart the Cassandra service.

The contents of the Graphite metrics reporter configuration file are fairly self-explanatory. Some key settings are:

  • “period” and “timeunit” together determine how frequently the metrics are sent to the target receiver/sink (the Graphite server in our case).
  • “prefix” can be thought of as a source metric identifier. Because Graphite is a generic monitoring framework that can receive metrics from many different sources, it is good practice to give the prefix a clear naming pattern. For example, a good pattern for Cassandra monitoring is something like ‘cassandra-<cluster name>-<node IP or name>’. With such a prefix, we can easily identify the Cassandra metrics of a particular node within a particular cluster on the Graphite/Grafana side.
  • “hosts” defines the target Graphite host name/IP and port number (2003 by default unless otherwise changed).
  • “predicate.useQualifiedName” specifies whether or not fully qualified metric names are used (e.g. org.apache.cassandra.metrics.Compaction.PendingTasks).
  • “predicate.patterns” defines the metrics filter: with “predicate.color” set to ‘white’ (a whitelist), only metrics whose names match one of the specified patterns are sent to the target receiver/sink.
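To make the effect of “prefix” and “predicate” concrete, here is a small Python sketch that approximates how the reporter decides which metrics to send and which Graphite path each one ends up under. It is only an approximation: it assumes the whitelist patterns behave like regular expressions matched from the start of the qualified metric name and that the prefix is joined to the name with a dot; the metric names used below are illustrative, not taken from a real node.

```python
import re
from typing import List

# Values mirrored from metrics_reporter_graphite.yaml above.
PREFIX = "cassandra-clustername-node1"
PATTERNS: List[str] = [r"^org.apache.cassandra.+", r"^jvm.+"]
COLOR = "white"  # 'white' = send only matches, 'black' = send everything else


def is_reported(metric_name: str) -> bool:
    """Approximate the predicate filter for one fully qualified metric name."""
    matched = any(re.match(p, metric_name) for p in PATTERNS)
    return matched if COLOR == "white" else not matched


def graphite_path(metric_name: str) -> str:
    """Graphite path of a reported metric: '<prefix>.<qualified name>'."""
    return f"{PREFIX}.{metric_name}"


# Illustrative names only; real names depend on the Cassandra version.
for name in ["org.apache.cassandra.metrics.Compaction.PendingTasks",
             "jvm.memory.heap_usage",
             "com.example.SomethingElse"]:
    print(name, "->", graphite_path(name) if is_reported(name) else "filtered out")
```

Because Graphite treats dots as hierarchy separators, a dot-free prefix like this one gives every node its own top-level branch in the Graphite metric tree, which is what makes per-node identification easy.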

4. Graphite Monitoring Framework

The core of the solution is the generic Graphite monitoring framework, which is designed to store, aggregate, and render time-series data. It is a widely used framework, and a detailed description of it is beyond the scope of this post. In this section, I will briefly touch upon the high-level structure of the framework and how our solution fits into it.

There are three major components within the core Graphite monitoring framework:

  • Graphite-carbon is an event-driven networking engine that listens for time-series data
  • Graphite-whisper is a simple storage library to store time-series data
  • Graphite-web is a web application built on the Python Django web framework that uses the Cairo 2D graphics library to render time-series data on demand.

Graphite itself does not collect metrics; it relies on other metrics collection software (e.g. Cassandra with the Graphite metrics reporter) to send metrics to it. Once metrics are sent to Graphite, Graphite-carbon receives them and processes them further: they are aggregated, stored, and can then be rendered on the web.
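Graphite-carbon's listener on port 2003 (the port configured under “hosts” in Section 3) accepts a simple plaintext protocol: one newline-terminated line per data point, containing the metric path, the value, and a Unix timestamp separated by spaces. The Graphite reporter essentially pushes its metrics over such a TCP connection. The sketch below does the same by hand, which can be handy for checking that carbon is receiving data before Cassandra is wired in; the helper names are hypothetical.

```python
import socket
import time
from typing import Optional


def format_plaintext(path: str, value: float, timestamp: int) -> str:
    """Build one newline-terminated plaintext-protocol line: path, value, timestamp."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    return f"{path} {value} {timestamp}\n"


def send_metric(path: str, value: float, host: str = "localhost",
                port: int = 2003, timestamp: Optional[int] = None) -> None:
    """Send a single data point to carbon's plaintext listener over TCP."""
    ts = int(time.time()) if timestamp is None else timestamp
    line = format_plaintext(path, value, ts)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric("cassandra-test.manual.ping", 1) (a hypothetical path) should make a new entry appear in the Graphite-web metric tree shortly afterwards if carbon-cache is running.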

In the solution discussed in this post, we use:

  • PostgreSQL instead of the default embedded SQLite database as Graphite-web's database. It is also used to store Grafana dashboard metadata.
  • Grafana instead of the default “Graphite-web” UI for better metrics visualization, although “Graphite-web” is still available on a different port (as the data source for Grafana).

5. Install and Configure Cassandra Monitoring Solution Software Components

In this section, we will go through step-by-step instructions for installing and configuring the monitoring components of this solution, other than Cassandra itself, on an Ubuntu 14.04 host. The Cassandra configuration is already described in Section 3.

  • Install and Configure PostgreSQL database server
## Install PostgreSQL database server
sudo apt-get update
sudo apt-get install postgresql libpq-dev python-psycopg2

## Create a user (cassmon) for Cassandra monitoring purposes
## Create two databases, one for Graphite and one for Grafana
sudo -u postgres psql
#-- Then run the following at the psql prompt
CREATE USER cassmon WITH PASSWORD 'some_password';
CREATE DATABASE graphite WITH OWNER cassmon;
CREATE DATABASE grafana WITH OWNER cassmon;
\q
  • Install and Configure Graphite
## Install Graphite-carbon and Graphite-web
sudo apt-get install graphite-web graphite-carbon

## Configure Graphite
sudo vi /etc/graphite/local_settings.py
   #### configure PostgreSQL as Graphite-web's database
   DATABASES = {
     'default': {
     'NAME': 'graphite',
     'ENGINE': 'django.db.backends.postgresql_psycopg2',
     'USER': 'cassmon',
     'PASSWORD': 'some_password',
     'HOST': '127.0.0.1',
     'PORT': ''
     }
   }
   #### some other possible key changes
   SECRET_KEY = 'some long random string'
   # change to your own time zone
   TIME_ZONE = 'America/Toronto' 
   USE_REMOTE_USER_AUTHENTICATION = True

## Sync Graphite database
sudo graphite-manage syncdb 
#-- This step will create a new super user account 
#-- that will be used to log in Graphite web UI

## Configure Carbon
sudo vi /etc/default/graphite-carbon
#-- CARBON_CACHE_ENABLED=true

sudo vi /etc/carbon/carbon.conf
#-- ENABLE_LOGROTATION = True

## Configure Carbon storage schema
sudo vi /etc/carbon/storage-schemas.conf
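#-- Add a section called "cassandra" before the last default section "default_1min_for_1day";
#-- the first matching section wins. ^cassandra matches the reporter prefix
#-- (e.g. 'cassandra-clustername-node1'). Keep these lines flush-left in the file.
[cassandra]
pattern = ^cassandra
retentions = 10s:10m,1m:1h,10m:1d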

## Configure metrics storage aggregation method
sudo cp /usr/share/doc/graphite-carbon/examples/storage-aggregation.conf.example /etc/carbon/storage-aggregation.conf
#-- Make any changes if needed

## Start carbon-cache service
sudo service carbon-cache start
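The retention string used for the “cassandra” section reads as “precision:duration” pairs: 10s:10m keeps one point every 10 seconds for 10 minutes, 1m:1h one point per minute for an hour, and 10m:1d one point every 10 minutes for a day. The sketch below is a simplified model rather than Whisper's own code: it turns such a string into archive sizes, estimates the Whisper file size per metric, and mimics how a coarser archive is averaged from finer points subject to an xFilesFactor. It assumes Whisper's on-disk layout (a 16-byte header, 12 bytes per archive, 12 bytes per point) and the “average” method with xFilesFactor 0.5 from the default section of the example storage-aggregation.conf; only single-letter unit suffixes are handled.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Whisper-style unit suffixes (simplified: single-letter forms only).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert a duration such as '10s', '1m' or '1d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", spec.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {spec!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_retentions(retentions: str) -> List[Tuple[int, int]]:
    """'10s:10m,1m:1h' -> [(seconds_per_point, number_of_points), ...]."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


def whisper_file_size(archives: List[Tuple[int, int]]) -> int:
    """Bytes on disk: 16-byte header + 12 bytes per archive + 12 bytes per point."""
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)


def rollup(values: Sequence[Optional[float]], x_files_factor: float = 0.5) -> Optional[float]:
    """Average finer points into one coarser point, or None if too few are known."""
    known = [v for v in values if v is not None]
    if not known or len(known) / len(values) < x_files_factor:
        return None
    return sum(known) / len(known)


archives = parse_retentions("10s:10m,1m:1h,10m:1d")
print(archives, whisper_file_size(archives))  # [(10, 60), (60, 60), (600, 144)] 3220
```

Under these assumptions each metric file takes about 3 KB. The rollup helper also points at something worth checking: with the reporter's 30-second period, at most 2 of every six 10-second slots receive a value (a ratio of about 0.33), which is below an xFilesFactor of 0.5, so the 1m and 10m archives may stay empty. Aligning the first precision with the reporting period (for example 30s) or lowering xFilesFactor for the Cassandra metrics avoids this.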
  • Install and Configure Grafana
## Install Grafana (from Grafana's own package repository, which must be added to apt first)
sudo apt-get install grafana

## Configure Grafana
sudo vi /etc/grafana/grafana.ini
#-- Make the following changes
    [database]
    type = postgres
    host = 127.0.0.1:5432
    name = grafana
    user = cassmon
    password = some_password
    
    [security]
    admin_user = admin
    admin_password = admin
    secret_key = some_long_random_string
    
    [server]
    protocol = http
    http_addr = 127.0.0.1
    http_port = 3000
    domain = localhost
    enforce_domain = true
    root_url = %(protocol)s://%(domain)s/
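Both Graphite's SECRET_KEY and Grafana's secret_key above are placeholders that should be replaced with a long random value. One convenient way to produce such a value is Python's standard secrets module (Python 3.6 or later); the 48-byte length below is an arbitrary choice, not a requirement of either tool.

```python
import secrets


def random_secret(nbytes: int = 48) -> str:
    """Return a URL-safe random string, usable as SECRET_KEY or secret_key."""
    # 48 random bytes encode to 64 base64url characters (no padding needed).
    return secrets.token_urlsafe(nbytes)


print(random_secret())
```

Using separate values for Graphite and Grafana keeps a leak of one configuration file from exposing the other application's key.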
  • Install and Configure Apache Web Server
## Install Apache Web Server
sudo apt-get install apache2 libapache2-mod-wsgi

## Disable the default virtual host file
sudo a2dissite 000-default

## Copy the Graphite Apache virtual host file into the available sites directory
## and change the default port for Graphite-web from 80 to 8080
sudo cp /usr/share/graphite-web/apache2-graphite.conf /etc/apache2/sites-available

sudo vi /etc/apache2/sites-available/apache2-graphite.conf
#-- Make the following change (port number changes from 80 to 8080)
    <VirtualHost *:8080>

## Enable Apache proxy modules for Apache reverse proxy to work
sudo a2enmod proxy proxy_http xml2enc

## Create an Apache site configuration file to proxy requests to Grafana
sudo vi /etc/apache2/sites-available/apache2-grafana.conf
#-- Make the following change
    <VirtualHost *:80>
        ProxyPreserveHost On
        ProxyPass / http://127.0.0.1:3000/
        ProxyPassReverse / http://127.0.0.1:3000/
        ServerName localhost
    </VirtualHost>

## Enable Apache listening on port 80 and 8080
sudo vi /etc/apache2/ports.conf
#-- Make the following change
    Listen 80
    Listen 8080

## Enable Apache sites for Graphite and Grafana
sudo a2ensite apache2-graphite
sudo a2ensite apache2-grafana

## Reload Apache services
sudo service apache2 reload

6. Display Cassandra Metrics via Grafana

At this point, if everything is working (e.g. there are no errors in the log files), Graphite should be ready to receive the metrics sent from the Cassandra node and display them through the Graphite-web page. A sample screenshot is shown below:

The Graphite-web UI, although functional, is far from a polished, user-friendly way to manage and display Cassandra metrics. To gain the benefits offered by Grafana, we need to link Graphite and Grafana together, with Graphite as the data source feeding Grafana. We can do so through the Grafana web UI, as follows:

1). Add Graphite as Grafana Data Source. Once the Graphite data source is added, click the “Save and Test” button to make sure it is working.
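The data source can also be added without the UI, through Grafana's HTTP API (a POST to /api/datasources). The sketch below only builds the request. It assumes Grafana is reached through the Apache proxy at http://localhost, Graphite-web at http://localhost:8080 as configured in Section 5, the admin/admin credentials from grafana.ini, and a hypothetical data source name of “graphite”; adjust all of these, and check the field names against the API documentation for your Grafana version.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, user: str, password: str,
                       graphite_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that registers Graphite in Grafana."""
    payload = {
        "name": "graphite",      # hypothetical display name
        "type": "graphite",
        "url": graphite_url,     # Graphite-web virtual host on port 8080
        "access": "proxy",       # the Grafana server queries Graphite itself
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


req = datasource_request("http://localhost", "admin", "admin", "http://localhost:8080")
# To actually register the data source: urllib.request.urlopen(req)
```

With “proxy” access the browser never talks to Graphite-web directly, which fits this setup where Graphite-web listens on a separate port behind Apache.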

2). The next step is to add a dashboard and a graph, and then display the desired metrics on the graph. The final graph looks like this:
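Each Grafana graph panel is ultimately a query against Graphite-web's render API, so when a panel stays empty it can help to run the same query directly. The sketch below builds such a URL. The metric path is an assumption about how the reporter names the Compaction pending-tasks metric under the prefix from Section 3 (the trailing * sidesteps the exact leaf name), and http://localhost:8080 is the Graphite-web virtual host configured above.

```python
from urllib.parse import urlencode


def render_url(graphite_base: str, target: str, since: str = "-1h",
               fmt: str = "json") -> str:
    """Build a Graphite render API URL for one target expression."""
    query = urlencode({"target": target, "from": since, "format": fmt})
    return f"{graphite_base.rstrip('/')}/render?{query}"


# Hypothetical metric path (prefix from metrics_reporter_graphite.yaml).
url = render_url(
    "http://localhost:8080",
    "cassandra-clustername-node1.org.apache.cassandra.metrics.Compaction.PendingTasks.*",
)
print(url)
```

Fetching that URL (for example with urllib.request.urlopen) returns a JSON list of series, each with a “target” and a “datapoints” list of [value, timestamp] pairs; an empty list usually means the path does not match anything Graphite has stored.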

7. Conclusion

In this post, I explored an alternative to DataStax OpsCenter for monitoring Cassandra. This solution is built entirely on open source software.

It is worth pointing out that this solution is not limited to monitoring Cassandra metrics. As the architecture diagram above shows, a Cassandra node is only one type of metrics provider. Other metrics providers, such as StatsD or collectd, can also feed data into this solution, as long as they can send data to Graphite-carbon.

Please also note that the solution discussed in this post is far from a complete Cassandra monitoring solution. It can be extended to bring other types of metrics into the picture, such as 1) OS/hardware-level metrics and 2) application/code-level metrics, to provide a holistic view of the application system.