Feature: Compaction

Datetime: 2016-08-23 00:50:36          Topic: CouchDB, Database

This is the sixth in a series of blog posts introducing the Apache CouchDB 2.0 release. Read parts one, two, three, four, and five in the series.

One way CouchDB averts data corruption is by only updating database files via append operations, never mutating existing data. While this method has numerous advantages, it tends to use a lot of disk space relative to a more traditional update-in-place DBMS. With every change to a database (an insertion of new documents, an update, or a deletion), CouchDB's internal b-trees and headers also need to be partially updated to incorporate the change. These updates are appended to the end of the database file, so CouchDB files can grow very quickly and can contain a lot of unreferenced data, AKA “garbage.”

To free the system from this “garbage,” CouchDB uses a process called compaction. Compaction works by copying the most recent revision of every document (along with a small amount of metadata about previous revisions) to a new, compacted file. It should be run periodically to recover the wasted disk space. While it can be useful for databases where only new documents are inserted, it is especially beneficial for update-heavy databases where documents have many revisions.
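To make the idea concrete, here is a minimal sketch of compaction over an append-only log. The record layout (doc id, revision number, body) and the function name are hypothetical, chosen only for illustration. The sketch also ignores the revision metadata and deletion handling that real CouchDB compaction deals with.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout for illustration: (doc_id, revision_number, body).
Record = Tuple[str, int, str]


def compact(log: List[Record]) -> List[Record]:
    """Copy only the latest revision of each document into a new log."""
    latest: Dict[str, Record] = {}
    for record in log:
        doc_id, rev, _body = record
        # A higher (or equal, later-appended) revision supersedes the stored one.
        if doc_id not in latest or rev >= latest[doc_id][1]:
            latest[doc_id] = record
    return list(latest.values())


# An append-only "file": every update adds a record, nothing is overwritten.
log: List[Record] = [
    ("a", 1, "first"),
    ("b", 1, "hello"),
    ("a", 2, "second"),
    ("a", 3, "third"),
]
compacted = compact(log)
```

In this toy log, document "a" has three revisions but only the last one survives, which is why update-heavy databases benefit the most.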

IOQ

In previous versions of CouchDB, all database operations had equal priority in their access to I/O. Thus, long-running compaction tasks had the same priority as latency-sensitive interactive requests from an application. Moreover, compaction tasks requiring a lot of I/O would noticeably impact the performance of interactive requests, resulting in significantly increased request latencies.

To prioritize different types of requests in their access to I/O, Cloudant developed the IOQ application, which has been added to CouchDB 2.0. Every database request requiring an I/O operation first goes through IOQ and is put into one of two queues: one for interactive requests, the other for compaction requests. By default, ten I/O requests can be served concurrently; all other outstanding requests wait in the queues. The next request to be served is chosen from either the interactive queue or the compaction queue at a ratio of 100:1 (the default). This prioritizes concurrent interactive requests and substantially lessens the impact of compaction on them.

The ratio and concurrency parameters for ioq can be configured in the default.ini file.

[ioq]
ratio = 0.01
concurrency = 10
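The following sketch shows one way the two parameters could interact. It is not the IOQ implementation. The class name, method names, and the choice of a random draw against the ratio are assumptions made for illustration; only the two queues, the concurrency limit, and the default values come from the description above.

```python
import random
from collections import deque
from typing import Deque, Optional


class IoQueueSketch:
    """Toy two-queue scheduler; names and selection rule are illustrative only."""

    def __init__(self, ratio: float = 0.01, concurrency: int = 10,
                 seed: Optional[int] = None) -> None:
        self.ratio = ratio              # chance of picking compaction when both wait
        self.concurrency = concurrency  # maximum requests served at the same time
        self.interactive: Deque[str] = deque()
        self.compaction: Deque[str] = deque()
        self.in_flight = 0
        self._rng = random.Random(seed)

    def submit(self, request: str, is_compaction: bool = False) -> None:
        (self.compaction if is_compaction else self.interactive).append(request)

    def next_request(self) -> Optional[str]:
        """Return the next request to serve, or None if nothing can start now."""
        if self.in_flight >= self.concurrency:
            return None
        if self.interactive and self.compaction:
            # Assumption: a random draw against the ratio picks the queue.
            pick_compaction = self._rng.random() < self.ratio
            queue = self.compaction if pick_compaction else self.interactive
        elif self.interactive:
            queue = self.interactive
        elif self.compaction:
            queue = self.compaction
        else:
            return None
        self.in_flight += 1
        return queue.popleft()

    def complete(self) -> None:
        """Mark one in-flight request as finished, freeing a slot."""
        self.in_flight -= 1
```

With a small ratio, compaction requests are mostly served when the interactive queue is empty or a slot happens to be drawn for them, so interactive latency stays low while compaction still makes progress.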

Size and speed optimizations

While CouchDB committer and Cloudant lead architect Paul Davis contends that compaction as a concept is “dead simple,” doing it in the most straightforward way can leave a lot of room for improvement. The basic process involves walking the database for all docs by order of their last update (seq_tree), and copying all related data to a new file, which ultimately replaces the original db file. At one time the id_tree (all docs by order of their ids) was written directly to the new file, but Adam Kocoloski observed that writing to the id_tree in the order of the seq_tree could cause excessive garbage to be generated since the id_tree writes would be out of order.

This observation ultimately resulted in a major optimization in which the id_tree is written to a temp (.compact.meta) file. At the end of compaction, that id_tree is copied back to the compacted (.compact.data) file in order, which can result in greatly reduced size and compaction time.
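A small sketch can illustrate why the staging step helps. The entries, function names, and the "out of order" metric below are hypothetical stand-ins; real compaction operates on b-tree nodes, not Python lists.

```python
from typing import List, Tuple

# Hypothetical (update_seq, doc_id) pairs, listed in seq_tree order.
SeqEntry = Tuple[int, str]


def out_of_order_writes(ids: List[str]) -> int:
    """Count writes whose id sorts before the id written just before it."""
    return sum(1 for prev, cur in zip(ids, ids[1:]) if cur < prev)


def direct_id_writes(seq_walk: List[SeqEntry]) -> List[str]:
    """Older approach: write id entries in the order the seq walk visits them."""
    return [doc_id for _seq, doc_id in seq_walk]


def staged_id_writes(seq_walk: List[SeqEntry]) -> List[str]:
    """Optimized approach: stage ids in a temp buffer, then write them sorted."""
    meta_buffer = [doc_id for _seq, doc_id in seq_walk]  # stands in for .compact.meta
    return sorted(meta_buffer)  # stands in for the in-order copy to .compact.data


seq_walk: List[SeqEntry] = [(1, "m"), (2, "c"), (3, "x"), (4, "a"), (5, "k")]
direct = out_of_order_writes(direct_id_writes(seq_walk))
staged = out_of_order_writes(staged_id_writes(seq_walk))
```

In an append-only b-tree, each out-of-order write tends to rewrite nodes that were already written, and the superseded copies become garbage. Writing the ids in sorted order avoids that churn.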

Overall, this technique works well, and has been used by Cloudant in production for several years. The one caveat we’ve found so far is that it’s sometimes possible to create temp files with the last header buried several GB from the end of the file. If the compaction is interrupted for some reason (like a reboot), when it resumes it needs to find the most recently written header, which can take a lot of time if it’s deeply buried. We are working on techniques to speed up the location of buried headers, but it may also be possible to improve the underlying algorithm to prevent headers from being buried too deeply in the first place.
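To see why a buried header is expensive to find, consider a backward scan over fixed-size blocks. The block size, the marker byte, and the function names below are assumptions made for this sketch and are not taken from this post; the point is only that the cost grows with the distance between the header and the end of the file.

```python
from typing import Optional, Tuple

BLOCK_SIZE = 4096   # assumed block size, for illustration only
HEADER_MARKER = 1   # assumed first byte of a block that holds a header


def find_last_header(data: bytes) -> Tuple[Optional[int], int]:
    """Scan backwards block by block; return (header offset, blocks examined)."""
    examined = 0
    block = (len(data) - 1) // BLOCK_SIZE if data else -1
    while block >= 0:
        examined += 1
        offset = block * BLOCK_SIZE
        if data[offset] == HEADER_MARKER:
            return offset, examined
        block -= 1
    return None, examined


def make_file(total_blocks: int, header_block: int) -> bytes:
    """Build a toy file with a single header at the given block index."""
    buf = bytearray(total_blocks * BLOCK_SIZE)
    buf[header_block * BLOCK_SIZE] = HEADER_MARKER
    return bytes(buf)


# A header in the last block is found at once; a buried one costs many reads.
_, near_cost = find_last_header(make_file(100, 99))
_, buried_cost = find_last_header(make_file(100, 10))
```

In this toy file the buried header costs 90 block reads instead of 1. With several GB between the header and the end of a real file, the same linear scan touches a very large number of blocks.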

Compaction is a shard operation

In CouchDB 2.0, compaction is a shard operation, as every shard is an individual CouchDB database. Cluster-wide or node-wide manual compaction through a single HTTP request is not implemented, because compacting all shards of a database on all nodes at once would significantly impair the database’s performance, even with IOQ in control. Compaction is therefore left to admins, and should go through the backdoor port 5986.

Manual database compaction

An example of an HTTP request for compacting the shard 00000000-1fffffff of the “test” database on node1:

curl -H "Content-Type: application/json" -X POST 15986/shards%2F00000000-1fffffff%2Ftest.1470075898/_compact

where "test.1470075898" is the name of the couch file on this shard.
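The shard path contains slashes that must be percent-encoded as %2F so that the whole path is treated as a single database name. A small helper can build such a URL; the function name and the base address "http://localhost:15986" are hypothetical and should be replaced with the backend address of the node in question.

```python
from urllib.parse import quote


def shard_compact_url(base_url: str, shard_range: str, couch_file: str) -> str:
    """Build the _compact URL for one shard; slashes in the name become %2F."""
    shard_db = f"shards/{shard_range}/{couch_file}"
    return f"{base_url}/{quote(shard_db, safe='')}/_compact"


# "http://localhost:15986" is a hypothetical address for node1's backend port.
url = shard_compact_url("http://localhost:15986", "00000000-1fffffff", "test.1470075898")
```

The resulting string can be passed to curl or any HTTP client as the target of the POST request.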

Manual view compaction

Similar to databases, views are also shard based, and view compaction operations should also be run on the backend port 5986.

For a view stored in the design doc “_design/app” on the shard 80000000-9fffffff of the database “test”, the request would be:

curl -H "Content-Type: application/json" -X POST

Currently, the view compaction feature is not fully implemented, and will only compact views for shards that contain a design doc. For example, an attempt to compact the view of the shard 00000000-1fffffff, which doesn’t contain the design doc “_design/app”, will cause the following error:

curl -H "Content-Type: application/json" -X POST

{"error":"not_found","reason":"missing"}

There is an open JIRA issue for this, and it will be fixed in the future.

Automatic compaction

Automatic compaction in CouchDB 2.0 works similarly to CouchDB 1.6, using the same configuration.

For compacting views, the compaction daemon has the same problem as the manual compaction of views: it will only compact views on shards that contain design documents.

Jay Doane is a software developer at IBM working on Cloudant Local (the on-prem version of the database), and Cluster Elasticity.

Mayya Sharipova is a software developer at IBM Cloudant focusing on integrations of CouchDB database with Apache Lucene and Spark.

You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and files with the git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.




