The TPC-C/SPC-1 storage benchmarks are screwed. You know what we need?

Datetime: 2016-08-23 05:11:19          Topic: Performance Test

Comment The storage benchmarking world is broken because there are no realistic and practical storage benchmarks, with realistic workloads, that customers can apply to systems.

So says storage analyst Howard Marks, and he aims to fix this mess with the help of a consortium of industry players.

He says the world of storage benchmarking is currently divided into the simple load generators like IOmeter, FIO or VDbench and complex benchmarks like TPC-C and SPC-1 that can take months to complete.

Marks declares: "Even the best of today’s storage benchmarks have an ever more distant relationship with how modern storage systems work and are used in the real world. Even worse vendors continue to publish test reports that game the system by using a data set smaller than the system under test’s cache or reporting very low latency on an OLTP benchmark because the system under test lacks controller HA."

POC cookbook

The desired outcome is a Proof of Concept (POC) cookbook customers can use to run their own benchmarks using a realistic workload. Their ideal benchmark will be a synthetic one, with a semi-random workload data distribution in addition to sequential and random distributions. It would:

  • Feature controllable levels of data reduction,
  • Be more accessible and affordable than the high-end Load Dynamix testing system,
  • Support multiple workloads and take the number of controllers and paths into account,
  • React to workload changes over time,
  • Test for the impact of increases in storage I/O demand of lower priority workloads on high priority workloads, and
  • Test how fairly resources are allocated when demand exceeds supply.
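
To make the "semi-random" idea concrete, here is a minimal Python sketch of one way such a distribution could be generated: most I/Os land in a small hot region and the rest are spread over the remaining blocks. The function name, the 10 per cent hot region and the 80 per cent hit probability are all assumptions for illustration; nothing here comes from Marks's benchmark software.

```python
import random

def semi_random_offsets(n_ios, n_blocks, hot_fraction=0.1, hot_probability=0.8, seed=0):
    """Return n_ios block numbers, skewed toward a small 'hot' region.

    hot_fraction and hot_probability are illustrative assumptions,
    not values taken from the article.
    """
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    rng = random.Random(seed)
    # Keep at least one block in each of the hot and cold regions.
    hot_blocks = min(max(1, int(n_blocks * hot_fraction)), n_blocks - 1)
    offsets = []
    for _ in range(n_ios):
        if rng.random() < hot_probability:
            offsets.append(rng.randrange(hot_blocks))            # hot region
        else:
            offsets.append(rng.randrange(hot_blocks, n_blocks))  # cold region
    return offsets
```

Setting hot_probability equal to hot_fraction makes the pattern approximately uniform random, while raising it makes the workload friendlier to caches. That is also why the working set should be sized well beyond the cache of the system under test.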

The POC cookbook will be a benchmarking guide, with step-by-step instructions on how to run the benchmark, and come with the benchmark software, OVFs of the virtual machines needed to run the benchmarks with the benchmark software installed, and data analysis scripts to generate graphs.

It is intended for the evaluation of primary storage systems and the two "anticipate that the mixed VM workload will be either a tiled set of VMs that are duplicated to increase load or a set of VMs that provide a background load while the primary application(s) run with increasing load until the latency threshold is reached."
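
The "increasing load until the latency threshold is reached" approach can be sketched as a simple ramp loop. This is only an illustration: `measure_latency_ms` is a hypothetical callback that a real harness would implement by driving the storage system and timing I/Os, and `toy_latency` is a made-up model used solely so the example runs.

```python
def ramp_until_threshold(measure_latency_ms, threshold_ms,
                         start_iops=1000, step_iops=1000, max_iops=1_000_000):
    """Raise the offered load step by step; return the last load whose
    measured latency stayed at or under the threshold (None if none did)."""
    last_good = None
    load = start_iops
    while load <= max_iops:
        if measure_latency_ms(load) > threshold_ms:
            break
        last_good = load
        load += step_iops
    return last_good

def toy_latency(iops):
    # Made-up model: latency climbs steeply as load nears 50,000 IOPS.
    if iops >= 50_000:
        return float("inf")
    return 1.0 / (1.0 - iops / 50_000)

highest_load = ramp_until_threshold(toy_latency, threshold_ms=4.0)
```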

Storage profiler

He would "like to integrate a storage profiler into the cookbook so users can profile their applications and then create synthetic workloads that approximate their actual applications."

The ideal storage profiler would view the I/O stream to a storage device and record data about the workload, hopefully including:

  • Break I/Os into bands by I/O size,
  • For each I/O size band:
    • Number of I/Os
    • Avg I/O size
    • 90th-percentile I/O size
    • Information on locality
    • Information on data reducibility
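
The size-band part of that wish list could look something like the following Python sketch. The band edges are hypothetical (the article gives none), the percentile uses the nearest-rank method, and locality and data reducibility are left out because the article does not say how they would be measured.

```python
import math
from collections import defaultdict

# Hypothetical band edges in bytes; the article does not specify any.
BAND_EDGES = [4096, 16384, 65536, 262144]

def band_for(size):
    """Label the band an I/O of the given size falls into."""
    for edge in BAND_EDGES:
        if size <= edge:
            return f"<={edge}"
    return f">{BAND_EDGES[-1]}"

def profile(io_sizes):
    """Group I/O sizes into bands and summarise each band."""
    bands = defaultdict(list)
    for size in io_sizes:
        bands[band_for(size)].append(size)
    report = {}
    for band, sizes in bands.items():
        sizes.sort()
        rank = math.ceil(0.9 * len(sizes))  # nearest-rank 90th percentile
        report[band] = {
            "count": len(sizes),
            "avg_size": sum(sizes) / len(sizes),
            "p90_size": sizes[rank - 1],
        }
    return report
```

In practice the list of I/O sizes would come from a trace of the device's I/O stream rather than from a list held in memory.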

Storage industry players are invited to join a new benchmark consortium. They'll be invited, but not required, to participate by running pre-release versions of the benchmarks, contributing code and the like. Membership will cost a certain amount of cash per year.

Workload definition

Marks wants to better characterise real-world applications and understand their read/write I/O size distributions, I/O inter-relationships such as associated log file access, data reducibility and locality.

Some of this data is readily available. Several existing member vendors, including Tintri, Nimble and HPE, collect the read/write I/O size information. In theory, this data can be collected with PernixData’s Architect installed as a filter on a vSphere host.

He has developed a scanner (in Java) that will read a dataset and report:

  • Compressibility (currently uses LZW but is engineered to allow additional compression algorithms to be used)
  • Data de-dupability using fixed blocks (4KB-1MB) and SHA-1
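
As a rough idea of what such a scan computes, here is a minimal Python sketch. It is not Marks's Java scanner: zlib (DEFLATE) stands in for LZW because Python's standard library has no LZW codec, and the function name and the two returned ratios are assumptions made for illustration. The fixed-block SHA-1 fingerprinting follows the description above.

```python
import hashlib
import zlib

def scan(data, block_size=4096):
    """Return (compression_ratio, dedup_ratio) for a bytes object.

    Assumptions: zlib replaces LZW, and block_size would be varied
    between 4KB and 1MB in a fuller scan.
    """
    if not data:
        raise ValueError("nothing to scan")
    compression_ratio = len(data) / len(zlib.compress(data))
    # Split into fixed-size blocks and fingerprint each with SHA-1.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha1(block).digest() for block in blocks}
    dedup_ratio = len(blocks) / len(unique)
    return compression_ratio, dedup_ratio
```

A dedup ratio of 2.0 means half the blocks are duplicates of others. A real scanner would stream files from disk rather than hold the whole data set in memory.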

Marks would like benchmarking community members to scan their data. He is going to recommend that users restore the data for their workloads and point the scanner at the restored files. And he wants to build a library of traces and reduction scans.

The idea is that users will be able to download the tools for the reduction scan and scripts that simplify the trace collection process. They’ll then fill out a form describing the application, and optionally their organization, and upload the files.

Analysts would then run the files through parsers, upload some histograms and feed the data into an aggregator process, so that five OLTP workloads, for example, can be combined into one “average” workload.
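
The article does not say how the aggregator would combine workloads. One plausible approach, sketched below in Python, is to normalise each workload's I/O-size histogram into shares and then average the shares, so that a busy site does not drown out a quiet one. The function name and the equal weighting are assumptions.

```python
def average_workload(histograms):
    """Average several I/O-size histograms (size -> count) into one
    distribution (size -> share of I/Os), weighting each workload equally."""
    if not histograms:
        raise ValueError("need at least one histogram")
    totals = [sum(h.values()) for h in histograms]
    if any(t == 0 for t in totals):
        raise ValueError("empty histogram")
    sizes = sorted({size for h in histograms for size in h})
    average = {}
    for size in sizes:
        shares = [h.get(size, 0) / t for h, t in zip(histograms, totals)]
        average[size] = sum(shares) / len(histograms)
    return average
```

For example, averaging `{4096: 3, 8192: 1}` with `{4096: 1, 8192: 1}` gives a 4KB share of 0.625 and an 8KB share of 0.375.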

The end result of all this should be a prescriptive cookbook that will allow a customer to run a POC in under 30 days and get a good idea how the storage system under test will behave in the real world.

The new benchmark organisation has a somewhat wacky name: The Other Other Operation (think Monty Python). You can get in touch with Howard Marks by using this contact form. ®




