System News
A Better Benchmarking Tool: vdbench vs. dd(1M)
For the Most Part, vdbench Is the Winner
January 22, 2010
Volume 143, Issue 3

vdbench: a free, multiplatform tool designed for benchmarking storage
 

Although its developers never intended it as a benchmarking tool, the simple dd utility is nevertheless frequently used as a sequential workload generator for quick tests, writes Lisa Noordergraaf in the blog post Pitfalls of Benchmarking Flash with dd(1M). She recommends vdbench as an alternative.

People often plug in a drive, then fire up a dd to see what kind of throughput they can achieve, says Noordergraaf, explaining that her article walks readers through some examples of the pitfalls of using dd for flash benchmarking, and suggests an alternative strategy for measuring raw IO throughput. Many of these issues are not unique to dd, but they are easy to understand and explain in the context of the simple dd workload, she adds.

In the first example Noordergraaf presents, she used one of the four flash modules in the Sun F20 PCIe Accelerator Card in her workstation under a simple sequential workload.

The problem here, she observes, is that dd's default block size is 512 bytes. Today's flash devices are almost uniformly 4k-sector devices, which means the block size must be a multiple of 4k to achieve optimal performance. The 512-byte default is therefore likely to hurt performance.
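To make the block size effect concrete, here is a minimal Python sketch of a dd-style single-threaded sequential read with a configurable block size. It is an illustration, not the method used in the blog: the function names and the 4096-byte default sector size are assumptions made for this example. On a regular file the operating system's page cache will inflate the numbers, so a raw device path is needed to measure the hardware itself.

```python
import time


def is_sector_aligned(block_size, sector_size=4096):
    """True if block_size is a positive multiple of the (assumed) sector size."""
    return block_size > 0 and block_size % sector_size == 0


def sequential_read(path, block_size, max_bytes=None):
    """Read `path` front to back with one thread, `block_size` bytes per call.

    Returns (bytes_read, elapsed_seconds). Stops at end of file, or once at
    least `max_bytes` have been read when a limit is given.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 yields an unbuffered file object, so each read() below
    # issues a single read request of at most block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:  # end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling `sequential_read(path, 512)` and then `sequential_read(path, 16384)` against the same device, and dividing bytes by seconds, gives a rough comparison of the two block sizes. `is_sector_aligned(512)` returns False under the 4k-sector assumption, flagging the dd default as a likely problem.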

Even with the block size cranked up to 16k, which lifts performance from 8MB/s to 134MB/s, a second problem remains, Noordergraaf continues: a lack of parallelism, because dd is a single-threaded benchmark. This can be observed in the iostat output she provides by looking at the "actv" and "wait" queues. At any given point in time, there is about 0.9 (effectively 1) IO request outstanding.
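The relationship between throughput, block size, and outstanding requests can be worked through with Little's Law (outstanding IOs = IOs per second x time per IO). The sketch below applies it to the dd figures quoted above. It assumes that "MB" means 2**20 bytes and that the quoted figures are steady-state averages; neither is stated in the article, and the function names are made up for this example.

```python
MIB = 1024 * 1024  # assumption: the quoted "MB" is 2**20 bytes


def iops(throughput_mb_s, block_size):
    """IO operations per second implied by a throughput and a block size."""
    return throughput_mb_s * MIB / block_size


def time_per_io_us(outstanding, throughput_mb_s, block_size):
    """Little's Law solved for time per IO, in microseconds."""
    return outstanding / iops(throughput_mb_s, block_size) * 1e6


# Single-threaded dd figures quoted in the article.
print(round(iops(8, 512)))                            # 16384 IOs/s at 512 bytes
print(round(iops(134, 16 * 1024)))                    # 8576 IOs/s at 16k
print(round(time_per_io_us(0.9, 134, 16 * 1024), 1))  # 104.9 microseconds per IO
```

Under these assumptions, each 16k request takes roughly 105 microseconds, and with only about one request in flight the throughput cannot exceed one block per request time. Raising throughput further requires either larger blocks or more requests in flight.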

The author explains that modern enterprise flash devices and SSDs may appear as a single disk drive to the operating system by virtue of their SAS and SATA interfaces. In this case, however, appearances are misleading. In actuality, each of these flash modules contains multiple NAND dies and multiple channels supplying IO to the dies. Consequently, a single-threaded benchmark can be performance limiting: it does not allow us to utilize the inherent parallelism built into these devices.

One way to get around this limitation, Noordergraaf explains, is to use a benchmark tool that is a bit more up to the task. She recommends vdbench: a multiplatform tool designed for benchmarking storage, freely available at vdbench.org.

With the same 16k block size sequential read workload as before, but this time driven by vdbench with 32 threads, the iostat output shows throughput of 255MB/s. That is a far cry from the 8MB/s with the default dd settings, and also a large improvement over the single-threaded, 16k dd result of 134MB/s. In addition, the average number of outstanding IOs came in at 31.8, which shows that the device is now processing many IOs in parallel.
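The idea of keeping many requests in flight can be sketched in Python with a thread pool and `os.pread`. This is a rough illustration only, not how vdbench schedules its threads: here each worker reads its own contiguous slice of the target, and the function name and parameters are assumptions made for this example. `os.pread` is available on Unix-like systems only, and as with any read test on a regular file, the page cache will distort results unless a raw device is used.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def parallel_read(path, block_size, threads, total_bytes):
    """Read the first `total_bytes` of `path` using `threads` workers.

    The range is split into one contiguous slice per worker, and each worker
    reads its slice block by block with os.pread, so up to `threads` requests
    can be outstanding at once. Returns (bytes_read, elapsed_seconds).
    """
    blocks = total_bytes // block_size
    per_worker = -(-blocks // threads)  # ceiling division
    fd = os.open(path, os.O_RDONLY)

    def worker(index):
        first = index * per_worker
        last = min(first + per_worker, blocks)
        done = 0
        for block in range(first, last):
            # pread takes an explicit offset, so the threads can share one
            # file descriptor without racing on a common file position.
            data = os.pread(fd, block_size, block * block_size)
            if not data:  # past end of file
                break
            done += len(data)
        return done

    try:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=threads) as pool:
            total = sum(pool.map(worker, range(threads)))
        return total, time.perf_counter() - start
    finally:
        os.close(fd)
```

Running `parallel_read(path, 16384, 1, size)` and `parallel_read(path, 16384, 32, size)` against the same device, while watching the "actv" column in iostat, is one way to see whether the device benefits from additional outstanding requests.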

Noordergraaf continues with the observation that another option is to ratchet up the block size even further. If the block size is large enough, she explains, a single-threaded workload such as dd(1M) can typically push enough data to fully utilize a flash device. Of course, you may not know if you are limited by lack of a parallel workload unless you are using a tool that allows you to vary the amount of parallelism.

In the final analysis, Noordergraaf concludes that both block size and parallelism are key to seeing optimal performance from flash devices. If you are going to benchmark IO on flash, be sure to take a look at these issues, whatever your benchmark of choice may be.

She also strongly recommends checking out vdbench, which runs on multiple platforms and operating systems. It is freely available and well suited for testing both modern flash devices and traditional storage.

More Information

Pitfalls of Benchmarking Flash with dd(1M)

Vdbench Test Results for Sun Flash Accelerator F20 PCIe Card

A Survey of Nine Years' Worth of Benchmarking Results



