Sun designed its 128-bit general-purpose Solaris Zettabyte File System (ZFS) to span the range between the desktop and the data center. ZFS can manage huge amounts of data, and it eases administration by removing the constraints associated with directories and subdirectories and by allowing administrators to "virtualize" disks and manage data across physical volumes. ZFS is the subject of a BigAdmin System Administration Portal feature by Amy Rich entitled "ZFS, Sun's Cutting-Edge File System (Part 1: Storage Integrity, Security, and Scalability)."
Sun has designed ZFS to address the important issues of integrity, security, scalability, and the difficulty of administration typical of other UNIX® file systems. This two-part article examines the behind-the-scenes workings of ZFS and how they can translate into savings of time and money for an enterprise. The second part of the article will cover ease of administration and future ZFS enhancements.
ZFS combines the attributes of both file system and volume manager with the additional virtue that the file system-level commands require no concept of the underlying physical disks because of storage pool virtualization, as Rich explains. With this design, all of the high-level interactions occur through the data management unit (DMU), a concept similar to a memory management unit (MMU), only for disks instead of RAM. Because all of the transactions committed through the DMU are atomic, data is never left in an inconsistent state.
To accommodate the move to disk and file system mirroring, Sun has incorporated transaction-based copy-on-write modifications and the continuous checksumming of every in-use block in a file system. The blocks containing the in-use data on disk are therefore never modified. Changed information is written to alternate blocks, and the block pointer to the in-use data is moved only once the write transactions are complete. This happens all the way up the file system block structure to the top block, called the uberblock. Because every block is checksummed, ZFS can tell which mirror holds the uncorrupted data, which solves the problem of deciding which copy to trust.
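The copy-on-write and checksumming ideas described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ZFS code: the names `ToyCowStore` and `read_from_mirrors` are hypothetical, the single `root` pointer stands in for the uberblock, and SHA-256 is used only because it is available in the standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an illustrative choice, not a statement about ZFS defaults.
    return hashlib.sha256(data).hexdigest()


class ToyCowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks = {}    # block id -> bytes
        self.next_id = 0
        self.root = None    # (block id, checksum); stands in for the uberblock

    def write(self, data: bytes) -> None:
        # 1. Write the new data to a fresh block, leaving the old one alone.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        # 2. Only after the write completes, switch the root pointer.
        self.root = (block_id, checksum(data))

    def read(self) -> bytes:
        block_id, expected = self.root
        data = self.blocks[block_id]
        if checksum(data) != expected:
            raise IOError("checksum mismatch")
        return data


def read_from_mirrors(copies, expected):
    """Return the first mirror copy whose checksum matches the pointer."""
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("no mirror copy matches the checksum")
```

After two calls to `write`, the first block is still present and unchanged, and `read` returns the second version. Passing a corrupted copy and a good copy to `read_from_mirrors`, together with the checksum of the good data, returns the good copy regardless of their order.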
ZFS is also an eminently scalable solution because Sun has anticipated the likelihood that 64-bit addressability will soon be inadequate. ZFS is thus a 128-bit solution, which allows it to deliver more than 16 billion billion times the capacity of current 64-bit systems.
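The scale factor between a 128-bit and a 64-bit address space can be checked with plain integer arithmetic. The snippet below is only a worked calculation and says nothing about how ZFS lays out its addresses.

```python
# Ratio between a 128-bit and a 64-bit address space.
ratio = 2**128 // 2**64

print(ratio == 2**64)   # True
print(f"{ratio:,}")     # 18,446,744,073,709,551,616
print(f"{ratio:.2e}")   # 1.84e+19
```

The ratio is 2^64, or roughly 18 billion billion.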
According to Jeff Bonwick, the ZFS chief architect, "Populating 128-bit file systems would exceed the quantum limits of Earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."
Rich writes that ZFS uses a pipelined I/O engine, similar in concept to CPU pipelines, that provides scoreboarding, priority, deadline scheduling, out-of-order issue, and I/O aggregation. ZFS also implements an intelligent prefetch algorithm that recognizes linear or algorithmic access patterns and guesses the next block to prefetch.
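To make the prefetch idea concrete, here is a minimal sketch of stride detection: if the last few block accesses are evenly spaced, guess the next block in the sequence. The function name `predict_next_block` and the constant-stride rule are assumptions made for illustration. The article does not describe the actual ZFS algorithm at this level of detail.

```python
def predict_next_block(history):
    """Guess the next block number if recent accesses share a constant stride.

    Returns None when no pattern is detected.
    """
    if len(history) < 3:
        return None  # too few accesses to see a pattern
    strides = [b - a for a, b in zip(history, history[1:])]
    # A non-zero stride repeated by the last two steps counts as a pattern.
    if strides[-1] != 0 and strides[-1] == strides[-2]:
        return history[-1] + strides[-1]
    return None
```

A linear scan such as `[10, 11, 12]` predicts block 13, a strided pattern such as `[4, 8, 12, 16]` predicts block 20, and an irregular history such as `[3, 9, 4]` yields no prediction.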
According to Rich, ZFS uses concurrency to improve speed whenever it can, supporting parallel reads and writes to the same file, as well as parallel constant-time directory operations with a locking strategy that is scalable and fast.
ZFS also dynamically stripes data across all available devices, automatically incorporating the new space and re-balancing the write striping whenever a user adds another disk or slice to a stripe.
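The effect of dynamic striping can be illustrated with a toy allocator. The policy used here, placing each new block on the device with the most free space, is a simplifying assumption chosen for the sketch, and `ToyPool` is a hypothetical name. The real ZFS allocator is not described in the article.

```python
class ToyPool:
    """Toy pool that places each block on the device with the most free space."""

    def __init__(self, devices):
        # devices: mapping of device name -> capacity in blocks
        self.free = dict(devices)
        self.placement = []  # device chosen for each block, in write order

    def add_device(self, name, capacity):
        # New space becomes available to the very next write.
        self.free[name] = capacity

    def write_block(self):
        # Ties go to the device that was added to the pool first.
        name = max(self.free, key=self.free.get)
        if self.free[name] == 0:
            raise IOError("pool is full")
        self.free[name] -= 1
        self.placement.append(name)
        return name
```

With two four-block devices, the first four writes alternate between them. Once a third, empty device is added, subsequent writes favor it until free space evens out, with no manual rebalancing step.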
Finally, Rich concludes, ZFS offers built-in data compression on a per-file-system basis, enabling the solution not only to reduce on-disk space usage but also to cut the amount of I/O required by a factor of two to three. For this reason, enabling compression can actually make some workloads run faster if they are I/O bound rather than CPU bound, Rich writes.
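The link between compression and reduced I/O can be seen with a small experiment. The sketch below uses Python's `zlib` as a stand-in compressor, which is an assumption for illustration and not the algorithm ZFS uses. The sample record is made up and highly repetitive, so it compresses far better than typical data would.

```python
import zlib


def compressed_write_size(data: bytes) -> int:
    """Return the number of bytes that would reach disk after compression."""
    return len(zlib.compress(data))


# Hypothetical, highly repetitive sample payload.
record = b"status=ok latency_ms=12\n" * 1000

raw = len(record)
packed = compressed_write_size(record)

# Fewer bytes written means less I/O for the same logical data.
print(raw, packed, packed < raw)
```

The trade-off is CPU time spent compressing and decompressing, which is why the benefit applies to workloads limited by I/O and not to those limited by CPU.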