Implementing New Block Drivers


A QEMU Developer Primer

Jeff Cody
Red Hat, Inc.

First, some background on Block Drivers

What Is A Block Driver?

A QEMU block driver provides storage on the host that the guest sees as a drive.


In the QEMU code, they are located in the block/ directory.

The two major block driver types in QEMU

  • Image format drivers
  • Protocol based drivers

Image Format Drivers

Provide a structured way to store data, often with built-in features (such as snapshots, data sparseness, drive metadata, etc.).

Examples

  • QCOW2
  • QED
  • raw
  • etc.

Protocol-Based Drivers

Provide the underlying data I/O for image formats, although they may also be used as stand-alone drivers.

Examples include:

Network-based protocol drivers in QEMU:

  • Gluster
  • iSCSI
  • NBD
  • etc.

Host protocol drivers in QEMU:

  • raw-posix files
  • raw-posix host devices
  • etc.

The BlockDriver struct

The BlockDriver struct contains the block-layer internal API to the format driver. It consists mainly of function pointers, used to interface with the driver.

  • .bdrv_open
  • .bdrv_co_readv
  • etc.


We'll get into these more later.

Core Concepts

  • Probing
  • Open and Reopen
  • Coroutines, and Read / Write
  • Metadata Caching
  • Image Creation
  • Data Handling
  • Backing Files
  • Testing

The Probe

The driver is passed the first 2048 bytes of the image



00000000  76 68 64 78 66 69 6c 65  51 00 45 00 4d 00 55 00  |vhdxfileQ.E.M.U.|
00000010  20 00 76 00 31 00 2e 00  36 00 2e 00 35 00 30 00  | .v.1...6...5.0.|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
                 
VHDX


00000000  51 46 49 fb 00 00 00 02  00 00 00 00 00 00 00 b8  |QFI.............|
                 
QCOW2


  • If the first 2 KiB are not enough to identify the format, probing won't work.
  • Return a confidence score from 0-100.

The Probe

static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
{
    const QCowHeader *cow_header = (const void *)buf;

    if (buf_size >= sizeof(QCowHeader) &&
        be32_to_cpu(cow_header->magic) == QCOW_MAGIC &&
        be32_to_cpu(cow_header->version) >= 2)
        return 100;
    else
        return 0;
}
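
To illustrate how the confidence scores might be used, here is a self-contained sketch. It is not QEMU's probing code: `ProbeEntry`, `toy_vhdx_probe`, `toy_qcow2_probe`, and `toy_find_format` are hypothetical names, and the toy qcow2 probe checks only the magic bytes (the real probe above also checks the version). The signatures come from the two hex dumps shown earlier. Every probe scores the same buffer, and the highest nonzero score wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: matches the 8-byte "vhdxfile" signature. */
static int toy_vhdx_probe(const uint8_t *buf, int buf_size)
{
    if (buf_size >= 8 && memcmp(buf, "vhdxfile", 8) == 0) {
        return 100;
    }
    return 0;
}

/* Hypothetical probe: matches the 4-byte 'Q' 'F' 'I' 0xfb magic only. */
static int toy_qcow2_probe(const uint8_t *buf, int buf_size)
{
    static const uint8_t magic[4] = { 'Q', 'F', 'I', 0xfb };

    if (buf_size >= 4 && memcmp(buf, magic, sizeof(magic)) == 0) {
        return 100;
    }
    return 0;
}

/* Hypothetical table entry pairing a format name with its probe. */
typedef struct ProbeEntry {
    const char *format_name;
    int (*probe)(const uint8_t *buf, int buf_size);
} ProbeEntry;

static const ProbeEntry toy_probes[] = {
    { "vhdx",  toy_vhdx_probe },
    { "qcow2", toy_qcow2_probe },
};

/* Returns the best-scoring format name, or NULL if every probe scored 0. */
static const char *toy_find_format(const uint8_t *buf, int buf_size)
{
    const char *best_name = NULL;
    int best_score = 0;
    size_t i;

    assert(buf != NULL && buf_size >= 0);
    for (i = 0; i < sizeof(toy_probes) / sizeof(toy_probes[0]); i++) {
        int score = toy_probes[i].probe(buf, buf_size);

        if (score > best_score) {
            best_score = score;
            best_name = toy_probes[i].format_name;
        }
    }
    return best_name;
}
```

A probe that is only partly sure could return a value below 100, letting a more specific driver outrank it.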
                 

The BlockDriver Open Flow

(Simplified view, as relevant to an Image Format Driver)

Let's Jump In

Implementing a (very basic!) Image Format Driver

Creating a new format - the "silly" format

At the minimum, we need:

  • block_init(), for our driver to be seen
  • To define our BlockDriver, and register it via bdrv_register()
  • A format name (ours is "silly")
  • In our BlockDriver, we must at least have a .bdrv_open() implementation





Let's see this as code.


#include "block/block_int.h"

/* Our open function */
static int silly_open(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
{
    return 0;
}

/* Our attributes / functions */
static BlockDriver bdrv_silly = {
    .format_name    = "silly",
    .bdrv_open      = silly_open
};

/* register our BlockDrivers */
static void bdrv_silly_init(void)
{
    bdrv_register(&bdrv_silly);
}

/* macro magic, creates an init function */
block_init(bdrv_silly_init);
                 

Your Driver's Stateful Data

Most Block Drivers require state data, populated on open.

  • Provided via BlockDriverState .opaque field
  • Size is specified in BlockDriver struct, .instance_size
  • Not allocated by your driver, but by the block layer


The BlockDriverState struct is passed to most functions. The "opaque" field is for your driver's internal data.
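
The division of labour described above can be sketched with stand-in types. This is a mock, not QEMU code: `ToyState`, `ToyBlockDriver`, `SillyState`, `toy_layer_open`, and `toy_layer_close` are hypothetical. The point is that the generic layer allocates `instance_size` zeroed bytes and hands them to the driver through `opaque`; the driver only fills them in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct ToyState {
    void *opaque;            /* driver-private state, owned by the generic layer */
} ToyState;

/* Hypothetical stand-in for BlockDriver. */
typedef struct ToyBlockDriver {
    const char *format_name;
    size_t instance_size;    /* bytes of private state to allocate */
    int (*open)(ToyState *bs);
} ToyBlockDriver;

/* Private state of the "silly" driver (fields are made up). */
typedef struct SillyState {
    uint32_t version;
    uint32_t cluster_size;
} SillyState;

static int silly_open(ToyState *bs)
{
    SillyState *s = bs->opaque;   /* already allocated and zeroed for us */

    s->version = 1;
    s->cluster_size = 65536;
    return 0;
}

static const ToyBlockDriver toy_silly = {
    .format_name   = "silly",
    .instance_size = sizeof(SillyState),
    .open          = silly_open,
};

/* What the generic layer does around the driver's open callback. */
static int toy_layer_open(ToyState *bs, const ToyBlockDriver *drv)
{
    int ret;

    assert(bs != NULL && drv != NULL && drv->open != NULL);
    bs->opaque = calloc(1, drv->instance_size);
    if (bs->opaque == NULL) {
        return -1;
    }
    ret = drv->open(bs);
    if (ret < 0) {
        free(bs->opaque);
        bs->opaque = NULL;
    }
    return ret;
}

/* The generic layer also releases the private state on close. */
static void toy_layer_close(ToyState *bs)
{
    free(bs->opaque);
    bs->opaque = NULL;
}
```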

What your open must provide

At a minimum, your .bdrv_open must:

  • Verify the integrity of the image, if relevant
  • Populate your internal state data ("opaque")
  • Total sectors, if relevant for your driver
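
As a rough sketch of those three duties, the function below validates a header, fills in driver state, and derives the sector count. The on-disk layout is invented for illustration (8-byte magic "sillyimg", 4-byte big-endian version, 8-byte big-endian size in bytes), and `SillyInfo`, `read_be`, and `silly_parse_header` are hypothetical names. A real driver would read the header from the underlying file inside its .bdrv_open.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SILLY_HEADER_SIZE 20    /* 8 (magic) + 4 (version) + 8 (size) */
#define SILLY_SECTOR_SIZE 512

/* State a driver would keep in its opaque area (made-up fields). */
typedef struct SillyInfo {
    uint32_t version;
    int64_t total_sectors;
} SillyInfo;

/* Decode an nbytes-wide big-endian integer, independent of host order. */
static uint64_t read_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < nbytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns 0 and fills *info on success, a negative errno value otherwise. */
static int silly_parse_header(const uint8_t *buf, size_t len, SillyInfo *info)
{
    uint32_t version;
    uint64_t size;

    assert(buf != NULL && info != NULL);
    if (len < SILLY_HEADER_SIZE) {
        return -EINVAL;                 /* truncated image */
    }
    if (memcmp(buf, "sillyimg", 8) != 0) {
        return -EINVAL;                 /* wrong format */
    }
    version = (uint32_t)read_be(buf + 8, 4);
    if (version != 1) {
        return -ENOTSUP;                /* unknown version */
    }
    size = read_be(buf + 12, 8);
    if (size == 0 || size % SILLY_SECTOR_SIZE != 0 ||
        size > (uint64_t)INT64_MAX) {
        return -EINVAL;                 /* implausible virtual size */
    }
    info->version = version;
    info->total_sectors = (int64_t)(size / SILLY_SECTOR_SIZE);
    return 0;
}
```

The output structure is written only after every check has passed, so a rejected image leaves no half-initialized state behind.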

Reopen functionality

Reopen allows QEMU to open an image chain with different flags

  • Live snapshots
  • Block commit
  • etc.


Reopen has three hooks:

  • Prepare (.bdrv_reopen_prepare)
  • Commit (.bdrv_reopen_commit)
  • Abort (.bdrv_reopen_abort)

Reopen functionality


.bdrv_reopen_prepare(), required if reopen supported

  • Returns 0 for success - stub OK
  • Protocols will likely have more to do


Depending on what your driver needs to support reopen, you may also need to provide the commit and abort functions, but these are optional
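
The prepare / commit / abort split behaves like a small transaction across the image chain. The sketch below shows the general pattern only; `ToyImage`, the `toy_reopen_*` functions, and the `fail_prepare` knob are hypothetical and do not mirror QEMU's signatures. Every image is prepared first; if any prepare fails, the ones already prepared are aborted, and only when all succeed is anything committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical image in a chain. */
typedef struct ToyImage {
    int flags;          /* flags currently in effect */
    int pending_flags;  /* staged by prepare, not yet visible */
    int fail_prepare;   /* test knob: force prepare to fail */
    int aborted;        /* counts abort calls, for illustration */
} ToyImage;

/* Prepare: stage the change, but do not apply it yet. */
static int toy_reopen_prepare(ToyImage *img, int new_flags)
{
    if (img->fail_prepare) {
        return -1;
    }
    img->pending_flags = new_flags;
    return 0;
}

/* Commit: make the staged change visible. Must not fail. */
static void toy_reopen_commit(ToyImage *img)
{
    img->flags = img->pending_flags;
}

/* Abort: discard whatever prepare staged. */
static void toy_reopen_abort(ToyImage *img)
{
    img->pending_flags = img->flags;
    img->aborted++;
}

/* Reopen a whole chain: all images change, or none do. */
static int toy_reopen_chain(ToyImage *chain, size_t n, int new_flags)
{
    size_t i, prepared = 0;

    assert(chain != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (toy_reopen_prepare(&chain[i], new_flags) < 0) {
            break;
        }
        prepared++;
    }
    if (prepared < n) {
        for (i = 0; i < prepared; i++) {
            toy_reopen_abort(&chain[i]);
        }
        return -1;
    }
    for (i = 0; i < n; i++) {
        toy_reopen_commit(&chain[i]);
    }
    return 0;
}
```

This is why a stub prepare that returns 0 can be enough for a format driver with nothing to stage, while commit and abort only matter once prepare holds resources.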

Coroutines, and Read / Write

Block driver read / write operations all make use of coroutines

What are coroutines?

Coroutines allow asynchronous code to masquerade as synchronous code


(Despite sounding complex, this makes life easier for you!)


Control stays with your coroutine until you either yield, or return

How does my driver use coroutines?

  • Take CoMutex when needed (qemu_co_mutex_lock)
  • Yield if needed (qemu_coroutine_yield())
  • Everything uses coroutines
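
To make "control stays with your coroutine until you yield or return" concrete, here is a toy built on POSIX `ucontext`. It does not use QEMU's coroutine API: `toy_yield`, `toy_enter`, `toy_read_request`, and `toy_run` are hypothetical, and the sketch assumes a platform that provides `<ucontext.h>` (for example Linux with glibc). The request function reads top to bottom like synchronous code, yet the caller gets control back in the middle of it.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define TOY_STACK_SIZE (64 * 1024)

static ucontext_t caller_ctx, co_ctx;
static char co_stack[TOY_STACK_SIZE];

/* Records the order in which the steps actually ran. */
static int trace[8];
static int trace_len;

static void record(int step)
{
    assert(trace_len < 8);
    trace[trace_len++] = step;
}

/* Give control back to whoever entered the coroutine. */
static void toy_yield(void)
{
    swapcontext(&co_ctx, &caller_ctx);
}

/* Enter (or re-enter) the coroutine until it yields or returns. */
static void toy_enter(void)
{
    swapcontext(&caller_ctx, &co_ctx);
}

/* Looks synchronous, but pauses in the middle. */
static void toy_read_request(void)
{
    record(1);      /* submit the I/O */
    toy_yield();    /* wait for completion without blocking the caller */
    record(3);      /* completion arrived: process the data */
}

/* Drives the coroutine; returns the number of recorded steps. */
static int toy_run(void)
{
    trace_len = 0;
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &caller_ctx;   /* where to go when the coroutine returns */
    makecontext(&co_ctx, toy_read_request, 0);

    toy_enter();    /* runs the coroutine up to its yield */
    record(2);      /* caller is free to do other work here */
    toy_enter();    /* resumes after the yield, runs to return */
    record(4);
    return trace_len;
}
```

After `toy_run()`, `trace` holds the steps in the order 1, 2, 3, 4: the caller's work is interleaved with the request even though the request was written as straight-line code.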

The BlockDriver Read Flow

Simplified read flow, for an Image Format driver using I/O vectors

The BlockDriver Write Flow

Simplified write flow, for an Image Format driver using I/O vectors

Metadata Caching

Block Drivers will often cache image format metadata



This can cause problems

During Live Migration, the image format metadata may differ from your cache.

Metadata Cache Flushing

You can either invalidate the metadata when requested, or prevent migration


  • .bdrv_invalidate_cache()
  • migrate_add_blocker()

BlockDriver Image File Creation




  • Implement .bdrv_create()
  • Define QEMUOptionParameter
    • Size: OPT_SIZE
    • String: OPT_STRING
    • Flag: OPT_FLAG
    • Number: OPT_NUMBER
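
A creation-options list is essentially a table of named, typed entries ended by a sentinel. The sketch below mimics that shape with stand-in types so it compiles on its own; `ToyOptType`, `ToyOption`, `silly_create_options`, `toy_find_option`, and the option names are all hypothetical and are not the real QEMUOptionParameter definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the four option kinds listed above. */
typedef enum ToyOptType {
    TOY_OPT_FLAG,
    TOY_OPT_NUMBER,
    TOY_OPT_SIZE,
    TOY_OPT_STRING
} ToyOptType;

typedef struct ToyOption {
    const char *name;
    ToyOptType type;
    const char *help;
} ToyOption;

/* Options a made-up "silly" format might accept at creation time. */
static const ToyOption silly_create_options[] = {
    { "size",         TOY_OPT_SIZE,   "Virtual disk size" },
    { "backing_file", TOY_OPT_STRING, "File name of a base image" },
    { "table_count",  TOY_OPT_NUMBER, "Number of tables (made-up option)" },
    { "prealloc",     TOY_OPT_FLAG,   "Preallocate data (made-up option)" },
    { NULL, TOY_OPT_FLAG, NULL }      /* sentinel: name == NULL ends the list */
};

/* Look an option up by name; returns NULL if it is not in the list. */
static const ToyOption *toy_find_option(const ToyOption *list, const char *name)
{
    assert(list != NULL && name != NULL);
    for (; list->name != NULL; list++) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}
```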

Sample BlockDriver Struct


static BlockDriver bdrv_qcow2 = {
    .format_name          = "qcow2",
    .instance_size        = sizeof(BDRVQcowState),
    .bdrv_probe           = qcow2_probe,
    .bdrv_open            = qcow2_open,
    .bdrv_close           = qcow2_close,
    .bdrv_reopen_prepare  = qcow2_reopen_prepare,
    .bdrv_create          = qcow2_create,

/* snipped */

    .bdrv_co_readv          = qcow2_co_readv,
    .bdrv_co_writev         = qcow2_co_writev,
    .bdrv_co_flush_to_os    = qcow2_co_flush_to_os,

/* more snipped */
    .create_options = qcow2_create_options,
};  
                 

Data Handling

If you are reading or writing data structures from / to disk, please:

  • Don't trust it
  • Do pack it
  • Do convert endianness

Don't Trust That Data

Unvalidated data can be a security risk:

  • We don't want to allocate more memory than expected (DoS)
  • We don't want to read / write past buffers (overflows)
  • We don't want to crash QEMU
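
A minimal sketch of what such validation can look like for one common case, a table whose offset and entry count come from the image header. Everything here is an assumption for illustration: `toy_check_table`, the 8-byte entry size, and the `TOY_MAX_TABLE_ENTRIES` limit are hypothetical. The checks bound the allocation size and make sure the table lies inside the file, without letting the arithmetic itself overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TABLE_ENTRIES (1u << 20)   /* hypothetical sanity limit */

/*
 * Validate header-supplied table geometry before allocating or reading.
 * On success returns 0 and stores the table size in *alloc_bytes.
 */
static int toy_check_table(uint64_t file_size, uint64_t table_offset,
                           uint32_t entries, size_t *alloc_bytes)
{
    uint64_t bytes;

    assert(alloc_bytes != NULL);

    /* Bound the count first, so the allocation cannot be made huge. */
    if (entries == 0 || entries > TOY_MAX_TABLE_ENTRIES) {
        return -EINVAL;
    }

    /* Cannot overflow: entries <= 2^20, so bytes <= 8 MiB. */
    bytes = (uint64_t)entries * sizeof(uint64_t);

    /* Written as a subtraction so offset + bytes never wraps around. */
    if (table_offset > file_size || bytes > file_size - table_offset) {
        return -EINVAL;
    }

    *alloc_bytes = (size_t)bytes;
    return 0;
}
```

Returning an error for a malformed image, rather than asserting, keeps a hostile file from taking QEMU down.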

Packing and Endian conversion

Make sure that your driver will work on all platforms

  • Use QEMU_PACKED on structures read/written from disk
  • Convert endianness for on-disk fields


If you don't do this, your driver may work fine on your system.

  • But maybe not other systems
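
Both points can be seen in a small stand-alone example. It assumes GCC or Clang for the `packed` attribute; `TOY_PACKED` is a local stand-in for QEMU_PACKED, and `ToyHeader`, `toy_be32_to_cpu`, `toy_be64_to_cpu`, and `toy_parse_header` are hypothetical. Without packing, the compiler would normally pad this struct (typically to 16 bytes), so it would no longer match the 13 bytes on disk. Without the conversion, the fields would only come out right on a big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for QEMU_PACKED (assumes GCC or Clang). */
#define TOY_PACKED __attribute__((packed))

/* Made-up on-disk header: 1 + 4 + 8 = 13 bytes, fields big-endian. */
typedef struct TOY_PACKED ToyHeader {
    uint8_t  type;
    uint32_t magic;
    uint64_t size;
} ToyHeader;

/* Interpret the in-memory bytes of raw as big-endian, on any host. */
static uint32_t toy_be32_to_cpu(uint32_t raw)
{
    uint8_t b[4];

    memcpy(b, &raw, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static uint64_t toy_be64_to_cpu(uint64_t raw)
{
    uint8_t b[8];
    uint64_t v = 0;
    int i;

    memcpy(b, &raw, sizeof(b));
    for (i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Copy the raw bytes into the packed struct, then fix up each field. */
static void toy_parse_header(const uint8_t *disk_bytes, ToyHeader *out)
{
    assert(disk_bytes != NULL && out != NULL);
    memcpy(out, disk_bytes, sizeof(*out));
    out->magic = toy_be32_to_cpu(out->magic);
    out->size = toy_be64_to_cpu(out->size);
}
```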

Backing Files

  • Populate bs->backing_file
  • Implement .bdrv_co_get_block_status
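
Why allocation status matters for backing files can be shown with a toy two-layer chain. This is a conceptual model only: `ToyLayer`, `toy_read_cluster`, and the one-byte "clusters" are hypothetical, and the real callback has a different signature. A read is served by the topmost layer that has the cluster allocated; if no layer has it, the data reads as zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLUSTERS 4

/* One image in a backing chain; each "cluster" is a single byte here. */
typedef struct ToyLayer {
    bool allocated[TOY_CLUSTERS];     /* does this layer hold the cluster? */
    uint8_t data[TOY_CLUSTERS];
    const struct ToyLayer *backing;   /* next image down, or NULL */
} ToyLayer;

/* Walk down the chain until some layer has the cluster allocated. */
static uint8_t toy_read_cluster(const ToyLayer *top, size_t cluster)
{
    const ToyLayer *layer;

    assert(top != NULL && cluster < TOY_CLUSTERS);
    for (layer = top; layer != NULL; layer = layer->backing) {
        if (layer->allocated[cluster]) {
            return layer->data[cluster];
        }
    }
    return 0;   /* unallocated in every layer: reads as zeroes */
}
```

With a base image holding clusters 0 and 1 and an overlay holding only cluster 1, cluster 0 comes from the base, cluster 1 from the overlay, and the rest read as zero.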

Testing

  • Use qemu-iotests
    • Create test in tests/qemu-iotests
    • Bash or Python
    • Put test in 0##, and expected output in 0##.out

  • Sample Images
    • For a compatibility driver, test against source

There is much more!

Look at BlockDriver struct in include/block/block_int.h

THE END

Jeff Cody | Red Hat, Inc.