Part 3: Putting the F in FFI

10 minute read


If there’s one thing the VA-API documentation doesn’t convey too well, it’s how absolutely massive the API surface is. The main decode loop example makes it look simple and straightforward (“configure, upload, run”), but a lot of complexity is hiding in the configure and upload phases.

Wrapping all of it is going to be a large project, so some divide-and-conquer will be required. It would be nice to immediately jump into decoding AV1, but as of writing this I am unfortunately unable to jump, so we’ll have to take this one step at a time.

The previous paragraph should give you an idea of how much I’ve procrastinated while writing this post.

Building the staircase to Greatness

As we’ve discussed earlier, VA-API and its modules can be split into two large groups: the core functions (further separable into codecs) and the platform-specific glue libraries. As we will discover, it is also capable of much more than video coding, but for now the first priority will be simple media decoding.

All my computers run Wayland now, so the first platform to implement should be either X11 through XWayland or native Wayland. Configuring VA-API to call directly into the Direct Rendering Manager, skipping the desktop environment, would also be possible. However, I would really prefer being able to draw to a window with as little friction as possible, so DRM wouldn’t make a good first platform.

The platform libraries don’t contain much exposed functionality, so let’s scan through the capabilities of both.

X marks the way(land)

Starting with the more broadly available X11 platform, the header file is at va/va_x11.h:

/*
 * Returns a suitable VADisplay for VA API
 */
VADisplay vaGetDisplay(
    Display *dpy
);

/*
 * Output rendering
 * Following is the rendering interface for X windows,
 * to get the decode output surface to a X drawable
 * It basically performs a de-interlacing (if needed),
 * color space conversion and scaling to the destination
 * rectangle
 */
VAStatus vaPutSurface(
    VADisplay dpy,
    VASurfaceID surface,
    Drawable draw, /* X Drawable */
    short srcx,
    short srcy,
    unsigned short srcw,
    unsigned short srch,
    short destx,
    short desty,
    unsigned short destw,
    unsigned short desth,
    VARectangle *cliprects, /* client supplied destination clip list */
    unsigned int number_cliprects, /* number of clip rects in the clip list */
    unsigned int flags /* PutSurface flags */
);

Nice and simple. A way to convert an X11 display handle into a VA-API-branded display handle, and a monster function to draw a surface to something drawable with a load of additional clipping and mapping capabilities.

For Wayland, the story is a little different, though much better documented:

/**
 * \defgroup api_wayland Wayland rendering API
 *
 * @{
 *
 * Theory of operations:
 * - Create a VA display for an active Wayland display ;
 * - Perform normal VA-API operations, e.g. decode to a VA surface ;
 * - Get wl_buffer associated to the VA surface ;
 * - Attach wl_buffer to wl_surface ;
 */

/**
 * \brief Returns a VA display wrapping the specified Wayland display.
 *
 * This functions returns a (possibly cached) VA display from the
 * specified Wayland @display.
 *
 * @param[in]   display         the native Wayland display
 * @return the VA display
 */
VADisplay
vaGetDisplayWl(struct wl_display *display);

/**
 * \brief Returns the Wayland buffer associated with a VA surface.
 *
 * This function returns a wl_buffer handle that can be used as an
 * argument to wl_surface_attach(). This buffer references the
 * underlying VA @surface. As such, the VA @surface and Wayland
 * @out_buffer have the same size and color format. Should specific
 * color conversion be needed, then VA/VPP API can fulfill this
 * purpose.
 *
 * The @flags describe the desired picture structure. This is useful
 * to expose a de-interlaced buffer. If the VA driver does not support
 * any of the supplied flags, then #VA_STATUS_ERROR_FLAG_NOT_SUPPORTED
 * is returned. The following flags are allowed: \c VA_FRAME_PICTURE,
 * \c VA_TOP_FIELD, \c VA_BOTTOM_FIELD.
 *
 * @param[in]   dpy         the VA display
 * @param[in]   surface     the VA surface
 * @param[in]   flags       the deinterlacing flags
 * @param[out]  out_buffer  a wl_buffer wrapping the VA @surface
 * @return VA_STATUS_SUCCESS if successful
 */
VAStatus
vaGetSurfaceBufferWl(
    VADisplay           dpy,
    VASurfaceID         surface,
    unsigned int        flags,
    struct wl_buffer  **out_buffer
);

/**
 * \brief Returns the Wayland buffer associated with a VA image.
 *
 * This function returns a wl_buffer handle that can be used as an
 * argument to wl_surface_attach(). This buffer references the
 * underlying VA @image. As such, the VA @image and Wayland
 * @out_buffer have the same size and color format. Should specific
 * color conversion be needed, then VA/VPP API can fulfill this
 * purpose.
 *
 * The @flags describe the desired picture structure. See
 * vaGetSurfaceBufferWl() description for more details.
 *
 * @param[in]   dpy         the VA display
 * @param[in]   image       the VA image
 * @param[in]   flags       the deinterlacing flags
 * @param[out]  out_buffer  a wl_buffer wrapping the VA @image
 * @return VA_STATUS_SUCCESS if successful
 */
VAStatus
vaGetImageBufferWl(
    VADisplay           dpy,
    VAImageID           image,
    unsigned int        flags,
    struct wl_buffer  **out_buffer
);

This platform doesn’t offer as many clipping and scaling capabilities at presentation time, but I imagine that is mostly because the Wayland protocol is much leaner and leaves that work to GPU programming APIs instead of cramming everything and the print spooler into itself. On the other hand, having to attach these buffers to the window manually means calling into Wayland functions directly, which might add to the complexity.

Another potential issue would be needing to match pixel formats manually, since both buffer functions note: “Should specific color conversion be needed, then VA/VPP API can fulfill this purpose.” So for simplicity’s sake, let’s stick to X11 and XWayland for initial development.

Initial Commit

I foresee this project growing quite large, with accompanying tools and multiple crates, so let’s create a workspace for them.

$ mkdir vaudeville
$ cd vaudeville
$ git init
Initialized empty Git repository in [...]/vaudeville/.git/

Welcome to the world, Vaudeville. Let’s define the workspace by creating a Cargo.toml file in the root directory:

[workspace]
resolver = "3"

[workspace.package]
version = "0.0.1"
authors = ["Karcsesz <git@karcsesz.hu>"]
description = "A VA-API library for Rust"
license = "MIT OR Apache-2.0"
repository = "https://code.thishorsie.rocks/Karcsesz/vaudeville"
readme = "README.md"
keywords = ["vaapi", "video", "api"]
categories = ["api-bindings", "encoding", "hardware-support", "multimedia"]
edition = "2024"

Resol-what?
Cargo currently has three slightly incompatible resolver behaviours, which decide what versions of dependencies get picked and how optional features are unified between separate imports of the same library. The resolver version is normally decided by the Rust edition of the root crate. A workspace manifest like ours has no edition of its own, though, so Cargo defaults to the backwards-compatible resolver = "1".

Since we’re used to how resolver version 3 (the default for the 2024 edition) works, this could bite us in the long run, so it is recommended to explicitly set the resolver version in workspaces.

Note that I’ve specified a lot of package-like values, but instead of putting them in the [package] section, I’ve put them in the [workspace.package] one. This will let the packages in the workspace share these values, pulling from a common source instead of having to duplicate version numbers and licence data for each of them.

If we now add the main crate:

$ cargo new vaudeville --lib
    Creating library `vaudeville` package
      Adding `vaudeville` as member of workspace at `[...]/vaudeville`
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

Then when we open the freshly created manifest:

[package]
name = "vaudeville"
version.workspace = true
authors.workspace = true
description.workspace = true
license.workspace = true
repository.workspace = true
readme.workspace = true
keywords.workspace = true
categories.workspace = true
edition.workspace = true

We’ll see that the fields are automatically set to inherit the workspace’s values.

Let’s bang out a temporary README to go along with it.

FFI Separation

I’d also like to separate the FFI bindings into their own crate. I’m going to call it vaudeville-ffi instead of the more idiomatic vaapi-sys, because I will likely tailor it to work with Vaudeville first and foremost, as opposed to the more generic “just bindgen the lib” others would expect from something called vaapi-sys. Also, I don’t feel confident taking such an important name in crates.io’s global namespace.

$ cargo new vaudeville-ffi --lib
    Creating library `vaudeville-ffi` package
      Adding `vaudeville-ffi` as member of workspace at `[...]/vaudeville`
note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

This latter one shouldn’t just inherit every package field from the workspace, so we’ll need to edit the Cargo.toml for it.

[package]
name = "vaudeville-ffi"
description = "VA-API FFI bindings for use with Vaudeville"
keywords = ["vaapi", "video", "api", "ffi"]
categories = ["external-ffi-bindings", "encoding", "hardware-support", "multimedia"]

version.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
readme.workspace = true
edition.workspace = true

[dependencies]

And now we can start by pasting the macro from the previous part in its rightful place at /vaudeville-ffi/src/macros/dyload.rs.

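macro_rules! macros can’t be marked pub like other items, so to make this one usable outside its own module we have to add #[macro_export]. This exports the macro from the crate root, where other modules can import it as crate::dyload.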

//! Macro for loading functions from a dynamic library

#[macro_export]
macro_rules! dyload {
//...
}
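For the macro to be reachable as crate::dyload, the files also have to be wired into the module tree. Here is a single-file sketch of that layout, assuming a lib.rs along these lines: the inline modules stand in for src/macros/dyload.rs and src/x11.rs, and the macro body is a toy placeholder, not the real dyload! from the previous part.

```rust
// Single-file sketch of the crate layout: the inline modules stand in for
// src/macros/dyload.rs and src/x11.rs, and the macro body is a toy.
mod macros {
    pub mod dyload {
        //! Macro for loading functions from a dynamic library

        // #[macro_export] places the macro at the crate root as
        // `crate::dyload`, even though this module itself is private.
        #[macro_export]
        macro_rules! dyload {
            (const LIB_NAME: &str = $lib_name:literal;) => {
                pub const LIB_NAME: &str = $lib_name;
            };
        }
    }
}

pub mod x11 {
    // Path-based import of the exported macro; no #[macro_use] required.
    use crate::dyload;

    // libva's X11 glue library ships as libva-x11.so.
    dyload! {
        const LIB_NAME: &str = "va-x11";
    }
}

fn main() {
    assert_eq!(x11::LIB_NAME, "va-x11");
}
```

The point of the sketch is that the macro’s definition site can stay tucked away in a private module while every other module imports it by path.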

X11 integration

With that done and dusted, we can finally test the macro by binding the X11 platform functions inside /vaudeville-ffi/src/x11.rs:

use std::ffi::{c_short, c_uint, c_ushort};
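use crate::dyload;
// Display, Drawable and the shared VA-API types (VADisplay, VASurfaceID,
// VARectangle, VAStatus) also need to be in scope; they are defined below.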

dyload! {
    const LIB_NAME: &str = "va-x11";
    pub struct X11;

    fn vaGetDisplay(
        display: Display
    ) -> VADisplay;

    fn vaPutSurface(
        display: VADisplay,
        surface: VASurfaceID,
        drawable: Drawable,
        srcx: c_short,
        srcy: c_short,
        srcw: c_ushort,
        srch: c_ushort,
        destx: c_short,
        desty: c_short,
        destw: c_ushort,
        desth: c_ushort,
        cliprects: *const VARectangle,
        number_cliprects: c_uint,
        flags: c_uint,
    ) -> VAStatus;
}

We can use the special typedefs from std::ffi to match most function parameters, but there are a few that we have to port over from C ourselves.

Display is going to be a special handle only used for this function, so I’ll just define it right in the same file. The C version is imported from X11/Xlib.h:

/*
 * Display datatype maintaining display specific data.
 * The contents of this structure are implementation dependent.
 * A Display should be treated as opaque by application code.
 */
#ifndef XLIB_ILLEGAL_ACCESS
typedef struct _XDisplay Display;
#endif

Should be treated as opaque by application code, fair enough. A pub type alias for *mut c_void should do for now. Drawable is similar: a simple typedef to XID, which is itself a typedef to unsigned long (c_ulong on the Rust side). The rest of the types are going to be used all over the codebase, which warrants better separation.
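As a sketch, the two X11 handle types could be declared like this. The separate XID alias is my own addition to mirror the C typedef chain; the main function only checks sizes and shows how little protection a plain alias gives.

```rust
use std::ffi::{c_ulong, c_void};

/// Xlib's `Display` is opaque to applications, so we only ever hold a pointer.
pub type Display = *mut c_void;

/// Client-side Xlib defines `XID` as `unsigned long`...
pub type XID = c_ulong;

/// ...and `Drawable` (a window or pixmap) is just an `XID`.
pub type Drawable = XID;

fn main() {
    // Plain aliases add no type safety: any *mut c_void is accepted as a Display.
    let display: Display = std::ptr::null_mut();
    assert!(display.is_null());

    // Drawable has exactly the width of a C unsigned long on this target.
    assert_eq!(std::mem::size_of::<Drawable>(), std::mem::size_of::<c_ulong>());
}
```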

VADisplay is another simple case, typedef to void*. Now we could also just define this as a *mut c_void too, but that would let users accidentally pass all sorts of funky types also resolving to just *mut c_void without any resistance from the compiler. But if we wrap the pointer in a #[repr(transparent)] struct, it will pass through FFI barriers just the same, while telling the compiler that it is its own distinct type which should be respected as such.

I’ll also toss in a NonNull wrapper, because we don’t want to accidentally mistake a NULL returned by vaGetDisplay for a valid VADisplay.

use std::ffi::c_void;
use std::ptr::NonNull;

#[repr(transparent)]
pub struct VADisplay(NonNull<c_void>);

This will require a NullableVADisplay to be defined too, to be returned by vaGetDisplay. Here we can make use of a guaranteed optimisation: Option<NonNull<T>>, and likewise an Option of a #[repr(transparent)] struct wrapping one, has the same size and ABI as a raw pointer, with NULL representing None. So we just need to write:

//...

pub type NullableVADisplay = Option<VADisplay>;
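To convince ourselves the guarantee holds, here is a small standalone check, assuming the VADisplay definition above. It compares sizes, and uses the transmute of an all-zero bit pattern into None, which the std::option documentation lists as sound for NonNull and transparent wrappers around it.

```rust
use std::ffi::c_void;
use std::mem::{size_of, transmute};
use std::ptr::NonNull;

/// Same definition as above: a non-null, distinctly typed `void*`.
#[allow(dead_code)] // the pointer is only ever handed back to C
#[repr(transparent)]
pub struct VADisplay(NonNull<c_void>);

/// What `vaGetDisplay` can actually hand back: possibly NULL.
pub type NullableVADisplay = Option<VADisplay>;

const PTR_SIZE: usize = size_of::<*mut c_void>();

fn main() {
    // The null-pointer optimisation keeps the Option exactly pointer-sized.
    assert_eq!(size_of::<NullableVADisplay>(), PTR_SIZE);

    // An all-zero bit pattern (i.e. NULL) is guaranteed to be `None`.
    let null: NullableVADisplay =
        unsafe { transmute::<[u8; PTR_SIZE], NullableVADisplay>([0u8; PTR_SIZE]) };
    assert!(null.is_none());

    // Going the safe way round, a non-null pointer becomes `Some`.
    let mut byte = 0u8;
    let raw = (&mut byte as *mut u8).cast::<c_void>();
    let display: NullableVADisplay = NonNull::new(raw).map(VADisplay);
    assert!(display.is_some());
}
```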

Then we can specify that vaGetDisplay can return NULL:

fn vaGetDisplay(
    display: Display
) -> NullableVADisplay;

VASurfaceID and its friends are typedefs to VAGenericID, which is a typedef to unsigned int. Going to utilise the same newtype trick to separate the two, but with some From implementations to allow opt-in conversion between the base variant and the specialised IDs.

use std::ffi::c_uint;

#[repr(transparent)]
pub struct VAGenericID(c_uint);

impl From<VAGenericID> for c_uint {
    fn from(id: VAGenericID) -> Self {
        id.0
    }
}

#[repr(transparent)]
pub struct VADisplayID(VAGenericID);

impl From<VADisplayID> for VAGenericID {
    fn from(va_display: VADisplayID) -> Self {
        va_display.0
    }
}

#[repr(transparent)]
pub struct VAConfigID(VAGenericID);

//...

#[repr(transparent)]
pub struct VAContextID(VAGenericID);

//...

#[repr(transparent)]
pub struct VASurfaceID(VAGenericID);

//...

#[repr(transparent)]
pub struct VABufferID(VAGenericID);

//...

#[repr(transparent)]
pub struct VAImageID(VAGenericID);

//...
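One way the //... parts might be filled in without repeating the same boilerplate for every ID is a small helper macro. The sketch below uses a hypothetical specialised_id! macro, and unlike the excerpt above it also assumes a From<VAGenericID> conversion in the other direction plus Copy and friends, since IDs are passed around by value.

```rust
use std::ffi::c_uint;
use std::mem::size_of;

#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct VAGenericID(c_uint);

impl From<VAGenericID> for c_uint {
    fn from(id: VAGenericID) -> Self {
        id.0
    }
}

// Hypothetical helper: stamps out a specialised ID newtype together with
// conversions to and from the generic ID.
macro_rules! specialised_id {
    ($($name:ident),+ $(,)?) => {$(
        #[repr(transparent)]
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        pub struct $name(VAGenericID);

        impl From<$name> for VAGenericID {
            fn from(id: $name) -> Self { id.0 }
        }

        impl From<VAGenericID> for $name {
            fn from(id: VAGenericID) -> Self { $name(id) }
        }
    )+};
}

specialised_id!(VAConfigID, VAContextID, VASurfaceID, VABufferID, VAImageID);

fn main() {
    // repr(transparent) all the way down: still just an unsigned int.
    assert_eq!(size_of::<VASurfaceID>(), size_of::<c_uint>());

    let surface: VASurfaceID = VAGenericID(42).into();
    // There is no direct VASurfaceID -> VABufferID conversion; going through
    // VAGenericID keeps every reinterpretation explicit.
    let raw: c_uint = VAGenericID::from(surface).into();
    assert_eq!(raw, 42);
}
```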

VARectangle is our first compound type:

/** \brief Structure to describe rectangle. */
typedef struct _VARectangle {
    int16_t x;
    int16_t y;
    uint16_t width;
    uint16_t height;
} VARectangle;

In Rust, you can mark a struct as #[repr(C)], and it will lay out its fields in declaration order, with the same alignment and padding a C compiler for the target platform would use.

#[repr(C)]
pub struct VARectangle {
    // Fields are pub so callers can build clip lists themselves.
    pub x: i16,
    pub y: i16,
    pub width: u16,
    pub height: u16,
}
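A quick way to double-check that the Rust definition matches the C one is to assert on its layout. std::mem::offset_of! (stable since Rust 1.77) gives us the field offsets; the expected values assume the usual 2-byte alignment of int16_t and uint16_t. The last part shows how a clip list would be split into the cliprects pointer and number_cliprects count that vaPutSurface expects.

```rust
use std::ffi::c_uint;
use std::mem::{align_of, offset_of, size_of};

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VARectangle {
    pub x: i16,
    pub y: i16,
    pub width: u16,
    pub height: u16,
}

fn main() {
    // Four 2-byte fields in declaration order, so no padding is needed anywhere.
    assert_eq!(size_of::<VARectangle>(), 8);
    assert_eq!(align_of::<VARectangle>(), 2);
    assert_eq!(offset_of!(VARectangle, x), 0);
    assert_eq!(offset_of!(VARectangle, y), 2);
    assert_eq!(offset_of!(VARectangle, width), 4);
    assert_eq!(offset_of!(VARectangle, height), 6);

    // vaPutSurface takes the clip list as a pointer plus a count.
    let clips = [VARectangle { x: 0, y: 0, width: 640, height: 480 }];
    let cliprects: *const VARectangle = clips.as_ptr();
    let number_cliprects = clips.len() as c_uint;
    assert!(!cliprects.is_null());
    assert_eq!(number_cliprects, 1);
}
```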

And finally we have VAStatus which is…

typedef int VAStatus;   /** Return status type from functions */
/** Values for the return status */
#define VA_STATUS_SUCCESS           0x00000000
#define VA_STATUS_ERROR_OPERATION_FAILED    0x00000001
#define VA_STATUS_ERROR_ALLOCATION_FAILED   0x00000002
#define VA_STATUS_ERROR_INVALID_DISPLAY     0x00000003
#define VA_STATUS_ERROR_INVALID_CONFIG      0x00000004
#define VA_STATUS_ERROR_INVALID_CONTEXT     0x00000005
#define VA_STATUS_ERROR_INVALID_SURFACE     0x00000006
#define VA_STATUS_ERROR_INVALID_BUFFER      0x00000007
#define VA_STATUS_ERROR_INVALID_IMAGE       0x00000008
#define VA_STATUS_ERROR_INVALID_SUBPICTURE  0x00000009
#define VA_STATUS_ERROR_ATTR_NOT_SUPPORTED  0x0000000a
#define VA_STATUS_ERROR_MAX_NUM_EXCEEDED    0x0000000b
#define VA_STATUS_ERROR_UNSUPPORTED_PROFILE 0x0000000c
#define VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT  0x0000000d
#define VA_STATUS_ERROR_UNSUPPORTED_RT_FORMAT   0x0000000e
#define VA_STATUS_ERROR_UNSUPPORTED_BUFFERTYPE  0x0000000f
#define VA_STATUS_ERROR_SURFACE_BUSY        0x00000010
#define VA_STATUS_ERROR_FLAG_NOT_SUPPORTED      0x00000011
#define VA_STATUS_ERROR_INVALID_PARAMETER   0x00000012
#define VA_STATUS_ERROR_RESOLUTION_NOT_SUPPORTED 0x00000013
#define VA_STATUS_ERROR_UNIMPLEMENTED           0x00000014
#define VA_STATUS_ERROR_SURFACE_IN_DISPLAYING   0x00000015
#define VA_STATUS_ERROR_INVALID_IMAGE_FORMAT    0x00000016
#define VA_STATUS_ERROR_DECODING_ERROR          0x00000017
#define VA_STATUS_ERROR_ENCODING_ERROR          0x00000018
/**
 * \brief An invalid/unsupported value was supplied.
 *
 * This is a catch-all error code for invalid or unsupported values.
 * e.g. value exceeding the valid range, invalid type in the context
 * of generic attribute values.
 */
#define VA_STATUS_ERROR_INVALID_VALUE           0x00000019
/** \brief An unsupported filter was supplied. */
#define VA_STATUS_ERROR_UNSUPPORTED_FILTER      0x00000020
/** \brief An invalid filter chain was supplied. */
#define VA_STATUS_ERROR_INVALID_FILTER_CHAIN    0x00000021
/** \brief Indicate HW busy (e.g. run multiple encoding simultaneously). */
#define VA_STATUS_ERROR_HW_BUSY                 0x00000022
/** \brief An unsupported memory type was supplied. */
#define VA_STATUS_ERROR_UNSUPPORTED_MEMORY_TYPE 0x00000024
/** \brief Indicate allocated buffer size is not enough for input or output. */
#define VA_STATUS_ERROR_NOT_ENOUGH_BUFFER       0x00000025
/** \brief Indicate an operation isn't completed because time-out interval elapsed. */
#define VA_STATUS_ERROR_TIMEDOUT                0x00000026
#define VA_STATUS_ERROR_UNKNOWN                 0xFFFFFFFF

…something I’m just going to use an enum for right now.

#[repr(i32)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum VAStatus {
    Success = 0x00000000,
    OperationFailed = 0x00000001,
    AllocationFailed = 0x00000002,
    InvalidDisplay = 0x00000003,
    InvalidConfig = 0x00000004,
    InvalidContext = 0x00000005,
    InvalidSurface = 0x00000006,
    InvalidBuffer = 0x00000007,
    InvalidImage = 0x00000008,
    InvalidSubpicture = 0x00000009,
    AttrNotSupported = 0x0000000A,
    MaxNumExceeded = 0x0000000B,
    UnsupportedProfile = 0x0000000C,
    UnsupportedEntrypoint = 0x0000000D,
    UnsupportedRtFormat = 0x0000000E,
    UnsupportedBufferType = 0x0000000F,
    SurfaceBusy = 0x00000010,
    FlagNotSupported = 0x00000011,
    InvalidParameter = 0x00000012,
    ResolutionNotSupported = 0x00000013,
    Unimplemented = 0x00000014,
    SurfaceInDisplaying = 0x00000015,
    InvalidImageFormat = 0x00000016,
    DecodingError = 0x00000017,
    EncodingError = 0x00000018,
    /// An invalid/unsupported value was supplied
    ///
    /// This is a catch-all error code for invalid or unsupported values.
    /// e.g. value exceeding the valid range, invalid type in the context
    /// of generic attribute values.
    InvalidValue = 0x00000019,
    /// An unsupported filter was supplied.
    UnsupportedFilter = 0x00000020,
    /// An invalid filter chain was supplied.
    InvalidFilterChain = 0x00000021,
    /// Indicate HW busy (e.g. run multiple encoding simultaneously).
    HwBusy = 0x00000022,
    /// An unsupported memory type was supplied.
    UnsupportedMemoryType = 0x00000024,
    /// Indicate allocated buffer size is not enough for input or output
    NotEnoughBuffers = 0x00000025,
    /// Indicate an operation isn't completed because time-out interval elapsed.
    Timedout = 0x00000026,
    // VAStatus is an i32, so Rust refuses the literal 0xFFFFFFFF (the
    // maximum u32 value) instead of silently wrapping it like C does.
    // Casting with `as` reproduces C's behaviour and yields -1.
    Unknown = 0xFFFFFFFFu32 as i32,
}
Here be UB!
There is a very interesting bit of undefined behaviour that Rust kept around which we have to think about here: what if libva adds a new error variant on its side, and we don’t track it on the Rust side?

Rust defines that “an enum must have a valid discriminant […]”, as in, it is considered UB to have an enum which is set to a value not defined in its variant list. This allows the compiler to eliminate catch-all match arms it deems unneeded, which would break the following code:

let some_value: VAStatus = get_a_status();

match some_value {
    VAStatus::Success => {println!("Success!")}
    // Further branches matching every other value
    unknown => {println!("Unknown error type received: {:x}", unknown as u32)}
}

Since the compiler can see that every variant it knows about is already matched, it is free to treat the fallback arm as unreachable (rustc will even warn about it) and optimise it straight out. An unexpected value from libva then matches no arm at all; a match lowered to a jump table, for example, may jump somewhere arbitrary. If we return values from the arms, we can end up using a value that was never initialised. Rust’s safety guarantees collapse in on themselves like a house of cards.

Unfortunately, even something like #[non_exhaustive] isn’t enough for us: it only forces code in other crates to include a wildcard arm, and it does nothing to make unlisted discriminant values valid, so receiving one is still UB. We will eventually have to switch to a more robust system, likely involving macros and two separate types: one safe for Rust code, and one that can traverse the FFI boundary.
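A minimal sketch of what such a two-type split could look like, with hypothetical names and the enum trimmed to a handful of variants. The FFI side is a #[repr(transparent)] wrapper around c_int, of which every integer is a valid value, so the extern declarations would return RawVAStatus instead of the enum. Conversion into the Rust-side enum is an ordinary integer match whose wildcard arm is genuinely reachable.

```rust
use std::ffi::c_int;

/// FFI-side status (hypothetical name). Every `int` is a valid value of this
/// type, so an unfamiliar code from libva is just data, not UB.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RawVAStatus(pub c_int);

/// Rust-side status, trimmed to a few variants for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum VAStatus {
    Success,
    OperationFailed,
    AllocationFailed,
    InvalidDisplay,
    Unknown,
    /// A status code these bindings don't know about (yet).
    Unrecognised(c_int),
}

impl From<RawVAStatus> for VAStatus {
    fn from(raw: RawVAStatus) -> Self {
        // Matching on the integer keeps the wildcard arm genuinely reachable.
        match raw.0 {
            0x00 => VAStatus::Success,
            0x01 => VAStatus::OperationFailed,
            0x02 => VAStatus::AllocationFailed,
            0x03 => VAStatus::InvalidDisplay,
            // VA_STATUS_ERROR_UNKNOWN (0xFFFFFFFF) reads as -1 through a C int.
            -1 => VAStatus::Unknown,
            other => VAStatus::Unrecognised(other),
        }
    }
}

fn main() {
    assert_eq!(VAStatus::from(RawVAStatus(0)), VAStatus::Success);
    assert_eq!(VAStatus::from(RawVAStatus(-1)), VAStatus::Unknown);
    // A code missing from our list no longer causes UB; it is reported as-is.
    assert_eq!(VAStatus::from(RawVAStatus(0x1000)), VAStatus::Unrecognised(0x1000));
}
```

The Unrecognised(c_int) variant means the Rust enum can no longer be #[repr(i32)] and cast directly with as, which is the price of never trusting libva’s output.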

Workspaced dependencies

A quick compile check shows us that we’re missing a dependency. Since we’re in a workspace, I’m going to make use of its common dependency management capabilities. To do that, first we have to add a [workspace.dependencies] section to the workspace’s TOML file (remember how [workspace.package] worked?) and define the version of libloading that we would like to import.

# ...
edition = "2024"

[workspace.dependencies]
libloading = "0.8.6"

Followed by defining the fact that libloading is required in our FFI crate in its manifest.

# ...
edition.workspace = true

[dependencies]
libloading = {workspace = true}

Note how instead of specifying a version, we’ve written that it should pull the dependency from the workspace. Aaaand…

$ cargo check
warning: `vaudeville-ffi` (lib) generated 6 warnings
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.29s

Now our code compiles!

Warnings everywhere!

I hate seeing code warnings, so let’s clean them up next. Thankfully they’re all style warnings about how our functions and function types don’t follow Rust’s naming conventions (snake_case for functions and fields, CamelCase for types). We can’t really do much about that without breaking the library loading logic, since the identifiers have to match the C symbol names, so we’re just going to edit the macro to tell the compiler to ignore these lints.

Add a quick #[allow(non_camel_case_types)] to the repetition that creates the type $func aliases, and some #[allow(non_snake_case)] attributes to the helper method definitions and the structure declaration…
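For illustration, here is roughly the shape of the code the lints fire on and where each allow goes, hand-expanded for a single hypothetical function called vaDummy (the real expansion from the previous part may differ). A Rust extern "C" function stands in for the symbol that would normally be loaded from the shared library.

```rust
use std::ffi::c_int;

// The function pointer type shares the C symbol's name: non_camel_case_types.
#[allow(non_camel_case_types)]
type vaDummy = unsafe extern "C" fn(c_int) -> c_int;

// The struct field shares it too: non_snake_case.
#[allow(non_snake_case)]
struct Lib {
    vaDummy: vaDummy,
}

impl Lib {
    // And so does the safe-ish wrapper method: non_snake_case again.
    #[allow(non_snake_case)]
    unsafe fn vaDummy(&self, x: c_int) -> c_int {
        unsafe { (self.vaDummy)(x) }
    }
}

// Stand-in for a symbol that would normally come out of the .so file.
unsafe extern "C" fn fake_va_dummy(x: c_int) -> c_int {
    x + 1
}

fn main() {
    let lib = Lib { vaDummy: fake_va_dummy };
    assert_eq!(unsafe { lib.vaDummy(41) }, 42);
}
```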

$ cargo check
    Checking cfg-if v1.0.0
    Checking vaudeville v0.0.1
    Checking libloading v0.8.6
    Checking vaudeville-ffi v0.0.1
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.13s

And we’re clean!

More Macro: Attribute propagation

It would be very useful if we could declare attributes and docstrings on the functions defined through the dyload! macro and have them propagate to the generated function definitions. Thankfully, the /// docstring syntax is just sugar for a #[doc = "..."] attribute, so a single attribute matcher catches both quite easily.

//...
macro_rules! dyload {
    (
        const LIB_NAME: &str = $lib_name:literal;
        $(#[$struct_attr:meta])*
        pub struct $struct_name:ident;

        $(
            $(#[$attr:meta])*
            fn $func:ident( $( $param_name:ident : $param_type:ty ),* $(,)? ) $( -> $ret:ty )?;
        )+
    ) => {
//...

Then it’s just a matter of expanding the matches in the right places:

//...
        #[allow(non_snake_case)]
        $(#[$struct_attr])*
        pub struct $struct_name {
            $(
                $func: $func,
            )+
        }
//...
            $(
                #[allow(non_snake_case)]
                $(#[$attr])*
                pub unsafe fn $func( &self, $( $param_name : $param_type ), * ) $( -> $ret )? {
                    unsafe {
                        (self.$func)($($param_name),*)
                    }
                }
            )+
//...
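To see the attribute forwarding in isolation, here is a toy macro (hypothetical, far simpler than dyload!) using the same $(#[$attr:meta])* matcher. The cfg’d-out function doubles as a compile-time test: if its attribute weren’t forwarded, the undefined identifier in its body would break the build.

```rust
// Toy macro: every attribute in front of a `fn` is re-emitted on the output.
macro_rules! forward_attrs {
    ($( $(#[$attr:meta])* fn $name:ident() -> $ret:ty = $val:expr; )+) => {
        $(
            $(#[$attr])*
            pub fn $name() -> $ret { $val }
        )+
    };
}

forward_attrs! {
    /// Doc comments arrive as `#[doc = "..."]` and match `meta` too.
    #[must_use]
    fn answer() -> u32 = 42;

    // If this cfg were not forwarded, the body would fail to compile,
    // because `does_not_exist` is not defined anywhere.
    #[cfg(any())]
    fn removed() -> u32 = does_not_exist;
}

fn main() {
    assert_eq!(answer(), 42);
}
```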

After adding some documentation to the implemented functions, I think it’s time to wrap up for today. In the next part, we’re going to start implementing some core VA-API objects, and figure out a way to manage their lifetimes.