Rust
The task has been set, so let’s do some research.
Every good API needs documentation. For VA-API, it's located at https://intel.github.io/libva/. As of the writing of this article, the published version is 2.19.0. We can follow the lead to the source code of the reference implementation too.
I'm not the first person to attempt developing Rust bindings to this API: there's Ferrovanadium, advertising a "Blatantly unsound API" among other humorous features. As for other resources, ffmpeg obviously supports VA-API acceleration, so we'll also be able to study how it uses the API in case we get stuck. Finally, there's the official libva-utils, providing further examples.
Reading the documentation’s introduction, we can see the separate functions that VA-API provides.
We also get a guarantee that all functions are inherently thread-safe (though we should be careful about what order they execute in), as long as they’re not called from signal handlers. Not sure how we can forbid that with Rust’s type system, but we’ll figure something out. Worst case we’ll just replicate the warning.
Finally, we get a quick C example with a few comments:
// Initialization
dpy = vaGetDisplayDRM(fd);
vaInitialize(dpy, ...);

// Create surfaces required for decoding and subsequence encoding
vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, width, height, &surfaces[0], ...);

// Set up a queue for the surfaces shared between decode and encode threads
surface_queue = queue_create();

// Create decode_thread
pthread_create(&decode_thread, NULL, decode, ...);

// Create encode_thread
pthread_create(&encode_thread, NULL, encode, ...);

// Decode thread function
decode() { /* ... */ }

// Encode thread function
encode() { /* ... */ }
This last part looks interesting, so let’s go through it step by step:
First off, we have to initialize the library. To do that, we first have to acquire a so-called "display handle":
dpy = vaGetDisplayDRM(fd);
In this case, we're grabbing onto Linux's Direct Rendering Manager device (the good DRM) through a file descriptor. The Direct Rendering Manager (DRM) is a subsystem of the Linux kernel responsible for interfacing with the GPUs of modern video cards. We all know that in Unix, everything is a file. And for everything to be a universally usable file, there should be a unified and standardised way to make use of that file. That is what the Direct Rendering Manager subsystem is for. Pop open a terminal on a Linux machine and list /dev/dri.
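An illustrative listing (the card numbering differs between machines):

$ ls /dev/dri
by-path  card0  renderD128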
This specific computer has a single graphics card (built into the AMD APU), so there are two device files, and a by-path directory that shows the same two files, but as nodes on the PCI-e device tree. Each card gets a card[x] file and a renderD[x] file. The card file is for privileged access, meant for setting global preferences (like the current screen resolution), while the renderD128-and-up files are what user applications use to submit render tasks to the device. It is this latter kind that we'll eventually work with for our VA-API adventures.
vaInitialize(dpy, ...);
The next line actually tells VA-API that we are indeed going to do VA-API things with this display handle, so it better wake up.
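To make these first two steps concrete, here's a minimal sketch as a complete program. The device path is an assumption (adjust it to your render node), and error handling is kept to the bare minimum:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <va/va.h>
#include <va/va_drm.h>

int main(void)
{
    // Hypothetical device path; the render node number varies per machine.
    int fd = open("/dev/dri/renderD128", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // Wrap the file descriptor in a VA-API display handle.
    VADisplay dpy = vaGetDisplayDRM(fd);

    // Tell VA-API we mean business; it reports the API version back.
    int major = 0, minor = 0;
    VAStatus status = vaInitialize(dpy, &major, &minor);
    if (status != VA_STATUS_SUCCESS) {
        fprintf(stderr, "vaInitialize failed: %s\n", vaErrorStr(status));
        close(fd);
        return 1;
    }
    printf("VA-API version %d.%d\n", major, minor);

    vaTerminate(dpy);
    close(fd);
    return 0;
}

Build it against libva and libva-drm (e.g. with pkg-config --cflags --libs libva libva-drm) and it should print the API version the driver implements.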
vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, width, height, &surfaces[0], ...);
Next we’re going to create some surfaces that will store images in a specified “render target format” and have a set resolution. Surfaces can be thought of as special data buffers that (for all intents and purposes) live on the GPU and have the required metadata to be interpreted as an image.
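With the real signature filled in, the call might look like this; the resolution and surface count are made-up values:

// Hypothetical parameters: four 1920x1080 surfaces in a YUV 4:2:0 format.
VASurfaceID surfaces[4];
VAStatus status = vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420,
                                   1920, 1080,
                                   surfaces, 4,
                                   NULL, 0); // no extra attributes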
surface_queue = queue_create();
pthread_create(&decode_thread, NULL, decode, ...);
pthread_create(&encode_thread, NULL, encode, ...);
Then there’s some C-style threading stuff. We set up a queue to send messages between threads, then spawn two threads that will be executing the two functions defined below.
To decode images, the function has to go through a couple of setup steps:
vaQueryConfigEntrypoints(dpy, h264_profile, entrypoints, ...);
First it lists the supported “entrypoints” for a “profile”.
A profile is a specific codec, and an entrypoint is a processing pipeline that the hardware can implement for that codec. In this case we’re listing the supported pipelines for the H.264 profile.
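As a sketch, assuming the dpy from earlier (plus stdlib.h and stdbool.h), checking for an H.264 decode pipeline could look like:

// Ask for an upper bound first, so we can size the list correctly.
int max_entrypoints = vaMaxNumEntrypoints(dpy);
VAEntrypoint *entrypoints = malloc(max_entrypoints * sizeof(VAEntrypoint));

int num_entrypoints = 0;
vaQueryConfigEntrypoints(dpy, VAProfileH264Main, entrypoints, &num_entrypoints);

// Check whether a decode (VLD) pipeline is among the supported ones.
bool can_decode = false;
for (int i = 0; i < num_entrypoints; i++)
    if (entrypoints[i] == VAEntrypointVLD)
        can_decode = true;
free(entrypoints);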
vaCreateConfig(dpy, h264_profile, VAEntrypointVLD, ...);
Then it creates a “config” targeting the H.264 profile and calling into the VLD entrypoint.
VLD stands for Variable Length Decoding; I imagine that's because it's a good collective name for what all these codecs do: decode variable-length inputs into known-resolution outputs.
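With concrete arguments in place of the ellipsis, that call might look like this (VAProfileH264Main standing in for whichever H.264 profile the stream actually needs):

// Pair the H.264 Main profile with the VLD (decode) entrypoint.
VAConfigID config_id;
vaCreateConfig(dpy, VAProfileH264Main, VAEntrypointVLD,
               NULL, 0, // default attributes
               &config_id);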
vaCreateContext(dpy, config, width, height, VA_PROGRESSIVE, surfaces, num_surfaces, &decode_context);
A config itself is just an inert blob vaguely pointing at a part of the processing tools exposed by the API. To be able to make use of it, we have to create a “context” from the config.
All further operations are going to be done in this context; this way we can keep separate streams of work apart without having to register the same display multiple times.
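Continuing the sketch with the config and surfaces from before:

// Bind the config to concrete dimensions and render-target surfaces.
VAContextID context_id;
vaCreateContext(dpy, config_id,
                1920, 1080,     // coded picture size, made up here
                VA_PROGRESSIVE, // flag
                surfaces, 4,    // the surfaces created earlier
                &context_id);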
Now we decode the given frames:
for (;;) {
    // Parse one frame and decode
    vaBeginPicture(dpy, decode_context, surfaces[surface_index]);
    vaRenderPicture(dpy, decode_context, buf, num_buf);
    vaEndPicture(dpy, decode_context);
    // Poll the decoding status and submit the surface to the encode queue
    vaQuerySurfaceStatus();
    enqueue(surface_queue, surface_index);
}
If you’ve seen OpenGL, this might look familiar. We begin a picture, which marks a surface and starts collecting information. Then we provide the data we want it to process in the form of buffers, and finally we tell it we’re done, and it starts crunching in the background.
If we want to make sure that operations are complete on a surface, we can use vaQuerySurfaceStatus.
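Here's a sketch of one such submission, where buffers and num_buffers stand in for parameter/data buffers created beforehand with vaCreateBuffer; vaSyncSurface is the blocking alternative to polling:

// Submit one frame's worth of buffers for decoding into a surface.
vaBeginPicture(dpy, context_id, surfaces[0]);
vaRenderPicture(dpy, context_id, buffers, num_buffers);
vaEndPicture(dpy, context_id);

// Poll without blocking...
VASurfaceStatus surf_status;
vaQuerySurfaceStatus(dpy, surfaces[0], &surf_status);
if (surf_status == VASurfaceReady) {
    // the hardware is done with this surface
}

// ...or block until the operation completes.
vaSyncSurface(dpy, surfaces[0]);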
Encoding is done very similarly:
// Find the encode entrypoint for HEVC
// Every process we do in VA-API goes through the same rigmarole,
// just different entrypoints and buffers.
vaQueryConfigEntrypoints(dpy, hevc_profile, entrypoints, ...);

// Create a config for HEVC encode
vaCreateConfig(dpy, hevc_profile, VAEntrypointEncSlice, ...);

// Create a context for encode
vaCreateContext(dpy, config, width, height, VA_PROGRESSIVE, surfaces,
                num_surfaces, &encode_context);
Same setup process as last time, but this time we use the HEVC (H.265) profile and the EncSlice entrypoint.
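Sketched with concrete values (the same made-up dimensions and surfaces as on the decode side):

// Same dance as decode, with the encode profile and entrypoint.
VAConfigID enc_config_id;
vaCreateConfig(dpy, VAProfileHEVCMain, VAEntrypointEncSlice,
               NULL, 0, &enc_config_id);

VAContextID enc_context_id;
vaCreateContext(dpy, enc_config_id, 1920, 1080, VA_PROGRESSIVE,
                surfaces, 4, &enc_context_id);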
And the actual frame processing is done the same way:
for (;;) {
    // Dequeue the surface enqueued by the decoder
    surface_index = dequeue(surface_queue);
    // Encode using this surface as the source
    vaBeginPicture(dpy, encode_context, surfaces[surface_index]);
    vaRenderPicture(dpy, encode_context, buf, num_buf);
    vaEndPicture(dpy, encode_context);
}
Easy-peasy! Let’s get some FFI bindings going.