Rust
I should probably head this whole adventure with a little backstory. At work, I was tasked with implementing low-latency streaming of video and, occasionally, still images as part of a project grant. Though the main assignment was “streaming any raw data”, I vividly remembered how the current video and image streaming modules “worked”, and wanted to do something about them while I was at it.
The current state of affairs was a Python script that took in RTSP streams through OpenCV, individually re-encoded every frame as a JPEG, and then sent that along for storage and processing. I’ve been itching to flex the power of Rust for a couple of months now, and this is exactly the kind of systems-level, low-latency, high-performance task where it shines.
So I set off with a simple game plan for a proof-of-concept implementation:
[a week or two passes]
Two issues:
Let’s get the lay of the land on PyPI. Searching for “rtsp” brings up the following options:
What does the readme-
/((((((\\\\
=======((((((((((\\\\\
(( \\\\\\\
( (* _/ \\\\\\\
\ / \ \\\\\\________________
| | | </ __ ((\\\\
o_| / ____/ / _______ \ \\\\ \\\\\\\
| ._ / __/ __(_-</ _ \ \ \\\\\\\\\\\\\\\\
| / /_/ \__/___/ .__/ / \\\\\\\ \\
.______/\/ / /_/ / \\\
/ __.____/ _/ ________( /\
/ / / ________/`---------' \ / \_
/ / \ \ \ \ \_ \
( < \ \ > / \ \
\/ \\_ / / > )
\_| / / / /
_// _//
/_| /_|
…that’s definitely one way to get my attention! This unicorn is Brackets Approved!
The next line though…
Convenience-wrapper around OpenCV-Python RTSP functions
Yup, it’s just OpenCV. Provides a fully decoded image, and there’s no accessible way to bypass that. Also, pulling in all of OpenCV just for reading a network stream sounds like the definition of overkill.
Convert rtsp.c to rtsp_curl.py
I’m sure this works just fine for the author, but I’d rather use something a bit more documented and tested.
Version 0.0.6
Solid start. The API example seems much better thought out, but…
It does its own decoding yet again, though at least it seems to only call into ffmpeg instead of commandeering all of OpenCV.
Python extension for fast opencv-
And that’s all I needed to hear.
It was at this point that I gave up. Had I gone just a couple of items further in the search results, I would have run across…
This is a very simple asyncio library for interacting with an RTSP server, with basic RTP/RTCP support.
The intended use case is to provide a pretty low level control of what happens at RTSP connection level, all in python/asyncio.
This library does not provide any decoding capability, it is up to the client to decide what to do with received RTP packets.
…the exact library I needed.
But that didn’t happen. I got discouraged by a sea of OpenCV, and so began the journey of rewriting everything from scratch.
Searching for “rtsp” on crates.io provides us with much less. rtsp is a stub, gst-plugin-rtsp is for GStreamer, rtsp-types is just types, parsers, that sort of thing, and I’d rather not have to strip functions out of rave.
This was when, while scrolling the IETF Datatracker and dreading having to implement the protocol from scratch, a stray web search guided me to retina: a library written for a network video recorder, with pretty solid support for H.264, which is what I will have to deal with most of the time. Nearly perfect, and the few remaining issues can be worked around.
Game plan part one is finally coming together, so let’s get on with game plan part two.
We have data coming in, it’s time to process it. Searching for “h264” gives us…
Let’s read the decoding example:
use openh264::decoder::{Decoder, DecoderConfig};
use openh264::{nal_units, OpenH264API};

let h264_in = include_bytes!("video.h264"); // any raw H.264 bitstream will do

let api = OpenH264API::from_source();
let mut decoder = Decoder::with_api_config(api, DecoderConfig::new())?;

// Split H.264 into NAL units and decode each.
for packet in nal_units(h264_in) {
    // This may return Ok(None) for the first few NAL units while the decoder
    // gathers parameter sets; eventually it yields a decoded YUV frame.
    let maybe_some_yuv = decoder.decode(packet);
}
Nice and reassuringly brief. There are some weird terms here like “NAL units” and “YUV”, but we’ll cross those bridges when we get there.
I’d like to be able to show the decoded image in a window, so I’ll use the ubiquitous winit to create a window, and softbuffer to present to it.
For testing, it would be handy to have a single frame of raw h.264 bitstream saved to a file, so let’s generate one.
Let’s convert something appropriate, like this PAL video test pattern.
Originally I used the openh264 library to encode, but for the sake of brevity here I’ll just use ffmpeg with some flags that make it compatible with the openh264 decoder. All of these concepts will be explained in later posts.
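Something along these lines does the trick; PAL_test_pattern.png stands in for wherever the downloaded test pattern ended up, and the baseline profile plus 4:2:0 chroma subsampling keep the output within what the openh264 decoder can accept:

ffmpeg -i PAL_test_pattern.png -vf format=yuv420p -c:v libx264 -profile:v baseline -frames:v 1 PAL_test_pattern.h264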
And just like that, we have a single frame in PAL_test_pattern.h264. Just to make sure, let’s verify real quick with ffplay.
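Something like this should do it (the -f h264 flag just tells ffplay it is a raw bitstream with no container around it):

ffplay -f h264 PAL_test_pattern.h264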
Yup, looks right. With that sorted, it’s coding time!
Let’s start with the softbuffer example. First, we’ll set up a binary crate with the required dependencies.
[dependencies]
openh264 = "0.5.0"
softbuffer = "0.4.1"
winit = "0.29.15"
The example code spawns a window with winit, attaches a softbuffer Surface, then renders a basic test pattern on the window every time the compositor asks for it.
use std::num::NonZeroU32;
use std::rc::Rc;
use winit::event::{Event, WindowEvent};
use winit::event_loop::{ControlFlow, EventLoop};
use winit::window::WindowBuilder;
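The body of the example then looks roughly like this (paraphrased from the softbuffer docs, so treat the details as approximate; the gradient math is only there to have something to draw):

fn main() {
    let event_loop = EventLoop::new().unwrap();
    let window = Rc::new(WindowBuilder::new().build(&event_loop).unwrap());
    let context = softbuffer::Context::new(window.clone()).unwrap();
    let mut surface = softbuffer::Surface::new(&context, window.clone()).unwrap();

    event_loop
        .run(move |event, elwt| {
            elwt.set_control_flow(ControlFlow::Wait);

            match event {
                // The compositor wants a new frame: resize the buffer to the
                // current window size and fill it with a gradient.
                Event::WindowEvent { window_id, event: WindowEvent::RedrawRequested }
                    if window_id == window.id() =>
                {
                    let size = window.inner_size();
                    let (width, height) = (size.width, size.height);
                    surface
                        .resize(NonZeroU32::new(width).unwrap(), NonZeroU32::new(height).unwrap())
                        .unwrap();

                    let mut buffer = surface.buffer_mut().unwrap();
                    for index in 0..(width * height) {
                        let y = index / width;
                        let x = index % width;
                        let red = x % 255;
                        let green = y % 255;
                        let blue = (x * y) % 255;
                        // softbuffer expects one 0RGB u32 per pixel
                        buffer[index as usize] = blue | (green << 8) | (red << 16);
                    }
                    buffer.present().unwrap();
                }
                Event::WindowEvent { window_id, event: WindowEvent::CloseRequested }
                    if window_id == window.id() =>
                {
                    elwt.exit();
                }
                _ => {}
            }
        })
        .unwrap();
}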
Running that provides us with a nice pattern:
Continuing the classic programmer tradition of copy-pasting things until they work, let’s insert the openh264 decoding example at the top, before we enter the event loop.
let mut surface = softbuffer::Surface::new(&context, window.clone()).unwrap();

// Pack the frame into the binary for simplicity
const H264_IN: &[u8] = include_bytes!("PAL_test_pattern.h264");

// Prepare the decoder
let api = OpenH264API::from_source(); // Why does this matter?
let mut decoder = Decoder::with_api_config(api, DecoderConfig::new()).unwrap();

// Find the decoded frame and keep it around as packed RGB8
let mut frame = vec![];
for nal in nal_units(H264_IN) {
    if let Ok(Some(yuv)) = decoder.decode(nal) {
        let (width, height) = yuv.dimension_rgb();
        frame = vec![0u8; width * height * 3];
        yuv.write_rgb8(&mut frame);
    }
}

event_loop.run
And now we can replace the contents of the for loop drawing the test pattern with a loop that converts this pixel data into a format that softbuffer understands.
let mut buffer = surface.buffer_mut().unwrap();
for index in 0..(width * height) as usize {
    // Repack openh264's RGB8 output into the 0RGB u32 pixels softbuffer expects
    buffer[index] = (frame[index * 3] as u32) << 16 | (frame[index * 3 + 1] as u32) << 8 | frame[index * 3 + 2] as u32;
}
buffer.present().unwrap();
For now, we’ll force the window to be the correct size when creating it.
let event_loop = EventLoop::new().unwrap();
// 768x576 is a stand-in; use whatever size the test frame actually decodes to
let window = Rc::new(WindowBuilder::new().with_inner_size(winit::dpi::PhysicalSize::new(768, 576)).build(&event_loop).unwrap());
let context = softbuffer::Context::new(window.clone()).unwrap();
And just like that…
Let’s pipe the RTSP stream into this. Spawning the decoder into a thread that receives messages and making winit constantly request repaints… This is going to be easy as p-
Huh. What gives?
Maybe I should stop trying to make things happen without understanding what they really entail. H.264 is a very old standard designed by lots of people and companies, and it turns out it’s not even a single codec. There are lots of profiles that each add more and more advanced coding tools to the mix, and as if the burden of implementing all of that weren’t heavy enough, they are also covered by separate patents.
The biggest pro of OpenH264 is Cisco’s pledge that, as long as you use the precompiled library they provide, they will not pass the licensing costs on to you. This makes OpenH264 very popular, especially with companies, because it removes a lot of legal headaches from software development. However…
The biggest con of OpenH264 is that it doesn’t support many profiles. Actually, it only supports the lowest one, Constrained Baseline. Most devices we’d have to interface with would much prefer to run at Main or even High, so the bitstreams they put out over RTSP might as well be Greek to the decoder.
So what now?
We would definitely prefer not to have to pay MPEG-LA for using software like libav in our pipeline, but we’d also like to support as many of the codec’s features as possible.
However, there is a way to have our cake and eat it too: we have already paid MPEG-LA for a hardware license! A couple of cents from every processor and graphics card sale is forwarded straight to the video codec patent holders, because all three major silicon vendors incorporate video encoding and decoding pipelines into their products.
Intel has Quick Sync, NVIDIA has VDPAU and NVENC/NVDEC, but the common denominator seems to be Intel’s open-source VAAPI. Intel (obviously) supports it, NVIDIA hardware can be made to support it through a compatibility layer, and AMD also provides official implementations.
Now I’ll just have to learn to make use of it.
Help.