Using HDMI radio interference for high-speed data transfer

This story, too, begins with noise. I was browsing the radio waves with a software radio, looking for mysteries to accompany my ginger tea. I had started to notice a wide-band spiky signal on a number of frequencies that only seemed to appear indoors. Some sort of interference from electronic devices, probably. Spoiler alert, it eventually led me to broadcast a webcam picture over the radio waves... but how?

It sounds like video

The mystery deepened when I listened to what this interference sounded like as an AM signal. It reminded me of a time I mistakenly plugged our home stereo system into the Nintendo console's video output and heard a very similar buzz.

Am I possibly listening to video? Why would there be analog video transmitting on any frequency, let alone inside my home?

If we plot the signal's amplitude against time we can see that there is a strong pulse exactly 60 times per second. This could be the vertical synchronisation signal of 60 Hz video. A shorter pulse (pictured above) can be seen repeating more frequently; it could be the horizontal one. Between these pulses there is what appears to be noise. Maybe, if we use the strong pulses for synchronisation and plot the amplitude of that noise as a two-dimensional picture, we could see something?

And sure enough, when main screen turn on, we get signal:

(I've hidden the bright synchronisation signal from this picture.)

It seems to be my Raspberry Pi's desktop with weirdly distorted greyscale colours! Somehow, some part of the monitor setup is radiating it quite loudly into the aether. The frequency I'm listening to is a multiple of the monitor's pixel clock frequency.

As it turns out, this vulnerability of some monitors has been known for a long time. In 1985, van Eck demonstrated how CRT monitors can be spied on from a distance[1]; and in 2004, Markus Kuhn showed that the same still works on flat-screen monitors[2]. The image is heavily distorted, but some shapes and even larger text can still be recognised.

The next thought was, could we get any more information out of these images? Is there any information about colour?

Mapping all the colours

HDMI is fully digital; there is no linear dependency between pixel values and greyscale brightness in this amplitude image. I believe the brightness is related to the number of bit transitions over my radio's sampling time (which is around 8 bit-lengths); and in HDMI, this depends on many things, not just the actual RGB value of the pixel. HDMI also uses multiple differential wires that are all transmitting their own picture channels side by side.

This is why I don't think it would be easy to reconstruct a clear picture of what's being shown on the screen, let alone decode any colours.

But could the reverse be possible? Could we control this phenomenon to draw the greyscale pictures of our choice on the receiver's screen? How about sending binary data by displaying alternating pixel values on the monitor?

My monitor uses 16-bit colours. There are "only" 65,536 different colours, so it's possible to go through all of them and see how each appears in the receiver. But it's not that simple; the bit-pattern of an HDMI pixel can actually get modified based on what came before it. And my radio isn't fast enough to even tell the bits apart anyway. What we could do is fill entire lines with one colour and average the received signal strength. We would then get a mapping for single-colour horizontal streaks (above). Assuming a long run of the same colour always produces the same bitstream, this could be good enough.

Here's the map of all the colours and their intensity in the radio receiver. (Whatever happens between 16,128 and 16,384? I don't know.)

Now, we can resample a greyscale image so that its pixels become short horizontal lines. Then, for every greyscale value find the closest matching RGB565 color in the above map. When we display this psychedelic hodge-podge of colour on the screen (on the right), enough of the above mapping seems to be preserved to produce a recognizable picture of a movie[3] on the receiver side (on the left):
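
A rough sketch of that lookup step, assuming the per-colour intensity map described above is stored in a 65,536-entry array indexed by the RGB565 value (the names here are mine, not from the actual code):

#include <array>
#include <cmath>
#include <cstdint>

// 'intensity_map' holds the averaged received signal strength for each
// RGB565 colour, normalised to 0..1. For a target greyscale value, pick
// the colour whose measured intensity is closest.
uint16_t closest_rgb565(float target_grey,
                        const std::array<float, 65536>& intensity_map) {
  uint16_t best = 0;
  float best_err = 1e9f;
  for (uint32_t c = 0; c < 65536; c++) {
    float err = std::fabs(intensity_map[c] - target_grey);
    if (err < best_err) { best_err = err; best = static_cast<uint16_t>(c); }
  }
  return best;
}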

These colours are not constant in any way. If I move the antenna around, even if I turn it from vertical to horizontal, the greyscales will shift or even get inverted. If I tune the radio to another harmonic of the pixel clock frequency, the image seems to break down completely. (Are there more secrets to be unfolded in these variations?)

The binary exfiltration protocol

Now we should have enough information to be able to transmit bits. Maybe even big files and streaming data, depending on the bitrate we can achieve.

First of all, how should one bit be encoded? The absolute brightness will fluctuate depending on radio conditions, so I decided to encode bits as the brightness difference between two short horizontal lines: a positive difference means 1 and a negative one means 0. This should stay fairly constant, unless the colours completely flip around, that is.

The monitor has 768 pixels vertically. This is a nice number so I designed a packet that runs vertically across the display. (This proved to be a bad decision, as we will later see.) We can stack as many packets side-by-side as the monitor width allows. A new batch of packets can be displayed in each frame, or we can repeat them over multiple frames to improve reliability.

These packets should have some metadata, at least a sequence number. Our medium is also quite noisy, so we need some kind of forward error correction. I'm using a Hamming(12,8) code which adds 4 error correction bits for every 8 bits of data. Finally, we need to add a CRC to each packet so we can make sure it arrived intact; I chose CRC16 with the polynomial 0x8005 (just because liquid-dsp provided it by default).
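
As a sketch, the protection could be applied with liquid-dsp roughly like this (the packet layout and byte packing here are simplified placeholders, not the exact protocol):

#include <liquid/liquid.h>
#include <cstdint>
#include <vector>

// Append a CRC16 (polynomial 0x8005) to the payload, then protect the
// whole thing with Hamming(12,8): 4 parity bits for every 8 data bits.
std::vector<unsigned char> encode_packet(std::vector<unsigned char> msg) {
  unsigned int crc = crc_generate_key(LIQUID_CRC_16, msg.data(), msg.size());
  msg.push_back((crc >> 8) & 0xff);
  msg.push_back(crc & 0xff);

  fec hamming = fec_create(LIQUID_FEC_HAMMING128, NULL);
  unsigned int enc_len =
      fec_get_enc_msg_length(LIQUID_FEC_HAMMING128, msg.size());
  std::vector<unsigned char> enc(enc_len);
  fec_encode(hamming, msg.size(), msg.data(), enc.data());
  fec_destroy(hamming);
  return enc;
}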

First results!

It was quite unbelievable: I was able to transmit a looping 64 kbps audio stream almost without any glitches, with the monitor and the receiver in the same room, approximately 2 meters from each other.

Quick tip. Raw 8-bit PCM audio is a nice test format for these kinds of streaming experiments. It's straightforward to set an arbitrary bitrate by resampling the sound (with SoX for instance); there's no structure, headers, or byte order to deal with; and any packet loss, misorder, or buffer underrun is instantly audible. You can use a headerless companding algorithm like A-law to fit more dynamic range in 8 bits. Even stereo works; if you start from the wrong byte the channels will just get swapped. SoX can also play back the stream.

But can we get more? Slowly I added more samples per second, and a second audio channel. Suddenly we were at 256 kbps and still running smoothly. 200 kbps was even possible from the adjacent room, with a directional antenna 5 meters away, and with the door closed! In the same room, it worked up to around 512 kilobits per second but then hit a wall.

A tearful performance

The heavy error correction and framing add around 60% overhead, and we're left with 480 bits of 'payload' per packet. If we have 39 packets per frame at 60 frames per second we should get more than a megabit per second, right? But for some reason it always caps at half a megabit.

The reason revealed itself when I noticed every other frame was often completely discarded at the CRC check. Of course: I should have thought of properly synchronising the screen update to the graphics adapter's frame update cycle (or VSYNC). This would prevent the picture information from changing mid-frame, also known as tearing. But whatever options I tried with the SDL library, I couldn't get the Raspberry Pi 4 to not introduce tearing.

Screen tearing appears to be an unsolved problem plaguing the Raspberry Pi 4 specifically (see this Google search). I tried another mini computer, the Asus Tinker Board R2.0, but I couldn't get the graphics drivers to work properly. I then realised it was a mistake to have the packets run from top to bottom; any horizontal tearing will cut every single packet in half! With a horizontal design only one packet per frame would suffer this fate.

A new design enables video-over-video

Packets that run horizontally across the screen indeed fix most of the packet loss. It may also help with CPU load as it improves memory access locality. I'm now able to get 1000 kbps from the monitor! What could this be used for? A live video stream, perhaps?

But the clock was ticking. I had a presentation coming up and I really wanted to amaze everyone with a video transfer demo. I quite literally got it working on the morning of the event. For simplicity, I decided to go with MJPEG, even though fancier schemes could compress way more efficiently. The packet loss issues are mostly kept at bay by repeating frames.

The data stream is "hidden" in a Windows desktop screenshot; I'm changing the colours in a way that both creates a readable bit and also looks inconspicuous when you look from far away.

Mitigations

This was a fun project, but this kind of vulnerability could, in the tinfoiliest of situations, be used for exfiltrating information out of a supposedly airgapped computer.

The issue has been alleviated in some modern display protocols. DisplayPort[4] makes use of scrambling: a pseudorandom sequence of bits is mixed with the bitstream to remove the strong clock oscillations that are so easily radiated out. This also randomizes the bitstream-to-amplitude correlation. I haven't personally tested whether DisplayPort still leaks some kind of video in its radio interference, though. (Edit: Scrambling seems to be optionally supported by later versions of HDMI, too – but it might depend on which features exactly the two devices negotiate. How could you know if it's turned on?)

I've also tried wrapping the monitor in tinfoil (very impractical) and inside a cage made out of chicken wire (it had no effect - perhaps I should have grounded it?). I can't recommend either of these.

Software considerations

This project was made possible by at least C++, Perl, SoX, ImageMagick, liquid-dsp, Dear Imgui, GLFW, turbojpeg, and v4l2! If you're a library that feels left out, please leave a comment.

If you wish to play around with video emanations, I heard there is a project called TempestSDR. For generic analog video decoding via a software radio, there is TVSharp.

References

  1. Van Eck, Wim (1985): Electromagnetic radiation from video display units: An eavesdropping risk?
  2. Kuhn, Markus (2004): Electromagnetic Eavesdropping Risks of Flat-Panel Displays
  3. KUNG FURY Official Movie [HD] (2015)
  4. Video Electronics Standards Association (2006): DisplayPort Standard, version 1.

Spiral spectrograms and intonation illustrations

I've been experimenting with methods for visualising harmony, intonation (tuning), and overtones in music. Ordinary spectrograms aren't very well suited for that as the harmonic relations are not intuitively visible. Let's see what could be done about this. I'll try to sprinkle the text with Wikipedia links in order to immerse (nerd snipe?) the reader in the subject.

Equal temperament cents against time

We can examine how tuning evolves during a recording by choosing a reference pitch and plotting all frequencies relative to it modulo 100 cents. This is similar to what an electronic tuner does, but instead of just showing the fundamental frequency, we'll plot the whole spectrum. Information about the absolute frequencies is lost. This "zoomed-in" plot visualises how the distribution of frequencies fits the 12-tone equal temperament system (12-TET) common in Western music.
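
The folding itself is simple; a minimal sketch of mapping a frequency into this ±50-cent view (the reference pitch here is an assumption):

#include <cmath>

// Express a frequency as cents relative to the nearest note of an
// equal-tempered scale anchored at ref_hz, folded into -50..+50 cents.
float cents_mod_100(float freq_hz, float ref_hz = 440.f) {
  float cents = 1200.f * std::log2(freq_hz / ref_hz);
  float wrapped = std::fmod(cents, 100.f);
  if (wrapped < -50.f) wrapped += 100.f;
  if (wrapped >=  50.f) wrapped -= 100.f;
  return wrapped;
}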

Here's the first 20 seconds of Debussy's Clair De Lune as found on YouTube, played with a well-tuned (video) and an out-of-tune piano (video). The second piano sounds out of tune because there are relative differences in tuning between the strings. The first piano looks to be a few cents sharp as a whole, but consistently so, so it's not perceptible.

The vertical axis is the same that electronic tuners use. All the notes of a chord will appear in the middle, as long as they are well tuned against the reference pitch of, say, A = 440 Hz. The top edge of the graph is half a semitone sharp (quarter tone = 50c), and the bottom is half a semitone flat.

Overtones barely appear in the picture because the first three conveniently hit other notes in tune. But from f5 onwards the harmonic series starts deviating from 12-TET and the harmonics start to look out-of-tune (f5 = −14c, f7 = −31c, ...). These can be cut out by filtering, or by hoping that they're lower in volume and setting the colour range accordingly.

Six months later...

You know how you sometimes accidentally delete a project and have to rewrite it from scratch much later, and it's never exactly the same? That's what happened at this point. It's a little scuffed, but here's some piano music that utilises quarter tones (by Wyschnegradsky). I used the same scale on purpose, so the quarter tones wrap around from the top and bottom.

More examples of these "intonation plots" in a tweet.

This works well for piano music. However, not even all Western classical music is tuned to equal temperament; for instance, solo strings may be played with Pythagorean intonation[1], whereas vocal ensembles[2] and string quartets may tune some intervals closer to just intonation. Unlike the piano, these wouldn't look too good in the equal-temperament plot.

Octave spiral

If we instead plot the spectrum modulo 1200 cents (1 octave) we get an interesting interval view. We could even ditch the time scale and wind the spectrogram into a spiral to make it prettier and preserve the overtones and absolute frequency information. Now each note is represented by an angle in this spiral; in 12-TET, they're separated by 30 degrees. At any point in the spiral, going 1 turn outwards doubles the frequency.
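
A sketch of the coordinate mapping, with the base frequency and radius scaling as arbitrary choices of mine:

#include <cmath>

struct Polar { float angle_rad; float radius; };

// Map a frequency onto the octave spiral: the angle encodes the position
// within the octave, and each full turn outwards doubles the frequency.
Polar spiral_position(float freq_hz, float base_hz = 27.5f) {
  float octaves = std::log2(freq_hz / base_hz);       // fractional octave count
  float frac    = octaves - std::floor(octaves);      // 0..1 within the octave
  return { 2.f * float(M_PI) * frac, 1.f + octaves }; // radius grows per octave
}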

Here's a C major chord on a piano. Note how the harmonic content adds several high-frequency notes on top of the C, E, and G of the triad chord, and how multiple notes can contribute to the same overtone:

I had actually hoped I would get an Ultimate Chord Viewer where any note, regardless of frequency, would have all its harmonics neatly stacked on top of it. But that's not what happens here: the harmonic series is not a stack of octaves (2^n), but of integer multiples (n). Some harmonics appear at seemingly unrelated angles. But it's still a pretty interesting visualisation, and perhaps makes more sense musically.

This plot is also better at illustrating different tuning systems. Let's look at a major third interval F-A in equal temperament and just intonation, with a few more harmonics.

The intuition from this plot is that equal temperament aims for equal distances (angles) between notes and just intonation tries to make more of the harmonics match instead.

Even though the Ultimate Chord Viewer was not achieved I now have ideas for Christmas lights...

Live visualization

Here's what a cappella music with some reverb looks like on the spiral spectrogram.

A shader toy

This little real-time GLSL demo on shadertoy draws a spiral FFT from microphone audio. But don't get your hopes up: the spectrograms in this blog post were made with a 65536-point FFT. Shadertoy's 512px microphone texture offers a lot less in terms of frequency range and bins. This greatly blurs the frequency resolution, especially towards the low end. Could it be improved with the right colormap? Or a custom FFT with the waveform texture as its input?


Speech to birdsong conversion

I had a dream one night where a blackbird was talking in human language. When I woke up there was actually a blackbird singing outside the window. Its inflections were curiously speech-like. The dreaming mind only needed to imagine a bunch of additional harmonics to form phonemes and words. One was left wondering if speech could be transformed into a blackbird song by isolating one of the harmonics...

One way to do this would be to:

  • Find the instantaneous fundamental frequency and amplitude of the speech. For example, filter the harmonics out and use an FM demodulator to find the frequency. Then find the signal envelope amplitude by AM demodulation.
  • Generate a new wave with similar amplitude variations but greatly multiplied in frequency.
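
For illustration, here's a very rough single-file take on those two steps, assuming mono float samples of low-pass filtered speech are already loaded; the actual proof-of-concept below uses a Perl/SoX/csdr pipeline and a proper FM demodulator rather than the crude zero-crossing pitch estimate used here:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const float fs = 44100.f;      // sample rate (assumption)
  const float mult = 8.f;        // frequency multiplier for "birdiness" (assumption)
  const int   frame = 441;       // 10 ms analysis frames
  std::vector<float> speech;     // ...assume the filtered speech is loaded here...

  float phase = 0.f;
  for (size_t start = 0; start + frame <= speech.size(); start += frame) {
    // crude fundamental estimate: zero-crossing count over the frame
    int zc = 0;
    float rms = 0.f;
    for (int i = 1; i < frame; i++) {
      if ((speech[start + i - 1] < 0.f) != (speech[start + i] < 0.f)) zc++;
      rms += speech[start + i] * speech[start + i];
    }
    rms = std::sqrt(rms / frame);
    float f0 = zc * fs / (2.f * frame);

    // resynthesise the frame as a sine at a greatly multiplied frequency,
    // keeping the original amplitude envelope
    for (int i = 0; i < frame; i++) {
      phase += 2.f * float(M_PI) * (mult * f0) / fs;
      float y = rms * std::sin(phase);
      std::fwrite(&y, sizeof y, 1, stdout);   // raw 32-bit float output
    }
  }
  return 0;
}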

A proof-of-concept script using the Perl-SoX-csdr command-line toolchain is available (source code here). The result sounds surprisingly blackbird-like. Even the little trills are there, probably as a result of FM noise or maybe vocal fry at the end of sentences. I got the best results by speaking slowly and using exaggerated inflection.

Someone hinted that the type of intonation used in certain automatic announcements is perfect for this kind of conversion. And it seems to be true! Here, a noise gate and reverb have been added to the result to improve it a little:

And finally, a piece of sound art where this synthetic blackbird song is mixed with a subtle chord and a forest ambience:

Think of the possibilities: A simultaneous interpreter for talking to birds. A tool for dubbing talking birds in animation or live theatre. Entertainment for cats.

What other birds could be done with a voice changer like this? What about croaky birds like a duck or a crow?

(I talked about this blog post a little on NPR: Here's What 'All Things Considered' Sounds Like — In Blackbird Song)

Plotting patterns in music with a fantasy record player

Back in April I bought a vinyl record that had a weird wavy pattern near the outer edge. I thought I may have broken it somehow but couldn't even test this because I don't own a record player. *) But when I took a closer look at the pattern it seemed to somehow follow changes in the music. That doesn't look like damage at all.

When I played the CD version it became clear: this was an artifact of the tempo of the electronic track (100 bpm) being a multiple of the rotational speed (33 1/3 rpm), and these were probably drum hits! My tweet sparked some interesting discussion and I've been pondering this ever since. Could we plot any song as a loop or grid based on its own tempo and see interesting patterns?

(*) I know, it's a little odd. But I have a few unplayed vinyl records waiting for the day that I finally have the proper equipment. By the way, the song was Black Pink by RinneRadio from their wonderful album staRRk.

I wrote a little script to do just this: to plot the amplitude of the FLAC into a grid with an adjustable width. The result looks very similar to the pattern on the vinyl surface! Note that this image is a "straightened out" version of the disc surface and it's showing three of those wavy patterns. The top edge corresponds to the outer edge of the vinyl.

[Image: A plot showing similar patterns that were on the disc surface.]

Later I wrote a little more ambitious plotter that shall be explained soon.

Computer-conducted music gives best patterns

After plotting several different songs against their own tempo like this it seemed that, in addition to electronic music, a lot of pop and rock has this type of pattern, too. The most striking and clear patterns can be seen in music that makes use of drum samples in a quantized time base (aka a drum machine): the same kick drum sample, for example, repeats four times in each bar, perfectly timed by a computer so that they align in phase.

Somewhat similar patterns can be seen in live music that is played to a "click track": each band member hears a common computer-generated time signal in their earpiece so that they won't sway from an average tempo. But of course the live beats won't be perfectly phase-aligned in this case, because the musicians are humans and there's also physics involved.

3D rendered video experiment

To demonstrate how the patterns on vinyl records are born I made a video showing a fantasy record player that can play an "e-ink powered optical record" and morph it into any RPM. I say fantasy because it's just that: imagination, science fiction, rendered 3D art - it would be quite unfeasible in real life. You can't actually make e-ink displays that fast and accurate. But of course it would be possible to have a live display of a digitally sampled audio file as a spiral and use some kind of physical controllers to change the RPM value in real time, and just play the sound from a digital file.

Making the video was really fun and I think the end result is equal parts weird and illustrative.

Programming the disk surface and audio

The disc surface is based on a video texture: a different image was rendered for each changed frame using a purpose-written C++ program. The program uses an oversampled (8x) version of the original music that it then resamples at a variable rate based on the recorded RPM value (let's call it a morphing value). Oversampling and low-pass filtering beforehand makes variable-rate resampling simple: just take a sample at the appropriate time instant and don't worry about interpolation. It won't sound perfect, but the artifacts actually add an interesting distortion, perhaps simulating pixel boundaries in the 'e-ink display'.
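
As a sketch of that playback step (the variable names are mine):

#include <cmath>
#include <vector>

// 'audio8x' is the 8x-oversampled, low-pass filtered track. 'pos' is the
// fractional read position in original-rate samples; the morphing value
// sets how far it advances per output sample (1.0 = original speed).
float take_sample(const std::vector<float>& audio8x, double& pos, double morph) {
  pos += morph;
  size_t idx = static_cast<size_t>(std::llround(pos * 8.0)); // nearest 8x sample
  return idx < audio8x.size() ? audio8x[idx] : 0.f;
}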

The amplitude sample at each time instant was projected into polar coordinates and plotted as an image. The image is fairly large - at least 2048 by 2048 pixels. I use this as a sort of image-space oversampling to get the polar projection to look a little better. I even tried 8192 x 8192 video but it was getting too heavy on my computer. But a new image must only be generated when the morphing value changes; the other frames can be copied.

[Image: A square image of the disc video texture.]

The soundtrack was made by continuously sampling the position of the "play head" 44100 times per second, whether the disk was moving or not. Which sample ends up in the audio depends on the current rotational angle and the morphing value of the disk surface. When either of those values changes it moves the audio past the play head. A DC cancel filter was then applied because the play head would often stop on a non-zero sample, and it didn't look nice in the waveform. There's also a quiet hum in the background.
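
The DC cancel can be a standard one-pole DC blocker; a minimal sketch (the coefficient is a typical value, not necessarily the one used here):

struct DCBlocker {
  float x1 = 0.f, y1 = 0.f;
  // y[n] = x[n] - x[n-1] + R * y[n-1], with R slightly below 1
  float process(float x) {
    float y = x - x1 + 0.995f * y1;
    x1 = x;
    y1 = y;
    return y;
  }
};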

[Image: Screenshot of C++ code with a list of events.]

I made an event-based system where I could input events simulating the button presses and other controls. The system responds to speed change events with a smoothstep function so that the disc seems to have realistic inertia. Also, the slow startup and slowdown sound kind of cool this way. Here's an extra-slow version of the effect -- you can hear the slight aliasing artifacts in the very beginning and end:
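
The speed ramp is just the familiar smoothstep; a sketch of how the RPM could be interpolated between events:

#include <algorithm>

// Classic smoothstep: 0 at t<=t0, 1 at t>=t1, smooth S-curve in between.
float smoothstep(float t0, float t1, float t) {
  float x = std::clamp((t - t0) / (t1 - t0), 0.f, 1.f);
  return x * x * (3.f - 2.f * x);
}

// rpm(t) = rpm_old + (rpm_new - rpm_old) * smoothstep(t_event, t_event + ramp, t);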

3D modeling, texturing, shading

The models were all made in Blender, a tool that I've slowly learned to use during the pandemic situation. Its modeling tools are pretty fun to use and once you learn it you can build pretty 3D models to look at that won't take up any room in your apartment.

[Image: A screenshot of Blender with the device without textures.]

I love the retro aesthetic of old reel-to-reel players and other studio equipment. So I looked for inspiration by searching "reel-to-reel" on Google Images. Try it out, it's worth it! Originally I wanted the disc to be transparent with some type of movable particles inside, and the laser to be shone through it, but this was computationally very expensive to render. So I made it an 'e-ink' display instead. (I regret this choice a little bit since some people, at first glance, apparently thought the video depicted actual existing technology. But I tried to make it clear it's a photorealistic render :)

I made use of the boolean workflow and bevel modifiers to cut holes and other small details in the hard surfaces. The cables are Bezier curves with the round bevel setting enabled.

The little red LCD is also a video texture on an emission shader – each frame was an SVG that was changed a little in time to add flicker and then exported using Inkscape from a batch script.

The wood textures, fingerprints, and the room environment photo are from HDRi Haven, Texture Haven and CC0 Textures. I'm especially proud of all the details on the disc surface -- here's the shader setup I built for the surface:

[Image: A Blender texture node map.]

The video was rendered in Blender Eevee and it took maybe 10 hours at 720p60. It's sad that it's not in 1080p but I was impatient. I spent quite some time making the little red LCD look realistic but it was completely spoiled by compression!

Here's a bigger still rendered in Cycles:

[Image: A render of the record player.]

Mapping rotation to the disc

Rotation angles from the C++ program were sampled 60 times per second and output as CSV. These were then imported to Blender as keyframes for the rotating disc, using the Python API:

[Image: A screenshot of a Python script.]

Here you only need to print a new keyframe when the speed of rotation changes, or is about to change; Blender will interpolate the rest.

A Driver was set up to make the Y rotation slightly follow the Z rotation with a very small factor to make the disc 'wobble' a bit.

What's next?

It's endless fun to build and program functional fantasy electronics and I may need to do more of that. I'm currently also obsessed with 3D modeled dollhouses and who knows how those things will combine?

By the way, there is an actual device somewhat resembling this 3D model. It's called the Panoptigon; it's sort of an optical Mellotron (video).

Capturing PAL video with an SDR (and a few dead-ends)

I play 1980s games, mostly Super Mario Bros., on the Nintendo NES console. It would be great to be able to capture live video from the console for recording speedrun attempts. Now, how to make the 1985 NES and the 2013 MacBook play together, preferably using hardware that I already have? This project diary documents my search for the answer.

Here's a spoiler – it did work:

[Image: A powered-on NES console and a MacBook on top of it, showing a Tetris title screen.]

Things that I tried first

A capture device

Video capture devices, or capture cards, are devices specially made for this purpose. There was only one cheap (~30€) capture device for composite video available locally, and I bought it hopefully. But it wasn't readily recognized as a video device on the Mac, and there seemed to be no Mac drivers available. Having already almost capped my budget for this project I then ordered a 5€ EasyCap device from eBay, as there was some evidence of Mac drivers online. The EasyCap was still making its way to Finland as of this writing, so I continued to pursue other routes.

PS: When the device finally arrived, it sadly seemed that the EasyCapViewer-Fushicai software only supports opening this device in NTSC mode. There's PAL support in later commits in the GitHub repo, but the project is old and can't be compiled anymore as Apple has deprecated QuickTime.

Even when they do work, a downside to many cheap capture devices is that they can only capture at half the true framerate (that is, at 25 or 30 fps).

CRT TV + DSLR camera

The cathode-ray tube television that I use for gaming could be filmed with a digital camera. This posed interesting problems: The camera must be timed appropriately so that a full scan is captured in every frame, to prevent temporal aliasing (stripes). This is why I used a DSLR camera with a full manual mode (Canon EOS 550D in this case).

For the 50 Hz PAL television screen I used a camera frame rate of 25 fps and an exposure time of 1/50 seconds (set by camera limitations). The camera will miss every other frame of the original 50 fps video, but on the other hand, will get an evenly lit screen every time.

A Moiré pattern will also appear if the camera is focused on the CRT shadow mask. This is due to interference between two regular 2D arrays, the shadow mask in the TV and the CCD array in the camera. I got rid of this by setting the camera on manual focus and defocusing the lens just a bit.

[Image: A screen showing Super Mario Bros., and a smaller picture with Oona in it.]

This produced surprisingly good quality video, save for the slight jerkiness caused by the low frame rate (video). This setup was good for one-off videos; however, I could not use it for live streaming, because the camera could only record to the SD card and not connect to the computer directly.

LCD TV + webcam

An old LCD TV that I have has significantly less flicker than the CRT, and I could have live video via the webcam. But the Microsoft LifeCam HD-3000 that I have has only a binary option for manual exposure (pretty much "none" and "lots"). Using the higher setting the video was quite washed out, with lots of motion blur. The lower setting was so fast that it looked like the LCD had visible vertical scanning. Brightness was also heavily dependent on viewing angle, which caused gradients over the image. I had to film at a slightly elevated angle so that the upper part of the image wouldn't go too dark, and this made the video look like a bootleg movie copy.

[Image: A somewhat blurry photo of an LCD TV showing Super Mario Bros.]

Composite video

Now to capturing the actual video signal. The NES has two analog video outputs: one is composite video and the other an RF modulator, which has the same composite video signal modulated onto an AM carrier in the VHF television band plus a separate FM audio carrier. This is meant for televisions with no composite video input: the TV sees the NES as an analog TV station and can tune to it.

In composite video, information about brightness, colour, and synchronisation is encoded in the signal's instantaneous voltage. The bandwidth of this signal is at least 5 MHz, or 10 MHz when RF modulated, which would require a 10 MHz IQ sampling rate.

[Image: Oscillogram of one PAL scanline, showing hsync, colour burst, and YUV parts.]

I happen to have an Airspy R2 SDR receiver that can listen to VHF and take 10 million samples per second - could it be possible? I made a cable that can take the signal from the NES RCA connector to the Airspy SMA connector. And sure enough, when the NES RF channel selector is at position "3", a strong signal indeed appears on VHF television channel 3, at around 55 MHz.

Software choices

There's already an analog TV demodulator for SDRs - it's a plugin for SDR# called TVSharp. But SDR# is a Windows program and TVSharp doesn't seem to support colour. And it seemed like an interesting challenge to write a real-time PAL demodulator myself anyway.

I had been playing with analog video demodulation recently because of my HDMI Tempest project (video). So I had already written a C++ program that interprets a 10 Msps digitised signal as greyscale values and sync pulses and shows it live on the screen. Perhaps this could be used as a basis to build on. (It was not published, but apparently there is a similar project written in Java, called TempestSDR.)

Data transfer from the SDR is done using airspy_rx from airspy-tools. This is piped to my program that reads the data into a buffer, 256 ksamples at a time.

Automatic gain control is an important part of demodulating an AM signal. I used liquid-dsp's AGC by feeding it the maximum amplitude over every scanline period; this roughly corresponds to sync level. This is suboptimal, but it works in our high-SNR case. AM demodulation was done using std::abs() on the complex-valued samples. The resulting real value had to be flipped (subtracted from one), because TV is transmitted "inverse AM" to save on the power bill. I then scaled the signal so that black level was close to 0, white level close to 1, and sync level below 0.
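
Condensed, the per-sample processing looks roughly like this (the level constants would come from the measured sync/black/white levels; the names are mine):

#include <complex>

// AM demodulate one IQ sample, undo the "inverse AM", and scale so that
// black is near 0, white near 1, and sync tips fall below 0.
float demod_sample(std::complex<float> iq, float agc_gain,
                   float black_level, float white_level) {
  float am  = std::abs(iq) * agc_gain;  // envelope after AGC
  float inv = 1.f - am;                 // carrier maximum = sync, so flip it
  return (inv - black_level) / (white_level - black_level);
}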

I use SDL2 to display the video and OpenCV for pixel addressing, scaling, cropping, and YUV-RGB conversions. OpenCV is an overkill dependency inherited from the Tempest project and SDL2 could probably do all of those things by itself. This remains TODO.

Removing the audio

The captured AM carrier seems otherwise clean, but there's an interfering peak on the lower sideband side at about –4.5 MHz. I originally saw it in the demodulated signal and thought it would be related to colour, as it's very close to the PAL chroma subcarrier frequency of 4.43361875 MHz. But when it started changing frequency in triangle-wave shapes, I realized it's the audio FM carrier. Indeed, when it is FM demodulated, beautiful NES music can be heard.

[Image: A spectrogram showing the AM carrier centered in zero, with the sidebands, chroma subcarriers and audio alias annotated.]

The audio carrier is actually outside this 10 MHz sampled bandwidth. But it's so close to the edge (and so powerful) that the Airspy's anti-alias filter cannot sufficiently attenuate it, and it becomes folded, i.e. aliased, onto our signal. This caused visible banding in the greyscale image, and some synchronization problems.

I removed the audio using a narrow FIR notch filter from the liquid-dsp library. Now, the picture quality is very much acceptable. Minor artifacts are visible in narrow vertical lines because of a pixel rounding choice I made, but they can be ignored.

[Image: Black-and-white screen capture of NES Tetris being played.]

Decoding colour

PAL colour is a bit complicated. It was designed in the 1960s to be backwards compatible with black-and-white TV receivers. It uses the YUV colourspace, the Y or "luminance" channel being a black-and-white sum signal that already looks good by itself. Even if the whole composite signal is interpreted as Y, the artifacts caused by colour information are bearable. Y also has a lot more bandwidth, and hence resolution, than the U and V (chrominance) channels.

U and V are encoded in a chrominance subcarrier in a way that I still haven't quite grasped. The carrier is suppressed, but a burst of carrier is transmitted just before every scanline for reference (so-called colour burst).

Turns out that much of the chroma information can be recovered by band-pass filtering the chrominance signal, mixing it down to baseband using a PLL locked to the colour burst, rotating it by a magic number (chroma *= std::polar(1.f, deg2rad(170.f))), and plotting the real and imaginary parts of this complex number as the U and V colour channels. This is similar to how NTSC colour is demodulated.
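
Per sample, that recovery step boils down to something like this (filter and PLL setup omitted; this assumes the band-passed chroma is available as a complex, analytic signal and 'burst_osc' is the oscillator locked to the colour burst):

#include <cmath>
#include <complex>

inline float deg2rad(float d) { return d * float(M_PI) / 180.f; }

// Mix the band-passed chroma down to baseband with the burst-locked
// oscillator, apply the magic rotation, and read off U and V.
void demod_chroma(std::complex<float> chroma_bp, std::complex<float> burst_osc,
                  float& U, float& V) {
  std::complex<float> baseband = chroma_bp * std::conj(burst_osc);
  std::complex<float> uv = baseband * std::polar(1.f, deg2rad(170.f));
  U = uv.real();
  V = uv.imag();
}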

In PAL, every other scanline has its chrominance phase shifted (hence the name, Phase Alternating [by] Line). I couldn't get consistent results demodulating this, so I skipped the chrominance part of every other line and copied it from the line above. This doesn't even look too bad for my purposes. However, there seems to be a pre-echo in UV that's especially visible on a blue background (most of SMB1 sadly), and a faint stripe pattern on the Y channel, most probably crosstalk from the chroma subcarrier that I left intact for now.

[Image: The three chroma channels Y, U, and V shown separately as greyscale images, together with a coloured composite of Mario and two Goombas.]

I used liquid_firfilt to band-pass the chroma signal, and liquid_nco to lock onto the colour burst and shift the chroma to baseband.

Let's play Tetris!

Latency

It's not my goal to use this system as a gaming display; I'm still planning to use the CRT. However, total buffer delays are quite small due to the 10 Msps sampling rate, so the latency from controller to screen is pretty good. The laptop can also easily decode and render at 50 fps, which is the native frame rate of the PAL NES. Tetris is playable up to level 12!

Using a slow-mo phone camera, I measured the time it takes for a button press to make Mario jump. The latency is similar to that of a NES emulator:

Method                  Frames @ 240 fps   Latency
RetroArch emulator      28                 117 ms
PAL NES + Airspy SDR    26                 108 ms
PAL NES + LCD TV        20                 83 ms

Performance considerations

A 2013 MacBook Pro is perhaps not the best choice for dealing with live video to begin with. But I want to be able to run the PAL decoder and a screencap / compositing / streaming client on the same laptop, so performance is even more crucial.

When colour is enabled, CPU usage on this quad-core laptop is 110% for palview and 32% for airspy_rx. The CPU temperature is somewhere around 85 °C. Black-and-white decoding lowers palview usage to 84% and CPU temps to 80 °C. I don't think there's enough cycles left for a streaming client just yet. Some CPU headroom would be nice as well; a resync after dropped samples looks quite nasty, and I wouldn't want that to happen very often.

[Image: htop screenshot showing palview and airspy_rx on top, followed by some system processes.]

Profiling reveals that the most CPU-intensive tasks are those related to FIR filtering. FIR filters are based on convolution, which is of high computational complexity, unless done in hardware. FFT convolution can also be faster, but only when the kernel is relatively long.

[Image: Diagram shows the Audio notch FIR takes up 27 % and Chroma Bandpass FIR 12 % of CPU. Several smaller contributors mentioned.]

I've thought of having another computer do the Airspy transfer, audio notch filtering, and AM demodulation, and then transmit this preprocessed signal to the laptop via Ethernet. But my other computers (Raspberry Pi 3B+ and a Core 2 Duo T7500 laptop) are not nearly as powerful as the MacBook.

Instead of a FIR bandpass filter, a so-called chrominance comb filter is often used to separate chrominance from luminance. This could be realized very efficiently as a linear-complexity delay line. This is a promising possibility, but so far my experiments have had mixed results.
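
The simplest version is a one-scanline delay and a sum/difference; a sketch of the idea (this ignores PAL's line-to-line phase alternation, which proper PAL combs handle with a two-line delay, and that is likely part of why my results have been mixed):

#include <vector>

// Combine the current composite scanline with the previous one: luma mostly
// adds in the sum while chroma tends to cancel; the difference keeps the
// chroma and suppresses the luma.
void comb_separate(const std::vector<float>& cur_line,
                   const std::vector<float>& prev_line,
                   std::vector<float>& luma, std::vector<float>& chroma) {
  for (size_t i = 0; i < cur_line.size(); i++) {
    luma[i]   = 0.5f * (cur_line[i] + prev_line[i]);
    chroma[i] = 0.5f * (cur_line[i] - prev_line[i]);
  }
}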

There's no source code release for now (Why? FAQ), but if you want some real-time coverage of this project, I did a multi-threaded tweetstorm: one, two, three.

Beeps and melodies in two-way radio

Lately my listening activities have focused on two-way FM radio. I'm interested in automatic monitoring and visualization of multiple channels simultaneously, and classifying transmitters. There's a lot of in-band signaling to be decoded! This post shall demonstrate this diversity and also explain how my listening station works.

Background: walkie-talkies are fun

The frequency band I've recently been listening to the most is called PMR446. It's a European band of radio frequencies for short-distance UHF walkie-talkies. Unlike ham radio, it doesn't require licenses or technical competence – anyone with 50€ to spare can get a pair of walkie-talkies at the department store. It's very similar to FRS in the US. It's quite popular where I live.

[Image: Photo of three different walkie-talkies.]

The short-distance nature of PMR446 is what I find perhaps most fascinating: in normal conditions, everything you hear has been transmitted from a 2-kilometer (1.3-mile) radius. Transmitter power is limited to 500 mW and directional antennas are not allowed on the transmitter side. But I have a receive-only system, and my only directional antenna is for 450 MHz, which is how I originally found these channels.

Roger beep

The roger beep is a short melody sent by many hand-held radios to indicate the end of transmission.

The end of transmission must be indicated, because two-way radio is 'half-duplex', which means only one person can transmit at a time. Some voice protocols solve the same problem by mandating the use of a specific word like 'over'; others rely on the short burst of static (squelch tail) that can be heard right after the carrier is lost. Roger beeps are especially common in consumer radios, but I've heard them in ham QSOs as well, especially if repeaters are involved.

Other signaling on PMR

PMR also differs from ham radio in that many of its users don't want to hear random people talking on the same frequency; indeed, many devices employ tones or digital codes designed to silence unwanted conversations, called CTCSS, DCS, or coded squelch. They are very low-frequency tones that can't usually be heard at all because of filtering. These won't prevent others from listening to you though; anyone can just disable coded squelch on their device and hear everyone else on the channel.

Many devices also use a tone-based system for preventing the short burst of static, that classic walkie-talkie sound, from sounding whenever a transmission ends. Baofeng calls these squelch tail elimination tones, or STE for short. The practice is not standardized and I've seen several different sub-audible frequencies being used in the wild, namely 55, 62, and 260 Hz. (Edit: As correctly pointed out by several people, another way to do this is to reverse the phase of the CTCSS tone in the end, called a 'reverse burst'. Not all radios use it though; many opt to send a 55 Hz tone instead, even when they are using CTCSS.)

Some radios have a button called 'alarm' that sends a long, repeating melody resembling a 90s mobile phone ring tone. These melodies also vary from one radio to the other.

My receiver

I have a system in place to alert me whenever there's a strong enough signal matching an interesting set of parameters on any of the eight PMR channels. It's based on a Raspberry Pi 3B+ and an Airspy R2 SDR receiver. The program can play the live audio of all channels simultaneously, or one could be selected for listening. It also has an annotated waterfall view that shows traffic on the band during the last couple of hours:

[Image: A user interface with text-mode graphics, showing eight vertical lanes of timestamped information. The lanes are mostly empty, but there's an occasional colored bar with annotations like 'a1' or '62'.]

The computer is a headless Raspberry Pi with only SSH connectivity; that's why it's in text mode. Also, text-mode waterfall plots are cool!

The coloured bars indicate signal strength (colour) and the duty factor (pattern). The numbers around the bars are decoded squelch codes, STEs and roger beeps. Uncertain detections are greyed out. In this view we've detected roger beeps of type 'a1' and 'a2'; a somewhat rare 62 Hz STE tone; and a ring tone, or alarm (RNG).

Because squelch codes are designed to be read by electronic circuits and their frequencies and codewords are specified exactly, writing a digital decoder for them was somewhat straightforward. Roger beeps and ring tones, on the other hand, are only meant for the human listener and detecting them amongst the noise took a bit more trial-and-error.
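
My decoder isn't shown here, but detecting a single sub-audible tone like this is commonly done with a Goertzel filter per candidate frequency; a generic sketch:

#include <cmath>
#include <vector>

// Measure the power of one candidate CTCSS tone in a block of audio using
// the Goertzel algorithm; the strongest tone above a threshold wins.
float goertzel_power(const std::vector<float>& x, float tone_hz, float fs) {
  float w = 2.f * float(M_PI) * tone_hz / fs;
  float coeff = 2.f * std::cos(w);
  float s1 = 0.f, s2 = 0.f;
  for (float sample : x) {
    float s0 = sample + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}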

Melody detection algorithm

The melody detection algorithm in my receiver is based on a fast Fourier transform (FFT). When loss of carrier is detected, the last moments of the audio are searched for tones thusly:

[Image: A diagram illustrating how an FFT is used to search for a melody. The FFT in the image is noisy and some parts of the melody can not be measured.]
  1. The audio buffer is divided up into overlapping 60-millisecond Hann-windowed slices.
  2. Every slice is Fourier transformed and all peak frequencies (local maxima) are found. Their center frequencies are refined using Gaussian peak interpolation (Gasior & Gonzalez 2004); see the sketch after this list. We need this, because we're only going to allow ±15 Hz of frequency error.
  3. The time series formed by the strongest maxima is compared to a list of pre-defined 'tone signatures'. Each candidate tone signature gets a score based on how many FFT slices match (+) corresponding slices of the tone signature. Slices with too much frequency error subtract from the score (−).
  4. Most tone signatures have one or more 'quiet zones', the quietness of which further contributes to the score. This is usually placed after the tone, but some tones may also have a pause in the middle.
  5. The algorithm allows second and third harmonics (with half the score), because some transmitters may distort the tones enough for these to momentarily overpower the fundamental frequency.
  6. Every possible time shift (starting position) inside the 1.5-second audio buffer is searched.
  7. The tone signature with the best score is returned, if this score exceeds a set threshold.
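
The peak refinement in step 2 works on the logarithms of the three bin magnitudes around a local maximum; a minimal sketch:

#include <cmath>

// Gaussian peak interpolation (Gasior & Gonzalez 2004): fit a parabola to
// the log-magnitudes of bins k-1, k, k+1 and return the fractional offset
// of the true peak from bin k (roughly -0.5..+0.5 bins).
float peak_offset(float mag_prev, float mag_peak, float mag_next) {
  float a = std::log(mag_prev), b = std::log(mag_peak), c = std::log(mag_next);
  return 0.5f * (a - c) / (a - 2.f * b + c);
}

// refined_hz = (k + peak_offset(...)) * sample_rate / fft_size;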

This algorithm works quite well. It's not always able to detect the tones, especially if part of the melody is completely lost in noise, but it's good enough to be used for waterfall annotation. False positives are rare; most of them are detections of very short tone signatures that only consist of one or two beeps. My test dataset of 92 recorded transmissions yields only 5 false negatives and no false positives.

For example, this noisy recording:

was successfully recognized as having a ringtone (RNG), a roger beep of type a1, and CTCSS code XA:

Naming and classification

Because I love classifying stuff I've had to come up with a system for naming these roger tones as well. My current system uses a lower-case letter for classifying the tone into a category, followed by a number that differentiates similar but slightly different tones. This is a work in progress, because every now and then a new kind of tone appears.

My goal would be to map the melodies to specific manufacturers. I've only managed to map a few. Can you recognise any of these devices?

Class   Identified model      Recording
a       Cobra AM845           (a1)
c       Motorola TLKR T40     (c1)
d       ?
e       ?
h       Baofeng UV-5RC
i       ?

I didn't list them all here, but there are even more samples. I've added some alarm tones there as well, and a list of all the tone signatures that I currently know of. (Why no full source code? FAQ)

In my rx log I also have an emoji classification system for CTCSS codes. This way I can recognize a familiar transmission faster. A few examples below (there are 38 different CTCSS codes in total):

[Image: Two-character codes grouped into categories and paired with emoji. Four categories, namely fruit, sound, mammals, and scary. The fruit category has codes beginning with an M, and emoji for different fruit, etc.]

Future directions

There are mainly just minor bugs in my project trello at the moment, like adding the aforementioned emoji. But as the RasPi is not very powerful the DSP chain could be made more efficient. Sometimes a block of samples gets dropped. Currently it uses a bandpass-sampled filterbank to separate the channels, exploiting aliasing to avoid CPU-intensive frequency shifting altogether:

This is quite fast. But the 1:20 decimation from the Airspy IQ data is done with SoX's 1024-point FIR filter and could possibly be done with fewer coefficients. Also, the RasPi has four cores, so half of the channels could be demodulated in a second thread. Currently all concurrency is thanks to SoX and pmrsquash being different processes.


Animated line drawings with OpenCV

OpenCV is a pretty versatile C++ computer vision library. Because I use it every day it has also become my go-to tool for creating simple animations at pixel level, for fun, and saving them as video files. This is not one of its core functions but happens to be possible using its GUI drawing tools.

Below we'll take a look at some video art I wrote for a music project. It goes a bit further than just line drawings but the rest is pretty much just flavouring. As you'll see, creating images in OpenCV has a lot in common with how you would work with layers and filters in an image editor like GIMP or Photoshop.

Setting it up

It doesn't take a lot of boilerplate to initialize an OpenCV project. Here's my minimal CMakeLists.txt:

cmake_minimum_required (VERSION 2.8)
project                (marmalade)
find_package           (OpenCV REQUIRED)
add_executable         (marmalade marmalade.cc)
target_link_libraries  (marmalade ${OpenCV_LIBS})

I also like to set compiler flags to enforce the C++11 standard, but this is not necessary.

In the main .cc file I have:

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"

Now you can build the project by just typing cmake . && make in the terminal.

Basic shapes

First, we'll need an empty canvas. It will be a matrix (cv::Mat) with three unsigned char channels for RGB at Full HD resolution:

const cv::Size video_size(1920, 1080);
cv::Mat mat_frame = cv::Mat::zeros(video_size, CV_8UC3);

This will also initialize everything to zero, i.e. black.

Now we can draw our graphics!

I had an initial idea of an endless cascade of concentric rings each rotating at a different speed. There might be color and brightness variations as well but otherwise it would stay static the whole time. You can't see a circle's rotation around its center, so we'll add some features to them as well, maybe some kind of bars or spokes.

A simplified render method for a ring would look like this:

void Ring::RenderTo(cv::Mat& mat_output) const {
  cv::circle(mat_output, 8 * center_, 8 * radius_, color_, 1, CV_AA, 3);
  for (const Bar& bar : bars()) {
    cv::line(mat_output, 8 * (center_ + bar.start), 8 * (center_ + bar.end),
             color_, 1, CV_AA, 3);
  }
}

Drawing antialiased graphics at subpixel coordinates can make for some confusing OpenCV code. Here, all coordinates are multiplied by the magic number 8 and the drawing functions are instructed to do a bit shift of 3 bits (2^3 == 8). These three bits are used for the decimal part of the subpixel position.

The coordinates of the bars are generated for each frame based on the ring's current rotation angle.

Here are some rings at different phases of rotation. A bug leaves the innermost circle with no spokes, but it kind of looks better that way.

[Image: White concentric circles on a black background, with evenly separated lines connecting them.]

Eye candy: Glow effect

I wanted a subtle vector display look to the graphics, even though I wasn't aiming for any sort of realism with it. So the brightest parts of the image would have to glow a little, or spread out in space. This can be done using Gaussian blur.

Gaussian blur requires convolution, which is very CPU-intensive. I think most of the rendering time was spent calculating blur convolution. It could be sped up using threads (cv::parallel_for_) or the GPU (cv::cuda routines) but there was no real-time requirement in this hobby project.

There are a couple of ways to only apply the blur to the brightest pixels. We could blur a copy of the image masked with its thresholded version, for example. But I like to use look-up tables (LUT). This is similar to the curves tool in Photoshop. A look-up table is just a 256-by-1 RGB matrix that maps an 8-bit index to a colour. In this look-up table I just have a linear ramp where everything under 127 maps to black.
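
The GlowLUT() helper isn't shown above; a sketch of what it might look like, with the exact ramp shape being my guess:

cv::Mat GlowLUT() {
  // 256-entry table: everything below 127 maps to black, the rest ramps
  // linearly back up to full brightness.
  cv::Mat lut(256, 1, CV_8UC3);
  for (int i = 0; i < 256; i++) {
    uchar v = (i < 127) ? 0 : cv::saturate_cast<uchar>((i - 127) * 255 / 128);
    lut.at<cv::Vec3b>(i, 0) = cv::Vec3b(v, v, v);
  }
  return lut;
}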

cv::Mat mat_lut = GlowLUT();
cv::Mat mat_glow;
cv::LUT(mat_frame, mat_lut, mat_glow);

Now when blurring, if we add the original image on top of the blurred version, its sharpness is preserved:

cv::GaussianBlur(mat_glow, mat_glow, cv::Size(0,0), 3.0);
mat_frame += 2 * mat_glow;
[Image: A zoomed view of a circle, showing the glow effect.]

The effect works unevenly on antialiased lines which adds a nice pearl-stringy look.

Eye candy: Tinted glass and grid lines

I created a vignetted and dirty green-yellow tinted look by multiplying the image per-pixel by an overlay made in GIMP. This has the same effect as having a "Multiply" layer mode in an image editor. Perhaps I was thinking of an old glass display, or Vectrex overlays. The overlay also has black grid lines that will appear black in the result. Multiplication doesn't change the color of black areas in the original, but I also added a copy of the overlay at 10% brightness to make it dimly visible in the background.

cv::Mat mat_overlay = cv::imread("overlay.png");
  
cv::multiply(mat_frame, mat_overlay, mat_frame, 1.f/255);
mat_frame += mat_overlay * 0.1f;
[Image: A zoomed view of a circle, showing the color overlay effect.]

Eye candy: Flicker

Some objects flicker slightly for an artistic effect. This can be headache-inducing if overdone, so I tried to use it in moderation. The rings have a per-frame probability for a decrease in brightness, which I think looks good at 60 fps.

if (randf(0.f, 1.f) < .0001f)
  color *= .5f;

The spokes will also sometimes blink upon encountering each other, and the whole ring flickers a bit when it first becomes visible.

Title text

An LCD matrix font was used for the title text. This was just a PNG image of 128 characters that was spliced up and rearranged. This can be done in OpenCV by using submatrices and rectangle ROIs:

cv::Mat mat_font = cv::imread("lcd_font.png");
const cv::Size letter_size(24, 32);
const std::string text("finally, the end of the "
                       "marmalade forest!");

int cursor_x = 0;
for (char code : text) {
  int mx = code % 32;
  int my = code / 32;

  cv::Rect font_roi(cv::Point(mx * letter_size.width,
                              my * letter_size.height),
                    letter_size);
  cv::Mat mat_letter = mat_font(font_roi);

  cv::Rect target_roi(text_origin_.x + cursor_x,
                      text_origin_.y,
                      mat_letter.cols, mat_letter.rows);
  mat_letter.copyTo(mat_output(target_roi));

  cursor_x += letter_size.width;
}
[Image: A zoomed view of the text 'finally' with a glow and color overlay effect.]

Encoding the video

Now we can save the frames as a video file. OpenCV has a VideoWriter class for just this purpose. But I like to do this a bit differently. I encoded the frame images individually as BMP and just concatenated them one after the other to stdout:

std::vector<uchar> outbuf;
cv::imencode(".bmp", mat_frame, outbuf);
fwrite(outbuf.data(), sizeof(uchar), outbuf.size(), stdout);

I then ran this program from a shell script that piped the output to ffmpeg for encoding. This way I could also combine it with the soundtrack in a single run.

make && \
 ./marmalade -p | \
 ffmpeg -y -i $AUDIOFILE -framerate $FPS -f image2pipe \
        -vcodec bmp -i - -s:v $VIDEOSIZE -c:v libx264 \
        -profile:v high -b:a 192k -crf 23 \
        -pix_fmt yuv420p -r $FPS -shortest -strict -2 \
        video.mp4 && \
 open video.mp4

Result

The 1080p/60 version can be viewed by clicking on the gear wheel menu.