
Capturing PAL video with an SDR (and a few dead-ends)

I play 1980s games, mostly Super Mario Bros., on the Nintendo NES console. It would be great to be able to capture live video from the console for recording speedrun attempts. Now, how to make the 1985 NES and the 2013 MacBook play together, preferably using hardware that I already have? This project diary documents my search for the answer.

Here's a spoiler – it did work:

[Image: A powered-on NES console and a MacBook on top of it, showing a Tetris title screen.]

Things that I tried first

A capture device

Video capture devices, or capture cards, are devices specially made for this purpose. There was only one cheap (~30€) capture device for composite video available locally, and I bought it, hopefully. But it wasn't readily recognized as a video device on the Mac, and there seemed to be no Mac drivers available. Having already almost capped my budget for this project, I then ordered a 5€ EasyCap device from eBay, as there was some evidence of Mac drivers online. The EasyCap was still making its way to Finland as of this writing, so I continued to pursue other routes.

PS: When the device finally arrived, it sadly seemed that the EasyCapViewer-Fushicai software only supports opening this device in NTSC mode. There's PAL support in later commits in the GitHub repo, but the project is old and can't be compiled anymore as Apple has deprecated QuickTime.

Even when they do work, a downside to many cheap capture devices is that they can only capture at half the true framerate (that is, at 25 or 30 fps).

CRT TV + DSLR camera

The cathode-ray tube television that I use for gaming could be filmed with a digital camera. This posed interesting problems: The camera must be timed appropriately so that a full scan is captured in every frame, to prevent temporal aliasing (stripes). This is why I used a DSLR camera with a full manual mode (Canon EOS 550D in this case).

For the 50 Hz PAL television screen I used a camera frame rate of 25 fps and an exposure time of 1/50 seconds (set by camera limitations). The camera will miss every other frame of the original 50 fps video, but on the other hand, will get an evenly lit screen every time.

A Moiré pattern will also appear if the camera is focused on the CRT shadow mask. This is due to interference between two regular 2D arrays: the shadow mask in the TV and the CCD array in the camera. I got rid of this by setting the camera on manual focus and defocusing the lens just a bit.

[Image: A screen showing Super Mario Bros., and a smaller picture with Oona in it.]

This produced surprisingly good quality video, save for the slight jerkiness caused by the low frame rate (video). This setup was good for one-off videos; however, I could not use it for live streaming, because the camera could only record on the SD card and not connect to the computer directly.

LCD TV + webcam

An old LCD TV that I have has significantly less flicker than the CRT, and I could get live video via a webcam. But the Microsoft LifeCam HD-3000 that I have has only a binary option for manual exposure (pretty much "none" and "lots"). At the higher setting the video was quite washed out, with lots of motion blur. The lower setting was so fast that the LCD appeared to have visible vertical scanning. Brightness was also heavily dependent on viewing angle, which caused gradients over the image. I had to film at a slightly elevated angle so that the upper part of the image wouldn't go too dark, and this made the video look like a bootleg movie copy.

[Image: A somewhat blurry photo of an LCD TV showing Super Mario Bros.]

Composite video

Now to capturing the actual video signal. The NES has two analog video outputs: one is composite video and the other an RF modulator, which has the same composite video signal modulated onto an AM carrier in the VHF television band plus a separate FM audio carrier. This is meant for televisions with no composite video input: the TV sees the NES as an analog TV station and can tune to it.

In composite video, information about brightness, colour, and synchronisation is encoded in the signal's instantaneous voltage. The bandwidth of this signal is at least 5 MHz, or 10 MHz when RF modulated, which would require a 10 MHz IQ sampling rate.

[Image: Oscillogram of one PAL scanline, showing hsync, colour burst, and YUV parts.]

I happen to have an Airspy R2 SDR receiver that can listen to VHF and take 10 million samples per second – could it be possible? I made a cable that takes the signal from the NES RCA connector to the Airspy SMA connector. And sure enough, when the NES RF channel selector is at position "3", a strong signal indeed appears on VHF television channel 3, at around 55 MHz.

Software choices

There's already an analog TV demodulator for SDRs - it's a plugin for SDR# called TVSharp. But SDR# is a Windows program and TVSharp doesn't seem to support colour. And it seemed like an interesting challenge to write a real-time PAL demodulator myself anyway.

I had been playing with analog video demodulation recently because of my HDMI Tempest project (video). I had already written a C++ program that interprets a 10 Msps digitised signal as greyscale values and sync pulses and shows it live on the screen. Perhaps this could be used as a basis to build on. (It was never published, but there is apparently a similar project written in Java, called TempestSDR.)

Data transfer from the SDR is done using airspy_rx from airspy-tools. This is piped to my program that reads the data into a buffer, 256 ksamples at a time.

Automatic gain control is an important part of demodulating an AM signal. I used liquid-dsp's AGC by feeding it the maximum amplitude over every scanline period; this roughly corresponds to sync level. This is suboptimal, but it works in our high-SNR case. AM demodulation was done using std::abs() on the complex-valued samples. The resulting real value had to be flipped from 1, because TV is transmitted "inverse AM" to save on the power bill. I then scaled the signal so that black level was close to 0, white level close to 1, and sync level below 0.
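As a sketch, the envelope detection and level scaling could look roughly like this. The reference levels and the function shape here are illustrative placeholders, not code or values from the actual program:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Envelope-detect an IQ buffer and scale it to video levels.
// black_level and white_level are carrier amplitudes measured at
// black and white picture content (hypothetical calibration values).
std::vector<float> demodulate_am(const std::vector<std::complex<float>>& iq,
                                 float black_level, float white_level) {
    std::vector<float> video(iq.size());
    for (std::size_t i = 0; i < iq.size(); i++) {
        // AM demodulation: magnitude of the complex sample.
        float env = std::abs(iq[i]);
        // "Inverse AM": sync tips have the *highest* carrier amplitude
        // and white the lowest, so flip the scale. This maps
        // black -> 0, white -> 1, and sync dips below 0.
        video[i] = (black_level - env) / (black_level - white_level);
    }
    return video;
}
```

With this mapping, anything dipping below zero can be treated as a candidate sync pulse by the rest of the chain.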

I use SDL2 to display the video and OpenCV for pixel addressing, scaling, cropping, and YUV-RGB conversions. OpenCV is an overkill dependency inherited from the Tempest project and SDL2 could probably do all of those things by itself. This remains TODO.

Removing the audio

The captured AM carrier seems otherwise clean, but there's an interfering peak on the lower sideband side at about –4.5 MHz. I originally saw it in the demodulated signal and thought it would be related to colour, as it's very close to the PAL chroma subcarrier frequency of 4.43361875 MHz. But when it started changing frequency in triangle-wave shapes, I realized it's the audio FM carrier. Indeed, when it is FM demodulated, beautiful NES music can be heard.

[Image: A spectrogram showing the AM carrier centered in zero, with the sidebands, chroma subcarriers and audio alias annotated.]

The audio carrier is actually outside this 10 MHz sampled bandwidth. But it's so close to the edge (and so powerful) that the Airspy's anti-alias filter cannot sufficiently attenuate it, and it becomes folded, i.e. aliased, onto our signal. This caused visible banding in the greyscale image, and some synchronization problems.

I removed the audio using a narrow FIR notch filter from the liquid-dsp library. Now, the picture quality is very much acceptable. Minor artifacts are visible in narrow vertical lines because of a pixel rounding choice I made, but they can be ignored.
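For illustration, here is one way such a notch can be designed from scratch: subtract a narrow band-pass centred on the interfering carrier from a unit impulse. This plain windowed-sinc version just shows the idea; the actual program uses liquid-dsp's implementation, and the parameters below are illustrative:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Design a linear-phase FIR notch at normalised frequency f0
// (cycles/sample) with approximate width bw. Kernel length is
// 2*half_len+1 taps.
std::vector<float> design_notch(float f0, float bw, unsigned int half_len) {
    const unsigned int N = 2 * half_len + 1;
    std::vector<float> lp(N);
    float sum = 0.0f;
    for (unsigned int n = 0; n < N; n++) {
        int k = (int)n - (int)half_len;
        // Windowed-sinc low-pass prototype, cutoff bw/2, Hamming window.
        float sinc = (k == 0) ? bw
                              : std::sin(2.0f * (float)M_PI * (bw / 2.0f) * k) /
                                    ((float)M_PI * k);
        float win = 0.54f - 0.46f * std::cos(2.0f * (float)M_PI * n / (N - 1));
        lp[n] = sinc * win;
        sum += lp[n];
    }
    // Normalise the low-pass to unity DC gain, modulate it up to f0 to
    // make a band-pass, and subtract from a unit impulse: everything
    // passes except a narrow notch at f0.
    std::vector<float> h(N);
    for (unsigned int n = 0; n < N; n++) {
        int k = (int)n - (int)half_len;
        h[n] = -2.0f * (lp[n] / sum) * std::cos(2.0f * (float)M_PI * f0 * k);
    }
    h[half_len] += 1.0f;
    return h;
}

// Magnitude response at normalised frequency f, for checking the design.
float gain_at(const std::vector<float>& h, float f) {
    std::complex<float> acc(0.0f, 0.0f);
    for (std::size_t n = 0; n < h.size(); n++)
        acc += h[n] * std::exp(std::complex<float>(
                          0.0f, -2.0f * (float)M_PI * f * (float)n));
    return std::abs(acc);
}
```

The trade-off discussed under "Performance considerations" is visible here: a deep, narrow notch needs a long kernel, and the convolution cost grows linearly with it.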

[Image: Black-and-white screen capture of NES Tetris being played.]

Decoding colour

PAL colour is a bit complicated. It was designed in the 1960s to be backwards compatible with black-and-white TV receivers. It uses the YUV colourspace, the Y or "luminance" channel being a black-and-white sum signal that already looks good by itself. Even if the whole composite signal is interpreted as Y, the artifacts caused by colour information are bearable. Y also has a lot more bandwidth, and hence resolution, than the U and V (chrominance) channels.

U and V are encoded in a chrominance subcarrier in a way that I still haven't quite grasped. The carrier is suppressed, but a burst of carrier is transmitted just before every scanline for reference (so-called colour burst).

Turns out that much of the chroma information can be recovered by band-pass filtering the chrominance signal, mixing it down to baseband using a PLL locked to the colour burst, rotating it by a magic number (chroma *= std::polar(1.f, deg2rad(170.f))), and plotting the real and imaginary parts of this complex number as the U and V colour channels. This is similar to how NTSC colour is demodulated.
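Pulling those steps together, the per-sample chroma demodulation might look like this sketch. The band-pass and low-pass filtering are omitted, and the function and parameter names are mine, not from the actual program; only the 170° constant comes from the text above:

```cpp
#include <cmath>
#include <complex>
#include <utility>

static inline float deg2rad(float d) { return d * (float)M_PI / 180.0f; }

// One band-pass-filtered chroma sample in, baseband (U, V) out.
// carrier_phase is the subcarrier phase tracked by a PLL locked to the
// colour burst of the current scanline.
std::complex<float> chroma_to_uv(float chroma_sample, float carrier_phase) {
    // Mix to baseband: multiply by the conjugate subcarrier.
    std::complex<float> baseband =
        chroma_sample * std::exp(std::complex<float>(0.0f, -carrier_phase));
    // Rotate by the magic constant; real part -> U, imaginary part -> V.
    return baseband * std::polar(1.0f, deg2rad(170.0f));
}
```

In the real chain the mixer output would still be low-pass filtered to remove the double-frequency component before being used as colour.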

In PAL, every other scanline has its chrominance phase shifted (hence the name, Phase Alternating [by] Line). I couldn't get consistent results demodulating this, so I skipped the chrominance part of every other line and copied it from the line above. This doesn't even look too bad for my purposes. However, there seems to be a pre-echo in UV that's especially visible on a blue background (most of SMB1 sadly), and a faint stripe pattern on the Y channel, most probably crosstalk from the chroma subcarrier that I left intact for now.

[Image: The three chroma channels Y, U, and V shown separately as greyscale images, together with a coloured composite of Mario and two Goombas.]

I used liquid_firfilt to band-pass the chroma signal, and liquid_nco to lock onto the colour burst and shift the chroma to baseband.

Let's play Tetris!

Latency

It's not my goal to use this system as a gaming display; I'm still planning to use the CRT. However, total buffer delays are quite small due to the 10 Msps sampling rate, so the latency from controller to screen is pretty good. The laptop can also easily decode and render at 50 fps, which is the native frame rate of the PAL NES. Tetris is playable up to level 12!

Using a slow-mo phone camera, I measured the time it takes for a button press to make Mario jump. The latency is similar to that of a NES emulator:

Method                  Frames @240fps   Latency
RetroArch emulator      28               117 ms
PAL NES + Airspy SDR    26               108 ms
PAL NES + LCD TV        20               83 ms

You may notice that the CRT is not listed here. That's because, before I could make these measurements, a funny sound came from inside the TV and a picture has never appeared since.

Performance considerations

A 2013 MacBook Pro is perhaps not the best choice for dealing with live video to begin with. But I want to be able to run the PAL decoder and a screencap / compositing / streaming client on the same laptop, so performance is even more crucial.

When colour is enabled, CPU usage on this quad-core laptop is 110% for palview and 32% for airspy_rx. The CPU temperature is somewhere around 85 °C. Black-and-white decoding lowers palview usage to 84% and CPU temps to 80 °C. I don't think there are enough cycles left for a streaming client just yet. Some CPU headroom would be nice as well; a resync after dropped samples looks quite nasty, and I wouldn't want that to happen very often.

[Image: htop screenshot showing palview and airspy_rx on top, followed by some system processes.]

Profiling reveals that the most CPU-intensive tasks are those related to FIR filtering. FIR filters are based on convolution, which is of high computational complexity, unless done in hardware. FFT convolution can also be faster, but only when the kernel is relatively long.

[Image: Diagram showing that the audio notch FIR takes up 27% and the chroma bandpass FIR 12% of CPU time. Several smaller contributors are mentioned.]

I've thought of having another computer do the Airspy transfer, audio notch filtering, and AM demodulation, and then transmit this preprocessed signal to the laptop via Ethernet. But my other computers (Raspberry Pi 3B+ and a Core 2 Duo T7500 laptop) are not nearly as powerful as the MacBook.

Instead of a FIR bandpass filter, a so-called chrominance comb filter is often used to separate chrominance from luminance. This could be realized very efficiently as a linear-complexity delay line. This is a promising possibility, but so far my experiments have had mixed results.

There's no source code release for now (Why? FAQ), but if you want some real-time coverage of this project, I did a multi-threaded tweetstorm: one, two, three.

Descrambling split-band voice inversion with deinvert

Voice inversion is a primitive method of rendering speech unintelligible to prevent eavesdropping of radio or telephone calls. I wrote about some simple ways to reverse it in a previous post. I've since written a software tool, deinvert (on GitHub), that does all this for us. It can also descramble a slightly more advanced scrambling method called split-band inversion. Let's see how that happens behind the scenes.

Simple voice inversion

Voice inversion works by inverting the audio spectrum at a set maximum frequency called the inversion carrier. Frequencies near this carrier will thus become frequencies near zero Hz, and vice versa. The resulting audio is unintelligible, though familiar sentences can easily be recognized.

Deinvert comes with 8 preset carrier frequencies that can be activated with the -p option. These correspond to a list of carrier frequencies I found in an actual scrambler's manual, dubbed "the most commonly used inversion carriers".

The algorithm behind deinvert can be divided into three phases: 1) pre-filtering, 2) mixing, and 3) post-filtering. Mixing means multiplying the signal by an oscillation at the selected carrier frequency. This produces two sidebands, or mirrored copies of the signal, with the lower one frequency-inverted. Pre-filtering is necessary to prevent this lower sideband from aliasing when its highest components would go below zero Hertz. Post-filtering removes the upper sideband, leaving just the inverted audio. Both filters can be realized as low-pass FIR filters.
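The mixing phase alone, with both filters omitted (effectively what quality 0 does), can be sketched as follows. The function names are mine, not deinvert's:

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Phase 2 of the algorithm: multiply by an oscillation at the inversion
// carrier. This produces the two sidebands described above; pre- and
// post-filtering (phases 1 and 3) are left out of this sketch.
std::vector<float> invert_band(const std::vector<float>& in,
                               float carrier_hz, float sample_rate) {
    std::vector<float> out(in.size());
    for (std::size_t n = 0; n < in.size(); n++)
        out[n] = 2.0f * in[n] *
                 std::cos(2.0f * (float)M_PI * carrier_hz * (float)n / sample_rate);
    return out;
}

// Amplitude of a tone at freq_hz, handy for inspecting the result.
float tone_amplitude(const std::vector<float>& x, float freq_hz,
                     float sample_rate) {
    std::complex<float> acc(0.0f, 0.0f);
    for (std::size_t n = 0; n < x.size(); n++)
        acc += x[n] * std::exp(std::complex<float>(
                          0.0f, -2.0f * (float)M_PI * freq_hz * (float)n / sample_rate));
    return 2.0f * std::abs(acc) / (float)x.size();
}
```

A tone at f hertz comes out at (carrier − f) hertz in the lower sideband, which is exactly the spectral inversion described above; the upper sideband at (carrier + f) is what post-filtering removes.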

[Image: A spectrogram in four steps, where the signal is first cut at 3 kHz, then shifted up, producing two sidebands, the upper of which is then filtered out.]

This operation is its own inverse, like ROT13; by applying the same inversion again we get intelligible speech back. Indeed, deinvert can also be used as a scrambler by just running unscrambled audio through it. The same inversion carrier should be used in both directions.

Split-band inversion

The split-band scrambling method adds another carrier frequency that I call the split point. It divides the spectrum into two parts that are inverted separately and then combined, preventing ordinary inverters from fully descrambling it.

A single filter-inverter pair may already bring back the low end of the spectrum. Descrambling it fully amounts to running the inversion algorithm twice, with different settings for the filters and mixer, and adding the results together.

The problem is then to find these two frequencies. Let's take a look at an example of audio scrambled using the CML CMX264 split-band inverter (from a video by GBPPR2).

[Image: A spectrogram showing a narrow band of speech-like harmonics, but with a constant dip in the middle of the band.]

In this case the filter roll-off is clearly visible in the spectrogram and it's obvious where the split point is. The higher carrier is probably at the upper limit of the full band or slightly above it. Here the full bandwidth seems to be around 3200 Hz and the split point is at 1200 Hz. This could be initially descrambled using deinvert -f 3200 -s 1200; if the result sounds shifted up or down in frequency this could be refined accordingly.

Performance

On a single core of an i7-based laptop from 2013, deinvert processes a 44.1 kHz WAV file at 60x realtime speed (120x for simple inversion). Most of the CPU cycles are spent doing filter convolution, i.e. calculating the signal's vector dot product with the low-pass filter kernels:

[Image: A graph of the time spent in various parts of the call tree of the program, with the subtree leading to the dot product operation highlighted. It takes well over 80 % of the tree.]

For this reason deinvert has a quality setting (0 to 3) for controlling the number of samples in the convolution kernels. A filter with a shorter kernel is linearly faster to compute, but has a slower roll-off and will leave more unwanted harmonics.

A quality setting of 0 turns filtering off completely, and is very fast. For simple inversion this should be fine, as long as the original doesn't contain much power above the inversion carrier. It's easy to ignore the upper sideband because of its high frequency. In split-band descrambling this leaves some nasty folded harmonics in the speech band though.

Here's a descramble of the above CMX264 split-band audio using all the different quality settings in deinvert. You will first hear it scrambled, and then descrambled with increasing quality setting.

The default quality level is 2. This should be enough for real-time descrambling of simple inversion on a Raspberry Pi 1, still leaving cycles for an FM receiver for instance:

(RasPi 1)   Simple inversion   Split-band inversion
-q 0        16x realtime       5.8x realtime
-q 1        6.5x realtime      3.0x realtime
-q 2        2.8x realtime      1.3x realtime
-q 3        1.2x realtime      0.4x realtime

The memory footprint is less than four megabytes.

Future developments

There's a variant of split-band inversion where the inversion carrier changes constantly, called variable split-band. The transmitter informs the receiver about this sequence of frequencies via short bursts of data every couple of seconds or so. This data seems to be FSK, but it shall be left to another time.

I've also thought about ways to automatically estimate the inversion carrier frequency. Shifting speech up or down in frequency breaks the relationships of the harmonics. Perhaps this fact could be exploited to find a shift that would minimize this error?


Mapping microwave relay links from video

Radio networks are often at least partially based on microwave relay links. They're those little mushroom-like appendices growing out of cell towers and building-mounted base stations. Technically, they're carefully directed dish antennas linking such towers together over a line-of-sight connection. I'm collecting a little map of nearby link stations, trying to find out how they're interconnected and which network they belong to.

Circling around

We can find a rough direction for any link antenna by approximating a tangent for the dish shroud surface from position-stamped video footage taken while circling the tower. Optimally we would have a drone make a full circle around the tower at a constant distance and elevation to map all antennas at once; but if our DJI Phantom has run out of battery, a GPS-positioned still camera at ground level will also do.

[Image: Five photos of the same directional microwave antenna, taken from different angles, and edge-detection and elliptical Hough transform results from each one, with a large and small circle for all ellipses.]

The rest can be done manually, or using the Hough transform and centroid calculation from OpenCV. In these pictures, the ratio of the diameters of the concentric circles is a sinusoidal function of the angle between the antenna direction and the camera direction. At its maximum, we're looking straight at the beam. (The ratio won't max out at unity in this case, because we're looking at the antenna slightly from below.) We can select the frame with the maximum ratio from high-speed footage, or we can interpolate a smooth sinusoid to get an even better value.

[Image: Diagram showing how the ratio of the diameters of the large and small circle is proportional to the angle of the antenna in relation to the camera.]

This particular antenna is pointing west-northwest with an azimuth of 290°.

What about distance?

Because of the line-of-sight requirement, we also know the maximum possible distance to the linked tower, using the formula 7140 × √(4 / 3 × h) where h is the height of the antenna from ground. If the beam happens to hit a previously mapped tower closer than this distance, we can assume they're connected!
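As a quick sanity check of the formula, here's a sketch that assumes h is given in metres and the result comes out in metres; an antenna height of about 34 m is my assumption for illustration, chosen because it reproduces a distance of roughly 48 km:

```cpp
#include <cmath>

// Maximum line-of-sight distance for an antenna h metres above ground,
// per the rule of thumb in the text. The 4/3 factor is the standard
// effective-Earth-radius correction for atmospheric refraction.
double los_distance_m(double h_m) {
    return 7140.0 * std::sqrt(4.0 / 3.0 * h_m);
}
```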

This antenna is communicating to a tower not further away than 48 km. Judging from the building it's standing on, it belongs to a government trunked radio network.

The burger pager

A multinational burger chain has a restaurant nearby. One day I went there and ordered a take-away burger that was not readily available. (Exceptional circumstances; Ludum Dare was underway and I really needed fast food.) The clerk gave me a device that looked like a thick coaster, and told me I could fetch the burger from the counter when the coaster started blinking its lights and making noises.

Of course, this device deserved a second look! (We can forget about the burger now.) The device looked like a futuristic coaster with red LEDs all around it. I've blurred the text on top of it for dramatic effect.

[Image: Photo of a saucer-shaped device sitting on the edge of a table, with red LEDs all around it. It has a label sticker with a logo that is redacted from the photo.]

Several people in the restaurant were waiting for orders with their similar devices, which suggested to me this could be a pager system of some sort. Turning the receiver over, we see stickers with interesting information, including a UHF carrier frequency.

[Image: Photo of a smaller sticker on the bottom of the device, reading 'BASE ID:070 FRQ:450.2500MHZ'.]

For this kind of situation I often carry my RTL2832U-based television receiver dongle with me (the so-called rtl-sdr). Luckily this was one of those days! I couldn't help but tune in to 450.2500 MHz and see what was going on there.

[Image: Photo of a very small radio hotglued inside a Hello Kitty branded tin.]

And indeed, just before a pager went off somewhere, this burst could be heard on the frequency (FM demodulated audio):

Googling the device's model number, I found out it's using POCSAG, a common asynchronous pager protocol at 2400 bits per second. The modulation is binary FSK, which means we should be able to directly read the data from the above demodulated waveform, even by visual inspection if necessary. And here's the data:

...
10101010101010101010101010101010  preamble for bit sync
10101010101010101010101010101010
10101010101010101010101010101010
01111100110100100001010111011000  sync codeword
00101010101110101011001001001000  pager address
01111010100010011100000110010111  idle
01111010100010011100000110010111
01111010100010011100000110010111
01111010100010011100000110010111
...

There's no actual message being paged, just an 18-bit device address. It's preceded by a preamble of alternating 1's and 0's that the receiver can grab onto, and a fixed synchronization codeword that tells where the data begins. It's followed by numerous repetitions of the idle codeword.
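As an illustration, frame alignment on such a bitstream amounts to sliding a 32-bit window over the demodulated bits until it matches the POCSAG sync codeword, 0x7CD215D8 (the line labelled "sync codeword" above). A minimal sketch:

```cpp
#include <cstdint>
#include <vector>

// The 32-bit POCSAG frame synchronization codeword; the idle codeword
// seen above is 0x7A89C197.
const uint32_t POCSAG_SYNC = 0x7CD215D8u;

// Slide a 32-bit window over the bit sequence; return the index of the
// first bit *after* the sync codeword, or -1 if it is never seen.
long find_sync(const std::vector<int>& bits) {
    uint32_t window = 0;
    for (std::size_t i = 0; i < bits.size(); i++) {
        window = (window << 1) | (uint32_t)(bits[i] & 1);
        if (i >= 31 && window == POCSAG_SYNC)
            return (long)(i + 1);
    }
    return -1;
}
```

Everything after that point falls into 32-bit codewords: addresses, messages, and idle fillers.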

The next question is inevitable. How much havoc would ensue if someone were to loop through all 262,144 possible addresses and send a message like this? I'll leave it as hypothetical.

Squelch it out

Many air traffic and maritime radio channels, like 2182 kHz and VHF channel 16, are being monitored and continuously recorded by numerous vessels and ground stations. Even though gigabytes are now cheap, long recordings would have to be compressed to make them easier to move around.

There are some reasons not to use lossy compression schemes like MP3 or Vorbis. Firstly, radio recordings are used in accident investigations, and additional artifacts at critical moments could hinder investigation. Secondly, lossy compression performs poorly on noisy signals, because of the amount of entropy present. Let's examine some properties of such recordings that may aid in their lossless compression instead.

A common property of radio conversations is that the transmissions are short, and they're followed by sometimes very long pauses. Receivers often silence themselves in the absence of a carrier (known as 'squelching'), so there's nothing interesting to record during that time.

This can be exploited by replacing such quiet passages with absolute digital silence, that is a stream of zero-valued samples. Now, using lossless FLAC, passages of digital silence compress into a very small space because of run-length encoding; the encoder simply has to save the number of zero samples encountered.
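The core of the idea fits in a few lines. This is a simplified sketch; the actual squelch tool has more logic, such as a hold-off time and the threshold/duration options shown in the pipeline below:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// If the peak amplitude within a block of 16-bit samples stays below
// the threshold, replace the whole block with true digital zeros so
// that FLAC's run-length coding can pack it into almost nothing.
void squelch_block(std::vector<int16_t>& block, int16_t threshold) {
    int peak = 0;
    for (int16_t s : block) {
        int a = std::abs((int)s);
        if (a > peak) peak = a;
    }
    if (peak < (int)threshold)
        std::fill(block.begin(), block.end(), (int16_t)0);
}
```

Note that the block must become *exactly* zero: low-level noise, however quiet, defeats the run-length encoder.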

Update 11/2015: It seems many versions of rtl_fm now do this automatically, i.e. they don't output anything when squelched unless instructed to do so.

Real-life test

Let's see how the method performs on an actual recording. A sample WAV file will be recorded from the radio, from a conversation on VHF FM. The SDR does the actual squelching, and then the PCM waveform will be piped through squelch, a silencer tool I wrote.

rtl_fm    -f 156.3M -p 96 -N -s 16k -g 49.2 -l 200 |\
  sox     -r 16k -t raw -e signed -b 16 -c 1 -V1 - \
          -r 11025 -t raw - sinc 200-4500 |\
  squelch -l 5 -d 4096 -t 64 |\
  sox     -t raw -e signed -c 1 -b 16 -r 11025 - recording.wav

The waveform is sinc filtered so as to remove the DC offset resulting from FM demodulation without a PLL. This makes silence detection simpler. It also removes any noise outside the actual bandwidth of VHF voice transceivers. (The output of the first SoX at quiet passages is not digital silence; the resample to the common rate of 11,025 Hz introduces some LSB noise. This is what we're squelching in this example.)

30 minutes of audio was recorded. A carrier was on less than half of the time. FLAC compresses the file at a ratio of 0.174; this results in an average bitrate of 30 kbps, which is pretty good for lossless 16-bit audio. At this rate, a CD can fit almost 50 hours of recording.

Update 1/2019: A similar effect can also be achieved using the command-line tool csdr. It supports different sample formats and even a dynamically controllable squelch level.

Eavesdropping on a wireless keyboard

Some time ago, I needed to find a new wireless keyboard. With the level of digital paranoia that I have, my main priority was security. But is eavesdropping a justifiable concern? How insecure would it actually be to type your passwords using an older type of wireless keyboard?

To investigate this, I bought an old Logitech iTouch PS/2 cordless keyboard at an online auction. It's dated July 2000. Back in those days, wireless desktops used the 27 MHz shortwave band; later they've largely moved to 2.4 GHz. This one happens to be of the shortwave type. They've been tapped before (pdf), but no proof-of-concept was published.

I actually disposed of the keyboard before I could photograph it, so here's a newer Logitech S510 from 2005, still using the same technology:

[Image: Photo of a black cordless QWERTY keyboard with a Logitech logo.]

Compared to modern USB wireless dongles, the receiver of the iTouch is huge. It isn't a one chip wonder either, and contains a PCB with multiple crystal oscillators and decoder ICs. Based on Google results, one of the Motorola chips is an FM receiver, which gives us a hint about the mode of transmission.

[Image: On the left, a gray box with the Logitech logo, a button labeled 'CONNECT', and a wire coming out of it, ending in two PS/2 connectors. On the right, the box opened and its PCB revealed. On the PCB there are three metal-colored crystals (16.4200 0014TC, 10.1700 0011TC3, 10.2700 HELE), two bulky components labeled LT455EW, and three microchips with Motorola logos. An array of wires goes out.]

But because eavesdropping is our goal here, I'm tossing the receiver. After all, the signal is well within the 11-meter band of any home receiver with an SW band. For bandwidth reasons, however, I'll use my RTL2838-based television receiver dongle, which can be tuned to an arbitrary frequency and commanded to just dump the I/Q sample stream (using rtl-sdr).

The transmission is clearly visible at 27.14 MHz. Zooming closer and taking a spectrogram, the binary FM/FSK nature of the transmission becomes obvious:

[Image: Spectrogram showing a constant sinusoid and, below it, a burst of data encoded in the frequency of another sinusoid.]

The sample length of one bit indicates a bitrate of 850 bps. A reference oscillator with a digital PLL can be easily implemented in software. I assumed there's a delta encoding on top of the FSK.

One keypress produces about 85 bits of data. The bit pattern seems to always correlate with the key being pressed, so there's no encryption at all. Pressing the reassociation button doesn't change things either. Without going too much into the details of the obscure protocol, I just mapped all keys to their bit patterns, like so:

w 111111101111011111101111101101011011100111111111001111111101111011111101111101101
e 111111101111011111101111110101011011100111111111001111111101111011111101111110101
1 111111101111011111101111110110110111001111111110011111111011110111111111110110110
2 111111101111011111101111110110111111100111111111001111111101111011111101111110110
3 111111101111011111101111111010111101001111111110011111111011110111111011111110101
7 111111101111011111101111111110110110110011111111100111111110111101111111011111111
8 111111101111011111101111101110111110011111111100111111110111101111110111110111011
9 111111101111011111101111101110110101100111111111001111111101111011111101111110110
0 111111101111011111101111110110111011001111111110011111111011110111111011111101110
u 111111101111011111101111111101011110011111111100111111110111101111110111111110101
i 111111101111011111101111111101010111001111111110011111111011110111111011111111010

The bitstrings correlate so strongly between keystrokes that we can calculate the Levenshtein distance of the received bitstring to all the mapped keys, find the smallest distance, and behold, we can receive text from the keyboard!
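The matching step can be sketched like this. It's a standard dynamic-programming Levenshtein distance plus a nearest-neighbour lookup; the toy key map in the usage is made up, not the real bit patterns:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Edit distance between two bitstrings, using two rolling rows of the
// classic dynamic-programming table.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); j++) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); i++) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); j++) {
            std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Pick the mapped key whose bit pattern is closest to the received one.
char nearest_key(const std::string& bits,
                 const std::vector<std::pair<char, std::string>>& keymap) {
    char best = '?';
    std::size_t best_d = (std::size_t)-1;
    for (const auto& kv : keymap) {
        std::size_t d = levenshtein(bits, kv.second);
        if (d < best_d) { best_d = d; best = kv.first; }
    }
    return best;
}
```

Because the distance tolerates a few flipped or slipped bits, the decoder keeps working even when the demodulator output is slightly off.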

$ rtl_sdr - -f 27132000 -g 32.8 -s 96000 |\
> sox -t .raw -c 2 -r 96000 -e unsigned -b 8 - -t .raw -r 22050 - |\
> ./fm |perl deco.pl
Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 00000013

Using device 0: ezcap USB 2.0 DVB-T/DAB/FM dongle
Found Rafael Micro R820T tuner
Tuned to 27132000 Hz.
Tuner gain set to 32.800000 dB.
Reading samples in async mode...
[CAPS]0wned
█

So, when buying a keyboard for personal use, I chose one with 128-bit AES air interface encryption.

Update: This post was mostly about accomplishing this with low-level stuff readily available at home. For anyone needing a proof of concept or even decoder hardware, there's KeyKeriki.

Update #2: Due to requests, my code is here: fm.c, deco.pl. Of course it's for reference only, as it's not a working piece of software, and never will be. Oh and it's not pretty either.

SSTV reception on Linux

As a programmer writing obscure little FLOSS projects for fun, having someone actually read and fix my code is a compliment. Recently I got a pull request on slowrx, my somewhat stalled slow-scan television demodulator project, and was inspired to write more. (Note: slowrx is not maintained any more – I suggest you check out QSSTV!)

The project started in 2007 when I was trying to find an SSTV receiver for Linux that could run in "webcam mode" and just save everything it receives. It would use modern stuff like ALSA instead of OSS and Gtk+2/3 instead of 1. Preferably it would work even if the receiver wasn't exactly tuned to the zero-beat frequency. It turned out existing software couldn't do this, so I accepted the challenge.

So here's how slowrx looks today:

[Image: slowrx screenshot.]

FM demodulation is done by a 1024-point Fast Fourier Transform and Gaussian spectrum interpolation. The windowing function adapts to SNR, so the resolution drops under noisy conditions but we get a clearer image (or the user may opt to use a narrow window regardless of noise). Sound card clock mismatch is calculated using a linear Hough transform on the sync channel and fixed by re-running the demodulation at the adjusted rate. The program will detect VIS headers at any in-band frequency and shifts the reception window accordingly. It can decode the FSK callsign after the picture. The program stays up for hours without a segfault. Yeah – it's written in C99!
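The fractional-bin frequency estimate from Gaussian spectrum interpolation can be sketched as follows. This is the textbook three-point formula (exact for a Gaussian-shaped peak), not necessarily slowrx's exact implementation:

```cpp
#include <cmath>

// Given the magnitudes of the strongest FFT bin and its two neighbours,
// return the fractional offset of the true peak from the centre bin,
// in the range [-0.5, 0.5]. Fitting a parabola to the log-magnitudes is
// exact when the spectral peak has a Gaussian shape.
float gaussian_interp(float mag_prev, float mag_peak, float mag_next) {
    float lp = std::log(mag_prev);
    float l0 = std::log(mag_peak);
    float ln = std::log(mag_next);
    return (ln - lp) / (2.0f * (2.0f * l0 - lp - ln));
}
```

The estimated frequency is then (peak_bin + offset) × bin_width, giving far finer resolution than the raw 1024-point FFT grid.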

Pictures I've received using slowrx:

[Image: Pictures received with slowrx.]

I've noticed the 8- and 12-second black-and-white Robot modes are still in use today. I believe they were the "original" SSTV modes that used to be viewed on a green phosphor screen in a much more analogue environment. So I've been thinking about an optional phosphor filter for B/W modes:

[Image: Phosphor filter mock-up.]