Showing posts with label software. Show all posts
Showing posts with label software. Show all posts

Descrambling split-band voice inversion with deinvert

Voice inversion is a primitive method of rendering speech unintelligible to prevent eavesdropping of radio or telephone calls. I wrote about some simple ways to reverse it in a previous post. I've since written a software tool, deinvert (on GitHub), that does all this for us. It can also descramble a slightly more advanced scrambling method called split-band inversion. Let's see how that happens behind the scenes.

Simple voice inversion

Voice inversion works by inverting the audio spectrum at a set maximum frequency called the inversion carrier. Frequencies near this carrier will thus become frequencies near zero Hz, and vice versa. The resulting audio is unintelligible, though familiar sentences can easily be recognized.

Deinvert comes with 8 preset carrier frequencies that can be activated with the -p option. These correspond to a list of carrier frequencies I found in an actual scrambler's manual, dubbed "the most commonly used inversion carriers".

The algorithm behind deinvert can be divided into three phases: 1) pre-filtering, 2) mixing, and 3) post-filtering. Mixing means multiplying the signal by an oscillation at the selected carrier frequency. This produces two sidebands, or mirrored copies of the signal, with the lower one frequency-inverted. Pre-filtering is necessary to prevent this lower sideband from aliasing when its highest components would go below zero Hertz. Post-filtering removes the upper sideband, leaving just the inverted audio. Both filters can be realized as low-pass FIR filters.

[Image: A spectrogram in four steps, where the signal is first cut at 3 kHz, then shifted up, producing two sidebands, the upper of which is then filtered out.]

This operation is its own inverse, like ROT13; by applying the same inversion again we get intelligible speech back. Indeed, deinvert can also be used as a scrambler by just running unscrambled audio through it. The same inversion carrier should be used in both directions.

Split-band inversion

The split-band scrambling method adds another carrier frequency that I call the split point. It divides the spectrum into two parts that are inverted separately and then combined, preventing ordinary inverters from fully descrambling it.

A single filter-inverter pair may already bring back the low end of the spectrum. Descrambling it fully amounts to running the inversion algorithm twice, with different settings for the filters and mixer, and adding the results together.

The problem here is to find these two frequencies. But let's take a look at an example from audio scrambled using the CML CMX264 split-band inverter (from a video by GBPPR2).

[Image: A spectrogram showing a narrow band of speech-like harmonics, but with a constant dip in the middle of the band.]

In this case the filter roll-off is clearly visible in the spectrogram and it's obvious where the split point is. The higher carrier is probably at the upper limit of the full band or slightly above it. Here the full bandwidth seems to be around 3200 Hz and the split point is at 1200 Hz. This could be initially descrambled using deinvert -f 3200 -s 1200; if the result sounds shifted up or down in frequency this could be refined accordingly.

Performance

On a single core of an i7-based laptop from 2013, deinvert processes a 44.1 kHz WAV file at 60x realtime speed (120x for simple inversion). Most of the CPU cycles are spent doing filter convolution, i.e. calculating the signal's vector dot product with the low-pass filter kernels:

[Image: A graph of the time spent in various parts of the call tree of the program, with the subtree leading to the dot product operation highlighted. It takes well over 80 % of the tree.]

For this reason deinvert has a quality setting (0 to 3) for controlling the number of samples in the convolution kernels. A filter with a shorter kernel is linearly faster to compute, but has a low roll-off and will leave more unwanted harmonics.

A quality setting of 0 turns filtering off completely, and is very fast. For simple inversion this should be fine, as long as the original doesn't contain much power above the inversion carrier. It's easy to ignore the upper sideband because of its high frequency. In split-band descrambling this leaves some nasty folded harmonics in the speech band though.

Here's a descramble of the above CMX264 split-band audio using all the different quality settings in deinvert. You will first hear it scrambled, and then descrambled with increasing quality setting.

The default quality level is 2. This should be enough for real-time descrambling of simple inversion on a Raspberry Pi 1, still leaving cycles for an FM receiver for instance:

(RasPi 1)Simple inversionSplit-band inversion
-q 016x realtime5.8x realtime
-q 16.5x realtime3.0x realtime
-q 22.8x realtime1.3x realtime
-q 31.2x realtime0.4x realtime

The memory footprint is less than four megabytes.

Future developments

There's a variant of split-band inversion where the inversion carrier changes constantly, called variable split-band. The transmitter informs the receiver about this sequence of frequencies via short bursts of data every couple of seconds or so. This data seems to be FSK, but it shall be left to another time.

I've also thought about ways to automatically estimate the inversion carrier frequency. Shifting speech up or down in frequency breaks the relationships of the harmonics. Perhaps this fact could be exploited to find a shift that would minimize this error?

Links

Redsea 0.7, a lightweight RDS decoder

I've written about redsea, my RDS decoder project, many times before. It has changed a lot lately; it even has a version number, 0.7.6 as of this writing. What follows is a summary of its current state and possible future developments.

Input formats

Redsea can decode several types of data streams. The command-line switches to activate these can be found in the readme.

Its main use, perhaps, is to demodulate an FM multiplex carrier, as received using a cheap rtl-sdr radio dongle and demodulated using rtl_fm. The multiplex is an FM demodulated signal sampled at 171 kHz, a convenient multiple of the RDS data rate (1187.5 bps) and the subcarrier frequency (57 kHz). There's a convenience shell script that starts both redsea and the rtl_fm receiver. For example, ./rtl-rx.sh -f 88.0M would start reception on 88.0 MHz.

It can also decode an "ASCII binary" stream (--input-ascii):

0001100100111001000101110000101110011000010010110010011001000000100001
1010010000011010110100010000000100000001101110000100010111000010111001
1001000010110000111111011101101011001010101110100011111101000011100010
100000011010010001011100001

Or hex-encoded RDS groups one per line (--input-hex), which is the format used by RDS Spy:

6201 01D8 E704 594C
6201 01D9 2217 4520
6201 E1C1 594C 6202
6201 01DA 1139 594B
6201 21DC 2020 2020

Output formats

The default output has changed drastically. There used to be no strict format to it, rather it was just a human-readable terminal display. This sort of output format will probably return at some point, as an option. But currently redsea outputs line-delimited JSON, where every group is a JSON object on a separate line. It is quite verbose but machine readable and well-suited for post-processing:

{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"alt_freqs":[87.9,88.5,89.2,89.5,89.8,90.9,93.2],"ps":"YLE YK
SI"}
{"pi":"0x6201","group":"14A","tp":false,"prog_type":"Serious classical","other_
network":{"pi":"0x6205","tp":false,"has_linkage":false}}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YL      "}
{"pi":"0x6201","group":"2A","tp":false,"prog_type":"Serious classical","partial
_radiotext":"Yöklassinen."}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE     "}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE YK  "}
{"pi":"0x6201","group":"2A","tp":false,"prog_type":"Serious classical","partial
_radiotext":"Yöklassinen."}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"alt_freqs":[87.9,88.5,89.2,89.5,89.8,90.9,93.2],"ps":"YLE YK
SI"}

Someone on GitHub hinted about jq, a command-line tool that can color and filter JSON, among other things:

> ./rtl-rx.sh -f 87.9M | jq -c
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YL      "}
{"pi":"0x6201","group":"14A","tp":false,"prog_type":"Serious classical","other_
network":{"pi":"0x6202","tp":false}}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE     "}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE YK  "}
{"pi":"0x6201","group":"1A","tp":false,"prog_type":"Serious classical","prog_it
em_started":{"day":9,"time":"23:10"},"has_linkage":false}
^C

> ./rtl-rx.sh -f 87.9M | grep "\"radiotext\"" | jq ".radiotext"
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."

The output can be timestamped using the ts utility from moreutils.

Additionally, redsea can output hex-endoded groups, the same format mentioned above.

Fast and lightweight

I've made an effort to make redsea fast and lightweight, so that it could be run real-time on cheap single-board computers like the Raspberry Pi 1. I rewrote it in C++ and chose liquid-dsp as the DSP library, which seems to work very well for the purpose.

Redsea now uses around 40% CPU on the Pi 1. Enough cycles will be left for the FM receiver, rtl_fm, which has a similar CPU demand. On my laptop, redsea has negligible CPU usage (0.9% of a single core). Redsea only runs a single thread and takes up 1500 kilobytes of memory.

Sensitivity

I've gotten several reports that redsea requires a stronger signal than other RDS decoders. This has been improved in recent versions, but I think it still has problems with even many local stations.

Let's examine how a couple of test signals go through the demodulator in Subcarrier::​demodulateMoreBits() and list possible problems. The test signals shall be called the good one (blue) and the noisy one (magenta). They were recorded on different channels using different antenna setups. Here are their average demodulated power spectra:

[Image: Spectrum plots of the two signals superimposed.]

The noise floor around the RDS subcarrier is roughly 23 dB higher in the noisy signal. Redsea recovers 99.9 % of transmitted blocks from the good signal and 60.1 % from the noisy one.

Below, redsea locks onto our good-quality signal. Time is in seconds.

[Image: A graph of several signal properties against time.]

Out of the noisy signal, redsea could recover a majority of blocks as well, even though the PLL and constellations are all over the place:

[Image: A graph of several signal properties against time.]

1) PLL

There's some jitter in the 57 kHz PLL, especially pronounced when the signal is noisy. One would expect a PLL to slowly converge on a frequency, but instead it just fluctuates around it. The PLL is from the liquid-dsp library (internal PLL of the NCO object).

  • Is this an issue?
  • What could affect this? Loop filter bandwidth?
  • What about the gain, i.e. the multiplier applied to the phase error?

2) Symbol synchronizer

  • Is liquid's symbol synchronizer being used correctly?
  • What should be the correct values for bandwidth, delay, excess bandwidth factor?
  • Do we really need a separate PLL and symbol synchronizer? Couldn't they be combined somehow? Afterall, the PLL already gives us a multiple of the symbol speed (57,000 / 48 = 1187.5).

3) Pilot tone

The PLL could potentially be made to lock onto the pilot tone instead. It would yield a much higher SNR.

  • According to the specs, the RDS subcarrier is phase-locked to the pilot, but can we trust this? Also, the phase difference is not defined in the standard.
  • What about mono stations with no pilot tone?
  • Perhaps a command-line option?
See redsea wiki for discussion.

4) rtl_fm

  • Are the parameters for rtl_fm (gain, filter) optimal?
  • Is there a poor-quality resampling phase somewhere, such as the one mentioned in the rtl_fm guide? Probably not, since we don't specify -r
  • Is the bandwidth (171 kHz) right?

Other features (perhaps you can help!)

Besides the basic RDS features (program service name, radiotext, etc.) redsea can decode some Open Data applications as well. It receives traffic messages from the TMC service and prints them in English. These are partially encrypted in some areas. It can also decode RadioText+, a service used in some parts of Germany to transmit such information as artist/title tags, studio hotline numbers and web links.

If there's an interesting service in your area you'd like redsea to support, please tell me! I've heard eRT (Enhanced RadioText) being in use somewhere in the world, and RASANT is used to send DGPS corrections in Germany, but I haven't seen any good data on those.

A minute or two of example data would be helpful; you can get hex output by adding the -x switch to the redsea command in rtl-rx.sh.

Receiving RDS with the RTL-SDR

redsea is a command-line RDS decoder. I originally wrote it as a script to decode RDS from demultiplexed FM stereo sound. Later I've experimented with other ways to read the bits, and the latest addition is to support the RTL-SDR television receiver via the rtl_fm tool.

Redsea is on GitHub. It has minimal dependencies (perl core modules, C standard library, rtl-sdr command-line tools) and has been tested to work on OSX and Linux with good enough FM reception. All test results, ideas, and pull requests are welcome.

Update 12/2016: Redsea has seen a lot of development since this post was written; see Redsea 0.7, a lightweight RDS decoder.

What it says

The program prints out decoded RDS groups, one group per line. Each group will contain a PI code identifying the station plus varying other data, depending on the group type. The below picture explains the types of data you'll probably most often encounter.

[Image: Screenshot of textual output from redsea, with some parts explained.]

A more verbose output can be enabled with the -l option (it contains the same information though). The -t option prefixes all groups with an ISO timestamp.

How it works

The DSP side of my program, named rtl_redsea, is written in C99. It's a synchronous DBPSK receiver that first bandpass filters ① the multiplex signal. A PLL locks onto the 19 kHz stereo pilot tone; its third harmonic (57 kHz) is used to regenerate the RDS subcarrier. Dividing it by 16 also gives us the 1187.5 Hz clock frequency. Phase offsets of these derived signals are adjusted separately.

[Image: Oscillograms illustrating how the RDS subcarrier is gradually processed in redsea and finally reduced to a series of 1's and 0's.]

The local 57 kHz carrier is synchronized so that the constellation lines up on the real axis, so we can work on the real part only ②. Biphase symbols are multiplied by the square-wave clock and integrated ③ over a clock period, and then dumped into a delta decoder ④, which outputs the binary data as bit strings into stdout ⑤.

Signal quality is estimated a couple of times per second by counting the number of "suspicious" integrated biphase symbols, i.e. symbols with halves of opposite signs. The symbols are being sampled with a 180° phase shift as well, and we can switch to that stream if it seems to produce better results.

This low-throughput binary string data is then handled by redsea.pl via a pipe. Synchronization and error detection/correction happens there, as well as decoding. Group data is then displayed on the terminal, in semi-human-readable form.

Future

My ultimate goal is to have a tool useful for FM DX, i.e. pretty good noise resistance.

Squelch it out

Many air traffic and maritime radio channels, like 2182 kHz and VHF 16, are being monitored and continuously recorded by numerous vessels and ground stations. Even though gigabytes are now cheap, long recordings would have to be compressed to make them easier to move around.

There are some reasons not to use lossy compression schemes like MP3 or Vorbis. Firstly, radio recordings are used in accident investigations, and additional artifacts at critical moments could hinder investigation. Secondly, lossy compression performs poorly on noisy signals, because of the amount of entropy present. Let's examine some properties of such recordings that may aid in their lossless compression instead.

A common property of radio conversations is that the transmissions are short, and they're followed by sometimes very long pauses. Receivers often silence themselves in the absence of a carrier (known as 'squelching'), so there's nothing interesting to record during that time.

This can be exploited by replacing such quiet passages with absolute digital silence, that is a stream of zero-valued samples. Now, using lossless FLAC, passages of digital silence compress into a very small space because of run-length encoding; the encoder simply has to save the number of zero samples encountered.

Update 11/2015: It seems many versions of rtl_fm now do this automatically, i.e. they don't output anything when squelched unless instructed to do so.

Real-life test

Let's see how the method performs on an actual recording. A sample WAV file will be recorded from the radio, from a conversation on VHF FM. The SDR does the actual squelching, and then the PCM waveform will be piped through squelch, a silencer tool I wrote.

rtl_fm      -f 156.3M -p 96 -N -s 16k -g 49.2 -l 200 |\
  sox       -r 16k -t raw -e signed -b 16 -c 1 -V1 -\
               -r 11025 -t raw - sinc 200-4500 |\
  squelch   -l 5 -d 4096 -t 64 |\
  sox       -t raw -e signed -c 1 -b 16 -r 11025 - recording.wav

The waveform is sinc filtered so as to remove the DC offset resulting from FM demodulation without a PLL. This makes silence detection simpler. It also removes any noise outside the actual bandwidth of VHF voice transceivers. (The output of the first SoX at quiet passages is not digital silence; the resample to the common rate of 11,025 Hz introduces some LSB noise. This is what we're squelching in this example.)

30 minutes of audio was recorded. A carrier was on less than half of the time. FLAC compresses the file at a ratio of 0.174; this results in an average bitrate of 30 kbps, which is pretty good for lossless 16-bit audio. At this rate, a CD can fit almost 50 hours of recording.

Update 1/2019: A similar effect can also be achieved using the command-line tool csdr. It supports different sample formats and even a dynamically controllable squelch level.

SSTV reception on Linux

As a programmer writing obscure little FLOSS projects for fun, having someone actually read and fix my code is a compliment. Recently I got a pull request on slowrx, my somewhat stalled slow-scan television demodulator project, and was inspired to write more. (Note: slowrx is not maintained any more – I suggest you to check out QSSTV!)

The project started in 2007 when I was trying to find a SSTV receiver for Linux that could run in "webcam mode" and just save everything it receives. It would use modern stuff like ALSA instead of OSS and Gtk+2/3 instead of 1. Preferably it would work even if the receiver wasn't exactly tuned at the zero-beat frequency. Turned out existing software couldn't do this, so I accepted the challenge.

So here's how slowrx looks today:

slowrx screenshot

FM demodulation is done by a 1024-point Fast Fourier Transform and Gaussian spectrum interpolation. The windowing function adapts to SNR, so the resolution drops under noisy conditions but we get a clearer image (or the user may opt to use a narrow window regardless of noise). Sound card clock mismatch is calculated using a linear Hough transform on the sync channel and fixed by re-running the demodulation at the adjusted rate. The program will detect VIS headers at any in-band frequency and shifts the reception window accordingly. It can decode the FSK callsign after the picture. The program stays up for hours without a segfault. Yeah – it's written in C99!

Pictures I've received using slowrx:

Pictures received

I've noticed the 8- and 12-second black-and-white Robot modes are still in use today. I believe they were the "original" SSTV modes that used to be viewed on a green phosphor screen in a much more analogue environment. So I've been thinking about an optional phosphor filter for B/W modes:

Phosphor filter