Voice inversion is a primitive method of rendering speech unintelligible to prevent eavesdropping on radio or telephone calls. I wrote about some simple ways to reverse it in a previous post. I've since written a software tool, deinvert (on GitHub), that does all this for us. It can also descramble a slightly more advanced scrambling method called split-band inversion. Let's see how that happens behind the scenes.
Simple voice inversion
Voice inversion works by flipping the audio spectrum upside down below a set maximum frequency called the inversion carrier. Frequencies near this carrier thus become frequencies near zero Hz, and vice versa. The resulting audio is unintelligible, though familiar sentences can easily be recognized.
Deinvert comes with 8 preset carrier frequencies that can be activated with the -p option. These correspond to a list of carrier frequencies I found in an actual scrambler's manual, dubbed "the most commonly used inversion carriers".
The algorithm behind deinvert can be divided into three phases: 1) pre-filtering, 2) mixing, and 3) post-filtering. Mixing means multiplying the signal by an oscillation at the selected carrier frequency. This produces two sidebands, or mirrored copies of the signal, with the lower one frequency-inverted. Pre-filtering is necessary to prevent this lower sideband from aliasing: any input components above the carrier would otherwise end up below zero hertz and fold back into the audio band. Post-filtering removes the upper sideband, leaving just the inverted audio. Both filters can be realized as low-pass FIR filters.
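The three phases can be sketched in a few lines of Python. This is only a minimal illustration and not deinvert's actual code: the Hamming-windowed sinc kernel, the 101-tap length, the use of the carrier as the cutoff for both filters, the factor of 2 in the mixer and all function names are assumptions made for the example.

```python
import math

def lowpass_kernel(cutoff_hz, sample_rate, num_taps):
    """Hamming-windowed sinc low-pass FIR kernel, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, kernel):
    """Direct-form FIR filter; samples before the start are treated as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if i - k >= 0:
                acc += h * signal[i - k]
        out.append(acc)
    return out

def invert(signal, carrier_hz, sample_rate, num_taps=101):
    """Pre-filter, mix with the carrier, post-filter (assumed cutoff: the carrier)."""
    kernel = lowpass_kernel(carrier_hz, sample_rate, num_taps)
    prefiltered = fir_filter(signal, kernel)
    # Multiplying by 2*cos() keeps each sideband at the input amplitude.
    mixed = [2 * s * math.cos(2 * math.pi * carrier_hz * n / sample_rate)
             for n, s in enumerate(prefiltered)]
    return fir_filter(mixed, kernel)

def tone_level(signal, freq_hz, sample_rate):
    """Amplitude of one frequency component, found by correlating with a tone."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# Hypothetical test signal: a 500 Hz tone, inverted with a 3000 Hz carrier.
SAMPLE_RATE = 16000
tone = [math.cos(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(3200)]
scrambled = invert(tone, 3000, SAMPLE_RATE)
for freq in (500, 2500, 3500):
    print(freq, "Hz level:", round(tone_level(scrambled, freq, SAMPLE_RATE), 3))
```

With these assumed values, the tone should move from 500 Hz to 3000 - 500 = 2500 Hz, while the upper sideband at 3500 Hz is suppressed by the post-filter. The sample rate has to be high enough to hold the upper sideband before it is filtered out, which is why the example uses 16 kHz rather than 8 kHz.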
This operation is its own inverse, like ROT13; by applying the same inversion again we get intelligible speech back. Indeed, deinvert can also be used as a scrambler by just running unscrambled audio through it. The same inversion carrier should be used in both directions.
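In terms of frequencies alone, inversion maps a component at f to carrier - f, so doing it twice with the same carrier cancels out. The small sketch below shows this, and also what happens when the two carriers differ. The function names and the example frequencies are made up for illustration.

```python
def invert_frequency(freq_hz, carrier_hz):
    """Where a component ends up after inversion with the given carrier."""
    return carrier_hz - freq_hz

def round_trip(freq_hz, scramble_carrier_hz, descramble_carrier_hz):
    """Scramble with one carrier, then descramble with another."""
    scrambled = invert_frequency(freq_hz, scramble_carrier_hz)
    return invert_frequency(scrambled, descramble_carrier_hz)

print(round_trip(440.0, 3000.0, 3000.0))  # 440.0: same carrier restores the input
print(round_trip(440.0, 3000.0, 3100.0))  # 540.0: a 100 Hz mismatch shifts everything by 100 Hz
```

A carrier mismatch does not bring the scrambling back; it shifts the whole spectrum up or down by the difference between the two carriers.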
Split-band inversion
The split-band scrambling method adds another carrier frequency that I call the split point. It divides the spectrum into two parts that are inverted separately and then combined, preventing ordinary inverters from fully descrambling it.
A single filter-inverter pair may already bring back the low end of the spectrum. Descrambling it fully amounts to running the inversion algorithm twice, with different settings for the filters and mixer, and adding the results together.
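As a rough frequency-domain sketch of the idea, assume that each sub-band is simply mirrored within its own limits. This is an assumption for illustration, not a statement about any particular scrambler, and the names split_hz and band_top_hz are hypothetical.

```python
def split_band_invert(freq_hz, split_hz, band_top_hz):
    """Frequency mapping of an idealized split-band inverter.

    Assumes the low band (below split_hz) and the high band (split_hz up to
    band_top_hz) are each mirrored in place. A component exactly on a band
    edge is ambiguous and is not handled specially here.
    """
    if not 0 <= freq_hz <= band_top_hz:
        raise ValueError("frequency outside the scrambled band")
    if freq_hz < split_hz:
        return split_hz - freq_hz                # low band: carrier = split point
    return split_hz + band_top_hz - freq_hz      # high band: mirrored between split and top

def simple_invert(freq_hz, carrier_hz):
    """Ordinary single-carrier inversion."""
    return carrier_hz - freq_hz

SPLIT, TOP = 1200.0, 3200.0
for f in (300.0, 2000.0):
    scrambled = split_band_invert(f, SPLIT, TOP)
    print(f, "->", scrambled, "->", split_band_invert(scrambled, SPLIT, TOP))

# An ordinary inverter tuned to the split point restores the low band only.
print(simple_invert(split_band_invert(300.0, SPLIT, TOP), SPLIT))   # 300.0
print(simple_invert(split_band_invert(2000.0, SPLIT, TOP), SPLIT))  # -1200.0, not 2000.0
```

Under this assumption the mapping is again its own inverse, and a single inverter at the split point gets the low band right while leaving the high band in the wrong place.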
The problem here is to find these two frequencies. Let's take a look at an example of audio scrambled using the CML CMX264 split-band inverter (from a video by GBPPR2).
In this case the filter roll-off is clearly visible in the spectrogram and it's obvious where the split point is. The higher carrier is probably at the upper limit of the full band or slightly above it. Here the full bandwidth seems to be around 3200 Hz and the split point is at 1200 Hz. This could be initially descrambled using deinvert -f 3200 -s 1200; if the result sounds shifted up or down in frequency this could be refined accordingly.
Performance
On a single core of an i7-based laptop from 2013, deinvert processes a 44.1 kHz WAV file at 60x realtime speed (120x for simple inversion). Most of the CPU cycles are spent doing filter convolution, i.e. calculating the signal's vector dot product with the low-pass filter kernels.
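To make the cost concrete, here is what that dot product looks like for a single output sample, along with a rough count of multiplications per second of audio. The 101-tap kernel length is a hypothetical figure, not deinvert's actual setting, and the function names are made up for this sketch.

```python
def fir_output_sample(recent_samples, kernel):
    """One filtered output sample.

    recent_samples holds the newest len(kernel) input samples, newest first,
    so the result is the dot product of the kernel with the signal history.
    """
    return sum(h * x for h, x in zip(kernel, recent_samples))

def multiplies_per_second(sample_rate, num_taps, num_filters):
    """Multiplications needed per second of audio for direct-form FIR filtering."""
    return sample_rate * num_taps * num_filters

# A 4-tap moving average as a trivial kernel.
print(fir_output_sample([4.0, 8.0, 0.0, 4.0], [0.25] * 4))  # 4.0

# Simple inversion uses two filters (pre and post).
print(multiplies_per_second(44100, 101, 2))  # 8908200
```

If split-band descrambling runs the same algorithm twice, it needs twice as many filters, which would be consistent with it running at about half the speed of simple inversion.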
For this reason deinvert has a quality setting (0 to 3) for controlling the number of samples in the convolution kernels. A filter with a shorter kernel is proportionally faster to compute, but has a gentler roll-off and will leave more unwanted harmonics.
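The trade-off between kernel length and roll-off can be seen by comparing the gain of a short and a long low-pass kernel at a frequency somewhat above the cutoff. The kernel design (Hamming-windowed sinc), the tap counts of 15 and 255, and the 3000 Hz cutoff are assumptions chosen for this sketch; they are not the values behind deinvert's quality levels.

```python
import cmath
import math

def lowpass_kernel(cutoff_hz, sample_rate, num_taps):
    """Hamming-windowed sinc low-pass FIR kernel, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def gain_at(kernel, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(kernel)))

SAMPLE_RATE = 44100
CUTOFF_HZ = 3000
short_kernel = lowpass_kernel(CUTOFF_HZ, SAMPLE_RATE, 15)
long_kernel = lowpass_kernel(CUTOFF_HZ, SAMPLE_RATE, 255)

# Gain 2 kHz above the cutoff: the longer kernel should suppress it far better.
for name, kernel in (("short", short_kernel), ("long", long_kernel)):
    print(name, len(kernel), "taps, gain at 5000 Hz:",
          gain_at(kernel, 5000, SAMPLE_RATE))
```

The long kernel costs 17 times as many multiplications per sample as the short one, but the short one still lets through a noticeable part of a component that lies well above the cutoff.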
A quality setting of 0 turns filtering off completely, and is very fast. For simple inversion this should be fine, as long as the original doesn't contain much power above the inversion carrier. It's easy to ignore the upper sideband because of its high frequency. In split-band descrambling this leaves some nasty folded harmonics in the speech band though.
Here's a descramble of the above CMX264 split-band audio using all the different quality settings in deinvert. You will first hear it scrambled, and then descrambled with increasing quality setting.
The default quality level is 2. This should be enough for real-time descrambling of simple inversion on a Raspberry Pi 1, still leaving cycles for an FM receiver for instance:
| Quality setting (RasPi 1) | Simple inversion | Split-band inversion |
|---|---|---|
| -q 0 | 16x realtime | 5.8x realtime |
| -q 1 | 6.5x realtime | 3.0x realtime |
| -q 2 | 2.8x realtime | 1.3x realtime |
| -q 3 | 1.2x realtime | 0.4x realtime |
The memory footprint is less than four megabytes.
Future developments
There's a variant of split-band inversion where the inversion carrier changes constantly, called variable split-band. The transmitter informs the receiver about this sequence of frequencies via short bursts of data every couple of seconds or so. This data seems to be FSK, but it shall be left to another time.
I've also thought about ways to automatically estimate the inversion carrier frequency. Shifting speech up or down in frequency breaks the relationships of the harmonics. Perhaps this fact could be exploited to find a shift that would minimize this error?
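The effect on the harmonics is easy to demonstrate with a toy measure. The sketch below is not a carrier estimator; it only shows that a constant frequency shift breaks the integer ratios between partials. It assumes the lowest partial is the fundamental, and the function name and the 120 Hz example voice are made up.

```python
def inharmonicity(partials_hz):
    """Mean distance of each partial's ratio to the lowest partial from an integer.

    Assumes the lowest partial is the fundamental; 0 means perfectly harmonic.
    """
    base = min(partials_hz)
    ratios = [p / base for p in partials_hz]
    return sum(abs(r - round(r)) for r in ratios) / len(ratios)

# First five harmonics of a hypothetical 120 Hz voice, then shifted up by 50 Hz.
voiced = [120.0 * k for k in range(1, 6)]
shifted = [p + 50.0 for p in voiced]

print(inharmonicity(voiced))   # 0.0
print(inharmonicity(shifted))  # clearly above zero
```

An estimator along these lines would have to cope with ambiguity, since shifting by a whole multiple of the fundamental produces another harmonic-looking series.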
Links
- deinvert is on GitHub - please also see the wiki for detailed instructions on how to compile and use it.
Awesome post!
How did you get the performance graph?
Thanks!
Thanks! I used Xcode Instruments to get the CPU usage numbers, and wrote a Perl script to render that graph in SVG. It was partly ported from JavaScript in this StackOverflow reply, and inspired by sunburst partition in d3js. A little touch-up in Inkscape was also necessary.
Ah, ok, very cool! I thought it was some kind of analyzer's direct output.
It looks very pro.
Thanks for the details. Keep it up, ridin' the waves!
Sounds very similar to the radio chatter voices in the film THX 1138 - https://youtu.be/my2WzWKACcQ
Is there a more secure way to make the live voice anonymous?
I'd like to find a HW scrambler design to build with SMD parts and attach to the phone's mic, but it seems quite hard to do.
Variable split-band (VSB) should be harder to crack; probably not impossible though. But googling for "voice scrambler" seems to bring up some commercial solutions, both analog and digital.
All right! Googling a bit about VSB scrambling, I found the FX214/FX224, a good candidate to power a prototype.
Thanks for your answer!
I'm not so good in programming, but I want to descramble a WAV file that I already have. Could you tell how I can start with descrambling?
Hello, nice post, can this work with: CRY2001
https://www.sigidwiki.com/wiki/CRY2001_Voice_Scrambler
I've tested all the presets with the audio sample on the web, with no luck :(
Any help will be appreciated.
Hi, unfortunately deinvert won't be able to unscramble that one. It sounds like the inversion parameters are changing dynamically every 100 milliseconds or so. It would be interesting to study that scrambler though.
Just an unrelated question: what do you use to draw that signal processing diagram?
PS: awesome blog :) I just bookmarked it today.
Thanks! I draw them in Inkscape using a self-made symbol set, here: signalflow.svg
Hi im trying to descramble 420.2875 i get scramble voice on op25 but only Starrick on deinvert can anyone help me with this
Hello I'm totally noob can't understand how to run this script in Ubuntu how to correctly install it with DSP liquid and how to descramble a wave file I saw the commands but I like more details plz if possible with pictures
Hi windytan I really love your project and work if possible can u give us a detailed explanation of installation process in Ubuntu and how to test in wave files detailed commands BCS I don't know anything about Linux commands or programming much love for this project