
In pursuit of Otama's tone

It would be fun to use the Otamatone in a musical piece. But for someone used to keyboard instruments it's not so easy to play cleanly. It has a touch-sensitive (resistive) slider that spans roughly two octaves in just 14 centimeters, which makes it very sensitive to finger placement. And in any case, I'd just like to have a programmable virtual instrument that sounds like the Otamatone.

What options do we have, as hackers? Of course the slider could be replaced with a MIDI interface, so that we could use a piano keyboard to hit the correct frequencies. But what if we could synthesize a similar sound all in software?

Sampling via microphone

We'll have to take a look at the waveform first. The Otamatone has a piercing electronic-sounding tone to it. One is inclined to think the waveform is something quite simple, perhaps a sawtooth wave with some harmonic coloring. Such a primitive signal would be easy to synthesize.

[Image: A pink Otamatone in front of a microphone. Next to it a screenshot of Audacity with a periodic but complex waveform in it.]

A friend lent me her Otamatone for recording purposes. Turns out the wave is nothing that simple. It's not a sawtooth wave, nor a square wave, no matter how the microphone is placed. But it sounds like one! Why could that be?

I suspect this is because the combination of speaker and air interface filters out the lowest harmonics (and parts of the others as well) of square waves. But the human ear still recognizes the residual features of a more primitive kind of waveform.

We have to get to the source!

Sampling the input voltage to the Otamatone's speaker could reveal the original signal. Also, by recording both the speaker input and the sound captured by the microphone, we could perhaps devise a software filter to simulate the speaker and head resonance. Then our synthesizer would reduce to a simple generator and a filter. But this would require opening up the instrument and soldering a couple of leads in, to make a Line Out connector. I'm not doing this to my friend's Otamatone, so I bought one of my own. I named it TÄMÄ.

[Image: A Black Otamatone with a cable coming out of its mouth into a USB sound card. A waveform with more binary nature is displayed on a screen.]

I soldered the left channel and ground to the same pads the speaker is connected to. I had no idea about the voltage range in advance, but fortunately it just happens to fit line level and not destroy my sound card. As you can see in the background, we've recorded a signal that seems to be a square wave with a low duty cycle.

[Image: Oscillogram of a square wave.]

This square wave seems to be superimposed with a much quieter sinusoidal "ring" at 584 Hz that gradually fades out in 30 milliseconds.

Next we need to map out the effect the finger position on the slider has on this signal. It seems to change not only the frequency but also the duty cycle. This happens a bit differently depending on which of the three octave settings (LO, MID, or HI) is selected.

The Otamatone has a huge musical range of over 6 octaves:

[Image: Musical notation showing a range from A1 to B7.]

In frequency terms this means roughly 55 to 3950 Hz.

The duty cycle changes according to where we are on the slider: from 33 % in the lowest notes to 5 % in the highest ones, on every octave setting. The frequency of the ring doesn't change, it's always at around 580 Hz, but it doesn't seem to appear at all on the HI setting.

So I had my Perl-based software synth generate a square wave whose duty cycle and frequency change according to given MIDI notes.
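
To give an idea, here's a minimal sketch of such a generator; the linear duty-cycle interpolation between A1 and B7 is my own simplification, and the output is meant to be piped through something like sox -t raw -r 44100 -e unsigned -b 8 -c 1 - out.wav:

#!/usr/bin/perl
# Sketch: square wave with pitch-dependent duty cycle, as measured
# above; raw unsigned 8-bit samples on stdout. The linear duty-cycle
# mapping from 33 % (A1, MIDI 33) to 5 % (B7, MIDI 107) is assumed.
use strict; use warnings;
binmode STDOUT;

my $rate = 44100;

sub midi_to_freq { 440 * 2 ** (($_[0] - 69) / 12) }

sub play_note {
    my ($note, $seconds) = @_;
    my $freq  = midi_to_freq($note);
    my $duty  = 0.33 + (0.05 - 0.33) * ($note - 33) / (107 - 33);
    my $phase = 0;
    for (1 .. $rate * $seconds) {
        $phase -= 1 if ($phase += $freq / $rate) >= 1;
        print chr($phase < $duty ? 255 : 0);
    }
}

play_note(69, 1);   # A4 for one second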

FIR filter 1: not so good

Raw audio generated this way doesn't sound right; it needs to be filtered to simulate the effects of the little speaker and other parts.

Ideally, I'd like to simulate the speaker and head resonances as an impulse response, by feeding well-known impulses into the speaker. The generated square wave could then be convolved with this response. But I thought a simpler way would be to create a custom FIR frequency response in REAPER, by visually comparing the speaker input and microphone capture spectra. When their spectra are laid on top of each other, we can read the required frequency response as the difference between harmonic powers, using the cursor in baudline. No problem, it's just 70 harmonics until we're outside hearing range!

[Image: Screenshot of Baudline showing lots of frequency spikes, and next to it a CSV list of dozens of frequencies and power readings in the Vim editor.]

I then subtracted one spectrum from another and manually created a ReaFir filter based on the extrema of the resulting graph.

[Image: Screenshot of REAPER's FIR filter editor, showing a frequency response made out of nodes and lines interpolated between them.]

Because the Otamatone's mouth can be twisted to make slightly different vowels, I recorded two spectra: one with the mouth fully closed and the other one as open as possible.

But this method didn't quite give the sound the piercing nasalness I was hoping for.

FIR filter 2: better

After all that work I realized the line connection works in both directions! I can just feed any signal and the Otamatone will sound it via the speaker. So I generated a square wave in Audacity, set its frequency to 35 Hz to accommodate 30 milliseconds of response, played it via one sound card and recorded via another one:

[Image: Two waveforms, the top one of which is a square wave and the bottom one has a slowly decaying signal starting at every square transition.]

The waveform below is called the step response. One of the repetitions can readily be used as a FIR convolution kernel. Strictly, to get an impulse response would require us to sound a unit impulse, i.e. just a single sample at maximum amplitude, not a square wave. But I'm not redoing that since recording this was hard enough already. For instance, I had to turn off the fridge to minimize background noise. I forgot to turn it back on, and now I have a box of melted ice cream and a freezer that smells like salmon. The step response gives pretty good results.

One of my favorite audio tools, sox, can do FFT convolution with an impulse response. You'll have to save the impulse response as a whitespace-separated list of plaintext sample values, and then run sox original.wav convolved.wav fir response.csv.
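
Getting the response into that plaintext form takes a few lines of Perl; a sketch, assuming the response was first converted to headerless signed 16-bit mono (file names are made up):

#!/usr/bin/perl
# Sketch: dump a raw signed 16-bit little-endian mono file as a
# whitespace-separated list of normalized samples for sox's fir.
# Convert first with: sox response.wav -t raw -e signed -b 16 -c 1 response.raw
use strict; use warnings;

open my $fh, '<:raw', 'response.raw' or die $!;
my @samples = unpack 's<*', do { local $/; <$fh> };
print join(' ', map { $_ / 32768 } @samples), "\n";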

Or one could use a VST plugin like FogConvolver:

[Image: A screenshot of Fog Convolver.]

A little organic touch

There's more to an instrument's sound than its frequency spectrum. The way the note begins and ends, the so-called attack and release, are very important cues for the listener.

The width of a player's finger on the Otamatone causes the pressure to be distributed unevenly at first, resulting in a slight glide in frequency. This also happens at note-off. The exact glide depth in hertz depends on the octave; by experimentation I settled on a slide-up of 5 % of the target frequency over 0.1 seconds.
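
In the synth this is just a per-sample frequency ramp at note-on; a minimal sketch of the idea (the names and the linear shape are mine):

# Sketch: note-on glide, starting 5 % below the target frequency and
# reaching it linearly in 0.1 seconds. $t is time since note-on.
sub glide_freq {
    my ($target, $t) = @_;
    return $target if $t >= 0.1;
    return $target * (0.95 + 0.05 * $t / 0.1);
}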

It is also very difficult to hit the correct note, so we could add some kind of random tuning error. But it turns out this would be too much; I want the music to at least be in tune.

Glides (glissando) are possible with the virtual instrument by playing a note before releasing the previous one. This glissando also happens in 100 milliseconds. I think it sounds pretty good when used in moderation.

I read somewhere (Wikipedia?) that vibrato is also possible with the Otamatone. I didn't write a vibrato feature in the code itself, but it can be added using a VST plugin in REAPER (I use MVibrato from MAudioPlugins). I also added a slight flanger with an inter-channel phase difference in the sample below, to make the sound just a little bit easier on the ears (but not too much).

Sometimes the Otamatone makes a short popping sound, perhaps when finger pressure is not firm enough. I added a few of these randomly after note-off.

Working with MIDI

We're getting on a side track, but anyway. Working with MIDI used to be straightforward on the Mac. But GarageBand, the tool I currently use to write music, amazingly doesn't have a MIDI export function. However, you can "File -> Add Region To Loop Library", then find the AIFF file in the loop library folder, and use a tool called GB2MIDI to extract MIDI data from it.

I used mididump from python-midi to read MIDI files.

Tyna Wind - lucid future vector

Here's TÄMÄ's beautiful synthesized voice singing us a song.

Gramophone audio from photograph, revisited

"I am the atomic powered robot. Please give my best wishes to everybody!"

Those are the words uttered by Tommy, a childhood toy robot of mine. I've taken a look at his miniature vinyl record sound mechanism a few times before (#1, #2), in an attempt to recover the analog audio signal using only a digital camera. Results were noisy at best. The blog posts resurfaced in a recent IRC discussion which inspired me to try my luck with a slightly improved method.

Source photo

I will be using an old photo of Tommy's internal miniature record I already had from previous adventures in 2012. I don't want to perform another invasive operation on Tommy to take a new photograph, as I already broke a plastic tab last time I opened him. But it also means I don't have control over the photographing environment. It's part of the challenge.

The picture was taken with a DSLR and it's an uncompressed 8-bit color photo measuring 3000 by 3000 pixels. There's a fair amount of focus blur, chromatic aberration and similar distortions. But at this resolution, a clear pattern can be seen when zooming into the grooves.

[Image: Close-up shot of a miniature vinyl record, with a detail view of the grooves.]

This pattern superficially resembles a variable-area optical audio track seen in old film prints, and that's why I previously tried to decode it as such. But it didn't produce satisfactory results, and there is no physical reason it even should. In fact, I'm not even sure which physical parameter the audio is encoded in – does the needle move vertically or horizontally? How would this feature manifest itself in the photograph? Do the bright blobs represent crests in the groove, or just areas that happen to be oriented the right way in this particular lighting?

Unwrapping

To make the grooves a little easier to follow I first unwrapped the circular record into a linear image. I did this by remapping the image space from polar to 9000-wide Cartesian coordinates and then resampling it with a windowed sinc kernel:

[Image: The photo of the circular record unwrapped into a long linear strip.]
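
The remap itself is only a couple of loops; here's a nearest-neighbor sketch (the real run used a windowed sinc kernel, and the center point, radii and file names are placeholders):

#!/usr/bin/perl
# Sketch: unwrap a centered record photo from polar coordinates into a
# $W x $H Cartesian strip (angle runs left to right, radius top to
# bottom). Nearest-neighbor sampling; binary PGM in and out, assuming
# no comment lines in the header.
use strict; use warnings;

my ($cx, $cy)     = (1500, 1500);  # center of the record (assumed)
my ($rmin, $rmax) = (500, 1400);   # radii of the grooved area (assumed)
my ($W, $H)       = (9000, 900);

open my $in, '<:raw', 'record.pgm' or die $!;
my ($w, $h, $max, $px) =
  do { local $/; <$in> } =~ /\AP5\s+(\d+)\s+(\d+)\s+(\d+)\s(.*)\z/s
  or die "not a binary PGM";

my $pi = 4 * atan2(1, 1);
my $out = '';
for my $v (0 .. $H - 1) {
    my $r = $rmin + ($rmax - $rmin) * $v / ($H - 1);
    for my $u (0 .. $W - 1) {
        my $x = int($cx + $r * cos(2 * $pi * $u / $W) + .5);
        my $y = int($cy + $r * sin(2 * $pi * $u / $W) + .5);
        $out .= (0 <= $x && $x < $w && 0 <= $y && $y < $h)
              ? substr($px, $y * $w + $x, 1) : "\0";
    }
}

open my $of, '>:raw', 'unwrapped.pgm' or die $!;
print $of "P5\n$W $H\n$max\n$out";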

Mapping the groove path

It's not easy to automatically follow the groove. As one would imagine, it's not a mathematically perfect spiral. Sometimes the groove disappears into darkness, or blurs into the adjacent track. But it wasn't overly tedious to draw a guiding path manually. Most of the work was just copy-pasting from a previous groove and making small adjustments.

I opened the unwrapped image in Inkscape and drew a colored polyline over all obvious grooves. I tried to make sure a polyline at the left image border would neatly continue where the previous one ended on the right side.

The grooves were alternately labeled as 'a' and 'b', since I knew this record had two different sound effects on interleaved tracks.

[Image: A zoomed-in view of the unwrapped grooves labeled and highlighted with colored lines.]

This polyline was then exported from Inkscape and loaded by a script that extracted a 3–7 pixel high strip from the unwrapped original, centered on the groove, for further processing.

Pixels to audio

I had noticed another information-carrying feature besides just the transverse area of the groove: its displacement from center. The white blobs sometimes appear below or above the imaginary center line.

[Image: Parts of a few grooves shown greatly magnified. They appear either as horizontal stripes, or horizontally organized groups of distinct blobs.]

I had my script calculate the brightness mass center (weighted y average) relative to the track polyline at all x positions along the groove. This position was then directly used as a PCM sample value, and the whole groove was written to a WAV file. A noise reduction algorithm was also applied, based on sample noise from the silent end of the groove.
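
The mass-center calculation is the heart of it; a sketch (the pixel format is my assumption):

# Sketch: brightness mass center of one pixel column, relative to the
# groove polyline. Each element of @col is [y offset, brightness];
# the return value is used directly as a PCM sample.
sub mass_center {
    my @col = @_;
    my ($num, $den) = (0, 0);
    for my $px (@col) {
        $num += $px->[0] * $px->[1];
        $den += $px->[1];
    }
    return $den ? $num / $den : 0;
}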

The results are much better than what I previously obtained (see video below, or mp3 here):

Future ideas

Several factors limit the fidelity and dynamic range obtained by this method. For one, the relationship between the white blobs and needle movement is not known. The results could possibly still benefit from more pixel resolution and color bit depth. The blob central displacement (insofar as it is the most useful feature) could also be more accurately obtained using a Gaussian fit or similar algorithm.

The groove guide could be drawn more carefully, as some track slips can be heard in the recovered audio.

Opening up the robot for another photograph would be risky, since I already broke a plastic tab before. But other ways to optically capture the signal would be to use a USB microscope or a flatbed scanner. These methods would still be only slightly more complicated than just using a microphone! The linear light source of the scanner would possibly cause problems with the circular groove. I would imagine the problem of the disappearing grooves would still be there, unless some sort of carefully controlled lighting was used.

Headerless train announcements

[Image: Information display onboard a Helsinki train, showing a transcript of an announcement along with the time of the day, current speed and other info.]

The Finnish state railway company just changed their automatic announcement voice, discarding old recordings from trains. It's a good time for some data dumpster diving for the old ones, don't you think?

The dive produced a 67-megabyte ISO 9660 image that once belonged to an older type of onboard announcement device. It contains a file system of 58 directories with five-digit names, and one called "yleis" (Finnish for "general").

Each directory contains files with three-digit file names. For each number, there's 001.inf, 001.txt and 001.snd. The .inf and .txt files seem to contain parts of announcements as ISO 8859 encoded strings, such as "InterCity train" and "to Helsinki". The .snd files obviously contain the corresponding audio announcements. There's a total of 1950 sound files.

Directory structure

The file system seems to be structurally pointless; there's nothing apparent that differentiates all files in /00104 from files in /00105. Announcements in different languages are numerically separated, though (/001xx = Finnish, /002xx = Swedish, /003xx = English). Track numbers and time readouts are stored sequentially, but there are out-of-place announcements and test files in between. The logic connecting numbers to their meanings is probably programmed into the device for every train route.

Everything can be spliced together from fragments hardly longer than single words. But many common announcements are also recorded as whole sentences, probably to make them sound more natural.

Audio format

The audio files are headerless; there is no explicit information about the format, sample rate or sample size anywhere.

The byte histogram and Poincaré plot of the raw data suggest a 4-bit sample size; this, along with the fact that all files start with 0x80, is indicative of an adaptive differential PCM encoding scheme.

[Image: Byte histogram and Poincare plot of a raw audio file, characteristic of Gaussian-distributed data encoded as four-bit samples.]
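
Statistics like these are quick to gather; a sketch of the nybble-counting part (file name made up):

#!/usr/bin/perl
# Sketch: high/low nybble histogram of a raw file. With 4-bit ADPCM,
# both nybble distributions show the same bell-like shape.
use strict; use warnings;

open my $fh, '<:raw', '001.snd' or die $!;
my (@hi, @lo);
while (read $fh, my $b, 1) {
    $hi[ord($b) >> 4]++;
    $lo[ord($b) & 0xF]++;
}
printf "nybble %X: hi %8d  lo %8d\n", $_, $hi[$_] // 0, $lo[$_] // 0
    for 0 .. 15;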

Unfortunately there are as many variations of ADPCM as there are manufacturers of encoder chips. None of the decoders known to SoX produce clean results. But with the right settings for the OKI-ADPCM decoder we can already hear some garbled speech under heavy Brownian noise.

For unknown reasons, the output signal from SoX is spectrum-inverted. Luckily it's trivial to fix (see my previous post on frequency inversion). The pitch sounds roughly natural when a 19,000 Hz sampling rate is assumed. A test tone found in one file comes out as a 1000 Hz sine when the sampling rate is further refined to 18,930 Hz.
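
Frequency inversion amounts to ring modulation with a carrier at half the sampling rate, i.e. negating every other sample. A sketch for a raw stream:

#!/usr/bin/perl
# Sketch: spectral inversion of raw signed 16-bit little-endian mono
# audio, stdin to stdout, by flipping the sign of every other sample.
use strict; use warnings;
binmode STDIN; binmode STDOUT;

my $i = 0;
while (read STDIN, my $buf, 2) {
    my $s = unpack 's<', $buf;
    $s = -$s if $i++ & 1;
    $s = 32767 if $s > 32767;   # -(-32768) won't fit
    print pack 's<', $s;
}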

This is what we get after frequency inversion, spectral equalization, and low-pass filtering:

There's still a high noise floor due to the mismatch between OKI-ADPCM and the unknown algorithm used by the announcement device, but it's starting to sound alright!

Peculiarities

There seems to be an announcement for every conceivable situation, such as:

  • "Ladies and Gentlemen, as due to heavy snowfall, we are running slightly late. Please accept our apologies."
  • "Ladies and Gentlemen, an animal has been run over by the train. We have to wait a while before continuing the journey."
  • "Ladies and Gentlemen, the arrival track of the train having been changed, the platform is on your left hand side."
  • "Ladies and Gentlemen, we regret to inform you that today the restaurant-car is exceptionally closed."

Also, there is an English recording of most announcements, even though only Finnish and Swedish are usually heard on commuter trains.

One file contains a long instrumental country song.

In an eerily out-of-place sound file, a small child reads out a list of numbers.

Final words

This is something I've always wanted to do with this almost melodically intonated announcement about ticket-selling compartments.

Mystery signal from a helicopter

Last night, YouTube suggested a video for me. It was a raw clip from a news helicopter filming a police chase in Kansas City, Missouri. I quickly noticed a weird interference in the audio, especially the left channel, and thought it must be caused by the chopper's engine. I turned up the volume and realized it's not interference at all, but a mysterious digital signal! And off we went again.

The signal sits alone on the left audio channel, so I can completely isolate it. Judging from the spectrogram, the modulation scheme seems to be BFSK, switching the carrier between 1200 and 2200 Hz. I demodulated it by filtering it with a lowpass and highpass sinc in SoX and comparing outputs. Now I had a bitstream at 1200 bps.

[Image: A nondescript oscillogram of the data signal, and below it, the signal after FM demodulation, showing a clear pattern characteristic of binary FSK switching at 1200 bps.]
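
A crude non-coherent demodulator for such a signal fits in a page of Perl; this sketch compares Goertzel energies at the two tones once per bit period (the sample rate is assumed, and there's no timing recovery, which a real decoder would need):

#!/usr/bin/perl
# Sketch: Bell-202-style BFSK discriminator. Raw signed 16-bit LE mono
# at 48 kHz on stdin; prints one hard bit per 1200 bps bit period by
# comparing Goertzel energies at the mark (1200 Hz) and space (2200 Hz)
# frequencies.
use strict; use warnings;
binmode STDIN;

my ($fs, $baud) = (48000, 1200);
my $n  = int($fs / $baud);             # samples per bit
my $pi = 4 * atan2(1, 1);

sub goertzel {
    my ($freq, @x) = @_;
    my $coeff = 2 * cos(2 * $pi * $freq / $fs);
    my ($s1, $s2) = (0, 0);
    ($s2, $s1) = ($s1, $_ + $coeff * $s1 - $s2) for @x;
    return $s1 * $s1 + $s2 * $s2 - $coeff * $s1 * $s2;   # energy
}

while ((read STDIN, my $buf, 2 * $n) == 2 * $n) {
    my @x = unpack 's<*', $buf;
    print goertzel(1200, @x) > goertzel(2200, @x) ? '1' : '0';
}
print "\n";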

The bitstream consists of packets of 47 bytes each, synchronized by start and stop bits and separated by repetitions of the byte 0x80. Most bits stay constant during the video, but three distinct groups of bytes contain varying data, marked blue below:

[Image: A time-stamped hex dump of the byte stream, arranged in packets with only a few bytes changing over time.]

What could it be? Location telemetry from the helicopter? Information about the camera direction? Video timestamps?

The first guess seems to be correct. It is supported by the relationship of two of the three byte groups. If the first 4 bits of each byte are ignored, the data forms a smooth gradient of three-digit numbers in base-10. When plotted parametrically, they form an intriguing winding curve. It is very similar to this plot of the car's position (blue, yellow) along with viewing angles from the helicopter (green), derived from the video by manually following landmarks (only the first few minutes shown):

[Image: Screenshot from Google Earth, showing time-stamped placemarks tracing the roads of a suburb, accompanied by an X-Y plot of the changing FSK bytes that draws a very similar picture.]

When the received curve is overlaid with the car's location trace, we see that 100 steps on the curve scale correspond to exactly 1 minute of arc on the map!

Using this relative information, and the fact that the helicopter circled around the police station in the end, we can plot all the received data points in Google Earth to see the location trace of the helicopter:

[Image: Coordinates from the whole data signal plotted on top of a Google Earth satellite photo several miles across, with a lot of circling around.]

Update: Apparently the video downlink to ground was transmitted using a transmitter similar to Nucomm Skymaster TX that is able to send live GPS coordinates. And this is how they seem to do it.

Update 2: Yes, it's 7-bit Bell 202 ASCII. I tried decoding it as 7-bit data earlier, ignoring parity, but must have gotten the bit order wrong! So I just chose a roundabout way and kept looking at the hex. When fully decoded, the stream says:

#L N390386 W09434208YJ
#L N390386 W09434208YJ
#L N390384 W09434208YJ
#L N390384 W09434208YJ
#L N390381 W09434198YJ
#L N390381 W09434198YJ
#L N390379 W09434188YJ

These are the full lat/lon pairs of coordinates (39° 3.86′ N, 94° 34.20′ W). Nucomm says the system enables viewing the helicopter "on a moving map system". Also, it could enable the receiving antenna to be locked onto the helicopter's position, to allow uninterrupted video downlink.

Thanks to all the readers for additional hints!

If you want to try it yourself, there's a shell script that will run sox, minimodem, and Perl in the right order for you.
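
I won't reproduce the script here, but a hedged reconstruction of the idea looks like this (the file names and the final filter are mine):

#!/bin/sh
# Sketch: isolate the left channel, demodulate it as Bell 202 with
# minimodem, and keep only the coordinate lines.
sox chopper.wav left.wav remix 1
minimodem --rx --file left.wav 1200 | perl -ne 'print if /^#L/'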

Decoding radio-controlled bus stop displays

In the previous post I wrote about the 16 kbps data stream on FM broadcast frequencies, and my suspicion that it's being used by the bus and tram stop display system here in Helsinki. Now it's time to find out the truth.

I had the opportunity to observe a display stuck in the middle of its bootup sequence, displaying a version string. This revealed that the system is called IBus and it's made by the Swedish company Axentia. Sure enough, their website talks about DARC and how it requires no return channel, making it possible to use battery-powered displays in remote areas.

[Image: Photo of a liquid crystal display showing the text: 8589 IBUS 6.5.70d]

Not much else is said about the system, though; there are no specs for the proprietary protocol. So I implemented the five-layer DARC protocol stack in Perl and was left with a stream of fully error-corrected packets on top of Layer 5, separated into hundreds of subchannels. Some of these contained human-readable ISO-8859-1 strings with names of terminal stations. They seemed like an easy starting point for reverse engineering.

Here's one such packet after dissection:

[Image: An infographic titled 'Bus & Destination Packet', showing the hex bytes of a 64-byte packet, divided into fields that are labeled according to their apparent purpose. There are counters, size fields, the bus stop identifier, bus line identifiers, and apparent references to other types of packets. Several fields containing Latin-1 text are also transcribed. The text in them reads '23N RUSKEASUO BRUNAKÄRR'.]

Each display seems to store in its memory a list of buses that can be expected to pass the stop, along with their destinations in both official languages. The above "bus & destination packets" are used to update the memory. This is done once a day for each display on a narrow-band subchannel, so that updating all the displays takes the whole day. The mapping of the bus stop ID to actual bus stops is not straightforward, and had to be guessed from the lists of buses, on a case-by-case basis.

A different kind of packet updates the remaining waiting time for each bus in minutes. This "minutes packet" is sent three times per minute for every display in the system.

[Image: An infographic titled 'Minutes Packet', showing another type of labeled packet. Most fields are one byte long and they contain the number of minutes until the next bus arrives, in 7 bits, and one bit telling whether this estimate is based on positioning or time tables. Information about which number belongs to which bus is contained in the references from other types of packets.]

These packets may contain waiting times for several displays using the same subchannel; the "bus & destination packet" contains an address to the correct minutes slot. (The subchannel address is signaled on a lower protocol layer.) A special flag is used to signify an unused slot; another flag indicates that the bus has no working GPS and that the time is an approximation based on timetables. This causes a tilde "~" to appear before the minutes field. This means all calculation is done centrally and the displays only show numbers they hear on the radio.

There's also an announcement packet that's used to send text messages to the displays. Often these will be about traffic disruptions and diverted routes. It's up to the displays to decide how to display the message; in the low-end battery-powered ones, the actual minutes function is (annoyingly) blanked for the duration of the slowly scrolling message.

I have yet to figure out the meaning of some packet types with non-changing data.

A special subchannel contains test packets with messages such as "Tämä on Mono-Axentia-testiä... Toimiiko tää ees..." ("This is a Mono-Axentia test... Does this even work...") and "Määritykset ja tiedotteet tehty vain Monolla - ei mitään IBus:lla" ("Configurations and announcements made only with Mono - nothing with IBus"), suggesting that they're planning to migrate from IBus to something called Mono. Interestingly, there's also a repeating test message in German – "Bus 61 nach Flughafen aus Haltestelle 1" ("Bus 61 to the airport from stop 1").

What good is it, you ask? Well, who wouldn't want a personal display repeater at home, telling when it's time to go?

[Image: Photo of a small liquid crystal display kit with its PCB showing, obviously home-soldered to a bunch of wires, and displaying the text: '72 TAPANILA ~12'.]

Broadcast messages on the DARC side

By now I've rambled quite a lot about RDS, the data subcarrier on the third harmonic of the 19 kHz FM stereo pilot tone. And we know the second harmonic carries the stereo information. But is there something beyond that?

Using a wideband FM receiver, like an RTL-SDR, I can plot the whole demodulated spectrum of any station, say, from baseband to 90 kHz. Most often there is nothing but silence above the third pilot harmonic (denoted 3f here), which is the RDS subcarrier. But two local stations have this kind of a peculiar spectrogram:

[Image: A spectrogram showing a signal at the audible frequency range, labeled 'mono', and four carriers centered at 19, 38, 57, and 76 kHz, labeled pilot, 2f, 3f, and 4f, respectively. Pilot is a pure sinusoid; 2f and 3f are several kHz wide signals with mirrored sidebands; and 4f is 20 kHz wide and resembles wideband noise.]

There seems to be a lot more data on a pretty wide band centered on the fourth harmonic (4f)!

As so often with mysterious persistent signals, I got a lead by googling for the center frequency (76 kHz). So, meet the imaginatively named Data Radio Channel, or DARC for short. This 16,000 bps data stream uses level-controlled minimum-shift keying (L-MSK), which can be thought of as a kind of offset-quadrature phase-shift keying (O-QPSK) where consecutive bits are sent alternating between the in-phase and quadrature channels.

To find out more, I'll need to detect the signal first. Detecting L-MSK coherently is non-trivial ("tricky") for a hobbyist like me. So I'm going to cheat a little and treat the signal as continuous-phase but non-coherent binary FSK instead. This should give us good data, even though the bit error probability will be suboptimally high. I'll use a band-pass filter first; then a splitter filter to split the signal band into mark and space; detect both envelopes and calculate difference; lowpass filter with a cutoff at the bitrate; recover bit timing and synchronize using a PLL; and now we can just take the bits out by hard decision at bit middle points.
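
The decision end of that chain, in sketch form (the band filters and envelope detectors are left out, and the loop gain is made up):

# Sketch: hard decisions from the mark/space envelope difference.
# $mark and $space are array refs of envelope samples; $sps is samples
# per bit. The "PLL" is reduced to nudging a sampling phase toward the
# zero crossings of the difference signal.
sub hard_bits {
    my ($mark, $space, $sps) = @_;
    my @diff = map { $mark->[$_] - $space->[$_] } 0 .. $#$mark;
    my ($phase, $armed, @bits) = (0, 1);
    for my $i (1 .. $#diff) {
        if (($phase += 1 / $sps) >= 1) { $phase -= 1; $armed = 1 }
        if (($diff[$i-1] <=> 0) != ($diff[$i] <=> 0)) {      # bit edge
            $phase -= 0.1 * ($phase < 0.5 ? $phase : $phase - 1);
        }
        if ($armed && $phase >= 0.5) {                       # mid-bit
            push @bits, $diff[$i] > 0 ? 1 : 0;
            $armed = 0;
        }
    }
    return @bits;
}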

Oh yeah, we have a bitstream!

[Image: Three oscillograms followed by a stream of 1s and 0s. The first oscillogram is quite nondescript; the second one actually shows two waveforms, red and blue, in the same graph, with the red dominating in envelope power where blue is suppressed and vice versa. The third oscillograms shows a graph apparently following their envelope power difference, with sample points at regular intervals. The sign of this plot at sample points dictates whether a 0 or 1 is shown below that sample.]

Decoder software for DARC does not exist, according to search engines. To read the actual data frames, I'll have to acquire block synchronization, regenerate the pseudorandom scrambler used and reverse its action, check the data against CRCs for errors, and implement the stack of layers that make up the protocol. The DARC specification itself is luckily public, albeit not very helpfully written; in contrast to the RDS standard, it contains almost no example circuits, data, or calculations. So, a little recap of Galois field mathematics and linear feedback shift registers for me.
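
The descrambler, once the generator is known, is just a linear feedback shift register XORed over the bit stream. A sketch with placeholder taps and seed (not DARC's actual values, which are in the spec):

# Sketch: additive descrambler. The 9-bit register, taps and seed here
# are placeholders, not the real DARC generator.
sub descramble {
    my @bits = @_;
    my $lfsr = 0x1FF;                      # seed (made up)
    for my $b (@bits) {
        my $pn = ($lfsr >> 8) & 1;         # PN output bit
        my $fb = $pn ^ (($lfsr >> 4) & 1); # example taps: x^9 + x^5 + 1
        $lfsr  = (($lfsr << 1) | $fb) & 0x1FF;
        $b ^= $pn;                         # aliased: modifies @bits
    }
    return @bits;
}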

But what does it all mean? And who's listening to it?

Now, it's no secret (Signaali 3/2011, p. 16) that the HSL bus stop timetable displays in Helsinki get their information about the next arriving GPS-positioned buses through this very FM station, YLE 1. That's all they're telling, though. I haven't found anything bus-related in the RDS data, so it's quite likely that they're listening to DARC.

[Image: Photo of a rugged LCD display mounted on a metal pole, displaying the text '55K FORSBY ~6' and stamped with the logo 'HSL HRT'.]

DARC is known to be used in various other applications as well, such as DGPS correction signals. So decoding it could prove interesting.

Sniffing the packet types being sent, it seems that quite a big part of the time is used to send a transmit cycle schedule (COT and SCOT). And indeed, the aforementioned magazine article hints that the battery-powered displays follow a precise radio schedule to save power, each receiving only during a short period every minute. (This grep only lists metadata type packets, not the actual data.)

$ perl darcdec-filters.pl | grep Type: | sort | uniq -c
  88 Type: 0 Channel Organization Table (COT)
   8 Type: 1 Alternative Frequency Table (AFT)
   8 Type: 2 Service Alternative Frequency Table (SAFT)
   1 Type: 4 Service Name Table (SNT)
   8 Type: 5 Time and Date Table (TDT)
 112 Type: 6 Synchronous Channel Organization Table (SCOT)
$ █

At the moment I'm only getting sensible data out at the very beginning of each packet (or "block"). I do get a solid, error-free block sync but, for example, the date is consistently showing the year 1922 and all CRCs fail. Other fields are similarly weird but consistent. This could mean that I've still got the descrambler PN polynomial wrong, only succeeding when it's using bits from the initialization seed. And this can only mean many more sleepless coding nights ahead.

(Continued in the next post)

(Pseudotags for Finns: HSL:n pysäkkinäytöt eli aikataulunäytöt.)

Update 1/2019: There's now a basic DARC decoder on GitHub, it's called darc2json.

The burger pager

A multinational burger chain has a restaurant nearby. One day I went there and ordered a take-away burger that was not readily available. (Exceptional circumstances; Ludum Dare was underway and I really needed fast food.) The clerk gave me a device that looked like a thick coaster, and told me I could fetch the burger from the counter when the coaster starts blinking its lights and making noises.

Of course, this device deserved a second look! (We can forget about the burger now.) The device looked like a futuristic coaster with red LEDs all around it. I've blurred the text on top of it for dramatic effect.

[Image: Photo of a saucer-shaped device sitting on the edge of a table, with red LEDs all around it. It has a label sticker with a logo that is redacted from the photo.]

Several people in the restaurant were waiting for their orders with similar devices, which suggested to me this could be a pager system of some sort. Turning the receiver over, we see stickers with interesting information, including a UHF carrier frequency.

[Image: Photo of a smaller sticker on the bottom of the device, reading 'BASE ID:070 FRQ:450.2500MHZ'.]

For this kind of situation I often carry my RTL2832U-based television receiver dongle with me (the so-called rtl-sdr). Luckily this was one of those days! I couldn't help but tune in to 450.2500 MHz and see what's going on there.

[Image: Photo of a very small radio hotglued inside a Hello Kitty branded tin.]

And indeed, just before a pager went off somewhere, this burst could be heard on the frequency (FM demodulated audio):

Googling the device's model number, I found out it's using POCSAG, a common asynchronous pager protocol at 2400 bits per second. The modulation is binary FSK, which means we should be able to directly read the data from the above demodulated waveform, even by visual inspection if necessary. And here's the data.

...
10101010101010101010101010101010  preamble for bit sync
10101010101010101010101010101010
10101010101010101010101010101010
01111100110100100001010111011000  sync codeword
00101010101110101011001001001000  pager address
01111010100010011100000110010111  idle
01111010100010011100000110010111
01111010100010011100000110010111
01111010100010011100000110010111
...

There's no actual message being paged, just an 18-bit device address. It's preceded by a preamble of alternating 1's and 0's that the receiver can grab onto, and a fixed synchronization codeword that tells where the data begins. It's followed by numerous repetitions of the idle codeword.
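
Unpacking the fields from that address codeword is simple if we skip the BCH error check; a sketch (in full POCSAG the three lowest bits of the pager address come from the codeword's position within the batch, which I'm ignoring here, as the 18-bit figure above does):

# Sketch: field extraction from a 32-bit POCSAG address codeword,
# ignoring the BCH check bits and parity.
sub parse_address {
    my ($cw) = @_;
    return if $cw >> 31;               # MSB set = message codeword
    my $addr = ($cw >> 13) & 0x3FFFF;  # 18-bit address field
    my $func = ($cw >> 11) & 0x3;      # 2 function bits
    return ($addr, $func);
}

my ($addr, $func) = parse_address(0b00101010101110101011001001001000);
printf "address %d, function %d\n", $addr, $func;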

The next question is inevitable. How much havoc would ensue if someone were to loop through all 262,144 possible addresses and send a message like this? I'll leave it as hypothetical.

A determined 'hacker' decrypts RDS-TMC

As told in a previous post, I like to watch the RDS-TMC traffic messages every now and then, just for fun. Even though I've never had a car. Actually I haven't done it for years now, but thought I'd share with you the joy of solving the enigma.[disclaimer 1]

RDS-TMC is used in car navigators to inform the driver about traffic jams, roadworks and urgent stuff like that. It's being broadcast on a subcarrier of a public radio FM transmission. It's encrypted in many countries, including mine, so that it could be monetized by selling the encryption keys.

A draft of the encryption standard, namely ISO/DIS 14819-6, is freely available online. Here's an excerpt[disclaimer 2] that reads blatantly like a challenge:

"After calling for candidate proposals [for a method of encryption], the submission from Deutsche Telekom was judged by an expert panel to have best met the pre-determined criteria the task force had established. The method encrypts the sixteen bits that form the Location element in each RDS-TMC message to render the message virtually useless without decryption. The encryption is only 'light' but was adjudged to be adequate to deter other than the most determined 'hacker'. More secure systems were rejected because of the RDS capacity overhead that was required."

TMC messages consist mostly of numeric references to a static database of preset sentences and locations; no actual text is being transmitted. The database is not a secret and is freely available. The location information is encrypted with a key that changes daily. Every night, a random key is picked from 31 pregenerated alternatives. The key is never transferred over the air, only its numeric ID (1–31). The keys are preprogrammed into all licensed TMC receivers, and they can decrypt the locations knowing the key ID.

The size of the key space is 2¹⁶ and the encryption algorithm consists of three simple bitwise operations:

[Image: A cipher diagram beginning with the 16-bit hex words C1A0 and F3D5, labeled 'location' and 'key', respectively. The 'location' word goes into a bitwise right rotation block, controlled by the first nybble of 'key'. The third and fourth nybble of 'key', taken as a single byte, go to a bitwise left shift block controlled by the second nybble of 'key'. Outputs from these two bitwise blocks are XORed. The result is the hex word 85E9, labeled 'encrypted location'.]

The algorithm is simple enough to be run using pen-and-paper hardware, and that's just what I did while creating the above crypto diagram:

[Image: A pink, heart-shaped Post-It note with a hand-written columnar XOR calculation in binary. One of the operands is shifted left from the alignment. The result is 85E9.]
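
Transcribed into Perl, my reading of the diagram looks like this; it reproduces the example words above:

# Sketch: the TMC location encryption as diagrammed above. Decryption
# is the same in reverse: XOR first, then rotate left.
sub encrypt_location {
    my ($loc, $key) = @_;              # two 16-bit words
    my $rot  = ($key >> 12) & 0xF;     # 1st nybble: rotate amount
    my $shl  = ($key >> 8)  & 0xF;     # 2nd nybble: shift amount
    my $byte =  $key        & 0xFF;    # 3rd and 4th nybbles as a byte
    my $rotated = (($loc >> $rot) | ($loc << (16 - $rot))) & 0xFFFF;
    return $rotated ^ (($byte << $shl) & 0xFFFF);
}

printf "%04X\n", encrypt_location(0xC1A0, 0xF3D5);   # prints 85E9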

The tricky part is that I don't know the keys. But there's a catch. To save bandwidth, only regional messages are transmitted. This limits the space of possible locations, giving us a lot of information about the encrypted data. Assuming all messages are from this limited region, we can narrow the candidate keys down to a few dozen.

The next day, we have an all new encryption key again. But there's another catch. Many messages persist over several days, if not weeks. These would be messages about long-lasting roadworks and such. We just need to wait for messages that we heard yesterday that only have their location code changed, and we can continue limiting the keyspace by collecting more data.

Once we've limited the keyspace to a single key, we can decrypt all of today's messages. When the key changes again, it is trivial to find today's key by knowing yesterday's key and comparing the locations of persistent messages; this is known as a known-plaintext attack or KPA.

Here's some encrypted data straight from the radio.

$ ./redsea.pl | grep TMC
══╡ TMC msg 00 1828 4400
══╡ TMC sysmsg 6040
══╡ TMC msg 00 1828 4400
══╡ TMC msg 07 8264 0294
══╡ TMC msg 07 8264 0294
══╡ TMC msg 07 8264 0294
══╡ TMC sysmsg 0021
══╡ TMC msg 07 5964 72ca
█

A little Perl script then decodes everything and even plots the affected segment on a little map. The screenshot is from a few years back.

[Image: Screenshot of a GUI with the title 'RDS-TMC'. It's divided into three sections. The first one, labeled 'TMC Service', tells that the service provider is 'MMN TMC+', we're using location table 6/17, and the data is encrypted with key number 5. The second section, labeled 'Received messages', shows a scrollable columnar list of traffic messages received, with a one-word description, an icon and rough location. The third section, 'Message', shows the selected message in detail. A map displays the affected road segment. A long event description is printed in Finnish, along with the road name, exact coordinates, speed limit, time of expiration, and the last time the message was received.]

Now I just need a car. Well, actually I prefer motorcycles. But I guess it would work, too.

Tools used: Ordinary FM radio, sound card, computer. All data is from public sources. RDS was decoded from intermodulation distortion in the radio's Line Out audio caused by the stereo demuxer circuitry.

Update 2014-07-27: Some news stories seem to highlight that I was the first one to break this joke of a cipher. This could be true; I don't really care. In any case, the often-cited 2007 work by Barisani and Bianco (PDF, 13 MB) was done on unencrypted RDS-TMC and involved no cryptanalysis; "encryption is supported for commercial services but irrelevant to our goals". I encourage you to read it; it addresses some of the real-world security implications of injecting crafted TMC messages into cars.


Disclaimer 1: I will take this post down on the first appearance of any complaint from any party, of course. My intent is not malicious and I'm not even publishing any keys or code.

Disclaimer 2: This use of the above excerpt of the ISO standard is not an infringement of copyright as it is being used here under the doctrine of "Fair Use" of the United States Copyright Law (17 U.S.C. § 107), seeing as this blog is hosted on US soil.

Eavesdropping on a wireless keyboard

Some time ago, I needed to find a new wireless keyboard. With the level of digital paranoia that I have, my main priority was security. But is eavesdropping a justifiable concern? How insecure would it actually be to type your passwords using an older type of wireless keyboard?

To investigate this, I bought an old Logitech iTouch PS/2 cordless keyboard at an online auction. It's dated July 2000. Back in those days, wireless desktops used the 27 MHz shortwave band; they've since largely moved to 2.4 GHz. This one happens to be of the shortwave type. They've been tapped before (pdf), but no proof-of-concept was published.

I actually disposed of the keyboard before I could photograph it, so here's a newer Logitech S510 from 2005, still using the same technology:

[Image: Photo of a black cordless QWERTY keyboard with a Logitech logo.]

Compared to modern USB wireless dongles, the receiver of the iTouch is huge. It isn't a one-chip wonder either: it contains a PCB with multiple crystal oscillators and decoder ICs. Based on Google results, one of the Motorola chips is an FM receiver, which gives us a hint about the mode of transmission.

[Image: On the left, a gray box with the Logitech logo, a button labeled 'CONNECT', and a wire coming out of it, ending in two PS/2 connectors. On the right, the box opened and its PCB revealed. On the PCB there are three metal-colored crystals (16.4200 0014TC, 10.1700 0011TC3, 10.2700 HELE), two bulky components labeled LT455EW, and three microchips with Motorola logos. An array of wires goes out.]

But because eavesdropping is our goal here, I'm tossing the receiver. After all, the signal is well within the 11-meter band of any home receiver with an SW band. For bandwidth reasons, however, I'll use my RTL2838-based television receiver dongle, which can be tuned to an arbitrary frequency and commanded to just dump the I/Q sample stream (using rtl-sdr).

The transmission is clearly visible at 27.14 MHz. Zooming closer and taking a spectrogram, the binary FM/FSK nature of the transmission becomes obvious:

[Image: Spectrogram showing a constant sinusoid and, below it, a burst of data encoded in the frequency of another sinusoid.]

The sample length of one bit indicates a bitrate of 850 bps. A reference oscillator with a digital PLL can be easily implemented in software. I assumed there's a delta encoding on top of the FSK.

One keypress produces about 85 bits of data. The bit pattern seems to always correlate with the key being pressed, so there's no encryption at all. Pressing the reassociation button doesn't change things either. Without going too much into the details of the obscure protocol, I just mapped all keys to their bit patterns, like so:

w 111111101111011111101111101101011011100111111111001111111101111011111101111101101
e 111111101111011111101111110101011011100111111111001111111101111011111101111110101
1 111111101111011111101111110110110111001111111110011111111011110111111111110110110
2 111111101111011111101111110110111111100111111111001111111101111011111101111110110
3 111111101111011111101111111010111101001111111110011111111011110111111011111110101
7 111111101111011111101111111110110110110011111111100111111110111101111111011111111
8 111111101111011111101111101110111110011111111100111111110111101111110111110111011
9 111111101111011111101111101110110101100111111111001111111101111011111101111110110
0 111111101111011111101111110110111011001111111110011111111011110111111011111101110
u 111111101111011111101111111101011110011111111100111111110111101111110111111110101
i 111111101111011111101111111101010111001111111110011111111011110111111011111111010

The bitstrings are so strongly correlated between keystrokes that we can calculate the Levenshtein distance of a received bitstring to all the mapped keys, find the smallest distance, and behold, we can receive text from the keyboard!
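
The matching step might look something like this (a sketch; %keymap holds the bit patterns collected above):

# Sketch: nearest-key lookup by Levenshtein distance.
use List::Util qw(min);

sub distance {
    my ($s, $t) = @_;
    my @d = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @n = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i-1, 1) eq substr($t, $j-1, 1) ? 0 : 1;
            push @n, min($d[$j] + 1, $n[$j-1] + 1, $d[$j-1] + $cost);
        }
        @d = @n;
    }
    return $d[-1];
}

sub best_key {
    my ($bits, %keymap) = @_;
    my ($best) = sort { distance($bits, $keymap{$a}) <=>
                        distance($bits, $keymap{$b}) } keys %keymap;
    return $best;   # e.g. 'w'
}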

$ rtl_sdr - -f 27132000 -g 32.8 -s 96000 |\
> sox -t .raw -c 2 -r 96000 -e unsigned -b 8 - -t .raw -r 22050 - |\
> ./fm |perl deco.pl
Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 00000013

Using device 0: ezcap USB 2.0 DVB-T/DAB/FM dongle
Found Rafael Micro R820T tuner
Tuned to 27132000 Hz.
Tuner gain set to 32.800000 dB.
Reading samples in async mode...
[CAPS]0wned
█

So, when buying a keyboard for personal use, I chose one with 128-bit AES air interface encryption.

Update: This post was mostly about accomplishing this with low-level stuff readily available at home. For anyone needing a proof of concept or even decoder hardware, there's KeyKeriki.

Update #2: Due to requests, my code is here: fm.c, deco.pl. Of course it's for reference only, as it's not a working piece of software, and never will be. Oh and it's not pretty either.

The infrared impulse

Imagine a movie party with friends. Just as the 237-minute Love Exposure is about to start, you feel the need to have a remote controller. You remember the spare TV remote lying around. You also happen to have an infrared phototransistor from a Hauppauge HVR-1100 PCI card. So you quickly try to come up with a way to receive the signals using stuff you readily have at home.

[Image: A television remote control with a couple dozen buttons and the label 'Hauppauge!'.]

Far-fetched? Well, it did happen to me. The movie was some great stuff, so I decided to sit and watch it instead. But the next day, I finished this remotely useful contraption. (Of course proper USB IR receivers are cheap on eBay and well supported on Linux, but hey.)

Long story short, I connected the phototransistor directly into the sound card's line in by soldering the leads to a stereo jack. The sound card is becoming my favorite method of sampling the outside world into the computer.

[Image: A stereo miniplug with its lead connected to a phototransistor.]

And sure enough, pressing the button "1" on the remote produces this waveform:

[Image: An oscillogram with 12 strong downward spikes of two distinctly different durations.]

By comparing the timing to some common IR protocols I found out the signal is Manchester-encoded RC-5. After running-average type lowpass filtering, thresholding and Manchester decoding in Perl, we get these 14 bits:

[Image: Spikes from the above oscillogram superimposed with a regular clock signal and interpreted so that when a clock tick coincides with a spike, a logic one is produced, and when it coincides with zero, a logic zero is produced.]

The first "11" is a start bit sequence, then follows a toggle bit that tells whether this is a repeat or a new keypress. The rest is an address field (11110b = 1Eh) and the command itself (000001b = "number 1").

Yay, no new hardware needed!