Oct 30, 2014

Visualizing hex dumps with Unicode emoji

Memorizing SSH public key fingerprints can be difficult; they're just long random numbers displayed in base 16. There are some terminal-friendly solutions, like OpenSSH's randomart. But because I use a Unicode terminal, I like to map the individual bytes into characters in the Miscellaneous Symbols and Pictographs block.
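
A short Perl script along these lines does the trick (a sketch, with an illustrative file name; for brevity it lazily fills the table with 256 consecutive code points rather than a truly hand-picked collection):

    #!/usr/bin/perl
    # hexmoji.pl (name mine) - map colon-separated hex bytes on stdin
    # to emoji from the Miscellaneous Symbols and Pictographs block
    use strict;
    use warnings;
    binmode STDOUT, ':utf8';

    # 256 consecutive code points stand in for a hand-picked table
    my @emoji = map { chr } 0x1F400 .. 0x1F4FF;

    while (my $line = <STDIN>) {
        # replace runs of colon-separated hex bytes, e.g. SSH fingerprints
        $line =~ s{ \b ( (?: [0-9a-f]{2} : )+ [0-9a-f]{2} ) \b }
                  { join ':', map { $emoji[hex] } split /:/, $1 }xge;
        print $line;
    }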

What's happening here? First we create a 256-element array containing a hand-picked collection of emoji; each emoji is thereby assigned an index from 0x00 to 0xff. Then we loop through standard input, looking for lines that contain colon-separated hex bytes, and replace each hex value with the corresponding emoji from the array.

Here's the output:

The script could easily be extended to support output from other hex-formatted sources as well, such as xxd:
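
For xxd's default layout (an offset, then the bytes as groups of four hex digits), the substitution could be adapted along these lines, reusing the @emoji table from above:

    # sketch: xxd's default layout, offset followed by hex columns
    $line =~ s{^(\w+: )((?:[0-9a-f]{4} ?)+)}{
        my ($off, $hex) = ($1, $2);
        $off . join ' ', map { $emoji[hex] } $hex =~ /([0-9a-f]{2})/g;
    }e;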

kissofoni; a paw's claw for a needle / music rings out from the ears

Jul 14, 2014

Mapping microwave relay links from video

Radio networks are often at least partially based on microwave relay links. They're those little mushroom-like appendages growing out of cell towers and building-mounted base stations. Technically, they're carefully directed dish antennas linking such towers together over a line-of-sight connection. I'm collecting a little map of nearby link stations, trying to find out how they're interconnected and which networks they belong to.

Circling around

We can find a rough direction for any link antenna by approximating a tangent for the dish shroud surface from position-stamped video footage taken while circling the tower. Ideally we would have a drone make a full circle around the tower at a constant distance and elevation to map all antennas at once; but if our DJI Phantom has run out of battery, a GPS-positioned still camera at ground level will do as well.

The rest can be done manually, or with pattern recognition from OpenCV. In these pictures, the ratio of the diameters of the concentric circles is a sinusoidal function of the angle between the antenna direction and the camera direction. At its maximum, we're looking straight at the beam. (The ratio won't max out at unity in this case, because we're looking at the antenna slightly from below.) We can select the frame with the maximum ratio from high-speed footage, or interpolate a smooth sinusoid to get an even better value.
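
If we've measured the ratio in a handful of frames, even a crude parabolic fit around the peak gives a sub-step estimate of the direction. A sketch, with made-up numbers:

    #!/usr/bin/perl
    # azimuth-peak.pl (name mine) - estimate antenna azimuth from
    # (camera azimuth, circle diameter ratio) measurements; all the
    # numbers are hypothetical, and the peak must not be at either end
    use strict;
    use warnings;

    my @az    = (270, 280, 290, 300, 310);    # camera azimuths, degrees
    my @ratio = (0.71, 0.85, 0.93, 0.88, 0.74);

    # index of the frame with the maximum ratio
    my ($i) = sort { $ratio[$b] <=> $ratio[$a] } 0 .. $#ratio;

    # parabolic interpolation around the peak for a sub-step estimate
    my ($y0, $y1, $y2) = @ratio[$i - 1, $i, $i + 1];
    my $offset = 0.5 * ($y0 - $y2) / ($y0 - 2 * $y1 + $y2);
    my $step   = $az[1] - $az[0];

    printf "antenna azimuth: about %.0f degrees\n",
           $az[$i] + $offset * $step;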

This particular antenna is pointing west-northwest with an azimuth of 290°.

What about distance?

Because of the line-of-sight requirement, we also know the maximum possible distance to the linked tower: it's given by 7140 × √(4/3 × h), where h is the antenna's height above ground in meters and the result comes out in meters. If the beam happens to hit a previously mapped tower closer than this distance, we can assume they're connected!
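
As a sanity check, the same formula in code (the 34-meter antenna height is a made-up example, and the formula effectively assumes the far end sits at a comparable height):

    # maximum line-of-sight distance in meters for an antenna at
    # height $h meters, using the 4/3 effective Earth radius
    sub max_los_m { my ($h) = @_; 7140 * sqrt(4 / 3 * $h) }
    printf "%.0f km\n", max_los_m(34) / 1000;    # prints "48 km"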

This antenna is communicating with a tower no more than 48 km away. Judging from the building it's standing on, it belongs to a government trunked radio network.

Jun 16, 2014

Headerless train announcements

The Finnish state railway company just changed their automatic announcement voice, discarding old recordings from trains. It's a good time for some data dumpster diving for the old ones, don't you think?

A 67-megabyte ISO 9660 image surfaced that once belonged to an older-type onboard announcement device. It contains a file system of 58 directories with five-digit names, plus one called "yleis" (Finnish for "general").

Each directory contains files with three-digit names. For each number there's a triplet: 001.inf, 001.txt, and 001.snd. The .inf and .txt files seem to contain parts of announcements as ISO 8859 encoded strings, such as "InterCity train" and "to Helsinki". The .snd files obviously contain the corresponding audio announcements. There are 1,950 sound files in total.

Directory structure

The file system seems to be structurally pointless; there's nothing apparent that differentiates all files in /00104 from files in /00105. Announcements in different languages are numerically separated, though (/001xx = Finnish, /002xx = Swedish, /003xx = English). Track numbers and time readouts are stored sequentially, but there are out-of-place announcements and test files in between. The logic connecting numbers to their meanings is probably programmed into the device for every train route.

Announcements can be spliced together almost word by word. But many common announcements are also recorded as whole sentences, probably to make them sound more natural.

Audio format

The audio files are headerless; there is no explicit information about the format, sample rate or sample size anywhere.

The byte histogram (left) and Poincaré plot (right) suggest a 4-bit sample size; this, along with the fact that all files start with 0x80, is indicative of an adaptive differential PCM encoding scheme.
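
Such a histogram is cheap to compute; a sketch follows. (A Poincaré plot is the two-dimensional version of the same idea, counting successive byte pairs instead of single bytes.)

    #!/usr/bin/perl
    # byte-histogram.pl (name mine) - count byte values in a file;
    # packed 4-bit samples tend to show nibble-level structure
    use strict;
    use warnings;

    local $/;                            # slurp the whole file
    my @count = (0) x 256;
    $count[$_]++ for unpack 'C*', <>;

    printf "%02x %d\n", $_, $count[$_] for 0x00 .. 0xff;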

Unfortunately there are as many variants of ADPCM as there are manufacturers of encoder chips. None of the decoders known to SoX produces clean results. But with the right settings for the OKI-ADPCM decoder we can already hear some garbled speech under heavy Brownian noise.

For unknown reasons, the output signal from SoX is spectrum-inverted. Luckily it's trivial to fix (see my previous post on frequency inversion). The pitch sounds roughly natural when a 19,000 Hz sampling rate is assumed. A test tone found in one file comes out as a 1000 Hz sine when the sampling rate is further refined to 18,930 Hz.
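
Frequency inversion around the Nyquist frequency amounts to negating every second sample; a sketch for a raw 16-bit signed little-endian mono stream:

    #!/usr/bin/perl
    # invert-spectrum.pl (name mine) - negate every second sample of
    # a raw 16-bit signed little-endian mono PCM stream, shifting the
    # whole spectrum by the Nyquist frequency (stdin to stdout)
    use strict;
    use warnings;

    my $n = 0;
    while (read STDIN, my $buf, 2) {
        my $s = unpack 's<', $buf;
        if ($n++ & 1) {                 # flip the sign of odd samples
            $s = -$s;
            $s = 32767 if $s > 32767;   # clamp the -32768 edge case
        }
        print pack 's<', $s;
    }

With SoX doing the OKI-ADPCM decoding (its vox input type), the whole chain could then be something like sox -t vox -r 18930 001.snd -t raw -e signed -b 16 -c 1 - | perl invert-spectrum.pl | sox -t raw -r 18930 -e signed -b 16 -c 1 - out.wav.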

This is what we get after frequency inversion, spectral equalization, and low-pass filtering:

There's still a high noise floor due to the mismatch between OKI-ADPCM and the unknown algorithm used by the announcement device, but it's starting to sound alright!

Peculiarities

There seems to be an announcement for every conceivable situation, such as:

  • "Ladies and Gentlemen, as due to heavy snowfall, we are running slightly late. Please accept our apologies."
  • "Ladies and Gentlemen, an animal has been run over by the train. We have to wait a while before continuing the journey."
  • "Ladies and Gentlemen, the arrival track of the train having been changed, the platform is on your left hand side."
  • "Ladies and Gentlemen, we regret to inform you that today the restaurant-car is exceptionally closed."

Also, there is an English recording of most announcements, even though only Finnish and Swedish are usually heard on commuter trains.

One file contains a long instrumental country song.

In an eerily out-of-place sound file, a small child reads out a list of numbers.

Final words

And this is something I've always wanted to do with this almost melodically intonated announcement about ticket-selling compartments.

Jun 9, 2014

Time-coding audio files

One day you'll need to include real-time UTC timestamps in audio. It's useful when reconstructing events from long, unsupervised surveillance microphone recordings, or when constantly monitoring and logging radio channels.

There's no standard method for doing this with WAV or FLAC files. One method would be to log the start time in the filename and calculate the time based on audio position. However, this is not possible with voice-activated or squelched recorders. It also relies on the accuracy and stability of the ADC clock.

I'll take a look at some ways to include an accurate timestamp directly in the in-band audio.

Least significant bit

Time information can be encoded in the least significant bit (LSB) of the 16-bit PCM samples. This "steganographic" method requires a lossless file format and lossless conversions. The script below truncates all samples of a raw single-channel signed-integer PCM stream to 15 bits and inserts a 20-byte ISO 8601 timestamp in ASCII roughly every second, preceded by a "mark" start bit. When played back, the LSB can be zeroed out to get rid of the timestamps. The WAV can also be played as such; the "ticking" sound will be practically inaudible at an amplitude of −96 dB. The outgoing PCM stream is then sent to SoX for WAV encoding.
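
A sketch of what such a script can look like, assuming raw 16-bit little-endian mono PCM at 48 kHz on stdin (names and details illustrative):

    #!/usr/bin/perl
    # lsb-time-write.pl (name mine) - hide an ISO 8601 timestamp in
    # the LSBs of raw 16-bit little-endian mono PCM (stdin to stdout)
    use strict;
    use warnings;
    use POSIX qw(strftime);

    my $rate = 48000;   # assumed sample rate
    my @bits;           # pending bits: start bit + 20 stamp bytes
    my $n    = 0;

    while (read STDIN, my $buf, 2) {
        my $s = unpack 'v', $buf;     # unsigned view; sign doesn't matter
        $s &= 0xFFFE;                 # truncate the sample to 15 bits

        if ($n++ % $rate == 0) {      # roughly once a second
            my $stamp = strftime '%Y-%m-%dT%H:%M:%SZ', gmtime;
            @bits = (1,               # "mark" start bit
                     map { split //, unpack 'B8', $_ } split //, $stamp);
        }
        $s |= shift @bits if @bits;   # embed the next bit, MSB first
        print pack 'v', $s;
    }

The output can be piped straight into SoX, e.g. perl lsb-time-write.pl | sox -t raw -r 48000 -e signed -b 16 -c 1 - stamped.wav.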

Note that the start bit of the timestamp marks the moment the sample reached this script, which could differ by hundreds of milliseconds from the actual moment of reception at the microphone. Also, the timestamp does not mark the start of a second; it is timed by an arbitrary sample counter. One could also poll and write the timestamps in a continuous manner.

The above script could be modified to interface with my squelch script, by only inserting timestamps when squelch is not active. The resulting audio could then be efficiently encoded as FLAC.

lsb-time-read.pl reads back the timestamps, also printing the sample position of each. Below is a sound sample of a clean signal followed by a timestamped one.
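
The reading side only needs to watch the LSBs for a start bit and collect the 160 bits that follow; a sketch, under the same stream assumptions as above:

    #!/usr/bin/perl
    # sketch of the reading side: recover timestamps from the LSBs
    use strict;
    use warnings;

    my ($pos, @bits) = (0);
    while (read STDIN, my $buf, 2) {
        my $bit = unpack('v', $buf) & 1;
        if (@bits or $bit) {             # start bit seen; keep collecting
            push @bits, $bit;
            if (@bits == 161) {          # start bit + 20 bytes
                shift @bits;             # drop the start bit
                my $stamp = pack 'B*', join '', @bits;
                printf "%d %s\n", $pos, $stamp;
                @bits = ();
            }
        }
        $pos++;                          # sample position in the stream
    }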

Lossy-friendly approach

Lossy compression, by definition, does not retain the numeric values of samples, so they can't be treated as bit fields. Instead, we can use an analog modulation scheme like binary FSK. MP3 and Ogg Vorbis encoders will, at a reasonable bit rate, retain the structure of a sufficiently slow FSK burst. This method will work even if the timestamping phase is followed by an analog conversion.

Using the ultrasonic part of the spectrum comes to mind, but unfortunately such high frequencies are usually discarded by a low-pass filter at the encoder. However, if the recording consists of narrow-band speech, we can use the higher end of the remaining spectrum and filter it out afterwards. In the case of squelched conversation, we could write the timestamp only at the beginning of each transmission. This way it could even sit in the speech frequencies.

fsk-timestamp.pl embeds the timestamps into PCM data; they can be read back using minimodem --rx --mark 11000 --space 13000 --file stamped.wav -q 1200.
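
The embedding side is essentially a phase-continuous FSK modulator with standard 8N1 framing. A sketch of the core (not the actual fsk-timestamp.pl), writing raw 16-bit little-endian PCM at 48 kHz to stdout:

    #!/usr/bin/perl
    # sketch: modulate a timestamp as phase-continuous BFSK
    # (8N1 framing: mark idle, space start bit, LSB-first data)
    use strict;
    use warnings;
    use POSIX qw(strftime);

    my ($rate, $baud)  = (48000, 1200);
    my ($mark, $space) = (11000, 13000);
    my $phase = 0;

    sub tone {                        # emit one bit period of a tone
        my ($freq) = @_;
        for (1 .. $rate / $baud) {
            $phase += 2 * 3.14159265 * $freq / $rate;
            print pack 's<', int(12000 * sin $phase);
        }
    }

    my $stamp = strftime '%Y-%m-%dT%H:%M:%SZ', gmtime;
    tone($mark) for 1 .. 10;          # mark preamble
    for my $byte (unpack 'C*', $stamp) {
        tone($space);                                        # start bit
        tone($byte >> $_ & 1 ? $mark : $space) for 0 .. 7;   # data bits
        tone($mark);                                         # stop bit
    }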

A sound sample follows.

Feb 1, 2014

Mystery signal from a helicopter

Last night, YouTube suggested a video for me. It was a raw clip from a news helicopter filming a police chase in Kansas City, Missouri. I quickly noticed a weird interference in the audio, especially the left channel, and thought it must be caused by the chopper's engine. I turned up the volume and realized it's not interference at all, but a mysterious digital signal! And off we went again.

The signal sits alone on the left audio channel, so I can completely isolate it. Judging from the spectrogram, the modulation scheme seems to be BFSK, switching the carrier between 1200 and 2200 Hz. I demodulated it by filtering it with a lowpass and highpass sinc in SoX and comparing outputs. Now I had a bitstream at 1200 bps.
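
The same comparison can also be written out as a per-bit Goertzel energy detector. A naive sketch with no bit synchronization, assuming the channel has been extracted as raw 16-bit little-endian mono PCM resampled to 48 kHz, so one bit is exactly 40 samples:

    #!/usr/bin/perl
    # sketch: slice a 1200 bps Bell 202 style BFSK stream into bits
    # by comparing per-bit energy at 1200 and 2200 Hz (Goertzel)
    use strict;
    use warnings;

    my ($rate, $baud) = (48000, 1200);
    my $n = $rate / $baud;             # samples per bit

    sub goertzel {                     # energy at $freq in @$samples
        my ($samples, $freq) = @_;
        my $w = 2 * cos(2 * 3.14159265 * $freq / $rate);
        my ($s0, $s1, $s2) = (0, 0, 0);
        for my $x (@$samples) {
            $s0 = $x + $w * $s1 - $s2;
            ($s2, $s1) = ($s1, $s0);
        }
        return $s1 * $s1 + $s2 * $s2 - $w * $s1 * $s2;
    }

    while (read(STDIN, my $buf, 2 * $n) == 2 * $n) {
        my @chunk = unpack 's<*', $buf;
        print goertzel(\@chunk, 1200) > goertzel(\@chunk, 2200) ? 1 : 0;
    }
    print "\n";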

The bitstream consists of packets of 47 bytes each, synchronized by start and stop bits and separated by repetitions of the byte 0x80. Most bits stay constant during the video, but three distinct groups of bytes contain varying data, marked blue below:

What could it be? Location telemetry from the helicopter? Information about the camera direction? Video timestamps?

The first guess seems to be correct. It is supported by the relationship between two of the three byte groups. If the first 4 bits of each byte are ignored, the data forms a smooth gradient of three-digit numbers in base 10. When plotted parametrically, they form an intriguing winding curve. It is very similar to this plot of the car's position (blue, yellow) along with viewing angles from the helicopter (green), derived from the video by magical image analysis (only the first few minutes shown):

When the received curve is overlaid with the car's location trace, we see that 100 steps on the curve scale corresponds to exactly 1 minute of arc on the map!

Using this relative information, and the fact that the helicopter circled around the police station in the end, we can plot all the received data points in Google Earth to see the location trace of the helicopter:

Update: Apparently the video downlink to the ground was transmitted using a transmitter similar to the Nucomm Skymaster TX, which can send live GPS coordinates. And this is how they seem to do it.

Update 2: Yes, it's 7-bit Bell 202 ASCII. I tried decoding it as 7-bit data earlier, ignoring parity, but must have gotten the bit order wrong! So I just chose a roundabout way and kept looking at the hex. When fully decoded, the stream says:

#L N390386 W09434208YJ
#L N390386 W09434208YJ
#L N390384 W09434208YJ
#L N390384 W09434208YJ
#L N390381 W09434198YJ
#L N390381 W09434198YJ
#L N390379 W09434188YJ

These are the full lat/lon pairs of coordinates (39° 3.86′ N, 94° 34.20′ W). Nucomm says the system enables viewing the helicopter "on a moving map system". Also, it could enable the receiving antenna to be locked onto the helicopter's position, to allow uninterrupted video downlink.
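
Parsing these sentences is then straightforward; a sketch, with the field layout inferred from the samples above (degrees, then minutes in hundredths or thousandths) and the hemispheres hard-coded:

    #!/usr/bin/perl
    # sketch: parse "#L N390386 W09434208YJ" style sentences
    # into decimal degrees
    use strict;
    use warnings;

    while (<STDIN>) {
        next unless /#L N(\d{2})(\d{4}) W(\d{3})(\d{5})/;
        my $lat =   $1 + $2 / 100  / 60;     # minutes in hundredths
        my $lon = -($3 + $4 / 1000 / 60);    # minutes in thousandths
        printf "%.5f, %.5f\n", $lat, $lon;   # e.g. 39.06433, -94.57013
    }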

Thanks to all the readers for additional hints!

Jan 13, 2014

Misleading representations of discrete-time signals

Many audio editor tools draw digitally sampled signals in ways that can be misleading or sometimes downright incorrect. I hope this post will be useful in explaining this recurring and confusing issue.

In Audacity, let's set the project sample rate to 48 kHz and generate a sine tone at, say, 22.5 kHz. Using a tone near the Nyquist frequency and a high zoom level will make what I'm talking about quite apparent:

It appears as though our newly generated sinusoid is "pumping" or going up and down in amplitude! Of course this can't be true; it's just a pure sine wave and should be perfectly reconstructible as such, according to the sampling theorem. But what's happening here?

The problem lies in drawing straight lines between the samples (known as linear interpolation). This overlooks the fact that the original signal – in this case, coming from Audacity's signal generator – only contained frequencies below the Nyquist frequency. Such sharp corners would have been impossible.

A cleaner, more customary way of drawing discrete-time signals is shown below:

Here, only the samples are shown, and their order is discernible from the lines that connect them to the zero level. But we're not trying to claim anything about what's in between those samples (the value is actually undefined).

The "pumping" pattern we're still seeing emerges from the slowly changing phase difference between the sampling frequency and the signal; in other words, the sinusoid is being sampled at different points.

A more faithful representation of the signal can be obtained by low-pass filtering the waveform shown by Audacity – after resampling at a higher rate – using a 'brick-wall' cut-off at the Nyquist frequency. This is equivalent to sinc interpolation, and it's pretty close to what actually happens in the sound card when the signal is played back. We can immediately see how the weird dot pattern was formed:

Sinc interpolation is computationally more complex (slower) than linear interpolation, which is probably why Audacity and others don't use it. Some audio editors, however, such as Cool Edit, do use better interpolation at high zoom levels.
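
For the curious, this is what direct sinc (Whittaker-Shannon) interpolation looks like in code; practical resamplers use truncated, windowed kernels instead of this brute-force sum over all samples:

    #!/usr/bin/perl
    # sketch: evaluate the band-limited signal described by a few
    # samples at fractional positions (4x oversampling)
    use strict;
    use warnings;

    my $pi = 3.14159265358979;
    sub sinc { my ($x) = @_; $x ? sin($pi * $x) / ($pi * $x) : 1 }

    # toy sample vector
    my @s = (0, 7, 10, 7, 0, -7, -10, -7);

    for (my $t = 0; $t <= $#s; $t += 0.25) {
        my $y = 0;
        $y += $s[$_] * sinc($t - $_) for 0 .. $#s;
        printf "%.2f %7.3f\n", $t, $y;
    }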

See also my related post, Rendering PCM with simulated phosphor persistence.

(And since it's been linked so many times – a great episode of Digital Show & Tell explains this and a lot more about sampled signals.)

Nov 24, 2013

Decoding radio-controlled bus stop displays

In the previous post I wrote about the 16 kbps data stream on FM broadcast frequencies, and my suspicion that it's being used by the bus and tram stop display system here in Helsinki. Now it's time to find out the truth.

I had the opportunity to observe a display stuck in the middle of its bootup sequence, displaying a version string. This revealed that the system is called IBus and it's made by the Swedish company Axentia. Sure enough, their website talks about DARC and how it requires no return channel, making it possible to use battery-powered displays in remote areas.

Not much else is said about the system, though; there are no specs for the proprietary protocol. So I implemented the five-layer DARC protocol stack in Perl and was left with a stream of fully error-corrected packets on top of Layer 5, separated into hundreds of subchannels. Some of these contained human-readable strings with names of terminal stations. They seemed like an easy starting point for reverse engineering.

Here's one such packet after dissection:

Each display seems to store in its memory a list of buses that can be expected to pass the stop, along with their destinations in both official languages. The above "bus & destination packets" are used to update the memory. This is done once a day for each display on a narrow-band subchannel, so that updating all the displays takes the whole day. The mapping of the bus stop ID to actual bus stops is not straightforward, and had to be guessed from the lists of buses, on a case-by-case basis.

A different kind of packet updates the remaining waiting time for each bus in minutes. This "minutes packet" is sent three times per minute for every display in the system.

These packets may contain waiting times for several displays using the same subchannel; the "bus & destination packet" contains an address to the correct minutes slot. A special flag is used to signify an unused slot; another flag indicates that the bus has no working GPS and that the time is an approximation based on timetables. This causes a tilde "~" to appear before the minutes field.

There's also an announcement packet that's used to send text messages to the displays. Often these will be about traffic disruptions and diverted routes. It's up to the displays to decide how to display the message; in the low-end battery-powered ones, the actual minutes function is (annoyingly) disabled for the duration of the message.

I have yet to figure out the meaning of some packet types with non-changing data.

A special subchannel contains test packets with messages such as "Tämä on Mono-Axentia-testiä... Toimiiko tää ees..." ("This is a Mono-Axentia test... Does this even work...") and "Määritykset ja tiedotteet tehty vain Monolla - ei mitään IBus:lla" ("Configurations and announcements made only with Mono - nothing with IBus"), suggesting that they're planning to migrate from IBus to something called Mono. Interestingly, there's also a repeating test message in German: "Bus 61 nach Flughafen aus Haltestelle 1" ("Bus 61 to the airport from stop 1").

What good is it, you ask? Well, who wouldn't want a personal display repeater at home, telling when it's time to go?