Headerless train announcements

[Image: Information display onboard a Helsinki train, showing a transcript of an announcement along with the time of the day, current speed and other info.]

The Finnish state railway company just changed their automatic announcement voice, discarding old recordings from trains. It's a good time for some data dumpster diving for the old ones, don't you think?

What turns up is a 67-megabyte ISO 9660 image that once belonged to an older type of onboard announcement device. It contains a file system of 58 directories with five-digit names, plus one called "yleis" (Finnish for "general").

Each directory contains files with three-digit file names. Every number comes as a set of three files, for example 001.inf, 001.txt and 001.snd. The .inf and .txt files seem to contain parts of announcements as ISO 8859-encoded strings, such as "InterCity train" and "to Helsinki". The .snd files obviously contain the corresponding audio announcements. There's a total of 1950 sound files.

Directory structure

The file system seems to be structurally pointless; there's nothing apparent that differentiates all files in /00104 from files in /00105. Announcements in different languages are numerically separated, though (/001xx = Finnish, /002xx = Swedish, /003xx = English). Track numbers and time readouts are stored sequentially, but there are out-of-place announcements and test files in between. The logic connecting numbers to their meanings is probably programmed into the device for every train route.
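Hypotheses like these are easy to check by tallying the sound files per directory prefix. Here is a minimal sketch, assuming the image has been mounted or extracted to a directory given on the command line (a hypothetical path) and that the first three digits of the directory name encode the language as observed above:

```perl
use strict;
use warnings;
use File::Find;

# Language prefixes as observed in the directory names. Everything else
# (including "yleis") is lumped together as "other".
my %language = ('001' => 'Finnish', '002' => 'Swedish', '003' => 'English');

# Map a directory name such as "00104" to a language label.
sub language_of {
  my ($dir) = @_;
  if ($dir =~ /^(\d{3})\d{2}$/ && exists $language{$1}) {
    return $language{$1};
  }
  return 'other';
}

# Count .snd files per language under the given root directory.
sub count_sounds {
  my ($root) = @_;
  my %count;
  find(sub {
    # ISO 9660 file names may show up in upper case, hence the /i
    return unless -f $_ && /\.snd$/i;
    # The parent directory is the last component of $File::Find::dir
    my ($parent) = $File::Find::dir =~ m{([^/]+)$};
    $count{ language_of($parent // '') }++;
  }, $root);
  return %count;
}

if (@ARGV) {
  my %count = count_sounds($ARGV[0]);
  printf "%-8s %d\n", $_, $count{$_} for sort keys %count;
}
```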

Any announcement can be spliced together from fragments that are almost as short as single words. But many common announcements are also recorded as whole sentences, probably to make them sound more natural.

Audio format

The audio files are headerless; there is no explicit information about the format, sample rate or sample size anywhere.

The byte histogram and Poincaré plot of the raw data suggest a 4-bit sample size; this, along with the fact that all files start with 0x80, is indicative of an adaptive differential PCM encoding scheme.

[Image: Byte histogram and Poincaré plot of a raw audio file, characteristic of Gaussian-distributed data encoded as four-bit samples.]
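Statistics like these take only a few lines to compute. The following is a rough sketch, not the tool used for the plots above: it counts how often each byte value and each 4-bit nibble occurs, and collects the pairs of successive bytes that a Poincaré plot would scatter. The file name is given on the command line.

```perl
use strict;
use warnings;

# Return references to a byte histogram, a nibble histogram and a list
# of [byte n, byte n+1] pairs for a string of raw data.
sub byte_stats {
  my ($data) = @_;
  my @bytes = unpack 'C*', $data;
  my (@byte_hist, @nibble_hist, @pairs);
  $byte_hist[$_]   = 0 for 0 .. 255;
  $nibble_hist[$_] = 0 for 0 .. 15;
  for my $i (0 .. $#bytes) {
    my $byte = $bytes[$i];
    $byte_hist[$byte]++;
    $nibble_hist[$byte >> 4]++;      # high nibble
    $nibble_hist[$byte & 0x0F]++;    # low nibble
    push @pairs, [$byte, $bytes[$i + 1]] if $i < $#bytes;
  }
  return (\@byte_hist, \@nibble_hist, \@pairs);
}

if (@ARGV) {
  open my $in, '<:raw', $ARGV[0] or die "Can't open $ARGV[0]: $!";
  my $data = do { local $/; <$in> };
  my ($byte_hist, $nibble_hist, $pairs) = byte_stats($data);
  printf "nibble %X: %d\n", $_, $nibble_hist->[$_] for 0 .. 15;
}
```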

Unfortunately there are as many variations of ADPCM as there are manufacturers of encoder chips. None of the decoders known to SoX produces clean results. But with the right settings for the OKI-ADPCM decoder we can already hear some garbled speech under heavy Brownian noise.
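For reference, this is roughly what a decoder of the OKI flavor does. It's a sketch of the widely published Dialogic/OKI variant, written from the published description rather than taken from SoX. The nibble order (high nibble first) and the 12-bit output range belong to that variant and are only assumptions as far as the announcement device is concerned.

```perl
use strict;
use warnings;

# Step sizes and index adjustments of the Dialogic/OKI ADPCM variant.
my @step = (
    16,  17,  19,  21,   23,   25,   28,   31,   34,  37,  41,  45,  50,
    55,  60,  66,  73,   80,   88,   97,  107,  118, 130, 143, 157, 173,
   190, 209, 230, 253,  279,  307,  337,  371,  408, 449, 494, 544, 598,
   658, 724, 796, 876,  963, 1060, 1166, 1282, 1411, 1552,
);
my @adjust = (-1, -1, -1, -1, 2, 4, 6, 8);

# Decode a string of packed 4-bit codes into a list of 12-bit samples.
sub decode_oki {
  my ($data) = @_;
  my ($sample, $index) = (0, 0);
  my @out;
  for my $byte (unpack 'C*', $data) {
    for my $code ($byte >> 4, $byte & 0x0F) {    # high nibble first
      my $s    = $step[$index];
      my $diff = $s >> 3;
      $diff += $s      if $code & 4;
      $diff += $s >> 1 if $code & 2;
      $diff += $s >> 2 if $code & 1;
      $sample += ($code & 8) ? -$diff : $diff;   # bit 3 is the sign
      $sample = 2047  if $sample > 2047;
      $sample = -2048 if $sample < -2048;
      # Adapt the step size to the magnitude of the code just seen
      $index += $adjust[$code & 7];
      $index = 0  if $index < 0;
      $index = 48 if $index > 48;
      push @out, $sample;
    }
  }
  return @out;
}
```

Small codes shrink the step size and large ones grow it, which is the "adaptive" part. A mismatch in either table between encoder and decoder makes the reconstructed signal wander, which would be consistent with the noise heard here.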

For unknown reasons, the output signal from SoX is spectrum-inverted. Luckily it's trivial to fix (see my previous post on frequency inversion). The pitch sounds roughly natural when a 19,000 Hz sampling rate is assumed. A test tone found in one file comes out as a 1000 Hz sine when the sampling rate is further refined to 18,930 Hz.
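Both steps can be sketched in a few lines. This is one simple way to do it, not necessarily the one from the earlier post: flipping the sign of every other sample multiplies the signal by a tone at half the sampling rate, which maps each frequency f to fs/2 − f. The rate correction assumes that the pitch scales linearly with the assumed sampling rate.

```perl
use strict;
use warnings;

# Mirror the spectrum around fs/4 by negating every other sample.
sub invert_spectrum {
  my @samples = @_;
  for my $i (0 .. $#samples) {
    $samples[$i] = -$samples[$i] if $i % 2;
  }
  return @samples;
}

# Given the sampling rate assumed for playback, the frequency at which a
# test tone was measured, and the tone's true frequency, return the
# corrected sampling rate.
sub refine_rate {
  my ($assumed_rate, $measured_hz, $true_hz) = @_;
  return $assumed_rate * $true_hz / $measured_hz;
}
```

For example, a tone that should be 1000 Hz but is measured at 1250 Hz with an assumed rate of 20,000 Hz (hypothetical numbers) gives a corrected rate of 16,000 Hz.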

This is what we get after frequency inversion, spectral equalization, and low-pass filtering:

There's still a high noise floor due to the mismatch between OKI-ADPCM and the unknown algorithm used by the announcement device, but it's starting to sound alright!


There seems to be an announcement for every conceivable situation, such as:

  • "Ladies and Gentlemen, as due to heavy snowfall, we are running slightly late. Please accept our apologies."
  • "Ladies and Gentlemen, an animal has been run over by the train. We have to wait a while before continuing the journey."
  • "Ladies and Gentlemen, the arrival track of the train having been changed, the platform is on your left hand side."
  • "Ladies and Gentlemen, we regret to inform you that today the restaurant-car is exceptionally closed."

Also, there is an English recording of most announcements, even though only Finnish and Swedish are usually heard on commuter trains.

One file contains a long instrumental country song.

In an eerily out-of-place sound file, a small child reads out a list of numbers.

Final words

This is something I've wanted to do with this almost melodically intonated announcement about ticket selling compartments.

Time-coding audio files

One day you'll need to include real-time UTC timestamps in audio. It's useful when reconstructing events from long, unsupervised surveillance microphone recordings, or when constantly monitoring and logging radio channels.

There's no standard method for doing this with WAV or FLAC files. One method would be to log the start time in the filename and calculate the time based on audio position. However, this is not possible with voice-activated or squelched recorders. It also relies on the accuracy and stability of the ADC clock.
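For a continuous recording the filename method boils down to one line of arithmetic. A minimal sketch, with hypothetical numbers in the usage note, that also shows how quickly a small clock error adds up:

```perl
use strict;
use warnings;

# UTC time (in epoch seconds) of a given sample, assuming the recording
# is continuous and the ADC runs at exactly the nominal rate.
sub time_at_sample {
  my ($start_epoch, $sample_index, $rate) = @_;
  return $start_epoch + $sample_index / $rate;
}

# Timing error in seconds accumulated over a duration when the ADC clock
# is off by the given number of parts per million.
sub drift_seconds {
  my ($duration_s, $ppm) = @_;
  return $duration_s * $ppm / 1e6;
}
```

With a clock that is off by 100 ppm, drift_seconds(86400, 100) comes to 8.64 seconds per day.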

I'll take a look at some ways to include an accurate timestamp in-band, directly in the audio.

Least significant bit

Time information can be encoded in the least significant bit (LSB) of 16-bit PCM samples. This "steganographic" method requires a lossless file format and lossless conversions. The script below truncates all samples of a raw single-channel signed-integer PCM stream to 15 bits and inserts a 20-byte ISO 8601 timestamp in ASCII roughly every second, preceded by a "mark" start bit. The outgoing PCM stream is then sent to SoX for WAV encoding. On playback, the LSB can be zeroed out to get rid of the timestamps. The WAV can also be played as is; the "ticking" sound will be practically inaudible at an amplitude of −96 dB.

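use strict;
use warnings;
use DateTime;

my $snum    = 0;   # samples seen while not writing a timestamp
my $writing = 0;   # true while timestamp bits are being written
my $pos     = 0;   # index of the next timestamp bit
my $code    = "";  # timestamp currently being written

# '|-' opens a pipe for writing; SoX wraps the raw stream in a WAV header.
open my $out, '|-', 'sox -t .raw -e unsigned-integer -b 16 -r 44100 '.
                    '-c 1 - stamped.wav' or die "Can't run sox: $!";
binmode STDIN;
binmode $out;

while (read STDIN, my $sample, 2) {
  $sample = unpack "s", $sample;
  my $bit = 0;
  if ($writing) {
    # Next bit of the timestamp, least significant bit of each byte first
    $bit = (ord(substr $code, $pos >> 3, 1) >> ($pos % 8)) & 1;
    # length() needs its parentheses here: << binds tighter than a
    # named unary operator
    if (++$pos >= length($code) << 3) {
      # The last bit is the MSB of an ASCII character, so it's 0 anyway
      $writing = 0;
      $bit     = 0;
    }
  } elsif ($snum++ % 44100 == 0) {
    # Start a new timestamp with a "mark" start bit
    $writing = 1;
    $pos     = 0;
    $bit     = 1;
    $code    = DateTime->now()->iso8601();
  }
  # Convert to unsigned (offset 0x8000) and replace the LSB
  print $out pack "S", ($sample + 0x8000) & 0xFFFE | $bit;
}
close $out;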

Note that the start bit of the timestamp marks the moment the sample reached this script, which could differ by hundreds of milliseconds from the actual moment of reception at the microphone. Also, the timestamp does not mark the start of a second, but is rather timed by an arbitrary sample counter. One could also poll and write the timestamps in a continuous manner.

The above script could be modified to interface with my squelch script, by only inserting timestamps when squelch is not active. The resulting audio could then be efficiently encoded as FLAC.

lsb-time-read.pl reads back the timestamps, also printing the sample position of each. Below is a sound sample of a clean signal followed by a timestamped one.
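The reading side is simple as well. The following is a sketch of how such a reader could work, not the actual lsb-time-read.pl. It assumes the framing used by the script above (a start bit of 1 followed by the timestamp bits, least significant bit of each character first) and takes the timestamp length in characters as a parameter.

```perl
use strict;
use warnings;

# Scan a list of sample values and return [sample position, timestamp]
# pairs. $length is the number of characters per timestamp.
sub read_timestamps {
  my ($length, @samples) = @_;
  my @found;
  my $i = 0;
  while ($i < @samples) {
    if (($samples[$i] & 1) == 0) { $i++; next; }   # idle, keep looking
    my $start = $i++;                              # start bit found
    last if $i + 8 * $length > @samples;           # truncated timestamp
    my $text = '';
    for my $char (1 .. $length) {
      my $value = 0;
      $value |= ($samples[$i++] & 1) << $_ for 0 .. 7;
      $text .= chr $value;
    }
    push @found, [$start, $text];
  }
  return @found;
}

# Usage: feed raw signed 16-bit samples on STDIN and give the timestamp
# length as the only argument.
if (@ARGV) {
  binmode STDIN;
  my $raw = do { local $/; <STDIN> };
  printf "%d %s\n", @$_ for read_timestamps($ARGV[0], unpack 's*', $raw);
}
```

The raw samples could come from something like sox stamped.wav -t raw -e signed-integer -b 16 - piped into the script, with the length set to however many characters the writer emitted.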

Lossy-friendly approach

Lossy compression, by definition, does not retain the numeric values of samples, so they can't be treated as bit fields. Instead, we can use an analog modulation scheme like binary FSK. MP3 and Ogg Vorbis encoders will, at a reasonable bit rate, retain the structure of a sufficiently slow FSK burst. This method will work even if the timestamping phase is followed by an analog conversion.

Using the ultrasonic part of the spectrum comes to mind, but unfortunately such high frequencies are mostly removed by the low-pass filter in the encoder. However, if the recording consists of narrow-band speech, we can use the upper end of the remaining spectrum and filter it out afterwards. In the case of squelched conversation, we could write the timestamp only at the beginning of each transmission. This way it could even be in the speech frequencies.

fsk-timestamp.pl embeds the timestamps into PCM data; they can be read back using minimodem --rx --mark 11000 --space 13000 --file stamped.wav -q 1200.
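The modulation itself can be sketched in a few lines. This is not the actual fsk-timestamp.pl, only a minimal illustration. It assumes the mark and space frequencies and the 1200 baud rate from the minimodem command above, a 44,100 Hz sampling rate, and 8-N-1 framing (one start bit, eight data bits LSB first, one stop bit), which as far as I know is what minimodem expects by default. The amplitude is an arbitrary choice.

```perl
use strict;
use warnings;

# Return a list of samples containing $text as a phase-continuous
# binary FSK burst.
sub fsk_burst {
  my ($text, %opt) = @_;
  my $rate  = $opt{rate}  // 44100;
  my $baud  = $opt{baud}  // 1200;
  my $mark  = $opt{mark}  // 11000;   # frequency of a 1 bit
  my $space = $opt{space} // 13000;   # frequency of a 0 bit
  my $amp   = $opt{amp}   // 3000;
  my $two_pi = 8 * atan2(1, 1);

  # Frame each character: start bit (0), 8 data bits LSB first, stop bit (1)
  my @bits;
  for my $char (unpack 'C*', $text) {
    push @bits, 0, (map { ($char >> $_) & 1 } 0 .. 7), 1;
  }

  my $phase = 0;
  my @out;
  for my $n (0 .. $#bits) {
    my $freq = $bits[$n] ? $mark : $space;
    # Bit boundaries are computed from the bit index so that rounding
    # errors don't accumulate when $rate / $baud is not an integer
    my $first = int( $n      * $rate / $baud);
    my $last  = int(($n + 1) * $rate / $baud) - 1;
    for ($first .. $last) {
      push @out, int($amp * sin($phase));
      $phase += $two_pi * $freq / $rate;   # phase carries over between bits
    }
  }
  return @out;
}
```

The burst would then be added sample by sample to the recorded PCM at the point where the timestamp belongs.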

A sound sample follows.