
Using HDMI EMI for fast wireless data transfer

This story, too, begins with noise. I was browsing the radio waves with a software radio, looking for mysteries to accompany my ginger tea. I had started to notice a wide-band spiky signal on a number of frequencies that only seemed to appear indoors. Some sort of interference from electronic devices, probably. Spoiler alert, it eventually led me to broadcast a webcam picture over the radio waves... but how?

It sounds like video

The mystery deepened when I listened to what this interference sounded like as an AM signal. It reminded me of a time I mistakenly plugged our home stereo system into the Nintendo console's video output and heard a very similar buzz.

Am I possibly listening to video? Why would anyone be transmitting analog video on any frequency, let alone inside my home?

[Image: Oscillogram of a noisy waveform that seems to have a pulse every 10 microseconds or so.]

If we plot the signal's amplitude against time we can see that there is a strong pulse exactly 60 times per second. This could be the vertical synchronisation signal of 60 Hz video. A shorter pulse (pictured above) can be seen repeating more frequently; it could be the horizontal one. Between these pulses there is what appears to be noise. Maybe, if we use the strong pulses for synchronisation and plot the amplitude of that noise as a two-dimensional picture, we could see something?
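Here's a minimal Python sketch of that rastering idea, assuming an 8-bit interleaved IQ capture from an RTL-SDR; the filename, sample rate, and line rate below are my placeholder guesses, not values from the actual experiment.

import numpy as np

FS = 2_048_000                    # receiver sample rate (Hz), assumed
LINE_RATE = 100_000.0             # guessed from the ~10 us pulse spacing
LINES_PER_FRAME = int(round(LINE_RATE / 60))

raw = np.fromfile("capture.cu8", dtype=np.uint8)    # hypothetical rtl_sdr dump
iq = (raw[0::2] - 127.5) + 1j * (raw[1::2] - 127.5)
amp = np.abs(iq)

spl = FS / LINE_RATE              # samples per line (not an integer)
nlines = int(len(amp) / spl)
# Index each line at fractional offsets so the picture doesn't skew.
idx = (np.arange(nlines)[:, None] * spl + np.arange(int(spl))[None, :]).astype(int)
img = amp[idx][:LINES_PER_FRAME]  # one frame's worth of lines, ready to plot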

And sure enough, when main screen turn on, we get signal:

[Image: A grainy greyscale image of what appears to be a computer desktop.]

(I've hidden the bright synchronisation signal from this picture.)

It seems to be my Raspberry Pi's desktop with weirdly distorted greyscale colours! Somehow, some part of the monitor setup is radiating it quite loudly into the aether. The frequency I'm listening to is a multiple of the monitor's pixel clock frequency.

As it turns out, this vulnerability of some monitors has been known for a long time. In 1985, van Eck demonstrated how CRT monitors can be spied on from a distance[1]; and in 2004, Markus Kuhn showed that the same still works on flat-screen monitors[2]. The image is heavily distorted, but some shapes and even larger text can still be recognised. Sometimes this kind of eavesdropping is called "van Eck phreaking", in reference to phone phreaking.

The next thought was, could we get any more information out of these images? Is there any information about colour?

Mapping all the colours

HDMI is fully digital; there is no linear dependency between pixel values and greyscale brightness in this amplitude image. I believe the brightness in the above image is related to the number of bit transitions over my radio's sampling time (which is around 8 bit-lengths); and in HDMI, this depends on many things, not just the actual RGB value of the pixel. HDMI also uses multiple differential wires that are all transmitting their own picture channels side by side.

This is why I don't think it's possible to easily reconstruct a clear picture of what's being shown on the screen, let alone decode any colours.

But could the reverse be possible? Could we control this phenomenon to draw the greyscale pictures of our choice on the receiver's screen? How about sending binary data by displaying alternating pixel values on the monitor?

[Image: On the left, gradients of red, green, and blue; on the right, greyscale lines of seemingly unrelated brightness.]

My monitor uses 16-bit colours. There are "only" 65,536 different colours, so it's possible to go through all of them and see how each appears in the receiver. But it's not that simple; the bit-pattern of an HDMI pixel can actually get modified based on what came before it. And my radio isn't fast enough to even tell the bits apart anyway. What we could do is fill entire lines with one colour and average the received signal strength. We would then get a mapping for single-colour horizontal streaks (above). Assuming a long run of the same colour always produces the same bitstream, this could be good enough.
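The sweep itself is simple to script. A sketch in Python, where each displayed frame fills its rows with consecutive RGB565 codes while the receiver logs the mean amplitude per row; the resolution and helper names are my own assumptions:

import numpy as np

W, H = 1024, 768    # display resolution, assumed

def rgb565_to_rgb888(c):
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

def test_frame(start):
    """One test frame: each row filled with a single RGB565 colour."""
    img = np.zeros((H, W, 3), np.uint8)
    for row in range(H):
        img[row, :] = rgb565_to_rgb888((start + row) & 0xFFFF)
    return img

# Stepping 'start' by H on every displayed frame sweeps all 65,536 codes
# in ceil(65536 / 768) = 86 frames.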

[Image: An XY plot where x goes from 0 to 65536 and Y from 0 to 1.2. A pattern seems to repeat itself every 256 values of x. Values from 16128 to 16384 are markedly higher.]

Here's the map of all the colours and their intensity in the radio receiver. (What happens between 16,128 and 16,384? I don't know.)

Now, we can resample a greyscale image so that its pixels become short horizontal lines. Then, for every greyscale value, we find the closest matching RGB565 colour in the above map. When we display this psychedelic hodge-podge of colour on the screen (on the right), enough of the above mapping seems to be preserved to produce a recognisable picture of a movie[3] on the receiver side (on the left):

[Image: On the right, a monitor shows a noisy green and blue image. On the left, another monitor shows a grainy picture of a man and the text 'Hackerman'.]
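The greyscale-to-colour mapping step might look like this in Python; colour_map.npy is a hypothetical file holding the 65,536 amplitudes measured in the sweep above:

import numpy as np

intensity = np.load("colour_map.npy")   # shape (65536,): amplitude per RGB565 code

# Normalise the map to 0..255 and build a greyscale-to-RGB565 lookup table.
norm = (intensity - intensity.min()) / np.ptp(intensity) * 255.0
lut = np.array([np.abs(norm - g).argmin() for g in range(256)], np.uint16)

def encode(grey):
    """Map a (H, W) uint8 greyscale frame to RGB565 codes for display."""
    return lut[grey]   # expand with rgb565_to_rgb888() from earlier to show it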

These colours are not constant in any way. If I move the antenna around, even if I turn it from vertical to horizontal, the greyscales will shift or even get inverted. If I tune the radio to another harmonic of the pixel clock frequency, the image seems to break down completely. (Are there more secrets to be unfolded in these variations?)

The binary exfiltration protocol

Now we should have enough information to be able to transmit bits. Maybe even big files and streaming data, depending on the bitrate we can achieve.

First of all, how should one bit be encoded? The absolute brightness will fluctuate depending on radio conditions, so I decided to encode bits as the brightness difference between two short horizontal lines: a positive difference means 1 and a negative one means 0. This should stay fairly constant, unless the colours completely flip around, that is.

[Image: When a bit is 0, the leftmost line is darker than the rightmost line, and vice versa. These lines are used to form 768-bit packets.]
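In code the scheme is almost embarrassingly simple. The two RGB565 codes below are stand-ins; any pair with a large, stable difference in the measured colour map should do:

BRIGHT, DARK = 0x3F00, 0x0042   # example codes far apart in the colour map

def encode_bit(bit):
    # A bit is a pair of adjacent line segments; their order carries the value.
    return (DARK, BRIGHT) if bit else (BRIGHT, DARK)

def decode_bit(amp_first, amp_second):
    # Receiver side: compare the averaged amplitudes of the two segments.
    return 1 if amp_second > amp_first else 0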

The monitor has 768 pixels vertically. This is a nice number, so I designed a packet that runs vertically across the display. (This proved to be a bad decision, as we will see later.) We can stack as many packets side by side as the monitor width allows. A new batch of packets can be displayed in each frame, or we can repeat them over multiple frames to improve reliability.

These packets should have some metadata, at least a sequence number. Our medium is also quite noisy, so we need some kind of forward error correction. I'm using a Hamming(12,8) code which adds 4 error correction bits for every 8 bits of data. Finally, we need to add a CRC to each packet so we can make sure it arrived intact; I chose CRC16 with the polynomial 0x8005 (just because liquid-dsp provided it by default).
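Hamming(12,8) fits in a few lines of Python. This is the textbook construction with even parity bits at the power-of-two positions; the bit ordering the actual packets use is my own choice here:

def hamming_12_8_encode(byte):
    """8 data bits in, 12-bit codeword out."""
    data = [(byte >> i) & 1 for i in range(8)]
    cw = [0] * 13                      # 1-indexed positions 1..12
    for p, bit in zip([3, 5, 6, 7, 9, 10, 11, 12], data):
        cw[p] = bit
    for k in (1, 2, 4, 8):             # parity lives at the powers of two
        cw[k] = sum(cw[p] for p in range(1, 13) if p & k) & 1
    return cw[1:]

def hamming_12_8_correct(cw):
    """Return (corrected codeword, syndrome); a nonzero syndrome is the
    position of a single flipped bit."""
    bits = [0] + list(cw)
    syn = sum(k for k in (1, 2, 4, 8)
              if sum(bits[p] for p in range(1, 13) if p & k) & 1)
    if 1 <= syn <= 12:
        bits[syn] ^= 1
    return bits[1:], syn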

First results!

It was quite unbelievable: I was able to transmit a looping 64 kbps audio stream almost without any glitches, with the monitor and the receiver in the same room, approximately 2 meters from each other.

Quick tip. Raw 8-bit PCM audio is a nice test format for these kinds of streaming experiments. It's straightforward to set an arbitrary bitrate by resampling the sound (with SoX for instance); there's no structure, headers, or byte order to deal with; and any packet loss, misorder, or buffer underrun is instantly audible. You can use a headerless companding algorithm like A-law to fit more dynamic range in 8 bits. Even stereo works; if you start from the wrong byte the channels will just get swapped. SoX can also play back the stream.
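For instance, here's a self-contained Python stand-in for that workflow: ten seconds of a 440 Hz test tone written as headerless A-law bytes, 8,000 samples per second times 8 bits giving the same 64 kbps as the stream above. (This uses the plain A-law formula, ignoring G.711's exact byte layout.)

import numpy as np

def alaw_encode(x, A=87.6):
    """Compress samples in [-1, 1] to unsigned 8-bit A-law."""
    ax = np.abs(x)
    y = np.where(ax < 1 / A,
                 A * ax / (1 + np.log(A)),
                 (1 + np.log(np.clip(A * ax, 1, None))) / (1 + np.log(A)))
    return (np.sign(x) * y * 127.5 + 127.5).astype(np.uint8)

FS = 8000
t = np.arange(FS * 10) / FS
alaw_encode(0.8 * np.sin(2 * np.pi * 440 * t)).tofile("test.alaw")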

But can we get more? Slowly I added more samples per second, and a second audio channel. Suddenly we were at 256 kbps and still running smoothly. 200 kbps was even possible from the adjacent room, with a directional antenna 5 meters away, and with the door closed! In the same room, it worked up to around 512 kilobits per second but then hit a wall.

[Image: Info window that says HasPreamble: 1. Total: 928.5 kbps, Fresh: 853.6 kbps, Fresh (payload): 515.7 kbps.]

A tearful performance

The heavy error correction and framing adds around 60% of overhead, and we're left with 480 bits of 'payload' per packet. If we have 39 packets per frame at 60 frames per second we should get more than a megabit per second, right? But for some reason it always caps at half a megabit.

The reason revealed itself when I noticed every other frame was often completely discarded at the CRC check. Of course; I should have thought of properly synchronising the screen update to the graphics adapter's frame update cycle (or VSYNC). This would prevent the picture information changing mid-frame, also known as tearing. But whatever options I tried with the SDL library I couldn't get the Raspberry Pi 4 to not introduce tearing.

Screen tearing appears to be an unsolved problem plaguing the Raspberry Pi 4 specifically. I tried another mini computer, the Asus Tinker Board R2.0, but I couldn't get the graphics drivers to work properly. I then realised it was a mistake to have the packets run from top to bottom; any horizontal tearing will cut every single packet in half! With a horizontal design, only one packet per frame would suffer this fate.

A new design enables video-over-video

Packets that run horizontally across the screen indeed fix most of the packet loss. It may also help with CPU load as it improves memory access locality. I'm now able to get 1000 kbps from the monitor! What could this be used for? A live video stream, perhaps?

But the clock was ticking. I had a presentation coming up and I really wanted to amaze everyone with a video transfer demo. I quite literally got it working on the morning of the event. For simplicity, I decided to go with MJPEG, even though fancier schemes could compress way more efficiently. The packet loss issues are mostly kept at bay by repeating frames.

The data stream is "hidden" in a Windows desktop screenshot; I'm changing the colours in a way that both creates a readable bit and also looks inconspicuous when you look from far away.

Mitigations

This was a fun project, but this kind of vulnerability could, in the tinfoiliest of situations, be used for exfiltrating information out of a supposedly airgapped computer.

The issue has been alleviated in some modern display protocols. DisplayPort[4] makes use of scrambling: a pseudorandom sequence of bits is mixed with the bitstream to remove the strong clock oscillations that are so easily radiated out. This also randomizes the bitstream-to-amplitude correlation. I haven't personally tested whether DisplayPort monitors still leak some kind of video in their radio interference, though. (Edit: Scrambling seems to be optionally supported by later versions of HDMI, too – but it might depend on which features exactly the two devices negotiate. How could you know if it's turned on?)

[Image: A monitor completely wrapped in tinfoil, with the text IMPRACTICAL written over it.]

I've also tried wrapping the monitor in tinfoil (very impractical) and inside a cage made out of chicken wire (it had no effect - perhaps I should have grounded it?). I can't recommend either of these.

Software considerations

This project was made possible by at least C++, Perl, SoX, ImageMagick, liquid-dsp, Dear Imgui, GLFW, turbojpeg, and v4l2! If you're a library that feels left out, please leave a comment.

If you wish to play around with video emanations, I heard there is a project called TempestSDR. For generic analog video decoding via a software radio, there is TVSharp.

References

  1. Van Eck, Wim (1985): Electromagnetic radiation from video display units: An eavesdropping risk?
  2. Kuhn, Markus (2004): Electromagnetic Eavesdropping Risks of Flat-Panel Displays
  3. KUNG FURY Official Movie [HD] (2015)
  4. Video Electronics Standards Association (2006): DisplayPort Standard, version 1.

Descrambling split-band voice inversion with deinvert

Voice inversion is a primitive method of rendering speech unintelligible to prevent eavesdropping of radio or telephone calls. I wrote about some simple ways to reverse it in a previous post. I've since written a software tool, deinvert (on GitHub), that does all this for us. It can also descramble a slightly more advanced scrambling method called split-band inversion. Let's see how that happens behind the scenes.

Simple voice inversion

Voice inversion works by inverting the audio spectrum at a set maximum frequency called the inversion carrier. Frequencies near this carrier will thus become frequencies near zero Hz, and vice versa. The resulting audio is unintelligible, though familiar sentences can easily be recognized.

Deinvert comes with 8 preset carrier frequencies that can be activated with the -p option. These correspond to a list of carrier frequencies I found in an actual scrambler's manual, dubbed "the most commonly used inversion carriers".

The algorithm behind deinvert can be divided into three phases: 1) pre-filtering, 2) mixing, and 3) post-filtering. Mixing means multiplying the signal by an oscillation at the selected carrier frequency. This produces two sidebands, or mirrored copies of the signal, with the lower one frequency-inverted. Pre-filtering is necessary to prevent this lower sideband from aliasing when its highest components would go below zero Hertz. Post-filtering removes the upper sideband, leaving just the inverted audio. Both filters can be realized as low-pass FIR filters.

[Image: A spectrogram in four steps, where the signal is first cut at 3 kHz, then shifted up, producing two sidebands, the upper of which is then filtered out.]
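A minimal Python sketch of those three phases, assuming a mono float signal; deinvert's actual C++ filter design differs, and real-time use needs streaming rather than whole-file filtering:

import numpy as np
from scipy.signal import firwin, lfilter

def invert(audio, fs, fc, ntaps=127):
    pre = firwin(ntaps, fc, fs=fs)         # 1) pre-filter: keep only 0..fc
    x = lfilter(pre, 1.0, audio)
    t = np.arange(len(x)) / fs
    x = x * np.cos(2 * np.pi * fc * t)     # 2) mixing: creates two sidebands
    post = firwin(ntaps, fc, fs=fs)        # 3) post-filter: drop the upper one
    return 2 * lfilter(post, 1.0, x)       # x2 compensates the halved amplitude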

This operation is its own inverse, like ROT13; by applying the same inversion again we get intelligible speech back. Indeed, deinvert can also be used as a scrambler by just running unscrambled audio through it. The same inversion carrier should be used in both directions.

Split-band inversion

The split-band scrambling method adds another carrier frequency that I call the split point. It divides the spectrum into two parts that are inverted separately and then combined, preventing ordinary inverters from fully descrambling it.

A single filter-inverter pair may already bring back the low end of the spectrum. Descrambling it fully amounts to running the inversion algorithm twice, with different settings for the filters and mixer, and adding the results together.

The problem, then, is to find these two frequencies. Let's take a look at an example: audio scrambled using the CML CMX264 split-band inverter (from a video by GBPPR2).

[Image: A spectrogram showing a narrow band of speech-like harmonics, but with a constant dip in the middle of the band.]

In this case the filter roll-off is clearly visible in the spectrogram and it's obvious where the split point is. The higher carrier is probably at the upper limit of the full band or slightly above it. Here the full bandwidth seems to be around 3200 Hz and the split point is at 1200 Hz. This could be initially descrambled using deinvert -f 3200 -s 1200; if the result sounds shifted up or down in frequency, these parameters can be refined accordingly.

Performance

On a single core of an i7-based laptop from 2013, deinvert processes a 44.1 kHz WAV file at 60x realtime speed (120x for simple inversion). Most of the CPU cycles are spent doing filter convolution, i.e. calculating the signal's vector dot product with the low-pass filter kernels:

[Image: A graph of the time spent in various parts of the call tree of the program, with the subtree leading to the dot product operation highlighted. It takes well over 80 % of the tree.]

For this reason deinvert has a quality setting (0 to 3) for controlling the number of samples in the convolution kernels. A filter with a shorter kernel is linearly faster to compute, but its roll-off is slower, so it will leave more unwanted harmonics.

A quality setting of 0 turns filtering off completely, and is very fast. For simple inversion this should be fine, as long as the original doesn't contain much power above the inversion carrier. It's easy to ignore the upper sideband because of its high frequency. In split-band descrambling this leaves some nasty folded harmonics in the speech band though.

Here's a descramble of the above CMX264 split-band audio using all the different quality settings in deinvert. You will first hear it scrambled, and then descrambled with increasing quality setting.

The default quality level is 2. This should be enough for real-time descrambling of simple inversion on a Raspberry Pi 1, still leaving cycles for an FM receiver for instance:

(RasPi 1)    Simple inversion    Split-band inversion
-q 0         16x realtime        5.8x realtime
-q 1         6.5x realtime       3.0x realtime
-q 2         2.8x realtime       1.3x realtime
-q 3         1.2x realtime       0.4x realtime

The memory footprint is less than four megabytes.

Future developments

There's a variant of split-band inversion where the inversion carrier changes constantly, called variable split-band. The transmitter informs the receiver about this sequence of frequencies via short bursts of data every couple of seconds or so. This data seems to be FSK, but decoding it shall be left for another time.

I've also thought about ways to automatically estimate the inversion carrier frequency. Shifting speech up or down in frequency breaks the relationships of the harmonics. Perhaps this fact could be exploited to find a shift that would minimize this error?


CTCSS fingerprinting: a method for transmitter identification

Identifying unknown radio transmitters by their signals is called radio fingerprinting. It is usually based on rise-time signatures, i.e. characteristic differences in how the transmitter frequency fluctuates at carrier power-up. Here, instead, I investigate the fingerprintability of another feature in hand-held FM transceivers, known as CTCSS or Continuous Tone-Coded Squelch System.

Motivation & data

I came across a long, losslessly compressed recording of some walkie-talkie chatter and wanted to know more about it, things like the number of participants and who's talking with whom. I started writing a transcript – a fun pastime – but some voices sounded so similar I wondered if there was a way to tell them apart automatically.

[Image: Screenshot of Audacity showing an audio file over eleven hours long.]

The file comprises several thousand short transmissions as FM demodulated audio lowpass filtered at 4500 Hz. Signal quality is variable; most transmissions are crisp and clear but some are buried under noise. Passages with no signal are squelched to zero.

I considered several potentially fingerprintable features, many of them unrealistic:

  • Carrier power-up; but many transmissions were missing the very beginning because of squelch
  • Voice identification; but it would probably require pretty sophisticated algorithms (too difficult!) and longer samples
  • Mean audio power; but it's not consistent enough, as it depends on text, tone of voice, etc.
  • Maximum audio power; but it's too sensitive to peaks in FM noise

I then noticed all transmissions had a very low tone at 88.5 Hz. It turned out to be CTCSS, an inaudible signal that enables handsets to silence unwanted transmissions on the same channel. This gave me an idea inspired by mains frequency analysis: Could this tone be measured to reveal minute differences in crystal frequencies and modulation depths? Also, knowing that these were recorded using a cheap DVB-T USB stick – would it have a stable enough oscillator to produce consistent measurements?

Measurements

I used the liquid-dsp library for signal processing. It has several methods for measuring frequencies. I decided to use a phase-locked loop, or PLL; I could have also used FFT with peak interpolation.

In my fingerprinting tool, the recording is first split into single transmissions. The CTCSS tone is bandpass filtered and a PLL starts tracking it. When the PLL frequency stops fluctuating, i.e. the standard deviation is small enough, it's considered locked and its frequency is averaged over this time. The average RMS power is measured similarly.
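The gist of the frequency measurement, as a small Python PLL standing in for the liquid-dsp object the real tool uses; the loop gains are ballpark figures, and the double-frequency ripple inherent to a real-valued input is what the averaging smooths out:

import numpy as np

def pll_track(x, fs, f0=88.5, bw=2.0):
    """Track a tone near f0 (Hz); returns a per-sample frequency estimate."""
    phase = 0.0
    freq = 2 * np.pi * f0 / fs             # NCO frequency, rad/sample
    alpha = 2 * np.pi * bw / fs            # proportional gain
    beta = alpha * alpha / 4.0             # integral gain
    est = np.empty(len(x))
    for i, s in enumerate(x):
        err = s * -np.sin(phase)           # phase detector against the NCO
        freq += beta * err
        phase += freq + alpha * err
        est[i] = freq * fs / (2 * np.pi)   # back to Hz
    return est   # a low standard deviation over a window means 'locked'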

Here's one such transmission:

[Image: A graph showing frequency and power, first fluctuating but then both stabilize for a moment, where text says 'PLL locked'. Caption says 'No, I did not copy'.]

Results

At least three clusters are clearly distinguishable by eye. Zooming in to one of the clusters reveals it's made up of several smaller clusters. Perhaps the larger clusters correspond to three different models of radios in use, and these smaller ones are the individual transmitters?

[Image: A plot of RMS power versus frequency, with dots scattered all over, but mostly concentrated in a few clusters.]

A heat map reveals even more structure:

[Image: The same clusters presented in a gradual color scheme and numbered from 1 to 12.]

It seems at least 12 clusters, i.e. potential individual transmitters, can be distinguished.

Even though most transmissions are part of some cluster, there are many outliers as well. These appear to correspond to very noisy or very short transmissions. (Could the FFT have produced better results with these?)

Use as transcription aid

My goal was to make these fingerprints useful as labels aiding transcription. This way, a human operator could easily distinguish parties of a conversation and add names or call signs accordingly.

I experimented with automated k-means clustering, but that didn't immediately produce appealing results. Then I manually assigned 12 anchor points at apparent cluster centers and had a script calculate the nearest anchor point for all transmissions. Prior to distance calculations the axes were scaled so that the data seemed uniformly distributed around these points.
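The nearest-anchor step itself is a few lines of NumPy; the axis scales and anchor coordinates come from eyeballing the heat map:

import numpy as np

def label(points, anchors, scale):
    """Assign each (frequency, power) point to its nearest anchor,
    after scaling the axes to comparable units."""
    p = np.asarray(points) / scale
    a = np.asarray(anchors) / scale
    d = np.linalg.norm(p[:, None, :] - a[None, :, :], axis=2)
    return d.argmin(axis=1)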

This automatic labeling proved quite sensitive to errors. It could be useful when listing possible transmitters for an unknown transmission with no context; distances to previous transmissions positively mentioning call signs could be used. Instead I ended up printing the raw coordinates and colouring them with a continuous RGB scale:

[Image: A few lines from a conversation between Boa 1 and Cobra 1. Numbers in different colors are printed in front of each line.]

Here the colours make it obvious which party is talking. Call signs written in a darker shade are deduced from the context. One sentence, most probably by "Cobra 1", gets lost in noise and the RMS power measurement becomes inaccurate (463e-6). The PLL frequency is still consistent with the conversation flow, though.

Countermeasures

If CTCSS is not absolutely required in your network, i.e. there are no unwanted conversations on the frequency, then it can be disabled to prevent this type of fingerprinting. In Motorola radios this is done by setting the CTCSS code to 0. (In the menus it may also be called a PT code or Interference Eliminator code.) In many other consumer radios it doesn't seem to be that easy.

Conclusions

CTCSS is a suitable signal for fingerprinting transmitters, reflecting minute differences in crystal frequencies and, possibly, FM modulation indices. Even a cheap receiver can recover these differences. It can be used when the signal is already FM demodulated or otherwise not suitable for more traditional rise-time fingerprinting.

Mapping microwave relay links from video

Radio networks are often at least partially based on microwave relay links. They're those little mushroom-like appendices growing out of cell towers and building-mounted base stations. Technically, they're carefully directed dish antennas linking such towers together over a line-of-sight connection. I'm collecting a little map of nearby link stations, trying to find out how they're interconnected and which network they belong to.

Circling around

We can find a rough direction for any link antenna by approximating a tangent for the dish shroud surface from position-stamped video footage taken while circling the tower. Optimally we would have a drone make a full circle around the tower at a constant distance and elevation to map all antennas at once; but if our DJI Phantom has run out of battery, a GPS positioned still camera at ground level will also do.

[Image: Five photos of the same directional microwave antenna, taken from different angles, and edge-detection and elliptical Hough transform results from each one, with a large and small circle for all ellipses.]

The rest can be done manually, or using Hough transform and centroid calculation from OpenCV. In these pictures, the ratio of the diameters of the concentric circles is a sinusoidal function of the angle between the antenna direction and the camera direction. At its maximum, we're looking straight at the beam. (The ratio won't max out at unity in this case, because we're looking at the antenna slightly from below.) We can select the frame with the maximum ratio from high-speed footage, or we can interpolate a smooth sinusoid to get an even better value.

[Image: Diagram showing how the ratio of the diameters of the large and small circle is proportional to the angle of the antenna in relation to the camera.]
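The interpolation can be an ordinary least-squares fit of a sinusoid. A sketch, with made-up measurements that roughly match the antenna below:

import numpy as np

def antenna_azimuth(camera_az_deg, ratios):
    """Fit ratio(az) = A*cos(az - az0) + C and return az0, the azimuth
    where the ratio peaks, i.e. where we look straight down the beam."""
    az = np.radians(np.asarray(camera_az_deg))
    M = np.column_stack([np.cos(az), np.sin(az), np.ones_like(az)])
    (a, b, _), *rest = np.linalg.lstsq(M, np.asarray(ratios), rcond=None)
    return np.degrees(np.arctan2(b, a)) % 360

# antenna_azimuth([250, 270, 290, 310, 330],
#                 [0.62, 0.80, 0.86, 0.79, 0.60]) comes out near 290.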

This particular antenna is pointing west-northwest with an azimuth of 290°.

What about distance?

Because of the line-of-sight requirement, we also know the maximum possible distance to the linked tower, using the formula 7140 × √(4/3 × h), where h is the height of the antenna from the ground in metres and the result is also in metres. If the beam happens to hit a previously mapped tower closer than this distance, we can assume they're connected!
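Or as a throwaway Python helper, with the height in metres:

from math import sqrt

def max_los_km(h_m):
    # Radio horizon with a 4/3 effective Earth radius, in kilometres.
    return 7140 * sqrt(4 / 3 * h_m) / 1000

# For example, max_los_km(34) gives about 48 km.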

This antenna is communicating to a tower not further away than 48 km. Judging from the building it's standing on, it belongs to a government trunked radio network.

Broadcast messages on the DARC side

By now I've rambled quite a lot about RDS, the data subcarrier on the third harmonic of the 19 kHz FM stereo pilot tone. And we know the second harmonic carries the stereo information. But is there something beyond that?

Using a wideband FM receiver, like an RTL-SDR, I can plot the whole demodulated spectrum of any station, say, from baseband to 90 kHz. Most often there is nothing but silence above the third pilot harmonic (denoted 3f here), which is the RDS subcarrier. But two local stations have this kind of a peculiar spectrogram:

[Image: A spectrogram showing a signal at the audible frequency range, labeled 'mono', and four carriers centered at 19, 38, 57, and 76 kHz, labeled pilot, 2f, 3f, and 4f, respectively. Pilot is a pure sinusoid; 2f and 3f are several kHz wide signals with mirrored sidebands; and 4f is 20 kHz wide and resembles wideband noise.]

There seems to be lots more data on a pretty wide band centered on the fourth harmonic (4f)!

As so often with mysterious persistent signals, I got a lead by googling for the center frequency (76 kHz). So, meet the imaginatively named Data Radio Channel, or DARC for short. This 16,000 bps data stream uses level-controlled minimum-shift keying (L-MSK), which can be thought of as a kind of offset-quadrature phase-shift keying (O-QPSK) where consecutive bits are sent alternating between the in-phase and quadrature channels.

To find out more, I'll need to detect the signal first. Detecting L-MSK coherently is non-trivial ("tricky") for a hobbyist like me. So I'm going to cheat a little and treat the signal as continuous-phase but non-coherent binary FSK instead. This should give us good data, even though the bit error probability will be suboptimally high. I'll use a band-pass filter first; then a splitter filter to split the signal band into mark and space; detect both envelopes and calculate difference; lowpass filter with a cutoff at the bitrate; recover bit timing and synchronize using a PLL; and now we can just take the bits out by hard decision at bit middle points.
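Sketched in Python, the chain up to the hard decisions might look like the following; the demodulated baseband has to be sampled fast enough to reach 84 kHz, the tone spacing of half the bitrate follows from MSK, and the filter lengths are illustrative:

import numpy as np
from scipy.signal import firwin, lfilter

def darc_fsk_demod(x, fs, fc=76000, bitrate=16000):
    """Treat DARC's L-MSK as non-coherent binary FSK and return a soft
    bit signal; slice its sign at bit centres for hard decisions."""
    t = np.arange(len(x)) / fs
    bp = firwin(255, [fc - bitrate, fc + bitrate], pass_zero=False, fs=fs)
    x = lfilter(bp, 1.0, x)                 # band-pass around the subcarrier

    def envelope(f):                        # mix one tone to DC, take magnitude
        lp = firwin(255, bitrate / 2, fs=fs)
        return np.abs(lfilter(lp, 1.0, x * np.exp(-2j * np.pi * f * t)))

    mark = envelope(fc + bitrate / 4)       # 'splitter': mark and space bands
    space = envelope(fc - bitrate / 4)
    soft = mark - space                     # envelope difference
    return lfilter(firwin(63, bitrate, fs=fs), 1.0, soft)  # bitrate low-pass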

Oh yeah, we have a bitstream!

[Image: Three oscillograms followed by a stream of 1s and 0s. The first oscillogram is quite nondescript; the second one actually shows two waveforms, red and blue, in the same graph, with the red dominating in envelope power where blue is suppressed and vice versa. The third oscillograms shows a graph apparently following their envelope power difference, with sample points at regular intervals. The sign of this plot at sample points dictates whether a 0 or 1 is shown below that sample.]

Decoder software for DARC does not exist, according to search engines. To read the actual data frames, I'll have to acquire block synchronization, regenerate the pseudorandom scrambler used and reverse its action, check the data against CRCs for errors, and implement the stack of layers that make up the protocol. The DARC specification itself is luckily public, albeit not very helpfully written; in contrast to the RDS standard, it contains almost no example circuits, data, or calculations. So, a little recap of Galois field mathematics and linear feedback shift registers for me.
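The descrambler itself is just an additive LFSR once the right polynomial and seed are known; those are exactly the parts I may still have wrong, so the taps and seed below are placeholders:

def lfsr_pn(taps, seed, n):
    """n bits from a Fibonacci LFSR; taps are 1-indexed bit positions."""
    state, nbits, out = seed, max(taps), []
    for _ in range(n):
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out.append(state & 1)
        state = (state >> 1) | (fb << (nbits - 1))
    return out

def descramble(bits, taps=(9, 5), seed=0x1FF):   # placeholder polynomial!
    # XORing the same pseudorandom sequence over the bits undoes the scrambler.
    return [b ^ p for b, p in zip(bits, lfsr_pn(taps, seed, len(bits)))]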

But what does it all mean? And who's listening to it?

Now, it's no secret (Signaali 3/2011, p. 16) that the HSL bus stop timetable displays in Helsinki get their information about the next arriving GPS-positioned buses through this very FM station, YLE 1. That's the only thing they're telling, though. I haven't found anything bus-related in the RDS data, so it's quite likely that they're listening to DARC.

[Image: Photo of a rugged LCD display mounted on a metal pole, displaying the text '55K FORSBY ~6' and stamped with the logo 'HSL HRT'.]

DARC is known to be used in various other applications as well, such as DGPS correction signals. So decoding it could prove interesting.

Sniffing the packet types being sent, it seems that quite a big part of the time is used to send a transmit cycle schedule (COT and SCOT). And indeed, the aforementioned magazine article hints that the battery-powered displays use a precise radio schedule to save power, each receiving only during a short period every minute. (This grep only lists metadata type packets, not the actual data.)

$ perl darcdec-filters.pl | grep Type: | sort | uniq -c
  88 Type: 0 Channel Organization Table (COT)
   8 Type: 1 Alternative Frequency Table (AFT)
   8 Type: 2 Service Alternative Frequency Table (SAFT)
   1 Type: 4 Service Name Table (SNT)
   8 Type: 5 Time and Date Table (TDT)
 112 Type: 6 Synchronous Channel Organization Table (SCOT)
$ █

At the moment I'm only getting sensible data out at the very beginning of each packet (or "block"). I do get a solid, error-free block sync but, for example, the date is consistently showing the year 1922 and all CRCs fail. Other fields are similarly weird but consistent. This could mean that I've still got the descrambler PN polynomial wrong, only succeeding when it's using bits from the initialization seed. And this can only mean many more sleepless coding nights ahead.

(Continued in the next post)

(Pseudotags for Finns: HSL:n pysäkkinäytöt eli aikataulunäytöt; that is, HSL's bus stop timetable displays.)

Update 1/2019: There's now a basic DARC decoder on GitHub, it's called darc2json.

The GSM buzz

You probably recognize the loud buzz a ringing mobile phone often causes in nearby speakers. But where does it come from? Let's dissect the sound and see if it reveals something about the call!

First we need to know a little about how GSM works. A certain cell site (tower) is responsible for handling all the calls made in its coverage area. But the allocated frequency band is very limited. Several phones can talk to the same tower on the same frequency by taking turns. Thus, one GSM phone will only transmit a short burst of digitally encoded speech at a time, and will then wait silently for its next turn. This is called time-division multiple access (TDMA).

Up to 8 phones can fit on a single frequency. They take turns during what's called a TDMA frame. One phone may only transmit once every frame, in its own allocated time slot. One frame lasts 0.004615 seconds, a time slot thus being one eighth of this.
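The arithmetic behind the buzz is then simple enough for a few lines of Python:

FRAME = 0.004615          # one TDMA frame, seconds
SLOT = FRAME / 8          # one time slot: about 0.577 ms
BURST_RATE = 1 / FRAME    # about 216.7 bursts per second

# A phone transmitting in one slot per frame keys its power amplifier on
# and off about 217 times a second at a 1/8 duty cycle; that switching is
# what couples into nearby audio equipment.
print(f"{SLOT * 1e3:.3f} ms slots, {BURST_RATE:.1f} Hz burst rate")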

Now, let's zoom into the buzzing sound. Plotted below is its waveform. Not surprisingly, it is also a plot of the voltage variation in the speaker cable.

[Image: Three oscillograms of the same signal zoomed at different time scales. The first one spans 2.5 seconds and shows 10 bursts in a rhythmic pattern. The second one, 147 milliseconds, shows two bursts, and each seems to be composed of 4 short peaks. The third one, 25 milliseconds, shows one burst, which is clearly composed of 4 equally spaced spikes, resembling a square wave with a 20% duty cycle.]

It looks like voltage is being switched on for a short moment and then cut off for a longer period. You may know from high-school physics that a radio transmitter produces an oscillation in the electromagnetic field, which in turn creates a current (and voltage) in a conductor. So the voltage spikes are probably the periods during which the phone is transmitting. And sure enough, when TDMA timeframes and timeslots are superimposed onto the waveform, magic happens:

[Image: The 25 millisecond oscillogram with a time grid superimposed, showing that each spike's duration is exactly one eighth (0.577 ms) of the time between spikes (4.615 ms).]

So, what does this reveal about the call going through? Nothing whatsoever, luckily.

The sound of the dialup, pictured

If you ever connected to the Internet before the 2000s, you probably remember that it made a peculiar sound. But despite becoming so familiar, it remained a mystery for most of us. What do these sounds mean?

(The audio was recorded by William Termini on his iMac G3.)

As many already know, what you're hearing is often called a handshake, the start of a telephone conversation between two modems. The modems are trying to find a common language and determine the weaknesses of the telephone channel originally meant for human speech.

Below is a spectrogram of the handshake audio. I've labeled some signals according to which party transmitted them, and also put a concise explanation below.

[Image: A large infographic detailing the phases of the dialup handshake, centered on a time-frequency-power representation (spectrogram).]

(You can order this poster as a high-res print via Redbubble!)

Hello, is this a modem?

The first thing we hear in this example is a dial tone, the same tone you would hear when picking up your landline phone. The modem now knows it's connected to a phone line and can dial a number. The number is signaled to the network using Dual-Tone Multi-Frequency signaling, or DTMF, the same sounds a telephone makes when dialing a number.
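DTMF is easy to reproduce, since each key is just the sum of one low-group and one high-group tone. A quick Python rendition:

import numpy as np

ROWS = [697, 770, 852, 941]        # low-group frequencies (Hz)
COLS = [1209, 1336, 1477, 1633]    # high-group frequencies (Hz)
KEYS = "123A456B789C*0#D"          # the 4x4 keypad, row by row

def dtmf(key, fs=8000, dur=0.1):
    i = KEYS.index(key)
    t = np.arange(int(fs * dur)) / fs
    return (np.sin(2 * np.pi * ROWS[i // 4] * t) +
            np.sin(2 * np.pi * COLS[i % 4] * t)) / 2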

The remote modem answers with a distinct tone that our calling modem can recognize. They then exchange short bursts of binary data to assess what kind of protocol is appropriate. This is called a V.8 bis transaction.[2]

Suppressing echoes

Now the modems must address the problem of echo suppression. When humans talk, only one of them is usually talking while the other one listens. The telephone network exploits this fact and temporarily silences the return channel to suppress any confusing echoes of the talker's own voice.

Modems don't like this at all, as they can very well talk at the same time (it's called full-duplex). The answering modem now transmits a special answer tone that will disable any echo suppression circuits on the line. The tone also has periodic "snaps" (180° phase transitions) that aim to disable yet another type of circuit, the echo canceller.[1]
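The answer tone is easy to synthesise: 2100 Hz with a polarity flip every 450 ms. In Python:

import numpy as np

FS = 8000
t = np.arange(int(FS * 3.3)) / FS
tone = np.sin(2 * np.pi * 2100 * t)             # the 2100 Hz answer tone
flip = 1 - 2 * ((t // 0.450).astype(int) % 2)   # +1/-1 alternating every 450 ms
tone *= flip                                    # the 180-degree "snaps"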

Finding a suitable modulation

Now the modems will list their supported modulation modes and try to find one that both know.[1] They also probe the line with test tones to see how it responds to tones of different frequencies, and how much it attenuates the signal. They exchange their test results and decide a speed that is suitable for the line.[4]

Enough small talk!

After this, the modems will switch to scrambled data. They add pseudorandom numbers to the data[3] before transmission to make its power distribution more even and to make sure there are no patterns that are suboptimal for transfer[5]. They listen to each other sending a series of binary 1's and adjust their equalizers to optimally shape the incoming signal.

Soon after this, the modem speaker will go silent and data can be put through the connection.

But why?

Why was it audible? Why not, one could ask. But my speculation is as follows. Back in the day, telephone lines were used for audio. The first modems even used the telephone receiver like humans do, by talking into the mouthpiece, until newer modems were developed that could directly connect to the phone line. Even then, the idea of not hearing what's happening on a phone line you're calling on was quite new, and modems would default to exposing the user to the handshake audio. And in case you accidentally called a human, you would still have time to pick up the telephone and explain the situation.

All you had to do to silence the handshake was to send the command ATM0 down the serial line before dialing.[6]

Poster

Update 02/2013: Due to numerous requests, I made this into a 42-megapixel poster that Redbubble is selling. A few dollars per poster is directed to the nerd who made this.

References

  1. RFC 4734: Definition of Events for Modem, Fax, and Text Telephony Signals (2006), section 2.3: V.8 events.
  2. ITU-T Recommendation V.8 bis (1996).
  3. ITU-T Recommendation V.33 (1988). Annex A: Scrambling.
  4. ITU-T Recommendation V.34 (1996). Chapter 10: Start-up signals and sequences.
  5. Advantages of Scrambling (2012). RF Wireless World.
  6. Turner, Glen (2003): Remote Serial Console HOWTO.