<h2>Smoother sailing: Studying audio imperfections in Steamboat Willie</h2>
<div class="kuva oikealla"><img border="0" width="240" data-original-height="600" data-original-width="720" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmpMyJpJ8YH4hEKMKde_Pow2smdNeAH6fSTNsUd-g0ZOzvJRNoDwj0N1WNQL82P5MMJSZrz83kgqX9K86wFDNtde3Fczi4X8juTW3orD9acpHU3OrdTbAvqq2IfILDZJFvZtaav4JT1HbDybzN7sNyxr0ilR5mQMr0QZTF0Qb_r615A421JOMBvHli4fIX/s480/mikki.jpg" alt="[Image: Mickey Mouse whistling on the bridge of a steamboat.]"/></div>
<p><em>Steamboat Willie</em> (1928) was one of the earliest cartoons with synchronized sound. That is, it had post-production sound effects; this was something new and exciting. Now that the cartoon has recently entered the public domain<a class="ref" href="#bbc24">[bbc24]</a> we can safely delve into its famous soundtrack. See, there's something interesting about how it sounds...</p>
<p>If you listen closely to the <a href="https://www.youtube.com/watch?v=I5pG1wbRKOg" class="external" title="Steamboat Willie (1928 Film) - 4K Film Remaster">soundtrack on YouTube</a>, it sounds somehow distorted. You might be tempted to point out that it's 96 years old, yes. But you might also recognize that it is suffering from <em>flutter</em>, i.e. an unstable playback or recording speed.</p>
<p>In the spirit of this blog let's geek out for a bit and study this flutter distortion further. Can we learn something interesting? Could we perhaps learn enough to be able to reduce it?</p>
<p>Of course the flutter might be 100% authentic to how it sounded in theatres in the 1920s; we don't know when and why it appeared in the audio (more on that later!). It might have sounded even worse. But we can still hope to enjoy the sound effects in their original recorded form.</p>
<h3>Prior work</h3>
<p>I'm not the first one to notice this clip is 'fluttering' and to try and do something about it. I found videos of people's attempts to un-flutter it using Celemony Capstan, a professional tool made just for this purpose, with varying results. Capstan uses Melodyne's famous note detection engine to detect musical features and then controls a varispeed effect to cancel out any flutter.</p>
<p>But Capstan is expensive, and it's more fun to come up with a home-made solution anyway. And what about non-musical sounds? Besides, I had some code lying around in a forgotten desk drawer that just might fit the purpose.</p>
<h3>Finding a high quality source</h3>
<p>Why would I need a high-quality digital file of a poor-quality soundtrack from the 1920s? I guess it's the archivist in me hoping that it has been preserved with a high level of detail. But also, if you're going to try and dig up some hidden details in the sound, you'd want minimal interference from any lossy psychoacoustic compression, right? These artifacts might become audible after varispeed effects and could also hinder frequency detection.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxHxA1NzbJulmPxer0gk48ImmkoN4ji66G3lnAMiMYlLkE0ekgJ6VTq9umFEn97Wr5dGUIo8fzsUpC8w6DL_9yptx8BqDQe0swHSQpJ62F3f-CYlLrYNMwLsW61j4GLWbLINrLsSpvwkih9Mn7I_kbaXADf47Imt3AgK3Hxo3U-wM4JQpFkNOy9DtHQUA3/s1353/yt-vs-4k.jpg"><img alt="[Image: Two spectrograms labeled 'random Youtube video' and '4K version', the former showing compression artifacts.]" border="0" width="520" data-original-height="515" data-original-width="1353" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxHxA1NzbJulmPxer0gk48ImmkoN4ji66G3lnAMiMYlLkE0ekgJ6VTq9umFEn97Wr5dGUIo8fzsUpC8w6DL_9yptx8BqDQe0swHSQpJ62F3f-CYlLrYNMwLsW61j4GLWbLINrLsSpvwkih9Mn7I_kbaXADf47Imt3AgK3Hxo3U-wM4JQpFkNOy9DtHQUA3/s520/yt-vs-4k.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxHxA1NzbJulmPxer0gk48ImmkoN4ji66G3lnAMiMYlLkE0ekgJ6VTq9umFEn97Wr5dGUIo8fzsUpC8w6DL_9yptx8BqDQe0swHSQpJ62F3f-CYlLrYNMwLsW61j4GLWbLINrLsSpvwkih9Mn7I_kbaXADf47Imt3AgK3Hxo3U-wM4JQpFkNOy9DtHQUA3/s1040/yt-vs-4k.jpg 2x"/></a></div>
<p>The high-quality <a href="https://archive.org/details/steamboat-willie-4-k-resolution" class="external">source</a> I found is in the Internet Archive. It might originally come from the 4K Blu-ray release called Celebrating Mickey. The spectrogram shows hardly any compression artifacts that I can see, even in the quietest frequency ranges! Perfect!</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC59_S2FfWRnMghfZ4QuHa4GEugXiwUtUudblRrARDPO-v171yCcH4ri_lUxGpkcWCbh1d6htKfH7CSqmMjjJpG0id9XcyMLNTz2k5yVn2jTDvWu62jhr1bl0BOugwiI2CepLB3Y20DwI2sPGV_kQvPmt1nNdwvQPFVVeHVk-9JtyuxoUyyLCiG3IhXd-e/s1100/film_scan.jpg"><img alt="[Image: A single film frame.]" border="0" width="400" data-original-height="457" data-original-width="1100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC59_S2FfWRnMghfZ4QuHa4GEugXiwUtUudblRrARDPO-v171yCcH4ri_lUxGpkcWCbh1d6htKfH7CSqmMjjJpG0id9XcyMLNTz2k5yVn2jTDvWu62jhr1bl0BOugwiI2CepLB3Y20DwI2sPGV_kQvPmt1nNdwvQPFVVeHVk-9JtyuxoUyyLCiG3IhXd-e/s400/film_scan.jpg" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC59_S2FfWRnMghfZ4QuHa4GEugXiwUtUudblRrARDPO-v171yCcH4ri_lUxGpkcWCbh1d6htKfH7CSqmMjjJpG0id9XcyMLNTz2k5yVn2jTDvWu62jhr1bl0BOugwiI2CepLB3Y20DwI2sPGV_kQvPmt1nNdwvQPFVVeHVk-9JtyuxoUyyLCiG3IhXd-e/s800/film_scan.jpg 2x"/></a></div>
<p>But the Internet Archive <a href="https://archive.org/details/steamboat-willie-16mm-film-scan-4k-lossless" class="external">delivers</a> something even better. There's a (visually) lossless 4K scan of the movie with the <a href="https://en.wikipedia.org/wiki/Sound-on-film" class="external">optical soundtrack</a> partially included (above)! The high-quality version is 34 GB, but there's a downscaled 480p MP4 one thousandth of the size.</p><p>I listened to the optical soundtrack from this low-resolution version with a little pixel-reader script. Turns out the flutter is already present on the film! (Edit: Note that we don't know where this particular film print came from. When was it created? Is there an original somewhere, without flutter?)</p>
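<p>For the curious, reading such an optical track boils down to treating each scanline of the soundtrack strip as one audio sample. Here's a minimal Python sketch of the idea – not my original script; the file name and pixel columns are made-up placeholders you'd adjust to the actual scan:</p>
<pre>
# Minimal optical-soundtrack reader: one audio sample per scanline.
# Assumes the soundtrack occupies a known vertical strip in each frame.
import cv2
import numpy as np
from scipy.io import wavfile

VIDEO = "film_scan_480p.mp4"   # placeholder file name
COL_FROM, COL_TO = 20, 60      # placeholder pixel columns of the track
FPS = 24

cap = cv2.VideoCapture(VIDEO)
rows = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Mean brightness across the track's width = transmitted light per line
    rows.append(gray[:, COL_FROM:COL_TO].mean(axis=1))
cap.release()

audio = np.concatenate(rows)
audio -= audio.mean()                   # remove DC offset
audio /= np.abs(audio).max() + 1e-9     # normalise
fs = len(rows[0]) * FPS                 # scanlines per second ~ sample rate
wavfile.write("optical_track.wav", int(fs), (audio * 32767).astype(np.int16))
</pre>
<p>At 480 lines and 24 frames per second this gives only about 11,520 samples per second – enough to confirm that the flutter is on the film, but not much more.</p>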
<h3>Hand-guiding a frequency tracker</h3>
<p>Looking at the above spectrogram, we can see that the frequency of everything is zig-zagging as a function of time – that's flutter all right. But how to quantify these variations? We could zoom in on one of the frequency peaks and follow the course of its frequency in time. I'm using FFT peak interpolation to find more accurate frequency estimates<a class="ref" href="#gasior04" title="">[gasior04]</a>.</p>
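<p>As a rough sketch of the parabolic-interpolation idea in <a class="ref" href="#gasior04">[gasior04]</a> (not necessarily the exact variant I used), the refinement step looks like this in Python:</p>
<pre>
import numpy as np

def refined_peak_hz(x, fs):
    """Frequency of the strongest spectral peak in x, refined by fitting a
    parabola to the log magnitude of the peak bin and its two neighbours."""
    win = np.hanning(len(x))
    spec = np.abs(np.fft.rfft(x * win))
    k = int(np.argmax(spec[1:-1])) + 1          # strongest interior bin
    a, b, c = np.log(spec[k-1:k+2] + 1e-12)     # log magnitudes around it
    delta = 0.5 * (a - c) / (a - 2*b + c)       # fractional bin offset
    return (k + delta) * fs / len(x)
</pre>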
<p>Take the sound of Pete's tobacco hitting the ship's bell around the 01'45'' mark. You'd think a bell is supposed to have a constant frequency, yet this one sounds quite unstable. We can follow any one of the harmonics and see how the playback speed (bell frequency) varies over the period of one second:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnShT00BrqwOsAyo2lEg8GsM894ogOPfS0K1BARfwYSE_2TfpPVAYjTkdx1f1rff48an2iDaUxXdQguAdkJDbZqW5Pk06ZHylJx9iZTxmAu1ztnikQIF-rVobS5xPQZmB_QRQhTO4P_Ym4Bz5HttsqxFNw8260cjloq4UeKI7iwReipqjaBaTJnuJYZs3G/s1784/flutterpercent.png"><img alt="[Image: Spectrogram with fluctuating tones.]" border="0" width="550" data-original-height="532" data-original-width="1784" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnShT00BrqwOsAyo2lEg8GsM894ogOPfS0K1BARfwYSE_2TfpPVAYjTkdx1f1rff48an2iDaUxXdQguAdkJDbZqW5Pk06ZHylJx9iZTxmAu1ztnikQIF-rVobS5xPQZmB_QRQhTO4P_Ym4Bz5HttsqxFNw8260cjloq4UeKI7iwReipqjaBaTJnuJYZs3G/s550/flutterpercent.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnShT00BrqwOsAyo2lEg8GsM894ogOPfS0K1BARfwYSE_2TfpPVAYjTkdx1f1rff48an2iDaUxXdQguAdkJDbZqW5Pk06ZHylJx9iZTxmAu1ztnikQIF-rVobS5xPQZmB_QRQhTO4P_Ym4Bz5HttsqxFNw8260cjloq4UeKI7iwReipqjaBaTJnuJYZs3G/s1100/flutterpercent.png 2x"/></a></div>
<p>To my eye, this oscillation looks periodic and not random at all. We can run another round of FFT on a longer stretch of samples to find the strongest period of these fluctuations: It turns out to be 15 Hz. (Why 15? I so hoped it would have been 24 Hz – it would have made a more interesting story! More on that later...)</p>
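<p>Finding that dominant rate is itself just another FFT, this time taken over the frequency track instead of the audio. A sketch, assuming <code>track_hz</code> is the followed peak frequency sampled at the spectrogram's hop rate:</p>
<pre>
import numpy as np

def flutter_rate_hz(track_hz, hop_rate_hz):
    """Strongest periodicity in a frequency track, in Hz."""
    dev = track_hz / np.mean(track_hz) - 1.0            # relative speed deviation
    dev -= np.mean(dev)
    spec = np.abs(np.fft.rfft(dev * np.hanning(len(dev))))
    freqs = np.fft.rfftfreq(len(dev), d=1.0 / hop_rate_hz)
    return freqs[np.argmax(spec[1:]) + 1]               # skip the DC bin
</pre>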
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVcLRpMLSy3_mVGx7bqA2qq5eM5OgSp_4PBNZs0ESSw0WGaYm_uNd-xOEhyWpeEy5_9itOS3Ta34aAkOTqVMf38gGuTxfbi8Ru9NNl1EaeHQxQh3OrIaF4uurbNYPz4guDRldNGc9PT0rlKgeAAxX-1uJKHQDUt9qQLAHc8I3kLwiiEoT_XhUZmTwdV_tv/s1490/15hz.png"><img alt="[Image: Spectrum plot showing a peak at 15.0 Hz about 15 dB higher than background.]" border="0" width="550" data-original-height="397" data-original-width="1490" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVcLRpMLSy3_mVGx7bqA2qq5eM5OgSp_4PBNZs0ESSw0WGaYm_uNd-xOEhyWpeEy5_9itOS3Ta34aAkOTqVMf38gGuTxfbi8Ru9NNl1EaeHQxQh3OrIaF4uurbNYPz4guDRldNGc9PT0rlKgeAAxX-1uJKHQDUt9qQLAHc8I3kLwiiEoT_XhUZmTwdV_tv/s550/15hz.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVcLRpMLSy3_mVGx7bqA2qq5eM5OgSp_4PBNZs0ESSw0WGaYm_uNd-xOEhyWpeEy5_9itOS3Ta34aAkOTqVMf38gGuTxfbi8Ru9NNl1EaeHQxQh3OrIaF4uurbNYPz4guDRldNGc9PT0rlKgeAAxX-1uJKHQDUt9qQLAHc8I3kLwiiEoT_XhUZmTwdV_tv/s1100/15hz.png 2x"/></a></div>
<p>Okay, so can we repeat this process for the whole movie? I don't think we can just automatically follow the frequency of every peak, since some sounds will naturally contain vibration and rises and drops in frequency. Not all of it is due to flutter. Some sort of a vetting process is needed. We could try a tedious manual route...</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeHxYb1XXSjCwdDvUnGoPv0OVbnzi2W_PaxSo2_tfchJ73ArpUu6D61cFciKnrS8T5fsmgblIapbsuvmPx7tg2wSthgzbLsxvo8H8PGBncLuCemYWExdgDEdADr_Pl21gpg9PtsylusilHh9Y8I7cDCzsFPJs-nTlbSn_-NJnKNcwm_uU2cQMC5Nwpt7Yh/s1204/annotation.jpg" style="display: block; padding: 1em 0; text-align: center; "><img alt="[Image: GUI of a software with spectrograms and oscillogram plots.]" border="0" width="550" data-original-height="952" data-original-width="1204" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeHxYb1XXSjCwdDvUnGoPv0OVbnzi2W_PaxSo2_tfchJ73ArpUu6D61cFciKnrS8T5fsmgblIapbsuvmPx7tg2wSthgzbLsxvo8H8PGBncLuCemYWExdgDEdADr_Pl21gpg9PtsylusilHh9Y8I7cDCzsFPJs-nTlbSn_-NJnKNcwm_uU2cQMC5Nwpt7Yh/s550/annotation.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeHxYb1XXSjCwdDvUnGoPv0OVbnzi2W_PaxSo2_tfchJ73ArpUu6D61cFciKnrS8T5fsmgblIapbsuvmPx7tg2wSthgzbLsxvo8H8PGBncLuCemYWExdgDEdADr_Pl21gpg9PtsylusilHh9Y8I7cDCzsFPJs-nTlbSn_-NJnKNcwm_uU2cQMC5Nwpt7Yh/s1100/annotation.jpg 2x"/></a></div>
<p>I made a little software tool (above) where I could click and drag little boxes onto a spectrogram to search for peaks in. This wobbly line is then simply taken to be the speed variation (red graph in the top picture).</p>
<p>It became quite a chore to annotate longer sounds as this software didn't come with undo, edit, or save features for the longest time!</p>
<p>Now let's think about what to do with this speed information...</p>
<h3>Desk drawer deep dive</h3>
<p>Some time ago I had made a tool that could well come in handy now. It was for correcting wobbly wideband radio recordings stored on VHS tapes. These recordings contained some empty carriers that happened to work like seismographs, accurately recording the tape speed variations. The tool then used a <a href="https://en.wikipedia.org/wiki/Lagrange_polynomial" class="external">Lagrange polynomial</a> to interpolate new samples at a steady interval, so-called 'digital varispeed'.</p>
<p>It was ultimately based on an interesting paper on de-fluttering magnetic tapes using the tape bias signal as reference<a class="ref" href="#howarth04">[howarth04]</a>.</p>
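<p>The heart of such a tool is small enough to sketch here: keep a fractional read position, advance it by the measured speed for every output sample, and evaluate a short Lagrange polynomial around that position. A simplified third-order version (not the original code):</p>
<pre>
import numpy as np

def varispeed(x, speed):
    """Resample x using a per-output-sample speed factor (1.0 = unchanged),
    with 4-point (third-order) Lagrange interpolation."""
    out = np.zeros(len(speed))
    pos = 1.0                                  # fractional read position
    for n, s in enumerate(speed):
        i = int(pos)
        if i + 2 >= len(x):
            return out[:n]
        t = pos - i                            # 0..1 between samples i and i+1
        xm1, x0, x1, x2 = x[i-1], x[i], x[i+1], x[i+2]
        out[n] = (-t*(t-1)*(t-2)/6)*xm1 + ((t+1)*(t-1)*(t-2)/2)*x0 \
               + (-(t+1)*t*(t-2)/2)*x1 + ((t+1)*t*(t-1)/6)*x2
        pos += s                               # e.g. speed 1.02 reads 2% faster
    return out
</pre>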
<div class="kuva oikealla"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-ftP_bhx7_ijHh3uG18VJRLnLsRbKWaho5kgEz3Hu5y5m5R1pYC3BivLZunACRMAllEkv4YrMKk1ANe3YVOeyuM2rvzMEW8LoHJqyewoU8B6ABosgMbwAptHtv4VTwU0zbKB6fJP5Qx36iiDS8Pm-ok9yAd9EYUfQO06Oi8-JSQp0UIe0LPhcsV4hqP6P/s547/1981.jpg"><img alt="[Image: Buttons of an old device, one of them Varispeed, labeled 1981. Below, part of a GUI with the text Varispeed, labeled 2023.]" border="0" width="240" data-original-height="386" data-original-width="547" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-ftP_bhx7_ijHh3uG18VJRLnLsRbKWaho5kgEz3Hu5y5m5R1pYC3BivLZunACRMAllEkv4YrMKk1ANe3YVOeyuM2rvzMEW8LoHJqyewoU8B6ABosgMbwAptHtv4VTwU0zbKB6fJP5Qx36iiDS8Pm-ok9yAd9EYUfQO06Oi8-JSQp0UIe0LPhcsV4hqP6P/s240/1981.jpg"/></a></div>
<p class="remark">By the way, I keep mentioning <em>varispeed</em> and never explained it. This was a feature of old studio-grade reel-to-reel tape recorders where the playback speed could be freely varied by the operator; hence vari+speed. Audio people still use this word in the digital world to essentially refer to variable-rate resampling, which has the same effect, so I'm using them interchangeably. (Topmost photo: Ferdinando Traversa, <a href="https://creativecommons.org/licenses/by/4.0/" class="external">CC BY</a>, cropped to detail)</p>
<p style="clear:both">Here's what this digital varispeed sounds like when exaggerated. In the below example I'm doing it in a simpler way. Instead of the Lagrange method I first upsampled some music by 10x in an audio software; hand-drew a speed curve in Audacity; and then used that curve to pick samples out of the oversampled music:</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRocIM1OFdKzStrX8VTZ_pZnynkIJRxVZzCKjdBIkCa9FqKFTXNrYfwUBvsO0dp0iy0FpivP1g_FQboHFDF3xgNcWd303GLKqwH3s3NB2hyphenhyphenwhXyNdi3kNUMhS07pEdQw9cbcmaRZ94IGMfwdLfqDsIj9vh4PH2Qtqpvr6kJGZipaShCdrVYg-j59Sjo9Cy/s1501/speed-variations.png"><img alt="[Image: A waveform in Audacity.]" border="0" width="550" data-original-height="182" data-original-width="1501" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRocIM1OFdKzStrX8VTZ_pZnynkIJRxVZzCKjdBIkCa9FqKFTXNrYfwUBvsO0dp0iy0FpivP1g_FQboHFDF3xgNcWd303GLKqwH3s3NB2hyphenhyphenwhXyNdi3kNUMhS07pEdQw9cbcmaRZ94IGMfwdLfqDsIj9vh4PH2Qtqpvr6kJGZipaShCdrVYg-j59Sjo9Cy/s550/speed-variations.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjRocIM1OFdKzStrX8VTZ_pZnynkIJRxVZzCKjdBIkCa9FqKFTXNrYfwUBvsO0dp0iy0FpivP1g_FQboHFDF3xgNcWd303GLKqwH3s3NB2hyphenhyphenwhXyNdi3kNUMhS07pEdQw9cbcmaRZ94IGMfwdLfqDsIj9vh4PH2Qtqpvr6kJGZipaShCdrVYg-j59Sjo9Cy/s1100/speed-variations.png 2x"/></a></div>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/vary-out.mp3"></audio></div>
<p>Carefully controlled, this effect can be used to cancel out flutter. Here's how: If we knew exactly how the playback speed was fluctuating we could instantly vary the speed of our resampler in the opposite direction, thus canceling the variations. And with the above research we now have that knowledge!</p>
<p>Well, almost. I couldn't always see a clear frequency peak to follow, so the graph is patchy. But... maybe it could help to band-pass the speed signal at 15 Hz? This would fill in small gaps and also preserve vibrato and other fluctuations that aren't part of the flutter distortion. We can at least try!</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0n3FreUSaXNwU3f4_NacMaZACdrpp-bcD3AGkqcOLVPbVbGDs34gKuhIYvmzoW8sBSLWWJ1UwhJVefJ7HPRSFV3sw93N2oPVT5p98J54dQz6kQEiAPE_HKoej4DE4dXHtbMREBfLOX43NVzJ8LgETuGbJKS6KET1NqhfyXqmn8FOW3Pw579ivisEsNnU5/s1028/flutter-filter.png"><img alt="[Image: Two waveforms, one of them piecewise and noisy, the other one smooth and continuous.]" border="0" width="550" data-original-height="244" data-original-width="1028" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0n3FreUSaXNwU3f4_NacMaZACdrpp-bcD3AGkqcOLVPbVbGDs34gKuhIYvmzoW8sBSLWWJ1UwhJVefJ7HPRSFV3sw93N2oPVT5p98J54dQz6kQEiAPE_HKoej4DE4dXHtbMREBfLOX43NVzJ8LgETuGbJKS6KET1NqhfyXqmn8FOW3Pw579ivisEsNnU5/s550/flutter-filter.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0n3FreUSaXNwU3f4_NacMaZACdrpp-bcD3AGkqcOLVPbVbGDs34gKuhIYvmzoW8sBSLWWJ1UwhJVefJ7HPRSFV3sw93N2oPVT5p98J54dQz6kQEiAPE_HKoej4DE4dXHtbMREBfLOX43NVzJ8LgETuGbJKS6KET1NqhfyXqmn8FOW3Pw579ivisEsNnU5/s1100/flutter-filter.png 2x"/></a></div>
<p>In the example above, I replaced empty parts with a constant value of 100% and then filtered the whole thing. This sews the disjointed parts together in a smooth way.</p>
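<p>In code, that step could look roughly like this (the band edges are my guesses around the measured 15 Hz, and <code>varispeed()</code> refers to the Lagrange sketch above):</p>
<pre>
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_speed_curve(speed_percent, fs_track, lo=10.0, hi=20.0):
    """speed_percent: measured playback speed in %, NaN where no peak was
    found, sampled at fs_track Hz (assumed well above 40 Hz). Returns a
    gap-free curve containing only the ~15 Hz flutter, centered on 100 %."""
    s = np.where(np.isnan(speed_percent), 100.0, speed_percent)   # fill gaps
    b, a = butter(2, [lo / (fs_track / 2), hi / (fs_track / 2)], btype="band")
    return 100.0 + filtfilt(b, a, s)      # zero-phase band-pass around 15 Hz

# After interpolating this curve to the audio sample rate, driving the
# resampler with it (or its inverse, depending on how the speed was measured)
# cancels the wobble:
#   corrected = varispeed(audio, smooth_speed_curve(...) / 100.0)
</pre>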
<h3>Can we hear some examples already?</h3>
<p>This clip is from when the goat ate Minnie's sheet music and guitar – the apparent catalyst event that sent Mickey Mouse to seek revenge on the entire animal kingdom.</p>
<table style="margin: 2em auto">
<tr><th>Before</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/scorehog-original.mp3"></audio></td>
<td rowspan="2">
<img alt="[Image: Movie screenshot]" border="0" width="160" data-original-height="270" data-original-width="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyMGiKzngcjsWRdbDKAjM8uddVzY6p1aClossKkwHT8E_eBiH_fr3y-8h2XBL2CAf7KK2nAWCG50OpDQ5piRiU5snyCoo1k_Ragsn326reHc6ql-QkqXsg-0_cqHAY54MrGcFM2HQQd7BX7yQ_Tym8-5xnyfOMz6dz1wshz7mdS0ao9ZWyX5lBYJUCT3pF/s320/example-goat.jpg"/></td>
</tr>
<tr><th>After</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/scorehog-corrected.mp3"></audio></td></tr>
</table>
<p>You can definitely hear the difference in the bell-like sounds coming from the goat's insides. It even sounds like the little flute notes in the beginning are easier to tell apart in the corrected version.</p>
<p>Here's another musical example, with strings.</p>
<table style="margin: 2em auto">
<tr><th>Before</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/jousia-original.mp3"></audio></td>
<td rowspan="2">
<img alt="[Image: Movie screenshot]" border="0" width="160" data-original-height="270" data-original-width="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT6Np0rsoCoSPtjtdu7rO9KWd_-wH4nyS1xpSRcaAEG6OYTZLYqE1whAqrMYzVqrrElzvr8EyjLXQo0F3mFGzasExYU8Bno5uI-E4ScWW803nj2k6nXMcigyfddljMRwwRDn3wfsAjihXCXYuFTDsShy8lg7-ZnAdHdysPIUVDZBLcMvY0SsIJx2tAZVdJ/s320/example-lift.jpg"/>
</td>
</tr>
<tr><th>After</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/jousia-unfluttered.mp3"></audio></td></tr>
</table>
<p>The cow's moo. That's a hard one because it's so rich in harmonics; in the spectrogram it looks almost like <em>spaghetti bolognese</em>. My algorithm is constrained to a box and can't stay with one harmonic when the 'moo' slides in frequency. You can hear some artifacts because of this, but the result still sounds less sheep-like than the original.</p>
<table style="margin: 2em auto">
<tr><th>Before</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/moo-original.mp3"></audio></td>
<td rowspan="2">
<img alt="[Image: Movie screenshot]" border="0" width="160" data-original-height="270" data-original-width="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-rof_q4SRStw6UAZpRiymm4m8SaiYjwDouDWZPEIiFYqRiKC45pmSaQ2Rh40dMz6yzsRUYrEPIWta3lOCicEgyPLUV9w4KljzS8jHH8iQ0JvhaGZqIY42Y92MJsBCuGEAQaZAI_inkwl5okdLezPlxoWDJbyUcR15Cql4wZyL5bQx5zvVBcZF__ECBsYo/s320/example-moo.jpg"/>
</td>
</tr>
<tr><th>After</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/moo-corrected.mp3"></audio></td></tr>
</table>
<p>But Mickey whistling "Steamboat Bill" in the beginning of the film actually doesn't sound better when corrected... I preferred a bit of vibrato!</p>
<table style="margin: 2em auto">
<tr><th>Before</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/viheltelee.mp3"></audio></td>
<td rowspan="2">
<img alt="[Image: Movie screenshot]" border="0" width="160" data-original-height="270" data-original-width="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjuZucmQXJmuMxSCIdBgbrQKBvsv2d5RW-Kq8GB0cPQmCXGruPZUfl8v2D5zy4yRCSKZKwUuVodofx1pDjkigdDlcnxqYcpYFrlTKUPnE26IJxCkBxP9jByvT0VInbjYEhBOK0TeWElBiwweNZ_1oz3izzt7mVv4iAlGaVvpsXG75pCebzIJLy-hPmAmgL/s320/example-whistle.jpg"/>
</td>
</tr>
<tr><th>After</th><td><audio controls=""><source src="https://oona.windytan.com/blogfiles/viheltelee-unfluttered.mp3"></audio></td></tr>
</table>
<h3>Sidetrack 1: Anything else we can find?</h3>
<p>Glad you're still reading! Let's step away from flutter for a while and take the raw audio track itself under the Fourier microscope. Zooming closer, is there anything interesting in the lower end?</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpgDAGqXTHKKLEf-Nsc4U2aBDg1NhpfQnoLcZt9fPXrh5LgosyzGyRckaNuA9O1Gb0XRKevcuwdFx9ZtCXGNKn5SNrrbGoMIHmNPjJW8_phqO4ZcB02wHAPpsin8q4yWw1WXI41Ly6l5tUYrm5kPC75aUH6jr1H8NIWW86Oe80bgwO5Rf8EcmCjsb_1vi5/s1074/annotated-200px.jpg"><img alt="[Image: Spectrogram showing a frequency range from 0 to 180 Hz.]" border="0" width="550" data-original-height="296" data-original-width="1074" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpgDAGqXTHKKLEf-Nsc4U2aBDg1NhpfQnoLcZt9fPXrh5LgosyzGyRckaNuA9O1Gb0XRKevcuwdFx9ZtCXGNKn5SNrrbGoMIHmNPjJW8_phqO4ZcB02wHAPpsin8q4yWw1WXI41Ly6l5tUYrm5kPC75aUH6jr1H8NIWW86Oe80bgwO5Rf8EcmCjsb_1vi5/s550/annotated-200px.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpgDAGqXTHKKLEf-Nsc4U2aBDg1NhpfQnoLcZt9fPXrh5LgosyzGyRckaNuA9O1Gb0XRKevcuwdFx9ZtCXGNKn5SNrrbGoMIHmNPjJW8_phqO4ZcB02wHAPpsin8q4yWw1WXI41Ly6l5tUYrm5kPC75aUH6jr1H8NIWW86Oe80bgwO5Rf8EcmCjsb_1vi5/s1100/annotated-200px.jpg 2x"/></a></div>
<p>We can faintly see peaks at multiples of both 24 and 60 Hz. No surprises there, really... 24 Hz being the film framerate and 60 Hz the North American mains frequency. Was there a projector running in the recording studio? Or maybe it's an artifact of scanning the soundtrack one frame at a time? In any case, these sounds are pretty weak.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmJ6HtiyVj-yVW0JA7Ae5VYVOLhNuSfVoam0-yXFdzlKW6u1LeVuulZ872Jv7i8ec3-ofsac_qnXfL78uXDvydAiiOc97mPVXO03fLrR6rOPQv4n26REMoqsqOgwRIlkv1VzxaZI0ZOmYAZWCzT_FsVv4nldRBxQDIvxZ9OgsuFUOxFUUfcHsLJmiaBdoa/s856/sidebands.jpg"><img alt="[Image: Spectrogram showing tones with apparent sidebands.]" border="0" width="550" data-original-height="275" data-original-width="856" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmJ6HtiyVj-yVW0JA7Ae5VYVOLhNuSfVoam0-yXFdzlKW6u1LeVuulZ872Jv7i8ec3-ofsac_qnXfL78uXDvydAiiOc97mPVXO03fLrR6rOPQv4n26REMoqsqOgwRIlkv1VzxaZI0ZOmYAZWCzT_FsVv4nldRBxQDIvxZ9OgsuFUOxFUUfcHsLJmiaBdoa/s550/sidebands.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmJ6HtiyVj-yVW0JA7Ae5VYVOLhNuSfVoam0-yXFdzlKW6u1LeVuulZ872Jv7i8ec3-ofsac_qnXfL78uXDvydAiiOc97mPVXO03fLrR6rOPQv4n26REMoqsqOgwRIlkv1VzxaZI0ZOmYAZWCzT_FsVv4nldRBxQDIvxZ9OgsuFUOxFUUfcHsLJmiaBdoa/s1100/sidebands.jpg 2x"/></a></div>
<p>In some places you can see some sort of modulation that seems to be generating sidebands, just like in radio signals. It's especially visible in Mickey's whistle when it's flutter-corrected, here at the 5-second mark. The sideband peaks are 107 and 196 Hz away from the 'carrier', if you will. I'm not sure what this could be. Fluctuating amplitude?</p>
<h3>Sidetrack 2: Playing sound-on-film frame by frame?</h3>
<p>This is an experiment I did some time ago. It's just a silly thought: what would happen if the soundtrack were read in the same way as the picture is – stopped 24 times per second? Would this be the <em>ultimate</em> flutter distortion?</p>
<p>In the olden days, sound was stored on the film next to the picture frames as analog information. Unlike the picture frames that had to be stopped momentarily for projection, the sound had to be played at a constant speed. There was a complicated mechanism in the projector to make this possible.</p>
<p>I found some speed curves for old-school movie projectors in <a class="ref" href="#bickford72">[bickford72]</a>. They describe the film's deceleration and acceleration during these stops. Let's emulate these speed curves in audio with the oversampling varispeed method.</p>
<p>The video below is a 3D animation where this same speed curve controls an animation of a moving film in an imaginary machine. The clip is from another 1920s animation, <em>Alice in the Wooly West</em> (1926).</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/6yyuMOBck2s?rel=0" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen=""></iframe>
<p><em>~~ Now we know ~~</em></p>
<h3>Conclusions</h3>
<ul>
<li>We found a 15 Hz speed fluctuation that was, to some extent, reversible.</li>
<li>This flutter signal is already present in the optical soundtrack of a film scan (of unknown origin).</li>
<li>With enough manual work, much of the soundtrack could probably be 'corrected'.</li>
<li>'Hmm, that sounds odd' are sometimes the words of a white rabbit.</li>
</ul>
<h3>References</h3>
<ul class="references">
<li id="bbc24"><a href="https://www.bbc.com/news/entertainment-arts-67833411" class="external">"Disney's earliest Mickey and Minnie Mouse enter public domain as US copyright expires"</a>. BBC News. 2024-01-01.</li>
<li id="howarth04">Howarth, J. & Wolfe, P. J. (2004): <a href="https://www.aes.org/e-lib/browse.cfm?elib=12870" class="external">Correction of Wow and Flutter Effects in Analog Tape Transfers</a></li>
<li id="gasior04">Gasior, M. & Gonzalez, J.L. (2004): Improving FFT Frequency Measurement Resolution by Parabolic and Gaussian Spectrum Interpolation</li>
<li id="bickford72">Bickford, John H. (1972). "Geneva Mechanisms". Mechanisms for intermittent motion (<a href="https://web.archive.org/web/20140102154224/http://ebooks.library.cornell.edu/k/kmoddl/pdf/002_010.pdf" class="external">PDF</a>). New York: Industrial Press Inc.</li>
</ul>
<h2>Using HDMI radio interference for high-speed data transfer</h2>
<p>This story, too, begins with noise. I was browsing the radio waves with a software radio, looking for mysteries to accompany my ginger tea. I had started to notice a wide-band spiky signal on a number of frequencies that only seemed to appear indoors. Some sort of interference from electronic devices, probably. Spoiler alert, it eventually led me to broadcast a webcam picture over the radio waves... but how?</p>
<h3>It sounds like video</h3>
<p>The mystery deepened when I listened to what this interference sounded like as an AM signal. It reminded me of a time I mistakenly plugged our home stereo system into the Nintendo console's video output and heard a very similar buzz.</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/monitor-buzz.mp3"></audio></div>
<p>Am I possibly listening to video? Why would there be analog video transmitting on any frequency, let alone inside my home?</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUsWHBMqcYxy4gA0sBUBqWrLsHi4PWksuIljM1lNuhYfc9YCgUGSjo9jAeGmgf1IEMrBrCgaqDTfunlFkWw_5yGTUujWYw3Twd16cn_JQkPzXCoL4YeRU_aCXxu1bDzbssBxo5wBtXuRXWVMSCCjZ2rj80CsOQCEORg1FhVSVLK724BQNDxMtHKGT7A/s1726/Screen%20Shot%202023-02-26%20at%2011.16.20.png"><img alt="" border="0" width="500" data-original-height="538" data-original-width="1726" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUsWHBMqcYxy4gA0sBUBqWrLsHi4PWksuIljM1lNuhYfc9YCgUGSjo9jAeGmgf1IEMrBrCgaqDTfunlFkWw_5yGTUujWYw3Twd16cn_JQkPzXCoL4YeRU_aCXxu1bDzbssBxo5wBtXuRXWVMSCCjZ2rj80CsOQCEORg1FhVSVLK724BQNDxMtHKGT7A/s480/Screen%20Shot%202023-02-26%20at%2011.16.20.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUsWHBMqcYxy4gA0sBUBqWrLsHi4PWksuIljM1lNuhYfc9YCgUGSjo9jAeGmgf1IEMrBrCgaqDTfunlFkWw_5yGTUujWYw3Twd16cn_JQkPzXCoL4YeRU_aCXxu1bDzbssBxo5wBtXuRXWVMSCCjZ2rj80CsOQCEORg1FhVSVLK724BQNDxMtHKGT7A/s960/Screen%20Shot%202023-02-26%20at%2011.16.20.png 2x" alt="[Image: Oscillogram of a noisy waveform that seems to have a pulse every 10 microseconds or so.]"/></a></div>
<p>If we plot the signal's amplitude against time we can see that there is a strong pulse exactly 60 times per second. This could be the vertical synchronisation signal of 60 Hz video. A shorter pulse (pictured above) can be seen repeating more frequently; it could be the horizontal one. Between these pulses there is what appears to be noise. Maybe, if we use the strong pulses for synchronisation and plot the amplitude of that noise as a two-dimensional picture, we could see something?</p>
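<p>A sketch of that idea in Python: once you know (or guess) the line rate, the rasterisation is little more than reshaping the AM envelope. The line count and width below are example figures; in practice you nudge them until the image stops slanting, or lock them to the detected pulses:</p>
<pre>
import numpy as np

def rasterize(envelope, fs, fps=60.0, lines_per_frame=806, width=640):
    """Fold an AM-demodulated envelope into 2-D frames (frames, lines, width)."""
    spl = fs / (fps * lines_per_frame)          # samples per video line
    n_lines = int(len(envelope) / spl)
    # Take 'width' samples along each line, at fractional positions
    idx = (np.arange(n_lines)[:, None] * spl
           + np.linspace(0.0, spl, width, endpoint=False)[None, :])
    img = envelope[idx.astype(int)]
    n_frames = n_lines // lines_per_frame
    return img[:n_frames * lines_per_frame].reshape(n_frames, lines_per_frame, width)
</pre>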
<p>And sure enough, when main screen turn on, we get signal:</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9CNpvezklyb70HXN6VomYVMSpTsW6AFDkXmP_0O4DjuaWfEhvDDu_j1owOPDDOaIYvEcLe9L4ifzTy2wSfzqaMzBfSFa9xIlax5CSJVAQDiFb_81KkdjDr3yaJwbQsxbldjLlhjBwUUs4xyVSdZKeiunBFFkLlGKR734HbGYvi6PJ_9JlKv6l0-wDDw/s947/saved_frame.jpg"><img alt="" border="0" width="473" data-original-height="668" data-original-width="947" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9CNpvezklyb70HXN6VomYVMSpTsW6AFDkXmP_0O4DjuaWfEhvDDu_j1owOPDDOaIYvEcLe9L4ifzTy2wSfzqaMzBfSFa9xIlax5CSJVAQDiFb_81KkdjDr3yaJwbQsxbldjLlhjBwUUs4xyVSdZKeiunBFFkLlGKR734HbGYvi6PJ_9JlKv6l0-wDDw/s473/saved_frame.jpg" alt="[Image: A grainy greyscale image of what appears to be a computer desktop.]"/></a></div>
<p>(I've hidden the bright synchronisation signal from this picture.)</p>
<p>It seems to be my Raspberry Pi's desktop with weirdly distorted greyscale colours! Somehow, some part of the monitor setup is radiating it quite loudly into the aether. The frequency I'm listening to is a multiple of the monitor's pixel clock frequency.</p>
<p>As it turns out, this vulnerability of some monitors has been known for a long time. In 1985, van Eck demonstrated how CRT monitors can be spied on from a distance<a href="#vanEck1985" class="ref">[1]</a>; and in 2004, Markus Kuhn showed that the same still works on flat-screen monitors<a href="#Kuhn2004" class="ref">[2]</a>. The image is heavily distorted, but some shapes and even bigger text can be recognisable.</p>
<p>The next thought was, could we get any more information out of these images? Is there any information about colour?</p>
<h3>Mapping all the colours</h3>
<p>HDMI is fully digital; there is no linear dependency between pixel values and greyscale brightness in this amplitude image. I believe the brightness is related to the number of bit transitions over my radio's sampling time (which is around 8 bit-lengths); and in HDMI, this is dependent on many things, not just the actual RGB value of the pixel. HDMI also uses multiple differential wires that all are transmitting their own picture channels side by side.</p>
<p>This is why I don't think it's <del>possible</del> easy to reconstruct a clear picture of what's being shown on the screen, let alone decode any colours.</p>
<p>But could the reverse be possible? Could we control this phenomenon to draw the greyscale pictures of our choice on the receiver's screen? How about sending binary data by displaying alternating pixel values on the monitor?</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYlBNJnxusLm0zga8IEeSTo8OwbdhSK_FlW37ms4d_Vs443hYuSLpUnGWl1_pvmYsdo9u4Tmnc0GSPr4He9tFsSOd3kC6JZ2LTJnsNI1Gy7KhEsL_qXGBi-TWeolcUiwyprFVL_F7Rw4MOCe3zVh6XBzeCpdeiHumCgQOxPmz22WZTT2V1yeRFjKJAwA/s1360/gradients1.jpg"><img alt="" border="0" width="480" data-original-height="768" data-original-width="1360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYlBNJnxusLm0zga8IEeSTo8OwbdhSK_FlW37ms4d_Vs443hYuSLpUnGWl1_pvmYsdo9u4Tmnc0GSPr4He9tFsSOd3kC6JZ2LTJnsNI1Gy7KhEsL_qXGBi-TWeolcUiwyprFVL_F7Rw4MOCe3zVh6XBzeCpdeiHumCgQOxPmz22WZTT2V1yeRFjKJAwA/s480/gradients1.jpg" alt="[Image: On the left, gradients of red, green, and blue; on the right, greyscale lines of seemingly unrelated brightness.]"/></a></div>
<p>My monitor uses 16-bit colours. There are "only" 65,536 different colours, so it's possible to go through all of them and see how each appears in the receiver. But it's not that simple; the bit-pattern of a HDMI pixel can actually get modified based on what came before it. And my radio isn't fast enough to even tell the bits apart anyway. What we could do is fill entire lines with one colour and average the received signal strength. We would then get a mapping for single-colour horizontal streaks (above). Assuming a long run of the same colour always produces the same bitstream, this could be good enough.</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO5yrhvrmVgWY1dt800HyVjAVnsiJYj2pbYfwLS45kGP3hB9BAfqqFYHQalZ7dkL8sztMslCKuAk91yIbe20DrVrKjGgIxWpBacpQ6PhYuG7Qj7BAqeitzSpTEIp5hZ062sr8KK5rMEAJcaRadvfomtelE9qSUTYy-OdDZsfuMapsWmYogOcC0p2P39Q/s1936/565.png"><img alt="" border="0" width="500" data-original-height="816" data-original-width="1936" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO5yrhvrmVgWY1dt800HyVjAVnsiJYj2pbYfwLS45kGP3hB9BAfqqFYHQalZ7dkL8sztMslCKuAk91yIbe20DrVrKjGgIxWpBacpQ6PhYuG7Qj7BAqeitzSpTEIp5hZ062sr8KK5rMEAJcaRadvfomtelE9qSUTYy-OdDZsfuMapsWmYogOcC0p2P39Q/s500/565.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO5yrhvrmVgWY1dt800HyVjAVnsiJYj2pbYfwLS45kGP3hB9BAfqqFYHQalZ7dkL8sztMslCKuAk91yIbe20DrVrKjGgIxWpBacpQ6PhYuG7Qj7BAqeitzSpTEIp5hZ062sr8KK5rMEAJcaRadvfomtelE9qSUTYy-OdDZsfuMapsWmYogOcC0p2P39Q/s1000/565.png 2x" alt="[Image: An XY plot where x goes from 0 to 65536 and Y from 0 to 1.2. A pattern seems to repeat itself every 256 values of x. Values from 16128 to 16384 are markedly higher.]"/></a></div>
<p>Here's the map of all the colours and their intensity in the radio receiver. (Whatever happens between 16,128 and 16,384? I don't know.)</p>
<p>Now, we can resample a greyscale image so that its pixels become short horizontal lines. Then, for every greyscale value find the closest matching RGB565 color in the above map. When we display this psychedelic hodge-podge of colour on the screen (on the right), enough of the above mapping seems to be preserved to produce a recognizable picture of a movie<a href="#KungFury" class="ref">[3]</a> on the receiver side (on the left):</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqVgmnnJ63CKp_QK6jXBh2Bk0_Geg4Q1HyIsjmAVDsqckXYPgor6K-i6eS3w7EFBqUG8riRKO58_HtEawOZ8SayDR2ExvxH7_odJDyYO4TTZ_GejF6DlnbqpOlfIRZoiuF9lJWjtIa-Ibxv7RMLhRvqjHofnXgFLDHcK7l_bNFhkQBvGVX7pg9wzL-WA/s1200/hackermans.jpg"><img alt="" border="0" width="500" data-original-height="729" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqVgmnnJ63CKp_QK6jXBh2Bk0_Geg4Q1HyIsjmAVDsqckXYPgor6K-i6eS3w7EFBqUG8riRKO58_HtEawOZ8SayDR2ExvxH7_odJDyYO4TTZ_GejF6DlnbqpOlfIRZoiuF9lJWjtIa-Ibxv7RMLhRvqjHofnXgFLDHcK7l_bNFhkQBvGVX7pg9wzL-WA/s500/hackermans.jpg" alt="[Image: On the right, a monitor shows a noisy green and blue image. On the left, another monitor shows a grainy picture of a man and the text 'Hackerman'.]"/></a></div>
<p>These colours are not constant in any way. If I move the antenna around, even if I turn it from vertical to horizontal, the greyscales will shift or even get inverted. If I tune the radio to another harmonic of the pixel clock frequency, the image seems to break down completely. (Are there more secrets to be unfolded in these variations?)</p>
<h3>The binary exfiltration protocol</h3>
<p>Now we should have enough information to be able to transmit bits. Maybe even big files and streaming data, depending on the bitrate we can achieve.</p>
<p>First of all, how should one bit be encoded? The absolute brightness will fluctuate depending on radio conditions. So I decided to encode bits as the brightness difference between two short horizontal lines. Positive difference means 1 and negative 0. This should stay fairly constant, unless the colours completely flip around that is.</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ036DGW7I3bTtvUHyLhOKnRRI7x_4NJE_SVbM7UR5BOpzA1zggZkuX4X9iuxL1_hWmgQ45SibjU1yUxamIDsksiydE9KoHwQKSvRfc5koVN6UPB12OQJ0OWwQhb1Byzy2YiA4ojTBCElyE-so6nj7uYKftnRdoSxjhCvW_jqQTp0wWSSS_bZKciifxw/s1200/exprotocol.jpg"><img alt="" border="0" width="500" data-original-height="613" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ036DGW7I3bTtvUHyLhOKnRRI7x_4NJE_SVbM7UR5BOpzA1zggZkuX4X9iuxL1_hWmgQ45SibjU1yUxamIDsksiydE9KoHwQKSvRfc5koVN6UPB12OQJ0OWwQhb1Byzy2YiA4ojTBCElyE-so6nj7uYKftnRdoSxjhCvW_jqQTp0wWSSS_bZKciifxw/s500/exprotocol.jpg" alt="[Image: When a bit is 0, the leftmost line is darker than the rightmost line, and vice versa. These lines are used to form 768-bit packets.]"/></a></div>
<p>The monitor has 768 pixels vertically. This is a nice number so I designed a packet that runs vertically across the display. (This proved to be a bad decision, as we will later see.) We can stack as many packets side-by-side as the monitor width allows. A new batch of packets can be displayed in each frame, or we can repeat them over multiple frames to improve reliability.</p>
<p>These packets should have some metadata, at least a sequence number. Our medium is also quite noisy, so we need some kind of forward error correction. I'm using a Hamming(12,8) code which adds 4 error correction bits for every 8 bits of data. Finally, we need to add a CRC to each packet so we can make sure it arrived intact; I chose CRC16 with the polynomial <code>0x8005</code> (just because liquid-dsp provided it by default).</p>
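<p>For illustration, a bitwise CRC-16 with the 0x8005 polynomial goes like this (MSB-first with a zero initial value; liquid-dsp's exact bit ordering and initial value may differ, so treat this only as a sketch of the idea):</p>
<pre>
def crc16_8005(data: bytes) -> int:
    """Bitwise CRC-16, polynomial 0x8005, MSB-first, init 0x0000."""
    crc = 0x0000
    for byte in data:
        crc ^= byte * 256                    # shift the byte into the top bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc * 2) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc * 2) & 0xFFFF
    return crc
</pre>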
<h3>First results!</h3>
<p>It was quite unbelievable: I was able to transmit a looping 64 kbps audio stream almost without any glitches, with the monitor and the receiver in the same room, approximately 2 meters from each other.</p>
<p class="remark"><span class="remark-title">Quick tip.</span> Raw 8-bit PCM audio is a nice test format for these kinds of streaming experiments. It's straightforward to set an arbitrary bitrate by resampling the sound (with <a href="https://sox.sourceforge.net/" class="external">SoX</a> for instance); there's no structure, headers, or byte order to deal with; and any packet loss, misorder, or buffer underrun is instantly audible. You can use a headerless companding algorithm like A-law to fit more dynamic range in 8 bits. Even stereo works; if you start from the wrong byte the channels will just get swapped. SoX can also play back the stream.</p>
<p>But can we get more? Slowly I added more samples per second, and a second audio channel. Suddenly we were at 256 kbps and still running smoothly. 200 kbps was even possible from the adjacent room, with a directional antenna 5 meters away, and with the door closed! In the same room, it worked up to around 512 kilobits per second but then hit a wall.</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVWff6eBNhzq5ilkn_CQ9jtwaioTzBniaOrssVlPVsFM8fyKMiFvZiQRgvzKZpmiNgVnCJMNd2OvRCP4RGr0SjtVtB-Kr1EAUVpqf2y0GPRZLTU0eBFZBMDgjZzIO5RnYB0EQA3vbZ3h6RbPsMR4yL3EMt7QPY797dHd2z5ggc_BhDTdXwqUpq7xRhZw/s500/500k.jpg" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="292" data-original-width="500" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVWff6eBNhzq5ilkn_CQ9jtwaioTzBniaOrssVlPVsFM8fyKMiFvZiQRgvzKZpmiNgVnCJMNd2OvRCP4RGr0SjtVtB-Kr1EAUVpqf2y0GPRZLTU0eBFZBMDgjZzIO5RnYB0EQA3vbZ3h6RbPsMR4yL3EMt7QPY797dHd2z5ggc_BhDTdXwqUpq7xRhZw/s320/500k.jpg" alt="[Image: Info window that says HasPreamble: 1. Total: 928.5 kbps, Fresh: 853.6 kbps, Fresh (payload): 515.7 kbps.]"/></a></div>
<h3>A tearful performance</h3>
<p>The heavy error correction and framing add around 60% overhead, and we're left with 480 bits of 'payload' per packet. If we have 39 packets per frame at 60 frames per second, we should get more than a megabit per second, right? But for some reason it always caps at half a megabit.</p>
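<p>(As a sanity check: 39 packets × 480 payload bits × 60 frames per second is 1,123,200 bits per second, so a little over a megabit of theoretical payload.)</p>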
<p>The reason revealed itself when I noticed every other frame was often completely discarded at the CRC check. Of course; I should have thought of properly synchronising the screen update to the graphics adapter's frame update cycle (or VSYNC). This would prevent the picture information changing mid-frame, also known as tearing. But whatever options I tried with the SDL library I couldn't get the Raspberry Pi 4 to not introduce tearing.</p>
<p>Screen tearing appears to be an unsolved problem plaguing the Raspberry Pi 4 specifically (see this <a href="https://www.google.com/search?q=raspi+4+tearing" class="external">Google search</a>). I tried another mini computer, the Asus Tinker Board R2.0, but I couldn't get the graphics drivers to work properly. I then realised it was a mistake to have the packets run from top to bottom; any horizontal tearing will cut every single packet in half! With a horizontal design only one packet per frame would suffer this fate.</p>
<h3>A new design enables video-over-video</h3>
<p>Packets that run horizontally across the screen indeed fix most of the packet loss. It may also help with CPU load as it improves memory access locality. I'm now able to get 1000 kbps from the monitor! What could this be used for? A live video stream, perhaps?</p>
<p>But the clock was ticking. I had a presentation coming up and I really wanted to amaze everyone with a video transfer demo. I quite literally got it working on the morning of the event. For simplicity, I decided to go with MJPEG, even though fancier schemes could compress way more efficiently. The packet loss issues are mostly kept at bay by repeating frames.</p>
<p>The data stream is "hidden" in a Windows desktop screenshot; I'm changing the colours in a way that both creates a readable bit and also looks inconspicuous when you look from far away.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/iemOXp6bQXA?rel=0" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
<h3>Mitigations</h3>
<p>This was a fun project but this kind of a vulnerability could, in the tinfoiliest of situations, be used for exfiltrating information out of a supposedly airgapped computer.</p>
<p>The issue has been alleviated in some modern display protocols. DisplayPort<a href="#VESA2006" class="ref">[4]</a> makes use of scrambling: a pseudorandom sequence of bits is mixed with the bitstream to remove the strong clock oscillations that are so easily radiated out. This also randomizes the bitstream-to-amplitude correlation. I haven't personally tested whether it still has some kind of video in its radio interference, though. (Edit: Scrambling seems to be optionally supported by later versions of HDMI, too – but it might depend on which features exactly the two devices negotiate. How could you know if it's turned on?)</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwU4DaZn87NEYfQkxI97gJex5Obh2ZmISUAeK01b1S5MhdQ-SU04ozJ1OJcba2jF80FSyyN4KJV2RuPo6D2lIfx-ow_VRu-RFV8UkEix8M67H46Kj8s556zVu2UMVNGWGlwCZBGk8iy6Y0EI96uXiSl_6Hfy3KiARTucPqe8-QWz3pvyuAgRRr9HlhtQ/s1200/impractical.jpg"><img alt="" border="0" width="400" data-original-height="885" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwU4DaZn87NEYfQkxI97gJex5Obh2ZmISUAeK01b1S5MhdQ-SU04ozJ1OJcba2jF80FSyyN4KJV2RuPo6D2lIfx-ow_VRu-RFV8UkEix8M67H46Kj8s556zVu2UMVNGWGlwCZBGk8iy6Y0EI96uXiSl_6Hfy3KiARTucPqe8-QWz3pvyuAgRRr9HlhtQ/s400/impractical.jpg" alt="[Image: A monitor completely wrapped in tinfoil, with the text IMPRACTICAL written over it.]"/></a></div>
<p>I've also tried wrapping the monitor in tinfoil (very impractical) and inside a cage made out of chicken wire (it had no effect - perhaps I should have grounded it?). I can't recommend either of these.</p>
<h3>Software considerations</h3>
<p>This project was made possible by at least C++, Perl, SoX, ImageMagick, liquid-dsp, Dear Imgui, GLFW, turbojpeg, and v4l2! If you're a library that feels left out, please leave a comment.</p>
<p>If you wish to play around with video emanations, I heard there is a project called TempestSDR. For generic analog video decoding via a software radio, there is TVSharp.</p>
<h3>References</h3>
<ol class="references">
<li id="vanEck1985">Van Eck, Wim (1985): Electromagnetic radiation from video display units: An eavesdropping risk?</li>
<li id="Kuhn2004">Kuhn, Markus (2004): Electromagnetic Eavesdropping Risks of Flat-Panel Displays</li>
<li id="KungFury"><a href="https://www.youtube.com/watch?v=bS5P_LAqiVg" class="external">KUNG FURY Official Movie [HD]</a> (2015)</li>
<li id="VESA2006">Video Electronics Standards Association (2006): DisplayPort Standard, version 1.</li>
</ol>
<h2>Spiral spectrograms and intonation illustrations</h2>
<p>I've been experimenting with methods for visualising harmony, intonation (tuning), and overtones in music. Ordinary spectrograms aren't very well suited for that as the harmonic relations are not intuitively visible. Let's see what could be done about this. I'll try to sprinkle the text with Wikipedia links in order to immerse (<a href="https://en.wikipedia.org/wiki/Nerd_sniping" class="external">nerd snipe</a>?) the reader in the subject.</p>
<h3>Equal temperament cents against time</h3>
<p>We can examine how tuning evolves during a recording by choosing a reference pitch and plotting all frequencies relative to it modulo 100 <a href="https://en.wikipedia.org/wiki/Cent_(music)" class="external">cents</a>. This is similar to what an <a href="https://en.wikipedia.org/wiki/Electronic_tuner" class="external">electronic tuner</a> does, but instead of just showing the fundamental frequency, we'll plot the whole spectrum. Information about the absolute frequencies is lost. This "zoomed-in" plot visualises how the distribution of frequencies fits the 12-tone <a href="https://en.wikipedia.org/wiki/Equal_temperament" class="external">equal temperament</a> system (12-TET) common in Western music.</p>
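<p>The mapping itself is a one-liner: convert each frequency to cents relative to the reference and wrap it modulo 100. A sketch in Python:</p>
<pre>
import numpy as np

def cents_mod_100(freqs_hz, ref_hz=440.0):
    """Deviation of each frequency from the nearest 12-TET note, in cents
    (-50..+50), relative to an A = ref_hz tuning."""
    cents = 1200.0 * np.log2(freqs_hz / ref_hz)   # cents above the reference
    return (cents + 50.0) % 100.0 - 50.0          # wrap into a +/-50 c band
</pre>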
<p>Here's the first 20 seconds of Debussy's <em>Clair De Lune</em> as found on YouTube, played with a well-tuned (<a href="https://www.youtube.com/watch?v=CvFH_6DNRCY#t=7s" class="external" title="CLAUDE DEBUSSY: CLAIR DE LUNE">video</a>) and an out-of-tune piano (<a href="https://www.youtube.com/watch?v=xgkWOcLT0Lw#t=7s" title="YouTube: Debussy - Clair de Lune | Played on an Out of Tune Piano" class="external">video</a>). The second piano sounds out of tune because there are relative differences in tuning between the strings. The first piano looks to be a few cents sharp as a whole, but consistently so, so it's not perceptible.</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitcu6_yTFzFwmNwpEV8KFKefpfG4kILVbRoBMjoYw5b0qXJnWab3aMCmvhrFGOKs3J-EvJi6DiXJI1hzzBAXSdY19HBoeYrwj35nKpqjhtVutYaMHJ4riEuE_wg5p3BbN5tfonWsQ__1F2/s1000/debussy.png"><img alt="" border="0" data-original-height="411" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitcu6_yTFzFwmNwpEV8KFKefpfG4kILVbRoBMjoYw5b0qXJnWab3aMCmvhrFGOKs3J-EvJi6DiXJI1hzzBAXSdY19HBoeYrwj35nKpqjhtVutYaMHJ4riEuE_wg5p3BbN5tfonWsQ__1F2/s500/debussy.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitcu6_yTFzFwmNwpEV8KFKefpfG4kILVbRoBMjoYw5b0qXJnWab3aMCmvhrFGOKs3J-EvJi6DiXJI1hzzBAXSdY19HBoeYrwj35nKpqjhtVutYaMHJ4riEuE_wg5p3BbN5tfonWsQ__1F2/s1000/debussy.png 2x" alt="[Image: Two spectrograms labeled 'Piano in tune' and 'Piano out of tune'. The first one shows blobs of light along the center axis. In the second one, the blobs are jumping up and down the graph.]"/></a></div>
<p>The vertical axis is the same that electronic tuners use. All the notes of a chord will appear in the middle, as long as they are well tuned against the reference pitch of, say, <a href="https://en.wikipedia.org/wiki/A440_(pitch_standard)" class="external">A = 440 Hz</a>. The top edge of the graph is half a semitone sharp (quarter tone = 50c), and the bottom is half a semitone flat.</p>
<p>Overtones barely appear in the picture because the first three conveniently hit other notes in tune. But from f5 onwards the <a href="https://en.wikipedia.org/wiki/Harmonic_series_(music)" class="external">harmonic series</a> starts deviating from 12-TET and the harmonics start to look out-of-tune (f5 = −14c, f7 = −31c, ...). These can be cut out by filtering, or hoping that they're lower in volume and setting the color range accordingly.</p>
<p style="text-transform: uppercase; letter-spacing: 0.4em; text-align:center; font-family:serif; padding: 1em">Six months later...</p>
<p>You know how you sometimes accidentally delete a project and have to rewrite it from scratch much later, and it's never exactly the same? That's what happened at this point. It's a little scuffed, but here's some piano music that utilises quarter tones (by Wyschnegradsky). I used the same scale on purpose, so the quarter tones wrap around from the top and bottom. </p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_w_F10RjlAfFh2dfoUk6n_cghhzaPR7sYG45J9gtehVncrO-nY1fL4ZGizUo4j41b781dzUBwj60-EkX9L_fpin0Ma-GnSGl_7ZFZcb6Upbzt2V-AL7HPCpy6y0NaFkeh1oJx7yVj9XkQ/s1056/quartertones.png"><img alt="" border="0" data-original-height="200" data-original-width="1056" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_w_F10RjlAfFh2dfoUk6n_cghhzaPR7sYG45J9gtehVncrO-nY1fL4ZGizUo4j41b781dzUBwj60-EkX9L_fpin0Ma-GnSGl_7ZFZcb6Upbzt2V-AL7HPCpy6y0NaFkeh1oJx7yVj9XkQ/s500/quartertones.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_w_F10RjlAfFh2dfoUk6n_cghhzaPR7sYG45J9gtehVncrO-nY1fL4ZGizUo4j41b781dzUBwj60-EkX9L_fpin0Ma-GnSGl_7ZFZcb6Upbzt2V-AL7HPCpy6y0NaFkeh1oJx7yVj9XkQ/s1000/quartertones.png 2x" alt="[Image: Similar as above, but blobs are also seen at the very top and bottom of the graph.]"/></a></div>
<p>More examples of these "intonation plots" in <a href="https://twitter.com/windyoona/status/1401573702165204997" class="external">a tweet</a>.</p>
<p>This works well for piano music. However, not even all western classical music is tuned to equal temperament; for instance, solo strings may be played with Pythagorean intonation<a href="#Loosen1994" class="ref" title="Tuning of diatonic scales by violinists, pianists, and nonmusicians">[1]</a>, whereas vocal ensembles<a href="#DAmario2020" class="ref" title="A Longitudinal Study of Intonation in an a cappella Singing Quintet">[2]</a> and string quartets may tune some intervals closer to just intonation. Unlike the piano, these wouldn't look too good in the equal-note plot.</p>
<h3>Octave spiral</h3>
<p>If we instead plot the spectrum modulo 1200 cents (1 octave) we get an interesting interval view. We could even ditch the time scale and wind the spectrogram into a spiral to make it prettier and preserve the overtones and absolute frequency information. Now each note is represented by an angle in this spiral; in 12-TET, they're separated by 30 degrees. At any point in the spiral, going 1 turn outwards doubles the frequency.</p>
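<p>Concretely, each frequency becomes a point in polar coordinates: the pitch class (cents modulo 1200) sets the angle and the octave number sets how far along the spiral we are. A sketch, with the reference pitch and the radial scaling as arbitrary example choices:</p>
<pre>
import numpy as np

def spiral_coords(freq_hz, ref_hz=16.3516):        # reference: roughly C0
    """Map a frequency to (angle, radius) on the octave spiral.
    One full turn of angle = one octave; radius grows by 1 per octave."""
    octaves = np.log2(freq_hz / ref_hz)            # continuous octave number
    angle = 2.0 * np.pi * (octaves % 1.0)          # pitch class as an angle
    radius = 1.0 + octaves                         # one unit of radius per turn
    return angle, radius
</pre>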
<p>Here's a C major chord on a piano. Note how the harmonic content adds several high-frequency notes on top of the C, E, and G of the triad chord, and how multiple notes can contribute to the same overtone:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOtGHKCQtsO7AQgTY9WlITvIKzB11TGNt6oZBNDLNCaeSd0YWBrHRNO5Vvyjmo_pL-9PfU0vnPwQEAIVmwE-XIa7hNoI6WYiF8HnJE8PpQIqPjAC44W-Ygiu5jgDTo_kEyXTwKLb_QHbSi/s797/cduuri-plot.gif"><img alt="" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFURiTfZMg_rFGPBD2invqT_bFHNGfpc-pstZCR6IpeZR1w7-K9MkfK6Rp-MnyBlwzypV5nDOfqsIgszWvjMXPEvVZlVGUUg8JRdYuLf85kAKjJdJ6Slgs8ipm39oEEQuZOga-nIv_Ex4h/s390/cduuri-plot.gif" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFURiTfZMg_rFGPBD2invqT_bFHNGfpc-pstZCR6IpeZR1w7-K9MkfK6Rp-MnyBlwzypV5nDOfqsIgszWvjMXPEvVZlVGUUg8JRdYuLf85kAKjJdJ6Slgs8ipm39oEEQuZOga-nIv_Ex4h/s780/cduuri-plot.gif 2x" alt="[Image: 12 note names labeled around a spiral. The C, E, and G notes light up sequentially, and their frequency and harmonics are displayed on the spiral.]"/></a></div>
<p>I had actually hoped I would get an <em>Ultimate Chord Viewer</em> where any note, regardless of frequency, would have all its harmonics neatly stacked on top of it. But that's not what happens here: the harmonic series is not a stack of octaves (2^n) but a series of integer multiples (n). Some harmonics appear at seemingly unrelated angles. But it's still a pretty interesting visualisation, and perhaps makes more sense musically.</p>
<p>This plot is also better at illustrating different tuning systems. Let's look at a <a href="https://en.wikipedia.org/wiki/Major_third" class="external">major third</a> interval F-A in equal temperament and <a href="https://en.wikipedia.org/wiki/Just_intonation" class="external">just intonation</a>, with a few more harmonics.</p>
<div class="saumaton kuva keskella fills-mobile" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXFPWrnuz599rW3Yw5TrOFUGmj9cAHK7c2eBYwS0kZzKvo1OP6uTcwPsuUu_NyD5esMk04cn4n1Xjkg7mu8bWF17ssjIgS4l-FKAKo23_2AjHmS_XC32NmbVdmlOTTRVb99BVFetsdzgqM/s1000/just-equal-optimized.gif"><img alt="" border="0" data-original-height="616" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXFPWrnuz599rW3Yw5TrOFUGmj9cAHK7c2eBYwS0kZzKvo1OP6uTcwPsuUu_NyD5esMk04cn4n1Xjkg7mu8bWF17ssjIgS4l-FKAKo23_2AjHmS_XC32NmbVdmlOTTRVb99BVFetsdzgqM/s500/just-equal-optimized.gif" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXFPWrnuz599rW3Yw5TrOFUGmj9cAHK7c2eBYwS0kZzKvo1OP6uTcwPsuUu_NyD5esMk04cn4n1Xjkg7mu8bWF17ssjIgS4l-FKAKo23_2AjHmS_XC32NmbVdmlOTTRVb99BVFetsdzgqM/s1000/just-equal-optimized.gif 2x" alt="[Image: The interval alternates between 400 and 386 cents. When it's 386, a few harmonics of the F note merge with those of the A note.]"/></a></div>
<p>The intuition from this plot is that equal temperament aims for equal distances (angles) between notes and just intonation tries to make more of the harmonics match instead.</p>
<p>Even though the <em>Ultimate Chord Viewer</em> was not achieved I now have ideas for Christmas lights...</p>
<h3>Live visualization</h3>
<p>Here's what a cappella music with some reverb looks like on the spiral spectrogram.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/VeC8TIu8c5M?rel=0" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
<h3>A shader toy</h3>
<p><a href="https://www.shadertoy.com/view/ftKGRc" class="external">This little real-time GLSL demo</a> on shadertoy draws a spiral FFT from microphone audio. But don't get your hopes up: The spectrograms in this blog post were made with a 65536-point FFT. Shadertoy's 512px microphone texture offers a lot less in terms of frequency range and bins. This greatly blurs the frequency resolution, especially towars the low end. Could it be improved with the right colormap? Or a custom FFT with the waveform texture as its input?</p>
<h3>References</h3>
<ul class="references">
<li id="Loosen1994">Loosen, F. (1994): <a href="https://link.springer.com/content/pdf/10.3758/BF03213900.pdf" title="Tuning of diatonic scales by violinists, pianists, and nonmusicians" class="external">Tuning of diatonic scales by violinists, pianists, and nonmusicians</a>. <span class="ref-title">Perception & Psychophysics</span> <span class="ref-volume">56</span>(2): 221–226.</li>
<li id="DAmario2020">D'Amario, S., Howard, D.M., Daffern, H., Pennill, N. (2020): <a href=" https://www.sciencedirect.com/science/article/pii/S0892199718302418" title="A Longitudinal Study of Intonation in an a cappella Singing Quintet" class="external">A Longitudinal Study of Intonation in an <i>a cappella</i> Singing Quintet</a>. <span class="ref-title">Journal of Voice</span> <span class="ref-volume">34</span>(1): 159.e13–159.e27.</li>
</ul>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com10tag:blogger.com,1999:blog-5096278891763426276.post-2635596732725287712021-03-29T21:37:00.006+03:002022-11-11T08:48:58.951+02:00Speech to birdsong conversion<p>I had a dream one night where a blackbird was talking in human language. When I woke up there was actually a blackbird singing outside the window. Its inflections were curiously speech-like. The dreaming mind only needed to imagine a bunch of additional harmonics to form phonemes and words. One was left wondering if speech could be transformed into a blackbird song by isolating one of the harmonics...</p>
<p>One way to do this would be to:</p>
<ul>
<li>Find the instantaneous fundamental frequency and amplitude of the speech. For example, filter the harmonics out and use an FM demodulator to find the frequency. Then find the signal envelope amplitude by AM demodulation.</li>
<li>Generate a new wave with similar amplitude variations but greatly multiplied in frequency.</li>
</ul>
<div class="saumaton kuva keskella fills-mobile"><img alt="" border="0" data-original-height="306" data-original-width="1040" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwmcvy__Hyvsbyi0IJevIZvLV-1hFR93nJGg7NYk5W4p-pLFyL9nrJNbBfHapaSzIW896CeqjDUu5VyFbtblGwsk-hxLLg8oAm5CCkY68kUDwLHqTSra5sAOx3xolqG3L8lgcdZcyh_quF/s520/bird-dsp-2x.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwmcvy__Hyvsbyi0IJevIZvLV-1hFR93nJGg7NYk5W4p-pLFyL9nrJNbBfHapaSzIW896CeqjDUu5VyFbtblGwsk-hxLLg8oAm5CCkY68kUDwLHqTSra5sAOx3xolqG3L8lgcdZcyh_quF/s1040/bird-dsp-2x.png 2x" alt="[Image: Signal path diagram.]"/></div>
<p>A proof-of-concept script using the Perl-SoX-csdr command-line toolchain is available (<a href="https://gist.github.com/windytan/80781ca72c357bb61de8a7b70faea48f">source code here</a>). The result sounds surprisingly blackbird-like. Even the little trills are there, probably as a result of FM noise or maybe vocal fry at the end of sentences. I got the best results by speaking slowly and using exaggerated inflection.</p>
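<p>For the curious, here's a rough C++ sketch of the same signal path – not the actual script linked above, just a minimal illustration of the idea. The zero-crossing pitch tracker, the RMS envelope, the block size, and all the names are illustrative stand-ins for the FM and AM demodulation steps:</p>
<pre><code class="cc">#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: estimate the fundamental frequency and envelope block by block,
// then resynthesize a sine at a multiple of the detected frequency.
// Assumes 'input' is mono speech, already low-pass filtered so that mostly
// the fundamental remains.
std::vector<float> SpeechToBird(const std::vector<float>& input,
                                float sample_rate, float freq_multiplier) {
  const std::size_t block = 512;
  const double two_pi = 6.283185307179586;
  std::vector<float> output(input.size(), 0.f);
  double phase = 0.0;
  for (std::size_t start = 0; start + block <= input.size(); start += block) {
    int crossings = 0;
    double power = 0.0;
    for (std::size_t i = start + 1; i < start + block; i++) {
      if ((input[i - 1] < 0.f) != (input[i] < 0.f)) crossings++;  // crude FM demod
      power += input[i] * input[i];
    }
    const double f0 = crossings * sample_rate / (2.0 * block);  // fundamental
    const double envelope = std::sqrt(power / block);           // AM demod

    // Generate a new wave with a similar envelope but multiplied frequency.
    for (std::size_t i = start; i < start + block; i++) {
      phase += two_pi * f0 * freq_multiplier / sample_rate;
      output[i] = static_cast<float>(envelope * std::sin(phase));
    }
  }
  return output;
}</code></pre>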
<p>Someone hinted that the type of intonation used in certain automatic announcements is perfect for this kind of conversion. And it seems to be true! Here, a noise gate and reverb have been added to the result to improve it a little:</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/atthetone.mp3"></audio></div>
<p>And finally, a piece of sound art where this synthetic blackbird song is mixed with a subtle chord and a forest ambience:</p>
<div class="separator" style="clear: both; text-align: center;"><iframe class="BLOG_video_class" allowfullscreen="" youtube-src-id="vYguVHUlGCA" width="500" height="322" src="https://www.youtube.com/embed/vYguVHUlGCA"></iframe></div>
<p>Think of the possibilities: A simultaneous interpreter for talking to birds. A tool for dubbing talking birds in animation or live theatre. Entertainment for cats.</p>
<p>What other birds could be done with a voice changer like this? What about croaky birds like a duck or a crow?</p>
<p>(I talked about this blog post a little on NPR: <a href="https://www.npr.org/2021/04/16/988200892/heres-what-all-things-considered-sounds-like-in-blackbird-song" class="external">Here's What 'All Things Considered' Sounds Like — In Blackbird Song</a>)</p>
Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com44tag:blogger.com,1999:blog-5096278891763426276.post-26695019895054588732020-12-08T23:42:00.035+02:002023-08-07T16:41:23.491+03:00Plotting patterns in music with a fantasy record player<div class="kuva oikealla"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIK350lA9fMo-Z9PohL6SBuT_Xkl7mOGVsHRblgbnNBp7cLfnTWTXkWSJmiwMy5KD7PVhJGnISfBLfKrNTj8oR9-x5Qb33xGyx3Gs0Te13Aa-6jfWw1gLq5kMI936vJF30xCzpzVKtjMf8/s852/BlackPink.jpg"><img alt="" border="0" width="220" data-original-height="779" data-original-width="852" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIK350lA9fMo-Z9PohL6SBuT_Xkl7mOGVsHRblgbnNBp7cLfnTWTXkWSJmiwMy5KD7PVhJGnISfBLfKrNTj8oR9-x5Qb33xGyx3Gs0Te13Aa-6jfWw1gLq5kMI936vJF30xCzpzVKtjMf8/s320/BlackPink.jpg" alt="[Image: Close-up of a vinyl record showing a wavy pattern.]"/></a></div>
<p>Back in April I bought a vinyl record that had a weird wavy pattern near the outer edge. I thought I may have broken it somehow but couldn't even test this because I don't own a record player. *) But when I took a closer look at the pattern it seemed to somehow follow changes in the music. That doesn't look like damage at all.</p>
<p>When I played the CD version it became clear: this was an artifact of the tempo of the electronic track (100 bpm) being an exact multiple (3×) of the rotational speed (33 1/3 rpm), and these were probably drum hits! My <a href="https://twitter.com/windyoona/status/1249999597315002373?s=20" class="external">tweet</a> sparked some interesting discussion and I've been pondering this ever since. Could we plot any song as a loop or grid based on its own tempo and see interesting patterns?</p>
<p>(*) I know, it's a little odd. But I have a few unplayed vinyl records waiting for the day that I finally have the proper equipment. By the way, the song was Black Pink by RinneRadio from their wonderful album staRRk.</p>
<p>I wrote a little <a href="https://gist.github.com/windytan/d46686709ff43cd679d52c17302f7736#file-plot_rpm-pl-L5" class="external">script</a> to do just this: to plot the amplitude of the FLAC into a grid with an adjustable width. The result looks very similar to the pattern on the vinyl surface! Note that this image is a "straightened out" version of the disc surface and it's showing three of those wavy patterns. The top edge corresponds to the outer edge of the vinyl.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjfo5UWLOUu5REWu2Pywu_t1WgP6UJZ5W9DpzWU2NBuej4TthDHygkCXWka07TwdFMIpkroeqYT15OA3S19IcNU4nzb15WZ7fyyNmI6h0CE4A7VBTSXX-DhRDaGNlIYa2AG3NKQ9NGxO6_/s1024/blackpink2.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjfo5UWLOUu5REWu2Pywu_t1WgP6UJZ5W9DpzWU2NBuej4TthDHygkCXWka07TwdFMIpkroeqYT15OA3S19IcNU4nzb15WZ7fyyNmI6h0CE4A7VBTSXX-DhRDaGNlIYa2AG3NKQ9NGxO6_/s1024/blackpink2.png" alt="[Image: A plot showing similar patterns that were on the disc surface.]" width="500"/></a></div>
<p>Later I wrote a little more ambitious plotter that shall be explained soon.</p>
<h3>Computer-conducted music gives best patterns</h3>
<p>After plotting several different songs against their own tempo like this it seemed that, in addition to electronic music, a lot of pop and rock has this type of pattern, too. The most striking and clear patterns can be seen in music that makes use of drum samples in a quantized time base (aka. a drum machine): the same kick drum sample, for example, repeats four times in each bar, perfectly timed by a computer so that the hits align in phase.</p>
<p>Somewhat similar patterns can be seen in live music that is played to a "click track": each band member hears a common computer-generated time signal in their earpiece so that they won't drift from the common tempo. But of course the live beats won't be perfectly phase-aligned in this case, because the musicians are humans and there's also physics involved.</p>
<h3>3D rendered video experiment</h3>
<p>To demonstrate how the patterns on vinyl records are born I made a video showing a fantasy record player that can play an "e-ink powered optical record" and morph it into any RPM. I say fantasy because it's just that: imagination, science fiction, rendered 3D art - it would be quite unfeasible in real life. You can't actually make e-ink displays that fast and accurate. But of course it would be possible to have a live display of a digitally sampled audio file as a spiral and use some kind of physical controllers to change the RPM value in real time, and just play the sound from a digital file.</p>
<p>Making the video was really fun and I think the end result is equal parts weird and illustrative.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/mRi23ueU7Zk?rel=0" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
<h3>Programming the disk surface and audio</h3>
<p>The disc surface is based on a video texture: a different image was rendered for each changed frame using a purpose-written C++ program. The program uses an oversampled (8x) version of the original music that it then resamples at a variable rate based on the recorded RPM value (let's call it a morphing value). Oversampling and low-pass filtering beforehand makes variable-rate resampling simple: just take a sample at the appropriate time instant and don't worry about interpolation. It won't sound perfect but actually the artifact adds an interesting distortion, perhaps simulating pixel boundaries in the 'e-ink display'.</p>
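<p>A minimal sketch of that playback idea is below – the names are made up and the real program of course does more bookkeeping, but this is essentially all the "resampler" has to do once the material is oversampled and pre-filtered:</p>
<pre><code class="cc">#include <cstddef>
#include <vector>

// Nearest-sample playback from an 8x-oversampled, low-pass filtered buffer.
// 'speed' is the morphing value: 1.0 plays back at the original rate.
float NextOutputSample(const std::vector<float>& oversampled, double& playhead,
                       double speed, int oversample_factor = 8) {
  playhead += speed * oversample_factor;   // advance in the oversampled timebase
  const std::size_t index = static_cast<std::size_t>(playhead);
  if (index >= oversampled.size()) return 0.f;
  return oversampled[index];               // just pick the nearest sample
}</code></pre>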
<p>The amplitude sample at each time instant was projected into polar coordinates and plotted as an image. The image is fairly large - at least 2048 by 2048 pixels. I use this as a sort of image-space oversampling to get the polar projection to look a little better. I even tried 8192 x 8192 video but it was getting too heavy on my computer. But a new image only needs to be generated when the morphing value changes; the other frames can be copied.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPisSQ0k9f16Mkoi7KOWCajT5YBDRJbUt0QjbR7mIWw3LZD22hyphenhyphenS9o4HTOCrkbjlEr9Q8la_GWX-CUZeotP9V8yBj7tdogf4dM351kjbrt1YTsMsc1E8vUf98ZophoMLD57qxwmPMv7vwn/s512/disc-pigpen-insync.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPisSQ0k9f16Mkoi7KOWCajT5YBDRJbUt0QjbR7mIWw3LZD22hyphenhyphenS9o4HTOCrkbjlEr9Q8la_GWX-CUZeotP9V8yBj7tdogf4dM351kjbrt1YTsMsc1E8vUf98ZophoMLD57qxwmPMv7vwn/s512/disc-pigpen-insync.jpg" alt="[Image: A square image of the disc video texture.]"/></a></div>
<p>The sound track was made by continuously sampling the position of the "play head" 44100 times per second, whether the disk was moving or not. Which sample ends up in the audio depends on the current rotational angle and the morphing value of the disk surface. When either of those values changes, the audio moves past the play head. A DC cancel filter was then applied because the play head would often stop on a non-zero sample, and it didn't look nice in the waveform. There's also a quiet hum in the background.</p>
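<p>The DC cancel filter can be as simple as a one-pole DC blocker; this is a generic sketch, not necessarily the exact filter used in the project:</p>
<pre><code class="cc">// One-pole DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1], with R close to 1.
// Removes the constant offset left behind when the play head stops on a
// non-zero sample.
class DcBlocker {
 public:
  explicit DcBlocker(float r = 0.995f) : r_(r) {}
  float Process(float x) {
    const float y = x - prev_x_ + r_ * prev_y_;
    prev_x_ = x;
    prev_y_ = y;
    return y;
  }

 private:
  float r_;
  float prev_x_ = 0.f;
  float prev_y_ = 0.f;
};</code></pre>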
<div class="kuva keskella fills-mobile"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWdBWR5I5I1mTXJs721g0ZtFf5ixXLtAqnh96HYgl_ICY1u_vgb_OmR5hqidOjSyOeo8lhomcHU6Kl4EGhHZVW7ItkV3Qx_VH0Pe8H9lThTNk0ekEKenZhRXx55glQ_mXYbsapu-uLmqGx/s479/events-screenshot.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWdBWR5I5I1mTXJs721g0ZtFf5ixXLtAqnh96HYgl_ICY1u_vgb_OmR5hqidOjSyOeo8lhomcHU6Kl4EGhHZVW7ItkV3Qx_VH0Pe8H9lThTNk0ekEKenZhRXx55glQ_mXYbsapu-uLmqGx/s958/events-screenshot.png 2x" alt="[Image: Screenshot of C++ code with a list of events.]"/></div>
<p>I made an event-based system where I could input events simulating the button presses and other controls. The system responds to speed change events with a <a href="https://en.wikipedia.org/wiki/Smoothstep" class="external">smoothstep function</a> so that the disc seems to have realistic inertia. Also, the slow startup and slowdown sounds kind of cool this way. Here's an extra-slow version of the effect -- you can hear the slight aliasing artifacts in the very beginning and end:</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/slow-speed-change.mp3"></audio></div>
<h3>3D modeling, texturing, shading</h3>
<p>The models were all made in Blender, a tool that I've slowly learned to use during the pandemic situation. Its modeling tools are pretty fun to use and once you learn it you can build pretty 3D models to look at that won't take up any room in your apartment.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZGeWmeXgvDoQaLk2DzbwzbB9gftTUJqL6fJP0KPXjMdMAtsKeFbcf9tEzkso-FJHZEnjhGU5gebXX-bJ6JOZrf7K2NrQbjKuapeGXzsfBfopsSjsl2ip6SR_iF5SqRJGEcUN-DqA4sjLA/s1920/modeling-psyche4.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZGeWmeXgvDoQaLk2DzbwzbB9gftTUJqL6fJP0KPXjMdMAtsKeFbcf9tEzkso-FJHZEnjhGU5gebXX-bJ6JOZrf7K2NrQbjKuapeGXzsfBfopsSjsl2ip6SR_iF5SqRJGEcUN-DqA4sjLA/s500/modeling-psyche4.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZGeWmeXgvDoQaLk2DzbwzbB9gftTUJqL6fJP0KPXjMdMAtsKeFbcf9tEzkso-FJHZEnjhGU5gebXX-bJ6JOZrf7K2NrQbjKuapeGXzsfBfopsSjsl2ip6SR_iF5SqRJGEcUN-DqA4sjLA/s1000/modeling-psyche4.jpg 2x" alt="[Image: A screenshot of Blender with the device without textures.]"/></a></div>
<p>I love the retro aesthetic of old reel-to-reel players and other studio equipment. So I looked for inspiration by searching "reel-to-reel" on Google Images. Try it out, it's worth it! Originally I wanted the disc to be transparent with some type of movable particles inside, and the laser to be shone through it, but this was computationally very expensive to render. So I made it an 'e-ink' display instead. (I regret this choice a little bit since some people, at first glance, apparently thought the video depicted actual existing technology. But I tried to make it clear it's a photorealistic render :)</p>
<p>I made use of the boolean workflow and bevel modifiers to cut holes and other small details in the hard surfaces. The cables are Bezier curves with the round bevel setting enabled.</p>
<p>The little red LCD is also a video texture on an emission shader – each frame was an SVG that was changed a little in time to add flicker and then exported using Inkscape from a batch script.</p>
<p>The wood textures, fingerprints, and the room environment photo are from HDRi Haven, Texture Haven and CC0 Textures. I'm especially proud of all the details on the disc surface -- here's the shader setup I built for the surface:</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIF0nbsZgcGp5_hTtXkrSDtcjcyTQ0p072dWDuut948V5VK1KbZQJk9LRs6CrsFiM3EPLtez-2a7cPeGIfI-hG9D_VfqWXyrGRcwx7UXDUBm4_Y-p9-oWW_kQFWqokP2Q03raDENghfyB5/s2016/disc-shader.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIF0nbsZgcGp5_hTtXkrSDtcjcyTQ0p072dWDuut948V5VK1KbZQJk9LRs6CrsFiM3EPLtez-2a7cPeGIfI-hG9D_VfqWXyrGRcwx7UXDUBm4_Y-p9-oWW_kQFWqokP2Q03raDENghfyB5/s500/disc-shader.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIF0nbsZgcGp5_hTtXkrSDtcjcyTQ0p072dWDuut948V5VK1KbZQJk9LRs6CrsFiM3EPLtez-2a7cPeGIfI-hG9D_VfqWXyrGRcwx7UXDUBm4_Y-p9-oWW_kQFWqokP2Q03raDENghfyB5/s1000/disc-shader.jpg 2x" alt="[Image: A Blender texture node map.]"/></a></div>
<p>The video was rendered in Blender Eevee and it took maybe 10 hours at 720p60. It's sad that it's not in 1080p but I was impatient. I spent quite some time to make the little red LCD look realistic but it was completely spoiled by compression!</p>
<p>Here's a bigger still rendered in Cycles:</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCWvYWzuSFKzHBqwtb5aICiMpF1aJNpDntyNnTZOGh9FY4aObotdk4IB4MmasuG6uLPw2pOTtfVC4xJSTlHWrUpMPsEXQKMasaKnmDXoSHdb9TTHKtWcUIK0s6tyPR_Zu3cSNatrbaiLgw/s1920/psyche-render1.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCWvYWzuSFKzHBqwtb5aICiMpF1aJNpDntyNnTZOGh9FY4aObotdk4IB4MmasuG6uLPw2pOTtfVC4xJSTlHWrUpMPsEXQKMasaKnmDXoSHdb9TTHKtWcUIK0s6tyPR_Zu3cSNatrbaiLgw/s500/psyche-render1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCWvYWzuSFKzHBqwtb5aICiMpF1aJNpDntyNnTZOGh9FY4aObotdk4IB4MmasuG6uLPw2pOTtfVC4xJSTlHWrUpMPsEXQKMasaKnmDXoSHdb9TTHKtWcUIK0s6tyPR_Zu3cSNatrbaiLgw/s1000/psyche-render1.jpg 2x" alt="[Image: A render of the record player.]"/></a></div>
<h3>Mapping rotation to the disc</h3>
<p>Rotation angles from the C++ program were sampled 60 times per second and output as CSV. These were then imported to Blender as keyframes for the rotating disc, using the Python API:</p>
<div class="kuva keskella fills-mobile"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoOUqggMM8fUV4EK7B4vh9P71EfJsgxbMDTzZbecqYHCO1GcFs2yPEoDCvdMP7jRljmO4wcMT8ru_mDjRZUOdFMcvj3h-UWXjvH0VX4anuZUmnHn4PYoCUTNq27zi_alHwsQeMF0etPAm_/s635/python-csv-import.jpg" alt="[Image: A screenshot of a Python script.]"/></div>
<p>Here you only need to print a new keyframe when the speed of rotation changes, or is about to change; Blender will interpolate the rest.</p>
<p>A Driver was set up to make the Y rotation slightly follow the Z rotation with a very small factor to make the disc 'wobble' a bit.</p>
<h3>What's next?</h3>
<p>It's endless fun to build and program functional fantasy electronics and I may need to do more of that. I'm currently also obsessed with 3D modeled dollhouses and who knows how those things will combine?</p>
<p>By the way, there is an actual device somewhat resembling this 3D model. It's called the Panoptigon, it's sort of an optical mellotrone (<a href="https://www.youtube.com/watch?v=LhmvLNNFe6A" class="external">video</a>).</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com9tag:blogger.com,1999:blog-5096278891763426276.post-43298157882548774042019-08-24T21:30:00.008+03:002023-08-22T22:43:35.635+03:00Capturing PAL video with an SDR (and a few dead-ends)<p>I play 1980s games, mostly Super Mario Bros., on the Nintendo NES console. It would be great to be able to capture live video from the console for recording speedrun attempts. Now, how to make the 1985 NES and the 2013 MacBook play together, preferably using hardware that I already have? This project diary documents my search for the answer.</p>
<p>Here's a spoiler – it did work:</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgI12MLq_5qagdxwQXzc8WOzE-N5cgtWXGSheXiG3H6qnkCWZMuGhQh_Fn88YSbqr7Owrqz8kppdC_p_I88_lEqBUj7fDoYI7-I0dy8yP7WdbcuHv62FKNssU7lpg2eNDtRV42E1WnhmxQD/s1600/IMG_6228.jpg"><img alt="[Image: A powered-on NES console and a MacBook on top of it, showing a Tetris title screen.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgI12MLq_5qagdxwQXzc8WOzE-N5cgtWXGSheXiG3H6qnkCWZMuGhQh_Fn88YSbqr7Owrqz8kppdC_p_I88_lEqBUj7fDoYI7-I0dy8yP7WdbcuHv62FKNssU7lpg2eNDtRV42E1WnhmxQD/s400/IMG_6228.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgI12MLq_5qagdxwQXzc8WOzE-N5cgtWXGSheXiG3H6qnkCWZMuGhQh_Fn88YSbqr7Owrqz8kppdC_p_I88_lEqBUj7fDoYI7-I0dy8yP7WdbcuHv62FKNssU7lpg2eNDtRV42E1WnhmxQD/s800/IMG_6228.jpg 2x"/></a></div>
<h3>Things that I tried first</h3>
<h4>A capture device</h4>
<p>Video capture devices, or capture cards, are devices specially made for this purpose. There was only one cheap (~30€) capture device for composite video available locally, and I bought it optimistically. But it wasn't readily recognized as a video device on the Mac, and there seemed to be no Mac drivers available. Having already almost capped my budget for this project I then ordered a 5€ EasyCap device from eBay, as there was some evidence of Mac drivers online. The EasyCap was still making its way to Finland as of this writing, so I continued to pursue other routes.</p>
<p>PS: When the device finally arrived, it sadly seemed that the EasyCapViewer-Fushicai software only supports opening this device in NTSC mode. There's PAL support in later commits in the GitHub repo, but the project is old and can't be compiled anymore as Apple has deprecated QuickTime.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuhuYPWvBeDhXq5EAFBaEC3N6jsSIXM-3ocFVW2fydkeJ4mjZodZipeYOSqcCz27KJMri_SwsFgw7u1ORbACWoEkfu7V-iDzohuOffmcBWxg3fKsBUku60hv2UzLztC1ANOzr9ayid-9RM/s1600/usbtv-screenshot.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuhuYPWvBeDhXq5EAFBaEC3N6jsSIXM-3ocFVW2fydkeJ4mjZodZipeYOSqcCz27KJMri_SwsFgw7u1ORbACWoEkfu7V-iDzohuOffmcBWxg3fKsBUku60hv2UzLztC1ANOzr9ayid-9RM/s450/usbtv-screenshot.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuhuYPWvBeDhXq5EAFBaEC3N6jsSIXM-3ocFVW2fydkeJ4mjZodZipeYOSqcCz27KJMri_SwsFgw7u1ORbACWoEkfu7V-iDzohuOffmcBWxg3fKsBUku60hv2UzLztC1ANOzr9ayid-9RM/s900/usbtv-screenshot.jpg 2x" /></a></div>
<p>Even when they do work, a downside to many cheap capture devices is that they can only capture at half the true framerate (that is, at 25 or 30 fps).</p>
<!--<h4>Composite video + RedPitaya</h4>
<p>RedPitaya is a device that can take digital samples of </p>-->
<h4>CRT TV + DSLR camera</h4>
<p>The cathode-ray tube television that I use for gaming could be filmed with a digital camera. This posed interesting problems: The camera must be timed appropriately so that a full scan is captured in every frame, to prevent temporal aliasing (stripes). This is why I used a DSLR camera with a full manual mode (Canon EOS 550D in this case).</p>
<p>For the 50 Hz PAL television screen I used a camera frame rate of 25 fps and an exposure time of 1/50 seconds (set by camera limitations). The camera will miss every other frame of the original 50 fps video, but on the other hand, will get an evenly lit screen every time.</p>
<p>A Moiré pattern will also appear if the camera is focused on the CRT shadow mask. This is due to interference between two regular 2D arrays, the shadow mask in the TV and the CCD array in the camera. I got rid of this by setting the camera on manual focus and defocusing the lens just a bit.</p>
<div class="kuva keskella"><img alt="[Image: A screen showing Super Mario Bros., and a smaller picture with Oona in it.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjgHYuaIsECDkGGbtyVixRSGVw_Dh3ukjHRBxw2EBKtdDbOADwsEJWVeUUfo3BiJ2xP-Lu9W2XTAE7NAScuX3ZL8h2EEHKGqM2LBLfe9e4ogESKxNfmoYgI4EA39TNzHYEaDN0JXZNVPTG/s450/crt-capture.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjgHYuaIsECDkGGbtyVixRSGVw_Dh3ukjHRBxw2EBKtdDbOADwsEJWVeUUfo3BiJ2xP-Lu9W2XTAE7NAScuX3ZL8h2EEHKGqM2LBLfe9e4ogESKxNfmoYgI4EA39TNzHYEaDN0JXZNVPTG/s900/crt-capture.jpg 2x" /></div>
<p>This produced surprisingly good quality video, save for the slight jerkiness caused by the low frame rate (<a class="external" href="https://www.youtube.com/watch?v=TbNo2ndoFag">video</a>). The setup was good for one-off videos; however, I could not use it for live streaming, because the camera could only record onto the SD card and not connect to the computer directly.</p>
<h4>LCD TV + webcam</h4>
<p>An old LCD TV that I have has significantly less flicker than the CRT, and I could have live video via the webcam. But the Microsoft LifeCam HD-3000 that I own had only a binary option for manual exposure (pretty much "none" and "lots"). With the higher setting the video was quite washed out, with lots of motion blur. The lower setting was so fast that it looked like the LCD had visible vertical scanning. Brightness was also heavily dependent on viewing angle, which caused gradients over the image. I had to film at a slightly elevated angle so that the upper part of the image wouldn't go too dark, and this made the video look like a bootleg movie copy.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_XzVzjsytlPbzIsOBJJSZgaDgbHVOi8v2waK0wMfLbhfvNsoSRqp_BjjuHYfRRYk7BKLJ1NygWGJAG_xUBnQPy2Kwi8bQJp3EaRtc4WurxPqR9CPae6oZEks2QT0zIwj0gntzH_W6u8-Y/s1600/lcd-capture.jpg"><img alt="[Image: A somewhat blurry photo of an LCD TV showing Super Mario Bros.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_XzVzjsytlPbzIsOBJJSZgaDgbHVOi8v2waK0wMfLbhfvNsoSRqp_BjjuHYfRRYk7BKLJ1NygWGJAG_xUBnQPy2Kwi8bQJp3EaRtc4WurxPqR9CPae6oZEks2QT0zIwj0gntzH_W6u8-Y/s350/lcd-capture.jpg" /></a></div>
<!--<h4>Idea: DV camera + FireWire</h4>
<p>DV cameras used to have composite video inputs and digital FireWire outputs, so they could be used for digitising video live. I thought of finding an old DV camera online, but together with a FireWire/Thunderbolt adapter the cost would probably have been over 100€. And I wasn't even sure if a modern Mac would be able to do that.</p>-->
<h3>Composite video</h3>
<p>Now to capturing the actual video signal. The NES has two analog video outputs: one is composite video and the other an RF modulator, which has the same composite video signal modulated onto an AM carrier in the VHF television band plus a separate FM audio carrier. This is meant for televisions with no composite video input: the TV sees the NES as an analog TV station and can tune to it.</p>
<p>In composite video, information about brightness, colour, and synchronisation is encoded in the signal's instantaneous voltage. The bandwidth of this signal is at least 5 MHz, or 10 MHz when RF modulated, which would require a 10 MHz IQ sampling rate.</p>
<div class="saumaton kuva keskella"><img alt="[Image: Oscillogram of one PAL scanline, showing hsync, colour burst, and YUV parts.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH_zqGuJdjJwVxXRLC3zSt1nIMly-zYw0_gOm2MGJftW623qdMAIHK4o6rufDWshKSfQGn2k4UGmJ1fzGCRzjVQy7uBJfNFzYq9iSvClbvI7a4s3q1882pmnRrP1EvU0KOojvrCnO2uPhr/s400/scanline.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgH_zqGuJdjJwVxXRLC3zSt1nIMly-zYw0_gOm2MGJftW623qdMAIHK4o6rufDWshKSfQGn2k4UGmJ1fzGCRzjVQy7uBJfNFzYq9iSvClbvI7a4s3q1882pmnRrP1EvU0KOojvrCnO2uPhr/s800/scanline.jpg 2x" /></div>
<p>I happen to have an Airspy R2 SDR receiver that can listen to VHF and take 10 million samples per second - could it be possible? I made a cable that can take the signal from the NES RCA connector to the Airspy SMA connector. And sure enough, when the NES RF channel selector is at position "3", a strong signal indeed appears on VHF television channel 3, at around 55 MHz.</p>
<h3>Software choices</h3>
<p>There's already an analog TV demodulator for SDRs - it's a plugin for SDR# called TVSharp. But SDR# is a Windows program and TVSharp doesn't seem to support colour. And it seemed like an interesting challenge to write a real-time PAL demodulator myself anyway.</p>
<p>I had been playing with analog video demodulation recently because of my HDMI Tempest project (<a class="external" href="https://www.youtube.com/watch?v=gJhRRTSDCa0">video</a>). So I had already written a C++ program that interprets a 10 <abbr title="millions of samples per second">Msps</abbr> digitised signal as greyscale values and sync pulses and show it live on the screen. Perhaps this could be used as a basis to build on. (It was not published, but apparently there is a similar project written in Java, called <a class="external" href="https://github.com/martinmarinov/TempestSDR">TempestSDR</a>)</p>
<p>Data transfer from the SDR is done using <span class="code">airspy_rx</span> from airspy-tools. This is piped to my program that reads the data into a buffer, 256 ksamples at a time.</p>
<p>Automatic gain control is an important part of demodulating an AM signal. I used liquid-dsp's <a class="external" href="http://liquidsdr.org/doc/agc/">AGC</a> by feeding it the maximum amplitude over every scanline period; this roughly corresponds to sync level. This is suboptimal, but it works in our high-SNR case. AM demodulation was done using <span class="code">std::abs()</span> on the complex-valued samples. The resulting real value had to be flipped (subtracted from 1), because TV is transmitted as "inverse AM" to save on the power bill. I then scaled the signal so that black level was close to 0, white level close to 1, and sync level below 0.</p>
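<p>Stripped of the liquid-dsp AGC, the demodulation step boils down to something like the sketch below. The level constant is a typical composite video value, not necessarily the exact one used, and 'gain' stands in for the per-scanline AGC output:</p>
<pre><code class="cc">#include <complex>
#include <cstddef>
#include <vector>

// Envelope detection, inversion and scaling so that black ~ 0, white ~ 1 and
// sync dips below 0.
void DemodulateAM(const std::vector<std::complex<float>>& iq, float gain,
                  std::vector<float>* video) {
  const float kBlackLevel = 0.3f;  // typical fraction of the sync-to-white range
  video->resize(iq.size());
  for (std::size_t i = 0; i < iq.size(); i++) {
    const float am = std::abs(iq[i]) * gain;   // AM demodulation
    const float inverted = 1.f - am;           // TV is "inverse AM"
    (*video)[i] = (inverted - kBlackLevel) / (1.f - kBlackLevel);
  }
}</code></pre>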
<p>I use SDL2 to display the video and OpenCV for pixel addressing, scaling, cropping, and YUV-RGB conversions. OpenCV is an overkill dependency inherited from the Tempest project and SDL2 could probably do all of those things by itself. This remains TODO.</p>
<h3>Removing the audio</h3>
<p>The captured AM carrier seems otherwise clean, but there's an interfering peak on the lower sideband side at about –4.5 MHz. I originally saw it in the demodulated signal and thought it would be related to colour, as it's very close to the PAL chroma subcarrier frequency of 4.43361875 MHz. But when it started changing frequency in triangle-wave shapes, I realized it's the audio FM carrier. Indeed, when it is FM demodulated, beautiful NES music can be heard.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcdduLn81WyrXsmRsfs6ZD2RNUFx__hikxP8jDtbhAksiWF_76b2dSKONlzn60JpI-KPMAFa5J5_n91vxddlj5__bGMZWL-iddItLklaZETGaBz1G-bUsoUgBojxrGoJ1khV-x7xoYGwB6/s1600/audiocarrier-annotated.jpg"><img alt="[Image: A spectrogram showing the AM carrier centered in zero, with the sidebands, chroma subcarriers and audio alias annotated.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcdduLn81WyrXsmRsfs6ZD2RNUFx__hikxP8jDtbhAksiWF_76b2dSKONlzn60JpI-KPMAFa5J5_n91vxddlj5__bGMZWL-iddItLklaZETGaBz1G-bUsoUgBojxrGoJ1khV-x7xoYGwB6/s500/audiocarrier-annotated.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcdduLn81WyrXsmRsfs6ZD2RNUFx__hikxP8jDtbhAksiWF_76b2dSKONlzn60JpI-KPMAFa5J5_n91vxddlj5__bGMZWL-iddItLklaZETGaBz1G-bUsoUgBojxrGoJ1khV-x7xoYGwB6/s1000/audiocarrier-annotated.jpg 2x" /></a></div>
<p>The audio carrier is actually outside this 10 MHz sampled bandwidth. But it's so close to the edge (and so powerful) that the Airspy's anti-alias filter cannot sufficiently attenuate it, and it becomes folded, i.e. <a class="external" href="https://en.wikipedia.org/wiki/Aliasing">aliased</a>, onto our signal. This caused visible banding in the greyscale image, and some synchronization problems.</p>
<p>I removed the audio using a narrow FIR <a class="external" href="https://github.com/jgaeddert/liquid-dsp/blob/master/examples/firfilt_cccf_notch_example.c">notch filter</a> from the liquid-dsp library. Now, the picture quality is very much acceptable. Minor artifacts are visible in narrow vertical lines because of a pixel rounding choice I made, but they can be ignored.</p>
<div class="kuva keskella fills-mobile"><img alt="[Image: Black-and-white screen capture of NES Tetris being played.]" border="0" data-original-height="239" data-original-width="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwFvn_csWHfM4RW-1z0z8Qs2OWLsuQTo8H-Ks8apRdv0lc7l4E2fqfQJMtvDXcQPXy8R6lnAS0RQiHstD2QD-QVpctakfFVlg3Vc0DxktbK9z2cx0RLvhxgvLQHk0slV4LGCuLwXMjzWg6/s320/tetris-bw.jpg"/></div>
<h3>Decoding colour</h3>
<p>PAL colour is a bit complicated. It was designed in the 1960s to be backwards compatible with black-and-white TV receivers. It uses the YUV colourspace, the Y or "luminance" channel being a black-and-white sum signal that already looks good by itself. Even if the whole composite signal is interpreted as Y, the artifacts caused by colour information are bearable. Y also has a lot more bandwidth, and hence resolution, than the U and V (chrominance) channels.</p>
<p>U and V are encoded in a chrominance subcarrier in a way that I still haven't quite grasped. The carrier is suppressed, but a burst of carrier is transmitted just before every scanline for reference (so-called colour burst).</p>
<p>Turns out that much of the chroma information can be recovered by band-pass filtering the chrominance signal, mixing it down to baseband using a PLL locked to the colour burst, rotating it by a magic number (<span class="code">chroma *= std::polar(1.f, deg2rad(170.f))</span>), and plotting the real and imaginary parts of this complex number as the U and V colour channels. This is similar to how NTSC colour is demodulated.</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjs50SBVMLHweIAz9fgk8VptoDsy0oZm0u2LXfH8k5YAUwq20Zh9RQCej5dzGzkjKUKc0YjXsNwsD4mrVtnAPhFIlHY0jW0YJtC2s1gKPm8GJWtwd0bVFqx4G5ZCUqzAnZHQx7dbvJZWIy4/s1600/dspchain.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjs50SBVMLHweIAz9fgk8VptoDsy0oZm0u2LXfH8k5YAUwq20Zh9RQCej5dzGzkjKUKc0YjXsNwsD4mrVtnAPhFIlHY0jW0YJtC2s1gKPm8GJWtwd0bVFqx4G5ZCUqzAnZHQx7dbvJZWIy4/s500/dspchain.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjs50SBVMLHweIAz9fgk8VptoDsy0oZm0u2LXfH8k5YAUwq20Zh9RQCej5dzGzkjKUKc0YjXsNwsD4mrVtnAPhFIlHY0jW0YJtC2s1gKPm8GJWtwd0bVFqx4G5ZCUqzAnZHQx7dbvJZWIy4/s1000/dspchain.png 2x"/></a></div>
<p>In PAL, every other scanline has its chrominance phase shifted (hence the name, Phase Alternating [by] Line). I couldn't get consistent results demodulating this, so I skipped the chrominance part of every other line and copied it from the line above. This doesn't even look too bad for my purposes. However, there seems to be a pre-echo in UV that's especially visible on a blue background (most of SMB1 sadly), and a faint stripe pattern on the Y channel, most probably crosstalk from the chroma subcarrier that I left intact for now.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFPMVeaLGO-ll-fmPdHg7hka4GDK4dxAMxmCysom703v-_8r-Vf5zDdxMXolFrjsYdD6BR6v5J_YZ2tr7-oO3nD1ri8u48wcuwdFE5qmcXl-uOF-H9NJjURQ0GUImH9wQatyx4A6d_IzP9/s1600/chroma-preecho.jpg"><img alt="[Image: The three chroma channels Y, U, and V shown separately as greyscale images, together with a coloured composite of Mario and two Goombas.]" border="0" data-original-height="442" data-original-width="418" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFPMVeaLGO-ll-fmPdHg7hka4GDK4dxAMxmCysom703v-_8r-Vf5zDdxMXolFrjsYdD6BR6v5J_YZ2tr7-oO3nD1ri8u48wcuwdFE5qmcXl-uOF-H9NJjURQ0GUImH9wQatyx4A6d_IzP9/s320/chroma-preecho.jpg"/></a></div>
<p>I used <span class="code">liquid_firfilt</span> to band-pass the chroma signal, and <span class="code">liquid_nco</span> to lock onto the colour burst and shift the chroma to baseband.</p>
<h3>Let's play Tetris!</h3>
<iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/3TeRvhMBqnE" width="560"></iframe>
<h3>Latency</h3>
<p>It's not my goal to use this system as a gaming display; I'm still planning to use the CRT. However, total buffer delays are quite small due to the 10 Msps sampling rate, so the latency from controller to screen is pretty good. The laptop can also easily decode and render at 50 fps, which is the native frame rate of the PAL NES. Tetris is playable up to level 12!</p>
<p>Using a slow-mo phone camera, I measured the time it takes for a button press to make Mario jump. The latency is similar to that of a NES emulator:</p>
<table class="pretty">
<tbody><tr><th>Method</th><th>Frames @240fps</th><th>Latency</th></tr>
<tr><td>RetroArch emulator</td><td>28</td><td>117 ms</td></tr>
<tr><td>PAL NES + Airspy SDR</td><td>26</td><td>108 ms</td></tr>
<tr><td>PAL NES + LCD TV</td><td>20</td><td>83 ms</td></tr>
</tbody></table>
<p>Maybe you now notice that the CRT is not listed here. That's because before I could make these measurements a funny sound was heard from inside the TV and a picture has never appeared since.</p>
<h3>Performance considerations</h3>
<p>A 2013 MacBook Pro is perhaps not the best choice for dealing with live video to begin with. But I want to be able to run the PAL decoder <i>and</i> a screencap / compositing / streaming client on the same laptop, so performance is even more crucial. </p>
<p>When colour is enabled, CPU usage on this quad-core laptop is 110% for <span class="code">palview</span> and 32% for <span class="code">airspy_rx</span>. The CPU temperature is somewhere around <abbr title="185 °F">85 °C</abbr>. Black-and-white decoding lowers <span class="code">palview</span> usage to 84% and CPU temps to <abbr title="176 °F">80 °C</abbr>. I don't think there's enough cycles left for a streaming client just yet. Some CPU headroom would be nice as well; a resync after dropped samples looks quite nasty, and I wouldn't want that to happen very often.</p>
<div class="kuva keskella"><img alt="[Image: htop screenshot show palview and airspy_rx on top, followed by some system processes.]" border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTTxyKTqsrUbgApxBxqtoaVRUvrj-uEXBkDHJFn1dvhGRLpmWsmtjbe3N8n9N_o0XnnALUpe5VsKpB-Qcc0uXpQQZsrGr-Lmb6nBzs-3EzM5CG6m7iFpwyCUqaQMy1rMXiaYvKXoerH6d9/s500/htop-screenshot.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTTxyKTqsrUbgApxBxqtoaVRUvrj-uEXBkDHJFn1dvhGRLpmWsmtjbe3N8n9N_o0XnnALUpe5VsKpB-Qcc0uXpQQZsrGr-Lmb6nBzs-3EzM5CG6m7iFpwyCUqaQMy1rMXiaYvKXoerH6d9/s1000/htop-screenshot.png 2x" /></div>
<p>Profiling reveals that the most CPU-intensive tasks are those related to FIR filtering. FIR filters are based on convolution, which is of high computational complexity, unless done in hardware. FFT convolution can also be faster, but only when the kernel is relatively long.</p>
<div class="saumaton kuva keskella"><img alt="[Image: Diagram shows the Audio notch FIR takes up 27 % and Chroma Bandpass FIR 12 % of CPU. Several smaller contributors mentioned." border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGYO21h4_lDhrj-qK9YjPE2NhIyflFDD6mk1tWyMTyPmIt7M57gM-fpVqgWmbheLkuHrO0k7skNPKghZ2OKswmo4UkI_YxDJ5Ezuxy5UDoHh5vvxxS2qGHkvdkzRNqyIyy9xwh3OhnMMnt/s450/palview-profiling.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGYO21h4_lDhrj-qK9YjPE2NhIyflFDD6mk1tWyMTyPmIt7M57gM-fpVqgWmbheLkuHrO0k7skNPKghZ2OKswmo4UkI_YxDJ5Ezuxy5UDoHh5vvxxS2qGHkvdkzRNqyIyy9xwh3OhnMMnt/s900/palview-profiling.png 2x" /></div>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhowkdKouKs3OcdI0VDXDLqh0mbSioxs_MHXk9akzb2ChqTYfN6betq04MUOsC54kP3P0EjniNOjM3CsI3R5fvBtXafW37BmaRmxyOecYDSZyjX3swAdrX3phcgztUlKEu41ciqmN0NAKTl/s400/dspchain-heat.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhowkdKouKs3OcdI0VDXDLqh0mbSioxs_MHXk9akzb2ChqTYfN6betq04MUOsC54kP3P0EjniNOjM3CsI3R5fvBtXafW37BmaRmxyOecYDSZyjX3swAdrX3phcgztUlKEu41ciqmN0NAKTl/s800/dspchain-heat.jpg 2x" /></div>
<p>I've thought of having another computer do the Airspy transfer, audio notch filtering, and AM demodulation, and then transmit this preprocessed signal to the laptop via Ethernet. But my other computers (Raspberry Pi 3B+ and a Core 2 Duo T7500 laptop) are not nearly as powerful as the MacBook.</p>
<p>Instead of a FIR bandpass filter, a so-called chrominance comb filter is often used to separate chrominance from luminance. This could be realized very efficiently as a linear-complexity delay line. This is a promising possibility, but so far my experiments have had mixed results.</p>
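<p>The basic one-line delay comb is sketched below, not as code from this project but as a generic illustration of the linear-complexity idea. Note that this simple sum/difference form assumes the chroma phase inverts from one line to the next; PAL's 283.75-cycles-per-line subcarrier makes the real thing a bit more involved, which may be part of why my experiments have been mixed:</p>
<pre><code class="cc">#include <cstddef>
#include <vector>

// One-line delay comb: adjacent lines are summed to keep luma and subtracted
// to keep chroma. No convolution needed.
void CombSplit(const std::vector<float>& composite, std::size_t samples_per_line,
               std::vector<float>* luma, std::vector<float>* chroma) {
  luma->assign(composite.size(), 0.f);
  chroma->assign(composite.size(), 0.f);
  for (std::size_t i = samples_per_line; i < composite.size(); i++) {
    const float cur  = composite[i];
    const float prev = composite[i - samples_per_line];
    (*luma)[i]   = 0.5f * (cur + prev);
    (*chroma)[i] = 0.5f * (cur - prev);
  }
}</code></pre>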
<p>There's no source code release for now (Why? <a href="https://www.windytan.com/p/about.html#sourcecode">FAQ</a>), but if you want some real-time coverage of this project, I did a multi-threaded tweetstorm:
<a class="external" href="https://twitter.com/windyoona/status/1160617331208458242">one</a>,
<a class="external" href="https://twitter.com/windyoona/status/1162455012225835009">two</a>,
<a class="external" href="https://twitter.com/windyoona/status/1165204552293015552">three</a>.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com25tag:blogger.com,1999:blog-5096278891763426276.post-14184937123531822252019-03-15T21:48:00.002+02:002022-10-23T21:43:21.559+03:00Beeps and melodies in two-way radio<p>Lately my listening activities have focused on two-way FM radio. I'm interested in automatic monitoring and visualization of multiple channels simultaneously, and classifying transmitters. There's a lot of in-band signaling to be decoded! This post shall demonstrate this diversity and also explain how my listening station works.</p>
<h3>Background: walkie-talkies are fun</h3>
<p>The frequency band I've recently been listening to the most is called PMR446. It's a European band of radio frequencies for short-distance UHF walkie-talkies. Unlike ham radio, it doesn't require licenses or technical competence – anyone with 50€ to spare can get a pair of walkie-talkies at the department store. It's very similar to FRS in the US. It's quite popular where I live.</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcIlePrhv4X5zhlMjS3-NXPCemkiHAFKtk0XImsucTQCD6s0RVHzqB2y0uBpP1JTCF6DSyrI074B8qr-PxEvvZ497giDNJo9Z28XZAxVG7AFhENmV93bcWDAqpeXmwfwj-2ZD5Q7Q05zs5/s400/puhelimet-tausta.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcIlePrhv4X5zhlMjS3-NXPCemkiHAFKtk0XImsucTQCD6s0RVHzqB2y0uBpP1JTCF6DSyrI074B8qr-PxEvvZ497giDNJo9Z28XZAxVG7AFhENmV93bcWDAqpeXmwfwj-2ZD5Q7Q05zs5/s800/puhelimet-tausta.jpg 2x" data-original-width="800" data-original-height="600" alt="[Image: Photo of three different walkie-talkies.]"/></div>
<p>The short-distance nature of PMR446 is what I find perhaps most fascinating: in normal conditions, everything you hear has been transmitted from a 2-kilometer (1.3-mile) radius. Transmitter power is limited to 500 mW and directional antennas are not allowed on the transmitter side. But I have a receive-only system and my only directional antenna is for 450 MHz, which is how I originally found these channels.</p>
<h3>Roger beep</h3>
<p>The <em>roger beep</em> is a short melody sent by many hand-held radios to indicate the end of transmission.</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/pmrsounds/a1.mp3"></audio></div>
<p>The end of transmission must be indicated, because two-way radio is 'half-duplex', which means only one person can transmit at a time. Some voice protocols solve the same problem by mandating the use of a specific word like 'over'; others rely on the short burst of static (squelch tail) that can be heard right after the carrier is lost. Roger beeps are especially common in consumer radios, but I've heard them in ham QSOs as well, especially if repeaters are involved.</p>
<h3>Other signaling on PMR</h3>
<p>PMR also differs from ham radio in that many of its users don't want to hear random people talking on the same frequency; indeed, many devices employ tones or digital codes designed to silence unwanted conversations, called CTCSS, DCS, or <strong>coded squelch</strong>. They are very low-frequency tones that can't usually be heard at all because of filtering. These won't prevent others from listening to you though; anyone can just disable coded squelch on their device and hear everyone else on the channel.</p>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL-59-NNh_0_JU-hGwxZLiTlGp3x3e5ROOEcgsLsGIEuJC-Kdt4s4L9-wXVWbIR-SxyLjP5wRWvmu2oy2FdHC2NOQo2d0Syb3MLRelVpYIObdvdngAwwV4iicgsgHjcokZxyF0HYRvfQGz/s401/baofeng-ste.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL-59-NNh_0_JU-hGwxZLiTlGp3x3e5ROOEcgsLsGIEuJC-Kdt4s4L9-wXVWbIR-SxyLjP5wRWvmu2oy2FdHC2NOQo2d0Syb3MLRelVpYIObdvdngAwwV4iicgsgHjcokZxyF0HYRvfQGz/s802/baofeng-ste.jpg 2x" width="401" height="245" data-original-width="802" data-original-height="490" /></div>
<p>Many devices also use a tone-based system for preventing the short burst of static, that classic walkie-talkie sound, from sounding whenever a transmission ends. Baofeng calls these <strong>squelch tail elimination</strong> tones, or STE for short. The practice is not standardized and I've seen several different sub-audible frequencies being used in the wild, namely 55, 62, and 260 Hz. (Edit: As correctly pointed out by several people, another way to do this is to reverse the phase of the CTCSS tone in the end, called a 'reverse burst'. Not all radios use it though; many opt to send a 55 Hz tone instead, even when they are using CTCSS.)</p>
<p>Some radios have a button called 'alarm' that sends a long, repeating melody resembling a 90s mobile phone ring tone. These melodies also vary from one radio to the other.</p>
<h3>My receiver</h3>
<p>I have a system in place to alert me whenever there's a strong enough signal matching an interesting set of parameters on any of the eight PMR channels. It's based on a Raspberry Pi 3B+ and an Airspy R2 SDR receiver. The program can play the live audio of all channels simultaneously, or one could be selected for listening. It also has an annotated waterfall view that shows traffic on the band during the last couple of hours:</p>
<div class="saumaton kuva keskella fills-mobile"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizOo72Y8CO7jCOSswHCLK4csjiubXHra8cqXLAnK9Ri12eMUsp8q43PSo14DN8Oi0TzCHvr4MnpLagOZoN-d-jwQdbf-fO1zgB_5qx-rZ0eMHktmhLLV-TOM0FHnymSyKpWLSqqxtjvcRv/s527/screenshot-pmrsquash.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizOo72Y8CO7jCOSswHCLK4csjiubXHra8cqXLAnK9Ri12eMUsp8q43PSo14DN8Oi0TzCHvr4MnpLagOZoN-d-jwQdbf-fO1zgB_5qx-rZ0eMHktmhLLV-TOM0FHnymSyKpWLSqqxtjvcRv/s1054/screenshot-pmrsquash.png 2x" data-original-width="1054" data-original-height="630" alt="[Image: A user interface with text-mode graphics, showing eight vertical lanes of timestamped information. The lanes are mostly empty, but there's an occasional colored bar with annotations like 'a1' or '62'.]"/></div>
<p>The computer is a headless Raspberry Pi with only SSH connectivity; that's why it's in text mode. Also, text-mode waterfall plots are cool!</p>
<p>The coloured bars indicate signal strength (colour) and the duty factor (pattern). The numbers around the bars are decoded squelch codes, STEs and roger beeps. Uncertain detections are greyed out. In this view we've detected roger beeps of type 'a1' and 'a2'; a somewhat rare 62 Hz STE tone; and a ring tone, or alarm (RNG).</p>
<p>Because squelch codes are designed to be read by electronic circuits and their frequencies and codewords are specified exactly, writing a digital decoder for them was somewhat straightforward. Roger beeps and ring tones, on the other hand, are only meant for the human listener and detecting them amongst the noise took a bit more trial-and-error.</p>
<h3>Melody detection algorithm</h3>
<p>The melody detection algorithm in my receiver is based on a fast Fourier transform (FFT). When loss of carrier is detected, the last moments of the audio are searched for tones thusly:</p>
<div class="saumaton kuva keskella fills-mobile"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjS9JMtU94yhhuthNknNLnoMeR6IcUMyIeAIRdUae-iduSILXG_yBLBXbiS4DQZ7oTTTawxORfleB7eBTlbpUpXIDljXSP8nCPhB_eDf2Z2TFrAFYJpOHgYVI_V1WsV9mLVEyh22MExkHtW/s511/fft.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjS9JMtU94yhhuthNknNLnoMeR6IcUMyIeAIRdUae-iduSILXG_yBLBXbiS4DQZ7oTTTawxORfleB7eBTlbpUpXIDljXSP8nCPhB_eDf2Z2TFrAFYJpOHgYVI_V1WsV9mLVEyh22MExkHtW/s1022/fft.jpg 2x" data-original-width="1022" data-original-height="682" alt="[Image: A diagram illustrating how an FFT is used to search for a melody. The FFT in the image is noisy and some parts of the melody can not be measured.]"/></div>
<ol>
<li>The audio buffer is divided up into overlapping 60-millisecond Hann-windowed slices.</li>
<li>Every slice is Fourier transformed and all peak frequencies (local maxima) are found. Their center frequencies are refined using Gaussian peak interpolation<a href="#GasiorGonzalez2004" class="ref" title="Improving FFT frequency measurement resolution by parabolic and Gaussian interpolation"> (Gasior & Gonzalez 2004)</a>; a small sketch of this step is shown after the list. We need this, because we're only going to allow ±15 Hz of frequency error.</li>
<li>The time series formed by the strongest maxima is compared to a list of pre-defined 'tone signatures'. Each candidate tone signature gets a score based on how many FFT slices match (<span style="color:#00ff00; font-weight:bold">+</span>) corresponding slices of the tone signature. Slices with too much frequency error subtract from the score (<span style="color:#ff0000; font-weight:bold">–</span>).</li>
<li>Most tone signatures have one or more 'quiet zones', the quietness of which further contributes to the score. This is usually placed after the tone, but some tones may also have a pause in the middle.</li>
<li>The algorithm allows second and third harmonics (with half the score), because some transmitters may distort the tones enough for these to momentarily overpower the fundamental frequency.</li>
<li>Every possible time shift (starting position) inside the 1.5-second audio buffer is searched.</li>
<li>The tone signature with the best score is returned, if this score exceeds a set threshold.</li>
</ol>
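<p>The Gaussian interpolation mentioned in step 2 is a one-liner once the peak bin and its two neighbours are known – a sketch using the formula from the paper, with illustrative names:</p>
<pre><code class="cc">#include <cmath>

// Gaussian peak interpolation (Gasior & Gonzalez 2004): refine an FFT peak at
// bin k using the log-magnitudes of the peak and its neighbours. Returns the
// peak position in fractional bins; the frequency is then bin / fft_size * fs.
double InterpolatedPeakBin(double mag_prev, double mag_peak, double mag_next,
                           int k) {
  const double a = std::log(mag_prev);
  const double b = std::log(mag_peak);
  const double c = std::log(mag_next);
  const double delta = 0.5 * (a - c) / (a - 2.0 * b + c);  // in (-0.5, 0.5)
  return k + delta;
}</code></pre>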
<p>This algorithm works quite well. It's not always able to detect the tones, especially if part of the melody is completely lost in noise, but it's good enough to be used for waterfall annotation. False positives are rare; most of them are detections of very short tone signatures that only consist of one or two beeps. My test dataset of 92 recorded transmissions yields only 5 false negatives and no false positives.</p>
<p>For example, this noisy recording:</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/xa_rng.mp3"></audio></div>
<p>was successfully recognized as having a ringtone (RNG), a roger beep of type a1, and CTCSS code XA:</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI4uDmgGguyNgRtLmp1T4V-_fGZrkPftBskmlkM1SBxZ7sU-hyAxZgCmjieiryMm_emDwtaaZWXG3mfSdJom1DRCemKstIFxg8JzW6tpCG09mr1m_a0XB0TW3za3Nk_kbe9ziHdSumIFXu/s204/Screenshot+2019-03-27+at+23.47.08.png" data-original-width="204" data-original-height="138" /></div>
<h3>Naming and classification</h3>
<p>Because I love classifying stuff I've had to come up with a system for naming these roger tones as well. My current system uses a lower-case letter for classifying the tone into a category, followed by a number that differentiates similar but slightly different tones. This is a work in progress, because every now and then a new kind of tone appears.</p>
<p>My goal would be to map the melodies to specific manufacturers. I've only managed to map a few. Can you recognise any of these devices?</p>
<table class="pretty">
<tr><th>Class</th><th>Identified model</th><th>Recording</th></tr>
<tr><td>a</td><td>Cobra AM845 (a1)<td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/a1.mp3"></audio></td></tr>
<tr><td>c</td><td>Motorola TLKR T40 (c1)<td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/c1.mp3"></audio></td></tr>
<tr><td>d</td><td>?</td><td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/d.mp3"></audio></td></tr>
<tr><td>e</td><td>?</td><td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/e2.mp3"></audio></td></tr>
<tr><td>h</td><td>Baofeng UV-5RC</td><td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/h.mp3"></audio></td></tr>
<tr><td>i</td><td>?</td><td><audio controls><source src="https://oona.windytan.com/blogfiles/pmrsounds/i2.mp3"></audio></td></tr>
</table>
<p>I didn't list them all here, but there are <a href="https://oona.windytan.com/blogfiles/pmrsounds/">even more samples</a>. I've added some alarm tones there as well, and <a href="https://oona.windytan.com/blogfiles/pmrsounds/tones.cc.txt">a list</a> of all the tone signatures that I currently know of. <em>(Why no full source code? <a href="/p/about.html#sourcecode">FAQ</a>)</em></p>
<p>In my rx log I also have an emoji classification system for CTCSS codes. This way I can recognize a familiar transmission faster. A few examples below (there are 38 different CTCSS codes in total):</p>
<div class="saumaton kuva keskella"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg27GQuq2fxG2UrLqto-asPmTgLQkmh_WUJT4Il5P5N3pJdFpJ7xr7vuwxLnyKe799iDx_XTstPhtQBkUsftie-hBdZ6kiR2KIaa3l0E9IBFW32-Yc3V7FU1FrcbsOZTHWTScTwZvYCHrMt/s418/ctcss-emoji.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg27GQuq2fxG2UrLqto-asPmTgLQkmh_WUJT4Il5P5N3pJdFpJ7xr7vuwxLnyKe799iDx_XTstPhtQBkUsftie-hBdZ6kiR2KIaa3l0E9IBFW32-Yc3V7FU1FrcbsOZTHWTScTwZvYCHrMt/s836/ctcss-emoji.png 2x" alt="[Image: Two-character codes grouped into categories and paired with emoji. Four categories, namely fruit, sound, mammals, and scary. The fruit category has codes beginning with an M, and emoji for different fruit, etc.]"/></div>
<h3>Future directions</h3>
<p>There are mainly just minor bugs in my project trello at the moment, like adding the aforementioned emoji. But as the RasPi is not very powerful the DSP chain could be made more efficient. Sometimes a block of samples gets dropped. Currently it uses a <a href="https://en.wikipedia.org/wiki/Undersampling" class="external">bandpass-sampled</a> filterbank to separate the channels, exploiting aliasing to avoid CPU-intensive frequency shifting altogether:</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh92A8NSuKanYK1kedyocPmmRWhf8OvuirNx56ph_hHyYMCVkXOAobtA1Y-c4Bc6YCLMlHOIJtakQnQiUyya3wTHx1uYxZ7X3ASLA_oXxxFpJz_L0OXIw-GcD9Y2YR-aGiiTWBvAMYJ3Sfo/s520/dpschain.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh92A8NSuKanYK1kedyocPmmRWhf8OvuirNx56ph_hHyYMCVkXOAobtA1Y-c4Bc6YCLMlHOIJtakQnQiUyya3wTHx1uYxZ7X3ASLA_oXxxFpJz_L0OXIw-GcD9Y2YR-aGiiTWBvAMYJ3Sfo/s1040/dpschain.png 2x" data-original-width="1040" data-original-height="586" /></div>
<p>This is quite fast. But the 1:20 decimation from the Airspy IQ data is done with SoX's 1024-point FIR filter and could possibly be done with fewer coefficients. Also, the RasPi has four cores, so half of the channels could be demodulated in a second thread. Currently all concurrency is thanks to SoX and pmrsquash being different processes.</p>
<h3>Related posts</h3>
<ul><li><a href="/2016/10/ctcss-fingerprinting-method-for.html">CTCSS fingerprinting: a method for transmitter identification</a></li></ul>
<h3>References</h3>
<ul class="references">
<li id="GasiorGonzalez2004">Gasior, M., Gonzalez, J.L. (2004): <a href="https://cds.cern.ch/record/720344/files/ab-note-2004-021.pdf" class="external">Improving FFT frequency measurement resolution by parabolic and Gaussian interpolation</a>.</li>
</ul>
Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com11tag:blogger.com,1999:blog-5096278891763426276.post-58099773398252715772017-12-30T14:15:00.001+02:002023-08-07T16:41:41.735+03:00Animated line drawings with OpenCV<p>OpenCV is a pretty versatile C++ computer vision library. Because I use it every day it has also become my go-to tool for creating simple animations at pixel level, for fun, and saving them as video files. This is not one of its core functions but happens to be possible using its GUI drawing tools.</p>
<p>Below we'll take a look at some video art I wrote for a music project. It goes a bit further than just line drawings but the rest is pretty much just flavouring. As you'll see, creating images in OpenCV has a lot in common with how you would work with layers and filters in an image editor like GIMP or Photoshop.</p>
<h3>Setting it up</h3>
<p>It doesn't take a lot of boilerplate to initialize an OpenCV project. Here's my minimal CMakeLists.txt:</p>
<pre><code class="cmake">cmake_minimum_required (VERSION 2.8)
project (marmalade)
find_package (OpenCV REQUIRED)
add_executable (marmalade marmalade.cc)
target_link_libraries (marmalade ${OpenCV_LIBS})</code></pre>
<p>I also like to set compiler flags to enforce the C++11 standard, but this is not necessary.</p>
<p>In the main <span class="code">.cc</span> file I have:</p>
<pre><code class="cc">#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"</code></pre>
<p>Now you can build the project by just typing <span class="code">cmake . && make</span> in the terminal.</p>
<h3>Basic shapes</h3>
<p>First, we'll need an empty canvas. It will be a matrix (<span class="code">cv::Mat</span>) with three unsigned char channels for RGB at Full HD resolution:</p>
<pre><code>const cv::Size video_size(1920, 1080);
cv::Mat mat_frame = cv::Mat::zeros(video_size, CV_8UC3);</code></pre>
<p>This will also initialize everything to zero, i.e. black.</p>
<p>Now we can draw our graphics!</p>
<p>I had an initial idea of an endless cascade of concentric rings each rotating at a different speed. There might be color and brightness variations as well but otherwise it would stay static the whole time. You can't see a circle's rotation around its center, so we'll add some features to them as well, maybe some kind of bars or spokes.</p>
<p>A simplified render method for a ring would look like this:</p>
<pre><code class="cc">void Ring::<span class="hljs-function">RenderTo</span>(cv::<span class="hljs-builtin">Mat</span>& mat_output) const {
cv::<span class="hljs-builtin">circle</span>(mat_output, 8 * center_, 8 * radius_, color_, 1, <span class="hljs-symbol">CV_AA</span>, 3);
for (const Bar& bar : bars()) {
cv::<span class="hljs-builtin">line</span>(mat_output, 8 * (center_ + bar.start), 8 * (center_ + bar.end),
color_, 1, <span class="hljs-symbol">CV_AA</span>, 3);
}
}</code></pre>
<p>Drawing antialiased graphics at subpixel coordinates can make for some confusing OpenCV code. Here, all coordinates are multiplied by the magic number 8 and the drawing functions are instructed to do a bit shift of 3 bits (2^3 == 8). These three bits are used for the fractional part of the subpixel position.</p>
<p>The coordinates of the bars are generated for each frame based on the ring's current rotation angle.</p>
<p>Here are some rings at different phases of rotation. A bug leaves the innermost circle with no spokes, but it kind of looks better that way.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicIPnEytmyXedVbulIXkF378vzJR4EYLKX3ygpqWbJJcn-6p_AGQUB3jt2R4vKotN_0dxB5edNDpQAe4Rc1w6SKiCOyLRbKJR4USqIxof6rgvea29M71-q_0ymcPiuUHtZBZGmOSLLlRZp/s1600/spokes.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicIPnEytmyXedVbulIXkF378vzJR4EYLKX3ygpqWbJJcn-6p_AGQUB3jt2R4vKotN_0dxB5edNDpQAe4Rc1w6SKiCOyLRbKJR4USqIxof6rgvea29M71-q_0ymcPiuUHtZBZGmOSLLlRZp/s450/spokes.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicIPnEytmyXedVbulIXkF378vzJR4EYLKX3ygpqWbJJcn-6p_AGQUB3jt2R4vKotN_0dxB5edNDpQAe4Rc1w6SKiCOyLRbKJR4USqIxof6rgvea29M71-q_0ymcPiuUHtZBZGmOSLLlRZp/s900/spokes.png 2x" data-original-width="900" data-original-height="506" alt="[Image: White concentric circles on a black background, with evenly separated lines connecting them.]"/></a></div>
<h3>Eye candy: Glow effect</h3>
<p>I wanted a subtle vector display look to the graphics, even though I wasn't aiming for any sort of realism with it. So the brightest parts of the image would have to glow a little, or spread out in space. This can be done using Gaussian blur.</p>
<p>Gaussian blur requires convolution, which is very CPU-intensive. I think most of the rendering time was spent calculating blur convolution. It could be sped up using threads (<span class="code">cv::parallel_for_</span>) or the GPU (<span class="code">cv::cuda</span> routines) but there was no real-time requirement in this hobby project.</p>
<p>There are a couple of ways to only apply the blur to the brightest pixels. We could blur a copy of the image masked with its thresholded version, for example. But I like to use look-up tables (LUT). This is similar to the curves tool in Photoshop. A look-up table is just a 256-by-1 RGB matrix that maps an 8-bit index to a colour. In this look-up table I just have a linear ramp where everything under 127 maps to black.</p>
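<p>For reference, the look-up table itself could be built like this (a sketch of the <span class="code">GlowLUT()</span> helper used below; only the "under 127 maps to black" part is as described, the exact shape of the ramp above that is an assumption):</p>
<pre><code class="cc">cv::Mat GlowLUT() {
  // 256-entry table: indices below 127 map to black, the rest ramp
  // linearly up to full brightness (assumed slope).
  cv::Mat lut(1, 256, CV_8UC3);
  for (int i = 0; i < 256; i++) {
    uchar value = (i < 127) ? 0 : cv::saturate_cast<uchar>((i - 127) * 2);
    lut.at<cv::Vec3b>(0, i) = cv::Vec3b(value, value, value);
  }
  return lut;
}</code></pre>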
<pre><code class="cc">cv::Mat mat_lut = GlowLUT();
cv::Mat mat_glow;
cv::LUT(mat_frame, mat_lut, mat_glow);</code></pre>
<p>Now when blurring, if we add the original image on top of the blurred version, its sharpness is preserved:</p>
<pre><code class="cc">cv::GaussianBlur(mat_glow, mat_glow, cv::Size(0,0), 3.0);
mat_frame += 2 * mat_glow;</code></pre>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNAONoJIEObnIVGE-AbyecROvA4VkTER3ic2TDQ5nHXI5GYfvHEejAruFS33WyLTCruuN0lO2Mo4AWXOQL8iQYZLhpvQ9jExSwQmRz2diVCoU3jPWjjB7VqVJwD2Z2oRXRSzM2JHdaAAx7/s450/glow-zoomed.png" data-original-width="450" data-original-height="252" alt="[Image: A zoomed view of a circle, showing the glow effect.]"/></div>
<p>The effect works unevenly on antialiased lines which adds a nice pearl-stringy look.</p>
<h3>Eye candy: Tinted glass and grid lines</h3>
<p>I created a vignetted and dirty green-yellow tinted look by multiplying the image per-pixel by an overlay made in GIMP. This has the same effect as having a "Multiply" layer mode in an image editor. Perhaps I was thinking of an old glass display, or Vectrex overlays. The overlay also has black grid lines that will appear black in the result. Multiplication doesn't change the color of black areas in the original, but I also added a copy of the overlay at 10% brightness to make it dimly visible in the background.</p>
<pre><code class="cc">cv::Mat mat_overlay = cv::imread("overlay.png");
cv::multiply(mat_frame, mat_overlay, mat_frame, 1.f/255);
mat_frame += mat_overlay * 0.1f;</code></pre>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnljOUCNrwy85yDcP0YzmqcEFu7f6W0uqegcsrFkhj4jTG9oMivCOJRrCAWgMyjYAAI0TKTXQZ-PwRMmlhrR9BpeBiFh9yFaZCUxI9MXxrIGbfCA78zvSrbs0TcLcmJwOXnCpUrBLJqWd2/s450/overlay-effect.png" alt="[Image: A zoomed view of a circle, showing the color overlay effect.]"/></div>
<h3>Eye candy: Flicker</h3>
<p>Some objects flicker slightly for an artistic effect. This can be headache-inducing if overdone, so I tried to use it in moderation. The rings have a per-frame probability for a decrease in brightness, which I think looks good at 60 fps.</p>
<pre><code class="cc">if (randf(0.f, 1.f) < .0001f)
color *= .5f;
</code></pre>
<p>The spokes will also sometimes blink upon encountering each other, and the whole ring flickers a bit when it first becomes visible.</p>
<h3>Title text</h3>
<p>An LCD matrix font was used for the title text. This was just a PNG image of 128 characters that was spliced up and rearranged. This can be done in OpenCV by using submatrices and rectangle ROIs:</p>
<pre><code class="cc">cv::Mat mat_font = cv::imread("lcd_font.png");
const cv::Size letter_size(24, 32);
const std::string text("finally, the end of the "
"marmalade forest!");
int cursor_x = 0;
for (char code : text_) {
int mx = code % 32;
int my = code / 32;
cv::Rect font_roi(cv::Point(mx * letter_size.width,
my * letter_size.height),
letter_size);
cv::Mat mat_letter = mat_font(font_roi);
cv::Rect target_roi(text_origin_.x + cursor_x,
text_origin_.y,
mat_letter.cols, mat_letter.rows)
mat_letter.copyTo(mat_output(target_roi));
cursor_x += letter_size.width;
}</code></pre>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirmsMDeVOsnMtHYkFwICo5_uXRRFuUEnk_Cyewpe2mqHEJkIWdWt3I3rgMvDdcy0SAM5FNFoYKxekllK3zP53uhHpMLmk_C69yUSLlc_YoR0KlOAy0eaGJb7RWFvXoPS5mpO9-zAt2IdHA/s450/finally.png" data-original-width="450" data-original-height="192" alt="[Image: A zoomed view of the text 'finally' with a glow and color overlay effect.]"/></div>
<h3>Encoding the video</h3>
<p>Now we can save the frames as a video file. OpenCV has a <span class="code">VideoWriter</span> class for just this purpose. But I like to do this a bit differently. I encoded the frame images individually as BMP and just concatenated them one after the other to stdout:</p>
<pre><code>std::vector<uchar> outbuf;
cv::imencode(".bmp", mat_frame, outbuf);
fwrite(outbuf.data(), sizeof(uchar), outbuf.size(), stdout);</code></pre>
<p>I then ran this program from a shell script that piped the output to <span class="code">ffmpeg</span> for encoding. This way I could also combine it with the soundtrack in a single run.</p>
<pre><code class="sh">make && \
./marmalade -p | \
ffmpeg -y -i $AUDIOFILE -framerate $FPS -f image2pipe \
-vcodec bmp -i - -s:v $VIDEOSIZE -c:v libx264 \
-profile:v high -b:a 192k -crf 23 \
-pix_fmt yuv420p -r $FPS -shortest -strict -2 \
video.mp4 && \
open video.mp4</code></pre>
<h3>Result</h3>
<p>The 1080p/60 version can be viewed by clicking on the gear wheel menu.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/a78ec5YdHbE?rel=0" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com3tag:blogger.com,1999:blog-5096278891763426276.post-32634868083182252072017-11-25T16:03:00.003+02:002022-10-24T11:03:02.287+03:00In pursuit of Otama's tone<p>It would be fun to use the <em>Otamatone</em> in a musical piece. But for someone used to keyboard instruments it's not so easy to play cleanly. It has a touch-sensitive (resistive) slider that spans roughly two octaves in just 14 centimeters, which makes it very sensitive to finger placement. And in any case, I'd just like to have a programmable virtual instrument that sounds like the Otamatone.<p>
<p>What options do we have, as hackers? Of course the slider could be replaced with a MIDI interface, so that we could use a piano keyboard to hit the correct frequencies. But what if we could synthesize a similar sound all in software?</p>
<h3>Sampling via microphone</h3>
<p>We'll have to take a look at the waveform first. The Otamatone has a piercing electronic-sounding tone to it. One is inclined to think the waveform is something quite simple, perhaps a sawtooth wave with some harmonic coloring. Such a primitive signal would be easy to synthesize.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheF9HQ5A_3NXK-RZkItzL19UQsYH19clTdD9zIwcqeYbYWLDmuYtQcu08wrquO59vbbdyedkzN7Tx_ZEdHp9Rw2s8b-VlmZNVvOFn6OBEctWfa1qXjbj0qKJpcyWbZNv5eW_JdosnzZ-jA/s1600/popfilter.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheF9HQ5A_3NXK-RZkItzL19UQsYH19clTdD9zIwcqeYbYWLDmuYtQcu08wrquO59vbbdyedkzN7Tx_ZEdHp9Rw2s8b-VlmZNVvOFn6OBEctWfa1qXjbj0qKJpcyWbZNv5eW_JdosnzZ-jA/s500/popfilter.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheF9HQ5A_3NXK-RZkItzL19UQsYH19clTdD9zIwcqeYbYWLDmuYtQcu08wrquO59vbbdyedkzN7Tx_ZEdHp9Rw2s8b-VlmZNVvOFn6OBEctWfa1qXjbj0qKJpcyWbZNv5eW_JdosnzZ-jA/s1000/popfilter.jpg 2x" alt="[Image: A pink Otamatone in front of a microphone. Next to it a screenshot of Audacity with a periodic but complex waveform in it.]" data-original-width="1600" data-original-height="801" /></a></div>
<p>A friend lent me her Otamatone for recording purposes. Turns out the wave is nothing that simple. It's not a sawtooth wave, nor a square wave, no matter how the microphone is placed. But it sounds like one! Why could that be?</p>
<p>I suspect this is because the combination of speaker and air interface filters out the lowest harmonics (and parts of the others as well) of square waves. But the human ear still recognizes the residual features of a more primitive kind of waveform.</p>
<h3>We have to get to the source!</h3>
<p>Sampling the input voltage to the Otamatone's speaker could reveal the original signal. Also, by recording both the speaker input and the audio recorded via microphone, we could perhaps devise a software filter to simulate the speaker and head resonance. Then our synthesizer would simplify into a simple generator and filter. But this would require opening up the instrument and soldering a couple of leads in, to make a Line Out connector. I'm not doing this to my friend's Otamatone, so I bought one of my own. I named it <em>TÄMÄ</em>.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha-l4G4z3zvNaB_4UpCaVp3G-JkeZgIYI_7q0dfXYQwygBWYMBfXmQsDt1Qvpw0hkwMNTXLjaJE2xz9U7dv-S1yTNSVL8TSH9PhVSNMhZlCCN0QH4aeilTxxOl96IHU22LsseUR7nJ0Yad/s1600/otamalineout.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha-l4G4z3zvNaB_4UpCaVp3G-JkeZgIYI_7q0dfXYQwygBWYMBfXmQsDt1Qvpw0hkwMNTXLjaJE2xz9U7dv-S1yTNSVL8TSH9PhVSNMhZlCCN0QH4aeilTxxOl96IHU22LsseUR7nJ0Yad/s420/otamalineout.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha-l4G4z3zvNaB_4UpCaVp3G-JkeZgIYI_7q0dfXYQwygBWYMBfXmQsDt1Qvpw0hkwMNTXLjaJE2xz9U7dv-S1yTNSVL8TSH9PhVSNMhZlCCN0QH4aeilTxxOl96IHU22LsseUR7nJ0Yad/s840/otamalineout.jpg 2x" data-original-width="1600" data-original-height="1200" alt="[Image: A Black Otamatone with a cable coming out of its mouth into a USB sound card. A waveform with more binary nature is displayed on a screen.]"/></a></div>
<p>I soldered the left channel and ground to the same pads the speaker is connected to. I had no idea about the voltage range in advance, but fortunately it just happens to fit line level and not destroy my sound card. As you can see in the background, we've recorded a signal that seems to be a square wave with a low duty cycle.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpR7HiGi6gtLxjIOXF1q109KtIkKXlX7cN7OAuUa27KLATtKaz0qJ0PxVN3lt3Uf7WvUWo2I3SMpPUV056zLbiDCDhlupMJot1bjHwxWs40sc7jc-x32deNcM0VZariy8N1lW6pb4AK5kk/s1600/Screen+Shot+2017-11-24+at+17.24.36.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpR7HiGi6gtLxjIOXF1q109KtIkKXlX7cN7OAuUa27KLATtKaz0qJ0PxVN3lt3Uf7WvUWo2I3SMpPUV056zLbiDCDhlupMJot1bjHwxWs40sc7jc-x32deNcM0VZariy8N1lW6pb4AK5kk/s450/Screen+Shot+2017-11-24+at+17.24.36.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpR7HiGi6gtLxjIOXF1q109KtIkKXlX7cN7OAuUa27KLATtKaz0qJ0PxVN3lt3Uf7WvUWo2I3SMpPUV056zLbiDCDhlupMJot1bjHwxWs40sc7jc-x32deNcM0VZariy8N1lW6pb4AK5kk/s900/Screen+Shot+2017-11-24+at+17.24.36.png 2x" alt="[Image: Oscillogram of a square wave.]" data-original-width="1600" data-original-height="494" /></a></div>
<p>This square wave seems to be superimposed with a much quieter sinusoidal "ring" at 584 Hz that gradually fades out in 30 milliseconds.</p>
<p>Next we need to map out the effect the finger position on the slider has on this signal. It seems to not only change the frequency but the duty cycle as well. This happens a bit differently depending on which one of the three octave settings (LO, MID, or HI) is selected.</p>
<p>The Otamatone has a huge musical range of over 6 octaves:</p>
<div class="saumaton kuva keskella fills-mobile"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV30yNuAN8hCumOCKRSM6a1eySsZbX25hZMl-irAbnntWoNPyfCxa1mB1RrRYp4ekcFdxgKsDBmTV4n1k1bOxE5RP3yMlJQNuerZxHXdZ22VBA_wsWzEMfEoKeBcDv5EWW77ZDqTlDn7BM/s400/otamatone-range.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV30yNuAN8hCumOCKRSM6a1eySsZbX25hZMl-irAbnntWoNPyfCxa1mB1RrRYp4ekcFdxgKsDBmTV4n1k1bOxE5RP3yMlJQNuerZxHXdZ22VBA_wsWzEMfEoKeBcDv5EWW77ZDqTlDn7BM/s800/otamatone-range.png 2x" data-original-width="1344" data-original-height="486" alt="[Image: Musical notation showing a range from A1 to B7.]" /></div>
<p>In frequency terms this means roughly 55 to 3800 Hz.</p>
<p>The duty cycle changes according to where we are on the slider: from 33 % in the lowest notes to 5 % in the highest ones, on every octave setting. The frequency of the ring doesn't change, it's always at around 580 Hz, but it doesn't seem to appear at all on the HI setting.</p>
<p>So I had my Perl-based software synth generate a square wave whose duty cycle and frequency change according to the given MIDI notes.</p>
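<p>The generator itself boils down to very little code. Here's the idea as a sketch (the actual synth is written in Perl; this C++ fragment is only illustrative, and the sample rate is an assumption):</p>
<pre><code class="cc">#include <cmath>
#include <vector>

// MIDI note number to frequency: A4 (note 69) = 440 Hz.
float NoteToHz(int note) {
  return 440.f * std::pow(2.f, (note - 69) / 12.f);
}

// Square wave with the given frequency (Hz) and duty cycle (0..1).
std::vector<float> SquareWave(float freq_hz, float duty, float seconds,
                              float samplerate = 44100.f) {
  std::vector<float> samples(static_cast<size_t>(seconds * samplerate));
  float phase = 0.f;
  for (float& s : samples) {
    s = (phase < duty) ? 1.f : -1.f;
    phase += freq_hz / samplerate;
    if (phase >= 1.f) phase -= 1.f;
  }
  return samples;
}</code></pre>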
<h3>FIR filter 1: not so good</h3>
<p>Raw audio generated this way doesn't sound right; it needs to be filtered to simulate the effects of the little speaker and other parts.</p>
<p>Ideally, I'd like to simulate the speaker and head resonances as an impulse response, by feeding well-known impulses into the speaker. The generated square wave could then be convolved with this response. But I thought a simpler way would be to create a custom FIR frequency response in REAPER, by visually comparing the speaker input and microphone capture spectra. When their spectra are laid on top of each other, we can read the required frequency response as the difference between harmonic powers, using the cursor in baudline. No problem, it's just 70 harmonics until we're outside hearing range!</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtmPUtBeb_Dr02WSuLA8rcxRlnvZa5aOl2ajaE2kD8i4DtsKFPihJbEkH4f3rpfEsDldTuqv_bqfHEqgyX8yi_xTANmuzXneUIKPYn6EgLJiIXqzeN2RiV5WdgzNMtQsHeXxzuz1rGu2Qz/s1600/harmopowers.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtmPUtBeb_Dr02WSuLA8rcxRlnvZa5aOl2ajaE2kD8i4DtsKFPihJbEkH4f3rpfEsDldTuqv_bqfHEqgyX8yi_xTANmuzXneUIKPYn6EgLJiIXqzeN2RiV5WdgzNMtQsHeXxzuz1rGu2Qz/s480/harmopowers.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtmPUtBeb_Dr02WSuLA8rcxRlnvZa5aOl2ajaE2kD8i4DtsKFPihJbEkH4f3rpfEsDldTuqv_bqfHEqgyX8yi_xTANmuzXneUIKPYn6EgLJiIXqzeN2RiV5WdgzNMtQsHeXxzuz1rGu2Qz/s960/harmopowers.png 2x" data-original-width="1600" data-original-height="1039" alt="[Image: Screenshot of Baudline showing lots of frequency spikes, and next to it a CSV list of dozens of frequencies and power readings in the Vim editor.]"/></a></div>
<p>I then subtracted one spectrum from another and manually created a ReaFir filter based on the extrema of the resulting graph.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinekeR8qdGsDJ5sWXxH_9G9jPzvQonFuC2tCSUu5GFzDFCfUW2T93QEIAYoTQ9gWDq-ps4YT6tw_0cK8ePiRheRynLOTBQ511rDIt62Zh5Jt2AJKW0wxbcw168nQqNWTkZzvRoPdAOYbku/s1600/Screen+Shot+2017-11-19+at+10.21.26.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinekeR8qdGsDJ5sWXxH_9G9jPzvQonFuC2tCSUu5GFzDFCfUW2T93QEIAYoTQ9gWDq-ps4YT6tw_0cK8ePiRheRynLOTBQ511rDIt62Zh5Jt2AJKW0wxbcw168nQqNWTkZzvRoPdAOYbku/s480/Screen+Shot+2017-11-19+at+10.21.26.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinekeR8qdGsDJ5sWXxH_9G9jPzvQonFuC2tCSUu5GFzDFCfUW2T93QEIAYoTQ9gWDq-ps4YT6tw_0cK8ePiRheRynLOTBQ511rDIt62Zh5Jt2AJKW0wxbcw168nQqNWTkZzvRoPdAOYbku/s960/Screen+Shot+2017-11-19+at+10.21.26.png 2x" data-original-width="1600" data-original-height="636" alt="[Image: Screenshot of REAPER's FIR filter editor, showing a frequency response made out of nodes and lines interpolated between them.]"/></a></div>
<p>Because the Otamatone's mouth can be twisted to make slightly different vowels I recorded two spectra, one with the mouth fully closed and the other one as open as possible.</p>
<p>But this method didn't quite give the sound the piercing nasalness I was hoping for.</p>
<h3>FIR filter 2: better</h3>
<p>After all that work I realized the line connection works in both directions! I can just feed any signal and the Otamatone will sound it via the speaker. So I generated a square wave in Audacity, set its frequency to 35 Hz to accommodate 30 milliseconds of response, played it via one sound card and recorded via another one:</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYJqCo1whmphJf0z81cYQFYl1fEZUGrMwVWOX0Hj9tmWNOEvgSAflGjXj7KlwFm4o7glpza3Q5EWV_CIas05s6WRXcth5O8Qjq8Z0s6iRLghEmTA_UW_QaTFiidiYknipr48taEvLcroZg/s1600/Screen+Shot+2017-11-22+at+8.19.33.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYJqCo1whmphJf0z81cYQFYl1fEZUGrMwVWOX0Hj9tmWNOEvgSAflGjXj7KlwFm4o7glpza3Q5EWV_CIas05s6WRXcth5O8Qjq8Z0s6iRLghEmTA_UW_QaTFiidiYknipr48taEvLcroZg/s450/Screen+Shot+2017-11-22+at+8.19.33.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYJqCo1whmphJf0z81cYQFYl1fEZUGrMwVWOX0Hj9tmWNOEvgSAflGjXj7KlwFm4o7glpza3Q5EWV_CIas05s6WRXcth5O8Qjq8Z0s6iRLghEmTA_UW_QaTFiidiYknipr48taEvLcroZg/s900/Screen+Shot+2017-11-22+at+8.19.33.png 2x" data-original-width="1510" data-original-height="602" alt="[Image: Two waveforms, the top one of which is a square wave and the bottom one has a slowly decaying signal starting at every square transition.]"/></a></div>
<p>The waveform below is called the step response. One of the repetitions can readily be used as a FIR convolution kernel. Strictly, to get an <em>impulse</em> response would require us to sound a unit impulse, i.e. just a single sample at maximum amplitude, not a square wave. But I'm not redoing that since recording this was hard enough already. For instance, I had to turn off the fridge to minimize background noise. I forgot to turn it back on, and now I have a box of melted ice cream and a freezer that smells like salmon. The step response gives pretty good results.</p>
<p>One of my favorite audio tools, <span class="code">sox</span>, can do FFT convolution with an impulse response. You'll have to save the impulse response as a whitespace-separated list of plaintext sample values, and then run <span class="code">sox original.wav convolved.wav fir response.csv</span>.</p>
<p>Or one could use a VST plugin like FogConvolver:</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix0nl71C6FiN1Gal1S3F8aETwyyR4KlLG3AmKv9ULa2XqbK63uBWSvNlYSxeGoCyRPfN4Es_9MEw3qNCEdMr8RxEHuSwZfqqjrf_L4Tt8cQYMPm46JzLeScF3klY3U0ssLoKgiArTfrbia/s1600/screenshot-fogbank.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix0nl71C6FiN1Gal1S3F8aETwyyR4KlLG3AmKv9ULa2XqbK63uBWSvNlYSxeGoCyRPfN4Es_9MEw3qNCEdMr8RxEHuSwZfqqjrf_L4Tt8cQYMPm46JzLeScF3klY3U0ssLoKgiArTfrbia/s450/screenshot-fogbank.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix0nl71C6FiN1Gal1S3F8aETwyyR4KlLG3AmKv9ULa2XqbK63uBWSvNlYSxeGoCyRPfN4Es_9MEw3qNCEdMr8RxEHuSwZfqqjrf_L4Tt8cQYMPm46JzLeScF3klY3U0ssLoKgiArTfrbia/s900/screenshot-fogbank.jpg 2x" data-original-width="1594" data-original-height="1286" alt="[Image: A screenshot of Fog Convolver.]"/></a></div>
<h3>A little organic touch</h3>
<p>There's more to an instrument's sound than its frequency spectrum. The way the note begins and ends, the so-called attack and release, are very important cues for the listener.</p>
<p>The width of a player's finger on the Otamatone causes the pressure to be distributed unevenly at first, resulting in a slight glide in frequency. This also happens at note-off. The exact amount of glide in Hertz depends on the octave; by experimentation I settled on a slide-up of 5 % of the target frequency over 0.1 seconds.</p>
<p>It is also very difficult to hit the correct note, so we could add some kind of random tuning error. But it turns out this would be too much; I want the music to at least be in tune.</p>
<p>Glides (glissando) are possible with the virtual instrument by playing a note before releasing the previous one. This glissando also happens in 100 milliseconds. I think it sounds pretty good when used in moderation.</p>
<p>I read somewhere (Wikipedia?) that vibrato is also possible with the Otamatone. I didn't write a vibrato feature in the code itself, but it can be added using a VST plugin in REAPER (I use MVibrato from MAudioPlugins). I also added a slight flanger with inter-channel phase difference in the sample below, to make the sound just a little bit easier on the ears (but not too much).</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/lucid-otama-05.mp3">(HTML5 audio: Synthesized Otamatone sample.)</audio></div>
<p>Sometimes the Otamatone makes a short popping sound, perhaps when finger pressure is not firm enough. I added a few of these randomly after note-off.</p>
<h3>Working with MIDI</h3>
<p>We're getting on a side track, but anyway. Working with MIDI used to be straightforward on the Mac. But GarageBand, the tool I currently use to write music, amazingly doesn't have a MIDI export function. However, you can "File -> Add Region To Loop Library", then find the AIFF file in the loop library folder, and use a tool called GB2MIDI to extract MIDI data from it.</p>
<p>I used mididump from <a href="https://github.com/vishnubob/python-midi" class="external">python-midi</a> to read MIDI files.</p>
<h3>Tyna Wind - lucid future vector</h3>
<p>Here's <em>TÄMÄ</em>'s beautiful synthesized voice singing us a song.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/X3k7HD28RbQ?rel=0" frameborder="0" allowfullscreen></iframe>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com5tag:blogger.com,1999:blog-5096278891763426276.post-90585134347200417282017-09-12T23:31:00.003+03:002022-10-23T21:45:01.156+03:00Descrambling split-band voice inversion with deinvert<p>Voice inversion is a primitive method of rendering speech unintelligible to prevent eavesdropping of radio or telephone calls. I wrote about some simple ways to reverse it in <a href="http://www.windytan.com/2013/05/descrambling-voice-inversion.html" title="absorptions: Descrambling the voice inversion scrambler">a previous post</a>. I've since written a software tool, deinvert (<a href="https://github.com/windytan/deinvert" title="github: windytan/deinvert" class="external">on GitHub</a>), that does all this for us. It can also descramble a slightly more advanced scrambling method called split-band inversion. Let's see how that happens behind the scenes.</p>
<h3>Simple voice inversion</h3>
<p>Voice inversion works by inverting the audio spectrum at a set maximum frequency called the inversion carrier. Frequencies near this carrier will thus become frequencies near zero Hz, and vice versa. The resulting audio is unintelligible, though familiar sentences can easily be recognized.</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/inverted-llama.mp3">(HTML5 audio: Inverted speech.)</audio></div>
<p><em>Deinvert</em> comes with 8 preset carrier frequencies that can be activated with the <span class="code">-p</span> option. These correspond to a list of carrier frequencies I found in an actual scrambler's manual, dubbed "the most commonly used inversion carriers".</p>
<p>The algorithm behind <em>deinvert</em> can be divided into three phases: 1) pre-filtering, 2) mixing, and 3) post-filtering. Mixing means multiplying the signal by an oscillation at the selected carrier frequency. This produces two <em>sidebands</em>, or mirrored copies of the signal, with the lower one frequency-inverted. Pre-filtering is necessary to prevent this lower sideband from aliasing when its highest components would go below zero Hertz. Post-filtering removes the upper sideband, leaving just the inverted audio. Both filters can be realized as low-pass FIR filters.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN5_Zy6ARkoMhJuBl96gzNm3WDMRfEiBqZzb1_E1eQqlQdBvClPr6LvjqGyHfxHVIs88Z0ly9zzGswfkGgqwNsYxwHSZRJ_bMmfq8NKyqaWhM8PeYDYFqKUCeTSPKL72VISb-vycG-6KAL/s1600/invert-spectro.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN5_Zy6ARkoMhJuBl96gzNm3WDMRfEiBqZzb1_E1eQqlQdBvClPr6LvjqGyHfxHVIs88Z0ly9zzGswfkGgqwNsYxwHSZRJ_bMmfq8NKyqaWhM8PeYDYFqKUCeTSPKL72VISb-vycG-6KAL/s500/invert-spectro.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN5_Zy6ARkoMhJuBl96gzNm3WDMRfEiBqZzb1_E1eQqlQdBvClPr6LvjqGyHfxHVIs88Z0ly9zzGswfkGgqwNsYxwHSZRJ_bMmfq8NKyqaWhM8PeYDYFqKUCeTSPKL72VISb-vycG-6KAL/s1000/invert-spectro.png 2x" data-original-width="1600" data-original-height="916" alt="[Image: A spectrogram in four steps, where the signal is first cut at 3 kHz, then shifted up, producing two sidebands, the upper of which is then filtered out.]"/></a></div>
<p>This operation is its own inverse, like ROT13; by applying the same inversion again we get intelligible speech back. Indeed, <em>deinvert</em> can also be used as a scrambler by just running unscrambled audio through it. The same inversion carrier should be used in both directions.</p>
<h3>Split-band inversion</h3>
<p>The split-band scrambling method adds another carrier frequency that I call the split point. It divides the spectrum into two parts that are inverted separately and then combined, preventing ordinary inverters from fully descrambling it.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY40JBjWSghrzwYrHUbHu4r13JVfqKz_8xqYwnwiMiAyqU15wtV2r8WPUeYjgmN_8MbJwPxoVhzfEo8typwXG6GP_BlaA08jF9WC5lSa9ApVT3jH28bPxbEg4YtqkxJlr85Fw9PVSyZEzU/s1600/dspchain.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY40JBjWSghrzwYrHUbHu4r13JVfqKz_8xqYwnwiMiAyqU15wtV2r8WPUeYjgmN_8MbJwPxoVhzfEo8typwXG6GP_BlaA08jF9WC5lSa9ApVT3jH28bPxbEg4YtqkxJlr85Fw9PVSyZEzU/s420/dspchain.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY40JBjWSghrzwYrHUbHu4r13JVfqKz_8xqYwnwiMiAyqU15wtV2r8WPUeYjgmN_8MbJwPxoVhzfEo8typwXG6GP_BlaA08jF9WC5lSa9ApVT3jH28bPxbEg4YtqkxJlr85Fw9PVSyZEzU/s840/dspchain.png 2x" /></a></div>
<p>A single filter-inverter pair may already bring back the low end of the spectrum. Descrambling it fully amounts to running the inversion algorithm twice, with different settings for the filters and mixer, and adding the results together.</p>
<p>The problem here is to find these two frequencies. But let's take a look at an example from audio scrambled using the CML CMX264 split-band inverter (from <a href="https://www.youtube.com/watch?v=6qLFTf_T1JI" title="Radio Intercept: CML CMX264 Frequency Domain Split-Band Speech Inversion" class="external">a video by GBPPR2</a>).</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6QjzxPReF6UXcSlkUZ0Q6w0MW2SC8M7Z6NdGfBFrBPZxoYtTQu0N8Wtk5Df5IVeVxpsjZ8K3k7Lp6HH4GzhaYGJS1Jnhu6fFOO58QrGqaF_j6rbLyPKoSqAS6vj8x2Qe2PaMRIIkY7Egl/s1600/split-point.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6QjzxPReF6UXcSlkUZ0Q6w0MW2SC8M7Z6NdGfBFrBPZxoYtTQu0N8Wtk5Df5IVeVxpsjZ8K3k7Lp6HH4GzhaYGJS1Jnhu6fFOO58QrGqaF_j6rbLyPKoSqAS6vj8x2Qe2PaMRIIkY7Egl/s500/split-point.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6QjzxPReF6UXcSlkUZ0Q6w0MW2SC8M7Z6NdGfBFrBPZxoYtTQu0N8Wtk5Df5IVeVxpsjZ8K3k7Lp6HH4GzhaYGJS1Jnhu6fFOO58QrGqaF_j6rbLyPKoSqAS6vj8x2Qe2PaMRIIkY7Egl/s1000/split-point.jpg 2x" data-original-width="1600" data-original-height="850" alt="[Image: A spectrogram showing a narrow band of speech-like harmonics, but with a constant dip in the middle of the band.]"/></a></div>
<p>In this case the filter roll-off is clearly visible in the spectrogram and it's obvious where the split point is. The higher carrier is probably at the upper limit of the full band or slightly above it. Here the full bandwidth seems to be around 3200 Hz and the split point is at 1200 Hz. This could be initially descrambled using <span class="code">deinvert -f 3200 -s 1200</span>; if the result sounds shifted up or down in frequency this could be refined accordingly.</p>
<h3>Performance</h3>
<p>On a single core of an i7-based laptop from 2013, deinvert processes a 44.1 kHz WAV file at 60x realtime speed (120x for simple inversion). Most of the CPU cycles are spent doing filter convolution, i.e. calculating the signal's vector dot product with the low-pass filter kernels:</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilLXBUcWihSX70UpUUDN5MiA7N5j7kQMppYG43xzuJpYdZH0c0k-19GC5e7_5TfivSeOle408E0WPlnJPe1dEbMMe2OhVBfVw5Jq8ojSq5v8M59cyWGWp46-oUngwwlus8N8VY28HarmeX/s1600/deinvert-perf.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilLXBUcWihSX70UpUUDN5MiA7N5j7kQMppYG43xzuJpYdZH0c0k-19GC5e7_5TfivSeOle408E0WPlnJPe1dEbMMe2OhVBfVw5Jq8ojSq5v8M59cyWGWp46-oUngwwlus8N8VY28HarmeX/s470/deinvert-perf.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilLXBUcWihSX70UpUUDN5MiA7N5j7kQMppYG43xzuJpYdZH0c0k-19GC5e7_5TfivSeOle408E0WPlnJPe1dEbMMe2OhVBfVw5Jq8ojSq5v8M59cyWGWp46-oUngwwlus8N8VY28HarmeX/s940/deinvert-perf.png 2x" data-original-width="1600" data-original-height="923" alt="[Image: A graph of the time spent in various parts of the call tree of the program, with the subtree leading to the dot product operation highlighted. It takes well over 80 % of the tree.]"/></a></div>
<p>For this reason deinvert has a quality setting (0 to 3) for controlling the number of samples in the convolution kernels. A filter with a shorter kernel is linearly faster to compute, but has a shallower roll-off and will leave more unwanted harmonics.</p>
<p>A quality setting of 0 turns filtering off completely, and is very fast. For simple inversion this should be fine, as long as the original doesn't contain much power above the inversion carrier. It's easy to ignore the upper sideband because of its high frequency. In split-band descrambling this leaves some nasty folded harmonics in the speech band though.</p>
<p>Here's a descramble of the above CMX264 split-band audio using all the different quality settings in deinvert. You will first hear it scrambled, and then descrambled with increasing quality setting.</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/cml-filters.mp3">(HTML5 audio: Inverted speech.)</audio></div>
<p>The default quality level is 2. This should be enough for real-time descrambling of simple inversion on a Raspberry Pi 1, still leaving cycles for an FM receiver for instance:</p>
<table class="pretty">
<tr><th>(RasPi 1)</th><th>Simple inversion</th><th>Split-band inversion</th></tr>
<tr><th>-q 0</th><td>16x realtime</td><td>5.8x realtime</td></tr>
<tr><th>-q 1</th><td>6.5x realtime</td><td>3.0x realtime</td></tr>
<tr><th>-q 2</th><td>2.8x realtime</td><td>1.3x realtime</td></tr>
<tr><th>-q 3</th><td>1.2x realtime</td><td>0.4x realtime</td></tr>
</table>
<p>The memory footprint is less than four megabytes.</p>
<h3>Future developments</h3>
<p>There's a variant of split-band inversion where the inversion carrier changes constantly, called variable split-band. The transmitter informs the receiver about this sequence of frequencies via short bursts of data every couple of seconds or so. This data seems to be FSK, but it shall be left to another time.</p>
<p>I've also thought about ways to automatically estimate the inversion carrier frequency. Shifting speech up or down in frequency breaks the relationships of the harmonics. Perhaps this fact could be exploited to find a shift that would minimize this error?</p>
<h3>Links</h3>
<ul>
<li><a href="https://github.com/windytan/deinvert" class="external">deinvert is on GitHub</a> - please also see the <a href="https://github.com/windytan/deinvert/wiki" class="external">wiki</a> for detailed instructions on how to compile and use it.</li>
</ul>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com15tag:blogger.com,1999:blog-5096278891763426276.post-27206443138466606742017-07-27T21:01:00.001+03:002019-08-27T10:01:00.624+03:00Gramophone audio from photograph, revisited<blockquote>"I am the atomic powered robot. Please give my best wishes to everybody!"</blockquote>
<p>Those are the words uttered by Tommy, a childhood toy robot of mine. I've taken a look at his miniature vinyl record sound mechanism a few times before (<a href="http://www.windytan.com/2013/02/the-atomic-powered-robot.html" title="absorptions: The atomic powered robot">#1</a>, <a href="http://www.windytan.com/2013/02/the-laser-equipped-lego-train.html" title="absorptions: The laser-equipped lego train">#2</a>), in an attempt to recover the analog audio signal using only a digital camera. Results were noisy at best. The blog posts resurfaced in a recent IRC discussion which inspired me to try my luck with a slightly improved method.</p>
<h3>Source photo</h3>
<p>I will be using an old photo of Tommy's internal miniature record I already had from previous adventures in 2012. I don't want to perform another invasive operation on Tommy to take a new photograph, as I already broke a plastic tab last time I opened him. But it also means I don't have control over the photographing environment. It's part of the challenge.</p>
<p>The picture was taken with a DSLR and it's an uncompressed 8-bit color photo measuring 3000 by 3000 pixels. There's a fair amount of focus blur, chromatic aberration and similar distortions. But at this resolution, a clear pattern can be seen when zooming into the grooves.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt3gSLou9vVgUvBmkE3ED4wax1cru1J1s0FR8z1w2v59j3HqKD-l6MXafFwLk8SUS34_D5YkpPuGEDBv_4HgPwAlsrH_5dVNCqppDjuyYPS5lvUlMv72WIyiRfU4w6auJxqNW0wDCl-xVB/s1600/vinyyli-zoom.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt3gSLou9vVgUvBmkE3ED4wax1cru1J1s0FR8z1w2v59j3HqKD-l6MXafFwLk8SUS34_D5YkpPuGEDBv_4HgPwAlsrH_5dVNCqppDjuyYPS5lvUlMv72WIyiRfU4w6auJxqNW0wDCl-xVB/s350/vinyyli-zoom.jpg" srcset ="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt3gSLou9vVgUvBmkE3ED4wax1cru1J1s0FR8z1w2v59j3HqKD-l6MXafFwLk8SUS34_D5YkpPuGEDBv_4HgPwAlsrH_5dVNCqppDjuyYPS5lvUlMv72WIyiRfU4w6auJxqNW0wDCl-xVB/s700/vinyyli-zoom.jpg 2x" data-original-width="1600" data-original-height="966" alt="[Image: Close-up shot of a miniature vinyl record, with a detail view of the grooves.]" /></a></div>
<p>This pattern superficially resembles a variable-area optical audio track seen in old film prints, and that's why I previously tried to decode it as such. But it didn't produce satisfactory results, and there is no physical reason it even should. In fact, I'm not even sure as to which physical parameter the audio is encoded in – does the needle move vertically or horizontally? How would this feature manifest itself in the photograph? Do the bright blobs represent crests in the groove, or just areas that happen to be oriented the right way in this particular lighting?</p>
<h3>Unwrapping</h3>
<p>To make the grooves a little easier to follow I first unwrapped the circular record into a linear image. I did this by remapping the image space from polar to 9000-wide Cartesian coordinates and then resampling it with a windowed sinc kernel:</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEcWYze6gwl_Adheon9elKRD6fon9GdyWpGg_LoCzE8UtkpPVtzcfOVNNB5EWT91ZjCl5l8mDfiUeYGuVOWDYCa5k6KxHmmgDLdQmJIOHCiWBGKlJQSv9-i7dVhL3ttktCXo2-zccBkkr5/s1600/unwrapped.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEcWYze6gwl_Adheon9elKRD6fon9GdyWpGg_LoCzE8UtkpPVtzcfOVNNB5EWT91ZjCl5l8mDfiUeYGuVOWDYCa5k6KxHmmgDLdQmJIOHCiWBGKlJQSv9-i7dVhL3ttktCXo2-zccBkkr5/s520/unwrapped.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEcWYze6gwl_Adheon9elKRD6fon9GdyWpGg_LoCzE8UtkpPVtzcfOVNNB5EWT91ZjCl5l8mDfiUeYGuVOWDYCa5k6KxHmmgDLdQmJIOHCiWBGKlJQSv9-i7dVhL3ttktCXo2-zccBkkr5/s1040/unwrapped.png 2x" data-original-width="1400" data-original-height="193" alt="[Image: The photo of the circular record unwrapped into a long linear strip.]"/></a></div>
<h3>Mapping the groove path</h3>
<p>It's not easy to automatically follow the groove. As one would imagine, it's not a mathematically perfect spiral. Sometimes the groove disappears into darkness, or blurs into the adjacent track. But it wasn't overly tedious to draw a guiding path manually. Most of the work was just copy-pasting from a previous groove and making small adjustments.</p>
<p>I opened the unwrapped image in Inkscape and drew a colored polyline over all obvious grooves. I tried to make sure a polyline at the left image border would neatly continue where the previous one ended on the right side.</p>
<p>The grooves were alternately labeled as 'a' and 'b', since I knew this record had two different sound effects on interleaved tracks.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrCXDEnEGkileTPi_q5Zd8GVmEr4cxNPZBYj2zm4PXfMw4MP7v_HeI0Z024NxWeSfvObC3TXZJNFBig3xqHrB2luaHAFm-9RwYibcEacWEwfeqlFlmQJd1mMgP7EulMwK-oqbVAaQ62tmh/s1600/raidat-manuaalisesti.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrCXDEnEGkileTPi_q5Zd8GVmEr4cxNPZBYj2zm4PXfMw4MP7v_HeI0Z024NxWeSfvObC3TXZJNFBig3xqHrB2luaHAFm-9RwYibcEacWEwfeqlFlmQJd1mMgP7EulMwK-oqbVAaQ62tmh/s420/raidat-manuaalisesti.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrCXDEnEGkileTPi_q5Zd8GVmEr4cxNPZBYj2zm4PXfMw4MP7v_HeI0Z024NxWeSfvObC3TXZJNFBig3xqHrB2luaHAFm-9RwYibcEacWEwfeqlFlmQJd1mMgP7EulMwK-oqbVAaQ62tmh/s840/raidat-manuaalisesti.png 2x" data-original-width="1400" data-original-height="870" alt="[Image: A zoomed-in view of the unwrapped grooves labeled and highlighted with colored lines.]"/></a></div>
<p>This polyline was then exported from Inkscape and loaded by a script that extracted a 3-7 pixel high column from the unwrapped original, centered around the groove, for further processing.</p>
<h3>Pixels to audio</h3>
<p>I had noticed another information-carrying feature besides just the transverse area of the groove: its displacement from center. The white blobs sometimes appear below or above the imaginary center line.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7BPvUXTmKHG6U_RrgeUoaXAKahZ67hbjVsTxb7MbfyvxA1AAf-swR0fSdSKx_PnWjHSKoXDYrf3lQiiATcOkx8tlmEBg0bhn_lQc3Bs2TBm3FNaA9aJ3MjQxNeLTh_lEDaPNh_wWRn78B/s1600/groove-center-displacement.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7BPvUXTmKHG6U_RrgeUoaXAKahZ67hbjVsTxb7MbfyvxA1AAf-swR0fSdSKx_PnWjHSKoXDYrf3lQiiATcOkx8tlmEBg0bhn_lQc3Bs2TBm3FNaA9aJ3MjQxNeLTh_lEDaPNh_wWRn78B/s380/groove-center-displacement.png" data-original-width="380" data-original-height="228" alt="[Image: Parts of a few grooves shown greatly magnified. They appear either as horizontal stripes, or horizontally organized groups of distinct blobs.]"/></a></div>
<p>I had my script calculate the brightness mass center (weighted <span class="math">y</span> average) relative to the track polyline at all <span class="math">x</span> positions along the groove. This position was then directly used as a PCM sample value, and the whole groove was written to a WAV file. A noise reduction algorithm was also applied, based on sample noise from the silent end of the groove.</p>
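<p>In code, the centroid measurement could look something like this (a sketch, not my actual script; the extracted groove strip is assumed to be an 8-bit grayscale <span class="code">cv::Mat</span> with the guiding polyline running along its vertical center):</p>
<pre><code class="cc">#include <vector>
#include "opencv2/core/core.hpp"

// For every x along the groove, compute the brightness-weighted y center
// relative to the middle row; this displacement becomes the PCM sample.
std::vector<float> CentroidTrace(const cv::Mat& strip) {
  std::vector<float> samples(strip.cols, 0.f);
  for (int x = 0; x < strip.cols; x++) {
    float weight_sum = 0.f, weighted_y = 0.f;
    for (int y = 0; y < strip.rows; y++) {
      float w = strip.at<uchar>(y, x);
      weight_sum += w;
      weighted_y += w * (y - (strip.rows - 1) / 2.f);
    }
    if (weight_sum > 0.f) samples[x] = weighted_y / weight_sum;
  }
  return samples;
}</code></pre>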
<p>The results are much better than what I previously obtained (see video below, or <a href="https://oona.windytan.com/blogfiles/atomicpowered.mp3">mp3 here</a>):</p>
<center><iframe width="560" height="315" src="https://www.youtube.com/embed/J0XMDiM2PrA" frameborder="0" allowfullscreen></iframe></center>
<h3>Future ideas</h3>
<p>Several factors limit the fidelity and dynamic range obtained by this method. For one, the relationship between the white blobs and needle movement is not known. The results could possibly still benefit from more pixel resolution and color bit depth. The blobs' central displacement (insofar as it is the most useful feature) could also be obtained more accurately using a Gaussian fit or similar algorithm.</p>
<p>The groove guide could be drawn more carefully, as some track slips can be heard in the recovered audio.</p>
<p>Opening up the robot for another photograph would be risky, since I already broke a plastic tab before. But other ways to optically capture the signal would be using a USB microscope or a flatbed scanner. These methods would still be only slightly more complicated than just using a microphone! The linear light source of the scanner would possibly cause problems with the circular groove. I would imagine the problem of the disappearing grooves would still be there, unless some sort of carefully controlled lighting was used.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com20tag:blogger.com,1999:blog-5096278891763426276.post-65920311981535795712017-07-17T15:36:00.000+03:002019-07-04T22:30:50.614+03:00Virtual music box<p>A little music project I was writing required a melody be played on a music box. However, the paper-programmable music box I had (pictured) could only play notes on the C major scale. I couldn't easily find a realistic-sounding synthesizer version either. They all seemed to be missing something. Maybe they were too perfectly tuned? I wasn't sure.</p>
<p>Perhaps, if I digitized the sound myself, I could build a flexible virtual instrument to generate just the perfect sample for the piece!</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3dtNWvQ6egqAgWx-_k9__beJ6HvB4UPg6nS5DVTClY1ytN56t9x6Sw2mZdZI9eJPwpxWXi3DSd9ywFSd3jyTErBQoprIybZ1rr7c6blyFotHO5uV0JsYiJ-rulJfua0sYJhN6VkCr_VSS/s1600/IMG_6798.jpg"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3dtNWvQ6egqAgWx-_k9__beJ6HvB4UPg6nS5DVTClY1ytN56t9x6Sw2mZdZI9eJPwpxWXi3DSd9ywFSd3jyTErBQoprIybZ1rr7c6blyFotHO5uV0JsYiJ-rulJfua0sYJhN6VkCr_VSS/s450/IMG_6798.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3dtNWvQ6egqAgWx-_k9__beJ6HvB4UPg6nS5DVTClY1ytN56t9x6Sw2mZdZI9eJPwpxWXi3DSd9ywFSd3jyTErBQoprIybZ1rr7c6blyFotHO5uV0JsYiJ-rulJfua0sYJhN6VkCr_VSS/s900/IMG_6798.jpg 2x" alt="[Image: A paper programmable music box.]"/></a></div>
<p>I haven't really made a sampled instrument before, short of perhaps using terrible single-sample ones in Impulse Tracker clones. So I proceeded in an improvised manner. Below I'll post some interesting findings and sound samples of how the instrument developed along the way. There won't be any source code for now.</p>
<p>By the way, there is <a href="https://www.youtube.com/watch?v=COty6_oDEkk" title="How a Wind Up Music Box Works" class="external">a great explanatory video</a> by engineerguy about the workings of music boxes that will explain some terminology ("pins" and "teeth") used in this post.</p>
<h3>Recording samples</h3>
<div class="kuva oikealla"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFI67wIrNfYfUA4m244LiKXFxr1ua_6ZbA3w3B911IJX40ZcckRT5L_pl-SqgnDjIq00wqbROQQiKr3WY-h5B5K6yTHpsumwsvHI09rtyXre6pOOj5TdWQm1rUyZQlFX_sxzdz322Z09jw/s1600/IMG_7607.JPG"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFI67wIrNfYfUA4m244LiKXFxr1ua_6ZbA3w3B911IJX40ZcckRT5L_pl-SqgnDjIq00wqbROQQiKr3WY-h5B5K6yTHpsumwsvHI09rtyXre6pOOj5TdWQm1rUyZQlFX_sxzdz322Z09jw/s240/IMG_7607.JPG" data-original-width="1600" data-original-height="1600" alt="[Image: A recording setup with a microphone.]"/></a></div>
<p>The first step was, obviously, to record the sound to be used as samples. I damped my room using towels and mattresses to minimize room echo; this could be added later if desired, but for now it would only make it harder to cleanly splice the audio. The microphone used was the Audio Technica AT2020, and I digitized it using the Behringer Xenyx 302 USB mixer.</p>
<p>I perforated a paper roll to play all the possible notes in succession, and rolled the paper through. The sound of the paper going through the mechanism posed a problem at first, but I soon learned to stop the paper at just the right moment to make way for the sound of the tooth.</p>
<p>Now I had pretty decent recordings of the whole two-octave range. I used Audacity to extract the notes from the recording, and named the files according to the actual playing MIDI pitch. (The music box actually plays a G# major scale, contrary to what's marked on the blank paper rolls.)</p>
<h3>The missing notes</h3>
<p>Next, we'll need to generate the missing notes that don't belong in the scale of this music box. Because pitch is proportional to the speed of vibration, this could be done by simply speeding up or slowing down an adjacent note by just the right factor. In equal temperament tuning, this factor would be the 12th root of 2, or roughly 1.05946. Such scaling is straightforward to do on the command line using SoX, for instance (<span class="code">sox c1.wav c_sharp1.wav speed 1.05946</span>).</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcfWi9CVTYXztYkPcGYsJzxJg9ZLbXvg5ZjInnFWXMW0EHcng-N394B2myUJju5mKhEY54OB4g9ZeOatQXe0cX-VOAdCuLecez_uLibvqIFuv4KoP_KIwHI1jaqJFdx0IFBkmJ2VegCLVK/s380/notes_12root_t.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcfWi9CVTYXztYkPcGYsJzxJg9ZLbXvg5ZjInnFWXMW0EHcng-N394B2myUJju5mKhEY54OB4g9ZeOatQXe0cX-VOAdCuLecez_uLibvqIFuv4KoP_KIwHI1jaqJFdx0IFBkmJ2VegCLVK/s760/notes_12root_t.png 2x" data-original-width="800" data-original-height="444" alt="[Image: Musical notation explaining transposition by multiplication by the 12th root of 2.]"/></div>
<p>This method can also be used to generate whole new octaves; for example, a transposition of +8 semitones would have a ratio of (<sup>12</sup>√2)<sup>8</sup> ≈ 1.5874. Inter-note variance could be retained by using a random source file for each resampled note. But large-interval transpositions would probably not sound very good due to coloring in the harmonic series.</p>
<p>Here's a table of some intervals and the corresponding speed ratios in equal temperament:</p>
<table>
<tr><th>Semitones</th><th>Speed ratio</th><th>Approx. value</th></tr>
<tr><td>–3</td><td>= (<sup>12</sup>√2)<sup>–3</sup></td><td>≈ 0.840896</td></tr>
<tr><td>–2</td><td>= (<sup>12</sup>√2)<sup>–2</sup></td><td>≈ 0.890899</td></tr>
<tr><td>–1</td><td>= (<sup>12</sup>√2)<sup>–1</sup></td><td>≈ 0.943874</td></tr>
<tr><td>+1</td><td>= (<sup>12</sup>√2)<sup>1</sup></td><td>≈ 1.059463</td></tr>
<tr><td>+2</td><td>= (<sup>12</sup>√2)<sup>2</sup></td><td>≈ 1.122462</td></tr>
<tr><td>+3</td><td>= (<sup>12</sup>√2)<sup>3</sup></td><td>≈ 1.189207</td></tr>
</table>
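<p>All of these are just powers of the 12th root of two, so the factor for an arbitrary interval is a one-liner:</p>
<pre><code class="cc">#include <cmath>

// Speed factor for a transposition of n semitones in equal temperament.
double SpeedFactor(int semitones) {
  return std::pow(2.0, semitones / 12.0);
}</code></pre>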
<h3>First test!</h3>
<p>Now I could finally write a script to play my melody!</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/soittorasia01.mp3">(HTML5 audio: Music box notes.)</audio></div>
<p>It sounds pretty good already - there's no obvious noise and the samples line up seamlessly even though they were just naively glued together sample by sample. There's a lot of power in the lower harmonics, probably because of the big cardboard box I used, but this can easily be changed by EQ if we want to give the impression of a cute little music box.</p>
<h3>Adding errors</h3>
<p>The above sound still sounded quite artificial, I think mostly because simultaneous notes start on the same exact millisecond. There seems to be a small timing variance in music boxes that is an important contributor to their overall delicate sound. In the below sample I added a timing error from a normal distribution with a standard deviation of 11 milliseconds. It sounds a lot better already!</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/soittorasia02.mp3">(HTML5 audio: Music box notes.)</audio></div>
<h3>Other sounds from the teeth</h3>
<p>If you listen to recordings of music boxes you can occasionally hear a high-pitched screech as well. It sounds a bit like stopping a tuning fork or guitar string with a metal object. That's why I thought it must be the sound of the pin stopping a vibrating tooth just before playing another note on the same tooth.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJ5FEUZCR7qcTWZglyBnW7MmpAGYuN_19HRlgJPM7BQfHMrPkt2lDX_mwrmwVdZeB2MUO1KH-xUBbgRs0vpBjT5zLwyTo1ZDHpdPwHiOjRhdR3h0qNoE0QIKQLgsC0qA4Y7VK03HLxfTGt/s1600/screech.jpg" ><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJ5FEUZCR7qcTWZglyBnW7MmpAGYuN_19HRlgJPM7BQfHMrPkt2lDX_mwrmwVdZeB2MUO1KH-xUBbgRs0vpBjT5zLwyTo1ZDHpdPwHiOjRhdR3h0qNoE0QIKQLgsC0qA4Y7VK03HLxfTGt/s470/screech.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJ5FEUZCR7qcTWZglyBnW7MmpAGYuN_19HRlgJPM7BQfHMrPkt2lDX_mwrmwVdZeB2MUO1KH-xUBbgRs0vpBjT5zLwyTo1ZDHpdPwHiOjRhdR3h0qNoE0QIKQLgsC0qA4Y7VK03HLxfTGt/s940/screech.jpg 2x" data-original-width="1200" data-original-height="621" alt="[Image: Spectrogram of the beginning of a note with the characteristic screech, centered around 12 kilohertz.]"/></a></div>
<p>Sure enough, this sound could always be heard by playing the same note twice in quick succession. I recorded this sound for each tooth and added it to my sound generator. The sound will be generated only if the previous note sample is still playing, and its volume will be scaled in proportion to the tooth's envelope amplitude at that moment. Also, it will silence the note. The amount of silence between the screech and the next note will depend on a tempo setting.</p>
<p>Adding this resonance definitely brings about a more organic feel:</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/soittorasia03.mp3">(HTML5 audio: Music box notes.)</audio></div>
<h3>The wind-up mechanism</h3>
<p>For a final touch I recorded sounds from the wind-up mechanism of another music box, even though this one didn't have one. It's all stitched up from small pieces, so the number of wind-ups in the beginning and the speed of the whirring sound can all be adjusted. I was surprised at the smoothness of the background sound; it's a three-second loop with no cross-fading involved. You can also hear the box lid being closed in the end.</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/soittorasia04.mp3">(HTML5 audio: Music box notes.)</audio></div>
<h3>Notation</h3>
<div class="saumaton kuva oikealla"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0Rm4-UqZZbjeo4nVT9jXakV-VhqLzrOg7N_h0NU-P8VTbNLgX6WiZqpTPAbe5BZYWDxxAehgdQKiKHxyX8oTyHZ5psG4mD3zDc7wTFw4bcOnetXIUnTzhkAWP2MqDFjsqVaXvR7n_tDTV/s1600/Screen+Shot+2017-07-17+at+15.22.27.png"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0Rm4-UqZZbjeo4nVT9jXakV-VhqLzrOg7N_h0NU-P8VTbNLgX6WiZqpTPAbe5BZYWDxxAehgdQKiKHxyX8oTyHZ5psG4mD3zDc7wTFw4bcOnetXIUnTzhkAWP2MqDFjsqVaXvR7n_tDTV/s320/Screen+Shot+2017-07-17+at+15.22.27.png" data-original-width="807" data-original-height="1014" alt="[Image: VIM screenshot of a text file containing music box markup.]"/></a></div>
<p>The native notation of a music box is some kind of a perforated tape or drum, so I ended up using a similar format. There's a tempo marking and tuning information in the beginning, followed by the notation, one eighth note per line. Arpeggios are indicated by a pointy bracket <span class="code">></span>. I also wrote a script to convert MIDI files into this format, but the number of notes in a music box loop is usually so small that it's not very hard to write manually.</p>
<p>This format could include additional information as well, perhaps controlling the motor sound or box size and shape (properties of the EQ filter).</p>
<p>This format could also potentially be useful when producing or transcribing music from music box drums.</p>
<br style="clear:both">
<h3>Future developments</h3>
<p>Currently the music box generator has a hastily written "engineer's UI", which means I probably won't remember how to use it in a couple of months' time. Perhaps it could be integrated into some music software, as a plugin.</p>
<p>Possibilities for live performances are limited, I think. It wouldn't work exactly like a keyboard instrument usually does. At least there should be a way to turn on the background noise, and the player should take into account the 300-millisecond delay caused by the pin slowly rotating over the tooth. But it could be used to play a roll in an endless loop and the settings could be modified on the fly.</p>
<p>As such, the tool performs best at pre-rendering notated music. And I'm happy with the results!</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/_JbUFpZtRiE" frameborder="0" allowfullscreen></iframe>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com6tag:blogger.com,1999:blog-5096278891763426276.post-58070763140643907792016-10-07T16:17:00.004+03:002022-10-23T21:47:38.471+03:00CTCSS fingerprinting: a method for transmitter identification<p>Identifying unknown radio transmitters by their signals is called radio fingerprinting. It is usually based on rise-time signatures, i.e. characteristic differences in how the transmitter frequency fluctuates at carrier power-up. Here, instead, I investigate the fingerprintability of another feature in hand-held FM transceivers, known as <a href="https://en.wikipedia.org/wiki/Continuous_Tone-Coded_Squelch_System" class="external" title="Wikipedia: Continuous Tone-Coded Squelch System">CTCSS</a> or Continuous Tone-Coded Squelch System.</p>
<h3>Motivation & data</h3>
<p>I came across a long, losslessly compressed recording of some walkie-talkie chatter and wanted to know more about it, things like the number of participants and who's talking with whom. I started writing a transcript – a fun pastime – but some voices sounded so similar that I wondered if there was a way to tell them apart automatically.</p>
<div class="kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjyjQ5aQ5C_WSBEMIuM7R_1CZyvc4zMoPWt4Egna3qhWYNKJN2iZZB-NciG3Efy8Qo-utByS1Wr6fzGtkfg8omYo2BK7JfNT-EzKHfwmSEpQBLOfQ8SaLvbqj98DLt49nnxJye2G-u6n25/s1600/Screen+Shot+2016-09-26+at+0.19.53.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjyjQ5aQ5C_WSBEMIuM7R_1CZyvc4zMoPWt4Egna3qhWYNKJN2iZZB-NciG3Efy8Qo-utByS1Wr6fzGtkfg8omYo2BK7JfNT-EzKHfwmSEpQBLOfQ8SaLvbqj98DLt49nnxJye2G-u6n25/s520/Screen+Shot+2016-09-26+at+0.19.53.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjyjQ5aQ5C_WSBEMIuM7R_1CZyvc4zMoPWt4Egna3qhWYNKJN2iZZB-NciG3Efy8Qo-utByS1Wr6fzGtkfg8omYo2BK7JfNT-EzKHfwmSEpQBLOfQ8SaLvbqj98DLt49nnxJye2G-u6n25/s1040/Screen+Shot+2016-09-26+at+0.19.53.png 2x" alt="[Image: Screenshot of Audacity showing an audio file over eleven hours long.]"/></a></div>
<p>The file comprises several thousand short transmissions as FM demodulated audio lowpass filtered at 4500 Hz. Signal quality is variable; most transmissions are crisp and clear but some are buried under noise. Passages with no signal are squelched to zero.</p>
<p>I considered several potentially fingerprintable features, many of them unrealistic:</p>
<ul>
<li>Carrier power-up; but many transmissions were missing the very beginning because of squelch</li>
<li>Voice identification; but it would probably require pretty sophisticated algorithms (too difficult!) and longer samples</li>
<li>Mean audio power; but it's not consistent enough, as it depends on text, tone of voice, etc.</li>
<li>Maximum audio power; but it's too sensitive to peaks in FM noise</li>
</ul>
<p>I then noticed all transmissions had a very low tone at 88.5 Hz. It turned out to be CTCSS, an inaudible signal that enables handsets to silence unwanted transmissions on the same channel. This gave me an idea inspired by <a href="https://en.wikipedia.org/wiki/Electrical_network_frequency_analysis" class="external" title="Wikipedia: Electrical network frequency analysis">mains frequency analysis</a>: Could this tone be measured to reveal minute differences in crystal frequencies and modulation depths? Also, knowing that these were recorded using a cheap DVB-T USB stick – would it have a stable enough oscillator to produce consistent measurements?</p>
<h3>Measurements</h3>
<p>I used the <a href="https://github.com/jgaeddert/liquid-dsp/" class="external" title="jgaeddert/liquid-dsp: digital signal processing library for software-defined radios">liquid-dsp</a> library for signal processing. It has several methods for measuring frequencies. I decided to use a phase-locked loop, or PLL; I could have also used FFT with peak interpolation.</p>
<p>In my fingerprinting tool, the recording is first split into single transmissions. The CTCSS tone is bandpass filtered and a PLL starts tracking it. When the PLL frequency stops fluctuating, i.e. the standard deviation is small enough, it's considered locked and its frequency is averaged over this time. The average RMS power is measured similarly.</p>
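<p>For the curious, here's a minimal numpy sketch of the same kind of measurement using the FFT route instead of the PLL. It's not the liquid-dsp code I actually used, and it assumes <span class="code">x</span> already holds the demodulated audio of a single transmission as floats at sample rate <span class="code">fs</span>:</p>
<pre class="term">
import numpy as np

def ctcss_fingerprint(x, fs, f_lo=60.0, f_hi=120.0):
    # Keep only the CTCSS region of the spectrum (roughly 60..120 Hz here).
    n = len(x)
    spec = np.fft.rfft(x * np.hanning(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    band = (freqs >= f_lo) & (f_hi >= freqs)
    mag = np.abs(spec) * band

    # Parabolic interpolation around the strongest bin for sub-bin accuracy.
    k = int(np.argmax(mag))
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)
    freq = (k + delta) * fs / n

    # RMS level of the band-limited tone; a relative figure is enough
    # for clustering purposes.
    tone = np.fft.irfft(spec * band, n)
    rms = np.sqrt(np.mean(tone ** 2))
    return freq, rms
</pre>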
<p>Here's one such transmission:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwIys3UsSSN0416QgmaXwvwEQ15s17O6sFWs9AJ0nkeKszasjvgzaEZ3dzqNAb-rlvsduwGU3yNl_1621sPnOtUE_wsdCfuAVCB8sONadQBuueNtjaJKKLU1nfBKmbvZuyfCRrthlBw68Q/s1600/plot-lock.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwIys3UsSSN0416QgmaXwvwEQ15s17O6sFWs9AJ0nkeKszasjvgzaEZ3dzqNAb-rlvsduwGU3yNl_1621sPnOtUE_wsdCfuAVCB8sONadQBuueNtjaJKKLU1nfBKmbvZuyfCRrthlBw68Q/s480/plot-lock.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwIys3UsSSN0416QgmaXwvwEQ15s17O6sFWs9AJ0nkeKszasjvgzaEZ3dzqNAb-rlvsduwGU3yNl_1621sPnOtUE_wsdCfuAVCB8sONadQBuueNtjaJKKLU1nfBKmbvZuyfCRrthlBw68Q/s960/plot-lock.png 2x" alt="[Image: A graph showing frequency and power, first fluctuating but then both stabilize for a moment, where text says 'PLL locked'. Caption says 'No, I did not copy'.]"/></a></div>
<h3>Results</h3>
<p>At least three clusters are clearly distinguishable by eye. Zooming in to one of the clusters reveals it's made up of several smaller clusters. Perhaps the larger clusters correspond to three different models of radios in use, and these smaller ones are the individual transmitters?</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrhYz0_IxabL_QLPuN0Rt8QzJ1uy_bkd1tLC4q1cpRzh_mMREjKb7UUnLU1Ri-rFyaWydXCrnEYymsKnkQDed51EaPYmIumgb-W8BK5KqaUOfuMq83HzPsFSDfr5wWub56L73O_heotwK5/s1600/plot12.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrhYz0_IxabL_QLPuN0Rt8QzJ1uy_bkd1tLC4q1cpRzh_mMREjKb7UUnLU1Ri-rFyaWydXCrnEYymsKnkQDed51EaPYmIumgb-W8BK5KqaUOfuMq83HzPsFSDfr5wWub56L73O_heotwK5/s520/plot12.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrhYz0_IxabL_QLPuN0Rt8QzJ1uy_bkd1tLC4q1cpRzh_mMREjKb7UUnLU1Ri-rFyaWydXCrnEYymsKnkQDed51EaPYmIumgb-W8BK5KqaUOfuMq83HzPsFSDfr5wWub56L73O_heotwK5/s1040/plot12.png 2x" alt="[Image: A plot of RMS power versus frequency, with dots scattered all over, but mostly concentrated in a few clusters.]"/></a></div>
<p>A heat map reveals even more structure:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixlW3cmPy8YU5-c8M8OLOs1GlGhyphenhyphenebaTp6Xc7WzVmi9goQbEYPD-CLhE6aQGfhVcLIpdkMGUrfh-Ri-ehUplQraoz8ag2Rn4r3VyVT-D35i1Oxbv14qrf6rFFrKQ76bBHmZkOwm_XxxWDD/s1600/heatmap3.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixlW3cmPy8YU5-c8M8OLOs1GlGhyphenhyphenebaTp6Xc7WzVmi9goQbEYPD-CLhE6aQGfhVcLIpdkMGUrfh-Ri-ehUplQraoz8ag2Rn4r3VyVT-D35i1Oxbv14qrf6rFFrKQ76bBHmZkOwm_XxxWDD/s420/heatmap3.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixlW3cmPy8YU5-c8M8OLOs1GlGhyphenhyphenebaTp6Xc7WzVmi9goQbEYPD-CLhE6aQGfhVcLIpdkMGUrfh-Ri-ehUplQraoz8ag2Rn4r3VyVT-D35i1Oxbv14qrf6rFFrKQ76bBHmZkOwm_XxxWDD/s840/heatmap3.png 2x" alt="[Image: The same clusters presented in a gradual color scheme and numbered from 1 to 12.]"/></a></div>
<p>It seems at least 12 clusters, i.e. potential individual transmitters, can be distinguished.</p>
<p>Even though most transmissions are part of some cluster, there are many outliers as well. These appear to correspond to very noisy or very short transmissions. (Could the FFT have produced better results with these?)</p>
<h3>Use as transcription aid</h3>
<p>My goal was to make these fingerprints useful as labels aiding transcription. This way, a human operator could easily distinguish parties of a conversation and add names or call signs accordingly.</p>
<p>I experimented with automated k-means clustering, but that didn't immediately produce appealing results. Then I manually assigned 12 anchor points at apparent cluster centers and had a script calculate the nearest anchor point for all transmissions. Prior to distance calculations the axes were scaled so that the data seemed uniformly distributed around these points.</p>
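<p>The nearest-anchor labeling itself is only a few lines. A sketch of the idea, assuming the per-transmission (frequency, power) pairs are in <span class="code">measurements</span>, the hand-picked cluster centers in <span class="code">anchors</span>, and with placeholder scaling factors rather than the ones I actually used:</p>
<pre class="term">
import numpy as np

def label_transmissions(measurements, anchors, freq_scale=1.0, power_scale=1.0):
    # Scale both axes so the point cloud looks roughly uniform around the
    # anchors before computing Euclidean distances.
    scale = np.array([freq_scale, power_scale])
    pts = np.asarray(measurements) * scale
    ctr = np.asarray(anchors) * scale

    labels = []
    for p in pts:
        d = np.sqrt(((ctr - p) ** 2).sum(axis=1))
        labels.append(int(np.argmin(d)) + 1)   # clusters numbered from 1
    return labels
</pre>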
<p>This automatic labeling proved quite sensitive to errors. It could still be useful for listing candidate transmitters for an unknown transmission with no context; distances to previous transmissions that positively mention call signs could then be used. Instead, I ended up printing the raw coordinates and colouring them on a continuous RGB scale:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjy9wyge-5pOtq7REJyPPznqfHLZSsuvleHEhGm5XVN9iLpGL2gN9Th6nBiedVP2iple9fuMPfF_tX6JSz-yYVjcevReNuE_Pnj5YrAm4YYc3ZeYWpMJV1XNWauB6ccyrhbeNl-KfZNN3Zn/s1600/snakes.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjy9wyge-5pOtq7REJyPPznqfHLZSsuvleHEhGm5XVN9iLpGL2gN9Th6nBiedVP2iple9fuMPfF_tX6JSz-yYVjcevReNuE_Pnj5YrAm4YYc3ZeYWpMJV1XNWauB6ccyrhbeNl-KfZNN3Zn/s520/snakes.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjy9wyge-5pOtq7REJyPPznqfHLZSsuvleHEhGm5XVN9iLpGL2gN9Th6nBiedVP2iple9fuMPfF_tX6JSz-yYVjcevReNuE_Pnj5YrAm4YYc3ZeYWpMJV1XNWauB6ccyrhbeNl-KfZNN3Zn/s1040/snakes.png 2x" alt="[Image: A few lines from a conversation between Boa 1 and Cobra 1. Numbers in different colors are printed in front of each line.]"/></a></div>
<p>Here the colours make it obvious which party is talking. Call signs written in a darker shade are deduced from the context. One sentence, most probably by "Cobra 1", gets lost in noise and the RMS power measurement becomes inaccurate (463e-6). The PLL frequency is still consistent with the conversation flow, though.</p>
<h3>Countermeasures</h3>
<p>If CTCSS is not absolutely required in your network, i.e. there are no unwanted conversations on the frequency, then it can be disabled to prevent this type of fingerprinting. In Motorola radios this is done by setting the CTCSS code to 0. (In the menus it may also be called a PT code or Interference Eliminator code.) In many other consumer radios it doesn't seem to be that easy.</p>
<h3>Conclusions</h3>
<p>CTCSS is a suitable signal for fingerprinting transmitters, reflecting minute differences in crystal frequencies and, possibly, FM modulation indices. Even a cheap receiver can recover these differences. It can be used when the signal is already FM demodulated or otherwise not suitable for more traditional rise-time fingerprinting.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com17tag:blogger.com,1999:blog-5096278891763426276.post-26395736481765762042016-10-02T22:10:00.001+03:002022-10-23T21:48:19.189+03:00Redsea 0.7, a lightweight RDS decoder<p>I've written about <a href="https://github.com/windytan/redsea" title="GitHub: windytan/redsea" class="external">redsea</a>, my RDS decoder project, many times before. It has changed a lot lately; it even has a version number, 0.7.6 as of this writing. What follows is a summary of its current state and possible future developments.</p>
<h3>Input formats</h3>
<p>Redsea can decode several types of data streams. The command-line switches to activate these can be found in the readme.</p>
<p>Its main use, perhaps, is to decode RDS from an FM multiplex signal, as received using a cheap rtl-sdr radio dongle and FM demodulated using <span class="code">rtl_fm</span>. The multiplex is the FM demodulated signal sampled at 171 kHz, a convenient multiple of both the RDS data rate (1187.5 bps) and the subcarrier frequency (57 kHz). There's a convenience shell script that starts both redsea and the rtl_fm receiver. For example, <span class="code">./rtl-rx.sh -f 88.0M</span> would start reception on 88.0 MHz.</p>
<p>It can also decode an "ASCII binary" stream (<span class="code">--input-ascii</span>):</p>
<pre class="term">
0001100100111001000101110000101110011000010010110010011001000000100001
1010010000011010110100010000000100000001101110000100010111000010111001
1001000010110000111111011101101011001010101110100011111101000011100010
100000011010010001011100001</pre>
<p>Or hex-encoded RDS groups one per line (<span class="code">--input-hex</span>), which is the format used by <a href="http://rdsspy.com/" title="RDS Spy" class="external">RDS Spy</a>:</p>
<pre class="term">6201 01D8 E704 594C
6201 01D9 2217 4520
6201 E1C1 594C 6202
6201 01DA 1139 594B
6201 21DC 2020 2020</pre>
<h3>Output formats</h3>
<p>The default output has changed drastically. There used to be no strict format to it; rather, it was just a human-readable terminal display. This sort of output format will probably return at some point, as an option. But currently redsea outputs line-delimited JSON, where every group is a JSON object on a separate line. It is quite verbose but machine-readable and well suited for post-processing:</p>
<pre class="term">{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"alt_freqs":[87.9,88.5,89.2,89.5,89.8,90.9,93.2],"ps":"YLE YK
SI"}
{"pi":"0x6201","group":"14A","tp":false,"prog_type":"Serious classical","other_
network":{"pi":"0x6205","tp":false,"has_linkage":false}}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YL "}
{"pi":"0x6201","group":"2A","tp":false,"prog_type":"Serious classical","partial
_radiotext":"Yöklassinen."}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE "}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"partial_ps":"YLE YK "}
{"pi":"0x6201","group":"2A","tp":false,"prog_type":"Serious classical","partial
_radiotext":"Yöklassinen."}
{"pi":"0x6201","group":"0A","tp":false,"prog_type":"Serious classical","ta":tru
e,"is_music":true,"alt_freqs":[87.9,88.5,89.2,89.5,89.8,90.9,93.2],"ps":"YLE YK
SI"}</pre>
<p>Someone on GitHub <a href="https://github.com/windytan/redsea/issues/24#issuecomment-247766780" class="external" title="broken json - issue #24">hinted</a> about <span class="code">jq</span>, a command-line tool that can color and filter JSON, among other things:</p>
<pre class="term">> ./rtl-rx.sh -f 87.9M | jq -c
{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6201"</span>,<span style="color:#4990e8">"group"</span>:<span style="color:#00aa00">"0A"</span>,<span style="color:#4990e8">"tp"</span>:false,<span style="color:#4990e8">"prog_type"</span>:<span style="color:#00aa00">"Serious classical"</span>,<span style="color:#4990e8">"ta"</span>:tru
e,<span style="color:#4990e8">"is_music"</span>:true,<span style="color:#4990e8">"partial_ps"</span>:<span style="color:#00aa00">"YL "</span>}
{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6201"</span>,<span style="color:#4990e8">"group"</span>:<span style="color:#00aa00">"14A"</span>,<span style="color:#4990e8">"tp"</span>:false,<span style="color:#4990e8">"prog_type"</span>:<span style="color:#00aa00">"Serious classical"</span>,<span style="color:#4990e8">"other_
network"</span>:{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6202"</span>,<span style="color:#4990e8">"tp"</span>:false}}
{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6201"</span>,<span style="color:#4990e8">"group"</span>:<span style="color:#00aa00">"0A"</span>,<span style="color:#4990e8">"tp"</span>:false,<span style="color:#4990e8">"prog_type"</span>:<span style="color:#00aa00">"Serious classical"</span>,<span style="color:#4990e8">"ta"</span>:tru
e,<span style="color:#4990e8">"is_music"</span>:true,<span style="color:#4990e8">"partial_ps"</span>:<span style="color:#00aa00">"YLE "</span>}
{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6201"</span>,<span style="color:#4990e8">"group"</span>:<span style="color:#00aa00">"0A"</span>,<span style="color:#4990e8">"tp"</span>:false,<span style="color:#4990e8">"prog_type"</span>:<span style="color:#00aa00">"Serious classical"</span>,<span style="color:#4990e8">"ta"</span>:tru
e,<span style="color:#4990e8">"is_music"</span>:true,<span style="color:#4990e8">"partial_ps"</span>:<span style="color:#00aa00">"YLE YK "</span>}
{<span style="color:#4990e8">"pi"</span>:<span style="color:#00aa00">"0x6201"</span>,<span style="color:#4990e8">"group"</span>:<span style="color:#00aa00">"1A"</span>,<span style="color:#4990e8">"tp"</span>:false,<span style="color:#4990e8">"prog_type"</span>:<span style="color:#00aa00">"Serious classical"</span>,<span style="color:#4990e8">"prog_it
em_started"</span>:{<span style="color:#4990e8">"day"</span>:9,<span style="color:#4990e8">"time"</span>:<span style="color:#00aa00">"23:10"</span>},<span style="color:#4990e8">"has_linkage"</span>:false}
^C
> ./rtl-rx.sh -f 87.9M | grep "\"radiotext\"" | jq ".radiotext"
<span style="color:#00aa00">"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."
"Yöklassinen."</span></pre>
<p>The output can be timestamped using the <span class="code">ts</span> utility from <span class="code">moreutils</span>.</p>
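<p>And because every line is a self-contained JSON object, the stream is easy to consume from a script, too. Here's a minimal Python sketch that prints completed radiotext strings as they arrive, assuming the same <span class="code">rtl-rx.sh</span> setup as above:</p>
<pre class="term">
import json
import subprocess

# Start the receiver script and read redsea's line-delimited JSON output.
proc = subprocess.Popen(["./rtl-rx.sh", "-f", "87.9M"],
                        stdout=subprocess.PIPE, text=True)

for line in proc.stdout:
    try:
        group = json.loads(line)
    except json.JSONDecodeError:
        continue                      # skip any non-JSON lines
    if "radiotext" in group:
        print(group["pi"], group["radiotext"])
</pre>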
<p>Additionally, redsea can output hex-encoded groups, in the same format mentioned above.</p>
<h3>Fast and lightweight</h3>
<p>I've made an effort to make redsea fast and lightweight, so that it could be run real-time on cheap single-board computers like the Raspberry Pi 1. I rewrote it in C++ and chose <a href="http://liquidsdr.org/" class="external" title="liquidsdr.org">liquid-dsp</a> as the DSP library, which seems to work very well for the purpose.</p>
<p>Redsea now uses around 40% CPU on the Pi 1, which leaves enough cycles for the FM receiver, <span class="code">rtl_fm</span>, with its similar CPU demand. On my laptop, redsea's CPU usage is negligible (0.9% of a single core). Redsea runs in a single thread and takes up 1500 kilobytes of memory.</p>
<h3>Sensitivity</h3>
<p>I've gotten several reports that redsea requires a stronger signal than other RDS decoders do. This has improved in recent versions, but I think it still has problems even with many local stations.</p>
<p>Let's examine how a couple of test signals go through the demodulator in <span class="code"><a href="https://github.com/windytan/redsea/blob/master/src/subcarrier.cc#L122" title="redsea/subcarrier.cc" class="external">Subcarrier::​demodulateMoreBits()</a></span> and list possible problems. The test signals shall be called the good one (blue) and the noisy one (magenta). They were recorded on different channels using different antenna setups. Here are their average demodulated power spectra:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJb8tfNcyfwlqRoibN9ffpl4QTwWtHbxI_XwUQURfVL9Sabkfj7MVufWlpiHT5q1bSgi13xowZ_YVcFHu82JkG3JcnUALlyf0qe_tMvqG7rPaAfOAp8kp8haCQfhysja4ITKwXE6j_bQn/s1600/viker-puhe.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJb8tfNcyfwlqRoibN9ffpl4QTwWtHbxI_XwUQURfVL9Sabkfj7MVufWlpiHT5q1bSgi13xowZ_YVcFHu82JkG3JcnUALlyf0qe_tMvqG7rPaAfOAp8kp8haCQfhysja4ITKwXE6j_bQn/s520/viker-puhe.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxJb8tfNcyfwlqRoibN9ffpl4QTwWtHbxI_XwUQURfVL9Sabkfj7MVufWlpiHT5q1bSgi13xowZ_YVcFHu82JkG3JcnUALlyf0qe_tMvqG7rPaAfOAp8kp8haCQfhysja4ITKwXE6j_bQn/s1040/viker-puhe.png 2x" alt="[Image: Spectrum plots of the two signals superimposed.]"/></a></div>
<p>The noise floor around the RDS subcarrier is roughly 23 dB higher in the noisy signal. Redsea recovers 99.9 % of transmitted blocks from the good signal and 60.1 % from the noisy one.</p>
<p>Below, redsea locks onto our good-quality signal. Time is in seconds.</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWhUWCDPhsTT7vtVZJibWzn4K_2SoG9EOY75kRFeLXZgK7PN8ML5O33DeD43SGMKDcErB_dvzwmC7DhzV18pHLlrrEcLsRdjfQw8o964RBjAUVHDCLIiHhExYFocZDKAgMbMXfDPUWaa1f/s1600/NZjst7F.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWhUWCDPhsTT7vtVZJibWzn4K_2SoG9EOY75kRFeLXZgK7PN8ML5O33DeD43SGMKDcErB_dvzwmC7DhzV18pHLlrrEcLsRdjfQw8o964RBjAUVHDCLIiHhExYFocZDKAgMbMXfDPUWaa1f/s520/NZjst7F.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWhUWCDPhsTT7vtVZJibWzn4K_2SoG9EOY75kRFeLXZgK7PN8ML5O33DeD43SGMKDcErB_dvzwmC7DhzV18pHLlrrEcLsRdjfQw8o964RBjAUVHDCLIiHhExYFocZDKAgMbMXfDPUWaa1f/s1040/NZjst7F.png 2x" alt="[Image: A graph of several signal properties against time.]"/></a></div>
<p>Out of the noisy signal, redsea could recover a majority of blocks as well, even though the PLL and constellations are all over the place:</p>
<div class="saumaton kuva keskella fills-mobile"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibHzljSbMrQw5VMB4ErTyv3bxmE5tFl4MKbu6Y7njkCOsF8cnY07FpLUynNBXQ5owOVwwvxaGEXI1xvcp7kciC5T9DXDfJUri1rGDixvq8FcnKYBgleIstRJ4ooUSkTywe8pXk_H9oQdkp/s1600/wXmOjiN.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibHzljSbMrQw5VMB4ErTyv3bxmE5tFl4MKbu6Y7njkCOsF8cnY07FpLUynNBXQ5owOVwwvxaGEXI1xvcp7kciC5T9DXDfJUri1rGDixvq8FcnKYBgleIstRJ4ooUSkTywe8pXk_H9oQdkp/s520/wXmOjiN.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibHzljSbMrQw5VMB4ErTyv3bxmE5tFl4MKbu6Y7njkCOsF8cnY07FpLUynNBXQ5owOVwwvxaGEXI1xvcp7kciC5T9DXDfJUri1rGDixvq8FcnKYBgleIstRJ4ooUSkTywe8pXk_H9oQdkp/s1040/wXmOjiN.png 2x" alt="[Image: A graph of several signal properties against time.]"/></a></div>
<h4>1) PLL</h4>
<p>There's some jitter in the 57 kHz PLL, especially pronounced when the signal is noisy. One would expect a PLL to slowly converge on a frequency, but instead it just fluctuates around it. The PLL is from the liquid-dsp library (internal PLL of the NCO object).</p>
<ul>
<li>Is this an issue?
<li>What could affect this? Loop filter bandwidth?
<li>What about the gain, i.e. the multiplier applied to the phase error?
</ul>
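<p>To get a feel for the gain question, here's a toy second-order PLL in Python – emphatically not liquid-dsp's implementation – where a larger <span class="code">gain</span> makes the loop react faster but also follow the noise more closely, which shows up as exactly this kind of jitter:</p>
<pre class="term">
import numpy as np

def toy_pll(x, fs, f0, gain=0.05):
    # Toy phase-locked loop: the NCO phase is nudged by the phase error
    # times 'gain', and a slower integrating term lets the frequency
    # estimate drift towards the input tone's frequency.
    freq = 2.0 * np.pi * f0 / fs        # NCO increment in radians/sample
    phase = 0.0
    f_est = np.empty(len(x))
    for i, sample in enumerate(x):
        err = sample * -np.sin(phase)   # phase detector (input assumed ~cosine)
        freq += (gain * gain / 4.0) * err
        phase += freq + gain * err
        f_est[i] = freq * fs / (2.0 * np.pi)
    return f_est
</pre>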
<h4>2) Symbol synchronizer</h4>
<ul>
<li>Is liquid's symbol synchronizer being used correctly?
<li>What should be the correct values for bandwidth, delay, excess bandwidth factor?
<li>Do we really need a separate PLL and symbol synchronizer? Couldn't they be combined somehow? After all, the PLL already gives us a multiple of the symbol speed (57,000 / 48 = 1187.5).
</ul>
<h4>3) Pilot tone</h4>
<p>The PLL could potentially be made to lock onto the pilot tone instead. It would yield a much higher SNR.</p>
<ul>
<li>According to the specs, the RDS subcarrier is phase-locked to the pilot, but can we trust this? Also, the phase difference is not defined in the standard.
<li>What about mono stations with no pilot tone?
<li>Perhaps a command-line option?
</ul>
<p>See <a href="https://github.com/windytan/redsea/wiki/RDS-clock-recovery-from-pilot-tone%3F" class="external">the redsea wiki</a> for discussion.</p>
<h4>4) rtl_fm</h4>
<ul>
<li>Are the parameters for rtl_fm (gain, filter) optimal?
<li>Is there a poor-quality resampling phase somewhere, such as the one mentioned in the <a href="http://kmkeen.com/rtl-demod-guide/" class="external" title="Rtl_fm guide: Updates for rtl_fm overhaul">rtl_fm guide</a>? Probably not, since we don't specify <span class="code">-r</span>
<li>Is the bandwidth (171 kHz) right?
</ul>
<h3>Other features (perhaps you can help!)</h3>
<p>Besides the basic RDS features (program service name, radiotext, etc.) redsea can decode some Open Data applications as well. It receives traffic messages from the TMC service and prints them in English. These are partially encrypted in some areas. It can also decode RadioText+, a service used in some parts of Germany to transmit such information as artist/title tags, studio hotline numbers and web links.</p>
<p>If there's an interesting service in your area you'd like redsea to support, please tell me! I've heard that eRT (Enhanced RadioText) is in use somewhere in the world, and that RASANT is used to send DGPS corrections in Germany, but I haven't seen any good data on either.</p>
<p>A minute or two of example data would be helpful; you can get hex output by adding the <span class="code">-x</span> switch to the redsea command in <span class="code">rtl-rx.sh</span>.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com18tag:blogger.com,1999:blog-5096278891763426276.post-24152579104603716152015-10-07T01:54:00.000+03:002018-12-31T12:14:30.612+02:00Pea whistle steganography<div class="kuva oikealla"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKdJ8jqWTJO_fS1ilnlhIRaRV4AYC-6mj0nVcE2udYbIZQrcnKZWdlLrOPrUVRiXXzQ-dYc3JDSQLxcHVGgxAqBdsEAng04JMQ_B-dFcIyQ9VOcjDCPd-SwVhKJT5gV1YXxr0N_tGMLCfL/s1600/IMG_4864-1.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKdJ8jqWTJO_fS1ilnlhIRaRV4AYC-6mj0nVcE2udYbIZQrcnKZWdlLrOPrUVRiXXzQ-dYc3JDSQLxcHVGgxAqBdsEAng04JMQ_B-dFcIyQ9VOcjDCPd-SwVhKJT5gV1YXxr0N_tGMLCfL/s240/IMG_4864-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKdJ8jqWTJO_fS1ilnlhIRaRV4AYC-6mj0nVcE2udYbIZQrcnKZWdlLrOPrUVRiXXzQ-dYc3JDSQLxcHVGgxAqBdsEAng04JMQ_B-dFcIyQ9VOcjDCPd-SwVhKJT5gV1YXxr0N_tGMLCfL/s480/IMG_4864-1.jpg 2x" alt="[Image: Acme Thunderer 60.5 whistle]" style="width:240px"/></a></div>
<p>Would anyone notice if a referee's whistle transmitted a secret data burst?</p>
<p>I really do follow the game. But every time the pea whistle sounds to start the jam, I can't help but think of the possibility of embedding data in the frequency fluctuation. I'm sure it's alternating between two distinct frequencies. Is it really that binary? How random is the fluctuation? Could it be synthesized to contain data, and could that data be read back?</p>
<p>I found a staggeringly detailed <a href="https://en.wikipedia.org/wiki/Whistle" class="external" title="Wikipedia: Whistle">Wikipedia article</a> about the physics of whistles – but not a single word there about the effects of adding a pea inside, which is obviously the cause of the frequency modulation.</p>
<p>To investigate this I bought a metallic pea whistle, the Acme Thunderer 60.5, pictured here. Recording its sound wasn't straightforward as the laptop microphone couldn't record the sound without clipping. The sound is incredibly loud indeed – I borrowed a sound pressure meter and it showed a peak level of 106.3 dB(A) at a distance of 70 cm, which translates to 103 dB at the standard 1 m distance. (For some reason I suddenly didn't want to make another measurement to get the distance right.)</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtsQ0g2nfQYkU_wODsX-TPDkTVARQ1cFJKzshhBrvGrdswsMmZBMlVYPA2_Sp_jsDselFLiXlqe7h5gOgBu16XAAvAa-k86D12YbPwF0hz3R92_86PyukNigOfPH9rj35yF86VqerS_Fj3/s1600/IMG_4865-1.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtsQ0g2nfQYkU_wODsX-TPDkTVARQ1cFJKzshhBrvGrdswsMmZBMlVYPA2_Sp_jsDselFLiXlqe7h5gOgBu16XAAvAa-k86D12YbPwF0hz3R92_86PyukNigOfPH9rj35yF86VqerS_Fj3/s320/IMG_4865-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtsQ0g2nfQYkU_wODsX-TPDkTVARQ1cFJKzshhBrvGrdswsMmZBMlVYPA2_Sp_jsDselFLiXlqe7h5gOgBu16XAAvAa-k86D12YbPwF0hz3R92_86PyukNigOfPH9rj35yF86VqerS_Fj3/s640/IMG_4865-1.jpg 2x" alt="[Image: Display of a sound pressure meter showing 106.3 dB max.]"/></a></div>
<p>Later I found a microphone that was happy about the decibels and got this spectrogram of a 500-millisecond whistle.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijskzqOQXlFFqsYmhyphenhyphenXNS45EpV7mbP0Vz0-9CSh8qWwKDseRnUfLx7zlB7tYR6lAU9LCAIcWLu_h2i6mCYU0ahBJk6_P8HWZbfv4GTWzek7wZXtaR0fC5LTcRKOlO6Y7I8pa6tRf7GN2Ln/s1600/acme-spektri.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijskzqOQXlFFqsYmhyphenhyphenXNS45EpV7mbP0Vz0-9CSh8qWwKDseRnUfLx7zlB7tYR6lAU9LCAIcWLu_h2i6mCYU0ahBJk6_P8HWZbfv4GTWzek7wZXtaR0fC5LTcRKOlO6Y7I8pa6tRf7GN2Ln/s450/acme-spektri.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijskzqOQXlFFqsYmhyphenhyphenXNS45EpV7mbP0Vz0-9CSh8qWwKDseRnUfLx7zlB7tYR6lAU9LCAIcWLu_h2i6mCYU0ahBJk6_P8HWZbfv4GTWzek7wZXtaR0fC5LTcRKOlO6Y7I8pa6tRf7GN2Ln/s900/acme-spektri.jpg 2x" alt="[Image: Spectrogram showing a tone with frequency shifts.]"/></a></div>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/pilli-thunderer.mp3">(HTML5 audio: The sound of a whistle.)</audio></div>
<p>The whistle sound seems to consist of a sliding beginning phase, a long steady phase with frequency shifts, and a short sliding end phase. The "tail" after the end slide is just room reverb, and I'm not going to need it just yet. A slight amplitude modulation can be seen in the oscillogram. There's also noise on fairly narrow bands around the harmonics.</p>
<p>The FM content is most clearly visible in the second and third harmonics. And it seems like it could very well fit FSK data!</p>
<h3>Making it sound right</h3>
<p>I'm no expert on synthesizers, so I decided to write everything from scratch (<a href="https://gist.github.com/windytan/d3bca32a22998dab962e" class="external" title="GitHub: whistle encoder">whistle-encode.pl</a>). But I know the start phase of a sound, called the attack, is pretty important for identification. The rest of the fundamental tone is easy to write as a plain FSK modulator: at every sample point, a data-dependent increment is added to a phase accumulator, and the signal is the cosine of the accumulator. I used a low-pass IIR filter before frequency modulation to make the transitions smoother and more "natural".</p>
<p>Adding the harmonics is just a matter of measuring their relative powers from the spectrogram, multiplying the fundamental phase angle by the index of the harmonic, and then multiplying the cosine of that phase angle by the relative power of that harmonic. SoX takes care of the WAV headers.</p>
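<p>Roughly, the generation looks like this. Below is a Python sketch of the same idea – the original is a Perl script, and the frequencies, bit rate, smoothing constant and harmonic levels here are illustrative placeholders rather than measured values:</p>
<pre class="term">
import numpy as np

def whistle_fsk(bits, fs=44100, f0=2800.0, shift=150.0, baud=100,
                harmonics=(1.0, 0.4, 0.15), smooth=0.995):
    # Target frequency per sample: f0 for a 0-bit, f0+shift for a 1-bit,
    # run through a one-pole IIR low-pass so the transitions sound "natural".
    per_bit = fs // baud
    target = np.repeat([f0 + shift * b for b in bits], per_bit)
    freq = np.empty(len(target))
    f = target[0]
    for i, t in enumerate(target):
        f = smooth * f + (1.0 - smooth) * t
        freq[i] = f

    # Phase accumulator: each sample advances by a data-dependent increment.
    phase = np.cumsum(2.0 * np.pi * freq / fs)

    # Harmonics: multiply the fundamental phase by the harmonic index and
    # weight each cosine by its relative power.
    out = sum(p * np.cos((k + 1) * phase) for k, p in enumerate(harmonics))
    return out / np.max(np.abs(out))
</pre>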
<p>Getting the noise to sound right was trickier. I ended up generating white noise (a simple <span class="code">rand()</span>), lowpass filtering it, and then mixing a copy of it around every harmonic frequency. I gave the noise harmonics a different set of relative powers than for the cosine harmonics. It still sounds a bit too much like digital quantization noise.</p>
<h3>Embedding data</h3>
<p>There's a limit to the number of bits that can be sent before the result starts to sound unnatural; nobody has lungs that big. A data rate of 100 bps still sounded similar to the Acme Thunderer, which is quite a lot nevertheless. I preceded the burst with two bytes for bit and byte sync (<span class="code">0xAA 0xA7</span>), and one byte for the packet size.</p>
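<p>Framing the burst is then just a matter of flattening those bytes into bits before they go to the modulator. A minimal sketch – the MSB-first bit order is my assumption here, not something the actual encoder necessarily uses:</p>
<pre class="term">
def frame(payload: bytes):
    # 0xAA for bit sync, 0xA7 for byte sync, then packet size and payload.
    burst = bytes([0xAA, 0xA7, len(payload)]) + payload
    return [(byte >> i) & 1 for byte in burst for i in range(7, -1, -1)]

bits = frame(b"OHAI!")   # feed these to the whistle_fsk() sketch above
</pre>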
<p>Here's "OHAI!":</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/whistledata.mp3">(HTML5 audio: A data burst that sounds like a whistle.)</audio></div>
<p>Sounds legit to me! Here's a slightly longer one, encoding "Help me, I'm stuck inside a pea whistle":</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/whistle-helpme.mp3">(HTML5 audio: A data burst that sounds like a whistle.)</audio></div>
<h3>Homework</h3>
<ol>
<li>Write a receiver for the data. It should be as simple as receiving FSK. The frequency can be determined using <span class="code">atan2</span>, a zero-crossing detector, or FFT, for instance. The synchronization bytes are meant to help decode such a short burst; the alternating 0s and 1s of <span class="code">0xAA</span> probably give us enough transitions to get a bit lock, and the <span class="code">0xA7</span> serves as a recognizable pattern to lock the byte boundaries on.</li>
<li>Build a physical whistle that does this! (Edit: <a href="https://lukelectro.wordpress.com/2015/12/28/continued-data-transmission-whistle-experiment-more-successful-this-time/" class="external">example solution</a>!)</li>
</ol>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com18tag:blogger.com,1999:blog-5096278891763426276.post-24658680364126771832015-10-02T14:53:00.000+03:002018-02-21T21:01:16.230+02:00The microphone bioamplifier<p>As the in-ear microphone <a href="http://www.windytan.com/2015/07/case-study-tinnitus-with-distortion.html" title="absorptions: Case study: tinnitus with distortion">in the previous post</a> couldn't detect a signal that would suggest objective tinnitus, the next step would be to examine EMG signals from facial muscles. This is usually done using a special-purpose device called a bioamplifier, special-purpose electrodes, and contact gel, none of which I have at hand. A perfect opportunity for home-baking, that is!</p>
<p>There's an Instructable called <a href="http://www.instructables.com/id/How-to-make-ECG-pads-conductive-gel/?ALLSTEPS" class="external" title="How to make ECG pads & conductive gel">How to make ECG pads & conductive gel</a>. Great! Aloe vera gel and table salt for the conductive gel are no problem, and neither are the snap buttons for the electrodes. I don't have bottle caps, though, so instead I cut circular pieces out of some random plastic packaging.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-IFvsNBn7XwU4rXB823E2AWw8EPwR64TasHFdYmCYeSdOzd7eJCk1uc41pCZVZblYPgzBtv8tyWAQfdO0JlMekMzeL_Rscoonfv_LjJOW-GwXFbwTkdnbx3EsNOuiwEIYvfvTi7lAWCmN/s1600/IMG_4855-1.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-IFvsNBn7XwU4rXB823E2AWw8EPwR64TasHFdYmCYeSdOzd7eJCk1uc41pCZVZblYPgzBtv8tyWAQfdO0JlMekMzeL_Rscoonfv_LjJOW-GwXFbwTkdnbx3EsNOuiwEIYvfvTi7lAWCmN/s400/IMG_4855-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-IFvsNBn7XwU4rXB823E2AWw8EPwR64TasHFdYmCYeSdOzd7eJCk1uc41pCZVZblYPgzBtv8tyWAQfdO0JlMekMzeL_Rscoonfv_LjJOW-GwXFbwTkdnbx3EsNOuiwEIYvfvTi7lAWCmN/s800/IMG_4855-1.jpg 2x" alt="[Image: An electrode made out of transparent plastic.]"/></a></div>
<p>As for the bioamplifier, why can't we just use the microphone preamplifier that was used for amplifying audio in the previous post? Both are weak low-frequency signals. There's no apparent reason why it couldn't amplify EMG, as long as a digital filter is used to suppress the mains hum.</p>
<h3>It's a signal, but it's noise</h3>
<p>First, a little disclaimer. It's unwise to just plug yourself into a random electric circuit, even if Oona survived. Mic preamps, for example, are not mere passive listeners; instead they will in some cases try to apply <a href="https://en.wikipedia.org/wiki/Phantom_power" title="Wikipedia: Phantom power" class="external">phantom power</a> to the load. This can be up to 48 volts DC at 10 mA. There's anecdotal evidence of people getting palpitations from experiments like this. Or maybe not. But you wouldn't want to take the risk.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8ZwKtWTgsN0jdLYOFdX0FEP6GkEytmI7VHseOPMYGuKlR5KvaZU9iy56ENKmiGCMB1nEEUfVj6vw369ad_ffxlIqTQwvKdl1BiRnPdV0OAoEcDE3PKItVWcPPhQL1XRYO5v-wt_6G2HHo/s1600/IMG_2923-1.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8ZwKtWTgsN0jdLYOFdX0FEP6GkEytmI7VHseOPMYGuKlR5KvaZU9iy56ENKmiGCMB1nEEUfVj6vw369ad_ffxlIqTQwvKdl1BiRnPdV0OAoEcDE3PKItVWcPPhQL1XRYO5v-wt_6G2HHo/s260/IMG_2923-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8ZwKtWTgsN0jdLYOFdX0FEP6GkEytmI7VHseOPMYGuKlR5KvaZU9iy56ENKmiGCMB1nEEUfVj6vw369ad_ffxlIqTQwvKdl1BiRnPdV0OAoEcDE3PKItVWcPPhQL1XRYO5v-wt_6G2HHo/s520/IMG_2923-1.jpg 2x" data-original-width="831" data-original-height="1108" alt="[Image: Photo of my cheek with an electrode attached to it.]"/></a></div>
<p>So I attached myself to some leads, soldered to a stereo miniplug, using the home-made pads that I taped on opposite sides of my face. I plugged the whole assembly into the USB sound card's mic preamp and recorded the signal at a pretty low sampling rate.</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid_rsFra0OFNOqpe0vXgQkUkYylgUyeX_7UWwtFVB01TZk-_p3M63Hltk51RO8z3uk_caghPvRLyHwWNTNaKfWXrHfzWSDpxPOOVjWnT5lni_pi5hIAk4X51D3a1_PSuf4stmqNN1tFeSd/s1600/lihas2.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid_rsFra0OFNOqpe0vXgQkUkYylgUyeX_7UWwtFVB01TZk-_p3M63Hltk51RO8z3uk_caghPvRLyHwWNTNaKfWXrHfzWSDpxPOOVjWnT5lni_pi5hIAk4X51D3a1_PSuf4stmqNN1tFeSd/s500/lihas2.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEid_rsFra0OFNOqpe0vXgQkUkYylgUyeX_7UWwtFVB01TZk-_p3M63Hltk51RO8z3uk_caghPvRLyHwWNTNaKfWXrHfzWSDpxPOOVjWnT5lni_pi5hIAk4X51D3a1_PSuf4stmqNN1tFeSd/s1000/lihas2.jpg 2x" data-original-width="1554" data-original-height="700" alt="[Image: Spectrogram.]"/></a></div>
<p>The signal, shown here from 0 to 700 Hz, is dominated by a mains hum (colored red-brown), as I suspected. There is indeed a strong signal present during contraction of jaw muscles (large green area). Moving the jaw left and right produces a very low-frequency signal instead (bright green splatter at the bottom).</p>
<p>It's fun to watch but still a bit of a disappointment; I was really hoping for a clear narrow-band signal near the 65 Hz frequency of interest.</p>
<h3>Einthoven's triangle</h3>
<p>At this point I was almost ready to ditch the EMG thing as uninteresting, but decided to move the electrodes around and see what kind of signals I could get. When one of them was moved far enough, a pulsating low-frequency signal would appear:</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4SM3lSCjDjXS7zIXOJFWXFRa2R-oxGCLU6RazEyh0GEnOOXooIvU2EIR_2xaRJo0k5BcSD16PdHStJR5JsyXPKsb3sE9TgWzZc2PELQkvvnIJn9sl3jprdfvx7q5H23V7wR9VzZ1sTEp0/s1600/pulsating.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4SM3lSCjDjXS7zIXOJFWXFRa2R-oxGCLU6RazEyh0GEnOOXooIvU2EIR_2xaRJo0k5BcSD16PdHStJR5JsyXPKsb3sE9TgWzZc2PELQkvvnIJn9sl3jprdfvx7q5H23V7wR9VzZ1sTEp0/s500/pulsating.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4SM3lSCjDjXS7zIXOJFWXFRa2R-oxGCLU6RazEyh0GEnOOXooIvU2EIR_2xaRJo0k5BcSD16PdHStJR5JsyXPKsb3sE9TgWzZc2PELQkvvnIJn9sl3jprdfvx7q5H23V7wR9VzZ1sTEp0/s1000/pulsating.jpg 2x" alt="[Image: Spectrogram with a regularly pulsating signal.]"/></a></div>
<p>Could this be what I think it is? To be sure about it I changed the positions of the electrodes to match Lead II in <a href="https://en.wikipedia.org/wiki/Einthoven's_triangle" title="Wikipedia: Einthoven's triangle" class="external">Einthoven's triangle</a>, as used in electrocardiography. The signal from Lead II represents potential difference between my left leg and right arm, caused by the heart.</p>
<p>After I plugged the leads in, the amp already did this:</p>
<div class="kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVKNcorTRIpmXogOkQi4tukKw1GYDhQjfKDe0JxG1mDEsAczybHqSP9ddPGO29y2gQjdav6pXDtJ1kJLmBZuxvJSot1tK1hUOjWTeRp6zf9_goJ6usTBO5sFRtBk_Nc8571yspwN7b9K0J/s200/pulssi.gif" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVKNcorTRIpmXogOkQi4tukKw1GYDhQjfKDe0JxG1mDEsAczybHqSP9ddPGO29y2gQjdav6pXDtJ1kJLmBZuxvJSot1tK1hUOjWTeRp6zf9_goJ6usTBO5sFRtBk_Nc8571yspwN7b9K0J/s400/pulssi.gif 2x" alt="[Image: Animation of the signal indicator LEDs of an amplifier blinking in a rhythmic manner.]"/></div>
<p>Looks promising! The mains hum was really irritating at this point, but I could get rid of it completely by rejecting all frequencies above 45 Hz, since the signal of interest was below that.</p>
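<p>Offline, that kind of rejection is easy with scipy; a small sketch of what I mean, with the filter order and exact cutoff as arbitrary choices:</p>
<pre class="term">
import numpy as np
from scipy.signal import butter, filtfilt

def remove_hum(x, fs, cutoff=45.0, order=4):
    # Zero-phase low-pass: everything above the cutoff, including the 50 Hz
    # mains hum and its harmonics, is strongly attenuated, while the
    # sub-45 Hz content of interest passes through.
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, x)
</pre>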
<p>The result is a beautiful view of the iconic QRS complex, caused by ventricular depolarization in the heart:</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlBDXiUU66rtGlYmP4tXO9T_AcIqoA_BiqR9SVyozwbMNLvNoPL5BzkaiJRBHUZErx-JKSvv4kyC5qupKQM0Aaa39IG5xW2Q3bV7bqr1GKaLxMfxxw0RGh9bJQ7_kRrw84W6xv3a3e7WnX/s1600/screenshot-ekg.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlBDXiUU66rtGlYmP4tXO9T_AcIqoA_BiqR9SVyozwbMNLvNoPL5BzkaiJRBHUZErx-JKSvv4kyC5qupKQM0Aaa39IG5xW2Q3bV7bqr1GKaLxMfxxw0RGh9bJQ7_kRrw84W6xv3a3e7WnX/s500/screenshot-ekg.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlBDXiUU66rtGlYmP4tXO9T_AcIqoA_BiqR9SVyozwbMNLvNoPL5BzkaiJRBHUZErx-JKSvv4kyC5qupKQM0Aaa39IG5xW2Q3bV7bqr1GKaLxMfxxw0RGh9bJQ7_kRrw84W6xv3a3e7WnX/s1000/screenshot-ekg.png 2x" alt="[Image: Oscillogram with strong triple-pointed spikes at regular intervals.]"/></a></div>
<p>Quite a side product!</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com6tag:blogger.com,1999:blog-5096278891763426276.post-25664196983505977582015-07-11T10:39:00.000+03:002018-12-31T12:23:40.370+02:00Case study: tinnitus with distortion<div class="kuva oikealla" style="width:200px"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUf6rqi-WaY6GsAMciuY1RwdK1XoL1M0KzBcPXzR3XzX3mN26qru8QhMUcDK4s4jGBYfnNUEP3ITgSuAwXzZvPPevQDob5MnY8FW8levIw5Kq5xwSIsApVQBouyKAv_8a3gKu-syNzZIvL/s1600/pta.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUf6rqi-WaY6GsAMciuY1RwdK1XoL1M0KzBcPXzR3XzX3mN26qru8QhMUcDK4s4jGBYfnNUEP3ITgSuAwXzZvPPevQDob5MnY8FW8levIw5Kq5xwSIsApVQBouyKAv_8a3gKu-syNzZIvL/s200/pta.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUf6rqi-WaY6GsAMciuY1RwdK1XoL1M0KzBcPXzR3XzX3mN26qru8QhMUcDK4s4jGBYfnNUEP3ITgSuAwXzZvPPevQDob5MnY8FW8levIw5Kq5xwSIsApVQBouyKAv_8a3gKu-syNzZIvL/s400/pta.png 2x" alt="[Image: A pure tone audiogram of both ears indicating no hearing loss.]"/></a></div>
<p>A periodically appearing low-frequency tinnitus is one of my least favorite signals. A doctor's visit only resulted in a <span class="caps">WONTFIX</span> and the audiogram shown here, which didn't really answer any questions. Also, the sound comes with some peculiarities that warrant a deeper analysis. So it shall become one of my <i>absorptions</i>.</p>
<p>The possible subtype<a href="#Vielsmeier2012" class="ref" title="Temporomandibular Joint Disorder Complaints in Tinnitus: Further Hints for a Putative Tinnitus Subtype"> (Vielsmeier et al. 2012)</a> of tinnitus I have, related to a joint problem, is apparently even more poorly understood than the classical case<a href="#Vielsmeier2011" class="ref" title="Tinnitus with temporomandibular joint disorders: a specific entity of tinnitus patients?"> (Vielsmeier et al. 2011)</a>, which of course means I'm free to make wild speculations! And maybe throw a supporting citation here and there.</p>
<p>Here's a simulation of what it sounds like. The occasional frequency shifts are caused by head movements. (There's only low-frequency content, so headphones will be needed; otherwise it will sound like silence.)</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/tinnitus.mp3">(HTML5 audio: computer-generated low-frequency tone on the right channel with some frequency shifts.)</audio></div>
<p>It's nothing new, save for the somewhat uncommon frequency. Now to the weird stuff.</p>
<h3>Real-life audio artifacts!</h3>
<p>This analysis was originally sparked by a seemingly unrelated observation. I listen to podcasts and documentaries a lot, and sometimes I've noticed the voice sounding like it had shifted up in frequency, by just a small amount. It would resemble an across-the-spectrum linear shift that breaks the harmonic relationships, much like when listening to an <abbr title="Single sideband">SSB</abbr> transmission. (Simulated sound sample from a podcast below.)</p>
<div class="audiodiv"><audio controls=""><source src="https://oona.windytan.com/blogfiles/tinnitus-am.mp3">[HTML5 audio: excerpt from a science news podcast with distorted speech.]</audio></div>
<p>I always assumed this was a compression artifact of some kind. Or maybe broken headphones. But one day I also noticed it in real life, when a friend was talking to me! I had to ask her to repeat herself, even though I had heard her well. Surely not a compression artifact. Of course I immediately associated it with the tinnitus, which had been quite strong that day. But how could a pure tone alter the whole spectrum so drastically?</p>
<h3>Amplitude modulation?</h3>
<p>It's known that a signal gets frequency-shifted when amplitude-modulated, i.e. multiplied in the time domain, by a steady sine wave signal. This is a useful effect in the realm of radio, where it's known as heterodyning. My tinnitus happens to be a near-sinusoidal tone at 65 Hz; if this got somehow multiplied with part of the actual sound somewhere in the auditory pathway, it could explain the distortion.</p>
<div class="saumaton kuva keskella invertable"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjD0WvVnLRiPh7HrR9y3cXDceGHJ-RiiljtpX2FE-G3SkBiFLS9rfgr5w78a9cf6MuFdW_NN9JkPT4B2vEII16_yiDRnvPummJXI5kl2oFUV_8L8VAeX4GMRgkv7bFb93xxdndcLc6w5CjO/s1600/multiply.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjD0WvVnLRiPh7HrR9y3cXDceGHJ-RiiljtpX2FE-G3SkBiFLS9rfgr5w78a9cf6MuFdW_NN9JkPT4B2vEII16_yiDRnvPummJXI5kl2oFUV_8L8VAeX4GMRgkv7bFb93xxdndcLc6w5CjO/s500/multiply.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjD0WvVnLRiPh7HrR9y3cXDceGHJ-RiiljtpX2FE-G3SkBiFLS9rfgr5w78a9cf6MuFdW_NN9JkPT4B2vEII16_yiDRnvPummJXI5kl2oFUV_8L8VAeX4GMRgkv7bFb93xxdndcLc6w5CjO/s1000/multiply.png 2x" alt="[Image: Oscillograms of a wideband signal and a sinusoid tone, and a multiplication of the two.]"/></a></div>
<p>Where could such a multiplication take place physically? I'm guessing it should be someplace where the signal is still represented as a single waveform. The basilar membrane in the cochlea already mechanically filters the incoming sound into frequency bands one sixth of an octave wide for neural transmission<a href="#AuditoryNeuroscience" class="ref" title="Auditory Neuroscience – Making sense of sound"> (Schnupp et al. 2012)</a>. Modulating one of these narrow bands would likely not affect so many harmonics at the same time, so it should either happen before the filtering or at a later phase, where the signal is still being handled in a time-domain manner.</p>
<p>I've had several possibilities in mind:</p>
<ol value="a">
<li>The low frequency tone could have its origins in actual physical vibration around the inner ear that would cause displacement of the basilar membrane. This is supported by a subjective physical sensation of pressure in the ear accompanying the sound. How it could cause amplitude modulation is discussed later on.</li>
<li>A somatosensory neural signal can cause inhibitory modulation of the auditory nerves in the dorsal cochlear nucleus<a href="#Young1995" class="ref" title="Somatosensory effects on neurons in dorsal cochlear nucleus"> (Young et al. 1995)</a>. If this could happen fast enough, it could lead to amplitude modulation of the sound by modulating the amount of impulses transmitted; assuming the auditory nerves still carry direct information about the waveform at this point (they sort of do). Some believe the dorsal cochlear nucleus is exactly where the perceived sound in this type of tinnitus also originates<a href="#Sanchez2011" class="ref" title="Diagnosis and management of somatosensory tinnitus: review article"> (Sanchez & Rocha 2011)</a>.</li>
</ol>
<h3>Guinea pigs know the feeling</h3>
<p>Already in the 1970s, it was demonstrated that human auditory thresholds are modulated by low frequency tones<a href="#Zwicker1977" class="ref" title="Masker period patterns produced by very-low-frequency maskers and their possible relation to basilar-membrane displacement"> (Zwicker 1977)</a>. In a 1984 paper the mechanism was investigated further in Guinea pigs<a href="#Patuzzi1984" class="ref" title="The modulation of the sensitivity of the mammalian cochlea by low frequency tones"> (Patuzzi et al. 1984)</a>. A low-frequency tone (anywhere from 33 up to 400 Hz) presented to the ear modulated the sensitivity of the cochlear hair cell voltage to higher frequency sounds. This modulation tracked the waveform of the low tone, such that the greatest amplitude suppression was during the peaks of the low tone amplitude, and there was no suppression at its zero crossings. In other words, a low tone was capable of amplitude-modulating the ear's response to higher tones.</p>
<p>This modulation was observed already in the <i>mechanical velocity</i> of the basilar membrane, even before conversion into neural voltages. Some kind of an electro-mechanical feedback process was thought to be involved.</p>
<h3>Hints towards a muscular origin</h3>
<p>So, probably a 65 Hz signal exists somewhere, whether physical vibration or neural impulses. Where does it come from? Tinnitus with vascular etiology is usually pulsatile in nature<a href="#Hofmann2013" class="ref" title="Pulsatile Tinnitus"> (Hofmann et al. 2013)</a>, so it can be ruled out. But what about muscle cramps? After all, I know there's a problem with the temporomandibular joint and nearby muscles might not be happy about that. We could get some hints by studying the frequencies related to a contracting muscle.</p>
<p>A 1974 study of <abbr title="Electroencephalogram">EEG</abbr> contamination caused by various muscles showed that the surface <abbr title="Electromyographic">EMG</abbr> signal from the masseter muscle during contraction has its peak between 50 and 70 Hz<a href="#ODonnell1974" class="ref" title="Contamination of scalp EEG spectrum during contraction of cranio-facial muscles"> (O'Donnell et al. 1974)</a>; just what we're looking for. (The masseter is located very close to the temporomandibular joint and the ear.) Later, there has been initial evidence that central neural motor commands to small muscles may be rhythmic in nature and that this rhythm is also reflected in EMG and the synchronous vibration of the contracting muscle<a href="#McAuley1997" class="ref" title="Frequency peaks of tremor, muscle vibration and electromyographic activity at 10 Hz, 20 Hz and 40 Hz during human finger muscle contraction may reflect rhythmicities of central neural firing"> (McAuley et al. 1997)</a>.</p>
<p>Sure enough, in my case, applying firm pressure to the deep masseter or the posterior digastric muscle temporarily silences the sound.</p>
<h3>Recording it</h3>
<p>Tinnitus associated with a physical sound detectable by an outside observer, a rare occurrence, is described as <i>objective</i><a href="#Hofmann2013" class="ref" title="Pulsatile Tinnitus"> (Hofmann et al. 2013)</a>. My next plan was to use a small in-ear microphone setup to try and find out if there was an objective sound present. This would shed light on the way the sound is transmitted from the muscles to the auditory system, as if it made any difference.</p>
<p>But before I could do that, I went to a loud open-air trance party (with DJ Tristan) that, for some reason, eradicated the tinnitus that had been going on for a week or two. I had to wait for a week before it reappeared. (And I noted that it reappeared as the result of a stressful situation, as people on Twitter and HN have also pointed out.)</p>
<div class="kuva oikealla" style="width:200px"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4B5y1VkH6XtRcQ-gsZZMbs18mzyztspQqSW5kPw5CuPgWXwbkwLlRMVvwp1uLwXqrtrsD3XMRFLn6UdWRdkf-WZSl5wO2olOFTBp60mCR7c3iMBuRDWzm_g9jntBJlvFsOzen4tSXvNfA/s1600/IMG_4749-1.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4B5y1VkH6XtRcQ-gsZZMbs18mzyztspQqSW5kPw5CuPgWXwbkwLlRMVvwp1uLwXqrtrsD3XMRFLn6UdWRdkf-WZSl5wO2olOFTBp60mCR7c3iMBuRDWzm_g9jntBJlvFsOzen4tSXvNfA/s200/IMG_4749-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4B5y1VkH6XtRcQ-gsZZMbs18mzyztspQqSW5kPw5CuPgWXwbkwLlRMVvwp1uLwXqrtrsD3XMRFLn6UdWRdkf-WZSl5wO2olOFTBp60mCR7c3iMBuRDWzm_g9jntBJlvFsOzen4tSXvNfA/s400/IMG_4749-1.jpg 2x" alt="[Image: Sennheiser earplugs connected to the microphone preamp input of a Xenyx 302 USB audio interface.]"/></a></div>
<p>Now I could do a measurement. I used my earplugs as a microphone by plugging them into a mic preamplifier using a plug adapter. It's a mono preamp, so I disconnected the left channel of the adapter using cellotape to just record from the right ear.</p>
<p>I set <a href="http://baudline.com/" class="external" title="baudline signal analyzer">baudline</a> for 2-minute spectral integration time and a 600 Hz decimated sample rate, and the preamp to its maximum gain. Even though the setup is quite sensitive and the earplug has very good isolation, I wasn't able to detect even the slightest peak at 65 Hz. So either recording outside the tympanic membrane was an absurd idea to begin with, or maybe the neural explanation is the more likely cause of the sound.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3wgd9SJLF2ipcP0qQbHXm5ySvnsQvEvirSipnvH0QkzlsC7AUi4K8EMqqhlqcqXIPz1VwOOiawMuhSlocElM30402-Y-0m7qzpUUoIC0pniwtE6Lm5QmBEYTw0Vdl3j67pxFfmIwrauDS/s1600/Screen+Shot+2015-07-11+at+17.42.31.png" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3wgd9SJLF2ipcP0qQbHXm5ySvnsQvEvirSipnvH0QkzlsC7AUi4K8EMqqhlqcqXIPz1VwOOiawMuhSlocElM30402-Y-0m7qzpUUoIC0pniwtE6Lm5QmBEYTw0Vdl3j67pxFfmIwrauDS/s400/Screen+Shot+2015-07-11+at+17.42.31.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3wgd9SJLF2ipcP0qQbHXm5ySvnsQvEvirSipnvH0QkzlsC7AUi4K8EMqqhlqcqXIPz1VwOOiawMuhSlocElM30402-Y-0m7qzpUUoIC0pniwtE6Lm5QmBEYTw0Vdl3j67pxFfmIwrauDS/s800/Screen+Shot+2015-07-11+at+17.42.31.png 2x" alt="[Image: Screenshot of baudline with the result of spectral integration from 0 to 150 Hz, with nothing to note but a slight downward slope towards the higher frequencies.]"/></a></div>
<p>My next post, <a href="https://www.windytan.com/2015/10/the-microphone-bioamplifier.html">The microphone bioamplifier</a>, starts by exploring this further.</p>
<h3>References</h3>
<ul class="references">
<li id="Hofmann2013">Hofmann, E., Behr, R., Neumann-Haefelin, T., Schwager, K. (2013): <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3719451/" title="Pulsatile Tinnitus" class="external">Pulsatile Tinnitus</a>. <span class="ref-title">Deutsches Ärtzeblatt</span> <span class="ref-volume">110</span>(26): 451–458.</li>
<li id="McAuley1997">McAuley, J., Rothwell, J., Marsden, C. (1997): <a href="http://www.ncbi.nlm.nih.gov/pubmed/9187289" class="external" title="Frequency peaks of tremor, muscle vibration and electromyographic activity at 10 Hz, 20 Hz and 40 Hz during human finger muscle contraction may reflect rhythmicities of central neural firing">Frequency peaks of tremor, muscle vibration and electromyographic activity at 10 Hz, 20 Hz and 40 Hz during human finger muscle contraction may reflect rhythmicities of central neural firing</a>. <span class="ref-title">Experimental Brain Research</span> <span class="ref-volume">114</span>(3): 525–41.</li>
<li id="ODonnell1974">O'Donnell, R., Berkhout, J., Adey, W.R. (1974): <a href="http://www.ncbi.nlm.nih.gov/pubmed/4135021" class="external" title="Contamination of scalp EEG spectrum during contraction of cranio-facial muscles">Contamination of scalp EEG spectrum during contraction of cranio-facial muscles</a>. <span class="ref-title">Electroencephalography and Clinical Neurophysiology</span> <span class="ref-volume">37</span>(2): 145–51.</li>
<li id="Patuzzi1984">Patuzzi, R., Sellick, P.M., Johnstone, B.M. (1984): <a href="http://www.sciencedirect.com/science/article/pii/0378595584900911" class="external" title="The modulation of the sensitivity of the mammalian cochlea by low frequency tones">The modulation of the sensitivity of the mammalian cochlea by low frequency tones</a>. <span class="ref-title">Hearing Research</span> <span class="ref-volume">13</span>(1): 19–27.</li>
<li id="Sanchez2011">Sanchez, T.G., Rocha, C.B. (2011): <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3129953/" class="external" title="Diagnosis and management of somatosensory tinnitus: review article">Diagnosis and management of somatosensory tinnitus: review article</a>. <span class="ref-title">Clinics</span> <span class="ref-volume">66</span>(6): 1089–1094.</li></li>
<li id="AuditoryNeuroscience">Schnupp, J., Nelken, I., King, A. (2012), <a href="http://cognet.mit.edu/book/auditory-neuroscience" class="external" title="Auditory Neuroscience – Making sense of sound">Auditory Neuroscience – Making sense of sound</a>. 368 pp., MIT Press.</li>
<li id="Vielsmeier2011">Vielsmeier, V. et al. (2011): <a href="http://www.ncbi.nlm.nih.gov/pubmed/21705788" class="external" title="Tinnitus with temporomandibular joint disorders: a specific entity of tinnitus patients?">Tinnitus with temporomandibular joint disorders: a specific entity of tinnitus patients?</a> <span class="ref-title">Otolaryngology – Head and Neck Surgery</span> <span class="ref-volume">145</span>(5): 748–52.</li>
<li id="Vielsmeier2012">Vielsmeier, V. et al. (2012): <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038887" class="external" title="Temporomandibular Joint Disorder Complaints in Tinnitus: Further Hints for a Putative Tinnitus Subtype">Temporomandibular Joint Disorder Complaints in Tinnitus: Further Hints for a Putative Tinnitus Subtype</a>. PLoS ONE <span class="ref-volume">7</span>(6): e38887.</li>
<li id="Young1995">Young, E.D., Nelken, I., et al. (1995): <a href="http://www.ncbi.nlm.nih.gov/pubmed/7760132" class="external" title="Somatosensory effects on neurons in dorsal cochlear nucleus">Somatosensory effects on neurons in dorsal cochlear nucleus</a>. <span class="ref-title">Journal of Neurophysiology</span> <span class="ref-volume">73</span>(2): 743–65.</li>
<li id="Zwicker1977">Zwicker, E. (1977): <a href="http://scitation.aip.org/content/asa/journal/jasa/61/4/10.1121/1.381387" class="external" title="Masker period patterns produced by very-low-frequency maskers and their possible relation to basilar-membrane displacement">Masker period patterns produced by very-low-frequency maskers and their possible relation to basilar-membrane displacement</a>. <span class="ref-title">Journal of the Acoustical Society of America</span> <span class="ref-volume">61</span>: 1031–1040.</li>
</ul>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com32tag:blogger.com,1999:blog-5096278891763426276.post-4634696572449762672015-04-14T20:17:00.002+03:002022-10-23T22:55:15.312+03:00Trackers leaking bank account data<p>A Finnish online bank used to include a US-based third-party analytics and tracking script in all of its pages. Ospi <a href="http://ospi.netcode.fi/blog/mita-tietoja-s-pankki-valittaa-kolmannelle-osapuolelle.html" class="external" title="Mitä tietoja S-Pankki välittää kolmannelle osapuolelle – oBlog">first wrote</a> about it (in Finnish) in February 2015, and this caused a bit of a fuss.</p>
<p>The bank <a href="https://twitter.com/S_Pankki/status/569878961209143296" class="external" title="S-Pankin arkea on Twitter">responded</a> to users' worries by claiming that all information is collected anonymously:</p>
<div class="saumaton kuva keskella invertable"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcdvnM-vdWEEJETMI5Io0tiq36nvh4Yyyd46NGsANXQsXMOy-lydxIPiip72T50DUbL4O-kxxotmawOZleC0u7g-kNp2EDZs5-HDpTtsmv8tQ0319rU7rMdVhsNdbYz4ZlfDG_I097Nb5k/s450/asiakkaidemme.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcdvnM-vdWEEJETMI5Io0tiq36nvh4Yyyd46NGsANXQsXMOy-lydxIPiip72T50DUbL4O-kxxotmawOZleC0u7g-kNp2EDZs5-HDpTtsmv8tQ0319rU7rMdVhsNdbYz4ZlfDG_I097Nb5k/s900/asiakkaidemme.png 2x" alt="[Image: A tweet by the bank, in Finnish. Translation: Our customers' personal data will not be made available to Google under any circumstances. Thanks to everyone who participated in the discussion! (2/2)]"/></div>
<p>But is it true?</p>
<p>As Ospi notes, a plethora of information is sent along with the HTTP request for the tracker script. This includes, of course, the user's IP address, but also the full URL the user is browsing. The bank's URLs reveal quite a bit about what the user is doing; for instance, a user planning to start a continuous savings contract will send along the URL <span class="code">continuousSavingsContractStep1.do</span>.</p>
<p>I logged in to the bank (using well-known demo credentials) to record one such tracking request. The URL sent to the third party tracker contains a cleartext transaction archive code that could easily be used to match a transaction between two bank accounts, since it's identical for both users. But there's also a hex string called <span class="code">accountId</span> (highlighted in red).</p>
<pre class="term">Remote Address: 80.***.***.***:443
Request URL: https://www.google-analytics.com/collect?v=1&_v=j33&a=870588619&t
=pageview&_s=1&dl=https%3A%2F%2Fonline.********.fi%2Febank%2Facco
unt%2FinitTransactionDetails.do%3FbackLink%3Dreset%26<strong style="color:#e600e6">accountId</strong>%3D
<strong style="color:#e600e6">69af881eca98b7042f18e975e00f9d49d5d5ee64</strong>%26rowNo%3D0%26type%3Dtra
ns%26archivecode%3D20150220123456780002&ul=en-us&de=windows-1252&
dt=Tilit%C2%A0%7C%C2%A0Verkkopankki%20%7C%20S-Pankki&sd=24-bit&sr
=1440x900&vp=1440x150&je=1&fl=16.0%20r0&_u=QACAAQQBI~&jid=&cid=18
39557247.1424801770&uid=&tid=UA-37407484-1&cd1=&cd2=demo_accounts
&cd3=%2Ffi%2F&z=2098846672
Request Method: GET
Status Code: 200 OK</pre>
<p>It's 40 hex characters long, which is 160 bits. This happens to be the length of an SHA-1 hash.</p>
<p>Could it really be a simple hash of the user's bank account number? Surely they would at least salt it.</p>
<p>Let's try!</p>
<p>The demo account's IBAN code is FI96 3939 0001 0006 03, but this doesn't give us the above hash. However, if we remove the country code, the IBAN checksum, and all whitespace, it turns out we have a match!</p>
<pre class="term">$ <span class="userinput">echo -n "FI96 3939 0001 0006 03" | shasum</span>
dcf04c4fd3b6e29b4b43a8bf43c2713ac9be1de2 -
$ <span class="userinput">echo -n "FI9639390001000603" | shasum</span>
3e3658e4c2802dd5c21b1c6c1ed55fc1f39c8830 -
$ <span class="userinput">echo -n "39390001000603" | shasum</span>
<em style="color:#e600e6">69af881eca98b7042f18e975e00f9d49d5d5ee64</em> -
$ █</pre>
<p>This is a BBAN format bank account number. BBAN numbers are easy to brute-force, especially if the bank is already known. I wrote the following C program, ~25 lines of code, that reversed the above hash to the correct account number in 0.5 seconds. (The demo account happens to be a low-numbered one; even sweeping the full keyspace of this bank's account numbers would only take a couple of hours on a single core.)</p>
<pre><code>#include <openssl/sha.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define BBAN_LENGTH 14

int main() {
  /* SHA-1 digest of the account number, captured from the tracking request */
  const char target_hash[SHA_DIGEST_LENGTH] = {
    "\x69\xaf\x88\x1e\xca\x98\xb7\x04\x2f\x18"
    "\xe9\x75\xe0\x0f\x9d\x49\xd5\xd5\xee\x64"
  };
  unsigned char try_accnum[BBAN_LENGTH+1];
  unsigned char try_hash[SHA_DIGEST_LENGTH];

  /* Sweep every office and account number for bank 3939 */
  for (int bban_office=0; bban_office < 1e4; bban_office++) {
    for (int bban_id=0; bban_id < 1e6; bban_id++) {
      snprintf((char*)try_accnum, sizeof(try_accnum),
               "3939%04d%06d", bban_office, bban_id);
      SHA1(try_accnum, BBAN_LENGTH, try_hash);
      if (memcmp(try_hash, target_hash, SHA_DIGEST_LENGTH) == 0) {
        printf("found %s\n", try_accnum);
        return EXIT_SUCCESS;
      }
    }
  }
  return EXIT_FAILURE;
}</code></pre>
<pre class="term">$ <span class="userinput">gcc -lcrypto -o bban_hash bban_hash.c</span>
$ <span class="userinput">time ./bban_hash</span>
found 39390001000603
./bban_hash 0.42s user 0.00s system 99% cpu 0.420 total
$ █</pre>
<p>In conclusion, the third party is provided with the user's IP address, bank account number, addresses of the subpages they visit, and the account numbers associated with all transactions they make. The analytics company should also have no difficulty matching the user against its own database collected from other sites, which can include their full name and search history.</p>
<p>Incidentally, this is in breach of the <a href="http://www.finanssiala.fi/en/material/Guidelines_on_bank_secrecy.pdf" class="external" title="Guidelines on bank secrecy 2009 (PDF)">Guidelines on bank secrecy (PDF)</a> by the Federation of Finnish Financial Services; "In accordance with the secrecy obligation, third parties may not even be disclosed whether a certain person is a customer of the bank" (p. 4) (<a href="http://www.finanssiala.fi/materiaalit/Pankkisalaisuusohjeet.pdf" class="external" title="Pankkisalaisuusohjeet 2009 (PDF)">the same in Finnish</a>: "Salassapitovelvollisuus sisältää myös sen, että sivullisille ei ilmoiteta edes sitä, onko tietty henkilö pankin asiakas vai ei").</p>
<h3>Solution</h3>
<p>The script was eventually <a href="http://www.iltasanomat.fi/digi/art-1424915743167.html" class="external" title="Tietosuojahuolet vaikuttivat – S-Pankki lopetti Googlen käytön – Digi – Ilta-Sanomat">removed</a> from the site, leaving the bank regretful that such a useful tool was lost.</p>
<p>However, alternatives exist (such as Piwik) that can be run locally, without involving a third party. <strong>Edit:</strong> <em>The Intercept</em>, a news website, is <a href="https://theintercept.com/2015/11/04/what-the-intercepts-new-audience-measurement-system-means-for-reader-privacy" title="What The Intercept’s New Audience Measurement System Means for Reader Privacy" class="external">using non-privacy-invading metrics</a>.</p>
<h3>External links</h3>
<ul class="references">
<li><a href="http://ospi.netcode.fi/blog/mita-tietoja-s-pankki-valittaa-kolmannelle-osapuolelle.html" class="external" title="Mitä tietoja S-Pankki välittää kolmannelle osapuolelle – oBlog">Mitä tietoja S-Pankki välittää kolmannelle osapuolelle – oBlog</a></li>
<li><a href="https://twitter.com/S_Pankki/status/569878961209143296" class="external" title="S-Pankin arkea on Twitter">S-Pankin arkea on Twitter: "Asiakkaidemme henkilötietoja ei missään tapauksessa siirry Googlen käyttöön. Kiitos kaikille keskusteluun osallistuneille! (2/2)"</a></li>
<li><a href="http://www.finanssiala.fi/en/material/Guidelines_on_bank_secrecy.pdf" class="external" title="Guidelines on bank secrecy 2009 (PDF)">Guidelines on bank secrecy</a> (PDF)</li>
<li><a href="http://www.finanssiala.fi/materiaalit/Pankkisalaisuusohjeet.pdf" class="external" title="Pankkisalaisuusohjeet 2009 (PDF)">Pankkisalaisuusohjeet</a> (PDF)</li>
<li><a href="https://web.archive.org/web/20150301010150/http://www.iltasanomat.fi/digi/art-1424915743167.html" class="external" title="Tietosuojahuolet vaikuttivat – S-Pankki lopetti Googlen käytön – Digi – Ilta-Sanomat">Tietosuojahuolet vaikuttivat – S-Pankki lopetti Googlen käytön – Ilta-Sanomat (archive.org)</a></li>
<li><a href="https://theintercept.com/2015/11/04/what-the-intercepts-new-audience-measurement-system-means-for-reader-privacy" title="What The Intercept’s New Audience Measurement System Means for Reader Privacy" class="external">What The Intercept’s New Audience Measurement System Means for Reader Privacy</a>
</ul>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com8tag:blogger.com,1999:blog-5096278891763426276.post-8163837286265501382015-02-08T21:10:00.000+02:002019-05-26T10:48:29.384+03:00Receiving RDS with the RTL-SDR<p><span class="code">redsea</span> is a command-line RDS decoder. I originally wrote it as a script to decode RDS from <a href="http://www.windytan.com/2013/04/how-i-discovered-rds.html" title="absorptions: How I discovered RDS">demultiplexed FM stereo sound</a>. Later I've experimented with other ways to read the bits, and the latest addition is to support the RTL-SDR television receiver via the <span class="code">rtl_fm</span> tool.</p>
<p>Redsea is on <a href="https://github.com/windytan/redsea" class="external" title="windytan/redsea">GitHub</a>. It has minimal dependencies (Perl core modules, the C standard library, the rtl-sdr command-line tools) and has been tested to work on OS X and Linux with good enough FM reception. All test results, ideas, and pull requests are welcome.</p>
<div class="update"><strong>Update 12/2016:</strong> Redsea has seen a lot of development since this post was written; see <a href="http://www.windytan.com/2016/10/redsea-07-lightweight-rds-decoder.html" title="absorptions: Redsea 0.7, a lightweight RDS decoder">Redsea 0.7, a lightweight RDS decoder</a>.</div>
<h3>What it says</h3>
<p>The program prints out decoded RDS groups, one group per line. Each group will contain a PI code identifying the station, plus varying other data depending on the group type. The picture below explains the types of data you'll probably encounter most often.</p>
<div class="saumaton kuva keskella invertable"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghDSuLUBhcKfHe0dEKhePRSEWFhrKu3tNkSLDL7rHZQUfMFDTcbS2AhNSlM2HRToqNGoq6YSdk3fIxa3fcF2cgGkfCmpmNA_QVIfaYgLN21JJ-LwF-Avf__oVdquGFti67-DKlqYrqZj6T/s1600/rds-groups.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghDSuLUBhcKfHe0dEKhePRSEWFhrKu3tNkSLDL7rHZQUfMFDTcbS2AhNSlM2HRToqNGoq6YSdk3fIxa3fcF2cgGkfCmpmNA_QVIfaYgLN21JJ-LwF-Avf__oVdquGFti67-DKlqYrqZj6T/s520/rds-groups.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghDSuLUBhcKfHe0dEKhePRSEWFhrKu3tNkSLDL7rHZQUfMFDTcbS2AhNSlM2HRToqNGoq6YSdk3fIxa3fcF2cgGkfCmpmNA_QVIfaYgLN21JJ-LwF-Avf__oVdquGFti67-DKlqYrqZj6T/s1040/rds-groups.png 2x" alt="[Image: Screenshot of textual output from redsea, with some parts explained.]"/></a></div>
<p>A more verbose output can be enabled with the <span class="code">-l</span> option (it contains the same information though). The <span class="code">-t</span> option prefixes all groups with an ISO timestamp.</p>
<h3>How it works</h3>
<p>The DSP side of my program, named <span class="code">rtl_redsea</span>, is written in C99. It's a synchronous DBPSK receiver that first bandpass filters ① the multiplex signal. A PLL locks onto the 19 kHz stereo pilot tone; its third harmonic (57 kHz) is used to regenerate the RDS subcarrier. Dividing the pilot by 16 also gives us the 1187.5 Hz clock frequency. Phase offsets of these derived signals are adjusted separately.</p>
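<p>Just to illustrate these frequency relationships, here's a toy sketch (not <span class="code">rtl_redsea</span> itself, and the sample rate is an arbitrary assumption) that derives the 57 kHz subcarrier reference and the 1187.5 Hz clock from a shared pilot phase:</p>
<pre><code>use strict;
use warnings;

my $fs = 171_000;                  # assumed sample rate (nothing special about it)
my $pi = 3.141592653589793;

for my $n (0 .. 999) {
  my $pilot_phase = 2 * $pi * 19_000 * $n / $fs;           # phase of the locked pilot
  my $pilot       = cos($pilot_phase);
  my $subcarrier  = cos(3 * $pilot_phase);                 # third harmonic: 57 kHz
  my $bitclock    = cos($pilot_phase / 16) > 0 ? 1 : -1;   # pilot / 16 = 1187.5 Hz
  printf "%4d % .3f % .3f %2d\n", $n, $pilot, $subcarrier, $bitclock;
}</code></pre>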
<div class="saumaton kuva keskella invertable"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0-32woZW6BB5JDgpyzAf3g6a9QG30gFenMBF9dyxaWlwVDJs3tcnuvOY8W47uLsHahxks2255_giLHj6AP7uDGm_o8CWR0JKg0EniI5Zq50bzKexdQ5PD1zwSSEasGilj7iEj0x8zg7a2/s1600/redsea-waves-all.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0-32woZW6BB5JDgpyzAf3g6a9QG30gFenMBF9dyxaWlwVDJs3tcnuvOY8W47uLsHahxks2255_giLHj6AP7uDGm_o8CWR0JKg0EniI5Zq50bzKexdQ5PD1zwSSEasGilj7iEj0x8zg7a2/s500/redsea-waves-all.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0-32woZW6BB5JDgpyzAf3g6a9QG30gFenMBF9dyxaWlwVDJs3tcnuvOY8W47uLsHahxks2255_giLHj6AP7uDGm_o8CWR0JKg0EniI5Zq50bzKexdQ5PD1zwSSEasGilj7iEj0x8zg7a2/s1000/redsea-waves-all.png 2x" alt="[Image: Oscillograms illustrating how the RDS subcarrier is gradually processed in redsea and finally reduced to a series of 1's and 0's.]"/></a></div>
<p>The local 57 kHz carrier is synchronized so that the constellation lines up on the real axis, which lets us work on the real part only ②. Biphase symbols are multiplied by the square-wave clock and integrated ③ over a clock period, and then dumped into a delta decoder ④, which outputs the binary data as bit strings into <span class="code">stdout</span> ⑤.</p>
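<p>As a rough sketch of the integrate-and-dump and delta decoding steps (again, this is not the actual <span class="code">rtl_redsea</span> code; it assumes the clock-multiplied signal arrives as one sample per line on stdin, with an arbitrary 16 samples per symbol):</p>
<pre><code>use strict;
use warnings;

my $samples_per_symbol = 16;        # an assumption; depends on the sample rate
my ($acc, $count, $prev) = (0, 0, 0);

while (my $sample = <STDIN>) {
  $acc += $sample;                  # integrate over one clock period
  next unless ++$count == $samples_per_symbol;
  my $symbol = $acc > 0 ? 1 : 0;    # dump: the sign decides the symbol
  print $symbol ^ $prev;            # delta decode: a change between symbols is a 1
  $prev = $symbol;
  ($acc, $count) = (0, 0);
}
print "\n";</code></pre>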
<p>Signal quality is estimated a couple of times per second by counting the number of "suspicious" integrated biphase symbols, i.e. symbols with halves of opposite signs. The symbols are also sampled with a 180° phase shift, and we can switch to that stream if it seems to produce better results.</p>
<p>This low-throughput binary string data is then handled by <span class="code">redsea.pl</span> via a pipe. Synchronization, error detection and correction, and decoding all happen there. Group data is then displayed on the terminal, in semi-human-readable form.</p>
<h3>Future</h3>
<p>My ultimate goal is to have a tool useful for FM DX, i.e. pretty good noise resistance.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com44tag:blogger.com,1999:blog-5096278891763426276.post-45163341549236184542015-01-16T19:54:00.000+02:002018-02-21T21:15:03.221+02:00My chip collection<p>Old IC (integrated circuit) packages are fun and I collect them. This involves going to flea markets to look for cheap vintage electronics like telephones, answering machines, radios or toys, and then desoldering and salvaging all the ICs and other interesting parts. Selected packages from my disorganized pile of chips follow. Most are <a href="https://en.wikipedia.org/wiki/Plain_old_telephone_service" title="Wikipedia: Plain old telephone service" class="external">POTS</a>-related. </p>
<h3>Sony CXA1619BS</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLzWDjhxrtaQkYMEY5mWA64DopVUjJg2XKcYjY1wq9Tnf0JfnlnYT_jAhJjGfJhRdvGty4qnwiQBiG7M0KVHlzKGHFMCP0Kh7UdX9Yg6Hzl9k-fO9ThClrQx5Vrh-fQMQiRudLJA6iyIbb/s160/IMG_3605-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLzWDjhxrtaQkYMEY5mWA64DopVUjJg2XKcYjY1wq9Tnf0JfnlnYT_jAhJjGfJhRdvGty4qnwiQBiG7M0KVHlzKGHFMCP0Kh7UdX9Yg6Hzl9k-fO9ThClrQx5Vrh-fQMQiRudLJA6iyIbb/s320/IMG_3605-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]"/></div>
<p>A "one-chip-wonder", this is an FM/AM radio in a small package. It takes an RF signal (from the antenna) and an IF oscillator frequency as inputs and outputs demodulated monaural audio. </p>
<div style="clear:both"></div>
<h3>Sanyo LA2805</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0WZVUi88LlKBhiBmlw3DnJugciSIWORU48FXUIk5_OBp1aMvzDUV0-xC1ftarqhxy1zLKTWOSrsB3Nn8L79WGSzlZm5x68X4zXqfLjK-IkVnupFn57kA-HR3OIHDwYRm7gQ5CwFJw3OP0/s160/IMG_3667-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0WZVUi88LlKBhiBmlw3DnJugciSIWORU48FXUIk5_OBp1aMvzDUV0-xC1ftarqhxy1zLKTWOSrsB3Nn8L79WGSzlZm5x68X4zXqfLjK-IkVnupFn57kA-HR3OIHDwYRm7gQ5CwFJw3OP0/s320/IMG_3667-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]"/></div>
<p>This chip does general answering-machine tasks. It has a tape preamp for recording and playback; voice detector logic; beep detection using zero-crossing comparison; a power amplifier; a line amplifier; and pins for interfacing with a microcontroller. </p>
<div style="clear:both"></div>
<h3>Unicorn Microelectronics UM91215C</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo_JNMkqcclx0ZYVhiY4y7q9hYhsxxlI6IdFT9td71lWQnXbYoAQeFvc0E0hwF65HQqF6vjRuYlW4Z3PAcv9bmHjv7eI4-zEqTA_0xvONXgwIy6cqK4jriaE5KS93vjk3WiZ3DoyQ5QD9f/s160/IMG_3609-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo_JNMkqcclx0ZYVhiY4y7q9hYhsxxlI6IdFT9td71lWQnXbYoAQeFvc0E0hwF65HQqF6vjRuYlW4Z3PAcv9bmHjv7eI4-zEqTA_0xvONXgwIy6cqK4jriaE5KS93vjk3WiZ3DoyQ5QD9f/s320/IMG_3609-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]"/></div>
<p>The UM91215C is a tone/pulse dialer. A telephone keyboard matrix is connected to the input pins, and the chip outputs DTMF-encoded audio or pulsed digits, depending on the selected dialing mode. An external oscillator needs to be connected as well. It can do a one-key redial of the last dialed number, and it can also flash the phone line.</p>
<div style="clear:both"></div>
<h3>Holtek HT9170</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgh1m8g62zYfyKiLAosd0IbONX1i9vvzrd9Dk7BwbDSYA6HSjZ752s3M6ReVSFwJTKi2bMAGui4z_Q-hfYS8YUavkB_a1ZMEXukM6HyAB1QEWw3mtRAbcMn9abimPJX038No6raNod5J8g/s160/IMG_3606-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgh1m8g62zYfyKiLAosd0IbONX1i9vvzrd9Dk7BwbDSYA6HSjZ752s3M6ReVSFwJTKi2bMAGui4z_Q-hfYS8YUavkB_a1ZMEXukM6HyAB1QEWw3mtRAbcMn9abimPJX038No6raNod5J8g/s320/IMG_3606-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]"/></div>
<p>A DTMF receiver, reversing the operation of UM91215C above. The chip, employing filters and zero-crossing detectors, is fed an external oscillator frequency and telephone line audio, and it outputs a four-bit code corresponding to the DTMF digit present in the signal. The use of external components is minimal, but a crystal oscillator is needed in this case as well.</p>
<div style="clear:both"></div>
<h3>SGS-Thomson TDA1154</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgu_hnwUfVbB91VIe0rcD9YcP0NM8urUYE2qNN6ZCmT2umiEm8k_b68-eiBrvZmsBRt489NmFWFzyh-EE2OpJxcQwrIvGeYvmGDPfWH4AzqkKxCJ1LFv_wG9xmrCkmOs6eJiFJfC2jbETfV/s160/IMG_3617-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgu_hnwUfVbB91VIe0rcD9YcP0NM8urUYE2qNN6ZCmT2umiEm8k_b68-eiBrvZmsBRt489NmFWFzyh-EE2OpJxcQwrIvGeYvmGDPfWH4AzqkKxCJ1LFv_wG9xmrCkmOs6eJiFJfC2jbETfV/s320/IMG_3617-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]" /></div>
<p>A speed regulator for DC motors, this chip can keep a motor running at a very stable speed under varying load conditions. In an answering machine, it is needed to keep <a href="http://www.windytan.com/2013/03/beyond-hiss.html" title="absorptions: Beyond the hiss">distortions in tape audio</a> to a minimum.</p>
<div style="clear:both"></div>
<h3>Toshiba TC8835AN</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGCGn7_I2GlylRc5oc6ogAu3W8TE7GczyLw2sKrx4CrTzmw5zJc1n__lnSgtn2-Klpg9GAwF6sB75wX9K06DMDadP8LFehYXKcLjU1FtWGpYRAnBMKMSsTFkwgTkyUKs2glNtP2rXo6bgK/s160/IMG_3602-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGCGn7_I2GlylRc5oc6ogAu3W8TE7GczyLw2sKrx4CrTzmw5zJc1n__lnSgtn2-Klpg9GAwF6sB75wX9K06DMDadP8LFehYXKcLjU1FtWGpYRAnBMKMSsTFkwgTkyUKs2glNtP2rXo6bgK/s320/IMG_3602-1.jpg 2x" style="width:160px" alt="[Image: Photo of package]"/></div>
<p>This chip can store and play back a total of 16 audio recordings of 512 kilobits in size. It also contains a lot of command logic, explained in a 40-page datasheet. The type of audio encoding is not specified, but the bit rate can be chosen between 16 and 22 kbps; 512 kilobits thus corresponds to roughly 23 to 32 seconds of audio. The analog output must be filtered prior to playback.</p>
<div style="clear:both"></div>
<h3>Intel 8049</h3>
<div class="saumaton kuva oikealla"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9XkAxP9Pu088raQuCmkEIIFDwEcYwfb6G8japeR98RK5Kiku0S3FekMBIJi5gb3jIs7SKCVJ7QxutJ8GGUoAmHQvaJgNk3f1fNZmId51AGa3gsd9J-gkRKagi2PQiV_VLX4g2hP7Wlsri/s320/IMG_3623-1.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9XkAxP9Pu088raQuCmkEIIFDwEcYwfb6G8japeR98RK5Kiku0S3FekMBIJi5gb3jIs7SKCVJ7QxutJ8GGUoAmHQvaJgNk3f1fNZmId51AGa3gsd9J-gkRKagi2PQiV_VLX4g2hP7Wlsri/s640/IMG_3623-1.jpg 2x" style="width:320px" alt="[Image: Photo of package]"/></div>
<p>This monster of a chip is a 6 MHz, 8-bit microcontroller with 17 registers, 2 kilobytes ROM, 128 bytes RAM, and an instruction set of 90 codes. It's used in many older devices, from telephones to digital multimeters.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com5tag:blogger.com,1999:blog-5096278891763426276.post-70397454511186186562014-10-30T01:52:00.002+02:002018-02-21T20:58:11.287+02:00Visualizing hex dumps with Unicode emoji<p>Memorizing SSH public key fingerprints can be difficult; they're just long random numbers displayed in base 16. There are some terminal-friendly solutions, like OpenSSH's randomart. But because I use a Unicode terminal, I like to map the individual bytes into characters in the <a href="https://en.wikipedia.org/wiki/Miscellaneous_Symbols_and_Pictographs" class="external" title="Wikipedia: Miscellaneous Symbols and Pictographs">Miscellaneous Symbols and Pictographs</a> block.</p>
<p>This Perl script does just that:</p>
<pre><code>
@emoji = qw( 🌀 🌂 🌅 🌈 🌙 🌞 🌟 🌠 🌰 🌱 🌲 🌳 🌴 🌵 🌷 🌸
🌹 🌺 🌻 🌼 🌽 🌾 🌿 🍀 🍁 🍂 🍃 🍄 🍅 🍆 🍇 🍈
🍉 🍊 🍋 🍌 🍍 🍎 🍏 🍐 🍑 🍒 🍓 🍔 🍕 🍖 🍗 🍘
🍜 🍝 🍞 🍟 🍠 🍡 🍢 🍣 🍤 🍥 🍦 🍧 🍨 🍩 🍪 🍫
🍬 🍭 🍮 🍯 🍰 🍱 🍲 🍳 🍴 🍵 🍶 🍷 🍸 🍹 🍺 🍻
🍼 🎀 🎁 🎂 🎃 🎄 🎅 🎈 🎉 🎊 🎋 🎌 🎍 🎎 🎏 🎒
🎓 🎠 🎡 🎢 🎣 🎤 🎥 🎦 🎧 🎨 🎩 🎪 🎫 🎬 🎭 🎮
🎯 🎰 🎱 🎲 🎳 🎴 🎵 🎷 🎸 🎹 🎺 🎻 🎽 🎾 🎿 🏀
🏁 🏂 🏃 🏄 🏆 🏇 🏈 🏉 🏊 🐀 🐁 🐂 🐃 🐄 🐅 🐆
🐇 🐈 🐉 🐊 🐋 🐌 🐍 🐎 🐏 🐐 🐑 🐒 🐓 🐔 🐕 🐖
🐗 🐘 🐙 🐚 🐛 🐜 🐝 🐞 🐟 🐠 🐡 🐢 🐣 🐤 🐥 🐦
🐧 🐨 🐩 🐪 🐫 🐬 🐭 🐮 🐯 🐰 🐱 🐲 🐳 🐴 🐵 🐶
🐷 🐸 🐹 🐺 🐻 🐼 🐽 🐾 👀 👂 👃 👄 👅 👆 👇 👈
👉 👊 👋 👌 👍 👎 👏 👐 👑 👒 👓 👔 👕 👖 👗 👘
👙 👚 👛 👜 👝 👞 👟 👠 👡 👢 👣 👤 👥 👦 👧 👨
👩 👪 👮 👯 👺 👻 👼 👽 👾 👿 💀 💁 💂 💃 💄 💅 );
while (<>) {
if (/[a-f0-9:]+:[a-f0-9:]+/) {
($b, $m, $a) = ($`, $&, $');
print $b.join(" ", map { $emoji[$_] } map hex, split /:/, $m)." ".$a;
}
}</code></pre>
<p>What's happening here? First we create a 256-element array containing a hand-picked collection of emoji. Naturally, they're all assigned an index from <span class="code">0x00</span> to <span class="code">0xff</span>. Then we'll loop through standard input and look for lines containing colon-separated hex bytes. Each hex value is replaced with an emoji from the array.</p>
<p>Here's the output:</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSqjooSaf0To6FrwZgBLMDYJQHvR191sOKPXBI2QkiB8wQm6MVOKg2UbXH9cw3fL0v99vL2mxvdLEgJwXoZGl6_lE9LzAvKm7ccBuX__Nsv4JtwsE7dfI9C1mqdqnl29pEm78SDW1n3H8y/s480/rsa.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSqjooSaf0To6FrwZgBLMDYJQHvR191sOKPXBI2QkiB8wQm6MVOKg2UbXH9cw3fL0v99vL2mxvdLEgJwXoZGl6_lE9LzAvKm7ccBuX__Nsv4JtwsE7dfI9C1mqdqnl29pEm78SDW1n3H8y/s960/rsa.png 2x" alt="[Image: Terminal screenshot showing a PGP key fingerprint and the same with all hex numbers replaced with emoji.]"/></div>
<p>The script could easily be extended to support output from other hex-formatted sources as well, such as xxd:</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgslVegzWOtb-hUhvIKDSKCLGAo_hfuorZCNSp90FuuO8_sprWa_eYVleRlCJFJfdR0m0B7d-uz4iUHY4MbJsMFI-VPigBi3IcC6vTxL9J7pZLclC-zSbNcI-yEgEC6RSlihrTr-7npZaVN/s480/korvista.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgslVegzWOtb-hUhvIKDSKCLGAo_hfuorZCNSp90FuuO8_sprWa_eYVleRlCJFJfdR0m0B7d-uz4iUHY4MbJsMFI-VPigBi3IcC6vTxL9J7pZLclC-zSbNcI-yEgEC6RSlihrTr-7npZaVN/s960/korvista.png 2x" alt="[Image: Terminal screenshot showing a hex dump of a poem and the same with all hex numbers replaced with emoji. kissofoni; tassun kynsi neulana / musa korvista kajahtaa]"/></div>
<p>Some additional methods for visualizing hex dumps and key fingerprints, from the comments section:</p>
<ul>
<li><a href="http://user.xmission.com/~atoponce/art/" title="PGP Art" class="external">PGP Strong Set Top 50 Fingerprint Art</a></li>
<li><a href="http://sebsauvage.net/wiki/doku.php?id=php:vizhash_gd" title="php:vizhash_gd [sebsauvage]" class="external">VizHash GD - a visual hash</a>
</ul>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com15tag:blogger.com,1999:blog-5096278891763426276.post-20067906265060278382014-07-14T16:05:00.001+03:002023-03-11T16:47:11.433+02:00Mapping microwave relay links from video<p>Radio networks are often at least partially based on <a href="https://en.wikipedia.org/wiki/Microwave_transmission#Microwave_radio_relay" class="external" title="Wikipedia: Microwave transmission">microwave relay links</a>. They're those little mushroom-like appendices growing out of cell towers and building-mounted base stations. Technically, they're carefully directed dish antennas linking such towers together over a line-of-sight connection. I'm collecting a little map of nearby link stations, trying to find out how they're interconnected and which network they belong to.</p>
<h3>Circling around</h3>
<p>We can find a rough direction for any link antenna by approximating a tangent for the dish shroud surface from position-stamped video footage taken while circling the tower. Ideally we would have a drone make a full circle around the tower at a constant distance and elevation to map all antennas at once; but if our DJI Phantom has run out of battery, a GPS-positioned still camera at ground level will also do.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht212L91KY_9TT2hZspfeS6LE70CXC2KxPCK56Y7HVyFJmOw7QYNEKJaIVQ4XR_qggDXRS2ufY_WjMYp_A7qIxwKSjCF6A_HwMi0CmR2KRSDKVxL2zOmySCabC0qorKIKLeyMMQfWn6K71/s1600/linkitylink.png" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht212L91KY_9TT2hZspfeS6LE70CXC2KxPCK56Y7HVyFJmOw7QYNEKJaIVQ4XR_qggDXRS2ufY_WjMYp_A7qIxwKSjCF6A_HwMi0CmR2KRSDKVxL2zOmySCabC0qorKIKLeyMMQfWn6K71/s480/linkitylink.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht212L91KY_9TT2hZspfeS6LE70CXC2KxPCK56Y7HVyFJmOw7QYNEKJaIVQ4XR_qggDXRS2ufY_WjMYp_A7qIxwKSjCF6A_HwMi0CmR2KRSDKVxL2zOmySCabC0qorKIKLeyMMQfWn6K71/s1000/linkitylink.png 2x" alt="[Image: Five photos of the same directional microwave antenna, taken from different angles, and edge-detection and elliptical Hough transform results from each one, with a large and small circle for all ellipses.]" /></a></div>
<p>The rest can be done manually, or using the Hough transform and centroid calculation from OpenCV. In these pictures, the ratio of the diameters of the concentric circles is a sinusoidal function of the angle between the antenna direction and the camera direction. At its maximum, we're looking straight at the beam. (The ratio won't max out at unity in this case, because we're looking at the antenna slightly from below.) We can select the frame with the maximum ratio from high-speed footage, or we can interpolate a smooth sinusoid to get an even better value.</p>
<div class="saumaton kuva keskella invertable"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBWQ8ZWP0iIFBS6javYIrLqtpiyKlWCRq-FPdN6myGibuaV_ntX2dkDqct5k_6dCNbObsskvlBLkaivN1oKYj9KBN6_mb65fPZvKDxShcvj6BgK0lrPUVt1ebQ8mw1KP2mV1qHyBdb-aY-/s1600/dish.png" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBWQ8ZWP0iIFBS6javYIrLqtpiyKlWCRq-FPdN6myGibuaV_ntX2dkDqct5k_6dCNbObsskvlBLkaivN1oKYj9KBN6_mb65fPZvKDxShcvj6BgK0lrPUVt1ebQ8mw1KP2mV1qHyBdb-aY-/s400/dish.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBWQ8ZWP0iIFBS6javYIrLqtpiyKlWCRq-FPdN6myGibuaV_ntX2dkDqct5k_6dCNbObsskvlBLkaivN1oKYj9KBN6_mb65fPZvKDxShcvj6BgK0lrPUVt1ebQ8mw1KP2mV1qHyBdb-aY-/s800/dish.png 2x" alt="[Image: Diagram showing how the ratio of the diameters of the large and small circle is proportional to the angle of the antenna in relation to the camera.]"/></a></div>
<p>This particular antenna is pointing west-northwest with an azimuth of 290°.</p>
<h3>What about distance?</h3>
<p>Because of the line-of-sight requirement, we also know the maximum possible distance to the linked tower, using the formula 7140 × √(4 / 3 × h), where h is the height of the antenna from the ground in metres and the result is in metres. If the beam happens to hit a previously mapped tower closer than this distance, we can assume they're connected!</p>
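<p>A quick sanity check of the formula (a sketch; the height is made up):</p>
<pre><code>use strict;
use warnings;

# maximum line-of-sight distance in metres for an antenna $h metres up
sub max_lineofsight {
  my ($h) = @_;
  return 7140 * sqrt(4 / 3 * $h);
}

printf "%.1f km\n", max_lineofsight(30) / 1000;   # prints 45.2 km</code></pre>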
<p>This antenna is communicating to a tower not further away than 48 km. Judging from the building it's standing on, it belongs to a government trunked radio network.</p>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com3tag:blogger.com,1999:blog-5096278891763426276.post-51511767133536599582014-06-16T21:55:00.001+03:002023-03-11T16:42:48.599+02:00Headerless train announcements<div class="kuva oikealla" style="width:200px"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh26Tjh5ugfGUsyQTOimb0w9ppvdoNxwEWdQhP4WtSHEYTQ1f8RDCfAonikNw3n5pvlfB4JLvMg1Dpq35hJ0MMCOJi_EToLZ1fm85yRaXNZLaWSfkbTSaifvOGrmVVAgJ4HcX7NquaKy7NG/s1600/Cv6TbdL.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh26Tjh5ugfGUsyQTOimb0w9ppvdoNxwEWdQhP4WtSHEYTQ1f8RDCfAonikNw3n5pvlfB4JLvMg1Dpq35hJ0MMCOJi_EToLZ1fm85yRaXNZLaWSfkbTSaifvOGrmVVAgJ4HcX7NquaKy7NG/s200/Cv6TbdL.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh26Tjh5ugfGUsyQTOimb0w9ppvdoNxwEWdQhP4WtSHEYTQ1f8RDCfAonikNw3n5pvlfB4JLvMg1Dpq35hJ0MMCOJi_EToLZ1fm85yRaXNZLaWSfkbTSaifvOGrmVVAgJ4HcX7NquaKy7NG/s400/Cv6TbdL.jpg 2x" alt="[Image: Information display onboard a Helsinki train, showing a transcript of an announcement along with the time of the day, current speed and other info.]"/></a></div>
<p>The Finnish state railway company just changed their automatic announcement voice, discarding old recordings from trains. It's a good time for some data dumpster diving for the old ones, don't you think?</p>
<p>The dumpster diving produces a 67-megabyte ISO 9660 image that once belonged to an older-type onboard announcement device. It contains a file system of 58 directories with five-digit names, and one called "yleis" (Finnish for "general").</p>
<p>Each directory contains files with three-digit file names; for each number there are three files, such as <span class="code">001.inf</span>, <span class="code">001.txt</span> and <span class="code">001.snd</span>. The <span class="code">.inf</span> and <span class="code">.txt</span> files seem to contain parts of announcements as ISO 8859 encoded strings, such as "InterCity train" and "to Helsinki". The <span class="code">.snd</span> files obviously contain the corresponding audio announcements. There's a total of 1950 sound files.</p>
<h3>Directory structure</h3>
<p>The file system seems to be structurally pointless; there's nothing apparent that differentiates all files in <span class="code">/00104</span> from files in <span class="code">/00105</span>. Announcements in different languages are numerically separated, though (<span class="code">/001xx</span> = Finnish, <span class="code">/002xx</span> = Swedish, <span class="code">/003xx</span> = English). Track numbers and time readouts are stored sequentially, but there are out-of-place announcements and test files in between. The logic connecting numbers to their meanings is probably programmed into the device for every train route.</p>
<p>Everything can be spliced together almost word by word. But many common announcements are also recorded as whole sentences, probably to make them sound more natural. </p>
<h3>Audio format</h3>
<p>The audio files are headerless; there is no explicit information about the format, sample rate or sample size anywhere.</p>
<p>The byte histogram and Poincaré plot of the raw data suggest a 4-bit sample size; this, along with the fact that all files start with <code>0x80</code>, is indicative of an adaptive differential PCM encoding scheme.</p>
<div class="saumaton kuva keskella"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8Q8-T2Qr_Ecur96iJTFcsfPvxSnhl1fqfwWRf2i5jkOxNmQZEjkgjwz29e3xhHumDGS6oZBZ7esLlV1aXv7YdbIoDnIuwZATKyB0QsAO3YJLg4bD42BOZEJs_Ksj5-o4isE0MNQhN3NJf/s450/poincare-plots.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8Q8-T2Qr_Ecur96iJTFcsfPvxSnhl1fqfwWRf2i5jkOxNmQZEjkgjwz29e3xhHumDGS6oZBZ7esLlV1aXv7YdbIoDnIuwZATKyB0QsAO3YJLg4bD42BOZEJs_Ksj5-o4isE0MNQhN3NJf/s900/poincare-plots.png 2x" data-original-width="1600" data-original-height="793" alt="[Image: Byte histogram and Poincare plot of a raw audio file, characteristic of Gaussian-distributed data encoded as four-bit samples.]"/></div>
<p>Unfortunately there are as many variations of ADPCM as there are manufacturers of encoder chips. None of the decoders known by SoX produce clean results. But with the right settings for the OKI-ADPCM decoder we can already hear some garbled speech under heavy Brownian noise.</p>
<div class="audiodiv"><audio controls><source src="https://oona.windytan.com/blogfiles/kokeilua0.mp3" />[HTML5 audio: Sound resembling garbled speech buried in noise.]</audio></div>
<p>For unknown reasons, the output signal from SoX is spectrum-inverted. Luckily it's trivial to fix (see <a href="https://www.windytan.com/2013/05/descrambling-voice-inversion.html" title="absorptions: Descrambling the voice inversion scrambler">my previous post on frequency inversion</a>). The pitch sounds roughly natural when a 19,000 Hz sampling rate is assumed. A test tone found in one file comes out as a 1000 Hz sine when the sampling rate is further refined to 18,930 Hz.</p>
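<p>For reference, the inversion itself amounts to negating every second sample, which shifts the spectrum by half the sampling rate and so maps every frequency f to fs/2 - f. A minimal sketch, assuming signed 16-bit mono raw samples on stdin:</p>
<pre><code>use strict;
use warnings;

binmode STDIN;
binmode STDOUT;

my $n = 0;
while (read STDIN, my $buf, 2) {
  my $sample = unpack 's', $buf;
  $sample = -$sample if $n++ & 1;       # multiply by (-1)^n
  $sample = 32767 if $sample > 32767;   # -(-32768) would overflow
  print pack 's', $sample;
}</code></pre>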
<p>This is what we get after frequency inversion, spectral equalization, and low-pass filtering:</p>
<div class="audiodiv"><audio controls><source src="https://oona.windytan.com/blogfiles/kokeilua.mp3" />[HTML5 audio: The Helsinki train announcement voice saying "This is an InterCity2 train to Helsinki."]</audio></div>
<p>There's still a high noise floor due to the mismatch between OKI-ADPCM and the unknown algorithm used by the announcement device, but it's starting to sound alright!</p>
<h3>Peculiarities</h3>
<p>There seems to be an announcement for every thinkable situation, such as:</p>
<p><ul><li>"Ladies and Gentlemen, as due to heavy snowfall, we are running slightly late. Please accept our apologies."</li>
<li>"Ladies and Gentlemen, an animal has been run over by the train. We have to wait a while before continuing the journey."</li>
<li>"Ladies and Gentlemen, the arrival track of the train having been changed, the platform is on your left hand side."</li>
<li>"Ladies and Gentlemen, we regret to inform you that today the restaurant-car is exceptionally closed."</li></ul></p>
<p>Also, there is an English recording of most announcements, even though only Finnish and Swedish are usually heard on commuter trains.</p>
<p>One file contains a long instrumental country song.</p>
<p>In an eerily out-of-place sound file, a small child reads out a list of numbers.</p>
<h3>Final words</h3>
<p>This is something I've wanted to do with this almost melodically intonated announcement about ticket selling compartments.</p>
<div class="audiodiv"><audio controls><source src="https://oona.windytan.com/blogfiles/eijan_aikakausi.mp3" />[HTML5 audio: A musical piece made using an announcement in Finnish.]</audio></div>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com14tag:blogger.com,1999:blog-5096278891763426276.post-23891943954399756852014-06-09T02:51:00.001+03:002023-03-11T16:43:12.919+02:00Time-coding audio files<p>One day you'll need to include real-time UTC timestamps in audio. It's useful when reconstructing events from long, unsupervised surveillance microphone recordings, or when constantly monitoring and logging radio channels.</p>
<p>There's no standard method for doing this with WAV or FLAC files. One method would be to log the start time in the filename and calculate the time based on audio position. However, this is not possible with voice-activated or squelched recorders. It also relies on the accuracy and stability of the ADC clock.</p>
<p>I'll take a look at some ways to include an accurate timestamp directly in the in-band audio.</p>
<h3>Least significant bit</h3>
<p>Time information can be encoded in the least significant bit (LSB) of the 16-bit PCM samples. This "steganographic" method requires a lossless file format and lossless conversions. The script below truncates all samples of a raw single-channel signed-integer PCM stream to 15 bits and inserts a 19-byte ISO 8601 timestamp in ASCII roughly every second, preceded by a "mark" start bit. When played back, the LSB can be zeroed out to get rid of the timestamps. The WAV can also be played as such; the "ticking" sound will be practically inaudible at an amplitude of −96 dB. The outgoing PCM stream is then sent to SoX for WAV encoding.</p>
<pre><code lang="Perl">#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my $snum = 0;
my $writing = 0;
my $pos = 0;
my $code = "";
# pipe the stamped samples to sox, which writes the WAV file
open my $out, '|-', 'sox -t .raw -e unsigned-integer -b 16 -r 44100 '.
'-c 1 - stamped.wav';
while (read STDIN, my $sample, 2) {
$sample = unpack "s", $sample;
my $bit = 0;
if ($writing) {
$bit = (ord(substr $code, $pos >> 3, 1) >> ($pos % 8)) & 1;
if (++$pos >= length $code << 3) {
$writing = 0;
$bit = 0;
}
} elsif ($snum++ % 44100 == 0) {
$writing = 1;
$pos = 0;
$bit = 1;
$code = DateTime->now()->iso8601();
}
print $out pack "S", ($sample + 0x7FFF) & 0xFFFE | $bit;
}
close $out;</code></pre>
<p>Note that the start bit of the timestamp will mark the moment the sample reached this script, and it could differ by hundreds of milliseconds from the actual moment of reception at the microphone. Also, the timestamp does not mark the start of a second, but is rather timed by an arbitrary sample counter. One could also poll and write the timestamps in a continuous manner.</p>
<p>The above script could be modified to interface with <a href="https://www.windytan.com/2013/07/squelch-it-out.html" title="absorptions: Squelch it out">my squelch script</a>, by only inserting timestamps when squelch is not active. The resulting audio could then be efficiently encoded as FLAC.</p>
<p><a href="https://gist.github.com/windytan/3fdd4c7b19d262402e5e" class="external" title="lsb-time-read.pl">lsb-time-read.pl</a> reads back the timestamps, also printing the sample position of each. Below is a sound sample of a clean signal followed by a timestamped one.</p>
<div class="audiodiv"><audio controls>
<source src="https://oona.windytan.com/blogfiles/stampedmix1.mp3" />
(Here, a HTML5 audio element used to be)
</audio></div>
<h3>Lossy-friendly approach</h3>
<p>Lossy compression, by definition, does not retain the numeric values of samples, so they can't be treated as bit fields. Instead, we can use an analog modulation scheme like binary FSK. MP3 and Ogg Vorbis encoders will, at a reasonable bit rate, retain the structure of a sufficiently slow FSK burst. This method will work even if the timestamping phase is followed by an analog conversion.</p>
<p>Using the ultrasonic part of the spectrum comes to mind; but unfortunately such high frequencies are mostly cut off by a low-pass filter at the encoder. However, we can use the higher end of the remaining spectrum and filter it out afterwards, if the recording consists of narrow-band speech. In the case of squelched conversation, we could write the timestamp only at the beginning of each transmission. This way it could even be in the speech frequencies.</p>
<p><a href="https://gist.github.com/windytan/08650f1c7e84c7370130" class="external" title="fsk-timestamp.pl">fsk-timestamp.pl</a> embeds the timestamps into PCM data; they can be read back using <span class="code">minimodem --rx --mark 11000 --space 13000 --file stamped.wav -q 1200</span>.</p>
<p>A sound sample follows.</p>
<div class="audiodiv"><audio controls>
<source src="https://oona.windytan.com/blogfiles/stampedmix2.mp3" />
(Here, a HTML5 audio element used to be)
</audio></div>Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com9tag:blogger.com,1999:blog-5096278891763426276.post-74587382905050646252014-02-01T00:43:00.001+02:002023-08-25T00:04:18.628+03:00Mystery signal from a helicopter<p>Last night, YouTube suggested <a href="https://www.youtube.com/watch?v=TCKRe4jJ0Qk" class="external" title="Police Chase Ends In Front Of KC Police HQ – YouTube">a video</a> for me. It was a raw clip from a news helicopter filming a police chase in Kansas City, Missouri. I quickly noticed a weird interference in the audio, especially the left channel, and thought it must be caused by the chopper's engine. I turned up the volume and realized it's not interference at all, but a mysterious digital signal! And off we went again.</p>
<div class="audiodiv"><audio controls><source src="https://oona.windytan.com/blogfiles/selostus2.mp3" />[HTML5 audio: Recording from a microphone onboard a helicopter, with occasional narration. The left channel contains what appears to be a data signal.]</audio></div>
<p>The signal sits alone on the left audio channel, so I can completely isolate it. Judging from the spectrogram, the modulation scheme seems to be BFSK, switching the carrier between 1200 and 2200 Hz. I demodulated it by filtering it with a low-pass and a high-pass sinc filter in SoX and comparing the outputs. Now I had a bitstream at 1200 bps.</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC1scRJZ0uYjDdYjlDDwofVMz8mdVvL7_26-4MiRLn7ThCjaJJpH-2D-ylOpCSle4LYWyRNxs-XsCzhDEvoyr-biJwzSNk2KI3CdVvmNzT-PZ6zXXoqqUUPOl3R1CloThn5Ef0lVrs5H8B/s1600/bitteja.png" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC1scRJZ0uYjDdYjlDDwofVMz8mdVvL7_26-4MiRLn7ThCjaJJpH-2D-ylOpCSle4LYWyRNxs-XsCzhDEvoyr-biJwzSNk2KI3CdVvmNzT-PZ6zXXoqqUUPOl3R1CloThn5Ef0lVrs5H8B/s500/bitteja.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC1scRJZ0uYjDdYjlDDwofVMz8mdVvL7_26-4MiRLn7ThCjaJJpH-2D-ylOpCSle4LYWyRNxs-XsCzhDEvoyr-biJwzSNk2KI3CdVvmNzT-PZ6zXXoqqUUPOl3R1CloThn5Ef0lVrs5H8B/s1000/bitteja.png 2x" alt="[Image: A nondescript oscillogram of the data signal, and below it, the signal after FM demodulation, showing a clear pattern characteristic of binary FSK switching at 1200 bps.]"/></a></div>
<p>The bitstream consists of packets of 47 bytes each, synchronized by start and stop bits and separated by repetitions of the byte <span class="code">0x80</span>. Most bits stay constant during the video, but three distinct groups of bytes contain varying data, marked blue below:</p>
<div class="saumaton kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilaBnLuqt1a9yt_RZi-XACd-HKohNk4w0hAsH0ntktYkf2Icd1iPRNJOh4jeyig2G6yD-CU9JhYfKpG0rbAzMadFS9EBsUJRBEv5FUEmcWqM5rItCHMi9Jm1uFHQoGDdI-XEr0ELK2Cssc/s1600/kopteri-bitit2.png" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilaBnLuqt1a9yt_RZi-XACd-HKohNk4w0hAsH0ntktYkf2Icd1iPRNJOh4jeyig2G6yD-CU9JhYfKpG0rbAzMadFS9EBsUJRBEv5FUEmcWqM5rItCHMi9Jm1uFHQoGDdI-XEr0ELK2Cssc/s460/kopteri-bitit2.png" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilaBnLuqt1a9yt_RZi-XACd-HKohNk4w0hAsH0ntktYkf2Icd1iPRNJOh4jeyig2G6yD-CU9JhYfKpG0rbAzMadFS9EBsUJRBEv5FUEmcWqM5rItCHMi9Jm1uFHQoGDdI-XEr0ELK2Cssc/s920/kopteri-bitit2.png 2x" alt="[Image: A time-stamped hex dump of the byte stream, arranged in packets with only a few bytes changing over time.]"/></a></div>
<p>What could it be? Location telemetry from the helicopter? Information about the camera direction? Video timestamps?</p>
<p>The first guess seems to be correct. It is supported by the relationship of two of the three byte groups. If the first 4 bits of each byte are ignored, the data forms a smooth gradient of three-digit numbers in base 10. When plotted parametrically, they form an intriguing winding curve. It is very similar to this plot of the car's position (blue, yellow) along with viewing angles from the helicopter (green), derived from the video by manually following landmarks (only the first few minutes shown):</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi658ISVvXlb7DelH0uhyphenhyphen3tOFZu7xceBh2iE7mRjmuqbyFY44goZi-ZEu3fABnneGjEZQQrdRXO6-HN5-Mf0AVOnU2pVUqp4R1nM_5ZtLFijeuJBTAjIwgT13TaCIroWbnDUT1vaDyx-s8f/s1600/positionplot.jpg" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi658ISVvXlb7DelH0uhyphenhyphen3tOFZu7xceBh2iE7mRjmuqbyFY44goZi-ZEu3fABnneGjEZQQrdRXO6-HN5-Mf0AVOnU2pVUqp4R1nM_5ZtLFijeuJBTAjIwgT13TaCIroWbnDUT1vaDyx-s8f/s460/positionplot.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi658ISVvXlb7DelH0uhyphenhyphen3tOFZu7xceBh2iE7mRjmuqbyFY44goZi-ZEu3fABnneGjEZQQrdRXO6-HN5-Mf0AVOnU2pVUqp4R1nM_5ZtLFijeuJBTAjIwgT13TaCIroWbnDUT1vaDyx-s8f/s910/positionplot.jpg 2x" alt="[Image: Screenshot from Google Earth, showing time-stamped placemarks tracing the roads of a suburb, accompanied by an X-Y plot of the changing FSK bytes that draws a very similar picture.]"/></a></div>
<p>When the received curve is overlaid with the car's location trace, we see that 100 steps on the curve scale correspond to exactly 1 minute of arc on the map!</p>
<p>Using this relative information, and the fact that the helicopter circled around the police station in the end, we can plot all the received data points in Google Earth to see the location trace of the helicopter:</p>
<div class="kuva keskella"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpo3zNOcWxPRGYQKVhPj5VseA9IIckX9g5y2NUcMqOB78Tme3U25qi0hg2ytxfo0grX5EPni35nn0XcfpSfWcCunBF8INr3jU_61b8bNCveDHAfn8iuIofHO_NX9nVgjOaBOzfqh8EGwkJ/s1600/choptrace.jpg" imageanchor="1"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpo3zNOcWxPRGYQKVhPj5VseA9IIckX9g5y2NUcMqOB78Tme3U25qi0hg2ytxfo0grX5EPni35nn0XcfpSfWcCunBF8INr3jU_61b8bNCveDHAfn8iuIofHO_NX9nVgjOaBOzfqh8EGwkJ/s460/choptrace.jpg" srcset="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpo3zNOcWxPRGYQKVhPj5VseA9IIckX9g5y2NUcMqOB78Tme3U25qi0hg2ytxfo0grX5EPni35nn0XcfpSfWcCunBF8INr3jU_61b8bNCveDHAfn8iuIofHO_NX9nVgjOaBOzfqh8EGwkJ/s910/choptrace.jpg 2x" alt="[Image: Coordinates from the whole data signal plotted on top of a Google Earth satellite photo several miles across, with a lot of circling around.]"/></a></div>
<p><b>Update:</b> Apparently the video downlink to the ground used a transmitter similar to the Nucomm Skymaster TX, which is able to send live GPS coordinates. And this is how they seem to do it.</p>
<p><b>Update 2:</b> Yes, it's 7-bit Bell 202 ASCII. I tried decoding it as 7-bit data earlier, ignoring parity, but must have gotten the bit order wrong! So I just chose a roundabout way and kept looking at the hex. When fully decoded, the stream says:</p>
<pre class="term">#L N390386 W09434208YJ
#L N390386 W09434208YJ
#L N390384 W09434208YJ
#L N390384 W09434208YJ
#L N390381 W09434198YJ
#L N390381 W09434198YJ
#L N390379 W09434188YJ</pre>
<p>These are the full lat/lon pairs of coordinates (39° 3.86′ N, 94° 34.20′ W). Nucomm says the system enables viewing the helicopter "on a moving map system". Also, it could enable the receiving antenna to be locked onto the helicopter's position, to allow uninterrupted video downlink.</p>
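<p>For completeness, a little sketch that converts such lines to decimal degrees, assuming the field layout described above (degrees followed by minutes in hundredths of a minute) and ignoring whatever trails the longitude field:</p>
<pre><code>use strict;
use warnings;

while (my $line = <STDIN>) {
  next unless $line =~ /([NS])(\d{2})(\d{4})\s+([EW])(\d{3})(\d{4})/;
  my $lat = $2 + $3 / 100 / 60;
  my $lon = $5 + $6 / 100 / 60;
  $lat = -$lat if $1 eq 'S';
  $lon = -$lon if $4 eq 'W';
  printf "%.5f, %.5f\n", $lat, $lon;   # "#L N390386 W09434208YJ" -> 39.06433, -94.57000
}</code></pre>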
<p>Thanks to all the readers for additional hints!</p>
<p>If you want to try it yourself, there's <a href="https://gist.github.com/windytan/e2ebb8ed872f89b1ec6a31dd033c7168" class="external">a shell script</a> that will run sox, minimodem, and Perl in the right order for you.</p> Oona Räisänenhttp://www.blogger.com/profile/08764440174916554983noreply@blogger.com89