Smoother sailing: Studying audio imperfections in Steamboat Willie

[Image: Mickey Mouse whistling on the bridge of a steamboat.]

Steamboat Willie (1928) was one of the earliest cartoons with synchronized sound. That is, it had post-production sound effects; this was something new and exciting. Now that the cartoon has recently entered the public domain[bbc24] we can safely delve into its famous soundtrack. See, there's something interesting about how it sounds...

If you listen closely to the soundtrack on YouTube it sounds somehow distorted. You might be tempted to point out that it's 96 years old, yes. But you might also recognize that it is suffering from flutter, i.e. an unstable playback or recording speed.

In the spirit of this blog let's geek out for a bit and study this flutter distortion further. Can we learn something interesting? Could we perhaps learn enough to be able to reduce it?

Of course the flutter might be 100% authentic to how it sounded in theatres in the 1920s; we don't know when and why it appeared in the audio (more on that later!). It might have sounded even worse. But we can still hope to enjoy the sound effects in their original recorded form.

Prior work

I'm not the first one to notice this clip is 'fluttering' and to try and do something about it. I found videos of people's attempts to un-flutter it using Celemony Capstan, a professional tool made just for this purpose, with varying results. Capstan uses Melodyne's famous note detection engine to detect musical features and then controls a varispeed effect to cancel out any flutter.

But Capstan is expensive, and it's more fun to come up with a home-made solution anyway. And what about non-musical sounds? Besides, I had some code lying around in a forgotten desk drawer that just might fit the purpose.

Finding a high quality source

Why would I need a high-quality digital file of a poor-quality soundtrack from the 1920s? I guess it's the archivist in me hoping that it has been preserved with a high level of detail. But also, if you're going to try and dig up some hidden details in the sound, you'd want minimal interference from any lossy psychoacoustic compression, right? These artifacts might become audible after varispeed effects and could also hinder frequency detection.

[Image: Two spectrograms labeled 'random Youtube video' and '4K version', the former showing compression artifacts.]

The high-quality source I found is in the Internet Archive. It might originally come from the 4K Blu-ray release called Celebrating Mickey. The spectrogram shows almost no compression artifacts that I can see, even in the quietest frequency ranges! Perfect!

[Image: A single film frame.]

But the Internet Archive delivers something even better. There's a (visually) lossless 4K scan of the movie with the optical soundtrack partially included (above)! The high-quality version is 34 GB, but there's a downscaled 480p MP4 one thousandth of the size.

I listened to the optical soundtrack from this low-resolution version with a little pixel-reader script. Turns out the flutter is already present on the film! (Edit: Note that we don't know where this particular film print came from. When was it created? Is there an original somewhere, without flutter?)
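For the curious, here is a minimal sketch of what such a pixel reader could look like. It is not the actual script. It assumes the frames have already been decoded into grayscale NumPy arrays, that each video frame covers exactly one film frame, and that time runs from the top row to the bottom row. The names read_optical_track, col_start and col_end are made up for illustration. Averaging the brightness across the track width gives one audio sample per pixel row.

```python
import numpy as np

FPS = 24  # assumed playback rate of the scan


def read_optical_track(frames, col_start, col_end):
    """Turn the soundtrack strip of each frame into audio samples.

    frames: iterable of 2-D grayscale arrays (rows x columns), top row first.
    col_start, col_end: hypothetical pixel columns bounding the soundtrack.
    """
    pieces = []
    for frame in frames:
        strip = np.asarray(frame, dtype=float)[:, col_start:col_end]
        # Average brightness across the track width: one sample per pixel row.
        pieces.append(strip.mean(axis=1))
    audio = np.concatenate(pieces)
    # Remove the DC offset and scale to the range -1..1.
    audio -= audio.mean()
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio


def sample_rate(rows_per_frame, fps=FPS):
    """Each pixel row is one sample, so the rate is rows per frame times fps."""
    return rows_per_frame * fps
```

Under these assumptions a 480-row video would give 480 * 24 = 11520 samples per second, which is plenty for spotting a slow wobble.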

Hand-guiding a frequency tracker

Looking at the above spectrogram, we can see that the frequency of everything is zig-zagging as a function of time – that's flutter all right. But how to quantify these variations? We could zoom in on one of the frequency peaks and follow the course of its frequency in time. I'm using FFT peak interpolation to find more accurate frequency estimates[gasior04].
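As a rough illustration of peak interpolation, the sketch below fits a parabola through the log magnitudes of the strongest FFT bin and its two neighbours, using a Hann window. This is one common variant and not necessarily the exact one used here. The function name and the f_lo/f_hi search band are assumptions for the example.

```python
import numpy as np


def interpolated_peak(frame, sample_rate, f_lo, f_hi):
    """Estimate the frequency (Hz) of the strongest peak between f_lo and f_hi."""
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    k = band[np.argmax(spectrum[band])]
    if k == 0 or k == len(spectrum) - 1:
        return freqs[k]  # no neighbours to interpolate with
    # Fit a parabola through the log magnitudes of the peak bin and its neighbours.
    a, b, c = np.log(spectrum[k - 1:k + 2] + 1e-12)
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    # The vertex of the parabola gives a fractional bin index.
    return (k + offset) * sample_rate / n
```

With a 1024-point frame at 11520 Hz the bins are 11.25 Hz apart, so without interpolation a speed wobble of a percent or two on a tone around 1 kHz would barely move the peak from one bin to the next.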

Take the sound of Pete's tobacco hitting the ship's bell around the 01'45'' mark. You'd think a bell is supposed to have a constant frequency, yet this one sounds quite unstable. We can follow any one of the harmonics and see how the playback speed (bell frequency) varies over the period of one second:

[Image: Spectrogram with fluctuating tones.]

To my eye, this oscillation looks periodic and not random at all. We can run another round of FFT on a longer stretch of samples to find the strongest period of these fluctuations: it turns out to be 15 Hz. (Why 15? I so hoped it would be 24 Hz – it would have made a more interesting story! More on that later...)

[Image: Spectrum plot showing a peak at 15.0 Hz about 15 dB higher than background.]
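The second round of FFT could be sketched like this: treat the tracked frequency as a signal of its own, remove its average, and look for the strongest line in its spectrum. The names dominant_flutter_rate and track_rate, and the f_min cutoff that ignores slow drift, are assumptions for the example.

```python
import numpy as np


def dominant_flutter_rate(freq_track, track_rate, f_min=1.0):
    """Return the strongest modulation frequency (Hz) in a frequency track.

    freq_track: tracked peak frequency, one value per analysis hop.
    track_rate: number of track values per second (hypothetical name).
    f_min: ignore drift slower than this rate.
    """
    track = np.asarray(freq_track, dtype=float)
    track = track - track.mean()  # drop the average pitch, keep only the wobble
    spectrum = np.abs(np.fft.rfft(track * np.hanning(len(track))))
    rates = np.fft.rfftfreq(len(track), d=1.0 / track_rate)
    spectrum[rates < f_min] = 0.0
    return rates[np.argmax(spectrum)]
```

The frequency resolution is the inverse of the track length, so a longer stretch of samples gives a sharper answer.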

Okay, so can we repeat this process for the whole movie? I don't think we can just automatically follow the frequency of every peak, since some sounds will naturally contain vibration and rises and drops in frequency. Not all of it is due to flutter. Some sort of a vetting process is needed. We could try a tedious manual route...

[Image: GUI of a software with spectrograms and oscillogram plots.]

I made a little software tool (above) where I could click and drag little boxes onto a spectrogram to mark where to search for peaks. The peak frequency inside each box traces out a wobbly line, which is then simply taken to be the speed variation (red graph in the top picture).

It became quite a chore to annotate longer sounds as this software didn't come with undo, edit, or save features for the longest time!

Now let's think about what to do with this speed information...

Desk drawer deep dive

Some time ago I had made a tool that could well come in handy now. It was for correcting wobbly wideband radio recordings stored on VHS tapes. These recordings contained some empty carriers that happened to work like seismographs, accurately recording the tape speed variations. The tool then used a Lagrange polynomial to interpolate new samples at a steady interval, so-called 'digital varispeed'.

It was ultimately based on an interesting paper on de-fluttering magnetic tapes using the tape bias signal as reference[howarth04].
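To make the Lagrange step a bit more concrete, here is a minimal sketch of reading a sample at a fractional position with a cubic (four-point) Lagrange polynomial. The order of the polynomial in the original tool isn't stated, so the choice of four points is an assumption, as is the function name. Positions are assumed to be non-negative.

```python
def lagrange4(samples, position):
    """Interpolate samples at a fractional position with a cubic Lagrange polynomial.

    Uses the four samples around the position; indices are clamped at the edges.
    """
    i = int(position)   # sample just before the requested position
    d = position - i    # fractional part, 0 <= d < 1
    last = len(samples) - 1
    y0, y1, y2, y3 = (samples[min(max(i + k, 0), last)] for k in (-1, 0, 1, 2))
    # Lagrange basis weights for nodes at -1, 0, 1, 2, evaluated at d.
    w0 = -d * (d - 1.0) * (d - 2.0) / 6.0
    w1 = (d + 1.0) * (d - 1.0) * (d - 2.0) / 2.0
    w2 = -(d + 1.0) * d * (d - 2.0) / 2.0
    w3 = (d + 1.0) * d * (d - 1.0) / 6.0
    return w0 * y0 + w1 * y1 + w2 * y2 + w3 * y3
```

Calling this at positions that advance by slightly more or less than one sample per step is what produces the varispeed effect.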

[Image: Buttons of an old device, one of them Varispeed, labeled 1981. Below, part of a GUI with the text Varispeed, labeled 2023.]

By the way, I keep mentioning varispeed and never explained it. This was a feature of old studio-grade reel-to-reel tape recorders where the playback speed could be freely varied by the operator; hence vari+speed. Audio people still use this word in the digital world to essentially refer to variable-rate resampling, which has the same effect, so I'm using them interchangeably. (Topmost photo: Ferdinando Traversa, CC BY, cropped to detail)

Here's what this digital varispeed sounds like when exaggerated. In the example below I'm doing it in a simpler way. Instead of the Lagrange method I first upsampled some music by 10x in audio software; hand-drew a speed curve in Audacity; and then used that curve to pick samples out of the oversampled music:

[Image: A waveform in Audacity.]
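The sample-picking idea can be sketched in a few lines. This is only an illustration: the linear oversample_linear helper is a crude stand-in for the band-limited upsampling an audio editor would do, and both function names are made up.

```python
def oversample_linear(samples, factor=10):
    """Crude stand-in for proper upsampling: linear interpolation between samples."""
    out = []
    for i in range(len(samples) - 1):
        for k in range(factor):
            out.append(samples[i] + (samples[i + 1] - samples[i]) * k / factor)
    out.append(samples[-1])
    return out


def varispeed_pick(oversampled, speed_curve, factor=10):
    """Pick one oversampled value per output sample, advancing at the given speed.

    speed_curve[n] is the playback speed for output sample n (1.0 = normal).
    """
    out = []
    position = 0.0  # read position, in units of original samples
    for speed in speed_curve:
        index = int(round(position * factor))
        if index >= len(oversampled):
            break  # ran out of input
        out.append(oversampled[index])
        position += speed
    return out
```

A constant speed of 1.0 returns the original samples, 2.0 plays at double speed, and anything wobbling around 1.0 adds flutter. The 10x oversampling simply limits how far the nearest available sample can be from the wanted position.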

Carefully controlled, this effect can be used to cancel out flutter. Here's how: If we knew exactly how the playback speed was fluctuating we could instantly vary the speed of our resampler in the opposite direction, thus canceling the variations. And with the above research we now have that knowledge!
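In code, "varying the speed in the opposite direction" could look roughly like the sketch below. It assumes we have a relative speed estimate for every input sample (1.0 meaning nominal speed) and uses plain linear interpolation for brevity instead of the Lagrange method. The function name is hypothetical.

```python
def cancel_flutter(samples, speed):
    """Resample so that measured speed variations are undone.

    samples: the fluttering audio.
    speed:   estimated relative speed at each input sample (1.0 = nominal),
             same length as samples.
    """
    out = []
    position = 0.0
    last = len(samples) - 1
    while position < last:
        i = int(position)
        frac = position - i
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        # Where the recording runs fast, take smaller steps, and vice versa.
        local_speed = speed[i] * (1.0 - frac) + speed[i + 1] * frac
        position += 1.0 / local_speed
    return out
```

The read position advances by the reciprocal of the local speed, so a stretch that was recorded 2% fast is read 2% slower and comes out at the nominal pitch.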

Well, almost. I couldn't always see a clear frequency peak to follow, so the graph is patchy. But... maybe it could help to band-pass the speed signal at 15 Hz? This would help fill out small gaps and also preserve vibrato and other fluctuations that aren't part of the flutter distortion. We can at least try!

[Image: Two waveforms, one of them piecewise and noisy, the other one smooth and continuous.]

In the example above, I replaced empty parts with a constant value of 100% and then filtered the whole thing. This sews the disjointed parts together in a smooth way.
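A minimal sketch of this gap-filling and filtering step follows. The filter used for the post isn't specified, so the sketch swaps in a simple brick-wall band-pass done in the FFT domain. The pass-band width (half_width), the use of NaN to mark gaps, and the function name are all assumptions.

```python
import numpy as np


def smooth_speed_curve(speed, track_rate, centre=15.0, half_width=3.0):
    """Fill gaps with 1.0 (100 %) and keep only the band around the flutter rate.

    speed: relative speed estimates, with NaN where no peak could be tracked.
    track_rate: speed values per second.
    centre, half_width: pass band in Hz; the width is a guess, not from the post.
    """
    filled = np.where(np.isnan(speed), 1.0, np.asarray(speed, dtype=float))
    deviation = filled - 1.0  # filter the wobble around 100 %
    spectrum = np.fft.rfft(deviation)
    rates = np.fft.rfftfreq(len(deviation), d=1.0 / track_rate)
    # Zero everything outside the pass band, including slow vibrato-like drift.
    spectrum[np.abs(rates - centre) > half_width] = 0.0
    return 1.0 + np.fft.irfft(spectrum, n=len(deviation))
```

A narrower pass band bridges gaps more smoothly but follows changes in the flutter less closely, so the width is a trade-off.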

Can we hear some examples already?

This clip is from when the goat ate Minnie's sheet music and guitar – the apparent catalyst event that sent Mickey Mouse to seek revenge on the entire animal kingdom.

Before [Image: Movie screenshot]
After

You can definitely hear the difference in the bell-like sounds coming from the goat's insides. It even sounds like the little flute notes in the beginning are easier to tell apart in the corrected version.

Here's another musical example, with strings.

Before [Image: Movie screenshot]
After

The cow's moo. That's a hard one because it's so rich in harmonics; in the spectrogram it looks almost like a spaghetti bolognese. My algorithm is constrained to a box and can't stay with one harmonic when the 'moo' slides in frequency. You can hear some artifacts because of this, but still the result sounds less sheep-like than the original.

Before [Image: Movie screenshot]
After

But Mickey whistling "Steamboat Bill" in the beginning of the film actually doesn't sound better when corrected... I preferred a bit of vibrato!

Before [Image: Movie screenshot]
After

Sidetrack 1: Anything else we can find?

Glad you're still reading! Let's step away from flutter for a while and take the raw audio track itself under the Fourier microscope. Zooming closer, is there anything interesting in the lower end?

[Image: Spectrogram showing a frequency range from 0 to 180 Hz.]

We can faintly see peaks at multiples of both 24 and 60 Hz. No surprises there, really... 24 Hz being the film framerate and 60 Hz the North American mains frequency. Was there a projector running in the recording studio? Or maybe it's an artifact of scanning the soundtrack one frame at a time? In any case, these sounds are pretty weak.

[Image: Spectrogram showing tones with apparent sidebands.]

In some places you can see some sort of modulation that seems to be generating sidebands, just like in radio signals. It's especially visible in Mickey's whistle when it's flutter-corrected, here at the 5-second mark. The sideband peaks are 107 and 196 Hz away from the 'carrier' if you will. I'm not sure what this could be. Fluctuating amplitude?

Sidetrack 2: Playing sound-on-film frame by frame?

This is an experiment I did some time ago. It's just a silly thought - what would happen if the soundtrack was being read in the same way as the picture is – stopped 24 times per second? Would this be the ultimate flutter distortion?

In the olden days, sound was stored on the film next to the picture frames as analog information. Unlike the picture frames that had to be stopped momentarily for projection, the sound had to be played at a constant speed. There was a complicated mechanism in the projector to make this possible.

I found some speed curves for old-school movie projectors in [bickford72]. They describe the film's deceleration and acceleration during these stops. Let's emulate these speed curves in audio with the oversampling varispeed method.
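To give an idea of what such a stop-and-go speed curve looks like as data, here is a sketch that builds one frame period of a completely made-up curve: the film stands still for most of the period and is then pulled down along a raised-cosine bump. The shape and the dwell_fraction parameter are placeholders and do not reproduce the measured curves in [bickford72]. The one property that any such curve needs is an average speed of exactly 1.0, so that the film still advances one frame per frame period.

```python
import math


def intermittent_speed_curve(samples_per_frame, dwell_fraction=0.75):
    """One frame period of a made-up stop-and-go speed curve with mean speed 1.0.

    The film stands still for dwell_fraction of the period and is then pulled
    down along a raised-cosine bump. Real projector curves will differ.
    """
    dwell = int(samples_per_frame * dwell_fraction)
    moving = samples_per_frame - dwell
    bump = [1.0 - math.cos(2.0 * math.pi * (k + 0.5) / moving) for k in range(moving)]
    # Scale so the curve sums to one frame's worth of samples (mean speed 1.0).
    scale = samples_per_frame / sum(bump)
    return [0.0] * dwell + [b * scale for b in bump]
```

At 48 kHz and 24 frames per second one period is 2000 samples. Repeating the curve for every frame and feeding it to a variable-rate resampler gives the frame-by-frame playback.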

The video below is a 3D animation where this same speed curve controls an animation of a moving film in an imaginary machine. The clip is from another 1920s animation, Alice in the Wooly West (1926).

~~ Now we know ~~

Conclusions

  • We found a 15 Hz speed fluctuation that was, to some extent, reversible.
  • This flutter signal is already present in the optical soundtrack of a film scan (of unknown origin).
  • With enough manual work, much of the soundtrack could probably be 'corrected'.
  • 'Hmm, that sounds odd' are sometimes the words of a white rabbit.

References

14 comments:

  1. Amazing work!

  2. This commenter on lobste.rs points out that you're exactly right about there being a projector in the recording studio - it was used to keep the orchestra in sync. https://lobste.rs/s/zv2a2p/smoother_sailing_studying_audio#c_jcfwbd

    Replies
    1. The scan I found being a later reprint could explain where the audio got distorted. I've also wondered why the contrast in the soundtrack is so much greater than in the picture. In fact it's 0% black vs. 100% white.

    2. When a composite print was made, one with both picture and optical sound, the machinery that did the developing of the composite print frequently had a means of deploying a different high-contrast chemical developer to the area where the optical sound was, from the developer used for the image area. This resulted in the extremely high-contrast region you noticed in the optical sound, whereas the image area was grayscale. I can’t say that’s how the print of Steamboat Willie you found was made, but I’d guess it is likely. I worked in a film laboratory in the 1980s and the equipment to do that localized-developer application on color release film stock was quite interesting!

  3. Very impressive! When you noticed the volume fluctuation, I thought the same: there's a tremolo effect over the audio as well. Not sure if it's in phase with the vibrato but it might be.
    While it's academically an interesting thing to try to correct, it also destroys some of the 'old movie' vibe at the same time. It also adds a bit of that Auto-Tune effect, which is jarring for something so old (heck, it's jarring even in today's music when overused).
    Since it's in the optical sound track, people must've experienced the same vibrato sound when first viewed in the theatres.

    Replies
    1. Fully agree about Auto-Tune :) I actually want to believe that the original screening version sounded all right and that this must be a mistake that happened in a later copy. But more research is needed.

  4. Are all the corrective 15Hz sections phase-aligned with each other? I.e. would a global correction be worth trying? I wonder if there was a synchronous motor, a 4:1 gear reduction somewhere and a bad gear causing 60Hz to modulate at 15Hz.

    Replies
    1. Might be worth exploring! The sections do form a contiguous oscillation when connected. But if you mean whether the 15 Hz is locked to the sample clock, then no, the exact phase fluctuates. I'm currently waiting for the big download to finish so I can test another theory: whether there is any relation to the frame rate. 15 = 5/8 * 24, so it could be far-fetched.

  5. Are you sure that 24fps was the rate of the original animation? Many films from the early days were shot at variable frame rates including 15fps.

    Replies
    1. It's difficult to be sure, but when I play the film scan at 24 fps the music sounds the most natural. Parts of the animation may stay in place for the duration of a few frames, but it looks like there is some movement in all the 24 frames in a second.

  6. I wonder if we “prefer” the flutter version of Mickey Mouse whistling because it is so familiar to us, especially since Disney has used it for their logo over the past five years. In fact, that’s an excellent question: does that Disney “Steamboat Willie” logo correct for flutter?

    Replies
    1. Interesting question! I found it, it has the same flutter/vibrato in the logo audio: https://www.youtube.com/watch?v=7Y_Vh6zH8q8

      A few people have mentioned that the flutter adds an 'old-timey' feel to the movie. And that it was difficult to stabilize recording speeds with the technology of that time. But listening to e.g. classical 78rpm recordings from 1928, they sound absolutely listenable by today's standards and have no audible fast flutter. That's what makes me suspect it's not original to the film.

