Speech to birdsong conversion

I had a dream one night where a blackbird was talking in human language. When I woke up there was actually a blackbird singing outside the window. Its inflections were curiously speech-like. The dreaming mind only needed to imagine a bunch of additional harmonics to form phonemes and words. One was left wondering if speech could be transformed into a blackbird song by isolating one of the harmonics...

One way to do this would be to:

  • Find the instantaneous fundamental frequency and amplitude of the speech. For example, filter the harmonics out and use an FM demodulator to find the frequency. Then find the signal envelope amplitude by AM demodulation.
  • Generate a new wave with similar amplitude variations but greatly multiplied in frequency.
[Image: Signal path diagram.]
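
As a rough illustration of the two steps above, here is a minimal Python sketch. It is not the Perl-SoX-csdr script. The function names `track_fundamental` and `resynthesize` are made up for this example, and so are the default values: a 150 Hz center frequency, a 60 Hz filter bandwidth, four filter stages and a frequency multiplier of 8. It assumes the fundamental stays close to `center_hz`. The speech is mixed down to complex baseband, lowpass filtered to suppress the harmonics, and then FM and AM demodulated sample by sample.

```python
import cmath
import math


def track_fundamental(samples, rate, center_hz=150.0, bandwidth_hz=60.0, stages=4):
    """Estimate instantaneous frequency (Hz) and amplitude near center_hz.

    All default values are illustrative assumptions, not tuned settings.
    """
    # Coefficient of a one-pole lowpass with roughly bandwidth_hz cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * bandwidth_hz / rate)
    state = [0j] * stages
    prev = 0j
    freqs, amps = [], []
    for n, x in enumerate(samples):
        # Mix the band around center_hz down to 0 Hz (complex baseband).
        y = x * cmath.exp(-2j * math.pi * center_hz * n / rate)
        # Cascaded one-pole lowpass: keeps the fundamental, attenuates harmonics.
        for i in range(stages):
            state[i] += alpha * (y - state[i])
            y = state[i]
        # FM demodulation: phase advance per sample is the frequency offset.
        offset_hz = cmath.phase(y * prev.conjugate()) * rate / (2.0 * math.pi)
        freqs.append(center_hz + offset_hz)
        # AM demodulation: magnitude of the baseband signal (x2 for a real input).
        amps.append(2.0 * abs(y))
        prev = y
    return freqs, amps


def resynthesize(freqs_hz, amps, rate, multiplier=8.0):
    """Generate a sine wave that follows the tracked pitch, multiplied in frequency."""
    phase = 0.0
    out = []
    for f, a in zip(freqs_hz, amps):
        # Accumulate phase so the frequency can change without clicks.
        phase = (phase + 2.0 * math.pi * multiplier * f / rate) % (2.0 * math.pi)
        out.append(a * math.sin(phase))
    return out
```

Calling `resynthesize(*track_fundamental(samples, rate), rate)` on a list of float samples gives the converted signal. For real speech, `center_hz` and `bandwidth_hz` would have to be found by trial and error for each voice, and a pure-Python loop like this is slow at audio sample rates.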

A proof-of-concept script using the Perl-SoX-csdr command-line toolchain is available (source code here). The result sounds surprisingly blackbird-like. Even the little trills are there, probably as a result of FM noise or maybe vocal fry at the end of sentences. I got the best results by speaking slowly and using exaggerated inflection.

Someone hinted that certain types of automatic announcements have the perfect inflection for this kind of conversion. And it seems to be true! Here, a noise gate and reverb have been added to the result to improve it a little:

And finally, a piece of sound art where this synthetic blackbird song is mixed with a subtle chord and a forest ambience:

Think of the possibilities: A simultaneous interpreter for talking to birds. A tool for dubbing talking birds in animation or live theatre. Entertainment for cats.

What other birds could be done with a voice changer like this? What about croaky birds like a duck or a crow?

34 comments:

  1. Could you do the inverse process, from bird to human language? :)

    1. Because so much information is lost in the process, the reverse would require completely re-imagining it. In other words, it would require a dreamer. :) Perhaps something like DeepDream could do it?

  2. What if you were to reverse the chain, and convert some blackbird speech to human speech?

    1. The process is destructive, so you can't reverse it.

      It would be like painting black over a painting. If I give you a black canvas, you can't guess what was painted before and reconstruct it. You'd have to make some opinionated choices.

  3. Just lovely. Now you simply need to write a song to speech decoder!

  4. This is beautiful! I'm glad I stumbled across your blog in a Hacker News comment thread.
    I'm reminded of this [https://www.youtube.com/watch?v=-JftSgb69JY] composition by Chris Hughes, who was inspired by Steve Reich's 1967 notes on a slow-motion score: "Very gradually slow down a recorded sound to many times its original length without changing its pitch or timbre at all."

  5. Thanks! Sorry to ask a signals question, but do you have any search terms or links or pointers so I can learn more about using FM demodulation to find the fundamental? It's kind of melting my brain trying to understand how that could work. Thanks!

    1. FM demodulation works here because I first lowpass filtered the speech so that it only (or mostly) contained the fundamental frequency. I found the right cutoff frequencies by trial and error. Now, because FM encodes information in the signal frequency, we can extract this information by FM demodulation. The magnitude of the result is proportional to the frequency (of the fundamental, in this case).

      There are more robust methods out there; you could search for "pitch detection algorithm".

    2. Thanks for the reply. I think I have a glimmer now. :-) I've actually written some pitch detection software and studied that subject a bit, which is why I was surprised to read about your method because I had never heard of it. Is it a less precise approach than e.g. Schmitt triggering? And it sounds from what you wrote as if you want a relatively pure tone to use this approach, hence the low-pass filtering?

    3. It can be precise, but it only does the right thing for a pure tone; there's a lot of contamination if there are any harmonics. But in this case it's not a big deal, since it's for art. Also, the inflections in my voice mostly span less than one octave, so the second harmonic should be easy to filter out. FM demodulation was the first thing that came to my mind that can quickly be done with command-line tools familiar to me (csdr), so that's why I chose it.

  6. This reminded me of a Spanish language called Silbo Gomero, which is whistled. https://en.wikipedia.org/wiki/Silbo_Gomero

  7. You should really encode a formant (the second one carries the most information, I think) instead of the fundamental frequency, especially for a non-tonal language like English. The fundamental frequency carries way too little information.

  8. Can birds understand any of this? I.e., can you elicit specific bird behaviors, like "bring me a worm"?

  9. This is incredible! I was listening to the songbirds when someone showed me this, I love it. What do you use to make the pretty purple and green diagrams?

    1. Thanks! I design them myself in Inkscape. I've made a free-to-use SVG that contains the styles and some of the elements: signalflow.svg

  10. Thank you for creating this. Your blog is very inspirational.

  11. Do you have an app, or would you be willing to create one, so we could record our voice through the computer and turn it into birdsong to greet our friends and so forth? What fun. ;-}

  12. An NPR radio piece on the same subject: https://www.npr.org/2021/04/16/988200892/heres-what-all-things-considered-sounds-like-in-blackbird-song

  13. As a poet, I would love to take some lines from my poetry or favorite poems and translate them into birdsong. Maybe one day there will be a method to easily record and convert. This is truly beautiful. Thank you.

  14. Is this translator hosted somewhere that it can be used? I'd love to try it!

    1. Unfortunately it's not. Would be really cool!

  15. Trrr
    Uwhheeeee
    Shreeee
    Ssssssss
    Tweeeeeeee

  16. Just heard your NPR interview - very interesting!!
    For a related project, consider converting voice to a whistled language (Silbo Gomero) or a drum language. Here's a good reference paper.
    Rialland, A. (2005). "Phonological and phonetic aspects of whistled languages".
    https://core.ac.uk/download/pdf/191755708.pdf

  17. I love this! This is probably a dumb question, but is there a way to open/recreate this in MaxMSP?

    1. It *should* be possible. It needs to be re-thought a little since there are no complex signals in MaxMSP (as far as I know). Also there's no MSP object to demodulate FM. I tried to get a fundamental frequency with fzero~ instead, but the result sounds quite different, not at all smooth. Here's the MSP patch I tried: [adc~] -- [fzero~ @threshold 0.01] -- [sig~] -- [*~ 8] -- [cycle~] -- [*~ 0.1] -- [dac~]. Additionally you could scale the resulting signal with the output 2 from fzero~ (using [*~]).

  18. Fascinating! From your sample it seems certain phonemes generate more bird-like sounds than others (at least with the given processing).
    That makes me wonder if certain human languages on average produce better results than others in this context :)

