"I am the atomic powered robot. Please give my best wishes to everybody!"
Those are the words uttered by Tommy, a childhood toy robot of mine. I've taken a look at his miniature vinyl record sound mechanism a few times before (#1, #2), in an attempt to recover the analog audio signal using only a digital camera. Results were noisy at best. The blog posts resurfaced in a recent IRC discussion which inspired me to try my luck with a slightly improved method.
Source photo
I will be using an old photo of Tommy's internal miniature record I already had from previous adventures in 2012. I don't want to perform another invasive operation on Tommy to take a new photograph, as I already broke a plastic tab last time I opened him. But it also means I don't have control over the photographing environment. It's part of the challenge.
The picture was taken with a DSLR and it's an uncompressed 8-bit color photo measuring 3000 by 3000 pixels. There's a fair amount of focus blur, chromatic aberration and similar distortions. But at this resolution, a clear pattern can be seen when zooming into the grooves.
This pattern superficially resembles a variable-area optical audio track seen in old film prints, and that's why I previously tried to decode it as such. But it didn't produce satisfactory results, and there is no physical reason it even should. In fact, I'm not even sure as to which physical parameter the audio is encoded in – does the needle move vertically or horizontally? How would this feature manifest itself in the photograph? Do the bright blobs represent crests in the groove, or just areas that happen to be oriented the right way in this particular lighting?
Unwrapping
To make the grooves a little easier to follow I first unwrapped the circular record into a linear image. I did this by remapping the image space from polar to 9000-wide Cartesian coordinates and then resampling it with a windowed sinc kernel:
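As a rough illustration of the unwrapping step, here is a minimal pure-Python sketch. It is not the script used for the post: the function names, the image layout (a list of rows of brightness values) and the centre and radius parameters are assumptions made for the example, and it uses simple bilinear interpolation as a stand-in for the windowed sinc kernel mentioned above.

```python
import math


def bilinear(img, x, y):
    """Sample img (list of rows) at a fractional position, clamped to the edges."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def unwrap_polar(img, cx, cy, r_min, r_max, out_w, out_h):
    """Unwrap the ring between r_min and r_max around (cx, cy).

    Output column x maps to the angle 2*pi*x/out_w, output row y maps
    linearly to a radius between r_min and r_max.
    """
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        r = r_min + (r_max - r_min) * y / max(out_h - 1, 1)
        for x in range(out_w):
            theta = 2.0 * math.pi * x / out_w
            out[y][x] = bilinear(img,
                                 cx + r * math.cos(theta),
                                 cy + r * math.sin(theta))
    return out
```

With `out_w=9000` this would produce an image as wide as the one described above; the record centre and the inner and outer radii would have to be measured from the photo.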
Mapping the groove path
It's not easy to automatically follow the groove. As one would imagine, it's not a mathematically perfect spiral. Sometimes the groove disappears into darkness, or blurs into the adjacent track. But it wasn't overly tedious to draw a guiding path manually. Most of the work was just copy-pasting from a previous groove and making small adjustments.
I opened the unwrapped image in Inkscape and drew a colored polyline over all obvious grooves. I tried to make sure a polyline at the left image border would neatly continue where the previous one ended on the right side.
The grooves were alternately labeled as 'a' and 'b', since I knew this record had two different sound effects on interleaved tracks.
This polyline was then exported from Inkscape and loaded by a script that extracted a 3-7 pixel high column from the unwrapped original, centered around the groove, for further processing.
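The extraction can be sketched roughly as follows. This is only an assumed reconstruction, not the original script: `polyline_y` and `extract_strip` are hypothetical names, the polyline is taken to be a list of `(x, y)` points sorted by x, and the strip centre is rounded to the nearest pixel row rather than resampled.

```python
def polyline_y(points, x):
    """Linearly interpolate the polyline's y at horizontal position x."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0) if x1 != x0 else 0.0
            return y0 + t * (y1 - y0)
    return points[-1][1]


def extract_strip(img, points, half_height):
    """Cut a strip of 2*half_height+1 rows that follows the polyline.

    img is a list of rows; row half_height of the result lies on the groove.
    """
    h, w = len(img), len(img[0])
    strip = [[0.0] * w for _ in range(2 * half_height + 1)]
    for x in range(w):
        yc = int(round(polyline_y(points, x)))
        for k in range(-half_height, half_height + 1):
            y = min(max(yc + k, 0), h - 1)  # clamp at the image border
            strip[k + half_height][x] = img[y][x]
    return strip
```

A `half_height` of 1 to 3 gives the 3 to 7 pixel high strips mentioned above.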
Pixels to audio
I had noticed another information-carrying feature besides just the transverse area of the groove: its displacement from center. The white blobs sometimes appear below or above the imaginary center line.
I had my script calculate the brightness mass center (weighted y average) relative to the track polyline at all x positions along the groove. This position was then directly used as a PCM sample value, and the whole groove was written to a WAV file. A noise reduction algorithm was also applied, based on sample noise from the silent end of the groove.
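A minimal sketch of the centroid-to-PCM idea, using only the Python standard library. The function names and the strip layout (a list of rows, top to bottom) are assumptions for illustration, and the noise reduction step is not reproduced here.

```python
import struct
import wave


def centroid_offsets(strip):
    """Brightness-weighted mean y of every column, relative to the strip centre."""
    h, w = len(strip), len(strip[0])
    center = (h - 1) / 2.0
    offsets = []
    for x in range(w):
        total = sum(strip[y][x] for y in range(h))
        if total <= 0:
            offsets.append(0.0)  # dark column: assume no displacement
        else:
            weighted = sum(y * strip[y][x] for y in range(h))
            offsets.append(weighted / total - center)
    return offsets


def to_pcm16(offsets):
    """Remove the DC offset and scale to the signed 16-bit range."""
    mean = sum(offsets) / len(offsets)
    centered = [v - mean for v in offsets]
    peak = max(abs(v) for v in centered) or 1.0
    return [int(round(32767 * v / peak)) for v in centered]


def write_wav(path, samples, rate):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Something like `write_wav("groove_a.wav", to_pcm16(centroid_offsets(strip)), 23250)` would then produce a playable file; the file name is made up, and 23,250 Hz is the trial-and-error rate mentioned in the comments below.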
The results are much better than what I previously obtained (see video below, or mp3 here):
Future ideas
Several factors limit the fidelity and dynamic range obtained by this method. For one, the relationship between the white blobs and needle movement is not known. The results could possibly still benefit from more pixel resolution and color bit depth. The blob central displacement (insofar as it is the most useful feature) could also be more accurately obtained using a Gaussian fit or similar algorithm.
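One cheap variant of such a fit, sketched here under the assumption that each blob's vertical brightness profile is roughly Gaussian: fit a parabola through the logarithm of the brightest pixel and its two neighbours. For a noise-free Gaussian this recovers the centre exactly. The function name and the `floor` parameter are made up for this example.

```python
import math


def gaussian_peak(column, floor=1e-6):
    """Sub-pixel position of the brightness peak in a 1-D profile.

    Uses a three-point parabola fit on log-brightness. Falls back to the
    integer peak index at the borders or when the samples do not form a peak.
    """
    n = len(column)
    i = max(range(n), key=lambda k: column[k])
    if i == 0 or i == n - 1:
        return float(i)
    # floor avoids log(0) on completely dark pixels
    a, b, c = (math.log(max(column[k], floor)) for k in (i - 1, i, i + 1))
    denom = a - 2.0 * b + c
    if denom >= 0:
        return float(i)
    return i + 0.5 * (a - c) / denom
```

On real, noisy pixels a least-squares fit over the whole column would likely be more robust than this three-point version.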
The groove guide could be drawn more carefully, as some track slips can be heard in the recovered audio.
Opening up the robot for another photograph would be risky, since I already broke a plastic tab before. But other ways to optically capture the signal would be using a USB microscope or a flatbed scanner. These methods would still be only slightly more complicated than just using a microphone! The linear light source of the scanner would possibly cause problems with the circular groove. I would imagine the problem of the disappearing grooves would still be there, unless some sort of carefully controlled lighting was used.
My first thought would be to use a BRDF capture technique to get the surface normals. Essentially use a portable flash and a static camera to take a series of photos with the light coming from different directions. Filter out all but the highest specular reflections, and combine to form a normal map.
sounds like he's saying "I am a .... colored robot."
IIRC, stereo is encoded by using horizontal wiggle as one channel and vertical wiggle as the other channel. (I don't know if 'wiggle' is the correct technical term... :p )
Wiggle is very accurate! But this was a mono record.
Actually stereo channels are recorded in the 45 degree angle to the record. Effectively that makes horizontal wiggle L+R and vertical L-R (or vice versa, can't remember the exact details).
That's a pretty neat system.
Awesome as usual! How did you calibrate the 'sampling rate' along the track?
Thanks! I used trial and error to get it somewhat close to what it actually sounds like (video) - in this case 23,250 Hz. The laser sound effect also has a hum near 120 Hz which I could have used to get closer to the original recording speed.
Have you considered popping the stepper motor from a (cheap?) flatbed scanner and having it spin the record while the light/scanline stays fixed?
I think it's rather answering the question "is it possible to recover audio from a single, ordinary photo"?
xkr47, that's an interesting idea, could be worth exploring. You're spot-on there Anonymous, perhaps I'll clarify that in the post. I wouldn't want to open up the robot again. On the other hand, these records are probably easily available on eBay.
Have you considered the option of a turntable with a stepper motor, laser and camera? The laser should give you good detail of the disk grooves. Pictures with a normal camera should give plenty of options for sampling areas. There was an earlier mention of a microscope that could be used instead of the camera.
Since you are already using a DSLR, you could take raw images instead of 8-bit JPEGs. Straight away you'd get more bit depth and a linear relationship between the amount of light and the pixel values.
were you inspired by the project IRENE? http://irene.lbl.gov/ http://www.npr.org/templates/story/story.php?storyId=11851842
and its application to Edison dolls https://www.nps.gov/edis/learn/photosmultimedia/edison-talking-doll-recordings-1888-1890.htm
Interesting! I'll have to take a look. The first link is broken unfortunately.
A broken link? On a .gov site? :) It's not uncommon for sites to block whole nations at the network level, for some misguided reason...
Here's an alternative: https://web.archive.org/web/20170822163612/http://irene.lbl.gov/
And here's the second link, why not: https://web.archive.org/web/20170822163846/http://www.npr.org/templates/story/story.php?storyId=11851842
How about attempting to play the little record on a record player and capturing the audio directly into a wav file instead of taking a photo of it?
Great idea! But the point of this post was to investigate methods for picture-only recovery, as in this case I didn't want to open the robot up again. I already broke a plastic tab before, they've become a little brittle with time.
WOW Beautiful.
It did me agree to ARSS
http://arss.sourceforge.net/
or Phonopaper https://www.warmplace.ru/soft/phonopaper/index.php ;)
Hey, I've seen that RIAA pre-emphasis and de-emphasis is used during the creation and playback of the discs. Did you use the RIAA de-emphasis after analysing the image to get the sound?