Recently I solved a small problem and found it amusing enough to write a post about it.
I had a PDF document with numbers in it (ones and zeros). I needed the numbers for a program, but they were embedded as a scanned picture instead of text. Copying them by hand would be boring and error-prone. I wouldn't want any typos, seeing as the numbers themselves were supposed to be part of an error-correcting code.
Then I thought: perhaps Perl could do this for me! I came up with this:
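#!/usr/bin/perl
use strict;
use warnings;

# Ask ImageMagick for raw 8-bit grayscale: one byte per pixel, row by row.
open(my $img, '-|', 'convert', 'bitit.png', '-depth', '8', 'gray:-')
    or die "Cannot run convert: $!";
binmode($img);

# Count the dark pixels in each 27x27 pixel square of the 700x435 image.
my @dark;
for my $y (0..434) {
    for my $x (0..699) {
        read($img, my $pixel, 1) or die "Unexpected end of image data";
        $dark[int($x / 27)][int($y / 27)]++ if ord($pixel) < 127;
    }
}
close($img);

# Classify each square by its dark-pixel count; build one 26-bit value per row.
for my $y (0..15) {
    my $bits = 0;
    for my $x (0..25) {
        $bits <<= 1;
        my $count = $dark[$x][$y] // 0;
        if    ($count < 10) { print "  "; }            # empty square, counts as 0
        elsif ($count < 90) { print "1 "; $bits++; }   # little ink: a "1"
        else                { print "0 "; }            # lots of ink: a "0"
    }
    printf(" 0x%07x\n", $bits);
}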
Running the script produces:
$ perl bitit.pl
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0x2000077
0 1 0 1 0 1 1 1 0 0 1 1 1 0x10002e7
0 1 0 1 1 1 0 1 0 1 1 1 1 0x08003af
0 1 0 1 1 0 0 0 0 1 0 1 1 0x040030b
0 1 0 1 1 0 1 0 1 1 0 0 1 0x0200359
0 1 0 1 1 0 1 1 1 0 0 0 0 0x0100370
0 1 0 0 1 1 0 1 1 1 0 0 0 0x00801b8
0 1 0 0 0 1 1 0 1 1 1 0 0 0x00400dc
0 1 0 0 0 0 1 1 0 1 1 1 0 0x002006e
0 1 0 0 0 0 0 1 1 0 1 1 1 0x0010037
0 1 0 1 0 1 1 0 0 0 1 1 1 0x00082c7
0 1 0 1 1 1 0 1 1 1 1 1 1 0x00043bf
0 1 0 1 1 0 0 0 0 0 0 1 1 0x0002303
0 1 0 1 1 0 1 0 1 1 1 0 1 0x000135d
0 1 0 1 1 0 1 1 1 0 0 1 0 0x0000b72
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 0x00005b9
How does it do that? The script reads the image pixel by pixel via ImageMagick, divides it into squares of 27×27 pixels, and counts the dark pixels in every square. A "0" obviously contains more black than a "1"; the thresholds (fewer than 10 dark pixels for an empty square, fewer than 90 for a "1") were found by experimenting. An empty square has almost no black pixels at all and stands for a zero in this example. Calculating a hex value for every row is then simple: shift the running value left by one bit per square and add one for every "1".
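To play with the idea without the scanned image or ImageMagick, here is a toy sketch of the same counting trick on a hand-drawn "image". Everything in it is made up for illustration: the helper names count_dark_per_cell and classify_row, the 4-pixel squares, and the thresholds 2 and 8 are assumptions picked to fit this tiny picture, not values from the real script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count dark pixels in each square of $cell x $cell pixels.
# $pixels is a reference to rows of gray values (0 = black, 255 = white).
sub count_dark_per_cell {
    my ($pixels, $cell) = @_;
    my @count;
    for my $y (0 .. $#$pixels) {
        my $row = $pixels->[$y];
        for my $x (0 .. $#$row) {
            $count[int($y / $cell)][int($x / $cell)] += ($row->[$x] < 127) ? 1 : 0;
        }
    }
    return \@count;
}

# Hypothetical helper: turn one row of counts into symbols and an integer.
# Thresholds are parameters because they depend on font and resolution.
sub classify_row {
    my ($counts, $empty_below, $one_below) = @_;
    my ($text, $value) = ('', 0);
    for my $n (@$counts) {
        $value <<= 1;
        if    ($n < $empty_below) { $text .= '.'; }               # empty square, bit 0
        elsif ($n < $one_below)   { $text .= '1'; $value |= 1; }  # little ink: "1"
        else                      { $text .= '0'; }               # lots of ink: "0"
    }
    return ($text, $value);
}

# Toy picture, 12x8 pixels: '#' is a black pixel, '.' is white.
# Top row of squares: "1", "0", empty. Bottom row: empty, "1", "1".
my @picture = (
    '.#..####....',
    '.#..#..#....',
    '.#..#..#....',
    '.#..####....',
    '.....#...#..',
    '.....#...#..',
    '.....#...#..',
    '.....#...#..',
);

my @pixels;
for my $line (@picture) {
    push @pixels, [ map { $_ eq '#' ? 0 : 255 } split //, $line ];
}

my $counts = count_dark_per_cell(\@pixels, 4);
for my $row (@$counts) {
    my ($text, $value) = classify_row($row, 2, 8);
    # Printing the raw counts is how suitable thresholds can be found by eye.
    printf "%-4s 0x%x   (dark pixels: %s)\n", $text, $value, join(' ', @$row);
}
```

The raw counts in parentheses are the useful part when experimenting: if the counts for "1", "0" and empty squares fall into clearly separate ranges, any threshold between the ranges will do.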
I ended up having to write only slightly more characters than the image contained! :)
Wonderful =)
Guess how the old bank checks with the magnetic ink were read... There was no fancy OCR or pixel-by-pixel recognition. Each digit was shaped in a weird 1970s sci-fi computer font that had a different detectable amount of ink for each number from 0 through 9. The different amounts of ink produced different magnetic flux levels, resulting in different voltages, which became a 10-level logic signal that was then sampled and converted to binary for the computer.