Sample solutions and discussion
Perl Expert Quiz of The Week #19 (20040712)

banner(1) and cursive(1) are old UNIX utilities that take ASCII input (0-127) and render the input in a hard-coded font like the examples below. Later, FIGlet was created to use a plain text format font file to allow a wider variety of output styles.

    % banner japh
          #    #    ######  #     #
          #   # #   #     # #     #
          #  #   #  #     # #     #
          # #     # ######  #######
    #     # ####### #       #     #
    #     # #     # #       #     #
     #####  #     # #       #     #

    % cursive japh
    / o __. _ /_ /_(_/|_/_)_/ /_ / / -' '

Write a program to do "optical" character recognition of ASCII-Art text. Given an input string provided on STDIN, print the text represented by the ASCII-art to STDOUT. There are three milestones for this puzzle.

1) Process any output produced by banner(1) and print the result to STDOUT. Your program should obviously print "JAPH" or "japh" at your discretion, if given the output from banner above; but not "jAph".

If you do not have banner(1), or your banner prints vertically instead of horizontally, use the banner font for FIGlet. Both the font and FIGlet itself are available at
    http://figlet.org/ OR ftp://pthbb.org/mirror-mirror/figlet/

Alternatively, you can use one of the many web interfaces for generating FIGlet output, such as
    http://wfiglet.handalak.com/

For the next two milestones you need to download some FIGlet fonts, available at
    ftp://ftp.figlet.org/pub/figlet/fonts/contributed.tar.gz OR ftp://pthbb.org/mirror-mirror/figlet/fonts/contributed.tar.gz

The FIGlet font file format is rather intuitive, but it is described at length in the FIGlet man page, widely available online, including at
    http://www.redstone.army.mil/documents/figlet-2.1.1.man.html

A tarball with a special issue of the Text::FIGlet module is available at
    ftp://pthbb.org/pub/misc/qotw.tgz
It includes Perl versions of figlet and banner, and the figlet man page.

2) Process any output produced by FIGlet with the basic and drpepper fonts. Your output should be as before.
In addition to the text to process from STDIN, you must accept an argument -f= specifying the path to the font the input was created with.

    % figlet -f basic -A japh
    d88b .d8b. d8888b. db db `8P' d8' `8b 88 `8D 88 88 88 88ooo88 88oodD' 88ooo88 88 88~~~88 88~~~ 88~~~88 db. 88 88 88 88 88 88 Y8888P YP YP 88 YP YP
    % !! | yourprogram.pl -f=figlib/basic.flf
    JAPH

    % figlet -f drpepper -A japh
    _ _ <_> ___ ___ | |_ | |<_> || . \| . | | |<___|| _/|_|_| <__' |_|
    % !! | yourprogram.pl -f=figlib/drpepper.flf
    japh

3) Process any output produced by FIGlet with the argument -m-1 and a font from the library listed above. You should accept an argument -d= which specifies the location of the font library. Programmatically determine the font used to render the provided input and process accordingly. Your output should be as before, with the name of the font chosen printed to STDERR before the results are printed to STDOUT.

    / # #/ ### ## # ## ## ### /### /### ## /## ### / ### / / ### / ## / ### ## / ###/ / ###/ ##/ ### / ## ## ## ## ## ## / ## ## ## ## ## ## ### ## ## ## ## ## ## ### ## ## ## ## ## ## ### ## /# ## ## ## ## ### ####/ ## ####### ## ## ## ### ## ###### ## ## ## ## / / ## / / ## / / ## /

PS> For the interminably curious: to see why the condition of -m-1 is imposed, compare the output of the font slscript with and without it.

----------------------------------------------------------------

The goal of this week's quiz was to write a program to perform "OCR" on ASCII-Art and determine what text the image represents.
For details see the original lengthy problem statement at
    http://perl.plover.com/qotw/e/019

The quiz had three milestones, and the four participants' entries were as follows:

Milestone 1 (M1):
    Jurgen Pletinckx      http://perl.plover.com/~alias/list.cgi?1:mss:1892
    Patrick LeBoutillier  http://perl.plover.com/~alias/list.cgi?1:mss:1888
    Jerrad Pierce         http://perl.plover.com/~alias/list.cgi?1:mss:1878

Milestone 2 (M2):
    Jurgen Pletinckx      Same as above
    Jerrad Pierce         Same as above

Milestone 3 (M3):
    Jurgen Pletinckx      Same as above
    Jerrad Pierce         http://perl.plover.com/~alias/list.cgi?1:mss:1891
    Ronald J Kimball      http://perl.plover.com/~alias/list.cgi?1:mss:1889

I expected most people to use the same program for each milestone, or at least for M1 and M2; it seemed easier to develop a script and enhance it to comply with subsequent requirements.

Not to get ahead of myself, but I think it's interesting, and speaks to the nature of the challenge, that all of the submissions require perfect input in order to work. My own submission for M3 fails to recognize "hello" in fraktur because of a single mismatched "pixel".

All of the programs could be described on some level as "brute force", though some more than others. Jurgen and Ronald's solutions for M3 process the input with all available fonts, and as a result they report false positives for the rot13 and term fonts.

There were four approaches to the guts of character matching: substr, shift, regexp and serialization. Jurgen used substr on each row of input to extract the section of input corresponding to a figlet character for comparison. Patrick used a combination of substr and serialized figlet characters (a one line representation for easy storage and retrieval in a lookup table on disk). Ronald showed his expert knowledge by using /\G//g against a string of serialized input that had effectively been rotated 90 degrees clockwise.
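
The rotate-then-match idea can be sketched roughly as follows. This is an illustrative reconstruction, not any participant's code: the two-glyph, three-row font and the names rotate_cw, serialize and recognize are all made up for the example, and the rotation is done by hand rather than with Text::Orientation. Rotating the input 90 degrees clockwise turns each glyph's columns into a contiguous run of lines, so a \G-anchored match can consume the serialized input one glyph at a time.

```perl
use strict;
use warnings;

# Rotate equal-length rows 90 degrees clockwise: column i of the
# input, read bottom to top, becomes row i of the output.
sub rotate_cw {
    my @rows  = @_;
    my $width = length $rows[0];
    return map {
        my $col = $_;
        join '', map { substr($_, $col, 1) } reverse @rows;
    } 0 .. $width - 1;
}

# Serialize a block of rows: rotate it, then emit one rotated row per line.
sub serialize { return join '', map { "$_\n" } rotate_cw(@_) }

# Hypothetical miniature font, three rows high (not a real FIGlet font).
my %font = (
    L => [ '#  ',  '#  ',  '## '  ],
    T => [ '### ', ' #  ', ' #  ' ],
);

# Map each glyph's serialized form back to its character.
my %glyph_for;
for my $char (keys %font) {
    $glyph_for{ serialize(@{ $font{$char} }) } = $char;
}

# Longest alternatives first, so a wide glyph is tried before a narrower
# glyph that happens to be its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %glyph_for;

# Returns the recognized text, or undef if any part of the input is left
# unmatched (a single stray "pixel" is enough to cause that).
sub recognize {
    my $input = serialize(@_);
    my $text  = '';
    while ($input =~ /\G($alternation)/gc) {
        $text .= $glyph_for{$1};
    }
    my $consumed = pos($input) // 0;
    return $consumed == length($input) ? $text : undef;
}

print recognize('#  ### ', '#   #  ', '##  #  '), "\n";   # prints "LT"
```

Because every glyph must match exactly, this sketch shares the weakness noted above: changing one character of the input makes recognize return undef.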
Jerrad had a similarly interesting strategy: rotating the input 90 degrees clockwise with Text::Orientation in order to swap columns for rows, making it easy to shift lines off the resulting array as they are used in a match. Jerrad's program also included several optimizations, including a character frequency table and short-circuiting once a sole potential match remained. Jurgen's program had an internal requirement that characters be tested widest first, whereas the others tested in ASCII order. Jerrad and Jurgen's programs both performed many more tests to acquire a match, because the tests for equality sat inside several levels of loops.

I (Jerrad) had experimented with something more "liberal" that would have alleviated this and made the engine less finicky about input: "low-resolution" matching. I checked only the 0th, 3rd, and 6th columns for equality. It worked rather well; however, h, m and n were often indistinguishable. That could have been fixed with a progressive scan scheme, but at the expense of readily avoiding an infinite loop for fonts such as basic, where lowercase and uppercase have the same representation.

Jurgen and Ronald's M3 submissions were the only ones that worked out of the box. All of the font-matching engines excluded fonts whose height did not evenly divide the input height; 16 line high input allowed for fonts 4 and 8 lines high, but not fonts 6, 9 or 10 lines high. Jurgen and Ronald both then proceeded to process the text against the remaining fonts. Jerrad adapted Patrick's fingerprint cache on disk, eliminating fonts that did not include all of the characters used in the input; alas, this high pay-off optimization is another example of requiring perfect input.

[ Thanks to Jerrad Pierce for running the QOTW this week. New quiz tomorrow. -MJD ]
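
As an addendum to the font-matching discussion, the height test used to discard fonts can be sketched as below. This is a minimal illustration rather than code from any submission: the function names font_height and candidate_fonts and the font names in the example are hypothetical. It relies only on the FIGlet font header, whose second whitespace-separated field is the glyph height.

```perl
use strict;
use warnings;

# Extract the glyph height from the first line of a .flf file,
# e.g. "flf2a$ 6 5 20 15 3 0 143 229" has a height of 6.
sub font_height {
    my ($header) = @_;
    my (undef, $height) = split ' ', $header;
    return $height;
}

# Keep only the fonts whose height evenly divides the number of input
# rows. Takes the input height and a list of name => height pairs.
sub candidate_fonts {
    my ($input_height, %height_of) = @_;
    return grep {
        $height_of{$_} > 0 && $input_height % $height_of{$_} == 0
    } sort keys %height_of;
}

# Hypothetical font names standing in for entries in the font library.
my @keep = candidate_fonts(16,
    four => 4, six => 6, eight => 8, nine => 9, ten => 10);
print "@keep\n";   # prints "eight four"
```

The filter is cheap because it needs only the header line of each font, but it can only rule fonts out; every surviving font still has to be tried against the input.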