Sample solutions and discussion Perl Quiz of The Week #7 (20021127) A gentleman on the perl-qotw-discuss list reports: > In two different companies that I've worked at, the policy has been > that percentages in reports must always add up to 100% (at the cost > of munging the actual data). It seems that otherwise end users > report it as a bug. This means, for example, that if you survey 300 people and find that 100 prefer the color red, 100 prefer blue, and 100 prefer black, you are not allowed to report 33.3 % prefer red 33.3 % prefer blue 33.3 % prefer black Because then the percentages appear to add up to only 99.9%. Instead, you'll fib, by rounding one of the percentages up to 33.4% instead of down to 33.3 %: 33.3 % prefer red 33.4 % prefer blue 33.3 % prefer black This, of course, is ridiculous, since it suggests that there were somehow more 'blue' responses than 'red' or 'black' responses, when there were in fact equal numbers of each. But in the world of business the appearance of correctness is sometimes more important than actual correctness. Similarly, if you survey 70 people and find that 30 prefer red, 30 prefer blue, and 10 prefer black, you may not say that 42.9 % prefer red 42.9 % prefer blue 14.3 % prefer black because the percentages appear to add up to 100.1%. You must adjust one of the percentages down by 0.1%. You will write a function, 'fudge_numbers', which takes the real data as input and returns the appropriate percentages. The first argument to fudge_numbers() will be special: It will be an integer, 0 or greater, indicating how many places after the decimal point will be retained after rounding. An argument of 1 will mean that the percentages you return would be rounded to the nearest tenth of a percent, as in the examples above. An argument of 0 will mean that the percentages should be rounded to the nearest percent; an argument of 2 will mean that the percentages should be rounded to the nearest hundredth of a percent. The remaining arguments to fudge_numbers() will be the actual data, which will all be non-negative numbers. The return value of fudge_numbers() will be a list of numbers indicating relative percentages. There must be exactly one return value for each data argument. The return values must be rounded off as indicated by the rounding argument, and they must total exactly 100. (Or as near as possible within the computer's limits of precision.) For example, fudge_numbers(1, 100, 100, 100) should return (33.4, 33.3, 33.3) or (33.3, 33.4, 33.3) or (33.3, 33.3, 33.4) (All are equally acceptable.) Similarly: Arguments Return values 1, 100, 100, 100 33.3, 33.4, 33.3 0, 100, 100, 100 33, 34, 33 2, 100, 100, 100 33.33, 33.34, 33.33 2, 7, 7, 7 33.33, 33.34, 33.33 1, 30, 30, 10 42.9, 42.9, 14.2 or 42.9, 42.8, 14.3 or 42.8, 42.8, 14.3 1 z 100 (here 'z' is any number) ---------------------------------------------------------------- I got a pleasant surprise while I was testing these. I didn't solve the problem myself until very late, because I couldn't think of a solution, and because I hate floating-point numbers. But then when time came to write the report, I finally gave in and did it. Then I ran the test suite and started looking at the programs in order from shortest to longest. The shortest two didn't pass the tests. The third-shortest did, but when I read the code I scratched my head and said "That can't work, can it?" And then I added some more tests to the test suite and found that it *didn't* work. Then the next four-shortest also didn't pass the tests, and that left my own late entry as the shortest version that did pass the tests. Of course, it's still possible that someone might see it, scratch their head, say "That can't work, can it?" and find the test that it fails. But until then, here it is: # Round $v to nearest integer sub round { sprintf("%.0f", shift) } # Add up the arguments sub sum { my $s = 0; $s += $_ for @_; $s; } sub fudge_numbers { my ($prec, @d) = @_; my $scale = 10 ** $prec; my $sum = sum(@d); # Scale data so that all significant digits are # *left* of the decimal point @p = map $_*100*$scale/$sum, @d; @r = map round($_), @p; # rounded versions of @p @e = map $p[$_]-$r[$_], (0 .. $#r); # error # This is the number of jots by which the answer is too LOW. my $total_error = round(sum(@e)); if ($total_error) { # Sign +1: numbers need to be increased. # -1: numbers need to be decreased my $sign = $total_error < 0 ? -1 : 1; $total_error *= $sign; # absolute value # We want total_error equal to zero. # To achieve this, we will add a jot to the low numbers, # or subtract a jot from the high numbers, as needful. for (0..$#r) { next unless $e[$_] * $sign > 0; # Error goes the wrong way $r[$_] += $sign; # Adjust value $total_error--; last if $total_error == 0; } } map $_ / $scale, @r; # Scale data back to percentages } My background is in systems programming, and I think in my entire life as a systems programmer I only ever used a floating-point number once. I *hate* floating-point numbers, and I think it would be fair to criticize me for avoiding them out of fear and ignorance. But once again, avoiding them turned out to be a good strategy. I deal with integers throughout. If the input is (3, 50, 50, 50) then instead of trying to come up with 33.333 / 33.333 / 33.334, and worrying about the floating-point comparison issues, I try to come up with 33333 / 33333 / 33334 and then scale the answers back to percentages at the last moment. That way I don't have to worry about the fact that Señor Computadoro Estúpido thinks that 100 - (33.333+33.333+33.333) = 0.00100000000000477. Let's consider (1, 2, 3, 5) with a precision of 2 as an example. The program first computes the percentages, but scaled so that all the significant figures are to the left of the decimal point. For the example, the percentage values are 909.090909090909 1818.18181818182 2727.27272727273 4545.45454545455 representing 9.090909%, 18.181818%, etc. This is the '@p'array. Then the program rounds off the percentages to the specified precision; this just means rounding them off to the nearest integer, since we scaled them for that exact purpose. This is the '@r' array: 909 1818 2727 4545 The program then computes the difference between the true value (in @p) and the rounded value (in @r); this is the 'error', stored in @e. Since the true percentages must add up to 100%, and we want the rounded values to do the same, we need to adjust the rounded values so that the total error is 0. $total_error is the sum of the values in @e, and we would like it to be 0. If it *is* 0, we don't need to do any fudging at all, and we skip most of the rest of the function. The big 'if' block in the middle of the function does the fudging. First it calculates $fudge, which is +1 if the numbers need to be fudged upward (because the total is too small, as with 33% + 33% + 33%) and -1 if the numbers need to be fudged downward (because the total is too large, as with 17% + 17% + 17% + 17% + 17% + 17%.) We'll choose some of the elements of @r and add $fudge to them to make the total come out right. Because all the numbers have been scaled so that the least significant place is just to the left of the decimal point, we never need to consider a fudge amount other than +1 or -1. Now we scan over the elements of @r looking for candidates for fudging. If the number is already too small, we mustn't fudge it still further downward, and vice versa; the "next unless $e[$_] * $fudge < 0" line takes care of this check: the total rounding error for this element must be in the *opposite* direction from the direction we're trying to fudge. When we find a fudging candidate, we fudge it ($r[$_] += $fudge) and then adjust the $total_error in the same way. When the total error reaches zero, no more adjustments are necessary. After we've finished any necessary adjustments, we scale the adjusted elements of @r back to the right size for percentages and return the results. *. This time there was no discussion of peculiar edge cases. Are negative numbers allowed? What if all the numbers are zero? Perhaps all the edge-case-fanciers were on vacation. *. This problem turned out to be quite difficult to get right, much harder than I thought it would be. Of 17 programs posted to the -discuss list, only 4 (from 3 authors) passed all the tests! You should consider trying the test suite yourself. You can obtain it from http://perl.plover.com/qotw/misc/r007/TestFudge.pm Then to use it, run the command perl -MTestFudge yourprogram.pl and look for 'not ok' in the output. If your program fails a test, debugging it will probably be at least as instructive as doing the quiz in the first place. Thanks to Andreas Koenig for the tricky test case (#44) that caught out one of the submitted programs. *. A very common error was to compute the fudge factor correctly and then to apply it to the wrong elements. Many people assumed that any of the result values could be fudged. But doing so can lead to bizarre results. Nobody would accept (37, 23, 40) as a valid fudging of (33.3, 33.3, 33.3). Similarly, once person said on the -discuss list: I do not think the improper result from [0 1 1 1 1 1 1] -> [15 17 17 17 17 17] is a bug so much as an issue with the constraints of the problem. Maybe, but the problem said: The return value ... will be a list of numbers indicating relative percentages.... [which] must be rounded off as indicated by the rounding argument. There is no way to interpret '15' as 16 2/3 % (the exact relative percentage) rounded off to 0 decimal places. (The person quoted above submitted a revised solution when this language was pointed out to him; nevertheless, even if I'd somehow left a loophole in the problem specification, what's the point of producing a solution that you know is defective just because you can weasel it through a loophole in the problem statement? The Quiz of the Week is not mandatory.) Anyway, many results were misrounded even by solutions that were *not* deliberately ignoring the requirement to round off. For example, test 104 concerned the data (2 2 1 1 1 1) rounded off to 0 places. The exact percentages are (25, 25, 12.5, 12.5, 12.5, 12.5). There are a lot of reasonable answers here, all of the form (25, 25, 12, 13, 12, 13). But what you *cannot* do is alter the 25, which is already exact. There is no interpretation of 'round off' in which 25 is 'rounded off' to anything other than 25. Nevertheless, among the solutions submitted on the -discuss list, the 25 was 'rounded off' to 23, 24, 26, and 27. (One hapless poster got the 25's right and then rounded off 12.5 to 14.) All together, 10 of the 17 posted solutions failed this test. Similarly, faced with (2 1 1 1 1), where the exact answer was (33 1/3, 16 2/3, 16 2/3, 16 2/3, 16 2/3), and the correct result would have been something like (33, 16, 17, 17, 17), eight of the 17 programs produced (32, 17, 17, 17, 17) instead. *. Randal Schwartz said: I thought about a test harness for easy#7, but when I realized that the numbers could come back in any order, I punted. :) I had meant to require the percentages to be in the same order as the input data. That is, given data (1, 2, 3, 4), the return values MUST be (10, 20, 30, 40), and not some other permutation of those. But it turned out that nobody returned the results in the wrong order, so I didn't have to worry about it. *. John Macdonald's third posted solution (http:°perl.plover.com/qotw/misc/r007/macdonald3.pl) is worth study, because it relies on a clever insight: Using truncation has some advantages. - As you point out, the numbers are within 1 in the last decimal place. - Fudging will always be incrementing, never decrementing. Then a light bulb went on. If you use the technique of applying the fudge to the elements that had the greatest error from the truncation, the fudging process will then minimise the final error and come to the same result as when you start with rounding. And, because the fudging is always positive, the code is simpler. I like that. This gave me that "Gosh, I wish I had thought of that"feeling. Modulo this insight, his program is very similar to mine. *. Since at most one other person produced a correct answer, I thought I'd better look at it to see if it was doing anything different. It was sent in by Brian King. The first thing that grabbed my attention was: my $tolerance = 0.000_000_001; # 1 one-billionth should suffice. What's funny about this is that I had been about to write almost exactly the same thing in my own program, but then I got a nagging feeling about what would happen if the caller asked for their percentages rounded off to the nearest ten-billionth, and I couldn't see a way out, so I scrapped the whole idea and went with the always-use-integers approach that I showed above. And sure enough, Mr. King's program produces the wrong answer for (10, 1, 1, 1). The output is (33.3333333333 33.3333333333 33.3333333333), but it should be (33.3333333334 33.3333333333 33.3333333333). Darn! That said, there were a couple of other things I found interesting about Mr. King's program. It was one of the longer ones (second-longest, in fact) and I wondered why. Mostly it seemed to be because of repeated code. For example: if ( $off_by > 0 && ( abs($off_by) > $tolerance ) ) { # if the overall difference is positive & we're still off... if ( $out > ( ( $this / $sum ) * 100 ) ) { # and if we rounded this one up $out -= $precision; $off_by -= $precision; #round it down instead. update how much we're still off } } ## end if ( $off_by > 0 && ( abs($off_by... elsif ( $off_by < 0 && ( abs($off_by) > $tolerance ) ) { if ( $out < ( $this / $sum ) * 100 ) { $out += $precision; $off_by += $precision; } } Here I would have at least eliminated the repeated abs($off_by) > $tolerance test: if (abs($off_by) > $tolerance) { if ($off_by > 0) { ... } elsif ($off_by < 0) { ... } } but I would have preferred to somehow merge the two blocks into one. And in fact, I *did* merge the two blocks into one; this code corresponds closely with the next unless $e[$_] * $sign > 0; # Error goes the wrong way $r[$_] += $sign; # Adjust value $total_error--; section of my example program. Along similar lines, Mr. King's program has if ( $precision == 0 ) { $precision = 1; } else { $precision = '.' . '0' x ( $precision - 1 ) . '1'; } but it would have been simpler to do $precision = 10 ** -$precision; Don't take these criticisms too seriously, since I wouldn't even have been looking at the code so closely if it hadn't outperformed almost all the other submitted programs; the one real defect could be fixed (if necessary) by adjusting $tolerance to a more appropriate value. That's all for this week's regular quiz. I'll send something about the frost simulators tomorrow, and new quizzes on Wednesday. My thanks to everyone who contributed to the discussion, but also especially to the people who worked the problem on their own.