Sample solutions and discussion
Perl Quiz of The Week #7 (20021127)
A gentleman on the perl-qotw-discuss list reports:
> In two different companies that I've worked at, the policy has been
> that percentages in reports must always add up to 100% (at the cost
> of munging the actual data). It seems that otherwise end users
> report it as a bug.
This means, for example, that if you survey 300 people and find that
100 prefer the color red, 100 prefer blue, and 100 prefer black, you
are not allowed to report
33.3 % prefer red
33.3 % prefer blue
33.3 % prefer black
Because then the percentages appear to add up to only 99.9%. Instead,
you'll fib, by rounding one of the percentages up to 33.4% instead of
down to 33.3 %:
33.3 % prefer red
33.4 % prefer blue
33.3 % prefer black
This, of course, is ridiculous, since it suggests that there were
somehow more 'blue' responses than 'red' or 'black' responses, when
there were in fact equal numbers of each. But in the world of
business the appearance of correctness is sometimes more important
than actual correctness.
Similarly, if you survey 70 people and find that 30 prefer red, 30
prefer blue, and 10 prefer black, you may not say that
42.9 % prefer red
42.9 % prefer blue
14.3 % prefer black
because the percentages appear to add up to 100.1%. You must adjust
one of the percentages down by 0.1%.
You will write a function, 'fudge_numbers', which takes the real data
as input and returns the appropriate percentages.
The first argument to fudge_numbers() will be special: It will be an
integer, 0 or greater, indicating how many places after the decimal
point will be retained after rounding. An argument of 1 will mean
that the percentages you return would be rounded to the nearest tenth
of a percent, as in the examples above. An argument of 0 will mean
that the percentages should be rounded to the nearest percent; an
argument of 2 will mean that the percentages should be rounded to the
nearest hundredth of a percent.
The remaining arguments to fudge_numbers() will be the actual data,
which will all be non-negative numbers.
The return value of fudge_numbers() will be a list of numbers
indicating relative percentages. There must be exactly one return
value for each data argument. The return values must be rounded off
as indicated by the rounding argument, and they must total exactly
100. (Or as near as possible within the computer's limits of
precision.)
For example,
fudge_numbers(1, 100, 100, 100)
should return
(33.4, 33.3, 33.3)
or
(33.3, 33.4, 33.3)
or
(33.3, 33.3, 33.4)
(All are equally acceptable.)
Similarly:
Arguments Return values
1, 100, 100, 100 33.3, 33.4, 33.3
0, 100, 100, 100 33, 34, 33
2, 100, 100, 100 33.33, 33.34, 33.33
2, 7, 7, 7 33.33, 33.34, 33.33
1, 30, 30, 10 42.9, 42.9, 14.2
or 42.9, 42.8, 14.3
or 42.8, 42.8, 14.3
1 z 100
(here 'z' is any number)
----------------------------------------------------------------
I got a pleasant surprise while I was testing these. I didn't solve
the problem myself until very late, because I couldn't think of a
solution, and because I hate floating-point numbers. But then when
time came to write the report, I finally gave in and did it. Then I
ran the test suite and started looking at the programs in order from
shortest to longest. The shortest two didn't pass the tests. The
third-shortest did, but when I read the code I scratched my head and
said "That can't work, can it?" And then I added some more tests to
the test suite and found that it *didn't* work. Then the next
four-shortest also didn't pass the tests, and that left my own late
entry as the shortest version that did pass the tests.
Of course, it's still possible that someone might see it, scratch
their head, say "That can't work, can it?" and find the test that it
fails. But until then, here it is:
# Round $v to nearest integer
sub round { sprintf("%.0f", shift) }
# Add up the arguments
sub sum {
my $s = 0;
$s += $_ for @_;
$s;
}
sub fudge_numbers {
my ($prec, @d) = @_;
my $scale = 10 ** $prec;
my $sum = sum(@d);
# Scale data so that all significant digits are
# *left* of the decimal point
@p = map $_*100*$scale/$sum, @d;
@r = map round($_), @p; # rounded versions of @p
@e = map $p[$_]-$r[$_], (0 .. $#r); # error
# This is the number of jots by which the answer is too LOW.
my $total_error = round(sum(@e));
if ($total_error) {
# Sign +1: numbers need to be increased.
# -1: numbers need to be decreased
my $sign = $total_error < 0 ? -1 : 1;
$total_error *= $sign; # absolute value
# We want total_error equal to zero.
# To achieve this, we will add a jot to the low numbers,
# or subtract a jot from the high numbers, as needful.
for (0..$#r) {
next unless $e[$_] * $sign > 0; # Error goes the wrong way
$r[$_] += $sign; # Adjust value
$total_error--;
last if $total_error == 0;
}
}
map $_ / $scale, @r; # Scale data back to percentages
}
My background is in systems programming, and I think in my entire life
as a systems programmer I only ever used a floating-point number once.
I *hate* floating-point numbers, and I think it would be fair to
criticize me for avoiding them out of fear and ignorance. But once
again, avoiding them turned out to be a good strategy. I deal with
integers throughout. If the input is
(3, 50, 50, 50)
then instead of trying to come up with 33.333 / 33.333 / 33.334, and
worrying about the floating-point comparison issues, I try to come up
with 33333 / 33333 / 33334 and then scale the answers back to
percentages at the last moment. That way I don't have to worry about
the fact that Señor Computadoro Estúpido thinks that
100 - (33.333+33.333+33.333) = 0.00100000000000477.
Let's consider (1, 2, 3, 5) with a precision of 2 as an example.
The program first computes the percentages, but scaled so that all the
significant figures are to the left of the decimal point. For the
example, the percentage values are
909.090909090909
1818.18181818182
2727.27272727273
4545.45454545455
representing 9.090909%, 18.181818%, etc. This is the '@p'array. Then
the program rounds off the percentages to the specified precision;
this just means rounding them off to the nearest integer, since we
scaled them for that exact purpose. This is the '@r' array:
909
1818
2727
4545
The program then computes the difference between the true value (in
@p) and the rounded value (in @r); this is the 'error', stored in @e.
Since the true percentages must add up to 100%, and we want the
rounded values to do the same, we need to adjust the rounded values so
that the total error is 0. $total_error is the sum of the values in
@e, and we would like it to be 0. If it *is* 0, we don't need to do
any fudging at all, and we skip most of the rest of the function.
The big 'if' block in the middle of the function does the fudging.
First it calculates $fudge, which is +1 if the numbers need to be
fudged upward (because the total is too small, as with 33% + 33% +
33%) and -1 if the numbers need to be fudged downward (because the
total is too large, as with 17% + 17% + 17% + 17% + 17% + 17%.) We'll
choose some of the elements of @r and add $fudge to them to make the
total come out right. Because all the numbers have been scaled so
that the least significant place is just to the left of the decimal
point, we never need to consider a fudge amount other than +1 or -1.
Now we scan over the elements of @r looking for candidates for
fudging. If the number is already too small, we mustn't fudge it
still further downward, and vice versa; the "next unless $e[$_] * $fudge < 0"
line takes care of this check: the total rounding error for this element
must be in the *opposite* direction from the direction we're trying to
fudge.
When we find a fudging candidate, we fudge it ($r[$_] += $fudge) and
then adjust the $total_error in the same way. When the total error
reaches zero, no more adjustments are necessary.
After we've finished any necessary adjustments, we scale the adjusted
elements of @r back to the right size for percentages and return the
results.
*. This time there was no discussion of peculiar edge cases. Are
negative numbers allowed? What if all the numbers are zero?
Perhaps all the edge-case-fanciers were on vacation.
*. This problem turned out to be quite difficult to get right, much
harder than I thought it would be. Of 17 programs posted to the
-discuss list, only 4 (from 3 authors) passed all the tests!
You should consider trying the test suite yourself. You can obtain
it from
http://perl.plover.com/qotw/misc/r007/TestFudge.pm
Then to use it, run the command
perl -MTestFudge yourprogram.pl
and look for 'not ok' in the output. If your program fails a test,
debugging it will probably be at least as instructive as doing the
quiz in the first place.
Thanks to Andreas Koenig for the tricky test case (#44) that caught
out one of the submitted programs.
*. A very common error was to compute the fudge factor correctly and
then to apply it to the wrong elements. Many people assumed that
any of the result values could be fudged. But doing so can lead to
bizarre results. Nobody would accept (37, 23, 40) as a valid
fudging of (33.3, 33.3, 33.3). Similarly, once person said on the
-discuss list:
I do not think the improper result from
[0 1 1 1 1 1 1] -> [15 17 17 17 17 17]
is a bug so much as an issue with the constraints of the
problem.
Maybe, but the problem said:
The return value ... will be a list of numbers indicating
relative percentages.... [which] must be rounded off as
indicated by the rounding argument.
There is no way to interpret '15' as 16 2/3 % (the exact relative
percentage) rounded off to 0 decimal places. (The person quoted
above submitted a revised solution when this language was pointed
out to him; nevertheless, even if I'd somehow left a loophole in
the problem specification, what's the point of producing a
solution that you know is defective just because you can weasel it
through a loophole in the problem statement? The Quiz of the Week
is not mandatory.)
Anyway, many results were misrounded even by solutions that were
*not* deliberately ignoring the requirement to round off. For
example, test 104 concerned the data (2 2 1 1 1 1) rounded off to
0 places. The exact percentages are (25, 25, 12.5, 12.5, 12.5, 12.5).
There are a lot of reasonable answers here, all of the form (25,
25, 12, 13, 12, 13). But what you *cannot* do is alter the 25,
which is already exact. There is no interpretation of 'round
off' in which 25 is 'rounded off' to anything other than 25.
Nevertheless, among the solutions submitted on the -discuss list,
the 25 was 'rounded off' to 23, 24, 26, and 27. (One hapless
poster got the 25's right and then rounded off 12.5 to 14.) All
together, 10 of the 17 posted solutions failed this test.
Similarly, faced with (2 1 1 1 1), where the exact answer was (33 1/3,
16 2/3, 16 2/3, 16 2/3, 16 2/3), and the correct result would have
been something like (33, 16, 17, 17, 17), eight of the 17 programs
produced (32, 17, 17, 17, 17) instead.
*. Randal Schwartz said:
I thought about a test harness for easy#7, but when I realized
that the numbers could come back in any order, I punted. :)
I had meant to require the percentages to be in the same order as
the input data. That is, given data (1, 2, 3, 4), the return
values MUST be (10, 20, 30, 40), and not some other permutation of
those. But it turned out that nobody returned the results in the
wrong order, so I didn't have to worry about it.
*. John Macdonald's third posted solution
(http:°perl.plover.com/qotw/misc/r007/macdonald3.pl) is worth
study, because it relies on a clever insight:
Using truncation has some advantages.
- As you point out, the numbers are within 1 in the last
decimal place.
- Fudging will always be incrementing, never decrementing.
Then a light bulb went on. If you use the technique of
applying the fudge to the elements that had the greatest error
from the truncation, the fudging process will then minimise the
final error and come to the same result as when you start with
rounding.
And, because the fudging is always positive, the code is
simpler. I like that.
This gave me that "Gosh, I wish I had thought of that"feeling.
Modulo this insight, his program is very similar to mine.
*. Since at most one other person produced a correct answer, I thought
I'd better look at it to see if it was doing anything different.
It was sent in by Brian King. The first thing that grabbed my
attention was:
my $tolerance = 0.000_000_001; # 1 one-billionth should suffice.
What's funny about this is that I had been about to write almost
exactly the same thing in my own program, but then I got a nagging
feeling about what would happen if the caller asked for their
percentages rounded off to the nearest ten-billionth, and I
couldn't see a way out, so I scrapped the whole idea and went with
the always-use-integers approach that I showed above. And sure
enough, Mr. King's program produces the wrong answer for (10, 1, 1, 1).
The output is (33.3333333333 33.3333333333 33.3333333333), but it
should be (33.3333333334 33.3333333333 33.3333333333). Darn!
That said, there were a couple of other things I found interesting
about Mr. King's program. It was one of the longer ones
(second-longest, in fact) and I wondered why. Mostly it seemed to
be because of repeated code. For example:
if ( $off_by > 0 && ( abs($off_by) > $tolerance ) ) {
# if the overall difference is positive & we're still off...
if ( $out > ( ( $this / $sum ) * 100 ) ) {
# and if we rounded this one up
$out -= $precision;
$off_by -= $precision;
#round it down instead. update how much we're still off
}
} ## end if ( $off_by > 0 && ( abs($off_by...
elsif ( $off_by < 0 && ( abs($off_by) > $tolerance ) ) {
if ( $out < ( $this / $sum ) * 100 ) {
$out += $precision;
$off_by += $precision;
}
}
Here I would have at least eliminated the repeated
abs($off_by) > $tolerance
test:
if (abs($off_by) > $tolerance) {
if ($off_by > 0) {
...
} elsif ($off_by < 0) {
...
}
}
but I would have preferred to somehow merge the two blocks into
one. And in fact, I *did* merge the two blocks into one; this code
corresponds closely with the
next unless $e[$_] * $sign > 0; # Error goes the wrong way
$r[$_] += $sign; # Adjust value
$total_error--;
section of my example program. Along similar lines, Mr. King's
program has
if ( $precision == 0 ) {
$precision = 1;
}
else {
$precision = '.' . '0' x ( $precision - 1 ) . '1';
}
but it would have been simpler to do
$precision = 10 ** -$precision;
Don't take these criticisms too seriously, since I wouldn't even
have been looking at the code so closely if it hadn't outperformed
almost all the other submitted programs; the one real defect could
be fixed (if necessary) by adjusting $tolerance to a more
appropriate value.
That's all for this week's regular quiz. I'll send something about
the frost simulators tomorrow, and new quizzes on Wednesday. My
thanks to everyone who contributed to the discussion, but also
especially to the people who worked the problem on their own.