Sample solutions and discussion
Perl Quiz of The Week #7 (20021127)


        A gentleman on the perl-qotw-discuss list reports:

        > In two different companies that I've worked at, the policy has been
        > that percentages in reports must always add up to 100% (at the cost
        > of munging the actual data).  It seems that otherwise end users
        > report it as a bug.

        This means, for example, that if you survey 300 people and find that
        100 prefer the color red, 100 prefer blue, and 100 prefer black, you
        are not allowed to report

                33.3 % prefer red
                33.3 % prefer blue
                33.3 % prefer black

        Because then the percentages appear to add up to only 99.9%.  Instead,
        you'll fib, by rounding one of the percentages up to 33.4% instead of
        down to 33.3 %:

                33.3 % prefer red
                33.4 % prefer blue
                33.3 % prefer black

        This, of course, is ridiculous, since it suggests that there were
        somehow more 'blue' responses than 'red' or 'black' responses, when
        there were in fact equal numbers of each.  But in the world of
        business the appearance of correctness is sometimes more important
        than actual correctness.

        Similarly, if you survey 70 people and find that 30 prefer red, 30
        prefer blue, and 10 prefer black, you may not say that

                42.9 % prefer red
                42.9 % prefer blue
                14.3 % prefer black

        because the percentages appear to add up to 100.1%.  You must adjust
        one of the percentages down by 0.1%.

        You will write a function, 'fudge_numbers', which takes the real data
        as input and returns the appropriate percentages.

        The first argument to fudge_numbers() will be special: It will be an
        integer, 0 or greater, indicating how many places after the decimal
        point will be retained after rounding.  An argument of 1 will mean
        that the percentages you return would be rounded to the nearest tenth
        of a percent, as in the examples above.  An argument of 0 will mean
        that the percentages should be rounded to the nearest percent; an
        argument of 2 will mean that the percentages should be rounded to the
        nearest hundredth of a percent.

        The remaining arguments to fudge_numbers() will be the actual data,
        which will all be non-negative numbers.

        The return value of fudge_numbers() will be a list of numbers
        indicating relative percentages.  There must be exactly one return
        value for each data argument.  The return values must be rounded off
        as indicated by the rounding argument, and they must total exactly
        100.  (Or as near as possible within the computer's limits of
        precision.)  

        For example, 

                fudge_numbers(1, 100, 100, 100) 

        should return

                (33.4, 33.3, 33.3)
        or
                (33.3, 33.4, 33.3)
        or
                (33.3, 33.3, 33.4)

        (All are equally acceptable.)

        Similarly:

                Arguments                       Return values

                1, 100, 100, 100                33.3, 33.4, 33.3
                0, 100, 100, 100                33, 34, 33
                2, 100, 100, 100                33.33, 33.34, 33.33

                2, 7, 7, 7                      33.33, 33.34, 33.33

                1, 30, 30, 10                   42.9, 42.9, 14.2
                                             or 42.9, 42.8, 14.3
                                             or 42.8, 42.8, 14.3

                1 z                             100
                        (here 'z' is any number)

----------------------------------------------------------------

I got a pleasant surprise while I was testing these.  I didn't solve
the problem myself until very late, because I couldn't think of a
solution, and because I hate floating-point numbers.  But then when
time came to write the report, I finally gave in and did it.  Then I
ran the test suite and started looking at the programs in order from
shortest to longest.  The shortest two didn't pass the tests.  The
third-shortest did, but when I read the code I scratched my head and
said "That can't work, can it?"  And then I added some more tests to
the test suite and found that it *didn't* work.   Then the next
four-shortest also didn't pass the tests, and that left my own late
entry as the shortest version that did pass the tests.  

Of course, it's still possible that someone might see it, scratch
their head, say "That can't work, can it?" and find the test that it
fails.  But until then, here it is:

        # Round $v to nearest integer
        sub round { sprintf("%.0f", shift) }

        # Add up the arguments
        sub sum {
          my $s = 0;
          $s += $_ for @_;
          $s;
        }

        sub fudge_numbers {
          my ($prec, @d) = @_;
          my $scale = 10 **  $prec;
          my $sum = sum(@d);

          # Scale data so that all significant digits are 
          # *left* of the decimal point
          @p = map $_*100*$scale/$sum, @d; 
          @r = map round($_), @p;       # rounded versions of @p
          @e = map $p[$_]-$r[$_], (0 .. $#r); # error

          # This is the number of jots by which the answer is too LOW.
          my $total_error = round(sum(@e));

          if ($total_error) {
            # Sign +1: numbers need to be increased. 
            #      -1: numbers need to be decreased
            my $sign = $total_error < 0 ? -1 : 1;
            $total_error *= $sign;      # absolute value

            # We want total_error equal to zero.
            # To achieve this, we will add a jot to the low numbers, 
            # or subtract a jot from the high numbers, as needful.

            for (0..$#r) {
              next unless $e[$_] * $sign > 0;  # Error goes the wrong way
              $r[$_] += $sign;          # Adjust value
              $total_error--;          
              last if $total_error == 0;
            }
          }

          map $_ / $scale, @r;  # Scale data back to percentages
        }

My background is in systems programming, and I think in my entire life
as a systems programmer I only ever used a floating-point number once.
I *hate* floating-point numbers, and I think it would be fair to
criticize me for avoiding them out of fear and ignorance.  But once
again, avoiding them turned out to be a good strategy.  I deal with
integers throughout.  If the input is

        (3, 50, 50, 50)

then instead of trying to come up with 33.333 / 33.333 / 33.334, and
worrying about the floating-point comparison issues, I try to come up
with 33333 / 33333 / 33334 and then scale the answers back to
percentages at the last moment.   That way I don't have to worry about
the fact that Señor Computadoro Estúpido thinks that
100 - (33.333+33.333+33.333) = 0.00100000000000477.

Let's consider (1, 2, 3, 5) with a precision of 2 as an example.
The program first computes the percentages, but scaled so that all the
significant figures are to the left of the decimal point. For the
example, the percentage values are

   909.090909090909
  1818.18181818182
  2727.27272727273
  4545.45454545455

representing 9.090909%, 18.181818%, etc.  This is the '@p'array.  Then
the program rounds off the percentages to the specified precision;
this just means rounding them off to the nearest integer, since we
scaled them for that exact purpose.  This is the '@r' array:

   909
  1818
  2727
  4545

The program then computes the difference between the true value (in
@p) and the rounded value (in @r); this is the 'error', stored in @e.
Since the true percentages must add up to 100%, and we want the
rounded values to do the same, we need to adjust the rounded values so
that the total error is 0.  $total_error is the sum of the values in
@e, and we would like it to be 0.  If it *is* 0, we don't need to do
any fudging at all, and we skip most of the rest of the function.

The big 'if' block in the middle of the function does the fudging.
First it calculates $fudge, which is +1 if the numbers need to be
fudged upward (because the total is too small, as with 33% + 33% +
33%) and -1 if the numbers need to be fudged downward (because the
total is too large, as with 17% + 17% + 17% + 17% + 17% + 17%.)  We'll
choose some of the elements of @r and add $fudge to them to make the
total come out right.  Because all the numbers have been scaled so
that the least significant place is just to the left of the decimal
point, we never need to consider a fudge amount other than +1 or -1.

Now we scan over the elements of @r looking for candidates for
fudging.  If the number is already too small, we mustn't fudge it
still further downward, and vice versa; the "next unless $e[$_] * $fudge < 0"
line takes care of this check:  the total rounding error for this element
must be in the *opposite* direction from the direction we're trying to
fudge.  

When we find a fudging candidate, we fudge it ($r[$_] += $fudge) and
then adjust the $total_error in the same way.  When the total error
reaches zero, no more adjustments are necessary.

After we've finished any necessary adjustments, we scale the adjusted
elements of @r back to the right size for percentages and return the
results.


*. This time there was no discussion of peculiar edge cases.  Are
   negative numbers allowed?  What if all the numbers are zero?
   Perhaps all the edge-case-fanciers were on vacation.

*. This problem turned out to be quite difficult to get right, much
   harder than I thought it would be.  Of 17 programs posted to the
   -discuss list, only 4 (from 3 authors) passed all the tests!  

   You should consider trying the test suite yourself.  You can obtain
   it from

        http://perl.plover.com/qotw/misc/r007/TestFudge.pm

   Then to use it, run the command

        perl -MTestFudge yourprogram.pl

   and look for 'not ok' in the output.  If your program fails a test,
   debugging it will probably be at least as instructive as doing the
   quiz in the first place.

   Thanks to Andreas Koenig for the tricky test case (#44) that caught
   out one of the submitted programs.

*. A very common error was to compute the fudge factor correctly and
   then to apply it to the wrong elements.  Many people assumed that
   any of the result values could be fudged.  But doing so can lead to
   bizarre results.  Nobody would accept (37, 23, 40) as a valid
   fudging of (33.3, 33.3, 33.3).  Similarly, once person said on the
   -discuss list:

        I do not think the improper result from

        [0 1 1 1 1 1 1]  -> [15 17 17 17 17 17]

        is a bug so much as an issue with the constraints of the
        problem.

    Maybe, but the problem said:

        The return value ... will be a list of numbers indicating
        relative percentages.... [which] must be rounded off as
        indicated by the rounding argument.

    There is no way to interpret '15' as 16 2/3 % (the exact relative
    percentage) rounded off to 0 decimal places.  (The person quoted
    above submitted a revised solution when this language was pointed
    out to him; nevertheless, even if I'd somehow left a loophole in
    the problem specification, what's the point of producing a
    solution that you know is defective just because you can weasel it
    through a loophole in the problem statement?  The Quiz of the Week
    is not mandatory.)

    Anyway, many results were misrounded even by solutions that were
    *not* deliberately ignoring the requirement to round off.  For
    example, test 104 concerned the data (2 2 1 1 1 1) rounded off to
    0 places.  The exact percentages are (25, 25, 12.5, 12.5, 12.5, 12.5).
    There are a lot of reasonable answers here, all of the form (25,
    25, 12, 13, 12, 13).   But what you *cannot* do is alter the 25,
    which is already exact.  There is no interpretation of 'round
    off' in which 25 is 'rounded off' to anything other than 25.

    Nevertheless, among the solutions submitted on the -discuss list,
    the 25 was 'rounded off' to 23, 24, 26, and 27.   (One hapless
    poster got the 25's right and then rounded off 12.5 to 14.)  All
    together, 10 of the 17 posted solutions failed this test.

    Similarly, faced with (2 1 1 1 1), where the exact answer was (33 1/3,
    16 2/3, 16 2/3, 16 2/3, 16 2/3), and the correct result would have
    been something like (33, 16, 17, 17, 17), eight of the 17 programs
    produced (32, 17, 17, 17, 17) instead.

*. Randal Schwartz said:

        I thought about a test harness for easy#7, but when I realized
        that the numbers could come back in any order, I punted. :)

   I had meant to require the percentages to be in the same order as
   the input data.  That is, given data (1, 2, 3, 4), the return
   values MUST be (10, 20, 30, 40), and not some other permutation of
   those.  But it turned out that nobody returned the results in the
   wrong order, so I didn't have to worry about it.
   

*. John Macdonald's third posted solution
   (http:°perl.plover.com/qotw/misc/r007/macdonald3.pl) is worth
   study, because it relies on a clever insight:

        Using truncation has some advantages.

        - As you point out, the numbers are within 1 in the last
          decimal place.

        - Fudging will always be incrementing, never decrementing.

        Then a light bulb went on.  If you use the technique of
        applying the fudge to the elements that had the greatest error
        from the truncation, the fudging process will then minimise the
        final error and come to the same result as when you start with
        rounding.

        And, because the fudging is always positive, the code is
        simpler.  I like that.

    This gave me that "Gosh, I wish I had thought of that"feeling.
    Modulo this insight, his program is very similar to mine.

*. Since at most one other person produced a correct answer, I thought
   I'd better look at it to see if it was doing anything different.
   It was sent in by Brian King.  The first thing that grabbed my
   attention was:

     my $tolerance = 0.000_000_001;    # 1 one-billionth should suffice.

   What's funny about this is that I had been about to write almost
   exactly the same thing in my own program, but then I got a nagging
   feeling about what would happen if the caller asked for their
   percentages rounded off to the nearest ten-billionth, and I
   couldn't see a way out, so I scrapped the whole idea and went with
   the always-use-integers approach that I showed above.  And sure
   enough, Mr. King's program produces the wrong answer for (10, 1, 1, 1).
   The output is (33.3333333333 33.3333333333 33.3333333333), but it
   should be     (33.3333333334 33.3333333333 33.3333333333).  Darn!

   That said, there were a couple of other things I found interesting
   about Mr. King's program.  It was one of the longer ones
   (second-longest, in fact) and I wondered why.  Mostly it seemed to
   be because of repeated code.  For example:

             if ( $off_by > 0 && ( abs($off_by) > $tolerance ) ) {
                 # if the overall difference is positive & we're still off...
                 if ( $out > ( ( $this / $sum ) * 100 ) ) {
                     # and if we rounded this one up
                     $out -= $precision;
                     $off_by -= $precision;
                     #round it down instead. update how much we're still off
                 }
             } ## end  if ( $off_by > 0 && ( abs($off_by...
             elsif ( $off_by < 0 && ( abs($off_by) > $tolerance ) ) {
                 if ( $out < ( $this / $sum ) * 100 ) {
                     $out    += $precision;
                     $off_by += $precision;
                 }
             }

   Here I would have at least eliminated the repeated

     abs($off_by) > $tolerance 

   test:

        if (abs($off_by) > $tolerance) {
          if ($off_by > 0) {
            ...
          } elsif ($off_by < 0) {
           ...
          }
        }

   but I would have preferred to somehow merge the two blocks into
   one.  And in fact, I *did* merge the two blocks into one; this code
   corresponds closely with the 

              next unless $e[$_] * $sign > 0;  # Error goes the wrong way
              $r[$_] += $sign;          # Adjust value
              $total_error--;          

   section of my example program.  Along similar lines, Mr. King's
   program has

     if ( $precision == 0 ) {
         $precision = 1;
     }
     else {
         $precision = '.' . '0' x ( $precision - 1 ) . '1';
     }

   but it would have been simpler to do

         $precision = 10 ** -$precision;

   Don't take these criticisms too seriously, since I wouldn't even
   have been looking at the code so closely if it hadn't outperformed
   almost all the other submitted programs.

Well, that's all for this week's regular quiz.  I'll send something
about the frost simulators tomorrow, and new quizzes on Wednesday.  My
thanks to everyone who contributed to the discussion, but also
especially to the people who worked the problem on their own.

