Appendix 2 Continued Fraction Arithmetic by Bill Gosper Abstract: Contrary to everybody, this self contained paper will show that continued fractions are not only perfectly amenable to arithmetic, they are amenable to perfect arithmetic. Output Suppose we want the continued fraction for an exact rational number, say 2.54, the number of centimeters per inch. We can execute Euclid's algorithm neatly as follows: 254 Initially, 2.54 = 254/100 100 2 Short divide 100 into 254, getting 2, remainder 54 54 1 54 into 100 goes once, remainder 46 46 1 46 into 54 8 5 8 into 46 6 1 2 3 0 We incidentally have found that gcd(254,100) = 2. This says that the continued fraction of 2.54 is 2 1 1 5 1 3, or 2.54 = 2 + 1/(1 + 1/(1 + 1/(5 + 1/(1 + 1/3)))) 1 = 2 + --------- 1 1 + --------- 1 1 + --------- 1 5 + -------- 1 1 + --- 3 Similarly, if you want 100/2.54, the number of inches per meter, you will find 39 2 1 2 2 1 4 which is much nicer than 39.(370078740157480314960629921259842519685039) where the part in parentheses repeats forever. (Incidentally, this repeating decimal is easily computed since the remainder of 2 after the quotient digits 3937 ensures that, starting with 7874..., the answer will be just twice the original one, ignoring the decimal point. Thus you just double the quotient, starting from the left (using one digit lookahead for carries), placing the answer digits on the right, so as to make the number chase its tail. This trick is even easier in expansions of ratios where some remainder is exactly 1/nth of an earlier one, for small n. You forget about the divisor and simply start shortdividing by n at the quotient digit corresponding to the earlier remainder. If this seems confusing, forget it--it has nothing to do with continued fractions.) Now suppose we want the continued fraction of 70 t + 29 --------- , 12 t + 5 knowing only that t is positive. We can only give a partial answer-- if t happened to be irrational, for instance, the true answer would go on forever. We do know, however, that as t varies between oo and 0, 70 70 t + 29 29 -- < --------- < -- 12 12 t + 5 5 so the first term, at least, is 5. Following the same procedure for Euclid's algorithm: 70 t + 29 12 t + 5 5 ( 70 t + 29 - 5 (12 t + 5) = 10 t + 4 ) 10 t + 4 1 (I.e. between 12/10 and 5/4) 2 t + 1 4 (It could only be 5 if t were truly oo) 2 t All we can say about our next quotient is that it lies between 1 and oo. But we have managed to reexpress 70 t + 29 1 --------- as 5 + --------- 12 t + 5 1 1 + --------- 1 4 + --------- 2 t + 1 ------- 2 t and thereby determine the first three terms of the continued fraction. If we knew that t > 1/2, we could get another term: 2 t + 1 1 ------- = 1 + --- . 2 t 2 t Input Now, the opposite problem: suppose you are receiving the terms of a continued fraction 5 1 4 1 ... which may go on forever, or possibly end with a symbolic expression. We wish to reconstruct the value as the terms come in, rather than awaiting an end which may never come. Let x symbolize the value of the "rest" of the continued fraction, so that before we learn its first term, x stands for the whole thing. When we learn that the first term is 5, 1 5 x + 1 x becomes 5 + - or ------- . x x When the next term turns out to be 1, 1 5 x + 1 6 x + 5 x becomes 1 + - and ------- becomes ------- . x x x + 1 Then 1 6 x + 5 29 x + 6 x becomes 4 + - and ------- becomes -------- . x x + 1 5 x + 1 Then 1 29 x + 6 35 x + 29 x becomes 1 + - and -------- becomes --------- . x 5 x + 1 6 x + 5 Finally, when x becomes 2 t, we have reconstructed the original expression. In general, when 1 q x + r (a q + r) x + q x becomes a + - , then ------- becomes --------------- . x s x + t (a s + t) x + s Although this looks messy, it can be handled almost as neatly as Euclid's algorithm: From RIGHT TO LEFT across the page, we write the incoming terms as we learn them: . . . 1 4 1 5 Under the first (rightmost) term, we place the left edge of the array 1 0 1 x + 0 symbolizing ------- i.e. the identity function: 0 1 0 x + 1 . . . 1 4 1 5 1 0 0 1 Then, again from RIGHT TO LEFT, we extend the lower two rows with the simple recurrence: multiply each element by the term in the top row above it, then add the element to the right and write the sum on the left: . . . 1 4 1 5 35 29 6 5 1 0 6 5 1 1 0 1 That is, 29 = 4 * 6 + 5, and in the bottom row, 6 = 1 * 5 + 1. Letting the last term be 2t, 2t 1 4 1 5 70t + 29 35 29 6 5 1 0 12t + 5 6 5 1 1 0 1 The great thing about this process is that you can take other functions by initializing the rightmost matrix to other than 1 0 0 1 . Furthermore, it is possible to intersperse steps of the Euclid process between input steps, thereby computing the continued fraction of the value of the function while the value of the argument is still being learned. Throughput Suppose, for instance, that we want to compute the continued fraction 2 for ----------- 3 - sqrt(2) knowing that the continued fraction for sqrt(2) is 1 2 2 2 2 2 ... . We set up . . . 2 2 2 2 2 2 1 0 2 -1 3 symbolizing 0 sqrt(2) + 2 ------------- . - sqrt(2) + 3 Filling in two elements of each row: . . . 2 2 2 2 2 2 1 4 2 0 2 3 2 -1 3 4 2 4 x + 2 But means ------- and since x, the rest of the continued 3 2 3 x + 2 fraction, is between 0 and oo, we know that the answer is between 4/3 and 2/2, so we can perform a Euclid step with the quotient 1 as the first answer term: . . . 2 2 2 2 2 2 1 4 2 0 2 3 2 -1 3 1 1 0 (As before, we list the output terms down the right side.) Now we must proceed to the left again (unless we are willing to admit that we know x > 2): . . . 2 2 2 2 2 2 1 4 2 0 2 8 3 2 -1 3 1 2 1 0 Any number between 8/2 and 3/1 has 3 as its integer part, so 3 is our second term. . . . 2 2 2 2 2 2 1 4 2 0 2 8 3 2 -1 3 1 2 1 0 3 2 0 Continuing: . . . 2 2 2 2 2 2 1 4 2 0 2 8 3 2 -1 3 1 5 2 1 0 3 10 4 2 0 1 5 2 1 0 4 10 4 2 0 1 2 1 0 4 But we have been in a loop since the second occurrence of the pattern 2 1 2 0 so we have found that the continued fraction for 2 ----------- is 1 3 1 4 1 4 1 4 . . . . 3 - sqrt(2) A more interesting case: suppose we want the continued fraction for 1 e - 1 tanh - = ----- 2 e + 1 and we know that e = 2.71828... = 2 1 2 1 1 4 1 1 6 1 1 8 1 ..., which we can abbreviate 2 (1 2k+2 1). . . . 1 1 4 1 1 2 1 2 1 1 -1 4 3 1 1 0 12 7 5 2 1 1 2 20 11 9 2 1 1 0 1 6 2 1 1 0 1 10 0 1 Our suspicions aroused by the arithmetic progression developing in the answer, and especially by the third occurrence of the pattern 2 1 0 1 , we introduce the symbolic term 2k+6 . . . 1 1 2k+6 1 1 4 1 1 2 1 2 1 1 -1 4 3 1 1 0 12 7 5 2 1 1 2 20 11 9 2 1 1 0 1 6 8k+28 4k+15 4k+13 2 1 1 0 1 10 2 1 1 0 1 4k+14 0 1 The reoccurrence, independent of k, of the pattern 2 1 e - 1 proves that ----- = 0 2 6 10 (4k+14) = 0 (4k+2) . 0 1 e + 1 In fact we have proved a more general result. We can replace 2k by f(k), an arbitrary positive integer-valued function, to get the theorem: if x = (f(k) 1 1) then 2 x + 1 ------- = 2 x + 1 = (2f(k)+2) . 0 x + 1 A handy check on the arithmetic is provided by the fact that the determinant of each of the 2 by 2 matrices is simply negated upon each output and input. Thus, for example, the magnitude of these determinants in the preceding example was always 2: (8k+12)*1 - (4k+7)*2 = -2 1 * 1 - (-1) * 1 = 2 Another source of redundancy is the possibility of postponing the output (Euclid) steps for a while, then performing them in a long burst, arriving at the same point in the array via a different route. The disadvantage of this scheme is that larger intermediate numbers are generated. Functions of the form p x + q ------- , r x + s which we have been abbreviating with the matrix p q r s are known as homographic functions. If f(x) and g(x) are two such functions, the matrix for f(g(x)) is simply the product of the matrices for f and g. This can be verified directly by substitution. This means that we can regard a continued fraction 1 x = a b c ... = a + --------- 1 b + --------- 1 c + --------- ... as a composition of homographic functions: a 1 b 1 c 1 ( ) ( ) ( ) . . . 1 0 1 0 1 0 and a homographic function of such an x is merely p q a 1 b 1 c 1 ( ) ( ) ( ) ( ) . . . r s 1 0 1 0 1 0 Carrying out these multiplications from left to right will produce the same sequence of matrices as successively inputing the terms a, b, c, ... in our array process. To output a term using matrices, multiply on the left by the inverse of the matrix for inputing that term. Thus, our theorem that the function 2 x + 1 will remain unchanged by inputing the sequence f 1 1 and then outputing the term 2f+2 can be written as the matrix identity 0 1 2 1 f 1 1 1 1 1 2 1 ( ) ( ) ( ) ( ) ( ) = ( ) . 1 -2f-2 0 1 1 0 1 0 1 0 0 1 Here is why we bother with these clumsy matrices: we know that e + 1 ----- = 2 6 10 14 ... e - 1 since deleting (or adding) an initial 0 term reciprocates the value of a continued fraction. We wish to use this result to get the continued fraction for 4/e. The problem is: what is the initial matrix? Answer: 0 4 1 1 -1 2 -2 4 -4 ( ) ( ) = == 1 0 1 -1 1/2 1/2 1 1 The left hand matrix says 4/x, the next one says x + 1 ----- . x - 1 Inverting it says solve for x (in our case e). (If function composition comes from matrix multiplication, then function inversion must come from matrix inversion.) Multiplying by the first matrix applies the "4 divided by" function. Multiplying all of the elements by 2 is just equivalent to multiplying the numerator and denominator of a fraction. The following computation of 4/e is much simpler if we squeeze out additional terms from patterns like 8 0 3 1 by using the fact that x > 1, instead of the weaker condition x > 0, so that we have 8 8 x + 0 8+0 8 8 x + 0 0 - > ------- > --- instead of - > ------- > - . 3 3 x + 1 3+1 3 3 x + 1 1 . . . 8k+26 8k+22 18 14 10 6 2 4 4 -4 (Determinant = +or- 8) 19 3 1 1 1 91 9 1 3 2 11 1 1 8 ------- 43 | 3 1 | 3 | | 26 | 2 -2 | 1 ------- 17 1 1 9 1 1 144 8 0 1 19 1 1 7 11 1 1 8 0 1 -------- 24k+67 | 3 1 | 2 (Pattern recurs, | | prompts input of 16k+42 | 2 -2 | period begins 1 symbolic terms.) -------- 8k+25 1 1 8k+17 1 1 64k+208 8 0 k+2 8k+27 1 1 7 8k+19 1 1 8 0 k+2 ------------ | 3 1 | Pattern recurs, period ends 2 | | | 2 -2 | ------------ This gives us 4 - = 1 2 8 3 1 1 1 1 7 1 1 2 (1 1 1 k+2 7 1 k+2 2) e = 1 2 8 3 (1 1 1 k+1 7 1 k+1 2) . The reason for introducing the input term 8k+22 instead of 4k+22 is that the matrix recurred only every other input term, thus apparently regarding the input sequence to be (8k+2, 8k+6) instead of simply (4k+2). This is evidently related to the fact that this process is characterized by a determinant of +or- 8. A fun conjecture to test would be the following generalization of Hurwitz's theorem: The homographic matrix is periodic iff the input sequence is periodic modulo the determinant. Inverting Homographic Functions A very useful trick to add to your high school algebra repertoire: a x + b -d y + b y = ------- -> x = -------- . c x + d c y - a That is, to invert a homographic function, just interchange and negate the diagonal elements of its matrix. This is a shortcut equivalent to inverting the matrix, then multiplying all four elements by minus the determinant. Of course if the determinant, ad - bc, is 0, then we can't solve for x because y is a constant independent of x. Addition, Multiplication, etc. of Two Continued Fractions There is, however, another and yet more significant practical demand that the apparatus of continued fractions does not satisfy at all. Knowing the representations of several numbers, we would like to be able, with relative ease, to find the representations of the simpler functions of these numbers (especially their sum and product). ... On the other hand, for continued fractions there are no practically applicable rules for arithmetical operations; even the problem of finding the continued fraction for a sum from the continued fractions representing the addends is exceedingly complicated, and unworkable in computational practice. --A. YA. KHinchin, 1935 Until now, we have only taken functions of single continued fractions, but to be really useful, our algorithm must extend to arithmetic combinations of two continued fractions, x and y. This we do neatly by expanding to three dimensions. We start with a 2 by 2 by 2 matrix of eight integers. The position of each integer in the matrix is determined by whether or not it is a coefficient of x, whether or not it is a coefficient of y, and whether or not it is in the numerator. (The coefficients of xy are simultaneously coefficients of x and of y.) If we continue the convention of writing the terms of x from right to left across the top of the page, and write the terms of y down the right, where we used to put the outputs, then we can put the 2 by 2 matrix corresponding to the numerator expression where we used to put the initial homographic function matrix. That is, if x = x1 x2 x3 ... y = y1 y2 y3 ... then we represent axy + bx + cy + d by . . . x3 x2 x1 b d a c y1 y2 . . . Floating below the surface of the page, directly beneath the bdac matrix, we can imagine the denominator matrix f h e g representing exy + fx + gy + h . Thus we can compute any expression of the form axy + bx + cy + d ----------------- exy + fx + gy + h by starting with the three dimensional matrix b d f h a c e g . Let us call such expressions bihomographic. The algorithms for continued fraction addition, subtraction, multiplication, and division are all identical but for the initialization of the matrix! 1 0 1 0 x + y = 0 1 x - y = 0 1 0 1 0 -1 0 0 0 0 0 0 1 0 x * y = 0 1 x / y = 0 0 1 0 0 0 0 0 0 1 We shall work through a slightly fancier example function, for no extra cost. Using y = sqrt(6) = 2 2 4 2 4 2 4 . . . 2 e + 1 x = coth 1 = ------ = 1 3 5 7 9 . . . 2 e - 1 we will compute 2xy + x -2 sqrt(6) ------- = (1 + e ) (1 + -------) xy + y 12 which dictates the initial setup . . . 5 3 1 <- x y 1 0 | 0 0 2 0 2 1 1 2 4 Now as x and y independently vary from 0 to oo, axy + bx + cy + d ----------------- exy + fx + gy + h varies between limits among the ratios a/e, b/f, c/g, and d/h, provided that the denominator doesn't change sign. For the matrix in question, two of these denominators are zero, and they must be shifted out of the picture by inputing a term of y. . . . 5 3 1 1 0 0 0 2 0 2 1 1 5 0 2 2 2 4 Here the 2 by 2 matrix 2 0 was multiplied by 2 (the first term of y) 1 1 and added to 1 0 to get 5 0 . Now we observe that the lefthand 0 0 2 2 pair of ratios, 2/1 and 5/2, have the same integer parts, whereas the bottom pair, 5/2 and 0/2, do not. Since our goal is to get them to be equal so that we can perform a Euclid output step, we proceed to the left with an x input. Inputing from y instead would not get rid of the 5/2 and 0/2 for another step. . . . 5 3 1 1 0 0 0 2 2 0 2 2 1 1 5 5 0 2 4 2 2 4 Unfortunately, this input of 1 still does not provide enough information to determine the output (smaller terms are less informative than larger ones). Nevertheless, the two new ratios, 2/2 and 5/4 have equal integer parts, so we continue leftward. . . . 5 3 1 1 0 0 0 8 2 2 0 2 7 2 1 1 20 5 5 0 2 14 4 2 2 4 At last we have determined that the first answer term is 1, and we 7 2 subtract 1 times the denominator matrix from the numerator 14 4 8 2 1 0 matrix to get the new denominator matrix . 20 5 6 1 . . . 5 3 1 outputs 1 0 0 0 1 8 2 2 0 2 7 2 1 1 1 0 20 5 5 0 2 14 4 2 2 6 1 4 Again a 0 denominator thwarts immediate hope of another output, but it is in the corner where either an x or a y input will get rid of it. In fact, since the integer parts of the other three ratios are all different, we will need at least two more input terms to get rid of them. We can further deduce that we need at least one y input, otherwise the y-independent ratios will remain between 7 and oo, while the other pair will stay in the disjoint interval between 14/6 and 4. So let's sample y first. . . . 5 3 1 outputs 1 0 0 0 1 8 2 2 0 2 7 2 1 1 1 0 20 5 5 0 2 14 4 2 2 6 1 35 10 4 13 2 Now 14/6 and 35/13 have equal integer parts, so we move x-ward. . . . 5 3 1 outputs 1 0 0 0 1 8 2 2 0 2 7 2 1 1 1 0 20 5 5 0 2 74 14 4 2 2 31 6 1 185 35 10 4 67 13 2 which nets us an output term of 2. . . . 5 3 1 outputs 1 0 0 0 1 2 8 2 2 0 2 7 2 1 1 1 0 20 5 5 0 2 74 14 4 2 2 31 6 1 12 2 185 35 10 4 67 13 2 51 9 Now we must input the 4 from the y sequence, whereupon we will get an output of 1, leaving us with the matrix 51 9 16 4 216 38 83 20 . On the next page is a perspective view of the entire process up until now. The extremely elongated numbers are the inputs, and the three outputs 1 2 1 are in the upper right. This picture was produced with Bruce Baumgart's Geometric Editor (Stanford AI Memo 232). Another x and y go in and a 2, a 1, and a 1 bubble out: . . . 7 5 3 1 outputs 1 0 0 0 1 2 8 2 2 0 2 1 7 2 1 1 2 1 0 1 1 20 5 5 0 2 . 74 14 4 2 2 . 31 6 1 . 12 2 185 35 10 4 67 13 2 366 51 9 116 16 4 299 58 2 1550 216 38 601 83 20 348 50 . 253 33 95 17 . 3466 483 . 1318 182 830 119 488 63 342 56 Unfortunately, except for a few degenerate cases, each matrix variable will tend to grow so as to contain about a quarter as many digits as have been input. However, there is no need for the most difficult multiprecision routines--namely multiply and divide--since multiplies will nearly always involve small terms, and divides are merely integer estimates of ratios. The rare case of a large term can be handled by breaking it up, e.g. 78629 = 78000 0 629 . (See the Zero and Negative Terms section.) Also on the brighter side, the rate of information output will keep up with the inputs except for a slight lag of a term or two due to the crude bounds (0 to oo) used for the unread segments. But Why All This Trouble? Why use algorithms that are at least twice as hard as the usual algorithms on numbers in positional notation (e.g. decimal or binary)? One answer is that many numbers, such as pi and e, can be represented exactly, using little programs (coroutines) to provide indefinitely many continued fraction terms on demand. But the algorithms for sum, product, etc. of two such numbers have this same property, for they produce their output as they read their input. Thus we can cascade several of these routines so as to evaluate arithmetic expressions in such a way that output stream begins almost immediately, and yet can continue almost indefinitely. The user is freed from having to decide in advance just how much precision is necessary and yet not wasteful. Later we will extend this claim to cover series terms and approximating iterations. When you analyze why people actually bother with numerical arithmetic instead of leaving things symbolic, you find that the main reason is for deciding inequalities. Imagine how much computational effort has been wasted in generating the bits to the right of the decisive ones in inequalities. A fixed word length is almost always too long, yet sometimes too short, whereas term-at-a-time arithmetic can shut off as soon as the "comparands" differ. Another great waste of effort is the generation of information which is destroyed by truncation and rounding, or discarded in the form of remainders from division. By contrast, information is completely conserved during continued fraction processes, making the arithmetic reversible. In fact, continued fraction arithmetic is information-driven: processing is always directed to the subexpressions upon which the final answer depends most. No input is needlessly read, and no output is needlessly delayed. As a result, quantities which are multipled by small coefficients or 0 will be evaluated only a little, or not at all. The original arithmetic expression, whose value we seek to print out, is expressed as a composition of homographic and bihomographic functions. (At the bottom level are the constants, which act like functions of no arguments.) These functions are the subexpressions over which the computational effort is distributed. When a function is asked for a term, it performs the algorithm we have described in the preceding pages, asking in turn for zero or more terms from its subfunctions until its pending output term is insensitive to them. If multiple processors are available, a fork can be executed whenever a function finds itself dependent on more than one subfunction. (Problem: how do you write an arbitrary arithmetic expression as a minimal number of homographic and bihomographic functions?) Contrary to convention, processing begins at the output routine. The output routine asks the top level function for a datum (term or digit) and the top level function inturn asks for data from the input to which it is most sensitive. Eventually, a bottom level number routine will be reached, whereupon information begins to percolate upward. But most computations involve imprecise quantites, so why bother with errorless arithmetic? Because the built-in error analysis is guaranteed to stop output before erroneous terms, simultaneously indicating the quantity which limited the significance. The trick is to implement imprecise quantities as bottom level functions of no arguments analogous to those for pi and e, but instead of containing and endless description, these programs emit a loud croak when asked for one more term than they have. A drawback of this scheme is that continued fraction terms are inadequately small units of information, so that imprecise quantities will usually have a fraction of a term left over, that is, a term whose exact value is uncertain, but bounded more narrowly than between 1 and oo. Furthermore, most of the subexpression modules will also contain partial terms when a computation stalls. The best solution to this problem is to use continued logarithms (later section) instead of continued fractions, but we have a tolerable solution which uses the reversiblity of continued fraction computations. The idea is for each imprecise quantity to describe its upper bound, then take back a term or so and proceed to describe its lower bound. For example the gas constant PV -- = R = 8.31432 +or- .00034 Joules/deg/mole nT could be converted to two continued fractions--one for the lower limit of 8.31398, and one for the upper limit of 8.31466, but we can effectively get both by running Euclid's algorithm on the lower limit while keeping track of the error interval: 8.31398 + .00068 1.00000 0 8 .31398 + .00068 3 .05806 - .00184 5 .02368 + .00988 2 .01070 - .02160 2 .00328 + .05308 In the fifth row a negative error has greater magnitude than the corresponding remainder, indicating that we would be outputing different terms by then if we were doing the upper limit instead. But we can switch over to doing the upper limit simply by adding the last two errors to their corresponding remainders and then continuing: .01070 - .02160 = -.01090 .00328 + .05308 = .05636 0 -.01090 -6 -.00904 Note the 0 term, charateristic of retraction. This tells us that the true value is between 8 3 5 2 3 and 8 3 5 2 2 0 -6 = 8 3 5 2 -4 = 8 3 5 1 1 3 = 8 3 5 2 3 0 -7 This means that we can describe our imprecise number as 8 3 5 2 3 oo 0 -oo -7 oo where oo means a very large term or, equivalently, a termination signal. The desired effect, anyway, is to squeeze out the partial terms from the immediate superior function by making it think it has gotten a lot of information. Probably the best thing for f(x,y) to do when it receives its first oo from an imprecise x is to set aside copies of its coefficients of x, freeze y input (in case y might deliver a confusing oo), read the rest of x (to the last oo), then replace all of the noncoefficients of x with the copies of the corresponding coefficients of x that had been set aside. Unfortunately, when a function depends on two imprecise arguments, it must go through extra pain to select the extremes from the four values it achieves as each argument saturates at each limit. Square Roots of Continued Fractions To find y = sqrt(x), rewrite the equation as y = x/y . Our plan is to extract a term at a time from the continued fraction process for x/y subject to the condition that the output terms of the process must equal the input terms of y. We will be concerned with matrices whose last element is minus their first. This property is preserved if we always input what we output: x ax+b a b cx-a c -a x 2 b+2ax-cx a-cx Another important property of these matrices: if a y + b f(y) = ------- c y - a and we wish to find the "fixed point" of f, i.e. solve the equation y = f(y) , then the simple iteration y <- average (y, f(y)) will be equivalent to the rapidly convergent Newton's iteration for the equivalent equation 2 c y - 2 a y - b = 0 . These particular homographic functions are the self-inverse ones, that is, f(f(y)) = y for all y. For a warmup exercise, we will assume x to be merely a rational, 17/10, instead of a continued fraction. We set up the matrix for 17 f(y) = ---- 10 y namely 0 17 10 0 . Since the output must equal the input, the next term of y must always be the integer part of the fixed point of the (homographic) function defined by the matrix. To find this we can run a miniature successive approximation for each term. For example, the arbitrary guess that y = 3 gives f(y) = 17/30 , whose integer part is 0. The average of y and f(y), whose integer part is 1, will be much closer, being equivalent to a step of Newton's method. Now f(1) = 17/10, and since the actual value always lies between y and f(y), 1 must be the integer part of sqrt(17/10) and hence the first input and output term. Outputing and inputing a 1: 1 0 17 10 10 0 1 7 -10 17 The next term will be the integer part of the solution of 10 y + 10 y = --------- . 7 y - 10 We could start by guessing y = 2. Note that since we desire positive terms, we must choose x > 10/7 to avoid the negative root. [f(2)] = 4, so we try the average, 3. [f(3)] = 3, so we output and input 3: 3 1 0 17 10 10 0 1 11 7 -10 17 3 7 -11 40 Here we find f(3) = 4, which is no problem, since the actual fixed point is in between and thus must start with 3. 3 3 1 0 17 10 10 0 1 11 7 -10 17 3 10 7 -11 40 3 10 -10 40 Then [f(2)] = 2, 2 3 3 1 0 17 10 10 0 1 11 7 -10 17 3 10 7 -11 40 3 10 10 -10 40 2 7 -10 27 But we had this same matrix after the first term, so sqrt(17/10) = 1 (3 3 2) . (Actually, in this special case where the radicand is rational, it is possible to eliminate the guesswork from each iteration by observing that the determinant is preserved. In general, when a y + b y = ------- c y - a we have the determinant 2 D = - a - b c and a + sqrt(-D) y = ------------ c so [y] is merely [(a+d)/c], where d = [sqrt(-D)], which we need only compute once at the beginning. The algorithm is then a close equivalent to the one in Knuth, exercise 4.5.3.12. Unfortunately, when the radicand is a continued fraction to begin with, there is no such convenient invariant, so we will need a small iteration for each term.) The Real Thing Actually, we can solve any quadratic equation by rewriting it as the fixed point of a self-inverse homographic function: 2 - q x - 2 r p x + q x + r = 0 -> x = ----------- . 2 p x + q So instead of simply taking the square root of a continued fraction, it will be more illustrative to solve a quadratic equation, one of whose coefficients is a continued fraction. We will compute coth 1/2 from coth 1 = 1 3 5 ... = (2k+1) using 2 (coth t) + 1 ------------- = coth 2t 2 coth t with t = 1/2. The equation we want is x y - 1 y = ------- y - x where x = coth 1 and y = coth 1/2 , giving us the initial setup . . . 7 5 3 1 0 -1 -1 0 1 0 0 1 Corresponding to x = oo and x = 0 are the left and righthand 2 by 2 matrices, which, as functions of y, also have the useful property of being self-inverse. This means that we can use the Newton averaging trick to quickly find the integer part of the fixed point of the left matrix, and if it satisfies the righthand one, it is the term to output in the answer and input as y. If the two matrices have different fixed points, more x input is needed. This sounds harder than it is. Initially, the lefthand (x = oo) equation says y = - y . Guessing y = 69 will give us a value of -69, which, averaged with 69 gives our second approximation, 0, which is exactly right, since the equation happened to be degenerately linear. The righthand equation is y = - 1/y , which is decidedly not solved by y = 0, so, hardly to our surprise, we must read in a term or more of x before we can begin to output some algebraic function of it. . . . 7 5 3 1 -1 0 -1 -1 -1 0 1 1 0 1 0 1 The new lefthand function is truly pathological, being identically 1 except at 1, where it is 0/0. Assuming that we make our algorithm perform more input upon division by 0, two more inputs will occur. . . . 7 5 3 1 -16 -3 -1 0 -1 -21 -4 -1 -1 0 21 4 1 1 0 16 3 1 0 1 Finally, both of the last two matrices have fixed points solidly between 2 and 3, so we output a 2 in the z direction and input a 2 in the y direction. . . . 7 5 3 1 -16 -3 -1 0 -1 -21 -4 -1 -1 0 26 5 21 4 1 1 0 2 16 3 1 0 1 -11 -2 11 2 4 1 Now the lefthand matrix has fixed point between 6 and 7, while 6 plugged into the righthand one gives 15/4. More input. . . . 7 5 3 1 -16 -3 -1 0 -1 -21 -4 -1 -1 0 26 5 21 4 1 1 0 2 115 16 3 1 0 1 -79 -11 -2 79 11 2 29 4 1 Trying our 6 in the new lefthand matrix, we win. . . . 7 5 3 1 -16 -3 -1 0 -1 -21 -4 -1 -1 0 26 5 21 4 1 1 0 2 115 16 3 1 0 1 -79 -11 -2 589 82 79 11 2 6 29 4 1 -95 -13 95 13 19 4 Now the lefthand matrix says 10, but not the right. Inputing 9 from x confirms the 10 for y. . . . 9 7 5 3 1 -16 -3 -1 0 -1 -21 -4 -1 -1 0 26 5 21 4 1 1 0 2 115 16 3 1 0 1 -79 -11 -2 589 82 79 11 2 6 265 29 4 1 -868 -95 -13 8945 979 868 95 13 10 175 19 4 -882 -95 882 95 125 29 It is not obvious how to show the the answer will indeed be 2 6 10 14 ... = (4k+2). For a while, this computation will be typical in that the output rate will about match the input rate, while the matrix integers slowly grow, but in this peculiar case, the output terms outgrow the input terms, so that input must occur somewhat oftener to make up the information rate. Come to think of it, the schoolboy algorithm for square root is also digit-at-a-time, but requires two inputs for each output to avoid souring future outputs. Square Roots etc. Using Feedback Suppose that continued fraction arithmetic is being used in a successive approximations process, and suppose further that this process is converging at better than one term per iteration. (Newton's method, for example, delivers exponentially more terms each iteration.) This means that the next approximation will contain at least one more correct term than the current one, independent of the erroneous terms which follow. But a continued fraction process will not request data of which it is independent, and thus it will have already computed the new, correct term by the time it reads the corresponding incorrect term. But then there is no need at all to read the incorrect term, since the correct one is already available. So once a process starts to converge, it can gobble its own output in a feedback loop. (This idea is due to Rich Schroeppel.) There is a minor catch in all of this: in order to be able to output early, the module which computes the approximating expression must "realize" that all instances of the input approximation must vary from 0 to oo in unison. Thus all instances of the approximation variable must be grouped into a single expression which may be more complicated than the ones for simple arithmetic. Generalization of the algorithm to higher dimensions, in order to accomodate additional variables or higher powers, is straightforward but tedious. Someday, I would like to spend some time contemplating the consequences of more complicated feedback arrangements. Worked Example To compute x = sqrt(y), Newton's method says 2 x + y x <- ------ 2 x where x is the approximating variable. Unfortunately, if both x and y are continued fractions, the general form of the expression required will be 2 2 ax y + bxy + cy + dx + ex + f ------------------------------ 2 2 gx y + hxy + iy + jx + kx + l which involves twelve integer variables and four dimensions. The feedback technique is more easily demonstrated if y is simply an integer, like 6 for instance. Then we can use the mechanism for simple arithmetic, starting with the matrix 0 6 1 0 1 0 0 1 x y + 6 i.e. ------- , which, when y = x, is Newton's method for x + y x = sqrt(6). In the denominator, the choice of x + y instead of 2x + 0y, for instance, will provide convenient symmetry which will be preserved by the fact that both inputs (and the output) will always be equal. The four ratios amount to two 0s and two oos, indicating that we will have to warm up the process before it produces terms automatically. To get a term, we must estimate the integer part of the answer, which we do simply by successive substitution using integer arithmetic. Starting with a guess of 3, for instance: 3*3 + 6 2*2 + 6 ------- = 2+ , ------- = 2+ 3 + 3 2 + 2 so 2 is the first term, which we output and input for both x and y: . . . 2 0 6 2 1 0 2 -2 6 1 0 2 1 0 1 0 1 -2 . 4 1 . 2 0 . (We could have done this in any of the six possible orders.) Again the ratios disagree, so we must take another guess and resubstitute it until it stabilizes. Among the four ratios, 0/1 is the limit when both inputs approach 0 (unrealisitic, they are greater than 1), the two 1/0s are the limits when one input approaches 0 while the other approaches oo (absurd, they are equal), and the 4/2 is the limit when they both approach oo. Since the curve is asymptotically flat, this last, lower left ratio, when finite, is the best first guess: 2 4*2 + (1+1)*2 + 0 ------------------ = 2+ 2 2*2 + (0+0)*2 + 1 So 2 it is. Again, outputing, inputing, and inputing: . . . 2 2 0 6 2 1 0 2 -2 6 1 0 2 1 0 1 1 0 1 -2 0 1 -2 4 1 2 4 2 0 1 0 1 9 4 2 1 This time, 9/2 suggests 4, which is confirmed by 178/40 = 4+ , so . . . 4 2 2 0 6 2 1 0 2 -2 6 1 0 2 1 0 1 1 0 1 -2 0 1 -2 4 1 2 4 2 0 4 1 0 1 2 0 2 9 4 4 9 2 1 4 1 0 40 9 18 4 Now we are cooking, since the three ratios, 40/18, 9/4, and 2/1, all say that the next term is 2, and since everything is positive, the true value must be between the greatest and least ratio. Pumping through this 2, 2 4 2 2 0 6 2 1 0 2 -2 6 1 0 2 1 0 1 1 0 1 -2 0 1 -2 4 1 2 4 2 0 4 1 0 1 2 0 2 9 4 4 9 2 1 9 4 1 0 2 1 0 40 9 2 40 18 4 9 4 1 . 89 40 . 20 9 . We are now to the point of producing outputs twice as fast as they are needed for input, so our matrix is getting overfed. Let's drain it out without inputing to see what's left. 40 18 9 4 4 2 1 0 89 40 20 9 9 4 2 1 outputing a 4 and a 2. But we had this matrix 4 2 1 0 9 4 2 1 before, right after processing the second term. Since the matrix alone determines its future inputs, a repetition immediately implies a loop, proving sqrt(6) = 2 2 (4 2 4 2) = 2 (2 4) . Non-regular Continued Fractions From the (non-regular) continued fraction for arctan 1, 4 1 -- = 1 + --------- pi 4 3 + --------- 9 5 + --------- 16 7 + --------- . . . we can compute the regular continued fraction for pi: . . . 100 81 64 49 36 25 16 9 4 1 (1) 21 19 17 15 13 11 9 7 5 3 1 12 4 0 4 24 4 1 1 0 3 4 0 1 ------- 555 51 6 1 16416 1044 79 7 1 0 7 1008 72 2 2 ---------- 98692840 3891940 169621 8261 456 29 6169520 243320 10598 518 28 2 --------------- 4934642 194597 308476 12166 15 307502 12107 1 974 59 The differences between this algorithm and the one for regular continued fractions (all 1s in the numerators): 1. The list of numerators is written down just above the denominator terms. 2. Each element is computed from the two to its right by multiplying the nearer one by the denominator term above it, in the next to top line--then multiplying the further (rightmost) element by the numerator term above it (in the top line)--then finally adding the two products. (When the numerators are all 1, this is the same as the regular algorithm.) Thus, in the pi conversion, 555 = 9*51 + 16*6. 3. The determinant is not preserved, and it is possible for a 2 by 2 pattern to have a gcd of all four elements greater than 1. This gcd will always divide the last numerator used. In the pi conversion, this gcd exceeded 1 three times, successively reaching 4, 36, and 20. In an effort to keep the elements small, these gcds were divided out each time. The reducible matrices were underlined and the reduced ones were then copied directly beneath. 4. You soon need scratch paper or a calculator. The output steps are the same as for Euclid's algorithm. The regular continued fraction for pi is particularly hard to get out of any process, due to the difficulty in deciding whether the third term is going to be 15 or 16. (The value of pi with its first two terms gone is 15.997... .) These problems are due to the particularly large term which will follow the 1--we can already tell it will be around 300 from looking at the last matrix. This is also what makes 355 3 7 15 1 = 3 7 16 = --- = 3.1415929... 113 such a good approximation to pi. A partial remedy to this "constipation" problem is simply to guess what a pending output term will be, relying on the process to correct itself later. The corrections, if necessary, will take the form of negative terms and possibly 0. These can be "cleaned up" by running the regular Euclidean process starting with the identity function. In the case of the pi conversion, the pattern 8261 456 518 28 tells us that the next term is between 15.9 and 16.3 so we could (incorrectly) guess that the next term was 16: . . . 100 81 64 49 36 25 16 9 4 1 (1) 21 19 17 15 13 11 9 7 5 3 1 12 4 0 4 24 4 1 1 0 3 4 0 1 ------- 555 51 6 1 16416 1044 79 7 1 0 7 1008 72 2 2 ---------- 8261 456 29 6169520 243320 10598 518 28 2 16 -1948 -1180 53 -27 8 -------------- 308476 12166 -974 -59 Although this gets us an earlier output, the next three input terms still fail to get us that big term near 300--not until the ingestion of the pair 196 29 do we get our desired -294. After that, five more inputs yield the three small outputs 3, -4, 5. (Small terms contain less information and therefore come out quicker.) This data is based on the assumption of an output whenever the nearest integers to both the upper and lower limits are equal. Zero and Negative Terms Converting the preceding result to all-positive form, we use the identity function: . . . 5 -4 3 -294 16 7 3 22 3 1 0 113 7 1 0 1 3 -14093 -4703 16 1 0 7 -881 -294 1 0 15 -878 -293 1 -3 -1 292 33 7 -2 -1 1 19 4 -1 0 1 14 3 1 5 1 2 4 1 1 1 0 which is correct as far as it goes. The careful reader may wonder at the seemingly premature input of the term 5 to the matrix 7 -2 7 y - 2 = ------- 4 -1 4 y - 1 which seems to say "between 7/4 and 2", thus foreordaining an output of 1. Beware denominator elements of opposite sign! y between 0 and oo actually says "OUTSIDE 7/4 and 2", with a singularity at y = 1/4. y must be outside 0 and 1/3 to assure an output of 1 (as was the case). This raises the question of just what are reasonable assumptions about the range of y when we are dealing with an admittedly messed up continued fraction. The answer is that there are none, at least without some very special preprocessing of the input sequence. The problem is mainly with 0. The sequence ... a 0 b ... is equivalent to the single term ... a+b ..., since if p and q were any adjacent elements of an input process, . . . a+b . . . . . . b 0 a . . . == (a+b)p+q p q (a+b)p+q p ap+q p q i.e. the last two adjacent elements are in the same state either way. This seemingly innocent fact explains why the addition of an initial 0 term is equivalent to the deletion (when possible) of one: 0 0 x . . . = 0+x . . . = x . . . It also partly explains the funny preambles on certain "linear Hurwitz" numbers: e = 2 (1 2k+2 1) = 1 0 1 (1 2k+2 1) = (1 2k 1) 4 - = 1 2 8 3 (1 1 1 k+1 7 1 k+1 2) e = 1 2 1 0 7 1 0 2 (1 1 1 k+1 7 1 k+1 2) = 1 2 (1 k 7 1 k 2 1 1) . 17 sqrt(--) = 1 (3 3 2) = (1 3 3 1 0) 10 (Hurwitz numbers are those which can be written in this parenthesis notation using polynomials in k. Hurwitz's theorem states that this property is preserved by homographic functions with rational coefficients. Square roots of rationals are all of the form a (b c ... c b 2a) = (a b c ... c b a 0) . More on this later.) The mischief comes with sequences like . . . 1 2 3 4 5 0 -5 -4 -3 -2 -1 . . . = . . . 0 . . . wherein it seems to be saying something and then takes it all back. This allows a peculiar but complete reversibility of continued fraction computations--by merely inputing or outputing 0 and then several negated terms in reverse order, the computation can back up to any previous state, but unless the maximum length of these "retraction palindromes" can be bounded in advance, there is no reliable way to collapse them out with a sequential process. Even further confusion can be introduced with several applications of the rule: -a -b -c -d ... = -a-1 1 b-1 c d ... In practical situations, however, you really can avoid these problems, and the only other nonsense sequence to watch out for is . . . -1 1 -1 1 -1 1 . . . But these can be detected when they begin, whereupon you can shut of output until they stop. You can also simply discard three pairs at a time, since the only effect is to negate the whole state matrix: . . . 1 -1 1 -1 1 -1 . . . -p -q q-p -p q q-p p q . Increasing the Convergence Rate of Continued Fractions Reexpressing Series as Continued Fractions R notation Conversion to Decimal . . . 1 15 7 3 22 3 1 0 7 1 0 1 3 ****** 150 10 0 106 7 1 1 ******* 440 30 106 7 4 ******* 180 160 20 113 106 7 1 ******** 670 540 113 106 That is, just follow the recipe for the homographic function of one argument, except on output, you multiply by 10 after the subtraction, instead of reciprocating. On paper, this necessitates recopying the denominators, which resembles the outputing of 0. Thus, a decimal fraction can be thought of as a continued fraction with two terms for every digit. The denominators are the decimal digits alternated with 0s, while the numerators are alternately 1 and 10. On output, the gcd of the matrix can be multiplied by a divisor of 10. This can be detected simply by maintaining modulo 10 versions of the two denominator coefficients. In the special case that the input continued fraction is regular (as above), only a finite number of such reductions can occur, corresponding to the total number of factors of 2 and 5 that the initial denominator coefficients shared in common. Thus, although there is little reduction to be gained in the regular case, there is also little effort-- you need only count the 2s and 5s in the gcd of the initial denominator terms, and cancel out at most one of each with each output multiplier of 10. A curiosity worth noting is that when this decimal (or especially octal) conversion is performed on the nonregular fraction for arctan 1 (as in the section on nonregular fractions), the number of reductions by 2 depends drastically upon the original denominator coefficients. If they are 0 and 1, for instance, there will be four times as many cancellable powers of 2 than if they are 1 and 0. Thus, by this method, 1/pi is significantly easier to calculate than pi. This fact may be connected with the observation that infinite series for 1/pi seem to be simpler and more rapidly convergent than series for pi. Conversion From Decimal is immediate, since, for instance 1 pi = 3.141592... = 3 + --------- 10 0 + --------- 1 1 + --------- 10 0 + --------- 1 4 + --------- 1 10 = 3 + ---------- 0 + --------- 1000 1 0 + ---------- 1 + --------- 1 10 141 + ---------- 0 + --------- 1000 1 592 + --------- 5 + --------- 10 . . . 0 + --------- 1 9 + --------- . . . and we already know how to deal with non-regular continued fractions. Conflicting Notations oo matrices left to right. Approximations Comparison rule: If we call the integer part of a continued fraction the 0th term, then we can say that the a (regular) continued fraction is an increasing function of its even numbered terms, and a decreasing function of its odd numbered terms. Thus, to compare two continued fractions, compare the terms at the first position where they differ, then reverse the decision if this position is odd numbered. If one continued fraction terminates before differeing from the other, regard the shorter one to be suffixed by an infinite term. The comparison rule can also follow from the rule for subtracting two continued fractions with zero or more initial terms in common. If w = a b c ... p x and y = a b c ... p z , where x and z are the tails of the two fractions, then Nx + n Nz + n N n w = ------ and y = ------ , where is the Dx + d Dz + d D d matrix resulting from the input of a b c ... p to the identity matrix 1 0 0 1 . Then u (x - z) w - y = ----------------- (Dx + d) (Dz + d) where u is the determinant Nd - nD = 1 or -1 respective of whether there was an even or odd number of inputs to th matrix. Note that this expression is independent of N and n, so we need only compute the bottom row of the matrix. But the bottom row is what you would get by dropping the original input term, the computing the top row. We thus save another step. (If two continued fractions start with the same term, it is clear that their difference is independent of the value of that term.) Simplest intervening rational: In a recent popularity poll, parents were asked what they thought of the idea of teaching schoolchildren continued fractions instead of long division. Sixty nine percent of those responding though it was a communist plot. What is the smallest number of people who could have been polled? Presumably, by 69% the pollsters didn't mean exactly 69 of every 100 but rather some fraction p/q which is at least .685 but less than .695. Our problem is to find in the half-open interval [.685, .695) the rational with smallest q. (A half-open interval contains its left endpoint but not its right one.) If p/q and r/s are distinct nonnegative rationals in lowest terms, we will say that p/q is simpler than r/s if p<=r and q<=s. It may be that of p/q and r/s, neither is simpler than the other. In this case, however, there is always some rational numerically between them which is simpler than either. (E.g., 11/8 is between and simpler than 11/10 and 13/8. 11/8 is the minimum of the numerators over the minimum of the denominators.) It follows that there is a simplest rational in every finite interval, since there can only be a finite number, pq, of rationals simpler than any rational p/q. If our intervals can include oo, we shall treat it as if it were 1/0, thus defining oo to be simpler than any rational besides 0 (i.e., 0/1). The motivation for this is related to oo having the empty continued fraction. Now we have defined the simplest rational in any interval with nonnegative endpoints which does not include both 0 and oo. We leave it to the philosophers to determine which of these two numbers is simplest. The pollster problem now becomes: what is the denominator of the simplest rational in [.685, .695) ? This is easy to solve if we first determine that the continued fractions of the two endpoints are .685 = 0 1 2 3+ and .695 = 0 1 2 5+ through the first term where they differ. By 3+ we mean some number greater than 3 but less than 4, and similarly for 5+. From this comparison rule, all of the numbers in the interval have continued fractions = 0 1 2 x, where x is in the interval (3+, 5+], what ever those two numbers happen to be. The simplest number in this interval is 4. The simplest rational in [.685, .695) is therefore the number whose continued fraction is 0 1 2 4 = 9/13 .(692307) so as few as 13 people may have been polled, provided that they all responded. This rationale is amplified on the page after next. (I can't resist pointing out that dividing 13 into 9 is a great example of the tail chasing trick mentioned on page 1: after producing the digits 6 and 9, the remainder is 3, which is 1/3 of the initial "remainder" (namely, the dividend) 9. This means that we can compute the rest of the quotient digits 230769... simply by dividing 3 into 692307... .) Of course, since we really only wanted the denominator 13, we could have skipped the computation of the numerator 9: 4 2 1 0 13 3 1 1 0 1 , (relying on the input process to produce lowest terms.) But then if we wanted to check the answer, we would have to multiply 13 by .69 and see if the result was very near an integer. The computation of two nearby continued fractions can be streamlined considerably, if we do not wish to go much beyond where they disagree. Begin an ordinary Euclid output process on one number (for variety, we choose the larger), while keeping track of the separation interval, as we did with the gas constant. For .695/1 and .685/1, .695 -.010 1.000 .000 0 .695 -.010 1 .305 +.010 2 .085 -.030 3 .050 +.100 stopping when the magnitude of the interval width exceeds the last remainder. At this stage, the last term would have been different, had we used the other endpoint. But we can easily switch over to computing the other endpoint by adding in the interval widths on whichever two consecutive lines we wish: .695 -.010 1.000 .000 0 .695 -.010 1 .305 +.010 2 .315 .085 -.030 3 .055 5 .050 +.100 .040 Since both contnued fractions were infinished when we stopped them, we have shown, with very little manipulative effort, that .695 = 0 1 2 3+ and .685 = 0 1 2 5+ as required. Fact (theorem): If A and B are positive rationals, with A simpler than B, then C + 1/A is simpler than C + 1/B, where C is a nonnegative integer. But C + 1/A and C + 1/B are what you get by prefixing the term C to the continued fractions for A and B. This means that in determining which of two (terminating) continued fractions represents the simpler rational, we can ignore any initial sequence of terms that they have in common. The continued fraction of the simplest rational included in an interval consists of this common initial term sequence, to which is appended the continued fraction of the simplest rational in the interval determined by erasing the common initial sequence from the original endpoints. We characterize an interval as a pair of endpoints in [0, oo]. Associated with each endpoint is a flag saying whether or not the endpoint is included in the interval. When we modify the continued fraction of an endpoint to produce a new endpoint, we will be careful not to modify this flag. Recipe #1 for the simplest included rational: Write as continued fractions the two endpoints of the interval in question, stopping at least one term beyond the one where they first disagree, except that if one of them terminates before disagreeing, suffix to it a term of oo. Set aside whatever initial sequence of terms they have in common. These terms will be the initial ones of the simplest included rational, as well as all of the other numbers in the interval. Now we need only find the simplest rational in an interval whose endpoints have continued fractions which start with different terms. (If we have set aside all of the terms of one endpoint, we are left with an explicit oo.) The only way this new interval can fail to contain oo or at least one integer is if the upper endpoint is itself an exact integer, but is excluded by virtue of being the tail of an endpoint which was excluded in the original problem. It must also be the case that the lower endpoint has an initial term 1 less than the upper one. The easiest thing to do in this case is to rewrite the single term which is the upper endpoint as the previous integer followed by a 1, e.g., 3 1 instead of 4. Then we again have two continued fractions which start with the same term, and can proceed with the recipe. Recipe #2 for simplest included rational: If the interval contains oo, return the answer oo. IF the interval contains any integers, return the smallest one. Otherwise let g be the greatest integer in the smaller endpoint. Return g + the reciprocal of the simplest rational in the inttefval determined by reciprocating the fraction parts of the original endpoints. The reason Recipe #2 sounds easier is that it avoids the question of how to do the arithmetic. When it comes down to doing the work, Recipe #1 will save you plenty. Here is an example which illustrates the tricky points. A sportscaster remarks that Joe diMolisho batted over .312 last year. Unfortunately, the sportscaster is notoriously pro diMolisho, so you can bet theat if Joe batted as much as .3125, his friend in the booth would have said "Joe batted .313 last year". At least we know Joe saw a fair amount of action, by determining the simplest rational in the open interval (.312, .3125) (both endpoints excluded): .3120 +.0005 1.000 .0000 0 .3120 +.0005 3 .0640 -.0015 4 .0560 +.0065 1 .0625 .0080 -.0080 7 .0000 oo .0000 +.0625 thus establishing that .312 = 0 3 4 1 7 and .3125 = 0 3 4 1 oo = 0 3 5 . Notice that our streamlined algorithm conveniently produced the continued fraction for .3125 in just the (nonstandard) form we needed for the recipe. We have only to find the simplest rational in (7, oo), which is 8 because the oo is not in the interval. So, dMolisho's simplest possible performance ratio is 0 3 4 1 8 = 44/141 = .312051... indicating at least 141 at bats. Rounding rule: When you discard the tail of a continued fraction, you effectively subtract from the last term retained the reciprocal of the quantity represented by the tail. This reciprocal is greater than 1/2 iff the first term of the tail is 1, indicating that the last retained term should be incremeted just when the first discarded term is 1. Another way to look at it is that a final 1 can always be combined into the preceding term, so why truncate just before a 1 when truncating just after it will give a more accurate estimate with the same number of terms? Best truncations: Whether or not you round, a truncated continued fraction has the property of being the closest rational approximation to the untruncated number, subject to having such a small numerator. (No simpler rational comes as close.) The only other rationals with this property can all be formed by reducing the last term of the truncated fraction by up to 50%. For example, after 355/113, what is the next better rational approximation to pi? pi = 3 7 15 1 292 1 1 1 2 1 . . . and 355/113 = 3 7 15 1 so 103993/33102 = 3 7 15 1 292 is better than 355/113 (and by the rounding rule, 104348/33215 = 3 7 15 1 293 is better still), but are there any approximations of intermediate accuracy and simplicity? Indeed, reducing the terminal 292 to any integer greater than 292/2 = 146 will produce intermediate approximations, while the approximation 51808/16491 = 3 7 15 1 145 is actually worse than 355/113. To test the borderline case of 52141/16604 = 3 7 15 1 146, we perform the following simple but mysterious ritual: write pi as 3 7 15 1 146 0 146 1 1 2 1 . . . . Then, delete the first term and fold the left-hand portion over the 0: 146 1 15 7 146 1 1 1 2 1 . . . . Because the upper continued fraction numerically exceeds the lower one (by the comparison rule), 52163/16604 = 3 7 15 1 146 is the next better approximation to pi after 355/113 (!) The improvement, however, is microscopic: less than 2 parts in a billion. Mathematical explanation: suppose we wish to truncate z = a a ... a a a ... 0 1 i i+1 i+2 by discarding the terms beginning with term i+2. How far can we reduce term i+1 before it would be better to simply discard it too? Define N - = a a ... a x = a a ... D 0 1 i i+1 i+2 n - = a a ... a d 0 1 i-1 Nd - Dn = u (u = +or- 1) Then Nx + n z = ------ Dx + d N n a 1 a 1 a 1 ( ) = ( 0 ) ( 1 ) ... ( i ) D d 1 0 1 0 1 0 Transposing both sides, N D a 1 a 1 a 1 ( ) = ( i ) ... ( 1 ) ( 0 ) n d 1 0 1 0 1 0 implying D - = a ... a a d i 2 1 The error introduced by replacing the tail x by the single term p is Nx + n Np + n u (x - p) ------ - ------ = ---------------- Dx + d Dp + d (Dx + d)(Dp + d) while the error introduced by simply discarding the tail is N Nx + n u - - ------ = ---------- D Dx + d D (Dx + d) The crossover point is when these two errors are equal, i.e., when d p + - = x - p D or p a ... a a = (a -p) a ... i 0 1 i+1 i+2 so if we think of this truncation as chopping off the continued fraction "in the middle of a term", we have the following peculiar yet simple rule: to test whether our chop has produced the best rational approximation possible with such a small numerator, reverse the sequence of terms that we kept, and discard what was originally the zeroth term. Compare this, as a continued fraction, with the part we chopped off (x-p). If the part we chopped off is greater than or equal to the reversed part, we would have done better to chop off the whole term. Continued Logarithms There is a mutation of continued fractions, which I call continued logarithms, which have several advantages over regular continued fractions, especially for computational hardware. As with ordinary continued fractions, each number and subexpression will be a microprocess which describes itself a little at a time, but instead of continued fraction terms, our description language will have only two words, "0" and "1". A 1 means "I was at least 2, so I halved myself". A 0 means "I was between 1 and 2, so I subtracted 1 and reciprocated myself (so now I am > 1)". For example, we compute the continued logarithm of 100/2.54 : 11111 100/2.54 -> 50/2.54 -> 25/2.54 -> 25/5.08 -> 25/10.16 -> 25/20.32 0 25/20.32 - 1 = 4.68/20.32 11 20.32/4.68 -> 10.16/4.68 -> 5.08/4.68 0 5.08/4.68 - 1 = .40/4.68 111 4.68/.40 -> 2.34/.40 -> 1.17/.40 -> 1.17/.80 0 1.17/.80 - 1 = .37/.80 1 .80/.37 -> .40/.37 0 .40/.37 - 1 = .03/.37 111 .37/.03 -> .37/.06 -> .37/.12 -> .37/.24 0 .37/.24 - 1 = .13/.24 0 .24/.13 - 1 = .11/.13 0 .13/.11 - 1 = .02/.11 11 .11/.02 -> .11/.04 -> .11/.08 0 .11/.08 - 1 = .03/.08 1 .08/.03 -> .04/.03 0 .04/.03 - 1 = .01/.03 1 .03/.01 -> .03/.02 0 .03/.02 - 1 = .01/.02 1 .02/.01 -> .01/.01 0 .01/.01 - 1 = 0 oo so 100/2.54 = 111110110111010111000110101010 . Alternatively, we could write it as the sequence of lengths of bursts of 1s: 5,2,3,1,3,0,0,2,1,1,1. In the latter representation, each term is the integer part of the log base 2 of the number remaining to be described. As with regular continued fractions, oo is the empty sequence, rationals are the finite sequences, and many (but not all!) quadratic surds have periodic sequences. Unlike regular continued fractions, integers can have long sequences, and Hurwitz numbers seem patternless. The direct expression of a continued logarithm as a nonregular continued fraction: 5 100 5 2 ---- = 2 + --------- 2.54 2 2 2 2 + --------- 3 3 2 2 + --------- . . . 1 1 2 2 + --------- 1 2 . That is, each denominator and succeeding numerator must be equal powers of 2. Why Use Continued Logarithms? The primary advantage is the conveniently small information parcel. The restriction to integers of regular continued fractions makes them unsuitable for very large and very small numbers. The continued fraction for Avogadro's number, for example, cannot even be determined to one term, since its integer part contains 23 digits, only 6 of which are known. Also, mechanisms for handling such gigantic terms would have to be built, only to lie dormant throughout most computations, since huge terms are very rare except at the beginning of huge numbers. By contrast, the continued logarithm of Avogadro's number begins with its binary order of magnitude, and only then begins the description equivalent to the leading digits--a sort of recursive version of scientific notation. Another problem related to huge terms could be called the .999... problem. Suppose you were using continued fraction arithmetic to multiply sqrt(2) = 1 2 2 2 ... by sqrt(2), but without any assurance that the two numbers are, in fact, identical. This means that at any time one of the input terms might turn out to be something other than 2. Depending upon whether this occurs on an even or odd term, the numerical value of the product will be 2.0000+ or 1.9999+, or, as continued fractions 2 ... or 1 1 ... But until that deviation occurs (maybe never), the first term of the continued fraction is in doubt. A partial solution to this problem is to forcibly egest a 2 when it becomes clear that this module is unduly stuck. If we are wrong, the gigantic term will simply come out negative. What we would like to say now is "regard my next term as infinite until further notice", hoping that we are indeed through. But this is not enough for the functions which depend on our output, to which they will eventually become extremely sensitive. They will need to know "just how close to infinity are you?", but we, faced with an ever growing number with oscillating sign, have no way to answer. We do not even know when we should input more 2s (at least we hope they are 2s). The information- drivenness has broken down. With continued logarithms, there is no problem at all, if we regard a 1 to mean "my MAGNITUDE was at least 2, so I halved myself". Our function will simply produce 1s as long as the input patterns hold, and a constant stream of 1s is at least as informative to a superior function as any other string. Simply stated, continued logarithms allow us to say that the first digit of infinity is 1. A slight embarrassment could occur if it turned out that one of the inputs was really < sqrt(2), since we have not included in our language a mechanism for negative numbers (in this case it would serve as an expletive). We will see to this later. The Simple Details Suppose you are given the integers a, b, c, and d, and wish to compute the homographic function a x + b y = ------- c x + d with continued logarithms. For each x input of 1, y's value must be preserved by doubling a and c, or if possible, halving b and d. This is because x has halved itself. When x announces that it has reduced itself by 1, add a to b and c to d. When x announces it has reciprocated itself, interchange a with b and c with d. Equally easy is output of y. The knowledge that x is > 1 gives us a + b a y between ----- and - , c + d c provided c+d and d have the same sign. If both of these quantities are >= 2, y can emit a 1 and halve itself by doubling c and d, or if possible, halving a and b. If the two ratios lie between 1 and 2, y decrements itself by subtracting c from a and d form b, then reciprocates itself by interchanging a with c and b with d and finally announces all this by emitting a 0. Although these operations are not as nice on paper, they are beautifully suited to binary machines, requiring only shift, add, subtract, exchange, and compare-for-greater. To illustrate the power and simplicity of this mechanism, we will compute the continued log of sqrt(6) from first principles, by solving 6 y = - . y Setting up the matrix for 6/y, 0 6 1 0 we test whether y is greater or less than 2 by plugging in y = 2 to get 3. The fact that the value went up instead of down says that y > 2, since by Newton's method, the average is closer than either. So we outpulse and receive a 1, meaning to halve a and b, then double a and c. 0 3 2 0 1 Now plugging in y = 2 gives 3/4, so y < 2 and we oupulse and receive a 0, which is just like outputing and inputing a term of 1 with regular continued fractions. (In fact the golden ratio, whose continued fraction is all 1s, has continued log all 0s.) 0 3 2 2 0 10 1 -2 3 Here the assumption y >= 2 fulfills itself, while 1 <= y < 2 will drive 2 y + 2 ------- y - 2 negative. So again we emit and receive a 1 by halving a and b, then doubling a and c. 0 3 2 1 0 10 2 -2 3 1 Here again y > 2, requiring another 1, but since b is odd, we must double c and d, then also double a and c. 0 3 4 1 0 10 8 -4 3 11 Here y < 2, ending another 1 burst. 0 3 4 1 0 10 4 8 -4 3 110 1 -4 5 Here y must be > 2 (in fact > 4) to avoid chasing the negative root, and this time we can process the 1 by halving a and b, then halving b and d. 0 3 4 1 0 10 2 2 -4 3 110 1 -2 5 1 But the matrix was in this state right after emitting the first 0, so sqrt(6) = 10(1101) . In computational practice, we will need two other words besides 1 and 0 in the language. For negative numbers we could have either "-" for "I negated myself" or "+" for "I incremented myself". For fractions initially < 1, "*" would mean "I doubled myself", the opposite of "1". Also, for finite inputs, we formally require an end-of-string mark, which is logically oo, since the empty continued logarithm, like the empty continued fraction, represents +or- oo. Continued logs can also represent oo as an endless burst of 1s, if it results from nonterminating inputs. The Continued Logarithm of pi etc. Architecture If it is possible to make very long parallel adders, it should be possible to make a high-precision, ultrahigh speed arithmetic processor based on continued logarithms. It would be an extremely parallel device consisting entirely of registers and having no static memory. Such an architecture is feasible because, within a given bihomographic octet of registers, each bit must connect to only five others. Here is why. Schematically, we can think of our 2 by 2 by 2 matrix as a cube with a register at each vertex. Into this cube flow the two bit streams describing the operands x and y, and out of it flows the bit stream of the answer, z. No matter which of the three possible transactions we perform, the additions, subtractions, comparisons, and exchanges take place between registers joined by the edges of this cube. In fact, we could imagine that within each edge was the control logic for the transaction determined by that edge's direction. Thus each register bit need only talk to its three counterparts in the x, y, and z directions, plus its left and right neighbors (for shifting and carrying). Instead of wasting time testing which of the three possible transactions is most relevant, we will synchronously and cyclicly input x, input y, and output z on each major cycle. This will simplify the hardware at the cost of diluting the information density of the output stream by a small percentage, due to the occasional retraction of a premature output. Sadly, this will also cripple our automatic error analysis, but such is the price of speed. We could gain even more speed by making output decisons before the carries have settled, since this should introduce only slight further dilution. After our octet has received about 2n inputs and produced about n outputs, each of our registers will contain about n/4 significant bits. Carry times will grow as log n, so our quotient or product or whatever will have taken time proportional to n log n, like the FFT algorithms, but with a far smaller constant of proportionality. The four main advantages of this scheme over the FFT schemes are: 1) Simplicity--it is hardly more than a "smart memory", with a minimal form of processor distributed throughout. 2) Speed--we beat the traditional cost factor of dozens or even hundreds for multiprecision, with output bits typically separated by only slightly more than an integer add time. With all of the internal thrashing, it may waste energy, but not time. 3) Consequently, the crossover point is toward much smaller numbers, thus encompassing more applications. 4) Additional parallelism--we can interconnect networks of these octets which evaluate whole arithmetic expressions in parallel. In fancy models, we could imagine "octet pools" containing allocatable segments of register octets, to be dynamically distributed as register contents grew and shrank. Fancy or not, it should be possible to sustain ultraprecise computations to megabit accuracy, at megabit/second rates, using something not much more complicated than a large semiconductor memory. More conventional processors can not come anywhere near sustaining such an ouput rate since most of their bits are lying dormant in storage for relatively long periods. Even Illiac IV using FFT multiplication can only provide pi in megabit quantities at about 5 kilobits/sec, and only then with prodigious programming effort. The key idea is that with every bit of storage there be the associated logic to operate on that bit. The continued logarithm formulation restricts the number of data paths to a conceivably practical level.