Notes about Infinite Lists

M-J. Dominus

Simplified Hamming function
Performance Notes
Multi-way merge
Answer to Exercise
Errata

1: Simplified Hamming function

Rujith S. de Silva points out:

Your hamming function does not need the `hack' you used:


  my $href = \1;                # Dummy reference

The following is all you need, because Perl evaluates variables in closures as late as possible:


sub hamming {
  my $hamming;
  $hamming =
      Stream->new(1, 
 	  sub { &merge($hamming->scale(2),
		       &merge($hamming->scale(3),
			      $hamming->scale(5)
			      ))
		  }
          );
}

Which is absolutely correct; I've included Rujith's version of the subroutine into Stream.pm, so that you can compare them if you want to.

Rujith continues:

In this respect, Perl is unlike Java, which evaluates variables in closures at the time the closure is being constructed. I can't see how to do the above in Java - can you figure it out?

If you have any thoughts on this, I'd be delighted to hear about them at mjd-tpj@plover.com.

2: Performance Notes

In my article, I said:

It only takes a few minutes to compute three thousand Hamming numbers, even on my dinky P75 computer.

The bottleneck here was not CPU power, but available memory. I put in a memory upgrade last week, and now the same subroutine on the same CPU produces the three thousand numbers in about twenty seconds.

3: Multi-way merge

Gene Hsu took up my suggestion of a 3-stream merge function further, and sent in an n-stream merge function:


sub merge {
    my @streams = @_;
    my @nonEmptyStreams = grep !$_->is_empty, @streams;

    return @nonEmptyStreams[0] if scalar @nonEmptyStreams < 2;

    my @heads = map $_->head, @nonEmptyStreams;

    # now figure out what the smallest head value is
    # first initialize the minimal value for min
    my $min = shift @heads;
    for (@heads) {
	if ($min > $_) {
	    $min = $_;
	}
    }
    Stream->new(
		$min,
		sub { &merge( map { $_->head == $min ?
				    $_->tail :
				    $_ }
				@nonEmptyStreams) });
}

This did indeed speed up the hamming subroutine by about 13%. I did a little hacking on this myself, and produced this version, which is about 10% faster than Gene's:


sub nmerge2 {
    my @streams = @_;
    my $min;
    my @minstreams;
    my @unminstreams;
    my $headstream;

    1 while (($headstream = shift @_) && exists($headstream->{e}));
    return Stream->empty unless $headstream;
    $min = $headstream->{h}; @minstreams = ($headstream);

    foreach $s (@_) {
      next if exists($s->{e});
      my $h;
      if (($h = $s->{h}) < $min) {
	$min = $h;
	push @unminstreams, @minstreams;
	@minstreams = ($s);
      } elsif ($h == $min) {
	push @minstreams, $s;
      } else {
	push @unminstreams, $s;
      }
    }

    Stream->new($min, sub { &nmerge2(@unminstreams, 
				     map {$_->tail} @minstreams) 
			      }
		);
}

I still believe that a specialized 3-stream version would be fastest of all. If you want to try speeding this up, you can get a benchmarking script.

4: Answer to Exercise

The article contained an `exercise for mathematically inclined readers':

What's interesting about this stream?


	&tabulate(sub {$_[0] * 3 + 1}, 1)->prime_filter

You are probably familiar with the fundamental theorem of arithmetic, which states that every integer has a unique factorization into primes. In fact, the most interesting (and difficult-to-prove) part of that theorem is the `unique' part. You can factor 24 several ways---into 4*6 or 8*3, for example---but when you finish factoring the factors into primes, you always end up with 2*2*2*3 in some order.

It's not obvious that unique factorization is actually a property---it can be hard to imagine how it could be otherwise. But it is an important property of the integers which is false in many similar systems.

The usual example of such a system is the 3n+1 domain, which includes the numbers

	1 4 7 10 13 16 19 22 25 28 ...

You'll note that the 3n+1 domain has all the same nice properties that the integers have: It's closed under multiplication, so that if you multiply two numbers of the form 3n+1, you get another of the form 3n+1; multiplication is commutative and associative, and soforth. And it does have primes; 4 is a prime, for example (because 2 is not in the domain). 28 is not a prime; it's the product of the two primes 4 and 7.

But one property the 3n+1 domain does not have is unique factorization. 100 is in the domain, but has two different factorizations into primes: 100 = 4 * 25 = 10 * 10. None of the numbers 4, 10, 25 can be factored further.

A property equivalent to unique factorization is that if p is a prime, and if a*b is divisible by p, then either a or b is divisible by p. This is true in the integers, but not in the 3n+1 domain: 4*25 is divisible by 10, but neither 4 nor 25 is divisible by 10.

The stream in the exercise:

	4 7 10 13 19 22 25 31 34 37 43 46 55 58 61 67 73 79 82 85 ...

is a list of all the prime numbers from the 3n+1 domain.

5: Errata

The square function on page 20 has an obvious error. It should say
```
    sub square { $_[0] * $_[0] }
  
```
For some reason, the output of the hamming function on page 21 is missing the number 27. It should read
```
    1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 
  
```
I have no idea where the 27 went. The program is correct; I must have deleted the 27 somehow before we went to press.

Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia

mjd-tpj@plover.com