Date: Mon, 03 Sep 2001 03:55:52 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: random link from web page
Message-Id: <3B933788.38EF4540@earthlink.net>

confused wrote:
> 
> ok, i am using the code at the bottom of this message to get links
> from a web page. How do i get it to return only 1 random link from the
> page so every time you goto the cgi page, it will show a different
> link ?
> 
> #!/usr/bin/perl -w
> 
> use strict;
> use HTML::LinkExtor;
> use LWP::Simple;
> 
> my %seen;
> my $url = "http://www.yahoo.com";
> my $parser = HTML::LinkExtor->new(undef, $url);
> $parser->parse(get($url))->eof;
> my @links = $parser->links;
> foreach my $linkarray (@links) {

Why are the above two statements on separate lines?  The only place the
result of $parser->links is used is the foreach.  You don't need the
results after that, so don't store them.

foreach my $linkarray ( $parser->links ) {

>     my @element = @$linkarray;
>     my $elt_type = shift @element;
>     while (@element) {
>         my ($attr_name, $attr_value) = splice(@element, 0, 2);
>         $seen{$attr_value}++;
>     }

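Why extract parts you don't need?  The tag name and the attribute names
are never used; only the attribute values, at indexes 2, 4, ..., matter.
Replace the entire body of the foreach loop with: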

	for( my $i = 2; $i < @$linkarray; $i += 2 ) {
		$seen{ $linkarray->[$i] }++;
	}
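
As a rough sketch of why stepping by two from index 2 is enough: each
entry that links returns is an array ref shaped like
[tag, attr_name => attr_value, ...].  The sample data and the example.com
URLs below are made up for illustration, and nothing is fetched or parsed.

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the entries $parser->links returns:
# each entry is [ tag, attr_name => attr_value, ... ].
my @links = (
    [ a   => href => 'http://example.com/a' ],
    [ img => src    => 'http://example.com/logo.gif',
             lowsrc => 'http://example.com/logo-small.gif' ],
    [ a   => href => 'http://example.com/a' ],    # duplicate on purpose
);

my %seen;
foreach my $linkarray (@links) {
    # Index 0 is the tag, odd indexes are attribute names, and the
    # even indexes from 2 upward are the attribute values (the URLs).
    for ( my $i = 2; $i < @$linkarray; $i += 2 ) {
        $seen{ $linkarray->[$i] }++;
    }
}

# One line per distinct URL, with how often it was seen.
print "$_ ($seen{$_})\n" for sort keys %seen;
```

The duplicate entry collapses into a single key, which is all the later
random pick needs.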

> }

Here's another way of doing the foreach loop:
@seen{map {
	my $la = $_;
	map { $la->[$_ & ~1] } 2 .. $#$la;
} $parser->links} = ();

I wouldn't suggest using it, but you might have been considering it.

This also works:
foreach my $linkarray ( $parser->links ) {
	@seen{ map { $linkarray->[$_ & ~1] } 2 .. $#$linkarray } = ();
}
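
In case the $_ & ~1 part looks like magic: it clears the lowest bit, so
every odd index is rounded down to the even index before it.  Here is a
small sketch, using a made-up link array, that shows which indexes the map
actually visits.

```perl
use strict;
use warnings;

# Clearing the low bit maps 2,3,4,5,6 onto 2,2,4,4,6.
my @indexes = map { $_ & ~1 } 2 .. 6;
print "@indexes\n";

# Hypothetical entry in the [ tag, attr => value, ... ] layout.
my $linkarray = [
    img => src    => 'http://example.com/x.gif',
           lowsrc => 'http://example.com/y.gif',
];

# Hash slice: each value becomes a key; repeated values just overwrite.
my %seen;
@seen{ map { $linkarray->[ $_ & ~1 ] } 2 .. $#$linkarray } = ();

print "$_\n" for sort keys %seen;
```

Note that the slice assignment leaves every value undef, so this form
records which links exist but not how many times each one appeared.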

>  print "Content-type: text/html\n\n";

Are you sure that this is what you want to be printing?
I would think you would want to *go to* the link you'd found.
This would be done by printing a Location header instead of a
Content-type header.
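
A minimal sketch of what that could look like, with a fallback for a page
that has no links at all.  The helper name redirect_or_message and the
fallback text are my own invention, not anything from your script.

```perl
use strict;
use warnings;

# redirect_or_message is a hypothetical helper.  Given a URL it returns
# a CGI redirect response; given nothing usable it returns a short
# plain-text page instead of an empty Location header.
sub redirect_or_message {
    my ($link) = @_;
    return "Location: $link\n\n" if defined $link && length $link;
    return "Content-type: text/plain\n\nNo links found.\n";
}

print redirect_or_message('http://example.com/some/page');
```

The blank line after the header matters in both cases: it is what ends the
CGI header block.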

> for (sort keys %seen) { print $_, "\n" };

To get one random link from %seen, you can do either of:
	print ((keys %seen)[rand(scalar keys %seen)], "\n");
or:
	my $randval;
	# defined() keeps a key like "0" or "" from ending the loop early
	for( my ($idx,$iter)=(1); defined($iter = each %seen); ++$idx ) {
		$randval = $iter if( rand() < 1/$idx );
	}
	print $randval, "\n";

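Both pick each key with equal probability.  The first builds a temporary
list of all the keys, so it uses more memory, but is faster.  The second
walks the hash with each() and keeps only one candidate at a time, so it
is slower, but uses almost no extra memory.  Which one you prefer depends
on your needs.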
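
If you want to try the two approaches side by side without hitting the
network, something like the following sketch should do.  The sub names
pick_by_index and pick_by_scan and the example.com keys are made up for
the demonstration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the %seen hash built from a real page.
my %seen;
$seen{"http://example.com/page$_"} = 1 for 1 .. 5;

# First approach: copy all keys into a list and index it at random.
sub pick_by_index {
    my ($hash) = @_;
    my @keys = keys %$hash;
    return $keys[ rand @keys ];
}

# Second approach: one pass with each(), keeping a single candidate.
# The Nth key replaces the candidate with probability 1/N.
sub pick_by_scan {
    my ($hash) = @_;
    my ( $pick, $idx ) = ( undef, 0 );
    keys %$hash;    # reset the each() iterator before scanning
    while ( defined( my $key = each %$hash ) ) {
        $pick = $key if rand() < 1 / ++$idx;
    }
    return $pick;
}

print pick_by_index( \%seen ), "\n";
print pick_by_scan( \%seen ),  "\n";
```

Both subs return undef for an empty hash, so check the result before
printing it as a Location header.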

Here are all the things I suggested, put together.


#!/usr/bin/perl -w

use strict;
use HTML::LinkExtor;
use LWP::Simple;

my $url = "http://www.yahoo.com";
my $parser = HTML::LinkExtor->new(undef, $url);
$parser->parse(get($url))->eof;
my %seen;
foreach my $linkarray ($parser->links) {
	for( my $i = 2; $i < @$linkarray; $i += 2 ) {
		$seen{ $linkarray->[$i] } = 1;
	}
}
print "Location: ";
print +(keys %seen)[rand(scalar keys %seen)];
print "\n\n";
__END__


-- 
"I think not," said Descartes, and promptly disappeared.


