Sample solutions and discussion Perl Expert Quiz of The Week #3 (20021030) Perl provides a function, 'gethostbyname()', to translate host names to their addresses. For example, gethostbyname("perl.plover.com") returns the string \xd8\x9e\x34\x79 because that is the packed 4-byte IP address of perl.plover.com. Unfortunately, gethostbyname() has a drawback. It typically consults the DNS (domain name system) network service, and resolving a hostname make require many network queries. Sending these queries and receiving the answers or waiting for the timeouts can take a lot of time, so a Perl program might hang for a long time whenever it does 'gethostbyname'. You will write a replacement for gethostbyname() that does not suffer from this drawback; it will be asynchronous. The replacement will have two parts: 1. An asynchronous domain lookup server. This is a program that runs continually in the background, waiting for requests from clients to look up hostnames. There must be some way for a client process to contact the server and send it a hostname. The server must immediately return some sort of token that identifies the query uniquely. The server then performs the gethostbyname() call on behalf of the client. There must be some way for the client to find out whether or not the address has been obtained, and, if so, what the address is, using the previously-returned token to specify which previous query it is inquiring about. Finding out whether and answer is ready must be instantaneous. If the answer is ready, finding out the address must be instantaneous. 2. A Perl module that provides an interface to the asynchronous lookup service. The interface will look like this: use Gethostbyname::Asynchronous; my $query = Gethostbyname::Asynchronous->query("perl.plover.com"); until ($query->is_ready) { # ... do something else ... } # At this point, the service has located the address for us my $address = $query->answer; The ->answer method always returns the answer. If the answer isn't ready at the time the call is made, then ->answer waits until an answer is available and then returns it. The module must allow multiple simultaneous queries: for my $name (@names) { $address{$name} = Gethostbyname::Asynchronous->query($name); } while (...) { # ...do something else for a while... for my $host (keys %address) { if ($address{$host}->is_ready) { print "$host: ", $address{$host}->answer, "\n"; delete $address{$host}; } } # More other stuff going on here... } If you're not familiar with OOP techniques in Perl, you may implement a functional-style interface instead: use Gethostbyname::Asynchronous::Functional; my $qid = query("perl.plover.com"); until (is_ready($qid)) { # ... do something else ... } # At this point, the service has located the address for us my $address = answer($qid); I was really disappointed by the response to this question. I thought it would be perceived as a pertinent, realistic problem, small enough to be solved quickly, but sitll complete enough to be relevant. Apparently other folks thought it looked like too much work. Andreas Trottmamn was the only person to post a solution to the qotw-discuss list. There are a number of techniques one can use to build a server; I was hoping to be able to compare them. The client can connect to the server through a network socket, or through a pipe, or by using shared memory as a mailbox. I hoped someone would build a solution around POE. Andreas Trottmann's solution involved a network service, running on port 65432. His use of IO::Socket made the server code very small: #!/usr/bin/perl -w use strict; use IO::Socket; use Gethostbyname::Asynchronous; # define a "reaper" for our children (without this, we'll get a lot of zombies) $SIG{CHLD} = sub { wait }; my $sock = new IO::Socket::INET (LocalPort => $Gethostbyname::Asynchronous::portno, Proto => 'tcp', ReuseAddr => 1, Listen => 5, ); die "Can't create socket: $!\n" unless defined $sock; my $client; for(;;) { if($client = $sock->accept()) { my $pid = fork(); die "Can't fork: $!\n" unless defined $pid; unless($pid) { # $pid == 0 means we're in the child process chomp(my $hostname = $client->getline); print $client scalar gethostbyname $hostname || ''; exit 0; # this automatically closes the client socket } } # we're in the parent process, don't need *this* client anymore undef $client; } Andreas's server forks off a child process to handle each client; this is a simple and straightforward way to handle each client instantly. His interface module connects to the server at the well-known port and immediately returns the resulting filehandle; by doing select() on the filehandle a client can see whether the answer is ready, and by reading the handle, the client can get the answer: package Gethostbyname::Asynchronous; use strict; use vars qw(@ISA $portno); use IO::Socket::INET; use IO::Select; @ISA=('IO::Select'); $portno = 65432; # i hope that's an unoccupied number sub query { my $sock = IO::Socket::INET->new(PeerAddr => 'localhost', PeerPort => $portno, ); die "Can't connect to the gethostbyname server: $!\n" unless defined $sock; $sock->blocking(0); print $sock $_[1], "\n"; my $select = IO::Select->new; $select->add($sock); return bless $select; } sub is_ready { return scalar $_[0]->can_read(0); } sub answer { my($sock) = $_[0]->can_read; my $result; $sock->read($result, 4); my(@result) = unpack 'CCCC', $result; return join('.', @result); } 1; # give a successful return value in any case My own solution was quite different. I decided I didn't want to use sockets. (Why? Partly because I thought everyone else would do it; partly because I didn't want to write the networking code; I wanted to explore alternative solutions.) I first considered having the clients communicate with the server by writing through a named pipe (FIFO) in a public location. But pipes only offer half-duplex communication (you can send a message in either direction, but not in both directions at once) and it wasn't clear to me how I would get the server to send the token back to a client without possibly interfering with the requests from other clients that were also in the pipe at the same time; also, named pipes are not portable. So I eventually decided to use the file system as a mailbox. The services advertises a directory, defined by the module, to serve as a mailbox. I used /tmp/async-dns, but it could be anything at all. This directory contains two subdirectories, 'q' and 'a'. Clients make requests by depositing files in 'q', and can get the answers back by looking in 'a'. package Gethostbyname::Asynchronous; $HOME = "/tmp/async-dns"; my $S = "a"; sub query { my ($pkg, $hostname) = @_; my $token = join ".", $$, $S++; local *Q; open Q, ">", "$HOME/q/$token" or return; print Q $hostname; bless { token => $token } => $pkg; } The client is responsible for inventing its own token. I decided this was permissible, because it's entirely invisible to the user of the service; the token generation occurs inside the module. The token is just the process ID of the process making the request, combined with a unique per-request identifier. To make a request, the client deposits a file in /tmp/async-dns/q whose name is the desired token and which contains the hostname to be looked up. The server will eventually notice this, and when the answer is ready, it will deposit a file with the same name into the /tmp/async-dns/a directory; the contents of the answer file will be the desired address. The client can watch for this file: sub _ans_file { "$HOME/a/$_[0]{token}"; } sub is_ready { my $self = shift; -e $self->_ans_file; } When the file appears, the client can read it to find out the answer: sub answer { my $self = shift; sleep 1 until $self->is_ready; local *A; open A, "<", $self->_ans_file or return; my $result; { local $/; $result = } close A; unlink $self->_ans_file; $result; } The server doesn't need to fork; it just runs an infinite loop, alternately scanning /q for new requests, and depositing answers to requests into /a. The main loop looks like: while (1) { my @queries = find_queries(); for my $q (@queries) { my $addr = gethostbyname($q->{hostname}); record_answer($q->{token}, $addr); } } The complete server is: #!/usr/bin/perl use Gethostbyname::Asynchronous; my $HOME = $Gethostbyname::Asynchronous::HOME; my $query_dir = "$HOME/q"; my $answer_dir = "$HOME/a"; setup(); while (1) { my @queries = find_queries(); for my $q (@queries) { my $addr = gethostbyname($q->{hostname}); record_answer($q->{token}, $addr); } } sub find_queries { local *D; opendir D, $query_dir or die "Couldn't open query dir $query_dir: $!; aborting"; my @new_queries; while (my $token = readdir D) { local *Q; next unless -f "$query_dir/$token"; next unless open Q, "<", "$query_dir/$token"; local $/; my $hostname = ; push @new_queries, {hostname => $hostname, token => $token }; } unlink map "$query_dir/$_->{token}", @new_queries; @new_queries; } sub record_answer { my ($token, $addr) = @_; open A, ">", "$answer_dir/.tmp" or die "Couldn't open tmp answer file $answer_dir/tmp: $!; aborting"; print A $addr; close A; rename "$answer_dir/.tmp", "$answer_dir/$token" or die "Couldn't rename tmp answer file to $answer_dir/$token: $!; aborting"; } sub setup { my %perm = ($HOME => 0711, $query_dir => 0733, $answer_dir => 0733, ); my $old_mask = umask 0000; for ($HOME, $query_dir, $answer_dir) { if (-d) { my $perms = (stat $_)[2] & 0777; next if $perms == $perm{$_}; chmod $perm{$_}, $_ or die "Couldn't set permissions on directory $_: $!"; } elsif (-e) { die "$_ exists, but is not a directory! Aborting.\n"; } else { mkdir $_, $perm{$_} or die "Couldn't create directory $_: $!; aborting.\n"; } } umask $old_mask; } 1. One potential problem here is that clients are responsible for cleaning up the answers in the a/ directory when they are done with them; if they don't do this, the a/ directory will become full of old queries. Not only would this waste disk space, but it might introduce a bug. Suppose process 11111 makes query 11111.a, which the server answers; the answer is in /tmp/async-dns/a/11111.a. Process 11111 exits without deleting this file. Some time later, a new process 11111 makes a query, also called 11111.a, and then checks a/11111.a before the server has a chance to answer the query. It will see the old a/11111.a file with the wrong answer in it! It would be better to make the server responsible for cleaning up the a/ directory, but it can't because it doesn't have any way to know when an answer file can be thrown away. The network socket approach does not have this drawback. 2. Another potential problem is security. Suppose Alice is a bad person, and she can guess the tokens that Barbara's process is using. Then she may be able to delete the answer that the server has left for Barbara's process and replace it with a different address; barbara's process will then have the wrong address and go and talk to the wrong host! Similarly, Alice could delete Barabra's process's request before the server sees it, and replace it with a request for a different host. It's not hard to defeat this; the library should make the tokens hard to guess by including a large random number in each one: use Math::TrulyRandom my $token = join ".", $$, $S++, unpack("H*", truly_random_value()); 3. I used a hash as the basis for the object in the client code: bless { token => $token } => $pkg; Since the hash has only one element, I didn't need to have a hash at all; I might have just blessed $token itself: bless \$token => $pkg; Then for example, _ans_file would have become: sub _ans_file { "$HOME/a/${$_[0]}"; } In such cases I usually use a hash for two reasons: I feel that the code is clearer (you have the syntactic marker 'token' as a signpost wherever the token member data is being used), and also because objects often acquire more member data later on, and it's easy to add another member to a hash. Andreas opted to just return a reblessed IO::Select object, which is very simple. 4. One person said he didn't do it because he didn't like my interface. He wanted to do it with an event loop with a callback. I think that's rather silly. The interface I specified is very generic. It would be very easy to add an event loop interface around it, or to incorporate it into any sort of event loop framework. The reverse is not true. 5. Someone sent me private email, asking if this was a silly quiz question, since there is a feature of Net::DNS that makes asynchronous queries. I don't think it is silly, for two reasons. First, the design of such a facility is of interest by itself. (Or at least, I thought it would be of interest.) And second, unlike Net::DNS, it's easy to turn this library into a generic asynchonrous computation service; just changing a few lines of either my solution or Andreas's will make it into a library for asynchronously requesting web pages, or asynchronously sending email, or asynchronously performing any computation at all: while (1) { my @queries = find_queries(); for my $q (@queries) { my $result = eval($q->{code}); record_answer($q->{token}, $result); } } I don't think Net::DNS is going to do this. Thanks again for your interest. I will send another quiz tomorrow.