Sample solutions and discussion Perl Expert Quiz of The Week #20 (20040729) This week's quiz is to write a limited replacement for the CPAN 'MLDBM' module. The purpose of MLDBM is to allow storage of complex Perl data structures in a DBM file. The typical DBM implementation (such as SDBM_File, which comes standard with Perl) allows the programmer to manipulate a disk database as if it were an ordinary hash variable. One says something like tie %hash, 'SDBM_File', $filename; and thereafter, $hash{"key"} = "value"; stores "value" in the disk file under the key "key", and $x = $hash{"key"}; retrieves the value again. But most DBM implementations are limited to storing plain strings as values. Attempting to store undef as a value actually stores the empty string instead, so that $hash{"key"} = undef; print defined($hash{"key"}) ? "oops" : "OK"; prints "oops". Similarly, attempting to store a compound object such as [1, 2, 3] or {name => "bill", numbers => [1, 2, 3] } actually stores a useless string like "ARRAY(0x436c1d)" or "HASH(0x436c1d)". Similarly, $hash{"this"}{"that"} = "ouch"; causes a "strict refs" failure, if you have "strict refs" checking on, and a bizarre error if not. (Sub-quiz: Why?) The purpose of MLDBM is to work around this. (See http://search.cpan.org/~chamas/MLDBM-2.01/lib/MLDBM.pm for complete details.) Your task this week is to write a version of MLDBM. It should be a tied hash class named 'ML_DBM', such that after tie %mldbm, 'ML_DBM', \%hash; the user can access %mldbm as if it were an ordinary hash, but all data apparently stored in %mldbm is actually stored in %hash instead. For example, this must work: $mldbm{'array'} = [5,6,7]; print $mldbm{'array'}[2]; # This must print 7 as must this: $mldbm{'hash'} = { I => { like => "pie" } } print $mldbm{'hash'}{'I'}{'like'}; # This must print "pie" and this: $mldbm{'a'}{'a2'} = 'AA'; $z = $mldbm{'a'}; $z->{'b2'} = "BB"; print $mldbm{'a'}{'b2'}; # This must print "BB" There is a trivial solution to this, which is just to store the values into %hash and fetch them back as requested. However, this quiz has a restriction which rules out that solution: The values that ML_DBM stores in %hash must always be ordinary strings. The idea is that the user can tie %hash to their favorite DBM implementation, and then use that as the backing hash for ML_DBM: tie %hash => 'NDBM_File', $filename or die ...; tie %db => 'ML_DBM', \%hash; and ML_DBM will (unknowingly, but faithfully) store the data into the underlying disk file via %hash. ML_DBM must restrict the values it stores into %hash to ordinary strings, or else the underlying DBM implementation (NDBM_File in this case) will not faithfully preserve them. ---------------------------------------------------------------- First off, apologies for the late summary. [ Most of the lateness in the report is my fault, not Bradley's; he sent me his postmortem several days ago, and I held onto it for a while because I was playing around with the problem some more. - MJD ] Executive Summary: See http://search.cpan.org/~pinyan/MLDBM-Easy-0.01/lib/MLDBM/Easy.pm My Aborted Attempt: Since I had never played with tie'ing I thought I would make a go of this quiz (although MJD warned me it would be a difficult one). [ It turned out to be more difficult than I realized. - MJD ] My first stop was: perldoc perltie which I perused for several minutes gathering what information I thought necessary for my attempt. Armed with my newfound knowledge I chose to focus on something mentioned in the quiz: 'There is a trivial solution to this, which is just to store the values into %hash and fetch them back as requested.' I decided that this would be my first goal, simply to implement the trivial solution. I quickly coded up a (far from complete) MyHash.pm that did just that: package MyHash; use strict; use warnings; sub TIEHASH { my ( $class, $hash ) = @_; return bless $hash, $class; } sub FETCH { my ( $self, $key ) = @_; return $self->{$key}; } sub STORE { my ( $self, $key, $value ) = @_; $self->{$key} = $value; } package main; my ( %hash, %tied ); tie %tied, 'MyHash', \%hash; $tied{'foo'}->{'bar'}->{'baz'} = 'test'; print $tied{'foo'}->{'bar'}->{'baz'}, "\n"; Ok, that worked. My next step was to make use of Data::Dumper to see what was actually being passed to STORE: sub STORE { my ( $self, $key, $value ) = @_; print Dumper \@_; $self->{$key} = $value; } I assumed it would be a reference to a data structure that would have to be serialized before being stored. Then it would have to be unserialized (deserialized?) in the FETCH method. I had even gone so far as deciding to try and use YAML for the serialization. Imagine my confusion upon seeing this output from 'perl MyHash.pm': $VAR1 = [ bless( {}, 'MyHash' ), 'foo', {} ]; test 'So the value passed to STORE is not yet fleshed out?' I thought to myself. 'But how can you serialize data that is not there?'. The next thought was: 'Now what?' So I decided to actually visit http://search.cpan.org/~chamas/MLDBM-2.01/lib/MLDBM.pm as recommended in the quiz. There I quickly learned of the limitations of MLDBM with regards to working with references. I decided that I could implement a solution that worked but was under the same restrictions as MLDBM. I pseudo-coded until I was satisfied that I could at least achieve that modest end. I then turned my thoughts to ways of overcoming those restrictions. In all honesty there was not much there. I briefly tried to think of a way to turn: $tied{'foo'}->{'bar'}->{'baz'} = 'test'; into: # where | is some arbitrary separator character %hash = ( 'foo' => 'foo|bar', 'foo|bar' => 'foo|bar|baz', 'foo|bar|baz' => 'test' ); The only thing I could think of was to make use of tie'd hashes that would know their 'namespace' and define a STORE method that used the 'namespace' to store the value given. Then my brain exploded and I gave up for the night. That is as far as I got and I should now turn to submissions that actually solved the given problem. Submissions: I counted three different submissions, two from Randy W. Sims and one from Michael C. Toren. One of the submissions from Randy W. Sims has the same restrictions as MLDBM (with the exception of correctly storing an undef value) and so I will only consider his second submission. The similarity between the two (fully conformant) submissions is that both make use of a 'frontend' hash that allows the serialization to be delayed until the data structures being stored are fully fleshed out. After this delay the serialized data is then stored in the 'backend' hash that was passed to tie. Where the two solutions differ is in how the data is serialized as well as how long the serialization is delayed. Michael C. Toren implemented his own serialization technique while Randy W. Sims made use of Data::Dumper. I have not spent enough time looking at Michael C. Toren's serialization code to feel confident in making any comparisons between it and Data::Dumper. Randy W. Sims' submission delays the serialization of data structures until the next operation on the tied hash while Michael C. Toren's submission delays the serialization until the hash is untied or destroyed. From my understanding of their code Randy W. Sims' method has an advantage since the frontend hash will never contain the entire contents of the backend hash unlike Michael C. Toren's submission. Regardless of their differences, both submissions from Randy W. Sims and Michael C. Toren solved the given quiz successfully. Congratulations to them. ---------------------------------------------------------------- Sub-quiz: Similarly, attempting to store a compound object such as [1, 2, 3] or {name => "bill", numbers => [1, 2, 3] } actually stores a useless string like "ARRAY(0x436c1d)" or "HASH(0x436c1d)". Similarly, $hash{"this"}{"that"} = "ouch"; causes a "strict refs" failure, if you have "strict refs" checking on, and a bizarre error if not. (Sub-quiz: Why?) ---------------------------------------------------------------- I did not see anyone comment on the given sub-quiz but here are my observations: The code: use strict; use warnings; use Fcntl; # For O_RDWR, O_CREAT, etc. use SDBM_File; my %h; tie( %h, 'SDBM_File', 'filename', O_RDWR|O_CREAT, 0666 ) or die "Couldn't tie SDBM file 'filename': $!; aborting"; $h{"this"}{"that"} = "ouch"; gives an error: Can't use string ("HASH(0x1abedc4)") as a HASH ref while "strict refs" in use at SDBM_test.pl line 11. which seems to be self-explanatory. Removing the 'use strict' did not produce a bizarre error as was indicated it should so I cannot comment on that part of the sub-quiz (perhaps my version of Perl affects the expected error?). [ The "bizarre error" is not a visible failure; it's an invisible erroneous behavior. Perl has a feature called a "symbolic reference", in which an ordinary string, say "foo", can be used as if it were a reference. The result of using "foo" in a place where Perl expects an array reference is that Perl silently treats the string as though it were a reference to the array @foo. Similarly, if "foo" is used as if it were a hash reference, it behaves as though it were a reference to %foo. In advanced applications, this feature can be very useful, but it is also the cause of many common error that are extremely difficult to track down. Let's consider this simple function, which is something like Perl's built-in "push" operator: sub my_push { my ($aref, @items) = @_; push @$aref, @items; } Normally, 'my_push' should be invoked like this: my_push(\@my_array, 1, 2, 3); print "@my_array"; # prints 1 2 3 which pushes 1, 2, 3 onto the end of @my_array. This works also: my $aref = \@my_array; my_push($aref, 1, 2, 3); print "@my_array"; # prints 1 2 3 But this does not: my $aref = \@my_array; my_push("$aref", 1, 2, 3); print "@my_array"; # prints NOTHING! The quotation marks around $aref turn $aref from an array reference into a string, something like "ARRAY(0x436c1d)". 'my_push' then treats this string as though it were a reference, but it isn't. Perl pushes the 1, 2, 3 not into @my_array but into the array with the extremely bizarre name @"ARRAY(0x436c1d)". The programmer thought they were storing the 1, 2, 3 into a known place, but actually the 1, 2, 3 are now lost, stored into a mystery bizarro array. This is exactly the kind of error that "strict refs" was created to diagnose. In Pr. Embree's example code, this error is exercised: my %h; tie( %h, 'SDBM_File', 'filename', O_RDWR|O_CREAT, 0666 ) or die "Couldn't tie SDBM file 'filename': $!; aborting"; $h{"this"}{"that"} = "ouch"; Perl creates a new, fresh hash, {}, and tries to store a reference to it into the SDBM file under the key "this". But doing this convers the reference to a string, and what is actually stored in the file is the string "HASH(0x1abedc4)". Perl then fetches this value back and uses it as a reference to a hash, in which "ouch" is stored under key "that". In an ordinary hash, this works properly. But with the SDBM file, what is used as a hash reference is the string "HASH(0x1abedc4)". If "strict refs" is on, using this string as if it were a hash reference is a fatal error. With "strict refs" disabled, Perl stores the string "ouch" into ${"HASH(0x1abedc4)"}{"that"}. There is a good chance that this data will be lost for ever. Even if it isn't, it is certainly not what the programmer intended. That is the "bizarre error". My thanks to Pr. Embree and also to everyone who contributed, whether visibly or not. -- MJD. ]