Why is Tie::File preferable to DB_File's DB_RECNO feature?

Date:        14 Mar 2002 11:44:48 -0000
Message-ID:  <l20020314114448.6084.qmail@plover.com>
From:        mjd@plover.com
Subject:     Re: Trying to update a file in perl 
In-reply-to: Your message of "14 Mar 2002 10:44:33 GMT."
                    <xn9g033zawe.fsf@voxel13.doc.ic.ac.uk>

Ed Avis said:
FWIW the DB_File module also can do this, so it's not clear why a new module was needed.

I'm glad you asked. DB_File is a great piece of software and it is a good solution for many problems. But the DB_RECNO feature is less great than the rest of it. It has a number of serious defects.

DB_File reads the entire file into memory, modifies it in memory, and then writes out the entire file again when you untie the file. This is completely impractical for large files.

Tie::File does not do any of those things. It doesn't try to read the entire file into memory; instead it uses a lazy approach and caches recently-used records. The cache size is strictly bounded by the programmer. DB_File's ->{cachesize} doesn't prevent your process from blowing up when reading a big file.
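
For illustration, here is a minimal sketch of tying a file with a bounded cache. The temporary file and its contents are made up so that the sketch runs on its own; the 'memory' option is the byte limit that Tie::File documents for its cache, and the value used here is arbitrary.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only so the sketch is self-contained.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "record $_\n" for 0 .. 9;
close $fh or die "close: $!";

# 'memory' is an upper bound, in bytes, on what Tie::File will cache.
# The figure below is an arbitrary choice for this example.
tie my @lines, 'Tie::File', $filename, memory => 1_000_000
    or die "Cannot tie $filename: $!";

my $count = scalar @lines;   # number of records in the file
my $fifth = $lines[4];       # fetched on demand; newline already removed
print "$count records; record 4 is '$fifth'\n";

untie @lines;
```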

DB_File has a crappy writing strategy. If you have a ten-megabyte file and tie it with DB_File, and then do

        $a[0] =~ s/PERL/Perl/;

DB_File will then read the entire ten-megabyte file into memory, do the change, and write the entire file back to disk, reading ten megabytes and writing ten megabytes. Tie::File will read and write only the first record.

If you have a million-record file and tie it with DB_File, and then use

        $a[999998] =~ s/Larry/Larry Wall/;

DB_File will read the entire million-record file into memory, do the change, and write the entire file back to disk. Tie::File will only rewrite records 999998 and 999999. During the writing process, it will never have more than a few kilobytes of data in memory at any time.
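
A small sketch of the same kind of in-place edits with Tie::File, on a made-up three-line file. It reads the file back through a separate handle while the array is still tied, to show that the changes have already been written. The 'autodefer => 0' option is passed only to make sure that Tie::File's automatic write deferral cannot come into play in this example.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file for the demonstration.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "PERL is fun\n", "Larry wrote it\n", "the end\n";
close $fh or die "close: $!";

# autodefer => 0: every assignment is written through right away.
tie my @a, 'Tie::File', $filename, autodefer => 0
    or die "Cannot tie $filename: $!";

$a[0] =~ s/PERL/Perl/;          # record keeps the same length
$a[1] =~ s/Larry/Larry Wall/;   # record grows; later records must move

# Read the file through an independent handle, before untie.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $contents = do { local $/; <$in> };
close $in;

print $contents;

untie @a;
```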

Since changes to DB_File files only appear when you do 'untie', it is inconvenient to arrange for concurrent access to the same file by two or more processes. Each process needs to call $db->sync after every write. When you change a Tie::File array, the changes are reflected in the file immediately; no explicit ->sync call is required, unless you have asked for it to be required.
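
The "unless you have asked for it" case can be sketched with Tie::File's deferred-write methods, defer and flush. The file contents and the slurp helper below are hypothetical and exist only for the demonstration.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical helper: read the whole file through a separate handle.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    local $/;
    my $data = <$in>;
    close $in;
    return $data;
}

my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "first\n", "second\n";
close $fh or die "close: $!";

# tie returns the underlying Tie::File object.
my $t = tie my @a, 'Tie::File', $filename
    or die "Cannot tie $filename: $!";

$t->defer;                        # ask for writes to be held in memory
$a[0] = 'changed';
my $before = slurp($filename);    # file on disk is not yet modified

$t->flush;                        # write the held changes out
my $after = slurp($filename);     # file on disk now has the change

print "before flush: $before", "after flush: $after";

undef $t;                         # drop the extra reference before untie
untie @a;
```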

Tie::File supports splice(). DB_File does not.
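
A brief sketch of splice() on a tied file, using a made-up three-record file. One record is replaced by two, and the file is read back afterward to check the result.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file for the demonstration.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "one\n", "two\n", "three\n";
close $fh or die "close: $!";

tie my @a, 'Tie::File', $filename
    or die "Cannot tie $filename: $!";

# Replace record 1 with two new records; splice returns what it removed.
my @removed = splice @a, 1, 1, 'two-a', 'two-b';
my $records = scalar @a;

untie @a;

# Read the finished file back to see the effect.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $contents = do { local $/; <$in> };
close $in;

print "removed: @removed\n", "records now: $records\n", $contents;
```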

DB_File is only installed by default if you already have the db library on your system. Tie::File is pure Perl and is installed by default no matter what; starting with Perl 5.7.3, you can be absolutely sure it will be everywhere. You will never have that surety with DB_File. If you don't have DB_File yet, installing it requires a C compiler. You can install Tie::File from CPAN in five minutes with no compiler.

DB_File is written in C, so if you aren't allowed to install modules, it is useless. Tie::File is written in Perl, so even if you aren't allowed to install modules, you can look into the source code, see how it works, and copy the subroutines or the ideas from the subroutines directly into your own Perl program. The original poster wanted a "Vanilla Perl" solution. Tie::File is a "Vanilla Perl" solution; DB_File isn't.

Hope this helps.