Date: Tue, 17 Jul 2001 00:58:37 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Array Sorting-Looping-Number Formatting
Message-Id: <3B53C5FD.C1B99216@earthlink.net>

Eric wrote:
> 
> Regarding the following data file parsing and writing to a new file
> script (partial code):
> 
> <snip>  #Routine stuff
> my $line;
> while ( defined( $line = <INFILE> ) )
> 
> {
>           next unless $line =~ /--.*STATUS AVAILABLE/;

Does this expression always identify the start of your records?

If so, then try code like the following:
foreach my $record ( do { local $/;
	split /(?=--.*STATUS AVAILABLE)/, <INFILE> } ) {

Each $record should be an entire record.

	my @lines = split /\n/, $record;
	my ($status) = unpack 'x42 A10', shift @lines;
	my %record = map /^\s*(.*?):(.*)/, @lines;
	my ($addr, $price) = # assuming it was " LOCATION:"
		unpack 'A25 x33 A11', $record{"LOCATION"};
	my ($bd,$sf) = # assuming it was " ZIP:"
		unpack 'x46 A3 x14 A4', $record{"ZIP"};

This eliminates any dependency on order, and uses the strings which
prefix the fields as keys.  The presence or absence of "NEW FIN" should
now have no effect whatsoever.  You'll have to change your unpack
templates, though.
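In case the templates look opaque: 'x42 A10' means "skip 42 bytes, then
take a 10-byte ASCII field" (A-fields get trailing whitespace stripped).
A minimal sketch, with a made-up line just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'x42 A10': skip 42 bytes, then grab a 10-byte space-trimmed field.
my $line = ("." x 42) . "AVAILABLE " . "rest of the line";
my ($status) = unpack 'x42 A10', $line;
print "$status\n";   # AVAILABLE
```

Count the columns in your real data file to get the right x-offsets and
A-widths.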

[snip]
>         $index = $price/$sf; # calculate the Index per sf value
>       writedatafile( $status, $addr, $price, $bd, $sf, $ba, $ag, $lot,
> $index );
> }
> ### SUBROUTINE
> sub writedatafile
> {
> my ($status, $addr, $price, $bd, $sf, $ba, $ag, $lot, $index ) = @_;
> open (FILE ,">>$datafile");               #open output file for append

Always, yes, always, check the return value of open.

Also, there's no need to reopen the file for append for each and every
record written; you should only need to open the file once.

> print FILE "$R,$status,\"$addr\",$price,$sf,$bd,$ba,$lot,$ag,$index\n";


I would do this as:
my ( $file, $counter );
END { $file && (close($file) or die "Couldn't close $datafile: $!\n") }
sub writetofile {
    $file or open ($file ,">>", $datafile)
        or die "Couldn't open $datafile for append: $!\n";
    $_[1] = qq{"$_[1]"};
    ++$counter;
    print $file join(",", $counter, @_[0..2,4,3,5,7,6,8]), "\n";
}



> #print to file
> $R++;                                               #increment record counter
> close (FILE); #close output file

Again, opening and closing the file for each record is inefficient.

> }


> ------------
> Question #1:
> 
> On some of the data file records, an unwanted Line #3 sometimes
> appears, which then shifts all the other lines down & messes up the
> rest of the parsing (the ZIP line becomes Line #4, etc.) Example:
> 
> Input Data File Line #3:  NEW FIN:blahblah #THIS LINE IS ONLY INCLUDED
> ON SOME RECORDS
> Input Data File Line #4: ZIP:blahblah #THIS IS THE ONLY DESIRED LINE
> BEFORE PROCEEDING
> 
> In trying to loop past the unwanted line, I tried:  next unless $line
> =~ /^ ZIP/; Yes...there is a leading space before ZIP... but
> everything is still whacked into die mode.  I axed out the die line
> and at least I know that doesn't work either.  Will something?

Answered above... use the prefixes as field names, and ignore the actual
order of the lines within each record.
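To make that concrete (with made-up field contents), the hash lookup
simply doesn't care whether the extra line is present:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two variants of the same record, one with the optional "NEW FIN" line:
my @with    = (" LOCATION:blah", " NEW FIN:blah", " ZIP:90210 etc");
my @without = (" LOCATION:blah", " ZIP:90210 etc");

for my $lines (\@with, \@without) {
	my %record = map /^\s*(.*?):(.*)/, @$lines;
	# "NEW FIN" just becomes another key we never ask for:
	print $record{"ZIP"}, "\n";   # "90210 etc" both times
}
```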

> Question #2:
> 
> $index = $price/$sf yields a number, but I need to have a non-decimal
> place number rounded up (or down) and to print on the same .csv record
> line.  I just can't figure out how to combine the printf function in
> the same line with the print FILE above, or is that just not possible
> or the right way to approach it?

Use printf to write the formatted string directly to the file, or
sprintf to create a formatted string and return it, so you can print it
to the file yourself.
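For example, to round the index to a whole number ("%.0f") and drop it
into the same CSV line — hypothetical values, of course:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($price, $sf) = (250_000, 1850);   # hypothetical values

# sprintf returns the formatted string, so it embeds in an ordinary
# print; "%.0f" rounds to the nearest whole number.
my $index = sprintf "%.0f", $price / $sf;
print "$price,$sf,$index\n";   # 250000,1850,135
```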

> Question #3:
> 
> I've read about the foreach & keys array functions, but am lost here
> out how to integrate things to get the parsed records into an
> appropriate array so that I can then sort and print them to a file by
> (1st Field = $status, Alphabetic, in Ascending order) and then (2nd
> Field = $index, in Descending/Reverse Numeric order).  Having a blank
> line print to separate the different $status entries is most
> desirable.  Is this possible or must I still do this part manually
> after importing the .csv file into Excel?

It's possible, but you have to store the intermediate results in an
array, not send them straight to the file.

So:

my @records;
foreach my $record ( do { local $/;
	split /(?=--.*STATUS AVAILABLE)/, <INFILE> } )
{
	my @lines = split /\n/, $record;
	my ($status) = unpack 'x42 A10', shift @lines;
	my %in_record = map /^\s*(.*?):(.*)/, @lines;
	my %out_record;
	$out_record{status} = $status; # the sort below needs this
	@out_record{qw(addr price)} =
		unpack 'A25 x33 A11', $in_record{"LOCATION"};
	@out_record{qw(bd sf)} =
		unpack 'x46 A3 x14 A4', $in_record{"ZIP"};
	...
	$out_record{index} = $out_record{price} / $out_record{sf};
	push @records, \%out_record;
}

@records = sort {
	$a->{status} cmp $b->{status} || #status, ascending alpha
	$b->{index} <=> $a->{index} #index, descending numeric.
} @records;

open(FILE, ">", $datafile) or die "Couldn't open $datafile: $!\n";
my @fields = qw(status addr price sf bd ba lot ag index);
my $prev_status = "";
for my $r (0 .. $#records) {
	my $record = $records[$r];
	# blank line between the different $status groups, as you wanted:
	print FILE "\n" if $prev_status ne "" and $record->{status} ne $prev_status;
	$prev_status = $record->{status};
	$$record{addr} = qq{"$$record{addr}"};
	$$record{index} = sprintf "%.0f", $$record{index}; # whole number
	print FILE join(",", $r, @$record{@fields}), "\n";
}
close FILE or die "Couldn't close $datafile: $!\n";

-- 
The longer a man is wrong, the surer he is that he's right.
