Date: Tue, 2 Oct 2001 18:18:59 +0100
From: "S Warhurst"
Subject: Efficient code?
Message-Id: <9pcsu3$k10@newton.cc.rl.ac.uk>
Sorry if you guys are getting fed up with helping me. Ignore me if you are ;)
1) If one wants to go through each line of a 100,000-line text file looking
for any of 50 different strings, is the quickest way to do it (in terms of
processor time) a chain of 50 IF... ELSIF... statements? (Some of these do
require regexp matching, e.g. "I like cornflakes for breakfast", where
cornflakes could be any breakfast cereal.)
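One data-driven alternative to a long IF/ELSIF chain is a table of precompiled patterns (qr//) tried in order, first match wins, so it behaves like the chain but is easier to extend to 50 entries. This is only a sketch: the two rules, their labels, and the helper names classify and scan_fh are hypothetical, and whether it beats hand-written IF/ELSIF on processor time is something to measure with the core Benchmark module on the real file.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry pairs a precompiled regex with a
# label. Order matters: the first matching rule wins, as with IF/ELSIF.
my @rules = (
    [ qr/^I like (\w+) for breakfast$/, 'cereal' ],
    [ qr/error/i,                         'error'  ],
    # ... the remaining patterns would go here
);

# Return (label, first capture) for the first rule that matches,
# or an empty list if no rule matches.
sub classify {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $label) = @$rule;
        return ($label, $1) if $line =~ $re;
    }
    return;
}

# Read a filehandle line by line and count how often each label matched.
sub scan_fh {
    my ($fh) = @_;
    my %hits;
    while (my $line = <$fh>) {
        chomp $line;
        my ($label) = classify($line);
        $hits{$label}++ if defined $label;
    }
    return \%hits;
}
```

If most lines match none of the patterns, joining them into one alternation and testing that first might let most lines be rejected with a single match; that is also worth benchmarking rather than assuming.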
-----------------------------
2) If one has an array of email addresses like:
@emails = ('john@here.com', 'tim@there.com', 'bill@anisp.com',
           'tim@there.com', 'john@here.com');
and wanted to make them unique, I would use the line:
@emails = do { my %h; grep { !$h{$_}++ } @emails };
Is there an easy way to modify this line so that it also gives a count of
each email address, resulting in an array like this:
@emails = (['bill@anisp.com', '1'],
['john@here.com', '2'],
['tim@there.com', '2']);
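One way to get the counts is to tally each address in a hash and then build the [address, count] pairs from the sorted keys, keeping the same do { my %h; ... } shape as the line above. A minimal sketch; note the counts come out as numbers (1, 2) rather than the quoted strings shown above, which makes no difference when they are printed or added up.

```perl
use strict;
use warnings;

my @emails = ('john@here.com', 'tim@there.com', 'bill@anisp.com',
              'tim@there.com', 'john@here.com');

# Count every address, then return [address, count] pairs
# sorted alphabetically by address
@emails = do {
    my %h;
    $h{$_}++ for @emails;
    map { [ $_, $h{$_} ] } sort keys %h;
};
```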
----------------------------
3) If I have an array like the following (there would actually be several
hundred rows):
@array = (['bath.ac.uk', '46'],
['blackpool.ac.uk', '22'],
['hull.ac.uk', '13'],
['sussex.ac.uk', '36'],
['hull.ac.uk', '31'],
['blackpool.ac.uk', '2']);
and I want to find the unique domains in column 1 and total the values in
column 2 so it looks like this:
@array = (['bath.ac.uk', '46'],
['blackpool.ac.uk', '24'],
['hull.ac.uk', '44'],
['sussex.ac.uk', '36']);
The way I have written it is:
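@array = sort { $a->[0] cmp $b->[0] } @array;   # cmp compares strings; <=> is numeric
my @domains;
my $c = 0;
for (my $r = 0; $r <= $#array; $r++)
{
    # Add this row's value to the current @domains row
    $domains[$c][0] = $array[$r][0];
    $domains[$c][1] += $array[$r][1];

    # Start a new @domains row when the next domain differs, or when this
    # is the last row (so $array[$r+1] is never read past the end)
    if ($r == $#array or $array[$r][0] ne $array[$r+1][0])
    {
        $c++;
    }
}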
So, first of all it sorts the array by domain name.
Then, for each row, it puts the domain name into the current row of a new
array (@domains) and adds the value to that row's total; $c keeps track of
which row of @domains is being filled.
It then compares the domain name in the current row with the one in the
next row.
If it is different, it increments $c so that the next domain starts a new
row in @domains.
If it is the same, it doesn't increment $c, so when it comes to the next
line it adds the value in column 2 to the existing total. It repeats this
until it's finished.
Is this the most efficient way to do this?
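A common alternative is to accumulate the totals in a hash keyed on domain: one pass over the rows, no sort of the full array, no comparison with the next row, and no special case for the last row. A sketch using the sample data above; only the unique domains are sorted, for output. For a few hundred rows either approach should be fast, so the main gain is probably simplicity rather than speed.

```perl
use strict;
use warnings;

my @array = (['bath.ac.uk', '46'],
             ['blackpool.ac.uk', '22'],
             ['hull.ac.uk', '13'],
             ['sussex.ac.uk', '36'],
             ['hull.ac.uk', '31'],
             ['blackpool.ac.uk', '2']);

# Sum column 2 per domain (column 1) in a single pass
my %total;
$total{ $_->[0] } += $_->[1] for @array;

# Rebuild [domain, total] rows, sorted by domain name
@array = map { [ $_, $total{$_} ] } sort keys %total;
```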
Thanks
---------¦
Bigus @ work
¦----------