Date: Tue, 2 Oct 2001 18:18:59 +0100
From: "S Warhurst"
Subject: Efficient code?
Message-Id: <9pcsu3$k10@newton.cc.rl.ac.uk>

Sorry if you guys are getting fed up of helping me... ignore me if you are ;)

1) If one wants to go through each line of a 100,000-line text file, looking
for any one of 50 different strings, is the quickest way to do it (in terms of
processor time) to use a chain of 50 if ... elsif ... statements? (Some of
these do require regexp matching, e.g. "I like cornflakes for breakfast",
where "cornflakes" could be any breakfast cereal.)

-----------------------------

2) If one has an array of email addresses like:

    @emails = ('john@here.com', 'tim@there.com', 'bill@anisp.com',
               'tim@there.com', 'john@here.com');

and wants to make them unique, I would use the line:

    @emails = do { my %h; grep { !$h{$_}++ } @emails };

Is there an easy way to modify this line so that it also gives a count for
each email address, resulting in an array like this:

    @emails = (['bill@anisp.com', '1'],
               ['john@here.com',  '2'],
               ['tim@there.com',  '2']);

----------------------------

3) If I have an array like the following (there would actually be several
hundred rows):

    @array = (['bath.ac.uk',      '46'],
              ['blackpool.ac.uk', '22'],
              ['hull.ac.uk',      '13'],
              ['sussex.ac.uk',    '36'],
              ['hull.ac.uk',      '31'],
              ['blackpool.ac.uk', '2']);

and I want to find the unique domains in column 1 and total the values in
column 2, so that it looks like this:

    @array = (['bath.ac.uk',      '46'],
              ['blackpool.ac.uk', '24'],
              ['hull.ac.uk',      '44'],
              ['sussex.ac.uk',    '36']);

The way I have written it is:

    # cmp, not <=>: domain names are strings, so sort them as strings
    @array = sort { $a->[0] cmp $b->[0] } @array;
    my @domains;
    my $c = 0;
    for my $r (0 .. $#array) {
        $domains[$c][0]  = $array[$r][0];
        $domains[$c][1] += $array[$r][1];
        # start a new row in @domains when the next domain differs (or at the end)
        $c++ if $r == $#array || $array[$r][0] ne $array[$r+1][0];
    }

So, first of all it sorts the array by domain name. Then it compares the
domain name in the current row with the one in the next row.
If it is different, it places the domain name and the associated value in a
new array (@domains) and increments $c, which keeps track of the position at
which to enter data in @domains. If it is the same, it doesn't increment $c,
so that on the next row it adds the value in column 2 to the existing one.
It repeats this until it's finished.

Is this the most efficient way to do this?

Thanks

---------¦ Bigus @ work ¦----------
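For question 1, one possible alternative to a hand-written 50-way if/elsif chain is a table of precompiled qr// patterns, sketched below. The patterns, the @rules table and the classify() sub are hypothetical placeholders. Rules are tried in order, so the behaviour matches an elsif chain (first match wins), and a single combined alternation is tested first so that lines matching none of the patterns are rejected with one regex test. Whether that pre-filter actually beats the plain chain depends on the data, so it is worth measuring both with cmpthese() from the core Benchmark module.

```perl
use strict;
use warnings;

# Hypothetical rules: each pairs a label with a precompiled pattern.
# Order matters, exactly as it would in an if/elsif chain.
my @rules = (
    [ cereal => qr/I like (\w+) for breakfast/ ],  # regex with a capture
    [ error  => qr/\[error\]/ ],                   # fixed string, brackets escaped
    [ login  => qr/user \S+ logged in/ ],
);

# One combined alternation used as a cheap pre-filter: in a large file
# most lines may match nothing, and this rejects them with a single test.
my $alt = join '|', map { $_->[1] } @rules;
my $any = qr/$alt/;

# Returns (label, captures...) for the first matching rule, or an empty
# list / undef when no rule matches.
sub classify {
    my ($line) = @_;
    return unless $line =~ $any;        # fast reject
    for my $rule (@rules) {             # first match wins, like elsif
        my ($label, $re) = @$rule;
        if (my @caps = $line =~ $re) {  # list of captures, or (1) if none
            return ($label, @caps);
        }
    }
    return;
}
```

In a real run the file would be read one line at a time, e.g. `while (my $line = <$fh>) { my ($label, @caps) = classify($line) or next; ... }`, where `$fh` is assumed to be an already-opened filehandle.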
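For question 2, a minimal sketch that keeps the shape of the original do { my %h; ... } line: instead of using the hash only to spot duplicates, let it count every address, then turn it into [address, count] pairs. Sorting the keys reproduces the order shown in the desired output. The counts come out as numbers (1, 2) rather than the strings '1', '2', which Perl treats interchangeably.

```perl
use strict;
use warnings;

my @emails = ('john@here.com', 'tim@there.com', 'bill@anisp.com',
              'tim@there.com', 'john@here.com');

@emails = do {
    my %h;
    $h{$_}++ for @emails;                   # count each address
    map { [ $_, $h{$_} ] } sort keys %h;    # one [address, count] row each
};
```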
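For question 3, a hash can do the grouping without the sort-and-compare pass: each row adds its value to a running total keyed by domain, which is a single pass over the data (a sort is O(n log n), hash updates are O(n) on average), and only the unique domains need sorting at the end. This sketch assumes the totals are wanted back in @array, sorted by domain name. With only a few hundred rows both approaches should be quick, so the main gain here is simpler code.

```perl
use strict;
use warnings;

my @array = (['bath.ac.uk',      '46'], ['blackpool.ac.uk', '22'],
             ['hull.ac.uk',      '13'], ['sussex.ac.uk',    '36'],
             ['hull.ac.uk',      '31'], ['blackpool.ac.uk', '2']);

# Running total per domain in one pass; no pre-sort of the rows needed.
my %total;
$total{ $_->[0] } += $_->[1] for @array;

# Sort only the unique domains when building the result rows.
@array = map { [ $_, $total{$_} ] } sort keys %total;
```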