Using a regular expression with nested for loops, using Perl

June 15, 2010

I have two arrays:

@file_list holds a list of files in a directory, and
@name_list holds some names.

For example, these arrays could contain

@file_list = ('bob_car', 'bob_house', 'bob_work', 'fred_car', 'fred_house', 'fred_work', ...);
@name_list = ('bob', 'fred', ...);

(the real data is not that simple).

My goal is to compare each file with every name and see if they match. A file matches a name if the file string starts with that name.

I could then use these matches to sort the files into new directories, based on their corresponding name.
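
A minimal sketch of that end goal, moving each file into a directory named after its matching name. It assumes the files sit directly in one directory and that the per-name subdirectories are created inside that same directory (the question does not say where they go); sort_into_dirs is a hypothetical helper, and the core modules File::Copy (move) and File::Path (make_path) are swapped in for illustration.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: move each plain file in $dir whose name starts with
# one of @$names into the subdirectory $dir/<name>.
sub sort_into_dirs {
    my ($dir, $names) = @_;

    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    # Skip dot-entries and anything that is not a plain file (e.g. subdirectories).
    my @files = grep {
        my $path = File::Spec->catfile($dir, $_);
        !/^\./ && -f $path;
    } readdir $dh;
    closedir $dh;

    # Longest names first, so a longer name wins over one of its prefixes.
    my @ordered = sort { length($b) <=> length($a) } @$names;

    FILE: for my $file (@files) {
        for my $name (@ordered) {
            next unless $file =~ /^\Q$name\E/;    # literal prefix match
            my $target = File::Spec->catdir($dir, $name);
            make_path($target) unless -d $target;
            move(File::Spec->catfile($dir, $file), File::Spec->catfile($target, $file))
                or die "cannot move $file to $target: $!";
            next FILE;                             # first (longest) match wins
        }
        warn "no match for $file\n";
    }
}
```

Files that match no name are left where they are and reported on STDERR.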

Here is my code:

for ( $i = 0; $i < scalar @file_list ; $i++ )
{
    for ( $j = 0; $j < @name_list ; $j++ )
    {
        if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ )
        {
            print "$file_list[ $i ] goes with $name_list[ $j ]\n";
        }
        else
        {
            print "no match\n";
        }
    }
}

However, I don't get any matches. I've tested the individual loops and they are working. Is there something off about the regex?

About how the arrays are made:

For @name_list, the file containing the names is organized in a seemingly random way, because of how it is used elsewhere. The names are spread over several different lines, with lots of blank lines in between and lots of blank entries within lines. Names can appear more than once.

I used the following code to make @name_list:

while (my $line = <$originalfile>)
{
    chomp $line;
    @current_line = split( "\t", $line );
    for ( $i = 0; $i < scalar @current_line ; $i ++ )
    {
        if ( $current_line[ $i ] =~ m/^\s*$/ )
        {
            # print "$current_line[$i] is blank\n";
        }
        else
        {
            push( @raw_name_list, $current_line[ $i ] );
        }
    } # end of for
} # while

# collect list without repeat instances of the same name
%unique = ();
foreach $name (@raw_name_list)
{
    $unique{$name} ++;
}
@name_list = keys %unique;

foreach $name ( @name_list )
{
    # print "$name\n";
    chomp $name;
    unless(mkdir $name, 0700)
    {
        die "unable to create directory called $name\n";
    }
}

The array @file_list is made using:

opendir(DIR, $ARGV[1]);
@file_list = grep ! /^\./, readdir DIR;
closedir(DIR);
# print @file_list;

@amon, here is how I tested the loops and the regex:

FILE: for my $file (@transposed_files) {
  print "$file\n";
  for my $name (@transposedunique) {
    print "I see $name\n";
    if ($file =~ /^\Q$name\E/) {
      print "$file goes with $name\n";
      next FILE;
    }
  }
  #print "no match for $file\n";
}

Oh, and I transposed the arrays so that they print to the outfile as separate rows.

Short version: you are building the name array wrong. Look at this line:

$unique{name} ++;

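Here the bareword name is auto-quoted, so you are only ever incrementing the hash entry whose key is the literal string "name". You probably wanted the $name variable.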
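
A small sketch of the difference, run on a hypothetical list of raw names; count_buggy and count_fixed are illustrative names only.

```perl
use strict;
use warnings;

# Buggy: the bareword key "name" is auto-quoted, so every
# iteration increments the same single hash entry.
sub count_buggy {
    my @names = @_;
    my %unique;
    foreach my $name (@names) {
        $unique{name}++;
    }
    return %unique;
}

# Fixed: the key is the value held in $name.
sub count_fixed {
    my @names = @_;
    my %unique;
    foreach my $name (@names) {
        $unique{$name}++;
    }
    return %unique;
}

my @raw   = ('bob', 'fred', 'bob');
my %buggy = count_buggy(@raw);
my %fixed = count_fixed(@raw);
print join(', ', sort keys %buggy), "\n";   # name
print join(', ', sort keys %fixed), "\n";   # bob, fred
```

With the bug, keys %unique yields the single word "name", so @name_list never contains a real name and no file can match.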

The Longer Version

On English, and Foreach-Loops

Your code is a bit unperlish and looks more like C than Perl. Perl is closer to English than you might think. From the original wording of your question:

take the first element of @file_list and compare it to each element in @name_list

You wrote this as

for (my $i = 0; $i < @file_list; $i++) {
  for (my $j = 0; $j < @name_list; $j++) {
    ...; # compare $file_list[$i] with $name_list[$j]
  }
}

I'd rather do

for my $file (@file_list) {
  for my $name (@name_list) {
    ...; # compare $file with $name
  }
}

and save myself the hassle of array subscripting.
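
A quick sketch, on hypothetical sample data, showing that the two loop styles visit exactly the same file/name pairs in the same order; only the subscripting disappears.

```perl
use strict;
use warnings;

my @file_list = ('bob_car', 'bob_house', 'fred_car');
my @name_list = ('bob', 'fred');

# C-style: walk both arrays by index.
my @by_index;
for (my $i = 0; $i < @file_list; $i++) {
    for (my $j = 0; $j < @name_list; $j++) {
        push @by_index, "$file_list[$i]:$name_list[$j]";
    }
}

# Perl-style: foreach hands us each element directly.
my @by_element;
for my $file (@file_list) {
    for my $name (@name_list) {
        push @by_element, "$file:$name";
    }
}

print "same pairs\n" if "@by_index" eq "@by_element";
```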

Building Correct Regexes

Your code contains the following test:

$file_list[ $i ] =~ m/^$name_list[ $j ]/

This will not do what you think if $name_list[$j] contains special characters like (, . or +. You can match the literal contents of a variable by enclosing it in \Q ... \E. This would make the code

$file =~ /^\Q$name\E/

(if used with my variant of the loop).
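
A minimal sketch of what \Q ... \E changes, using a hypothetical name 'j.r' that contains a regex metacharacter; starts_with_raw and starts_with_quoted are illustrative names only.

```perl
use strict;
use warnings;

# Interpolated as-is: metacharacters in $name keep their regex meaning.
sub starts_with_raw {
    my ($file, $name) = @_;
    return $file =~ /^$name/ ? 1 : 0;
}

# Quoted with \Q...\E: every character of $name is matched literally.
sub starts_with_quoted {
    my ($file, $name) = @_;
    return $file =~ /^\Q$name\E/ ? 1 : 0;
}

# The unquoted '.' also matches the 'a' in 'jar_car'.
print "raw:    ", starts_with_raw('jar_car', 'j.r'), "\n";     # 1
print "quoted: ", starts_with_quoted('jar_car', 'j.r'), "\n";  # 0
```

A name containing an unbalanced ( is worse: the unquoted version does not merely mismatch, the regex fails to compile and the program dies.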

You could also go the nifty route and compare the leading substring directly:

$name eq substr $file, 0, length($name)

This expresses the same condition.
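
A small sketch of the substr variant wrapped in a sub (starts_with_substr is a hypothetical name). Because it is a plain string comparison, no quoting is needed at all:

```perl
use strict;
use warnings;

sub starts_with_substr {
    my ($file, $name) = @_;
    # substr returns at most length($name) leading characters of $file
    # (fewer if $file is shorter), so a short $file simply fails the eq test.
    return $name eq substr($file, 0, length $name) ? 1 : 0;
}

for my $file ('bob_car', 'j.r_car', 'jar_car') {
    printf "%-8s starts with 'j.r'? %s\n",
        $file, starts_with_substr($file, 'j.r') ? 'yes' : 'no';
}
```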

On Loop Control

I will make two assumptions:

You are only interested in the first matching name for each file
You only want to print the no match message if no name was found

Perl allows us to break out of arbitrary loops, restart the current iteration, or go directly to the next iteration, without using flags as you would in other languages. All we have to do is label our loops, like LABEL: for (...).

So once we have a match, we can start our search for the next file. Also, we only want to print no match if we left the inner loop without jumping to the next file. This code does it:

FILE: for my $file (@file_list) {
  for my $name (@name_list) {
    if ($file =~ /^\Q$name\E/) {
      print "$file goes with $name\n";
      next FILE;
    }
  }
  print "no match for $file\n";
}
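
The same labeled-loop logic can be wrapped in a sub and run on sample data. A minimal sketch: assign_files is a hypothetical name, and collecting the results into a hash and an array instead of printing them is an assumption made so the outcome is easy to inspect.

```perl
use strict;
use warnings;

# Returns a hash ref mapping each file to its first matching name,
# and an array ref of the files that matched no name.
sub assign_files {
    my ($files, $names) = @_;
    my (%owner, @unmatched);
    FILE: for my $file (@$files) {
        for my $name (@$names) {
            if ($file =~ /^\Q$name\E/) {
                $owner{$file} = $name;
                next FILE;            # skip remaining names and the no-match step
            }
        }
        push @unmatched, $file;       # reached only if no name matched
    }
    return (\%owner, \@unmatched);
}

my ($owner, $unmatched) = assign_files(
    ['bob_car', 'fred_house', 'alice_work'],
    ['bob', 'fred'],
);
print "$_ goes with $owner->{$_}\n" for sort keys %$owner;
print "no match for $_\n" for @$unmatched;
```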

The Zen of Negation

In your file parsing code, you express the condition

if ($field =~ /^\s*$/) {
} else {
  # do this stuff only if the field does not consist of
  # zero or more whitespace characters
}

That description is far too complex. How about

if ($field =~ /\S/) {
  # do this stuff only if the field contains a non-whitespace character
}

It is the same condition, but simpler and more efficient.
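
A small sketch checking that the two conditions agree on a handful of hypothetical fields, including empty, blank and newline-terminated ones; has_content_old and has_content_new are illustrative names only.

```perl
use strict;
use warnings;

# Original test, negated: the field is NOT made up only of whitespace.
sub has_content_old { my ($field) = @_; return $field =~ /^\s*$/ ? 0 : 1 }

# Simplified test: the field contains at least one non-whitespace character.
sub has_content_new { my ($field) = @_; return $field =~ /\S/ ? 1 : 0 }

my @samples = ('', ' ', "\t", "\n", 'bob', ' fred ', "jim\n");
for my $field (@samples) {
    die "the two tests disagree"
        unless has_content_old($field) == has_content_new($field);
}
print "both tests agree on all samples\n";
```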

Simplify Your Parse

In short, your file parsing code can be condensed to

my %uniq;
while (<$originalfile>) {
  chomp;
  $uniq{$_} = undef for grep /\S/, split /\t/;
}
@name_list = sort { length($b) <=> length($a) } keys %uniq;

The split function takes a regex as its first argument, and splits $_ if no other string is specified. It returns a list of fields.

The grep function takes a condition and a list, and returns all elements of the list that match the condition. The current element is in $_, which regexes match against by default. For an explanation of the regex, see above.
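
A small sketch of what split and grep produce for one line, assuming a hypothetical tab-separated line with empty and blank entries:

```perl
use strict;
use warnings;

my $line = "bob\t\t \tfred\t\tbob\n";
chomp $line;
my @fields = split /\t/, $line;    # ('bob', '', ' ', 'fred', '', 'bob')
my @names  = grep /\S/, @fields;   # keep fields with a non-whitespace character
print "@names\n";                  # bob fred bob
```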

Note: this still allows fields to contain whitespace, even in leading position. To split on all whitespace, you can give split the special argument of a string containing a single space: split ' '. That would make the grep unnecessary.
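
A sketch contrasting the two approaches on a hypothetical line. Note the trade-off: split ' ' also breaks apart a name that itself contains a space, such as 'bob smith'.

```perl
use strict;
use warnings;

my $line = "  bob\t\tfred  \t bob smith\t";

# Tab split plus grep: blank fields are dropped, but surrounding spaces stay.
my @tab_fields = grep /\S/, split /\t/, $line;   # ('  bob', 'fred  ', ' bob smith')

# split ' ': splits on any run of whitespace and drops the leading empty field.
my @space_fields = split ' ', $line;             # ('bob', 'fred', 'bob', 'smith')

print scalar(@tab_fields), " vs ", scalar(@space_fields), " fields\n";   # 3 vs 4 fields
```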

The for loop can also be used as a statement modifier, i.e. as EXPR for LIST. The current element is in $_. We assign something to the $_ entry in our %uniq hash (which starts out as an empty hash). This could be a number, but undef works as well.
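
Putting the pieces together, a runnable sketch of the condensed parse. The names file is simulated with an in-memory filehandle (open on a scalar reference), and its contents are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical names file: tab-separated, with blank lines and blank entries.
my $data = "bob\t\tfred\n\n\t \t\nfred\tbob\tjim\n";
open my $originalfile, '<', \$data or die "cannot open in-memory file: $!";

my %uniq;
while (<$originalfile>) {
    chomp;
    # The list is built from the line first; then the statement-modifier
    # for aliases each kept field to $_ and uses it as a hash key.
    $uniq{$_} = undef for grep /\S/, split /\t/;
}
close $originalfile;

# Sorted alphabetically here only so the demo output is deterministic.
my @name_list = sort keys %uniq;
print "@name_list\n";   # bob fred jim
```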

The keys are returned in a seemingly random order. As multiple names could match a file, but we only want to select one match, we have to try the most specific name first. Therefore, I sort the names by length in descending order.
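
A sketch of why the order matters, with hypothetical names where 'bob' is a prefix of 'bobby'; first_match is a hypothetical helper that returns the first name a file starts with.

```perl
use strict;
use warnings;

my @names = ('bob', 'bobby', 'fred');

# Longest names first, so 'bobby' is tried before its prefix 'bob'.
my @by_length = sort { length($b) <=> length($a) } @names;

sub first_match {
    my ($file, @candidates) = @_;
    for my $name (@candidates) {
        return $name if $file =~ /^\Q$name\E/;
    }
    return;
}

print "unsorted: ", first_match('bobby_car', @names), "\n";       # unsorted: bob
print "by length: ", first_match('bobby_car', @by_length), "\n";  # by length: bobby
```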
