Using a regular expression with nested for loops, using Perl
I have two arrays: @file_list holds a list of the files in a directory, and @name_list holds names.
For example, these arrays could contain

    @file_list = ('bob_car', 'bob_house', 'bob_work', 'fred_car', 'fred_house', 'fred_work', ...);
    @name_list = ('bob', 'fred', ...);

(The real data is not that simple.)
My goal is to compare each file with every name and see if they match. They match if the file string starts with the name.
I could then use these matches to sort the files into new directories, based on their corresponding name.
Here is my code:
    for ( $i = 0; $i < scalar @file_list ; $i++ ) {
        for ( $j = 0; $j < @name_list ; $j++ ) {
            if ( $file_list[ $i ] =~ m/^$name_list[ $j ]/ ) {
                print "$file_list[ $i ] goes with $name_list[ $j ]\n";
            }
            else {
                print "no match\n";
            }
        }
    }
However, I don't get any matches. I've tested the individual loops and they are working. Or else, is there something off with the regex?
About how the arrays were made:
For @name_list, the file containing the names is organized in a seemingly random way, because of how it was used for something else. The names in that file are on several different lines, with lots of blank lines in between and lots of blank entries within lines. Names can appear more than once.
I used the following code to make @name_list:
    while (my $line = <$originalfile>) {
        chomp $line;
        @current_line = split( "\t", $line );
        for ( $i = 0; $i < scalar @current_line ; $i ++ ) {
            if ( $current_line[ $i ] =~ m/^\s*$/ ) {
                # print "$current_line[$i] is blank\n";
            }
            else {
                push( @raw_name_list, $current_line[ $i ] );
            }
        } # end of for
    } # while

    # collect list without repeat instances of same name
    %unique = ();
    foreach $name (@raw_name_list) {
        $unique{$name} ++;
    }
    @name_list = keys %unique;

    foreach $name ( @name_list ) {
        # print "$name\n";
        chomp $name;
        unless(mkdir $name, 0700) {
            die "Unable to create directory called $name\n";
        }
    }
The array @file_list was made using:

    opendir(DIR, $ARGV[1]);
    @file_list = grep ! /^\./, readdir DIR;
    closedir(DIR);
    # print @file_list;
@amon, here is what I did to test the loops and the regex:

    FILE: for $file (@transposed_files) {
        print "$file\n";
        for $name (@transposedunique) {
            print "i see $name\n";
            if ($file =~ /^\Q$name\E/) {
                print "$file goes with $name\n";
                next FILE;
            }
        }
        #print "no match for $file\n";
    }

Oh, and I transposed the arrays so that they would print to an outfile in separate rows.
Short version: You are building your name array wrong. Look at this line:

    $unique{name} ++;

You are just incrementing the name entry of the hash. You probably wanted the $name variable.
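As a minimal sketch of the difference (the sample names below are made up for illustration, not taken from the real input), a bareword key always refers to the same single entry, while a variable key creates one entry per distinct name:

```perl
use strict;
use warnings;

# Hypothetical sample data, only for illustration.
my @raw_name_list = ('bob', 'fred', 'bob');

my (%wrong, %right);
for my $name (@raw_name_list) {
    $wrong{name}++;     # bareword key: always the literal string "name"
    $right{$name}++;    # variable key: one entry per distinct name
}

my @wrong_keys = sort keys %wrong;
my @right_keys = sort keys %right;

print "wrong: @wrong_keys\n";   # prints "wrong: name"
print "right: @right_keys\n";   # prints "right: bob fred"
```

With the bareword key, @name_list would contain only the string "name", so none of the files could match.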
The Longer Version

On English, and Foreach-Loops

Your code is a bit unperlish and looks more like C than like Perl. Perl is much closer to English than you might think. From the original wording of your question:

    take the first element of @file_list and compare it with each element in @name_list

you wrote this as

    for (my $i = 0; $i < @file_list; $i++) {
        for (my $j = 0; $j < @name_list; $j++) {
            ...; # compare $file_list[$i] with $name_list[$j]
        }
    }

I'd rather do

    for my $file (@file_list) {
        for my $name (@name_list) {
            ...; # compare $file with $name
        }
    }

and save myself the hassle of array subscripting.
Building Correct Regexes

Your code contains the following test:

    $file_list[ $i ] =~ m/^$name_list[ $j ]/

This will not do what you think if $name_list[$j] contains special characters like (, ., or +. You can match the literal contents of a variable by enclosing it in \Q ... \E. This would make the code

    $file =~ /^\Q$name\E/

(if used with my variant of the loop).

You could also go the nifty route and compare the leading substring directly:

    $name eq substr $file, 0, length($name)

This expresses the same condition.
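To see why the quoting matters, here is a small sketch with an invented name that contains regex metacharacters. Both strings are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical values: the name contains "+" and ".", which are regex metacharacters.
my $name = 'a+b.c';
my $file = 'a+b.c_house';

# Unquoted: "a+" means "one or more a", so the literal "+" in $file breaks the match.
my $unquoted = ($file =~ /^$name/) ? 1 : 0;

# Quoted: \Q ... \E escapes the metacharacters, so the name is matched literally.
my $quoted = ($file =~ /^\Q$name\E/) ? 1 : 0;

# Plain string comparison of the leading substring, no regex involved.
my $by_substr = ($name eq substr($file, 0, length $name)) ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted substr=$by_substr\n";
# prints "unquoted=0 quoted=1 substr=1"
```

The quoted regex and the substr comparison agree, while the unquoted regex misses a file that literally starts with the name.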
On Loop Control

I will make two assumptions:
- You are only interested in the first matching name for any file
- You only want to print the no match message if no name was found

Perl allows us to break out of arbitrary loops, or restart the current iteration, or go directly to the next iteration, without using flags, as you would do in other languages. All we have to do is to label our loops like LABEL: for (...).

So once we have a match, we can start our search for the next file. Also, we only want to print no match if we left the inner loop without going to the next file. This code does it:

    FILE: for my $file (@file_list) {
        for my $name (@name_list) {
            if ($file =~ /^\Q$name\E/) {
                print "$file goes with $name\n";
                next FILE;
            }
        }
        print "no match for $file\n";
    }
The Zen of Negation

In your file parsing code, you express a condition

    if ($field =~ /^\s*$/) {
    } else {
        # do stuff only if the field does not consist only of
        # zero or more whitespace characters
    }

That description is far too complex. How about

    if ($field =~ /\S/) {
        # do stuff only if the field contains a non-whitespace character.
    }

The same condition, but simpler, and more efficient.
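A quick way to convince yourself that both tests agree is to run them side by side. This is only a sketch; the sample fields are invented:

```perl
use strict;
use warnings;

# Hypothetical sample fields: empty, whitespace only, and with content.
my @fields = ('', '   ', "\t", 'bob', ' fred ');

my $disagreements = 0;
for my $field (@fields) {
    my $not_blank = ($field !~ /^\s*$/) ? 1 : 0;  # negated "is blank" test
    my $has_text  = ($field =~ /\S/)    ? 1 : 0;  # direct "contains non-whitespace" test
    $disagreements++ if $not_blank != $has_text;
}

print "disagreements: $disagreements\n";   # prints "disagreements: 0"
```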
Simplify Your Parse

In short, your file parsing code can be condensed to

    my %uniq;
    while (<$originalfile>) {
        chomp;
        $uniq{$_} = undef for grep /\S/, split /\t/;
    }
    my @name_list = sort { length($b) <=> length($a) } keys %uniq;
The split function takes a regex as its first argument, and will split on $_ if no other string is specified. It returns a list of fields.

The grep function takes a condition and a list, and will return all elements of the list that match the condition. The current element is in $_, which regexes match by default. For an explanation of this regex, see above.

Note: This still allows the fields to contain whitespace, even in leading position. To split on all whitespace, you can give split the special argument of a string containing a single space: split ' '. This would make the grep unnecessary.
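The difference between the two ways of splitting can be sketched as follows. The input line is a made-up example with tabs, repeated blanks, and leading whitespace:

```perl
use strict;
use warnings;

# Hypothetical input line, only for illustration.
my $line = "  bob\t\tfred   \t  alice\t";

# Split on tabs and drop blank fields: surrounding spaces stay in the fields.
my @by_tab = grep /\S/, split /\t/, $line;

# Split on any run of whitespace, ignoring leading whitespace: no blank fields at all.
my @by_space = split ' ', $line;

print join('|', @by_tab), "\n";     # prints "  bob|fred   |  alice"
print join('|', @by_space), "\n";   # prints "bob|fred|alice"
```

Keep in mind that split ' ' would also break up a name that contains a space, so it only fits if the names themselves are free of whitespace.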
The for loop can also be used as a statement modifier, i.e. like EXPR for LIST. The current element is in $_. We assign something to the $_ entry in our %uniq hash (which is already initialized to the empty hash). This could be a number, but undef works as well.
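As a tiny standalone sketch of the statement-modifier form (the list of names is an assumption for illustration), duplicates collapse because each name is just a hash key:

```perl
use strict;
use warnings;

my %uniq;

# EXPR for LIST: the assignment runs once per element, with the element in $_.
$uniq{$_} = undef for ('bob', 'fred', 'bob');   # hypothetical names

my @names = sort keys %uniq;
print "@names\n";   # prints "bob fred"
```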
The keys are returned in a seemingly random order. But as multiple names could match a file, and we only want to select one match, we have to match the most specific name first. Therefore, I sort the names by their length in descending order.
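Why the order matters can be shown with a small sketch in which one name is a prefix of another. The names, the file, and the helper first_match are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical names where one is a prefix of the other.
my @names = ('bob', 'bobby');
my $file  = 'bobby_car';

# Hypothetical helper: returns the first candidate that is a literal prefix of $file.
sub first_match {
    my ($file, @candidates) = @_;
    for my $name (@candidates) {
        return $name if $file =~ /^\Q$name\E/;
    }
    return;
}

my @longest_first = sort { length($b) <=> length($a) } @names;

my $unsorted_hit = first_match($file, @names);          # shorter name is tried first
my $sorted_hit   = first_match($file, @longest_first);  # longer name is tried first

print "unsorted: $unsorted_hit, sorted: $sorted_hit\n";
# prints "unsorted: bob, sorted: bobby"
```

Without the sort, whichever name happens to come first wins, which with hash keys is effectively arbitrary.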