regex - How to read numbers from extracted lines (deleting repeated numbers)
I need to convert parts of a .txt file in this format (first matching "schday"):

    <schday>
    <name>school occup wd</name>
    <type>fraction</type>
    <hr index="0">0</hr>
    <hr index="1">0</hr>
    <hr index="2">0</hr>
    <hr index="3">0</hr>
    <hr index="4">0</hr>
    <hr index="5">0</hr>
    <hr index="6">0</hr>
    <hr index="7">0.05</hr>
    <hr index="8">0.75</hr>
    ....

to this (values come first, and “steps” need both ends defined):

    0.00, 0.00,
    0.00, 6.00, <- end of step
    0.05, 7.00,
    0.75, 8.00,
    ... etc

This is what I have so far:
    open (OUTFILE, ">c:/begperl/parts/all1.txt")|| die "can't open it";
    @files = glob ("*.txt");
    foreach (@files) {
        open (INFILE, $_) || die "can't open infile";
        my @lines = <INFILE>;
        my %answer;
        my $regex = '<schday';
        for my $idx (0..$#lines) {
            if ($lines[$idx] =~ /$regex/) {
                for my $ii (($idx + 3)..($idx + 26)){
                    {$answer{$ii} = ($lines[$ii]);}
                }
            }
            foreach my $key (sort keys %answer) { print OUTFILE "$answer{$key}\n" }
        }
        close (INFILE);
    }

So I have the lines I want. I need to extract the numbers, including decimal points, and delete consecutive hours with the same values.
Your document has an XML structure. You would be better off exploiting that by using a proper XML parser. XML::Twig allows you to isolate the parts of the XML document in which you are interested. In this case, you want the <hr> elements that occur within <schday> elements:

    my $parser = XML::Twig->new(
        twig_roots => { 'schday/hr' => \&do_print },
    );

This tells the parser to invoke the do_print sub for each <hr> within <schday>. do_print is called with two arguments: the parser instance you created and the element. You can then use $element->att('index') to access the value of the index attribute and $element->text to get the text of the element, then format and print them. Here is the script:
    #!/usr/bin/env perl

    use strict;
    use warnings;

    use XML::Twig;

    my $parser = XML::Twig->new(
        twig_roots => { 'schday/hr' => \&do_print },
    );

    $parser->parse(\*DATA);

    sub do_print {
        my $parser  = shift;
        my $element = shift;

        # Value first, then the hour index, to match the requested layout.
        printf "%.02f, %.02f,\n",
            $element->text,
            $element->att('index'),
        ;

        # Release the memory used by elements that were already handled.
        $parser->purge;
        return;
    }

    __DATA__
    <schday>
    <name>school occup wd</name>
    <type>fraction</type>
    <hr index="0">0</hr>
    <hr index="1">0</hr>
    <hr index="2">0</hr>
    <hr index="3">0</hr>
    <hr index="4">0</hr>
    <hr index="5">0</hr>
    <hr index="6">0</hr>
    <hr index="7">0.05</hr>
    <hr index="8">0.75</hr>
    </schday>

Output:
    0.00, 0.00,
    0.00, 1.00,
    0.00, 2.00,
    0.00, 3.00,
    0.00, 4.00,
    0.00, 5.00,
    0.00, 6.00,
    0.05, 7.00,
    0.75, 8.00,
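The output above still lists every hour, while the question also asks to drop consecutive hours that share a value and keep both ends of each step. Here is a minimal sketch of that post-processing step, assuming the values have already been collected as [index, value] pairs (for instance by pushing them onto an array in do_print instead of printing them). The name collapse_steps is a hypothetical helper, not part of XML::Twig.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [index, value] pairs in hour order and returns
# formatted lines in which each run of equal values is reduced to its first
# hour and, if the run is longer than one hour, its last hour.
sub collapse_steps {
    my @pairs = @_;
    my @lines;
    my $i = 0;
    while ($i < @pairs) {
        my $j = $i;
        # Extend the run while the next value is numerically equal.
        $j++ while $j < $#pairs && $pairs[$j + 1][1] == $pairs[$i][1];

        push @lines, sprintf('%.02f, %.02f,', $pairs[$i][1], $pairs[$i][0]);
        push @lines, sprintf('%.02f, %.02f,', $pairs[$j][1], $pairs[$j][0])
            if $j > $i;

        $i = $j + 1;
    }
    return @lines;
}

# The nine values shown in the question's sample.
my @values = (0, 0, 0, 0, 0, 0, 0, 0.05, 0.75);
my @pairs  = map { [ $_, $values[$_] ] } 0 .. $#values;

print "$_\n" for collapse_steps(@pairs);
```

For the sample data this prints the four lines from the question: the 0.00 step at hours 0 and 6, then 0.05 at hour 7 and 0.75 at hour 8. The comparison uses ==, which should be fine for values parsed from identical text but may need a tolerance if the values are computed.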
As for what needs to be fixed in your code … here are some points that I hope will help you write better Perl:

    open (OUTFILE, ">c:/begperl/parts/all1.txt")|| die "can't open it";

Don't use bareword filehandles such as OUTFILE. They are package variables, which means they are subject to action at a distance. Instead, declare a lexical variable in the smallest applicable scope, as in:

    my $filename = 'c:/begperl/parts/all1.txt';
    open my $outfile, '>', $filename
        or die "Failed to open '$filename': $!";

Do name the loop variable in for loops:

    for my $input_file (@files) {
        open my $input, '<', $input_file
            or die "Failed to open '$input_file': $!";
        # process $input here
    }

Don't slurp when line-by-line processing will do. That is, don't use @lines = <INFILE>; to read all of the lines of the file in one go.

Don't use magical constants such as the 3 and 26 in your inner loop. Instead, give them names. For example:

    use Const::Fast;
    const my $hr_begin => 3;
    const my $hr_end   => 26;
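The points above can be combined into one short sketch: a lexical filehandle, line-by-line reading, and named offsets. It is only an illustration under some assumptions. The helper name extract_hr_lines is hypothetical, the core constant pragma stands in for Const::Fast so the code runs without extra modules, and the input is an in-memory file with the layout from the question.

```perl
use strict;
use warnings;

# Offsets of the first and last <hr> line relative to the <schday> line,
# taken from the question's loop. The core `constant` pragma is used here
# in place of Const::Fast.
use constant {
    HR_BEGIN => 3,
    HR_END   => 26,
};

# Hypothetical helper: returns the <hr> lines of every <schday> block,
# reading the handle one line at a time instead of slurping it.
sub extract_hr_lines {
    my ($input) = @_;
    my @kept;
    my $offset;    # lines since the last <schday>; undef outside a block
    while (my $line = <$input>) {
        if ($line =~ /<schday/) {
            $offset = 0;
            next;
        }
        next unless defined $offset;
        $offset++;
        push @kept, $line if $offset >= HR_BEGIN;
        undef $offset if $offset == HR_END;
    }
    return @kept;
}

# Demo input: one <schday> block with 24 hourly values, all zero.
my $sample = join '',
    "<schday>\n",
    "<name>school occup wd</name>\n",
    "<type>fraction</type>\n",
    ( map { sprintf qq{<hr index="%d">0</hr>\n}, $_ } 0 .. 23 ),
    "</schday>\n";

open my $input, '<', \$sample
    or die "Failed to open in-memory file: $!";
print extract_hr_lines($input);
close $input;
```

This keeps only one line in memory at a time while scanning, but it still depends on the <hr> lines sitting at fixed offsets from the <schday> line.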
But that is still fragile. What if the number of lines of <hr> elements changes? After all, this is an XML document, and you might get the next batch with an element spread over several lines:

    <hr index="5">
    0.00
    </hr>

What do you do then?
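To make the fragility concrete, here is a small sketch that counts <hr> elements with a line-oriented regex. The pattern and the helper name count_hr_by_line are illustrative assumptions, not taken from the original code. The same element is found when it sits on one line and missed when it is split across three, which is the kind of difference an XML parser does not care about.

```perl
use strict;
use warnings;

# Illustrative pattern (an assumption): matches an <hr> element only when
# the whole element sits on a single line.
my $hr_re = qr{<hr\s+index="(\d+)">\s*([\d.]+)\s*</hr>};

# Hypothetical helper: counts the lines of $text that contain a complete
# <hr> element.
sub count_hr_by_line {
    my ($text) = @_;
    open my $fh, '<', \$text
        or die "Failed to open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $hr_re;
    }
    close $fh;
    return $count;
}

my $one_line   = qq{<hr index="5">0.00</hr>\n};
my $multi_line = qq{<hr index="5">\n0.00\n</hr>\n};

print count_hr_by_line($one_line),   "\n";    # prints 1
print count_hr_by_line($multi_line), "\n";    # prints 0
```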