regex - How to read numbers from extracted lines (deleting repeated numbers)


I need to convert parts of a .txt file in this format (starting from the first match of "schday"):

    <schday>
      <name>school occup wd</name>
      <type>fraction</type>
      <hr index="0">0</hr>
      <hr index="1">0</hr>
      <hr index="2">0</hr>
      <hr index="3">0</hr>
      <hr index="4">0</hr>
      <hr index="5">0</hr>
      <hr index="6">0</hr>
      <hr index="7">0.05</hr>
      <hr index="8">0.75</hr>
      ....

into this (the values come first, and the “steps” need both ends defined):

    0.00, 0.00,
    0.00, 6.00,    <- end of step
    0.05, 7.00,
    0.75, 8.00,
    ...

etc.

This is what I have so far:

    open (OUTFILE, ">C:/begperl/parts/all1.txt") || die "can't open it";
    my @files = glob("*.txt");
    foreach (@files) {
        open (INFILE, $_) || die "can't open infile";
        my @lines = <INFILE>;
        my %answer;
        my $regex = '<schday';
        for my $idx (0..$#lines) {
            if ($lines[$idx] =~ /$regex/) {
                # copy the 24 lines holding the <hr> elements
                for my $ii (($idx + 3)..($idx + 26)) {
                    $answer{$ii} = $lines[$ii];
                }
            }
        }
        # numeric sort so that line 10 comes after line 9
        foreach my $key (sort { $a <=> $b } keys %answer) { print OUTFILE "$answer{$key}\n" }
        close (INFILE);
    }
    close (OUTFILE);

So I have the lines I want. Now I need to extract the numbers, including the decimal points, and delete consecutive hours that have the same values.
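If you do want to stay with regular expressions for the extraction step, here is a minimal sketch in core Perl. The helper name `extract_hours` and the sample string are hypothetical, and the pattern assumes the elements look like the sample: a double-quoted numeric `index` attribute and a plain decimal number as content, possibly padded with whitespace. Because the match runs with `/g` over the whole string, it does not depend on how the elements are split across lines, but it is not a real XML parser (comments, CDATA, or reordered attributes would defeat it).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return a list of [hour, value] pairs, one per
# <hr index="N">V</hr> element found in $text. Assumes the sample's layout:
# a double-quoted numeric index and a plain decimal number as content.
sub extract_hours {
    my ($text) = @_;
    my @pairs;
    while ($text =~ m{<hr\s+index="(\d+)"\s*>\s*([-+]?\d*\.?\d+)\s*</hr>}g) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Hypothetical sample chunk, including a whitespace-padded value.
my $chunk = '<schday> <name>school occup wd</name> <type>fraction</type> '
          . '<hr index="6">0</hr> <hr index="7">0.05</hr> '
          . '<hr index="8">    0.75 </hr> </schday>';

# Value first, then hour, as in the requested output.
for my $pair (extract_hours($chunk)) {
    printf "%.2f, %.2f,\n", $pair->[1], $pair->[0];
}
```

For this chunk it prints `0.00, 6.00,`, `0.05, 7.00,` and `0.75, 8.00,` on separate lines.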

Your document has an XML structure. You are better off exploiting that by using a proper XML parser. XML::Twig allows you to isolate the parts of an XML document in which you are interested. In this case, you want the <hr> elements that occur within <schday> elements:

    my $parser = XML::Twig->new(
        twig_roots => { 'schday/hr' => \&do_print },
    );

This tells the parser to invoke the do_print sub for each <hr> within a <schday>. do_print is called with two arguments: the parser instance you created, and the element. You can use $element->att('index') to access the value of the index attribute and $element->text to get the text of the element, and then format and print them. Here is the script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use XML::Twig;

    my $parser = XML::Twig->new(
        twig_roots => { 'schday/hr' => \&do_print },
    );

    $parser->parse(\*DATA);

    sub do_print {
        my $parser  = shift;
        my $element = shift;

        # value first, then hour
        printf "%.02f, %.02f,\n",
            $element->text,
            $element->att('index'),
        ;
        # free the memory used by elements already handled
        $parser->purge;
        return;
    }

    __DATA__
    <schday>
      <name>school occup wd</name>
      <type>fraction</type>
      <hr index="0">0</hr>
      <hr index="1">0</hr>
      <hr index="2">0</hr>
      <hr index="3">0</hr>
      <hr index="4">0</hr>
      <hr index="5">0</hr>
      <hr index="6">0</hr>
      <hr index="7">0.05</hr>
      <hr index="8">0.75</hr>
    </schday>

Output:

    0.00, 0.00,
    0.00, 1.00,
    0.00, 2.00,
    0.00, 3.00,
    0.00, 4.00,
    0.00, 5.00,
    0.00, 6.00,
    0.05, 7.00,
    0.75, 8.00,
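The script above prints every hour, but the question also asks to drop consecutive hours with the same value while keeping both ends of each step. A minimal sketch of that last step in plain Perl, assuming the (hour, value) pairs arrive sorted by hour as in the sample: keep an hour only if its value differs from the previous hour's or the next hour's. The `collapse_steps` helper is hypothetical; in the XML::Twig script you could push each pair onto an array in `do_print` and call it after `parse` returns.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given [hour, value] pairs sorted by hour, keep only
# the hours at which a step starts or ends. An hour is dropped when its
# value equals both its predecessor's and its successor's value.
sub collapse_steps {
    my @pairs = @_;
    my @kept;
    for my $i (0 .. $#pairs) {
        my $value  = $pairs[$i][1];
        my $starts = $i == 0       || $value != $pairs[$i - 1][1];
        my $ends   = $i == $#pairs || $value != $pairs[$i + 1][1];
        push @kept, $pairs[$i] if $starts || $ends;
    }
    return @kept;
}

# Hours 0-8 from the sample <schday> element.
my @values = (0, 0, 0, 0, 0, 0, 0, 0.05, 0.75);
my @pairs  = map { [ $_, $values[$_] ] } 0 .. $#values;

# Value first, then hour, matching the requested output format.
printf "%.2f, %.2f,\n", $_->[1], $_->[0] for collapse_steps(@pairs);
```

With the sample hours 0 to 8 this prints `0.00, 0.00,`, `0.00, 6.00,`, `0.05, 7.00,` and `0.75, 8.00,`, the shape requested in the question. Comparing with `!=` treats `0.05` and `0.050` as the same value.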

As for what needs fixing in your code, here are some points that I hope will help you write better Perl:

    open (OUTFILE, ">C:/begperl/parts/all1.txt")|| die "can't open it";
  • Don't use bareword filehandles such as OUTFILE. They are package variables, which means they are subject to action at a distance. Instead, declare a lexical variable in the smallest applicable scope, as in:

      my $filename = 'C:/begperl/parts/all1.txt';
      open my $outfile, '>', $filename
          or die "Failed to open '$filename': $!";
  • Do name your loop variable in for loops:

      for my $input_file (@files) {
          open my $input, '<', $input_file
              or die "Failed to open '$input_file': $!";
          # ... read from $input here ...
      }
  • Don't slurp when line-by-line processing will do. That is, don't use @lines = <INFILE>; to read all of the lines of a file in one go.

  • Don't use magic constants such as the 3 and 26 in your loop. Instead, give them names. For example:

      use Const::Fast;
      const my $hr_begin => 3;
      const my $hr_end   => 26;

But that is still fragile. What if the number of <hr> lines changes? After all, this is an XML document, and the next batch you get might contain

    <hr index="5">    0.00 </hr>

What do you do then?
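As an illustration of the points about lexical filehandles and line-by-line reading, here is a minimal core-Perl sketch that tracks whether it is inside a `<schday>` element instead of counting lines, and tolerates the padded `<hr index="5">    0.00 </hr>` form. It still assumes one `<hr>` element per line, so it is a stopgap rather than a replacement for a real XML parser. The `process_handle` name and the in-memory input are hypothetical, used to keep the example self-contained; in real use you would pass handles opened on your files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read $in line by line and write "value, hour," for
# every <hr> element between <schday> and </schday>. Assumes each <hr>
# element sits on a single line; a real XML parser does not need that.
sub process_handle {
    my ($in, $out) = @_;
    my $in_schday = 0;
    while (my $line = <$in>) {
        $in_schday = 1 if $line =~ m{<schday>};
        if ($in_schday
            and $line =~ m{<hr\s+index="(\d+)"\s*>\s*([-+]?\d*\.?\d+)\s*</hr>})
        {
            printf {$out} "%.2f, %.2f,\n", $2, $1;
        }
        $in_schday = 0 if $line =~ m{</schday>};
    }
    return;
}

# In-memory filehandles stand in for the real input and output files.
my $input = <<'XML';
<schday>
  <name>school occup wd</name>
  <type>fraction</type>
  <hr index="4">0</hr>
  <hr index="5">    0.00 </hr>
  <hr index="6">0</hr>
</schday>
XML

my $result = '';
open my $in,  '<', \$input  or die "Failed to open input: $!";
open my $out, '>', \$result or die "Failed to open output: $!";
process_handle($in, $out);
close $out or die "Failed to close output: $!";
close $in;
print $result;
```

Because nothing depends on line offsets, extra or missing `<hr>` lines do not shift the results, which is the fragility that the named constants alone cannot fix.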

