Perl - Leave only the Nth text between two marker strings
I am working on Mac OS and looking for an elegant solution to the problem below. Since it is purely text-related, I thought Perl would be the best choice?
- I have a data file on disk, e.g. data.html (it does not matter that it is HTML)
- It contains Chinese characters (so I believe the file is UTF-8 encoded)
Its structure is like this:
some top text text top styles text <h1>topic 1 text</h1> text applicable topic 1 formatting... <h1>topic 2 title</h1> text applicable topic 2...
I want to write one file per topic, each containing the top text and styles. The input is data.html; the output is topic1.html, topic2.html, ...
Assuming the file is simple and doesn't have any other h1 tags, this should work:
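    use strict;
    use warnings;
    use open qw(:std :encoding(utf8));   # read and write files as UTF-8

    # Slurp the whole input file into one string.
    open my $input, '<', 'data.html' or die "Cannot open data.html: $!";
    my $content = join '', <$input>;
    close $input;

    # Splitting on <h1> and </h1> yields: top text, title 1, body 1, title 2, body 2, ...
    my @parts = split /<\/?h1>/, $content;
    my $top_text_and_styles = shift @parts;

    my $count = 0;
    while (my ($topic, $body) = splice @parts, 0, 2) {
        $body //= '';   # the last topic may have no text after its heading
        # split removed the <h1> tags, so put them back around the title.
        my $topic_content = join '', $top_text_and_styles, "<h1>$topic</h1>", $body;
        $count += 1;
        my $output_name = "topic${count}.html";
        open my $output, '>', $output_name or die "Cannot open $output_name: $!";
        print $output $topic_content;
        close $output;
    }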
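Since `split /<\/?h1>/` consumes the tags it matches, one alternative is to split on a zero-width lookahead so that each section keeps its own heading. The following is only a sketch: `split_topics` is a hypothetical helper name, the sample string is made up, and it assumes there is always some text before the first `<h1>`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one complete document per topic,
# each made of the shared top text plus one <h1> section.
sub split_topics {
    my ($content) = @_;
    # (?=<h1>) matches the empty string just before each <h1>,
    # so the tag stays attached to the section that follows it.
    # Assumption: the content does not start with <h1>; otherwise the
    # first section would be mistaken for the top text.
    my ($top, @sections) = split /(?=<h1>)/, $content;
    return map { $top . $_ } @sections;
}

# Made-up sample input, not taken from the real data.html.
my @docs = split_topics('TOP <h1>One</h1> a <h1>Two</h1> b');

# Pick only the Nth topic (1-based).
my $n = 2;
print $docs[$n - 1], "\n";   # prints "TOP <h1>Two</h1> b"
```

Writing each element of `@docs` to `topic1.html`, `topic2.html`, ... would then work the same way as in the loop above.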