Project: Hesla Jednoty bratrské/2010/los10odt-make

Scripts for extracting the text of the Hesla from ODT (for the year 2010)


# Converts the file Losung.odt to plain text.
# First, we use the OpenOffice word processor to save
# the original MS Word file Losung.doc as an ODT file.
# An ODT file is a zip archive containing several files;
# we use only the file 'content.xml'.

yy=10 #year

echo Unzipping the ODT file...
unzip ../w01-Losungen/Losungen2010.odt content.xml

echo Inserting new-lines before every XML-tag...
perl -pe "s/</\n</g" content.xml > los${yy}-01.xml

echo Inserting style-names at the beginning of every line...
perl -w los${yy}-01.xml > los${yy}-02.xml

echo Stripping all xml tags...

The los10odt-make script above calls:

#! /usr/bin/perl -w
# perl -w cont_nl_sty.xml > los07-10.txt

# strip down all tags

while (<>) {
    next if /^$/;

# modifies the Losung 2010 ODT XML by adding the style names in curly braces

while (<>) {
    if (/text:style-name="(.*)">/) {
        # re-emit the line with {style-name} inserted right after the match
        print "$`";
        print "$&";
        print "{$1}";
        print "$'";
    } else {
        print;
    }
}