Thursday, 20 December 2012

Perl encoding problem

SW asked me to extend a Perl script that originally processed only ISO-8859-2-encoded text (TeX) files, so that it also handles UTF-8 and CP1250 (the single-byte MS Windows encoding for Central Europe).

I did it as follows (not sure if it is correct):


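use utf8;          # the script source itself is UTF-8 now (see below)
use Getopt::Long;
my $coding = 'utf8'; my $showhelp = '';
GetOptions( "coding=s" => \$coding, "help|?" => \$showhelp ) ;
if ( $showhelp ) { print "*** $0 [-coding=[cp1250|iso88592|utf-8]] file1 file2...\n" ;
exit 1; }

# 'use open' is a compile-time, lexically scoped pragma: it cannot be picked at
# run time inside if/elsif, and each copy only affects code inside its own braces.
# So only normalise the encoding name here and pass it as an explicit I/O layer later.
if ( $coding =~ /cp-?1250/i ) { $coding = 'cp1250'; }
elsif ( $coding =~ /iso-?8859-?2/i ) { $coding = 'iso-8859-2'; }
elsif ( $coding =~ /utf-?8/i ) { $coding = 'UTF-8'; }
else { die "*** Unknown coding: $coding\n"; }   # die already exits, no 'exit 1' needed

print STDERR "*** Coding: $coding\n";
# for every input file: open( my $fh, "<:encoding($coding)", $file ) or die "*** $file: $!\n";
## rest of the script omitted ....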
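A minimal sketch of the per-file part, assuming each input file is read and written back in the same encoding chosen with -coding. The helper names read_lines and write_lines are hypothetical, not part of the original script; the point is only that the encoding is given as an explicit layer in each open() call instead of through 'use open'.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file through an explicit encoding layer and
# return its lines as decoded Perl character strings (line endings removed).
sub read_lines {
    my ( $file, $coding ) = @_;
    open( my $fh, "<:encoding($coding)", $file )
        or die "*** Cannot open $file: $!\n";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Hypothetical helper: write character strings back, encoded with $coding.
sub write_lines {
    my ( $file, $coding, @lines ) = @_;
    open( my $fh, ">:encoding($coding)", $file )
        or die "*** Cannot write $file: $!\n";
    print {$fh} "$_\n" for @lines;
    close $fh or die "*** Cannot close $file: $!\n";
}

# Possible use in the main script (assumed loop over the file arguments):
# for my $file (@ARGV) { my @lines = read_lines( $file, $coding ); ... }
```

Because the layer decodes on input, regular expressions in the script see characters such as "ą" as one character, whichever of the three encodings the file was in.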

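I also re-encoded the script itself from the original ISO-8859-2 to UTF-8 with iconv, so all its string literals are now UTF-8 encoded; Perl treats them as characters rather than raw bytes only when the script says use utf8.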
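A small illustration of why use utf8 matters after the iconv step, using the core Encode module. It assumes the Polish letter "ą" as the example; its raw UTF-8 bytes are written as "\xC4\x85" so the snippet itself stays ASCII. Decoding those bytes is what use utf8 effectively does for a literal, and encoding the resulting character shows that the same letter has different bytes in each of the three encodings the script now accepts.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The literal "ą" in a UTF-8 source file WITHOUT 'use utf8':
# Perl sees two raw bytes, so length() is 2.
my $raw_literal = "\xC4\x85";

# What 'use utf8' effectively gives for that literal: one character, U+0105.
my $char = decode( 'UTF-8', $raw_literal );

# Format a byte string as space-separated hex, e.g. "C4 85".
sub hex_bytes { return join ' ', map { sprintf( '%02X', ord($_) ) } split //, $_[0]; }

printf "length without/with use utf8: %d/%d\n", length($raw_literal), length($char);
for my $enc ( 'UTF-8', 'iso-8859-2', 'cp1250' ) {
    printf "%-10s %s\n", $enc, hex_bytes( encode( $enc, $char ) );
}
# Expected output:
# length without/with use utf8: 2/1
# UTF-8      C4 85
# iso-8859-2 B1
# cp1250     B9
```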
