Using Spreadsheet::ParseExcel

In my own experience as a directory administrator only about 25% of my time is spent managing the servers and software that support the environment.

The lion’s share of the effort in my job is managing the import and export of data through the directory environment, in most cases working with groups outside the IT department across the globe.

Recently we had some weird results when a csv file with employee data from our offices in Brazil was imported into the directory.

The short version of the story is that in several attributes (surname, locality, state), some of the characters used didn’t quite match up with a known character set. Our assumption is that the data was originally created in Excel and then dumped to CSV format, probably using a localized copy of the program and the Portugese (Brazil) keyboard template.

Among other things this got me going in a direction I’d avoided for a long time — learning how to parse Excel spreadsheets with Perl.

My problem was that the doc for Spreadsheet::ParseExcel was opaque to me, as are most man pages on UNIX systems. Maybe it’s just that I’m dense, or don’t have enough patience. Anyway, after much experimentation, I finally “cracked the code” and was able to come up with the following little script to convert an Excel spreadsheet into a delimited text file.

#!/usr/bin/perl
# parse_excel.pl Read an Excel spreadsheet and print its contents to
# a bar delimited file
# Created 10/19/04 by P Lembo
	
use strict;
use Spreadsheet::ParseExcel;
	
my $HOME = "/u01/home/ldapcon";
my $xlsFile = "$HOME/data/admin/BigcorpOIDRegistry.xls";
my $txtFile = "$HOME/data/admin/oid-registry.txt";
	
my $wkBook =  Spreadsheet::ParseExcel::Workbook->Parse($xlsFile);
	
open FH, ">$txtFile" or die $!;
	
my $nuSheets = $wkBook->{SheetCount};
print "There are $nuSheets sheets in this workbookn";
	
# Iterate through worksheets
for ( my $iSheet=0; $iSheet {SheetCount}; $iSheet++ ) {
    my $wkSheet = $wkBook->Worksheet($iSheet);
    my $wksName = $wkSheet->{Name};
	
    print “Name of worksheet is $wksNamen”;
	
    # Now iterate through a worksheet by row
    for (my $iRow=0; $iRow {MaxRow}; $iRow++ ) {
	my $colNu = $wkSheet->{MaxCol};
	
    # Loop through the columns in the row
	for (my $iCol=0; $iCol {MaxCol}; $iCol++ ) {
	
	# Pull out the cell value
	my $wksCell = $wkSheet->{Cells}[$iRow][$iCol];
	
	# Print each value
  	print FH $wksCell->Value, if ($wksCell);
	
	# Print a delimiter after each value except the last in the row
	print FH “|”, if ($iCol < ($colNu - 1));
	
        }
	
     # Print a newline after each row
     print FH "n";
	
   }
}
	
print FH "n":
	
close FH;
	
__END__;