Thursday, September 29, 2005

Creating an Excel Reader utility class

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/creating_an_excel_reader_utility_class.htm]

Yesterday I created a Cell data structure to help read and store a single cell from Excel. Now let's use that to build an ExcelReader utility class.

Let's first think about our goals:

  • The Reader should be able to get specific values from Excel given the path to the workbook, and worksheet index.
  • In Excel, cells can have formulas that produce results. For example, the cell formula may be "= 1 + 1", but its result - what you see if you print the sheet - would be '2'. We want to be able to get both the value (formula) and text (displayed to user).
  • Reading the sheet will be slow, so we want a way to bundle the retrieval of ranges into a single call that gets a 2D array of object values. For example, given the upperLeft cell A1, and the lowerRight cell C2, return a 2x3 object array, for a total of six values. Such a feature would let us quickly read large portions of the sheet.

Disclaimer: I have only used this for personal pet projects. I have not performance tested it for production.

Given that, here's the code for our ExcelReader:

using System;
using Microsoft.Office.Core;
using System.Reflection;

namespace ExcelUtilities
{
    ///


    /// Summary description for ExcelReader.
    ///

    public class ExcelReader : IDisposable
    {
        public ExcelReader(string strWorkBookPath)
        {
            _app = new Excel.ApplicationClass();
            _book = null;
            _sheet = null;

            //speed up performance
            _app.Visible = false;
            _app.ScreenUpdating = false;
            _app.DisplayAlerts = false;

      if (!System.IO.Path.IsPathRooted(strWorkBookPath))
      {
        //root off current directory
        string strRoot = Assembly.GetExecutingAssembly().Location;
        strRoot = System.IO.Path.GetDirectoryName(strRoot);
        strWorkBookPath = strRoot + @"\" + strWorkBookPath;
      }

      if (!System.IO.File.Exists(strWorkBookPath))
        throw new System.ArgumentException("
The file path for the Excel book did not exist: '" + strWorkBookPath + "'.");

            _book = _app.Workbooks.Open(strWorkBookPath,
                Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value);
        }

        private Excel.ApplicationClass _app;
        private Excel.Workbook _book;   
        private Excel.Worksheet _sheet;
        private Excel.Range _range;

        private int _intCurrentSheet = 1;

        public int CurrentSheet
        {
            get
            {
                return _intCurrentSheet;
            }
            set
            {
                if (_intCurrentSheet < 1)
                    throw new ArgumentOutOfRangeException("
Sheet must be a positive number.");
                _intCurrentSheet = value;
            }
        }

        ///

        ///        Where cell is of the form "
A1"
        ///

        ///
        ///
        public object GetCell(Cell c)
        {
            return GetRange(c,c)[0,0];
        }

    public string GetCellText(Cell c)
    {
      _sheet = (Excel.Worksheet)_book.Worksheets[this.CurrentSheet];

      //get values
      _range = _sheet.get_Range(c.A1Reference,c.A1Reference);

      object o =  _range.Text;
     
      if (o == null)
        return String.Empty;
      else
        return o.ToString().Trim();
    }

        public object[,] GetRange(Cell cUpperLeft, Cell cLowerRight)
        {
            _sheet = (Excel.Worksheet)_book.Worksheets[this.CurrentSheet];

            //get values
            _range = _sheet.get_Range(cUpperLeft.A1Reference,cLowerRight.A1Reference);

            object[,] aaobj = null;
            if (_range.Count == 1)
            {
                aaobj = new object[1,1];
                aaobj[0,0] = _range.Value2;
            }
            else
                aaobj = (object[,])_range.Value2;

            return aaobj;
        }

        #region IDisposable Members

        public void Dispose()
        {
            //cleanup
            _range = null;
            _sheet = null;

            if (_book != null)
                _book.Close(false, Missing.Value, Missing.Value);
            _book = null;
            if (_app != null)
                _app.Quit();
            _app = null;
        }

        #endregion
    }
}

The constructor takes the necessary parameters to initialize the "connection" to Excel. Instance members then use this info to retrieve data.

Note that due to the heavy resources associated to read Excel, we want to implement IDisposable, such that callers can run it in a "using" context like so:

        [Test] public void GetCell_Empty()
        {
            using (ExcelReader er = new ExcelReader(_strPath))
            {
                er.CurrentSheet = 2;
                object o = er.GetCell(new Cell("X",7));
                Assert.IsNull(o);
            }
        }

We use our Cell object from the previous post to ensure that we're only requesting valid cells. If a user directly accessed the sheet, requesting cell '-1a$', it would be an error. Our Cell object can't even be created with these bad inputs, and hence relieves the ExcelReader from this validation burden. It's also quicker for the program to catch the bad input when trying to instantiate an in-memory Cell class than requesting an erroneous value from the Excel workbook.

One big question: How to test this? You could create a real Excel Workbook, include it as an embedded resource, and then have your tests hit that workbook. This would let you test all the boundary conditions, like:

  • Making sure that our methods distinguish between value (like the formula '=1 + 1') and text (its result, like '2')
  • The GetCell methods handle ranges of input
  • The ExcelReader handles both absolute and relative paths.

This series has provided us several tips to use .Net to enhance Excel functionality. We can call a .Net console app from Excel, passing in whatever command-line values we need, and then have that .Net console app read the Excel book to do complex functionality, such as creating an Html Report, sending data to the database, transforming the data to an Xml file, sending the data over an Xml Web Service, or anything else the .Net Framework lets us do.

No comments:

Post a Comment