Thursday, September 29, 2005

Creating an Excel Reader utility class

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/creating_an_excel_reader_utility_class.htm]

Yesterday I created a Cell data structure to help read and store a single cell from Excel. Now let's use that to build an ExcelReader utility class.

Let's first think about our goals:

  • The Reader should be able to get specific values from Excel given the path to the workbook, and worksheet index.
  • In Excel, cells can have formulas that produce results. For example, the cell formula may be "= 1 + 1", but its result - what you see if you print the sheet - would be '2'. We want to be able to get both the value (formula) and text (displayed to user).
  • Reading the sheet will be slow, so we want a way to bundle the retrieval of ranges into a single call that gets a 2D array of object values. For example, given the upperLeft cell A1, and the lowerRight cell C2, return a 2x3 object array, for a total of six values. Such a feature would let us quickly read large portions of the sheet.

Disclaimer: I have only used this for personal pet projects. I have not performance tested it for production.

Given that, here's the code for our ExcelReader:

using System;
using Microsoft.Office.Core;
using System.Reflection;

namespace ExcelUtilities
{
    ///


    /// Summary description for ExcelReader.
    ///

    public class ExcelReader : IDisposable
    {
        public ExcelReader(string strWorkBookPath)
        {
            _app = new Excel.ApplicationClass();
            _book = null;
            _sheet = null;

            //speed up performance
            _app.Visible = false;
            _app.ScreenUpdating = false;
            _app.DisplayAlerts = false;

      if (!System.IO.Path.IsPathRooted(strWorkBookPath))
      {
        //root off current directory
        string strRoot = Assembly.GetExecutingAssembly().Location;
        strRoot = System.IO.Path.GetDirectoryName(strRoot);
        strWorkBookPath = strRoot + @"\" + strWorkBookPath;
      }

      if (!System.IO.File.Exists(strWorkBookPath))
        throw new System.ArgumentException("
The file path for the Excel book did not exist: '" + strWorkBookPath + "'.");

            _book = _app.Workbooks.Open(strWorkBookPath,
                Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value, Missing.Value,
                Missing.Value, Missing.Value, Missing.Value);
        }

        private Excel.ApplicationClass _app;
        private Excel.Workbook _book;   
        private Excel.Worksheet _sheet;
        private Excel.Range _range;

        private int _intCurrentSheet = 1;

        public int CurrentSheet
        {
            get
            {
                return _intCurrentSheet;
            }
            set
            {
                if (_intCurrentSheet < 1)
                    throw new ArgumentOutOfRangeException("
Sheet must be a positive number.");
                _intCurrentSheet = value;
            }
        }

        ///

        ///        Where cell is of the form "
A1"
        ///

        ///
        ///
        public object GetCell(Cell c)
        {
            return GetRange(c,c)[0,0];
        }

    public string GetCellText(Cell c)
    {
      _sheet = (Excel.Worksheet)_book.Worksheets[this.CurrentSheet];

      //get values
      _range = _sheet.get_Range(c.A1Reference,c.A1Reference);

      object o =  _range.Text;
     
      if (o == null)
        return String.Empty;
      else
        return o.ToString().Trim();
    }

        public object[,] GetRange(Cell cUpperLeft, Cell cLowerRight)
        {
            _sheet = (Excel.Worksheet)_book.Worksheets[this.CurrentSheet];

            //get values
            _range = _sheet.get_Range(cUpperLeft.A1Reference,cLowerRight.A1Reference);

            object[,] aaobj = null;
            if (_range.Count == 1)
            {
                aaobj = new object[1,1];
                aaobj[0,0] = _range.Value2;
            }
            else
                aaobj = (object[,])_range.Value2;

            return aaobj;
        }

        #region IDisposable Members

        public void Dispose()
        {
            //cleanup
            _range = null;
            _sheet = null;

            if (_book != null)
                _book.Close(false, Missing.Value, Missing.Value);
            _book = null;
            if (_app != null)
                _app.Quit();
            _app = null;
        }

        #endregion
    }
}

The constructor takes the necessary parameters to initialize the "connection" to Excel. Instance members then use this info to retrieve data.

Note that due to the heavy resources associated to read Excel, we want to implement IDisposable, such that callers can run it in a "using" context like so:

        [Test] public void GetCell_Empty()
        {
            using (ExcelReader er = new ExcelReader(_strPath))
            {
                er.CurrentSheet = 2;
                object o = er.GetCell(new Cell("X",7));
                Assert.IsNull(o);
            }
        }

We use our Cell object from the previous post to ensure that we're only requesting valid cells. If a user directly accessed the sheet, requesting cell '-1a$', it would be an error. Our Cell object can't even be created with these bad inputs, and hence relieves the ExcelReader from this validation burden. It's also quicker for the program to catch the bad input when trying to instantiate an in-memory Cell class than requesting an erroneous value from the Excel workbook.

One big question: How to test this? You could create a real Excel Workbook, include it as an embedded resource, and then have your tests hit that workbook. This would let you test all the boundary conditions, like:

  • Making sure that our methods distinguish between value (like the formula '=1 + 1') and text (its result, like '2')
  • The GetCell methods handle ranges of input
  • The ExcelReader handles both absolute and relative paths.

This series has provided us several tips to use .Net to enhance Excel functionality. We can call a .Net console app from Excel, passing in whatever command-line values we need, and then have that .Net console app read the Excel book to do complex functionality, such as creating an Html Report, sending data to the database, transforming the data to an Xml file, sending the data over an Xml Web Service, or anything else the .Net Framework lets us do.

Wednesday, September 28, 2005

Reading Excel from .Net: Creating a Cell structure

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/reading_excel_from_net_creating_a_cell_structure.htm]

I've been blogging a series on enhancing Excel with .Net. One way to do this is to have some process in Excel kick off a .Net Console app, and than have the .Net app do all the difficult programming (where you have the support of the .Net Framework) as opposed to VBA. In order to do this, we need to be able to read an Excel sheet from .Net.

Previous posts in this series were:

This post will start showing how to read Excel from .Net. This is pretty standard. If you google it you'll see a lot of variations. However, many "solutions" are not very reusable. We'd like to create a reusable ExcelReader utility class. This will be much easier if we first build a "Cell" class. This Class should have:

  • Constructors to take a row and column in both RC (int-int) and A1 (letter-int)  notation.
  • Properties for the row and column (even if you use the A1 notation). So if you enter the column 'C', it will return 3.
  • A Property to always return the A1Reference. So if you enter (2,2), it should return "B2".

This post will show how to build and test this cell. The next post will apply it to our ExcelReader.
 

Source Code:

using System;

namespace ExcelUtilities
{
    public struct Cell 
    {   
        #region Constructors

        public Cell(int intRow, int intCol)
        {
            if (intRow < 1)
                throw new ArgumentException("Row must be greater than 0");
            if (intCol < 1)
                throw new ArgumentException("Column must be greater than 0");
            _intRow = intRow;
            _intCol = intCol;
            _strA1Reference = ConvertIntToStringColumn(intCol) + intRow;
        }

        public Cell(string strCol, int intRow)
        {
      if (strCol == null)
        throw new System.ArgumentNullException("Column can't be null");
      if (intRow < 1)
        throw new ArgumentException("Row must be greater than 0");

            _intRow = intRow;
            _intCol = ConvertStringToIntColumn(strCol);
            if (_intCol > 256)
                throw new ArgumentException("Column cannot be greater than IV (256). Value was '" + strCol + "'.");

            _strA1Reference = strCol.ToUpper() + intRow.ToString();
        }  
       
        #endregion

        #region Implementation

        private static int ConvertStringToIntColumn(string strCol)
        {
            strCol = strCol.ToUpper();
            //A --> 1, BD --> (26+4), IV     
            //Can only have max of two characters.     
            char[] ach = strCol.ToCharArray();
            if (ach.Length == 0)
                throw new ArgumentException("Column cannot have 0 length. Must be A - IV. Was: '" + strCol + "'.");
            else if (ach.Length == 1)
                return GetNumberValue(ach[0]);
            else       
                return 26*GetNumberValue(ach[0]) + GetNumberValue(ach[1]);
        }

        private static string ConvertIntToStringColumn(int intCol)
        {
            if (intCol < 1)
                throw new ArgumentOutOfRangeException("intCol",intCol,"Column must be between 1 and 256.");
            if (intCol > 256)
                throw new ArgumentOutOfRangeException("intCol",intCol,"Column must be between 1 and 256.");

            //single digit, just convert it.
            if (intCol <= 26)   
                return GetCharacterValue(intCol).ToString();

            //multiple char
            int intDigit1 = intCol / 26;
            int intDigit2 = intCol % 26;

            string s = GetCharacterValue(intDigit1).ToString() + GetCharacterValue(intDigit2).ToString();

            return s;
        }

        private static char GetCharacterValue(int i)
        {
            return (char)(i + 64);
        }

        private static int GetNumberValue(char c)
        {
            return (int)c - 64;
        }

   
        #endregion

        #region Public Properties

        private int _intRow, _intCol;
        private string _strA1Reference;

        public int Row
        {
            get
            {
                return this._intRow;
            }
        }

        public int Column
        {
            get
            {
                return this._intCol;
            }
        }

        public string A1Reference
        {
            get
            {
                return _strA1Reference;
            }
        }

        #endregion

        #region Test Wrappers

#if DEBUG

        public static string TestConvertIntToStringColumn(int intCol)
        {
            return ConvertIntToStringColumn(intCol);
        }

#endif

        #endregion

    }
}

Unit Tests:

This kind of class - purely an in-memory data structure with no external dependencies, is easily unit tested. This codes uses the NUnit 2.2 Framework.

using System;using NUnit.Framework;using ExcelUtilities;namespace UnitTests{    ///     /// Summary description for TestCell.    ///     [TestFixture] public class TestCell    {        public TestCell()        {        }        #region A1 Constructor        [Test] public void GetRowAndColumn_1()         {            Cell c = new Cell("B",3);            Assert.AreEqual(3,c.Row);            Assert.AreEqual(2,c.Column);        }        [Test] public void GetRowAndColumn_IgnoreCase()         {            Cell c = new Cell("b",3);            Assert.AreEqual(3,c.Row);            Assert.AreEqual(2,c.Column);        }        [Test] [ExpectedException(typeof(ArgumentException))]         public void GetRowAndColumn_InvalidCol()         {            Cell c = new Cell("ZZ",3);        }        [Test] [ExpectedException(typeof(ArgumentException))]         public void GetRowAndColumn_InvalidRow()         {            Cell c = new Cell("A",-1);        }    [Test] [ExpectedException(typeof(System.ArgumentNullException))]     public void GetRowAndColumn_InvalidColumn0()     {      Cell c = new Cell(null,-1);    }    [Test] [ExpectedException(typeof(System.ArgumentException))]     public void GetRowAndColumn_InvalidColumn1()     {      Cell c = new Cell("",-1);    }    [Test] [ExpectedException(typeof(System.ArgumentException))]     public void GetRowAndColumn_InvalidColumn2()     {      Cell c = new Cell("#$%",-1);    }        #endregion        #region RC constructor        [Test] public void RC_1()         {            Cell c = new Cell(2,3);            Assert.AreEqual(2,c.Row);            Assert.AreEqual(3,c.Column);        }        [Test] [ExpectedException(typeof(ArgumentException))]         public void RC_2()         {            Cell c = new Cell(-1,3);        }        [Test] [ExpectedException(typeof(ArgumentException))]         public void RC_3()         {            Cell c = new Cell(1,-1);        }        #endregion        #region ConvertIntToStringColumn        #if DEBUG        [Test] [ExpectedException(typeof(System.ArgumentOutOfRangeException))]         public void ConvertIntToStringColumn_1()         {            Cell.TestConvertIntToStringColumn(-1);        }        [Test] [ExpectedException(typeof(System.ArgumentOutOfRangeException))]        public void ConvertIntToStringColumn_2a()         {            Cell.TestConvertIntToStringColumn(0);        }        [Test] [ExpectedException(typeof(System.ArgumentOutOfRangeException))]        public void ConvertIntToStringColumn_2()         {            Cell.TestConvertIntToStringColumn(257);        }        [Test] public void ConvertIntToStringColumn_3()         {            Assert.AreEqual("A",Cell.TestConvertIntToStringColumn(1));        }        [Test] public void ConvertIntToStringColumn_4()         {            Assert.AreEqual("IV",Cell.TestConvertIntToStringColumn(256));        }        [Test] public void ConvertIntToStringColumn_5()         {            Assert.AreEqual("Z",Cell.TestConvertIntToStringColumn(26));        }#endif        #endregion        #region A1Reference        [Test] public void A1Reference_1a()         {            Cell c = new Cell("B",3);            Assert.AreEqual("B3",c.A1Reference);        }    [Test] public void A1Reference_CaseInsensitive()     {      Cell c = new Cell("b",3);      Assert.AreEqual("B3",c.A1Reference);    }        [Test] public void A1Reference_2()         {            Cell c = new Cell(3,2);            Assert.AreEqual("B3",c.A1Reference);        }        #endregion    }}

There's not much to explain about this code. One note about using the #if DEBUG to enclose a test-wrapper. We could use reflection to avoid this, as explained in this Code Project article, but I was just being quick and informal. Next we'll do the more interesting task of using this in an ExcelReader utility class.

Thursday, September 22, 2005

C# 3.0 LINQ - querying data

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/c_30_linq__querying_data.htm]

This looks so cool:

[Quote from MSDN]

The LINQ Project is a codename for a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities. 

[End Quote]

It looks like you'd have the similar query abilities that you'd have in SQL or XPath, and it beats writing tons of custom code. See 101 examples here: http://www.msdn.microsoft.com/vcsharp/future/linqsamples/

Thursday, September 15, 2005

Extending Excel with .Net Part III: Context Menus

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/extending_excel_with_net_part_iii_context_menus.htm]

In my previous blog about extending Excel with .Net, I showed how an Excel sheet could call a .Net console app. This blog will show how to add Context Menus to a specific sheet. This is a little off topic for me as I don't normally discuss VBA. There are certainly other resources to cover this, like http://www.cpearson.com/excel.htm

What we'd like to do is add Context Menus for just a single workbook. When you close the books, the menus disappear, when you re-open it they reappear. Normally when you add an item to the context menu, it persists. So we'll add it when we open the workbook, and remove it when we deactivate the workbook. Something like so:

Private Const m_Context1 As String = "Context1"
Private Const m_Context2 As String = "Context2"

Private Sub Workbook_WindowActivate(ByVal Wn As Excel.Window)
    Call AddItemToContextMenu
End Sub
Private
Sub Workbook_WindowDeactivate(ByVal Wn As Excel.Window)
    Call RemoveContextMenuItem
End Sub

Sub
AddItemToContextMenu()
    Dim cmdNew As CommandBarButton
   
    Set cmdNew = Application.CommandBars("Cell").Controls.Add
    With cmdNew
        .Caption = m_Context1
        .OnAction = "Method1"
        .BeginGroup = True
    End With

    Set cmdNew = Application.CommandBars("Cell").Controls.Add
    With cmdNew
        .Caption = m_Context2
        .OnAction = "Method2"
        .BeginGroup = False
    End With

End
Sub

Sub
RemoveContextMenuItem()
    On Error Resume Next
    Application.CommandBars("Cell").Controls(m_Context1).Delete
    Application.CommandBars("Cell").Controls(m_Context2).Delete
End Sub

A couple notes:

  1. We reference the menu items by a string, so I refactor this to a constant such that string doesn't get accidentally mistyped during the remove phase.
  2. The WindowActivate and WindowDeactive trigger the events we need. We can't merely just do this on open or close because the user may have two workbooks open at a time, and we don't want our context menu items to appear in the other workbook.
  3. The OnAction property is set to the method to be called.

While it's nice to be able to add such menu items, it really makes me appreciate WinForms in .Net!

Next post: How to read the Excel book from .Net.

Monday, September 5, 2005

Excel Migration Part II: Calling a .Net Process

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/excel_migration_part_ii_calling_a_net_process.htm]

In my previous blog I discussed extending Excel with VBA vs. re-writing the app. I said I wanted to create a hybrid solution. Here I'll propose a solution for using .Net to improve how you view and share the final data with clients.

Essentially we want to create a "thin" report of our final data. This report is thin in the sense that it only contains the data and the means to present it. It does not contain calculation logic, validation, user interactions, security, etc... The context is that you've used your Excel sheet to calculate the data needed for a widget, and now you want to send that data to a client. You don't want to send them the entire workbook, as this is:

  • Platform Dependent - it requires that they have not just Excel, but the correct version of Excel.
  • Bulky - It physically takes up more space (you store all the rows, and all the calculations for each row)
  • Messy - while you can format things nicely in Excel (such that it prints okay), this is extra work to do. It is also much harder when you only want to display one row.
  • Excess - Suppose you're using the same sheet to manage widgets for 10 different clients, you only want to send one (or several) rows of output - not every row in your sheet.
  • Unsecure - Say you only want to send your output, not all the calculations you used. Your calculations may even be intellectual property that can't be distributed. Sure you could put a password on the Excel sheet, but that is an extra step and still gives away your calculations - albeit in a locked format. Wouldn't it be ideal to not even send the calculations at all?
  • Changeable - suppose you're sending this to a manager - do you really want to risk that manager "accidentally" fiddling with your calculations or breaking the Excel sheet? Ideally you could give them an idiot-proof document with no "movable parts" that cannot be broken.

One solution that solves all of this is to have a tool generate an Html report from only the Excel output that you want. Html would have the benefits:

  • Platform Independent - it has no need for Excel. Anyone in the world can read HTML
  • Light weight - a small, text-based file.
  • Easily formatted
  • Minimal - only send the output you want
  • Secure - it only includes the results, none of the proprietary calculations used to get those.
  • Unchangeable - a pure Html doc has no "movable parts". It is read only and impossible even for the most clumsy user to mess up.

How we would implement this:

  1. Create a .Net Console App that takes in the necessary info to read the Excel book and create the desired Html Report
  2. In Excel, have a processes to call this app via the command line.

Say this .Net console app, called CreateHtmlReportConsole,  would take four parameters: the physical path to the Excel book, the physical path to an html file that servers as a template for the report, the Excel worksheet, and the Excel row to pull data from. A sample command line may look like so (I've included the quotes in case your paths have spaces in them):

"C:\myProjects\CreateHtmlReportConsole.exe" "C:\ExcelBooks\Book1.xls" "C:\myProjects\Template.htm" 1 14

I'll show step #2 here, and discuss step#1 more in another blog. You could have a method in VBA (from a button or context menu) use the Shell command to call an external process like our .Net console app. For example:

Dim strCmd As String
Dim strCurrentDirectory As String
Dim strThisFileName As String

'Get current directory and filename:
strCurrentDirectory = ThisWorkbook.Path & "\"
strThisFileName = ThisWorkbook.Name

'Create a command line like:
' "C:\myProjects\CreateHtmlReportConsole.exe" "C:\ExcelBooks\Book1.xls" "C:\myProjects\Template.htm" 1 14
strCmd = strCurrentDirectory + "CreateHtmlReportConsole.exe """ + _
    strCurrentDirectory + strThisFileName + """ """ + _
    strCurrentDirectory + "Template.htm"" 1 " & intRow
Shell (strCmd)

Next step: how add dynamic context menus in Excel, from which you could call this code.

Thursday, September 1, 2005

Migrating an app from Excel to .Net, Part I

[This was originally posted at http://timstall.dotnetdevelopersjournal.com/migrating_an_app_from_excel_to_net_part_i.htm]

I recently had the challenge of improving upon an Excel workbook. This was a complicated sheet, with 200 columns of ugly calculations that solved a difficult engineering problem. There were many rows, one for each widget being engineered. I'm going to blog about several issues I encountered for this.

The first issue I needed to deal with: should I extend Excel (with VBA) or rewrite the application in .Net. I see the following pros and cons:

 Extend Excel with VBARewrite in .Net
Pro
  • Potentially much quicker - not rewriting all the complex calculations
  • Excel is already designed for this kind of thing. It's easy to copy new rows, and it has a built in tools like GoalSeek (a numeric solver) of which there is no clear counterpart available in .Net (perhaps I could implement Newton's Method with numeric derivatives, but that takes some work would be very error prone for me to mess up).
  • Excel is easy to maintain for the end user, especially if they need to make a change.
  • Powerful platfom
  • Easier to test code with Test-Driven-Development (what if you need to modify a formula in your Excel book, and you want to make sure everything still works)
  • Splits out input, logic, and view into separate components. For example you could have a WinForm take (and validate) the input, a backend DLL could do all the logic, and then you could generate an Xml file or Html document with the final output results.
Con
  • Limited power, VBA is messy to deal with compared to .Net with C#, Visual Studio, and the .Net Framework
  • Excel combines the input, logic, and view all into one sheet. For example is you wanted to show the client only the calculated output, you'd either send them the entire sheet (which lets them see your calculations unless you do some fancy cell-hiding), or copy those values to a new document.
  • Lot of extra work
  • Hard for the end user to change or maintain

The final solution I'm looking at: make a hybrid. Continue using Excel for user input and calculations, extending it with VBA. But use .Net to read the Excel document and generate output views. I'll discuss this more in my next blog.