Saturday, March 31, 2012

Excel Part 1

The core MS Office apps have their application and inner objects exposed via COM.  These COM interfaces have distributable .NET Interop assemblies available to download.

There are two basic ways to interact with Excel via the COM objects and via the interop assembly.  Functionally I think the COM will allow you to accomplish the same tasks, but it will not be as easy.  To load the interop you will need:

001
002
003
004
005
006
007
#Load the Excel Assembly, Locally or from GAC
try {
    Add-Type -ASSEMBLY "Microsoft.Office.Interop.Excel"  | out-null
}catch {
#If the assembly can't be found this will load the most recent version in the GAC
    [Reflection.Assembly]::LoadWithPartialname("Microsoft.Office.Interop.Excel"| out-null
}

I tend to distribute the inerop DLL in my network share so I don't have to make sure that all servers and workstations have it installed.  The above should take care of either loading a local assembly or looking in the GAC.

To access Excel data, you have to be aware of the hierarchy of things.  At the top is the application class that contains one or more workbooks that contain one or more worksheets.  Within the worksheet are ranges.  Each layer can access down to some of the other layers.

001
002
003
004
005
006
007
008
009
Function Open-ExcelApplication {
Param([switch] $Visible,[switch] $HideAlerts)
    $app = New-Object Microsoft.Office.Interop.Excel.ApplicationClass
    $app.Visible  = $Visible
    $app.DisplayAlerts = -not $HideAlerts
    return $app
}
$app = open-excelApplication -Visible
$app | gm active*

Yields:

Name                      MemberType Definition                                              
----                      ---------- ----------                                              
ActiveCell                Property   Microsoft.Office.Interop.Excel.Range ActiveCell {get;}  
ActiveChart               Property   Microsoft.Office.Interop.Excel.Chart ActiveChart {get;}  
ActiveDialog              Property   Microsoft.Office.Interop.Excel.DialogSheet ActiveDialog ...
ActiveEncryptionSession   Property   int ActiveEncryptionSession {get;}                      
ActiveMenuBar             Property   Microsoft.Office.Interop.Excel.MenuBar ActiveMenuBar {get;}
ActivePrinter             Property   string ActivePrinter {get;set;}                          
ActiveProtectedViewWindow Property   Microsoft.Office.Interop.Excel.ProtectedViewWindow Activ...
ActiveSheet               Property   System.Object ActiveSheet {get;}                        
ActiveWindow              Property   Microsoft.Office.Interop.Excel.Window ActiveWindow {get;}
ActiveWorkbook            Property   Microsoft.Office.Interop.Excel.Workbook ActiveWorkbook {...


All of the classes also have a .Application property that points back to the top.


001
002
003
004
005
006
007
008
009
010
011
012
013
function New-ExcelWorkBook {
Param([parameter(ValueFromPipeline=$true)] $ExcelApplication
,[switch] $Visible)
process {
if ($ExcelApplication -eq $null ) { 
$ExcelApplication  = Open-ExcelApplication -Visible:$Visible
}
$WorkBook = $ExcelApplication.WorkBooks.Add()
return $WorkBook
}
}
$book = $app |  New-ExcelWorkBook
$book | gm active*

Yields:

   TypeName: System.__ComObject#{000208da-0000-0000-c000-000000000046}

Name         MemberType Definition                  
----         ---------- ----------                  
ActiveChart  Property   Chart ActiveChart () {get}  
ActiveSheet  Property   IDispatch ActiveSheet () {get}
ActiveSlicer Property   Slicer ActiveSlicer () {get}

Alternately, you can open an existing workbook:
001
002
003
004
005
006
007
008
009
010
011
012
013
014
015
016
017
018
019
020
021
function Get-ExcelWorkBook {
Param([parameter(Mandatory=$true,ValueFromPipeline=$true)] $inputObject
,[switch] $Visible
,[switch] $readonly)
[Microsoft.Office.Interop.Excel.ApplicationClass] $app = $null
if ($inputObject -is [Microsoft.Office.Interop.Excel.ApplicationClass]) {
  $app = $inputObject
  $WorkBook = $app.ActiveWorkbook
else {
  $app = Open-ExcelApplication -Visible:$Visible 
  try {
    if ($inputObject.Contains("\\"-or $inputObject.Contains("//")) {
      $WorkBook = $app.Workbooks.Open($inputObject,$true,[System.Boolean]$readonly)
    } else {
      $WorkBook = $app.Workbooks.Open((Resolve-path $inputObject),$true,[System.Boolean]$readonly)
  }} catch {$WorkBook = $app.Workbooks.Open((Resolve-path $inputObject),$true,[System.Boolean]$readonly)}
}
#todo: Add Switch to toggle Full Rebuild (this does an update data)
$app.CalculateFullRebuild()
return $WorkBook
}


The Interop allows you easy access to the classes and enumerations.  The largest caveat is what you may expect  vs what you get when you look at the COM collections.  These collections are built implementing default properties that do not come across in powershell.  A recorded macro may reference WorkSheets("Sheet1") but in PS you will need to say $WorkSheets.item("Sheet1").  So, what looks like it may be an array may need a call to the item property to do what you expect.

When you look at Excel you see cells, when you automate it you have ranges.

001
002
$Sheet = $Book.Worksheets.item("Sheet1")
$sheet | gm -MemberType *Property | where { $_.Definition -match "Range" }

Yields:

  TypeName: System.__ComObject#{000208d8-0000-0000-c000-000000000046}

 Name              MemberType            Definition                        
 ----              ----------            ----------                        
 Range             ParameterizedProperty Range Range (Variant, Variant) {get}
 Cells               Property              Range Cells () {get}              
 CircularReference Property              Range CircularReference () {get}  
 Columns         Property              Range Columns () {get}            
 Rows              Property              Range Rows () {get}                
 UsedRange     Property              Range UsedRange () {get}


 All of the following are the same:

001
002
003
004
005
006
007
$sheet.Range("A1").Text
$sheet.Range("A1:A1").Text
$sheet.Range("A1","A1").Text
$sheet.cells.Item(1,1).text
$sheet.Columns.Item(1).Rows.Item(1).Text
$sheet.Rows.Item(1).Columns.Item(1).Text
$sheet.UsedRange.Range("a1").Text


If you were going to use a formula in a cell, this same convention is used for the .Range ParameterizedProperty.  Cells you have to use the .Item property but you can more easily use in loops as the cell is a coordinate.  UsedRange is limited to the cells that have or have had data in them as a block.  So furthest X and furthest Y make up the range.  If you want to know what these bounds are:

001
002
$sheet.UsedRange.Columns.Count
$sheet.UsedRange.Rows.Count


To close it all down you can do something like:
001
002
003
004
005
006
007
008
009
010
011
012
Function Close-ExcelApplication {
Param([parameter(Mandatory=$true,ValueFromPipeline=$true)] $inputObject)
if ($inputObject -is [Microsoft.Office.Interop.Excel.ApplicationClass]) {
$app = $inputObject 
else {
$app = $inputObject.Application
Release-Ref $inputObject
}
$app.ActiveWorkBook.Close($false| Out-Null
$app.Quit() | Out-Null
Release-Ref $app
}

This will work fine if you only have one workbook open with changes.  If you have more than one open you will need to make sure that you close or save each sheet.  If you try to close and have more than the active workbook not saved, excel will prompt you to save.  This is not something that you want if you expect your script to run unattended.

Another big point to consider is the garbage collection.  I know it was a big concern with Office 2003 and may be unneeded in 2007 or 2010, but an extra step to clean up your variables should be used.

001
002
003
004
005
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$ref-gt 0 | Out-Null
[System.GC]::Collect() | Out-Null
[System.GC]::WaitForPendingFinalizers() | Out-Null
}

If you don't do this, Excel may (may)  fail to close.


More later...

2 comments:

  1. Some very good stuff here. I have found that in PowerShell using the Interop or COM are essentially the same thing since it seems everything goes through Interop and regardless of what you use, there are always some properties that show up as system._COMobject that you can't look into.

    My other comments are general scripting suggestions: use full cmdlet names instead of aliases like gm. Second, I always encourage people to think "writing" objects to the pipeline and not "returning". With what you have shown here, it may not make much difference, but to me Return doesn't make sense in a language that relies so much on the pipeline. Finally, be sure to use standard verbs for your functions. I'm not sure where Release-Ref came from and assuming it is a function. If you eventually package this as a module, PowerShell will complain about non-standard verbs.

    ReplyDelete
  2. Hello,

    Is it possible to use it without excel install on the server ?
    Regards

    ReplyDelete