IT:Column references in Excel

From Stiki

Jump to: navigation, search
This page contains sample code to be used as part of a complete program. You may need to adapt this code to make it suitable for your requirements. Please be aware of the disclaimer regarding the use of any code posted on this site.

In Microsoft Excel, cells are usually referenced by column and row using characters for the column reference and numbers for the row reference (so the first cell is A1, with B1 in the second column and first row). The underlying coding of Excel actually uses numbers to refer to both the row and the column. For example =COLUMN(B1) will return the number 2.

There are times when we wish to convert a numeric column reference into its text character equivalent. An example of this would be when using the INDIRECT() function with an argument that is a constructed string representing a cell reference.

The way to convert a column number into a column reference is not necessarily direct enough to do within in a cell formula. This is because column references range from A to Z, then AA to ZZ, and so on. Conditioning can lead to unwieldy cell formulae, and this is a commonly enough required function that it is worth keeping in VBA whenever it is needed.

Note that the equivalent operation of converting a column reference from text into numbers is much easier to achieve through use of this type of formula: =COLUMN(INDIRECT(col_ref&"1")), remembering of course that INDIRECT is a volatile function (it may require more than one recalculation).

The following code is flexible enough to cope with Excel column references of any size, allowing for future expansion in later versions of the program. For versions of up to Excel 2003 the column references ranged from A to IV (1 to 256) whereas in Excel 2007 the range was extended to go from A to XFD (1 to 16,384).

Code

Public Function Col_Ref(ByVal column_number As Long) As String
    ' Declare internal variables:
    Dim output_string As String
 
    ' Initialise variable values:
    output_string = ""
 
    ' Validate the function argument:
    If column_number > 0 Then
        ' Loop through each character of the column reference:
        Do
            ' Construct the next least-significant character:
            output_string = Chr(((column_number - 1) Mod 26) + 1 + 64) & output_string
            ' Decrement the column number for this character:
            column_number = Int((column_number - 1) / 26)
        Loop Until column_number = 0
    End If
 
    ' Return the column reference as required:
    Col_Ref = output_string
End Function

Notes

The key lines of code are within the Do...Loop block.

Taking the current column number modulo 26 returns a value from 0 to 25 for the least significant part of the number in base-26:

    column_number Mod 26

In order to account for column 26 being Z, we deduct one from the column number first, and add that one back on after taking the modulus. This gives a value from 1 to 26 as required:

    ((column_number - 1) Mod 26) + 1

In order to convert this number into a character of text, we use the Chr() function. This converts a number into its ASCII text equivalent character. The capital letters of the Latin alphabet are defined as characters 65 to 90 (for A to Z) so we add a further 64 to get the appropriate character:

    Chr(((column_number - 1) Mod 26) + 1 + 64)

This gives us the correct character for the least significant part of the current column reference. Any characters we have already built up are even less significant than the current character, so append the older characters to the current character to correctly build up the column reference:

    output_string = Chr(((column_number - 1) Mod 26) + 1 + 64) & output_string

Having used that part of the 'column_number' value, we can decrement this value accordingly. We divide by 26 to remove the least significant part:

    column_number / 26

In fact we need to deduct 1 first. This is again to account for column 26 being Z, and is consistent with deducting the 1 above when taking the modulus:

    (column_number - 1) / 26

This then gives us a number that we can round down to give us the remaining value for the unprocessed columns:

    Int((column_number - 1) / 26)

Assigning this back to 'column_number' gives us the correct value for repeating the above steps to get subsequent characters in the column reference:

    column_number = Int((column_number - 1) / 26)

Repeating this step sufficient times will always reduce 'column_number' as it is subtracted from and divided repeatedly. It will eventually always reach 0 (so long as it was a positive value to begin with) and so we can terminate the loop on this condition:

    Loop Until column_number = 0

Discussion on the algorithm

Note that we can not simply use base-26 or base-27 arithmetic to derive the column reference. This is because Excel column references are neither strict base-26 nor strict base-27 values. Effectively there are 27 different allowable digits including a zero value, but the zero value can only appear at the start of the column reference.

To illustrate this point, consider the ten digits in base-10: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The set of all one-digit numbers ranges from (0 or) 1 to 9. The set of all two-digit numbers ranges from (00 or) 10 to 99. The set of all three-digit numbers ranges from (000 or) 100 to 999. (Depending on how you wish to exactly define an n-digit number.)

  Including leading 0 Excluding leading 0
Range Count Range Count
1-digit numbers 0 to 9 10 1 to 9 9
2-digit numbers 00 to 99 100 10 to 90 90
3-digit numbers 000 to 999 1000 100 to 999 900
n-digit numbers ... 10^n ... 10^n - 10^(n-1)

You can see that the sum of the final column totals (10^n - 1) which, together with 0, equals the total number of n-digit numbers including leading zeroes, as we would expect.

We can extend these formulae for any number base as well. This gives the total number of n-digit base-b numbers as b^n (including leading zeroes) or (b^n - b^(n-1)) (excluding leading zeroes).

Let us look at the equivalent table for Excel character references (assuming that these references can be arbitrarily large, not limited by the number of allowable columns in a worksheet):

  Range Count
1-character column references A to Z 26
2-character column references AA to ZZ 26^2
3-character column references AAA to ZZZ 26^3
n-character column references ... 26^n

This looks like a base-26 progession as per the table above for numbers including leading zeroes. The crucial difference is this: the column references are actually references excluding leading spaces. Compare: 000 to 999 includes all numbers from 00 to 99 and 0 to 9, whereas AAA to ZZZ is a distinct range from AA to ZZ and A to Z. The column ranges are equivalent to 100 to 999, 10 to 99, and 1 to 9.

The actual total of all column references up to and including n characters is (26 + 26^2 + 26^3 + ... + 26^n) whereas the total of all base-26 numbers up to and including n characters is (26^n). The total number of character references is more than (26^n) but less than (27^n).

The point is that the column reference system is not equivalent to a rebased numbering system because the equivalent of the leading zero, the blank null character, cannot appear as a character of lower significance. Illustrating the blank character as an @ for clarity, we don't see columns in this sequence: {... Y, Z, A@, AA, AB, ..., ZY, ZZ, A@@, A@A, A@B, ...} (where leading @ symbols are not displayed, just as leading zeroes are not displayed).

The detail of this explanation is probably beyond what you need though. In summary, use the Visual Basic code listed above, and it will correctly convert numbers into column references.

Personal tools