Skip to main content

Excel Macro: Using Regular Expressions

Sometimes, Excel's built-in find-replace function doesn't meet our needs, and this time you can consider using regular expressions in Excel.

To use regular expressions directly in Excel formulas or loops, the following UDF (User Defined Function) can help. It exposes the functionality of a regular expression more or less directly as an Excel function.

To use regular expressions in Excel, you need to add VBA reference to "Microsoft VBScript Regular Expressions 5.5" or use Late binding.

Using Regular Expressions

'------------------ Modules ------------------
Function REGEX(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As Object, outputRegexObj As Object, outReplaceRegexObj As Object
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer
    
    Set inputRegexObj = CreateObject("VBScript.RegExp")
    Set outputRegexObj = CreateObject("VBScript.RegExp")
    Set outReplaceRegexObj = CreateObject("VBScript.RegExp")

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        REGEX = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    REGEX = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        REGEX = outputPattern
    End If
End Function

How this UDF works

Syntax

REGEX(InputString, matchPattern, [outputPattern])

Parameters

  1. InputString Required. text to use the regular expression on.
  2. matchPattern Required. A regular expression.
  3. outputPattern Optional. A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.

Examples

Extracting an email address

=REGEX("Dwayne Johnson: [email protected], 1972", "\w+@\w+\.\w+")
=REGEX("Dwayne Johnson: [email protected], 1972", "\w+@\w+\.\w+", "$0")

Results in:
[email protected]

Extracting several substrings

=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1, Born: $3" )

Results in:
E-Mail: [email protected], Name: Dwayne Johnson, Born: 1972

Splitting a combined string in a cell

=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 1)
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 2)
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 3)

Results in:
Dwayne Johnson
[email protected]
1972

Syntax of regular expressions

Basic definitions

- Range.

  • E.g. a-z matches an lower case letters from a to z
  • E.g. 0-5 matches any number from 0 to 5

[] Match exactly one of the objects inside these brackets.

  • E.g. [a] matches the letter a
  • E.g. [abc] matches a single letter which can be a, b or c
  • E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See examples below.

{} Multiplier for repeated copies of pattern defined before it.

  • E.g. [a]{2} matches two consecutive lower case letter a: aa
  • E.g. [a]{1,3} matches at least one and up to three lower case letter aaaaaa

+ Match at least one, or more, of the pattern defined before it.

  • E.g. a+ will match consecutive a's aaaaaa, and so on

? Match zero or one of the pattern defined before it.

  • E.g. Pattern may or may not be present but can only be matched one time.
  • E.g. [a-z]? matches empty string or any single lower case letter.

* Match zero or more of the pattern defined before it.

  • E.g. Wildcard for pattern that may or may not be present.
  • E.g. [a-z]* matches empty string or string of lower case letters.

. Matches any character except newline \n

  • E.g. a. Matches a two character string starting with a and ending with anything except \n

| OR operator

  • E.g. a|b means either a or b can be matched.
  • E.g. red|white|orange matches exactly one of the colors.

^ NOT operator

  • E.g. [^0-9] character can not contain a number
  • E.g. [^aA] character can not be lower case a or upper case A

\ Escapes special character that follows (overrides above behavior)

  • E.g. \.\\\(\?\$\^

Anchoring Patterns

^ Match must occur at start of string

  • E.g. ^a First character must be lower case letter a
  • E.g. ^[0-9] First character must be a number.

$ Match must occur at end of string

  • E.g. a$ Last character must be lower case letter a

Precedence table

Order Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m, n}?
3 Sequence & Anchors abc ^ $
4 Alternation |

Predefined Character Abbreviations

abr same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New line

Most VBA code should be placed in Standard Modules unless specified.

If you see a comment '------------------ Modules------------------ in the code header that means put the code in a Standard Module. For more information, learn this course: Where should I put the Excel VBA code?

The following steps teach you how to put VBA code into a Standard Module:

  1. Activate the Visual Basic Editor by pressing ALT + F11.
  2. Right-click the project/workbook name in the Project Window.
  3. Choose Insert -> Module.
  4. Type or paste the code in the newly created module. You will probably need to change the sheet name, the range address, and the save location.
  5. Click Run button on the Visual Basic Editor toolbar.
  6. For more information, learn this course: Programming with Excel VBA

Leave a comment

Your email address will not be published. Required fields are marked *

Format your code: <pre><code class="language-vba">place your code here</code></pre>

1 comment
  1. SK
    Sandeep Kothari

    Wow!