Sometimes, Excel's built-in find-replace function doesn't meet our needs, and this time you can consider using regular expressions in Excel.
To use regular expressions directly in Excel formulas or loops, the following UDF (User Defined Function) can help. It exposes the functionality of a regular expression more or less directly as an Excel function.
To use regular expressions in Excel, you need to add VBA reference to "Microsoft VBScript Regular Expressions 5.5" or use Late binding.
Using Regular Expressions
'------------------ Modules ------------------
Function REGEX(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As Object, outputRegexObj As Object, outReplaceRegexObj As Object
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
Set inputRegexObj = CreateObject("VBScript.RegExp")
Set outputRegexObj = CreateObject("VBScript.RegExp")
Set outReplaceRegexObj = CreateObject("VBScript.RegExp")
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
REGEX = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
REGEX = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
REGEX = outputPattern
End If
End Function
How this UDF works
Syntax
REGEX(InputString, matchPattern, [outputPattern])
Parameters
- InputString Required. text to use the regular expression on.
- matchPattern Required. A regular expression.
- outputPattern Optional. A format string specifying how the result should look. It can contain
$0
,$1
,$2
, and so on.$0
is the entire match,$1
and up correspond to the respective match groups in the regular expression. Defaults to$0
.
Examples
Extracting an email address
=REGEX("Dwayne Johnson: [email protected], 1972", "\w+@\w+\.\w+")
=REGEX("Dwayne Johnson: [email protected], 1972", "\w+@\w+\.\w+", "$0")
Results in:
[email protected]
Extracting several substrings
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1, Born: $3" )
Results in:
E-Mail: [email protected], Name: Dwayne Johnson, Born: 1972
Splitting a combined string in a cell
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 1)
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 2)
=REGEX("Dwayne Johnson: [email protected], 1972", "^(.+): (.+), (\d+)$", "$" & 3)
Results in:
Dwayne Johnson
[email protected]
1972
Syntax of regular expressions
Basic definitions
-
Range.
- E.g.
a-z
matches an lower case letters from a to z - E.g.
0-5
matches any number from 0 to 5
[]
Match exactly one of the objects inside these brackets.
- E.g.
[a]
matches the letter a - E.g.
[abc]
matches a single letter which can be a, b or c - E.g.
[a-z]
matches any single lower case letter of the alphabet.
()
Groups different matches for return purposes. See examples below.
{}
Multiplier for repeated copies of pattern defined before it.
- E.g.
[a]{2}
matches two consecutive lower case letter a:aa
- E.g.
[a]{1,3}
matches at least one and up to three lower case lettera
,aa
,aaa
+
Match at least one, or more, of the pattern defined before it.
- E.g.
a+
will match consecutive a'sa
,aa
,aaa
, and so on
?
Match zero or one of the pattern defined before it.
- E.g. Pattern may or may not be present but can only be matched one time.
- E.g.
[a-z]?
matches empty string or any single lower case letter.
*
Match zero or more of the pattern defined before it.
- E.g. Wildcard for pattern that may or may not be present.
- E.g.
[a-z]*
matches empty string or string of lower case letters.
.
Matches any character except newline \n
- E.g.
a.
Matches a two character string starting with a and ending with anything except\n
|
OR operator
- E.g.
a|b
means eithera
orb
can be matched. - E.g.
red|white|orange
matches exactly one of the colors.
^
NOT operator
- E.g.
[^0-9]
character can not contain a number - E.g.
[^aA]
character can not be lower casea
or upper caseA
\
Escapes special character that follows (overrides above behavior)
- E.g.
\.
,\\
,\(
,\?
,\$
,\^
Anchoring Patterns
^
Match must occur at start of string
- E.g.
^a
First character must be lower case lettera
- E.g.
^[0-9]
First character must be a number.
$
Match must occur at end of string
- E.g.
a$
Last character must be lower case lettera
Precedence table
Order | Name | Representation |
---|---|---|
1 | Parentheses | ( ) |
2 | Multipliers | ? + * {m,n} {m, n}? |
3 | Sequence & Anchors | abc ^ $ |
4 | Alternation | | |
Predefined Character Abbreviations
abr | same as | meaning |
---|---|---|
\d | [0-9] | Any single digit |
\D | [^0-9] | Any single character that's not a digit |
\w | [a-zA-Z0-9_] | Any word character |
\W | [^a-zA-Z0-9_] | Any non-word character |
\s | [ \r\t\n\f] | Any space character |
\S | [^ \r\t\n\f] | Any non-space character |
\n | [\n] | New line |
How to Use This Macro
Most VBA code should be placed in Standard Modules unless specified.
If you see a comment '------------------ Modules------------------
in the code header that means put the code in a Standard Module. For more information, learn this course: Where should I put the Excel VBA code?
The following steps teach you how to put VBA code into a Standard Module:
- Activate the Visual Basic Editor by pressing ALT + F11.
- Right-click the project/workbook name in the Project Window.
- Choose Insert -> Module.
- Type or paste the code in the newly created module. You will probably need to change the sheet name, the range address, and the save location.
- Click Run button on the Visual Basic Editor toolbar.
- For more information, learn this course: Programming with Excel VBA
Wow!