The System.Text.RegularExpressions.Regex class represents the .NET Framework's regular expression engine. It can be used to quickly parse large amounts of text to find specific character patterns; to extract, edit, replace, or delete text substrings; and to add the extracted strings to a collection to generate a report.
If your primary interest is to validate a string by determining whether it conforms to a particular pattern, you can use the System.Configuration.RegexStringValidator class.
To use regular expressions, you define the pattern that you want to identify in a text stream by using the syntax documented in Regular Expression Language Elements. Next, you can optionally instantiate a System.Text.RegularExpressions.Regex object. Finally, you call a method that performs some operation, such as replacing text that matches the regular expression pattern, or identifying a pattern match.
The following sections provide more information about using the System.Text.RegularExpressions.Regex class.
The string class includes several search and comparison methods that you can use to perform pattern matching with text. For example, the string.Contains(string), string.EndsWith(string), and string.StartsWith(string) methods determine whether a string instance contains, ends with, or starts with a specified substring; and the string.IndexOf(string), string.IndexOfAny(char[]), string.LastIndexOf(string), and string.LastIndexOfAny(char[], int, int) methods return the position of a specified substring, or of any character in a specified set, within a string. Use the methods of the string class when you are searching for a specific string. Use the System.Text.RegularExpressions.Regex class when you are searching for a specific pattern in a string. For more information and examples, see .NET Framework Regular Expressions.
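To make the distinction concrete, here is a minimal C# sketch. The class name, method names, and the "INV-" plus four digits invoice format are illustrative assumptions. A fixed word is found with string.Contains, while an invoice number with word boundaries needs a Regex pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringVsRegex
{
    // Literal search: the substring is fixed, so a string method is enough.
    public static bool MentionsInvoice(string text) => text.Contains("invoice");

    // Pattern search: "INV-" followed by exactly four digits (an illustrative
    // format), bounded by word boundaries. No single fixed substring expresses this.
    public static bool HasInvoiceNumber(string text) =>
        Regex.IsMatch(text, @"\bINV-\d{4}\b");

    public static void Main()
    {
        string text = "Please pay invoice INV-1042 by Friday.";
        Console.WriteLine(MentionsInvoice(text));                         // True
        Console.WriteLine(HasInvoiceNumber(text));                        // True
        Console.WriteLine(HasInvoiceNumber("Please pay invoice 1042.")); // False
    }
}
```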
After you define a regular expression pattern, you can provide it to the regular expression engine in either of two ways:
By instantiating a System.Text.RegularExpressions.Regex object that represents the regular expression. To do this, you pass the regular expression pattern to a Regex(string) constructor. A System.Text.RegularExpressions.Regex object is immutable; when you instantiate a System.Text.RegularExpressions.Regex object with a regular expression, that object's regular expression cannot be changed.
By supplying both the regular expression and the text to search to a static (Shared in Visual Basic) System.Text.RegularExpressions.Regex method. This enables you to use a regular expression without explicitly creating a System.Text.RegularExpressions.Regex object.
All System.Text.RegularExpressions.Regex pattern identification methods include both static and instance overloads.
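A minimal sketch of both approaches, assuming a hypothetical ISO-style date pattern; StaticVsInstance and its members are illustrative names. Both methods return the same result. The instance version constructs its immutable Regex once and reuses it, and the static version passes the pattern along with each call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticVsInstance
{
    // Hypothetical pattern: a date such as 2024-05-17.
    private const string DatePattern = @"^\d{4}-\d{2}-\d{2}$";

    // Instance approach: the Regex is created once; its pattern cannot change afterward.
    private static readonly Regex DateRegex = new Regex(DatePattern);

    public static bool IsDateInstance(string input) => DateRegex.IsMatch(input);

    // Static approach: pattern and input are supplied together,
    // without explicitly creating a Regex object.
    public static bool IsDateStatic(string input) => Regex.IsMatch(input, DatePattern);

    public static void Main()
    {
        foreach (string s in new[] { "2024-05-17", "17/05/2024" })
        {
            Console.WriteLine($"{s}: instance={IsDateInstance(s)}, static={IsDateStatic(s)}");
        }
    }
}
```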
The regular expression engine must compile a particular pattern before the pattern can be used. Because System.Text.RegularExpressions.Regex objects are immutable, this is a one-time procedure that occurs when a System.Text.RegularExpressions.Regex class constructor or a static method is called. To eliminate the need to repeatedly compile a single regular expression, the regular expression engine caches the compiled regular expressions used in static method calls. As a result, regular expression pattern-matching methods offer comparable performance for static and instance methods.
In the .NET Framework versions 1.0 and 1.1, all compiled regular expressions, whether they were used in instance or static method calls, were cached. Starting with the .NET Framework 2.0, only regular expressions used in static method calls are cached.
However, caching can adversely affect performance in the following two cases:
When you use static method calls with a large number of regular expressions. By default, the regular expression engine caches the 15 most recently used static regular expressions. If your application uses more than 15 static regular expressions, some regular expressions must be recompiled. To prevent this recompilation, you can increase the value of the Regex.CacheSize property.
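The following sketch raises Regex.CacheSize before a workload that uses more static patterns than the default cache holds. The 20 generated patterns and the EnsureCacheSize helper are hypothetical. The helper only grows the cache, so it never discards a larger value that was set elsewhere.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CacheSizeDemo
{
    // Grow the static-method cache so that it can hold every pattern the
    // application uses; never shrink it.
    public static int EnsureCacheSize(int patternCount)
    {
        if (Regex.CacheSize < patternCount)
            Regex.CacheSize = patternCount;
        return Regex.CacheSize;
    }

    public static void Main()
    {
        Console.WriteLine($"Cache size before: {Regex.CacheSize}");

        // Hypothetical workload: 20 distinct patterns used only through static calls.
        string[] patterns = new string[20];
        for (int i = 0; i < patterns.Length; i++)
            patterns[i] = $@"\bitem{i}\b";

        EnsureCacheSize(patterns.Length);

        // Each round reuses the cached patterns instead of recompiling them.
        int hits = 0;
        for (int round = 0; round < 3; round++)
            foreach (string p in patterns)
                if (Regex.IsMatch("item7 item12", p))
                    hits++;

        Console.WriteLine($"Cache size after: {Regex.CacheSize}, hits: {hits}"); // hits: 6
    }
}
```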
When you instantiate new System.Text.RegularExpressions.Regex objects with regular expressions that have previously been compiled. For example, the following code defines a regular expression to locate duplicated words in a text stream. Although the example uses a single regular expression, it instantiates a new System.Text.RegularExpressions.Regex object to process each line of text. This results in the recompilation of the regular expression with each iteration of the loop.
code reference: System.Text.RegularExpressions.Class.Caching#1
To prevent recompilation, you should instantiate a single System.Text.RegularExpressions.Regex object that is accessible to all code that requires it, as shown in the following rewritten example.
code reference: System.Text.RegularExpressions.Class.Caching#2
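The referenced samples are not reproduced here. As a stand-in, this hypothetical sketch contrasts the two approaches using its own duplicated-word pattern, \b(\w+)\s+\1\b with RegexOptions.IgnoreCase; the pattern is not necessarily the one in the original samples, and the class and method names are illustrative. Both methods return the same results; only FindSlow builds a new Regex for every line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DuplicateWords
{
    // A word immediately followed by the same word, e.g. "the the".
    private const string Pattern = @"\b(\w+)\s+\1\b";

    // Anti-pattern: a new Regex per line, so the pattern is recompiled every iteration.
    public static List<string> FindSlow(IEnumerable<string> lines)
    {
        var found = new List<string>();
        foreach (string line in lines)
        {
            var rgx = new Regex(Pattern, RegexOptions.IgnoreCase); // rebuilt each time
            foreach (Match m in rgx.Matches(line))
                found.Add(m.Groups[1].Value);
        }
        return found;
    }

    // Preferred: one immutable Regex, created once and shared by all callers.
    private static readonly Regex Shared = new Regex(Pattern, RegexOptions.IgnoreCase);

    public static List<string> FindFast(IEnumerable<string> lines)
    {
        var found = new List<string>();
        foreach (string line in lines)
            foreach (Match m in Shared.Matches(line))
                found.Add(m.Groups[1].Value);
        return found;
    }

    public static void Main()
    {
        string[] lines = { "This is is a test.", "No repeats here.", "The the end end." };
        Console.WriteLine(string.Join(", ", FindFast(lines))); // is, The, end
    }
}
```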
Whether you decide to instantiate a System.Text.RegularExpressions.Regex object and call its methods or call static methods, the System.Text.RegularExpressions.Regex class offers the following pattern-matching functionality:
Validation of a match. You call the Regex.IsMatch method to determine whether a match is present.
Retrieval of a single match. You call the Regex.Match method to retrieve a System.Text.RegularExpressions.Match object that represents the first match in a string or in part of a string. Subsequent matches can be retrieved by calling the Match.NextMatch method.
Retrieval of all matches. You call the Regex.Matches method to retrieve a System.Text.RegularExpressions.MatchCollection object that represents all the matches found in a string or in part of a string.
Replacement of matched text. You call the Regex.Replace method to replace matched text. The replacement string can also contain substitutions, such as $1, that insert text captured by the regular expression. In addition, some of the Regex.Replace overloads include a System.Text.RegularExpressions.MatchEvaluator parameter that enables you to programmatically define the replacement text.
Creation of a string array that is formed from parts of an input string. You call the Regex.Split method to split an input string at positions that are defined by the regular expression.
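The following sketch exercises each of these operations with one illustrative pattern, \d+ (runs of digits). PatternOperations and its helper methods are hypothetical names; the calls inside them are ordinary Regex instance methods.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternOperations
{
    // Illustrative pattern for this sketch: runs of digits.
    private static readonly Regex Digits = new Regex(@"\d+");

    // Validation: is there at least one match?
    public static bool HasDigits(string input) => Digits.IsMatch(input);

    // Single match, then step through later matches with NextMatch.
    public static List<string> WalkMatches(string input)
    {
        var values = new List<string>();
        for (Match m = Digits.Match(input); m.Success; m = m.NextMatch())
            values.Add($"{m.Value}@{m.Index}");
        return values;
    }

    // All matches at once.
    public static int CountMatches(string input) => Digits.Matches(input).Count;

    // Replacement text computed by a MatchEvaluator (the length of each digit run).
    public static string ReplaceWithLength(string input) =>
        Digits.Replace(input, match => $"<{match.Value.Length}>");

    // Replacement string with a substitution: $0 stands for the entire match.
    public static string Bracket(string input) => Digits.Replace(input, "[$0]");

    // Split the input at every match.
    public static string[] SplitOnDigits(string input) => Digits.Split(input);

    public static void Main()
    {
        string input = "a1b22c333";
        Console.WriteLine(HasDigits(input));                       // True
        Console.WriteLine(string.Join(", ", WalkMatches(input)));  // 1@1, 22@3, 333@6
        Console.WriteLine(CountMatches(input));                    // 3
        Console.WriteLine(ReplaceWithLength(input));               // a<1>b<2>c<3>
        Console.WriteLine(Bracket(input));                         // a[1]b[22]c[333]
        Console.WriteLine(string.Join("|", SplitOnDigits(input))); // a|b|c|
    }
}
```

Note that Split returns a trailing empty string here because the input ends with a match.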
In addition to its pattern-matching methods, the System.Text.RegularExpressions.Regex class includes several special-purpose methods:
The Regex.Escape(string) method escapes any characters that may be interpreted as regular expression operators in a regular expression or input string.
The Regex.Unescape(string) method removes these escape characters.
The Regex.CompileToAssembly method creates an assembly that contains predefined regular expressions. The .NET Framework contains examples of these special-purpose assemblies in the System.Web.RegularExpressions namespace.
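A short sketch of Escape and Unescape, assuming the goal is to search for user-supplied text literally. ContainsLiteral is a hypothetical helper. Without Regex.Escape, characters such as (, +, and * in the search text would be interpreted as regular expression operators.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Escape metacharacters so the literal text is matched exactly as written.
    public static bool ContainsLiteral(string input, string literal) =>
        Regex.IsMatch(input, Regex.Escape(literal));

    public static void Main()
    {
        string literal = "(1+1)*2";
        string escaped = Regex.Escape(literal);
        Console.WriteLine(escaped);                                   // \(1\+1\)\*2
        Console.WriteLine(Regex.Unescape(escaped));                   // (1+1)*2
        Console.WriteLine(ContainsLiteral("x = (1+1)*2;", literal));  // True
    }
}
```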
The .NET Framework supports a full-featured regular expression language that provides substantial power and flexibility in pattern matching. However, the power and flexibility come at a cost: the risk of poor performance. Regular expressions that perform poorly are surprisingly easy to create. In some cases, regular expression operations that rely on excessive backtracking can appear to stop responding when they process text that nearly matches the regular expression pattern. For more information about the .NET Framework regular expression engine, see Details of Regular Expression Behavior. For more information about excessive backtracking, see Backtracking.
Starting with the .NET Framework 4.5, you can define a time-out interval for regular expression matches. If the regular expression engine cannot identify a match within this time interval, the matching operation throws a System.Text.RegularExpressions.RegexMatchTimeoutException exception. In most cases, this prevents the regular expression engine from wasting processing power by trying to match text that nearly matches the regular expression pattern. However, a time-out could also indicate that the time-out interval has been set too low, or that the current machine load has caused an overall degradation in performance.
How you handle the exception depends on the cause of the exception. If the exception occurs because the time-out interval is set too low or because of excessive machine load, you can increase the time-out interval and retry the matching operation. If the exception occurs because the regular expression relies on excessive backtracking, you can assume that a match does not exist, and, optionally, you can log information that will help you modify the regular expression pattern.
You can set a time-out interval by calling the Regex(string, RegexOptions, TimeSpan) constructor when you instantiate a regular expression object. For static methods, you can set a time-out interval by calling an overload of a matching method that has a matchTimeout parameter. If you do not set a time-out value explicitly, the default time-out value is determined as follows:
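The sketch below shows both ways of setting a time-out and applies the handling strategy described earlier: retry with a longer interval, and if the operation still times out, log the details and treat the input as not matching. TimeoutDemo, IsMatchWithRetry, the doubling policy, and the patterns are illustrative assumptions, not a prescribed policy. Whether the nested-quantifier pattern actually times out depends on the runtime version and the machine, so the only outcome to rely on is "no match".

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Instance approach: the time-out is passed to the
    // Regex(string, RegexOptions, TimeSpan) constructor.
    public static readonly Regex DigitsOnly =
        new Regex(@"^\d+$", RegexOptions.None, TimeSpan.FromMilliseconds(500));

    // Static approach: an IsMatch overload with a matchTimeout parameter.
    // On a time-out, retry with twice the interval (the machine may have been busy);
    // if every attempt times out, assume excessive backtracking and report no match.
    public static bool IsMatchWithRetry(string input, string pattern, TimeSpan timeout, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            }
            catch (RegexMatchTimeoutException ex)
            {
                // Log details that help when revising the pattern.
                Console.Error.WriteLine(
                    $"Attempt {attempt}: timed out after {ex.MatchTimeout} on pattern {ex.Pattern}");
                timeout = TimeSpan.FromTicks(timeout.Ticks * 2);
            }
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(DigitsOnly.IsMatch("12345"));                                      // True
        Console.WriteLine(IsMatchWithRetry("12345", @"^\d+$", TimeSpan.FromSeconds(1), 2)); // True

        // Nested quantifiers such as (a+)+ can backtrack excessively on input that
        // nearly matches; depending on the runtime, this times out or fails quickly.
        string nearMiss = new string('a', 30) + "!";
        Console.WriteLine(IsMatchWithRetry(nearMiss, @"^(a+)+$", TimeSpan.FromMilliseconds(50), 2)); // False
    }
}
```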
By using the application-wide time-out value, if one exists. This can be any time-out value that applies to the application domain in which the System.Text.RegularExpressions.Regex object is instantiated or the static method call is made. You can set the application-wide time-out value by calling the AppDomain.SetData(string, object) method to assign a TimeSpan value to the "REGEX_DEFAULT_MATCH_TIMEOUT" property.
By using the value Regex.InfiniteMatchTimeout, if no application-wide time-out value has been set.
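A minimal sketch of setting the application-wide default, assuming it runs at startup before any Regex is created, because implementations may read this value only once per process. The helper name SetDefaultTimeout is hypothetical. The stored value is a TimeSpan object; a value of another type, such as a string, is expected to be rejected with an InvalidCastException when the default is read.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AppWideTimeout
{
    // Store a TimeSpan (not a string) under the documented key.
    public static void SetDefaultTimeout(TimeSpan timeout) =>
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", timeout);

    public static void Main()
    {
        // Do this at startup, before any Regex is constructed or any static Regex method runs.
        SetDefaultTimeout(TimeSpan.FromSeconds(2));

        // No explicit time-out here, so the application-wide value should apply.
        var regex = new Regex(@"\w+");

        // Prints the time-out that applies to this instance.
        Console.WriteLine(regex.MatchTimeout);
    }
}
```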
We recommend that you set a time-out value in all regular expression pattern-matching operations. For more information, see Best Practices for Regular Expressions in the .NET Framework.