For the most part, ‘\’ followed by any character matches only that character. However, there are several exceptions: certain sequences starting with ‘\’ that have special meanings. Here is a table of the special ‘\’ constructs.
Thus, ‘foo\|bar’ matches either ‘foo’ or ‘bar’ but no other string.
‘\|’ applies to the largest possible surrounding expressions. Only a surrounding ‘\( ... \)’ grouping can limit the grouping power of ‘\|’.
If you need full backtracking capability to handle multiple uses of
‘\|’, use the POSIX regular expression functions (see POSIX Regexps).
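As a small illustration of how a grouping limits ‘\|’, the following sketch uses string-match-p on strings invented for this example. It assumes the ‘\`’ and ‘\'’ constructs to anchor the match at the start and end of the string; each ‘\’ is doubled because of Lisp string syntax.

```elisp
;; Either alternative may match.
(string-match-p "foo\\|bar" "bar")                ; ⇒ 0

;; Inside the group, \| separates all of "gra" from all of "ey",
;; so "grey" does not match.
(string-match-p "\\`\\(gra\\|ey\\)\\'" "grey")    ; ⇒ nil

;; A narrower group limits \| to the single letters "a" and "e".
(string-match-p "\\`gr\\(a\\|e\\)y\\'" "grey")    ; ⇒ 0
```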
For example, ‘c[ad]\{1,2\}r’ matches the strings ‘car’,
‘cdr’, ‘caar’, ‘cadr’, ‘cdar’, and ‘cddr’, and
nothing else.
‘\{0,1\}’ or ‘\{,1\}’ is equivalent to ‘?’.
‘\{0,\}’ or ‘\{,\}’ is equivalent to ‘*’.
‘\{1,\}’ is equivalent to ‘+’.
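The interval example above can be checked with a short sketch such as the following. The anchors ‘\`’ and ‘\'’ are added here so that only whole-string matches count, and the last two test strings are made up to show what falls outside the interval.

```elisp
;; Map each candidate string to t if the whole string matches, else nil.
(let ((re "\\`c[ad]\\{1,2\\}r\\'"))
  (mapcar (lambda (s) (and (string-match-p re s) t))
          '("car" "cdr" "caar" "cadr" "cdar" "cddr" "cr" "caaar")))
;; ⇒ (t t t t t t nil nil)

;; \{0,1\} behaves like ?: the "b" is optional.
(string-match-p "\\`ab\\{0,1\\}c\\'" "ac")        ; ⇒ 0
```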
This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that was assigned as a
second meaning to the same ‘\( ... \)’ construct because, in
practice, there was usually no conflict between the two meanings. But
occasionally there is a conflict, and that led to the introduction of
shy groups.
Shy groups are also called non-capturing or unnumbered
groups.
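A minimal sketch of the difference, assuming the ‘\(?: ... \)’ syntax for shy groups and using match-string to inspect what was recorded. The string "foobar" is only an illustration.

```elisp
;; Two ordinary groups: they are numbered 1 and 2.
(let ((s "foobar"))
  (string-match "\\(foo\\)\\(bar\\)" s)
  (list (match-string 1 s) (match-string 2 s)))
;; ⇒ ("foo" "bar")

;; The shy group is not numbered, so "bar" becomes group 1.
(let ((s "foobar"))
  (string-match "\\(?:foo\\)\\(bar\\)" s)
  (list (match-string 0 s) (match-string 1 s)))
;; ⇒ ("foobar" "bar")
```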
In other words, after the end of a group, the matcher remembers the beginning and end of the text matched by that group. Later on in the regular expression you can use ‘\’ followed by a digit to match that same text, whatever it may have been.
The strings matching the first nine grouping constructs appearing in the entire regular expression passed to a search or matching function are assigned numbers 1 through 9 in the order that the open parentheses appear in the regular expression. So you can use ‘\1’ through ‘\9’ to refer to the text matched by the corresponding grouping constructs.
For example, ‘\(.*\)\1’ matches any newline-free string that is composed of two identical halves. The ‘\(.*\)’ matches the first half, which may be anything, but the ‘\1’ that follows must match exactly the same text.
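The two-halves example can be tried as follows. This is a sketch in which ‘\`’ and ‘\'’ are added so that the whole string must consist of the two halves; the sample strings are invented.

```elisp
;; "abcabc" is "abc" twice; group 1 records the first half.
(let ((s "abcabc"))
  (and (string-match "\\`\\(.*\\)\\1\\'" s)
       (match-string 1 s)))
;; ⇒ "abc"

;; "abcabd" cannot be split into two identical halves.
(string-match-p "\\`\\(.*\\)\\1\\'" "abcabd")     ; ⇒ nil
```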
If a ‘\( ... \)’ construct matches more than once (which can happen, for instance, if it is followed by ‘*’), only the last match is recorded.
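For instance, in the following sketch the group is repeated by ‘*’ and matches once per character, yet only its final match survives. The character range and the string are chosen purely for illustration.

```elisp
;; The group matches "a", then "b", then "c"; only "c" is kept.
(let ((s "abc"))
  (string-match "\\`\\([a-c]\\)*\\'" s)
  (match-string 1 s))
;; ⇒ "c"
```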
If a particular grouping construct in the regular expression was never
matched—for instance, if it appears inside of an alternative that
wasn't used, or inside of a repetition that repeated zero times—then
the corresponding ‘\digit’ construct never matches
anything. To use an artificial example, ‘\(foo\(b*\)\|lose\)\2’
cannot match ‘lose’: the second alternative inside the larger
group matches it, but then ‘\2’ is undefined and can't match
anything. But it can match ‘foobb’, because the first
alternative matches ‘foob’ and ‘\2’ matches ‘b’.
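The artificial example above can be evaluated directly; this sketch simply feeds the same regular expression and the same two strings to string-match-p.

```elisp
(let ((re "\\(foo\\(b*\\)\\|lose\\)\\2"))
  (list (string-match-p re "lose")     ; group 2 never matched
        (string-match-p re "foobb")))  ; group 2 matched "b"
;; ⇒ (nil 0)
```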
In addition to the standard categories, you can define your own with the define-category function (see Categories).
The following regular expression constructs match the empty string—that is, they don't use up any characters—but whether they match depends on the context. For all, the beginning and end of the accessible portion of the buffer are treated as if they were the actual beginning and end of the buffer.
‘\b’ matches at the beginning or end of the buffer (or string)
regardless of what text appears next to it.
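A short sketch of this edge behavior when matching against strings. The strings are invented, and ‘\'’ is assumed here to pin the match to the end of the string.

```elisp
;; At the very start of the string, even though a space follows.
(string-match-p "\\b" " foo")          ; ⇒ 0

;; At the very end of the string, even though a space precedes.
(string-match-p " \\b\\'" "foo ")      ; ⇒ 3

;; The empty string has a match at position 0.
(string-match-p "\\b" "")              ; ⇒ 0
```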
Not every string is a valid regular expression. For example, a string
that ends inside a character alternative without a terminating ‘]’
is invalid, and so is a string that ends with a single ‘\’. If
an invalid regular expression is passed to any of the search functions,
an invalid-regexp error is signaled.
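One way to guard against this is to catch the error with condition-case. The following is only a sketch: my-regexp-valid-p is a hypothetical helper name, not a function provided by Emacs, and it tests a regular expression by trying it against the empty string.

```elisp
(defun my-regexp-valid-p (regexp)
  "Return t if REGEXP is a valid regular expression, nil otherwise.
Hypothetical helper, written for illustration only."
  (condition-case nil
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

;; A valid regexp, an unterminated character alternative,
;; and a string ending with a single backslash.
(mapcar #'my-regexp-valid-p '("[a-z]" "[a-z" "foo\\"))
;; ⇒ (t nil nil)
```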