Essential Tools Module Reference Guide

RWTRegex<T>

Module:  Essential Tools Module   Group:  String Processing Classes


Does not inherit

Synopsis

#include <rw/tools/regex.h>
RWTRegex<char>       re0(".*\\.doc");  
   // Matches filenames with suffix ".doc"
RWCRegularExpression re1("a+");       
   // Matches one or more 'a'
RWWRegularExpression re2(L"b+");       
   // Matches one or more wide-character, 'b'

Description

RWTRegex<T> is the primary template for the new regular expression interface. It implements most of the POSIX.2 standard for regular expression pattern matching and may be used for both narrow (8-bit) and wide (wchar_t) character strings.

It enhances and replaces RWCRegexp and RWCRExpr, which have been deprecated in this release.

However, if your regular expression search requires backreferences, you will need to use RWCRegexp rather than RWTRegex<T>. Backreferences are supported only in basic regular expressions (BREs), not in extended regular expressions (EREs).

RWTRegex<T> can represent both a simple and an extended regular expression such as those found in lex and awk. The constructor "compiles" the expression into a form that can be used more efficiently. The results can then be used for string searches using class RWCString. Regular expressions (REs) can be of arbitrary size, limited by memory. The extended regular expression features found here are a subset of those found in the POSIX.2 standard (ANSI/IEEE Std. 1003.2, ISO/IEC 9945-2).
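
To illustrate the compile-once, search-many pattern described above, here is a minimal sketch. It assumes std::regex with the std::regex::extended (POSIX ERE) grammar as a stand-in for RWTRegex<char>, so it shows the idea rather than the Rogue Wave API. countMatching is a hypothetical helper and belongs to neither library.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the Essential Tools Module): counts how
// many of the given names match `re` in full. The caller compiles the
// expression once and the same compiled object is reused for every name.
std::size_t countMatching(const std::regex& re,
                          const std::vector<std::string>& names)
{
    std::size_t n = 0;
    for (const std::string& name : names) {
        if (std::regex_match(name, re)) {
            ++n;
        }
    }
    return n;
}
```

A caller would construct the expression once, for example std::regex docFile(".*\\.doc", std::regex::extended), and then pass it to countMatching for each batch of filenames.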

RWTRegex<T> differs from the POSIX.2 standard in the following ways:

Constructing a regular expression

To match a single character RE

  1. Any character that is not a special character matches itself.

  2. A backslash (\) followed by any special character matches the literal character itself; that is, its use "escapes" the special character. For example, \* matches "*" without applying the syntax of the * special character.

  3. The "special characters" are: + * ? . [ ] ^ $ ( ) { } | \

  4. The period (.) matches any character. For example, ".umpty" matches either "Humpty" or "Dumpty".

  5. A set of characters enclosed in brackets ([ ]) is a one-character RE that matches any of the characters in that set. This means that [akm] matches either an "a", "k", or "m". A range of characters can be indicated with a dash, as in [a-z], which matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. For example, [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
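
The single-character rules above can be tried out with a short sketch. It assumes std::regex in std::regex::extended mode as a stand-in for RWTRegex<char>, since both follow POSIX extended syntax for these constructs. matchesWhole is a hypothetical helper introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches the POSIX
// extended regular expression `pattern`. Uses std::regex, not RWTRegex<T>.
bool matchesWhole(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

With this helper, matchesWhole(".umpty", "Humpty") and matchesWhole("[akm]", "k") are true, while matchesWhole("[^akm]", "a") and matchesWhole("[^akm]", "") are false. The pattern "\\*" (written with two backslashes in C++ source) matches only the literal "*".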

To match a multicharacter RE

  1. Parentheses (( )) group parts of regular expressions together into subexpressions that can be treated as a single unit. For example, (ha)+ matches one or more "ha"s.

  2. An asterisk (*) following a one-character RE or a parenthesized subexpression matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lower-case characters, and (ha)* matches zero or more "ha"s.

  3. A plus (+) following a one-character RE or a parenthesized subexpression matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lower-case characters, and (ha)+ matches one or more "ha"s.

  4. A question mark (?) makes the preceding RE optional: it can occur zero times or once in the string, but no more. For example, xy?z matches either "xyz" or "xz".

  5. The concatenation of REs is a RE that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.

  6. The OR character ( | ) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly" or "jellies".

  7. Braces ({ }) following a one-character RE match the preceding element the number of times indicated. For example, a{2,3} matches either "aa" or "aaa".
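
As a quick check of the multicharacter rules above, the following sketch runs the same example patterns through std::regex in std::regex::extended mode. This is an assumed stand-in for RWTRegex<char>, not the Rogue Wave API, and fullMatch is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text`, taken as a whole, matches the
// POSIX extended regular expression `pattern` (std::regex stand-in).
bool fullMatch(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

For instance, fullMatch("(ha)+", "hahaha"), fullMatch("xy?z", "xz"), fullMatch("jell(y|ies)", "jellies") and fullMatch("a{2,3}", "aaa") all hold, while fullMatch("(ha)+", ""), fullMatch("xy?z", "xyyz") and fullMatch("a{2,3}", "aaaa") do not.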

All or part of the regular expression can be "anchored" to either the beginning or end of the string being searched.

  1. If the caret (^) is at the beginning of the (sub)expression, then the matched string must be at the beginning of the string being searched. For example, "^hat" matches the "hat" at the start of "hatbox" but does not match anything in "that".

  2. If the dollar sign ($) is at the end of the (sub)expression, then the matched string must be at the end of the string being searched. For example, "know$" would match "I know what I know" but not "He knows what he knows."
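
The effect of the two anchors can be sketched as follows. The code assumes std::regex in std::regex::extended mode as a stand-in for RWTRegex<char>, and foundIn is a hypothetical helper. Unlike a whole-string match, it searches for the pattern anywhere in the text, so only the anchors restrict where a match may sit.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` (POSIX extended syntax) matches
// somewhere in `text`, subject to any ^ or $ anchors in the pattern.
bool foundIn(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}
```

Here foundIn("hat", "that") is true but foundIn("^hat", "that") is false, and foundIn("know$", "I know what I know") is true while foundIn("know$", "He knows what he knows.") is false.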

Overriding the backslash special character

A common pitfall with regular expression classes is overriding the backslash special character (\). The C++ compiler and the regular expression constructor both assume that any backslash they see is intended to escape the following character. Thus, to specify a regular expression that exactly matches "a\a", you need four backslashes: the regular expression parser needs to see "a\\a", and for that to happen, the compiler has to see "a\\\\a".

In "a\\\\a", the first and third backslashes are escapes for the compiler, so the second and fourth are the ones seen by the regular expression parser. At that point, the first of the two remaining backslashes is an escape, and the second is actually put into the regular expression.

Similarly, if you really need to escape a character, such as a ".", you have to pass two backslashes to the compiler, as in "\\.".

Once again, the first backslash is an escape for the compiler, and the second is seen by the regular expression constructor as an escape for the following ".".
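
The two levels of escaping can be checked with a small sketch. It assumes std::regex in std::regex::extended mode as a stand-in for RWTRegex<char>; the names escapedLiteral, rawLiteral and matchesWhole are hypothetical. The raw string literal is a C++11 feature that the original text does not mention. It appears here only as one way to skip the compiler-level escape.

```cpp
#include <regex>
#include <string>

// Ordinary literal: the compiler consumes every other backslash, so the
// regex parser receives the four characters  a \ \ a .
const std::string escapedLiteral = "a\\\\a";

// C++11 raw string literal: no compiler-level escaping, so the pattern is
// written exactly as the regex parser should see it.
const std::string rawLiteral = R"(a\\a)";

// Hypothetical helper: whole-string match using POSIX extended syntax
// (std::regex stand-in, not the Rogue Wave API).
bool matchesWhole(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

Both spellings produce the same pattern, which matches the three-character text a\a (written "a\\a" in C++ source). Likewise, matchesWhole("\\.", ".") is true and matchesWhole("\\.", "x") is false, whereas the unescaped pattern "." matches "x".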

Persistence

None

Example

Program output:

Related classes

Related classes include:

Public Typedefs

typedef RW_TYPENAME RWTRegexTraits<T>::Char RChar;
typedef RWTRegex<char> RWCRegularExpression;
typedef RWTRegexMatchIterator<RChar> iterator;
typedef RWTRegexMatchIterator<RChar> match_iterator;


NOTE -- RWTRegex<T> provides both match_iterator and iterator; RWTRegex::iterator is a match iterator. If you need to add new iterator types, you must give them a descriptive prefix, as in match_iterator.

typedef std::basic_string<RChar> RString;

Global typedefs

typedef RWTRegex<char> RWCRegularExpression;
typedef RWTRegex<wchar_t> RWWRegularExpression;

Public Constructors

RWTRegex(const RChar* str, size_t length = size_t(-1));
RWTRegex(const std::basic_string<RChar>& str, size_t length = size_t(-1));
RWTRegex(const RWTRegex& source);
RWTRegex();

Public Destructor

virtual ~RWTRegex();

Assignment Operators

RWTRegex&
operator= (const RWTRegex& rhs);
bool
operator< (const RWTRegex& rhs) const;
bool
operator==(const RWTRegex& rhs) const;

Enumeration

enum RWTRegexStatus

Public Member Functions

const RWRegexErr&
getStatus() const;
size_t
index(const RChar*   str, 
     size_t* mLen   = 0,
     size_t start  = size_t(0),
     size_t length = size_t(-1));
size_t
index(const RString& str, 
     size_t* mLen   = 0,
     size_t  start  = size_t(0),
     size_t  length = size_t(-1));
RWTRegexResult<T>
matchAt(const RChar* str,
     size_t start  = size_t(0),
     size_t length = size_t(-1));
RWTRegexResult<T>
matchAt(const RString& str, 
     size_t start  = size_t(0),
     size_t length = size_t(-1));
size_t
replace(RString& str, const RString& replacement,
     size_t count = 1,
     size_t matchID = 0,
     size_t start = size_t(0),
     size_t length = size_t(-1),
     bool replaceEmptyMatches = true);
RWTRegexResult<T>
search(const RChar* str, size_t start  = size_t(0),
     size_t length = size_t(-1));
RWTRegexResult<T>
search(const RString& str, size_t start = size_t(0),
     size_t length = size_t(-1));
size_t
subCount() const;


© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.
Contact Rogue Wave about documentation or support issues.