Pattern Matching
Class RWCString supports a convenient interface for string searches. In the example below, the code fragment:
 
RWCString s("curiouser and curiouser.");
size_t i = s.index("curious");
will find the start of the first occurrence of curious in s. The comparison will be case sensitive, and the result will be that i is set to 0. To find the index of the next occurrence, you would use:
i = s.index("curious", ++i);
which will result in i set to 14. You can make a case-insensitive comparison with:
 
RWCString s("Curiouser and curiouser.");
size_t i = s.index("curious", 0, RWCString::ignoreCase);
which will also result in i set to 0.
If the pattern does not occur in the string, the index() method returns the special value RW_NPOS.
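The "search, then resume past the last hit" idiom described above can be sketched with the C++ standard library. This is only an analogue, not RWCString itself: std::string::find stands in for index(), std::string::npos stands in for RW_NPOS, and the helper name find_all is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of SourcePro): collect the start index of
// every occurrence of `pattern` in `text`. std::string::find is used here as
// a stand-in for RWCString::index(), and std::string::npos for RW_NPOS.
std::vector<std::size_t> find_all(const std::string& text,
                                  const std::string& pattern)
{
    std::vector<std::size_t> hits;
    if (pattern.empty())
        return hits;                        // an empty pattern matches everywhere; skip it

    std::size_t i = text.find(pattern);     // first occurrence, or npos
    while (i != std::string::npos) {
        assert(i + pattern.size() <= text.size());  // a hit always lies inside text
        hits.push_back(i);
        i = text.find(pattern, i + 1);      // resume one past the previous hit
    }
    return hits;
}
```

Called as find_all("curiouser and curiouser.", "curious"), the helper visits the same two positions as the index() calls above, 0 and 14, and stops when the search returns the "not found" value.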
Regular Expressions
As part of its pattern-matching capability, the Essential Tools Module supports simple and extended regular expression searches through its new RWTRegex<T> interface. Using RWTRegex<T> gives you access to wchar_t support, {m,n} cardinality constraints, and improved performance.
Extended regular expressions are the regular expressions used in the UNIX utilities lex and awk. Extended regular expressions can be any length, limited only by available memory. You will find details of the regular expression syntax in the SourcePro C++ API Reference Guide under RWTRegex<T>.
RWTRegex<T> is based on the POSIX.2 standard for regular expressions. POSIX.2 includes notations for basic regular expressions (BREs) and extended regular expressions (EREs). RWTRegex<T> is based on the ERE standard to support the wide character searches required by many non-Latin languages.
NOTE >> If your regular expression search requires the usage of backreferences, you will need to use RWCRegexp, rather than RWTRegex<T>.
To offer backward compatibility with RWCRegexp regular expression syntax, the syntax supported by RWTRegex<T> differs slightly from the POSIX standard. For details, see the SourcePro C++ API Reference Guide under RWTRegex<T>.
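To give a feel for POSIX ERE syntax and the {m,n} cardinality constraint, here is a minimal sketch that uses std::regex with the std::regex::extended flag, which selects the POSIX ERE grammar. It is a standard-library analogue, not RWTRegex<T>, whose supported syntax differs slightly from POSIX as noted above. The function name parse_date and the date format are assumptions made for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): split text such as
// "2024-1-05" into three numeric fields, using {m,n} repetition counts.
// Returns false and leaves the outputs untouched if the text does not match.
bool parse_date(const std::string& text, int& year, int& month, int& day)
{
    // std::regex::extended selects the POSIX ERE grammar.
    // {4} means exactly four digits; {1,2} means one or two digits.
    static const std::regex re("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})",
                               std::regex::extended);
    std::smatch m;
    if (!std::regex_match(text, m, re))    // the whole string must match
        return false;

    assert(m.size() == 4);                 // overall match plus 3 sub-expressions
    year  = std::stoi(m[1].str());
    month = std::stoi(m[2].str());
    day   = std::stoi(m[3].str());
    return true;
}
```

With these bounds, "2024-1-05" is accepted, while "24-1-05" (too few year digits) and "2024-123-05" (too many month digits) are rejected.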
The RWTRegex<T> Interface
RWTRegex<T> can perform both simple and extended regular expression searches. The interface includes five primary classes:
RWTRegex<T> is the primary template for all instantiations on any type of character. Specializations of this template for use with characters of type char and wchar_t are also provided.
RWTRegexMatchIterator<T> iterates over all matches of a regular expression in a given string.
RWTRegexTraits<T> defines the character traits for regular expression characters of a specified type.
RWTRegexResult<T> encapsulates the result of a pattern matching operation. It stores the starting offset and length of the overall match, as well as all sub-expression matches.
RWRegexErr is used to report pattern compilation errors.
NOTE >> A previous regular expression class RWCRegexp is now deprecated. However, if your program requires the use of backreferences, you must use RWCRegexp, as RWTRegex<T> does not include that functionality. (Backreferences allow you to match new text with previously-matched text in a regular expression.)
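The division of labor among these classes (a compiled pattern, an iterator over matches, and a result holding offset and length) has a close parallel in the C++ standard library. The sketch below uses std::regex, std::sregex_iterator, and std::smatch purely as an analogue; it does not use the RWTRegex<T> API. The Span struct and the all_matches function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record: the starting offset and length of one overall match,
// the same two facts the text says a match result stores.
struct Span {
    std::size_t start;
    std::size_t length;
};

// Hypothetical helper (not part of SourcePro): visit every non-overlapping
// match of `re` in `text`, in order, and record where each one lies.
std::vector<Span> all_matches(const std::string& text, const std::regex& re)
{
    std::vector<Span> spans;
    const std::sregex_iterator end;   // a default-constructed iterator marks the end
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        assert(!it->empty());         // a dereferenceable iterator holds a successful match
        spans.push_back({static_cast<std::size_t>(it->position(0)),
                         static_cast<std::size_t>(it->length(0))});
    }
    return spans;
}
```

For the text "WM_CREATE then WM_PAINT" and the pattern WM_[A-Z]+, the helper records two spans: offset 0 with length 9, and offset 15 with length 8.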
Using Regular Expressions Efficiently
The results of performance tests show that the new RWTRegex<T> interface performs matching operations approximately seven to eight times faster than did the previous interface. Compiling the original pattern is, however, slightly slower.
To maximize efficiency when pattern matching, first instantiate RWTRegex<T> to compile the pattern once. Then perform any repeated matching operations.
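The compile-once, match-many advice can be sketched as follows. The example uses std::regex from the C++ standard library as a stand-in for RWTRegex<T>, so the cost figures quoted above do not apply to it; only the structure carries over. The function name count_matching_lines is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of SourcePro): count the lines that contain
// at least one match of `pattern`.
std::size_t count_matching_lines(const std::vector<std::string>& lines,
                                 const std::string& pattern)
{
    // Compile the pattern once, outside the loop.
    // Construction throws std::regex_error if the pattern is malformed.
    const std::regex re(pattern);

    std::size_t n = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, re))   // reuse the compiled pattern
            ++n;
    }
    assert(n <= lines.size());
    return n;
}
```

The point of the design is that the regex object is constructed before the loop rather than inside it, so the compilation cost is paid once no matter how many strings are searched.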
Introductory Examples Using Regular Expressions
You can use a regular expression to return a substring; for example, here's how you might match all Windows messages (prefix WM_):
#include <rw/cstring.h>
#include <rw/tools/regex.h> // Get access to RWTRegex<T>
#include <iostream>
using std::cout;
using std::endl;
 
int main()
{
    RWCString a("A message named WM_CREATE");
    // Construct a regular expression to match Windows messages:
    RWTRegex<char> re("WM_[A-Z]*");
    // Search for the first match; the result tests true on success
    RWTRegexResult<char> result = re.search(a);
    if (result)
        cout << result.subString(a) << endl;
    else
        cout << "No match in " << a << endl;
    return 0;
}
Program Output:
WM_CREATE
The search() method on RWTRegex<T> determines whether there is a match. The RWTRegexResult<T>::subString() method then obtains the matched string. The following example shows some of the capabilities of extended regular expressions:
 
#include <rw/cstring.h>
#include <rw/tools/regex.h> // Get access to regular expressions
#include <iostream>
using std::cout;
using std::endl;
 
int main()
{
    RWTRegex<char> re("Lisa|Betty|Eliza");
    RWCString s("Betty II is the Queen of England.");
    // Replace first occurrence of "Lisa" or "Betty" or "Eliza"
    // with "Elizabeth"
    re.replace(s, "Elizabeth");
    cout << s << endl;
 
    s = "Leg Leg Hurrah!";
    re = RWTRegex<char>("Leg");
 
    // Replace all occurrences of "Leg" with "Hip"
    // (a count of 0 means replace every match)
    re.replace(s, "Hip", 0);
    cout << s << endl;
    return 0;
}
Program Output:
Elizabeth II is the Queen of England.
Hip Hip Hurrah!
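The count convention seen in the example above (the default replaces the first match, and 0 replaces all matches) can be imitated with the C++ standard library. The sketch below is an analogue built on std::regex and std::sregex_iterator, not the RWTRegex<T> implementation. The function name replace_n and its return value (the number of replacements made) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of SourcePro): replace up to `count` matches
// of `re` in `s` with the literal text `with`. A count of 0 means "replace
// every match". Returns the number of replacements made.
std::size_t replace_n(std::string& s, const std::regex& re,
                      const std::string& with, std::size_t count = 1)
{
    std::string out;
    std::size_t done = 0;
    std::size_t last = 0;   // offset in s just past the previous match
    const std::sregex_iterator end;
    for (std::sregex_iterator it(s.begin(), s.end(), re); it != end; ++it) {
        if (count != 0 && done == count)
            break;                                   // replacement budget used up
        const std::size_t pos = static_cast<std::size_t>(it->position(0));
        out.append(s, last, pos - last);             // copy text before this match
        out += with;                                 // insert the replacement
        last = pos + static_cast<std::size_t>(it->length(0));
        ++done;
    }
    out.append(s, last, std::string::npos);          // copy the unmatched tail
    assert(count == 0 || done <= count);
    s = out;
    return done;
}
```

For the two common cases, std::regex_replace already does the job: called with std::regex_constants::format_first_only it replaces only the first match, and called without that flag it replaces all of them. Unlike this helper, it also interprets $ sequences in the replacement text.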