Internationalization Module Reference Guide

RWURegularExpression

Module:  Internationalization Module   Group:  Unicode String Processing


Does Not Inherit

Header File

#include <rw/i18n/RWURegularExpression.h> 

Description

RWURegularExpression supports regular expressions with Unicode extensions.

A regular expression is a string pattern composed of normal characters and special characters. Special characters are used to denote an arrangement of the other characters in the regular expression pattern. A regular expression can be used to search for, and perhaps replace, occurrences of the regular expression pattern in strings.

Regular expression syntax describes how to arrange normal characters and special characters to form a valid regular expression pattern. The regular expression syntax for RWURegularExpression is similar to that of the POSIX 2 extended regular expression (ERE) specification. For more information see Section 8.4.2, "POSIX Extended Regular Expression Syntax," in the Internationalization Module User's Guide.
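As a rough illustration of searching for and replacing a pattern, the sketch below uses the C++ standard library's std::regex with its POSIX ERE grammar (std::regex::extended). It is a standard-library analogue only: it does not use RWURegularExpression, it provides none of the Unicode extensions described on this page, and the helper names redactDigits and containsDigits are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every run of decimal digits with '#'.
// Uses std::regex with the POSIX ERE grammar, not RWURegularExpression.
std::string redactDigits(const std::string& text) {
    // "[0-9]+" combines a bracket expression with the special character
    // '+', which means "one or more of the preceding item".
    static const std::regex digits("[0-9]+", std::regex::extended);
    return std::regex_replace(text, digits, "#");
}

// Hypothetical helper: report whether the pattern occurs anywhere in text.
bool containsDigits(const std::string& text) {
    static const std::regex digits("[0-9]+", std::regex::extended);
    return std::regex_search(text, digits);
}
```

With these helpers, redactDigits("room 42, floor 7") yields "room #, floor #", and containsDigits("no numbers") yields false.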

RWURegularExpression extends the POSIX 2 ERE syntax to provide support for Unicode basic and tailored regular expressions.

Basic Unicode regular expression support corresponds to Level 1 support, as described in the Unicode Regular Expression Guidelines (Unicode Technical Report #18 (UTR-18) Version 5.1 at http://www.unicode.org/reports/tr18/tr18-5.1.html). Basic Unicode regular expressions are useful for the majority of Unicode strings, and extend the POSIX ERE standard with the following Unicode extensions:

Tailored Unicode regular expressions extend the basic regular expression functionality, corresponding to Level 2 and Level 3 support, also described in UTR-18 Version 5.1. In addition to some minor additions, tailored extensions include support for:

For more information on basic and tailored regular expression support in the Internationalization Module, see Section 8.4.3, "Unicode Regular Expressions," in the Internationalization Module User's Guide.

The Role of the Locale in a Regular Expression

RWURegularExpression accepts an RWULocale argument in its constructor or through the setLocale() method. The regular expression instance uses the locale to determine locale-specific behavior in a tailored regular expression; locales have little effect on basic regular expressions. Grapheme clusters, character sets, and the break locations for words, sentences, and lines may change depending on locale. For example, the Spanish character "ch" is found in the character set "[b-d]" in Spanish locales, but not in English locales.

For more information on creating regular expressions, see Section 8.4.4, "How to Create an RWURegularExpression," in the Internationalization Module User's Guide.

Example

Related Classes

RWUStringSearch

Public Enums

enum Options { Normal,
               IgnoreCase,
               InterpretGraphemes
};
enum Status { Ok,
              MissingEscapeSequence,
              InvalidHexNibble,
              InsufficientHex8Data,
              InsufficientHex16Data,
              MissingClosingBracket,
              MissingClosingCurlyBrace,
              MissingClosingParen,
              UnmatchedClosingParen,
              InvalidSubexpression,
              InvalidDataAfterOr,
              InvalidDataBeforeOr,
              ConsecutiveCardinalities,
              InvalidCardinalityRange,
              LeadingCardinality,
              InvalidDecimalDigit,
              UnmatchedClosingCurly,
              NeverEndingCategoryName,
              InvalidCategoryName,
              InfiniteEmptyMatch,
              ASCIIConversionError,
              InvalidGraphemeCluster,
              NumberOfStatusCodes
};
enum UnicodeConformanceLevel { Basic, Tailored };

Public Constructors

RWURegularExpression();
explicit RWURegularExpression(const char* pattern,
           UnicodeConformanceLevel level = Basic,
           int32_t options = int32_t(Normal),
           const RWULocale& locale = RWULocale::getDefault(),
           RWUToUnicodeConverter& converter =
             RWUToUnicodeConversionContext::getContext().getConverter());
explicit RWURegularExpression(const RWCString& pattern,
           UnicodeConformanceLevel level = Basic,
           int32_t options = int32_t(Normal),
           const RWULocale& locale = RWULocale::getDefault(),
           RWUToUnicodeConverter& converter =
             RWUToUnicodeConversionContext::getContext().getConverter());
explicit RWURegularExpression(const RWUString& pattern,
           UnicodeConformanceLevel level = Basic,
           int32_t options = int32_t(Normal),
           const RWULocale& locale = RWULocale::getDefault());
RWURegularExpression(const RWURegularExpression& source);

Public Destructor

~RWURegularExpression();

Public Member Operators

RWURegularExpression&
operator=(const RWURegularExpression& rhs);
bool
operator<(const RWURegularExpression& rhs);
bool
operator==(const RWURegularExpression& rhs);

Public Member Functions

RWUCollator::CollationStrength
getCollationStrength() const;
UnicodeConformanceLevel
getLevel() const;
RWULocale
getLocale() const;
int32_t
getOptions() const;
RWUString
getPattern() const;
RWURegexResult
matchAt(const RWUString& str) const;
RWURegexResult
matchAt(const RWUString& str,
        const RWUConstStringIterator& start) const;
RWURegexResult
matchAt(const RWUString& str,
        const RWUConstStringIterator& start,
        const RWUConstStringIterator& end) const;
size_t
replace(RWUString& str,
        const RWUString& replacement,
        size_t count = size_t(1),
        int32_t matchID = 0) const;
size_t
replace(RWUString& str, 
        const RWUString& replacement, size_t count,
        int32_t matchID,
        const RWUConstStringIterator& start) const;
size_t
replace(RWUString& str, 
        const RWUString& replacement, size_t count,
        int32_t matchID, const RWUConstStringIterator& start,
        const RWUConstStringIterator& end,
        bool replaceEmptyMatches = true) const;
RWURegexResult 
search(const RWUString& str) const;
RWURegexResult
search(const RWUString& str,
       const RWUConstStringIterator& start) const;
RWURegexResult
search(const RWUString& str, 
       const RWUConstStringIterator& start,
       const RWUConstStringIterator& end) const;
void
setCollationStrength(RWUCollator::CollationStrength);
void
setLevel(UnicodeConformanceLevel level = Basic);

NOTE -- When setLevel() is called, the regular expression pattern is recompiled into a form that more efficiently allows for the specified level of Unicode support.
void
setLocale(const RWULocale& loc);
size_t
subCount() const;
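
The declarations above do not spell out how matchAt() differs from search(). Assuming, as the names suggest, that search() looks for the first match anywhere in the given range while matchAt() accepts only a match that begins exactly at the start position, the distinction can be sketched with the C++ standard library. This is an analogue under that assumption, not Rogue Wave code; the helper names searchOffset and matchLengthAt are hypothetical, and the sketch works on byte strings rather than Unicode strings.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical analogue of search(): offset of the first match anywhere
// in text, or -1 when there is no match.
long searchOffset(const std::string& text, const std::regex& re) {
    std::smatch m;
    if (!std::regex_search(text, m, re)) return -1;
    return static_cast<long>(m.position(0));
}

// Hypothetical analogue of matchAt(): length of the match that begins
// exactly at offset start, or -1 when no match begins there.
long matchLengthAt(const std::string& text, const std::regex& re,
                   std::size_t start) {
    if (start > text.size()) return -1;
    std::smatch m;
    const auto first =
        text.begin() + static_cast<std::string::difference_type>(start);
    // match_continuous only accepts matches that begin at 'first'.
    if (!std::regex_search(first, text.end(), m, re,
                           std::regex_constants::match_continuous)) {
        return -1;
    }
    return static_cast<long>(m.length(0));
}
```

For the pattern "[0-9]+" and the text "ab123cd", searchOffset returns 2, matchLengthAt returns -1 at offset 0, and matchLengthAt returns 3 at offset 2.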


© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.