General Character Categories
Every Unicode character is also assigned to a general character category in the Unicode Character Database. RWUCharTraits provides a GeneralCategory enum with values that identify the various categories, such as UppercaseLetter, LowercaseLetter, DecimalDigitNumber, LineSeparator, ConnectorPunctuation, and so on. (See the documentation for RWUCharTraits in the SourcePro API Reference Guide for a complete list of enumerated values.) The values in this enumeration correspond to the general category property codes that appear in the Unicode Character Database, as described in:
http://www.unicode.org/reports/tr44/
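As an illustration of this correspondence, the following sketch pairs the category names mentioned above with their two-letter general category codes from the Unicode Character Database. The namespace ucd_sketch, the enum declared in it, and the function ucdCode() are hypothetical stand-ins written for this example; they are not part of RWUCharTraits, and the actual enumeration contains many more values.

```cpp
namespace ucd_sketch {

// Hypothetical stand-in, NOT the SourcePro declaration: only the five
// category names quoted in the text are listed here.
enum class GeneralCategory {
    UppercaseLetter,
    LowercaseLetter,
    DecimalDigitNumber,
    LineSeparator,
    ConnectorPunctuation
};

// Returns the two-letter Unicode Character Database code for a category.
// ucdCode() is an illustrative helper, not a SourcePro function.
constexpr const char* ucdCode(GeneralCategory c) {
    switch (c) {
        case GeneralCategory::UppercaseLetter:      return "Lu";
        case GeneralCategory::LowercaseLetter:      return "Ll";
        case GeneralCategory::DecimalDigitNumber:   return "Nd";
        case GeneralCategory::LineSeparator:        return "Zl";
        case GeneralCategory::ConnectorPunctuation: return "Pc";
    }
    return "";  // unreachable for the values listed above
}

}  // namespace ucd_sketch
```

In each code, the first letter names the major class (L for letter, N for number, Z for separator, P for punctuation) and the second letter names the subclass.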
The static method RWUCharTraits::getGeneralCategory() returns the GeneralCategory value that identifies the general character category of a given code point. Several convenience methods are also provided, each of which returns true if a given RWUChar32 represents a code point in a particular character category: RWUCharTraits::isControl(), RWUCharTraits::isError(), RWUCharTraits::isLetter(), RWUCharTraits::isPunctuation(), RWUCharTraits::isSpace(), and RWUCharTraits::isWhitespace(). The static method RWUCharTraits::getWhitespace() returns a null-terminated array of whitespace code points, which is convenient to use as a set of delimiters (see “Tokenizing”).
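The relationship between a category lookup and a convenience predicate can be sketched without the library. The example below is a self-contained toy, not the SourcePro implementation: the namespace category_sketch, the CodePoint alias (standing in for RWUChar32), the Category enum, and both functions are assumptions made for illustration, and the lookup covers only a few code points rather than the full Unicode Character Database.

```cpp
namespace category_sketch {

// Hypothetical stand-in for a 32-bit code point type such as RWUChar32.
using CodePoint = char32_t;

// Hypothetical subset of general categories, plus a marker for code
// points that this toy lookup does not cover.
enum class Category {
    UppercaseLetter,
    LowercaseLetter,
    DecimalDigitNumber,
    LineSeparator,
    ConnectorPunctuation,
    Unhandled
};

// Toy lookup: handles ASCII letters and digits, '_' (U+005F), and
// LINE SEPARATOR (U+2028). A real implementation consults the full
// Unicode Character Database.
constexpr Category getGeneralCategory(CodePoint c) {
    if (c >= U'A' && c <= U'Z') return Category::UppercaseLetter;
    if (c >= U'a' && c <= U'z') return Category::LowercaseLetter;
    if (c >= U'0' && c <= U'9') return Category::DecimalDigitNumber;
    if (c == U'_')              return Category::ConnectorPunctuation;
    if (c == 0x2028)            return Category::LineSeparator;
    return Category::Unhandled;
}

// Convenience predicate built on the lookup. Only two letter
// categories exist in this toy table; Unicode defines others as well.
constexpr bool isLetter(CodePoint c) {
    return getGeneralCategory(c) == Category::UppercaseLetter
        || getGeneralCategory(c) == Category::LowercaseLetter;
}

}  // namespace category_sketch
```

With the library itself, the corresponding calls would be the static methods named above, such as RWUCharTraits::getGeneralCategory() and RWUCharTraits::isLetter(); see the SourcePro API Reference Guide for their exact signatures.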