General Character Categories
Every Unicode character is also assigned to a general character category in the Unicode Character Database. RWUCharTraits provides a GeneralCategory enum whose values identify the categories, for example UppercaseLetter, LowercaseLetter, DecimalDigitNumber, LineSeparator, and ConnectorPunctuation. (See the documentation for RWUCharTraits in the SourcePro C++ API Reference Guide for a complete list of enumerated values.) The values in this enumeration correspond to the general category property codes that appear in the Unicode Character Database, as described in Unicode Standard Annex #44:
http://www.unicode.org/reports/tr44/
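To make the correspondence concrete, the following minimal sketch pairs the five enumerators named above with their two-letter General_Category abbreviations from the Unicode Character Database. The sketch namespace, the enum declared in it, and the categoryCode() helper are hypothetical stand-ins written for illustration; they are not part of the Internationalization Module, and the real RWUCharTraits::GeneralCategory enumeration has many more values.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in that mirrors only the five enumerator names quoted in
// the text. It is NOT the library's RWUCharTraits::GeneralCategory.
enum GeneralCategory {
    UppercaseLetter,
    LowercaseLetter,
    DecimalDigitNumber,
    LineSeparator,
    ConnectorPunctuation
};

// Hypothetical helper: returns the two-letter abbreviation that the Unicode
// Character Database uses for the given category.
inline std::string categoryCode(GeneralCategory category) {
    switch (category) {
        case UppercaseLetter:      return "Lu";
        case LowercaseLetter:      return "Ll";
        case DecimalDigitNumber:   return "Nd";
        case LineSeparator:        return "Zl";
        case ConnectorPunctuation: return "Pc";
    }
    assert(false && "unhandled category");
    return "";
}

} // namespace sketch
```

For instance, U+0041 (A) has category Lu, U+0037 (7) has Nd, U+005F (_) has Pc, and U+2028 has Zl.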
The static method RWUCharTraits::getGeneralCategory() returns the GeneralCategory value that identifies the general character category of a given code point. Several convenience methods return true if a given RWUChar32 represents a code point in a particular character category: RWUCharTraits::isControl(), RWUCharTraits::isError(), RWUCharTraits::isLetter(), RWUCharTraits::isPunctuation(), RWUCharTraits::isSpace(), and RWUCharTraits::isWhitespace(). The static method getWhitespace() returns a null-terminated array of whitespace code points, which is convenient to pass as a set of delimiters (see “Tokenizing”).
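The delimiter idiom can be sketched without the library. The example below is a standalone illustration that assumes nothing about the module beyond what is stated above: the sketch namespace, the CodePoint alias (standing in for RWUChar32), and the functions whitespaceDelimiters(), isDelimiter(), and split() are all hypothetical. The delimiter array holds a few well-known whitespace code points chosen for the example, and is not claimed to match what getWhitespace() actually returns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

using CodePoint = std::uint32_t;  // hypothetical stand-in for RWUChar32

// Hypothetical null-terminated array of delimiter code points. The contents
// are an assumption for this example: tab, line feed, carriage return, space,
// and U+2028 LINE SEPARATOR, followed by the terminating 0.
inline const CodePoint* whitespaceDelimiters() {
    static const CodePoint delimiters[] = {
        0x0009, 0x000A, 0x000D, 0x0020, 0x2028, 0
    };
    return delimiters;
}

// Walks the array until the terminating 0 and reports whether cp is in it.
inline bool isDelimiter(CodePoint cp, const CodePoint* delimiters) {
    assert(delimiters != nullptr);
    for (; *delimiters != 0; ++delimiters) {
        if (*delimiters == cp) {
            return true;
        }
    }
    return false;
}

// Splits text at delimiter code points; runs of delimiters yield no empty tokens.
inline std::vector<std::u32string> split(const std::u32string& text,
                                         const CodePoint* delimiters) {
    std::vector<std::u32string> tokens;
    std::u32string current;
    for (char32_t ch : text) {
        if (isDelimiter(static_cast<CodePoint>(ch), delimiters)) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

} // namespace sketch
```

Calling sketch::split(U"one  two\u2028three", sketch::whitespaceDelimiters()) yields the three tokens one, two, and three. Because the array is null-terminated, the caller does not need to pass a separate length, which is the convenience the delimiter array offers.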