RWBasicUString and RWCString
RWBasicUString is similar to RWCString. For example:
• Both classes have methods append(), prepend(), insert(), remove(), and replace() for modifying a string.
• Both classes also have methods first(), last(), index(), rindex(), and contains() that search for characters or strings of characters contained within a string.
• Both classes have compareTo() methods for lexically ordering strings.
RWBasicUString differs from RWCString in that an RWBasicUString instance contains a series of Unicode characters encoded in UTF-16, while an RWCString instance contains bytes encoded in an arbitrary encoding. RWBasicUString also performs conversion between UTF-16 and UTF-8. Because RWBasicUString contains UTF-16, its API has some methods that RWCString does not. For example:
• Methods requiresSurrogatePair(), isHighSurrogate(), and isLowSurrogate() support surrogate handling. requiresSurrogatePair() indicates whether a 21-bit Unicode code point requires a surrogate pair of UTF-16 code units, while isHighSurrogate() and isLowSurrogate() indicate whether a value falls in the high or low surrogate range. Most characters can be represented in the UTF-16 encoding form with a single 16-bit code unit. Only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair of two UTF-16 code units.
• Method computeCodePointValue() returns the appropriate RWUChar32 code point given a surrogate pair of RWUChar16 code units.
• Methods highSurrogate() and lowSurrogate() return the first and second surrogate RWUChar16 code units for a given RWUChar32 code point.
• Methods compareCodeUnits() and compareCodePoints() perform code unit and code point ordering of strings, respectively. Code unit ordering of two strings may differ from code point ordering if either string contains surrogate pairs.
• Methods codeUnitLength() and codePointLength() return the number of code units or code points in a string. The standard length() method is equivalent to codeUnitLength().
• Method toUtf8() returns an RWCString containing a UTF-8 representation of the string.
• Method toUtf32() returns a std::basic_string templatized on RWUChar32 containing a UTF-32 representation of the string.
• Method toWide() returns an RWWString containing a UTF-16 or UTF-32 representation of the contents of the string. The representation depends on the size of wchar_t. If sizeof(wchar_t) is 2, the RWWString is encoded in UTF-16. If sizeof(wchar_t) is 4, the RWWString is encoded in UTF-32.
• Method validateCodePoint() throws an RWConversionErr if a given RWUChar32 code point is not a valid Unicode character, or returns the code point if it is valid. This method can be used to validate a code point value anywhere one is passed to a method.
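The surrogate arithmetic behind several of these methods can be illustrated with a small standalone sketch. It uses only the C++ standard library, not RWBasicUString itself: the function names mirror the methods listed above for readability, but the signatures, the CodeUnit and CodePoint typedefs (stand-ins for RWUChar16 and RWUChar32), and the use of std::u16string are assumptions made for illustration. The formulas are the standard UTF-16 encoding rules, not the library's source.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for RWUChar16 and RWUChar32.
typedef std::uint16_t CodeUnit;
typedef std::uint32_t CodePoint;

// True only for supplementary code points, 0x10000 through 0x10FFFF.
constexpr bool requiresSurrogatePair(CodePoint cp) {
    return cp >= 0x10000 && cp <= 0x10FFFF;
}
constexpr bool isHighSurrogate(CodeUnit u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool isLowSurrogate(CodeUnit u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Split a supplementary code point into its two UTF-16 code units.
constexpr CodeUnit highSurrogate(CodePoint cp) {
    return static_cast<CodeUnit>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr CodeUnit lowSurrogate(CodePoint cp) {
    return static_cast<CodeUnit>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// Recombine a surrogate pair into the code point it encodes.
constexpr CodePoint computeCodePointValue(CodeUnit hi, CodeUnit lo) {
    return 0x10000 + ((static_cast<CodePoint>(hi) - 0xD800) << 10)
                   + (static_cast<CodePoint>(lo) - 0xDC00);
}

// Decode UTF-16 to UTF-32; unpaired surrogates are copied through unchanged.
std::u32string toUtf32(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            out.push_back(static_cast<char32_t>(computeCodePointValue(s[i], s[i + 1])));
            ++i;  // skip the low surrogate already consumed
        } else {
            out.push_back(static_cast<char32_t>(s[i]));
        }
    }
    return out;
}

// Number of code points; the number of code units is simply s.size().
std::size_t codePointLength(const std::u16string& s) { return toUtf32(s).size(); }

// Both comparisons return a negative, zero, or positive value.
int compareCodeUnits(const std::u16string& a, const std::u16string& b) {
    return a.compare(b);
}
int compareCodePoints(const std::u16string& a, const std::u16string& b) {
    return toUtf32(a).compare(toUtf32(b));
}
```

With these definitions, the code point 0x1F600 splits into the code units 0xD83D and 0xDE00, and recombining them yields 0x1F600 again. The two orderings diverge for a pair such as U+FF5E and U+10000: in code unit order U+FF5E sorts after U+10000, because 0xFF5E is greater than the leading surrogate 0xD800, while in code point order it sorts before.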