Chapter 3 Character and String Processing

Overview

As described in “The Unicode Standard”, the Internationalization Module uses the UTF-16 character encoding form for the internal representation and manipulation of multilingual text. In UTF-16, each 21-bit Unicode code point is represented using one or two 16-bit code units.
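
To make the one-or-two code unit rule concrete, the following is a minimal sketch of UTF-16 encoding written in standard C++. It does not use the Internationalization Module's classes: the function name encodeUtf16 is hypothetical, and the built-in types char16_t and char32_t stand in for a code unit and a code point.

```cpp
#include <vector>

// Hypothetical helper, not part of the Internationalization Module.
// Encodes one Unicode code point as UTF-16 code units.
// Returns an empty vector if cp is not a valid Unicode scalar value
// (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF).
std::vector<char16_t> encodeUtf16(char32_t cp)
{
    std::vector<char16_t> units;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return units;
    }
    if (cp < 0x10000) {
        // Code points in the Basic Multilingual Plane need one code unit.
        units.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code points are split into a surrogate pair:
        // the top 10 bits go in the high surrogate, the low 10 bits
        // in the low surrogate.
        const char32_t v = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

For example, U+0041 (LATIN CAPITAL LETTER A) encodes as the single code unit 0x0041, while U+1F600 encodes as the surrogate pair 0xD83D 0xDE00.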

The character and string processing classes of the Internationalization Module provide the ability to create and manipulate UTF-16 strings. This chapter describes how to:

• represent individual UTF-16 code units with RWUChar16 and Unicode code points with RWUChar32

• examine the character traits of an individual code point with RWUCharTraits; for example, its case, its direction of display, or whether it is a whitespace character

• represent and manipulate UTF-16 strings with RWUString, and substrings with RWUSubString and RWUConstSubString

• iterate over the code points in a string with RWUStringIterator and RWUConstStringIterator
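
The distinction between iterating over code units and iterating over code points can be illustrated with a short sketch in standard C++. It deliberately avoids the module's own classes, whose interfaces are described later in this chapter: the function name decodeUtf16 is hypothetical, std::u16string stands in for a UTF-16 string, and passing unpaired surrogates through unchanged is an assumption of this sketch rather than documented behavior of the module's iterators.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Internationalization Module.
// Walks a UTF-16 string and returns its code points in order.
std::vector<char32_t> decodeUtf16(const std::u16string& s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // A high surrogate followed by a low surrogate forms
                // one supplementary code point.
                out.push_back(0x10000 + ((unit - 0xD800) << 10)
                                      + (next - 0xDC00));
                ++i; // skip the low surrogate just consumed
                continue;
            }
        }
        // A BMP code unit, or an unpaired surrogate, which this
        // sketch passes through unchanged (an assumption).
        out.push_back(unit);
    }
    return out;
}
```

A string holding the four code units 0x0061, 0xD83D, 0xDE00, 0x0062 therefore yields three code points: U+0061, U+1F600, and U+0062. This is why a string's length in code units can differ from its length in code points.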