Internationalization Module Reference Guide

RWUString

Module:  Internationalization Module   Group:  Unicode String Processing


RWUString inherits from RWBasicUString

Header File

#include <rw/i18n/RWUString.h> 

Description

RWUString stores and manipulates Unicode character sequences encoded as UTF-16 code units. This class extends RWBasicUString in the Essential Tools Module.

Unicode is a coded character set. It assigns numeric code point values from 0 to 0x10FFFF to abstract characters. UTF-16 is a character encoding form for Unicode in which a single 21-bit Unicode code point is represented using one or two 16-bit integer code units. UTF-16 strikes a balance between ease of use and efficient use of memory. Most characters can be represented with a single 16-bit code unit. Only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair of two UTF-16 code units.
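
The mapping from a code point to one or two code units can be sketched as follows. This is a minimal illustration that uses the standard C++11 types char32_t and std::u16string rather than RWUChar32 and RWUString; the helper name encodeUtf16 is hypothetical and is not part of the library.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 (one or two 16-bit code units).
// Hypothetical helper for illustration only; not part of RWUString.
std::u16string encodeUtf16(char32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);                     // Unicode code space
    assert(codePoint < 0xD800 || codePoint > 0xDFFF);  // surrogates are not characters

    std::u16string units;
    if (codePoint < 0x10000) {
        // 0x0000 to 0xFFFF: a single code unit is enough.
        units.push_back(static_cast<char16_t>(codePoint));
    } else {
        // 0x10000 to 0x10FFFF: split the 20-bit offset into a surrogate pair.
        const char32_t offset = codePoint - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));    // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));  // low surrogate
    }
    return units;
}
```

For example, the code point 0x0041 yields the single code unit 0x0041, while 0x1F600 yields the surrogate pair 0xD83D 0xDE00.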

Null Termination

One or more code units in a Unicode character string can be zero. Hence, a zero code unit does not necessarily mark the end of a Unicode string, and a Unicode string need not be null-terminated. In practice, Unicode strings rarely contain embedded nulls, but you should program defensively. RWUString handles embedded nulls properly.
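
The following sketch shows why defensive code should carry an explicit length instead of scanning for a terminator. It uses std::u16string as a stand-in for a Unicode string class; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code units up to the first zero code unit, the way a
// strlen-style scan would. Hypothetical helper for illustration.
std::size_t lengthUntilNull(const char16_t* units)
{
    assert(units != nullptr);
    std::size_t n = 0;
    while (units[n] != 0) {
        ++n;
    }
    return n;
}

// Build the three-unit sequence 'a', 0x0000, 'b'. Passing the length
// explicitly keeps the embedded null as part of the content.
std::u16string makeWithEmbeddedNull()
{
    const char16_t units[] = { u'a', 0, u'b' };
    return std::u16string(units, 3);
}
```

Here the string holds three code units, but a scan for the first zero stops after one, silently dropping the rest of the content.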

Narrow Characters and Other Non-Unicode Strings

RWUString does not deal directly with non-Unicode characters or character strings such as char, char*, wchar_t, wchar_t*, RWCString, RWWString, std::string, std::wstring, and so on. If a non-Unicode character or string must be used with an RWUString, the non-Unicode character or string must be converted into Unicode first. The conversion can be done explicitly through the use of an RWUToUnicodeConverter, or implicitly through the use of an RWUToUnicodeConversionContext.
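
To make the idea of an explicit conversion concrete, the sketch below converts ISO-8859-1 (Latin-1) bytes to UTF-16. It does not use RWUToUnicodeConverter; it is a hand-written stand-in for one specific encoding, chosen because each Latin-1 byte value equals the corresponding Unicode code point. The function name latin1ToUtf16 is hypothetical.

```cpp
#include <cassert>
#include <string>

// Convert ISO-8859-1 bytes to UTF-16 code units.
// Illustrative stand-in for a converter; handles this one encoding only.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    assert(out.size() == bytes.size());  // one code unit per input byte
    return out;
}
```

Other encodings need their own mapping tables or decoding rules, which is why the narrow string cannot simply be reinterpreted as Unicode.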

Code Units, Code Points, and Characters

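The characteristics of UTF-16 imply that the number of 16-bit code units in a string may differ from the number of code points. Furthermore, the nature of Unicode implies that the number of code points may differ from the number of characters, as interpreted by the end user, since a Unicode character can be decomposed into multiple code points that correspond to the base letter and the accents or other marks that make up the character. Several methods and classes help you work with these concepts: length() returns the number of code units, numCodePoints() returns the number of code points, and the RWUStringIterator and RWUConstStringIterator classes, obtained from beginCodePointIterator() and endCodePointIterator(), iterate over code points.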

Note that numCodePoints() may be slower than length(), because numCodePoints() must traverse the string to find code points that are encoded as surrogate pairs. Since most characters in common use do not require a surrogate representation, many applications can rely on length() to determine the number of code points, provided their text is known to contain no characters in the range 0x10000 to 0x10FFFF.
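
The traversal that makes counting code points more expensive than reading a stored length can be sketched as follows. This is not the implementation of numCodePoints(); it is a minimal illustration on std::u16string, and the treatment of unpaired surrogates is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points in a UTF-16 sequence by walking every code unit.
// Hypothetical helper for illustration only.
std::size_t countCodePoints(const std::u16string& units)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const char16_t u = units[i];
        const bool isHigh = (u >= 0xD800 && u <= 0xDBFF);
        if (isHigh && i + 1 < units.size()) {
            const char16_t next = units[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++i;  // well-formed pair: skip the low surrogate
            }
        }
        ++count;  // assumption: an unpaired surrogate counts as one code point
    }
    assert(count <= units.size());
    return count;
}
```

A string made of 'a', the code point 0x1F600, and 'b' has four code units but three code points, so the two measures agree only when no surrogate pairs are present.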

Lexical vs. Logical Comparison

RWUString performs comparisons on a lexical basis. Methods such as compareTo(), contains(), first(), last(), index(), rindex(), strip(), and the global comparison operators compare the bit values of individual code units, not the logical values of code points or characters. In contrast, RWUCollator performs comparisons on a logical basis, following the conventions specified in a given locale. The logical comparisons made by RWUCollator are more likely to match an end user's expectations regarding string equality and ordering. The lexical comparisons made by RWUString, however, are likely to be faster. If two strings contain characters from the same script, and are in the same normalization form, lexical comparisons may be adequate for many purposes.
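
The difference between comparing code unit values and comparing code point values can be shown with a short sketch. The functions below are hypothetical helpers written against std::u16string; they are not the library's compareCodeUnits() or compareCodePoints(), and neither performs locale-sensitive collation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the code point that starts at units[i] and advance i past it.
// Assumption: an unpaired surrogate is returned as its own value.
char32_t nextCodePoint(const std::u16string& units, std::size_t& i)
{
    assert(i < units.size());
    const char32_t u = units[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < units.size()) {
        const char32_t low = units[i];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            ++i;
            return 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
        }
    }
    return u;
}

// Three-way comparison by 16-bit code unit value.
int compareUnits(const std::u16string& a, const std::u16string& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Three-way comparison by code point value.
int comparePoints(const std::u16string& a, const std::u16string& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const char32_t ca = nextCodePoint(a, i);
        const char32_t cb = nextCodePoint(b, j);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (i == a.size() && j == b.size()) return 0;
    return i == a.size() ? -1 : 1;
}
```

The code point 0x10000 is encoded as the code units 0xD800 0xDC00, so it sorts before 0xFF5E by code unit value but after it by code point value. Likewise, the precomposed code point 0x00E9 and the decomposed sequence 0x0065 0x0301 compare unequal under either function, even though an end user sees the same character; this is the kind of case that normalization and RWUCollator address.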

Parameters of Type RWUChar*

Do not pass a NULL pointer value for parameters of type const RWUChar16*. Doing so will produce erroneous behavior and will trigger an assertion failure in debug builds of the library.



Example

Related Classes

RWBasicUString, RWUSubString, RWUConstSubString, RWUStringIterator, RWUConstStringIterator

Public Enums

enum StripType { Leading,
                 leading,
                 Trailing,
                 trailing,
                 Both,
                 both
};
enum Utf8 { UTF8 };
enum NormalizationForm { FormNFD,
                         FormNFKD,
                         FormNFC,
                         FormNFKC
};

Static Member Functions

static RWUString
foldCase(const RWUString& source,
         bool excludeSpecial = false);

NOTE -- This function supports simple caseless comparisons; use RWUCollator when more robust behavior is required.
static RWCString
toBytes(const RWUChar16* source, size_t length, 
        RWUFromUnicodeConverter& converter = RWUFromUnicodeConversionContext::getContext().getConverter());
static RWUString
toLower(const RWUString& source, const RWULocale& locale);
static RWUString
toTitle(const RWUString& source, const RWULocale& locale);
static RWUString
toUpper(const RWUString& source, const RWULocale& locale);
static RWUString
unescape(const RWUString& source);

Global Operators

The following comparison operators provide direct lexicographical comparisons between all supported Unicode string and substring types.

bool
operator==(const RWUString& lhs, const RWUString& rhs);
bool
operator==(const RWUString& lhs, const RWUChar16* rhs);

bool
operator==(const RWUChar16* lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUChar32* rhs);

bool
operator==(const RWUChar32* lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUSubString& rhs);

bool
operator==(const RWUSubString& lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator==(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator==(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator==(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator!=(const RWUString& lhs, const RWUString& rhs);
bool
operator!=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator!=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator!=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator!=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator!=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator!=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator!=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator<(const RWUString& lhs, const RWUString& rhs);
bool
operator<(const RWUChar16* lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUChar16* rhs);

bool
operator<(const RWUChar32* lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUChar32* rhs);

bool
operator<(const RWUSubString& lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUSubString& rhs);

bool
operator<(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator<(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator<(const RWUConstSubString& lhs,
          const RWUSubString& rhs);

bool
operator<=(const RWUString& lhs, const RWUString& rhs);
bool
operator<=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator<=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator<=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator<=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator<=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);
bool
operator<=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator>(const RWUString& lhs, const RWUString& rhs);
bool
operator>(const RWUChar16* lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUChar16* rhs);

bool
operator>(const RWUChar32* lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUChar32* rhs);

bool
operator>(const RWUSubString& lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUSubString& rhs);

bool
operator>(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator>(const RWUConstSubString& lhs,
          const RWUSubString& rhs);
bool
operator>(const RWUSubString& lhs,
          const RWUConstSubString& rhs);

bool
operator>=(const RWUString& lhs, const RWUString& rhs);
bool
operator>=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator>=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator>=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator>=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator>=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator>=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

RWUString
operator+(const RWUString& lhs, const RWUString& rhs);
RWUString
operator+(const RWUChar16* lhs, const RWUString& rhs);

RWUString
operator+(const RWUString& lhs, const RWUChar16* rhs);

RWUString
operator+(const RWUString& lhs, const RWUSubString& rhs);

RWUString
operator+(const RWUSubString& lhs, const RWUString& rhs);

RWUString
operator+(const RWUConstSubString& lhs, const RWUString& rhs);

RWUString
operator+(const RWUString& lhs, const RWUConstSubString& rhs);
RW_SL_IO_STD(istream&) 
operator>>(RW_SL_IO_STD(istream&) is, RWUString& ustr);
RW_SL_IO_STD(ostream&) 
operator<<(RW_SL_IO_STD(ostream)& os, const RWUString& ustr);
RW_SL_IO_STD(ostream&) 
operator<<(RW_SL_IO_STD(ostream)& os,
           const RWUString::Pad& pad);

Public Constructors

RWUString();
RWUString(const RWUString& source);
RWUString(const RWBasicUString& source);
RWUString(const RWUSubString& source);
RWUString(const RWUConstSubString& source);
RWUString(const RWUChar16* source,
          Duration duration = Transient);
RWUString(const RWUChar16* source, size_t length,
          Duration duration = Transient);
RWUString(const RWUChar16* source, size_t length,
          size_t initialCapacity);
RWUString(RWUChar16* clientBuffer, Deallocator* deallocator);
RWUString(RWUChar16* clientBuffer, size_t contentLength, 
          Deallocator* deallocator);
RWUString(RWUChar16* clientBuffer, size_t contentLength, 
          size_t bufferCapacity, Deallocator* deallocator);
RWUString(const RWUChar32* source);
RWUString(const RWUChar32* source, size_t length);
RWUString(const RWSize_T& initialCapacity);
RWUString(RWUChar16 codeUnit, size_t repeat = 1);
RWUString(RWUChar32 codePoint, size_t repeat = 1);
RWUString(const char* source, Utf8);
RWUString(const char* source, size_t length, Utf8);
RWUString(const RWCString& source, Utf8);
RWUString(const char* source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCString& source,
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCSubString& source,
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCConstSubString& source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RW_SL_STD(string)& source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());

Public Member Operators

RWUString&
operator=(const RWBasicUString& source);
RWUString&
operator=(const char* source);
RWUString&
operator=(const RWUSubString& source);
RWUString&
operator=(const RWUConstSubString& source);
RWUString&
operator=(const RWUChar16* source);
RWUString&
operator=(const RWUChar32* source);
RWUString&
operator=(RWUChar16 codeUnit);
RWUString&
operator=(RWUChar32 codePoint);
RWUString&
operator+=(const RWBasicUString& source);
RWUString&
operator+=(const RWUSubString& source);
RWUString&
operator+=(const RWUConstSubString& source);
RWUString&
operator+=(const RWUChar16* source);
RWUString&
operator+=(const RWUChar32* source);
RWUString&
operator+=(RWUChar16 codeUnit);
RWUString&
operator+=(RWUChar32 codePoint);
RWUChar16&
operator()(size_t offset);
RWUChar16
operator()(size_t offset) const;
RWUSubString
operator()(size_t offset, size_t length);
RWUConstSubString
operator()(size_t offset, size_t length) const;

Public Member Functions

RWUString&
append(const RWBasicUString& source);
RWUString&
append(const RWBasicUString& source, size_t sourceOffset, 
       size_t sourceLength);
RWUString&
append(const RWUSubString& source);
RWUString&
append(const RWUConstSubString& source);
RWUString&
append(const RWUChar16* source);
RWUString&
append(const RWUChar16* source, size_t sourceLength);
RWUString&
append(const RWUChar32* source);
RWUString&
append(const RWUChar32* source, size_t sourceLength);
RWUString&
append(RWUChar16 codeUnit, size_t repeat = 1);
RWUString&
append(RWUChar32 codePoint, size_t repeat = 1);
RWUStringIterator
beginCodePointIterator();
RWUConstStringIterator
beginCodePointIterator() const;
int
compareCodePoints(const RWUSubString& rhs) const;
int
compareCodePoints(const RWUConstSubString& rhs) const;
int
compareCodeUnits(const RWUSubString& rhs) const;
int
compareCodeUnits(const RWUConstSubString& rhs) const;
int
compareTo(const RWUSubString& rhs) const;
int
compareTo(const RWUConstSubString& rhs) const;
bool
contains(const RWUSubString& pattern) const;
bool 
contains(const RWUConstSubString& pattern) const;
bool
contains(size_t offset, const RWUSubString& pattern) const;
bool
contains(size_t offset,
         const RWUConstSubString& pattern) const;
bool
contains(size_t offset, size_t length,
         const RWUSubString& pattern) const;
bool
contains(size_t offset, size_t length,
         const RWUConstSubString& pattern) const;
RWUStringIterator
endCodePointIterator();
RWUConstStringIterator
endCodePointIterator() const;
size_t
first(const RWUSubString& codeUnitSet) const;
size_t
first(const RWUConstSubString& codeUnitSet) const;
size_t
first(size_t offset, const RWUSubString& codeUnitSet) const;
size_t
first(size_t offset,
      const RWUConstSubString& codeUnitSet) const;
size_t
first(size_t offset, size_t length,
      const RWUSubString& codeUnitSet) const;
size_t
first(size_t offset, size_t length,
      const RWUConstSubString& codeUnitSet) const;
RWUString&
foldCase(bool excludeSpecial = false);
size_t
index(const RWUSubString& pattern) const;

size_t
index(const RWUConstSubString& pattern) const;
size_t
index(size_t offset, const RWUSubString& pattern) const;
size_t
index(size_t offset, const RWUConstSubString& pattern) const;
size_t
index(size_t offset, size_t length,
      const RWUSubString& pattern) const;
size_t 
index(size_t offset, size_t length,
      const RWUConstSubString& pattern) const;
RWUString&
insert(size_t offset, const RWBasicUString& source);
RWUString&
insert(size_t offset, const RWBasicUString& source,
       size_t sourceOffset, size_t sourceLength);
RWUString&
insert(size_t offset, const RWUSubString& source);
RWUString&
insert(size_t offset, const RWUConstSubString& source);
RWUString&
insert(size_t offset, const RWUChar16* source);
RWUString&
insert(size_t offset, const RWUChar16* source,
       size_t sourceLength);
RWUString&
insert(size_t offset, const RWUChar32* source);
RWUString&
insert(size_t offset, const RWUChar32* source,
       size_t sourceLength);
RWUString&
insert(size_t offset, RWUChar16 codeUnit, size_t repeat = 1);
RWUString&
insert(size_t offset, RWUChar32 codePoint, size_t repeat = 1);
size_t
last(const RWUSubString& codeUnitSet) const;

size_t 
last(const RWUConstSubString& codeUnitSet) const;
size_t
last(size_t offset, const RWUSubString& codeUnitSet) const;
size_t
last(size_t offset,
     const RWUConstSubString& codeUnitSet) const;
size_t
last(size_t offset, size_t length,
     const RWUSubString& codeUnitSet) const;

size_t
last(size_t offset, size_t length,
     const RWUConstSubString& codeUnitSet) const;
RWUString&
normalize(NormalizationForm form = FormNFC);
RWUString&
prepend(const RWBasicUString& source);
RWUString&
prepend(const RWBasicUString& source, size_t sourceOffset, 
        size_t sourceLength);
RWUString&
prepend(const RWUSubString& source);
RWUString& 
prepend(const RWUConstSubString& source);
RWUString&
prepend(const RWUChar16* source);
RWUString&
prepend(const RWUChar16* source, size_t sourceLength);
RWUString&
prepend(const RWUChar32* source);
RWUString&
prepend(const RWUChar32* source, size_t sourceLength);
RWUString&
prepend(RWUChar16 codeUnit, size_t repeat = 1);
RWUString&
prepend(RWUChar32 codePoint, size_t repeat = 1);
RWUString&
remove(size_t offset = 0);
RWUString&
remove(size_t offset, size_t length);
RWUString&
replace(size_t offset, size_t length,
        const RWBasicUString& source);
RWUString&
replace(size_t offset, size_t length,
        const RWBasicUString& source, size_t sourceOffset, 
        size_t sourceLength);
RWUString&
replace(size_t offset, size_t length,
        const RWUSubString& source);
RWUString&
replace(size_t offset, size_t length,
        const RWUConstSubString& source);
RWUString&
replace(size_t offset, size_t length,
        const RWUChar16* source);
RWUString&
replace(size_t offset, size_t length, const RWUChar16* source,
        size_t sourceLength);
RWUString&
replace(size_t offset, size_t length,
        const RWUChar32* source);
RWUString&
replace(size_t offset, size_t length, const RWUChar32* source, 
        size_t sourceLength);
RWUString&
replace(size_t offset, size_t length, RWUChar16 codeUnit, 
        size_t repeat = 1);
RWUString&
replace(size_t offset, size_t length, RWUChar32 codePoint, 
        size_t repeat = 1);
size_t
rindex(const RWUSubString& pattern) const;
size_t 
rindex(const RWUConstSubString& pattern) const;
size_t
rindex(size_t offset, const RWUSubString& pattern) const;
size_t
rindex(size_t offset, const RWUConstSubString& pattern) const;
size_t
rindex(size_t offset, size_t length,
       const RWUSubString& pattern) const;
size_t
rindex(size_t offset, size_t length,
       const RWUConstSubString& pattern) const;
RWUSubString
strip(StripType stripType = Trailing,
      RWUChar32 codePoint = static_cast<RWUChar32>(0x0020));
RWUConstSubString
strip(StripType stripType = Trailing,
      RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)) const;
RWUSubString
subString(const RWUString& pattern, size_t offset = 0);
RWUConstSubString
subString(const RWUString& pattern, size_t offset = 0) const;
RWCString
toBytes(RWUFromUnicodeConverter& converter = 
 RWUFromUnicodeConversionContext::getContext().getConverter()) 
 const;
RWUString&
toLower();
RWUString&
toLower(const RWULocale& locale);
RWUString&
toTitle();
RWUString&
toTitle(const RWULocale& locale);
RWUString&
toUpper();
RWUString&
toUpper(const RWULocale& locale);
RWUString&
unescape();

Class Pad

RWUString::Pad defines an iostream manipulator that can be used to insert the contents of an RWUString ustr into an output stream os, padding the string with the specified fill character until os.width() code points have been written to the stream.

If the length of ustr is greater than os.width(), the string is truncated and no padding occurs. If os.width() is zero, the entire contents of the string are inserted into the stream and no padding occurs.

An RWUString::Pad instance is only valid as long as the source string remains unchanged. Do not create persistent instances of this class; this class should only be instantiated as a temporary in an insertion expression.
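
The padding and truncation rules described above can be sketched as a plain function. This is not the implementation of RWUString::Pad. It works on std::u32string, where each element is one code point, it measures the string length in code points, and it assumes that the fill is written after the text; the description above does not state the side on which padding is applied. The name padToWidth is hypothetical, and its width parameter plays the role of os.width().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return text adjusted to the given width, measured in code points.
// Hypothetical helper for illustration only.
std::u32string padToWidth(const std::u32string& text, std::size_t width,
                          char32_t fill = U' ')
{
    if (width == 0) {
        return text;                   // zero width: whole string, no padding
    }
    if (text.size() > width) {
        return text.substr(0, width);  // too long: truncate, no padding
    }
    std::u32string out = text;
    out.append(width - text.size(), fill);  // assumption: fill follows the text
    assert(out.size() == width);
    return out;
}
```

With a width of 5 and a fill of '*', the text "abc" becomes "abc**"; with a width of 4, "abcdef" is cut to "abcd"; with a width of 0, the text is returned unchanged.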

Public Constructors

Pad(const RWUString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)); 
Pad(const RWUSubString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)); 
Pad(const RWUConstSubString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020));
Pad(const RWUString::Pad& source);



© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.