RWUString

RWUString stores and manipulates Unicode character sequences encoded as UTF-16 code units. This class extends RWBasicUString in the Essential Tools Module.

Unicode is a coded character set. It assigns numeric code point values from 0 to 0x10FFFF to abstract characters. UTF-16 is a character encoding form for Unicode in which a single 21-bit Unicode code point is represented using one or two 16-bit integer code units. UTF-16 strikes a balance between ease of use and efficient use of memory. Most characters can be represented with a single 16-bit code unit. Only characters in the range 0x10000 to 0x10FFFF must be represented with a surrogate pair of two UTF-16 code units.
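The mapping from a code point to one or two code units can be sketched in standard C++. The helper encodeUtf16() below is hypothetical and is not part of RWUString. It assumes that its argument is a valid code point outside the surrogate range 0xD800 to 0xDFFF.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration; not an RWUString member.
// Encodes one code point (assumed valid, 0..0x10FFFF, not a surrogate)
// as one or two UTF-16 code units.
std::vector<std::uint16_t> encodeUtf16(std::uint32_t codePoint)
{
    std::vector<std::uint16_t> units;
    if (codePoint < 0x10000) {
        // Below 0x10000 a single code unit holds the value directly.
        units.push_back(static_cast<std::uint16_t>(codePoint));
    } else {
        // 0x10000..0x10FFFF: split the 20-bit offset into a surrogate pair.
        const std::uint32_t offset = codePoint - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (offset >> 10)));   // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
    }
    return units;
}
```

For example, encodeUtf16(0x41) yields the single code unit 0x0041, while encodeUtf16(0x1F600) yields the surrogate pair 0xD83D, 0xDE00.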

Null Termination

One or more code units in a Unicode character string can be zero. Hence, a null code unit does not necessarily mark the end of a Unicode character string. Strings with embedded nulls are rare in practice, but you should program defensively. RWUString handles embedded nulls properly.
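The effect of an embedded null can be illustrated with the standard std::u16string type, used here only as a stand-in for a UTF-16 buffer. The function names are hypothetical. Code that scans for a terminating null sees a shorter string than code that carries an explicit length.

```cpp
#include <cstddef>
#include <string>

// Length as seen by code that stops at the first null code unit.
std::size_t nullTerminatedLength(const char16_t* s)
{
    return std::char_traits<char16_t>::length(s);
}

// Builds the five-unit sequence 'a' 'b' NUL 'c' 'd'. The explicit length
// keeps the embedded null and everything after it.
std::u16string makeSample()
{
    const char16_t raw[] = { u'a', u'b', u'\0', u'c', u'd' };
    return std::u16string(raw, 5);
}
```

Here makeSample().size() is 5, but nullTerminatedLength(makeSample().c_str()) is 2. This is why the length-taking overloads are the safer choice when embedded nulls are possible.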

Narrow Characters and Other Non-Unicode Strings

RWUString does not deal directly with non-Unicode characters or character strings such as char, char*, wchar_t, wchar_t*, RWCString, RWWString, std::string, std::wstring, and so on. If a non-Unicode character or string must be used with an RWUString, the non-Unicode character or string must be converted into Unicode first. The conversion can be done explicitly through the use of an RWUToUnicodeConverter, or implicitly through the use of an RWUToUnicodeConversionContext.

Code Units, Code Points, and Characters

The characteristics of UTF-16 imply that the number of 16-bit code units in a string may differ from the number of code points. Furthermore, the nature of Unicode implies that the number of code points may differ from the number of characters, as interpreted by the end user, since Unicode characters can be decomposed into multiple code points that correspond to the various accents or glyphs that comprise each character. The following methods and classes help you work with these concepts:

The length() method returns the number of UTF-16 code units in a string.
The numCodePoints() method returns the number of code points.
An RWUBreakSearch can be used to iterate over the characters of a string, in the context of a particular locale.

Note that numCodePoints() may be slower than the length() method because numCodePoints() must traverse the string to find code points that arise from surrogate code unit pairs. Since the majority of code points in the current Unicode Standard do not require a surrogate representation, many applications can rely on length() to determine the number of code points.
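The reason a code point count requires a traversal can be sketched in standard C++. The function below is a hypothetical illustration, not the library's implementation of numCodePoints(). It counts an unpaired surrogate as one code point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: counts code points in a UTF-16 sequence by
// treating each high/low surrogate pair as a single code point.
std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool nextIsLow = i + 1 < s.size()
                            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && nextIsLow) {
            ++i; // the low surrogate belongs to the same code point
        }
        ++count;
    }
    return count;
}
```

For the string u"A\U0001F600B", size() reports 4 code units while countCodePoints() reports 3 code points.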

Lexical vs. Logical Comparison

RWUString performs comparisons on a lexical basis. Methods such as compareTo(), contains(), first(), last(), index(), rindex(), strip(), and the global comparison operators compare the bit values of individual code units, not the logical values of code points or characters. In contrast, RWUCollator performs comparisons on a logical basis, following the conventions specified in a given locale. The logical comparisons made by RWUCollator are more likely to match an end user's expectations regarding string equality and ordering. The lexical comparisons made by RWUString, however, are likely to be faster. If two strings contain characters from the same script, and are in the same normalization form, lexical comparisons may be adequate for many purposes.
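The role of normalization in lexical comparison can be illustrated with standard C++ strings, used here as stand-ins for UTF-16 buffers. The function names are hypothetical. The two sequences below represent the same user-perceived character but differ as code unit sequences, so a lexical comparison treats them as unequal.

```cpp
#include <string>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE as one precomposed code point.
std::u16string precomposed() { return u"\u00E9"; }

// The same character as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
std::u16string decomposed() { return u"e\u0301"; }

// Lexical equality: the code unit sequences must be identical.
bool lexicallyEqual(const std::u16string& a, const std::u16string& b)
{
    return a == b;
}
```

lexicallyEqual(precomposed(), decomposed()) returns false. Bringing both strings into the same normalization form before comparing, or using a collator, avoids this mismatch.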

Parameters of Type RWUChar*

Do not pass a NULL pointer value for parameters of type const RWUChar16*. Doing so will produce erroneous behavior and will trigger an assertion failure in debug builds of the library.

Example

#include <rw/i18n/RWUString.h>
#include <rw/i18n/RWUConversionContext.h>
#include <iostream>

using std::cout;
using std::endl;

int
main()
{
  // Indicate that source and target strings are
  // encoded as ISO8859-1.
  RWUConversionContext context("ISO8859-1");

  // Initialize a Unicode string.
  RWUString str("source pro internationalization module");

  // Insert into the string.
  str.insert(str.index("int"), "core ");

  // Titlecase the string.
  str.toTitle();

  // Remove a character.
  RWUChar16 space = static_cast<RWUChar16>(0x20);
  str.remove(str.first(space), 1);

  // Print the result.
  cout << str << endl;

  return 0;
} // main

Results:
========

SourcePro Core Internationalization Module

Public Enums

enum StripType { Leading,
                 leading,
                 Trailing,
                 trailing,
                 Both,
                 both
};

An enumeration whose values are used to control the behavior of the strip() methods:

Leading or leading removes characters from the beginning of the string.
Trailing or trailing removes characters from the end of the string.
Both or both removes characters from both ends of the string.

enum Utf8 { UTF8 };

An enumeration used to select the constructors that accept UTF-8 encoded char strings.

enum NormalizationForm { FormNFD,
                         FormNFKD,
                         FormNFC,
                         FormNFKC
};

A NormalizationForm value indicates a particular normalization form, as defined by the Unicode Standard Annex #15, "Unicode Normalization Forms," http://www.unicode.org/unicode/reports/tr15/. Same as RWUNormalizer::NormalizationForm.

In converting a string to any of these forms, RWUString::normalize() leaves ASCII characters unaffected, and replaces deprecated characters. RWUString::normalize() never introduces compatibility characters.

FormNFD: Normalization Form Decomposed
FormNFKD: Normalization Form Compatibility Decomposed
FormNFC: Normalization Form Composed
FormNFKC: Normalization Form Compatibility Composed

Static Member Functions

static RWUString
foldCase(const RWUString& source,
         bool excludeSpecial = false);

Returns a folded-case representation of source in which each character in source is converted into a locale-independent, case-neutral representation suitable for use in caseless, lexical comparisons. If excludeSpecial is true, the special mappings that map the dotted I and dotless i to capital I are excluded. The length of the result may be different than that of the original contents.

NOTE -- This function supports simple caseless comparisons; use RWUCollator when more robust behavior is required.

static RWCString
toBytes(const RWUChar16* source, size_t length, 
        RWUFromUnicodeConverter& converter = RWUFromUnicodeConversionContext::getContext().getConverter());

Returns an RWCString instance that contains the sequence of bytes that are produced when the contents of the array source are converted into another character encoding scheme using converter. See also RWUFromUnicodeConversionContext and RWUFromUnicodeConverter.

static RWUString
toLower(const RWUString& source, const RWULocale& locale);

Returns a lowercase representation of source created using the case-mapping rules of the specified locale. The length of the result may differ from that of the original.

static RWUString
toTitle(const RWUString& source, const RWULocale& locale);

Returns a titlecase representation of source created using the case-mapping and word-break rules of the specified locale. The length of the result may differ from that of the original.

static RWUString
toUpper(const RWUString& source, const RWULocale& locale);

Returns an uppercase representation of source created using the case-mapping rules of the specified locale. The length of the result may differ from that of the original.

static RWUString
unescape(const RWUString& source);

Parses the contents of source and replaces recognized escape sequences with the equivalent Unicode code unit representation. The following escape sequences are recognized:

\uhhhh = 4 hex digits in the range [0-9A-Fa-f]
\Uhhhhhhhh = 8 hex digits
\xhh = 1 or 2 hex digits
\ooo = 1, 2 or 3 octal digits in the range [0-7]
\a = U+0007; alert (BEL)
\b = U+0008; backspace (BS)
\t = U+0009; horizontal tab (HT)
\n = U+000A; newline/line feed (LF)
\v = U+000B; vertical tab (VT)
\f = U+000C; form feed (FF)
\r = U+000D; carriage return (CR)
\" = U+0022; double quote
\' = U+0027; single quote
\? = U+003F; question mark
\\ = U+005C; backslash

The value of any other escape sequence is the value of the character that follows the backslash.

If an escape sequence is ill-formed, this method throws RWConversionErr with an ILLEGALSEQ message.

A non-static version of this method is also provided.
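The escape handling described above can be sketched for a subset of the table in standard C++. The function unescapeSubset() is hypothetical and is not the library's unescape(). It handles \uhhhh and the single-character escapes, rejects the \U, \x, and octal forms as outside its scope, and throws std::invalid_argument rather than RWConversionErr.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hexValue(char16_t c)
{
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Hypothetical helper: handles \uhhhh and single-character escapes only.
std::u16string unescapeSubset(const std::u16string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != u'\\') {           // ordinary code unit: copy through
            out.push_back(in[i]);
            continue;
        }
        if (++i == in.size())
            throw std::invalid_argument("backslash at end of input");
        const char16_t c = in[i];
        if (c == u'U' || c == u'x' || (c >= u'0' && c <= u'7'))
            throw std::invalid_argument("escape form not covered by this sketch");
        switch (c) {
        case u'a': out.push_back(0x0007); break;
        case u'b': out.push_back(0x0008); break;
        case u't': out.push_back(0x0009); break;
        case u'n': out.push_back(0x000A); break;
        case u'v': out.push_back(0x000B); break;
        case u'f': out.push_back(0x000C); break;
        case u'r': out.push_back(0x000D); break;
        case u'u': {
            if (i + 4 >= in.size())     // four digits must follow the 'u'
                throw std::invalid_argument("\\u needs 4 hex digits");
            unsigned value = 0;
            for (std::size_t k = 1; k <= 4; ++k) {
                const int digit = hexValue(in[i + k]);
                if (digit < 0)
                    throw std::invalid_argument("\\u needs 4 hex digits");
                value = value * 16 + static_cast<unsigned>(digit);
            }
            out.push_back(static_cast<char16_t>(value));
            i += 4;
            break;
        }
        default:                        // \" \' \? \\ and any other character
            out.push_back(c);
            break;
        }
    }
    return out;
}
```

With this sketch, the ten-unit input u"a\\u0041\\n" produces the three-unit result 'a', 'A', U+000A.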

Global Operators

The following comparison operators provide direct lexicographical comparisons between all supported Unicode string and substring types.

bool
operator==(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically equal to rhs; otherwise, false.

bool
operator==(const RWUString& lhs, const RWUChar16* rhs);

bool
operator==(const RWUChar16* lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUChar32* rhs);

bool
operator==(const RWUChar32* lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUSubString& rhs);

bool
operator==(const RWUSubString& lhs, const RWUString& rhs);

bool
operator==(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator==(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator==(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator==(const RWUConstSubString& lhs, const RWUSubString& rhs);

bool
operator!=(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically not equal to rhs; otherwise, false.

bool
operator!=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator!=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator!=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator!=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator!=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator!=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator!=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator!=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator<(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically less than rhs; otherwise, false.

bool
operator<(const RWUChar16* lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUChar16* rhs);

bool
operator<(const RWUChar32* lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUChar32* rhs);

bool
operator<(const RWUSubString& lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUSubString& rhs);

bool
operator<(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator<(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator<(const RWUSubString& lhs,
          const RWUConstSubString& rhs);

bool
operator<(const RWUConstSubString& lhs,
          const RWUSubString& rhs);

bool
operator<=(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically less than or equal to rhs; otherwise, false.

bool
operator<=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator<=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator<=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator<=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator<=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator<=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator<=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

bool
operator>(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically greater than rhs; otherwise, false.

bool
operator>(const RWUChar16* lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUChar16* rhs);

bool
operator>(const RWUChar32* lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUChar32* rhs);

bool
operator>(const RWUSubString& lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUSubString& rhs);

bool
operator>(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator>(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator>(const RWUConstSubString& lhs,
          const RWUSubString& rhs);

bool
operator>(const RWUSubString& lhs,
          const RWUConstSubString& rhs);

bool
operator>=(const RWUString& lhs, const RWUString& rhs);

Returns true if lhs is lexicographically greater than or equal to rhs; otherwise, false.

bool
operator>=(const RWUChar16* lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUChar16* rhs);

bool
operator>=(const RWUChar32* lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUChar32* rhs);

bool
operator>=(const RWUSubString& lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUSubString& rhs);

bool
operator>=(const RWUConstSubString& lhs, const RWUString& rhs);

bool
operator>=(const RWUString& lhs, const RWUConstSubString& rhs);

bool
operator>=(const RWUConstSubString& lhs,
           const RWUSubString& rhs);

bool
operator>=(const RWUSubString& lhs,
           const RWUConstSubString& rhs);

RWUString
operator+(const RWUString& lhs, const RWUString& rhs);

Concatenates lhs with rhs and returns the result.

RWUString
operator+(const RWUChar16* lhs, const RWUString& rhs);

RWUString
operator+(const RWUString& lhs, const RWUChar16* rhs);

RWUString
operator+(const RWUString& lhs, const RWUSubString& rhs);

RWUString
operator+(const RWUSubString& lhs,
          const RWUString& rhs);

RWUString
operator+(const RWUConstSubString& lhs,
          const RWUString& rhs);

RWUString
operator+(const RWUString& lhs,
          const RWUConstSubString& rhs);

RW_SL_IO_STD(istream&) 
operator>>(RW_SL_IO_STD(istream&) is, RWUString& ustr);

Reads an encoded byte stream from is. The byte sequence is converted into UTF-16 using the currently active RWUToUnicodeConversionContext. Leading whitespace is always skipped before storing any code points, regardless of the ios_base::skipws format flag setting. Code points are then extracted until:

is.width() code points are read, if is.width() is greater than zero
a whitespace code point is read
the end of the input sequence is reached

The following Unicode characters are treated as whitespace delimiters:

U+0009 (HORIZONTAL TABULATION)
U+000A (LINE FEED)
U+000B (VERTICAL TABULATION)
U+000C (FORM FEED)
U+000D (CARRIAGE RETURN)
U+001C (FILE SEPARATOR)
U+001D (GROUP SEPARATOR)
U+001E (RECORD SEPARATOR)
U+001F (UNIT SEPARATOR)
U+0020 (SPACE)
U+0085 (NEXT LINE)
U+1680 (OGHAM SPACE MARK)
U+2000 (EN QUAD)
U+2001 (EM QUAD)
U+2002 (EN SPACE)
U+2003 (EM SPACE)
U+2004 (THREE-PER-EM SPACE)
U+2005 (FOUR-PER-EM SPACE)
U+2006 (SIX-PER-EM SPACE)
U+2007 (FIGURE SPACE)
U+2008 (PUNCTUATION SPACE)
U+2009 (THIN SPACE)
U+200A (HAIR SPACE)
U+200B (ZERO WIDTH SPACE)
U+2028 (LINE SEPARATOR)
U+2029 (PARAGRAPH SEPARATOR)
U+3000 (IDEOGRAPHIC SPACE)
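The delimiter set listed above can be expressed as a single predicate. The function below is a hypothetical sketch in standard C++ that transcribes the list into ranges. It is not a function provided by the library.

```cpp
// Hypothetical sketch: returns true if c is one of the whitespace
// delimiters listed above, with adjacent values folded into ranges.
bool isWhitespaceDelimiter(char16_t c)
{
    return (c >= 0x0009 && c <= 0x000D)   // HT, LF, VT, FF, CR
        || (c >= 0x001C && c <= 0x0020)   // FS, GS, RS, US, SPACE
        || c == 0x0085                    // NEXT LINE
        || c == 0x1680                    // OGHAM SPACE MARK
        || (c >= 0x2000 && c <= 0x200B)   // EN QUAD .. ZERO WIDTH SPACE
        || c == 0x2028                    // LINE SEPARATOR
        || c == 0x2029                    // PARAGRAPH SEPARATOR
        || c == 0x3000;                   // IDEOGRAPHIC SPACE
}
```

Note that U+00A0 (NO-BREAK SPACE) does not appear in the list, so this predicate returns false for it.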

Unlike standard extractors, this extractor must consume and discard the trailing whitespace delimiter. This behavior is due to the nature of the UTF-8 encoding scheme, which requires multiple bytes to encode some whitespace characters. For the same reason, this extractor cannot support the noskipws condition where the ios_base::skipws format flag has been cleared; each extraction will consume one whitespace character. Nor is a whitespace manipulator provided, since such a manipulator would consume the first non-whitespace character following a sequence of whitespace characters.

Throws RWUException to report conversion errors. Throws std::ios_base::failure to report any errors detected while performing stream operations.

RW_SL_IO_STD(ostream&) 
operator<<(RW_SL_IO_STD(ostream)& os, const RWUString& ustr);

Writes the sequence of bytes that are produced when contents of ustr are converted into the character encoding scheme specified by the currently active target RWUFromUnicodeConversionContext.

If os.width() is greater than the number of code points contained in the source string, the output is padded using the space fill character (U+0020). If os.width() is less than the number of code points contained in the source string, the entire contents of the string are inserted into the output stream. Any padding is inserted after the string if the ios_base::left format flag is set, or before if ios_base::right is set or if neither flag is set.
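The width and adjustment rules follow the same conventions as the standard string inserter, which can be used to illustrate them. The sketch below uses std::string and std::ostringstream only. It does not call the RWUString inserter, and it counts char elements rather than code points. The function name is hypothetical.

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: inserts text into a stream with the given field
// width and returns the bytes that were written.
std::string insertWithWidth(const std::string& text, int width, bool leftFlag)
{
    std::ostringstream os;
    os.fill(' ');
    if (leftFlag) {
        // Request padding after the text instead of before it.
        os.setf(std::ios_base::left, std::ios_base::adjustfield);
    }
    os << std::setw(width) << text;
    return os.str();
}
```

insertWithWidth("ab", 5, false) returns "   ab", insertWithWidth("ab", 5, true) returns "ab   ", and insertWithWidth("abcdef", 3, false) returns "abcdef" because a width smaller than the content never truncates.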

Throws RWUException to report conversion errors. Throws std::ios_base::failure to report any errors detected while performing stream operations.

RW_SL_IO_STD(ostream&) 
operator<<(RW_SL_IO_STD(ostream)& os,
           const RWUString::Pad& pad);

Writes the sequence of bytes that are produced when the contents of the RWUString used to construct pad are converted into the character encoding scheme specified by the currently active target RWUFromUnicodeConversionContext.

Throws RWUException to report conversion errors. Throws std::ios_base::failure to report any errors detected while performing stream operations.

Public Constructors

RWUString();

Default constructor. Constructs an empty, null string.

RWUString(const RWUString& source);

Copy constructor. Constructs an RWUString from source.

RWUString(const RWBasicUString& source);

Conversion constructor. Constructs an RWUString from source.

RWUString(const RWUSubString& source);
RWUString(const RWUConstSubString& source);

Constructs an RWUString containing a copy of the contents of the specified substring object source.

RWUString(const RWUChar16* source,
          Duration duration = Transient);

Constructs an RWUString instance that copies or references the contents of source, a null-terminated sequence of RWUChar16 values. The new RWUString instance assumes no responsibility for deallocating the storage associated with source.

If duration is Transient, this method copies the contents of source into an internally allocated and managed array.

If duration is Persistent, the client retains responsibility for the storage used for source. This mode may be used when source resides in static or otherwise durable storage. The storage associated with source must not be deallocated while the RWUString instance still references it. The original source array cannot be modified by any of the non-const methods provided by this class. RWUString creates a copy of source if any of these methods are called.

RWUString(const RWUChar16* source, size_t length,
          Duration duration = Transient);

Constructs an RWUString instance that copies or references the contents of source, an array of RWUChar16 values that contains length elements and may contain embedded nulls. The new RWUString instance assumes no responsibility for deallocating the storage associated with source.

If duration is Transient, this method copies the contents of source into an internally allocated and managed array.

RWUString(const RWUChar16* source, size_t length,
          size_t initialCapacity);

Constructs an RWUString instance that copies the array source into an internally-managed buffer with a minimum capacity of initialCapacity.

If the original allocation does not possess the capacity required by an append, insert, or replace operation, a new buffer allocation will be made to accommodate the change in length.

RWUString(RWUChar16* clientBuffer, Deallocator* deallocator);

Constructs an RWUString instance that assumes ownership of clientBuffer, a dynamically-allocated, null-terminated sequence of RWUChar16 values. The terminating null may appear at any position within the storage allocated for clientBuffer.

The deallocator parameter supplies the RWUString instance with an object that can be used to deallocate the storage referenced by clientBuffer.

The storage associated with clientBuffer must not be deallocated while the RWUString instance still references it.

If the original clientBuffer array does not possess the capacity required by an append, insert, or replace operation, the buffer will be copied into an internally allocated buffer. In this case, you won't be able to access clientBuffer via the original pointer. Capacity is determined by counting code units until the null character is found.

Copy-construction or assignment will produce an RWUString that refers to the same client-supplied buffer.

RWUString does not synchronize access to the client-supplied buffer; external synchronization will be required if multiple threads have access to the buffer through one or more RWUString instances.

RWUString(RWUChar16* clientBuffer, size_t contentLength, 
          Deallocator* deallocator);

Constructs an RWUString instance that assumes ownership of clientBuffer, a dynamically-allocated array of RWUChar16 values that contains contentLength elements and may contain embedded nulls. The storage required to hold contentLength elements may be less than the storage that was allocated for clientBuffer.

The deallocator parameter supplies the RWUString instance with an object that can be used to deallocate the storage referenced by clientBuffer.

The storage associated with clientBuffer must not be deallocated while the RWUString instance still references it.

Copy-construction or assignment will produce an RWUString that refers to the same client-supplied buffer.

RWUString(RWUChar16* clientBuffer, size_t contentLength, 
          size_t bufferCapacity, Deallocator* deallocator);

Constructs an RWUString instance that manipulates clientBuffer, a writeable, client-supplied array that initially contains contentLength elements and whose total usable size is given by bufferCapacity.

The deallocator parameter supplies the RWUString instance with an object that can be used to deallocate the storage referenced by clientBuffer.

The storage associated with clientBuffer must not be deallocated while the RWUString instance still references it.

Copy-construction or assignment will produce an RWUString that refers to the same client-supplied buffer.

RWUString(const RWUChar32* source);

Constructs an RWUString from the code units produced by converting the UTF-32 encoded source into its equivalent UTF-16 representation. The string contained in source must be null-terminated.

RWUString(const RWUChar32* source, size_t length);

Constructs an RWUString from the code units produced by converting the UTF-32 encoded source into its equivalent UTF-16 representation. The number of elements in source is specified using length. The source array may contain embedded nulls.

RWUString(const RWSize_T& initialCapacity);

Constructs an RWUString containing a zero-length string and a capacity of initialCapacity. An RWSize_T instance must be constructed to eliminate potential ambiguity with the RWUString(RWUChar32,size_t=1) constructor:

RWUString array(RWSize_T(10));

RWUString(RWUChar16 codeUnit, size_t repeat = 1);

Constructs an RWUString that contains repeat copies of the codeUnit.

RWUString(RWUChar32 codePoint, size_t repeat = 1);

Constructs an RWUString that contains repeat copies of the code unit or surrogate pair of code units produced by converting the UTF-32 codePoint into its equivalent UTF-16 representation.

RWUString(const char* source, Utf8);

Constructs an RWUString instance that contains the code units produced by converting the UTF-8 encoded text stored in source into a UTF-16 representation. The string contained in source must be null-terminated.

Use this method when a char text string is known to contain Unicode text encoded in UTF-8:

const char* utf8String = "..."; // UTF-8 encoded text
RWUString utf16String(utf8String, RWUString::UTF8);

RWUString(const char* source, size_t length, Utf8);

Constructs an RWUString that contains the code units produced by converting the UTF-8 encoded text stored in source into a UTF-16 representation. The length of the string contained in source is specified using length. The source may contain embedded nulls.

Use this method when a char text string is known to contain Unicode text encoded in UTF-8:

const char utf8String[20] = { ... }; // UTF-8 encoded text
RWUString utf16String(utf8String, 20, RWUString::UTF8);

RWUString(const RWCString& source, Utf8);

Constructs an RWUString instance that contains the code units produced by converting the UTF-8 encoded text stored in source into a UTF-16 representation.

Use this method when an RWCString text string is known to contain Unicode text encoded in UTF-8:

RWCString utf8String("..."); // UTF-8 encoded text
RWUString utf16String(utf8String, RWUString::UTF8);

RWUString(const char* source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCString& source,
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCSubString& source,
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RWCConstSubString& source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());
RWUString(const RW_SL_STD(string)& source, 
  RWUToUnicodeConverter& converter = 
  RWUToUnicodeConversionContext::getContext().getConverter());

Constructs an RWUString from the UTF-16 code unit sequence produced by converting the contents of the null-terminated character string source using the conversion specified by converter.

Public Member Operators

RWUString&
operator=(const RWBasicUString& source);

Replaces the contents of self with a handle to the reference-counted body of source, and returns a reference to self.

RWUString&
operator=(const char* source);

Replaces the contents of self with a copy of the contents of source, and returns a reference to self. Converts the contents of source to UTF-16 using the current to-Unicode conversion context.

RWUString&
operator=(const RWUSubString& source);
RWUString&
operator=(const RWUConstSubString& source);

Replaces the contents of self with a copy of the contents of source, and returns a reference to self.

RWUString&
operator=(const RWUChar16* source);

Replaces the contents of self with a copy of the null-terminated contents of source and returns a reference to self.

RWUString&
operator=(const RWUChar32* source);

Replaces the contents of self with the code unit sequence required to represent the null-terminated, UTF-32 encoded contents of source in the UTF-16 encoding form, and returns a reference to self.

RWUString&
operator=(RWUChar16 codeUnit);

Replaces the contents of self with the single code unit codeUnit, and returns a reference to self.

RWUString&
operator=(RWUChar32 codePoint);

Replaces the contents of self with the code unit sequence required to represent codePoint in the UTF-16 encoding form and returns a reference to self.

RWUString&
operator+=(const RWBasicUString& source);

Appends the contents of source to the contents of self.

RWUString&
operator+=(const RWUSubString& source);
RWUString&
operator+=(const RWUConstSubString& source);

Appends the contents of source to the contents of self.

RWUString&
operator+=(const RWUChar16* source);

Appends the contents of the null-terminated array source to the contents of self.

RWUString&
operator+=(const RWUChar32* source);

Converts the code points in the null-terminated array source into UTF-16 code units and appends those code units to the contents of self.

RWUString&
operator+=(RWUChar16 codeUnit);

Appends codeUnit to the contents of self.

RWUString&
operator+=(RWUChar32 codePoint);

Appends codePoint to the contents of self.

RWUChar16&
operator()(size_t offset);

Returns a reference to the code unit at the position specified by offset. This non-const variant can be used as an l-value in an assignment expression. Throws RWBoundsErr if the value of offset is greater than or equal to length(). Note that an individual code unit may not represent a complete code point.

RWUChar16
operator()(size_t offset) const;

Returns the code unit value at the position specified by offset. Throws RWBoundsErr if the value of offset is greater than or equal to length(). Note that an individual code unit may not represent a complete code point.

RWUSubString
operator()(size_t offset, size_t length);

Returns a substring within self that starts at offset and has an extent of length code units. The result can be used as an l-value. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUConstSubString
operator()(size_t offset, size_t length) const;

Returns a substring within self that starts at offset and has an extent of length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().

Public Member Functions

RWUString&
append(const RWBasicUString& source);

Appends the contents of source to the contents of self, and returns a reference to self.

RWUString&
append(const RWBasicUString& source, size_t sourceOffset, 
       size_t sourceLength);

Appends the contents of the specified range in source to the contents of self, and returns a reference to self. Throws RWBoundsErr if the sum of sourceOffset and sourceLength is greater than source.length().

RWUString&
append(const RWUSubString& source);
RWUString&
append(const RWUConstSubString& source);

Appends the contents of source to the contents of self, and returns a reference to self.

RWUString&
append(const RWUChar16* source);

Appends the contents of the null-terminated array source to the contents of self, and returns a reference to self.

RWUString&
append(const RWUChar16* source, size_t sourceLength);

Appends the contents of the array source to the contents of self, and returns a reference to self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls.

RWUString&
append(const RWUChar32* source);

Converts the code points in the null-terminated array source into UTF-16 code units and appends those code units to the contents of self. Returns a reference to self.

RWUString&
append(const RWUChar32* source, size_t sourceLength);

Converts the code points in the array source into UTF-16 code units and appends those code units to the contents of self. Returns a reference to self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls.

RWUString&
append(RWUChar16 codeUnit, size_t repeat = 1);

Appends repeat copies of codeUnit to the contents of self, and returns a reference to self. The code unit may be zero (null).

RWUString&
append(RWUChar32 codePoint, size_t repeat = 1);

Converts the UTF-32 codePoint into its equivalent UTF-16 representation and appends repeat copies of the resultant code unit or surrogate pair of code units to the contents of self. Returns a reference to self. The code point may be zero (null).

RWUStringIterator
beginCodePointIterator();

Returns an iterator that points to the first code point of self.

RWUConstStringIterator
beginCodePointIterator() const;

Returns an iterator that points to the first code point of self.

int
compareCodePoints(const RWUSubString& rhs) const;
int
compareCodePoints(const RWUConstSubString& rhs) const;

Returns a value that describes the lexical ordering between self and rhs. The return value should be interpreted as follows:

Self appears before rhs if the return value is less than zero.
Self is identical to rhs if the return value is zero.
Self appears after rhs if the return value is greater than zero.

int
compareCodeUnits(const RWUSubString& rhs) const;
int
compareCodeUnits(const RWUConstSubString& rhs) const;

Returns a value that describes the lexical ordering between self and rhs. The return value should be interpreted as follows:

Self appears before rhs if the return value is less than zero.
Self is identical to rhs if the return value is zero.
Self appears after rhs if the return value is greater than zero.

This method compares code unit values, not code point values. This may not produce the desired result if either string contains surrogate pairs or code unit values above the surrogate region. Use compareCodePoints() if code point ordering is required.
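
The difference between the two orderings can be illustrated with a small standalone sketch. It uses std::u16string rather than RWUString, and the helpers nextCodePoint, compareUnits, and compareCodePointOrder are hypothetical names introduced for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the code point that starts at s[i] and
// advances i past it. Unpaired surrogates are returned unchanged.
char32_t nextCodePoint(const std::u16string& s, std::size_t& i)
{
    const char16_t hi = s[i++];
    if (hi >= 0xD800 && hi <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                           + (static_cast<char32_t>(lo) - 0xDC00);
        }
    }
    return hi;
}

// Code unit ordering: negative, zero, or positive.
int compareUnits(const std::u16string& a, const std::u16string& b)
{
    return a.compare(b);
}

// Code point ordering: negative, zero, or positive.
int compareCodePointOrder(const std::u16string& a, const std::u16string& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const char32_t ca = nextCodePoint(a, i);
        const char32_t cb = nextCodePoint(b, j);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (i < a.size()) return 1;   // a has code points left over
    if (j < b.size()) return -1;  // b has code points left over
    return 0;
}
```

For example, U+FF5E is stored as the single code unit 0xFF5E, while U+10000 is stored as the pair 0xD800 0xDC00. A code unit comparison places U+10000 first because 0xD800 is less than 0xFF5E, whereas a code point comparison places U+FF5E first.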

int
compareTo(const RWUSubString& rhs) const;
int
compareTo(const RWUConstSubString& rhs) const;

Returns a value that describes the lexical ordering between self and rhs. Equivalent to compareCodeUnits().

bool
contains(const RWUSubString& pattern) const;
bool 
contains(const RWUConstSubString& pattern) const;

Returns true if self contains pattern; otherwise, false. A zero-length pattern returns true.

bool
contains(size_t offset, const RWUSubString& pattern) const;
bool
contains(size_t offset,
         const RWUConstSubString& pattern) const;

Returns true if the specified range in self contains pattern; otherwise, false. The search begins at index offset within self. A zero-length pattern returns true. Throws RWBoundsErr if offset is greater than or equal to length().

bool
contains(size_t offset, size_t length,
         const RWUSubString& pattern) const;
bool
contains(size_t offset, size_t length,
         const RWUConstSubString& pattern) const;

Returns true if the specified range in self contains pattern; otherwise, false. The search begins at index offset within self, and extends for length code units. A zero-length pattern returns true. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUStringIterator
endCodePointIterator();

Returns an iterator that points to the position after the last code point of self.

RWUConstStringIterator
endCodePointIterator() const;

Returns an iterator that points to the position after the last code point of self.

size_t
first(const RWUSubString& codeUnitSet) const;
size_t
first(const RWUConstSubString& codeUnitSet) const;

Returns the index of the first occurrence of any code unit in codeUnitSet, or RW_NPOS if none of the code units in the set are found.

size_t
first(size_t offset, const RWUSubString& codeUnitSet) const;
size_t
first(size_t offset,
      const RWUConstSubString& codeUnitSet) const;

Returns the index of the first occurrence of any code unit in codeUnitSet, or RW_NPOS if none of the code units in the set are found. The search range starts at index offset within self, and extends through the length of self. Throws RWBoundsErr if offset is greater than or equal to length().

size_t
first(size_t offset, size_t length,
      const RWUSubString& codeUnitSet) const;
size_t
first(size_t offset, size_t length,
      const RWUConstSubString& codeUnitSet) const;

Returns the index of the first occurrence of any code unit in codeUnitSet, or RW_NPOS if none of the code units in the set are found. The search range starts at index offset within self, and extends for length code units within self. Throws RWBoundsErr if the sum of offset and length is greater than length().
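
The range semantics of the three-argument form can be sketched as follows. This is a standalone approximation over std::u16string, not the library's code; the name firstOf is hypothetical, std::u16string::npos stands in for RW_NPOS, and std::out_of_range stands in for RWBoundsErr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

const std::size_t kNotFound = std::u16string::npos;  // stand-in for RW_NPOS

// Hypothetical helper: index of the first code unit in
// self[offset, offset + length) that also appears in codeUnitSet.
std::size_t firstOf(const std::u16string& self, std::size_t offset,
                    std::size_t length, const std::u16string& codeUnitSet)
{
    // Reject ranges that extend past the end of the string.
    if (offset > self.size() || length > self.size() - offset)
        throw std::out_of_range("offset + length exceeds length()");
    for (std::size_t i = offset; i < offset + length; ++i)
        if (codeUnitSet.find(self[i]) != std::u16string::npos)
            return i;  // index is relative to the start of self
    return kNotFound;
}
```

Note that the returned index is relative to the start of the whole string, not to the start of the searched range.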

RWUString&
foldCase(bool excludeSpecial = false);

Changes all letters in self into a locale-independent, case-neutral representation suitable for use in case-less, lexical comparisons. Returns a reference to self.

Turkic languages (tr and az) include I-dot and i-dotless variations of the Latin letter I. The mapping used for these characters is controlled by excludeSpecial. If excludeSpecial is false, all I forms are folded to the Latin small letter i.

If excludeSpecial is true, the special Turkic I-dot and i-dotless forms are not mapped and are left unchanged.

The length of the result may differ from that of the original. This function supports simple caseless comparisons; use RWUCollator when more robust behavior is required.

Notes: This method produces a "full" case mapping where some characters are decomposed into multiple code points. This differs from the single code point mapping provided by RWUCharTraits, which provides only a simple one-to-one mapping.
Case folding does not preserve the normalization form of the source string; the result may require renormalization.
For more information on full versus simple case mapping, see UTR-21 Case Mappings, at http://www.unicode.org/unicode/reports/tr21.
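
To illustrate why a full case folding can change the length of a string, the following standalone sketch folds only ASCII letters and U+00DF (sharp s). It is a deliberately tiny stand-in, not the folding table used by foldCase(); the name foldCaseSketch is hypothetical and std::u16string replaces RWUString.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal folding: ASCII letters are lowercased, and
// U+00DF (sharp s) is expanded to the two code units "ss", as in the
// full Unicode case folding. Every other code unit is copied unchanged.
std::u16string foldCaseSketch(const std::u16string& source)
{
    std::u16string folded;
    for (char16_t unit : source) {
        if (unit >= u'A' && unit <= u'Z')
            folded.push_back(static_cast<char16_t>(unit + (u'a' - u'A')));
        else if (unit == 0x00DF)
            folded += u"ss";  // one code point becomes two
        else
            folded.push_back(unit);
    }
    return folded;
}
```

With this sketch, "STRASSE" and "Stra" + U+00DF + "e" differ as stored but compare equal after folding, and the folded form of the second string is one code unit longer than its source.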

size_t
index(const RWUSubString& pattern) const;
size_t
index(const RWUConstSubString& pattern) const;

Returns the index of the first occurrence of pattern, or RW_NPOS if the pattern was not found. An index value of zero is returned if the pattern length is zero.

size_t
index(size_t offset, const RWUSubString& pattern) const;
size_t
index(size_t offset, const RWUConstSubString& pattern) const;

Returns the index of the first occurrence of pattern, or RW_NPOS if the pattern was not found. The search begins at index position offset within self, and extends through the length of self. An index value of zero is returned if the pattern length is zero. Throws RWBoundsErr if offset is greater than or equal to length().

size_t
index(size_t offset, size_t length,
      const RWUSubString& pattern) const;
size_t 
index(size_t offset, size_t length,
      const RWUConstSubString& pattern) const;

Returns the index of the first occurrence of pattern, or RW_NPOS if the pattern was not found. The search begins at index position offset within self, and extends for length code units. An index value of zero is returned if the pattern length is zero. Throws RWBoundsErr if the sum of offset and length is greater than length().
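
A minimal sketch of the ranged search, written against std::u16string, is shown below. It follows the behavior as documented above (including the zero return for an empty pattern) but is not the library's implementation; indexOf is a hypothetical name, std::u16string::npos stands in for RW_NPOS, and std::out_of_range stands in for RWBoundsErr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: index of the first occurrence of pattern that lies
// entirely inside self[offset, offset + length).
std::size_t indexOf(const std::u16string& self, std::size_t offset,
                    std::size_t length, const std::u16string& pattern)
{
    if (offset > self.size() || length > self.size() - offset)
        throw std::out_of_range("offset + length exceeds length()");
    if (pattern.empty())
        return 0;  // documented result for a zero-length pattern
    const std::size_t pos = self.find(pattern, offset);
    // A match that runs past the end of the range does not count.
    if (pos == std::u16string::npos || pos + pattern.size() > offset + length)
        return std::u16string::npos;
    return pos;
}
```

Because std::u16string::find returns the earliest match at or after offset, a single bounds check on that match is enough: if the earliest match does not fit in the range, no later one can.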

RWUString&
insert(size_t offset, const RWBasicUString& source);

Inserts the contents of source before the code unit at index offset within the contents of self. Throws RWBoundsErr if offset is greater than or equal to length().

RWUString&
insert(size_t offset, const RWBasicUString& source,
       size_t sourceOffset, size_t sourceLength);

Inserts the contents of the specified range in source before the code unit at index offset within the contents of self. The range in source begins at sourceOffset, and extends for sourceLength code units. Throws RWBoundsErr if offset is greater than length(), or if the sum of sourceOffset and sourceLength is greater than source.length().

RWUString&
insert(size_t offset, const RWUSubString& source);
RWUString&
insert(size_t offset, const RWUConstSubString& source);

Inserts the contents of source before the code unit at index offset within the contents of self. Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, const RWUChar16* source);

Inserts the contents of the null-terminated array source before the code unit at index offset within the contents of self. Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, const RWUChar16* source,
       size_t sourceLength);

Inserts the contents of the array source before the code unit at index offset within the contents of self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls. Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, const RWUChar32* source);

Converts the code points in the null-terminated array source into UTF-16 code units and inserts those code units before the code unit at index offset within the contents of self. Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, const RWUChar32* source,
       size_t sourceLength);

Converts the code points in the array source into UTF-16 code units and inserts those code units before the code unit at index offset within the contents of self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls. Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, RWUChar16 codeUnit, size_t repeat = 1);

Inserts repeat copies of codeUnit before the code unit at index offset within the contents of self. The code unit may be zero (null). Throws RWBoundsErr if offset is greater than length().

RWUString&
insert(size_t offset, RWUChar32 codePoint, size_t repeat = 1);

Converts the UTF-32 codePoint into its equivalent UTF-16 representation and inserts repeat copies of the resultant code unit or surrogate pair of code units before the code unit at index offset within the contents of self. The code point may be zero (null). Throws RWBoundsErr if offset is greater than or equal to length().

size_t
last(const RWUSubString& codeUnitSet) const;
size_t
last(const RWUConstSubString& codeUnitSet) const;

Returns the index of the last occurrence of any code unit in codeUnitSet in self, or RW_NPOS if none of the code units in the set are found.

size_t
last(size_t offset, const RWUSubString& codeUnitSet) const;
size_t
last(size_t offset,
     const RWUConstSubString& codeUnitSet) const;

Returns the index of the last occurrence of any code unit in codeUnitSet from offset to the end of self, or RW_NPOS if none of the code units in the set were found. The search begins at the end of self and continues backward to location offset, at the front of the range. Throws RWBoundsErr if offset is greater than length().

size_t
last(size_t offset, size_t length,
     const RWUSubString& codeUnitSet) const;
size_t
last(size_t offset, size_t length,
     const RWUConstSubString& codeUnitSet) const;

Returns the index of the last occurrence of any code unit in codeUnitSet within the specified range in self, or RW_NPOS if none of the code units in the set were found. The search begins at location offset + length within self and continues backward to location offset, at the front of the range. Throws RWBoundsErr if the sum of offset and length is greater than length().
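
The backward search over a range can be sketched as follows. This is a standalone illustration over std::u16string, not the library's code; lastOf is a hypothetical name, std::u16string::npos stands in for RW_NPOS, and std::out_of_range stands in for RWBoundsErr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: index of the last code unit in
// self[offset, offset + length) that also appears in codeUnitSet.
std::size_t lastOf(const std::u16string& self, std::size_t offset,
                   std::size_t length, const std::u16string& codeUnitSet)
{
    if (offset > self.size() || length > self.size() - offset)
        throw std::out_of_range("offset + length exceeds length()");
    // Walk backward from the end of the range toward offset.
    for (std::size_t i = offset + length; i > offset; --i)
        if (codeUnitSet.find(self[i - 1]) != std::u16string::npos)
            return i - 1;
    return std::u16string::npos;
}
```

The loop index runs one past the element being tested so that the unsigned counter never has to go below zero when offset is 0.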

RWUString&
normalize(NormalizationForm form = FormNFC);

Transforms the contents of self to the specified normalization form.

Many characters have a variety of forms, including diacritics or contextual variants. A "decorated" character (such as å, for example) may be represented in Unicode either by a single code point, or a sequence of code points consisting of the base character and one or more combining characters that modify it. It is possible that within a given string or text file, the same abstract character may be represented in a variety of ways. This variability must be handled if comparisons are to be meaningful.

Normalization is used to transform a string into a predictable sequence of code points. There are four basic normalization forms for strings, represented by the following enum values in RWUString::NormalizationForm:

FormNFC (Normalization Form Composed)
FormNFD (Normalization Form Decomposed)
FormNFKD (Normalization Form Compatibility Decomposed)
FormNFKC (Normalization Form Compatibility Composed)

This method transforms self into a specific normalization form. See RWUNormalizer for complete information about normalization and normalization forms, and Section 5.3, "Normalization Forms," in the Internationalization Module User's Guide.
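
The need for normalization can be seen in a small standalone sketch. The letter å may be stored as the single code point U+00E5 or as U+0061 followed by the combining ring U+030A; the two sequences compare unequal until one is transformed into the other. The function composeSketch below is hypothetical and handles only this one pair, whereas a real composed form relies on the full Unicode composition data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, one-entry composition table: LATIN SMALL LETTER A (U+0061)
// followed by COMBINING RING ABOVE (U+030A) is replaced by U+00E5.
// All other code units are copied unchanged.
std::u16string composeSketch(const std::u16string& source)
{
    std::u16string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == 0x0061 && i + 1 < source.size()
            && source[i + 1] == 0x030A) {
            out.push_back(char16_t(0x00E5));
            ++i;  // skip the combining mark that was absorbed
        } else {
            out.push_back(source[i]);
        }
    }
    return out;
}
```

Applying the sketch to both representations yields the same code unit sequence, which is what makes a subsequent comparison meaningful.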

RWUString&
prepend(const RWBasicUString& source);

Prepends the contents of source to the contents of self. Returns a reference to self.

RWUString&
prepend(const RWBasicUString& source, size_t sourceOffset, 
        size_t sourceLength);

Prepends the contents of the specified range in source to the contents of self. Returns a reference to self. The range begins at index sourceOffset within source and extends for sourceLength code units. Throws RWBoundsErr if the sum of sourceOffset and sourceLength is greater than source.length().

RWUString&
prepend(const RWUSubString& source);
RWUString& 
prepend(const RWUConstSubString& source);

Prepends the contents of source to the contents of self. Returns a reference to self.

RWUString&
prepend(const RWUChar16* source);

Prepends the contents of the null-terminated array source to the contents of self. Returns a reference to self.

RWUString&
prepend(const RWUChar16* source, size_t sourceLength);

Prepends the contents of the array source to the contents of self. Returns a reference to self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls.

RWUString&
prepend(const RWUChar32* source);

Converts the code points in the null-terminated array source into UTF-16 code units and prepends those code units to the contents of self. Returns a reference to self.

RWUString&
prepend(const RWUChar32* source, size_t sourceLength);

Converts the code points in the array source into UTF-16 code units and prepends those code units to the contents of self. Returns a reference to self. The size of the source array is specified using sourceLength. The source array may contain embedded nulls.

RWUString&
prepend(RWUChar16 codeUnit, size_t repeat = 1);

Prepends repeat copies of codeUnit to the contents of self. Returns a reference to self. The code unit may be zero (null).

RWUString&
prepend(RWUChar32 codePoint, size_t repeat = 1);

Converts the UTF-32 codePoint into its equivalent UTF-16 representation and prepends repeat copies of the resultant code unit or surrogate pair of code units to the contents of self. Returns a reference to self. The code point may be zero (null).

RWUString&
remove(size_t offset = 0);

Removes the range of code units that start at offset and extend through the end of self. Returns a reference to self. Throws RWBoundsErr if offset is greater than or equal to length().

RWUString&
remove(size_t offset, size_t length);

Removes the specified range of code units from the contents of self, and collapses the contents as necessary to produce a contiguous result. Returns a reference to self. The range begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length,
        const RWBasicUString& source);

Replaces the specified range of code units in self with the contents of source. Returns a reference to self. The range begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length,
        const RWBasicUString& source, size_t sourceOffset, 
        size_t sourceLength);

Replaces the specified range of code units in self with the specified range of code units in source. Returns a reference to self. The range in self begins at index offset and extends for length code units. The range in source begins at index sourceOffset and extends for sourceLength code units. Throws RWBoundsErr if the sum of offset and length is greater than length(), or the sum of sourceOffset and sourceLength is greater than source.length().

RWUString&
replace(size_t offset, size_t length,
        const RWUSubString& source);
RWUString&
replace(size_t offset, size_t length,
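        const RWUConstSubString& source);

Replaces the specified range of code units in self with the contents of source. Returns a reference to self. The range begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().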

RWUString&
replace(size_t offset, size_t length,
        const RWUChar16* source);

Replaces the specified range of code units in self with the contents of the null-terminated array source. Returns a reference to self. The range begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length, const RWUChar16* source,
        size_t sourceLength);

Replaces the specified range of code units in self with the contents of the array source. Returns a reference to self. The range begins at index offset within self and extends for length code units. The size of the source array is specified using sourceLength. The source array may contain embedded nulls. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length,
        const RWUChar32* source);

Converts the code points in the null-terminated array source into UTF-16 code units and replaces the specified range of code units in self with those code units. Returns a reference to self. The range begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length, const RWUChar32* source, 
        size_t sourceLength);

Converts the code points in the array source into UTF-16 code units and replaces the specified range of code units in self with those code units. Returns a reference to self. The range begins at index offset within self and extends for length code units. The size of the source array is specified using sourceLength. The source array may contain embedded nulls. Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length, RWUChar16 codeUnit, 
        size_t repeat = 1);

Replaces the specified range of code units in self with repeat copies of codeUnit. Returns a reference to self. The range begins at index offset within self and extends for length code units. The code unit may be zero (null). Throws RWBoundsErr if the sum of offset and length is greater than length().

RWUString&
replace(size_t offset, size_t length, RWUChar32 codePoint, 
        size_t repeat = 1);

Converts the UTF-32 codePoint into its equivalent UTF-16 representation and replaces the specified range of code units in self with repeat copies of the resultant code unit or surrogate pair of code units. Returns a reference to self. The range begins at index offset within self and extends for length code units. The code point may be zero (null). Throws RWBoundsErr if the sum of offset and length is greater than length().

size_t
rindex(const RWUSubString& pattern) const;
size_t 
rindex(const RWUConstSubString& pattern) const;

Returns the index of the last occurrence of pattern, or RW_NPOS if the pattern was not found. An index value equal to length() is returned if the pattern length is zero.

size_t
rindex(size_t offset, const RWUSubString& pattern) const;
size_t
rindex(size_t offset, const RWUConstSubString& pattern) const;

Returns the index of the last occurrence of pattern, or RW_NPOS if the pattern was not found. An index value equal to length() is returned if the pattern length is zero. The search begins at index offset within self and extends through the length of self. Throws RWBoundsErr if offset is greater than or equal to length().

size_t
rindex(size_t offset, size_t length,
       const RWUSubString& pattern) const;
size_t
rindex(size_t offset, size_t length,
       const RWUConstSubString& pattern) const;

Returns the index of the last occurrence of pattern, or RW_NPOS if the pattern was not found. An index value equal to length() is returned if the pattern length is zero. The search begins at index offset within self and extends for length code units. Throws RWBoundsErr if the sum of offset and length is greater than or equal to length().

RWUSubString
strip(StripType stripType = Trailing,
      RWUChar32 codePoint = static_cast<RWUChar32>(0x0020));

Returns a substring of self in which the character codePoint has been stripped off the beginning of the string, the end of the string, or both, as specified by stripType. The strip code point defaults to U+0020, the ASCII space character.
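
The three strip modes can be sketched over std::u16string as follows. This is an illustration only: stripRange and StripKind are hypothetical names, the result is returned as a (start, length) pair rather than as a substring object, and the strip character is assumed to fit in a single code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum StripKind { Leading, Trailing, Both };  // hypothetical stand-in for StripType

// Returns the (start, length) of the range that remains after stripping.
// The source string itself is not modified.
std::pair<std::size_t, std::size_t>
stripRange(const std::u16string& self, StripKind kind = Trailing,
           char16_t unit = 0x0020)
{
    std::size_t begin = 0, end = self.size();
    if (kind == Leading || kind == Both)
        while (begin < end && self[begin] == unit) ++begin;
    if (kind == Trailing || kind == Both)
        while (end > begin && self[end - 1] == unit) --end;
    return std::make_pair(begin, end - begin);
}
```

Returning a range instead of a copy reflects the idea that the result refers to a part of the original string.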

RWUConstSubString
strip(StripType stripType = Trailing,
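      RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)) const;

Returns a substring of self in which the character codePoint has been stripped off the beginning of the string, the end of the string, or both, as specified by stripType. The strip code point defaults to U+0020, the ASCII space character.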

RWUSubString
subString(const RWUString& pattern, size_t offset = 0);

Returns a substring representing the first occurrence of pattern following position offset. The result may be used as an lvalue. The result has a start position of RW_NPOS and a length of zero if no occurrences of the pattern could be found.

RWUConstSubString
subString(const RWUString& pattern, size_t offset = 0) const;

Returns a substring representing the first occurrence of pattern following position offset. The result has a start position of RW_NPOS and a length of zero if no occurrences of the pattern could be found.

RWCString
toBytes(RWUFromUnicodeConverter& converter = 
 RWUFromUnicodeConversionContext::getContext().getConverter()) 
 const;

Returns an RWCString instance that contains the sequence of bytes that are produced when the contents of self are converted into another character encoding scheme using converter. See also RWUFromUnicodeConversionContext and RWUFromUnicodeConverter.

RWUString&
toLower();

Changes all letters in self to lowercase using the case-mapping rules of the current default locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
toLower(const RWULocale& locale);

Changes all letters in self to lowercase using the case-mapping rules of the specified locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
toTitle();

Changes all words in self to titlecase using the case-mapping and word-break rules of the default locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
toTitle(const RWULocale& locale);

Changes all words in self to titlecase using the case-mapping and word-break rules of the specified locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
toUpper();

Changes all letters in self to uppercase using the case-mapping rules of the default locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
toUpper(const RWULocale& locale);

Changes all letters in self to uppercase using the case-mapping rules of the specified locale. The length of the result may be different than that of the original. Returns a reference to self.

RWUString&
unescape();

Parses the contents of self and replaces recognized escape sequences with the equivalent Unicode code unit representation. The following escape sequences are recognized:

\uhhhh = 4 hex digits in the range [0-9A-Fa-f]
\Uhhhhhhhh = 8 hex digits
\xhh = 1 or 2 hex digits
\ooo = 1, 2 or 3 octal digits in the range [0-7]
\a = U+0007; alert (BEL)
\b = U+0008; backspace (BS)
\t = U+0009; horizontal tab (HT)
\n = U+000A; newline/line feed (LF)
\v = U+000B; vertical tab (VT)
\f = U+000C; form feed (FF)
\r = U+000D; carriage return (CR)
\" = U+0022; double quote
\' = U+0027; single quote
\? = U+003F; question mark
\\ = U+005C; backslash

The value of any other escape sequence is the value of the character that follows the backslash. If an escape sequence is ill-formed, this method throws RWConversionErr with an ILLEGALSEQ message. A static version of this method is also provided.
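
A reduced version of this parsing can be sketched over std::u16string. The sketch handles only \uhhhh, the single-letter escapes listed above, and the "any other character" rule; the \U, \x, and octal forms are rejected rather than decoded. The names hexValue and unescapeSketch are hypothetical, and std::invalid_argument stands in for RWConversionErr.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if c is not one.
int hexValue(char16_t c)
{
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

std::u16string unescapeSketch(const std::u16string& source)
{
    std::u16string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] != u'\\') { out.push_back(source[i]); continue; }
        if (++i == source.size())
            throw std::invalid_argument("dangling backslash");
        const char16_t kind = source[i];
        switch (kind) {
        case u'a': out.push_back(char16_t(0x0007)); break;
        case u'b': out.push_back(char16_t(0x0008)); break;
        case u't': out.push_back(char16_t(0x0009)); break;
        case u'n': out.push_back(char16_t(0x000A)); break;
        case u'v': out.push_back(char16_t(0x000B)); break;
        case u'f': out.push_back(char16_t(0x000C)); break;
        case u'r': out.push_back(char16_t(0x000D)); break;
        case u'u': {
            if (source.size() - i - 1 < 4)
                throw std::invalid_argument("\\u needs 4 hex digits");
            char16_t unit = 0;
            for (int k = 0; k < 4; ++k) {
                const int digit = hexValue(source[++i]);
                if (digit < 0)
                    throw std::invalid_argument("bad hex digit after \\u");
                unit = static_cast<char16_t>(unit * 16 + digit);
            }
            out.push_back(unit);
            break;
        }
        default:
            // \U, \x, and octal escapes are outside the scope of this sketch.
            if (kind == u'U' || kind == u'x' || (kind >= u'0' && kind <= u'7'))
                throw std::invalid_argument("escape not handled by this sketch");
            out.push_back(kind);  // e.g. \\ gives a backslash, \" a quote
        }
    }
    return out;
}
```

The input string is left untouched and a new string is returned, which keeps the sketch simple; the member function described above modifies self instead.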

Class Pad

RWUString::Pad defines an iostream manipulator that can be used to insert the contents of an RWUString ustr into an output stream os, padding the string with the specified fill character until os.width() code points have been written to the stream.

If the length of ustr is greater than os.width(), the string is truncated and no padding occurs. If os.width() is zero, the entire contents of the string are inserted into the stream and no padding occurs.
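
The width rules above can be sketched without a stream. The function below is a standalone approximation over std::u16string, not the manipulator's implementation: padOrTruncate is a hypothetical name, the width is passed explicitly instead of being read from os.width(), the result is returned as a string, and the fill character is assumed to fit in a single code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: returns ustr padded or truncated to `width`
// code points. A width of zero returns the string unchanged.
std::u16string padOrTruncate(const std::u16string& ustr, std::size_t width,
                             char16_t fill = 0x0020)
{
    if (width == 0)
        return ustr;  // no width set: emit everything, no padding
    std::u16string out;
    std::size_t codePoints = 0;
    std::size_t i = 0;
    while (i < ustr.size() && codePoints < width) {
        // Keep a well-formed surrogate pair together as one code point.
        const bool pair = isHighSurrogate(ustr[i]) && i + 1 < ustr.size()
                          && isLowSurrogate(ustr[i + 1]);
        const std::size_t step = pair ? 2 : 1;
        out.append(ustr, i, step);
        i += step;
        ++codePoints;
    }
    out.append(width - codePoints, fill);  // adds nothing if already full
    return out;
}
```

Counting code points rather than code units means that a string containing surrogate pairs can occupy more code units than the requested width.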

An RWUString::Pad instance is only valid as long as the source string remains unchanged. Do not create persistent instances of this class; this class should only be instantiated as a temporary in an insertion expression.

RWUString ustr = ...;
std::cout << RWUString::Pad(ustr, static_cast<RWUChar32>('.'))
          << std::endl;

Public Constructors

Pad(const RWUString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)); 
Pad(const RWUSubString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020)); 
Pad(const RWUConstSubString& ustr,
    RWUChar32 codePoint = static_cast<RWUChar32>(0x0020));

Constructs a manipulator instance.

Pad(const RWUString::Pad& source);

Constructs a copy of a manipulator instance.

© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.