Module: Essential Tools Module Group: String Processing Classes
Does not inherit
operator!=() | operator>>() | operator>() |
operator>=() | operator<<() | operator<() |
operator<=() | operator+() | operator==() |
strXForm() | toLower() | toUpper() |
#include <rw/wstring.h>
RWWString a;
Class RWWString offers very powerful and convenient facilities for manipulating wide character strings.
NOTE -- RWWString is designed for use with wide character strings. To manipulate multibyte character sequences, use RWCString.
When building on top of the standard library, RWWString uses an alternate implementation that is a thin wrapper on top of std::wstring. The RWWString interface remains the same, with the addition of one method for easy conversion: std::wstring& std();. For applications that perform many RWWString-to-std::wstring conversions, significant speed improvements might be obtained by using the standard library implementation.
NOTE -- Member function overloads with std::wstring or RWWConstSubString only appear when building on top of the C++ Standard Library.
This string class manipulates wide characters of the fundamental type wchar_t. These characters are generally two or four bytes, and can be used to encode richer code sets than the classic "char" type. Because wchar_t characters are all the same size, indexing is fast.
Conversions to and from multibyte and ASCII forms are provided by the RWWString constructors and by the RWWString member functions isAscii(), toAscii(), and toMultiByte().
Stream operations implicitly translate to and from the multibyte stream representation. That is, on output, wide character strings are converted into multibyte strings, while on input they are converted back into wide character strings. Hence, the external representation of wide character strings is usually as multibyte character strings, saving storage space and making interfaces with devices (which usually expect multibyte strings) easier.
RWWStrings tolerate embedded nulls.
Parameters of type "const wchar_t*" must not be passed a value of zero. This is detected in the debug version of the library.
A separate RWWSubString class supports substring extraction and modification operations.
Simple
#include <iostream>
#include <rw/wstring.h>

int main ()
{
    RWWString a(L"There is no joy in Beantown");
    std::cout << a << std::endl << "becomes......" << std::endl;
    a.subString(L"Beantown") = L"Redmond";
    std::cout << a << std::endl;
    return 0;
}
enum RWWString::caseCompare { exact, ignoreCase };
Specifies whether comparisons, searches, and hashing functions should use case sensitive (exact) or case-insensitive (ignoreCase) semantics.
enum RWWString::multiByte_ { multiByte };
Allows conversion from multibyte character strings to wide character strings. See constructor below.
enum RWWString::ascii_ {ascii };
Allows conversion from ASCII character strings to wide character strings. See constructor below.
RWWString();
Creates a string of length zero (the null string).
RWWString(const std::wstring& a);
Constructs an RWWString that contains a copy of the standard wide string a. Only available in standard library builds of the Essential Tools Module.
RWWString(const wchar_t* cs);
Creates a string from the wide character string cs. The created string will copy the data pointed to by cs, up to the first terminating null.
RWWString(const wchar_t* cs, size_t N);
Constructs a string from the wide character string cs. The created string will copy the data pointed to by cs. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N wide characters (N*sizeof(wchar_t) bytes) long.
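The difference between the null-terminated and the counted constructor can be sketched with std::wstring, which has the same two construction forms. This is an illustration only: copy_to_null and copy_exact are hypothetical helper names, not part of RWWString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the (cs) form: copies up to the first null.
std::wstring copy_to_null(const wchar_t* cs)
{
    assert(cs != nullptr);  // a null pointer is a caller error
    return std::wstring(cs);
}

// Hypothetical stand-in for the (cs, N) form: copies exactly n wide
// characters, embedded nulls included. cs must point to at least n characters.
std::wstring copy_exact(const wchar_t* cs, std::size_t n)
{
    assert(cs != nullptr);
    return std::wstring(cs, n);
}
```

For the five-character buffer L"ab\0cd", the first helper yields a string of length 2 and the second a string of length 5.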
RWWString(RWSize_T ic);
Creates a string of length zero (the null string). The string's capacity (that is, the size it can grow without resizing) is given by the parameter ic.
RWWString(const RWWString& str);
Copy constructor. The created string will copy str's data.
RWWString(const RWWSubString& ss);
Converts from sub-string. The created string will copy the substring represented by ss.
RWWString(const RWWConstSubString& ss);
Converts from substring. The created wide string copies the substring represented by ss. Only available in standard library builds of the Essential Tools Module.
RWWString(char c);
Constructs a string containing the single character c.
RWWString(char c, size_t N);
Constructs a string containing the character c repeated N times.
RWWString(const char* mbcs, multiByte_ mb); RWWString(const RWCString& s, multiByte_); RWWString(const RWCString&, multiByte_ mb);
Constructs a wide character string from the multibyte character string contained in mbcs. The conversion is done using the Standard C Library function ::mbstowcs(). This constructor can be used as follows:
RWWString a("\306\374\315\313\306\374", RWWString::multiByte);
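As a rough sketch of the kind of conversion ::mbstowcs() performs, the following uses only the C++ standard library. The helper name from_multibyte is hypothetical, and the result depends on the LC_CTYPE category of the current C locale (set with std::setlocale); in the default "C" locale only plain ASCII input is guaranteed to convert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: converts a null-terminated multibyte string to a
// wide string. Returns an empty string if the input is not a valid
// multibyte sequence in the current locale.
std::wstring from_multibyte(const char* mbcs)
{
    assert(mbcs != nullptr);
    // A multibyte string of B bytes never yields more than B wide characters.
    std::vector<wchar_t> buf(std::strlen(mbcs) + 1);
    const std::size_t n = std::mbstowcs(buf.data(), mbcs, buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    return std::wstring(buf.data(), n);
}
```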
RWWString(const char* acs, ascii_ asc); RWWString(const RWCString& s, ascii_ ); RWWString(const RWCString&, ascii_ asc);
Constructs a wide character string from the ASCII character string contained in acs. The conversion is done by simply stripping the high-order bit and, hence, is much faster than the more general constructor given immediately above. For this conversion to be successful, you must be certain that the string contains only ASCII characters. This can be confirmed (if necessary) using RWCString::isAscii(). This constructor can be used as follows:
RWWString a("An ASCII character string", RWWString::ascii);
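The high-bit stripping described above can be sketched without the library as follows. The names is_ascii and from_ascii are hypothetical, and the code assumes an ASCII-compatible execution character set.

```cpp
#include <string>

// Hypothetical check: true if every byte has its high-order bit clear.
bool is_ascii(const std::string& s)
{
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical conversion: widens each byte after masking off the high bit.
// Non-ASCII input is silently mapped to some other character, which is why
// the input should be checked first.
std::wstring from_ascii(const std::string& acs)
{
    std::wstring out;
    out.reserve(acs.size());
    for (unsigned char c : acs) {
        out.push_back(static_cast<wchar_t>(c & 0x7F));
    }
    return out;
}
```

Because no locale lookup is involved, this costs one mask and one widening per character, which is consistent with the speed claim above.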
RWWString(const char* cs, size_t N, multiByte_ mb); RWWString(const char* cs, size_t N, ascii_ asc);
These two constructors are similar to the two constructors immediately above except that they copy exactly N characters, including any embedded nulls. Hence, the buffer pointed to by cs must be long enough to hold N characters (which, in the case of multibyte strings, does not necessarily correspond to N bytes).
operator const wchar_t*() const;
Access to the RWWString's data as a null terminated wide string. This datum is owned by the RWWString and may not be deleted or changed. If the RWWString object itself changes or goes out of scope, the pointer value previously returned will become invalid. While the string is null-terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
RWWString& operator=(const RWWString& str);
Assignment operator. The string will copy str's data. Returns a reference to self.
RWWString& operator=(const RWWSubString& sub); RWWString& operator=(const std::wstring& sub);
Assignment operator. The string will copy sub's data. Returns a reference to self.
RWWString& operator+=(const wchar_t* cs);
Appends the null-terminated character string pointed to by cs to self. Returns a reference to self.
RWWString& operator+=(const RWWString& str); RWWString& operator+=(const std::wstring& str)
Appends the string str to self. Returns a reference to self.
wchar_t& operator[](size_t i); wchar_t operator[](size_t i) const;
Returns the ith character. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed -- if the index is out of range then an exception of type RWBoundsErr will be thrown.
wchar_t& operator()(size_t i); wchar_t operator()(size_t i) const;
Returns the ith character. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed if the pre-processor macro RWBOUNDS_CHECK has been defined before including <rw/wstring.h>. In this case, if the index is out of range, then an exception of type RWBoundsErr will be thrown.
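The distinction between always-checked and optionally-checked indexing can be illustrated with a small standard-library sketch. The helpers checked_at and index_throws are hypothetical, and std::out_of_range stands in for RWBoundsErr.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical always-checked accessor, analogous in spirit to operator[].
wchar_t checked_at(const std::wstring& s, std::size_t i)
{
    if (i >= s.size()) {
        throw std::out_of_range("index out of range");
    }
    return s[i];  // valid indices are 0 .. size() - 1
}

// Hypothetical probe: reports whether checked_at rejects index i.
bool index_throws(const std::wstring& s, std::size_t i)
{
    try {
        checked_at(s, i);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```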
RWWSubString operator()(size_t start, size_t len); const RWWSubString operator()(size_t start, size_t len) const;
Substring operator. Returns an RWWSubString of self with length len, starting at index start. The first variant can be used as an lvalue. The sum of start plus len must be less than or equal to the string length. If the library was built using the RW_DEBUG flag, and start and len are out of range, then an exception of type RWBoundsErr will be thrown.
RWWString& append(const wchar_t* cs);
Appends a copy of the null-terminated wide character string pointed to by cs to self. Returns a reference to self.
RWWString& append(const wchar_t* cs, size_t N);
Appends a copy of the wide character string cs to self. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& append(const RWWString& str); RWWString& append(const std::wstring& str);
Appends a copy of the string str to self. Returns a reference to self.
RWWString& append(const RWWString& str, size_t N); RWWString& append(const std::wstring& str, size_t N);
Appends the first N characters or the length of str (whichever is less) to self. Returns a reference to self.
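The "N or the length of str, whichever is less" rule matches the clamping behavior of std::wstring::append with a position and count, so it can be sketched as follows. append_first is a hypothetical helper that returns the result by value rather than modifying self.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends at most n characters of str to a copy of self.
std::wstring append_first(std::wstring self, const std::wstring& str, std::size_t n)
{
    self.append(str, 0, n);  // the count is clamped to str.size()
    return self;
}
```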
size_t binaryStoreSize() const;
Returns the number of bytes necessary to store the object using the global function:
RWFile& operator<<(RWFile&, const RWWString&);
size_t capacity() const;
Returns the current capacity of self. This is the number of characters the string can hold without resizing.
size_t capacity(size_t capac);
Hint to the implementation to change the capacity of self to capac. Returns the actual capacity.
int collate(const wchar_t* str) const; int collate(const RWWString& str) const; int collate(const std::wstring& str) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the Standard C Library function ::wcscoll() on self and the argument str. This supports locale-dependent collation.
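A minimal sketch of locale-dependent collation using the standard function std::wcscoll directly. collate_wide is a hypothetical name; the ordering depends on the LC_COLLATE category of the current C locale, and the comparison stops at the first embedded null.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: negative, zero, or positive according to the
// collation order of a and b in the current C locale.
int collate_wide(const std::wstring& a, const std::wstring& b)
{
    return std::wcscoll(a.c_str(), b.c_str());
}
```

In the default "C" locale this reduces to a plain code-point comparison; after std::setlocale(LC_COLLATE, "") the order follows the user's locale.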
int compareTo(const wchar_t* str, caseCompare = RWWString::exact) const; int compareTo(const RWWString& str, caseCompare = RWWString::exact) const; int compareTo(const std::wstring& str, caseCompare cmp = exact) const; int compareTo(const std::wstring* str, caseCompare cmp = exact) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the Standard C Library function ::memcmp() on self and the argument str. Case sensitivity is according to the caseCompare argument, and may be RWWString::exact or RWWString::ignoreCase.
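The effect of the ignoreCase setting can be approximated with a per-character comparison after case folding. This is a sketch under that assumption, not the library's implementation; compare_ignore_case is a hypothetical name and std::towlower follows the current C locale.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: three-way comparison that folds case with towlower().
int compare_ignore_case(const std::wstring& a, const std::wstring& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const std::wint_t ca = std::towlower(a[i]);
        const std::wint_t cb = std::towlower(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    // Equal over the common prefix: the shorter string sorts first.
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```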
bool contains(const wchar_t* str, caseCompare = RWWString::exact) const; bool contains(const RWWString& cs, caseCompare = RWWString::exact) const; bool contains(const std::wstring& str, caseCompare cmp = exact) const;
Pattern matching. Returns true if the string given as the first argument occurs in self. Case sensitivity is according to the caseCompare argument, and may be RWWString::exact or RWWString::ignoreCase.
const wchar_t* data() const;
Access to the RWWString's data as a null terminated string. This datum is owned by the RWWString and may not be deleted or changed. If the RWWString object itself changes or goes out of scope, the pointer value previously returned will become invalid. While the string is null-terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
size_t first(wchar_t c) const;
Returns the index of the first occurrence of the wide character c in self. Returns RW_NPOS if there is no such character or if there is an embedded null prior to finding c.
size_t first(wchar_t c, size_t) const;
Returns the index of the first occurrence of the wide character c in self. Continues to search past embedded nulls. Returns RW_NPOS if there is no such character.
size_t first(const wchar_t* str) const;
Returns the index of the first occurrence in self of any character in str. Returns RW_NPOS if there is no match or if there is an embedded null prior to finding any character from str.
size_t first(const wchar_t* str, size_t N) const;
Returns the index of the first occurrence in self of any character in str. Exactly N characters in str are checked including any embedded nulls so str must point to a buffer containing at least N wide characters. Returns RW_NPOS if there is no match.
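The counted form of this search behaves much like std::wstring::find_first_of with an explicit count, as in the sketch below. first_of is a hypothetical helper, and std::wstring::npos plays the role of RW_NPOS.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first character of self that appears
// among the first n characters of set, or npos if there is none.
std::size_t first_of(const std::wstring& self, const wchar_t* set, std::size_t n)
{
    return self.find_first_of(set, 0, n);
}
```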
unsigned hash(caseCompare = RWWString::exact) const;
Returns a suitable hash value.
size_t index(const wchar_t* pat,size_t i=0, caseCompare = RWWString::exact) const; size_t index(const RWWString& pat,size_t i=0, caseCompare = RWWString::exact) const; size_t index(const std::wstring& pat,size_t i=0, caseCompare cmp = exact) const;
Pattern matching. Starting with index i, searches for the first occurrence of pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument; it defaults to RWWString::exact.
size_t index(const wchar_t* pat, size_t patlen,size_t i, caseCompare) const; size_t index(const RWWString& pat, size_t patlen,size_t i, caseCompare) const; size_t index(const std::wstring& pat, size_t patlen,size_t i, caseCompare cmp) const;
Pattern matching. Starting with index i, searches for the first occurrence of the first patlen characters from pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument.
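For exact (case-sensitive) matching, the search described above corresponds closely to std::wstring::find with a start position and a pattern length. The sketch below assumes that correspondence; index_of is a hypothetical name and npos stands in for RW_NPOS.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: starting at index i, finds the first occurrence of
// the first patlen characters of pat. Returns npos if there is no match.
std::size_t index_of(const std::wstring& self, const wchar_t* pat,
                     std::size_t patlen, std::size_t i)
{
    return self.find(pat, i, patlen);
}
```

With self equal to L"abcabc" and pat equal to L"bcX", a patlen of 2 matches at index 1 when searching from 0 and at index 4 when searching from 2, while a patlen of 3 finds nothing.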
RWWString& insert(size_t pos, const wchar_t* cs);
Inserts a copy of the null-terminated string cs into self at position pos. Returns a reference to self.
RWWString& insert(size_t pos, const wchar_t* cs, size_t N);
Inserts a copy of the first N wide characters of cs into self at position pos. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& insert(size_t pos, const RWWString& str); RWWString& insert(size_t pos, const std::wstring& str);
Inserts a copy of the string str into self at position pos. Returns a reference to self.
RWWString& insert(size_t pos, const RWWString& str, size_t N); RWWString& insert(size_t pos, const std::wstring& str, size_t N);
Inserts a copy of the first N wide characters or the length of str (whichever is less) of str into self at position pos. Returns a reference to self.
bool isAscii() const;
Returns true if it is safe to perform the conversion toAscii() (that is, if all characters of self are ASCII characters).
bool isNull() const;
Returns true if this string has zero length (i.e., the null string).
size_t last(wchar_t c) const;
Returns the index of the last occurrence in the string of the wide character c. Returns RW_NPOS if there is no such character.
size_t length() const;
Returns the number of characters in self.
RWWString& prepend(const wchar_t* cs);
Prepends a copy of the null-terminated wide character string pointed to by cs to self. Returns a reference to self.
RWWString& prepend(const wchar_t* cs, size_t N);
Prepends a copy of the wide character string cs to self. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& prepend(const RWWString& str); RWWString& prepend(const std::wstring& str);
Prepends a copy of the string str to self. Returns a reference to self.
RWWString& prepend(const RWWString& str, size_t N); RWWString& prepend(const std::wstring& str, size_t N);
Prepends the first N wide characters of str, or all of str if its length is less than N, to self. Returns a reference to self.
istream& readFile(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until EOF is reached. The input stream is treated as a sequence of wide characters, each of which is read and stored in the RWWString object. Null characters are treated the same as other characters. Note: RWWString is designed for use with wide character strings; to manipulate multibyte character sequences, use RWCString.
istream& readLine(istream& s, bool skipWhite = true);
Reads characters from the input stream s, replacing the previous contents of self, until a newline (or an EOF) is encountered. The newline is removed from the input stream but is not stored. The input stream is treated as a sequence of wide characters, each of which is read and stored in the RWWString object. Null characters are treated the same as other characters. If the skipWhite argument is true, then whitespace is skipped (using the std::iostream library manipulator ws) before saving characters. Note: RWWString is designed for use with wide character strings; to manipulate multibyte character sequences, use RWCString.
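The skipWhite behavior can be sketched with the standard stream facilities. Note the simplification: this sketch reads from a std::wistream, whereas the member function above takes a narrow istream. read_line and read_first_line are hypothetical helpers.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: optionally skips leading whitespace with std::ws,
// then reads up to (and consumes, but does not store) the next newline.
std::wstring read_line(std::wistream& in, bool skip_white = true)
{
    if (skip_white) {
        in >> std::ws;
    }
    std::wstring line;
    std::getline(in, line);
    return line;
}

// Hypothetical convenience wrapper for trying read_line on literal text.
std::wstring read_first_line(const std::wstring& text, bool skip_white = true)
{
    std::wistringstream in(text);
    return read_line(in, skip_white);
}
```

Because std::ws also consumes newlines, leading blank lines are skipped when skip_white is true.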
istream& readString(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or null terminator is encountered. The input stream is treated as a sequence of wide characters, each of which is read and stored in the RWWString object. Note: RWWString is designed for use with wide character strings; to manipulate multibyte character sequences, use RWCString.
istream& readToDelim(istream& s, wchar_t delim=(wchar_t)'\n');
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or the delimiting character delim is encountered. The delimiter is removed from the input stream but is not stored. The input stream is treated as a sequence of wide characters, each of which is read and stored in the RWWString object. Null characters are treated the same as other characters. Note: RWWString is designed for use with wide character strings; to manipulate multibyte character sequences, use RWCString.
istream& readToken(istream& s);
Whitespace is skipped before storing characters into a wide string. Characters are then read from the input stream s, replacing previous contents of self, until trailing whitespace or an EOF is encountered. The trailing whitespace is left on the input stream. Only ASCII whitespace characters are recognized, as defined by the Standard C Library function isspace(). The input stream is treated as a sequence of wide characters, each of which is read and stored in the RWWString object. Note: RWWString is designed for use with wide character strings; to manipulate multibyte character sequences, use RWCString.
RWWString& remove(size_t pos);
Removes the characters from the position pos, which must be no greater than length(), to the end of string. Returns a reference to self.
RWWString& remove(size_t pos, size_t N);
Removes N wide characters or to the end of string (whichever comes first) starting at the position pos, which must be no greater than length(). Returns a reference to self.
RWWString& replace(size_t pos, size_t N, const wchar_t* cs);
Replaces N wide characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the null-terminated string cs. Returns a reference to self.
RWWString& replace(size_t pos, size_t N1,const wchar_t* cs, size_t N2);
Replaces N1 characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the string cs. Exactly N2 characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N2*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& replace(size_t pos, size_t N, const RWWString& str); RWWString& replace(size_t pos, size_t N, const std::wstring& str);
Replaces N characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the string str. Returns a reference to self.
RWWString& replace(size_t pos, size_t N1, const RWWString& str, size_t N2); RWWString& replace(size_t pos, size_t N1,const std::wstring& str, size_t N2);
Replaces N1 characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the first N2 characters, or the length of str (whichever is less), from str. Returns a reference to self.
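The two clamping rules in this overload (N1 limited by the end of self, N2 limited by the length of str) are the same ones std::wstring::replace applies, so the behavior can be sketched as follows. replace_range is a hypothetical helper that works on a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces up to n1 characters of self starting at pos
// with up to n2 characters taken from the start of str.
std::wstring replace_range(std::wstring self, std::size_t pos, std::size_t n1,
                           const std::wstring& str, std::size_t n2)
{
    assert(pos <= self.size());         // pos must be no greater than the length
    self.replace(pos, n1, str, 0, n2);  // both counts are clamped
    return self;
}
```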
void resize(size_t n);
Changes the length of self, adding blanks (i.e., L' ') or truncating as necessary.
size_t rindex(const wchar_t* pat, caseCompare cmp) const;
Pattern matching function that starts at the rear of the string and searches for the first occurrence of pat closest to the end of the string. Returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument.
size_t rindex(const wchar_t pat,size_t i=RW_NPOS, caseCompare cmp = RWWString::exact) const; size_t rindex(const wchar_t* pat,size_t i=RW_NPOS, caseCompare cmp = RWWString::exact) const; size_t rindex(const RWWString& pat,size_t i=RW_NPOS, caseCompare cmp = RWWString::exact) const; size_t rindex(const std::wstring& pat,size_t i=RW_NPOS, caseCompare cmp = RWWString::exact) const;
Pattern matching. Starting with index i, searches for the occurrence of pat that starts on or before index i and is closest to the end of self. If there is a pattern match, returns the index of the start of the match; otherwise, returns RW_NPOS. Case sensitivity is according to the caseCompare argument, which defaults to RWWString::exact.
size_t rindex(const wchar_t* pat, size_t patlen,size_t i, caseCompare cmp) const; size_t rindex(const RWWString& pat, size_t patlen, size_t i,caseCompare cmp) const; size_t rindex(const std::wstring& pat, size_t patlen, size_t i,caseCompare cmp) const;
Pattern matching. Starting with index i, searches for the occurrence of the first patlen characters from pat that starts on or before index i and is closest to the end of self. If there is a match, returns the index of the start of the match; otherwise returns RW_NPOS. Case sensitivity is according to the caseCompare argument.
std::wstring& std();
Returns a reference to the underlying implementation of standard wide string. Only available in standard library builds of the Essential Tools Module.
const std::wstring& std() const;
Returns a const reference to the underlying implementation of standard wide string. Only available in standard library builds of the Essential Tools Module.
RWWSubString strip(stripType s = RWWString::trailing, wchar_t c = L' '); const RWWSubString strip(stripType s = RWWString::trailing, wchar_t c = L' ') const;
Returns a substring of self where the character c has been stripped off the beginning, end, or both ends of the string. The first variant can be used as an lvalue. The enum stripType can take values:
stripType | Meaning |
leading | Remove characters at beginning |
trailing | Remove characters at end |
both | Remove characters at both ends |
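The three stripType settings can be sketched as follows. Unlike the member function above, which returns an RWWSubString that can act as an lvalue, this hypothetical strip_copy helper returns an independent copy; the StripType enum here is likewise a stand-in.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the stripType enum described above.
enum class StripType { leading, trailing, both };

// Hypothetical helper: returns a copy of s with runs of c removed from the
// requested end or ends.
std::wstring strip_copy(const std::wstring& s,
                        StripType how = StripType::trailing,
                        wchar_t c = L' ')
{
    std::size_t begin = 0;
    std::size_t end = s.size();
    if (how != StripType::trailing) {
        while (begin < end && s[begin] == c) ++begin;  // drop leading run
    }
    if (how != StripType::leading) {
        while (end > begin && s[end - 1] == c) --end;  // drop trailing run
    }
    return s.substr(begin, end - begin);
}
```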
RWWSubString subString(const wchar_t* cs, size_t start=0, caseCompare = RWWString::exact); const RWWSubString subString(const wchar_t* cs, size_t start=0, caseCompare = RWWString::exact) const;
Returns a substring representing the first occurrence of the null-terminated string pointed to by "cs". Case sensitivity is according to the caseCompare argument; it defaults to RWWString::exact. The first variant can be used as an lvalue.
RWCString toAscii() const;
Returns an RWCString object of the same length as self, containing only ASCII characters. Any non-ASCII characters in self simply have the high bits stripped off. Use isAscii() to determine whether this function is safe to use.
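A sketch of the check-then-narrow pattern suggested above, using std::wstring and std::string in place of the library classes. is_ascii_wide and to_ascii_narrow are hypothetical names.

```cpp
#include <string>

// Hypothetical check: true if every wide character is in the range 0..0x7F.
bool is_ascii_wide(const std::wstring& w)
{
    for (wchar_t c : w) {
        if (static_cast<unsigned long>(c) > 0x7F) return false;
    }
    return true;
}

// Hypothetical conversion: keeps only the low seven bits of each character,
// so the result has the same length as the input.
std::string to_ascii_narrow(const std::wstring& w)
{
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w) {
        out.push_back(static_cast<char>(static_cast<unsigned long>(c) & 0x7F));
    }
    return out;
}
```

A non-ASCII character such as U+00C1 is reduced to 0x41, the letter A, which is why the check should come first.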
RWCString toMultiByte() const;
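Returns an RWCString containing the multibyte representation of self, as produced by the Standard C Library function wcstombs(). This method does not handle embedded nulls on systems where wcstombs() does not handle them, and it returns an empty string if the wide string cannot be translated (for instance, if the correct locale is not set to convert a specific character).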
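The failure mode described above, an empty result when a character cannot be translated, can be sketched directly on top of std::wcstombs. to_multibyte is a hypothetical helper; the outcome depends on the LC_CTYPE category of the current C locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string.
// Conversion stops at the first embedded null. Returns an empty string if
// some character has no representation in the current locale.
std::string to_multibyte(const std::wstring& wide)
{
    // MB_CUR_MAX is the largest number of bytes one character can need.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::string();  // conversion failed
    }
    return std::string(buf.data(), n);
}
```

Since an empty result is also what an empty input produces, a caller that needs to tell the two apart has to check the input length separately.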
void toLower();
Changes all upper-case letters in self to lower-case. Uses the C library function towlower().
void toUpper();
Changes all lower-case letters in self to upper-case. Uses the C library function towupper().
static unsigned hash(const RWWString& wstr);
Returns the hash value of wstr as returned by wstr.hash(RWWString::exact).
static size_t initialCapacity(size_t ic = 15);
Sets the minimum initial capacity of an RWWString, and returns the old value. The initial setting is 15 wide characters. Larger values will use more memory, but result in fewer resizes when concatenating or reading strings. Smaller values will waste less memory, but result in more resizes.
static size_t maxWaste(size_t mw = 15);
Sets the maximum amount of unused space allowed in a wide string should it shrink, and returns the old value. The initial setting is 15 wide characters. If more than mw characters are wasted, then excess space will be reclaimed.
static size_t resizeIncrement(size_t ri = 16);
Sets the resize increment when more memory is needed to grow a wide string. Returns the old value. The initial setting is 16 wide characters.
static RWWString fromAscii(const RWCString& str);
Convenience member function. Returns the result of the constructor RWWString(const RWCString&, RWWString::ascii).
static RWWString fromMultiByte(const RWCString& str);
Convenience member function. Returns the result of the constructor RWWString(const RWCString&, RWWString::multiByte).
bool operator==(const RWWString&, const wchar_t* ); bool operator==(const wchar_t*, const RWWString&); bool operator==(const RWWString&, const RWWString&); bool operator!=(const RWWString&, const wchar_t* ); bool operator!=(const wchar_t*, const RWWString&); bool operator!=(const RWWString&, const RWWString&);
Logical equality and inequality. Case sensitivity is exact.
bool operator< (const RWWString&, const wchar_t* ); bool operator< (const wchar_t*, const RWWString&); bool operator< (const RWWString&, const RWWString&); bool operator> (const RWWString&, const wchar_t* ); bool operator> (const wchar_t*, const RWWString&); bool operator> (const RWWString&, const RWWString&); bool operator<=(const RWWString&, const wchar_t* ); bool operator<=(const wchar_t*, const RWWString&); bool operator<=(const RWWString&, const RWWString&); bool operator>=(const RWWString&, const wchar_t* ); bool operator>=(const wchar_t*, const RWWString&); bool operator>=(const RWWString&, const RWWString&);
Comparisons are done lexicographically, byte by byte. Case sensitivity is exact. Use the member function collate() or the global function strXForm() for locale sensitivity.
RWWString operator+(const RWWString&, const RWWString&); RWWString operator+(const wchar_t*, const RWWString&); RWWString operator+(const RWWString&, const wchar_t* );
Concatenation operators.
ostream& operator<<(ostream& s, const RWWString& str);
Outputs an RWWString on ostream s as a sequence of bytes. This is done literally from the byte content of the characters.
istream& operator>>(istream& s, RWWString& str);
Calls str.readToken(s). That is, a token is read from the input stream s.
RWvostream& operator<<(RWvostream&, const RWWString& str); RWFile& operator<<(RWFile&, const RWWString& str);
Saves string str to a virtual stream or RWFile, respectively.
RWvistream& operator>>(RWvistream&, RWWString& str); RWFile& operator>>(RWFile&, RWWString& str);
Restores a wide character string into str from a virtual stream or RWFile, respectively, replacing the previous contents of str.
If the virtual stream or file experiences an error while extracting the string, the operator returns the original string contents unmodified. Check the virtual stream or file to determine if an error occurred on extraction.
RWWString strXForm(const RWWString&);
Returns a string transformed by ::wcsxfrm(), to allow quicker collation than RWWString::collate().
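The idea behind the transformation is that two transformed strings can be compared with a plain code-point comparison and still give the same order as locale-aware collation, so the locale work is done once per string rather than once per comparison. A minimal sketch with std::wcsxfrm follows; xform is a hypothetical name.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: returns the wcsxfrm() transformation of s in the
// current C locale. Transformed strings can be ordered with operator<.
std::wstring xform(const std::wstring& s)
{
    // With a zero-length destination, wcsxfrm() only reports the length needed.
    const std::size_t n = std::wcsxfrm(nullptr, s.c_str(), 0);
    std::vector<wchar_t> buf(n + 1);
    std::wcsxfrm(buf.data(), s.c_str(), buf.size());
    return std::wstring(buf.data(), n);
}
```

This pays off when the same strings are compared many times, for example as sort keys.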
RWWString toLower(const RWWString& str);
Returns a version of str where all upper-case characters have been replaced with lower-case characters. Uses the C library function towlower().
RWWString toUpper(const RWWString& str);
Returns a version of str where all lower-case characters have been replaced with upper-case characters. Uses the C library function towupper().
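The copy-returning case conversions can be sketched with std::transform and the same C library functions named above. to_lower_copy and to_upper_copy are hypothetical names; which characters are affected depends on the current C locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helper: returns a lower-case copy of s using towlower().
std::wstring to_lower_copy(std::wstring s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t ch) { return static_cast<wchar_t>(std::towlower(ch)); });
    return s;
}

// Hypothetical helper: returns an upper-case copy of s using towupper().
std::wstring to_upper_copy(std::wstring s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](wchar_t ch) { return static_cast<wchar_t>(std::towupper(ch)); });
    return s;
}
```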
© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.