Internationalization Module User's Guide

4.3 Explicit Conversions

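A converter is an object that converts text from one encoding to another. The Internationalization Module provides two converter classes: RWUToUnicodeConverter, which converts text from a recognized encoding to UTF-16, and RWUFromUnicodeConverter, which converts text from UTF-16 to a recognized encoding.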

An explicit conversion uses an instance of RWUToUnicodeConverter or RWUFromUnicodeConverter to specify how a particular conversion should be performed. The following sections describe how to create and manipulate converters.

4.3.1 Creating Converters

A converter instance is associated with an encoding at construction time. This association cannot be changed once the converter object is instantiated. For example, an RWUToUnicodeConverter instance can be created to convert from ISO-8859-1 to UTF-16, and an RWUFromUnicodeConverter instance can be created to convert from UTF-16 to Shift-JIS.

The encoding names recognized by the Internationalization Module may be accessed programmatically, as described in Section 4.2.

4.3.2 Explicitly Converting to Unicode

Class RWUToUnicodeConverter converts text from any recognized encoding to UTF-16. An instance of this class can be used to convert byte sequences that represent characters in a specific character encoding into the code unit sequences that represent those characters in the UTF-16 character encoding form.
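To make the phrase "byte sequences into code unit sequences" concrete, the following minimal sketch performs the simplest such mapping by hand, ISO-8859-1 to UTF-16. The function latin1ToUtf16() is a hypothetical helper written in standard C++; it is not part of the Internationalization Module and does not show how RWUToUnicodeConverter is implemented.

```cpp
#include <string>

// Hypothetical helper, not part of the Internationalization Module.
// ISO-8859-1 assigns byte value N to Unicode code point U+00NN, so each
// byte widens directly to a single UTF-16 code unit.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string result;
    result.reserve(bytes.size());
    for (unsigned char b : bytes) {
        result.push_back(static_cast<char16_t>(b));
    }
    return result;
}
```

With this helper, the four bytes 'c', 'a', 'f', 0xE9 become the four code units U+0063, U+0061, U+0066, U+00E9. Most other encodings need lookup tables or stateful decoding, which is the work a converter object takes on.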

RWUString provides constructors that accept text together with an RWUToUnicodeConverter instance, which is used to convert the text to UTF-16.

Similarly, some RWURegularExpression constructors accept an RWUToUnicodeConverter instance used to convert the pattern data to UTF-16. (See Section 8.4 for more information on regular expressions.)

RWUToUnicodeConverter also provides explicit convert() methods that accept a byte sequence in the associated encoding and a reference to an RWUString to hold the result of the conversion to UTF-16. For example, if source holds text encoded in ASCII, a converter associated with ASCII can convert that byte sequence to UTF-16 and place the result in an RWUString.

The convert() method appends the results of a conversion to a target buffer. The convert() method also accepts a Boolean flush argument, with a default value of true. When flush is true, convert() flushes its internal buffers to the target buffer and clears its internal state. For modal encodings such as ISO-2022, clearing the internal state ensures that the next call to convert() can expect the source text to begin in the source encoding's default, unshifted state.

Calling convert() once with a value of true for flush is useful when converting a piece of text in its entirety from a source encoding to UTF-16. In contrast, convert() may be used to fill a target buffer in a piecemeal fashion. Repeatedly calling convert() with a value of false for flush, then calling it once with a value of true, causes convert() to flush its buffers and clear its internal state only at the end of a multipart conversion process.
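The flush contract can be illustrated with a small stand-alone model. The sketch below is an assumption-laden toy, not the module's converter: ToyUtf8ToUtf16 is a hypothetical class that decodes UTF-8 sequences of one to three bytes, and its "internal state" is a multibyte sequence that was cut off at the end of a chunk. As in the description above, convert() appends to the target, and flush set to true clears the state at the end.

```cpp
#include <string>

// Hypothetical toy class that models the flush contract described above.
// It is NOT the module's converter. It decodes UTF-8 sequences of up to
// three bytes (U+0000..U+FFFF) and performs no finer validation; lead
// bytes of four-byte sequences are replaced with U+FFFD.
class ToyUtf8ToUtf16
{
public:
    // Appends converted text to target. Bytes of a sequence that is cut
    // off at the end of source are held until the next call. When flush
    // is true, a held partial sequence becomes U+FFFD and state is cleared.
    void convert(const std::string& source, std::u16string& target,
                 bool flush = true)
    {
        for (unsigned char byte : source) {
            if (needed_ == 0) {
                startSequence(byte, target);
            } else if ((byte & 0xC0) == 0x80) {
                // Continuation byte: shift in its six payload bits.
                codePoint_ = (codePoint_ << 6) | (byte & 0x3F);
                if (--needed_ == 0) {
                    target.push_back(static_cast<char16_t>(codePoint_));
                }
            } else {
                // Sequence was interrupted: substitute, then start over.
                target.push_back(u'\uFFFD');
                needed_ = 0;
                startSequence(byte, target);
            }
        }
        if (flush) {
            if (needed_ != 0) {
                target.push_back(u'\uFFFD');
            }
            reset();
        }
    }

    // Discards any held partial sequence.
    void reset() { needed_ = 0; codePoint_ = 0; }

private:
    void startSequence(unsigned char byte, std::u16string& target)
    {
        if (byte < 0x80) {
            target.push_back(static_cast<char16_t>(byte));
        } else if ((byte & 0xE0) == 0xC0) {
            codePoint_ = byte & 0x1F;
            needed_ = 1;
        } else if ((byte & 0xF0) == 0xE0) {
            codePoint_ = byte & 0x0F;
            needed_ = 2;
        } else {
            // Stray continuation byte or unsupported lead byte.
            target.push_back(u'\uFFFD');
        }
    }

    unsigned needed_ = 0;     // continuation bytes still expected
    char32_t codePoint_ = 0;  // code point bits collected so far
};
```

Feeding this toy the bytes "caf\xC3" with flush set to false and then "\xA9!" with flush set to true yields the same text as a single call on the whole input, because the first half of the two-byte character is held between the calls. Passing true on the first call instead would turn the dangling 0xC3 into U+FFFD.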

4.3.3 Explicitly Converting from Unicode

Class RWUFromUnicodeConverter converts text from UTF-16 to any recognized character encoding. An instance of this class can be used to convert code unit sequences that represent characters in the UTF-16 character encoding form into the byte sequences required to represent those characters in a specific character encoding.
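The reverse direction can be sketched in the same spirit. The function utf16ToUtf8() below is a hypothetical helper in standard C++, not part of the Internationalization Module; UTF-8 is used as the target only because its mapping from code points to bytes is purely arithmetic. Replacing an unpaired surrogate with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the Internationalization Module.
// Encodes UTF-16 code units as UTF-8 bytes; an unpaired surrogate is
// replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& source)
{
    std::string bytes;
    for (std::size_t i = 0; i < source.size(); ++i) {
        char32_t cp = source[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < source.size()
            && source[i + 1] >= 0xDC00 && source[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (source[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            bytes.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            bytes.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            bytes.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            bytes.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            bytes.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            bytes.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            bytes.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            bytes.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return bytes;
}
```

One UTF-16 code unit can become one, two, or three bytes here, and a surrogate pair becomes four, which is why the result of a conversion from Unicode is held in a byte-oriented string rather than in a Unicode string.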

RWUString provides a toBytes() method that accepts an RWUFromUnicodeConverter instance and returns an RWCString containing the byte sequence produced when the contents of the RWUString are converted using the given converter.

RWUFromUnicodeConverter also provides an explicit convert() method that accepts UTF-16 source text and a reference to an object to hold the converted byte sequence. For example, if source holds text encoded in UTF-16, a converter associated with Shift-JIS can convert its contents and hold the resulting bytes in a Standard C++ Library string.

The convert() method also accepts a Boolean flush argument that may be used to flush the internal buffers of a converter and clear its internal state. The default value is true. See Section 4.3.2 for more information.

4.3.4 Conversion Errors

A conversion maps characters from a source encoding to a target encoding. Normally this is a straightforward process of replacing the code point value of each character in the source encoding with the code point value of that character in the target encoding. However, errors can occur in this process. For example, the character being converted may not have a representation in the target encoding, or the code units in the source string may be impossible to interpret as a code point value in the source encoding. When errors such as these occur, the converter can respond in several ways, as determined by its error response setting.

For both RWUToUnicodeConverter and RWUFromUnicodeConverter, the default error-handling response is to substitute for the offending character. RWUToUnicodeConverter uses U+FFFD as its substitution sequence. RWUFromUnicodeConverter uses a substitution sequence appropriate for the target encoding. For example, the substitution sequence for most ASCII-based encodings is 0x1a. You can change the default substitution sequence for a conversion from Unicode by calling RWUFromUnicodeConverter::setSubstitutionSequence().
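The effect of an error response on output can be seen in a small model. The sketch below converts UTF-16 to 7-bit ASCII under four commonly used tactics. Everything in it is an assumption made for illustration: ToyErrorResponse and toAscii() are hypothetical, the enumerator names are not taken from the module's enums, and throwing an exception on Stop is simply this toy's choice. The default substitution of 0x1a follows the text above.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Illustrative names only; they are not the module's enum values.
enum class ToyErrorResponse { Stop, Skip, Substitute, Escape };

// Hypothetical helper that converts UTF-16 code units to 7-bit ASCII.
// Any code unit above 0x7F has no ASCII representation and is handled
// according to response.
std::string toAscii(const std::u16string& source,
                    ToyErrorResponse response,
                    const std::string& substitution = "\x1a")
{
    std::string bytes;
    for (char16_t unit : source) {
        if (unit < 0x80) {
            bytes.push_back(static_cast<char>(unit));
            continue;
        }
        switch (response) {
        case ToyErrorResponse::Stop:
            throw std::runtime_error("character not representable in ASCII");
        case ToyErrorResponse::Skip:
            break;  // drop the character silently
        case ToyErrorResponse::Substitute:
            bytes += substitution;
            break;
        case ToyErrorResponse::Escape: {
            // Write the code unit as a backslash, 'u', and four hex digits.
            char escaped[8];
            std::snprintf(escaped, sizeof escaped, "\\u%04X",
                          static_cast<unsigned>(unit));
            bytes += escaped;
            break;
        }
        }
    }
    return bytes;
}
```

For the input "a", U+00E9, "z", the Skip tactic produces "az", Substitute produces 'a', 0x1a, 'z', and Escape produces the nine ASCII characters a\u00E9z. Passing a third argument such as "?" plays the role that a custom substitution sequence plays for a real converter.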

To change a converter's error-handling behavior, call method RWUToUnicodeConverter::setErrorResponse() or method RWUFromUnicodeConverter::setErrorResponse(). Each of these methods accepts an enum value. The set of available enum values depends on the direction of the converter. The function RWUToUnicodeConverter::setErrorResponse() accepts the following enum values:

The function RWUFromUnicodeConverter::setErrorResponse() provides a similar set of error-handling tactics, but supports a wider variety of escaping options to facilitate working with different target encodings:

4.3.5 Saving and Restoring the Error Response State

Both RWUToUnicodeConverter and RWUFromUnicodeConverter provide saveErrorResponseState() methods that save the current error-handling state of a converter using RWUToUnicodeConverter::ErrorResponseState and RWUFromUnicodeConverter::ErrorResponseState, respectively. You can use these methods to save the current error response state before calling setErrorResponse() (see Section 4.3.4). The provided restoreErrorResponseState() methods restore the saved state.


The saved state from one converter may be used to set the state on another converter. However, this operation may not be safe in future versions of the Internationalization Module.
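One way to pair a save with its matching restore is a scope guard. The sketch below shows the pattern against a stand-in class rather than the module itself. ToyConverter, ToyResponse, and ErrorResponseGuard are hypothetical, and the method signatures (in particular, saveErrorResponseState() returning the state by value) are assumptions that should be checked against the module's reference documentation before the pattern is adapted.

```cpp
// Hypothetical stand-in for a converter; only the error response is
// modelled, and the method signatures are assumptions.
enum class ToyResponse { Stop, Skip, Substitute };

class ToyConverter
{
public:
    using ErrorResponseState = ToyResponse;  // snapshot type in this toy

    void setErrorResponse(ToyResponse r) { response_ = r; }
    ToyResponse errorResponse() const { return response_; }

    ErrorResponseState saveErrorResponseState() const { return response_; }
    void restoreErrorResponseState(const ErrorResponseState& s)
    {
        response_ = s;
    }

private:
    ToyResponse response_ = ToyResponse::Substitute;
};

// Saves the state on construction and restores it on destruction, so the
// caller's settings come back even if an exception is thrown in between.
class ErrorResponseGuard
{
public:
    explicit ErrorResponseGuard(ToyConverter& c)
        : converter_(c), saved_(c.saveErrorResponseState()) {}
    ~ErrorResponseGuard() { converter_.restoreErrorResponseState(saved_); }

    ErrorResponseGuard(const ErrorResponseGuard&) = delete;
    ErrorResponseGuard& operator=(const ErrorResponseGuard&) = delete;

private:
    ToyConverter& converter_;
    ToyConverter::ErrorResponseState saved_;
};
```

A function that needs a stricter response for one conversion can declare a guard, call setErrorResponse(), and return normally; the converter it was handed leaves the function with the settings it arrived with. The guard restores state only on the converter it saved from, which stays clear of the cross-converter use cautioned against above.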

4.3.6 Resetting Converters

At the conclusion of a successful call to convert() with the flush argument set to true (the default), a converter is automatically reset to a default, initial state. Sometimes, however, it may be necessary to reset a converter explicitly. The methods RWUToUnicodeConverter::reset() and RWUFromUnicodeConverter::reset() are provided for this purpose.




Copyright © Rogue Wave Software, Inc. All Rights Reserved.

The Rogue Wave name and logo, and SourcePro, are registered trademarks of Rogue Wave Software. All other trademarks are the property of their respective owners.