Using the XML Streams Package without the Internationalization Module
The XML Streams classes all use UTF-8 as their internal character encoding. If you need to work with UTF-16, you can use the wide character interface, RWWString, or the Unicode interface, RWBasicUString. (If you need to convert your streams to and from other character encodings, you must build and link the Internationalization Module. For more information, see “Using the XML Streams Package with the Internationalization Module.”)
Choosing the Appropriate String Class Interface
The Essential Tools Module provides string classes that let you build your XML streams in UTF-8 and UTF-16. (For more detailed information on the encodings supported by the Essential Tools Module, see “When You Do Not Need the Internationalization Module.”)
• The RWCString and narrow character interface take and produce UTF-8 encoded characters. The system does not, however, validate the data as UTF-8, so it is possible to insert characters in some other encoding.
NOTE >> Be aware that the XML Streams Module does not validate input as UTF-8, so if you use another encoding with a narrow-character inserter or extractor, the resulting document may not be valid.
• The RWWString and wide character interface take and produce UTF-16 encoded characters. Wide character data must be encoded in UTF-16 before insertion into an XML output stream.
• The RWBasicUString and Unicode interface take and produce UTF-16 encoded characters. Using RWBasicUString from the Essential Tools Module is the recommended and easiest way to insert or extract UTF-16 strings, because it is the base class for the Internationalization Module’s RWUString.
RWUString extends this functionality to convert UTF-16 to and from other encodings. If you start with RWBasicUString and later decide that you need the Internationalization Module’s code conversion functionality, you can easily switch to RWUString.
NOTE >> For UTF-16 strings, RWBasicUString is recommended over a conventional wide-character interface.
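Because the narrow-character interface passes bytes through unchecked, you may want to confirm that your data is well-formed UTF-8 before inserting it. The following is a minimal sketch in standard C++17, not part of the XML Streams API; the function name isValidUtf8() is hypothetical. It applies the well-formedness rules of RFC 3629: no overlong forms, no encoded surrogates (U+D800 to U+DFFF), and nothing above U+10FFFF.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a SourcePro API): returns true if `bytes`
// is well-formed UTF-8 according to RFC 3629.
bool isValidUtf8(std::string_view bytes) {
    // Smallest code point allowed for each sequence length; anything
    // smaller would be an overlong encoding.
    static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }        // ASCII
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;                 // stray continuation or invalid lead byte
        if (bytes.size() - i < len) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

With an RWCString named str, a call such as isValidUtf8(std::string_view(str.data(), str.length())) should work, assuming the usual data() and length() accessors.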
Converting to and from UTF-8 and UTF-16
You may convert your RWCStrings, RWBasicUStrings, and RWWStrings to and from UTF-8 and UTF-16 as needed.
Converting RWCStrings and RWBasicUStrings
RWBasicUString provides conversion between UTF-16 and an RWCString containing UTF-8.
From UTF-8 to UTF-16
To convert from UTF-8 to UTF-16, construct an RWBasicUString with your RWCString as a parameter.
RWCString myUtf8String = "hello world";      // UTF-8 encoded source
RWBasicUString myUtf16String(myUtf8String);  // decoded into UTF-16
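To show what this conversion involves, here is a minimal sketch of a UTF-8 to UTF-16 decoder in standard C++17. It is an illustration only, not how RWBasicUString is implemented: utf8ToUtf16() is a hypothetical name, and std::u16string stands in for RWBasicUString. Code points above U+FFFF become surrogate pairs, which is why a UTF-16 string can hold more code units than characters.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not a SourcePro API): decodes UTF-8 bytes into
// UTF-16 code units, throwing std::invalid_argument on malformed input.
std::u16string utf8ToUtf16(std::string_view in) {
    static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (in.size() - i < len) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong, surrogate, or out-of-range code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));     // fits in one code unit
        } else {
            cp -= 0x10000;                                 // 20 bits left to split
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
        i += len;
    }
    return out;
}
```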
From UTF-16 to UTF-8
To convert from UTF-16 to UTF-8, call RWBasicUString::toUtf8().
RWCString myUtf8String(myUtf16String.toUtf8());
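The reverse direction can be sketched the same way. This standard C++17 encoder is illustrative only: utf16ToUtf8() is a hypothetical name, and std::u16string and std::string stand in for RWBasicUString and RWCString. It combines each surrogate pair into one code point before writing one to four UTF-8 bytes, and rejects unpaired surrogates rather than emitting invalid UTF-8.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not a SourcePro API): encodes UTF-16 code units
// as UTF-8 bytes, throwing std::invalid_argument on unpaired surrogates.
std::string utf16ToUtf8(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {                 // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                             // consumed the low surrogate
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                                     // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {                             // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {                           // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                                             // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```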
See the SourcePro API Reference Guide for more information.
Converting RWWStrings to and from UTF-8
RWBasicUString does not provide a complete RWWString or wchar_t interface. (Its method towide() converts from UTF-16 to an RWWString, but there is no counterpart that takes wchar_t data and returns a Unicode UTF-16 string.)
To perform these wide character conversions, you may use the UTF-8 converter classes provided in the Advanced Tools Module’s Stream package.
From UTF-8 to a UTF-16 Wide Character
To convert from a UTF-8 encoded RWCString to a UTF-16 encoded RWWString, use RWFromUTF8Converter as shown in this example.
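RWWString destination;                     // receives the UTF-16 result
RWCString source("My UTF8 characters");    // UTF-8 encoded input
RWFromUTF8Converter converter;             // UTF-8 to UTF-16 converter
converter.convert(source, destination);    // decode source into destination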
From a UTF-16 Wide Character to UTF-8
To convert from a UTF-16 encoded RWWString to a UTF-8 encoded RWCString, use RWToUTF8Converter as shown in this example.
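RWWString newSource(L"My UTF16 characters");   // UTF-16 encoded input
RWCString newDestination;                      // receives the UTF-8 result
RWToUTF8Converter converter;                   // UTF-16 to UTF-8 converter
converter.convert(newSource, newDestination);  // encode newSource into newDestination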
NOTE >> These wide character conversions are suitable only on systems where the native encoding of wchar_t is UTF-16.
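Before relying on wide character conversions, you might check the width of wchar_t on your platform. The sketch below is a heuristic in standard C++, not part of SourcePro; wcharIs16Bit() and wideAsUtf16() are hypothetical helpers. A 16-bit wchar_t (as on Windows) is consistent with UTF-16, while a 32-bit wchar_t (typical on Linux and macOS) is not. The size alone does not prove the encoding; it only rules out the wrong width.

```cpp
#include <climits>
#include <stdexcept>
#include <string>

// Hypothetical helper: true when wchar_t is exactly 16 bits wide.
// Could also back a compile-time guard, for example:
//   static_assert(wcharIs16Bit(), "UTF-16 wide conversions need a 16-bit wchar_t");
constexpr bool wcharIs16Bit() {
    return sizeof(wchar_t) * CHAR_BIT == 16;
}

// Hypothetical helper: treats wide-character data as UTF-16 code units.
// Only meaningful where wchar_t holds UTF-16; elsewhere it throws instead
// of silently truncating 32-bit values.
std::u16string wideAsUtf16(const std::wstring& w) {
    if (!wcharIs16Bit())
        throw std::runtime_error("wchar_t is not 16 bits; data is not UTF-16");
    return std::u16string(w.begin(), w.end());
}
```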