XML Streams Module User’s Guide : Chapter 7 International Features of the XML Streams Module : Using the XML Streams Package without the Internationalization Module
Using the XML Streams Package without the Internationalization Module
The XML Streams classes all use UTF-8 as their internal character encoding. If you need UTF-16, you can use the wide-character interface (RWWString) or the Unicode interface (RWBasicUString). (To convert your streams to and from other character encodings, you must build and link the Internationalization Module. For more information, see “Using the XML Streams Package with the Internationalization Module.”)
Choosing the Appropriate String Class Interface
The Essential Tools Module provides string classes that allow you to build your XML streams in UTF-8 and UTF-16. (For more detailed information on the encodings supported by the Essential Tools Module, see “When You Do Not Need the Internationalization Module.”)
These classes are RWCString, RWWString, and RWBasicUString.
RWCString and the narrow-character interface take and produce UTF-8 encoded characters. The system does not, however, validate the data as UTF-8. Characters in some other encoding can therefore be inserted, but the resulting document is not guaranteed to be valid.
NOTE >> Be aware that the XML Streams Module does not validate narrow-character data as UTF-8. If you use another encoding with a narrow-character inserter or extractor, the resulting document may not be valid.
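Because no validation is performed, you may want to check narrow-character data yourself before inserting it. The sketch below is a minimal, standalone check in standard C++11. It is not part of SourcePro; the function name isValidUtf8 is hypothetical. It applies the well-formedness rules of RFC 3629:

- no stray or truncated continuation bytes;
- no overlong forms;
- no UTF-16 surrogate values;
- nothing above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a SourcePro API): returns true if `s` is
// well-formed UTF-8 according to RFC 3629.
bool isValidUtf8(const std::string& s)
{
    // Smallest code point that legitimately needs 2, 3, or 4 bytes.
    static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                  // ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                                // stray continuation or invalid lead byte
        if (i + len > n) return false;                    // sequence truncated at end of string
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;        // continuation bytes must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len]) return false;            // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate range
        if (cp > 0x10FFFF) return false;                  // beyond the Unicode range
        i += len;
    }
    return true;
}
```

For example, a Latin-1 "é" stored as the single byte 0xE9 is rejected, while its UTF-8 form (0xC3 0xA9) is accepted.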
If you are using RWCString and need UTF-16 encoded characters, you can convert between UTF-8 and UTF-16. For details, see “Converting to and from UTF-8 and UTF-16.”
RWWString and the wide-character interface take and produce UTF-16 encoded characters. Wide-character data must therefore be encoded in UTF-16 before it is inserted into an XML output stream.
RWBasicUString and the Unicode interface also take and produce UTF-16 encoded characters. RWBasicUString, from the Essential Tools Module, is the recommended and easiest way to insert or extract UTF-16 strings. It is the base class of the Internationalization Module's RWUString, which extends it to convert UTF-16 to and from any encoding. If you start with RWBasicUString and later need the Internationalization Module's code conversion functionality, you can easily switch to RWUString.
NOTE >> Using RWBasicUString for your UTF-16 strings is recommended over the conventional wide-character interface.
Converting to and from UTF-8 and UTF-16
You may convert your RWCStrings, RWBasicUStrings, and RWWStrings to and from UTF-8 and UTF-16 as needed.
Converting RWCStrings and RWBasicUStrings
RWBasicUString provides conversion between its UTF-16 contents and an RWCString containing UTF-8.
From UTF-8 to UTF-16
To convert from UTF-8 to UTF-16, simply construct an RWBasicUString with your RWCString as a parameter.
 
RWCString myUtf8String = "hello world";
RWBasicUString myUtf16String(myUtf8String);
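To show what this conversion involves, here is a standalone UTF-8 to UTF-16 decoder in standard C++11. It is a sketch only, not RWBasicUString's implementation, and the name utf8ToUtf16 is hypothetical. Each UTF-8 sequence is decoded to a code point. Code points in the Basic Multilingual Plane become one UTF-16 code unit, and supplementary code points become a surrogate pair. The sketch throws on malformed lead bytes, bad continuation bytes, and truncated sequences. It does not reject overlong forms or encoded surrogates, so validate the input first if that matters.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the RWBasicUString implementation):
// decodes UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point beyond U+10FFFF");
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));     // BMP: one code unit
        } else {
            cp -= 0x10000;                                // supplementary: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the four UTF-8 bytes F0 9F 98 80 (U+1F600) produce the two UTF-16 code units D83D DE00.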
From UTF-16 to UTF-8
To convert from UTF-16 to UTF-8, call the RWBasicUString member function toUtf8().
 
RWCString myUtf8String(myUtf16String.toUtf8());
See the SourcePro API Reference Guide for more information.
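The reverse direction can be sketched the same way. The standalone encoder below, in standard C++11, is illustrative only and is not RWBasicUString::toUtf8(); the name utf16ToUtf8 is hypothetical. It first joins each surrogate pair into a single code point and throws on unpaired surrogates. It then writes each code point as 1 to 4 UTF-8 bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not RWBasicUString::toUtf8()):
// encodes UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {              // high surrogate: needs a low one next
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                          // consumed both code units
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                                  // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {                          // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {                        // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                                          // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```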
Converting RWWStrings to and from UTF-8
RWBasicUString does not provide a complete RWWString or wchar_t interface. (Its member function toWide() converts from UTF-16 to an RWWString, but RWBasicUString provides no way to take wchar_t data and return a UTF-16 string.)
To perform these wide character conversions, you may use the UTF-8 converter classes provided in the Advanced Tools Module’s Stream package.
From UTF-8 to a UTF-16 Wide Character
To convert from a UTF-8 encoded RWCString to a UTF-16 encoded RWWString, use RWFromUTF8Converter as shown in this example.
 
RWWString destination; // 1
RWCString source("My UTF8 characters");
 
RWFromUTF8Converter converter; // 2
converter.convert(source,destination); // 3
//1 Start with an empty RWWString and an RWCString containing UTF-8 encoded characters.
//2 Create the RWFromUTF8Converter.
//3 Call the convert() function on RWFromUTF8Converter with the source and destination strings as parameters. After this call, the destination local variable will contain the source string encoded in a UTF-16 RWWString.
From a UTF-16 Wide Character to UTF-8
To convert from a UTF-16 encoded RWWString to a UTF-8 encoded RWCString, use RWToUTF8Converter as shown in this example.
 
RWWString newSource(L"My UTF16 characters");
RWCString newDestination; // 1
 
RWToUTF8Converter converter; // 2
converter.convert(newSource,newDestination); // 3
//1 Create a new empty RWCString and a new RWWString source.
//2 Create the RWToUTF8Converter.
//3 Call the convert() function on RWToUTF8Converter with the source and destination strings as parameters. After this call, the newDestination local variable will contain the source string encoded in UTF-8.
NOTE >> These wide-character conversions are suitable only on systems where the native encoding of wchar_t is UTF-16.
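Where code must also run on platforms with a wider wchar_t, a guard before such a conversion may help. The sketch below is standard C++11 and hypothetical; safeToTreatAsUtf16 is not a SourcePro function. It rests on two assumptions:

- A 2-byte wchar_t holds UTF-16, which is typical on Windows.
- A wider wchar_t holds UTF-32, which is typical on Linux and macOS.

Neither is guaranteed by the C++ standard, since sizeof(wchar_t) reveals only the width and not the encoding. The guard also assumes the converter treats each wchar_t element as one UTF-16 code unit. Under that assumption, a wide string on a 32-bit wchar_t platform can be passed through unchanged only if every element is a non-surrogate value in the Basic Multilingual Plane.

```cpp
#include <string>

// Hypothetical guard (not a SourcePro API). Assumes each wchar_t element is
// handed to a converter that treats it as one UTF-16 code unit.
bool safeToTreatAsUtf16(const std::wstring& s)
{
    // 16-bit wchar_t: assumed to already hold UTF-16 code units.
    if (sizeof(wchar_t) == 2)
        return true;
    // Wider wchar_t (assumed UTF-32): only BMP values outside the surrogate
    // range map one-to-one onto UTF-16 code units.
    for (wchar_t ch : s) {
        unsigned long v = static_cast<unsigned long>(ch); // negative values become huge
        if (v > 0xFFFF || (v >= 0xD800 && v <= 0xDFFF))
            return false;
    }
    return true;
}
```

With a 32-bit wchar_t, a string containing U+1F600 as a single element fails the check. Such a string would need a genuine UTF-32 to UTF-16 conversion rather than a direct pass-through.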