XML Streams Module User's Guide

7.2 Determining your Character Encoding Needs

The XML Streams package provides a simple interface for inserting and extracting characters and strings in various encodings.

Depending on your character encoding requirements, you may or may not need to build and link the Internationalization Module. Because building and linking the Internationalization Module also links the entire International Components for Unicode (ICU) libraries, it is wise to evaluate your encoding needs first.

7.2.1 Deciding if You Need the Internationalization Module

This section provides an overview to help you evaluate your character encoding requirements and decide whether you need the Internationalization Module. Section 7.2.2 includes more detailed information on the specific requirements of the XML Streams package.

7.2.1.1 When You Do Not Need the Internationalization Module

You do not need the Internationalization Module if you know you will be building XML streams in one of the following encodings:

SourcePro's regular string and stream processing classes can accommodate these five encodings without linking the Internationalization Module.

If you need conversion to and from other encodings, or more advanced manipulation of strings in UTF-16, you will want to use the Internationalization Module.

7.2.1.2 When You Do Need the Internationalization Module

If you are building and serializing XML streams in encodings other than those listed in Section 7.2.1.1, you must build and link the Internationalization Module. The Internationalization Module can convert a byte stream between any encoding and UTF-16, and offers advanced string manipulation, such as collation, Unicode regular expression searches, and resource bundles.

7.2.2 The XML Streams Package Character Encoding Requirements

The classes in the XML Streams package all read and write UTF-8 encoded documents. This means that the XML input streams take in UTF-8 only, and the XML output streams produce UTF-8 only.

You can, however, take advantage of various conversion utilities in SourcePro Core to convert your XML streams to and from any recognized character encoding. The Essential Tools Module and Internationalization Module contain classes that help you convert your data to UTF-8 before inserting it into an XML output stream, and from UTF-8 after extracting it from an XML input stream.

In addition, you may use a UTF-16 Unicode or wide character inserter or extractor interface to your XML streams, and the XML streams classes will internally convert between UTF-16 and UTF-8 as necessary. For a discussion of narrow, wide, and Unicode character interfaces, see Section 7.2.2.1 and Section 7.2.2.2.


If your strings are in an encoding other than UTF-8 or UTF-16, you must convert them to one of these encodings before inserting them into an XML stream. For more information, see Section 7.4.

7.2.2.1 Narrow Character Interfaces

All narrow character interfaces, such as RWCString, char, and char* inserters and extractors, take or produce only UTF-8 encoded characters. If you are using an XML output stream with a narrow character interface and you insert a character that is not UTF-8 encoded, the stream may produce an incorrect document. If your character encoding is UTF-16, you may use RWBasicUString from the Essential Tools Module to convert it to UTF-8. If your encoding is anything other than UTF-8 or UTF-16, you will need to use RWUString and the conversion utility classes from the Internationalization Module. See Section 7.4.
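To make the risk concrete, the following is a minimal sketch of a check you could run on a narrow string before inserting it through a narrow character interface. It uses only the C++ standard library. The function name isValidUtf8 is hypothetical and is not part of SourcePro; the XML Streams classes are not known to perform this check for you.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a SourcePro API): returns true if every byte
// sequence in 's' is well-formed UTF-8.
bool isValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;       // number of continuation bytes expected
        std::uint32_t cp;        // code point being assembled
        std::uint32_t minCp;     // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (extra > s.size() - i - 1) return false;   // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

For example, the ISO-8859-1 byte 0xE9 ("é") on its own fails this check, because in UTF-8 that byte announces a three-byte sequence. A string that fails the check should be converted as described in Section 7.4 before it is inserted.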

7.2.2.2 Wide and Unicode Character Interfaces

All wide and Unicode character interfaces, such as RWWString, RWBasicUString, RWUString, wchar_t, and wchar_t* inserters and extractors, take or produce only UTF-16 encoded characters. If you are using an XML output stream with a wide or Unicode character interface and you insert a character that is not UTF-16 encoded, the stream may produce an incorrect document.

Output Streams

XML output streams convert UTF-16 encoded characters to UTF-8 before passing them on to the underlying data stream, as illustrated in Figure 2.

If your strings are in another encoding, convert them to UTF-16 before inserting them into the XML output stream.

Figure 2: Wide or Unicode Interfaces to Output Streams
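The UTF-16 to UTF-8 step described above can be sketched with the C++ standard library alone. This is an illustration of the encoding transformation, not the XML Streams implementation; the function name utf16ToUtf8 is hypothetical, and the choice to throw on unpaired surrogates is an assumption made for the sketch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a SourcePro API): encodes UTF-16 code units
// as a UTF-8 byte string.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the low surrogate that must follow.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A single UTF-16 code unit can become one, two, or three UTF-8 bytes, and a surrogate pair becomes four, so the byte length of the document written to the underlying data stream generally differs from the number of wide characters inserted.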

Input Streams

XML input streams convert from UTF-8 to UTF-16 before returning wide or Unicode characters or strings, as illustrated in Figure 3.

You may optionally convert your strings to another encoding after extracting them from the XML input stream.

Figure 3: Wide or Unicode Interfaces to Input Streams
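The reverse step, UTF-8 to UTF-16, can be sketched in the same way. Again, this only illustrates the transformation using the C++ standard library and is not the XML Streams implementation; the function name utf8ToUtf16 and the decision to throw on malformed input are assumptions made for the sketch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a SourcePro API): decodes a UTF-8 byte string
// into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const std::uint32_t minForLen[] = {0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp < minForLen[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);              // one code unit
        } else {
            cp -= 0x10000;                                 // surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Code points above U+FFFF come back as two UTF-16 code units (a surrogate pair), so a string extracted through a wide or Unicode interface may contain more code units than the document has characters.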




Copyright © Rogue Wave Software, Inc. All Rights Reserved.

The Rogue Wave name and logo, and SourcePro, are registered trademarks of Rogue Wave Software. All other trademarks are the property of their respective owners.