rwsf::XmlReader rwsf::HandleBase
#include <rwsf/core/XmlReader.h>
Class rwsf::XmlReader implements the handle/body idiom in which rwsf::XmlReaderImp is the body and rwsf::XmlReader is the handle.
rwsf::XmlReader is a simple XML pull-parser. The XML document is typically parsed element by element using readElement(), or by iteratively calling readElementStart(), readElementValue(), and readElementEnd(). On each read, this object sets its internal state with information about what was just read. Member functions getLastNodeType(), getLastName(), and getLastContent() can then be used to retrieve portions of the rwsf::XmlReader's state.
rwsf::XmlReader throws an exception of type rwsf::XmlParseError when it encounters XML that is not well-formed. The rwsf::XmlParseError exception contains a description of the error and the line and column number of the source document where the error occurred.
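How a parser might derive the line and column it reports can be sketched as follows. The names Position and locate are hypothetical, and 1-based numbering is an assumption; this reference does not state how rwsf::XmlParseError counts lines and columns.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: a 1-based line and column (an assumption).
struct Position {
    std::size_t line;
    std::size_t column;
};

// Walks the document up to offset, counting newlines to find the line
// and the characters since the last newline to find the column.
Position locate(const std::string& doc, std::size_t offset) {
    Position p{1, 1};
    for (std::size_t i = 0; i < offset && i < doc.size(); ++i) {
        if (doc[i] == '\n') {
            ++p.line;
            p.column = 1;
        } else {
            ++p.column;
        }
    }
    return p;
}
```

Calling locate(doc, offset) with the byte offset at which a well-formedness check failed yields the position a diagnostic message could carry.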
rwsf::XmlReader can parse documents in the encodings UTF-8, UTF-16(BE), UTF-16LE, US-ASCII, and ISO-8859-1. In addition, if the ICU library is present, rwsf::XmlReader can also convert from any character encoding supported by ICU.
For more information on how Hydra performs conversions and how to create custom conversions, see Chapter 20, "Internationalizing Your Services," in the HydraExpress Web Service Development Guide.
NOTE -- rwsf::XmlReader converts all documents to UTF-8 regardless of the encoding of the source document.
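To make the conversion concrete, the following sketch turns ISO-8859-1 bytes into UTF-8. It is not the library's converter; the function name latin1ToUtf8 is hypothetical and the code only illustrates the kind of transformation involved.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte equals its Unicode code point, so bytes below 0x80
// are copied unchanged and all others become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // trail byte
        }
    }
    return utf8;
}
```

Pure US-ASCII input passes through unchanged, which is why ASCII documents look identical before and after conversion.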
Currently, rwsf::XmlReader provides support only for reading elements and their content. No support for reading processing instructions, DOCTYPE declarations, or entity declarations is provided.
Following are some examples illustrating the most common ways to parse XML using rwsf::XmlReader. The examples are based on the following XML.
<barrel>
  <contents Name="wine">Pinot Noir</contents>
</barrel>
Here are some common parsing tasks. For simplicity, we will start with the XML in this string:
#include <iostream>
#include <string>
#include <rwsf/core/XmlReader.h>

// Quotes inside the string literal must be escaped.
std::string MyXmlString =
    "<barrel> <contents Name=\"wine\">Pinot Noir</contents> </barrel>";

rwsf::XmlReader pReader(MyXmlString.data());                        // 1

pReader.readElementStart();                                         // 2
std::cout << "Outer Element Name = "
          << pReader.getLastName().getLocalName().data() << std::endl;

pReader.readElementStart();                                         // 3
std::cout << "Inner Element Name = "
          << pReader.getLastName().getLocalName().data() << std::endl;

rwsf::XmlAttributeSet attr = pReader.getLastAttributes();           // 4
rwsf::XmlAttribute name;
if (attr.find(rwsf::XmlName("Name"), name)) {
    std::cout << "Name Attribute Value = "
              << name.getValue().data() << std::endl;
}

std::cout << "Inner Element Value = "
          << pReader.getLastContent().data() << std::endl;          // 5

pReader.readElementEnd();                                           // 6
pReader.readElementEnd();                                           // 6
//1 | Creates an rwsf::XmlReader from the string containing the XML. |
//2 | Reads the start tag of the outer element and outputs its name. |
//3 | Reads the start tag of the inner element and outputs its name. |
//4 | Retrieves the last attribute set that was read, finds the "Name" attribute, and outputs its value. |
//5 | Retrieves the content of the last element read and outputs it. |
//6 | Reads the inner and outer end tags. |
Output:
Outer Element Name = barrel
Inner Element Name = contents
Name Attribute Value = wine
Inner Element Value = Pinot Noir
typedef unsigned long XmlChar;
enum NodeType { StartTag, EndTag, EmptyTag, Data, Unknown };
static rwsf::XmlName NullName;
Static constant rwsf::XmlName that contains an empty prefix and an empty namespace URI.
XmlReader ();
Default constructor. Constructs an invalid reader.
XmlReader ( const char* buf, size_t length );
XmlReader ( const unsigned char* buf, size_t length );
Constructs a reader from the document pointed to by buf, which is length bytes long. Parses the prolog of the document, if present, and determines the document encoding from the encoding= specifier in the optional XML declaration and from a guess based on the first few bytes of the document. Upon construction, the reader is positioned before the first tag in the document.
XmlReader(const std::string& document);
Convenience constructor for converting from an std::string. Constructs a reader from the document contained in the string document. Parses the prolog of the document, if present, and determines the document encoding from the encoding= specifier in the optional XML declaration and from a guess based on the first few bytes of the document. Upon construction, the reader is positioned before the first tag in the document.
~XmlReader();
Destructor.
XmlReader& operator= (const XmlReader& rdr);
Assignment operator. Associates this handle with rdr's body.
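The effect of assigning one handle to another can be pictured with a minimal handle/body sketch. ReaderBody and ReaderHandle are hypothetical stand-ins, and the use of std::shared_ptr is an assumption; how rwsf::HandleBase actually manages its body is not described in this reference.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical body: owns the state that every copy of a handle shares.
struct ReaderBody {
    std::string document;
    explicit ReaderBody(std::string doc) : document(std::move(doc)) {}
};

// Hypothetical handle: a cheap-to-copy reference to a shared body.
class ReaderHandle {
public:
    ReaderHandle() = default;  // no body attached: an invalid handle
    explicit ReaderHandle(std::string doc)
        : body_(std::make_shared<ReaderBody>(std::move(doc))) {}

    bool isValid() const { return static_cast<bool>(body_); }

    // True when both handles refer to the same body object.
    bool sharesBodyWith(const ReaderHandle& other) const {
        return body_ == other.body_;
    }

    // Number of handles currently attached to this handle's body.
    long handleCount() const { return body_.use_count(); }

private:
    std::shared_ptr<ReaderBody> body_;  // compiler-generated copy and
                                        // assignment share this pointer
};
```

In this sketch, assignment copies only the pointer, so state changed through one handle is visible through every other handle attached to the same body.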
void addNamespace(const rwsf::XmlNamespace& namespace);
Adds namespace to the list of namespaces known by the reader. This method is useful when parsing document fragments in which namespaces are declared outside the scope of the fragment.
bool eof();
Returns true if at the end of the current document; false otherwise.
rwsf::XmlReader getElementReader ( const rwsf::XmlReaderName& name = rwsf::XmlReaderName::Empty );
Returns an rwsf::XmlReader instance for the current element. The current reader will be moved past the end of the returned element.
std::string getEncoding() const;
Returns the name of the encoding of the original source document, either from the XML declaration's "encoding=" declaration, or as automatically sensed from the first few bytes of the XML document.
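Sensing an encoding from the first few bytes usually means looking for a byte-order mark or for the byte pattern of the characters "<?". The sketch below shows that idea under stated assumptions: guessEncoding is a hypothetical name, and the rules are a simplified subset of the detection outlined in Appendix F of the XML 1.0 specification, not necessarily the checks rwsf::XmlReader performs.

```cpp
#include <cstddef>
#include <string>

// Guesses an encoding name from the leading bytes of a document.
// Falls back to UTF-8, the XML default, when nothing else matches.
std::string guessEncoding(const unsigned char* buf, std::size_t length) {
    if (length >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";     // UTF-8 byte-order mark
    if (length >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";  // big-endian byte-order mark
    if (length >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";  // little-endian byte-order mark
    if (length >= 4 && buf[0] == 0x00 && buf[1] == 0x3C &&
        buf[2] == 0x00 && buf[3] == 0x3F)
        return "UTF-16BE";  // "<?" in big-endian UTF-16, no byte-order mark
    if (length >= 4 && buf[0] == 0x3C && buf[1] == 0x00 &&
        buf[2] == 0x3F && buf[3] == 0x00)
        return "UTF-16LE";  // "<?" in little-endian UTF-16, no byte-order mark
    return "UTF-8";
}
```

A guess of this kind cannot tell US-ASCII, ISO-8859-1, and UTF-8 apart, which is why an explicit encoding= declaration is still needed for those cases.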
bool getExpandAttributeReference() const;
Returns true if the reader is expanding entity references in attributes, false otherwise.
bool getExpandContentReference() const;
Returns true if the reader is expanding references in element content, false otherwise.
bool getExpandCommentReference() const;
Returns true if the reader is expanding references in comments, false otherwise.
rwsf::XmlAttributeSet getLastAttributes() const;
Returns the set of attributes associated with the last node read of type rwsf::XmlReader::StartTag.
std::string getLastContent() const;
Returns the last content read, for nodes of type rwsf::XmlReader::Data. The content will be encoded in UTF-8, regardless of the encoding of the source document. This value is undefined if the last node read was not of type rwsf::XmlReader::Data.
rwsf::XmlName getLastName() const;
Returns the name of the last node read. This value is undefined if the last node read was of type rwsf::XmlReader::Data.
rwsf::XmlReader::NodeType getLastNodeType() const;
Returns the type of the last node read. The following table summarizes each type of node:
Type | Definition |
rwsf::XmlReader::StartTag | An XML start tag. |
rwsf::XmlReader::EndTag | An XML end tag. |
rwsf::XmlReader::EmptyTag | An empty XML tag. Example: "<customer/>". |
rwsf::XmlReader::Data | Data that is the content of an element, not including any tags. |
rwsf::XmlReader::Unknown | Set before the reader has read an element from the document. |
std::string getPrefixforURI(const std::string& uri) const;
Looks up the provided uri in the current list of namespaces and returns the corresponding prefix. If the current list of namespaces does not contain the uri, returns the empty string.
std::string getStandalone() const;
Returns the value of the source document's "standalone=" declaration, if it exists.
std::string getVersion() const;
Returns the value of the source document's "version=" declaration, if it exists.
std::string getURIforPrefix(const std::string& prefix) const;
Looks up the provided prefix in the current list of namespaces and returns the corresponding URI. If the current list of namespaces does not contain the prefix, returns the empty string.
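The two namespace lookups, getPrefixforURI() and getURIforPrefix(), behave like searches over a list of prefix/URI pairs that return an empty string on a miss. A minimal sketch of that behavior follows; NamespaceList and its members are hypothetical, and searching newest-first is an assumption about how shadowed declarations might be resolved.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the reader's list of known namespaces.
class NamespaceList {
public:
    void add(const std::string& prefix, const std::string& uri) {
        entries_.emplace_back(prefix, uri);
    }

    // Returns the URI bound to prefix, or "" if the prefix is unknown.
    std::string uriForPrefix(const std::string& prefix) const {
        // Newest-first, so a later declaration shadows an earlier one.
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            if (it->first == prefix) return it->second;
        return std::string();
    }

    // Returns a prefix bound to uri, or "" if the URI is unknown.
    std::string prefixForUri(const std::string& uri) const {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            if (it->second == uri) return it->first;
        return std::string();
    }

private:
    std::vector<std::pair<std::string, std::string>> entries_;
};
```

Because an empty string is also the prefix of a default namespace, a caller of a lookup like this may need additional context to distinguish "not found" from "default namespace".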
bool hasEncoding() const;
Returns true if the source XML document explicitly specified an encoding. Returns false if the document's encoding was automatically sensed from the first few bytes of the XML document.
bool hasStandalone() const;
Returns true if a "standalone=" declaration existed in the source document's XML declaration.
bool isElementNext(const rwsf::XmlName &name);
bool isElementNext(const std::string& name);
Returns true if the next element is the one given in name; false otherwise.
bool isElementNextI(const rwsf::XmlReaderName& name);
bool isElementNextI(const std::string& name, const std::string& uri);
Returns true if the next element is the one given in name; false otherwise.
std::string readElement(const std::string& name);
std::string readElement(const rwsf::XmlName& name = NullName);
Reads and returns the entire next element found in the XML document at the current depth. Skips past any content that may exist. If no element exists at the current depth, returns an empty string. The returned element includes the text of the starting and ending tags, along with the text of all content and child elements. The returned element will always be encoded in UTF-8, regardless of the encoding of the source document. If name is specified, the element's name must match name, otherwise throws an exception of type rwsf::XmlParseError.
void readElementEnd();
Reads the next node in the document. If the node was not an end tag, throws an exception of type rwsf::XmlParseError.
void readElementEnd(const rwsf::XmlName& name);
Reads the next node in the document. If the node was not an end tag matching name, throws an exception of type rwsf::XmlParseError.
void readElementStart();
Reads the next node in the document. If the node was not a start tag, throws an exception of type rwsf::XmlParseError.
rwsf::XmlAttributeSet readElementStart(const rwsf::XmlName& name);
Reads the next node in the document, and if the node was not a start tag, or the node's name does not match name, throws an exception of type rwsf::XmlParseError. Returns any attributes found inside the tag.
std::string readElementValue();
Reads and returns the next content from the document.
void readNextNode();
Reads the next start tag, empty tag, end tag, or content from the document. Use getLastNodeType(), getLastName(), and getLastContent() to retrieve information about what was read. If a well-formedness error is encountered while reading the document, an exception of type rwsf::XmlParseError is thrown.
This method is not typically used directly. It is used by other methods such as readElementStart(), readElementValue(), and so on.
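The read-then-inspect pattern can be illustrated with a toy reader that classifies one node per call. ToyReader and its members are hypothetical and far simpler than rwsf::XmlReader: the sketch ignores attributes, namespaces, comments, entity references, and encodings, and the enumeration is copied locally so that the code compiles on its own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Local copy of the node types documented for rwsf::XmlReader.
enum NodeType { StartTag, EndTag, EmptyTag, Data, Unknown };

// Hypothetical toy pull reader over a string of simple, attribute-free XML.
class ToyReader {
public:
    explicit ToyReader(std::string doc) : doc_(std::move(doc)) {}

    bool eof() const { return pos_ >= doc_.size(); }

    // Reads one node and records its type plus its name or content.
    void readNextNode() {
        if (eof()) throw std::runtime_error("unexpected end of document");
        if (doc_[pos_] != '<') {  // character data runs up to the next tag
            std::size_t end = doc_.find('<', pos_);
            if (end == std::string::npos) end = doc_.size();
            type_ = Data;
            text_ = doc_.substr(pos_, end - pos_);
            pos_ = end;
            return;
        }
        std::size_t close = doc_.find('>', pos_);
        if (close == std::string::npos)
            throw std::runtime_error("unterminated tag");
        std::string tag = doc_.substr(pos_ + 1, close - pos_ - 1);
        pos_ = close + 1;
        if (!tag.empty() && tag.front() == '/') {        // </name>
            type_ = EndTag;
            text_ = tag.substr(1);
        } else if (!tag.empty() && tag.back() == '/') {  // <name/>
            type_ = EmptyTag;
            text_ = tag.substr(0, tag.size() - 1);
        } else {                                         // <name>
            type_ = StartTag;
            text_ = tag;
        }
    }

    NodeType lastType() const { return type_; }
    const std::string& lastText() const { return text_; }

private:
    std::string doc_;
    std::size_t pos_ = 0;
    NodeType type_ = Unknown;  // nothing has been read yet
    std::string text_;
};
```

Higher-level operations such as reading a start tag or an element value can then be written as a call to readNextNode() followed by a check of lastType(), which mirrors the layering described above.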
std::string readWellFormedElement ( const rwsf::XmlName& name = NullName);
Reads and returns the next element, adding any required namespace definitions so that the returned string can be parsed on its own.
void setExpandAttributeReference(bool expandReference);
Sets whether the reader expands entity references in attributes. For example, when expandReference is true, the reader converts the attribute value 3&lt;4 to 3<4.
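Expansion of entity references can be sketched for the five entities that XML predefines. The function name expandPredefinedEntities is hypothetical, and the sketch deliberately leaves numeric character references and user-declared entities untouched; it is not the reader's implementation.

```cpp
#include <cstddef>
#include <string>

// Expands the five predefined XML entity references in text.
// Anything that is not one of those references is copied unchanged.
std::string expandPredefinedEntities(const std::string& text) {
    static const struct { const char* ref; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const auto& e : table) {
                const std::string ref(e.ref);
                if (text.compare(i, ref.size(), ref) == 0) {
                    out.push_back(e.ch);  // emit the replacement character
                    i += ref.size();      // skip past the whole reference
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out.push_back(text[i++]);
    }
    return out;
}
```

The text is scanned in a single pass, so the output of one expansion is never expanded again; the input a&amp;lt;b therefore becomes a&lt;b rather than a<b.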
void setExpandCommentReference(bool expandComment);
Sets whether references in comments are expanded. A value of true expands references.
void setExpandContentReference(bool expandReference);
Sets whether references in content are expanded. A value of true expands references.
©2004-2007 Copyright Quovadx, Inc. All Rights Reserved.
Quovadx and Rogue Wave are registered trademarks and HydraSDO is a trademark of Quovadx, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.