HydraExpress 4.6

HydraExpress C++ API Reference Guide


   



rwsf::XmlReader Class Reference
[Core XML]

A simple XML pull-parser that implements reference semantics.

#include <rwsf/core/XmlReader.h>

Inherits rwsf::HandleBase.

Public Types

enum  NodeType {
  StartTag, EndTag, EmptyTag, Data,
  Unknown
}

Public Member Functions

 XmlReader ()
 XmlReader (const char *buf, size_t length)
 XmlReader (const unsigned char *buf, size_t length)
 XmlReader (const std::string &document)
bool eof ()
bool isElementNext (const rwsf::XmlName &name)
bool isElementNext (const std::string &name)
std::string readElementValue ()
std::string readElement (const rwsf::XmlName &name=NullName)
std::string readElement (const std::string &name)
std::string readWellFormedElement (const rwsf::XmlName &name=NullName)
rwsf::XmlReader getElementReader (const rwsf::XmlReaderName &name=rwsf::XmlReaderName::Empty)
rwsf::XmlAttributeSet readElementStart (const rwsf::XmlName &name)
void readElementEnd (const rwsf::XmlName &name)
rwsf::XmlName getLastName () const
std::string getLastContent () const
rwsf::XmlAttributeSet getLastAttributes () const
NodeType getLastNodeType () const
void addNamespace (const rwsf::XmlNamespace &ns)
void readNextNode ()
bool getExpandAttributeReference () const
void setExpandAttributeReference (bool expandReference)
bool getExpandContentReference () const
void setExpandContentReference (bool expandReference)
bool getExpandCommentReference () const
void setExpandCommentReference (bool expandComment)
std::string getEncoding () const
std::string getVersion () const
std::string getStandalone () const
bool hasStandalone () const
bool hasEncoding () const
std::string getPrefixForURI (const std::string &uri) const
std::string getURIForPrefix (const std::string &prefix) const
void readElementEnd ()
void readElementStart ()

Static Public Attributes

static rwsf::XmlName NullName

Detailed Description

Class rwsf::XmlReader is a simple XML pull-parser. The XML document is typically parsed element by element using readElement(), or by iteratively calling readElementStart(), readElementValue(), and readElementEnd(). On each read, an XmlReader instance sets its internal state with information about the content it just read. Member functions getLastNodeType(), getLastName(), and getLastContent() can then be used to retrieve portions of the rwsf::XmlReader's state.

Note:
This class uses reference semantics in which an instance of this class represents a reference to an implementation class.
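
As an illustration of what reference semantics implies for callers, the sketch below shows the general handle/body idiom using std::shared_ptr. The names ToyBody and ToyHandle are hypothetical and are not part of HydraExpress; how rwsf::HandleBase is actually implemented is not described here.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical body class: holds the real state.
struct ToyBody {
    explicit ToyBody(const std::string& doc) : document(doc), cursor(0) {}
    std::string document;
    std::size_t cursor;
};

// Hypothetical handle class: copying it copies the reference, not the body.
class ToyHandle {
public:
    ToyHandle() {}  // no body attached, so the handle is invalid
    explicit ToyHandle(const std::string& doc)
        : body_(std::make_shared<ToyBody>(doc)) {}

    bool isValid() const { return static_cast<bool>(body_); }

    // Both functions require a valid handle.
    void advance(std::size_t n) { body_->cursor += n; }
    std::size_t cursor() const { return body_->cursor; }

private:
    std::shared_ptr<ToyBody> body_;
};
```

Under this idiom, a copy of a handle refers to the same underlying state as the original, so advancing one copy is visible through the other. A default-constructed handle has no body, which corresponds to the "invalid reader" produced by the default constructor.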

rwsf::XmlReader throws an exception of type rwsf::XmlParseException when it encounters XML that is not well-formed. The rwsf::XmlParseException exception contains a description of the error and the line and column number of the source document where the error occurred.
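
A parser typically derives the reported line and column from its byte offset in the source document. The following sketch shows one way to do that, assuming 1-based line and column numbers and '\n' line endings. ToyPosition and positionOf are hypothetical names; the accessors of rwsf::XmlParseException are not shown here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical position record (1-based line and column are an assumption).
struct ToyPosition {
    std::size_t line;
    std::size_t column;
};

// Computes the position of the character at byte 'offset' by counting
// the newlines that precede it.
ToyPosition positionOf(const std::string& doc, std::size_t offset) {
    ToyPosition p = {1, 1};
    for (std::size_t i = 0; i < offset && i < doc.size(); ++i) {
        if (doc[i] == '\n') {
            ++p.line;
            p.column = 1;
        } else {
            ++p.column;
        }
    }
    return p;
}
```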

rwsf::XmlReader can parse documents in the encodings UTF-8, UTF-16(BE), UTF-16LE, US-ASCII, and ISO-8859-1. In addition, if the rwsf_icu library is present, rwsf::XmlReader can also convert from any character encoding supported by ICU.

Please see the XML Binding Development Guide for further information on conversions and custom converters.

Note:
rwsf::XmlReader converts all documents to UTF-8 regardless of the encoding of the source document.

Currently, rwsf::XmlReader provides support only for reading elements and their content. No support for reading processing instructions, DOCTYPE declarations, or entity declarations is provided.


Member Enumeration Documentation

enum rwsf::XmlReader::NodeType

Enumeration of the different node types in XML.

Enumerator:
StartTag 

An XML start tag; e.g., <customer>.

EndTag 

An XML end tag; e.g., </customer>.

EmptyTag 

An empty XML tag; e.g., <customer/>.

Data 

Data that is the content of an element, not including any tags; e.g., John Doe.

Unknown 

Set before the reader has read an element from the document.
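
The following toy reader, written only with the standard library, sketches how a pull-parser might classify nodes into these types and record the last node read. ToyNodeType and ToyPullReader are hypothetical stand-ins, not the rwsf implementation; the sketch ignores attributes, comments, namespaces, and error reporting.

```cpp
#include <cstddef>
#include <string>

// Mirrors the enumerators documented above (stand-in, not the rwsf type).
enum class ToyNodeType { StartTag, EndTag, EmptyTag, Data, Unknown };

class ToyPullReader {
public:
    explicit ToyPullReader(const std::string& doc)
        : doc_(doc), pos_(0), type_(ToyNodeType::Unknown) {}

    bool eof() const { return pos_ >= doc_.size(); }
    ToyNodeType lastType() const { return type_; }
    const std::string& lastText() const { return text_; }

    // Reads one node and updates the "last read" state.
    void readNextNode() {
        if (eof()) return;
        if (doc_[pos_] != '<') {
            // Character data runs up to the next tag or the end of input.
            std::size_t end = doc_.find('<', pos_);
            if (end == std::string::npos) end = doc_.size();
            type_ = ToyNodeType::Data;
            text_ = doc_.substr(pos_, end - pos_);
            pos_ = end;
            return;
        }
        std::size_t close = doc_.find('>', pos_);
        if (close == std::string::npos) {
            // Unterminated tag: a real parser would report an error here.
            type_ = ToyNodeType::Unknown;
            text_.clear();
            pos_ = doc_.size();
            return;
        }
        std::string inner = doc_.substr(pos_ + 1, close - pos_ - 1);
        pos_ = close + 1;
        if (!inner.empty() && inner[0] == '/') {
            type_ = ToyNodeType::EndTag;
            text_ = inner.substr(1);
        } else if (!inner.empty() && inner[inner.size() - 1] == '/') {
            type_ = ToyNodeType::EmptyTag;
            text_ = inner.substr(0, inner.size() - 1);
        } else {
            type_ = ToyNodeType::StartTag;
            text_ = inner;
        }
    }

private:
    std::string doc_;
    std::size_t pos_;
    ToyNodeType type_;
    std::string text_;
};
```

For the input `<customer>John Doe</customer><note/>`, successive calls to readNextNode() in this sketch yield StartTag, Data, EndTag, and EmptyTag, and the type is Unknown before the first call.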


Constructor & Destructor Documentation

rwsf::XmlReader::XmlReader (  ) 

Default constructor. Constructs an invalid reader.

rwsf::XmlReader::XmlReader ( const char *  buf,
size_t  length 
)

Constructs a reader from the document pointed to by buf, which is length bytes long. Parses the prolog of the document if found, and determines the document encoding both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.

rwsf::XmlReader::XmlReader ( const unsigned char *  buf,
size_t  length 
)

Constructs a reader from the document pointed to by buf, which is length bytes long. Parses the prolog of the document if found, and determines the document encoding both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.

rwsf::XmlReader::XmlReader ( const std::string &  document  ) 

Convenience constructor for converting from an std::string. Constructs a reader from the XML document in document. Parses the prolog of the document if found, and determines the encoding used by document, both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.
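
The exact algorithm used to guess the encoding from the first few bytes is not documented here. One common technique is to look for a byte order mark (BOM), as sketched below. The function name guessEncodingFromBom is hypothetical, and the sketch covers only the UTF-8 and UTF-16 marks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of HydraExpress.
// Returns an encoding name if the buffer starts with a known BOM,
// or an empty string if no BOM is present.
std::string guessEncodingFromBom(const unsigned char* buf, std::size_t length) {
    if (length >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        return "UTF-8";
    }
    if (length >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        return "UTF-16BE";
    }
    // Note: a UTF-32LE BOM also starts with FF FE; this sketch does not
    // distinguish it because it only deals with UTF-8 and UTF-16.
    if (length >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        return "UTF-16LE";
    }
    return std::string();  // caller falls back to the XML declaration or a default
}
```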


Member Function Documentation

void rwsf::XmlReader::addNamespace ( const rwsf::XmlNamespace &  ns  ) 

Adds ns to the list of namespaces known by the reader. This method is useful when parsing document fragments where namespaces are declared outside the scope of the fragment.

bool rwsf::XmlReader::eof (  ) 

Returns true if at the end of the current document; false otherwise.

rwsf::XmlReader rwsf::XmlReader::getElementReader ( const rwsf::XmlReaderName &  name = rwsf::XmlReaderName::Empty  ) 

Returns a new rwsf::XmlReader instance for the current element, as if the current element in its entirety were this new document's root. This new reader copies the state of the parent reader, but its internal cursor is set to the beginning of the element, so that functions like readElementStart(), readElementValue(), etc. return the current element's information. The parent reader will have its cursor advanced past the element, so any of the parent reader's read() functions return the next element's information instead.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.

std::string rwsf::XmlReader::getEncoding (  )  const

Returns the name of the encoding of the original source document, either from the XML declaration's "encoding=" declaration, or as automatically sensed from the first few bytes of the XML document.

bool rwsf::XmlReader::getExpandAttributeReference (  )  const

Returns true if the reader expands entity references in attributes, false otherwise. See setExpandAttributeReference() for an example of usage.

bool rwsf::XmlReader::getExpandCommentReference (  )  const

Returns true if the reader expands comments found in XML content, false otherwise. See setExpandCommentReference() for an example of usage.
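
To make the effect of disabling comment expansion concrete, the sketch below removes XML comments from a string. The function name stripComments is hypothetical; it is a standard-library illustration and not the reader's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns 'in' with every <!-- ... --> span removed.
std::string stripComments(const std::string& in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t open = in.find("<!--", pos);
        if (open == std::string::npos) {
            out.append(in, pos, std::string::npos);  // no more comments
            break;
        }
        out.append(in, pos, open - pos);             // text before the comment
        std::size_t close = in.find("-->", open + 4);
        if (close == std::string::npos) {
            break;  // unterminated comment: a real parser would report an error
        }
        pos = close + 3;
    }
    return out;
}
```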

bool rwsf::XmlReader::getExpandContentReference (  )  const

Returns true if the reader expands XML references in content, false otherwise. See setExpandContentReference() for an example of usage.
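
To make "expanding references" concrete, the sketch below replaces the five predefined XML entity references with their characters in a single pass. The function name expandPredefinedEntities is hypothetical; numeric character references and custom entities are not handled, and this is not the reader's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expands &lt; &gt; &amp; &quot; &apos; and leaves
// everything else, including unknown references, untouched.
std::string expandPredefinedEntities(const std::string& in) {
    static const char* const names[] = {"&lt;", "&gt;", "&amp;", "&quot;", "&apos;"};
    static const char values[] = {'<', '>', '&', '"', '\''};

    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (std::size_t k = 0; k < 5; ++k) {
                const std::string name(names[k]);
                if (in.compare(i, name.size(), name) == 0) {
                    out += values[k];
                    i += name.size();  // skip the whole reference
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Because the input is scanned once, the output of one expansion is never expanded again; for example, `&amp;lt;` becomes `&lt;` rather than `<`.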

rwsf::XmlAttributeSet rwsf::XmlReader::getLastAttributes (  )  const

Returns the set of attributes associated with the last node read of type rwsf::XmlReader::StartTag.

std::string rwsf::XmlReader::getLastContent (  )  const

Returns the last content read, for nodes of type rwsf::XmlReader::Data. This value is undefined if the last node read was not of type rwsf::XmlReader::Data.

Note:
The content will be encoded in UTF-8, regardless of the encoding of the source document.

rwsf::XmlName rwsf::XmlReader::getLastName (  )  const

Returns the name of the last node read. This value is undefined if the last node read was of type rwsf::XmlReader::Data.

NodeType rwsf::XmlReader::getLastNodeType (  )  const

Returns the type of the last node read. See NodeType for more information on the NodeType enumeration.

std::string rwsf::XmlReader::getPrefixForURI ( const std::string &  uri  )  const

Looks up the provided uri in the current list of namespaces and returns the corresponding prefix. If the current list of namespaces does not contain the uri, returns the empty string.

std::string rwsf::XmlReader::getStandalone (  )  const

Returns the value of the source document's "standalone=" declaration if it exists, the empty string otherwise.

std::string rwsf::XmlReader::getURIForPrefix ( const std::string &  prefix  )  const

Looks up the provided prefix in the current list of namespaces and returns the corresponding URI. If the current list of namespaces does not contain the prefix, returns the empty string.
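
The lookup behavior described for getPrefixForURI() and getURIForPrefix() can be sketched with a simple list of prefix/URI pairs. ToyNamespaceList is a hypothetical class, and searching the most recently added entry first is an assumption of this sketch, not documented behavior.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reader's list of known namespaces.
class ToyNamespaceList {
public:
    void add(const std::string& prefix, const std::string& uri) {
        entries_.push_back(std::make_pair(prefix, uri));
    }

    // Returns the prefix bound to 'uri', or the empty string if none is known.
    std::string prefixForURI(const std::string& uri) const {
        for (std::size_t i = entries_.size(); i > 0; --i) {
            if (entries_[i - 1].second == uri) return entries_[i - 1].first;
        }
        return std::string();
    }

    // Returns the URI bound to 'prefix', or the empty string if none is known.
    std::string uriForPrefix(const std::string& prefix) const {
        for (std::size_t i = entries_.size(); i > 0; --i) {
            if (entries_[i - 1].first == prefix) return entries_[i - 1].second;
        }
        return std::string();
    }

private:
    std::vector<std::pair<std::string, std::string> > entries_;
};
```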

std::string rwsf::XmlReader::getVersion (  )  const

Returns the value of the source document's "version=" declaration if it exists, the empty string otherwise.

bool rwsf::XmlReader::hasEncoding (  )  const

Returns true if the source XML document explicitly specifies an encoding. Returns false if the document does not specify an encoding, i.e. the encoding was automatically determined from the first few bytes of the XML document.

bool rwsf::XmlReader::hasStandalone (  )  const

Returns true if a "standalone=" declaration exists in the source document's XML declaration.

bool rwsf::XmlReader::isElementNext ( const std::string &  name  ) 

Returns true if name is the next element.

Note:
If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

bool rwsf::XmlReader::isElementNext ( const rwsf::XmlName &  name  ) 

Returns true if name is the next element.

Note:
If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

std::string rwsf::XmlReader::readElement ( const std::string &  name  ) 

Reads in the next element from the current document and returns the entire element. A name can be provided, in which case the element's name must match, or an exception is thrown.

This method returns the entire XML for the element, rooted at the element (in other words, the element's start and end tag will be a part of the resulting string). Also returned is all content and child tags with their content. In effect, the method grabs the element wholesale and gives it to you in string form.

Note:
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.
rwsf::XmlParseException The element's XML is invalid or malformed.

std::string rwsf::XmlReader::readElement ( const rwsf::XmlName &  name = NullName  ) 

Reads in the next element from the current document and returns the entire element. A name can be provided, in which case the element's name must match, or an exception is thrown.

This method returns the entire XML for the element, rooted at the element (in other words, the element's start and end tag will be a part of the resulting string). Also returned is all content and child tags with their content. In effect, the method grabs the element wholesale and gives it to you in string form.
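
The idea of returning an element wholesale, tags and children included, can be sketched by tracking tag depth until the matching end tag is found. The function name extractElement is hypothetical; the sketch does not check that tag names match and ignores comments, CDATA sections, and '>' characters inside attribute values, all of which a real parser must handle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the element that starts at doc[start]
// (which is expected to be '<'), including its start and end tags,
// or an empty string if the tags are not balanced.
std::string extractElement(const std::string& doc, std::size_t start) {
    std::size_t pos = start;
    int depth = 0;
    while (pos < doc.size()) {
        std::size_t open = doc.find('<', pos);
        if (open == std::string::npos) break;
        std::size_t close = doc.find('>', open);
        if (close == std::string::npos) break;

        const bool isEndTag = (doc[open + 1] == '/');
        const bool isEmptyTag = !isEndTag && (doc[close - 1] == '/');
        if (isEndTag) {
            --depth;
        } else if (!isEmptyTag) {
            ++depth;
        }
        pos = close + 1;

        if (depth == 0) return doc.substr(start, pos - start);  // element complete
        if (depth < 0) break;                                   // stray end tag
    }
    return std::string();
}
```

Given `<a><b>x</b><c/></a><d/>` and a start offset of 0, this sketch returns `<a><b>x</b><c/></a>` and leaves `<d/>` for a later read.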

Note:
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.
rwsf::XmlParseException The element's XML is invalid or malformed.

void rwsf::XmlReader::readElementEnd (  ) 

Reads the next node in the document. If the node is not an end tag, throws an exception.

Exceptions:
rwsf::XmlParseException The next tag is not an end tag.

void rwsf::XmlReader::readElementEnd ( const rwsf::XmlName &  name  ) 

Reads the next node in the document. If the node is not an end tag matching name, throws an exception of type rwsf::XmlParseException.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.
rwsf::XmlParseException The next tag is not an end tag.

void rwsf::XmlReader::readElementStart (  ) 

Reads the next node in the document. If the node is not a start tag, throws an exception.

Exceptions:
rwsf::XmlParseException The next tag is not a start tag.

rwsf::XmlAttributeSet rwsf::XmlReader::readElementStart ( const rwsf::XmlName &  name  ) 

Reads the next node in the document. If the node is not a start tag, or the node's name does not match name, throws an exception of type rwsf::XmlParseException. Returns any attributes found inside the tag.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.
rwsf::XmlParseException The next tag is not a start tag.

std::string rwsf::XmlReader::readElementValue (  ) 

Reads and returns the next element content from the document. The element's start and end tags are not included in the returned string. If getExpandCommentReference() returns false, comments are omitted from the output; otherwise they are included. If getExpandContentReference() returns false, entity references (&lt;, &gt;, etc.) are left unexpanded in the output; otherwise they are replaced by the characters they represent.

See also:
getExpandCommentReference() and getExpandContentReference() for more information on comment and entity reference expansion.

void rwsf::XmlReader::readNextNode (  ) 

Reads the next start tag, empty tag, end tag, or content from the document. Use getLastNodeType(), getLastName(), and getLastContent() to retrieve information on what was read. If a well-formedness error is encountered while reading the document, an exception of type rwsf::XmlParseException is thrown.

Note:
This method is not typically used directly. It is used by other methods such as readElementStart(), readElementValue(), and so on.

std::string rwsf::XmlReader::readWellFormedElement ( const rwsf::XmlName &  name = NullName  ) 

This method functions exactly like readElement(), except that it adds namespace declarations to the element's start tag to allow the element to be well formed. This includes namespaces declared on parent elements that are in use by this element or one of its children. You can expect that the element alone will be able to resolve its namespaces internally, even if they were declared external to this element.

Note:
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

Exceptions:
rwsf::XmlParseException The current element's name is not the provided name.
rwsf::XmlParseException The element's XML is invalid or malformed.

void rwsf::XmlReader::setExpandAttributeReference ( bool  expandReference  ) 

Sets whether the reader expands entity references in attributes. For example, when expandReference is true (the default), the reader converts an attribute value such as:

 3 &lt; 4

to:

 3 < 4

void rwsf::XmlReader::setExpandCommentReference ( bool  expandComment  ) 

Sets whether the reader expands comments found in XML content. The default is expandComment = false.

When expandComment is true, the reader keeps the comment in the element value returned from readElement(). For example, the element:

 <elem><!-- My Comment --></elem>

is returned unchanged:

 <elem><!-- My Comment --></elem>

If expandComment is false (the default), the above example is converted to:

 <elem></elem>

void rwsf::XmlReader::setExpandContentReference ( bool  expandReference  ) 

Sets whether the reader expands entity references in content. For example, when expandReference is true (the default), the reader converts the element value returned from readElement() from:

 <elem>5 &lt; 20</elem>

to:

 <elem>5 < 20</elem>

Member Data Documentation

rwsf::XmlName rwsf::XmlReader::NullName [static]

Static constant rwsf::XmlName that contains an empty prefix and an empty namespace URI.


Copyright © Rogue Wave Software, Inc. All Rights Reserved.

The Rogue Wave name and logo are registered trademarks of Rogue Wave Software, and HydraExpress is a trademark of Rogue Wave Software. All other trademarks are the property of their respective owners.