SourcePro® C++ API Reference Guide

RWUTF8Helper Class Reference

Provides common functionality used to encode and decode UTF-8 sequences.

#include <rw/stream/RWUTF8Helper.h>

Public Types

enum  EncodingCategory {
  oneByte, twoBytes, threeBytes, fourBytes,
  highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate, invalidUTF8Encoding
}
 

Static Public Member Functions

static EncodingCategory decodeFirstByte (RWByte b)
 
static EncodingCategory decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)
 
static EncodingCategory decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)
 
static EncodingCategory decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)
 
static EncodingCategory encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)
 

Detailed Description

The class RWUTF8Helper provides common functionality used to encode and decode UTF-8 sequences.

Member Enumeration Documentation

enum RWUTF8Helper::EncodingCategory

Enumerator
oneByte 

One-byte encoding form of UTF-8

twoBytes

Two-byte encoding form of UTF-8

threeBytes

Three-byte encoding form of UTF-8

fourBytes

Four-byte encoding form of UTF-8

highSurrogate 

The character to be encoded is a high surrogate

missingLowSurrogate 

No low surrogate after a high surrogate

lowSurrogateWithoutHighSurrogate 

A low surrogate was not preceded by a high surrogate

invalidUTF8Encoding 

The encoding is not recognized as UTF-8

Member Function Documentation

static EncodingCategory RWUTF8Helper::decodeFirstByte (RWByte b)

Takes the first byte of a UTF-8 byte sequence encoding a single UTF-16 character, and returns the encoding category to which it belongs. Throws no exceptions.

Parameters
b  The first byte of a UTF-8 byte sequence encoding a single UTF-16 character.
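
As an illustration of what classifying a first byte involves, the following is a minimal standalone sketch based on the UTF-8 bit patterns. It does not call RWUTF8Helper and is not its implementation: the names LeadCategory and classifyLeadByte are hypothetical, std::uint8_t stands in for RWByte as an assumption, and rejecting 0xC0, 0xC1 and 0xF5..0xFF is a choice made for this sketch rather than documented library behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the decode-side categories; not the library's enum.
enum class LeadCategory { oneByte, twoBytes, threeBytes, fourBytes, invalid };

// Classify a UTF-8 lead byte by its bit pattern.
// std::uint8_t stands in for RWByte (an assumption about that typedef).
inline LeadCategory classifyLeadByte(std::uint8_t b) {
    if (b <= 0x7F) return LeadCategory::oneByte;                  // 0xxxxxxx
    if (b >= 0xC2 && b <= 0xDF) return LeadCategory::twoBytes;    // 110xxxxx; 0xC0/0xC1 only start overlong forms
    if (b >= 0xE0 && b <= 0xEF) return LeadCategory::threeBytes;  // 1110xxxx
    if (b >= 0xF0 && b <= 0xF4) return LeadCategory::fourBytes;   // 11110xxx, capped at U+10FFFF
    return LeadCategory::invalid;                                 // continuation byte or unused value
}

// Usage sketch: lead bytes of "A", U+00E9, U+20AC and U+1F600.
inline void demoClassifyLeadByte() {
    assert(classifyLeadByte(0x41) == LeadCategory::oneByte);
    assert(classifyLeadByte(0xC3) == LeadCategory::twoBytes);
    assert(classifyLeadByte(0xE2) == LeadCategory::threeBytes);
    assert(classifyLeadByte(0xF0) == LeadCategory::fourBytes);
    assert(classifyLeadByte(0x80) == LeadCategory::invalid);      // a continuation byte cannot lead
}
```

A caller would typically use the returned category to decide how many further bytes to read before invoking the matching decode function.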

static EncodingCategory RWUTF8Helper::decodeFourBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWByte fourthByte, RWUChar &highSurrogateValue, RWUChar &lowSurrogateValue)

Decodes a four-byte UTF-8 sequence into a UTF-16 surrogate pair. Returns invalidUTF8Encoding if the four bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte           The first byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
secondByte          The second byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
thirdByte           The third byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
fourthByte          The fourth byte of a UTF-8 four-byte sequence encoding a single UTF-16 character.
highSurrogateValue  The UTF-16 high surrogate resulting from the decoding of the four-byte UTF-8 sequence.
lowSurrogateValue   The UTF-16 low surrogate resulting from the decoding of the four-byte UTF-8 sequence.
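
The arithmetic behind turning four UTF-8 bytes into a surrogate pair can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's code: decodeFourBytesSketch is a hypothetical name, std::uint8_t and char16_t stand in for RWByte and RWUChar, and the function returns a plain bool instead of an EncodingCategory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a four-byte UTF-8 sequence into a UTF-16 surrogate pair.
// Returns false (and leaves the outputs untouched) if the bytes are not well-formed.
inline bool decodeFourBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                  std::uint8_t b3, std::uint8_t b4,
                                  char16_t& high, char16_t& low) {
    // Lead byte must be 11110xxx; the other three must be 10xxxxxx.
    if ((b1 & 0xF8) != 0xF0 || (b2 & 0xC0) != 0x80 ||
        (b3 & 0xC0) != 0x80 || (b4 & 0xC0) != 0x80) {
        return false;
    }
    // Assemble the 21-bit code point from the payload bits.
    std::uint32_t cp = ((b1 & 0x07u) << 18) | ((b2 & 0x3Fu) << 12) |
                       ((b3 & 0x3Fu) << 6)  |  (b4 & 0x3Fu);
    // Reject overlong forms (below U+10000) and values beyond U+10FFFF.
    if (cp < 0x10000 || cp > 0x10FFFF) return false;
    // Split the 20-bit offset into two 10-bit halves.
    cp -= 0x10000;
    high = static_cast<char16_t>(0xD800 + (cp >> 10));
    low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    return true;
}

// Usage sketch: F0 9F 98 80 encodes U+1F600, whose surrogate pair is D83D DE00.
inline void demoDecodeFourBytes() {
    char16_t high = 0, low = 0;
    assert(decodeFourBytesSketch(0xF0, 0x9F, 0x98, 0x80, high, low));
    assert(high == 0xD83D && low == 0xDE00);
}
```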

static EncodingCategory RWUTF8Helper::decodeThreeBytesEncoding (RWByte firstByte, RWByte secondByte, RWByte thirdByte, RWUChar &res)

Decodes a three-byte UTF-8 sequence. Returns invalidUTF8Encoding if the three bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte   The first byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
secondByte  The second byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
thirdByte   The third byte of a UTF-8 three-byte sequence encoding a single UTF-16 character.
res         The UTF-16 character resulting from the decoding of the three-byte UTF-8 sequence.
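
A three-byte decode can be sketched in standalone form as shown here. The name decodeThreeBytesSketch is hypothetical, std::uint8_t and char16_t are assumed stand-ins for RWByte and RWUChar, and the bool return replaces the EncodingCategory result. Rejecting overlong forms and encoded surrogates follows the Unicode definition of well-formed UTF-8; whether RWUTF8Helper applies the same checks is not stated in this reference.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a three-byte UTF-8 sequence into one UTF-16 code unit.
// Returns false (and leaves res untouched) if the bytes are not well-formed.
inline bool decodeThreeBytesSketch(std::uint8_t b1, std::uint8_t b2,
                                   std::uint8_t b3, char16_t& res) {
    // Lead byte must be 1110xxxx; the other two must be 10xxxxxx.
    if ((b1 & 0xF0) != 0xE0 || (b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80) {
        return false;
    }
    // 4 + 6 + 6 payload bits give a 16-bit value.
    std::uint32_t cp = ((b1 & 0x0Fu) << 12) | ((b2 & 0x3Fu) << 6) | (b3 & 0x3Fu);
    if (cp < 0x800) return false;                    // overlong: fits in fewer bytes
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates are not encodable in UTF-8
    res = static_cast<char16_t>(cp);
    return true;
}

// Usage sketch: E2 82 AC encodes U+20AC (euro sign).
inline void demoDecodeThreeBytes() {
    char16_t res = 0;
    assert(decodeThreeBytesSketch(0xE2, 0x82, 0xAC, res));
    assert(res == 0x20AC);
}
```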

static EncodingCategory RWUTF8Helper::decodeTwoBytesEncoding (RWByte firstByte, RWByte secondByte, RWUChar &res)

Decodes a two-byte UTF-8 sequence. Returns invalidUTF8Encoding if the two bytes do not form a valid UTF-8 sequence. Throws no exceptions.

Parameters
firstByte   The first byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
secondByte  The second byte of a UTF-8 two-byte sequence encoding a single UTF-16 character.
res         The UTF-16 character resulting from the decoding of the two-byte UTF-8 sequence.
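
For the two-byte form, the decode reduces to two mask-and-shift steps, sketched below in standalone C++. As assumptions for illustration only: decodeTwoBytesSketch is a hypothetical name, std::uint8_t and char16_t stand in for RWByte and RWUChar, and a bool is returned in place of an EncodingCategory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a two-byte UTF-8 sequence into one UTF-16 code unit.
// Returns false (and leaves res untouched) if the bytes are not well-formed.
inline bool decodeTwoBytesSketch(std::uint8_t b1, std::uint8_t b2, char16_t& res) {
    // Lead byte must be 110xxxxx; the second must be 10xxxxxx.
    if ((b1 & 0xE0) != 0xC0 || (b2 & 0xC0) != 0x80) return false;
    // 5 + 6 payload bits give an 11-bit value.
    std::uint32_t cp = ((b1 & 0x1Fu) << 6) | (b2 & 0x3Fu);
    if (cp < 0x80) return false;  // overlong: would fit in one byte
    res = static_cast<char16_t>(cp);
    return true;
}

// Usage sketch: C3 A9 encodes U+00E9 (e with acute accent).
inline void demoDecodeTwoBytes() {
    char16_t res = 0;
    assert(decodeTwoBytesSketch(0xC3, 0xA9, res));
    assert(res == 0x00E9);
}
```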

static EncodingCategory RWUTF8Helper::encodeOneUChar (RWUChar uc, RWByte *res, RWUChar highSurrogateValue=0)

Encodes the UTF-16 character uc according to UTF-8. The function returns the UTF-8 encoding category that was used to convert the UTF-16 character, or an error if the UTF-16 character could not be transformed. Throws no exceptions.

Parameters
uc                  The UTF-16 character to be transformed.
res                 A pointer to a byte array containing at least four bytes. The byte array is used to store the transformation result.
highSurrogateValue  This parameter is only used when a high surrogate was previously encountered.
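
One plausible reading of the encoding protocol is that a caller feeds UTF-16 code units one at a time and, after a highSurrogate result, passes that value back in together with the next code unit. The standalone sketch below models that flow under stated assumptions; it is not the library's implementation. EncodeResult and encodeUnitSketch are hypothetical names, std::uint8_t and char16_t stand in for RWByte and RWUChar, and the handling of each error case is an interpretation of the enumerator descriptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type mirroring the categories an encoder could report.
enum class EncodeResult {
    oneByte, twoBytes, threeBytes, fourBytes,
    highSurrogate, missingLowSurrogate, lowSurrogateWithoutHighSurrogate
};

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Encode one UTF-16 code unit into out (at least four bytes).
// pendingHigh is assumed to be 0 or a high surrogate saved from the previous call.
inline EncodeResult encodeUnitSketch(char16_t uc, std::uint8_t* out, char16_t pendingHigh = 0) {
    if (pendingHigh != 0) {
        if (!isLowSurrogate(uc)) return EncodeResult::missingLowSurrogate;
        // Combine the pair into a code point in U+10000..U+10FFFF.
        std::uint32_t cp = 0x10000u + ((static_cast<std::uint32_t>(pendingHigh) - 0xD800u) << 10)
                                    +  (static_cast<std::uint32_t>(uc) - 0xDC00u);
        out[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
        out[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
        out[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        return EncodeResult::fourBytes;
    }
    if (isHighSurrogate(uc)) return EncodeResult::highSurrogate;  // nothing written yet
    if (isLowSurrogate(uc)) return EncodeResult::lowSurrogateWithoutHighSurrogate;
    if (uc < 0x80) {
        out[0] = static_cast<std::uint8_t>(uc);
        return EncodeResult::oneByte;
    }
    if (uc < 0x800) {
        out[0] = static_cast<std::uint8_t>(0xC0 | (uc >> 6));
        out[1] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
        return EncodeResult::twoBytes;
    }
    out[0] = static_cast<std::uint8_t>(0xE0 | (uc >> 12));
    out[1] = static_cast<std::uint8_t>(0x80 | ((uc >> 6) & 0x3F));
    out[2] = static_cast<std::uint8_t>(0x80 | (uc & 0x3F));
    return EncodeResult::threeBytes;
}

// Usage sketch: encode the surrogate pair D83D DE00 (U+1F600) in two calls.
inline void demoEncodeSurrogatePair() {
    std::uint8_t buf[4] = {0, 0, 0, 0};
    assert(encodeUnitSketch(0xD83D, buf) == EncodeResult::highSurrogate);
    assert(encodeUnitSketch(0xDE00, buf, 0xD83D) == EncodeResult::fourBytes);
    assert(buf[0] == 0xF0 && buf[1] == 0x9F && buf[2] == 0x98 && buf[3] == 0x80);
}
```

In this sketch the first call writes nothing and only signals that a second code unit is needed, which is why the caller has to keep the high surrogate and supply it on the next call.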

Copyright © 2016 Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.