Internationalization Module Reference Guide

RWUBreakSearch

Module:  Internationalization Module   Group:  Unicode String Processing


Does Not Inherit

Header File

#include <rw/i18n/RWUBreakSearch.h> 

Description

RWUBreakSearch finds the locations of breaks, or potential breaks, in text. Whitespace and punctuation are correctly interpreted in accordance with a specified locale.

Breaks reported from RWUBreakSearch are located immediately prior to the reported location. For example, a character break reported at offset 0 occurs just before the first character. RWUConstStringIterator instances returned by member functions of RWUBreakSearch are positioned at the code point immediately following a break.

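Five types of text breaks are supported by RWUBreakSearch: code point, character, word, line, and sentence. These correspond to the values of the BreakType enumeration listed under Public Typedefs.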

RWUBreakSearch objects are created given the break type to search for, an RWUString which provides text for processing, and an optional locale name. If no locale is specified, then the current default locale is used.

After instantiating a break search, you can search for the specified break type using the first(), last(), next(), and previous() methods. RWUBreakSearch objects maintain a "current" position. Initially, the current position is the start of the source string. Calls to first(), last(), next(), and previous() alter the current position.
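The "current position" bookkeeping described above can be sketched without the library. The class below is a hypothetical stand-in, not RWUBreakSearch: the name ToyBreakCursor, the use of byte offsets instead of RWUConstStringIterator, the space-based break rule, and the clamping at either end are all assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT the Rogue Wave class. It models only how
// first(), last(), next(), and previous() move a "current" position.
// Breaks are byte offsets in an ASCII string, placed at the start, at the
// end, and wherever the text switches between space and non-space.
class ToyBreakCursor {
public:
    explicit ToyBreakCursor(const std::string& text) : pos_(0) {
        breaks_.push_back(0);  // the start is always a break
        for (std::size_t i = 1; i < text.size(); ++i) {
            const bool prevSpace = (text[i - 1] == ' ');
            const bool thisSpace = (text[i] == ' ');
            if (prevSpace != thisSpace) {
                breaks_.push_back(i);
            }
        }
        if (!text.empty()) {
            breaks_.push_back(text.size());  // the end is always a break
        }
    }

    // Reports the current position without moving it.
    std::size_t current() const {
        assert(!breaks_.empty());
        return breaks_[pos_];
    }

    std::size_t first() { pos_ = 0; return current(); }
    std::size_t last() { pos_ = breaks_.size() - 1; return current(); }

    // Assumption of this sketch: stepping past either end clamps there.
    std::size_t next() {
        if (pos_ + 1 < breaks_.size()) { ++pos_; }
        return current();
    }
    std::size_t previous() {
        if (pos_ > 0) { --pos_; }
        return current();
    }

private:
    std::vector<std::size_t> breaks_;  // sorted break offsets
    std::size_t pos_;                  // index of the current break
};
```

For the text "ab cd" this sketch places breaks at offsets 0, 2, 3, and 5. A new cursor starts at offset 0, and each call to next() or previous() moves the current position by one break.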

Note that breaks occur both before and after each unit being queried, for every type of break search. For example, the string "abc" contains four character breaks: one before the a, one before the b, one before the c, and one after the c. Because the ends of a string are always break locations, they may require special handling. Consider a loop that starts at first() and calls next() until the returned iterator equals str.endCodePointIterator().

If the character break that is located at the str.endCodePointIterator() position--like the break after the c above--should be processed, then you must take care to process it outside the body of the loop.
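The end-of-string case can be illustrated with a library-free sketch. The two helper functions below are hypothetical and use plain byte offsets in an ASCII std::string rather than RWUConstStringIterator; they only mirror the shape of a loop that stops when the position reaches the end of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: visits character breaks with a loop that stops as
// soon as the position equals the end of the string, and returns the
// offsets that were handled inside the loop body.
std::vector<std::size_t> breaksSeenInLoop(const std::string& str) {
    std::vector<std::size_t> seen;
    for (std::size_t pos = 0; pos != str.size(); ++pos) {
        seen.push_back(pos);  // the body never runs for pos == str.size()
    }
    return seen;
}

// Same traversal, with the break at the end handled after the loop.
std::vector<std::size_t> allBreaks(const std::string& str) {
    std::vector<std::size_t> seen = breaksSeenInLoop(str);
    assert(seen.size() == str.size());
    seen.push_back(str.size());  // the break after the last character
    return seen;
}
```

For "abc", the loop body handles offsets 0, 1, and 2 only. The fourth break, at offset 3, is picked up by the extra step after the loop.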

Example

The following example counts the number of sentences in a string:
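As a minimal, library-independent sketch of the same idea, the code below counts sentences by counting sentence breaks. The functions toySentenceBreaks and countSentences are hypothetical, and the break rule (a run of '.', '!' or '?' followed by spaces) is a deliberately naive assumption, not the algorithm RWUBreakSearch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive terminator test used only by this sketch.
static bool isTerminator(char c) {
    return c == '.' || c == '!' || c == '?';
}

// Hypothetical helper: returns break offsets for an ASCII string. A break
// is placed at the start, at the end, and before the first character that
// follows a run of terminators and any spaces after it.
std::vector<std::size_t> toySentenceBreaks(const std::string& text) {
    std::vector<std::size_t> breaks;
    breaks.push_back(0);  // the start of the text is always a break
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isTerminator(text[i])) { ++i; continue; }
        while (i < text.size() && isTerminator(text[i])) { ++i; }  // "..." or "?!"
        while (i < text.size() && text[i] == ' ') { ++i; }         // trailing spaces
        if (i < text.size()) { breaks.push_back(i); }              // next sentence starts here
    }
    if (!text.empty()) { breaks.push_back(text.size()); }  // the end is always a break
    return breaks;
}

// Breaks occur before and after each unit, so N sentences produce N + 1
// breaks.
std::size_t countSentences(const std::string& text) {
    const std::vector<std::size_t> breaks = toySentenceBreaks(text);
    assert(!breaks.empty());
    return breaks.size() - 1;
}
```

Under this rule, countSentences("Hi there. How are you? Fine!") yields 3. The same rule reports 2 for "3.14 is pi." because it treats the decimal point as a sentence end, which shows why periods inside numbers and abbreviations make sentence detection difficult.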

Public Typedefs

enum BreakType { CodePoint,
                 Character,
                 Word,
                 Line,
                 Sentence
};

NOTE -- RWUBreakSearch attempts to interpret nested quotes, nested parentheses, and periods that may either end a sentence, or be part of a number or abbreviation. This is a difficult problem, however, and the results are not guaranteed to be perfect.

Public Constructors

RWUBreakSearch(BreakType type, const RWUString& str,
               const RWULocale& locale = RWULocale::getDefault());

NOTE -- Distinct (deep) copies of the type and locale arguments are made within the RWUBreakSearch object, but only a reference to the input RWUString is held. Consequently, you must not change or destroy the string used to create the RWUBreakSearch before the last use of that RWUBreakSearch object. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.

RWUBreakSearch(const RWUBreakSearch& source);

NOTE -- The RWUString referenced by self is the same RWUString referenced by source. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
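The reference-holding behavior described in these notes can be illustrated with a small stand-in. ToySearch below is hypothetical and is not the library class; in particular, the assumption that state is cached at construction time is made only to show how a later change to the string leaves the search object out of date.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in: it keeps only a pointer to the caller's string
// and records the offset of the final break when it is constructed.
class ToySearch {
public:
    explicit ToySearch(const std::string& text)
        : text_(&text), endBreak_(text.size()) {}

    // Offset of the final break as computed at construction time.
    std::size_t cachedEndBreak() const { return endBreak_; }

    // True while the cached state still matches the referenced string.
    bool consistent() const {
        assert(text_ != nullptr);
        return endBreak_ == text_->size();
    }

private:
    const std::string* text_;  // not owned; the caller must keep it alive
    std::size_t endBreak_;     // becomes stale if the string changes
};
```

In this sketch, appending to the string after constructing a ToySearch makes consistent() return false, because the cached end break no longer matches the text. Constructing the object from a temporary string would be worse still, since the pointer would dangle as soon as the temporary is destroyed.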

Public Destructor

~RWUBreakSearch();

Public Member Operators

RWUBreakSearch&
operator=(const RWUBreakSearch& rhs);

NOTE -- The RWUString referenced by self is the same RWUString referenced by rhs. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.

Public Member Functions

RWUConstStringIterator
back(void) const;
RWUConstStringIterator 
current(void) const;
RWUConstStringIterator
first(void);
RWUConstStringIterator
front(void) const;
RWCString
getLocale(void) const;
const RWUString&
getString(void) const;
BreakType
getType(void) const;
bool
isBreak(const RWUConstStringIterator& position) const;
bool
isBreak(const RWUStringIterator& position) const;
bool
isBreak(size_t offset) const;
RWUConstStringIterator
last(void);
RWUConstStringIterator
next(void);
RWUConstStringIterator
next(const RWUStringIterator& position);
RWUConstStringIterator
next(const RWUConstStringIterator& position);
RWUConstStringIterator
previous(void);
RWUConstStringIterator
previous(const RWUStringIterator& position);
RWUConstStringIterator
previous(const RWUConstStringIterator& position);
void
setLocale(const RWULocale& locale);
void
setString(const RWUString& str);
void
setType(BreakType type);


© Copyright Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Rogue Wave Software, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.