SourcePro® API Reference Guide

 

Finds the locations of breaks, or potential breaks, in text for a specified locale.

#include <rw/i18n/RWUBreakSearch.h>

Public Types

enum  BreakType {
  CodePoint, Character, Word, Line,
  Sentence
}
 

Public Member Functions

 RWUBreakSearch (BreakType type, const RWUString &str, const RWULocale &locale=RWULocale::getDefault())
 
 RWUBreakSearch (const RWUBreakSearch &source)
 
 ~RWUBreakSearch ()
 
RWUConstStringIterator back (void) const
 
RWUConstStringIterator current (void) const
 
RWUConstStringIterator first (void)
 
RWUConstStringIterator front (void) const
 
RWCString getLocale (void) const
 
const RWUString & getString (void) const
 
BreakType getType (void) const
 
bool isBreak (const RWUConstStringIterator &position) const
 
bool isBreak (const RWUStringIterator &position) const
 
bool isBreak (size_t offset) const
 
RWUConstStringIterator last (void)
 
RWUConstStringIterator next (void)
 
RWUConstStringIterator next (const RWUStringIterator &position)
 
RWUConstStringIterator next (const RWUConstStringIterator &position)
 
RWUBreakSearch & operator= (const RWUBreakSearch &rhs)
 
RWUConstStringIterator previous (void)
 
RWUConstStringIterator previous (const RWUStringIterator &position)
 
RWUConstStringIterator previous (const RWUConstStringIterator &position)
 
void setLocale (const RWULocale &locale)
 
void setString (const RWUString &str)
 
void setType (BreakType type)
 

Detailed Description

RWUBreakSearch finds the locations of breaks, or potential breaks, in text. Whitespace and punctuation are correctly interpreted in accordance with a specified locale.

Breaks reported from RWUBreakSearch are located immediately prior to the reported location. For example, a character break reported at offset 0 occurs just before the first character. RWUConstStringIterator instances returned by member functions of RWUBreakSearch are positioned at the code point immediately following a break.
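
One way to picture this convention is to treat each break as the offset of the first element of the unit that follows it. The sketch below is a self-contained model that uses std::string and plain offsets instead of RWUString and iterators; splitAtBreaks is a hypothetical helper, not part of SourcePro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of SourcePro: cuts text into units using
// break offsets. Each offset names the first element of the unit that
// follows the break. Offsets must be ascending and within the text.
std::vector<std::string> splitAtBreaks(const std::string& text,
                                       const std::vector<std::size_t>& breaks)
{
    std::vector<std::string> units;
    for (std::size_t i = 0; i + 1 < breaks.size(); ++i) {
        assert(breaks[i] <= breaks[i + 1] && breaks[i + 1] <= text.size());
        units.push_back(text.substr(breaks[i], breaks[i + 1] - breaks[i]));
    }
    return units;
}
```

Given the text "Hello world" and the assumed word break offsets 0, 5, 6, and 11, the helper returns the three units "Hello", " ", and "world". The final offset, 11, starts no unit; it only closes the last one.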

Five types of text breaks are supported by RWUBreakSearch: code point, character, word, line, and sentence. See BreakType for details.

RWUBreakSearch objects are created from the break type to search for, an RWUString that provides the text to process, and an optional locale. If no locale is specified, the current default locale is used.

After instantiating a break search, you can search for the specified break type using the first(), last(), next(), and previous() methods. RWUBreakSearch objects maintain a "current" position. Initially, the current position is the start of the source string. Calls to first(), last(), next(), and previous() alter the current position.
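
The "current" position can be modeled as a cursor over a precomputed list of break offsets. The following is only a sketch of that behavior, written against std::vector rather than RWUString; BreakCursor and its members are hypothetical names that mirror the documented first(), last(), next(), previous(), and current() functions, including the documented behavior of staying put at either end of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a break search's current position.
// The break offsets are supplied by the caller in ascending order.
class BreakCursor {
public:
    explicit BreakCursor(std::vector<std::size_t> breaks)
        : breaks_(std::move(breaks)), index_(0)   // starts at the first break
    {
        assert(!breaks_.empty());
    }

    // Query without moving.
    std::size_t current() const { return breaks_[index_]; }

    // Jump to either end.
    std::size_t first() { index_ = 0; return current(); }
    std::size_t last() { index_ = breaks_.size() - 1; return current(); }

    // Step forward or backward, staying in place at the ends.
    std::size_t next()
    {
        if (index_ + 1 < breaks_.size()) { ++index_; }
        return current();
    }
    std::size_t previous()
    {
        if (index_ > 0) { --index_; }
        return current();
    }

private:
    std::vector<std::size_t> breaks_;
    std::size_t index_;
};
```

With the offsets 0, 5, 6, and 11, a new cursor reports 0 from current(), next() moves it to 5, last() moves it to 11, and a further next() leaves it at 11.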

Note that breaks occur both before and after each unit being queried. This is true for all types of break searches. For example, there are a total of four character breaks in the string abc. There is a break before the a, before the b, before the c, and after the c. This may require special handling of the ends of strings, which are always break locations. Consider the following loop:

for (it = bSearch.first();
     it != str.endCodePointIterator();
     it = bSearch.next())
{...}

If the character break that is located at the str.endCodePointIterator() position – like the break after the c above – should be processed, then you must take care to process it outside the body of the loop.
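
The off-by-one nature of this loop can be seen in a small model. The sketch below does not use the SourcePro classes; characterBreaks and countBreaksIncludingEnd are hypothetical functions that treat every offset of an ASCII std::string, from 0 through its length, as a character break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a character break search over ASCII text:
// every offset from 0 to text.size(), inclusive, is a break.
std::vector<std::size_t> characterBreaks(const std::string& text)
{
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        breaks.push_back(i);
    }
    return breaks;
}

// Visits breaks with the same stopping condition as the loop above.
// The loop exits when it reaches the end position, so the break at the
// end of the string is handled after the loop.
std::size_t countBreaksIncludingEnd(const std::string& text)
{
    const std::vector<std::size_t> breaks = characterBreaks(text);
    const std::size_t end = text.size();
    std::size_t visited = 0;
    for (std::size_t i = 0; breaks[i] != end; ++i) {
        ++visited;   // a break that precedes a character
    }
    ++visited;       // the break at the end of the string
    return visited;
}
```

For "abc" the loop body runs three times and the statement after the loop accounts for the fourth break. For an empty string the loop body never runs, yet one break, at offset 0, still exists.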

Example

The following example counts the number of sentences in a string:

#include <rw/i18n/RWUBreakSearch.h>
#include <rw/i18n/RWUConversionContext.h>
#include <iostream>

using std::cout;
using std::endl;

int
main()
{
    // Indicate that source and target strings are
    // encoded as UTF-8.
    RWUConversionContext context("UTF-8");

    // Initialize a Unicode string.
    RWUString str("Unicode 3.2 is a minor version of the "
                  "Unicode Standard. It overrides certain features of "
                  "Unicode 3.1, and adds a significant number of coded "
                  "characters.");

    // Create an RWUBreakSearch capable of finding
    // sentence breaks, based on the default locale.
    RWUBreakSearch searcher(RWUBreakSearch::Sentence, str);

    // Find the beginning of the first sentence.
    RWUConstStringIterator iter = searcher.first();

    // Find the end of the last sentence.
    RWUConstStringIterator end = searcher.back();

    // Count the sentences in the string.
    int count = 0;
    while (iter != end) {
        ++count;
        iter = searcher.next();
    } // while

    cout << "Found " << count << " sentences." << endl;
    return 0;
} // main

Program output:

Found 2 sentences.

Member Enumeration Documentation

enum RWUBreakSearch::BreakType

Specifies the type of breaks for which an RWUBreakSearch should search.

Enumerator

CodePoint
Breaks occur before and after each code point in a string.

Character
Breaks occur before and after logical characters in a string.

Word
Breaks occur before and after each word.

Line
Breaks occur at positions where it would be appropriate to wrap text from one display line to the next.

Sentence
Breaks occur before and after sentences.

Note
RWUBreakSearch attempts to interpret nested quotes, nested parentheses, and periods that may either end a sentence, or be part of a number or abbreviation. This is a difficult problem, however, and the results are not guaranteed to be perfect.
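
To see why periods are hard to interpret, compare two deliberately simple rules. Neither is the algorithm used by RWUBreakSearch, which this page does not describe; countNaive and countWithLookahead are hypothetical functions over ASCII std::string input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Naive rule: every period ends a sentence.
std::size_t countNaive(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text) {
        if (c == '.') { ++count; }
    }
    return count;
}

// Lookahead rule: a period ends a sentence only when it is the last
// character or is followed by whitespace. This skips "3.2" but still
// miscounts abbreviations such as "Dr.".
std::size_t countWithLookahead(const std::string& text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '.') { continue; }
        const bool atEnd = (i + 1 == text.size());
        if (atEnd || std::isspace(static_cast<unsigned char>(text[i + 1]))) {
            ++count;
        }
    }
    return count;
}
```

On the string from the example above, the naive rule reports 4 sentences because of the periods in "3.2" and "3.1", while the lookahead rule reports 2. On "See Dr. Smith." the lookahead rule also reports 2, which is wrong, and shows why heuristics of this kind cannot be guaranteed to be perfect.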

Constructor & Destructor Documentation

RWUBreakSearch::RWUBreakSearch ( BreakType  type,
const RWUString & str,
const RWULocale & locale = RWULocale::getDefault()
)

Creates an RWUBreakSearch that searches for breaks of type type within str, interpreting punctuation and whitespace in accordance with the given locale. If no locale is specified, then the current default locale is used.

Note
Distinct (deep) copies of the type and locale arguments are made within the RWUBreakSearch object, but only a reference to the input RWUString is held. Consequently, you must take care not to allow the string used to create the RWUBreakSearch to be changed before the last use of that RWUBreakSearch object. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
Exceptions
RWUException  Thrown if any error occurs during the construction of the break search.

RWUBreakSearch::RWUBreakSearch ( const RWUBreakSearch & source)

Creates a copy of the specified source RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by source. The current position of self is the same as the position of source.

Note
The RWUString referenced by self is the same RWUString referenced by source. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
Exceptions
RWUException  Thrown if any error occurs during the construction of the break search.

RWUBreakSearch::~RWUBreakSearch ( )

Destructor.

Member Function Documentation

RWUConstStringIterator RWUBreakSearch::back ( void  ) const

Returns the position of the last break in self's string, without changing the current position of self.

RWUConstStringIterator RWUBreakSearch::current ( void  ) const

Returns the current position maintained by self.

RWUConstStringIterator RWUBreakSearch::first ( void  )

Sets the current position to the first break in self's string, and returns the new position.

RWUConstStringIterator RWUBreakSearch::front ( void  ) const

Returns the position of the first break in self's string, without changing the current position of self.

RWCString RWUBreakSearch::getLocale ( void  ) const

Returns the name of the locale currently imbued on self.

const RWUString& RWUBreakSearch::getString ( void  ) const

Returns a const reference to the RWUString associated with self.

BreakType RWUBreakSearch::getType ( void  ) const

Returns the break type searched for by self.

bool RWUBreakSearch::isBreak ( const RWUConstStringIterator & position) const

Returns true if the given string position is a break; otherwise, false.

Exceptions
RWUException  Thrown with error code RWUUnsupportedError if position does not reference the same string as self.

bool RWUBreakSearch::isBreak ( const RWUStringIterator & position) const

Returns true if the given string position is a break; otherwise, false.

Exceptions
RWUException  Thrown with error code RWUUnsupportedError if position does not reference the same string as self.

bool RWUBreakSearch::isBreak ( size_t  offset) const

Returns true if the position at the given code unit offset is a break; otherwise, false.
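
A code unit offset is not the same as a code point index. The sketch below assumes UTF-16 code units, which is an assumption about the string's storage and not something stated on this page. It models only the CodePoint case, using std::u16string; isCodePointBoundary is a hypothetical function, not a SourcePro call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a code point break test at a UTF-16 code unit
// offset. An offset is a boundary unless it falls between the two
// halves of a surrogate pair. Offsets past the end are not boundaries.
bool isCodePointBoundary(const std::u16string& text, std::size_t offset)
{
    if (offset > text.size()) {
        return false;
    }
    if (offset == 0 || offset == text.size()) {
        return true;   // the ends of the string are always breaks
    }
    const char16_t prev = text[offset - 1];
    const char16_t unit = text[offset];
    const bool prevIsHigh = (prev >= 0xD800 && prev <= 0xDBFF);
    const bool unitIsLow = (unit >= 0xDC00 && unit <= 0xDFFF);
    return !(prevIsHigh && unitIsLow);
}
```

For the string u"a\U0001F600b", which occupies four code units, offsets 0, 1, 3, and 4 are boundaries, while offset 2 falls inside the surrogate pair and is not.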

RWUConstStringIterator RWUBreakSearch::last ( void  )

Sets the current position to the last break in self's string, and returns the new position.

RWUConstStringIterator RWUBreakSearch::next ( void  )

Finds the position of the next break after the current position. Makes that position self's new current position, and returns the new position. If self is already positioned at the end of its string, the current position remains at the end of the string.

RWUConstStringIterator RWUBreakSearch::next ( const RWUStringIterator & position)

Changes the current position of self to the next break after the specified position, and returns the new position.

RWUConstStringIterator RWUBreakSearch::next ( const RWUConstStringIterator & position)

Changes the current position of self to the next break after the specified position, and returns the new position.

RWUBreakSearch& RWUBreakSearch::operator= ( const RWUBreakSearch & rhs)

Assignment operator. Creates a copy of the rhs RWUBreakSearch object. The RWUString referenced by self is the same RWUString referenced by rhs. The current position of self is the same as the position of rhs.

Note
The RWUString referenced by self is the same RWUString referenced by rhs. Consequently, you must take care not to allow the string to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.

RWUConstStringIterator RWUBreakSearch::previous ( void  )

Changes the current position of self to the break prior to the current position, and returns the new position. If self is already positioned at the beginning of its string, the current position remains at the beginning of the string.

RWUConstStringIterator RWUBreakSearch::previous ( const RWUStringIterator & position)

Changes the current position of self to the break prior to the specified position, and returns the new position.

RWUConstStringIterator RWUBreakSearch::previous ( const RWUConstStringIterator & position)

Changes the current position of self to the break prior to the specified position, and returns the new position.
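
The position-based overloads of next() and previous() can be thought of as searches in a sorted list of break offsets. The sketch below models them with std::upper_bound and std::lower_bound. The function names are hypothetical, and the choice to return the last or first break when no break lies beyond the given position is an assumption made for this model; this page does not state what the library does in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: the first break strictly after pos.
// Assumption: if there is none, the last break is returned.
std::size_t nextBreakAfter(const std::vector<std::size_t>& breaks,
                           std::size_t pos)
{
    assert(!breaks.empty());
    const auto it = std::upper_bound(breaks.begin(), breaks.end(), pos);
    return it == breaks.end() ? breaks.back() : *it;
}

// Hypothetical model: the last break strictly before pos.
// Assumption: if there is none, the first break is returned.
std::size_t previousBreakBefore(const std::vector<std::size_t>& breaks,
                                std::size_t pos)
{
    assert(!breaks.empty());
    const auto it = std::lower_bound(breaks.begin(), breaks.end(), pos);
    return it == breaks.begin() ? breaks.front() : *(it - 1);
}
```

With the break offsets 0, 5, 6, and 11, nextBreakAfter returns 11 for position 7 and previousBreakBefore returns 6 for the same position. A position that is itself a break is skipped in both directions, so position 6 yields 11 and 5.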

void RWUBreakSearch::setLocale ( const RWULocale & locale)

Imbues a locale on self.

void RWUBreakSearch::setString ( const RWUString & str)

Sets the RWUString in which self searches for breaks to str. Resets the current position of self to the start of the search string.

Only a reference to the input RWUString is held. Consequently, you must take care not to allow the string referenced by self to be changed before the last use of self. Destroying or changing the RWUString referenced by an RWUBreakSearch object invalidates that RWUBreakSearch object.
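
The hazard of holding only a reference can be illustrated with a small model. The class below is hypothetical and is not how RWUBreakSearch is implemented; this page does not say what the library caches internally. It simply shows how state derived from a referenced string goes stale once that string changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of an object that refers to, but does not own,
// the text it searches. The end position is computed once, at
// construction, from the referenced string.
class CachedBreaks {
public:
    explicit CachedBreaks(const std::string& text)
        : text_(&text), end_(text.size())
    {
    }

    // The break at the end of the string, as seen at construction time.
    std::size_t back() const { return end_; }

    // True if the referenced string no longer matches the cached state.
    bool stale() const { return end_ != text_->size(); }

private:
    const std::string* text_;   // not owned; must outlive this object
    std::size_t end_;
};
```

If a std::string holding "abc" is extended after a CachedBreaks object has been created from it, back() still reports 3 and stale() becomes true. Keeping the string alive and unmodified for as long as the searching object is in use avoids the problem.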

void RWUBreakSearch::setType ( BreakType  type)

Sets the break type searched for by self to type. Resets the current position of self to the start of the search string.

Copyright © 2023 Rogue Wave Software, Inc., a Perforce company. All Rights Reserved.