Internationalization Module User's Guide

7.2 Boundary Analysis

RWUBreakSearch finds the locations of code point, character, word, sentence, and line breaks in text.

Instances of RWUBreakSearch are used by other classes in the Internationalization Module to find breaks in text in a locale-sensitive manner. For example, RWUStringSearch performs flexible, collation-based string searches, using the rules encapsulated by an RWUCollator and an optional RWUBreakSearch to determine if and where a match occurs (Section 8.3). Similarly, RWURegularExpression uses an RWUBreakSearch internally to find break-related matches (Section 8.4).
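To make the notion of a break concrete, the following minimal sketch finds code point breaks in a UTF-8 encoded std::string. It is not part of the Internationalization Module, and the function name codePointBreaks is hypothetical. Code point breaks are the simplest case: a break lies before each code point and after the last one. Character, word, sentence, and line breaks depend on Unicode and locale rules (for example, a base letter followed by a combining accent is one character but two code points), which is why that work is delegated to RWUBreakSearch. The sketch assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy, not part of the Internationalization Module: returns the
// byte offsets of every code point break in a UTF-8 string. Offset k denotes
// the boundary immediately to the left of byte k. An empty string yields no
// breaks. Assumes the input is well-formed UTF-8.
std::vector<std::size_t> codePointBreaks(const std::string& utf8) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point, so a break lies immediately before it.
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
            breaks.push_back(i);
        }
    }
    if (!utf8.empty()) {
        breaks.push_back(utf8.size());  // the break after the last code point
    }
    return breaks;
}
```

For "abc" the function returns the offsets 0, 1, 2, and 3: four breaks for three code points.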

7.2.1 Creating an RWUBreakSearch

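RWUBreakSearch objects are created given the source string to search, the type of break to find (code point, character, word, sentence, or line), and the locale whose rules govern the search.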

For example, this code creates an RWUBreakSearch that can be used to search the RWUString myString for character breaks based on the current default locale:

7.2.2 Using an RWUBreakSearch

Once a break search is instantiated, breaks can be queried using the first(), last(), next(), and previous() methods. An RWUBreakSearch object maintains a current position, which is initially the start of the source string. Calls to first(), last(), next(), and previous() alter the current position.

A break is interpreted as lying between characters, immediately to the left of the current position.
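The cursor behavior just described can be modeled with a small class. This sketch is hypothetical: ToyBreakCursor, its kDone sentinel, and its use of precomputed break offsets are assumptions for illustration, and the real RWUBreakSearch methods may return different types and behave differently at the ends of the string. An offset k stands for the break immediately to the left of code unit k.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a break-search cursor; NOT the RWUBreakSearch API.
// It walks a sorted list of precomputed break offsets and keeps a current
// position, which starts at the first break (the start of the source).
class ToyBreakCursor {
public:
    // Assumed sentinel meaning "no break in that direction".
    static constexpr std::size_t kDone = static_cast<std::size_t>(-1);

    explicit ToyBreakCursor(std::vector<std::size_t> breaks)
        : breaks_(std::move(breaks)) {
        assert(std::is_sorted(breaks_.begin(), breaks_.end()));
    }

    // Moves to the first break and returns it.
    std::size_t first() { idx_ = 0; return current(); }

    // Moves to the last break and returns it.
    std::size_t last() {
        idx_ = breaks_.empty() ? 0 : breaks_.size() - 1;
        return current();
    }

    // Advances to the following break; at the end, returns kDone and stays put.
    std::size_t next() {
        if (idx_ + 1 >= breaks_.size()) return kDone;
        return breaks_[++idx_];
    }

    // Moves back to the preceding break; at the start, returns kDone and stays put.
    std::size_t previous() {
        if (idx_ == 0) return kDone;
        return breaks_[--idx_];
    }

    // The break at the current position.
    std::size_t current() const {
        return breaks_.empty() ? kDone : breaks_[idx_];
    }

private:
    std::vector<std::size_t> breaks_;
    std::size_t idx_ = 0;
};
```

In this model, a failed next() or previous() leaves the current position unchanged, so a later call in the opposite direction still works from the last valid break.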

For example, the following code counts the number of sentences in a string:

//1  Indicates that source and target strings are encoded as UTF-8.
//2  Initializes a Unicode string.
//3  Creates an RWUBreakSearch capable of finding sentence breaks, based on the default locale.
//4  Finds the beginning of the first sentence.
//5  Finds the end of the last sentence.
//6  Counts the sentences in the string.
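The counting strategy annotated above (find the first break, find the last break, count the sentences between them) can be sketched without the library. The sketch substitutes a deliberately naive, locale-blind sentence rule for RWUBreakSearch: a sentence ends after a run of '.', '!', or '?', plus any following spaces. That rule is an assumption for illustration only, and it miscounts abbreviations such as "Dr." that real sentence rules must consider. The names naiveSentenceBreaks and countUnits are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed, deliberately naive sentence terminators (illustration only).
static bool isTerminator(char c) { return c == '.' || c == '!' || c == '?'; }

// Returns byte offsets of sentence breaks, including the break at the
// beginning of the first sentence and the break at the end of the last one.
std::vector<std::size_t> naiveSentenceBreaks(const std::string& s) {
    std::vector<std::size_t> breaks;
    if (s.empty()) return breaks;                  // no sentences, no breaks
    breaks.push_back(0);                           // beginning of the first sentence
    std::size_t i = 0;
    while (i < s.size()) {
        if (isTerminator(s[i])) {
            while (i < s.size() && isTerminator(s[i])) ++i;  // e.g. "?!" or "..."
            while (i < s.size() && s[i] == ' ') ++i;         // trailing spaces
            if (i < s.size()) breaks.push_back(i);           // start of the next sentence
        } else {
            ++i;
        }
    }
    breaks.push_back(s.size());                    // end of the last sentence
    return breaks;
}

// n breaks enclose n - 1 units, because there is a break both before the
// first unit and after the last one.
std::size_t countUnits(const std::vector<std::size_t>& breaks) {
    return breaks.size() < 2 ? 0 : breaks.size() - 1;
}
```

For "Hello there. How are you? Fine." the naive rule yields the breaks 0, 13, 26, and 31, so countUnits reports three sentences.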

Note that for all types of break searches, breaks often occur both before and after each unit being queried. For example, the string "abc" contains four character breaks: before the "a", before the "b", before the "c", and after the "c". This may require special handling at the ends of strings. For example, consider the following loop:

If the character break located at the str.endCodePointIterator() position (like the break after the "c" above) needs to be processed, take care to process it outside the body of the loop.
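The end-of-string pitfall can be reproduced without the library. In this hypothetical sketch, processBreaks walks a precomputed list of break offsets with a loop that stops on reaching end, the analogue of the str.endCodePointIterator() position; the break located at end is therefore recorded only when it is handled after the loop. The function and parameter names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; not RWUBreakSearch. Returns the breaks "processed".
// The loop stops as soon as it reaches `end`, so the break located there is
// never seen by the loop body; it is handled after the loop when requested.
std::vector<std::size_t> processBreaks(const std::vector<std::size_t>& breaks,
                                       std::size_t end, bool processEndBreak) {
    assert(std::is_sorted(breaks.begin(), breaks.end()));
    std::vector<std::size_t> processed;
    std::size_t i = 0;
    for (; i < breaks.size() && breaks[i] != end; ++i) {
        processed.push_back(breaks[i]);   // loop body: only breaks before `end`
    }
    if (processEndBreak && i < breaks.size()) {
        processed.push_back(breaks[i]);   // the break at `end`, e.g. after the "c" in "abc"
    }
    return processed;
}
```

For "abc", with breaks 0, 1, 2, 3 and end equal to 3, the loop body sees only 0, 1, and 2; the break after the "c" appears only when processEndBreak is true.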

7.2.3 Direct Queries

RWUBreakSearch supports direct boundary queries using the isBreak() method. This method returns true if a given string position is a break. For example, this code tests whether there is a sentence break immediately to the left of the 12th code point in str:
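Conceptually, a direct query asks whether a position belongs to the set of break positions, without stepping through the breaks one at a time. The hypothetical isBreakAt below models that over a sorted vector of break offsets using std::binary_search. It is not RWUBreakSearch::isBreak(), whose argument type and cost may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analogue of a direct boundary query; NOT RWUBreakSearch::isBreak().
// Returns true if a break lies immediately to the left of the code unit at
// `pos`. Requires `breaks` to be sorted in ascending order.
bool isBreakAt(const std::vector<std::size_t>& breaks, std::size_t pos) {
    assert(std::is_sorted(breaks.begin(), breaks.end()));
    return std::binary_search(breaks.begin(), breaks.end(), pos);
}
```

For example, with the breaks 0, 4, and 7 that a naive sentence rule produces for "Hi. Go.", position 4 (the "G") is a break and position 3 (the space) is not. Binary search keeps each query logarithmic in the number of breaks once they have been computed.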




Copyright © Rogue Wave Software, Inc. All Rights Reserved.

The Rogue Wave name and logo, and SourcePro, are registered trademarks of Rogue Wave Software. All other trademarks are the property of their respective owners.