Character Blocks

A character block is a grouping of related characters within the Unicode encoding space. RWUCharTraits provides a Block enum whose values identify the individual blocks, such as BasicLatinBlock, GreekAndCopticBlock, BengaliBlock, ThaiBlock, EthiopicBlock, and CherokeeBlock. The values in this enumeration correspond to the block names that appear in the Unicode Character Database, as described in Chapter 14, “Code Charts,” of the Unicode Standard.

The static method RWUCharTraits::getBlock() takes a code point and returns the value in the Block enumeration that identifies the character block containing it.
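Because a block is a range of code points, a block lookup can be pictured as a search over a table of ranges. The following is a minimal, self-contained sketch of that idea only. It does not use the Internationalization Module; the names BlockRange, kBlocks, and blockNameOf are hypothetical, and the table holds just the six blocks named above rather than the full Unicode Character Database.

```cpp
#include <string>

// Hypothetical stand-in for a block table. This is an illustration of the
// concept, not the RWUCharTraits implementation.
struct BlockRange {
    char32_t first;    // first code point in the block
    char32_t last;     // last code point in the block (inclusive)
    const char* name;  // block name as listed in the Unicode Character Database
};

// A handful of blocks, in ascending code point order.
static const BlockRange kBlocks[] = {
    {0x0000, 0x007F, "Basic Latin"},
    {0x0370, 0x03FF, "Greek and Coptic"},
    {0x0980, 0x09FF, "Bengali"},
    {0x0E00, 0x0E7F, "Thai"},
    {0x1200, 0x137F, "Ethiopic"},
    {0x13A0, 0x13FF, "Cherokee"},
};

// Returns the name of the block containing cp, or "(not in table)" when cp
// falls outside every range in this deliberately small table.
std::string blockNameOf(char32_t cp) {
    for (const BlockRange& b : kBlocks) {
        if (cp >= b.first && cp <= b.last) {
            return b.name;
        }
    }
    return "(not in table)";
}
```

With this table, blockNameOf(0x0E01) yields "Thai", while a code point such as 0x0400 falls through to "(not in table)" because its block was left out of the sketch. In application code, RWUCharTraits::getBlock() plays this role and returns a Block enumeration value rather than a string.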

Character Scripts

Every Unicode character is assigned a script name in the Unicode Character Database. The script name associated with a code point is often a better basis for distinguishing characters than the block name. Blocks are simply code point ranges; characters from the same script may be in several different blocks, while characters from different scripts may be in the same block.
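The difference between blocks and scripts can be made concrete with a few hand-picked characters. The sketch below is self-contained and does not call the library; the Sample struct, the kSamples table, and both helper functions are hypothetical. The block and script labels are written out by hand from recent versions of the Unicode Character Database, so the values a given library release reports may differ if it was built against older data.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hand-written sample data for illustration only. A real program would ask
// the library for the block and script of each code point.
struct Sample {
    char32_t cp;
    std::string block;
    std::string script;
};

static const std::vector<Sample> kSamples = {
    {0x0041, "Basic Latin",        "Latin"},   // LATIN CAPITAL LETTER A
    {0x00E9, "Latin-1 Supplement", "Latin"},   // LATIN SMALL LETTER E WITH ACUTE
    {0x0101, "Latin Extended-A",   "Latin"},   // LATIN SMALL LETTER A WITH MACRON
    {0x03B1, "Greek and Coptic",   "Greek"},   // GREEK SMALL LETTER ALPHA
    {0x03E2, "Greek and Coptic",   "Coptic"},  // COPTIC CAPITAL LETTER SHEI
};

// Maps each script to the set of distinct blocks in which it appears.
std::map<std::string, std::set<std::string>>
blocksByScript(const std::vector<Sample>& samples) {
    std::map<std::string, std::set<std::string>> result;
    for (const Sample& s : samples) {
        result[s.script].insert(s.block);
    }
    return result;
}

// Maps each block to the set of distinct scripts found inside it.
std::map<std::string, std::set<std::string>>
scriptsByBlock(const std::vector<Sample>& samples) {
    std::map<std::string, std::set<std::string>> result;
    for (const Sample& s : samples) {
        result[s.block].insert(s.script);
    }
    return result;
}
```

For this sample, blocksByScript() associates the Latin script with three different blocks, and scriptsByBlock() shows the Greek and Coptic block holding characters from two scripts. Code that needs to treat all Latin letters alike would therefore test the script rather than the block.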

RWUCharTraits provides a Script enum whose values identify the individual scripts, such as Latin, Cyrillic, Hebrew, Tibetan, and Runic. The values in this enumeration correspond to the script property names defined in the Unicode Character Database, as described in Unicode Technical Report #24, “Script Names”:

http://www.unicode.org/unicode/reports/tr24

The static method RWUCharTraits::getScript() returns the value in the Script enumeration that identifies the script associated with a given code point.