Character Equivalence
The Unicode standard recognizes two types of character equivalence: canonical equivalence and compatibility equivalence.
Canonical Equivalence
Canonical equivalence is a fundamental equivalence between individual Unicode characters and sequences of Unicode characters. For example, Unicode has a character for the letter e (U+0065) and a character for an acute accent (U+0301). The acute accent is called a combining character, because it combines with the preceding character to yield an accented character. When a string containing an acute accent is displayed, the accent is superimposed on the preceding character. However, Unicode also has a code point for the composite character é (U+00E9). The composite character and the two-character sequence are canonical equivalents. Appropriately rendered, canonical equivalents are indistinguishable. In this case, the composite character U+00E9 is indistinguishable from the two-character sequence U+0065 U+0301. Another example of canonical equivalence is that between Korean hangul syllables and the jamo characters that compose them.
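The following is a minimal sketch of canonical equivalence. It uses Python's standard unicodedata module as a stand-in, not the Internationalization Module's own API, and the variable names are illustrative assumptions. It shows that the composite character and the two-character sequence differ as raw code points but compare equal once both are brought to the same canonical normalization form (NFC or NFD), and that a hangul syllable decomposes canonically into jamo.

```python
import unicodedata

composed = "\u00e9"      # é as a single composite code point
decomposed = "e\u0301"   # e (U+0065) followed by the combining acute accent (U+0301)

# Compared code point by code point, the two strings are different.
raw_equal = composed == decomposed

# Normalizing to a common canonical form makes them compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# The hangul syllable U+D55C decomposes canonically into three jamo.
hangul_jamo = [
    f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", "\ud55c")
]

print(raw_equal, nfc_equal, nfd_equal)
print(hangul_jamo)
```

A comparison that should treat canonical equivalents as the same text would normalize both operands to the same form before comparing them.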
Compatibility Equivalence
For round-trip compatibility with other encoding standards, Unicode has encoded many entities that are really variants of existing nominal characters. For example, the compatibility character ½ (U+00BD) corresponds to the nominal sequence 1⁄2 (U+0031, U+2044, and U+0032), in which U+2044 is the fraction slash rather than the ordinary solidus. Another example of compatibility equivalence is that between circled and uncircled versions of characters.
Typically, compatibility characters differ in appearance from their nominal counterparts. Therefore, replacing a character with its compatibility equivalent may result in the loss of formatting information unless the replacement is supplemented by markup or styling.
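As a hedged illustration of compatibility equivalence, the sketch below again relies on Python's standard unicodedata module rather than the Internationalization Module's API, and its variable names are assumptions. It shows that canonical decomposition (NFD) leaves ½ untouched, while compatibility decomposition (NFKD) replaces it with the nominal sequence, and that the circled digit ① loses its circle, which is the kind of formatting information described above.

```python
import unicodedata

half = "\u00bd"          # ½ VULGAR FRACTION ONE HALF
circled_one = "\u2460"   # ① CIRCLED DIGIT ONE

# Canonical decomposition does not touch compatibility characters.
canonical_half = unicodedata.normalize("NFD", half)

# Compatibility decomposition maps them to their nominal counterparts.
compat_half = unicodedata.normalize("NFKD", half)
compat_circled = unicodedata.normalize("NFKD", circled_one)

# The character database records what kind of variant was folded away.
circled_mapping = unicodedata.decomposition(circled_one)

print([f"U+{ord(ch):04X}" for ch in compat_half])
print(compat_circled, circled_mapping)
```

Because the result for ① is a plain digit 1, an application that needs to preserve the circled appearance would have to record it separately, for example in markup.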