Using RWUNormalizer

In the Internationalization Module, RWUNormalizer converts a string into a particular normalization form, or detects whether a string is already in a particular form.

RWUNormalizer provides a NormalizationForm enum, with values representing the four normalization forms described in “Normalization Forms”: FormNFD, FormNFKD, FormNFC, and FormNFKC. Pass one of these values to the static normalize() method to convert a string into the corresponding normalization form. For example, passing FormNFD to normalize() converts a string into Normalization Form Decomposed (NFD).
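As a rough, library-independent illustration of what conversion to NFD involves, the sketch below decomposes two precomposed characters using a tiny hand-written table. The function toNfdToy() and its table are hypothetical and are not part of RWUNormalizer; a real normalizer covers all of Unicode and also reorders combining marks, which this sketch omits. With the module itself, the same step would presumably be a single call along the lines of RWUNormalizer::normalize(str, RWUNormalizer::FormNFD), where the argument order is an assumption to verify against the class reference.

```cpp
#include <string>

// Toy stand-in for NFD conversion (hypothetical; NOT the RWUNormalizer API).
// Only two precomposed characters are handled, and canonical reordering of
// combining marks is deliberately left out.
constexpr char32_t kEAcute    = 0x00E9; // LATIN SMALL LETTER E WITH ACUTE
constexpr char32_t kARing     = 0x00C5; // LATIN CAPITAL LETTER A WITH RING ABOVE
constexpr char32_t kAcute     = 0x0301; // COMBINING ACUTE ACCENT
constexpr char32_t kRingAbove = 0x030A; // COMBINING RING ABOVE

std::u32string toNfdToy(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        switch (c) {
            case kEAcute:           // precomposed e-acute -> 'e' + combining acute
                out += U'e';
                out += kAcute;
                break;
            case kARing:            // precomposed A-ring -> 'A' + combining ring
                out += U'A';
                out += kRingAbove;
                break;
            default:                // ASCII and anything not in the table pass through
                out += c;
                break;
        }
    }
    return out;
}
```

Calling toNfdToy() on "caf" followed by U+00E9 yields "cafe" followed by U+0301, while a pure ASCII string comes back unchanged.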

When converting a string into a particular form, normalize() leaves ASCII characters unchanged and replaces deprecated characters. It never introduces compatibility characters.

RWUNormalizer provides a CheckResult enum. The static quickCheck() and quickFcdCheck() methods return a CheckResult value to indicate whether a string is in a particular normalization form: Yes indicates that the string is in the specified form, No indicates that the string is not in the specified form, and Maybe indicates that the check was inconclusive. For example, passing FormNFC to quickCheck() tests whether a string is in Normalization Form Composed (NFC).
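The three-valued result means that callers need a plan for Maybe. One common way to resolve it, sketched below under stated assumptions, is to fall back to a full normalization and compare the result with the input. Everything in this fragment (CheckResultToy, quickCheckNfcToy(), toNfcToy(), isNfcToy()) is a hypothetical stand-in with a three-character table, not the RWUNormalizer API. With the module itself, the check would presumably be a call such as RWUNormalizer::quickCheck(str, RWUNormalizer::FormNFC), with the argument order again being an assumption.

```cpp
#include <string>

// Toy model of a three-valued quick check (hypothetical; NOT the RWUNormalizer API).
enum class CheckResultToy { Yes, No, Maybe };

constexpr char32_t kEAcute   = 0x00E9; // LATIN SMALL LETTER E WITH ACUTE
constexpr char32_t kARing    = 0x00C5; // LATIN CAPITAL LETTER A WITH RING ABOVE
constexpr char32_t kAngstrom = 0x212B; // ANGSTROM SIGN, never appears in NFC
constexpr char32_t kAcute    = 0x0301; // COMBINING ACUTE ACCENT, may compose

// Quick check: one pass, no normalization performed.
CheckResultToy quickCheckNfcToy(const std::u32string& s) {
    CheckResultToy result = CheckResultToy::Yes;
    for (char32_t c : s) {
        if (c == kAngstrom) return CheckResultToy::No;    // definitely not NFC
        if (c == kAcute) result = CheckResultToy::Maybe;  // depends on the preceding character
    }
    return result;
}

// Toy NFC conversion covering only the characters above.
std::u32string toNfcToy(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == kAngstrom) {
            out += kARing;                                // singleton replacement
        } else if (c == kAcute && !out.empty() && out.back() == U'e') {
            out.back() = kEAcute;                         // compose 'e' + acute
        } else {
            out += c;
        }
    }
    return out;
}

// Resolve Maybe by normalizing and comparing with the input.
bool isNfcToy(const std::u32string& s) {
    switch (quickCheckNfcToy(s)) {
        case CheckResultToy::Yes:   return true;
        case CheckResultToy::No:    return false;
        case CheckResultToy::Maybe: break;
    }
    return toNfcToy(s) == s;
}
```

In this model, "cafe" followed by U+0301 yields Maybe and resolves to false, because the accent composes with the preceding e. By contrast, "x" followed by U+0301 also yields Maybe but resolves to true, since no precomposed form exists for that pair.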

The static method quickFcdCheck() detects whether a string is in Fast C or D (FCD) form. Strictly speaking, FCD is not a normalization form, since it does not specify a unique representation for every string. Instead, it describes a string whose raw decomposition, without character reordering, results in an NFD string. Thus, all NFD, most NFC, and many unnormalized strings are already in FCD form. Such strings may be collated without further normalization. See Chapter 6 for information on collating Unicode strings using the Internationalization Module.
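To make the FCD condition concrete, the following sketch tests it on a toy scale. Each character is assigned the canonical combining class of the first and last code points of its decomposition, and the string is accepted only if no nonzero leading class is smaller than the trailing class before it. The names FcdClasses, fcdClassesOf(), and isFcdToy() are hypothetical, and the table covers just three characters; this is not how quickFcdCheck() is necessarily implemented.

```cpp
#include <string>

// Toy FCD test (hypothetical; NOT the RWUNormalizer API).
constexpr char32_t kEAcute   = 0x00E9; // decomposes to 'e' + U+0301
constexpr char32_t kAcute    = 0x0301; // COMBINING ACUTE ACCENT, combining class 230
constexpr char32_t kDotBelow = 0x0323; // COMBINING DOT BELOW, combining class 220

// Combining classes of the first and last code points of a character's
// canonical decomposition.
struct FcdClasses {
    int lead;
    int trail;
};

FcdClasses fcdClassesOf(char32_t c) {
    switch (c) {
        case kAcute:    return {230, 230};
        case kDotBelow: return {220, 220};
        case kEAcute:   return {0, 230};   // starts with 'e' (0), ends with acute (230)
        default:        return {0, 0};     // treated as a starter in this toy table
    }
}

// A string is FCD if decomposing each character in place, without any
// reordering, already leaves the combining marks in canonical order.
bool isFcdToy(const std::u32string& s) {
    int prevTrail = 0;
    for (char32_t c : s) {
        const FcdClasses cls = fcdClassesOf(c);
        if (cls.lead != 0 && cls.lead < prevTrail) {
            return false;                  // marks would need reordering
        }
        prevTrail = cls.trail;
    }
    return true;
}
```

Under this model, U+00E9 alone (NFC) and "e" followed by U+0301 (NFD) both pass. The sequence U+00E9 followed by U+0323 fails, because its raw decomposition places the acute accent (class 230) before the dot below (class 220).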