Using RWUNormalizer
In the Internationalization Module, RWUNormalizer converts a string into a particular normalization form, or detects whether a string is already in a particular form.
Normalizing Strings
RWUNormalizer provides a NormalizationForm enum, with values representing the four normalization forms described in “Normalization Forms”: FormNFD, FormNFKD, FormNFC, and FormNFKC. This enum can be used in conjunction with the static normalize() method to convert a string into a particular normalization form. For example, the following code converts a string into Normalization Form Decomposed (NFD):
 
RWUString str("This is a test.");
str = RWUNormalizer::normalize(str, RWUNormalizer::FormNFD);
When converting a string to a particular form, normalize() leaves ASCII characters unchanged and replaces deprecated characters. It never introduces compatibility characters.
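To make the effect of NFD concrete, the following is a minimal standalone sketch that does not use the Internationalization Module. The names toyDecompositions, appendDecomposed, and toyNfd are hypothetical, the table covers only three characters, and the canonical reordering of combining marks that full NFD requires is omitted. It shows only that ASCII text passes through untouched, while a precomposed or deprecated character is replaced by its decomposition.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical three-entry table of canonical decompositions.
// A real normalizer takes this data from the Unicode Character Database.
static const std::map<char32_t, std::u32string>& toyDecompositions() {
    static const std::map<char32_t, std::u32string> table = {
        {U'\u00E9', U"e\u0301"},  // e with acute -> e + COMBINING ACUTE ACCENT
        {U'\u00C5', U"A\u030A"},  // A with ring  -> A + COMBINING RING ABOVE
        {U'\u212B', U"\u00C5"},   // ANGSTROM SIGN -> A with ring (singleton)
    };
    return table;
}

// Append the full (recursive) decomposition of c to out.
static void appendDecomposed(char32_t c, std::u32string& out) {
    auto it = toyDecompositions().find(c);
    if (it == toyDecompositions().end()) {
        out.push_back(c);  // no entry: the character is kept as-is
        return;
    }
    for (char32_t d : it->second) {
        appendDecomposed(d, out);
    }
}

// Toy NFD: decomposition only, without canonical reordering.
std::u32string toyNfd(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        appendDecomposed(c, out);
    }
    return out;
}

// Usage example for the three cases described above.
void toyNfdExample() {
    assert(toyNfd(U"This is a test.") == U"This is a test.");  // ASCII unaffected
    assert(toyNfd(U"caf\u00E9") == U"cafe\u0301");             // precomposed -> decomposed
    assert(toyNfd(U"\u212B") == U"A\u030A");                   // deprecated sign replaced
}
```

In application code, RWUNormalizer::normalize() performs this work with complete Unicode data; the sketch is only meant to show what kind of change to expect in the result.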
Detecting the Normalization Form of a String
RWUNormalizer provides a CheckResult enum. The static quickCheck() and quickFcdCheck() methods return a CheckResult value to indicate whether a string is in a particular normalization form: Yes indicates that the string is in the specified form, No indicates that the string is not in the specified form, and Maybe indicates that the check was inconclusive. For example, the following code detects whether a string is in Normalization Form Composed (NFC):
 
RWUString str("This is a test.");
RWUNormalizer::CheckResult result =
    RWUNormalizer::quickCheck(str, RWUNormalizer::FormNFC);
 
// No and Maybe both lead to normalize(); only Yes skips it.
if (result != RWUNormalizer::Yes) {
    str = RWUNormalizer::normalize(str, RWUNormalizer::FormNFC);
}
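The three-valued result can be illustrated with a standalone sketch that does not use the Internationalization Module. ToyCheckResult, toyQuickCheckNfc, and needsNormalizing are hypothetical names. The check relies on the Unicode property that every code point below U+0300 is unchanged by NFC, so a string made only of such code points can be answered with Yes. For anything else this conservative sketch gives up and reports Maybe, and it never reports No. A real quick check consults per-character property data and is far more precise.

```cpp
#include <cassert>
#include <string>

// Stand-in for a three-valued check result (not the library's enum).
enum class ToyCheckResult { Yes, No, Maybe };

// Conservative NFC quick check: code points below U+0300 are stable under
// NFC, so a string containing only those is certainly in NFC. Any other
// code point makes this sketch answer Maybe instead of deciding.
ToyCheckResult toyQuickCheckNfc(const std::u32string& s) {
    for (char32_t c : s) {
        if (c >= 0x0300) {
            return ToyCheckResult::Maybe;
        }
    }
    return ToyCheckResult::Yes;
}

// Caller-side policy: only a definite Yes lets us skip normalization.
bool needsNormalizing(ToyCheckResult r) {
    return r != ToyCheckResult::Yes;
}

// Usage example.
void toyQuickCheckExample() {
    assert(toyQuickCheckNfc(U"This is a test.") == ToyCheckResult::Yes);
    assert(toyQuickCheckNfc(U"cafe\u0301") == ToyCheckResult::Maybe);
    assert(!needsNormalizing(ToyCheckResult::Yes));
    assert(needsNormalizing(ToyCheckResult::Maybe));
    assert(needsNormalizing(ToyCheckResult::No));
}
```

Treating Maybe the same as No costs at most an unnecessary normalization pass, whereas treating it as Yes could leave an unnormalized string in place.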
The static method quickFcdCheck() detects whether a string is in Fast C or D (FCD) form. Strictly speaking, FCD is not a normalization form, since it does not specify a unique representation for every string. Instead, it describes a string whose raw decomposition, without character reordering, results in an NFD string. Thus, all NFD, most NFC, and many unnormalized strings are already in FCD form. Such strings may be collated without further normalization. See Chapter 6 for information on collating Unicode strings using the Internationalization Module.
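The FCD condition can be sketched in standalone form, without the Internationalization Module. For each character, take the canonical combining class of the first and of the last code point of its canonical decomposition. A string is in FCD form when no character's leading class is nonzero and smaller than the preceding character's trailing class. The names LeadTrail, toyLeadTrail, and toyIsFcd are hypothetical, and the table lists only three characters; every other character is assumed to have class 0.

```cpp
#include <cassert>
#include <map>
#include <string>

// Combining classes of the first and last code point of a character's
// canonical decomposition.
struct LeadTrail {
    unsigned lead;
    unsigned trail;
};

// Hypothetical mini table; unlisted characters are treated as {0, 0}.
LeadTrail toyLeadTrail(char32_t c) {
    static const std::map<char32_t, LeadTrail> table = {
        {U'\u0301', {230, 230}},  // COMBINING ACUTE ACCENT
        {U'\u0323', {220, 220}},  // COMBINING DOT BELOW
        {U'\u00E9', {0, 230}},    // e with acute: decomposes to e + U+0301
    };
    auto it = table.find(c);
    return it == table.end() ? LeadTrail{0, 0} : it->second;
}

// True if decomposing s character by character, without reordering,
// would already yield canonically ordered text.
bool toyIsFcd(const std::u32string& s) {
    unsigned prevTrail = 0;
    for (char32_t c : s) {
        LeadTrail lt = toyLeadTrail(c);
        if (lt.lead != 0 && prevTrail > lt.lead) {
            return false;  // combining marks would need to be reordered
        }
        prevTrail = lt.trail;
    }
    return true;
}

// Usage example covering NFD, NFC, and unnormalized input.
void toyFcdExample() {
    assert(toyIsFcd(U"e\u0323\u0301"));       // NFD text is FCD
    assert(toyIsFcd(U"\u00E9"));              // this NFC text is FCD
    assert(toyIsFcd(U"\u00E9e\u0301"));       // unnormalized, yet FCD
    assert(!toyIsFcd(U"\u00E9\u0323"));       // class 230 followed by 220
    assert(!toyIsFcd(U"e\u0301\u0323"));      // marks in the wrong order
}
```

The two failing cases are exactly those in which the dot below (class 220) would have to move in front of the acute accent (class 230) during normalization.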