Using RWUNormalizer
In the Internationalization Module, RWUNormalizer converts a string into a particular normalization form, or detects whether a string is already in a particular form.
Normalizing Strings
RWUNormalizer provides a NormalizationForm enum, with values representing the four normalization forms described in “Normalization Forms”: FormNFD, FormNFKD, FormNFC, and FormNFKC. This enum can be used in conjunction with the static normalize() method to convert a string into a particular normalization form. For example, the following code converts a string into Normalization Form Decomposed (NFD):
RWUString str("This is a test.");
str = RWUNormalizer::normalize(str, RWUNormalizer::FormNFD);
When converting a string into a particular form, normalize() leaves ASCII characters unaffected and replaces deprecated characters. It never introduces compatibility characters.
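To make the effect of decomposition concrete, the following is a minimal, self-contained sketch that does not use RWUNormalizer at all. It assumes a hypothetical three-entry table (kToyDecompositions) and a hypothetical helper (toyDecompose()) in place of the full Unicode data, so it only illustrates the idea that precomposed characters are expanded while ASCII text passes through unchanged. It is not how the Internationalization Module is implemented.

```cpp
#include <map>
#include <string>

// Hypothetical toy table of canonical decompositions. Real normalization
// relies on the complete Unicode Character Database, not three entries.
static const std::map<char32_t, std::u32string> kToyDecompositions = {
    {U'\u00E9', U"e\u0301"},  // e with acute -> e + COMBINING ACUTE ACCENT
    {U'\u00F1', U"n\u0303"},  // n with tilde -> n + COMBINING TILDE
    {U'\u00FC', U"u\u0308"},  // u with diaeresis -> u + COMBINING DIAERESIS
};

// Toy stand-in for NFD conversion: expands the precomposed characters found
// in the table and copies every other code point, including all ASCII,
// unchanged. It performs no reordering of combining marks.
std::u32string toyDecompose(const std::u32string& input) {
    std::u32string output;
    for (char32_t c : input) {
        auto it = kToyDecompositions.find(c);
        if (it != kToyDecompositions.end()) {
            output += it->second;
        } else {
            output += c;
        }
    }
    return output;
}
```

Under these assumptions, toyDecompose(U"This is a test.") returns its input unchanged, while a string containing U+00E9 grows by one code point.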
Detecting the Normalization Form of a String
RWUNormalizer provides a CheckResult enum. The static quickCheck() and quickFcdCheck() methods return a CheckResult value to indicate whether a string is in a particular normalization form: Yes indicates that the string is in the specified form, No indicates that the string is not in the specified form, and Maybe indicates that the check was inconclusive. For example, the following code checks whether a string is in Normalization Form Composed (NFC), and normalizes it unless the check returns Yes:
RWUString str("This is a test.");
RWUNormalizer::CheckResult result =
    RWUNormalizer::quickCheck(str, RWUNormalizer::FormNFC);
// Normalize on both No and Maybe; only Yes guarantees the string is in NFC.
if (result != RWUNormalizer::Yes) {
    str = RWUNormalizer::normalize(str, RWUNormalizer::FormNFC);
}
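When the goal is only to test a string rather than to convert it, an inconclusive result can be resolved by normalizing a copy and comparing. The following self-contained sketch shows one way to structure that decision. The CheckResult enum here is a stand-in declared so the example compiles on its own, and isInForm() and its two callable parameters are hypothetical names, not part of RWUNormalizer.

```cpp
#include <string>

// Stand-in for RWUNormalizer::CheckResult so that this sketch compiles
// without the Internationalization Module.
enum class CheckResult { Yes, No, Maybe };

// Hypothetical helper: returns true if `str` is in the target form.
// `quickCheck` maps a string to a CheckResult, and `normalize` returns the
// string converted to the target form. Both are supplied by the caller.
template <typename String, typename QuickCheck, typename Normalize>
bool isInForm(const String& str, QuickCheck quickCheck, Normalize normalize) {
    switch (quickCheck(str)) {
        case CheckResult::Yes:
            return true;   // Cheap path: definitely in the form.
        case CheckResult::No:
            return false;  // Cheap path: definitely not in the form.
        case CheckResult::Maybe:
            break;         // Inconclusive: fall through to the full test.
    }
    // A string is in the form exactly when normalizing leaves it unchanged.
    return normalize(str) == str;
}
```

With RWUNormalizer, the two callables would presumably wrap quickCheck() and normalize() for a chosen form. Whether the extra comparison is worthwhile, compared with simply normalizing as in the example above, depends on how often the quick check is inconclusive for the data at hand.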
The static method quickFcdCheck() detects whether a string is in Fast C or D (FCD) form. Strictly speaking, FCD is not a normalization form, since it does not specify a unique representation for every string. Instead, it describes a string whose raw decomposition, without character reordering, results in an NFD string. Thus, all NFD, most NFC, and many unnormalized strings are already in FCD form. Such strings may be collated without further normalization. See Chapter 6 for information on collating Unicode strings using the Internationalization Module.
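The FCD condition can be illustrated with a small, self-contained sketch: decompose each character on its own and verify that the combining marks at each boundary are already in canonical order, that is, that the trailing combining class of one character does not exceed a nonzero leading combining class of the next. The tables and function names below (kFcdCombiningClass, kFcdDecompositions, toyIsFcd(), and the helpers) are hypothetical and cover only a handful of code points. This is a sketch of the idea, not the algorithm that quickFcdCheck() uses.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical toy data: canonical combining classes of three marks.
// Any code point not listed is treated as a starter (class 0).
static const std::map<char32_t, std::uint8_t> kFcdCombiningClass = {
    {U'\u0301', 230},  // COMBINING ACUTE ACCENT
    {U'\u0323', 220},  // COMBINING DOT BELOW
    {U'\u0327', 202},  // COMBINING CEDILLA
};

// Hypothetical toy data: canonical decompositions of two letters.
static const std::map<char32_t, std::u32string> kFcdDecompositions = {
    {U'\u00E9', U"e\u0301"},  // e with acute
    {U'\u00E7', U"c\u0327"},  // c with cedilla
};

std::uint8_t toyCombiningClass(char32_t c) {
    auto it = kFcdCombiningClass.find(c);
    return it == kFcdCombiningClass.end() ? 0 : it->second;
}

std::u32string toyDecomposeOne(char32_t c) {
    auto it = kFcdDecompositions.find(c);
    return it == kFcdDecompositions.end() ? std::u32string(1, c) : it->second;
}

// Returns true if decomposing each code point in place, with no reordering,
// yields marks in nondecreasing combining-class order across boundaries.
bool toyIsFcd(const std::u32string& s) {
    std::uint8_t prevTrail = 0;
    for (char32_t c : s) {
        const std::u32string d = toyDecomposeOne(c);
        const std::uint8_t lead = toyCombiningClass(d.front());
        if (lead != 0 && prevTrail > lead) {
            return false;  // Marks would need reordering to reach NFD.
        }
        prevTrail = toyCombiningClass(d.back());
    }
    return true;
}
```

Under these assumptions, the mixed string U+00E9, e, U+0301 is accepted even though it is neither NFC nor NFD, whereas e, U+0301, U+0323 is rejected because the dot below (class 220) follows the acute accent (class 230).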