System.String.Normalize Method

Returns a new string whose textual value is the same as this string, but whose binary representation is in Unicode normalization form C.

Syntax

public string Normalize ()

Returns

A new, normalized string whose textual value is the same as this string, but whose binary representation is in normalization form C.

Remarks

Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. For example, any of the following code point sequences can represent the letter "ắ":

  • U+1EAF

  • U+0103 U+0301

  • U+0061 U+0306 U+0301

The existence of multiple representations for a single character complicates searching, sorting, matching, and other operations.
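As a minimal sketch of the problem, the following C# example builds the three sequences listed above and checks them with an ordinal comparison. The class name EquivalentForms and the helper SameBinary are hypothetical names chosen for illustration.

```csharp
using System;

public static class EquivalentForms
{
    // Three code point sequences that all represent the same letter.
    public static readonly string[] Forms =
    {
        "\u1EAF",             // single precomposed character
        "\u0103\u0301",       // a with breve + combining acute accent
        "\u0061\u0306\u0301"  // a + combining breve + combining acute accent
    };

    // Hypothetical helper: true only when both strings contain
    // exactly the same UTF-16 code units.
    public static bool SameBinary(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    public static void Main()
    {
        foreach (string s in Forms)
        {
            Console.WriteLine("Length {0}, same binary value as first form: {1}",
                s.Length, SameBinary(Forms[0], s));
        }
    }
}
```

Without normalization, the three strings have different lengths and none of them matches another under ordinal comparison, even though they display identically.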

The Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that obey different rules. The .NET Framework supports the four normalization forms (C, D, KC, and KD) that are defined by the Unicode standard. When two strings are represented in the same normalization form, they can be compared by using ordinal comparison.
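To illustrate how the four forms differ, the sketch below converts one of the sequences above to each form by using the Normalize(NormalizationForm) overload and reports the resulting length. The class name NormalizationForms and the helper LengthIn are hypothetical.

```csharp
using System;
using System.Text;

public static class NormalizationForms
{
    // Hypothetical helper: number of UTF-16 code units in s
    // after converting it to the requested normalization form.
    public static int LengthIn(string s, NormalizationForm form)
    {
        return s.Normalize(form).Length;
    }

    public static void Main()
    {
        string s = "\u0103\u0301"; // a with breve + combining acute accent

        NormalizationForm[] forms =
        {
            NormalizationForm.FormC,
            NormalizationForm.FormD,
            NormalizationForm.FormKC,
            NormalizationForm.FormKD
        };

        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine("{0}: length {1}, already in this form: {2}",
                form, LengthIn(s, form), s.IsNormalized(form));
        }
    }
}
```

For this letter, the composed forms (C and KC) should yield a single character, and the decomposed forms (D and KD) should yield the base letter followed by two combining marks.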

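To normalize and compare two strings, do the following:

  1. Obtain the strings to be compared from an input source, such as a file or user input.

  2. Call Normalize on each string to convert it to normalization form C.

  3. Compare the normalized strings by using a method that supports ordinal comparison, such as System.String.CompareOrdinal or System.String.Equals with System.StringComparison.Ordinal.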
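The normalize-then-compare procedure might be sketched as follows. The class name NormalizedComparison and the helper AreEquivalent are hypothetical, and the input strings are hard-coded here rather than read from a file or user input.

```csharp
using System;

public static class NormalizedComparison
{
    // Hypothetical helper: normalizes both strings to form C,
    // then compares the results ordinally.
    public static bool AreEquivalent(string a, string b)
    {
        return string.Equals(a.Normalize(), b.Normalize(), StringComparison.Ordinal);
    }

    public static void Main()
    {
        string composed = "\u1EAF";               // precomposed letter
        string decomposed = "\u0061\u0306\u0301"; // base letter + two combining marks

        // Raw ordinal comparison sees different code units.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal));

        // After normalization the two strings are identical.
        Console.WriteLine(AreEquivalent(composed, decomposed));
    }
}
```

Normalizing both operands to the same form before the ordinal comparison is what makes the result reliable; normalizing only one of them does not help if the other is in a different form.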

For a description of supported Unicode normalization forms, see System.Text.NormalizationForm.

Requirements

Namespace: System
Assembly: mscorlib (in mscorlib.dll)
Assembly Versions: 2.0.0.0, 4.0.0.0
Since: .NET 2.0