Provides functionality to split a string into text elements and to iterate through those text elements.
The .NET Framework defines a text element as a unit of text that is displayed as a single character, that is, a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The Unicode Standard (http://go.microsoft.com/fwlink/?linkid=37123) defines a surrogate pair as a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. The Unicode Standard defines a combining character sequence as a combination of a base character and one or more combining characters. A surrogate pair can represent a base character or a combining character.
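The three kinds of text element can be compared by counting UTF-16 code units against text elements. The following is a minimal sketch, not part of the original documentation; the class name TextElementKinds and its helper methods are hypothetical names chosen for illustration, while the char, string, and StringInfo members it calls are standard .NET APIs.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class TextElementKinds
{
    // Number of UTF-16 code units (char values) in the string.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of text elements, as counted by StringInfo.
    public static int TextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        string baseChar = "e";                              // a single base character
        string combining = "e\u0301";                       // base character + COMBINING ACUTE ACCENT
        string surrogate = char.ConvertFromUtf32(0x10148);  // one code point outside the BMP

        foreach (string s in new[] { baseChar, combining, surrogate })
        {
            Console.WriteLine("{0} code unit(s), {1} text element(s)",
                              CodeUnits(s), TextElements(s));
        }

        // The supplementary character is stored as a high surrogate
        // followed by a low surrogate.
        Console.WriteLine(char.IsHighSurrogate(surrogate[0]));
        Console.WriteLine(char.IsLowSurrogate(surrogate[1]));
    }
}
```

Each of the three strings should report one text element, while the combining character sequence and the surrogate pair each occupy two char values.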
The System.Globalization.StringInfo class enables you to work with a string as a series of text elements rather than as individual char objects. You can work with the individual text elements in a string in two ways:
By enumerating each text element. To do this, you call the StringInfo.GetTextElementEnumerator(string) method, and then repeatedly call the TextElementEnumerator.MoveNext method on the returned System.Globalization.TextElementEnumerator object until the method returns false.
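By calling the StringInfo.ParseCombiningCharacters(string) method to retrieve an array that contains the starting index of each text element. These are char indexes into the string, so you can retrieve an individual text element by passing its starting index and its length (the distance to the next starting index, or to the end of the string) to the string.Substring(int, int) method. The StringInfo.SubstringByTextElements(int) method, by contrast, expects the position of a text element, not a char index.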
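The two approaches can be sketched side by side as follows. This is an illustrative sketch rather than the documented sample; the class name TextElementSketch and the methods ByEnumerator and ByIndexes are hypothetical. The second helper slices the string with string.Substring, on the assumption that the indexes returned by ParseCombiningCharacters are char positions within the string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class TextElementSketch
{
    // Approach 1: enumerate text elements with a TextElementEnumerator.
    public static List<string> ByEnumerator(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }

    // Approach 2: get the starting char index of each text element,
    // then slice the string between consecutive starting indexes.
    public static List<string> ByIndexes(string s)
    {
        var elements = new List<string>();
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        for (int i = 0; i < starts.Length; i++)
        {
            int end = (i + 1 < starts.Length) ? starts[i + 1] : s.Length;
            elements.Add(s.Substring(starts[i], end - starts[i]));
        }
        return elements;
    }

    public static void Main()
    {
        // 'a' + combining acute accent, one supplementary character, then 'b'.
        string sample = "a\u0301\U00010148b";

        foreach (string element in ByEnumerator(sample))
        {
            Console.WriteLine("Enumerator: element of {0} char(s)", element.Length);
        }
        foreach (string element in ByIndexes(sample))
        {
            Console.WriteLine("Indexes:    element of {0} char(s)", element.Length);
        }
    }
}
```

For this sample, both helpers should yield the same three text elements, with lengths of 2, 2, and 1 char values.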
The following example illustrates both ways of working with the text elements in a string. It creates two strings:
strCombining, which is a string of Arabic characters that includes three text elements with multiple char objects. The first text element is the base character ARABIC LETTER ALEF (U+0627) followed by ARABIC HAMZA BELOW (U+0655) and ARABIC KASRA (U+0650). The second text element is ARABIC LETTER HEH (U+0647) followed by ARABIC FATHA (U+064E). The third text element is ARABIC LETTER BEH (U+0628) followed by ARABIC DAMMATAN (U+064C).
strSurrogates, which is a string that includes three surrogate pairs: GREEK ACROPHONIC FIVE TALENTS (U+10148) from the Supplementary Multilingual Plane, U+20026 from the Supplementary Ideographic Plane, and U+F1001 from the private use area. The UTF-16 encoding of each character is a surrogate pair that consists of a high surrogate followed by a low surrogate.
Each string is parsed once by the StringInfo.ParseCombiningCharacters(string) method and then by the StringInfo.GetTextElementEnumerator(string) method. Both methods correctly parse the text elements in the two strings and display the results of the parsing operation.
code reference: System.Globalization.StringInfo.Class#1