The Unicode Standard (http://go.microsoft.com/fwlink/?linkid=37123) defines a number of Unicode character categories. For example, a character might be categorized as an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a paragraph separator, a math symbol, or a currency symbol. Your application can use the character category to govern string-based operations, such as parsing. The System.Globalization.UnicodeCategory enumeration defines the possible character categories.
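As a rough illustration of the categories listed above, the following sketch prints the category of one sample character per category by calling CharUnicodeInfo.GetUnicodeCategory. The CategoryDemo class, the Categorize helper, and the choice of sample characters are assumptions made for this example only.

```csharp
using System;
using System.Globalization;

public static class CategoryDemo
{
    // Hypothetical helper: returns the Unicode general category of one char value.
    public static UnicodeCategory Categorize(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c);

    public static void Main()
    {
        // Sample characters: uppercase letter, lowercase letter, decimal digit,
        // Roman numeral eight (U+2167), plus sign, dollar sign,
        // and paragraph separator (U+2029).
        char[] samples = { 'A', 'a', '7', '\u2167', '+', '$', '\u2029' };

        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4}: {1}", (int)c, Categorize(c));
        }
    }
}
```

A parser could branch on the returned value, for example by accepting only UnicodeCategory.DecimalDigitNumber characters when reading an integer.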
You use the System.Globalization.CharUnicodeInfo class to obtain the System.Globalization.UnicodeCategory value for a specific character. The System.Globalization.CharUnicodeInfo class defines methods that return the following Unicode character values:
Numeric value. Applies only to numeric characters, including fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits.
Digit value. Applies to numeric characters that can be combined with other numeric characters to represent a whole number in a numbering system.
Decimal digit value. Applies only to characters that represent decimal digits in the decimal (base 10) system. A decimal digit can be one of ten digits, from zero through nine. These characters are members of the UnicodeCategory.DecimalDigitNumber category.
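The difference between the three values is easiest to see side by side. The sketch below queries CharUnicodeInfo.GetNumericValue, GetDigitValue, and GetDecimalDigitValue for a few characters; each method returns -1 when the value does not apply. The NumericValueDemo class, the Describe helper, and the sample characters are assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;

public static class NumericValueDemo
{
    // Hypothetical helper: formats the three numeric properties of one character.
    // A value of -1 means the property does not apply to that character.
    public static string Describe(char c)
    {
        double numeric = CharUnicodeInfo.GetNumericValue(c);
        int digit = CharUnicodeInfo.GetDigitValue(c);
        int decimalDigit = CharUnicodeInfo.GetDecimalDigitValue(c);

        return string.Format(CultureInfo.InvariantCulture,
            "U+{0:X4}: numeric={1}, digit={2}, decimal digit={3}",
            (int)c, numeric, digit, decimalDigit);
    }

    public static void Main()
    {
        // '7' (digit seven), U+00B2 (superscript two),
        // U+00BD (vulgar fraction one half), U+2167 (Roman numeral eight).
        foreach (char c in new[] { '7', '\u00B2', '\u00BD', '\u2167' })
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```

With these inputs, only '7' should report all three values. The superscript has a digit value but no decimal digit value, and the fraction and the Roman numeral have only a numeric value.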
When using this class in your applications, keep in mind the following programming considerations for the char type. The char type can be difficult to use, and strings are generally preferable for representing linguistic content.
A char object does not always correspond to a single character. Although the char type represents a single 16-bit value, some characters (such as grapheme clusters and surrogate pairs) consist of two or more UTF-16 code units. For more information, see "Char Objects and Unicode Characters" in the String class documentation.
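A minimal sketch of the surrogate pair case, assuming U+1D7D8 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO) as the sample character. It contrasts the per-char overload of GetUnicodeCategory with the (string, index) overload, which evaluates the whole pair. The SurrogateDemo class and its constant are hypothetical names used only here.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    // U+1D7D8 lies outside the Basic Multilingual Plane, so UTF-16
    // encodes it as a surrogate pair (two char values).
    public const string DoubleStruckZero = "\U0001D7D8";

    public static void Main()
    {
        string s = DoubleStruckZero;

        // Length counts UTF-16 code units, not characters.
        Console.WriteLine("Length: {0}", s.Length);
        Console.WriteLine("Surrogate pair at 0: {0}", char.IsSurrogatePair(s, 0));

        // The char overload sees only the lone high surrogate.
        Console.WriteLine("First char alone: {0}",
            CharUnicodeInfo.GetUnicodeCategory(s[0]));

        // The (string, index) overload evaluates the pair as one character.
        Console.WriteLine("Pair at index 0: {0}",
            CharUnicodeInfo.GetUnicodeCategory(s, 0));

        // Recover the code point that the pair encodes.
        Console.WriteLine("Code point: U+{0:X}", char.ConvertToUtf32(s, 0));
    }
}
```

Passing a string and an index rather than a single char is generally the safer choice when the input may contain characters outside the Basic Multilingual Plane.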
The notion of a "character" is also flexible. A character is often thought of as a glyph, but many glyphs require multiple code points. For example, ä can be represented either by two code points ("a" plus U+0308, which is the combining diaeresis), or by a single code point ("ä" or U+00E4). Some languages have many letters, characters, and glyphs that require multiple code points, which can cause confusion in linguistic content representation. For example, there is a ΰ (U+03B0, Greek small letter upsilon with dialytika and tonos), but there is no equivalent single capital letter. Uppercasing such a value simply returns the original value unchanged.
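The two representations of ä (the precomposed form is U+00E4) and the uppercasing behavior can be checked with a short sketch. It assumes String.Normalize with NormalizationForm.FormC is an acceptable way to compare the two forms, and it uses StringInfo to count text elements. The CombiningDemo class and the EqualAfterNfc helper are hypothetical names introduced for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CombiningDemo
{
    public const string Decomposed = "a\u0308";  // 'a' + COMBINING DIAERESIS
    public const string Precomposed = "\u00E4";  // LATIN SMALL LETTER A WITH DIAERESIS

    // Hypothetical helper: ordinal comparison after normalizing both
    // strings to Normalization Form C (canonical composition).
    public static bool EqualAfterNfc(string x, string y) =>
        string.Equals(x.Normalize(NormalizationForm.FormC),
                      y.Normalize(NormalizationForm.FormC),
                      StringComparison.Ordinal);

    public static void Main()
    {
        // The raw code unit sequences differ even though the glyph is the same.
        Console.WriteLine("Ordinal equal: {0}",
            string.Equals(Decomposed, Precomposed, StringComparison.Ordinal));
        Console.WriteLine("Equal after NFC: {0}",
            EqualAfterNfc(Decomposed, Precomposed));

        // Two char values, but one text element (grapheme).
        Console.WriteLine("Chars: {0}, text elements: {1}",
            Decomposed.Length, new StringInfo(Decomposed).LengthInTextElements);

        // U+03B0 has no single-code-point uppercase form.
        string upsilon = "\u03B0";
        Console.WriteLine("Unchanged by uppercasing: {0}",
            upsilon.ToUpperInvariant() == upsilon);
    }
}
```

Normalizing before comparison is one option among several. Culture-sensitive comparison through String.Compare or CompareInfo may be more appropriate when linguistic equivalence matters.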