Text

From Xojo Documentation

Data Type


The Text type is used to store textual information (Unicode). Text values automatically convert to String.

Methods

BeginsWith, Characters, Codepoints, Compare, Empty, EndOfLine, EndsWith, IndexOf, Left, Length, Lowercase, Mid, Replace, ReplaceAll, Right, Split, TitleCase, ToCString, Trim, TrimLeft, TrimRight, Uppercase

Shared Methods

FromCString, FromUnicodeCodepoint, Join

Notes

Const CompareCaseSensitive = 1

Passed to comparison methods to specify that text should be compared case-sensitively.

Converting Text to and From Bytes

To get the bytes for a Text value (as a MemoryBlock), call TextEncoding.ConvertTextToData using a specific encoding.

To convert bytes in a MemoryBlock to Text, you call TextEncoding.ConvertDataToText, specifying the encoding.
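
The two conversions can be sketched as a round trip. This is a minimal example rather than a definitive recipe: it assumes the Xojo.Core.TextEncoding.UTF8 shared encoding and the Xojo.Core.MemoryBlock type, and the variable names are illustrative only.

```xojo
// Sketch: round-trip a Text value through UTF-8 bytes.
// Variable names are illustrative, not part of the API.
Dim original As Text = "Hello, World"

// Text -> bytes: the encoding is named explicitly
Dim data As Xojo.Core.MemoryBlock
data = Xojo.Core.TextEncoding.UTF8.ConvertTextToData(original)

// Bytes -> Text: use the same encoding that produced the bytes
Dim roundTrip As Text
roundTrip = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(data)
```

Decoding with a different encoding than the one used to encode may yield different characters or fail, which is why both calls name the encoding.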

Comparing Text vs. Strings

Text is abstract - a series of characters.

Bytes are concrete - a series of bits.

There are lots of different ways to encode characters into bytes. Most of them are very limited, only defining encodings for some characters, and even when they define encodings for the same characters, they often use different bytes.

The only encodings that can represent every character are the Unicode encodings: UTF-8, UTF-16, and UTF-32.

The old String type tries to represent either text or bytes or both, and as a result it's complicated and confusing. With Text, this is now very simple: Text is characters, and if you want to convert to or from an array of bytes (or an old-fashioned String), you have to be clear about the encoding you intend to use.

When you say that you want to write an ASCII string to a serial port - well, you are actually writing bytes to the serial port, because you are doing something concrete, something that interchanges with other programs or machines. So you would convert the text to bytes, and you would do so using the ASCII encoding. Conversely, you can translate some bytes, contained in a String or a MemoryBlock, up to a Text value by specifying the encoding that was used to generate them.

Technical Information

The Text type is an immutable series of Unicode scalar values.

The documentation is very deliberate in its use of these terms: character, code point, and scalar value. A character, in this context, refers to an extended grapheme cluster (also known as a user-perceived character). The terms code point and scalar value retain the meaning defined in the Unicode standard.

All of the APIs on the Text type operate in characters. For example, if the APIs worked in terms of Unicode code points, it would be possible to corrupt data using Left/Mid/Right if the positions happened to be in the middle of a composed character or grapheme cluster. Working in characters also avoids situations where the length of 'é' can be either 1 or 2.
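
The difference between characters and code points can be seen by building 'é' from two code points. The sketch below assumes that concatenating Text values does not normalize them; under that assumption, and per the description above, Length should count a single character while iterating Codepoints visits two values.

```xojo
// Sketch: one user-perceived character made of two code points.
// U+0065 is "e"; U+0301 is the combining acute accent.
Dim decomposed As Text
decomposed = Text.FromUnicodeCodepoint(&h65) + Text.FromUnicodeCodepoint(&h301)

// Length works in characters (extended grapheme clusters)
Dim characterCount As Integer = decomposed.Length

// Codepoints exposes the underlying Unicode code points
Dim codepointCount As Integer = 0
For Each cp As UInt32 In decomposed.Codepoints
  codepointCount = codepointCount + 1
Next
```

Because Left, Mid, and Right also work in characters, they cannot separate the accent from its base letter.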

Many of the functions in this API optionally take locales because different locales can have special rules for casing and comparing. The default behavior is to perform the operation in a locale-insensitive manner. Functions that perform comparisons also take option flags that specify how to perform the comparison (e.g. case sensitively). These flags are bit flags that are combined via the bitwise Or operator. If the combination of options is invalid, an exception is thrown.
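
As a minimal sketch of the option flags, the following compares two values with and without CompareCaseSensitive. It assumes that Compare accepts the options as an optional second parameter and that the comparison ignores case when no flag is passed; the variable names are illustrative.

```xojo
// Sketch: the same comparison with and without the case-sensitive flag.
Dim a As Text = "Hello"
Dim b As Text = "hello"

// No options and no locale: locale-insensitive comparison
Dim relaxedResult As Integer = a.Compare(b)

// Pass the flag so that case differences are significant
Dim strictResult As Integer = a.Compare(b, Text.CompareCaseSensitive)
```

A result of 0 indicates that the two values compare as equal under the chosen options.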

Sample Code

Assign text to a Label:

Dim t As Text
t = "Hello, World"
MyLabel.Text = t

Text is available in all project types, so you can also use it in place of String. A Text value can be converted to a String, so code like this works:

Dim t As Text = "Hello, World!"
MessageBox(t) ' MessageBox takes a String, but this works because Text can be converted to String

You can also convert a String with a known encoding to a Text value using the String.ToText method:

Var s As String = "Hello"
Var t As Text = s.ToText // t = "Hello"

See Also

Xojo.Core.TextEncoding class