TextEncoding

Class (inherits from Object)

Used to specify the text encoding of a String.

Properties

Base	Format	Variant
Code	InternetName

Methods

Chr	IsValidData
Equals	Operator_Compare

Notes

When a computer stores text, it encodes each character as a numeric value and stores the byte (or bytes) associated with that number. When it needs to display or print that character, it consults the encoding scheme to determine which character the number represents.

Early computers commonly used an encoding scheme called "ASCII", which stands for American Standard Code for Information Interchange. It specifies 128 values, including codes for uppercase and lowercase letters, digits, the common symbols on a keyboard, and some "invisible" control codes that were heavily used in early computers.

As computers became more sophisticated and were introduced in non-English-speaking countries, the limitations of the ASCII encoding scheme became apparent. It didn’t include codes for accented characters and had no chance of handling ideographic languages, such as Japanese or Chinese, which require thousands of characters.

As a result, extensions to the ASCII encoding scheme were developed. Outside the range of 0-127, these schemes generally do not agree. For example, in the US, macOS and Windows computers use different encodings for codes 128-255. Many other encoding schemes for handling languages that use non-ASCII characters have been developed.

The most general solution to the problem is an encoding called Unicode. It is designed to handle every character in every language. It also enables you to represent a mixture of languages within one text stream. However, not all strings that you may encounter use Unicode.

When you encounter a string, you need to know its encoding in order to interpret the sequence of bytes (or double-bytes) that make up the string's content. By default, every string contains both the bytes (content) and the encoding (if it is known; it is Nil if not known). Two different formats of Unicode are supported: UTF-8 and UTF-16. All strings in your project are compiled as UTF-8. This is a Unicode encoding that uses one byte for ASCII characters and up to four bytes for non-ASCII characters.
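To make the byte counts concrete, the following sketch compares the character count and the byte count of a few single-character UTF-8 strings. It assumes the API 2.0 String properties Length and Bytes, and it builds each character with Encodings.UTF8.Chr so the result does not depend on how a literal was typed. The code points are illustrative only (A, é, €, and an emoji); in UTF-8 they should occupy one, two, three, and four bytes respectively.

```xojo
// Illustrative code points: A (65), é (233), € (8364), and an emoji (128512).
Var codePoints() As Integer = Array(65, 233, 8364, 128512)

For Each cp As Integer In codePoints
  Var s As String = Encodings.UTF8.Chr(cp)
  // Length counts characters; Bytes counts the bytes used to store them.
  System.DebugLog("U+" + cp.ToHex + ": " + s.Length.ToString + " character, " + s.Bytes.ToString + " byte(s)")
Next
```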

If you work only with strings that are created and managed within your own application, you probably don't need to deal with encodings directly, because everything uses UTF-8. However, if you receive strings from an outside source, such as the internet, an external database (that is, not SQLite), or a text file, you should specify what encoding is used. If the string comes from a MemoryBlock, its encoding will be Nil.

You can assign an encoding to a string in several ways. For example, if you are reading the string using the TextInputStream class, you use the Encoding property. The Encodings module gives you access to all known encodings. Here is an example that reads a text file that uses the UTF8 encoding:

Var f As FolderItem
Var t As TextInputStream
f = FolderItem.ShowOpenFileDialog("text") // "text" is defined as a File Type
If f <> Nil Then
  t = TextInputStream.Open(f)
  t.Encoding = Encodings.UTF8 // specify encoding of input stream
  TextArea1.Value = t.ReadAll
  t.Close
End If

Also, the Read, ReadLine, and ReadAll methods take an optional parameter that lets you specify the encoding.
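As a minimal sketch of that alternative, the previous example can pass the encoding directly to ReadAll rather than setting the Encoding property first. It reuses the same assumed names (a "text" File Type and a TextArea1 control).

```xojo
Var f As FolderItem = FolderItem.ShowOpenFileDialog("text")
If f <> Nil Then
  Var t As TextInputStream = TextInputStream.Open(f)
  // Pass the encoding to ReadAll instead of setting t.Encoding beforehand.
  TextArea1.Value = t.ReadAll(Encodings.UTF8)
  t.Close
End If
```

Either approach should give the same result; the parameter form is convenient when different parts of a stream need to be read with different encodings.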

If you need to output a string in a specific encoding, you can use the ConvertEncoding function to do so. For example, this code converts the text in a TextField to the WindowsANSI encoding:

Var s As String
s = TextField1.Value.ConvertEncoding(Encodings.WindowsANSI)

You will find text encoding helpful if you develop:

Internet applications, such as web browsers or e-mail applications
Applications that transfer text across different platforms
Applications based in Unicode

The Encoding function makes it easy to obtain the TextEncoding of any string. Use the Encodings module to obtain a specified text encoding. Some of the most useful are UTF8, UTF16, UTF32, ASCII, MacRoman, MacJapanese, and WindowsLatin1. Use the Autocomplete feature of the Code Editor to view the complete list.
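The sketch below shows one way these pieces might fit together for a string whose encoding is unknown. It builds a string from a MemoryBlock (so its encoding starts as Nil), uses IsValidData to check whether the bytes are valid UTF-8, and then reports the InternetName of the encoding that was assigned. The variable names and the WindowsLatin1 fallback are assumptions made for illustration, not a recommended policy.

```xojo
// Two bytes that happen to be the UTF-8 form of "é" (illustrative data).
Var mb As New MemoryBlock(2)
mb.Byte(0) = &hC3
mb.Byte(1) = &hA9

// A string taken from a MemoryBlock without an encoding argument has a Nil encoding.
Var rawText As String = mb.StringValue(0, mb.Size)

If Encoding(rawText) Is Nil Then
  If Encodings.UTF8.IsValidData(rawText) Then
    rawText = DefineEncoding(rawText, Encodings.UTF8)
  Else
    // Assumed fallback for this sketch only.
    rawText = DefineEncoding(rawText, Encodings.WindowsLatin1)
  End If
End If

System.DebugLog("Encoding: " + Encoding(rawText).InternetName)
```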

ASCII Codes

The following table lists the decimal, hexadecimal, and octal values of the ASCII character codes (0 to 127).

Decimal	Hex	Octal	Result	Decimal	Hex	Octal	Result
0	0	0	NUL	32	20	40	SP
1	1	1	SOH	33	21	41	!
2	2	2	STX	34	22	42	"
3	3	3	ETX	35	23	43	#
4	4	4	EOT	36	24	44	$
5	5	5	ENQ	37	25	45	%
6	6	6	ACK	38	26	46	&
7	7	7	BEL	39	27	47	'
8	8	10	BS	40	28	50	(
9	9	11	HT	41	29	51	)
10	A	12	LF	42	2A	52	*
11	B	13	VT	43	2B	53	+
12	C	14	FF	44	2C	54	,
13	D	15	CR	45	2D	55	-
14	E	16	SO	46	2E	56	.
15	F	17	SI	47	2F	57	/
16	10	20	DLE	48	30	60	0
17	11	21	DC1	49	31	61	1
18	12	22	DC2	50	32	62	2
19	13	23	DC3	51	33	63	3
20	14	24	DC4	52	34	64	4
21	15	25	NAK	53	35	65	5
22	16	26	SYN	54	36	66	6
23	17	27	ETB	55	37	67	7
24	18	30	CAN	56	38	70	8
25	19	31	EM	57	39	71	9
26	1A	32	SUB	58	3A	72	:
27	1B	33	ESC	59	3B	73	;
28	1C	34	FS	60	3C	74	<
29	1D	35	GS	61	3D	75	=
30	1E	36	RS	62	3E	76	>
31	1F	37	US	63	3F	77	?
64	40	100	@	96	60	140	`
65	41	101	A	97	61	141	a
66	42	102	B	98	62	142	b
67	43	103	C	99	63	143	c
68	44	104	D	100	64	144	d
69	45	105	E	101	65	145	e
70	46	106	F	102	66	146	f
71	47	107	G	103	67	147	g
72	48	110	H	104	68	150	h
73	49	111	I	105	69	151	i
74	4A	112	J	106	6A	152	j
75	4B	113	K	107	6B	153	k
76	4C	114	L	108	6C	154	l
77	4D	115	M	109	6D	155	m
78	4E	116	N	110	6E	156	n
79	4F	117	O	111	6F	157	o
80	50	120	P	112	70	160	p
81	51	121	Q	113	71	161	q
82	52	122	R	114	72	162	r
83	53	123	S	115	73	163	s
84	54	124	T	116	74	164	t
85	55	125	U	117	75	165	u
86	56	126	V	118	76	166	v
87	57	127	W	119	77	167	w
88	58	130	X	120	78	170	x
89	59	131	Y	121	79	171	y
90	5A	132	Z	122	7A	172	z
91	5B	133	[	123	7B	173	{
92	5C	134	\	124	7C	174	|
93	5D	135	]	125	7D	175	}
94	5E	136	^	126	7E	176	~
95	5F	137	_	127	7F	177	DEL
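
A row of the table can be reproduced in code. The following sketch assumes the global Chr function and the API 2.0 Integer methods ToHex and ToOctal; for code 65 it should log the values shown in the row for A.

```xojo
Var code As Integer = 65
Var ch As String = Chr(code) // the character for an ASCII code

System.DebugLog(ch + " = decimal " + code.ToString + ", hex " + code.ToHex + ", octal " + code.ToOctal)

// Going the other way: the code of the first character of a string.
Var back As Integer = ch.Asc
System.DebugLog("Code for " + ch + " is " + back.ToString)
```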

Examples

The following example obtains the TextEncoding of the string passed to the Encoding function.

Var t As TextEncoding
t = Encoding(TextArea1.Value)
If t <> Nil Then
  Label1.Value = "Base=" + t.Base.ToString
  Label2.Value = "Format=" + t.Format.ToString
  Label3.Value = "Variant=" + t.Variant.ToString
End If

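The following statement uses DefineEncoding with the Encodings module to mark the text in a TextField as UTF-8. DefineEncoding does not change the bytes of the string; it only sets the encoding associated with them.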

TextField2.Value = DefineEncoding(TextField1.Value, Encodings.UTF8)

The following example uses the Chr method to obtain the character corresponding to the code point of 165 for the MacRoman encoding, the bullet character (•):

Var s As String
s = Encodings.MacRoman.Chr(165)

From Xojo Documentation
