Flask like Jinja2 and Werkzeug is totally Unicode based when it comes to text. Not only these libraries, also the majority of web related Python libraries that deal with text. If you don’t know Unicode so far, you should probably read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets. This part of the documentation just tries to cover the very basics so that you have a pleasant experience with Unicode related things.
Flask has a few assumptions about your application (which you can change of course) that give you basic and painless Unicode support:
So what does this mean to you?
HTTP is based on bytes. Not only the protocol, also the system used to address documents on servers (so called URIs or URLs). However HTML which is usually transmitted on top of HTTP supports a large variety of character sets and which ones are used, are transmitted in an HTTP header. To not make this too complex Flask just assumes that if you are sending Unicode out you want it to be UTF-8 encoded. Flask will do the encoding and setting of the appropriate headers for you.
The same is true if you are talking to databases with the help of SQLAlchemy or a similar ORM system. Some databases have a protocol that already transmits Unicode and if they do not, SQLAlchemy or your other ORM should take care of that.
So the rule of thumb: if you are not dealing with binary data, work with Unicode. What does working with Unicode in Python 2.x mean?
If you are talking with a filesystem or something that is not really based on Unicode you will have to ensure that you decode properly when working with Unicode interface. So for example if you want to load a file on the filesystem and embed it into a Jinja2 template you will have to decode it from the encoding of that file. Here the old problem that text files do not specify their encoding comes into play. So do yourself a favour and limit yourself to UTF-8 for text files as well.
Anyways. To load such a file with Unicode you can use the built-in str.decode() method:
def read_file(filename, charset='utf-8'):
with open(filename, 'r') as f:
return f.read().decode(charset)
To go from Unicode into a specific charset such as UTF-8 you can use the unicode.encode() method:
def write_file(filename, contents, charset='utf-8'):
with open(filename, 'w') as f:
f.write(contents.encode(charset))
Most editors save as UTF-8 by default nowadays but in case your editor is not configured to do this you have to change it. Here some common ways to set your editor to store as UTF-8:
Vim: put set enc=utf-8 to your .vimrc file.
Emacs: either use an encoding cookie or put this into your .emacs file:
(prefer-coding-system 'utf-8)
(setq default-buffer-file-coding-system 'utf-8)
Notepad++:
It is also recommended to use the Unix newline format, you can select it in the same panel but this is not a requirement.