Tip #245: Working with Unicode (platform-independent)
Created: May 10, 2002 12:21
Complexity: basic
Author: Tony Mechelynck
As of Vim: 6.0
Here are the main options to set if you want to work with Unicode files in (g)vim (see the end of this tip for the help tags to look up):
if has("multi_byte")
  set encoding=utf-8             " how Vim represents characters internally
  setglobal fileencoding=utf-8   " default for new files; empty is also OK (then the same as 'encoding')
  " Alternatively, set 'fileencoding' to one of the ucs encodings (e.g. ucs-2le).
  " UCS-2 uses 2 bytes per character: less disk space than UTF-8 for "ideographic"
  " scripts such as Chinese, Japanese or Korean (3 bytes each in UTF-8), but more
  " for mostly-ASCII text in Latin-script languages.
  " With the ucs encodings it is usually better to also set 'bomb' (byte-order mark).
  " With utf-8 a BOM is optional: Vim writes one (EF BB BF) when 'bomb' is on,
  " and some programs do not expect it.
  "setglobal bomb
  set termencoding=iso-8859-15   " or whatever is appropriate to your locale (iso-8859-15 is Latin1 + Euro currency sign)
  set fileencodings=ucs-bom,utf-8,iso-8859-15
  " or whatever is appropriate to the kinds of files you want to edit.
  " 'fileencodings' is the list Vim tries, in order, to set 'fileencoding'
  " (local to buffer) when reading an existing file; the first one that
  " converts without error is used.
  " ucs-bom means "ucs with byte-order mark"; it must come before utf-8
  " if you want it to be used.
  " An 8-bit encoding such as iso-8859-15 accepts any byte sequence, so it
  " must come last: entries after it would never be tried.
else
  echoerr "Sorry, this version of (g)vim was not compiled with +multi_byte"
endif
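After restarting (g)vim with the settings above, you can check what they actually resolved to. The commands below are a sketch rather than part of the original tip: the file name is only a placeholder, and the "Last set from" information that :verbose adds needs Vim 7 or later.

```vim
" Show the global settings and, with :verbose, where each one was last set
:verbose set encoding? termencoding? fileencodings?

" Open an existing file (placeholder name), then see which entry of
" 'fileencodings' matched and whether a byte-order mark was found
:edit somefile.txt
:set fileencoding? bomb?

" Re-save the current buffer as UCS-2 little-endian with a byte-order mark
:setlocal fileencoding=ucs-2le bomb
:write
```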
In "replace" mode, one character (one or more data bytes in UTF-8) replaces one character, which need not use the same number of bytes.
In "normal" mode, ga shows the character under the cursor as text, decimal, octal and hex; g8 shows the hex value(s) of the byte(s) used to represent it in UTF-8.
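ga and g8 have rough script-level counterparts. A minimal sketch, assuming 'encoding' is utf-8 as set above: char2nr() returns the code point that ga reports, and nr2char() goes the other way. printf() needs Vim 7 or later. The comments sit on their own lines because :echo would treat a trailing " as the start of a string.

```vim
" Assumes 'encoding' is utf-8, as set above.
" Code point of é: 233 decimal, the same number ga reports
:echo char2nr('é')
" The same code point in hex: 00e9
:echo printf('%04x', char2nr('é'))
" The character for U+20AC EURO SIGN: €
:echo nr2char(0x20AC)
" For comparison, g8 on é shows its two UTF-8 bytes: c3 a9
```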
In "insert" or "replace" mode,
- any character defined on your keyboard can be entered the usual way (even with dead keys if you have them, e.g. âêîôû äëïöü)
- any character which has a "digraph" (there are a great many of them; see :digraphs after setting enc=utf-8) can be entered with a Ctrl-K prefix followed by the two characters of the digraph
- any Unicode character at all can be entered with a Ctrl-V prefix, either <Ctrl-V> u aaaa or <Ctrl-V> U bbbbbbbb, where aaaa and bbbbbbbb are hexadecimal digits with 0 <= aaaa <= FFFF, or 0 <= bbbbbbbb <= 7FFFFFFF
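Some concrete keystroke sequences for the three methods above, followed by a vimrc line defining a digraph of your own. This is an illustrative sketch: the key pair "sm" is an arbitrary, hypothetical choice (pick one that is not already listed by :digraphs), and 9786 is the decimal code point of U+263A WHITE SMILING FACE.

```vim
" Typed in Insert or Replace mode (these are keystrokes, not Ex commands):
"   Ctrl-K e '          ->  é  (digraph e', U+00E9)
"   Ctrl-K E u          ->  €  (digraph Eu, U+20AC)
"   Ctrl-V u 20ac       ->  €  (up to 4 hex digits after u)
"   Ctrl-V U 0001f600   ->  U+1F600, a character beyond FFFF (up to 8 hex digits after U)

" In your vimrc: define your own digraph (hypothetical key pair "sm")
if has("digraphs")
  digraphs sm 9786
endif
```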
Unicode can be used for HTML "body text", at least in Netscape 6 and probably in IE; but on my machine it doesn't display properly as "title text" (i.e., between <title></title> tags in the <head> part).
Gvim will display Unicode text properly if you have the fonts for it, provided that you set 'guifont' to a fixed-width font which has the glyphs you want to use. Courier New is OK for French, German, Greek, Russian and more, but I'm not sure about Hebrew or Arabic. Its glyphs are of a more "fixed" width than those of, e.g., Lucida Console; the latter can be annoying if you need bold Cyrillic writing.
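One possible cause of the title problem (an assumption, not something tested in this tip) is that the browser is not told the page's character set and falls back to its default. The sketch below defines a hypothetical helper, AddUtf8Meta (the name is illustrative), that inserts a UTF-8 charset declaration after the first line containing a <head> tag and sets the buffer to be written as UTF-8. Call it with :call AddUtf8Meta() and then :write.

```vim
" Hypothetical helper (illustrative name): declare UTF-8 in an HTML buffer.
function! AddUtf8Meta()
  let lnum = 1
  " Scan from the top for the first line containing <head> (=~? ignores case;
  " \> keeps <header> from matching)
  while lnum <= line('$')
    if getline(lnum) =~? '<head\>'
      call append(lnum, '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">')
      " Make sure the bytes on disk match the declaration
      setlocal fileencoding=utf-8
      return
    endif
    let lnum = lnum + 1
  endwhile
  echoerr "No <head> tag found in this buffer"
endfunction
```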
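'guifont' syntax differs between GUIs, so a platform-independent vimrc or gvimrc has to branch on the GUI in use. A minimal sketch, assuming the Win32 and GTK versions of gvim; the font name and the size 10 are only examples.

```vim
" Choose a fixed-width font per GUI; name and size are examples only.
if has("gui_running")
  if has("gui_win32")
    " Win32: underscores stand for spaces, :h gives the point size
    set guifont=Courier_New:h10
  elseif has("gui_gtk")
    " GTK: family name and size, spaces escaped with backslashes
    set guifont=Courier\ New\ 10
  endif
endif
" Or pick a font interactively, then check the result with :set guifont?
" :set guifont=*
```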
see:
:h utf8
:h 'enc'
:h 'fenc'
:h 'fencs'
:h 'tenc'
:h 'bomb'
:h 'guifont'
:h ga
:h g8
:h i_CTRL-V_digit
Happy Vimming !
Tony.