GNU GRUB Manual 2.12: Internationalisation

18 Internationalisation

18.1 Charset

GRUB uses UTF-8 internally other than in rendering where some GRUB-specific appropriate representation is used. All text files (including config) are assumed to be encoded in UTF-8.

18.2 Filesystems

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT, Joliet part of ISO9660 are treated as UTF-16 as per specification. AFS and BFS are read as UTF-8, again according to specification. BtrFS, cpio, tar, squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4, FAT (short names), F2FS, RockRidge part of ISO9660, nilfs2, UFS1, UFS2 and ZFS are assumed to be UTF-8. This might be false on systems configured with legacy charset but as long as the charset used is superset of ASCII you should be able to access ASCII-named files. And it’s recommended to configure your system to use UTF-8 to access the filesystem, convmv may help with migration. ISO9660 (plain) filenames are specified as being ASCII or being described with unspecified escape sequences. GRUB assumes that the ISO9660 names are UTF-8 (since any ASCII is valid UTF-8). There are some old CD-ROMs which use CP437 in non-compliant way. You’re still able to access files with names containing only ASCII characters on such filesystems though. You’re also able to access any file if the filesystem contains valid Joliet (UTF-16) or RockRidge (UTF-8). AFFS, SFS and HFS never use unicode and GRUB assumes them to be in Latin1, Latin1 and MacRoman respectively. GRUB handles filesystem case-insensitivity however no attempt is performed at case conversion of international characters so e.g. a file named lowercase greek alpha is treated as different from the one named as uppercase alpha. The filesystems in questions are NTFS (except POSIX namespace), HFS+ (configurable at mkfs time, default insensitive), SFS (configurable at mkfs time, default insensitive), JFS (configurable at mkfs time, default sensitive), HFS, AFFS, FAT, exFAT and ZFS (configurable on per-subvolume basis by property “casesensitivity”, default sensitive). On ZFS subvolumes marked as case insensitive files containing lowercase international characters are inaccessible. Also like all supported filesystems except HFS+ and ZFS (configurable on per-subvolume basis by property “normalization”, default none) GRUB makes no attempt at check of canonical equivalence so a file name u-diaresis is treated as distinct from u+combining diaresis. This however means that in order to access file on HFS+ its name must be specified in normalisation form D. On normalized ZFS subvolumes filenames out of normalisation are inaccessible.

18.3 Output terminal

Firmware output console “console” on ARC and IEEE1275 are limited to ASCII.

BIOS firmware console and VGA text are limited to ASCII and some pseudographics.

None of above mentioned is appropriate for displaying international and any unsupported character is replaced with question mark except pseudographics which we attempt to approximate with ASCII.

EFI console on the other hand nominally supports UTF-16 but actual language coverage depends on firmware and may be very limited.

The encoding used on serial can be chosen with terminfo as either ASCII, UTF-8 or “visual UTF-8”. Last one is against the specification but results in correct rendering of right-to-left on some readers which don’t have own bidi implementation.

On emu GRUB checks if charset is UTF-8 and uses it if so and uses ASCII otherwise.

When using gfxterm or gfxmenu GRUB itself is responsible for rendering the text. In this case GRUB is limited by loaded fonts. If fonts contain all required characters then bidirectional text, cursive variants and combining marks other than enclosing, half (e.g. left half tilde or combining overline) and double ones. Ligatures aren’t supported though. This should cover European, Middle Eastern (if you don’t mind lack of lam-alif ligature in Arabic) and East Asian scripts. Notable unsupported scripts are Brahmic family and derived as well as Mongolian, Tifinagh, Korean Jamo (precomposed characters have no problem) and tonal writing (2e5-2e9). GRUB also ignores deprecated (as specified in Unicode) characters (e.g. tags). GRUB also doesn’t handle so called “annotation characters” If you can complete either of two lists or, better, propose a patch to improve rendering, please contact developer team.

18.4 Input terminal

Firmware console on BIOS, IEEE1275 and ARC doesn’t allow you to enter non-ASCII characters. EFI specification allows for such but author is unaware of any actual implementations. Serial input is currently limited for latin1 (unlikely to change). Own keyboard implementations (at_keyboard and usb_keyboard) supports any key but work on one-char-per-keystroke. So no dead keys or advanced input method. Also there is no keymap change hotkey. In practice it makes difficult to enter any text using non-Latin alphabet. Moreover all current input consumers are limited to ASCII.

18.5 Gettext

GRUB supports being translated. For this you need to have language *.mo files in $prefix/locale, load gettext module and set “lang” variable.

18.6 Regexp

Regexps work on unicode characters, however no attempt at checking cannonical equivalence has been made. Moreover the classes like [:alpha:] match only ASCII subset.

18.7 Other

Currently GRUB always uses YEAR-MONTH-DAY HOUR:MINUTE:SECOND [WEEKDAY] 24-hour datetime format but weekdays are translated. GRUB always uses the decimal number format with [0-9] as digits and . as descimal separator and no group separator. IEEE1275 aliases are matched case-insensitively except non-ASCII which is matched as binary. Similar behaviour is for matching OSBundleRequired. Since IEEE1275 aliases and OSBundleRequired don’t contain any non-ASCII it should never be a problem in practice. Case-sensitive identifiers are matched as raw strings, no canonical equivalence check is performed. Case-insenstive identifiers are matched as RAW but additionally [a-z] is equivalent to [A-Z]. GRUB-defined identifiers use only ASCII and so should user-defined ones. Identifiers containing non-ASCII may work but aren’t supported. Only the ASCII space characters (space U+0020, tab U+000b, CR U+000d and LF U+000a) are recognised. Other unicode space characters aren’t a valid field separator. test (see test) tests <, >, <=, >=, -pgt and -plt compare the strings in the lexicographical order of unicode codepoints, replicating the behaviour of test from coreutils. environment variables and commands are listed in the same order.