Identifying writing systems
The functions in this section are used to identify the writing system, or script, of individual characters and of ranges within a larger text string.
(ch unsigned-int32) ⇒ (ret <pango-script>)
Looks up the <pango-script> for a particular character (as defined by Unicode Standard Annex #24). No check is made for ch being a valid Unicode character; if you pass in an invalid character, the result is undefined.
- ch
- a Unicode character
- ret
- the <pango-script> for the character.
Since 1.4
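A per-character script lookup of this kind can be pictured as a search over a sorted table of code point ranges. The following is a minimal Python sketch of that idea, not Pango's implementation: the table `SCRIPT_RANGES`, the function `script_for_char`, and the lowercase script names are all hypothetical, and the table covers only a handful of ranges, whereas real data derived from Unicode Standard Annex #24 has many more.

```python
from bisect import bisect_right

# Hypothetical, heavily abbreviated table of (first, last, script) ranges,
# sorted by first code point. Real script data has many more entries.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "latin"),     # A-Z
    (0x0061, 0x007A, "latin"),     # a-z
    (0x03B1, 0x03C9, "greek"),     # lowercase alpha to omega
    (0x0410, 0x044F, "cyrillic"),  # basic Russian alphabet
    (0x05D0, 0x05EA, "hebrew"),    # alef to tav
    (0x4E00, 0x9FFF, "han"),       # CJK Unified Ideographs block
]
_STARTS = [first for first, _, _ in SCRIPT_RANGES]


def script_for_char(ch: str) -> str:
    """Return the script name for a single character.

    Anything outside the abbreviated table is reported as "unknown";
    this is a simplification made for the sketch.
    """
    cp = ord(ch)
    # Find the last range whose first code point is <= cp.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _, last, script = SCRIPT_RANGES[i]
        if cp <= last:
            return script
    return "unknown"
```

In this sketch, `script_for_char("я")` yields `"cyrillic"`, while a digit such as `"1"` falls outside the table and yields `"unknown"`.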
(script <pango-script>) ⇒ (ret <pango-language>)
Given a script, finds a language tag that is reasonably representative of that script. This will usually be the most widely spoken or used language written in that script: for instance, the sample language for ‘PANGO_SCRIPT_CYRILLIC’ is ‘ru’ (Russian), the sample language for ‘PANGO_SCRIPT_ARABIC’ is ‘ar’.
For some scripts, no sample language will be returned because there is no language that is sufficiently representative. The best example of this is ‘PANGO_SCRIPT_HAN’, where the variants of written Chinese, Japanese, and Korean all use significantly different sets of Han characters and different forms of shared characters. No sample language can be provided for many historical scripts either.
- script
- a <pango-script>
- ret
- a <pango-language> that is representative of the script, or ‘#f’ if no such language exists.
Since 1.4
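The behaviour described above amounts to a lookup that may legitimately come back empty. A minimal Python sketch, assuming a hypothetical table `SAMPLE_LANGUAGES` that contains only the two examples named in the text (the real mapping is much larger), with `None` standing in for ‘#f’:

```python
from typing import Optional

# Hypothetical table holding only the examples given in the text.
# Han is deliberately absent: no single language is representative of it.
SAMPLE_LANGUAGES = {
    "cyrillic": "ru",
    "arabic": "ar",
}


def sample_language(script: str) -> Optional[str]:
    """Return a representative language tag, or None if there is none."""
    return SAMPLE_LANGUAGES.get(script)
```

Callers of such a function need to handle the empty result, for example by falling back to a language chosen by the user or the locale.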
(language <pango-language>) (script <pango-script>) ⇒ (ret bool)
Determines if script is one of the scripts used to write language. The returned value is conservative; if nothing is known about the language tag language, ‘#t’ will be returned, since, as far as Pango knows, script might be used to write language.
This routine is used in Pango's itemization process when determining if a supplied language tag is relevant to a particular section of text. It probably is not useful for applications in most circumstances.
- language
- a <pango-language>
- script
- a <pango-script>
- ret
- ‘#t’ if script is one of the scripts used to write language, or if nothing is known about language.
Since 1.4
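The conservative rule is the part most easily misread: an unknown language produces a true result, not a false one. A minimal Python sketch of that rule only, assuming a hypothetical table `LANGUAGE_SCRIPTS` with a few illustrative entries (it is not Pango's data and ignores any other special cases the library may apply):

```python
# Hypothetical, illustrative table of scripts used to write each language.
LANGUAGE_SCRIPTS = {
    "ru": {"cyrillic"},
    "ar": {"arabic"},
    "ja": {"han", "hiragana", "katakana"},
}


def language_includes_script(language: str, script: str) -> bool:
    """Return True if script may be used to write language.

    Conservative: a language missing from the table yields True,
    because nothing rules the script out.
    """
    scripts = LANGUAGE_SCRIPTS.get(language)
    if scripts is None:
        return True  # nothing known about this language tag
    return script in scripts
```

Under these assumptions, `language_includes_script("ru", "arabic")` is false, while an unlisted tag such as `"xx"` gives true for every script.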
(text mchars) (length int) ⇒ (ret <pango-script-iter>)
Create a new <pango-script-iter>, used to break a string of Unicode text into runs by script. No copy is made of text, so the caller needs to make sure it remains valid until the iterator is freed with pango-script-iter-free.
- text
- a UTF-8 string
- length
- length of text, or -1 if text is nul-terminated.
- ret
- the new script iterator, initialized to point at the first range in the text, which should be freed with pango-script-iter-free. If the string is empty, it will point at an empty range.
Since 1.4
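To illustrate what breaking text into runs by script means, here is a simplified Python sketch. It is not how the script iterator is implemented: `simple_script` is a hypothetical stand-in classifier covering three ranges, characters it does not recognise (spaces, digits, punctuation) are treated as "common" and absorbed into the current run, and offsets are code point indices rather than positions in a UTF-8 buffer.

```python
from typing import Iterator, Tuple


def simple_script(ch: str) -> str:
    """Hypothetical classifier covering only a few ranges."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:
        return "latin"
    if 0x0410 <= cp <= 0x044F:
        return "cyrillic"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    return "common"


def script_runs(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, script) runs; end is exclusive."""
    if not text:
        # Mirrors the documented behaviour: an empty string gives an empty range.
        yield (0, 0, "common")
        return
    start = 0
    run_script = "common"
    for i, ch in enumerate(text):
        s = simple_script(ch)
        if s == "common":
            continue  # common characters stay in the current run
        if run_script == "common":
            run_script = s  # first real script claims any leading common characters
        elif s != run_script:
            yield (start, i, run_script)
            start, run_script = i, s
    yield (start, len(text), run_script)
```

For the input `"abc где 中文"` this sketch produces three runs, one each for the Latin, Cyrillic, and Han portions, with each space attached to the run that precedes it.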