A Worldwide Language and Font Support

Using AVS/Express

This appendix describes facilities for building a worldwide application and adding to the standard fonts available for each locale within AVS/Express.

This chapter discusses:

Introduction
Language Support
Locales
Text Processing
Localization
Internationalization
Adding Fonts to AVS/Express

A.1 Introduction

It is important for the productivity of the application developer and end user that visual interfaces are presented in the most natural way. AVS/Express allows developers and end users to work in their own languages. There are two high-level approaches to developing applications with AVS/Express for international markets:

Develop and use a localized application in one language, not necessarily English. Language issues are addressed during the application development stage of the software lifecycle. Input and display of text can be performed in the target language. V files can be hardcoded with strings and object aliases in the target language. The benefit of developing directly in the language of the primary end user is offset by the difficulty of porting the application to other languages.
Develop an internationalized application in English, then provide it with translation dictionaries for languages used in target markets. The dictionaries are independent of V files, so that a single executable and a suite of V files can be used to run the same application in many different languages. Application development and linguistic issues are separate stages of the software lifecycle for internationalized projects.

Each of these project mechanisms is supported by an object property: localized projects use object aliases from the "user_name" property; internationalized projects use the "dictionary" property. The sections "Localization" and "Internationalization" describe these features and the associated project lifecycles.

You configure AVS/Express to run in a particular language through the locale model. This uses the concepts of locale, character set, encoding, and font. The sections on "Language Support" and "Locales" develop these ideas with background material and examples. AVS/Express uses aspects of the host platform's operating system and windowing system to provide worldwide language support. You should consult your operating system documentation to find out how to configure and use your platform with the language you require.

The section "Text Processing" presents an overview of strings in the V language, describes how they are exchanged between various components of the AVS/Express environment, and gives details of interfaces and formats relevant to worldwide language support.

A.2 Language Support

This section is background material designed to help you to understand the mechanics of worldwide language support. Information relating specifically to support of these features in AVS/Express is provided in later sections of this chapter and in the Release Notes.

This section describes how languages are represented for computer processing. It starts with a discussion of the two main approaches (local character sets and Unicode) and describes the choice AVS/Express has made. It then summarizes the languages, character sets, and encodings used by AVS/Express.

Text Representations

A language is represented by using one or more character sets to enumerate the abstract symbols of the language. An encoding maps the symbols in a character set to numerical codes. Representations of the symbols, called glyphs, are bound with their encoded values in a font.

There are two approaches to the definition of character sets:

Local character sets

Specify one or more character sets for each language; each character set may have a variety of encodings.

Universal character set

Create one character set that includes all of the symbols required by every language, and specify a unique encoding for this superset.

Standard local character sets are widely supported across different platforms. Local character sets allow worldwide language support to be implemented incrementally, by adding font handling and character conversion routines around existing core string processing.

The Unicode Consortium and the International Organization for Standardization (ISO) jointly developed a universal character set, known as Unicode. ISO 10646 (1993) is a generalization of the Unicode standard (1991). With Unicode it is possible to develop multilingual applications that do not require language-specific text processing. In its original 16-bit form, Unicode is a fixed-width encoding: every character is encoded in two bytes, even for languages such as English that use small alphabets. Unicode is only just becoming widely available. Some operating systems, such as Windows NT, use Unicode as their internal representation for all text information, but even these systems also provide interfaces and font mappings for local character sets.

AVS/Express uses local character sets.

Local Character Sets

The English alphabet can be encoded with 7 bits per character, while other European languages require 8-bit values:

ASCII (American Standard Code for Information Interchange)

A character set for the Roman alphabet as used in English. This character set is encoded in 7-bit values, that is, bytes with the most significant bit set to 0.

Extended ASCII (ISO8859 parts 1-9)

This is a family of character sets incorporating ASCII in the lower half of the byte range. The upper half of the byte range is used for special punctuation, and accented Roman characters or new alphabets.

Table A-1

Character set  Description
ISO8859-1      Latin-1, accented characters for Western European languages: German, French, Spanish, Portuguese, Italian, Dutch, Danish, Swedish, Norwegian, Icelandic, Finnish.
ISO8859-2      Latin-2, accented characters for Eastern European languages: Polish, Czech, Slovak, Hungarian, Slovene, Croat, etc.
ISO8859-3      Latin-3
ISO8859-4      Latin-4
ISO8859-5      Cyrillic alphabet for Russian, Bulgarian, etc.
ISO8859-6      Arabic alphabet for Arabic, Persian, etc.
ISO8859-7      Greek alphabet
ISO8859-8      Hebrew alphabet
ISO8859-9      Latin-5, accented characters for Turkish

These are known as Single Byte Character Sets (SBCS). Each single-byte character set is limited to 256 characters.

Chinese, Japanese and Korean use thousands of characters, so they require two or more bytes to encode each character. These are Multi-byte Character Sets (MBCS).

Multi-byte encodings are defined such that SBCS and MBCS can coexist in the same string. For example, you may use an ASCII English abbreviation in a Japanese sentence. They also have the important property that standard string processing for ASCII encoded English is valid in every locale.

There are two basic approaches to MBCS encoding:

modal

Escape sequences delimit changes of character set between SBCS and MBCS and between different multi-byte character sets. For example:

Japanese Industrial Standard 7-bit encoding (JIS) for Japanese (similar forms exist for Chinese and Korean)

non-modal

Byte values themselves signal how to parse subsequent bytes. The lead byte of a multi-byte character must be in the upper half of the byte range. The default MBCS is 2 bytes per character; subsequent MBCSs must use 3 or 4 bytes per character: reserved lead bytes to specify the new character set, then 2 bytes to identify the character. For example:

Extended UNIX Code (EUC), ISO2022-1993, a generic encoding method that can be applied to Chinese, Korean and Japanese
Shift-JIS (Microsoft kanji) for Japanese
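To make the non-modal rule concrete, here is a minimal C sketch that counts the characters in an EUC string by examining lead bytes only. It assumes the EUC-JP byte layout (SS2 = 0x8E introduces a 2-byte character, SS3 = 0x8F a 3-byte one); other EUC variants use different widths for their supplementary sets. The function name euc_char_count is hypothetical and is not part of AVS/Express.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count characters in an EUC byte string (sketch).
 * Non-modal parsing: the lead byte alone gives the character width.
 * Widths follow EUC-JP (an assumption; other EUC variants differ):
 *   0x00-0x7F  1 byte  (ASCII)
 *   0x8E       2 bytes (SS2 + 1 byte)
 *   0x8F       3 bytes (SS3 + 2 bytes)
 *   0xA1-0xFE  2 bytes (primary multi-byte character set)
 * Returns -1 for an invalid lead or trail byte, or a truncated character. */
long euc_char_count(const unsigned char *s, size_t len)
{
    size_t i = 0;
    long count = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char lead = s[i];
        size_t width;

        if (lead < 0x80)
            width = 1;
        else if (lead == 0x8E)
            width = 2;
        else if (lead == 0x8F)
            width = 3;
        else if (lead >= 0xA1 && lead <= 0xFE)
            width = 2;
        else
            return -1;                 /* not a valid EUC lead byte */

        if (width > len - i)
            return -1;                 /* character cut off at end of string */
        for (size_t k = 1; k < width; k++)
            if (s[i + k] < 0xA1 || s[i + k] > 0xFE)
                return -1;             /* trail bytes are always 0xA1-0xFE */
        i += width;
        count++;
    }
    return count;
}
```

The six bytes of the EUC kanji string for "nihongo" used later in this appendix count as three characters, and ASCII can be mixed freely with kanji because every multi-byte lead byte has its high bit set.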

Chinese characters (hanzi) are used in Korean (hanja) and Japanese (kanji). Korean phonetic symbols are called hangul. Japanese syllabic characters are called kana; there are two forms: the cursive hiragana and the more angular katakana.

Chinese, Korean and Japanese languages have national and industry standard character sets. For more details, see Supported Locales on page A-11.

Locales are distinguished by the number of bytes per character in their principal character set: single-byte locales, based on European or Middle Eastern languages; and multi-byte locales, based on the Chinese, Japanese or Korean languages.

A.3 Locales

Locale Model

The combined factors of language and culture which apply in a particular territory are grouped together in a locale. There are many considerations within a locale, such as monetary units, date and time, personal name order, collation sequence and number format, but the most important is the language.

Locales are often defined for territories that share the same language. For example, French is an official language of Belgium, Luxembourg, Switzerland, Canada, various countries in West Africa, and parts of the Caribbean, as well as of France itself. Territories that share a language often have cultural differences that are reflected in the locale. For example, in Britain the date is written day/month/year, but in the U.S.A. it is written month/day/year.

AVS/Express takes the current locale from the system environment at initialization time. The locale remains current throughout the AVS/Express session. The current locale is not represented by an AVS/Express object. The exact method for specifying the locale is system dependent. For more details, see Initialization on page A-7.

The default locale for AVS/Express is known as the "C" locale, and the default language is English. When AVS/Express is used in a non-default locale, it is said to use a local language. Text strings in this language will be referred to as local text or local strings.

AVS/Express supports input, display, storage and processing of local languages, but does not adapt to other factors in the locale, such as collation sequence.

Levels of Support

There are three possible levels of support for a locale in AVS/Express:

unsupported

AVS/Express cannot run in this locale. AVS/Express will try to fall back to the default locale.

enabled

AVS/Express supports string input, display, and processing in the local language, but you must supply your own translation dictionaries. The AVS/Express User Interface and Network Editor appear in English.

supported

The locale is enabled and translation dictionaries are present for strings found in the User Interface and Network Editor so they appear in the local language. You can supply translation dictionaries for strings unique to your application.

Initialization

This section describes initialization for a UNIX platform running the X/Motif window system.

The system locale is set using the LANG environment variable. The general format for LANG is:

language[_territory][.codeset][@modifier]

where clauses in square brackets, '[ ]', are optional. Each platform has its own set of values for the fields in the locale name. See your platform release notes to find the value of LANG appropriate to your system in your locale. The default locale is called "C", which implies English language. The codeset can be a character set, an encoding, or a name which implies both. Modifiers adjust certain details of the locale, such as choosing between various collation sequences or input methods.

For example, this is a valid Chinese locale for DEC OSF/1:

zh_CN.dechanzi@pinyin

The language is zh, which stands for zhong-guo-hua, meaning Chinese. The territory is CN, for the People's Republic of China. The codeset is dechanzi, a DEC-specific group of character sets for simplified Chinese characters; pinyin is a collation order based on the romanized Pinyin transliteration of Chinese words.
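The field layout can be illustrated with a small C parser. This is a sketch, not the routine used by AVS/Express or by any platform C library; the structure lang_fields, its 32-byte field limit and the function parse_lang are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type: fields of language[_territory][.codeset][@modifier].
 * Absent fields are left as empty strings. */
struct lang_fields {
    char language[32];
    char territory[32];
    char codeset[32];
    char modifier[32];
};

/* Copy n bytes of src into dst, truncating to fit and NUL-terminating. */
static void copy_field(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical helper: split a LANG value into its four fields (sketch). */
void parse_lang(const char *lang, struct lang_fields *out)
{
    const char *p = lang;
    size_t n;

    assert(lang != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    n = strcspn(p, "_.@");                 /* language: up to first separator */
    copy_field(out->language, sizeof out->language, p, n);
    p += n;
    if (*p == '_') {                       /* optional territory */
        p++;
        n = strcspn(p, ".@");
        copy_field(out->territory, sizeof out->territory, p, n);
        p += n;
    }
    if (*p == '.') {                       /* optional codeset */
        p++;
        n = strcspn(p, "@");
        copy_field(out->codeset, sizeof out->codeset, p, n);
        p += n;
    }
    if (*p == '@')                         /* optional modifier: rest of value */
        copy_field(out->modifier, sizeof out->modifier, p + 1, strlen(p + 1));
}
```

Applied to zh_CN.dechanzi@pinyin, the sketch yields language zh, territory CN, codeset dechanzi and modifier pinyin.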

The AVS/Express locale is set during initialization and remains current for the rest of the session. AVS/Express uses a simplified locale name derived from the LANG environment variable. This provides a common naming convention across platforms. Optional modifiers do not affect the operation of AVS/Express, so they are ignored. To find the AVS/Express locale, the LANG value is truncated at the first '.' or '@', and the resulting name is looked up in a list of aliases within the AVS/Express locale database. When a match is found, the simplified name for that locale is used as the AVS/Express locale. The simplified format has two-letter abbreviations for both language and territory, separated by an underscore:

<language:2>_<TERRITORY:2>

For example, ja_JP is the simplified name of the AVS/Express locale for Japanese. It is derived from platform-specific LANG values such as ja, japanese, ja_JP.EUC, and ja_JP.deckanji.

The default locale is an exception to this format rule; it is just called "C".
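A minimal C sketch of the lookup described above, assuming a tiny in-memory alias table copied from the locale tables later in this appendix. The real AVS/Express locale database is larger and its format is not documented here; express_locale and the aliases array are hypothetical names. The Hong Kong codeset rule described under Simplified Chinese Locale is not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical alias table: a few entries from the locale tables. */
struct locale_alias {
    const char *alias;
    const char *express_locale;
};

static const struct locale_alias aliases[] = {
    { "C", "C" }, { "POSIX", "C" }, { "en_US", "C" }, { "english", "C" },
    { "fr", "fr_FR" }, { "fr_FR", "fr_FR" }, { "fr_CA", "fr_FR" }, { "french", "fr_FR" },
    { "ja", "ja_JP" }, { "ja_JP", "ja_JP" }, { "Ja_JP", "ja_JP" }, { "japanese", "ja_JP" },
};

/* Hypothetical helper: truncate LANG at the first '.' or '@' into buf, then
 * look the result up. Returns the simplified locale name, or NULL if the
 * truncated name is not a known alias (buf still holds the truncated name). */
const char *express_locale(const char *lang, char *buf, size_t buf_size)
{
    size_t n;

    assert(lang != NULL && buf != NULL && buf_size > 0);
    n = strcspn(lang, ".@");
    if (n >= buf_size)
        n = buf_size - 1;
    memcpy(buf, lang, n);
    buf[n] = '\0';

    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(buf, aliases[i].alias) == 0)
            return aliases[i].express_locale;
    return NULL;
}
```

A NULL result corresponds to an unrecognized locale; the truncated name left in buf (es_VE, for example) is what AVS/Express uses in the dictionary pathname, as described under Default Locale.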

AVS/Express uses the locale in two ways:

To determine default fonts

The relevant ISO8859 font is always loaded, but multi-byte locales require additional SBCS and MBCS fonts.

To specify the search path for dictionary files

Dictionaries are searched in this relative pathname:
default: runtime/nls/C
other: runtime/nls/<language_TERRITORY>

Keyboard Input

European languages usually have direct input methods from local keyboards, perhaps using shifted key sequences. Multi-byte character sets, however, require more complex methods. A separate application mediates between keyboard input and the target text widget. On UNIX platforms this application is called a Front End Processor (FEP), and on Windows NT it is called an Input Method Editor (IME). Only when the input interaction is finished will the FEP/IME send a local string to the AVS/Express application. Input methods determine where the raw keyboard input appears on the screen and how pre-edit operations are performed. Each FEP/IME supports different input methods.

Configuring an FEP/IME is platform dependent. See the window system release notes for your platform and your locale.

Input Encoding

AVS/Express accepts string input in EUC, 7-bit (JIS) and Shift-JIS encodings for the relevant multi-byte locales. No configuration is required; all of the encodings can be used in a single session of AVS/Express. Separate strings can have different encodings, but the encoding must be consistent throughout any individual string. There are some additional technical restrictions:

All lines of 7-bit (JIS) encoded text must begin and end in single-byte mode (ASCII, JIS Roman, or GB Roman), because the end-quote, '"', is a valid 7-bit byte value in a multi-byte character. The V parser cannot test for end-of-string in multi-byte mode; it must wait for a single-byte '"'.
Multi-byte text and escape sequences in a 7-bit (JIS) encoding cannot contain ANSI "C" hex format, because the backslash, '\', is a valid byte value in a multi-byte character. For more details, see Hexadecimal Format on page A-20.
Strings which are ambiguous between Shift-JIS and EUC are assumed to be EUC. Note that the full-width space is not ambiguous, so a workaround to force a Shift-JIS interpretation is to add a full-width space to the string.
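One way to apply these rules to a single string is sketched below in C. It is an assumed heuristic, not the detector used by AVS/Express: a string containing <ESC> is treated as 7-bit JIS, a string that is well-formed EUC (using EUC-JP byte ranges) is taken as EUC, which resolves the ambiguous case in favour of EUC, and anything else is treated as Shift-JIS. The names guess_input_encoding and is_valid_euc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding labels for this sketch. */
enum mbcs_encoding { ENC_EUC, ENC_JIS, ENC_SJIS };

/* Return 1 if s[0..len) is well-formed EUC (EUC-JP byte ranges), else 0. */
static int is_valid_euc(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t width = 1;

        if (b >= 0x80) {
            if (b == 0x8E)
                width = 2;                  /* SS2 + 1 byte */
            else if (b == 0x8F)
                width = 3;                  /* SS3 + 2 bytes */
            else if (b >= 0xA1 && b <= 0xFE)
                width = 2;                  /* primary 2-byte set */
            else
                return 0;                   /* 0x80-0x8D, 0x90-0xA0, 0xFF */
            if (width > len - i)
                return 0;
            for (size_t k = 1; k < width; k++)
                if (s[i + k] < 0xA1 || s[i + k] > 0xFE)
                    return 0;
        }
        i += width;
    }
    return 1;
}

/* Hypothetical heuristic: ESC means 7-bit JIS; valid EUC (including the
 * ambiguous case) means EUC; everything else is treated as Shift-JIS. */
enum mbcs_encoding guess_input_encoding(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (s[i] == 0x1B)
            return ENC_JIS;
    return is_valid_euc(s, len) ? ENC_EUC : ENC_SJIS;
}
```

The full-width space shows why the workaround works: in Shift-JIS it is 0x81 0x40, and 0x81 can never start an EUC character, so a string containing it can no longer be read as EUC.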

Output Encoding

AVS/Express has an output encoding type which determines how local language strings are written. The output encoding is determined by the optional codeset field of the LANG environment variable. This value can be overridden by an independent environment variable, XP_MBCS_ENCODING.

Each supported encoding has a list of recognized values for the LANG codeset and the XP_MBCS_ENCODING environment variable:

Extended UNIX Code (EUC), default:

EUC, euc, eucJP, IBM-eucJP, deckanji, sdeckanji,
eucKR, IBM-eucKR, deckorean, dechanzi

7-bit modal encoding:

JIS, jis, 7BIT, 7bit.

Microsoft 8-bit encoding:

SJIS, sjis, Shift-JIS, ShiftJIS, IBM-932

The default output encoding, EUC, is used when the LANG codeset and XP_MBCS_ENCODING are unset or unrecognized.
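The precedence rules can be summarized in a C sketch. The recognized names are copied from the lists above; output_encoding and its helpers are hypothetical. The text does not say what happens when XP_MBCS_ENCODING is set to an unrecognized value, so the sketch assumes it falls through to the LANG codeset; the warning message described under Errors is omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical encoding labels for this sketch. */
enum out_encoding { OUT_EUC, OUT_JIS, OUT_SJIS, OUT_UNKNOWN };

/* Recognized names, copied from the lists in the text. */
static const char *const euc_names[] = { "EUC", "euc", "eucJP", "IBM-eucJP", "deckanji",
    "sdeckanji", "eucKR", "IBM-eucKR", "deckorean", "dechanzi" };
static const char *const jis_names[] = { "JIS", "jis", "7BIT", "7bit" };
static const char *const sjis_names[] = { "SJIS", "sjis", "Shift-JIS", "ShiftJIS", "IBM-932" };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static int in_list(const char *name, const char *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

static enum out_encoding classify(const char *name)
{
    if (in_list(name, euc_names, COUNT(euc_names)))   return OUT_EUC;
    if (in_list(name, jis_names, COUNT(jis_names)))   return OUT_JIS;
    if (in_list(name, sjis_names, COUNT(sjis_names))) return OUT_SJIS;
    return OUT_UNKNOWN;
}

/* Copy the codeset field of LANG (after '.', before any '@') into buf.
 * Returns NULL when LANG has no codeset field. */
static const char *lang_codeset(const char *lang, char *buf, size_t buf_size)
{
    const char *dot = strchr(lang, '.');
    const char *at = strchr(lang, '@');
    size_t n;

    assert(buf_size > 0);
    if (dot == NULL || (at != NULL && at < dot))
        return NULL;
    dot++;
    n = (at != NULL) ? (size_t)(at - dot) : strlen(dot);
    if (n >= buf_size)
        n = buf_size - 1;
    memcpy(buf, dot, n);
    buf[n] = '\0';
    return buf;
}

/* Hypothetical: pick the V output encoding. Assumption: an unrecognized
 * XP_MBCS_ENCODING falls through to the LANG codeset. */
enum out_encoding output_encoding(const char *xp_mbcs_encoding, const char *lang)
{
    char codeset[64];
    enum out_encoding e;

    if (xp_mbcs_encoding != NULL) {
        e = classify(xp_mbcs_encoding);
        if (e != OUT_UNKNOWN)
            return e;                      /* the override takes precedence */
    }
    if (lang != NULL && lang_codeset(lang, codeset, sizeof codeset) != NULL) {
        e = classify(codeset);
        if (e != OUT_UNKNOWN)
            return e;
    }
    return OUT_EUC;                        /* unset or unrecognized: default */
}
```

In a program, the two arguments would come from getenv("XP_MBCS_ENCODING") and getenv("LANG").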

Errors

If the locale for the LANG variable cannot be set on the system, AVS/Express defaults to using the C locale, and issues this message:

Warning: cannot set system locale, using C

This means that your system does not have the correct configuration of Motif, X or C libraries to support the requested locale. Consult your operating system release notes for this locale.

If the LANG codeset or XP_MBCS_ENCODING are unrecognized, AVS/Express prints a warning message:

Warning: unrecognized encoding {name}, using EUC

For the list of recognized values, see Output Encoding on page A-9.

There are several runtime errors that can be written by AVS/Express when parsing multi-byte text in various encodings. These relate to corrupted strings: 8-bit values in a 7-bit encoding; escape sequences in an 8-bit encoding; unrecognized 7-bit escape sequences, and so forth. AVS/Express does not test every byte value for validity within the current character set, so it is possible to produce unintelligible text without any error message.

Environment Information

For more information about the LANG variable and the locales used for your session of AVS/Express, set the environment variable XP_LOCALE_DEBUG before running AVS/Express. This will force the LANG variable, system locale, AVS/Express locale and AVS/Express language name to be printed out. Here are some sample results:

Express: LANG is not set
Express: system locale is C
Express: express locale is C Default

Express: LANG is ja_JP.EUC
Express: system locale is ja_JP.EUC
Express: express locale is ja_JP Japanese

Express: LANG is fr_FR
Express: system locale is fr_FR.ISO8859-1
Express: express locale is fr_FR French

Note that the system locale may be different from the LANG variable.

If the XP_DEBUG_LOCALE environment variable is set and the locale is a multi-byte locale, useful information about the encoding variables is printed to the AVS/Express terminal. For example, if the LANG variable is ja_JP.eucJP, these are examples of possible encoding information:

Express: XP_MBCS_ENCODING is not set
Express: LANG codeset is eucJP
Express: express V output encoding is EUC

Express: XP_MBCS_ENCODING is JIS
Express: LANG codeset is eucJP
Express: express V output encoding is JIS

Notice that the XP_MBCS_ENCODING environment variable takes precedence over the LANG codeset.

Supported Locales

Default Locale

The C locale is the default. The language used in the C locale is English. AVS/Express loads the default font for the ISO8859-1 character set.

There are three situations when AVS/Express uses the C locale:

The LANG environment variable is not set. AVS/Express uses C in the dictionary pathname.
The LANG environment variable is set. The system does not recognize the LANG value and the system itself defaults to the C locale. AVS/Express uses C in the dictionary pathname.
The LANG environment variable is set. The system recognizes the LANG value. AVS/Express does not recognize the value, but it truncates the value at the first '.' or '@' and uses the resulting string in the dictionary pathname. AVS/Express assumes that the unrecognized locale is based on ISO8859-1 and loads the default font for that character set.

For example, suppose VE is the territory code for Venezuela. You set the LANG variable to es_VE for Spanish language in Venezuela and your system accepts this value. AVS/Express will load an ISO8859-1 font. It will look for this dictionary pathname under the AVS/Express install directory, or another project directory in $XP_PATH:

runtime/nls/es_VE

You can either create a real subdirectory with that name to contain Spanish translations unique to Venezuela, or just make it a link to es_ES to find generic Spanish dictionaries:

runtime/nls/es_VE -> es_ES

This default mechanism allows AVS/Express to run in unrecognized locales based on Western European languages (ISO8859-1 character set).
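The three fallback cases above, plus the normal case of a recognized locale, can be sketched in C. This is an illustration under stated assumptions: dictionary_dir is a hypothetical function, system_accepts stands for whether the C library accepted LANG (for example, whether setlocale(LC_ALL, "") returned non-NULL), and express_name is the result of the alias lookup, or NULL if there was no match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the relative dictionary directory.
 *   lang            value of LANG, or NULL if unset
 *   system_accepts  nonzero if the system accepted LANG (assumed input)
 *   express_name    AVS/Express locale from the alias lookup, or NULL
 * Writes e.g. "runtime/nls/ja_JP" into out and returns out. */
const char *dictionary_dir(const char *lang, int system_accepts,
                           const char *express_name, char *out, size_t out_size)
{
    const char *name = "C";                /* cases 1 and 2: use C */
    char truncated[64];

    assert(out != NULL && out_size > 0);
    if (lang != NULL && system_accepts) {
        if (express_name != NULL) {
            name = express_name;           /* recognized locale */
        } else {
            size_t n = strcspn(lang, ".@");    /* case 3: truncated LANG */
            if (n >= sizeof truncated)
                n = sizeof truncated - 1;
            memcpy(truncated, lang, n);
            truncated[n] = '\0';
            name = truncated;
        }
    }
    snprintf(out, out_size, "runtime/nls/%s", name);
    return out;
}
```

With LANG set to es_VE, accepted by the system but unknown to AVS/Express, the result is runtime/nls/es_VE, matching the example above.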

Western European Languages

These locales use the ISO8859-1 character set. The system locale is recognized by AVS/Express if it matches the AVS/Express locale name, its language name, or one of a list of other aliases. Codesets and modifiers are ignored. The supported locales are:

Table A-1

AVS/Express locale  Language     Other aliases
C                   english      en en_GB american en_US en_CA en_AU En_GB En_US POSIX
fr_FR               french       fr c-french fr_CH fr_BE fr_CA Fr_FR Fr_CH Fr_BE
de_DE               german       de de_CH de_AT De_DE De_CH
es_ES               spanish      es Es_ES
pt_PT               portuguese   pt Pt_PT pt_BR
it_IT               italian      it It_IT it_CH
nl_NL               dutch        nl nl_BE Nl_NL Nl_BE
da_DK               danish       da Da_DK
sv_SE               swedish      sv Sv_SE
no_NO               norwegian    no No_NO
is_IS               icelandic    is Is_IS
fi_FI               finnish      fi Fi_FI su su_SU

Eastern European Languages

AVS/Express recognizes these Eastern European locales:

Table A-2

AVS/Express locale  Language     Other aliases
pl_PL               polish       pl
cs_CZ               czech        cs
sk_SK               slovak       sk
hu_HU               hungarian    hu

Optional LANG codesets and modifiers are ignored. AVS/Express loads a default font for the ISO8859-2 character set.

Cyrillic, Greek and Turkish

AVS/Express recognizes these additional single-byte locales:

Table A-3

AVS/Express locale  Language     Other aliases
ru_RU               russian      ru
el_GR               greek        el
tr_TR               turkish      tr

Optional LANG codesets and modifiers are ignored. AVS/Express loads a default font for these character sets: ISO8859-5 for Russian; ISO8859-7 for Greek; and ISO8859-9 for Turkish.

Japanese Locale

AVS/Express recognizes these Japanese locales:

Table A-4

AVS/Express locale  Language     Other aliases
ja_JP               japanese     ja Ja_JP

Optional LANG codesets and modifiers are ignored when determining the AVS/Express locale.

The AVS/Express Japanese locale loads default fonts for these character sets:

ISO8859-1
JIS X 0201-1976 (JIS Roman)
JIS X 0208-1983

The choice between ISO8859-1 and JIS X 0201 is left to the platform window system. Usually it will choose a JIS Roman font for single-byte text. The following character sets are not supported:

half-width katakana from JIS X 0201-1976
JIS X 0212-1990

AVS/Express supports all three input and output encodings: EUC, JIS, Shift-JIS. An optional LANG codeset is used to set the AVS/Express output encoding.

These JIS escape sequences are recognized in input:

Table A-5

Escape sequence   Meaning
<ESC>$@           to kanji JIS C 6226-1978
<ESC>$B           to kanji JIS X 0208-1983
<ESC>&@<ESC>$B    to kanji JIS X 0208-1990
<ESC>(B           to ASCII
<ESC>(J           to JIS X 0201-1976 (JIS Roman)
<ESC>(H           to JIS X 0202-1990 (Swedish), treated as JIS Roman
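A minimal C sketch of input handling for one line of 7-bit JIS text: it recognizes the escape sequences in Table A-5 and converts two-byte characters to EUC by setting the high bit of each byte. The function jis_to_euc and its escape table are hypothetical; JIS Roman bytes are copied unchanged, and a line that ends in two-byte mode is rejected, matching the restriction described under Input Encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Input escape sequences from Table A-5 (hypothetical table for this sketch).
 * mode: 1 = two-byte kanji, 0 = single-byte (ASCII or JIS Roman),
 * -1 = no change (the 1990 prefix, which is followed by <ESC>$B). */
static const struct {
    const char *seq;
    int mode;
} jis_escapes[] = {
    { "\x1b$@", 1 }, { "\x1b$B", 1 }, { "\x1b&@", -1 },
    { "\x1b(B", 0 }, { "\x1b(J", 0 }, { "\x1b(H", 0 },
};

/* Hypothetical: convert one line of 7-bit JIS to EUC (sketch). Returns the
 * number of bytes written, or -1 for an unrecognized escape, a byte outside
 * the expected range, or a line that ends in two-byte mode.
 * out needs room for len bytes. */
long jis_to_euc(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    int kanji = 0;                               /* lines start in single-byte mode */

    assert(in != NULL && out != NULL);
    while (i < len) {
        if (in[i] == 0x1B) {
            size_t k, n = sizeof jis_escapes / sizeof jis_escapes[0];
            if (len - i < 3)
                return -1;                       /* truncated escape sequence */
            for (k = 0; k < n; k++)
                if (memcmp(in + i, jis_escapes[k].seq, 3) == 0)
                    break;
            if (k == n)
                return -1;                       /* unrecognized escape sequence */
            if (jis_escapes[k].mode >= 0)
                kanji = jis_escapes[k].mode;
            i += 3;
        } else if (kanji) {
            if (len - i < 2 || in[i] < 0x21 || in[i] > 0x7E ||
                in[i + 1] < 0x21 || in[i + 1] > 0x7E)
                return -1;
            out[o++] = (unsigned char)(in[i++] | 0x80);   /* EUC byte = JIS byte | 0x80 */
            out[o++] = (unsigned char)(in[i++] | 0x80);
        } else {
            if (in[i] >= 0x80)
                return -1;                       /* 8-bit byte in 7-bit text */
            out[o++] = in[i++];
        }
    }
    return kanji ? -1 : (long)o;
}
```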

The escape sequences written on output are:

Table A-6

Escape sequence   Meaning
<ESC>$B           to kanji JIS X 0208-1983
<ESC>(J           to JIS X 0201-1976 (JIS Roman)
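The reverse direction, writing EUC text as 7-bit JIS with the output escape sequences of Table A-6, can be sketched in C. The function euc_to_jis is hypothetical; half-width katakana (SS2) and JIS X 0212 (SS3) characters are rejected because they are not supported in this locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: convert EUC text to one line of 7-bit JIS using the output
 * escape sequences of Table A-6 (sketch). Only ASCII and the two-byte
 * JIS X 0208 set are handled; SS2/SS3 characters return -1, as do truncated
 * or invalid bytes. out needs room for 3 * len + 3 bytes in the worst case. */
long euc_to_jis(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    int kanji = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        if (in[i] < 0x80) {
            if (kanji) {
                memcpy(out + o, "\x1b(J", 3);    /* to JIS Roman */
                o += 3;
                kanji = 0;
            }
            out[o++] = in[i++];
        } else {
            if (in[i] < 0xA1 || in[i] > 0xFE || len - i < 2 ||
                in[i + 1] < 0xA1 || in[i + 1] > 0xFE)
                return -1;
            if (!kanji) {
                memcpy(out + o, "\x1b$B", 3);    /* to kanji JIS X 0208-1983 */
                o += 3;
                kanji = 1;
            }
            out[o++] = (unsigned char)(in[i++] & 0x7F);  /* clear the high bit */
            out[o++] = (unsigned char)(in[i++] & 0x7F);
        }
    }
    if (kanji) {                                 /* end the line in single-byte mode */
        memcpy(out + o, "\x1b(J", 3);
        o += 3;
    }
    return (long)o;
}
```

Converting the EUC bytes for "nihongo" yields <ESC>$B, then F|K\8l, then <ESC>(J: the same JIS text discussed under Hexadecimal Format, including the backslash that makes hex format unusable there.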

In AVS/Express, Japanese text is displayed from left to right, in rows from top to bottom, the same as English.

Korean Locale

AVS/Express recognizes these Korean locales:

Table A-7

AVS/Express locale  Language     Other aliases
ko_KR               korean       ko kr

Optional LANG codesets and modifiers are ignored when determining the AVS/Express locale.

The AVS/Express Korean locale loads default fonts for these character sets:

ISO8859-1
KS C 5601-1987

The AVS/Express Korean locale supports EUC and 7-bit encodings for input and output. There is no Shift-JIS encoding for Korean. The LANG codeset is used to determine the AVS/Express output encoding.

These 7-bit escape sequences are recognized in input and written in output:

Table A-8

Escape sequence   Meaning
<ESC>$@(C         to KS C 5601-1992
<ESC>(B           to ASCII

In AVS/Express, Korean text is displayed from left to right, in rows from top to bottom, the same as English.

North Korea has abolished the use of borrowed Chinese characters (hanja); they are passing out of use in South Korea.

Simplified Chinese Locale

In 1956 the People's Republic of China (PRC) simplified the traditional Chinese characters in an effort to improve literacy. The traditional forms are still widely used outside the PRC: for Chinese in Taiwan, Hong Kong and Singapore; for Japanese in Japan (kanji); and for Korean in South Korea (hanja).

AVS/Express recognizes these Simplified Chinese locales:

Table A-9

AVS/Express locale  Language     Other aliases
zh_CN               chinese-s    zh_HK.[codeset]

Optional LANG modifiers are ignored when determining the AVS/Express locale.

The codeset is significant in determining the locale for Hong Kong. If the territory name is HK and the codeset is either absent, or one of a recognized set of simplified codeset aliases, then AVS/Express selects the Simplified Chinese locale. The recognized simplified codesets for Hong Kong are:

dechanzi
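The Hong Kong rule can be expressed as a small C predicate. This is a sketch covering only the aliases in Table A-9 and the codeset list above; is_simplified_chinese is a hypothetical name, and what AVS/Express does with other Hong Kong codesets is not described here.

```c
#include <assert.h>
#include <string.h>

/* Simplified codesets recognized for Hong Kong (from the list in the text). */
static const char *const hk_simplified_codesets[] = { "dechanzi" };

/* Hypothetical: return 1 if a LANG value selects the Simplified Chinese
 * locale zh_CN (sketch covering only the zh_CN, chinese-s and zh_HK rules). */
int is_simplified_chinese(const char *lang)
{
    size_t base, n;
    const char *codeset;

    assert(lang != NULL);
    base = strcspn(lang, ".@");               /* name without codeset/modifier */
    if ((base == 5 && strncmp(lang, "zh_CN", 5) == 0) ||
        (base == 9 && strncmp(lang, "chinese-s", 9) == 0))
        return 1;
    if (base != 5 || strncmp(lang, "zh_HK", 5) != 0)
        return 0;
    if (lang[5] != '.')
        return 1;                             /* Hong Kong with no codeset */
    codeset = lang + 6;
    n = strcspn(codeset, "@");                /* ignore any modifier */
    for (size_t i = 0; i < sizeof hk_simplified_codesets / sizeof hk_simplified_codesets[0]; i++)
        if (strlen(hk_simplified_codesets[i]) == n &&
            strncmp(codeset, hk_simplified_codesets[i], n) == 0)
            return 1;
    return 0;                                 /* some other Hong Kong codeset */
}
```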

The AVS/Express Simplified Chinese locale loads default fonts for these character sets:

ISO8859-1
GB 1988-1980 (GB Roman)
GB 2312-1980

It is not an error if a default font is not found for GB Roman. If fonts are found for both ISO8859-1 and GB 1988-1980, then the choice of single-byte character set is left to the window system. Usually it will choose a GB Roman font when available.

The AVS/Express Simplified Chinese locale supports EUC and 7-bit encodings for input and output. There is no Shift-JIS encoding for Simplified Chinese. An optional LANG codeset is used to determine the AVS/Express output encoding.

These 7-bit escape sequences are recognized in input:

Table A-10

Escape sequence   Meaning
<ESC>$@(A         to GB 2312-1980
<ESC>(B           to ASCII
<ESC>(T           to GB 1988-1980 (GB Roman)

These 7-bit escape sequences are written in output:

Table A-11

Escape sequence   Meaning
<ESC>$@(A         to GB 2312-1980
<ESC>(T           to GB 1988-1980 (GB Roman)

In AVS/Express, Simplified Chinese text is displayed from left to right, in rows from top to bottom, the same as English.

A.4 Text Processing

Pathways

Figure A-1

AVS/Express enters, displays, and writes text in many ways. You must consider the following for worldwide language support in your application:

File input/output (possibly created with a local editor)

V files
Dictionaries

VCP terminal input/output (possibly with localized terminal window)

Command line prompt and echo
V command input
Errors, warnings and messages output

User Interface and Network Editor input/display

Direct text display in Network Editor icons
Text display in User Interface widgets
Window manager decoration
Text input to dialogs and type-in widgets

Graphics display

2D text
3D text in software renderer
3D text in hardware renderers

Documentation

On-line help
Written manuals

The V language is based on ASCII characters and the English language. Many components of the V and VCP streams will not change across locales.

Three pathways are not supported for international use:

VCP errors, warnings and messages

A system of message catalogs has not yet been developed for AVS/Express.

3D text in hardware renderers (PEX and XGL)

These 3D graphics APIs do not support international 3D text.

On-line help system and written documentation

The help system used by AVS/Express is Bristol Hyperhelp. European and Japanese versions of this product are available to you for integrating your local language help into AVS/Express applications.

The remaining pathways are supported for enabled locales.

Local language input to the User Interface and Network Editor is managed by the windowing system. AVS/Express expects to receive properly formed local strings from dialog and typein widgets, possibly via an FEP/IME.

Text display for the User Interface, Network Editor, 2D Graphics Display, and 3D software renderer is accomplished using the facilities of the underlying window system. Local language titles are rendered in window decoration by the local window manager.

The OpenGL renderer does support international 3D text on UNIX platforms. It borrows X Window fonts and renders 3D text as Z-buffered bitmapped images.

The Object Manager can read local language strings from V files, VCP terminal and dictionaries. In multi-byte locales, input and output can be in any appropriate encoding: EUC, 7-bit (JIS) and Shift-JIS (Microsoft Kanji). See Locales on page A-6 for more details.

The Object Manager is the hub of string processing in AVS/Express; most of the enabled pathways for local language strings radiate from the Object Manager. The next section explains how strings are defined in V and manipulated within the Object Manager, concentrating on those aspects important for worldwide language support. In a following section, the interfaces for writing V output are described.

Strings in V

There are three basic text items within the Object Manager:

Object basenames; for example, icon name and workspace pathname in the Network Editor
String object values; for example, titles, messages and filenames in the User Interface
Properties; for example, build directory or source file as displayed by the Properties Editor.

The AVS/Express default language is English. The V language uses printable ASCII for its syntax, including all keywords, delimiters, and object basenames. V string literals are enclosed in double quotes and can contain characters that are not printable ASCII.

Consider this V fragment declaring string and cmethod objects:

string message = "Connect two objects";

cmethod update <src_file = "update.c">;

The object basenames are message and update; the string object value is initialized to "Connect two objects"; the cmethod object has the src_file property set with string value "update.c".

Since object basenames are part of the V language syntax, they must use the ASCII character set according to rules for identifiers in V. Properties are implemented as string objects within the Object Manager, so the behavior for strings applies to properties as well. Filenames can occur in properties or strings; they can be local strings when the host file system supports local pathnames.

Some property strings are taken from a small set of predefined string values. These enumerated string values should not be translated or set with local language strings. For example, the property NEdisplayMode can take only the string values "NEopened", "NEclosed" or "NEmaximized".

String objects can get their value from several sources:

Assignment or reference to other string objects
Expressions and operators that produce a string result, including the name_of operator in V, which returns an object basename string
Literal value in a V file (possibly written with a local language editor), or at the VCP (possibly running in a local language terminal)
Literal input from the User Interface or Network Editor (possibly running in a windowing system that supports local input methods)

String literals are lists of bytes between double quotes. The bytes can take any value, so they can represent characters from any character set.

Hexadecimal Format

String values can be set directly with characters, or encoded in the ANSI "C" hexadecimal format. For example, <ESC> is a non-printable ASCII character whose value is decimal 27, hexadecimal 0x1b. The escape character looks like this in ANSI "C" hex format: \x1b

Successive bytes can be concatenated with this representation. For example, the Japanese EUC encoding uses a pair of bytes for each character and both bytes have their most significant bit set. A Japanese kanji string object for "nihongo", which means "Japanese", could be initialized in hex format:

string nihongo = "\xc6\xfc\xcb\xdc\xb8\xec";

The text could be entered explicitly in a V file with a Japanese editor or at the VCP prompt in a Japanese terminal running AVS/Express:

Figure A-2

In either case, this is how the string object would appear in an application workspace of the Network Editor, opened, ready for editing the string value:

Figure A-3
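The hexadecimal and raw forms of the nihongo string describe the same six bytes. The C sketch below expands \xNN escapes into raw bytes to show that equivalence. It is simplified: ANSI C lets a hex escape consume any number of following hex digits, while this sketch stops after two, which is enough for one byte. decode_hex_escapes is a hypothetical name, not an AVS/Express function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: decode the body of a string literal, expanding "\xNN"
 * escapes into raw bytes (sketch). Takes at most two hex digits per escape;
 * other backslash sequences are copied through unchanged. Returns the
 * number of bytes written; out must hold at least strlen(in) bytes. */
size_t decode_hex_escapes(const char *in, unsigned char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == 'x' && isxdigit((unsigned char)in[2])) {
            unsigned value = 0;
            int digits = 0;

            in += 2;                          /* skip the backslash and 'x' */
            while (digits < 2 && isxdigit((unsigned char)*in)) {
                int c = tolower((unsigned char)*in);
                value = value * 16 + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
                in++;
                digits++;
            }
            out[o++] = (unsigned char)value;
        } else {
            out[o++] = (unsigned char)*in++;
        }
    }
    return o;
}
```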

There is a restriction on hexadecimal format for the 7-bit (JIS) encoding in multi-byte locales: hexadecimal format cannot be used in multi-byte substrings or in escape sequences. For example, the JIS encoding for "Japanese" has two escape sequences: <ESC>$B and <ESC>(J. The hexadecimal format for <ESC> is \x1b, and the kanji text, F|K\8l, contains a backslash. This combination cannot be parsed correctly if hex format is used for the <ESC> characters, so the raw byte value must be used in the string. This is invalid:

string nihongo = "\x1b$B F|K\8l \x1b(J"; /* wrong */

V Output

The function that writes V files is:

OMsave_obj

All V syntax is printable ASCII. The default behavior for writing string literals is to use ANSI "C" hexadecimal format for all characters that are not printable ASCII. If the V file is read back into AVS/Express, the strings are restored to their original form, but if the V file is viewed in a local text editor, the hexadecimal representation is displayed instead of the original characters. This makes all MBCS strings illegible and garbles accented characters in European language strings.

If you are displaying in a Japanese terminal that uses JIS Roman as its single-byte character set, not only will the strings be displayed in hexadecimal format, but the ASCII backslash will be displayed as a JIS Roman Yen symbol, making the output even more obscure.

To save V strings with raw byte values for characters that are not printable ASCII, pass the flag OM_SAVE_8BIT in the mode argument of the OMsave_obj function. This will allow local language strings to be saved intact, so that they can be viewed, and possibly changed, in a local text editor.

Commands entered at the VCP, such as $save or $print, which result in V being written to a file or to the VCP terminal, use the OM_SAVE_8BIT flag. If you are running AVS/Express in a local terminal and your V code contains local string literals, then the V output from $print will display legible local strings.

In the Network Editor, the pulldown menu options Save Application and Save Objects will write V files with the OM_SAVE_8BIT flag set on the call to OMsave_obj. They are safe to use with local strings in European and Asian locales.

The encoding used for V output is determined by the global MBCS output encoding. See Output Encoding on page A-9.

A.5 Localization

String objects and object properties can hold local strings. Raw local string values are displayed on components of the User Interface. Object basenames must be in ASCII English, but they can be assigned local aliases with the "user_name" property. The object username is displayed on the object's icon in the Network Editor. You can develop, save, and run applications in your local language. These applications can be used only in the same AVS/Express locale as the original development. Porting this local application to another language requires a new set of local V files, or a revision for internationalization.

Username Property

A username is a display alias for an object. The username is used in preference to the basename for displaying and interacting with the object in the Network Editor. The username is a property, so it is saved and restored with the object definition in V. You can define a username for any type of object; for example:

macro Viewer3D <user_name="3DViewer"> { ...

The Object Manager will still use the basename to refer to the object, so the username is not restricted by the usual rules governing object identifiers: it does not have to be unique, it can contain punctuation and delimiters, it can start with a numeral (as in the example), and it does not even have to be in ASCII. The username is just a string, so it can have any format allowed for string literals in V, including ANSI "C" hexadecimal format, extended ASCII or multi-byte characters.

The name alias applies only in the Network Editor. The VCP interface will still use the basename. Navigation and reference at the VCP prompt require the basename, and the output from inquiries such as the $list command will return the basename.

The username itself is not part of the worldwide language support mechanism, but it will be translated just like any other string. It is not advisable to use translation on a username that is already in a local language. The main problem with this double-translation is for Asian languages where encodings must agree in the string and the dictionary key. For more information about dictionaries and translation strategies, see Internationalization on page A-26.

Object Manager interface

The Object Manager provides functions to access the username property:

OMget_obj_user_name, OMret_obj_user_name

They are wrappers for the underlying property inquiry routine:

OMget_obj_sprop( ... OM_prop_user_name, ... );

Network Editor interface

The username is displayed on the object as it appears in the Network Editor: on the icon in a library palette, on the icon in the workspace, or as the title of an open or maximized object in the workspace.

You can set the user_name property from the Properties Editor in the object icon's popup menu.

The object Rename operation changes the basename, not the username. The renamed object cannot have a local language basename.

Localized Projects

A project is localized when it has locale-specific V files. These are files containing local language strings in object username properties or string object values. In general, it is not possible to internationalize these strings using the dictionary mechanism, so the V files must be edited by hand to port the application to a new locale. This is time consuming and error prone, and it creates a shadow set of V files for development, distribution and maintenance. This is not the recommended method of implementing internationalized applications. An application should be localized only when it is not intended to be run in another locale.

The benefit of localization is that development can take place in the local language. You can access all of the Network Editor and User Interface features in your local language, including text typeins and dialogs. There are two important development processes that cause local language strings to be written in the V files for the application, either by editing the V files directly in a local text editor, or through the Network Editor's visual programming interface:

You can customize object names with local language usernames, perhaps using the Properties Editor.
You can specify string values in the local language, perhaps entered at an Edit Value typein.

Other transient actions relating to the use of AVS/Express do not write V files and so cannot tie the application to the current locale. The localized V files will be written at any subsequent "Save Application" or "Save Objects". The locale name is not saved with the application. It is your responsibility to run a localized application in the correct locale.

Examples

These examples of localized objects are given for the Japanese locale.

Local Object Name

The example object is the same as the one used in the Internationalization examples: an integer called kanji initialized to 1. You localize this object's name by adding a local alias in the username property. The alias can be added directly in V, either in a V file or at the VCP prompt, with the local username in ANSI "C" hexadecimal format:

int kanji <user_name="\xb4\xc1\xbb\xfa"> = 1;

or explicitly in Japanese, using a Japanese editor on a V file or at the VCP prompt when running AVS/Express in a Japanese terminal:

Figure A-1

The username can also be added using the Properties Editor:

Figure A-2

Note that the dialog user interface has been translated by the AVS/Express dictionaries.

The kanji object would now appear like this in the Network Editor:

Figure A-3

Local String Object Value

The example object is a UIlabel with a local value for the label string subobject. Local string values can be entered in V, either in a V file or at the VCP prompt. The string can be given in hex format for an EUC encoding:

UIlabel UIlabel {
label = "\xc6\xfc\xcb\xdc\xb8\xec";
};

or explicitly in Japanese using a local editor or a local VCP terminal:

Figure A-4

The string can also be entered directly at an Edit Value typein in the Network Editor:

Figure A-5

To get local versions of "label" and "UIlabel", the username property must be defined for both objects. See the previous example.

A.6 Internationalization

Using the translation utility implemented in AVS/Express, object names and string object values can be translated from English to a local language, or even from one English form to another. Translations are loaded from dictionary files whose names are specified by a "dictionary" property. The dictionary files are read from a directory determined by the current locale. You can customize system dictionaries and supply application-specific dictionaries for your project. Translations are performed automatically by the Object Manager before returning strings to the application. The User Interface and Network Editor use these facilities to provide an internationalized interface.

Dictionary Property

The Object Manager loads a dictionary into a translation table when it encounters an object with the dictionary property set. The dictionary property's value is a file name. For example:

library Mappers <dictionary="mappers.dct"> { ...

The Object Manager loads the dictionary from a file with this relative pathname:

runtime/nls/<locale_name>/<dictionary_file>

So, if the current locale were ja_JP, for example, the dictionary would be loaded from:

runtime/nls/ja_JP/mappers.dct

The pathname is searched relative to the current list of project directories, as determined by the XP_PATH environment variable. This means that the application project directories are searched before the AVS/Express install directory, which is typically at the end of the list. If the file is found, it is loaded into a dynamic translation table for use by the Object Manager. If the file is not found, it is not an error and no warning is issued.
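The search can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the Object Manager's implementation: the project directories are passed as an array in XP_PATH order rather than parsed from the environment variable, and the existence test is supplied by the caller so the search can be exercised without a real file system. find_dictionary and file_exists are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

/* Caller-supplied existence test, e.g. file_exists below. */
typedef int (*exists_fn)(const char *path);

/* Hypothetical helper: search project directories in order for
 * <dir>/runtime/nls/<locale>/<file>. Returns 1 and fills path on the
 * first hit; returns 0 if the dictionary is not found (not an error). */
int find_dictionary(const char *const dirs[], size_t ndirs,
                    const char *locale, const char *file,
                    exists_fn exists, char *path, size_t size)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(path, size, "%s/runtime/nls/%s/%s",
                         dirs[i], locale, file);
        if (n < 0 || (size_t)n >= size)
            continue;                 /* path would be truncated: skip */
        if (exists(path))
            return 1;
    }
    return 0;
}

/* An existence check using only the C standard library. */
int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

In this sketch, a dictionary in an application project directory shadows one of the same name under the AVS/Express install directory, because the install directory comes last in the list.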

There is no default dictionary. If no dictionary properties are set, then there will be no translations. In fact, AVS/Express does set dictionary properties on some system libraries. The corresponding dictionary files can be found in runtime/nls/<locale_name> under the AVS/Express install directory. For example, the Network Editor uses the dictionaries ne.dct and root.dct.

When multiple dictionaries have been loaded, any translation request will use all the translation tables to resolve the string. Tables are searched in the order they were loaded.
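A minimal C sketch of that search order follows, assuming each loaded dictionary is held as a simple array of English/local pairs (the actual Object Manager data structures are not described here). The first table, in load order, that contains the string supplies the translation. translate and the two structure types are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory translation table. */
typedef struct {
    const char *english;
    const char *local;
} dict_entry;

typedef struct {
    const dict_entry *entries;
    size_t count;
} dict_table;

/* Search tables in load order; the first match wins.
 * Returns NULL when no table has the string, in which case the
 * caller displays the original (untranslated) text. */
const char *translate(const dict_table *tables, size_t ntables,
                      const char *english)
{
    for (size_t t = 0; t < ntables; t++) {
        for (size_t i = 0; i < tables[t].count; i++) {
            if (strcmp(tables[t].entries[i].english, english) == 0)
                return tables[t].entries[i].local;
        }
    }
    return NULL;
}
```

Returning NULL for a miss lets the caller fall back to the raw English string, which is how an untranslated string is displayed.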

Dictionary Files

Dictionary files are composed of comments and translation entries. Comments have '#' as the first character:

# optional comment text

Comments are ignored during processing. Other instances of '#' embedded in the line do not qualify the line as a comment. Text to the right of the embedded '#' is not ignored.

Translation entries are loaded into a dynamic translation table:

English text = local text

The English text is everything between the start of the line and the '=', except surrounding white space. The local text translation is everything between the '=' and the end of the line, except surrounding white space. If the line does not match this format, or if either text field is empty, AVS/Express ignores the entry.
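The line format can be sketched as a small C parser. It is a hypothetical helper, not part of AVS/Express, and it follows the rules above: only a '#' in the first column marks a comment, white space around each field is trimmed, and an entry with an empty field is ignored. Splitting at the first '=' is an assumption, since the text does not say how a line with several '=' characters is handled, and decoding of V string syntax such as \xNN escapes in the local text is not shown.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_COMMENT, LINE_ENTRY, LINE_IGNORED };

/* Trim leading and trailing white space in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Hypothetical parser for one dictionary line (modified in place).
 * Only a '#' in the first column makes a comment; the entry is split
 * at the first '=' (an assumption), and empty fields are ignored. */
enum line_kind parse_dict_line(char *line, char **english, char **local)
{
    char *eq;

    if (line[0] == '#')
        return LINE_COMMENT;
    eq = strchr(line, '=');
    if (eq == NULL)
        return LINE_IGNORED;
    *eq = '\0';
    *english = trim(line);
    *local = trim(eq + 1);
    if (**english == '\0' || **local == '\0')
        return LINE_IGNORED;
    return LINE_ENTRY;
}
```

Because only the first column is checked for '#', a line such as Read_Field#1 = ... is parsed as an entry, not a comment.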

Local text can have any syntax acceptable to the V parser; for example, it could be specified in ANSI "C" hexadecimal format. Entries must have unique English text: AVS/Express ignores subsequent occurrences of an entry with the same English text. Translations of different English text to the same local text are allowed. For example, translation of English text is case sensitive, but languages written with syllabic or ideographic characters, such as Japanese, do not distinguish case, so "Red" and "red" should get the same Japanese translation.
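The duplicate and case rules can be illustrated with a fixed-size table in C. This is a hypothetical sketch: dict, dict_add and dict_get are illustrative names, the table size is arbitrary, and the romanized "aka" stands in for the Japanese translation of "Red".

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Hypothetical fixed-size translation table. */
typedef struct {
    const char *english[MAX_ENTRIES];
    const char *local[MAX_ENTRIES];
    size_t count;
} dict;

/* Add an entry unless the same English text (compared case-sensitively)
 * is already present: later duplicates are ignored. Returns 1 if added. */
int dict_add(dict *d, const char *english, const char *local)
{
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->english[i], english) == 0)
            return 0;                  /* keep the first occurrence */
    }
    if (d->count == MAX_ENTRIES)
        return 0;                      /* table full */
    d->english[d->count] = english;
    d->local[d->count] = local;
    d->count++;
    return 1;
}

/* Case-sensitive lookup; returns NULL if there is no entry. */
const char *dict_get(const dict *d, const char *english)
{
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->english[i], english) == 0)
            return d->local[i];
    }
    return NULL;
}
```

Adding "Red" and "red" with the same local text creates two entries, while a second "Red" line is ignored.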

Embedded underscores, '_', are usually removed from object names before they are displayed. For example, the following V library definition will be labelled "Workspace 1" on its icon in the Network Editor:

library Workspace_1 { ...

To translate the name, the full object basename, with embedded underscores, is required in the dictionary:

# match
Workspace_1 = Zone de travail 1
# no match
Workspace 1 = Zone de travail 1
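A minimal C sketch of how an icon label could be derived under these rules: the dictionary is queried with the full basename, and only when no translation exists is the basename shown with its underscores replaced by spaces. The helper icon_label and the callback type lookup_fn are hypothetical, and the sketch assumes the translated text is displayed exactly as written in the dictionary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns the translation or NULL. */
typedef const char *(*lookup_fn)(const char *english);

/* Build an icon label: look up the full basename (underscores included);
 * if there is no translation, show the basename with each '_' replaced
 * by a space. Writes at most size bytes including the terminator. */
void icon_label(const char *basename, lookup_fn lookup,
                char *out, size_t size)
{
    const char *t = lookup(basename);
    size_t i;

    if (size == 0)
        return;
    if (t != NULL) {
        strncpy(out, t, size - 1);
        out[size - 1] = '\0';
        return;
    }
    for (i = 0; basename[i] != '\0' && i + 1 < size; i++)
        out[i] = (basename[i] == '_') ? ' ' : basename[i];
    out[i] = '\0';
}
```

Looking up the displayed form "Workspace 1" instead of the basename would never match, which is why the dictionary key has to keep the underscore.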

The Object Manager will automatically rename repeated instances of the same object template by adding an instance number. For example, repeated instances of Read_Field will be called Read_Field, Read_Field#1, Read_Field#2. These names must be translated independently: each name must have a separate entry in the dictionary.
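The naming pattern can be sketched in C. instance_name is a hypothetical helper that only reproduces the pattern shown above; it does not query the Object Manager.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper reproducing the naming pattern described above:
 * instance 0 keeps the template name, instance n >= 1 gets "#n".
 * Returns the snprintf result (length that would have been written). */
int instance_name(const char *base, unsigned n, char *out, size_t size)
{
    if (n == 0)
        return snprintf(out, size, "%s", base);
    return snprintf(out, size, "%s#%u", base, n);
}
```

A French dictionary covering three instances would therefore need three lines, for example Read_Field = Lire champ, Read_Field#1 = Lire champ 1 and Read_Field#2 = Lire champ 2. The embedded '#' does not turn these lines into comments, because only a '#' in the first column does.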

Object Manager Interface

The Object Manager always tries to translate object basenames and string object values when they are accessed with these functions:

OMget_str_val, OMget_str_array_val

If there is a valid dictionary with a match for the string, they return the translation. There is a function similar to OMget_str_val which takes an additional mode argument:

OMget_str_val_mode

To suppress translation of the returned string value, pass the flag OM_STR_NO_I18N in the mode argument.

There are two low-level functions for the Translation Manager utility:

TMload_dictionary, TMget_translation

The dictionary mechanism is read-only: dictionaries can be read from a file and an English string looked up in the table.

Network Editor Interface

The Network Editor uses the system dictionary ne.dct to customize its menu and dialog user interfaces. It uses standard Object Manager inquiries to retrieve translated basenames and string values for the display of object icons, library palettes, and object properties. The OM_STR_NO_I18N flag is not set and strings are always translated if possible.

You can set the dictionary property on library objects using the Properties Editor in the library icon's popup menu system. The restriction to libraries is a matter of style: libraries are a useful level at which to collect translations. The dictionary property is not accessible through the Properties Editor on other object types. This does not affect your ability to set the property explicitly, either at the VCP prompt or by editing a V file.

Internationalized Projects

AVS/Express is internationalized: the same executable and the same suite of V files can run in any supported locale. The V files contain English strings and translation is effected at runtime by using dictionary files. Developers who want their application to run in more than one locale should follow the model of developing their applications first in English, then internationalizing later.

The advantages of internationalizing through dictionaries are:

There is only one set of V files to develop, distribute and maintain.
Application development is separate from language translation:

Translators do not need to be application developers.
Programmers do not need to be linguists.
Translation is a distinct stage of the project lifecycle.

The translators need an appreciation of the application in order to devise correct and helpful translations, but they need not know anything about programming or AVS/Express in order to write the dictionaries.

There are some aspects of development which affect usability in other languages. For example:

Pixmaps can be used on object icons in the Network Editor and in components of the User Interface. These should be comprehensible across all cultures. Never use text characters in a pixmap. Other images to avoid include road signs, hand gestures, religious symbols, sporting allusions, and visual representations of popular phrases or verbal associations.
The interpretation of colors can have cultural biases. Color can be used to reinforce differentiation of user interface components, but the color itself should not be the only indicator of function. This guideline also helps color-blind users.
Translations of English tend to be longer than the original English text. A typical German translation will be twice as long as the English. String truncation in the user interface is an important consideration.
Abbreviations do not exist in languages written with ideographic or syllabic characters. An English abbreviation must either be left untranslated, or the full local text must be given in the translation.
Characters in MBCS fonts are twice the width of characters in European language fonts, so string truncation is particularly likely for Japanese, Korean, and Chinese language translations.
There can be an ambiguity in using the same English word with the same translation in different contexts. This is discussed further below.

There are two basic schemes for using dictionaries:

original text entries
keyword entries

Original Text Entries

Use raw English strings and object basenames that you want to appear in the User Interface and Network Editor.

Set the dictionary property on the application library.
Do not provide a dictionary under the default C locale.
Provide dictionaries for each non-default locale in which you want to run your application.

The advantages of this scheme are:

no extra work customizing V files and the default C locale dictionary
no runtime performance penalty for translation in the C locale

Keyword Entries

Use English keywords in strings and object basenames; the keywords themselves are not intended to appear in the User Interface or Network Editor.

Set the dictionary property on the application library.
Provide dictionaries for all locales, including keyword-to-English translations in a C locale dictionary.

There are two ways to generate the keyword: invent a totally artificial code word, or just add a modifier to the original English text.

The advantage of using keyword entries is that it resolves ambiguities between English words in different contexts. Simply use different keywords in V, with the same English translations in the C locale dictionary, but with different translations in the locale where the ambiguity arises.
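A minimal C sketch of the keyword scheme uses two hypothetical keywords, ORDER_SORT and ORDER_PURCHASE. Both translate to "Order" in the C locale, but a French dictionary can give them different translations. The tables are in-memory stand-ins for dictionary files, and kw_translate is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *key, *text; } kw_entry;

/* Hypothetical keyword dictionaries. Two keywords share one English
 * translation in the C locale but get different French translations. */
static const kw_entry dict_C[] = {
    { "ORDER_SORT",     "Order" },
    { "ORDER_PURCHASE", "Order" },
};
static const kw_entry dict_fr_FR[] = {
    { "ORDER_SORT",     "Ordre" },
    { "ORDER_PURCHASE", "Commande" },
};

/* Return the translation of key, or the key itself if there is none
 * (an untranslated keyword is shown in the interface as-is). */
const char *kw_translate(const kw_entry *d, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(d[i].key, key) == 0)
            return d[i].text;
    }
    return key;
}
```

Without a C locale dictionary the raw keyword would be displayed, which is why this scheme needs a dictionary for every locale, including C.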

The dictionary used by the Network Editor uses keyword entries. For example, the first pulldown menu option is labelled NE_MENU_FILE, which is translated to "&File" in the dictionary runtime/nls/C/ne.dct (the "&" signifies the accelerator key for Motif).
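The accelerator marker can be separated from the label text with a small C helper such as the hypothetical split_mnemonic below. It treats the character after the first '&' as the mnemonic and removes the marker from the displayed text; how AVS/Express itself processes the marker is not described here.

```c
#include <stddef.h>

/* Hypothetical helper: split a label such as "&File" into the display
 * text "File" and the mnemonic 'F' (the character after the first '&').
 * Returns the mnemonic, or '\0' if the label has no '&' marker. */
char split_mnemonic(const char *label, char *out, size_t size)
{
    char mnemonic = '\0';
    size_t o = 0;

    if (size == 0)
        return '\0';
    for (const char *p = label; *p != '\0' && o + 1 < size; p++) {
        if (*p == '&' && mnemonic == '\0' && p[1] != '\0') {
            mnemonic = p[1];   /* character after the first '&' */
            continue;          /* drop the marker itself */
        }
        out[o++] = *p;
    }
    out[o] = '\0';
    return mnemonic;
}
```

For "&File" the sketch yields the text File and the mnemonic F. In Motif, a mnemonic is normally attached to a label or button through the XmNmnemonic resource.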

Internationalizing a Localized Project

The correct way to internationalize a localized project is:

1. Create a dictionary for the development locale.

2. Add the dictionary property to the appropriate application library.

3. Remove all local language username properties from the V files, and add a dictionary entry for each one as a translation of the basename string.

4. Extract all local language string object values from the V files, replace them with English equivalents, and add an entry in the dictionary for each of these translations.

5. Write dictionary files for other locales.

The main barrier to developing internationalized applications in the local language is that there are no direct interfaces from the Network Editor to translation tables, and no project mechanism for saving translations to dictionary files. The dictionary mechanism is read-only.

The Japanese objects shown in Examples on page A-24 could be internationalized by following these steps:

1. Create a dictionary file, say example.dct, for the Japanese locale and put it in directory runtime/nls/ja_JP.

2. Add a dictionary property, with value "example.dct", to the appropriate application library, or perhaps explicitly for the kanji or UIlabel objects.

3. Remove the username property from the kanji object and add the Japanese translation for "kanji" to the runtime/nls/ja_JP/example.dct dictionary file.

4. Change the value of the label subobject to be "Japanese" and add the Japanese translation for "Japanese" to the runtime/nls/ja_JP/example.dct dictionary file.

5. Write dictionary files for other locales. They will all be named example.dct, but each will be located in the appropriate locale subdirectory under runtime/nls.
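When dictionary entries are written in steps 3 and 4, the local text can be stored in ANSI "C" hex format so that the file stays printable ASCII. The hypothetical C helper below formats one entry that way. It assumes the dictionary reader follows the ANSI "C" rule that a hex escape absorbs every hex digit after it, so an ASCII hex digit directly after an escape is escaped as well, and a literal backslash is always escaped.

```c
#include <stdio.h>
#include <stddef.h>
#include <ctype.h>

/* Hypothetical helper: format one dictionary line "English = local",
 * writing local bytes outside printable ASCII (and '\') as \xNN.
 * An ASCII hex digit directly after an escape is escaped too, because
 * ANSI C hex escapes consume all following hex digits.
 * Returns the length written, or -1 if out is too small. */
int format_dict_entry(const char *english, const unsigned char *local,
                      size_t n, char *out, size_t size)
{
    int len = snprintf(out, size, "%s = ", english);
    int prev_escaped = 0;

    if (len < 0 || (size_t)len >= size)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = local[i];
        int need_escape = c < 0x20 || c > 0x7e || c == '\\' ||
                          (prev_escaped && isxdigit(c));
        int w = need_escape
            ? snprintf(out + len, size - (size_t)len, "\\x%02x", (unsigned)c)
            : snprintf(out + len, size - (size_t)len, "%c", c);
        if (w < 0 || (size_t)w >= size - (size_t)len)
            return -1;
        len += w;
        prev_escaped = need_escape;
    }
    return len;
}
```

For the kanji entry the helper produces kanji = \xb4\xc1\xbb\xfa, the same line as the hex example in the Examples that follow.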

Examples

These examples are given for the Japanese locale. The dictionary property would normally be defined for the application library to which the sample object belongs, but to show a compact example, the dictionary is defined for the object itself.

Translation of Object Basename

The example object is called kanji, which is the Japanese name for Chinese characters used in written Japanese. The Japanese word for "kanji" consists of two kanji. The kanji object is an integer initialized with value 1.

V definition:

int kanji <dictionary="kanji.dct"> = 1;

A translation for "kanji" is defined in the dictionary file runtime/nls/ja_JP/kanji.dct. The translation can be represented in ANSI "C" hex format:

kanji = \xb4\xc1\xbb\xfa

or it can be written explicitly in Japanese with a Japanese editor:

Figure A-1

The corresponding object icon for kanji in the Network Editor:

Figure A-2

Numerical values, like "1", cannot be translated for integer objects. They could be translated as the value of a string object. The dictionary entries would be:

Figure A-3

Translation of String Object Value

The example string is the label subobject of a UIlabel. The UIlabel will be defined with a dictionary property. The label subobject value will be initialized to the word "Japanese" and there will be a translation for "Japanese" in the corresponding dictionary file.

V definition:

UIlabel UIlabel <dictionary="label.dct"> {
label = "Japanese";
};

The dictionary file runtime/nls/ja_JP/label.dct has an entry, either in hex format for an EUC encoding:

Japanese = \xc6\xfc\xcb\xdc\xb8\xec

or explicitly in Japanese:

Figure A-4

The object would appear like this in an application workspace of the Network Editor:

Figure A-5

Note how the string value is not translated in the edit window: only the raw string value is available for editing. Dictionary translations cannot be edited from the Network Editor since dictionaries are read-only. If you change the string value, the original translation is no longer valid.

For example, if you edit to remove the "ese" suffix from the label, AVS/Express looks for a translation of "Japan". If this doesn't exist, the label string value will be displayed untranslated, in English:

Figure A-6

In fact, the translation of "Japan" is just the first two kanji of the translation for "Japanese". If this entry had also been defined in the dictionary

Figure A-7

then the edited string would also yield a translated value

Figure A-8

The translation is used when the string is displayed in the User Interface:

Figure A-9

Translations could also be provided for "label", "UIlabel" and "UIshell". The icon labels and the window manager title decoration would be translated.

A.7 Adding Fonts to AVS/Express

An internally stored static data structure provides a single Font Family List (FFL) for each locale. These FFLs provide the default font for use within the NE and UI Kit in AVS/Express. The NE uses different members of a font family to display characters of different heights, depending on the scaling in use by the components being displayed. By default, the UI Kit displays text using the member of a font family whose height is 12 pixels on UNIX systems, and uses the default system font for all text on PCs.

Additional font families are read in at initialization from a locale-specific V file, $XP_PATH/runtime/nls/$LANG/fonts.v. By editing fonts.v, you can add additional fonts for use within the NE, or in application user interfaces created using the UI Kit.

fonts.v

The fonts.v file contains a single group object (named after the current locale) that defines any number of additional FFLs. Each FFL is defined using an object of class WTfontFamilyList. By modifying this fonts.v for your locale, you can add extra fonts for use by the NE and UI Kit.

WTfontFamilyList class definition

The WTfontFamilyList class is defined in $XP_PATH/v/wt_objs.v:

group WTfontFamily {
string family;
string charset;
};

group WTfontFamilyList {
WTfontFamily families[];
int num_families => array_size(families);
};

family

On Windows systems, this indicates the face name of the font. On Unix systems, it consists of the foundry and family components of an X Logical Font Description (XLFD) string. An asterisk (*) may be used for either component, indicating that any value may be used.

charset

On Windows systems, this indicates the charset of the font. On Unix systems, it consists of the registry and encoding components of an XLFD string. In this case, the use of asterisks is not permitted.
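On UNIX, the two fields supply four of the fourteen XLFD fields: family gives FOUNDRY and FAMILY_NAME, and charset gives CHARSET_REGISTRY and CHARSET_ENCODING. The hypothetical C helper below shows how such a pair could be expanded into an XLFD pattern, leaving every other field as a wildcard except the pixel size. It illustrates the XLFD layout only; it is not how AVS/Express builds its font names.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand a UNIX WTfontFamily pair into an XLFD
 * pattern for a given pixel size. family must be FOUNDRY-FAMILY and
 * charset REGISTRY-ENCODING without '*'. Returns the length, or -1. */
int xlfd_pattern(const char *family, const char *charset, int pixels,
                 char *out, size_t size)
{
    int n;

    if (strchr(family, '-') == NULL || strchr(charset, '*') != NULL)
        return -1;
    /* -FOUNDRY-FAMILY-WEIGHT-SLANT-SETWIDTH-ADDSTYLE-PIXELS-POINTS-
       RESX-RESY-SPACING-AVGWIDTH-REGISTRY-ENCODING */
    n = snprintf(out, size, "-%s-*-*-*-*-%d-*-*-*-*-*-%s",
                 family, pixels, charset);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

With the courier entry from the example that follows and the UI Kit's default 12-pixel height, the helper produces -*-courier-*-*-*-*-12-*-*-*-*-*-ISO8859-1, a pattern of the kind accepted by the Xlib function XListFonts.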

Example fonts.v file

The following fonts.v file defines two FFLs, each containing a single font family:

group C {
   WTfontFamilyList courier {
      families = {
#ifdef MSDOS
         { family = "courier",
           charset = "ANSI_CHARSET" }
#else
         { family = "*-courier",
           charset = "ISO8859-1" }
#endif
      };
   };

   WTfontFamilyList times {
      families = {
#ifdef MSDOS
         { family = "times",
           charset = "ANSI_CHARSET" }
#else
         { family = "*-times",
           charset = "ISO8859-1" }
#endif
      };
   };
};

ISO8859-1	Latin-1, accented characters for Western European languages: French, Spanish, Portuguese, Italian, Dutch, Danish, Swedish, Norwegian, Icelandic, Finnish.
ISO8859-2	Latin-2, accented characters for Eastern European languages: Polish, Czech, Slovak, Hungarian, Slovene, Croat, etc.
ISO8859-3	Latin-3
ISO8859-4	Latin-4
ISO8859-5	Cyrillic alphabet for Russian, Bulgarian, etc.
ISO8859-6	Arabic alphabet for Arabic, Persian, etc.
ISO8859-7	Greek alphabet
ISO8859-8	Hebrew alphabet
ISO8859-9	Latin-5, accented characters for Turkish

AVS/Express locale	Language	Other aliases
C	english	en en_GB american en_US en_CA en_AU En_GB En_US POSIX
fr_FR	french	fr c-french fr_CH fr_BE fr_CA Fr_FR Fr_CH Fr_BE
de_DE	german	de de_CH de_AT De_DE De_CH
es_ES	spanish	es Es_ES
pt_PT	portuguese	pt Pt_PT pt_BR
it_IT	italian	it It_IT it_CH
nl_NL	dutch	nl nl_BE Nl_NL Nl_BE
da_DK	danish	da Da_DK
sv_SE	swedish	sv Sv_SE
no_NO	norwegian	no No_NO
is_IS	icelandic	is Is_IS
fi_FI	finnish	fi Fi_FI su su_SU

AVS/Express locale	Language	Other aliases
ja_JP	japanese	ja Ja_JP
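The alias tables can be read as a lookup from a LANG-style name to an AVS/Express locale. The C sketch below transcribes the rows above. resolve_locale is a hypothetical helper; it matches names case-sensitively and does not treat the Language column as an alias. The tables do not say what AVS/Express does with an unlisted name, so the sketch simply returns NULL.

```c
#include <stddef.h>
#include <string.h>

#define NCOLS 12

/* Rows transcribed from the locale tables above: the AVS/Express locale
 * name comes first, followed by its other aliases. Unused slots are NULL. */
static const char *const locale_rows[][NCOLS] = {
    { "C", "en", "en_GB", "american", "en_US", "en_CA", "en_AU",
      "En_GB", "En_US", "POSIX" },
    { "fr_FR", "fr", "c-french", "fr_CH", "fr_BE", "fr_CA",
      "Fr_FR", "Fr_CH", "Fr_BE" },
    { "de_DE", "de", "de_CH", "de_AT", "De_DE", "De_CH" },
    { "es_ES", "es", "Es_ES" },
    { "pt_PT", "pt", "Pt_PT", "pt_BR" },
    { "it_IT", "it", "It_IT", "it_CH" },
    { "nl_NL", "nl", "nl_BE", "Nl_NL", "Nl_BE" },
    { "da_DK", "da", "Da_DK" },
    { "sv_SE", "sv", "Sv_SE" },
    { "no_NO", "no", "No_NO" },
    { "is_IS", "is", "Is_IS" },
    { "fi_FI", "fi", "Fi_FI", "su", "su_SU" },
    { "ja_JP", "ja", "Ja_JP" },
};

/* Hypothetical helper: return the AVS/Express locale for a locale name
 * or alias (case-sensitive), or NULL if the name is not listed. */
const char *resolve_locale(const char *name)
{
    size_t nrows = sizeof locale_rows / sizeof locale_rows[0];

    for (size_t r = 0; r < nrows; r++) {
        for (size_t c = 0; c < NCOLS && locale_rows[r][c] != NULL; c++) {
            if (strcmp(locale_rows[r][c], name) == 0)
                return locale_rows[r][0];
        }
    }
    return NULL;
}
```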

<ESC>$@	to kanji JIS C 6226-1978
<ESC>$B	to kanji JIS X 0208-1983
<ESC>&@<ESC>$B	to kanji JIS X 0208-1990
<ESC>(B	to ASCII
<ESC>(J	to JIS X 0201-1976 (JIS Roman)
<ESC>(H	to JIS X 0202-1990 (Swedish) implies to JIS Roman
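The escape sequences in the JIS table above can be recognized by a simple prefix match. The hypothetical C helper below identifies the sequence at the start of a 7-bit JIS string and reports its length so a caller can skip over it; it covers only the sequences listed in that table.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *seq;     /* escape sequence bytes */
    const char *charset; /* character set it shifts to */
} jis_escape;

/* Sequences transcribed from the JIS table above. */
static const jis_escape jis_escapes[] = {
    { "\x1b$@",       "JIS C 6226-1978" },
    { "\x1b$B",       "JIS X 0208-1983" },
    { "\x1b&@\x1b$B", "JIS X 0208-1990" },
    { "\x1b(B",       "ASCII" },
    { "\x1b(J",       "JIS X 0201-1976 (JIS Roman)" },
    { "\x1b(H",       "JIS X 0202-1990 (Swedish)" },
};

/* Hypothetical helper: identify the escape sequence at the start of s.
 * Returns the character set name and stores the sequence length in *len,
 * or returns NULL if s does not start with a listed sequence. */
const char *jis_identify(const char *s, size_t *len)
{
    for (size_t i = 0; i < sizeof jis_escapes / sizeof jis_escapes[0]; i++) {
        size_t n = strlen(jis_escapes[i].seq);
        if (strncmp(s, jis_escapes[i].seq, n) == 0) {
            *len = n;
            return jis_escapes[i].charset;
        }
    }
    return NULL;
}
```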

<ESC>$@(A	to GB 2312-1980
<ESC>(B	to ASCII
<ESC>(T	to GB 1988-1980 (GB Roman)