
Using AVS/Express



A Worldwide Language and Font Support


This appendix describes facilities for building a worldwide application and adding to the standard fonts available for each locale within AVS/Express.

This chapter discusses:

A.1 Introduction

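It is important for the productivity of the application developer and end user that visual interfaces are presented in the most natural way. AVS/Express allows developers and end users to work in their own languages. There are two high-level approaches to developing applications with AVS/Express for international markets: localization, in which an application is developed in a local language and runs only in that locale, and internationalization, in which an application's strings are translated for each locale through dictionaries.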

Each of these project mechanisms is supported by an object property: localized projects use object aliases from the "user_name" property; internationalized projects use the "dictionary" property. The sections "Localization" and "Internationalization" describe these features and the associated project lifecycles.

You configure AVS/Express to run in a particular language through the locale model. This uses the concepts of locale, character set, encoding, and font. The sections on "Language Support" and "Locales" develop these ideas with background material and examples. AVS/Express uses aspects of the host platform's operating system and windowing system to provide worldwide language support. You should consult your operating system documentation to find out how to configure and use your platform with the language you require.

The section "Text Processing" presents an overview of strings in the V language, describes how they are exchanged between various components of the AVS/Express environment, and gives details of interfaces and formats relevant to worldwide language support.

A.2 Language Support

This section is background material designed to help you to understand the mechanics of worldwide language support. Information relating specifically to support of these features in AVS/Express is provided in later sections of this chapter and in the Release Notes.

This section describes how languages are represented for computer processing. It starts with a discussion of the two main approaches (local character sets and Unicode) and describes the choice AVS/Express has made. It then summarizes the languages, character sets, and encodings used by AVS/Express.

Text Representations

A language is represented by using one or more character sets to enumerate the abstract symbols of the language. An encoding maps the symbols in a character set to numerical codes. Representations of the symbols, called glyphs, are bound with their encoded values in a font.

There are two approaches to the definition of character sets:

Specify one or more character sets for each language; each character set may have a variety of encodings.
Create one character set that includes all of the symbols required by every language, and specify a unique encoding for this superset.

Standard local character sets are widely supported across different platforms. Local character sets allow worldwide language support to be implemented incrementally, by adding font handling and character conversion routines around existing core string processing.

The Unicode Consortium and the International Organization for Standardization (ISO) jointly developed a universal character set, known as Unicode. ISO10646 (1993) is a generalization of the Unicode standard (1991). With Unicode it is possible to develop multilingual applications that do not require language-specific text processing. As originally defined, Unicode is a fixed-width encoding: every character is encoded in two bytes, even for languages such as English that use small alphabets. Unicode is only just becoming widely available. Some operating systems, such as Windows NT, use Unicode as their internal representation for all text information, but even these systems also provide interfaces and font mappings for local character sets.

AVS/Express uses local character sets.

Local Character Sets

The English alphabet can be encoded with 7 bits per character, while other European languages require 8-bit values:

ASCII: A character set for the Roman alphabet as used in English. This character set is encoded in 7-bit values, that is, bytes with the most significant bit 0.
ISO8859: A family of character sets incorporating ASCII in the lower half of the byte range. The upper half of the byte range is used for special punctuation and for accented Roman characters or additional alphabets.
Table A-1
Character set   Description
ISO8859-1       Latin-1, accented characters for Western European languages: German, French, Spanish, Portuguese, Italian, Dutch, Danish, Swedish, Norwegian, Icelandic, Finnish.
ISO8859-2       Latin-2, accented characters for Eastern European languages: Polish, Czech, Slovak, Hungarian, Slovene, Croatian, etc.
ISO8859-3       Latin-3
ISO8859-4       Latin-4
ISO8859-5       Cyrillic alphabet for Russian, Bulgarian, etc.
ISO8859-6       Arabic alphabet for Arabic, Persian, etc.
ISO8859-7       Greek alphabet
ISO8859-8       Hebrew alphabet
ISO8859-9       Latin-5, accented characters for Turkish

These are known as Single Byte Character Sets (SBCS). Each single-byte character set is limited to 256 characters.

Chinese, Japanese and Korean use thousands of characters, so they require two or more bytes to encode each character. These are Multi-byte Character Sets (MBCS).

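Multi-byte encodings are defined such that SBCS and MBCS can coexist in the same string. For example, you may use an ASCII English abbreviation in a Japanese sentence. They also have the important property that ASCII characters keep their ASCII codes in every locale, so standard string processing for ASCII-encoded English remains valid. In EUC, every byte of a multi-byte character has its most significant bit set and can never be mistaken for an ASCII character; in Shift-JIS and 7-bit JIS, however, bytes within a multi-byte character can have ASCII values, as the backslash example under Hexadecimal Format shows.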

There are two basic approaches to MBCS encoding:

Escape sequences delimit changes of character set between SBCS and MBCS and between different multi-byte character sets. For example:
  • Japanese Industrial Standard 7-bit encoding (JIS) for Japanese (similar forms exist for Chinese and Korean)
Byte values themselves signal how to parse subsequent bytes. The lead byte of a multi-byte character must be in the upper half of the byte range. The default MBCS is 2 bytes per character; subsequent MBCSs must use 3 or 4 bytes per character: reserved lead bytes to specify the new character set, then 2 bytes to identify the character. For example:
  • Extended UNIX Code (EUC), ISO2022-1993, a generic encoding method that can be applied to Chinese, Korean and Japanese
  • Shift-JIS (Microsoft kanji) for Japanese

Chinese characters (hanzi) are used in Korean (hanja) and Japanese (kanji). Korean phonetic symbols are called hangul. Japanese syllabic characters are called kana; there are two forms: the cursive hiragana and the more angular katakana.

Chinese, Korean and Japanese languages have national and industry standard character sets. For more details, see Supported Locales on page A-11.

Locales are distinguished by the number of bytes per character in their principal character set: single-byte locales, based on European or Middle Eastern languages; and multi-byte locales, based on the Chinese, Japanese or Korean languages.

A.3 Locales
Locale Model

The combined factors of language and culture which apply in a particular territory are grouped together in a locale. There are many considerations within a locale, such as monetary units, date and time, personal name order, collation sequence and number format, but the most important is the language.

Locales are often defined for territories that share the same language. For example, French is an official language of Belgium, Luxembourg, Switzerland, Canada, various countries in West Africa, and parts of the Caribbean, as well as of France itself. Territories that share a language often have cultural differences that are reflected in the locale. For example, in Britain the date is written day/month/year but in the U.S.A. it is written month/day/year.

AVS/Express takes the current locale from the system environment at initialization time. The locale remains current throughout the AVS/Express session. The current locale is not represented by an AVS/Express object. The exact method for specifying the locale is system dependent. For more details, see Initialization on page A-7.

The default locale for AVS/Express is known as the "C" locale, and the default language is English. When AVS/Express is used in a non-default locale, it is said to use a local language. Text strings in this language will be referred to as local text or local strings.

AVS/Express supports input, display, storage and processing of local languages, but does not adapt to other factors in the locale, such as collation sequence.

Levels of Support

There are three possible levels of support for a locale in AVS/Express:

AVS/Express cannot run in this locale. AVS/Express will try to fall back to the default locale.
AVS/Express supports string input, display, and processing in the local language, but you must supply your own translation dictionaries. The AVS/Express User Interface and Network Editor appear in English.
The locale is enabled and translation dictionaries are present for strings found in the User Interface and Network Editor so they appear in the local language. You can supply translation dictionaries for strings unique to your application.
Initialization

This section describes initialization for a UNIX platform running the X/Motif window system.

The system locale is set using the LANG environment variable. The general format for LANG is:

language[_territory][.codeset][@modifier]

where clauses in square brackets, '[ ]', are optional. Each platform has its own set of values for the fields in the locale name. See your platform release notes to find the value of LANG appropriate to your system in your locale. The default locale is called "C", which implies English language. The codeset can be a character set, an encoding, or a name which implies both. Modifiers adjust certain details of the locale, such as choosing between various collation sequences or input methods.

For example, this is a valid Chinese locale for DEC OSF/1:

zh_CN.dechanzi@pinyin

The language is zh, which stands for zhong-guo-hua, meaning Chinese. The territory is CN, for the People's Republic of China. The codeset is dechanzi, a DEC-specific group of character sets for simplified Chinese characters; pinyin is a collation order based on the romanized Pinyin transliteration of Chinese words.

The AVS/Express locale is set during initialization and remains current for the rest of the session. AVS/Express uses a simplified locale name derived from the LANG environment variable. This provides a common naming convention across platforms. Optional modifiers do not affect the operation of AVS/Express, so they are ignored. To find the AVS/Express locale, the LANG value is truncated at the first '.' or '@', and the resulting name is looked up in a list of aliases within the AVS/Express locale database. When a match is found, the simplified name for that locale is used as the AVS/Express locale. The simplified format has two-letter abbreviations for both language and territory, separated by an underscore:

<language:2>_<TERRITORY:2>

For example, ja_JP is the simplified name of the AVS/Express locale for Japanese. It is derived from platform-specific LANG values such as ja, japanese, ja_JP.EUC, and ja_JP.deckanji.

The default locale is an exception to this format rule; it is just called "C".

AVS/Express uses the locale in two ways:

Fonts: the relevant ISO8859 font is always loaded, but multi-byte locales require additional SBCS and MBCS fonts.
Dictionaries: dictionaries are searched for in these relative pathnames:
default: runtime/nls/C
other: runtime/nls/<language_TERRITORY>
Keyboard Input

European languages usually have direct input methods from local keyboards, perhaps using shifted key sequences. Multi-byte character sets, however, require more complex methods. A separate application mediates between keyboard input and the target text widget. On UNIX platforms this application is called a Front End Processor (FEP), and on Windows NT it is called an Input Method Editor (IME). Only when the input interaction is finished will the FEP/IME send a local string to the AVS/Express application. Input methods determine where the raw keyboard input appears on the screen and how pre-edit operations are performed. Each FEP/IME supports different input methods.

Configuring an FEP/IME is platform dependent. See the window system release notes for your platform and your locale.

Input Encoding

AVS/Express accepts string input in EUC, 7-bit (JIS) and Shift-JIS encodings for the relevant multi-byte locales. There is no configuration required; all of the encodings can be used in a single session of AVS/Express. Separate strings can have different encodings, but the encoding must be consistent throughout any individual string. There are some additional technical restrictions:

Output Encoding

AVS/Express has an output encoding type which determines how local language strings are written. The output encoding is determined by the optional codeset field of the LANG environment variable. This value can be overridden by an independent environment variable, XP_MBCS_ENCODING.

Each supported encoding has a list of recognized values for the LANG codeset and the XP_MBCS_ENCODING environment variable:

EUC: EUC, euc, eucJP, IBM-eucJP, deckanji, sdeckanji, eucKR, IBM-eucKR, deckorean, dechanzi
JIS: JIS, jis, 7BIT, 7bit
Shift-JIS: SJIS, sjis, Shift-JIS, ShiftJIS, IBM-932

The default output encoding, EUC, is used when the LANG codeset and XP_MBCS_ENCODING are unset or unrecognized.

Errors

If the locale for the LANG variable cannot be set on the system, AVS/Express defaults to using the C locale, and issues this message:

Warning: cannot set system locale, using C

This means that your system does not have the correct configuration of Motif, X or C libraries to support the requested locale. Consult your operating system release notes for this locale.

If the LANG codeset or XP_MBCS_ENCODING value is unrecognized, AVS/Express prints a warning message:

Warning: unrecognized encoding {name}, using EUC

For the list of recognized values, see Output Encoding on page A-9.

There are several runtime errors that can be written by AVS/Express when parsing multi-byte text in various encodings. These relate to corrupted strings: 8-bit values in a 7-bit encoding; escape sequences in an 8-bit encoding; unrecognized 7-bit escape sequences, and so forth. AVS/Express does not test every byte value for validity within the current character set, so it is possible to produce unintelligible text without any error message.

Environment Information

For more information about the LANG variable and the locales used for your session of AVS/Express, set the environment variable XP_LOCALE_DEBUG before running AVS/Express. This will force the LANG variable, system locale, AVS/Express locale and AVS/Express language name to be printed out. Here are some sample results:

Express: LANG is not set
Express: system locale is C
Express: express locale is C Default

Express: LANG is ja_JP.EUC
Express: system locale is ja_JP.EUC
Express: express locale is ja_JP Japanese

Express: LANG is fr_FR
Express: system locale is fr_FR.ISO8859-1
Express: express locale is fr_FR French

Note that the system locale may be different from the LANG variable.

If the XP_DEBUG_LOCALE environment variable is set and the locale is a multi-byte locale, useful information about the encoding variables is printed to the AVS/Express terminal. For example, if the LANG variable is ja_JP.eucJP, these are examples of possible encoding information:

Express: XP_MBCS_ENCODING is not set
Express: LANG codeset is eucJP
Express: express V output encoding is EUC

Express: XP_MBCS_ENCODING is JIS
Express: LANG codeset is eucJP
Express: express V output encoding is JIS

Notice that the XP_MBCS_ENCODING environment variable takes precedence over the LANG codeset.

Supported Locales
Default Locale

The C locale is the default. The language used in the C locale is English. AVS/Express loads the default font for the ISO8859-1 character set.

There are three situations when AVS/Express uses the C locale:

For example, suppose VE is the territory code for Venezuela. You set the LANG variable to es_VE for Spanish language in Venezuela and your system accepts this value. AVS/Express will load an ISO8859-1 font. It will look for this dictionary pathname under the AVS/Express install directory, or another project directory in $XP_PATH:

runtime/nls/es_VE

You can either create a real subdirectory with that name to contain Spanish translations unique to Venezuela, or just make it a link to es_ES to find generic Spanish dictionaries:

runtime/nls/es_VE -> es_ES

This default mechanism allows AVS/Express to run in unrecognized locales based on Western European languages (ISO8859-1 character set).

Western European Languages

These locales use the ISO8859-1 character set. The system locale is recognized by AVS/Express if it matches the AVS/Express locale name, its language name, or one of a list of other aliases. Codesets and modifiers are ignored. The supported locales are:

Table A-1
AVS/Express locale   Language     Other aliases
C                    english      en en_GB american en_US en_CA en_AU En_GB En_US POSIX
fr_FR                french       fr c-french fr_CH fr_BE fr_CA Fr_FR Fr_CH Fr_BE
de_DE                german       de de_CH de_AT De_DE De_CH
es_ES                spanish      es Es_ES
pt_PT                portuguese   pt Pt_PT pt_BR
it_IT                italian      it It_IT it_CH
nl_NL                dutch        nl nl_BE Nl_NL Nl_BE
da_DK                danish       da Da_DK
sv_SE                swedish      sv Sv_SE
no_NO                norwegian    no No_NO
is_IS                icelandic    is Is_IS
fi_FI                finnish      fi Fi_FI su su_SU

Eastern European Languages

AVS/Express recognizes these Eastern European locales:

Table A-2
AVS/Express locale   Language     Other aliases
pl_PL                polish       pl
cs_CZ                czech        cs
sk_SK                slovak       sk
hu_HU                hungarian    hu

Optional LANG codesets and modifiers are ignored. AVS/Express loads a default font for the ISO8859-2 character set.

Cyrillic, Greek and Turkish

AVS/Express recognizes these additional single-byte locales:

Table A-3
AVS/Express locale   Language     Other aliases
ru_RU                russian      ru
el_GR                greek        el
tr_TR                turkish      tr

Optional LANG codesets and modifiers are ignored. AVS/Express loads a default font for these character sets: ISO8859-5 for Russian; ISO8859-7 for Greek; and ISO8859-9 for Turkish.

Japanese Locale

AVS/Express recognizes these Japanese locales:

Table A-4
AVS/Express locale   Language     Other aliases
ja_JP                japanese     ja Ja_JP

Optional LANG codesets and modifiers are ignored when determining the AVS/Express locale.

The AVS/Express Japanese locale loads default fonts for these character sets:

The choice between ISO8859-1 and JIS X 0201 is left to the platform window system. Usually it will choose a JIS Roman font for single-byte text. The following character sets are not supported:

AVS/Express supports all three input and output encodings: EUC, JIS, Shift-JIS. An optional LANG codeset is used to set the AVS/Express output encoding.

These JIS escape sequences are recognized in input:

Table A-5
<ESC>$@          to kanji JIS C 6226-1978
<ESC>$B          to kanji JIS X 0208-1983
<ESC>&@<ESC>$B   to kanji JIS X 0208-1990
<ESC>(B          to ASCII
<ESC>(J          to JIS X 0201-1976 (JIS Roman)
<ESC>(H          to JIS X 0202-1990 (Swedish); treated as a switch to JIS Roman

The escape sequences written on output are:

Table A-6
<ESC>$B          to kanji JIS X 0208-1983
<ESC>(J          to JIS X 0201-1976 (JIS Roman)

In AVS/Express, Japanese text is displayed from left to right, in rows from top to bottom, the same as English.

Korean Locale

AVS/Express recognizes these Korean locales:

Table A-7
AVS/Express locale   Language     Other aliases
ko_KR                korean       ko kr

Optional LANG codesets and modifiers are ignored when determining the AVS/Express locale.

The AVS/Express Korean locale loads default fonts for these character sets:

The AVS/Express Korean locale supports EUC and 7-bit encodings for input and output. There is no Shift-JIS encoding for Korean. The LANG codeset is used to determine the AVS/Express output encoding.

These 7-bit escape sequences are recognized in input and written in output:

Table A-8
<ESC>$@(C        to KS C 5601-1992
<ESC>(B          to ASCII

In AVS/Express, Korean text is displayed from left to right, in rows from top to bottom, the same as English.

North Korea has abolished the use of borrowed Chinese characters (hanja); they are passing out of use in South Korea.

Simplified Chinese Locale

In 1956 the People's Republic of China (PRC) simplified the traditional Chinese characters in an effort to improve literacy. The traditional forms are still widely used outside the PRC: for Chinese in Taiwan, Hong Kong and Singapore; for Japanese in Japan (kanji); and for Korean in South Korea (hanja).

AVS/Express recognizes these Simplified Chinese locales:

Table A-9
AVS/Express locale   Language     Other aliases
zh_CN                chinese-s    zh_HK.[codeset]

Optional LANG modifiers are ignored when determining the AVS/Express locale.

The codeset is significant in determining the locale for Hong Kong. If the territory name is HK and the codeset is either absent, or one of a recognized set of simplified codeset aliases, then AVS/Express selects the Simplified Chinese locale. The recognized simplified codesets for Hong Kong are:

dechanzi

The AVS/Express Simplified Chinese locale loads default fonts for these character sets:

It is not an error if a default font is not found for GB Roman. If fonts are found for both ISO8859-1 and GB 1988-1980, then the choice of single-byte character set is left to the window system. Usually it will choose a GB Roman font when available.

The AVS/Express Simplified Chinese locale supports EUC and 7-bit encodings for input and output. There is no Shift-JIS encoding for Simplified Chinese. An optional LANG codeset is used to determine the AVS/Express output encoding.

These 7-bit escape sequences are recognized in input:

Table A-10
<ESC>$@(A        to GB 2312-1980
<ESC>(B          to ASCII
<ESC>(T          to GB 1988-1980 (GB Roman)

These 7-bit escape sequences are written in output:

Table A-11
<ESC>$@(A        to GB 2312-1980
<ESC>(T          to GB 1988-1980 (GB Roman)

In AVS/Express, Simplified Chinese text is displayed from left to right, in rows from top to bottom, the same as English.

A.4 Text Processing
Pathways
Figure A-1


AVS/Express enters, displays, and writes text in many ways. You must consider the following for worldwide language support in your application:

The V language is based on ASCII characters and the English language. Many components of the V and VCP streams will not change across locales.

Three pathways are not supported for international use:

A system of message catalogs has not yet been developed for AVS/Express.
These 3D graphics APIs do not support international 3D text.
The help system used by AVS/Express is Bristol Hyperhelp. European and Japanese versions of this product are available to you for integrating your local language help into AVS/Express applications.

The remaining pathways are supported for enabled locales.

Local language input to the User Interface and Network Editor is managed by the windowing system. AVS/Express expects to receive properly formed local strings from dialog and typein widgets, possibly via an FEP/IME.

Text display for the User Interface, Network Editor, 2D Graphics Display, and 3D software renderer is accomplished using the facilities of the underlying window system. Local language titles are rendered in window decoration by the local window manager.

The OpenGL renderer does support international 3D text on UNIX platforms. It borrows X Window fonts and renders 3D text as Z-buffered bitmapped images.

The Object Manager can read local language strings from V files, VCP terminal and dictionaries. In multi-byte locales, input and output can be in any appropriate encoding: EUC, 7-bit (JIS) and Shift-JIS (Microsoft Kanji). See Locales on page A-6 for more details.

The Object Manager is the hub of string processing in AVS/Express; most of the enabled pathways for local language strings radiate from the Object Manager. The next section explains how strings are defined in V and manipulated within the Object Manager, concentrating on those aspects important for worldwide language support. In a following section, the interfaces for writing V output are described.

Strings in V

There are three basic text items within the Object Manager:

The AVS/Express default language is English. The V language uses printable ASCII for its syntax, including all keywords, delimiters, and object basenames. V string literals are enclosed in double quotes and can contain characters that are not printable ASCII.

Consider this V fragment declaring string and cmethod objects:

string message = "Connect two objects";
cmethod update <src_file = "update.c">;

The object basenames are message and update; the string object value is initialized to "Connect two objects"; the cmethod object has the src_file property set with string value "update.c".

Since object basenames are part of the V language syntax, they must use the ASCII character set according to rules for identifiers in V. Properties are implemented as string objects within the Object Manager, so the behavior for strings applies to properties as well. Filenames can occur in properties or strings; they can be local strings when the host file system supports local pathnames.

Some property strings are taken from a small set of predefined string values. These enumerated string values should not be translated or set with local language strings. For example, the property NEdisplayMode can take only the string values "NEopened", "NEclosed" or "NEmaximized".

String objects can get their value from several sources:

String literals are lists of bytes between double quotes. The bytes can take any value, so they can represent characters from any character set.

Hexadecimal Format

String values can be set directly with characters, or encoded in the ANSI "C" hexadecimal format. For example, <ESC> is a non-printable ASCII character whose value is decimal 27, hexadecimal 0x1b. The escape character looks like this in ANSI "C" hex format: \x1b

Successive bytes can be concatenated with this representation. For example, the Japanese EUC encoding uses a pair of bytes for each character and both bytes have their most significant bit set. A Japanese kanji string object for "nihongo", which means "Japanese", could be initialized in hex format:

string nihongo = "\xc6\xfc\xcb\xdc\xb8\xec";

The text could be entered explicitly in a V file with a Japanese editor or at the VCP prompt in a Japanese terminal running AVS/Express:

Figure A-2


In either case, this is how the string object would appear in an application workspace of the Network Editor, opened, ready for editing the string value:

Figure A-3


There is a restriction on hexadecimal format for the 7-bit (JIS) encoding in multi-byte locales: hexadecimal format cannot be used in multi-byte substrings or in escape sequences. For example, the JIS encoding for "Japanese" has two escape sequences: <ESC>$B and <ESC>(J. The hexadecimal format for <ESC> is \x1b, and the kanji text, F|K\8l, contains a backslash. This combination cannot be parsed correctly if hex format is used for the <ESC> characters. The raw byte value must be used in the string. This is invalid:

string nihongo = "\x1b$B F|K\8l \x1b(J"; /* wrong */
V Output

The function that writes V files is:

OMsave_obj

All V syntax is printable ASCII. The default behavior for writing string literals is to use ANSI "C" hexadecimal format for all characters that are not printable ASCII. If the V file is read back in to AVS/Express the strings will be restored to their original form, but if the V file is viewed in a local text editor, the hexadecimal representation will be displayed instead of the original character. This will make all MBCS strings illegible and will corrupt accented characters in European language strings.

If you are displaying in a Japanese terminal that uses JIS Roman as its single-byte character set, not only will the strings be displayed in hexadecimal format, but the ASCII backslash will be displayed as a JIS Roman Yen symbol, making the output even more obscure.

To save V strings with raw byte values for characters that are not printable ASCII, pass the flag OM_SAVE_8BIT in the mode argument of the OMsave_obj function. This will allow local language strings to be saved intact, so that they can be viewed, and possibly changed, in a local text editor.

Commands entered at the VCP, such as $save or $print, which result in V being written to a file or to the VCP terminal, use the OM_SAVE_8BIT flag. If you are running AVS/Express in a local terminal and your V code contains local string literals, then the V output from $print will display legible local strings.

In the Network Editor, the pulldown menu options Save Application and Save Objects will write V files with the OM_SAVE_8BIT flag set on the call to OMsave_obj. They are safe to use with local strings in European and Asian locales.

The encoding used for V output is determined by the global MBCS output encoding. See Output Encoding on page A-9.

A.5 Localization

String objects and object properties can hold local strings. Raw local string values are displayed on components of the User Interface. Object basenames must be in ASCII English, but they can be assigned local aliases with the "user_name" property. The object username is displayed on the object's icon in the Network Editor. You can develop, save, and run applications in your local language. These applications can be used only in the same AVS/Express locale as the original development. Porting this local application to another language requires a new set of local V files, or a revision for internationalization.

Username Property

A username is a display alias for an object. The username is used in preference to the basename for displaying and interacting with the object in the Network Editor. The username is a property, so it is saved and restored with the object definition in V. You can define a username for any type of object; for example:

macro Viewer3D <user_name="3DViewer"> { ...

The Object Manager will still use the basename to refer to the object, so the username is not restricted by the usual rules governing object identifiers: it does not have to be unique; it can contain punctuation and delimiters, it can start with a numeral (as in the example), and it does not even have to be in ASCII. The username is just a string, so it can have any format allowed for string literals in V, including ANSI "C" hexadecimal format, extended ASCII or multi-byte characters.

The name alias applies only in the Network Editor. The VCP interface will still use the basename. Navigation and reference at the VCP prompt requires the basename, and the output from inquiries such as the $list command will return the basename.

The username itself is not part of the worldwide language support mechanism, but it will be translated just like any other string. It is not advisable to use translation on a username that is already in a local language. The main problem with this double-translation is for Asian languages where encodings must agree in the string and the dictionary key. For more information about dictionaries and translation strategies, see Internationalization on page A-26.

Object Manager interface

The Object Manager provides functions to access the username property:

OMget_obj_user_name, OMret_obj_user_name

They are wrappers for the underlying property inquiry routine:

OMget_obj_sprop( ... OM_prop_user_name, ... );
Network Editor interface

The username is displayed on the object as it appears in the Network Editor: on the icon in a library palette, on the icon in the workspace, or as the title of an open or maximized object in the workspace.

You can set the user_name property from the Property Editor in the object icon's popup menu.

The object Rename operation changes the basename, not the username. The renamed object cannot have a local language basename.

Localized Projects

A project is localized when it has locale-specific V files. These are files containing local language strings in object username properties or string object values. In general, it is not possible to internationalize these strings using the dictionary mechanism, so the V files must be edited by hand to port the application to a new locale. This is time consuming and error prone, and it creates a shadow set of V files for development, distribution and maintenance. This is not the recommended method of implementing internationalized applications. An application should be localized only when it is not intended to be run in another locale.

The benefit of localization is that development can take place in the local language. You can use all of the Network Editor and User Interface features in your local language, including text typeins and dialogs. Local language strings are written to the application's V files in two ways: by editing the V files directly with a local text editor, or through the Network Editor's visual programming interface.

Other, transient actions in AVS/Express do not write V files and so cannot tie the application to the current locale. Localized V files are written at the next "Save Application" or "Save Objects". The locale name is not saved with the application, so it is your responsibility to run a localized application in the correct locale.

Examples

These examples of localized objects are given for the Japanese locale.

Local Object Name

The example object is the same one used for the examples in Internationalization: an integer called kanji, initialized to 1. You localize this object's basename by adding a translation in its username property. This can be added directly in V, either in a V file or at the VCP prompt, with the local username written as ANSI C hexadecimal escapes:

int kanji <user_name="\xb4\xc1\xbb\xfa"> = 1;

or explicitly in Japanese, using a Japanese editor on a V file or the VCP prompt when running AVS/Express in a Japanese terminal:

Figure A-1


The username can also be added using the Properties Editor:

Figure A-2


Note that the dialog user interface has been translated by the AVS/Express dictionaries.

The kanji object would now appear like this in the Network Editor:

Figure A-3


Local String Object Value

The example object is a UIlabel with a local value for its label string subobject. Local string values can be entered in V, either in a V file or at the VCP prompt. The string can be given as hexadecimal escapes of its EUC encoding:

UIlabel UIlabel {
   label = "\xc6\xfc\xcb\xdc\xb8\xec";
};

or explicitly in Japanese, using a local editor or a local VCP terminal:

Figure A-4


The string can also be entered directly at an Edit Value typein in the Network Editor:

Figure A-5


To get local versions of "label" and "UIlabel", the username property must be defined for both objects. See the previous example.

A.6 Internationalization

Using the translation utility implemented in AVS/Express, object names and string object values can be translated from English to a local language, or even from one English form to another. Translations are loaded from dictionary files whose names are specified by a "dictionary" property. The dictionary files are read from a directory determined by the current locale. You can customize system dictionaries and supply application-specific dictionaries for your project. Translations are performed automatically by the Object Manager before returning strings to the application. The User Interface and Network Editor use these facilities to provide an internationalized interface.

Dictionary Property

The Object Manager loads a dictionary into a translation table when it encounters an object with the dictionary property set. The dictionary property's value is a file name. For example:

library Mappers <dictionary="mappers.dct"> { ...

The Object Manager loads the dictionary from a file with this relative pathname:

runtime/nls/<locale_name>/<dictionary_file>

So, if the current locale were ja_JP, for example, the dictionary would be loaded from:

runtime/nls/ja_JP/mappers.dct

The pathname is searched relative to the current list of project directories, as determined by the XP_PATH environment variable. This means that the application project directories are searched before the AVS/Express install directory, which is typically at the end of the list. If the file is found, it is loaded into a dynamic translation table for use by the Object Manager. If the file is not found, it is not an error and no warning is issued.
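As an illustration of this search order, the following C sketch looks for a dictionary file in each project directory in turn and stops at the first readable match. It is not AVS/Express code: build_dict_path and find_dictionary are hypothetical names, and the sketch assumes that the directories in XP_PATH are separated by spaces or tabs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<dir>/runtime/nls/<locale>/<file>" into out.
   Returns 0 on success, -1 if the path does not fit. */
int build_dict_path(const char *dir, const char *locale, const char *file,
                    char *out, size_t outsz)
{
    assert(dir != NULL && locale != NULL && file != NULL && out != NULL);
    int n = snprintf(out, outsz, "%s/runtime/nls/%s/%s", dir, locale, file);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Try each directory in xp_path in order and stop at the first readable
   dictionary file. ASSUMPTION: directories are separated by spaces or tabs.
   Returns 1 (path left in out) if found, 0 if not found; not finding the
   file is not treated as an error. */
int find_dictionary(const char *xp_path, const char *locale, const char *file,
                    char *out, size_t outsz)
{
    char dirs[1024];
    if (xp_path == NULL || strlen(xp_path) >= sizeof dirs)
        return 0;
    strcpy(dirs, xp_path);            /* strtok modifies its argument */

    for (char *dir = strtok(dirs, " \t"); dir != NULL; dir = strtok(NULL, " \t")) {
        if (build_dict_path(dir, locale, file, out, outsz) != 0)
            continue;                 /* path too long for out: skip this directory */
        FILE *fp = fopen(out, "r");
        if (fp != NULL) {
            fclose(fp);
            return 1;
        }
    }
    return 0;
}
```

Because the project directories come first in the list, a project's own runtime/nls/<locale_name>/mappers.dct would be found before the copy under the install directory. A return value of 0 simply means no table is loaded, which matches the "no error, no warning" behavior described above.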

There is no default dictionary: if no dictionary properties are set, there are no translations. However, AVS/Express does set dictionary properties on some system libraries. The corresponding dictionary files can be found in runtime/nls/<locale_name> under the AVS/Express install directory. For example, the Network Editor uses the dictionaries ne.dct and root.dct.

When multiple dictionaries have been loaded, any translation request will use all the translation tables to resolve the string. Tables are searched in the order they were loaded.
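The following C sketch models a translation request against several loaded tables. It assumes that "searched in the order they were loaded" means the first table containing the string supplies the translation, and that a string with no entry is returned unchanged. DictTable and translate are hypothetical names, not part of the Translation Manager interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one loaded dictionary: parallel arrays of
   English text and local text, kept in file order. */
typedef struct {
    const char *const *english;
    const char *const *local;
    size_t count;
} DictTable;

/* Translate s using tables[0..ntables-1] in load order.
   The first exact match wins (so within one table, the first of any
   repeated English text also wins). With no match, s is returned as-is. */
const char *translate(const DictTable *tables, size_t ntables, const char *s)
{
    assert(s != NULL && (tables != NULL || ntables == 0));
    for (size_t t = 0; t < ntables; t++)
        for (size_t i = 0; i < tables[t].count; i++)
            if (strcmp(tables[t].english[i], s) == 0)
                return tables[t].local[i];
    return s;
}
```

With two tables that both define "Red", the earlier table's translation is returned; a string defined in neither table comes back untranslated.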

Dictionary Files

Dictionary files are composed of comments and translation entries. Comments have '#' as the first character:

# optional comment text

Comment lines are ignored during processing. A '#' anywhere other than the first character does not make the line a comment, and the text after it is not ignored.

Translation entries are loaded into a dynamic translation table:

English text = local text

The English text is everything between the start of the line and the '=', except surrounding white space. The local text translation is everything between the '=' and the end of the line, except surrounding white space. If the line does not match this format, or if either text field is empty, AVS/Express ignores the entry.
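A minimal C sketch of these line rules follows. It assumes the line is split at the first '=' (the rules above do not say what happens when the local text itself contains '='), and parse_dict_line is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing white space in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Hypothetical parser for one dictionary line, modified in place.
   Returns 1 and sets *english / *local for a translation entry,
   or 0 for a comment or a line that is ignored.
   ASSUMPTION: the line is split at the first '='. */
int parse_dict_line(char *line, char **english, char **local)
{
    assert(line != NULL && english != NULL && local != NULL);
    if (line[0] == '#')              /* comment only if '#' is the first character */
        return 0;
    char *eq = strchr(line, '=');
    if (eq == NULL)                  /* not of the form "English text = local text" */
        return 0;
    *eq = '\0';
    *english = trim(line);
    *local = trim(eq + 1);
    return (**english != '\0' && **local != '\0');   /* empty field: ignore */
}
```

Note that a key such as Read_Field#1 is parsed as an ordinary entry, because its '#' is not the first character of the line.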

Local text can use any syntax acceptable to the V parser; for example, it can be given as ANSI C hexadecimal escapes. Each entry must have unique English text: AVS/Express ignores later entries that repeat the same English text. Different English texts may translate to the same local text. This is useful because translation of English text is case sensitive, but phonetic and ideographic languages have no case distinction, so "Red" and "red" should get the same Japanese translation.
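To show what the ANSI C hexadecimal form means at the byte level, the following C sketch decodes \xHH escapes into raw bytes. It is a simplified, hypothetical decoder that handles only two-digit escapes like those used in this appendix; the V parser accepts more syntax than this.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    return (c >= 'a' && c <= 'f') ? c - 'a' + 10 : -1;
}

/* Hypothetical decoder: turn two-digit \xHH escapes in src into raw bytes
   in dst; other characters are copied unchanged. Returns the number of
   bytes written (dst is not NUL-terminated), or -1 if dst is too small. */
long decode_hex_escapes(const char *src, unsigned char *dst, size_t dstsz)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    while (*src != '\0') {
        unsigned char byte;
        int hi, lo;
        if (src[0] == '\\' && src[1] == 'x' &&
            (hi = hexval((unsigned char)src[2])) >= 0 &&
            (lo = hexval((unsigned char)src[3])) >= 0) {
            byte = (unsigned char)(hi * 16 + lo);
            src += 4;                 /* skip the four characters \xHH */
        } else {
            byte = (unsigned char)*src++;
        }
        if (n == dstsz)
            return -1;
        dst[n++] = byte;
    }
    return (long)n;
}
```

Decoding the kanji entry used in the examples, \xb4\xc1\xbb\xfa, yields four bytes, two per kanji in EUC encoding.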

Embedded underscores, '_', are usually removed from object names before they are displayed. For example, the following V macro definition will be labelled "Workspace 1" on its icon in the Network Editor:

library Workspace_1 { ...

To translate the name, the full object basename, with embedded underscores, is required in the dictionary:

# match
Workspace_1 = Zone de travail 1
# no match
Workspace 1 = Zone de travail 1
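The following C sketch shows the order of operations implied above: the dictionary is consulted with the full basename, and only an untranslated name has its underscores shown as spaces (as in "Workspace 1"). display_label and example_lookup are hypothetical, and the sketch assumes that translated text is displayed exactly as written in the dictionary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the dictionary translation of key, or NULL if there is none. */
typedef const char *(*LookupFn)(const char *key);

/* Hypothetical one-entry dictionary matching the "# match" example above. */
static const char *example_lookup(const char *key)
{
    return strcmp(key, "Workspace_1") == 0 ? "Zone de travail 1" : NULL;
}

/* Label for an object basename: the dictionary is queried with the full
   basename (underscores intact); only an untranslated basename has its
   underscores displayed as spaces. Output is truncated to fit out. */
void display_label(const char *basename, LookupFn lookup, char *out, size_t outsz)
{
    assert(basename != NULL && out != NULL && outsz > 0);
    const char *translated = (lookup != NULL) ? lookup(basename) : NULL;
    const char *src = (translated != NULL) ? translated : basename;
    size_t i = 0;
    for (; src[i] != '\0' && i + 1 < outsz; i++)
        out[i] = (translated == NULL && src[i] == '_') ? ' ' : src[i];
    out[i] = '\0';
}
```

A dictionary keyed on the displayed form "Workspace 1" would never be consulted, because the lookup happens before the underscores are converted.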

The Object Manager automatically renames repeated instances of the same object template by adding an instance number. For example, repeated instances of Read_Field are called Read_Field, Read_Field#1, Read_Field#2. These names are translated independently, so each name must have its own entry in the dictionary.
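Because every instance name needs its own entry, it can help to generate the dictionary keys mechanically. The following hypothetical C helper reproduces the naming pattern described above; it assumes the numbering continues the same way for later instances.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: name of the n-th instance of a template.
   n == 0 keeps the basename; later instances get "#<n>" appended
   (Read_Field, Read_Field#1, Read_Field#2, ...).
   Returns 0 on success, -1 if out is too small. */
int instance_name(const char *base, int n, char *out, size_t outsz)
{
    assert(base != NULL && out != NULL && n >= 0);
    int len = (n == 0) ? snprintf(out, outsz, "%s", base)
                       : snprintf(out, outsz, "%s#%d", base, n);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

Calling instance_name for n = 0, 1, 2 and printing each result followed by " = " gives a skeleton of the three entries a translator would fill in.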

Object Manager Interface

The Object Manager always tries to translate object basenames and string object values when they are accessed with these functions:

OMget_str_val, OMget_str_array_val

If there is a valid dictionary with a match for the string, they return the translation. There is a function similar to OMget_str_val which takes an additional mode argument:

OMget_str_val_mode

To suppress translation of the returned string value, pass the flag OM_STR_NO_I18N in the mode argument.

There are two low-level functions for the Translation Manager utility:

TMload_dictionary, TMget_translation

The dictionary mechanism is read-only: a dictionary can be loaded from a file and an English string looked up in it, but entries cannot be added or saved.

Network Editor Interface

The Network Editor uses the system dictionary ne.dct to customize its menu and dialog user interfaces. It uses standard Object Manager inquiries to retrieve translated basenames and string values for the display of object icons, library palettes, and object properties. The OM_STR_NO_I18N flag is not set, so strings are always translated if possible.

You can set the dictionary property on library objects using the Properties Editor in the library icon's popup menu system. The restriction to libraries is a matter of style: libraries are a useful level at which to collect translations. The dictionary property is not accessible through the Properties Editor on other object types. This does not affect your ability to set the property explicitly, either at the VCP prompt or by editing a V file.

Internationalized Projects

AVS/Express is internationalized: the same executable and the same suite of V files can run in any supported locale. The V files contain English strings and translation is effected at runtime by using dictionary files. Developers who want their application to run in more than one locale should follow the model of developing their applications first in English, then internationalizing later.

The advantages of internationalizing through dictionaries are:

The translators need an appreciation of the application in order to devise correct and helpful translations, but they need not know anything about programming or AVS/Express in order to write the dictionaries.

There are some aspects of development which affect usability in other languages. For example:

There are two basic schemes for using dictionaries:

Original Text Entries

Use raw English strings and object basenames that you want to appear in the User Interface and Network Editor.

The advantages of this scheme are:

Keyword Entries

Use English keywords as strings and object basenames. The keywords themselves are not meant to appear in the User Interface or Network Editor, so a dictionary must translate them even in the C locale.

There are two ways to generate the keyword: invent a totally artificial code word, or just add a modifier to the original English text.

The advantage of using keyword entries is that it resolves ambiguities between English words in different contexts. Simply use different keywords in V, with the same English translations in the C locale dictionary, but with different translations in the locale where the ambiguity arises.
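The following C sketch illustrates the keyword scheme with two hypothetical keywords, OPEN_ACTION and OPEN_STATE. In the C locale both translate to "Open", but French needs the verb "Ouvrir" for one and the adjective "Ouvert" for the other. The in-memory tables are stand-ins for the C and French dictionary files, not real AVS/Express data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *key; const char *text; } Entry;

/* Hypothetical C-locale dictionary: both keywords read "Open" in English. */
static const Entry dict_C[] = {
    { "OPEN_ACTION", "Open" },   /* verb: a menu command */
    { "OPEN_STATE",  "Open" },   /* adjective: a status display */
};

/* Hypothetical French dictionary: the two meanings need different words. */
static const Entry dict_fr[] = {
    { "OPEN_ACTION", "Ouvrir" },
    { "OPEN_STATE",  "Ouvert" },
};

/* Exact-match lookup; an untranslated keyword is returned as-is. */
static const char *lookup(const Entry *d, size_t n, const char *key)
{
    assert(d != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(d[i].key, key) == 0)
            return d[i].text;
    return key;
}
```

Had the V files used the raw string "Open" in both places, a single French entry for "Open" could not serve both meanings.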

The dictionary used by the Network Editor uses keyword entries. For example, the first pulldown menu option is labelled NE_MENU_FILE, which is translated to "&File" in the dictionary runtime/nls/C/ne.dct (the "&" signifies the accelerator key for Motif).
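The "&" convention can be sketched in C as follows. This hypothetical helper only illustrates the convention as assumed here (the character after "&" is the accelerator and the "&" itself is not displayed); it is not how AVS/Express or Motif processes the label, and it does not handle escaped ampersands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split a label such as "&File" into display text
   ("File") and the accelerator character that follows the '&' ('F').
   Returns the accelerator, or '\0' if the label has none. */
char split_accelerator(const char *label, char *out, size_t outsz)
{
    assert(label != NULL && out != NULL && outsz > 0);
    char accel = '\0';
    size_t n = 0;
    for (const char *p = label; *p != '\0'; p++) {
        if (*p == '&' && accel == '\0' && p[1] != '\0') {
            accel = p[1];            /* remember the marked character ... */
            continue;                /* ... and drop the '&' itself */
        }
        if (n + 1 < outsz)
            out[n++] = *p;
    }
    out[n] = '\0';
    return accel;
}
```

A translator can therefore choose a different accelerator letter per locale simply by moving the "&" in the local text.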

Internationalizing a Localized Project

The correct way to internationalize a localized project is:

1. Create a dictionary for the development locale.
2. Add the dictionary property to the appropriate application library.
3. Remove all local language username properties from the V files, and add a dictionary entry for each one as a translation of the basename string.
4. Extract all local language string object values from the V files, replace them with English equivalents, and add an entry in the dictionary for each of these translations.
5. Write dictionary files for other locales.

The main barrier to developing internationalized applications in the local language is that there are no direct interfaces from the Network Editor to translation tables, and no project mechanisms for saving dictionaries from these translation sessions. The dictionary mechanism is read only.

The Japanese objects shown in Examples on page A-24 could be internationalized by following these steps:

1. Create a dictionary file, say example.dct, for the Japanese locale and put it in directory runtime/nls/ja_JP.
2. Add a dictionary property, with value "example.dct", to the appropriate application library, or perhaps explicitly for the kanji or UIlabel objects.
3. Remove the username property from the kanji object and add the Japanese translation for "kanji" to the runtime/nls/ja_JP/example.dct dictionary file.
4. Change the value of the label subobject to be "Japanese" and add the Japanese translation for "Japanese" to the runtime/nls/ja_JP/example.dct dictionary file.
5. Write dictionary files for other locales. They will all be named example.dct, but each will be located in the appropriate locale subdirectory under runtime/nls.

Examples

These examples are given for the Japanese locale. The dictionary property would normally be defined for the application library to which the sample object belongs, but to show a compact example, the dictionary is defined for the object itself.

Translation of Object Basename

The example object is called kanji, which is the Japanese name for Chinese characters used in written Japanese. The Japanese word for "kanji" consists of two kanji. The kanji object is an integer initialized with value 1.

V definition:

int kanji <dictionary="kanji.dct"> = 1;

A translation for "kanji" is defined in the dictionary file runtime/nls/ja_JP/kanji.dct. The translation can be represented in ANSI "C" hex format:

kanji = \xb4\xc1\xbb\xfa

or it can be written explicitly in Japanese with a Japanese editor:

Figure A-1


The corresponding object icon for kanji in the Network Editor:

Figure A-2


Numerical values, like "1", cannot be translated for integer objects, but they could be translated if they were the value of a string object. The dictionary entries would be:

Figure A-3


Translation of String Object Value

The example string is the label subobject of a UIlabel. The UIlabel will be defined with a dictionary property. The label subobject value will be initialized to the word "Japanese" and there will be a translation for "Japanese" in the corresponding dictionary file.

V definition:

UIlabel UIlabel <dictionary="label.dct"> {
   label = "Japanese";
};

The dictionary file runtime/nls/ja_JP/label.dct has an entry, written either as hexadecimal escapes of the EUC encoding:

Japanese = \xc6\xfc\xcb\xdc\xb8\xec

or explicitly in Japanese:

Figure A-4


The object would appear like this in an application workspace of the Network Editor:

Figure A-5


Note how the string value is not translated in the edit window: only the raw string value is available for editing. Dictionary translations cannot be edited from the Network Editor since dictionaries are read-only. If you change the string value, the original translation is no longer valid.

For example, if you edit to remove the "ese" suffix from the label, AVS/Express looks for a translation of "Japan". If this doesn't exist, the label string value will be displayed untranslated, in English:

Figure A-6


In fact, the translation of "Japan" is just the first two kanji of the translation for "Japanese". If this entry had also been defined in the dictionary:

Figure A-7


then the edited string would also yield a translated value:

Figure A-8

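The lookup in this example is an exact match. The following C sketch uses the two EUC-encoded entries from label.dct to show both cases: with only the "Japanese" entry loaded, "Japan" comes back untranslated; with the "Japan" entry added, its translation is the first four bytes (the first two kanji) of the "Japanese" translation. The table layout and translate function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *english; const char *local; } Entry;

/* Entries from label.dct, with the local text as EUC-encoded bytes. */
static const Entry label_dct[] = {
    { "Japanese", "\xc6\xfc\xcb\xdc\xb8\xec" },
    { "Japan",    "\xc6\xfc\xcb\xdc" },   /* the optional second entry */
};

/* Exact, case-sensitive match only: "Japan" never matches the
   "Japanese" entry. Unmatched strings are returned untranslated. */
static const char *translate(const Entry *d, size_t n, const char *s)
{
    assert(d != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(d[i].english, s) == 0)
            return d[i].local;
    return s;
}
```

Passing n = 1 models the dictionary before the "Japan" entry is added; passing n = 2 models it afterwards.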

The translation is used when the string is displayed in the User Interface:

Figure A-9


Translations could also be provided for "label", "UIlabel" and "UIshell". The icon labels and the window manager title decoration would be translated.


A.7 Adding Fonts to AVS/Express

An internally stored static data structure provides a single Font Family List (FFL) for each locale. These FFLs provide the default font for use within the Network Editor (NE) and the UI Kit in AVS/Express. The NE uses different members of a font family to display characters of different heights, depending on the scaling in use by the components being displayed. By default, the UI Kit displays text using the member of a font family whose height is 12 pixels on UNIX systems, and it uses the default system font for all text on PCs.

Additional font families are read in at initialization from a locale-specific V file, $XP_PATH/runtime/nls/$LANG/fonts.v. By editing fonts.v, you can add fonts for use within the NE or in application user interfaces created with the UI Kit.

fonts.v

The fonts.v file contains a single group object (named after the current locale) that defines any number of additional FFLs. Each FFL is defined using an object of class WTfontFamilyList. By modifying this fonts.v for your locale, you can add extra fonts for use by the NE and UI Kit.

WTfontFamilyList class definition

The WTfontFamilyList class is defined in $XP_PATH/v/wt_objs.v:

group WTfontFamily {
   string family;
   string charset;
};

group WTfontFamilyList {
   WTfontFamily families[];
   int num_families => array_size(families);
};

family
On Windows systems, this indicates the face name of the font. On UNIX systems, it consists of the foundry and family components of an X Logical Font Description (XLFD) string. An asterisk (*) may be used for either component, indicating that any value is acceptable.

charset
On Windows systems, this indicates the character set of the font. On UNIX systems, it consists of the registry and encoding components of an XLFD string; asterisks are not permitted in this field.

Example fonts.v file

The following fonts.v file defines two FFLs, each containing a single font family:

group C {
   WTfontFamilyList courier {
      families = {
#ifdef MSDOS
         { family = "courier",
           charset = "ANSI_CHARSET" }
#else
         { family = "*-courier",
           charset = "ISO8859-1" }
#endif
      };
   };
   WTfontFamilyList times {
      families = {
#ifdef MSDOS
         { family = "times",
           charset = "ANSI_CHARSET" }
#else
         { family = "*-times",
           charset = "ISO8859-1" }
#endif
      };
   };
};

