

Internationalization and GAS


This section explains how the Genero Application Server handles international applications.

NOTE: Encoding rules have been enhanced for the snippet-based rendering engine that the GAS uses for the GWC. You can customize the rendering engine's output encoding as well as its preferred input encoding. You can also use the encodings preferred by the User Agent.


Encoding Architecture

International applications are applications using one or more non-ASCII character sets to support one or more languages. The diagram below summarizes the GAS encoding architecture:


Charsets Configuration

Charsets can be defined in four places:

  1. With environment locales when launching a DVM.
  2. In the HTML charset declared in the template.
  3. Inside XML files used by the GAS.
  4. With environment locales when launching the GAS.

DVM Locale

If application files (such as .4gl, .per, .4st files) contain characters in a specific encoding, the DVM has to run in this encoding.

Running a DVM in a specific encoding is described in the Genero BDL Reference Manual, section "Programming Applications", chapter "Localization". Locales can be set in the GAS execution environment, or with the <ENVIRONMENT_VARIABLE> tag inside the as.xcf file.

Example in as.xcf with KOI8-R (Russian) charset:

<?xml version="1.0" encoding="UTF-8"?>
<?fjsApplicationServerConfiguration Version="1.30"?>
...
<COMPONENT_LIST>
  <EXECUTION_COMPONENT Id="cpn.wa.execution.local">
    <ENVIRONMENT_VARIABLE Id="FGLDIR">$(res.fgldir)</ENVIRONMENT_VARIABLE>
    <ENVIRONMENT_VARIABLE Id="FGLGUI">$(res.fglgui)</ENVIRONMENT_VARIABLE>
    <ENVIRONMENT_VARIABLE Id="PATH">$(res.path)</ENVIRONMENT_VARIABLE>
...
    <ENVIRONMENT_VARIABLE Id="LC_ALL">ru_RU.KOI8-R</ENVIRONMENT_VARIABLE>
    <DVM>$(res.dvm.wa)</DVM>
  </EXECUTION_COMPONENT>
...
</COMPONENT_LIST>
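
A misspelled codeset in LC_ALL (for example KIO8-R instead of KOI8-R) is easy to miss. The following Python sketch shows one way to check a locale value before putting it into as.xcf. It is not part of the GAS: the function names are hypothetical, and Python's codec registry is used only as a stand-in for the encodings that the DVM and the operating system actually support.

```python
import codecs

def codeset_of(locale_name):
    """Return the codeset part of a POSIX locale name such as 'ru_RU.KOI8-R', or None."""
    # POSIX locale names have the form language[_territory][.codeset][@modifier].
    without_modifier = locale_name.split("@", 1)[0]
    if "." not in without_modifier:
        return None
    return without_modifier.split(".", 1)[1]

def is_known_codeset(name):
    """True if Python's codec registry recognizes the name (stand-in check only)."""
    if name is None:
        return False
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

for value in ("ru_RU.KOI8-R", "ru_RU.KIO8-R"):
    codeset = codeset_of(value)
    status = "known" if is_known_codeset(codeset) else "unknown"
    print(value, "->", codeset, status)
```

A name that passes this check can still be missing on the target system, so locale -a on the host remains the authoritative list.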

HTML charset

To handle application data correctly in the User Agent, the HTML page charset must be set. Because the GAS generates HTML pages from templates, the charset must be defined in the templates. For information about setting a charset in an HTML page, see HTML Specification - The Document Character Set.

Example in generodefault.html with BIG5 (Chinese) charset:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
  $(res.meta-tags)
  <meta http-equiv="Content-Type" content="text/html; charset=BIG5">
  <title>Title of the page</title>

  <script language=javascript src="$(connector.uri)/fjs/uaapi/webBrowser.js"></script>
  <script language=javascript src="$(connector.uri)/fjs/asapi/application.js"></script>
...
  </head>
...
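
To confirm which charset a template actually declares, the meta tag can be read back with a small script. The Python sketch below uses the standard html.parser module; the class and function names are hypothetical, and the template text is a shortened copy of the example above rather than a real GAS file.

```python
from html.parser import HTMLParser

class MetaCharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="...">
            self.charset = attributes["charset"].strip()
            return
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: content="text/html; charset=..."
            for part in (attributes.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip()

def declared_charset(html_text):
    """Return the charset declared in the template text, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset

template = """<html><head>
$(res.meta-tags)
<meta http-equiv="Content-Type" content="text/html; charset=BIG5">
<title>Title of the page</title>
</head></html>"""

print(declared_charset(template))  # prints BIG5
```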

XML Encoding

The GAS uses XML files such as as.xcf and external application configuration files, and these files may contain international characters. Defining the encoding of an XML file is described in Extensible Markup Language - Character Encoding.

Example in as.xcf with ISO-8859-6 (Arabic) charset:

<?xml version="1.0" encoding="ISO-8859-6"?>
<?fjsApplicationServerConfiguration Version="1.30"?>
<CONFIGURATION>
...
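
The encoding named in the XML declaration must match the encoding the file is actually saved in. As a rough check, the Python sketch below reads the declared encoding from the first bytes of a file. The function name is hypothetical, the pattern is a simplification rather than a full XML parser, and it only works for ASCII-compatible encodings (it ignores byte order marks and UTF-16).

```python
import re

# Matches an XML declaration and captures its optional encoding value, e.g.
# <?xml version="1.0" encoding="ISO-8859-6"?>
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']+["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)

def declared_xml_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration, or 'UTF-8' if none is named."""
    match = _XML_DECL.match(data)
    if match and match.group(1):
        return match.group(1).decode("ascii")
    # Without a declaration or byte order mark, XML defaults to UTF-8.
    return "UTF-8"

sample = b'<?xml version="1.0" encoding="ISO-8859-6"?>\n<CONFIGURATION/>'
declared = declared_xml_encoding(sample)
print(declared)

# Decoding with the declared encoding fails loudly if the bytes do not match it.
text = sample.decode(declared)
```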

GAS System Encoding

The GAS interacts with the operating system in many ways. In these cases, the GAS needs to know which encoding the operating system uses. The operating system encoding is defined through environment variables, as described in The Single UNIX Specification, Version 2 - Locale.

Example in command line with th_TH.tis620 (Thai) locale:

LC_ALL=th_TH.tis620 gasd -d
Then gasd starts with the 'TIS-620' system encoding.

The locales supported by an operating system can be displayed with the locale -a command. If the operating system does not support the desired encoding, or if a specific encoding is needed, the system encoding can be defined with the FGLAS_SYSENCODING environment variable, which overrides the system locale.

Example in command line with UTF-8:

LC_ALL=th_TH.tis620 FGLAS_SYSENCODING=UTF-8 gasd -d
Then gasd starts with the 'UTF-8' system encoding.
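
The two command lines above differ only in whether FGLAS_SYSENCODING is set. The Python sketch below models that precedence. It is an illustration, not the GAS implementation: the function name is hypothetical, and the fallback order LC_ALL, LC_CTYPE, LANG is the usual POSIX order, assumed here rather than taken from the GAS documentation. The returned name is the raw codeset from the environment, before any conversion to a canonical name.

```python
def system_encoding(env):
    """Pick the system encoding name from an environment mapping, or None.

    Assumed precedence: FGLAS_SYSENCODING first (it overrides the locale),
    then the POSIX order LC_ALL, LC_CTYPE, LANG.
    """
    override = env.get("FGLAS_SYSENCODING")
    if override:
        return override
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:
            # Drop an optional @modifier, then keep what follows the first dot.
            codeset = value.split("@", 1)[0].partition(".")[2]
            return codeset or None
    return None

print(system_encoding({"LC_ALL": "th_TH.tis620"}))
print(system_encoding({"LC_ALL": "th_TH.tis620", "FGLAS_SYSENCODING": "UTF-8"}))
```

Passing os.environ instead of a literal dictionary applies the same logic to the current shell environment.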

Note: Encodings have different names across operating systems. To unify them, the GAS converts encoding names: for each UNIX platform, a charset.alias file is provided that maps the operating system encoding name to a canonical encoding name.
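
The name conversion can be pictured as a simple lookup table. The Python sketch below assumes that charset.alias uses the layout found in GNU libiconv and gettext (one "OS-name canonical-name" pair per line, with # starting a comment). That layout, the sample entries, the function names, and the case-insensitive lookup are all assumptions made for illustration, so check the file shipped with your GAS installation.

```python
def parse_charset_alias(text):
    """Build a mapping from lower-cased OS encoding name to canonical name."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            aliases[fields[0].lower()] = fields[1]
    return aliases

def canonical_name(os_name, aliases):
    """Return the canonical name for os_name, or os_name itself if unmapped."""
    return aliases.get(os_name.lower(), os_name)

# Hypothetical file content, not copied from a real GAS installation.
sample = """\
# OS name      canonical name
tis620         TIS-620
ISO8859-1      ISO-8859-1
"""

aliases = parse_charset_alias(sample)
print(canonical_name("tis620", aliases))
```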

Default Encoding

By default, the GAS uses the UTF-8 encoding, so that all Unicode characters can be handled.


Supported Charsets

The following list contains all character sets known to the GAS. One coded character set can be listed under several different names; names on the same line refer to the same character set. Whether the DVM supports a given character set depends on your operating system. Refer to the Genero BDL Reference Manual, section "Programming Applications", chapter "Localization" for more information.
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
UTF-8
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
C99
JAVA
CP819 IBM819 ISO-8859-1 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN1
ISO-8859-2 ISO-IR-101 ISO8859-2 ISO_8859-2 ISO_8859-2:1987 L2 LATIN2 CSISOLATIN2
ISO-8859-3 ISO-IR-109 ISO8859-3 ISO_8859-3 ISO_8859-3:1988 L3 LATIN3 CSISOLATIN3
ISO-8859-4 ISO-IR-110 ISO8859-4 ISO_8859-4 ISO_8859-4:1988 L4 LATIN4 CSISOLATIN4
CYRILLIC ISO-8859-5 ISO-IR-144 ISO8859-5 ISO_8859-5 ISO_8859-5:1988 CSISOLATINCYRILLIC
ARABIC ASMO-708 ECMA-114 ISO-8859-6 ISO-IR-127 ISO8859-6 ISO_8859-6 ISO_8859-6:1987 CSISOLATINARABIC
ECMA-118 ELOT_928 GREEK GREEK8 ISO-8859-7 ISO-IR-126 ISO8859-7 ISO_8859-7 ISO_8859-7:1987 CSISOLATINGREEK
HEBREW ISO-8859-8 ISO-IR-138 ISO8859-8 ISO_8859-8 ISO_8859-8:1988 CSISOLATINHEBREW
ISO-8859-9 ISO-IR-148 ISO8859-9 ISO_8859-9 ISO_8859-9:1989 L5 LATIN5 CSISOLATIN5
ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 CSISOLATIN6
ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 L7 LATIN7
ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 LATIN-9
ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
KOI8-R CSKOI8R
KOI8-U
KOI8-RU
CP1250 MS-EE WINDOWS-1250
CP1251 MS-CYRL WINDOWS-1251
CP1252 MS-ANSI WINDOWS-1252
CP1253 MS-GREEK WINDOWS-1253
CP1254 MS-TURK WINDOWS-1254
CP1255 MS-HEBR WINDOWS-1255
CP1256 MS-ARAB WINDOWS-1256
CP1257 WINBALTRIM WINDOWS-1257
CP1258 WINDOWS-1258
850 CP850 IBM850 CSPC850MULTILINGUAL
862 CP862 IBM862 CSPC862LATINHEBREW
866 CP866 IBM866 CSIBM866
MAC MACINTOSH MACROMAN CSMACINTOSH
MACCENTRALEUROPE
MACICELAND
MACCROATIAN
MACROMANIA
MACCYRILLIC
MACUKRAINE
MACGREEK
MACTURKISH
MACHEBREW
MACARABIC
MACTHAI
HP-ROMAN8 R8 ROMAN8 CSHPROMAN8
NEXTSTEP
ARMSCII-8
GEORGIAN-ACADEMY
GEORGIAN-PS
KOI8-T
MULELAO-1
CP1133 IBM-CP1133
ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
CP874 WINDOWS-874
VISCII VISCII1.1-1 CSVISCII
TCVN TCVN-5712 TCVN5712-1 TCVN5712-1:1993
ISO-IR-14 ISO646-JP JIS_C6220-1969-RO JP CSISO14JISC6220RO
JISX0201-1976 JIS_X0201 X0201 CSHALFWIDTHKATAKANA
ISO-IR-87 JIS0208 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 X0208 CSISO87JISX0208
ISO-IR-159 JIS_X0212 JIS_X0212-1990 JIS_X0212.1990-0 X0212 CSISO159JISX02121990
CN GB_1988-80 ISO-IR-57 ISO646-CN CSISO57GB1988
CHINESE GB_2312-80 ISO-IR-58 CSISO58GB231280
CN-GB-ISOIR165 ISO-IR-165
ISO-IR-149 KOREAN KSC_5601 KS_C_5601-1987 KS_C_5601-1989 CSKSC56011987
EUC-JP EUCJP EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE CSEUCPKDFMTJAPANESE
MS_KANJI SHIFT-JIS SHIFT_JIS SJIS CSSHIFTJIS
CP932
ISO-2022-JP CSISO2022JP
ISO-2022-JP-1
ISO-2022-JP-2 CSISO2022JP2
CN-GB EUC-CN EUCCN GB2312 CSGB2312
CP936 GBK MS936 WINDOWS-936
GB18030
ISO-2022-CN CSISO2022CN
ISO-2022-CN-EXT
HZ HZ-GB-2312
EUC-TW EUCTW CSEUCTW
BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
CP950
BIG5-HKSCS BIG5HKSCS
EUC-KR EUCKR CSEUCKR
CP949 UHC
CP1361 JOHAB
ISO-2022-KR CSISO2022KR
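
Because each line of the list groups the names of one coded character set, the list can be used to test whether two names are aliases of each other. The Python sketch below does this for three lines copied from the list. The function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
# Three lines copied from the list above; each line is one character set.
ROWS = """\
KOI8-R CSKOI8R
BIG-5 BIG-FIVE BIG5 BIGFIVE CN-BIG5 CSBIG5
ISO-IR-166 TIS-620 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 TIS620.2533-1
"""

def build_alias_index(rows):
    """Map every upper-cased name to the number of the line it appears on."""
    index = {}
    for row_number, row in enumerate(rows.splitlines()):
        for name in row.split():
            index[name.upper()] = row_number
    return index

def same_charset(first, second, index):
    """True if both names are listed on the same line of the list."""
    row = index.get(first.upper())
    return row is not None and row == index.get(second.upper())

index = build_alias_index(ROWS)
print(same_charset("big5", "CN-BIG5", index))
print(same_charset("KOI8-R", "TIS-620", index))
```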