Localization

Summary:

Localization Support
Writing Programs
Runtime System Settings
Database Client Settings
Front-End Settings
Runtime System Messages
Troubleshooting

Localization Support

Localization Support allows you to write BDL programs that follow the rules of a specific language and culture. This includes single-byte and multi-byte character set support, language-specific messages, and lexical, numeric and currency conventions.

Localization Support is based on the POSIX system libraries handling the locale. A locale is a set of language and cultural rules.

A BDL program needs to be able to determine its locale and act accordingly to be portable to different cultures.

Writing Programs

Runtime character set must match development character set

When writing a form or program source file, you use a specific character set. This character set depends upon the text editor or operating system settings you are using on the development platform. For example, when writing a string constant containing Arabic characters in a 4gl module, you probably use the ISO-8859-6 character set. The character set used at runtime (during program execution) must match the character set used to write programs.

At runtime, a Genero program can only work in a specific character set. However, by using Localized Strings, you can start multiple instances of the same compiled program using different locales. For a given program instance the character set used by the strings resource files must correspond to the locale. Make sure the string identifiers use ASCII only.

Byte length semantics vs Character length semantics

Genero BDL uses byte length semantics: when defining a character data type like CHAR(n) or VARCHAR(n), n represents a number of bytes, not a number of characters. In a single-byte character set like ISO-8859-1, every character is encoded on a single byte, so the number of bytes equals the number of characters. But in a multi-byte character set, encoding a character can require more than one byte, so the number of bytes needed to store a multi-byte string is larger than the number of characters. For example, in BIG5 encoding, one Chinese character needs 2 bytes, so if you want to hold a BIG5 string with a maximum of 10 Chinese characters, you must define a CHAR(20). When using a variable-length encoding like UTF-8, characters can take one, two or more bytes, so you need to choose the right average to define CHAR or VARCHAR variables.
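The difference between the two length semantics can be observed from a shell. The following is a minimal sketch, not part of Genero: the function names byte_length and utf8_char_length are hypothetical, and the character count assumes the input is valid UTF-8. The sample holds two Chinese characters, which take 3 bytes each in UTF-8.

```shell
# Count the bytes of a string: this is the unit used by CHAR(n) in Genero BDL.
byte_length() {
  printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Count the characters of a UTF-8 string by dropping continuation bytes
# (octal 200-277), so that only the first byte of each character remains.
utf8_char_length() {
  printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# Two Chinese characters, written as UTF-8 octal escapes (3 bytes each).
sample=$(printf '\344\270\255\346\226\207')
echo "bytes=$(byte_length "$sample") chars=$(utf8_char_length "$sample")"
```

With this sample the script prints bytes=6 chars=2, so a variable holding these two characters in UTF-8 would need at least CHAR(6).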

The definition of database columns using CHAR, VARCHAR, NCHAR and NVARCHAR types varies from one database vendor to another. Some use byte length semantics, others use character length semantics, and others provide both. For example, Informix uses bytes only; Oracle supports byte "CHAR(10 BYTE)" or character "CHAR(10 CHAR)" length semantics. SQL Server uses a single-byte character set for CHAR/VARCHAR and a 2-byte Unicode character set (UCS-2) for NCHAR and NVARCHAR.

Other SQL elements like functions and operators are affected by the length semantics. For example, the Informix LENGTH() function always returns a number of bytes, while Oracle's LENGTH() function returns a number of characters (use LENGTHB() to get the number of bytes with Oracle).

It is important to understand how the database servers handle multi-byte character sets. Check your database server reference manual: most manuals include a "Localization" chapter which describes those concepts in detail.

For portability, we recommend using character data types based on byte length semantics in databases, because this corresponds to the length semantics used by Genero BDL (this is important when declaring variables with DEFINE LIKE, which is based on database schemas).

Runtime System Settings

This section describes the locale settings that change the behavior of the runtime system.

Language Settings

The LANG environment variable defines the global settings for the language used by the application. This variable changes the behavior of the character handling functions, such as UPSHIFT, DOWNSHIFT. It also changes the handling of multi-byte characters. Invalid settings of LANG will cause compilation errors if a source file contains multi-byte characters.

With the LANG environment variable, you define the language, the territory (country) and the codeset (character set) to be used. The format of the value is normalized as follows, but may differ on some operating systems:

language[_territory[.codeset]]

Warning: Most operating system vendors define a specific set of values for the language, territory and codeset. For example, on a UNIX platform, you typically set "en_US.ISO8859-1" for a US English locale, while Microsoft Windows supports "English_USA.1252", or "en_us.1252". For more details about supported locales, please refer to the operating system documentation (search for the 'setlocale' function).
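To illustrate the language[_territory[.codeset]] format, here is a small sketch that splits a locale name into its three parts. The function name split_locale is hypothetical, and the sketch does not handle vendor-specific extensions such as modifiers.

```shell
# Split a locale name of the form language[_territory[.codeset]].
# Prints three lines: language, territory, codeset (empty when absent).
split_locale() {
  name=$1
  territory=
  codeset=
  case $name in
    *.*) codeset=${name#*.}; name=${name%%.*} ;;
  esac
  case $name in
    *_*) territory=${name#*_}; name=${name%%_*} ;;
  esac
  printf '%s\n%s\n%s\n' "$name" "$territory" "$codeset"
}

split_locale "en_US.ISO8859-1"
```

For "en_US.ISO8859-1" the three printed lines are en, US and ISO8859-1. The same split applies to a Windows-style name such as "English_USA.1252".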

See also Troubleshooting to learn how to check if a locale is properly set, and list the locales installed on your system.

Numeric and Currency Settings

To perform decimal to/from string conversions, the runtime system uses the DBMONEY or DBFORMAT environment variables. These variables define the thousands and decimal separators, and the currency symbols for MONEY data types.

The LC_MONETARY and LC_NUMERIC standard environment variables, defining numeric and monetary rules, are ignored.
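As an illustration, the sketch below sets DBFORMAT and displays its fields. It assumes the Informix-style layout front:thousands:decimal:back for the value; check the environment variable reference of your version before relying on it. The function name show_dbformat is hypothetical.

```shell
# Assumed DBFORMAT layout (Informix convention): front:thousands:decimal:back
# Here: no leading symbol, '.' as thousands separator, ',' as decimal
# separator, and 'EUR' as trailing currency symbol.
DBFORMAT=':.:,:EUR'
export DBFORMAT

# Split a DBFORMAT value into its four fields for inspection.
show_dbformat() {
  IFS=: read -r front thousands decimal back <<EOF
$1
EOF
  printf 'front=[%s] thousands=[%s] decimal=[%s] back=[%s]\n' \
    "$front" "$thousands" "$decimal" "$back"
}

show_dbformat "$DBFORMAT"
```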

Date and Time Settings

To perform date to/from string conversions, the runtime system uses the DBDATE environment variable by default. When assigning a string to a date variable, the standard environment variable LC_TIME is ignored.

When using the FORMAT field attribute or the USING operator to format dates with abbreviated day and month names - by using ddd / mmm markers - the system uses English-language based texts for the conversion. This means that day (ddd) and month (mmm) abbreviations are not localized according to the locale settings; they are always in English.

Database Client Settings

This section describes the settings defining the locale for the database client.

Each database vendor has its own locale settings.

Warning: You must properly configure the database client locale in order to send/receive data to the database server, according to the locale used by your application. Both database client locale and application locale settings must match (you cannot have a database client locale in Japanese and a runtime locale in Chinese).

Here is the list of environment variables defining the locale used by the application, for each supported database client:

Database Client	Settings
Genero DB	The character set used by the client is defined by the characterset ODBC DSN configuration parameter. If this parameter is not set, it defaults to ASCII. Before version 3.80, the character set was defined by the ANTS_CHARSET environment variable.
Oracle	The client locale settings can be set with environment variables like NLS_LANG, or after connection, with the ALTER SESSION instruction. By default, the client locale is set from the database server locale.
Informix	The client locale is defined by the CLIENT_LOCALE environment variable. For backward compatibility, if CLIENT_LOCALE is not defined, other settings are used if defined (DBDATE / DBTIME / GL_DATE / GL_DATETIME, as well as standard LC_* variables).
IBM DB2	The client locale is defined by the DB2CODEPAGE profile variable. You must set this variable with the db2set command. If DB2CODEPAGE is not set, DB2 uses the operating system code page on Windows and the LANG environment variable on Unix.
Microsoft SQL Server	The client locale is defined by the Windows operating system locale where the database client is installed.
PostgreSQL	The client locale can be set with the PGCLIENTENCODING environment variable, with the client_encoding configuration parameter in postgresql.conf, or after connection, with the SET CLIENT_ENCODING instruction. Check the pg_conversion system table for available character set conversions.
MySQL	The client locale is defined by the default-character-set option in the configuration file, or after connection, with the SET NAMES and SET CHARACTER SET instructions.
Sybase ASA	The client locale is defined by the operating system locale where the database client is installed.

See database vendor documentation for more details.

Front-End Settings

The front-end workstation must support the character set used on the runtime system side. You can refer to each front-end documentation to check the list of supported character sets. The host operating system must also be able to handle the character set. For instance, a Western-European Windows is not configured to handle Arabic applications. If you start an Arabic application, some graphical problems may occur (for instance the title bar won't display Arabic characters, but unwanted characters instead).

Runtime System Messages

Predefined runtime system error messages are stored in the .iem system message files. The system message files use the same technique as user defined message files (See Message Files). The default message files are located in the FGLDIR/msg/en_US directory (.msg sources are provided).

For backward compatibility with Informix 4gl, some of these system error messages are used by the runtime system to report a "normal" error during a dialog instruction. For example, end users may get the error -1309 "There are no more rows in the direction you are going" when scrolling a DISPLAY ARRAY list.

Here are some examples of system messages that can appear during a dialog:

Number	Description
-1204	Invalid year in date.
-1304	Error in field.
-1305	This field requires an entered value.
-1306	Please type again for verification.
-1307	Cannot insert another row - the input array is full.
-1309	There are no more rows in the direction you are going.
and more...

While it is recommended to use Localized Strings to internationalize application messages, you might need to translate the default system messages to a specific locale and language, or you might just want to customize the English messages.

With this technique, you can deploy multiple message files in different languages and locales in the same FGLDIR/msg directory.

To use your own customized system messages, do the following:

Create a new directory under $FGLDIR/msg, using the same name as your current locale. For example, if LANG=fr_FR.ISO8859-1, you must create $FGLDIR/msg/fr_FR.ISO8859-1.
Copy the original system message source files (.msg) from $FGLDIR/msg/en_US to the locale-specific directory (for example, $FGLDIR/msg/$LANG).
Modify the copied source files (.msg suffix).
Re-compile the message files with the fglmkmsg tool to produce .iem files.
Run a program to check that the new messages are used.
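The first two steps can be sketched as a small shell function. This is only an illustration: the function name prepare_locale_messages is hypothetical, and the fglmkmsg command line it prints (source file followed by compiled file) is an assumption to be checked against the tool's usage. The commands are printed rather than executed, so that the copied .msg files can be edited first.

```shell
# Copy the English message sources into a locale-specific directory and
# print the compile commands that remain to be run after editing.
# Arguments: $1 = FGLDIR path, $2 = locale name (same as the runtime locale).
prepare_locale_messages() {
  src_dir=$1/msg/en_US
  dst_dir=$1/msg/$2
  mkdir -p "$dst_dir" || return 1
  cp "$src_dir"/*.msg "$dst_dir"/ || return 1
  for f in "$dst_dir"/*.msg; do
    # Assumed invocation: fglmkmsg <source>.msg <compiled>.iem
    echo "fglmkmsg $f ${f%.msg}.iem"
  done
}

# Typical call, once FGLDIR and LANG are set:
# prepare_locale_messages "$FGLDIR" "$LANG"
```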

Warnings:

The locale can be set with different environment variables (see setlocale manual pages for more details). To identify the locale name, the runtime system first looks for the LC_ALL value, then LC_CTYPE and finally LANG.
Pay attention to locale settings when editing message files: You must use the same locale as the one used at runtime.
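The lookup order LC_ALL, LC_CTYPE, LANG can be reproduced in a shell to find out which directory name the runtime system is likely to look for. This is a sketch of the documented order only; the function name resolve_locale_name is hypothetical.

```shell
# Print the first non-empty value among LC_ALL, LC_CTYPE and LANG.
# Arguments: $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (any may be empty).
# Returns 1 when all three are empty.
resolve_locale_name() {
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

resolve_locale_name "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}" || echo "no locale set"
```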

Troubleshooting

Locale settings (LANG) corrupted on Microsoft platforms

On Microsoft Windows XP / 2000 platforms, some system updates (Service Pack 2) or Office versions set the LANG environment variable to a value intended for Microsoft applications (for example 1033).

Such a value is not recognized by Genero as a valid locale specification. Make sure that the LANG environment variable is properly set in the context of Genero applications.

A form is displayed with invalid characters

You may have different codesets on the client workstation and the application server. The typical mistake that can happen is the following: You have edited a form-file with the encoding CP1253; you compile this form-file on a UNIX-server (encoding ISO-8859-7). When displaying the form, invalid characters will appear. This is usually the case when you write your source file under a Windows system (that uses Microsoft Code Page encodings), and use a Linux server (that uses ISO codepages).

Warning: All source files must be created/edited in the encoding of the server (where fglcomp and fglrun will be executed).
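When a file has already been written in another encoding, it can be converted before compilation. The sketch below wraps the standard iconv utility; the function name convert_source_encoding and the file names in the usage comment are hypothetical, and the codeset names accepted by iconv vary between systems.

```shell
# Convert a source file from the editor's encoding to the server's encoding.
# Arguments: $1 = input file, $2 = output file,
#            $3 = source codeset, $4 = target codeset.
convert_source_encoding() {
  iconv -f "$3" -t "$4" "$1" > "$2"
}

# Example for the CP1253 / ISO-8859-7 case (file names are hypothetical):
# convert_source_encoding custform.per custform.iso.per CP1253 ISO-8859-7
```

iconv typically stops with an error when a character has no equivalent in the target codeset, so check its exit status before compiling the converted file.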

Checking the locale configuration on Unix platforms

On Unix systems, the locale command without parameters outputs information about the current locale environment.

Once the LANG environment variable is set, check that the locale environment is correct:

$ export LANG=en_US.ISO8859-1
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=

If the locale environment is not correct, check the values of the following environment variables: LC_ALL, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, and so on.

The following examples show the effect of LC_ALL and LC_CTYPE on locale configuration. The LC_ALL variable overrides the values of all other LC_* variables.

$ export LANG=en_US.ISO8859-1
$ export LC_ALL=POSIX
$ export LC_CTYPE=fr_FR.ISO8859-15
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
$ fglrun -i mbcs
LANG honored : yes
Charmap      : ANSI_X3.4-1968
Multibyte    : no
Stateless    : yes

The charset used is the ASCII charset. Clearing the LC_ALL environment variable produces the following output:

$ unset LC_ALL
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE=fr_FR.ISO8859-15
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ fglrun -i mbcs
Error: locale not supported by C library, check LANG.
$ locale charmap
ANSI_X3.4-1968

After clearing the LC_ALL value, the value of the LC_CTYPE variable is used. The fglrun error shows that this value is not supported on the system. After clearing LC_CTYPE as well, we get the following output:

$ unset LC_CTYPE
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ locale charmap
ISO-8859-1
$ fglrun -i mbcs
LANG honored : yes
Charmap      : ISO-8859-1
Multibyte    : no
Stateless    : yes

Verifying if the locale is properly supported by the runtime system

You can check if the LANG locale is supported properly by using the -i mbcs option of the compilers and runner programs:

$ fglcomp -i mbcs
LANG honored : yes
Charmap      : ANSI_X3.4-1968
Multibyte    : no
Stateless    : yes

The lines printed with the -i mbcs option indicate if the locale can be supported by the operating system libraries. Here is a short description of each line:

Verification Parameter	Description
LANG Honored	This line indicates that the current locale configuration has been correctly set. Check if the indicator shows 'yes'.
Charmap	This is the name of the character set used by the runtime system.
Multibyte	This line indicates if the character set is multi-byte. Can be 'yes' or 'no'.
Stateless	A few character sets use an internal state that can change during the character flow. Only stateless character sets can be supported by Genero. Check if the indicator shows 'yes'.
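The two indicators that must show 'yes' can be checked automatically, for example in a deployment script. The following sketch reads the -i mbcs report on standard input; the function name mbcs_report_ok is hypothetical, and the parsing assumes the "name : value" layout shown in the sample output.

```shell
# Read "fglrun -i mbcs" output on stdin and succeed only when both
# "LANG honored" and "Stateless" report "yes".
mbcs_report_ok() {
  awk -F: '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the padding around the name
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the padding around the value
    }
    key == "LANG honored" && val == "yes" { honored = 1 }
    key == "Stateless"    && val == "yes" { stateless = 1 }
    END { exit (honored && stateless) ? 0 : 1 }
  '
}

# Typical call:
# fglrun -i mbcs | mbcs_report_ok && echo "locale usable"
```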

How to retrieve the list of available locales on the system

On Unix systems, the locale command with the parameter '-a' writes the names of available locales.

$ locale -a
...
en_US
en_US.iso885915
en_US.utf8
en_ZA
en_ZA.utf8
en_ZW
...

How to retrieve the list of available codesets on the system

On Unix systems, the locale command with the parameter '-m' writes the names of available codesets.

$ locale -m
...
ISO-8859-1
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
...

Using the charmap.alias file when client has different codeset names

The name of the codeset can be different from one system to another. The file $FGLDIR/etc/charmap.alias is used to provide the translation of the local name to a generic name. The generic name is the name sent to the front-end. It is the translated name that appears when the command 'fglrun -i mbcs' is used. The local codeset name is the value obtained using the system call 'nl_langinfo(CODESET)'. Note: This file might be incomplete.

An example of locale configuration on HP

$ export LANG=en_US.iso88591
$ locale
LANG=en_US.iso88591
LC_CTYPE="en_US.iso88591"
LC_COLLATE="en_US.iso88591"
LC_MONETARY="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME="en_US.iso88591"
LC_MESSAGES="en_US.iso88591"
LC_ALL=
$ locale charmap
"iso88591.cm"

The charmap.alias file contains the following line:

iso88591 ISO8859-1

The name sent to the client is ISO-8859-1 instead of iso88591.
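The translation can be reproduced with a short lookup, which helps when checking whether an entry is missing from the file. This sketch assumes one "local-name generic-name" pair per line, as in the line shown above. The function name generic_codeset_name is hypothetical, and falling back to the local name when no entry matches is an assumption, not documented behavior.

```shell
# Look up the generic codeset name for a local codeset name.
# Arguments: $1 = alias file, $2 = local codeset name.
# Prints the local name unchanged when the file has no matching entry.
generic_codeset_name() {
  awk -v local_name="$2" '
    $1 == local_name { print $2; found = 1; exit }
    END { if (!found) print local_name }
  ' "$1"
}

# Typical call:
# generic_codeset_name "$FGLDIR/etc/charmap.alias" iso88591
```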

The following C program should compile and print the current codeset name.

#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
  /* Initialize the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
  setlocale(LC_ALL, "");
  /* Print the codeset name of the current locale. */
  printf("%s\n", nl_langinfo(CODESET));
  return 0;
}

With the previous example this program outputs:

iso88591
