Localization

Summary:

Localization Support
Writing Programs
Runtime System Settings
Database Client Settings
Front-End Settings
Troubleshooting

Localization Support

Localization Support allows you to write BDL programs that follow the rules of a specific language and culture. This includes support for single-byte and multi-byte character sets, language-specific messages, and lexical, numeric, and currency conventions.

Localization Support relies on the system libraries that handle the locale. A locale is a set of language and cultural rules.

To be portable across cultures, a BDL program must be able to determine its locale and act accordingly.

Writing Programs

When writing a form or program source file, you use a specific character set. This character set depends on the text editor or operating system settings of the development platform. For example, a string constant containing Arabic characters in a 4gl module is probably written in the ISO-8859-6 character set.

Runtime System Settings

This section describes the settings that define the locale and change the behavior of the runtime system.

Language Settings

The LANG environment variable defines the global settings for the language used by the application. This variable changes the behavior of the character handling functions, such as UPSHIFT and DOWNSHIFT. It also changes the handling of multi-byte characters. An invalid LANG setting causes compilation errors if a source file contains multi-byte characters.

With the LANG environment variable, you define the language, the territory (country) and the codeset (character set) to be used. The value normally has the following format, although some operating systems use their own conventions:

language[_territory[.codeset]]
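As a rough illustration of this format, the following shell sketch splits a locale name into its three parts. The parse_locale function is a hypothetical helper, not a Genero or system command, and it does not handle vendor-specific extensions to the format.

```shell
# parse_locale NAME  (hypothetical helper)
# Splits language[_territory[.codeset]] into its parts.
# The body runs in a subshell so its variables do not leak.
parse_locale() (
    name=$1
    territory=
    codeset=
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    printf 'language=%s territory=%s codeset=%s\n' "$name" "$territory" "$codeset"
)

parse_locale "en_US.ISO8859-1"    # language=en territory=US codeset=ISO8859-1
parse_locale "English_USA.1252"   # language=English territory=USA codeset=1252
parse_locale "fr_FR"              # language=fr territory=FR codeset=
```

Optional parts that are missing from the name are reported as empty strings.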

Warning: Most operating system vendors define a specific set of values for the language, territory and codeset. For example, on a UNIX platform, you typically set "en_US.ISO8859-1" for a US English locale, while Microsoft Windows supports "English_USA.1252" or "en_us.1252".

For more details about supported locales, please refer to the operating system documentation (search for the 'setlocale' function).

See also Troubleshooting to learn how to check whether a locale is properly set and how to list the locales installed on your system.

Numeric / Currency Settings

The standard environment variables (LC_MONETARY and LC_NUMERIC) defining numeric and monetary rules are ignored. The runtime system uses the DBMONEY or DBFORMAT environment variables to define numeric formatting and currency symbols.

Date and Time Settings

The standard environment variable (LC_TIME) defining date and time rules is ignored. You must use the DBDATE environment variable to define date formatting.
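For the numeric and date settings described above, a configuration might look like the following sketch. The values are examples only. The DBFORMAT layout shown (front symbol, thousands separator, decimal separator, back symbol, separated by colons) and the DBDATE layout (order of day, month and year, followed by a separator) are assumptions based on the Informix-style syntax; check the reference pages of these variables before relying on them.

```shell
# Example values only (assumptions); adapt them to your conventions.

# DBFORMAT fields, separated by colons:
#   front symbol : thousands separator : decimal separator : back symbol
DBFORMAT=':.:,:EUR'    # no front symbol, "." for thousands, "," for decimals, EUR as back symbol

# DBDATE: order of day (D), month (M) and year (Y4 = 4 digits), then the separator
DBDATE='DMY4/'         # for example 31/12/2024

export DBFORMAT DBDATE
```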

Database Client Settings

This section describes the settings defining the locale for the database client.

Each database vendor has its own locale settings.

Warning: You must properly configure the database client locale in order to send data to and receive data from the database server, according to the locale used by your application. The database client locale and the application locale must match (for example, you cannot combine a Japanese database client locale with a Chinese runtime locale).

The following table lists the settings that define the database client locale for each supported database client:

Database Client	Settings
Genero DB	DB Client locale is defined by the user default locale (i.e. Regional Options) on Windows platforms, and by the LANG environment variable on Unix platforms.
Oracle	The DB client locale settings can be set with environment variables like NLS_LANG, or after connection, with the ALTER SESSION instruction. By default, the client locale is set from the database server locale.
Informix	The DB client locale is defined by the CLIENT_LOCALE environment variable. For backward compatibility, if CLIENT_LOCALE is not defined, other settings are used if defined (DBDATE / DBTIME / GL_DATE / GL_DATETIME, as well as standard LC_* variables).
IBM DB2	The DB client locale is defined by the DB2CODEPAGE profile variable. You must set this variable with the db2set command. If DB2CODEPAGE is not set, DB2 uses the operating system code page on Windows and the LANG environment variable on Unix.
Microsoft SQL Server	The DB client locale is defined by the Windows operating system locale where the database client is installed.
PostgreSQL	The DB client locale can be set with the PGCLIENTENCODING environment variable, with the client_encoding configuration parameter in postgresql.conf, or after connection, with the SET CLIENT_ENCODING instruction. Check the pg_conversion system table for available character set conversions.
MySQL	The DB client locale is defined by the default-character-set option in the configuration file, or after connection, with the SET NAMES and SET CHARACTER SET instructions.
Sybase ASA	The DB client locale is defined by the operating system locale where the database client is installed.
Adabas	The DB client locale is defined by the operating system locale where the database client is installed.

See database vendor documentation for more details.
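As an illustration, the following sketch pairs a US English ISO-8859-1 application locale with matching client settings for some of the databases listed above. The values are examples, not recommendations; verify the exact names against the documentation of your database version, and set only the variables that apply to the database client you use.

```shell
# Application locale used by the runtime system
export LANG=en_US.ISO8859-1

# Matching database client settings (set only the one you need)
export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1   # Oracle
export CLIENT_LOCALE=en_us.8859-1               # Informix
export PGCLIENTENCODING=LATIN1                  # PostgreSQL

# IBM DB2 uses a profile variable rather than the environment:
#   db2set DB2CODEPAGE=819
```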

Front-End Settings

The front-end workstation must support the character set used on the runtime system side. Refer to the documentation of each front-end for the list of supported character sets. The host operating system must also be able to handle the character set. For instance, a Western European Windows system is not configured to handle Arabic applications. If you start an Arabic application on such a system, display problems may occur (for instance, the title bar shows unwanted characters instead of Arabic characters).

Troubleshooting

A form is displayed with invalid characters

The client workstation and the application server may use different codesets. A typical mistake is to edit a form file with the CP1253 encoding and then compile it on a UNIX server that uses the ISO-8859-7 encoding. When the form is displayed, invalid characters appear. This usually happens when you write source files on a Windows system (which uses Microsoft code page encodings) and use a Linux server (which uses ISO code pages).

Warning: All source files must be created/edited in the encoding of the server (where fglcomp and fglrun will be executed).
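If a source file was already written in another encoding, one possible remedy is to convert it with the standard iconv utility before compiling it on the server. The sketch below is only an illustration: to_server_encoding is a hypothetical helper, the file names are placeholders, and the encoding names accepted by iconv vary between systems (iconv -l lists them).

```shell
# to_server_encoding FROM TO INFILE OUTFILE  (hypothetical helper)
# Converts INFILE from encoding FROM to encoding TO and writes OUTFILE.
# iconv fails if the input contains a character that has no
# equivalent in the target encoding.
to_server_encoding() {
    iconv -f "$1" -t "$2" "$3" > "$4"
}

# Example (placeholder file names):
#   to_server_encoding CP1253 ISO-8859-7 customers_win.per customers.per
```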

Checking the locale configuration on Unix platforms

On Unix systems, the locale command without parameters outputs information about the current locale environment.

Once the LANG environment variable is set, check that the locale environment is correct:

$ export LANG=en_US.ISO8859-1
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=

If the locale environment is not correct, check the values of the following environment variables: LC_ALL, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, and so on.

The following examples show the effect of LC_ALL and LC_CTYPE on the locale configuration. The LC_ALL variable overrides the values of all other LC_* variables.

$ export LANG=en_US.ISO8859-1
$ export LC_ALL=POSIX
$ export LC_CTYPE=fr_FR.ISO8859-15
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
$ fglrun -i mbcs
LANG honored : yes
Charmap      : ANSI_X3.4-1968
Multibyte    : no
Stateless    : yes

The character set in use is ASCII (reported as ANSI_X3.4-1968). Clearing the LC_ALL environment variable produces the following output:

$ unset LC_ALL
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE=fr_FR.ISO8859-15
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ fglrun -i mbcs
Error: locale not supported by C library, check LANG.
$ locale charmap
ANSI_X3.4-1968

After LC_ALL is cleared, the value of the LC_CTYPE variable is used. The error message shows that this value is not supported on this system. After clearing LC_CTYPE as well, we get the following output:

$ unset LC_CTYPE
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ locale charmap
ISO-8859-1
$ fglrun -i mbcs
LANG honored : yes
Charmap      : ISO-8859-1
Multibyte    : no
Stateless    : yes
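The precedence seen in the sessions above (LC_ALL first, then LC_CTYPE, then LANG) can be summarized with a small shell sketch. The effective_ctype function is a hypothetical helper written for this illustration; it only mirrors the lookup order and does not check whether the resulting locale is actually installed.

```shell
# effective_ctype LC_ALL_VALUE LC_CTYPE_VALUE LANG_VALUE  (hypothetical helper)
# Prints the locale name that applies to character handling:
# the first non-empty value wins, POSIX is the fallback.
effective_ctype() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    elif [ -n "${3:-}" ]; then
        printf '%s\n' "$3"
    else
        printf 'POSIX\n'
    fi
}

# The three situations of the sessions above:
effective_ctype "POSIX" "fr_FR.ISO8859-15" "en_US.ISO8859-1"   # POSIX
effective_ctype ""      "fr_FR.ISO8859-15" "en_US.ISO8859-1"   # fr_FR.ISO8859-15
effective_ctype ""      ""                 "en_US.ISO8859-1"   # en_US.ISO8859-1
```

In a real session you would call it as: effective_ctype "$LC_ALL" "$LC_CTYPE" "$LANG"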

Verifying if the locale is properly supported by the runtime system

You can check whether the LANG locale is properly supported by using the -i mbcs option of the compiler and runner programs (fglcomp and fglrun):

$ fglcomp -i mbcs
LANG honored : yes
Charmap      : ANSI_X3.4-1968
Multibyte    : no
Stateless    : yes

The lines printed with the -i mbcs option indicate whether the locale is supported by the operating system libraries. Here is a short description of each line:

Verification Parameter	Description
LANG honored	This line indicates whether the current locale configuration has been set correctly. Check that the indicator shows 'yes'.
Charmap	This is the name of the character set used by the runtime system.
Multibyte	This line indicates whether the character set is multi-byte. Can be 'yes' or 'no'.
Stateless	A few character sets use an internal state that can change during the character flow. Only stateless character sets are supported by Genero. Check that the indicator shows 'yes'.
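The two checks described in the table can be automated, for example in a deployment script. The following sketch reads a report in the format shown above from standard input; check_mbcs_report is a hypothetical helper, and the sketch assumes that the tool writes the report to standard output.

```shell
# check_mbcs_report  (hypothetical helper)
# Reads the output of "fglrun -i mbcs" or "fglcomp -i mbcs" on stdin.
# Succeeds only if both "LANG honored" and "Stateless" show "yes".
check_mbcs_report() (
    report=$(cat)
    printf '%s\n' "$report" |
        grep -Eq '^LANG honored[[:space:]]*:[[:space:]]*yes[[:space:]]*$' || exit 1
    printf '%s\n' "$report" |
        grep -Eq '^Stateless[[:space:]]*:[[:space:]]*yes[[:space:]]*$' || exit 1
    exit 0
)

# Example:
#   fglrun -i mbcs | check_mbcs_report || echo "locale not usable" >&2
```

An error report such as "Error: locale not supported by C library, check LANG." contains neither line, so the check fails in that case as well.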

How to retrieve the list of available locales on the system

On Unix systems, the locale command with the parameter '-a' writes the names of available locales.

$ locale -a
...
en_US
en_US.iso885915
en_US.utf8
en_ZA
en_ZA.utf8
en_ZW
...
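Note that the names printed by locale -a may be spelled differently from the value you put in LANG (for example en_US.iso885915 in the list above versus en_US.ISO8859-15). The following sketch checks whether a locale appears in such a list while ignoring case and hyphens. Both functions are hypothetical helpers, and the normalization rule is an assumption that matches the spelling shown above; other systems may use other conventions.

```shell
# normalize_locale NAME  (hypothetical helper)
# Lowercases the name and removes hyphens: en_US.ISO8859-15 -> en_us.iso885915
normalize_locale() {
    printf '%s\n' "$1" | tr -d '-' | tr '[:upper:]' '[:lower:]'
}

# has_locale NAME  (hypothetical helper)
# Reads a list of locale names (one per line) on stdin and succeeds
# if NAME is in the list after normalization.
has_locale() (
    wanted=$(normalize_locale "$1")
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$wanted" ] && exit 0
    done
    exit 1
)

# Example:
#   locale -a | has_locale "en_US.ISO8859-15" && echo "installed"
```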

How to retrieve the list of available codesets on the system

On Unix systems, the locale command with the parameter '-m' writes the names of available codesets.

$ locale -m
...
ISO-8859-1
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
...

Using the charmap.alias file when the client has different codeset names

The name of the codeset can be different from one system to another. The file $FGLDIR/etc/charmap.alias is used to provide the translation of the local name to a generic name. The generic name is the name sent to the front-end. It is the translated name that appears when the command 'fglrun -i mbcs' is used. The local codeset name is the value obtained using the system call 'nl_langinfo(CODESET)'. Note: This file might be incomplete.

An example of locale configuration on HP

$ export LANG=en_US.iso88591
$ locale
LANG=en_US.iso88591
LC_CTYPE="en_US.iso88591"
LC_COLLATE="en_US.iso88591"
LC_MONETARY="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME="en_US.iso88591"
LC_MESSAGES="en_US.iso88591"
LC_ALL=
$ locale charmap
"iso88591.cm"

The charmap.alias file contains the following line:

iso88591 ISO8859-1

The name sent to the client is ISO-8859-1 instead of iso88591.
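The translation step can be pictured with the following sketch, which looks up a local codeset name in an alias file. It assumes that each line of the file holds the local name followed by the generic name, separated by white space, as in the line shown above; generic_codeset is a hypothetical helper and not the code used by the runtime system.

```shell
# generic_codeset LOCAL_NAME ALIAS_FILE  (hypothetical helper)
# Prints the generic name found for LOCAL_NAME in ALIAS_FILE,
# or LOCAL_NAME itself when the file has no entry for it.
generic_codeset() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) print name }
    ' "$2"
}

# Example:
#   generic_codeset iso88591 "$FGLDIR/etc/charmap.alias"
```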

The following C program prints the current codeset name:

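#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
  /* Adopt the locale defined by the environment (LANG, LC_*). */
  if (setlocale(LC_ALL, "") == NULL) {
    fprintf(stderr, "locale not supported by the C library\n");
    return EXIT_FAILURE;
  }
  /* Print the codeset name as reported by the C library. */
  printf("%s\n", nl_langinfo(CODESET));
  return EXIT_SUCCESS;
}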

With the previous locale configuration, this program outputs:

iso88591