Summary:
See also: Localized Strings
Localization Support allows you to write BDL programs that follow the rules of a specific language and culture. This includes single-byte and multibyte character set support, language-specific messages, and lexical, numeric and currency conventions.
Localization Support is based on the POSIX system libraries that handle locales. A locale is a set of language and cultural rules.
To be portable across cultures, a BDL program must be able to determine its locale and act accordingly.
When writing a form or program source file, you use a specific character set. This character set depends on the settings of the text editor or operating system you use on the development platform. For example, when writing a string constant containing Arabic characters in a 4gl module, you probably use the ISO-8859-6 character set. The character set used at runtime (during program execution) must match the character set used to write the programs.
At runtime, a Genero program can only work in a specific character set. However, by using Localized Strings, you can start multiple instances of the same compiled program using different locales. For a given program instance, the character set used by the string resource files must correspond to the locale. Make sure the string identifiers use ASCII only.
Genero BDL uses byte length semantics: when you define a character data type such as CHAR(n) or VARCHAR(n), n is a number of bytes, not a number of characters. In a single-byte character set like ISO-8859-1, each character is encoded in one byte, so the number of bytes equals the number of characters. In a multibyte character set, some characters need more than one byte, so a multibyte string takes more bytes than it has characters. For example, in the BIG5 encoding one Chinese character needs 2 bytes, so to hold a BIG5 string of up to 10 Chinese characters you must define a CHAR(20). With a variable-length encoding like UTF-8, characters take one, two or more bytes, so you need to size CHAR or VARCHAR variables from an appropriate estimate of the average number of bytes per character.
The definition of database columns using CHAR, VARCHAR, NCHAR and NVARCHAR types varies from one database vendor to another. Some use byte length semantics, others use character length semantics, and others support both. For example, Informix uses bytes only; Oracle supports both byte ("CHAR(10 BYTE)") and character ("CHAR(10 CHAR)") length semantics. SQL Server uses a single-byte character set for CHAR/VARCHAR and a 2-byte Unicode encoding (UCS-2) for NCHAR and NVARCHAR.
Other SQL elements, such as functions and operators, are also affected by the length semantics. For example, the Informix LENGTH() function always returns a number of bytes, while the Oracle LENGTH() function returns a number of characters (use LENGTHB() to get the number of bytes in Oracle).
It is important to understand how your database server handles multibyte character sets. Check your database server reference manual: most include a "Localization" chapter that describes these concepts in detail.
For portability, we recommend using character data types with byte length semantics in the database, because this matches the length semantics used by Genero BDL (this is important when declaring variables with DEFINE LIKE, which is based on database schemas).
This section describes the settings defining the locale, changing the behavior of the runtime system.
The LANG environment variable defines the global settings for the language used by the application. This variable changes the behavior of character handling functions such as UPSHIFT and DOWNSHIFT, as well as the handling of multibyte characters. An invalid LANG setting causes compilation errors if a source file contains multibyte characters.
With the LANG environment variable, you define the language, the territory (country) and the codeset (character set) to use. The value generally has the following format, although some operating systems use specific variants:
language[_territory[.codeset]]
See also Troubleshooting to learn how to check if a locale is properly set, and list the locales installed on your system.
To perform decimal to/from string conversions, the runtime system uses the DBMONEY or DBFORMAT environment variables. These variables define the thousands and decimal separators, and the currency symbols for MONEY data types.
The LC_MONETARY and LC_NUMERIC standard environment variables, defining numeric and monetary rules, are ignored.
To perform date to/from string conversions, the runtime system uses the DBDATE environment variable by default. The standard LC_TIME environment variable is ignored when a string is assigned to a date variable.
When you use the FORMAT field attribute or the USING operator to format dates with abbreviated day and month names (the ddd and mmm markers), the system uses English texts for the conversion. In other words, day (ddd) and month (mmm) abbreviations are not localized according to the locale settings; they are always in English.
This section describes the settings defining the locale for the database client.
Each database vendor has its own locale settings.
Here is the list of environment variables defining the locale used by the application, for each supported database client:
Database Client | Settings |
Genero DB | The character set used by the client is defined by the characterset ODBC DSN configuration parameter. If this parameter is not set, it defaults to ASCII. Before version 3.80, the character set was defined by the ANTS_CHARSET environment variable. |
Oracle | The client locale settings can be set with environment variables like NLS_LANG, or after connection, with the ALTER SESSION instruction. By default, the client locale is set from the database server locale. |
Informix | The client locale is defined by the CLIENT_LOCALE environment variable. For backward compatibility, if CLIENT_LOCALE is not defined, other settings are used if defined (DBDATE / DBTIME / GL_DATE / GL_DATETIME, as well as standard LC_* variables). |
IBM DB2 | The client locale is defined by the DB2CODEPAGE profile variable. You must set this variable with the db2set command. If DB2CODEPAGE is not set, DB2 uses the operating system code page on Windows and the LANG environment variable on Unix. |
Microsoft SQL Server | The client locale is defined by the locale of the Windows operating system where the database client is installed. |
PostgreSQL | The client locale can be set with the PGCLIENTENCODING environment variable, with the client_encoding configuration parameter in postgresql.conf, or after connection, with the SET CLIENT_ENCODING instruction. Check the pg_conversion system table for available character set conversions. |
MySQL | The client locale is defined by the default-character-set option in the configuration file, or after connection, with the SET NAMES and SET CHARACTER SET instructions. |
Sybase ASA | The client locale is defined by the operating system locale where the database client is installed. |
See database vendor documentation for more details.
The front-end workstation must support the character set used on the runtime system side. Refer to the documentation of each front-end for the list of supported character sets. The host operating system must also be able to handle the character set. For instance, a Western European Windows system is not configured to handle Arabic applications. If you start an Arabic application on it, graphical problems may occur (for instance, the title bar displays unwanted characters instead of Arabic characters).
Predefined runtime system error messages are stored in the .iem system message files. The system message files use the same technique as user defined message files (See Message Files). The default message files are located in the FGLDIR/msg/en_US directory (.msg sources are provided).
For backward compatibility with Informix 4GL, the runtime system uses some of these system error messages to report a "normal" error during a dialog instruction. For example, end users may get the error -1309 "There are no more rows in the direction you are going" when scrolling in a DISPLAY ARRAY list.
Here are some examples of system messages that can appear during a dialog:
Number | Description |
-1204 | Invalid year in date. |
-1304 | Error in field. |
-1305 | This field requires an entered value. |
-1306 | Please type again for verification. |
-1307 | Cannot insert another row - the input array is full. |
-1309 | There are no more rows in the direction you are going. |
and more... |
While we recommend using Localized Strings to internationalize application messages, you might need to translate the default system messages into a specific language for a given locale, or simply customize the English messages.
With this technique, you can deploy multiple message files in different languages and locales in the same FGLDIR/msg directory.
To use your own customized system messages, do the following:
On Microsoft Windows XP / 2000 platforms, some system updates (Service Pack 2) or Office versions set the LANG environment variable to a value intended for Microsoft applications (for example, 1033).
Such a value is not recognized by Genero as a valid locale specification. Make sure that the LANG environment variable is properly set in the context of Genero applications.
You may have different codesets on the client workstation and on the application server. A typical mistake is the following: you edit a form file with the CP1253 encoding, then compile it on a UNIX server that uses ISO-8859-7. When the form is displayed, invalid characters appear. This usually happens when you write source files on a Windows system (which uses Microsoft code page encodings) and use a Linux server (which uses ISO code pages).
On Unix systems, the locale command without parameters outputs information about the current locale environment.
Once the LANG environment variable is set, check that the locale environment is correct:
$ export LANG=en_US.ISO8859-1
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
If the locale environment is not correct, check the values of the following environment variables: LC_ALL, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, and the other LC_* variables.
The following examples show the effect of LC_ALL and LC_CTYPE on the locale configuration. The LC_ALL variable overrides the values of all other LC_* variables.
$ export LANG=en_US.ISO8859-1
$ export LC_ALL=POSIX
$ export LC_CTYPE=fr_FR.ISO8859-15
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
$ fglrun -i mbcs
LANG honored : yes
Charmap : ANSI_X3.4-1968
Multibyte : no
Stateless : yes
The character set in use is ASCII (reported as ANSI_X3.4-1968). Clearing the LC_ALL environment variable produces the following output:
$ unset LC_ALL
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE=fr_FR.ISO8859-15
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ fglrun -i mbcs
Error: locale not supported by C library, check LANG.
$ locale charmap
ANSI_X3.4-1968
Once LC_ALL is cleared, the value of LC_CTYPE (fr_FR.ISO8859-15) takes effect. This locale is not supported on this system: fglrun reports an error, and the charmap falls back to ANSI_X3.4-1968. After unsetting LC_CTYPE as well, the output is:
$ unset LC_CTYPE
$ locale
LANG=en_US.ISO8859-1
LC_CTYPE="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=
$ locale charmap
ISO-8859-1
$ fglrun -i mbcs
LANG honored : yes
Charmap : ISO-8859-1
Multibyte : no
Stateless : yes
You can check whether the LANG locale is properly supported by using the -i mbcs option of the compiler and runner programs:
$ fglcomp -i mbcs
LANG honored : yes
Charmap : ANSI_X3.4-1968
Multibyte : no
Stateless : yes
The lines printed with the -i mbcs option indicate whether the locale can be supported by the operating system libraries. Here is a short description of each line:
Verification Parameter | Description |
LANG honored | Indicates whether the current locale configuration has been correctly set. |
Charmap | The name of the character set used by the runtime system. |
Multibyte | Indicates whether the character set is multibyte: 'yes' or 'no'. |
Stateless | A few character sets use an internal state that can change during the character flow. Only stateless character sets are supported by Genero; check that this indicator shows 'yes'. |
On Unix systems, the locale command with the parameter '-a' writes the names of available locales.
$ locale -a
...
en_US
en_US.iso885915
en_US.utf8
en_ZA
en_ZA.utf8
en_ZW
...
On Unix systems, the locale command with the parameter '-m' writes the names of available codesets.
$ locale -m
...
ISO-8859-1
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
...
The name of a codeset can differ from one system to another. The file $FGLDIR/etc/charmap.alias translates the local name into a generic name. The generic name is the name sent to the front-end, and it is the name displayed by the 'fglrun -i mbcs' command. The local codeset name is the value returned by the system call 'nl_langinfo(CODESET)'. Note: This file might be incomplete.
An example of locale configuration on HP-UX:
$ export LANG=en_US.iso88591
$ locale
LANG=en_US.iso88591
LC_CTYPE="en_US.iso88591"
LC_COLLATE="en_US.iso88591"
LC_MONETARY="en_US.iso88591"
LC_NUMERIC="en_US.iso88591"
LC_TIME="en_US.iso88591"
LC_MESSAGES="en_US.iso88591"
LC_ALL=
$ locale charmap
"iso88591.cm"
The charmap.alias file contains the following line:
iso88591 ISO8859-1
The name sent to the client is ISO-8859-1 instead of iso88591.
The following C program prints the name of the current codeset, as returned by nl_langinfo(CODESET):
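#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
    /* Initialize the locale from the environment (LANG, LC_*) */
    setlocale(LC_ALL, "");
    /* Print the codeset name of the current locale */
    printf("%s\n", nl_langinfo(CODESET));
    exit(0);
}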
With the configuration shown above, this program outputs:
iso88591