PHP Localization

When you set up a multi-language site, at some point you need to decide how to handle the static text on the site.

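There are a few options available when working with PHP: gettext, associative arrays, the PHP internationalization extension (Intl) and Translation2. Each of them is covered below.

When dealing with localization and internationalization, you may need all or some of the following features: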

  1. storing and retrieving of strings by locale
  2. message formatting functions including support for variables and placeholders
  3. support for singular, plural and other forms
  4. support for sorting according to different locales
  5. support for different date, time and currency formats
  6. tools for extraction from source code
  7. tools for having the application translated and for loading the translations
  8. a possibility to add new languages without changing the application
  9. a fallback mechanism, e.g. loading the de_DE translation in case the de_AT translation is not available.

gettext

gettext™ is the GNU library for internationalization and translation. It is also natively supported by PHP. gettext is quite widely used (most notably in WordPress). It is easy to use and quite powerful. It focuses on the storage, retrieval and formatting of messages.

To use gettext in PHP, PHP must either have been built with the --with-gettext option, or the gettext extension must be enabled in php.ini so that it is loaded as a shared/dynamic library.

On Windows, php.ini should contain the following line:

extension=php_gettext.dll

Unix/Linux users:

extension=gettext.so

You can check whether your PHP installation supports gettext with the following code:

<?php

if (function_exists("gettext")) {
    echo "gettext installed";
}
else {
    echo "gettext not installed";
}
?>

gettext supports the concept of translation domains. A domain defines a translation scope, which makes it possible to have different translations for the same key in different libraries. Let’s say you use two different libraries which both show the string “I am going to %s” in English. In one library the placeholder contains an action, in the other one the name of a country. In German, the first one becomes “Ich werde %s” and the second one “Ich fahre nach %s”. The two must not be mixed, and each library needs to get its own translation. This is done through domains: each library defines its own domain and does not see the translations of other domains.

A domain also defines a root path for the translations. It is defined like this:

bindtextdomain("mytextdomainname", "mytextdomainlocaleroot");

After that you can also set a default text domain (it defines the text domain used when no text domain is specified):

textdomain("mytextdomainname");

A common choice for the text domain name is the name of the software, library or plugin.

Under the translation root path, you’ll have the following directory structure:

/translation_root_path
    /en_US
        /LC_MESSAGES
            messages.po
            messages.mo
    /de_DE
        /LC_MESSAGES
            messages.po
            messages.mo

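The messages.po file is a text file containing all translated entries. The messages.mo file is a compiled binary file read by gettext. Note that gettext looks for a file named after the text domain, so messages.mo corresponds to a domain called messages; for the domain bound above, the file would be named mytextdomainname.mo. Getting a messages.mo file involves the following steps:

  1. Extract all strings to be translated from the source code using xgettext. This will create a messages.po file
  2. Translate all strings in messages.po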
  3. Compile messages.po to messages.mo using msgfmt

The input for xgettext is a bunch of source code files:

xgettext *.php

You can also use the following options:

  • -n to have the location of the string (which line of which file) written as a comment
  • -j to merge with an existing file
  • -o to specify an output file

You can also merge two .po files like this:

msgmerge old.po new.po --output-file=messages.po

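This combines the existing translations in old.po with the up-to-date strings in new.po and writes the result to messages.po. Since msgmerge writes to standard output by default, it is equivalent to: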

msgmerge old.po new.po > messages.po

You can also run it in update mode:

msgmerge -U messages.po new.po

This will merge new.po into an existing messages.po. So basically when you make changes, you’ll create a new .po file and merge it with the existing one:

xgettext -o new.po *.php
msgmerge -U messages.po new.po
rm new.po

Translations and messages.po files can be updated and managed using Poedit. Poedit is a free GUI tool for computer-aided translation of documentation and user interfaces. It is basically a UI frontend for the GNU gettext tools and is used to edit .po files. It provides functionality for managing translation projects with a minimalistic UI. Unfinished translations are specially marked so that they are easy to spot, translation catalogs can be imported, and missing translations can be filled in automatically from a catalog.

Once the messages.po file is translated, you can compile it like this:

msgfmt messages.po

This generates the messages.mo file which can then be copied to the appropriate directory as shown above.

To set the language which is then used to load the appropriate messages.mo file, use the following code:

$language = "en_US";
putenv("LANG=" . $language); 
setlocale(LC_ALL, $language);

Now you can use the following to get a translated text:

echo _("Hello World!");

_(…) is a shortcut for gettext(…). Since you’ll be writing many such calls, you’ll soon be glad to have this shortcut :-).
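Putting the steps above together, a minimal sketch could look like the following. The helper name init_gettext, the ./locale root directory and the choice of de_DE are assumptions made for illustration only. The sketch also assumes that the requested locale is installed on the system, since setlocale fails otherwise.

```php
<?php
// Hypothetical helper: configures gettext for one locale and one domain.
// Returns false when the gettext extension is missing or when the locale
// is not installed on the system.
function init_gettext($locale, $domain, $root) {
    if (!function_exists('gettext')) {
        return false;
    }
    putenv('LANG=' . $locale);
    // setlocale tries each candidate in turn and returns false if none is available
    $ok = setlocale(LC_ALL, $locale, $locale . '.UTF-8', $locale . '.utf8');
    // gettext will look for $root/<locale>/LC_MESSAGES/$domain.mo
    bindtextdomain($domain, $root);
    bind_textdomain_codeset($domain, 'UTF-8');
    textdomain($domain);
    return $ok !== false;
}

$ready = init_gettext('de_DE', 'messages', __DIR__ . '/locale');

if (function_exists('gettext')) {
    // Without a matching messages.mo this simply prints the original string
    echo _('Hello World!'), PHP_EOL;
}
```

Checking the return value of setlocale is worthwhile in practice: a missing system locale is a common reason for translations silently not showing up.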

The gettext function returns the translated string from the current text domain (set using the textdomain function). Sometimes you need a translation from a domain other than the default one. This is often the case when you work on a plugin for an application: the default text domain belongs to the application and the plugin should not change it. In that case, use dgettext instead:

echo dgettext("mytextdomainname", "mystring");
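As a rough sketch of the application/plugin scenario, the following binds two domains and queries each of them. The domain names myapp and myplugin and both directory paths are hypothetical. Without the corresponding .mo files, both calls simply return the original string.

```php
<?php
// Hypothetical domain names and paths, chosen only for illustration.
if (function_exists('gettext')) {
    // The application and the plugin each bind their own domain
    bindtextdomain('myapp', __DIR__ . '/locale');
    bindtextdomain('myplugin', __DIR__ . '/plugins/myplugin/locale');

    // The application's domain stays the default one
    textdomain('myapp');

    // Looked up in myapp.mo
    $fromApp = gettext('I am going to %s');
    // Looked up in myplugin.mo; the default domain is left untouched
    $fromPlugin = dgettext('myplugin', 'I am going to %s');

    printf($fromApp . PHP_EOL, 'sleep');
    printf($fromPlugin . PHP_EOL, 'Germany');
}
```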

There are also versions of these two functions which can handle singular/plural: ngettext and dngettext. Here is an example with ngettext:

echo ngettext("an apple", "a few apples", $no_of_apples);

ngettext will check whether the third parameter is one (in which case it will use the first string) or not (in which case it will use the second string). This is the rule for English; other languages may define different rules in their translation catalog. Here is another example which actually writes the number of apples:

printf(ngettext("%d apple", "%d apples", $no_of_apples), $no_of_apples);
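Which form is picked depends on the plural rule of the target language, which gettext reads from the Plural-Forms header of the translation catalog. The sketch below mimics that selection in plain PHP for three languages. The function name plural_index and the hard-coded rules are illustrative assumptions (they mirror the commonly published rules for English, French and Polish) and are not part of the gettext API.

```php
<?php
// Illustrative only: returns the index of the plural form that would be
// selected for $n in the given language.
function plural_index($language, $n) {
    switch ($language) {
        case 'fr':
            // French: 0 and 1 both use the first form
            return $n > 1 ? 1 : 0;
        case 'pl':
            // Polish has three forms
            if ($n == 1) {
                return 0;
            }
            $mod10 = $n % 10;
            $mod100 = $n % 100;
            return ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 10 || $mod100 >= 20)) ? 1 : 2;
        default:
            // English and many other languages: everything except 1 is plural
            return $n != 1 ? 1 : 0;
    }
}

// Polish forms of "apple"
$forms = array('%d jabłko', '%d jabłka', '%d jabłek');
foreach (array(1, 3, 5, 22) as $n) {
    printf($forms[plural_index('pl', $n)] . PHP_EOL, $n);
}
```

This is why the two strings passed to ngettext are only the English forms: a catalog for a language like Polish can hold more than two translations for the same entry.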

Note that gettext .mo files are cached, so if you modify a messages.mo file you might notice that the changes are not active. They will only become active after you restart the web server. A few workarounds have also been described elsewhere.

The big advantage of the gettext approach is that you can use gettext without defining any translation at first and then define the translations. Since the strings used as keys are the text in the default language, you will always have a text displayed even if the corresponding translation does not exist yet. The downside is that you basically mix the text for the default language and code which some might find unclean.

Also gettext doesn’t address the conversion of date, time and currencies. It also doesn’t address the language specific sorting of localized entries.

You can find more info on gettext on the gettext GNU project page.

Note that WordPress uses gettext for translation and defines a few additional helper functions:

__('some message') // returns the translation for 'some message'
_e('some message') // echoes the translation for 'some message'
_n('some singular message', 'some plural message', $count ) // returns either the translation for 'some singular message' or 'some plural message' depending on the value of $count

Associative Arrays

This technique basically involves having PHP files containing a big associative array whose keys are referenced in the code and whose values contain the translation in a given language. Based on the language selected in the application, you will load one array or the other.

Defining such an array is as easy as:

$lang = array(
	"HELLO_WORLD" => "Hallo Welt!",
);

for the German version and the following for the French version:

$lang = array(
	"HELLO_WORLD" => "Bonjour monde !",
);

In your code, you’d load the appropriate translation like this:

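<?php
//Start the session (if not already done elsewhere) so that $_SESSION is persisted
if (session_status() === PHP_SESSION_NONE) {
	session_start();
}
if(isset($_REQUEST['lang'])) {
	//Explicitly set as URL parameter
	$lang = $_REQUEST['lang']; //read it from the URL parameters
	$_SESSION['lang'] = $lang; //save it in the session
	setcookie('lang', $lang, time() + (3600 * 24 * 20)); //set a cookie for 20 days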
}
else if(isset($_SESSION['lang'])) {
	//Fallback: read it from the session information
	$lang = $_SESSION['lang'];
}
else if(isset($_COOKIE['lang'])) {
	//Fallback: read it from the cookie
	$lang = $_COOKIE['lang'];
}
else {
	//Fallback: English is the default
	$lang = 'en';
}

//The files are named lang.LANGUAGE.php e.g. lang.en.php
//Assumption: language files are stored in the langs subdirectory
$lang_file = 'langs/lang.'.$lang.'.php';
//Only accept simple language codes, since $lang may come from user input
if (!preg_match('/^[a-z]{2}(_[A-Z]{2})?$/', $lang) || !file_exists($lang_file)) {
	//Fallback: if the code is invalid or no translation is available, use English
	$lang_file = 'langs/lang.en.php';
}

//Load the selected translation
include_once $lang_file;
?>

Now that you’ve loaded the file, you can access the translated string like this:

<?php
	echo $lang['HELLO_WORLD'];
?>

This works fine as long as the translation has no parameters. Let’s say you want to write “I’ve seen Henri” but implement it so that if $person contains Alex instead of Henri, you can just do something like “I’ve seen $person”. Of course, just doing it this way will not work:

<?php
	echo $lang['I_HAVE_SEEN'].' '.$person;
?>

In German, this would produce “Ich habe gesehen Henri” instead of “Ich habe Henri gesehen”. So you need to be able to define placeholders. You can handle this with a function like this:

function translate() {
	global $lang;
	$parameters = func_get_args();
	$parameters[0] = $lang[$parameters[0]];
	return call_user_func_array('sprintf', $parameters);
}

Here is a short line-by-line explanation of the function body:

  1. we use the $lang variable defined outside of the function.
  2. func_get_args returns all parameters of the containing function.
  3. we replace the key passed as first parameter with the format string defined in the translation array.
  4. we call sprintf with the entries of the array as parameters and return the result.

And you can use this function like this:

echo translate('I_HAVE_SEEN', 'Henri');
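Because the translated string is handed to sprintf, a translation can also use numbered placeholders such as %1$s and %2$s to change the order of the arguments. Here is a small sketch of that idea. The helper translate_in (a variant which takes the array as a parameter so that two languages can be shown side by side), the key SEEN_AT and both translation arrays are made up for this example.

```php
<?php
// Hypothetical variant of translate: the translation array is passed in
// instead of being read from a global variable.
function translate_in(array $lang, $key, ...$args) {
    return vsprintf($lang[$key], $args);
}

// Made-up entries: %1$s is the person, %2$s the place.
// Single quotes keep PHP from interpreting $s as a variable.
$en = array('SEEN_AT' => 'I have seen %1$s in %2$s');
$de = array('SEEN_AT' => 'In %2$s habe ich %1$s gesehen');

echo translate_in($en, 'SEEN_AT', 'Henri', 'Paris'), PHP_EOL; // I have seen Henri in Paris
echo translate_in($de, 'SEEN_AT', 'Henri', 'Paris'), PHP_EOL; // In Paris habe ich Henri gesehen
```

The calling code passes the arguments in the same order for every language; only the translation decides where each one ends up.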

Disadvantages of this method:

  • It doesn’t support plural forms e.g. one child vs. two children
  • There is no default. If the translation doesn’t contain the string, you have a problem
  • It only takes care of storing and reading the translation, nothing else
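The first two points can be softened with a little extra code. The sketch below shows one possible approach, not part of the function above: it falls back to a default-language array, and finally to the key itself, when an entry is missing, and it stores plural entries as a small array. All names (lookup_entry, translate_plural, the sample keys) are hypothetical, and the plural handling only covers languages with exactly two forms, such as English and German.

```php
<?php
// Hypothetical helper: looks a key up in the selected language first,
// then in the default language, and finally returns the key itself.
function lookup_entry(array $lang, array $default, $key) {
    if (isset($lang[$key])) {
        return $lang[$key];
    }
    return isset($default[$key]) ? $default[$key] : $key;
}

// Hypothetical helper for entries stored as array(singular, plural).
// Only valid for languages with exactly two forms.
function translate_plural(array $lang, array $default, $key, $n) {
    $forms = lookup_entry($lang, $default, $key);
    if (!is_array($forms)) {
        return sprintf($forms, $n);
    }
    return sprintf($forms[$n == 1 ? 0 : 1], $n);
}

$en = array(
    'HELLO_WORLD' => 'Hello world!',
    'CHILDREN'    => array('%d child', '%d children'),
);
$de = array(
    'CHILDREN' => array('%d Kind', '%d Kinder'),
);

echo lookup_entry($de, $en, 'HELLO_WORLD'), PHP_EOL;     // Hello world!
echo lookup_entry($de, $en, 'GOODBYE'), PHP_EOL;         // GOODBYE
echo translate_plural($de, $en, 'CHILDREN', 1), PHP_EOL; // 1 Kind
echo translate_plural($de, $en, 'CHILDREN', 2), PHP_EOL; // 2 Kinder
```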

PHP internationalization extension

Intl is the PHP internationalization extension and was introduced in PHP 5.3. It is basically a wrapper for the ICU (International Components for Unicode) library. It focuses on collation and formatting of dates, times, numbers and currencies.
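To give a rough idea of what this covers, here is a short sketch using three of the extension's classes: Collator, NumberFormatter and IntlDateFormatter. The sample values are made up, and the code assumes the intl extension is installed. The exact currency and date output depends on the ICU version, so no output is shown for those two calls.

```php
<?php
// Requires the intl extension; the sample values are made up.
if (extension_loaded('intl')) {
    // Locale-aware sorting: in German, "Ä" sorts next to "A"
    $words = array('Zebra', 'Äpfel', 'Apfel');
    $collator = new Collator('de_DE');
    $collator->sort($words);
    echo implode(', ', $words), PHP_EOL; // Apfel, Äpfel, Zebra

    // Number formatting with German separators
    $number = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
    echo $number->format(1234.5), PHP_EOL; // 1.234,5

    // Currency formatting following the conventions of the locale
    $currency = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
    echo $currency->formatCurrency(1234.5, 'EUR'), PHP_EOL;

    // Date formatting: long date, no time, explicit time zone
    $date = new IntlDateFormatter('de_DE', IntlDateFormatter::LONG, IntlDateFormatter::NONE, 'UTC');
    echo $date->format(0), PHP_EOL;
}
```

Since Intl does not deal with storing or retrieving translated strings, it complements gettext or the array approach rather than replacing them.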

Translation2

Translation2 is a class to manage internationalization and translations in applications. The main advantages of Translation2 are:

  • It supports different containers for the translations e.g. in a database, using gettext, using XML.
  • It’s very flexible and supports decorators which can provide additional functionality e.g. caching, converting, fallbacks…
  • It supports fallback languages which is not always easy with the other solutions.
  • It has an administration class which makes it easy to manage translations.

More will be added to this post soon…

4 thoughts on “PHP Localization”

  1. “9. a fallback mechanism i.e. load the de_DE translation in case the de_AT translation is not available.” What would be the best way to manage that with gettext? It should also work for other languages (like en_US for en_UK…). I don’t have any idea how to realize that…

    1. Translation2 does provide an easy, documented way to achieve this. If you are using plain gettext, you’ll have to write a wrapper over gettext which checks whether you got the identifier back as the translation and, if so, translates with the fallback language instead.

  2. Thank you for this post, I was in the process of creating the very same sort of article based on my research and the need for something that can compare the options for making your sites international – so this was a terrific start, I am very excited for the rest of it.

    Here are a couple tips I found out today, they may be useful:
    Zend Framework actually tossed Zend Local for php internationalization classes
    r3 (Yahoo developed) is another option for internationalization, seems a bit complicated but is an option : http://developer.yahoo.com/r3/

    I still can’t seem to find anything that does it all so I think I will use a combination of gettext(), php international classes(Intl) and the i18n collection of classes from the PHP-FLP which seems like a simple/easy wrapper for the gettext(), unless I can make it work with a database.

    Curious? Which performs better for speed and maintainability : translations from a flat file or a database, does anyone have thoughts?
