Locale issues with Zend Lucene search


Posted December 3rd, 2015 in PHP

We use Zend Lucene Search on a PHP ecommerce website and ran into an issue where records added to the search index through the website interface differed from those added from the command line. It turned out to be a locale issue, and setting the locale explicitly fixed the problem.

Error messages

A clue that something wasn't working correctly was this error message:

iconv(): Detected an illegal character in input string

Various character-set conversions can work around this error, but our problem appears to have been caused by an unconfigured locale, which defaults to "C" and does not support UTF-8.
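As a rough illustration of why the locale matters here, the sketch below runs a UTF-8 string through iconv() with transliteration to ASCII and prints the result alongside the active LC_CTYPE locale. The helper name transliterateToAscii() is hypothetical and is not part of Zend Lucene. It is an assumption that transliteration is the step that fails in your setup, and the behaviour varies between iconv implementations, so no particular output is promised.

```php
<?php
// Hypothetical helper (not a Zend Lucene API): transliterate UTF-8 text to
// ASCII and return false on failure instead of emitting a PHP notice.
function transliterateToAscii(string $text)
{
    // Swallow the "illegal character" notice so the caller can check the result.
    set_error_handler(function () {
        return true;
    });
    try {
        $result = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    } finally {
        restore_error_handler();
    }
    return $result; // string on success, false on failure
}

$sample = "Caf\u{00E9} cr\u{00E8}me"; // "Café crème" encoded as UTF-8

echo 'LC_CTYPE: ', setlocale(LC_CTYPE, '0'), PHP_EOL;
var_dump(transliterateToAscii($sample));
```

Running the same script from the command line and through the web server makes it easy to compare how each environment handles the conversion.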

Check what locale is currently being used

In PHP, you can call "setlocale(LC_ALL, 0)" to find out what the locale is currently set to. Running through Nginx with PHP-FPM, it output this:

C

and from the command line this:

en_NZ.UTF-8

Running "locale" from an SSH terminal session output this:

LANG=en_NZ.UTF-8
LANGUAGE=
LC_CTYPE="en_NZ.UTF-8"
LC_NUMERIC="en_NZ.UTF-8"
LC_TIME="en_NZ.UTF-8"
LC_COLLATE="en_NZ.UTF-8"
LC_MONETARY="en_NZ.UTF-8"
LC_MESSAGES="en_NZ.UTF-8"
LC_PAPER="en_NZ.UTF-8"
LC_NAME="en_NZ.UTF-8"
LC_ADDRESS="en_NZ.UTF-8"
LC_TELEPHONE="en_NZ.UTF-8"
LC_MEASUREMENT="en_NZ.UTF-8"
LC_IDENTIFICATION="en_NZ.UTF-8"
LC_ALL=

which indicates that the CLI script is picking up the locale from the system, but Nginx/PHP-FPM is not.
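A small diagnostic script along these lines can be run both from the command line and through the web server to compare the two environments. It is only a sketch: the choice of environment variables to print is an assumption about what is useful to inspect.

```php
<?php
// Report which SAPI is running (for example "cli" or "fpm-fcgi") and its locale.
// Passing "0" makes setlocale() return the current setting without changing it.
echo 'SAPI:     ', PHP_SAPI, PHP_EOL;
echo 'LC_ALL:   ', setlocale(LC_ALL, '0'), PHP_EOL;
echo 'LC_CTYPE: ', setlocale(LC_CTYPE, '0'), PHP_EOL;

// Locale-related environment variables the process was started with, if any.
foreach (['LANG', 'LC_ALL', 'LC_CTYPE'] as $name) {
    $value = getenv($name);
    echo $name, '=', ($value === false ? '(not set)' : $value), PHP_EOL;
}
```

If the environment variables show up on the command line but are reported as not set under PHP-FPM, that matches the difference described above.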

How to fix

It's possible to set the default locale in the php.ini file, although that might not be a good idea if you run many websites on your server, as the change would affect all of them.

Instead, use setlocale() to set it specifically for your website. See the www.php.net/setlocale manual page for more information about the function.

In our case, we did this:

setlocale(LC_ALL, 'en_NZ.UTF-8');

This then solved the issue with adding documents to the Lucene index.
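One thing worth guarding against is that setlocale() returns false when the requested locale is not installed on the server, in which case the locale silently stays as it was. A slightly more defensive variant, sketched here with an assumed list of alternative spellings for the same locale, checks the return value:

```php
<?php
// Try a few spellings of the same locale. setlocale() accepts an array and
// applies the first entry that works, returning false if none is available.
// The second spelling is an assumption; adjust the list for your own system.
$candidates = ['en_NZ.UTF-8', 'en_NZ.utf8'];
$applied = setlocale(LC_ALL, $candidates);

if ($applied === false) {
    // None of the locales exist on this server, so record it rather than fail silently.
    error_log('Could not set locale; tried: ' . implode(', ', $candidates));
} else {
    echo 'Locale set to ', $applied, PHP_EOL;
}
```

Running "locale -a" on the server lists the locales that are actually installed, which helps when choosing the names to try.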




