default_charset and mb_internal_encoding

  105034
April 2, 2019 09:42 nicolai.scheer@gmail.com (Nicolai Scheer)
Hi list,

I'm currently in the process of migrating an old application from php 5.6
to 7.2.
In the process, I fiddled with the default_charset ini setting.

The documentation states (c.f.
https://www.php.net/manual/en/ini.core.php#ini.default-charset):

"In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
default_charset
will also be used to set the default character set for [...] and for
mbstring functions
if the mbstring.http_input mbstring.http_output mbstring.internal_encoding
configuration option is unset."

As such, I'd expect to be able to set default_charset to iso-8859-1 and
mbstring to pick that same setting for its internal encoding (if the
mentioned directives are unset, that is).

This seems not to be the case:

ini_set( 'default_charset', 'iso-8859-1' );
var_dump( ini_get("mbstring.internal_encoding") );
var_dump( ini_get("mbstring.http_input") );
var_dump( ini_get("mbstring.http_output") );
echo mb_internal_encoding() . "\n";
echo mb_strlen( "\xc3\xb6" ) . "\n";
echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";

This outputs (7.2.15 on a CentOS box):
string(0) ""
string(0) ""
string(0) ""
UTF-8
1
2

The default_charset is set but mbstring settings are not, so I'd expect to
get 2 as the character/byte count in both cases.

If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
lengths are equal.

Since the mentioned mbstring directives are deprecated as of 5.6.0 - do I
really need to use mb_internal_encoding() instead?
Is the documentation wrong or am I just misinterpreting it? I thought that
default_charset should act as some kind of "master setting" in order not to
have to set all specific settings as well (e.g. iconv, mbstring).

Usually we use UTF-8, so I did not come across this before...

Any insight?


    
  105239
April 11, 2019 13:41 cmbecker69@gmx.de ("Christoph M. Becker")
On 02.04.2019 at 11:42, Nicolai Scheer wrote:

> I'm currently in the process of migrating an old application from php 5.6
> to 7.2.
> In the process, I fiddled with the default_charset ini setting.
>
> The documentation states (c.f.
> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>
> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
> default_charset
> will also be used to set the default character set for [...] and for
> mbstring functions
> if the mbstring.http_input mbstring.http_output mbstring.internal_encoding
> configuration option is unset."
>
> As such, I'd expect to be able to set default_charset to iso-8859-1 and
> mbstring to pick that same setting for its internal encoding (if the
> mentioned directives are unset, that is).
>
> This seems not to be the case:
>
> ini_set( 'default_charset', 'iso-8859-1' );
> var_dump( ini_get("mbstring.internal_encoding") );
> var_dump( ini_get("mbstring.http_input") );
> var_dump( ini_get("mbstring.http_output") );
> echo mb_internal_encoding() . "\n";
> echo mb_strlen( "\xc3\xb6" ) . "\n";
> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>
> This outputs (7.2.15 on a CentOS box):
> string(0) ""
> string(0) ""
> string(0) ""
> UTF-8
> 1
> 2
>
> The default_charset is set but mbstring settings are not, so I'd expect to
> get 2 as the character/byte count in both cases.
>
> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
> lengths are equal.
>
> Since the mentioned mbstring directives are deprecated as of 5.6.0 - do I
> really need to use mb_internal_encoding() instead?
> Is the documentation wrong or am I just misinterpreting it? I thought that
> default_charset should act as some kind of "master setting" in order not to
> have to set all specific settings as well (e.g. iconv, mbstring).
>
> Usually we use UTF-8, so I did not come across this before...
>
> Any insight?

<https://3v4l.org/ZvQ67> confirms the reported behavior. A quick look at the
code confirms it, too.

I suggest you file a ticket on <https://bugs.php.net/>.

Thanks,
Christoph M. Becker
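
For anyone hitting the same behavior, the workaround mentioned in the original message (calling mb_internal_encoding() explicitly) could be sketched roughly as below. This is only an illustration, assuming the mbstring extension is loaded; sync_mbstring_encoding() is a made-up helper name, not something PHP provides.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// sync_mbstring_encoding() is a hypothetical helper, not a PHP built-in.
function sync_mbstring_encoding(): string
{
    // Read whatever default_charset currently holds ...
    $charset = ini_get('default_charset');

    // ... and pass it on to mbstring explicitly, since the thread shows
    // that ini_set('default_charset', ...) alone is not picked up.
    if ($charset !== false && $charset !== '') {
        mb_internal_encoding($charset);
    }

    // Return the encoding mbstring is now using.
    return mb_internal_encoding();
}

ini_set('default_charset', 'iso-8859-1');
sync_mbstring_encoding();

// "\xc3\xb6" is two bytes; in a single-byte encoding that is two characters.
echo mb_strlen("\xc3\xb6"), "\n";          // 2
echo mb_strlen("\xc3\xb6", '8bit'), "\n";  // 2
```

Calling the helper once after changing default_charset keeps the two settings aligned without touching the deprecated mbstring.* ini directives.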