Re: [PHP-DEV] [RFC] [DISCUSSION] Locale-independent float to string cast

This is only part of a thread. view whole thread
  109278
March 24, 2020 18:41 george.banyard@gmail.com ("G. P. B.")
On Tue, 24 Mar 2020 at 11:03, Lynn <kjarli@gmail.com> wrote:

> Hi, > > This is a great RFC! Just one minor thing. > > > Outputting floats as strings in locales which change the decimal > separator will have a slightly different output. In our opinion, the > backward compatibility break won't be serious in practice > > In my opinion, this will be huge. I can't trace back where the thousands of > possible conversions are used in legacy software. There's a variety of > custom format functions, sprintf implementations, string casts, and > number_format usages. Would it be possible to trigger a warning or > deprecation when using a locale that would have a different result from > what the new result will be? If I'd use a locale that results in `3.5`, I > don't need a warning. If I use a locale that results in `3,5`, I would like > to see a warning of sorts so I can fix this before untested legacy code > will seriously break data exports. > > Regards, > Lynn >
We are not saying that this won't be prevalent,especially in legacy code, however, IMHO, this is not a serious issue as it is only a matter how one character is displayed to end users. As said by Christoph we did not consider a deprecation warning due to the performance impact this would lead to as float to string conversions are a common operation. The idea of a temporary INI setting which warn about these conversions is an interesting idea, but I personally would rather not introduce one. Moreover, with a temporary INI setting how long would it last, one minor version, one major version? Something in between the two? On Tue, 24 Mar 2020 at 11:15, Nikita Popov ppv@gmail.com> wrote:
> I'm obviously in favor of this proposal. > > Only really comment I have is on printf(): You're right that we have %f and > %F to toggle locale-sensitivity, but the %e, %E, %g, %G formats are always > locale-sensitive. It might make sense to introduce locale-insensitive > variants of those, especially considering that %G is considered the > "standard" floating point format. Internally we support %H for that, so we > could expose that... (Alternatively, locale-sensitivity might be removed > for e/E/g/G.) > > Regards, > Nikita
*sight* I was not aware of this issue ... Personally, I would prefer that %e and %E are not locale aware as they are meant to represent a standard notation. This would then mean that %g is locale aware as %f is and %G is not because %E and %F are not locale aware. But I'm open to suggestion considering this edge case. Best regards George P. Banyard
  109279
March 24, 2020 18:59 kjarli@gmail.com (Lynn)
On Tue, Mar 24, 2020 at 7:41 PM G. P. B. banyard@gmail.com> wrote:

> On Tue, 24 Mar 2020 at 11:03, Lynn <kjarli@gmail.com> wrote: > >> I would like >> to see a warning of sorts so I can fix this before untested legacy code >> will seriously break data exports. >> >> Regards, >> Lynn >> > > We are not saying that this won't be prevalent,especially in legacy code, > however, IMHO, this is not a serious issue as it is only a matter how one > character is displayed to end users. >
It's so much more than a character that's being displayed. There are automated data exports with prices, that will be automatically consumed on the other side. These kind of changes can seriously break if they were expected to be localized currency formats. We're talking massive amounts of data here. I've seen price formats go wrong often and seen them be fixed even up to a week ago. It's hard to track all usages in a 20 year old code base. As much as I love that this can be fixed, I rather not see it fixed at all if there's no upgrade path available. Regards, Lynn
  109281
March 24, 2020 23:14 kocsismate90@gmail.com (=?UTF-8?B?TcOhdMOpIEtvY3Npcw==?=)
Hi Lynn,

> There are automated data exports with prices, that will be automatically consumed on the other side.
I agree that these cases can go horribly wrong. However, my reasoning is the following: - if a piece of code currently relies on locale-independence (e.g. automated data exports) then this change wouldn't cause any breakage since a workaround has already been in place there (e.g. the programmers use var_export() instead of casting) - if a piece of code relies on the locale-dependent string representation of floats then there will be a BC break, sure, however I believe that code isn't very sensitive to the change in the vast majority of the cases since that data is for presentation purposes only. Or do you have other locale-dependent use-cases in mind? I am sure there are some but I think the number of the situations where the change is problematic is less than what it first seems. Regards, Máté
  109282
March 25, 2020 08:25 kjarli@gmail.com (Lynn)
I'm very sorry, I pressed the reply instead of reply all button, I hope
this fixes it!

I agree that these cases can go horribly wrong. However, my reasoning is
> the following: > - if a piece of code currently relies on locale-independence (e.g. > automated data exports) then this > change wouldn't cause any breakage since a workaround has already been in > place there (e.g. the > programmers use var_export() instead of casting) > - if a piece of code relies on the locale-dependent string representation > of floats then there will be > a BC break, sure, however I believe that code isn't very sensitive to the > change in the vast majority of the > cases since that data is for presentation purposes only. >
I know it's for presentation purpose only, sadly it's not used for just presentation code. It's being consumed and parsed by automated imports that expect a format different than `3.5`, because that's how it organically evolved. Even when it was meant only for presentation, the consumer expected the `3,5` format and will keep expecting this until we notify them of a change (and then it will probably still take weeks before this is fixed on their side). A lot of code that I've encountered is stuff like: ``` $csv .= $product->getPrice() . ';'; // older code I've seen come by $product = get_product($$var5); $csv .= $product['price'] . ';'; ``` As much as I'd like to see this fixed and always give a `3.5`, sadly that's not the case if you use a different locale. Going through thousands of files to fix this, is not going to be an easy task, especially not as this is often old enough to not be usable by static code analyzers.
> Or do you have other locale-dependent use-cases in mind? I am sure there > are some but I think the number of the situations where the change is > problematic is less than what it first seems. >
No, this is the only issue I personally see with it. However, the impact could be severe enough to not be able to upgrade to a new PHP version in the foreseeable future, thus I would like to see an upgrade path, one that tells you where it would've gone wrong, instead of turning the behavior on or off. If I can gather the logs and thus find usages, I can pro-actively start fixing where this would go wrong. I'll take a performance hit over a breakage. Performance penalty would only have to apply if the locale is set to something we know will change the outcome. Maybe it turns out this is a small issue, maybe it's a big issue... I can't tell because it's either upgrade & break, or upgrade & work in the proposed scenario. Some breakage may take weeks to find out because scripts run periodically. I fully understand that you don't see this as a big issue, and I really wish I could say the same. I can't vote, so I can't change the outcome, please consider my use-case when moving forward, thanks! Regards, Lynn
  109328
March 26, 2020 09:54 kocsismate90@gmail.com (=?UTF-8?B?TcOhdMOpIEtvY3Npcw==?=)
Hi Lynn,

Thank you very much for the input! Then we could add a secondary vote to
the RFC where an
upgrade path was decided upon.

The first such path is the one proposed by Christoph:
Introduce a temporary ini setting with which a "debug" mode of float to
string casting could be
enabled. That is, when it's turned on and a casting is affected by locale
settings then a warning
is emitted. This setting would be removed in a (near) future version of
PHP. Although
PHP 8.1 would be an unusual candidate to do so, but it *might* be possible
if we found a good
way to warn people from day 0 that it's just a temporary flag? Maybe by an
immediate deprecation?

Is there any other idea for a sensible upgrade path?

Cheers:
Máté