[RFC] Saner string to number comparisons

  104519
February 26, 2019 12:27 nikita.ppv@gmail.com (Nikita Popov)
Hi internals,

I think it is well known that == in PHP is a pretty big footgun. It doesn't
have to be. I think that type juggling comparisons in a language like PHP
have some merit, it's just that the particular semantics of == in PHP make
it so dangerous. The biggest WTF factor is probably that 0 == "foobar"
returns true.

I'd like to bring forward an RFC for PHP 8 to change the semantics of ==
and other non-strict comparisons, when used between a number and a string:

https://wiki.php.net/rfc/string_to_number_comparison

The tl;dr is that if you compare a number and a numeric string, they'll be
compared as numbers. Otherwise, the number is converted into a string and
they'll be compared as strings.
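
A minimal userland sketch of the rule summarized above, for illustration only. The helper name `proposed_equals` is hypothetical and not part of the RFC, and `is_numeric()` is used here as an assumed stand-in for the engine's notion of a numeric string, which glosses over details such as whitespace handling:

```php
<?php
// Hypothetical helper: emulates the proposed number == string rule in userland.
function proposed_equals($number, string $string): bool
{
    if (is_numeric($string)) {
        // Numeric string: compare as numbers ("+ 0" yields an int or a float).
        return $number == ($string + 0);
    }
    // Non-numeric string: cast the number to a string and compare as strings.
    return (string) $number === $string;
}

var_dump(proposed_equals(0, "foobar")); // bool(false)
var_dump(proposed_equals(42, "42"));    // bool(true)
var_dump(proposed_equals(42, "42.0"));  // bool(true)
var_dump(proposed_equals(42, "42abc")); // bool(false)
```

Comparing two values that are already numbers is left to the ordinary `==`, since the proposal only concerns the number-versus-string case.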

This is a very significant change -- not so much because the actual BC
breakage is expected to be particularly large, but because it is a silent
change in core language semantics, which makes it hard to determine whether
or not code is affected by the change. There are things we can do about
this, for example the RFC suggests that we might want to have a transition
mode where we perform the comparison using both the old and the new
semantics and warn if the result differs.
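
To make the transition-mode idea concrete, here is a rough userland sketch that evaluates a comparison under both rule sets and warns when they disagree. All function names are hypothetical, the old semantics are approximated with a `(float)` cast (which may lose precision for very large integers), and the new semantics are approximated with `is_numeric()`; an engine-level implementation would look quite different:

```php
<?php
// Approximation of the old rule: the string is always converted to a number,
// using its leading numeric prefix (or 0 if there is none).
function old_equals($number, string $string): bool
{
    return $number == (float) $string;
}

// Approximation of the proposed rule: numeric strings compare as numbers,
// everything else compares as strings.
function new_equals($number, string $string): bool
{
    return is_numeric($string)
        ? $number == ($string + 0)
        : (string) $number === $string;
}

// Hypothetical "dual mode": keep the old result, but warn if the new rule differs.
function checked_equals($number, string $string): bool
{
    $old = old_equals($number, $string);
    $new = new_equals($number, $string);
    if ($old !== $new) {
        trigger_error(
            sprintf(
                'Result of %s == %s changes under the new semantics',
                var_export($number, true),
                var_export($string, true)
            ),
            E_USER_WARNING
        );
    }
    return $old;
}

var_dump(checked_equals(42, "42")); // bool(true), no warning
```

With this sketch, a call such as `checked_equals(0, "foobar")` would still return true but raise a warning, which is the kind of signal a transition mode could give to code that depends on the old behavior.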

I think we should give serious consideration to making such a change. I'd
be interested to hear whether other people think this is worthwhile, and
how we could go about doing it, while minimizing breakage.

Regards,
Nikita
  104520
February 26, 2019 13:04 george.banyard@gmail.com ("G. P. B.")
I am in favor of this change. However, I think it would work better together with the Trailing Whitespaces numerics RFC, as I agree with you that 42 == "42 " should return true. I'm not sure whether you should point this out earlier in the RFC, but as it currently stands it seems fine to me.

One small nitpick on your precision section:

    $float = 1.75;
    ini_set('precision', 14); // Default
    var_dump($float < "1.75abc");
    // Behaves like
    var_dump("1.75" < "1.75abc"); // true

This is wrong, because var_dump($float < "1.75abc"); returns false: https://3v4l.org/G56Vj
Did you perhaps mean to use the <= comparison operator?

Also, the NAN behavior is a bit surprising (to me at least), but it does follow IEEE-754. However, I don't really understand why it always returns 1 with the TIE operator (<=>).

Finally, as to how to do the transition, I don't have a strong opinion. An INI setting suggests that the old behavior is going to be supported for a long time, but it doesn't seem possible to throw deprecation notices to alert users of this change.

Best regards,
George P. Banyard
  104521
February 26, 2019 13:06 rowan.collins@gmail.com (Rowan Collins)
On Tue, 26 Feb 2019 at 12:27, Nikita Popov ppv@gmail.com> wrote:

> I'd like to bring forward an RFC for PHP 8 to change the semantics of ==
> and other non-strict comparisons, when used between a number and a string:
>
> https://wiki.php.net/rfc/string_to_number_comparison
Hi Nikita,

Thanks for tackling this; I think if we can improve this, we'll be answering a lot of language critics (I'm sure they'll find something else to complain about, but that's life!).

However, I'm concerned that it doesn't go far enough, when you say that the following will still return true:

0 == "0e214987142012"
"0" == "0e214987142012"

I think the cases where this is useful are vastly outweighed by the cases where it's completely unexpected, and potentially dangerous (e.g. in a hash comparison). If this is not fixed, the "dogma to always avoid non-strict comparisons" you refer to will remain.

If I understand it right, this arises from the fact that "0e214987142012" is considered a "well-formed numeric string", which is cast to int(0) or float(0). Is it feasible to also narrow this definition as part of the same change?

Regards,
--
Rowan Collins
[IMSoP]
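
The hash-comparison hazard mentioned above can be reproduced with two different strings that both happen to look like "0e" followed by digits. The values below are illustrative hash-shaped literals, not output computed in this thread. The sketch contrasts `==` with `===` and with the standard `hash_equals()` function:

```php
<?php
// Two different strings that are both well-formed numeric strings equal to 0.
$a = "0e462097431906509019562988736854";
$b = "0e830400451993494058024219903391";

var_dump($a == $b);            // bool(true): both are compared as the number 0
var_dump($a === $b);           // bool(false): strict comparison looks at the bytes
var_dump(hash_equals($a, $b)); // bool(false): byte-wise, timing-safe comparison
```

For values such as digests or tokens, `hash_equals()` or `===` avoids the numeric interpretation entirely, whichever way the loose comparison rules end up being defined.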
  104522
February 26, 2019 13:26 nikita.ppv@gmail.com (Nikita Popov)
Yes, that's right. However, it's probably worth mentioning that string to string comparisons are subject to additional constraints beyond the well-formedness requirement. Since PHP 5.4.4 there are additional overflow protections in place, which prevent numeric comparison from applying when both sides are integers that overflow to double and become equal as a result of that. This means that "100000000000000000000" == "100000000000000000001" returns false rather than true since PHP 5.4.4.

I'm mentioning this because it is a precedent for tweaking the string to string numeric comparison rules to prevent unexpected and possibly security-critical equalities. I think we could add similar special handling for the "0eNNNN" == "0eMMMM" case, as this is another "catastrophic" case when it comes to comparisons of hashes that happen to start with 0e, for example.

It might be better to discuss such a change separately from this proposal though (it's much more minor, and something that can also conceivably go into a minor version, given that the previous change was applied in a patch release).

Regards,
Nikita
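
A short sketch of the overflow protection described above. The two strings are the ones from the message; the explicit `(float)` casts are added here only to show that the values do collapse to the same double once converted:

```php
<?php
$a = "100000000000000000000";
$b = "100000000000000000001";

// Both sides overflow the integer range and would become the same double,
// so since PHP 5.4.4 the engine does not treat them as equal numbers.
var_dump($a == $b);                 // bool(false)

// Forcing the conversion shows the precision loss the protection guards against.
var_dump((float) $a == (float) $b); // bool(true)
```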
  104523
February 26, 2019 13:50 rowan.collins@gmail.com (Rowan Collins)
On 26 February 2019 13:26:24 GMT+00:00, Nikita Popov ppv@gmail.com> wrote:
> I'm mentioning this, because it is a precedent for tweaking the string to
> string numeric comparison rules to prevent unexpected and possibly security
> critical equalities. I think we could add similar special handling for the
> "0eNNNN" == "0eMMMM" case, as this is another "catastrophic" case when it
> comes to comparisons of hashes that happen to start with 0e, for example.

That makes sense. Personally, I find the treatment of strings in this e-notation problematic in all contexts - it makes is_numeric() much less useful for validation, for instance - but realise we have to balance compatibility here.

> It might be better to discuss such a change separately from this proposal
> though (it's much more minor, and something that can also conceivable go
> into a minor version, given that the previous change was applied in a patch
> release).

I think keeping it to a separate RFC is fine, but it would be nice to target the same release. "We've made == safer" is something that we can shout about, even if it's composed of a bunch of small tweaks. It also gives one upgrade where people need to look for subtle breaks, rather than two.

Regards,
--
Rowan Collins
[IMSoP]
  104527
February 27, 2019 07:53 dmitry@zend.com (Dmitry Stogov)
Hi Nikita,


Yeah, this is a big BC break, but I think it's a good time to do some "cleanup" in PHP 8.

The only thing I don't like is the difference between leading and trailing spaces. They should behave in the same way.


Thanks. Dmitry.

  104528
February 27, 2019 08:06 robin@kingsquare.nl ("Kingsquare.nl - Robin Speekenbrink")
Dear all,

From a user-space POV this is a very good way forward, and I (personally) really like the proposal of having a transitional period (7.4, anyone?).

As for the 0 == "" bit: I do think that an empty string is widely regarded as a falsey string, so 0 == "" should IMHO return true.

Just my 2 cents. Keep up the excellent work, Nikita (and everyone else, of course)!

Regards,
Robin
  104529
February 27, 2019 08:30 arvids.godjuks@gmail.com (Arvids Godjuks)
It's a bad idea to leave 0 == "" returning true. Remember, we are dealing with strings that come in via HTTP, so when you handle form input, a 0 can be a valid input value, but an empty string is not. Sure, === or empty() is better used here, but we should not leave edge cases like these. This is a string to number comparison: it should trigger a notice or error or whatever is designed to happen when "abc" == 123 is being done.

--
Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Skype: psihius
Telegram: @psihius https://t.me/psihius
  104530
February 27, 2019 08:44 claude.pache@gmail.com (Claude Pache)
> On 27 Feb 2019 at 09:06, Kingsquare.nl - Robin Speekenbrink <robin@kingsquare.nl> wrote:
>
> As of the 0 == "" bit: I do think that an empty string is widespread
> regarded as falsey-string... Thus 0 == "" sould IMHO return true...

0 == "" evaluating to true has been a footgun for me in the past; something like this:

```php
$state = $_GET['state'] ?? null;
// ...
switch ($state) {
    case 0: // when $state === "", this branch is incorrectly chosen
        // ...
}
```

where the `state` parameter comes from