[PHP-DEV] [RFC] [VOTE] Saner numeric strings

  111043
July 16, 2020 14:18 george.banyard@gmail.com ("G. P. B.")
Hello internals,

I've opened voting for the Saner Numeric strings RFC:
https://wiki.php.net/rfc/saner-numeric-strings

This will last 2 weeks until the 30th of July

Best regards

George P. Banyard
  111044
July 16, 2020 13:39 ocramius@gmail.com (Marco Pivetta)
Hey George,

I really like this specific bit of the proposal:

 > And the various cases which currently emit an E_WARNING will be promoted
to TypeErrors.

I really do not like these particular horrible behaviors of the language
(huge "yikes" for PHP being so broken):

 * `"123" == "123 "` - do not want - already bad enough that leading
whitespace is ignored here, and already caused some security issues on my
end some years ago
 * `is_numeric("123   ") === true` - `is_numeric()` should probably be
soft-deprecated and replaced with something stricter, instead of expanding
this too...

I don't really care about `declare(strict_types=0), since I don't use it
anymore, nor plan to use it anymore in any foreseeable future, but the two
points above really feel wrong, and I'm conflicted about what to vote.

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/


On Thu, Jul 16, 2020 at 3:21 PM G. P. B. banyard@gmail.com> wrote:

> Hello internals, > > I've opened voting for the Saner Numeric strings RFC: > https://wiki.php.net/rfc/saner-numeric-strings > > This will last 2 weeks until the 30th of July > > Best regards > > George P. Banyard >
  111045
July 16, 2020 15:15 george.banyard@gmail.com ("G. P. B.")
On Thu, 16 Jul 2020 at 15:39, Marco Pivetta <ocramius@gmail.com> wrote:

> Hey George, > > I really like this specific bit of the proposal: > > > And the various cases which currently emit an E_WARNING will be > promoted to TypeErrors. > > I really do not like these particular horrible behaviors of the language > (huge "yikes" for PHP being so broken): > > * `"123" == "123 "` - do not want - already bad enough that leading > whitespace is ignored here, and already caused some security issues on my > end some years ago > * `is_numeric("123 ") === true` - `is_numeric()` should probably be > soft-deprecated and replaced with something stricter, instead of expanding > this too... > > I don't really care about `declare(strict_types=0), since I don't use it > anymore, nor plan to use it anymore in any foreseeable future, but the two > points above really feel wrong, and I'm conflicted about what to vote. > > Marco Pivetta > > http://twitter.com/Ocramius > > http://ocramius.github.com/ >
Hey Marco, I do agree that accepting trailing whitespace may be considered suboptimal but as leading whitespaces don't even emit an E_NOTICE it is impossible to drop this "feature". Therefore I went for the next best thing which makes it IMHO sane, i.e. accepting trailing whitespaces everywhere. One way to improve the comparison case could be by adding an "eq" op like PERL has which always do a string comparison, basically equivalent to (string) $a === (string) $b as in PERL == always does a numerical comparison, which is not the case in PHP as it tries to be "smart". To address the second point, deprecating is_numeric seems out of scope for this RFC and could be a future scope. Hope this clarifies some of the reasoning. Best regards George P. Banyard
  111046
July 16, 2020 14:34 marandall@php.net (Mark Randall)
On 16/07/2020 16:15, G. P. B. wrote:
> I do agree that accepting trailing whitespace may be considered suboptimal > but as leading whitespaces don't even emit an E_NOTICE it is impossible to > drop this "feature".
Didn't someone (Nikita?) already back-port a warning to 7.4 about some changes coming in 8.0? So it's not really impossible... it just takes the will to do it. We can keep making small tweeks trying to make it better, or we can just do the "proper" thing, reject everything except exclusively properly-formatted numbers, and put the issue to bed permanently. Mark Randall
  111063
July 17, 2020 09:26 bobwei9@hotmail.com (Bob Weinand)
Hey George,

while I agree with your RFC in general, changing the autocast behavior of the empty string is not acceptable for me.

Empty strings often are the output of non-existing things, like default value of a text field in a DB, when reading files not filled with inputs yet etc. It should expose a similar behaviour to all the other falsy values, i.e.. null, false, ...
The current side effects can be held in mind "ah this won't break if the input is unexpectedly not present, should be safe" while writing code, but finding places where that sort of assumption was made is next to impossible.
This should be a major step backwards from the forgiveness of PHP - especially in cases where "shouldn't happen, but the behaviour is nearly always what I expect, and logging will allow me to improve it".
I do not want to break everything in production because something happens to return an empty string.

I'm aware that it is different from the TypeError behavior of *userland* functions (internal functions do only emit a warning). But internal functions are the important foundation here. This is also internal, the number conversions.

Hence I'm voting no on this.

Bob

> Am 16.07.2020 um 16:18 schrieb G. P. B. banyard@gmail.com>: > > Hello internals, > > I've opened voting for the Saner Numeric strings RFC: > https://wiki.php.net/rfc/saner-numeric-strings > > This will last 2 weeks until the 30th of July > > Best regards > > George P. Banyard
  111064
July 17, 2020 09:30 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Jul 17, 2020 at 11:27 AM Bob Weinand <bobwei9@hotmail.com> wrote:

> Hey George, > > while I agree with your RFC in general, changing the autocast behavior of > the empty string is not acceptable for me. > > Empty strings often are the output of non-existing things, like default > value of a text field in a DB, when reading files not filled with inputs > yet etc. It should expose a similar behaviour to all the other falsy > values, i.e. null, false, ... > The current side effects can be held in mind "ah this won't break if the > input is unexpectedly not present, should be safe" while writing code, but > finding places where that sort of assumption was made is next to impossible. > This should be a major step backwards from the forgiveness of PHP - > especially in cases where "shouldn't happen, but the behaviour is nearly > always what I expect, and logging will allow me to improve it". > I do not want to break everything in production because something happens > to return an empty string. > > I'm aware that it is different from the TypeError behavior of *userland* > functions (internal functions do only emit a warning). But internal > functions are the important foundation here. This is also internal, the > number conversions. > > Hence I'm voting no on this. >
Can you give a code example for an undesirable behavior change? I don't think this proposal changes anything about the handling of empty strings. Empty strings are already considered non-numeric, and as such already result in TypeErrors (e.g. https://3v4l.org/WVfeg). Nikita
> > Am 16.07.2020 um 16:18 schrieb G. P. B. banyard@gmail.com>: > > > > Hello internals, > > > > I've opened voting for the Saner Numeric strings RFC: > > https://wiki.php.net/rfc/saner-numeric-strings > > > > This will last 2 weeks until the 30th of July > > > > Best regards > > > > George P. Banyard > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: https://www.php.net/unsub.php > >
  111065
July 17, 2020 09:39 bobwei9@hotmail.com (Bob Weinand)
> Am 17.07.2020 um 11:30 schrieb Nikita Popov ppv@gmail.com>: > > On Fri, Jul 17, 2020 at 11:27 AM Bob Weinand <bobwei9@hotmail.com <mailto:bobwei9@hotmail.com>> wrote: > Hey George, > > while I agree with your RFC in general, changing the autocast behavior of the empty string is not acceptable for me. > > Empty strings often are the output of non-existing things, like default value of a text field in a DB, when reading files not filled with inputs yet etc. It should expose a similar behaviour to all the other falsy values, i.e. null, false, ... > The current side effects can be held in mind "ah this won't break if the input is unexpectedly not present, should be safe" while writing code, but finding places where that sort of assumption was made is next to impossible. > This should be a major step backwards from the forgiveness of PHP - especially in cases where "shouldn't happen, but the behaviour is nearly always what I expect, and logging will allow me to improve it". > I do not want to break everything in production because something happens to return an empty string. > > I'm aware that it is different from the TypeError behavior of *userland* functions (internal functions do only emit a warning). But internal functions are the important foundation here. This is also internal, the number conversions. > > Hence I'm voting no on this. > > Can you give a code example for an undesirable behavior change? I don't think this proposal changes anything about the handling of empty strings. Empty strings are already considered non-numeric, and as such already result in TypeErrors (e.g. https://3v4l.org/WVfeg <https://3v4l.org/WVfeg>). > > Nikita
Apparently I tested the wrong functions. Was having a look at pow() which accepts a number. Sigh. Then it's the fault of pow() and pow() should be probably fixed… But as an example, I was storing values as a pipe delimited string of counters in the db - added a new value and forgot to update the old records. Old records looking like "1|27|37|". Doing an list($a, $b, $c, $d) = explode("|", $input); worked fine, but $d was, well, an empty string. Quickly found the reason in the logs, updated the db, but our end users didn't notice anything. Bob
  111114
July 22, 2020 13:20 cschneid@cschneid.com (Christian Schneider)
Am 16.07.2020 um 16:18 schrieb G. P. B. banyard@gmail.com>:
> I've opened voting for the Saner Numeric strings RFC: > https://wiki.php.net/rfc/saner-numeric-strings
Looking at the BC section I stumbled across
> Code relying on the fact that '' (an empty string) evaluates to 0 for arithmetic/bitwise operations.
which is elaborated further down as
> The third reason already emitted an E_WARNING. We considered special-casing this to evaluate to 0, but this would be inconsistent with how type declarations deal with an empty string, namely throwing a TypeError. Therefore a TypeError will also be emitted in this case. The error can be avoided by explicitly checking for an empty string and changing it to 0.
but now I'm a bit confused what is meant by "arithmetic/bitwise operations". I assume this applies only to things like substr("foo", "") but not to arithmetic *operators* like 42 + "" because that doesn't currently trigger E_WARNING AFAIK. Maybe this should be clarified? Regards, - Chris
  111137
July 22, 2020 20:31 weirdan@gmail.com (Bruce Weirdan)
On Wed, Jul 22, 2020 at 4:21 PM Christian Schneider <cschneid@cschneid.com>
wrote:

> but not to arithmetic *operators* like 42 + "" because that doesn't > currently trigger E_WARNING AFAIK. >
It does produce warning since PHP 7.1 : https://3v4l.org/4CV1E -- Best regards, Bruce Weirdan mailto: weirdan@gmail.com
  111275
July 31, 2020 12:01 george.banyard@gmail.com ("G. P. B.")
Hello internals,

Although, 1 day later than meant to, I'm glad to announce the Saner Numeric
String RFC has
been accepted with 30 votes in favour and 4 votes against.
The secondary implementation detail vote has failed with 2 votes in favour
and 27 votes against.

Best regards


George P. Banyard