[PHP-DEV][RFC] Saner numeric strings

  110771
June 29, 2020 10:30 george.banyard@gmail.com ("G. P. B.")
Greetings internal,

While reviving the "Permit trailing whitespace in numeric strings" RFC [1]
I didn't see Nikita's comment on Andrea's original PR which commented on
the fact that we should get rid of the "leading-numeric string" concept.

Therefore, mostly due to time constraints, I've merge the trailing
whitespace RFC with the removal of this concept in the following RFC:
https://wiki.php.net/rfc/saner-numeric-strings

The main gist is to convert all E_NOTICEs  “A non well formed numeric value
encountered” to the usual E_WARNING “A non-numeric value encountered” and
allow trailing whitespaces, which would be most reasonables cases where the
E_NOTICE has been emitted.

This RFC is heavily based on Andrea's original RFC [1] and I would like to
thank her once more for the preliminary work she's done that I could reuse.

Best regards

George P. Banyard

[1] https://wiki.php.net/rfc/trailing_whitespace_numerics
  110774
June 29, 2020 12:53 claude.pache@gmail.com (Claude Pache)
> Le 29 juin 2020 à 12:30, G. P. B. banyard@gmail.com> a écrit : > > Greetings internal, > > While reviving the "Permit trailing whitespace in numeric strings" RFC [1] > I didn't see Nikita's comment on Andrea's original PR which commented on > the fact that we should get rid of the "leading-numeric string" concept. > > Therefore, mostly due to time constraints, I've merge the trailing > whitespace RFC with the removal of this concept in the following RFC: > https://wiki.php.net/rfc/saner-numeric-strings > > The main gist is to convert all E_NOTICEs “A non well formed numeric value > encountered” to the usual E_WARNING “A non-numeric value encountered” and > allow trailing whitespaces, which would be most reasonables cases where the > E_NOTICE has been emitted. > > This RFC is heavily based on Andrea's original RFC [1] and I would like to > thank her once more for the preliminary work she's done that I could reuse. > > Best regards > > George P. Banyard > > [1] https://wiki.php.net/rfc/trailing_whitespace_numerics
Hi, It is not clear for me, by reading the RFC, whether or not the behaviour of either implicit or explicit conversion of so-called leading numeric string will be preserved, beyond some Notices that will be converted to Warnings in the implicit case. For example, does (int) "2px" still produce 2, even when the string is now considered as non-numeric? I think it is important to make sure that the current results of casting to int/float will not change. For example, I can imagine some script that reads a CSS value like "2px" and does some calculation on it, converting it to a number on the process. A Warning instead of a Notice will prompt the programmer to add explicit casts, which is probably a good thing. But changing the casted value to 0 would lead to bugs that are (1) harder to detect especially if the program already uses explicit casts, and (2) harder to correct. —Claude
  110775
June 29, 2020 13:01 george.banyard@gmail.com ("G. P. B.")
On Mon, 29 Jun 2020 at 14:53, Claude Pache pache@gmail.com> wrote:

> > > Le 29 juin 2020 à 12:30, G. P. B. banyard@gmail.com> a écrit : > > > > Greetings internal, > > > > While reviving the "Permit trailing whitespace in numeric strings" RFC > [1] > > I didn't see Nikita's comment on Andrea's original PR which commented on > > the fact that we should get rid of the "leading-numeric string" concept.. > > > > Therefore, mostly due to time constraints, I've merge the trailing > > whitespace RFC with the removal of this concept in the following RFC: > > https://wiki.php.net/rfc/saner-numeric-strings > > > > The main gist is to convert all E_NOTICEs “A non well formed numeric > value > > encountered” to the usual E_WARNING “A non-numeric value encountered” and > > allow trailing whitespaces, which would be most reasonables cases where > the > > E_NOTICE has been emitted. > > > > This RFC is heavily based on Andrea's original RFC [1] and I would like > to > > thank her once more for the preliminary work she's done that I could > reuse. > > > > Best regards > > > > George P. Banyard > > > > [1] https://wiki.php.net/rfc/trailing_whitespace_numerics > > > Hi, > > It is not clear for me, by reading the RFC, whether or not the behaviour > of either implicit or explicit conversion of so-called leading numeric > string will be preserved, beyond some Notices that will be converted to > Warnings in the implicit case. For example, does (int) "2px" still produce > 2, even when the string is now considered as non-numeric? > > I think it is important to make sure that the current results of casting > to int/float will not change. For example, I can imagine some script that > reads a CSS value like "2px" and does some calculation on it, converting it > to a number on the process. A Warning instead of a Notice will prompt the > programmer to add explicit casts, which is probably a good thing. But > changing the casted value to 0 would lead to bugs that are (1) harder to > detect especially if the program already uses explicit casts, and (2) > harder to correct. > > —Claude >
The behaviour of explicit casts (or any other conversion which accepts errors in the string, which corresponds to the 1 mode of allow_errors in the relevant C is_numeric_string function) will be preserved, so this should still produce 2. I will add test cases and clarify this in the RFC. Best regards George P. Banyard
  110801
July 1, 2020 13:19 george.banyard@gmail.com ("G. P. B.")
I've added a test case just to ensure that casting of leading numeric
strings yields
the previous behaviour and have added this to the BC Break section of the
RFC:
https://wiki.php.net/rfc/saner-numeric-strings

Any other feedback or questions are welcomed.

George P. Banyard
  110802
July 1, 2020 14:26 george.banyard@gmail.com ("G. P. B.")
Hello again,

While reviewing the implementation Nikita spotted a peculiar behaviour that
PHP currently has in regards to string offsets.
The offset is validated using the is_numeric_string but integer
representation of
it is computed using an explicit int cast, this is what leads to the
E_NOTICE in
a roundabout way.

Therefore, a string offset access such as $str['2str'] or $str['2.5'] would
still
evaluate to 2 under the current implementation but would generate an
E_WARNING "Illegal string offset".

The question becomes should such string offsets evaluate to 0 or not.
The answer isn't that straight forward as a float index like 14.5 would also
emit the same warning but evaluate to 14.

I see 3 options:

   1. Keep the current behaviour which emit the E_WARNING "Illegal string
   offset" and use the explicit int cast for both types of offsets ( '2str'
   and '2.5'), which would bring the behaviour of leading integer numeric
   strings in line with the one for float numeric strings.
   2. Emit the E_WARNING "Illegal string offset" and evaluate to 0 for
   offsets like '2str', and emit the E_WARNING "String offset cast
   occurred" for float numeric strings like '2.5'
   3. Emit the E_WARNING "Illegal string offset" and evaluate to 0 for
   non-numeric integer strings.


As this is relatively tricky and all of the options have some change in
behaviour compared to PHP 7 feedback would be very much appreciated.

I'll amend the RFC to detail this behaviour.

Best regards

George P. Banyard
  110803
July 1, 2020 16:44 rowan.collins@gmail.com (Rowan Tommins)
Hi George,

On Wed, 1 Jul 2020 at 15:26, G. P. B. banyard@gmail.com> wrote:

> > 2. Emit the E_WARNING "Illegal string offset" and evaluate to 0 for > offsets like '2str', and emit the E_WARNING "String offset cast > occurred" for float numeric strings like '2.5'
I'm not clear how this option fits with the rest of the RFC; there's no explicit mention of evaluating values to 0 which would previously have been evaluated differently. That would fall into the most significant category of BC breaks: working code in version X still works in version Y, but produces different results. At the moment, the "Backward Incompatible Changes" section just says:
> code with liberal use of leading-numerical strings will need to be updated
It would be good to expand that with examples of what such code would look like, how it would behave before and after, and what updates the user would need to make. Regards, -- Rowan Tommins [IMSoP]
  110818
July 2, 2020 12:05 george.banyard@gmail.com ("G. P. B.")
On Wed, 1 Jul 2020 at 18:44, Rowan Tommins collins@gmail.com> wrote:

> Hi George, > > On Wed, 1 Jul 2020 at 15:26, G. P. B. banyard@gmail.com> wrote: > >> >> 2. Emit the E_WARNING "Illegal string offset" and evaluate to 0 for >> offsets like '2str', and emit the E_WARNING "String offset cast >> occurred" for float numeric strings like '2.5' > > > > I'm not clear how this option fits with the rest of the RFC; there's no > explicit mention of evaluating values to 0 which would previously have been > evaluated differently. That would fall into the most significant category > of BC breaks: working code in version X still works in version Y, but > produces different results. > > At the moment, the "Backward Incompatible Changes" section just says: > > > code with liberal use of leading-numerical strings will need to be > updated > > It would be good to expand that with examples of what such code would look > like, how it would behave before and after, and what updates the user would > need to make. > > Regards, > -- > Rowan Tommins > [IMSoP] >
I've had another thought about while I was implementing this change. I've updated the RFC to reflect what I think is a more reasonable approach:
> For string offsets accessed using numeric strings the following changes > will be made: > > - numeric strings which correspond to well formed floats will emit the > more usual “String offset cast occurred” warning instead of the “Illegal > string offset” one. > - leading numeric strings will emit the “Illegal string offset” > instead of the “A non well formed numeric value encountered” notice, and > continue to evaluate to their respective values. > - Non-numeric strings which emmited the “Illegal string offset” > warning will throw an “Illegal offset type” TypeError > > The only aspect I'm not totally convinced is making well formed float strings more "legal" than they were in PHP 7.x.
I will also flesh out the RFC with more examples and which code is affected and how it would need to be updated to produce the previous behaviour. Best regards George P. Banyard