[RFC] Strict operators directive

  106054
June 25, 2019 13:09 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Hi all,

I would like to open the discussion for RFC: "Strict operators directive".

This RFC proposes a new directive 'strict_operators'. When enabled, operators may cast operands to the expected type, but must comply with the following rules:

* Typecasting is not based on the type of the other operand
* Typecasting is not based on the value of any of the operands
* Operators will throw a TypeError for unsupported types

Reasoning: The current rules for type casting done by operators are inconsistent and complex, which can lead to surprising results where a statement seemingly contradicts itself.

Using a directive means that backwards compatibility is guaranteed.

https://wiki.php.net/rfc/strict_operators

Yours,
Arnold Daniels

  106055
June 25, 2019 17:56 guilliam.xavier@gmail.com (Guilliam Xavier)
Hello, thanks for the impressive work...

I have just one question: why disallow `~` for strings?
(e.g. currently `~"\x00\x01\x02"` gives `"\xFF\xFE\xFD"`)

--
Guilliam Xavier
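For readers who want to reproduce the behaviour mentioned here, the following is a minimal sketch of how `~` acts on a string in current PHP (no directive involved). The variable names are illustrative only and do not come from the RFC.

```php
<?php
// Bitwise NOT on a string flips every bit of every byte and returns a string.
$input = "\x00\x01\x02";
$flipped = ~$input;

// bin2hex() makes the raw bytes readable.
echo bin2hex($input), " -> ", bin2hex($flipped), PHP_EOL;
```

Because the result depends only on the operand's type (string in, string out), this operation does not seem to conflict with the rules listed in the announcement.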
  106056
June 25, 2019 20:27 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Using `~` for strings should be allowed. I fixed it in the RFC. Well spotted.

- Arnold
  106057
June 25, 2019 21:18 benjamin.morel@gmail.com (Benjamin Morel)
Impressive work indeed, this would be a perfect addition to strict_types
that would remove a lot of WTFs while preserving BC with older code.

Please note that the formatting of the RFC is broken after the Bitwise
Operators section.

Ben
  106058
June 25, 2019 21:32 krakjoe@gmail.com (Joe Watkins)
Evening,

There doesn't seem to be a patch or implementation.

Aside from the proposed semantics, which I can't really read because the
document is malformed, the most important questions for me are: How is this
going to work? Can it be done without significant complexity in the
compiler or VM?

Without an implementation I can't really consider the ideas proposed,
because they are just ideas without proof that they are reasonably
implementable.

While you can technically move forward with an RFC without implementation,
in this case the implementation should inform our decision at vote time.

Cheers
Joe


  106064
June 26, 2019 07:35 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Fixed the formatting. Sorry about that. :-s

I really want to have a discussion prior to creating a patch, to make sure there is
consensus on what should be implemented. However, I will create a patch
prior to voting.

The implementation I have in mind is:
1. Add a flag to `CG(active_op_array)->fn_flags` (similar to
`strict_types`) *.
2. Split function `get_binary_op` into `get_binary_op_standard` and a new
function `get_binary_op_strict`, where `get_binary_op` calls either one based
on the op flag **.
3. Add new functions for strict operators to zend_operators.c.

*
https://github.com/php/php-src/blob/065559828022b37e88fc8eae4194efafea1b1506/Zend/zend_compile.c#L5127
**
https://github.com/php/php-src/blob/e18c60cd8dfed02311ebb3d11e3543d9a99c7c2a/Zend/zend_opcode.c#L1023

As a proof of concept, I've created a test where the `strict_types` directive
affects the `==` and `!=` operators, making them do an 'identical' and
'not identical' operation, respectively.
https://github.com/jasny/php-src/compare/PHP-7.4...jasny:strict_types-affect-operators-test
(To test, build the branch and run
https://gist.github.com/jasny/eacd187c949459b70d8f8f0818411f0a )

I've added this information to the RFC. Any suggestions or remarks on the
way to implement this are appreciated.

  106063
June 26, 2019 06:50 cschneid@cschneid.com (Christian Schneider)
On 25.06.2019 at 15:09, Arnold Daniels wrote:
> This RFC proposes a new directive 'strict_operators'. When enabled, operators may cast operands to the expected type, but must comply to;
>
> * Typecasting is not based on the type of the other operand
> * Typecasting is not based on the value of any of the operands
> * Operators will throw a TypeError for unsupported types

While I understand that some people don't like the way PHP does type conversions, I think this proposal creates a much bigger element of surprise when copying PHP code from one place to another than all the .ini settings ever did.

It basically creates two languages in one, and I won't be able to determine what
  $a == 42
exactly does without having to look at the header of the file.

I'm inclined to say that if you want to make PHP a new language with a new core type concept, then you should fork it and call it something else to avoid confusion.

- Chris
  106065
June 26, 2019 07:56 claude.pache@gmail.com (Claude Pache)
Indeed. The directive may make operators more strict in what they accept, but it should avoid changing the semantics. Concretely, we must have either:

    "120" > "99.9"; // true

or:

    "120" > "99.9"; // TypeError

Anything else will bring confusion.

-- Claude
  106066
June 26, 2019 09:09 benjamin.morel@gmail.com (Benjamin Morel)
>     "120" > "99.9"; // TypeError
> Anything else will bring confusion.

Not sure about this, you can do it the JS way: if both operands are strings, then it behaves like strcmp():

    "23" > "4"; // false
    "23" > "221"; // true

I'm not saying that we should do it, but this would not be confusing to me at all.

Ben
  106068
June 26, 2019 09:22 cschneid@cschneid.com (Christian Schneider)
On 26.06.2019 at 11:09, Benjamin Morel wrote:
>> "120" > "99.9"; // TypeError
>> Anything else will bring confusion.
>
> Not sure about this, you can do it the JS way: if both operands are
> strings, then it behaves like strcmp():
>
> "23" > "4"; // false
> "23" > "221"; // true
>
> I'm not saying that we should do it, but this would not be confusing to me
> at all.

With the proposed change both
  "23" > "4" === true   # Current behaviour
and
  "23" > "4" === false  # New strict behaviour
could be the case depending on a declaration somewhere else in the source code.

That's the confusion Claude and I were talking about: You cannot be sure what a very simple line of code does.

- Chris
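To make the two readings concrete, here is a small sketch that puts PHP's current comparison of numeric strings next to a byte-wise comparison via `strcmp()`. It uses `strcmp()` as a stand-in for the "JS way" mentioned above; what the directive would actually do is defined by the RFC, not by this snippet, and the variable names are illustrative.

```php
<?php
// Without any directive, two numeric strings are compared as numbers.
$loose = "23" > "4";                          // 23 > 4

// strcmp() compares byte by byte: "2" sorts before "4".
$bytewise = strcmp("23", "4") > 0;

// Claude's example: numerically 120 > 99.9, but "1" sorts before "9".
$looseDecimal = "120" > "99.9";
$bytewiseDecimal = strcmp("120", "99.9") > 0;

var_dump($loose, $bytewise, $looseDecimal, $bytewiseDecimal);
```

The same source line can therefore yield either result depending on which comparison rule is in effect, which is the concern raised in this message.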
  106069
June 26, 2019 09:36 benjamin.morel@gmail.com (Benjamin Morel)
> (...) could be the case depending on a declaration somewhere else in the source code.
> That's the confusion Claude and I were talking about: You cannot be sure what a very simple line of code does.

Oh, I see. You mean that only replacing some of the current results with TypeErrors would be acceptable; returning a different value would not.
This makes a lot of sense, but once again prevents the language from slowly moving towards something different (and better), leaving it stuck in its legacy forever.

I'm starting to believe that a joint effort to fork PHP is the only way out :(

Ben
  106070
June 26, 2019 10:39 claude.pache@gmail.com (Claude Pache)
It would be something “different”, but not necessarily “better”.

Programmers may intentionally rely on the current semantics when comparing numeric strings, e.g. in the following cases:
* values that are grabbed from a database using a driver that returns only strings (or nulls);
* values that are read from $_POST and that ultimately stem from some HTML element.

-------

It was certainly a fundamental design error to have both implicit type conversion and operators that did different things based on the type of their operands. That leads to the infamous `"1" + 1 == 11` problem in JavaScript, or the "3" < "24" problem in PHP. That could have been avoided in two ways:
* either by forbidding implicit conversion;
* or by using different operators for different types (as does Perl).

Now, returning to the case of the comparison operators like `<` or `==`. Instead of killing implicit conversion and redefining the meaning of those operators in cases that are *not* just edge cases, it may be preferable to use the other approach:

* in some strict mode, reserve `<`, `==` etc. for numeric comparison, and throw a TypeError if one of the operands is not numeric;
* if we deem it worthwhile, define new operators for string comparison. (Although I’m not really sure it is worth it: we have `strcmp()` and `===` for byte-to-byte comparison, and the Collator class for alphabetical sorting that actually works in languages not restricted to unaccented latin characters.)

-- Claude
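The "reserve `<` for numeric comparison" idea can be approximated in userland. The sketch below is only an illustration: `numeric_less_than()` is a hypothetical helper, not part of PHP or of the RFC, and it assumes that "numeric" means int or float only. Whether numeric strings would count as numeric is not settled in this thread.

```php
<?php
// Hypothetical helper (not part of PHP or the RFC): a "<" that only accepts
// ints and floats and throws a TypeError for every other operand type.
function numeric_less_than($a, $b): bool
{
    foreach ([$a, $b] as $operand) {
        if (!is_int($operand) && !is_float($operand)) {
            throw new TypeError('Unsupported operand type: ' . gettype($operand));
        }
    }
    return $a < $b;
}

var_dump(numeric_less_than(3, 24));    // both operands are ints

try {
    numeric_less_than("3", "24");      // strings are rejected based on type alone
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Note that the decision to throw depends only on the operand types, never on their values, so the outcome of a given line cannot silently change to a different boolean.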
  106073
June 26, 2019 11:46 benjamin.morel@gmail.com (Benjamin Morel)
> * in some strict mode, reserve `<`, `==` etc. for numeric comparison, and throw a TypeError if one of the operands is not numeric;
> * if we deem it worthwhile, define new operators for string comparison. (Although I’m not really sure it is worth it: we have `strcmp()` and `===` for byte-to-byte comparison, and the Collator class for alphabetical sorting that actually works in languages not restricted to unaccented latin characters.)

It's true that string comparison (sorting) is a much harder problem that cannot be solved without additional knowledge of the encoding of the string; so I agree that it might be better to just throw a TypeError when comparing strings, and leave the user with an operator that only works on numbers, and explicitly use dedicated functions when comparing strings.

This makes sense for "<", "<=", ">", ">=", but what about "==" and "!="?

Currently, "11" == "11.0"; what would this yield under the new proposal?

- leave it as is: return true in this case => contradicts the whole purpose of the new proposal
- throw a TypeError when performing the above comparison => not acceptable either I guess; every language allows == and != on strings, and forcing the use of strict comparison operators is a bit weird here.
- change the semantics to return false when both operands are strings and don't match => not acceptable to you, as you cannot know what a line of code does without checking the header

What would you suggest here?

Ben
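As a reference point for the three options, this is a minimal sketch of what the existing operators do with such strings today, without any directive. The variable names are illustrative.

```php
<?php
// Both operands are numeric strings, so == compares them as numbers.
$looseEqual = ("11" == "11.0");

// === never converts: the two strings differ byte for byte.
$strictEqual = ("11" === "11.0");

// Exponent notation is numeric as well, so these also compare equal.
$exponent = ("1e3" == "1000");

var_dump($looseEqual, $strictEqual, $exponent);
```

The first and third results are the value-dependent conversions under discussion; the second is what the third option in the list would make `==` return for two strings.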
  106074
June 26, 2019 12:08 rowan.collins@gmail.com (Rowan Collins)
On Wed, 26 Jun 2019 at 12:46, Benjamin Morel wrote:

> This makes sense for "<", "<=", ">", ">=", but what about "==" and "!="?
>
> Currently, "11" == "11.0"; what would this yield under the new proposal?
>
> - leave it as is: return true in this case => contradicts the whole purpose
> of the new proposal
> - throw a TypeError when performing the above comparison => not acceptable
> either I guess; every language allows == and != on strings, forcing to use
> strict comparison operators is a bit weird here.
> - change the semantics to return false when both operands are strings, and
> don't match => not acceptable to you as you cannot know what a line of code
> does without checking the header

Given that we already have === and !==, could the strict mode simply throw an error for *any* use of the non-strict == and != versions?

    declare(strict_operators=1);
    var_dump( "11" == "11.0" ); # TypeError: "Cannot use non-strict equality operator in strict operator mode."
    var_dump( "11" === "11.0"); # bool(false)

I'm not sure whether I like the idea or not, but I thought I'd throw it out there as a possibility.

Regards,
--
Rowan Collins
[IMSoP]
  106075
June 26, 2019 12:15 benjamin.morel@gmail.com (Benjamin Morel)
> Given that we already have === and !==, could the strict mode simply throw
> an error for *any* use of the non-strict == and != versions?
>
>     declare(strict_operators=1);
>     var_dump( "11" == "11.0" ); # TypeError: "Cannot use non-strict equality operator in strict operator mode."
>     var_dump( "11" === "11.0"); # bool(false)
>
> I'm not sure whether I like the idea or not, but I thought I'd throw it out
> there as a possibility.

That's definitely a possibility, and one that I'm sure a lot of people will dislike. I personally don't have a strong opinion about it.

Ben
  106079
June 26, 2019 20:57 arnold.adaniels.nl@gmail.com (Arnold Daniels)
On Wed, Jun 26, 2019 at 1:46 PM Benjamin Morel wrote:

> It's true that string comparison (sorting) is a much harder problem that
> cannot be solved without additional knowledge of the encoding of the
> string; so I agree that it might be better to just throw a TypeError when
> comparing strings, and leave the user with an operator that only works on
> numbers, and explicitly use dedicated functions when comparing strings.

PHP considers a string as a simple byte array. I want to stress that any discussion about character sets or collations is beyond the scope of this RFC. The directive only affects the result of comparing two numeric strings and non-numeric strings. As such, the RFC assumes the current result of comparing non-numeric strings to be 100% correct.

To those who disagree with this assumption: please create a separate RFC to discuss this topic and do not take it into consideration in regards to the strict_operators RFC.

---

The RFC is modeled after `strict_types`, so to quote part of its motivation:

"... this RFC proposes a fourth approach: per-file strict or weak type-checking. This has the following advantages: People can choose the type checking model that suits them best, which means this approach should hopefully placate both the strict and weak type checking camps. ..."

Take into consideration that the use of `strict_operators` is optional. Those who are inclined to use it consider the current behavior of implicit type casting to be problematic. As such, I imagine that this group does not (want to) use code that exploits this behavior. Those who do not find the current behavior problematic will typically not use the directive and thus are unaffected by it.

Disallowing all relational operators for strings is too radical and primarily caters to those who aren't inclined to use the directive in the first place. In short: it's a compromise that makes nobody happy.

The RFC will take the following stance: the directive caters to those who find implicit casting by relational operators on two operands of the same type, purely based on the value of those operands, very undesirable. For the audience that's inclined to use the directive, any issues that come from copy/pasting code that exploits this behavior are considered acceptable and should be solved.

---

I've added two discussion points to the RFC based on the discussed concerns.

Arnold
  106077
June 26, 2019 12:50 arnold.adaniels.nl@gmail.com (Arnold Daniels)
On Wed, Jun 26, 2019 at 12:39 PM Claude Pache wrote:

> It was certainly a fundamental design error to have both implicit type
> conversion and operators that did different things based on the type of
> their operands. That leads to the infamous `"1" + 1 == 11` problem in
> JavaScript, or the "3" < "24" problem in PHP. That could have been
> avoided in two ways:
> * either by forbidding implicit conversion;
> * or by using different operators for different types (as does Perl).

Forbidding implicit type conversion completely is taking it too far. Some operators like string concatenation (`.`) can perform conversions just fine.

The issue at hand is limited to operators that are affected by the value (not only the type) of the operands. Specifically:

1. When using numeric strings with relational operators. This includes statements like `"16" == "016"`.
2. When comparing two arrays, e.g. `[null] == [0]` and `[0] == ["foo"]`, or comparing two objects.
3. In a `switch` statement.

--

3. Whether a switch is or isn't affected by `strict_operators` should be determined via a secondary vote.

2. Concerning `==` and `!=` with arrays and objects: there is currently a range of differences compared to the effect of `===` and `!==`. To what extent is the typecasting intended? Some cases like `[0] == [false]` can be common. As such, a widening primitive conversion from bool to int might be a good idea (in general). Beyond that, allowing cases like `[[]] == [false]` would undermine the purpose of this RFC, as it allows seemingly self-contradicting statements to evaluate to true, like

    $a == $b && $a == $c && $b != $c

with

    $a = [false]; $b = [0]; $c = [[]];

1. The `strict_types` directive already requires you to cast raw data from `$_GET`/`$_POST` or a database. If the directive disallowed strings, arrays, and objects as operands for relational operators (throwing a `TypeError`), explicit casting would still be required. The difference is that when you forget to do that, or copy the code from a code base where this isn't required, you'd always get a `TypeError` rather than a different result.

I don't think a `TypeError` should be thrown based on the value of an operand, only based on the type. Also, implicitly casting strings to numbers, but not casting other types (like arrays), only makes the logic of operators more complex and inconsistent.

Disallowing the relational operators `==`, `!=`, `<`, `>`, `<=`, `>=` and `<=>` for strings altogether, requiring the use of a function, is an option. However, IMHO this is killing a fly with a cannon, as the problem is limited to "copy/pasted code for comparing numeric strings from a source file that doesn't use strict_operators to a file that does use it".

--

So, should a directive, declared at the top of the file, affect how the code in that file is executed? Afaics YES, that's exactly what it's for:

> The declare construct is used to set execution directives for a block of code.
(https://www.php.net/declare)
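The seemingly self-contradicting statement can be checked directly. The sketch below uses the exact values given in this message and runs without any directive; only the layout and comments are added.

```php
<?php
$a = [false];
$b = [0];
$c = [[]];

// Arrays are compared element by element with ==.
var_dump($a == $b);   // false == 0
var_dump($a == $c);   // false == []
var_dump($b != $c);   // 0 compared with []

// Numeric strings are compared by value, so the leading zero is ignored.
var_dump("16" == "016");
```

All four expressions evaluate to true under the current rules, so `$a` equals both `$b` and `$c` while `$b` and `$c` are not equal to each other.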
  106071
June 26, 2019 11:00 rowan.collins@gmail.com (Rowan Collins)
On Wed, 26 Jun 2019 at 10:36, Benjamin Morel wrote:

> Oh, I see. You mean that only replacing some of the current results with
> TypeErrors would be acceptable; returning a different value would not.
> This makes a lot of sense, but once again prevents the language from slowly
> moving towards something different (and better), leaving it stuck in its
> legacy forever.

If we're talking about combining operator overloading and type juggling in the way that JS does it, I would definitely debate whether that's "better". It leads to the weird circular situation where to know what an operator means, you have to look at the types; but to know how the types will be interpreted, you need to know what the operator means.

Perl is a notable contrast: the types of operands are deduced based on the operator, but there are different operators to force them to different types. So `23 < 4` and `"23" < "4"` are both numeric comparisons, so return false; but `23 lt 4` and `"23" lt "4"` do string comparisons, and return true. That way the user's intent is clear, but you don't have to manually cast values or remember how different combinations will be interpreted.

> I'm starting to believe that a joint effort to fork PHP is the only way out

If what you want is a fork of PHP with stronger typing, then take a look at Hack: https://hacklang.org/

Regards,
--
Rowan Collins
[IMSoP]
  106072
June 26, 2019 11:18 addw@phcomp.co.uk (Alain D D Williams)
On Wed, Jun 26, 2019 at 12:00:18PM +0100, Rowan Collins wrote:

> Perl is a notable contrast: the types of operands are deduced based on the
> operator, but there are different operators to force them to different
> types. So `23 < 4` and `"23" < "4"` are both numeric comparisons, so return
> false; but `23 lt 4` and `"23" lt "4"` do string comparisons, and return
> true. That way the user's intent is clear, but you don't have to manually
> cast values or remember how different combinations will be interpreted.

IMHO the Perl way is better: the different operators mean that I will get what I want, and I don't need to worry about an accidental type juggle. It is also (presumably) faster, as the run time does not need to look at a string, decide if it could be a number and maybe change what it does.

The big problem is backwards compatibility, so new operators would be needed:

    string compare: lt, gt, etc, not much of a problem

    numeric compare: #< #> would be nice were it not that # means comment.

--
Alain Williams
  106076
June 26, 2019 12:47 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Note that using a directive means there is no inherent backward compatibility issue. We're talking about copy/pasted code snippets only.

Solving the issues presented while maintaining BC, without the use of a directive, would require the addition of multiple type-specific operators: string compare, numeric compare, array compare, etc. PHP code would become unrecognizable. I'm not a fan of that alternative.

Arnold
  106078
June 26, 2019 20:44 d.takken@xs4all.nl (Dik Takken)
Hello,

Thanks a lot for your work on this RFC, it looks like a nice way to
allow the language to gradually move forward.

As pointed out by others, the ==, ===, != and !== operators are a bit
problematic. A possible solution could be to leave them out of the RFC.
The reason to do so is that the choice between strict or non-strict
comparison is already possible by choosing the appropriate operator. In
my view, explicitly using == instead of === is either intentional or a
bug. If it is intentional, the author consciously chose to be
non-strict. The strictness declaration would then only affect operators
for which no strict variant exists or where the operator is implicit
(switch statement).

As for changing the behavior of in_array() and friends: I would love the
idea of not having to use the strict argument everywhere anymore.
However, changing behavior of functions that are not in the same file
that has the strictness declaration seems inconsistent. The scope of the
declaration would not be well defined anymore. There may be other means
to fix this annoyance, like introducing a strict variant of in_array().

Regarding the switch statement: While it is not an operator, one could
argue that it is a case of implicit use of an operator.

Regards,
Dik Takken
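The two implicit comparisons mentioned in this message can be illustrated with current PHP. The following is a minimal sketch without any directive; `classify()` is a made-up helper for the example and not part of the RFC.

```php
<?php
$haystack = ["10"];

// Default (loose) search: "1e1" and "10" are both numeric strings.
$looseHit = in_array("1e1", $haystack);

// With the third argument set to true, elements are compared with ===.
$strictHit = in_array("1e1", $haystack, true);

// switch compares its subject to each case value with ==.
function classify(string $value): string
{
    switch ($value) {
        case "10":
            return "matched case 10";
        default:
            return "no match";
    }
}

var_dump($looseHit, $strictHit);
echo classify("1e1"), PHP_EOL;
```

The `switch` has no strict counterpart in the language, which is why it is treated here as an implicit use of an operator, whereas `in_array()` already offers an explicit opt-in.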
  106080
June 26, 2019 21:22 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Hi Dik,

Thanks for taking the time to review this RFC.

On Wed, Jun 26, 2019 at 10:44 PM Dik Takken takken@xs4all.nl> wrote:

> Hello, > > Thanks a lot for your work on this RFC, it looks like a nice way to > allow the language to gradually move forward. > > As pointed out by others, the ==, ===, != and !== operators are a bit > problematic. A possible solution could be to leave them out of the RFC. > The reason to do so is that the choice between strict or non-strict > comparison is already possible by choosing the appropriate operator. In > my view, explicitly using == in stead of === is either intentional or a > bug. If it is intentional, the author consciously chose to be > non-strict. The strictness declaration would then only affect operators > for which no strict variant exists or where the operator is implicit > (switch statement).
I would argue the following: the explicit use of the strict_operators directive is intentional, meaning that the author consciously chose to be strict and does not expect some operators to still be non-strict. The issues pointed out apply to all comparison operators. Ignoring == and != in the RFC creates an inconsistency, while not properly addressing those concerns.

> As for changing the behavior of in_array() and friends: I would love the
> idea of not having to use the strict argument everywhere anymore.
> However, changing behavior of functions that are not in the same file
> that has the strictness declaration seems inconsistent. The scope of the
> declaration would not be well defined anymore. There may be other means
> to fix this annoyance, like introducing a strict variant of in_array().

:+1:

> Regarding the switch statement: While it is not an operator, one could
> argue that it is a case of implicit use of an operator.

I agree. Internally it's defined as an operator even. Still, I'll put this up as a secondary vote.

> Regards,
> Dik Takken
  106082
June 27, 2019 09:28 d.takken@xs4all.nl (Dik Takken)
On 26-06-19 23:22, Arnold Daniels wrote:
> I would argue the following; The explicit use of the strict_operator is
> intentional, meaning that the author consciously chose to be strict and
> does not expect some operators to still be non-strict. The issues pointed
> out, apply to all comparison operators. Ignoring == and != in the RFC
> creates an inconsistency, while not properly addressing those concerns.

Yes, I guess you're right that treating all operators in a strict way with a simpler set of rules is more consistent.

Concerning the issue with copying existing code into a file that uses stricter interpretation of the code: I think this should be regarded as performing an upgrade of the code that is being copied. No problem in my view.

In the section about widening the scope you address the type juggling that happens on array access, like $array[12.34]. One could argue that accessing an array item by key is implicit use of the == operator, just like a switch statement is. I would love to see it included in the main proposal instead of proposing it as part of a different directive. The change in behavior could be similar to what is proposed for the switch statement: Array keys are compared using the === operator.

Regards,
Dik Takken
  106203
July 9, 2019 14:07 nikita.ppv@gmail.com (Nikita Popov)
On Tue, Jun 25, 2019 at 3:10 PM Arnold Daniels <arnold.adaniels.nl@gmail.com> wrote:

> Hi all,
>
> I would like to open the discussion for RFC: "Strict operators directive".
>
> This RFC proposes a new directive 'strict_operators'. When enabled,
> operators may cast operands to the expected type, but must comply to;
>
> * Typecasting is not based on the type of the other operand
>
> * Typecasting is not based on the value of any of the operands
> * Operators will throw a TypeError for unsupported types
>
> Reasoning; The current rules for type casting done by operators are
> inconsistent and complex, which can lead to surprising results where a
> statement seemingly contradicts itself.
>
> Using a directive means that backwards compatibility is guaranteed.
>
> https://wiki.php.net/rfc/strict_operators
>

Hi Arnold,

I like the idea behind this RFC. This is a good way to avoid unfortunate legacy behavior without breaking BC. Here are some more detailed thoughts:

* I think to be really useful, this additionally needs https://wiki.php.net/rfc/namespace_scoped_declares or some variation thereof. Being able to say "this whole library uses strict operators" is much more useful than specifying this in every file (and possibly missing it somewhere and thus getting the wrong semantics). I will try to get a new version of this RFC based on directories rather than namespaces into PHP 8.

* The sentence "In this case, we're passing an int to a function that accepts float. The parameter is converted (widened) to float." should probably not be referring to functions and parameters.

* "To compare two numeric strings as numbers, they need to be cast to floats." This may lose precision for integers. It is better to cast to a number (int or float), the canonical way being +$x. But I guess that won't work under strict_operators. Maybe we should have a (number) cast (it already exists internally...)

* This has already been mentioned by others: Having $str1 < $str2 perform a strcmp() style comparison under strict_operators is surprising. I think that overall the use of lexicographical string comparisons is quite rare and should be performed using an explicit strcmp() call. More likely than not, writing $str1 < $str2 is a bug and should generate a TypeError. Of course, equality comparisons like $str1 == $str2 should still work, similar to the distinction you make for arrays.

* If I understand correctly, under this RFC "foo" == 0 will throw a TypeError, but ["foo"] == [0] will return false. Generally the behavior of the recursive comparison here is that it's the same as strict == but all errors become not-equal instead. Correct? I'm not sure how I feel about this, as it seems to introduce one more set of semantics next to the weak ==, strict == and === semantics there already are.

* I also find it somewhat odd that you can't write something like "$obj != null" anymore, only "$obj !== null".

* I think the "solution" to the last three points is a) only support numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors otherwise (maybe modulo provisions for object overloading) and b) allow comparing any types in == and !=, without throwing a TypeError. The question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they aren't", so there is no need to make this a TypeError (while the question "Is 42 larger than 'foobar'?" has no good answer.) I believe doing something like this would roughly match how Python 3 works. (Edit: I see now that this is mentioned in the FAQ, but I think it would be good to reconsider this. It would solve most of my problems with this proposal.)

* String increment seems like a pretty niche use case, and I believe that many people find the overflow behavior quite surprising. I think it may be better to forbid string increment under strict_operators.

* A similar argument can be made for the use of &, | and ^ on strings. While I have some personal fondness for these, in practical terms these are rarely used and may be indicative of a bug. I think both for string increment and string and/or/xor it may be better to expose these as functions so their use is more explicit.

Regards,
Nikita
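
To illustrate the precision concern raised in the third bullet, here is a minimal sketch. It assumes a 64-bit PHP build and default (non-strict) operator semantics; the variable names are made up for the example.

```php
<?php
// Assumption: 64-bit PHP, so both values fit in an int.
$a = "9007199254740993"; // 2**53 + 1, not exactly representable as a float
$b = "9007199254740992"; // 2**53

// Casting to float collapses the two distinct values into one.
$equalAsFloats = ((float) $a === (float) $b);

// Unary plus turns a numeric string into an int when it fits, else a float.
$na = +$a;
$nb = +$b;
$equalAsNumbers = ($na === $nb);

var_dump($equalAsFloats);  // bool(true)
var_dump(is_int($na));     // bool(true)
var_dump($equalAsNumbers); // bool(false)
```

The float cast reports two different integers as equal, while the unary plus keeps them apart.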
  106207
July 10, 2019 21:37 arnold.adaniels.nl@gmail.com (Arnold Daniels)
Hi Nikita,

Thanks for your feedback.

I'll fix the textual errors you mentioned.

> * "To compare two numeric strings as numbers, they need to be cast to
> floats." This may lose precision for integers. It is better to cast to
> numbers (int or float) using, with the canonical way being +$x. But I guess
> that won't work under strict_operators. Maybe we should have a (number)
> cast (it already exists internally...)
>

Good point. While in most cases you know if you're working with floats or integers, adding a way to cast to either an int or float would be nice. Maybe preferably through a function like `numberval($x)` or simply `number($x)`, so the `(type)` syntax is reserved for actual types. That would be an RFC on its own though.

> * This has already been mentioned by others: Having $str1 < $str2 perform
> a strcmp() style comparison under strict_operators is surprising. I think
> that overall the use of lexicographical string comparisons is quite rare
> and should be performed using an explicit strcmp() call. More likely than
> not, writing $str1 < $str2 is a bug and should generate a TypeError. Of
> course, equality comparisons like $str1 == $str2 should still work, similar
> to the distinction you make for arrays.
>

Ok, fair. I'll change it so <,<=,>,>=,<=> comparison on a string throws a TypeError, similar to arrays, resources, and objects.

> * If I understand correctly, under this RFC "foo" == 0 will throw a
> TypeError, but ["foo"] == [0] will return false. Generally the behavior of
> the recursive comparison here is that it's the same as strict == but all
> errors become not-equal instead. Correct? I'm not sure how I feel about
> this, as it seems to introduce one more set of semantics next to the weak
> ==, strict == and === semantics there already are.
>

The syntax would be `$a == $b` (or `$a == [0]`), where $a and $b are a string/int in one case and both an array in the other case. In the second case, we can't throw a TypeError as both operands are of the same type.

> * I also find it somewhat odd that you can't write something like "$obj !=
> null" anymore, only "$obj !== null".
>

To check against null, it's better to use !==. For objects (and resources) using `!= null` is ok, but for other types, it's currently not. For example; `[] == null` gives true.

> * I think the "solution" to the last three points is a) only support
> numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
> otherwise (maybe modulo provisions for object overloading) and b) allow
> comparing any types in == and !=, without throwing a TypeError. The
> question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
> aren't", so there is no need to make this a TypeError (while the question
> "Is 42 larger than 'foobar'?" has no good answer.) I believe doing
> something like this would roughly match how Python 3 works. (Edit: I see
> now that this is mentioned in the FAQ, but I think it would be good to
> reconsider this. It would solve most of my problems with this proposal.)
>

Besides the argument in the FAQ, having == and != do a type check means there are a lot more cases where the behavior changes rather than a TypeError being thrown. Currently `"foobar" == 0` returns true, but this would make it return false. So would `1 == true`, `"0" == 0` and `"0" == false`. To reduce the cases where the behavior changes to a minimum, it's better to throw TypeErrors for == and !=.
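
For reference, the cross-type comparisons mentioned above can be checked with a short script. This is only a sketch of the existing non-strict behavior, with no strict_operators directive involved. The result of `"foobar" == 0` depends on the PHP version, so it is left out here.

```php
<?php
// Loose comparisons between different types under default semantics.
// Each of these evaluates to true, which is the behavior that a
// type-checking == would silently flip to false.
$results = [
    '1 == true'    => (1 == true),
    '"0" == 0'     => ("0" == 0),
    '"0" == false' => ("0" == false),
    '[] == null'   => ([] == null),
];

foreach ($results as $expression => $value) {
    echo $expression, ' => ', var_export($value, true), PHP_EOL;
}
```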

> * String increment seems like a pretty niche use case, and I believe that
> many people find the overflow behavior quite surprising. I think it may be
> better to forbid string increment under strict_operators.
>

Ok

> * A similar argument can be made for the use of &, | and ^ on strings.
> While I have some personal fondness for these, in practical terms these are
> rarely used and may be indicative of a bug. I think both for string
> increment and string and/or/xor it may be better to expose these as
> functions so their use is more explicit.
>

These operators make it very easy to work with binary data as strings in PHP. In other languages you have to work with byte arrays, which is a major pain. They're also very intuitive; `"wow" & "xul"` is the same as `chr(ord('w') & ord('x')) . chr(ord('o') & ord('u')). chr(ord('w') & ord('l'))`. I think these should stay.
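
The equivalence claimed above can be verified directly. A small sketch of the current byte-wise behavior of `&` and `~` on strings follows; the variable names are only for illustration.

```php
<?php
// Bitwise AND on two strings operates byte by byte.
$viaOperator = "wow" & "xul";

// The same result spelled out with ord()/chr() on each byte pair.
$viaBytes = chr(ord('w') & ord('x'))
    . chr(ord('o') & ord('u'))
    . chr(ord('w') & ord('l'));

// Bitwise NOT on a string flips every bit of every byte.
$inverted = ~"\x00\x01\x02";

var_dump($viaOperator === $viaBytes);   // bool(true)
echo $viaOperator, PHP_EOL;             // ped
var_dump($inverted === "\xFF\xFE\xFD"); // bool(true)
```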

>
> Regards,
> Nikita
>

Arnold
  106208
July 10, 2019 22:24 dev@mabe.berlin (Marc)
Hi,


On 10.07.19 23:37, Arnold Daniels wrote:
>> * I also find it somewhat odd that you can't write something like "$obj !=
>> null" anymore, only "$obj !== null".
>>
> To check against null, it's better to use !==. For objects (and resources)
> using `!= null` is ok, but for other types, it's currently not. For
> example; `[] == null` gives true.

I would argue that two operands will not be the same if they are of different types (except for int/float). That means 0 == "" and 0 == "0" will both be false, but 0 == 0 and 0 == 0.0 will be true. Note: In my opinion the int/float check should also make sure that there is no data loss when comparing a very big integer to a float. In this case they should not be equal.

>
>> * I think the "solution" to the last three points is a) only support
>> numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
>> otherwise (maybe modulo provisions for object overloading) and b) allow
>> comparing any types in == and !=, without throwing a TypeError. The
>> question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
>> aren't", so there is no need to make this a TypeError (while the question
>> "Is 42 larger than 'foobar'?" has no good answer.) I believe doing
>> something like this would roughly match how Python 3 works. (Edit: I see
>> now that this is mentioned in the FAQ, but I think it would be good to
>> reconsider this. It would solve most of my problems with this proposal.)
>>
> Besides the argument in the FAQ, having the == and != return do a type
> check, means there are a lot more cases where the behavior changes rather
> than that a TypeError is thrown. Currently `"foobar" == 0` returns true,
> but this would make it return false. So would `1 == true`, `"0" == 0` and
> `"0" == false`. To reduce the cases where the behavior changes to a
> minimum, it's better to throw TypeErrors for == and !=.

I think this goes hand in hand with the empty check, where empty("0") is true but empty("00") is false. I couldn't find how/if this will change with this RFC.

>
>> * String increment seems like a pretty niche use case, and I believe that
>> many people find the overflow behavior quite surprising. I think it may be
>> better to forbid string increment under strict_operators.
>>
> Ok
>
>
>> * A similar argument can be made for the use of &, | and ^ on strings.
>> While I have some personal fondness for these, in practical terms these are
>> rarely used and may be indicative of a bug. I think both for string
>> increment and string and/or/xor it may be better to expose these as
>> functions so their use is more explicit.
>>
> These operators make it very easy to work with binary data as strings in
> PHP. In other languages you have to work with byte arrays, which is a major
> pain. They're also very intuitive; `"wow" & "xul"` is the same as
> `chr(ord('w') & ord('x')) . chr(ord('o') & ord('u')). chr(ord('w') &
> ord('l'))`. I think these should stay.

I do agree here. Even if working with binary strings isn't most common in PHP web development I actually use these for bitsets. But I have a note here that bit shifting currently does not work with binary strings (tries to cast binary string to integer) and even if it would shift the binary string the >> is designed to keep the first bit as a positive/negative flag which of course does not make sense for binary strings. In my opinion the bit shifting operators should accept and work well with binary strings. I don't see a reason why it's performing a type cast here.

>
>> Regards,
>> Nikita
>>
> Arnold
>

Marc
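
The empty() behavior mentioned in this message can be reproduced with a short sketch of current PHP semantics. The variable names are illustrative only, and nothing here reflects how the RFC would treat empty().

```php
<?php
$zero = "0";
$doubleZero = "00";

// Only the exact string "0" (and "") counts as empty; "00" does not.
$emptyZero = empty($zero);
$emptyDoubleZero = empty($doubleZero);

var_dump($emptyZero);          // bool(true)
var_dump($emptyDoubleZero);    // bool(false)

// The boolean cast follows the same rule.
var_dump((bool) $zero);        // bool(false)
var_dump((bool) $doubleZero);  // bool(true)
```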