NULL Coercion Consistency

  117501
April 8, 2022 17:34 craig@craigfrancis.co.uk (Craig Francis)
Hi,

I've written a new draft RFC to address the NULL coercion problems:

https://wiki.php.net/rfc/null_coercion_consistency

This is due to the result of the Allow NULL quiz:

https://quiz.craigfrancis.co.uk/

14 votes for Fatal Type Errors irrespective of `strict_types=1`;
13 votes for NULL coercion when not using `strict_types=1`;
8 votes to update some parameters to allow NULL;

I appreciate some want to force strict type checking on everyone, but I
want to make sure we have this properly documented, with names, and
explanations.

Breaking changes should be justified - if they aren't, they only
make upgrading difficult and frustrating (bad for security).

Craig
  117502
April 8, 2022 17:54 mel@dafert.at (Mel Dafert)
On 8 April 2022 19:34:52 CEST, Craig Francis <craig@craigfrancis.co.uk> wrote:
>Hi, > >I've written a new draft RFC to address the NULL coercion problems: > >https://wiki.php.net/rfc/null_coercion_consistency > >This is due to the result of the Allow NULL quiz: > >https://quiz.craigfrancis.co.uk/ > >14 votes for Fatal Type Errors irrespective of `strict_types=1`; >13 votes for NULL coercion when not using `strict_types=1`; >8 votes to update some parameters to allow NULL; > >I appreciate some want to force strict type checking on everyone, but I >want to make sure we have this properly documented, with names, and >explanations. > >Breaking changes should be justified - if they aren't, they only >make upgrading difficult and frustrating (bad for security). > >Craig
Hello, While the RFC extensively documents the problem space, it's rather short on what exactly it proposes. In particular, does this propose changing user-defined functions under strict_types=0 to accept null for scalar types? Eg., this will be allowed (under strict_types=0): ``` function x(string $y, int $z) { ... } x(null, null); //no error, no warning ``` Regards, Mel
  117503
April 8, 2022 18:07 craig@craigfrancis.co.uk (Craig Francis)
On Fri, 8 Apr 2022 at 18:54, Mel Dafert <mel@dafert.at> wrote:

> In particular, does this propose changing user-defined functions under > strict_types=0 to accept null for scalar types? > > Eg., this will be allowed (under strict_types=0): > ``` > function x(string $y, int $z) { > ... > } > x(null, null); //no error, no warning > ``` >
tbh my focus has been on the problems that have come up with internal functions. With user defined functions, I think it's up for debate (still a draft), but I think those NULL's should be coerced to the specified type (as documented), where I don't think PHP should be doing type checking under strict_types=0. Craig
  117505
April 9, 2022 09:29 craig@craigfrancis.co.uk (Craig Francis)
On Fri, 8 Apr 2022 at 19:07, Craig Francis <craig@craigfrancis.co.uk> wrote:

> On Fri, 8 Apr 2022 at 18:54, Mel Dafert <mel@dafert.at> wrote: > >> In particular, does this propose changing user-defined functions under >> strict_types=0 to accept null for scalar types? >> > > With user defined functions, I think it's up for debate (still a draft), > but I think those NULL's should be coerced to the specified type (as > documented), where I don't think PHP should be doing type checking under > strict_types=0. >
I've updated the RFC to note that user-defined functions don't cause a backwards compatibility issue; but I have added an example to highlight the coercion inconstancy: ```php function user_function(string $s, int $i, float $f, bool $b) { var_dump($s, $i, $f, $b); } user_function('1', '1', '1', '1'); // string(1) "1" / int(1) / float(1) / bool(true) user_function(2, 2, 2, 2); // string(1) "2" / int(2) / float(2) / bool(true) user_function(3.3, 3.3, 3.3, 3.3); // string(3) "3.3" / int(3) / float(3.3) / bool(true) user_function(false, false, false, false); // string(0) "" / int(0) / float(0) / bool(false) user_function(NULL, NULL, NULL, NULL); // Uncaught TypeError x4? ```
  117510
April 11, 2022 12:05 guilliam.xavier@gmail.com (Guilliam Xavier)
Hi,


Thanks for the draft.

On Sat, Apr 9, 2022 at 11:30 AM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> On Fri, 8 Apr 2022 at 19:07, Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > On Fri, 8 Apr 2022 at 18:54, Mel Dafert <mel@dafert.at> wrote: > > > >> In particular, does this propose changing user-defined functions under > >> strict_types=0 to accept null for scalar types? > >> > > > > With user defined functions, I think it's up for debate (still a draft), > > but I think those NULL's should be coerced to the specified type (as > > documented), where I don't think PHP should be doing type checking under > > strict_types=0. > > > > > I've updated the RFC to note that user-defined functions don't cause a > backwards compatibility issue; but I have added an example to highlight > the coercion inconstancy: >
You've updated the **Documentation** section (also: did you mean "inconsistency" rather than "inconstancy"?) but still not the **Proposal** (BTW all those sections between "Introduction" and "Proposal" would probably better be *sub*-sections of a section named "Problem" or "Current State" or something). And for the question: (currently) the RFC is named "NULL Coercion *Consistency*" and the Proposal says "Must keep the spirit of the original RFC, and *keep user-defined and internal functions consistent*.", which (to me) implies *not only* reverting the 8.1 deprecation on internal functions *but also* "changing user-defined functions under strict_types=0 to [coerce] null for scalar type[ declaration]s" indeed. In any case, that should be written clear in the RFC (either in "Proposal" or "Open Issues"). Regards, -- Guilliam Xavier
  117514
April 11, 2022 15:11 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 11 Apr 2022 at 13:05, Guilliam Xavier xavier@gmail.com>
wrote:

> > https://wiki.php.net/rfc/null_coercion_consistency > > You've updated the **Documentation** section (also: did you mean > "inconsistency" rather than "inconstancy"?) but still not the **Proposal** > (BTW all those sections between "Introduction" and "Proposal" would > probably better be *sub*-sections of a section named "Problem" or "Current > State" or something). > > And for the question: (currently) the RFC is named "NULL Coercion > *Consistency*" and the Proposal says "Must keep the spirit of the original > RFC, and *keep user-defined and internal functions consistent*.", which (to > me) implies *not only* reverting the 8.1 deprecation on internal functions > *but also* "changing user-defined functions under strict_types=0 to > [coerce] null for scalar type[ declaration]s" indeed. > In any case, that should be written clear in the RFC (either in "Proposal" > or "Open Issues"). >
Thank you Guilliam, I've applied all of your suggestions (with a slight tweak to use `strict_types=1`, just because I find it easier to read). While I am open to for user-defined functions to keep the type check for NULL when not in `strict_types=1`, it does feel like a bug, and I think it would be better to be consistent (happy to discuss on or off list). Craig
  117515
April 11, 2022 15:14 george.banyard@gmail.com ("G. P. B.")
On Fri, 8 Apr 2022 at 18:35, Craig Francis <craig@craigfrancis.co.uk> wrote:

> Hi, > > I've written a new draft RFC to address the NULL coercion problems: > > https://wiki.php.net/rfc/null_coercion_consistency
The statement that:
> But the implementation caused a Type Error when coercing NULL for everyone > (even when not using *strict_types=1*), this seems more of an over-sight > is utterly wrong and was a conscious design choice based on the widely
accepted view that nullable by default like Java is a mistake and defeats the purpose. Even the competing RFC from Zend et al [1] proposed this behaviour
> A new set of coercion rules will apply to both user-land type hints and > internal type hints. The guiding principals behind these new rules are: > > 1. If the type of the value is an exact match to the type requested by > the hint - allow. > 2. If the value can be coerced to the type requested by the hint > without data loss and without creation of likely unintended data - allow. > 3. In all other cases - reject. > > And later to describe the internal function changes:
> This RFC proposes to bring the rule-set described in the last section to > internal functions as well, through updates to the zend_parse_parameters() > function. >
There was obviously the caveat about NULL being coerced for internals functions, but other than usage, another reasons stated:
> However, since this requires substantial auditing of internal functions - > especially ones that have default values but don't explicitly declare > themselves as accepting NULLs - it's outside the scope of this RFC and > will be revisited for 7.1. Note that if the > https://wiki.php.net/rfc/nullable_types > <https://wiki.php.net/rfc/nullable_types_rfc> is accepted it will further > reduce this discrepancy, by allowing user-land functions and internal > functions the same level of granularity in terms of accepting or rejecting > NULL values for function arguments. >
This auditing was to a large extent performed in PHP 8.0 with the introduction of the stubs. You can disagree with some of it, but this again brings us back to amending functions to have nullable arguments, not a blanket change. I appreciate some want to force strict type checking on everyone, but I
> want to make sure we have this properly documented, with names, and > explanations. > > Breaking changes should be justified - if they aren't, they only > make upgrading difficult and frustrating (bad for security). >
The breaking change was justified and explained, and voted unanimously in favour with 46 votes. The burden of justification is now on your end. Moreover, the breaking change has also been announced with ample time to fix the issues until the next major version. As with the previous one, this RFC is getting a strong no from my end, changing the type system into a broken and inconsistent way, and reversing a unanimously voted RFC needs strong justification. Best, George Peter Banyard [1] https://wiki.php.net/rfc/coercive_sth
  117517
April 11, 2022 18:22 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 11 Apr 2022 at 16:14, G. P. B. banyard@gmail.com> wrote:

> But the implementation caused a Type Error when coercing NULL for everyone >> (even when not using *strict_types=1*), this seems more of an over-sight >> > is utterly wrong and was a conscious design choice based on the widely > accepted view that nullable by default like Java is a mistake and defeats > the purpose. >
Bit hash... anyway, I've replaced that paragraph (it wasn't clear enough), I just wanted to note how developers not using `strict_types=1` would see it as an over-sight that coercion works for string/int/float/bool but not null (despite what the documentation says), and how it's introduced a type check (that they didn't ask for). I assume you're using `strict_types=1`, and will be unaffected by this, so with all due respect, I'm not considering how you view or use NULL, nor do I care what Java does, I'm simply focused on how most developers use NULL in PHP. "2. If the value can be coerced to the type requested by the hint without
> data loss and without creation of likely unintended data - allow"
Yep, as noted under the Documentation heading, and copying from the manual, “PHP will coerce values of the wrong type into the expected scalar type declaration if possible”... which it can, and does; for example, coercion to a string "null is always converted to an empty string". https://www.php.net/manual/en/language.types.declarations.php#language.types.declarations.strict https://www.php.net/manual/en/language.types.string.php [...] but this again brings us back to amending functions to have nullable
> arguments, not a blanket change. >
As noted in the Introduction, second paragraph, unfortunately this won't work (especially considering the ~335 parameters affected by this)... just taking `htmlspecialchars()` as the most simple example, I asked a few developers who use `strict_types=1`, and they did not want `$string` to be a nullable argument, because NULL being passed to this argument suggests there's a bug elsewhere in their code. This view was also reflected in the quiz (the names are links to view their answers): https://quiz.craigfrancis.co.uk/ The breaking change was justified and explained, and voted unanimously in
> favour with 46 votes. > The burden of justification is now on your end. >
My whole RFC is the justification, from the lack of discussion of the impact (where I will note that I am in favour of its intention of consistency), the roughly 85% of scripts that do not use `strict_types=1`, the almost certainly less than 33% developers who use static analysis, how tools do not and will not fix this problem, how this goes against the documentation, and all of the examples. Keep in mind, if you're using `strict_types=1`, great, you're not affected (I'm not either)... however, I work with quite a few developers, and it's causing them a lot of problems, and their current approach is to either disable the deprecation warnings, or to stay with 8.0, simply because they do not have the time to make the changes... I will note that one team has been trying to update their code base, it's about 6 months later, and they still keep adding `strval()` to silence these deprecation warnings (yeah, I know, not good, but it's the easy solution). changing the type system into a broken and inconsistent way, and reversing
> a unanimously voted RFC needs strong justification. >
I really appreciate your time to read my draft RFC, and I know there is no way I can change your mind, but can you explain how would this be "broken and inconsistent"? Just take NULL to String coercion as an example, the documentation clearly and simply says “null is always converted to an empty string”, this needs to be updated to end with "... except when being passed to a function, in which case NULL will result in a Type Error" (I think that's "broken and inconsistent"). Craig
  117518
April 11, 2022 18:56 mark@demon-angel.eu (Mark Baker)
On 11/04/2022 20:22, Craig Francis wrote:
> Keep in mind, if you're using `strict_types=1`, great, you're not affected > (I'm not either)... however, I work with quite a few developers, and it's > causing them a lot of problems, and their current approach is to either > disable the deprecation warnings, or to stay with 8.0, simply because they > do not have the time to make the changes... I will note that one team has > been trying to update their code base, it's about 6 months later, and they > still keep adding `strval()` to silence these deprecation warnings (yeah, I > know, not good, but it's the easy solution).
This doesn't only apply to end user developers, but also to library and toolchain developers who need to maintain code covering a range of PHP versions, and who also need to make these trade-offs. It's those maintainers that have to deal with the impact of this type of change, whether needing to apply changes to their libraries to "suppress" the deprecation notice; or dealing with end-user developers raising the same issues time and again about it because they don't want to see deprecation notices appearing in their logs. Even though this was only a deprecation, most end-user developers don't recognise that difference from logged errors when using libraries, so the impact is felt most heavily by library and toolchain maintainers. This 8.1 deprecation was the one that created the most work, and the most stress for me; and even with an average of 85% code coverage with tests, I'm still not certain that I've resolved every instance using the dirty `$x ?? ''` approach when calling PHP core functions. In many ways, readying code for the 8.1 release because of this deprecation was more painful and time consuming than for the 8.0 release. With all the new deprecations being added for 8.2, I'm looking forward to it even less. That's also why I wish there was a better risk assessment made for every deprecation RFC; becase all too often there's little or none; and why I get frustrated whenever a question is raised about an RFC, and the person raising these concerns is too often shouted down for raising "a storm in a teacup". -- Mark Baker _________ |. \ \-3 |_J_/ PHP | || | __ | || |m| |m| I LOVE PHP
  117522
April 13, 2022 14:00 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 11 Apr 2022 at 19:57, Mark Baker <mark@demon-angel.eu> wrote:

> This doesn't only apply to end user developers, but also to library and > toolchain developers who need to maintain code covering a range of PHP > versions >
Yep, and thank you for commenting. Library authors are the ones receiving pressure at the moment, and while users of the library sometimes help (e.g. pull requests), it's a fairly thankless and time consuming task, with little/no value for scripts not using `strict_types=1` (you're not the only one simply adding `?? ''` everywhere). Just one thing to add, this is from people who are complaining about a deprecation notice (which can and is being ignored), now think of the typical non-library code when it becomes a Fatal Error. Craig
  117519
April 11, 2022 19:08 a.leathley@gmx.net (Andreas Leathley)
On 11.04.22 20:22, Craig Francis wrote:
> Just take NULL to String coercion as an example, the documentation > clearly > and simply says “null is always converted to an empty string”, this needs > to be updated to end with "... except when being passed to a function, in > which case NULL will result in a Type Error" (I think that's "broken and > inconsistent").
You are taking parts of the documentation out of context, and omitting the start of the whole "Converting to string" section: "A value can be converted to a string using the (string) cast or the strval() function. String conversion is automatically done in the scope of an expression where a string is needed. This happens when using the echo or print functions, or when a variable is compared to a string. The sections on Types and Type Juggling will make the following clearer." In the same section it is also defined how resources and arrays get converted to strings. Following your argument, you should be able to pass arrays and resources to a string parameter because there is a clear definition in the documentation of what then happens. I unfortunately really don't like the direction of your RFC, because I see null as a real type and because it increases the divide between strict_types and not using strict_types. I have never used strict_types - in modern PHP code I find it unnecessary, if you use parameter, property and return types. I am glad PHP has decreased the difference between using strict_types or not, so your direction of trying to increase the difference seems like the wrong way to go. Ideally I would rather want to see a future where strict_types can be removed/reconciled.
  117521
April 13, 2022 13:36 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 11 Apr 2022 at 20:08, Andreas Leathley leathley@gmx.net> wrote:

> You are taking parts of the documentation out of context, and omitting > the start of the whole "Converting to string" section: > > "A value can be converted to a string using the (string) cast or the > strval() function. String conversion is automatically done in the scope > of an expression where a string is needed. This happens when using the > echo or print functions, or when a variable is compared to a string. The > sections on Types and Type Juggling will make the following clearer." >
I'm sorry, I've read this several times now, and I'm not sure what this adds. My RFC simply quotes the paragraphs that explain how null is coerced (other than the small abbreviation for booleans, those paragraphs are included in their entirety), I don't think it needs any more context than that, it's not like I'm mis-representing how coercion works (and is used) in PHP. I could expand my first quote to include "... For example, a function that is given an int for a parameter that expects a string will get a variable of type string", but I don't think it's that useful.
> In the same section it is also defined how resources and arrays get > converted to strings. Following your argument, you should be able to > pass arrays and resources to a string parameter because there is a clear > definition in the documentation of what then happens. >
No, my RFC is about the problems this change has caused, how it is making it difficult to upgrade. I'm only referencing the documentation to confirm how NULL has always been coerced. I unfortunately really don't like the direction of your RFC, because I
> see null as a real type and because it increases the divide between > strict_types and not using strict_types. I have never used strict_types > - in modern PHP code I find it unnecessary, if you use parameter, > property and return types. I am glad PHP has decreased the difference > between using strict_types or not, so your direction of trying to > increase the difference seems like the wrong way to go. Ideally I would > rather want to see a future where strict_types can be removed/reconciled.
You can see NULL however you like, but most developers do not share that view. NULL has been passed to these functions, since, well, forever; and changing code to manually convert NULL to the relevant type is incredibly time consuming, and of questionable value (e.g. when developers simply add strval() everywhere). Either way, I'm not trying to "increase the difference" (this is how it has always worked). If you want type checking, good, it can be really useful, and I recommend it with static analysis... but it's enabled with `strict_types=1`. Craig
  117524
April 13, 2022 19:07 a.leathley@gmx.net (Andreas Leathley)
On 13.04.22 15:36, Craig Francis wrote:
> On Mon, 11 Apr 2022 at 20:08, Andreas Leathley leathley@gmx.net> wrote: > > You are taking parts of the documentation out of context, and omitting > the start of the whole "Converting to string" section: > > "A value can be converted to a string using the (string) cast or the > strval() function. String conversion is automatically done in the > scope > of an expression where a string is needed. This happens when using the > echo or print functions, or when a variable is compared to a > string. The > sections on Types and Type Juggling will make the following clearer." > > > > I'm sorry, I've read this several times now, and I'm not sure what > this adds. My RFC simply quotes the paragraphs that explain how null > is coerced (other than the small abbreviation for booleans, those > paragraphs are included in their entirety), I don't think it needs any > more context than that, it's not like I'm mis-representing how > coercion works (and is used) in PHP. > Mentioning the documentation as a reason to be "consistent" (which comes
up again and again in your arguments with this RFC) just seems like a bogus reason to me. It is nitpicking about specific sentences in the documentation without refering to the surrounding context. It would be nicer to argue according to real technical arguments instead of arguments about sentences taken out of context in the documentation.
> You can see NULL however you like, but most developers do not share > that view. NULL has been passed to these functions, since, well, > forever; and changing code to manually convert NULL to the relevant > type is incredibly time consuming, and of questionable value (e.g. > when developers simply add strval() everywhere).
"Most developers do not share that view". I find statements such as these as a tall order - did you interview the majority of developers around the world? Are you suggesting you are talking as a representative of a large population of PHP developers? How do you think you got such a role? I would prefer some humility and not assume you are an elected leader of an underrepresented group of developers and even knowing the intention behind their code. "NULL has been passed to these functions, since, well, forever" - this also goes for wrong values being passed to functions since forever leading to security issues and bugs. That is the whole point of developing a language: Reducing the surface for bugs, improving parts of it where a lot of bugs have happened historically and not making the same mistakes as other languages (and learning from the good parts of other languages). More people writing code in a certain way does not make that code better or safer.
  117527
April 14, 2022 08:10 craig@craigfrancis.co.uk (Craig Francis)
On Wed, 13 Apr 2022 at 20:08, Andreas Leathley leathley@gmx.net> wrote:

> Mentioning the documentation as a reason to be "consistent" (which comes > up again and again in your arguments with this RFC) just seems like a > bogus reason to me. It is nitpicking about specific sentences in the > documentation without refering to the surrounding context. It would be > nicer to argue according to real technical arguments instead of > arguments about sentences taken out of context in the documentation. >
I don't, do I? I've checked multiple times, and I only use the documentation to note how NULL coercion works. I'm using the word "consistent" to be "consistent with scalar types"... as in, a string/int/float/bool can be coerced, but NULL coercion has now been deprecated in some cases, as shown with the examples.
> "Most developers do not share that view". I find statements such as > these as a tall order - did you interview the majority of developers > around the world? Are you suggesting you are talking as a representative > of a large population of PHP developers? How do you think you got such a > role?
My intro says "Roughly 85% scripts do not use strict_types=1", and "Roughly 33% of developers use static analysis" (source of numbers noted in my RFC)... while that does not guarantee anything, it's a fairly good indication that most developers do not care about types (in regards to coercion), or specifically, how you "see null as a real type". As an aside, I would like to include more stats (maybe WordPress install numbers?), but I couldn't think of any which were easy to source (ref reliability), or be that useful to the discussion. I would prefer some humility and not assume you are an elected
> leader of an underrepresented group of developers and even knowing the > intention behind their code. >
I'm simply describing the situation, and I have suggested two ways to avoid the upgrade problems (the suggestion to update some arguments to be nullable has been rejected by written feedback, and via my questionnaire). "NULL has been passed to these functions, since, well, forever" - this
> also goes for wrong values being passed to functions since forever > leading to security issues and bugs.
I say in the RFC that "Arrays, Resources, and Objects (without __toString()) cannot be coerced (for fairly obvious reasons)". For example `htmlspecialchars(array())` will, as of 8.0, throw a Type Error (and a warning before), that does represent a bug (it should not happen), and I'm very happy with that situation. But NULL coercion is different, it has worked well for years, and it is used a lot (see the examples in my RFC). Craig
  117529
April 14, 2022 09:01 a.leathley@gmx.net (Andreas Leathley)
On 14.04.22 10:10, Craig Francis wrote:

> My intro says "Roughly 85% scripts do not use strict_types=1", and "Roughly > 33% of developers use static analysis" (source of numbers noted in my > RFC)... while that does not guarantee anything, it's a fairly good > indication that most developers do not care about types (in regards to > coercion), or specifically, how you "see null as a real type". > > As an aside, I would like to include more stats (maybe WordPress install > numbers?), but I couldn't think of any which were easy to source (ref > reliability), or be that useful to the discussion. > I have never used strict_types in any code I have ever written, and I
care about types and type coercions. Yet I do not like the strict_types distinction and I am glad that I do not need to use it, and I think we are not that far away from even reconciling these two systems. I do not mind that the string "5" can be passed to an int parameter, while passing the string "hello" will throw a type exception no matter what mode you use. With some adjustments to certain coercions the advantage of strict_types would be even more neglible - as you can pass "hello" to a boolean parameter without problems currently, and I don't like that. Looking at usage numbers and assuming that supports your point seems like a big leap to me. You are assuming these developers are making a choice while it seems quite possible that a lot of code has been written without thinking about all the implications, which is where so many bugs are coming from. In the last months reviewing code I noticed quite a few developers are copy-pasting code like crazy, or using generated code from other tools - both of which are often not well reasoned about. So this has little to do with developers not caring about methods of type coercions and more about the language permitting such code, leading to many follow-up problems. Also, this "problem" only affects the internal functions, yet you want to change it for all functions. If you use a framework and pass a value to a method of that framework, it would already have created a type error with null and a non-nullable argument for quite some time. I think you are making this out to be a bigger problem than it is. Sure, in a codebase with lots of custom code where no incoming variable is checked if it even exists there will be problems, but even then it is easily fixable. In a codebase based on a common framework it is much less of a problem, as you will probably be using lots of framework components and not mainly internal functions, and there you already have the "limitation" of not being able to pass null to a non-nullable method.
  117531
April 14, 2022 12:14 craig@craigfrancis.co.uk (Craig Francis)
On Thu, 14 Apr 2022 at 10:01, Andreas Leathley leathley@gmx.net> wrote:

> I have never used strict_types in any code I have ever written, and I care > about types and type coercions. Yet I do not like the strict_types > distinction and I am glad that I do not need to use it, and I think we are > not that far away from even reconciling these two systems. I do not mind > that the string "5" can be passed to an int parameter, while passing the > string "hello" will throw a type exception no matter what mode you use. > With some adjustments to certain coercions the advantage of strict_types > would be even more neglible - as you can pass "hello" to a boolean > parameter without problems currently, and I don't like that. >
Yep, I agree with everything you have said there, and it's what George seems to be working towards (which I also agree with): https://github.com/Girgias/unify-typing-modes-rfc
> Looking at usage numbers and assuming that supports your point seems like > a big leap to me. You are assuming these developers are making a choice > while it seems quite possible that a lot of code has been written without > thinking about all the implications,
Actually, that's exactly what I see happening - NULL is passed to these internal functions without any thought, and unlike arrays/resources/etc which cannot be sensibly coerced, NULL coercion has been fine, and has worked for (as long as I can tell) forever. Back to the most simple example I can think of: $search = filter_input(INPUT_GET, 'q'); // Or any of the framework examples. echo 'Results for ' . htmlspecialchars($search); It does not matter if the request explicitly set "/?q=" in the URL or not; `filter_input()` and frameworks show there is a difference by returning NULL or an Empty String... but most of the time, no thought is given to that difference, developers just expect it to work. which is where so many bugs are coming from. In the last months reviewing
> code I noticed quite a few developers are copy-pasting code like crazy, or > using generated code from other tools - both of which are often not well > reasoned about. So this has little to do with developers not caring about > methods of type coercions and more about the language permitting such code, > leading to many follow-up problems. >
Yep, and that's why I'm in favour of stopping the broken/weird coercions... it's just that NULL is different, and it's fine (as noted, there are some developers who use it part of their strict type checking, but that seems to be fairly unusual).
> Also, this "problem" only affects the internal functions, yet you want to > change it for all functions. >
My RFC is focused on the backwards compatibility problems for internal functions... user defined functions are only included for consistency reasons (keeping the spirit of the original RFC), because they probably should coerce NULL, like how they coerce integers to strings, etc. If you use a framework and pass a value to a method of that framework, it
> would already have created a type > error with null and a non-nullable argument for quite some time. >
Yep, and that has caused... I wouldn't say problems, but minor confusion? Craig
  117555
April 20, 2022 19:02 a.leathley@gmx.net (Andreas Leathley)
On 14.04.22 14:14, Craig Francis wrote:
> Yep, I agree with everything you have said there, and it's what George > seems to be working towards (which I also agree with): > > https://github.com/Girgias/unify-typing-modes-rfc > Yet your RFC goes exactly against the quoted document by making the
differences between strict_types larger.
> I also cannot explain why NULL should be rejected, other than for those > developers who see NULL as an invalid value and also use > `strict_types=1`... as in, when a team of developers spends a few hundred > hours adding strval() everywhere, what does that actually achieve? what do > they tell the client? does it make the code easier to read? does it avoid > any bugs? or is it only for 8.1 compatibility? I don't get why you would add strval everywhere. Why are you getting
null everywhere? In most of the examples in your RFC "null" is specifically chosen as a default value and could easily be an empty string (why do something like "$_POST['value'] ?? null" if you actually want an empty string?). For the framework examples, the second argument of these functions is the default value - it does not have to be null. And "filter_input" I have not seen in a codebase yet, probably because of its weird double function as retrieving an input variable and also filtering it, with both null, false and the result as a possible return value. The few cases where I encountered the deprecation notice it was always a mistake and easy to fix, and I was glad that was pointed out to me. There is another 3.5 years until PHP 9 is likely to come out, which is a lot of time for people to adjust their codebase. I could even see an argument for not promoting it to a fatal error in 9.0 if so many people need more time. Removing the deprecation and changing fundamental aspects about the type system in PHP by auto-coercing null just so people do not need to update their codebase seems like the worst possible way forward. So far, your RFC does not mention the arguments made against your proposed changes in an understandable way. George Peter Banyard wrote some extensive arguments and you have only included one sentence without any of his arguments and try to refute it in the very next sentence as not really an argument, and it was mentioned by him that coercing null in other languages like Java has lead to problems. Comparisons to other languages could be helpful, as NULL is not just a value in PHP - NULL in MySQL for example also cannot be coerced and is its own distinct value (it even has its own syntax for comparisons). I am still bothered by the language of the RFC in general where you often write things like ".. it hasn't been the case for everyone else", "most developers are not in a position" and so on. Assuming you know what everyone is doing and how everyone is doing it in an RFC does not seem constructive. All the numbers you cite are also circumstantial and not about the supposed problem discussed in the RFC - for example, you assume people not using static analysis are in favor of your RFC, even though this is pure speculation. Compared to all the other RFCs in the last 3 years I read through I don't only disagree with this RFC, I also think it is not very well reasoned about and does not convey the discussions that were held about the topic on this mailing list. It mainly contains your opinion, which seems insufficient for an RFC.
  117557
April 21, 2022 13:41 craig@craigfrancis.co.uk (Craig Francis)
On Wed, 20 Apr 2022 at 20:02, Andreas Leathley leathley@gmx.net> wrote:

> I don't get why you would add strval everywhere. Why are you getting null > everywhere?
As to adding `strval($var)`, or `(string) $var`, or `$var ?? ""` everywhere... that's because we (or frameworks) cannot simply change the defaults for these "Common sources of NULL", nor can everyone simply add strval() around these sources, because NULL can still be useful, with possible NULL checks later (e.g. if you need to distinguish between a missing value and an empty string)... the only safe/easy way to do it is at the sinks, and there are lots of them. Unfortunately we cannot magically update all of the existing code in the world... these changes have to be done manually (note the lack of tooling), and the time taken should be justified (like how the removal of magic quotes was justified).
> In most of the examples in your RFC "null" is specifically chosen as a > default value and could easily be an empty string (why do something like > "$_POST['value'] ?? null" if you actually want an empty string?).
You could, but these are common and useful patterns/sources... and most importantly, NULL has been the default for years. For the framework examples, the second argument of these functions is the
> default value - it does not have to be null. >
Yes, but we are talking about backwards compatibility... and while most developers could always specify the default (second argument) to be an empty string, that's not always appropriate (frameworks default to NULL so developers can tell when a value has not been provided, which can be useful in some cases).
> And "filter_input" I have not seen in a codebase yet, probably because of > its weird double function as retrieving an input variable and also > filtering it, with both null, false and the result as a possible return > value. >
Sorry, but it is used, and it can return NULL. The few cases where I encountered the deprecation notice it was always a
> mistake and easy to fix, and I was glad that was pointed out to me. >
Lucky you... and as noted with Rowan, getting to look at your code again can be useful. I also note that "making each change is fairly easy", but you will have much better luck with static analysis (where variables are reliably checked if they can be nullable). But the important bit is that these problems "are difficult to find" and "there are many of them". If you want a very simple public example: https://github.com/PHPCompatibility/PHPCompatibility/commit/ab7b0b82d95c82e62f9cffb0f1960f3ccbf0805e Or, if you want to do some digging: https://twitter.com/mwop/status/1441044164880355342 There is another 3.5 years until PHP 9 is likely to come out, which is a
> lot of time for people to adjust their codebase. I could even see an > argument for not promoting it to a fatal error in 9.0 if so many people > need more time.
If it's deprecated, that is an intent to break... and if no other solutions present themselves, and the vote for this RFC fails... why would it be delayed? It will then be clear the Internals developers want this (they would have made an informed choice, and put their name to it). Removing the deprecation and changing fundamental aspects about the type
> system in PHP by auto-coercing null just so people do not need to update > their codebase seems like the worst possible way forward. >
Many times NULL coercion is fine, like how the integer 5 can be coerced to the string "5"... and it's worth remembering this is how it's always worked (when not using `strict_types=1`). So far, your RFC does not mention the arguments made against your proposed
> changes in an understandable way.
Are you going to suggest any improvements? what have I missed? I'm trying to keep it short, because I know long RFC's can be a problem. George Peter Banyard wrote some extensive arguments and you have only
> included one sentence without any of his arguments and try to refute it in > the very next sentence as not really an argument,
The "one sentence" you refer to, I assume that's "Userland scalar types [...] did not include coercion from NULL for very good reasons"? Unfortunately the email does not say what these "good reasons" are, so I could only go with the Scalar Type Declarations RFC, where I summarised the only paragraph that talks about NULL (in regards to weak type checks). I think “to be consistent with our existing type declarations” is a pretty good quote summary of that paragraph, and I think it's fair for me to add "no further details given". The comment that NULL was "never accepted for scalar type declarations" has already been discussed, and does not add anything extra. https://wiki.php.net/rfc/scalar_type_hints_v5#behaviour_of_weak_type_checks As to the 2 emails from George... I read them very carefully, several times, I have covered those details in my RFC, and I did reply... but just to be absolutely sure I haven't missed anything: https://externals.io/message/117501#117515 1) The Type Error when coercing NULL seeming to be an oversight, I replied with an explanation, and tweaked the wording of that paragraph: https://wiki.php.net/rfc/null_coercion_consistency?do=diff&rev2%5B0%5D=1649688632&rev2%5B1%5D=1649694771&difftype=sidebyside 2) Java's "nullable by default" is one way NULL is used, I don't think this adds anything new. 3) The RFC from Zend, I covered in my reply, specifically the second rule "If the value can be coerced", which NULL can be (I will note that some developers see NULL as an invalid value, but that's already mentioned in my RFC). 4) "amending functions to have nullable arguments", it was the basis of my previous RFC, I've noted in this in this RFC, and in a few of my replies. As an aside, no one seems to agree that htmlspecialchars() should explicitly accept NULL (due to the reason noted in the second paragraph), let alone the other ~334 parameters. 5) "The breaking change was justified and explained", I've covered in my RFC and reply, the justification was about consistency between internal and external functions (which I talk about in my RFC, and agree with in spirit), and the impact was not explained (ref "the lack of discussion of the impact", and the single email from Craig Duncan). 6) "the breaking change has also been announced with ample time to fix the issues", see the "Temporary Solutions" section in my RFC, on how this is being ignored, and the complete lack of tooling to make these changes. https://externals.io/message/117501#117523 1) "strict_types were a mistake", yep, that's fine, I just don't think it adds anything to my RFC. 2) "making coercive typing mode more sensible", going through the quoted RFC from Girgias, that seems to be fine, and I agree with. 3) "Userland scalar types [...] did not include coercion from NULL for very good reasons.", as noted above, I added this to the Open Issues on the 15th April. 4) "general consensus that internals and userland should behave identically", I agree with in spirit, and has been included since my very first draft - where I said we must "keep user-defined and internal functions consistent". 5) "align internal behaviour with the accepted better design choice of userland", I'm not sure I can quote this, as "better" needs to be described and explained. In short, it's not adding anything that hasn't already been noted above. I don't think the rest of the second email raises any other points that are relevant to my RFC (it's more about how PHP is developed). I am still bothered by the language of the RFC in general where you often
> write things like ".. it hasn't been the case for everyone else", "most > developers are not in a position" and so on. Assuming you know what > everyone is doing and how everyone is doing it in an RFC does not seem > constructive. All the numbers you cite are also circumstantial and not > about the supposed problem discussed in the RFC.
I am human, and I wrote it in a way that I hope is understandable. That said, I'm getting the feeling you are the one intentionally quoting "out of context" (1st quote can only apply to those using `strict_types=1`, so by definition "it hasn't been the case for everyone else"; 2nd is about the problem of how "most developers are not in a position" to use "very strict Static Analysis", which I back up with the low percentage usage of `strict_types=1` and the quoted JetBrains developer survey). Admittedly I've only got the 2 stats in my RFC intro, but they are relevant, and I did say to you on 14 Apr 2022 "I would like to include more stats (maybe WordPress install numbers?), but I couldn't think of any which were easy to source (ref reliability), or be that useful to the discussion" .... do you have anything to add here?
> - for example, you assume people not using static analysis are in favor of > your RFC, even though this is pure speculation.
What? Where? Compared to all the other RFCs in the last 3 years I read through I don't
> only disagree with this RFC, I also think it is not very well reasoned > about and does not convey the discussions that were held about the topic on > this mailing list. It mainly contains your opinion, which seems > insufficient for an RFC.
I have spent considerable amount of time replying to your points, and as noted above, I have included peoples comments in my RFC. It is also my second attempt at solving the problem due to feedback. If you can give me something/anything to contribute to the RFC, I will happily add/update it; or if you can provide any other solutions to this problem (please don't pretend it doesn't exist), then I will happily discuss or even recommend them. Craig
  117561
April 21, 2022 16:32 a.leathley@gmx.net (Andreas Leathley)
> There is another 3.5 years until PHP 9 is likely to come out, which is a >> lot of time for people to adjust their codebase. I could even see an >> argument for not promoting it to a fatal error in 9.0 if so many people >> need more time. > > > If it's deprecated, that is an intent to break... and if no other solutions > present themselves, and the vote for this RFC fails... why would it be > delayed? It will then be clear the Internals developers want this (they > would have made an informed choice, and put their name to it).
A deprecation notice is fairly harmless. If in two years many projects and many developers say that they need more time to fix these deprecations and promoting them to errors would cause a lot of problems, then it would be easy to prolong the period where it is only a deprecation with a small RFC. By then more people will know if they are impacted, many frameworks will have updated, and there will be a clearer picture if this is such a big deal or not. Right now this is not clear - I doubt most projects are using PHP 8.1, not even all frameworks/libraries are compatible to PHP 8.1.
> Are you going to suggest any improvements? what have I missed? I'm trying > to keep it short, because I know long RFC's can be a problem.
An RFC should cover the discussions held on this mailing list. From the RFC howto: "Update your RFC to document all the issues and discussions. Cover both the positive and negative arguments." Do you honestly believe you have done that? I have tried to discuss some counterpoints and alternatives to your proposal, but none are mentioned in the RFC. I also don't see the discussion points of other people in the RFC. None of the alternatives to your proposal are mentioned in the RFC - like changing the internal functions to accept null instead. There have been quite a few suggestions and arguments made so far, and I don't see them in the RFC. I have discussed RFCs with a few people on this mailing list, sometimes with very different opinions about a topic, and not always was there a resolution to the topic at hand. Yet the discussions with you have been the most frustrating so far, because you say you are open to arguments and proposals, yet you do not seem to consider them at all - yet you really seem to believe you are totally open-minded. I have been impressed by a few people on this mailing list who I disagreed with wholeheartedly, because I noticed with how much care they tried to relay disagreeing arguments and other proposals in the RFC and how balanced the RFCs were at the end. Your RFC seems totally incomplete to me and mainly based on your made-up opinion. But at this point I don't think I will get through to you in any way, which is why I will step out of this conversation. If the RFC comes to a vote in the current conditions though I will raise an objection that it does not represent the discussions held on this mailing list and should be disregarded because of that.
  117590
April 25, 2022 09:44 craig@craigfrancis.co.uk (Craig Francis)
> On 21 Apr 2022, at 17:32, Andreas Leathley leathley@gmx.net> wrote: > >>> There is another 3.5 years until PHP 9 is likely to come out, which is a lot of time for people to adjust their codebase. I could even see an argument for not promoting it to a fatal error in 9.0 if so many people need more time. >> >> >> If it's deprecated, that is an intent to break... and if no other solutions present themselves, and the vote for this RFC fails... why would it be delayed? It will then be clear the Internals developers want this (they would have made an informed choice, and put their name to it). > > A deprecation notice is fairly harmless. If in two years many projects and many developers say that they need more time to fix these deprecations and promoting them to errors would cause a lot of problems, then it would be easy to prolong the period where it is only a deprecation with a small RFC. By then more people will know if they are impacted, many frameworks will have updated, and there will be a clearer picture if this is such a big deal or not. Right now this is not clear - I doubt most projects are using PHP 8.1, not even all frameworks/libraries are compatible to PHP 8.1.
You're right, I am working with a few development teams that are simply not upgrading to 8.1 at the moment, they don't have the time to fix this issue... two are using set_error_handler() to ignore this specific deprecation notice, and one team has moved over, but they are still finding these issues ~6 months later.
>> Are you going to suggest any improvements? what have I missed? I'm trying to keep it short, because I know long RFC's can be a problem. > > An RFC should cover the discussions held on this mailing list. From the RFC howto: > > "Update your RFC to document all the issues and discussions. Cover both the positive and negative arguments." > > Do you honestly believe you have done that? I have tried to discuss some counterpoints and alternatives to your proposal, but none are mentioned in the RFC. I also don't see the discussion points of other people in the RFC. None of the alternatives to your proposal are mentioned in the RFC - like changing the internal functions to accept null instead. There have been quite a few suggestions and arguments made so far, and I don't see them in the RFC.
I'll ask again, what have I missed? as in, please, give me details. Last time you vaguely said that "George Peter Banyard wrote some extensive arguments", I spent at least an hour going though every single point (again), and this time I listed them out for you, and you ignored that, not even an apology. The only point you made in that last paragraph is "changing the internal functions to accept null", it's already there, under the heading "Rejected Features", which includes the link to the RFC I created, and the reason it was rejected. That reason is also noted in the second paragraph of the Introduction.
> I have discussed RFCs with a few people on this mailing list, sometimes with very different opinions about a topic, and not always was there a resolution to the topic at hand. Yet the discussions with you have been the most frustrating so far, because you say you are open to arguments and proposals, yet you do not seem to consider them at all - yet you really seem to believe you are totally open-minded. I have been impressed by a few people on this mailing list who I disagreed with wholeheartedly, because I noticed with how much care they tried to relay disagreeing arguments and other proposals in the RFC and how balanced the RFCs were at the end. Your RFC seems totally incomplete to me and mainly based on your made-up opinion. But at this point I don't think I will get through to you in any way, which is why I will step out of this conversation. If the RFC comes to a vote in the current conditions though I will raise an objection that it does not represent the discussions held on this mailing list and should be disregarded because of that.
I have no idea how to respond to this, I have spent a considerable amount of time carefully reading your 5 emails, responding to every single point you raised (which you simply ignore), ensuring all points are in my RFC, and this is your attitude? no apology, no thanks, just vague complaints and insults? If you have any objection that is not already noted in the RFC, please, let me know what it is... maybe "Andreas does not like the wording in this RFC, or the stats provided, but does not provide any alternative"? Craig
  117591
April 25, 2022 10:35 rowan.collins@gmail.com (Rowan Tommins)
On 25/04/2022 10:44, Craig Francis wrote:
> You're right, I am working with a few development teams that are simply not upgrading to 8.1 at the moment, they don't have the time to fix this issue... two are using set_error_handler() to ignore this specific deprecation notice, and one team has moved over, but they are still finding these issues ~6 months later.
Taking time to find and make fixes is the whole point of deprecation notices - if nothing else changes, they can expect to have another 3.5 years before a version of PHP is released where these become errors. Upgrading to PHP 8.1 as soon as possible but only tackling the deprecations when time allows is absolutely the right thing to do. Regards, -- Rowan Tommins [IMSoP]
  117592
April 25, 2022 10:45 craig@craigfrancis.co.uk (Craig Francis)
On 25 Apr 2022, at 11:35, Rowan Tommins collins@gmail.com> wrote:
> Taking time to find and make fixes is the whole point of deprecation notices - if nothing else changes, they can expect to have another 3.5 years before a version of PHP is released where these become errors. Upgrading to PHP 8.1 as soon as possible but only tackling the deprecations when time allows is absolutely the right thing to do.
That's true for things that need to be fixed, like the deprecation and removal of magic quotes, but this is a lot of work for... I can't provide an answer to that. In the mean time, I've only convinced 3 teams to upgrade to 8.1, and two of them are using `set_error_handler()` to specifically ignore this error (the log files were filled with far too many "passing null to parameter" deprecation messages, whereas adding #[ReturnTypeWillChange] wasn't too bad). Craig
  117523
April 13, 2022 14:15 george.banyard@gmail.com ("G. P. B.")
On Mon, 11 Apr 2022 at 19:22, Craig Francis <craig@craigfrancis.co.uk>
wrote:

> On Mon, 11 Apr 2022 at 16:14, G. P. B. banyard@gmail.com> wrote: > >> But the implementation caused a Type Error when coercing NULL for >>> everyone (even when not using *strict_types=1*), this seems more of an >>> over-sight >>> >> is utterly wrong and was a conscious design choice based on the widely >> accepted view that nullable by default like Java is a mistake and defeats >> the purpose. >> > > > Bit hash... anyway, I've replaced that paragraph (it wasn't clear enough), > I just wanted to note how developers not using `strict_types=1` would see > it as an over-sight that coercion works for string/int/float/bool but not > null (despite what the documentation says), and how it's introduced a type > check (that they didn't ask for). > > I assume you're using `strict_types=1`, and will be unaffected by this, so > with all due respect, I'm not considering how you view or use NULL, nor do > I care what Java does, I'm simply focused on how most developers use NULL > in PHP. >
If I indeed exclusively used strict_types I wouldn't care, but I don't because IMHO strict_types were a mistake and are harmful and induce some self inflicted problems. I've spent a large amount of time making coercive typing mode more sensible and aligning the behaviour as close to reasonably possible with strict_types so that the possibility of dropping strict_types all together wouldn't be outrageous. I've written about this elsewhere and on this list already but main points are located here: https://github.com/Girgias/unify-typing-modes-rfc Userland scalar types, however they would have been introduced, did not include coercion from NULL for *very* good reasons. Wanting to change this behaviour needs more justification than a paragraph in docs, as the docs should reflect the reality of the implementation. The second aspect is the general consensus that internals and userland should behave identically. Which is why the deprecation is here in the first place, to align internal behaviour with the accepted *better* design choice of userland. Your RFC is going against these two pillars. Moreover, the fact that we are deprecating this and not just changing the behaviour from one version to another is that we do recognize PHP developers use NULL in this way, and we are giving them advance notice of the change. Also a deprecation notice should, IMHO, never be promoted to an exception. This is maybe something we need to get better at teaching end users so that they don't pressure library maintainers (or themselves) to fix all of them when they still have a couple of years. Be that either on the php.net docs or elsewhere, to teach end users to either not do this, or at least exclude the vendor directory from their error handler. However, and I cannot stress this enough, we cannot just stop using this tool available to us to steer the language into being better. Because then the other two choices are doing hard breaks with no notice in a major version, or not breaking anything. Both things which are unreasonable to expect. There are behaviours of PHP that will probably never be "fixable", but this should not prevent us from improving the aspects we can. And if we only focused on how developers use PHP we would still have register globals, magic quotes, etc. Clearly it seems you won't listen to why these changes were made and want to reverse course, therefore good luck convincing enough people to accept such an RFC. I think I've said all there is to say and won't interact further with any such proposals. Regards, George P. Banyard
  117525
April 13, 2022 19:43 craig@craigfrancis.co.uk (Craig Francis)
https://wiki.php.net/rfc/null_coercion_consistency

On Wed, 13 Apr 2022 at 15:15, G. P. B. banyard@gmail.com> wrote:

> I've spent a large amount of time making coercive typing mode more > sensible and aligning the behaviour as close to reasonably possible with > strict_types so that the possibility of dropping strict_types all together > wouldn't be outrageous. > I've written about this elsewhere and on this list already but main points > are located here: https://github.com/Girgias/unify-typing-modes-rfc >
Thank you George, I do appreciate your work on this, I agree with pretty much everything you have written, and I really do want PHP to move in that direction. The *only* thing I disagree with is NULL coercion, because it has worked perfectly well for years, and is used so often. Userland scalar types, however they would have been introduced, did not
> include coercion from NULL for *very* good reasons. > Wanting to change this behaviour needs more justification than a paragraph > in docs, as the docs should reflect the reality of the implementation. >
Some people see NULL as an invalid value, which can sometimes be useful in a `strict_types=1` environment, but that's the only "good reason" I can find... it's why I changed my mind about making some parameters nullable, and took a different approach with this RFC. But most developers do not use NULL like that, and that's why it's passed so often to the ~335 function parameters I've noted (most of them are on the list because they realistically receive user values, which can be NULL in most frameworks). The second aspect is the general consensus that internals and userland
> should behave identically. >
Yep, they really should... it's why I said we "Must keep the spirit of the original RFC, and keep user-defined and internal functions consistent". Moreover, the fact that we are deprecating this and not just changing the
> behaviour from one version to another is that we do recognize PHP > developers use NULL in this way, and we are giving them advance notice of > the change. > Also a deprecation notice should, IMHO, never be promoted to an exception. >
Unfortunately deprecation notices are being ignored (as noted in my RFC, with examples on how); and there is the intention to use Type Error Exceptions in 9.0 (the thing I'm trying to avoid). I'm approaching this from a security point of view - the last thing I want to do in ~5 years is auditing code/systems where they have stuck with 8.x because it's too hard to upgrade. There are behaviours of PHP that will probably never be "fixable", but this
> should not prevent us from improving the aspects we can. > And if we only focused on how developers use PHP we would still have > register globals, magic quotes, etc. >
I want the language to be better as well, and I was *very* glad to see the back of Register Globals and Magic Quotes (I know they were a pain to remove, but it was well justified, as they caused so many problems). Clearly it seems you won't listen to why these changes were made and want
> to reverse course, therefore good luck convincing enough people to accept > such an RFC. >
I have been listening/reading (it's why I've taken about 2 hours to read what you have written, and reply). I have also been discussing this with different people (on and off list)... unlike the original RFC discussion, where it was only Craig Duncan that mentioned this issue: https://externals.io/message/112327#112531 And it's because I've been listening to people that I changed my mind, and created this new RFC (as noted above). I think I've said all there is to say and won't interact further with any
> such proposals. >
That's a shame, as I'd like to discuss solutions to this problem you would be happy with. Craig
  117526
April 14, 2022 07:31 landers.robert@gmail.com (Robert Landers)
On Wed, Apr 13, 2022 at 9:45 PM Craig Francis <craig@craigfrancis.co.uk> wrote:
> > https://wiki.php.net/rfc/null_coercion_consistency > > On Wed, 13 Apr 2022 at 15:15, G. P. B. banyard@gmail.com> wrote: > > > I've spent a large amount of time making coercive typing mode more > > sensible and aligning the behaviour as close to reasonably possible with > > strict_types so that the possibility of dropping strict_types all together > > wouldn't be outrageous. > > I've written about this elsewhere and on this list already but main points > > are located here: https://github.com/Girgias/unify-typing-modes-rfc > > > > > Thank you George, I do appreciate your work on this, I agree with pretty > much everything you have written, and I really do want PHP to move in that > direction. > > The *only* thing I disagree with is NULL coercion, because it has worked > perfectly well for years, and is used so often. > > > > Userland scalar types, however they would have been introduced, did not > > include coercion from NULL for *very* good reasons. > > Wanting to change this behaviour needs more justification than a paragraph > > in docs, as the docs should reflect the reality of the implementation. > > > > > Some people see NULL as an invalid value, which can sometimes be useful in > a `strict_types=1` environment, but that's the only "good reason" I can > find... it's why I changed my mind about making some parameters nullable, > and took a different approach with this RFC. > > But most developers do not use NULL like that, and that's why it's passed > so often to the ~335 function parameters I've noted (most of them are on > the list because they realistically receive user values, which can be NULL > in most frameworks). > > > > The second aspect is the general consensus that internals and userland > > should behave identically. > > > > > Yep, they really should... it's why I said we "Must keep the spirit of the > original RFC, and keep user-defined and internal functions consistent". > > > > Moreover, the fact that we are deprecating this and not just changing the > > behaviour from one version to another is that we do recognize PHP > > developers use NULL in this way, and we are giving them advance notice of > > the change. > > Also a deprecation notice should, IMHO, never be promoted to an exception. > > > > > Unfortunately deprecation notices are being ignored (as noted in my RFC, > with examples on how); and there is the intention to use Type Error > Exceptions in 9.0 (the thing I'm trying to avoid). > > I'm approaching this from a security point of view - the last thing I want > to do in ~5 years is auditing code/systems where they have stuck with 8.x > because it's too hard to upgrade. > > > > There are behaviours of PHP that will probably never be "fixable", but this > > should not prevent us from improving the aspects we can. > > And if we only focused on how developers use PHP we would still have > > register globals, magic quotes, etc. > > > > > I want the language to be better as well, and I was *very* glad to see the > back of Register Globals and Magic Quotes (I know they were a pain to > remove, but it was well justified, as they caused so many problems). > > > > Clearly it seems you won't listen to why these changes were made and want > > to reverse course, therefore good luck convincing enough people to accept > > such an RFC. > > > > > I have been listening/reading (it's why I've taken about 2 hours to read > what you have written, and reply). I have also been discussing this with > different people (on and off list)... unlike the original RFC discussion, > where it was only Craig Duncan that mentioned this issue: > > https://externals.io/message/112327#112531 > > And it's because I've been listening to people that I changed my mind, and > created this new RFC (as noted above). > > > > I think I've said all there is to say and won't interact further with any > > such proposals. > > > > > That's a shame, as I'd like to discuss solutions to this problem you would > be happy with. > > Craig
I don't have a vote, so my opinion only goes so far. But here's my 2¢.
> I see null as a real type
This confuses me. I see "false" and "true" becoming their own types, so what is a boolean? The language currently has no way to express aliases of types, otherwise I'd say a boolean is an alias "true|false" but it doesn't. What happens if a function returns the "false" type and I pass it to a function that only allows "bool"? I expect it to work. So naturally, there is some coercion in there as well, and it would be great if null got the same treatment where old codebases benefit without taking away the advantage for newer codebases. I see this RFC as a balance in that regard, because real life is messy. I really hope it passes and good luck! Robert Landers Software Engineer Utrecht NL
  117528
April 14, 2022 08:27 craig@craigfrancis.co.uk (Craig Francis)
On Thu, 14 Apr 2022 at 08:31, Robert Landers robert@gmail.com>
wrote:

> > I see null as a real type > > This confuses me...
Andreas is probably the best person to explain their view; oddly I see NULL as it's own type as well, because are there are times where it's useful to determine the difference between NULL and an Empty String; but when looking at examples like user values (e.g. GET/POST/COOKIE) being checked (e.g. `strlen()` or `preg_match()`), encoded (e.g. shown on a web page with `htmlspecialchars()`), or simply modified (e.g. `trim()`), I don't see why those possible NULL values should lead to a Fatal Type Error (unless you are talking about an environment that uses NULL as an invalid value, but I suspect that's quite rare). I see this RFC as a balance in that regard, because real life is
> messy. I really hope it passes and good luck! >
Thank you Robert, it's right the balance I'm trying to find. Craig
  117537
April 16, 2022 11:17 rowan.collins@gmail.com (Rowan Tommins)
On 8 April 2022 18:34:52 BST, Craig Francis <craig@craigfrancis.co.uk> wrote:
>I've written a new draft RFC to address the NULL coercion problems: > >https://wiki.php.net/rfc/null_coercion_consistency
I'm sympathetic to the general problem you're trying to solve, but I'm not convinced by the argument that this is about consistency, because user defined functions have been consistently rejecting null for non-nullable parameters since PHP 7.0, and even before that for arrays and objects. Consistency is a good argument for small changes that eliminate unusual edge cases, but I think far-reaching changes should be able to stand on their own merits - regardless of whether it's consistent, do we want the language to work this way? To me, the main defence of type coercion is that PHP operates in a world full of strings - URLs are strings, HTTP requests are strings, and a lot of API responses are strings - so making it easy to work with strings like '42' as numbers makes a lot of sense. It's not clear to me that this extends to null. I think a large part of the problem here is that null can mean many different things in different contexts - "unknown", "not provided", "invalid input", "default", "not applicable", etc. These differences are subtle, but lead to different expectations of behaviour: - Treat null as a specific case with its own meaning, distinct from any other valid value. This is what explicitly nullable parameters and union types allow. - Treat null the same as any other out of range value, and raise an error. This is what happens in user-defined functions in PHP, and in built-in functions expecting non-scalar arguments. Compare also out of range actions like division by zero. - Treat null as a special state that propagates through expressions, because any operation with an unknown input has an unknown output. This is the approach to null taken by SQL, and by IEEE floating point with NaN. It is also the basis of the ?-> operator, and of things like Optional.map in Swift. - Treat null as a generic default value which can be filled in implicitly based on requirements. This is the interpretation currently taken by internal functions for scalar arguments, and what you are proposing to make standard. It is also, as you point out, the way PHP treats null in some other contexts, such as many operators. Interestingly, one of your examples mentions filter_input, which takes the "propagate" approach, and htmlspecialchars, which doesn't. It would often be more useful to retain the information that a value is null than to have it silently converted to an empty string as a side-effect of some other operation. Perhaps it would be useful to have some function-call equivalent of the ?-> operator. I'm not sure what this would look like for normal function calls, but it would be easy to add if we had a pipe operator, e.g.: If this was equivalent to htmlspecialchars($foo) $foo |> htmlspecialchars(...) Then this could be equivalent to ($foo === null ? null : htmlspecialchars($foo)) $foo ?|> htmlspecialchars(...) I'm not set against this RFC, but I'm not quite convinced by the case it makes, and think there may still be other options to explore. Regards, -- Rowan Tommins [IMSoP]
  117542
April 19, 2022 11:34 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 16 Apr 2022 at 12:17, Rowan Tommins collins@gmail.com> wrote:

> On 8 April 2022 18:34:52 BST, Craig Francis <craig@craigfrancis.co.uk> > wrote: > >I've written a new draft RFC to address the NULL coercion problems: > > > >https://wiki.php.net/rfc/null_coercion_consistency > > > I'm sympathetic to the general problem you're trying to solve, but I'm not > convinced by the argument that this is about consistency, because user > defined functions have been consistently rejecting null for non-nullable > parameters since PHP 7.0, and even before that for arrays and objects.
Thanks for looking at this Rowan. I am open to discussing different solutions (this is my second attempt), and any suggestions are welcome. In regards to consistency, while I think my RFC does that, it's not my main motivation for creating this RFC - which is to address the backwards compatibility issues. But to expand on consistency, I appreciate user defined functions (since 7.0) didn't coerce NULL, and I do want internal/user defined functions to be consistent with each other (why I agree with the spirit of the original RFC), it's just odd that string/int/float/bool can be coerced, but NULL cannot be coerced in this context and can be in other contexts, e.g. ``` $i = 0; $n = NULL; if ($i == $n) { // Fine } print($i); // Fine print($n); // Fine printf('%s', $i); // Fine printf('%s', $n); // Fine echo 'ConCat ' . $i . $n; // Fine echo htmlspecialchars($i); // Fine echo htmlspecialchars($n . ''); // Fine echo htmlspecialchars($n); // Bad??? ``` Consistency is a good argument for small changes that eliminate unusual
> edge cases, but I think far-reaching changes should be able to stand on > their own merits - regardless of whether it's consistent, do we want the > language to work this way? >
Good question, while I mention consistency, my focus is on backwards compatibility, and how PHP has accepted NULL to these ~335 parameters since, well, forever... my fear is that the upgrade to PHP 9 will be so difficult (as noted with the lack of tooling to solve this), I'll be auditing PHP code that's stuck on 8.x for many years to come. https://twitter.com/mwop/status/1441044164880355342 To me, the main defence of type coercion is that PHP operates in a world
> full of strings - URLs are strings, HTTP requests are strings, and a lot of > API responses are strings - so making it easy to work with strings like > '42' as numbers makes a lot of sense. It's not clear to me that this > extends to null. >
Yep, PHP does work in a world of strings, and for simplicity string coercion works fairly well... but I think I can include NULL in that definition, because HTTP requests do not guarantee values have been provided, where PHP (via filter_input) and frameworks (Laravel, Symfony, CakePHP, CodeIgniter, etc), have used NULL to note when a value has not been provided... and many developers have simply not cared about that distinction (e.g. when passing to htmlspecialchars, trim, strlen, preg_match, etc). I think a large part of the problem here is that null can mean many
> different things in different contexts - "unknown", "not provided", > "invalid input", "default", "not applicable", etc. > > These differences are subtle, but lead to different expectations of > behaviour: > > - Treat null as a specific case with its own meaning, distinct from any > other valid value. This is what explicitly nullable parameters and union > types allow. > - Treat null the same as any other out of range value, and raise an error. > This is what happens in user-defined functions in PHP, and in built-in > functions expecting non-scalar arguments. Compare also out of range actions > like division by zero. > - Treat null as a special state that propagates through expressions, > because any operation with an unknown input has an unknown output. This is > the approach to null taken by SQL, and by IEEE floating point with NaN. It > is also the basis of the ?-> operator, and of things like Optional.map in > Swift. > - Treat null as a generic default value which can be filled in implicitly > based on requirements. This is the interpretation currently taken by > internal functions for scalar arguments, and what you are proposing to make > standard. It is also, as you point out, the way PHP treats null in some > other contexts, such as many operators. >
Thank you, that's a really good explanation. The developers I work with would assume the last definition, in the same way they can pass in an integer to `urlencode()`, where they just don't think about it being implicitly coerced into a string, it just works (TM)... that's not to say the `strict_types=1` style of checking, and NULL being treated as an invalid value, doesn't provide value for some developers (but as shown in the stats, they are in the minority). Interestingly, one of your examples mentions filter_input, which takes the
> "propagate" approach, and htmlspecialchars, which doesn't. It would often > be more useful to retain the information that a value is null than to have > it silently converted to an empty string as a side-effect of some other > operation.
Those are interesting points. I think `filter_input()` is a bit different, as it's a source of data (not exactly propagating); whereas all of the other functions, they either process that data (and return a sting, and do not propagate) or work with it (e.g. preg_match and strcmp returning a bool, strlen returning an integer, etc). Retaining NULL is interesting, but it would be a new thing... Perhaps it would be useful to have some function-call equivalent of the ?->
> operator. I'm not sure what this would look like for normal function calls, > but it would be easy to add if we had a pipe operator, e.g.: > > If this was equivalent to htmlspecialchars($foo) > $foo |> htmlspecialchars(...) > > Then this could be equivalent to ($foo === null ? null : > htmlspecialchars($foo)) > $foo ?|> htmlspecialchars(...) >
.... adding that syntax might be useful in some contexts (when the developer wants to make NULL easier to propagate); but that would be a new feature, and I don't think it solves the backwards compatibility issue we currently face, and what I'm trying to address in this RFC. I'm not set against this RFC, but I'm not quite convinced by the case it
> makes, and think there may still be other options to explore. >
I'm open to any suggestions, I just want to avoid the situation where Fatal Type Errors happen seemingly randomly (as in, someone who has never used type checks before, and they use NULL without any thought about it). I'm even open to the idea of someone coming up with a tool to auto-update existing code, but so far I've only heard people say that is the way it should be done (with absolutely no detail on how it would even begin to work). I also cannot explain why NULL should be rejected, other than for those developers who see NULL as an invalid value and also use `strict_types=1`... as in, when a team of developers spends a few hundred hours adding strval() everywhere, what does that actually achieve? what do they tell the client? does it make the code easier to read? does it avoid any bugs? or is it only for 8.1 compatibility? Anyway, thanks again Rowan, I really appreciate your thoughts on this. Craig
  117543
April 19, 2022 13:17 rowan.collins@gmail.com (Rowan Tommins)
On 19/04/2022 12:34, Craig Francis wrote:
> The developers I work with would assume the last definition
I think you've somewhat missed my point. I wasn't talking about people's habits or preferences, I was talking about different *scenarios* where null is used to mean different things. Yes, some people prefer languages that "fails early" and some are more interested in "do what I mean", but not everything is about that.
> I also cannot explain why NULL should be rejected ... does it avoid > any bugs?
Yes, sometimes. Imagine an array of values provided by the user; during validation, those which were optional and not provided get set to null. You then loop through and display those which were provided: foreach ( $fields as $name => $value ) {     if ( $value !== null ) {          echo "$name: $value
\n";     } } Then, you realise you forgot about escaping, and decide to run everything through htmlspecialchars(): $htmlFields = array_map('htmlspecialchars', $fields); foreach ( $htmlFields as $name => $value ) {     if ( $value !== null ) {          echo "$name: $value
\n";     } } Spot the bug? $value will now never be null, because htmlspecialchars() will silently turn the nulls into empty strings. I've also seen the opposite problem: a string function was removed because it was no longer needed, and the code broke because values, including nulls, were no longer being cast to string. Is protecting against this worth the backwards compatibility cost of changing the behaviour, and requiring extra code in other scenarios? Possibly not. But that's different from not having any benefit. Regards, -- Rowan Tommins [IMSoP]
  117551
April 20, 2022 17:02 craig@craigfrancis.co.uk (Craig Francis)
On Tue, 19 Apr 2022 at 14:17, Rowan Tommins collins@gmail.com> wrote:

> On 19/04/2022 12:34, Craig Francis wrote: > > The developers I work with would assume the last definition > > > I think you've somewhat missed my point. I wasn't talking about people's > habits or preferences, I was talking about different *scenarios* where > null is used to mean different things. >
Yep, and I'm thankful you listed them... I'm just trying to focus on how PHP has worked (implicitly based on requirements), and how changing that causes problems (and will be an even bigger problem when it becomes a Fatal Type Error). Yes, some people prefer languages that "fails early" and some are more
> interested in "do what I mean", but not everything is about that. >
Agreed, the only thing I'd add... failing early with NULL might help debug some problems (assuming there is a problem), but I believe static analysis and unit tests are much better at doing this (e.g. static analysis is able to see if a variable can be NULL, and apply those stricter checks, as noted by George with Girgias RFC). In contrast, failing early at runtime, on something that is not actually a problem, like the examples in my RFC, creates 2 problems (primarily upgrading; but also adding unnecessary code to explicitly cast those possible NULL values to the correct type, which isn't needed for the other scalar types).
> > I also cannot explain why NULL should be rejected ... does it avoid > > any bugs? > > > Yes, sometimes. Imagine an array of values provided by the user; during > validation, those which were optional and not provided get set to null. > You then loop through and display those which were provided: > > foreach ( $fields as $name => $value ) { > if ( $value !== null ) { > echo "$name: $value
\n"; > } > } > > Then, you realise you forgot about escaping, and decide to run > everything through htmlspecialchars(): > > $htmlFields = array_map('htmlspecialchars', $fields); > foreach ( $htmlFields as $name => $value ) { > if ( $value !== null ) { > echo "$name: $value
\n"; > } > } > > Spot the bug? $value will now never be null, because htmlspecialchars() > will silently turn the nulls into empty strings. >
Thank you... but I will add, while `htmlspecialchars()` rejecting NULL would get you to look at the code again, I wouldn't say it's directly picking up the problem, and it relies on there being a NULL value in $fields for this to be noticed (if that doesen't happen, you're now in a position where random errors will start happening in production). This is where I'd note the value of unit tests, as they are much better placed to check this feature. Bit of a tangent, I'm uncomfortable that `$name` is not being HTML encoded, which takes us to context aware templating engines, and how you can identify these mistakes via the `is_literal` RFC or the `literal-string` type. I've also seen the opposite problem: a string function was removed
> because it was no longer needed, and the code broke because values, > including nulls, were no longer being cast to string. > > > Is protecting against this worth the backwards compatibility cost of > changing the behaviour, and requiring extra code in other scenarios? > Possibly not. But that's different from not having any benefit.
That's fair, adding in an extra check might give some benefit in some cases, but I don't think it's reliable, or worth the cost in updating existing code (with no tooling to help) and the additional code that will be needed to cast these possible NULL values to the relevant type (which isn't needed for the other scalar types). That said, if you (or anyone) have any better ideas on how to address this problem, please let me know. Craig
  117552
April 20, 2022 17:26 cmbecker69@gmx.de ("Christoph M. Becker")
On 20.04.2022 at 19:02, Craig Francis wrote:

> In contrast, failing early at runtime, on something that is not actually a > problem, like the examples in my RFC, creates 2 problems (primarily > upgrading; but also adding unnecessary code to explicitly cast those > possible NULL values to the correct type, which isn't needed for the other > scalar types).
Null is *not* a scalar type[1], though. This is the reason why it is not coerced for userland functions using *scalar* type hints with coercive typing. [1] <https://www.php.net/is_scalar> -- Christoph M. Becker
  117553
April 20, 2022 17:49 craig@craigfrancis.co.uk (Craig Francis)
On Wed, 20 Apr 2022 at 18:26, Christoph M. Becker <cmbecker69@gmx.de> wrote:

> Null is *not* a scalar type[1], though. This is the reason why it is > not coerced for userland functions using *scalar* type hints with > coercive typing. > > [1] <https://www.php.net/is_scalar> >
But why not? and how many developers actually care how PHP has defined a scalar type and how that relates to coercion specifically (and only) with functions? Bit of a tangent, I could argue that NULL is closer to a simple string/int/float/bool, than a complex array/resource/object (well, an object without a __toString method)... or in other words, NULL is a simple value that is coerced perfectly well in other contexts, and has been coerced for internal functions since, well, forever. Anyway, the problem I'm trying to solve is the backwards compatibility issue this is causing, and how it will affect upgrading when it becomes a Fatal Type Error... if you have any suggestions on how to solve this, please let me know, for now this is the best solution I've got. Craig
  117559
April 21, 2022 14:09 rowan.collins@gmail.com (Rowan Tommins)
On Wed, 20 Apr 2022 at 18:02, Craig Francis <craig@craigfrancis.co.uk>
wrote:
> I'm just trying to focus on how PHP has worked
You keep repeating this mantra, but user-defined functions with declared parameter types have never accepted nulls, in any mode, unless explicitly marked with "?" or "= null". The fact that internal functions have parameter parsing behaviour that is almost impossible to implement in userland, and often not even consistent between functions, is a wart of engine internals, not a design decision. All of that, and the "consistency" in the title of your RFC, is a complete distraction from the real questions: 1) given a null input, and a non-nullable parameter, what should the run-time do? 2) what is the best way to help users update their code to be compatible with that new behaviour?
> Agreed, the only thing I'd add... failing early with NULL might help debug some problems (assuming there is a problem), but I believe static
analysis and unit tests are much better at doing this (e.g. static analysis is able to see if a variable can be NULL, and apply those stricter checks, as noted by George with Girgias RFC). I think we should institute a "swear jar" - whenever someone says "static analysis could do this better", they have to donate €1 to the PHP Foundation. :P More seriously, 1) your own RFC claims that fewer than 33% of developers use static analysis; and 2) if PHP had a compile step with mandatory static analysis, we would still have to decide whether passing NULL to these functions should be rejected as an error by the static analyser.
> In contrast, failing early at runtime, on something that is not actually a > problem, like the examples in my RFC, creates 2 problems (primarily > upgrading; but also adding unnecessary code to explicitly cast those > possible NULL values to the correct type, which isn't needed for the other > scalar types). >
I've been trying not to nit-pick, because I think over-all you do have a valid point, but several of your examples do not need any extra code to handle nulls; and those that do need maybe a handful of characters. For instance, $search = ($_GET['q'] ?? ''); is both shorter and clearer than $search = ($_GET['q'] ?? NULL);
> Thank you... but I will add, while `htmlspecialchars()` rejecting NULL > would get you to look at the code again, I wouldn't say it's directly > picking up the problem, and it relies on there being a NULL value > in $fields for this to be noticed (if that doesen't happen, you're now in a > position where random errors will start happening in production). >
There are always going to be errors that depend on the value of input, and unless you have god-like skill at writing tests, some of those will only happen in production. Saying "this shouldn't happen" doesn't answer the question of what to do when it does.
> Bit of a tangent, I'm uncomfortable that `$name` is not being HTML > encoded, which takes us to context aware templating engines, and how you > can identify these mistakes via the `is_literal` RFC or the > `literal-string` type. >
This isn't real code, it's an illustrative example; but if it makes you feel more comfortable, imagine $name has some deliberate HTML markup in it, so needs to be echo'd raw.
> That said, if you (or anyone) have any better ideas on how to address this > problem, please let me know. >
I honestly would be more interested in having some string functions return null for null input than changing existing behaviour to accept null for non-nullable parameters (interestingly, until PHP 8.0, htmlspecialchars() could return null, e.g. if given an array). Unfortunately, that would be a different kind of compatibility break, so I'm not sure it fully solves the problem. Regards, -- Rowan Tommins [IMSoP]
  117560
April 21, 2022 14:40 cschneid@cschneid.com (Christian Schneider)
Am 21.04.2022 um 16:09 schrieb Rowan Tommins collins@gmail.com>:
> All of that, and the "consistency" in the title of your RFC, is a complete > distraction from the real questions: > > 1) given a null input, and a non-nullable parameter, what should the > run-time do? > 2) what is the best way to help users update their code to be compatible > with that new behaviour?
You are leaving out option 3 (which is not part of the RFC but should still be on the table IMHO): 3) Leave the behavior but change the parameter definition to nullable to match the implementation.
> For instance, $search = ($_GET['q'] ?? ''); is both shorter and clearer than > $search = ($_GET['q'] ?? NULL);
Your version is lossy as you cannot distinguish between "empty query" and "no query was submitted" any longer. While this is normally easily fixed checking other flags it needs additional work and cannot be done with a simple search and replace. Regards, - Chris
  117562
April 21, 2022 16:40 rowan.collins@gmail.com (Rowan Tommins)
On 21 April 2022 15:40:42 BST, Christian Schneider <cschneid@cschneid.com> wrote:
>You are leaving out option 3 (which is not part of the RFC but should still be on the table IMHO): >3) Leave the behavior but change the parameter definition to nullable to match the implementation.
Those weren't options, they were questions, or problem statements. Also, as he's repeatedly pointed out, Craig suggested making the parameters explicitly nullable, and received clear feedback that that wouldn't be popular.
>> For instance, $search = ($_GET['q'] ?? ''); is both shorter and clearer than >> $search = ($_GET['q'] ?? NULL); > >Your version is lossy as you cannot distinguish between "empty query" and "no query was submitted" any longer.
The example passes the variable to htmlspecialchars, so this is exactly what will happen anyway - it returns an empty string for a null input. If you want to distinguish between null and empty string, you *must* avoid passing it to such a function, so the proposed error is a good thing. The whole argument in favour of coercing rests on code that *doesn't* need to distinguish between null and empty string. Note that as well as being shorter, this is more appropriate than using strval() or (string), which will accept other types as well as null. Regards, -- Rowan Tommins [IMSoP]
  117589
April 25, 2022 09:33 craig@craigfrancis.co.uk (Craig Francis)
On 21 Apr 2022, at 15:09, Rowan Tommins collins@gmail.com> wrote:
> On Wed, 20 Apr 2022 at 18:02, Craig Francis <craig@craigfrancis.co.uk> wrote: >> I'm just trying to focus on how PHP has worked > > You keep repeating this mantra, but user-defined functions with declared parameter types have never accepted nulls, in any mode, unless explicitly marked with "?" or "= null".
Thanks Rowan, and you're right that user defined functions haven't worked like that, but that's not really what I'm focused on - which is how PHP has always worked in regards to internal functions (the bit that has changed, and is causing problems), and how NULL coercion works in other contexts. I mention user defined functions because I still want user and internal functions to work in the same way (one form of consistency).
> The fact that internal functions have parameter parsing behaviour that is almost impossible to implement in userland, and often not even consistent between functions, is a wart of engine internals, not a design decision.
Bit of a tangent, but do you have some examples? would be nice to clean some of these up, or at least have them in mind as we discuss this RFC.
> All of that, and the "consistency" in the title of your RFC, is a complete distraction from the real questions: > > 1) given a null input, and a non-nullable parameter, what should the run-time do? > 2) what is the best way to help users update their code to be compatible with that new behaviour?
I'm happy to change the title, but consistency does apply in different ways... as in, ensuring user and internal function parameters work in the same way (even if that way might need to change slightly); how other simple values like string/int/float/bool values can still be coerced for parameters, but NULL won't be; and how NULL coercion does work in other contexts (string contact, `==` comparison, sprintf, print, array keys). To answer your questions, 1. I think NULL inputs should be coerced for non-nullable parameters, when not using `strict_types=1`, and where static analysis tools provide the additional/stricter checks (like how they can reject a string '5' for an integer parameter). With nullable parameters being different by accepting NULL without coercion. 2. With this approach, developers won't need to update their code... in contrast, the approach of NULL causing a Type Error for internal functions, developers will need to change their code, and I cannot find any tools that help with this, except very strict static analysis to find but not update them (it's difficult tracing every variable from source to sink).
>> Agreed, the only thing I'd add... failing early with NULL might help debug some problems (assuming there is a problem), but I believe static analysis and unit tests are much better at doing this (e.g. static analysis is able to see if a variable can be NULL, and apply those stricter checks, as noted by George with Girgias RFC). > > > I think we should institute a "swear jar" - whenever someone says "static analysis could do this better", they have to donate €1 to the PHP Foundation. :P
Cool, I've got 23 credits left this month :-) So I'll spend 1 more... I think it's fair to say that developers using `strict_types=1` are more likely to be using static analysis; and if `strict_types=1` is going to eventually disappear, those developers won't lose any functionality with the stricter checking being done by static analysis, which can check all possible variable types (more reliable than runtime), and (with the appropriate level of strictness) static analysis can do things like rejecting the string '5' being passed to an integer parameter and null being passed to a non-nullable parameter.
> More seriously, 1) your own RFC claims that fewer than 33% of developers use static analysis; and 2) if PHP had a compile step with mandatory static analysis, we would still have to decide whether passing NULL to these functions should be rejected as an error by the static analyser.
Not sure how to respond to that... do you think PHP will have a compile step, with mandatory static analysis? Considering how static analysis tools today have several levels, and the ability to create a baseline (to ignore problems with old code), are you suggesting that everyone would have the strictest checks enabled, where no coercion happens at all?
>> In contrast, failing early at runtime, on something that is not actually a problem, like the examples in my RFC, creates 2 problems (primarily upgrading; but also adding unnecessary code to explicitly cast those possible NULL values to the correct type, which isn't needed for the other scalar types). > > > I've been trying not to nit-pick, because I think over-all you do have a valid point, but several of your examples do not need any extra code to handle nulls; and those that do need maybe a handful of characters. For instance, $search = ($_GET['q'] ?? ''); is both shorter and clearer than $search = ($_GET['q'] ?? NULL);
Thank you; and you're right, if you write new code today, you could do that, but that assumes you don't need to tell the difference between an empty value vs a missing value (I find this check is rare, but they do happen, and is why most frameworks use NULL for that distinction, WordPress being the only exception I've found so far). But, updating existing code, while that would make automated updates easier, it's likely to cause problems, because you're editing the value source, with no idea about checks later on (like your example which looks for NULL)... and that's why an automated update of existing code would have more luck updating the sinks rather than the sources (e.g. it knows which sinks are expecting a string, so it can add in a `strval($var)`, or `(string) $var`, or `$var ?? ""`).
>> Thank you... but I will add, while `htmlspecialchars()` rejecting NULL would get you to look at the code again, I wouldn't say it's directly picking up the problem, and it relies on there being a NULL value in $fields for this to be noticed (if that doesen't happen, you're now in a position where random errors will start happening in production). > > > There are always going to be errors that depend on the value of input, and unless you have god-like skill at writing tests, some of those will only happen in production. Saying "this shouldn't happen" doesn't answer the question of what to do when it does.
Agreed, it's just that in the example given, runtime checks seemed particularly fragile (as in, the NULL check is a feature to affect the output, and while I don't expect all code to be tested, I would expect explicit features to be, which would ensure a NULL value is provided and checked, instead of relying on the developer remembering to test a NULL value).
>> Bit of a tangent, I'm uncomfortable that `$name` is not being HTML encoded, which takes us to context aware templating engines, and how you can identify these mistakes via the `is_literal` RFC or the `literal-string` type. > > > This isn't real code, it's an illustrative example; but if it makes you feel more comfortable, imagine $name has some deliberate HTML markup in it, so needs to be echo'd raw.
(/me shudders, with memories of a project that did that, and trying to remember which variables contained HTML).
>> That said, if you (or anyone) have any better ideas on how to address this problem, please let me know. > > I honestly would be more interested in having some string functions return null for null input than changing existing behaviour to accept null for non-nullable parameters (interestingly, until PHP 8.0, htmlspecialchars() could return null, e.g. if given an array). Unfortunately, that would be a different kind of compatibility break, so I'm not sure it fully solves the problem.
I've talked to a few developers who see NULL being passed to htmlspecialchars() as something they want to be warned about, because they see NULL as an invalid value (with my RFC, they would still keep that check with `strict_types=1` and/or static analysis). Also, have a look at the quiz results: https://quiz.craigfrancis.co.uk/ It kinda confirms those discussions, while I don't think many people carefully went though the list of parameters (a bit more on the quantitative rather than qualitative side), if you focus on the people who selected option 3 first (preferring a Fatal Error for NULL), it's only Kamil and Tim who were ok with htmlspecialchars() accepting NULL, but they were not open to many of the other 334 parameters. That said, your suggestion of "?|>" could work in addition to my RFC, so the NULL value can be kept when explicitly asked to preserve (avoiding that backwards compatibility issue). Craig
  117605
April 25, 2022 21:07 rowan.collins@gmail.com (Rowan Tommins)
On 25/04/2022 10:33, Craig Francis wrote:

>> The fact that internal functions have parameter parsing behaviour that is almost impossible to implement in userland, and often not even consistent between functions, is a wart of engine internals, not a design decision. > Bit of a tangent, but do you have some examples? would be nice to clean some of these up, or at least have them in mind as we discuss this RFC.
Fundamentally, the internal parameter handling system (ZPP) is completely separate from the way function signatures work in userland, and evolved based on a different set of requirements. The emphasis of ZPP is on unwrapping zval structs to values that can be manipulated directly in C; so, for instance, it has always had support for integer parameters. Since 7.0, userland signatures have evolved an essentially parallel set of features with an emphasis on designing a consistent and useful dynamic typing system. Increasingly, ZPP is being aligned with the userland language, which also allows reflection information to be generated based on PHP stubs. For instance: * Making rejected parameters throw TypeError rather than raise a Warning and return null * Giving optional parameters an explicit default in the signature rather than inspecting the argument count * Using union types, rather than ad hoc if/switch on zval type The currently proposed change to how internal functions handle nulls in 9.0 is just another part of that process - the userland behaviour is well-established, and we're making the ZPP behaviour match. Off the top of my head, I don't know what other inconsistencies remain, but my point was that in every case so far, internal functions have been adapted to match userland, not vice versa.
> So I'll spend 1 more... I think it's fair to say that developers using `strict_types=1` are more likely to be using static analysis; and if `strict_types=1` is going to eventually disappear, those developers won't lose any functionality with the stricter checking being done by static analysis, which can check all possible variable types (more reliable than runtime), and (with the appropriate level of strictness) static analysis can do things like rejecting the string '5' being passed to an integer parameter and null being passed to a non-nullable parameter.
There's an unhelpful implication here, and in your discussion of testing, that PHP users can be divided into two camps: those who check program correctness with static analysis tools, unit tests, etc; and those who don't care about program correctness. Instead, how about we think about those who are writing new code and want PHP to tell them early when they do something silly; and those who are maintaining large code bases and have to deal with compatibility problems. Neither of these groups is helped enough by static analysers - as you've rightly pointed out elsewhere, static checks are *not* reliable in a dynamic language, and are not likely to be built-in any time soon. I'm by no means the strongest advocate of strictness in PHP - I think there is a risk of throwing out good features with the bad. But I would love to see strict_types=1 become unnecessary - not because "everyone's running static analysers anyway, so who cares" but because the default behaviour provides a good balance of safety and usability. That makes me very hesitant to use the strict_types modes as a crutch for this compatibility break - it only puts off the question of what we think the sensible behaviour actually is.
> Thank you; and you're right, if you write new code today, you could do that, but that assumes you don't need to tell the difference between an empty value vs a missing value
As I've said multiple times now, as soon as you pass it to a function that doesn't have specific handling for nulls, you lose that distinction anyway. There is literally zero difference in behaviour between "$foo = htmlspecialchars($_GET['foo'] ?? null)" and "$foo = htmlspecialchars($_GET['foo'] ?? '')". Telling users when they've passed null to a non-nullable parameter is precisely about *preserving* that distinction: if you want null to mean something specific, treating it as a string is a bug.
> But, updating existing code, while that would make automated updates easier, it's likely to cause problems, because you're editing the value source, with no idea about checks later on (like your example which looks for NULL)... and that's why an automated update of existing code would have more luck updating the sinks rather than the sources (e.g. it knows which sinks are expecting a string, so it can add in a `strval($var)`, or `(string) $var`, or `$var ?? ""`).
That's a fair point, although "sinks" are often themselves the next "source", which is what makes static analysis possible as often as it is. Despite all of the above, I am honestly torn on this issue. It is a disruptive change, and I'm not a fan of errors for errors' sake; but I can see the value in the decision made back in 7.0 to exclude nulls by default. Regards, -- Rowan Tommins [IMSoP]
  117606
April 25, 2022 22:17 larry@garfieldtech.com ("Larry Garfield")
On Mon, Apr 25, 2022, at 4:07 PM, Rowan Tommins wrote:

> Off the top of my head, I don't know what other inconsistencies remain, > but my point was that in every case so far, internal functions have been > adapted to match userland, not vice versa.
Internal functions error if you pass excessive arguments to a non-variadic function. User-space functions just ignore the extras. This is an inconsistency that has caused me considerable grief in the past year. I know Joe has said he wants userspace to become more strict like internal on this one. I will not predict what actually happens. --Larry Garfield
  117624
April 26, 2022 16:36 guilliam.xavier@gmail.com (Guilliam Xavier)
On Tue, Apr 26, 2022 at 12:18 AM Larry Garfield <larry@garfieldtech.com>
wrote:

> On Mon, Apr 25, 2022, at 4:07 PM, Rowan Tommins wrote: > > > Off the top of my head, I don't know what other inconsistencies remain, > > but my point was that in every case so far, internal functions have been > > adapted to match userland, not vice versa. > > > > Internal functions error if you pass excessive arguments to a non-variadic > function. User-space functions just ignore the extras. This is an > inconsistency that has caused me considerable grief in the past year. > > I know Joe has said he wants userspace to become more strict like internal > on this one. I will not predict what actually happens. > > >
A few internal functions have parameters with an `UNKNOWN` default value in the stubs, e.g. https://github.com/php/php-src/blob/php-8.1.5/Zend/zend_builtin_functions.stub.php#L35 function get_class(object $object = UNKNOWN): string {} documented with a question mark at https://www.php.net/manual/en/function.get-class.php get_class(object $object = ?): string or https://github.com/php/php-src/blob/php-8.1.5/ext/standard/basic_functions.stub.php#L1558 function mt_rand(int $min = UNKNOWN, int $max = UNKNOWN): int {} documented with two signatures at https://www.php.net/manual/en/function.mt-rand.php mt_rand(): int mt_rand(int $min, int $max): int -- Guilliam Xavier
  117631
April 26, 2022 20:11 rowan.collins@gmail.com (Rowan Tommins)
On 26/04/2022 17:36, Guilliam Xavier wrote:
> function mt_rand(int $min = UNKNOWN, int $max = UNKNOWN): int {} > > documented with two signatures at > https://www.php.net/manual/en/function.mt-rand.php > > mt_rand(): int > mt_rand(int $min, int $max): int
This is actually a really pertinent example: you might expect mt_rand(null, null) to give the same behaviour as mt_rand(), but actually it will always return 0, because it silently coerces to mt_rand(0, 0). That's exactly the kind of unexpected behaviour type checking aims to protect against. There used to be a lot more functions with pseudo-defaults like this, but a lot were made to accept null in PHP 8.0. e.g. mb_convert_encoding('foo', 'ASCII', null) now acts like mb_convert_encoding('foo', 'ASCII') and looks up a run-time default; but in previous versions it acted as mb_convert_encoding('foo', 'ASCII', '') which is not a valid call. Regards, -- Rowan Tommins [IMSoP]
  117638
April 27, 2022 17:34 craig@craigfrancis.co.uk (Craig Francis)
On 26 Apr 2022, at 21:11, Rowan Tommins collins@gmail.com> wrote:
> > On 26/04/2022 17:36, Guilliam Xavier wrote: >> function mt_rand(int $min = UNKNOWN, int $max = UNKNOWN): int {} >> >> documented with two signatures at >> https://www.php.net/manual/en/function.mt-rand.php >> >> mt_rand(): int >> mt_rand(int $min, int $max): int > > > This is actually a really pertinent example: you might expect mt_rand(null, null) to give the same behaviour as mt_rand(), but actually it will always return 0, because it silently coerces to mt_rand(0, 0). That's exactly the kind of unexpected behaviour type checking aims to protect against.
First, thanks Guilliam for the examples. And Rowan, I agree it's a good example, and I've added it to the Open Issues. But I'm wondering, is it only one function? and assuming it's a problem, could we use `Z_PARAM_LONG_OR_NULL()` and specifically throw an exception when either parameter is NULL, like the `max < min` check? On the basis that I'd rather have one extra check for this function, and keep NULL coercion working everywhere else (i.e. where it's fine). As an aside, under the Future Scope: https://wiki.php.net/rfc/null_coercion_consistency#future_scope I'd noted some functions that could do with some similar changes, but that's more about copying the example of `$separator` in `explode()` and the “cannot be empty” error, so NULL or an Empty String is rejected (rather than just rejecting NULL).
> There used to be a lot more functions with pseudo-defaults like this, but a lot were made to accept null in PHP 8.0. e.g. mb_convert_encoding('foo', 'ASCII', null) now acts like mb_convert_encoding('foo', 'ASCII') and looks up a run-time default; but in previous versions it acted as mb_convert_encoding('foo', 'ASCII', '') which is not a valid call.
Thanks, that's another good example, but I'm going to be a pain, and would suggest that the specific function was correctly updated to work with NULL (and anyone who did pass NULL to that argument, would have had the "must specify at least one encoding" error). Craig
  117637
April 27, 2022 16:30 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 25 Apr 2022 at 23:18, Larry Garfield <larry@garfieldtech.com> wrote:

> > > Internal functions error if you pass excessive arguments to a non-variadic > function. User-space functions just ignore the extras. This is an > inconsistency that has caused me considerable grief in the past year. > > I know Joe has said he wants userspace to become more strict like internal > on this one. I will not predict what actually happens. > >
Just to note our off-list discussion, this is shown with: ``` function example($a) { var_dump(func_get_args()); } $a = example('A', 'B'); // Fine $b = strlen('A', 'B'); // Expects exactly 1 argument, 2 given ``` It's due to the old (pre-variadic) approach of using `func_get_args()`, `func_num_args()`, etc... https://www.php.net/manual/en/functions.arguments.php#functions.variable-arg-list "It is also possible to achieve variable-length arguments by using func_num_args(), func_get_arg(), and func_get_args() functions. This technique is not recommended as it was used prior to the introduction of the ... token." I'd also note that using these functions (instead of "...$var") aren't so good for IDE's or Static Analysis, as the function signature does not clearly define the parameters exist (let alone type)... so these `func_*` functions might be candidates for deprecation, if anyone cares enough about them :-) Craig
  117636
April 27, 2022 15:51 craig@craigfrancis.co.uk (Craig Francis)
On 25 Apr 2022, at 22:07, Rowan Tommins wrote:
> On 25/04/2022 10:33, Craig Francis wrote: > >>> The fact that internal functions have parameter parsing behaviour that is almost impossible to implement in userland, and often not even consistent between functions, is a wart of engine internals, not a design decision. >> Bit of a tangent, but do you have some examples? would be nice to clean some of these up, or at least have them in mind as we discuss this RFC. > > > Fundamentally, the internal parameter handling system (ZPP) is completely separate from the way function signatures work in userland, and evolved based on a different set of requirements. The emphasis of ZPP is on unwrapping zval structs to values that can be manipulated directly in C; so, for instance, it has always had support for integer parameters. Since 7.0, userland signatures have evolved an essentially parallel set of features with an emphasis on designing a consistent and useful dynamic typing system. > > Increasingly, ZPP is being aligned with the userland language, which also allows reflection information to be generated based on PHP stubs. For instance: > > * Making rejected parameters throw TypeError rather than raise a Warning and return null > * Giving optional parameters an explicit default in the signature rather than inspecting the argument count > * Using union types, rather than ad hoc if/switch on zval type > > The currently proposed change to how internal functions handle nulls in 9.0 is just another part of that process - the userland behaviour is well-established, and we're making the ZPP behaviour match. > > Off the top of my head, I don't know what other inconsistencies remain, but my point was that in every case so far, internal functions have been adapted to match userland, not vice versa.
Thank you Rowan, that's really helpful.
>> So I'll spend 1 more... I think it's fair to say that developers using `strict_types=1` are more likely to be using static analysis; and if `strict_types=1` is going to eventually disappear, those developers won't lose any functionality with the stricter checking being done by static analysis, which can check all possible variable types (more reliable than runtime), and (with the appropriate level of strictness) static analysis can do things like rejecting the string '5' being passed to an integer parameter and null being passed to a non-nullable parameter. > > > There's an unhelpful implication here, and in your discussion of testing, that PHP users can be divided into two camps: those who check program correctness with static analysis tools, unit tests, etc; and those who don't care about program correctness. > > Instead, how about we think about those who are writing new code and want PHP to tell them early when they do something silly; and those who are maintaining large code bases and have to deal with compatibility problems. Neither of these groups is helped enough by static analysers - as you've rightly pointed out elsewhere, static checks are *not* reliable in a dynamic language, and are not likely to be built-in any time soon. > > I'm by no means the strongest advocate of strictness in PHP - I think there is a risk of throwing out good features with the bad. But I would love to see strict_types=1 become unnecessary - not because "everyone's running static analysers anyway, so who cares" but because the default behaviour provides a good balance of safety and usability. > > That makes me very hesitant to use the strict_types modes as a crutch for this compatibility break - it only puts off the question of what we think the sensible behaviour actually is.
Yep, I agree, and now I know what George is planning, with the gradual removal of `strict_types=1`, you're right that should not be how the two groups are defined. I would prefer PHP itself to be strict enough to identify and complain about actual mistakes/problems, but be tolerant and allow reasonable forms of type coercion (e.g. your example with a string holding an integer being coerced to an actual integer as needed). That's where I see static analysis providing strict checking for those who want it (e.g. they can choose to not allow any coercion). The only issue I have is when NULL is passed to functions like urlencode(), I do not see that as a problem, and I think NULL coercion should continue to also work in that context (I really don't see why it justifies a fatal Type Error for everyone, as not everyone treats NULL as an invalid value). I've made some tweaks to my RFC in an attempt to better reflect that.
>> Thank you; and you're right, if you write new code today, you could do that, but that assumes you don't need to tell the difference between an empty value vs a missing value > > > As I've said multiple times now, as soon as you pass it to a function that doesn't have specific handling for nulls, you lose that distinction anyway. There is literally zero difference in behaviour between "$foo = htmlspecialchars($_GET['foo'] ?? null)" and "$foo = htmlspecialchars($_GET['foo'] ?? '')".
You're right that changing the NULL coalesce at that point would be fine, because the value is only going to htmlspecialchars() - it's why I think automated tools will have much more luck taking this approach (even if it seems redundant) But we're usually looking at variables for user input (often nullable), that are set earlier in the script, then that nullable variable is used multiple times later. Forgive this primitive example, but this shows `$name` being used in three different ways, where an automated tool cannot simply change line 1 so it doesn't return a NULL, because it would break the Register link. ``` $name = $request->get('name'); if (trim($name) === '') { // Contains non-whitespace characters; so not "", " ", NULL, etc $where_sql[] = 'name LIKE ?'; $where_val[] = $name; } echo '
'; if ($name !== NULL) { $register_url = '/admin/accounts/add/?name=' . urlencode($name); echo '

Add Account

'; } ``` In this example, why does `trim()` and `htmlspecialchars()` justify a fatal Type Error?
> Telling users when they've passed null to a non-nullable parameter is precisely about *preserving* that distinction: if you want null to mean something specific, treating it as a string is a bug.
I don't think that represents a bug, are we are talking about a system that takes user input (so often nullable), supports coercion with other simple types, and supports NULL coercion in other contexts (e.g. string concat).
>> But, updating existing code, while that would make automated updates easier, it's likely to cause problems, because you're editing the value source, with no idea about checks later on (like your example which looks for NULL)... and that's why an automated update of existing code would have more luck updating the sinks rather than the sources (e.g. it knows which sinks are expecting a string, so it can add in a `strval($var)`, or `(string) $var`, or `$var ?? ""`). > > > That's a fair point, although "sinks" are often themselves the next "source", which is what makes static analysis possible as often as it is.
Yep, it can be complicated... and that's why I like your "$foo ?|> htmlspecialchars(...)" suggestion, because existing code (like the example above) expects functions like trim() to always return a string (rather than returning a NULL when provided with a NULL), and that will continue to work as expected/documented, but you get to introduce a new syntax to allow NULL to propagate though expressions.
> Despite all of the above, I am honestly torn on this issue. It is a disruptive change, and I'm not a fan of errors for errors' sake; but I can see the value in the decision made back in 7.0 to exclude nulls by default.
Thanks Rowan, I'll just add that I think the fatal Type Error for NULL will be much more disruptive, and I would rather relax that requirement for user defined functions, so NULL coercion works in all contexts. Craig
  117651
April 30, 2022 17:05 rowan.collins@gmail.com (Rowan Tommins)
On 27/04/2022 16:51, Craig Francis wrote:
> Forgive this primitive example, but this shows `$name` being used in three different ways, where an automated tool cannot simply change line 1 so it doesn't return a NULL, because it would break the Register link. > > ``` > $name = $request->get('name'); > > if (trim($name) === '') { // Contains non-whitespace characters; so not "", " ", NULL, etc > $where_sql[] = 'name LIKE ?'; > $where_val[] = $name; > } > > echo ' >
> > >
'; > > if ($name !== NULL) { > $register_url = '/admin/accounts/add/?name=' . urlencode($name); > echo ' >

Add Account

'; > } > ``` > > In this example, why does `trim()` and `htmlspecialchars()` justify a fatal Type Error?
Honestly, I fail to see how the inconsistent null handling in this example is anything other than a bug: * If strings containing only spaces are considered empty, it's probably a mistake that they're not trimmed everywhere. But of course adding $name = trim($name) would remove all distinction between null and ''. * If null is equivalent to an empty string when deciding whether to run the SQL, why is it not also equivalent to an empty string when rendering the register link? The current logic means that clicking "Go" causes a register link to appear, with an empty name on the query string, which doesn't seem like useful functionality. * On the other hand, if NULL is considered a different state, why is there no behaviour distinguishing it around the SQL logic? It seems likely that an else clause would be added there, perhaps "else { echo 'You must enter a name.'; }" In the current code, that would appear for both null and empty strings, which is probably a mistake. I think that actually demonstrates quite nicely that most code would benefit from treating "string" and "?string" as strictly different types, and either defaulting to an empty string explicitly and early, or considering null at every step. Simultaneously relying on null values being preserved, and them being silently coerced, leads to fragile code where it's not clear where nulls are deliberately handled as empty strings, and where they've simply been forgotten about.
>> Telling users when they've passed null to a non-nullable parameter is precisely about *preserving* that distinction: if you want null to mean something specific, treating it as a string is a bug. > > I don't think that represents a bug, are we are talking about a system that takes user input (so often nullable), supports coercion with other simple types, and supports NULL coercion in other contexts (e.g. string concat).
Read the sentence you replied to again: IF you want to treat null as distinct from '', THEN failing to do so is a bug. If, on the other hand, you just want to take nullable user input and handle it CONSISTENTLY as a string, then it's fine to explicitly default to '' AT SOURCE.
>> Despite all of the above, I am honestly torn on this issue. It is a disruptive change, and I'm not a fan of errors for errors' sake; but I can see the value in the decision made back in 7.0 to exclude nulls by default. > > Thanks Rowan, I'll just add that I think the fatal Type Error for NULL will be much more disruptive, and I would rather relax that requirement for user defined functions, so NULL coercion works in all contexts.
I think the main difference between our positions is that I believe that if PHP's type system was designed from scratch today, null would not be silently coerced in these situations. So while I agree that the change will be disruptive, I disagree with your position that it brings no benefit. On 27/04/2022 18:34, Craig Francis wrote:
> But I'm wondering, is it only one function? and assuming it's a problem, could we use `Z_PARAM_LONG_OR_NULL()` and specifically throw an exception when either parameter is NULL, like the `max < min` check? On the basis that I'd rather have one extra check for this function, and keep NULL coercion working everywhere else (i.e. where it's fine).
Well, since it's one of three examples in Guilliam's e-mail, the answer to the first question seems rather trivially "no", unless I'm missing something? As for the second question, certainly we could add specific prohibitions to null on a case by case basis, but that's basically equivalent to your previous suggestion of explicitly allowing null on a case by case basis, and doesn't really answer the question of what the default behaviour should be - especially bearing in mind that any default should apply to both built-in and user-defined functions. Regards, -- Rowan Tommins [IMSoP]
  117660
May 3, 2022 11:37 craig@craigfrancis.co.uk (Craig Francis)
> On 30 Apr 2022, at 18:05, Rowan Tommins collins@gmail.com> wrote: > > On 27/04/2022 16:51, Craig Francis wrote: >> Forgive this primitive example, but this shows `$name` being used in three different ways, where an automated tool cannot simply change line 1 so it doesn't return a NULL, because it would break the Register link. >> >> ``` >> $name = $request->get('name'); >> >> if (trim($name) === '') { // Contains non-whitespace characters; so not "", " ", NULL, etc >> $where_sql[] = 'name LIKE ?'; >> $where_val[] = $name; >> } >> >> echo ' >>
>> >> >>
'; >> >> if ($name !== NULL) { >> $register_url = '/admin/accounts/add/?name=' . urlencode($name); >> echo ' >>

Add Account

'; >> } >> ``` >> >> In this example, why does `trim()` and `htmlspecialchars()` justify a fatal Type Error? > > > Honestly, I fail to see how the inconsistent null handling in this example is anything other than a bug:
My example was only trying to show how an automated tool cannot simply update the `$name` source (line 1), because it will not be able to easily tell what happens later (especially if the code is split into multiple files)... whereas an automated tool will find it much easier to do this at the sinks: ``` - if (trim($name) === '') { + if (trim((string) $name) === '') { ``` Which is how the default NULL from `getConfigData()` was handled with PHPCompatibility: https://github.com/PHPCompatibility/PHPCompatibility/commit/ab7b0b82d95c82e62f9cffb0f1960f3ccbf0805e <https://github.com/PHPCompatibility/PHPCompatibility/commit/ab7b0b82d95c82e62f9cffb0f1960f3ccbf0805e> Or maybe: ``` - [...] value="' . htmlspecialchars($name) . '"> + [...] value="' . htmlspecialchars($name ?? '') . '"> ``` Which is how "Vaimo Composer Patches" might "fix" the NULL from `extractSingleValue()`: https://github.com/vaimo/composer-patches/pull/96/files <https://github.com/vaimo/composer-patches/pull/96/files> https://github.com/vaimo/composer-patches/blob/cbceb44dfad9e37f18e319125b036a10496ea094/src/Patch/SourceLoaders/PatchesSearch.php#L234 <https://github.com/vaimo/composer-patches/blob/cbceb44dfad9e37f18e319125b036a10496ea094/src/Patch/SourceLoaders/PatchesSearch.php#L234> This is how Laravel Blade handles NULL: https://github.com/laravel/framework/blob/ab1506091b9f166b312b3990d07b2e21d971f2e6/src/Illuminate/Support/helpers.php#L119 <https://github.com/laravel/framework/blob/ab1506091b9f166b312b3990d07b2e21d971f2e6/src/Illuminate/Support/helpers.php#L119> But, on the basis that PHP considers it a problem if NULL is passed to these functions, and should not be allowed (blocking will make better code, somehow)... maybe someone should ask for this PR to be reverted? It's only one library, so they should be able to easily fix all of the code affected by this, right? :-) https://github.com/laravel/framework/pull/36262 <https://github.com/laravel/framework/pull/36262> Interestingly Symphony Twig (which uses NULL for undefined `$context` values), preserves NULL when passed to `twig_escape_filter()`... but that's typically provided directly to `echo`, where NULL is accepted and coerced to an empty string :-) ``` echo twig_escape_filter($this->env, ($context["v"] ?? null), "html", null, true); ``` https://github.com/twigphp/Twig/blob/b4d6723715da57667cca851051eba3786714290d/src/Extension/EscaperExtension.php#L195 <https://github.com/twigphp/Twig/blob/b4d6723715da57667cca851051eba3786714290d/src/Extension/EscaperExtension.php#L195> On the basis that PHP considers it a problem if NULL is passed to these functions, maybe `echo(string ...$expressions)` and `print(string $expression)` should also deprecate (and eventually fatal error) when they receive NULL? That would be fun :-) None of these library changes (to support 8.1) are making their code easier to read, nor are they improving the code quality. Anyway, I'm going on a pointless tangent... the important bit is that many frameworks already (and will probably continue to) return NULL by default, doing so has been fine (and can be useful in some cases)... and now that NULL is going to be rejected, I can only see that creating an upgrade problem.
> * If strings containing only spaces are considered empty, it's probably a mistake that they're not trimmed everywhere. But of course adding $name = trim($name) would remove all distinction between null and ''.
Bit of a tangent, but someone can use `trim()` to check if a value is worth saving (or using the SELECT query), but will still want to preserve exactly what the user wrote (e.g. taking your name as an example, searching for "Tom" vs " Tom" will return your record for both, but not my step-brother Tom). Either way, why should `trim()` trigger a fatal Type Error if it receives NULL? Does that actually represent problem, considering it's fine receiving an integer?
> * If null is equivalent to an empty string when deciding whether to run the SQL, why is it not also equivalent to an empty string when rendering the register link? The current logic means that clicking "Go" causes a register link to appear, with an empty name on the query string, which doesn't seem like useful functionality.
While I wasn't intending it to be a real example... maybe imagine the original code did not run the DB query when the field is left blank (or only contains whitespace characters), probably for performance reasons... and later, after someone complains about duplicate accounts, they put in a small change to ensure the admin did at least one search, but for anti-frustration reasons, they allowed the admin to just press [enter]?
> * On the other hand, if NULL is considered a different state, why is there no behaviour distinguishing it around the SQL logic? It seems likely that an else clause would be added there, perhaps "else { echo 'You must enter a name.'; }" In the current code, that would appear for both null and empty strings, which is probably a mistake. > > I think that actually demonstrates quite nicely that most code would benefit from treating "string" and "?string" as strictly different types, and either defaulting to an empty string explicitly and early, or considering null at every step. Simultaneously relying on null values being preserved, and them being silently coerced, leads to fragile code where it's not clear where nulls are deliberately handled as empty strings, and where they've simply been forgotten about.
It wasn't supposed to demonstrate anything other than the problem an automated tool would have. But, on the basis that many developers don't seem to be "considering null at every step" (going on the roughly 15% of scripts using `strict_types=1`)... should frameworks default to returning an empty string instead? If so, how much existing code will break if that was to happen? Or should all of these developers specify an empty string as their default every time they ask for a user-value? In either case, it's a lot of work to fix the things that break, and I still cannot see the value in doing so.
>>> Telling users when they've passed null to a non-nullable parameter is precisely about *preserving* that distinction: if you want null to mean something specific, treating it as a string is a bug. >> >> I don't think that represents a bug, we are talking about a system that takes user input (so often nullable), supports coercion with other simple types, and supports NULL coercion in other contexts (e.g. string concat). > > > Read the sentence you replied to again: IF you want to treat null as distinct from '', THEN failing to do so is a bug. If, on the other hand, you just want to take nullable user input and handle it CONSISTENTLY as a string, then it's fine to explicitly default to '' AT SOURCE.
But we're talking about existing code, and exiting frameworks, who use NULL by default; and it's been coerced perfectly fine (until 8.1, or for user defined functions that have specified a non-nullable type). Maybe the frameworks should have taken WordPress as their example, and used an empty string by default? And we're back to the question, what is actually wrong with passing NULL to the mentioned functions? I can only see the argument of possibly preserving NULL (with different backwards compatibility issues, and I suspect better handled with your "?|>" suggestion); or those developers who prefer a very strict environment, with no type coercion, where they can treat NULL as an invalid value (but probably should be using static analysis to reliably check their code never passes NULL to these functions or performs any coercion).
>>> Despite all of the above, I am honestly torn on this issue. It is a disruptive change, and I'm not a fan of errors for errors' sake; but I can see the value in the decision made back in 7.0 to exclude nulls by default. >> >> Thanks Rowan, I'll just add that I think the fatal Type Error for NULL will be much more disruptive, and I would rather relax that requirement for user defined functions, so NULL coercion works in all contexts. > > > I think the main difference between our positions is that I believe that if PHP's type system was designed from scratch today, null would not be silently coerced in these situations. So while I agree that the change will be disruptive, I disagree with your position that it brings no benefit.
But what is that benefit? I'm sorry, but I really don't see it. I'm going on the basis that you're ok with numbers in strings being coerced to integers/floats (which I also see as being useful, because you're right, most inputs are strings)... but you're not ok with NULL being coerced (which is also common, because values aren't guaranteed to be provided by the user, and NULL is typically the default).
> On 27/04/2022 18:34, Craig Francis wrote: >> But I'm wondering, is it only one function? and assuming it's a problem, could we use `Z_PARAM_LONG_OR_NULL()` and specifically throw an exception when either parameter is NULL, like the `max < min` check? On the basis that I'd rather have one extra check for this function, and keep NULL coercion working everywhere else (i.e. where it's fine). > > > Well, since it's one of three examples in Guilliam's e-mail, the answer to the first question seems rather trivially "no", unless I'm missing something?
Sorry, I can only see two examples... `get_class()`, which is more of an example of UNKNOWN being used (and NULL has not been "allowed as of PHP 7.2")... and `mt_rand()`, which I mentioned in the RFC, because you provided a good example of NULL being a potential problem, not that I can find anyone doing this publicly: https://grep.app/search?q=mt_rand%28NULL&filter[lang][0]=PHP <https://grep.app/search?q=mt_rand%28NULL&filter%5Blang%5D%5B0%5D=PHP> https://grep.app/search?q=mt_rand%5Cs%2A%5C%28%5Cs%2ANULL®exp=true&filter[lang][0]=PHP <https://grep.app/search?q=mt_rand%5Cs%2A%5C%28%5Cs%2ANULL®exp=true&filter%5Blang%5D%5B0%5D=PHP> https://www.google.com/search?q=%22mt_rand+NULL+NULL%22 <https://www.google.com/search?q=%22mt_rand+NULL+NULL%22>
> As for the second question, certainly we could add specific prohibitions to null on a case by case basis, but that's basically equivalent to your previous suggestion of explicitly allowing null on a case by case basis, and doesn't really answer the question of what the default behaviour should be - especially bearing in mind that any default should apply to both built-in and user-defined functions.
I don't want oddities either, I was just exploring the possibility of a developer accidentally writing `mt_rand(NULL, NULL)`, in the current environment (where the NULL's are coerced to the integer 0); and if it's a problem worth protecting against, how we could handle it. Craig
  117661
May 3, 2022 13:55 rowan.collins@gmail.com (Rowan Tommins)
On 03/05/2022 12:37, Craig Francis wrote:

> But what is that benefit? I'm sorry, but I really don't see it.
I started drafting a longer reply, but honestly I don't think we're getting anywhere. Every attempt to explain the benefit seems to end in one of two ways: - an endless back and forth nit-picking hypothetical situations where it might or might not be useful - an outright dismissal that "people who want it strict can go over there and use strict_types=1 and/or static analysis" To me, it's *always* about trade-offs: the *benefit* of strict checks exists for everyone, and the question is whether they want to pay the *cost* or not. As long as we can't agree on that fundamental point, there's no point continuing the discussion.
> I'm going on the basis that you're ok with numbers in strings being > coerced to integers/floats (which I also see as being useful, because > you're right, most inputs are strings)... but you're not ok with NULL > being coerced (which is also common, because values aren't guaranteed > to be provided by the user, and NULL is typically the default).
I will reply to this point, though, because I think it's a genuinely interesting thing to ponder. One significant difference is that not only is it often not *useful* to distinguish an input of 123 from '123', it's often not *possible*. There is literally no way for an HTTP URL or header to contain an integer, rather than a string representation of one, because it's not a binary protocol. On the other hand, you might well receive an empty string as input where you're expecting an integer. Notably, that is *not* coerced automatically to zero; the code has to explicitly decide if that should trigger distinct behaviour (such as a validation error) or be treated as a default value. Not receiving a field you expected feels very similar, so similar behaviour feels reasonable. Regards, -- Rowan Tommins [IMSoP]
  117672
May 5, 2022 15:29 craig@craigfrancis.co.uk (Craig Francis)
On 3 May 2022, at 14:55, Rowan Tommins collins@gmail.com> wrote:
> > On 03/05/2022 12:37, Craig Francis wrote: >> But what is that benefit? I'm sorry, but I really don't see it. > > > I started drafting a longer reply, but honestly I don't think we're getting anywhere. Every attempt to explain the benefit seems to end in one of two ways: > > - an endless back and forth nit-picking hypothetical situations where it might or might not be useful > - an outright dismissal that "people who want it strict can go over there and use strict_types=1 and/or static analysis" > > To me, it's *always* about trade-offs: the *benefit* of strict checks exists for everyone, and the question is whether they want to pay the *cost* or not. As long as we can't agree on that fundamental point, there's no point continuing the discussion.
I hope I don't come across like that; I'm really trying to understand the benefits and costs (I tend to use small examples to help myself understand). I'm not trying to be dismissive with the use of static analysis. I just think PHP should be tolerant of some things (e.g. string '5' to int 5, and null to empty string), but I also recognise some developers prefer a very strict environment that does not do any type coercion (that's where I think static analysis works really well, as it can enforce extra checks, including type checks for all variables from all sources to all sinks). That said, I do see value in some Type Errors, like how I updated my RFC a couple of weeks ago with some examples "like `substr($string, “offset”)` and `htmlspecialchars(array())` as being clearly problematic" (thanks again George). I'm also fine with `substr('abc', $offset)` rejecting an Empty String or NULL for `$offset` (I'll note that `$offset` was never added to my list of 335 parameters). Under "Future Scope" I've given 4 example parameters that probably should reject an Empty String or NULL (because they do represent problems, similar to how `$separator` in `explode()` already has the “cannot be empty” fatal error). And finally, I can see how `mt_rand(NULL, NULL)` could be a problem (someone assuming NULL represents a default value, but it's coerced to the integer 0), but as I noted in my previous email, I cannot find anyone doing this, and after re-checking my lists and having a re-look though the manual, I think it's the only one that benefits from the rejection of NULL coercion. Taking that as my rough position on type coercion, I don't see a *benefit* from blocking NULL coercion (more below). Whereas, blocking NULL coercion does introduce an upgrade *cost* (not made easier due to the lack of tooling); and the continuing cost to some developers using the noted frameworks or `filter_input()` (e.g. always specifying an empty string default, or always manually casting NULL to a string)... the other cost is the weirdness in how NULL coercion still works for echo()/print(), string concatenation, == comparisons, arithmetics, sprintf, etc.
>> I'm going on the basis that you're ok with numbers in strings being coerced to integers/floats (which I also see as being useful, because you're right, most inputs are strings)... but you're not ok with NULL being coerced (which is also common, because values aren't guaranteed to be provided by the user, and NULL is typically the default). > > > I will reply to this point, though, because I think it's a genuinely interesting thing to ponder. > > One significant difference is that not only is it often not *useful* to distinguish an input of 123 from '123', it's often not *possible*. There is literally no way for an HTTP URL or header to contain an integer, rather than a string representation of one, because it's not a binary protocol. > > On the other hand, you might well receive an empty string as input where you're expecting an integer. Notably, that is *not* coerced automatically to zero; the code has to explicitly decide if that should trigger distinct behaviour (such as a validation error) or be treated as a default value. Not receiving a field you expected feels very similar, so similar behaviour feels reasonable.
I'm someone who will try to justify some very strict coding styles - like no inline JavaScript, use of Trusted Types, the use of literal-string for SQL/HTML/etc, and in some cases the use of application/xhtml+xml (these have easily provable benefits, but they can also be tricky, so few developers use them). With your example, I'm probably fine with an Empty String or NULL not being coerced to int 0 (as in, I could see how it might represent a problem, although I wouldn't care if it did get coerced to 0). But a lot of existing PHP code simply takes user input (which can be NULL), and passes it to these functions with no expectation of a fatal error (not good if it's in mid-way though processing data). If we were talking about a desktop application, where the UI was defined and displayed by that application, then a missing field would represent a problem in that application, but we're typically talking about the web with PHP, where the data often comes from an un-trusted browser... i.e. the user/browser/extension/network can be doing something odd, all the way down to how a standard HTML checkbox works (unchecked does not provide a field). It's because of these oddities, and the way NULL has historically worked, many developers simply don't see the difference between an empty or missing field (like other sources of NULL). That's why I don't see any benefit to blocking NULL coercion in this context, as an Empty String or NULL are often seen as the same - e.g. a programmer is simply checking if a name was provided, checking an email address contains the '@' character, checking if a message is too long, trimming the whitespace from a value, getting a record with an id/slug/name/ref, adding the value to a url, showing the search term, etc. Craig
  117676
May 6, 2022 15:26 internals@lists.php.net ("Björn Larsson via internals")
Den 2022-04-08 kl. 19:34, skrev Craig Francis:
> Hi, > > I've written a new draft RFC to address the NULL coercion problems: > > https://wiki.php.net/rfc/null_coercion_consistency > > This is due to the result of the Allow NULL quiz: > > https://quiz.craigfrancis.co.uk/ > > 14 votes for Fatal Type Errors irrespective of `strict_types=1`; > 13 votes for NULL coercion when not using `strict_types=1`; > 8 votes to update some parameters to allow NULL; > > I appreciate some want to force strict type checking on everyone, but I > want to make sure we have this properly documented, with names, and > explanations. > > Breaking changes should be justified - if they aren't, they only > make upgrading difficult and frustrating (bad for security). > > Craig > Hi,
Have been busy for a while and haven't followed all the details around this RFC, but now back on track. One code pattern to upgrade your code to PHP 8.1 and 9.0 is to use the null coalescing operator like rtrim($string ?? '',...). If one then has a legacy codebase that works flawlessly I would say that this path is the preferred one since it requires a minimum of work. To dig into your code base and address why the parameter ends up like null is probably more cumbersome. OTOH, one doesn't reap the benefits of the "Deprecate passing null to non-nullable arguments of internal functions" RFC since it requires to much work. One could argue that the "Deprecating passing null" RFC is nice, but has a flaw when applied for existing code. Applying the null coalescing code pattern is an easy way to make your code 8.1 compatible, but not sure if it adds value to the application itself. Now if this RFC is the best way to improve things if one think this is something of an issue with the original RFC worth addressing I don't know. But at least worth having a discussion about! The purist within me sighs a little when applying the code pattern above... r//Björn L
  117679
May 7, 2022 00:31 craig@craigfrancis.co.uk (Craig Francis)
On 6 May 2022, at 16:26, Björn Larsson larsson@telia.com> wrote:
> Den 2022-04-08 kl. 19:34, skrev Craig Francis: >> Hi, >> I've written a new draft RFC to address the NULL coercion problems: >> https://wiki.php.net/rfc/null_coercion_consistency >> ... > > One code pattern to upgrade your code to PHP 8.1 and 9.0 is to use the > null coalescing operator like rtrim($string ?? '',...). > > If one then has a legacy codebase that works flawlessly I would say that > this path is the preferred one since it requires a minimum of work. To > dig into your code base and address why the parameter ends up like null > is probably more cumbersome. OTOH, one doesn't reap the benefits of the > "Deprecate passing null to non-nullable arguments of internal functions" > RFC since it requires to much work. > > One could argue that the "Deprecating passing null" RFC is nice, but has > a flaw when applied for existing code. Applying the null coalescing code > pattern is an easy way to make your code 8.1 compatible, but not sure if > it adds value to the application itself. > > Now if this RFC is the best way to improve things if one think this is > something of an issue with the original RFC worth addressing I don't > know. But at least worth having a discussion about! The purist within > me sighs a little when applying the code pattern above...
Thanks Björn, My main concern is about backwards compatibility. If this RFC fails, I think the only way for existing code to avoid the issue will be via an automated tool, but for it to do this safely, I agree with your suggestion, I think it will need to use null coalescing (or casting via `(string) $var` or `strval($var)`) at the variable sinks... but the idea of doing that to the ~335 parameters I've listed, is, well, ugly to say the least... but I'd much rather do that, instead of having projects stuck on 8.x (where they will inevitably stop receiving security patches). I would like to know about the future benefits, and this is where my discussion with Rowan has been interesting (trying to work out the benefits vs costs)... but, tbh, I'm not really seeing a future benefit for breaking NULL coercion with internal functions (sorry). I do see benefits for consistency, and rejecting weird forms of type coercion (e.g. `htmlspecialchars(array())`), and I've tried to factor these into my RFC, but considering how NULL and an Empty String can often be seen as interchangeable (many developers giving no thought to the difference), I'm not sure how these fatal errors will help. Craig
  117680
May 7, 2022 03:45 jordan.ledoux@gmail.com (Jordan LeDoux)
On Fri, Apr 8, 2022 at 10:35 AM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> Hi, > > I've written a new draft RFC to address the NULL coercion problems: > > https://wiki.php.net/rfc/null_coercion_consistency > > This is due to the result of the Allow NULL quiz: > > https://quiz.craigfrancis.co.uk/ > > 14 votes for Fatal Type Errors irrespective of `strict_types=1`; > 13 votes for NULL coercion when not using `strict_types=1`; > 8 votes to update some parameters to allow NULL; > > I appreciate some want to force strict type checking on everyone, but I > want to make sure we have this properly documented, with names, and > explanations. > > Breaking changes should be justified - if they aren't, they only > make upgrading difficult and frustrating (bad for security). > > Craig >
This RFC seems to be trying to force all PHP developers, regardless of what *they* think, to treat null as "whatever the default value is within the type context of execution", which is probably the most dangerous and bug-prone usage of null in PHP code. This would make it almost impossible for most programs to treat null instead as "an undefined value which must be explicitly set", which is another usage I commonly see in code. Given that, I think there needs to be much more justification of this change personally. Jordan
  117682
May 7, 2022 08:38 craig@craigfrancis.co.uk (Craig Francis)
On 7 May 2022, at 04:45, Jordan LeDoux ledoux@gmail.com> wrote:
> > On Fri, Apr 8, 2022 at 10:35 AM Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote: > https://wiki.php.net/rfc/null_coercion_consistency <https://wiki.php.net/rfc/null_coercion_consistency> > > > This RFC seems to be trying to force all PHP developers, regardless of what *they* think, to treat null as "whatever the default value is within the type context of execution", which is probably the most dangerous and bug-prone usage of null in PHP code. This would make it almost impossible for most programs to treat null instead as "an undefined value which must be explicitly set", which is another usage I commonly see in code.
Not what I'm going for... but anyway, to get an idea of your position, do you think the string '15' should be coerced to a number, or should it fatal error as well? e.g. $my_number = round($request->get('my_number')); Craig
  117684
May 7, 2022 09:01 ocramius@gmail.com (Marco Pivetta)
Hey Craig

On Sat, 7 May 2022, 10:39 Craig Francis, <craig@craigfrancis.co.uk> wrote:

> Not what I'm going for... but anyway, to get an idea of your position, do > you think the string '15' should be coerced to a number, or should it fatal > error as well? e.g. > > $my_number = round($request->get('my_number')); >
Passing '15' for `int` should throw a type error: this is why people (slowly, steadily) opt into strict types. I mostly refrained from commenting on this RFC thus far exactly because I'm already "out of the woods", and I already add strict types everywhere. Still, when upgrading legacy PHP apps, I would have to deal with the problem, at which point I prefer looking for type errors in internal function calls, than looking for changes in null coercion (the one proposed here) everywhere. Practically, my idea is: * coercion rules are old and well known * changing coercion rules around `null` drags in complexity * focus should be in migrating to strict types instead Regardless of the output of this RFC, people upgrading to 8.0 or 8.1 will already experience BC breaks in internal function input types, and adapt to them. This RFC added on top of that further destabilises the upgrade path to 8.2 or other versions (wherever introduced). For me, this makes the RFC a net negative value for the language and its consumers, as it adds a problem.
>
  117685
May 7, 2022 10:54 craig@craigfrancis.co.uk (Craig Francis)
> On 7 May 2022, at 10:01, Marco Pivetta <ocramius@gmail.com> wrote: > > Hey Craig
Hi Marco, Thanks for your thoughts.
> On Sat, 7 May 2022, 10:39 Craig Francis, <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote: > Not what I'm going for... but anyway, to get an idea of your position, do you think the string '15' should be coerced to a number, or should it fatal error as well? e.g. > > $my_number = round($request->get('my_number')); > > Passing '15' for `int` should throw a type error: this is why people (slowly, steadily) opt into strict types.
I don't think everyone agrees :-)
> I mostly refrained from commenting on this RFC thus far exactly because I'm already "out of the woods", and I already add strict types everywhere. > > Still, when upgrading legacy PHP apps, I would have to deal with the problem, at which point I prefer looking for type errors in internal function calls, than looking for changes in null coercion (the one proposed here) everywhere. > > Practically, my idea is: > > * coercion rules are old and well known > * changing coercion rules around `null` drags in complexity > * focus should be in migrating to strict types instead > > Regardless of the output of this RFC, people upgrading to 8.0 or 8.1 will already experience BC breaks in internal function input types, and adapt to them.
I will note that my RFC does not change NULL coercion (why I quoted the documentation), and I'm not against the other BC breaks where certain type coercions are clearly problematic. It was 8.1 which introduced this specific BC problem (with ignorable deprecation notices)... and while that change was to begin alignment with user defined functions, it's the user defined functions that have been the oddity (i.e. do not use NULL coercion, unlike other contexts, such as concatenation, == comparisons, arithmetics, sprintf, print, echo, array keys). Personally I think it should have been user defined functions that worked with the "old and well known" coercion rules, and doing so would have reduced complexity (i.e. working in the same way). I'm also not against increasing strictness, and some of the type errors (ref previous emails, and noted in my RFC), but I think it's more of a balance, where an overly strict environment creates different problems (e.g. making the language much more verbose, and can be unnecessarily hard to learn/use). But my focus is on upgrading existing code, and this one is difficult to find and update (e.g. the lack of tooling to help). Craig
  117686
May 7, 2022 11:29 mel@dafert.at (Mel Dafert)
On 7 May 2022 12:54:45 CEST, Craig Francis <craig@craigfrancis.co.uk> wrote:
>I will note that my RFC does not change NULL coercion (why I quoted the documentation), and I'm not against the other BC breaks where certain type coercions are clearly problematic. > >It was 8.1 which introduced this specific BC problem (with ignorable deprecation notices)... and while that change was to begin alignment with user defined functions, it's the user defined functions that have been the oddity (i.e. do not use NULL coercion, unlike other contexts, such as concatenation, == comparisons, arithmetics, sprintf, print, echo, array keys). Personally I think it should have been user defined functions that worked with the "old and well known" coercion rules, and doing so would have reduced complexity (i.e. working in the same way). ...
>But my focus is on upgrading existing code, and this one is difficult to find and update (e.g. the lack of tooling to help).
It is exactly user-defined functions that this RFC introduces breakage for. The behaviour to throw on null in user-defined functions exists since PHP 7.0, and is being relied on. Changing these now would introduce behaviour changes that are harder to find than new type errors. Using strict typing isn't an option either/would be just as much work as auditing the changes this would introduce. It may be that user-defined functions should have accepted null to begin with in your opinion, but that still makes it a breaking change now. Regards, Mel
  117687
May 7, 2022 17:16 alec@alec.pl (Aleksander Machniak)
On 07.05.2022 13:29, Mel Dafert wrote:
> It is exactly user-defined functions that this RFC introduces breakage for. > The behaviour to throw on null in user-defined functions exists since PHP > 7.0, and is being relied on. Changing these now would introduce behaviour changes > that are harder to find than new type errors. > Using strict typing isn't an option either/would be just as much work as auditing > the changes this would introduce. > > It may be that user-defined functions should have accepted null to begin with in > your opinion, but that still makes it a breaking change now.
Indeed. I suggest to add a note to "Backward Incompatible Changes" section of the RFC. This changes the contract, but does not throw new errors in existing code. Now my 2 cents, reading docs and RFCs (I'm not the internals developer): The https://wiki.php.net/rfc/scalar_type_hints_v5 says that NULL was excluded "in order to be consistent with our existing type declarations for classes, callables and arrays". So, to me it sounds like the main reason was that NULL can't be cast to array/object/callable, not that it was expected to be treated as a new type in the future, nor more strictness was intended. At the time excluding NULL made sense as it was a new feature anyway, and it didn't apply to internal functions, so it was fully sign-in feature. Although I must say that "it should be possible for existing userland libraries to add scalar type declarations without breaking compatibility" wasn't really true, because of NULL. You could call the proposed NULL-to-scalar coercion in weak-type checks a new feature that makes https://wiki.php.net/rfc/deprecate_null_to_scalar_internal_arg obsolete/invalid, still, there are good reasons to do this. Imo, the BC break is much less problematic than the one we're trying to solve here. -- Aleksander Machniak Kolab Groupware Developer [https://kolab.org] Roundcube Webmail Developer [https://roundcube.net] ---------------------------------------------------- PGP: 19359DC1 # Blog: https://kolabian.wordpress.com
  117704
May 9, 2022 11:27 craig@craigfrancis.co.uk (Craig Francis)
On 7 May 2022, at 18:16, Aleksander Machniak <alec@alec.pl> wrote:
> On 07.05.2022 13:29, Mel Dafert wrote: >> It is exactly user-defined functions that this RFC introduces breakage for. >> The behaviour to throw on null in user-defined functions exists since PHP >> 7.0, and is being relied on. Changing these now would introduce behaviour changes >> that are harder to find than new type errors. >> Using strict typing isn't an option either/would be just as much work as auditing >> the changes this would introduce. >> It may be that user-defined functions should have accepted null to begin with in >> your opinion, but that still makes it a breaking change now. > > > Indeed. I suggest to add a note to "Backward Incompatible Changes" section of the RFC. This changes the contract, but does not throw new errors in existing code.
Thanks Aleksander, While I'm emailing Mel off-list about a more realistic example, I've added a simple example to demonstrate a BC break: https://wiki.php.net/rfc/null_coercion_consistency#backward_incompatible_changes <https://wiki.php.net/rfc/null_coercion_consistency#backward_incompatible_changes> ``` function my_function(string $my_string) { var_dump($my_string); } try { my_function('A'); // string(1) "A" my_function(1); // string(1) "1" my_function(1.2); // string(3) "1.2" my_function(true); // string(1) "1" my_function(false); // string(0) "" my_function(NULL); // Throw Type Error } catch (TypeError $e) { // Do something important? } ``` I believe the example Mel is looking at involves a function that should return an integer; and later, a second function should only be run when the returned value is `!== 0` (not `!= 0` or `> 0`, due to "style guides telling people to unconditionally prefer triple equal")... that's all working fine, but sometimes the first function can return NULL (I assume that indicates a problem), where it's useful to have the second function throw an exception when this happens (so no more processing happens, and it can be investigated). While I think there are better ways of handling this, I'll work with Mel to get this example into my RFC. Personally I think these are quite unusual, and don't match the size of the BC impact we face when internal functions (that have been accepting and coercing NULL since, well, forever) start throwing fatal type errors in 9.0. Craig
  117692
May 7, 2022 21:11 jordan.ledoux@gmail.com (Jordan LeDoux)
On Sat, May 7, 2022 at 1:38 AM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> > Not what I'm going for... but anyway, to get an idea of your position, do > you think the string '15' should be coerced to a number, or should it fatal > error as well? e.g. > > $my_number = round($request->get('my_number')); > > '15' is not an undefined value.
If a program uses the paradigm that null represents the equivalent of something like `unset($var)`, then the program would not expect coercion to happen at all if a specific type is demanded, for the same reason that a C program wouldn't expect an out-of-range memory address to silently evaluate as an int 0. For programs that use this paradigm, the type system that PHP *already uses* which has a difference between `?int` and `int`, means that there is an explicit piece of code noting the exceptions to this rule about how null is treated. What exactly would be the purpose of `?int` if this RFC was passed? To pass the value as null instead of 0? That's it? What about `int|float`? Which one would it be coerced to? This pretty radically changes how typing itself would work in existing user programs. What I'm saying is that for such a radical BC break you need to provide some compelling justification, and I'm concerned from your replies in this thread and the content of the RFC that you don't actually understand the extent of a BC-break you're proposing. Jordan
  117693
May 7, 2022 21:18 jordan.ledoux@gmail.com (Jordan LeDoux)
On Sat, May 7, 2022 at 2:11 PM Jordan LeDoux ledoux@gmail.com>
wrote:

> > I'm concerned from your replies in this thread and the content of the RFC > that you don't actually understand the extent of a BC-break you're > proposing. > > Sorry, that was a tad more combative than I intended. What I'm trying to
say is that I'm concerned there may be a blindspot here about the impact which could significantly impact developers in ways that aren't being accounted for. Jordan
  117695
May 8, 2022 06:52 alec@alec.pl (Aleksander Machniak)
On 07.05.2022 23:11, Jordan LeDoux wrote:
> What exactly would be the purpose of `?int` if this RFC was passed? To pass > the value as null instead of 0? That's it?
Yes. No change here.
> What about `int|float`? Which one would it be coerced to?
What happens if you pass FALSE to such an argument? int(0). The same would happen with NULL.
> This pretty radically changes how typing itself > would work in existing user programs. What I'm saying is that for such a > radical BC break you need to provide some compelling justification.
If coercion is possible it should be done as it is done for bool/int/float/string. Inside the function you can still trust the variable type is as expected. The only difference is that it will not throw an error on NULL. Right now you can still use your function "incorrectly" and not get an error, e.g. if you pass FALSE to it (I understand NULL might be more common). And, if you use strict_types=1 nothing changes for you. I don't see this as a radical change. Definitely not more radical than making the code to throw errors where they were not thrown before (no matter how long the deprecation period was). So, imo the BC-break argument is not that strong in this case. The consistency argument is also not that strong (but here probably more in favor of the change) because as it was mentioned already current behavior is consistent with some things and inconsistent with others. The essential question is whether we want more of "weak-typing coercion" or not. More would mean that there's no breakage for existing code in PHP9 (essentially only 8.1 would be "broken" (because of the deprecation notice) so people could just skip it). No need for ugly code like "$var ?? ''" everywhere is also a win. Ok, this was my take, now I will shut up. -- Aleksander Machniak Kolab Groupware Developer [https://kolab.org] Roundcube Webmail Developer [https://roundcube.net] ---------------------------------------------------- PGP: 19359DC1 # Blog: https://kolabian.wordpress.com
  117696
May 8, 2022 10:48 jordan.ledoux@gmail.com (Jordan LeDoux)
On Sat, May 7, 2022 at 11:53 PM Aleksander Machniak <alec@alec.pl> wrote:

> > What happens if you pass FALSE to such an argument? int(0). The same > would happen with NULL. > > The thing that everyone seems to be glossing over with these coercion
examples is that in order to have any scalar value, it must be explicitly set *somewhere*. That might be inside an internal function with a pass-by-ref, it might be on a regular old assignment line, it might be as the result of a function return... but somewhere the scalar must be set to that variable. This is not the case with null. If you use the unset() function on a variable for example, it will var_dump as null *and* it will pass an is_null() check *and* it will pass a $var === null *and* it will return false for an isset() check. Null loses these qualities if it is cast to *any* scalar. See: https://3v4l.org/lUcVV False being cast to int(0) at least doesn't change whether or not it will return true or false for isset(). Null fundamentally has the meaning that a variable is uninitialized, no matter what other ways a program happens to use it. Jordan
  117698
May 8, 2022 11:38 marandall@php.net (Mark Randall)
On 08/05/2022 11:48, Jordan LeDoux wrote:
> This is not the case with null. If you use the unset() function on a > variable for example, it will var_dump as null *and* it will pass an > is_null() check *and* it will pass a $var === null *and* it will return > false for an isset() check. Null loses these qualities if it is cast to > *any* scalar.
Fortunately the writing is on the wall for the undefined cases, but that doesn't take away from your argument.
  117699
May 8, 2022 21:33 internals@lists.php.net ("Björn Larsson via internals")
Den 2022-05-08 kl. 08:52, skrev Aleksander Machniak:
> On 07.05.2022 23:11, Jordan LeDoux wrote: >> What exactly would be the purpose of `?int` if this RFC was passed? To >> pass >> the value as null instead of 0? That's it? > > Yes. No change here. > >> What about `int|float`? Which one would it be coerced to? > > What happens if you pass FALSE to such an argument? int(0). The same > would happen with NULL. > >> This pretty radically changes how typing itself >> would work in existing user programs. What I'm saying is that for such a >> radical BC break you need to provide some compelling justification. > > If coercion is possible it should be done as it is done for > bool/int/float/string. Inside the function you can still trust the > variable type is as expected. The only difference is that it will not > throw an error on NULL. Right now you can still use your function > "incorrectly" and not get an error, e.g. if you pass FALSE to it (I > understand NULL might be more common). And, if you use strict_types=1 > nothing changes for you. > > I don't see this as a radical change. Definitely not more radical than > making the code to throw errors where they were not thrown before (no > matter how long the deprecation period was). So, imo the BC-break > argument is not that strong in this case. > > The consistency argument is also not that strong (but here probably more > in favor of the change) because as it was mentioned already current > behavior is consistent with some things and inconsistent with others. > > The essential question is whether we want more of "weak-typing coercion" > or not. More would mean that there's no breakage for existing code in > PHP9 (essentially only 8.1 would be "broken" (because of the deprecation > notice) so people could just skip it). No need for ugly code like "$var > ?? ''" everywhere is also a win. > It's not only ugly code ;) To make your program/application/library 8.1
compatible using that codepattern requires en effort, but brings close to zero improvement, except being 8.1 compatible. So the net effect is negative. r//Björn L
  117701
May 9, 2022 08:18 rowan.collins@gmail.com (Rowan Tommins)
On 08/05/2022 22:33, Björn Larsson via internals wrote:
> It's not only ugly code ;) To make your program/application/library 8.1 > compatible using that codepattern requires en effort, but brings close > to zero improvement, except being 8.1 compatible. So the net effect is > negative.
Important correction: that change is to make your codebase PHP 9.0 compatible, not PHP 8.1 compatible. That is the point of deprecation notices, to give advance warning that something will become incompatible in the future; when you act on that notice is up to you, and not in any sense about "compatibility" with the version that raises the notice. (For my thoughts on the rest of what you're saying, see every other message I've sent to this thread.) Regards, -- Rowan Tommins [IMSoP]
  117703
May 9, 2022 09:06 landers.robert@gmail.com (Robert Landers)
On Mon, May 9, 2022 at 10:18 AM Rowan Tommins collins@gmail.com> wrote:
> > On 08/05/2022 22:33, Björn Larsson via internals wrote: > > It's not only ugly code ;) To make your program/application/library 8.1 > > compatible using that codepattern requires en effort, but brings close > > to zero improvement, except being 8.1 compatible. So the net effect is > > negative. > > > Important correction: that change is to make your codebase PHP 9.0 > compatible, not PHP 8.1 compatible. > > That is the point of deprecation notices, to give advance warning that > something will become incompatible in the future; when you act on that > notice is up to you, and not in any sense about "compatibility" with the > version that raises the notice. > > (For my thoughts on the rest of what you're saying, see every other > message I've sent to this thread.) > > Regards, > > -- > Rowan Tommins > [IMSoP] > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: https://www.php.net/unsub.php >
> Important correction: that change is to make your codebase PHP 9.0 > compatible, not PHP 8.1 compatible.
If you are writing library code, it is 8.1 compatible to avoid people opening issues about spamming their logs.
  117705
May 9, 2022 13:09 rowan.collins@gmail.com (Rowan Tommins)
On 09/05/2022 10:06, Robert Landers wrote:
> If you are writing library code, it is 8.1 compatible to avoid people > opening issues about spamming their logs.
Yes, I understand that such complaints are common, but we should be working to provide documentation and tooling to those users, rather than accepting their misinterpretation and repeating it on this list. Otherwise, we might as well just promote all deprecation notices to fatal errors immediately. I will open a new thread about this later. Regards, -- Rowan Tommins [IMSoP]
  117707
May 9, 2022 23:02 craig@craigfrancis.co.uk (Craig Francis)
On 7 May 2022, at 22:11, Jordan LeDoux ledoux@gmail.com> wrote:
> On Sat, May 7, 2022 at 1:38 AM Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote: > > Not what I'm going for... but anyway, to get an idea of your position, do you think the string '15' should be coerced to a number, or should it fatal error as well? e.g. > > $my_number = round($request->get('my_number')); > > > '15' is not an undefined value. > > If a program uses the paradigm that null represents the equivalent of something like `unset($var)`
Hi Jordan, Some programers can do that, but it's not how everyone sees it. NULL is not the same as `unset()`, it is a value in itself, and many programmers simply give no thought to the distinction between an Empty String and NULL. e.g. when using the noted frameworks, NULL is provided for user values, and developers will often treat it the same as an Empty String... if `$name == ''`, then show an error saying your name is required; if `strlen($name) > 200` then your name is too long, etc.
> What exactly would be the purpose of `?int` if this RFC was passed? To pass the value as null instead of 0?
I'll talk about `int` below; but just so I can focus on the nullability part of this question, I'll use `?string`... Yes, that's all I'm proposing. For example `?string` would not change (it will continue to provide the NULL value to the function), whereas `string` would coerce NULL to an Empty String, e.g. ``` function accept_nullable_string(?string $my_string) { var_dump($my_string); } function accept_string(string $my_string) { var_dump($my_string); } accept_nullable_string(3); // string(1) "3" accept_nullable_string('2'); // string(1) "2" accept_nullable_string(true); // string(1) "1" accept_nullable_string(false); // string(0) "" accept_nullable_string(NULL); // NULL accept_string(3); // string(1) "3" accept_string('2'); // string(1) "2" accept_string(true); // string(1) "1" accept_string(false); // string(0) "" accept_string(NULL); // Currently TypeError, change to string(0) "" ```
> That's it?
Yep, but the point is about avoiding the upgrade problem when internal functions will stop coercing NULL, and instead throw type errors (these are hard to find, and will break code in seemly random places).
> What about `int|float`? Which one would it be coerced to?
Good question, Short answer; today, if you pass the string '4' to a function that uses `int|float`, then it receives `int(4)`: ``` function accept_number(int|float $my_int) { var_dump($my_int); } accept_number('4'); // int(4) ``` Longer answer... First I'll note the documentation says (for string to integer conversion) "If the string is numeric or leading numeric then it will resolve to the corresponding integer value, otherwise it is converted to zero (0)." https://www.php.net/manual/en/language.types.integer.php#language.types.integer.casting <https://www.php.net/manual/en/language.types.integer.php#language.types.integer.casting> But an Empty String is not coerced to `int(0)` for user defined functions, or arithmetic (e.g. `5 + ""` complains after PHP 7.1)... whereas, an empty string is converted to 0 when using `intval($var)`, and `(int) $var`. While I think this can be a bit harsh, I can see how it might avoid some ambiguity/issues, e.g. `$offset = ''; $a = substr('abc', $offset)`, although I have seen `substr('abc', NULL, $length)` a few times. If PHP was to throw an exception for NULL to integer coercion with internal functions, while I recognise this would still create a BC break (e.g. `round($nullable)`), I'd might be fine with it, because an Empty String to Integer is also rejected. That said, and back to your question about `int|float`, because the string '4' is coerced to `int(4)`, I'd prefer NULL to be coerced to `int(0)`.
> This pretty radically changes how typing itself would work in existing user programs.
I wouldn't say this is radical, considering it's just removing a single type error (I suspect it's rare for code to rely on NULL coercion triggering an exception)... in contrast, internal functions have always worked with null coercion, and works in other contexts like string concatenation (`'A' . $nullable`), == comparisons, the print()/echo() functions, etc.
> What I'm saying is that for such a radical BC break you need to provide some compelling justification, and I'm concerned from your replies in this thread and the content of the RFC that you don't actually understand the extent of a BC-break you're proposing.
As noted previously, and the example I've added to the "Backward Incompatible Changes" section, I don't think this is a radical BC break, it's fairly small change designed to make NULL coercion work as documented, in a constant way (all contexts), and most importantly, avoid the upgrade problems for PHP 9.0 (please keep in mind the lack of tooling to help with this, where the only safe and easy way to handle this would be to add NULL coalescing throughout the code base, which is not a good idea). Craig
  117681
May 7, 2022 05:59 alec@alec.pl (Aleksander Machniak)
On 08.04.2022 19:34, Craig Francis wrote:
> Hi, > > I've written a new draft RFC to address the NULL coercion problems: > > https://wiki.php.net/rfc/null_coercion_consistency
As a voter, I'm in favor of this RFC. I suggest to rename "Documentation" section title to "Scalars coercion inconsistency" or sth like that, as this section isn't really much about the documentation, rather the current state. -- Aleksander Machniak Kolab Groupware Developer [https://kolab.org] Roundcube Webmail Developer [https://roundcube.net] ---------------------------------------------------- PGP: 19359DC1 # Blog: https://kolabian.wordpress.com
  117683
May 7, 2022 08:55 craig@craigfrancis.co.uk (Craig Francis)
On 7 May 2022, at 06:59, Aleksander Machniak <alec@alec.pl> wrote:
> > On 08.04.2022 19:34, Craig Francis wrote: >> Hi, >> I've written a new draft RFC to address the NULL coercion problems: >> https://wiki.php.net/rfc/null_coercion_consistency > > As a voter, I'm in favor of this RFC. > > I suggest to rename "Documentation" section title to "Scalars coercion inconsistency" or sth like that, as this section isn't really much about the documentation, rather the current state.
Good point Aleksander, I started with 5 quotes from the Documentation, but you're right, the examples have grown a bit since then, and the second part does show the Current State, so I've gone with that heading (hope that's ok). Craig
  117773
May 23, 2022 18:45 craig@craigfrancis.co.uk (Craig Francis)
On 8 Apr 2022, at 18:34, Craig Francis <craig@craigfrancis.co.uk> wrote:
> I've written a new draft RFC to address the NULL coercion problems: > https://wiki.php.net/rfc/null_coercion_consistency <https://wiki.php.net/rfc/null_coercion_consistency>
For those against my RFC, can you take a quick look at this patch for Laravel: https://github.com/laravel/framework/pull/36262/files#diff-15b0a3e2eb2d683222d19dfacc04c616a3db4e3d3b3517e96e196ccbf838f59eR118 <https://github.com/laravel/framework/pull/36262/files#diff-15b0a3e2eb2d683222d19dfacc04c616a3db4e3d3b3517e96e196ccbf838f59eR118> If passing NULL to `htmlspecialchars()` represents a problem, as used in templates like `

Hi {{ $name }}

`, presumably this patch should be reverted? Craig
  117779
May 24, 2022 08:47 rowan.collins@gmail.com (Rowan Tommins)
On 23/05/2022 19:45, Craig Francis wrote:

> For those against my RFC, can you take a quick look at this patch for Laravel: > > https://github.com/laravel/framework/pull/36262/files#diff-15b0a3e2eb2d683222d19dfacc04c616a3db4e3d3b3517e96e196ccbf838f59eR118 <https://github.com/laravel/framework/pull/36262/files#diff-15b0a3e2eb2d683222d19dfacc04c616a3db4e3d3b3517e96e196ccbf838f59eR118> > > If passing NULL to `htmlspecialchars()` represents a problem, as used in templates like `

Hi {{ $name }}

`, presumably this patch should be reverted?
I notice that the docblock didn't previously list null as valid input, so this was only working by mistake, as the commit message admits. If you copied the documented union to the native type declaration on the parameter, it would immediately reject nulls, because that has always been the behaviour of user-defined functions. Static analysis tools will also have complained that null was not a valid input according to the documentation. I also note that the commit message says "On PHP >= 8.1, an error is thrown if `null` is passed to `htmlspecialchars`." which is of course not true for native PHP, only if you make the highly dubious decision to promote deprecations to errors. As an anecdote, I was recently working on a bug involving nulls causing unintended behaviour in an API. As part of the clean-up, I changed a function signature from getByIdentifer($identifier) to getByIdentifer(string $identifier), and during testing, got an error because I'd missed a scenario where null was passed. This was a good result - it prevented the broken behaviour and alerted me to the case that needed fixing. If the parameter had instead been silently coerced to an empty string, I would have got neither an error nor the original null behaviour, causing a new bug that might have taken much longer to track down. If your RFC as drafted was implemented, I could only have achieved the desired check by turning on strict_types=1 for whole sections of the application, which would probably require dozens of other fixes, or writing an ugly manual check: function getByIdentifier(?string $identifier) {     if ( $identifier === null ) {          throw new TypeError("Function doesn't actually accept nulls");     }     // ... } As I have said previously, I share your concern about the impact of the currently planned change, but I don't think loosening the existing type rules on user-defined functions is a good solution. Regards, -- Rowan Tommins [IMSoP]
  117790
May 26, 2022 12:20 craig@craigfrancis.co.uk (Craig Francis)
On 24 May 2022, at 09:47, Rowan Tommins collins@gmail.com> wrote:
> On 23/05/2022 19:45, Craig Francis wrote: > >> For those against my RFC, can you take a quick look at this patch for Laravel: >> >> https://github.com/laravel/framework/pull/36262/files#diff-15b0a3e2eb2d683222d19dfacc04c616a3db4e3d3b3517e96e196ccbf838f59eR118 >> >> If passing NULL to `htmlspecialchars()` represents a problem, as used in templates like `

Hi {{ $name }}

`, presumably this patch should be reverted? > > > I notice that the docblock didn't previously list null as valid input, so this was only working by mistake, as the commit message admits. If you copied the documented union to the native type declaration on the parameter, it would immediately reject nulls, because that has always been the behaviour of user-defined functions. Static analysis tools will also have complained that null was not a valid input according to the documentation.
To confirm the order of events: First, the Docblock originally said this function did not accept NULL, but at runtime it accepted/coerced NULL to an empty string. This is exactly how `htmlspecialchars()` worked pre 8.1. Where developers using static analysis tools can choose to treat NULL as an invalid value, and those tools could report nullable variables as an error (via strict type checking). Second, the Docblock and implementation was updated to allow NULL, because NULL is common value (a backwards compatibility issue), so this HTML encoding function, used by Blade templates by default, now accepts NULL. This seems to undermine the argument that NULL should not be passed to `htmlspecialchars()`. I did suggest updating the ~335 function parameters to be nullable; but that was rejected because some developers prefer to treat NULL as an invalid value. To keep those developers happy, and avoid the backwards compatibility issue, it seems easier to allow NULL to be coerced (like other contexts, e.g. string concat).
> I also note that the commit message says "On PHP >= 8.1, an error is thrown if `null` is passed to `htmlspecialchars`." which is of course not true for native PHP, only if you make the highly dubious decision to promote deprecations to errors.
While I'd put the word "error" down as a typo, the intention is for 9.0 to throw a type error. And while user-defined functions are part of the conversation (for consistency reasons), I'm trying to find the benefits of breaking NULL coercion for internal functions (because, if there is an overall benefit, that Laravel Blade patch should be reverted).
> As an anecdote, I was recently working on a bug involving nulls causing unintended behaviour in an API. As part of the clean-up, I changed a function signature from getByIdentifer($identifier) to getByIdentifer(string $identifier), and during testing, got an error because I'd missed a scenario where null was passed. This was a good result - it prevented the broken behaviour and alerted me to the case that needed fixing. If the parameter had instead been silently coerced to an empty string, I would have got neither an error nor the original null behaviour, causing a new bug that might have taken much longer to track down. > > If your RFC as drafted was implemented, I could only have achieved the desired check by turning on strict_types=1 for whole sections of the application, which would probably require dozens of other fixes, or writing an ugly manual check: > > function getByIdentifier(?string $identifier) { > if ( $identifier === null ) { > throw new TypeError("Function doesn't actually accept nulls"); > } > // ... > }
It sounds like you got lucky - you have a function that has a problem with NULL (but I assume it's fine with an empty string?), and during your testing you happened to pass NULL to this function. As noted before, static analysis is *considerably* better at these types of checks, because it's able check if variables *can* contain NULL. They can also perform other checks as well (important when your code seems to care about NULL vs an empty string).
> As I have said previously, I share your concern about the impact of the currently planned change, but I don't think loosening the existing type rules on user-defined functions is a good solution.
If you have a better solution, please let me know. I don't think breaking NULL coercion for internal functions helps (the change seems to involve loads of tiny updates, probably by littering projects with `$val ?? ''`, `strval($val)`, or `(string) $val`, with hard to define benefits); in contrast, allowing NULL coercion for user-defined functions would at least keep user-defined and internal functions consistent, with NULL coercion continuing to work in the same way as other contexts (e.g. string concat, == comparisons, sprintf, etc); or how coercion works between string/int/float/bool. That said, I would still like to know about the benefits of rejecting NULL for `htmlspecialchars()`, in the context of that Laravel Blade patch; because, if it is beneficial (vs the BC break), they should revert that patch... shouldn't they? Craig
  117791
May 26, 2022 14:01 michael.babker@gmail.com (Michael Babker)
On Thu, May 26, 2022 at 7:21 AM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> That said, I would still like to know about the benefits of rejecting NULL > for `htmlspecialchars()`, in the context of that Laravel Blade patch; > because, if it is beneficial (vs the BC break), they should revert that > patch... shouldn't they? > No, they should not. Laravel made an explicit choice to handle null values
in its escaping function for its templating framework. That is their choice to make. They should not be forced to implicitly handle null values because the language decides to add implicit type coercion to user defined functions (because consistency seems to be so important to some), and they should not be forced to reject null values because underlying language functions reject them as well. Every implicit type coercion is a bug in hiding IMO. If something doesn’t explicitly support null, you shouldn’t have been passing null to it in the first place. Relying on magic language behaviors to prevent type errors is not a long term sustainable code pattern. If a function, by explicit design, can safely work with null values then that parameter should be properly declared nullable. If a function, by explicit design, does not support null values then a type error is a hugely acceptable response. Yes, that means that there is a lot of work involved for folks between now and the release of PHP 9.0 to either refactor code to avoid type errors or to get the language to expand the supported types in appropriate functions, but IMO that effort from all parties is worth it in the long run to ensure language consistency (I really don’t think userland PHP and internal/extension PHP should have vastly different behaviors, and type coercion support based on where a function is designed is one of those holes that needs filled) and to ensure all APIs explicitly declare what types of data they support. -- - Michael Please pardon any errors, this message was sent from my iPhone.
  117793
May 26, 2022 16:41 craig@craigfrancis.co.uk (Craig Francis)
On 26 May 2022, at 15:01, Michael Babker babker@gmail.com> wrote:
> On Thu, May 26, 2022 at 7:21 AM Craig Francis <craig@craigfrancis.co.uk> wrote: >> That said, I would still like to know about the benefits of rejecting NULL for `htmlspecialchars()`, in the context of that Laravel Blade patch; because, if it is beneficial (vs the BC break), they should revert that patch... shouldn't they? > > No, they should not. Laravel made an explicit choice to handle null values in its escaping function for its templating framework. That is their choice to make. They should not be forced to implicitly handle null values because the language decides to add implicit type coercion to user defined functions (because consistency seems to be so important to some), and they should not be forced to reject null values because underlying language functions reject them as well.
Erm, I'm simply asking - If there is a good reason for throwing an exception when NULL is passed to `htmlspecialchars()`, then that reason would also apply to Laravel Blade for it's HTML encoding. If you know what that reasons is, you should be able to make that case to them, and get that patch reverted. I believe the only time it's helpful is when working in a strict type environment, which is best checked with static analysis, as those extra type checks can highlight unexpected values. But most (I'm assuming ~85%) developers simply do not use strict type checking, and don't seem to consider the sources of NULL (see the "Common sources of NULL" I've listed in the RFC), and for those developers, spending days editing their code to manually convert those NULL values, that does not seem justified. To respond to your comment, Laravel doesn't provide a native type declaration (it's in a DocBlock), so they don't have any type coercion for their `e()` function: https://github.com/laravel/framework/blob/c113768dbd47de7466d703108eaf8155916d5666/src/Illuminate/Support/helpers.php#L109
> Every implicit type coercion is a bug in hiding IMO.
That's fine, but strict static analysis is so much better at finding these issues (i.e. can this variable be NULL, vs a runtime check to see if the value is NULL this time)... and if you want runtime checks as well, that's where `strict_types=1` comes in.
> If a function, by explicit design, can safely work with null values then that parameter should be properly declared nullable.
`htmlspecialchars()` can and always has safely worked with NULL, the same with the other 335 parameters I've listed. But when I suggested changing these parameters to be nullable, it was not well received.
> Yes, that means that there is a lot of work involved for folks between now and the release of PHP 9.0 to either refactor code to avoid type errors or to get the language to expand the supported types in appropriate functions, but IMO that effort from all parties is worth it in the long run to ensure language consistency (I really don’t think userland PHP and internal/extension PHP should have vastly different behaviors, and type coercion support based on where a function is designed is one of those holes that needs filled) and to ensure all APIs explicitly declare what types of data they support.
I don't think userland and internal functions should have different behaviours either (my RFC does say "keep user-defined and internal functions consistent"). To achieve that, for those not using `strict_types=1`, I'm simply suggesting that NULL coercion continues to work for internal functions, and we update user-defined functions to allow NULL coercion. This would also help developers add native types to their functions. Then we have a setup which is basically the same as how the string "6" can be coerced to the integer 6, and how string concatenation with NULL works, how `'' == $nullable` works`, how `sprintf('Hi %s', $nullable)` works, how `echo $nullable` works, etc. Craig
  117795
May 26, 2022 19:06 michael.babker@gmail.com (Michael Babker)
On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote:
> On 26 May 2022, at 15:01, Michael Babker babker@gmail.com> wrote: > > On Thu, May 26, 2022 at 7:21 AM Craig Francis <craig@craigfrancis..co.uk> wrote: > > > That said, I would still like to know about the benefits of rejecting NULL for `htmlspecialchars()`, in the context of that Laravel Blade patch; because, if it is beneficial (vs the BC break), they should revert that patch... shouldn't they? > > > > No, they should not. Laravel made an explicit choice to handle null values in its escaping function for its templating framework. That is their choice to make. They should not be forced to implicitly handle null values because the language decides to add implicit type coercion to user defined functions (because consistency seems to be so important to some), and they should not be forced to reject null values because underlying language functions reject them as well. > > > Erm, I'm simply asking - If there is a good reason for throwing an exception when NULL is passed to `htmlspecialchars()`, then that reason would also apply to Laravel Blade for it's HTML encoding.
Laravel’s `e()` helper function is more than a wrapper around `htmlspecialchars()` which supports values beyond a string, and has elected to support a null value being passed into that function with explicit handling for it. I’ve seen similar patches land in other packages as well. Whether it is to prevent that package from triggering the deprecation regarding null values itself or the package owners choosing to explicitly handle the null case so users don’t need to deal with it doesn’t really matter, the point is folks have made that decision and it is not problematic in any way IMO. You’re basically saying that a framework choosing to apply the same `htmlspecialchars($string ?? ‘’)` patch that applications are suggested to use (for those who don’t care to differentiate between null and empty string before something reaches an escaping function) is in the wrong by arguing for Laravel to revert that patch. On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote:
> > If a function, by explicit design, can safely work with null values then that parameter should be properly declared nullable. > > > `htmlspecialchars()` can and always has safely worked with NULL, the same with the other 335 parameters I've listed. But when I suggested changing these parameters to be nullable, it was not well received.
The arginfo for the `htmlspecialchars()` function, and if I’m reading correctly (I’m no C developer so I could very well be misinterpreting things) the internal `php_html_entities()` function which `htmlspecialchars()` calls, does not declare null as a supported value for the `$string` parameter. Meaning that depending on whether your code uses strict_types or not, either calling the function with a null value triggers a type error or requires implicit type coercision to convert the value to a type that the function does support. This means to me that the `htmlspecialchars()` function does NOT support null values and that passing null through only works because of the reliance on implicit type coercision. That, to me, is a valid reason for a type error to be raised; the code very explicitly requires a string and is not written to process a null value. A blanket suggestion to make every string parameter which implicitly supports a null value nullable because they previously supported null through type coercion IMO should be shot down. Instead, parameters which are emitting a deprecation notice because they are receiving an unsupported null value should be reviewed, and if that function can by design support null values, those parameters should be updated to reflect nullability. IMO, `htmlspecialchars()` is not a function which should expliciitly support a null value and the present deprecation is correct. On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote:
> On 26 May 2022, at 15:01, Michael Babker babker@gmail.com> wrote: > > Yes, that means that there is a lot of work involved for folks between now and the release of PHP 9.0 to either refactor code to avoid type errors or to get the language to expand the supported types in appropriate functions, but IMO that effort from all parties is worth it in the long run to ensure language consistency (I really don’t think userland PHP and internal/extension PHP should have vastly different behaviors, and type coercion support based on where a function is designed is one of those holes that needs filled) and to ensure all APIs explicitly declare what types of data they support. > > > > I don't think userland and internal functions should have different behaviours either (my RFC does say "keep user-defined and internal functions consistent"). > > To achieve that, for those not using `strict_types=1`, I'm simply suggesting that NULL coercion continues to work for internal functions, and we update user-defined functions to allow NULL coercion.
On the record, I personally disagree with the strict_types declaration’s existence purely because it makes the runtime inconsistent based solely on what file something was called from, plus there are way to bypass a developer’s explicit opt-in to strict_types being enabled which makes it a “mostly strict but there’s still a gotcha” declaration at best. I don’t think this type of deep rooted behavioral difference improves PHP in any way, but that ship has long sailed. With that said though, I disagree with exposing the internal type coercion behavior to userland code. I think the userland behavior as it is today is the correct behavior, even if I look at implicit type coercision as a bug which will cause hard-to-identify production issues for somebody at some point down the road. Let’s also be real here, a not-so-insignificant number of PHP builds are powered by platforms which have minimal if any automated testing and even less static analysis, meaning a static analyzer is not going to be of much help to those builds to begin with because they’re using non-analyzed code as a foundation. Changing the language in a way that makes these already fragile platforms even more suseptable to hidden type-related bugs is a bad idea. Personally speaking, I’ve been bitten by enough type-coercion related bugs in my career (and continue to address new issues arise thanks to a combination of poorly formed legacy data in the database plus open source packages becoming stricter in the types they support) that I would rather advocate for the language moving in a direction that makes type-related issues harder to hit and it to be noisy when it does encounter such issues; the changes in PHP 8.1 IMO are a step in the right direction.
  117808
May 28, 2022 02:35 craig@craigfrancis.co.uk (Craig Francis)
On 26 May 2022, at 20:06, Michael Babker babker@gmail.com> wrote:
> On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote: >> [...] If there is a good reason for throwing an exception when NULL is passed to `htmlspecialchars()`, then that reason would also apply to Laravel Blade for it's HTML encoding. > > Laravel’s `e()` helper function is more than a wrapper around `htmlspecialchars()` which supports values beyond a string, and has elected to support a null value being passed into that function with explicit handling for it. I’ve seen similar patches land in other packages as well. Whether it is to prevent that package from triggering the deprecation regarding null values itself or the package owners choosing to explicitly handle the null case so users don’t need to deal with it doesn’t really matter, the point is folks have made that decision and it is not problematic in any way IMO. You’re basically saying that a framework choosing to apply the same `htmlspecialchars($string ?? ‘’)` patch that applications are suggested to use (for those who don’t care to differentiate between null and empty string before something reaches an escaping function) is in the wrong by arguing for Laravel to revert that patch.
Sorry, I've read this a few times, and I just don't understand what you mean... but let's leave it, I cannot think how else to re-write my question to get an answer. On 26 May 2022, at 20:06, Michael Babker babker@gmail.com> wrote:
> On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote: >> >> `htmlspecialchars()` can and always has safely worked with NULL, the same with the other 335 parameters I've listed. But when I suggested changing these parameters to be nullable, it was not well received. > > The arginfo for the `htmlspecialchars()` function, and if I’m reading correctly (I’m no C developer so I could very well be misinterpreting things) the internal `php_html_entities()` function which `htmlspecialchars()` calls, does not declare null as a supported value for the `$string` parameter. Meaning that depending on whether your code uses strict_types or not, either calling the function with a null value triggers a type error or requires implicit type coercision to convert the value to a type that the function does support. This means to me that the `htmlspecialchars()` function does NOT support null values and that passing null through only works because of the reliance on implicit type coercision. That, to me, is a valid reason for a type error to be raised; the code very explicitly requires a string and is not written to process a null value. > > A blanket suggestion to make every string parameter which implicitly supports a null value nullable because they previously supported null through type coercion IMO should be shot down. Instead, parameters which are emitting a deprecation notice because they are receiving an unsupported null value should be reviewed, and if that function can by design support null values, those parameters should be updated to reflect nullability. IMO, `htmlspecialchars()` is not a function which should expliciitly support a null value and the present deprecation is correct.
Erm, I really don't wish to be rude, but have you read my RFC, and the problem I'm trying to solve... in short, the backwards compatibility issue for PHP 9.0. Developers will need to spend a lot of time updating their code, which is important to avoid the fatal type errors, but I simply do not understand what the benefit for them will be. On 26 May 2022, at 20:06, Michael Babker babker@gmail.com> wrote:
> On Thursday, May 26, 2022 at 11:41 AM, Craig Francis <craig@craigfrancis.co.uk (mailto:craig@craigfrancis.co.uk)> wrote: >> On 26 May 2022, at 15:01, Michael Babker babker@gmail.com> wrote: >>> Yes, that means that there is a lot of work involved for folks between now and the release of PHP 9.0 to either refactor code to avoid type errors or to get the language to expand the supported types in appropriate functions, but IMO that effort from all parties is worth it in the long run to ensure language consistency (I really don’t think userland PHP and internal/extension PHP should have vastly different behaviors, and type coercion support based on where a function is designed is one of those holes that needs filled) and to ensure all APIs explicitly declare what types of data they support. >> >> >> >> I don't think userland and internal functions should have different behaviours either (my RFC does say "keep user-defined and internal functions consistent"). >> >> To achieve that, for those not using `strict_types=1`, I'm simply suggesting that NULL coercion continues to work for internal functions, and we update user-defined functions to allow NULL coercion. > > On the record, I personally disagree with the strict_types declaration’s existence purely because it makes the runtime inconsistent based solely on what file something was called from, plus there are way to bypass a developer’s explicit opt-in to strict_types being enabled which makes it a “mostly strict but there’s still a gotcha” declaration at best. I don’t think this type of deep rooted behavioral difference improves PHP in any way, but that ship has long sailed.
I believe George Peter Banyard is trying to work on this.
> With that said though, I disagree with exposing the internal type coercion behavior to userland code. I think the userland behavior as it is today is the correct behavior, even if I look at implicit type coercision as a bug which will cause hard-to-identify production issues for somebody at some point down the road. Let’s also be real here, a not-so-insignificant number of PHP builds are powered by platforms which have minimal if any automated testing and even less static analysis, meaning a static analyzer is not going to be of much help to those builds to begin with because they’re using non-analyzed code as a foundation. Changing the language in a way that makes these already fragile platforms even more suseptable to hidden type-related bugs is a bad idea. Personally speaking, I’ve been bitten by enough type-coercion related bugs in my career (and continue to address new issues arise thanks to a combination of poorly formed legacy data in the database plus open source packages becoming stricter in the types they support) that I would rather advocate for the language moving in a direction that makes type-related issues harder to hit and it to be noisy when it does encounter such issues; the changes in PHP 8.1 IMO are a step in the right direction.
Sorry, but I'm finding this hard to follow, it sounds like you don't want any type coercion... e.g. the string "5" to int 5 should just not happen without the developer explicitly converting the type (e.g. intval)... I disagree, but we can leave it at that. Craig
  117796
May 27, 2022 06:44 jordan.ledoux@gmail.com (Jordan LeDoux)
On Thu, May 26, 2022 at 5:21 AM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> > It sounds like you got lucky - you have a function that has a problem with > NULL (but I assume it's fine with an empty string?), and during your > testing you happened to pass NULL to this function. As noted before, static > analysis is *considerably* better at these types of checks, because it's > able check if variables *can* contain NULL. They can also perform other > checks as well (important when your code seems to care about NULL vs an > empty string). > > Nearly *all* code has a problem with null. It very much feels like the
original effort to deprecate null calls decided to resolve this by saying "let's have the language help developers improve their code so it doesn't have these problems in the first place", and this effort is trying to resolve this by saying "let's have the language support the buggy code in ways that makes it work". At my job, my task for the last three weeks has literally been upgrading our internal codebase for 8.1, and the biggest set of logs I'm dealing with is exactly what you're talking about here: null's passed to internal functions. Every single case I've looked at so far has been traced to code that was written incorrectly, where some code somewhere was not properly guarding its values, and error cases were slipping through. Jordan
  117809
May 28, 2022 02:35 craig@craigfrancis.co.uk (Craig Francis)
On 27 May 2022, at 07:44, Jordan LeDoux ledoux@gmail.com> wrote:
> On Thu, May 26, 2022 at 5:21 AM Craig Francis <craig@craigfrancis.co.uk> wrote: >> It sounds like you got lucky - you have a function that has a problem with NULL (but I assume it's fine with an empty string?), and during your testing you happened to pass NULL to this function. As noted before, static analysis is *considerably* better at these types of checks, because it's able check if variables *can* contain NULL. They can also perform other checks as well (important when your code seems to care about NULL vs an empty string). > > > Nearly *all* code has a problem with null.
Erm, but it doesn't... does it? I know I keep going on about this very simply example, but it represents a fairly typical style of programming PHP, and I just do not understand what the problem with it is: ``` $search = $request->input('q'); // Laravel, returns NULL when 'q' is not defined. echo 'Results for ' . htmlspecialchars($search); ``` But forget about it, hopefully someone else can find a solution to the problem. On 27 May 2022, at 07:44, Jordan LeDoux ledoux@gmail.com> wrote:
> It very much feels like the original effort to deprecate null calls decided to resolve this by saying "let's have the language help developers improve their code so it doesn't have these problems in the first place", and this effort is trying to resolve this by saying "let's have the language support the buggy code in ways that makes it work". > > At my job, my task for the last three weeks has literally been upgrading our internal codebase for 8.1, and the biggest set of logs I'm dealing with is exactly what you're talking about here: null's passed to internal functions. Every single case I've looked at so far has been traced to code that was written incorrectly, where some code somewhere was not properly guarding its values, and error cases were slipping through.
For one of the teams I work with (the only one trying to make the relevant changes), this is also their "biggest" problem... but they are having exactly the opposite experience, a considerable amount of hours have gone into finding and changing their code, and not a single change was for code that was "written incorrectly" (I suppose that depends on what you think "correct" code is)... the other teams I work with are either suppressing this notice, or simply not upgrading to 8.1. Craig
  117801
May 27, 2022 09:11 rowan.collins@gmail.com (Rowan Tommins)
On 26/05/2022 13:20, Craig Francis wrote:
> First, the Docblock originally said this function did not accept NULL, but at runtime it accepted/coerced NULL to an empty string. This is exactly how `htmlspecialchars()` worked pre 8.1. Where developers using static analysis tools can choose to treat NULL as an invalid value, and those tools could report nullable variables as an error (via strict type checking).
The same events can be given a very different narrative: Originally, this function was not intended to accept null. The documentation clearly stated the accepted types, and any static analysis tool would reject any value not in that list. The author of the function chose to treat NULL as an invalid value. Because the list of valid types wasn't enforced at run-time, it was accidentally possible to pass null. Many developers who weren't using additional tooling came to rely on this bug, so as a compromise, the authors decided to change the documented behaviour of nulls. At no point did anybody look at the function and say "I can safely pass null to this, as long as I remember never to use static analysis tools". They accidentally passed null, and by luck got a useful result.
>> I also note that the commit message says "On PHP >= 8.1, an error is thrown if `null` is passed to `htmlspecialchars`." which is of course not true for native PHP, only if you make the highly dubious decision to promote deprecations to errors. > While I'd put the word "error" down as a typo, the intention is for 9.0 to throw a type error.
My point is that there is no action required on PHP 8.1. The commit message should have said, "On PHP >= 9.0, an error will be thrown if `null` is passed to `htmlspecialchars`."
> And while user-defined functions are part of the conversation (for consistency reasons), I'm trying to find the benefits of breaking NULL coercion for internal functions (because, if there is an overall benefit, that Laravel Blade patch should be reverted).
This is a complete non sequitur. It's perfectly possible for different scenarios to support different decisions, and what Laravel decides that particular function should accept is based on their own assessment of the trade-offs. PHP is free to make an entirely different decision based on entirely different trade-offs. Meanwhile, the benefits have been explained repeatedly in this thread. You may not agree that they are worth the cost, and as I've repeatedly said, I have some sympathy for that. But please stop trying to take the conversation back to the very beginning by implying that you've asked a question and not received an answer. Regards, -- Rowan Tommins [IMSoP]
  117810
May 28, 2022 02:35 craig@craigfrancis.co.uk (Craig Francis)
On 27 May 2022, at 10:11, Rowan Tommins collins@gmail.com> wrote:
> On 26/05/2022 13:20, Craig Francis wrote: >> First, the Docblock originally said this function did not accept NULL, but at runtime it accepted/coerced NULL to an empty string. This is exactly how `htmlspecialchars()` worked pre 8.1. Where developers using static analysis tools can choose to treat NULL as an invalid value, and those tools could report nullable variables as an error (via strict type checking). > > > The same events can be given a very different narrative: > > Originally, this function was not intended to accept null. The documentation clearly stated the accepted types, and any static analysis tool would reject any value not in that list. The author of the function chose to treat NULL as an invalid value. > > Because the list of valid types wasn't enforced at run-time, it was accidentally possible to pass null. Many developers who weren't using additional tooling came to rely on this bug, so as a compromise, the authors decided to change the documented behaviour of nulls. > > At no point did anybody look at the function and say "I can safely pass null to this, as long as I remember never to use static analysis tools". They accidentally passed null, and by luck got a useful result.
I'm really sorry, but I'm not sure what your point is... I was asking why passing NULL to `htmlspecialchars()` represents a problem, but we should just leave it, we're not getting anywhere.
>>> I also note that the commit message says "On PHP >= 8.1, an error is thrown if `null` is passed to `htmlspecialchars`." which is of course not true for native PHP, only if you make the highly dubious decision to promote deprecations to errors. >> While I'd put the word "error" down as a typo, the intention is for 9.0 to throw a type error. > > > My point is that there is no action required on PHP 8.1. The commit message should have said, "On PHP >= 9.0, an error will be thrown if `null` is passed to `htmlspecialchars`."
But deprecation messages do prompt developers to make changes (action is required). For those who do "no action", when they try upgrading to 9.0, they will have a bit of a problem (with no easy fix).
>> And while user-defined functions are part of the conversation (for consistency reasons), I'm trying to find the benefits of breaking NULL coercion for internal functions (because, if there is an overall benefit, that Laravel Blade patch should be reverted). > > > This is a complete non sequitur. It's perfectly possible for different scenarios to support different decisions, and what Laravel decides that particular function should accept is based on their own assessment of the trade-offs. PHP is free to make an entirely different decision based on entirely different trade-offs.
Sorry, but I'm not following... if there is a benefit/reason for PHP to reject NULL for `htmlspecialchars()`, and I'm just too stupid to see what it is, I would have assumed that benefit/reason would also apply to the HTML encoding function `e()` in Laravel.
> Meanwhile, the benefits have been explained repeatedly in this thread. You may not agree that they are worth the cost, and as I've repeatedly said, I have some sympathy for that. But please stop trying to take the conversation back to the very beginning by implying that you've asked a question and not received an answer.
Sorry, I'm just not getting it, I have read and re-read all of your emails many times, and I'm clearly not clever enough to understand. Anyway, I've send a separate email about my RFC, because I cannot find a solution... I've tried my best, and I'm done. Craig
  117819
May 28, 2022 19:42 michael.babker@gmail.com (Michael Babker)
On Fri, May 27, 2022 at 9:36 PM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> I know I keep going on about this very simply example, but it represents a > fairly typical style of programming PHP, and I just do not understand what > the problem with it is: > > ``` > $search = $request->input('q'); // Laravel, returns NULL when 'q' is not > defined. > > echo 'Results for ' . htmlspecialchars($search); > ```
If you want to go with this specific example, exactly as written, here is the exact issue I have with it. As already mentioned, `htmlspecialchars()` is documented and the implementing code requires a string argument. Passing a null value to it requires explicitly writing non-typesafe code, and requires understanding of how the PHP language handles type coercion, and how those rules are different based on whether the file calling that function has the `strict_types` declaration. So on a code review, I have to understand the context of the value passed through and know what context the language is operating in to determine if non-string values are allowed to avoid errors, and understand the semantics of PHP’s type coercion system to make sure the output is something expected based on the input. Compare that to knowing the function explicitly only accepts a string value and you know you are working with a string through appropriate validation; for me, even in a loosely typed environment, I’d rather follow the documented parameter types instead of having to worry about all the other magic involved to make null work. Though, that input variable would never reach output in any of my projects to begin with if it were null or an empty string; both would land on a code path treated as no search being performed. So that example for me is also way oversimplified because it doesn’t match a real world workflow IMO. On Fri, May 27, 2022 at 9:36 PM Craig Francis <craig@craigfrancis.co.uk> wrote:
> Sorry, but I'm not following... if there is a benefit/reason for PHP to > reject NULL for `htmlspecialchars()`, and I'm just too stupid to see what > it is, I would have assumed that benefit/reason would also apply to the > HTML encoding function `e()` in Laravel.
No, that would not automatically apply to Laravel’s helper function, or any escaping functions in Twig, or escaping functions used in platforms without templating frameworks like WordPress. If it’s not OK for functions that can call `htmlspecialchars()` to gracefully handle null to make sure those trigger the same type error as the core language, then along the same lines, I would argue that patches which add null coalescing or explicit typecasting to that parameter in libraries or applications should also be rejected because then they’re masking an error the language purposefully elected to emit.
> --
- Michael Please pardon any errors, this message was sent from my iPhone.
  117835
May 30, 2022 14:04 guilliam.xavier@gmail.com (Guilliam Xavier)
On Tue, May 24, 2022 at 10:47 AM Rowan Tommins collins@gmail.com> wrote:
> > As an anecdote, I was recently working on a bug involving nulls causing > unintended behaviour in an API. As part of the clean-up, I changed a > function signature from getByIdentifer($identifier) to > getByIdentifer(string $identifier), and during testing, got an error > because I'd missed a scenario where null was passed. This was a good > result - it prevented the broken behaviour and alerted me to the case > that needed fixing. If the parameter had instead been silently coerced > to an empty string, I would have got neither an error nor the original > null behaviour, causing a new bug that might have taken much longer to > track down. > > If your RFC as drafted was implemented, I could only have achieved the > desired check by turning on strict_types=1 for whole sections of the > application, which would probably require dozens of other fixes, or > writing an ugly manual check: > > function getByIdentifier(?string $identifier) { > if ( $identifier === null ) { > throw new TypeError("Function doesn't actually accept nulls"); > } > // ... > }
For this specific example, shouldn't it rather [already] be like this anyway? function getByIdentifier(string $identifier) { if ( $identifier === '' ) { throw new InvalidArgumentException("Empty identifier"); } // ... } (but I guess you could find actual examples where an empty string is "valid" and null is not, indeed)
> As I have said previously, I share your concern about the impact of the > currently planned change, but I don't think loosening the existing type > rules on user-defined functions is a good solution.
I agree the "issue" looks like different PoV of benefit VS cost, plus the reticence about going back on the general trend of "more strictness" (even with strict_types=0). I'm just worried to see people rather disabling deprecation notices (via error_reporting, or a custom error handler, or even patching PHP's source and recompiling their own) than "fixing" the code (granted, that's not specific to *this* RFC, but somewhat "highlighted" here) FWIW, to avoid the "risks" of `expr ?? ''` (possibly hiding undefined) and `(string)expr` / `strval(expr)` (potentially casting "too much" without errors), I've already seen custom functions like function null_to_empty_string(?string $string_or_null): string { return $string_or_null === null ? '' : $string_or_null; } (but also its "opposite" empty_string_to_null(string $string): ?string) Regards, -- Guilliam Xavier
  117837
May 30, 2022 14:59 rowan.collins@gmail.com (Rowan Tommins)
On 30/05/2022 15:04, Guilliam Xavier wrote:
> For this specific example, shouldn't it rather [already] be like this anyway? > > function getByIdentifier(string $identifier) { > if ( $identifier === '' ) { > throw new InvalidArgumentException("Empty identifier"); > } > // ... > } > > (but I guess you could find actual examples where an empty string is > "valid" and null is not, indeed)
The actual code in this case ended up in a generic routine that used isset() to choose which SQL to generate. An empty string would generate a WHERE clause that matched zero rows, but a null would omit the WHERE clause entirely, and match *all* rows. So an extra pre-validation on the string format might be useful for debugging, but wouldn't result in materially different results.
> I'm just worried to see people rather disabling deprecation notices > (via error_reporting, or a custom error handler, or even patching > PHP's source and recompiling their own) than "fixing" the code > (granted, that's not specific to*this* RFC, but somewhat "highlighted" here)
Indeed, I think we have a general problem with how deprecations are communicated and acted on in general. I have been thinking about how to improve that, other than "never change anything" or "never warn people we're going to change anything", and will try to write up my ideas soon.
> function null_to_empty_string(?string $string_or_null): string > { return $string_or_null === null ? '' : $string_or_null; } > > (but also its "opposite" empty_string_to_null(string $string): ?string)
That's actually an interesting observation. It's probably quite common to treat empty strings as null when going from input to storage; and to treat null as empty string when retrieving again. Importantly, databases generally *don't* treat them as equivalent, so forgetting that translation can be a real cause of bugs. I often advocate for string columns in databases to allow either null or empty string, but not both (by adding a check constraint), so that such bugs are caught earlier. To go back to Craig's favourite example, that could be a genuine problem caused by passing null to htmlspecialchars() - if we intended that null to be stored as such in the database, we've silently converted it into a non-equivalent empty string. (Yes, escaping should be done on output not input, but it's not completely infeasible that that combination might happen.) Regards, -- Rowan Tommins [IMSoP]
  117838
May 30, 2022 15:32 guilliam.xavier@gmail.com (Guilliam Xavier)
On Mon, May 30, 2022 at 4:59 PM Rowan Tommins collins@gmail.com> wrote:
> > The actual code in this case ended up in a generic routine that used > isset() to choose which SQL to generate. An empty string would generate > a WHERE clause that matched zero rows, but a null would omit the WHERE > clause entirely, and match *all* rows. So an extra pre-validation on the > string format might be useful for debugging, but wouldn't result in > materially different results.
Maybe the routine could use e.g. array_key_exists() rather than isset()? (anyway, sometimes you *actually* want isset() null behavior, or use a null default for a parameter as a "not passed" argument...)
> That's actually an interesting observation. It's probably quite common > to treat empty strings as null when going from input to storage; and to > treat null as empty string when retrieving again. Importantly, databases > generally *don't* treat them as equivalent,
Yeah, I only know Oracle to do something as... "clever" as storing an empty VARCHAR '' as NULL (for "optimization" IIRC) -_-
> so forgetting that > translation can be a real cause of bugs. I often advocate for string > columns in databases to allow either null or empty string, but not both > (by adding a check constraint), so that such bugs are caught earlier.
Same (sometimes you have no choice but allow NULL, e.g. an optional foreign key, but the referenced primary key is not nullable and should generally also reject '') -- Guilliam Xavier
  117811
May 28, 2022 02:36 craig@craigfrancis.co.uk (Craig Francis)
On 8 Apr 2022, at 18:34, Craig Francis <craig@craigfrancis.co.uk> wrote:
> I've written a new draft RFC to address the NULL coercion problems: > https://wiki.php.net/rfc/null_coercion_consistency
I give up. I'm clearly not clever enough to understand what the benefits are for breaking NULL coercion... considering NULL has been coerced for internal functions since forever, and continues to work in other contexts. If anyone wants to work on a solution to the problem, feel free to either edit my RFC, or create your own RFC. I believe my RFC documents every position put forward, and includes all of details I think are relevant. I fear a vote now will only result in rejection; and once people put their names down for a certain position, they will never change their mind. That said, if someone did continue, and got their RFC accepted, I should be able to organise some funding for the implementation. For some background, I paid for the `is_literal()` implementation before that RFC vote, as that patch is useful irrespective of the result (it's being used in a few projects now, and works incredibly well; thanks again to Joe Watkins for writing, and Máté Kocsis for testing). In this case, two of my clients are considering the cost of modifying their code (by adding a load of `?? ''` everywhere), and they would rather avoid that (time consuming, and makes their code more complicated). I suggested putting aside a budget to either do that modification, or fund the implementation if my RFC was to pass. I'm not sure how big or complicated the task would be, but I noted the R11 suggested day rate of $500 (~£390), and the implementation could take a few weeks. If anyone does want to email me about this, I won't respond until at least June 6th. Craig
  117812
May 28, 2022 06:25 alec@alec.pl (Aleksander Machniak)
On 28.05.2022 04:36, Craig Francis wrote:
> On 8 Apr 2022, at 18:34, Craig Francis <craig@craigfrancis.co.uk> wrote: >> I've written a new draft RFC to address the NULL coercion problems: >> https://wiki.php.net/rfc/null_coercion_consistency > > I give up.
Don't give up. You have my Yes vote. Imo, the RFC: - fixes real upgrade problem (very important), - improves consistency in a better way than the solution introduced in 8.1. - does not change strict_types behavior Some people don't care with what arguments their functions are called with. As long as the value can be coerced to the expected type the function will do what it is supposed to do. Other people have strict_types. All people don't want to be forced to modify a working code. So, this is another "battle" between strict and non-strict camps. I'd like to see which is the majority these days. I hope that even some strict-code proponents can see this makes sense. -- Aleksander Machniak Kolab Groupware Developer [https://kolab.org] Roundcube Webmail Developer [https://roundcube.net] ---------------------------------------------------- PGP: 19359DC1 # Blog: https://kolabian.wordpress.com
  117813
May 28, 2022 07:10 craig@craigfrancis.co.uk (Craig Francis)
On 28 May 2022, at 07:25, Aleksander Machniak <alec@alec.pl> wrote:
> > On 28.05.2022 04:36, Craig Francis wrote: >> On 8 Apr 2022, at 18:34, Craig Francis <craig@craigfrancis.co.uk> wrote: >>> I've written a new draft RFC to address the NULL coercion problems: >>> https://wiki.php.net/rfc/null_coercion_consistency >> I give up. > > Don't give up. You have my Yes vote. > > Imo, the RFC: > - fixes real upgrade problem (very important), > - improves consistency in a better way than the solution introduced in 8.1. > - does not change strict_types behavior > > Some people don't care with what arguments their functions are called with. As long as the value can be coerced to the expected type the function will do what it is supposed to do. Other people have strict_types. All people don't want to be forced to modify a working code. > > So, this is another "battle" between strict and non-strict camps. I'd like to see which is the majority these days. I hope that even some strict-code proponents can see this makes sense.
I'm sorry Aleksander. While I think this is the best approach as well, I don't have a vote, and I don't have the time/energy to respond to emails where I'm clearly not making any progress, or doing a good job of putting the case forward (for reference, each email usually takes a least an hour, as I want to make sure I've completely understood what the person is saying, and yay dyslexia). Anyway, I'm going away for a week (leaving in 1 hour), and when I get back, I *really* need to focus on other things (life, client work, and things related to `is_literal()`/`literal-string` which I am making some good progress on). Thank you for your comments though... and maybe you would be able to convince the other voters it's worth it? Keep in mind, even my quiz from a few months ago had split the room on this issue: https://quiz.craigfrancis.co.uk/ Craig
  117818
May 28, 2022 16:48 marandall@php.net (Mark Randall)
On 28/05/2022 03:36, Craig Francis wrote:
> In this case, two of my clients are considering the cost of modifying their code (by adding a load of `?? ''` everywhere), and they would rather avoid that (time consuming, and makes their code more complicated).
Alternatively, they may wish to define their own functions that do take null and then run a find / replace or rector rule to convert them over. I would still recommend ?? because it's the most future proof. We have passed two RFCs in recent months which will eliminate default-nulls in 2 of the 3 major variable sources in the engine (the remaining one being arrays), and that too may eventually find itself promoted. If it's a framework, I would suggest adding methods for explicitly getting a string that defaults to '' instead of null.