Handling of null arguments to internal functions

  105845
June 6, 2019 08:54 nikita.ppv@gmail.com (Nikita Popov)
Hi internals,

The https://wiki.php.net/rfc/consistent_type_errors RFC resolved one of the
big differences in argument handling between internal and userland
functions. This change allowed us to add return type information (available
through reflection) to internal functions, something that Gabriel Caruso
has been working on.

During the discussion of that RFC, Andrea raised another important
difference: User functions do not accept null for a typed argument, unless
it is explicitly nullable (through ?Type or a null default value). This is
independent of whether the type is scalar (int and array both reject null)
and also independent of strict_types.

For internal functions on the other hand, in weak typing mode, null will be
accepted to scalar arguments and coerced to the respective type. For
example strlen(null) becomes strlen('') becomes 0, without emitting any
diagnostics.

More precisely, this is a difference between two different argument
handling approaches, arginfo (used by userland, forbids null) and zpp (used
by internal, allows null). As we would like to also use arginfo for
internal functions (as it is available through reflection, while zpp is
not), we need to resolve this discrepancy in some way.

The "correct" thing to do would be to forbid passing null to non-nullable
arguments for internal functions / zpp as well, making behavior fully
consistent. I've created https://github.com/php/php-src/pull/4227 to
prototype this and see how the test impact looks like.

However, I think there will be a lot of fallout from this change. When
running this patch against the Symfony test suite, I get many thousands of
deprecation warnings (though many with the same root cause). In some cases,
it's a matter of us not marking arguments as nullable that likely should
be, e.g. I was able to remove a slew of deprecation warnings by making the
$pattern argument in NumberFormatter/IntlDateFormatter nullable.

This is what I originally wanted to propose here, but after seeing the
impact this has (relative to, say, changing == semantics) it might not be a
good idea.

If we don't want to do that, then we should start accepting null for
scalars for internal functions in arginfo as well. It will resolve the
arginfo/zpp discrepancy, but will leave the divide between
internal/userland functions.

Thoughts?

Regards,
Nikita
  105847
June 6, 2019 10:09 george.banyard@gmail.com ("G. P. B.")
On Thu, 6 Jun 2019 at 10:54, Nikita Popov ppv@gmail.com> wrote:

> Hi internals, > > The https://wiki.php.net/rfc/consistent_type_errors RFC resolved one of > the > big differences in argument handling between internal and userland > functions. This change allowed us to add return type information (available > through reflection) to internal functions, something that Gabriel Caruso > has been working on. > > During the discussion of that RFC, Andrea raised another important > difference: User functions do not accept null for a typed argument, unless > it is explicitly nullable (through ?Type or a null default value). This is > independent of whether the type is scalar (int and array both reject null) > and also independent of strict_types. > > For internal functions on the other hand, in weak typing mode, null will be > accepted to scalar arguments and coerced to the respective type. For > example strlen(null) becomes strlen('') becomes 0, without emitting any > diagnostics. > > More precisely, this is a difference between two different argument > handling approaches, arginfo (used by userland, forbids null) and zpp (used > by internal, allows null). As we would like to also use arginfo for > internal functions (as it is available through reflection, while zpp is > not), we need to resolve this discrepancy in some way. > > The "correct" thing to do would be to forbid passing null to non-nullable > arguments for internal functions / zpp as well, making behavior fully > consistent. I've created https://github.com/php/php-src/pull/4227 to > prototype this and see how the test impact looks like. > > However, I think there will be a lot of fallout from this change. When > running this patch against the Symfony test suite, I get many thousands of > deprecation warnings (though many with the same root cause). In some cases, > it's a matter of us not marking arguments as nullable that likely should > be, e.g. I was able to remove a slew of deprecation warnings by making the > $pattern argument in NumberFormatter/IntlDateFormatter nullable. > > This is what I originally wanted to propose here, but after seeing the > impact this has (relative to, say, changing == semantics) it might not be a > good idea. >
I personnally think it's a good idea. However with the possible impact we could maybe work on this on a longer timescale, as in soft deprecation in PHP 7.4 by informing users that some of the internal functions where a null value isn't sensible (e.g. strlen) will emit deprecation notices for the whole PHP 8 lifecycle and be removed in PHP 9 (whenever that is). But this would take a lot of effort to audit every internal function and debate if a null argument makes "sense". Another way would be to mark every internal function scalar argument explicitly nullable and implement the current behavior into the function, this is even a bigger task and probably not the wisest too, but it would resolve the discrepancy without any BC breaks.
> If we don't want to do that, then we should start accepting null for > scalars for internal functions in arginfo as well. It will resolve the > arginfo/zpp discrepancy, but will leave the divide between > internal/userland functions. > > Thoughts? >
I'm not really keen on the idea that internal functions can behave differently than userland ones, even though this is currently the case.
> Regards, > Nikita >
Best regards George P. Banyard
  105846
June 6, 2019 10:10 come@opensides.be (=?ISO-8859-1?Q?C=F4me?= Chilliet)
I’m not sure if this is exactly the same topic, but one problem I have with how internal functions are handling arguments is how the absence of an optional argument is treated.

I have stumbled across functions documented as functionname($arg1, $arg2 = NULL) which behaves differently when called as functionname('something') and functionname('something', NULL).
A lot of places in the documentation state "if omitted" about an argument.

This is a big problem when several arguments are optional and you just want to provide a value for the last one. You cannot know if giving the default value for the ones in between will affect the behavior.

Shouldn’t the argument parsing system treat absence the same as the default value?
This is what happens for userland functions.

Côme
  105848
June 6, 2019 10:21 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Jun 6, 2019 at 12:10 PM Côme Chilliet <come@opensides.be> wrote:

> I’m not sure if this is exactly the same topic, but one problem I have > with how internal functions are handling arguments is how the absence of an > optional argument is treated. > > I have stumbled across functions documented as functionname($arg1, $arg2 = > NULL) which behaves differently when called as functionname('something') > and functionname('something', NULL). > A lot of places in the documentation state "if omitted" about an argument.. > > This is a big problem when several arguments are optional and you just > want to provide a value for the last one. You cannot know if giving the > default value for the ones in between will affect the behavior. > > Shouldn’t the argument parsing system treat absence the same as the > default value? > This is what happens for userland functions. >
This is a somewhat different but closely related problem. The important thing to understand is that internal functions do not have a native concept of a default value. If you see a default value in the documentation, that's just documentation. On the implementation side, there is basically three ways in which defaults are typically handled: * Assigning a default to a zpp variable: Omitting the argument and passing that default value (but not null!) will behave the same. * Using a ! specifier: Omitting the argument and passing null will behave the same. * Checking the number of arguments: Omitting the argument and specifying it may result in differing behavior. If you see something that is documented as =null, but does not behave the same when null is passed, please do report a bug. The harder question is what to do about cases where the default is documented as "", and that "" value has abnormal behavior (i.e. is specially treated in implementation, the behavior does not fall out "naturally" from the empty string). In that case, I think we should also be accepting null as a value (independent of strict_types). There are probably a lot of functions that do this. Nikita
  105849
June 6, 2019 11:48 arvids.godjuks@gmail.com (Arvids Godjuks)
чт, 6 июн. 2019 г. в 10:55, Nikita Popov ppv@gmail.com>:

> Hi internals, > > The https://wiki.php.net/rfc/consistent_type_errors RFC resolved one of > the > big differences in argument handling between internal and userland > functions. This change allowed us to add return type information (available > through reflection) to internal functions, something that Gabriel Caruso > has been working on. > > During the discussion of that RFC, Andrea raised another important > difference: User functions do not accept null for a typed argument, unless > it is explicitly nullable (through ?Type or a null default value). This is > independent of whether the type is scalar (int and array both reject null) > and also independent of strict_types. > > For internal functions on the other hand, in weak typing mode, null will be > accepted to scalar arguments and coerced to the respective type. For > example strlen(null) becomes strlen('') becomes 0, without emitting any > diagnostics. > > More precisely, this is a difference between two different argument > handling approaches, arginfo (used by userland, forbids null) and zpp (used > by internal, allows null). As we would like to also use arginfo for > internal functions (as it is available through reflection, while zpp is > not), we need to resolve this discrepancy in some way. > > The "correct" thing to do would be to forbid passing null to non-nullable > arguments for internal functions / zpp as well, making behavior fully > consistent. I've created https://github.com/php/php-src/pull/4227 to > prototype this and see how the test impact looks like. > > However, I think there will be a lot of fallout from this change. When > running this patch against the Symfony test suite, I get many thousands of > deprecation warnings (though many with the same root cause). In some cases, > it's a matter of us not marking arguments as nullable that likely should > be, e.g. I was able to remove a slew of deprecation warnings by making the > $pattern argument in NumberFormatter/IntlDateFormatter nullable. > > This is what I originally wanted to propose here, but after seeing the > impact this has (relative to, say, changing == semantics) it might not be a > good idea. > > If we don't want to do that, then we should start accepting null for > scalars for internal functions in arginfo as well. It will resolve the > arginfo/zpp discrepancy, but will leave the divide between > internal/userland functions. > > Thoughts? > > Regards, > Nikita >
Hello Nikita, as a user-land dev and in general looking at where I would like for PHP to end up long-term, I think the first one is a better solution, though quite a bit more painful to do and longer-term to implement due to the amount of work involved. My reasoning is JIT - as things go forward, we probably are going to start to see some things get implemented in PHP instead of the C core (*caught* PDO) and that might lead to some strange situations if the argument handling is going to be inconsistent between these two worlds. But consistency, in general, would be a nice change of pace so you don't have to keep in mind that there are slight differences in behaviour depending on what you call - a built-in function or a userland one. My 0.02$, Thanks. -- Arvīds Godjuks +371 26 851 664 arvids.godjuks@gmail.com Skype: psihius Telegram: @psihius https://t.me/psihius
  105852
June 6, 2019 12:46 rowan.collins@gmail.com (Rowan Collins)
On Thu, 6 Jun 2019 at 12:49, Arvids Godjuks godjuks@gmail.com>
wrote:

> consistency, in general, would be a nice change of pace so you don't have > to keep in mind that there are slight differences in behaviour depending on > what you call - a built-in function or a userland one. >
This is my view as well. Another thing that inconsistency causes problems with is polyfills - if you want to wrap, emulate, or otherwise reimplement an internal function, it can be fiddly to emulate the subtleties of ZPP. (This gets worse with objects, which can do all sorts of wacky things internally that have no user-space equivalent, but that's a topic for another day.) Would it be possible to use a combination of automation (analysis of the C code, or fuzz testing of the functions) and collaboration (a great big list people can work through a section of and report back) to categorise the functions in core? Something like: a) not affected, because handling is consistent with userland anyway b) should explicitly accept nulls c) should explicitly reject nulls We do however have to make a tricky judgement on functions in category c, of how much code is going to break if we make them stricter. Regards, -- Rowan Collins [IMSoP]