Planning an RFC to allow calls to global functions in constantexpressions

  108343
February 1, 2020 23:00 tysonandre775@hotmail.com (tyson andre)
Hi internals,

I have a working implementation for calling global functions in constant expressions at https://github.com/php/php-src/pull/5139 
(see PR description for more details, see tests for how edge cases get resolved)

If there was interest, and no implementation or process blockers I was unaware of, I was planning to set up an RFC with votes along these lines:

1. (Primary) Yes/no for supporting calls to global functions, with or without the below limitation (requires 2/3 majority)
2. (Secondary vote requiring 2/3 majority to allow calls to all global functions) Support calls to any user-defined or internal global functions, not just unambiguously resolved calls to the deterministic, side effect free ones that are guaranteed to be installed in core extensions.

Alternately, have a first vote on allowing a small subset of global functions calls, then if that passes, start a second vote on allowing all global functions.

For example, allow `\count()`, `\strlen()`, `\array_merge()`, and `\in_array()`,
but don't allow functions such as
`\strtolower()` (different in Turkish locale),
\sprintf() (Depends on locale and  `ini_get('precision')`, but so does `(string)EXPR`),
`json_encode()` (The `json` extension can be disabled),
or calls which aren't unambiguously resolved with `\` or `use function`.
I'll assemble the list when writing the RFC.
https://github.com/phan/phan/blob/2.4.8/src/Phan/Plugin/Internal/UseReturnValuePlugin.php#L190
has a list of candidate functions (non-deterministic ones such as rand() will be excluded).

I mentioned this earlier in https://externals.io/message/108238#108238
The use case I have in mind is different from what was mentioned there,
and this is *just* adding supports for function calls by name, not any other expressions (variables, properties, methods, etc)

Motivation for the voting structure: There are many opinions on what a constant *should* be used for, e.g. one opinion is https://externals.io/message/108238#107992


> Off the top of my head, I can think of three types of things that you might call "constants": > > 1. A value that will never change, and you just want to give a name to. For instance, MINUTES_IN_HOUR=60 > 2. A value set at compile-time, to hard-code a particular condition. For instance, DEBUG=false > 3. A value which is set at run-time, but happens not to change during the course of the application, so should not be over-written. > > PHP's constants are designed primarily to be type 1 - they are literal > values, hard-coded in the source. Unlike C, there is no standard > pre-processor for PHP, so type 2 is rarer, but it's possible to > implement by manipulating the source file before deployment, or even > when running the autoloader.
Thanks, - Tyson
  108344
February 1, 2020 23:50 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

> For example, allow `\count()`, `\strlen()`, `\array_merge()`, and `\in_array()`, > but don't allow functions such as > `\strtolower()` (different in Turkish locale),
What happens if a function like strlen() is applied to a non-string argument? Conversion to string is certainly runtime-dependent even for primitive types like floats.
> allow calls to all global functions
Not sure I understand here - when these calls will be happening? And if you need a value that depends on runtime calculation, what's wrong with using define()? -- Stas Malyshev smalyshev@gmail.com
  108345
February 2, 2020 00:27 tysonandre775@hotmail.com (tyson andre)
> What happens if a function like strlen() is applied to a non-string > argument? Conversion to string is certainly runtime-dependent even for > primitive types like floats.
Good point. I thought that string casts were already allowed, but was mistaken. It's possible to cast to string through concatenation right now, so it is a somewhat pre-existing issue. ``` php > const X = (string)2.3; Fatal error: Constant expression contains invalid operations in php shell code on line 1 php > const Y = '' . 2.3; php > var_export(Y); '2.3' ``` Possible solutions: 1. Exclude functions with string parameters, strval(), strpos(), etc. from the limited set of functions in this RFC. Subsequent RFCs can be created to vote on for adding those groups of functions. 2. Always treat parameters in calls in constants like strict_types=1 - It's doable and won't break existing code, but I don't want to add that inconsistency 3. Keep existing behavior in PR Currently, the implementation is intended to respect the strict_types setting of the file containing the constant expression, so strpos, strlen, etc would have that issue. I still need to add a test that the implementation is actually doing that.
> Not sure I understand here - when these calls will be happening? And if > you need a value that depends on runtime calculation, what's wrong with > using define()?
The constant expressions will be evaluated at the same time php currently evaluates constant expressions. For class constants and static variable defaults, they're evaluated once at the time of use For global constants, they're evaluated immediately. If you want to use the define()d constant for a class constant, that is possible right now, but it's an additional constant that's only (ideally) used in one place, and there's the possibility of choosing conflicting names if the code is copied. Another problem with using define() is that opcache cannot optimize global constants, because define() will emit a notice instead of raising an Error if the constant already exists. So the performance is slightly worse. And the code is slightly less clear because of the indirection.
  108346
February 2, 2020 02:23 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

> The constant expressions will be evaluated at the same time php currently evaluates constant > expressions.
But you essentially propose running arbitrary code at that time, which is much bigger deal than evaluating simple constant expressions. While simple functions like strlen() would probably be OK, running arbitrary function means possibility for arbitrary side effects, which makes the whole thing completely unpredictable.
> Another problem with using define() is that opcache cannot optimize global constants,
opcache can't also optimize runtime-evaluated constants, because results and side-effects of an arbitrary function call is not guaranteed to be the same. It could only optimize anything if it were guaranteed that the function is pure, does not depend on anything but its arguments and has no side effects.
> because define() will emit a notice instead of raising an Error if the constant already exists.
You can trivially check for this, and if it ever would be a problem, define() error level can be changed, but I don't think it's a real issue since it was never changed so far.
> So the performance is slightly worse.
If you try to optimize performance this way, you're optimizing performance in a wrong way. No sane code would depend on performance of define(). -- Stas Malyshev smalyshev@gmail.com
  108409
February 6, 2020 11:35 nikita.ppv@gmail.com (Nikita Popov)
On Sun, Feb 2, 2020 at 12:00 AM tyson andre <tysonandre775@hotmail.com>
wrote:

> > Hi internals, > > I have a working implementation for calling global functions in constant > expressions at https://github.com/php/php-src/pull/5139 > (see PR description for more details, see tests for how edge cases get > resolved) > > If there was interest, and no implementation or process blockers I was > unaware of, I was planning to set up an RFC with votes along these lines: > > 1. (Primary) Yes/no for supporting calls to global functions, with or > without the below limitation (requires 2/3 majority) > 2. (Secondary vote requiring 2/3 majority to allow calls to all global > functions) Support calls to any user-defined or internal global functions, > not just unambiguously resolved calls to the deterministic, side effect > free ones that are guaranteed to be installed in core extensions. > > Alternately, have a first vote on allowing a small subset of global > functions calls, then if that passes, start a second vote on allowing all > global functions. > > For example, allow `\count()`, `\strlen()`, `\array_merge()`, and > `\in_array()`, > but don't allow functions such as > `\strtolower()` (different in Turkish locale), > \sprintf() (Depends on locale and `ini_get('precision')`, but so does > `(string)EXPR`), > `json_encode()` (The `json` extension can be disabled), > or calls which aren't unambiguously resolved with `\` or `use function`. > I'll assemble the list when writing the RFC. > > https://github.com/phan/phan/blob/2.4.8/src/Phan/Plugin/Internal/UseReturnValuePlugin.php#L190 > has a list of candidate functions (non-deterministic ones such as rand() > will be excluded). > > I mentioned this earlier in https://externals.io/message/108238#108238 > The use case I have in mind is different from what was mentioned there, > and this is *just* adding supports for function calls by name, not any > other expressions (variables, properties, methods, etc) > > Motivation for the voting structure: There are many opinions on what a > constant *should* be used for, e.g. one opinion is > https://externals.io/message/108238#107992 > > > > Off the top of my head, I can think of three types of things that you > might call "constants": > > > > 1. A value that will never change, and you just want to give a name to. > For instance, MINUTES_IN_HOUR=60 > > 2. A value set at compile-time, to hard-code a particular condition. For > instance, DEBUG=false > > 3. A value which is set at run-time, but happens not to change during > the course of the application, so should not be over-written. > > > > PHP's constants are designed primarily to be type 1 - they are literal > > values, hard-coded in the source. Unlike C, there is no standard > > pre-processor for PHP, so type 2 is rarer, but it's possible to > > implement by manipulating the source file before deployment, or even > > when running the autoloader. >
Hey Tyson, As you've already realized, the main problem here is the behavior for functions that have side-effects or state. Currently we mostly get away with the illusion that it doesn't matter when exactly constexpr initializers are evaluated. Apart from the error location and (admittedly quite a few) other details, you ostensibly can't distinguish whether the expression is going to evaluated on declaration, on first access or on each access. I'm not a big fan of whitelisting functions, especially as this runs into the name resolution issue (you suggest that essentially the use of fully qualified/imported names will be forced -- unlike everywhere else in PHP), and because I've seen this "mark all the functions const/constexpr" race play out one too many times in other languages, which have a much stronger motivation for it than we do. If we want to expand the allowed operations inside constexpr initializers, I think this needs to start by considering precise evaluation semantics in the places where it matters. Those are primarily initializers for function parameters and non-static properties, because both of these have a reasonable expectation of evaluation at each use. I think the best way to approach those two may well be to relax the constexpr restrictions on them entirely (allowing say "public Foo $prop = new Foo()", something people want anyway) and make sure that things are evaluated on each access (unless they can be pre-evaluated without affecting behavior, of course). For the remaining cases (constant, class constant and static property initializers) we can probably get away with the current semantics, which is evaluation on first access, with a very broad definition of what "access" means. Regards, Nikita
  108427
February 9, 2020 00:55 tysonandre775@hotmail.com (tyson andre)
> As you've already realized, the main problem here is the behavior for > functions that have side-effects or state. Currently we mostly get away > with the illusion that it doesn't matter when exactly constexpr > initializers are evaluated. Apart from the error location and (admittedly > quite a few) other details, you ostensibly can't distinguish whether the > expression is going to evaluated on declaration, on first access or on each > access.
Good point - If PHP ends up supporting an option where **any** global function can get called, It'd make sense to evaluate any expression *with a function call* every time for instance properties and parameter defaults (e.g. `public $id = generate_unique_int_id()`). (the optimizer can later optimize any functions where it won't be noticeable). I think that would require updates to my PR for caching, since immutable returned zvals (including false/true) get cached permanently.
> I'm not a big fan of whitelisting functions, especially as this runs into > the name resolution issue (you suggest that essentially the use of fully > qualified/imported names will be forced -- unlike everywhere else in PHP), > and because I've seen this "mark all the functions const/constexpr" race > play out one too many times in other languages, which have a much stronger > motivation for it than we do.
I'd agree with all of those drawbacks of whitelisting functions. This is still the best compromise I can think of for allowing those functions I've mentioned, if it turns out there are too many objections to this making constants too dynamic.
> If we want to expand the allowed operations inside constexpr initializers, > I think this needs to start by considering precise evaluation semantics in > the places where it matters. Those are primarily initializers for function > parameters and non-static properties, because both of these have a > reasonable expectation of evaluation at each use. I think the best way to > approach those two may well be to relax the constexpr restrictions on them > entirely (allowing say "public Foo $prop = new Foo()", something people > want anyway) and make sure that things are evaluated on each access (unless > they can be pre-evaluated without affecting behavior, of course).
For a more general proposal, the variable scope is a potential problem. For example, would the below example create a new class definition every time f() was called? (Creating a brand new scope with no variables, allowing everything except variables (and relying on zend_forbid_dynamic_call to blacklist get_defined_variables()), or always using the global scope might be ways of handling this.) ``` function f($x) { return fn() => new class() { public $var = $x; // or new Foo($x) }; } $factory1 = f(1); $factory2 = f(2); ``` That reminds me: I forgot about functions such as `compact()`, `extract()` and `get_defined_vars()`. (and `func_num_args()`/`func_get_args()`/`func_get_arg()`). It looks like all of them check `zend_forbid_dynamic_call()` to throw, but I need to add tests and possibly flag the call as dynamic.
> For the remaining cases (constant, class constant and static property > initializers) we can probably get away with the current semantics, which is > evaluation on first access, with a very broad definition of what "access" > means.
Agreed - I plan on keeping the existing semantics for those, but I think the semantics of global(non-class) constants are that they are currently always evaluated immediately, based on the tests I've wrote for this RFC. Thanks, - Tyson
  108428
February 9, 2020 02:40 tysonandre775@hotmail.com (tyson andre)
Hi internals,

> I have a working implementation for calling global functions in constant expressions > at https://github.com/php/php-src/pull/5139 > (see PR description for more details, see tests for how edge cases get resolved)
I've written up an early draft of this based on https://wiki.php.net/rfc/calls_in_constant_expressions The GitHub PR hasn't yet been updated to support (and include tests for) the proposed behavior in instance properties and in parameter defaults, but I plan to work on that. The RFC document also includes the proposed whitelist of functions. That document is a rough draft, and will be updated when the implementation gets updated. - Tyson