[RFC] Explicit call-site pass-by-reference (again)

  108699
February 20, 2020 14:13 nikita.ppv@gmail.com (Nikita Popov)
Hi internals,

I'd like to start the discussion on the "explicit call-site
pass-by-reference" RFC again:
https://wiki.php.net/rfc/explicit_send_by_ref

The RFC proposes to allow using a "&" marker at the call-site (in addition
to the declaration-site) when by-reference passing is used.

Relative to the last time this was discussed, there are three main changes
to the proposal:

1. The RFC proposes a mode in which the use of & at the call-site is
required. This depends on the outcome of the "language evolution" RFC. It
uses a declare as a placeholder, but if we go with editions, then the mode
would be enabled in the next edition instead. (If we go with none of the
options, then this part of the RFC becomes void.)

2. The RFC now also tackles a long-standing problem in the handling of
__call/__callStatic and call_user_func. The explicit call-site annotation
allows us to use these features together with by-reference arguments. I
failed to realize that the RFC can nicely solve this problem when
originally proposing it.

3. The RFC now discusses the old "call-time pass-by-reference" feature in
more detail, and how this RFC differs from it. The important difference is
that this RFC requires the marker at both call *and* declaration.
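As background for point 2, here is a minimal sketch in current PHP (no RFC syntax) of why by-reference arguments get lost when they go through call_user_func() or __call(). The names inc() and Proxy are hypothetical, chosen only for illustration. The exact warning text differs between PHP versions, so it is suppressed here rather than quoted.

```php
<?php
// Hypothetical helper with a by-reference parameter.
function inc(int &$n): void
{
    $n++;
}

// call_user_func() receives its extra arguments by value, so inc() only
// ever sees a copy. PHP emits a "must be passed by reference" style
// warning; the @ silences it for this demo only.
$a = 1;
@call_user_func('inc', $a);
// $a is still 1.

// call_user_func_array() forwards references that are stored in the array.
$b = 1;
call_user_func_array('inc', [&$b]);
// $b is now 2.

// __call() receives its arguments as an ordinary array of values, so the
// caller's variable cannot reach the forwarded call by reference.
class Proxy
{
    public function __call(string $name, array $args)
    {
        return call_user_func_array($name, $args);
    }
}

$c = 1;
$proxy = new Proxy();
@$proxy->inc($c);
// $c is still 1.
```

The call_user_func_array() workaround only helps when the caller already holds a reference in the array; neither call_user_func() nor __call() can learn that the original call site intended by-reference passing, which is the gap the call-site annotation is meant to fill.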

Regards,
Nikita
  108700
February 20, 2020 14:47 internals@lists.php.net ("Levi Morrison via internals")
Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
  if ($this->enabled) {
    return new Optional($f($this->data));
  } else {
    return $this;
  }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving `$this->data` to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.
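A runnable sketch of the hazard described above, in current PHP. The Optional class only mirrors the shape of the example (an enabled flag and a data property); the mapSafe() method and the by-reference $double callback are hypothetical additions that show the local-copy defence.

```php
<?php
// Hypothetical Optional class following the shape of the example above.
class Optional
{
    public $data;
    private $enabled;

    public function __construct($data, bool $enabled = true)
    {
        $this->data = $data;
        $this->enabled = $enabled;
    }

    // As in the email: $this->data is handed straight to $f.
    public function map(callable $f): Optional
    {
        if ($this->enabled) {
            return new Optional($f($this->data));
        }
        return $this;
    }

    // Defensive variant: $f only ever sees a local copy.
    public function mapSafe(callable $f): Optional
    {
        if ($this->enabled) {
            $data = $this->data;
            return new Optional($f($data));
        }
        return $this;
    }
}

// A callback that (perhaps accidentally) takes its parameter by reference.
$double = function (array &$xs): array {
    foreach ($xs as $i => $x) {
        $xs[$i] = $x * 2;
    }
    return $xs;
};

$a = new Optional([1, 2, 3]);
$mapped = $a->map($double);
// $a->data is now [2, 4, 6]: the original was changed through the reference.

$b = new Optional([1, 2, 3]);
$mappedSafe = $b->mapSafe($double);
// $b->data is still [1, 2, 3]; only the local copy was modified.
```

Nothing at the `$f($this->data)` call site reveals which of the two behaviours applies; it depends entirely on how the callback was declared.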
  108705
February 20, 2020 23:04 larry@garfieldtech.com ("Larry Garfield")
On Thu, Feb 20, 2020, at 8:47 AM, Levi Morrison via internals wrote:
> The intent is to return a new optional or an empty one, but if you
> pass a closure that accepts something by reference you can change the
> original, which is not intended at all. For people who defend against
> it, it requires saving `$this->data` to a local variable, then passing
> in the local. Then if the user does a call-by-reference it will affect
> the local, not the object's data.
If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days consists of objects, the problem exists regardless of whether it's passed by value or by reference, so adding steps to make pass-by-reference harder doesn't seem to help much.

--Larry Garfield
  108706
February 20, 2020 23:26 cschneid@cschneid.com (Christian Schneider)
On 21.02.2020 at 00:04, Larry Garfield <larry@garfieldtech.com> wrote:
> If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days is objects, and thus the problem exists regardless of whether it's by value or by reference passing, adding steps to make pass-by-reference harder doesn't seem to help much.
+1

The whole discussion about being worried about 'malicious' libraries altering your precious scalar values misses the fact that PHP is not a pure language: there are many ways a function can have side effects, and Larry pointed out one obvious one.

Speaking of language editions: trying to solve one obscure case (and one which is easily enough detectable by static analysis) by introducing such a big BC break could render a whole edition ineligible for a software project. So beware: features bundled in one (hypothetical) edition had better not break too many different things at the same time.

If you don't trust your library code, then you're in deep trouble anyway.

- Chris
  108711
February 21, 2020 08:37 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 21, 2020 at 12:05 AM Larry Garfield <larry@garfieldtech.com>
wrote:

> If $this->data is itself an object, then you have a concern for data
> manipulation (spooky action at a distance) even if it's passed by value.
> Given how much data these days is objects, and thus the problem exists
> regardless of whether it's by value or by reference passing, adding steps
> to make pass-by-reference harder doesn't seem to help much.
If you will allow me some exaggeration, what you're basically saying here is that all the const / readonly / immutability features in (nearly) all programming languages are useless, because they (nearly) always allow for interior mutability in one way or another. "const" in JavaScript doesn't allow you to rebind the object, but you can still modify the object. Same with "final" in Java. Similar things hold in C/C++/Rust when it comes to const pointers/references to structs that contain non-const pointers/references. And of course, the "readonly" RFC for PHP that is currently under discussion has the same characteristics.

What I'm trying to say here: none of these features guarantee recursive immutability, but that doesn't render them useless in the least. In fact, the outermost layer is where immutability is the most important, because there's a lot of difference between

$i = 0;
var_dump($i); // int(0)
foo($i);
var_dump($i); // array(7) { ... }
// WTF just happened???

and

$o = new Foo();
var_dump($o); // object(Foo) #42 { xxx }
foo($o);
var_dump($o); // object(Foo) #42 { yyy }
// Did something change in there? Doesn't really matter for this code!

One of the big differences is that by-reference passing can change the *type* of the variable, while by-object passing cannot. It cannot even change object identity.

On a closing note: I don't think this RFC makes passing by reference "harder" in any meaningful sense. Yes, you do need to write one extra character. In exchange, every time you read code you will immediately see that by-reference passing is used: here be dragons.

Regards,
Nikita
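A runnable version of the two snippets above, assuming a trivial Foo class and two hypothetical functions standing in for foo(): one taking its argument by reference, one taking an object by value. It shows the difference described: the by-reference call changes the variable's type, while the by-value call can change the object's contents but not which object the caller holds.

```php
<?php
// Hypothetical class standing in for Foo in the email.
class Foo
{
    public $value = 'xxx';
}

// By-reference parameter: can replace the caller's variable entirely,
// including its type.
function replaceWithArray(&$x): void
{
    $x = array_fill(0, 7, 0);
}

// By-value object parameter: can mutate the object's state, but cannot
// rebind the caller's variable to a different object or type.
function mutateObject(Foo $o): void
{
    $o->value = 'yyy';
    $o = new Foo(); // only rebinds the local $o
}

$i = 0;
replaceWithArray($i);
// $i is now an array of seven zeros: its type changed from int to array.

$o = new Foo();
$idBefore = spl_object_id($o);
mutateObject($o);
// $o->value is now 'yyy', but $o is still the same Foo instance.
$idAfter = spl_object_id($o);
```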
  108710
February 21, 2020 02:29 matthewmatthew@gmail.com (Matthew Brown)
This proposal is great, but most PHP static analysis tools already do a
reasonable job of understanding by-reference assignment and detecting bugs
there (an exception is closure use by-reference checks, which is a
static-analysis no-man's land).

No static analysis tools catch your specific use-case, though.

On Thu, 20 Feb 2020 at 09:48, Levi Morrison via internals <
internals@lists.php.net> wrote:

> The intent is to return a new optional or an empty one, but if you
> pass a closure that accepts something by reference you can change the
> original, which is not intended at all. For people who defend against
> it, it requires saving `$this->data` to a local variable, then passing
> in the local. Then if the user does a call-by-reference it will affect
> the local, not the object's data.
  108703
February 20, 2020 16:47 marandall@php.net (Mark Randall)
On 20/02/2020 14:13, Nikita Popov wrote:
> The RFC proposes to allow using a "&" marker at the call-site (in addition
> to the declaration-site) when by-reference passing is used.

It's a solid +1 from me.

I do think this is another place where an "official" upgrade / migration tool would be rather well-received: an easy mechanism to scan a file / directory for standard extension functions with known reference args and rewrite them appropriately.

--
Mark Randall
marandall@php.net
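As a rough idea of the scanning half of such a tool, the sketch below uses PHP's Reflection API (ReflectionFunction, ReflectionParameter::isPassedByReference() and ReflectionParameter::canBePassedByValue()) to list which internal functions take by-reference parameters. It is only an assumption about how such a tool might start; rewriting call sites and covering internal methods are left out. Labelling a by-reference parameter that also reports canBePassedByValue() as "prefer-ref" is an interpretation of that method, not something the RFC specifies.

```php
<?php
// Map each internal function to the positions of its by-reference
// parameters, tagging those that also accept plain values.
function byRefParametersOfInternalFunctions(): array
{
    $result = [];
    foreach (get_defined_functions()['internal'] as $name) {
        $fn = new ReflectionFunction($name);
        foreach ($fn->getParameters() as $param) {
            if (!$param->isPassedByReference()) {
                continue; // by-value parameter: no call-site marker needed
            }
            $result[$name][$param->getPosition()] = $param->canBePassedByValue()
                ? 'prefer-ref' // accepts both variables and values
                : 'by-ref';
        }
    }
    return $result;
}

$byRef = byRefParametersOfInternalFunctions();
```

A rewriting step could then look up each call's function name in `$byRef` and add the marker at the listed argument positions.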
  108707
February 20, 2020 23:50 mike@newclarity.net (Mike Schinkel)
> On Feb 20, 2020, at 9:13 AM, Nikita Popov <nikita.ppv@gmail.com> wrote:
>
> I'd like to start the discussion on the "explicit call-site
> pass-by-reference" RFC again:
> https://wiki.php.net/rfc/explicit_send_by_ref

> On Feb 20, 2020, at 6:04 PM, Larry Garfield <larry@garfieldtech.com> wrote:
>
> If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days is objects, and thus the problem exists regardless of whether it's by value or by reference passing, adding steps to make pass-by-reference harder doesn't seem to help much.

> On Feb 20, 2020, at 6:26 PM, Christian Schneider <cschneid@cschneid.com> wrote:
>
> +1
>
> The whole discussion about being worried about 'malicious' libraries altering your precious scalar values misses the fact that PHP is not a pure language, there are many ways a function can have side-effects, Larry pointing out one obvious one.
> Speaking of language editions: Trying to solve one obscure case (and one which is easily enough detectable by statical analysis) by introducing such a big BC break could render a whole edition ineligible for a software project. So beware, features bundled in one (hypothetical) edition better not break too many different things at the same time.
>
> If you don't trust your library code then you're in deep trouble anyway.
A huge +1 to Nikita's RFC.

A noted -1 to both Larry's and Christian's objections.

Why? Because perfect should not be the enemy of a significant improvement for specific use-cases, unless it can be shown that making the improvement precludes future perfection.

-Mike
  108715
February 21, 2020 22:20 rowan.collins@gmail.com (Rowan Tommins)
On 20 February 2020 14:13:58 GMT+00:00, Nikita Popov <nikita.ppv@gmail.com> wrote:
>Hi internals,
>
>I'd like to start the discussion on the "explicit call-site
>pass-by-reference" RFC again:
>https://wiki.php.net/rfc/explicit_send_by_ref

Hi Nikita,

Thanks for putting the case for this so clearly.

My instinctive reaction is still one of frustration that the pain of removing call-site ampersands was in vain, and I will now be asked to put most of them back in.

It's also relevant that users already find where & should and should not be used very confusing. There is a potential "PR" cost of this change that should be weighed against the advantages.

I'm also not very keen on internal functions being able to do things that can't be replicated on userland, and this RFC adds two: additional behaviour for existing "prefer-ref" arguments, and new "prefer-value" arguments.

My current opinion is that I'd rather wait for the details of out and inout parameters to be worked out, and reap higher gains for the same cost. For instance, if preg_match could mark $matches as "out", I'd be more happy to run in a mode where I needed to add a call-site keyword.

Regards,
--
Rowan Tommins
[IMSoP]
  108718
February 22, 2020 06:50 mike@newclarity.net (Mike Schinkel)
> On Feb 21, 2020, at 5:20 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> On 20 February 2020 14:13:58 GMT+00:00, Nikita Popov <nikita.ppv@gmail.com> wrote:
>> Hi internals,
>>
>> I'd like to start the discussion on the "explicit call-site
>> pass-by-reference" RFC again:
>> https://wiki.php.net/rfc/explicit_send_by_ref
>
> My instinctive reaction is still one of frustration that the pain of removing call-site ampersands was in vain, and I will now be asked to put most of them back in.

That is a great example of what is known as a "sunken cost." In summary: "A sunken cost is a cost paid in the past that is no longer relevant to decisions about the future."
> It's also relevant that users already find where & should and should not be used very confusing.
One of the reasons it is confusing is because developers are currently required to use the ampersand in one place and not the other. Making it always used removes said confusion, as there would no longer be a reason to remember when and when not to use the ampersand.
> There is a potential "PR" cost of this change that should be weighed against the advantages.
What is the concern with saying "We fixed something that in hindsight we've since determined was a problem"? And when has the PHP community primarily worried about PR cost anyway, except when Hack started eating PHP's lunch in terms of performance?
> I'm also not very keen on internal functions being able to do things that can't be replicated on userland, and this RFC adds two: additional behaviour for existing "prefer-ref" arguments, and new "prefer-value" arguments.
I used to have the same preference. And then I realized that languages that allow everything and do not withhold low-level functionality allow userland to create DSL-like extensions that can result in highly fragile and obtuse architectures. Just look at Ruby.

And yes, that is an abstraction, but so is a generic concern about adding internal functions that cannot be leveraged in userland. So what specific problems would having these enhancements cause for the language?
> My current opinion is that I'd rather wait for the details of out and inout parameters to be worked out, and reap higher gains for the same cost. For instance, if preg_match could mark $matches as "out", I'd be more happy to run in a mode where I needed to add a call-site keyword.
This sounds like preferring perfect in the (potentially distant) future vs. much better today. If this feature does not block some abstract vision for a perfect future and is something that can be delivered in the short term to solve real-world problems today, why stand in its way?

-Mike
  108719
February 22, 2020 10:56 rowan.collins@gmail.com (Rowan Tommins)
On 22 February 2020 06:50:46 GMT+00:00, Mike Schinkel <mike@newclarity.net> wrote:
>> On Feb 21, 2020, at 5:20 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>> My instinctive reaction is still one of frustration that the pain of
>> removing call-site ampersands was in vain, and I will now be asked to
>> put most of them back in.
>
> That is a great example of what is known as a "sunken cost."
Perhaps, yes. I freely admit it's an emotional reaction rather than a rational one.
>One of the reasons it is confusing is because developers are currently >required to use the ampersand in one place and not the other. Making >it always used removes said confusion as they would no longer be a >reason to have to remember when and when not to use the ampersand >anymore.
Maybe. I think a larger part of it is that references themselves are a slightly confusing concept, and the fact that & looks like an operator of its own (and is often documented that way) but is really an annotation on other operators/commands. That is, the & in $foo = &$bar and return &$bar doesn't modify $bar, it modifies = and return, respectively.

Making the rules more logical and symmetrical would perhaps be more helpful to new users than it is to established users, particularly those who've known multiple versions of the language already.
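To illustrate the point that & annotates the operation rather than the variable, here is a small sketch in current PHP. Note that in actual PHP syntax a by-reference return is declared on the function (`function &name()`) with a plain `return $x;` in the body, plus `=&` at the call site; `return &$bar` above is shorthand, not valid code. The Counter class is hypothetical.

```php
<?php
// "$foo = &$bar" binds $foo to the same storage as $bar; $bar's value is
// not touched. The & belongs to the assignment, not to $bar.
$bar = 1;
$foo = &$bar;
$foo = 2;
// $bar is now 2, because both names refer to the same value.

// A by-reference return: the & on the declaration changes what "return"
// hands back; the returned property itself is not modified by it.
class Counter
{
    private $count = 0;

    public function &countRef()
    {
        return $this->count;
    }

    public function get(): int
    {
        return $this->count;
    }
}

$counter = new Counter();
$ref = &$counter->countRef(); // reference-assign from a by-ref return
$ref++;
// $counter->get() now returns 1.
```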
> There is a potential "PR" cost of this change that should be weighed >against the advantages. > >To say "We fixed something that in hindsight we've since determined was >a problem." How is this a concern?
The concern is that the costs will be much more visible to users than the benefits, and they will resent the core developers pushing that requirement onto them, rather than thanking them for their hard work.

As I said, that's not an absolute reason not to do it; it's a cost to be weighed.
>> I'm also not very keen on internal functions being able to do things >that can't be replicated on userland, and this RFC adds two: additional >behaviour for existing "prefer-ref" arguments, and new "prefer-value" >arguments > >So what specific problems would having these enhancement cause for the >language?
There are two problems I have with internal-only features in general: the inability to polyfill and extend, and the requirement for a separate mental model.

As an example of the first, the RFC mentions using call_user_func with a call-site annotation to forward the parameter by reference. The reason for allowing that also applies to a user-defined wrapper like call_with_current_user or call_with_swapped_parameters, but there's no syntax for those to be marked "prefer-val".

As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour. To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.
>> My current opinion is that I'd rather wait for the details of out and >inout parameters to be worked out, and reap higher gains for the same >cost. For instance, if preg_match could mark $matches as "out", I'd be >more happy to run in a mode where I needed to add a call-site keyword. > >This sounds like preferring perfect in the (potentially distant) future >vs. much better today.
No, it's preferring to hold out for a little bit more value to weigh against my evaluation of the cost.

This is, when followed through to its conclusion of mandatory marking, a disruptive change to every piece of code, so we need to decide if the disruption is worth it. It's also the second change in the same place, and we should be sure that we've got it right this time, and won't require a third change in the near future. For instance, if out parameters were added, would the same line of code end up going from optional &, to forbidden &, to mandatory &, to mandatory "out"?

I'm not strongly against the idea, but the advantages just don't feel quite strong enough, so if I had a vote, I'd currently be inclined to vote no.

Regards,
--
Rowan Tommins
[IMSoP]
  108724
February 23, 2020 07:03 mike@newclarity.net (Mike Schinkel)
> On Feb 23, 2020, at 2:00 AM, Mike Schinkel <mike@newclarity.net> wrote:
> On Feb 22, 2020, at 5:56 AM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>> One of the reasons it is confusing is because developers are currently
>> required to use the ampersand in one place and not the other. Making
>> it always used removes said confusion as they would no longer be a
>> reason to have to remember when and when not to use the ampersand
>> anymore.
>
> Maybe. I think a larger part of it is that references themselves are a slightly confusing concept, and the fact that & looks like an operator of its own (and is often documented that way) but is really an annotation on other operators/commands. That is, the & in $foo = &$bar and return &$bar doesn't modify $bar, it modifies = and return, respectively.
>
> Making the rules more logical and symmetrical would perhaps be more helpful to new users than it is to established users, particularly those who've known multiple versions of the language already.
You call out the use of the ampersand being viewed as an operator acting on a variable as problematic, but that is already baked into current PHP, is not going to change any time soon if ever, and is orthogonal to this RFC. So whether or not people find the ampersand operator confusing is irrelevant to the debate posed by Nikita's RFC over whether we should make the use of the ampersand for passing by reference more consistent.
>> There is a potential "PR" cost of this change that should be weighed >> against the advantages. >> >> To say "We fixed something that in hindsight we've since determined was >> a problem." How is this a concern? > > The concern is that the costs will be much more visible to users than the benefits, and they will resent the core developers pushing that requirement onto them, rather than thanking then for their hard work. > > As I said, that's not an absolute reason not to do it, it's a cost to be weighed.
Nikita's RFC proposes that the ampersand would be optional at the calling site, so is it really a concern that developers will resent something that is optional?

Yes, Nikita mentioned that a future "edition" might make it a requirement, but even then it would still be optional (developers could choose not to use the new edition), and I think the resentment will come more from the concept of forcing an "edition" on developers than from any specific feature. Note that I plan to post soon about how I think we can alleviate that. So we can debate the PR "cost" of requiring ampersands at the call site when the requiring RFC is on the table.

As a side note, I remember thinking "WTF?!?" when support for the ampersand at the calling site was removed. It is possible your analysis of PR cost discounts the potentially large number of people who will think adding it back is a good thing.
>>> I'm also not very keen on internal functions being able to do things >> that can't be replicated on userland, and this RFC adds two: additional >> behaviour for existing "prefer-ref" arguments, and new "prefer-value" >> arguments >> >> So what specific problems would having these enhancement cause for the >> language? > > There are two problems I have with internal-only features in general: the inability to polyfill and extend, and the requirement for a separate mental model. > > As an example of the first, the RFC mentions using call_user_func with a call-site annotation to forward the parameter by reference. The reason for allowing that also applies to a user-defined wrapper like call_with_current_user or call_with_swapped_parameters, but there's no syntax for those to be marked "prefer-val".
Let's analyze. In this case there does not appear to be a need for "prefer-val." And Nikita's RFC adds functionality we currently do not have: the ability to pass by reference to call_user_func(). That is a win over the status quo, as it gains a feature that was previously internal-only:

$f ) { $args[$i]++; } }

function call_with_current_user(Callable $callable, int &...$args ) {
  array_unshift(&$args, current_user());
  $temp_args = $args;
  $result = call_user_func_array( $callable, &$temp_args );
  foreach( $temp_args as $i => $t ) {
    if ( func_is_byref_arg( $i, $args ) ) {
      $args[$i] = $temp_args[$i];
    }
  }
  return $result;
}

$foo = 0;
$bar = 0;
$baz = 0;
call_with_current_user( 'foobar', &$foo, $bar, $baz );
echo $foo; // prints 1
echo $bar; // prints 0
echo $baz; // prints 0

In my example, func_is_byref_arg($pos[,$variadic_arg]):bool accepts one parameter if you are checking for by-ref positionally, and two if you are introspecting a variadic parameter.

So I argue we should fill in the holes in an RFC that introduces a feature to help developers write more robust code, instead of declining it for imperfections in its first draft.
> As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour. > > To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.
I am unclear how the optional ampersand at the call site will change the behavior. As I understand the RFC, the behavior will still be driven by the ampersand at the declaration site. The presence or absence of an ampersand at a call site will merely be decoration that allows developers to better convey their intent. Can you please give an example of how, under this RFC, a call site with the ampersand would behave differently from the same call site without it?
>>> My current opinion is that I'd rather wait for the details of out and >> inout parameters to be worked out, and reap higher gains for the same >> cost. For instance, if preg_match could mark $matches as "out", I'd be >> more happy to run in a mode where I needed to add a call-site keyword. >> >> This sounds like preferring perfect in the (potentially distant) future >> vs. much better today. > > > No, it's preferring to hold out for a little bit more value to weigh against my evaluation of the cost. > > This is, when followed through to its conclusion of mandatory marking, a disruptive change to every piece of code, so we need to decide if the disruption is worth it.
That is disingenuous. The RFC does not require mandatory use, period.

The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.

Further, your cost analysis does not appear to consider the cost of the status quo and this RFC's ability to reduce that cost. Using Nikita's RFC example, there is a potential real-world cost to getting the following wrong in a userland project:

$ret = array_slice($array, 0, 3);
$ret = array_splice($array, 0, 3);

With Nikita's RFC, developers could choose to start using the ampersand at the calling site for these types of functions. Let's consider that I write the following:

array_slice(&$array, 0, 3);
array_splice(&$array, 0, 3);

With this RFC (I assume) an error could be generated on array_slice(&$array, 0, 3), saying that I cannot pass the array by reference. Today we don't get that. This alone could reduce errors that I have seen in source code and admittedly have committed myself.

Said succinctly, there is a (IMO significant) cost to doing nothing that your analysis appears to ignore.
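For reference, here is what the two calls do in current PHP, without any call-site marker. The variable names are illustrative only.

```php
<?php
// array_slice() takes $array by value and returns the extracted part;
// array_splice() takes $array by reference and removes that part in place.
// Nothing at either call site shows the difference.
$array = [1, 2, 3, 4, 5];
$ret = array_slice($array, 0, 3);
// $ret is [1, 2, 3]; $array is unchanged: [1, 2, 3, 4, 5].

$array2 = [1, 2, 3, 4, 5];
$ret2 = array_splice($array2, 0, 3);
// $ret2 is also [1, 2, 3], but $array2 is now [4, 5].
```

Both calls return the same value, which is exactly why confusing one for the other can go unnoticed until the input array is used again.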
> It's also the second change in the same place, and we should be sure that we've got it right this time, and won't require a third change in the near future.
I don't particularly see a problem with requiring a third change in the future. Hindsight is a wonderful clarifier. And I believe elsewhere you have been debating me over the need for incremental change. Caveat emptor.
> For instance, if out parameters were added, would the same line of code end up going from optional &, to forbidden &, to mandatory &, to mandatory "out"?
My view is that we should actually hash those concerns out and move forward, rather than state them in the abstract and let the fact that legitimate concerns *might* exist derail an improvement to the language. Since there are not infinite possibilities, let's just address your specific concerns here.

My straw man proposal is that if we add an `out` keyword, then a developer could use either `out` or `&` but not both. Then in a future "edition" of PHP it would be possible to disallow `&` if enough people agree that that is better. Or we could leave it as either/or. Allowing an ampersand at a call site today does not block a potential future `out` keyword AFAICT.

For me, I don't care which it is as long as there is a calling-site notation that allows a developer to write code indicating intent, and other developers to read code and see that intent. The status quo, waiting for some future potential that may not arrive for years, does not get us there in the near term, but Nikita's RFC would.
> I'm not strongly against the idea, but the advantages just don't feel quite strong enough, so if I had a vote, I'd currently be inclined to vote no.
Heh. My vote would be yes. And since neither of us has a vote, I guess it would be applicable to say they cancel each other out. Or not. :-D

-Mike
  108729
February 23, 2020 14:23 rowan.collins@gmail.com (Rowan Tommins)
Hi Mike,

First, I'd just like to reiterate that I absolutely see benefits in this 
proposal, and am definitely not campaigning for it to be abandoned as a 
bad idea. Like with any proposal, we have to weigh those benefits 
against the costs, and my current personal opinion is that the scales 
come down *very slightly* on the cost side.

I will also just say that you have made some valid points about 
different ways people might perceive this change, and my fears on that 
score may be overblown.


On 23/02/2020 07:03, Mike Schinkel wrote:

> The RFC does not require mandatory use, period. > > The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.
The RFC states very clearly that the full benefit of the change will only be realised by making the markers mandatory in some way, and includes specific discussion of how that might be introduced. Put simply, tools (and even humans) get most from knowing that a particular line of code *won't* pass anything by reference, and optional markers can't guarantee that.

I am analysing the proposal on that basis, just as I would analyse a proposed deprecation on the basis that the deprecated feature will one day be removed. If we analyse it on the basis of it *never* becoming mandatory, we have to adjust our analysis of both costs *and* benefits.

Regarding prefer-ref and prefer-val:
> function call_with_current_user(Callable $callable, int &$foo, int $bar ) { > return call_user_func( $callable, current_user(), &$foo, $bar ); > }
If you define the function this way, all callers are *required* to pass the parameter by reference. That immediately means that this is a fatal error:

call_with_current_user('foobar', 42, 42);

Internal functions have the magical ability to accept both literal values and reference variables, whereas userland functions have to choose one or the other.
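A minimal sketch of the asymmetry described above, assuming PHP 8.x. `bump()` is a hypothetical userland function; `array_multisort()` is used on the assumption that its parameters are among the internal prefer-ref ones. The call goes through a variable function name so that the failure shows up as a catchable runtime `Error` rather than a compile-time fatal error.

```php
<?php
// Hypothetical userland function: it has to pick by-value or by-reference.
function bump(int &$n): void
{
    $n++;
}

// Passing a literal to a by-reference parameter fails. Calling through a
// variable defers the check to runtime, where it throws an Error.
$userlandAcceptsLiteral = true;
$fn = 'bump';
try {
    $fn(42);
} catch (Error $e) {
    $userlandAcceptsLiteral = false;
}

// A prefer-ref internal function accepts a variable, which it sorts in place...
$data = [3, 1, 2];
array_multisort($data);

// ...and also a literal, which it sorts as a throwaway temporary.
$literalAccepted = array_multisort([3, 1, 2]);
```

The userland function cannot offer the "accept either" behaviour that `array_multisort()` gets for free.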
>> As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour.
>>
>> To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.
> I am unclear how the optional ampersand at the call site will change the behavior.
I was referring to this line in the RFC:
> If the argument is a prefer-ref argument of an internal function, then
> adding the & annotation will pass it by reference, while not adding
> it will pass it by value. Outside this mode, the passing behavior
> would instead be determined by the VM kind of the argument operand.
That means that for any function implemented internally as "prefer-ref", the user can now *choose* whether their variable will be overwritten by the function. I don't know exactly which functions this would affect, because as far as I know, the manual doesn't have a standard way to annotate "prefer-ref". Which is kind of my point: it's magic behaviour which sits outside most people's understanding of the language.
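Although the manual may not flag prefer-ref parameters, the Reflection API appears to expose the distinction: `isPassedByReference()` is true for both by-ref and prefer-ref parameters, while `canBePassedByValue()` is true only for the latter. A sketch assuming PHP 8.x; the helper `passingModes()` is hypothetical, and the expected modes for `preg_match()` and `array_multisort()` are my reading of their current signatures:

```php
<?php
// Hypothetical helper: classify each parameter of a function as
// by-val, by-ref or prefer-ref using only the Reflection API.
function passingModes(string $function): array
{
    $modes = [];
    foreach ((new ReflectionFunction($function))->getParameters() as $param) {
        if (!$param->isPassedByReference()) {
            $mode = 'by-val';
        } elseif ($param->canBePassedByValue()) {
            // Passed by reference when possible, but literals are accepted too.
            $mode = 'prefer-ref';
        } else {
            $mode = 'by-ref';
        }
        $modes[$param->getName()] = $mode;
    }
    return $modes;
}

print_r(passingModes('preg_match'));      // $matches should be strictly by-ref
print_r(passingModes('array_multisort')); // both parameters should be prefer-ref
```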
> I don't particularly see a problem with requiring a third change in
> the future. Hindsight is a wonderful clarifier. And I believe
> elsewhere you have been debating me over the need for incremental
> change. Caveat emptor.
The distinction I would make is between incremental change, and contradictory change. If we later introduce out parameters in a way that's compatible with call-site &, that would indeed be incremental change; the effort spent adding & would move code closer to the final state. If we end up introducing call-site "out", the effort spent adding & will simply be compounded with the effort spent adding "out".

Predicting the future is a mug's game, but it's at least worth exploring some possible futures, and how decisions now might help or hinder them.

Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
  108747
February 24, 2020 22:32 mike@newclarity.net (Mike Schinkel)
> On Feb 23, 2020, at 9:23 AM, Rowan Tommins collins@gmail.com> wrote:
>
> Hi Mike,
>
> First, I'd just like to reiterate that I absolutely see benefits in this proposal, and am definitely not campaigning for it to be abandoned as a bad idea. Like with any proposal, we have to weigh those benefits against the costs, and my current personal opinion is that the scales come down *very slightly* on the cost side.
>
> I will also just say that you have made some valid points about different ways people might perceive this change, and my fears on that score may be overblown.
>
>
> On 23/02/2020 07:03, Mike Schinkel wrote:
>
>> The RFC does not require mandatory use, period.
>>
>> The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.
>
> The RFC states very clearly that the full benefit of the change will only be realised by making the markers mandatory in some way, and includes specific discussion of how that might be introduced. Put simply, tools (and even humans) get most from knowing that a particular line of code *won't* pass anything by reference, and optional markers can't guarantee that.
Fair point.
>
> I am analysing the proposal on that basis, just as I would analyse a proposed deprecation on the basis that the deprecated feature will one day be removed.
>
> If we analyse it on the basis of it *never* becoming mandatory, we have to adjust our analysis of both costs *and* benefits.
However, if you consider editions, it may not ever need to become mandatory and yet those who want it could still benefit.
> Regarding prefer-ref and prefer-val:
>
>> function call_with_current_user(Callable $callable, int &$foo, int $bar ) {
>>   return call_user_func( $callable, current_user(), &$foo, $bar );
>> }
>
>
> If you define the function this way, all callers are *required* to pass the parameter by reference. That immediately means that this is a fatal error:
>
> call_with_current_user('foobar', 42, 42);
>
> Internal functions have the magical ability to accept both literal values and reference variables, whereas userland functions have to choose one or the other.
Uh, yeah, I guess. But I would ask the question: why would you want to do that? The reason we don't make the ampersand a requirement is a legacy concern. But if you are writing a new function there is no legacy concern, so it would seem a developer would *want* to force the ampersand. Or is your point just that there is a list of possible options, and you want the ability to use _all_ options from that list regardless of whether a specific option has a valid use-case?
>>> As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour.
>>>
>>> To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.
>> I am unclear how the optional ampersand at the call site will change the behavior.
>
>
> I was referring to this line in the RFC:
>
>> If the argument is a prefer-ref argument of an internal function, then adding the & annotation will pass it by reference, while not adding it will pass it by value. Outside this mode, the passing behavior would instead be determined by the VM kind of the argument operand.
>
>
> That means that for any function implemented internally as "prefer-ref", the user can now *choose* whether their variable will be overwritten by the function. I don't know exactly which functions this would affect, because as far as I know, the manual doesn't have a standard way to annotate "prefer-ref". Which is kind of my point: it's magic behaviour which sits outside most people's understanding of the language.
Sounds like the solution then is to update the documentation?
> I don't particularly see a problem with requiring a third change in the future. Hindsight is a wonderful clarifier. And I believe elsewhere you have been debating me over the need for incremental change. Caveat emptor.
>
>
> The distinction I would make is between incremental change, and contradictory change. If we later introduce out parameters in a way that's compatible with call-site &, that would indeed be incremental change; the effort spent adding & would move code closer to the final state. If we end up introducing call-site "out", the effort spent adding & will simply be compounded with the effort spent adding "out".
> Predicting the future is a mug's game, but it's at least worth exploring some possible futures, and how decisions now might help or hinder them.
I do agree that contradictory changes are problematic. However, in this case Nikita has weighed in and said "out" is unlikely to happen. So doesn't that remove the concern about conflicts with out parameters?

-Mike
  108727
February 23, 2020 09:47 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 21, 2020 at 11:20 PM Rowan Tommins collins@gmail.com>
wrote:

> On 20 February 2020 14:13:58 GMT+00:00, Nikita Popov ppv@gmail.com>
> wrote:
> >Hi internals,
> >
> >I'd like to start the discussion on the "explicit call-site
> >pass-by-reference" RFC again:
> >https://wiki.php.net/rfc/explicit_send_by_ref
> >
>
> Hi Nikita,
>
> Thanks for putting the case for this so clearly. My instinctive reaction
> is still one of frustration that the pain of removing call-site ampersands
> was in vain, and I will now be asked to put most of them back in. It's also
> relevant that users already find where & should and should not be used very
> confusing. There is a potential "PR" cost of this change that should be
> weighed against the advantages.
>
> I'm also not very keen on internal functions being able to do things that
> can't be replicated on userland, and this RFC adds two: additional
> behaviour for existing "prefer-ref" arguments, and new "prefer-value"
> arguments.
>
I should say that this is a non-essential part of the RFC. I noticed that this RFC provides a way to solve this problem, but if we don't think the problem is worth solving, then we don't have to solve it.

The prefer-ref/prefer-val thing is indeed a bit peculiar. It's an artifact of the current way of implicit by-reference passing, where the decision of whether to pass by-value or by-reference has to be made based on an "educated guess" at the call-site. That leaves us with always-val, always-ref, prefer-val and prefer-ref as the possible passing modes. In the explicit by-ref passing regime, the latter two consolidate, and we have by-val, by-ref and "either" as the options, which is a lot more obvious.

But again, I can't say I'm fully convinced myself that this is really a problem we need to solve. I don't really care about call_user_func() at all (it is entirely obsoleted by $fn()), and now that I think about it, __call() isn't really the right primitive to expose anyway.

If you will allow me a little digression... Instead of having __call(), what we really should have is __get_method(). For a simple forwarding proxy, the implementation would look something like this:

public function __get_method(string $name): ?Closure {
  if (method_exists($this->proxy, $name)) {
    return Closure::fromCallable([$this->proxy, $name]);
  }
  return null;
}

This solves multiple problems at once. First, it preserves the signature of the method we're proxying to. This is better than the solution in this RFC, because it preserves both by-ref argument passing and by-ref returns, and can validate them properly (i.e. passing a non-referenceable value to a by-ref parameter will be diagnosed). Second, it makes is_callable() work precisely, because we no longer have to assume that with __call() any method is callable. Third, it makes Reflection work on the proxied method.
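`__get_method()` is only a proposal, so it can't be run today. As a rough approximation, assuming PHP 8.0+ (constructor promotion, `mixed`), the sketch below contrasts a `__call()` forwarder with a plain method, the hypothetical `getMethod()`, that returns `Closure::fromCallable()` the way the proposed hook would. `Counter`, `CallProxy`, `ClosureProxy` and `getMethod()` are all made-up names.

```php
<?php
// Hypothetical target class with a by-reference parameter.
class Counter
{
    public function increment(int &$n): void
    {
        $n++;
    }
}

// Today's approach: __call() receives its arguments as array values,
// so the caller's variable is never bound by reference.
class CallProxy
{
    public function __construct(private Counter $target) {}

    public function __call(string $name, array $args): mixed
    {
        // Unpacking binds the by-ref parameter to $args[0], a local copy.
        return $this->target->$name(...$args);
    }
}

// Approximation of __get_method(): hand out a Closure that keeps the
// target method's signature, including by-reference parameters.
class ClosureProxy
{
    public function __construct(private Counter $target) {}

    public function getMethod(string $name): ?Closure
    {
        if (!method_exists($this->target, $name)) {
            return null;
        }
        return Closure::fromCallable([$this->target, $name]);
    }
}

$a = 1;
(new CallProxy(new Counter()))->increment($a); // $a is left unchanged

$b = 1;
$inc = (new ClosureProxy(new Counter()))->getMethod('increment');
$inc($b); // $b is incremented through the preserved reference

// Reflection sees the real signature of the proxied method.
$byRef = (new ReflectionFunction($inc))->getParameters()[0]->isPassedByReference();
```

The closure route keeps by-reference passing and Reflection working, which the `__call()` route cannot do.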
It's possible to recover normal __call() semantics from this approach by writing something like this:

public function __get_method(string $name): Closure {
  return function(...$args) use($name) {
    // Normal __call() implementation in here.
  };
}

> My current opinion is that I'd rather wait for the details of out and inout
> parameters to be worked out, and reap higher gains for the same cost. For
> instance, if preg_match could mark $matches as "out", I'd be more happy to
> run in a mode where I needed to add a call-site keyword.
>
I believe we talked about this in some detail in the previous discussion on this topic. My basic stance on in/out is that it's *probably* not worth the complexity, unless it is part of an effort to eliminate references from PHP entirely (which would be hugely beneficial). Unfortunately I don't really see a clear pathway towards that.

"out" parameters can remove one use-case of references, and I can see how that would work both in terms of semantics and implementation. The case of "inout" parameters is much more problematic. While these can nominally work without references, I don't see how they could do so efficiently (we would have to "move out" the value from the original location to avoid COW). Similarly, I don't have any answer to how &__get() and &offsetGet() would work without references.

Regards,
Nikita
  108730
February 23, 2020 14:41 rowan.collins@gmail.com (Rowan Tommins)
On 23/02/2020 09:47, Nikita Popov wrote:
> The prefer-ref/prefer-val thing is indeed a bit peculiar. It's an
> artifact of the current way of implicit by-reference passing, where
> the decision of whether to pass by-value or by-reference has to be
> made based on an "educated guess" at the call-site. That leaves us
> with always-val, always-ref, prefer-val and prefer-ref as the possible
> passing modes. In the explicit by-ref passing regime, the latter two
> consolidate, and we have by-val, by-ref and "either" as the options,
> which is a lot more obvious.
Thanks, that's a good summary of how this all relates to the RFC. If there is a use case for such a mode, perhaps we need a way to annotate userland functions as "either", so that they too can take advantage of the call-site annotation. In a sense, they'd be opting in to the pre-5.4 behaviour, for that particular parameter.
> Instead of having __call(), what we really should have is __get_method().
That is a really interesting idea. The lack of function signature is currently a big turn-off for using __call, because it means manually recreating a lot of the unpacking and type checking that the language would normally do for you.
> I believe we talked about this in some detail in the previous
> discussion on this topic. My basic stance on in/out is that it's
> *probably* not worth the complexity, unless it is part of an effort to
> eliminate references from PHP entirely (which would be hugely
> beneficial). Unfortunately I don't really see a clear pathway towards
> that. "out" parameters can remove one use-case of references, and I
> can see how that would work both in terms of semantics and
> implementation. The case of "inout" parameters is much more
> problematic. While these can nominally work without references, I
> don't see how they could do so efficiently (we would have to "move
> out" the value from the original location to avoid COW). Similarly, I
> don't have any answer to how &__get() and &offsetGet() would work
> without references.
That's fair enough. I guess the reason I've fixated on them is that I'd really like "out" parameters, independent of what else happens with references, in order to get the clear signal of "variable is initialised here" on a call like "preg_match($foo, $bar, out $matches)". That would make out parameters more attractive to build APIs around, e.g. when you want multiple strongly typed outputs from one call.

Even if we can't eliminate references entirely, perhaps there's value in reducing the use cases where they're necessary? A bit like how property accessor syntax wouldn't allow us to remove __get and __set, but would mean fewer cases where people needed to deal with them.

Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
  109019
March 14, 2020 20:08 tysonandre775@hotmail.com (tyson andre)
Hi internals,

One idea I had that was related to this (but not in the scope of this RFC)
would be adding a way to force the interpreter to treat an argument (variable, array field, property access, etc.) as being passed by value,
and refuse to modify it by reference (e.g. emit a notice and create a separate reference, or throw an Error).

I.e. instead of emitting the SEND_VAR_EX opcode, emit a brand new opcode, SEND_VAR_BY_VALUE, that would do that when the method signature is unknown at compile time.

- Currently, PHP emits a notice and creates a temporary reference for non-variables, such as when passing the result of a function returning a non-reference to a reference parameter.
- I assume SEND_VAR_BY_VALUE is equivalent to the new opcode needed if a subsequent RFC made call-site pass-by-reference mandatory in a given file.
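The current behaviour in the first bullet can be observed with a plain userland function standing in for the proposed `identity` form. A sketch assuming PHP 8.x; `bump()` and `identity()` are hypothetical helpers, and the notice is captured with `set_error_handler()` so it can be inspected:

```php
<?php
// Hypothetical helpers for illustration.
function bump(int &$n): void
{
    $n++;
}

function identity(mixed $value): mixed
{
    return $value;
}

// Collect notices instead of printing them.
$notices = [];
set_error_handler(function (int $errno, string $errstr) use (&$notices): bool {
    $notices[] = $errstr;
    return true; // handled; skip PHP's default output
});

$x = 1;
bump($x);           // passed by reference: $x becomes 2
bump(identity($x)); // a temporary is passed instead: notice, $x stays 2

restore_error_handler();
```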

Possible syntaxes (only within argument lists):

```
$a->someMethod(*$foo);
$a->someMethod(&&$foo);
$a->someMethod(\$foo);
$a->someMethod(identity($foo));  // add a new keyword such as identity or value
$a->someMethod(=$foo);
```

Context: From the current thread's RFC https://wiki.php.net/rfc/explicit_send_by_ref

> In fact, our inability to determine at compile-time whether a certain argument is passed by-value or by-reference is one of the most significant obstacles in our ability to analyze and optimize compiled code.
I suggested this because it would be useful for optimizing hot code paths (identified by profiling, e.g. with phpspy, and by checking opcache debug output), especially frequently called functions and methods. That said, I'd prefer a declare directive (or edition) that makes the call-site & mandatory for by-reference passing over this suggestion.

- Tyson