Re: [PHP-DEV] A little syntactic sugar on array_* function calls?

  114581
May 25, 2021 10:51 someniatko@gmail.com (someniatko)
> The proposed syntax
>
> $array |> array_map($fn1, ?) |> array_filter(?, $fn2)
>
> When I compare to:
>
> $array->map($fn1)->filter($fn2)
>
> 1. It's longer. Much longer.
> 2. It still requires knowing where the array goes. That's legacy which we could sidestep with the arrow notation.
> 3. Admittedly, the pipe is much more powerful.
While argument No. 2 is completely valid, the first one is less so. If you remove the whitespace around the `|>`, and if you alias these functions to `map` and `filter` respectively (or if, for instance, some future RFC moves them into a special `PHP\Array` namespace, which will probably never happen, but one is allowed to dream), it could look like this:

```php
$array|>map($fn1, ?)|>filter(?, $fn2);
$array->map($fn1)->filter($fn2);
```

A bit longer (due to point 2), but not by much, actually.

Best wishes,
someniatko
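The comparison above can be approximated in PHP 8 without any new syntax. The following is only a sketch: `pipe()` is a hypothetical helper, not a PHP built-in, and the closures stand in for the `?` placeholders of the proposed syntax. `$fn1` and `$fn2` are example callbacks chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: feeds $value through each stage, left to right.
 * It stands in for the proposed `|>` operator.
 */
function pipe(mixed $value, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        $value = $stage($value);
    }
    return $value;
}

$fn1 = fn(int $x): int => $x * 2;   // example mapper (assumption)
$fn2 = fn(int $x): bool => $x > 4;  // example predicate (assumption)

$array = [1, 2, 3, 4];

// Each closure plays the role of a `?` placeholder: it states where the array goes.
$result = pipe(
    $array,
    fn(array $a): array => array_map($fn1, $a),     // callback first, array second
    fn(array $a): array => array_filter($a, $fn2),  // array first, callback second
);

var_dump($result);
```

The two closures make point 2 visible: the caller still has to know that `array_map()` and `array_filter()` take the array in different positions.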
  114602
May 25, 2021 20:58 the.liquid.metal@gmail.com (Hendra Gunawan)
Hello.

> ```php
> $array|>map($fn1, ?)|>filter(?, $fn2);
> $array->map($fn1)->filter($fn2);
> ```

Whitespace removal is not a solution to code length problems. It may introduce a new problem: "|" looks very similar to a lowercase "L" and an uppercase "i".

The pipe version costs just 3 extra characters per call (", ?" or "?, "). For most people, this is not a problem at all. People tend to write one statement per line rather than several. I myself usually write no more than 3 statements per line, and only if they stay under 120 characters.

The real problem is that there is no consistency in the "haystack vs needle" position. There have been RFCs to fix this (along with the naming convention problem), but none of them succeeded.

> The pipe operator feels like a poor solution while "->" would do
> exactly what people want.

Not so poor if we

* use "~>" as the pipe operator rather than "|>"
* redesign the API under a proper namespace and strictly place the "haystack" as the first function argument.

Regards,
Hendra Gunawan.
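To make the "haystack first" idea concrete, here is a minimal sketch. The namespace `Sketch\Arr` and the functions `map()` and `filter()` are hypothetical names invented for illustration; nothing like them ships with PHP.

```php
<?php
declare(strict_types=1);

namespace Sketch\Arr; // hypothetical namespace, not part of PHP

/** Haystack first, callback second (array_map() uses the opposite order). */
function map(array $haystack, callable $fn): array
{
    return \array_map($fn, $haystack);
}

/** Same argument order as array_filter(), wrapped for a consistent API. */
function filter(array $haystack, callable $fn): array
{
    return \array_filter($haystack, $fn);
}

// With a consistent position, the array always goes into the first slot.
$result = filter(
    map([1, 2, 3, 4], fn(int $x): int => $x * $x),
    fn(int $x): bool => $x % 2 === 0,
);

\var_dump($result);
```

With such an API, a pipe operator that always fills the first argument would need no placeholder at all.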
  114603
May 25, 2021 22:28 txigreman@hotmail.com (Iván Arias)
Hi all,

It sounds like scalar objects by Nikita:
https://github.com/nikic/scalar_objects

Regards,
Iván Arias.

  114606
May 26, 2021 05:44 office.hamzaahmad@gmail.com (Hamza Ahmad)
Hello,
I read about this extension some time ago but didn't know whether it had
been made public. If Nikita is reading this, I would ask him to consider
proposing a modified version of this extension to be bundled with PHP.
In simple terms, he could hide the function that registers a class
serving as the prototype of a built-in type, and also provide scalar
methods for string, int, float and array. Adding the handler-registering
functionality to core should be a separate RFC.
When I first read this thread, I had the following suggestions:
1. All array functions should be moved to the array's scalar object, and
the "array_" prefix should be removed.
2. To maintain backward compatibility, all array_* functions will
become aliases of the scalar array's methods.
3. ArrayObject will remain for compatibility purposes, and its
methods will also be added to the scalar array.
Thus, array() or [] will return a scalar array object, and the following
syntax would become valid:
`[1,2,3,4,5,6,7,8,9] -> reverse();`
`array(1 => 'a', 2 => 'b') -> flip();`
If this happens, users will automatically be prevented from passing
non-array values to array functions, and the error will be caught
earlier.
Regards
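The method-call style suggested above can be approximated today with a userland wrapper. This is only a sketch: the class `Arr` and its method names are hypothetical, and a wrapper object is not the same thing as giving the native array type methods.

```php
<?php
declare(strict_types=1);

/** Hypothetical immutable wrapper; every method returns a new instance. */
final class Arr
{
    public function __construct(private array $items) {}

    public function reverse(): self { return new self(array_reverse($this->items)); }
    public function flip(): self { return new self(array_flip($this->items)); }
    public function map(callable $fn): self { return new self(array_map($fn, $this->items)); }
    public function filter(callable $fn): self { return new self(array_filter($this->items, $fn)); }
    public function toArray(): array { return $this->items; }
}

$reversed = (new Arr([1, 2, 3]))->reverse()->toArray();
$flipped  = (new Arr([1 => 'a', 2 => 'b']))->flip()->toArray();

// A non-array value is rejected at construction, before any method runs.
try {
    new Arr('not an array');
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

The early `TypeError` corresponds to the last point of the message: the mistake surfaces where the value enters the API rather than inside a later array function call.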

  114609
May 26, 2021 10:29 hossein.baghayi@gmail.com (Hossein Baghayi)
On Wed, 26 May 2021 at 10:14, Hamza Ahmad <office.hamzaahmad@gmail.com> wrote:

> Thus, array() or [] will return a scalar array object,

Hello,

This doesn't seem trivial to me. Should an array object be passed by value or by reference? So far, arrays are passed by value by default, and objects are by-ref internally. If we are to have an array object, will it be an exception? Or should we change its behaviour going forward?

To be clear, array() returns an array right now, which is passed by value by default. If array() were changed to return an object, should it still be passed by value (making it an exceptional object), or should we change its behaviour going forward and pass it by reference?

Either way, it may have some quirks associated with it.
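The difference in question is easy to see with current PHP. A minimal sketch, using the built-in `ArrayObject` as the object counterpart; the function names are made up for this example:

```php
<?php
declare(strict_types=1);

function appendToArray(array $list): void
{
    $list[] = 'added';   // modifies the callee's copy only
}

function appendToObject(ArrayObject $list): void
{
    $list[] = 'added';   // modifies the one object both sides refer to
}

$plain  = ['a'];
$object = new ArrayObject(['a']);

appendToArray($plain);
appendToObject($object);

var_dump(count($plain));   // the caller's array is unchanged
var_dump(count($object));  // the caller's object has grown
```

Code written against today's arrays relies on the first behaviour, which is why turning `[]` into an ordinary object would change what existing functions do to their callers.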
  114613
May 26, 2021 11:03 mike@newclarity.net (Mike Schinkel)
> On May 25, 2021, at 6:28 PM, Iván Arias <txigreman@hotmail.com> wrote:
>
> Hi all,
>
> It sounds like scalar objects by Nikita: https://github.com/nikic/scalar_objects

Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:

"Due to technical limitations, it is not possible to create mutable APIs for primitive types. Modifying $self within the methods is not possible (or rather, will have no effect, as you'd just be changing a copy)."

Does that mean that the scope of Nikita's proof-of-concept could not modify $self, or that it is simply not possible to modify $self given limitations inherent in PHP?

Further, does that only apply to scalars, or might arrays be different?

-Mike
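The "changing a copy" effect from the README can be imitated in plain PHP, without the extension. In this sketch the class `ArrayHandler` and both method names are hypothetical; the only point is that `$self` arrives by value, as the README describes.

```php
<?php
declare(strict_types=1);

/** Hypothetical handler class: $self is received by value. */
final class ArrayHandler
{
    // Attempted mutable API: the change is lost because $self is a copy.
    public static function push(array $self, mixed $value): void
    {
        $self[] = $value;
    }

    // Immutable alternative: return the modified copy instead.
    public static function withPushed(array $self, mixed $value): array
    {
        $self[] = $value;
        return $self;
    }
}

$list = [1, 2];

ArrayHandler::push($list, 3);
$afterPush = $list;                                      // still [1, 2]

$afterWithPushed = ArrayHandler::withPushed($list, 3);   // new array, $list untouched
```

Whether the engine could do better than this for native types is exactly what the question above asks; the sketch only shows the behaviour the README warns about.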
  114615
May 26, 2021 11:51 ocramius@gmail.com (Marco Pivetta)
On Wed, May 26, 2021 at 1:03 PM Mike Schinkel <mike@newclarity.net> wrote:

>> On May 25, 2021, at 6:28 PM, Iván Arias <txigreman@hotmail.com> wrote:
>>
>> Hi all,
>>
>> It sounds like scalar objects by Nikita:
>> https://github.com/nikic/scalar_objects
>
> Yes, but Nikita wrote this note about technical limitations at the bottom
> of the repo README:
>
> Due to technical limitations, it is not possible to create mutable APIs
> for primitive types. Modifying $self within the methods is not possible (or
> rather, will have no effect, as you'd just be changing a copy).

Sounds like a big **advantage**?

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/
  114616
May 26, 2021 11:54 office.hamzaahmad@gmail.com (Hamza Ahmad)
> should array object be passed by value or by reference?
> If we are to have array object, will it be exceptional?

By value. Array is a data type, and we are talking about making it behave like an object. In JavaScript, Arrays, Strings, and Numbers are objects; they have their respective properties and methods. Still, when they are passed to a function or a method call, they are passed by value, not by reference.

We should pass arrays by value because it lets a function or a method modify the array without changing the original. If we made it a regular object, it would be a BC break: whenever a callable modified an array, it would modify a variable outside its scope.

I am talking about attaching some methods and properties to the array type (or, more broadly, to the string, int and float types). In your terms, it would be an exceptional array object that is passed by value, which would have "key_first", "key_last", "keys", "values", "length", "type" (if PHP later introduces typed arrays), and "is_list" as properties, and "reverse", "flip", "map", "filter", "walk" and so on as methods.

Such methods would be performed on a value, not a variable. In other words:
`[1,2,3,4,5,6,7,8,9,0]->print();`
will work. This way, it does not matter whether one modifies a variable or a value.

According to the implementation of Nikita's extension, there will be functions attached to each method of a type. To remove this limitation, what if array were an internal array object?

To illustrate my earlier point about making array_* functions aliases of array->* methods, take mysqli as an example. It offers both ways of interaction, object-oriented and procedural. So, why not do this with arrays?

Regards
  114629
May 26, 2021 23:44 the.liquid.metal@gmail.com (Hendra Gunawan)
Hello.

> Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:
>
> Due to technical limitations, it is not possible to create mutable APIs for
> primitive types. Modifying $self within the methods is not possible (or
> rather, will have no effect, as you'd just be changing a copy).

If it is solved, this will be a great accomplishment for PHP. But I think scalar objects are not going anywhere in the near future. If you are not convinced, please take a look at https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

This makes me feel more strongly than before that the pipe operator is the way to address the object-style-for-scalars issue. I hope that someone will take the initiative to fix the old, inconsistent and confusing API. Pipe operator + new API is a better solution than no solution at all.
  114654
May 28, 2021 01:11 mike@newclarity.net (Mike Schinkel)
> On May 26, 2021, at 7:44 PM, Hendra Gunawan <the.liquid.metal@gmail.com> wrote:
>
> Hello.
>
>> Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:
>>
>> Due to technical limitations, it is not possible to create mutable APIs for
>> primitive types. Modifying $self within the methods is not possible (or
>> rather, will have no effect, as you'd just be changing a copy).
>
> If it is solved, this is a great accomplishment for PHP. But I think
> scalar object is not going anywhere in the near future. If you are not
> convinced, please take a look
> https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

Nikita's comment actually causes me more questions, not fewer.

Nikita says "We need to know that $a[$b][$c] is an array in order to determine that the call should be performed by-reference. However, we already need to convert $a, $a[$b] and $a[$b][$c] into references before we know about that."

How then are we able to do the following?:

    $a[$b][$c][] = 1;

How also can we do this:

    byref($a[$b][$c]);
    function byref(&$x) {
        $x[]= 2;
    }

See https://3v4l.org/aPvTD

I assume that in both my examples $a[$b][$c] would be considered an "lvalue"[1] and can be a target of assignment triggered by either the assignment operator or calling the function and passing to a by-ref parameter.

[1] https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values

So is there a reason that -> on an array could not trigger the same? Is Nikita saying that the performance of those calls performed by-reference would not matter because they are always being assigned, at least in the former case, but to do so with array expressions would be problematic? (Ignoring there is no code in the wild that currently uses the -> operator, or does that matter?)

I ask honestly to understand, and not as a rhetorical question.

Additionally, if the case of updating an array variable is not a problem but updating an array expression is a problem, then why not just limit the -> operator to only work on expressions for immutable methods and require variables for mutable methods? I would think it should be easy enough to throw an error for those specific "methods" that would be mutable, such as shift() and unshift(), if $a[$b][$c]->shift('foo') were called?

Or maybe just limit the -> operator to array variables entirely, and not support any array expressions, for consistency. There is already precedent in PHP for operators that work on variables and not on expressions: ++, --, and &.

IF we can get a thumbs up from Nikita that one of these would actually be possible, then I think the next step should be to write up a list of proposed array methods that would be implemented to support the -> operator with arrays and put them in an RFC, and to flesh out any edge cases.

-Mike
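For reference, the two snippets from this message combine into one runnable script. The keys `'outer'` and `'inner'` are arbitrary values picked for this sketch; `byref()` is the function from the message.

```php
<?php
declare(strict_types=1);

function byref(&$x): void
{
    $x[] = 2;
}

$a = [];
$b = 'outer';   // arbitrary key (assumption)
$c = 'inner';   // arbitrary key (assumption)

// Write context: the missing levels are created on the fly (auto-vivification).
$a[$b][$c][] = 1;

// By-reference argument: the nested element is handed to the function for writing.
byref($a[$b][$c]);

var_dump($a);
```

Both statements modify the nested array in place, which is the behaviour the questions above take as their starting point.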
  114657
May 28, 2021 14:31 nikita.ppv@gmail.com (Nikita Popov)
On Fri, May 28, 2021 at 3:11 AM Mike Schinkel <mike@newclarity.net> wrote:

>> If it is solved, this is a great accomplishment for PHP. But I think
>> scalar object is not going anywhere in the near future. If you are not
>> convinced, please take a look
>> https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.
>
> Nikita's comment actually causes me more questions, not fewer.
>
> Nikita says "We need to know that $a[$b][$c] is an array in order to
> determine that the call should be performed by-reference. However, we
> already need to convert $a, $a[$b] and $a[$b][$c] into references before we
> know about that."
>
> How then are we able to do the following?:
>
> $a[$b][$c][] = 1;
In this case, we're clearly performing a write operation on the array. If you want to know the technical details, the compiler will convert this into a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is for "write", which will perform all the necessary special handling, such as copy-on-write separation and auto-vivification.

> How also can we do this:
>
> byref($a[$b][$c]);
> function byref(&$x) {
>     $x[]= 2;
> }
>
> See https://3v4l.org/aPvTD
This is a more complex case. In this case the compiler doesn't know in advance whether the argument is passed by value or by reference. What happens here is:

1. INIT_FCALL determines that we're calling byref().
2. CHECK_FUNC_ARG for the first arg determines that this argument is passed by-reference for this function.
3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R or a FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.

> I assume that in both my examples $a[$b][$c] would be considered an
> "lvalue"[1] and can be a target of assignment triggered by either the
> assignment operator or calling the function and passing to a by-ref
> parameter.
>
> [1] https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values
>
> So is there a reason that -> on an array could not trigger the same? Is
> Nikita saying that the performance of those calls performed by-reference
> would not matter because they are always being assigned, at least in the
> former case, but to do so with array expressions would be problematic?
> (Ignoring there is no code in the wild that currently uses the -> operator,
> or does that matter?)
Note that the byref($a[$b][$c]) case only works because we know which function is being called at the time the argument is passed. If you have $a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the argument by-value or by-reference. But we can only know that once we have already evaluated $a[$b][$c] and found out that it is indeed an array.

The only way around this is to *always* perform a for-write fetch of $a[$b][$c], even though we don't know that the end result is going to be an array. However, doing so would pessimize the performance of code operating on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch $some_huge_shared_array for write, we'll be required to perform a full duplication of the array in preparation for a possible future write. If it turns out that $some_huge_shared_array[0] is actually an object, or that $some_huge_shared_array[0] is an array and the performed operation is by-value, then we have performed this copy unnecessarily.

I don't believe this is acceptable.

> I ask honestly to understand, and not as a rhetorical question.
>
> Additionally, if the case of updating an array variable is not a problem
> but updating an array expression is a problem then why not just limit the
> -> operator to only work on expressions for immutable methods and require
> variables for mutable methods? I would think it should be easy enough to
> throw an error for those specific "methods" that would be mutable, such as
> shift() and unshift() if $a[$b][$c]->shift('foo') were called?
There are externalities associated even with the simple $x->foo() case, though they are less severe. They primarily involve reduced ability to analyze code in opcache.

In either case, this limitation does not seem reasonable to me from a language design perspective. If $a->push($b) works, then $a[$k]->push($b) can reasonably be expected to work as well.

> Or maybe just completely limit using the -> operator on array variables.
> Don't work on any array expressions for consistency. There is already
> precedent in PHP for operators that work on variables and not on
> expressions: ++, --, and &.
>
> IF we can get a thumbs up from Nikita that one of these would actually be
> possible then I think the next step should be to write up a list of
> proposed array methods that would be implemented to support the -> operator
> with arrays and put them in an RFC, and to flesh out any edge cases.

The only correct way to resolve this issue is to not support mutable operations.

I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used. array_pop() and array_splice() are the only sensible mutable array methods that come to mind, and I daresay we can do without them.

Regards,
Nikita
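The duplication cost described in this message can be observed from userland. The following is a rough sketch, not a benchmark: `takesByRef()` is a hypothetical no-op function, the array size is arbitrary, and the byte counts printed will vary with PHP version and build.

```php
<?php
declare(strict_types=1);

function takesByRef(&$x): void {}   // accepts its argument by reference, does nothing

$original = range(1, 200000);
$shared   = $original;              // no copy yet: both names share one array

$before = memory_get_usage();
$value  = $shared[0];               // read fetch: the array stays shared
$afterRead = memory_get_usage();

takesByRef($shared[0]);             // write fetch: $shared is separated (copied) first
$afterWrite = memory_get_usage();

$readCost  = $afterRead - $before;
$writeCost = $afterWrite - $afterRead;

printf("read fetch:  %d bytes\nwrite fetch: %d bytes\n", $readCost, $writeCost);
```

The function never writes to its argument, yet the array is still copied, because the fetch has to prepare for a write that might happen. That is the cost an always-for-write fetch would add to every `$array[0]->foo()` call.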
  114659
May 28, 2021 19:00 marandall@php.net (Mark Randall)
On 28/05/2021 15:31, Nikita Popov wrote:
> This is a more complex case. In this case the compiler doesn't know in
> advance whether the argument is passed by value or by reference. What
> happens here is:

I'm trying to wrap my head around this, but if a function arg can handle this, does something internal to the engine preclude fetching in write context, after already fetching in read context, other than performance?

So can the initial fetch be performed with FETCH_DIM_R, handling the object case plus any other scalars, and then, if and only if the value is an array and the operation is one that would traditionally be by-ref, can the previous lookup be repeated with FETCH_DIM_W?

Mark Randall
  114660
May 28, 2021 21:01 mike@newclarity.net (Mike Schinkel)
Hi Nikita,

Thank you for taking the time to explain in detail.  

One more question below.

-Mike

> On May 28, 2021, at 10:31 AM, Nikita Popov <nikita.ppv@gmail.com> wrote:
>
> [...]
>
> The only correct way to resolve this issue is to not support mutable operations.
I don't think I agree that this is the only correct way, but I respect your position of authority on the matter.
> I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used.
Why do you say array_shift() and array_unshift() should never be used? When I wrote the above questions the use-case I was thinking about most was $a->unshift($value), as I use array_unshift() more than most of the other array functions.

Do you mean that these, if applied as "methods" to an array, should not be used mutably (meaning in-place is bad, but returning an array value that has been shifted would be okay), or do you have some other reason you believe that shifting an array is bad? Note the reason I have used them in the past is when I need to pass an array to a function written by someone else that expects the array to be ordered.

Also, what about very large arrays? I assume (which could be a bad assumption) that PHP internally can be more efficient about how it handles array_unshift() instead of just duplicating the large array so as to add an element at the beginning?

-Mike
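The distinction drawn above, between shifting in place and returning a shifted array, can be sketched as follows. `withPrepended()` and `withoutFirst()` are hypothetical names; they are plain wrappers around `array_merge()` and `array_slice()`, and the sketch makes no claim about how the engine implements `array_unshift()` internally.

```php
<?php
declare(strict_types=1);

/** Hypothetical immutable counterpart to array_unshift(): returns a new array. */
function withPrepended(array $list, mixed ...$values): array
{
    // array_merge() renumbers integer keys and keeps string keys.
    return array_merge($values, $list);
}

/** Hypothetical immutable counterpart to array_shift(): returns the remaining elements. */
function withoutFirst(array $list): array
{
    return array_slice($list, 1);
}

$queue = ['b', 'c'];

$longer  = withPrepended($queue, 'a');   // new array, $queue untouched
$shorter = withoutFirst($queue);         // new array, $queue untouched

// The in-place built-in, for comparison: it changes the variable it is given.
$mutated = $queue;
array_unshift($mutated, 'a');
```

Both styles produce the same list here; the difference is only whether the caller's variable changes.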
  114674
May 31, 2021 09:52 nikita.ppv@gmail.com (Nikita Popov)
On Fri, May 28, 2021 at 11:01 PM Mike Schinkel <mike@newclarity.net> wrote:

> Hi Nikita, > > Thank you for taking the time to explain in detail. > > One more question below. > > -Mike > > On May 28, 2021, at 10:31 AM, Nikita Popov ppv@gmail.com> wrote: > > On Fri, May 28, 2021 at 3:11 AM Mike Schinkel <mike@newclarity.net> wrote: > >> > On May 26, 2021, at 7:44 PM, Hendra Gunawan metal@gmail.com> >> wrote: >> > >> > Hello. >> > >> >> >> >> Yes, but Nikita wrote this note about technical limitations at the >> bottom of the repo README: >> >> >> >> Due to technical limitations, it is not possible to create mutable >> APIs for >> >> primitive types. Modifying $self within the methods is not possible (or >> >> rather, will have no effect, as you'd just be changing a copy). >> >> >> > >> > If it is solved, this is a great accomplishment for PHP. But I think >> > scalar object is not going anywhere in the near future. If you are not >> > convinced, please take a look >> > >> https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.. >> >> Nikita's comment actually causes me more questions, not fewer. >> >> Nikita says "We need to know that $a[$b][$c is an array in order to >> determine that the call should be performed by-reference. However, we >> already need to convert $a, $a[$b] and $a[$b][$c] into references before we >> know about that." >> >> How then are we able to do the following?: >> >> $a[$b][$c][] = 1; >> > > In this case, we're clearly performing a write operation on the array. If > you want to know the technical details, the compiler will convert this into > a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is > for "write", which will perform all the necessary special handling, such as > copy-on-write separation and auto-vivification. > > How also can we do this: >> >> byref($a[$b][$c]); >> function byref(&$x) { >> $x[]= 2; >> } >> >> See https://3v4l.org/aPvTD <https://3v4l.org/aPvTD> >> > > This is a more complex case. 
In this case the compiler doesn't know in > advance whether the argument is passed by value or by reference. What > happens here is: > > 1. INIT_FCALL determines that we're calling byref(). > 2. CHECK_FUNC_ARG for the first arg determines that this argument is > passed by-reference for this function. > 3. FETCH_DIM_FUNC_ARG on the array will be perform either an FETCH_DIM_R > or to FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined. > > I assume that in both my examples $a[$b][$c] would be considered an >> "lvalue"[1] and can be a target of assignment triggered by either the >> assignment operator or calling the function and passing to a by-ref >> parameter. >> >> [1] >> https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values >> >> So is there a reason that -> on an array could not trigger the same? Is >> Nikita saying that the performance of those calls performed by-reference >> would not matter because they are always being assigned, at least in the >> former case, but to do so with array expressions would be problematic? >> (Ignoring there is no code in the wild that currently uses the -> operator, >> or does that matter?) >> > > Note that the byref($a[$b][$c]) case only works because we know which > function is being called at the time the argument is passed. If you have > $a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or > by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the > argument by-value or by-reference. But we can only know that once we have > already evaluated $a[$b][$c] and found out that it is indeed an array. > > The only way around this is to *always* perform a for-write fetch of > $a[$b][$c], even though we don't know that the end result is going to be an > array. However, doing so would pessimize the performance of code operating > on objects. Consider $some_huge_shared_array[0]->foo(). 
If we fetch > $some_huge_shared_array for write, we'll be required to perform a full > duplication of the array in preparation for a possible future write. If it > turns out that $some_huge_shared_array[0] is actually an object, or that > $some_huge_shared_array[0] is an array and the performed operation is > by-value, then we have performed this copy unnecessarily. > > I don't believe this is acceptable. > > I ask honestly to understand, and not as a rhetorical question. >> >> Additionally, if the case of updating an array variable is not a problem >> but updating an array expression is a problem then why not just limit the >> -> operator to only work on expressions for immutable methods and require >> variables for mutable methods? I would think it should be easy enough to >> throw an error for those specific "methods" that would be mutable, such as >> shift() and unshift() if $a[$b][$c]->shift('foo') were called? >> > > There are externalities associated even with the simple $x->foo() case, > though they are less severe. They primarily involve reduced ability to > analyze code in opcache. > > > In either case, this limitation does not seem reasonable to me from a > language design perspective. If $a->push($b) works, then $a[$k]->push($b) > can reasonably be expected to work as well. > > >> Or maybe just completely limit using the -> operator on array variables. >> Don't work on any array expressions for consistency. There is already >> precedent in PHP for operators that work on variables and not on >> expressions: ++, --, and &. >> >> IF we can get a thumbs up from Nikita that one of these would actually be >> possible then I think the next step should be to write up a list of >> proposed array methods that would be implemented to support the -> operator >> with arrays and put them in an RFC, and to flesh out any edge cases. >> > > The only correct way to resolve this issue is to not support mutable > operations. 
> > > I don't think I agree that this is the only correct way, but I respect > your position of authority on the matter. > > I don't think there's much need for mutable operations. sort() and > shuffle() would be best implemented by returning a new array instead. > array_push() is redundant with $array[]. array_shift() and array_unshift() > should never be used. > > > Why do you say array_shift() and array_unshift() should never be used? > When I wrote the above questions the use-case I was thinking about most was > $a->unshift($value) as I use array_unshift() more than most of the other > array functions. > > Do you mean that these if applied as "methods" to an array should only be > used immutably (meaning in-place is bad but returning an array value that > has been shifted would be okay), or do you have some other reason you > believe that shifting an array is bad? Note the reason I have used them in > the past is when I need to pass an array to a function written by someone > else that expects the array to be ordered. > > Also, what about very large arrays? I assume (which could be a bad > assumption) that PHP internally can be more efficient about how it handles > array_unshift() instead of just duplicating the large array so as to add an > element at the beginning? >
Arrays only support efficient push/pop operations. Performing an array_shift() or array_unshift() requires going through the whole array to reindex all the keys, even though you're only adding/removing one element. In other words, array_shift() and array_unshift() are O(n) operations, not O(1) as one would intuitively expect. If you use shift/unshift as common operations, you're better off using a different data-structure or construction approach. Regards, Nikita
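A minimal sketch of the reindexing cost described above, together with one possible alternative construction approach. The helper names `prependNaive` and `prependViaReverse` are hypothetical, made up for illustration only; appending and reversing once is just one option and is not something proposed in this thread.

```php
<?php
// array_unshift() renumbers every integer key from zero,
// which is why it has to walk the whole array (O(n)).
$queue = [10 => 'a', 11 => 'b'];
array_unshift($queue, 'z');
// $queue is now [0 => 'z', 1 => 'a', 2 => 'b']

// Hypothetical helper: build a list by prepending each item.
// Every array_unshift() call reindexes the array, so this is O(n^2) overall.
function prependNaive(array $items): array
{
    $result = [];
    foreach ($items as $item) {
        array_unshift($result, $item);
    }
    return $result;
}

// Hypothetical helper: append with $result[] (cheap push),
// then reverse once at the end (a single O(n) pass).
function prependViaReverse(array $items): array
{
    $result = [];
    foreach ($items as $item) {
        $result[] = $item;
    }
    return array_reverse($result);
}
```

Both helpers return the same list; only the second avoids reindexing on every insertion. If elements really must be added and removed at the front all the time, a dedicated structure such as `SplDoublyLinkedList` or `SplQueue` may be a better fit than a plain array.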
  114661
May 28, 2021 22:11 the.liquid.metal@gmail.com (Hendra Gunawan)
Hello.

> > The only correct way to resolve this issue is to not support mutable operations. >
Correct me if I'm wrong: scalar objects will hit the memory limit earlier than the old API if they are applied to $some_huge_shared_array over several method calls.
> > I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used. array_pop() and array_splice() are the only sensible mutable array methods that come to mind, and I daresay we can do without them. >
Suppose that we all agree with that. **Will scalar objects preserve most, if not all, of the functionality of the old API?** Some functions are very handy and keep us away from the gory implementation details. In the array case, we know that a PHP array is a combination of an array and a plain object in JS terms. There is a trend among userland PHP libraries of simply copying the JS array API while poorly preserving the existing functionality of PHP's old API. Implicitly, the author of this thread wants this to happen. If I am not wrong, scalar objects date back to before PHP 7.0. Is there any reason why scalar objects were not escalated to the next phase, say to an RFC? Regards, Hendra Gunawan.