Regarding array_shift()/array_unshift()

  114686
June 1, 2021 22:34 mike@newclarity.net (Mike Schinkel)
Hi Nikita,

>> On May 28, 2021, at 10:31 AM, Nikita Popov ppv@gmail.com <mailto:nikita.ppv@gmail.com>> wrote: > >> I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used. > > Why do you say array_shift() and array_unshift() should never be used? When I wrote the above questions the use-case I was thinking about most was $a->unshift($value) as I use array_unshift() more than most of the other array functions. > > Do you mean that these if applied as "methods" to an array should not be use immutably — meaning in-place is bad but returning an array value that has been shifted would be okay — or do you have some other reason you believe that shifting an array is bad? Note the reason I have used them in the past is when I need to pass an array to a function written by someone else that expects the array to be ordered. > > Also, what about very large arrays? I assume — which could be a bad assumption — that PHP internally can be more efficient about how it handles array_unshift() instead of just duplicating the large array so as to add an element at the beginning? > > Arrays only support efficient push/pop operations. Performing an array_shift() or array_unshift() requires going through the whole array to reindex all the keys, even though you're only adding/removing one element. In other words, array_shift() and array_unshift() are O(n) operations, not O(1) as one would intuitively expect. If you use shift/unshift as common operations, you're better off using a different data-structure or construction approach.
I appreciate your explanation regarding array_shift()/array_unshift(). It left me curious though to see how those functions and have been used in userland, and what use-cases they frequently solve. I decided to focused on array_unshift() because I have found reason to use it numerous times in the past but rarely use array_shift(). I searched some of the most popular PHP projects on Github and found the following # of uses for array_unshift() while the last project was one of yours: Phan:10 https://github.com/search?q=array_unshift+language%3APHP+repo%3Aphan%2Fphan&type=Code Phabricator:27 https://github.com/search?l=&p=1&q=array_unshift+language%3APHP+repo%3Aphacility%2Fphabricator&type=Code Laravel:8 https://github.com/search?q=array_unshift+language%3APHP+repo%3Alaravel%2Fframework&type=Code Symfony:34 https://github.com/search?q=array_unshift+language%3APHP+repo%3Asymfony%2Fsymfony&type=Code PHP-CS-Fixer:8 https://github.com/search?q=array_unshift+language%3APHP+repo%3AFriendsOfPHP%2FPHP-CS-Fixer&type=Code Composer:7 https://github.com/search?q=array_unshift+language%3APHP+repo%3Acomposer%2Fcomposer&type=Code Statamic:2 https://github.com/search?q=array_unshift+language%3APHP+repo%3Astatamic%2Fcms&type=Code Magento2:75 https://github.com/search?q=array_unshift+language%3APHP+repo%3Amagento%2Fmagento2&type=Code WordPress:22 https://github.com/search?q=array_unshift+language%3APHP+repo%3AWordPress%2Fwordpress-develop&type=Code PHPMailer:1 https://github.com/search?q=array_unshift+language%3APHP+repo%3APHPMailer%2FPHPMailer&type=Code PHPBackporter:1 https://github.com/nikic/PHP-Backporter/blob/master/lib/PHPBackporter/Converter/Closure.php#L100 <https://github.com/nikic/PHP-Backporter/blob/master/lib/PHPBackporter/Converter/Closure.php#L100> From a manual review of those usages I have found roughly the following use-case categories: ------ Testing — It seems a lot of tests use array_unshift(), for a variety of pre- and post- test uses. Path Manipulation — Such as when relative paths are exploded, the base path is inserted, and path segments imploded. Ordered Processing/Filtering — Where a list of elements are filtered to insert values to be processed first such as loading "plugins", inserting regular expressions for URL route processing, "policy modes", adding optional positional arguments to default values for a "builder" (command line, SQL, etc.), inserting custom menu items in front of default menu items, inserting higher priority middleware, insert column definitions to be displayed, inserting custom paths to look for theme template in front of default paths, inserting mime-types into a list of default acceptable mime-types, inserting directories for autoloaders, etc. Parsers/Tokenizers — When writing a parser or tokenizer and the need to insert a value or token at the head of the list. Parameter Delegation — When a function accepts a list of parameter, especially when they include variadic, and then needs to pass on the same set of parameters to another function such as with call_user_func_array(), sprints(), etc.Additional similar uses are to insert $this into return value of func_get_args() to allow delegation to a global function, also using call_user_func_array(). ------ There may be more but in general those were the main reasons I found for using array_unshift() that I am not sure could be replaced by another data structure to any advantage. It's hard to replace an array with something else when your source for the data is a function that returns an array. (BTW, Phabricator uses array_unshift() more than any other, but uses it in contexts that would be easily replaced with something more performant. Ironic given that the project is no longer being maintained.) So I was also intrigued by your statement that "you're better off using a different data-structure or construction approach" since I think we mostly use these functions when we have to use an array, not because we chose to. Such as when we are either given an array by some function we are not in control of, or when we need to pass a value to an ordered function we are not in control of. In the above use-cases I would really like to know if you can envision other data structures or constructions being better and if so what they are? Also, almost every one of the use-cases mentioned above generally deal with very short arrays, such as parameter delegation. Is O() really problematic when is rather small? ----- But from all this I do agree with you that just returning an array would likely be acceptable. ----- This would not be a great solution for really large arrays, but then we can't eliminate the need for a developer to have a reasonable level of knowledge, right? So: $unshifted = $array->unshift($new_element); $shifted = $array->shift(); But, we could also possibly use better names for these: $right_shifted = $array->shift_right(...$new_element(s)); $left_shifted = $array->shift_left([element_count]); OR $prepended = $array->prepend(...$new_element(s)); $prebated = $array->prebate([element_count]); -Mike
  114691
June 2, 2021 08:25 j.boggiano@seld.be (Jordi Boggiano)
On 02/06/2021 00:34, Mike Schinkel wrote:
> But from all this I do agree with you that just returning an array > would likely be acceptable. > ----- > > This would not be a great solution for really large arrays, but then we can't eliminate the need for a developer to have a reasonable level of knowledge, right? So: > > $unshifted = $array->unshift($new_element); > $shifted = $array->shift(); > > But, we could also possibly use better names for these: > > $right_shifted = $array->shift_right(...$new_element(s)); > $left_shifted = $array->shift_left([element_count]);
IMO for unshift() it'd be fine to return a new array, but when processing a list of things in a FIFO pattern I often used array_shift() to get the first "job" out of the array. If $array->shift() returns a new array then how do I access the first item? I mean sure one can call reset first to get it, but I find this isn't great ergonomics compared to the existing. Best, Jordi -- Jordi Boggiano @seldaek - https://seld.be
  114698
June 2, 2021 22:59 mike@newclarity.net (Mike Schinkel)
> On Jun 2, 2021, at 4:25 AM, Jordi Boggiano boggiano@seld.be> wrote: > > On 02/06/2021 00:34, Mike Schinkel wrote: >> But from all this I do agree with you that just returning an array would likely be acceptable. >> ----- >> >> This would not be a great solution for really large arrays, but then we can't eliminate the need for a developer to have a reasonable level of knowledge, right? So: >> >> $unshifted = $array->unshift($new_element); >> $shifted = $array->shift(); >> >> But, we could also possibly use better names for these: >> >> $right_shifted = $array->shift_right(...$new_element(s)); >> $left_shifted = $array->shift_left([element_count]); > > IMO for unshift() it'd be fine to return a new array, but when processing a list of things in a FIFO pattern I often used array_shift() to get the first "job" out of the array. If $array->shift() returns a new array then how do I access the first item?
That is a really excellent point, something I did not think to consider given how rarely I use array_shift(). But your comment causes me to ponder a number of follow up questions, some of which are tangential. If any reader feels these tangents are worth discussing please make another email and quote my relevant comments so we can have a dedicated thread for each. 1.) Given Nikita's position that it would only be viable to offer a syntax that simulates method calling for arrays if the methods themselves are immutable can you envision a solution for allowing $array->shift() functionality that would address getting both element and shifted array without resorting to by-reference parameters? 2.) One approach I can envision is to add functionality to PHP that would allow the returning multiple values from a function or method — NOT to be confused with returning an array that can be destructured with list() or []. There is certainly precedence in other languages to prove the validity of the programming model, and they solve some thorny issues quite nicely. I have wanted to bring this up on the list for eons but have feared not having my use-cases written up well enough to keep people from immediately having a negative reaction and thus prejudicing too many on the list for it ever to be viable. But I am going to risk it and hope this use-case is sufficient to peak people's interest, or at least not have them summarily dismiss the idea with prejudice. If PHP were to allow multiple returns then we could have one of the following (not sure which would be best): $element, $shifted = $array->shift(); OR $shifted, $element = $array->shift(); Following the lead of another language, we could use an underscore placeholder to indicate not to capture the return value if we don't need it: $element, _ = $array->shift(); _, $shifted = $array->shift(); Again, this is NOT the same as array destructuring where the function would need to return an array containing the array. 3.) When you have used an array to contain a list of "jobs", are you creating that array in your code or have you gotten the array from a function within PHP that provides the array such as scandir() or glob()? 4.) Given Nikita's comments on the O() performance of array_shift() have you considered instead simulating a FIFO queue using an array as a stack? Any reason that would not work for your needs? 5.) Alternately given Nikita's comments on array_shift() have you considered using SqlQueue: https://www.php.net/manual/en/book.spl.php? It seems a perfect fit for your FIFO use-case to the extent you've explained it. Honest question, not a challenge. Wouldn't that work better than an array? 6.) More broadly, this made me wonder about the SPL data structure classes in general. I honestly did not think of SplQueue at first because I so rarely use any of the SPL classes. I was not until I started to ponder what "different data-structure or construction approach" might be appropriate, referencing Nikita's comment. Then I wondered why I don't think to use those classes more often, and I came up with these hypotheses, in ascending order of likelihood in my opinion: A.) Admittedly superficial but the Spl class names make code look more complicated. I'd much rather see Queue than SplQueue, or more specifically — while ignoring the still-raging debate over the best way tonamespace built-in PHP classes — I'd much rather use an \SPL\Queue than an SplQueue. Maybe PHP could create aliases for these classes but within a namespace? B.) Out-of-sight, out-of-mind. I rarely see people use the SPL functions in open-source code, discussed on StackExchange, written about in articles or on mailing lists. C1.) The SPL data structure classes do not have anywhere near the functionality that arrays have built-in functions for, such as: array_combine(), array_fill(), array_map(), array_reduce(), array_slice(), array_splice(), array_search(), array_filter(), array_walk(), etc. etc. Maybe PHP could add more of that kind of functionality for these data structures? C2.) Also there are no performant ways to convert one related data structure to another. You can't easily convert a SplStack to a SplQueue, or a SqlQueue to a SplPriorityQueue, and there may be other appropriate conversions that PHP obviously does not provide. D.) Few if any built-in PHP functions or methods accept instances of these classes as parameters, nor return instances of these classes. Examples include: sort(), file(), glob(), scandir(), iterator_to_array(), str_split(), str_replace(), preg_match_all(), str_getcsv(), fputcsv()/fgetcsv(), func_get_args(), call_user_func_array(), implode()/explode(), sprintf(), file_put_contents(), var_dump(), var_export(), filter_input_array(), mysql_fetch_row(), get_defined_constants(), get_defined_vars(), debug_backtrace(), etc. etc. So if you are always being handed an array, no wonder you rarely ever use the SPL classes, right? If we were to address any or all of A-to-D maybe PHP developers would be more inclined to use different data-structure and/or construction approaches that better fit their use-cases? So in summary the tangential topics I addressed were: 1.) PHP potentially supporting multiple return values, and 2.) Renewed focus on SPL data structures. Again, if anyone is interested in these tangential topics and wants to discuss further, please create a new thread and quote my applicable comments from this email. -Mike
  114699
June 2, 2021 23:07 larry@garfieldtech.com ("Larry Garfield")
On Wed, Jun 2, 2021, at 5:59 PM, Mike Schinkel wrote:
> > On Jun 2, 2021, at 4:25 AM, Jordi Boggiano boggiano@seld.be> wrote:
> > IMO for unshift() it'd be fine to return a new array, but when processing a list of things in a FIFO pattern I often used array_shift() to get the first "job" out of the array. If $array->shift() returns a new array then how do I access the first item? > > That is a really excellent point, something I did not think to consider > given how rarely I use array_shift(). > > But your comment causes me to ponder a number of follow up questions, > some of which are tangential. If any reader feels these tangents are > worth discussing please make another email and quote my relevant > comments so we can have a dedicated thread for each. > > 1.) Given Nikita's position that it would only be viable to offer a > syntax that simulates method calling for arrays if the methods > themselves are immutable can you envision a solution for allowing > $array->shift() functionality that would address getting both element > and shifted array without resorting to by-reference parameters?
$list->head() (returns first item) $list->tail() (returns all items but the first one) Straight outa FP. :-) Can probably do some internal optimizations to make them more efficient, but now we're back to the discussion of a List or Vector type built in, which always runs into the Generics discussion, so... yeah. --Larry Garfield
  114700
June 2, 2021 23:15 mike@newclarity.net (Mike Schinkel)
> On Jun 2, 2021, at 7:07 PM, Larry Garfield <larry@garfieldtech.com> wrote: > > On Wed, Jun 2, 2021, at 5:59 PM, Mike Schinkel wrote: >>> On Jun 2, 2021, at 4:25 AM, Jordi Boggiano boggiano@seld.be> wrote: > >>> IMO for unshift() it'd be fine to return a new array, but when processing a list of things in a FIFO pattern I often used array_shift() to get the first "job" out of the array. If $array->shift() returns a new array then how do I access the first item? >> >> That is a really excellent point, something I did not think to consider >> given how rarely I use array_shift(). >> >> But your comment causes me to ponder a number of follow up questions, >> some of which are tangential. If any reader feels these tangents are >> worth discussing please make another email and quote my relevant >> comments so we can have a dedicated thread for each. >> >> 1.) Given Nikita's position that it would only be viable to offer a >> syntax that simulates method calling for arrays if the methods >> themselves are immutable can you envision a solution for allowing >> $array->shift() functionality that would address getting both element >> and shifted array without resorting to by-reference parameters? > > $list->head() (returns first item) > $list->tail() (returns all items but the first one)
I assume $list->tail() would still have the O() problem in PHP, though? -Mike
  114701
June 2, 2021 23:47 larry@garfieldtech.com ("Larry Garfield")
On Wed, Jun 2, 2021, at 6:15 PM, Mike Schinkel wrote:
> > On Jun 2, 2021, at 7:07 PM, Larry Garfield <larry@garfieldtech.com> wrote: > > > > On Wed, Jun 2, 2021, at 5:59 PM, Mike Schinkel wrote: > >>> On Jun 2, 2021, at 4:25 AM, Jordi Boggiano boggiano@seld.be> wrote: > > > >>> IMO for unshift() it'd be fine to return a new array, but when processing a list of things in a FIFO pattern I often used array_shift() to get the first "job" out of the array. If $array->shift() returns a new array then how do I access the first item? > >> > >> That is a really excellent point, something I did not think to consider > >> given how rarely I use array_shift(). > >> > >> But your comment causes me to ponder a number of follow up questions, > >> some of which are tangential. If any reader feels these tangents are > >> worth discussing please make another email and quote my relevant > >> comments so we can have a dedicated thread for each. > >> > >> 1.) Given Nikita's position that it would only be viable to offer a > >> syntax that simulates method calling for arrays if the methods > >> themselves are immutable can you envision a solution for allowing > >> $array->shift() functionality that would address getting both element > >> and shifted array without resorting to by-reference parameters? > > > > $list->head() (returns first item) > > $list->tail() (returns all items but the first one) > > I assume $list->tail() would still have the O() problem in PHP, though? > > -Mike
If implemented like an array, yes. If implemented in a more efficient way in C, potentially not. Or at least a smaller constant multiplier with n. (If something is O(n) but the cost per in is very small, then it being O(n) is not a big deal. If the cost per n is huge, it's a very huge deal.) --Larry Garfield
  114704
June 3, 2021 09:31 lauri.kentta@gmail.com (=?UTF-8?Q?Lauri_Kentt=C3=A4?=)
On 2021-06-03 01:59, Mike Schinkel wrote:
> 1.) Given Nikita's position that it would only be viable to offer a > syntax that simulates method calling for arrays if the methods > themselves are immutable can you envision a solution for allowing > $array->shift() functionality that would address getting both element > and shifted array without resorting to by-reference parameters?
Spread operator support in assignments would solve this nicely: [$first, ...$rest] = $array; -- Lauri Kenttä
  114724
June 4, 2021 04:15 mike@newclarity.net (Mike Schinkel)
> On Jun 3, 2021, at 5:31 AM, Lauri Kenttä kentta@gmail.com> wrote: > > On 2021-06-03 01:59, Mike Schinkel wrote: >> 1.) Given Nikita's position that it would only be viable to offer a >> syntax that simulates method calling for arrays if the methods >> themselves are immutable can you envision a solution for allowing >> $array->shift() functionality that would address getting both element >> and shifted array without resorting to by-reference parameters? > > Spread operator support in assignments would solve this nicely: > [$first, ...$rest] = $array;
Excellent idea! Anyone else have thoughts on potential use of the spread/variadic operator in array destructuring assignments? -Mike
  114707
June 3, 2021 11:32 j.boggiano@seld.be (Jordi Boggiano)
Hah, I sure did not expect my answer to trigger these 5 complex RFCs 
worth of work ;)


On 03/06/2021 00:59, Mike Schinkel wrote:

> > 4.) Given Nikita's comments on the O() performance of array_shift() have you considered instead simulating a FIFO queue using an array as a stack? Any reason that would not work for your needs?
I think your analysis is overall on point. There are probably better data structures that could lead to me not needing array_shift in most places I use it. However arrays are quite the tool in PHP, and lazy as it may be they usually do the job well enough and the perf overhead of reindexing (which tbh I was not aware of until this thread) seems unnoticeable in most case (a quick test shows 1ms to shift a 300K item array on my loaded laptop, which is really quite acceptable).
> So in summary the tangential topics I addressed were: > > 1.) PHP potentially supporting multiple return values, and > 2.) Renewed focus on SPL data structures. > > Again, if anyone is interested in these tangential topics and wants to discuss further, please create a new thread and quote my applicable comments from this email. To your other points: Multiple return values and SPL with better
docs/naming/usability would absolutely be great to have from my userland perspective. Best, Jordi -- Jordi Boggiano @seldaek - https://seld.be