Request for couple memory optimized array improvements

  111744
August 31, 2020 18:13 riikka.kalliomaki@riimu.net (=?UTF-8?Q?Riikka_Kalliom=C3=A4ki?=)
Hello,

For the past couple years I've been working with a PHP code base that
at times deals with quite large payloads memory wise. This has made me
pay more attention to some array operations in PHP that are rather
frustrating to deal with in userland PHP, but could perhaps be
optimized more in PHP core.

A common pattern that I've seen that could dearly use PHP internal
optimization, if possible, would be:

foreach (array_keys($array) as $key) {
}

The problem with this pattern, of course, is the fact that it
needlessly duplicates the array passed to foreach, as can be seen from
this example: https://3v4l.org/MRSv6

I would be ever so grateful, if it would be possible to improve the
PHP engine to detect that fully qualified function name array_keys is
used with foreach, in which case it would simply perform a foreach
over the keys of the array without creating a copy. Optimizing this
wouldn't even require any userland changes. Not sure if the PHP engine
makes it at all feasible, though.

Of course, you could just be using something like this in code:

foreach ($array as $key => $_) {
}

Which has actually become a pattern for us in some memory sensitive
places, but using array_keys inside foreach is a very intuitive and
common approach and doesn't require the unused variable, so it would
be nice to see the usage enshired.

Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays). Particularly when
dealing with JSON, it's a common need to detect if an array has keys
from 0 to n-1 and in that order. My understanding is that at least in
some cases this would be trivial and fast to tell internally in PHP,
but the functionality is not exposed to userland.

Current common practices include for example:

array_keys($array) === range(0, count($array) - 1)

Memory optimized way of dealing with this is via foreach, but it's
quite cumbersome and again, you must not use array_keys in the
foreach. The following example demonstrates that the worst case
scenario triples the memory usage using range: https://3v4l.org/FiWdk

Interestingly, using "array_values($array) === $array" is the fastest
and most optimized way in best case scenarios, since php just returns
the array itself in cases it's "packed" and "without holes". However,
this could get hairy in worst case scenarios since it starts comparing
the values as well.

So, it would be nice to have a core PHP function implementing this
test, because the userland way of doing it is unnecessarily
unoptimized. I don't know what the function should be called. In our
code base the function is called is_indexed_array, but PHP doesn't
really have a standard term for this, afaik.

I regret my lack of C skills so I can't really propose
implementations, but I would be truly appreciative if these
suggestions would gain some traction.

-- 
Riikka Kalliomäki
https://github.com/Riimu
  111745
August 31, 2020 18:46 markus@fischer.name (Markus Fischer)
On 31.08.20 20:13, Riikka Kalliomäki wrote:
> Another similar problem with creating array copies is the detection of > "indexed" arrays (as opposed to associative arrays).
I'm looking forward to an answer from someone proficient in this area because _I think_ the engine _internally_ keeps track of this for performance optimizations, though I don't think it exposes it. - Markus
  111746
August 31, 2020 18:49 olleharstedt@gmail.com (=?UTF-8?Q?Olle_H=C3=A4rstedt?=)
NB: You have SplFixedArray which can cover some use-cases.

On Mon, 31 Aug 2020, 20:48 Markus Fischer, <markus@fischer.name> wrote:

> On 31.08.20 20:13, Riikka Kalliomäki wrote: > > Another similar problem with creating array copies is the detection of > > "indexed" arrays (as opposed to associative arrays). > > I'm looking forward to an answer from someone proficient in this area > because _I think_ the engine _internally_ keeps track of this for > performance optimizations, though I don't think it exposes it. > > - Markus > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: https://www.php.net/unsub.php > >
  111750
August 31, 2020 20:53 vorismi3@fel.cvut.cz (=?UTF-8?Q?Michael_Vo=C5=99=C3=AD=C5=A1ek_-_=C4=8CVUT_FEL?=)
Optimizing foreach (array_keys($arr) as $k) is very important, not only
because of memory, but because of speed when not all elements needs to
be iterated, like: 

foreach (array_keys($arr) as $k) { 

    if ($k some condition) { 

        break; 

    } 

} 

please, can someone send a PR for this? 

With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem,

Michael Voříšek

On 31 Aug 2020 20:13, Riikka Kalliomäki wrote:

> Hello, > > For the past couple years I've been working with a PHP code base that > at times deals with quite large payloads memory wise. This has made me > pay more attention to some array operations in PHP that are rather > frustrating to deal with in userland PHP, but could perhaps be > optimized more in PHP core. > > A common pattern that I've seen that could dearly use PHP internal > optimization, if possible, would be: > > foreach (array_keys($array) as $key) { > } > > The problem with this pattern, of course, is the fact that it > needlessly duplicates the array passed to foreach, as can be seen from > this example: https://3v4l.org/MRSv6 > > I would be ever so grateful, if it would be possible to improve the > PHP engine to detect that fully qualified function name array_keys is > used with foreach, in which case it would simply perform a foreach > over the keys of the array without creating a copy. Optimizing this > wouldn't even require any userland changes. Not sure if the PHP engine > makes it at all feasible, though. > > Of course, you could just be using something like this in code: > > foreach ($array as $key => $_) { > } > > Which has actually become a pattern for us in some memory sensitive > places, but using array_keys inside foreach is a very intuitive and > common approach and doesn't require the unused variable, so it would > be nice to see the usage enshired. > > Another similar problem with creating array copies is the detection of > "indexed" arrays (as opposed to associative arrays). Particularly when > dealing with JSON, it's a common need to detect if an array has keys > from 0 to n-1 and in that order. My understanding is that at least in > some cases this would be trivial and fast to tell internally in PHP, > but the functionality is not exposed to userland. > > Current common practices include for example: > > array_keys($array) === range(0, count($array) - 1) > > Memory optimized way of dealing with this is via foreach, but it's > quite cumbersome and again, you must not use array_keys in the > foreach. The following example demonstrates that the worst case > scenario triples the memory usage using range: https://3v4l.org/FiWdk > > Interestingly, using "array_values($array) === $array" is the fastest > and most optimized way in best case scenarios, since php just returns > the array itself in cases it's "packed" and "without holes". However, > this could get hairy in worst case scenarios since it starts comparing > the values as well. > > So, it would be nice to have a core PHP function implementing this > test, because the userland way of doing it is unnecessarily > unoptimized. I don't know what the function should be called. In our > code base the function is called is_indexed_array, but PHP doesn't > really have a standard term for this, afaik. > > I regret my lack of C skills so I can't really propose > implementations, but I would be truly appreciative if these > suggestions would gain some traction.
  111752
August 31, 2020 21:31 maxsem.wiki@gmail.com (Max Semenik)
On Mon, Aug 31, 2020 at 11:53 PM Michael Voříšek - ČVUT FEL <
vorismi3@fel.cvut.cz> wrote:

> Optimizing foreach (array_keys($arr) as $k) is very important, not only > because of memory, but because of speed when not all elements needs to > be iterated, like: > > foreach (array_keys($arr) as $k) { > > if ($k some condition) { > > break; > > } > > } >
Can an iterator be the answer? E.g. foreach (new ArrayKeysIterator($arr) as $k) { ... } -- Best regards, Max Semenik
  111753
August 31, 2020 21:39 vorismi3@fel.cvut.cz (=?UTF-8?Q?Michael_Vo=C5=99=C3=AD=C5=A1ek_-_=C4=8CVUT_FEL?=)
I would highly prefer php optimalization for it. ArrayKeysIterator can
not even used if the input is iterable instead of pure array. 

With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem,

Michael Voříšek

On 31 Aug 2020 23:31, Max Semenik wrote:

> On Mon, Aug 31, 2020 at 11:53 PM Michael Voříšek - ČVUT FEL <vorismi3@fel.cvut.cz> wrote: > >> Optimizing foreach (array_keys($arr) as $k) is very important, not only >> because of memory, but because of speed when not all elements needs to >> be iterated, like: >> >> foreach (array_keys($arr) as $k) { >> >> if ($k some condition) { >> >> break; >> >> } >> >> } > > Can an iterator be the answer? E.g. > > foreach (new ArrayKeysIterator($arr) as $k) { ... } > > -- > > Best regards, > Max Semenik
  111765
September 1, 2020 15:26 josh@joshbruce.dev (Josh Bruce)
That’s interesting as I haven’t played with iterates and generators much. 

If the iterator can’t take an iterable, the idea of something like __toArray seems way more convenient: https://wiki.php.net/rfc/to-array

Cheers,
Josh

> On Aug 31, 2020, at 4:39 PM, Michael Voříšek - ČVUT FEL <vorismi3@fel.cvut.cz> wrote: > > I would highly prefer php optimalization for it. ArrayKeysIterator can > not even used if the input is iterable instead of pure array. > With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem, > > Michael Voříšek > >> On 31 Aug 2020 23:31, Max Semenik wrote: >> >>> On Mon, Aug 31, 2020 at 11:53 PM Michael Voříšek - ČVUT FEL <vorismi3@fel.cvut.cz> wrote: >>> Optimizing foreach (array_keys($arr) as $k) is very important, not only >>> because of memory, but because of speed when not all elements needs to >>> be iterated, like: foreach (array_keys($arr) as $k) { if ($k some condition) { break; } } >> Can an iterator be the answer? E.g. foreach (new ArrayKeysIterator($arr) as $k) { ... } -- >> Best regards, >> Max Semenik
  111754
August 31, 2020 21:50 jbafford@zort.net (John Bafford)
Hi Riikka,

> On Aug 31, 2020, at 14:13, Riikka Kalliomäki kalliomaki@riimu.net> wrote: > > A common pattern that I've seen that could dearly use PHP internal > optimization, if possible, would be: > > foreach (array_keys($array) as $key) { > }
I have a draft RFC (https://wiki.php.net/rfc/foreach_void) and patch (https://github.com/jbafford/php-src/tree/foreachvoid against php 7.0, I think) that helps with this, by allowing the following syntax: foreach($iterable as $key => void) { ... } This would iterate over the keys of the iterable, and does not retrieve the values at all. In theory, this should be a general performance win any time one needs to iterate over only the keys of an iterable, because it does not require the use of an O(n) iteration and building of the array that array_keys() would, plus it works on non-array types (such as generators or iterators). It also is more performant than foreach($iterable as $key => $_) {}, because it omits the opcode that instructs the engine to retrieve the value. (Presumably, a future direction could include using this request to inform generators or iterators to only return keys, and not values, which could further improve performance, especially if value generation is expensive. But that is out of scope for this RFC.) If this is something we'd like for PHP 8.1 and there are no major objections to the idea, then after 8.0 is released, I can move the RFC out of Draft and into Under Discussion, and presuming a vote passes, I'll update the patch as necessary to work against 8.0. But my time is limited and I'm not willing to put further time into the code unless an RFC vote passes. Thoughts anyone? -John
  111755
August 31, 2020 23:21 tysonandre775@hotmail.com (tyson andre)
Hi Riikka Kalliomäki,

> Another similar problem with creating array copies is the detection of > "indexed" arrays (as opposed to associative arrays). Particularly when > dealing with JSON, it's a common need to detect if an array has keys > from 0 to n-1 and in that order. My understanding is that at least in > some cases this would be trivial and fast to tell internally in PHP, > but the functionality is not exposed to userland. > > Current common practices include for example: > > array_keys($array) === range(0, count($array) - 1) > > Memory optimized way of dealing with this is via foreach, but it's > quite cumbersome and again, you must not use array_keys in the > foreach. The following example demonstrates that the worst case > scenario triples the memory usage using range: https://3v4l.org/FiWdk
This was brought up in https://externals.io/message/109760#109795 there's a PR against php-src but there was some discussion about whether php should strive to add an actual "list" type instead and what it should be named - https://github.com/php/php-src/pull/4886 I'm fairly certain is_list() would have to go through the RFC process given the proposals for alternate approaches or alternate names Cheers, - Tyson