RFC: CachedIterable (rewindable, allows any key&repeating keys)

  113136
February 11, 2021 03:47 tysonandre775@hotmail.com (tyson andre)
Hi internals,

I've created a new RFC, https://wiki.php.net/rfc/cachediterable, adding CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from.

This has the proposed signature:

```
final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    // [[$key1, $value1], [$key2, $value2]]
    public static function fromPairs(array $pairs): CachedIterable {}
    // [[$key1, $value1], [$key2, $value2]]
    public function toPairs(): array {}
    public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
    public function __unserialize(array $data): void {}
 
    // useful for converting iterables back to arrays for further processing
    public function keys(): array {}  // [$k1, $k2, ...]
    public function values(): array {}  // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}
 
    // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
    public function jsonSerialize(): array {}
    // dynamic properties are forbidden
}
```

Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later
(when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as:

1. Creating a rewindable copy of a non-rewindable Traversable 
2. Generating an IteratorAggregate from a class still implementing Iterator
3. In the future, providing internal or userland helpers such as iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
    iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc (these are not part of the RFC)
4. Providing memory-efficient random access to both keys and values of arbitrary key-value sequences 

Having this implemented as an internal class would also allow it to be much more efficient than a userland solution
(in terms of time to create, time to iterate over the result, and total memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
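
To make the motivation concrete, here is a minimal userland sketch of eagerly caching key/value pairs. It is not the RFC's C implementation or its polyfill: the class name `UserlandCachedIterable` and the `repeatedKeys()` helper are hypothetical and exist only for illustration, and only a subset of the proposed methods is mirrored. `iterator_to_array()` is the real built-in, shown for contrast because it collapses repeated keys.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in for the proposed class (illustration only).
final class UserlandCachedIterable implements IteratorAggregate, Countable
{
    /** @var list<array{0: mixed, 1: mixed}> eagerly collected [key, value] pairs */
    private array $pairs = [];

    public function __construct(iterable $iterator)
    {
        // Eager evaluation: the source is consumed once, here.
        foreach ($iterator as $key => $value) {
            $this->pairs[] = [$key, $value];
        }
    }

    public function getIterator(): Generator
    {
        // A fresh generator per call, so the object can be iterated repeatedly.
        foreach ($this->pairs as [$key, $value]) {
            yield $key => $value;
        }
    }

    public function count(): int
    {
        return count($this->pairs);
    }

    public function keyAt(int $offset): mixed
    {
        return $this->pairAt($offset)[0];
    }

    public function valueAt(int $offset): mixed
    {
        return $this->pairAt($offset)[1];
    }

    public function toPairs(): array
    {
        return $this->pairs;
    }

    private function pairAt(int $offset): array
    {
        if (!isset($this->pairs[$offset])) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->pairs[$offset];
    }
}

// Hypothetical source with a repeated key.
function repeatedKeys(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2;
    yield 'b' => 3;
}

$asArray = iterator_to_array(repeatedKeys(), true); // later 'a' overwrites the earlier one
$cached  = new UserlandCachedIterable(repeatedKeys());
```

Storing pairs rather than an associative array is what allows repeated and non-scalar keys to survive; an internal implementation could store the same data far more compactly, which is the efficiency argument made above.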

After some consideration, this is being created as a standalone RFC, and going in the global namespace:

- Based on early feedback on https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace preferred in previous polls),
  it seems like it's way too early for me to be proposing namespaces in any RFCs for PHP adding to modules that already exist, when there is no consensus.

  An earlier attempt by others on creating a policy for namespaces in general (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.

  Having even 40% of voters opposed to introducing a given namespace (in pre-existing modules)
  makes it an impractical choice when RFCs require a 2/3 majority to pass.
- While some may argue that a different namespace might pass,
  https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a sharp dropoff in feedback after the 3rd form.
  I don't know how to interpret that - e.g. are unranked namespaces preferred even less than the options that were ranked, or are they just not seen as affecting the final result?

Any other feedback unrelated to namespaces?

Thanks,
- Tyson
  113151
February 12, 2021 05:14 tysonandre775@hotmail.com (tyson andre)
Hi internals,

After feedback, I have decided to postpone the start of voting on this (or other proposals related to SPL or iterables) until April at the earliest,
to avoid interfering with the ongoing SPL naming policy discussions.

Thanks,
- Tyson
  113155
February 12, 2021 15:50 drealecs@gmail.com (Alexandru Pătrănescu)
On Thu, Feb 11, 2021 at 5:47 AM tyson andre <tysonandre775@hotmail.com>
wrote:

> Hi internals,
>
> I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
> which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from
>
> Any other feedback unrelated to namespaces?
>
> Thanks,
> - Tyson

Hi Tyson,

I needed this feature a few years ago. In that case, the source was a generator that was slowly generating data while fetching it from a paginated API that had rate limits.
The resulting wrapping iterator was used at runtime in multiple (hundreds) other iterators that were processing elements in various ways (technical analysis indicators on time series) and after that merged back with some MultipleIterator.

Just for reference, this is how the implementation in userland was and I was happy with it as a solution:
https://gist.github.com/drealecs/ad720b51219675a8f278b8534e99d7c7

Not sure if it's useful, but I thought I should share it as I noticed you mentioned in your example for PolyfillIterator that you chose not to use an IteratorAggregate because of complexity.
I was wondering how much less efficient this would be compared to the C implementation.

Also, the implementation having the ability to be lazy was important and I think that should be the case here as well, by design, especially as we are dealing with Generators.

Regards,
Alex
  113157
February 12, 2021 18:19 tysonandre775@hotmail.com (tyson andre)
Hi Alex,

> Just for reference, this is how the implementation in userland was and I was happy with it as a solution:
> https://gist.github.com/drealecs/ad720b51219675a8f278b8534e99d7c7
>
> Not sure if it's useful but I thought I should share it as I noticed you mentioned in your example for PolyfillIterator you chose not to use an IteratorAggregate because complexity
> Was wondering how much inefficient this would be compared to the C implementation.
That was for simplicity (shortness) of the RFC for people reading the polyfill.
I don't expect it to affect CPU timing or memory usage for large arrays in the polyfill.

Userland lazy iterable implementations could still benefit from having a CachedIterable around,
by replacing the lazy IteratorAggregate with a CachedIterable when the end of iteration was detected.
> Also, the implementation having the ability to be lazy was important and I think that should be the case here as well, by design, especially as we are dealing with Generators.
We're dealing with the entire family of iterables, including but not limited to Generators, arrays, user-defined Traversables, etc.

I'd considered that but decided not to include it in the RFC's scope. If I was designing that, it would be a separate class `LazyCachedIterable`.
Currently, `CachedIterable` has several useful properties:

1. Serialization/unserialization behavior is predictable - if the object was constructed, it can be safely serialized if keys/values can be serialized.
2. Iteration has no side effects (e.g. won't throw)
3. keyAt(int $offset) and so on have predictable behavior, good performance, and only one throwable type
4. Memory usage is small - this might also be the case for a LazyIterable depending on implementation choices/constraints.

Adding lazy iteration support would make it no longer have some of those properties.
While I'd be in favor of that if it was implemented correctly, I don't plan to work on implementing this until I know if the addition of `CachedIterable` to a large family of iterable classes would pass.

CachedIterable has some immediate benefits on problems I was actively working on, such as:

1. Being able to represent iterable functions such as iterable_reverse()
2. Memory efficiency and time efficiency for iteration
3. Being something internal code could return for getIterator(), etc.

Regards,
Tyson
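
For comparison with the eager design discussed in this reply, the following is a rough userland sketch of what a lazily filled cache could look like. It is not the `LazyCachedIterable` mentioned above (which was never specified) and not the code in the linked gist; the class name `LazyPairCache` and the `$slow` generator are hypothetical. It pulls from the source only when an iteration runs past what is already cached, which is exactly why iteration can have side effects or throw.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy counterpart: caches pairs on demand instead of up front.
final class LazyPairCache implements IteratorAggregate
{
    /** @var list<array{0: mixed, 1: mixed}> pairs fetched so far */
    private array $pairs = [];
    private ?Iterator $source;      // null once the source is exhausted
    private bool $started = false;

    public function __construct(iterable $source)
    {
        if (is_array($source)) {
            $this->source = new ArrayIterator($source);
        } elseif ($source instanceof Iterator) {
            $this->source = $source;
        } else {
            // Wraps an IteratorAggregate so it can be stepped manually.
            $this->source = new IteratorIterator($source);
        }
    }

    public function getIterator(): Generator
    {
        for ($i = 0; ; $i++) {
            // Only touch the source when the cache has no entry for $i yet.
            if ($i >= count($this->pairs) && !$this->fetchNext()) {
                return;
            }
            [$key, $value] = $this->pairs[$i];
            yield $key => $value;
        }
    }

    private function fetchNext(): bool
    {
        if ($this->source === null) {
            return false;
        }
        if ($this->started) {
            $this->source->next();
        } else {
            $this->source->rewind();
            $this->started = true;
        }
        if (!$this->source->valid()) {
            $this->source = null; // release the exhausted source
            return false;
        }
        $this->pairs[] = [$this->source->key(), $this->source->current()];
        return true;
    }
}

// Hypothetical slow source that counts how many elements were produced.
$fetched = 0;
$slow = (function () use (&$fetched) {
    foreach (['x' => 1, 'y' => 2, 'z' => 3] as $k => $v) {
        $fetched++;
        yield $k => $v;
    }
})();

$lazy = new LazyPairCache($slow);
$fetchedBefore = $fetched;                 // nothing pulled at construction
foreach ($lazy as $k => $v) { break; }
$fetchedAfterFirst = $fetched;             // one element pulled so far
$firstPass  = iterator_to_array($lazy);    // pulls the rest, reusing the cached head
$secondPass = iterator_to_array($lazy);    // served entirely from the cache
```

In this sketch, an exception thrown by the source would surface in whichever `foreach` happens to reach an uncached element, and serialization before the source is exhausted has no obvious meaning. Those are the properties listed above that an eager cache keeps and a lazy one gives up.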
  114790
June 9, 2021 00:22 tysonandre775@hotmail.com (tyson andre)
Hi internals,

A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.

Any other feedback on CachedIterable?

Thanks,
Tyson
  114791
June 9, 2021 04:47 internals@lists.php.net ("Levi Morrison via internals")
On Tue, Jun 8, 2021 at 6:22 PM tyson andre <tysonandre775@hotmail.com> wrote:
> A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
>
> Any other feedback on CachedIterable?
>
> Thanks,
> Tyson

Based on a recent comment you made on GitHub, it seems like `CachedIterable` eagerly creates the datastore instead of doing so on-demand. Is this correct?
  114792
June 9, 2021 04:49 internals@lists.php.net ("Levi Morrison via internals")
Sorry, yes, that's correct and pointed out in the RFC.

I think that's a significant implementation flaw. I don't see why we'd
balloon memory usage unnecessarily by being eager -- if an operation
needs to fetch more data then it can go ahead and do so.
  114797
June 9, 2021 14:12 tysonandre775@hotmail.com (tyson andre)
Hi Levi Morrison,

> > Based on a recent comment you made on GitHub, it seems like
> > `CachedIterable` eagerly creates the datastore instead of doing so
> > on-demand. Is this correct?
>
> Sorry, yes, that's correct and pointed out in the RFC.
>
> I think that's a significant implementation flaw. I don't see why we'd
> balloon memory usage unnecessarily by being eager -- if an operation
> needs to fetch more data then it can go ahead and do so.

First, PHP's standard library accommodates a wide variety of use cases, of which I believe eager evaluation is the most common.
There is no reason that an eagerly evaluated CachedIterable and lazily evaluated LazyCachedIterable couldn't be both added at some point if both had passing RFCs.
(This is referring to https://en.wikipedia.org/wiki/Lazy_evaluation and https://en.wikipedia.org/wiki/Eager_evaluation)

As was stated in that GitHub Discussion,

1) If a CachedIterable were to be used in the standard library or a user-defined library,
   many end users would want the standard library to return something that could be iterated over multiple times.
   The limit of a single iteration was a source of bugs in SPL classes such as https://www.php.net/arrayobject prior to them being switched to IteratorAggregate.
   (This is concerning whether functions such as `*filter` and `*map` should evaluate the result eagerly or lazily if they do get added.
   It is possible for a LazyCachedIterable to be implemented that computes values on demand, but see below points.)

```
$foo = map(...);
foreach ($foo as $i => $v1) {
    foreach ($foo as $i => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}
```

2) Userland library/application authors that are interested in lazy generators could use or implement something such as https://github.com/nikic/iter instead.
   My opinion is that the standard library should provide something that is easy to understand, debug, serialize or represent, etc.
   I expect the inner iterable may be hidden entirely in a LazyCachedIterable from var_dump as an implementation detail.
3) It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework
   when a lazy (instead of eager) iterable is passed to some function that accepts a generic iterable,
   and harder to write correct exception handling for it if done in a lazy generation style.
   Many RFCs have been rejected due to being perceived as being likely to be misused in userland or to make code harder to understand.
4) It is possible to implement a lazy alternative to CachedIterable that only loads values as needed.
   However, I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful enough to be included in PHP rather than as a userland or PECL library.

Additionally, CachedIterables are much more memory efficient than existing options such as arrays: https://wiki.php.net/rfc/cachediterable#cachediterables_are_memory-efficient
(The only thing more efficient in PHP's core modules is SplFixedArray, and that only allows keys `0..n-1`)

Regards,
Tyson
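
The nested-loop concern in point 1 can be reproduced with plain generators today. The sketch below uses only built-in PHP behavior; the `squares()` helper and the variable names are hypothetical, and a plain array of pairs stands in for the eagerly evaluated result that a `map()`-style function (not part of current PHP) might return.

```php
<?php
declare(strict_types=1);

// Hypothetical single-use source.
function squares(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i => $i * $i;
    }
}

// A finished generator cannot be traversed a second time.
$gen = squares(3);
foreach ($gen as $value) {
    // first pass consumes the generator
}
$secondPassFailed = false;
try {
    foreach ($gen as $value) {
        // never reached
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}

// An eagerly materialized copy can be nested freely.
$pairs = [];
foreach (squares(3) as $key => $value) {
    $pairs[] = [$key, $value];
}

$orderedPairs = 0;
foreach ($pairs as [$k1, $v1]) {
    foreach ($pairs as [$k2, $v2]) {
        if ($v1 < $v2) {
            $orderedPairs++;
        }
    }
}
```

The same nested loop written directly against `$gen` would fail on the inner `foreach`, which is the class of bug the eager design is meant to rule out for callers that receive a generic iterable.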
  114798
June 9, 2021 14:55 internals@lists.php.net ("Levi Morrison via internals")
On Wed, Jun 9, 2021 at 8:12 AM tyson andre <tysonandre775@hotmail.com> wrote:
> > > Based on a recent comment you made on GitHub, it seems like
> > > `CachedIterable` eagerly creates the datastore instead of doing so
> > > on-demand. Is this correct?
> >
> > Sorry, yes, that's correct and pointed out in the RFC.
> >
> > I think that's a significant implementation flaw. I don't see why we'd
> > balloon memory usage unnecessarily by being eager -- if an operation
> > needs to fetch more data then it can go ahead and do so.
I think you misunderstood my complaint because of the other
conversation on GitHub. CachedIterable should load from the underlying
datastore lazily -- there is hardly any visible impact from the user
if this happens, because for the most part it looks and behaves the
same as it does today. The only visible changes are around loading
data from the underlying iterable.

For example, if the user calls the count method on the CachedIterable,
it would then load the remainder of the underlying data-store (and
then drop its reference to it). If the user asks for valueAt($n) and
it's beyond what's already loaded and we haven't finished consuming
the underlying iterable, then it would load until $n is found or the
end of the store is reached.

I understand your concerns with `map`, `filter`, etc. CachedIterable
is different because it holds onto the data, can be iterated over more
than once, including the two nested loop cases, _even if it loads data
from the underlying iterable on demand_.
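
The on-demand behavior described above can be sketched in userland roughly as follows. This is only an illustration under stated assumptions, not code from the RFC or from either participant: `LazyPairs` and its private `loadNext()` helper are hypothetical names, and PHP 8.0 or later is assumed for the `mixed` return type.

```php
<?php
// Hypothetical sketch of a cache that pulls from its source only when needed.
final class LazyPairs implements IteratorAggregate, Countable
{
    private array $pairs = [];      // pairs loaded so far: [[key, value], ...]
    private ?Iterator $source;      // null once the source is fully consumed
    private bool $started = false;

    public function __construct(iterable $source)
    {
        // Nothing is consumed yet; the input is only normalized to an Iterator.
        $this->source = is_array($source)
            ? new ArrayIterator($source)
            : ($source instanceof Iterator ? $source : new IteratorIterator($source));
    }

    // Pull one more pair from the source; returns false once it is exhausted.
    private function loadNext(): bool
    {
        if ($this->source === null) {
            return false;
        }
        if ($this->started) {
            $this->source->next();
        } else {
            $this->source->rewind();
            $this->started = true;
        }
        if (!$this->source->valid()) {
            $this->source = null; // drop the reference to the source
            return false;
        }
        $this->pairs[] = [$this->source->key(), $this->source->current()];
        return true;
    }

    public function getIterator(): Iterator
    {
        // Serve cached pairs first and load more only when running past the cache.
        for ($i = 0; $i < count($this->pairs) || $this->loadNext(); $i++) {
            yield $this->pairs[$i][0] => $this->pairs[$i][1];
        }
    }

    public function count(): int
    {
        while ($this->loadNext()) {
            // counting requires consuming the remainder of the source
        }
        return count($this->pairs);
    }

    public function valueAt(int $offset): mixed
    {
        while ($offset >= count($this->pairs) && $this->loadNext()) {
            // load until the offset is available or the source ends
        }
        if ($offset < 0 || $offset >= count($this->pairs)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->pairs[$offset][1];
    }
}

// Track how many elements the source has actually produced.
$produced = 0;
$source = (function () use (&$produced) {
    foreach ([10, 20, 30, 40] as $n) {
        $produced++;
        yield $n;
    }
})();

$lazy = new LazyPairs($source);
$producedAtStart = $produced;        // nothing consumed at construction
$second = $lazy->valueAt(1);         // loads only the first two elements
$producedAfterValueAt = $produced;
$total = count($lazy);               // loads the remainder

$pairsSeen = 0;
foreach ($lazy as $v1) {
    foreach ($lazy as $v2) {
        $pairsSeen++;
    }
}
```

The extra `$source` and `$started` properties and the checks in `loadNext()` are the kind of per-instance bookkeeping that an eager copy does not need, which is the trade-off argued over in this thread.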
  114799
June 9, 2021 15:35 phpmailinglists@gmail.com (Peter Bowyer)
On Wed, 9 Jun 2021 at 15:55, Levi Morrison via internals <
internals@lists.php.net> wrote:

> On Wed, Jun 9, 2021 at 8:12 AM tyson andre <tysonandre775@hotmail.com>
> wrote:
> >
> > Hi Levi Morrison,
> >
> > > > > Hi internals,
> > > > >
Would participants please trim the emails they're quoting? It makes it easier for readers to focus on what's being discussed in emails.

Thanks,
Peter
  114805
June 10, 2021 05:05 drealecs@gmail.com (Alexandru Pătrănescu)
On Wed, Jun 9, 2021 at 3:22 AM tyson andre <tysonandre775@hotmail.com>
wrote:

> Hi internals,
>
> > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
> > which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from
>
> A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
>
> Any other feedback on CachedIterable?
>
> Thanks,
> Tyson

Hi Tyson,

Thanks for explaining 4 months ago about my concern.
I think I understand the main real impact of an eager iterable cache vs a lazy iterable cache from a functional point of view:
- exceptions are thrown during construction vs during the first iteration
- predictable performance also on the first iteration.

How did you gather the information that eager implementation is more valuable than lazy one? I'm mostly curious also how to assess this as technically to me it also looks the other way around. Maybe mention that in the RFC.
I was even thinking that CachedIterable should be lazy and an EagerCachedIterable would be built upon that with more methods. Or have it in the same class with a constructor parameter.

Also, being able to have a perfect userland implementation, not very complex, even considering the lower performance, is not that good for positive voting from what I remember from history...

Regards,
Alex
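
The first difference listed above (where an exception surfaces) can be shown with a small sketch. The `rows()` generator is a hypothetical source that fails partway through, and `iterator_to_array()` stands in for any eager copy; neither is taken from the RFC.

```php
<?php
// Hypothetical source that fails after producing one element.
function rows(): Generator
{
    yield 'first';
    throw new RuntimeException('backend failed');
}

// Eager: the failure surfaces at the point where the copy is made.
$eagerFailedAt = null;
try {
    $copy = iterator_to_array(rows(), false);
} catch (RuntimeException $e) {
    $eagerFailedAt = 'construction';
}

// Lazy: creating the generator runs none of its body,
// so the failure surfaces later, inside whichever loop consumes it.
$lazy = rows();
$lazyFailedAt = null;
$seen = [];
try {
    foreach ($lazy as $row) {
        $seen[] = $row;
    }
} catch (RuntimeException $e) {
    $lazyFailedAt = 'iteration';
}
```

In the lazy case the consumer has already processed `'first'` by the time the exception arrives, so its error handling has to cope with partial results.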
  114812
June 10, 2021 14:16 tysonandre775@hotmail.com (tyson andre)
Hi Alex,

> > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
> > which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from
> >
> > A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
> >
> > Any other feedback on CachedIterable?
>
> Thanks for explaining 4 months ago about my concern.
> I think I understand the main real impact of an eager iterable cache vs a lazy iterable cache from a functional point of view:
> - exceptions are thrown during construction vs during the first iteration
> - predictable performance also on the first iteration.
>
> How did you gather the information that eager implementation is more valuable than lazy one? I'm mostly curious also how to assess this as technically to me it also looks the other way around. Maybe mention that in the RFC.
> I was even thinking that CachedIterable should be lazy and an EagerCachedIterable would be built upon that with more methods. Or have it in the same class with a constructor parameter.
One of the reasons was size/efficiency. Supporting lazy evaluation would require extra properties to track internal state, extra checks at runtime,
and a reference to the original iterable and the functions being applied to that iterable,
so an application that creates lots of small/empty cached iterables would have a higher memory usage.

A data structure that tries to do everything would do other things poorly
(potentially not supporting serialization, using more memory than necessary, behaving unintuitively under var_export/var_dump,
surprisingly throwing when being iterated over, etc.)
> Also, being able to have a perfect userland implementation, not very complex, even considering the lower performance, is not that good for positive voting from what I remember from history...
1. The userland polyfill included in the RFC is an incomplete implementation that only supports iteration.
   It's meant to be as fast as possible at the cost of memory usage.
   It's not even an IteratorAggregate, and it doesn't support JSON encoding, fromPairs, and many other functions.
2. Virtually all of the SPL iterables that don't deal with filesystems can be reimplemented in userland. (https://en.wikipedia.org/wiki/Turing_completeness)
   Even complicated extensions such as redis or memcached can be reimplemented in userland on top of sockets, but with higher CPU usage than native extensions
   (https://github.com/predis/predis/blob/main/FAQ.md#predis-is-a-pure-php-implementation-it-can-not-be-fast-enough)

The benefit of having data structures internally is that developers who learn them can use them in any project without adding dependencies (even in single-file scripts),
and that applications using CachedIterable would have much better performance.

Also, you and Levi have pointed out that iterable/iterator functionality is traditionally on-demand (https://en.wikipedia.org/wiki/Lazy_evaluation)
(e.g. iterables such as CallbackFilterIterator, RecursiveArrayIterator, etc)

As a result, I'm thinking CachedIterable is really not a good name for the eagerly evaluated data structure I'm proposing here,
and that there was confusion about how the data structure behaved when the name CachedIterable was suggested.
If functionality like that described in https://externals.io/message/114805#114792 was added, it could use the name CachedIterable instead.

So I'm probably changing this to `ImmutableTraversable` as a short name for the functionality,
to make it clear arguments are eagerly evaluated when it is created.
(ImmutableSequence may be expected to only contain values, and would be confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)

Thanks,
Tyson
  114814
June 10, 2021 14:39 pierre-php@processus.org (Pierre)
On 10/06/2021 at 16:16, tyson andre wrote:
> So I'm probably changing this to `ImmutableTraversable` as a short name for the functionality,
> to make it clear arguments are eagerly evaluated when it is created.
> (ImmutableSequence may be expected to only contain values, and would be confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)

Hello,

And why not simply RewindableIterator? Isn't it the prominent feature
of it?

Agreed it's immutable, but a lot of traversables could be as well.

Regards,

--

Pierre
  114819
June 10, 2021 15:54 internals@lists.php.net ("Levi Morrison via internals")
On Thu, Jun 10, 2021 at 8:40 AM Pierre <pierre-php@processus.org> wrote:
>
> On 10/06/2021 at 16:16, tyson andre wrote:
> > So I'm probably changing this to `ImmutableTraversable` as a short name for the functionality,
> > to make it clear arguments are eagerly evaluated when it is created.
> > (ImmutableSequence may be expected to only contain values, and would be confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)
>
> Hello,
>
> And why not simply RewindableIterator ? Isn't it the prominent feature
> of it ?
>
> Agreed it's immutable, but a lot of traversable could be as well.
>
> Regards,
>
> --
>
> Pierre
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>
All iterators are "rewindable", though of course not in practice. I
would avoid such names because we may eventually add an interface
which works as a "tag" to say "yes, I actually do support rewinding."

The property of being rewindable comes from it being cached. Maybe
`CachedAggregate`? Aggregates are data structures from which an
external iterator can be obtained, so it makes a bit more sense if
it's eager.
  114834
June 12, 2021 15:20 tysonandre775@hotmail.com (tyson andre)
Hi internals,

> > > So I'm probably changing this to `ImmutableTraversable` as a short name for the functionality,
> > > to make it clear arguments are eagerly evaluated when it is created.
> > > (ImmutableSequence may be expected to only contain values, and would be confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)
> >
> > Hello,
> >
> > And why not simply RewindableIterator ? Isn't it the prominent feature
> > of it ?
> >
> > Agreed it's immutable, but a lot of traversable could be as well.
>
> All iterators are "rewindable", though of course not in practice. I
> would avoid such names because we may eventually add an interface
> which works as a "tag" to say "yes, I actually do support rewinding."
>
> The property of being rewindable comes from it being cached. Maybe
> `CachedAggregate`? Aggregates are data structures from which an
> external iterator can be obtained, so it makes a bit more sense if
> it's eager.
I think CachedAggregate would have problems with an unclear meaning similar to those that were raised previously in https://externals.io/message/114819#114798
(Some developers would think it may refer to the act of lazily evaluating the iterable, i.e. caching it on demand to access later.)

https://en.wikipedia.org/wiki/Aggregate on its own refers to a collection of objects/values or, in other contexts,
to functions such as count/sum/min/max (https://en.wikipedia.org/wiki/Aggregate_function).
In other contexts such as set theory, there might not be keys associated with the values, so aggregate on its own seems unclear.

ImmutableIteratorAggregate or just ImmutableIterable/ImmutableTraversable makes more sense than `Cached*` to me.

ImmutableKeyValueSequence is an even shorter name than ImmutableIteratorAggregate and describes what the data structure is.

Thanks,
Tyson