[RFC] [VOTE] Random Extension 5.x

  117939
June 14, 2022 00:01 g-kudo@colopl.co.jp (Go Kudo)
Hello internals.

Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28
00:00:00 (UTC).

https://wiki.php.net/rfc/rng_extension

The implementation is not yet complete and has some issues.
See TODO in Pull Request for details.

https://github.com/php/php-src/pull/8094

Best Regards,
Go Kudo
  117955
June 15, 2022 17:23 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/14/22 02:01, Go Kudo wrote:
> Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28 > 00:00:00 (UTC). > > https://wiki.php.net/rfc/rng_extension > > The implementation is not yet complete and has some issues. > See TODO in Pull Request for details. > > https://github.com/php/php-src/pull/8094 >
Unfortunately the vote has already started and I'm not sure if that's a change that might change the outcome of the vote, but while looking through the implementation once more I noticed that the engine implementations are not 'final' (and extending those engines is actually tested with the existing tests). However I believe they should be final: a) I generally believe that it's a best practice to make everything 'final' by default. b) It's easily possible to use composition with engines, as the interface only has a single method. c) Especially for 'Random\Engine\Secure' I believe that allowing subclassing is actively harmful, as basically any adjustment of the engine's behavior violates the contract that the engine returns cryptographically secure randomness. But also for other engines changing the behavior also changes the implied behavior given by the engine's name. What do you think? Best regards Tim Düsterhus
  117960
June 16, 2022 08:24 g-kudo@colopl.co.jp (Go Kudo)
2022年6月16日(木) 2:23 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/14/22 02:01, Go Kudo wrote: > > Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28 > > 00:00:00 (UTC). > > > > https://wiki.php.net/rfc/rng_extension > > > > The implementation is not yet complete and has some issues. > > See TODO in Pull Request for details. > > > > https://github.com/php/php-src/pull/8094 > > > > Unfortunately the vote has already started and I'm not sure if that's a > change that might change the outcome of the vote, but while looking > through the implementation once more I noticed that the engine > implementations are not 'final' (and extending those engines is actually > tested with the existing tests). > > However I believe they should be final: > > a) I generally believe that it's a best practice to make everything > 'final' by default. > > b) It's easily possible to use composition with engines, as the > interface only has a single method. > > c) Especially for 'Random\Engine\Secure' I believe that allowing > subclassing is actively harmful, as basically any adjustment of the > engine's behavior violates the contract that the engine returns > cryptographically secure randomness. But also for other engines changing > the behavior also changes the implied behavior given by the engine's name.. > > What do you think? > > Best regards > Tim Düsterhus >
Hi Tim
> However I believe they should be final
That is correct, indeed. The interface is already provided and creating a composition is easy. However, the voting has already started. It would be impossible to edit the RFC now. Fortunately, the Feature Freeze for PHP 8.2 is 7/19. Even after the current Random Extension 5.x RFC voting is over, there is still time to create and vote on RFCs to make changes. I will create an additional RFC like PHP 8.0 Attribute. Regards Go Kudo
  117962
June 16, 2022 10:14 g-kudo@colopl.co.jp (Go Kudo)
2022年6月14日(火) 9:01 Go Kudo <g-kudo@colopl.co.jp>:

> Hello internals. > > Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28 > 00:00:00 (UTC). > > https://wiki.php.net/rfc/rng_extension > > The implementation is not yet complete and has some issues. > See TODO in Pull Request for details. > > https://github.com/php/php-src/pull/8094 > > Best Regards, > Go Kudo >
Hello internals. Thank you for voting for RFC. Now, as Tim pointed out *1, there are several problems with the current RFC implementation. However, the RFC is already in the voting phase, and it is not advisable to change the content now. I have drafted a new RFC to fix this. https://wiki.php.net/rfc/random_extension_improvement I was hesitant to create a new RFC page while the current RFC was still in the early voting stage, but I thought it was time to do it now, given the lack of progress in previous discussions and the upcoming PHP 8.2 Feature Freeze. If the content is acceptable, we would like to change the status of the RFC to Under Discussion and make an announcement thread to internals ML. Can anyone review the content? *1: https://externals.io/message/117939#117955 Best Regards Go Kudo
  117963
June 16, 2022 12:52 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/16/22 12:14, Go Kudo wrote:
> If the content is acceptable, we would like to change the status of the RFC > to Under Discussion and make an > announcement thread to internals ML. Can anyone review the content? >
(1) Engines should be final: That was my suggestion and that paragraph looks good to me. (2) array_rand(): That looks good to me. For the method name: Perhaps ->pickArrayKey() for similarity with ->shuffleArray() which also includes the type it operates on in the name? (3) Naming of PCG64: I must admit that the fact that PCG is a full family of similar, but not identical generators is one thing that made me (and still makes me) prefer the xoshiro family which has clearer names for its variants. It was also pretty hard to find the PCG definitions on the PCG website, but I believe that it refers to this: https://www.pcg-random.org/using-pcg-c.html#low-level-api? In that case PCG64S would be consistent with the upstream high level API name. I am not sure if Pcg64s would be more readable, though. It would definitely need a good explanation in the documentation which exact variant it is, though. (4) Randomizer::randomString($charset, $length) If we extend the Randomizer anyway, I wonder if we also should use the opportunity to add a method to generate a random string for a given charset, e.g. ->randomString('0123456789abcdef', 6) would return a random hexadecimal color code. This does not yet exist in the standard library, but I am seeing that many libraries implement something like that on their own, making it a useful addition in my mind. Best regards Tim Düsterhus
  117966
June 16, 2022 13:37 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/16/22 14:52, Tim Düsterhus wrote:
> (3) Naming of PCG64: > > I must admit that the fact that PCG is a full family of similar, but not > identical generators is one thing that made me (and still makes me) > prefer the xoshiro family which has clearer names for its variants. > > It was also pretty hard to find the PCG definitions on the PCG website, > but I believe that it refers to this: > > https://www.pcg-random.org/using-pcg-c.html#low-level-api? > > In that case PCG64S would be consistent with the upstream high level API > name. I am not sure if Pcg64s would be more readable, though. > > It would definitely need a good explanation in the documentation which > exact variant it is, though.
I've now had a look at the Paper [1], because I wanted to find out what the various bits and pieces within the full oneseq-128-xsl-rr-64 name mean and in the paper I came across section "6.3 Specific Implementations" which notes:
> The library provides named generators based on > their properties, not their underlying implementations (e.g., pcg32_unique for a > general-purpose 32-bit generator with a unique stream). That way, when future family > members that perform even better are discovered and added (hopefully due to the > discoveries of others), users can switch seamlessly over to them.
This implies that the official Pcg64s name might refer to a different implementation in the future. This makes it hard for PHP to keep the compatibility guarantee that a specific engine will always refer to a specific well-defined RNG. This implies that the engine needs to be named "PcgOneseq128XslRr64" to accurately describe the implementation. This - of course - is absolutely unwieldy. Note sure what the best course of action is here. [1] https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf Best regards Tim Düsterhus
  117976
June 17, 2022 12:52 g-kudo@colopl.co.jp (Go Kudo)
2022年6月16日(木) 22:37 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/16/22 14:52, Tim Düsterhus wrote: > > (3) Naming of PCG64: > > > > I must admit that the fact that PCG is a full family of similar, but not > > identical generators is one thing that made me (and still makes me) > > prefer the xoshiro family which has clearer names for its variants. > > > > It was also pretty hard to find the PCG definitions on the PCG website, > > but I believe that it refers to this: > > > > https://www.pcg-random.org/using-pcg-c.html#low-level-api? > > > > In that case PCG64S would be consistent with the upstream high level API > > name. I am not sure if Pcg64s would be more readable, though. > > > > It would definitely need a good explanation in the documentation which > > exact variant it is, though. > > I've now had a look at the Paper [1], because I wanted to find out what > the various bits and pieces within the full oneseq-128-xsl-rr-64 name > mean and in the paper I came across section "6.3 Specific > Implementations" which notes: > > > The library provides named generators based on > > their properties, not their underlying implementations (e.g., > pcg32_unique for a > > general-purpose 32-bit generator with a unique stream). That way, when > future family > > members that perform even better are discovered and added (hopefully due > to the > > discoveries of others), users can switch seamlessly over to them. > > This implies that the official Pcg64s name might refer to a different > implementation in the future. This makes it hard for PHP to keep the > compatibility guarantee that a specific engine will always refer to a > specific well-defined RNG. This implies that the engine needs to be > named "PcgOneseq128XslRr64" to accurately describe the implementation. > This - of course - is absolutely unwieldy. Note sure what the best > course of action is here. > > [1] https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf > > Best regards > Tim Düsterhus >
Hello Tim Thank you for your continued support. (1) Thanks! (2) You are indeed correct. I renamed it to pickArrayKey(). (3) This is a very tricky problem. As you point out, it took me a while to figure out PCG64 too. However, after statistical testing PCG64 (pcg_oneseq-128-xsl-rr-64) seems to do a good job. (Though Xoshiro256** does as well, and has a more obvious name.) Regarding the naming issue, I have looked at implementations in other languages (Python's NumPy and Rust) and none of them seem to be very clear. I agree with you about keeping the names clear. Although it is complicated, I don't think anyone would be particularly bothered by the name PcgOneseq128XslRr64. However, some people may continue to use MT because they don't understand it well. I think the possible options are as follows A. Rename PCG64 to PcgOneseq128XslRr64 B. Rename PCG64 to PcgOneseq128XslRr64 and then re-implement the more obvious Xoshiro256StarStar C. Remove PCG64 and re-implement Xoshiro256StarStart Personally, I think B is the best. What do you think? Regards Go Kudo
  117982
June 17, 2022 15:54 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/17/22 14:52, Go Kudo wrote:
> (3) This is a very tricky problem. As you point out, it took me a while to > figure out PCG64 too. > However, after statistical testing PCG64 (pcg_oneseq-128-xsl-rr-64) seems > to do a good job. > (Though Xoshiro256** does as well, and has a more obvious name.) > > Regarding the naming issue, I have looked at implementations in other > languages > (Python's NumPy and Rust) and none of them seem to be very clear. > > I agree with you about keeping the names clear. Although it is complicated, > I don't think anyone would be particularly bothered by the name > PcgOneseq128XslRr64. > However, some people may continue to use MT because they don't understand > it well. > > I think the possible options are as follows > > A. Rename PCG64 to PcgOneseq128XslRr64 > B. Rename PCG64 to PcgOneseq128XslRr64 and then re-implement the more > obvious Xoshiro256StarStar > C. Remove PCG64 and re-implement Xoshiro256StarStart > > Personally, I think B is the best. > > What do you think? >
Any of those would be fine for me. You can either make a decision as the RFC author and include that into a general vote, or you could split this into separate votes (each with a 2/3 decision): - Add xoshiro256** as Xoshiro256StarStar? - If xoshiro256** is added: Remove PCG64? - If PCG64 stays: Rename PCG64 to PcgOneseq128XslRr64? Best regards Tim Düsterhus PS: I believe you missed my suggestion (4) Randomizer::randomString($charset, $length).
  117985
June 17, 2022 17:04 g-kudo@colopl.co.jp (Go Kudo)
2022年6月14日(火) 9:01 Go Kudo <g-kudo@colopl.co.jp>:

> Hello internals. > > Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28 > 00:00:00 (UTC). > > https://wiki.php.net/rfc/rng_extension > > The implementation is not yet complete and has some issues. > See TODO in Pull Request for details. > > https://github.com/php/php-src/pull/8094 > > Best Regards, > Go Kudo >
Hi. RFC has been updated. Includes corrections to areas pointed out by Tim and changes MersenneTwister to MT19937. I also made it possible to vote for each item. https://wiki.php.net/rfc/random_extension_improvement How about it? for Tim:
> I believe you missed my suggestion (4)
My apologies! I had completely missed that. That new feature sounds good to me. But, I think the method name `pickString()` would be better. (It is interoperable with `pickArrayKey()`) Added to the RFC.
  117986
June 17, 2022 17:14 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/17/22 19:04, Go Kudo wrote:
> RFC has been updated. Includes corrections to areas pointed out by Tim and > changes MersenneTwister to MT19937. > I also made it possible to vote for each item.
I suggest to split the "PCG is not so famous" vote into 2 votes to make it clear how exactly the majority is calculated and to have a clear primary vote as indicated in https://wiki.php.net/rfc/voting#required_majority
> https://wiki.php.net/rfc/random_extension_improvement > > How about it? > > for Tim: > >> I believe you missed my suggestion (4) > > My apologies! I had completely missed that. > That new feature sounds good to me. But, I think the method name > `pickString()` would be better. (It is interoperable with `pickArrayKey()`)
I don't think that ->pickString() is a good name, because it is not really comparable to pickArrayKey(): pickArrayKey() will return each key only once. It is more comparable to: substr($r->shuffleString('0123456789abcdef'), 0, 6) My proposed ->randomString() must be able to return a character multiple times. If you don't like ->randomString(), I have an alternative suggestion: ->stringFromCharset() Best regards Tim Düsterhus
  117988
June 17, 2022 17:28 g-kudo@colopl.co.jp (Go Kudo)
2022年6月18日(土) 2:14 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/17/22 19:04, Go Kudo wrote: > > RFC has been updated. Includes corrections to areas pointed out by Tim > and > > changes MersenneTwister to MT19937. > > I also made it possible to vote for each item. > > I suggest to split the "PCG is not so famous" vote into 2 votes to make > it clear how exactly the majority is calculated and to have a clear > primary vote as indicated in > > https://wiki.php.net/rfc/voting#required_majority > > > https://wiki.php.net/rfc/random_extension_improvement > > > > How about it? > > > > for Tim: > > > >> I believe you missed my suggestion (4) > > > > My apologies! I had completely missed that. > > That new feature sounds good to me. But, I think the method name > > `pickString()` would be better. (It is interoperable with > `pickArrayKey()`) > > I don't think that ->pickString() is a good name, because it is not > really comparable to pickArrayKey(): pickArrayKey() will return each key > only once. It is more comparable to: > > substr($r->shuffleString('0123456789abcdef'), 0, 6) > > My proposed ->randomString() must be able to return a character multiple > times. > > If you don't like ->randomString(), I have an alternative suggestion: > ->stringFromCharset() > > Best regards > Tim Düsterhus >
> "PCG is not so famous" vote into 2 votes
My apologies. This is a complete mistake. Since PCG64 has already clarified its implementation in an earlier RFC, removing it does not seem to be an option. The item has been removed.
> I don't think that ->pickString() is a good name
I see. But I think `randomString()` is ambiguous with `getBytes()`. `stringFromCharset(string $string, int $num): string` solves that, but I think it is possible that the meaning of "char" is not well known in the PHP world (although I think this name is most appropriate) How about adding an optional `?int $num` argument to `shuffleString(string $string, ?int $num): string`? Regards Go Kudo
  117989
June 17, 2022 17:37 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/17/22 19:28, Go Kudo wrote
>> I don't think that ->pickString() is a good name > > I see. But I think `randomString()` is ambiguous with `getBytes()`. > > `stringFromCharset(string $string, int $num): string` solves that, but I > think it is possible > that the meaning of "char" is not well known in the PHP world (although I > think this name is most appropriate)
->stringFromAlphabet()?
> How about adding an optional `?int $num` argument to `shuffleString(string > $string, ?int $num): string`? >
No, because it would be pretty unclear what that `$num` argument would do there. It specifically would be different from the `$num` of `pickArrayKey()`. pickArrayKey() returns every key only once. Generating a string from a given charset may return the same character multiple times. Don't overload a single method with too many purposes. Best regards Tim Düsterhus
  117990
June 17, 2022 17:46 g-kudo@colopl.co.jp (Go Kudo)
2022年6月18日(土) 2:37 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/17/22 19:28, Go Kudo wrote > >> I don't think that ->pickString() is a good name > > > > I see. But I think `randomString()` is ambiguous with `getBytes()`. > > > > `stringFromCharset(string $string, int $num): string` solves that, but I > > think it is possible > > that the meaning of "char" is not well known in the PHP world (although I > > think this name is most appropriate) > > ->stringFromAlphabet()? > > > How about adding an optional `?int $num` argument to > `shuffleString(string > > $string, ?int $num): string`? > > > > No, because it would be pretty unclear what that `$num` argument would > do there. It specifically would be different from the `$num` of > `pickArrayKey()`. pickArrayKey() returns every key only once. Generating > a string from a given charset may return the same character multiple > times. Don't overload a single method with too many purposes. > > Best regards > Tim Düsterhus >
I was fundamentally wrong, I understand now. As you said, there was no interoperability with `pickArrayKey()` in the first place...
> stringFromAlphabet()
Hmmm. I guess randomString would be better then. At the same time, it would be nice to have an array version of randomArray. However, I don't want to add more methods without any thought. I think operations that can be done on userland should be done on userland. That is why I did not implement the array_rand() function in the first place. I would like to hear other people's opinions in this area. Incidentally, our own PHP implementation of the library (Xorshift128+) has equivalents for arrayPickKey, stringPick, arrayPickValue, randomArray, randomString. And it is useful. Regards Go Kudo
  117991
June 17, 2022 18:13 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/17/22 19:46, Go Kudo wrote:
> I was fundamentally wrong, I understand now. > As you said, there was no interoperability with `pickArrayKey()` in the > first place... > >> stringFromAlphabet() > > Hmmm. I guess randomString would be better then. At the same time, it would > be nice to have an array version of randomArray. > > However, I don't want to add more methods without any thought. > I think operations that can be done on userland should be done on userland. > That is why I did not implement the array_rand() function in the first > place.
Yes, I agree here. But I believe that "generate a string with a given alphabet" is a very common operation that would be useful to include in the standard library. In any case it's better to leave something out than to implement something badly, so if you don't feel comfortable with that, then leave it out. There will be more PHP versions after 8.2. For me both of these: ->randomString(string $alphabet, int $length) ->stringFromAlphabet(string $alphabet, int $length) with the description "Return a string of $length characters selected from the given $alphabet. Characters may be selected more than once". would be acceptable names. Best regards Tim Düsterhus
  117992
June 17, 2022 18:37 g-kudo@colopl.co.jp (Go Kudo)
2022年6月18日(土) 3:13 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/17/22 19:46, Go Kudo wrote: > > I was fundamentally wrong, I understand now. > > As you said, there was no interoperability with `pickArrayKey()` in the > > first place... > > > >> stringFromAlphabet() > > > > Hmmm. I guess randomString would be better then. At the same time, it > would > > be nice to have an array version of randomArray. > > > > However, I don't want to add more methods without any thought. > > I think operations that can be done on userland should be done on > userland. > > That is why I did not implement the array_rand() function in the first > > place. > > Yes, I agree here. But I believe that "generate a string with a given > alphabet" is a very common operation that would be useful to include in > the standard library. In any case it's better to leave something out > than to implement something badly, so if you don't feel comfortable with > that, then leave it out. There will be more PHP versions after 8.2. > > For me both of these: > > ->randomString(string $alphabet, int $length) > ->stringFromAlphabet(string $alphabet, int $length) > > with the description "Return a string of $length characters selected > from the given $alphabet. Characters may be selected more than once". > > would be acceptable names. > > Best regards > Tim Düsterhus >
I noticed one important thing. "String" in PHP means binary, and operating on multibyte characters often causes problems. Although I rarely deal with multibyte characters in my work, this can probably be a big problem for Japanese PHP users like myself. To work around this, the mbstring extension must be used properly, but since mbstring is not built-in, it is appropriate to implement it in userland. For the reasons stated above, we will abandon the addition of new methods. sorry. `str_shuffle()` and `shuffleString()` already have similar problems. So perhaps an alternative method name for `str_shuffle()` might be `bytesShuffle()` instead of `stringShuffle()`. Regards Go Kudo
  117993
June 17, 2022 19:16 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/17/22 20:37, Go Kudo wrote:
> For the reasons stated above, we will abandon the addition of new methods. > sorry.
Okay, that's fine for me, no problem.
> `str_shuffle()` and `shuffleString()` already have similar problems. So > perhaps an alternative > method name for `str_shuffle()` might be `bytesShuffle()` instead of > `stringShuffle()`. >
No opinion there. To make sure that the updates are voted on in time I suggest that you open the official 2-week discussion with a dedicated thread. Best regards Tim Düsterhus
  117995
June 17, 2022 19:44 g-kudo@colopl.co.jp (Go Kudo)
2022年6月18日(土) 4:16 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/17/22 20:37, Go Kudo wrote: > > For the reasons stated above, we will abandon the addition of new > methods. > > sorry. > > Okay, that's fine for me, no problem. > > > `str_shuffle()` and `shuffleString()` already have similar problems. So > > perhaps an alternative > > method name for `str_shuffle()` might be `bytesShuffle()` instead of > > `stringShuffle()`. > > > > No opinion there. > > To make sure that the updates are voted on in time I suggest that you > open the official 2-week discussion with a dedicated thread. > > Best regards > Tim Düsterhus >
OK, My greatest thanks to you for your cooperation! Best regards Go Kudo
  118112
June 28, 2022 01:04 g-kudo@colopl.co.jp (Go Kudo)
2022年6月14日(火) 9:01 Go Kudo <g-kudo@colopl.co.jp>:

> Hello internals. > > Voting began on 2022-06-14 00:00:00 (UTC) and will end on 2022-06-28 > 00:00:00 (UTC). > > https://wiki.php.net/rfc/rng_extension > > The implementation is not yet complete and has some issues. > See TODO in Pull Request for details. > > https://github.com/php/php-src/pull/8094 > > Best Regards, > Go Kudo >
Hello Internals. Voting is now closed as the deadline has passed. This RFC accepted with 20 yes / 0 no. https://wiki.php.net/rfc/rng_extension Meanwhile, a new RFC will be voted on to improve the content of the RFC. https://wiki.php.net/rfc/random_extension_improvement https://externals.io/message/117994 Thank you for your continued support. Best regards Go Kudo