[RFC] [Vote] Pre-vote announcement for Random Extension 5.x

  117840
May 31, 2022 09:54 g-kudo@colopl.co.jp (Go Kudo)
Hi internals.

Although I have explained that due to the global turmoil I will delay
voting on the RFC as long as possible, it is time to start voting on the
RFC in order to get the implementation to full status by the PHP 8.2
Feature Freeze.

I apologize for the delay in responding to the points you had already
pointed out. It has been addressed as follows.

- Random\Engine\PCG64::__construct() now takes an `int|string` $inc as its
second argument
    - This makes it fully compatible with the PCG64 original implementation
- Fixed an algorithm implementation error in PCG64
- Fixed compatibility issues with PHP 8.2 in test cases
- More detailed description in RFC

Previous discussion thread: https://externals.io/message/117295

Voting will begin at 2022-06-14 00:00:00 (UTC).

https://wiki.php.net/rfc/rng_extension

Regards,
Go Kudo
  117851
June 4, 2022 09:02 come@chilliet.eu (=?ISO-8859-1?Q?C=F4me_Chilliet?=)
Le 31 mai 2022 11:54:18 GMT+02:00, Go Kudo <g-kudo@colopl.co.jp> a écrit :
>Hi internals. > >Although I have explained that due to the global turmoil I will delay >voting on the RFC as long as possible, it is time to start voting on the >RFC in order to get the implementation to full status by the PHP 8.2 >Feature Freeze. > >I apologize for the delay in responding to the points you had already >pointed out. It has been addressed as follows. > >- Random\Engine\PCG64::__construct() now takes an `int|string` $inc as its >second argument > - This makes it fully compatible with the PCG64 original implementation >- Fixed an algorithm implementation error in PCG64 >- Fixed compatibility issues with PHP 8.2 in test cases >- More detailed description in RFC > >Previous discussion thread: https://externals.io/message/117295 > >Voting will begin at 2022-06-14 00:00:00 (UTC). > >https://wiki.php.net/rfc/rng_extension > >Regards, >Go Kudo
The first section starts with "However", which sounds like an error from a text reorganization? Also: "adding any randomization functions between the seeding the intended usage would break the code. " -> missing "and" I suppose "Even with JIT enabled, the native implementation is not far behind. " -> it *is* far behind no? It's not even behind but ahead as I see it so maybe I just misunderstand this comment. Côme
  117863
June 7, 2022 11:03 g-kudo@colopl.co.jp (Go Kudo)
2022年6月4日(土) 18:03 Côme Chilliet <come@chilliet.eu>:

> Le 31 mai 2022 11:54:18 GMT+02:00, Go Kudo <g-kudo@colopl.co.jp> a écrit : > >Hi internals. > > > >Although I have explained that due to the global turmoil I will delay > >voting on the RFC as long as possible, it is time to start voting on the > >RFC in order to get the implementation to full status by the PHP 8.2 > >Feature Freeze. > > > >I apologize for the delay in responding to the points you had already > >pointed out. It has been addressed as follows. > > > >- Random\Engine\PCG64::__construct() now takes an `int|string` $inc as its > >second argument > > - This makes it fully compatible with the PCG64 original > implementation > >- Fixed an algorithm implementation error in PCG64 > >- Fixed compatibility issues with PHP 8.2 in test cases > >- More detailed description in RFC > > > >Previous discussion thread: https://externals.io/message/117295 > > > >Voting will begin at 2022-06-14 00:00:00 (UTC). > > > >https://wiki.php.net/rfc/rng_extension > > > >Regards, > >Go Kudo > > The first section starts with "However", which sounds like an error from a > text reorganization? > Also: > "adding any randomization functions between the seeding the intended usage > would break the code. " -> missing "and" I suppose > "Even with JIT enabled, the native implementation is not far behind. " -> > it *is* far behind no? It's not even behind but ahead as I see it so maybe > I just misunderstand this comment. > > Côme > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: https://www.php.net/unsub.php > >
> The first section starts with "However", which sounds like an error from a text reorganization?
> "adding any randomization functions between the seeding the intended usage would break the code. " -> missing "and" I suppose
Thanks, I fixed it.
> "Even with JIT enabled, the native implementation is not far behind. " -> it *is* far behind no? It's not even behind but ahead as I see it so maybe
I just misunderstand this comment. Just wanted to explain that the native implementation is faster. (C > PHP) I changed it to a more clear form. (I was replying directly to you. sorry)
  117865
June 7, 2022 15:24 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 5/31/22 11:54, Go Kudo wrote:
> - More detailed description in RFC >
a) For the Random\Engine interface I suggest to clarify that the returned bytestring will be interpreted as a little endian integer. b) I'm still missing an explanation of the guarantees the Randomizer implementation will make (last bullet point in https://externals.io/message/117295#117299). To me the guarantees is the most important thing about this RFC. As a user of the API I need to know what of the behavior can and what cannot change in future PHP versions. I don't really care whether the RNG is PCG64 or some Xoshiro. Both are great. The Engine part is pretty solid: They implement a well-defined RNG and any numbers are returned as little endian integers (see (a)). Either the implementation is correct or it is not. This is not likely to change in the future. However the Randomizer part is pretty undefined: As an example: ->getInt() will return an integer uniformly selected from the given range. But there's more than one way to perform this uniform selection. Will the algorithm stay the safe in future PHP versions? There are more examples in my previous emails. Best regards Tim Düsterhus
  117898
June 10, 2022 10:02 g-kudo@colopl.co.jp (Go Kudo)
2022年6月8日(水) 0:24 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 5/31/22 11:54, Go Kudo wrote: > > - More detailed description in RFC > > > > a) > > For the Random\Engine interface I suggest to clarify that the returned > bytestring will be interpreted as a little endian integer. > > b) > > I'm still missing an explanation of the guarantees the Randomizer > implementation will make (last bullet point in > https://externals.io/message/117295#117299). > > To me the guarantees is the most important thing about this RFC. As a > user of the API I need to know what of the behavior can and what cannot > change in future PHP versions. I don't really care whether the RNG is > PCG64 or some Xoshiro. Both are great. > > The Engine part is pretty solid: They implement a well-defined RNG and > any numbers are returned as little endian integers (see (a)). Either the > implementation is correct or it is not. This is not likely to change in > the future. > > However the Randomizer part is pretty undefined: As an example: > ->getInt() will return an integer uniformly selected from the given > range. But there's more than one way to perform this uniform selection. > Will the algorithm stay the safe in future PHP versions? There are more > examples in my previous emails. > > Best regards > Tim Düsterhus >
Hi Tim. Thanks for the continued feedback! I have added the following regarding the points you pointed out. * interface Random\Engine
> It has a single generate(): string method that generates random numbers as a binary string. This string must be non-empty and attempting to return
an empty will result in a RuntimeException.
> If you implement a random number generator in PHP, the generated numbers must be converted to binary using the pack() function, and the values must
be little-endian. * class Random Randomizer
> The same current PHP algorithm is used to generate random numbers within the specified range in Randomizer::getInt(). This is necessary for backward
compatibility.
> It also provides a guarantee of consistency in the future.
However, I understand that perhaps this fix will not satisfy your request. I just do not have a good understanding of your intentions due to my poor English.... I am considering the following possibilities regarding your intentions: 1. Should be stated in detail so that consistency of results is maintained in the future. 2. Current PHP ranged random number generation algorithm has room for improvement and should be examined further 3. Consistency of results is difficult to maintain in the future (Maybe incorrect) In case 1, I think the following statement would satisfy the requirement.
> The values generated by the seedable Engine guarantee future reproducibility of results, and the Randomizer uses the results to process
them, so if the results generated by the Engine are identical, the Randomizer's results will also be consistent. Although the consistency of the Randomizer results is not mentioned here, it would be a clear BC Break if the results were to change after the Randomizer is officially implemented, so I do think it is sufficient that an RFC be created and voted on as necessary. In case 2, I also thought about it along the way. Nikita also taught me about better algorithms: https://externals.io/message/115918#115982 But, I also think that the current PHP implementation is good enough, and there is no need to change it to the point of breaking compatibility. I think the current global scope MT implementation is very troublesome for some use cases and should first be able to be drop-in-replaceable with this implementation. In case 3, I think it is necessary to guarantee consistency at least once at the language level, even though this may change in the future. I have already indicated the need for this in the RFC. Most of all, I do not believe you intend this to be the case. (And this sentence is not intended to offend you either. Please don't misunderstand me.) I would like to hear more about your opinion. Regards, Go Kudo
  117899
June 10, 2022 10:42 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/10/22 12:02, Go Kudo wrote:
>> It has a single generate(): string method that generates random numbers > as a binary string. This string must be non-empty and attempting to return > an empty will result in a RuntimeException. >> If you implement a random number generator in PHP, the generated numbers > must be converted to binary using the pack() function, and the values must > be little-endian.
Thanks, that looks good to me.
> * class Random Randomizer > >> The same current PHP algorithm is used to generate random numbers within > the specified range in Randomizer::getInt(). This is necessary for backward > compatibility. >> It also provides a guarantee of consistency in the future. > > However, I understand that perhaps this fix will not satisfy your request. > I just do not have a good understanding of your intentions due to my poor > English....
Don't worry. I understand that using a foreign language can sometimes be hard - I am not a native speaker of English either and I suspect that this also applies to many other participants.
> I am considering the following possibilities regarding your intentions: > > 1. Should be stated in detail so that consistency of results is maintained > in the future. > 2. Current PHP ranged random number generation algorithm has room for > improvement and should be examined further > 3. Consistency of results is difficult to maintain in the future (Maybe > incorrect) > > In case 1, I think the following statement would satisfy the requirement. > >> The values generated by the seedable Engine guarantee future > reproducibility of results, and the Randomizer uses the results to process > them, so if the results generated by the Engine are identical, the > Randomizer's results will also be consistent. > > Although the consistency of the Randomizer results is not mentioned here, > it would be a clear BC Break if the results were to change after the > Randomizer is officially implemented, so I do think it is sufficient that > an RFC be created and voted on as necessary.
If I understand you correctly, then (1) is what I am looking for: It should be spelled out explicitly what behavior the user may rely on and what should be considered an implementation detail. Something like the following would work for me: ---- The engines implement a specific well-defined random number generator. For a given seed it is guaranteed that they return the same sequence as the reference implementation. For the Randomizer it is considered a breaking change if the observable behavior of the methods changes. For a given seeded engine and identical method parameters the following must hold: - The number of calls to the Engine's ->generate() method remains the same. - The return value remains the same for a given result retrieved from ->generate(). Any changes to the Randomizer that violate these guarantees require a separate RFC. ----
> In case 2, I also thought about it along the way. Nikita also taught me > about better algorithms: https://externals.io/message/115918#115982 > > But, I also think that the current PHP implementation is good enough, and > there is no need to change it to the point of breaking compatibility. > > I think the current global scope MT implementation is very troublesome for > some use cases and should first be able to be drop-in-replaceable with this > implementation. > > In case 3, I think it is necessary to guarantee consistency at least once > at the language level, even though this may change in the future. I have > already indicated the need for this in the RFC.
Can you comment on whether the Randomizer behaves identically on both 32 and 64 bit PHP and also on little endian and big endian architectures? As an example: Will ->getInt() calculate the same "uniform distribution" on all bitnesses? If not I consider that a bug. The engines *should* behave identically, because of the "little endian string" return value. If that's already the case then something like the following should be added to the RFC guarantees: - The implementation will guarantee that the same results are returned independent of the processor architecture (little endian / big endian) and integer bit length's (32 / 64).
> Most of all, I do not believe you intend this to be the case. (And this > sentence is not intended to offend you either. Please don't misunderstand > me.) >
Best regards Tim Düsterhus
  117923
June 13, 2022 02:23 g-kudo@colopl.co.jp (Go Kudo)
2022年6月10日(金) 19:42 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/10/22 12:02, Go Kudo wrote: > >> It has a single generate(): string method that generates random numbers > > as a binary string. This string must be non-empty and attempting to > return > > an empty will result in a RuntimeException. > >> If you implement a random number generator in PHP, the generated numbers > > must be converted to binary using the pack() function, and the values > must > > be little-endian. > > Thanks, that looks good to me. > > > * class Random Randomizer > > > >> The same current PHP algorithm is used to generate random numbers within > > the specified range in Randomizer::getInt(). This is necessary for > backward > > compatibility. > >> It also provides a guarantee of consistency in the future. > > > > However, I understand that perhaps this fix will not satisfy your > request. > > I just do not have a good understanding of your intentions due to my poor > > English.... > > Don't worry. I understand that using a foreign language can sometimes be > hard - I am not a native speaker of English either and I suspect that > this also applies to many other participants. > > > I am considering the following possibilities regarding your intentions: > > > > 1. Should be stated in detail so that consistency of results is > maintained > > in the future. > > 2. Current PHP ranged random number generation algorithm has room for > > improvement and should be examined further > > 3. Consistency of results is difficult to maintain in the future (Maybe > > incorrect) > > > > In case 1, I think the following statement would satisfy the requirement. > > > >> The values generated by the seedable Engine guarantee future > > reproducibility of results, and the Randomizer uses the results to > process > > them, so if the results generated by the Engine are identical, the > > Randomizer's results will also be consistent. > > > > Although the consistency of the Randomizer results is not mentioned here, > > it would be a clear BC Break if the results were to change after the > > Randomizer is officially implemented, so I do think it is sufficient that > > an RFC be created and voted on as necessary. > > If I understand you correctly, then (1) is what I am looking for: It > should be spelled out explicitly what behavior the user may rely on and > what should be considered an implementation detail. > > Something like the following would work for me: > > ---- > > The engines implement a specific well-defined random number generator. > For a given seed it is guaranteed that they return the same sequence as > the reference implementation. > > For the Randomizer it is considered a breaking change if the observable > behavior of the methods changes. For a given seeded engine and identical > method parameters the following must hold: > > - The number of calls to the Engine's ->generate() method remains the same. > - The return value remains the same for a given result retrieved from > ->generate(). > > Any changes to the Randomizer that violate these guarantees require a > separate RFC. > > ---- > > > In case 2, I also thought about it along the way. Nikita also taught me > > about better algorithms: https://externals.io/message/115918#115982 > > > > But, I also think that the current PHP implementation is good enough, and > > there is no need to change it to the point of breaking compatibility. > > > > I think the current global scope MT implementation is very troublesome > for > > some use cases and should first be able to be drop-in-replaceable with > this > > implementation. > > > > In case 3, I think it is necessary to guarantee consistency at least once > > at the language level, even though this may change in the future. I have > > already indicated the need for this in the RFC. > > Can you comment on whether the Randomizer behaves identically on both 32 > and 64 bit PHP and also on little endian and big endian architectures? > As an example: Will ->getInt() calculate the same "uniform distribution" > on all bitnesses? If not I consider that a bug. > > The engines *should* behave identically, because of the "little endian > string" return value. > > If that's already the case then something like the following should be > added to the RFC guarantees: > > - The implementation will guarantee that the same results are returned > independent of the processor architecture (little endian / big endian) > and integer bit length's (32 / 64). > > > Most of all, I do not believe you intend this to be the case. (And this > > sentence is not intended to offend you either. Please don't misunderstand > > me.) > > > > > Best regards > Tim Düsterhus >
It does not depend on endianness, but does depend on the number of bits. This is because the new algorithm generates 64-bit values. When using a 64-bit RNG in a 32-bit environment, Engine::generate() returns the same binary string as in a 64-bit environment, but the Randomizer methods return different values. This is because the size of zend_long in a 32-bit environment does not match uint64_t and is truncated. To keep the results the same in 32-bit / 64-bit environments, only the lower 32-bit of the 64-bit value should be used. However, this leads to reduced randomness and does not seem appropriate considering that most environments running PHP are 64-bit. I have created a PoC that allows all internal operations to be performed in 64-bit environments to achieve the same results, although the efficiency of execution in 32-bit environments will be reduced. (Note that Randomizer::getInt() with no argument is still incompatible.) https://github.com/php/php-src/commit/dbed218bfcd45e713fa3df2c88a4c2efce9f0651 Another idea I had was to throw an exception when trying to generate a 64-bit RNG in a 32-bit environment. Regards, Go Kudo
  117924
June 13, 2022 07:14 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/13/22 04:23, Go Kudo wrote:
> I have created a PoC that allows all internal operations to be performed in > 64-bit environments to achieve the same results, although the efficiency of > execution in 32-bit environments will be reduced. (Note that > Randomizer::getInt() with no argument is still incompatible.) > > https://github.com/php/php-src/commit/dbed218bfcd45e713fa3df2c88a4c2efce9f0651 >
I believe this is a good solution. The majority of the modern setups (i.e. the setups that are using PHP 8.2) will likely be 64-bit anyway and reduced performance on 32-bit is then acceptable. Best regards Tim Düsterhus
  117926
June 13, 2022 09:43 g-kudo@colopl.co.jp (Go Kudo)
2022年6月13日(月) 16:14 Tim Düsterhus <tim@bastelstu.be>:

> Hi > > On 6/13/22 04:23, Go Kudo wrote: > > I have created a PoC that allows all internal operations to be performed > in > > 64-bit environments to achieve the same results, although the efficiency > of > > execution in 32-bit environments will be reduced. (Note that > > Randomizer::getInt() with no argument is still incompatible.) > > > > > https://github.com/php/php-src/commit/dbed218bfcd45e713fa3df2c88a4c2efce9f0651 > > > > I believe this is a good solution. The majority of the modern setups > (i.e. the setups that are using PHP 8.2) will likely be 64-bit anyway > and reduced performance on 32-bit is then acceptable. > > Best regards > Tim Düsterhus >
Hi Tim Thanks! I have updated the implementation and RFCs. https://wiki.php.net/rfc/rng_extension Voting will open tomorrow (2022-06-14 00:00:00 UTC) as planned. I have reviewed the implementation and would like to clean up the implementation as it is cluttered with past spec changes. This may possibly be too late to start voting, but I do not plan to change the behavior in any way. Best regards, Go Kudo
  117935
June 13, 2022 15:09 g-kudo@colopl.co.jp (Go Kudo)
2022年5月31日(火) 18:54 Go Kudo <g-kudo@colopl.co.jp>:

> Hi internals. > > Although I have explained that due to the global turmoil I will delay > voting on the RFC as long as possible, it is time to start voting on the > RFC in order to get the implementation to full status by the PHP 8.2 > Feature Freeze. > > I apologize for the delay in responding to the points you had already > pointed out. It has been addressed as follows. > > - Random\Engine\PCG64::__construct() now takes an `int|string` $inc as its > second argument > - This makes it fully compatible with the PCG64 original implementation > - Fixed an algorithm implementation error in PCG64 > - Fixed compatibility issues with PHP 8.2 in test cases > - More detailed description in RFC > > Previous discussion thread: https://externals.io/message/117295 > > Voting will begin at 2022-06-14 00:00:00 (UTC). > > https://wiki.php.net/rfc/rng_extension > > Regards, > Go Kudo >
Good evening. While refactoring, I found an error in the PCG64 algorithm we are implementing. It should be implemented as pcg64_oneseq_128 (XSL-RR), but currently pcg64_setseq_128 (XSL-RR-RR) is implemented. The RFC also incorrectly states that NumPy implements pcg64_oneseq_128. https://wiki.php.net/rfc/rng_extension The RFC has been corrected for the above. This change removes the int|string $inc argument from RandomEngine\PCG64::__construct(), leaving only int|string|null $seed. Regards, Go Kudo