[RFC] Numeric Literal Separator

  105714
May 15, 2019 15:32 theodorejb@outlook.com (Theodore Brown)
Hi internals,

As requested at the end of April [1], Bishop and I are resurrecting the Numeric Literal Separator RFC: https://wiki.php.net/rfc/numeric_literal_separator.

If no serious issues come up, voting will start in two weeks (on 2019-05-29).

Best regards,
Theodore

[1]: https://externals.io/message/105450
  105787
May 28, 2019 14:10 theodorejb@outlook.com (Theodore Brown)
On Wed, May 15, 2019 at 10:32 AM Theodore Brown wrote:

> As requested at the end of April, [1] Bishop and I are resurrecting
> the Numeric Literal Separator RFC:
> https://wiki.php.net/rfc/numeric_literal_separator.
>
> If no serious issues come up, voting will start in two weeks
> (on 2019-05-29).

Just a heads-up that voting is scheduled to start tomorrow. I've made some improvements to the implementation based on Nikita's feedback, and also added a couple more example use cases to the RFC.

Please let me know if you have any questions or concerns before voting starts.

Thanks,
Theodore

[1]: https://externals.io/message/105450
  105788
May 28, 2019 19:24 Danack@basereality.com (Dan Ackroyd)
On Tue, 28 May 2019 at 15:10, Theodore Brown <theodorejb@outlook.com> wrote:
> Please let me know if you have any questions or concerns before
> voting starts.

Particularly if you're currently planning to vote no, but a small change or explanation might change your vote.

I also happen to think people should only vote no if they think the change is something that adds too much complexity to PHP's internals for the change to be worth it, even to the people who want this feature, rather than voting no just because they don't plan to use this feature themselves. But opinions on that might differ.

cheers
Dan
Ack
  105789
May 28, 2019 21:17 rowan.collins@gmail.com (Rowan Collins)
On 28/05/2019 20:24, Dan Ackroyd wrote:
> I also happen to think people should only vote no if they think the
> change is something that adds too much complexity to PHP's internals
> for the change to be worth it

I see where you're coming from with that, but there is also a cost to *users* in having more variants and complexity in syntax to understand and be tripped up by, even if it actually made the internals simpler for some reason.

I don't personally think that applies here, but it's a reason someone voting might decide to consider.

Regards,

--
Rowan Collins
[IMSoP]
  105790
May 29, 2019 07:49 come@opensides.be (Côme Chilliet)
What bugs me with this RFC is that it seems to be mainly intended for grouping digits by 3, which from what I understand is cultural.
At least some Asian languages have a concept of https://en.wikipedia.org/wiki/Myriad and group them by 4, at least from the language point of view.
It does seem when writing as numbers they still group by 3, but it seems other usages exist:
https://japantoday.com/category/features/lifestyle/10-000-or-1-0000-japanese-schools-are-starting-to-move-commas-on-big-numbers-but-why

Anyway, just wanted to point out that grouping digits may not be completely universal.
I will not vote no on the RFC, most likely I won’t vote. But I think I will not use this in my code.

Côme
  105791
May 29, 2019 08:19 cmbecker69@gmx.de ("Christoph M. Becker")
On 29.05.2019 at 09:49, Côme Chilliet wrote:

> What bugs me with this RFC is that it seems to be mainly intended for grouping digits by 3, which from what I understand is cultural.
> At least some Asian languages have a concept of https://en.wikipedia.org/wiki/Myriad and group them by 4, at least from the language point of view.
> It does seem when writing as numbers they still group by 3, but it seems other usages exist:
> https://japantoday.com/category/features/lifestyle/10-000-or-1-0000-japanese-schools-are-starting-to-move-commas-on-big-numbers-but-why
>
> Anyway, just wanted to point out that grouping digits may not be completely universal.

Well, choosing English/German/Japanese words for identifiers isn't completely universal either. :)

--
Christoph M. Becker
  105792
May 29, 2019 08:39 markus@fischer.name (Markus Fischer)
Hi,

On 29.05.19 09:49, Côme Chilliet wrote:
> What bugs me with this RFC is that it seems to be mainly intended for grouping digits by 3, which from what I understand is cultural.
> At least some Asian languages have a concept of https://en.wikipedia.org/wiki/Myriad and group them by 4, at least from the language point of view.
> It does seem when writing as numbers they still group by 3, but it seems other usages exist:
> https://japantoday.com/category/features/lifestyle/10-000-or-1-0000-japanese-schools-are-starting-to-move-commas-on-big-numbers-but-why

My understanding from the RFC is that the grouping is not relevant, the `_` is stripped regardless.

I would expect these all to work the same:

- 1_000_000 => 1000000
- 100_0000 => 1000000
- 1_0_0_0_0_0_0 => 1000000

It even gives similar examples with the hex variant:

0x42_72_6F_77_6E; // with separator

Am I wrong?

- Markus
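The three groupings listed above can be checked directly. This is only a minimal sketch, assuming a PHP build that includes the RFC's implementation (PHP 7.4 or later); the variable names are made up for illustration.

```php
<?php
// Requires PHP 7.4+, where the numeric literal separator is available.
$byThree = 1_000_000;      // grouped by thousands
$byFour  = 100_0000;       // grouped by myriads (ten-thousands)
$byOne   = 1_0_0_0_0_0_0;  // a separator between every digit

// The separators are dropped when the literal is parsed,
// so all three variables hold the same integer.
var_dump($byThree === 1000000); // bool(true)
var_dump($byFour === 1000000);  // bool(true)
var_dump($byOne === 1000000);   // bool(true)
```

The grouping only affects how the source reads, not the value that ends up in the variable.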
  105793
May 29, 2019 11:02 yohgaki@ohgaki.net (Yasuo Ohgaki)
On Wed, May 29, 2019 at 5:40 PM Markus Fischer <markus@fischer.name> wrote:

> Hi,
>
> On 29.05.19 09:49, Côme Chilliet wrote:
> > What bugs me with this RFC is that it seems to be mainly intended for
> > grouping digits by 3, which from what I understand is cultural.
> > At least some Asian languages have a concept of
> > https://en.wikipedia.org/wiki/Myriad and group them by 4, at least from
> > the language point of view.
> > It does seem when writing as numbers they still group by 3, but it seems
> > other usages exist:
> > https://japantoday.com/category/features/lifestyle/10-000-or-1-0000-japanese-schools-are-starting-to-move-commas-on-big-numbers-but-why
>
> My understanding from the RFC is that the grouping is not relevant,
> the `_` is stripped regardless.
>
> I would expect these all to work the same:
>
> - 1_000_000 => 1000000
> - 100_0000 => 1000000
> - 1_0_0_0_0_0_0 => 1000000
>
> It even gives similar examples with the hex variant:
>
> 0x42_72_6F_77_6E; // with separator
>
> Am I wrong?

Simply ignoring "_" in numeric literals is nicer (and a bit faster), especially for hex/octal/bit. Hex may be grouped by 2, 4, 8, 16, 32 and so on. Bit fields may be grouped by any length.

Regards,

P.S. Even if it's easier for us, we don't use 1,0000 normally, at least today.

--
Yasuo Ohgaki
yohgaki@ohgaki.net
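To illustrate the point about hex and bit-field groupings, here is a small sketch, again assuming PHP 7.4 or later. The hex literal is the one from the RFC example quoted above; the flag value and variable names are hypothetical.

```php
<?php
// Requires PHP 7.4+.
// Hex grouped by byte (2 hex digits per group).
$bytes = 0x42_72_6F_77_6E;
var_dump($bytes === 0x42726F776E); // bool(true)

// The same idea with 4-digit (16-bit word) groups on a smaller value.
$words = 0x0042_726F;
var_dump($words === 0x0042726F); // bool(true)

// A hypothetical bit field, grouped by nibble so single flags stand out.
$flags = 0b0001_0000;
var_dump($flags === 16); // bool(true)
```

Because no group size is enforced, each literal can follow the layout of the data it describes.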
  105794
May 29, 2019 11:34 george.banyard@gmail.com ("G. P. B.")
I share the same concerns as Rowan Collins, and I'm really not a fan of the
RFC in general.
Also I think those kinds of magic numbers should be constants with
meaningful names, and in that case you could just compute them by adding
powers of ten.
E.g. DISCOUNT_IN_CENTS = 1 * 10^5 + 3 * 10^4 + 5 * 10^3;
Now I'm assuming opcache would compile this into a final integer but I may
be wrong on how the internals of the engine work in this case.
Moreover I feel that people may misread numbers like that if people use
different groupings.
E.g. 1_0000_0000_0000; by skimming rapidly I could think it's a
billion (10^9) when in reality it's a trillion (10^12).
Even if maybe some countries are moving away from grouping digits in
groups of 4.

I'll probably vote against it but that's only my opinion.

George P. Banyard
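A note on the notation in the example above: `10^5` reads as a power of ten in ordinary math notation, but in PHP source `^` is the bitwise XOR operator and exponentiation is written `**`. The sketch below only shows what PHP would compute if the expression were pasted in as typed; the variable names are illustrative.

```php
<?php
// In PHP, ^ is bitwise XOR; ** is exponentiation.
$xor   = 10 ^ 5;   // 0b1010 XOR 0b0101
$power = 10 ** 5;

var_dump($xor);   // int(15)
var_dump($power); // int(100000)

// ^ also binds less tightly than + and *, so the expression as typed
// groups as (1 * 10) ^ (5 + 3 * 10) ^ (4 + 5 * 10) ^ 3.
$asTyped = 1 * 10^5 + 3 * 10^4 + 5 * 10^3;
var_dump($asTyped); // int(28)
```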
  105797
May 29, 2019 15:48 theodorejb@outlook.com (Theodore Brown)
On Wed, May 29, 2019 at 6:34 AM G. P. B. <george.banyard@gmail.com> wrote:

> I share the same concerns as Rowan Collins
From my reading of Rowan's email, he was making a general point that new features can have a cost of added complexity for users. He then clarified "I don't personally think that applies here".
> I'm really not a fan of the RFC in general. Also I think those kinds
> of magic numbers should be constants with meaningful names, and in
> that case you could just compute them by adding powers of ten.
> E.g. DISCOUNT_IN_CENTS = 1 * 10^5 + 3 * 10^4 + 5 * 10^3;

Actually I think this example highlights why numeric literal separators can be very helpful for improving readability and preventing mistakes. First, which of these is faster to read?

```php
$discount = 1 * 10**5 + 3 * 10**4 + 5 * 10**3;
// or
$discount = 135_00;
```

Secondly, your example of adding powers of 10 is off by an order of magnitude! It's equivalent to $1,350.00, not $135.00, but this isn't very obvious when reading the complex expression.

Of course, if you prefer the first approach you can continue using it. But personally I find the second approach quicker to read and less prone to mistakes.

> Moreover I feel that people may misread numbers like that if people
> use different groupings. E.g. 1_0000_0000_0000; by skimming rapidly
> I could think it's a billion (10^9) when in reality it's a trillion
> (10^12). Even if maybe some countries are moving away from
> grouping digits in groups of 4.
Even with the different grouping, it's faster for me to count the digits in that number than if it had no separator at all.
> I'll probably vote against it but that's only my opinion.
That's up to you. But even if you don't personally have a need for the feature, I think it's worth considering that there are valid use cases for it which can help improve code readability and clarify intent.

Best regards,
Theodore
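The order-of-magnitude difference mentioned above can be confirmed by evaluating both forms. A minimal sketch, assuming PHP 7.4 or later for the `135_00` literal; the variable names are illustrative.

```php
<?php
// Requires PHP 7.4+ for the separator in 135_00.
$fromPowers  = 1 * 10**5 + 3 * 10**4 + 5 * 10**3; // 100000 + 30000 + 5000
$fromLiteral = 135_00;

var_dump($fromPowers);  // int(135000), i.e. $1,350.00 in cents
var_dump($fromLiteral); // int(13500), i.e. $135.00 in cents

// The powers-of-ten version is exactly ten times the intended amount.
var_dump($fromPowers === 10 * $fromLiteral); // bool(true)
```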
  105808
May 30, 2019 21:11 george.banyard@gmail.com ("G. P. B.")
On Wed, 29 May 2019 at 17:48, Theodore Brown <theodorejb@outlook.com> wrote:

> On Wed, May 29, 2019 at 6:34 AM G. P. B. <george.banyard@gmail.com> wrote:
>
> > I share the same concerns as Rowan Collins
>
> From my reading of Rowan's email, he was making a general point that
> new features can have a cost of added complexity for users. He then
> clarified "I don't personally think that applies here".
>
> > I'm really not a fan of the RFC in general. Also I think those kinds
> > of magic numbers should be constants with meaningful names, and in
> > that case you could just compute them by adding powers of ten.
> > E.g. DISCOUNT_IN_CENTS = 1 * 10^5 + 3 * 10^4 + 5 * 10^3;
>
> Actually I think this example highlights why numeric literal
> separators can be very helpful for improving readability and
> preventing mistakes. First, which of these is faster to read?
>
> ```php
> $discount = 1 * 10**5 + 3 * 10**4 + 5 * 10**3;
> // or
> $discount = 135_00;
> ```
>
> Secondly, your example of adding powers of 10 is off by an order
> of magnitude! It's equivalent to $1,350.00, not $135.00, but this
> isn't very obvious when reading the complex expression.
Oh well I suppose that'll teach me trying to write some code on my phone.

> Of course, if you prefer the first approach you can continue using it.
> But personally I find the second approach quicker to read and less
> prone to mistakes.
I mean I don't really use that as I personally don't have a problem counting digits nor do I use massive numbers. There are also other ways to go about it but that's not really the deal here.
> > Moreover I feel that people may misread numbers like that if people
> > use different groupings. E.g. 1_0000_0000_0000; by skimming rapidly
> > I could think it's a billion (10^9) when in reality it's a trillion
> > (10^12). Even if maybe some countries are moving away from
> > grouping digits in groups of 4.
>
> Even with the different grouping, it's faster for me to count the
> digits in that number than if it had no separator at all.
IMHO using a power of ten in this example would be the "best" solution. But like before that's not really the question here.
> > I'll probably vote against it but that's only my opinion.
>
> That's up to you. But even if you don't personally have a need for
> the feature, I think it's worth considering that there are valid use
> cases for it which can help improve code readability and clarify intent.

I'm just fundamentally against it, but if I'm in the minority it will pass, and it's not like I'm going to make a fuss about it being added to the language.

Best regards

George P. Banyard
  105799
May 29, 2019 18:33 Danack@basereality.com (Dan Ackroyd)
On Wed, 29 May 2019 at 12:35, G. P. B. <george.banyard@gmail.com> wrote:
> Also I think those kinds of magic numbers should be constants with
> meaningful names, and in that case you could just compute them by adding
> powers of ten.
> E.g. DISCOUNT_IN_CENTS = 1 * 10^5 + 3 * 10^4 + 5 * 10^3;

I agree naming things is important, but when I'm working with non-programmers (e.g. the people in the accounts department), and we have lots of numbers in front of us, something like:

EXAMPLE_ON_BUDGET = 1_000_000;
EXAMPLE_UNDER_BUDGET = 990_000;
EXAMPLE_OVER_TARGET = 1_001_000;
EXAMPLE_WAY_OVER_BUDGET = 1_100_000;

is going to be way easier to read than the version without underscores, and definitely easier than power of ten notation.

cheers
Dan
Ack
  105796
May 29, 2019 14:03 come@opensides.be (Côme Chilliet)
Le mercredi 29 mai 2019, 10:39:17 CEST Markus Fischer a écrit :
> My understanding from the RFC is that the grouping is not relevant,
> the `_` is stripped regardless.
>
> Am I wrong?

No you’re not, the RFC allows grouping as the coder wants.

Which is why I think it may cause problems, because the way the coder wants to group digits and the way that is easier for me to read are not always the same.

As Christoph M. Becker states there are already problems like this with choice of names for variables and code style and such, but until now numbers were a safe place that always looks the same.

If people want to see big numbers broken up in groups of 3 I would expect their IDE to do this on numbers for them.

But I do get the point of the RFC for hexa and bit masks.

Côme
  105798
May 29, 2019 16:49 theodorejb@outlook.com (Theodore Brown)
On Wed, May 29, 2019 at 9:03 AM Côme Chilliet <come@opensides.be> wrote:

> > My understanding from the RFC is that the grouping is not
> > relevant, the `_` is stripped regardless.
> >
> > Am I wrong?
>
> No you’re not, the RFC allows grouping as the coder wants.
>
> Which is why I think it may cause problems because the way the
> coder wants to group digits and the way easier for me to read is
> not always the same.
>
> As Christoph M. Becker states there are already problems like this
> with choice of names for variables and code style and such, but
> until now numbers were a safe place that always looks the same.
Numbers don't always look the same, though. They can already be written using hexadecimal, octal, decimal, binary, or exponential notation. Furthermore, as a workaround for the lack of numeric literal separators, some programmers end up writing numbers as complex expressions like `1 * 10**5 + 3 * 10**4` which can actually make them more difficult to read.
> If people want to see big numbers broken up in groups of 3 I would
> expect their IDE to do this on numbers for them.
It isn't always desirable to group big numbers the same way, though. For example, a programmer may want to write `13500` as `135_00` or `13_500` depending on whether or not it represents a financial quantity stored as cents.
> But I do get the point of the RFC for hexa and bit masks.
Yes, this is another case where it can be useful to group by a varying number of digits depending on how a value is being used (e.g. nibbles, bytes, or words).

So while it's conceivable that someone could use numeric literal separators to write a number in a less readable way, does this mean that the many good PHP developers shouldn't have the option to use this feature to improve readability?

Sincerely,
Theodore
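The two points above (numbers already have several spellings, and the best grouping depends on what the value means) can be sketched with the `13500` example. This assumes PHP 7.4 or later for the lines that use separators; the variable names are only illustrative.

```php
<?php
// Notations that already exist without separators: all spell 13500.
$decimal = 13500;
$hex     = 0x34BC;
$octal   = 032274;
$binary  = 0b11010010111100;
$float   = 1.35e4; // exponential notation yields a float

var_dump($hex === $decimal, $octal === $decimal, $binary === $decimal);
var_dump($float == $decimal); // loose comparison, since $float is a float

// Requires PHP 7.4+: the same integer, grouped by intent.
$asCents = 135_00; // reads as 135 dollars and 00 cents
$asCount = 13_500; // reads as thirteen thousand five hundred
var_dump($asCents === $asCount); // bool(true)
```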
  105795
May 29, 2019 12:43 theodorejb@outlook.com (Theodore Brown)
On Wed, May 29, 2019 at 2:49 AM Côme Chilliet <come@opensides.be> wrote:

> What bugs me with this RFC is that it seems to be mainly intended
> for grouping digits by 3, which from what I understand is cultural.

While it is expected that grouping decimal literals by 3 will be a frequent use case, the RFC does not enforce an arbitrary group size. Not only would doing so add complexity, but it would also prevent some of the use cases mentioned in the RFC.

For example, if you're working with financial quantities stored as cents, it can be useful to group the dollar amount by 3 and the cents by 2:

```php
$amount = 100_500_00; // represents $100,500.00
```

The RFC also contains examples of grouping hex and binary literals by 2, 4, or 8 digits. Not placing restrictions on the size of digit groups enables programmers to choose the grouping that best reflects their intent. It's also consistent with the other languages that support this feature.

> I will not vote no on the RFC, most likely I won’t vote. But I
> think I will not use this in my code.

That's fine. Not everyone has a use case for this feature, but it can be very helpful for those that do.

Best regards,
Theodore
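As a rough illustration of the cents example above, the sketch below checks the literal's value and renders it as a dollar string. It assumes PHP 7.4 or later; `formatCents()` is a hypothetical helper written for this example (not part of PHP or the RFC) and only handles non-negative amounts.

```php
<?php
// Requires PHP 7.4+ for the separator in the literal.

// Hypothetical helper: render a non-negative amount in cents as dollars.
function formatCents(int $cents): string
{
    $dollars   = intdiv($cents, 100); // whole dollars
    $remainder = $cents % 100;        // leftover cents
    return sprintf('$%s.%02d', number_format($dollars), $remainder);
}

$amount = 100_500_00; // dollars grouped by 3, cents grouped by 2

var_dump($amount === 10050000);  // bool(true)
echo formatCents($amount), "\n"; // $100,500.00
```

The literal's grouping mirrors the formatted output, which is the readability benefit being described.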
  105800
May 29, 2019 21:25 theodorejb@outlook.com (Theodore Brown)
On Tue, May 28, 2019 at 9:10 AM Theodore Brown <theodorejb@outlook.com> wrote:

> > As requested at the end of April, [1] Bishop and I are resurrecting
> > the Numeric Literal Separator RFC:
> > https://wiki.php.net/rfc/numeric_literal_separator.
> >
> > If no serious issues come up, voting will start in two weeks
> > (on 2019-05-29).
>
> Just a heads-up that voting is scheduled to start tomorrow. I've made
> some improvements to the implementation based on Nikita's feedback,
> and also added a couple more example use cases to the RFC.
>
> Please let me know if you have any questions or concerns before
> voting starts.

Thank you to everyone who participated in the discussion. I added a section to the RFC about whether it should be the role of an IDE to automatically group digits.

If no new issues come up, I plan to start the vote tomorrow.

Best regards,
Theodore

[1]: https://externals.io/message/105450