Re: [PHP-DEV] Deprecate and remove case-insensitive constants?

  100537
September 12, 2017 12:52 francois@tekwire.net (François Laupretre)
Hi,

On 12/09/2017 at 14:02, Christoph M. Becker wrote:
> Hi everybody!
>
> Usually constant identifiers are treated case-sensitive in PHP. This is
> always the case for constants defined via a `const` declaration.
> However, define() allows to pass TRUE as third argument to define a
> case-insensitive constant. This feature appears to potentially result
> in confusion, and also causes bugs as shown in
> <https://bugs.php.net/74450>. See an example created by Nikita to see
> some probably unexpected behavior: <https://3v4l.org/L6nCp>.
>
> Even if these issues could be resolved, I still think allowing both
> case-sensitive and case-insensitive constant identifiers does more harm
> than good, so either case-sensitive or case-insensitive constant
> identifiers should be removed from the language. Since case-sensitive
> constant identifiers are already the default, and HHVM doesn't even
> support case-insensitive identifiers at all, I would suggest to remove
> case-insensitive constant identifiers.
>
> This could be implemented by triggering E_DEPRECATED whenever the third
> argument to define() is TRUE in PHP 7.3, and to remove this parameter
> altogether in PHP 8. Most likely some further simplification in the
> engine could be done then as well.
>
> Thoughts?

What about making PHP 8 100% case-sensitive (except true/false)? If we
announce it years in advance, it is possible, IMO.

Regards

François
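
For readers following the thread, here is a minimal sketch of the two kinds of constants under discussion. The constant names are hypothetical and chosen only for illustration. What the third argument of define() does depends on the PHP version, so the comments describe the PHP 7 semantics the thread refers to and the sketch deliberately states no expected output for the case-insensitive lookup.

```php
<?php
// Hypothetical constant names, used only for this illustration.

// Case-insensitive constant: the third argument to define() is true.
// Assumption: PHP 7 semantics as described in the thread. Other versions
// may emit a diagnostic for this flag, so the call is silenced with @.
@define('GREETING', 'hello', true);

// Case-sensitive constant via a `const` declaration.
const FAREWELL = 'bye';

var_dump(defined('GREETING')); // exact spelling
var_dump(defined('greeting')); // true only where case-insensitive constants are supported
var_dump(defined('FAREWELL')); // exact spelling
var_dump(defined('farewell')); // `const` constants are always case-sensitive
```

The proposal quoted above would make the second lookup behave like the fourth, so that only one kind of constant remains.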
  100539
September 12, 2017 13:04 cmbecker69@gmx.de ("Christoph M. Becker")
On 12.09.2017 at 14:52, François Laupretre wrote:

> What about making PHP 8 100% case-sensitive (except true/false) ? If we
> announce it years in advance, it is possible, IMO.

I don't think we can do that. Consider, for instance, ext/gd where all
functions are actually in lower case, but I've seen a lot of code
written in pascal or camel case to make the functions better readable, e.g.

    imageCreateFromJpeg() vs. imagecreatefromjpeg()

--
Christoph M. Becker
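
The compatibility concern can be illustrated with a small sketch. The helper function below is hypothetical; the point is only that PHP looks up function names case-insensitively, so code written in any casing currently calls the same function and would break if function names became case-sensitive.

```php
<?php
// Hypothetical helper, declared in camel case.
function makeGreeting(string $name): string
{
    return "Hello, $name!";
}

// Function names are looked up case-insensitively,
// so all three calls reach the same function.
echo makeGreeting('gd'), PHP_EOL;
echo makegreeting('gd'), PHP_EOL;
echo MAKEGREETING('gd'), PHP_EOL;

// Built-in functions behave the same way.
echo StrToUpper('jpeg'), PHP_EOL; // same as strtoupper('jpeg')
```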
  100540
September 12, 2017 14:04 danack@basereality.com (Dan Ackroyd)
On 12 September 2017 at 14:04, Christoph M. Becker <cmbecker69@gmx.de> wrote:
> I don't think we can do that. Consider, for instance, ext/gd where all
> functions are actually in lower case, but I've seen a lot of code
> written in pascal or camel case to make the functions better readable, e.g.
>
> imageCreateFromJpeg() vs. imagecreatefromjpeg()

It's pretty easy to imagine that if we had function autoloading,
creating an optional small backwards compatibility shim/library to work
around that problem would be pretty easy.

It's also the type of error that would be easy to add a deprecation
warning to in a late 7.x branch.

cheers
Dan
  100657
September 16, 2017 03:22 yohgaki@ohgaki.net (Yasuo Ohgaki)
Hi Christoph,

On Tue, Sep 12, 2017 at 10:04 PM, Christoph M. Becker <cmbecker69@gmx.de>
wrote:

> On 12.09.2017 at 14:52, François Laupretre wrote:
>
> > What about making PHP 8 100% case-sensitive (except true/false) ? If we
> > announce it years in advance, it is possible, IMO.
>
> I don't think we can do that. Consider, for instance, ext/gd where all
> functions are actually in lower case, but I've seen a lot of code
> written in pascal or camel case to make the functions better readable, e.g.
>
> imageCreateFromJpeg() vs. imagecreatefromjpeg()
>

Consistent function names at the same time, perhaps?

https://wiki.php.net/rfc/consistent_function_names

--
Yasuo Ohgaki
yohgaki@ohgaki.net
  100541
September 12, 2017 14:52 levim@php.net (Levi Morrison)
On Tue, Sep 12, 2017 at 6:52 AM, François Laupretre
<francois@tekwire.net> wrote:
> Hi,
>
> On 12/09/2017 at 14:02, Christoph M. Becker wrote:
>>
>> Hi everybody!
>>
>> Usually constant identifiers are treated case-sensitive in PHP. This is
>> always the case for constants defined via a `const` declaration.
>> However, define() allows to pass TRUE as third argument to define a
>> case-insensitive constant. This feature appears to potentially result
>> in confusion, and also causes bugs as shown in
>> <https://bugs.php.net/74450>. See an example created by Nikita to see
>> some probably unexpected behavior: <https://3v4l.org/L6nCp>.
>>
>> Even if these issues could be resolved, I still think allowing both
>> case-sensitive and case-insensitive constant identifiers does more harm
>> than good, so either case-sensitive or case-insensitive constant
>> identifiers should be removed from the language. Since case-sensitive
>> constant identifiers are already the default, and HHVM doesn't even
>> support case-insensitive identifiers at all, I would suggest to remove
>> case-insensitive constant identifiers.
>>
>> This could be implemented by triggering E_DEPRECATED whenever the third
>> argument to define() is TRUE in PHP 7.3, and to remove this parameter
>> altogether in PHP 8. Most likely some further simplification in the
>> engine could be done then as well.
>>
>> Thoughts?
>
> What about making PHP 8 100% case-sensitive (except true/false) ? If we
> announce it years in advance, it is possible, IMO.
>
> Regards
>
> François
By itself this change provides little value. If it was done in
connection with other features such as merging symbol tables then we
can actually gain some significant improvements:

    array_map(sum2, $input1, $input2);

Currently that requires `sum2` to be a constant. To get the correct
behavior we currently need to do:

    array_map('fully\qualified\namespace\sum2', $input1, $input2);

This is not just convenience; it provides safety to refactoring and
general code analysis tools. Maintenance is a crucial aspect of large
code bases and being able to move away from stringly-typed things is a
significant improvement. It's also a step towards general autoloading
instead of just class/trait/interface autoloading; however this would
require further changes.

I believe these improvements would be worth it and do understand it is
a large backwards compatibility break. Given sufficient time and
tooling to prepare I think PHP would be markedly better in the
long-run for these two changes. However, if we change only the case
sensitivity of constants we gain little value for our BC break.
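
As a runnable sketch of the status quo described above: the namespace `App\Math` and the body of `sum2` are assumptions made up for this illustration; only the array_map() call shape comes from the message. It shows that a namespaced function has to be passed as a fully qualified string today.

```php
<?php
// Hypothetical namespace and function body, for illustration only.
namespace App\Math;

function sum2(int $a, int $b): int
{
    return $a + $b;
}

$input1 = [1, 2, 3];
$input2 = [10, 20, 30];

// The callable is a plain string, so it must carry the fully qualified name;
// the current namespace is not applied to strings.
$sums = array_map('App\Math\sum2', $input1, $input2);

// Building the name from __NAMESPACE__ avoids repeating the namespace,
// but the callable is still an ordinary string.
$same = array_map(__NAMESPACE__ . '\sum2', $input1, $input2);

echo implode(', ', $sums), PHP_EOL; // prints "11, 22, 33"
```

Because the function name is only a string here, a rename of `sum2` is invisible to tools that analyse symbols rather than string contents, which is the refactoring concern raised in the message.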
  100542
September 12, 2017 14:55 levim@php.net (Levi Morrison)
> By itself this change provides little value. If it was done in
> connection with other features such as merging symbol tables then we
> can actually gain some significant improvements:
>
>     array_map(sum2, $input1, $input2);
>
> Currently that requires `sum2` to be a constant. To get the correct
> behavior we currently need to do:
>
>     array_map('fully\qualified\namespace\sum2', $input1, $input2);

After rewriting my reply I noticed this sentence doesn't quite make sense:

> This is not just convenience; it provides safety to refactoring and
> general code analysis tools.

Instead I meant that using the string is not just inconvenient; it
also prevents fully-safe code refactoring and analysis.

> Maintenance is a crucial aspect of large
> code bases and being able to move away from stringly-typed things is a
> significant improvement. It's also a step towards general autoloading
> instead of just class/trait/interface autoloading; however this would
> require further changes.
>
> I believe these improvements would be worth it and do understand it is
> a large backwards compatibility break. Given sufficient time and
> tooling to prepare I think PHP would be markedly better in the
> long-run for these two changes. However, if we change only the case
> sensitivity of constants we gain little value for our BC break.
  100544
September 12, 2017 16:17 rowan.collins@gmail.com (Rowan Collins)
On 12 September 2017 15:52:38 BST, Levi Morrison <levim@php.net> wrote:
>     array_map(sum2, $input1, $input2);
>
> Currently that requires `sum2` to be a constant.

I'm not clear what this has to do with case sensitivity; the problem
here is that we don't have a type of "function reference" (nor "class
reference") so simulate such references with strings and runtime
assertions.

Are you saying that without case sensitivity, the language could deduce
that sum2 in that case was a function reference? That seems optimistic:
not only can you have a class, a constant, and a function all with the
same name, but you can't actually know which exists until the line is
executed, because all three can be defined at any time.

This kind of ambiguous syntax is precisely what I was trying to reduce
by deprecating "undefined constant as string", and a similar "convenient
fallback" (from current to global namespace) is currently the biggest
thing blocking function autoloading.

If we want function and class references, they should have their own,
unambiguous, syntax.

Apologies if I've completely missed the point here.

Regards,

--
Rowan Collins
[IMSoP]
  100545
September 12, 2017 16:45 levim@php.net (Levi Morrison)
> Apologies if I've completely missed the point here.

Oh well, it happens.

> Are you saying that without case sensitivity, the language could
> deduce that sum2 in that case was a function reference? That seems
> optimistic: not only can you have a class, a constant, and a function
> all with the same name, but you can't actually know which exists until
> the line is executed, because all three can be defined at any time.

Close. If we make case sensitivity consistent (either all insensitive
or all sensitive) *and* merge symbol tables then we can get actual
features out of it. As it stands just changing the case sensitivity
does not buy us any features for our BC break.

The rest of my message only makes sense once you understand I was
proposing unified case sensitivity for all symbols *and* merging them
into one table.

>>     array_map(sum2, $input1, $input2);
>>
>> Currently that requires `sum2` to be a constant.
>
> I'm not clear what this has to do with case sensitivity; the problem
> here is that we don't have a type of "function reference" (nor "class
> reference") so simulate such references with strings and runtime
> assertions.

This confusion stems from the aforementioned items.

> If we want function and class references, they should have their own,
> unambiguous, syntax.

My point was rather that if we fix our inconsistency issues and merge
the tables no such syntax is required; all existing syntax works.
There are engine changes that have to accompany those as well,
obviously.

In summary I think changing constant case sensitivity is too small of
a step to gain us anything, but would be *very* happy to take it
further because it will give us actual features for our trouble.
  100548
September 12, 2017 17:59 rowan.collins@gmail.com (Rowan Collins)
On 12 September 2017 17:45:46 BST, Levi Morrison <levim@php.net> wrote:
> The rest of my message only makes sense once you understand I was
> proposing unified case sensitivity for all symbols *and* merging them
> into one table.

Ah, OK, so I partially missed the point. I'm still not sure what you're
suggesting is sensible, though...

>> If we want function and class references, they should have their own,
>> unambiguous, syntax.

I stand by this assertion. Consider the following statement:

    $foo = bar;

Even if "bar" cannot *simultaneously* be the name of a function, a
class, and a constant, it can still *potentially* be any of the three,
from the point of view of the compiler. So, far from allowing us to make
nice inferences about function references vs strings-that-look-callable,
we have now *broken* assumptions we could previously have made. It seems
like we'd just be adding another equally ambiguous way of writing the
same code.

Regards,

--
Rowan Collins
[IMSoP]
  100551
September 12, 2017 20:35 levim@php.net (Levi Morrison)
On Tue, Sep 12, 2017 at 11:59 AM, Rowan Collins <rowan.collins@gmail.com> wrote:
> On 12 September 2017 17:45:46 BST, Levi Morrison <levim@php.net> wrote:
>> The rest of my message only makes sense once you understand I was
>> proposing unified case sensitivity for all symbols *and* merging them
>> into one table.
>
> Ah, OK, so I partially missed the point. I'm still not sure what
> you're suggesting is sensible, though...
>
>>> If we want function and class references, they should have their own,
>>> unambiguous, syntax.
>
> I stand by this assertion. Consider the following statement:
>
>     $foo = bar;
>
> Even if "bar" cannot *simultaneously* be the name of a function, a
> class, and a constant, it can still *potentially* be any of the three,
> from the point of view of the compiler.

If it's known, it's known, and it can proceed with that type. If it's
not known then autoload and proceed like normal. I fail to see how
this is an issue, and in fact, see it as a *significant* improvement
to our current situation...
  100559
September 13, 2017 13:48 rowan.collins@gmail.com (Rowan Collins)
On 12 September 2017 21:35:51 BST, Levi Morrison <levim@php.net> wrote:
> On Tue, Sep 12, 2017 at 11:59 AM, Rowan Collins
> <rowan.collins@gmail.com> wrote:
>>>> If we want function and class references, they should have their
>>>> own, unambiguous, syntax.
>>
>> I stand by this assertion. Consider the following statement:
>>
>>     $foo = bar;
>>
>> Even if "bar" cannot *simultaneously* be the name of a function, a
>> class, and a constant, it can still *potentially* be any of the three,
>> from the point of view of the compiler.
>
> If it's known, it's known, and it can proceed with that type. If it's
> not known then autoload and proceed like normal. I fail to see how
> this is an issue, and in fact, see it as a *significant* improvement
> to our current situation...

If the symbol tables had always been unified, I guess you could think of
a function name as a constant whose value happened to be of type IS_FUNC
- like how in JS "function foo() {}" and "var foo = function{}" are very
nearly interchangeable. But it feels like retrofitting that onto the
existing language would be messy.

For instance, an autoloader would have to be given a token name, with no
context of whether it's expected to be a class, function, or constant.
(Of course we'd have to solve the dilemma of how global function
fallback/shadowing should interact with autoloading first.) Users would
have to learn this concept of an untyped token, because the error
message they'd get if it wasn't defined could no longer say "undefined
constant".

Then there's all the existing support for string-based callables. I
can't actually think of any cases that are unresolvable, but there's
some odd implications:

    function foo() { echo 'Hello, world!'; }
    const bar='foo';
    $fn = bar;
    $fn(); // already works
    bar(); // would this work? if not, why not, since it's no longer ambiguous?
    const baz='bar';
    $fn2 = baz;
    $fn2(); // in which case, would this also work?
    baz(); // and then what about this?

I feel like this could lead to confusion either way, and just increase
the complexity for both human and machine analysis.

Then there's other symbol tables that would need to be unified - we'd
want $foo->bar be able to grant a method reference, and Foo::bar a
static method reference. Just how much code is it worth breaking to
allow this syntax?

It feels a lot cleaner to say "function and class references are a new
concept, and you'll know when you're using them because they look like
this". Something like "SomeClass::classref", "some_func::funcref",
"SomeClass::someStaticMethod::funcref",
"$some_object->someMethod::funcref".

Regards,

--
Rowan Collins
[IMSoP]
  100553
September 12, 2017 21:32 cmbecker69@gmx.de ("Christoph M. Becker")
On 12.09.2017 at 16:52, Levi Morrison wrote:

> On Tue, Sep 12, 2017 at 6:52 AM, François Laupretre
> <francois@tekwire.net> wrote:
>>
>> On 12/09/2017 at 14:02, Christoph M. Becker wrote:
>>
>>> Even if these issues could be resolved, I still think allowing both
>>> case-sensitive and case-insensitive constant identifiers does more harm
>>> than good, so either case-sensitive or case-insensitive constant
>>> identifiers should be removed from the language. Since case-sensitive
>>> constant identifiers are already the default, and HHVM doesn't even
>>> support case-insensitive identifiers at all, I would suggest to remove
>>> case-insensitive constant identifiers.
>>>
>>> This could be implemented by triggering E_DEPRECATED whenever the third
>>> argument to define() is TRUE in PHP 7.3, and to remove this parameter
>>> altogether in PHP 8. Most likely some further simplification in the
>>> engine could be done then as well.
>
> […] However, if we change only the case
> sensitivity of constants we gain little value for our BC break.

I have not suggested to *change* the case sensitivity of constants, but
rather to settle on a common case – since `const` constants are always
case-sensitive, it appears that this should be so for define'd
constants. This would make code as the following to work as expected:

    a.php b.php true echo FOO; // => bar - WFT? ?>

And it obviously would fix a bug. IMHO, that is sufficient gain for a
presumably moderate BC break.

Please note, that I do not want to pursue a discussion regarding
changing all constants to be case-sensitive or all functions and class
names to be case-insensitive. Of course, it is fine to discuss it, but
it is clearly out of scope for what I'm trying to improve (in my
opinion) here, which is more in the "a bird in the hand is worth two in
the bush" corner. If it will be decided that all constant identifiers
should be case-insensitive, I'd be fine with it (not happy, though).

Probably, I should reword the RFC to reflect that it is actually about
deprecation and removal of the third parameter of define() (plus
preventing any extension to register constants which do not conform to
the "default" casing). In short: don't have two kinds of constants wrt.
spelling (true, false, null are not covered, since they are special
anyway and could be promoted to keywords).

--
Christoph M. Becker