## Re: [PHP-DEV] Deprecate and remove case-insensitive constants?

September 12, 2017 12:52
Hi,

On 12/09/2017 at 14:02, Christoph M. Becker wrote:
> Hi everybody!
>
> Usually constant identifiers are treated case-sensitive in PHP.  This is
> always the case for constants defined via a const declaration.
> However, define() allows to pass TRUE as third argument to define a
> case-insensitive constant.  This feature appears to potentially result
> in confusion, and also causes bugs as shown in
> <https://bugs.php.net/74450>.  See an example created by Nikita to see
> some probably unexpected behavior: <https://3v4l.org/L6nCp>.
>
> Even if these issues could be resolved, I still think allowing both
> case-sensitive and case-insensitive constant identifiers does more harm
> than good, so either case-sensitive or case-insensitive constant
> identifiers should be removed from the language.  Since case-sensitive
> constant identifiers are already the default, and HHVM doesn't even
> support case-insensitive identifiers at all, I would suggest to remove
> case-insensitive constant identifiers.
>
> This could be implemented by triggering E_DEPRECATED whenever the third
> argument to define() is TRUE in PHP 7.3, and to remove this parameter
> altogether in PHP 8.  Most likely some further simplification in the
> engine could be done then as well.
>
> Thoughts?

What about making PHP 8 100% case-sensitive (except true/false) ? If we
announce it years in advance, it is possible, IMO.

Regards

François
September 12, 2017 13:04
On 12.09.2017 at 14:52, François Laupretre wrote:

> What about making PHP 8 100% case-sensitive (except true/false) ? If we
> announce it years in advance, it is possible, IMO.

I don't think we can do that.  Consider, for instance, ext/gd where all
functions are actually in lower case, but I've seen a lot of code
written in pascal or camel case to make the functions better readable, e.g.

imageCreateFromJpeg() vs. imagecreatefromjpeg()
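
To illustrate the point, here is a minimal sketch (not part of the original mail). PHP looks up function names case-insensitively, so both spellings reach the same function. The helper `imagelabelfromname()` is a hypothetical stand-in, used only so the example runs without ext/gd.

```php
<?php
// Hypothetical helper, declared in lower case like the ext/gd functions.
function imagelabelfromname(string $name): string
{
    return 'image:' . $name;
}

// Function names are looked up case-insensitively, so the camel-case
// spelling resolves to the very same function.
$lower = imagelabelfromname('jpeg');
$camel = imageLabelFromName('jpeg');

var_dump($lower === $camel);
```

Making the language fully case-sensitive would turn the second call into an error, which is the compatibility concern raised here.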

--
Christoph M. Becker
September 12, 2017 14:04
On 12 September 2017 at 14:04, Christoph M. Becker <cmbecker69@gmx.de> wrote:
>
> I don't think we can do that.  Consider, for instance, ext/gd where all
> functions are actually in lower case, but I've seen a lot of code
> written in pascal or camel case to make the functions better readable, e.g.
>
>   imageCreateFromJpeg() vs. imagecreatefromjpeg()

Creating an optional, small backwards-compatibility shim/library to
work around that problem would be pretty easy.

It's also the type of error that would be easy to add a deprecation
warning to in a late 7.x branch.

cheers
Dan
September 16, 2017 03:22
Hi Christoph,

On Tue, Sep 12, 2017 at 10:04 PM, Christoph M. Becker <cmbecker69@gmx.de>
wrote:

> On 12.09.2017 at 14:52, François Laupretre wrote:
>
> > What about making PHP 8 100% case-sensitive (except true/false) ? If we
> > announce it years in advance, it is possible, IMO.
>
> I don't think we can do that.  Consider, for instance, ext/gd where all
> functions are actually in lower case, but I've seen a lot of code
> written in pascal or camel case to make the functions better readable, e.g.
>
>   imageCreateFromJpeg() vs. imagecreatefromjpeg()
>

Consistent function names at the same time, perhaps?
https://wiki.php.net/rfc/consistent_function_names

--
Yasuo Ohgaki
yohgaki@ohgaki.net
September 12, 2017 14:52
On Tue, Sep 12, 2017 at 6:52 AM, François Laupretre
<francois@tekwire.net> wrote:
> Hi,
>
> On 12/09/2017 at 14:02, Christoph M. Becker wrote:
>>
>> Hi everybody!
>>
>> Usually constant identifiers are treated case-sensitive in PHP.  This is
>> always the case for constants defined via a const declaration.
>> However, define() allows to pass TRUE as third argument to define a
>> case-insensitive constant.  This feature appears to potentially result
>> in confusion, and also causes bugs as shown in
>> <https://bugs.php.net/74450>.  See an example created by Nikita to see
>> some probably unexpected behavior: <https://3v4l.org/L6nCp>.
>>
>> Even if these issues could be resolved, I still think allowing both
>> case-sensitive and case-insensitive constant identifiers does more harm
>> than good, so either case-sensitive or case-insensitive constant
>> identifiers should be removed from the language.  Since case-sensitive
>> constant identifiers are already the default, and HHVM doesn't even
>> support case-insensitive identifiers at all, I would suggest to remove
>> case-insensitive constant identifiers.
>>
>> This could be implemented by triggering E_DEPRECATED whenever the third
>> argument to define() is TRUE in PHP 7.3, and to remove this parameter
>> altogether in PHP 8.  Most likely some further simplification in the
>> engine could be done then as well.
>>
>> Thoughts?
>
>
> What about making PHP 8 100% case-sensitive (except true/false) ? If we
> announce it years in advance, it is possible, IMO.
>
> Regards
>
> François

By itself this change provides little value. If it was done in
connection with other features such as merging symbol tables then we
can actually gain some significant improvements:

array_map(sum2, $input1,$input2);

Currently that requires sum2 to be a constant. To get the correct
behavior we currently need to do:

array_map('fully\qualified\namespace\sum2', $input1,$input2);

This is not just convenience; it provides safety to refactoring and
general code analysis tools. Maintenance is a crucial aspect of large
code bases and being able to move away from stringly-typed things is a
require further changes.

I believe these improvements would be worth it and do understand it is
a large backwards compatibility break. Given sufficient time and
tooling to prepare I think PHP would be markedly better in the
long-run for these two changes. However, if we change only the case
sensitivity of constants we gain little value for our BC break.
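
For reference, a minimal sketch of the string-based form as it works today. The namespace `App\Math` and the body of `sum2()` are assumptions made up for this illustration; only the `array_map()` call shape comes from the message above.

```php
<?php
namespace App\Math; // hypothetical namespace for this sketch

// Hypothetical two-argument callback, standing in for sum2.
function sum2(int $a, int $b): int
{
    return $a + $b;
}

$input1 = [1, 2, 3];
$input2 = [10, 20, 30];

// The callback has to be named by a fully qualified string; a bare
// sum2 in this position would be looked up as a constant instead.
$sums = \array_map('App\Math\sum2', $input1, $input2);

// __NAMESPACE__ avoids repeating the namespace, but the callback is
// still a string that refactoring tools cannot reliably follow.
$same = \array_map(__NAMESPACE__ . '\sum2', $input1, $input2);
```

Renaming `sum2()` or moving it to another namespace leaves both strings silently stale, which is the maintenance problem described above.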
September 12, 2017 14:55
> By itself this change provides little value. If it was done in
> connection with other features such as merging symbol tables then we
> can actually gain some significant improvements:
>
>     array_map(sum2, $input1,$input2);
>
> Currently that requires sum2 to be a constant. To get the correct
> behavior we currently need to do:
>
>     array_map('fully\qualified\namespace\sum2', $input1,$input2);

After rewriting my reply I noticed this sentence doesn't quite make sense:

> This is not just convenience; it provides safety to refactoring and
> general code analysis tools.

Instead I meant that using the string is not just inconvenient; it
also prevents fully-safe code refactoring and analysis.

> Maintenance is a crucial aspect of large
> code bases and being able to move away from stringly-typed things is a
> require further changes.
>
> I believe these improvements would be worth it and do understand it is
> a large backwards compatibility break. Given sufficient time and
> tooling to prepare I think PHP would be markedly better in the
> long-run for these two changes. However, if we change only the case
> sensitivity of constants we gain little value for our BC break.
September 12, 2017 16:17
On 12 September 2017 15:52:38 BST, Levi Morrison <levim@php.net> wrote:
>    array_map(sum2, $input1,$input2);
>
>Currently that requires sum2 to be a constant.

I'm not clear what this has to do with case sensitivity; the problem here is that we don't have a type of "function reference" (nor "class reference"), so we simulate such references with strings and runtime assertions.

Are you saying that without case sensitivity, the language could deduce that sum2 in that case was a function reference? That seems optimistic: not only can you have a class, a constant, and a function all with the same name, but you can't actually know which exists until the line is executed, because all three can be defined at any time.

This kind of ambiguous syntax is precisely what I was trying to reduce by deprecating "undefined constant as string", and a similar "convenient fallback" (from current to global namespace) is currently the biggest thing blocking function autoloading.

If we want function and class references, they should have their own, unambiguous, syntax.
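
For comparison, a sketch of the closest facility available in PHP 7.1 and later: `Closure::fromCallable()` converts a callable into a Closure object. This is not the dedicated syntax asked for here, since the target is still named by a string. The class `SomeClass` and its methods are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical class, only here to have a static and an instance method.
class SomeClass
{
    public static function someStaticMethod(string $s): string
    {
        return strrev($s);
    }

    public function someMethod(string $s): string
    {
        return strtoupper($s);
    }
}

// Each target is still named by a string (or an array holding one),
// but the result is a real Closure object rather than a bare string.
$funcRef   = Closure::fromCallable('strlen');
$staticRef = Closure::fromCallable('SomeClass::someStaticMethod');
$methodRef = Closure::fromCallable([new SomeClass(), 'someMethod']);

$length   = $funcRef('constant');
$reversed = $staticRef('abc');
$upper    = $methodRef('abc');
```

The conversion fails at run time if the named target does not exist, so the "strings and runtime assertions" problem remains; only the resulting value is typed.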

Apologies if I've completely missed the point here.

Regards,

--
Rowan Collins
[IMSoP]
September 12, 2017 16:45
> Apologies if I've completely missed the point here.

Oh well, it happens.

> Are you saying that without case sensitivity, the language could deduce that sum2 in that case was a function reference? That seems optimistic: not only can you have a class, a constant, and a function all with the same name, but you can't actually know which exists until the line is executed, because all three can be defined at any time.

Close. If we make case sensitivity consistent (either all insensitive
or all sensitive) *and* merge symbol tables then we can get actual
features out of it. As it stands just changing the case sensitivity
does not buy us any features for our BC break.

The rest of my message only makes sense once you understand I was
proposing unified case sensitivity for all symbols *and* merging them
into one table.

>>    array_map(sum2, $input1,$input2);
>>
>>Currently that requires sum2 to be a constant.
>
> I'm not clear what this has to do with case sensitivity; the problem here is that we don't have a type of "function reference" (nor "class reference"), so we simulate such references with strings and runtime assertions.

This confusion stems from the aforementioned items.

> If we want function and class references, they should have their own, unambiguous, syntax.

My point was rather that if we fix our inconsistency issues and merge
the tables no such syntax is required; all existing syntax works.
There are engine changes that have to accompany those as well,
obviously.

In summary I think changing constant case sensitivity is too small of
a step to gain us anything, but would be *very* happy to take it
further because it will give us actual features for our trouble.
September 12, 2017 17:59
On 12 September 2017 17:45:46 BST, Levi Morrison <levim@php.net> wrote:
>The rest of my message only makes sense once you understand I was
>proposing unified case sensitivity for all symbols *and* merging them
>into one table.

Ah, OK, so I partially missed the point. I'm still not sure what you're suggesting is sensible, though...

>> If we want function and class references, they should have their own,
>>unambiguous, syntax.

I stand by this assertion. Consider the following statement:

    $foo = bar;

Even if "bar" cannot *simultaneously* be the name of a function, a class, and a constant, it can still *potentially* be any of the three, from the point of view of the compiler.

So, far from allowing us to make nice inferences about function references vs strings-that-look-callable, we have now *broken* assumptions we could previously have made. It seems like we'd just be adding another equally ambiguous way of writing the same code.

Regards,

--
Rowan Collins
[IMSoP]
September 12, 2017 20:35
On Tue, Sep 12, 2017 at 11:59 AM, Rowan Collins collins@gmail.com> wrote:
> On 12 September 2017 17:45:46 BST, Levi Morrison <levim@php.net> wrote:
>>The rest of my message only makes sense once you understand I was
>>proposing unified case sensitivity for all symbols *and* merging them
>>into one table.
>
> Ah, OK, so I partially missed the point. I'm still not sure what you're suggesting is sensible, though...
>
>>> If we want function and class references, they should have their own,
>>>unambiguous, syntax.
>
> I stand by this assertion. Consider the following statement:
>
>     $foo = bar;
>
> Even if "bar" cannot *simultaneously* be the name of a function, a class, and a constant, it can still *potentially* be any of the three, from the point of view of the compiler.

If it's known, it's known, and it can proceed with that type. If it's
not known then autoload and proceed like normal. I fail to see how
this is an issue, and in fact, see it as a *significant* improvement
to our current situation...
September 13, 2017 13:48
On 12 September 2017 21:35:51 BST, Levi Morrison <levim@php.net> wrote:
>On Tue, Sep 12, 2017 at 11:59 AM, Rowan Collins
>collins@gmail.com> wrote:
>>>> If we want function and class references, they should have their own,
>>>>unambiguous, syntax.
>>
>> I stand by this assertion. Consider the following statement:
>>
>>     $foo = bar;
>>
>> Even if "bar" cannot *simultaneously* be the name of a function, a
>> class, and a constant, it can still *potentially* be any of the three,
>> from the point of view of the compiler.
>
>If it's known, it's known, and it can proceed with that type. If it's
>not known then autoload and proceed like normal. I fail to see how
>this is an issue, and in fact, see it as a *significant* improvement
>to our current situation...

If the symbol tables had always been unified, I guess you could think of a function name as a constant whose value happened to be of type IS_FUNC - like how in JS "function foo() {}" and "var foo = function() {}" are very nearly interchangeable. But it feels like retrofitting that onto the existing language would be messy.

For instance, an autoloader would have to be given a token name, with no context of whether it's expected to be a class, function, or constant. (Of course we'd have to solve the dilemma of how global function fallback/shadowing should interact with autoloading first.) Users would have to learn this concept of an untyped token, because the error message they'd get if it wasn't defined could no longer say "undefined constant".

Then there's all the existing support for string-based callables. I can't actually think of any cases that are unresolvable, but there are some odd implications:

    function foo() { echo 'Hello, world!'; }
    const bar='foo';
    $fn = bar;
    $fn(); // already works
    bar(); // would this work? if not, why not, since it's no longer ambiguous?
    const baz='bar';
    $fn2 = baz;
    $fn2(); // in which case, would this also work?
    baz(); // and then what about this?

I feel like this could lead to confusion either way, and just increase the complexity for both human and machine analysis.

Then there's other symbol tables that would need to be unified - we'd want $foo->bar to be able to grant a method reference, and Foo::bar a static method reference. Just how much code is it worth breaking to allow this syntax?

It feels a lot cleaner to say "function and class references are a new concept, and you'll know when you're using them because they look like this". Something like "SomeClass::classref", "some_func::funcref", "SomeClass::someStaticMethod::funcref", "$some_object->someMethod::funcref".

Regards,
--
Rowan Collins
[IMSoP]
September 12, 2017 21:32
On 12.09.2017 at 16:52, Levi Morrison wrote:

> On Tue, Sep 12, 2017 at 6:52 AM, François Laupretre
> <francois@tekwire.net> wrote:
>>
>> On 12/09/2017 at 14:02, Christoph M. Becker wrote:
>>
>>> Even if these issues could be resolved, I still think allowing both
>>> case-sensitive and case-insensitive constant identifiers does more harm
>>> than good, so either case-sensitive or case-insensitive constant
>>> identifiers should be removed from the language.  Since case-sensitive
>>> constant identifiers are already the default, and HHVM doesn't even
>>> support case-insensitive identifiers at all, I would suggest to remove
>>> case-insensitive constant identifiers.
>>>
>>> This could be implemented by triggering E_DEPRECATED whenever the third
>>> argument to define() is TRUE in PHP 7.3, and to remove this parameter
>>> altogether in PHP 8.  Most likely some further simplification in the
>>> engine could be done then as well.
>
> [...] However, if we change only the case
> sensitivity of constants we gain little value for our BC break.

I have not suggested to *change* the case sensitivity of constants, but
rather to settle on a common case: since const constants are always
case-sensitive, it appears that this should be so for define'd
constants.  This would make code such as the following work as expected:

a.php

b.php
true
echo FOO; // => bar - WFT?
?>

And it obviously would fix a bug.  IMHO, that is sufficient gain for a
presumably moderate BC break.

Please note that I do not want to pursue a discussion regarding
changing all constants to be case-sensitive or all functions and class
names to be case-insensitive.  Of course, it is fine to discuss it, but
it is clearly out of scope for what I'm trying to improve (in my
opinion) here, which is more in the "a bird in the hand is worth two in
the bush" corner.

If it is decided that all constant identifiers should be
case-insensitive, I'd be fine with it (not happy, though).  Probably, I
should reword the RFC to reflect that it is actually about deprecation
and removal of the third parameter of define() (plus preventing any
extension to register constants which do not conform to the "default"
casing).  In short: don't have two kinds of constants wrt. spelling
(true, false, null are not covered, since they are special anyway and
could be promoted to keywords).

--
Christoph M. Becker
