Re: [PHP-DEV] Deprecate and remove case-insensitive constants?

  100564
September 13, 2017 16:03 TonyMarston@hotmail.com ("Tony Marston")
"Levi Morrison"  wrote in message 
news:CAFMT4NrC43y-nL_V85qt7JgV1ohM0y4KExhB4e3mi1EjHJ0hBw@mail.gmail.com...
> On Wed, Sep 13, 2017 at 2:59 AM, Tony Marston <TonyMarston@hotmail.com> wrote:
>> People who think that case sensitive software is cool are deluding themselves. When I started working on mainframe computers (UNIVAC and IBM) in the early 1970s everything was case-insensitive. This was only changed by people who did not understand the ramifications of their choice.
>
> Actually there are concrete bugs caused by case insensitivity. For one example, here is our own bugs.php.net report about a Turkish locale issue:
>
> https://bugs.php.net/bug.php?id=18556
>
> The short summary of the issue is that when capital `I`, the ninth letter of the English alphabet, is lowercased in the Turkish locales it does not become the same `i` as it does in English but a different i that is not considered equal. Thus classes such as `Iterator` are not found in the Turkish locales. Note that this bug was fixed, and then there was a regression that lasted until PHP 5.5.
>
> There are other case insensitivity bugs but this Turkish one is the poster child and if you search around you can find many examples of it.
>
> Case sensitivity is thus *a correctness issue* and not a "cool"ness, personal preference, performance, or some other type of issue. I argue correctness and maintenance issues are the most important and thus if we change sensitivity of *any* type of symbol it should go in the direction of being case sensitive. Someone can disagree on what they value but people who think case insensitivity is not a correctness issue "are deluding themselves".
>
> Levi Morrison

I'm sorry, but errors in translation from one character set to another are insignificant when compared with the much larger problem of the same word having different meanings depending on case. In the English language "info" is the same as "Info" is the same as "INFO" is the same as "iNFO" is the same as "iNfO" and so on. If the problem is that an English word cannot be recognised as the same word regardless of case when switching to a non-English character set then the issue is with switching to a non-English character set.

Introducing case sensitivity just for this minor bug would create more issues than it would solve, so this bug should be solved using a different technique.

--
Tony Marston
  100576
September 14, 2017 09:23 TonyMarston@hotmail.com ("Tony Marston")
""Tony Marston""  wrote in message news:09.43.19300.8E659B95@pb1.pair.com...
> [...]
> Introducing case sensitivity just for this minor bug would create more issues than it would solve, so this bug should be solved using a different technique.

Would this problem disappear by using UTF-8 instead of the Turkish character set? If so, then no other solution would be required.

--
Tony Marston
  100578
September 14, 2017 12:02 rowan.collins@gmail.com (Rowan Collins)
On 14 September 2017 10:23:48 BST, Tony Marston wrote:
> Would this problem disappear by using UTF8 instead of the Turkish character set? If so then no other solution would be required.

No, the problem has nothing to do with character sets, but with the actual alphabet that humans in Turkey use, which doesn't follow the same rules as the alphabet that American humans use.

Unicode (the standard, not the character set or any of its encodings) has an algorithm / lookup table for "case folding", because "convert everything to lower case" is not a reliable way to produce case insensitive comparisons. Using that correctly would presumably solve this particular problem.

The bottom line is that case sensitive comparisons are easier than case insensitive ones. Early programming languages and OSes just ignored the edge cases (or ignored the existence of the world outside the USA altogether); some later ones decided the whole thing wasn't worth the effort.

Regards,

--
Rowan Collins
[IMSoP]
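
To illustrate the difference between lowercasing and case folding described above, here is a minimal sketch in Python (chosen only for brevity; it is not how PHP implements symbol lookup). The names `TURKISH_LOWER`, `turkish_lower`, `lookup_by_lowercasing` and `classes` are hypothetical and exist only for this example. The sketch assumes a lowercasing routine that follows Turkish rules, where `I` becomes dotless `ı` (U+0131), and shows why a table keyed on English-style lowercase names then fails to find `Iterator`.

```python
# Sketch only: TURKISH_LOWER and turkish_lower are hypothetical helpers,
# not part of Python or PHP. They mimic a lowercasing routine that follows
# Turkish rules: "I" -> dotless "ı" (U+0131), dotted "İ" (U+0130) -> "i".
TURKISH_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Lowercase text the way a Turkish-locale routine would."""
    return text.translate(TURKISH_LOWER).lower()


def lookup_by_lowercasing(name: str, table: dict, lower) -> bool:
    """Case-insensitive lookup implemented by lowercasing the name first."""
    return lower(name) in table


# Hypothetical symbol table keyed by names lowercased under English rules.
classes = {"iterator": object}

found_english = lookup_by_lowercasing("Iterator", classes, str.lower)
found_turkish = lookup_by_lowercasing("Iterator", classes, turkish_lower)

# Lowercasing alone also fails outside Turkish: German sharp s has no
# single-character uppercase, so only case folding makes these compare equal.
lower_equal = "STRASSE".lower() == "straße".lower()
fold_equal = "STRASSE".casefold() == "straße".casefold()

print(found_english, found_turkish, lower_equal, fold_equal)
```

Under these assumptions the English-rules lookup finds the class and the Turkish-rules lookup does not, while `casefold()` (Python's implementation of Unicode default case folding) compares the two German spellings as equal where `lower()` does not. Default case folding is locale-independent, which is what makes it suitable for identifier comparison.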
  100658
September 16, 2017 08:57 TonyMarston@hotmail.com ("Tony Marston")
"Rowan Collins"  wrote in message 
news:A7FFCE81-74E5-47D0-9EBF-9BDC90E2E957@gmail.com...
> On 14 September 2017 10:23:48 BST, Tony Marston wrote:
>> Would this problem disappear by using UTF8 instead of the Turkish character set? If so then no other solution would be required.
>
> No, the problem has nothing to do with character sets, but with the actual alphabet that humans in Turkey use, which doesn't follow the same rules as the alphabet that American humans use.
>
> Unicode (the standard, not the character set or any of its encodings) has an algorithm / lookup table for "case folding", because "convert everything to lower case" is not a reliable way to produce case insensitive comparisons. Using that correctly would presumably solve this particular problem.
>
> The bottom line is that case sensitive comparisons are easier than case insensitive ones.

A programmer's job is to write software which makes life easier for his users, not to remove features which his users are used to just because it is "more convenient" for him.

While the vast majority of characters in any character set have a one-to-one mapping between upper and lower case, there are exceptions. I have been writing software for several decades, and I have come to know the 80-20 rule, which states that 80% of the code is for "normal" circumstances while 20% is for the exceptions, yet coding for the "normal" circumstances takes 20% of the effort while the exceptions require 80%. It is the programmer's job to deal with these exceptions, so to say that it's not going to be done because it is not easy is a very poor excuse.

--
Tony Marston
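
For readers unfamiliar with the exceptions to one-to-one case mapping mentioned above, a small Python sketch follows. It uses only standard `str` methods; the helper `round_trips` and the sample characters are chosen purely for illustration and are not taken from the thread.

```python
def round_trips(ch: str) -> bool:
    """Return True if upper-casing and then lower-casing gives back ch."""
    return ch.upper().lower() == ch


# German sharp s: the uppercase form is two characters long.
sharp_s_upper = "ß".upper()

# Turkish dotless i (U+0131): its uppercase is plain "I", which
# lowercases to dotted "i", so the original character is lost.
dotless_i = "\u0131"

print(round_trips("a"), round_trips("ß"), round_trips(dotless_i))
print(len(sharp_s_upper))
```

Ordinary letters such as `a` survive the round trip, while `ß` and `ı` do not, which is the kind of exception a case-insensitive comparison has to handle explicitly.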