Re: [PHP-DEV] Proposal for a new basic function: str_contains

March 3, 2020 13:53 (Andreas Heigl)

On 03.03.20 at 14:29, Nicolas Grekas wrote:
> On Tue, 3 Mar 2020 at 11:04, Rowan Tommins wrote:
>
>> On Tue, 3 Mar 2020 at 08:46, Andreas Heigl wrote:
>>
>>>
>>> While it is mainly aimed at being a mere convenience-function that could
>>> also be easily implemented in userland it misses one main thing IMO when
>>> handling unicode-strings: Normalization.
>>>
>>
>> While I would love to see more functionality for handling Unicode which
>> didn't treat it as just another character set, I don't think sprinkling it
>> into the main string functions of the language would be the right approach.
>> Even if we changed all the existing functions to be "Unicode-aware", as was
>> planned for PHP 6, the resulting API would not handle all cases correctly.
>>
>> In this case, a Unicode-based string API ought to provide at least two
>> variants of "contains", as options or separate functions:
>>
>> - a version which matches on code point, for answering queries like "does
>>   this string contain right-to-left override characters?"
>> - at least one form of normalization, but probably several
>>
>> If there was serious work on a new string API in progress, a freeze on
>> additions to the current API would make sense; but right now, the
>> byte-based string API is what we have, and I think this function is a
>> sensible addition to it.
>>
>
> FYI, I wrote a String handling lib, shipped as Symfony String:
> - doc:
> - src:
>
> TL;DR, it provides 3 classes of value objects, dealing with bytes, code
> points and grapheme cluster (~= normalized unicode)
>
> It makes no sense to have `str_contains()` or any global function able to
> deal with Unicode normalization *unless* the PHP string values embed their
> unit system (one of: bytes, codepoints or graphemes).
>
> With this rationale, I agree with Rowan: PHP's native string functions deal
> with bytes. So should str_contains(). Other unit systems can be implemented
> in userland (until PHP implements something similar to Symfony String in
> core - but that's another topic.)

str_contains as it is currently implemented can also easily be implemented
in userland. That was my reasoning. I would think otherwise if it took
Unicode into account, as that is much harder to implement in userland.

And I didn't want to start a new discussion; I merely wanted to explain the
reasoning behind my decision.

Cheers

Andreas

-- 
                                                              ,,,
                                                             (o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl                                                       |
| N 50°22'59.5" E 08°23'58"                                           |
+---------------------------------------------------------------------+
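
To illustrate the point made in the reply above, here is a rough sketch of what a byte-based userland version might look like, and why normalization is the harder part. The helper name `userland_str_contains` is hypothetical and not taken from the proposal; treating an empty needle as always contained is an assumption of this sketch. The last step assumes the intl extension is installed and is skipped otherwise.

```php
<?php
// Hypothetical userland helper; the name is illustrative only.
function userland_str_contains(string $haystack, string $needle): bool
{
    // Assumption of this sketch: an empty needle counts as contained.
    return $needle === '' || strpos($haystack, $needle) !== false;
}

$precomposed = "caf\u{00E9}";   // "é" as a single code point (U+00E9)
$decomposed  = "cafe\u{0301}";  // "e" followed by a combining acute accent (U+0301)

// The byte-based check finds the precomposed "é" only where the same bytes occur.
var_dump(userland_str_contains($precomposed, "\u{00E9}")); // bool(true)
var_dump(userland_str_contains($decomposed, "\u{00E9}"));  // bool(false)

// With ext-intl available, normalizing to NFC first makes the two forms match.
if (class_exists('Normalizer')) {
    $nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    var_dump(userland_str_contains($nfc, "\u{00E9}")); // bool(true)
}
```

Both strings render as "café", yet only one contains the byte sequence of the precomposed character, which is the gap a normalization-aware function would have to close.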