RFC Posted for str_begins and str_ends functions

  94787
August 1, 2016 22:52 will@wkhudgins.info
Hello,

I recently emailed the group about submitting an RFC for str_begins() 
and str_ends() functions. The RFC has now been officially submitted and 
is viewable at:

https://wiki.php.net/rfc/add_str_begin_and_end_functions

The github PR may be found at:

https://github.com/php/php-src/pull/2049

Hope to be hearing about this,

Will
  94788
August 1, 2016 22:56 pollita@php.net (Sara Golemon)
On Mon, Aug 1, 2016 at 3:52 PM,  <will@wkhudgins.info> wrote:
> I recently emailed the group about submitting an RFC for str_begins() and
> str_ends() functions. The RFC has now been officially submitted and is
> viewable at:
>
> https://wiki.php.net/rfc/add_str_begin_and_end_functions

Feeling "meh" on it (neither for nor against), but I would consider
consistency with other str*() functions by making case-insensitivity live in
separate functions rather than as a parameter, e.g. str_begins(),
str_ibegins(), str_ends(), str_iends().

-Sara
  94790
August 2, 2016 00:06 yohgaki@ohgaki.net (Yasuo Ohgaki)
On Tue, Aug 2, 2016 at 7:56 AM, Sara Golemon <pollita@php.net> wrote:
> On Mon, Aug 1, 2016 at 3:52 PM, <will@wkhudgins.info> wrote:
>> I recently emailed the group about submitting an RFC for str_begins() and
>> str_ends() functions. The RFC has now been officially submitted and is
>> viewable at:
>>
>> https://wiki.php.net/rfc/add_str_begin_and_end_functions
>
> Feeling "meh" on it (neither for nor against), but I would consider
> consistency with other str*() functions by making case-insensitivity
> live in separate functions rather than as a parameter. e.g.
> str_begins(), str_ibegins(), str_ends(), end_iends()

+1 for having functions for case insensitivity.
I'm not sure if we should have "s". i.e. str_begin"s".

Regards,

-- 
Yasuo Ohgaki
yohgaki@ohgaki.net
  94792
August 2, 2016 01:36 david.proweb@gmail.com (David Rodrigues)
Sara Golemon wrote:
> Feeling "meh" on it (neither for nor against), but I would consider
> consistency with other str*() functions by making case-insensitivity
> live in separate functions rather than as a parameter. e.g.
> str_begins(), str_ibegins(), str_ends(), end_iends()

I guess that the "i" isn't applicable when the name has slashes. In that case,
the functions should be: strbegins, stribegins, strends, striends. In any
case, I think a third parameter is better, keeping the underscore.

Yasuo Ohgaki wrote:
> +1 for having functions for case insensitivity.
> I'm not sure if we should have "s". i.e. str_begin"s".

I think the "s" is good here. It sounds better to me, but I don't know if it
is right in English. In JS, for instance, we have startsWith. It has an "s"
too.

-- 
David Rodrigues
  94806
August 2, 2016 22:44 yohgaki@ohgaki.net (Yasuo Ohgaki)
Hi David,

On Tue, Aug 2, 2016 at 10:36 AM, David Rodrigues <david.proweb@gmail.com> wrote:
> Sara Golemon wrote:
>> Feeling "meh" on it (neither for nor against), but I would consider
>> consistency with other str*() functions by making case-insensitivity
>> live in separate functions rather than as a parameter. e.g.
>> str_begins(), str_ibegins(), str_ends(), end_iends()
>
> I guess that "i" isn't appliable when it have slashes.
> In this case, functions should be: strbegins, stribegins, strends, striends.
> In all case, I think that is better a third parameter and keep underlined.

This is a difficult issue. String function names are currently inconsistent.
It is better to stick to the CODING_STANDARDS naming convention for new
function names. Therefore, new string functions are better named str_*()
unless the result is too strange.

e.g.
http://php.net/manual/en/function.str-replace.php
http://php.net/manual/en/function.str-ireplace.php

I would like to fix function name inconsistencies by having aliases in the
near future.
https://wiki.php.net/rfc/consistent_function_names

It might be okay to have "s" in function names, but if we want to be
consistent,

str_replace -> str_replaces
str_ireplace -> str_ireplaces

IMO, the following names are better for consistency.

str_begin
str_ibegin
str_end
str_iend

In addition, str_replace() has search_value first, so the signatures might be

boolean str_begin(string $search_value, string $str, [boolean $case_sensitive = true])
boolean str_end(string $search_value, string $str, [boolean $case_sensitive = true])

However, strstr() (and other str functions without "_", e.g.
strpos/stripos/strrpos/strripos) has search_value as the 2nd parameter. If we
follow this format, the current signature is fine.

It may be better to sort out and fix consistency issues first, then add new
functions. Otherwise, we may introduce more consistency issues.

Regards,

BTW, having "i" is more readable.

str_ibegin("searchthis", $str);
is more readable than
str_begin("searchthis", $str, TRUE);
as programmer does not have to know that's the TRUE means.
It's a small thing, but small things add up.

-- 
Yasuo Ohgaki
yohgaki@ohgaki.net
  94807
August 2, 2016 22:53 rowan.collins@gmail.com (Rowan Collins)
On 02/08/2016 23:44, Yasuo Ohgaki wrote:
> It might be okay to have "s" in function names, but if we want to be
> consistent,
>
> str_replace -> str_replaces
> str_ireplace -> str_ireplaces
>
> IMO, following names are better for consistency.
>
> str_begin
> str_ibegin
> str_end
> str_iend

I think those names mean something different: "str_begin" sounds like an
imperative "make this string begin with X"; "str_begins" is more of an
assertion "the string begins with X". Ruby would spell it with a ? at the
end. It's also the same form, grammatically, as the common "isFoo".

Note that this logic holds for "str_replace", which *is* an imperative - you
are not saying "tell me if X replaces Y", you are saying "please replace X
with Y".

Regards,

-- 
Rowan Collins
[IMSoP]
  94808
August 3, 2016 01:15 will@wkhudgins.info
On 2016-08-02 18:44, Yasuo Ohgaki wrote:
> Hi David, > > On Tue, Aug 2, 2016 at 10:36 AM, David Rodrigues > proweb@gmail.com> wrote: >> Sara Golemon wrote: >>> Feeling "meh" on it (neither for nor against), but I would consider >>> consistency with other str*() functions by making case-insensitivity >>> live in separate functions rather than as a parameter. e.g. >>> str_begins(), str_ibegins(), str_ends(), end_iends() >> >> I guess that "i" isn't appliable when it have slashes. >> In this case, functions should be: strbegins, stribegins, strends, >> striends. >> In all case, I think that is better a third parameter and keep >> underlined. > > This is difficult issue. > String function names are inconsistent currently. > It is better to stick to CODING_STANDARDS naming convention for new > function names. Therefore, new string functions are better to be named > str_*() unless they are too strange. > > e.g. > http://php.net/manual/en/function.str-replace.php > http://php.net/manual/en/function.str-ireplace.php > > I would like to fix function name inconsistencies by having aliases in > near future. > https://wiki.php.net/rfc/consistent_function_names > > It might be okay to have "s" in function names, but if we want to be > consistent, > > str_replace -> str_replaces > str_ireplace -> str_ireplaces > > IMO, following names are better for consistency. > > str_begin > str_ibegin > str_end > str_iend > > In addition, str_replace() has seach_value at first, so signature might > be > > boolean str_begin(string $search_value, string $str, [boolean > $case_sensitive = true]) > boolean str_end(string $search_value, string $str, string > $search_value [boolean $case_sensitive = true]) > > However, strstr() (and other str functions without "_". e.g. > strpos/stripos/strrpos/strripos) has search_value as the 2nd > parameter. If we follow this format, current signature is fine. > > It may be better sort out and fix consistency issues first, then add > new functions. Otherwise, we may introduce more consistency issues. 
> > Regards, > > BTW, having "i" is more readable. > > str_ibegin("searchthis", $str); > is more readable than > str_begin("seachthis", $str, TRUE); > as programmer does not have to know that's the TRUE means. > It's small thing, but small things add up. > > -- > Yasuo Ohgaki > yohgaki@ohgaki.net
Everyone has raised important considerations. For me, the most important
thing is maintaining consistency with the existing PHP string library. I do
not want these functions to feel "tacked on", as if they were haphazardly
added to PHP. If these functions are added to the language, it should feel as
if they have always been a part of the language (even if they haven't been).
This consistency is important in order to ensure these functions ADD to PHP
instead of just cluttering it up.

Having separate functions for case sensitivity makes sense; that is much more
consistent with the existing string library. I think the proposal should be
amended to separate those two functionalities.

I think having an "s" at the end of the function names reads better, but
omitting the "s" fits better with the existing function names and does not
read badly. Therefore, I am in favor of dropping the "s".

As far as str_begin vs strbegin, I think str_begin is more readable.

Therefore, I think it would be better to implement:

boolean str_begin(string $search_value, string $str)
boolean str_ibegin(string $search_value, string $str)
boolean str_end(string $search_value, string $str)
boolean str_iend(string $search_value, string $str)

This is much more consistent with the existing string library.

Regards,

Will
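
To make the four signatures above concrete, here is a minimal userland sketch. It is only an illustration: the "sketch_" prefix is hypothetical, the bodies are not the RFC's C implementation, and treating the empty search value as always matching is an assumption of this sketch.

```php
<?php
// Hypothetical userland sketch of the four proposed functions. Argument
// order follows the message above: ($search_value, $str).

function sketch_str_begin(string $search_value, string $str): bool
{
    // strncmp() compares at most strlen($search_value) bytes.
    return strncmp($str, $search_value, strlen($search_value)) === 0;
}

function sketch_str_ibegin(string $search_value, string $str): bool
{
    // strncasecmp() is the case-insensitive counterpart of strncmp().
    return strncasecmp($str, $search_value, strlen($search_value)) === 0;
}

function sketch_str_end(string $search_value, string $str): bool
{
    $len = strlen($search_value);
    if ($len === 0) {
        return true; // assumption: every string ends with the empty string
    }
    if ($len > strlen($str)) {
        return false; // the search value cannot fit
    }
    return substr($str, -$len) === $search_value;
}

function sketch_str_iend(string $search_value, string $str): bool
{
    $len = strlen($search_value);
    if ($len === 0) {
        return true; // same assumption as sketch_str_end()
    }
    if ($len > strlen($str)) {
        return false;
    }
    return strcasecmp(substr($str, -$len), $search_value) === 0;
}
```

Splitting the behaviour into four names keeps each call site self-describing, which is the readability argument made earlier in the thread.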
  94809
August 3, 2016 03:02 yohgaki@ohgaki.net (Yasuo Ohgaki)
On Wed, Aug 3, 2016 at 7:44 AM, Yasuo Ohgaki <yohgaki@ohgaki.net> wrote:
> as programmer does not have to know that's the TRUE means.

s/that's/what's/

I shouldn't write mails while writing code :(

-- 
Yasuo Ohgaki
yohgaki@ohgaki.net
  94822
August 4, 2016 05:50 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

> I guess that "i" isn't appliable when it have slashes.
> In this case, functions should be: strbegins, stribegins, strends, striends.
> In all case, I think that is better a third parameter and keep underlined.

Please, not stribegins. We have enough functions with weird names :)
I am ambivalent on the question of whether to have an additional argument or
two functions, I guess with a slight preference for the argument.

-- 
Stas Malyshev
smalyshev@gmail.com
  95106
August 13, 2016 00:37 will@wkhudgins.info
I've updated the RFC to reflect the discussion here and on github. You 
may see it at
https://wiki.php.net/rfc/add_str_begin_and_end_functions . You can see 
the github PR at https://github.com/php/php-src/pull/2049 .

The motivation for these changes was to maximize consistency between the 
proposed functions and existing PHP string functions. The goal is to 
make these functions feel natural and add functionality to the language 
without cluttering it up.

Thanks,

Will
  95107
August 13, 2016 02:29 bishop@php.net (Bishop Bettini)
On Fri, Aug 12, 2016 at 8:37 PM, <will@wkhudgins.info> wrote:

> I've updated the RFC to reflect the discussion here and on github. You may
> see it at
> https://wiki.php.net/rfc/add_str_begin_and_end_functions . You can see
> the github PR at https://github.com/php/php-src/pull/2049 .
>
> The motivation for these changes was to maximize consistency between the
> proposed functions and existing PHP string functions. The goal is to make
> these functions feel natural and add functionality to the language without
> cluttering it up.

Generally, +1. A few thoughts.

First, the RFC refers to these working on "characters". I assume you mean
ASCII characters and these actually work strictly on bytes. Working on
"characters" would be more in line with a multi-byte extension. Would you
please clarify this point?

Second, and related to the multi-byte issue: do the case-insensitive versions
honor case-folding in a multi-byte fashion? Either way, it's probably a good
idea to separate the vote between the sensitive and insensitive versions
because this is fundamentally a different, and perhaps more contentious,
question.

Third, perhaps these functions could provide more information than just
yes/no. Return boolean TRUE if and only if the needle completely begins/ends
the haystack, otherwise return an INT representing the length in common. Yes,
that'll probably be a trap for new developers who don't honor ===, but that
could be illuminated in docs. Formally:

boolean|int str_begin(string $needle, string $haystack)
boolean|int str_end(string $needle, string $haystack)

For example:

str_begin('http://', 'http://example.com') === true
str_begin('http://', 'https://example.com') === 4

Finally, since the RFC will fuel the final documentation, it might be a good
idea to use needle/haystack terminology in the function signatures for some
kind of consistency.
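
The boolean|int idea in the third point can be sketched in userland as follows. The function name is hypothetical and the byte-by-byte loop is just one straightforward way to get the common length; it is not taken from the RFC or the PR.

```php
<?php
// Hypothetical sketch: true on a complete match, otherwise the number of
// leading bytes that $needle and $haystack have in common.
function sketch_str_begin_len(string $needle, string $haystack)
{
    $max = min(strlen($needle), strlen($haystack));
    $common = 0;
    // Walk both strings until the first differing byte.
    while ($common < $max && $needle[$common] === $haystack[$common]) {
        $common++;
    }
    return $common === strlen($needle) ? true : $common;
}

// The two examples from the message above.
$full    = sketch_str_begin_len('http://', 'http://example.com');
$partial = sketch_str_begin_len('http://', 'https://example.com');

// The trap mentioned above: a partial match of 4 is truthy, so only a
// strict comparison distinguishes it from a full match.
$looksLikeMatch = (bool) $partial;
```

With this shape, `if (sketch_str_begin_len(...))` would accept a partial match, so every caller would need `=== true`.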
  95108
August 13, 2016 07:42 lester@lsces.co.uk (Lester Caine)
On 04/08/16 06:50, Stanislav Malyshev wrote:
>> I guess that "i" isn't appliable when it have slashes. >> > In this case, functions should be: strbegins, stribegins, strends, striends. >> > In all case, I think that is better a third parameter and keep underlined.
> Please, not stribegins. We have enough functions with weird names :) > I am ambivalent of the question whether to have additional argument or > two functions, I guess with a slight preference for argument.
The bulk of the time I'm applying this to the SQL query that is going to
return a set of results rather than directly to a string. In that case it's
STARTING 'xYZ'. Because the need has not arisen I've only just noticed - after
20 odd years - there is no matching ENDING. Normally one needs to build a
phantom field to index the data, so I do have ONE case of reversed_field
STARTING 'ZYX'.

Is STARTING just a Firebird SQL thing or is it more generally available? I
did a few Google searches but as usual when searching for things like
'starting' one gets hundreds of pages on 'running' the software and its other
connotations. I suspect that, as in PHP, the other methods of doing things take
the strain, so certainly LIKE 'XYZ%' and '%XYZ' are probably the 'generic'
solution but suffer from slower search times, especially when looking for the
ENDING string.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
  95292
August 18, 2016 11:39 rowan.collins@gmail.com (Rowan Collins)
On 13/08/2016 08:42, Lester Caine wrote:
> Is starting just a Firebird SQL thing or is it more generally available.
> I do a few google searches but as usual when searching for things like
> 'starting' one gets hundreds of pages on 'running' the software and it's
> other connotations.

I've never come across it in Postgres, MS SQL Server, or MySQL. Generally
LIKE 'abc%' is the recommended approach (and will I think hit the index in
many cases, because the DBMS can optimize the case of a prefix match if it
knows at planning time). A "starting" keyword would certainly be useful if it
was there. :)

It doesn't quite fill the same need as a PHP function, of course, because you
might be checking user input, or API results, or all sorts of things that
won't, or haven't yet, hit the database. Currently the common idiom for that
is the ugly

strpos($string, 'abc') === 0

Regards,

-- 
Rowan Collins
[IMSoP]
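
For reference, the strpos() idiom mentioned above is shown here next to one alternative built from existing functions. The variable names are illustrative only.

```php
<?php
$string = 'abcdef';

// The idiom from the message: the strict === is required because strpos()
// returns false when the needle is absent, and false == 0 is true.
$viaStrpos = strpos($string, 'abc') === 0;

// An alternative that only looks at the first strlen('abc') bytes instead
// of searching the whole string for the needle.
$viaStrncmp = strncmp($string, 'abc', strlen('abc')) === 0;

// What goes wrong with a loose comparison: 'xyz' is not in the string,
// yet the check reports a match at position 0.
$looseFalsePositive = strpos($string, 'xyz') == 0;
```

The loose-comparison pitfall is part of why a dedicated boolean function reads more safely than the idiom.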
  95296
August 18, 2016 15:29 lester@lsces.co.uk (Lester Caine)
On 18/08/16 12:39, Rowan Collins wrote:
>> Is starting just a Firebird SQL thing or is it more generally available.
>> I do a few google searches but as usual when searching for things like
>> 'starting' one gets hundreds of pages on 'running' the software and it's
>> other connotations.
>
> I've never come across it in Postgres, MS SQL Server, or MySQL.
> Generally LIKE 'abc%' is the recommended approach (and will I think hit
> the index in many cases, because the DBMS can optimize the case of a
> prefix match if it knows at planning time). A "starting" keyword would
> certainly be useful if it was there. :)
>
> It doesn't quite fill the same need as a PHP function, of course,
> because you might be checking user input, or API results, or all sorts
> of things that won't, or haven't yet, hit the database. Currently the
> common idiom for that is the ugly strpos($string, 'abc') === 0

PHP is never going to be loading millions of records into memory and
searching them. That is the job of a database. While LIKE 'abc%' can be
optimised to use an index and speed up results, if the 'abc%' is supplied as a
parameter it is not generally possible to prepare the query using an index,
whereas STARTING always knows the matching string is the first characters of
the index. While PHP and SQL share a number of alternatives, the SQL versions
will have a premium on search time if an index can't be used.

I was just wondering if str_starting and str_ending matched better with other
string handling options.

-- 
Lester Caine - G8HFL
  94813
August 3, 2016 07:59 lauri.kentta@gmail.com (=?UTF-8?Q?Lauri_Kentt=C3=A4?=)
Hello,

I only saw you mention strpos, preg_match and substr as (slower) 
alternatives. However, there's already a function called substr_compare 
which is meant for just this kind of comparisons but which is more 
general than your RFC.

function str_begins($a, $b) {
   return substr_compare($a, $b, 0, strlen($b)) === 0;
}
function str_ends($a, $b) {
   return substr_compare($a, $b, -strlen($b)) === 0;
}
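
A slightly more defensive variant of the two helpers above, under hypothetical names so nothing clashes with the proposal. The only behavioural change is an explicit guard for an empty needle in the "ends" case, where -strlen($b) evaluates to 0 and substr_compare() would compare against the whole string; treating the empty needle as a match is an assumption of this sketch.

```php
<?php
// Hypothetical names; same substr_compare() technique as above.
function sketch_begins_sc(string $a, string $b): bool
{
    // Compare only the first strlen($b) bytes of $a with $b.
    return substr_compare($a, $b, 0, strlen($b)) === 0;
}

function sketch_ends_sc(string $a, string $b): bool
{
    if ($b === '') {
        return true; // assumption: the empty string ends every string
    }
    // A negative offset counts from the end of $a.
    return substr_compare($a, $b, -strlen($b)) === 0;
}
```

For example, sketch_ends_sc('example.csv', '.csv') is true, while a needle longer than the haystack simply fails to match.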

-- 
Lauri Kenttä
  94814
August 3, 2016 09:23 cmbecker69@gmx.de (Christoph Becker)
On 03.08.2016 at 09:59, Lauri Kenttä wrote:

> I only saw you mention strpos, preg_match and substr as (slower)
> alternatives. However, there's already a function called substr_compare
> which is meant for just this kind of comparisons but which is more
> general than your RFC.

Thanks for pointing out substr_compare(), of which I had not even been
aware. And indeed, substr_compare() is perfectly suitable to verify whether a
string starts or ends with a certain substring, so, in my opinion, there is
no need for the other functions to be added to ext/standard.

-- 
Christoph M. Becker
  94982
August 10, 2016 02:09 ajf@ajf.me (Andrea Faulds)
Hi Will,

will@wkhudgins.info wrote:
> I recently emailed the group about submitting an RFC for str_begins()
> and str_ends() functions. The RFC has now been officially submitted and
> is viewable at:
>
> https://wiki.php.net/rfc/add_str_begin_and_end_functions

I like this RFC, I've long wanted a PHP equivalent to JavaScript and the
like's .startsWith()/.endsWith(). Two comments, however.

Firstly, I'm not sure if having control over case-sensitivity as part of the
function is necessary, as you can always lowercase the string yourself.
Furthermore, if the user is dealing with non-single-byte strings (e.g.
Unicode), traditional byte-by-byte case-insensitive comparison is not going
to work properly in all cases (e.g. ß vs. SS). I'd rather we leave
case-insensitivity out of the function, and let the user call strtolower() or
mb_strtolower(), as the case may be, themselves.

Secondly, in JavaScript, .startsWith()[1] and .endsWith()[2] have an extra
parameter for specifying the offset into the haystack (in the case of
startsWith) or the string length of the haystack (in the case of endsWith).
In Python,[3][4] there are *two* extra parameters, to let you clip the
haystack from both ends (like with substr, I think). Have you considered
these? They could replace the case-sensitivity parameter, perhaps.

Thanks for your work!

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/startsWith
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/endsWith
[3] https://docs.python.org/3/library/stdtypes.html?highlight=str.startswith#str.startswith
[4] https://docs.python.org/3/library/stdtypes.html?highlight=str.endswith#str.endswith

-- 
Andrea Faulds
https://ajf.me/
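
Both suggestions above can be sketched with existing functions. The function name, the $offset parameter and the rule that out-of-range offsets never match are all assumptions made for illustration; none of this is in the RFC.

```php
<?php
// Hypothetical sketch of an offset parameter, loosely modelled on the
// position argument of JavaScript's startsWith().
function sketch_starts_with_at(string $haystack, string $needle, int $offset = 0): bool
{
    $needleLen = strlen($needle);
    // Assumption: an offset that leaves no room for the needle never matches.
    if ($offset < 0 || $offset + $needleLen > strlen($haystack)) {
        return false;
    }
    return $needleLen === 0
        || substr_compare($haystack, $needle, $offset, $needleLen) === 0;
}

// Offset use: 'http://' is 7 bytes long, so 'example' starts at offset 7.
$atOffset = sketch_starts_with_at('http://example.com', 'example', 7);

// Case-insensitivity left to the caller, as suggested above.
$caseFolded = sketch_starts_with_at(strtolower('README.md'), strtolower('readme'));
```

Leaving the lowercasing to the caller also lets them pick strtolower() or mb_strtolower() depending on their data.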
  95109
August 13, 2016 08:34 simon@simon.geek.nz (Simon Welsh)
> On 2/08/2016, at 8:52 AM, will@wkhudgins.info wrote:
>
> Hello,
>
> I recently emailed the group about submitting an RFC for str_begins() and
> str_ends() functions. The RFC has now been officially submitted and is
> viewable at:
>
> https://wiki.php.net/rfc/add_str_begin_and_end_functions
>
> The github PR may be found at:
>
> https://github.com/php/php-src/pull/2049
>
> Hope to be hearing about this,
>
> Will

Firstly, the argument ordering is the wrong way round for a string function.
String functions, especially search-related ones, are haystack, needle (see
strpos, strstr, strcspn, strpbrk, etc).

Secondly, I feel like this RFC does need to include that it’s a BC break by
introducing new global functions. A quick search shows that SugarCRM[1]
already implements str_begin and str_end functions and there’s likely to be
other projects that do too.

[1]: https://github.com/sugarcrm/sugarcrm_dev/blob/ae189cfa4ed4edd6a4e1e0d9d1d5ec66f46a0b74/include/utils.php#L2082-L2090

-- 
Simon Welsh
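
To illustrate the BC concern: a project that declares its own global str_begin() would hit a fatal "cannot redeclare" error if PHP shipped a built-in with that name. A common defensive pattern is sketched below; the body and parameter names are placeholders and are not SugarCRM's code.

```php
<?php
// Guarded declaration: only define the helper when no function of that
// name exists yet (built-in or previously loaded userland code).
if (!function_exists('str_begin')) {
    // Placeholder implementation with a (haystack, needle) order.
    function str_begin(string $haystack, string $needle): bool
    {
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }
}

$ok = str_begin('haystack', 'hay');
```

The guard avoids the fatal error, but it does not help if a future built-in takes its arguments in a different order than the project's helper, since existing calls would then silently change meaning.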
  95120
August 13, 2016 16:47 will@wkhudgins.info
On 2016-08-13 04:34, Simon Welsh wrote:
>> On 2/08/2016, at 8:52 AM, will@wkhudgins.info wrote: >> >> Hello, >> >> I recently emailed the group about submitting an RFC for str_begins() >> and str_ends() functions. The RFC has now been officially submitted >> and is viewable at: >> >> https://wiki.php.net/rfc/add_str_begin_and_end_functions >> >> The github PR may be found at: >> >> https://github.com/php/php-src/pull/2049 >> >> Hope to be hearing about this, >> >> Will > > Firstly, the argument ordering is the wrong way round for a string > function. String functions — especially search-related ones — are > haystack, needle (see strpos, strstr, strcspn, strpbrk, etc). > > Secondly, I feel like this RFC does need to include that it’s a BC > break by introducing new global functions. A quick search shows that > SugarCRM[1] already implements str_begin and str_end functions and > there’s likely to be other projects that do too. > > [1]: > https://github.com/sugarcrm/sugarcrm_dev/blob/ae189cfa4ed4edd6a4e1e0d9d1d5ec66f46a0b74/include/utils.php#L2082-L2090 > -- > Simon Welsh
You are correct, functions like strpos and strstr do follow (haystack,
needle) but functions like str_replace follow the format (needle, haystack).
Because I did these functions with the underscore, it made sense to make the
functions follow the format found in other str_* functions. If the functions
were changed to be strbegin, stribegin, strend, and striend, then it would
make sense to follow the (haystack, needle) format. However, I think adding
the underscore greatly improves the readability of these functions. And if
the functions are named with an underscore, I think they should follow the
format found in the other underscore functions.

Good call on the BC break, I had not thought about it breaking userland
functions with the same name.

-Will
  95126
August 13, 2016 23:17 simon@simon.geek.nz (Simon Welsh)
> On 14/08/2016, at 2:47 AM, will@wkhudgins.info wrote: > > On 2016-08-13 04:34, Simon Welsh wrote: >>> On 2/08/2016, at 8:52 AM, will@wkhudgins.info wrote: >>> Hello, >>> I recently emailed the group about submitting an RFC for str_begins() and str_ends() functions. The RFC has now been officially submitted and is viewable at: >>> https://wiki.php.net/rfc/add_str_begin_and_end_functions >>> The github PR may be found at: >>> https://github.com/php/php-src/pull/2049 >>> Hope to be hearing about this, >>> Will >> Firstly, the argument ordering is the wrong way round for a string >> function. String functions — especially search-related ones — are >> haystack, needle (see strpos, strstr, strcspn, strpbrk, etc). >> Secondly, I feel like this RFC does need to include that it’s a BC >> break by introducing new global functions. A quick search shows that >> SugarCRM[1] already implements str_begin and str_end functions and >> there’s likely to be other projects that do too. >> [1]: >> https://github.com/sugarcrm/sugarcrm_dev/blob/ae189cfa4ed4edd6a4e1e0d9d1d5ec66f46a0b74/include/utils.php#L2082-L2090 >> -- >> Simon Welsh > > You are correct, functions like strpos and strstr do follow (haystack, needle) but functions like str_replace follow the format (needle, haystack). Because I did these functions with the underscore, it made sense to make the functions follow the format found in other str_* functions. If the functions were changed to be strbegin, stribegin, strend, and striend, then it would make sense to follow the (haystack, needle) format. However, I think adding the underscore greatly improves the readability of these functions. And if the functions are named with an underscore, I think it should follow the format found in the other underscore functions.
str_replace and str_ireplace are the only str_* functions that don’t take the
full string (haystack) as the first argument. str_pad, str_repeat, str_split
and str_word_count all take the full string first even if there are other
compulsory arguments.

Also, these functions are replacements for current usage of
strpos/strrpos/substr_compare, so I feel like the argument ordering should
match those rather than another function that isn’t closely related in
functionality.

> Good call on the BC break, I had not thought about it breaking userland
> functions with the same name.
>
> -Will

-- 
Simon Welsh
  105976
June 18, 2019 18:45 will@wkhudgins.info
Hello all,

I submitted this RFC several years ago. I collected a lot of feedback 
and I have updated the RFC and corresponding github patch. Please see 
the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions and 
the github patch at https://github.com/php/php-src/pull/2049. I have 
addressed many concerns (order of arguments, name of functions, multibyte 
support, etc). I plan to move this RFC to a vote in the coming weeks.

Thanks,

Will
  105990
June 19, 2019 22:31 will@wkhudgins.info
I sent this earlier this week without [RFC] in the subject line...since 
some people might have filters to check the subject line I wanted to 
send this again with the proper substring in the subject line, to make it 
clear I intend to take this to a vote in two weeks. Apologies for the 
duplicate email.

-Will

  106030
June 22, 2019 15:32 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Jun 20, 2019 at 12:32 AM <will@wkhudgins.info> wrote:

> I sent this earlier this week without [RFC] in the subject line...since > some people might have filters to check the subject line I wanted to > send this again with the proper substring in the subject line–to make it > clear I intend to take this to a vote in two weeks. Apologies for the > duplicate email. > > -Will > > On 2019-06-18 14:45, will@wkhudgins.info wrote: > > Hello all, > > > > I submitted this RFC several years ago. I collected a lot of feedback > > and I have updated the RFC and corresponding github patch. Please see > > the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions > > and the github patch at https://github.com/php/php-src/pull/2049. I > > have addressed many concerns > > (order of arguments, name of functions, multibye support, etc). I plan > > to move this RFC to a vote in the coming weeks. > > > > Thanks, > > > > Will >
Unfortunately, this looks like a case where the RFC feedback has made the
proposal worse, rather than better :(

I think it's easier to start with what I think this proposal should be:
There should be just two functions, str_starts_with() and str_ends_with()
-- and that's it.

The important realization to have here is that these functions are a bit of
sugar for an operation that is quite common, but can also be easily
implemented with existing functions (using strcmp, strpos or substr,
depending on what you like). There is no need for us to cover every
conceivable combination, just make the common case more convenient and
easier to read.

With that in mind:

* I believe the "starts with" and "ends with" naming is a lot more
  canonical, used by Python, Ruby, Java, JavaScript and probably lots more.
* In my experience case-insensitive "i" variants of string functions are
  used much less, by an order of magnitude. With this being sugar in the
  first place, I don't think there's a need to cover case-insensitive
  variations (and from a quick look, these don't seem to be first class
  methods in other languages either). If we do want to have them, I'd suggest
  making the names str_starts_with_ci() and str_ends_with_ci(), which is more
  obvious and harder to miss than str_istarts_with() etc.
* Having mb_* variants of these functions doesn't really make sense. I
  realize that there's this knee-jerk reaction about how if it doesn't have
  "mb" in the name it's not Unicode compatible, but in this case it's even
  more wrong than usual. The normal str_starts_with() function is perfectly
  safe to use on UTF-8 strings, the only difference between it and
  mb_str_starts_with() is that it's going to be implemented a lot more
  efficiently. The only case that *might* make some sense is the
  case-insensitive variant here, because that has some genuine reliance on
  the character encoding. But then again, this can be handled by case-folding
  the strings first, something that mbstring is going to do internally anyway.

I would happily accept a proposal for str_starts_with() + str_ends_with(),
but I'm a lot more apprehensive about adding these 8 new functions.

Regards,
Nikita
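
As a rough illustration of the "sugar over existing functions" point, the two operations can be written in a few lines of userland PHP. The "sketch_" names are hypothetical, and this is not the implementation from the PR. The UTF-8 checks rely on the fact that UTF-8 is self-synchronizing, so a byte-wise prefix or suffix match between two valid UTF-8 strings always falls on a character boundary (assuming both use the same normalization form).

```php
<?php
// Hypothetical userland equivalents built from existing functions.
function sketch_starts_with(string $haystack, string $needle): bool
{
    // Byte-wise comparison of the first strlen($needle) bytes.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function sketch_ends_with(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    // Assumption: the empty needle always matches.
    return $len === 0
        || ($len <= strlen($haystack)
            && substr_compare($haystack, $needle, -$len) === 0);
}

// Multi-byte input needs no special handling for the case-sensitive check.
$utf8Start = sketch_starts_with('日本語テキスト', '日本');
$utf8End   = sketch_ends_with('日本語テキスト', 'スト');
```

Only the case-insensitive variants would need to know the encoding, which matches the distinction drawn in the message above.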
  106035
June 22, 2019 19:56 ben@benramsey.com (Ben Ramsey)
> On Jun 22, 2019, at 10:32, Nikita Popov ppv@gmail.com> wrote: > >> On Thu, Jun 20, 2019 at 12:32 AM <will@wkhudgins.info> wrote: >> >> I sent this earlier this week without [RFC] in the subject line...since >> some people might have filters to check the subject line I wanted to >> send this again with the proper substring in the subject line–to make it >> clear I intend to take this to a vote in two weeks. Apologies for the >> duplicate email. >> >> -Will >> >>> On 2019-06-18 14:45, will@wkhudgins.info wrote: >>> Hello all, >>> >>> I submitted this RFC several years ago. I collected a lot of feedback >>> and I have updated the RFC and corresponding github patch. Please see >>> the RFC at https://wiki.php.net/rfc/add_str_begin_and_end_functions >>> and the github patch at https://github.com/php/php-src/pull/2049. I >>> have addressed many concerns >>> (order of arguments, name of functions, multibye support, etc). I plan >>> to move this RFC to a vote in the coming weeks. >>> >>> Thanks, >>> >>> Will >> > > Unfortunately, this looks like a case where the RFC feedback has made the > proposal worse, rather than better :( > > I think it's easier to start with what I think this proposal should be: > There should be just two functions, str_starts_with() and str_ends_with() > -- and that's it. > > The important realization to have here is that these functions are a bit of > sugar for an operation that is quite common, but can also be easily > implemented with existing functions (using strcmp, strpos or substr, > depending on what you like). There is no need for us to cover every > conceivable combination, just make the common case more convenient and > easier to read. > > With that in mind: > * I believe the "starts with" and "ends with" naming is a lot more > canonical, used by Python, Ruby, Java, JavaScript and probably lots more. > * In my experience case-insensitive "i" variants of strings functions are > used much less, by an order of magnitude. 
With this being sugar in the > first place, I don't think there's a need to cover case-insensitive > variations (and from a quick look, these don't seem to be first class > methods in other languages either). If we do want to have them, I'd suggest > making the names str_starts_with_ci() and str_ends_with_ci(), which is more > obvious and harder to miss than str_istarts_with() etc. > * Having mb_* variants of these functions doesn't really make sense. I > realize that there's this knee-jerk reaction about how if it doesn't have > "mb" in the name it's not Unicode compatible, but in this case it's even > more wrong than usual. The normal str_starts_with() function is perfectly > safe to use on UTF-8 strings, the only difference between it and > mb_str_starts_with() is that it's going to be implemented a lot more > efficiently. The only case that *might* make some sense is the > case-insensitive variant here, because that has some genuine reliance on > the character encoding. But then again, this can be handled by case-folding > the strings first, something that mbstring is going to do internally anyway. > > I would happily accept a proposal for str_starts_with() + str_ends_with(), > but I'm a lot more apprehensive about adding these 8 new functions. > > Regards, > Nikita
I like the idea of simplifying this to the two functions str_starts_with() and str_ends_with().

When I was looking through this the other day, I had trouble coming up with an example of a string where the mb_* versions would ever generate a different result from the non-multibyte versions, since the implementation only needs to count and compare bytes. Perhaps it would only be an issue with the case-insensitive versions, as Nikita points out? If so, can someone provide some example strings where an mb_starts_with_ci() would return true, while str_starts_with_ci() would return false?

I think the case-insensitive versions would be common enough in use cases (e.g. looking to see if a path ends with .CSV vs. .csv, etc.), but maybe the signatures could be revised to pass a third parameter?

str_starts_with($haystack, $needle, $case_sensitive = true): bool

-Ben
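As an illustration of the point that these checks are sugar over existing functions, here is a minimal sketch built on strncmp() and substr_compare(). The names sketch_starts_with() and sketch_ends_with() are hypothetical, chosen so they cannot clash with any built-in, and treating an empty needle as a match is an assumption of this sketch, not something the thread settles. The comparison is byte-wise, which is why it also behaves sensibly on valid UTF-8 input.

```php
<?php
// Hypothetical helper (not from the RFC): byte-wise, case-sensitive prefix test.
function sketch_starts_with(string $haystack, string $needle): bool
{
    // Compare only the first strlen($needle) bytes of both strings.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

// Hypothetical helper (not from the RFC): byte-wise, case-sensitive suffix test.
function sketch_ends_with(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true; // assumption: the empty string is a suffix of every string
    }
    if ($len > strlen($haystack)) {
        return false; // a needle longer than the haystack cannot match
    }
    // A negative offset makes substr_compare() start $len bytes from the end.
    return substr_compare($haystack, $needle, -$len) === 0;
}

var_dump(sketch_starts_with('İyi akşamlar', 'İyi'));  // multibyte prefix, same bytes
var_dump(sketch_ends_with('report.csv', '.csv'));
var_dump(sketch_ends_with('report.CSV', '.csv'));     // differs only in case
```

The last call shows the gap Ben's .CSV example points at: a byte-wise check is case-sensitive, so any case-insensitive behaviour has to be layered on top.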
  106041
June 23, 2019 10:35 rowan.collins@gmail.com (Rowan Collins)
On 22 June 2019 20:56:24 BST, Ben Ramsey <ben@benramsey.com> wrote:
>Perhaps it would only be an issue with the case-insensitive versions, >as Nikita points out? If so, can someone provide some example strings >where an mb_starts_with_ci() would return true, while >str_starts_with_ci() would return false?
That's easy: any character that has a lower- and uppercase form, and is not represented as one byte in the target encoding. For that matter, any such character in the non-ASCII section of a single-byte encoding, since a non-mbstring case insensitive flag would presumably leave everything other than ASCII letters untouched. So, any non-Latin script, like Greek or Cyrillic; any accented characters, unless you're lucky and they're represented by ASCII-letter plus combining modifier; the Turkish "i", which if I remember rightly has three forms not two; and so on. Regards, -- Rowan Collins [IMSoP]
  106042
June 23, 2019 15:29 ben@benramsey.com (Ben Ramsey)
> On Jun 23, 2019, at 05:35, Rowan Collins <rowan.collins@gmail.com> wrote: > > On 22 June 2019 20:56:24 BST, Ben Ramsey <ben@benramsey.com> wrote: >> Perhaps it would only be an issue with the case-insensitive versions, >> as Nikita points out? If so, can someone provide some example strings >> where an mb_starts_with_ci() would return true, while >> str_starts_with_ci() would return false? > > > That's easy: any character that has a lower- and uppercase form, and is not represented as one byte in the target encoding. For that matter, any such character in the non-ASCII section of a single-byte encoding, since a non-mbstring case insensitive flag would presumably leave everything other than ASCII letters untouched. > > So, any non-Latin script, like Greek or Cyrillic; any accented characters, unless you're lucky and they're represented by ASCII-letter plus combining modifier; the Turkish "i", which if I remember rightly has three forms not two; and so on.
According to Google, “İyi akşamlar” is the Turkish phrase for “Good evening” (Turkish speakers, please correct me if this is wrong). However, using the existing mb_* functions, I can’t get mb_stripos() to return 0 when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”

I’m just using UTF-8, so maybe there’s an encoding issue here?

    $string = 'İyi akşamlar';
    $upper = mb_strtoupper($string);
    $lowerChars = mb_strtolower(mb_substr($string, 0, 3));

    var_dump($string, $upper, $lowerChars);
    var_dump(mb_stripos($upper, $lowerChars));
  106043
June 23, 2019 15:46 nikita.ppv@gmail.com (Nikita Popov)
On Sun, Jun 23, 2019 at 5:30 PM Ben Ramsey <ben@benramsey.com> wrote:

> According to Google, "İyi akşamlar” is the Turkish phrase for “Good > evening” (Turkish speakers, please correct me, if this wrong). However, > using the existing mb_* functions, I can’t get mb_stripos() to return 0 > when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.” > > I’m just using UTF-8, so maybe there’s an encoding issue here? > > $string = 'İyi akşamlar'; > $upper = mb_strtoupper($string); > $lowerChars = mb_strtolower(mb_substr($string, 0, 3)); > > var_dump($string, $upper, $lowerChars); > var_dump(mb_stripos($upper, $lowerChars)); >
The reason why this doesn't work is that mb_stripos internally performs a simple case fold, while a full case fold would be needed in this case (Turkish i is hard). It's a bit tricky due to the need to remap character offsets. Nikita
  106048
June 24, 2019 09:28 nikita.ppv@gmail.com (Nikita Popov)
On Sun, Jun 23, 2019 at 5:46 PM Nikita Popov <nikita.ppv@gmail.com> wrote:

> The reason why this doesn't work is that mb_stripos internally performs a > simple case fold, while a full case fold would be needed in this case > (Turkish i is hard). It's a bit tricky due to the need to remap character > offsets. >
I've implemented use of full case folding in https://github.com/php/php-src/pull/4303. While doing that I kind of convinced myself that we probably shouldn't actually do this, because it breaks simple mb_stripos loops in a subtle way. It probably makes more sense for people to explicitly call mb_convert_case($string, MB_CASE_FOLD) and then operate on the resulting strings. That is both much more efficient and avoids offset remapping issues.

Nikita
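The explicit case-folding approach recommended above might look roughly like the following sketch. It assumes ext/mbstring is loaded, PHP 7.3 or later (where the MB_CASE_FOLD constant is available), and UTF-8 input; the helper name sketch_starts_with_folded() is hypothetical and not part of the RFC or the pull request.

```php
<?php
// Hypothetical helper: case-insensitive prefix test done by folding both
// strings first, then comparing the folded bytes. Assumes UTF-8 input.
function sketch_starts_with_folded(string $haystack, string $needle): bool
{
    // Full case folding may change the length of a string (for example,
    // a single character can fold to two), so fold BOTH sides before
    // measuring or comparing anything.
    $foldedHaystack = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');
    $foldedNeedle   = mb_convert_case($needle, MB_CASE_FOLD, 'UTF-8');

    return strncmp($foldedHaystack, $foldedNeedle, strlen($foldedNeedle)) === 0;
}

var_dump(sketch_starts_with_folded('STRASSE 12', 'straße'));
var_dump(sketch_starts_with_folded('Hello World', 'HELLO'));
var_dump(sketch_starts_with_folded('Hello World', 'world'));
```

Because the comparison runs on the folded copies, no offsets into the original strings are produced, which sidesteps the offset remapping problem described above. The trade-off is that a boolean is all you get back; a position in the folded string would not necessarily be valid in the original.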
  106049
June 24, 2019 22:13 php.lists@allenjb.me.uk (AllenJB)
On 24/06/2019 10:28, Nikita Popov wrote:
> On Sun, Jun 23, 2019 at 5:46 PM Nikita Popov <nikita.ppv@gmail.com> wrote: > >> On Sun, Jun 23, 2019 at 5:30 PM Ben Ramsey <ben@benramsey.com> wrote: >> >> The reason why this doesn't work is that mb_stripos internally >> performs a >> simple case fold, while a full case fold would be needed in this case >> (Turkish i is hard). It's a bit tricky due to the need to remap character >> offsets. >> > I've implemented use of full case folding in > https://github.com/php/php-src/pull/4303. While doing that I kind of > convinced myself that we probably shouldn't actually do this, because it > breaks simple mb_stripos loops in a subtle way. It probably makes more > sense for people to explicitly call mb_convert_case($string, MB_CASE_FOLD) > and then operate on the resulting strings. Both much more efficient, and > avoids offset remapping issues. > > Nikita
If these functions (mb_stripos and any others affected by the same issue) are not the recommended way, and may not act as users expect, should they be deprecated? Or at least notes added to the manual pages regarding this behavior / the differences between the different methods? AllenJB
  106044
June 23, 2019 15:48 rowan.collins@gmail.com (Rowan Collins)
On 23/06/2019 16:29, Ben Ramsey wrote:
> According to Google, "İyi akşamlar” is the Turkish phrase for “Good evening” (Turkish speakers, please correct me, if this wrong). However, using the existing mb_* functions, I can’t get mb_stripos() to return 0 when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”
Probably mbstring is not using the right case-folding routines; as mentioned in another thread, ext/mbstring wasn't written for Unicode, but for older multibyte encodings, particularly those used for Japanese text.

grapheme_stripos (from ext/intl) apparently gets it right as of PHP 7.3: https://3v4l.org/0431j

A much simpler example, though, is using just the second word of that string: the accented "s" confuses plain stripos but not mb_stripos.

Regards,

-- 
Rowan Collins
[IMSoP]
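The simpler example can be sketched as follows, assuming ext/mbstring is loaded and the source file is saved as UTF-8. The variable names are only illustrative.

```php
<?php
// The second word of the phrase, in upper and lower case.
$haystack = 'AKŞAMLAR';
$needle   = 'akşamlar';

// stripos() works on bytes: the two-byte sequences for "Ş" and "ş"
// differ, so no match is found.
$bytewise = stripos($haystack, $needle);

// mb_stripos() decodes the strings as UTF-8 and maps "Ş" to "ş"
// before comparing, so the needle is found at character offset 0.
$multibyte = mb_stripos($haystack, $needle, 0, 'UTF-8');

var_dump($bytewise);   // bool(false)
var_dump($multibyte);  // int(0)
```

This is the kind of input where a byte-wise case-insensitive variant and an mb_* variant would disagree, which answers the question Ben raised earlier in the thread.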
  106095
June 28, 2019 20:54 will@wkhudgins.info
These are good points. Originally my RFC called for fewer functions but 
based on feedback I added the others. My proposal: take the RFC as-is to 
a vote. If it fails, I will raise another RFC for a vote that will just 
contain the two basic functions: str_begins and str_ends.

Thanks,

Will

  106101
June 28, 2019 22:07 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Jun 28, 2019 at 10:54 PM <will@wkhudgins.info> wrote:

> These are good points. Originally my RFC called for less functions but > based on feedback I added the others. My proposal: take the RFC as-is to > a vote. If it fails, I will raise another RFC for a vote that will just > contain the two basic functions: str_begins and str_ends. >
To put my comments into more actionable form, here is what I would recommend for this RFC:

* Rename str_begins -> str_starts_with, str_ends -> str_ends_with, str_ibegins -> str_starts_with_ci, str_iends -> str_ends_with_ci. As mentioned before, this is standard terminology used by many, many programming languages and it would be great if PHP did not deviate from convention without strong reason.
* Have a separate vote (in the same RFC) for the addition of the corresponding mb_* variants.

I believe doing those two changes will ensure that the core part of the RFC passes. I personally would be voting yes on the first part and no on the second, but others may decide as they see fit.

Nikita
  106109
June 29, 2019 15:41 me@jhdxr.com (CHU Zhaowei)
Agreed. I'm wondering why the author chose to use begin(s)/end(s) while almost all other popular languages have clearer naming, e.g. starts_with or has_prefix.

In addition, as someone else pointed out two years ago, userland may already have functions with the same name, and this should be considered as a potential BC break, which is not reflected in the RFC yet.

Regards,
CHU Zhaowei
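To illustrate the BC concern: a userland declaration of a function whose name later becomes a built-in fails with a "Cannot redeclare" fatal error unless it is guarded. A minimal sketch of the usual guard follows; the function body is a hypothetical userland implementation, not the RFC's, and str_starts_with is used here only as an example name.

```php
<?php
// Declare the userland version only if no function of that name exists,
// so the file keeps loading on PHP versions that ship a built-in.
if (!function_exists('str_starts_with')) {
    // Hypothetical userland implementation: byte-wise prefix comparison.
    function str_starts_with(string $haystack, string $needle): bool
    {
        return strncmp($haystack, $needle, strlen($needle)) === 0;
    }
}

var_dump(str_starts_with('foobar', 'foo'));
```

The guard avoids the fatal error, but it does not help if the userland function has different argument order or semantics from the built-in, which is the harder part of the compatibility question.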

  106110
June 29, 2019 15:53 will@wkhudgins.info
Nikita: I like the idea of splitting the mb_* versions from the main 
vote...I'll have to see how to do that in the DokuWiki GUI but I like 
the idea!

CHU: I will add a note that some userland functions may not be 
compatible with this change, although I don't think that should be a 
showstopper; voters can decide as they see fit.

How do people tend to feel about the "str_startswith" vs 
"str_starts_with" naming convention? I've seen people propose both.

Thanks,

Will

  106114
June 29, 2019 20:36 d.takken@xs4all.nl (Dik Takken)
On 29-06-19 17:53, will@wkhudgins.info wrote:
> How do people tend to feel about the "str_startswith" vs
> "str_starts_with" naming convention? I've seen people propose both.

For best readability one should write '_' between separate words, just like a space would be used in regular text, IMHO. The PHP standard library already has a number of functions that are named that way, like str_word_count(). So, I would favor str_starts_with().

Regards,
Dik Takken
  106142
July 4, 2019 16:43 will@wkhudgins.info
I have updated the RFC here 
https://wiki.php.net/rfc/add_str_begin_and_end_functions to reflect 
changes from the mailing list discussions. I will promptly open voting 
on this RFC.

-Will
  106144
July 4, 2019 22:18 bjorn.x.larsson@telia.com (Björn Larsson)
On 2019-07-04 at 18:43, will@wkhudgins.info wrote:
> I have updated the RFC here
> https://wiki.php.net/rfc/add_str_begin_and_end_functions to reflect
> changes from the mailing list discussions. I will promptly open voting
> on this RFC.
>
> -Will

Hi,

I think it would be good to include references to the JavaScript & Python functions that were referenced earlier, since they were a driver for the name change.

r//Björn L
  106148
July 5, 2019 02:13 will@wkhudgins.info
Hello all,

After 15 days of discussion I have opened up voting on the following RFC 
(https://wiki.php.net/rfc/add_str_begin_and_end_functions) .

You can access the voting page here: 
https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote

I have never set up a vote on doku-wiki so please let me know if I made 
the vote incorrectly!

Thanks,

Will
  106149
July 5, 2019 02:16 will@wkhudgins.info
Following up on this, I plan to leave voting open for a full 15 days, 
until July 20, 2019 Anywhere-on-Earth (AOE) time. If there are issues 
with this time, let me know.

Thanks,

Will

On 2019-07-04 22:13, will@wkhudgins.info wrote:
> Hello all,
>
> After 15 days of discussion I have opened up voting on the following
> RFC (https://wiki.php.net/rfc/add_str_begin_and_end_functions) .
>
> You can access the voting page here:
> https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote
>
> I have never set up a vote on doku-wiki so please let me know if I
> made the vote incorrectly!
>
> Thanks,
>
> Will
  106150
July 5, 2019 04:16 theodorejb@outlook.com (Theodore Brown)
On Thu, July 4, 2019 at 9:13 PM Will <will@wkhudgins.info> wrote:

> After 15 days of discussion I have opened up voting on the following RFC
> (https://wiki.php.net/rfc/add_str_begin_and_end_functions) .
>
> You can access the voting page here:
> https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote
>
> I have never set up a vote on doku-wiki so please let me know if I made
> the vote incorrectly!

It seems really unusual for voting to be on a separate page from the RFC. Can you move the doodle voting macro to a "Vote" section on the main RFC page?

Thanks,
Theodore
  106151
July 5, 2019 08:04 petercowburn@gmail.com (Peter Cowburn)
On Fri, 5 Jul 2019 at 05:17, Theodore Brown <theodorejb@outlook.com> wrote:

> On Thu, July 4, 2019 at 9:13 PM Will <will@wkhudgins.info> wrote:
>
> > After 15 days of discussion I have opened up voting on the following RFC
> > (https://wiki.php.net/rfc/add_str_begin_and_end_functions) .
> >
> > You can access the voting page here:
> > https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote
> >
> > I have never set up a vote on doku-wiki so please let me know if I made
> > the vote incorrectly!
>
> It seems really unusual for voting to be on a separate page than the
> RFC. Can you move the doodle voting macro to a "Vote" section on the
> main RFC page?

Further to this, please follow the instructions at https://wiki.php.net/rfc/howto. It has simple-to-follow steps detailing exactly what to do.

Also, this RFC is *still* showing as "inactive" on the RFC list (https://wiki.php.net/rfc). Anyone watching that page won't even know it was back under discussion, never mind in voting.

> Thanks,
> Theodore
  106152
July 5, 2019 08:37 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Jul 5, 2019 at 6:17 AM Theodore Brown <theodorejb@outlook.com>
wrote:

> On Thu, July 4, 2019 at 9:13 PM Will <will@wkhudgins.info> wrote:
>
> > After 15 days of discussion I have opened up voting on the following RFC
> > (https://wiki.php.net/rfc/add_str_begin_and_end_functions) .
> >
> > You can access the voting page here:
> > https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote
> >
> > I have never set up a vote on doku-wiki so please let me know if I made
> > the vote incorrectly!
>
> It seems really unusual for voting to be on a separate page than the
> RFC. Can you move the doodle voting macro to a "Vote" section on the
> main RFC page?
>
> Thanks,
> Theodore

I've taken the liberty of moving the voting widgets onto the main RFC page. The new voting link is: https://wiki.php.net/rfc/add_str_begin_and_end_functions#vote

I've also moved the RFC into the voting section on the RFC overview page.

Nikita
  106159
July 7, 2019 20:45 theodorejb@outlook.com (Theodore Brown)
On Thu, July 4, 2019 at 9:13 PM Will <will@wkhudgins.info> wrote:

> Hello all,
>
> After 15 days of discussion I have opened up voting on the following
> RFC (https://wiki.php.net/rfc/add_str_begin_and_end_functions).

Thank you for your work on this. I'm surprised that so far the vote is so controversial, with 8 votes in favor and 8 opposed.

For those voting against adding these functions, can you clarify why? Do you dislike how they are named, or do you not see the need for the case insensitive versions, or is there an issue with the implementation?

Personally I'd find the basic `str_starts_with` and `str_ends_with` functions very valuable. Currently I either have to implement functions like this myself in almost every script, or else write repetitious code like the following:

```php
$needle = "foobar";

if (substr($haystack, 0, strlen($needle)) === $needle) {
    // starts with "foobar"
}
```

To avoid repetition, many developers use the following pattern instead:

```php
if (strpos($haystack, "foobar") === 0) {
    // starts with "foobar"
}
```

However, with longer strings this becomes far less efficient, since PHP has to search through the entire haystack to find the needle position.

If this RFC is accepted, these awkward and inefficient approaches could be replaced with straightforward and fast code like this:

```php
if (str_starts_with($haystack, "foobar")) {
    // ...
}
```

Please vote on the RFC if you haven't already. Clarification would be appreciated if you don't feel that these functions would be a good addition.

Best regards,
Theodore
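For illustration only, the kind of userland helper described above might look like the following minimal sketch. The names `starts_with_sketch` and `ends_with_sketch` are hypothetical and are not the functions proposed in the RFC; treating an empty needle as always matching is also an assumption of this sketch.

```php
<?php
// Hypothetical userland helpers; the names are illustrative only and are
// not the functions proposed in the RFC.
function starts_with_sketch(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes from the start,
    // so the remainder of the haystack is never scanned.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

function ends_with_sketch(string $haystack, string $needle): bool
{
    $length = strlen($needle);

    // Assumption: an empty needle is a suffix of every string. The guard
    // is needed because substr($haystack, -0) returns the whole haystack.
    return $length === 0 || substr($haystack, -$length) === $needle;
}

var_dump(starts_with_sketch('foobarbaz', 'foobar')); // bool(true)
var_dump(starts_with_sketch('foo', 'foobar'));       // bool(false)
var_dump(ends_with_sketch('report.csv', '.csv'));    // bool(true)
```

Unlike the `strpos(...) === 0` pattern, neither helper looks at more of the haystack than the length of the needle.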
  106174
July 8, 2019 14:10 pollita@php.net (Sara Golemon)
On Sun, Jul 7, 2019 at 3:45 PM Theodore Brown <theodorejb@outlook.com>
wrote:
> For those voting against adding these functions, can you clarify why?

Explaining my non-vote. I'm explicitly abstaining as I don't see the value in these functions (I'd rather see a community driven library which does the same thing in a more agile way), but neither do I see much intrinsic harm in allowing these functions in.

I did vote against the mb* variants as I'd like to see those die in favor of ext/intl in all ways and every way.

-Sara
  106178
July 8, 2019 15:13 theodorejb@outlook.com (Theodore Brown)
On Mon, July 8, 2019 at 9:10 AM Sara Golemon <pollita@php.net> wrote:

> On Sun, Jul 7, 2019 at 3:45 PM Theodore Brown <theodorejb@outlook.com> wrote:
>
> > For those voting against adding these functions, can you clarify why?
>
> Explaining my non-vote. I'm explicitly abstaining as I don't see the
> value in these functions (I'd rather see a community driven library
> which does the same thing in a more agile way), but neither do I see
> much intrinsic harm in allowing these functions in.

Thanks Sara. I understand your perspective of not wanting to add more functions to PHP core which can be easily implemented in userland.

However, when it comes to basic string functions which are needed in almost every script, I don't think it makes sense to ask users to depend on an extra library for this. Almost every other language has built-in functions for simply checking if a string starts or ends with another string.

Best regards,
Theodore
  106181
July 8, 2019 18:09 bjorn.x.larsson@telia.com (Björn Larsson)
On 2019-07-07 at 22:45, Theodore Brown wrote:

> On Thu, July 4, 2019 at 9:13 PM Will <will@wkhudgins.info> wrote:
>
>> Hello all,
>>
>> After 15 days of discussion I have opened up voting on the following
>> RFC (https://wiki.php.net/rfc/add_str_begin_and_end_functions).
>
> Thank you for your work on this. I'm surprised that so far the vote
> is so controversial, with 8 votes in favor and 8 opposed.
>
> For those voting against adding these functions, can you clarify why?
> Do you dislike how they are named, or do you not see the need for the
> case insensitive versions, or is there an issue with the implementation?
>
> Personally I'd find the basic `str_starts_with` and `str_ends_with`
> functions very valuable. Currently I either have to implement functions
> like this myself in almost every script, or else write repetitious
> code like the following:
>
> ```php
> $needle = "foobar";
>
> if (substr($haystack, 0, strlen($needle)) === $needle) {
>     // starts with "foobar"
> }
> ```
>
> To avoid repetition, many developers use the following pattern instead:
>
> ```php
> if (strpos($haystack, "foobar") === 0) {
>     // starts with "foobar"
> }
> ```
>
> However, with longer strings this becomes far less efficient, since PHP
> has to search through the entire haystack to find the needle position.
>
> If this RFC is accepted, these awkward and inefficient approaches
> could be replaced with straightforward and fast code like this:
>
> ```php
> if (str_starts_with($haystack, "foobar")) {
>     // ...
> }
> ```
>
> Please vote on the RFC if you haven't already. Clarification would be
> appreciated if don't feel that these functions would be a good addition.
>
> Best regards,
> Theodore

Hi,

Having this _ci postfix is a new way of indicating case insensitivity. I think that it might add to negative votes. Personally I think it's a good idea to mimic existing ways, even if they are a bit awkward.

How about using a flag or following "tradition", like stri_starts_with & stri_ends_with or str_istarts_with & str_iends_with? That would follow strstr / stristr and str_replace / str_ireplace.

I have no voting rights though.

r//Björn L
  106187
July 8, 2019 21:52 ben@benramsey.com (Ben Ramsey)
> On Jul 8, 2019, at 13:09, Björn Larsson <bjorn.x.larsson@telia.com> wrote:
>
> Having this _ci postfix is a new way of indicating case insensitivity.
> I think that it might add to negative votes. Personally I think it's a
> good idea to mimic existing ways, even if they are a bit awkward.
>
> How about using a flag or following "tradition", like stri_starts_with
> & stri_ends_with or str_istarts_with & str_iends_with? That would
> follow strstr / stristr and str_replace / str_ireplace.
>
> I have no voting rights though.

I made this recommendation earlier in the other thread (<https://externals.io/message/94787#106035>), but it didn’t get any traction or response:

> maybe the signatures could be revised to pass a third parameter?
>
> str_starts_with($haystack, $needle, $case_sensitive = true): bool

Since voting has already begun, is this something that could still be considered?

-Ben
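To make the third-parameter idea concrete, here is a minimal userland sketch of what such a signature could look like, applied to the ".CSV vs. .csv" suffix case. The function name `ends_with_flag_sketch` is hypothetical, and the sketch assumes byte-wise comparison with ASCII-only case folding (which is what `substr_compare()` offers); it is not the RFC's implementation.

```php
<?php
// Hypothetical helper illustrating a $case_sensitive flag; not the
// signature or implementation voted on in the RFC.
function ends_with_flag_sketch(
    string $haystack,
    string $needle,
    bool $case_sensitive = true
): bool {
    $length = strlen($needle);

    if ($length === 0) {
        return true;  // assumption: the empty needle always matches
    }
    if ($length > strlen($haystack)) {
        return false; // the needle cannot fit, so skip substr_compare()
    }

    // substr_compare() takes a "case insensitive" flag as its fifth
    // argument, so the boolean is inverted here.
    return substr_compare($haystack, $needle, -$length, $length, !$case_sensitive) === 0;
}

var_dump(ends_with_flag_sketch('export.CSV', '.csv'));        // bool(false)
var_dump(ends_with_flag_sketch('export.CSV', '.csv', false)); // bool(true)
```

A flag keeps the function count down, at the cost of call sites such as `f($a, $b, false)` whose meaning is not obvious without looking up the signature.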
  106188
July 9, 2019 01:25 will@wkhudgins.info
On 2019-07-08 17:52, Ben Ramsey wrote:
>> On Jul 8, 2019, at 13:09, Björn Larsson <bjorn.x.larsson@telia.com>
>> wrote:
>>
>> Having this _ci postfix is a new way of indicating case
>> insensitivity.
>> I think that it might add to negative votes. Personally I think it's a
>> good idea to mimic existing ways, even if they are a bit awkward.
>>
>> How about using a flag or following "tradition", like stri_starts_with
>> & stri_ends_with or str_istarts_with & str_iends_with? That would
>> follow strstr / stristr and str_replace / str_ireplace.
>>
>> I have no voting rights though.
>
> I made this recommendation earlier in the other thread
> (<https://externals.io/message/94787#106035>), but it didn’t get any
> traction or response:
>
>> maybe the signatures could be revised to pass a third parameter?
>>
>> str_starts_with($haystack, $needle, $case_sensitive = true): bool
>
> Since voting has already begun, is this something that could still be
> considered?
>
> -Ben

Thanks for the interest everyone! I've been following the email thread and have a few thoughts.

- At one point I had it set to take case sensitivity as a parameter (https://github.com/php/php-src/pull/2049/commits/f89d8edc5f32d8a4b702699209e72d864e2ca440). That isn't a bad idea IMO. I changed it to have split functions to match str_ireplace, stripos, etc.
- I agree the *_ci naming convention is different than most of the existing codebase, but a lot of discussion during the process led to the *_ci naming convention. And while the *_ci naming convention isn't traditional, it does seem more intuitive. The i is easier to read in something short like "str_ireplace" but kind of gets lost in something long like "str_istarts_with".
- I'd considered splitting the vote into 3 parts: 1) str_starts_with and str_ends_with 2) str_starts_with_ci and str_ends_with_ci 3) The mb_* functions. But I decided against that as I felt that might be overly splitting up the proposal.

If the main issue is naming and not functionality, I am happy to rework the RFC (if it fails) to be more palatable. I primarily would like to add this functionality to PHP, regardless of the naming.

In my opinion one of the nice things about PHP is that it comes with so many things under the hood. As a user of the language that is something I appreciate. A lot of powerful functionality is baked into the language and that functionality is available on almost every web host. A language like Python or Java just can't compare in that respect. Even NodeJS requires an extensive amount of packages to accomplish even simple tasks. PHP is nice because it ships with "batteries" included. Sure, that brings some issues along with it, but that is as much a strength of the language as it is a challenge. Adding a common task like starts_with and ends_with seems like a reasonable thing to do.

Thanks,
Will
  106191
July 9, 2019 07:40 phpmailinglists@gmail.com (Peter Bowyer)
On Mon, 8 Jul 2019 at 19:09, Björn Larsson <bjorn.x.larsson@telia.com>
wrote:

> Having this _ci postfix is a new way of indicating case insensitivity.
> I think that it might add to negative votes. Personally I think it's a
> good idea to mimic existing ways, even if they are a bit awkward.
>
> How about using a flag or following "tradition", like stri_starts_with
> & stri_ends_with or str_istarts_with & str_iends_with? That would
> follow strstr / stristr and str_replace / str_ireplace.

I would vote yes with that naming. It's a damn silly tradition, but it's what PHP uses for other functions, and keeping consistency is better than improving individual functions.

Peter
  106192
July 9, 2019 08:55 claude.pache@gmail.com (Claude Pache)
> On 9 Jul 2019, at 09:40, Peter Bowyer <phpmailinglists@gmail.com> wrote:
>
> On Mon, 8 Jul 2019 at 19:09, Björn Larsson <bjorn.x.larsson@telia.com>
> wrote:
>
>> Having this _ci postfix is a new way of indicating case insensitivity.
>> I think that it might add to negative votes. Personally I think it's a
>> good idea to mimic existing ways, even if they are a bit awkward.
>>
>> How about using a flag or following "tradition", like stri_starts_with
>> & stri_ends_with or str_istarts_with & str_iends_with? That would
>> follow strstr / stristr and str_replace / str_ireplace.
>
> I would vote yes with that naming. It's a damn silly tradition, but it's
> what PHP uses for other functions, and keeping consistency is better than
> improving individual functions.
>
> Peter

There are currently (at least) two ways of marking case insensitivity in the name: the character “i” as in stripos() and the substring “case” as in strcasecmp(). Adding a third way, namely the “ci” suffix (or, even worse, a flag) is absolutely in the silly tradition of inconsistent naming of PHP functions (although admittedly not one we should strive to maintain)... except that “ci” is maybe more meaningful than “i” and “case”.

-Claude
  106244
July 22, 2019 08:53 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Jul 5, 2019 at 4:13 AM <will@wkhudgins.info> wrote:

> Hello all,
>
> After 15 days of discussion I have opened up voting on the following RFC
> (https://wiki.php.net/rfc/add_str_begin_and_end_functions) .
>
> You can access the voting page here:
> https://wiki.php.net/rfc/add_str_begin_and_end_functions/vote
>
> I have never set up a vote on doku-wiki so please let me know if I made
> the vote incorrectly!
>
> Thanks,
>
> Will

As we're already two days past the announced end, I've closed the RFC vote. The final outcome is 26 in favor vs 20 against for str_starts_with and friends, and 4 in favor to 36 against for mb_starts_with and friends. Because a 2/3 majority is required, both parts of the proposal are declined.

Based on the discussion during voting, I think that trying this again with just str_starts_with+str_ends_with without the case-insensitive variants might pass, as that's where the main controversy seems to be -- though some people also expressed the view that these functions are too trivial to add to the standard library.

In any case, thanks for driving this through the RFC process!

Nikita
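For readers who want to check the arithmetic behind the outcome, here is a small sketch. It assumes the threshold is "at least two thirds of the yes/no votes cast", with abstentions not counted; the helper name `passes_two_thirds` is made up for this example.

```php
<?php
// Hypothetical helper: does a yes/no tally reach a 2/3 majority?
// Assumption: the threshold is yes / (yes + no) >= 2/3.
function passes_two_thirds(int $yes, int $no): bool
{
    // Integer form of yes / (yes + no) >= 2/3, avoiding float rounding.
    return 3 * $yes >= 2 * ($yes + $no);
}

// str_starts_with and friends: 26 of 46 votes, roughly 56.5% in favor.
var_dump(passes_two_thirds(26, 20)); // bool(false)

// mb_starts_with and friends: 4 of 40 votes, 10% in favor.
var_dump(passes_two_thirds(4, 36));  // bool(false)

// With the same 46 voters, 31 yes votes would have been the minimum.
var_dump(passes_two_thirds(31, 15)); // bool(true)
```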
  106037
June 22, 2019 20:27 weirdan@gmail.com (Bruce Weirdan)
On Sat, Jun 22, 2019 at 6:32 PM Nikita Popov <nikita.ppv@gmail.com> wrote:
> The normal str_starts_with() function is perfectly safe to use on UTF-8 strings,

Only if you assume strings to be normalized to the same form. Checking if an NFC string starts with an NFD substring by checking them bit by bit is going to yield false negatives [1].

[1] https://3v4l.org/4HgUL

--
Best regards,
Bruce Weirdan
mailto:weirdan@gmail.com
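A minimal sketch of the false negative described above, written independently of the linked 3v4l snippet (whose contents are not reproduced here). It uses only core string functions and a byte-wise prefix check via `strncmp()`; the sample strings are made up for illustration.

```php
<?php
// "é" written two ways: both render identically, but the bytes differ.
$nfc = "\u{00E9}cole"; // precomposed U+00E9 (bytes C3 A9), then "cole"
$nfd = "e\u{0301}";    // "e" plus combining acute U+0301 (bytes 65 CC 81)

// Byte-wise prefix check, as a plain starts-with helper would do it.
$startsWith = static function (string $haystack, string $needle): bool {
    return strncmp($haystack, $needle, strlen($needle)) === 0;
};

var_dump($startsWith($nfc, $nfd));        // bool(false): forms differ
var_dump($startsWith($nfc, "\u{00E9}"));  // bool(true): same form
```

Bringing both strings to the same normalization form first, for example with `Normalizer::normalize()` from ext/intl, would be one way to avoid the mismatch; that step is left out of the sketch because it depends on the intl extension being installed.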
  106038
June 22, 2019 20:43 nikita.ppv@gmail.com (Nikita Popov)
On Sat, Jun 22, 2019 at 10:27 PM Bruce Weirdan <weirdan@gmail.com> wrote:

> On Sat, Jun 22, 2019 at 6:32 PM Nikita Popov <nikita.ppv@gmail.com> wrote:
> >
> > The normal str_starts_with() function is perfectly safe to use on UTF-8
> > strings,
>
> Only if you assume strings to be normalized to the same form. Checking if
> NFC string starts with NFD substring by checking them bit by bit is going
> to yield false negatives [1]
>
> [1] https://3v4l.org/4HgUL

That's correct, but not really relevant in the context of the discussion, as mbstring does not perform Unicode normalization, so mb_* functions wouldn't change anything about this. (Not that basic string operations should be performing implicit Unicode normalization...)

Nikita