Proposal for a new basic function: str_contains

  108562
February 14, 2020 09:17 philipp.tanlak@gmail.com (Philipp Tanlak)
Hello PHP Devs,

I would like to propose the new basic function: str_contains.

The goal of this proposal is to standardize on a function to check whether
or not a string is contained in another string, which is a very common
use case in almost every PHP project.
PHP Frameworks like Laravel create helper functions for this behavior
because it is so ubiquitous.

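There are currently a couple of approaches to create such a behavior, most
commonly:

    strpos($haystack, $needle) !== false;
    strstr($haystack, $needle) !== false;
    preg_match('/' . $needle . '/', $haystack) != 0;

All of these functions serve the same purpose but are either not intuitive,
easy to get wrong (especially with the !== comparison) or hard to remember
for new PHP developers.

The proposed signature for this function follows the conventions of other
signatures of string functions and should look like this:

    str_contains(string $haystack, string $needle): bool

This function is very easy to implement, has no side effects or backward
compatibility issues.
I've implemented this feature and created a pull request on GitHub ( Link:
https://github.com/php/php-src/pull/5179 ).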
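To illustrate why the existing idioms (strpos() with a strict comparison, or preg_match() with the needle pasted into a pattern) are easy to get wrong, here is a minimal sketch. The helper name contains_sketch is hypothetical and is not the function from the pull request; it only wraps the strpos() idiom.

```php
<?php
// Hypothetical helper name, chosen so it cannot clash with a real function.
function contains_sketch(string $haystack, string $needle): bool
{
    // Strict comparison is required: strpos() returns int 0 for a match at
    // the very start of the string, and 0 == false under loose comparison.
    return strpos($haystack, $needle) !== false;
}

$haystack = 'abc';

// Pitfall 1: a match at offset 0 looks like "not found" with a loose != check.
$loose  = strpos($haystack, 'a') != false;   // false, although 'a' is present
$strict = strpos($haystack, 'a') !== false;  // true

// Pitfall 2: an unescaped needle is interpreted as a regular expression.
// Here "." matches any character, so 'a.c' appears to be contained in 'abc'.
$regexRaw    = preg_match('/' . 'a.c' . '/', 'abc') === 1;
// preg_quote() escapes the needle so it is matched literally.
$regexQuoted = preg_match('/' . preg_quote('a.c', '/') . '/', 'abc') === 1;
```

A dedicated function hides both details behind a boolean return value, which is the convenience the proposal is asking for.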

To get this function into the PHP core, I will open up an RFC for this.
But first, I would like to get your opinions and consensus on this proposal.

What are your opinions on this proposal?

Kind regards,
Philipp Tanlak
  108563
February 14, 2020 09:44 phpmailinglists@gmail.com (Peter Bowyer)
On Fri, 14 Feb 2020 at 09:18, Philipp Tanlak <philipp.tanlak@gmail.com>
wrote:

> I would like to propose the new basic function: str_contains.
>
> The proposed signature for this function follows the conventions of other
> signatures of string functions and should look like this:
>
> str_contains(string $haystack, string $needle): bool
>
> What are your opinions on this proposal?

In principle, yes. There are a couple of considerations first, like how you
plan to handle case-insensitive matches; and previous discussions for this
and the wider context of related string functions:

https://externals.io/message/106162
https://externals.io/message/100142
https://externals.io/message/94787
https://wiki.php.net/rfc/add_str_begin_and_end_functions

Peter
  108564
February 14, 2020 09:58 aegir@aegir.sexy (Aegir Leet)
I generally like the idea, but it seems many (most?) real-world 
implementations actually use mb_strpos() !== false by default.

https://github.com/danielstjules/Stringy/blob/df24ab62d2d8213bbbe88cc36fc35a4503b4bd7e/src/Stringy.php#L206-L215
https://github.com/illuminate/support/blob/6eff6cff19f7ad5540b9a61a9fb3612ca8218c19/Str.php#L157-L166

So there should definitely be an mb_str_contains in ext/mbstring in 
addition to the regular str_contains proposed here.
  108565
February 14, 2020 10:14 george.banyard@gmail.com ("G. P. B.")
On Fri, 14 Feb 2020 at 10:58, Aegir Leet <aegir@aegir.sexy> wrote:

> I generally like the idea, but it seems many (most?) real-world
> implementations actually use mb_strpos() !== false by default.
>
> https://github.com/danielstjules/Stringy/blob/df24ab62d2d8213bbbe88cc36fc35a4503b4bd7e/src/Stringy.php#L206-L215
> https://github.com/illuminate/support/blob/6eff6cff19f7ad5540b9a61a9fb3612ca8218c19/Str.php#L157-L166
>
> So there should definitely be an mb_str_contains in ext/mbstring in
> addition to the regular str_contains proposed here.

The biggest reason to have an mb_* variant is for case-insensitive
comparison. The only other reason is if you need to check a string which is
in a different encoding, which, I'm assuming, is a quasi non-existent problem
as everything is UTF-8 nowadays.

The reason why I personally voted no on the previous RFC was that I don't see
the value of having functions checking if a string starts/ends with a
sequence but not a general one. Moreover, checking for a substring to
start/end a string seems to be fitting for the current strpos functions.

This function on its own is way more reasonable and useful to add IMHO.

Best regards

George P. Banyard
  108634
February 17, 2020 07:18 guilliam.xavier@gmail.com (Guilliam Xavier)
On Fri, Feb 14, 2020 at 11:14 AM G. P. B. <george.banyard@gmail.com> wrote:

> Moreover, checking for a substring to start/end a string seems to be
> fitting for the current strpos functions.

Maybe in terms of semantics (`0 === strpos($haystack, $needle)`), but it is
suboptimal in terms of performance: especially when $haystack is a *very
long* string which *doesn't* contain $needle, strpos() will vainly search
along the whole string, while a specialized function would stop as soon as
possible (which is also the case for the existing strncmp(), but you need to
write `0 === strncmp($haystack, $needle, strlen($needle))`, arguably not
really the cleanest code...).

For "contains" you have to search along the whole string anyway, so
`str_contains()` is "just" `false !== strpos()` but cleaner.

To be clear, I'm not against the current proposal (rather for it, actually).
[I just would want `str_{starts,ends}_with` even more (without
case-insensitive or multibyte variants).]

-- 
Guilliam Xavier
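The performance argument for a prefix check can be sketched as follows. The helper name starts_with_sketch is hypothetical and only illustrates the strncmp() idiom mentioned in this message; it is not part of any proposal in this thread.

```php
<?php
// Hypothetical helper illustrating the strncmp() idiom.
function starts_with_sketch(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes, so the cost does not
    // grow with the length of $haystack.
    return strncmp($haystack, $needle, strlen($needle)) === 0;
}

// A long haystack that does not contain the needle at all.
$long = str_repeat('x', 1000000);

// Both give the same answer, but strpos() has to scan all of $long before
// reporting "not found", while strncmp() stops at the first differing byte.
$viaStrpos  = strpos($long, 'abc') === 0;
$viaStrncmp = starts_with_sketch($long, 'abc');
```

For a "contains" check there is no such shortcut, since a miss can only be reported after the whole haystack has been examined.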
  108639
February 17, 2020 09:02 philipp.tanlak@gmail.com (Philipp Tanlak)
Now that we've talked about the pros and cons of case-insensitivity and
multibyte variants, I'm still unsure what your opinions on those are.

* Should we include a case-insensitive variant (str_icontains) ?
* Should we include multibyte variants (mb_str_icontains) ?

Slightly off-topic:
Also, since this is my first time I'm trying to contribute: How can we
proceed to write an RFC?
I've read in the howto, that I need to earn RFC karma in order to create a
new RFC page. How can I request that?
My wiki.php.net username is: philippta and my email is
philipp.tanlak@gmail.com

Thanks for your help :)
  108641
February 17, 2020 09:52 nikita.ppv@gmail.com (Nikita Popov)
On Mon, Feb 17, 2020 at 10:03 AM Philipp Tanlak <philipp.tanlak@gmail.com>
wrote:

> Now that we've talked about the pros and cons of case-insensitivity and
> multibyte variants, I'm still unsure what your opinions on those are.
>
> * Should we include a case-insensitive variant (str_icontains) ?
> * Should we include multibyte variants (mb_str_icontains) ?

Especially considering how past proposals in this general area went, I'd
suggest to start small (just str_contains), and then go from there.
(Personally I'd like to have the trifecta of str_contains, str_starts_with
and str_ends_with in one go, but given that a proposal for the latter two
recently failed... though the main contention there seems to be the
case-insensitive part, not the functions themselves.)

> Slightly off-topic:
> Also, since this is my first time I'm trying to contribute: How can we
> proceed to write an RFC?
> I've read in the howto, that I need to earn RFC karma in order to create a
> new RFC page. How can I request that?
> My wiki.php.net username is: philippta and my email is
> philipp.tanlak@gmail.com

I've granted you RFC karma on the wiki, so you should be able to add a new
page under wiki.php.net/rfcs now.

Regards,
Nikita
  108643
February 17, 2020 11:38 philipp.tanlak@gmail.com (Philipp Tanlak)
On Mon, 17 Feb 2020 at 10:53, Nikita Popov <nikita.ppv@gmail.com> wrote:

> I've granted you RFC karma on the wiki, so you should be able to add a new
> page under wiki.php.net/rfcs now.

Thanks for the karma! An RFC has been created:
https://wiki.php.net/rfc/str_contains

Kind regards,
Philipp
  108645
February 17, 2020 11:48 benjamin.morel@gmail.com (Benjamin Morel)
> Thanks for the karma! An RFC has been created:
> https://wiki.php.net/rfc/str_contains

Something that's missing from the RFC is the behaviour when $needle is an
empty string:

    str_contains('abc', '');
    str_contains('', '');

Will these always return false?

-- 
Benjamin
  108646
February 17, 2020 11:55 nikita.ppv@gmail.com (Nikita Popov)
On Mon, Feb 17, 2020 at 12:49 PM Benjamin Morel <benjamin.morel@gmail.com>
wrote:

> Something that's missing from the RFC is the behaviour when $needle is an
> empty string:
>
> str_contains('abc', '');
> str_contains('', '');
>
> Will these always return false?

As of PHP 8, behavior of '' in string search functions is well defined, and
we consider '' to occur at every position in the string, including one past
the end. As such, both of these will (or at least should) return true. The
empty string is contained in every string.

Regards,
Nikita
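The empty-needle rule described here can be sketched in userland as below. The function name contains_with_empty_needle is hypothetical, and the explicit check for '' is an assumption made so that the result does not depend on how strpos() treats an empty needle on a given PHP version.

```php
<?php
// Hypothetical userland sketch; the name is invented for illustration.
function contains_with_empty_needle(string $haystack, string $needle): bool
{
    // Treat '' as contained in every string. The short-circuit means
    // strpos() is never called with an empty needle.
    return $needle === '' || strpos($haystack, $needle) !== false;
}

$a = contains_with_empty_needle('abc', '');  // true
$b = contains_with_empty_needle('', '');     // true
$c = contains_with_empty_needle('', 'a');    // false
```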
  108647
February 17, 2020 12:08 philipp.tanlak@gmail.com (Philipp Tanlak)
On Mon, 17 Feb 2020 at 12:56, Nikita Popov <nikita.ppv@gmail.com> wrote:

> As of PHP 8, behavior of '' in string search functions is well defined,
> and we consider '' to occur at every position in the string, including one
> past the end. As such, both of these will (or at least should) return true.
> The empty string is contained in every string.

Thanks for the hint, Benjamin. I've cited Nikita and added that to the RFC
for clarification.
  108568
February 14, 2020 11:41 claude.pache@gmail.com (Claude Pache)
> On 14 Feb 2020, at 10:17, Philipp Tanlak <philipp.tanlak@gmail.com> wrote:
>
> Hello PHP Devs,
>
> I would like to propose the new basic function: str_contains.
>
> The goal of this proposal is to standardize on a function, to check whether
> or not a string is contained in another string, which has a very common
> use-case in almost every PHP project.
> PHP Frameworks like Laravel create helper functions for this behavior
> because it is so ubiquitous.

Some time ago, an RFC proposing to add str_starts_with() and str_ends_with()
was unfortunately declined:

https://wiki.php.net/rfc/add_str_begin_and_end_functions

Therefore, unless several people have changed their mind in the right
direction since a few months ago, I am pessimistic about the acceptance of
str_contains().

-- 
Claude
  108569
February 14, 2020 11:53 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 10:18 AM Philipp Tanlak <philipp.tanlak@gmail.com>
wrote:

> Hello PHP Devs,
>
> I would like to propose the new basic function: str_contains.
>
> The goal of this proposal is to standardize on a function, to check whether
> or not a string is contained in another string, which has a very common
> use-case in almost every PHP project.
> PHP Frameworks like Laravel create helper functions for this behavior
> because it is so ubiquitous.
>
> There are currently a couple of approaches to create such a behavior, most
> commonly:
>
>     strpos($haystack, $needle) !== false;
>     strstr($haystack, $needle) !== false;
>     preg_match('/' . $needle . '/', $haystack) != 0;
>
> All of these functions serve the same purpose but are either not intuitive,
> easy to get wrong (especially with the !== comparison) or hard to remember
> for new PHP developers.
>
> The proposed signature for this function follows the conventions of other
> signatures of string functions and should look like this:
>
>     str_contains(string $haystack, string $needle): bool
>
> This function is very easy to implement, has no side effects or backward
> compatibility issues.
> I've implemented this feature and created a pull request on GitHub ( Link:
> https://github.com/php/php-src/pull/5179 ).
>
> To get this function into the PHP core, I will open up an RFC for this.
> But first, I would like to get your opinions and consensus on this
> proposal.
>
> What are your opinions on this proposal?

Sounds good to me. This operation is needed often enough that it deserves a
dedicated function.

I'd recommend leaving the proposal at only str_contains(), in particular:

* Do not propose a case-insensitive variant. I believe this is really the
  point on which the last str_starts_with/str_ends_with proposal failed.

* Do not propose mb_str_contains(). Especially as no offsets are involved,
  there is no reason to have this function. (For UTF-8, the behavior would
  be exactly equivalent to str_contains.)

Regards,
Nikita
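The point about offsets can be illustrated with a small sketch. It assumes the strings are valid UTF-8; because UTF-8 is self-synchronizing, a byte-wise search for a valid needle cannot match in the middle of a character, so only the reported offset differs between the byte and character views. The mb_strpos() call is guarded, since ext/mbstring may not be loaded.

```php
<?php
// "naïve café", written with escapes so the source file encoding is irrelevant.
$haystack = "na\u{00EF}ve caf\u{00E9}";

// Byte-wise search answers the yes/no question correctly for UTF-8 input.
$found   = strpos($haystack, "caf\u{00E9}") !== false;  // true
$missing = strpos($haystack, "cafe") !== false;          // false

// Offsets are where the two views differ: "ï" occupies two bytes.
$byteOffset = strpos($haystack, 'caf');                  // 7 (bytes)
$charOffset = function_exists('mb_strpos')
    ? mb_strpos($haystack, 'caf', 0, 'UTF-8')            // 6 (characters)
    : null;                                              // ext/mbstring not loaded
```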
  108577
February 14, 2020 14:14 philipp.tanlak@gmail.com (Philipp Tanlak)
On Fri, 14 Feb 2020 at 12:54, Nikita Popov <nikita.ppv@gmail.com> wrote:

> Sounds good to me. This operation is needed often enough that it deserves
> a dedicated function.
>
> I'd recommend leaving the proposal at only str_contains(), in particular:
>
> * Do not propose a case-insensitive variant. I believe this is really the
> point on which the last str_starts_with/str_ends_with proposal failed.
>
> * Do not propose mb_str_contains(). Especially as no offsets are
> involved, there is no reason to have this function. (For UTF-8, the
> behavior would be exactly equivalent to str_contains.)

I'd like to elaborate on Nikita's response:

I don't think an mb_str_contains is necessary, because the proposed function
does not behave differently if the input strings are multibyte strings.
When searching for a multibyte string in another multibyte string, the
return value would consistently be true/false. The position/offset at which
the multibyte string was found is not relevant.
The reason for the existence of strpos/mb_strpos is the fact that the
returned position/offset varies depending on whether or not the string is a
multibyte string.

The only possible valid variants concerning multibyte and
case-insensitivity I see are:

* str_contains: works as expected with multibyte and non-multibyte strings.
* mb_str_icontains: is the only valid option to do a case-insensitive
  search for multibyte strings.

Unneeded variants I see are:

* mb_str_contains: does not behave differently when compared to
  str_contains, as mentioned above.
* str_icontains: is a possible option but could be error prone when used
  with multibyte strings like UTF-8, as it is de facto the standard
  nowadays.

I'm certain there would be confusion among PHP developers when the newly
proposed functions are only str_contains and mb_str_icontains.

Patrick ALLAERT:

> Yes, it does have one: people having already defined a str_contains()
> function in the global scope will have a PHP Fatal error: Cannot redeclare
> str_contains()

You are absolutely correct with this. Although functions added by frameworks
to the global scope are usually guarded by:

    if (!function_exists('str_contains')) {}
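A guarded definition of the kind mentioned here might look like the sketch below. The function body is an assumption for illustration (a strpos() wrapper with an explicit empty-needle check), not the code of any particular framework.

```php
<?php
// Define the function only if neither PHP itself nor another library has
// already declared it, which avoids the "Cannot redeclare" fatal error.
if (!function_exists('str_contains')) {
    function str_contains(string $haystack, string $needle): bool
    {
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}

$hit  = str_contains('abc', 'b');  // true
$miss = str_contains('abc', 'd');  // false
```

With the guard in place, the userland definition is simply skipped once a native function of the same name exists.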
  108648
February 17, 2020 13:37 pierre.php@gmail.com (Pierre Joye)
hello,

On Fri, Feb 14, 2020, 6:54 PM Nikita Popov <nikita.ppv@gmail.com> wrote:

> * Do not propose mb_str_contains(). Especially as no offsets are involved,
> there is no reason to have this function. (For UTF-8, the behavior would be
> exactly equivalent to str_contains.)

Btw, while some mbstring references were mentioned, I do like the ICU search
implementation as well.

http://userguide.icu-project.org/collation/icu-string-search-service

It handles a lot of cases based on locales.
  108649
February 17, 2020 14:23 rowan.collins@gmail.com (Rowan Tommins)
On Mon, 17 Feb 2020 at 13:38, Pierre Joye <pierre.php@gmail.com> wrote:

> Btw, while some mbstring references were mentioned, I do like the ICU
> search implementation as well.
>
> http://userguide.icu-project.org/collation/icu-string-search-service
>
> It handles a lot of cases based on locales.

That's a lovely example of why treating Unicode as a character encoding is
the wrong mindset. I would love to see more people using ext/intl rather
than ext/mbstring, and more ICU features like this being included.

Regards,
-- 
Rowan Tommins
[IMSoP]
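One concrete case where a byte-level (or even code-point-level) search falls short is canonical equivalence. The sketch below is not the ICU string search service itself; it only assumes that the Normalizer class from ext/intl may be available, and it guards that call so the first part runs without the extension.

```php
<?php
$precomposed = "caf\u{00E9}";   // "é" as a single code point (U+00E9)
$decomposed  = "cafe\u{0301}";  // "e" followed by a combining acute (U+0301)

// Both render as "café", but the byte sequences differ, so a byte-wise
// search does not find one inside the other.
$bytewise = strpos($precomposed, $decomposed) !== false;  // false

// With ext/intl loaded, normalizing both sides to NFC first is one option.
$normalized = null;
if (class_exists('Normalizer')) {
    $a = Normalizer::normalize($precomposed, Normalizer::FORM_C);
    $b = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    $normalized = strpos($a, $b) !== false;               // true
}
```

Locale-aware matching, such as ignoring accents or case according to language rules, goes beyond normalization and is the kind of feature the ICU service linked above is said to handle.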