New functions: string_starts_with(), string_ends_with()

  100142
August 1, 2017 04:57 andreas@dqxtech.net (Andreas Hennings)
Hello list,
a quite common use case is that one needs to find out if a string
$haystack begins or ends with another string $needle.
Or in other words, if $needle is a prefix or a suffix of $haystack.

One prominent example would be in PSR-4 or PSR-0 class loaders.
Maybe the use case also occurs when writing parsers.
In each of these two examples (parsers, class loaders), we care about
performance.
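
To make the class loader case concrete, here is a minimal sketch of the prefix check in a PSR-4 style lookup. The helper name psr4_resolve() and its parameters are hypothetical, made up for this illustration, and it handles only a single namespace prefix.

```php
<?php
// Hypothetical helper (not from any library): map a class name to a file
// path for ONE PSR-4 namespace prefix, or return null if the prefix does
// not apply.
function psr4_resolve(string $class, string $prefix, string $baseDir): ?string
{
    $len = strlen($prefix);

    // The "starts with" check: compare only the first $len bytes.
    if (0 !== strncmp($class, $prefix, $len)) {
        return null;
    }

    // The remainder after the prefix is the relative class name.
    $relative = substr($class, $len);

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}
```

A real loader runs this check for every registered prefix on every class lookup, which is why the cost of the prefix test matters.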

(forgive me if this was discussed before, I did not find it anywhere
in the archives)

--------------------------

Existing solutions to this problem feel non-trivial, and/or are
suboptimal in performance.
https://stackoverflow.com/questions/2790899/how-to-check-if-a-string-starts-with-a-specified-string
https://stackoverflow.com/questions/834303/startswith-and-endswith-functions-in-php
This answer compares different solutions,
https://stackoverflow.com/a/7168986/246724

Existing solutions:
(Let's focus on string_starts_with(), the other case is mostly
equivalent / symmetric)

if (0 === strpos($haystack, $needle)) {..}
I have often seen this presented as the preferable solution.
Unfortunately, this searches the entire string, not just the
beginning. Especially if $haystack is really long, this can be a
waste.
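E.g. if (0 === strpos(file_get_contents('some_source_file.php'), '…')) {..}

if ($needle === substr($haystack, 0, strlen($needle))) {..}
This reserves new memory for the substring, which later needs to be
garbage-collected.
Also, this requires an additional function call to strlen() - which
adds even more clutter if $needle is an expression, not just a
variable.

if (0 === strncmp($haystack, $needle, strlen($needle))) {..}
Needs the additional call to strlen().
Otherwise, this seems like a really good solution.

if ('' === $needle || false !== strrpos($haystack, $needle,
-strlen($haystack))) {..}
This is the funky solution from
https://stackoverflow.com/a/10473026/246724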
The author says that it will be outperformed by strncmp() - so..

if (preg_match('/^' . preg_quote($needle, '/') . '/', $haystack)) {..}
Clearly gonna be slower than other options.

As said, all these solutions do work, but they are either suboptimal,
or they add clutter and overhead, or feel a bit like mind acrobatics.
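
For comparison, a userland approximation of the two proposed functions could look roughly like the sketch below. It is only an illustration built on strncmp() and substr_compare(); the function names are the ones proposed in this mail and do not exist natively, and the handling of an empty $needle (treated as always matching) is an assumption.

```php
<?php
// Userland sketch of the proposed functions; these are NOT native PHP
// functions. An empty $needle is assumed to always match.
function string_starts_with(string $haystack, string $needle): bool
{
    // strncmp() compares at most strlen($needle) bytes, so the rest of
    // $haystack is never scanned.
    return 0 === strncmp($haystack, $needle, strlen($needle));
}

function string_ends_with(string $haystack, string $needle): bool
{
    $len = strlen($needle);
    if ($len === 0) {
        return true;
    }
    if ($len > strlen($haystack)) {
        return false;
    }
    // A negative offset makes substr_compare() look at the last $len
    // bytes without copying them into a new string.
    return 0 === substr_compare($haystack, $needle, -$len);
}
```

This still pays for a userland function call plus strlen(), which is the overhead a native implementation would avoid.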

-----------------

So, I wonder if it would be worthwhile to add new functions
string_starts_with() / string_has_prefix(), and string_ends_with() /
string_has_suffix().

(Or maybe change strncmp(), so that the 3rd parameter $len is
optional. If $len is NULL / not provided, it would use the length of
the second (or first?) string.
(idea was that second parameter = needle).)
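
The optional-$len idea can be emulated in userland to see how it would read. The wrapper name strncmp_auto() is hypothetical, and taking the default length from the second argument is just one of the two options mentioned above.

```php
<?php
// Hypothetical wrapper illustrating an optional $len for strncmp().
// Assumption: when $len is omitted, the length of the SECOND string
// (the needle) is used.
function strncmp_auto(string $str1, string $str2, ?int $len = null): int
{
    if ($len === null) {
        $len = strlen($str2);
    }
    return strncmp($str1, $str2, $len);
}
```

With this, a prefix check becomes if (0 === strncmp_auto($haystack, $needle)) {..}.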

For me personally, I am sure that I would use a new
string_starts_with() a lot more often than a lot of the other existing
string functions.
I don't think it is an exotic or niche use case.

--------------

Spinning this further:
A lot of times if I want to check if $haystack begins with $needle, I
will then need the rest of the string after $needle.
So
if (string_starts_with($haystack, $needle)) {
    $suffix = substr($haystack, strlen($needle));
}
or
if (string_ends_with($filename, '.php')) {
    $basename = substr($filename, 0, -4);
}

I wonder if this could be somehow combined.
E.g.
if (FALSE !== $basename = string_clip_suffix($filename, '.php')) {
    // Do something with $basename.
}
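
A userland sketch of what such clip functions might do is shown below. The names string_clip_prefix() and string_clip_suffix() and the "remaining string or FALSE" return convention are taken from the idea above; everything else, including the empty-$needle behavior, is an assumption.

```php
<?php
// Sketch only: return the part of $haystack after $needle, or false if
// $haystack does not begin with $needle.
function string_clip_prefix(string $haystack, string $needle)
{
    $len = strlen($needle);
    if (0 !== strncmp($haystack, $needle, $len)) {
        return false;
    }
    // The cast guards against substr() returning false on old PHP versions.
    return (string) substr($haystack, $len);
}

// Sketch only: return the part of $haystack before $needle, or false if
// $haystack does not end with $needle.
function string_clip_suffix(string $haystack, string $needle)
{
    $len = strlen($needle);
    if ($len === 0) {
        return $haystack;
    }
    if ($len > strlen($haystack) || 0 !== substr_compare($haystack, $needle, -$len)) {
        return false;
    }
    return (string) substr($haystack, 0, -$len);
}
```

Note that a full match yields '' as the remainder, which is falsy. Callers therefore need the strict FALSE !== comparison used in the example above, not a plain truthiness check.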

------------------

One flaw of these new functions would be that they are less versatile
than other string functions.
They solve this problem, and nothing else.
On the other hand, this is the point, to avoid unnecessary overhead.

The other problem would be, of course, "feature creep" aka "we have so
many string functions already".
This is a matter of opinion.
I would imagine the "cost" of new native functions is:
- global namespace pollution
- increased mental load to learn and remember all of them
- higher memory footprint of php engine?
- more C code to maintain
- a new doc page.
Did I miss something?

------------------

-- Andreas
  100143
August 1, 2017 06:29 michal@brzuchalski.com (Michał Brzuchalski)
Hi Andreas,

This idea was discussed 11 months ago:
https://externals.io/message/94787
There is also a proper RFC:
https://wiki.php.net/rfc/add_str_begin_and_end_functions
You might want to contact Will to get feedback on the idea.

--
regards / pozdrawiam,
--
Michał Brzuchalski
about.me/brzuchal
brzuchalski.com
  100144
August 1, 2017 07:35 andreas@dqxtech.net (Andreas Hennings)
Thanks!
I did not find those, maybe the emails need to be enriched with keywords.
Like SEO-aware email authoring.

Ok. I am looking at the RFC and the old discussions at
https://marc.info/?l=php-internals&m=147017797404339&w=2

I don't know how to follow up on old threads that I don't have in my
email inbox.


So here is my feedback.

The RFC seems mostly fine as it is.
It does not contain anything like the string_clip_suffix() /
string_clip_prefix(), but I think these should be discussed
separately.

About the naming:
The "i" in str_ibegin() and str_iend() seems ok to me.
I also strongly support separate functions instead of a parameter for
case sensitivity.

I also support the underscore. str_begin() is better than strbegin().
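
To illustrate the "separate functions instead of a parameter" point, here is a userland sketch of a case-sensitive and a case-insensitive prefix check. The names str_begin() and str_ibegin() are the ones used in the RFC discussion; the signatures and the bodies below are assumptions, not the RFC's implementation.

```php
<?php
// Sketch only: case-sensitive prefix check.
function str_begin(string $haystack, string $needle): bool
{
    return 0 === strncmp($haystack, $needle, strlen($needle));
}

// Sketch only: case-insensitive variant as its own function, mirroring
// existing pairs such as str_replace() / str_ireplace().
function str_ibegin(string $haystack, string $needle): bool
{
    return 0 === strncasecmp($haystack, $needle, strlen($needle));
}
```

Two functions keep each call site explicit about case handling, whereas a boolean third parameter would have to be looked up to be understood.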


------------------------

Whether to have an "s" at the end:

https://marc.info/?l=php-internals&m=147017797404339&w=2
(Yasuo Ohgaki)

> It might be okay to have "s" in function names, but if we want to be
> consistent,
> str_replace -> str_replaces
> str_ireplace -> str_ireplaces

I disagree with this analogy.
The "s" in str_begins() would be for "haystack beginS with needle".
An "s" in str_replaces() would stand for what?

Both "begin" and "replace" are verbs, but they have a different role
in the function name.
"begin" describes a state or condition we want to verify, whereas
"replace" is a command we give to the machine.

So to me it would make sense to have str_begins() and str_ends()
instead of str_begin() and str_end().
To me, str_end() means either "End the string!" (command) or "Give me
the end of the string!" (noun).

In fact Rowan Collins made the same argument here,
https://marc.info/?l=php-internals&m=147017844704431&w=2

> I think those names mean something different: "str_begin" sounds like an
> imperative "make this string begin with X"; "str_begins" is more of an
> assertion "the string begins with X". Ruby would spell it with a ? at
> the end. It's also the same form, grammatically, as the common "isFoo".
>
> Note that this logic holds for "str_replace", which *is* an imperative -
> you are not saying "tell me if X replaces Y", you are saying "please
> replace X with Y".

But then Will talks about consistency again.
https://marc.info/?l=php-internals&m=147018700406320&w=2

> I think like
> having an "s" at the end of the function names reads better, but
> omitting the "s" fits better with the existing function names and does
> not read bad. Therefore, I am in favor of dropping the "s".

Honestly, looking at the existing string functions at
http://php.net/manual/en/ref.strings.php
I don't see a lot of consistency here. Just a long list of garbled
abbreviations.

I also don't see any existing function where the verb has a similar
role as the "begin" in str_begin().
For all the existing string functions, the verb is a command.

I think a better comparison would be
file_exists()
function_exists()
class_exists()
is_subclass_of()
extension_loaded()
ncurses_has_colors()
ncurses_can_change_color()

What these functions have in common:
- The return value is boolean.
- The verb is not a command, but it describes a state or condition.

The verb is not always at the end of the function name, and it does
not always end with -s.
But the form and ending of the verb follows its grammatical role in
the sentence.
I think this is a much better guideline than following a wrong idea of
consistency.

-------------------------

Finally, I don't know why everything needs to be abbreviated.
Having str_* instead of string_* seems ok to me, and is consistent
with existing string functions.
But my first idea would have been more complete phrases like
str_ends_with, str_has_ending(), str_has_suffix().
Instead of just str_end(), or str_ends().

On the other hand, shorter function names have their benefits.
So.. no strong opinion here.

--------------

-- Andreas