[RFC] [Under Discussion] New Curl URL API

  118041
June 22, 2022 04:38 pierrick@php.net (Pierrick Charron)
Hi,

Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new Curl
URL API and put aside all other improvements. Here is the RFC that reflects
our current conversations.

https://wiki.php.net/rfc/curl-url-api

Feel free to give any feedback, concern or support :-)

Regards,
Pierrick
  118043
June 22, 2022 12:03 craig@craigfrancis.co.uk (Craig Francis)
On 22 Jun 2022, at 05:38, Pierrick Charron <pierrick@php.net> wrote:
> Here is the RFC that reflects our current conversations. > > https://wiki.php.net/rfc/curl-url-api > > Feel free to give any feedback, concern or support :-)
Thanks Pierrick, I think this is a good approach to add the URL functionality to PHP 8.2 with a fairly simple CurlUrl object, just like CurlFile. This would allow developers to avoid the parsing vulnerability issues (as in, using two different libraries that can parse a URL in different ways)... then the completely new, and fully OOP version, would be done at a later date, after a lot more discussion / planning. Craig
  118044
June 22, 2022 12:12 derick@php.net (Derick Rethans)
On 22 June 2022 05:38:13 BST, Pierrick Charron <pierrick@php.net> wrote:
>Hi, > >Following our discussions we had on the subject of the new Curl URL API, >and other curl improvements. I decided to only focus on adding the new Curl >URL API and put aside all other improvements. Here is the RFC that reflects >our current conversations. > >https://wiki.php.net/rfc/curl-url-api > >Feel free to give any feedback, concern or support :-)
I've two things: - The new CurlUrl class should probably be immutable from the start. It was my biggest mistake not to do that with DateTime. - What happens if the curl library available on the system does not have the features and functions that this new class relies on? I would expect the class to not be available either, but the RFC does not mention that. cheers Derick
  118047
June 22, 2022 14:23 pierrick@php.net (Pierrick Charron)
Hi Derick,


> > - The new CurlUrl class should probably be immutable from the start. It > was my biggest mistake not to do that with DateTime. > > Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with DateTime, do you think that having `CurlUrl` being immutable and providing a `MutableCurlUrl` would make sense ? I see some cases where you "navigate" on a website using the same CurlHandle and just want to manipulate the MutableCurlUrl handle to change urls.
> - What happens if the curl library available on the system does not have > the features and functions that this new class relies on? I would expect > the class to not be available either, but the RFC does not mention that. >
Good point. As you expected, if the functions are not available in libcurl, the class will not be available. Same thing for each constant/feature. The extension will "adapt" to the curl version. I will add this to the RFC. Pierrick
  118049
June 22, 2022 14:46 divinity76@gmail.com (Hans Henrik Bergan)
any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )

On Wed, 22 Jun 2022 at 16:23, Pierrick Charron <pierrick@php.net> wrote:

> Hi Derick, > > > > > > - The new CurlUrl class should probably be immutable from the start. It > > was my biggest mistake not to do that with DateTime. > > > > > Thanks for sharing your lessons learned. But I still see some use cases > where mutable objects are easier to use. From the experience you had with > DateTime, do you think that having `CurlUrl` being immutable and providing > a `MutableCurlUrl` would make sense ? I see some cases where you "navigate" > on a website using the same CurlHandle and just want to manipulate the > MutableCurlUrl handle to change urls. > > > > - What happens if the curl library available on the system does not have > > the features and functions that this new class relies on? I would expect > > the class to not be available either, but the RFC does not mention that. > > > > Good point. As you expected, if the functions are not available in libcurl, > the class will not be available. Same thing for each constant/feature. The > extension will "adapt" to the curl version. I will add this to the RFC. > > Pierrick >
  118050
June 22, 2022 14:53 divinity76@gmail.com (Hans Henrik Bergan)
nitpicking but I kind-of doubt the description for CurlUrl::NO_DEFAULT_PORT
is correct, quote:
> Instructs the method to return null if the port matches the default port for the scheme.
makes it sound like these would return null: http://localhost:80/ https://localhost:443/ ftps://localhost:22/ Is that correct? I would imagine it returns null if the port isn't specified, rather than null if the port when specified matches the default port? On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan <divinity76@gmail.com> wrote:
> any particular reason CurlUrl::getPort() defaults to 0 rather than one of > the valid options? (that being CurlUrl::DEFAULT_PORT > and CurlUrl::NO_DEFAULT_PORT ) > > On Wed, 22 Jun 2022 at 16:23, Pierrick Charron <pierrick@php.net> wrote: > >> Hi Derick, >> >> >> > >> > - The new CurlUrl class should probably be immutable from the start. It >> > was my biggest mistake not to do that with DateTime. >> > >> > >> Thanks for sharing your lessons learned. But I still see some use cases >> where mutable objects are easier to use. From the experience you had with >> DateTime, do you think that having `CurlUrl` being immutable and providing >> a `MutableCurlUrl` would make sense ? I see some cases where you >> "navigate" >> on a website using the same CurlHandle and just want to manipulate the >> MutableCurlUrl handle to change urls. >> >> >> > - What happens if the curl library available on the system does not have >> > the features and functions that this new class relies on? I would expect >> > the class to not be available either, but the RFC does not mention that. >> > >> >> Good point. As you expected, if the functions are not available in >> libcurl, >> the class will not be available. Same thing for each constant/feature. The >> extension will "adapt" to the curl version. I will add this to the RFC. >> >> Pierrick >> >
  118051
June 22, 2022 14:56 divinity76@gmail.com (Hans Henrik Bergan)
(dammit, mixed sftp:// with ftps:// there, ignore that, i meant sftp:// )

On Wed, 22 Jun 2022 at 16:53, Hans Henrik Bergan <divinity76@gmail.com>
wrote:

> nitpicking but I kind-of doubt the description > for CurlUrl::NO_DEFAULT_PORT is correct, quote: > > Instructs the method to return null if the port matches the default port > for the scheme. > > makes it sound like these would return null: http://localhost:80/ > https://localhost:443/ ftps://localhost:22/ > > Is that correct? I would imagine it returns null if the port isn't > specified, rather than null if the port when specified matches the default > port? > > On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan <divinity76@gmail.com> > wrote: > >> any particular reason CurlUrl::getPort() defaults to 0 rather than one of >> the valid options? (that being CurlUrl::DEFAULT_PORT >> and CurlUrl::NO_DEFAULT_PORT ) >> >> On Wed, 22 Jun 2022 at 16:23, Pierrick Charron <pierrick@php.net> wrote: >> >>> Hi Derick, >>> >>> >>> > >>> > - The new CurlUrl class should probably be immutable from the start. It >>> > was my biggest mistake not to do that with DateTime. >>> > >>> > >>> Thanks for sharing your lessons learned. But I still see some use cases >>> where mutable objects are easier to use. From the experience you had with >>> DateTime, do you think that having `CurlUrl` being immutable and >>> providing >>> a `MutableCurlUrl` would make sense ? I see some cases where you >>> "navigate" >>> on a website using the same CurlHandle and just want to manipulate the >>> MutableCurlUrl handle to change urls. >>> >>> >>> > - What happens if the curl library available on the system does not >>> have >>> > the features and functions that this new class relies on? I would >>> expect >>> > the class to not be available either, but the RFC does not mention >>> that. >>> > >>> >>> Good point. As you expected, if the functions are not available in >>> libcurl, >>> the class will not be available. Same thing for each constant/feature. >>> The >>> extension will "adapt" to the curl version. I will add this to the RFC. >>> >>> Pierrick >>> >>
  118053
June 22, 2022 15:04 pierrick@php.net (Pierrick Charron)
HI Hans,

any particular reason CurlUrl::getPort() defaults to 0 rather than one of
> the valid options? (that being CurlUrl::DEFAULT_PORT > and CurlUrl::NO_DEFAULT_PORT ) >
This is because the default is none of those 2 behaviours, If the port wasn't set it will return null, but if the port is the default port for the scheme it will still return it. makes it sound like these would return null: http://localhost:80/
> https://localhost:443/ ftps://localhost:22/ > > Is that correct? I would imagine it returns null if the port isn't > specified, rather than null if the port when specified matches the default > port? > > That's correct, if you use CurlUrl::NO_DEFAULT_PORT. The behaviour you
were expecting is the default one ($flags = 0)
  118142
June 30, 2022 16:48 pierrick@php.net (Pierrick Charron)
Hi all,


> - The new CurlUrl class should probably be immutable from the start. It > was my biggest mistake not to do that with DateTime. > > After thinking about it and some discussions, I followed Derick's
recommendation and therefore changed the RFC to make the CurlUrl class immutable. All the setters were replaced by new `with*` methods. For example setHost is now withHost and will return a new object with the host modified. This will prevent confusing behavior where the CurlUrl object would be unintentionally modified after being attached to a CurlHandle. Pierrick
  118145
June 30, 2022 22:21 internals@lists.php.net ("Levi Morrison via internals")
On Thu, Jun 30, 2022 at 10:48 AM Pierrick Charron <pierrick@php.net> wrote:
> > Hi all, > > > > - The new CurlUrl class should probably be immutable from the start. It > > was my biggest mistake not to do that with DateTime. > > > > > After thinking about it and some discussions, I followed Derick's > recommendation and therefore changed the RFC to make the CurlUrl class > immutable. All the setters were replaced by new `with*` methods. > For example setHost is now withHost and will return a new object with the > host modified. This will prevent confusing behavior where the CurlUrl > object would be unintentionally modified after being attached to a > CurlHandle. > > Pierrick
It's clear people do not agree on how we should be designing the APIs for 3rd party extensions. However, let me redraw attention to the introduction of the RFC:
> One of the goal of this API is to tighten a problematic vulnerable area > for applications where the URL parser library would believe one thing > and libcurl another. This could and has sometimes led to security > problems.
Designing another API on top of what libcurl provides _could make the problem worse_. I am fine with these kinds of adjustments: 1. Using exceptions instead of return codes. 2. Using enums instead of constants if it makes sense (it may not if they are bitwise-or'd together, which is pretty common for C libs). 3. Renaming things that have keyword or reserved word conflicts. I am not fine with designing an immutable, with* style API that doesn't mirror the underlying library at all. At least, not by itself; I'd be okay with having both, where the nicer API is built on top of the lower level, but what is nicer is subjective. As this thread shows, designing a nicer API will have quite a bit more discussion and disagreement than "exposing" or "porting" libcurl's API.
  118077
June 23, 2022 16:48 internals@lists.php.net ("Levi Morrison via internals")
On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron <pierrick@php.net> wrote:
> > Hi, > > Following our discussions we had on the subject of the new Curl URL API, > and other curl improvements. I decided to only focus on adding the new Curl > URL API and put aside all other improvements. Here is the RFC that reflects > our current conversations. > > https://wiki.php.net/rfc/curl-url-api > > Feel free to give any feedback, concern or support :-) > > Regards, > Pierrick
IMO, this should mirror the low-level curl url API very directly. The basis of my opinion is that we do not own libcurl; we are merely adapting it for use in PHP. We cannot anticipate changes in their design, nor do we have authority to do so if we feel something should change. Touching it as little as possible makes it easier to track upstream changes, etc. Based on that, I think the naming should be closer to libcurl.: - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE And so on, only differing if necessary because something is a reserved word. The API should be as exact as possible to what libcurl provides. The "helpers" getHost, getPassword, etc should be removed and should expose `curl_url_get` more directly. Of course, it should be object based instead of resource based, but that's it. A nicer API can be built on top of it, but I don't think that's the role this particular API should play.
  118078
June 23, 2022 20:04 rowan.collins@gmail.com (Rowan Tommins)
On 23 June 2022 17:48:57 BST, Levi Morrison via internals <internals@lists.php.net> wrote:
>IMO, this should mirror the low-level curl url API very directly. The >basis of my opinion is that we do not own libcurl; we are merely >adapting it for use in PHP. We cannot anticipate changes in their >design, nor do we have authority to do so if we feel something should >change. Touching it as little as possible makes it easier to track >upstream changes, etc. > >Based on that, I think the naming should be closer to libcurl.: > - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE > - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE > >And so on, only differing if necessary because something is a reserved >word. The API should be as exact as possible to what libcurl provides. >The "helpers" getHost, getPassword, etc should be removed and should >expose `curl_url_get` more directly. > >Of course, it should be object based instead of resource based, but that's it. > >A nicer API can be built on top of it, but I don't think that's the >role this particular API should play.
For the record, I disagree with literally everything you've said here. PHP indeed does not own libcurl, and nor does libcurl own PHP. We are targeting a different audience, and have a different set of facilities available to us. We have our own documentation, so there is no reason a user of PHP should know or care what the underlying implementation looks like, any more than they should know how the memory allocation works. If the underlying library adds a feature, it will be as easy to add a new method as a new constant. If the underlying library changes behaviour, we will want to make our own decision on whether that meets our compatibility policy, and whether to emulate the older behaviour (or indeed emulate the newer behaviour on systems with an older library). Twenty years ago, maybe PHP programmers were used to it being a veneer over C. I don't think that is or should be the expectation today, unless you're using FFI. Regards, -- Rowan Tommins [IMSoP]
  118080
June 24, 2022 13:49 pierrick@php.net (Pierrick Charron)
Hi all, and thanks Levi for your feedback,

If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !

With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all..

What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?

Regards,
Pierrick







Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison morrison@datadoghq.com>
a écrit :

> On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron <pierrick@php.net> > wrote: > > > > Hi, > > > > Following our discussions we had on the subject of the new Curl URL API, > > and other curl improvements. I decided to only focus on adding the new > Curl > > URL API and put aside all other improvements. Here is the RFC that > reflects > > our current conversations. > > > > https://wiki.php.net/rfc/curl-url-api > > > > Feel free to give any feedback, concern or support :-) > > > > Regards, > > Pierrick > > IMO, this should mirror the low-level curl url API very directly. The > basis of my opinion is that we do not own libcurl; we are merely > adapting it for use in PHP. We cannot anticipate changes in their > design, nor do we have authority to do so if we feel something should > change. Touching it as little as possible makes it easier to track > upstream changes, etc. > > Based on that, I think the naming should be closer to libcurl.: > - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE > - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE > > And so on, only differing if necessary because something is a reserved > word. The API should be as exact as possible to what libcurl provides. > The "helpers" getHost, getPassword, etc should be removed and should > expose `curl_url_get` more directly. > > Of course, it should be object based instead of resource based, but that's > it. > > A nicer API can be built on top of it, but I don't think that's the > role this particular API should play. >
  118081
June 24, 2022 14:05 JDafoe@fsx.com (Jeffrey Dafoe)
Hello Pierrick,

A thin wrapper would be the most flexible. Someone can always write a "friendlier" class using your thin wrapper, as you mentioned, but one cannot easily go the other direction.

-Jeff


-----Original Message-----
From: Pierrick Charron <pierrick@php.net> 
Sent: Friday, June 24, 2022 9:49 AM
To: Levi Morrison morrison@datadoghq.com>
Cc: PHP internals <internals@lists.php.net>
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

Hi all, and thanks Levi for your feedback,

If you look at the first thread (Discussion about new Curl URL API and ext/curl improvements) you'll see that this was my first approach. I even proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !

With previous thread answers, I was under the impression that I really was the only one liking the approach of following the cURL api as much as possible and leaving the more high level API to things like Guzzle & co (for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's important to have this new Curl URL API in 8.2 since it fixes security issues and that's of course not ideal but I would rather have an implementation that would not be my first choice, than not having it at all.

What do you think ? I would love more people to give feedback on what they are expecting, if they don't care if they prefer one approach or the other and of course why ? I was thinking about doing a vote on this, but I'm not sure it's a good idea. What do you all think ?

Regards,
Pierrick







Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison morrison@datadoghq.com> a écrit :

> On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron <pierrick@php.net> > wrote: > > > > Hi, > > > > Following our discussions we had on the subject of the new Curl URL > > API, and other curl improvements. I decided to only focus on adding > > the new > Curl > > URL API and put aside all other improvements. Here is the RFC that > reflects > > our current conversations. > > > > https://wiki.php.net/rfc/curl-url-api > > > > Feel free to give any feedback, concern or support :-) > > > > Regards, > > Pierrick > > IMO, this should mirror the low-level curl url API very directly. The > basis of my opinion is that we do not own libcurl; we are merely > adapting it for use in PHP. We cannot anticipate changes in their > design, nor do we have authority to do so if we feel something should > change. Touching it as little as possible makes it easier to track > upstream changes, etc. > > Based on that, I think the naming should be closer to libcurl.: > - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE > - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE > > And so on, only differing if necessary because something is a reserved > word. The API should be as exact as possible to what libcurl provides. > The "helpers" getHost, getPassword, etc should be removed and should > expose `curl_url_get` more directly. > > Of course, it should be object based instead of resource based, but > that's it. > > A nicer API can be built on top of it, but I don't think that's the > role this particular API should play. >
  118087
June 24, 2022 15:36 rowan.collins@gmail.com (Rowan Tommins)
On Fri, 24 Jun 2022 at 15:05, Jeffrey Dafoe <JDafoe@fsx.com> wrote:

> A thin wrapper would be the most flexible. Someone can always write a > "friendlier" class using your thin wrapper, as you mentioned, but one > cannot easily go the other direction. >
I think this argument has some validity when talking about the entire API of a library like curl. But we're not, we're talking about a tiny bounded context. The underlying implementation consists of 6 functions, 4 of which are trivial: * An argument-less "constructor" https://curl.se/libcurl/c/curl_url.html * An equivalent to PHP "clone" https://curl.se/libcurl/c/curl_url_dup.html * A cleanup method, which would be automatic in any PHP implementation https://curl.se/libcurl/c/curl_url_cleanup.html * A function for looking up error strings from codes https://curl.se/libcurl/c/curl_url_strerror.html That leaves the main getter and setter functions https://curl.se/libcurl/c/curl_url_get.html and https://curl.se/libcurl/c/curl_url_set.html which take the same arguments: * A resource handle * The URL part to handle * The new value for set, or an output parameter for get * A set of flags If we really believe the goal is to expose the C API in PHP, the get function would look like this: /** * @return int Error code */ function curl_url_get(resource $url, int $what, string &$part, int $flags): int That would be ... awful, and I hope nobody's really suggesting that. So we have to map each part to some concept in PHP, which is what the proposal does: * The resource handle becomes $this * The $what argument becomes the method name * The output parameter becomes the return value * The error code becomes an exception There's no "flexibility" which has been lost, and no "future changes" which are being prevented. The only other change is a few renamed constants, which I suggested on the PR. I can see an argument for making them match the library exactly; but it's the *values* that actually matter, so I don't see why we shouldn't choose our own names if they're descriptive of the functionality. Regards, -- Rowan Tommins [IMSoP]
  118082
June 24, 2022 14:13 tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=)
Hi

On 6/24/22 15:49, Pierrick Charron wrote:
> If you look at the first thread (Discussion about new Curl URL API and > ext/curl improvements) you'll see that this was my first approach. I even > proposed to "OOPify" the actual CurlHandle & co objects with "simple" > methods like $ch->setOpt(). Anyone who reads the actual API can see this is > the actual "philosophy" of the extension. It was designed (on purpose or > not) as a thin wrapper over curl. But lets focus on URL API only ! > > With previous thread answers, I was under the impression that I really was > the only one liking the approach of following the cURL api as much as > possible and leaving the more high level API to things like Guzzle & co > (for example by adding an Implementation of PSR-7 UriInterface using this > API) since all the others parts of the API are done this way. I think it's > important to have this new Curl URL API in 8.2 since it fixes security > issues and that's of course not ideal but I would rather have an > implementation that would not be my first choice, than not having it at all. > > What do you think ? I would love more people to give feedback on what they > are expecting, if they don't care if they prefer one approach or the other > and of course why ? I was thinking about doing a vote on this, but I'm not > sure it's a good idea. What do you all think ?
I agree with Levi. The curl extension should be a thin wrapper over libcurl. A high level API that is more convenient to use may be provided by PHP, but it should not be called curl. It should have a generic name with the fact that curl does the heavy lifting only being an implementation detail. Best regards Tim Düsterhus
  118083
June 24, 2022 14:24 tekiela246@gmail.com (Kamil Tekiela)
Please do not add yet another thin wrapper of an underlying C API. PHP is
not a drop-in replacement of C. PHP is a much higher-level language.
Developers should not have to understand how the underlying library works
to be able to use PHP. We need to move away from thin C wrappers as a
language. PHP should abstract more, not less, of C code.

The new API doesn't have to have exactly the same names as the C library.
Please follow PHP naming conventions and implement OOP-based API.
  118084
June 24, 2022 14:27 larry@garfieldtech.com ("Larry Garfield")
On Fri, Jun 24, 2022, at 8:49 AM, Pierrick Charron wrote:
> Hi all, and thanks Levi for your feedback, > > If you look at the first thread (Discussion about new Curl URL API and > ext/curl improvements) you'll see that this was my first approach. I even > proposed to "OOPify" the actual CurlHandle & co objects with "simple" > methods like $ch->setOpt(). Anyone who reads the actual API can see this is > the actual "philosophy" of the extension. It was designed (on purpose or > not) as a thin wrapper over curl. But lets focus on URL API only ! > > With previous thread answers, I was under the impression that I really was > the only one liking the approach of following the cURL api as much as > possible and leaving the more high level API to things like Guzzle & co > (for example by adding an Implementation of PSR-7 UriInterface using this > API) since all the others parts of the API are done this way. I think it's > important to have this new Curl URL API in 8.2 since it fixes security > issues and that's of course not ideal but I would rather have an > implementation that would not be my first choice, than not having it at all. > > What do you think ? I would love more people to give feedback on what they > are expecting, if they don't care if they prefer one approach or the other > and of course why ? I was thinking about doing a vote on this, but I'm not > sure it's a good idea. What do you all think ? > > Regards, > Pierrick
The root question, I think, is who the consumer is for this RFC? Is the consumer PHP developers? Then it should be a good PHP API. Is the consumer the Guzzle team and everyone else just uses Guzzle and ignores Curl/CurlUrl? Then it should probably adhere as closely as feasible to Curl's API, awful as it is, and be documented out the wazoo. There likely isn't time for 8.2 to do the first option, but probably is for the second. --Larry Garfield
  118097
June 26, 2022 14:57 marandall@php.net (Mark Randall)
On 22/06/2022 05:38, Pierrick Charron wrote:
> Feel free to give any feedback, concern or support :-)
This should preferably be namespaced. Mark Randall
  118098
June 27, 2022 06:29 michal.brzuchalski@gmail.com (=?UTF-8?Q?Micha=C5=82_Marcin_Brzuchalski?=)
Hi Pierrick

śr., 22 cze 2022 o 06:38 Pierrick Charron <pierrick@php.net> napisał(a):

> Hi, > > Following our discussions we had on the subject of the new Curl URL API, > and other curl improvements. I decided to only focus on adding the new Curl > URL API and put aside all other improvements. Here is the RFC that reflects > our current conversations. > > https://wiki.php.net/rfc/curl-url-api
TBH I have a problem with the proposed Curl* classes. IMO they present a mix of responsibilities that I don't like. CurlUrl is for me is a mix of Url/Uri object properties well known from other userland implementations like the PSR one with Uri specific like the host, scheme, port, path, query, fragment, etc. but on the other hand, we have flags and options which purpose is to pass Curl specific things but in IMO wrong place instead (for instance Guzzle use separate argument in all methods for options and flags. Secondly, it has some default logic like `setScheme` allows to put a scheme but requires the supported one, with an exception that you can ignore support checking by a flag - IMO this logic belongs to its consumer logic and is not part of URL class logic and the `setPort` like above. Next thing is that it again has all the getters and setters and is not immutable. Why can't it be a simple constructor with read-only properties, no getters, no setters? Maybe I miss some discussion about why here. The same goes for CurlException which looks like is an exception for handling some encoding of Url and not exactly to the Url properties itself. But with all that said, if the target audience is only a user of low lever `curl_*` functions rather than users of higher abstraction libraries like Guzzle or Httplug then, I guess I don't care that much. I'm only afraid that such a mix of responsibilities can lead to bad habits when it gets to the separation of concerns by developers who get used to it and won' see the problem as I do. Cheers, Michał Marcin Brzuchalski
  118102
June 27, 2022 08:05 rowan.collins@gmail.com (Rowan Tommins)
On 27/06/2022 07:29, Michał Marcin Brzuchalski wrote:
> CurlUrl is for me is a mix of Url/Uri object properties well known from > other userland implementations like the PSR one > with Uri specific like the host, scheme, port, path, query, fragment, etc. > but on the other hand, we have flags and options which > purpose is to pass Curl specific things but in IMO wrong place
Rather than a *representation* of a URL, think of the class as a *builder* for URLs. There are multiple methods because you might want to build the URL in different orders ("start with this URL but replace the port", "start with this domain and I'll add the path later", etc). All the flags are related to how the input should be interpreted, and the output manipulated, in order to build a correctly formatted URL string. Maybe it should even be called CurlUrlBuilder? That also fits with the design of having mutable setters; as Derick pointed out, mutable value objects are generally a bad idea, so it would make sense to encourage users to think of this as a way to get one or more strings, rather than as a result in itself. Regards, -- Rowan Tommins [IMSoP]
  118107
June 27, 2022 13:18 pierrick@php.net (Pierrick Charron)
Hi Rowan

> > Rather than a *representation* of a URL, think of the class as a > *builder* for URLs. There are multiple methods because you might want to > build the URL in different orders ("start with this URL but replace the > port", "start with this domain and I'll add the path later", etc). All > the flags are related to how the input should be interpreted, and the > output manipulated, in order to build a correctly formatted URL string. >
Sorry if it was really not clear in the RFC since I didn't even talk about the CURLOPT_CURLU option, but this class is not only there to parse/build strings for Curl but to give this specific object to Curl instead of a string representation of an URL.
> > Maybe it should even be called CurlUrlBuilder? That also fits with the > design of having mutable setters; as Derick pointed out, mutable value > objects are generally a bad idea, so it would make sense to encourage > users to think of this as a way to get one or more strings, rather than > as a result in itself. >
Since you can give this object to curl instead of an URL string, I would not call it CurlUrlBuilder. Regards, Pierrick
  118106
June 27, 2022 13:10 pierrick@php.net (Pierrick Charron)
Hi Michał,

Thanks for your comments. You made me realise that the RFC missed
information on a crucial part which is the new CURLOPT_CURLU option that
tells curl to use the CurlUrl object instead of the "standard" CURLOPT_URL
option. I just added some information on it in the RFC.

The purpose of the CurlUrl class is to be able to build/see/modify an URL
as it is seen by libcurl before/after passing it to your CurlHandle. It is
not meant to be used alone as a representation of an URL.

For example, you may want to do some checks to make sure that the URL you
gave to your CurlHandle is well formatted and that the host or any other
parts are what you want them to be, and that it was not something else
because of some differences on how the URL was parsed. Or you may want to
add/delete/overwrite some parts to sanitize your URL. That's why you have
those setters/getters. This class is definitively not there to replace
PSR-7 UriInterface. I can imagine some Guzzle or other implementation to
build a CurlUrl handle from a UriInterface before giving it to curl.

So this RFC is really targeted to users of curl_* functions.

Le lun. 27 juin 2022, à 02 h 29, Michał Marcin Brzuchalski <
michal.brzuchalski@gmail.com> a écrit :

> Hi Pierrick > > śr., 22 cze 2022 o 06:38 Pierrick Charron <pierrick@php.net> napisał(a): > >> Hi, >> >> Following our discussions we had on the subject of the new Curl URL API, >> and other curl improvements. I decided to only focus on adding the new >> Curl >> URL API and put aside all other improvements. Here is the RFC that >> reflects >> our current conversations. >> >> https://wiki.php.net/rfc/curl-url-api > > > TBH I have a problem with the proposed Curl* classes. > IMO they present a mix of responsibilities that I don't like. > > CurlUrl is for me is a mix of Url/Uri object properties well known from > other userland implementations like the PSR one > with Uri specific like the host, scheme, port, path, query, fragment, etc.. > but on the other hand, we have flags and options which > purpose is to pass Curl specific things but in IMO wrong place instead > (for instance Guzzle use separate argument in all methods for options and > flags. > Secondly, it has some default logic like `setScheme` allows to put a > scheme but requires the supported one, with an exception that you can > ignore support checking by a flag - IMO this logic belongs to its consumer > logic and is not part of URL class logic and the `setPort` like above. > Next thing is that it again has all the getters and setters and is not > immutable. Why can't it be a simple constructor with read-only properties, > no getters, no setters? Maybe I miss some discussion about why here. > > > The same goes for CurlException which looks like is an exception for > handling some encoding of Url and not exactly to the Url properties itself. > > But with all that said, if the target audience is only a user of low lever > `curl_*` functions rather than users of higher abstraction libraries like > Guzzle or Httplug then, I guess I don't care that much. > I'm only afraid that such a mix of responsibilities can lead to bad habits > when it gets to the separation of concerns by developers who get used to it > and won' see the problem as I do. > > Cheers, > Michał Marcin Brzuchalski >
  118108
June 27, 2022 15:50 rowan.collins@gmail.com (Rowan Tommins)
On 27/06/2022 14:10, Pierrick Charron wrote:
> The purpose of the CurlUrl class is to be able to build/see/modify an URL > as it is seen by libcurl before/after passing it to your CurlHandle. It is > not meant to be used alone as a representation of an URL. > > For example, you may want to do some checks to make sure that the URL you > gave to your CurlHandle is well formatted and that the host or any other > parts are what you want them to be, and that it was not something else > because of some differences on how the URL was parsed. Or you may want to > add/delete/overwrite some parts to sanitize your URL. That's why you have > those setters/getters. This class is definitively not there to replace > PSR-7 UriInterface. I can imagine some Guzzle or other implementation to > build a CurlUrl handle from a UriInterface before giving it to curl.
This leaves me a bit baffled, frankly. If I've got a URL, which is already a string, what code would I write to "do some checks" on it, outside of a unit test? If I'm using CurlUrl to "add/delete/overwrite some parts" how is that not "using it alone as a representation of an URL"? If I'm writing a PSR-7 object, am I only supposed to use CurlUrl when interfacing with curl, and generate the string myself for other purposes? If the implementation I come up with differs from curl's, how does the user know which is the "real" URL? Regards, -- Rowan Tommins [IMSoP]
  118111
June 27, 2022 21:32 pierrick@adoy.net (Pierrick Charron)
Hi Rowan


> If I've got a URL, which is already a string, what code would I write to > "do some checks" on it, outside of a unit test? >
That's just an example with an old version of PHP, but let's say you have some code that makes requests but only to a specific list of servers, so you want to analyze the URL and check if the host is in a whitelist. If the provided URL is "http://127.0.0.1:11211#@google.com:80/" and that you used PHP <= 7.0.13 your parse_url function would tell you that the domain you're trying to request is google.com so everything is fine, but in fact when the call to curl is made, curl would call 127.0.0.1. This one was fixed but the problem could still occur if the parser is not the same as the one used in the requester.
> > If I'm using CurlUrl to "add/delete/overwrite some parts" how is that > not "using it alone as a representation of an URL"? > > What I meant here was that if you're not using curl, you have no advantage
of using this class alone to parse since the requester you're using could handle the URL differently.
> If I'm writing a PSR-7 object, am I only supposed to use CurlUrl when > interfacing with curl, and generate the string myself for other > purposes? If the implementation I come up with differs from curl's, how > does the user know which is the "real" URL? > > You can use CurlUrl within your implementation of UriInterface but for the
same reason if you're using another request engine than curl, you may have the same security problem where curl will not parse the same data. If you want to make sure that your CurlUrl object represents the same thing as your UriInterface you could build the CurlUrl object part by part using your UriInterface. When you assign your CurlUrl to your CurlHandle with the CURLOPT_CURLU option, curl will use the parts directly instead of parsing the URL again, so you're sure that the host will be the one you set with `CurlUrl::setHost()` and so on. Pierrick [1] https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
  118113
June 28, 2022 14:23 rowan.collins@gmail.com (Rowan Tommins)
On Mon, 27 Jun 2022 at 22:32, Pierrick Charron <pierrick@adoy.net> wrote:

> That's just an example with an old version of PHP, but let's say you have > some code that makes requests but only to a specific list of servers, so > you want to analyze the URL and check if the host is in a whitelist. If the > provided URL is "http://127.0.0.1:11211#@google.com:80/" and that you > used PHP <= 7.0.13 your parse_url function would tell you that the domain > you're trying to request is google.com so everything is fine, but in fact > when the call to curl is made, curl would call 127.0.0.1. This one was > fixed but the problem could still occur if the parser is not the same as > the one used in the requester. >
....
> You can use CurlUrl within your implementation of UriInterface but for the > same reason if you're using another request engine than curl, you may have > the same security problem where curl will not parse the same data. If you > want to make sure that your CurlUrl object represents the same thing as > your UriInterface you could build the CurlUrl object part by part using > your UriInterface. When you assign your CurlUrl to your CurlHandle with the > CURLOPT_CURLU option, curl will use the parts directly instead of parsing > the URL again, so you're sure that the host will be the one you set with > `CurlUrl::setHost()` and so on. >
This is actually a lot trickier than it sounds. Imagine this code, with the bug you gave as an example still present in one of the libraries we're interacting with: $url = new MyUrlObject("http://127.0.0.1:11211#@google.com:80/"); var_dump($url->getHostName()); // ??? This won't work, because we don't know which parser to call; it needs to be something like this: var_dump($url->getHostName(UrlContext::CURL)); // '127.0.0.1' var_dump($url->getHostName(UrlContext::BROKEN_PHP)); // 'google.com' Then we call this: $url->setHostName("duckduckgo.com"); var_dump($url->getFullUrl(); // ??? Again, the result depends on context: var_dump($url->getFullUrl( UrlContext::CURL )); // " http://duckduckgo.com#@google.com:80/" var_dump($url->getFullUrl( UrlContext::BROKEN_PHP )); // " http://127.0.0.1:11211#@duckduckgo.com:80/" But that means we can't actually apply the setHostName change until the call to getFullUrl(), because we don't know how the original URL will be parsed. Instead, the object has to internally store a queue of applied modifications, and then reproduce them, in order, on the underlying implementation. Alternatively, we can build our implementation to assume it will always be used in the context of curl, or for debugging curl, so can just have getHostName() and setHostName() directly use curl's implementation. Which leads us back to where we started, of using CurlUrl directly or via a thin wrapper, as either a URL parser+builder, or as a value object in its own right. I don't really know how to make all this nuance clear to users, but that makes me a bit wary of adding the object to PHP in its current design. Regards, -- Rowan Tommins [IMSoP]