Re: [PHP-DEV] RFC: Server-Side Request and Response Objects (v2)

This is only part of a thread. view whole thread
  108548
February 13, 2020 18:34 pmjones@pmjones.io ("Paul M. Jones")
Hi Mike,

Thanks for your continued evaluation of the RFC.

> Take a look at WordPress. It does a lot of "fixup" to $_SERVER` variables — to deal with badly implemented web servers — to ensure all known variables have a value and that the format of the value is consistent. > > ... > > Another option actually would be to allow changes to $_SERVER prior to instantiating ServerRequest() the first time.
Believe it or not, this RFC does make allowance for what you're describing, as a result of requiring a $globals array as the first parameter. An application-specific builder or factory can modify those $globals values as desired, before instantiating ServerRequest with the modified values. For example: namespace App; use ServerRequest; class ServerRequestFactory { public function new(array $globals, ?string $content = null) : ServerRequest { // do fixup to $globals['_SERVER'] ... // ... then: return new ServerRequest($globals, $content); } } $request = (new \App\ServerRequestFactory())->new($GLOBALS); I can easily imagine many ways to achieve the same end (i.e., modification of the constructor arguments before instantiating ServerRequest). Do you get where I'm coming from?
>> First, I'd have to decline adding request() (or $request) at all; my opinion is that one ought to be reading from $get, $post, $cookies, etc. specifically, not from a pool of those values. > > That's definitely fair. I almost did not include request() in my suggestion but did because PHP has $_REQUEST.
Good enough. :-)
> Why a method is important is to support filters (I guess I assumed that was obvious but realize now it was not.) For example: > > $request->get('redirect', FILTER_SANITIZE_URL); > $request->get('product_id', FILTER_SANITIZE_NUMBER_INT); > $request->get('email', FILTER_SANITIZE_EMAIL); > > You could potentially even have scoped, easier to remember constants that can work with autocomplete: > > $request->get('redirect', ServerRequest::URL); > $request->get('product_id', ServerRequest::INT); > $request->get('email', ServerRequest::EMAIL);
[snip] I do see what you mean. Even so, I think filtering is out-of-scope for this RFC. Certainly I want to avoid adding methods to the ServerRequest class, which as envisioned is a bag of values much the same way $_GET, $_POST, $_SERVER, etc. are bags of values.
>>> Would you not also add an option to generate a warning when using them for those who want to deprecate their use in their own code (deprecating across the board would be too extreme give how much CMS and framework code uses them intimately.) >> >> That seems a bit much at this point. ;-) > > Really? Seems like this and some guard code is all it would take: > > ini_set( "disallow_superglobals", true);
Even if that's true (and I think that example masks some complexity) I must maintain that it's out of scope for this RFC. -- Paul M. Jones pmjones@pmjones.io http://paul-m-jones.com Modernizing Legacy Applications in PHP https://leanpub.com/mlaphp Solving the N+1 Problem in PHP https://leanpub.com/sn1php
  108551
February 13, 2020 20:31 mike@newclarity.net (Mike Schinkel)
> On Feb 13, 2020, at 1:34 PM, Paul M. Jones <pmjones@pmjones.io> wrote: > > Hi Mike, > > Thanks for your continued evaluation of the RFC. > >> Take a look at WordPress. It does a lot of "fixup" to $_SERVER` variables — to deal with badly implemented web servers — to ensure all known variables have a value and that the format of the value is consistent. >> >> ... >> >> Another option actually would be to allow changes to $_SERVER prior to instantiating ServerRequest() the first time. > > Believe it or not, this RFC does make allowance for what you're describing, as a result of requiring a $globals array as the first parameter. An application-specific builder or factory can modify those $globals values as desired, before instantiating ServerRequest with the modified values. > > For example: > > namespace App; > > use ServerRequest; > > class ServerRequestFactory > { > public function new(array $globals, ?string $content = null) : ServerRequest > { > // do fixup to $globals['_SERVER'] ... > > // ... then: > return new ServerRequest($globals, $content); > } > } > > $request = (new \App\ServerRequestFactory())->new($GLOBALS); > > I can easily imagine many ways to achieve the same end (i.e., modification of the constructor arguments before instantiating ServerRequest). > > Do you get where I'm coming from?
Ah, I see now. That does allow for changing. OTOH, what it does not do is ensure that everyone looking at the values are looking at the fixed-up values, nor does it offer any way to cache the original value except to do so at the top of every PHP file that is a URL endpoint. I was hoping that your proposal was having a way to capture the original values before any PHP code could change them AND also then allow for fixed-up values to be provided for _every_ instantiation of ServerRequest(). This in case a library were to use it, I would not want the library to be getting the non-fixed up values. So in that respect, this is worse than what we have now.
>> Why a method is important is to support filters (I guess I assumed that was obvious but realize now it was not.) For example: >> >> $request->get('redirect', FILTER_SANITIZE_URL); >> $request->get('product_id', FILTER_SANITIZE_NUMBER_INT); >> $request->get('email', FILTER_SANITIZE_EMAIL); >> >> You could potentially even have scoped, easier to remember constants that can work with autocomplete: >> >> $request->get('redirect', ServerRequest::URL); >> $request->get('product_id', ServerRequest::INT); >> $request->get('email', ServerRequest::EMAIL); > > [snip] > > I do see what you mean. Even so, I think filtering is out-of-scope for this RFC.
I would implore you to reconsider. Without incorporating the functionality of filter_input() you have effectively neutered much of the value I envision your RFC having. In the USA we might say "Besides that Mrs. Lincoln, how was the play?" If I had a vote I would vote I would enthusiastically vote for the RFC if it includes filter_input() functionality. But I would vote against this RFC if it omits filter_input() functionality because it would mean another subsystem in core that does not actually address day-to-day concerns.
> Certainly I want to avoid adding methods to the ServerRequest class, which as envisioned is a bag of values much the same way $_GET, $_POST, $_SERVER, etc. are bags of values.
Unless I misunderstand avoiding methods in ServerRequest is an arbitrary constraint which has no real-world requirement. And if your motivation is to only mirror what exists then filter_input() is a function so an equivalent in ServerRequest should be a method and thus this in not inconsistent with your mirroring aspect. -Mike
  108627
February 16, 2020 23:34 rowan.collins@gmail.com (Rowan Tommins)
On 13/02/2020 20:31, Mike Schinkel wrote:
> If I had a vote I would vote I would enthusiastically vote for the RFC if it includes filter_input() functionality. But I would vote against this RFC if it omits filter_input() functionality because it would mean another subsystem in core that does not actually address day-to-day concerns.
I think you are over-estimating how central the filter API is to most people's workflow with requests. I think that's partly because designing a good validation API is hard, but also because filter_input in particular is a combination of three different concerns: 1) Fetching the raw information about the incoming HTTP request from the web server (the "SAPI") 2) Parsing that raw information into individual fields 3) Validating those fields against expected type constraints The superglobals already combine concerns 1 and 2, and the filter API adds concern 3; but to do so they all assume that the user is basically writing a CGI wrapper for some HTML forms. The modern reality is rather different, and step 2 in particular is much more variable: - Rather than query string parameters, it might involve extracting parameters from an SEO URL like "/products/123-acme-thingummy" or a RESTish URL like "/products/123/description/en-GB" - Rather than submitted form data, it might involve parsing JSON from an AJAX request or API call I would love to see new APIs that take a step back from the legacy, and tackle each of these concerns separately, based on modern requirements. For concern 1, getting data out of the web server, I'd love to see: - A more intuitive way to get the raw request body than file_get_contents('php://input') - A more reliable way to get the URL the user requested than checking 5 different variables in $_SERVER to handle different deployment methods (see e.g. [1] and [2] for the lengths libraries go to for this) - A proper distinction between HTTP headers, server status variables, and environment variables, because CGI name-mangling is legacy cruft that users shouldn't need to learn For concern 2, parsing that data, I'd love to see: - A better API than parse_str for parsing arbitrary strings in application/x-www-form-urlencoded format - A way to parse data in multipart/form-data format decoupled from the current HTTP request - Tools for working with Content-Type strings, such as a function for correctly parsing things like "text/html;charset=UTF-8", and constants for common MIME types Concern 3, filtering / sanitising / validating, I think is a really hard problem space, and I don't think there will ever be one implementation that suits all cases. A similar "shopping list" could probably be made for responses, but if we decoupled the pieces, we wouldn't have to perfect them all at once; instead, we could provide building blocks that make userland implementations easier. [1] https://github.com/symfony/symfony/blob/9acb06041cc88b5c14d40f8cd9a74dd14d7ac786/src/Symfony/Component/HttpFoundation/Request.php#L1741 [2] https://github.com/laminas/laminas-diactoros/blob/b36d6bf376b03dfc3190b1065630090e57f2e20d/src/functions/marshal_uri_from_sapi.php Regards, -- Rowan Tommins (né Collins) [IMSoP]
  108628
February 17, 2020 00:45 mike@newclarity.net (Mike Schinkel)
> On Feb 16, 2020, at 6:34 PM, Rowan Tommins collins@gmail.com> wrote: > > On 13/02/2020 20:31, Mike Schinkel wrote: >> If I had a vote I would vote I would enthusiastically vote for the RFC if it includes filter_input() functionality. But I would vote against this RFC if it omits filter_input() functionality because it would mean another subsystem in core that does not actually address day-to-day concerns. > > I think you are over-estimating how central the filter API is to most people's workflow with requests. I think that's partly because designing a good validation API is hard, but also because filter_input in particular is a combination of three different concerns:
I think your latter points are orthogonal to this. And that you are taking my advocacy for adding filtering to apply too literally to only the specific implementations in filter_input(). I can see addressing your comments below *and* having a filtering method built into these objects. Possibly even with applicable method names as opposed to a 2nd type parameter, like: $request->getInt('db_id'); $request->getJson('package'); $request->getUrl('return_url');
> 1) Fetching the raw information about the incoming HTTP request from the web server (the "SAPI") > 2) Parsing that raw information into individual fields > 3) Validating those fields against expected type constraints > > The superglobals already combine concerns 1 and 2, and the filter API adds concern 3; but to do so they all assume that the user is basically writing a CGI wrapper for some HTML forms. > > The modern reality is rather different, and step 2 in particular is much more variable: > > - Rather than query string parameters, it might involve extracting parameters from an SEO URL like "/products/123-acme-thingummy" or a RESTish URL like "/products/123/description/en-GB" > - Rather than submitted form data, it might involve parsing JSON from an AJAX request or API call > > > I would love to see new APIs that take a step back from the legacy, and tackle each of these concerns separately, based on modern requirements. > > For concern 1, getting data out of the web server, I'd love to see: > > - A more intuitive way to get the raw request body than file_get_contents('php://input') > - A more reliable way to get the URL the user requested than checking 5 different variables in $_SERVER to handle different deployment methods (see e.g. [1] and [2] for the lengths libraries go to for this) > - A proper distinction between HTTP headers, server status variables, and environment variables, because CGI name-mangling is legacy cruft that users shouldn't need to learn > > For concern 2, parsing that data, I'd love to see: > > - A better API than parse_str for parsing arbitrary strings in application/x-www-form-urlencoded format > - A way to parse data in multipart/form-data format decoupled from the current HTTP request > - Tools for working with Content-Type strings, such as a function for correctly parsing things like "text/html;charset=UTF-8", and constants for common MIME types > > Concern 3, filtering / sanitising / validating, I think is a really hard problem space, and I don't think there will ever be one implementation that suits all cases. > > A similar "shopping list" could probably be made for responses, but if we decoupled the pieces, we wouldn't have to perfect them all at once; instead, we could provide building blocks that make userland implementations easier.
Decoupling is a valid approach. But given how much work it is get to an RFC over the line, it feels like decoupling would end up with a lot more work, lengthen the timeline to achieve base level functionality, and add uncertainty to whether it will even happen whereas handling the 20% now that we need 80% of the time would mean the API would be mostly fully usable out of the gate. -Mike
  108635
February 17, 2020 07:51 rowan.collins@gmail.com (Rowan Tommins)
On 17 February 2020 00:45:26 GMT+00:00, Mike Schinkel <mike@newclarity.net> wrote:
>I think your latter points are orthogonal to this. And that you are >taking my advocacy for adding filtering to apply too literally to only >the specific implementations in filter_input().
Firstly, I deliberately didn't say "the filter API isn't well designed", I said "designing a good validation API is hard". In particular, finding the balance between flexibility and simplicity is key. Including a single blessed validation API in something as fundamental as a request object should take a lot of careful design, not be an afterthought to something like the current RFC. I also talked specifically about moving away from the old assumptions of CGI. What does it mean to "filter" a JSON body? We could check it's valid JSON, but parsing it will reveal that anyway. We could automatically parse it in the request object, and have "filters" apply to individual elements; but where would the user supply parser options, and how would you specify nested elements? Or we could keep it as a dumb string, and leave the validation to other systems, like a JSON Schema validator. Even with a plain HTML form, you might be using a form builder and want to associate your validation with the form definition rather than having it baked into the request object.
>But given how much work it is get to an RFC over the line, it feels >like decoupling would end up with a lot more work, lengthen the >timeline to achieve base level functionality, and add uncertainty to >whether it will even happen whereas handling the 20% now that we need >80% of the time would mean the API would be mostly fully usable out of >the gate.
Funny, I see the exact opposite: trying to build a single set of classes which include a system for getting global state AND a system for parsing it in different ways AND an in-built validation API seems like a mammoth task. And if you keep it monolithic, any feature you miss or make a mistake on is much harder to fix later. Regards, -- Rowan Tommins [IMSoP]