Re: [PHP-DEV] [RFC] token_get_all() TOKEN_AS_OBJECT mode

This is only part of a thread. view whole thread
  108583
February 14, 2020 16:32 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 2:38 PM Sara Golemon <pollita@php.net> wrote:

> On Thu, Feb 13, 2020 at 3:48 AM Nikita Popov ppv@gmail.com> wrote: > >> This has been discussed a while ago already, now as a proper proposal: >> https://wiki.php.net/rfc/token_as_object >> >> tl;dr is that it allows you to get token_get_all() output as an array of >> PhpToken objects. This reduces memory usage, improves performance, makes >> code more uniform and readable... What's not to like? >> >> An open question is whether (at least to start with) PhpToken should be >> just a data container, or whether we want to add some helper methods to >> it. >> If this generates too much bikeshed, I'll drop methods from the proposal. >> >> Yep. I remember bringing this up in 2016 and there was generally good > reception to it from you, Andrea, Stas, and Stig at the very least. Why > isn't it in? It got derailed by some bike shed colorizing and a little bit > of workplace drama then dropped on the floor. > > Thanks for picking it up, and I agree with your response to Larry. As > nice as it would be to lazy iterate, the scanner is just in NO shape to > tolerate reentering userspace and potentially reinvoking the scanner before > the first run through is done. > > I also agree that being able to subclass the token would be great, but > that PDO made some mistakes here and we can do that as a separate addition > later on if there's not consensus now. > > I'm +1 for NOT overloading the token_get_all() function, but rather > putting a static factory method on the PhpToken class (or whatever you call > it). When we add subclassability, we can always add additional statics if > our initial signature doesn't work out. >
As there seems to be a pretty clear consensus on this point at least, I've updated the proposal to add the static factory method. I'm not clear why you want to final the base constructor. IMO we populate
> the fields on object creation, then invoke constructor (which is a no-op in > the base class). Later uses of subclassing can deal with the properties > (or not) at that time, in their own constructor, delegating (or not) to the > base. >
Populating the fields on creation and then calling the constructor is the whole problem with PDO, because that's not how PHP usually does construction. Normally there is either no constructor and properties are explicitly populated, or there is a constructor, in which case only the constructor is called. Populating properties first and the calling a constructor is something that essentially only PDO does. To clarify a bit what I meant with the final constructor: As part of the last update, I've added the following constructor: class PhpToken { public function __construct(int $id, string $text, int $line = -1, int $pos = -1); } This constructor will initialize the corresponding properties. Now, the behavior that would make most sense to me (if extension of the class is allowed) is that MyPhpToken::getAll() is going to create the new tokens based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the constructor final, then we could hardcode the construction behavior of the base class without introducing any kind of weird rules, it would be just the usual language semantics. If we don't make the constructor final, then we would have to actually call it (if it is overridden -- otherwise we can use more optimized initialization code). We can do that (I believe calling user code here should be perfectly safe -- the lexer is reentrant after all), it's just extra complexity and I'm not sure it's actually useful. The final constructor would still allow a) adding methods in the child class and b) adding properties with default values in the child class, which seems like it should cover most of the usefulness. So I think the options here are: a) Make PhpToken final and simplify don't support this. b) Make PhpToken::__construct() final and (very easily) support basic extension. c) Make PhpToken::__construct() non-final and support full extension, with some extra effort. Regards, Nikita
  108588
February 14, 2020 18:30 rowan.collins@gmail.com (Rowan Tommins)
On Fri, 14 Feb 2020 at 16:33, Nikita Popov ppv@gmail.com> wrote:

> This constructor will initialize the corresponding properties. Now, the > behavior that would make most sense to me (if extension of the class is > allowed) is that MyPhpToken::getAll() is going to create the new tokens > based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the > constructor final, then we could hardcode the construction behavior of the > base class without introducing any kind of weird rules, it would be just > the usual language semantics. >
It's worth noting that this is how ext/simplexml works: the $class parameter of simplexml_load_string and simplexml_load_file must be the name of a class that inherits from SimpleXMLElement, and an instance of that class will be constructed for each element of the document. The constructor of SimpleXMLElement is final, so the internal initialisation logic doesn't have to actually call it, it can just initialise the private state directly. Regards, -- Rowan Tommins [IMSoP]
  108600
February 15, 2020 15:47 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 7:30 PM Rowan Tommins collins@gmail.com>
wrote:

> On Fri, 14 Feb 2020 at 16:33, Nikita Popov ppv@gmail.com> wrote: > > > This constructor will initialize the corresponding properties. Now, the > > behavior that would make most sense to me (if extension of the class is > > allowed) is that MyPhpToken::getAll() is going to create the new tokens > > based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the > > constructor final, then we could hardcode the construction behavior of > the > > base class without introducing any kind of weird rules, it would be just > > the usual language semantics. > > > > > It's worth noting that this is how ext/simplexml works: the $class > parameter of simplexml_load_string and simplexml_load_file must be the name > of a class that inherits from SimpleXMLElement, and an instance of that > class will be constructed for each element of the document. The constructor > of SimpleXMLElement is final, so the internal initialisation logic doesn't > have to actually call it, it can just initialise the private state > directly. >
Thanks for pointing that out, I wasn't aware that SimpleXMLElement is already using this pattern. Given that, I've updated the RFC to go with that option now. Nikita