Re: [PHP-DEV] [RFC] token_get_all() TOKEN_AS_OBJECT mode

This is only part of a thread. view whole thread
February 14, 2020 10:24 (Nikita Popov)
On Fri, Feb 14, 2020 at 9:48 AM Nikita Popov> wrote:

> On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield <> > wrote: > >> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote: >> > Hi internals, >> > >> > This has been discussed a while ago already, now as a proper proposal: >> > >> > >> > tl;dr is that it allows you to get token_get_all() output as an array of >> > PhpToken objects. This reduces memory usage, improves performance, makes >> > code more uniform and readable... What's not to like? >> > >> > An open question is whether (at least to start with) PhpToken should be >> > just a data container, or whether we want to add some helper methods to >> it. >> > If this generates too much bikeshed, I'll drop methods from the >> proposal. >> > >> > Regards, >> > Nikita >> >> I love everything about this. >> >> 1) I would agree with Nicolas that a static constructor would be better. >> I don't know about polyfilling it, but it's definitely more >> self-descriptive. >> >> 2) I'm skeptical about the methods. I can see them being useful, but >> also being bikeshed material. For instance, if you're doing annotation >> parsing then docblocks are not ignorable. They're what you're actually >> looking for. >> >> Two possible additions, feel free to ignore if they're too complicated: >> >> 1) Should it return an array of token objects, or a lazy iterable? If >> I'm only interested in certain types (eg, doc strings, classes, etc.) then >> a lazy iterable would allow me to string some filter and map operations on >> to it and use even less memory overall, since the whole tree is not in >> memory at once. >> > > I'm going to take you up on your offer and ignore this one :P Returning > tokens as an iterator is inefficient because it requires full lexer state > backups and restores for each token. Could be optimized, but I wouldn't > bother with it for this feature. I also personally have no use-case for a > lazy token stream. (It's technically sufficient for parsing, but if you > want to preserve formatting, you're going to be preserving all the tokens > anyway.) > > >> 2) Rather than provide bikesheddable methods, would it be feasible to >> take a queue from PDO and let users specify a subclass of PhpToken to fetch >> into? That way the properties are always there, but a user can attach >> whatever methods make sense for them. >> > > It would be technically feasible. If we go with a static method for > construction, then one might even say that there's reasonable expectation > that PhpToken::getAll(...) is going to return PhpToken[] and > MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[]. > > I'm a bit apprehensive about this though, specifically because you mention > PDO... which, I think, isn't exactly a success story when it comes to this. > If we do this, then the behavior would be that the object gets created, the > properties populated, and *no constructor gets called*. The last part is > important -- when you start calling constructors and magic methods, that's > where the mess starts and you get PDO. >
After thinking about this a bit more, there's a very nice solution to this: Something that's missing from the current proposal is a constructor. Right now, if code wants to insert new tokens, then those would have to be constructed by creating the object and then manually assigning properties, so we should definitely have a constructor. Once we have one, we can mark it final, and thus make the construction behavior well-defined, even if the class is extended. Nikita