[RFC] token_get_all() TOKEN_AS_OBJECT mode

  108532
February 13, 2020 09:47 nikita.ppv@gmail.com (Nikita Popov)
Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita
  108543
February 13, 2020 17:05 larry@garfieldtech.com ("Larry Garfield")
On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote:
> Hi internals, > > This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. > > Regards, > Nikita
I love everything about this. 1) I would agree with Nicolas that a static constructor would be better. I don't know about polyfilling it, but it's definitely more self-descriptive. 2) I'm skeptical about the methods. I can see them being useful, but also being bikeshed material. For instance, if you're doing annotation parsing then docblocks are not ignorable. They're what you're actually looking for. Two possible additions, feel free to ignore if they're too complicated: 1) Should it return an array of token objects, or a lazy iterable? If I'm only interested in certain types (eg, doc strings, classes, etc.) then a lazy iterable would allow me to string some filter and map operations on to it and use even less memory overall, since the whole tree is not in memory at once. 2) Rather than provide bikesheddable methods, would it be feasible to take a queue from PDO and let users specify a subclass of PhpToken to fetch into? That way the properties are always there, but a user can attach whatever methods make sense for them. IMO the laziness would be more valuable, since a struct class can be operated on by an external function just as easily as a method, especially if using a functional pipeline. --Larry Garfield
  108561
February 14, 2020 08:48 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield <larry@garfieldtech.com>
wrote:

> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote: > > Hi internals, > > > > This has been discussed a while ago already, now as a proper proposal: > > https://wiki.php.net/rfc/token_as_object > > > > tl;dr is that it allows you to get token_get_all() output as an array of > > PhpToken objects. This reduces memory usage, improves performance, makes > > code more uniform and readable... What's not to like? > > > > An open question is whether (at least to start with) PhpToken should be > > just a data container, or whether we want to add some helper methods to > it. > > If this generates too much bikeshed, I'll drop methods from the proposal. > > > > Regards, > > Nikita > > I love everything about this. > > 1) I would agree with Nicolas that a static constructor would be better. > I don't know about polyfilling it, but it's definitely more > self-descriptive. > > 2) I'm skeptical about the methods. I can see them being useful, but also > being bikeshed material. For instance, if you're doing annotation parsing > then docblocks are not ignorable. They're what you're actually looking for. > > Two possible additions, feel free to ignore if they're too complicated: > > 1) Should it return an array of token objects, or a lazy iterable? If I'm > only interested in certain types (eg, doc strings, classes, etc.) then a > lazy iterable would allow me to string some filter and map operations on to > it and use even less memory overall, since the whole tree is not in memory > at once. >
I'm going to take you up on your offer and ignore this one :P Returning tokens as an iterator is inefficient because it requires full lexer state backups and restores for each token. Could be optimized, but I wouldn't bother with it for this feature. I also personally have no use-case for a lazy token stream. (It's technically sufficient for parsing, but if you want to preserve formatting, you're going to be preserving all the tokens anyway.)
> 2) Rather than provide bikesheddable methods, would it be feasible to take > a queue from PDO and let users specify a subclass of PhpToken to fetch > into? That way the properties are always there, but a user can attach > whatever methods make sense for them. >
It would be technically feasible. If we go with a static method for construction, then one might even say that there's reasonable expectation that PhpToken::getAll(...) is going to return PhpToken[] and MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[]. I'm a bit apprehensive about this though, specifically because you mention PDO... which, I think, isn't exactly a success story when it comes to this. If we do this, then the behavior would be that the object gets created, the properties populated, and *no constructor gets called*. The last part is important -- when you start calling constructors and magic methods, that's where the mess starts and you get PDO. Regards, Nikita
  108566
February 14, 2020 10:24 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 9:48 AM Nikita Popov ppv@gmail.com> wrote:

> On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield <larry@garfieldtech.com> > wrote: > >> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote: >> > Hi internals, >> > >> > This has been discussed a while ago already, now as a proper proposal: >> > https://wiki.php.net/rfc/token_as_object >> > >> > tl;dr is that it allows you to get token_get_all() output as an array of >> > PhpToken objects. This reduces memory usage, improves performance, makes >> > code more uniform and readable... What's not to like? >> > >> > An open question is whether (at least to start with) PhpToken should be >> > just a data container, or whether we want to add some helper methods to >> it. >> > If this generates too much bikeshed, I'll drop methods from the >> proposal. >> > >> > Regards, >> > Nikita >> >> I love everything about this. >> >> 1) I would agree with Nicolas that a static constructor would be better. >> I don't know about polyfilling it, but it's definitely more >> self-descriptive. >> >> 2) I'm skeptical about the methods. I can see them being useful, but >> also being bikeshed material. For instance, if you're doing annotation >> parsing then docblocks are not ignorable. They're what you're actually >> looking for. >> >> Two possible additions, feel free to ignore if they're too complicated: >> >> 1) Should it return an array of token objects, or a lazy iterable? If >> I'm only interested in certain types (eg, doc strings, classes, etc.) then >> a lazy iterable would allow me to string some filter and map operations on >> to it and use even less memory overall, since the whole tree is not in >> memory at once. >> > > I'm going to take you up on your offer and ignore this one :P Returning > tokens as an iterator is inefficient because it requires full lexer state > backups and restores for each token. Could be optimized, but I wouldn't > bother with it for this feature. I also personally have no use-case for a > lazy token stream. (It's technically sufficient for parsing, but if you > want to preserve formatting, you're going to be preserving all the tokens > anyway.) > > >> 2) Rather than provide bikesheddable methods, would it be feasible to >> take a queue from PDO and let users specify a subclass of PhpToken to fetch >> into? That way the properties are always there, but a user can attach >> whatever methods make sense for them. >> > > It would be technically feasible. If we go with a static method for > construction, then one might even say that there's reasonable expectation > that PhpToken::getAll(...) is going to return PhpToken[] and > MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[]. > > I'm a bit apprehensive about this though, specifically because you mention > PDO... which, I think, isn't exactly a success story when it comes to this. > If we do this, then the behavior would be that the object gets created, the > properties populated, and *no constructor gets called*. The last part is > important -- when you start calling constructors and magic methods, that's > where the mess starts and you get PDO. >
After thinking about this a bit more, there's a very nice solution to this: Something that's missing from the current proposal is a constructor. Right now, if code wants to insert new tokens, then those would have to be constructed by creating the object and then manually assigning properties, so we should definitely have a constructor. Once we have one, we can mark it final, and thus make the construction behavior well-defined, even if the class is extended. Nikita
  108573
February 14, 2020 13:31 thruska@cubiclesoft.com (Thomas Hruska)
On 2/14/2020 1:48 AM, Nikita Popov wrote:
> On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield <larry@garfieldtech.com> > wrote: > >> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote: >>> Hi internals, >>> >>> This has been discussed a while ago already, now as a proper proposal: >>> https://wiki.php.net/rfc/token_as_object >>> >>> tl;dr is that it allows you to get token_get_all() output as an array of >>> PhpToken objects. This reduces memory usage, improves performance, makes >>> code more uniform and readable... What's not to like? >>> >>> An open question is whether (at least to start with) PhpToken should be >>> just a data container, or whether we want to add some helper methods to >> it. >>> If this generates too much bikeshed, I'll drop methods from the proposal. >>> >>> Regards, >>> Nikita >> >> I love everything about this. >> >> 1) I would agree with Nicolas that a static constructor would be better. >> I don't know about polyfilling it, but it's definitely more >> self-descriptive. >> >> 2) I'm skeptical about the methods. I can see them being useful, but also >> being bikeshed material. For instance, if you're doing annotation parsing >> then docblocks are not ignorable. They're what you're actually looking for. >> >> Two possible additions, feel free to ignore if they're too complicated: >> >> 1) Should it return an array of token objects, or a lazy iterable? If I'm >> only interested in certain types (eg, doc strings, classes, etc.) then a >> lazy iterable would allow me to string some filter and map operations on to >> it and use even less memory overall, since the whole tree is not in memory >> at once. >> > > I'm going to take you up on your offer and ignore this one :P Returning > tokens as an iterator is inefficient because it requires full lexer state > backups and restores for each token. Could be optimized, but I wouldn't > bother with it for this feature. I also personally have no use-case for a > lazy token stream. (It's technically sufficient for parsing, but if you > want to preserve formatting, you're going to be preserving all the tokens > anyway.)
Try passing a 10MB PHP file that's all code into token_get_all(). It's pretty easy to hit hard memory limits and/or start crashing PHP when token_get_all() tokenizes the whole thing into a giant array or set of objects. Calling gc_mem_caches() when the previous RAM bits aren't needed anymore helps. Stream-based token parsing would be better for RAM usage but I can see how that might be complex to implement and largely not worth it since such scenarios will be rare and require the ability to maintain lexer state externally as you mentioned and would only be used by this part of the software. -- Thomas Hruska CubicleSoft President I've got great, time saving software that you will find useful. http://cubiclesoft.com/ And once you find my software useful: http://cubiclesoft.com/donate/
  108556
February 14, 2020 00:27 internals@lists.php.net ("Levi Morrison via internals")
On Thu, Feb 13, 2020 at 2:48 AM Nikita Popov ppv@gmail.com> wrote:
> > Hi internals, > > This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. > > Regards, > Nikita
Overall it looks great. Thanks, Nikita. I do think it should use a separate function from `token_get_all`, and am not sure I care whether it is a function or static method. The only helper I see as totally non-controversial is `getTokenName`, which seems completely reasonable and users should not have to keep re-implementing that. I am open to more discussion about the others.
  108574
February 14, 2020 13:38 pollita@php.net (Sara Golemon)
On Thu, Feb 13, 2020 at 3:48 AM Nikita Popov ppv@gmail.com> wrote:

> This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. > > Yep. I remember bringing this up in 2016 and there was generally good reception to it from you, Andrea, Stas, and Stig at the very least. Why
isn't it in? It got derailed by some bike shed colorizing and a little bit of workplace drama then dropped on the floor. Thanks for picking it up, and I agree with your response to Larry. As nice as it would be to lazy iterate, the scanner is just in NO shape to tolerate reentering userspace and potentially reinvoking the scanner before the first run through is done. I also agree that being able to subclass the token would be great, but that PDO made some mistakes here and we can do that as a separate addition later on if there's not consensus now. I'm +1 for NOT overloading the token_get_all() function, but rather putting a static factory method on the PhpToken class (or whatever you call it). When we add subclassability, we can always add additional statics if our initial signature doesn't work out. I'm not clear why you want to final the base constructor. IMO we populate the fields on object creation, then invoke constructor (which is a no-op in the base class). Later uses of subclassing can deal with the properties (or not) at that time, in their own constructor, delegating (or not) to the base. TL;DR +1, because I wanted this four years ago. :) -Sara
  108576
February 14, 2020 13:44 ocramius@gmail.com (Marco Pivetta)
On Fri, Feb 14, 2020 at 2:38 PM Sara Golemon <pollita@php.net> wrote:

> Thanks for picking it up, and I agree with your response to Larry. As nice > as it would be to lazy iterate, the scanner is just in NO shape to tolerate > reentering userspace and potentially reinvoking the scanner before the > first run through is done. >
If this is the current state, maybe it would suffice to declare the return type as `iterable`, and return a strict (fully populated) structure in a first implementation, later to be changed to an iterator, if applicable? Marco Pivetta http://twitter.com/Ocramius http://ocramius.github.com/
  108578
February 14, 2020 14:33 pollita@php.net (Sara Golemon)
On Fri, Feb 14, 2020 at 7:44 AM Marco Pivetta <ocramius@gmail.com> wrote:

> On Fri, Feb 14, 2020 at 2:38 PM Sara Golemon <pollita@php.net> wrote: > >> Thanks for picking it up, and I agree with your response to Larry. As >> nice >> as it would be to lazy iterate, the scanner is just in NO shape to >> tolerate >> reentering userspace and potentially reinvoking the scanner before the >> first run through is done. >> > > If this is the current state, maybe it would suffice to declare the return > type as `iterable`, and return a strict (fully populated) structure in a > first implementation, later to be changed to an iterator, if applicable? > > I think that's an optimistic, but ultimately harmless approach. Arrays satisfy Iterables. Worst case we never improve on that. Best case we get
iterable token generators. +1 -Sara
  108579
February 14, 2020 14:41 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 2:44 PM Marco Pivetta <ocramius@gmail.com> wrote:

> On Fri, Feb 14, 2020 at 2:38 PM Sara Golemon <pollita@php.net> wrote: > >> Thanks for picking it up, and I agree with your response to Larry. As >> nice >> as it would be to lazy iterate, the scanner is just in NO shape to >> tolerate >> reentering userspace and potentially reinvoking the scanner before the >> first run through is done. >> > > If this is the current state, maybe it would suffice to declare the return > type as `iterable`, and return a strict (fully populated) structure in a > first implementation, later to be changed to an iterator, if applicable? > > Marco Pivetta >
The fact that it returns an array is an important part of the contract. If an iterator variant is added in the future, it should be added as a separate method. I don't want to be writing $tokens = PhpToken::getAll($code); if (!is_array($tokens)) { $tokens = iterator_to_array($tokens); } to convert this to the right type. And I also don't want my usage to be implicitly "upgraded" to an iterator in the future: The iterator will always be less efficient, and I don't want to be forced to use it if I need an array anyway. Nikita
  108583
February 14, 2020 16:32 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 2:38 PM Sara Golemon <pollita@php.net> wrote:

> On Thu, Feb 13, 2020 at 3:48 AM Nikita Popov ppv@gmail.com> wrote: > >> This has been discussed a while ago already, now as a proper proposal: >> https://wiki.php.net/rfc/token_as_object >> >> tl;dr is that it allows you to get token_get_all() output as an array of >> PhpToken objects. This reduces memory usage, improves performance, makes >> code more uniform and readable... What's not to like? >> >> An open question is whether (at least to start with) PhpToken should be >> just a data container, or whether we want to add some helper methods to >> it. >> If this generates too much bikeshed, I'll drop methods from the proposal. >> >> Yep. I remember bringing this up in 2016 and there was generally good > reception to it from you, Andrea, Stas, and Stig at the very least. Why > isn't it in? It got derailed by some bike shed colorizing and a little bit > of workplace drama then dropped on the floor. > > Thanks for picking it up, and I agree with your response to Larry. As > nice as it would be to lazy iterate, the scanner is just in NO shape to > tolerate reentering userspace and potentially reinvoking the scanner before > the first run through is done. > > I also agree that being able to subclass the token would be great, but > that PDO made some mistakes here and we can do that as a separate addition > later on if there's not consensus now. > > I'm +1 for NOT overloading the token_get_all() function, but rather > putting a static factory method on the PhpToken class (or whatever you call > it). When we add subclassability, we can always add additional statics if > our initial signature doesn't work out. >
As there seems to be a pretty clear consensus on this point at least, I've updated the proposal to add the static factory method. I'm not clear why you want to final the base constructor. IMO we populate
> the fields on object creation, then invoke constructor (which is a no-op in > the base class). Later uses of subclassing can deal with the properties > (or not) at that time, in their own constructor, delegating (or not) to the > base. >
Populating the fields on creation and then calling the constructor is the whole problem with PDO, because that's not how PHP usually does construction. Normally there is either no constructor and properties are explicitly populated, or there is a constructor, in which case only the constructor is called. Populating properties first and the calling a constructor is something that essentially only PDO does. To clarify a bit what I meant with the final constructor: As part of the last update, I've added the following constructor: class PhpToken { public function __construct(int $id, string $text, int $line = -1, int $pos = -1); } This constructor will initialize the corresponding properties. Now, the behavior that would make most sense to me (if extension of the class is allowed) is that MyPhpToken::getAll() is going to create the new tokens based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the constructor final, then we could hardcode the construction behavior of the base class without introducing any kind of weird rules, it would be just the usual language semantics. If we don't make the constructor final, then we would have to actually call it (if it is overridden -- otherwise we can use more optimized initialization code). We can do that (I believe calling user code here should be perfectly safe -- the lexer is reentrant after all), it's just extra complexity and I'm not sure it's actually useful. The final constructor would still allow a) adding methods in the child class and b) adding properties with default values in the child class, which seems like it should cover most of the usefulness. So I think the options here are: a) Make PhpToken final and simplify don't support this. b) Make PhpToken::__construct() final and (very easily) support basic extension. c) Make PhpToken::__construct() non-final and support full extension, with some extra effort. Regards, Nikita
  108588
February 14, 2020 18:30 rowan.collins@gmail.com (Rowan Tommins)
On Fri, 14 Feb 2020 at 16:33, Nikita Popov ppv@gmail.com> wrote:

> This constructor will initialize the corresponding properties. Now, the > behavior that would make most sense to me (if extension of the class is > allowed) is that MyPhpToken::getAll() is going to create the new tokens > based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the > constructor final, then we could hardcode the construction behavior of the > base class without introducing any kind of weird rules, it would be just > the usual language semantics. >
It's worth noting that this is how ext/simplexml works: the $class parameter of simplexml_load_string and simplexml_load_file must be the name of a class that inherits from SimpleXMLElement, and an instance of that class will be constructed for each element of the document. The constructor of SimpleXMLElement is final, so the internal initialisation logic doesn't have to actually call it, it can just initialise the private state directly. Regards, -- Rowan Tommins [IMSoP]
  108600
February 15, 2020 15:47 nikita.ppv@gmail.com (Nikita Popov)
On Fri, Feb 14, 2020 at 7:30 PM Rowan Tommins collins@gmail.com>
wrote:

> On Fri, 14 Feb 2020 at 16:33, Nikita Popov ppv@gmail.com> wrote: > > > This constructor will initialize the corresponding properties. Now, the > > behavior that would make most sense to me (if extension of the class is > > allowed) is that MyPhpToken::getAll() is going to create the new tokens > > based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the > > constructor final, then we could hardcode the construction behavior of > the > > base class without introducing any kind of weird rules, it would be just > > the usual language semantics. > > > > > It's worth noting that this is how ext/simplexml works: the $class > parameter of simplexml_load_string and simplexml_load_file must be the name > of a class that inherits from SimpleXMLElement, and an instance of that > class will be constructed for each element of the document. The constructor > of SimpleXMLElement is final, so the internal initialisation logic doesn't > have to actually call it, it can just initialise the private state > directly. >
Thanks for pointing that out, I wasn't aware that SimpleXMLElement is already using this pattern. Given that, I've updated the RFC to go with that option now. Nikita
  108755
February 25, 2020 15:59 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> wrote:

> Hi internals, > > This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. >
I think this proposal is in a pretty decent shape now, and I'd like to move it to voting soon. The only remaining open question is whether we want to add any additional predefined methods. As the class can now be extended, every library can add their own methods, but there might still be value in providing some things by default, primarily for performance reason. For example, the proposed is() method can be a good bit more efficient when implemented directly in extension code. Any feedback on this point? Nikita
  108756
February 25, 2020 16:58 larry@garfieldtech.com ("Larry Garfield")
On Tue, Feb 25, 2020, at 9:59 AM, Nikita Popov wrote:
> On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> wrote: > > > Hi internals, > > > > This has been discussed a while ago already, now as a proper proposal: > > https://wiki.php.net/rfc/token_as_object > > > > tl;dr is that it allows you to get token_get_all() output as an array of > > PhpToken objects. This reduces memory usage, improves performance, makes > > code more uniform and readable... What's not to like? > > > > An open question is whether (at least to start with) PhpToken should be > > just a data container, or whether we want to add some helper methods to it. > > If this generates too much bikeshed, I'll drop methods from the proposal. > > > > I think this proposal is in a pretty decent shape now, and I'd like to move > it to voting soon. The only remaining open question is whether we want to > add any additional predefined methods. As the class can now be extended, > every library can add their own methods, but there might still be value in > providing some things by default, primarily for performance reason. For > example, the proposed is() method can be a good bit more efficient when > implemented directly in extension code. > > Any feedback on this point? > > Nikita
Seems good to me in its current state. One last thought I had: in its current form, it *seems* like it would be possible, in concept, to create tokens directly via the constructor (ignoring the positional values), then have a generic render command you can apply to a list of them to use for code generation. I know that's not the intent here, but in concept is what I described possible? --Larry Garfield
  108764
February 25, 2020 21:21 nikita.ppv@gmail.com (Nikita Popov)
On Tue, Feb 25, 2020 at 5:59 PM Larry Garfield <larry@garfieldtech.com>
wrote:

> On Tue, Feb 25, 2020, at 9:59 AM, Nikita Popov wrote: > > On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> > wrote: > > > > > Hi internals, > > > > > > This has been discussed a while ago already, now as a proper proposal: > > > https://wiki.php.net/rfc/token_as_object > > > > > > tl;dr is that it allows you to get token_get_all() output as an array > of > > > PhpToken objects. This reduces memory usage, improves performance, > makes > > > code more uniform and readable... What's not to like? > > > > > > An open question is whether (at least to start with) PhpToken should be > > > just a data container, or whether we want to add some helper methods > to it. > > > If this generates too much bikeshed, I'll drop methods from the > proposal. > > > > > > > I think this proposal is in a pretty decent shape now, and I'd like to > move > > it to voting soon. The only remaining open question is whether we want to > > add any additional predefined methods. As the class can now be extended, > > every library can add their own methods, but there might still be value > in > > providing some things by default, primarily for performance reason. For > > example, the proposed is() method can be a good bit more efficient when > > implemented directly in extension code. > > > > Any feedback on this point? > > > > Nikita > > Seems good to me in its current state. > > One last thought I had: in its current form, it *seems* like it would be > possible, in concept, to create tokens directly via the constructor > (ignoring the positional values), then have a generic render command you > can apply to a list of them to use for code generation. I know that's not > the intent here, but in concept is what I described possible? >
Sure, that works. You can construct your own tokens and the "render command" should actually be as simple as implode(array_column($tokens, 'text')). Nikita
  108757
February 25, 2020 18:16 theodorejb@outlook.com (Theodore Brown)
On Tue, Feb 25, 2020 at 9:59 AM Nikita Popov ppv@gmail.com> wrote:

> On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> wrote: > > > This has been discussed a while ago already, now as a proper proposal: > > https://wiki.php.net/rfc/token_as_object > > > > An open question is whether (at least to start with) PhpToken should be > > just a data container, or whether we want to add some helper methods to it. > > If this generates too much bikeshed, I'll drop methods from the proposal. > > I think this proposal is in a pretty decent shape now, and I'd like to move > it to voting soon. The only remaining open question is whether we want to > add any additional predefined methods. As the class can now be extended, > every library can add their own methods, but there might still be value in > providing some things by default, primarily for performance reason. For > example, the proposed is() method can be a good bit more efficient when > implemented directly in extension code. > > Any feedback on this point?
Hi Nikita, Thanks for this RFC. The proposed `PhpToken` class will definitely make it easier to work with parsed tokens (I struggled with this when working on the migration script for deprecated alternate array/string offset syntax). In regards to the open question of additional methods, the `getTokenName()` method would be very welcome. `isIgnorable()` would also be helpful sometimes, though I don't care much one way or another if it's added. But I'm really skeptical about the value of the `is($kind)` method. Code working with tokens should know the type of value being checked (ID int, text content, or array of one or the other), and a method that accepts any of three types can make it harder to understand what's going on. For example, the proposed `isIgnorable()` method doesn't need to use `is()`, it could just check whether the `id` property is in the array of token constants. Best regards, Theodore
  108763
February 25, 2020 21:13 nikita.ppv@gmail.com (Nikita Popov)
On Tue, Feb 25, 2020 at 7:16 PM Theodore Brown <theodorejb@outlook.com>
wrote:

> On Tue, Feb 25, 2020 at 9:59 AM Nikita Popov ppv@gmail.com> wrote: > > > On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> > wrote: > > > > > This has been discussed a while ago already, now as a proper proposal: > > > https://wiki.php.net/rfc/token_as_object > > > > > > An open question is whether (at least to start with) PhpToken should be > > > just a data container, or whether we want to add some helper methods > to it. > > > If this generates too much bikeshed, I'll drop methods from the > proposal. > > > > I think this proposal is in a pretty decent shape now, and I'd like to > move > > it to voting soon. The only remaining open question is whether we want to > > add any additional predefined methods. As the class can now be extended, > > every library can add their own methods, but there might still be value > in > > providing some things by default, primarily for performance reason. For > > example, the proposed is() method can be a good bit more efficient when > > implemented directly in extension code. > > > > Any feedback on this point? > > Hi Nikita, > > Thanks for this RFC. The proposed `PhpToken` class will definitely make it > easier to work with parsed tokens (I struggled with this when working on > the migration script for deprecated alternate array/string offset syntax). > > In regards to the open question of additional methods, the `getTokenName()` > method would be very welcome. `isIgnorable()` would also be helpful > sometimes, though I don't care much one way or another if it's added. > > But I'm really skeptical about the value of the `is($kind)` method. Code > working with tokens should know the type of value being checked (ID int, > text content, or array of one or the other), and a method that accepts > any of three types can make it harder to understand what's going on. > > For example, the proposed `isIgnorable()` method doesn't need to use > `is()`, it could just check whether the `id` property is in the array > of token constants. >
To provide some context, a method like is() is primarily useful in the implementation of other methods accepting a token-descriptor to search for. Token stream implementation commonly have many helper methods along the lines of public function findRight($pos, $findTokenType) { $tokens = $this->tokens; for ($count = \count($tokens); $pos < $count; $pos++) { if ($tokens[$pos]->is($findTokenType)) { return $pos; } } return -1; } which search for tokens in different directions, skip tokens, check that tokens exist while skipping whitespace etc. Having a fast primitive to perform the token type check would be quite useful for that, I think. Regards, Nikita
  108765
February 26, 2020 06:00 michal.brzuchalski@gmail.com (=?UTF-8?Q?Micha=C5=82_Brzuchalski?=)
czw., 13 lut 2020 o 10:48 Nikita Popov ppv@gmail.com> napisał(a):

> Hi internals, > > This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. > > Regards, > Nikita >
Hi Nikita, I really like the RFC itself and got only one question regarding the class name which I couldn't find an answer in RFC. Why PHP prefix in PhpToken class name? Why can't we go with Token class name alone without the prefix? The only one which includes PHP in class names so far are only: * __PHP_Incomplete_Class * php_user_filter Above taken from https://www.php.net/manual/en/reserved.classes.php BR, Michał Brzuchalski
  108813
March 2, 2020 18:11 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov ppv@gmail.com> wrote:

> Hi internals, > > This has been discussed a while ago already, now as a proper proposal: > https://wiki.php.net/rfc/token_as_object > > tl;dr is that it allows you to get token_get_all() output as an array of > PhpToken objects. This reduces memory usage, improves performance, makes > code more uniform and readable... What's not to like? > > An open question is whether (at least to start with) PhpToken should be > just a data container, or whether we want to add some helper methods to it. > If this generates too much bikeshed, I'll drop methods from the proposal. > > Regards, > Nikita >
I've done some final updates to the RFC ( https://wiki.php.net/rfc/token_as_object) and intend to open voting tomorrow, if there are major objections. The last changes were to clarify that getTokenName() returns null for unknown token IDs, and to provide some more extended rationale for what the individual methods are useful for. @Michal: The reason why it's called PhpToken is that it is ... a PHP token :) At least that's what it is intended to be used for -- while you could technically use it to represent other kinds of tokens, I don't think that's a usage that should be endorsed. Just "Token" is very generic, and not even specific to lexing/tokenization. The term "Token" is commonly used in other contexts as well (e.g. token ring). Regards, Nikita