Optional pre-compiler for PHP8?

  107706
October 27, 2019 02:12 mike@newclarity.net (Mike Schinkel)
Hello all:

While reading the [RFC] Union Types v2 thread and comments from Dmitry[1], and especially Benjamin[2] who suggested "building a static analysis tool which could prove that certain type checks would never fail, and prime OpCache" it occurred to me that a PHP pre-compiler could potentially be used to resolve numerous issues the community has been debating.

But first, let me define what I envision for a pre-compiler: 

- A command-line tool that could take a PHP file and/or application and generate pre-compiled files with an extension of .phpc or similar.
- Think of these .phpc files as being implemented similarly to .phar files, but actually compiled to an OpCache binary form.
- Pre-compiled files would be deployed alongside .php files, or optionally(?) standalone without PHP files.
- Libraries and (WordPress) plugins could deliver pre-compiled files too, alongside their .php source files

- Command line switches could allow for:
  - Compiling with or without (selected) deprecations
  - Selected constants defined on the command line
  - Packaging code one-to-one per PHP file, as one file per namespace, as one file per app, etc.

- The pre-compilation process would be able to:
  - Type-check everything that has type-hints
  - Do type checking that is too expensive to do at runtime

- Pre-compiling:
  - Could help eliminate the complexity of auto-loading and opening many files, at least for pre-compiled code.
  - Would be an option for type checking and improved performance, but would not be required.

If the PHP community were to embrace the idea of an optional pre-compiler then we could see the following benefits:

1. Full type checking capability without any concerns for runtime performance issues related to type checking.

2. Ability to significantly improve performance over time, possibly even more than a JIT model.

3. Potential to support optimized real types — as in Hack — where code needs to be highly performant 

4. Ability to deprecate features for pre-compiled code while still supporting them when not precompiled.

While benefits #1 to #3 are highly valuable, consider benefit #4. If we had such a pre-compiler then the concern for BC for pre-compiled code could become moot, as the deprecations would not affect any existing code that is not pre-compiled.

This could potentially give us the best of both worlds?

Further, those most interested in deprecations and moving to enterprisey language features certainly use a CI/CD build process, so it should be no problem at all for them to incorporate a pre-compile step.

Lastly, having such an optional process — with its primary promoted benefit being performance — could be a great incentive for those running less strict and backwards-compatible PHP code to refactor their source code to gain greater performance.  This contrasts with deprecating features and breaking BC just "because it is a better way to program." Give them a carrot rather than use a stick.

So for those who know PHP's internal core code: 

Is there any reason this is not technically viable?

And for everyone: 

What do you think of this as a potential future for PHP?

-Mike

[1] https://news-web.php.net/php.internals/107699
[2] https://news-web.php.net/php.internals/107702
  107707
October 27, 2019 16:56 marandall@php.net (Mark Randall)
On 27/10/2019 02:12, Mike Schinkel wrote:
> Hello all:
> And for everyone:
>
> What do you think of this as a potential future for PHP?
I had received the impression that a lot of the problems for performance optimizations relate to how PHP can shift things around at runtime, where identical code at the run-site means something completely different in practice because the same class name or function has been included from one file, rather than another. I imagine if PHP had full knowledge of all its state, that might provide an avenue for additional optimizations.
  107710
October 27, 2019 23:04 rowan.collins@gmail.com (Rowan Tommins)
On 27/10/2019 02:12, Mike Schinkel wrote:
> While reading the [RFC] Union Types v2 thread and comments from Dmitry[1], and especially Benjamin[2] who suggested "building a static analysis tool which could prove that certain type checks would never fail, and prime OpCache" it occurred to me that a PHP pre-compiler could potentially be used to resolve numerous issues the community has been debating.
I chose the phrase "static analysis tool" deliberately, because I wanted to think about the minimum requirements for such a tool, rather than its long-term possibilities.

The basic requirements are fairly straight-forward:

- a static analyser that can infer types in a PHP program; we know that's possible from a number of third-party tools, although they do rely on docblock comments for things the language doesn't (yet) let you define
- the ability to generate OpCodes for some code and store it to disk; this is more or less what OpCache does if enabled for CLI mode

However, combining those usefully may not be that easy.

The first problem is that OpCache is designed to work one file at a time, because a program can load any combination of files at run-time. Static analysers, on the other hand, need to process a whole directory at a time, so that calls can be matched to definitions; multiple definitions of the same function or class tend to cause problems, even though only one is loaded at run-time. So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.

The second problem is that, as I understand it, type checks aren't actually separate OpCodes, so eliminating them from the compiled program may not be that easy. There are some cases where you can just eliminate the type check from a definition, e.g.:

    class A {
        private int $x=1;
        private function foo(int $x) { }
        public function bar() {
            $this->foo($this->x);
        }
    }

Since we know that function foo is only ever called with the correctly typed argument, we can compile it as though it had no type declaration. However, in the seemingly obvious case Benjamin gave, the optimisation isn't so easy:

    function x(): int {}
    function y(int $foo) {}
    y(x());

We can't eliminate the type check for all calls to x(), or for all calls to y(), but we want to eliminate the duplicate check for that particular line. So the OpCodes need to represent that somehow. I've no idea how easy or hard that would be.

In order to extend this to a full compiler, we need at least one more thing: a stable compilation target. What I mean by that is that if I distribute a package in binary form, it needs to run on a reasonably large range of PHP versions and installations. My understanding is that the OpCodes in the Zend VM are not designed to be stable across versions, so you can't just ship today's OpCache output like you would a Java class file or .net assembly. Again, I don't know how much effort it would be to make the VM work as such a stable target.
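The duplicated check in the y(x()) case can be observed with a small script. This is only an illustrative sketch: the function bodies (returning 42 and doubling the argument) are assumptions added so the stubs from the example actually run, and nothing here shows how the engine or OpCache would remove a check.

```php
<?php
declare(strict_types=1);

// Return type check: the engine verifies that the value leaving x() is an int.
function x(): int
{
    return 42; // assumed body, the original stub was empty
}

// Parameter type check: the engine verifies that $foo is an int on entry.
function y(int $foo)
{
    return $foo * 2; // assumed body, the original stub was empty
}

// Both checks run for this call, although the first already proves the second.
echo y(x()), "\n";

// Other callers give no such guarantee, so y() cannot simply drop its own check.
try {
    y('not an int');
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), "\n";
}
```

The second call is why the check cannot be removed from the definition of y() itself: any optimisation would have to apply to the one call site where the argument type is already proven.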
> 4. Ability to deprecate features for pre-compiled code while still supporting them when not precompiled.
Unlike P++, Editions, or Strict Mode, this would undeniably define that the deprecated features were "the wrong way". If the engine had to support the feature anyway, I'm not sure what the advantage would be of tying it to "compiled vs non-compiled", rather than opting in via a declare() statement or package config.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]
  107711
October 27, 2019 23:56 mike@newclarity.net (Mike Schinkel)
> On Oct 27, 2019, at 7:04 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
Thank you for your comments.
> I chose the phrase "static analysis tool" deliberately, because I wanted to think about the minimum requirements for such a tool, rather than its long-term possibilities.
Your points are all well-considered.

To be clear, I wasn't stating the idea as an alternative to your idea, I was only stating that your comments inspired me to have the idea of a pre-compiler.

IOW, I saw no reason both could not be done, one sooner and the other later.
> However, combining those usefully may not be that easy.
Also for clarity, I was not assuming existing OpCache would be 100% unmodified, I was talking about benefits that a pre-compiler could have and was less focused on ensuring it could slot into an existing OpCache implementation as-is. IOW, if it is worth doing it might be worth extending how the OpCache works.
> So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.
That idea of a "package" came up during a debate on this list at least once, a few months ago, and I think it makes a lot of sense. And what I proposed effectively implies that namespaces would be treated like packages from the perspective of the compiler. But then again a new package concept might be needed in addition to namespaces, I am not certain either way.
> Unlike P++, Editions, or Strict Mode, this would undeniably define that the deprecated features were "the wrong way".
I am not sure I can agree that it would define them as the "wrong way."

The way I would see it is there would be a "strict way" and an "unstrict way." If you prefer the simplicity of low strictness and do not need more/better performance or the benefits of type-safety that are needed for building large applications, then the "right way" would still be the "unstrict way."

And the non-strict features would not be "deprecated" per se; they would instead be disallowed for the strict (compiled) way, but still allowed for the unstrict (interpreted) way.
> If the engine had to support the feature anyway,
I think we are talking two engines; one for compiling and another for interpreting. They could probably share a lot of code, but I would think it would still need to be two different engines.
> I'm not sure what the advantage would be of tying it to "compiled vs non-compiled", rather than opting in via a declare() statement or package config.
The advantage would be two-fold:

1. Backward compatibility

2. Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.

Frankly it is this advantage which is the primary reason I thought to send a message to the list. The chance to have the benefit of strictness and high performance for more advanced PHP developers while still having full BC for existing code and for beginner developers seemed highly compelling to me.

-Mike
  107712
October 28, 2019 00:32 benjamin.morel@gmail.com (Benjamin Morel)
>> So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.
>
> That idea of a "package" came up during a debate on this list at least once, a few months ago, and I think it makes a lot of sense. And what I proposed effectively implies that namespaces would be treated like packages from the perspective of the compiler.
Putting aside the idea of distributing pre-compiled PHP scripts, if we're only debating the precompilation as, notably, a means to reduce the cost of type checks, I wouldn't mind if the precompilation occurred *only if preloading is in use*, i.e. if most class definitions are known on server startup, which is when the compilation / optimization passes could occur. No preloading = no such optimizations, I could personally live with that.

No need for a package definition, IMO.

-- Benjamin
  107717
October 28, 2019 03:33 andreas@dqxtech.net (Andreas Hennings)
On Mon, 28 Oct 2019 at 01:33, Benjamin Morel <benjamin.morel@gmail.com> wrote:
>>> So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.
>>
>> That idea of a "package" came up during a debate on this list at least once, a few months ago, and I think it makes a lot of sense. And what I proposed effectively implies that namespaces would be treated like packages from the perspective of the compiler.
>
> Putting aside the idea of distributing pre-compiled PHP scripts, if we're only debating the precompilation as, notably, a means to reduce the cost of type checks, I wouldn't mind if the precompilation occurred *only if preloading is in use*, i.e. if most class definitions are known on server startup, which is when the compilation / optimization passes could occur. No preloading = no such optimizations, I could personally live with that.
>
> No need for a package definition, IMO.
This would break as soon as we have two versions of a class, and a runtime choice which of them to use.
(see also Mark Randall's comment)

What about this, instead:

- Instead of a cli command, lazily "compile" in the opcache. So more or less what we are already doing, I guess.
- Possibility to store/cache multiple versions of a file, depending on other files it depends on.

Somehow like this, per file:

1. Compile a low-level version of the file, or load it from a cache, with cache id = file path.
2. Recursively process all the files and classes (autoload) this file depends on.
3. Generate a hash from the dependencies.
4. Compile the final version of the file, or load it from a cache, with cache id = file path + dependencies hash.

Perhaps this could even be further optimized with some "guessing": Assume everything is as it was the last time, until we hit a conflict.

This is probably more complicated than I am describing it here. I kept the term "dependencies" intentionally vague, because I am not sure what exactly we would need to look at.

Perhaps we would store not just multiple versions of each file, but of each global symbol (class, function).

- One "base version" for each distinct definition of a symbol in a distinct file.
- One "specific version" per combination of versions of other symbols this depends on.

One problem I see is that some of the dependees may be unknown at the time a file is included. E.g. a function might call a static method from a class that has not yet been included, triggering the autoloader. Since the autoloader can be anything, we have no way to predict which file will be included, and thus which version of the static method to typecheck against. Even if we previously scanned the entire project directory, and found only one class with the given static method, the autoloader might instead include a file outside the project directory, or define the class with eval() or stream wrappers, or dump a generated file in /tmp.

This would mean we would have to run a non-deterministic model until all dependees are included.

So perhaps this idea is a dead end :)

-- Andreas
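The cache ids from steps 1, 3 and 4 could be sketched roughly as follows. All function names are hypothetical, and the sketch assumes the list of dependency files is already known, which is exactly the part described above as hard (step 2, and the autoloader problem).

```php
<?php
declare(strict_types=1);

/**
 * Step 3: hash the contents of every dependency, so the result changes
 * whenever any of them changes. Paths are sorted so order does not matter.
 *
 * @param list<string> $dependencyPaths files assumed to be already discovered
 */
function dependenciesHash(array $dependencyPaths): string
{
    sort($dependencyPaths);
    $context = hash_init('sha256');
    foreach ($dependencyPaths as $path) {
        // Mix in both the path and a digest of the file contents.
        hash_update($context, $path . "\0" . hash_file('sha256', $path) . "\0");
    }
    return hash_final($context);
}

/** Step 1: the "low-level" version is cached under the file path alone. */
function baseCacheId(string $file): string
{
    return $file;
}

/** Step 4: the final version is cached under file path + dependencies hash. */
function finalCacheId(string $file, array $dependencyPaths): string
{
    return $file . ':' . dependenciesHash($dependencyPaths);
}
```

With ids built this way, two requests that load different definitions of the same class would produce different final ids for the dependent file, so both compiled versions could sit in the cache side by side.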
  107721
October 28, 2019 09:03 benjamin.morel@gmail.com (Benjamin Morel)
> This would break as soon as we have two versions of a class, and a runtime choice which of them to use.
> (see also Mark Randall's comment)

That's why I'm suggesting to only make these optimizations when preloading <https://wiki.php.net/rfc/preload> is in use, which means that you know ahead of time the class definitions, and you cannot have 2 runtime definitions of a given class.

No preloading = no optimizations.
Full preloading (whole codebase) = maximum optimizations.
Partial preloading = the compiler should still be able to optimize *some* of the code involving only the preloaded classes.

We already have, since PHP 7.4, a mechanism to know static class definitions on startup, so why not build further optimizations on top of it?

-- Benjamin
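For readers who have not used preloading, a preload script can look roughly like the sketch below. It assumes a php.ini entry such as opcache.preload=/path/to/preload.php; the function name preloadDirectory and the src directory are hypothetical, and only opcache_compile_file() and glob() are real PHP functions.

```php
<?php
// preload.php: a hypothetical script named by the opcache.preload ini directive.

/**
 * Compile every .php file directly inside $dir so that its classes and
 * functions are known at server startup.
 *
 * @return list<string> the files that were handed to OpCache
 */
function preloadDirectory(string $dir): array
{
    $files = glob($dir . '/*.php') ?: [];
    foreach ($files as $file) {
        if (function_exists('opcache_compile_file')) {
            // Compiles and caches the file without executing it.
            // "@" hides the notice raised when OpCache is loaded but not enabled.
            @opcache_compile_file($file);
        }
    }
    return $files;
}

preloadDirectory(__DIR__ . '/src'); // "src" is an assumed project layout
```

Everything compiled this way is fixed for the lifetime of the server process, which is the property the proposed optimizations would rely on.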
  107716
October 28, 2019 03:20 marandall@php.net (Mark Randall)
On 27/10/2019 23:56, Mike Schinkel wrote:
> 2. Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.
This concept of type safety being an enterprise feature needs to die.

Types are a way of preventing your program from getting into states that you don't expect it to be in, so you don't have to worry about handling them in the first place.

Scalars, and strict types would have saved me _so much_ time when I started trying to learn PHP.

Here's a video I stumbled upon recently that helps explain why types help make coding easier, by reducing the number of possible states an application can be in: https://youtu.be/q1Yi-WM7XqQ?t=656

--
Mark Randall
  107722
October 28, 2019 10:00 rowan.collins@gmail.com (Rowan Tommins)
On Sun, 27 Oct 2019 at 23:56, Mike Schinkel <mike@newclarity.net> wrote:

>> So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.
>
> That idea of a "package" came up during a debate on this list at least once, a few months ago, and I think it makes a lot of sense. And what I proposed effectively implies that namespaces would be treated like packages from the perspective of the compiler.
>
> But then again a new package concept might be needed in addition to namespaces, I am not certain either way.
Current tools tend to actually work on a directory level, because you don't actually know what namespaces are involved until after you've loaded it, and a file can include code for two completely separate namespaces. My thinking was that a package would pre-define the full list of files that define it, with no auto-loader, and no conditional definitions evaluated at run-time. As Benjamin points out, this is closely related to preloading.
>> Unlike P++, Editions, or Strict Mode, this would undeniably define that the deprecated features were "the wrong way".
>
> I am not sure I can agree that it would define them as the "wrong way."
>
> The way I would see it is there would be a "strict way" and an "unstrict way." If you prefer the simplicity of low strictness and do not need more/better performance or the benefits of type-safety that are needed for building large applications, then the "right way" would still be the "unstrict way."
And what if you want simplicity *and* performance? Most of the things people want to make strict about the language don't make it faster, so if we limited "pre-compiled mode" to be strict, we'd be making a deliberate choice to group objectively good things (fast vs slow) with subjective preferences (strict vs simple). That pretty clearly marks strict mode as "the better way".
>> If the engine had to support the feature anyway,
>
> I think we are talking two engines; one for compiling and another for interpreting. They could probably share a lot of code, but I would think it would still need to be two different engines.
That sounds like the worst kind of fork: two different engines, running two different dialects of the language. At that point, you might as well just switch to Hack. Note that this was exactly what "P++" was intended to avoid - the two dialects would exist in the same engine, and get the same performance and security enhancements.
>> I'm not sure what the advantage would be of tying it to "compiled vs non-compiled", rather than opting in via a declare() statement or package config.
>
> The advantage would be two-fold:
>
> 1. Backward compatibility
>
> 2. Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.

Both of these are reasons to have some sort of "strict mode", but not for tying it to some other feature.

Regards,

--
Rowan Tommins
[IMSoP]
  107726
October 29, 2019 19:04 mike@newclarity.net (Mike Schinkel)
> On Oct 28, 2019, at 6:00 AM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> Current tools tend to actually work on a directory level, because you don't actually know what namespaces are involved until after you've loaded it, and a file can include code for two completely separate namespaces. My thinking was that a package would pre-define the full list of files that define it, with no auto-loader, and no conditional definitions evaluated at run-time. As Benjamin points out, this is closely related to preloading.
I would rather have a tool that did not require specifying the files. I personally would be fine with one that used a directory as the demarcator, even if that meant it would not work when you put your namespace in another directory.
> And what if you want simplicity *and* performance? Most of the things people want to make strict about the language don't make it faster, so if we limited "pre-compiled mode" to be strict, we'd be making a deliberate choice to group objectively good things (fast vs slow) with subjective preferences (strict vs simple). That pretty clearly marks strict mode as "the better way".

At the risk of being too flippant, I defer to the wisdom of that great philosopher Mick Jagger and say you can't always get what you want...

But seriously, at some point tradeoffs have to be made to see any forward progress. What we have not found before was a good tradeoff between strict and BC. Maybe this is it?

After all, while not all strict things are about performance, many things that enable performance are strict.
> That sounds like the worst kind of fork: two different engines, running two different dialects of the language. At that point, you might as well just switch to Hack.

That feels like an over-reaction. Hack has purposely diverged from PHP and requires a different runtime than PHP. The idea I was proposing is that the PHP runtime be one but operate in two different modes (one mode per "engine"), and the goal of two different modes would be to stay more similar than different, but allow one of them to have BC breaks.
> Note that this was exactly what "P++" was intended to avoid - the two dialects would exist in the same engine, and get the same performance and security enhancements.
It could also be one engine, it just seemed like that coupling would be more problematic than separating them. That said, I'm not skilled enough in PHP internals to implement it (yet?) so I can only speak to it at a high level.
>> The advantage would be two-fold:
>>
>> 1. Backward compatibility
>>
>> 2. Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.
>
> Both of these are reasons to have some sort of "strict mode", but not for tying it to some other feature.

I don't understand your reply, but maybe it is moot considering the rest of the dialog?

What we have today is a rock vs a hard-place, and no one wants to give even a millimeter.

So, if this is not a viable solution in your mind to break the logjam between BC and the desire for strictness-in-all-the-things, do you have an alternate, better proposal?

-Mike
  107728
October 29, 2019 21:49 rowan.collins@gmail.com (Rowan Tommins)
On 29/10/2019 19:04, Mike Schinkel wrote:
>> Note that this was exactly what "P++" was intended to avoid - the two dialects would exist in the same engine, and get the same performance and security enhancements.
>
> It could also be one engine, it just seemed like that coupling would be more problematic than separating them.
I think the problem is that as soon as you have two engines targeting different feature sets, it will be hard to persuade people to spend equal attention on both. If all the new features end up being added to one engine, the other one is going to increasingly feel like "legacy mode", rather than "equal but different".
>> Both of these are reasons to have some sort of "strict mode", but not for tying it to some other feature.
>
> I don't understand your reply, but maybe it is moot considering the rest of the dialog?
>
> What we have today is a rock vs a hard-place, and no one wants to give even a millimeter.
>
> So, if this is not a viable solution in your mind to break the logjam between BC and the desire for strictness-in-all-the-things, do you have an alternate, better proposal?

The idea of an "extra strict" and/or "less backwards compatible" mode has been mentioned on the list several times, but you're the first to suggest making it mandatory when using an otherwise unrelated performance feature.

It would be much better to keep it separate, and opt into it via a declare() statement, or a package configuration, or a file extension. There have been proposals for a single flag, lots of separate flags, a complete "P++" dialect, or bundles of settings ("Editions").

Whatever the approach, a key goal in my mind should be to maximise the compatibility between the two, and share as much implementation as possible. Both/all modes should get the same performance improvements, except where the actual features are necessarily slower or faster.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]
  107729
October 29, 2019 21:56 mike@newclarity.net (Mike Schinkel)
> On Oct 29, 2019, at 5:49 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> I think the problem is that as soon as you have two engines targeting different feature sets, it will be hard to persuade people to spend equal attention on both. If all the new features end up being added to one engine, the other one is going to increasingly feel like "legacy mode", rather than "equal but different".
That is a fair point.
> It would be much better to keep it separate, and opt into it via a declare() statement, or a package configuration, or a file extension. There have been proposals for a single flag, lots of separate flags, a complete "P++" dialect, or bundles of settings ("Editions").
Correct me if I am wrong, but all of those have been objected to, strenuously, by at least several people on the list. What will it take to finally get enough consensus to move forward?
> Both/all modes should get the same performance improvements, except where the actual features are necessarily slower or faster.
Fine. But a pre-compiler still could have merit.

One of the things I would like to see from a pre-compiler is getting rid of the need to deal with an autoloader and hence be able to store multiple related classes in the same file. Primarily I would like this while doing R&D on a project idea, prior to fully understanding what the object hierarchy needs to be.

That, of course, would conflict with the non-pre-compiled code by its very nature.

-Mike
  107730
October 29, 2019 22:26 rowan.collins@gmail.com (Rowan Tommins)
On 29/10/2019 21:56, Mike Schinkel wrote:
>> It would be much better to keep it separate, and opt into it via a declare() statement, or a package configuration, or a file extension. There have been proposals for a single flag, lots of separate flags, a complete "P++" dialect, or bundles of settings ("Editions").
>
> Correct me if I am wrong, but all of those have been objected to, strenuously, by at least several people on the list.
Indeed, but adding "the strict mode will be faster than the legacy mode" is likely to make those objections stronger, not resolve them, unless you can demonstrate _why_ the strict mode needs to be mandatory for the pre-compiled mode.
>> Both/all modes should get the same performance improvements, except where the actual features are necessarily slower or faster.
>
> Fine. But a pre-compiler still could have merit.
Absolutely! In case you've forgotten, it was my remark that started this whole discussion: https://externals.io/message/106844#107656
> One of the things I would like to see from a pre-compiler is getting rid of the need to deal with an autoloader and hence be able to store multiple related classes in the same file.
Yes, I think moving from auto-loading to eager loading would make sense for a lot of projects. Regards, -- Rowan Tommins (né Collins) [IMSoP]
  107727
October 29, 2019 21:44 d.takken@xs4all.nl (Dik Takken)
On 28-10-19 00:04, Rowan Tommins wrote:
> - a static analyser that can infer types in a PHP program; we know that's possible from a number of third-party tools, although they do rely on docblock comments for things the language doesn't (yet) let you define
Opcache already performs type inference. It does not make use of information in comments. It only looks at the code, yielding type information that is accurate and can be used for optimization. Here is an interesting read on the subject: https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf
> The first problem is that OpCache is designed to work one file at a time, because a program can load any combination of files at run-time. Static analysers, on the other hand, need to process a whole directory at a time, so that calls can be matched to definitions; multiple definitions of the same function or class tend to cause problems, even though only one is loaded at run-time. So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.
This problem could possibly be solved by using preloading. The definition of a package would then be: the set of files that the application will load during startup. Preloading could give opcache access to the full application and optimize more effectively.
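A minimal sketch of what such a preload script might look like, assuming PHP 7.4's `opcache.preload` setting and `opcache_compile_file()`. The file name `preload.php`, the `src` directory and the helper `collectPhpFiles()` are hypothetical and only serve the illustration.

```php
<?php
// preload.php (hypothetical name); enabled in php.ini with, for example:
//   opcache.preload=/path/to/preload.php

/** Return every .php file below $dir, sorted for a stable order. */
function collectPhpFiles(string $dir): array
{
    if (!is_dir($dir)) {
        return [];
    }
    $found = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Assumed layout: application sources live in ./src next to this script.
foreach (collectPhpFiles(__DIR__ . '/src') as $path) {
    if (function_exists('opcache_compile_file')) {
        // Compiles and caches the file without executing it.
        opcache_compile_file($path);
    }
}
```

In this reading, the list returned by `collectPhpFiles()` is the "package": the set of files the engine sees as a whole at startup.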
> The second problem is that, as I understand it, type checks aren't actually separate OpCodes, so eliminating them from the compiled program may not be that easy. There are some cases where you can just eliminate the type check from a definition, e.g.:
This is partially correct. Some type checks are separate opcodes, some are not. Type checking opcodes are actually removed by opcache when its static analysis can prove that the type check will always pass. It has some limitations but the functionality is all there. Regards, Dik Takken
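To make the distinction concrete, here is a hedged illustration that is not taken from the original message. It assumes that return-type checks are among the checks compiled to a separate opcode, and contrasts a function whose check is statically provable with one whose check has to stay. The function names are hypothetical, and whether a given opcache version actually removes the first check is not verified here.

```php
<?php
function answer(): int
{
    // The returned value is a literal int, so static analysis can prove
    // that the return-type check always passes.
    return 42;
}

function parse(string $raw): int
{
    // json_decode() can return any type, so nothing can be proven here
    // and the return-type check has to remain at run time.
    return json_decode($raw);
}

try {
    $bad = parse('"not a number"');
} catch (TypeError $e) {
    $bad = 'TypeError';
}
```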
  107736
October 30, 2019 21:05 rowan.collins@gmail.com (Rowan Tommins)
Hi Dik,

On 29/10/2019 21:44, Dik Takken wrote:
> Opcache already performs type inference. [...]
> Here is an interesting read on the subject:
>
> https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf
Thanks for the link, and the insight into how much OpCache can already do.

I guess preloading gets us pretty close to the tool I was imagining - OpCache could make assumptions that cross file boundaries, within the preloaded set, and could spend longer optimizing during the preloading phase than might be expected on a simple cache miss.

I think it will be interesting to see how tools adopt that feature, and whether eventually we'll see autoloader functions as just a fallback mechanism, with most packages being enumerated in advance as large preloaded blocks.

Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
  107740
October 30, 2019 21:46 andreas@dqxtech.net (Andreas Hennings)
On Wed, 30 Oct 2019 at 22:06, Rowan Tommins <rowan.collins@gmail.com> wrote:
> Hi Dik,
>
> On 29/10/2019 21:44, Dik Takken wrote:
> > Opcache already performs type inference. [...]
> > Here is an interesting read on the subject:
> >
> > https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf
>
> Thanks for the link, and the insight into how much OpCache can already do.
>
> I guess preloading gets us pretty close to the tool I was imagining - OpCache could make assumptions that cross file boundaries, within the preloaded set, and could spend longer optimizing during the preloading phase than might be expected on a simple cache miss.
>
> I think it will be interesting to see how tools adopt that feature, and whether eventually we'll see autoloader functions as just a fallback mechanism, with most packages being enumerated in advance as large preloaded blocks.
What if we had a "native" autoload layer?

The native autoloader could be made to fire before userland autoloaders. It could be based on a mapping like PSR-4, or simply a classmap. The mappings could be defined at "compile time", or frozen early in a request. This would make it possible to predict where each class is located at "compile time" or at opcache time, so that all the type checks can be done.

An alternative would be to allow userland autoloaders to be registered with a hash, with the promise that as long as the hash is the same, classes remain where they are. Or allow userland to specify "class locators" instead of autoloaders, which could also be registered with a prediction hash.

So, the overarching idea here is to make autoloading predictable at compile time or opcache time, without requiring an artificial "package" concept.

As in my previous proposal, the opcache would have to store different versions of each file, for different combinations of autoload prediction hashes. This would allow e.g. different applications to share some of their PHP files without spoiling the opcache.

-- Andreas
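The "class locator with a prediction hash" idea can be sketched in userland as follows. This is only an illustration under stated assumptions: no such engine feature exists in the thread's proposal beyond the description above, and the class `ClassMapLocator` and its methods are hypothetical names. The only built-in relied on for registration is `spl_autoload_register()`.

```php
<?php
/**
 * Hypothetical "class locator": a frozen class map plus a hash that
 * changes whenever the mapping changes.
 */
final class ClassMapLocator
{
    /** @var array<string, string> class name => file path */
    private array $map;

    public function __construct(array $map)
    {
        ksort($map); // sort keys so the hash does not depend on input order
        $this->map = $map;
    }

    /** Pure lookup: the answer never depends on run-time state. */
    public function locate(string $class): ?string
    {
        return $this->map[$class] ?? null;
    }

    /** Same hash implies the same class locations. */
    public function predictionHash(): string
    {
        return hash('sha256', json_encode($this->map));
    }

    public function register(): void
    {
        spl_autoload_register(function (string $class): void {
            $file = $this->locate($class);
            if ($file !== null && is_file($file)) {
                require_once $file;
            }
        }, true, true); // prepend, so it fires before other autoloaders
    }
}
```

A cache keyed on `predictionHash()` could then keep separate compiled versions of a shared file for applications whose class maps differ, which is the sharing scenario described above.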