Covariance, again

  101359
December 18, 2017 12:18 andreas@dqxtech.net (Andreas Hennings)
There were discussions about covariance and contravariance in the past.
https://externals.io/message/98085#98105
Unfortunately I was not subscribed back then, so I cannot respond to anything.
So, here it goes again.

With co- and contravariance, the following would be possible:
- contravariance.php - https://3v4l.org/I3v0u
- covariance.php - https://3v4l.org/i79O5

(from guilhermeblanco's older email in "PHP's support to
contravariance and covariance")

The main problem was expressed by Levi Morrison in this older thread.

Currently we do not autoload classes in type hints.
https://3v4l.org/sFsDd
In the example I can declare "UnknownClass" as a return type hint, and
PHP won't care.
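
The behaviour described above can be observed with an autoloader that only records what it is asked for. This is a minimal sketch: `$requested`, `makeSomething()` and `Factory` are names invented for illustration, and `UnknownClass` is deliberately never defined.

```php
<?php
// Record every class name the engine asks the autoloader to load.
$requested = [];
spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
});

// UnknownClass does not exist, yet it is accepted as a return type hint.
function makeSomething(): UnknownClass
{
    throw new LogicException('never called in this sketch');
}

// The same holds for a method signature in an interface.
interface Factory
{
    public function create(): UnknownClass;
}

// No lookup happened for either declaration.
var_dump($requested === []);
```

The autoloader would only be consulted once something actually needs the class, for example `new UnknownClass()` or `class_exists('UnknownClass')`.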

However, to validate if a return type matches with the parent
definition, the class must be autoloaded first.
Or rather:
- If the return type is identical with the parent, PHP can say "yes"
with no class loading required.
- If the return type is different from the parent:
-- Currently, PHP simply says "no" (Fatal error: Declaration of
C::foo(): C must be compatible with I::foo()).
-- To support covariance, PHP would have to autoload the class in the
type hint, and then check the hierarchy.
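
The two cases above can be modelled in userland. This is a rough sketch, not the engine's actual check: the function name `returnTypeIsCompatible` is hypothetical, and `class_exists()` / `interface_exists()` stand in for the autoload step.

```php
<?php
// Hypothetical userland model of the return type check; not engine code.
function returnTypeIsCompatible(string $childType, string $parentType): bool
{
    // Identical names: say "yes" without loading anything.
    if (strcasecmp($childType, $parentType) === 0) {
        return true;
    }

    // Different names: the child type has to be loadable (may trigger autoload)...
    if (!class_exists($childType) && !interface_exists($childType)) {
        return false;
    }

    // ...and it has to sit below the parent type in the hierarchy.
    return is_subclass_of($childType, $parentType, true);
}

// Identical and undefined: accepted, nothing is looked up.
var_dump(returnTypeIsCompatible('UnknownClass', 'UnknownClass'));
// Narrower type: ArrayIterator implements Iterator.
var_dump(returnTypeIsCompatible('ArrayIterator', 'Iterator'));
// Unrelated type: stdClass is not an Iterator.
var_dump(returnTypeIsCompatible('stdClass', 'Iterator'));
```

The point of the first branch is that code which is legal today never reaches the loading step, so it keeps behaving exactly as before.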


## Solutions proposed in old thread

Levi Morrison:

> You need to adjust the passes over the code to register symbols and
> their declared relationships, and then in a separate pass validate
> them. After that if the class isn't found then you trigger an
> autoload.
>
> It's doable, it just hasn't been done.

Christoph M. Becker:

> An alternative might be forward class declarations:


## What I propose instead

I think it is not so complicated actually, and can be done without BC
break, and without forward type hints.
This only gives us covariance. Contravariance is another story.

When a class declaration is executed, do the following:
- Parse the class AST (obviously).
- Autoload all identifiers in "extends" and "implements".
- Autoload all identifiers in return type hints that are not identical
with the parent return type hint.

If such a class is not found, report "Return type must either be
identical with the parent, or it must be an existing or autoloadable
class or interface."
Well, or any message that clarifies why this one was autoloaded, while
other type hint classes are not autoloaded.

So, this behavior is the same as today, except for the case of return
type hints that differ from the parent, which currently result in
fatal error.

Does this sound doable? Am I missing something else?

-- Andreas

  101360
December 18, 2017 14:45 levim@php.net (Levi Morrison)
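I believe your algorithm fails on this simple setup:

<?php
interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}

interface X {
    function bar(): A;
}

interface Y extends X {
    function bar(): B;
}

?>

If I correctly typed this from memory there is no way to order this
such that all units are defined ahead of time as needed for verifying
correctness. This means we trigger the autoloader even though the type
is defined in the same file. Even if we do some more complicated
compile-time passes we'd fail on things like this:

<?php
interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}

if (getenv("ENABLE_X")) {
    interface X {
        function bar(): A;
    }
}
?>

interface Y extends X {
    function bar(): B;
}

?>

This case shows that care needs to be taken to get the order down
correctly even if we autoload:

<?php
interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}
?>
interface X {
    function bar(): A;
}
?>

interface Y extends X {
    function bar(): C;
}
// At this point the engine will need to verify A and C but we may not
// have finished verifying A and B yet
?>

All-in-all I don't think we can resolve every case cleanly because we
do not have purely ahead-of-time compilation for all units involved. I
think every method of implementing this feature has drawbacks and we
need to thoughtfully evaluate them.
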
  101361
December 18, 2017 14:58 andreas@dqxtech.net (Andreas Hennings)
Let me address the simple example first.

On 18 December 2017 at 15:45, Levi Morrison <levim@php.net> wrote:
>
> I believe your algorithm fails on this simple setup:
>
> <?php
> interface A {
>     function foo(): X;
> }
>
> interface B extends A {
>     function foo(): Y;
> }
>
> interface X {
>     function bar(): A;
> }
>
> interface Y extends X {
>     function bar(): B;
> }
>
> ?>
>
> If I correctly typed this from memory there is no way to order this
> such that all units are defined ahead of time as needed for verifying
> correctness. This means we trigger the autoloader even though the type
> is defined in the same file. Even if we do some more complicated
> compile-time passes we'd fail on things like this:

You need to compile all classes in the file, and then do the autoloading.
So maybe my description of the algorithm is too simple.

However, this is not really new, and is not really a problem I would say.
What about this: https://3v4l.org/5klJQ

<?php
class C implements I {}

interface I {}
?>

Here the interface I is declared after the class C that implements the
interface.
This means the autoloading for identifiers in "extends" or
"implements" clauses already needs to wait for the current file to be
fully processed.

So yes, we need to process the entire file before autoloading anything.
But we already do that for inheritance. So it is nothing new.

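
The same-file ordering point can be checked directly. This is a small sketch built around the class C / interface I case from the linked snippet; the recording autoloader and the `$requested` variable are additions for illustration only.

```php
<?php
// Record every class name the autoloader is asked for.
$requested = [];
spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
});

// C names I in "implements" before I is declared further down the file.
class C implements I {}

interface I {}

// C is usable and really implements I.
var_dump(new C() instanceof I);
// The autoloader was never asked for I, because I comes from this file.
var_dump($requested === []);
```

So for "implements", a name declared later in the same file is resolved without going through the autoloader at all.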
  101362
December 18, 2017 15:12 andreas@dqxtech.net (Andreas Hennings)
Ok, I think I missed the circularity aspect in your examples.
Inheritance by itself is never circular.
However, return types can make this entire thing circular.

So the problem would be if we try to autoload the same thing that is
currently in the process of being defined.

Maybe we could generate similar circularity problems with class_exists() calls?

  101363
December 18, 2017 18:52 andreas@dqxtech.net (Andreas Hennings)
> I believe your algorithm fails on this simple setup:

Another comment I want to make here:
The examples you give each have multiple class declarations per file.
I would personally not care much, if these result in fatal error.
All of this code used to be illegal until now (because no covariance
support), so it would not be a BC problem if some of it continues to
be illegal.

This being said:
I think we can probably construct examples that have
one-class-per-file, but that still have a circularity problem due to
covariance.
Or possibly even with class_exists()?
I am going to play around a bit.

>
> <?php
> interface A {
>     function foo(): X;
> }
>
> interface B extends A {
>     function foo(): Y;
> }
>
> interface X {
>     function bar(): A;
> }
>
> interface Y extends X {
>     function bar(): B;
> }
>
> ?>
>
> If I correctly typed this from memory there is no way to order this
> such that all units are defined ahead of time as needed for verifying
> correctness. This means we trigger the autoloader even though the type
> is defined in the same file. Even if we do some more complicated
> compile-time passes we'd fail on things like this:
>
> <?php
> interface A {
>     function foo(): X;
> }
>
> interface B extends A {
>     function foo(): Y;
> }
>
> if (getenv("ENABLE_X")) {
>     interface X {
>         function bar(): A;
>     }
> }
> ?>
>
> interface Y extends X {
>     function bar(): B;
> }
>
> ?>
>
> This case shows that care needs to be taken to get the order down
> correctly even if we autoload:
>
> <?php
> interface A {
>     function foo(): X;
> }
>
> interface B extends A {
>     function foo(): Y;
> }
> ?>
> interface X {
>     function bar(): A;
> }
> ?>
>
> interface Y extends X {
>     function bar(): C;
> }
> // At this point the engine will need to verify A and C but we may not
> // have finished verifying A and B yet
> ?>
>
> All-in-all I don't think we can resolve every case cleanly because we
> do not have purely ahead-of-time compilation for all units involved. I
> think every method of implementing this feature has drawbacks and we
> need to thoughtfully evaluate them.

  101366
December 18, 2017 20:48 levim@php.net (Levi Morrison)
On Mon, Dec 18, 2017 at 11:52 AM, Andreas Hennings <andreas@dqxtech.net> wrote:
>> I believe your algorithm fails on this simple setup:
>
> Another comment I want to make here:
> The examples you give each have multiple class declarations per file.
> I would personally not care much, if these result in fatal error.
> All of this code used to be illegal until now (because no covariance
> support), so it would not be a BC problem if some of it continues to
> be illegal.

Just because it isn't a backwards compatibility break doesn't mean
it's a good way forward. Right now people have covariant returns -
they just don't express it in the signature because we don't allow it.
Wouldn't it seem odd that if the bodies of the methods stayed the same
and all they did was update the signature that it somehow breaks their
code?

I have some ideas about minimizing this impact but I really think we
ought to tackle covariant returns and contravariant parameters at the
same time. Any endeavor to add one should add the other to create a
cohesive design that works in both cases.

  101368
December 18, 2017 21:21 andreas@dqxtech.net (Andreas Hennings)
> I really think we ought to tackle covariant returns and contravariant parameters

My "algorithm" could be extended for contravariance:
Whenever a method has a parameter type hint that differs from the
parent type hint, autoload the class of the parent type hint.

I think I know too little about the internal workings of PHP to
understand why your examples would break.
I think we should give it a try, and write your examples as unit tests
to try to crash it.

If we indeed can produce circularity problems, then I might have more ideas.
I think the main idea in my algorithm about which classes need to be
autoloaded and which don't is good.

Maybe at some point we need to write a class into the table of defined
classes, before it is fully verified.
At some point, for a different purpose, I thought about "stub"
classes, which have all the information from the declaration itself,
but not from any parent class. So we could write classes into the stub
table and then later write the completed thing into the actual class
table/list.
But maybe we don't need to go that far.

If I were to do this, it would be my first shot on the PHP engine itself.
Not going to happen today, but maybe I will find time for it in a
future month or so.
I am sure if/when I have done this, I can write more knowledgeable
posts here on this mailing list.
Of course if someone else wants to step up, go ahead.

I think the goal and spec is pretty clear. So once we have a
proof-of-concept implementation and show that it can be done and the
problems can be solved, it would be straightforward to make an RFC.
If we would do the RFC first, people would have to vote on something
which is unclear if it can be implemented.
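
The contravariance extension suggested in this message (a differing parameter type hint means the parent's type gets loaded) can be modelled the same way as the return type rule, with the direction of the hierarchy check reversed. This is only a sketch under assumptions: `parameterTypeIsCompatible` is a hypothetical userland function, not engine code, and `class_exists()` / `interface_exists()` stand in for the autoload step.

```php
<?php
// Hypothetical userland model of the parameter rule; not engine code.
function parameterTypeIsCompatible(string $childType, string $parentType): bool
{
    // Identical names: accept without loading anything.
    if (strcasecmp($childType, $parentType) === 0) {
        return true;
    }

    // Differing names: load the parent's parameter type (may trigger autoload)...
    if (!class_exists($parentType) && !interface_exists($parentType)) {
        return false;
    }

    // ...and require that it sits below the child's type, so the child
    // accepts at least everything the parent accepts.
    return is_subclass_of($parentType, $childType, true);
}

// Widening from Iterator to Traversable is accepted.
var_dump(parameterTypeIsCompatible('Traversable', 'Iterator'));
// Narrowing from Traversable to Iterator is rejected.
var_dump(parameterTypeIsCompatible('Iterator', 'Traversable'));
```

As with return types, identical hints never reach the loading step, so only signatures that are rejected today would cause additional class loading.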