[RFC][DISCUSSION] Strong Typing Syntax

  101477
January 2, 2018 10:35 tendoaki@gmail.com (Michael Morris)
I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to
the wiki, I'll place this here.

Before I begin detailing this I want to emphasize this syntax is optional
and lives alongside PHP's default scalar variables. If variables aren't
declared using the syntax detailed below then nothing changes.  This is not
only for backwards compatibility, but it's also to keep the language easy
to learn as understanding datatypes can be a stumbling block (I know it was
for me at least).

VARIABLE DECLARATION

Currently the var keyword is used to formally declare a variable.  The
keyword will now allow a type argument before the var name, like so:

var [type] $varname;

If the type is omitted, scalar is assumed.  If Fleshgrinder's scalar RFC is
accepted then it would make sense to allow programmers to explicitly
declare the variable as a scalar, but in any event when the type is omitted
scalar must be assumed for backwards compatibility.

The variables created by this pattern auto cast anything assigned to them
without pitching an error. So...

var string $a = 5.3;

The float of 5.3 will be cast as a string.
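
As a rough illustration, the conversion can already be written today with an explicit cast. This is only a sketch: it assumes the proposed auto-cast would follow PHP's existing (string) cast rules, which the proposal does not spell out.

```php
<?php
// Assumption: "var string $a = 5.3;" would behave like an explicit cast.
$a = (string) 5.3;

var_dump($a);            // string(3) "5.3"
var_dump(is_string($a)); // bool(true)
```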

For some this doesn't go far enough - they'd rather have a TypeError thrown
when the assignment isn't going to work.  For them there is this syntax

string $a = "Hello";

Note that the var keyword isn't used.


FUNCTION DECLARATION

PHP 7 introduced type declarations.  This RFC calls for these to become
binding for consistency, which introduces the only backward compatibility
break of the proposal.  Consider the following code.

function foo ( string $a ) {
  $a = 5;
  echo is_int($a) ? 'Yes' : 'No';
}

Under this RFC "No" is echoed because 5 is cast to a string when assigned
to $a. Currently "Yes" would be echoed since a scalar has the type that
makes sense for the last assignment.

I believe this is an acceptable break for two reasons. 1, the type
declaration syntax is relatively new.  2, changing the type of a variable
mid-function is a bad pattern anyway.
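
The current behavior can be checked with a short script. This is a minimal sketch for stock PHP 7 or later; the function name currentBehaviour is made up for illustration, and it returns the string instead of echoing it so the result is easy to inspect.

```php
<?php
// Current PHP: the parameter type is checked once, when the function is called.
// Inside the body, $a is an ordinary variable and may change type freely.
function currentBehaviour(string $a): string
{
    $a = 5;                           // allowed today; $a now holds an int
    return is_int($a) ? 'Yes' : 'No';
}

echo currentBehaviour('hello'), "\n"; // prints "Yes" on current PHP
```

Under the proposal the same body would yield "No", which is the behavior change being discussed.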


OBJECT TYPE LOCKING

Currently there is no way to prevent a variable from being changed from an
object to something else. Example.

$a = new SomeClass();
$a = 5;

If objects are allowed to follow the same pattern outlined above, though,
this problem is mostly solved.

SomeClass $a = new SomeClass();
var SomeClass $a = new SomeClass();

QUESTION: How do we handle the second auto casting case? $a is not allowed
to not be a SomeClass() object, but there are no casting rules. We have
three options:
1. Throw an error on illegal assign.
2. Allow a magic __cast function that will cast any assignment to the
object.
3. Create a PHP Internal interface the object can implement that will
accomplish what 2 does without the magic approach.

Note that 1 will need to occur when the class provides no such
implementation. 2 and 3 are not mutually exclusive, though my understanding
is PHP is moving away from magic functions.


CLASS DECLARATION
Again, by default class members are scalars. The syntax translates over
here as might be expected.

class SomeClass {
  public var string $a;
  protected int $b;
  private SomeOtherClass $c;
  public var SomeThirdClass $d;
}

Note a default value doesn't need to be provided.  In the case of object
members, these types are only checked for on assignment to prevent
recursion sending the autoloader into an infinite loop.

Also note that one of the functions of setters - guaranteeing correct type
assignment - comes free of charge with this change.
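
For comparison, this is roughly what has to be written by hand today to get the same guarantee. It is a minimal sketch, assuming PHP 7.1 or later for the void return type; the setB/getB method names are hypothetical and not part of the proposal.

```php
<?php
// Today the setter's parameter type is the only thing guarding the property.
class SomeClass
{
    private $b = 0;

    public function setB(int $b): void
    {
        $this->b = $b;
    }

    public function getB(): int
    {
        return $this->b;
    }
}

$obj = new SomeClass();
$obj->setB(42);
echo $obj->getB(), "\n";         // prints 42

try {
    $obj->setB('not a number');  // a non-numeric string is rejected at the call
} catch (TypeError $e) {
    echo "TypeError\n";
}
```

With a typed member declaration the check would move from the setter to the assignment itself.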


COMPARISON BEHAVIOR
When a strongly typed variable (autocasting or not) is compared to a scalar
variable only the scalar switches types. The strict comparison operator is
allowed though it only blocks the movement of the scalar.

Comparisons between strongly typed variables are always strict and a
TypeError results if their types don't match. This actually provides a way
to force the greater than, less than, and spaceship operators to be
strict.
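
The "always strict" rule between two typed variables can be approximated in userland. This is only a sketch: strictCompare is a hypothetical helper, and gettype() only distinguishes the built-in types, so two objects of different classes would still be compared.

```php
<?php
// Hypothetical helper: compare two values only when they have the same type,
// otherwise throw, roughly what the proposal describes for typed variables.
function strictCompare($left, $right): int
{
    if (gettype($left) !== gettype($right)) {
        throw new TypeError(
            'Cannot compare ' . gettype($left) . ' with ' . gettype($right)
        );
    }
    return $left <=> $right;
}

var_dump('10' == '1e1');         // bool(true): numeric strings are compared as numbers
var_dump('10' === '1e1');        // bool(false)
echo strictCompare(3, 7), "\n";  // prints -1

try {
    strictCompare(3, '7');
} catch (TypeError $e) {
    echo $e->getMessage(), "\n"; // prints "Cannot compare integer with string"
}
```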


FUNCTION CALLING
When a strongly typed variable is passed to a function that declares a
variable's type, autocasting will occur so long as the pass is not by
reference.  For obvious reasons a TypeError will occur on a by-reference
assignment.

function bar( string $a) {}
function foo( string &$a ) {}

$a = 5.3;
foo( $a ); // Works, $a is a scalar, so it type adjusts.
var bool $b = false;
foo( $b ); // TypeError, $b is boolean, function expects to receive a
           // string by reference.
bar($b); // Works since the pass isn't by reference, so the type can be
         // adjusted for the local scope.


CONCLUSION
I believe that covers all the bases needed. This will give those who want
to use strong typing better tools, and those who don't can be free
to ignore them.
  101499
January 3, 2018 08:50 me@kelunik.com (Niklas Keller)
Hey Michael,

I don't think the BC break is acceptable. You argue that scalar type
declarations are relatively new, but in fact they're already years old now.
They're used in most PHP 7+ packages. Even if changing types might be
discouraged, it still happens a lot.

Regards, Niklas
  101500
January 3, 2018 09:03 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 3, 2018 at 3:50 AM, Niklas Keller <me@kelunik.com> wrote:

> Hey Michael,
>
> I don't think the BC break is acceptable. You argue that scalar type
> declarations are relatively new, but in fact they're already years old now.
> They're used in most PHP 7+ packages. Even if changing types might be
> discouraged, it still happens a lot.
>

Hmm. Well, that aspect of this can be dropped. What about the rest of it?
  101509
January 3, 2018 17:10 andreas@dqxtech.net (Andreas Hennings)
This proposal contains some interesting ideas, which I see as separate:
1. A syntax to declare the type of local variables.
2. A syntax to declare the type of object properties.
3. Preventing local variables, object properties and parameters from
changing their type after initialization/declaration.

For me the point 3 is the most interesting one.
I think the other points are already discussed elsewhere in some way,
although they are clearly related to 3.

Point 3 would be a BC break, if we would introduce it for parameters.
Current behavior: https://3v4l.org/bjaLQ

Local variables and object properties currently cannot be typed, so
point 3 would not be a BC break for them, if we introduce it together
with 1 and 2.
But then we would have an inconsistency between parameters and local
vars / object properties.

What we could do to avoid BC break is to introduce
declare(fixed_parameter_types=1) in addition to
declare(strict_types=1).
For local variables and object properties, the type would always be fixed.
But for parameters, it would only be fixed if the
declare(fixed_parameter_types=1) is active.

Maybe to make it less verbose, we could say declare(strict_types=2),
which would mean the combination of both those things?
Or some other type of shortcut.
I think we will have to think about shortcuts like this if we
introduce more "modes" in the future.


> Currently the var keyword is used to formally declare a variable.

Are you talking about local variables?
In which PHP version? https://3v4l.org/o0PFg

Afaik, currently var is only used for class/object properties from the
time when people did not declare the visibility as
public/protected/private.

> If the type is omitted, scalar is assumed. If Fleshgrinder's scalar RFC is
> accepted then it would make sense to allow programmers to explicitly
> declare the variable as a scalar, but in any event when the type is omitted
> scalar must be assumed for backwards compatibility.

If no type is specified, then "mixed" should be assumed, not "scalar".
Assuming "scalar" would be a BC break, and it would be confusing.
  101514
January 3, 2018 17:21 andreas@dqxtech.net (Andreas Hennings)
Another idea I have when reading this proposal is "implicit" typing
based on the initialization.

E.g.

$x = 5;
$x = 'hello';  // -> Error: $x was initialized as integer, and cannot
               // hold a string.

or

$x = $a + $b;
$x = 'hello';  // -> Error: $x was initialized as number (int|float),
               // and cannot hold a string.

To me this is only acceptable if the implicit type can be determined
at compile time.
So:

if ($weather_is_nice) {
  $x = 5;
}
else {
  $x = 'hello';  // -> Error: $x would be initialized as int
                 // elsewhere, so cannot be initialized as string.
}


This change would be controversial and leave a lot of questions.
It would be a BC break, unless we introduce yet another declare()
setting, e.g. declare(implicit_types=1).

It could be tricky for global variables, or in combination with
include/require, where the variable can be seen from outside a
function body, and outside the range of the declare() statement.

I only mention it here because it relates to the proposal. I do not
have a strong opinion on it atm.


  101518
January 3, 2018 17:54 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 3, 2018 at 12:21 PM, Andreas Hennings <andreas@dqxtech.net>
wrote:

> Another idea I have when reading this proposal is "implicit" typing
> based on the initialization.
>
> E.g.
>
> $x = 5;
> $x = 'hello';  // -> Error: $x was initialized as integer, and cannot
> hold a string.
>

No, no no. I don't think I'd like that always-on approach. However, I just
had an idea.....

Let's step back. Way back. PHP/FI days back. Back in the day Rasmus chose to
put variables off on their own symbol table for performance reasons. This
isn't as necessary now, but vars in PHP continue to be always $something.

Now I don't know the implementation can of worms this would touch but what
if this was changed for the locked type variables. That would distinguish
them greatly.

int x = 5;

Here x is a locked type variable of the integer type. Since it's also on the
same symbol tables as the classes, functions, constants et al I presume it
is namespace bound as well.

var x = 5;

If allowed what would this mean? And what to do with class members is an
open question.

Anyway, I'm looking for an implementation that allows loose and strong
typing to coexist even within a given file. I use loosely typed variables
most of the time myself.
  101516
January 3, 2018 17:41 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 3, 2018 at 12:10 PM, Andreas Hennings <andreas@dqxtech.net>
wrote:

> This proposal contains some interesting ideas, which I see as separate:
> 1. A syntax to declare the type of local variables.
> 2. A syntax to declare the type of object properties.
> 3. Preventing local variables, object properties and parameters to
> change their type after initialization/declaration.
>
> For me the point 3 is the most interesting one.
> I think the other points are already discussed elsewhere in some way,
> although they are clearly related to 3.
>
> Point 3 would be a BC break, if we would introduce it for parameters.
> Current behavior: https://3v4l.org/bjaLQ
>
> Local variables and object properties currently cannot be types, so
> point 3 would not be a BC break for them, if we introduce it together
> with 1 and 2.
> But then we would have an inconsistency between parameters and local
> vars / object properties.
>
> What we could do to avoid BC break is to introduce
> declare(fixed_parameter_types=1) in addition to
> declare(strict_types=1).
> For local variables and object properties, the type would always be fixed.
> But for parameters, it would only be fixed if the
> declare(fixed_parameter_types=1) is active.
>
> Maybe to make it less verbose, we could say declare(strict_types=2),
> which would mean the combination of both those things?
> Or some other type of shortcut.
> I think we will have to think about shortcuts like this if we
> introduce more "modes" in the future.
>

There will be occasions where having an unfixed variable alongside normal
ones will be desirable.

> > Currently the var keyword is used to formally declare a variable.
>
> Are you talking about local variables?
> In which PHP version? https://3v4l.org/o0PFg
>

Sorry, I'm confusing PHP for JavaScript. I forgot that the var keyword was
only used in PHP 4 for class members. For some reason my brain assumed it
was usable in a local scope.

> Afaik, currently var is only used for class/object properties from the
> time when people did not declare the visibility as
> public/protected/private.
>
> If no type is specified, then "mixed" should be assumed, not "scalar".
> Assuming "scalar" would be a BC break, and it would be confusing.
>

Ok. I'm misusing the term scalar to mean "variable whose type can be
changed at will depending on context." Sorry.
  101526
January 3, 2018 20:26 rowan.collins@gmail.com (Rowan Collins)
Hi Michael,

On 02/01/2018 10:35, Michael Morris wrote:
> I would like to propose a clean way to add some strong typing to PHP in a
> manner that is almost fully backward compatible (there is a behavior change
> with PHP 7 type declarations). As I don't have access to the add RFC's to
> the wiki I'll place this here.

Thanks for putting this together. Perhaps unlike Andreas, I think it is
good to look at typing changes as a unified framework, rather than
considering "typed properties", "typed variables", etc, as separate
concerns. If we don't, there is a real risk we'll end up making decisions
now that hobble us for future changes, or over-complicating things in one
area because we're not yet ready to make changes in another.

My own thoughts on the subject from a while ago are here:
http://rwec.co.uk/q/php-type-system
In that post, I borrowed the term "container" from Perl6 for the conceptual
thing that type constraints are stored against; in PHP's case, this would
include variables, object properties, class static properties, function
parameters, and return values. I think a good plan for introducing typing
is one that considers all of these as equals.

The biggest issue with any proposal, though, is going to be performance. I
don't think this is an incidental detail to be dealt with later, it is a
fundamental issue with the way type hints in PHP have evolved. PHP is
extremely unusual, if not unique, in exclusively enforcing type constraints
at runtime. Other languages with "gradual typing" such as Python, Hack, and
Dart, use the annotations only in separate static analysers and/or when a
runtime debug flag is set (similar to enabling assertions).

Extending that to all containers means every assignment operation would
effectively need to check the value on the right-hand-side against the
constraint on the left-hand-side. Some of those checks are non-trivial,
e.g. class/interface constraints, callable; or in future maybe "array of
Foo", "Foo | Bar | int", "Foo & Bar", etc. There are ways to ease this a
bit, like passing around a cache of type constraints a value has passed,
but I think we should consider whether the "always-on runtime assertions"
model is the one we want in the long term.

> If the type is omitted, scalar is assumed.

As Andreas pointed out, you mean "mixed" here (accepts any value), rather
than "scalar" (accepts int, string, float, and bool).

> The variables created by this pattern auto cast anything assigned to them
> without pitching an error.

My initial thought was that this makes the assignment operator a bit too
magic for my taste. It's conceptually similar to the "weak mode" for scalar
type hints (and could perhaps use the same setting), but those feel less
magic because they happen at a clear scope boundary, and the cast only
happens once. But on reflection, the consistency makes sense, and assigning
to an object property defined by another library is similar to calling a
method defined by another library, so the separation of caller and callee
has similar justification.

> PHP 7 introduced type declarations.

This is incorrect, and leads you to a false conclusion. PHP 7 introduced
*scalar* type declarations, which extended an existing system which had
been there for years, supporting classes, interfaces, the generic "array"
constraint, and later pseudo-types like "callable".

I don't think it's tenable to change the meaning of this syntax, but it
would certainly be possible to bikeshed some modifier to simultaneously
declare "check type on function call, and declare corresponding local
variable as fixed type".

> OBJECT TYPE LOCKING
>
> [...]
>
> QUESTION: How do we handle the second auto casting case? $a is not allowed
> to not be a SomeClass() object, but there are no casting rules.

There are actually more than just object and scalar type hints -
"callable" is a particularly complex check - but currently they all just
act as assertions, so it would be perfectly consistent for "locking" to
also only have the one mode.

> COMPARISON BEHAVIOR
> When a strongly typed variable (autocasting or not) is compared to a scalar
> variable only the scalar switches types. The strict comparison operator is
> allowed though it only blocks the movement of the scalar.
>
> Comparisons between strongly typed variables are always strict and a
> TypeError results if their types don't match. This actually provides a way
> to force the greater than, lesser than, and spaceship operation to be
> strict.

I like this idea. The over-eager coercion in comparisons is a common
criticism of PHP.

In general I really like the outline of this; there's a lot of details to
work out, but we have to start somewhere.

Regards,

--
Rowan Collins
[IMSoP]
  101529
January 3, 2018 23:19 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 3, 2018 at 3:26 PM, Rowan Collins <rowan.collins@gmail.com>
wrote:

> Hi Michael,
>
> On 02/01/2018 10:35, Michael Morris wrote:
>
>> I would like to propose a clean way to add some strong typing to PHP in a
>> manner that is almost fully backward compatible (there is a behavior
>> change with PHP 7 type declarations). As I don't have access to the add
>> RFC's to the wiki I'll place this here.
>>
>
> Thanks for putting this together. Perhaps unlike Andreas, I think it is
> good to look at typing changes as a unified framework, rather than
> considering "typed properties", "typed variables", etc, as separate
> concerns. If we don't, there is a real risk we'll end up making decisions
> now that hobble us for future changes, or over-complicating things in one
> area because we're not yet ready to make changes in another.
>

My thoughts exactly. PHP already has enough warts born of piecemeal design -
a cursory look at the PHP string functions shows this very well. We have
functions with haystack/needle and needle/haystack. Some function names are
_ delimited, some aren't (or were meant to be camel cased but since PHP
function labels aren't case sensitive), and so on.

When I see an RFC based on types it worries me precisely because without a
core plan of action we are inviting more language fragmentation.

> My own thoughts on the subject from a while ago are here:
> http://rwec.co.uk/q/php-type-system In that post, I borrowed the term
> "container" from Perl6 for the conceptual thing that type constraints are
> stored against; in PHP's case, this would include variables, object
> properties, class static properties, function parameters, and return
> values. I think a good plan for introducing typing is one that considers
> all of these as equals.
>

That was one of the most enjoyable reads I've had in a while and I can't
think of anything there I disagree with. I'm still working through your
references for how Python is handling things and the treatise on the nature
of types.

> The biggest issue with any proposal, though, is going to be performance. I
> don't think this is an incidental detail to be dealt with later, it is a
> fundamental issue with the way type hints in PHP have evolved. PHP is
> extremely unusual, if not unique, in exclusively enforcing type constraints
> at runtime. Other languages with "gradual typing" such as Python, Hack, and
> Dart, use the annotations only in separate static analysers and/or when a
> runtime debug flag is set (similar to enabling assertions).
>

Has the bus already left the station forever on this?

I think it's clear that what we are discussing here can't go into effect
before PHP 8. Further, it could very well be one of, if not the, key
features of PHP 8. In majors, backwards compatibility breaks are considered
where warranted.

I'm not as familiar with the Zend Engine as I probably should be. I bring
the perspective of an end user. From what you've posted am I correct in
stating that PHP Type Hints / scalar Type Declarations are in truth
syntactic sugar for asserting the type checks? Hence we read this

function foo( ClassA $a, ClassB $b, string $c ) {}

But the engine has to do the work of this...

function foo ( $a, $b, $c ) {
  assert( $a instanceof ClassA, TypeError );
  assert( $b instanceof ClassB, TypeError );
  assert( is_string($c), InvalidArgument );
}

If that is indeed the case, why not disable these checks according to the
zend.assertions flag, or if that's too bold a move create a php.ini flag
that allows them to be disabled in production?

Existing code would be unaffected if it has been fully debugged because, in
accordance with the principles of Design by Contract, a call with an illegal
type should be impossible. For code that isn't up to par though we have the
possibility of data corruption when the code proceeds past the call to
wherever the reason for that type hint is. I'll hazard that most of the time
that will be a call to method on non-object or something similar.

PHP programmers however would need to get used to the idea that their type
hints mean nothing when assertions are turned off (or if handled by a
separate flag, when that flag is turned off). I'm ok with this, but I'm a
big proponent of Design by Contract methodology as a supplement to Test
Driven Design.

Another thing to consider is that if the existing type hints are so
expensive, this change might grant a welcome speed boost.

> Extending that to all containers means every assignment operation would
> effectively need to check the value on the right-hand-side against the
> constraint on the left-hand-side. Some of those checks are non-trivial,
> e.g. class/interface constraints, callable; or in future maybe "array of
> Foo", "Foo | Bar | int", "Foo & Bar", etc. There are ways to ease this a
> bit, like passing around a cache of type constraints a value has passed,
> but I think we should consider whether the "always-on runtime assertions"
> model is the one we want in the long term.
>
>> If the type is omitted, scalar is assumed.
>
> As Andreas pointed out, you mean "mixed" here (accepts any value), rather
> than "scalar" (accepts int, string, float, and bool).
>

Yes. I admitted to him in a previous post that I had made that mistake.

>> The variables created by this pattern auto cast anything assigned to them
>> without pitching an error.
>
> My initial thought was that this makes the assignment operator a bit too
> magic for my taste. It's conceptually similar to the "weak mode" for scalar
> type hints (and could perhaps use the same setting), but those feel less
> magic because they happen at a clear scope boundary, and the cast only
> happens once. But on reflection, the consistency makes sense, and assigning
> to an object property defined by another library is similar to calling a
> method defined by another library, so the separation of caller and callee
> has similar justification.
>
>> PHP 7 introduced type declarations.
>
> This is incorrect, and leads you to a false conclusion. PHP 7 introduced
> *scalar* type declarations, which extended an existing system which had
> been there for years, supporting classes, interfaces, the generic "array"
> constraint, and later pseudo-types like "callable".
>
> I don't think it's tenable to change the meaning of this syntax, but it
> would certainly be possible to bikeshed some modifier to simultaneously
> declare "check type on function call, and declare corresponding local
> variable as fixed type".
>

Or go back to using the underutilized assert() statement :D

Or, if it's really important to the programmer, they can re-declare the
variable to lock the type down. I only suggested this change to bring about
consistency.

>> COMPARISON BEHAVIOR
>> When a strongly typed variable (autocasting or not) is compared to a
>> scalar variable only the scalar switches types. The strict comparison
>> operator is allowed though it only blocks the movement of the scalar.
>>
>> Comparisons between strongly typed variables are always strict and a
>> TypeError results if their types don't match. This actually provides a way
>> to force the greater than, lesser than, and spaceship operation to be
>> strict.
>
> I like this idea. The over-eager coercion in comparisons is a common
> criticism of PHP.
>
> In general I really like the outline of this; there's a lot of details to
> work out, but we have to start somewhere.
>

Well, after this post I'm going to write a second draft pursuant to what
you and Andreas have taught me and addressing some of the concerns that have
been raised.
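
The "switchable checks" idea from this message can be sketched in userland. This is an illustration only: TYPE_CHECKS_ENABLED is a hypothetical constant standing in for the ini flag being discussed (it is not a real PHP setting), requireType is a made-up helper, and the void return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical stand-in for the proposed ini flag; not a real PHP setting.
const TYPE_CHECKS_ENABLED = true;

class ClassA {}

// Made-up helper: throws only while the checks are switched on.
function requireType(bool $ok, string $message): void
{
    if (TYPE_CHECKS_ENABLED && !$ok) {
        throw new TypeError($message);
    }
}

function foo($a, $c): string
{
    requireType($a instanceof ClassA, '$a must be a ClassA');
    requireType(is_string($c), '$c must be a string');
    return 'ok';
}

echo foo(new ClassA(), 'text'), "\n";  // prints "ok"

try {
    foo(5, 'text');
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";       // prints "$a must be a ClassA"
}
```

Unlike assert() under zend.assertions=-1, where the call is not compiled at all, this sketch still evaluates each check expression when the flag is off, so it shows the semantics rather than the performance gain.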
  101562
January 6, 2018 19:05 rowan.collins@gmail.com (Rowan Collins)
On 03/01/2018 23:19, Michael Morris wrote:
> I'm not familiar with the Zend Engine as I probably should be. I bring the
> perspective of an end user. From what you've posted am I correct in stating
> that PHP Type Hints / scalar Type Declarations are in truth syntactic sugar
> for asserting the type checks.

This is how I've always pictured it, but I've never dug into the
implementation before, so I had a look. (If anyone's curious how I found it,
I started by searching for "callable", because it's a keyword that should
only show up in type hints, then clicked through on LXR to everything that
looked promising.)

It looks like the actual "assertion" is the function zend_verify_arg_type
[1] which calls zend_check_type [2] and formats an appropriate Error if the
type check returns false.

zend_check_type has to do various things depending on the type hint the
user specified, which I'm guessing are classified when the function is
compiled:

* Null values are checked against nullable type markers and null default
  values.
* A class name traverses up through the inheritance hierarchy of the
  argument until it finds a match or reaches the end [3], while an interface
  name has to recursively check all interfaces that might be indirectly
  implemented [4]
* The "callable" type hint has to check all sorts of different formats, and
  is scope-dependent [5]
* Strict array and scalar type hints are just a comparison of bit fields
* Weak scalar type hints which aren't a direct match end up in
  zend_verify_scalar_type_hint to perform coercion if possible [6]

When talking about additional type checks for assignment to properties, or
"locked" local variables, etc, this is the code we're saying needs to be run
more often. For simple types, in strict mode, it's not too bad, but checking
classes, interfaces, and complex pseudotypes like "callable" seems pretty
intensive.

This is likely to get more complex too: proposed additions include union
types ("Foo|Bar"), intersection types ("Foo&Bar"), typed arrays ("int[]"),
generics ("Map"), and others.

So I guess I'm agreeing with Rasmus and Dan Ackroyd that thinking there's an
easy optimisation here is naive.

For the same reason, I am supportive of the idea of having type checks, at
least those we don't have yet, only enabled with an off-by-default INI
setting, treating them like assertions or DbC, not as part of the normal
runtime behaviour.

[1] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_verify_arg_type
[2] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_check_type
[3] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_operators.c#instanceof_class
[4] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_operators.c#instanceof_interface
[5] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_API.c#zend_is_callable_impl
[6] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_verify_scalar_type_hint

--
Rowan Collins
[IMSoP]
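
To give a feel for why a class check costs more than a scalar one, the parent-chain walk described above can be imitated in userland. This is a rough sketch, not the engine's code: matchesClassByWalking and the Animal/Dog/Puppy classes are hypothetical, and interfaces and namespace-qualified names are ignored.

```php
<?php
class Animal {}
class Dog extends Animal {}
class Puppy extends Dog {}

// Hypothetical imitation of the parent-chain walk; the engine does this in C.
function matchesClassByWalking($value, string $wanted): bool
{
    $class = get_class($value);
    while ($class !== false) {
        if (strcasecmp($class, $wanted) === 0) {  // class names are case-insensitive
            return true;
        }
        $class = get_parent_class($class);        // false once the root is reached
    }
    return false;
}

var_dump(matchesClassByWalking(new Puppy(), 'Animal')); // bool(true), after two hops
var_dump(matchesClassByWalking(new Animal(), 'Dog'));   // bool(false)
```

The number of loop iterations grows with the depth of the hierarchy, which is the per-check cost that would be paid on every typed assignment.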
  101536
January 4, 2018 21:09 andreas@dqxtech.net (Andreas Hennings)
On 3 January 2018 at 21:26, Rowan Collins <rowan.collins@gmail.com> wrote:
> Hi Michael, > > On 02/01/2018 10:35, Michael Morris wrote: >> >> I would like to propose a clean way to add some strong typing to PHP in a >> manner that is almost fully backward compatible (there is a behavior >> change >> with PHP 7 type declarations). As I don't have access to the add RFC's to >> the wiki I'll place this here. > > > Thanks for putting this together. Perhaps unlike Andreas, I think it is good > to look at typing changes as a unified framework, rather than considering > "typed properties", "typed variables", etc, as separate concerns. If we > don't, there is a real risk we'll end up making decisions now that hobble us > for future changes, or over-complicating things in one area because we're > not yet ready to make changes in another.
I think the best strategy is to develop a greater vision of where we want to go, and then identify manageably small steps that move us in this direction, and that do not create conflicts in the future.
This means we are both right.

I still think the following are good "small steps":
- typed properties with type lock
- typed local variables with type lock
- discussion whether and when parameters should be type-locked in the function body.

Of course there should be consistency between those steps.

You are right, we also need to consider when these types should be validated, and/or how the variables would be implemented.

Perhaps we could actually create a system where type-locked variables use less memory, because they no longer need to store the type of the variable? E.g. a type-locked integer would only use the 64 bit or whichever size we currently use to store the actual number.
> The biggest issue with any proposal, though, is going to be performance. I don't think this is an incidental detail to be dealt with later, it is a fundamental issue with the way type hints in PHP have evolved. PHP is extremely unusual, if not unique, in exclusively enforcing type constraints at runtime. Other languages with "gradual typing" such as Python, Hack, and Dart, use the annotations only in separate static analysers and/or when a runtime debug flag is set (similar to enabling assertions).
A system where all variables are type-locked could in fact be faster than a system with dynamically typed variables. Depends on the implementation, of course. I imagine it would be a lot of work to get there.
  101537
January 5, 2018 01:21 rasmus@lerdorf.com (Rasmus Lerdorf)
> On Jan 4, 2018, at 13:09, Andreas Hennings <andreas@dqxtech.net> wrote:
>
> A system where all variables are type-locked could in fact be faster
> than a system with dynamically typed variables.
> Depends on the implementation, of course. I imagine it would be a lot
> of work to get there.
I think you, and many others, commenting here, should start by looking at the engine implementation. Any successful RFC needs to have a strong implementation behind it, or at the very least a very detailed description of how the implementation would mesh with the existing engine code.

The reason we don’t have typed properties/variables is that it would require adding type checks on almost every access to the underlying zval. That is a huge perf hit compared to only doing it on method/function egress points as we do now.

-Rasmus
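As a small illustration of where the check happens today, the following runs on PHP 7.0 and later. The function name entryOnlyCheck is made up for this example. The declared parameter type is enforced when the call is made, and nothing is checked when the variable is reassigned inside the body.

```php
<?php
// The string declaration is verified once, at the call boundary.
function entryOnlyCheck(string $a): string
{
    $a = 5;             // no type check on this assignment; $a is now an int
    return gettype($a); // reports the type the variable actually holds
}

echo entryOnlyCheck('hello'), "\n"; // prints "integer"

try {
    entryOnlyCheck([]); // an array can never be coerced to string
} catch (TypeError $e) {
    echo "TypeError raised at the call boundary\n";
}
```

Checking on every write to $a, rather than once per call, is the extra work being described above.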
  101538
January 5, 2018 02:01 tendoaki@gmail.com (Michael Morris)
On Thu, Jan 4, 2018 at 7:21 PM Rasmus Lerdorf <rasmus@lerdorf.com> wrote:

> > On Jan 4, 2018, at 13:09, Andreas Hennings <andreas@dqxtech.net> wrote:
> >
> > A system where all variables are type-locked could in fact be faster
> > than a system with dynamically typed variables.
> > Depends on the implementation, of course. I imagine it would be a lot
> > of work to get there.
>
> I think you, and many others, commenting here, should start by looking at
> the engine implementation. Any successful RFC needs to have a strong
> implementation behind it, or at the very least a very detailed description
> of how the implementation would mesh with the existing engine code.
>
> The reason we don’t have typed properties/variables is that it would
> require adding type checks on almost every access to the underlying zval.
> That is a huge perf hit compared to only doing it on method/function egress
> points as we do now.
I’ve been thinking on this during my drive today to a new job and city. I promise to read over the current implementation before going further, but a quick question - what if the underlying zval wasn’t a zval but a separate class specific to the data type but implementing the same interface as zval? The compiler would choose to use the alternate classes when it encounters new syntax calling for their use, in effect adding a static typing layer that augments the existing dynamic typing layer.
  101542
January 5, 2018 10:35 danack@basereality.com (Dan Ackroyd)
On 5 January 2018 at 02:01, Michael Morris <tendoaki@gmail.com> wrote:
> what if the underlying zval wasn’t a zval but a separate
> class specific to the data type but implementing the same interface as
> zval?
I believe the only sensible answer to that is 'mu', as that question is based on a misunderstanding.

The internals of the PHP engine are C, and zvals are structs not classes, and so there is no interface. In userland classes are also zvals.
http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html

Also, I think people who try to guess at how to make changes to the engine are doing a small disservice to people who have already tried to implement this.

The current contributors are a bunch of clever people, and if there was an obvious way to implement it, they would have implemented it already. It's not the case that there is going to be an easy solution that has been overlooked, that someone cleverer is going to be able to guess at.

cheers
Dan
Ack
  101550
January 5, 2018 14:33 andreas@dqxtech.net (Andreas Hennings)
On 5 January 2018 at 11:35, Dan Ackroyd <danack@basereality.com> wrote:
> On 5 January 2018 at 02:01, Michael Morris <tendoaki@gmail.com> wrote:
>>
>> what if the underlying zval wasn’t a zval but a separate
>> class specific to the data type but implementing the same interface as
>> zval?
>
> I believe the only sensible answer to that is 'mu', as that question
> is based on misunderstanding.
>
> The internals of the PHP engine are C, and zvals are structs not
> classes, and so there is no interface. In userland classes are also
> zvals. http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html
I think a good beginner's intro is this,
http://php.net/manual/de/internals2.variables.intro.php

Yes, these things are structs, and there are no interfaces.

It would be possible, in theory, to create a different struct for type-locked variables, where the type is not stored with each instance, but in the opcode. Or perhaps separate structs per type. This would obviously be a huge amount of work, and a radical change to the language, so I do not imagine this is going to happen any time soon. Every place in code that currently deals with the _zval_struct would then have to consider all other structs. The opcode could then be optimized for such type-locked variables, and this would reduce cost in memory and performance.

The next best thing would be to keep the existing _zval_struct also for type-locked variables, and still try to optimize the opcode as if the type is known at compile time. Still a lot of work, I imagine, because it still affects every place where we deal with a variable.

The third option is to keep the implementation as if all types are dynamic, and only add some type checks here and there, which can be globally enabled or disabled. This is what other gradually typed languages do, as pointed out by Rowan Collins:
> The biggest issue with any proposal, though, is going to be performance. I don't think this is an incidental detail to be dealt with later, it is a fundamental issue with the way type hints in PHP have evolved. PHP is extremely unusual, if not unique, in exclusively enforcing type constraints at runtime. Other languages with "gradual typing" such as Python, Hack, and Dart, use the annotations only in separate static analysers and/or when a runtime debug flag is set (similar to enabling assertions).
> Also, I think people who try to guess at how to make changes to the
> engine, are doing a small disservice to people who have already tried
> to implement this.
This is a dilemma. I think there are some people with valuable opinions on language design who have not yet found the time to study the engine implementation. So, either we risk occasional ignorant ideas, or we will miss some valuable contributions.

I personally want to eventually study the engine in more detail, but I don't think I need to completely censor myself until then. Instead, I have to make a judgement call each time as to whether my limited understanding is sufficient to allow a meaningful contribution to the discussion.
  101552
January 5, 2018 14:56 cmbecker69@gmx.de ("Christoph M. Becker")
On 05.01.2018 at 15:33, Andreas Hennings wrote:

> On 5 January 2018 at 11:35, Dan Ackroyd <danack@basereality.com> wrote:
>
>> The internals of the PHP engine are C, and zvals are structs not
>> classes, and so there is no interface. In userland classes are also
>> zvals. http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html
>
> I think a good beginners intro is this,
> http://php.net/manual/de/internals2.variables.intro.php
The internals2 part of the PHP manual is about PHP 5. The best info for PHP 7 regarding the internals is the phpinternalsbook already pointed at by Dan.

--
Christoph M. Becker
  101541
January 5, 2018 09:50 lester@lsces.co.uk (Lester Caine)
On 05/01/18 01:21, Rasmus Lerdorf wrote:
> The reason we don’t have typed properties/variables is that it would require adding type checks on almost every access to the underlying zval. That is a huge perf hit compared to only doing it on method/function egress points as we do now.
I think that in hindsight all I have been looking to get out of this is for 'zval' to have additional capability to standardise validation. 'Simply' adding a crude type check with its overheads does not remove the validation requirements which still need to be handled much of the time. If the type check ALSO included validation, then the performance hit would be mitigated by the reduction in user side code. But 'error' may not be the right response EVEN with just the simple type check and that is why current typing hacks don't fit MY method of working. I have validation on key paths, but each is isolated from other paths while a core standard method of validation would simplify things in a way 'strong typing' does not!

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
  101563
January 6, 2018 20:12 rowan.collins@gmail.com (Rowan Collins)
On 05/01/2018 09:50, Lester Caine wrote:
> 'Simply' adding a crude type check with its overheads does not remove the
> validation requirements which still need to be handled much of the time.
Yes, I'd love to be able to define custom types like "integer in the range 0 to 100" or whatever.
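For comparison, this is roughly how such a constraint has to be expressed in userland today: as a small value object that validates in its constructor. The names Percentage and setVolume are invented for this sketch, and it assumes PHP 7.0 or later.

```php
<?php
// Hypothetical value object standing in for "integer in the range 0 to 100".
final class Percentage
{
    private $value;

    public function __construct(int $value)
    {
        if ($value < 0 || $value > 100) {
            throw new InvalidArgumentException("Expected 0..100, got $value");
        }
        $this->value = $value;
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

// Any function accepting a Percentage can rely on the range having been
// validated once, when the object was constructed.
function setVolume(Percentage $level): int
{
    return $level->toInt();
}

echo setVolume(new Percentage(42)), "\n"; // prints 42
```

The type declaration on setVolume then doubles as proof that validation has already happened, at the cost of wrapping the integer in an object.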
> But 'error' may not be the right response EVEN with just the simple type check
I think one of the big distinctions is between validation of a value being received from somewhere (can only happen at runtime), versus verifying the validity of a piece of code (would ideally happen at compile time).

Maybe we could have a different syntax to define a function with a compile-time constraint that the value was provably of the right type. With custom types, that could then prove that some validation check had already passed. That's kind of what the offline type checkers that other languages offer do, it's just up to you to run them before deploying your code somewhere important.

Regards,

--
Rowan Collins
[IMSoP]
  101572
January 9, 2018 23:06 tendoaki@gmail.com (Michael Morris)
Before I begin, and without picking on anyone specific, I want to say that
it is generally unhelpful to say that because I, or others, do not know how
the engine is set up that it is impossible to make any meaningful
contributions to the list or on this issue specifically.  My clients don't
understand HTML.  If I told them they needed to study how HTML works before
trying to give me input on the sites I'm building for them I'd likely be
fired.  As a theater major I know quite a bit more than most of the people
on this list about what makes a good play or movie actually work, but I
don't pretend that knowledge is prerequisite to knowing if a play or movie
is good.  It either works, or it doesn't.

If the fallback to all suggestions is "Shut up, that's impossible to do
given current engine architecture." then I'm afraid that PHP is doomed to
become the next COBOL - a language with a lot of important legacy programs,
but no new developers or future as those old systems finally give up the
ghost. Also, given that HHVM has implemented at least one aspect of this
proposal with classes the argument that it's impossible carries a rather
large spoonful of salt.

This said, I will refrain from offering any more input on how this might be
implemented as it is clearly not wanted. I will instead focus on the
desired end state.


Much of what follows is based on Michal Brzuchalski's comments.  His
commentary can be largely summed up with "using the var keyword in a
counter-intuitive way is just going to make matters worse." Also,
getter/setter debates are quite a bit out of scope.



Third Draft.

Target version: PHP 8.

This is a proposal to strengthen the dynamic type checking of PHP during
development.

Note - this is not a proposal to change PHP to a statically typed language
or to remove PHP's current typing rules. PHP is typed the way it is for a
reason, and will remain so subsequent to this RFC. This RFC is concerned
with providing tools to make controlling variable types stronger when the
programmer deems this necessary.



DEFINITIONS
Before a meaningful discussion on types and type handling can be performed
some terms must be defined explicitly, especially since their definitions
in common parlance may change from language to language, and even
programmer to programmer.

* Static Typing: This typing is performed by the compiler either explicitly
or implicitly.
* Dynamic Typing: This typing is performed by the runtime. Unlike static
typing it allows for varying degrees of variable coercion.
* Strong/Weak typing: These terms typically refer to the amount of latitude
the run time has to coerce variables - the more latitude the "weaker" the
typing.



VARIABLE DECLARATION (GLOBAL AND FUNCTION SCOPE)

PHP currently has no keyword to initialize a variable - it is simply
created when it is first referenced. The engine continually coerces the
variables into the required types for each operation. While this is a very
powerful ability, it runs into problems with comparisons (
http://phpsadness.com/sad/52 ). As a result of this issue PHP (and many
languages that share this problem such as JavaScript) provides the strict
comparison operator.  Avoiding this issue is one reason it can be useful to
have some amount of control over a variable's type. This can be
accomplished with two keywords, one old and one new: var and strict.

var $a = 5;
strict $b = 9;

Together these are "mutability operators" - that is they control if a
variable can mutate, be coerced, recast or what have you between types. The
strict keyword removes the mutability of a variable between types. The var
keyword restores it. The keywords perform these operations even in the
absence of an assignment, though using strict without any assignment will
lead to an error since type will be unknown.  Examples.

var $a; // $a will be created and be NULL.
strict $a; // TypeError - strict variables must have an explicit type or a
           // value from which a type can be inferred.
$b = 5;
strict $b; // $b locks down to integer since it was already declared. Its
           // value remains 5.
var $b; // $b's ability to be coerced is restored.
strict int $c; // Works. $c will be empty - any assignment must be the
               // specified type.
strict $d = 'Hello'; // Works. Type of string can be inferred.

Both keywords allow comma delimited lists of declarations (for
consistency), but strict will be the one to use it most frequently:

strict $a = 1, $b = "Hello", $c = 3.14, $d = [];

$a is inferred to int, $b to string, $c to float and $d to array.
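Since the strict and var keywords above do not exist in any released PHP, the intended behaviour can only be approximated in userland. The sketch below is one such approximation and not part of the proposal: the class LockableVar and its methods are hypothetical, and type names follow gettype() ("integer", "double", "string", "array"). It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical wrapper emulating the proposed "strict" and "var" keywords.
final class LockableVar
{
    private $value;
    private $lockedType = null; // null means unlocked, i.e. plain "var"

    public function __construct($value = null)
    {
        $this->value = $value;
    }

    // "strict": lock to an explicit type, or infer one from the current value.
    public function strict(?string $type = null): void
    {
        $type = $type ?? gettype($this->value);
        if ($type === 'NULL') {
            throw new TypeError('strict needs an explicit type or a value to infer one from');
        }
        $this->lockedType = $type;
    }

    // "var": restore the ability to change type.
    public function unlock(): void
    {
        $this->lockedType = null;
    }

    public function set($value): void
    {
        if ($this->lockedType !== null && gettype($value) !== $this->lockedType) {
            throw new TypeError('value must be of type ' . $this->lockedType);
        }
        $this->value = $value;
    }

    public function get()
    {
        return $this->value;
    }
}

$b = new LockableVar(5);
$b->strict();        // locks to "integer", inferred from 5
$b->set(6);          // fine, still an integer
$b->unlock();        // like "var $b;"
$b->set('six');      // fine again, the lock is gone
```

Note that this emulation throws on a mismatched assignment and never coerces, which corresponds to the stricter reading of the proposal.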



FUNCTION DECLARATION

The strict keyword in a function declaration locks down the argument var
for the remainder of the function (or until var is used). For consistency
it is recommended that var be allowed as well, but it wouldn't do anything
beyond cueing IDEs that mixed will be accepted.

function foo (strict int $a, strict string $b, var $c, strict $d = true) {}



ARRAYS
If an array is strict all of its keys and values will be strict and
inferred on assignment.

strict $a = [
  'id' => 1,
  'name' => 'Mark',
];




CLASS MEMBERS

The var keyword appeared in PHP 4 to declare class members and found itself
deprecated. As the mutability operator it is still allowed, and now allowed
alongside the scope operator.

class SomeClass {
  var $a = '3';
  public var $b = 'hello';
  public strict $c = 3.14;
  protected strict int $d;
}

Interfaces can also lock member types following the above pattern.




COMPARISON BEHAVIOR
As mentioned above comparisons are the area where the most stability gains
are to be had. When strict variables are compared to dynamic or "var"
variables only the var variable will be coerced. If two strict variables
are compared, a TypeError will be raised unless one is explicitly cast:

strict $a = 123;
strict $b = '123';

if ($b == (string) $a) {}
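For reference, this is how current PHP compares such values without any strict keyword. The helper compareBoth is invented for this example; the results shown hold on both PHP 7 and PHP 8, because every string involved is a well-formed numeric string.

```php
<?php
// Returns [loose result, strict result] for a pair of values.
function compareBoth($a, $b): array
{
    return [$a == $b, $a === $b];
}

var_dump(compareBoth(123, '123'));    // loose: true,  strict: false
var_dump(compareBoth('123', '0123')); // loose: true,  strict: false
var_dump(compareBoth('1e3', '1000')); // loose: true,  strict: false
var_dump(compareBoth((string) 123, '123')); // loose: true, strict: true
```

The second and third lines show two different strings comparing equal under ==, because both sides are numeric strings and get compared as numbers. The last line is the explicit-cast form used in the example above.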
  101574
January 10, 2018 06:53 rasmus@lerdorf.com (Rasmus Lerdorf)
On Tue, Jan 9, 2018 at 3:06 PM, Michael Morris <tendoaki@gmail.com> wrote:

> Before I begin, and without picking on anyone specific, I want to say that
> it is generally unhelpful to say that because I, or others, do not know how
> the engine is set up that it is impossible to make any meaningful
> contributions to the list or on this issue specifically. My clients don't
> understand HTML. If I told them they needed to study how HTML works before
> trying to give me input on the sites I'm building for them I'd likely be
> fired. As a theater major I know quite a bit more than most of the people
> on this list about what makes a good play or movie actually work, but I
> don't pretend that knowledge is prerequisite to knowing if a play or movie
> is good. It either works, or it doesn't.
The difference here is that the end syntax is something like 10% of the problem. 90% of it is fitting it into the engine in an efficient manner given that it is affecting the very core of the engine. An RFC on this issue that doesn't address the bulk of the problem isn't all that helpful.

-Rasmus
  101575
January 10, 2018 13:27 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 10, 2018 at 12:53 AM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:

> The difference here is that the end syntax is something like 10% of the
> problem. 90% of it is fitting it into the engine in an efficient manner
> given that it is affecting the very core of the engine. An RFC on this
> issue that doesn't address the bulk of the problem isn't all that helpful.

It makes absolutely NO sense to do that 90% of the work to have it all burned up when the proposal fails to carry a 2/3rds vote because the syntax is disliked.

Also, drawing the architectural drawings for a skyscraper is also like only 10% of the work, but it's a damn important 10%.

That the implementation will be a major pain in the ass to do is all the more reason to create and pass a planning RFC before doing any related code/implementation RFC's. It will encourage people to do the research to try to figure out how to get this done because they know the syntax is approved and they aren't fiddling around in the dark trying to figure out how to do something that may not be accepted for inclusion at all, which is a huge waste of time.
  101577
January 10, 2018 15:04 rasmus@lerdorf.com (Rasmus Lerdorf)
On Wed, Jan 10, 2018 at 5:27 AM, Michael Morris <tendoaki@gmail.com> wrote:
> Also, drawing the architectural drawings for a skyscraper is also like only
> 10% of the work, but it's a damn important 10%.
Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen and others are doing working on the core of PHP.

Describing the syntax/UI for a feature like this is nothing like the architectural drawings for a skyscraper. The architectural drawings for a skyscraper are extremely detailed and describe exactly how to build it including all materials, tolerances, etc. The analogy here is more like you saying you would like a blue skyscraper with 30 windows and a door and then complaining that the idiot construction crew should stop complaining and just build the thing.

There are plenty of things where the UI/syntax description is all that is needed because the implementation is trivial and flows straight from such a description. This doesn't happen to be one of those.

-Rasmus
  101578
January 10, 2018 15:10 sebastian@php.net (Sebastian Bergmann)
On 10.01.2018 at 16:04, Rasmus Lerdorf wrote:
> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
> and others are doing working on the core of PHP.
I agree.

IIRC, last time optional type declarations for attributes were discussed Dmitry optimized/refactored something in the engine that would reduce the performance hit.

Do we have a guess at how big that performance hit would be? I, for one, would gladly trade a couple of percent of performance (considering the huge gains in performance PHP 7 brought) to be able to use these type declarations in my code.
  101579
January 10, 2018 15:13 levim@php.net (Levi Morrison)
On Wed, Jan 10, 2018 at 8:10 AM, Sebastian Bergmann <sebastian@php.net> wrote:
> On 10.01.2018 at 16:04, Rasmus Lerdorf wrote:
>> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
>> and others are doing working on the core of PHP.
>
> I agree.
>
> IIRC, last time optional type declarations for attributes were discussed
> Dmitry optimized/refactored something in the engine that would reduce the
> performance hit.
>
> Do we have a guess at how big that performance hit would be? I, for one,
> would gladly trade a couple of percent of performance (considering the
> huge gains in performance PHP 7 brought) to be able to use these type
> declarations in my code.
An additional issue is typed references. I believe Bob Weinand did some work in that area; maybe he can share more insight.
  101590
January 10, 2018 18:39 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 10, 2018 at 9:04 AM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:

> On Wed, Jan 10, 2018 at 5:27 AM, Michael Morris <tendoaki@gmail.com> wrote:
>>
>> Also, drawing the architectural drawings for a skyscraper is also like
>> only 10% of the work, but it's a damn important 10%.
>
> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
> and others are doing working on the core of PHP.
No insult was intended here. I apologize if any is taken.
> Describing the syntax/UI for a feature like this is nothing like the
> architectural drawings for a skyscraper.
In terms of time and effort spent it is. It often takes years to complete plans drawn up over the span of weeks. The analogy becomes more firm when you compare the man hours on each side - an architect can draw up plans for a house in less than 100 hours (unless it's a freaking huge house). The contractor labor hours will be 100 times that at a minimum. If anything I'm off in scales, but I was being anecdotal - I wasn't aiming for precise accuracy. Plans still must precede work, and if the ramifications of those plans are to be far reaching they need to be agreed upon as early as possible.
  101596
January 10, 2018 21:08 rowan.collins@gmail.com (Rowan Collins)
On 10/01/2018 18:39, Michael Morris wrote:
> On Wed, Jan 10, 2018 at 9:04 AM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
>
>> Describing the syntax/UI for a feature like this is nothing like the
>> architectural drawings for a skyscraper.
>
> In terms of time and effort spent it is. It often takes years to complete
> plans drawn up over the span of weeks. The analogy becomes more firm when
> you compare the man hours on each side - an architect can draw up plans for
> a house in less than 100 hours (unless it's a freaking huge house).
I don't think Rasmus was saying architects' plans aren't important, or making any comment about the scale of the task. I think he was saying that things like syntax and UI are not the appropriate part of the process to compare to architects' plans. Architects know how buildings work, and spend those weeks making sure the subsequent years aren't going to be wasted because the plausible-looking shape the client asked for can't actually support its own weight.

And just to be clear, this particular feature IS a freaking huge house. Worse, it's a type of skyscraper nobody has ever tried to build before. Sketching the kinds of shapes it might have is interesting; getting hung up on what size the windows are (the exact keywords to use) is probably a waste of time until we've figured out if there's a material that bends that way. And saying "hey, could you make it out of carbon nanotubes?" is a fun conversation to have over a beer, but probably isn't going to be that helpful to people who are experts on skyscrapers and material science.

Apologies for extending the metaphor somewhat beyond stretching point, but I think it acts as a reasonable illustration of where people are coming from in this thread.
> Plans still must precede work, and if the ramifications of those plans are
> to be far reaching they need to be agreed upon as early as possible.
Absolutely, and unfortunately, the biggest ramifications of this particular type of change are going to be in the very core of the engine. That's not true of every feature, but for this particular feature, one of the parts that needs planning and agreeing as early as possible is "how are we going to do this without killing performance".

Regards,

--
Rowan Collins
[IMSoP]
  101586
January 10, 2018 18:11 ryan.jentzsch@gmail.com (Ryan Jentzsch)
I agree with Michael (to a large degree) and I think I see clearly
Michael's point:
Under the current system I will NEVER create an RFC (or find someone with
the Zend engine coding chops to help me) because the RISK vs. REWARD with
the current RFC system is too likely to be a colossal waste of everyone's
time.
Currently the tail wags the dog (implementation details govern top level
policy). The current process nearly insists I spend valuable time coding up
front with a good chance that if/when the RFC goes up for a vote someone
will still be bleating about syntax, or using tabs vs. spaces, or some
other minor detail -- with a 2/3 vote needed it may shoot all my
preliminary hard work to hell. No thanks.



On Wed, Jan 10, 2018 at 6:27 AM, Michael Morris <tendoaki@gmail.com> wrote:

> On Wed, Jan 10, 2018 at 12:53 AM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
>
> > The difference here is that the end syntax is something like 10% of the
> > problem. 90% of it is fitting it into the engine in an efficient manner
> > given that it is affecting the very core of the engine. An RFC on this
> > issue that doesn't address the bulk of the problem isn't all that helpful.
>
> It makes absolutely NO sense to do that 90% of the work to have it all
> burned up when the proposal fails to carry a 2/3rds vote because the syntax
> is disliked.
>
> Also, drawing the architectural drawings for a skyscraper is also like only
> 10% of the work, but it's a damn important 10%.
>
> That the implementation will be a major pain in the ass to do is all the
> more reason to create and pass a planning RFC before doing any related
> code/implementation RFC's. It will encourage people to do the research to
> try to figure out how to get this done because they know the syntax is
> approved and they aren't fiddling around in the dark trying to figure out
> how to do something that may not be accepted for inclusion at all, which is
> a huge waste of time.
  101587
January 10, 2018 18:27 rasmus@lerdorf.com (Rasmus Lerdorf)
On Wed, Jan 10, 2018 at 10:11 AM, Ryan Jentzsch <ryan.jentzsch@gmail.com>
wrote:

> I agree with Michael (to a large degree) and I think I see clearly
> Michael's point:
> Under the current system I will NEVER create an RFC (or find someone with
> the Zend engine coding chops to help me) because the RISK vs. REWARD with
> the current RFC system is too likely to be a colossal waste of everyone's
> time.
> Currently the tail wags the dog (implementation details govern top level
> policy). The current process nearly insists I spend valuable time coding up
> front with a good chance that if/when the RFC goes up for a vote someone
> will still be bleating about syntax, or using tabs vs. spaces, or some
> other minor detail -- with a 2/3 vote needed it may shoot all my
> preliminary hard work to hell. No thanks.
There is a middle ground here. I agree that doing months of work on a rock-solid implementation doesn't make sense if you don't know the RFC will pass. On the other end of the spectrum, RFCs that are essentially feature requests with no specifics on the actual implementation also don't make any sense. A good RFC strikes a happy balance between the two. For many/most things, the actual work in figuring out the implementation isn't that bad. As Sara said, a full implementation isn't needed, but a rough sketch of what changes are needed along with their potential impact on the existing code definitely is.

And yes, unfortunately, if your RFC touches the basic building block of PHP, the zval, then that rough sketch becomes even more important. If you stay away from trying to change a 25-year old loosely typed language into a strictly typed one, then the RFC becomes much simpler.

-Rasmus
  101591
January 10, 2018 18:48 tendoaki@gmail.com (Michael Morris)
On Wed, Jan 10, 2018 at 12:27 PM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:

> If you stay away from trying to change a 25-year old loosely typed
> language into a strictly typed one, then the RFC becomes much simpler.
>
> -Rasmus
I have REPEATEDLY stated that is not the goal. I don't misrepresent what you say, please do not do that to me.

I want to see strict typing as an option, not a requirement.

Arggh... I said I'd stay away from implementation, but would this work? Working this into the zval in any way is problematic. So, store elsewhere? Create a symbol table that holds the strict variables and the types they are locked into. The strict keyword pushes them onto that table, the var keyword pulls them off. When an operation that cares about type occurs, check that table - if the var appears there then validate it. I would hope that if a programmer doesn't want strict typing the overhead of checking an empty table would be minimal, even if repeated a great many times.
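The side-table idea can be modelled in userland to see its shape. This is only a toy under loud assumptions: StrictTable, assignChecked and the lookup counter are all invented for illustration, type names follow gettype(), and nothing here reflects how the engine stores variables. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical side table: variable name => locked gettype() name.
final class StrictTable
{
    private $locked = [];
    public $lookups = 0; // how many times the table has been consulted

    public function lock(string $name, string $type): void   // "strict"
    {
        $this->locked[$name] = $type;
    }

    public function unlock(string $name): void                // "var"
    {
        unset($this->locked[$name]);
    }

    public function check(string $name, $value): void
    {
        $this->lookups++;
        if (isset($this->locked[$name]) && gettype($value) !== $this->locked[$name]) {
            throw new TypeError("\$$name is locked to {$this->locked[$name]}");
        }
    }
}

// Every write goes through the table, whether or not anything is locked.
function assignChecked(StrictTable $table, array &$vars, string $name, $value): void
{
    $table->check($name, $value);
    $vars[$name] = $value;
}

$table = new StrictTable();
$vars = [];
assignChecked($table, $vars, 'a', 1);
assignChecked($table, $vars, 'a', 'one'); // allowed, nothing is locked
echo $table->lookups, "\n";               // prints 2
```

The counter makes one property of the design visible: the table is consulted on every assignment even while it is empty, so the cost of the lookup is paid by code that never uses strict.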
  101592
January 10, 2018 18:59 rasmus@lerdorf.com (Rasmus Lerdorf)
On Wed, Jan 10, 2018 at 10:48 AM, Michael Morris <tendoaki@gmail.com> wrote:

> On Wed, Jan 10, 2018 at 12:27 PM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
>
> > If you stay away from trying to change a 25-year old loosely typed
> > language into a strictly typed one, then the RFC becomes much simpler.
> >
> > -Rasmus
>
> I have REPEATEDLY stated that is not the goal. I don't misrepresent what
> you say, please do not do that to me.
>
> I want to see strict typing as an option, not a requirement.
But the point is that whether it is an option or not, it still has to touch the zval. Which means everything changes whether the option is enabled or not. If you store this information elsewhere, that other location has to be checked on every zval access. Basically the work is identical to the work required to make PHP strictly typed. Making it optional might actually be harder because we have to build both and add more checks in that case.

The only viable place I see to store this optionally is outside the runtime in a static analyzer like Phan (which already does this) which matches how HHVM solved it. Of course, there may be a cleaner way to do it. But that is why an RFC on this topic has to give a clear plan towards this cleaner implementation.

Now if the RFC was a plan for baking a compile-time static analysis engine into PHP itself, that would be interesting. But that is a *massive* project.

-Rasmus
  101593
January 10, 2018 19:23 andreas@dqxtech.net (Andreas Hennings)
On 10 January 2018 at 19:59, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
> Now if the RFC was a plan for baking a compile-time static analysis engine
> into PHP itself, that would be interesting. But that is a *massive* project.
Even with my limited understanding of the engine, I can imagine this to be a lot of work. But it sounds much better to me than adding more expensive runtime type checks. I think it would be worth exploring as a long-term direction.
  101594
January 10, 2018 19:25 andreas@dqxtech.net (Andreas Hennings)
Whether we work with runtime type checks or compile-time static analysis:
The user-facing language design questions would still be the same, right?
E.g. we would still have to distinguish type-locked parameter values
vs dynamically typed parameter values.

On 10 January 2018 at 20:23, Andreas Hennings <andreas@dqxtech.net> wrote:
> On 10 January 2018 at 19:59, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
>>
>> Now if the RFC was a plan for baking a compile-time static analysis engine
>> into PHP itself, that would be interesting. But that is a *massive* project.
>
> Even with my limited understanding of the engine, I can imagine this
> to be a lot of work.
> But it sounds much better to me than adding more expensive runtime type checks.
> I think it would be worth exploring as a long-term direction.
  101597
January 10, 2018 21:10 ryan.jentzsch@gmail.com (Ryan Jentzsch)
In my opinion, the Strong Typing Syntax RFC will have less of a chance of
passing a vote than https://wiki.php.net/rfc/typed-properties.
The typed-properties RFC was confined to properties on a class (and,
looking at the code, it appears to me that the type strictness constraints
weren't too difficult to implement). Sadly, even after it was shown
to have minimal effect on performance, the RFC was still shot down.

I would think Strong Typing Syntax is even more complicated, given that it
touches ALL zval processing internally. The concern about "expensive run-time
checks" can of course be mitigated by requiring declare(strict_types=1) to
enable/allow strong typing syntax.
I'd love to see Strong Typing Syntax in PHP but realistically, given the
past history, this RFC will need to target version 8.


On Wed, Jan 10, 2018 at 12:25 PM, Andreas Hennings <andreas@dqxtech.net>
wrote:

> Whether we work with runtime type checks or compile-time static analysis:
> The user-facing language design questions would still be the same, right?
> E.g. we would still have to distinguish type-locked parameter values
> vs dynamically typed parameter values.
>
> On 10 January 2018 at 20:23, Andreas Hennings <andreas@dqxtech.net> wrote:
> > On 10 January 2018 at 19:59, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
> >>
> >> Now if the RFC was a plan for baking a compile-time static analysis engine
> >> into PHP itself, that would be interesting. But that is a *massive* project.
> >
> > Even with my limited understanding of the engine, I can imagine this
> > to be a lot of work.
> > But it sounds much better to me than adding more expensive runtime type checks.
> > I think it would be worth exploring as a long-term direction.
  101659
January 22, 2018 21:11 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

> I want to see strict typing as an option, not a requirement.
You seem to be under the impression that this somehow makes things easier. It does not. To explain: let's say you design a strictly typed language, like Java. The compiler knows which variable is of which type at every point, and if it's not clear for some reason, it errors out. You can build a compiler on top of those assumptions.

Now let's say you design a loosely typed language, like Javascript. The compiler knows variables have no types, only values have them, and builds on top of that (as in, it doesn't need to implement type tracking for variables).

Now, you come in and say - let's make the compiler have *both* assumptions - that sometimes it's strict and sometimes it's not. Sometimes you need to track variable types and sometimes you don't. Sometimes you have type information and can rely on it, and sometimes you don't and have to type-juggle.

Do you really think this just made things *easier*? To implement both Java and Javascript inside the same compiler, with radically different types of assumption? If you have a desire to answer "yes", then a) please believe me it is not true b) please try to implement a couple of compilers and see how easy it is.

Having two options is not even twice as hard as having one. It's much more. So the "optional" part adds all the work that needs to be done to support strict typing in PHP, and on top of that, you also have to add the work that needs to be done to support cases where half of the code is typed and the other half is not. And this is not only code writing work - this is conceptual design work, testing work, documenting work, etc.

Without even going to the merits of the proposal itself, it certainly looks to me like you are seriously underestimating what we're talking about, complexity-wise. I am not saying it's not possible at all - a lot of things are possible. It's just that "it's merely an option" is exactly the wrong position to take.
> Create a symbol table that holds the strict variables and the types they
> are locked into. The strict keyword pushes them onto that table, the var
> keyword pulls them off. When an operation that cares about type occurs
> check that table - if the var appears there then authenticate it.
And now every function and code piece that works with symbol tables needs to be modified to account for the fact that there are two of them. Every lookup is now two lookups, and no idea how $$var would even work at all.

--
Stas Malyshev
smalyshev@gmail.com
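The `$$var` concern can be illustrated with a few lines of ordinary PHP. This is a minimal sketch, not a statement about how the engine resolves names: the helper `pick()` is a hypothetical function used only to make the target name a run-time value, and the idea that `$count` is type-locked is an assumption for the sake of the example (current PHP accepts the write without complaint).

```php
<?php
$count = 10;       // imagine this variable were type-locked to int
$name  = 'Alice';

// Hypothetical helper: the name of the variable to write to is only
// known once this function has run.
function pick(bool $flag): string
{
    return $flag ? 'count' : 'name';
}

$which  = pick(true);
$$which = 'ten';   // variable variable: this writes to $count

var_dump($count);  // string(3) "ten"
```

Because the target of `$$which = ...` is a string computed at run time, any table of locked variable names could only be consulted at the moment of the write, on top of the normal symbol table lookup.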
  101576
January 10, 2018 14:09 rowan.collins@gmail.com (Rowan Collins)
On 9 January 2018 23:06:54 GMT+00:00, Michael Morris <tendoaki@gmail.com> wrote:
>Before I begin, and without picking on anyone specific, I want to say that
>it is generally unhelpful to say that because I, or others, do not know how
>the engine is set up that it is impossible to make any meaningful
>contributions to the list or on this issue specifically. My clients don't
>understand HTML. If I told them they needed to study how HTML works
>before trying to give me input on the sites I'm building for them I'd likely be fired.
While I understand your frustration, I don't think anyone here is saying you shouldn't offer any input, only to be aware of your own limitations when presenting it.

To use your analogy, imagine if a client came to you and said "we think it would be cool if the page changed colour as the user looked at different parts of the screen". You probably wouldn't ask them for details of what colours they wanted, with a vague idea that you'd research if it was possible to implement eye-tracking in browser JS later; more likely, you'd say "yes, that would be cool, but I'm pretty sure it's not possible". If they went ahead and gave you a 10-page spec "in case you work out how to do it after all", that would be a waste of everyone's time.

So, back to the subject at hand: it is useful to share ideas on the typing strategy PHP should be taking, things like which types of value you'd like to see checked, whether we need both auto-casting and type errors, whether all of this should be switched off in production, and the implications for the user of those various decisions. But there's always the possibility that those ideals won't be possible, so details like the exact keywords to use for each type of variable are probably best left vague and sorted out later.

I'll also echo a previous request that you apply for a wiki account to make your document more readable; or maybe just put it as a github gist or on your own website, and treat it as more of a wishlist and discussion piece than a spec that core developers are going to commit to.

Regards,

--
Rowan Collins
[IMSoP]
  101580
January 10, 2018 15:54 cmbecker69@gmx.de ("Christoph M. Becker")
On 10.01.2018 at 15:09, Rowan Collins wrote:

> I'll also echo a previous request that you apply for a wiki account to make your document more readable; or maybe just put it as a github gist or on your own website, and treat it as more of a wishlist and discussion piece than a spec that core developers are going to commit to.
That is, however, not necessarily sufficient. There are already several accepted RFCs with pending implementation[1], the oldest of which were accepted in 2011 and 2012, respectively.

[1] <https://wiki.php.net/rfc#pending_implementation>

--
Christoph M. Becker
  101581
January 10, 2018 16:22 rowan.collins@gmail.com (Rowan Collins)
On 10 January 2018 at 15:54, Christoph M. Becker <cmbecker69@gmx.de> wrote:

> On 10.01.2018 at 15:09, Rowan Collins wrote:
>
> > I'll also echo a previous request that you apply for a wiki account to
> > make your document more readable; or maybe just put it as a github gist or
> > on your own website, and treat it as more of a wishlist and discussion
> > piece than a spec that core developers are going to commit to.
>
> That is, however, not necessarily sufficient. There are already several
> accepted RFC with pending implementation[1], the oldest of which had
> been accepted in 2011 and 2012, respectively.
Sufficient for what? I was just saying it would be easier to read online in a versioned doc than in the bodies of a series of e-mails.

--
Rowan Collins
[IMSoP]
  101582
January 10, 2018 16:54 cmbecker69@gmx.de ("Christoph M. Becker")
On 10.01.2018 at 17:22, Rowan Collins wrote:

> On 10 January 2018 at 15:54, Christoph M. Becker <cmbecker69@gmx.de> wrote:
>
>> On 10.01.2018 at 15:09, Rowan Collins wrote:
>>
>>> I'll also echo a previous request that you apply for a wiki account to
>>> make your document more readable; or maybe just put it as a github gist or
>>> on your own website, and treat it as more of a wishlist and discussion
>>> piece than a spec that core developers are going to commit to.
>>
>> That is, however, not necessarily sufficient. There are already several
>> accepted RFC with pending implementation[1], the oldest of which had
>> been accepted in 2011 and 2012, respectively.
>
> Sufficient for what? I was just saying it would be easier to read online in
> a versioned doc than in the bodies of a series of e-mails.
Sorry for badly quoting. I fully agree that using another medium for drafting the RFC other than this list would be preferable. However, I wanted to point out that it might not even make sense to do so, until someone has been found who is willing and able :) to actually write a suitable implementation.

--
Christoph M. Becker
  101584
January 10, 2018 17:01 pollita@php.net (Sara Golemon)
On Thu, Jan 4, 2018 at 8:21 PM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
> I think you, and many others, commenting here, should start by looking
> at the engine implementation. Any successful RFC needs to have a strong
> implementation behind it, or at the very least a very detailed description of
> how the implementation would mesh with the existing engine code.
>
> The reason we don’t have typed properties/variables is that it would
> require adding type checks on almost every access to the underlying
> zval. That is a huge perf hit compared to only doing it on method/function
> egress points as we do now.
I'm going to underline Rasmus' comment here. zval assignment is a deep/core element of what the engine does. Even when it's not
  101585
January 10, 2018 17:08 pollita@php.net (Sara Golemon)
On Wed, Jan 10, 2018 at 12:01 PM, Sara Golemon <pollita@php.net> wrote:
> On Thu, Jan 4, 2018 at 8:21 PM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:
>> I think you, and many others, commenting here, should start by looking
>> at the engine implementation. Any successful RFC needs to have a strong
>> implementation behind it, or at the very least a very detailed description of
>> how the implementation would mesh with the existing engine code.
>>
>> The reason we don’t have typed properties/variables is that it would
>> require adding type checks on almost every access to the underlying
>> zval. That is a huge perf hit compared to only doing it on method/function
>> egress points as we do now.
>
> **agh-mistabbed into a send
I'm going to underline Rasmus' comment here. zval assignment is a deep/core element of what the engine does. Even when it's not a literal `$x = "foo";` in userspace, zvals are flying around the engine constantly. Adding so much as a Z_TYPEINFO_P(val) & ZVAL_FLAG_STRICT check to EVERY ONE OF THOSE accesses is both heavy-weight and massively complex. On the order of the php-ng rewrite complexity, because EVERY assignment needs to be dealt with, and there WILL be some misbehaving extension out there which gets it wrong.

The implementation essentials are not a trivial part of such a feature. You don't need to have the entire implementation written and tested, but you do need to have a clear plan for how and what will be done and, vitally, what the impact of that plan will be. You can't just wave your hands and say: "We'll sort this out..."

"How does HackLang do this?" has been asked of me offline, so I want to put my answer here: IT DOESN'T. HackLang relies on static analysis to prove "$x will never be assigned a non-integer, so we can always assume it's an integer". This is done by the static analysis tool before the site is ever run, not at runtime. Why? Because the HHVM team could see the same thing Rasmus is telling you. Runtime type enforcement is damned expensive.

-Sara
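For readers less familiar with the current behaviour referred to in the quoted text, the following short sketch shows where PHP 7 actually checks declared types: at function boundaries (parameters on the way in, return values on the way out), not on assignments inside the body. The function names `label()` and `broken()` are made up for the example.

```php
<?php
function label(int $id): string
{
    // The int check already ran when the call was made. This
    // reassignment is not checked against the parameter type.
    $id = 'item-' . $id;
    return $id;        // checked against the declared string return type
}

function broken(): int
{
    return 'not a number';   // checked on the way out: throws TypeError
}

$label = label(42);

try {
    broken();
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}
```

Here `$label` ends up as the string `item-42` even though `$id` was declared `int`, while `broken()` fails only at the point where its value leaves the function. Enforcing the declared type on the reassignment inside `label()` is the per-assignment check that the message above describes as expensive.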