Re: [PHP-DEV] Concept: "arrayable" pseudo type and \Arrayable interface

  107811
November 17, 2019 19:27 rowan.collins@gmail.com (Rowan Tommins)
[Note: I've included the full text of the previous message as it wasn't 
sent to the list.]

On 17/11/2019 18:35, Aimeos | Norbert Sendetzky wrote:
>> It feels like there are two alternative suggestions here. Not only do they use the same keyword to mean different things, but "convertible to array", and "usable as array" seem like distinct concepts to me.
>
> The name of the interface isn't perfect because it may be too close to the type hint and therefore misleading. Suggestions for a better name are always welcome.
>
> The concept contains three parts:
> - arrayable type hint: Use objects and arrays alike
> - \Arrayable interface: To enable implementing objects that are "arrayable"
> - __toArray(): To convert objects to arrays
Like I say, the name is part of it, but the purposes also overlap the way you describe them. It makes more sense to me to have two separate features for two separate use cases:

- If I just want to know that the object I've received can be iterated, counted, and accessed using square brackets, then I need either a special type hint or a new interface, but not both.
- On the other hand, if I just want to know I can call toArray or __toArray on an object, I don't care whether it implements Iterator, Countable, etc as well.
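The two use cases above can be told apart in userland today. The following is only a sketch: `is_array_like()` and the `Invoice` class are hypothetical names invented for illustration, not part of the proposal or of PHP.

```php
<?php
// Hypothetical helper: true when $value can be iterated, counted and
// indexed with square brackets (the "usable as an array" case).
function is_array_like($value): bool
{
    return is_array($value)
        || ($value instanceof ArrayAccess
            && $value instanceof Countable
            && $value instanceof Traversable);
}

// Hypothetical class for the other case: it can produce an array on
// request, but implements none of the array-like interfaces.
final class Invoice
{
    private $lines = ['widget' => 2];

    public function toArray(): array
    {
        return $this->lines;
    }
}

var_dump(is_array_like([1, 2, 3]));
var_dump(is_array_like(new ArrayObject([1])));
var_dump(is_array_like(new Invoice()));
```

`ArrayObject` passes the check because it implements all three interfaces; `Invoice` does not, even though it is "convertible to array".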
>> For the "convertible to array" case, I think __toArray, or an interface specifying just that one method, would make more sense than combining it with the existing interfaces. I'm sceptical of that concept, though, because most objects could be converted to many different arrays in different circumstances, each of which should be given a different and descriptive name.
>
> Can you give some examples for different conversions?

Well, pretty much any method that returns array *could* be called "to array", but not many would be good candidates for such a generic name. You might return an array structure to be incorporated into a JSON response, or to be passed to a template renderer, or to map to database columns, etc.

The only case I can think of where a generic "toArray" method would make sense is if you're creating a general-purpose "collection" or "list" object, in which case you're probably better off directly implementing methods like map and sort, rather than encouraging users to convert it back to a plain array. It doesn't seem like a common enough use case to need a new language feature, when it's simple enough to write "$foo->toArray()" without anything new.

> The possibility to pass arrayable objects to the array_* methods should be out of scope for the moment and is better discussed later to keep the pieces small, I think. You are totally right, there may be some unexpected behavior when doing that.

The reason I mentioned it is that without it the new type hint or interface seems rather limited: I can't imagine many "array" type constraints being replaced with "arrayable" if you had to remember not to pass the variable to array_map, sort, etc.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]
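To illustrate the limitation with the array_* functions: array_map() only accepts real arrays, so an array-like object has to be converted first. This is a minimal sketch; `map_array_like()` is a hypothetical helper name, and the conversion shown is just one possible workaround.

```php
<?php
// Hypothetical helper: array_map() rejects objects such as ArrayObject,
// so array-like input is converted to a plain array before mapping.
function map_array_like(callable $fn, $items): array
{
    $plain = is_array($items) ? $items : iterator_to_array($items);

    return array_map($fn, $plain);
}

$double = function (int $n): int {
    return $n * 2;
};

print_r(map_array_like($double, [1, 2, 3]));
print_r(map_array_like($double, new ArrayObject([1, 2, 3])));
```

Every call site that uses array_map, sort and friends would need a similar conversion step if it accepted "arrayable" values instead of arrays.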
  107812
November 17, 2019 23:01 mike@newclarity.net (Mike Schinkel)
> On Nov 17, 2019, at 2:27 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> Well, pretty much any method that returns array *could* be called "to array", but not many would be good candidates for such a generic name. You might return an array structure to be incorporated into a JSON response, or to be passed to a template renderer, or to map to database columns, etc.
>
> The only case I can think of where a generic "toArray" method would make sense is if you're creating a general-purpose "collection" or "list" object, in which case you're probably better off directly implementing methods like map and sort, rather than encouraging users to convert it back to a plain array. It doesn't seem like a common enough use case to need a new language feature, when it's simple enough to write "$foo->toArray()" without anything new.

I have been pondering that objection ever since Steven Wade proposed __toArray() on the list a few months back. In part I agree with the sentiment, and in part I feel that the perfect is being the enemy of the good here. But as I had no strong argument for how it could be done better I did not say anything on the matter.

Consider this: __toArray() is hardly a rare case where a short name can be applied in multiple contexts. We have infinite contexts where we need to name methods for one context that might conflict with others, and so I have been agonizing for years over how to deal with this conundrum without resorting to really long method names.

(As an aside, I think long method names result in harder-to-comprehend code, and as I have never seen a set of conventions that results in multiple programmers (almost) always choosing the same long names for the same use-cases, I prefer our standards to suggest shorter names so that we can maintain more consistency across programmers.)

Recently I have been experimenting with using namespaces instead of long method names, and I think using them can result in the best of both worlds and resolve your concern. Consider the following classes, each of which could have their own __toArray() method specific to their use-case:

    \Widgets\Widget
    \Widgets\JSON\Widget
    \Widgets\Mustache\Widget
    \Widgets\DbColumns\Widget

With the above you can "have your cake and eat it too." Your namespace can specify the exact context for your __toArray() and thus developers who want to use that architecture can get the benefits of a __toArray() magic method.

#jmtcw

-Mike
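A rough sketch of the namespace idea follows. It assumes the namespaced classes wrap a core Widget, which the message does not spell out; the constructor arguments and array shapes are invented for illustration. Note that __toArray() is not a magic method in current PHP, so it is called explicitly here.

```php
<?php
// Sketch only. __toArray() is NOT a magic method in current PHP, so it is
// called explicitly below; an (array) cast would not invoke it.
// Several namespaces share one file purely to keep the example short.

namespace Widgets;

class Widget
{
    public $id;
    public $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

namespace Widgets\JSON;

class Widget
{
    private $widget;

    public function __construct(\Widgets\Widget $widget)
    {
        $this->widget = $widget;
    }

    // Assumed shape for a JSON response.
    public function __toArray(): array
    {
        return ['id' => $this->widget->id, 'name' => $this->widget->name];
    }
}

namespace Widgets\DbColumns;

class Widget
{
    private $widget;

    public function __construct(\Widgets\Widget $widget)
    {
        $this->widget = $widget;
    }

    // Assumed column names for a database row.
    public function __toArray(): array
    {
        return ['widget_id' => $this->widget->id, 'widget_name' => $this->widget->name];
    }
}

namespace Demo;

use Widgets\JSON\Widget as JsonWidget;
use Widgets\DbColumns\Widget as DbWidget;

$widget = new \Widgets\Widget(7, 'Sprocket');

print_r((new JsonWidget($widget))->__toArray());
print_r((new DbWidget($widget))->__toArray());
```

The method name is identical in both classes; only the namespace (or the import alias) tells the reader which array shape to expect.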
  107815
November 17, 2019 23:51 rowan.collins@gmail.com (Rowan Tommins)
On 17/11/2019 23:01, Mike Schinkel wrote:
> Consider this: __toArray() is hardly a rare case where a short name can be applied in multiple contexts. We have infinite contexts where we need to name methods for one context that might conflict with others, and so I have been agonizing for years over how to deal with this conundrum without resorting to really long method names.
I'm not sure avoiding the name "toArray" necessarily leads to "really long method names" - even with extremely specific distinctions, you don't need to call the method "toJsonArrayForVersion5OfTheApi", just "toV5Json" or "getV5Array" or "formatForJsonV5". The important thing is that the method name is now communicating its *purpose*, whereas "toArray" communicates only its return type, and a hugely flexible return type at that.
> Recently I have been experimenting with using namespaces instead of long method names, and I think using them can result in the best of both worlds and resolve your concern. Consider the following classes, each of which could have their own __toArray() method /specific/ to their use-case:
>
>     \Widgets\Widget
>     \Widgets\JSON\Widget
>     \Widgets\Mustache\Widget
>     \Widgets\DbColumns\Widget

I'm not clear what these objects represent. If I have a Widget object passed out from some business logic, how do I make use of these other classes? Would I have to call "(array)(new \Widgets\Mustache\Widget($myWidget))", as sugar for "(new \Widgets\Mustache\Widget($myWidget))->__toArray()"?

If so, I don't really see the benefit of the magic method over just standardising a method name, like "interface MustacheFormatter { public function getData(): array; }"

Which is basically my objection to __toArray() - I can't think of many situations where writing (array)$foo saves or gains you anything over writing $foo->asArray() or $foo->somethingMoreSpecific()

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]
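The interface alternative mentioned above might look like the following sketch. The interface name and method come from the message; the `Widget` and `MustacheWidget` classes, the `render_title()` function and the array shape are assumptions made up for the example.

```php
<?php
// Interface as written in the message.
interface MustacheFormatter
{
    public function getData(): array;
}

// Hypothetical domain object.
final class Widget
{
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Hypothetical adapter that prepares a Widget for a template.
final class MustacheWidget implements MustacheFormatter
{
    private $widget;

    public function __construct(Widget $widget)
    {
        $this->widget = $widget;
    }

    public function getData(): array
    {
        return ['title' => strtoupper($this->widget->name)];
    }
}

// The parameter type states which kind of array the caller will get.
function render_title(MustacheFormatter $formatter): string
{
    $data = $formatter->getData();

    return 'Title: ' . $data['title'];
}

echo render_title(new MustacheWidget(new Widget('sprocket'))), PHP_EOL;
```

Here the engine checks the contract through the type declaration, rather than the caller relying on a naming convention.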
  107816
November 18, 2019 00:18 mike@newclarity.net (Mike Schinkel)
> On Nov 17, 2019, at 6:51 PM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> I'm not sure avoiding the name "toArray" necessarily leads to "really long method names" - even with extremely specific distinctions, you don't need to call the method "toJsonArrayForVersion5OfTheApi", just "toV5Json" or "getV5Array" or "formatForJsonV5".
I now realize that my commenting on my experience in reviewing legacy code (where long names are frequently used, regardless of the fact they are not required) caused you to focus on the long naming comment aside and not on the primary ask for consistency.

Even if short, a code base littered with method names like toV5Json() or getV5Array() or formatForJsonV5() is still inconsistent.
> The important thing is that the method name is now communicating its *purpose*, whereas "toArray" communicates only its return type, and a hugely flexible return type at that.
What I was suggesting instead, which obviously was not clear, is that you communicate the *purpose* with the namespace, and that doing so allows the capability of casting to array to be added. Otherwise the perfect is the enemy of the good.
> I'm not clear what these objects represent. If I have a Widget object passed out from some business logic,
Similarly, it is not clear to me what toV5Json() or getV5Array() or formatForJsonV5() mean. However, if V5 somehow has meaning then your namespaced class could just as easily be named accordingly, if that is what you want:

    \Widgets\ToV5Json\Widget
> how do I make use of these other classes? Would I have to call "(array)(new \Widgets\Mustache\Widget($myWidget))", as sugar for "(new \Widgets\Mustache\Widget($myWidget))->__toArray()"?
Maybe we code differently, but I would never write a full namespace in code like that. I would have one of the following at the top of my PHP file:

    use \Widgets\Mustache\Widget;
    use \Widgets\Mustache\Widget as MustacheWidget;

That would result in one of the following as (as you say) sugar for (new \Widgets\Mustache\Widget($myWidget))->__toArray():

    (array)(new Widget($myWidget))
    (array)(new MustacheWidget($myWidget))

However, I doubt I would ever instantiate and then cast the object in the same line of code. Instead the value of array casting, to me, is in its polymorphic nature, where I can have a "generic" method that can just cast to an array instead of having to know whether the developer used toV5Json() or getV5Array() or formatForJsonV5().
> If so, I don't really see the benefit of the magic method over just standardising a method name, like "interface MustacheFormatter { public function getData(): array; }"
Backward compatibility is the benefit. If we standardize on any one name then we can break existing code. Adding a magic method cannot break existing code unless that code violated the reserved nature of double underscore methods.

Or are you proposing we start adding __*() methods that are not actually magic?
> Which is basically my objection to __toArray() - I can't think of many situations where writing (array)$foo saves or gains you anything over writing $foo->asArray() or $foo->somethingMoreSpecific()
But I and others can think of situations where it would help.

Because neither asArray() nor somethingMoreSpecific() is a standard, you don't get polymorphism with differently named methods.

And any new standard non-magic method we add has the potential to break BC. And is unlike anything else in PHP I can think of.

OTOH, I have a hard time thinking of a scenario where __toArray() would actually cause a problem for someone like you who cannot think of a benefit. Can you present a use-case showing how it would cause you a tangible problem that could not be resolved with namespaces or by just creating your own somethingMoreSpecific() method and ignoring __toArray()?

-Mike
  107820
November 18, 2019 10:53 rowan.collins@gmail.com (Rowan Tommins)
On Mon, 18 Nov 2019 at 00:18, Mike Schinkel <mike@newclarity.net> wrote:

> I now realize that my commenting on my experience in reviewing legacy code (where long names are frequently used, regardless of the fact they are not required) caused you to focus on the long naming comment aside and not on the primary ask for consistency.
>
> Even if short, a code base littered with method names like toV5Json() or getV5Array() or formatForJsonV5() is still inconsistent.
Inconsistent with what? Unless you suggest we introduce enough magic methods that every method name in your application begins with two underscores, you are always going to have to name things. If there is a common requirement for objects to be converted to a particular type of array, then you should pick a standard name for that and enforce it via code review; or better, make all those objects implement an interface, and the compiler will enforce it for you. That seems no harder to me than enforcing a convention that __toArray should always be included, and have a particular meaning.
> What I was suggesting instead, which obviously was not clear, is that you communicate the *purpose* with the namespace, and that doing so allows the capability of casting to array to be added. Otherwise the perfect is the enemy of the good.
I'm not asking for any kind of perfection, I'm just saying names should be meaningful. Bear in mind that namespaces are really just part of the class name, so all you're really saying is that you like long specific class names and short generic method names.
> Similarly, it is not clear to me what toV5Json() or getV5Array() or formatForJsonV5() mean.
I was imagining an API that had gone through different versions, so needed methods to serialize objects into the JSON published in version 5 of a specification.

>> how do I make use of these other classes? Would I have to call "(array)(new \Widgets\Mustache\Widget($myWidget))", as sugar for "(new \Widgets\Mustache\Widget($myWidget))->__toArray()"?
>
> Maybe we code differently, but I would never write a full namespace in code like that. I would have one of the following at the top of my PHP file:
Yes, that example was just me being lazy when typing the example, sorry. I was trying to understand what the "MustacheWidget" class did, and it seems I was correct in seeing it as an adapter.
>> If so, I don't really see the benefit of the magic method over just standardising a method name, like "interface MustacheFormatter { public function getData(): array; }"
>
> Backward compatibility is the benefit. If we standardize on any one name then we can break existing code.
If your code base is such a mess that you can't propose any method name without it conflicting, and can't trace usages of any method that needs renaming to not conflict, then yes, you get a one-off benefit from the fact that __ is reserved. That seems a pretty poor justification for a language feature.

Since you're suggesting separate classes for each transform, how many public methods do they have? Would making everything in your *\Mustache\* namespaces implement a MustacheFormatter interface actually be harder than adding a __toArray method to them all?

One huge benefit I haven't mentioned yet is the ability to add parameters. Imagine if your mustache formatters are used in both plain text and HTML contexts, and that affects some of the data to return. If you have a MustacheFormatter interface, you could alter it to require "getData(bool $forHtml=false)", and only change the implementations where it mattered. With __toArray(), you are limited to one parameter-less method per class, so would have to create new copies of every single formatter, in a new MustacheHtml namespace.

Thinking about it, if you're only going to allow one method per class, with the class's namespaced name telling you what it does, you might as well use __invoke():

    use Foo\Mustache\Widget as MustacheWidget;
    $tplWidget = new MustacheWidget($myWidget);
    $output = $tplWidget(); // calls Foo\Mustache\Widget::__invoke()
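The parameter argument can be sketched as follows. The interface and the `$forHtml` parameter come from the message; `CommentFormatter` and `IdFormatter` are hypothetical implementations. In PHP every implementing class has to declare the parameter to stay compatible with the interface, but only the classes whose output differs need to act on it.

```php
<?php
// Interface with the optional parameter suggested in the message.
interface MustacheFormatter
{
    public function getData(bool $forHtml = false): array;
}

// Hypothetical formatter whose output depends on the context.
final class CommentFormatter implements MustacheFormatter
{
    private $body;

    public function __construct(string $body)
    {
        $this->body = $body;
    }

    public function getData(bool $forHtml = false): array
    {
        return ['body' => $forHtml ? htmlspecialchars($this->body) : $this->body];
    }
}

// Hypothetical formatter that declares the flag but ignores it.
final class IdFormatter implements MustacheFormatter
{
    private $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    public function getData(bool $forHtml = false): array
    {
        return ['id' => $this->id];
    }
}

$comment = new CommentFormatter('a < b');

print_r($comment->getData());
print_r($comment->getData(true));
print_r((new IdFormatter(5))->getData(true));
```

A parameter-less __toArray() has no equivalent of the `true` argument, which is the limitation described above.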
> Because neither asArray() nor somethingMoreSpecific() is a standard, you don't get polymorphism with differently named methods.
You can only rely on __toArray if you standardise on every object implementing it; at which point, you can standardise on any name you like. People are quite happily using polymorphism with named methods in literally millions of OO code bases.
> And any new standard non-magic method we add has the potential to break BC. And is unlike anything else in PHP I can think of.
I'm not suggesting *PHP* standardises on some other name, just that *your code base* standardises on common names for common tasks.
> OTOH, I have a hard time thinking of a scenario where __toArray() would actually cause a problem for someone like you who cannot think of a benefit. Can you present a use-case showing how it would cause you a tangible problem that *could not be resolved* with namespaces or by just creating your own somethingMoreSpecific() method and ignoring __toArray()?
Well, I started this sub-thread saying I was "sceptical" rather than "strongly opposed". If the feature was added, I would simply ignore it, and probably argue against its use in code review and style guides.

However, other people have pointed out that unlike (string)$object, (array)$object does have a default behaviour, and adding an overload for it has the potential to break code relying on that. So it's not an entirely zero-cost feature.

Regards,

--
Rowan Tommins
[IMSoP]
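The difference in default behaviour can be seen with a small example. This sketch assumes PHP 7.4 or later, where a failed string conversion throws an Error; the `Point` class is hypothetical.

```php
<?php
// Hypothetical class with one public and one private property.
final class Point
{
    public $x = 1;
    private $secret = 2;
}

$point = new Point();

// (array) already works on any object: it exposes the properties,
// with private names prefixed by "\0ClassName\0".
$asArray = (array) $point;
var_dump(array_keys($asArray));

// (string) has no such fallback: without __toString() it fails.
try {
    $asString = (string) $point;
} catch (Error $e) {
    $asString = null;
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

Code that relies on the existing property dump from `(array) $point` is what an overload would change.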
  107822
November 18, 2019 17:55 mike@newclarity.net (Mike Schinkel)
> On Nov 18, 2019, at 5:53 AM, Rowan Tommins <rowan.collins@gmail.com> wrote:
>
> On Mon, 18 Nov 2019 at 00:18, Mike Schinkel <mike@newclarity.net> wrote:
>> Even if short, a code base littered with method names like toV5Json() or getV5Array() or formatForJsonV5() is still inconsistent.
>
> Inconsistent with what?

Inconsistent with each other. If one developer names it toV5Json() and another developer names it getV5Array() then those two developers' choices are inconsistent.

Yes, a team lead could require an interface be used for consistency across a team, which is fine, but it is not consistent across unrelated projects. Having worked with GoLang for a while, where interfaces are not required to be explicitly named by classes that implement them, I have learned that having consistency across unrelated projects can result in some surprisingly serendipitous reuse scenarios.
> Unless you suggest we introduce enough magic methods that every method name in your application begins with two underscores, you are always going to have to name things.
Forgive me for saying, but that argument is bordering on reductio ad absurdum. We are talking about a magic method for converting to a built-in data type.

In PHP, ignoring null, there are only seven built-in data types. We already have a magic method for one of them (string), and having one for resource makes no sense to me since it is such a special case, so the only ones for which most instances of a declared class can be fully captured are array and object. IOW, two (2) potential magic methods: __toArray() and either __toObject() or __toStdClass().

That said, I could see a value in having __toInt(), __toFloat(), and __toBool() as having them would allow us to represent scalar data types in classes. That could be extremely useful in selected contexts.

So given the argument there would at maximum be five (5) new magic methods that could be added vs. the infinite methods you assert this approach would invite.

P.S. That said, I could see adding more for well-known data that goes beyond existing data types, e.g. JSON, XML, HTML, CSV, etc. but really if we did that I would argue that PHP should create a first-class built-in data type for them first.
> If there is a common requirement for objects to be converted to a particular type of array, then you should pick a standard name for that
You assert that a developer "should" do it the way you believe is correct, but what is the arbiter of "should" vs. "should not" here? Is there some fundamental best practice that the industry has universally embraced that I am unfamiliar with, or just your opinion? (I am not trying to be sarcastic, but I am challenging you to justify your use of "should.")
> I'm not asking for any kind of perfection, I'm just saying names should be meaningful. Bear in mind that namespaces are really just part of the class name, so all you're really saying is that you like long specific class names and short generic method names.
Absolutely agree that names need to be meaningful, which is why I want more control, not less. What you are implicitly saying is that rather than allow others the option to use __toArray() which by its nature could result in consistent use across unrelated projects you want to deny others who value it from being able to take advantage of it because you personally do not see value in it and would not use it. Or did I misrepresent?
>> Similarly, it is not clear to me what toV5Json() or getV5Array() or formatForJsonV5() mean.
>
> I was imagining an API that had gone through different versions, so needed methods to serialize objects into the JSON published in version 5 of a specification.

God forbid someone uses that kind of naming for their methods. Versioning screams out for encoding into namespace names, not method names. Why burden the developer with the requirement to use the version name in every single method call? Better to isolate it in a use statement.

BUT, this is a tangent and a different debate from the discussion about __toArray().
> If your code base is such a mess that you can't propose any method name without it conflicting, and can't trace usages of any method that needs renaming to not conflict, then yes, you get a one-off benefit from the fact that __ is reserved. That seems a pretty poor justification for a language feature.

I was not talking about *my codebase*, I was talking about the collective entirety of all existing PHP codebases.

So I now understand that you were proposing even less than I thought you were proposing. I thought you were proposing that PHP would recognize and give special meaning to a method called toArray() or similar. Now I realize you were proposing even less and instead placing the burden of said standardization on each project, guaranteeing no serendipitous reuse across projects. :-(
> One huge benefit I haven't mentioned yet is the ability to add parameters. Imagine if your mustache formatters are used in both plain text and HTML contexts, and that affects some of the data to return.
When you have the need for parameters, by all means create a named method. But why block the use cases that do not need parameters? Why not simply let developers who find __toArray() useful be able to use them?
> If you have a
>
> Thinking about it, if you're only going to allow one method per class, with the class's namespaced name telling you what it does, you might as well use __invoke():

But it is reasonable to expect that a class could need both an __invoke() and a __toArray() method.

Also, there is no guarantee that an object that implements __invoke() will return an array, whereas there should be a reasonable guarantee that if method_exists($object, '__toArray') is true then casting to (array) would be valid.
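The existence check described here can be emulated in userland with method_exists(), the built-in that tests for a method on an object. The sketch below is not the proposed engine behaviour: `to_array()` is a hypothetical helper, and __toArray() is treated as an ordinary method because current PHP gives it no special meaning.

```php
<?php
// Hypothetical helper: prefer an object's own __toArray(), otherwise
// fall back to PHP's built-in (array) cast.
function to_array($value): array
{
    if (is_object($value) && method_exists($value, '__toArray')) {
        return $value->__toArray();
    }

    return (array) $value;
}

// Hypothetical class that opts in to the convention.
final class Tags
{
    private $tags = ['php', 'rfc'];

    public function __toArray(): array
    {
        return $this->tags;
    }
}

// Hypothetical class that does not.
final class Plain
{
    public $a = 1;
}

print_r(to_array(new Tags()));
print_r(to_array(new Plain()));
print_r(to_array('x'));
```

The check only says that an array comes back; it says nothing about what that array represents.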
> You can only rely on __toArray if you standardise on every object implementing it; at which point, you can standardise on any name you like.
But method name standardization does not work across unrelated projects. Because programmers. Casting to (array) would work across unrelated projects.
> People are quite happily using polymorphism with named methods in literally millions of OO code bases.

https://en.wikipedia.org/wiki/Status_quo_bias
> I'm not suggesting *PHP* standardises on some other name, just that *your code base* standardises on common names for common tasks.
And therein is where your argument breaks down compared to the argument for __toArray(). It assumes each team's and/or project's codebase is an island.
> Well, I started this sub-thread saying I was "sceptical" rather than "strongly opposed". If the feature was added, I would simply ignore it, and probably argue against its use in code review and style guides.

I think I have discovered another difference between you and me. When someone proposes a feature I can easily ignore, I do not write long emails arguing against it. #justsaying :-)
> However, other people have pointed out that unlike (string)$object, (array)$object does have a default behaviour, and adding an overload for it has the potential to break code relying on that. So it's not an entirely zero-cost feature.

I do not see this as an issue because this BC break is entirely opt-in by the developer.

It is possible a library developer could add a __toArray() in a new version of an open-source library where one did not previously exist and thus break others' code that depended on the previous version of the library. That would be no different than if the library developer changed the behavior of one of their existing methods. So that would be on the library developer, not on PHP.

-Mike
  107824
November 18, 2019 19:09 rowan.collins@gmail.com (Rowan Tommins)
On Mon, 18 Nov 2019 at 17:55, Mike Schinkel <mike@newclarity.net> wrote:

> Yes, a team lead could require an interface be used for consistency across a team, which is fine, but it is not consistent across unrelated projects. Having worked with GoLang for a while where interfaces are not required to be explicitly named by classes that implement them I have learned that having consistency across unrelated projects can result in some surprisingly serendipitous reuse scenarios.
That's actually precisely why I *don't* like generic names like __toArray(). How do I know looking at it whether this particular method will give me something suitable for passing to a Mustache template, or just a list of IDs used in some cross-referencing algorithm? The only thing I know from the name is "it returns an array".
> Forgive me for saying, but that argument is bordering on reductio ad absurdum. We are talking about a magic method for converting to a *built-in* data type.
>
> ...
>
> That said, I could see a value in having __toInt(), __toFloat(), and __toBool() as having them would allow us to represent scalar data types in classes. That could be extremely useful in selected contexts.
>
> So given the argument there would at maximum be five (5) new magic methods that could be added vs. the infinite methods you assert this approach would invite.
I didn't say it would *invite* infinite methods, I said it would only give the claimed benefits if you had a huge number of magic / "standard" methods, because you need to "standardise" every method you have in your current code base.

__toBool() is perhaps a good comparison. Would you consider renaming a method called "isSuccessful()" to "__toBool()" a positive change? Because I would not.

>> If there is a common requirement for objects to be converted to a particular type of array, then you should pick a standard name for that
>
> You assert that a developer *"should"* do it the way you believe is correct, but what is the arbiter of *"should"* vs. *"should not"* here? Is there some fundamental best practice that the industry has universally embraced that I am unfamiliar with, or just your opinion?
Are you saying that you *don't* believe consistency is a good thing? Because that's all this sentence really says: "if you have something you do a lot, you should do it in a consistent way". The justification for that can be found in the introduction to any style guide you can lay your hands on.
> Absolutely agree that names need to be meaningful, which is why I want more control, not less.
>
> What you are implicitly saying is that rather than allow others the option to use __toArray() which by its nature could result in consistent use across unrelated projects you want to deny others who value it from being able to take advantage of it because you personally do not see value in it and would not use it. Or did I misrepresent?
I think you are misrepresenting slightly. I haven't actually said I want to "deny" or "block" this addition. What I have said is that I am unconvinced by the arguments being put forward in favour of it, and have tried to point out what I see as flaws in those arguments. In particular, if your claim is that it will make code more consistent across code bases, it's no use if you're the only person that adopts it. So let me rephrase from a personal objection to a prediction: I predict that people will not use this feature consistently in a way that will bring the benefits you claim for it.
> So I now understand that you were proposing even less than I thought you were proposing. I thought you were proposing that PHP would recognize and give special meaning to a method called toArray() or similar. Now I realize you were proposing even less and instead placing the burden of said standardization on each project, guaranteeing no serendipitous reuse across projects. :-(
There is certainly benefit to standardising things across projects, and that's what organisations like PHP-FIG try to do. However, standardisation is more useful the more specific it is, and if the definition of "toArray" is just "it returns an array that in some way represents the object", I'm struggling to see how that would lead to useful interoperability.
> But it is reasonable to expect that a class could need both an __invoke() and a __toArray() method.
It's also reasonable to expect that a class could need two methods that return arrays; you've already "solved" that, by splitting the methods among more classes and namespaces so they don't need unique names any more.
> Also, there is no guarantee that an object that implements __invoke() will return an array, whereas there should be a reasonable guarantee that if method_exists($object, '__toArray') is true then casting to (array) would be valid.

Which brings us back to square one: knowing that a method returns an array isn't enough; I need to know what kind of array that is, and the thing that normally tells me that is the method's name.

Regards,

--
Rowan Tommins
[IMSoP]