Convert ext/xml to use an object instead of resource

  104361
February 12, 2019 15:00 nikita.ppv@gmail.com (Nikita Popov)
Hi internals,

The ext/xml extension currently has GC issues, see
https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of xml_parser
will usually result in a cyclic structure, but resources do not support
cycle GC. This means that the user is required to take care of breaking GC
cycles manually, which is very unusual for PHP.

I would like to port the xml extension from using a resources towards using
an object, which is fully GC integrated. This is implemented in
https://github.com/php/php-src/pull/3526.

An XmlParser class is used instead of a resource. The class is final and
cannot be manually constructed. It is still used with the normal
xml_parser_* APIs. This is intended as an internal representation change,
not a conversion to OO APIs. The xml_parser_free() method becomes
effectively a no-op and may be deprecated in some future version.

This change is intended for PHP 7.4. While it is technically BC breaking
(in that code that goes out of the way to use is_resource or similar may
break), but we've done a couple of these resource->object migration in
minor versions in the past (ref
https://wiki.php.net/rfc/operator_overloading_gmp and
https://wiki.php.net/rfc/hash-context.as-resource).

Any thoughts?

Regards,
Nikita
  104362
February 12, 2019 15:29 bishop@php.net (Bishop Bettini)
On Tue, Feb 12, 2019 at 10:00 AM Nikita Popov ppv@gmail.com> wrote:

> The ext/xml extension currently has GC issues, see > https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of > xml_parser > will usually result in a cyclic structure, but resources do not support > cycle GC. This means that the user is required to take care of breaking GC > cycles manually, which is very unusual for PHP.
> I would like to port the xml extension from using a resources towards using > an object, which is fully GC integrated. This is implemented in > https://github.com/php/php-src/pull/3526. > > An XmlParser class is used instead of a resource. The class is final and > cannot be manually constructed. It is still used with the normal > xml_parser_* APIs. This is intended as an internal representation change, > not a conversion to OO APIs. The xml_parser_free() method becomes > effectively a no-op and may be deprecated in some future version. > > This change is intended for PHP 7.4. While it is technically BC breaking > (in that code that goes out of the way to use is_resource or similar may > break), but we've done a couple of these resource->object migration in > minor versions in the past (ref > https://wiki.php.net/rfc/operator_overloading_gmp and > https://wiki.php.net/rfc/hash-context.as-resource). >
+1 for movement away from resources, generally. Resources represent connections to external resources, in a manner that's opaque to userland [1]. Would it make sense to update is_resource (and friends) to be aware that "resources" returned from xml_parser_* are not resources proper, but rather resources nominal? If userland needed to strictly determine what was a resource proper and what was a resource nominal, we could improve the signature: is_resource(mixed $var, bool $strict = false). But, honestly, the aforementioned opaqueness makes me think this wouldn't be necessary. [1]: http://php.net/manual/en/language.types.resource.php
  104365
February 12, 2019 15:56 johannes@schlueters.de (Johannes =?ISO-8859-1?Q?Schl=FCter?=)
On Di, 2019-02-12 at 10:29 -0500, Bishop Bettini wrote:
> Would it make sense to update is_resource > (and friends) to be aware that "resources" returned from xml_parser_* > are not resources proper, but rather resources nominal? > > If userland needed to strictly determine what was a resource proper > and what was a resource nominal, we could improve the signature: > is_resource(mixed $var, bool $strict = false). But, honestly, the > aforementioned opaqueness makes me think this wouldn't be necessary.
How do you distinguish an object with a "resource" from one not representing such a thing? - Any object? - All internal ones? - Add a class_entry flag? - Hardcode a list? None of those really make sense to me. The `is_*` routines look at PHP's type system, the information for general use is limited (we don't have ReflectionResource as well ... while that couldn't do much anyways as a resource is only a void*) The only use case here seems to be    $foo = xml_*();    if (!s_resource($foo)) error(); Which is limited, since the only xml function returning a resource seems to be xml_parser_create() which will always succeed (except out of memory or similar) johannes
  104366
February 12, 2019 15:58 kontakt@beberlei.de (Benjamin Eberlei)
Johannes Schlüter <johannes@schlueters.de> schrieb am Di. 12. Feb. 2019 um
16:57:

> On Di, 2019-02-12 at 10:29 -0500, Bishop Bettini wrote: > > Would it make sense to update is_resource > > (and friends) to be aware that "resources" returned from xml_parser_* > > are not resources proper, but rather resources nominal? > > > > If userland needed to strictly determine what was a resource proper > > and what was a resource nominal, we could improve the signature: > > is_resource(mixed $var, bool $strict = false). But, honestly, the > > aforementioned opaqueness makes me think this wouldn't be necessary. > > How do you distinguish an object with a "resource" from one not > representing such a thing? > > - Any object? > - All internal ones? > - Add a class_entry flag? > - Hardcode a list?
Extend an interface maybe
> > > None of those really make sense to me. > > The `is_*` routines look at PHP's type system, the information for > general use is limited (we don't have ReflectionResource as well ... > while that couldn't do much anyways as a resource is only a void*) > > The only use case here seems to be > > $foo = xml_*(); > if (!s_resource($foo)) error(); > > Which is limited, since the only xml function returning a resource > seems to be xml_parser_create() which will always succeed (except out > of memory or similar) > > johannes > > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php > >
  104367
February 12, 2019 16:04 rowan.collins@gmail.com (Rowan Collins)
On Tue, 12 Feb 2019 at 15:30, Bishop Bettini <bishop@php.net> wrote:

> +1 for movement away from resources, generally. > > Resources represent connections to external resources, in a manner that's > opaque to userland [1]. Would it make sense to update is_resource (and > friends) to be aware that "resources" returned from xml_parser_* are not > resources proper, but rather resources nominal? >
While such a feature would have limited use in this case, as Johannes points out, it might be a nice general pattern to establish for migrating other resources in future. All such objects could implement an empty "Resource" interface, which couldn't be implemented directly from userland, and just triggered this magic behaviour. There would still be other BC impacts, though; for instance, gettype() would return "object" instead of "resource", unless we overloaded that based on the interface as well, which might be a bit confusing. Regards, -- Rowan Collins [IMSoP]
  104368
February 12, 2019 16:12 bishop@php.net (Bishop Bettini)
On Tue, Feb 12, 2019 at 11:04 AM Rowan Collins collins@gmail.com>
wrote:

> On Tue, 12 Feb 2019 at 15:30, Bishop Bettini <bishop@php.net> wrote: > > > +1 for movement away from resources, generally. > > > > Resources represent connections to external resources, in a manner that's > > opaque to userland [1]. Would it make sense to update is_resource (and > > friends) to be aware that "resources" returned from xml_parser_* are not > > resources proper, but rather resources nominal? > > > > > While such a feature would have limited use in this case, as Johannes > points out, it might be a nice general pattern to establish for migrating > other resources in future. All such objects could implement an empty > "Resource" interface, which couldn't be implemented directly from userland, > and just triggered this magic behaviour. >
Exactly. I hope we would adopt a long-vision, since migrating from resources to objects has been discussed as the desired future state of things.
> There would still be other BC impacts, though; for instance, gettype() > would return "object" instead of "resource", unless we overloaded that > based on the interface as well, which might be a bit confusing. >
I think gettype should return "resource" for BC, but get_resource_type would be free to add more context about the now-object nature.
  104371
February 12, 2019 16:41 cmbecker69@gmx.de ("Christoph M. Becker")
On 12.02.2019 at 17:04, Rowan Collins wrote:

> On Tue, 12 Feb 2019 at 15:30, Bishop Bettini <bishop@php.net> wrote: > >> +1 for movement away from resources, generally. >> >> Resources represent connections to external resources, in a manner that's >> opaque to userland [1]. Would it make sense to update is_resource (and >> friends) to be aware that "resources" returned from xml_parser_* are not >> resources proper, but rather resources nominal? > > While such a feature would have limited use in this case, as Johannes > points out, it might be a nice general pattern to establish for migrating > other resources in future. All such objects could implement an empty > "Resource" interface, which couldn't be implemented directly from userland, > and just triggered this magic behaviour. > > There would still be other BC impacts, though; for instance, gettype() > would return "object" instead of "resource", unless we overloaded that > based on the interface as well, which might be a bit confusing.
And we must not forget that resource to int conversion is clearly defined, while object to int conversion is not. -- Christoph M. Becker
  104369
February 12, 2019 16:17 nikita.ppv@gmail.com (Nikita Popov)
On Tue, Feb 12, 2019 at 4:30 PM Bishop Bettini <bishop@php.net> wrote:

> On Tue, Feb 12, 2019 at 10:00 AM Nikita Popov ppv@gmail.com> > wrote: > >> The ext/xml extension currently has GC issues, see >> https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of >> xml_parser >> will usually result in a cyclic structure, but resources do not support >> cycle GC. This means that the user is required to take care of breaking GC >> cycles manually, which is very unusual for PHP. > > >> I would like to port the xml extension from using a resources towards >> using >> an object, which is fully GC integrated. This is implemented in >> https://github.com/php/php-src/pull/3526. >> >> An XmlParser class is used instead of a resource. The class is final and >> cannot be manually constructed. It is still used with the normal >> xml_parser_* APIs. This is intended as an internal representation change, >> not a conversion to OO APIs. The xml_parser_free() method becomes >> effectively a no-op and may be deprecated in some future version. >> >> This change is intended for PHP 7.4. While it is technically BC breaking >> (in that code that goes out of the way to use is_resource or similar may >> break), but we've done a couple of these resource->object migration in >> minor versions in the past (ref >> https://wiki.php.net/rfc/operator_overloading_gmp and >> https://wiki.php.net/rfc/hash-context.as-resource). >> > > +1 for movement away from resources, generally. > > Resources represent connections to external resources, in a manner that's > opaque to userland [1]. Would it make sense to update is_resource (and > friends) to be aware that "resources" returned from xml_parser_* are not > resources proper, but rather resources nominal? > > If userland needed to strictly determine what was a resource proper and > what was a resource nominal, we could improve the signature: > is_resource(mixed $var, bool $strict = false). But, honestly, the > aforementioned opaqueness makes me think this wouldn't be necessary. > > [1]: http://php.net/manual/en/language.types.resource.php >
Very much opposed to any kind of special handling for is_resource(). We used to do this for is_object() and __PHP_Incomplete_Class and I'm very happy to be rid of this special behavior. Let's not add is back in a new place. Nikita
  104370
February 12, 2019 16:21 pollita@php.net (Sara Golemon)
On Tue, Feb 12, 2019 at 10:18 AM Nikita Popov ppv@gmail.com> wrote:
> Very much opposed to any kind of special handling for is_resource(). We > used to do this for is_object() and __PHP_Incomplete_Class and I'm very > happy to be rid of this special behavior. Let's not add is back in a new > place. > Agreed. This is something that we discuss every time a resource->object
conversion comes up and we always agree that special-casing lies the way of madness and the tiny fraction of user code that actually checks this can adapt `if (is_resource($x) || $x instanceof NewClassName)` Done and sorted. Let's skip this repeat of the past and move forward with the killing all resources with fire. -Sara
  104375
February 12, 2019 22:13 bishop@php.net (Bishop Bettini)
On Tue, Feb 12, 2019 at 11:21 AM Sara Golemon <pollita@php.net> wrote:

> On Tue, Feb 12, 2019 at 10:18 AM Nikita Popov ppv@gmail.com> > wrote: > > Very much opposed to any kind of special handling for is_resource(). We > > used to do this for is_object() and __PHP_Incomplete_Class and I'm very > > happy to be rid of this special behavior. Let's not add is back in a new > > place. > > > Agreed. This is something that we discuss every time a resource->object > conversion comes up and we always agree that special-casing lies the way of > madness and the tiny fraction of user code that actually checks this can > adapt `if (is_resource($x) || $x instanceof NewClassName)` Done and > sorted. Let's skip this repeat of the past and move forward with the > killing all resources with fire. >
Ok. Thinking longer range, perhaps we should prepare users for these transitions by improving the documentation. For example, xml_parser_create returns false on failure, but the manual <http://php.net/manual/en/function.xml-parser-create.php> makes no mention of this. If we had this documented, we might see more if (false == $x) in the wild than the (worse IMO) if (is_resource($x)). It might also help to comment on resource factories that working with the return value as a resource proper is discouraged. I'll do this for xml_parser_create*.
  104363
February 12, 2019 15:49 cmbecker69@gmx.de ("Christoph M. Becker")
On 12.02.2019 at 16:00, Nikita Popov wrote:

> The ext/xml extension currently has GC issues, see > https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of xml_parser > will usually result in a cyclic structure, but resources do not support > cycle GC. This means that the user is required to take care of breaking GC > cycles manually, which is very unusual for PHP. > > I would like to port the xml extension from using a resources towards using > an object, which is fully GC integrated. This is implemented in > https://github.com/php/php-src/pull/3526. > > An XmlParser class is used instead of a resource. The class is final and > cannot be manually constructed. It is still used with the normal > xml_parser_* APIs. This is intended as an internal representation change, > not a conversion to OO APIs. The xml_parser_free() method becomes > effectively a no-op and may be deprecated in some future version. > > This change is intended for PHP 7.4. While it is technically BC breaking > (in that code that goes out of the way to use is_resource or similar may > break), but we've done a couple of these resource->object migration in > minor versions in the past (ref > https://wiki.php.net/rfc/operator_overloading_gmp and > https://wiki.php.net/rfc/hash-context.as-resource). > > Any thoughts?
I'm all for switching to objects in 7.4. I'm a bit concerned regarding the name of the new `XmlParser` class, though, since the xml_set_object() (a terrible API, by the way) example[1] uses exactly this class name, and there may be quite some code which followed suit. Maybe we should call the class `XMLPushParser`, although that would be inconsistent with functions like xml_parser_create(). [1] <http://php.net/manual/en/function.xml-set-object.php#refsect1-function.xml-set-object-examples> -- Christoph M. Becker
  104855
March 21, 2019 14:39 nikita.ppv@gmail.com (Nikita Popov)
On Tue, Feb 12, 2019 at 4:00 PM Nikita Popov ppv@gmail.com> wrote:

> Hi internals, > > The ext/xml extension currently has GC issues, see > https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of > xml_parser will usually result in a cyclic structure, but resources do not > support cycle GC. This means that the user is required to take care of > breaking GC cycles manually, which is very unusual for PHP. > > I would like to port the xml extension from using a resources towards > using an object, which is fully GC integrated. This is implemented in > https://github.com/php/php-src/pull/3526. > > An XmlParser class is used instead of a resource. The class is final and > cannot be manually constructed. It is still used with the normal > xml_parser_* APIs. This is intended as an internal representation change, > not a conversion to OO APIs. The xml_parser_free() method becomes > effectively a no-op and may be deprecated in some future version. > > This change is intended for PHP 7.4. While it is technically BC breaking > (in that code that goes out of the way to use is_resource or similar may > break), but we've done a couple of these resource->object migration in > minor versions in the past (ref > https://wiki.php.net/rfc/operator_overloading_gmp and > https://wiki.php.net/rfc/hash-context.as-resource). > > Any thoughts? > > Regards, > Nikita >
I'd like to move forward with this. Jakub requested this change to be postponed to 8.0 in https://github.com/php/php-src/pull/3526#issuecomment-469796632 to minimize BC issues. As 8.0 is close I'm okay with just doing that, this is not super urgent for me. Is there anything else that needs to be resolved here, or is this good to go for PHP 8.0? Nikita
  105147
April 8, 2019 14:27 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Mar 21, 2019 at 3:39 PM Nikita Popov ppv@gmail.com> wrote:

> On Tue, Feb 12, 2019 at 4:00 PM Nikita Popov ppv@gmail.com> wrote: > >> Hi internals, >> >> The ext/xml extension currently has GC issues, see >> https://bugs.php.net/bug.php?id=76874. The tl;dr is that uses of >> xml_parser will usually result in a cyclic structure, but resources do not >> support cycle GC. This means that the user is required to take care of >> breaking GC cycles manually, which is very unusual for PHP. >> >> I would like to port the xml extension from using a resources towards >> using an object, which is fully GC integrated. This is implemented in >> https://github.com/php/php-src/pull/3526. >> >> An XmlParser class is used instead of a resource. The class is final and >> cannot be manually constructed. It is still used with the normal >> xml_parser_* APIs. This is intended as an internal representation change, >> not a conversion to OO APIs. The xml_parser_free() method becomes >> effectively a no-op and may be deprecated in some future version. >> >> This change is intended for PHP 7.4. While it is technically BC breaking >> (in that code that goes out of the way to use is_resource or similar may >> break), but we've done a couple of these resource->object migration in >> minor versions in the past (ref >> https://wiki.php.net/rfc/operator_overloading_gmp and >> https://wiki.php.net/rfc/hash-context.as-resource). >> >> Any thoughts? >> >> Regards, >> Nikita >> > > I'd like to move forward with this. Jakub requested this change to be > postponed to 8.0 in > https://github.com/php/php-src/pull/3526#issuecomment-469796632 to > minimize BC issues. As 8.0 is close I'm okay with just doing that, this is > not super urgent for me. > > Is there anything else that needs to be resolved here, or is this good to > go for PHP 8.0? > > Nikita >
As there haven't been further comments here, I've merged this change for PHP 8 only. Nikita