JSON_THROW_ON_ERROR implementation detail

  105653
May 9, 2019 16:06 Danack@basereality.com (Dan Ackroyd)
Hi,

Apparently there is an implementation detail in JSON_THROW_ON_ERROR
that differs in the RFC text, from the discussion on list
http://news.php.net/php.internals/100569:

> I decided to reset it to no error because there's no > previous error that needs handling by error-checking code, the exception > takes care of that.
RFC text:
> When passed this flag, the error behaviour of these functions is changed. > The global error state is left untouched, and if an error occurs that would > otherwise set it, these functions instead throw a JsonException with the > message and code set to whatever json_last_error() and json_last_error_msg() > would otherwise be respectively.
The implementation is highly surprising and can lead to bugs. Imagine this code exists in a code base. --------------------- $foo = json_decode('[bar'); // many lines of code. // many lines of code. // many lines of code. $bar = json_encode('Hello world!'); if (json_last_error() !== JSON_ERROR_NONE) { throw new Exception("encoding went wrong!"); } echo "All is fine"; // output is "All is fine" --------------------- And to start moving to have json_encode throw on error, they start using the flag on the json_encode ---------------------- $foo = json_decode('[bar'); // many lines of code. // many lines of code. // many lines of code. $bar = json_encode('Hello world!', JSON_THROW_ON_ERROR); if (json_last_error() !== JSON_ERROR_NONE) { throw new Exception("encoding went wrong!"); } echo "All is fine"; ------------------- The result of this is that the user throw exception will be thrown even though the encode worked correctly. https://3v4l.org/JJD5g This is highly surprising*, and doesn't seem the right thing to be doing. Does anyone have any objection to changing this, so that json_last_error() + json_last_error_msg() always refer to the most recent json_encode()/json_decode() function, as per the discussion. Or does this need an RFC? I've logged a bug for this regardless - https://bugs.php.net/bug.php?id=77997 cheers Dan Ack * https://en.wikipedia.org/wiki/Principle_of_least_astonishment
  105665
May 10, 2019 14:42 bishop@php.net (Bishop Bettini)
On Thu, May 9, 2019 at 12:06 PM Dan Ackroyd <Danack@basereality.com> wrote:

> Apparently there is an implementation detail in JSON_THROW_ON_ERROR > that differs in the RFC text, from the discussion on list > http://news.php.net/php.internals/100569: > > > I decided to reset it to no error because there's no > > previous error that needs handling by error-checking code, the exception > > takes care of that. > > RFC text: > > > When passed this flag, the error behaviour of these functions is changed. > > The global error state is left untouched, and if an error occurs that > would > > otherwise set it, these functions instead throw a JsonException with the > > message and code set to whatever json_last_error() and > json_last_error_msg() > > would otherwise be respectively. > > The implementation is highly surprising and can lead to bugs. > > Imagine this code exists in a code base. > > --------------------- > > $foo = json_decode('[bar'); > // many lines of code. > // many lines of code. > // many lines of code. > > $bar = json_encode('Hello world!'); > if (json_last_error() !== JSON_ERROR_NONE) { > throw new Exception("encoding went wrong!"); > } > > echo "All is fine"; > // output is "All is fine" > --------------------- > > > And to start moving to have json_encode throw on error, they start > using the flag on the json_encode > > ---------------------- > > $foo = json_decode('[bar'); > // many lines of code. > // many lines of code. > // many lines of code. > > $bar = json_encode('Hello world!', JSON_THROW_ON_ERROR); > if (json_last_error() !== JSON_ERROR_NONE) { > throw new Exception("encoding went wrong!"); > } > > echo "All is fine"; > > ------------------- > > The result of this is that the user throw exception will be thrown > even though the encode worked correctly. https://3v4l.org/JJD5g > > This is highly surprising*, and doesn't seem the right thing to be doing. >
The new code has what looks like, to me, a refactor bug. I'd expect that, when one adds the JSON_THROW_ON_ERROR flag, one must also remove the subsequent json_last_error() handling, as it's no longer relevant to that call and instead relevant to the most recent prior call. In this case, since the most recent prior call had no error handling itself, it now gets one attached to it. I don't find this action at a distance surprising: it's one of the known intrinsic downsides of a global-last-error mechanism. IMO, that this code now fails is a good thing, because now the author knows to error check after json_decode().
> Does anyone have any objection to changing this, so that > json_last_error() + json_last_error_msg() always refer to the most > recent json_encode()/json_decode() function, as per the discussion. Or > does this need an RFC? >
I think I'd rather see this handled in the documentation. "Note that you should review use of json_last_error() and json_last_error_msg() error handling when adding or removing the JSON_THROW_ON_ERROR flag." If there was to be any language change, I don't think we can't just change the behavior. Extant 7.3 code might be: // Run through an encode/decode cycle to validate the user-supplied data's valid. We don't // support nested arrays on user data, so we set a depth of 1. // @throws \App\Json\DecodeException when the user-supplied data is invalid json or is nested // @throws \JsonException when the data cannot be stored in valid form (eg, bad UTF-8 characters) $valid_data = json_encode(json_decode($user_data, 0, 1), JSON_THROW_ON_ERROR); if (JSON_ERROR_NONE !== json_last_error()) { throw new App\Json\DecodeException(json_last_error_msg(), json_last_error()); } Here, the author expects the error from json_decode() to flow through json_encode(). Yes, this code could be better written to separate out the branches, but it's plausible (IMO) and would be subject to a BC break by a language change on json_last_error() and JSON_THROW_ON_ERROR semantics. We could keep the last error from non-throw, and last error from throw, as separate global states, then provide either new functions to access each, or a boolean flag to json_last_error($includeLastExceptionError = false): int, but both of those seem like fitting a problem to a solution. In summary, mixed error handling interaction has a couple of possible implementations. We chose one, with its pros and cons. The other implementation has its pros and cons, as well. I don't see a reason to swap one set of consequences for another, esp. now that we've already walked down the road of choice. I'd rather we just raise awareness in the form of documentation, and let authors move their code toward one mechanism, or the other.
  105666
May 10, 2019 18:52 riikka.kalliomaki@riimu.net (=?UTF-8?Q?Riikka_Kalliom=C3=A4ki?=)
> The new code has what looks like, to me, a refactor bug. I'd expect that, when one adds the JSON_THROW_ON_ERROR flag, one must also remove the
subsequent json_last_error() handling, as it's no longer relevant to that call and instead relevant to the most recent prior call. This does create a kind of an unfortunate situation with vendor libraries, however. Any library that accepts arbitrary json flags and uses json_last_error() for error handling will be broken if JSON_THROW_ON_ERROR is passed as a flag. This means that library consumers effectively may have to become aware of how a library internally deals with json errors. It kind of turns JSON_THROW_ON_ERROR into unconventional BC break, because any code that takes arbitrary flags and uses json_last_error() needs to account for it. In some cases, I imagine, it can be necessary to add a "json_encode(null);" before the actual json handling just to reset the error state, just in case. I wouldn't be surprised if this becomes a source of unexpected bugs in json handling code down the line. In my opinion, it's very unexpected behaviour that json_last_error() can reflect the state prior to the previous json_encode/json_decode call depending on the flags passed to the function as this has not been the case before. Perhaps it should be documented in the BC breaks section for 7.3 that json_last_error() may now reflect the state of arbitrary previous call? On Fri, May 10, 2019 at 5:43 PM Bishop Bettini <bishop@php.net> wrote:
> > On Thu, May 9, 2019 at 12:06 PM Dan Ackroyd <Danack@basereality.com> wrote: > > > Apparently there is an implementation detail in JSON_THROW_ON_ERROR > > that differs in the RFC text, from the discussion on list > > http://news.php.net/php.internals/100569: > > > > > I decided to reset it to no error because there's no > > > previous error that needs handling by error-checking code, the exception > > > takes care of that. > > > > RFC text: > > > > > When passed this flag, the error behaviour of these functions is changed. > > > The global error state is left untouched, and if an error occurs that > > would > > > otherwise set it, these functions instead throw a JsonException with the > > > message and code set to whatever json_last_error() and > > json_last_error_msg() > > > would otherwise be respectively. > > > > The implementation is highly surprising and can lead to bugs. > > > > Imagine this code exists in a code base. > > > > --------------------- > > > > $foo = json_decode('[bar'); > > // many lines of code. > > // many lines of code. > > // many lines of code. > > > > $bar = json_encode('Hello world!'); > > if (json_last_error() !== JSON_ERROR_NONE) { > > throw new Exception("encoding went wrong!"); > > } > > > > echo "All is fine"; > > // output is "All is fine" > > --------------------- > > > > > > And to start moving to have json_encode throw on error, they start > > using the flag on the json_encode > > > > ---------------------- > > > > $foo = json_decode('[bar'); > > // many lines of code. > > // many lines of code. > > // many lines of code. > > > > $bar = json_encode('Hello world!', JSON_THROW_ON_ERROR); > > if (json_last_error() !== JSON_ERROR_NONE) { > > throw new Exception("encoding went wrong!"); > > } > > > > echo "All is fine"; > > > > ------------------- > > > > The result of this is that the user throw exception will be thrown > > even though the encode worked correctly. https://3v4l.org/JJD5g > > > > This is highly surprising*, and doesn't seem the right thing to be doing. > > > > The new code has what looks like, to me, a refactor bug. I'd expect that, > when one adds the JSON_THROW_ON_ERROR flag, one must also remove the > subsequent json_last_error() handling, as it's no longer relevant to that > call and instead relevant to the most recent prior call. > > In this case, since the most recent prior call had no error handling > itself, it now gets one attached to it. I don't find this action at a > distance surprising: it's one of the known intrinsic downsides of a > global-last-error mechanism. IMO, that this code now fails is a good thing, > because now the author knows to error check after json_decode(). > > > > > Does anyone have any objection to changing this, so that > > json_last_error() + json_last_error_msg() always refer to the most > > recent json_encode()/json_decode() function, as per the discussion. Or > > does this need an RFC? > > > > I think I'd rather see this handled in the documentation. "Note that you > should review use of json_last_error() and json_last_error_msg() error > handling when adding or removing the JSON_THROW_ON_ERROR flag." > > If there was to be any language change, I don't think we can't just change > the behavior. Extant 7.3 code might be: > > // Run through an encode/decode cycle to validate the user-supplied data's > valid. We don't > // support nested arrays on user data, so we set a depth of 1. > // @throws \App\Json\DecodeException when the user-supplied data is invalid > json or is nested > // @throws \JsonException when the data cannot be stored in valid form (eg, > bad UTF-8 characters) > $valid_data = json_encode(json_decode($user_data, 0, 1), > JSON_THROW_ON_ERROR); > if (JSON_ERROR_NONE !== json_last_error()) { > throw new App\Json\DecodeException(json_last_error_msg(), > json_last_error()); > } > > Here, the author expects the error from json_decode() to flow through > json_encode(). Yes, this code could be better written to separate out the > branches, but it's plausible (IMO) and would be subject to a BC break by a > language change on json_last_error() and JSON_THROW_ON_ERROR semantics. > > We could keep the last error from non-throw, and last error from throw, as > separate global states, then provide either new functions to access each, > or a boolean flag to json_last_error($includeLastExceptionError = false): > int, but both of those seem like fitting a problem to a solution. > > In summary, mixed error handling interaction has a couple of possible > implementations. We chose one, with its pros and cons. The other > implementation has its pros and cons, as well. I don't see a reason to swap > one set of consequences for another, esp. now that we've already walked > down the road of choice. I'd rather we just raise awareness in the form of > documentation, and let authors move their code toward one mechanism, or the > other.
-- Riikka Kalliomäki https://github.com/Riimu
  105671
May 11, 2019 02:11 bishop@php.net (Bishop Bettini)
On Fri, May 10, 2019 at 2:52 PM Riikka Kalliomäki <
riikka.kalliomaki@riimu.net> wrote:

> > The new code has what looks like, to me, a refactor bug. I'd expect that, > when one adds the JSON_THROW_ON_ERROR flag, one must also remove the > subsequent json_last_error() handling, as it's no longer relevant to that > call and instead relevant to the most recent prior call. > > This does create a kind of an unfortunate situation with vendor > libraries, however. Any library that accepts arbitrary json flags and > uses json_last_error() for error handling will be broken if > JSON_THROW_ON_ERROR is passed as a flag. >
I would expect such libraries to be sensitive to the possible variations implied by accepting arbitrary arguments, and would compensate accordingly. For example: function library_json_decode(string $json, int $options) { if (70300 <= PHP_VERSION_ID) { $options &= ~JSON_THROW_ON_ERROR; } $decoded = json_decode($json, $options); if (JSON_ERROR_NONE !== json_last_error()) { throw new LibraryException(json_last_error_msg(), json_last_error()); } return $decoded; } This means that library consumers effectively may have to become aware
> of how a library internally deals with json errors. > > It kind of turns JSON_THROW_ON_ERROR into unconventional BC break, > because any code that takes arbitrary flags and uses json_last_error() > needs to account for it. In some cases, I imagine, it can be necessary > to add a "json_encode(null);" before the actual json handling just to > reset the error state, just in case. >
I agree this is possible, but I contend any function taking arbitrary options needs to understand the implication of those options. It's in many ways just like any other user-supplied data: authors must understand the possible values available for the supported versions and handle the accordingly.
> I wouldn't be surprised if this becomes a source of unexpected bugs in > json handling code down the line. In my opinion, it's very unexpected > behaviour that json_last_error() can reflect the state prior to the > previous json_encode/json_decode call depending on the flags passed to > the function as this has not been the case before. > > Perhaps it should be documented in the BC breaks section for 7.3 that > json_last_error() may now reflect the state of arbitrary previous > call? >
Yes, better documentation surrounding JSON_THROW_ON_ERROR would be welcomed. The docs are, right now, quite lean.
  105683
May 12, 2019 17:06 Danack@basereality.com (Dan Ackroyd)
On Fri, 10 May 2019 at 15:42, Bishop Bettini <bishop@php.net> wrote:
> > Extant 7.3 code might be:
People would only write code like that if they hadn't read the manual (or code documentation in their IDE) for json_last_error() which clearly states 'Returns the last error (if any) occurred during the last JSON encoding/decoding,'.
> I'd rather we just raise awareness in the form of documentation
If you have to carefully read the documentation for function, and not taking enough care means that you're going to be surprised by the result, then the function is probably poorly designed.
> We chose one, with its pros and cons.
As we've seen recently, people don't always consider the exact detail implications that deeply. Seeing as people can also just make wrong choices, I think this should be changed. Seeing as you've objected, I'll raise an RFC to put it to a vote. cheers Dan Ack