New functions `hash_serialize` and `hash_unserialize`?

  110422
June 8, 2020 13:01 kohler@seas.harvard.edu (Eddie Kohler)
Hello internals! Thanks for PHP!

I'm writing to gauge interest in adding two new functions to the PHP `hash`
extension, `hash_serialize` and `hash_unserialize`. These functions would
serialize and unserialize the internals of a HashContext object, allowing a
partially-computed hash to be saved, then restored and completed in a later
run.

EXAMPLE: Multi-part upload.

Say that a very large file is uploaded in pieces, `big.001` through
`big.999`, and it is necessary to compute the SHA256 of the final
concatenated file.
Current PHP must compute the hash in one go:

$ctx = hash_init("sha256");
for ($i = 1; $i <= 999; ++$i) {
     // "%03d" zero-pads to three digits, giving big.001 ... big.999
     hash_update_file($ctx, sprintf("big.%03d", $i));
}
$hash = hash_final($ctx);

This in turn requires that all pieces be on the filesystem simultaneously.

With hash_serialize and hash_unserialize, the hash can be computed
gradually, allowing pieces to be deleted as they are uploaded elsewhere.

$ctx = hash_init("sha256");
hash_update_file($ctx, "big.001");
SAVE_TO_DATABASE(hash_serialize($ctx));
....
$ctx = hash_unserialize(LOAD_FROM_DATABASE());
hash_update_file($ctx, "big.002");
SAVE_TO_DATABASE(hash_serialize($ctx));
....
etc.

***

I am happy to write up an RFC for these functions. An initial
implementation with tests is visible here:
https://github.com/kohler/php-src/commit/5a3a828f90b88cd7f660babec7db531cfc04b0a1

New functions `hash_serialize` and `hash_unserialize` appear to fit the
existing API well, and simplify implementation, but it's possible that
`__serialize/__unserialize` or the internal `serialize/unserialize`
functions would be preferred.

I'd be grateful for any feedback.
Thanks!
Eddie Kohler
  110423
June 8, 2020 13:28 johannes@schlueters.de (Johannes Schlüter)
On Mon, 2020-06-08 at 09:01 -0400, Eddie Kohler wrote:
> I'm writing to gauge interest in two new functions to the PHP `hash`
> extension, `hash_serialize` and `hash_unserialize`. These functions
> would serialize and unserialize the internals of a HashContext
> object, allowing a partially-computed hash to be saved, then restored
> and completed in a later run.

I would suggest to make the HashContext Serializable, then

serialize($hash_context);

works. Then it also fits when stored in other objects or something ...

johannes
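
The suggestion above can be sketched roughly as follows. This is only an illustration, assuming a PHP version in which HashContext supports `serialize()` and `unserialize()` (PHP 8.0 or later, to the best of my knowledge). The `$chunks` strings and the `$saved` variable are hypothetical stand-ins for the uploaded pieces and for a database column; they do not come from the proposed patch.

```php
<?php
// Hypothetical chunks standing in for big.001, big.002, ...
$chunks = ["first piece, ", "second piece, ", "third piece"];

// Stands in for a database column holding the saved context.
$saved = null;

foreach ($chunks as $chunk) {
    // Resume the saved context, or start a fresh one for the first chunk.
    $ctx = ($saved === null) ? hash_init("sha256") : unserialize($saved);
    hash_update($ctx, $chunk);
    // Persist the partially computed hash; the chunk could now be deleted.
    $saved = serialize($ctx);
}

// Finish from the last saved state and compare with a one-shot hash.
$incremental = hash_final(unserialize($saved));
$oneShot = hash("sha256", implode("", $chunks));
var_dump($incremental === $oneShot);
```

Each loop iteration could run in a separate request, since nothing but the serialized string is carried over between iterations.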
  110487
June 11, 2020 16:58 kohler@seas.harvard.edu (Eddie Kohler)
Thanks for this suggestion. I've updated the implementation to make
HashContext implement Serializable.

I'd still be grateful for more feedback, or perhaps I should just create an
RFC?
Eddie


  110491
June 11, 2020 21:07 cmbecker69@gmx.de ("Christoph M. Becker")
On 11.06.2020 at 18:58, Eddie Kohler wrote:

> Thanks for this suggestion. I've updated the implementation to make
> HashContext implement Serializable.
>
> I'd still be grateful for more feedback, or perhaps I should just create an
> RFC?

Not sure if that would need an RFC; maybe just start by submitting a pull
request. :)

Thanks,
Christoph
  110492
June 11, 2020 21:41 pollita@php.net (Sara Golemon)
On Thu, Jun 11, 2020 at 11:59 AM Eddie Kohler <kohler@seas.harvard.edu>
wrote:

> Thanks for this suggestion. I've updated the implementation to make
> HashContext implement Serializable.
>
> I'd still be grateful for more feedback, or perhaps I should just create
> an RFC?

Be careful what you ask for. :)

Overall +1 on the concept with a few notes:

1. Please put this on a branch and make it a PR so we can comment on it
   directly.
2. Consider using zend_parse_parameters_throws() and family so that the
   exception which is thrown contains the type error information rather than
   the generic RETURN_THROWS() macros.
3. Consider using hex or base64 to serialize the contexts. This will
   reduce various transport/storage issues.
4. It's great that you've thought about endianness, but the current
   implementation simply bails on endian mismatch. It'd be a nice-to-have for
   the user if these serializations were portable. I know this represents a
   lot of work for sort of an edge case so I won't hold it against you if you
   say 'no' and/or save this for later work if demand surfaces.
5. Storing $key makes me nervous. I don't have a good solution to this
   since the deserialization doesn't actually give us a chance to specify it
   in the deserialization process. I wish I'd made $key/hmac an option to
   hash_final rather than hash_init. Maybe we can think about allowing that
   to be specified at either end. Let's expand on this topic while you work
   on your RFC.
6. Yeah... I think you need an RFC because of #5. Sorry.
7. TABS v SPACES indentation issues.

-Sara
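
Note 3 can also be approximated in userland. The sketch below is an assumption-laden illustration, not the patch's behaviour: it presumes a PHP version in which HashContext supports `serialize()` (PHP 8.0 or later, as far as I know), and it simply wraps the serialized string in base64 so that it is plain ASCII for storage or transport. Whether the unwrapped serialized form contains raw bytes depends on the implementation. The variable names are hypothetical.

```php
<?php
$ctx = hash_init("sha256");
hash_update($ctx, "partial data");

// base64 keeps the stored value ASCII-only, whatever the serialized
// form happens to contain.
$stored = base64_encode(serialize($ctx));

// Restore, permitting only HashContext objects to be instantiated.
$restored = unserialize(
    base64_decode($stored),
    ["allowed_classes" => [HashContext::class]]
);
hash_update($restored, " and the rest");
$digest = hash_final($restored);

// Reference digest computed in one go, for comparison.
$expected = hash("sha256", "partial data and the rest");
var_dump($digest === $expected);
```

Restricting `allowed_classes` is a defensive choice for data that round-trips through external storage; it is not something the thread itself discusses.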
  110691
June 21, 2020 13:12 kohler@seas.harvard.edu (Eddie Kohler)
Hi all,

I've opened up a pull request and responded to this message there. I'd love
any further comments.

https://github.com/php/php-src/pull/5702

Eddie

