Packed Strings

November 27, 2017 23:11 (Dmitry Stogov)

I spent some time reviewing Andrea's old idea about packed strings.

The idea is simple. In every place where we use zend_string*, we may store the characters directly instead.

We use the low byte to encode the packed string marker and the string length, and we also need one byte for the trailing zero, so we can keep up to 2 characters on a 32-bit system and up to 6 characters on a 64-bit system without allocating additional memory.
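To make the layout concrete, here is a rough sketch of such an encoding. The names (packed_str, packed_str_make, ...) and the exact bit assignment (bit 0 as the marker, the remaining bits of the low byte as the length) are assumptions for illustration only; the PoC may encode things differently.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names and bit layout, not taken from the PoC. */
typedef uintptr_t packed_str;

/* Bit 0 marks a packed string; an aligned heap pointer has it clear. */
#define PACKED_MARKER  ((uintptr_t)1)
/* One byte for marker+length, one byte for the trailing zero. */
#define PACKED_MAX_LEN (sizeof(uintptr_t) - 2)

/* Returns 1 and stores the packed value if the string fits, 0 otherwise. */
static int packed_str_make(const char *s, size_t len, packed_str *out)
{
    if (len > PACKED_MAX_LEN) {
        return 0; /* too long, a real zend_string would be needed */
    }
    uintptr_t v = PACKED_MARKER | ((uintptr_t)len << 1);
    for (size_t i = 0; i < len; i++) {
        /* character i goes into byte i+1; the byte after the last one stays zero */
        v |= (uintptr_t)(unsigned char)s[i] << (8 * (i + 1));
    }
    *out = v;
    return 1;
}

static int packed_str_is_packed(packed_str v)
{
    return (v & PACKED_MARKER) != 0;
}

static size_t packed_str_len(packed_str v)
{
    return (size_t)((v >> 1) & 0x7f);
}

/* Copies the characters plus the trailing zero into buf
 * (buf must hold at least PACKED_MAX_LEN + 1 bytes). */
static void packed_str_copy(packed_str v, char *buf)
{
    size_t len = packed_str_len(v);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (char)((v >> (8 * (i + 1))) & 0xff);
    }
    buf[len] = '\0';
}
```

With this layout PACKED_MAX_LEN is 6 where uintptr_t is 8 bytes and 2 where it is 4 bytes, which matches the limits above.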

The refreshed dirty PoC implementation

It is enough to take a quick look at the zend_string.h changes (the rest is mostly monkey work).

I was able to run bench.php, but I probably won't take this any further.

Unfortunately, I ran into two serious problems:

1) The original implementation used packed strings themselves as their hash values. This led to a huge slowdown because of hash collisions (e.g. in hash1() from bench.php). I switched to recalculating the hash on each use, but this negates the benefit of eliminating the allocation. Perhaps we could use a cheaper hash function for packed strings...

2) PHP still uses char* in many places. When we take ZSTR_VAL() of a packed string stored in a local variable (or function argument), we can very easily end up with a dangling pointer (e.g. INI directives processed by OnUpdateString, internal function parameters received as char*, ...). Changing all these char* to zend_string* would help, but looks unrealistic for PHP-7.3.
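Both problems can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the bit layout (bit 0 as marker, length in the rest of the low byte, characters in the following bytes), the 64-bucket table that picks a bucket from the low bits of the hash, and hash_mix(), which is just one possible cheap mixer and not something the PoC uses. packed_val() additionally assumes a little-endian machine.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: bit 0 = marker, bits 1..7 = length, bytes 1.. = chars. */
static uint64_t pack2(char a, char b)
{
    return 1u | (2u << 1)
         | ((uint64_t)(unsigned char)a << 8)
         | ((uint64_t)(unsigned char)b << 16);
}

/* Problem 1: the packed value itself used as the hash. */
static uint64_t hash_identity(uint64_t v)
{
    return v;
}

/* A cheap mixer (an assumption, not from the PoC): it folds higher
 * bits of the product down into the low bits used for bucket selection. */
static uint64_t hash_mix(uint64_t v)
{
    v ^= v >> 32;
    v *= UINT64_C(0x9E3779B97F4A7C15);
    v ^= v >> 29;
    return v;
}

/* Counts how many of 64 buckets are hit by all 26*26 two-letter keys
 * when the bucket is taken from the low 6 bits of the hash. */
static unsigned buckets_used(uint64_t (*hash)(uint64_t))
{
    unsigned char seen[64] = {0};
    unsigned used = 0;
    for (char a = 'a'; a <= 'z'; a++) {
        for (char b = 'a'; b <= 'z'; b++) {
            size_t idx = (size_t)(hash(pack2(a, b)) & 63);
            if (!seen[idx]) {
                seen[idx] = 1;
                used++;
            }
        }
    }
    return used;
}

/* Problem 2: on a little-endian machine the characters start at byte 1
 * of the variable itself, so the returned pointer refers to the
 * variable's own storage and is valid only while that variable lives. */
static const char *packed_val(const uint64_t *slot)
{
    return (const char *)slot + 1;
}

/* Returns 1 if p points inside the object [obj, obj + size). */
static int points_into(const void *p, const void *obj, size_t size)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)obj;
    return addr >= lo && addr < lo + size;
}
```

With the identity hash, every two-letter key has the same low byte (marker plus length), so buckets_used(hash_identity) reports a single bucket for all 676 keys. And for a local uint64_t s = pack2('o', 'k'), points_into(packed_val(&s), &s, sizeof s) holds, which is exactly why returning or storing that char* beyond the lifetime of s leaves a dangling pointer.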

So, I gave up for now.

I decided to share these results. Maybe someone will get related ideas.

Thanks. Dmitry.