[RFC] Reproducible Builds Support

  101327
December 11, 2017 21:11 jelle@vdwaa.nl (Jelle van der Waa)
Hi all,

Debian, Arch Linux and other distros are trying to get full
reproducible builds. There are some issues in PHP's codebase which make
builds unreproducible. Reproducible builds are currently tested in
Arch Linux by building PHP twice, in two different envs, varying
hostname, system time, etc. [1]

One issue is the PHP_BUILD_DATE, which makes the build
non-reproducible. I've made a PR which uses SOURCE_DATE_EPOCH, which is
set in the reproducible build env. This should keep the current
functionality intact, while adding support for reproducible builds. [2]
[3]
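
As a rough sketch of the idea (this is not the code from the PR): a configure-style shell fragment can take the build date from SOURCE_DATE_EPOCH when it is set and fall back to the current time otherwise. The helper name `build_date` and the `%Y-%m-%d` output format are assumptions made for this example, and `date -d "@..."` is GNU date syntax.

```shell
#!/bin/sh
# Sketch only: build_date is a hypothetical helper, not a php-src function.
# The '+%Y-%m-%d' format is an assumption made for illustration.
build_date() {
    if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
        # Reproducible path: format the pinned timestamp in UTC.
        # GNU date syntax; BSD date would need: date -u -r "$SOURCE_DATE_EPOCH"
        date -u -d "@${SOURCE_DATE_EPOCH}" '+%Y-%m-%d'
    else
        # Default path: the wall-clock date, as before.
        date -u '+%Y-%m-%d'
    fi
}

PHP_BUILD_DATE=$(build_date)
echo "PHP_BUILD_DATE=$PHP_BUILD_DATE"
```

With the variable unset the behaviour is unchanged; two builds that export the same SOURCE_DATE_EPOCH get the same date string.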

Another issue is the php_uname function, which contains the
hostname; since the hostname is varied per build, this makes it
non-reproducible. This is caused by the following line:

configure.ac:PHP_UNAME=`uname -a | xargs` required in:
ext/standard/info.c:            php_uname = PHP_UNAME;

This is there as a fallback, as the php.net documentation describes:

"On some older UNIX platforms, it may not be able to determine the
current OS information in which case it will revert to displaying the OS
PHP was built on. This will only happen if your uname() library call
either doesn't exist or doesn't work.".

I would argue that this is strange, unexpected behaviour, and maybe it
should throw an exception instead? Or can it show only "Linux" as a
fallback, basically PHP_OS? Ideas?

The last issue is phar.phar being non-reproducible, and I am not
sure what the cause would be. I'm not sure how the binary data in
phar.phar is generated.

[1] https://tests.reproducible-builds.org/archlinux/extra/php/php-7.2.0-2-x86_64.pkg.tar.xz.html
[2] https://github.com/php/php-src/pull/2965
[3] https://reproducible-builds.org/specs/source-date-epoch/

Thanks,

-- 
Jelle van der Waa

Arch Linux Developer
  101329
December 12, 2017 07:33 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

> One issue is the PHP_BUILD_DATE, which makes the build
> non-reproducible. I've made a PR which uses SOURCE_DATE_EPOCH, which is
> set in the reproducible build env. This should keep the current
> functionality intact, while adding support for reproducible builds. [2]
> [3]

SOURCE_DATE_EPOCH (or any other variable) looks like a good way to make it predictable.

> Another issue is the php_uname function, which contains the
> hostname; since the hostname is varied per build, this makes it
> non-reproducible. This is caused by the following line:
>
> configure.ac:PHP_UNAME=`uname -a | xargs` required in:
> ext/standard/info.c: php_uname = PHP_UNAME;

I think the best solution here would be to have another variable to override this.

> I would argue that this is strange, unexpected behaviour, and maybe it
> should throw an exception instead? Or can it show only "Linux" as a
> fallback, basically PHP_OS? Ideas?

If those old systems run PHP and need uname, changing stuff there is probably harder and more expensive than on other systems. With this in mind, I'd rather not mess with it, especially for a purpose that can easily be achieved without it.

-- 
Stas Malyshev
smalyshev@gmail.com
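
To make the override idea concrete, here is a minimal sketch of how a configure-style fragment could honour such a variable while leaving the existing `uname -a | xargs` fallback untouched. The variable name `PHP_UNAME_OVERRIDE` and the helper `resolve_uname` are hypothetical, picked only for this example; the thread does not settle on a name.

```shell
#!/bin/sh
# Sketch only: PHP_UNAME_OVERRIDE and resolve_uname are hypothetical names.
resolve_uname() {
    if [ -n "${PHP_UNAME_OVERRIDE:-}" ]; then
        # Packager pinned the value, so the build host's name never leaks in.
        printf '%s\n' "$PHP_UNAME_OVERRIDE"
    else
        # Unchanged default: the command quoted from configure.ac in the thread.
        uname -a | xargs
    fi
}

PHP_UNAME=$(resolve_uname)
echo "PHP_UNAME=$PHP_UNAME"
```

A reproducible-build script would then export the variable once (for example with the value `Linux`) before running configure, while ordinary builds keep today's behaviour.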
  101338
December 12, 2017 20:50 jelle@vdwaa.nl (Jelle van der Waa)
On 12/11/17 at 11:33pm, Stanislav Malyshev wrote:
> Hi!
>
> > One issue is the PHP_BUILD_DATE, which makes the build
> > non-reproducible. I've made a PR which uses SOURCE_DATE_EPOCH, which is
> > set in the reproducible build env. This should keep the current
> > functionality intact, while adding support for reproducible builds. [2]
> > [3]
>
> SOURCE_DATE_EPOCH (or any other variable) looks like a good way to make
> it predictable.
>
> > Another issue is the php_uname function, which contains the
> > hostname; since the hostname is varied per build, this makes it
> > non-reproducible. This is caused by the following line:
> >
> > configure.ac:PHP_UNAME=`uname -a | xargs` required in:
> > ext/standard/info.c: php_uname = PHP_UNAME;
>
> I think the best solution here would be to have another variable to
> override this.

The issue with this approach would be that every distribution has to set this variable. I know it's the same with SOURCE_DATE_EPOCH, but that is well established.

> > I would argue that this is strange, unexpected behaviour, and maybe it
> > should throw an exception instead? Or can it show only "Linux" as a
> > fallback, basically PHP_OS? Ideas?
>
> If those old systems run PHP and need uname, changing stuff there is
> probably harder and more expensive than on other systems. With this in
> mind, I'd rather not mess with it, especially for a purpose that can
> easily be achieved without it.

Hmmm true, but the fallback being the hostname where PHP was built seems a little bit odd, doesn't it?

-- 
Jelle van der Waa
  101341
December 13, 2017 00:02 smalyshev@gmail.com (Stanislav Malyshev)
Hi!

>> I think the best solution here would be to have another variable to
>> override this.
>
> The issue with this approach would be that every distribution has to set
> this variable. I know it's the same with SOURCE_DATE_EPOCH, but that is
> well established.

That would be all distros that want a reproducible build of PHP. But I assume they need to do some special magic to initiate a reproducible build anyway; if so, we could document the procedure for setting up a reproducible build in some readme file, and make it easy to set up. They won't need to set it up for all builds, just for the PHP build, and since most use special scripts to build PHP anyway, it shouldn't be too hard to add.

>> If those old systems run PHP and need uname, changing stuff there is
>> probably harder and more expensive than on other systems. With this in
>> mind, I'd rather not mess with it, especially for a purpose that can
>> easily be achieved without it.
>
> Hmmm true, but the fallback being the hostname where PHP was built
> seems a little bit odd, doesn't it?

Yes, but I'd follow the "Chesterton's fence" principle here. Maybe we could use some ifdefs and configure magic to ensure this is actually not happening on the kind of systems where reproducible builds are run?

-- 
Stas Malyshev
smalyshev@gmail.com
  101339
December 12, 2017 21:12 levim@php.net (Levi Morrison)
On Mon, Dec 11, 2017 at 2:11 PM, Jelle van der Waa <jelle@vdwaa.nl> wrote:
> Hi all,
>
> Debian, Arch Linux and other distros are trying to get full
> reproducible builds. There are some issues in PHP's codebase which make
> builds unreproducible. Reproducible builds are currently tested in
> Arch Linux by building PHP twice, in two different envs, varying
> hostname, system time, etc. [1]
>
> One issue is the PHP_BUILD_DATE, which makes the build
> non-reproducible. I've made a PR which uses SOURCE_DATE_EPOCH, which is
> set in the reproducible build env. This should keep the current
> functionality intact, while adding support for reproducible builds. [2]
> [3]

It looks good to me.

> Another issue is the php_uname function, which contains the
> hostname; since the hostname is varied per build, this makes it
> non-reproducible. This is caused by the following line:
>
> configure.ac:PHP_UNAME=`uname -a | xargs` required in:
> ext/standard/info.c: php_uname = PHP_UNAME;
>
> This is there as a fallback, as the php.net documentation describes:
>
> "On some older UNIX platforms, it may not be able to determine the
> current OS information in which case it will revert to displaying the OS
> PHP was built on. This will only happen if your uname() library call
> either doesn't exist or doesn't work.".
>
> I would argue that this is strange, unexpected behaviour, and maybe it
> should throw an exception instead? Or can it show only "Linux" as a
> fallback, basically PHP_OS? Ideas?

I wouldn't throw an exception here. It seems PHP_OS is under-documented; maybe PHP_OS_FAMILY is better:

> The operating system family PHP was built for. Either of 'Windows', 'BSD', 'Darwin', 'Solaris', 'Linux' or 'Unknown'. Available as of PHP 7.2.0.

However, I really don't think we should change this for already released PHP versions. We should ask our maintainers how they feel about changing it in an x.y.NEXT patch. My inclination is to do this for PHP 7.3 and beyond and accept that official PHP sources of earlier versions will not produce reproducible builds.

> The last issue is phar.phar being non-reproducible, and I am not
> sure what the cause would be. I'm not sure how the binary data in
> phar.phar is generated.

Phars are like `tars` that are also valid PHP files. This means there are probably modification times, etc., set in there. Not sure what else would need to be changed.
  101354
December 14, 2017 09:02 jelle@vdwaa.nl (Jelle van der Waa)
On 12/12/17 at 02:12pm, Levi Morrison wrote:
> On Mon, Dec 11, 2017 at 2:11 PM, Jelle van der Waa <jelle@vdwaa.nl> wrote:
> > Hi all,
> >
> > Debian, Arch Linux and other distros are trying to get full
> > reproducible builds. There are some issues in PHP's codebase which make
> > builds unreproducible. Reproducible builds are currently tested in
> > Arch Linux by building PHP twice, in two different envs, varying
> > hostname, system time, etc. [1]
> >
> > One issue is the PHP_BUILD_DATE, which makes the build
> > non-reproducible. I've made a PR which uses SOURCE_DATE_EPOCH, which is
> > set in the reproducible build env. This should keep the current
> > functionality intact, while adding support for reproducible builds. [2]
> > [3]
>
> It looks good to me.
>
> > Another issue is the php_uname function, which contains the
> > hostname; since the hostname is varied per build, this makes it
> > non-reproducible. This is caused by the following line:
> >
> > configure.ac:PHP_UNAME=`uname -a | xargs` required in:
> > ext/standard/info.c: php_uname = PHP_UNAME;
> >
> > This is there as a fallback, as the php.net documentation describes:
> >
> > "On some older UNIX platforms, it may not be able to determine the
> > current OS information in which case it will revert to displaying the OS
> > PHP was built on. This will only happen if your uname() library call
> > either doesn't exist or doesn't work.".
> >
> > I would argue that this is strange, unexpected behaviour, and maybe it
> > should throw an exception instead? Or can it show only "Linux" as a
> > fallback, basically PHP_OS? Ideas?
>
> I wouldn't throw an exception here. It seems PHP_OS is
> under-documented; maybe PHP_OS_FAMILY is better:

The difference between PHP_OS and PHP_OS_FAMILY is strange indeed. I'll have to do some further digging.

> > The operating system family PHP was built for. Either of 'Windows', 'BSD', 'Darwin', 'Solaris', 'Linux' or 'Unknown'. Available as of PHP 7.2.0.
>
> However, I really don't think we should change this for already
> released PHP versions. We should ask our maintainers how they feel about
> changing it in an x.y.NEXT patch. My inclination is to do this for PHP
> 7.3 and beyond and accept that official PHP sources of earlier
> versions will not produce reproducible builds.

Indeed, as an Arch Linux developer I'm fine with these changes landing in the next release, with no backporting.

> > The last issue is phar.phar being non-reproducible, and I am not
> > sure what the cause would be. I'm not sure how the binary data in
> > phar.phar is generated.
>
> Phars are like `tars` that are also valid PHP files. This means there
> are probably modification times, etc., set in there. Not sure what else
> would need to be changed.

Thanks for the information, I'll see if I can do some more digging.

-- 
Jelle van der Waa
  101356
December 15, 2017 10:13 j.boggiano@seld.be (Jordi Boggiano)
On 2017-12-14 10:02 AM, Jelle van der Waa wrote:
>>> The last issue is phar.phar being non-reproducible, and I am not
>>> sure what the cause would be. I'm not sure how the binary data in
>>> phar.phar is generated.
>>
>> Phars are like `tars` that are also valid PHP files. This means there
>> are probably modification times, etc., set in there. Not sure what else
>> would need to be changed.
>
> Thanks for the information, I'll see if I can do some more digging.

I have had similar issues with Phar files when I tried to make Composer builds reproducible. The cause is that the Phar extension uses the current Unix timestamp as the filemtime for all files in the table of contents (at least when using addFromString), so every time you build, the TOC is different and hence so is the signature at the end.

I built a tool to fix this which just overwrites the TOC timestamps with whatever you want and then updates the signature. If it helps, you can find it here:
https://github.com/Seldaek/phar-utils

Example usage in Composer:
https://github.com/composer/composer/blob/84f5a1a7e8293978a718663dfac399e83f093e9e/src/Composer/Compiler.php#L161-L164

I guess an alternative fix would be for someone to actually fix the Phar extension so addFromString has a filemtime parameter you can pass the desired mtime to. I have not checked whether addFile suffers from the same issue or not, but possibly it needs to be fixed to read the mtime from the file you add.

Best,
Jordi

-- 
Jordi Boggiano
@seldaek - http://seld.be
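
The mechanism described above (a per-file timestamp in the archive's table of contents changing the bytes, and therefore any checksum or signature over them) can be illustrated with plain tar, following the "Phars are like tars" comparison made earlier in the thread. This is an analogy under stated assumptions, not a statement about Phar internals: the helper name `archive_sum` is hypothetical, and `touch -d "@..."` and `tar --mtime` are GNU options.

```shell
#!/bin/sh
# Sketch only: archive_sum is a hypothetical helper for illustration.
# Usage: archive_sum FILE_MTIME [FIXED_MTIME]
# Creates one file with the given mtime, archives it, prints the checksum.
archive_sum() {
    as_dir=$(mktemp -d)
    printf '<?php echo "hi";\n' > "$as_dir/index.php"
    touch -d "@$1" "$as_dir/index.php"      # GNU touch: set mtime from epoch
    if [ -n "${2:-}" ]; then
        # Override the stored mtime so the archive no longer depends on it.
        tar --format=ustar --owner=0 --group=0 --numeric-owner \
            --mtime="@$2" -C "$as_dir" -cf - index.php | cksum
    else
        tar --format=ustar --owner=0 --group=0 --numeric-owner \
            -C "$as_dir" -cf - index.php | cksum
    fi
    rm -rf "$as_dir"
}

if [ "$(archive_sum 1000000000)" != "$(archive_sum 1000000001)" ]; then
    echo "different mtimes: archives differ"
fi
if [ "$(archive_sum 1000000000 1513000000)" = "$(archive_sum 1000000001 1513000000)" ]; then
    echo "fixed mtime: archives match"
fi
```

Identical file contents still produce different archives when only the stored mtime differs, which is the same effect as rewriting the TOC timestamps to a fixed value before re-signing.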
  101357
December 15, 2017 10:54 sebastian@php.net (Sebastian Bergmann)
On 15.12.2017 at 11:13, Jordi Boggiano wrote:
> I guess an alternative fix would be for someone to actually fix the Phar
> extension so addFromString has a filemtime parameter you can pass the
> desired mtime to. I have not checked whether addFile suffers from the same
> issue or not, but possibly it needs to be fixed to read the mtime from the
> file you add.

+1
  101410
December 24, 2017 11:22 jelle@vdwaa.nl (Jelle van der Waa)
On 12/15/17 at 11:54am, Sebastian Bergmann wrote:
> On 15.12.2017 at 11:13, Jordi Boggiano wrote:
> > I guess an alternative fix would be for someone to actually fix the Phar
> > extension so addFromString has a filemtime parameter you can pass the
> > desired mtime to. I have not checked whether addFile suffers from the same
> > issue or not, but possibly it needs to be fixed to read the mtime from the
> > file you add.
>
> +1

I'm not sure timestamps are the issue; the created phar.phar binary is non-reproducible, as can be seen in this diff. I'll do some more digging :)

https://tests.reproducible-builds.org/archlinux/extra/php/php-7.2.0-2-x86_64.pkg.tar.xz.html

-- 
Jelle van der Waa