Re: [PHP-DEV] Re: PHP 7.1.13 and 7.2.1 Available

  101553
January 5, 2018 15:20 phpdev@ehrhardt.nl (Jan Ehrhardt)
Hi Chris,

"Christoph M. Becker" in php.internals (Fri, 5 Jan 2018 15:53:23 +0100):
>On 05.01.2018 at 14:55, Jan Ehrhardt wrote:
>
>>>> The main reason why I prefer the github zips over the zips at
>>>> http://windows.php.net/download/ is some kind of mismatch in the UTF-8
>>>> filenames:
>>
>> N:\php-sdk\win32sdk2
>> $ unzip -h
>> UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
>> bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
>
>From the release notes[1]:
>
>| Support for UTF-8 encoded entry names, both through PKWARE's "General
>| Purpose Flags Bit 11" indicator and Info-ZIP's new "up" unicode path
>| extra field. (Currently, on Windows the UTF-8 handling is limited to
>| the character subset contained in the configured non-unicode "system
>| code page".)
>
>So this might be a codepage issue.

The warnings do not occur when processing the zips from https://github.com/php/php-src/releases so we know it must be possible to produce zip-files with Unicode filenames without mismatch. Big question is: how?

Jan
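
[Editor's sketch, not part of the original message.] One way to start comparing the two sets of archives is to check how each marks its entry names: through general purpose bit 11, through the Info-ZIP "up" extra field (header ID 0x7075), or neither. The Python below is a minimal sketch using only the standard `zipfile` and `struct` modules. The helper names `extra_field_ids` and `describe_names` are made up for illustration, and the sketch only inspects the central directory; it does not reproduce unzip's own checks.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x0800        # general purpose bit 11: entry name is UTF-8
UNICODE_PATH_ID = 0x7075  # Info-ZIP "up" Unicode Path extra field


def extra_field_ids(extra):
    """Return the header IDs found in a raw extra-field blob."""
    ids = []
    pos = 0
    while pos + 4 <= len(extra):
        # Each record is: header ID (uint16 LE), data size (uint16 LE), data.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        ids.append(header_id)
        pos += 4 + size
    return ids


def describe_names(archive):
    """For each entry: (name, bit 11 set?, "up" extra field present?)."""
    report = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            has_flag = bool(info.flag_bits & UTF8_FLAG)
            has_up = UNICODE_PATH_ID in extra_field_ids(info.extra)
            report.append((info.filename, has_flag, has_up))
    return report


if __name__ == "__main__":
    # Demo on an in-memory archive; pass a real path to inspect a download.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ascii.txt", "x")
        zf.writestr("g\u00fcnther.txt", "x")
    for row in describe_names(buf):
        print(row)
```

Running `describe_names` on one zip from each source and comparing the two reports would show whether they differ in how names are flagged, which is only a starting point for the question above.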
  101554
January 5, 2018 16:31 cmbecker69@arcor.de ("Christoph M. Becker")
Hi Jan!

On 05.01.2018 at 16:20, Jan Ehrhardt wrote:

> "Christoph M. Becker" in php.internals (Fri, 5 Jan 2018 15:53:23 +0100):
>> On 05.01.2018 at 14:55, Jan Ehrhardt wrote:
>>
>>>>> The main reason why I prefer the github zips over the zips at
>>>>> http://windows.php.net/download/ is some kind of mismatch in the UTF-8
>>>>> filenames:
>>>
>>> N:\php-sdk\win32sdk2
>>> $ unzip -h
>>> UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
>>> bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
>>
>> From the release notes[1]:
>>
>> | Support for UTF-8 encoded entry names, both through PKWARE's "General
>> | Purpose Flags Bit 11" indicator and Info-ZIP's new "up" unicode path
>> | extra field. (Currently, on Windows the UTF-8 handling is limited to
>> | the character subset contained in the configured non-unicode "system
>> | code page".)
>>
>> So this might be a codepage issue.
>
> The warnings do not occur when processing the zips from
> https://github.com/php/php-src/releases so we know it must be possible
> to produce zip-files with Unicode filenames without mismatch. Big
> question is: how?

Frankly, I don't know. However, these messages are indeed mere warnings; the log as well as the file system shows correct filenames after extracting. Furthermore, the messages 'continuing with "central" filename version' don't make sense to me, since the filenames in the local file headers are identical to those in the central directory headers.

Anyhow, since everything appears to work with unzip 6.0.0 and there are better tools anyway, I don't think this issue deserves spending much time. :)

-- 
Christoph M. Becker
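
[Editor's sketch, not part of the original message.] The claim that local header names match the central directory names can be checked mechanically. The Python below is a minimal sketch under stated assumptions: it reads the fixed 30-byte local file header at each entry's offset and compares the decoded name with the one `zipfile` read from the central directory. The function name `mismatched_names` is hypothetical. The check compares only the plain name fields, so it says nothing about any comparison unzip may make against extra fields or after code page conversion.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte local file header
UTF8_FLAG = 0x0800                         # general purpose bit 11


def mismatched_names(archive):
    """Return (central_name, local_name) pairs that differ.

    `archive` must be a seekable binary file object.
    """
    mismatches = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            archive.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(archive.read(LOCAL_HEADER.size))
            signature, flags, name_len = fields[0], fields[2], fields[9]
            if signature != b"PK\x03\x04":
                raise ValueError(
                    "no local header at offset %d" % info.header_offset)
            raw = archive.read(name_len)
            # Decode the same way zipfile decodes central directory names:
            # UTF-8 when bit 11 is set, CP437 otherwise.
            local = raw.decode("utf-8" if flags & UTF8_FLAG else "cp437")
            if local != info.orig_filename:
                mismatches.append((info.orig_filename, local))
    return mismatches


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("g\u00fcnther.txt", "x")
    print(mismatched_names(buf))
```

For a real download, open the file in binary mode (`open(path, "rb")`) and pass the file object. An empty result would be consistent with the observation that the two sets of headers agree.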
  101557
January 5, 2018 16:36 ab@php.net (Anatol Belski)
Hi Christoph,

> -----Original Message-----
> From: Christoph M. Becker [mailto:cmbecker69@arcor.de]
> Sent: Friday, January 5, 2018 5:31 PM
> To: internals@lists.php.net; Jan Ehrhardt <phpdev@ehrhardt.nl>;
> internals@lists.php.net
> Subject: Re: [PHP-DEV] Re: PHP 7.1.13 and 7.2.1 Available
>
> Hi Jan!
>
> On 05.01.2018 at 16:20, Jan Ehrhardt wrote:
>
> > "Christoph M. Becker" in php.internals (Fri, 5 Jan 2018 15:53:23 +0100):
> >> On 05.01.2018 at 14:55, Jan Ehrhardt wrote:
> >>
> >>>>> The main reason why I prefer the github zips over the zips at
> >>>>> http://windows.php.net/download/ is some kind of mismatch in the
> >>>>> UTF-8 filenames:
> >>>
> >>> N:\php-sdk\win32sdk2
> >>> $ unzip -h
> >>> UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler.
> >>> Send bug reports using http://www.info-zip.org/zip-bug.html; see README
> >>> for details.
> >>
> >> From the release notes[1]:
> >>
> >> | Support for UTF-8 encoded entry names, both through PKWARE's
> >> | "General Purpose Flags Bit 11" indicator and Info-ZIP's new "up"
> >> | unicode path extra field. (Currently, on Windows the UTF-8 handling
> >> | is limited to the character subset contained in the configured
> >> | non-unicode "system code page".)
> >>
> >> So this might be a codepage issue.
> >
> > The warnings do not occur when processing the zips from
> > https://github.com/php/php-src/releases so we know it must be possible
> > to produce zip-files with Unicode filenames without mismatch. Big
> > question is: how?
>
> Frankly, I don't know. However, these message are indeed mere warnings; the
> log as well as the file system shows correct filenames after extracting.
> Furthermore, the messages 'continuing with "central"
> filename version' don't make sense to me, since the filenames in the local file
> headers are identical to those in the central directory headers.
>
> Anyhow, since everything appears to work with unzip 6.0.0 and there are better
> tools anyway, I don't think this issue deserves spending much time. :)
>

Same here, "7za x file.zip" is a far better option with the latest versions. With the variety of tools out there, there will always be some with conflicting implementations. Many tools are supported by the current approach anyway.

Regards

Anatol
  101556
January 5, 2018 16:33 ab@php.net (Anatol Belski)
> -----Original Message-----
> From: Jan Ehrhardt [mailto:phpdev@ehrhardt.nl]
> Sent: Friday, January 5, 2018 4:20 PM
> To: internals@lists.php.net
> Subject: Re: [PHP-DEV] Re: PHP 7.1.13 and 7.2.1 Available
>
> Hi Chris,
>
> "Christoph M. Becker" in php.internals (Fri, 5 Jan 2018 15:53:23 +0100):
> >On 05.01.2018 at 14:55, Jan Ehrhardt wrote:
> >
> >>>> The main reason why I prefer the github zips over the zips at
> >>>> http://windows.php.net/download/ is some kind of mismatch in the
> >>>> UTF-8 filenames:
> >>
> >> N:\php-sdk\win32sdk2
> >> $ unzip -h
> >> UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler.
> >> Send bug reports using http://www.info-zip.org/zip-bug.html; see README
> >> for details.
> >
> >From the release notes[1]:
> >
> >| Support for UTF-8 encoded entry names, both through PKWARE's "General
> >| Purpose Flags Bit 11" indicator and Info-ZIP's new "up" unicode path
> >| extra field. (Currently, on Windows the UTF-8 handling is limited to
> >| the character subset contained in the configured non-unicode "system
> >| code page".)
> >
> >So this might be a codepage issue.
>
> The warnings do not occur when processing the zips from
> https://github.com/php/php-src/releases so we know it must be possible to
> produce zip-files with Unicode filenames without mismatch. Big question is:
> how?
>

I see the warnings now; however, the file is unpacked correctly. Strange enough. Christoph's finding might be the cause of these warnings. But if unzip could only translate UTF-8 to the current codepage, it would mean that any CJK filenames would be broken, which they're not. Plus, MSYS2 is actually a Linux port, so there can be issues in this regard, too.

I'd rather say unzip should not be used in this case, as its own documentation describes behavior that is not fully compatible, even though it seems to work in the end. The binary tools also supply 7za, which has decent support for other formats. Perhaps it would make sense to check whether 7z should be used to produce the zips. I recall that we had some issues when using older MSYS2 tools. Unzip was added there for convenience with the old SDK and because it's also used in several other scenarios, e.g. PECL build hosts.

Regards

Anatol