Consider only ignoring newlines for final ?> in a file

  100428
September 7, 2017 01:45 ajf@ajf.me (Andrea Faulds)
Hi everyone,

This is the tiniest of issues, but it's bugged me for a long time and 
makes the HTML produced by PHP code less readable than it ought to be. 
Specifically, PHP ignores a newline immediately following a ?> tag. The 
reason for this is, from what I recall, to prevent issues where 
whitespace at the end of a PHP file is echoed before headers can be 
sent. On UNIX in particular, all text files (should) end in a newline, 
so this is a reasonable and necessary feature.

However, for ?> tags anywhere that aren't right at the end of the file, 
this is just a nuisance that makes for messy output. For example, HTML 
output that should look like:

foo bar

May instead end up looking something like:

foo bar

Of course, HTML doesn't matter so much, it'll render the same to the
end-user. However, for outputting e.g. plain text, newlines can be
significant, and so you have to insert an ugly and surprising extra
newline following a tag.

Would anyone object to me changing how PHP handles this so that only the
final ?> tag consumes its following newline, and only at the end of the
file?

Thanks!

--
Andrea Faulds
https://ajf.me/
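
The behaviour described in this message can be reproduced with a small plain-text template. The following is a minimal sketch, not code from the thread: the function names `render_card()` and `render_card_padded()` and their parameters are hypothetical, and output buffering is used only to capture the result as a string.

```php
<?php
// Hypothetical helpers for illustration; neither name comes from the thread.

// Each ?> below is immediately followed by a newline, which PHP drops.
function render_card(string $name, string $role): string
{
    ob_start();
?>
Name: <?= $name ?>
Role: <?= $role ?>
<?php
    return ob_get_clean();
}

// Workaround: add an extra blank line after each tag, so one newline survives.
function render_card_padded(string $name, string $role): string
{
    ob_start();
?>
Name: <?= $name ?>

Role: <?= $role ?>

<?php
    return ob_get_clean();
}

var_dump(render_card('Ada', 'admin'));        // both fields end up on one line
var_dump(render_card_padded('Ada', 'admin')); // one field per line
```

The second function is the "extra newline following a tag" that the message calls ugly and surprising: the blank lines in the source do not appear as blank lines in the output.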
  100430
September 7, 2017 05:23 php@golemon.com (Sara Golemon)
> On Sep 6, 2017, at 18:45, Andrea Faulds <ajf@ajf.me> wrote:
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?

I object. It's a change in ancient behavior that has the potential to
break existing code for superficial reasons. We'd never design it that
way today, but that die is long cast.

-1

-Sara
  100435
September 7, 2017 10:11 francois@tekwire.net (François Laupretre)
Hi Andrea,


On 07/09/2017 at 03:45, Andrea Faulds wrote:
> Would anyone object to me changing how PHP handles this so that only
> the final ?> tag consumes its following newline, and only at the end
> of the file?

+1 to create a PHP8 branch and change the behavior there, not in PHP7.

Once again, some may think it's too early but, IMO, we should create such
a branch and encourage RFCs and changes targeting the next major version to
be announced, discussed, implemented, and tested as soon as possible. This
is the only way to introduce BC breaks while minimizing their impact. We saw
this when talking about PHP7 features: when proposed too late, changes
introducing BC breaks generally must be rejected, whatever their value.

Regards

François
  100437
September 7, 2017 10:29 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Sep 7, 2017 at 12:11 PM, François Laupretre <francois@tekwire.net>
wrote:

> +1 to create a PHP8 branch and change the behavior there. not in PHP7.
>
> Once again, some may think it's too early but, IMO, we should create such
> a branch and encourage RFCs and changes targeting next major version to be
> announced, discussed, implemented, and tested as soon as possible. This is
> the only way to introduce BC breaks while minimizing their impact. We saw
> this when talking about PHP7 features : when proposed too late, changes
> introducing BC breaks generally must be rejected, whatever their value.

New branches cause a lot of additional overhead for core developers.
Changes have to be merged across all actively supported branches, commonly
with NEWS file adjustments. Depending on where we are in the release cycle
right now, we already have 3-4 active branches -- we don't need to add to
that.

I think it's fine to start targeting PHP 8 now with RFCs, but
implementation work should be done outside of php-src. It is more cost
effective for one person to rebase their code two years down the line than
it is for everybody to do extra work every time they commit something.
(Alternatively we would have to change our development model so that
branches are not synchronized at all times.)

Nikita
  100436
September 7, 2017 10:19 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Sep 7, 2017 at 3:45 AM, Andrea Faulds <ajf@ajf.me> wrote:

> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?

It also goes the other way. Whether you want to drop the newline after ?>
depends (roughly) on whether the code is control flow (drop) or trailing
output (don't drop). If the newline is not dropped anymore it doesn't mean
that the output will look nice, it's just going to be broken in a different
way.

Nikita
  100439
September 7, 2017 12:43 ajf@ajf.me (Andrea Faulds)
Hi Nikita,

Nikita Popov wrote:
> It also goes the other way. Whether you want to drop the newline after ?>
> depends (roughly) on whether the code is control flow (drop) or trailing
> output (don't drop). If the newline is not dropped anymore it doesn't mean
> that the output will look nice, it's just going to be broken in a different
> way.

I understand that it should be dropped for “control flow” code (maybe not
the best term, I misunderstood what you meant at first). That's why I
suggest ignoring the following newline only for the ?> at the end of the
file, because I can't think of another place where you would have a ?> and
*not* intend output immediately after it.

So I'm not sure I understand your objection, from that standpoint. Did I
miss something?

Regards.

--
Andrea Faulds
https://ajf.me/
  100440
September 7, 2017 12:54 nikita.ppv@gmail.com (Nikita Popov)
On Thu, Sep 7, 2017 at 2:43 PM, Andrea Faulds <ajf@ajf.me> wrote:

> I understand that it should be dropped for “control flow” code (maybe not
> the best term, I misunderstood what you meant at first). That's why I
> suggest ignoring the following newline only for the ?> at the end of the
> file, because I can't think of another place where you would have a ?> and
> *not* intend output immediately after it.
>
> So I'm not sure I understand your objection, from that standpoint. Did I
> miss something?

I'm referring to code like
Currently this would produce the output
  • Foo
  • Bar
Without the trailing newline elision it would produce
  • Foo
  • Bar
I always assumed that this is the reason why we do this in the first place.

Nikita
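
For readers following along, here is a minimal sketch of the kind of list template this message is about. It is an illustration under assumptions, not the code from the original post: `render_list()` and the `$items` array are hypothetical names, and output buffering is used only to capture the result.

```php
<?php
// Hypothetical helper for illustration; not taken from the thread.
function render_list(array $items): string
{
    ob_start();
?>
<ul>
<?php foreach ($items as $item): ?>
    <li><?= htmlspecialchars($item) ?></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

var_dump(render_list(['Foo', 'Bar']));
```

With the current rule, the lines that contain only `foreach` and `endforeach` leave no trace in the output, because the newline after each `?>` is dropped. If those newlines were kept, each of those lines would presumably contribute an empty line around the list items.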
  100442
September 7, 2017 13:43 ajf@ajf.me (Andrea Faulds)
Hi,

Nikita Popov wrote:
> I'm referring to code like
> [...]
> I always assumed that this is the reason why we do this in the first place.

Ah. See, it's actually that kind of code that is my problem. A practical example would be:
which currently produces:
foo bar
baz qux
The doubled-up indentation from missing newlines makes it into a mess.
And this is even worse in practice when you have more nested control flow.

Extra newlines would be fine here, but missing newlines aren't.

Thanks.

--
Andrea Faulds
https://ajf.me/
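
The doubled-up indentation is easy to reproduce. The sketch below is an assumption-laden illustration rather than the example from the original post: `render_table()` and the shape of `$rows` are hypothetical, and the control-flow tags are indented to match the surrounding markup.

```php
<?php
// Hypothetical helper for illustration; not taken from the thread.
function render_table(array $rows): string
{
    ob_start();
?>
<table>
    <?php foreach ($rows as $row): ?>
        <tr>
            <?php foreach ($row as $cell): ?>
                <td><?= htmlspecialchars($cell) ?></td>
            <?php endforeach; ?>
        </tr>
    <?php endforeach; ?>
</table>
<?php
    return ob_get_clean();
}

var_dump(render_table([['foo', 'bar']]));
```

Because the newline after each control-flow `?>` is dropped, the whitespace in front of that tag is glued onto the next line of output. In this sketch the `<td>` lines come out indented by 28 spaces instead of 16, and the closing `</table>` picks up 4 spaces it does not have in the source.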
  100443
September 7, 2017 14:00 cmbecker69@gmx.de ("Christoph M. Becker")
On 07.09.2017 at 15:43, Andrea Faulds wrote:

> Ah. See, it's actually that kind of code that is my problem. A practical
> example would be:

I always start the "control flow lines" at column 0 (similar to C preprocessor instructions), which gives the desired output and is quite readable:
-- Christoph M. Becker
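
A minimal sketch of the column-0 convention described in this message, assuming a hypothetical `render_table_col0()` helper and `$rows` array (the code from the original post is not reproduced here). The opening `<?php` of each control-flow line sits at column 0, and nesting is shown by spaces after the tag instead of before it.

```php
<?php
// Hypothetical helper for illustration; not taken from the thread.
function render_table_col0(array $rows): string
{
    ob_start();
?>
<table>
<?php foreach ($rows as $row): ?>
    <tr>
<?php   foreach ($row as $cell): ?>
        <td><?= htmlspecialchars($cell) ?></td>
<?php   endforeach; ?>
    </tr>
<?php endforeach; ?>
</table>
<?php
    return ob_get_clean();
}

var_dump(render_table_col0([['foo', 'bar']]));
```

Since nothing precedes the control-flow tags on their lines, there is no stray indentation to leak into the output, and the dropped newline makes each of those lines vanish cleanly.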
  100445
September 7, 2017 14:21 ajf@ajf.me (Andrea Faulds)
Hi,

Christoph M. Becker wrote:
> I start the "control flow lines" always on column 0 (similar to C
> preprocessor instructions), what gives the desired output and is quite
> readable:

This seems like a reasonable workaround, thank you for the idea. It
reminds me of what PHP's source code does with preprocessor instructions:

#ifndef FOO
#    define FOO
#endif

I might do this in future code.

That said, I still think the ?> newline behaviour should be looked at,
since this kind of workaround isn't universally applicable (and in any
case isn't to everyone's tastes). In particular, if you want to generate
plain text and need to insert a newline, having PHP throw them away and
requiring you to add extra ones to compensate makes for uglier source
code which is harder to reason about.

Thanks!

--
Andrea Faulds
https://ajf.me/
  100447
September 7, 2017 14:38 cmbecker69@gmx.de ("Christoph M. Becker")
On 07.09.2017 at 16:21, Andrea Faulds wrote:

> This seems like a reasonable workaround, thank you for the idea. It
> reminds me of what PHP's source code does with preprocessor instructions:
>
> #ifndef FOO
> #    define FOO
> #endif

Hence the name PHP. :)

> That said, I still think the ?> newline behaviour should be looked at,
> since this kind of workaround isn't universally applicable (and in any
> case isn't to everyone's tastes). In particular, if you want to generate
> plain text and need to insert a newline, having PHP throw them away and
> requiring you to add extra ones to compensate makes for uglier source
> code which is harder to reason about.

If you don't mind a trailing space (I don't like them, but well), you
can write:

bar

And of course, there are template engines which could be used as well.

Frankly, I don't see any need for action here. :)

--
Christoph M. Becker
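
The trailing-space trick can be checked with a short script. This is a sketch under assumptions: `render_string()` is a hypothetical helper, and it uses `eval()` only so that the whitespace after each `?>` can be spelled out explicitly in a string (it should never be fed untrusted input).

```php
<?php
// Hypothetical helper: evaluates a template string and returns its output.
function render_string(string $template, array $vars = []): string
{
    extract($vars);
    ob_start();
    eval('?>' . $template); // leading ?> switches eval() into template mode
    return ob_get_clean();
}

$vars = ['foo' => 'foo', 'bar' => 'bar'];

// Newline directly after ?> is dropped.
$tight = render_string("<?= \$foo ?>\n<?= \$bar ?>", $vars);

// A single space between ?> and the newline keeps both the space and the newline.
$spaced = render_string("<?= \$foo ?> \n<?= \$bar ?>", $vars);

var_dump($tight, $spaced);
```

The cost of the workaround is visible in `$spaced`: the line break survives, but so does the space in front of it.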
  100451
September 7, 2017 17:02 gmblar@gmail.com (Andreas Treichel)
> I always assumed that this is the reason why we do this in the first place.
I think the main reason was that old versions of IE go into quirks mode if the doctype is not on the first line of the output, e.g.:
  100444
September 7, 2017 14:15 tendoaki@gmail.com (Michael Morris)
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?

Captain Obvious here. It has long been the policy of many large PHP
projects to not close the last PHP tag for this reason. This change
wouldn't affect them. It risks affecting projects without this policy, and
those tend to be older and often private.
  100446
September 7, 2017 14:23 ajf@ajf.me (Andrea Faulds)
Hi,

Michael Morris wrote:
> Captain Obvious here. It has long been the policy of many large PHP
> projects to not close the last PHP tag for this reason. This change
> wouldn't affect them. It risks affecting projects without this policy, and
> those tend to be older and often private.

The idea here though is not to affect code where the entire file is a
PHP block. If newlines are still consumed, but only for ?> at the end of
the file, those files should still behave the same.

What I want to change is how it behaves in other circumstances, i.e.
templating.

Thanks.

--
Andrea Faulds
  100449
September 7, 2017 15:34 tendoaki@gmail.com (Michael Morris)
On Thu, Sep 7, 2017 at 10:23 AM, Andrea Faulds <ajf@ajf.me> wrote:

> What I want to change is how it behaves in other circumstances, i.e.
> templating.

I get that, but I can think of one example where this innocent change might
BC break something. You cite this change being for templating, which implies
that the PHP files with this feature are being loaded by another PHP file
with require() or include(). Suppose someone creates a template wrapper with
this circumstance in mind. Instead of doing the obvious, omitting the final
?> tag in the template, they write code in the template wrapper to snip the
last newline character from the included file. Depending on how their code
is written your change could now become a breaking change: for example, they
just lop off the last character of the template's return without checking
whether it is indeed a newline character.
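
A minimal sketch of what such a wrapper might look like, so the scenario can be tested directly. Everything here is hypothetical: `render_file()`, the throwaway template written to a temporary file, and the two trimming strategies are illustrations, not code from any real project.

```php
<?php
// Hypothetical wrapper: renders a template file and returns its output.
function render_file(string $path): string
{
    ob_start();
    include $path;
    return ob_get_clean();
}

// Write a throwaway template whose last line is a closing tag plus a newline.
$path = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($path, "Hello <?= 'world' ?>\n");

$output  = render_file($path);
$blind   = substr($output, 0, -1); // lops off the last character unconditionally
$checked = rtrim($output, "\n");   // strips only newlines that are really there

unlink($path);
var_dump($output, $blind, $checked);
```

Running this against the PHP version in use shows what the template's output actually ends with, which is the quickest way to find out whether a given wrapper depends on the newline handling at all.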
  100450
September 7, 2017 16:51 rowan.collins@gmail.com (Rowan Collins)
On 7 September 2017 16:34:38 BST, Michael Morris <tendoaki@gmail.com> wrote:
> Suppose someone creates a template wrapper with this circumstance in mind.
> Instead of doing the obvious, omit the final ?> tag in the template, they
> write code in the template wrapper to snip the last endline character from
> the included file. Depending on how their code is written your change could
> now become a breaking change: for example they just lop off the last
> character of the template's return without checking to see if it is indeed
> a newline character.

I think you have the change the wrong way round (unless I do).

The current behaviour is:

- PHP blocks at end of file -> suppress following newline
- PHP blocks elsewhere in file -> suppress following newline

The proposed behaviour is:

- PHP blocks at end of file -> suppress following newline (no change)
- PHP blocks elsewhere in file -> treat following newline literally

So in your scenario, there would be no newline to trim, before or after
the proposed change.

Regards,

--
Rowan Collins
[IMSoP]
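
The two cases in the lists above can be told apart mechanically with the tokenizer extension, where the text of a `T_CLOSE_TAG` token includes the newline it swallows. The following is a rough sketch, not a patch for the proposal: `describe_close_tags()` is a hypothetical helper, and "end of file" is approximated as "the closing tag is the last token".

```php
<?php
// Hypothetical helper: for each ?> in $source, report whether it swallows a
// newline and whether it is the last thing in the file.
function describe_close_tags(string $source): array
{
    $report = [];
    $tokens = token_get_all($source);
    $last   = count($tokens) - 1;

    foreach ($tokens as $i => $token) {
        if (is_array($token) && $token[0] === T_CLOSE_TAG) {
            $report[] = [
                // The token text is "?>" plus the newline, if one follows.
                'swallows_newline' => $token[1] !== '?>',
                'at_end_of_file'   => $i === $last,
            ];
        }
    }

    return $report;
}

$template = "<?php if (true): ?>\nbody\n<?php endif; ?>\n";
var_dump(describe_close_tags($template));
```

Under the proposal as summarised above, only entries with `at_end_of_file` set would keep swallowing their newline; the others would have it treated as literal output.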
  100448
September 7, 2017 15:22 thruska@cubiclesoft.com (Thomas Hruska)
On 9/6/2017 6:45 PM, Andrea Faulds wrote:
> Would anyone object to me changing how PHP handles this so that only the
> final ?> tag consumes its following newline, and only at the end of the
> file?

I've noticed that over the years. When I care, I'll either press enter an
extra time or, more frequently, switch over to using pure echo statements
for precise output control. I don't think of this as a particularly
significant issue.*

Alternatively, for the HTML case, it is possible to stream an output buffer
and manipulate newlines through the TagFilterStream class:

https://github.com/cubiclesoft/ultimate-web-scraper

That particular class can process HTML at a rate of up to 1MB/sec even when
using callbacks via its very efficient stream-based state engine. The extra
overhead is minimal for prettifying HTML output.

* I'd personally rather see a suitable fix for Bug #73535 at this point.
It's been an open issue with a CVE assigned for almost 10 months. It would
be nice to see it triaged properly (e.g. the suggested fix applied) so that
I can finally close that browser tab. If you have the spare time for newline
output adjustments, I'd love to see that extra energy sunk into fixing
existing security vulnerabilities, especially those with CVEs and suggested
solutions. Just sayin'. But you guys do whatever you want to do.

--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/

And once you find my software useful:

http://cubiclesoft.com/donate/
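
A minimal sketch of the "pure echo statements" approach mentioned in this message, assuming a hypothetical `render_report()` helper and input array. Every newline in the output is written explicitly, so the closing-tag rule never comes into play.

```php
<?php
// Hypothetical helper for illustration; not taken from the thread.
function render_report(array $fields): string
{
    ob_start();
    foreach ($fields as $label => $value) {
        // Each line ends with an explicit "\n"; nothing is implied by layout.
        echo $label, ': ', $value, "\n";
    }
    return ob_get_clean();
}

var_dump(render_report(['Name' => 'Ada', 'Role' => 'admin']));
```

The trade-off is that the template no longer looks like its output, which is the readability concern the thread started from.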