[RFC] is_literal()

  109183
March 21, 2020 19:13 craig@craigfrancis.co.uk (Craig Francis)
Hi,

I've written up my suggestion for a is_literal() function:

https://wiki.php.net/rfc/is_literal

Any feedback would be appreciated.

Craig
  109184
March 21, 2020 19:50 larry@garfieldtech.com ("Larry Garfield")
On Sat, Mar 21, 2020, at 2:13 PM, Craig Francis wrote:
> Hi, > > I've written up my suggestion for a is_literal() function: > > https://wiki.php.net/rfc/is_literal > > Any feedback would be appreciated. > > Craig
While I appreciate the intent, without an untaint() or equivalent I fear its usefulness will be limited, or else it will get overused and thus cut off numerous entirely valid situations. Eg, there's plenty of very good reasons to put a template string into the database rather than a file literal. Or to build an SQL query dynamically in ways that an is_literal check would not allow, at least not without an absurdly complex query builder. Without a way to flag "yes, I know this was built dynamically but I've vetted it, it's OK" on a value, I fear such a check will either be unuseful or counter-productive. --Larry Garfield
  109187
March 21, 2020 20:09 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 21 Mar 2020 at 19:51, Larry Garfield <larry@garfieldtech.com> wrote:

> Eg, there's plenty of very good reasons to put a template string into the > database rather than a file literal. Or to build an SQL query dynamically > in ways that an is_literal check would not allow, at least not without an > absurdly complex query builder.
Thanks Larry, I think the examples I've provided should cover the issues that typically get raised. The main ones tend to be "WHERE x IN (?,?,?)" and "ORDER BY variable", where the current work arounds get a bit risky (such as string escaping), but please let me know if I've missed any. Craig On Sat, 21 Mar 2020 at 19:51, Larry Garfield <larry@garfieldtech.com> wrote:
> On Sat, Mar 21, 2020, at 2:13 PM, Craig Francis wrote: > > Hi, > > > > I've written up my suggestion for a is_literal() function: > > > > https://wiki.php.net/rfc/is_literal > > > > Any feedback would be appreciated. > > > > Craig > > While I appreciate the intent, without an untaint() or equivalent I fear > its usefulness will be limited, or else it will get overused and thus cut > off numerous entirely valid situations. > > Eg, there's plenty of very good reasons to put a template string into the > database rather than a file literal. Or to build an SQL query dynamically > in ways that an is_literal check would not allow, at least not without an > absurdly complex query builder. > > Without a way to flag "yes, I know this was built dynamically but I've > vetted it, it's OK" on a value, I fear such a check will either be unuseful > or counter-productive. > > --Larry Garfield > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php > >
  109185
March 21, 2020 19:54 Danack@basereality.com (Dan Ackroyd)
On Sat, 21 Mar 2020 at 19:14, Craig Francis <craig@craigfrancis.co.uk> wrote:
> > Any feedback would be appreciated.
The RFC should probably define what a literal is. cheers Dan Ack
  109188
March 21, 2020 20:21 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 21 Mar 2020 at 19:54, Dan Ackroyd <Danack@basereality.com> wrote:

> On Sat, 21 Mar 2020 at 19:14, Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > > Any feedback would be appreciated. > > The RFC should probably define what a literal is. > >
Good point Dan, I've added it to the intro. Please let me know if it needs more work (I'm under the impression that PHP already defines these as literals, hence the name).
  109186
March 21, 2020 20:06 ben@benramsey.com (Ben Ramsey)
> On Mar 21, 2020, at 14:13, Craig Francis <craig@craigfrancis.co.uk> wrote: > > Hi, > > I've written up my suggestion for a is_literal() function: > > https://wiki.php.net/rfc/is_literal > > Any feedback would be appreciated. > > Craig
This seems very similar to the taint extension. How does it differ from ext-taint, and have you considered working together with ext-taint? https://www.php.net/manual/en/intro.taint.php The name is_literal() is too ambiguous to me. As Dan suggested, the RFC should define what a literal is. Cheers, Ben
  109189
March 21, 2020 20:26 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 21 Mar 2020 at 20:06, Ben Ramsey <ben@benramsey.com> wrote:

> This seems very similar to the taint extension
Hi Ben, Yes, it is similar; I hope I've covered the differences under the "Taint Checking" heading (please let me know if that could be improved). https://wiki.php.net/rfc/is_literal#taint_checking As to the name, it's to work alongside functions such as is_int(), is_string(), etc - is that a good enough reason? Craig On Sat, 21 Mar 2020 at 20:06, Ben Ramsey <ben@benramsey.com> wrote:
> > On Mar 21, 2020, at 14:13, Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > > Hi, > > > > I've written up my suggestion for a is_literal() function: > > > > https://wiki.php.net/rfc/is_literal > > > > Any feedback would be appreciated. > > > > Craig > > > This seems very similar to the taint extension. How does it differ from > ext-taint, and have you considered working together with ext-taint? > > https://www.php.net/manual/en/intro.taint.php > > The name is_literal() is too ambiguous to me. As Dan suggested, the RFC > should define what a literal is. > > Cheers, > Ben > >
  109190
March 21, 2020 21:23 jakob@givoni.dk (Jakob Givoni)
On Sat, Mar 21, 2020 at 3:26 PM Craig Francis <craig@craigfrancis.co.uk> wrote:
> > As to the name, it's to work alongside functions such as > is_int(), is_string(), etc - is that a good enough reason?
I think it could cause confusion since int and string are value types, - literal is not a type as such, it merely describes the way the value was created.
> I'm under the impression that PHP already defines these as literals
Does anyone know if that is correct? Does PHP remember how the string variable / constant was created?
  109191
March 21, 2020 21:58 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 21 Mar 2020 at 21:23, Jakob Givoni <jakob@givoni.dk> wrote:

> On Sat, Mar 21, 2020 at 3:26 PM Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > > As to the name, it's to work alongside functions such as > > is_int(), is_string(), etc - is that a good enough reason? > > I think it could cause confusion since int and string are value types, > - literal is not a type as such, it merely describes the way the value > was created. >
I'm happy to use a different name; but I should add that is_numeric() isn't really a type, there are other functions such as is_writable(), and the taint extension uses is_tainted().
> I'm under the impression that PHP already defines these as literals > > Does anyone know if that is correct? Does PHP remember how the string > variable / constant was created? >
I've talked to Paul Dragoonis and Derick Rethans recently (they both kindly did talks at PHP-SW); when I mentioned it to Paul, I was told that's where I should start looking, and that was the correct terminology; and Derick helped confirm some of these ideas (but we were walking to the pub at the time). And while I keep trying, I don't know enough about C, or the internals of PHP.
  109197
March 21, 2020 23:22 jakob@givoni.dk (Jakob Givoni)
On Sat, Mar 21, 2020 at 4:58 PM Craig Francis <craig@craigfrancis.co.uk> wrote:

> I'm happy to use a different name; but I should add that is_numeric() isn't really a type, there are other functions such as is_writable(), and the taint extension uses is_tainted().
Right, good points. However, to my logic, whether or not the value was created from a literal is not something you can infer from the value itself, it needs an accompanying flag or something. I'll suggest is_from_literal() as a more precise formulation.
> I've talked to Paul Dragoonis and Derick Rethans recently (they both kindly did talks at PHP-SW); when I mentioned it to Paul, I was told that's where I should start looking, and that was the correct terminology; and Derick helped confirm some of these ideas (but we were walking to the pub at the time).
Sounds like you were having fun though :-)
> And while I keep trying, I don't know enough about C, or the internals of PHP.
Appreciate the effort.
  109198
March 22, 2020 01:48 craig@craigfrancis.co.uk (Craig Francis)
On Sat, 21 Mar 2020 at 23:22, Jakob Givoni <jakob@givoni.dk> wrote:

> I'll suggest is_from_literal() as a more precise formulation.
Good suggestion, I've added it to the list of Open Issues (if we can determine the different ways this can be achieved, without affecting performance, then we can see what people prefer for the name). Sounds like you were having fun though :-) I was; although I must confess, I didn't recognise Derick until the talk began (then, suddenly, the voice from the PHP Internals podcast). On Sat, 21 Mar 2020 at 23:22, Jakob Givoni <jakob@givoni.dk> wrote:
> On Sat, Mar 21, 2020 at 4:58 PM Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > I'm happy to use a different name; but I should add that is_numeric() > isn't really a type, there are other functions such as is_writable(), and > the taint extension uses is_tainted(). > > Right, good points. However, to my logic, whether or not the value was > created from a literal is not something you can infer from the value > itself, it needs an accompanying flag or something. > > I'll suggest is_from_literal() as a more precise formulation. > > > I've talked to Paul Dragoonis and Derick Rethans recently (they both > kindly did talks at PHP-SW); when I mentioned it to Paul, I was told that's > where I should start looking, and that was the correct terminology; and > Derick helped confirm some of these ideas (but we were walking to the pub > at the time). > > Sounds like you were having fun though :-) > > > And while I keep trying, I don't know enough about C, or the internals > of PHP. > > Appreciate the effort. >
  109192
March 21, 2020 21:59 tysonandre775@hotmail.com (tyson andre)
Hi Craig,

https://github.com/laruence/taint#taint notes that
"Please note that do not enable this extension in product(ion) env, since it will slowdown your app."

- That repo already provides is_tainted()  http://docs.php.net/is_tainted

  A fork of that repo would theoretically allow implementing is_literal() as described in the RFC - is that the implementation plan?
- The slowdown would be a large problem if this feature was always on.

  And if it can be implemented as a PECL module, that would be more preferable to me than a core module of php.
  If it was in core, having to support that feature may limit optimizations or implementation changes that could be done in the future.

If it's implemented in the same way as taint (i.e. cannot be used in combination with XDebug, blackfire, newrelic, etc),
that would also be a problem for including it in core.
If it wasn't, then it'd slow down concatenation, calls, etc. even when the application didn't need is_literal.

I also imagine that whether or not opcache was enabled is likely to affect whether or not
something ends up being a literal or not
(e.g. opcache can evaluate functions such as str_repeat() for literals at compile time)
https://github.com/laruence/taint/blob/master/taint.c seems to already support a whitelist (php_taint_override_func),
so that isn't insurmountable for functions,
but it's possible `if ($local === 'literal') { process($local); }` would only satisfy is_literal() with opcache enabled.

Related projects (static analysis instead of runtime checks, though):

It's also worth noting that `vimeo/psalm` had an in progress way to detect some ways in which tainted strings may be used by applications, based on a paper by Facebook.
(https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext (for HHVM, though))
https://github.com/vimeo/psalm/issues/611#issuecomment-515153305 - but it isn't completed or usable yet, as far as I can tell.

Wikimedia also created https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/ , but that's currently beta.
Both would have ways they fail to catch every way an argument could be passed to a function (e.g. unanalyzable dynamic/framework calls)

- Tyson
  109194
March 21, 2020 22:55 craig@craigfrancis.co.uk (Craig Francis)
Hi Tyson,

Thanks for your email, I really appreciate it.

I'm not a C developer, but if I was to have a go at implementing
`is_literal()`, I probably would start with the existing taint extension.

But you're right, that approach can cause performance issues, and doesn't
play well with XDebug (much to my annoyance).

I'm hoping that someone with more C and PHP Internals experience might have
a better solution, as there was some mention of the existing engine knowing
about literals being semi-special, and there might be (I'm really showing
my ignorance here) a better way to work with the compiler.

As to using PECL; my hope is that we can use this to help educate
developers about the dangers of mixing variables with commands. While they
won't think to use this function themselves, it can be used by frameworks
to ensure mistakes aren't being made (and who knows, maybe one day
functions like mysqli_query() could use this as well).

As to the related projects / static analysis; while I do appreciate them,
similar tools have been available for years (admittedly some with huge
price tags), and I don't think they have really solved these problems. That
said, I have added a note in the RFC.

Thanks again,
Craig



On Sat, 21 Mar 2020 at 21:59, tyson andre <tysonandre775@hotmail.com> wrote:

> Hi Craig, > > https://github.com/laruence/taint#taint notes that > "Please note that do not enable this extension in product(ion) env, since > it will slowdown your app." > > - That repo already provides is_tainted() http://docs.php.net/is_tainted > > A fork of that repo would theoretically allow implementing is_literal() > as described in the RFC - is that the implementation plan? > - The slowdown would be a large problem if this feature was always on. > > And if it can be implemented as a PECL module, that would be more > preferable to me than a core module of php. > If it was in core, having to support that feature may limit > optimizations or implementation changes that could be done in the future. > > If it's implemented in the same way as taint (i.e. cannot be used in > combination with XDebug, blackfire, newrelic, etc), > that would also be a problem for including it in core. > If it wasn't, then it'd slow down concatenation, calls, etc. even when the > application didn't need is_literal. > > I also imagine that whether or not opcache was enabled is likely to affect > whether or not > something ends up being a literal or not > (e.g. opcache can evaluate functions such as str_repeat() for literals at > compile time) > https://github.com/laruence/taint/blob/master/taint.c seems to already > support a whitelist (php_taint_override_func), > so that isn't insurmountable for functions, > but it's possible `if ($local === 'literal') { process($local); }` would > only satisfy is_literal() with opcache enabled. > > Related projects (static analysis instead of runtime checks, though): > > It's also worth noting that `vimeo/psalm` had an in progress way to detect > some ways in which tainted strings may be used by applications, based on a > paper by Facebook. > ( > https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext > (for HHVM, though)) > https://github.com/vimeo/psalm/issues/611#issuecomment-515153305 - but it > isn't completed or usable yet, as far as I can tell. > > Wikimedia also created > https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/ > , but that's currently beta. > Both would have ways they fail to catch every way an argument could be > passed to a function (e.g. unanalyzable dynamic/framework calls) > > - Tyson >