Re: Literal / Taint checking

This is only part of a thread. view whole thread
  108537
February 13, 2020 12:31 craig@craigfrancis.co.uk (Craig Francis)
Hi,

While there was a brief discussion about an *is_literal*() method in
August, I'm wondering where I can go next?

Just as a reminder, the main objection seemed to be that Taint checking is
the current solution. For example, those created by Laruence[1],
MediaWiki[2], and Matthew[3]. But this can never be as good at the PHP
engine explicitly stating a variable *only* contains literal values, where
it can be checked at runtime, and be a key part of the development process.

And while I'm using SQL injection in my examples (because it's easy to show
how it can enforce the use of parameterised queries); it would also be
useful to protect against command line injection, and HTML/XSS as well
(e.g. a templating system can only accept HTML as literal strings, and
the user supplied values be provided separately).

I'm assuming this would change the zval structure (to include an
"is_literal" flag?), and it would be more of a PHP 8.0 change, rather than
8.1.

Craig


---

Broken taint check, due to missing quote marks:

$sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]);

---

Support for "WHERE ... IN", ideally done via an abstraction, so you don't
need to write this every time:

$sql = '... WHERE id IN (' . substr(str_repeat('?,', count($ids)), 0, -1) .
')';

---

[1] https://github.com/laruence/taint
[2] https://www.mediawiki.org/wiki/Phan-taint-check-plugin
[3] https://psalm.dev/r/ebb9522fea

---




On Thu, 15 Aug 2019 at 19:02, Craig Francis <craig@craigfrancis.co.uk>
wrote:

> Hi, > > How likely would it be for PHP to do Literal tracking of variables? > > This is something that's being discussed JavaScript TC39 at the moment > [1], and I think it would be even more useful in PHP. > > We already know we should use parameterized/prepared SQL, but there is no > way to prove the SQL string hasn't been tainted by external data in large > projects, or even in an ORM. > > This could also work for templating systems (blocking HTML injection) and > commands. > > Internally it would need to introduce a flag on every variable, and a > single function to check if a given variable has only been created by > Literal(s). > > Unlike the taint extension, there should be no way to override this (e.g. > no taint/untaint functions); and if it was part of the core language, it > will continue to work after every update. > > One day certain functions (e.g. mysqli_query) might use this information > to generate a error/warning/notice; but for now, having it available for > checking would be more than enough. > > Craig > > > > public function exec($sql, $parameters = []) { > if (!*is_literal*($sql)) { > throw new Exception('SQL must be a literal.'); > } > $statement = $this->pdo->prepare($sql); > $statement->execute($parameters); > return $statement->fetchAll(); > } > > ... > > $sql = 'SELECT * FROM table WHERE id = ?'; > > $result = $db->exec($sql, [$id]); > > > > [1] https://github.com/tc39/proposal-array-is-template-object > https://github.com/mikewest/tc39-proposal-literals >
  108906
March 9, 2020 13:47 craig@craigfrancis.co.uk (Craig Francis)
Hi,

As I'm not sure how to make any more process on this, I've added added a
Feature Request:

https://bugs.php.net/bug.php?id=79359

It shows how this change in PHP could stop SQL injection, and proposes a
way it could be used against HTML injection as well.

Craig



On Thu, 13 Feb 2020 at 12:31, Craig Francis <craig@craigfrancis.co.uk>
wrote:

> Hi, > > While there was a brief discussion about an *is_literal*() method in > August, I'm wondering where I can go next? > > Just as a reminder, the main objection seemed to be that Taint checking is > the current solution. For example, those created by Laruence[1], > MediaWiki[2], and Matthew[3]. But this can never be as good at the PHP > engine explicitly stating a variable *only* contains literal values, where > it can be checked at runtime, and be a key part of the development process. > > And while I'm using SQL injection in my examples (because it's easy to > show how it can enforce the use of parameterised queries); it would also be > useful to protect against command line injection, and HTML/XSS as well > (e.g. a templating system can only accept HTML as literal strings, and > the user supplied values be provided separately). > > I'm assuming this would change the zval structure (to include an > "is_literal" flag?), and it would be more of a PHP 8.0 change, rather than > 8.1. > > Craig > > > --- > > Broken taint check, due to missing quote marks: > > $sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]); > > --- > > Support for "WHERE ... IN", ideally done via an abstraction, so you don't > need to write this every time: > > $sql = '... WHERE id IN (' . substr(str_repeat('?,', count($ids)), 0, -1) > . ')'; > > --- > > [1] https://github.com/laruence/taint > [2] https://www.mediawiki.org/wiki/Phan-taint-check-plugin > [3] https://psalm.dev/r/ebb9522fea > > --- > > > > > On Thu, 15 Aug 2019 at 19:02, Craig Francis <craig@craigfrancis.co.uk> > wrote: > >> Hi, >> >> How likely would it be for PHP to do Literal tracking of variables? >> >> This is something that's being discussed JavaScript TC39 at the moment >> [1], and I think it would be even more useful in PHP. >> >> We already know we should use parameterized/prepared SQL, but there is no >> way to prove the SQL string hasn't been tainted by external data in large >> projects, or even in an ORM. >> >> This could also work for templating systems (blocking HTML injection) and >> commands. >> >> Internally it would need to introduce a flag on every variable, and a >> single function to check if a given variable has only been created by >> Literal(s). >> >> Unlike the taint extension, there should be no way to override this (e.g.. >> no taint/untaint functions); and if it was part of the core language, it >> will continue to work after every update. >> >> One day certain functions (e.g. mysqli_query) might use this information >> to generate a error/warning/notice; but for now, having it available for >> checking would be more than enough. >> >> Craig >> >> >> >> public function exec($sql, $parameters = []) { >> if (!*is_literal*($sql)) { >> throw new Exception('SQL must be a literal.'); >> } >> $statement = $this->pdo->prepare($sql); >> $statement->execute($parameters); >> return $statement->fetchAll(); >> } >> >> ... >> >> $sql = 'SELECT * FROM table WHERE id = ?'; >> >> $result = $db->exec($sql, [$id]); >> >> >> >> [1] https://github.com/tc39/proposal-array-is-template-object >> https://github.com/mikewest/tc39-proposal-literals >> >
  108913
March 9, 2020 16:53 rowan.collins@gmail.com (Rowan Tommins)
On Mon, 9 Mar 2020 at 13:47, Craig Francis <craig@craigfrancis.co.uk> wrote:

> Hi, > > As I'm not sure how to make any more process on this, I've added added a > Feature Request: > > https://bugs.php.net/bug.php?id=79359 > > It shows how this change in PHP could stop SQL injection, and proposes a > way it could be used against HTML injection as well. >
Hi Craig, In my experience, the bug tracker is likely to get you less attention than this list, rather than more. For this kind of significant change, the way to get a more in-depth discussion going is to draft an RFC; there are some instructions and tips on how to go about that at https://wiki.php.net/rfc/howto and https://blogs.oracle.com/opal/the-mysterious-php-rfc-process-and-how-you-can-change-the-web The idea of an RFC is to sit down and design exactly how the proposed feature would work; that helps move the discussion forward, because people can see exactly how it might look, and means you're offering something to the community rather than asking it of them. The RFC doesn't have to include a full implementation, but if you don't know much about the technical details, you might need help from someone who does to make sure the proposal is realistic. I see you've linked an older RFC in the feature request; it would be worth digging out the archived discussion from when that was proposed, to see why it stalled. It may just be that people were distracted by other things, or there may be issues raised which you can consider in a new proposal. If you haven't already, you could try contacting the author as well. In general, I think it's an interesting idea, but as the saying goes "the devil is in the detail", so I don't have much to say without a concrete proposal for what it would look like. Regards, -- Rowan Tommins [IMSoP]
  108977
March 11, 2020 13:09 craig@craigfrancis.co.uk (Craig Francis)
On Mon, 9 Mar 2020 at 16:54, Rowan Tommins collins@gmail.com> wrote:

> [...] the way to get a more in-depth discussion going is to draft an RFC
Thanks Rowan, I've created a Wiki account (craigfrancis), and I believe the next step is to ask for RFC karma? And is there is anyone who can help with the technical details? I'd really appreciate it. If it helps, I've got a development budget of £1,000 I could put towards this (I will need an invoice, a simple PDF would do). As to the discussion around the older RFC[1]... The last message I can find was on the 1st September 2015[2]. The RFC focused on SQL injection, where it was noted that "unfiltered input can affect way more than only SQL"[3], and it isn't ideal for "just for one use case"[4] - my proposed `is_literal()` can be used for other issues, such as Cross-Site Scripting[5], Command Line Injection, etc. There was a belief that education was the answer[6] - but having this check would allow developers to identify (and block) mistakes at runtime. Xinchen mentioned how it was complex in PHP5 to implement the Taint extension - but "with PHP7's new zend_string, and string flags, the implementation will become easier"[7]. And while the Taint checking is useful, it does not address the mistakes that can happen with escaping. As to why I'm deviating away from the original RFC... By providing a `is_literal()` function, it allows the developer to determine how they want to use it - where they can skip it for certain tasks[8], and database drivers (or other extensions) could use it in the future to raise a notice/warning/error[9]. It gives a mechanism to ensure inputs are split between the command (a literal), and user supplied values - which is what Yasuo was asking for[10].. Also, by focusing on just literals (as in, only values defined within PHP scripts), we avoid any concerns about escaping (which can go wrong), and we won't need to identify which sources are trusted[11]. For the last 5 years I've been writing my SQL with literals only, and it's worked very well... with just one oddity (which I still consider a literal): [1] https://news-web.php.net/php.internals/87346 [2] https://news-web.php.net/php.internals/87970 [3] https://news-web.php.net/php.internals/87355 [4] https://news-web.php.net/php.internals/87647 [5] https://news-web.php.net/php.internals/87400 https://bugs.php.net/bug.php?id=79359#1583761206 [6] https://news-web.php.net/php.internals/87383 [7] https://news-web.php.net/php.internals/87396 [8] https://news-web.php.net/php.internals/87406 https://news-web.php.net/php.internals/87446 [9] https://news-web.php.net/php.internals/87436 https://news-web.php.net/php.internals/87650 [10] https://news-web.php.net/php.internals/87725 [11] https://news-web.php.net/php.internals/87667 On Mon, 9 Mar 2020 at 16:54, Rowan Tommins collins@gmail.com> wrote:
> On Mon, 9 Mar 2020 at 13:47, Craig Francis <craig@craigfrancis.co.uk> > wrote: > > > Hi, > > > > As I'm not sure how to make any more process on this, I've added added a > > Feature Request: > > > > https://bugs.php.net/bug.php?id=79359 > > > > It shows how this change in PHP could stop SQL injection, and proposes a > > way it could be used against HTML injection as well. > > > > > Hi Craig, > > In my experience, the bug tracker is likely to get you less attention than > this list, rather than more. For this kind of significant change, the way > to get a more in-depth discussion going is to draft an RFC; there are some > instructions and tips on how to go about that at > https://wiki.php.net/rfc/howto and > > https://blogs.oracle.com/opal/the-mysterious-php-rfc-process-and-how-you-can-change-the-web > > The idea of an RFC is to sit down and design exactly how the proposed > feature would work; that helps move the discussion forward, because people > can see exactly how it might look, and means you're offering something to > the community rather than asking it of them. The RFC doesn't have to > include a full implementation, but if you don't know much about the > technical details, you might need help from someone who does to make sure > the proposal is realistic. > > I see you've linked an older RFC in the feature request; it would be worth > digging out the archived discussion from when that was proposed, to see why > it stalled. It may just be that people were distracted by other things, or > there may be issues raised which you can consider in a new proposal. If you > haven't already, you could try contacting the author as well. > > In general, I think it's an interesting idea, but as the saying goes "the > devil is in the detail", so I don't have much to say without a concrete > proposal for what it would look like. > > Regards, > -- > Rowan Tommins > [IMSoP] >