Literal / Taint checking

  106625
August 15, 2019 18:02 craig@craigfrancis.co.uk (Craig Francis)
Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment [1],
and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information to
generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig



    public function exec($sql, $parameters = []) {
        if (!*is_literal*($sql)) {
            throw new Exception('SQL must be a literal.');
        }
        $statement = $this->pdo->prepare($sql);
        $statement->execute($parameters);
        return $statement->fetchAll();
    }

....

    $sql = 'SELECT * FROM table WHERE id = ?';

    $result = $db->exec($sql, [$id]);



[1] https://github.com/tc39/proposal-array-is-template-object
https://github.com/mikewest/tc39-proposal-literals
  106626
August 15, 2019 18:05 kontakt@beberlei.de (Benjamin Eberlei)
On Thu, Aug 15, 2019 at 8:03 PM Craig Francis <craig@craigfrancis.co.uk>
wrote:

> Hi, > > How likely would it be for PHP to do Literal tracking of variables? > > This is something that's being discussed JavaScript TC39 at the moment [1], > and I think it would be even more useful in PHP. > > We already know we should use parameterized/prepared SQL, but there is no > way to prove the SQL string hasn't been tainted by external data in large > projects, or even in an ORM. > > This could also work for templating systems (blocking HTML injection) and > commands. > > Internally it would need to introduce a flag on every variable, and a > single function to check if a given variable has only been created by > Literal(s). > > Unlike the taint extension, there should be no way to override this (e.g. > no taint/untaint functions); and if it was part of the core language, it > will continue to work after every update. > > One day certain functions (e.g. mysqli_query) might use this information to > generate a error/warning/notice; but for now, having it available for > checking would be more than enough. > > Craig > > It is an interesting topic indeed! I remember that laruence wrote an
extension for this a while ago, I have never tried it myself though. You can find it here: https://github.com/laruence/taint
> > public function exec($sql, $parameters = []) { > if (!*is_literal*($sql)) { > throw new Exception('SQL must be a literal.'); > } > $statement = $this->pdo->prepare($sql); > $statement->execute($parameters); > return $statement->fetchAll(); > } > > ... > > $sql = 'SELECT * FROM table WHERE id = ?'; > > $result = $db->exec($sql, [$id]); > > > > [1] https://github.com/tc39/proposal-array-is-template-object > https://github.com/mikewest/tc39-proposal-literals >
  106627
August 15, 2019 18:18 craig@craigfrancis.co.uk (Craig Francis)
On Thu, 15 Aug 2019 at 19:05, Benjamin Eberlei <kontakt@beberlei.de> wrote:

> On Thu, Aug 15, 2019 at 8:03 PM Craig Francis <craig@craigfrancis.co.uk> > wrote: > >> Hi, >> >> How likely would it be for PHP to do Literal tracking of variables? >> >> This is something that's being discussed JavaScript TC39 at the moment >> [1], >> and I think it would be even more useful in PHP. >> >> We already know we should use parameterized/prepared SQL, but there is no >> way to prove the SQL string hasn't been tainted by external data in large >> projects, or even in an ORM. >> >> This could also work for templating systems (blocking HTML injection) and >> commands. >> >> Internally it would need to introduce a flag on every variable, and a >> single function to check if a given variable has only been created by >> Literal(s). >> >> Unlike the taint extension, there should be no way to override this (e.g. >> no taint/untaint functions); and if it was part of the core language, it >> will continue to work after every update. >> >> One day certain functions (e.g. mysqli_query) might use this information >> to >> generate a error/warning/notice; but for now, having it available for >> checking would be more than enough. >> >> Craig >> >> > It is an interesting topic indeed! I remember that laruence wrote an > extension for this a while ago, I have never tried it myself though. You > can find it here: https://github.com/laruence/taint > >
Thanks, I've been using that extension for a few years - laruence has done a fantastic job with it. But it can be a bit buggy; and due to it being a taint based system, with the ability to taint/untaint, it introduces some problems. https://github.com/laruence/taint/issues/54
  106628
August 15, 2019 18:43 matthewmatthew@gmail.com (Matthew Brown)
There are already some userland taint-checking solutions for PHP e.g. the
Phan taint-check plugin from MediaWiki:
https://www.mediawiki.org/wiki/Phan-taint-check-plugin

I'm working on my own userland solution, too (based on Facebook's
approach). Demo is here: https://psalm.dev/r/ebb9522fea
  106629
August 15, 2019 19:20 craig@craigfrancis.co.uk (Craig Francis)
On Thu, 15 Aug 2019 at 7:43 pm, Matthew Brown <matthewmatthew@gmail.com>
wrote:

> There are already some userland taint-checking solutions for PHP e.g. the > Phan taint-check plugin from MediaWiki: > https://www.mediawiki.org/wiki/Phan-taint-check-plugin > > I'm working on my own userland solution, too (based on Facebook's > approach). Demo is here: https://psalm.dev/r/ebb9522fea >
Hi Matthew, If anything, this proposal would help user-land solutions (it gives them more information while the code is in running). At the moment, they all need to make their own parsers, or extensions, and they all have blind spots. I’d also like us to move slowly away from taint checkers that allow for tainted strings to be marked as un-tainted, as these allow mistakes to be made. Please excuse any typos, on my phone, but how about: $sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]); It’s been escaped, so surely it’s not tainted any more? Unfortunately, because it’s not surrounded with quote marks, it’s not safe. It also relies on there not being any parsing issues within the database engine itself (parameterised queries help here, as those values aren’t part of the SQL parsing process). Craig
  106630
August 15, 2019 20:36 matthewmatthew@gmail.com (Matthew Brown)
> If anything, this proposal would help user-land solutions (it gives them > more information while the code is in running). >
Well, it might help runtime-based user-land solutions, but not static analysis-based solutions. In our bug disclosure program at Vimeo we've had no SQL injection issues reported, but a number of XSS issues (echoing attacker-controlled data), and those issues cannot so easily be prevented by this technique as there's generally little reason to echo literal values. I can also think of a number of user-constructed SQL queries (e.g. WHERE .... IN) that require non-literal values to work (if this were to come to pass there might be a set of special `unsafe` methods).
  106631
August 15, 2019 22:21 craig@craigfrancis.co.uk (Craig Francis)
On Thu, 15 Aug 2019 at 21:37, Matthew Brown <matthewmatthew@gmail.com>
wrote:

> > If anything, this proposal would help user-land solutions (it gives them >> more information while the code is in running). >> > > Well, it might help runtime-based user-land solutions, but not static > analysis-based solutions. >
I mostly see us needing to use both solutions - static analysis does a deep dive (ideally helped with any information the PHP engine can provide, even if it's just parsing), and this runtime check running constantly - only because static analysis by itself can skip bits, e.g. https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/#general-limitations
> In our bug disclosure program at Vimeo we've had no SQL injection issues > reported, but a number of XSS issues (echoing attacker-controlled data), > and those issues cannot so easily be prevented by this technique as there's > generally little reason to echo literal values. >
This proposal can ensure SQL injection is impossible, rather than our current processes, which has some gaps (I'm glad to see you haven't had any reported issues, but I believe you're in the minority). This can also be expanded for command line injection issues (kind of moving away from escapeshellarg). I've not spent enough time on the templating side of things yet (I've been working more on the browser side for this, e.g. CSP and application/xhtml+xml). But I'm hopeful this proposal can still be useful, in a similar way to how the JavaScript changes will help templating (ref Trusted Types). Even if we only use this for guarding some inputs - e.g. a templating system being sure which bits are safe HTML literals, loaded into a DomDocument, and unsafe user data being applied with setAttribute() after some sanity checks. But there are some annoying edge cases, which means that I don't think this can be perfect: $user_homepage = 'javascript:alert(document.cookie)'; Example
> I can also think of a number of user-constructed SQL queries (e.g. WHERE > ... IN) that require non-literal values to work (if this were to come to > pass there might be a set of special `unsafe` methods). >
This is what I've been using for `WHERE ... IN` to create a literal for the SQL string: $parameters = []; $in_sql = $db->*parameter_in*($parameters, 'i', $ids); // I'm using `maxdb_stmt::bind_param` which needs a type. $sql = 'SELECT * FROM table WHERE id IN (' . $in_sql . ')'; $db->fetch_all($sql, $parameters) ... public function *parameter_in*(&$parameters, $type, $values) { $count = count($values); if ($count == 0) { throw new Exception('At least 1 value is required for an IN list'); } if ($type == 'i') { $values = array_map('intval', $values); } else if ($type == 's') { $values = array_map('strval', $values); } else { throw new Exception('Unknown parameter type for parameter_in(), should be "i" or "s"'); } foreach ($values as $value) { $parameters[] = [$type, $value]; } return substr(str_repeat('?,', $count), 0, -1); // Returns a literal string. }
  108537
February 13, 2020 12:31 craig@craigfrancis.co.uk (Craig Francis)
Hi,

While there was a brief discussion about an *is_literal*() method in
August, I'm wondering where I can go next?

Just as a reminder, the main objection seemed to be that Taint checking is
the current solution. For example, those created by Laruence[1],
MediaWiki[2], and Matthew[3]. But this can never be as good at the PHP
engine explicitly stating a variable *only* contains literal values, where
it can be checked at runtime, and be a key part of the development process.

And while I'm using SQL injection in my examples (because it's easy to show
how it can enforce the use of parameterised queries); it would also be
useful to protect against command line injection, and HTML/XSS as well
(e.g. a templating system can only accept HTML as literal strings, and
the user supplied values be provided separately).

I'm assuming this would change the zval structure (to include an
"is_literal" flag?), and it would be more of a PHP 8.0 change, rather than
8.1.

Craig


---

Broken taint check, due to missing quote marks:

$sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]);

---

Support for "WHERE ... IN", ideally done via an abstraction, so you don't
need to write this every time:

$sql = '... WHERE id IN (' . substr(str_repeat('?,', count($ids)), 0, -1) .
')';

---

[1] https://github.com/laruence/taint
[2] https://www.mediawiki.org/wiki/Phan-taint-check-plugin
[3] https://psalm.dev/r/ebb9522fea

---




On Thu, 15 Aug 2019 at 19:02, Craig Francis <craig@craigfrancis.co.uk>
wrote:

> Hi, > > How likely would it be for PHP to do Literal tracking of variables? > > This is something that's being discussed JavaScript TC39 at the moment > [1], and I think it would be even more useful in PHP. > > We already know we should use parameterized/prepared SQL, but there is no > way to prove the SQL string hasn't been tainted by external data in large > projects, or even in an ORM. > > This could also work for templating systems (blocking HTML injection) and > commands. > > Internally it would need to introduce a flag on every variable, and a > single function to check if a given variable has only been created by > Literal(s). > > Unlike the taint extension, there should be no way to override this (e.g. > no taint/untaint functions); and if it was part of the core language, it > will continue to work after every update. > > One day certain functions (e.g. mysqli_query) might use this information > to generate a error/warning/notice; but for now, having it available for > checking would be more than enough. > > Craig > > > > public function exec($sql, $parameters = []) { > if (!*is_literal*($sql)) { > throw new Exception('SQL must be a literal.'); > } > $statement = $this->pdo->prepare($sql); > $statement->execute($parameters); > return $statement->fetchAll(); > } > > ... > > $sql = 'SELECT * FROM table WHERE id = ?'; > > $result = $db->exec($sql, [$id]); > > > > [1] https://github.com/tc39/proposal-array-is-template-object > https://github.com/mikewest/tc39-proposal-literals >