# Analyzing control flow

CodeQL for C/C++

## Setup

Note

For the examples in this presentation, we will be analyzing ChakraCore.

You can query the project in the query console on LGTM.com.

You can download the database as a zip file by clicking the link on the slide above. To use the database in CodeQL for Visual Studio Code:

1. Unzip the file
2. Add the unzipped database to Visual Studio Code
3. Upgrade the database if necessary

For further information, see Using the extension in the CodeQL for Visual Studio Code help.

Note that results generated in the query console are likely to differ to those generated in CodeQL for Visual Studio Code as LGTM.com analyzes the most recent revisions of each project that has been added–the CodeQL database available to download above is based on an historical version of the codebase.

## Agenda

• Control flow graphs
• Exercise: use after free
• Recursion over the control flow graph
• Basic blocks
• Guard conditions

## Control flow graphs

We frequently want to ask questions about the possible order of execution for a program.

Example:

if (x) {
return 1;
} else {
return 2;
}


Possible execution order is usually represented by a control flow graph:

Note

The control flow graph is a static over-approximation of possible control flow at runtime. Its nodes are program elements such as expressions and statements. If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node. Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.

## Modeling control flow

The control flow is modeled with a CodeQL class, ControlFlowNode. Examples of control flow nodes include statements and expressions.

• ControlFlowNode provides API for traversing the control flow graph:
• ControlFlowNode ControlFlowNode.getASuccessor()
• ControlFlowNode ControlFlowNode.getAPredecessor()
• ControlFlowNode ControlFlowNode.getATrueSuccessor()
• ControlFlowNode ControlFlowNode.getAFalseSuccessor()

The control-flow graph is intra-procedural–in other words, only models paths within a function. To find the associated function, use

• Function ControlFlowNode.getControlFlowScope()

Note

The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes are directly control flow graph nodes.

The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions. This is because a control flow node may have any number of predecessors and successors (including zero or more than one).

## Example: malloc/free pairs

Find calls to free that are reachable from an allocation on the same variable:

import cpp

from FunctionCall alloc, FunctionCall free, LocalScopeVariable v
where allocationCall(alloc)
and alloc = v.getAnAssignedValue()
and freeCall(free, v.getAnAccess())
and alloc.getASuccessor+() = free
select alloc, free


Note

Predicates allocationCall and freeCall are defined in the standard library and model a number of standard alloc/free-like functions.

## Exercise: use after free

Based on this query, write a query that finds accesses to the variable that occur after the free.

• What do you find? What problems occur with this approach to detecting use-after-free vulnerabilities?

import cpp

from FunctionCall free, LocalScopeVariable v, VariableAccess u
where freeCall(free, v.getAnAccess())
and u = v.getAnAccess()
and u.isRValue()
and free.getASuccessor+() = u
select free, u


## Utilizing recursion

The main problem we observed in the previous exercise was that the successor’s relation is unaware of changes to the variable that would invalidate our results.

We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.

## Utilizing recursion

ControlFlowNode reachesWithoutReassignment(FunctionCall free, LocalScopeVariable v)
{
freeCall(free, v.getAnAccess()) and
(
// base case
result = free
or
// recursive case
exists(ControlFlowNode mid |
mid = reachesWithoutReassignment(free, v) and
result = mid.getASuccessor() and
// stop tracking when the value may change
not result = v.getAnAssignedValue() and
)
)
}


## Exercise

Find local variables that are written to, and then never accessed again.

Hint: Use LocalVariable.getAnAssignment().

import cpp

from LocalVariable lv, ControlFlowNode def
where
def = lv.getAnAssignment() and
not exists(VariableAccess use |
use = lv.getAnAccess() and
use = def.getASuccessor+()
)
select lv, def


## Basic blocks

BasicBlock represents basic blocks, that is, straight-line sequences of control flow nodes without branching.

• ControlFlowNode BasicBlock.getNode(int)
• BasicBlock BasicBlock.getASuccessor()
• BasicBlock BasicBlock.getAPredecessor()
• BasicBlock BasicBlock.getATrueSuccessor()
• BasicBlock BasicBlock.getAFalseSuccessor()

Often, queries can be made more efficient by treating basic blocks as a unit instead of reasoning about individual control flow nodes.

## Exercise: unreachable blocks

Write a query to find unreachable basic blocks.

Hint: First define a recursive predicate to identify reachable blocks. Class EntryBasicBlock may be useful.

import cpp

predicate isReachable(BasicBlock bb) {
bb instanceof EntryBasicBlock or
isReachable(bb.getAPredecessor())
}

from BasicBlock bb
where not isReachable(bb)
select bb


Note

This query has a good number of false positives on Chakra, many of them to do with templating and macros.

## Guard conditions

A GuardCondition is a Boolean condition that controls one or more basic blocks in the sense that it is known to be true/false at the entry of those blocks.

• GuardCondition.controls(BasicBlock bb, boolean outcome): the entry of bb can only be reached if the guard evaluates to outcome
• GuardCondition.comparesLt, GuardCondition.ensuresLt, GuardCondition.comparesEq: auxiliary predicates to identify conditions that guarantee that one expression is less than/equal to another

## Call graph customizations

The default implementation of call target resolution does not handle function pointers, because they are difficult to deal with in general.

We can, however, add support for particular patterns of use by contributing a new override of Call.getTarget.

## Exercise: unresolvable calls

Write a query that finds all calls for which no call target can be determined, and run it on libjpeg-turbo.

Examine the results. What do you notice?

import cpp

from Call c
where not exists(c.getTarget())
select c

• Many results are calls through struct fields emulating virtual dispatch.

## Exercise: resolving calls through variables

Write a query that resolves the call at cjpeg.c:640.

Hint: Use classes ExprCall, PointerDereferenceExpr, and Access.

import cpp

from ExprCall c, PointerDereferenceExpr deref, VariableAccess va,
Access fnacc
where c.getLocation().getFile().getBaseName() = "cjpeg.c" and
c.getLocation().getStartLine() = 640 and
deref = c.getExpr() and
va = deref.getOperand() and
fnacc = va.getTarget().getAnAssignedValue()
select c, fnacc.getTarget()


## Exercise: customizing the call graph

Create a subclass of ExprCall that uses your query to implement getTarget.

class CallThroughVariable extends ExprCall {
Variable v;
CallThroughVariable() {
exists(PointerDereferenceExpr callee | callee = getExpr() |
callee.getOperand() = v.getAnAccess()
)
}
override Function getTarget() {
result = super.getTarget() or
exists(Access init | init = v.getAnAssignedValue() |
result = init.getTarget()
)
}
}


## Control-flow graph customizations

The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.

We can add support for new non-returning functions by overriding ControlFlowNode.getASuccessor().

## Exercise: calls to error_exit

Write a query that finds all calls to a field called error_exit.

Hint: Reuse (parts of) the CallThroughVariable class from before.

class CallThroughVariable extends ExprCall { ... }

class ErrorExitCall extends CallThroughVariable {
override Field v;

ErrorExitCall() { v.getName() = "error_exit" }
}

from ErrorExitCall eec
select eec


## Exercise: customizing the control-flow graph

Override ControlFlowNode to mark calls to error_exit as non-returning.

Hint: ExprCall is an indirect subclass of ControlFlowNode.

class CallThroughVariable extends ExprCall { ... }

class ErrorExitCall extends CallThroughVariable {
override Field v;

ErrorExitCall() { v.getName() = "error_exit" }

override ControlFlowNode getASuccessor() { none() }
}


## CustomOptions class

The Options library defines a CustomOptions class with various member predicates that can be overridden to customize aspects of the analysis.

In particular, it has an exprExits predicate that can be overridden to more easily perform the customization on the previous slide:

import Options

class MyOptions extends CustomOptions {
override predicate exprExits(Expr e) {
super.exprExits(e) or ...
}
}