Module CFG

Provides classes for working with a CFG-based program representation.

Overview

Each StmtContainer (that is, function or toplevel) has an intra-procedural CFG associated with it, which is composed of ControlFlowNodes under a successor relation exposed by predicates ControlFlowNode.getASuccessor() and ControlFlowNode.getAPredecessor().

Each CFG has designated entry and exit nodes with types ControlFlowEntryNode and ControlFlowExitNode, respectively, which are the only two subtypes of SyntheticControlFlowNode. All ControlFlowNodes that are not SyntheticControlFlowNodes belong to class ConcreteControlFlowNode.

The predicate ASTNode.getFirstControlFlowNode() relates AST nodes to the first (concrete) CFG node in the sub-graph of the CFG corresponding to the node.

Most statement containers also have a start node, obtained by StmtContainer.getStart(), which is the unique CFG node at which execution of the toplevel or function begins. Unlike the entry node, which is a synthetic construct, the start node corresponds to an AST node: for instance, for toplevels, it is the first CFG node of the first statement, and for functions with parameters it is the CFG node corresponding to the first parameter.

Empty toplevels do not have a start node, since all their CFG nodes are synthetic.

CFG Nodes

Non-synthetic CFG nodes exist for six kinds of AST nodes, representing various aspects of the program’s runtime semantics:

  • Expr: the CFG node represents the evaluation of the expression, including any side effects this may have;
  • Stmt: the CFG node represents the execution of the statement;
  • Property: the CFG node represents the assignment of the property;
  • PropertyPattern: the CFG node represents the matching of the property;
  • MemberDefinition: the CFG node represents the definition of the member method or field;
  • MemberSignature: the CFG node represents the point where the signature is declared, although this has no effect at runtime.

CFG Structure

Expressions

For most expressions, the successor relation visits sub-expressions first, and then the expression itself, representing the order of evaluation at runtime. For example, the CFG for the expression 23 + 19 is

 … → [23] → [19] → [23 + 19] → …
 

In particular, this means that 23 is the first CFG node of the expression 23 + 19.

Similarly, for assignments the left hand side is visited first, then the right hand side, then the assignment itself:

 … → [x] → [y] → [x = y] → …
 

For properties, the name expression is visited first, then the value, then the default value, if any. The same principle applies for getter and setter properties: in this case, the “value” is simply the accessor function, and there is no default value.

There are only a few exceptions, generally for cases where the value of the whole expression is the value of one of its sub-expressions. That sub-expression then comes last in the CFG:

  • Parenthesized expression:
 … → [(x)] → [x] → …
 
  • Conditional expressions:
 … →  [x ? y : z]  → [x] ┬→ [y] → … 
                      └→ [z] → …
  • Short-circuiting operator && (same for ||):
 … → [x && y] → [x] → … 
                ↓
               [y] → …
  • Sequence/comma expressions:
 … → [x, y] → [x] → [y] → …
 

Finally, array expressions and object expressions also precede their sub-expressions in the CFG to model the fact that the new array/object is created before its elements/properties are evaluated:

 … → [{ x: 42 }] → [x] → [42] → [x : 42] → …
 

Statements

For most statements, the successor relation visits the statement first and then its sub-expressions and sub-statements.

For example, the CFG of a block statement first visits the individual statements, then the block statement itself.

Similarly, the CFG for an if statement first visits the statement itself, then the condition. The condition, in turn, has the “then” branch as one of its successors and the “else” branch (if it exists) or the next statement after the “if” (if it does not) as the other:

 … → [if (x) s1 else s2] → [x] ┬→ [s1] → …
                               └→ [s2] → …
 

For loops, the CFG reflects the order in which the loop test and the body are executed.

For instance, the CFG of a while loop starts with the statement itself, followed by the condition. The condition has two successors: the body, and the statement following the loop. The body, in turn, has the condition as its successor. This reflects the fact that while loops first test their condition before executing their body:

 … → [while (x) s] → [x] → …
                      ⇅
                     [s]
 

On the other hand, do-while loops first execute their body before testing their condition:

 … → [do s while (x)] → [s] ⇄ [x] → …
 

The CFG of a for loop starts with the loop itself, followed by the initializer expression (if any), then the test expression (if any). The test expression has two successors: the body, and the statement following the loop. The body, in turn, has the update expression (if any) as its successor, and the update expression has the test expression as its only successor:

 … → [for(i;t;u) s] → [i] → [t] → …
                            ↙ ↖
                         [s] → [u]
 

The CFG of a for-in loop for(x in y) s starts with the loop itself, followed by the iteration domain y. That node has two successors: the iterator x, and the statement following the loop (modeling early exit in case y is empty). After the iterator x comes the loop body s, which again has two successors: the iterator x (modeling the case where there are more elements to iterate over), and the statement following the loop (modeling the case where there are no more elements to iterate):

 … → [for(x in y) s] → [y] →  …
                        ↓     ↑
                       [x] ⇄ [s]
 

For-of loops are the same.

Finally, return and throw statements are different from all other statement types in that for them the statement itself comes after the operand, reflecting the fact that the operand is evaluated before the return or throw is initiated:

 … → [x] → [return x;] → …
 

Unstructured control flow

Unstructured control flow is modeled in the obvious way: break and continue statements have as their successor the next statement that is executed after the jump; throw statements have the nearest enclosing catch clause as their successor, or the exit node of the enclosing container if there is no enclosing catch; return statements have the exit node of the enclosing container as their successor.

In all cases, the control flow may be intercepted by an intervening finally block. For instance, consider the following code snippet:

 try {
   if (x)
     return;
   s
 } finally {
   t
 }
 u
 

Here, the successor of return is not the exit node of the enclosing container, but instead the finally block. The last statement of the finally block (here, t) has two successors: u to model the case where finally was entered from s, and the exit node of the enclosing container to model the case where the return is resumed after the finally block.

Note that finally blocks can lead to imprecise control flow modeling since the finally block resumes the action of all statements it intercepts: in the above example, the CFG not only models the executions returnfinallytexit and sfinallytu, but also allows the path returnfinallytu, which does not correspond to any actual execution.

The CFG also models the fact that certain kinds of expressions (calls, new expressions, property accesses and await expressions) can throw exceptions, but only if there is an enclosing try-catch statement.

Function preambles

The CFG of a function starts with its entry node, followed by a preamble, which is a part of the CFG that models parameter passing and function hoisting. The preamble is followed by the function body, which in turn is followed by the exit node.

For function expressions, the preamble starts with the function name, if any, to reflect the fact that the function object is bound to that name inside the scope of the function. Next, for both function expressions and function declarations, the parameters are executed in sequence to represent parameter passing. If a parameter has a default value, that value is visited before the parameter itself. Finally, the CFG nodes corresponding to the names of all hoisted functions inside the outer function body are visited in lexical order. This reflects the fact that hoisted functions are initialized before the body starts executing, but after parameters have been initialized.

For instance, consider the following function declaration:

 function outer(x, y = 42) {
   s
   function inner() {}
   t
 }
 

Its CFG is

 [entry] → [x] → [42] → [y] → [inner] → [s] → [function inner() {}] → [t] → [exit]
 

Note that the function declaration [function inner() {}] as a whole is part of the CFG of the body of outer, while its function identifier inner is part of the preamble.

Toplevel preambles

Similar to functions, toplevels (that is, modules, scripts or event handlers) also have a preamble. For ECMAScript 2015 modules, all import specifiers are traversed first, in lexical order, reflecting the fact that imports are resolved before execution of the module itself begins; next, for all toplevels, the names of hoisted functions are traversed in lexical order (as for functions). Afterwards, the CFG continues with the body of the toplevel, and ends with the exit node.

As an example, consider the following module:

s
import x as y from 'foo';
function f() {}
t

Its CFG is

 [entry] → [x as y] → [f] → [s] → [import x as y from 'foo';] → [function f() {}] → [t] → [exit]
 

Note that the import statement as a whole is part of the CFG of the body, while its single import specifier x as y forms part of the preamble.

Import path

semmle.javascript.CFG

Imports

javascript

Provides classes for working with JavaScript programs, as well as JSON, YAML and HTML.

Classes

ConcreteControlFlowNode

A CFG node corresponding to a program element, that is, a CFG node that is not a SyntheticControlFlowNode.

ConditionGuardNode

A guard node recording that some condition is known to be truthy or falsy at this point in the program.

ControlFlowEntryNode

A synthetic CFG node marking the entry point of a function or toplevel script.

ControlFlowExitNode

A synthetic CFG node marking the exit of a function or toplevel script.

ControlFlowNode

A node in the control flow graph, which is an expression, a statement, or a synthetic node.

GuardControlFlowNode

A synthetic CFG node recording that some condition is known to hold at this point in the program.

SyntheticControlFlowNode

A synthetic CFG node that does not correspond to a statement or expression; examples include guard nodes and entry/exit nodes.