Python Taint Tracking Library
The taint tracking library is described in three parts.
- Specification of kinds, sources, sinks and flows.
- The high-level query API.
- The implementation.
There are four parts to the specification of a taint tracking query: kinds, sources, sinks and flows.
The Python taint tracking library supports arbitrary kinds of taint. This is useful when you want to track something that is related to “taint” but is not in itself dangerous. For example, we might want to track the flow of request objects. Request objects are not in themselves tainted, but they do contain tainted data: the length or timestamp of a request may not pose a risk, but the GET or POST string probably does. So, we would want to track request objects distinctly from the request data in the GET or POST field.
Kinds can also specify additional flow steps, but we recommend using the
DataFlowExtension module, which is less likely to cause issues with unwanted recursion.
Sources of taint can be added by importing a predefined sub-type of
TaintSource, or by defining new ones.
Sinks (or vulnerabilities)
Sinks can be added by importing a predefined sub-type of
TaintSink, or by defining new ones.
Additional flow can be added by importing predefined sub-types of
DataFlowExtension::DataFlowVariable, or by defining new ones.
The high-level query API
TaintedNode fully describes the taint flow graph.
The full graph can be expressed as:
    from TaintedNode n, TaintedNode s
    where s = n.getASuccessor()
    select n, s
The source -> sink relation can be expressed either using
    from TaintedNode src, TaintedNode sink
    where src.isSource() and sink.isSink() and src.getASuccessor*() = sink
    select src, sink
or, using the specification API:
    from TaintSource src, TaintSink sink
    where src.flowsToSink(sink)
    select src, sink
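Both formulations amount to reachability over the successor relation. As a rough illustration outside QL, the sketch below models a taint graph as a plain Python adjacency map and computes the source-to-sink pairs with a breadth-first search. The node names and the `reachable` and `flows` helpers are hypothetical and are not part of the library.

```python
from collections import deque

# Hypothetical taint graph: node -> set of successor nodes.
SUCCESSORS = {
    "request.GET": {"query"},
    "query": {"sql"},
    "sql": {"cursor.execute"},
    "config": {"log.write"},
}
SOURCES = {"request.GET"}
SINKS = {"cursor.execute", "log.write"}


def reachable(start, successors):
    """Return every node reachable from start in zero or more steps."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in successors.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def flows(sources, sinks, successors):
    """Return sorted (src, sink) pairs where sink is reachable from src."""
    return sorted(
        (src, sink)
        for src in sources
        for sink in reachable(src, successors) & sinks
    )


print(flows(SOURCES, SINKS, SUCCESSORS))
```

Here `log.write` is a sink but is only reachable from `config`, which is not a source, so it does not appear in the result.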
The data-flow graph used by the taint-tracking library is the one created by the points-to analysis,
and consists of the base data-flow graph produced by
enhanced with precise variable flows, call graph and type information.
This graph is then enhanced with additional flows as specified above.
Since the call graph and points-to information are context sensitive, the taint graph must also be context sensitive.
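To see why context sensitivity matters, consider a hypothetical identity function `f` that is called once with a tainted argument and once with a clean one. The sketch below is a toy model, not the library's implementation, and all node and context names are assumptions. It compares reachability in a graph that shares one copy of `f` across both call sites with a graph whose nodes are (node, context) pairs.

```python
def reachable(start, edges):
    """Return the nodes reachable from start over a set of (from, to) edges."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


# Context-insensitive: both call sites share a single copy of f's body.
INSENSITIVE = {
    ("arg_a", "f.x"),
    ("arg_b", "f.x"),
    ("f.x", "f.return"),
    ("f.return", "result_a"),
    ("f.return", "result_b"),
}

# Context-sensitive: nodes are (node, context) pairs, one copy of f per call site.
SENSITIVE = {
    (("arg_a", "main"), ("f.x", "call_a")),
    (("f.x", "call_a"), ("f.return", "call_a")),
    (("f.return", "call_a"), ("result_a", "main")),
    (("arg_b", "main"), ("f.x", "call_b")),
    (("f.x", "call_b"), ("f.return", "call_b")),
    (("f.return", "call_b"), ("result_b", "main")),
}

# Only arg_a is tainted in this toy example.
insensitive = reachable("arg_a", INSENSITIVE)
sensitive = reachable(("arg_a", "main"), SENSITIVE)

print("result_b" in insensitive)          # spurious flow through the shared body
print(("result_b", "main") in sensitive)  # no flow once contexts are kept apart
```

Without contexts, taint entering `f` from one call site leaks out at every call site. Pairing each node with its call context keeps the two calls separate.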
The taint graph is a directed graph where each node consists of a
(CFG node, context, taint) triple, although it could be thought of more naturally
as a number of distinct graphs, one for each input taint-kind consisting of data flow nodes,
(CFG node, context) pairs, labelled with their
The TrackedValue used in the implementation is not the taint kind specified by the user,
but describes both the kind of taint and how that taint relates to any object referred to by a data-flow graph node or edge.
Currently, only two types of
taint are supported: simple taint, where the object is actually tainted;
and attribute taint where a named attribute of the referred object is tainted.
Support for tainted members (both specific members of tuples and the like, and generic members for mutable collections) is likely to be added in the near future, and other forms are possible. The types of taint are hard-wired, with no user-visible extension mechanism at the moment.
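The distinction between simple taint and attribute taint can be pictured with a small Python model. This is only a sketch under stated assumptions: the `Tracked` class and its methods are hypothetical names for illustration, not the library's `TrackedValue`, and the rule that reading an unrelated attribute yields no taint is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tracked:
    """Hypothetical tracked value: a kind plus how it attaches to an object."""

    kind: str
    attribute: Optional[str] = None  # None means the object itself is tainted

    def store_in_attribute(self, name: str) -> Optional["Tracked"]:
        """Taint of `obj` after `obj.name = value`, where value carries self."""
        if self.attribute is None:
            # A simply tainted value stored in obj.name gives obj attribute taint.
            return Tracked(self.kind, name)
        return None  # nested attribute taint is not modelled in this sketch

    def load_attribute(self, name: str) -> Optional["Tracked"]:
        """Taint of `obj.name`, where obj carries self."""
        if self.attribute == name:
            # Reading the tainted attribute yields a simply tainted value.
            return Tracked(self.kind)
        return None  # assumption: other attributes are treated as untainted


simple = Tracked("user input")
holder = simple.store_in_attribute("payload")

print(holder)
print(holder.load_attribute("payload") == simple)
print(holder.load_attribute("length"))
```

In this model the holder object is never simply tainted itself; only reading its `payload` attribute recovers the simple taint.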
Call context for use in taint-tracking. Using call contexts prevents “cross talk” between different calls to the same function. For example, if a function f is defined as
Taint kinds representing collections of other taint kinds. We use
A taint kind representing a mapping of objects to kinds. Typically a dict, but can include other mappings.
A type of sanitizer of untrusted data. Examples include sanitizers for HTTP responses, for DB access or for shell commands. Usually a sanitizer can only sanitize data for one particular use. For example, a sanitizer for DB commands would not be safe to use for HTTP responses.
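The point that a sanitizer is only valid for one use can be seen with two functions from the Python standard library, `html.escape` and `shlex.quote`. The sketch below is an illustration of the idea only and says nothing about how the library models sanitizers; the input string is made up.

```python
import html
import shlex

# Made-up untrusted input that is dangerous in both HTML and shell contexts.
untrusted = "<b>x</b>; rm -rf /"

# Escaping for HTML neutralizes the markup but leaves the shell metacharacters.
html_safe = html.escape(untrusted)

# Quoting for the shell wraps the string in single quotes but keeps the raw markup.
shell_safe = shlex.quote(untrusted)

print(html_safe)
print(shell_safe)
```

The HTML-escaped string still contains `; rm -rf /`, and the shell-quoted string still contains `<b>`, so neither result is safe to use at the other kind of sink.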
A taint kind representing a flat collection of kinds. Typically a sequence, but can include sets.
DEPRECATED – Use DataFlowExtension instead. An extension to taint-flow, for adding library- or framework-specific flows. Examples include flow from a request to the untrusted part of that request, or from a socket to data from that socket.
A ‘kind’ of taint. This may be almost anything, but it is typically something like a “user-defined string”. Examples include data from an HTTP request object, data from an SMS or other mobile data source, or, for a super secure system, environment variables or the local file system.
A node that is vulnerable to one or more types of taint. These nodes provide the sinks when computing the taint flow graph. An example would be an argument to a write to an HTTP response object; such an argument would be vulnerable to unsanitized user input (XSS).
A source of taintedness. Users of the taint tracking library should override this class to provide their own sources.
Warning: Advanced feature. Users are strongly recommended to use
A tainted data flow graph node. This is a triple of
Extension for data-flow, to help express data-flow paths that are library or framework specific and cannot be inferred by the general data-flow machinery.
This module contains the implementation of taint-flow. It is recommended that users use the