LGTM Enterprise 1.25

Exploring data flow paths

Data flow analysis and taint analysis are core techniques, often used in variant analysis, for finding security vulnerabilities. In both cases, data is tracked from a source, where it enters an application, to a sink, where it is used in a potentially harmful way if no sanitization has been applied. The longer the path from source to sink, the harder it is for a developer to trace how the data gets there. As a result, it can be hard to see how a particular result (source/sink pair) was produced, and to verify and fix the problem.
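To make the source-to-sink idea concrete, here is a minimal Python sketch of tracking tainted data through a small flow graph. It is an illustration only and is not how LGTM or its queries are implemented. The graph, the node names (http_param, escape_html, db.execute and so on) and the find_paths helper are all hypothetical.

```python
def find_paths(graph, source, sink, sanitizers=frozenset()):
    """Return every loop-free path from source to sink.

    Paths that would pass through a sanitizer node are dropped,
    because the data is no longer considered tainted after that step.
    """
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for successor in graph.get(node, []):
            # Skip sanitized steps and avoid revisiting a node (no cycles).
            if successor in sanitizers or successor in path:
                continue
            stack.append((successor, path + [successor]))
    return paths


# Hypothetical flow graph: each key passes its data on to the listed nodes.
FLOW = {
    "http_param": ["user_input"],
    "user_input": ["query_string", "escape_html"],
    "escape_html": ["page_body"],
    "query_string": ["db.execute"],
    "page_body": ["response.write"],
}
SANITIZERS = {"escape_html"}

for sink in ("db.execute", "response.write"):
    found = find_paths(FLOW, "http_param", sink, SANITIZERS)
    print(sink, "is reached by", len(found), "unsanitized path(s)")
    for path in found:
        print("  " + " -> ".join(path))
```

In this toy graph the data reaches db.execute without passing through a sanitizer, so that source/sink pair would be worth reporting, while the only route to response.write goes through escape_html and is dropped.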

Among the built-in queries that come with LGTM, there are queries that generate alerts highlighting data flow issues. Any query that has @kind path-problem in its metadata is a path query. You can explore paths between the sources and sinks by examining the reported alert in LGTM.

You can also write your own path queries and add them to your repository. For more details, see Writing custom queries to include in LGTM analysis. The alerts for these queries will be displayed on LGTM alongside the results of the built-in queries, and you’ll be able to explore the paths in exactly the same way.

To quickly find path queries on LGTM, search for terms that may be part of a path query name (for example: cross scripting, injection, unsafe, or untrusted). Then click a query in the returned results and open the Alerts page. For more information, see Viewing all the alerts found by a query.

When you view alerts generated by path queries on LGTM, you'll see a Show paths button (shown below).

Show paths button

If you want to write and run queries locally, as well as see results directly in your IDE, you can use our CodeQL for Visual Studio Code extension. You can also explore paths there.

If you click the Show paths button, you’re presented with a popup showing the available paths where the data goes from the source to the sink.

LGTM only displays a subset of all the paths generated by a path query. By "available paths", we mean the paths in that subset. Where possible, the retained paths are those that are substantially different from one another.
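The text above does not say how LGTM decides which paths are "substantially different". Purely as an illustration of the idea, the following Python sketch keeps a path only when it shares few steps with the paths already kept. The overlap measure, the 0.5 threshold, the limit of three paths, and the function names are all assumptions made for this example and do not describe LGTM's actual selection.

```python
def overlap(path_a, path_b):
    """Share of steps the two paths have in common (Jaccard similarity)."""
    steps_a, steps_b = set(path_a), set(path_b)
    return len(steps_a & steps_b) / len(steps_a | steps_b)


def pick_distinct(paths, limit=3, max_overlap=0.5):
    """Greedily keep up to `limit` paths that differ enough from each other.

    Shorter paths are considered first; a path is kept only if its overlap
    with every path kept so far is at most `max_overlap`.
    """
    kept = []
    for path in sorted(paths, key=len):
        if len(kept) == limit:
            break
        if all(overlap(path, other) <= max_overlap for other in kept):
            kept.append(path)
    return kept


# Hypothetical paths between the same source and sink.
candidates = [
    ["src", "a", "b", "sink"],
    ["src", "a", "c", "sink"],       # differs from the first by a single step
    ["src", "x", "y", "z", "sink"],  # takes a different route entirely
]

for path in pick_distinct(candidates):
    print(" -> ".join(path))
```

With these inputs the second path is dropped because it shares three of its four steps with the first, while the third path is kept because it shares only the source and the sink.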

The two screenshots below illustrate two different cases:

  • Example 1, with one path, starting and ending in different files.

    Exploring paths in LGTM

  • Example 2, with several paths that are local to a single file.

    Exploring paths in LGTM

The following information is provided:

Number of paths available—this is located above the query name.

Path drop-down list—contains the available paths for the flow of data. If there is only one path (example 1), it's selected by default. You can select a different path on the drop-down list if there is more than one path available (example 2).

Steps for the currently selected path—using these steps, you can follow the flow of data all the way from the source (first step) to the sink (last step).

At each step, the location of the data is highlighted in light orange. The alert is usually on the source or the sink.