Analyzing databases with the CodeQL CLI

To analyze a codebase, you run queries against a CodeQL database extracted from the code.

CodeQL analyses produce interpreted results that can be displayed as alerts or paths in source code. For information about writing queries to run with database analyze, see “Using custom queries with the CodeQL CLI.”

Other query-running commands

Queries run with database analyze have strict metadata requirements. You can also execute queries using the following plumbing-level subcommands:

  • database run-queries, which outputs non-interpreted results in an intermediate binary format called BQRS.
  • query run, which outputs BQRS files or prints results tables directly to the command line. Viewing results directly in the command line may be useful for iterative query development using the CLI.

Queries run with these commands don’t have the same metadata requirements. However, to save human-readable data you have to process each BQRS results file using the bqrs decode plumbing subcommand. Therefore, for most use cases it’s easiest to use database analyze to directly generate interpreted results.
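
The plumbing workflow described above can be sketched as follows. This is a minimal sketch rather than a definitive recipe: the helper name run_and_decode and all paths are hypothetical placeholders, and it assumes that codeql is on your PATH.

```shell
# Hypothetical helper: run one query with the plumbing commands, then
# decode the BQRS output to CSV. All paths passed in are placeholders.
run_and_decode() {
    database=$1   # path to an existing CodeQL database
    query=$2      # path to a single .ql file
    out_base=$3   # output file name without an extension

    # Run the query and save the raw, non-interpreted results as BQRS.
    codeql query run --database="$database" --output="$out_base.bqrs" "$query" || return 1

    # Convert the binary BQRS file into human-readable CSV.
    codeql bqrs decode --format=csv --output="$out_base.csv" "$out_base.bqrs"
}

# Example call (placeholder paths):
# run_and_decode my-js-db my-queries/MyQuery.ql my-results
```

Because this takes two commands and leaves the results non-interpreted, database analyze remains the simpler choice for most use cases.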

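Before starting an analysis you must:

  • Set up the CodeQL CLI so that it can find the queries and libraries included in the CodeQL repository.
  • Create a CodeQL database for the source code you want to analyze.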

Running codeql database analyze

When you run database analyze, it does two things:

  1. Executes one or more query files, by running them over a CodeQL database.
  2. Interprets the results, based on certain query metadata, so that alerts can be displayed in the correct location in the source code.

You can analyze a database by running the following command:

codeql database analyze <database> <queries> --format=<format> --output=<output>

You must specify:

  • <database>: the path to the CodeQL database you want to analyze.
  • <queries>: the queries to run over your database. You can list one or more individual query files, specify a directory that will be searched recursively for query files, or name a query suite that defines a particular set of queries. For more information, see the examples below.
  • --format: the format of the results file generated during analysis. A number of different formats are supported, including CSV, SARIF, and graph formats. To find out which other results formats are supported, see the database analyze reference.
  • --output: the output path of the results file generated during analysis.

You can also specify:

  • --threads: the number of threads to use when running queries. The default is 1. Specify more threads to speed up query execution, or specify 0 to use one thread per logical processor.
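
As an illustration of how the required and optional arguments fit together, here is a small sketch. The wrapper name analyze_db is hypothetical, and the values you pass to it are your own placeholders. It assumes that codeql is on your PATH.

```shell
# Hypothetical wrapper around the documented command line.
analyze_db() {
    database=$1   # <database>: path to the CodeQL database
    queries=$2    # <queries>: a query file, directory, or query suite
    format=$3     # value for --format, for example csv or sarif-latest
    output=$4     # value for --output

    # --threads=0 uses one thread per logical processor (the default is 1).
    codeql database analyze "$database" "$queries" \
        --format="$format" --output="$output" --threads=0
}

# Example call (placeholder values):
# analyze_db my-cpp-db cpp-lgtm.qls sarif-latest cpp-analysis/cpp-results.sarif
```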

Upgrading databases

If the CodeQL queries you want to use are newer than the extractor used to create the database, then you may see a message telling you that your database needs to be upgraded when you run database analyze. You can quickly upgrade a database by running the database upgrade command. For more information, see “Upgrading CodeQL databases.”
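
If you see the upgrade message, one possible way to upgrade and then rerun the analysis is sketched below. The helper name upgrade_and_analyze and its arguments are hypothetical, and the sketch assumes that codeql is on your PATH.

```shell
# Hypothetical helper: upgrade a database in place, then analyze it.
upgrade_and_analyze() {
    database=$1   # path to the CodeQL database
    queries=$2    # query file, directory, or query suite
    output=$3     # path of the SARIF results file

    # Stop here if the upgrade fails, rather than analyzing a stale database.
    codeql database upgrade "$database" || return 1

    codeql database analyze "$database" "$queries" \
        --format=sarif-latest --output="$output"
}

# Example call (placeholder values):
# upgrade_and_analyze my-cpp-db cpp-lgtm.qls cpp-analysis/cpp-results.sarif
```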

For full details of all the options you can use when analyzing databases, see the database analyze reference documentation.

Examples

The following examples assume your CodeQL databases have been created in a directory that is a sibling of your local copies of the CodeQL and CodeQL for Go repositories.

Running a single query

To run a single query over a JavaScript codebase, you could use the following command from the directory containing your database:

codeql database analyze <javascript-database> ../ql/javascript/ql/src/Declarations/UnusedVariable.ql --format=csv --output=js-analysis/js-results.csv

This command runs a simple query that finds potential bugs related to unused variables, imports, functions, or classes. It is one of the JavaScript queries included in the CodeQL repository. You could run more than one query by specifying a space-separated list of similar paths.

The analysis generates a CSV file (js-results.csv) in a new directory (js-analysis).
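
To run several queries in one invocation, as mentioned above, you could pass the query paths as a space-separated list. The sketch below wraps this in a hypothetical helper named analyze_queries. The query paths are whatever you supply, and it assumes that codeql is on your PATH.

```shell
# Hypothetical helper: analyze a database with any number of query files.
analyze_queries() {
    database=$1   # path to the CodeQL database
    output=$2     # path of the CSV results file
    shift 2       # the remaining arguments are the query paths

    # "$@" expands to the space-separated list of query paths.
    codeql database analyze "$database" "$@" --format=csv --output="$output"
}

# Example call (the second query path is a placeholder):
# analyze_queries <javascript-database> js-analysis/js-results.csv \
#     ../ql/javascript/ql/src/Declarations/UnusedVariable.ql <another-query>.ql
```

All the results are written to the single output file given by --output.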

You can also run your own custom queries with the database analyze command. For more information about preparing your queries to use with the CodeQL CLI, see “Using custom queries with the CodeQL CLI.”

Running LGTM.com query suites

The CodeQL repository also includes query suites, which can be run over your code as part of a broader code review. CodeQL query suites are .qls files that use directives to select queries to run based on certain metadata properties.

The query suites included in the CodeQL repository select the same set of queries that are run by default on LGTM.com. The queries are selected to highlight the most relevant and useful results for each language.

The language-specific LGTM query suites are located at the following paths in the CodeQL repository:

ql/<language>/ql/src/codeql-suites/<language>-lgtm.qls

and at the following path in the CodeQL for Go repository:

ql/src/codeql-suites/go-lgtm.qls

These locations are specified in the metadata included in the standard QL packs. This means that CodeQL knows where to find the suite files automatically, and you don’t have to specify the full path on the command line when running an analysis. For more information, see “About QL packs.”

For example, to run the LGTM.com query suite on a C++ codebase (generating results in the latest SARIF format), you would run:

codeql database analyze <cpp-database> cpp-lgtm.qls --format=sarif-latest --output=cpp-analysis/cpp-results.sarif

For information about creating custom query suites, see “Creating CodeQL query suites.”

Running all queries in a directory

You can run all the queries located in a directory by providing the directory path, rather than listing all the individual query files. Paths are searched recursively, so any queries contained in subfolders will also be executed.

Important

You shouldn’t specify the root of a QL pack when executing database analyze as it contains some special queries that aren’t designed to be used with the command. Rather, to run a wide range of useful queries, run one of the LGTM.com query suites.

For example, to execute all Python queries contained in the Functions directory you would run:

codeql database analyze <python-database> ../ql/python/ql/src/Functions/ --format=sarif-latest --output=python-analysis/python-results.sarif

A SARIF results file is generated. Specifying --format=sarif-latest ensures that the results are formatted according to the most recent SARIF specification supported by CodeQL.

Results

You can save analysis results in a number of different formats, including SARIF and CSV.

The SARIF format is designed to represent the output of a broad range of static analysis tools. For more information, see SARIF overview.

If you choose to generate results as CSV, the output will contain the following information for each alert:

| Property | Description | Example |
|---|---|---|
| Name | Name of the query that identified the result. | Inefficient regular expression |
| Description | Description of the query. | A regular expression that requires exponential time to match certain inputs can be a performance bottleneck, and may be vulnerable to denial-of-service attacks. |
| Severity | Severity of the query. | error |
| Message | Alert message. | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\\\'. |
| Path | Path of the file containing the alert. | /vendor/codemirror/markdown.js |
| Start line | Line of the file where the code that triggered the alert begins. | 617 |
| Start column | Column of the start line that marks the start of the alert code. Not included when equal to 1. | 32 |
| End line | Line of the file where the code that triggered the alert ends. Not included when the same value as the start line. | 64 |
| End column | Where available, the column of the end line that marks the end of the alert code. Otherwise the end line is repeated. | 617 |

Results files can be integrated into your own code-review or debugging infrastructure. For example, SARIF file output can be used to highlight alerts in the correct location in your source code using a SARIF viewer plugin for your IDE.