Creating a CodeQL database

Overview

CodeQL analysis relies on extracting relational data from your code, and using it to build a CodeQL database—a directory containing all the data required to run queries on your code. This topic shows you how to create a CodeQL database using the database create subcommand.

Prerequisites

Before you generate a CodeQL database, you need to:

  • Install and set up the CodeQL CLI. For further information, see Getting started with the CodeQL CLI.
  • Check out the version of your codebase you want to analyze. The directory should be ready to build, with all dependencies already installed.

Running codeql database create

CodeQL databases are created by running the following command from the checkout root of your project:

codeql database create <database> --language=<language-identifier>

You must specify:

  • <database>: a path to the new database to be created. This directory will be created when you execute the command—you cannot specify an existing directory.

  • --language: the identifier for the language to create a database for. CodeQL supports creating databases for the following languages:

    Language                 Identifier
    C/C++                    cpp
    C#                       csharp
    Go                       go
    Java                     java
    JavaScript/TypeScript    javascript
    Python                   python
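
The two required arguments can be sanity-checked before the command is run. The following is a minimal sketch, not part of the CodeQL CLI: the function names is_supported_language and is_fresh_path are hypothetical helpers, and the list of identifiers is copied from the table above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; they are not provided by the CodeQL CLI.

# Succeeds if the identifier is one of those listed in the table above.
is_supported_language() {
    case "$1" in
        cpp|csharp|go|java|javascript|python) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds only if nothing exists at the database path yet, because
# the database directory must be created by the command itself.
is_fresh_path() {
    [ ! -e "$1" ]
}
```

A wrapper script could call both helpers and stop with an error message before invoking codeql database create if either check fails.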

Other options may be specified depending on the location of your source files and the language you want to analyze:

  • --source-root: the root folder for the primary source files used in database creation. By default, the command assumes that the current directory is the source root—use this option to specify a different location.
  • --command: for compiled languages only, the build commands that invoke the compiler. Do not specify --command options for Python and JavaScript. Commands will be run from the current folder, or --source-root if specified. If you don’t include a --command, CodeQL will attempt to detect the build system automatically, using a built-in autobuilder.
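
As an illustration of how these options combine, here is a minimal sketch of two wrapper functions. The function names, the database paths, the source folder, and the Maven build command are all assumptions made for the example; only the codeql database create subcommand and the --language, --source-root, and --command options come from the text above. The CODEQL variable is a hypothetical convenience that lets you point the wrappers at a specific binary.

```shell
#!/bin/sh
# Hypothetical wrappers around `codeql database create`.
# CODEQL may be set to the path of the CodeQL CLI; it defaults to `codeql`.

# Interpreted language: no --command, source root given explicitly.
#   $1 = path of the database to create, $2 = source root
create_js_db() {
    "${CODEQL:-codeql}" database create "$1" \
        --language=javascript --source-root="$2"
}

# Compiled language: pass the build command that invokes the compiler.
#   $1 = path of the database to create, $2 = build command
create_java_db() {
    "${CODEQL:-codeql}" database create "$1" \
        --language=java --command="$2"
}
```

For example, create_java_db out/java-db "mvn clean install" would run the build under CodeQL's observation, assuming a Maven project; omitting --command entirely would leave build detection to the autobuilder.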

For full details of all the options you can use when creating databases, see the database create reference documentation.

For more information about running the database create subcommand to create databases for various compiled and non-compiled languages, see Examples: Creating CodeQL databases.

Progress and results

Errors are reported if there are any problems with the options you have specified. For interpreted languages, extraction progress is displayed in the console: for each source file, it reports whether extraction succeeded or failed. For compiled languages, the console displays the output of the build system.

When the database is successfully created, you’ll find a new directory at the path specified in the command. This directory contains a number of subdirectories, including the relational data (required for analysis) and a source archive—a copy of the source files made at the time the database was created—which is used for displaying analysis results.
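
A quick way to confirm that the directory was created is to list its top-level entries. The helper below is a hypothetical sketch; it makes no assumption about the names of the subdirectories and reports only what is present.

```shell
#!/bin/sh
# Hypothetical helper: list the top-level entries of a database directory,
# or fail with a message if nothing was created at that path.
list_database_contents() {
    if [ ! -d "$1" ]; then
        echo "no database directory at $1" >&2
        return 1
    fi
    ls -1 "$1"
}
```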

Obtaining databases from LGTM.com

LGTM.com analyzes thousands of open-source projects using CodeQL. For each project on LGTM.com, you can download an archived CodeQL database corresponding to the most recently analyzed revision of the code. These databases can also be analyzed using the CodeQL CLI.

To download a database from LGTM.com:

  1. Log in to LGTM.com.
  2. Find a project you’re interested in (for example, Apache Kafka) and display its Integrations tab.
  3. Scroll to the CodeQL databases for local analysis section at the bottom of the page.
  4. Download databases for the languages that you want to explore.
  5. Unzip the databases.

Before running an analysis, try upgrading the unzipped databases to ensure they are compatible with your local copy of the CodeQL queries and libraries.
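
The upgrade step can be scripted when several databases have been downloaded. This is a minimal sketch that assumes each unzipped database sits in its own subdirectory of one parent folder; the function name upgrade_all and the CODEQL variable are hypothetical, while codeql database upgrade is the CLI subcommand being invoked.

```shell
#!/bin/sh
# Hypothetical helper: run `codeql database upgrade` on every
# subdirectory of the given folder.
#   $1 = folder containing the unzipped databases
upgrade_all() {
    for db in "$1"/*/; do
        # Skip the unexpanded pattern when the folder has no subdirectories.
        [ -d "$db" ] || continue
        "${CODEQL:-codeql}" database upgrade "${db%/}"
    done
}
```

If your archives unpack to a different layout, pass the individual database directories to codeql database upgrade directly instead.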

Note

The CodeQL CLI doesn’t currently extract data from additional configuration files (such as web.xml files in C# and .properties files in Java) when you create databases. This means that CodeQL databases created using the CodeQL CLI may be slightly different from those obtained from LGTM.com or created using the legacy QL command-line tools. As such, analysis results generated from databases created using the CodeQL CLI may differ from those generated from databases obtained from elsewhere.

What next?

After you have successfully created a database, you can analyze it using the CodeQL CLI or import it into VS Code for advanced query development.