Introduction to query files

Overview

Queries are programs written with CodeQL. They are designed to highlight issues related to the security, correctness, maintainability, and readability of a code base. You can also write custom queries to find specific issues relevant to your own project. Three important types of query are:

  • Alert queries: queries that highlight issues in specific locations in your code.
  • Path queries: queries that describe the flow of information between a source and a sink in your code.
  • Metric queries: queries that compute statistics for your code.

You can add custom queries to custom query packs to analyze your projects in LGTM, use them to analyze a project with the command-line tools, or contribute them to the standard CodeQL queries in our open source repository on GitHub.

Note

Only the results generated by alert and path queries are displayed on LGTM. To display the results generated by metric queries, run them against your project in the query console on LGTM or in QL for Eclipse. You can explore the paths generated by path queries directly in LGTM and in the path explorer view in QL for Eclipse.

This topic is a basic introduction to structuring query files. You can find further information on writing queries for specific programming languages here, and detailed technical information about QL in the QL language handbook and the QL language specification. For information on how to format your code when contributing queries to the GitHub repository, see the CodeQL style guide.

Basic query structure

Queries written with CodeQL have the file extension .ql, and contain a select clause. Many of the existing queries include additional optional information, and have the following structure:

/**
 *
 * Query metadata
 *
 */

import /* ... CodeQL libraries or modules ... */

/* ... Optional, define CodeQL classes and predicates ... */

from /* ... variable declarations ... */
where /* ... logical formula ... */
select /* ... expressions ... */

The following sections describe the information that is typically included in a query file for alert and metric queries. Path queries are discussed in more detail in Constructing path queries.

Query metadata

Query metadata is used to identify your custom queries when they are added to the GitHub repository or used in your analysis. Metadata provides information about the query’s purpose, and also specifies how to interpret and display the query results. For a full list of metadata properties, see the query metadata reference. The exact metadata requirements depend on how you are going to run your query.

Note

Queries that are contributed to the open source repository, added to a query pack in LGTM, or used to analyze a project with the QL command-line tools must have a query type (@kind) specified. The @kind property indicates how to interpret and display the results of the query analysis:

  • Alert query metadata must contain @kind problem.
  • Path query metadata must contain @kind path-problem.
  • Metric query metadata must contain @kind metric.

When you define the @kind property of a custom query, you must also ensure that the rest of your query has the correct structure for that kind, as described below.
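As an illustration, the metadata for an alert query might look like the sketch below. The query name, description, @id value, tags, and the query body itself are hypothetical and chosen only for this example; check the query metadata reference for the properties your use case actually requires.

```ql
/**
 * @name Empty method (illustrative example)
 * @description Finds methods whose body contains no statements.
 * @kind problem
 * @problem.severity recommendation
 * @precision medium
 * @id java/example/empty-method
 * @tags maintainability
 */

import java

// Because @kind is "problem", the select clause below must have the
// alert form: a code element followed by a message string.
from Method m
where
  m.fromSource() and
  m.getBody().getNumStmt() = 0
select m, "This method has an empty body."
```

Abstract and interface methods have no body at all, so `m.getBody()` has no result for them and they are not reported.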

Import statements

Each query generally contains one or more import statements, which define the libraries or modules to import into the query. Libraries and modules provide a way of grouping together related types, predicates, and other modules. The contents of each library or module that you import can then be accessed by the query. Our open source repository on GitHub contains the standard CodeQL libraries for each supported language.

When writing your own alert queries, you would typically import the standard library for the language of the project that you are querying, using import followed by the library name for that language:

  • C/C++: cpp
  • C#: csharp
  • COBOL: cobol
  • Java: java
  • JavaScript/TypeScript: javascript
  • Python: python

There are also libraries containing commonly used predicates, types, and other modules associated with different analyses, including data flow, control flow, and taint-tracking. In order to calculate path graphs, path queries require you to import a data flow library into the query file. See Constructing path queries for further information.
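For example, a query for a Java project might import both the standard library and a data flow library. This is a minimal sketch that assumes a Java database; the query itself is illustrative and only shows that names from both imports are in scope.

```ql
// Standard library for the language being analyzed.
import java
// Additional library for data flow analysis.
import semmle.code.java.dataflow.DataFlow

// StringLiteral comes from the standard Java library;
// DataFlow::Node comes from the data flow library.
from DataFlow::Node node, StringLiteral lit
where node.asExpr() = lit
select lit, "This string literal is also a data flow node."
```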

You can explore the contents of all the standard libraries in the CodeQL library reference documentation, using QL for Eclipse, or in the GitHub repository.

Optional CodeQL classes and predicates

You can customize your analysis by defining your own predicates and classes in the query. See Defining a predicate and Defining a class for further details.
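The sketch below shows what a class and a predicate defined in the query file might look like, again assuming a Java project. The names GetterLikeMethod and returnsNothing are hypothetical and are not part of the standard libraries.

```ql
import java

/**
 * Hypothetical class: a public method with no parameters
 * whose name starts with "get".
 */
class GetterLikeMethod extends Method {
  GetterLikeMethod() {
    this.getName().matches("get%") and
    this.getNumberOfParameters() = 0 and
    this.isPublic()
  }
}

/** Hypothetical predicate: holds if `m` has return type `void`. */
predicate returnsNothing(Method m) { m.getReturnType() instanceof VoidType }

// The class restricts the values of `m`; the predicate is used as a condition.
from GetterLikeMethod m
where m.fromSource() and returnsNothing(m)
select m, "This getter-like method does not return a value."
```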

From clause

The from clause declares the variables that are used in the query. Each declaration must be of the form <type> <variable name>. For more information on the available types, and to learn how to define your own types using classes, see the QL language handbook.

Where clause

The where clause defines the logical conditions that are applied to the variables declared in the from clause to generate your results. This clause uses aggregations, predicates, and logical formulas to restrict the variables of interest to the smaller set of values that meet the defined conditions. The CodeQL libraries group commonly used predicates for specific languages and frameworks. You can also define your own predicates in the body of the query file or in your own custom modules, as described above.
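As a rough sketch of how the from and where clauses work together, the following query declares two variables and then narrows them using a library predicate, an aggregation, and a comparison. It assumes a Java project, and the threshold of 20 is arbitrary. The query has no metadata, so it is only suitable for running interactively, for example in the query console.

```ql
import java

// Each declaration has the form <type> <variable name>.
from Class c, int n
where
  // Library predicate: only consider classes defined in source code.
  c.fromSource() and
  // Aggregation: count the methods declared directly in `c`.
  n = count(Method m | m.getDeclaringType() = c) and
  // Logical formula: keep only classes above the (arbitrary) threshold.
  n > 20
select c, "This class declares " + n.toString() + " methods."
```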

Select clause

The select clause specifies the results to display for the variables that meet the conditions defined in the where clause. The valid structure for the select clause is defined by the @kind property specified in the metadata.

Select clauses for alert queries (@kind problem) consist of two ‘columns’, with the following structure:

select element, string
  • element: a code element that is identified by the query, which defines where the alert is displayed.
  • string: a message, which can also include links and placeholders, explaining why the alert was generated.

You can use links and placeholders in the alert message defined in the final column of the select statement to give more detail about the alert or path found by the query. For further information, see Defining ‘select’ statements.
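For instance, an alert message can use the $@ placeholder to link to a second code element. In that case the select clause is extended with two further columns for each placeholder: the element to link to and the link text. The query below is a sketch for a Java project, and its metadata values are hypothetical.

```ql
/**
 * @name Overriding method (illustrative example)
 * @description Reports methods that override another method.
 * @kind problem
 * @problem.severity recommendation
 * @id java/example/overriding-method
 */

import java

from Method m, Method overridden
where
  m.fromSource() and
  m.overrides(overridden)
// Columns: alert location, message with a placeholder,
// element the placeholder links to, text shown for the link.
select m, "This method overrides $@.", overridden, overridden.getQualifiedName()
```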

Select clauses for path queries (@kind path-problem) are crafted to display both an alert and the source and sink of an associated path graph. See Constructing path queries for further information.

Select clauses for metric queries (@kind metric) consist of two ‘columns’, with the following structure:

select element, metric
  • element: a code element that is identified by the query, which defines where the metric result is displayed.
  • metric: the result of the metric that the query computes.
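A metric query following this structure might look like the sketch below, which assumes a Java project. The name, description, and @id value are hypothetical, and the @metricType and @metricAggregate values are assumptions that you should confirm against the query metadata reference.

```ql
/**
 * @name Number of fields per class (illustrative example)
 * @description Counts the fields declared directly in each class.
 * @kind metric
 * @metricType reftype
 * @metricAggregate avg sum max
 * @id java/example/fields-per-class
 */

import java

from Class c
where c.fromSource()
// Columns: the code element, then the numeric value computed for it.
select c, count(Field f | f.getDeclaringType() = c)
```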

Query help files

When you write a custom query, we also recommend that you write a query help file to explain the purpose of the query to other users. For more information, see the Query help style guide on GitHub, and the Query help reference.

What next?

  • See the queries used in real-life variant analysis on the Semmle blog.
  • To learn more about writing path queries, see Constructing path queries.
  • Take a look at the built-in queries to see examples of the queries included in CodeQL.
  • Explore the query cookbooks to see how to access the basic language elements contained in the CodeQL libraries.
  • For a full list of resources to help you learn CodeQL, including beginner tutorials and language-specific examples, visit Learning CodeQL.