Semmle 1.20

This topic describes how to configure a project so that full analysis results are generated for every revision of your code, but each revision is built using an incremental build (that only compiles the code that has changed since the previous revision).

Overview

Incremental build systems save time by only recompiling source files that have changed since the last time the code was compiled. This is often very effective, but it interferes with the standard configuration of Semmle analysis: the tools observe a build process and, for each observed compiler call, record data about the compiled source files for later analysis. Therefore, if an incremental build only compiles some of the source, only that portion of the source is analyzed.

However, you can set up a project to work with incremental builds and calculate Semmle analysis results for the full source code. This works by supplementing data in the current snapshot with data from a previous snapshot for any source files that have not changed since the previous incremental build.

Prerequisites

  • Support for incremental builds is currently only available for Java, C, and C++ projects.
  • Before setting up an incremental build, you should have a working project file from which you have successfully created a working dashboard. This project must use a detached source directory (that is, a new checkout of the code is not created for each snapshot; the same build directory is reused). A good choice for the source location might be ${project}/src, that is, a subdirectory called src/ under the project directory.
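As a minimal sketch of this prerequisite, the detached source directory is declared with a source-location element inside autoupdate, following the layout of the Git example at the end of this topic. The git checkout ${REVISION} command is taken from that example and assumes a Git repository; your project's clean and build commands are omitted here.

```xml
<autoupdate>
  <!-- Detached source directory: the same tree is reused for every snapshot -->
  <source-location>${project}/src</source-location>
  <!-- Move the existing checkout to the revision being analyzed (assumes a Git repository) -->
  <build>git checkout ${REVISION}</build>
  <!-- Clean and build commands for your project follow here -->
</autoupdate>
```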

Configuring the project

There are two parts to configuring a project for incremental builds: update the project file, and update the script that adds a new snapshot to the project.

Update the project file

Open the project file for the project you wish to configure and find the existing build command that cleans the source tree. For example, this might be one of the following:

  • <build>make clean</build>
  • <build>ant clean</build>
  • <build>msbuild /target:Clean</build>

If there is currently a single command that both cleans and compiles the code (for example, make clean compile or msbuild /t:Rebuild), you will need to split it into two separate build commands: one to clean and one to compile.
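For example, a combined make clean compile command could be split as in the following sketch of the relevant build elements, which belong inside the autoupdate element. The compile target name is only illustrative; use whatever targets your makefile actually defines. Marking the compile step with index="true" follows the Git example at the end of this topic.

```xml
<!-- Before: a single command both cleans and compiles -->
<build index="true">make clean compile</build>

<!-- After: one command to clean, one to compile ("compile" is an illustrative target) -->
<build>make clean</build>
<build index="true">make compile</build>
```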

When using incremental builds, the source no longer has to be cleaned for every build, but there are still cases when a full build must be triggered (for the very first snapshot, and after a failed compilation). To do this, prefix your clean command with odasa onIncrementalFailure. For example, the build commands shown above would be written as follows.

  • <build>odasa onIncrementalFailure make clean</build>
  • <build>odasa onIncrementalFailure ant clean</build>
  • <build>odasa onIncrementalFailure msbuild /target:Clean</build>

Next, add the following command, which supplements the current snapshot with data from the previous snapshot for source files that are part of the system but were not compiled by the current build.

  • <build>odasa completeIncrementalBuild</build>

This new build command should appear after the main build command, but before other commands (like odasa duplicateCode) that expect a full source archive to have been created. Typically, this means that it should be inserted second to last, immediately before the odasa duplicateCode build command.
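The required ordering can be sketched for a C++ project built with MSBuild. This is an illustration under stated assumptions: msbuild /target:Build stands in for your project's own compile command, and the odasa duplicateCode arguments are copied from the Git example at the end of this topic.

```xml
<autoupdate>
  <source-location>${project}/src</source-location>
  <!-- Clean only for the first snapshot or after a failed compilation -->
  <build>odasa onIncrementalFailure msbuild /target:Clean</build>
  <!-- Main (indexed) build; incremental unless the clean step above ran -->
  <build index="true">msbuild /target:Build</build>
  <!-- Add data from the previous snapshot for source files that were not recompiled -->
  <build>odasa completeIncrementalBuild</build>
  <!-- Commands that expect a full source archive come after completeIncrementalBuild -->
  <build>odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
</autoupdate>
```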

Update the script that adds a new snapshot

Finally, update your Semmle analysis script to pass either the --fail-early flag or the --delete-on-error flag to all calls to odasa buildSnapshot or odasa buildDashboard. This ensures that any compilation failure aborts the analysis run, which is important to avoid corrupting the state that is kept between builds.

The --fail-early flag tells a Semmle command to abort as soon as something fails. In particular, for buildDashboard this means that no new dashboard web archive is produced.

If the dashboard configuration combines several different projects, this may not be what you want. If one of the project configurations fails, you may still want to produce a dashboard including the information that has been successfully collected for other projects. To do this, add the --delete-on-error flag to buildDashboard, rather than --fail-early.

Example

The following project file is for the popular version control system Git. It does not have Semmle incremental builds enabled; instead, it performs a full build each time. For brevity, the contents of the snapshot-policy section have been omitted.

Git project file (non-incremental builds)
<project language="cpp">
  <ram>2048</ram>
  <timeout>600</timeout>
  <autoupdate>
    <source-location>${project}/src</source-location>
    <build>git checkout ${REVISION}</build>
    <build>make clean</build>
    <build>make configure</build>
    <build index="true">make</build>
    <build>odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
    <days-between-updates>0</days-between-updates>
  </autoupdate>
  <snapshot-policy>
    ....
  </snapshot-policy>
</project>

This is the same project file updated to use Semmle incremental builds. Again, the snapshot-policy section has been omitted.

Git project file (incremental builds)
<project language="cpp">
  <ram>2048</ram>
  <timeout>600</timeout>
  <autoupdate>
    <source-location>${project}/src</source-location>
    <build>git checkout ${REVISION}</build>
    <build>odasa onIncrementalFailure make clean</build>
    <build>make configure</build>
    <build index="true">make</build>
    <build>odasa completeIncrementalBuild</build>
    <build>odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
    <days-between-updates>0</days-between-updates>
  </autoupdate>
  <snapshot-policy>
    ....
  </snapshot-policy>
</project>
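The same pattern can be sketched for a Java project built with Ant. This sketch assumes java as the value of the language attribute and a Git repository, and it uses a hypothetical compile target in place of whatever target your build file defines; the ram, timeout, and duplicateCode values are copied from the Git example. The snapshot-policy section is again omitted.

```xml
<project language="java">
  <ram>2048</ram>
  <timeout>600</timeout>
  <autoupdate>
    <source-location>${project}/src</source-location>
    <build>git checkout ${REVISION}</build>
    <!-- Full clean only for the first snapshot or after a failed compilation -->
    <build>odasa onIncrementalFailure ant clean</build>
    <!-- Indexed build; "compile" is a hypothetical target name -->
    <build index="true">ant compile</build>
    <!-- Add data from the previous snapshot for unchanged source files -->
    <build>odasa completeIncrementalBuild</build>
    <build>odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
    <days-between-updates>0</days-between-updates>
  </autoupdate>
  <snapshot-policy>
    ....
  </snapshot-policy>
</project>
```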

Known limitations

Support for incremental builds is currently only available for Java, C, and C++ projects.

Using this feature reduces the time spent building the software, but the subsequent analysis time remains the same, because the analysis phase always works on complete data for the whole software system. Working on complete data is what allows advanced analyses, such as detecting misuse of an API based on statistics about how it is commonly used, and tracking the flow of data through the code.