Semmle 1.21

This topic describes the project configuration file that defines the language, checkout commands, build commands and snapshot retention policy for a project.

Purpose

Each of the projects you analyze has its own project configuration file. The project file defines the programming language, checkout commands, build commands and snapshot policy for one code base.

The information stored in the project file is used when a new snapshot is added to the project and built. Each new snapshot is added to a snapshot-specific directory created within the projects/<project-name> directory. Each snapshot directory contains a snapshot file, which is generated based on the information in the project file and defines the configuration of the snapshot. By default, the snapshot directory also contains a src subdirectory which contains a copy of the source code.

The checkout elements control the behavior of the addSnapshot and addLatestSnapshot tools, and the build elements control the behavior of the buildSnapshot tool. The resulting snapshots can be analyzed using the analyzeSnapshot and runQuery tools.

You can create a project file by using the following tools:

or by copying and modifying the project file for a similar project.

Location

Typically, the project file is stored in the projects/<project-name> directory.

For example: 

/opt/semmle-home/projects/commons-io_java/project


In the above example, SEMMLE_HOME has been set to /opt/semmle-home. For details about setting SEMMLE_HOME to store configuration files separately from the Semmle Core distribution files, see Large-scale deployments.

Structure

The project file is an XML file with the following structure:

<project language="programming-language">
  <timeout>...</timeout>
  <displayName>...</displayName>
  <autoupdate>
    THIS ELEMENT CONTAINS:
     - Code checkout instructions
     - Build configuration
     - Frequency of code checkout
  </autoupdate>
  <snapshot-policy>
    THIS ELEMENT CONTAINS:
     - Definition of nightly snapshots to keep
  </snapshot-policy>
</project>

Example of a basic project file for Java
<project language="java">
  <timeout>600</timeout>
  <autoupdate>
    <checkout>git clone -n ${repository} ${src}</checkout>
    <checkout>git checkout ${revision}</checkout>
    <build index="true">mvn compile</build>
    <build>odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
    <source-location>${checkout_cache}/commons-io</source-location>
  </autoupdate>
</project> 

Elements

project

This is the root element. It has a single attribute:

Attribute | Mandatory? | Description
language | Yes | Defines the language to analyze. It is not possible to mix two languages in one project.

The standard languages supported are:

  • C or C++: language="cpp"
  • C#: language="csharp"
  • COBOL: language="cobol"
  • Java: language="java"
  • JavaScript: language="javascript"
  • Python: language="python"

project child elements

One of: autoupdate

Zero or one of: ram, timeout, snapshot-policy, displayName, threads

ram (parent: project)

Defines the maximum memory allowed for any process run on the project—for example, building a snapshot or running a query. The ram element has no attributes or child elements.

The value is assumed to be defined in 1024-based megabytes unless a unit is specified. Units are case insensitive.
Valid units:
  • k or kb: kilobytes; the value is rounded to the nearest megabyte
  • m or mb: megabytes (the default unit if none is specified)
  • g or gb: gigabytes
  • t or tb: terabytes

Some commands, such as analyzeSnapshot, support a --ram flag that can be used to override a ram element in the project file. If a --ram flag is used, its value always has preference over any ram value defined in the project file.

Default: 1024 MB on 32-bit JVMs and 4096 MB on 64-bit JVMs. The latter default (4096 MB) is suitable for most systems and projects. For 32-bit JVMs you should increase the RAM—for example, to 2048 MB.

The value should be set according to the size of the code base, the available memory in the system and the number of parallel threads that you intend to use when running queries. The value of this element should never be set below 1024 MB.
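
As an illustration of the unit syntax described above, the following sketch sets the memory limit for a project to 8 gigabytes. The surrounding project and autoupdate content is taken from the basic Java example; the value itself is only an assumption about what a larger code base might need.

```xml
<project language="java">
  <!-- 8 gigabytes; units are case insensitive, so 8G or 8GB should be read the same way -->
  <ram>8g</ram>
  <autoupdate>
    <build index="true">mvn compile</build>
  </autoupdate>
</project>
```

A --ram flag passed to a command such as analyzeSnapshot still takes precedence over this value.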

timeout (parent: project)

Defines a timeout for each query. If a query exceeds this time, processing of this query is halted and the next query is run. This is a fail-safe to prevent a single query blocking the calculation of other results.

The timeout element has no attributes or child elements.

The value is assumed to be defined in seconds unless a unit is specified. Units are case insensitive.

Valid units:
  • ms: milliseconds; the value is rounded to the nearest second
  • s, sec, secs, second, or seconds: seconds (the default unit if none is specified)
  • m, min, mins, minute, or minutes: minutes
  • h, hr, hour, hrs, or hours: hours

Default: 10mins. The timeout should only be increased for very large projects where the standard value is not sufficient for some custom queries.
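
A minimal sketch of a raised timeout, assuming a very large project whose custom queries need more than the default. The 30-minute figure is illustrative, and the unit is written without a space to mirror the form of the default shown above.

```xml
<project language="java">
  <!-- Allow each query up to 30 minutes before it is halted -->
  <timeout>30mins</timeout>
  <autoupdate>
    <build index="true">mvn compile</build>
  </autoupdate>
</project>
```

Writing <timeout>1800</timeout> should express the same limit, because seconds are the default unit.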

autoupdate (parent: project)

This mandatory element defines the steps required to obtain and build a version of the code base for the project. The autoupdate element has no attributes.

autoupdate child elements

Zero or more of: checkout, build

Zero or one of: source-location, days-between-updates

threads (parent: project)

Defines the maximum number of threads allowed for any process run on the project—for example, building a snapshot. The threads element has no attributes or child elements. Set to zero to use threads equal to the number of available processors.

Default: processes are single-threaded.
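
For example, the following sketch lets processes use one thread per available processor and raises the memory limit at the same time, since the ram value should take the number of parallel threads into account. The ram figure is an assumption, not a recommendation.

```xml
<project language="java">
  <!-- 0 means: use as many threads as there are available processors -->
  <threads>0</threads>
  <!-- Illustrative value in megabytes; size it for your code base and thread count -->
  <ram>8192</ram>
  <autoupdate>
    <build index="true">mvn compile</build>
  </autoupdate>
</project>
```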

checkout (parent: autoupdate)

Each checkout element defines a single command for the addSnapshot and addLatestSnapshot commands to run. Where more than one element is defined, the checkout commands are run in sequence. When the checkout commands have all been run, the source code is stored in the default source location ready to be built, unless the source-location element defines an alternative location. See Defining checkout commands manually for examples. Any project files created by the bootstrap tool use Semmle variables to define the repository and revision to check out.

The checkout element has many optional attributes which can be used to define the location and context for each checkout command:

Attribute | Default value | Can use variables? | Description
dir | ${src} | Yes | The directory in which to run the command. If a relative path is given, it is taken relative to the source location.
export | Empty | No | A comma-separated list of variables to make available to the command as environment variables. These can be custom variables defined in the variables file for the project.
if | Not set | No | If set, the name of a variable which must expand to true in order for the command to be run.
status | Not set | No | Currently unused.
stderr | Not set | Yes | If set, the name of a file to which the error output of the command should be written.
stdout | Not set | Yes | If set, the name of a file to which the standard output of the command should be written.
subproject | Not set | No | Not documented; use only under instruction from Semmle support personnel.
unless | Not set | No | If set, the name of a variable which must expand to something other than true in order for the command to be run.

If the source code repository is password protected, you can use a credentials store to keep the username and password for the account used to access the repository secure. See Creating and using a credentials store for details.

This element has no child elements.
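
The following sketch shows how several checkout elements and their attributes might be combined. The first two commands come from the basic Java example. The third is hypothetical: fetch_submodules is an assumed custom variable (for instance, one defined in the project's variables file), and submodule-errors.log is an assumed file name.

```xml
<autoupdate>
  <!-- Run in sequence: clone without checking out files, then check out the revision -->
  <checkout>git clone -n ${repository} ${src}</checkout>
  <checkout>git checkout ${revision}</checkout>
  <!-- Hypothetical step: runs only if the variable fetch_submodules expands to true;
       its error output is written to the named file -->
  <checkout if="fetch_submodules" stderr="submodule-errors.log">git submodule update --init</checkout>
</autoupdate>
```

Note that the if attribute takes the name of the variable, not a ${...} reference to it.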

build (parent: autoupdate)

Each build element defines a single command for the buildSnapshot command to run. Where more than one element is defined, the build commands are run in sequence. Build commands are not monitored by Semmle analysis unless the optional index attribute is set to true or the build element explicitly calls out to odasa index.

Attribute | Default value | Can use variables? | Description
dir | ${src} | Yes | The directory in which to run the command. If a relative path is given, it is taken relative to the source location.
export | Empty | No | A comma-separated list of variables to make available to the command as environment variables.
if | Not set | No | If set, the name of a variable which must expand to true in order for the command to be run.
index | false | No | Set to true to enable Semmle analysis to monitor the build command and extract data from the step.
status | Not set | No | Currently unused.
stderr | Not set | Yes | If set, the name of a file to which the error output of the command should be written.
stdout | Not set | Yes | If set, the name of a file to which the standard output of the command should be written.
subproject | Not set | No | Not documented; use only under instruction from Semmle support personnel.

See Defining build commands for examples.

Build commands can also be used to add external data or metadata to the snapshot database. See Tutorial: Incorporating external data for an example.

This element has no child elements.
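
As a sketch of how the index attribute separates monitored from unmonitored steps, the fragment below reuses the two build commands from the basic Java example. The stdout file name and the run_duplicates variable are assumptions introduced only for illustration.

```xml
<autoupdate>
  <!-- Monitored by Semmle analysis because index is true; standard output goes to build.log -->
  <build index="true" stdout="build.log">mvn compile</build>
  <!-- Not monitored (index defaults to false); runs only if the hypothetical
       variable run_duplicates expands to true -->
  <build if="run_duplicates">odasa duplicateCode --ram 2048 --minimum-tokens 100</build>
</autoupdate>
```

The commands run in the order in which they appear, so the compile step completes before the second command starts.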

source-location (parent: autoupdate)

This optional element can be used to define an alternative location for the snapshot copy of the source code. That is, the location at which to check out the code (if checkout commands are defined), or the location at which the code is already checked out (if no checkout commands are defined). This latter case is commonly referred to as a "detached source directory".

This element should be set if you use the repositoryName and revision attributes in the checkout element. In this case, Semmle recommends setting source-location to ${checkout_cache}/${project_name}.

All relevant snapshot source code must be located in the directory defined by this element, or one of its subdirectories. Any source code outside the source location will not be directly analyzed by Semmle tools.

This element has no attributes and no child elements. For more information, see Changing the build or checkout locations.
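
The two uses of source-location described above can be sketched as follows. The first variant uses the recommended ${checkout_cache}/${project_name} value together with checkout commands; the second is a detached source directory, where /data/checkouts/commons-io is a hypothetical path at which the code is assumed to be checked out already.

```xml
<!-- Variant 1: checkout commands place the code in the alternative location -->
<autoupdate>
  <checkout>git clone -n ${repository} ${src}</checkout>
  <checkout>git checkout ${revision}</checkout>
  <build index="true">mvn compile</build>
  <source-location>${checkout_cache}/${project_name}</source-location>
</autoupdate>

<!-- Variant 2: detached source directory; no checkout commands are defined -->
<autoupdate>
  <build index="true">mvn compile</build>
  <source-location>/data/checkouts/commons-io</source-location>
</autoupdate>
```

A project file contains only one autoupdate element, so the two variants are alternatives rather than parts of the same file.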

days-between-updates (parent: autoupdate)

This optional element can be used to define the minimum number of days to allow between snapshots. This is typically used only on systems where Semmle analysis is triggered by a daily script and the company wants to restrict the frequency at which new code is checked out. 

This element has no attributes and no child elements.
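
For example, assuming the value is written as a whole number of days, the following sketch restricts a project to at most one new snapshot per week even if the daily script runs every night.

```xml
<autoupdate>
  <checkout>git clone -n ${repository} ${src}</checkout>
  <checkout>git checkout ${revision}</checkout>
  <build index="true">mvn compile</build>
  <!-- Wait at least 7 days before checking out new code for another snapshot -->
  <days-between-updates>7</days-between-updates>
</autoupdate>
```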

snapshot-policy (parent: project)

This optional element is used to define an automatic snapshot deletion policy for the project. Snapshots are automatically assessed against this policy by the addLatestSnapshot or addSnapshot commands, and any unwanted snapshots are automatically deleted.

The snapshot-policy element has no attributes.

snapshot-policy child elements

Exactly one of: max

One or more of: include

max (parent: either snapshot-policy or include)

This element defines how many (untagged) snapshots are kept. It is specified either as a child of the snapshot-policy element to define the overall total number of snapshots to keep, or as a child of an include element to define the maximum number of snapshots of that type to keep.

This element has no attributes and no child elements.

include (parent: snapshot-policy)

Each include element defines a policy for a specific type of snapshot. When the snapshot deletion policy is applied, the code snapshot for the most recent date is tested against each include element in sequence.

include child and grandchild elements

Element: max
Parent: include
Attributes: None
Occurrence: Zero or one
Description: The maximum number of snapshots of the defined type to keep.

Element: recurrent
Parent: include
Attributes: kind
Occurrence: Exactly one
Description: Defines a type of snapshot to keep. It has a mandatory kind attribute, whose value must be one of: daily, weekdays, weekly, monthly or yearly.

Example: <recurrent kind="weekdays"/>

Element: day
Parent: recurrent
Attributes: None
Occurrence: Zero or one
Description: Optional child of the recurrent element when kind="weekly", kind="monthly" or kind="yearly".

When kind="weekly": the value of day must be one of Monday, Tuesday, Wednesday, Thursday, Friday, Saturday or Sunday. When undefined, the most recent snapshot of the week is retained.

When kind="monthly": the value of day must be an integer from 1 to 31. When undefined, the most recent snapshot of the month is retained.

When kind="yearly":

  • Either the day and the month must both be defined, or both must be omitted. When both are undefined, the most recent snapshot of the year is retained.
  • If day is defined, its value must be an integer from 1 to 31.

Element: month
Parent: recurrent
Attributes: None
Occurrence: Zero or one
Description: Optional child of the recurrent element when kind="yearly". The value must be one of: January, February, March, April, May, June, July, August, September, October, November or December.

When kind="yearly": either the day and the month must both be defined, or both must be omitted. When both are undefined, the most recent snapshot of the year is retained.

Application of the snapshot policy

Snapshots are processed in order starting with the most recent date (so that most recent snapshots are most likely to be kept). Each snapshot is matched with the first <include> rule that applies to it, and the quota for this rule is decreased by one. When the quota for a rule reaches 0, the rule is ignored and matching continues with other rules.

An untagged snapshot that does not match any of the <include> rules is deleted, even if the maximum number of snapshots has not been reached. It is important to make sure that the <include> rules cover all cases of interest. In projects where the build fails regularly you should consider removing the <day> element to ensure that at least one snapshot per month or week is retained.
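
The elements described in this section might be combined as in the following sketch. The quotas are illustrative, and the nesting of day and month as child elements with text values is inferred from the descriptions above rather than copied from a verified project file.

```xml
<snapshot-policy>
  <!-- Overall total number of untagged snapshots to keep -->
  <max>30</max>
  <!-- Rule 1: up to 7 daily snapshots -->
  <include>
    <max>7</max>
    <recurrent kind="daily"/>
  </include>
  <!-- Rule 2: up to 8 weekly snapshots, taken from Fridays -->
  <include>
    <max>8</max>
    <recurrent kind="weekly">
      <day>Friday</day>
    </recurrent>
  </include>
  <!-- Rule 3: up to 12 monthly snapshots; no day, so the most recent of each month -->
  <include>
    <max>12</max>
    <recurrent kind="monthly"/>
  </include>
  <!-- Rule 4: yearly snapshots from 1 January; day and month are defined together -->
  <include>
    <recurrent kind="yearly">
      <day>1</day>
      <month>January</month>
    </recurrent>
  </include>
</snapshot-policy>
```

Under the matching procedure described above, the most recent snapshots would be counted against rule 1 first. Once its quota of 7 is used up, older snapshots are matched against rules 2 to 4, and an untagged snapshot that matches none of them is deleted. Omitting the day element from rule 2 would make the policy more tolerant of failed Friday builds.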

displayName (parent: project) 

Defines a label for the project for display in client applications. The displayName element has no attributes or child elements.

Default: the project directory name is displayed.

Variables supported

Predefined variables

The variables available for use in the project file are described in detail in the Semmle variables topic.

User-defined variables

You can define additional variables to use in the project file by creating a variables file in the same directory. Any variables defined in this file are automatically detected and available to any tools that read the project file.