Semmle 1.22

Purpose

By default, the bootstrap tool creates a project configuration with a single build command that extracts all of the Python code in the source tree (see Tutorial: Basic project creation (Python) for an example). In some cases this configuration needs to be modified. For example, you might want to exclude test files, or the source may be divided into two or more parts, each with a separate search path. You can also configure the extractor to analyze pyxlspitfire and protobuf modules.

You can modify the behavior of the Python extractor by adding command-line flags to the relevant build command.

Usage

The Python extractor is distributed as a small launcher called python_tracer.py and a zip file containing the bulk of the extractor. Both are present in the tools sub-directory of your Semmle Core installation. When you create a project configuration file using the bootstrap command, your responses to prompts are converted into a call to the Python extractor with the command-line options required to perform that analysis.

A basic call to the Python extractor, generated by the bootstrap command and stored in the project configuration file, looks like this:

python ${odasa_tools}/python_tracer.py -a -v -c ${project}/trap_cache -j -R "${src}" -p "${src}"

The path ${odasa_tools}/python_tracer.py refers to the launcher for the extractor. During normal use of the command-line tools, the variable ${odasa_tools} points to the tools directory of your Semmle Core installation.

The extractor is given two main arguments:

  • -p "${src}" search for imports under ${src}, the location where the snapshot is stored.
  • -R "${src}" extract all Python files under ${src}, the location where the snapshot is stored.

The additional arguments define:

  • -a follow all imports (not just top level ones), ensuring that any necessary dependencies are included in the database (provided that they can be found).
  • -v output more detailed logging information during the extraction.
  • -c ${project}/trap_cache store the intermediate files generated during extraction in a directory named trap_cache. The cache is solely a performance optimisation, and the trap_cache directory can safely be deleted at any time.
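To show how the arguments of the bootstrap-generated call fit together, the following bash sketch assembles the same command as an array, one flag per line with a comment, and prints it instead of running it. The values of odasa_tools, project and src are placeholder paths assumed for illustration; in a real project configuration these variables are supplied by Semmle Core. The -j flag is copied unchanged from the generated command, because its meaning is not described on this page.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder locations (assumptions for illustration only).
odasa_tools="/opt/semmle/tools"
project="/home/user/projects/example"
src="/home/user/checkouts/example"

# The bootstrap-generated call, one argument group per line.
extractor_args=(
  -a                          # follow all imports, not just top-level ones
  -v                          # more detailed logging during extraction
  -c "${project}/trap_cache"  # cache for intermediate files; safe to delete
  -j                          # kept as generated (not described on this page)
  -R "${src}"                 # extract every Python file under ${src}
  -p "${src}"                 # search for imports under ${src}
)

# Dry run: print the command rather than executing it.
printf '%s' "python ${odasa_tools}/python_tracer.py"
printf ' %q' "${extractor_args[@]}"
printf '\n'
```

Keeping the flags in an array like this makes it easy to add or remove a single option before copying the final command into the project file.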

Flags

You can use command-line flags to change the default behavior of the Python extractor. You can edit the extraction build command in the project file, adding or changing flags to produce the required behavior.

Detailed information

For full details of the flags supported by the Python extractor, run the extractor with the --help and -v options. For example, on Linux you might run the following command in the odasa directory:

python tools/python_tracer.py --help -v

Import options

In order to analyze a Python module, it is necessary to be able to analyze its dependencies. To do this, the extractor follows imports in the modules that it extracts, extracting those modules as well.

There are four options that determine how the extractor follows imports. The default option added by the bootstrap command is -a (short for --all-imports), which gives the best results for most projects.

If performance is critical, for example in a code-review setup, you can trade some accuracy for speed by using the --max-import-depth option. Combining -a with --max-import-depth=2 can give reasonably accurate results when performance is an issue.
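If you need both the default and the faster variant, one option is to choose the import flags in a small wrapper. The bash sketch below is an illustration only, not part of Semmle Core: the function import_flags and the EXTRACTION_MODE variable are hypothetical, and the checkout path is a placeholder. In "fast" mode it adds --max-import-depth=2 to -a, as described above, and it prints the resulting command rather than running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the import-related flags, one per line.
#   full (default): -a only, as added by the bootstrap command
#   fast:           -a plus a depth limit, trading accuracy for speed
import_flags() {
  local mode="${1:-full}"
  printf '%s\n' -a
  if [ "$mode" = "fast" ]; then
    printf '%s\n' --max-import-depth=2
  fi
}

# EXTRACTION_MODE is a hypothetical variable used only in this sketch.
mapfile -t flags < <(import_flags "${EXTRACTION_MODE:-full}")

src="/home/user/checkouts/example"   # placeholder checkout location
echo "python \${odasa_tools}/python_tracer.py ${flags[*]} -R $src -p $src"
```

Running it with EXTRACTION_MODE=fast prints the depth-limited command; without the variable it prints the default -a command.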

Search path options

When the extractor is following imports, it needs to know where to look. As well as searching the default interpreter path and PYTHONPATH, it searches any additional paths specified using the -p (short for --path) option. Paths specified using -p take precedence over the default path.
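When several directories belong on the search path, you can generate one -p flag per directory. The bash sketch below does this from a list; the function name path_flags is hypothetical and the directories are placeholders. With the two directories shown, it produces the same -p flags as the modified command in the basic example later on this page.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Emit "-p DIR" for each directory given, preserving order
# (earlier -p paths are listed first on the command line).
path_flags() {
  local dir
  for dir in "$@"; do
    printf '%s\n' -p "$dir"
  done
}

src="/home/user/checkouts/example"     # placeholder checkout location
search_dirs=("$src/src" "$src/test")   # directories to search for imports

mapfile -t p_args < <(path_flags "${search_dirs[@]}")
echo "python \${odasa_tools}/python_tracer.py ${p_args[*]} -R $src"
```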

Specifying what Python code to extract

The Python code to extract can be specified either by path or by module name. It is generally easier to specify by path, but sometimes it may be desirable to specify modules by name.

The simplest option (and the default specified by the bootstrap command) is to pass the top level folder in a checkout as the argument to the --recurse-files option (-R).

Example build commands 

Basic example

A simple build command for Python extraction such as this:
<build>python ${odasa_tools}/python_tracer.py -p "${src}" -R "${src}"</build>

could be modified as follows:
<build>python ${odasa_tools}/python_tracer.py -p ${src}/src -p ${src}/test -R ${src}</build> 

The two -p flags tell the extractor that the import path contains two elements: ${src}/src and ${src}/test.

Detailed example

Here we look at extracting the source of the SCons build tool (https://bitbucket.org/scons/scons). The source of SCons contains many tests, various scripts, some benchmarks and the core engine.

  • We assume that the source is checked out into ${src}, the default location for code checkouts. 
  • We want to extract all of this code, so the first option we define is -R ${src}/src. That is, treat the paths under ${src}/src as paths for packages, then recurse, computing the package names from their paths.
  • We need to specify the search path. Looking at the SCons source code shows that most of the Python source files are tests and scripts. In fact, the only modules are in the SCons package, which is located under ${src}/src/engine.
  • So we need to add a single path option for this location: -p ${src}/src/engine
  • Finally, we define the main scons.py script as the __main__ module using the --main flag: -m ${src}/src/script/scons.py

This gives us the following command to call the Python extractor:

python ${odasa_tools}/python_tracer.py -R ${src}/src -p ${src}/src/engine -m ${src}/src/script/scons.py

When you analyze a project using this call, the extractor searches all paths under ${src}/src/engine for Python modules, treats ${src}/src/script/scons.py as the __main__ module, and explores all paths within ${src}/src for packages, computing package names from the paths.
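The SCons command can also be sketched as a bash script that checks the expected locations exist in the checkout before printing the extractor call. The default values for src and odasa_tools are placeholder assumptions, and the src/engine and src/script/scons.py layout is the one described in the steps above; other SCons versions may be laid out differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholders (assumptions): the SCons checkout and Semmle Core's tools directory.
src="${src:-/home/user/checkouts/scons}"
odasa_tools="${odasa_tools:-/opt/semmle/tools}"

scons_args=(
  -R "$src/src"                   # recurse into src, computing package names from paths
  -p "$src/src/engine"            # search path: the location of the SCons package
  -m "$src/src/script/scons.py"   # treat scons.py as the __main__ module
)

# Warn (without failing) about any expected location missing from the checkout.
for path in "$src/src" "$src/src/engine" "$src/src/script/scons.py"; do
  [ -e "$path" ] || echo "warning: $path not found" >&2
done

echo "python $odasa_tools/python_tracer.py ${scons_args[*]}"
```

The warnings are only a sanity check: if a path is missing, the -p or -m value probably needs adjusting for your checkout before the command is added to the project file.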