Semmle 1.19

Purpose

By default, the bootstrap tool creates a project configuration with two build commands: one that extracts externs files for standard libraries and platform APIs, and one that extracts all JavaScript code found anywhere in the project's source tree (see Tutorial: Basic project creation (JavaScript)). In some cases, however, this is not the right thing to do. For instance, the project source code may include both the minified and the unminified version of some JavaScript files, in which case usually only the unminified version should be analyzed. Another common situation is that the source code includes files containing incomplete JavaScript snippets that cannot be processed by themselves, but are meant to be concatenated with other files. Such snippets should not be included in the analysis.

Some analyses can benefit from additional information about library functions and external objects provided by native libraries such as the DOM that are not written in JavaScript. Externs declarations are a commonly used mechanism for specifying this information, and the extractor can be instructed to include such externs in the database.

Finally, the extractor has a special Node.js mode that enables more precise analysis of Node.js code. It looks for the presence of require calls and exports to detect whether a file is a Node.js module or plain JavaScript code, but in some cases these heuristics may fail, and it may need to be told explicitly to extract a file as Node.js code.

All of these fine-tunings of the JavaScript extraction can be achieved by adding flags to the command that runs the extractor.

Usage

The JavaScript extractor is distributed as a Java JAR file named extractor-javascript.jar, which is contained in the JavaScript pack and will be installed into the tools subdirectory of your Semmle installation. A typical invocation of the JavaScript extractor as generated by bootstrap looks like this:

${odasa_tools}/java/bin/java -jar ${odasa_tools}/extractor-javascript.jar .

This assumes that the variable ${odasa_tools} is set up to point to the tools directory of a Semmle installation, which is always the case during an execution of buildSnapshot. The path ${odasa_tools}/java/bin/java refers to the Java executable in the JRE bundled with the Semmle installation, while ${odasa_tools}/extractor-javascript.jar refers to the extractor JAR.

Here the extractor is given a single argument, ., which denotes the current directory. In general, the extractor can be given any number of files and directories as arguments. It traverses the given directories and collects all JavaScript files (that is, files with the extension .js, .jsx, .mjs, .es, or .es6) and HTML/XHTML files (that is, files with the extension .xhtml, .xhtm, .html, .htm, or .vue). Then it extracts the JavaScript code contained in all these files for inclusion in the database.
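Before running an extraction, it can be useful to preview which files have one of the default extensions listed above. The following is a minimal sketch using find; list_default_candidates is a hypothetical helper name, not part of the Semmle tools. It only approximates the default traversal: it ignores any --include, --exclude, or --platform flags, and it matches extensions case-sensitively, which may differ from the extractor's behavior.

```shell
# Hypothetical helper (not part of the Semmle tools): list files under the
# given directories whose extension is one the extractor collects by default.
list_default_candidates() {
  find "$@" -type f \( \
    -name '*.js' -o -name '*.jsx' -o -name '*.mjs' \
    -o -name '*.es' -o -name '*.es6' \
    -o -name '*.xhtml' -o -name '*.xhtm' -o -name '*.html' -o -name '*.htm' \
    -o -name '*.vue' \
  \) | sort
}
```

For example, list_default_candidates dist lib test prints the candidate files below those three directories, one path per line.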

A project file can contain any number of invocations of the JavaScript extractor.

Flags

Flags allow the default behavior of the JavaScript extractor to be modified. You can edit an extraction build command in the project file, adding flags to produce the required behavior.

Example build commands 

A simple build command for JavaScript extraction such as this:

<build>java -jar ${odasa_tools}/extractor-javascript.jar .</build>

could be modified as follows:

<build>java -jar ${odasa_tools}/extractor-javascript.jar --platform node --quiet .</build>

In the above example, flags have been added to explicitly tell the extractor to perform Node.js analysis without outputting diagnostic information.

<build>java -jar ${odasa_tools}/extractor-javascript.jar --exclude **/*.min.js dist lib test</build>

In the above example, flags have been added to tell the extractor to exclude files with names ending in .min.js and to process only files in the dist, lib and test top-level directories.

Flag details

The JavaScript extractor supports the following flags:

Flag | Value | Example | Notes
--include | a pattern | **/*.json | When traversing directories looking for files to extract, include files matching this pattern (in addition to JavaScript and HTML files). This flag may be repeated. The value of each flag is interpreted as an ant-style pattern.
--exclude | a pattern | **/*.min.js | Do not extract any files whose path matches the given pattern. This flag may be repeated. It is applied after any --include flags. The value of each flag is interpreted as an ant-style pattern.

--exclude-path | a file name | .jshintignore | Do not extract any files whose path matches any pattern listed in the given exclusion file, where each line is interpreted as an ant-style pattern. This flag may be repeated.

Note: Relative paths in the exclusion file are interpreted relative to the directory of the exclusion file itself. Therefore, the exclusion file should normally itself be located in the source tree being analyzed, not in the project or snapshot directory.

The exclusion file specifies one pattern per line, for example:

legacy.js
somelib/**
otherlib/*.js

If the exclusion file is located at source-root/.jshintignore, then the first pattern matches source-root/legacy.js, the second one matches the directory source-root/somelib and all its subdirectories and files, while the third pattern matches all .js files in source-root/otherlib.

--experimental | none | | Enable experimental support for the following language extensions:

  • Asynchronous functions; trailing commas in function parameter and argument lists; spread/rest properties; public class fields; function.sent meta-property; decorators; export extensions; function bind syntax
  • Flow type annotation syntax
  • Mozilla-specific language extensions (expression closures, guarded catch clauses, for-in blocks in comprehensions, for each ... in loops, legacy let statements, let expressions and array and generator comprehensions)
  • JScript-style double colon methods
--externs | none | | Extract the given files as externs declarations, not as ordinary JavaScript code.
--extract-program-text | none | | Extract the textual content of all JavaScript files in addition to syntactic and semantic information.
--html | scripts or elements | | Specify what information to extract about HTML files:

  • scripts: extract scripts embedded inside HTML files;
  • elements: additionally, extract information about the document structure (elements and attributes), but not textual information.

The default is elements.

--platform | node, web or auto | | Specify whether the given source files should be extracted as Node.js code or as plain JavaScript:

  • node: extract all files as Node.js modules;
  • web: extract all files as plain JavaScript files;
  • auto: for each file, automatically determine whether it is a Node.js module or a plain JavaScript file and extract accordingly.

The default is auto.

If the platform is node or auto, any package.json files found in directories passed to the extractor are also extracted, as well as any files that look like Node.js command line scripts. A file is considered to be a Node.js command line script if it has no extension, and its first line is a UNIX shebang line (that is, it starts with the two characters #!), which contains the word node or nodejs.

Note that JavaScript code embedded in HTML files is always extracted as plain JavaScript code (even if --platform node is specified).

--quiet | none | | Do not produce diagnostic output.

--abort-on-parse-errors | none | | Abort extraction as soon as a parse error is encountered. The default is to record the error and continue extraction.
--trap-cache | a directory name | /tmp/js-trap-cache | Use the given directory to cache extraction results. If the same file is extracted repeatedly, these results will be reused.

--trap-cache-bound | a size | 5g | Limit the size of the TRAP cache: before starting extraction, the current size of the cache is computed; if it exceeds the limit, the cache is trimmed to 40% of the given size.

Note that the TRAP cache is only trimmed once, before starting extraction. During extraction, new data may be put into the cache, which can cause it to exceed the limit.

--source-type | script, module or auto | script | Specify how JavaScript source files should be parsed.

Source type module means that JavaScript files are parsed as module definitions, while source type script means that they are interpreted as stand-alone scripts. In the latter case, import and export declarations are not recognized and lead to a parse error. When the default source type auto is specified, the extractor attempts to determine the type of each file and then processes it accordingly.

--file-type | js, html, json or yaml | yaml | Analyze all files as the type defined by this flag, regardless of their extension. By default, the extractor attempts to determine the type of each file using the file extension, and any files with unknown extensions are ignored.
--default-encoding | a character encoding | latin1 | Use the given encoding when reading source files. The default is UTF-8.
--typescript | none | | Enable extraction of TypeScript files (.ts and .tsx). This uses the TypeScript compiler to extract information about TypeScript files during the analysis and requires that you have Node.js version 6.x or later installed (we recommend the latest LTS version).

Default: TypeScript files are ignored by installations of Semmle Core. LGTM Enterprise is configured differently.

--typescript-full | none | | Like --typescript, but additionally extracts static type information from TypeScript files.
--typescript-ram | number and byte unit | 1G | Sets the amount of memory to allocate for the TypeScript extraction process. When using --typescript-full, raising this can help speed up extraction time. The default is 1G.

For extractor commands in project files, --include and --exclude patterns should not be enclosed in quotes.
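The rule given in the --platform notes for recognizing Node.js command line scripts can be approximated in a few lines of shell, which may help when checking why a file without an extension was or was not picked up. This is only a sketch: looks_like_node_cli_script is a hypothetical helper, not part of the extractor, and it simplifies the rule in two ways that are noted in the comments.

```shell
# Hypothetical helper (not part of the extractor): succeeds if the given file
# looks like a Node.js command line script under the rule described above.
looks_like_node_cli_script() {
  # Rule 1: the file name has no extension. Simplification: any dot in the
  # base name counts as an extension, so dotfiles are rejected as well.
  case "$(basename "$1")" in
    *.*) return 1 ;;
  esac

  # Rule 2: the first line is a shebang line that mentions node or nodejs.
  # Simplification: this is a substring test, so "nodemon" would also pass.
  first_line=''
  IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
  case "$first_line" in
    '#!'*node*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, looks_like_node_cli_script bin/cli succeeds for a file bin/cli whose first line is #!/usr/bin/env node, and fails for a file whose first line is #!/bin/sh.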

Bundled externs

The JavaScript analysis includes a few generally useful externs files that are made available in the tools/data/externs directory of your installation. For most projects, it makes sense to extract all of these externs, as is done by the default configuration created by odasa bootstrap.

The externs are organized into five subdirectories:

  • es: Externs for standard library functions and objects for various versions of the ECMAScript standard.

    These definitions are cumulative: es3.js includes externs definitions for the standard library specified in the ECMAScript 3 standard, es5.js covers the additions made by the ECMAScript 5 standard, and so on.

  • lib: Externs for a few selected libraries.
  • nodejs: Externs for the Node.js standard library.
  • vm: Externs for non-standard built-in functions of popular JavaScript engines: jsshell.js provides support for Mozilla's JSShell environment, spidermonkey.js is for Firefox-specific APIs, rhino.js covers Rhino built-ins, and v8.js models V8.
  • web: Externs for DOM objects and functions.

File exclusions

The patterns specified by --exclude and --exclude-path are matched against file paths including the extraction root. For instance, assume the extractor is invoked as follows:

java -jar ${odasa_tools}/extractor-javascript.jar --exclude a.js dir

Here, the extraction root is dir. If dir is a directory containing a file a.js, the --exclude pattern will not match, since the pattern does not include the extraction root. Instead, the following invocation should be used:

java -jar ${odasa_tools}/extractor-javascript.jar --exclude dir/a.js dir

You can also exclude entire directories, which means that none of the files or subdirectories inside that directory are extracted.
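To see why the first invocation above excludes nothing, it helps to look at file paths the way the patterns see them, that is, prefixed with the extraction root. The sketch below does this with find and compares each path against a literal pattern. excluded_by_literal is a hypothetical helper, and it deliberately handles only literal patterns: wildcard matching follows ant-style rules that this sketch does not attempt to reproduce.

```shell
# Hypothetical helper (not part of the extractor): print the files under the
# extraction root whose path, including the root, equals the literal pattern.
excluded_by_literal() {
  pattern=$1
  root=$2
  find "$root" -type f | while IFS= read -r path; do
    # Paths produced by find start with the root, for example dir/a.js.
    if [ "$path" = "$pattern" ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

Run from the directory that contains dir, excluded_by_literal a.js dir prints nothing, while excluded_by_literal dir/a.js dir prints dir/a.js.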

TRAP Caching

Using a TRAP cache can speed up extraction if the same file is encountered multiple times. This is especially common during Team Insight data collection, where multiple versions of the same code base are extracted and these versions often consist of mostly the same files. The TRAP cache is thread-safe and hence can be used concurrently by multiple executors.

By default, the TRAP cache is not limited in size. As of version 1.11, the extractor supports a new flag --trap-cache-bound that can be used to specify a limit on cache size: as described above, upon starting extraction the current cache size is checked, and if it exceeds the specified limit, the cache is trimmed to 40% of the limit. This is a soft bound, as the cache is only trimmed once, before extraction, and no attempt is made to enforce the size limit during extraction.
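The trimming rule can be worked through with a little arithmetic. The sketch below uses two hypothetical helpers, needs_trim and trim_target, which are not part of the extractor. Sizes are plain integers in whatever unit you choose (megabytes in the usage note), since how the extractor interprets a size suffix such as g is not specified here.

```shell
# Hypothetical helper: succeeds if the current cache size exceeds the limit,
# which is the condition under which the cache is trimmed before extraction.
needs_trim() {
  [ "$1" -gt "$2" ]
}

# Hypothetical helper: the size the cache is trimmed to for a given limit,
# that is, 40% of the limit. Integer arithmetic rounds down.
trim_target() {
  echo $(( $1 * 40 / 100 ))
}
```

With a limit of 5000 MB and a cache that currently holds 6200 MB, needs_trim 6200 5000 succeeds and trim_target 5000 prints 2000, so the cache would be cut back to about 2000 MB. A cache of 4000 MB would be left untouched, and in either case it may grow past 5000 MB again during extraction.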