Semmle 1.19

This topic describes the infrastructure required to set up an installation of Semmle Team Insight.

Overview

Semmle Team Insight consists of three main functional areas:

  1. Semmle Team Insight data processing—a master server connected to one or more worker node machines that are used to collect and analyze revision data.
  2. Analysis results storage—receives data and imports it into an SQL database.
  3. Visualization—runs the chosen business intelligence tool, for example: Tableau or QlikView.

For information about setting up Semmle Team Insight light analysis, see the Team Insight light analysis help.

Logical architecture

An overview of the components used by Team Insight (also available for download as a PDF or Microsoft Visio file):

Servers

The servers shown above perform the following actions:

Server | Actions
Version Control System (VCS) servers
  • Responds to requests from the master server for source code data
  • For centralized VCSs: Also responds to requests from the workers for source code data
Master server
  • Requests data from the VCS servers to determine the revisions to be built and analyzed

  • For static setups: Copies Semmle Core to workers over SSH

  • Creates a job queue from which workers fetch build and analysis tasks

  • Receives snapshots and analysis results returned by workers

  • Aggregates results returned by workers

  • Publishes data to an Insight Server service
  • Optional: requests or receives data from other servers to integrate with the Team Insight data (for example, HR data, issue tracking)
Worker node(s)
  • For static setups: Receives Semmle Core software from the master server over SSH
  • Runs one or more worker-daemon.jar, which fetches build and analysis tasks from the master server over HTTPS
  • For projects stored in a centralized VCS: Requests source code data from the VCS server
  • Runs Semmle Core to perform the build/analysis tasks
  • Returns results to the master server
Insight server
  • Hosts one or more Insight Server services

  • Receives data from one or more master servers

  • Passes data to a DBMS (database management system—PostgreSQL or SQL Server)
DBMS
  • Hosts one or more SQL databases

  • Receives data from the Insight server service and stores it in an SQL database

  • On request, passes data to the business intelligence server
Business intelligence server
  • Hosts the visualization software (for example, Tableau or QlikView)
  • Requests data from the DBMS for display by the visualization tool
  • Delivers data to browser or application-based clients belonging to users

Ideally, all the servers should be physically co-located within a single data center to ensure good connectivity and efficient data transfer. Team Insight places no other restrictions on the physical location of these servers, so you can organize them as required within your network. The Insight server, DBMS and business intelligence server are often located on a single machine.

Requirements

Team Insight is based on Semmle Core technology (see System requirements for general requirements). Please note that for Team Insight a standard installation of Semmle Core is required, that is, the toolchain should be co-located with the projects, dashboards and licenses directories.

Summary

The master server, worker nodes, Insight server and the server hosting the DBMS require passwordless public/private key connections to enable data to be passed as shown in the diagram above. This can be configured using either SSH (OpenSSH or Tectia SSH preferred) or HTTPS. 

Server | Software required to be installed | Other requirements
Master server
  • Semmle Core

  • VCS client software (to communicate with the VCS servers)

  • For static data collection setups: rsync and passwordless login to the worker nodes over SSH
  • Good network access to worker nodes, VCS servers and Insight server
  • Access to large storage space for Team Insight data (that is, built and archived snapshots). This may be on network-attached storage
Worker node
  • Build prerequisites for the software projects to be built: build tools (compilers, etc.), build dependencies
  • If communication with centralized VCS is required: VCS client software
  • Good network access to master server
  • Each worker node typically needs a minimum of disk space equivalent to about 10 times the size of the largest project configured for analysis, plus 6 GB, multiplied by the number of worker processes that you intend to run on that node.
  • For centralized VCS, good network access to VCS servers
Insight server
  • Semmle Core

  • Good network access to the master server and the server that hosts the DBMS
DBMS server
  • SQL DBMS—either PostgreSQL (9.1 or above), or Microsoft SQL Server (Express 2014 or above)

  • Good network access to the business intelligence server
Business intelligence server
  • Visualization software—for example, QlikView (11 or above) or Tableau (8 or above)
  • Must have fast network access to the server that hosts the DBMS
  • Meet the requirements for the chosen business intelligence software
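
For static data collection setups, the passwordless SSH login from the master server to a worker node can be checked before configuring Team Insight. The following is a minimal sketch rather than part of the Semmle tooling: the function names and the host name are hypothetical, it assumes an OpenSSH client is on the PATH, and it assumes the worker accepts the `true` command (a Unix-like shell). `BatchMode=yes` tells OpenSSH never to prompt for a password, so the check fails instead of hanging when key-based login is not set up.

```python
import subprocess
from typing import List, Optional


def build_ssh_check_command(host: str, user: Optional[str] = None,
                            timeout_s: int = 5) -> List[str]:
    """Build an OpenSSH command that succeeds only if key-based login works."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        target,
        "true",                               # assumption: remote shell provides `true`
    ]


def can_login_without_password(host: str, user: Optional[str] = None) -> bool:
    """Return True if the SSH command exits with status 0."""
    cmd = build_ssh_check_command(host, user)
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "worker1.example.com" and "semmle" are placeholder values
    print(can_login_without_password("worker1.example.com", user="semmle"))
```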

VCS servers

The master server must be able to request and receive data from the VCS server. This includes details of all revisions for each project with metadata such as commit date/times and the names of contributors. In particular, it requires the repository to understand the concept of a 'revision'—that is, a snapshot of the code at a specific point in time. For centralized VCS (for example, Subversion, Perforce, and Team Foundation Server), the worker nodes must also be able to request and receive data from the VCS server.

Workers

The worker nodes (that is, servers that run one or more worker processes) must be able to build the projects that are configured for analysis. This means that any build tools, dependencies or other prerequisites must be installed on each worker node.

Team Insight can be configured to build old or "historic" versions of each project. Since the tools, dependencies and prerequisites needed to build a project often change over time, it is essential to ensure that everything required by those older versions is also available on the worker nodes. If the tools or dependencies required by a historic build cannot be installed on a worker node, that build will fail and be excluded from analysis.

Data storage requirements

The data storage requirements for the business intelligence server, Insight server and worker nodes are fairly small, although the worker nodes do need enough space to allow each worker process you run on that node to obtain a full copy of the source code, build it and save the results of analysis. In practice, each worker node typically needs a minimum of disk space equivalent to about 10 times the size of the largest project configured for analysis, plus 6 GB, multiplied by the number of worker processes that you intend to run on that server. 
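
The rule of thumb above can be turned into a small calculation. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes the guideline is read as (10 × largest project size + 6 GB) per worker process, multiplied by the number of worker processes on the node.

```python
def worker_node_disk_gb(largest_project_gb: float,
                        worker_processes: int,
                        project_multiplier: float = 10.0,
                        fixed_overhead_gb: float = 6.0) -> float:
    """Estimate the minimum disk space (in GB) for one worker node."""
    if largest_project_gb < 0:
        raise ValueError("largest_project_gb must not be negative")
    if worker_processes < 1:
        raise ValueError("worker_processes must be at least 1")
    # Space needed by a single worker process: source copy, build and results
    per_process_gb = project_multiplier * largest_project_gb + fixed_overhead_gb
    return per_process_gb * worker_processes


if __name__ == "__main__":
    # Largest project is 2 GB and the node runs 4 worker processes
    print(worker_node_disk_gb(2.0, 4))
```

For example, with a largest project of 2 GB and 4 worker processes, this gives (10 × 2 + 6) × 4 = 104 GB.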

The data storage requirements for the Team Insight master server are much higher than for the other servers because this is where the main data is stored. The majority of the data storage space on the master server is used to store built and archived snapshots. The Semmle Core installation as a whole can be stored anywhere on the file system, including on network-attached storage. Parts of the installation may also be stored on different storage as long as the installation appears to be standard at the file system level (see Installing Semmle Core for details of the standard directory structure). If the Semmle SQL database is hosted on the Insight server, then this typically accounts for less than 1% of the storage requirements.

For each project configured for Team Insight, every revision of the source code added to the VCS from the chosen starting point must be analyzed and the data stored. Consequently, the data stored by Team Insight grows every day as new revisions of the code are analyzed. The master server saves one archived snapshot of each revision of every project, so the storage requirement is roughly the size of an archived snapshot multiplied by the number of revisions, summed over all projects. This can often amount to several terabytes, depending on the size of the projects and the length of their history. The analysis results also take up some space, but usually significantly less than the archived snapshots.
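
As a rough planning aid, the snapshot storage estimate described above can be sketched as follows. The class, function and project names are hypothetical, and the sizes are made-up illustrative values. The sketch assumes a constant archived snapshot size per project and ignores the smaller amount of space taken by the results.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProjectHistory:
    name: str
    archived_snapshot_gb: float  # typical size of one archived snapshot
    revisions: int               # revisions from the chosen starting point


def master_snapshot_storage_gb(projects: Iterable[ProjectHistory]) -> float:
    """Sum (archived snapshot size x number of revisions) over all projects."""
    return sum(p.archived_snapshot_gb * p.revisions for p in projects)


if __name__ == "__main__":
    projects = [
        ProjectHistory("app", archived_snapshot_gb=0.5, revisions=4000),
        ProjectHistory("lib", archived_snapshot_gb=0.25, revisions=1200),
    ]
    print(master_snapshot_storage_gb(projects))  # estimate in GB
```

Because new revisions arrive every day, an estimate like this is a lower bound at a point in time and should be recomputed with the expected revision count when sizing storage for future growth.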

Visualization requirements

The two visualization applications actively supported are Tableau and QlikView. Because the data is stored in a standard SQL database, it can also be used for reporting and visualization by many other applications. Please refer to the product documentation for details of the server and database requirements for your chosen application.

Server maintenance and backup

The standard server maintenance and file system backup responsibilities are as follows:

Server | Maintenance | Backup
Master server

Maintenance, customer:

  • Update the VCS client software as necessary

Maintenance, Semmle:

  • Update the Semmle Core software as necessary (see note 2)

Backup, customer:

  • Semmle Core
  • Semmle configuration and any associated scripts
  • All snapshot analysis and results
Worker node(s)

Maintenance, customer:

  • Update the build prerequisites as necessary
  • For centralized systems, update the VCS client software as necessary

Backup, customer:

  • Any custom dependencies and set-up
Insight server

Maintenance, Semmle:

  • Update the Semmle Core software as necessary (see note 2)

Backup, customer:

  • Semmle Core
  • Semmle configuration and any associated scripts
  • Semmle data repository (if hosted here)
Visualization server

Maintenance, customer or Semmle, as appropriate (see note 1):

  • Update the visualization software as necessary

Backup, customer:

  • Visualization data/workbooks etc.

1 Where an existing visualization application is used to serve Team Insight data, the customer normally takes responsibility for maintaining the software. If Semmle installs a new visualization application, then Semmle is normally responsible for its maintenance. Visualization software typically needs to be updated less frequently, unless there are specific reasons to upgrade to a new version of the software.

2 Semmle software is usually updated on a regular schedule agreed with the customer.

Minimum backup requirements

The data stored in the DBMS and on the business intelligence server is derived, so it is not essential to back up all analysis and results, provided that, in the event of a server failure, you are prepared to spend the time and resources needed to regenerate the data. The minimum backup requirements are as follows:

Backup | Server | Restoration required to enable
Visualization data/workbooks | Visualization | Users to view existing data and workbooks
Semmle SQL database | Insight | Users to view existing data and workbooks; Insight server to process new data received from the master server
Semmle Core, configuration and any associated scripts | Insight | Insight server to process new data received from the master server
Semmle Core, configuration and any associated scripts | Master | Analysis of new versions of code
Configuration and build environment | Workers | Analysis of new versions of code
Semmle snapshot results data | Master | Efficient analysis of new versions of code. If not backed up and restored, some old versions will need to be recreated to allow tracking of violations across versions
Semmle snapshot analysis data | Master | Option of analyzing old versions with a new rule when you upgrade Semmle Core

What next?

The following topics have more information about getting started with Team Insight: