This topic describes the infrastructure required to set up an installation of Semmle Team Insight.
Semmle Team Insight consists of three main functional areas:
- Semmle Team Insight data processing—a master server connected to one or more worker node machines that are used to collect and analyze revision data.
- Analysis results storage—receives data and imports it into an SQL database.
- Visualization—runs the chosen business intelligence tool, for example: Tableau or QlikView.
The servers shown above perform the following actions:
|Version Control System |
|Business intelligence server|
Ideally, all the servers should be physically co-located within a single data center to ensure good connectivity and efficient data transfer. Team Insight places no other restrictions on the physical location of these servers, so you can organize them as required within your network. The Insight server, DBMS and business intelligence server are often located on a single machine.
Team Insight is based on Semmle Core technology (see System requirements for general requirements). Note that Team Insight requires a standard installation of Semmle Core, that is, one where the toolchain is co-located with the projects, dashboards and licenses directories.
The master server, worker nodes, Insight server and the server hosting the DBMS require passwordless public/private key connections to enable data to be passed as shown in the diagram above. This can be configured using either SSH (OpenSSH or Tectia SSH preferred) or HTTPS.
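One way to confirm that passwordless key-based SSH works between two of these servers is to attempt a non-interactive connection. The following is a minimal Python sketch and not part of Team Insight: the helper names and the host name in the usage note are hypothetical, and it assumes that the OpenSSH `ssh` client is on the PATH and that the remote host is Unix-like. The OpenSSH option `BatchMode=yes` makes the client fail rather than prompt for a password.

```python
import subprocess


def build_ssh_check_command(host, timeout_seconds=10):
    """Return an ssh command line that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never prompt for a password or passphrase
        "-o", f"ConnectTimeout={timeout_seconds}",  # give up quickly if the host is unreachable
        host,
        "true",                                     # trivial remote command (assumes a Unix-like host)
    ]


def has_passwordless_ssh(host, timeout_seconds=10):
    """Return True if key-based login to `host` succeeds without interaction."""
    try:
        result = subprocess.run(
            build_ssh_check_command(host, timeout_seconds),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The ssh client is not installed on this machine.
        return False
    return result.returncode == 0
```

For example, calling `has_passwordless_ssh("worker1.example.com")` on the master server would return `False` if that (hypothetical) worker node still asks for a password. Tectia SSH uses a different client, so the options shown here may not apply to it.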
|Server||Software required to be installed||Other requirements|
|Business intelligence server|
The master server must be able to request and receive data from the VCS server. This includes details of all revisions for each project, with metadata such as commit dates and times and the names of contributors. In particular, the repository must support the concept of a 'revision', that is, a snapshot of the code at a specific point in time. For a centralized VCS (for example, Subversion, Perforce, or Team Foundation Server), the worker nodes must also be able to request and receive data from the VCS server.
The worker nodes (that is, servers that run one or more worker processes) must be able to build the projects that are configured for analysis. This means that any build tools, dependencies or other prerequisites must be installed on each worker node.
Team Insight can be configured to build old or "historic" versions of each project. Because the tools, dependencies and prerequisites needed to build a project often change over time, all the elements required by each historic version must be available on the worker nodes. If the tools or dependencies required by a historic build cannot be installed on a worker node, that build will fail and be excluded from analysis.
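A quick check that the expected build tools are present on a worker node can catch some of these failures before analysis starts. The sketch below is illustrative only: the function name and the tool list are hypothetical, and it checks only that each tool can be found on the PATH, not that the installed version matches what a historic build needs.

```python
import shutil


def missing_build_tools(required_tools):
    """Return the tools from `required_tools` that are not found on the PATH."""
    return [tool for tool in required_tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical tool list for one project; replace with what your builds need.
    required = ["git", "mvn", "java"]
    missing = missing_build_tools(required)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All listed build tools were found on the PATH.")
```

Running a script like this on each worker node gives a first indication of whether the node can build the configured projects. A full test build remains the only reliable check.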
Data storage requirements
The data storage requirements for the business intelligence server, Insight server and worker nodes are fairly small. However, each worker node needs enough space for every worker process that runs on it to obtain a full copy of the source code, build it and save the results of analysis. In practice, each worker process typically needs disk space of at least about 10 times the size of the largest project configured for analysis, plus 6 GB. Multiply this figure by the number of worker processes that you intend to run on a node to obtain the minimum for that node.
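The rule of thumb for worker nodes can be written as a small calculation. This is a sketch under one assumption: that the 6 GB allowance applies to each worker process, so the estimate is (10 × largest project size + 6 GB) × number of worker processes. The function and parameter names are illustrative and do not come from Team Insight.

```python
def worker_node_disk_gb(largest_project_gb, worker_processes,
                        multiplier=10.0, overhead_gb=6.0):
    """Estimate the minimum disk space (in GB) for one worker node.

    Assumes each worker process needs `multiplier` times the size of the
    largest project, plus `overhead_gb`.
    """
    if largest_project_gb < 0 or worker_processes < 1:
        raise ValueError("project size must be >= 0 and worker_processes >= 1")
    per_process_gb = multiplier * largest_project_gb + overhead_gb
    return per_process_gb * worker_processes


# Hypothetical example: largest project is 2 GB, node runs 4 worker processes.
print(worker_node_disk_gb(2, 4))  # (10 * 2 + 6) * 4 = 104.0
```

Treat the result as a lower bound, because the text above describes it as a typical minimum.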
The data storage requirements for the Team Insight master server are much higher than for the other servers because this is where the main data is stored. The majority of the data storage space on the master server is used to store built and archived snapshots. The Semmle Core installation as a whole can be stored anywhere on the file system, including on network-attached storage. Parts of the installation may also be stored on different storage as long as the installation appears to be standard at the file system level (see Installing Semmle Core for details of the standard directory structure). If the Semmle SQL database is hosted on the Insight server, then this typically accounts for less than 1% of the storage requirements.
For each project configured for Team Insight, every revision of the source code added to the VCS from the chosen starting point must be analyzed and the data stored. Consequently, the data stored by Team Insight grows every day as new revisions of the code are analyzed. The master server saves one archived snapshot of each revision of every project, so the storage requirements are roughly the size of an archived snapshot multiplied by the number of revisions, summed over all projects. This can often amount to several terabytes, depending on the size of the projects and the length of their history. The analysis results also take up some space, but usually significantly less than the archived snapshots.
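The snapshot storage estimate for the master server can be sketched in the same way. The function name and the project figures below are hypothetical, chosen only to show the arithmetic. The estimate covers archived snapshots only and leaves out the smaller amount of space taken by results.

```python
def master_snapshot_storage_gb(projects):
    """Estimate archived-snapshot storage (in GB) on the master server.

    `projects` is an iterable of (archived_snapshot_gb, revision_count) pairs,
    one pair per project configured for analysis.
    """
    return sum(snapshot_gb * revisions for snapshot_gb, revisions in projects)


# Hypothetical figures: two projects with different snapshot sizes and histories.
projects = [
    (0.5, 4000),   # 0.5 GB per archived snapshot, 4000 revisions
    (0.25, 1000),  # 0.25 GB per archived snapshot, 1000 revisions
]
print(master_snapshot_storage_gb(projects))  # 2000 + 250 = 2250.0 GB
```

Because new revisions are analyzed every day, it may help to rerun the estimate with the revision counts expected over the planned lifetime of the installation.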
The two visualization applications actively supported are Tableau and QlikView. Because the data is stored in a standard SQL database, it can also be used for reporting and visualization by many other applications. Please refer to the product documentation for details of the server and database requirements for your chosen application.
Server maintenance and backup
The standard server maintenance and file system backup responsibilities are as follows:
Customer or Semmle, as appropriate (see note 1):
1 Where an existing visualization application is used to serve Team Insight data, the customer normally takes responsibility for the maintenance of the software. If Semmle installs a new visualization application, then Semmle is normally responsible for its maintenance. Visualization software typically needs to be updated less frequently, unless there are specific reasons to upgrade to a new version of the software.
2 Semmle software is usually updated on a regular schedule agreed with the customer.
Minimum backup requirements
The data stored in the DBMS and on the business intelligence server is derived, so it is not essential to back up all analysis and results, provided that, in the event of a server failure, you are prepared to spend the time and resources needed to regenerate the data. The minimum backup requirements are as follows:
|Backup||Server||Restoration required to enable|
|Users to view existing data and workbooks|
|Semmle SQL database|| ||Users to view existing data and workbooks; Insight server to process new data received from the master server|
|Semmle Core, configuration and any associated scripts||Insight||Insight server to process new data received from the master server|
|Semmle Core, configuration and any associated scripts||Master||Analysis of new versions of code|
|Configuration and build environment||Workers||Analysis of new versions of code|
|Semmle snapshot results data||Master server||Efficient analysis of new versions of code—if not backed up and restored then some old versions will need to be recreated to allow tracking of violations across versions|
|Semmle snapshot analysis data||Master server||Option of analyzing old versions with a new rule when you upgrade Semmle Core|
The following topics have more information about getting started with Team Insight:
- Planning for scalability: Team Insight—an overview of how to ensure that your deployment of Team Insight is scalable
- Introduction to Team Insight administration—links to further topics on configuring and administering Team Insight
- Configuring a local setup of data collection—a step-by-step guide to configuring a simple, single-machine setup of Team Insight data collection for a simple, open-source project
- Using insightConfig to configure the master server—a step-by-step guide to setting up data collection and analysis, connecting between master and workers over HTTPS