The work pool consists of one or more machines that run LGTM daemons, referred to as "workers." These workers perform the "heavy lifting" involved in building each revision of a project and running analysis on it. In addition, a few workers are set aside to perform tasks for which a user is likely to be waiting for the results. Setting aside workers specifically for these tasks minimizes the delay in returning results to users without disrupting the ongoing analysis.
You can see which workers are currently being used—or are available for use—by LGTM by going to the
In the example shown above there are 5 workers, running on 3 worker host machines (with the hostnames:
A worker's type is defined in the cluster configuration file. There are three types of worker:
- General workers perform all types of jobs apart from query jobs. They spend most of their time building and analyzing code.
- Query workers perform only query jobs, which are created to run queries submitted by users in the LGTM query console. Dedicating one or more workers to these jobs minimizes the delay before query results are reported to users in the web application.
- On-demand workers (defined in the configuration using the term on_demand) perform tasks for which a user is likely to be waiting for the results, apart from query jobs. This includes analyzing pull requests (code review) and ensuring that alerts for every project are updated promptly after the external repository is polled.
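The division of labor between the three worker types can be sketched as a small routing function. This is an illustration of the rules described above, not LGTM's scheduler: the job kind names (`pull_request_analysis`, `alert_update`, `history_build`) and the function `eligible_worker_types` are hypothetical and do not come from the LGTM configuration or API.

```python
from __future__ import annotations

from enum import Enum


class WorkerType(Enum):
    GENERAL = "general"
    QUERY = "query"
    ON_DEMAND = "on_demand"


# Hypothetical job kinds, named here only to mirror the prose above.
USER_FACING_JOBS = {"pull_request_analysis", "alert_update"}


def eligible_worker_types(job_kind: str) -> set[WorkerType]:
    """Return the worker types that may run a job of the given kind."""
    if job_kind == "query":
        # Query jobs are handled by query workers only.
        return {WorkerType.QUERY}
    if job_kind in USER_FACING_JOBS:
        # On-demand workers are reserved for these jobs. General workers
        # can also run them, because they run everything except query jobs.
        return {WorkerType.ON_DEMAND, WorkerType.GENERAL}
    # All remaining jobs (for example, building historical commits)
    # are left to general workers.
    return {WorkerType.GENERAL}
```

Under these assumptions, a pool with no on_demand workers can still analyze pull requests, whereas a pool with no general workers cannot build project history.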
Balancing worker types
The default, basic installation creates one general and one query worker. This is the minimum number of workers for LGTM to operate properly. It is enough to do some basic testing of the system.
When you extend the work pool to include more machines and workers, most of the new workers should be general workers. Building and analyzing code is an essential and constant task because there are always new commits to analyze, and typically a backlog of history to analyze in quieter periods.
Many systems run smoothly without any on_demand workers. If users find LGTM is slow to analyze pull requests (code review) or to update project alerts, adding on_demand workers should improve the response time. Note that you should not replace all your general workers with on_demand workers, otherwise analysis will be limited: information will be missing from the project overview page, the My alerts page, and the details of when an alert was introduced.
In addition to these workers, you should always have at least one query worker, otherwise queries submitted through the query console will not be run. If you have many query console users, you should increase the number of query workers.
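The balancing guidance above can be turned into a quick sanity check before deploying a changed work pool. The sketch below is a hypothetical helper, not part of LGTM: it takes a plain list of worker type names (one entry per worker) rather than reading the real cluster configuration file, whose format is not shown here.

```python
from __future__ import annotations

from collections import Counter


def check_work_pool(worker_types: list[str]) -> list[str]:
    """Return warnings for a planned pool, given one type name per worker."""
    counts = Counter(worker_types)
    warnings = []
    if counts["general"] == 0:
        # Without general workers, analysis is limited and information
        # goes missing from project pages.
        warnings.append("no general workers: analysis will be limited")
    if counts["query"] == 0:
        # Without a query worker, query console queries are never run.
        warnings.append("no query workers: query console queries will not run")
    return warnings


# The default installation (one general, one query worker) passes the check.
print(check_work_pool(["general", "query"]))
# A pool made up only of on_demand workers triggers both warnings.
print(check_work_pool(["on_demand", "on_demand"]))
```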
If your organization has a virtual infrastructure that supports CPU-based autoscaling of virtual machines, for example Amazon EC2, you may want to create virtual machine images for on_demand workers from a deployed instance of each, and configure them to autoscale accordingly. Contact Semmle support if you'd like help setting this up.
Note that LGTM Enterprise schedules builds of historical commits in proportion to the number of general workers available. This means that if you use CPU-based autoscaling for this worker type, you must set an upper limit on the number of workers to run, as all general workers are used at full capacity until the entire history of all projects has been analyzed.
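To see why the upper limit matters, consider a deliberately simplified model of a CPU-based autoscaler. Everything in this sketch is an assumption made for illustration: the 0.8 scale-up threshold, the one-worker-per-step policy, and the function names are hypothetical and do not describe Amazon EC2 or LGTM behavior.

```python
from __future__ import annotations

from typing import Optional


def next_worker_count(current: int, cpu_utilization: float,
                      scale_up_threshold: float = 0.8,
                      max_workers: Optional[int] = None) -> int:
    """Add one worker when CPU is above the threshold, up to an optional cap."""
    target = current + 1 if cpu_utilization > scale_up_threshold else current
    if max_workers is not None:
        target = min(target, max_workers)
    return target


def simulate(steps: int, start: int, max_workers: Optional[int] = None) -> int:
    """Run the autoscaler while a backlog of history remains to be analyzed."""
    count = start
    for _ in range(steps):
        # While history remains, every general worker is fully loaded,
        # so the autoscaler always sees utilization at 100%.
        count = next_worker_count(count, cpu_utilization=1.0,
                                  max_workers=max_workers)
    return count


print(simulate(steps=10, start=2))                 # grows on every step
print(simulate(steps=10, start=2, max_workers=5))  # stops at the cap
```

In this model the uncapped pool keeps growing for as long as the backlog lasts, because adding general workers only causes more historical builds to be scheduled.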
Managing workers and their host machines
If a machine in the work pool has adequate hardware resources, it may run multiple workers. Periodically, depending on the projects you're analyzing and the stage of the analysis, you may need to change the number of worker hosts or the number of workers that are running on some of the host machines. You do this by editing the workers block in the cluster configuration file, generating new sets of files for the configuration, and deploying these on each server in the cluster.
Configuring the properties of worker hosts
Each worker host needs to have the right software and the correct environment to enable checkout and analysis of the codebases you want to analyze. If you need to set environment variables, defining them in the cluster configuration file is the most reliable method. This ensures that the values are retained throughout upgrades and changes to the work pool.
If some of the host machines in the work pool are configured for analysis of a specific language or group of projects, you can assign these machines labels. You can then use these to ensure that certain jobs are assigned to workers running on a suitable worker host.
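The idea behind labels can be sketched as a simple matching rule. The semantics here are an assumption made for illustration, namely that a job may run on a host only if the host carries every label the job requires. The host names, label names, and the function `suitable_hosts` are hypothetical and are not taken from the LGTM configuration.

```python
from __future__ import annotations


def suitable_hosts(job_labels: set[str],
                   hosts: dict[str, set[str]]) -> list[str]:
    """Return the names of hosts whose labels cover all labels the job needs."""
    return sorted(name for name, labels in hosts.items()
                  if job_labels <= labels)  # subset test: all labels present


# Hypothetical work pool: host name -> labels assigned to that host.
hosts = {
    "worker-a": {"java"},
    "worker-b": {"java", "cpp"},
    "worker-c": set(),
}

# A job that needs a C/C++ toolchain is restricted to the labeled host.
print(suitable_hosts({"cpp"}, hosts))
# A job with no label requirements can run on any host.
print(suitable_hosts(set(), hosts))
```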
To find out more, see: