LGTM Enterprise 1.20.2

Managing the work pool

The work pool consists of one or more machines that run LGTM daemons, referred to as "workers." These workers perform the "heavy lifting" involved in building each revision of a project and running analysis on it. In addition, a few workers are set aside to perform tasks for which a user is likely to be waiting for the results. Setting aside workers specifically for these tasks minimizes the delay in returning results to users without disrupting the ongoing analysis.

You can see which workers LGTM is currently using, or has available for use, by going to the Infrastructure page and clicking the Worker management tab.

For example, the tab might list 5 workers running on 3 worker host machines (with the hostnames workerhost1-windows, workerhost2-linux, and workerhost3-linux).

Worker types

A worker's type is defined in the cluster configuration file. There are three types of worker:

  1. General workers perform all types of jobs apart from query jobs. They spend most of their time building and analyzing code.
  2. Query workers perform only query jobs, which are created to run queries submitted by users in the LGTM query console. Dedicating one or more workers to these jobs keeps the delay before query results appear in the web application as short as possible.
  3. On-demand workers (defined in the configuration using the term on_demand) perform tasks for which a user is likely to be waiting for the results, apart from query jobs. This includes analyzing pull requests (code review), and ensuring that alerts for every project are updated promptly after the external repository is polled.

For information about defining workers, see Adding and removing workers. For information about the type of work carried out by workers, see Job types.
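
The division of labor between the three worker types can be summarized as a small routing table. The following Python sketch is illustrative only: LGTM performs this routing itself, and the job-kind names used here (`query`, `pull_request`, `alert_update`, `build`) are hypothetical labels chosen for the example, not LGTM's internal job types.

```python
from enum import Enum
from typing import Set


class WorkerType(Enum):
    GENERAL = "general"
    QUERY = "query"
    ON_DEMAND = "on_demand"


# Hypothetical job kinds, named for this example only.
QUERY_JOBS = {"query"}
USER_FACING_JOBS = {"pull_request", "alert_update"}


def eligible_worker_types(job_kind: str) -> Set[WorkerType]:
    """Return the worker types that may run a job of the given kind."""
    if job_kind in QUERY_JOBS:
        # Query jobs run only on query workers.
        return {WorkerType.QUERY}
    if job_kind in USER_FACING_JOBS:
        # On-demand workers are set aside for these jobs; general workers
        # can also take them, because they run everything except query jobs.
        return {WorkerType.ON_DEMAND, WorkerType.GENERAL}
    # Everything else (for example, builds and analysis) goes to general workers.
    return {WorkerType.GENERAL}
```

Under these assumptions, a query job is only ever eligible for a query worker, which is why a pool without query workers cannot serve the query console.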

Balancing worker types

The default, basic installation creates one general and one query worker. This is the minimum number of workers for LGTM to operate properly. It is enough to do some basic testing of the system.

When you extend the work pool to include more machines and workers, most of the new workers should be general or on_demand workers. Building and analyzing code is an essential and constant task because there are always new commits to analyze, and typically a backlog of history to analyze in quieter periods.

Many systems run smoothly without any on_demand workers. If users find LGTM is slow to analyze pull requests (code review) or to update project alerts, adding on_demand workers should improve the response time. However, do not replace all your general workers with on_demand workers. If you do, analysis will be limited: information will be missing from the project overview page, the My alerts page, and the details of when an alert was introduced.

In addition to these workers, you should always have at least one query worker. Without one, queries submitted through the query console will not be run. If you have many query console users, you should increase the number of query workers.
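
As a rough illustration, the balancing rules above can be expressed as a sanity check over the planned number of workers of each type. This is a minimal sketch, not part of LGTM: the function name `check_pool` and the shape of its input (a mapping from worker type to count) are assumptions made for the example.

```python
from typing import Dict, List


def check_pool(counts: Dict[str, int]) -> List[str]:
    """Return warnings for a planned pool given as {worker_type: count}."""
    warnings = []
    if counts.get("general", 0) < 1:
        # Without general workers, analysis is limited even if
        # on_demand workers are present.
        warnings.append("no general workers: analysis will be limited")
    if counts.get("query", 0) < 1:
        warnings.append("no query workers: query console queries will not run")
    return warnings


# The default installation (one general, one query worker) passes the check.
print(check_pool({"general": 1, "query": 1}))
# A pool made up only of on_demand workers triggers both warnings.
print(check_pool({"on_demand": 3}))
```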

If your organization has a virtual infrastructure that supports CPU-based autoscaling of virtual machines, for example Amazon EC2, you may want to create virtual machine images for query and on_demand workers from a deployed instance of each, and configure them to autoscale accordingly. Contact Semmle support if you'd like help setting this up.

Note that LGTM Enterprise schedules builds of historical commits in proportion to the number of general workers available. This means that if you use CPU-based autoscaling for this worker type, you must set an upper limit on the number of workers to run, as all general workers are used at full capacity until the entire history of all projects has been analyzed.
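
To see why the upper limit matters, consider a simplified scaling rule that adds a worker whenever CPU utilization is high. The sketch below is an assumption-laden illustration, not how any particular autoscaler works: the function name, the 0.8 threshold, and the default limit of 8 are all hypothetical.

```python
def target_general_workers(current: int,
                           cpu_utilization: float,
                           scale_up_threshold: float = 0.8,
                           max_workers: int = 8) -> int:
    """Return the worker count a simple CPU-based rule would request next."""
    if cpu_utilization >= scale_up_threshold:
        # Add one worker, but never go beyond the configured upper limit.
        return min(current + 1, max_workers)
    return current


# While history is being analyzed, general workers run at full capacity,
# so the rule asks for more workers at every step until it reaches the limit.
workers = 2
for _ in range(20):
    workers = target_general_workers(workers, cpu_utilization=1.0, max_workers=6)
print(workers)
```

Without the `max_workers` clamp, the loop above would keep growing the pool for as long as unanalyzed history remains.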

Managing workers and their host machines

The General infrastructure tab on the Infrastructure page gives you a high-level view of how many workers are registered and what they are currently doing. You can see more details about the workers on the Worker management tab. From there, click the arrow button for a worker to show a list of recent jobs handled by that worker, including the duration of each job and its status. From the worker detail page, click the arrow next to a job to see the log for that job.

If a machine in the work pool has adequate hardware resources, it may run multiple workers. Periodically, depending on the projects you're analyzing, and the stage of the analysis, you may need to change the number of worker hosts or the number of workers that are running on some of the host machines. You do this by editing the workers block in the cluster configuration file, generating new service configurations, and deploying these to all the servers in the cluster.

For more information, see Monitoring usage and analysis progress and Adding and removing workers.

Configuring the properties of worker hosts

Each worker host needs to have the right software and the correct environment to enable checkout and analysis of the code bases you want to analyze. If you need to set environment variables, defining them in the cluster configuration file is the most reliable method. This ensures that the values are retained throughout upgrades and changes to the work pool.

If some of the host machines in the work pool are configured for analysis of a specific language or group of projects, you can assign these machines labels. You can then use these to ensure that certain jobs are assigned to workers running on a suitable worker host.
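
The idea behind labels can be sketched as a set-matching problem. The example below is only an illustration: the label names, the function `hosts_for_job`, and the rule that a host must carry every label a job requires are assumptions made here, not a description of LGTM's actual matching logic.

```python
from typing import Dict, List, Set


def hosts_for_job(required_labels: Set[str],
                  host_labels: Dict[str, Set[str]]) -> List[str]:
    """Return the hosts whose labels include every label the job requires."""
    return sorted(
        host for host, labels in host_labels.items()
        if required_labels <= labels  # subset test: all required labels present
    )


# Hypothetical label assignments for three example worker hosts.
host_labels = {
    "workerhost1-windows": {"windows", "csharp"},
    "workerhost2-linux": {"linux", "java"},
    "workerhost3-linux": {"linux"},
}

print(hosts_for_job({"linux"}, host_labels))
print(hosts_for_job({"linux", "java"}, host_labels))
```

Under this rule, a job with no required labels matches every host, because the empty set is a subset of any label set.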

More information

To find out more, see: