Importing custom data into Enterprise Insight

This topic describes how to import custom data into Enterprise Insight using the insightClient (SSH services) or insightHttpClient (HTTPS services) commands.

Prerequisites

This topic assumes that you have already created a database, defined the basic database configuration and set up an Insight server service using either SSH or HTTPS. If you have not already done this, please see the steps in Setting up a Semmle Insight server.

Overview

You can import custom data into Enterprise Insight from a zip archive containing CSV files. Data is imported in three stages:

  1. Data from the CSV files is used to create or overwrite tables in the internal schema of the database (internal.csv_<name> tables). 
  2. This data is merged with the existing data stored in the internal schema of the database (internal.input_<name> tables). 
  3. Data from the internal.input_<name> tables is used to populate tables in the dbo (SQL Server) or public (PostgreSQL) schema for use by the business intelligence tool.

Stage 1 is performed automatically by insightClient (SSH services) or insightHttpClient (HTTPS services) without any additional configuration. For the standard Enterprise Insight data, stages 2 and 3 are performed automatically by the default SQL queries supplied with Enterprise Insight.

For custom data, you will need to write SQL queries for stages 2 and 3:

  • Stage 2 queries need to merge custom data from internal.csv_<name> tables into internal.input_<name> tables. This gives you full control over whether new data overwrites or is appended to existing data.
  • Stage 3 queries need to define how the raw data from the internal.input_<name> tables should be transformed and used to populate tables in the dbo (SQL Server) or public (PostgreSQL) schema.

Data requirements

Before it can be imported into an Insight database, data must be in the following format:

  • Zip file containing one or more CSV files.
  • The first line of each CSV file must be a header containing the column names to use for the data in that file (see the example below).
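
For example, assuming two CSV files in the current directory (the file names here are illustrative), you could build a suitable archive with the zip command:

    Example commands to create a zip archive of CSV files
    # Each CSV file must start with a header row, for example:
    #   "Full_name","Project_Insight_name","Department"
    zip extra-data.zip project_metadata.csv release_dates.csv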

Stage 1: Loading data into the database

When you have generated a zip archive containing the CSV files that you want to load into Enterprise Insight, you can load this data into the database schema using insightClient (SSH services) or insightHttpClient (HTTPS services).

To upload data to an HTTPS Insight server service

If the Insight server service is set up for HTTP or HTTPS connections, then you can communicate with the service using an appropriate client keystore and its associated password (you may also need a separate truststore file).

  1. On the server that hosts the Insight server service, confirm that the service is running by calling it using the insightHttpClient command with the --ping flag. For example:

    Example call to verify that the service is running
    odasa insightHttpClient --ping --url https://localhost:8443/insight-server/upload/ei-server --keystore client.jks --keystore-password password

    where https://localhost:8443/insight-server/upload/ei-server is the URL for the Insight server service, client.jks is the name of the combined key and trust store for the client, and password is the password for this store.

  2. If the command prompt returns without reporting any errors, then the service is running correctly. You are now ready to call the insightHttpClient command with the --load flag and details of the zip archive to upload. For example:

    Example call to upload custom data to the service
    odasa insightHttpClient --load exports/extra-data.zip --url https://localhost:8443/insight-server/upload/ei-server --keystore client.jks --keystore-password password

    where exports/extra-data.zip is the location of the file to upload and the other details are as described above.

  3. If the command prompt returns without reporting any errors, then the file has been successfully transferred to the Insight server service. You can use the database admin console to check that the expected tables and columns have been created in the database.
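
    For example, with a PostgreSQL or SQL Server backend you could run a quick check from your admin console. This information_schema query is a sketch for verification only, not part of the Insight tooling:

    Example query to list the staging tables created by an upload
    select table_name
    from information_schema.tables
    where table_schema = 'internal'
      and table_name like 'csv\_%' escape '\';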


To upload data to a local Insight server service

If your Insight server service is local, then you can communicate with the service using the workspace directory.

  1. Call the insightClient command with the --load flag and details of the zip archive to upload. For example:

    Example call to upload custom data to the service
    odasa insightClient /home/ei-workspace --load exports/extra-data.zip

    where /home/ei-workspace is the workspace directory for the Insight server service, and exports/extra-data.zip is the location of the file to upload.

  2. If the command prompt returns without reporting any errors, this indicates that the file has been successfully transferred to the Insight server service. You can use the database Admin console to check that the expected tables and columns have been created in the database.


To upload data to an SSH Insight server service

If your Insight server service is set up for SSH connections, then you can communicate with the service using the workspace directory.

  1. Call the insightClient command with the --load-stdin flag and details of the zip archive to upload. For example:

    Example call to upload custom data to the service
    # Linux
    cat exports/extra-data.zip | ssh remote-server '/path/to/odasa/tools/odasa insightClient /home/ei-workspace --load-stdin'
    
    # Windows
    cat exports/extra-data.zip | ssh remote-server '/path/to/odasa/tools/odasa.exe insightClient /home/ei-workspace --load-stdin'

    where /home/ei-workspace is the workspace directory for the Insight server service, exports/extra-data.zip is the location of the file to upload and remote-server is the Insight server.

  2. If the command prompt returns without reporting any errors, this indicates that the file has been successfully transferred to the Insight server service. You can use the database Admin console to check that the expected tables and columns have been created in the database.

Automatic processing

Every time the Insight server receives new data, it performs the following operations:

  1. Data is deleted, using a truncate command, from all tables with a csv_ prefix in the internal schema.
  2. For any new <name>.csv files, csv_<name> and input_<name> tables are created:
    • Column names are taken directly from the header row of the CSV file.
    • Any hyphens or spaces in the header row of the CSV file are replaced by underscore characters to give a valid column name (see the example after this list). If the resulting column name duplicates an existing column name then this is reported as an error.
    • All table and column names are converted to lowercase text to reduce compatibility issues between zip files generated using different operating systems.
    • Any table or column names that contain non-standard characters are quoted.
  3. Data is loaded from each <name>.csv file into the associated csv_<name> table. If a zip file contains more than one file with the same name then data from both files is appended to a single table.
  4. All tables in the dbo (SQL Server) or public (PostgreSQL) schema that are defined in the databaseconfig specified by the Insight server configuration file are truncated.
  5. Any SQL queries defined in the datamodel specified by the Insight server configuration file are run.
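
As an illustration of the naming rules in step 2, here is a hypothetical header row and the column names it would produce:

    Example of header name normalization
    -- Hypothetical CSV header row:
    --   "Full-Name","Project Insight name","Department"
    -- Resulting lowercase column names in the csv_<name> table:
    --   full_name, project_insight_name, department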

The intention is that the internal.csv_<name> tables always contain the raw data from the most recently imported zip file. The internal.input_<name> tables act as a staging area for data from the CSV files.

Stage 2: Creating an internal table with aggregated data

When you have loaded data into an internal.csv_<name> table, you need to write a SQL query to define how each new import of data into this table is aggregated into an internal.input_<name> table. The internal.input_<name> table is an internal staging area for data, which is then transformed by additional SQL queries into tables in the dbo (SQL Server) or public (PostgreSQL) schema, for consumption by business intelligence tools.

The SQL query needs to:

  • Update the internal.input_<name> table with data from the internal.csv_<name> table, either replacing, updating or appending to existing data as appropriate (see the sketch after this list).
  • Ensure that all data in this table is ready to be used to populate the public tables, in the dbo (SQL Server) or public (PostgreSQL) schema.
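
For example, here is a minimal sketch of an append-only stage 2 merge. The internal.csv_events and internal.input_events tables and the event_id key column are hypothetical names for this sketch; adapt them, and the matching condition, to your own data:

    Example append-only stage 2 query (illustrative names)
    -- Append only those rows from the latest import that are not
    -- already present in the staging table.
    insert into internal.input_events
      select c.*
      from internal.csv_events c
      where not exists (
        select 1
        from internal.input_events i
        where i.event_id = c.event_id
      );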

When you have written and tested the SQL query, you need to update the Insight server configuration file for the Enterprise Insight service (stored in the workspace directory used by the service) and add a new sql element to call the new SQL query.

To update the Insight server service to create an internal table with aggregated custom data

When you have written and tested the query as described above then you can update the Insight server service to automatically populate internal.input_<name> tables from custom data uploaded to internal.csv_<name> tables.

  1. Edit the Insight server configuration file for the Insight server service (stored in the workspace directory used by the service).

  2. Determine which transaction the new SQL query should be run in and define one new sql element to call the custom query:

    <datamodel> element of the Insight server configuration file
    <datamodel>
      <transaction>
        <sql>custom-queries/populate-custom-input-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/00-update-state.sql</sql>
      </transaction>
      <transaction>
        <sql>semmle-insight-queries/sql/enterprise/default/10-datamodel.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/20-populate.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/50-bi-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/100-cache.sql</sql>
      </transaction>
    </datamodel>

    where the custom SQL queries are stored in a custom-queries subdirectory of the Semmle Core queries directory.
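
    For example, relative to the Semmle Core queries directory (shown here as queries), the layout might look like this; the custom-queries name and its file names are your choice, and the semmle-insight-queries paths are those referenced above:

    Example layout of the queries directory
    queries/
      custom-queries/
        populate-custom-input-tables.sql
      semmle-insight-queries/
        sql/enterprise/default/00-update-state.sql
        ...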

  3. Save the updated configuration file.
  4. Restart the Insight server service.

  5. Upload a test data file to the service and check that the internal.input_<name> tables are populated as expected.

Stage 3: Making custom data available in the public schema

When you have loaded data into internal.input_<name> tables in the database, you need to make this data available to business intelligence tools in tables in the dbo (SQL Server) or public (PostgreSQL) schema. The precise way that you do this depends on your preferences for maintaining SQL databases, how you want to join the custom data to the core data from Semmle analysis, and the hierarchy of the data. Note that the default SQL queries used to set up and maintain the dbo/public tables for the core data all use the following steps:

  1. Create the tables needed in the dbo or public schema.
  2. Populate these tables from the internal.input_<name> tables.
  3. Populate additional tables with metadata.

The simplest way to create keys linking the custom and core data is to create two SQL queries:

  • Query 1 - creates the custom tables.
  • Query 2 - populates the custom tables.

The first query needs to be configured to run before the default SQL queries in the second transaction. The second query needs to run after the majority of the default SQL queries, but before the final 100-cache.sql query. You can test that the SQL queries populate the intended dbo/public tables as expected using your standard database admin tools.

To update the Enterprise Insight service to automatically populate public tables with custom data

When you have written and tested the two custom queries as described above then you can update the Insight server service to automatically populate public tables from custom data uploaded to the service.

  1. Edit the Insight server configuration file for the Insight server service (stored in the workspace directory used by the service).

  2. Define new sql elements with the custom queries:

    <datamodel> element of the Insight server configuration file
    <datamodel>
      <transaction>
        <sql>custom-queries/populate-custom-input-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/00-update-state.sql</sql>
      </transaction>
      <transaction>
        <sql>custom-queries/query1-create-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/10-datamodel.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/20-populate.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/50-bi-tables.sql</sql>
        <sql>custom-queries/query2-populate-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/100-cache.sql</sql>
      </transaction>
    </datamodel>

    where the custom queries are stored in a custom-queries subfolder of the Semmle Core queries directory.

  3. Save the updated configuration file.
  4. Restart the Insight server service.

  5. Upload a test data file to the service and check that the public tables are populated as expected.

Example

You want to import some metadata for the projects in Enterprise Insight, including the departments which are responsible for each project. You do this in three stages.

Stage 1 - import data

First you need to create and import a test file, using the file structure that you will use for the full dataset.

  1. Create a CSV file containing a small subset of the data - making sure that you include the Project Insight names for the projects. For example:

    Project_metadata.csv
    "Full_name","Project_Insight_name","Department"
    "Product X plug-in for Y","X for Y","Department A"
    "Product X main interface","X main","Department B"
    "Product X core functions","X core","Department A"
  2. Use insightClient (SSH services) or insightHttpClient (HTTPS services) to load the data into the Enterprise Insight database. 

When this example file is loaded, a new table internal.csv_project_metadata is created. The table has three columns: full_name, project_insight_name and department. Every time you import data from a CSV file with the same name, the data in the csv_project_metadata table will be overwritten with data from the new CSV file.
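
For example, you could confirm the contents of the new table from your database admin console:

    Example query to check the imported data
    select full_name, project_insight_name, department
    from internal.csv_project_metadata;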

Stage 2 - create aggregated internal tables

Next you need to create an internal.input_project_metadata table to hold the data aggregated from all data imported into the internal.csv_project_metadata table.

  1. Write and save an SQL query to populate the new, automatically created internal.input_project_metadata table with data from the internal.csv_project_metadata table. For example:

    custom-input-tables.sql
    delete from internal.input_project_metadata;
    
    insert into internal.input_project_metadata
      select * from internal.csv_project_metadata;

    In this example, the imported CSV file will always contain the latest metadata for all projects. Consequently, this query deletes any existing content from the internal.input_project_metadata table and then populates it with the new data.

  2. Run the new query on the database and verify that it populates the internal.input_project_metadata table with the expected data (in this example the contents will match the internal.csv_project_metadata table).
  3. Update the Insight server configuration file for the Insight server service (stored in the workspace directory used by the service) and add a new sql element to call the new SQL query:

    <datamodel> element of the Insight server configuration file
    <datamodel>
      <transaction>
        <sql>custom-queries/custom-input-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/00-update-state.sql</sql>
      </transaction>
      <transaction>
        <sql>semmle-insight-queries/sql/enterprise/default/10-datamodel.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/20-populate.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/50-bi-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/100-cache.sql</sql>
      </transaction>
    </datamodel>

Stage 3 - create dbo or public tables

Finally, to make the new data available to business intelligence tools you need to populate dbo (SQL Server) or public (PostgreSQL) schema tables with the metadata.

  1. Write and save an SQL query to create a new table and, if it already exists, delete any existing content. For example:

    query1-create-tables.sql
    create table project_metadata (
      id serial primary key,
      project bigint references projects(id),
      full_name varchar(300),
      department varchar(300),
      unique (project)
    );

    This example can be updated to be compatible with SQL Server, in addition to PostgreSQL, by changing Line 2 to: id ${serial} primary key,
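
    The portable version of the query would then read:

    query1-create-tables.sql (portable version)
    create table project_metadata (
      id ${serial} primary key,
      project bigint references projects(id),
      full_name varchar(300),
      department varchar(300),
      unique (project)
    );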

  2. Write and save an SQL query to populate columns in the new table and link the metadata to the existing projects table, using the custom project_insight_name column. For example:

    query2-populate-tables.sql
    insert into project_metadata (project, full_name, department)
      select distinct p.id, m.full_name, m.department
      from internal.input_project_metadata m
      inner join projects p on m.project_insight_name = p.name;
  3. Update the Insight server configuration file for the Insight server service to include sql elements for these two new queries:

    <datamodel> element of the Insight server configuration file
    <datamodel>
      <transaction>
        <sql>custom-queries/custom-input-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/00-update-state.sql</sql>
      </transaction>
      <transaction>
        <sql>custom-queries/query1-create-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/10-datamodel.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/20-populate.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/default/50-bi-tables.sql</sql>
        <sql>custom-queries/query2-populate-tables.sql</sql>
        <sql>semmle-insight-queries/sql/enterprise/100-cache.sql</sql>
      </transaction>
    </datamodel>
  4. Restart the Insight server service.
  5. Upload the test CSV file using insightClient (SSH services) or insightHttpClient (HTTPS services).
  6. Review the data stored in the Enterprise Insight database. For example, the results of running select * from project_metadata might be:

    id    project    full_name                    department
    1     20         Product X plug-in for Y      Department A
    2     5          Product X core functions     Department A
    3     31         Product X main interface     Department B

    where the project column shows the id of the associated project in the projects table, linked by a foreign key.

If you are satisfied that the custom import is correctly configured, then you can upload the full data file by calling insightClient (SSH services) or insightHttpClient (HTTPS services) with the --load flag.
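
For example, for an HTTPS service (the file name exports/full-data.zip is illustrative; the other options are as described above):

    Example call to upload the full data file
    odasa insightHttpClient --load exports/full-data.zip --url https://localhost:8443/insight-server/upload/ei-server --keystore client.jks --keystore-password password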
