Data Theorem On-Prem Scanner

Data Theorem allows you to run our SAST source code analyzer directly in your environment and on your hardware. This gives you full control over the scanning infrastructure: the scanning machine can be your own on-prem hardware or a CI runner (for example, from GitHub, Bitbucket, GitLab, or Azure DevOps).

Data Theorem’s on-prem scanner allows you to leverage Data Theorem’s SAST scanning without sending any source code off-site. The security scan results will still be uploaded to the Data Theorem portal. However, this approach comes with one important limitation:

  • Data Theorem’s SAST analyzer won’t be able to post source code annotations directly in the GitHub / Bitbucket / GitLab UI. Security scan results will only be available within the Data Theorem portal or via our Security Scan Results API.

If you prefer not to be limited by the above, we recommend using our dedicated GitHub / Bitbucket / GitLab integrations, which are built around Data Theorem’s Cloud infrastructure and provide the most polished developer experience (see onboarding instructions at DevSecOps > SAST Code Analysis).

Requirements

  • The machine running the scanner must have docker installed

  • The machine running the scanner must have internet access

  • Here are our base spec recommendations for running the on-prem scanner:

Repository Size | CPUs    | RAM   | Disk Size (SSD)
----------------|---------|-------|----------------
0-5 GB          | 4 CPUs  | 8 GB  | 16 GB
5-10 GB         | 8 CPUs  | 16 GB | 32 GB
10-20 GB        | 16 CPUs | 32 GB | 64 GB

Note: Scan time is relative to the repository size so the specs that fit your needs may vary based on the size of your repository.
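As a rough illustration of how the recommendations above could be applied, the sketch below maps a repository size to a spec tier. The helper names `recommend_specs` and `repo_size_gb` are hypothetical and not part of the scanner. The published ranges overlap at 5 GB and 10 GB, so this sketch assumes boundary values fall into the smaller tier.

```shell
#!/bin/sh
# Hypothetical helper: map a repository size (whole GB) to the
# base specs recommended for the on-prem scanner.
recommend_specs() {
    size_gb=$1
    if [ "$size_gb" -le 5 ]; then
        echo "4 CPUs, 8 GB RAM, 16 GB SSD"
    elif [ "$size_gb" -le 10 ]; then
        echo "8 CPUs, 16 GB RAM, 32 GB SSD"
    elif [ "$size_gb" -le 20 ]; then
        echo "16 CPUs, 32 GB RAM, 64 GB SSD"
    else
        echo "no published recommendation above 20 GB" >&2
        return 1
    fi
}

# Hypothetical helper: size of the current directory in whole GB,
# rounded up (du -sk reports kilobytes; this includes the .git folder).
repo_size_gb() {
    du -sk . | awk '{ print int(($1 + 1048575) / 1048576) }'
}
```

From the root of a checkout, `recommend_specs "$(repo_size_gb)"` prints the matching tier.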

Step 1: Generate a SAST Security Results API Key

Navigate to Data Theorem’s API key provisioning portal.

Make sure the API key has the “SAST Scanning” feature permission


Step 2: Run the Data Theorem SAST scanner

The Data Theorem Scanner docker image is available at us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast

Environment Variable Inputs

The Data Theorem SAST scanner needs the following inputs to run:

  • DT_SAST_API_KEY: Data Theorem API key retrieved in Step 1

  • DT_SAST_REPOSITORY_NAME: name of your resource to scan

    • example my_org_name/my_repo_name

  • DT_SAST_REPOSITORY_ID: the identifier for the repository on your platform (GitHub, Bitbucket, GitLab…)

  • DT_SAST_REPOSITORY_HTML_URL: base web url to the resource

    • example https://github.com/my_org_name/my_repo_name

  • DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME: name of your repository’s default branch

    • example main
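A small pre-check can catch a missing input before the scanner starts. The sketch below is not part of the scanner; `check_dt_inputs` is a hypothetical helper that treats all five inputs listed above as required, which may be stricter than necessary when result forwarding is disabled.

```shell
#!/bin/sh
# Hypothetical helper: report every listed scanner input that is
# unset or empty, and return non-zero if any is missing.
check_dt_inputs() {
    missing=0
    for name in \
        DT_SAST_API_KEY \
        DT_SAST_REPOSITORY_NAME \
        DT_SAST_REPOSITORY_ID \
        DT_SAST_REPOSITORY_HTML_URL \
        DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME
    do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing input: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_dt_inputs || exit 1` before the scan stops the job early and lists the missing variable names on stderr.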

[Diff scans] Often, it’s more useful to find out which security issues have been introduced in a given branch than to scan for all the issues in the codebase as a whole. This is accomplished by providing the scan with two git snapshots: one for the branch you’re going to merge the code into (i.e. the base state of the code), and another for the branch where you’ve introduced new code (i.e. your PR branch). To do this, use the following inputs:

  • DT_SAST_SCAN_HEAD_REF: git ref of the head to scan

  • DT_SAST_SCAN_TARGET_REF: git ref of the target to scan
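The two diff-scan inputs can be passed to the container alongside the other variables. The sketch below assumes the remaining `DT_SAST_*` inputs are already exported in the calling shell (a bare `-e NAME` forwards the host value) and that the scanner accepts branch names as refs. `dt_diff_scan` and the `DOCKER` override are hypothetical conveniences, not scanner features.

```shell
#!/bin/sh
# Sketch of a diff scan: PR branch (head) against the branch it
# will be merged into (target).
DT_IMAGE="us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast"

dt_diff_scan() {
    head_ref=$1    # branch containing the new code
    target_ref=$2  # branch the code will be merged into
    # Set DOCKER=echo to preview the command instead of running it.
    "${DOCKER:-docker}" run --rm \
        -e DT_SAST_API_KEY \
        -e DT_SAST_REPOSITORY_NAME \
        -e DT_SAST_REPOSITORY_ID \
        -e DT_SAST_REPOSITORY_HTML_URL \
        -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME \
        -e DT_SAST_SCAN_HEAD_REF="$head_ref" \
        -e DT_SAST_SCAN_TARGET_REF="$target_ref" \
        --mount type=bind,source="$(pwd)",target=/target \
        "$DT_IMAGE" \
        data_theorem_sast_analyzer scan /target
}
```

For example, `dt_diff_scan feature/login main` run from the repository root. Both refs presumably need to exist in the mounted checkout, so a shallow clone may not be enough.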

[Optional]

  • set DT_SAST_FAIL_MODE=true to make the process return a non-zero status when issues are found. This can be used to make Data Theorem SAST a blocking step of your workflow.

  • set DT_SAST_NO_FORWARD_MODE=true to skip forwarding scan results and metadata to Data Theorem. Note that no scan results will then be visible in the Data Theorem portal.

  • set DT_SAST_INCLUDE_CODE_SNIPPETS=false to hide code snippets from the printed scan results (the issue location is still shown as a file path and line number).
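To illustrate the blocking behaviour, the sketch below runs the scan with DT_SAST_FAIL_MODE=true and propagates a non-zero exit status to the caller. The function names and the `DOCKER` override are hypothetical. The exact exit code is not documented here, so the wrapper only distinguishes zero from non-zero, and a non-zero status could also come from Docker itself rather than from findings.

```shell
#!/bin/sh
# Sketch: use the scan as a blocking workflow step.
DT_IMAGE="us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast"

dt_blocking_scan() {
    # The other DT_SAST_* inputs are forwarded from the calling shell.
    "${DOCKER:-docker}" run --rm \
        -e DT_SAST_API_KEY \
        -e DT_SAST_REPOSITORY_NAME \
        -e DT_SAST_REPOSITORY_ID \
        -e DT_SAST_REPOSITORY_HTML_URL \
        -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME \
        -e DT_SAST_FAIL_MODE=true \
        -e DT_SAST_INCLUDE_CODE_SNIPPETS=false \
        --mount type=bind,source="$(pwd)",target=/target \
        "$DT_IMAGE" \
        data_theorem_sast_analyzer scan /target
}

gate_on_scan() {
    if dt_blocking_scan; then
        echo "scan finished with a zero exit status"
    else
        status=$?
        echo "scan exited with status $status; blocking" >&2
        return "$status"
    fi
}
```

Ending a CI script with `gate_on_scan || exit 1` fails the job whenever the scan does.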

Local Scanning example

The Data Theorem on-prem scanner can run from your local machine.

From the root of the git repository you wish to scan, run the following command:

    docker run -it \
      -e DT_SAST_API_KEY=$DT_SAST_API_KEY \
      -e DT_SAST_REPOSITORY_NAME="<my_org>/<my_repo>" \
      -e DT_SAST_NO_FORWARD_MODE=true \
      --mount type=bind,source="$(pwd)"/,target=/target \
      us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast \
      data_theorem_sast_analyzer scan /target

Example with inputs to forward scan results to the Data Theorem Portal:

    docker run -it \
      -e DT_SAST_API_KEY=$DT_SAST_API_KEY \
      -e DT_SAST_REPOSITORY_NAME="<my_org>/<my_repo>" \
      -e DT_SAST_REPOSITORY_PLATFORM=BITBUCKET \
      -e DT_SAST_REPOSITORY_ID={1e734a1b-8d0e-4787-a205-aba048c00a89} \
      -e DT_SAST_REPOSITORY_HTML_URL="https://bitbucket.org/<my_org>/<my_repo>" \
      -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="main" \
      -e DT_SAST_SCANNED_BRANCH="main" \
      --mount type=bind,source="$(pwd)"/,target=/target \
      us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast \
      data_theorem_sast_analyzer scan /target

Sample output:

    Scanning completed in 15.65 seconds
    Scan results: 1 issues on commit=f719d004ef98254b46187c53ef1b3ed2f8643082
    Total Issues: 1
    Issues per types:
    - First Party Code: 1
    - SCA: 1
    Issues per severity:
    - High Severity: 1
    - Medium Severity: 1
    [
      {
        "issue_title": "Unauthenticated Route Found for Flask API",
        "issue_description": "The security of this code is compromised due to the presence of unauthenticated access to specific routes within the Flask API. This vulnerability poses a significant risk as it can potentially expose sensitive data or allow unauthorized actions to be performed. To mitigate this risk, it is crucial to implement robust authentication mechanisms that ensure only authorized users can access the protected routes.\n\nBy allowing unauthenticated access, the code fails to validate the identity of users before granting them access to certain routes. This lack of authentication opens the door for malicious actors to exploit the system and gain unauthorized access to sensitive information or perform actions that they should not be able to.\n\nTo address this issue, it is recommended to implement a secure authentication process that verifies the identity of users before granting them access to protected routes. This can be achieved through various methods such as username/password authentication, token-based authentication, or integration with third-party authentication providers.\n\nAdditionally, it is important to consider implementing other security measures such as encryption of sensitive data, input validation to prevent injection attacks, and proper error handling to avoid leaking sensitive information.\n\nBy implementing these security measures, the code can ensure that only authenticated and authorized users can access the protected routes, significantly reducing the risk of unauthorized access or data breaches. It is essential to prioritize security in the development process to safeguard sensitive data and protect the integrity of the system.",
        "issue_type": "FIRST_PARTY_CODE",
        "severity": "HIGH",
        "detected_in_file_path": "sample_code/bad_python.py",
        "detected_on_line": 7,
        "issue_code_snippet": "@app.route(\"/\")\ndef index():\n cmd = request.args.get(\"cmd\", \"\")\n exec(cmd)\n return \"\""
      },
      {
        "issue_title": "jinja2 version 3.1.2 contains a known vulnerability (via PyPI dependency): Jinja vulnerable to HTML attribute injection when passing user input as keys to xmlattr filter",
        "issue_description": "Jinja vulnerable to HTML attribute injection when passing user input as keys to xmlattr filter",
        "issue_type": "SCA",
        "severity": "MEDIUM",
        "detected_in_file_path": "sample_code/requirements.txt",
        "detected_on_line": 1,
        "issue_code_snippet": "jinja2==3.1.2\n"
      }
    ]
    Visit https://www.securetheorem.com/api/v2/security/sast for more details
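If you capture the printed output, a quick tally per severity can be scripted. The sketch below is a text-matching shortcut, not an official parser: `count_severity` is a hypothetical helper that assumes each issue is printed with a `"severity": "<LEVEL>"` field formatted as in the sample above. A JSON-aware tool would be more robust once the array is separated from the surrounding text.

```shell
#!/bin/sh
# Hypothetical helper: count issues of one severity in scanner
# output read from stdin, by matching the "severity" JSON field.
count_severity() {
    level=$1
    # grep -o prints one line per match; wc -l counts them.
    grep -o "\"severity\": \"$level\"" | wc -l | tr -d ' '
}
```

For example, after saving the scan output to a file of your choosing (here the hypothetical `scan.log`), `count_severity HIGH < scan.log` prints the number of high-severity entries.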

GitHub Actions example

Set the Data Theorem API Key as a secret variable

Go to your repository > Settings > Security > Secrets and variables > Actions > Secrets

Click on New Repository Secret and create a secret variable named DT_SAST_API_KEY with the value retrieved in Step 1

Scans on pushes
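A minimal sketch of the shell a workflow `run:` step could execute on push events, assuming the repository is already checked out and the step exposes the DT_SAST_API_KEY secret as an environment variable. The `GITHUB_*` variables are GitHub's default environment variables. The platform value `GITHUB` is assumed by analogy with the `BITBUCKET` value in the local example, and `DEFAULT_BRANCH`, `DOCKER`, and `dt_github_push_scan` are hypothetical names introduced for this sketch.

```shell
#!/bin/sh
# Sketch: full scan of the pushed branch from a GitHub Actions runner.
DT_IMAGE="us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast"

dt_github_push_scan() {
    # DEFAULT_BRANCH is hypothetical; the workflow has to provide it.
    # No -it flag: CI runners do not allocate a TTY.
    "${DOCKER:-docker}" run --rm \
        -e DT_SAST_API_KEY \
        -e DT_SAST_REPOSITORY_NAME="$GITHUB_REPOSITORY" \
        -e DT_SAST_REPOSITORY_PLATFORM=GITHUB \
        -e DT_SAST_REPOSITORY_ID="$GITHUB_REPOSITORY_ID" \
        -e DT_SAST_REPOSITORY_HTML_URL="$GITHUB_SERVER_URL/$GITHUB_REPOSITORY" \
        -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="${DEFAULT_BRANCH:-main}" \
        -e DT_SAST_SCANNED_BRANCH="$GITHUB_REF_NAME" \
        --mount type=bind,source="$GITHUB_WORKSPACE",target=/target \
        "$DT_IMAGE" \
        data_theorem_sast_analyzer scan /target
}
```

The step would end with a call to `dt_github_push_scan`.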

Scans on pull requests
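For pull request events, a `run:` step could pass the diff-scan inputs as well. In the sketch below, `GITHUB_HEAD_REF` and `GITHUB_BASE_REF` are the branch names GitHub sets for pull request events. The `origin/` prefix is an assumption about which ref form the scanner accepts, and it presumes the checkout fetched both branches. `DEFAULT_BRANCH`, `DOCKER`, `dt_github_pr_scan`, and the `GITHUB` platform value are likewise assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: diff scan of a pull request from a GitHub Actions runner.
DT_IMAGE="us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast"

dt_github_pr_scan() {
    # Head = PR branch, target = branch the PR merges into.
    "${DOCKER:-docker}" run --rm \
        -e DT_SAST_API_KEY \
        -e DT_SAST_REPOSITORY_NAME="$GITHUB_REPOSITORY" \
        -e DT_SAST_REPOSITORY_PLATFORM=GITHUB \
        -e DT_SAST_REPOSITORY_ID="$GITHUB_REPOSITORY_ID" \
        -e DT_SAST_REPOSITORY_HTML_URL="$GITHUB_SERVER_URL/$GITHUB_REPOSITORY" \
        -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="${DEFAULT_BRANCH:-main}" \
        -e DT_SAST_SCAN_HEAD_REF="origin/$GITHUB_HEAD_REF" \
        -e DT_SAST_SCAN_TARGET_REF="origin/$GITHUB_BASE_REF" \
        --mount type=bind,source="$GITHUB_WORKSPACE",target=/target \
        "$DT_IMAGE" \
        data_theorem_sast_analyzer scan /target
}
```

Adding `-e DT_SAST_FAIL_MODE=true` to the command would turn the scan into a blocking check on the pull request.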

Bitbucket pipeline example

Set the Data Theorem API Key as a secret variable

Go to your repository > Repository Settings > Repository Variables

Add a variable named DT_SAST_API_KEY with the value retrieved in step 1 and make sure the Secured option is checked
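A rough sketch of a pipeline step script, assuming the step uses the Data Theorem scanner image as its build image so that `data_theorem_sast_analyzer` is on the PATH. If your pipeline runs the scanner through `docker run` instead, the same variable mapping applies. The `BITBUCKET_*` variables are Bitbucket Pipelines defaults, while `DEFAULT_BRANCH`, `DT_ANALYZER`, and `dt_bitbucket_scan` are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: map Bitbucket Pipelines variables to scanner inputs and
# scan the cloned repository. DT_SAST_API_KEY comes from the secured
# repository variable.
dt_bitbucket_scan() {
    export DT_SAST_REPOSITORY_NAME="$BITBUCKET_REPO_FULL_NAME"
    export DT_SAST_REPOSITORY_PLATFORM=BITBUCKET
    export DT_SAST_REPOSITORY_ID="$BITBUCKET_REPO_UUID"
    export DT_SAST_REPOSITORY_HTML_URL="https://bitbucket.org/$BITBUCKET_REPO_FULL_NAME"
    # DEFAULT_BRANCH is hypothetical; set it in the pipeline if needed.
    export DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="${DEFAULT_BRANCH:-main}"
    export DT_SAST_SCANNED_BRANCH="$BITBUCKET_BRANCH"
    # DT_ANALYZER is a hypothetical override for dry runs.
    "${DT_ANALYZER:-data_theorem_sast_analyzer}" scan "$BITBUCKET_CLONE_DIR"
}
```

The step's script section would then call `dt_bitbucket_scan`.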

GitLab pipeline example

Set the Data Theorem API Key as a secret variable

Go to your project > Settings > CI/CD > Variables

Add a variable named DT_SAST_API_KEY with the value retrieved in step 1 and make sure the Masked option is checked

Note: the GitLab pipeline must run the Data Theorem SAST step on an executor that supports the image feature.
See the GitLab Executors documentation for more information on compatible executors.
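A minimal sketch of a job script, assuming the job's image is the Data Theorem scanner image so the analyzer runs directly inside the job container. The `CI_*` variables are GitLab's predefined CI/CD variables. The platform value `GITLAB` is assumed by analogy with `BITBUCKET` in the local example, and `DT_ANALYZER` and `dt_gitlab_scan` are hypothetical names.

```shell
#!/bin/sh
# Sketch: map GitLab CI variables to scanner inputs and scan the
# project directory. DT_SAST_API_KEY comes from the masked CI/CD
# variable.
dt_gitlab_scan() {
    export DT_SAST_REPOSITORY_NAME="$CI_PROJECT_PATH"
    export DT_SAST_REPOSITORY_PLATFORM=GITLAB
    export DT_SAST_REPOSITORY_ID="$CI_PROJECT_ID"
    export DT_SAST_REPOSITORY_HTML_URL="$CI_PROJECT_URL"
    export DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="$CI_DEFAULT_BRANCH"
    export DT_SAST_SCANNED_BRANCH="$CI_COMMIT_REF_NAME"
    # DT_ANALYZER is a hypothetical override for dry runs.
    "${DT_ANALYZER:-data_theorem_sast_analyzer}" scan "$CI_PROJECT_DIR"
}
```

The job's script section would then call `dt_gitlab_scan`.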

 

Azure DevOps Pipeline Example

Create a new Azure DevOps Pipeline

Add a variable named DT_SAST_API_KEY with the value retrieved in step 1 and make sure the Keep this value secret option is checked. (See https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-secret-variables?view=azure-devops&tabs=yaml%2Cbash)

The Azure Pipeline definition should look like this:
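As a rough sketch under stated assumptions, the script below could serve as the body of a script step on an agent with Docker installed. Azure Pipelines does not expose secret variables to scripts automatically, so the step has to map DT_SAST_API_KEY into the script's environment explicitly. The `BUILD_*` names are the environment-variable forms of Azure's predefined variables. `DEFAULT_BRANCH`, `DOCKER`, and `dt_azure_scan` are hypothetical, and DT_SAST_REPOSITORY_PLATFORM is left out because the accepted value for Azure DevOps is not documented here.

```shell
#!/bin/sh
# Sketch: scan the checked-out sources from an Azure Pipelines agent.
DT_IMAGE="us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast"

dt_azure_scan() {
    # DEFAULT_BRANCH is hypothetical; define it as a pipeline variable.
    "${DOCKER:-docker}" run --rm \
        -e DT_SAST_API_KEY \
        -e DT_SAST_REPOSITORY_NAME="$BUILD_REPOSITORY_NAME" \
        -e DT_SAST_REPOSITORY_ID="$BUILD_REPOSITORY_ID" \
        -e DT_SAST_REPOSITORY_HTML_URL="$BUILD_REPOSITORY_URI" \
        -e DT_SAST_REPOSITORY_DEFAULT_BRANCH_NAME="${DEFAULT_BRANCH:-main}" \
        -e DT_SAST_SCANNED_BRANCH="$BUILD_SOURCEBRANCHNAME" \
        --mount type=bind,source="$BUILD_SOURCESDIRECTORY",target=/target \
        "$DT_IMAGE" \
        data_theorem_sast_analyzer scan /target
}
```

The script step would end with a call to `dt_azure_scan`.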

Troubleshooting

SSL Errors

If the scanner is failing because of SSL errors, you may be running it behind a proxy that causes SSL verification to fail.

If this is the case, we recommend the following:

Build a custom Docker image that embeds your own valid SSL certificates.

Make sure these certificates allow the machine running the Data Theorem On-Prem Scanner to call api.securetheorem.com.

The Dockerfile would look like this
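One possible shape for such a Dockerfile, generated here from a shell helper, is sketched below. It assumes the scanner image is Debian- or Ubuntu-based, provides `update-ca-certificates`, and builds as a user allowed to run it; none of this is confirmed by this page. `corp-ca.crt`, `write_dt_dockerfile`, and the image tag used afterwards are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: write a Dockerfile that layers a custom CA
# certificate on top of the scanner image.
write_dt_dockerfile() {
    out_dir=$1
    cat > "$out_dir/Dockerfile" <<'EOF'
FROM us-central1-docker.pkg.dev/prod-scandal-us/datatheorem-sast/datatheorem-sast
# corp-ca.crt is a placeholder name for your proxy's CA certificate.
COPY corp-ca.crt /usr/local/share/ca-certificates/corp-ca.crt
# Assumes a Debian/Ubuntu-style trust store is present in the image.
RUN update-ca-certificates
EOF
}
```

With the certificate placed next to the generated file, `write_dt_dockerfile . && docker build -t datatheorem-sast-custom .` builds the image, which can then replace the stock image name in the `docker run` commands. If the scanner does not read the system trust store, additional configuration may be needed.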

 

  • If this does not work, please contact support@datatheorem.com for help.