This document explains the theory behind Data Theorem’s Private Network Proxy offering and provides instructions for setting it up as a Docker container.

Use-Case and Architecture

In order to analyze APIs and services on private networks and VPCs, Data Theorem needs a proxy/connector that gives Data Theorem’s analyzer engine access to private networks. Data Theorem’s Private Network Proxy, provided as a Docker image, creates an SSH port forwarding “tunnel” between Data Theorem and a private network in order to proxy the analyzer engine's network traffic.

The diagram below shows how this architecture works:

Setting up a Private Network Proxy

These instructions are for the initial “v1” implementation. Data Theorem expects to refine and improve the setup flow with future releases.

Summary

Planning out the Deployment

Data Theorem’s Private Network Proxy is provided as a Docker image that can be deployed to any host with access to a private environment containing APIs that are not publicly addressable. There are several requirements for how this image should be deployed:

The container image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1

Configuration

An instance of the Private Network Proxy needs an SSH keypair to connect to Data Theorem, identify itself, and associate itself with the APIs hosted in its private network. This keypair should be kept safe and secure, but it must also be configured for, and readable by, the proxy’s container.

It also requires a port number that Data Theorem must provide. This is an internal configuration parameter that is bound to the instance, but the instance’s container must request it due to how SSH’s port forwarding works.

1. Create an SSH keypair

The keypair should be created without a password, but Data Theorem recommends taking steps to protect the private key. Depending on how the container is hosted, you may be able to use a secrets feature of Docker, Kubernetes, etc.

The keys must be stored in OpenSSH’s format, not in another key format such as PKCS#8 or PEM.

To create an Ed25519 keypair:

# This will create my_keyfile and my_keyfile.pub
ssh-keygen -t ed25519 -C "description for my connector" -f my_keyfile

Alternatively, to create an RSA keypair:

# This will create my_keyfile and my_keyfile.pub
ssh-keygen -t rsa -b 3072 -C "description for my connector" -f my_keyfile

When prompted to set a password, leave it blank and press enter. You can set a password to encrypt the private key if you want the extra protection, but you will need to remove the password later when the private key is provided to the connector’s Docker container.
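If you did set a password, you can remove it once the key is ready to hand to the container. The sketch below wraps OpenSSH’s own `ssh-keygen -p` (change passphrase) option; the `remove_key_passphrase` function name is hypothetical. Note that passing the old passphrase with `-P` puts it on the command line, where it may show up in the process list or shell history; running `ssh-keygen -p -f my_keyfile` interactively and pressing enter at the new-passphrase prompt avoids that.

```shell
#!/bin/sh
# Hypothetical helper: remove_key_passphrase KEY_FILE CURRENT_PASSPHRASE
# Rewrites KEY_FILE in place with an empty passphrase so the container can load it.
remove_key_passphrase() {
  # -p: change the passphrase of an existing private key file (-f)
  # -P: the current passphrase, -N: the new passphrase ("" means none)
  ssh-keygen -p -f "$1" -P "$2" -N ""
}
```

Usage: `remove_key_passphrase my_keyfile 'current passphrase'`.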

Store the private key file somewhere safe until you are ready to configure the Docker container.
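A quick way to confirm that a private key file is in the required OpenSSH format is to look at its first line. OpenSSH-format private keys begin with `-----BEGIN OPENSSH PRIVATE KEY-----`, while PEM and PKCS#8 keys begin with headers such as `-----BEGIN RSA PRIVATE KEY-----` or `-----BEGIN PRIVATE KEY-----`. OpenSSH 7.8 and later write the OpenSSH format by default. The `is_openssh_private_key` function below is a hypothetical helper, and checking the header is a heuristic rather than a full parse of the key.

```shell
#!/bin/sh
# Hypothetical helper: is_openssh_private_key KEY_FILE
# Returns 0 if the first line of KEY_FILE is exactly the OpenSSH private key
# header. Only the header is checked; the key body is not validated.
is_openssh_private_key() {
  head -n 1 "$1" | grep -qxF -- '-----BEGIN OPENSSH PRIVATE KEY-----'
}
```

Usage: `is_openssh_private_key my_keyfile && echo "my_keyfile is in OpenSSH format"`.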

2. Send the public key to Data Theorem

Email the public key file (the .pub file) to support@datatheorem.com along with a brief explanation of your use-case for the Private Network Proxy, for example whether it will access a dev or staging environment, or serve some other purpose.

Data Theorem will use the public key to set up the service in our own infrastructure and assign a port to it. We will then reply with the value to use for the container’s PROXY_PORT ENV variable.

3. Configure the container

To configure the container, you must provide it with at least the SSH private key and the PROXY_PORT. The private key can be supplied through a Docker ENV variable, or as a file on the filesystem (e.g., if you want to mount the file or a volume, or if you use a secrets manager that can mount files).

The image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1

Required ENV variables

The following ENV variable is always required:

PROXY_PORT: the port number that Data Theorem assigned in step 2.

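Exactly one of the following ENV variables must be specified:

SSH_PRIVATE_KEY_DATA: the contents of the private key, with each newline replaced by a \n character sequence.
SSH_PRIVATE_KEY_FILE: the path, inside the container, to a file containing the private key.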

Optional ENV variables

The following ENV variables should not be explicitly set unless you want to change the behavior of the container.
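The rules for the required variables (PROXY_PORT always, plus one of SSH_PRIVATE_KEY_DATA or SSH_PRIVATE_KEY_FILE) can be checked before starting the container. The sketch below is a hypothetical `check_proxy_env` function, not part of the image; it reads “just one” as exactly one key source, and additionally assumes PROXY_PORT must be a valid TCP port number.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the container's ENV configuration.
# Prints a message to stderr and returns 1 on the first problem found.
check_proxy_env() {
  # PROXY_PORT is always required and must be a TCP port number (1-65535).
  case "${PROXY_PORT:-}" in
    ''|*[!0-9]*) echo "PROXY_PORT must be set to a number" >&2; return 1 ;;
  esac
  if [ "${PROXY_PORT}" -lt 1 ] || [ "${PROXY_PORT}" -gt 65535 ]; then
    echo "PROXY_PORT must be between 1 and 65535" >&2; return 1
  fi
  # Exactly one of SSH_PRIVATE_KEY_DATA / SSH_PRIVATE_KEY_FILE must be set.
  if [ -n "${SSH_PRIVATE_KEY_DATA:-}" ] && [ -n "${SSH_PRIVATE_KEY_FILE:-}" ]; then
    echo "set only one of SSH_PRIVATE_KEY_DATA or SSH_PRIVATE_KEY_FILE" >&2; return 1
  fi
  if [ -z "${SSH_PRIVATE_KEY_DATA:-}" ] && [ -z "${SSH_PRIVATE_KEY_FILE:-}" ]; then
    echo "set SSH_PRIVATE_KEY_DATA or SSH_PRIVATE_KEY_FILE" >&2; return 1
  fi
  # When a key file is used, it must be readable by the current user.
  if [ -n "${SSH_PRIVATE_KEY_FILE:-}" ] && [ ! -r "${SSH_PRIVATE_KEY_FILE}" ]; then
    echo "cannot read ${SSH_PRIVATE_KEY_FILE}" >&2; return 1
  fi
  return 0
}
```

Run `check_proxy_env` in the shell that holds these variables before calling `docker run`. The readability check only reflects the host user; the container’s own non-root user may still lack access to a bind-mounted file.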

Examples

Run the container directly with Docker using SSH_PRIVATE_KEY_DATA

PRIVATE_KEY_FILE="/path/to/private_key"
PROXY_PORT=10123

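# Replace each newline in the key with the two-character sequence \n so the
# whole key fits in a single ENV value (printf behaves the same in bash and sh):
PRIVATE_KEY_DATA=$(while IFS= read -r line || [ -n "${line}" ]; do printf '%s\\n' "${line}"; done < "${PRIVATE_KEY_FILE}")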

docker run -it \
    -e "PROXY_PORT=${PROXY_PORT}" \
    -e "SSH_PRIVATE_KEY_DATA=${PRIVATE_KEY_DATA}" \
    gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
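Before passing PRIVATE_KEY_DATA to `docker run`, it can help to confirm that the encoding reverses cleanly. The sketch below defines a hypothetical `decode_key_data` function that mirrors the container behaviour described for the env-file example (literal \n sequences become newlines, literal \r sequences are removed); it is an illustration, not Data Theorem’s implementation.

```shell
#!/bin/sh
# Hypothetical helper: decode_key_data ENCODED_KEY
# Undoes the single-line encoding: drops literal "\r" sequences and turns
# literal "\n" sequences back into real newlines. Output goes to stdout.
decode_key_data() {
  printf '%s' "$1" | awk '{
    gsub(/\\r/, "")     # remove literal backslash-r sequences
    gsub(/\\n/, "\n")   # replace literal backslash-n with a newline
    printf "%s", $0
  }'
}
```

Usage: `decode_key_data "${PRIVATE_KEY_DATA}" | cmp -s - "${PRIVATE_KEY_FILE}" && echo "encoding round-trips"`.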

Run the container directly with Docker using SSH_PRIVATE_KEY_FILE

PRIVATE_KEY_FILE="/path/to/private_key"
PROXY_PORT=10123

# Bind-mount the private key into the container. The private key file must be
# readable by the low-rights user within the container, because the service in
# the container does not run as root. For example, you may have to chmod the
# private key file, or grant access to the container user's group/gid on the
# host system.
docker run -it \
    -e "PROXY_PORT=${PROXY_PORT}" \
    -e "SSH_PRIVATE_KEY_FILE=/private_key" \
    --mount "type=bind,src=${PRIVATE_KEY_FILE},dst=/private_key,readonly=true" \
    gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
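Step 1 suggests using a platform secrets feature rather than a plain key file on disk. As one hedged example, assuming the host runs Docker in swarm mode (`docker secret` only works with swarm services), the key can be stored as a Docker secret and exposed to the container as a file under /run/secrets/, then referenced through SSH_PRIVATE_KEY_FILE. The function, secret, and service names below are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch, assuming swarm mode is already enabled on this host (docker swarm init).
# Hypothetical helper: run_proxy_as_swarm_service PRIVATE_KEY_FILE PROXY_PORT
run_proxy_as_swarm_service() {
  private_key_file=$1
  proxy_port=$2

  # Store the private key in swarm's secret store, reading it from the file.
  docker secret create private_network_proxy_key "${private_key_file}" || return 1

  # The secret appears inside the container as /run/secrets/private_key.
  # mode=0444 keeps it readable by the container's non-root user.
  docker service create \
    --name private-network-proxy \
    --secret source=private_network_proxy_key,target=private_key,mode=0444 \
    -e "PROXY_PORT=${proxy_port}" \
    -e "SSH_PRIVATE_KEY_FILE=/run/secrets/private_key" \
    gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
}
```

Usage: `run_proxy_as_swarm_service /path/to/private_key 10123`. Kubernetes offers a comparable mechanism (Secrets mounted as volumes), configured through a pod spec rather than these commands.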

Run the container on GCP’s Compute Engine, within a Container Optimized OS

GCP’s Compute Engine provides a convenient way to spin up a VM that runs a single Docker container without having to manage container orchestration. This example shows how to launch such a VM to test the container.

Create an env file (this example uses client1_vm.env) that contains the ENV configuration for the container. The newlines in the private key have been replaced with \n so that the key fits on a single line, because this ENV file format does not support multi-line values. The deployed container handles \n and \r character sequences in key files and key data automatically, replacing \n with a newline and removing \r:

PROXY_PORT=10123
SSH_PRIVATE_KEY_DATA=-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----
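Instead of editing the env file by hand, it can be generated from the key file. The sketch below uses a hypothetical `write_env_file` helper that writes PROXY_PORT and SSH_PRIVATE_KEY_DATA in the single-line form shown above, and sets a restrictive umask because the file contains the unencrypted private key.

```shell
#!/bin/sh
# Hypothetical helper: write_env_file KEY_FILE PROXY_PORT OUTPUT_FILE
write_env_file() {
  (
    umask 077  # the env file embeds the private key, so keep it owner-only
    {
      printf 'PROXY_PORT=%s\n' "$2"
      printf 'SSH_PRIVATE_KEY_DATA='
      # Join the key's lines with literal "\n" sequences so it fits on one line.
      while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\\n' "$line"
      done < "$1"
      printf '\n'
    } > "$3"
  )
}
```

Usage: `write_env_file /path/to/private_key 10123 client1_vm.env`.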

Then create the VM using GCP’s gcloud command line tool:

gcloud --project=${PROJECT} compute instances create-with-container \
    my-vm-name \
    --container-image gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest \
    --container-env-file=client1_vm.env \
    ...  # Any additional flags for creating the VM, such as network tags, zone, etc.

Configuring Private APIs

Scanning a private API requires providing an API definition for that API.

Security Architecture

Client Security

The security of the container and the Private Network Proxy depends primarily on how and where it is deployed and configured. However, Data Theorem has taken steps to minimize the client’s attack surface and to follow container best practices:

Server Security

The server component also runs in a container, on a VM in GCP.