Private Network Proxy: Setup Instructions

This document explains the theory behind Data Theorem’s Private Network Proxy offering and provides instructions for setting it up as a Docker container.

Use-Case and Architecture

In order to analyze APIs and services on private networks and VPCs, Data Theorem needs a proxy/connector that gives Data Theorem’s analyzer engine access to private networks. Data Theorem’s Private Network Proxy, provided as a Docker image, creates an SSH port forwarding “tunnel” between Data Theorem and a private network in order to proxy the analyzer engine's network traffic.

The diagram below shows the architecture for how this works:

  • A Private Network Proxy client container is deployed within a private network

  • It establishes an SSH tunnel/port forward back to Data Theorem that connects to a proxy

  • Data Theorem’s analyzer engine uses the tunnel to connect to the proxy and scan APIs within the private network

Setting up a Private Network Proxy

These instructions are for the initial “v1” implementation. Data Theorem expects to refine and improve the setup flow with future releases.

Summary

  • Plan out how you want to deploy an instance of the Private Network Proxy

  • Create an SSH keypair, and provide the public key to DT support.

  • DT will prepare the Private Network Proxy service for the instance and will provide some additional configuration parameters

  • Set up a host to run the instance as a Docker container

  • Configure the container with the SSH keypair and the other necessary parameters

  • Upload API definitions for the private APIs, and notify DT support that they should use the Private Network Proxy

Planning out the Deployment

Data Theorem’s Private Network Proxy is provided as a Docker image that can be deployed to a host with access to a private environment that contains APIs that are not publicly addressable. There are several requirements for how this image should be deployed:

  • The container must be able to perform DNS lookups of, and be able to connect to, the APIs that will be scanned. For simple deployments, if the host system for a container can resolve hostnames in the private network, so can the container.

  • The container must be able to resolve and connect to private-network-proxy1.securetheorem.com on port 20422 in order to set up the tunnel/port forwarding.

  • The container must be configured with an SSH private key and a port assigned to the connector by Data Theorem. The sections below discuss coordinating this configuration with Data Theorem.

  • A particular instance’s container should only exist once – it should not be scaled or replicated across a cluster (eg, Docker Swarm or Kubernetes). A deployed container represents where network traffic from Data Theorem will originate within the private network.

  • If you have multiple isolated private networks containing APIs to scan, then each network will need its own Private Network Proxy configured with Data Theorem.

  • The container currently logs all output to STDOUT and STDERR.

  • The container should have 2 vCPUs, 2GB memory, and 2GB disk
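The two connectivity requirements above can be checked from the intended host before anything is deployed. The following is a minimal sketch and not part of Data Theorem's tooling: the function names are hypothetical, and it assumes that getent, timeout, and bash are available on the host.

```shell
# Hypothetical pre-flight helpers; not part of the Private Network Proxy image.

# Succeeds if the host name resolves through the system resolver.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
can_connect() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Prints one result line per check for the given host and port.
preflight() {
    if can_resolve "$1"; then
        echo "resolve $1: ok"
    else
        echo "resolve $1: FAILED"
    fi
    if can_connect "$1" "$2"; then
        echo "connect $1:$2: ok"
    else
        echo "connect $1:$2: FAILED"
    fi
}
```

For example, `preflight private-network-proxy1.securetheorem.com 20422` checks the tunnel endpoint, and running the same function against the host and port of each private API checks the first requirement. These checks run on the host rather than inside the container, so they are only indicative for deployments where the container uses its own DNS or network configuration.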

The container image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1

Configuration

An instance of the Private Network Proxy needs an SSH keypair to be able to connect to Data Theorem, identify itself, and associate itself with the APIs hosted in its private network. This keypair should be kept safe and secure, but it will need to be configured and accessible to the proxy’s container.

It also requires a port number that Data Theorem must provide. This is an internal configuration parameter that is bound to the instance, but the instance’s container must request it due to how SSH’s port forwarding works.

1. Create an SSH keypair

The keypair should be created without a password to encrypt it, but Data Theorem recommends taking steps to protect the private key – depending on how the container is hosted, you may be able to use a secrets feature of Docker, Kubernetes, etc.

The keys must be stored in OpenSSH’s format, and not in some other public key format (eg, PKCS8 or PEM).

Creating an ED25519 keypair:

# This will create my_keyfile and my_keyfile.pub
ssh-keygen -t ed25519 -C "description for my connector" -f my_keyfile

Alternately, create an RSA keypair:

# This will create my_keyfile and my_keyfile.pub
ssh-keygen -t rsa -b 3072 -C "description for my connector" -f my_keyfile

When prompted to set a password, leave it blank and press enter. You can set a password to encrypt the private key if you want the extra protection, but you will need to remove the password later when the private key is provided to the connector’s Docker container.

Store the private key file somewhere safe until you are ready to configure the Docker container.
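Before sending anything to Data Theorem, it can be worth confirming that the private key really is in OpenSSH's format. The helper below is a minimal sketch with a hypothetical function name; it inspects only the first line of the file, so it cannot tell whether the key is encrypted with a password.

```shell
# Hypothetical helper: succeeds only if the file starts with the OpenSSH
# private key header. PEM and PKCS8 keys use different headers, such as
# "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
is_openssh_private_key() {
    [ -r "$1" ] || return 1
    # Read the first line; tolerate a file without a trailing newline.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    [ "$first_line" = "-----BEGIN OPENSSH PRIVATE KEY-----" ]
}
```

For example, `is_openssh_private_key my_keyfile && echo "format ok"`. If a password was set on the key, `ssh-keygen -p -f my_keyfile` prompts for the old password and allows an empty new one.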

2. Send the public key to Data Theorem

Email the public key file to support@datatheorem.com along with a brief explanation of your use-case for the Private Network Proxy. For example, explain whether it will access a dev or staging environment, or describe any other use-case.

Data Theorem will use the public key to set up the service in our own infrastructure and assign a port to it. We will then reply with the value needed for the PROXY_PORT ENV value for the container.

3. Configure the container

In order to configure the container you must provide it with at least the SSH private key and the PROXY_PORT. The private key can be supplied through a Docker ENV variable, or it can be provided as a file on the filesystem (eg, if you want to mount the file or a volume, or if you use a secrets manager that can mount files).

The image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1

Required ENV variables

The following ENV variable is always required:

  • PROXY_PORT – The SSH client uses this value to set up a reverse port forward on the server. The server restricts which server port(s) a client may ask to use, but this parameter is necessary because the client must ask to set up the port to forward its traffic to the proxy.
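For readers unfamiliar with reverse port forwarding, the request the client makes has the same shape as OpenSSH's -R option. The sketch below only prints an illustrative command and is not the container's actual invocation: the function name, the login name "tunnel", and the local proxy port are assumptions made for illustration.

```shell
# Illustration only: prints an ssh command with the shape of a reverse port
# forward. The user name and the local proxy port are hypothetical; the
# container's real invocation is not documented here.
print_reverse_forward_example() {
    proxy_port="$1"          # PROXY_PORT assigned by Data Theorem
    local_proxy_port="$2"    # hypothetical port of the proxy inside the container
    server_host="${SERVER_HOST:-private-network-proxy1.securetheorem.com}"
    server_port="${SERVER_PORT:-20422}"
    # -N: run no remote command.
    # -R: ask the server to listen on proxy_port and forward connections
    #     back through the tunnel to the local proxy.
    echo "ssh -N -p ${server_port}" \
         "-R ${proxy_port}:localhost:${local_proxy_port}" \
         "tunnel@${server_host}"
}
```

Calling `print_reverse_forward_example 10123 8080` shows why the client has to know the port: the listening port is part of the request the client sends, even though the server decides whether to allow it.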

Exactly one of the following ENV variables must be specified:

  • SSH_PRIVATE_KEY_DATA – The raw data of the SSH private key for the Private Network Proxy. It uniquely identifies and authenticates the container back to Data Theorem. The key data must not be encrypted with a password.

    • If you are unable to specify newline characters when setting this key’s value, the container has special support for replacing a \n character sequence with a newline character, so you can replace any newlines in the private key with a \n. Container orchestration that specifies configuration through YAML files can usually specify newlines, but command line arguments and .env files may not be able to easily include newlines in ENV values.

  • SSH_PRIVATE_KEY_FILE – A path to the private key file within the container. Offered as an alternative to specifying the contents of the private key file directly, this is useful if you use a key/secrets manager, or if you want to mount the key file from a volume or from the host.

    • The file must be readable by the appliance user within the container because the container does not run as root. The container’s logs will print the user’s UID when it first starts up.

Optional ENV variables

The following ENV variables should not be explicitly set unless you want to change the behavior of the container.

  • WORKSPACE_DIR – Specify a directory where dynamically generated files will be written to. The SSH private key and other files will be created/copied here. It defaults to /app, but you can set it to /dev/shm (if available) or to your own tmpfs mount if you want the container itself to write files to ephemeral storage.

  • RETRY_AFTER_DISCONNECT – Whether the container should automatically reconnect if something happens to the connection to the server. Defaults to yes. If it is set to no the container will exit if SSH ever disconnects instead of reconnecting automatically.

  • VERBOSITY – How verbose the container’s output is. Defaults to 1. Set this to 0 for almost no output, or 1, 2, or 3 for increasingly verbose levels of output. Only set it to a higher level if you need to do some sort of low-level debugging of the SSH connection.

  • SERVER_HOST – Override the host that the container tries to connect to. By default, the container connects to private-network-proxy1.securetheorem.com.

  • SERVER_PORT – Override the server port that the container tries to connect to. By default, the container initiates an SSH connection to port 20422 on the remote server.

  • SERVER_PUBKEY – Override the public key of the remote server. The image contains the remote server’s key already, and it will refuse to connect to other server keys. Setting this overrides the server key that it trusts.
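If these values are set from a deployment script, a small check before launch can catch typos early. This is a minimal sketch with a hypothetical function name; it accepts only the values listed above, and whether the container tolerates other spellings is not known.

```shell
# Hypothetical pre-launch check: accepts only the values this document lists.
# Unset variables fall back to the documented defaults (1 and yes).
check_optional_env() {
    case "${VERBOSITY:-1}" in
        0|1|2|3) ;;
        *) echo "VERBOSITY must be 0, 1, 2, or 3" >&2; return 1 ;;
    esac
    case "${RETRY_AFTER_DISCONNECT:-yes}" in
        yes|no) ;;
        *) echo "RETRY_AFTER_DISCONNECT must be yes or no" >&2; return 1 ;;
    esac
}
```

A script could call `check_optional_env || exit 1` before passing the values to the container with `-e`.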

Examples

Run the container directly with Docker using SSH_PRIVATE_KEY_DATA

PRIVATE_KEY_FILE="/path/to/private_key"
PROXY_PORT=10123
# Replace newline characters with a \n character sequence:
PRIVATE_KEY_DATA=`cat ${PRIVATE_KEY_FILE}| while read line ; do echo -n "${line}\\n" ; done`
docker run -it \
  -e "PROXY_PORT=${PROXY_PORT}" \
  -e "SSH_PRIVATE_KEY_DATA=${PRIVATE_KEY_DATA}" \
  -e "WORKSPACE_DIR=/dev/shm" \
  gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest

Run the container directly with Docker using SSH_PRIVATE_KEY_FILE
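A minimal sketch of this variant, assuming the key is bind-mounted read-only from the host. The function name and the in-container path /run/secrets/proxy_key are arbitrary choices for illustration, not values required by the image.

```shell
# Sketch: run the proxy with the key bind-mounted read-only into the container.
# The in-container path /run/secrets/proxy_key is an arbitrary choice.
run_proxy_with_key_file() {
    key_file="$1"     # absolute path to the private key on the host
    proxy_port="$2"   # PROXY_PORT value provided by Data Theorem
    [ -r "$key_file" ] || { echo "cannot read $key_file" >&2; return 1; }
    docker run -it \
        -v "${key_file}:/run/secrets/proxy_key:ro" \
        -e "PROXY_PORT=${proxy_port}" \
        -e "SSH_PRIVATE_KEY_FILE=/run/secrets/proxy_key" \
        gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
}
```

For example, `run_proxy_with_key_file /path/to/private_key 10123`. With a bind mount the host file's ownership and mode carry over into the container, so the file has to be readable by the container's non-root user; the UID printed in the container's startup logs indicates which user that is.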

Run the container on GCP’s Compute Engine, within a Container Optimized OS

GCP’s Compute Engine provides a convenient way to spin up a VM that runs a single Docker container without having to worry about container orchestration. This example shows how to launch such a VM, in order to test-launch the container.

Create a somefilename.env file that contains the ENV configuration for the container. Note that the newlines in the private key have been replaced with \n to include it on a single line, due to a limitation of this ENV file format (the deployed container will handle \n and \r character sequences within a key file or data by automatically replacing the former with a newline and removing the latter):
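One way to produce such a file from an existing key is sketched below. The function name is hypothetical, and the sketch assumes the file needs only the two required settings, PROXY_PORT and SSH_PRIVATE_KEY_DATA.

```shell
# Hypothetical helper: write an ENV file for the container from a key file.
write_env_file() {
    key_file="$1"     # path to the unencrypted private key
    proxy_port="$2"   # PROXY_PORT value provided by Data Theorem
    env_file="$3"     # output path, for example somefilename.env
    [ -r "$key_file" ] || { echo "cannot read $key_file" >&2; return 1; }
    # Strip carriage returns, then end each line with a literal \n sequence
    # so that the whole key fits on a single line.
    key_data=$(tr -d '\r' < "$key_file" |
        while IFS= read -r line || [ -n "$line" ]; do
            printf '%s\\n' "$line"
        done)
    # A newly created file gets owner-only permissions, since it holds a secret.
    ( umask 077 && printf 'PROXY_PORT=%s\nSSH_PRIVATE_KEY_DATA=%s\n' \
        "$proxy_port" "$key_data" > "$env_file" )
}
```

For example, `write_env_file /path/to/private_key 10123 somefilename.env` writes a two-line file. Optional settings such as VERBOSITY can be appended as further NAME=value lines.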

Then create the VM using GCP’s gcloud command line tool:
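A minimal sketch of that step, assuming the gcloud CLI is installed and authenticated against the target project. The function name is hypothetical, and the instance name and zone are placeholders to adapt; additional flags for machine type, network, or subnet may be needed so that the VM lands inside the private network.

```shell
# Sketch: create a Container-Optimized OS VM that runs the proxy image.
# The instance name and zone are placeholders chosen by the caller.
create_proxy_vm() {
    instance_name="$1"   # a VM name of your choosing
    zone="$2"            # for example us-central1-a
    env_file="$3"        # path to the ENV file, for example somefilename.env
    gcloud compute instances create-with-container "$instance_name" \
        --zone="$zone" \
        --container-image=gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest \
        --container-env-file="$env_file"
}
```

For example, `create_proxy_vm my-proxy-vm us-central1-a somefilename.env`. Because the ENV file contains the private key, it should be deleted or protected once the VM has been created.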

Configuring Private APIs

Scanning a private API requires an API definition for that API:

  • Upload API definitions for any private APIs you want scanned. The hostnames for the APIs must be resolvable by the Private Network Proxy’s container.

  • Notify support@datatheorem.com about the private APIs/hostnames and the Private Network Proxy instance they should be scanned with

    • Data Theorem will configure the analyzer engine to use the Private Network Proxy for those APIs

Security Architecture

Client Security

The security of the container and the Private Network Proxy primarily depends on how and where they are deployed and configured. However, Data Theorem has taken steps to minimize the attack surface of the client and follow container best practices:

  • The Docker image is based on Alpine Linux, which is known for having a significantly smaller footprint compared to other popular distributions. It also relies on Linux hardening features like PIE, and it uses MUSL as its libc instead of GNU’s libc.

  • The service that runs in the container does not run as root. The Docker image runs commands as a normal, non-root user, minimizing what code running in the container can modify, and minimizing concerns about root processes running within containers.

  • The SSH client is configured to only trust a specific server key, instead of the default of prompting or auto-trusting new keys for a new host (via a UserKnownHostsFile and the StrictHostKeyChecking option). The trusted server key can be overridden using the SERVER_PUBKEY ENV variable.

  • The private key used by the client currently must not be encrypted with a password. However, it must also be written to a filesystem for SSH to use it. If you want to ensure that the container itself doesn’t write the key to some kind of persistent storage, you can point WORKSPACE_DIR to some sort of ephemeral storage. Docker usually provides a /dev/shm shared memory filesystem, but you could also point it to your own tmpfs mount.
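As a sketch of the tmpfs option, Docker's --tmpfs flag can mount an in-memory filesystem at a path that WORKSPACE_DIR then points to. The function name and the mount point /workspace are arbitrary choices, and the sketch assumes that Docker's default tmpfs options leave the mount writable for the container's non-root user.

```shell
# Sketch: mount a dedicated tmpfs and point WORKSPACE_DIR at it.
# The mount point /workspace is an arbitrary choice.
run_proxy_with_tmpfs_workspace() {
    proxy_port="$1"   # PROXY_PORT value provided by Data Theorem
    key_data="$2"     # private key with newlines replaced by \n sequences
    docker run -it \
        --tmpfs /workspace \
        -e "WORKSPACE_DIR=/workspace" \
        -e "PROXY_PORT=${proxy_port}" \
        -e "SSH_PRIVATE_KEY_DATA=${key_data}" \
        gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
}
```

The contents of a tmpfs mount are discarded when the container stops, so the key copy written by the container does not outlive it. Passing the key on the command line still exposes it to the host's process list and shell history, which is one reason to prefer a secrets mechanism where the platform offers one.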

Server Security

The server component is also run in a container in a VM on GCP.

  • The VM runs Google’s Container Optimized OS (COS), which is a hardened, minimal OS optimized for deploying individual Docker containers on GCP. GCP also builds its GKE (Kubernetes) offering and other container-based offerings on COS.

  • The VM is firewalled to only publicly expose a non-standard SSHD port

  • The server’s container runs SSHD and nothing else

  • The server’s container runs SSHD as a non-root user

  • SSHD is configured to lock down what SSH features and services it offers

    • In addition to being run as a non-root user, it is configured to disallow root logins and to disable password-based authentication entirely

    • It disables all SSH sub-services except for remote/reverse port forwarding

    • It only allows authorized keys to authenticate

    • It restricts each authorized key to disable running commands, and to disable all services except reverse port forwarding

    • Each authorized key is granted a single port that it can open that SSHD will listen on to receive traffic meant for the proxy running in the client

  • The SSHD server’s private key is kept out of source code. Instead, it is protected using GCP’s Secrets Manager, and it is only accessed in order to deploy it to the VM and provide it to the container.

  • The proxy ports are only accessible to a VPC that is restricted to the security scanner components of Data Theorem’s analyzer engine that need traffic to originate from a static IP address, or that may need to go through the private-network-proxy

  • The VM is hosted on a GCP project separate from any other Data Theorem services, isolating it from other services, and making it easier to manage who has internal access