...
This document explains the theory behind Data Theorem’s Private Network Proxy appliance offering and provides instructions for setting it up.
Use-Case and Architecture
In order to analyze APIs and services on private networks and VPCs, Data Theorem needs a proxy/connector that gives Data Theorem’s analyzer engine access to private networks. Data Theorem’s Private Network Proxy appliance, provided as a Docker image, creates an SSH port forwarding “tunnel” between Data Theorem and private networks to proxy the analyzer engine's network traffic.
...
A Private Network Proxy appliance is deployed within a private network
It establishes an SSH tunnel/port forward back to Data Theorem that connects to a proxy
Data Theorem’s analyzer engine uses the tunnel to connect to the proxy and scan APIs within the private network
...
Setting up a Private Network Proxy Appliance
These instructions are for the initial “v1” implementation. Data Theorem expects to refine and improve the setup flow with future releases.
Summary
Plan out how you want to deploy the appliance
Create an SSH keypair, and provide the public key to DT support.
DT will prepare the private-network-proxy service for the appliance and will provide some additional configuration parameters
Set up a host to run the appliance as a Docker container
Configure the container with the SSH keypair and the other necessary parameters
Upload API definitions for the private APIs, and notify DT support that they should use the Private Network Proxy
Planning out the Deployment
Data Theorem’s Private Network Proxy appliance is provided as a Docker image that can be deployed to a host with access to a private environment that contains APIs that are not publicly addressable. There are several requirements for how this image should be deployed:
...
The appliance’s Docker container image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1
Configuration
The appliance needs an SSH keypair to be able to connect to Data Theorem, identify itself, and associate it with the APIs hosted in its private network. This keypair should be kept safe and secure, but it will need to be configured and accessible to the appliance’s container.
The appliance also requires a port number that Data Theorem must provide. This is an internal configuration parameter that is bound to the appliance, but the appliance must request it due to how SSH’s port forwarding works.
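For intuition only, this requirement resembles OpenSSH remote forwarding (`ssh -R`), where the client names the port that the server should listen on. The sketch below prints such a command without connecting. It is not the appliance's actual invocation: the user name `tunnel-user`, the local proxy port `8080`, and the key path are hypothetical, while the host and server port are the defaults listed under Optional ENV variables.

```shell
# Illustrative dry run: prints an OpenSSH remote-forward command, does not connect.
PROXY_PORT=10123                 # example value; Data Theorem assigns the real one
LOCAL_PROXY_PORT=8080            # hypothetical port of a proxy local to the appliance
SERVER_HOST="private-network-proxy1.securetheorem.com"  # documented default
SERVER_PORT=20422                # documented default
KEY_FILE="/path/to/private_key"  # placeholder path

# -N: run no remote command, only forward ports.
# -R: ask the server to listen on PROXY_PORT and relay each connection back
#     through the tunnel to the local proxy port.
# ExitOnForwardFailure=yes: fail fast if the server cannot bind PROXY_PORT.
TUNNEL_CMD="ssh -N -i ${KEY_FILE} -p ${SERVER_PORT} -o ExitOnForwardFailure=yes -R ${PROXY_PORT}:localhost:${LOCAL_PROXY_PORT} tunnel-user@${SERVER_HOST}"

echo "${TUNNEL_CMD}"
```

Because the listening port is named by the client in the `-R` argument, the client has to know the port that was reserved for it on the server side.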
1. Create an SSH keypair
The keypair should be created without a password to encrypt it, but Data Theorem recommends taking steps to protect the private key – depending on how the container is hosted, you may be able to use a secrets feature of Docker, Kubernetes, etc.
...
Store the private key file somewhere safe until you are ready to configure the Docker container.
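One possible way to create such a keypair with OpenSSH is sketched below. The key type (`ed25519`), the file name, and the comment are assumptions rather than documented requirements, so confirm the accepted key types with Data Theorem support. The second `ssh-keygen` call re-derives the public key with an empty passphrase, which only succeeds if the private key is not encrypted.

```shell
# Generate a passphrase-less keypair. Key type, file name and comment are assumptions.
KEY_DIR="$(mktemp -d)"           # in practice, use a permanent, access-controlled directory
KEY_FILE="${KEY_DIR}/private_network_proxy_key"

if command -v ssh-keygen >/dev/null 2>&1; then
  # -N "" sets an empty passphrase; -q suppresses the progress output.
  ssh-keygen -q -t ed25519 -N "" -C "private-network-proxy" -f "${KEY_FILE}"

  # Confirm the key loads without a passphrase by re-deriving its public key.
  if ssh-keygen -y -P "" -f "${KEY_FILE}" >/dev/null; then
    echo "private key is not passphrase-protected"
  fi
  echo "public key to send: ${KEY_FILE}.pub"
else
  echo "ssh-keygen not found; install OpenSSH first" >&2
fi
```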
2. Send the public key to Data Theorem
Email the public key file to support@datatheorem.com along with a brief explanation of your use-case for the appliance. For example, explain whether this appliance will access a dev or staging environment, or serve some other use-case.
Data Theorem will use the public key to set up the service in our own infrastructure and assign a port to it. We will then reply with the value needed for the PROXY_PORT ENV variable for the container.
3. Configure the container
To configure the container, you must provide it with at least the SSH private key and the PROXY_PORT. The private key can be supplied through a Docker ENV variable, or it can be provided as a file on the filesystem (e.g., if you want to mount the file or a volume, or if you use a secrets manager that can mount files).
The image is available at: gcr.io/datatheorem-public-images/private-network-proxy-client-v1
Required ENV variables
The following ENV variable is always required:
...
SSH_PRIVATE_KEY_DATA – The raw data of the SSH private key for the appliance. It uniquely identifies the appliance and the proxy it provides. The key data must not be encrypted with a password.
If you are unable to specify newline characters when setting this key’s value, the container has special support for replacing a \n character sequence with a newline character. If you use this ENV variable, you can replace any newlines in the private key with a \n. Container orchestration that specifies configuration through YAML files can usually specify newlines, but values specified through command line arguments or .env files may not be able to easily include newlines.
SSH_PRIVATE_KEY_FILE – A path to the private key file within the container. Offered as an alternative to specifying the contents of the private key file directly, this allows you to interoperate with various key/secrets managers, or to mount the key file from a volume or from the host.
The file must be readable by the appliance user within the container because the container does not run as root. The container’s logs will print the user’s UID when it first starts up.
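The \n replacement described for SSH_PRIVATE_KEY_DATA can be checked locally before deploying. The sketch below escapes a fake sample key into a single line and then expands it again to confirm nothing is lost. The sample contents are placeholders, and `printf '%b'` only imitates the container's replacement, whose implementation is not shown in this document.

```shell
# Stand-in for a real key file; the contents are fake placeholder data.
SAMPLE_KEY="$(mktemp)"
printf '%s\n' \
  '-----BEGIN OPENSSH PRIVATE KEY-----' \
  'ZmFrZS1rZXktZGF0YS1saW5lLTE=' \
  '-----END OPENSSH PRIVATE KEY-----' > "${SAMPLE_KEY}"

# Escape: emit each line followed by a literal backslash-n, yielding one line.
ESCAPED="$(while IFS= read -r line; do printf '%s\\n' "${line}"; done < "${SAMPLE_KEY}")"

# Decode: %b expands the \n sequences back into real newlines.
DECODED="$(mktemp)"
printf '%b' "${ESCAPED}" > "${DECODED}"

# The decoded copy should be byte-for-byte identical to the original file.
if cmp -s "${SAMPLE_KEY}" "${DECODED}"; then
  echo "round trip OK"
fi
```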
Optional ENV variables
The following ENV variables should not be explicitly set unless you want to change the behavior of the container.
RETRY_AFTER_DISCONNECT – Whether the appliance should automatically reconnect if something happens to the connection to the server. Defaults to yes. If it is set to no, the container will exit if SSH ever disconnects.
VERBOSITY – How verbose the container’s output is. Defaults to 1. Set this to 0 for almost no output, or 1, 2, or 3 for increasingly verbose levels of output. Only set it to a higher level if you need to do some sort of low-level debugging of the SSH connection.
SERVER_HOST – Override the host that the container tries to connect to. By default, the appliance connects to private-network-proxy1.securetheorem.com.
SERVER_PORT – Override the server port that the container tries to connect to. By default, the appliance initiates an SSH connection to port 20422 on the remote server.
SERVER_PUBKEY – Override the public key of the remote server. The image already contains the remote server’s key, and it will refuse to connect to a server that presents any other key. Setting this overrides the server key that it trusts.
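Before launching the container, it can help to sanity-check the values you intend to pass. The sketch below is a hypothetical pre-flight helper, not part of the appliance: the function name and messages are illustrative, and it assumes the container accepts only the values mentioned in the descriptions above.

```shell
# validate_appliance_env PROXY_PORT [RETRY_AFTER_DISCONNECT] [VERBOSITY]
# Returns 0 when the values look valid, 1 otherwise. Defaults mirror the documented ones.
validate_appliance_env() {
  port="$1"; retry="${2:-yes}"; verbosity="${3:-1}"

  # PROXY_PORT must consist of digits only and be a valid TCP port.
  case "${port}" in
    ''|*[!0-9]*) echo "PROXY_PORT must be a number" >&2; return 1 ;;
  esac
  if [ "${port}" -lt 1 ] || [ "${port}" -gt 65535 ]; then
    echo "PROXY_PORT must be between 1 and 65535" >&2; return 1
  fi

  case "${retry}" in
    yes|no) ;;
    *) echo "RETRY_AFTER_DISCONNECT must be yes or no" >&2; return 1 ;;
  esac

  case "${verbosity}" in
    0|1|2|3) ;;
    *) echo "VERBOSITY must be 0, 1, 2 or 3" >&2; return 1 ;;
  esac
  return 0
}

validate_appliance_env 10123 yes 2 && echo "configuration looks valid"
```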
Examples
Run the appliance directly with Docker using SSH_PRIVATE_KEY_DATA
```shell
PRIVATE_KEY_FILE="/path/to/private_key"
PROXY_PORT=10123

# Replace newline characters with a \n character sequence.
# IFS= and -r keep each line intact; printf avoids echo's shell-dependent escapes.
PRIVATE_KEY_DATA="$(while IFS= read -r line; do printf '%s\\n' "${line}"; done < "${PRIVATE_KEY_FILE}")"

docker run -it \
  -e "PROXY_PORT=${PROXY_PORT}" \
  -e "SSH_PRIVATE_KEY_DATA=${PRIVATE_KEY_DATA}" \
  gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
```
Run the appliance directly with Docker using SSH_PRIVATE_KEY_FILE
```shell
PRIVATE_KEY_FILE="/path/to/private_key"
PROXY_PORT=10123

# Bind-mount the private key into the container. The private key file must be
# readable by the low-rights user within the container, because the appliance
# within the container does not run as root. E.g., you may have to chmod the
# private key file, or grant access to the container user's group/gid on the
# host system.
docker run -it \
  -e "PROXY_PORT=${PROXY_PORT}" \
  -e "SSH_PRIVATE_KEY_FILE=/private_key" \
  --mount "type=bind,src=${PRIVATE_KEY_FILE},dst=/private_key,readonly=true" \
  gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest
```
Run the appliance on GCP’s Compute Engine, within a Container Optimized OS
GCP’s Compute Engine provides a convenient way to spin up a VM that runs a single Docker container without having to worry about container orchestration. This example shows how to launch such a VM, in order to test-launch the appliance.
...
```shell
gcloud --project=${PROJECT} compute instances create-with-container \
  my-vm-name \
  --container-image gcr.io/datatheorem-public-images/private-network-proxy-client-v1:latest \
  --container-env-file=client1_vm.env \
  ...  # Any additional flags for creating the VM, such as network tags, zone, etc.
```
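The env file passed to `--container-env-file` holds one KEY=VALUE pair per line, so the private key has to be collapsed to a single line using the \n sequence described earlier. The helper below is a minimal sketch of one way to produce such a file. The function name, its arguments, and the example values are hypothetical, and the sketch assumes the env file passes backslashes through to the container unchanged.

```shell
# write_env_file KEY_FILE PROXY_PORT OUT_FILE
# Writes the two ENV values as KEY=VALUE lines. Name and arguments are illustrative.
write_env_file() {
  key_file="$1"; proxy_port="$2"; out_file="$3"

  # Collapse the key to one line by replacing each newline with a literal \n.
  key_data="$(while IFS= read -r line; do printf '%s\\n' "${line}"; done < "${key_file}")"

  # The file contains the private key, so new files are created owner-only.
  ( umask 077
    printf 'PROXY_PORT=%s\nSSH_PRIVATE_KEY_DATA=%s\n' "${proxy_port}" "${key_data}" > "${out_file}" )
}

# Example usage with placeholder values; skipped if the key file is absent.
if [ -r "/path/to/private_key" ]; then
  write_env_file "/path/to/private_key" 10123 client1_vm.env
fi
```

Treat the resulting file like the private key itself, and delete it once the VM has been created if you do not need it again.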
Configuring Private APIs
Scanning a private API requires providing an API definition for the private API
Upload API definitions for any private APIs you want scanned. The hostnames must be resolvable from within the Private Network Proxy appliance’s container.
Notify support@datatheorem.com about the private APIs/hostnames and the appliance they should be scanned with
Data Theorem will configure the analyzer engine to use the Private Network Proxy for those APIs
Security Architecture
Client Security
The appliance container’s security primarily depends on how/where it is deployed and configured. However, Data Theorem has taken steps to minimize the attack surface of the client and follow container best practices:
The Docker image is based on Alpine Linux, which is known for having a significantly smaller footprint compared to other popular distributions. It also relies on Linux hardening features like PIE, and it uses MUSL as its libc instead of GNU’s libc.
The service that runs in the container does not run as root. The Docker image runs commands as a normal, non-root user, minimizing what code running in the container can modify, and minimizing concerns about root processes running within containers.
The SSH client is configured to only trust a specific server key, instead of the default of prompting or auto-trusting new keys for a new host (via a UserKnownHostsFile and the StrictHostKeyChecking option). The trusted server key can be overridden using the SERVER_PUBKEY ENV variable.
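As a rough illustration of how this kind of host-key pinning works in OpenSSH generally, the sketch below writes a dedicated known_hosts file with a single entry and builds the matching client options. It prints a command rather than connecting. The key value is a made-up placeholder, and the exact files and format the appliance uses internally are not documented here.

```shell
# Illustrative values; the real server key ships inside the image.
SERVER_HOST="private-network-proxy1.securetheorem.com"   # documented default
SERVER_PORT=20422                                         # documented default
SERVER_PUBKEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPlaceholderKeyDataOnly"  # hypothetical

KNOWN_HOSTS="$(mktemp)"
# known_hosts entries for a non-default port use the [host]:port form.
printf '[%s]:%s %s\n' "${SERVER_HOST}" "${SERVER_PORT}" "${SERVER_PUBKEY}" > "${KNOWN_HOSTS}"

# With StrictHostKeyChecking=yes, OpenSSH refuses any server whose key is not
# listed in the known_hosts file, instead of prompting or trusting on first use.
PINNING_OPTS="-o UserKnownHostsFile=${KNOWN_HOSTS} -o StrictHostKeyChecking=yes"
echo "ssh ${PINNING_OPTS} -p ${SERVER_PORT} ${SERVER_HOST}"
```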
Server Security
The server component is also run in a container in a VM on GCP.
...