Version 2.5 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.
GCP Setup
This document describes how to set up EngFlow Remote Execution in a VM-based deployment on Google Compute Engine (GCE), which is part of Google Cloud Platform (GCP). If you want to use Google Kubernetes Engine (GKE), please see the Kubernetes Setup instead.
Requirements
In addition to the baseline requirements for running the EngFlow Remote Execution Service, you will need administrative access to a GCE project to create VM images and instance templates, to start VMs, and so on.
Automated Setup using Packer and Terraform
We provide a Packer config to create the base image, and a minimal Terraform config to start the cluster. The Terraform config includes a service account, an optional GCS bucket, scheduler and worker pools, an optional auto-scaler, and a TCP load balancer.
Both may need to be adjusted for your desired build environment and production deployment. We recommend first starting a basic cluster and adjusting the configuration only after verifying its operation.
1. Set Up Google Application Credentials
We recommend creating an additional service account to handle base image creation and cluster setup. This service account requires roles to create VM images and configure and start VMs.
You need to provide credentials for this service account to the packer and terraform tools to set up the cluster. Download the credentials as a .json file and set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
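Both packer and terraform read this variable automatically. A quick sanity check after setting it can catch a bad path early (a sketch, using the placeholder path from above):

```shell
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# Warn early if the key file is missing or unreadable, instead of
# failing later in the middle of a packer or terraform run.
if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
  echo "warning: cannot read $GOOGLE_APPLICATION_CREDENTIALS" >&2
fi
```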
2. Create a Base Image
The included Terraform template uses the same base image for both scheduler and worker instances. If you want to use separate images (for example when you want to install extra tools on workers), then you need to adjust the Terraform config.
To create the base image with Packer, you need to install the packer command line tool (installation instructions).
Before generating the image, you should inspect the Packer configuration in setup/gcp/base-image.json and modify it to fit your requirements; however, we recommend first starting a minimal cluster and verifying its operation.
For the base OS we recommend Debian 10. You can use other versions or other distributions too (e.g., Ubuntu 18.04) as long as the openjdk-11-jdk-headless package is installed.
To generate the image, change into the setup/gcp directory and run:
packer build base-image.json
The resulting image will be called engflow-re-image-<year>-<month>-<day>-<hour>-<minute>.
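Because the name embeds the build timestamp, you can locate the image after the build finishes. A sketch, assuming the gcloud CLI is installed and the default name prefix from the template:

```shell
# Image names produced by the template start with this prefix.
prefix="engflow-re-image-"

# List matching images, newest first (no-op if gcloud is not installed).
if command -v gcloud >/dev/null 2>&1; then
  gcloud compute images list --filter="name~^${prefix}" --sort-by=~creationTimestamp
fi

# The name for a build started right now would follow this pattern:
echo "${prefix}$(date +%Y-%m-%d-%H-%M)"
```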
3. Start the Cluster
We provide a Terraform config file in setup/gcp/main.tf, which includes scheduler and worker templates, instance group managers (configured with a fixed size), and an internal TCP load balancer. It also embeds the license file as well as the service config (setup/gcp/config).
At a minimum, you need to configure the following options before starting a cluster:
- project_name - the GCP project name the cluster should run under
- availability_zone - the target zone where the cluster should run
You should edit the setup/gcp/main.tf file and set these to the desired values. Additionally, you can configure the number of schedulers and workers in the cluster as well as configure Terraform remote state.
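For illustration, the two required settings might look like this in setup/gcp/main.tf (a sketch; both values are placeholders, and the exact declaration syntax depends on the shipped template):

```hcl
# Placeholders - substitute your own project and zone.
project_name      = "my-gcp-project"  # GCP project the cluster runs under
availability_zone = "us-central1-a"   # target zone for all instances
```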
Start the cluster using:
terraform init
terraform apply
Once the cluster is running, Terraform prints the IP address of the load balancer, which is the endpoint for Bazel to talk to. Note that the default configuration only allows connections from other machines in the same GCE network.
Note that the TCP load balancer does not distribute requests from a single client coming over a single connection. On the other hand, GCP’s HTTP/2 load balancers do not support TLS client authentication (mTLS).
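With the load balancer address in hand, a Bazel invocation against the cluster looks roughly like this (a sketch; 10.0.0.5 stands in for the IP printed by Terraform, and the exact TLS flags depend on your configuration):

```shell
LB_IP="10.0.0.5"  # placeholder: use the address printed by terraform apply

# --remote_executor points Bazel at the cluster; grpcs:// enables TLS.
# Guarded so the snippet is a no-op on machines without Bazel.
if command -v bazel >/dev/null 2>&1; then
  bazel build --remote_executor="grpcs://${LB_IP}:443" //...
fi
```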
Manual Setup
This section outlines the manual process for setting up a cluster on GCE.
1. Create a Service Account
You have to create a new service account for the Remote Execution cluster. This service account is required on the scheduler and worker instances to perform discovery and to log monitoring data to Google Cloud Operations (formerly StackDriver). It must have at least the following role:
- Compute Viewer aka roles/compute.viewer - used to auto-detect live scheduler and worker instances.
If you enable Google Cloud Operations (formerly StackDriver) monitoring with --enable_stackdriver, the service account requires these additional roles:
- Monitoring Metric Writer aka roles/monitoring.metricWriter - necessary to write metrics to GCO.
- Cloud Trace Agent aka roles/cloudtrace.agent - necessary to write performance traces to GCO; only needed if you set --monitoring_trace_probability to a non-zero value.
If you configure Google Cloud Storage as a backup CAS/Action Cache, then the service account requires these additional roles:
- Storage Object Admin aka roles/storage.objectAdmin - necessary to read, write, and delete objects to and from GCS.
If you use Docker images stored in Google Container Registry (GCR), then the service account may require these additional roles:
- Storage Object Viewer aka roles/storage.objectViewer - necessary to read Docker images from GCR. Note that this is a subset of Storage Object Admin (needed for GCS), so you do not need both.
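The account and role bindings above can also be created from the command line. A sketch using gcloud, where engflow-re and my-gcp-project are placeholder names; add the optional roles the same way if you need them:

```shell
SA_NAME="engflow-re"       # placeholder service account name
PROJECT="my-gcp-project"   # placeholder project id

# Create the account and grant the baseline Compute Viewer role
# (no-op if gcloud is not installed).
if command -v gcloud >/dev/null 2>&1; then
  gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT"
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member="serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com" \
    --role="roles/compute.viewer"
fi
```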
2. Create a Base Image
You can use the same base image for both scheduler and worker instances, or you can create separate images (for example when you want to install extra tools on workers).
- Start a clean VM. We support the following distributions for the base image:
  - Debian 10 (Buster)
  - Ubuntu 18.04 (Bionic Beaver)
- SSH into the VM
- Update the distribution: sudo apt update && sudo apt upgrade
- Install the engflow-re-services.deb package using sudo apt install ./engflow-re-services.deb
- Install the docker.io package using sudo apt install docker.io
- Copy your license file to /etc/engflow/license using sudo mv license /etc/engflow/license
- Copy your configuration file to /etc/engflow/config using sudo mv config /etc/engflow/config
- You can customize the base image at this point if you need additional software installed. However, we recommend using Docker images for customization rather than running actions directly on the underlying VM.
- Pull the Docker image you plan to use, e.g., docker pull gcr.io/cloud-marketplace/google/rbe-ubuntu16-04. Note: the RBE Docker images require authenticating with gcloud first: gcloud auth configure-docker
- Stop the VM
- Create an image snapshot of the VM
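The last two steps can also be done with gcloud. A sketch with placeholder names, assuming the VM's boot disk kept the default name (same as the VM):

```shell
VM="base-image-vm"       # placeholder: the VM you just configured
ZONE="us-central1-a"     # placeholder zone
IMAGE="engflow-re-base"  # placeholder image name

# Stop the VM, then snapshot its boot disk into a reusable image
# (no-op if gcloud is not installed).
if command -v gcloud >/dev/null 2>&1; then
  gcloud compute instances stop "$VM" --zone="$ZONE"
  gcloud compute images create "$IMAGE" \
    --source-disk="$VM" --source-disk-zone="$ZONE"
fi
```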
3. Create Instance Templates
You need to create at least two templates - one for the worker instances and one for the scheduler instances. Depending on your intended cluster layout, you may need multiple worker templates. The steps to create instance templates are very similar in all cases.
- Create a new Instance Template
- Give it a descriptive name, e.g., worker-template-1
- Select a VM configuration, following the baseline requirements:
  - Scheduler: Quad-core, 4 GB RAM
  - Worker: Single-Core, 1 GB RAM
- Select the VM image created previously; set the disk size following the baseline requirements
- Select the service account created previously
- Management -> Labels: engflow_re_cluster_name, default
- Management -> Startup Script:
  - Worker:
    #!/bin/bash
    systemctl start worker
  - Scheduler:
    #!/bin/bash
    systemctl start scheduler
Do not enable HTTP or HTTPS in the firewall configuration unless you want to expose the cluster to the public internet.
4. Start the Cluster
You need to start both schedulers and workers using the previously created templates; you can start them in any order.
- Create a new Instance Group from one of the templates
- Configure auto-scaling or set a fixed number of instances
- Configure the health check: TCP to the internal port --private_port
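If you prefer the command line, the health check can be created with gcloud. A sketch; engflow-scheduler-hc is a placeholder name, and 8080 stands in for whatever value you pass to --private_port:

```shell
PORT=8080  # placeholder: must match the service's --private_port value

# TCP health check against the internal port (no-op without gcloud).
if command -v gcloud >/dev/null 2>&1; then
  gcloud compute health-checks create tcp engflow-scheduler-hc --port="$PORT"
fi
```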
5. Create a TCP Load Balancer
- Create a new TCP Load Balancer
- Backend: select the scheduler instance group
- Frontend: TCP, port 443