Frequently Asked Questions

How do I capture a Bazel profile?

Bazel 5.x:

bazel build --generate_json_trace_profile --profile=/tmp/prof.json.gz //foo:bar

Bazel 4.x:

bazel build --experimental_generate_json_trace_profile --profile=/tmp/prof.json.gz //foo:bar

The resulting file (/tmp/prof.json.gz) can be inspected in Chrome’s profile viewer (chrome://tracing).

Note: if /tmp/prof.json.gz exists, Bazel overwrites it. As of Bazel 5.1.0 it’s not possible to append to an existing profile.

If your main build tool is Make or Ninja and it invokes Bazel multiple times per build, you need to save the profile file after every Bazel run, or pass a different --profile path on each invocation. Otherwise Bazel will keep overwriting the same profile.

One possible solution is to replace the real Bazel binary with a script that calls the real Bazel and appends --profile=/tmp/prof-$(date +%s).json.gz.
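
For example, a minimal wrapper script might look like this (a sketch; the path to the real Bazel binary is an assumption and depends on your installation):

#!/bin/bash
# Hypothetical wrapper: forward all arguments to the real Bazel binary and
# append a timestamped --profile flag so each invocation writes a new file.
# Depending on your Bazel version you may also need to append
# --generate_json_trace_profile (see above).
exec /usr/local/bin/bazel-real "$@" --profile="/tmp/prof-$(date +%s).json.gz"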

Why do I get PERMISSION_DENIED when trying to write to the cache?

The default service configuration allows remote execution but not remote caching. This is more secure.

A user with write access to the cache can write anything, including malicious code and binaries, which can then be returned to other users on cache lookups.

By comparison, remotely executed actions are typically sandboxed on a remote machine, which the user does not have direct control over. Since the cache key is a cryptographic hash of all input files, the command line, and the environment variables, it’s significantly harder to inject malicious data into the action cache.

To allow remote cache access, you need to adjust the service's permission settings depending on your authentication configuration (--client_auth).

If you are using --client_auth=gcp_rbe, then you need to adjust permissions in the GCP IAM console.

Otherwise use --principal_based_permissions to configure per-user permissions.
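
For example, a hypothetical per-user entry might look like this (the principal name is illustrative and depends on your authentication setup; it assumes the same <principal>-><role> format as the wildcard example below):

--principal_based_permissions=alice@example.com->admin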

If you disable client authentication (--client_auth=none), you can add the following line to your configuration:

--principal_based_permissions=*->admin

When running on GCP, why can’t it pull my image from gcr.io?

While the GCP workers are authenticated with gcloud out of the box, images uploaded to gcr.io are not world-readable by default. You should check that the EngFlow RE role account has access to the image (or give it access if necessary).

See the Google Container Registry documentation for more details: https://cloud.google.com/container-registry/docs/access-control

The EngFlow RE role account is typically named: engflow-re-bot@<project>.iam.gserviceaccount.com
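
For example, assuming the image is stored in the registry's default storage bucket, one way to grant read access is (a sketch; the bucket name follows the standard artifacts.<project>.appspot.com convention and may differ for regional registries such as eu.gcr.io):

gsutil iam ch \
  serviceAccount:engflow-re-bot@<project>.iam.gserviceaccount.com:objectViewer \
  gs://artifacts.<project>.appspot.com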

What if I get "clone": Operation not permitted from the sandbox?

Sandboxed execution uses clone(2), and may fail if the current user has insufficient privileges to use user namespaces.

If you run the Remote Execution service on Kubernetes or in Docker containers, or on a host where unprivileged user namespaces are disabled, sandboxed actions may fail with this error:

external/bazel/src/main/tools/linux-sandbox.cc:153: "clone": Operation not permitted

If you run the service in a container, you can try running it in privileged mode (--privileged).
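
For example (a sketch; the image name and worker arguments are placeholders for your actual deployment):

docker run --privileged <engflow-worker-image> <worker-args>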

You can also try enabling unprivileged user namespaces in the kernel (Debian):

sysctl -w kernel.unprivileged_userns_clone=1
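
Note that this setting does not survive a reboot. To make it persistent, you can also write it to a sysctl configuration file (the file name below is an arbitrary example):

echo 'kernel.unprivileged_userns_clone=1' | sudo tee /etc/sysctl.d/99-userns.conf
sudo sysctl --system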

As a last resort, you can enable local execution on the server side (--allow_local) and disable sandboxed execution on the client side (sandboxAllowed).

Why do actions hang for 5 minutes then fail with RESOURCE_EXHAUSTED: Max queue time exhausted?

The rule was probably requesting an Executor Pool that doesn't exist. You can verify this by overriding --max_queue_time_in_empty_pool (for example to 30s), retrying the build, and checking whether the same action fails after exactly that timeout.

How can I force a full rebuild to measure clean build performance?

Run bazel clean, then build again with --noremote_accept_cached.
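
For example, using the target from earlier in this FAQ:

bazel clean
bazel build --noremote_accept_cached //foo:bar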

Why do I get “403 Forbidden” errors from S3?

If you see an error like this on the client side:

/home/foo/.cache/bazel/_bazel_foo/84bdc474e377f556da900f3f344494fb/external/com_google_protobuf/BUILD:161:11: C++ compilation of rule '@com_google_protobuf//:protobuf_lite' failed (Exit 34): java.io.IOException: io.grpc.StatusRuntimeException: INTERNAL: Permission error while looking for 'blobs/ac/12d38991349d6297f807262fbf301ff178fdd178a4eed14c7d7df1fdbb955f89' in bucket '<BUCKET-NAME>' in region 'eu-central-1' (status 403). Details: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 29BB9C279A99CFFD; S3 Extended Request ID: <REDACTED>; Proxy: null), S3 Extended Request ID: <REDACTED>
--- cut here ---8<--------8<--------8<--------8<--- cut here ---
java.nio.file.AccessDeniedException: Permission error while looking for 'blobs/ac/12d38991349d6297f807262fbf301ff178fdd178a4eed14c7d7df1fdbb955f89' in bucket '<BUCKET-NAME>' in region 'eu-central-1' (status 403). Details: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 29BB9C279A99CFFD; S3 Extended Request ID: <REDACTED>; Proxy: null), S3 Extended Request ID: <REDACTED>
        at com.engflow.re.storage.aws.S3Client$RealApiWrapper.permissionError(S3Client.java:566)
        at com.engflow.re.storage.aws.S3Client$RealApiWrapper.download(S3Client.java:507)
        at com.engflow.re.storage.aws.S3Client.getDownloadStream(S3Client.java:257)
        at com.engflow.re.storage.ExternalStorage.lambda$getActionResult$5(ExternalStorage.java:492)
        <abbreviated>
--- cut here ---8<--------8<--------8<--------8<--- cut here ---

then you need to add S3 permissions to the IAM role policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:List*"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<BUCKET-NAME>",
                "arn:aws:s3:::<BUCKET-NAME>/*"
            ]
        }
    ]
}

What if my C++ compilation fails with missing dependency declarations errors?

If you see an error like this:

ERROR: /home/foo/stuff/bazel/third_party/zlib/BUILD:25:19: undeclared inclusion(s) in rule '//third_party/zlib:zlib_checked_in': this rule is missing dependency declarations for the following files included by 'third_party/zlib/trees.c':
  '/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h'

The culprit is that the selected C++ toolchain’s cc_toolchain_config.cxx_builtin_include_directories is missing that directory. Add /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/ to that list.

What if my Java tests fail with SecurityException: Can't read cryptographic policy directory?

Bazel had a bug (https://github.com/bazelbuild/bazel/issues/9189) before release 4.0.0 that caused the security configuration files to be excluded from the uploaded JDK when using --javabase=@remotejdk11_linux//:jdk, --javabase=@remotejdk15_macos//:jdk, or similar. We therefore recommend upgrading to Bazel 4.0.0 or later.

How large of a disk should I attach to my EngFlow Virtual Machines?

We recommend giving schedulers 16GB of disk space. This need only be large enough to hold the operating system plus a little extra for logs and other resources.

Workers, however, should be given roughly 50GB per executor. So, for example, if you have set your worker config to --worker_config=4*cpu=2, you should give each worker machine a disk of at least 200GB (50GB * 4 executors).

Bazel: What if I get Invalid action cache entry errors?

If you see an error similar to this:

Invalid action cache entry b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c: expected output foo/libbar-class.jar does not exist.

then it’s due to a Bazel 5.1 bug: https://github.com/bazelbuild/bazel/pull/15151

Downgrading to 5.0 or upgrading beyond 5.1.0 should fix the issue.

Chromium/Goma: What actions run remotely?

As of 2022-02-01, only C++ compilation actions run remotely; everything else is built locally.

Chromium/Goma: Can I debug the binaries built remotely?

Yes. Goma automatically downloads all build output files from the remote execution cluster.

Chromium/Goma: How can I download all build outputs? How can I download debug symbols?

Goma does this automatically: it downloads all build output files from the remote execution cluster.

Chromium/Goma: What are typical build times?

As of 2022-02-01, a clean build of Chromium for Linux x86-64 on a typical trial cluster takes about 1 hour; a fully cached build takes about 15 minutes.

Chromium/Goma: How to build for macOS? Windows? Android? x86-32?

Your Remote Execution cluster supports all common target platforms except for iOS (due to license restrictions). Should one be missing, please let us know!

Chromium/Goma: Can I use a custom CC wrapper?

No.

Chromium/Goma: If two developers have slightly different gn args, will they be able to share cache hits?

Maybe. As long as the differences aren’t substantial (e.g. different target_cpu settings), these developers can expect cross-user cache hits.

GN args that modify the Clang command line (e.g., optimization level or debug symbol output) or change generated header files will prevent cross-user caching.

Chromium/Goma: If two developers clone the source tree on different paths, will they be able to share cache hits?

Maybe. As long as everything below their source tree looks the same (including the output directory they ran gn args for), and the action is relocatable, these developers can expect cross-user cache hits.

Relocatable actions are independent of the current working directory and do not depend on the absolute paths of input files. For example, C++ compilation actions are usually relocatable, so users can expect to share cache hits for them.

Chromium/Goma: The Goma client crashed, how to fix it? Is this normal?

Run goma_ctl restart to fix it.

This should be rare. It could happen if the compiler_proxy crashes and closes the IPC channels, crashing the gomacc processes.

Chromium/Goma: The build was hanging for a minute, then continued. Is this normal?

If you see slow builds, always look at the Active Tasks on the Goma dashboard (http://localhost:8088) to learn more.

Maybe the remote execution cluster was under heavy load, or maybe the build was running non-remoteable slow actions locally (e.g., //third_party/devtools-frontend).

Chromium/Goma: What do the Task Stats mean on the Goma dashboard?

In http://localhost:8088/#task-stats you may see:

Remote VS Local
both run     3534
goma win     23
local win    3511

Goma runs some of the remotable actions both locally and remotely. The actions race with each other; one finishes first (“wins”), the other is cancelled.

Goma does this to ensure you get the fastest compilation possible. Most actions are so fast that it’s quicker to execute them locally than to wait for the network overhead of remote execution. (The power of remote execution comes from parallelism and caching.) This is the reason for the “race”, and explains the “both run” and “win” lines.

You may wonder why local actions win (almost) all races against remote. The likely culprit is that the remote cache is still cold: perhaps you checked out a new branch and need to recompile base libraries that everything else depends on.

Subsequent builds should be faster and you should see higher “goma win” rates as you get more cache hits.

2022-04-28