refactor!(auth): drop SSH handshake secret in favor of mTLS #1274

Draft
TaylorMutch wants to merge 1 commit into main from tmutch/ssh-handshake-helm-fix

@TaylorMutch
Collaborator

Summary

Removes the misnamed `OPENSHELL_SSH_HANDSHAKE_SECRET` (`x-sandbox-secret`) mechanism. It did not authenticate SSH — SSH flows over the `RelayStream` gRPC RPC and is gated by mTLS plus the supervisor's Unix-socket permissions — it only gated a small set of sandbox→gateway control-plane RPCs. Production deployments already enforce mTLS on that channel, so the shared secret was redundant. Replaces the secret check with an mTLS-presence marker so sandbox-class callers are recognized by the absence of an OIDC Bearer token on a verified gRPC channel.
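To illustrate the recognition rule described above, here is a minimal sketch, not the code in this PR. It assumes the channel has already passed mTLS verification and that the only input is the raw `authorization` header value. `CallerClass` and `classify_caller` are hypothetical names, and the real `is_sandbox_caller` / `mark_sandbox_caller` helpers may work differently.

```rust
/// Hypothetical classification of a request that arrived on an
/// mTLS-verified gRPC channel. Not the types used in the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum CallerClass {
    /// An OIDC Bearer token is present. It still has to be validated
    /// by the normal OIDC path; this sketch does not do that.
    Bearer(String),
    /// No Bearer token: the caller is treated as sandbox-class.
    Sandbox,
}

/// Classify a caller from the raw `authorization` header value, if any.
///
/// Simplification (assumption): any header that is missing, empty, or
/// not of the form `Bearer <token>` is treated as "Bearer absent".
pub fn classify_caller(authorization: Option<&str>) -> CallerClass {
    match authorization.and_then(|value| value.strip_prefix("Bearer ")) {
        Some(token) if !token.trim().is_empty() => {
            CallerClass::Bearer(token.trim().to_owned())
        }
        _ => CallerClass::Sandbox,
    }
}
```

With this sketch, `classify_caller(None)` yields `CallerClass::Sandbox`, while `classify_caller(Some("Bearer abc"))` yields `CallerClass::Bearer("abc".to_owned())`. The point is that trust comes from the verified channel, and the header only selects which class of caller is on the other end.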

Related Issue

Refs OS-174.

Changes

  • Server auth: deleted `SANDBOX_SECRET_METHODS` interceptor, `validate_sandbox_secret`, and `mark_sandbox_secret_authenticated` from `auth/oidc.rs`. New `is_sandbox_caller` / `mark_sandbox_caller` plus `AUTH_SOURCE_SANDBOX` marker. `AuthGrpcRouter` now marks sandbox-class methods as sandbox-caller and routes dual-auth methods on Bearer-present vs Bearer-absent. `validate_sandbox_secret_update` → `validate_sandbox_caller_update` in `grpc/policy.rs`.
  • Server CLI/startup: removed `--ssh-handshake-secret` and `--ssh-handshake-skew-secs` (gateway + drivers). Removed the empty-secret startup gate.
  • Sandbox client: dropped `SandboxSecretInterceptor` and the env scrub. `OpenShellClient`/`InferenceClient` now use bare `Channel`. mTLS via `connect_channel` is unchanged.
  • Drivers: K8s, Podman, and VM no longer inject the secret or skew env vars. Podman driver no longer creates/deletes a per-sandbox Podman secret. Tests rewritten to assert absence.
  • Core config: removed `ssh_handshake_secret`, `ssh_handshake_skew_secs`, `DEFAULT_SSH_HANDSHAKE_SKEW_SECS`, and the builder methods.
  • Helm chart: deleted `ssh-handshake-secret-hook.yaml`; removed `server.sshHandshakeSecretName`, the `sshHandshake:` block, and the `OPENSHELL_SSH_HANDSHAKE_SECRET` env entry from the StatefulSet.
  • Docs: removed handshake refs from gateway man pages, RPM `CONFIGURATION.md` + `init-gateway-env.sh`, podman README, and `debug-openshell-cluster` skill. Updated `docs/reference/gateway-auth.mdx` to describe the mTLS-based sandbox-caller model.
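The routing change in the server auth bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the `AuthGrpcRouter` implementation: the six method names come from the commit message on this PR, while `SANDBOX_CLASS_METHODS`, `is_sandbox_class`, `Scope`, and `dual_auth_scope` are hypothetical names, and matching on bare method names (rather than full gRPC paths) is a simplification.

```rust
/// Sandbox-class methods as listed in the commit message.
/// Assumption: matched by bare method name, not full gRPC path.
pub const SANDBOX_CLASS_METHODS: [&str; 6] = [
    "ReportPolicyStatus",
    "PushSandboxLogs",
    "GetSandboxProviderEnvironment",
    "SubmitPolicyAnalysis",
    "GetSandboxConfig",
    "GetInferenceBundle",
];

/// Returns true if `method` is one of the sandbox-class methods, which
/// accept callers without a Bearer token on an mTLS-verified channel.
pub fn is_sandbox_class(method: &str) -> bool {
    SANDBOX_CLASS_METHODS.iter().any(|known| *known == method)
}

/// Hypothetical scope granted on a dual-auth method.
#[derive(Debug, PartialEq, Eq)]
pub enum Scope {
    /// Bearer present: full-scope CLI access (after token validation).
    FullCli,
    /// Bearer absent: sandbox-restricted scope, which the PR enforces
    /// through `validate_sandbox_caller_update`.
    SandboxRestricted,
}

/// Pick the scope for a dual-auth method from Bearer presence alone.
pub fn dual_auth_scope(has_bearer: bool) -> Scope {
    if has_bearer {
        Scope::FullCli
    } else {
        Scope::SandboxRestricted
    }
}
```

For example, `is_sandbox_class("GetSandboxConfig")` is true, and `dual_auth_scope(false)` returns `Scope::SandboxRestricted`. The sketch deliberately leaves out token validation and the mTLS check itself, both of which happen elsewhere.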

Breaking changes

  • `--ssh-handshake-secret` / `OPENSHELL_SSH_HANDSHAKE_SECRET` and `--ssh-handshake-skew-secs` / `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS` are removed from the gateway, sandbox, and all driver binaries.
  • The Helm chart no longer manages the `openshell-ssh-handshake` Secret. After upgrade, operators may delete the orphan.
  • Deployments using `--disable-gateway-auth` must enforce caller authentication at the fronting proxy, since the gateway no longer validates a per-request secret on sandbox-class methods.

Testing

  • `mise run pre-commit` passes
  • `mise run test` passes (workspace unit tests)
  • `mise run ci` passes (lint + checks + tests)
  • `mise run e2e` — could not run locally (mise tool-install hit GitHub API rate limits during cross-compile setup; unrelated to this change). Needs CI / fresh env.

@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

The OPENSHELL_SSH_HANDSHAKE_SECRET / x-sandbox-secret mechanism was
misnamed: it does not authenticate SSH (which flows over the
RelayStream gRPC RPC and is gated by mTLS plus supervisor Unix-socket
permissions). It only gated a small set of sandbox-to-gateway
control-plane RPCs, and production deployments already enforce mTLS
on that channel — so the shared secret was redundant.

Replace the secret check with an mTLS-presence marker. Sandbox-class
methods (ReportPolicyStatus, PushSandboxLogs,
GetSandboxProviderEnvironment, SubmitPolicyAnalysis, GetSandboxConfig,
GetInferenceBundle) accept callers without a Bearer token; the gRPC
mTLS handshake is the trust boundary. Dual-auth methods treat
Bearer-present as full-scope CLI access and Bearer-absent as
sandbox-restricted scope via validate_sandbox_caller_update.

Drops the secret from all drivers (K8s, Podman, VM), the sandbox gRPC
interceptor, the Helm chart (values + pre-install hook + StatefulSet
env), the RPM bootstrap script, the man pages, and the
debug-openshell-cluster skill. Also removes the never-read
ssh_handshake_skew_secs flag and config field.

BREAKING CHANGE: --ssh-handshake-secret / OPENSHELL_SSH_HANDSHAKE_SECRET
and --ssh-handshake-skew-secs / OPENSHELL_SSH_HANDSHAKE_SKEW_SECS are
removed from the gateway, sandbox, and all driver binaries. The
openshell-ssh-handshake K8s Secret is no longer managed by the chart;
operators may delete the orphan. Deployments using
--disable-gateway-auth must enforce caller authentication at the
fronting proxy, since the gateway no longer validates a per-request
secret on sandbox-class methods.

Refs OS-174.

TaylorMutch force-pushed the tmutch/ssh-handshake-helm-fix branch from 0f5b57c to 72be22f on May 8, 2026 23:13