Skip to content

feat(rpm): replace init-pki.sh with openshell-gateway generate-certs#1258

Open
TaylorMutch wants to merge 1 commit intomainfrom
tmutch/rpm-certgen-cutover
Open

feat(rpm): replace init-pki.sh with openshell-gateway generate-certs#1258
TaylorMutch wants to merge 1 commit intomainfrom
tmutch/rpm-certgen-cutover

Conversation

@TaylorMutch
Copy link
Copy Markdown
Collaborator

@TaylorMutch TaylorMutch commented May 7, 2026

Summary

RPM cutover: the gateway systemd user unit's ExecStartPre now invokes openshell-gateway generate-certs --output-dir %S/openshell/tls instead of the 197-line deploy/rpm/init-pki.sh openssl wrapper. One PKI implementation, one file layout, real test coverage.

Builds on #1257, which landed the generate-certs subcommand and its --output-dir local mode.

Changes

  • Spec rewire (openshell.spec):
    • ExecStartPre=/usr/bin/openshell-gateway generate-certs --output-dir %S/openshell/tls (was init-pki.sh %S/openshell/tls).
    • Removed the install -pm 0755 deploy/rpm/init-pki.sh ... line and the matching %files gateway entry.
  • deploy/rpm/init-pki.sh deleted (-197 lines).
  • pki.rs::DEFAULT_SERVER_SANS gains host.containers.internal so podman parity is built-in. Docker (host.docker.internal) and Kubernetes (cluster.local DNS) were already covered. The RPM systemd unit needs no extra --server-san flag; k8s Helm chart also picks it up automatically.
  • Docs: man page (deploy/man/openshell-gateway.8.md), RPM CONFIGURATION.md, and the comment in init-gateway-env.sh all point at the new entrypoint.

The output paths, file modes, and CLI auto-discovery copy are byte-for-byte identical to what init-pki.sh produced — every OPENSHELL_TLS_* / OPENSHELL_PODMAN_TLS_* path in the unit stays valid without edits.

Testing

Local binary smoke

$ openshell-gateway generate-certs --output-dir /tmp/test/state/openshell/tls
INFO openshell_server::certgen: PKI files created. dir=/tmp/test/state/openshell/tls

$ ls -la /tmp/test/state/openshell/tls/{ca.crt,ca.key,server,client}/...
-rw-r--r--   ca.crt
-rw-------   ca.key
-rw-r--r--   server/tls.crt
-rw-------   server/tls.key
-rw-r--r--   client/tls.crt
-rw-------   client/tls.key

$ openssl x509 -in tls/server/tls.crt -noout -ext subjectAltName
DNS:openshell, DNS:openshell.openshell.svc, DNS:openshell.openshell.svc.cluster.local,
DNS:localhost, DNS:host.docker.internal, DNS:host.containers.internal, IP Address:127.0.0.1

$ openshell-gateway generate-certs --output-dir /tmp/test/state/openshell/tls
INFO openshell_server::certgen: PKI files already exist, skipping.

Helm cluster regression check

Deleted both Secrets, redeployed via Skaffold, confirmed:

  • Both kubernetes.io/tls Secrets created with 3 keys each, chain verifies via openssl verify.
  • Server cert SANs include the new host.containers.internal alongside the existing 6 — no duplicates.
  • StatefulSet stabilized, no regression.

Pre-commit

  • mise run pre-commit passes (clippy -D warnings, fmt, markdownlint, tests).
  • pki.rs::tests::build_server_sans_includes_defaults_and_extras continues to pass — uses DEFAULT_SERVER_SANS.len(), auto-adapts.

What this PR does not test locally

  • COPR/Packit RPM build. Triggered automatically on PR; will surface any spec-level issues.
  • systemd ExecStartPre execution on a real Fedora host. Plan: install the COPR-built RPM in a Fedora VM (or podman run --systemd=always fedora) and run systemctl --user enable --now openshell-gateway.service, then verify the 6 PEMs land under ~/.local/state/openshell/tls/.

Checklist

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@TaylorMutch TaylorMutch force-pushed the tmutch/rpm-certgen-cutover branch from bea306f to 6c7d354 Compare May 8, 2026 04:36
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@maxamillion
Copy link
Copy Markdown
Collaborator

@TaylorMutch I very much like where this is going, thank you for working on this. Let me know when you're ready to take it out of draft and have a review done.

Cuts the RPM gateway over to the unified Rust certgen path. The systemd
user unit's first ExecStartPre now invokes:

  /usr/bin/openshell-gateway generate-certs --output-dir %S/openshell/tls

producing the same six-PEM layout init-pki.sh built (ca.{crt,key},
server/tls.{crt,key}, client/tls.{crt,key}) and the same CLI mTLS copy
under $XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/. None of the
OPENSHELL_TLS_* / OPENSHELL_PODMAN_TLS_* paths in the unit change.

Adds host.containers.internal to the gateway's built-in SAN list so
podman containers reaching their host validate cleanly with no
per-deployment --server-san flag. Docker (host.docker.internal) and
Kubernetes (cluster.local DNS) were already covered.

Drops 197 lines of openssl shell, the install/file lines for the script
itself, and updates the docs (man page, RPM CONFIGURATION.md, env-file
generator comment) to point at the new entrypoint. The %S state dir,
unit security hardening, and consumer paths are untouched.
@TaylorMutch TaylorMutch force-pushed the tmutch/rpm-certgen-cutover branch from 6c7d354 to 267fe24 Compare May 8, 2026 17:03
@TaylorMutch TaylorMutch marked this pull request as ready for review May 8, 2026 17:06
@TaylorMutch TaylorMutch added test:e2e Requires end-to-end coverage labels May 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Label test:e2e applied for 267fe24. Open the existing run and click Re-run all jobs to execute with the label set. The E2E Gate check on this PR will flip green automatically once the run finishes.

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Label test:e2e applied for 267fe24. Open the existing run and click Re-run all jobs to execute with the label set. The E2E Gate check on this PR will flip green automatically once the run finishes.

Copy link
Copy Markdown
Collaborator

@maxamillion maxamillion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this, I like this approach much better than the shell script and I think the error behavior makes more sense because it's explicit instead of the shell script quietly cleaning up when the certs are half setup

@TaylorMutch
Copy link
Copy Markdown
Collaborator Author

FYI @maxamillion, @drew and I hit some issues on this front. Will double back on this next week once we get this more well tested. Let us know if you are able to test it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants