feat(db) resource version cas #1292
Open
derekwaynecarr wants to merge 2 commits into NVIDIA:main
Switch from Docker Mount API to string-based binds API with :z labels to enable SELinux-enforcing systems to access bind-mounted files. The :z option applies a shared SELinux content label, allowing containers to read supervisor binaries and TLS certificates. Docker safely ignores :z on non-SELinux systems.

Signed-off-by: Derek Carr <decarr@redhat.com>
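As a rough illustration of the string-based bind format this commit describes, the sketch below builds a `host:container[:options]` bind string with the `z` label appended. The function name `bind_spec`, the `read_only` flag, and the example paths are hypothetical and are not taken from the PR.

```rust
/// Build a Docker bind string of the form `host:container:options`.
/// The `z` option asks Docker to apply a shared SELinux content label.
/// Hypothetical helper: the PR's actual code may be structured differently.
pub fn bind_spec(host_path: &str, container_path: &str, read_only: bool) -> String {
    // Options are comma-separated; `ro` is included only for read-only binds.
    let options = if read_only { "ro,z" } else { "z" };
    format!("{}:{}:{}", host_path, container_path, options)
}
```

For example, a read-only bind of a certificate directory would come out as `/etc/tls:/certs:ro,z` under these assumptions.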
Add Compare-And-Swap (CAS) infrastructure for safe concurrent object mutations and migrate critical paths to use it. This prevents lost updates in HA deployments with multiple gateway replicas.

Core infrastructure:
- Add resource_version field to ObjectMeta proto (uint64)
- Add resource_version column to objects table (SQLite: INTEGER, Postgres: BIGINT)
- Add WriteCondition enum (MustCreate, MatchResourceVersion, Unconditional)
- Add PersistenceError::Conflict variant for version mismatch
- Add Store::put_if() and Store::delete_if() CAS methods
- Add Store::update_message_cas() with bounded retry for mutations
- Implement CAS operations for both SQLite and Postgres backends
- Hydrate resource_version on all typed reads (defaults to 1 for backfill)

Migrations:
- Migrate policy mutations to CAS (draft operations, settings)
- Migrate provider updates to CAS (credentials, config merging)
- Migrate sandbox updates to CAS (phase transitions, status reconciliation)
- Migrate compute status updates to CAS (driver watch event handling)

Database migrations backfill existing rows with resource_version = 1. CAS updates increment atomically: resource_version = resource_version + 1.

gRPC handlers map PersistenceError::Conflict to ABORTED status code to signal clients to retry with fresh data. Server-side retries use bounded retry (5 attempts) with fresh reads on each iteration.

Test coverage includes concurrent update scenarios and handler-level resource_version round-trip tests.

Signed-off-by: Derek Carr <decarr@redhat.com>
Summary
Add Compare-And-Swap (CAS) infrastructure for safe concurrent object mutations
and migrate critical paths to use it. This prevents lost updates in HA
deployments with multiple gateway replicas.
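Core infrastructure:
- Add resource_version field to ObjectMeta proto (uint64)
- Add resource_version column to objects table (SQLite: INTEGER, Postgres: BIGINT)
- Add WriteCondition enum (MustCreate, MatchResourceVersion, Unconditional)
- Add PersistenceError::Conflict variant for version mismatch
- Add Store::put_if() and Store::delete_if() CAS methods
- Add Store::update_message_cas() with bounded retry for mutations
- Implement CAS operations for both SQLite and Postgres backends
- Hydrate resource_version on all typed reads (defaults to 1 for backfill)

Migrations:
- Migrate policy mutations to CAS (draft operations, settings)
- Migrate provider updates to CAS (credentials, config merging)
- Migrate sandbox updates to CAS (phase transitions, status reconciliation)
- Migrate compute status updates to CAS (driver watch event handling)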
Database migrations backfill existing rows with resource_version = 1.
CAS updates increment atomically: resource_version = resource_version + 1.
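
To make the write conditions and version increment concrete, here is a minimal in-memory sketch of `put_if` semantics. It is not the PR's SQLite or Postgres implementation. `MemStore`, `StoredObject`, the `u64` carried by `MatchResourceVersion`, the unit-style `Conflict` variant, and the choice to start newly created objects at version 1 are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Precondition checked before a write (variant names follow the PR summary;
/// the u64 payload on MatchResourceVersion is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteCondition {
    /// Succeed only if no object exists under the key.
    MustCreate,
    /// Succeed only if the stored version equals the given one.
    MatchResourceVersion(u64),
    /// Write regardless of the current state.
    Unconditional,
}

/// Simplified stand-in for the PR's error type.
#[derive(Debug, PartialEq, Eq)]
pub enum PersistenceError {
    Conflict,
}

/// Hypothetical stored row: a payload plus its version counter.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredObject {
    pub payload: String,
    pub resource_version: u64,
}

/// Hypothetical in-memory store used only to illustrate the semantics.
#[derive(Default)]
pub struct MemStore {
    objects: HashMap<String, StoredObject>,
}

impl MemStore {
    /// Write `payload` under `key` if `cond` holds; return the new version.
    pub fn put_if(
        &mut self,
        key: &str,
        payload: &str,
        cond: WriteCondition,
    ) -> Result<u64, PersistenceError> {
        let current = self.objects.get(key).map(|o| o.resource_version);
        let allowed = match (cond, current) {
            (WriteCondition::MustCreate, existing) => existing.is_none(),
            (WriteCondition::MatchResourceVersion(v), Some(cur)) => v == cur,
            (WriteCondition::MatchResourceVersion(_), None) => false,
            (WriteCondition::Unconditional, _) => true,
        };
        if !allowed {
            return Err(PersistenceError::Conflict);
        }
        // Assumption: new objects start at 1; updates do version + 1.
        let next = current.map_or(1, |v| v + 1);
        self.objects.insert(
            key.to_string(),
            StoredObject { payload: payload.to_string(), resource_version: next },
        );
        Ok(next)
    }

    /// Read an object together with its current version.
    pub fn get(&self, key: &str) -> Option<&StoredObject> {
        self.objects.get(key)
    }
}
```

In a SQL backend the same check and increment would typically happen in a single conditional UPDATE statement, so that the comparison and the write cannot be interleaved with another replica's write.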
gRPC handlers map PersistenceError::Conflict to the ABORTED status code
to signal that clients should retry with fresh data. Server-side retries
are bounded (5 attempts), with a fresh read on each iteration.
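
The retry behavior described above can be sketched as a generic read-mutate-write loop. This is an illustration under stated assumptions, not the body of `Store::update_message_cas()`: the function `update_with_retry`, its closure-based signature, the `StatusCode` enum, and `to_status` are hypothetical stand-ins (a real handler would use its gRPC library's status type).

```rust
/// Retry bound taken from the PR summary (5 attempts).
pub const MAX_CAS_ATTEMPTS: usize = 5;

/// Simplified stand-in for the PR's error type.
#[derive(Debug, PartialEq, Eq)]
pub enum PersistenceError {
    Conflict,
}

/// Hypothetical stand-in for a gRPC status code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusCode {
    Ok,
    Aborted,
}

/// Read the current value and version, apply `mutate`, and attempt a
/// version-checked write. On conflict, re-read and try again, up to
/// MAX_CAS_ATTEMPTS times. Returns the new version on success.
pub fn update_with_retry<T, R, M, W>(
    mut read: R,
    mut mutate: M,
    mut write: W,
) -> Result<u64, PersistenceError>
where
    R: FnMut() -> (T, u64),
    M: FnMut(T) -> T,
    W: FnMut(T, u64) -> Result<u64, PersistenceError>,
{
    for _ in 0..MAX_CAS_ATTEMPTS {
        // Fresh read on every iteration so the mutation sees current data.
        let (value, version) = read();
        let updated = mutate(value);
        match write(updated, version) {
            Ok(new_version) => return Ok(new_version),
            // Another writer won the race; loop and retry.
            Err(PersistenceError::Conflict) => continue,
        }
    }
    // Retries exhausted: surface the conflict to the caller.
    Err(PersistenceError::Conflict)
}

/// Map a persistence result to a status code, as a handler might.
pub fn to_status(result: &Result<u64, PersistenceError>) -> StatusCode {
    match result {
        Ok(_) => StatusCode::Ok,
        Err(PersistenceError::Conflict) => StatusCode::Aborted,
    }
}
```

Re-running the mutation against a fresh read, rather than replaying the stale value, is what prevents a retry from overwriting the concurrent writer's change.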
Test coverage includes concurrent update scenarios and handler-level
resource_version round-trip tests.
Related Issue
Fixes #1255
Changes
Testing
`mise run pre-commit` passes
Checklist