Checks
Controller Version
0.14.1
Deployment Method
Helm
Checks
To Reproduce
1. Deploy `actions-runner-controller` and `gha-runner-scale-set` v0.14.1 to an AKS cluster (a deployment sketch follows this list).
2. Configure a runner scale set with:
   - `minRunners` set (e.g., 50)
   - Moderate- to high-concurrency workflows targeting the scale set
3. Trigger a large number of workflows simultaneously (bursty workload; see the trigger loop after this list).
4. Observe the GitHub Actions UI (observation commands follow this list):
   - Jobs remain in the queued state for 10–15 minutes.
   - Scaling beyond the minimum runner count is significantly delayed.
5. Observe Kubernetes:
   - The listener pod restarts or becomes intermittently unavailable during this period.
6. After the listener stabilizes, runners eventually spin up and jobs start executing.
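For reference, a minimal sketch of the deployment in steps 1–2, assuming the documented OCI Helm charts and a PAT-based secret. The release names (`arc`, `prod`), the `arc-runners` namespace, and the organization URL are illustrative; `arc-system` and the `prod` prefix match the pod listing in the logs below.

```bash
# Step 1: install the controller chart (chart locations per the ARC docs).
helm install arc \
  --namespace arc-system --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version 0.14.1

# Step 2: scale-set values; the config URL and token are placeholders.
cat <<'EOF' > values.yaml
githubConfigUrl: "https://github.com/my-org"   # placeholder organization URL
githubConfigSecret:
  github_token: "<PAT>"                        # placeholder token
minRunners: 50
maxRunners: 200                                # illustrative upper bound
EOF

helm install prod \
  --namespace arc-runners --create-namespace \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version 0.14.1
```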
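Step 3 can be approximated with a loop that queues many runs back to back; `burst.yml` is a hypothetical workflow whose jobs use `runs-on: prod` (the scale-set/release name above), and the count is arbitrary.

```bash
# Queue 100 workflow runs in quick succession to create a bursty workload.
for i in $(seq 1 100); do
  gh workflow run burst.yml
done
```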
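For steps 4–5, the queued backlog and listener stability can be watched with standard tooling; the listener pod name below is taken from the controller logs section.

```bash
# Step 4: list runs stuck in the queued state.
gh run list --status queued --limit 50

# Step 5: watch the controller namespace; listener churn shows up as
# changing pod names and young AGE values.
kubectl get pods -n arc-system --watch

# Dig into a specific listener pod's events.
kubectl describe pod prod-67d8f47f-listener -n arc-system
```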
Describe the bug
A significant delay in runner allocation and scaling was observed after upgrading actions-runner-controller (ARC) and gha-runner-scale-set to v0.14.1.
Workflow jobs remain queued for extended periods (approximately 10–15 minutes) before runners are allocated. This behavior is most pronounced when a large number of jobs start concurrently and the scale set, already at its configured minimum runner count (e.g., 50), must scale beyond it.
Additionally, during these periods, the listener component was intermittently restarting or unavailable, which correlated with delayed scaling activity and difficulty maintaining the minimum number of runners.
This issue appears to be a regression, as the same workload behaved normally prior to the upgrade.
Describe the expected behavior
- Runners should be provisioned within seconds to a few minutes after jobs are queued.
- Scaling beyond the configured minimum should start promptly when demand exceeds capacity.
- The minimum runner count should be maintained consistently (a quick check follows this list).
- The listener component should remain stable and continuously connected under load.
- Overall scaling behavior should be comparable to pre-v0.14.1 releases under the same workload.
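A quick check for the third expectation, assuming the `EphemeralRunner` custom resource installed by the controller chart and the illustrative `arc-runners` namespace from the repro sketch; with `minRunners: 50` and no load this count should hold steady at 50.

```bash
# Count EphemeralRunner resources backing the scale set.
kubectl get ephemeralrunners -n arc-runners --no-headers | wc -l
```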
Additional Context
- ARC version: `v0.14.1`
- gha-runner-scale-set version: `v0.14.1`
- Kubernetes: AKS
- Runner type: self-hosted GitHub Actions scale sets
- Regression observed immediately after the upgrade
- Rolling back to a previous ARC version restores normal scaling behavior (a rollback sketch follows this list)

This issue closely mirrors the behavior discussed in <https://github.com/actions/actions-runner-controller/issues/4460>.
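The rollback amounted to downgrading both charts; a sketch, with `<previous-version>` standing in for the last known-good release (not recorded here).

```bash
# Downgrade the controller and the scale set to the prior version.
helm upgrade arc \
  --namespace arc-system \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version <previous-version>

helm upgrade prod \
  --namespace arc-runners \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version <previous-version>
```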
Controller Logs
NAME READY STATUS RESTARTS AGE
arc-gha-rs-controller-67ccddfd87-6hfxv 1/1 Running 0 2d2h
prod-67d8f47f-listener 1/1 Running 0 3h58m
prod-large-5677b8c9-listener 1/1 Running 0 2d2h
Note: `prod-67d8f47f-listener` was recreated during the incident (AGE 3h58m versus 2d2h for the other pods); a sketch for investigating why follows.
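Since the pod shows `RESTARTS 0` but a much younger `AGE` than its neighbors, it was recreated rather than restarted in place, so the cluster events and controller logs are the place to look; the deployment name below is inferred from the controller pod name above.

```bash
# Recent events mentioning the listener (recreation reasons show up here).
kubectl get events -n arc-system --sort-by=.lastTimestamp | grep -i listener

# Controller-side log lines about listener lifecycle.
kubectl logs deploy/arc-gha-rs-controller -n arc-system | grep -i listener
```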
Runner Pod Logs
While workflows sit queued for extended periods (10–15 minutes), no runner pods are available or being created to service the queued jobs.
Observed behavior:
- Jobs remain in `queued` state in GitHub Actions while no new runner pods are scheduled
- Runner pods only start appearing several minutes later, after which jobs begin executing
- This delay occurs especially when many jobs start simultaneously and the scale set has already reached the configured minimum runners
The delayed job execution correlates directly with the absence or delayed creation of runner pods, not with runner pod failures.
`kubectl get pods -n arc-system`
NAME READY STATUS RESTARTS AGE
arc-gha-rs-controller-67ccddfd87-rgjg5 1/1 Running 0 2d1h
prod-6cb9f565-listener 1/1 Running 0 69s
prod-large-7b7c869c-listener 1/1 Running 0 2d1h
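To correlate the queued backlog with (missing) runner pod creation, a polling loop like the following can be left running; it assumes the illustrative `arc-runners` namespace from the repro sketch, and during the 10–15 minute windows described above the two Kubernetes counts stay flat while the queued count grows.

```bash
# Sample queued runs and runner-side resource counts once a minute.
while true; do
  date
  echo "queued runs:       $(gh run list --status queued --limit 100 | wc -l)"
  echo "runner pods:       $(kubectl get pods -n arc-runners --no-headers | wc -l)"
  echo "ephemeral runners: $(kubectl get ephemeralrunners -n arc-runners --no-headers | wc -l)"
  sleep 60
done
```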