
Significantly increased runner queue wait times and intermittent listener restarts after upgrading to ARC v0.14.1 #4470

@wesco-prathapmotupalli

Description

Controller Version

0.14.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (for Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes.

To Reproduce

1. Deploy `actions-runner-controller` and `gha-runner-scale-set` version v0.14.1 to an AKS cluster.
2. Configure a runner scale set with:
   - `minRunners` set (e.g., 50)
   - moderate- to high-concurrency workflows targeting the scale set
3. Trigger a large number of workflows simultaneously (bursty workload).
4. Observe the GitHub Actions UI:
   - jobs remain in the queued state for 10–15 minutes
   - scaling beyond the minimum runner count is significantly delayed
5. Observe Kubernetes:
   - the listener pod restarts or becomes intermittently unavailable during this period
6. After the listener stabilizes, runners eventually spin up and jobs start executing.
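For reference, steps 1–2 correspond to a Helm install along these lines. This is a hedged sketch: the release name (`prod`), namespace, organization URL, and secret name are placeholders, not values from this issue; only the chart path, version, and `minRunners` reflect the report.

```shell
# Sketch of the deployment in steps 1-2; release name, namespace,
# githubConfigUrl, and githubConfigSecret are placeholder assumptions.
helm upgrade --install prod \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version 0.14.1 \
  --namespace arc-system \
  --set githubConfigUrl=https://github.com/your-org \
  --set githubConfigSecret=github-auth-secret \
  --set minRunners=50
```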

Describe the bug

A significant delay in runner allocation and scaling was observed after upgrading actions-runner-controller (ARC) and gha-runner-scale-set to v0.14.1.
Workflow jobs remain queued for extended periods (approximately 10–15 minutes) before runners are allocated. The delay is most pronounced when a large number of jobs start concurrently and the scale set has already reached its configured minimum runner count (e.g., 50).
During these periods, the listener component intermittently restarted or became unavailable, which correlated with delayed scaling activity and difficulty maintaining the minimum number of runners.
This appears to be a regression: the same workload behaved normally prior to the upgrade.

Describe the expected behavior

- Runners should be provisioned within seconds to a few minutes after jobs are queued.
- Scaling beyond the configured minimum should start promptly when demand exceeds capacity.
- The minimum runner count should be maintained consistently.
- The listener component should remain stable and continuously connected under load.
- Overall scaling behavior should be comparable to pre-v0.14.1 releases under the same workload.

Additional Context

- ARC version: `v0.14.1`
- gha-runner-scale-set version: `v0.14.1`
- Kubernetes: AKS
- Runner type: self-hosted GitHub Actions scale sets
- Regression observed immediately after the upgrade
- Rolling back to a previous ARC version restores normal scaling behavior
- This issue closely mirrors the behavior discussed in <https://github.com/actions/actions-runner-controller/issues/4460>

Controller Logs

```
NAME                                     READY   STATUS    RESTARTS   AGE
arc-gha-rs-controller-67ccddfd87-6hfxv   1/1     Running   0          2d2h
prod-67d8f47f-listener                   1/1     Running   0          3h58m
prod-large-5677b8c9-listener             1/1     Running   0          2d2h
```

The `prod-67d8f47f-listener` pod restarted.
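If the restart recurs, the previous container's logs and the kubelet's recorded last state usually show why the listener exited. A diagnostic sketch, assuming the pod name from the listing above and the `arc-system` namespace used later in this report:

```shell
# Logs from the listener container instance that exited (before the restart).
kubectl logs prod-67d8f47f-listener -n arc-system --previous

# Last state and exit reason recorded by the kubelet for the pod.
kubectl describe pod prod-67d8f47f-listener -n arc-system
```

Attaching the `--previous` log output would make it much easier to tell whether the listener crashed, was OOM-killed, or lost its long-poll connection to GitHub.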

Runner Pod Logs

At the time when workflows are queued for extended periods (10–15 minutes), there are no runner pods available or being created to service the queued jobs.

Observed behavior:
- Jobs remain in `queued` state in GitHub Actions while no new runner pods are scheduled
- Runner pods only start appearing several minutes later, after which jobs begin executing
- The delay is most pronounced when many jobs start simultaneously and the scale set has already reached the configured minimum runners

The delayed job execution is directly correlated with the absence or delayed creation of runner pods rather than runner pod failures.
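To correlate the queue delay with (missing) runner pod creation, something like the following can be run while jobs sit queued. This is a sketch under assumptions: the runner namespace (`arc-runners`) and the `actions.github.com/scale-set-name` pod label are typical of ARC scale-set deployments but are not confirmed in this report.

```shell
# Watch runner pods appear (or fail to appear) while jobs are queued;
# namespace and label selector are assumptions based on typical ARC setups.
kubectl get pods -n arc-runners -w \
  -l actions.github.com/scale-set-name=prod

# Recent scheduling/scale-up events in the runner namespace.
kubectl get events -n arc-runners --sort-by=.lastTimestamp | tail -20
```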


```
$ kubectl get pods -n arc-system
NAME                                     READY   STATUS    RESTARTS   AGE
arc-gha-rs-controller-67ccddfd87-rgjg5   1/1     Running   0          2d1h
prod-6cb9f565-listener                   1/1     Running   0          69s
prod-large-7b7c869c-listener             1/1     Running   0          2d1h
```

Metadata

- Labels: bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
- Assignees: none
- Milestone: none