Register act_runner as ephemeral to Gitea #32461

Closed
6 tasks done
ChristopherHX opened this issue Nov 9, 2024 · 11 comments · Fixed by #33570
Assignees
Labels
topic/gitea-actions related to the actions of Gitea type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@ChristopherHX
Contributor

ChristopherHX commented Nov 9, 2024

Feature Description

My idea is to

  • add an Ephemeral field to the database structure of the Runner
  • when the Ephemeral field is true, let FetchTask return no task without an error if the assigned task is in progress
  • when the Ephemeral field is true, let FetchTask return no task with an error if the assigned task is done, and remove the runner from the database
  • scope UpdateTask and UpdateLog access to the runner ID
  • update the runnerv1 protocol to have an Ephemeral field as well during registration
  • update act_runner to have the ephemeral flag, which implies run-once
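
The server-side rules in the list above could be sketched roughly as follows. This is a minimal in-memory illustration, not Gitea's actual code: `Store`, `Runner`, `Register`, the error values, and the simplified `FetchTask`/`UpdateTask` signatures are all assumptions made for the example; only the `Ephemeral` field and the described behavior come from the proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values used by this sketch.
var (
	ErrUnknownRunner = errors.New("unknown runner")
	ErrRunnerRemoved = errors.New("ephemeral runner finished its task and was removed")
	ErrNotOwner      = errors.New("task is not assigned to this runner")
)

// Runner is a trimmed-down, hypothetical runner record. Only the Ephemeral
// field corresponds to the proposal; the rest is illustrative.
type Runner struct {
	ID        int64
	Ephemeral bool
	TaskID    int64 // 0 means no task has been assigned yet
	TaskDone  bool
}

// Store is an in-memory stand-in for the database.
type Store struct {
	runners    map[int64]*Runner
	pending    int // queued tasks waiting for a runner
	nextTaskID int64
}

func NewStore(pending int) *Store {
	return &Store{runners: map[int64]*Runner{}, pending: pending}
}

func (s *Store) Register(r *Runner) { s.runners[r.ID] = r }

// FetchTask returns the ID of a newly assigned task, or 0 if none was assigned.
func (s *Store) FetchTask(runnerID int64) (int64, error) {
	r, ok := s.runners[runnerID]
	if !ok {
		return 0, ErrUnknownRunner
	}
	if r.Ephemeral && r.TaskID != 0 {
		if !r.TaskDone {
			return 0, nil // task still in progress: no task, no error
		}
		delete(s.runners, runnerID) // task done: the registration is single-use
		return 0, ErrRunnerRemoved
	}
	if s.pending == 0 {
		return 0, nil
	}
	s.pending--
	s.nextTaskID++
	r.TaskID, r.TaskDone = s.nextTaskID, false
	return r.TaskID, nil
}

// UpdateTask accepts a state update only from the runner that owns the task.
func (s *Store) UpdateTask(runnerID, taskID int64, done bool) error {
	r, ok := s.runners[runnerID]
	if !ok {
		return ErrUnknownRunner
	}
	if r.TaskID == 0 || r.TaskID != taskID {
		return ErrNotOwner
	}
	r.TaskDone = done
	return nil
}

func main() {
	s := NewStore(2)
	s.Register(&Runner{ID: 1, Ephemeral: true})

	id, err := s.FetchTask(1)
	fmt.Println("first fetch:", id, err)
	_, err = s.FetchTask(1)
	fmt.Println("while running:", err)
	fmt.Println("finish:", s.UpdateTask(1, id, true))
	_, err = s.FetchTask(1)
	fmt.Println("after done:", err)
}
```

In this sketch a leaked credential for an ephemeral runner is only useful until its single task finishes; afterwards the runner record is gone and further calls fail as an unknown runner.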

This proposal would make it possible to securely deploy a registered host-mode act_runner in a VM and reset the VM after the job exits.

Some of my ideas have been sketched in https://github.com/ChristopherHX/gitea/tree/ephemeral-runners.

Protocol proposal here: https://gitea.com/gitea/actions-proto-def/pulls/14

Read more here: https://gitea.com/gitea/act_runner/issues/19#issuecomment-932880

Screenshots

No response

@ChristopherHX ChristopherHX added the type/proposal The new feature has not been accepted yet but needs to be discussed first. label Nov 9, 2024
@kemzeb kemzeb added the topic/gitea-actions related to the actions of Gitea label Nov 9, 2024
@wolfogre
Member

wolfogre commented Nov 11, 2024

I have a wait-and-see attitude towards this proposal.

Regarding running safely in host mode, my first instinct is:

  • Do not register a new act_runner every time.
  • The concurrency of tasks for the registered runner can only be 1, so during the execution of a task, the runner will not fetch a second one. (This is my biggest concern about the proposal; when the runner is not suitable for receiving new tasks, it should not FetchTask, rather than letting the FetchTask function decide whether to assign tasks.)
  • When executing tasks, once the execution is successful, do not immediately fetch the next task. Instead, clean up the env or even rebuild the virtual machine.
  • When starting a new runner in a new env, reuse the local state file of the previously registered runner so that it can be recognized and accepted by Gitea without needing to register again.

To clarify, in the current design, Gitea does not actively assign tasks to runners; it only attempts to assign a new task when a runner requests one (if available). The purpose of this design is to allow the runner to decide for itself whether it is ready to receive more tasks, while Gitea only determines if there are new tasks to assign.
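
The runner-driven model described above can be illustrated with a small polling loop. This is only a sketch under assumptions: `runLoop`, `Fetcher`, and the `execute`/`reset` callbacks are hypothetical names, and a real runner would call the FetchTask RPC and sleep between polls instead of looping a fixed number of times.

```go
package main

import "fmt"

// Fetcher is a hypothetical abstraction over the FetchTask request.
// ok is false when the server has no task to hand out.
type Fetcher func() (task string, ok bool)

// runLoop polls with a concurrency of one: it never fetches while a task is
// executing, and it resets the environment before asking for the next task.
// It returns the sequence of events so the behavior is easy to inspect.
func runLoop(fetch Fetcher, execute func(task string), reset func(), maxPolls int) []string {
	var events []string
	for i := 0; i < maxPolls; i++ {
		task, ok := fetch()
		if !ok {
			events = append(events, "idle")
			continue
		}
		events = append(events, "run "+task)
		execute(task) // blocks, so no second task can be fetched meanwhile
		reset()       // clean up the env (or rebuild the VM) before the next fetch
		events = append(events, "reset")
	}
	return events
}

func main() {
	queue := []string{"build", "test"}
	fetch := func() (string, bool) {
		if len(queue) == 0 {
			return "", false
		}
		t := queue[0]
		queue = queue[1:]
		return t, true
	}
	for _, e := range runLoop(fetch, func(string) {}, func() {}, 3) {
		fmt.Println(e)
	}
}
```

Here the decision to ask for work stays entirely with the runner; the server side only answers whether a task is available.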

lunny pushed a commit that referenced this issue Nov 11, 2024

Per proposal #32461
@ChristopherHX
Contributor Author

Regarding running safely in host mode, my first instinct is:

This matches the new run-once mode that was recently merged: https://gitea.com/gitea/act_runner/pulls/598

Anyone can fetch new jobs with the current runner state files, which could be uploaded to a bad actor's server when the runner is in host mode.

The same applies to GitHub Actions runners as long as they are not ephemeral.

My proposal would optionally invalidate the token like in GitHub Actions.

Making FetchTask always fail after a job has been fetched is another idea; it depends on the once mode of the runner.

@wolfogre
Member

"#598 has been merged, so we should keep going that way" cannot be the reason. I just want to discuss whether we have a better design to handle this.

@ChristopherHX
Contributor Author

ChristopherHX commented Nov 11, 2024

Yes, let's discuss; this is still a pure proposal, except for the hardening change.

I only wanted to clarify the problems of the approach you have described, and that in my opinion it looks like the once flag.

In GitHub Actions there is also an alternative way to handle long-lived runner state files:

  • make the SYSTEM_RUNTIME_TOKEN writable for logs and task state (GitHub Actions has had this since 2019)
  • allow spawning a script that executes a fetched task by piping stdio and sends updates directly to Gitea (my custom github-act-runner has this, but only GitHub Actions can be used for direct communication)

Adding more native act runtimes seems to be a mess.

DennisRasey pushed a commit to DennisRasey/forgejo that referenced this issue Nov 21, 2024

Per proposal go-gitea/gitea#32461

(cherry picked from commit f888e45432ccb86b18e6709fbd25223e07f2c422)
@ChristopherHX
Contributor Author

This proposal aims to implement https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners#using-ephemeral-runners-for-autoscaling.

While I agree that Gitea doesn't assign jobs to a specific runner, I still want the stronger security that this GitHub Actions feature provides in Gitea as well.

Once my main post gets 10 upvotes or more, I will create a pull request (a ping/message in this issue may be required). Until then, I will wait for alternative proposals that bring similar security to act_runner in host mode.

I will create another proposal for uploading logs and job state via the ACTIONS_RUNTIME_TOKEN, so that an act_runner can act as an autoscaler for single-job act_runners.

@KhaineBOT

The challenge is how to manage the ephemeral runners: you need to spin one up, register it, run the act, and then tear it down. Either you need a third-party piece of software to do this orchestration, or it gets built into Gitea. I think it would make sense to define a lightweight orchestration protocol. Third parties could use this to create complex orchestrations (i.e. running across many disparate hosts/platforms), and Gitea could implement simple orchestration on the local host. For many users, a simple host/docker-only orchestration solution built into Gitea would be highly beneficial.
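
The spin up, register, run, tear down lifecycle could look roughly like the sketch below. No such orchestration protocol exists in this thread; the `Provisioner` interface, `RunOnce`, and the fake implementation are purely hypothetical names chosen to show the shape of the idea, in particular that teardown should happen even when the job fails.

```go
package main

import (
	"errors"
	"fmt"
)

// Provisioner is a hypothetical interface that an orchestrator (built into
// the forge or provided by a third party) might implement per platform.
type Provisioner interface {
	Create() (instance string, err error) // spin up a fresh VM or container
	RegisterAndRun(instance string) error // register a runner and run one job
	Destroy(instance string) error        // tear the instance down again
}

// RunOnce drives one ephemeral runner through its whole lifecycle and makes
// sure the instance is destroyed even if the job fails.
func RunOnce(p Provisioner) (err error) {
	inst, err := p.Create()
	if err != nil {
		return fmt.Errorf("create: %w", err)
	}
	defer func() {
		if derr := p.Destroy(inst); derr != nil {
			err = errors.Join(err, fmt.Errorf("destroy: %w", derr))
		}
	}()
	if runErr := p.RegisterAndRun(inst); runErr != nil {
		return fmt.Errorf("run: %w", runErr)
	}
	return nil
}

// fakeProvisioner records calls instead of touching real infrastructure.
type fakeProvisioner struct {
	calls   []string
	failRun bool
}

func (f *fakeProvisioner) Create() (string, error) {
	f.calls = append(f.calls, "create")
	return "vm-1", nil
}

func (f *fakeProvisioner) RegisterAndRun(inst string) error {
	f.calls = append(f.calls, "run "+inst)
	if f.failRun {
		return errors.New("job failed")
	}
	return nil
}

func (f *fakeProvisioner) Destroy(inst string) error {
	f.calls = append(f.calls, "destroy "+inst)
	return nil
}

func main() {
	f := &fakeProvisioner{failRun: true}
	err := RunOnce(f)
	fmt.Println(f.calls, err)
}
```

A local host or docker implementation and a cloud implementation would then differ only in how they satisfy the interface.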

We should make it as easy as possible for ephemeral runners to be used. Ideally, they should be the preferred method given the risks posed by long running acts.

@gabriel-samfira

gabriel-samfira commented Feb 8, 2025

I am willing to add support for gitea in GARM if it comes somewhat close to how ephemeral runners and webhooks work in github. I could then abstract away the code forge part of GARM and create an implementation for gitea. That would help with the ephemeral runner management part. And it would unlock the plethora of IaaS providers that GARM offers, for gitea as well.

Edit: someone already opened a request for this feature here: cloudbase/garm#323

@ChristopherHX
Contributor Author

Based on the request you referenced, it might make sense for you to create an additional proposal for the following item, which is 0% covered right now, and to reference this proposal:

can send webhooks with job runs so we can schedule runners to a pool

This proposal seems to have hit 10 upvotes, so I will look into making a pull request for the ephemeral runner part soon.

@ChristopherHX
Contributor Author

ChristopherHX commented Feb 23, 2025

workflow_job webhook experimental #33694 as independent feature

It can be cherry-picked on top of the ephemeral runners PR so that both can be tested together.

I'm using this branch with both changes to test things locally: https://github.com/ChristopherHX/gitea/tree/workflow_job_webhook. Beware: it contains database migrations that are not part of nightly, so backups are important.

richmahn pushed a commit to unfoldingWord/dcs that referenced this issue Mar 2, 2025
@cobak78

cobak78 commented Mar 12, 2025

I think this issue https://codeberg.org/forgejo/discussions/issues/241 is somewhat related; it explores another solution to this topic by moving the poller to an external scaler.

@ChristopherHX
Contributor Author

Hello @cobak78,

I skimmed over your discussion. I'm planning to only implement the workflow_run + workflow_job endpoints defined by GitHub in Gitea, instead of brewing our own as you seem to have done for Forgejo. This means, for example, using the github_runner scaler of KEDA instead of creating a Gitea-specific one. My initial test seems to show that the queue metric can be calculated from my POC branch without webhooks or code changes to KEDA.

If you trust everyone who gains access to queue jobs on your Forgejo instance, please ignore the rest of my post.

I would suggest that you audit the security of your Forgejo runner's long-lived credentials when using labels with :host. Based on my knowledge of the Gitea runner and nektos/act, only the :docker:// labels can protect the credential file without the feature discussed here.

My goal with ephemeral runners in Gitea is that, even if the long-lived runner credentials are exposed to an untrusted job, they become invalid for obtaining additional write tokens and secrets by fetching further tasks, which anybody with read access to the runner credential file can otherwise do (you make use of the reusability of this file's content as a feature).

I saw in your gist a Kubernetes-internal DNS name in the runner creds for Forgejo. This means that to exploit this outside of Kubernetes you need to know the public endpoint of Gitea, if it is not already leaked inside the job via GITHUB_SERVER_URL; but afaik the original registration domain is not recorded on the server to block access.

For a private or proof of concept instance your approach is perfectly fine 👍 .

6 participants