Request for a community owned GCP project for minikube #7414

Open
medyagh opened this issue Oct 15, 2024 · 11 comments
Labels
sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@medyagh
Member

medyagh commented Oct 15, 2024

Hello, minikube maintainer here. I would like to request a GCP project for minikube owned by the CNCF community. Our release and test infra currently lives in a Google-owned project, and we would like to explore migrating it to a CNCF-owned project. Is this the right place to ask?

related: kubernetes/test-infra#33654

@medyagh medyagh added the sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. label Oct 15, 2024
@ameukam
Member

ameukam commented Oct 15, 2024

cc @BenTheElder @upodroid

@BenTheElder
Member

BenTheElder commented Oct 15, 2024

Can you outline some more detailed requirements so we can determine how best to provide them?

We generally try to work from "my project needs VMs for testing cgroups v2 which we cannot do locally in a CI container" => "well, we have AWS credits, so let's use EC2, and make sure to use boskos to rent access" or "my project needs to host container images" => use registry.k8s.io (which is backed by AWS+GCP; there are standardized docs for setting up image hosting in this repo). We have to maintain balance across the budgets available to the project.

What infra we do provide, we also set up here in git wherever possible (terraform, bash, etc.), so it's auditable and others can chip in in the future, instead of just creating cloud project admins and having them create random resources. So we need to know exactly what to spin up.

We have a lot of existing shared resources in the project for things like CI and release.
We have also been asking subprojects to, for example, use GitHub to host binaries, to avoid deepening our dependency on vendor credits when reasonable alternatives exist.

@medyagh
Member Author

medyagh commented Oct 21, 2024

There are multiple aspects to this, since it is 8-year-old infrastructure used for testing and releases as well as for hosting live apps and released artifacts (binaries, tarballs, ISOs, Docker images, ...).
Currently there is no concrete plan for how the new infrastructure would be designed.

I agree that we should leverage GitHub release binaries and GitHub Actions as much as possible where doable; some cases might not work, such as minikube preload tarball images.

Currently the idea is to get a footprint in the publicly owned infra and then move over little by little, without disrupting the system or undertaking an unrealistic, over-capacity re-engineering effort.

The current requirements that come to mind:

  • GCS buckets (ISOs, Preload Tarballs, released binaries, json files, ...)
  • Compute Engine VMs (Jenkins CI and test agents)
  • Cloud Run (to host minikube apps such as triage-party, gopogh-server)
  • Artifact Registry to host various images (in-house addons such as storage provisioner, kic base image,...)

This is a good list to start with, but it is not comprehensive.

The idea is to get a footprint in the new project and re-evaluate the path forward.

@BenTheElder
Member

BenTheElder commented Oct 22, 2024

Currently the idea is to get a footprint in the publicly owned infra and then move over little by little, without disrupting the system or undertaking an unrealistic, over-capacity re-engineering effort.

We have already engineered systems for e.g. hosting images though, and we do not want to dig a new unsustainable hole for these.

From the specific examples:

Artifact Registry to host various images (in-house addons such as storage provisioner, kic base image,...)

We do not want users consuming directly from any paid SaaS like this; it is a liability for the project (we have no flexibility to shift costs when utilization and funding shift).

We shouldn't re-introduce this.

GCS buckets (ISOs, Preload Tarballs, released binaries, json files, ...)

See the comment above. Couldn't these also be hosted on GitHub at no cost?

Compute Engine VMs (jenkins CI and test agents)

Can we use our existing CI infra? We already have a lot of resources behind this and they're shared/pooled across the project. We care a lot about things like making sure that VMs get cleaned up when they're no longer in use.

At the scale that we're supporting, if every project runs custom unmonitored systems we can't keep track of the waste.
When subprojects rent an e2e project/account on prow and create a test cluster there, we have some assurances that whether or not the test itself is a good use of resources, the resources will not be forgotten to run indefinitely.


Currently there is no concrete plan for how the new infrastructure would be designed.

The idea is to get a footprint in the new project and re-evaluate the path forward.

That's just not how we run k8s infra, though; it's not transparent or sustainable.

For everything we've lifted and shifted previously, we've spun up a new copy in k8s infra with the specifics checked in, so others can read through it, edit it via PRs, and otherwise take it over in the future.

We haven't granted any subproject the ability to arbitrarily create cloud resources in a project because it's not accountable and it's not reproducible. Everything we're running can be traced back to e.g. https://github.com/kubernetes/k8s.io/tree/main/infra/gcp/terraform and the SIG (as steward) has agreed is reasonable to run (and always sought out the most effective answers, we've had to work hard to reach sustainable spend, up to and including things like working with SIG Scalability to evaluate their test workloads and adjust frequency and scheduling).

@BenTheElder
Member

cc @dims (chair) in addition to TLs (#7414 (comment))

@BenTheElder
Member

All of the infra we've migrated has been similarly old, if not older, and it does take a lot of work. But I also think we really don't want to regress from all the effort we've put in so far and the ground rules we've established (such as not permitting non-community-owned accounts into our CI), which are all based on mitigating real issues we've experienced in the past.

It's really important that I or any of the other infra leads can quit and someone else can pick up the pieces without blockers, and that we keep an eye on sustainable spend and know what it is that we're funding and what the usage trends are.

@medyagh
Member Author

medyagh commented Oct 28, 2024

I understand, and I agree with leveraging GitHub as much as possible. Some of the artifacts, such as binaries, can be hosted on GitHub as part of the release assets;
however, some cannot, such as preload tarballs, since they need to be generated per Kubernetes version per container runtime, and they get generated after minikube is released. That would require a separate release tag or possibly a new Kubernetes project just for preload generation.

There are also many jobs that build ISOs and KIC images per PR and push them to the PR. That would not be doable on the free GitHub Actions machines, since building ISOs needs beefy machines.

Currently we have 80 internal automation jobs (not Dependabot) that bump the versions of ISO/image software and push a new ISO during off-peak hours (midnight); those would not be implementable using GitHub or GitHub Actions.

Also, as mentioned in my previous comment, we have multiple hosted services for minikube, currently deployed to Cloud Run, that are essential to running the minikube project.

@BenTheElder
Member

however, some cannot, such as preload tarballs, since they need to be generated per Kubernetes version per container runtime, and they get generated after minikube is released. That would require a separate release tag or possibly a new Kubernetes project just for preload generation.

The content contained in github releases is mutable, even after advertising a release publicly.

Are these "preload tarballs" essentially a set of container images? Because that sounds like if we host it we're going to have the registry.k8s.io egress problem duplicated. Per above it sounds like these are advertised directly from GCS buckets, which is not a cost-effective approach and not something we want to do again.

Cost-effectiveness aside, it limits our ability to make decisions later about what resources to use for hosting, as users become dependent on the buckets and make assumptions about them.

Again, we have an established process and common infra for container image hosting: https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io#managing-kubernetes-container-registries

@upodroid has been working on migrating staging to Artifact Registry and may have some updates for the process, but we don't have to block on that.

There are also many jobs that build ISOs and KIC images per PR and push them to the PR. That would not be doable on the free GitHub Actions machines, since building ISOs needs beefy machines.

That's a distinct problem from where they're hosted, though. The output of the jobs can be copied wherever we need it...?

Also, as mentioned in my previous comment, we have multiple hosted services for minikube, currently deployed to Cloud Run, that are essential to running the minikube project.

ACK... We still need an accounting of exactly what these are.

We should probably prioritize the most critical assets first.

@medyagh
Member Author

medyagh commented Nov 13, 2024

Are these "preload tarballs" essentially a set of container images?

The preloads are not images. They are essentially a compressed filesystem for a specific container runtime / filesystem storage driver / Kubernetes version, so that both VM and container drivers can spin up quickly without having to load each image individually into the container runtime.

@BenTheElder
Member

OK, but we still have to sustainably host the egress if we're paying for it in k8s infra. We have an allocation for the core repo's binaries (a bandwidth budget that we negotiated based on that need), and we have registry.k8s.io.

We have to be careful with introducing content hosts because we have limited ability to cut usage and manage costs. We've been asking subprojects to use GitHub releases to host files. We probably would do this for Kubernetes too but we have a huge legacy around that and we receive an ongoing donation specifically for that problem.

@ameukam
Member

ameukam commented Nov 14, 2024

IMHO we should break this migration project down into separate conversations. I definitely can't do a lift and shift for minikube.
Can we start the CI migration and migrate away from Jenkins to Prow ?

3 participants