[cinder-csi-plugin] csi-cinder storage capacity #2597
base: master
Conversation
Hi @sergelogvinov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.

/ok-to-test
Force-pushed from b564f33 to fbe585f
/retest
@@ -396,6 +397,16 @@ func (os *OpenStack) GetMaxVolLimit() int64 {
	return defaultMaxVolAttachLimit
}

// GetFreeCapacity returns free capacity of the block storage
func (os *OpenStack) GetFreeCapacity() (int64, error) {
	res, err := volumelimit.Get(os.blockstorage).Extract()
I'm not 100% sure, but this sounds like an admin API. We can't make admin calls from CSI drivers; all of CPO is supposed to work normally under a regular tenant.
https://docs.openstack.org/api-ref/block-storage/v3/#limits-limits — I did not find the role/permission list required for csi-cinder.
I think this feature should be optional.
Can you confirm it with DevStack and demo tenant?
On OVH Cloud, the OpenStack account has the computeOperator and volumeOperator permissions (docs: https://help.ovhcloud.com/csm/en-public-cloud-authenticate-api-openstack-service-account?id=kb_article_view&sysparm_article=KB0059364), but I don't know and cannot determine the real OpenStack permissions behind them.
If you have a permission list of DevStack - i'd like to check...
I've checked that the API is available. What puzzles me now is: how often is GetCapacity() called? Would calling this API put a significant load on the Cinder API in the cloud?
/ok-to-test
It is called once per minute (the capacity-poll-interval flag of csi-provisioner: https://github.com/kubernetes-csi/external-provisioner/blob/c7f94435bd29e49edf1af0eda9c4ee2907c4d160/cmd/csi-provisioner/csi-provisioner.go#L111) for each storageClass and accessibleTopology (availability zone).
The enable-capacity flag is off by default, so csi-provisioner does not collect these metrics unless it is explicitly enabled.
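To put that polling load in rough numbers: the provisioner issues one GetCapacity call per (storageClass, topology segment) pair per poll interval. A minimal back-of-the-envelope sketch (the cluster sizes below are hypothetical; only the 1-minute default interval comes from the discussion above):

```go
package main

import "fmt"

// callsPerHour is a rough estimate of the extra Cinder limits-API calls
// generated by capacity polling: one GetCapacity call per
// (storageClass, topology segment) pair per poll.
func callsPerHour(storageClasses, zones, pollsPerHour int) int {
	return storageClasses * zones * pollsPerHour
}

func main() {
	// Hypothetical cluster: 3 storage classes, 3 availability zones,
	// default capacity-poll-interval of 1 minute => 60 polls/hour.
	fmt.Println(callsPerHour(3, 3, 60)) // prints 540
}
```

Even for modest clusters this stays in the hundreds of calls per hour, which supports the point that the feature is safe to keep opt-in.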
Force-pushed from fbe585f to f15d7e0
/retest-required
Force-pushed from f15d7e0 to 8c95830
/retest-required
@@ -396,6 +397,21 @@ func (os *OpenStack) GetMaxVolLimit() int64 {
	return defaultMaxVolAttachLimit
}

// GetFreeCapacity returns free capacity of the block storage, in GB
At least, this is not my understanding. The free storage usually comes from the backend, and I don't know whether every storage provider exposes it. The API seems to come from https://github.com/openstack/cinder/blob/master/cinder/api/v3/limits.py#L25 ?
If so, it's actually the quota, not the real storage? E.g. you can have a 100G max NFS pool but set the quota to 1T?
Yep, this is a quota.
Based on this quota, I cannot allocate more disk space, as I am not the owner of the OpenStack cluster.
These metrics provide the scheduler with additional information about which availability zone a pod can be scheduled in.
Um... at least the CSI GetFreeSpace is expected to report real free space? This might be a bit confusing. And if the quota is bigger than the real storage, this is not accurate info?
Oh, now I see what you mean, thanks.
I've tried to get the real size of the pool, but it requires the rule:admin_api role for the API method scheduler_extension:scheduler_stats:get_pools.
So it is better to rename the function to GetFreeQuotaStorageSpace, since we cannot get the real size of the pool.
Force-pushed from a5a1e4d to 57f388b
/retest-required
Force-pushed from 57f388b to 4fc81e3
Overall this change looks fine. However, I wonder whether we can consider the maxTotalVolumes and totalVolumesUsed data returned by the limits API. In theory there can be a case when you have a lot of capacity but are running low on the number of volumes.
The csi.GetCapacityResponse struct has only AvailableCapacity, MaximumVolumeSize and MinimumVolumeSize: https://pkg.go.dev/github.com/container-storage-interface/spec/lib/go/csi#GetCapacityResponse. I guess it makes sense to add an option to toggle GetCapacity (e.g. whether to advertise csi.ControllerServiceCapability_RPC_GET_CAPACITY), plus add a check of maxTotalVolumes vs totalVolumesUsed to fake a zero-capacity response when the total number of volumes has been reached.
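The suggested check could be sketched as a pure helper on top of the limits response. This is only an illustration of the idea (the `volumeLimits` struct and `availableCapacityGB` helper are hypothetical names, not the gophercloud model; Cinder reports `-1` for unlimited quotas, which the sketch treats as "no volume-count limit"):

```go
package main

import "fmt"

// volumeLimits mirrors the relevant absolute limits from the Cinder
// limits API response (illustrative field names).
type volumeLimits struct {
	MaxTotalVolumes    int   // -1 means unlimited
	TotalVolumesUsed   int
	MaxTotalVolumeGB   int64
	TotalGigabytesUsed int64
}

// availableCapacityGB returns the free gigabyte quota, reporting zero
// when the volume-count quota is exhausted, as suggested in the review.
func availableCapacityGB(l volumeLimits) int64 {
	if l.MaxTotalVolumes >= 0 && l.TotalVolumesUsed >= l.MaxTotalVolumes {
		return 0 // no volume slots left, so advertise no capacity
	}
	free := l.MaxTotalVolumeGB - l.TotalGigabytesUsed
	if free < 0 {
		free = 0
	}
	return free
}

func main() {
	// Volume-count quota exhausted: plenty of GB quota left, but zero reported.
	fmt.Println(availableCapacityGB(volumeLimits{
		MaxTotalVolumes: 10, TotalVolumesUsed: 10,
		MaxTotalVolumeGB: 1000, TotalGigabytesUsed: 100,
	})) // prints 0

	// Normal case: 900 GB of quota remaining.
	fmt.Println(availableCapacityGB(volumeLimits{
		MaxTotalVolumes: 10, TotalVolumesUsed: 3,
		MaxTotalVolumeGB: 1000, TotalGigabytesUsed: 100,
	})) // prints 900
}
```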
res, err := limits.Get(context.TODO(), os.blockstorage).Extract()
if mc.ObserveRequest(err) != nil {
	return 0, err
}

capacity := res.Absolute.MaxTotalVolumeGigabytes - res.Absolute.TotalGigabytesUsed
if capacity < 0 {
	capacity = 0
}
return capacity, nil
@kayrus asked me to take a look at this, since I'm our team's resident OpenStack quota/usage measurement expert.
This calculation is going to yield incorrect (or at the least, misleading) results when Cinder supports multiple volume types. The /limits endpoint lumps quota and usage for all volume types together into a grand total, so if you get capacity = 10 here, it could mean "6 GiB for volume type A plus 4 GiB for volume type B", and actually creating a 10 GiB volume of either type will fail.
The correct endpoint for the thing that you want to do is https://docs.openstack.org/api-ref/block-storage/v3/index.html#show-quota-usage-for-a-project, modelled in Gophercloud as https://pkg.go.dev/github.com/gophercloud/gophercloud/v2/openstack/blockstorage/v3/quotasets#GetUsage. Unfortunately, the modelling in Gophercloud is not very good, so the relevant quota_set.gigabytes_$VOLUMETYPE fields need to be extracted in a custom way. This is how I do it in my own code: https://github.com/sapcc/limes/blob/7153750c217e51e668ba67375e64a2938b97532b/internal/liquids/cinder/usage.go#L39-L52
Hope that helps!
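The custom extraction of the per-volume-type `gigabytes_$VOLUMETYPE` fields could look roughly like this sketch, which parses them out of a raw quota-set usage document. The `perTypeGigabytes` helper and the sample JSON are illustrative assumptions, not the linked implementation or the gophercloud model:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// perTypeGigabytes pulls the per-volume-type "gigabytes_<type>" entries
// out of a raw os-quota-sets?usage=true style response. Non-usage keys
// (such as "id") and the aggregate "gigabytes" key are skipped.
func perTypeGigabytes(raw []byte) (map[string]map[string]int64, error) {
	var doc struct {
		QuotaSet map[string]json.RawMessage `json:"quota_set"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	out := map[string]map[string]int64{}
	for key, val := range doc.QuotaSet {
		if !strings.HasPrefix(key, "gigabytes_") {
			continue
		}
		// Each usage entry looks like {"limit": ..., "in_use": ..., "reserved": ...}.
		var usage map[string]int64
		if err := json.Unmarshal(val, &usage); err != nil {
			continue // skip entries that are not usage objects
		}
		out[strings.TrimPrefix(key, "gigabytes_")] = usage
	}
	return out, nil
}

func main() {
	raw := []byte(`{"quota_set":{"id":"abc",
		"gigabytes":{"limit":1000,"in_use":100},
		"gigabytes_ssd":{"limit":600,"in_use":60},
		"gigabytes_hdd":{"limit":400,"in_use":40}}}`)
	types, err := perTypeGigabytes(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(types["ssd"]["limit"], types["hdd"]["in_use"]) // prints 600 40
}
```

With per-type limits in hand, the driver could report capacity for the volume type matching the storageClass, avoiding the lumped-total problem described above.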
Thank you for the interesting thoughts! However, my focus is on addressing customer-side issues, specifically dealing with account limits for storage space. This limit includes both block volumes and snapshots, and having this information helps the Kubernetes scheduler place pods correctly, and also alerts us when we are nearing the usage limit.
The actual storage capacity is managed by the OpenStack team, not the customers. I trust that they have several methods in place to monitor and determine when additional storage capacity is needed.
Force-pushed from 4fc81e3 to 5c3d65b
This is indeed a very interesting idea. However, I can imagine that receiving an alert about running out of space when there is still space available might cause some confusion, similar to the situation with file systems and inodes. In my opinion, it would be ideal if Kubernetes first supported something like
Force-pushed from 5c3d65b to e932508
Available capacity of disk storage
Signed-off-by: Serge Logvinov <[email protected]>
Force-pushed from e932508 to 0f69efa
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Available capacity of disk storage
What this PR does / why we need it:
To ensure the Kubernetes scheduler selects the optimal zone/region based on the available disk capacity limits of the account.
Also, to have statistics/alerts in the cluster for a specific storageClass.
This PR doesn't aim to solve the real capacity issue, because the Kubernetes CSI driver doesn't have permission to see all the details of the disk infrastructure.
Which issue this PR fixes (if applicable):
for #2035, #2551
Special notes for reviewers:
Release note:
required documentation update/helm-chart/migration process