select "best" UMAP layout for clustering #199
P.S. The hotspot scrollbar was not appearing in cases where it should. But my …
Hi Erik, thanks for your thoughtful comment! This is in line with a discussion we've been having internally about where and when to cluster, given the newly landed optional hyperparameter arrays (…).

One idea we were kicking around was doing the clustering in the original high-dimensional space (2048 dimensions). Each resulting UMAP projection would then show, via the hover-on-mouseover affordance, how well that layout captured the clustering that hdbscan found in the original space. Using the two hyperparameter sliders, the user could then decide which projection works best as a basis for editing and curating.

I have to admit I'm typing the above without any personal experience clustering in such a high-dimensional space, so it's possible this would take far too long, or would produce nonsense in any 2-D projection. A possible hybrid would be to run a separate UMAP reduction purely for clustering, giving hdbscan something like 10 dimensions to work with. The points McInnes makes in the documentation you link to, about the differing needs of visualization vs. clustering, are great ones, and we should take them into consideration too. I'm sure @duhaime will chime in here shortly!
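One way to make "how well that layout captured the clustering" concrete is a nearest-neighbour agreement score: for each clustered image, check whether its nearest neighbour in the 2-D layout received the same hdbscan label in the original space. This is a minimal sketch under stated assumptions, not PixPlot code: it assumes you already have one cluster label per image from hdbscan run on the high-D features (noise labelled -1, as hdbscan does) and each UMAP layout as a list of 2-D points. `cluster_agreement` and `best_layout` are hypothetical helpers.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def cluster_agreement(layout: Sequence[Point], labels: Sequence[int],
                      noise_label: int = -1) -> float:
    """Fraction of clustered points whose nearest 2-D neighbour has the same
    cluster label that hdbscan assigned in the original high-D space."""
    n = len(layout)
    if n < 2 or n != len(labels):
        raise ValueError("need at least two points and exactly one label per point")
    scored = hits = 0
    for i, p in enumerate(layout):
        if labels[i] == noise_label:
            continue  # noise points have no cluster to agree with
        # Brute-force O(n^2) nearest-neighbour search; fine for a sketch.
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(p, layout[j]))
        scored += 1
        hits += labels[nearest] == labels[i]
    return hits / scored if scored else 0.0


def best_layout(layouts: Dict[str, Sequence[Point]], labels: Sequence[int]) -> str:
    """Name of the layout whose 2-D neighbourhoods best match the high-D clusters."""
    return max(layouts, key=lambda name: cluster_agreement(layouts[name], labels))


labels = [0, 0, 1, 1]  # e.g. hdbscan labels computed on the 2048-D features
layouts = {
    "n_neighbors=5": [(0, 0), (10, 10), (0, 1), (10, 11)],   # clusters interleaved
    "n_neighbors=15": [(0, 0), (0, 1), (10, 10), (10, 11)],  # clusters kept apart
}
print(best_layout(layouts, labels))  # n_neighbors=15
```

The score could be shown per layout next to the sliders; the brute-force search would need replacing with a spatial index for realistic image counts.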
Clustering the hi-D space might be a good option to have available, at least for comparisons. So far, using the highest-D UMAP space "works well for me", since I'm actually interested in seeing the connectivity of the hi-D manifold. For novice users, auto-adding a "special umap reduction" when a reasonable one cannot be found seems a nice touch, @pleonard212! I emit a warning (that didn't even check …).

I guess a visual consequence of adding a potentially off-grid UMAP layout would be merging the n_neighbors and min_dist information into a single layouts slider. The slider would report the two values for the (somehow sorted) layout index.
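A single layouts slider could be backed by one sorted, de-duplicated list of (n_neighbors, min_dist) pairs, so an off-grid layout simply becomes one more entry. This is a hypothetical sketch: `build_layout_slider` and `slider_label` are illustrative names, and sorting by n_neighbors first, then min_dist, is just one choice for the "somehow sorted" order.

```python
from typing import Iterable, List, Tuple

Layout = Tuple[int, float]  # (n_neighbors, min_dist)


def build_layout_slider(grid_neighbors: Iterable[int],
                        grid_min_dist: Iterable[float],
                        extra: Iterable[Layout] = ()) -> List[Layout]:
    """Flatten the n_neighbors x min_dist grid, plus any off-grid layouts,
    into one sorted, de-duplicated list; slider position i shows layouts[i]."""
    min_dists = list(grid_min_dist)  # materialise so it can be reused per n
    layouts = {(n, d) for n in grid_neighbors for d in min_dists}
    layouts.update(extra)
    # Sort by n_neighbors first, then min_dist (one possible ordering).
    return sorted(layouts)


def slider_label(layouts: List[Layout], index: int) -> str:
    """Text the single slider would display for a given position."""
    n_neighbors, min_dist = layouts[index]
    return f"n_neighbors={n_neighbors}, min_dist={min_dist}"


layouts = build_layout_slider([5, 15], [0.0, 0.1], extra=[(30, 0.0)])
print(len(layouts))              # 5
print(slider_label(layouts, 4))  # n_neighbors=30, min_dist=0.0
```

The trade-off is that neighbouring slider positions may no longer differ in only one hyperparameter, which the two-slider grid guaranteed.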
Suggestion
Current pixplot clustering uses features from ... ['variants][0][ ...
Often this is the clustering that looks the worst (often the lowest-n_neighbors embedding), and in practice it rarely agrees with the clusters I'd like to lasso.
UMAP clustering docs and examples (and experience) suggest that a reasonable approach would be to cluster based on …
So what I do is …
where best_umap_clustering_json simply does a search (*) and returns the filename:
https://github.com/kruus/pix-plot/blob/8a1cd231ce20cc075b9cb72c8ebeda97fdfb335c/pixplot/pixplot.py#L1219-L1244
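The linked function is the actual implementation and its search (*) isn't spelled out here, so the following is only a hypothetical stand-in showing the general idea: among the saved UMAP embeddings, prefer the highest n_components, then (loosely following the UMAP clustering docs' advice) a larger n_neighbors and a smaller min_dist, and warn when only a low-dimensional embedding exists. The filename pattern, the `min_components` threshold, and the tie-break order are all assumptions, not PixPlot's behaviour.

```python
import re
import warnings
from typing import Iterable, Optional

# Hypothetical naming scheme, NOT PixPlot's actual file layout:
#   umap-n_neighbors-<int>-min_dist-<float>-n_components-<int>.json
PATTERN = re.compile(
    r"umap-n_neighbors-(\d+)-min_dist-(\d+(?:\.\d+)?)-n_components-(\d+)\.json$")


def best_umap_clustering_json(paths: Iterable[str],
                              min_components: int = 3) -> Optional[str]:
    """Pick the embedding to cluster on: highest n_components, then highest
    n_neighbors, then lowest min_dist. Returns None if nothing matches."""
    candidates = []
    for path in paths:
        m = PATTERN.search(path)
        if m is None:
            continue  # not a UMAP embedding file under the assumed naming
        n_neighbors, min_dist, n_components = int(m[1]), float(m[2]), int(m[3])
        candidates.append(((n_components, n_neighbors, -min_dist), path))
    if not candidates:
        return None
    (n_components, _, _), best = max(candidates)
    if n_components < min_components:
        warnings.warn(f"best UMAP embedding has only {n_components} dimensions; "
                      "a dedicated higher-dimensional reduction may cluster better")
    return best


paths = [
    "umap-n_neighbors-5-min_dist-0.1-n_components-2.json",
    "umap-n_neighbors-15-min_dist-0.0-n_components-10.json",
    "umap-n_neighbors-30-min_dist-0.1-n_components-10.json",
    "metadata.json",
]
print(best_umap_clustering_json(paths))
# umap-n_neighbors-30-min_dist-0.1-n_components-10.json
```

Returning None instead of raising leaves the caller free to fall back to computing a dedicated clustering reduction.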
Erik