Chesapeake RasterDatasets will not be able to be used "as is" in modeling #2283

Open
calebrob6 opened this issue Sep 4, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@calebrob6
Member

calebrob6 commented Sep 4, 2024

Issue

The Chesapeake state-specific family of RasterDatasets consists of land cover masks with values like "11" for water and "22" for impervious structures. If you create an IntersectionDataset with some imagery layer, you cannot use the result in modeling as is, because torch cross entropy expects the mask values to be in [0, num_classes - 1]. You first need to write a transform that re-maps the values to this range.

@calebrob6 calebrob6 added the documentation Improvements or additions to documentation label Sep 4, 2024
@calebrob6
Member Author

Another thing I just found: by default, the dataset is instantiated with both the 2013 and 2018 layers if you pass the root directory (i.e. what you would do if you used download=True). If you then use a RandomGeoSampler, you will sometimes get 2013 patches and sometimes 2018 patches. If you've already downloaded the data and do ds = ChesapeakeDE(paths="data/ChesapeakeDE/de_lulc_2013_2022-Edition.tif"), you get just that single layer.
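One way to avoid mixing years is to filter the file list before constructing the dataset. The following is a minimal sketch, not part of torchgeo: `select_year` is a hypothetical helper, the 2018 file name is assumed by analogy with the 2013 name quoted above, and whether the filtered list can be passed as `paths=` depends on the installed torchgeo version accepting an iterable of paths.

```python
import re


def select_year(paths, year):
    """Keep only paths whose file name contains `_lulc_<year>_` (hypothetical helper)."""
    pattern = re.compile(rf"_lulc_{year}_")
    return [p for p in paths if pattern.search(p)]


paths = [
    "data/ChesapeakeDE/de_lulc_2013_2022-Edition.tif",
    # Assumed file name, by analogy with the 2013 layer above.
    "data/ChesapeakeDE/de_lulc_2018_2022-Edition.tif",
]

# Only the 2018 layer remains, so a random sampler cannot return 2013 patches.
only_2018 = select_year(paths, 2018)
```

The filtered list would then stand in for the single path in the `ChesapeakeDE(paths=...)` call shown above.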

@adamjstewart
Collaborator

This seems to come up a lot. Especially in our land cover datasets, you almost always want to be able to select a subset of classes and then map them to ordinal numbers. CDL and NLCD do this, and I want to do the same for many others.

Instead of hard-coding this in every dataset, should we create a standard Kornia transform to do this? I think it should be easy to do.

@calebrob6
Member Author

calebrob6 commented Sep 5, 2024

I think you can simply do something like:

class_val_to_idx = np.array([
    0,
    0,
    ...
    1,  # index 11 should map to 1
    2,  # index 12 should map to 2
    ...
])

then mask = class_val_to_idx[mask]
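A runnable version of the lookup-table idea above might look like the sketch below. The raw class values are illustrative only, not the full Chesapeake legend, and the table size of 256 assumes the masks are stored as uint8.

```python
import numpy as np

# Illustrative raw class values only; NOT the full Chesapeake legend.
raw_values = [11, 12, 21, 22]

# Lookup table indexed by raw value (256 entries assumes uint8 masks).
# Values that are not listed fall back to index 0, like the zeros above.
class_val_to_idx = np.zeros(256, dtype=np.int64)
for idx, raw in enumerate(raw_values, start=1):
    class_val_to_idx[raw] = idx

# Integer array indexing applies the table to every pixel at once.
mask = np.array([[11, 12], [22, 0]], dtype=np.uint8)
remapped = class_val_to_idx[mask]
```

After the remap, every value lies in [0, len(raw_values)], which is the contiguous range that a cross entropy loss with len(raw_values) + 1 classes expects.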

> you almost always want to be able to select a subset of classes

Why would you only want to re-map a subset of classes?

@adamjstewart
Collaborator

> I think you can simply do something like:

Correct, but we could formalize this in a more user-friendly transform.

> Why would you only want to re-map a subset of classes?

Not remap a subset, select a subset. So instead of training on the 256-class CDL, you pick the 10 most common classes and only use those.
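A reusable transform along these lines could be sketched as follows. `SelectClasses` is a hypothetical name, not an existing torchgeo or Kornia API; the sketch assumes samples are dicts with a "mask" entry, uses NumPy arrays in place of tensors, and sends every unselected class to a fill index of 0.

```python
import numpy as np


class SelectClasses:
    """Hypothetical transform: keep chosen raw classes and map them to 1..N."""

    def __init__(self, classes, fill=0, max_value=255):
        # Every raw value starts at the fill index (assumed "other" class).
        self.lut = np.full(max_value + 1, fill, dtype=np.int64)
        # Selected classes get ordinal indices in the order given.
        for idx, raw in enumerate(classes, start=1):
            self.lut[raw] = idx

    def __call__(self, sample):
        out = dict(sample)  # shallow copy so the input sample is not modified
        out["mask"] = self.lut[np.asarray(sample["mask"])]
        return out


# Illustrative class values only.
transform = SelectClasses(classes=[11, 41])
sample = {"mask": np.array([[11, 41], [22, 11]], dtype=np.uint8)}
result = transform(sample)
```

Mapping unselected classes to a shared index keeps the output contiguous; using a dedicated ignore index instead would be an equally reasonable design choice.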
