Chesapeake RasterDatasets will not be able to be used "as is" in modeling #2283

calebrob6 · 2024-09-04T17:06:35Z

Issue

The Chesapeake state-specific family of RasterDatasets are land cover masks with values like "11" for water and "22" for impervious structures. If you create an IntersectionDataset with some imagery layer, you cannot use the result directly in modeling, because PyTorch's cross-entropy loss expects mask values in [0, num_classes - 1]. You first need to write a transform that remaps the values to this range.
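A minimal sketch of such a remapping, assuming the mask is available as a NumPy integer array. The codes 11 and 22 come from the description above; the code 33 and the helper names `build_lookup` and `remap_mask` are hypothetical and only serve the illustration.

```python
import numpy as np

# Raw class codes expected in the mask. 11 (water) and 22 (impervious
# structures) come from the issue text; 33 is a made-up placeholder.
RAW_CLASS_VALUES = [11, 22, 33]


def build_lookup(raw_values, size=256):
    """Return an array where lookup[raw_code] is the contiguous class index."""
    # Unlisted codes get -1 so they are easy to spot after remapping.
    lookup = np.full(size, -1, dtype=np.int64)
    for idx, raw in enumerate(raw_values):
        lookup[raw] = idx
    return lookup


def remap_mask(mask, lookup):
    """Map every pixel from its raw code to its class index via array indexing."""
    return lookup[mask]


lookup = build_lookup(RAW_CLASS_VALUES)
raw_mask = np.array([[11, 22], [33, 11]], dtype=np.uint8)
remapped = remap_mask(raw_mask, lookup)  # values now lie in [0, 2]
```

The same indexing pattern should carry over to a torch tensor lookup table indexed by a long-typed mask, but that variant is not shown here.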

calebrob6 · 2024-09-04T17:21:37Z

Another thing I just found is that by default the dataset will be instantiated with both the 2013 and 2018 layers if you pass the root directory (i.e. what you would do if you used download=True). If you use a RandomGeoSampler then sometimes you will get 2013 patches and sometimes you will get 2018 patches. If you've already downloaded it and do ds = ChesapeakeDE(paths="data/ChesapeakeDE/de_lulc_2013_2022-Edition.tif") then you'll just get a single layer.
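One possible workaround is to collect only the files for a single year and pass those instead of the root directory. The sketch below assumes that every layer follows the `*_lulc_<year>_*.tif` naming seen in the path above, and that `paths` also accepts a list of files; both are assumptions, and `select_year` is a hypothetical helper.

```python
from pathlib import Path


def select_year(root, year):
    """Return sorted .tif paths under root whose name contains _lulc_<year>_."""
    tag = f"_lulc_{year}_"
    return sorted(str(p) for p in Path(root).rglob("*.tif") if tag in p.name)


# Hypothetical usage, assuming `paths` accepts a list of files:
# ds = ChesapeakeDE(paths=select_year("data/ChesapeakeDE", 2013))
```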

adamjstewart · 2024-09-05T12:53:14Z

This seems to come up a lot. Especially in our land cover datasets, you almost always want to be able to select a subset of classes and then map them to ordinal numbers. CDL and NLCD do this, and I want to do the same for many others.

Instead of hard-coding this in every dataset, should we create a standard Kornia transform to do this? I think it should be easy to do.

calebrob6 · 2024-09-05T14:37:13Z

I think you can simply do something like:

import numpy as np

class_val_to_idx = np.array([
    0,
    0,
    ...
    1,  # index 11 (raw value 11) should map to 1
    2,  # index 12 (raw value 12) should map to 2
    ...
])

then mask = class_val_to_idx[mask]

> you almost always want to be able to select a subset of classes

Why would you only want to re-map a subset of classes?

adamjstewart · 2024-09-05T14:45:31Z

> I think you can simply do something like:

Correct, but we could formalize this in a more user-friendly transform.

> Why would you only want to re-map a subset of classes?

Not remap a subset, select a subset. So instead of training on the 256-class CDL, you pick the 10 most common classes and only use those.
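Selecting the most common classes could be sketched as follows. This assumes a NumPy mask with raw codes below 256, and that every unselected class collapses into index 0; the codes 1, 5 and 24 and the helper `top_k_lookup` are hypothetical. In practice the pixel counts would presumably be gathered over the whole dataset rather than a single patch.

```python
import numpy as np


def top_k_lookup(mask, k, size=256):
    """Build a lookup table that keeps the k most frequent raw codes in mask."""
    counts = np.bincount(mask.ravel(), minlength=size)
    # Codes sorted by descending pixel count; a stable sort keeps ties in code order.
    order = np.argsort(-counts, kind="stable")
    keep = [int(code) for code in order[:k] if counts[code] > 0]
    # Everything that is not kept maps to 0 (background).
    lookup = np.zeros(size, dtype=np.int64)
    for idx, raw in enumerate(keep, start=1):
        lookup[raw] = idx
    return lookup, keep


# Hypothetical raw codes: 1 appears four times, 5 three times, 24 once.
raw_mask = np.array([[1, 1, 1, 5], [5, 24, 1, 5]], dtype=np.uint8)
lookup, kept = top_k_lookup(raw_mask, k=2)
remapped = lookup[raw_mask]  # kept classes become 1..k, the rest 0
```

Mapping the dropped classes to 0 is only one option; an ignore index excluded from the loss would be another.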

calebrob6 added the documentation Improvements or additions to documentation label Sep 4, 2024
