How to make a PyDecoder that doesn't keep the whole image in memory? #8233

lampmerchant · 2024-07-14T13:39:57Z

Hi all,

The docs for Writing Your Own File Codec in Python warn that if you set _pulls_fd to True, the codec will have increased freedom at the cost of keeping the whole image in memory. Keeping the whole image in memory is what set_as_raw seems to require, and what all the existing Python decoders in the repo seem to do (with _pulls_fd set to True); there don't appear to be any that follow the model of decoding data in chunks.

If I wanted to write a Python decoder that left _pulls_fd False and decoded data in chunks, how would I deliver the corresponding chunks of decoded raw data from the decode method?

Thanks!

radarhere · 2024-07-15T02:13:02Z · Collaborator

I expect you could split up the code of set_as_raw, so that the raw decoder is created and attached to the image only once, and each chunk of decoded data is then passed to it as it arrives.

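from PIL import Image, ImageFile

class CustomDecoder(ImageFile.PyDecoder):
    def setimage(self, im, extents=None) -> None:
        # init() runs before the target image is attached (self.im is
        # still None there), so set up the raw decoder here instead
        super().setimage(im, extents)
        # You will need to determine the "rawmode";
        # I expect it is probably the same as the mode
        rawmode = self.mode
        self.d = Image.core.raw_decoder(self.mode, rawmode)
        self.d.setimage(self.im, self.state.extents())
        self.pending = b""  # partial row left over from the previous call

    def decode_chunk(self, buffer: bytes) -> bytes:
        # Hypothetical helper: decode "buffer" into raw "data" for your format
        raise NotImplementedError

    def decode(self, buffer: bytes) -> tuple[int, int]:
        data = self.pending + self.decode_chunk(buffer)
        n, err = self.d.decode(data)
        if err != 0:
            msg = "cannot decode image data"
            raise ValueError(msg)
        if n < 0:
            # The raw decoder has written the last row; tell load() we're done
            return -1, 0
        # Otherwise n bytes (whole rows only) were used; keep the rest
        self.pending = data[n:]
        return len(buffer), 0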

wiredfool · 2024-07-15T07:23:59Z · Maintainer

_pulls_fd was added to simplify decoding in cases where an external library takes a file-like object and returns a block of bytes, whereas the old PIL approach was to send chunks to the decoder incrementally. It made single-pass decoders far easier to deal with.
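To make the contrast concrete, a minimal sketch of a _pulls_fd decoder, using Python's gzip module as a stand-in for such an external library (the gzip-compressed pixel format and the GzipRawDecoder name are hypothetical). decode() is called once with an empty buffer, reads everything through self.fd, and hands the complete result to set_as_raw. The decoder is driven by hand here, in the same order ImageFile.load() uses for _pulls_fd decoders.

```python
import gzip
import io

from PIL import Image, ImageFile


class GzipRawDecoder(ImageFile.PyDecoder):
    # Hypothetical codec: the pixel data is a gzip stream of raw bytes.
    _pulls_fd = True  # decode() is called once and reads self.fd itself

    def decode(self, buffer):
        # The "external library" takes a file-like object and returns all
        # of the decoded bytes at once...
        with gzip.GzipFile(fileobj=self.fd, mode="rb") as stream:
            data = stream.read()
        # ...so the whole raw image sits in memory before being copied in.
        self.set_as_raw(data)
        return -1, 0  # finished in a single call


# Drive the decoder by hand, in the order ImageFile.load() uses for
# _pulls_fd decoders: setimage(), setfd(), then a single decode(b"").
pixels = bytes(range(0, 120, 10))
im = Image.new("L", (4, 3))
decoder = GzipRawDecoder(im.mode)
decoder.setimage(im.im)
decoder.setfd(io.BytesIO(gzip.compress(pixels)))
result = decoder.decode(b"")
print(result)  # (-1, 0)
print(im.tobytes() == pixels)  # True
```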

The classic method of incremental data loading is through the various C-level shuffle methods, which take rows and write them into the image storage. These still work at the level of rows, so something like tiled decoding is a little tricky to handle there. (When I was deep into the TIFF work, I remember that we just didn't deal with the tiled interface.)
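A small sketch of that row granularity, calling Pillow's internal raw decoder directly (Image.core.raw_decoder is not public API and may change between releases); it unpacks one row at a time into the image. Here 5-byte chunks are fed into a 4-pixel-wide "L" image: each call consumes only whole rows and returns how many bytes it used, then returns -1 once the last row is written, so the caller has to carry partial rows over.

```python
from PIL import Image

# A raw "L" decoder writing into a 4x3 image, i.e. 4 bytes per row.
im = Image.new("L", (4, 3))
raw = Image.core.raw_decoder("L", "L")
raw.setimage(im.im, (0, 0, 4, 3))

pixels = bytes(range(12))
results = []
pending = b""
for start in range(0, len(pixels), 5):  # 5-byte chunks never align with rows
    pending += pixels[start : start + 5]
    n, err = raw.decode(pending)
    results.append((n, err))
    if n < 0:  # last row written
        break
    pending = pending[n:]  # keep the partial row for the next chunk

print(results)  # [(4, 0), (4, 0), (-1, 0)]
print(im.tobytes() == pixels)  # True
```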

There are some fundamental issues with not keeping the image in memory: Pillow's existing image storage basically expects either an unloaded or a completely loaded image. There's no support for tiles or any other semi-lazy loading. Something like Image.paste might work, or you might need to use the lower-level core image methods to make that work on a pre-loaded image.
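A minimal sketch of the Image.paste route: the full-size target is allocated up front, and each piece is pasted into place as it is decoded. decode_tile and the 4x4 tile layout are hypothetical stand-ins for a real format. Inside a PyDecoder, self.im is a core image object rather than an Image, so there you would be working with the lower-level core methods instead.

```python
from PIL import Image

WIDTH, HEIGHT, TILE = 8, 8, 4


def decode_tile(tx: int, ty: int) -> bytes:
    # Hypothetical stand-in for a format's per-tile decoder: a flat tile
    # whose grey level encodes its position.
    return bytes([(ty * 2 + tx) * 10]) * (TILE * TILE)


# The full-size target is allocated up front; only one decoded tile is
# held alongside it at a time.
canvas = Image.new("L", (WIDTH, HEIGHT))
for ty in range(HEIGHT // TILE):
    for tx in range(WIDTH // TILE):
        tile = Image.frombytes("L", (TILE, TILE), decode_tile(tx, ty))
        canvas.paste(tile, (tx * TILE, ty * TILE))

print([canvas.getpixel(p) for p in [(0, 0), (4, 0), (0, 4), (7, 7)]])
# [0, 10, 20, 30]
```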
