Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move var, std to ColorVectorSpace #872

Open
johnnychen94 opened this issue Feb 11, 2020 · 12 comments
Open

move var, std to ColorVectorSpace #872

johnnychen94 opened this issue Feb 11, 2020 · 12 comments
Milestone

Comments

@johnnychen94
Copy link
Member

johnnychen94 commented Feb 11, 2020

The right place for var and std should be ColorVectorSpace

function Statistics.var(A::AbstractArray{C}; kwargs...) where C<:AbstractGray
imgc = channelview(A)
base_colorant_type(C)(var(imgc; kwargs...))
end
function Statistics.var(A::AbstractArray{C,N}; kwargs...) where {C<:Colorant,N}
imgc = channelview(A)
colons = ntuple(d->Colon(), Val(N))
inds1 = axes(imgc, 1)
val1 = Statistics.var(view(imgc, first(inds1), colons...); kwargs...)
vals = similar(imgc, typeof(val1), inds1)
vals[1] = val1
for i in first(inds1)+1:last(inds1)
vals[i] = Statistics.var(view(imgc, i, colons...); kwargs...)
end
base_colorant_type(C)(vals...)
end
Statistics.std(A::AbstractArray{C}; kwargs...) where {C<:Colorant} = mapc(sqrt, Statistics.var(A; kwargs...))

Steps to achieve it:

  1. copy codes to ColorVectorSpaces, merge and ask for a new release
  2. remove codes here: since ColorVectorSpace is loaded in this package, we don't need to deprecate them. But we need some compatibility-checking codes so that if an older version of ColorVectorSpace is installed, this package still functions normally.
@dipak101505
Copy link

@johnnychen94
Copy link
Member Author

It's just a matter of taste, I would choose https://github.com/JuliaGraphics/ColorVectorSpace.jl/blob/f2baf1d3e3c9769020692abe747289899d08f35e/src/ColorVectorSpace.jl#L210-L211 since they all belong to Statistics 😄

@Tokazama
Copy link

I think this makes more sense than having them here. Isn't defining them here technically type piracy? Whereas ColorVectorSpace is the agreed place to add these operations.

@kimikage
Copy link

If you do this, why not move complement (#698) within the new version?

@kimikage
Copy link

kimikage commented Mar 9, 2020

Since the current implementation of var uses the channelview, there is a discussion on where and how var should be implemented.
JuliaGraphics/ColorVectorSpace.jl#125 (comment)

I'm not a dedicated developer of JuliaImages packages:sweat_smile:, so I'm waiting for the advice of core developers or users.

@kimikage
Copy link

kimikage commented Mar 10, 2020

I thought handling the keyword argument dims was annoying (cf. JuliaGraphics/ColorVectorSpace.jl#125 (comment)).
However, in the first place, I noticed that dims has not been supported.:sweat:

julia> using Images, FixedPointNumbers, Statistics

julia> var(rand(RGB{N0f8},2,3))
RGB{Float64}(0.06165884916057927,0.09422017172882223,0.014523644752018458)

julia> var(rand(RGB{N0f8},2,3), dims=1)
ERROR: MethodError: no method matching RGB(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2})

If we don't have to support dims now, var can be implemented as follows:

using ColorVectorSpace: sumtype

Statistics.var(A::AbstractArray{C}; corrected::Bool=true, mean=nothing, dims=:) where C <: Colorant = 
    _var(A, corrected, mean, dims)

Statistics.varm(A::AbstractArray{C}, mean; dims=:, corrected::Bool=true) where C <: Colorant =
    _varm(A, mean, corrected, dims)

_var(A::AbstractArray{C}, corrected::Bool, mean, dims::Colon) where C <: Colorant =
    _varm(A, something(mean, Statistics.mean(A)), corrected, dims)

function _varm(A::AbstractArray{C}, mean, corrected, dims::Colon) where C <: Colorant
    n = length(A)
    T = sumtype(C, typeof(mean)) # ?
    n == 0 && return nan(base_colorant_type(C){T})
    mapreduce(c->mapc(abs2, c - mean), +, A) / (n - Int(corrected))
end

Current implementation in Images.jl:

julia> img = rand(RGB{N0f8},100,100);

julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
  memory estimate:  560 bytes
  allocs estimate:  14
  --------------
  minimum time:     281.900 μs (0.00% GC)
  median time:      283.299 μs (0.00% GC)
  mean time:        290.398 μs (0.00% GC)
  maximum time:     644.901 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Yet another implementation:

julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     21.499 μs (0.00% GC)
  median time:      21.600 μs (0.00% GC)
  mean time:        21.684 μs (0.00% GC)
  maximum time:     78.099 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

@kimikage
Copy link

kimikage commented Mar 10, 2020

The dims support has not been fully tested and optimized.

_var(A::AbstractArray{C}, corrected::Bool, mean, dims) where C <: Colorant =
    _varm(A, something(mean, Statistics.mean(A, dims=dims)), corrected, dims)

function _varm(A::AbstractArray{C}, mean, corrected, dims) where C <: Colorant
    n = length(A)
    T = sumtype(C, eltype(mean)) # ?
    if n == 0
        nans = similar(mean, base_colorant_type(C){T})
        return fill!(nans, nan(base_colorant_type(C){T}))
    end
    y = iterate(mean)
    function s(slice)
        s = mapreduce(c->mapc(abs2, c - y[1]), +, slice)
        y = iterate(mean, y[2])
        s / (length(slice)- Int(corrected))
    end
    mapslices(s, A, dims=dims)
end

@kimikage
Copy link

BTW, I also noticed inconsistencies with the element types.

julia> mean(img)
RGB{Float64}(0.4960086274509804,0.5011768627450981,0.5030243137254904)

julia> mean(img, dims=1)
1×100 Array{RGB{Float32},2} with eltype RGB{Float32}:
 RGB{Float32}(0.510353,0.535608,0.455804)    RGB{Float32}(0.523176,0.478235,0.511608)

I think Float32 is more appropriate in this case.

@johnnychen94
Copy link
Member Author

johnnychen94 commented Mar 10, 2020

From my first glance, this looks great, thanks for putting these together. I'll check it more carefully this weekend if nobody does.

Actually, I'm more interested in improving the performance of colorview and channelview. For example, the WIP benchmark tells me that getindex of channelview is 20x slower than a plain array. 😕

@kimikage
Copy link

kimikage commented Mar 10, 2020

This is off-topic, but @timholy and @chriselrod are trying to improve the vectorization of RGB image processing. The memory layout, i.e. Struct of Arrays vs. Array of Structs, is a worrisome problem in image processing.
IMO, modern CPUs (with AVX2) have reduced alignment issues and improved shuffling/permuting performance, so the current AoS layout is more versatile. So, I think improving channelview has great merits.

@kimikage
Copy link

@timholy
Copy link
Member

timholy commented Mar 12, 2020

Agreed that it's off-topic, but your points are important and I'll briefly address them here. (If we want to continue the discussion, to avoid derailing this PR too much let's open another issue.) EDIT: oops, I thought this was JuliaGraphics/ColorVectorSpace.jl#125, hence my somewhat confusing comment here!

I don't have immediate plans to change the default layout, because I too see many advantages to the current system. For ImageFiltering, I think "we can have our cake and eat it too" (weird English expression 😄): because each array element will be used many times (as many as there are elements in the kernel), it's totally worth it to change the representation used in the TileBuffer from the TiledIteration.jl package. But that would be for a temporary intermediate used to improve cache efficiency in a small block of the image, it's not something that will escape to the user.

There are other circumstances, like resizing/rotating/etc, where the number of uses per array element is much smaller, and for which the appropriate vectorization strategy is less clear. I don't yet have any concrete plans about how to handle that.

For improving channelview, my current favorite strategy is to introduce the possibility of declaring some array dimensions as having a fixed size known to the compiler. If I'm thinking about it clearly, that should make ReinterpretArray perform much better. This would essentially introduce a statically-sized array type to Base, though different from StaticArrays.jl in that one could declare a subset of the dimensions to be statically-sized. (For channelview the one I'm interested in is the first dimension, of size 3.) Of course, it's far from clear that this PR, when I get around to it, will be accepted into Base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants