Improve speed of `pow5_12(::Float64)` and accuracy of `pow5_12(::Float32)` #485

kimikage · 2021-06-08T04:30:22Z

`pow5_12(::Float64)` is used in the `XYZ{Float64}`-->`RGB` conversion.
This PR ensures sufficient accuracy (~3 ULP) over the range [0.003, 1], which is what the conversion typically requires.
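
For context, the exponent 5/12 equals 1/2.4, the exponent of the standard sRGB transfer function. Inputs below about 0.0031308 take the linear branch, which is likely why accuracy only matters from roughly 0.003 upward. A minimal sketch, assuming the standard sRGB constants; `srgb_compand_sketch` is a hypothetical name and not the Colors.jl implementation:

```julia
# Hypothetical illustration of where x^(5/12) appears in linear-to-sRGB companding.
# The constants are the standard sRGB ones; the function name is made up for this sketch.
function srgb_compand_sketch(v::Float64)
    if v <= 0.0031308
        return 12.92 * v                    # linear segment near black
    else
        return 1.055 * v^(5 / 12) - 0.055   # 5/12 == 1/2.4
    end
end

srgb_compand_sketch(0.5)
```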

`@inline` could be a workaround for the precompilation problem.

I am planning a Float32 version. However, for now (i.e. without PR #482), the result will be promoted to Float64 within the transform matrix.
Edit: see #485 (comment)

cf. #483

kimikage · 2021-06-08T04:34:14Z

Benchmark

```julia
julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> xs = rand(1000) .+ 0.003;

julia> @noinline f(x) = Colors.pow5_12(x);

julia> @btime f.($xs);
  9.000 μs (1 allocation: 7.94 KiB) # before
  5.817 μs (1 allocation: 7.94 KiB) # after
```
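
The benchmark above measures speed only. The accuracy claim (~3 ULP on [0.003, 1]) could be checked along the following lines, comparing against a `BigFloat` reference. This is a sketch under assumptions: `ulp_error` and `pow5_12_naive` are hypothetical names, and the naive `x^(5/12)` stands in for whichever implementation is under test.

```julia
# Hypothetical helper: error of `approx`, measured in units of the spacing (ULP)
# of the Float64 value nearest to the high-precision reference `exact`.
function ulp_error(approx::Float64, exact::BigFloat)
    ref = Float64(exact)
    return Float64(abs(big(approx) - exact) / eps(ref))
end

# Stand-in implementation; replace with the function being evaluated.
pow5_12_naive(x::Float64) = x^(5 / 12)

# Sample the range the PR description names and record the worst-case error.
xs = range(0.003, 1.0; length=1000)
worst = maximum(ulp_error(pow5_12_naive(x), big(x)^(big(5) / 12)) for x in xs)
```

The exponent is formed as `big(5) / 12` so that the reference does not inherit the rounding of the `Float64` literal `5/12`.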

codecov · 2021-06-08T04:34:16Z

Codecov Report

Merging #485 (c3998ff) into master (c28d392) will increase coverage by 0.41%.
The diff coverage is 97.29%.

```diff
@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
+ Coverage   92.41%   92.82%   +0.41%     
==========================================
  Files           9        9              
  Lines         975     1003      +28     
==========================================
+ Hits          901      931      +30     
+ Misses         74       72       -2
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/parse.jl | `100.00% <ø> (+2.52%)` | ⬆️ |
| src/conversions.jl | `99.01% <90.90%> (-0.32%)` | ⬇️ |
| src/utilities.jl | `98.41% <100.00%> (+0.39%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

kimikage · 2021-06-08T12:36:11Z

It is difficult to avoid rounding errors without converting to Float64.

```julia
using Colors: pow3_4, pow5_12

@inline pow5_12_with_f64(x::Float32) = Float32(pow5_12(Float64(x)))

@inline pow5_12_gen(x::Float32) = pow3_4(x) / cbrt(x)

@inline function pow5_12_wo_f64(x::Float32)
    # x^(-1/6)
    if x < 0.019f0
        t0 = @evalpoly(x,
            3.1487858f0, -228.47084f0, 21100.13f0, -1.00334556f6, 1.8385738f7)
    elseif x < 0.12f0
        t0 = @evalpoly(x,
            2.3135905f0, -26.436646f0, 385.01465f0, -2890.092f0, 8366.343f0)
    elseif x < 1.2f0
        t0 = @evalpoly(x,
            1.7047813f0, -3.1261253f0, 7.498745f0, -10.10032f0, 6.8206015f0, -1.7978895f0)
    else
        return pow3_4(x) / cbrt(x)
    end
    # x^(-1/3)
    t1 = t0 * t0
    h1 = muladd(t1^2, -x * t1, 1.0f0)
    t2 = muladd(h1, 1/3f0 * t1, t1)
    h2 = muladd(t2^2, -x * t2, 1.0f0)
    t2h = muladd(2/9f0, h2, 1/3f0) * h2 * t2
    # x^(3/4)
    p3_4 = pow3_4(x)
    p3_4h = 0.25f0 * muladd(x, x^2 / p3_4^3, -p3_4)
    # x^(3/4) * x^(-1/3)
    muladd(p3_4, t2, muladd(p3_4, t2h, p3_4h * t2))
end
```

```julia
julia> xs32 = rand(Float32, 1000) .+ 0.003f0;

julia> @btime pow5_12_with_f64.($xs32); # ~0.5 ULP
  5.417 μs (1 allocation: 4.06 KiB)

julia> @btime pow5_12_gen.($xs32); # ~3 ULP
  6.400 μs (1 allocation: 4.06 KiB)

julia> @btime pow5_12_wo_f64.($xs32); # ~1.5 ULP (0.5 ULP in most cases)
  5.317 μs (1 allocation: 4.06 KiB)
```

IMO, converting to Float64 might be the best choice.
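
The `# x^(-1/3)` step in `pow5_12_wo_f64` above reads like a Newton-type refinement of an inverse cube root, driven by the residual `h = 1 - x*t^3`. The sketch below shows that basic iteration in isolation, in `Float64` and without the extra error-compensation terms of the original. `inv_cbrt_newton` is a hypothetical name, and this is an illustration of the general technique rather than the code used in the PR.

```julia
# Hypothetical sketch: refine a guess t ≈ x^(-1/3) with Newton's method
# applied to f(t) = t^(-3) - x, which gives t_new = t + t * (1 - x*t^3) / 3.
function inv_cbrt_newton(x::Float64, t::Float64; iters::Int=5)
    for _ in 1:iters
        h = muladd(-x * t, t * t, 1.0)  # residual 1 - x*t^3
        t = muladd(h, t / 3, t)         # t + h * t / 3
    end
    return t
end

# Start from a crude guess for 0.5^(-1/3).
inv_cbrt_newton(0.5, 1.2)
```

Each step roughly squares the relative error, which is why a low-degree polynomial initial guess followed by one or two refinements can be enough in `Float32`.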

…t32)` `pow5_12(::Float64)` is used in the `XYZ{Float64}`-->`RGB` conversion. This ensures sufficient accuracy (~3 ULP) in the range of [0.003, 1], which is typically required for the conversion. Previously, even conversions from `XYZ{Float32}` were calculated with `Float64`. This changes the conversions so that `pow5_12(::Float32)` is used. However, `pow5_12(::Float32)` is still calculated internally in `Float64` for accuracy. `@inline` could be a workaround for the problem with precompilation.

kimikage · 2021-06-16T03:48:08Z

This PR mitigates the precompilation problem in the XYZ-->RGB conversion. However, the problem remains in the Lab-->(XYZ)-->RGB conversion, and I have not yet found a workaround for it.
So, I'm going to make the improvements that I can now, even if they do not fully solve the problem.


kimikage force-pushed the pow5_12 branch from b8e135c to 2e7c327 Compare June 8, 2021 11:57

kimikage changed the title ~~[WIP] Improve speed of pow5_12(::Float64)~~ Improve speed of pow5_12(::Float64) and accuracy of pow5_12(::Float32) Jun 8, 2021

kimikage force-pushed the pow5_12 branch from 2e7c327 to 52f25fb Compare June 8, 2021 11:59

kimikage marked this pull request as ready for review June 8, 2021 12:36

kimikage changed the title ~~Improve speed of pow5_12(::Float64) and accuracy of pow5_12(::Float32)~~ Improve speed of pow5_12(::Float64) and accuracy of pow5_12(::Float32) Jun 8, 2021

kimikage force-pushed the pow5_12 branch from 52f25fb to c3998ff Compare June 8, 2021 14:34

kimikage merged commit eb4f61c into JuliaGraphics:master Jun 16, 2021

kimikage deleted the pow5_12 branch June 16, 2021 03:49

kimikage mentioned this pull request Jun 16, 2021

Consideration of workarounds for cbrt and exp precompilation problems #425

Closed
