Improve speed of pow5_12(::Float64)
and accuracy of pow5_12(::Float32)
#485
Benchmark
julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
julia> xs = rand(1000) .+ 0.003;
julia> @noinline f(x) = Colors.pow5_12(x);
julia> @btime f.($xs);
9.000 μs (1 allocation: 7.94 KiB) # before
5.817 μs (1 allocation: 7.94 KiB) # after
Codecov Report
@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
+ Coverage   92.41%   92.82%   +0.41%
==========================================
  Files           9        9
  Lines         975     1003      +28
==========================================
+ Hits          901      931      +30
+ Misses         74       72       -2
pow5_12(::Float64)
pow5_12(::Float32)
It is difficult to avoid rounding errors without converting to `Float64`.
using Colors: pow3_4, pow5_12
@inline pow5_12_with_f64(x::Float32) = Float32(pow5_12(Float64(x)))
@inline pow5_12_gen(x::Float32) = pow3_4(x) / cbrt(x)
@inline function pow5_12_wo_f64(x::Float32)
    # x^(-1/6)
    if x < 0.019f0
        t0 = @evalpoly(x,
            3.1487858f0, -228.47084f0, 21100.13f0, -1.00334556f6, 1.8385738f7)
    elseif x < 0.12f0
        t0 = @evalpoly(x,
            2.3135905f0, -26.436646f0, 385.01465f0, -2890.092f0, 8366.343f0)
    elseif x < 1.2f0
        t0 = @evalpoly(x,
            1.7047813f0, -3.1261253f0, 7.498745f0, -10.10032f0, 6.8206015f0, -1.7978895f0)
    else
        return pow3_4(x) / cbrt(x)
    end
    # x^(-1/3)
    t1 = t0 * t0
    h1 = muladd(t1^2, -x * t1, 1.0f0)
    t2 = muladd(h1, 1/3f0 * t1, t1)
    h2 = muladd(t2^2, -x * t2, 1.0f0)
    t2h = muladd(2/9f0, h2, 1/3f0) * h2 * t2
    # x^(3/4)
    p3_4 = pow3_4(x)
    p3_4h = 0.25f0 * muladd(x, x^2 / p3_4^3, -p3_4)
    # x^(3/4) * x^(-1/3)
    muladd(p3_4, t2, muladd(p3_4, t2h, p3_4h * t2))
end
julia> xs32 = rand(Float32, 1000) .+ 0.003f0;
julia> @btime pow5_12_with_f64.($xs32); # ~0.5 ULP
5.417 μs (1 allocation: 4.06 KiB)
julia> @btime pow5_12_gen.($xs32); # ~3 ULP
6.400 μs (1 allocation: 4.06 KiB)
julia> @btime pow5_12_wo_f64.($xs32); # ~1.5 ULP (0.5 ULP in most cases)
5.317 μs (1 allocation: 4.06 KiB)
IMO, converting to
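The refinement steps in `pow5_12_wo_f64` above can be read as one Newton step followed by a truncated series correction. The derivation below is a reading of the code, not something stated in the PR; the symbols mirror the variable names (`t0`, `t1`, `h1`, `t2`, `h2`, `t2h`, `p3_4`, `p3_4h`), with $p$ and $p_h$ standing in for the last two.

```latex
\begin{align*}
t_1 &= t_0^2 \approx x^{-1/3}, & h_1 &= 1 - x\,t_1^3, \\
t_2 &= t_1\left(1 + \tfrac{1}{3}h_1\right), & h_2 &= 1 - x\,t_2^3, \\
x^{-1/3} &= t_2\,(1 - h_2)^{-1/3}
          = t_2\left(1 + \tfrac{1}{3}h_2 + \tfrac{2}{9}h_2^2 + O(h_2^3)\right), \\
t_{2h} &= t_2\,h_2\left(\tfrac{1}{3} + \tfrac{2}{9}h_2\right), \\
p_h &= \tfrac{1}{4}\left(\frac{x^3}{p^3} - p\right)
      \quad\text{(Newton step for } p^4 = x^3,\ p \approx x^{3/4}\text{)}, \\
x^{5/12} &\approx (p + p_h)(t_2 + t_{2h})
          \approx p\,t_2 + p\,t_{2h} + p_h\,t_2 .
\end{align*}
```

The last line drops the product $p_h\,t_{2h}$ of the two small corrections, which matches the final nested `muladd` call.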
…t32)` `pow5_12(::Float64)` is used in the `XYZ{Float64}`-->`RGB` conversion. This ensures sufficient accuracy (~3 ULP) in the range of [0.003, 1], which is typically required for the conversion. Previously, even conversions from `XYZ{Float32}` were calculated with `Float64`. This changes the conversions so that `pow5_12(::Float32)` is used. However, `pow5_12(::Float32)` is still calculated within `Float64` for accuracy. `@inline` could be a workaround for the problem with precompilation.
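The accuracy trade-off described in the commit message above can be explored with a small experiment. The following is a minimal sketch that uses only Base Julia and does not call `Colors.pow5_12`; the helper names (`exact_pow5_12`, `ulp_error`, `via_f64`, `direct_f32`) are hypothetical, and the PR does not say how its own ULP figures were obtained. It compares a `Float32` result computed through `Float64` with one computed purely in `Float32`, using a `BigFloat` value as the reference.

```julia
# Hypothetical helpers for illustration only; none of these are part of Colors.jl.

# High-precision reference value of x^(5/12).
exact_pow5_12(x) = big(x)^(big(5) / 12)

# Error of f(x) against the reference, in units of the spacing of T at the exact value.
function ulp_error(f, x::T) where {T<:AbstractFloat}
    exact = exact_pow5_12(x)
    return Float64(abs(big(f(x)) - exact) / eps(T(exact)))
end

# Strategy 1: compute in Float64, then round once to Float32.
via_f64(x::Float32) = Float32(Float64(x)^(5 / 12))

# Strategy 2: stay in Float32 throughout (assumed stand-in for a "generic" approach).
direct_f32(x::Float32) = x^(5f0 / 12f0)

# Deterministic inputs covering the range quoted in the PR, [0.003, 1].
xs = range(0.003f0, 1.0f0; length = 1000)

max_via_f64 = maximum(x -> ulp_error(via_f64, x), xs)
max_direct = maximum(x -> ulp_error(direct_f32, x), xs)

println("via Float64:    ", max_via_f64, " ULP (max)")
println("direct Float32: ", max_direct, " ULP (max)")
```

Rounding a sufficiently accurate `Float64` result to `Float32` should stay close to the 0.5 ULP bound of correct rounding, which is the motivation for keeping the `Float64` path.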
This PR mitigates the problem of precompilation in
`pow5_12` is used in the `XYZ{Float64}` --> `RGB` conversion. This ensures sufficient accuracy (~3 ULP) in the range of [0.003, 1], which is typically required for the conversion.
`@inline` could be a workaround for the problem with precompilation.
I am planning a `Float32` version. However, for now (i.e. without PR #482), the result will be promoted to `Float64` within the transform matrix.
Edit: see #485 (comment)
cf. #483