Fix type instability of entropy and generalize crossentropy and kldivergence #714
Conversation
Thanks, that indeed simplifies the code a lot. Have you checked that performance doesn't regress? Also, does it still work for empty inputs?
With the latest release:

julia> using StatsBase, BenchmarkTools, Random
julia> Random.seed!(1234);
julia> p, q = rand(100_000), rand(100_000);
julia> @btime entropy($p);
753.493 μs (0 allocations: 0 bytes)
julia> @btime crossentropy($p, $q);
779.563 μs (0 allocations: 0 bytes)
julia> @btime kldivergence($p, $q);
1.540 ms (0 allocations: 0 bytes)
julia> entropy(Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
...
julia> crossentropy(Float64[], Float64[])
-0.0
julia> kldivergence(Float64[], Float64[])
0.0

With this PR:

julia> using StatsBase, BenchmarkTools, Random
julia> Random.seed!(1234);
julia> p, q = rand(100_000), rand(100_000);
julia> @btime entropy($p);
837.509 μs (0 allocations: 0 bytes)
julia> @btime crossentropy($p, $q);
1.133 ms (0 allocations: 0 bytes)
julia> @btime kldivergence($p, $q);
1.680 ms (0 allocations: 0 bytes)
julia> entropy(Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
...
julia> crossentropy(Float64[], Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
...
julia> kldivergence(Float64[], Float64[])
ERROR: ArgumentError: reducing over an empty collection is not allowed
Stacktrace:
...

So there seems to be some performance regression, and empty inputs don't work, but they also don't work with entropy in the latest release.
On second thought, it is not completely clear to me anymore whether it is reasonable to define these functions for empty inputs at all.
Given that these are sums, isn't it logical to return zero for empty inputs? Anyway, throwing an error instead would be breaking, so we would at least need to keep returning zero, possibly with a deprecation warning.
My main motivation for throwing an error would be that empty vectors don't represent probability distributions.
So what's the plan here? Return a type-stable 0 for empty inputs also for entropy?
I find that convincing.
As you prefer, but I just think we shouldn't add new errors for now, to avoid being breaking. A deprecation warning would be OK, with a PR to turn the warnings into errors in the next breaking release. BTW, should we check that the values sum to one if we want to ensure they are probability distributions (with a way to skip the check)?
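A hypothetical sketch of what such an opt-out check could look like (the entropy_checked name, the check keyword, and the tolerance are illustrative, not part of this PR):

using LogExpFunctions: xlogx

# Hypothetical sketch: validate that the values sum to one, with a way to
# skip the check. Name, keyword, and tolerance are illustrative only.
function entropy_checked(p; check::Bool=true)
    if check && !isapprox(sum(p), 1; atol=sqrt(eps(float(one(eltype(p))))))
        throw(ArgumentError("values do not sum to one, so they do not represent a probability distribution"))
    end
    return -sum(xlogx, p)
end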
src/scalarstats.jl
Outdated
        end
    end
-    return -s
+    return -sum(xlogy(pi, qi) for (pi, qi) in zip(p, q))
Note that using lazy broadcasting (on recent Julia versions) is more efficient and has better precision, as it uses pairwise summation. Maybe that would fix the performance regression?
return -sum(Broadcast.instantiate(Broadcast.broadcasted(xlogy, p, q)))
EDIT: probably need to use vec to keep the same behavior as currently.
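A sketch of the suggested variant, assuming array arguments (the crossentropy_broadcasted name is made up for illustration):

using LogExpFunctions: xlogy

# Sketch: Broadcast.instantiate makes the lazy broadcast reducible with
# pairwise summation, and vec flattens the arrays so that equal-length
# inputs of different shapes still pair up elementwise, as with zip.
crossentropy_broadcasted(p::AbstractArray, q::AbstractArray) =
    -sum(Broadcast.instantiate(Broadcast.broadcasted(xlogy, vec(p), vec(q))))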
I've seen the Julia PR and played around with sum(Broadcast....)
in some local branch of Distributions, but in my benchmarks there another implementation was actually faster. I don't remember the details right now, and I assume it was not really comparable to this example here anyway. I'll check if it helps, but I guess we have to live with a slight performance regression: the existing implementations are not type stable and less general, so it is not completely fair to compare the performance of the bugfixes in this PR with the performance of the incorrect existing implementation (of course, we should still try to improve the performance of the bugfixes as much as possible).
Improved precision may still be valuable anyway.
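To illustrate the precision point, a small experiment (not from the PR): naively accumulating Float32 values drifts far from the true value 1.0f6, while Base's sum, which uses pairwise summation, stays close.

xs = fill(0.1f0, 10_000_000)

# Naive left-to-right accumulation in Float32 loses precision quickly.
naive = let s = zero(Float32)
    for x in xs
        s += x
    end
    s
end

naive    # noticeably off from 1.0f6
sum(xs)  # pairwise summation, much closer to 1.0f6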
I reran the benchmarks with Julia 1.6.3 (this time also with BigFloat inputs and @inferred checks).

With StatsBase#master:

julia> using StatsBase, BenchmarkTools, Random, Test
julia> Random.seed!(1234);
julia> p, q = rand(100_000), rand(100_000);
julia> @btime entropy($p);
721.254 μs (0 allocations: 0 bytes)
julia> @btime crossentropy($p, $q);
765.894 μs (0 allocations: 0 bytes)
julia> @btime kldivergence($p, $q);
1.440 ms (0 allocations: 0 bytes)
julia> pbig, qbig = rand(BigFloat, 100_000), rand(BigFloat, 100_000);
julia> @btime entropy($pbig);
616.253 ms (700184 allocations: 35.11 MiB)
julia> @btime crossentropy($pbig, $qbig);
612.427 ms (700234 allocations: 35.11 MiB)
julia> @btime kldivergence($pbig, $qbig);
633.777 ms (900163 allocations: 45.79 MiB)
julia> @inferred entropy(pbig);
julia> @inferred crossentropy(pbig, qbig);
ERROR: return type BigFloat does not match inferred return type Union{Float64, BigFloat}
julia> @inferred kldivergence(pbig, qbig);
ERROR: return type BigFloat does not match inferred return type Union{Float64, BigFloat}

With this PR (commit 97d3bfa):

julia> using StatsBase, BenchmarkTools, Random, Test
julia> Random.seed!(1234);
julia> p, q = rand(100_000), rand(100_000);
julia> @btime entropy($p);
763.058 μs (0 allocations: 0 bytes)
julia> @btime crossentropy($p, $q);
1.100 ms (0 allocations: 0 bytes)
julia> @btime kldivergence($p, $q);
1.568 ms (0 allocations: 0 bytes)
julia> pbig, qbig = rand(BigFloat, 100_000), rand(BigFloat, 100_000);
julia> @btime entropy($pbig);
613.844 ms (700184 allocations: 35.11 MiB)
julia> @btime crossentropy($pbig, $qbig);
597.341 ms (700230 allocations: 35.11 MiB)
julia> @btime kldivergence($pbig, $qbig);
636.595 ms (900159 allocations: 45.79 MiB)
julia> @inferred entropy(pbig);
julia> @inferred crossentropy(pbig, qbig);
julia> @inferred kldivergence(pbig, qbig);

With the suggested lazy broadcasting implementation:

julia> using StatsBase, BenchmarkTools, Random, Test
julia> Random.seed!(1234);
julia> p, q = rand(100_000), rand(100_000);
julia> @btime entropy($p);
763.252 μs (0 allocations: 0 bytes)
julia> @btime crossentropy($p, $q);
882.439 μs (0 allocations: 0 bytes)
julia> @btime kldivergence($p, $q);
1.585 ms (0 allocations: 0 bytes)
julia> pbig, qbig = rand(BigFloat, 100_000), rand(BigFloat, 100_000);
julia> @btime entropy($pbig);
599.742 ms (700184 allocations: 35.11 MiB)
julia> @btime crossentropy($pbig, $qbig);
600.415 ms (700230 allocations: 35.11 MiB)
julia> @btime kldivergence($pbig, $qbig);
621.025 ms (900161 allocations: 45.79 MiB)
julia> @inferred entropy(pbig);
julia> @inferred crossentropy(pbig, qbig);
julia> @inferred kldivergence(pbig, qbig);
I updated the PR: it handles (and deprecates) empty collections of probabilities and uses pairwise summation now.
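A sketch of what such a deprecation path could look like (the entropy_deprecating name and the message are illustrative; the PR's actual code may differ):

using LogExpFunctions: xlogx

# Sketch: warn (instead of erroring) on empty input and return a zero of
# the appropriate float type; otherwise reduce a lazily instantiated
# broadcast so that sum can use pairwise summation.
function entropy_deprecating(p)
    if isempty(p)
        Base.depwarn("support for empty collections will be removed; they do " *
                     "not represent a proper probability distribution",
                     :entropy_deprecating)
        return float(zero(eltype(p)))
    end
    return -sum(Broadcast.instantiate(Broadcast.broadcasted(xlogx, p)))
end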
src/scalarstats.jl
Outdated
-entropy(p) = -sum(pᵢ -> iszero(pᵢ) ? zero(pᵢ) : pᵢ * log(pᵢ), p)
+function entropy(p)
+    if isempty(p)
+        throw(ArgumentError("empty collections of probabilities are not supported"))
Maybe more explicit?

Suggested change:
-    throw(ArgumentError("empty collections of probabilities are not supported"))
+    throw(ArgumentError("empty collections are not supported as they do not " *
+                        "represent a proper probability distribution"))
Done.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Any additional comments or suggestions?
While I was working on a PR to Distributions, I noticed that entropy is not type stable. This PR fixes the issue by using LogExpFunctions.xlogx instead of x -> iszero(x) ? zero(x) : x * log(x). Moreover, it bases crossentropy and kldivergence on LogExpFunctions.xlogy and generalizes them to non-Float64 return types and arguments with different element types.
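As a rough, self-contained sketch of that approach (hypothetical names, not the PR's exact code):

using LogExpFunctions: xlogx, xlogy

# xlogx(x) returns x * log(x) with the convention xlogx(0) == 0, and
# xlogy(x, y) returns x * log(y) with xlogy(0, y) == 0, so both branches
# of the old anonymous function collapse into one type-stable call.
entropy_xlogx(p) = -sum(xlogx, p)
crossentropy_xlogy(p, q) = -sum(xlogy(pi, qi) for (pi, qi) in zip(p, q))
kldivergence_xlogy(p, q) = sum(xlogy(pi, pi / qi) for (pi, qi) in zip(p, q))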