proposal: spec: disallow NaN keys in maps #20660

bcmills · 2017-06-13T19:34:34Z

Motivation

Go currently supports NaN keys in maps through some clever tricks.

The current spec is ambiguous on this behavior. It says:

[I]f the map contains an entry with key x, a[x] is the map value with key x and the type of a[x] is the value […]

but "with key x" could plausibly mean "with key represented by x". (In practice, today it means "with key equal to x".)

That leads to surprising and subtle bugs (such as #14427) and adds complexity even in correct code (#20138 (comment)).

For example, the following two functions appear to be equivalent, but the latter version silently copies zeroes instead of the intended values for NaN keys:

func copy(m map[float64]int) map[float64]int {
  c := make(map[float64]int, len(m))
  for k, v := range m {
    c[k] = v
  }
}

func copy(m map[float64]int) map[float64]int {
  c := make(map[float64]int, len(m))
  for k := range m {
    c[k] = m[k]
  }
}

Proposal

We already reject incomparable values as map keys:

If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic.

I propose that we should apply that standard to NaNs as well. Specifically, I propose that we add the following sentence to https://golang.org/ref/spec#Index_expressions:
Assigning to an element of a map with a key that is not equal to itself causes a run-time panic.

This proposal is along the lines of #19624, in that it produces panics for arithmetic operations which are otherwise likely to result in silent corruption in typical user code.

Workarounds

Users can easily avoid the proposed panics by ~~calling math.IsNaN(k)~~ checking whether the key is equal to itself before inserting a map entry.

Users who expect NaNs with the same representation to map to the same element can transform the keys using math.Float64bits or math.Float32bits:

  if k == 0 { // -0.0 or +0.0
    m[math.Float64bits(0)] = v
  } else {
    m[math.Float64bits(k)] = v
  }

Users who really do intend to store all of the entries for NaN keys separately can explicitly pair the map with a slice:

  type nanElem struct {
    k float64
    v V
  }
  var (
    m map[float64]V
    nans []nanElem
  )
  if k != k {
    nans = append(nans, nanElem{k, v})
  } else {
    m[k] = v
  }

The text was updated successfully, but these errors were encountered:

randall77 · 2017-06-13T19:55:04Z

FWIW, there's a fair amount of code in the hashmap implementation that deals with the possibility of k!=k for keys k (search for reflexivekey in runtime/hashmap.go). It would be nice to strip that out.
That said, I don't think it is a big performance impediment. In the common case it's just a load/compare/predictable branch in a few places.

dsnet · 2017-06-13T19:56:40Z

The strange behavior with maps only because == is not symmetric when applied on NaNs.
An alternative proposal would be to define == to be symmetric for NaNs only for maps. Technically there are multiple types of NaNs (signaling and quiet NaNs and a variety of valid bit sequences that represent NaNs). I similarly argue that they should all be equal to each other for maps in the same way that -0.0 == +0.0.

bcmills · 2017-06-13T20:42:43Z

@randall77 It may be that this proposal would simplify the hashmap implementation, but I would prefer to focus on the correctness and complexity implications for user code.

bcmills · 2017-06-13T20:50:27Z

@dsnet As @rsc notes in the blog post, allowing m[k1] and m[k2] to be same element when k1 != k2 would also be surprising. It is straightforward to equate NaNs in user code if desired using the math.FloatNbits approach outlined above, and I would argue that the explicit code makes the change in equality less surprising.

m := map[uint64]V
…
switch {
case math.IsNan(k):
  m[math.Float64bits(math.NaN())] = v
case k == 0:  // +0.0 or -0.0
  m[math.Float64bits(0)] = v
default:
  m[math.Float64bits(k)] = v
}

randall77 · 2017-06-13T21:50:10Z

@bcmills : I agree, just wanted to let people know what the effects on the implementation would be (tl;dr a small net positive).

cznic · 2017-06-14T08:06:17Z

We already reject incomparable values as map keys:

If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic.

All and every value of type {float32,float64} is comparable (guaranteed by the specs). The result of the comparison is irrelevant here.

Map keys must be comparable is a simple, non-complicated rule. I object to making it not only more complex, but suprisingly non-orthogonal.

rsc · 2017-06-14T15:35:59Z

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. I'd really like to avoid introducing and explaining new value-based restrictions.

bcmills · 2017-06-14T15:41:24Z

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

Good point. But it's still easy to check, especially given the "not equal to itself" phrasing I've proposed:

if k != k {
  … // Can't be a map key. Do something else with it.
} else {
  m[k] = v
}

bcmills · 2017-06-14T15:45:26Z

The main issue I want to address is the surprising inability to access elements by key in an irreflexively-keyed map. If you don't want to introduce value-based restrictions, perhaps we could at least change map entries from "with key equal to x" to "with key represented by x"?

rsc · 2017-06-14T15:57:08Z

It's certainly an idea worth evaluating when we get to language changes again (hence the Go2 label). Like all the Go2-labelled things, I'm not going to try to evaluate them now.

zombiezen · 2017-12-21T20:48:48Z

One other way to address the specific case of copying maps would be to extend the copy builtin function to support maps.

bcmills · 2017-12-21T21:01:41Z

That would still give surprising behavior for functions that are not copy exactly.

For example, consider this snippet:

	m := map[float64]int{k: 0}
	m[k] += 1
	m[k] += 1

It appears to increment the entry for m[k] twice, but what it actually does (when k is a NaN) is add two new entries mapping k to 1 (https://play.golang.org/p/gIUlA0lg-9b).

Similarly, there is no way to delete a NaN entry from a map: https://play.golang.org/p/Rz2TLn336c8

So special-casing copy isn't, IMO, a viable solution.

bcmills · 2018-01-02T21:17:33Z

For an example where += matters, consider the case of aggregating two maps into one (https://play.golang.org/p/iwKyUWV3m_U):

func inc(dst, src map[float64]int) {
	for k, v := range src {
		dst[k] += v
	}
}

There is currently no way to implement such a function correctly in the presence of NaNs: because delete doesn't work with NaN keys, there is no way to remove or update the existing entries in the dst map even if you use the Float64bits trick to aggregate into some non-float-keyed intermediate.

ianlancetaylor · 2018-02-20T22:06:29Z

I agree with @bcmills that a map with NaN key values is hard and perhaps even impossible to use. However, the current definition of maps is entirely type based, and I don't see a clear reason to change that. The current behavior is consistent and predictable even though surprising.

@bcmills suggests that you can check for an irreflexive value before adding it to the map, but that is true whether we make this change or not.

Closing.

griesemer · 2018-02-20T22:06:55Z

cc: griesemer

bcmills · 2018-02-21T21:38:47Z

The current behavior is consistent and predictable even though surprising.

I would argue that “surprising” is exactly the opposite of “predictable”: you can only predict the current behavior if you stop to think about it, and (empirically) most users don't.

So what about using representation rather than equality (per #20660 (comment))?

That would also be consistent and predictable, and would arguably be a better match for the behavior people already assume for maps with floating-point keys. (For example, it would give reasonable behavior for all of the examples of subtly-broken loops I've listed above.)

The only downside I can see is that it would not equate the entries for +0.0 and -0.0, but it's not at all obvious to me that that matters in practice: given that lookups are already sensitive to rounding, it seems much less surprising to have separate entries for +0 and -0 than to have arbitrarily many entries for math.NaN().

A quick sample of float-keyed maps at Google (because that's an easy corpus for me to search) gives some interesting data:

The vast majority are inversions or indexes of other floating-point data, where the set of keys used for lookup are drawn from some fixed data set and lookups from that set are assumed to succeed.
- This is a fairly common pattern in unit tests.
- A public example in this category is in @aclements' moremath package: https://github.com/aclements/go-moremath/blob/033754ab1fee508c9f98f2785eec2365964e0b05/internal/mathtest/mathtest.go#L42-L49
- Several of the non-test examples are aggregating loops that pretty much exactly mirror the examples above.
Most of the rest are operating on data sets with no negative sign bits, such as percentiles, probabilities in the range [0, 1], or scalar distances.
One does operate with arbitrary keys, but special-cases keys equal to 0 before performing the lookup.
One is a float64-to-*float64 interning table, which doesn't particularly care about rounding or ±0 but will consume much more memory than intended if you feed it NaNs.

ianlancetaylor · 2018-02-22T01:44:09Z

Right now maps keys use the same operation as the == operator in the language. If we use representation instead, we break that simple rule. I don't think that is a good idea. In general, the current implementation minimizes special cases in the language. I think that is a good thing.

bcmills · 2018-02-22T18:49:25Z

In general, the current implementation minimizes special cases in the language. I think that is a good thing.

I agree that minimizing special cases is a good thing, but I don't think it's accurate to say that the current implementation minimizes special cases. The existing semantics already imposes a special case: it just isn't documented.

The spec for range says:

For each iteration, iteration values are produced as follows if the respective iteration variables are present:
Range expression                          1st value          2nd value
[…]
map             m  map[K]V                key      k  K      m[k]       V

But today, range does not produce the iteration value m[k] if k != k. Instead, it produces the value for an element with key represented by k.

It's also worth noting that the current spec does not use the == operator to define map semantics:

delete "removes the element with key k" (not "the element with key equal to k").
a[x] is "the map element with key x" (not "the map element with key equal to x).
For map types, "[t]he comparison operators == and != must be fully defined", but the spec does not specify how they are used.

If we summarize map semantics in terms of equality, the existing special cases become clear:

a[x] is the value of the element with key equal to x, unless it is the left-hand operand of an assignment.
If there is an existing element with key equal to x, a[x] = v sets the value of that element to v.
Otherwise, it adds a new element with key x and value v.
delete(m, k) removes the element with key equal to k.
range m yields the key and value of each element in m.

Using representation instead, the special cases disappear:

a[x] is the value of the element with key represented by x.
- Assigning to a[x] creates that element if it does not yet exist.
delete(m, k) removes the element with key represented by k.
range m yields the key and value of each element in m.

Enumerating the "equal" rules without the special cases gives the proposed panic semantics:

a[x] is the value of the element with key equal to x.
- Assigning to a[x] creates that element if it does not yet exist.
  (If such an element cannot exist, attempting to create it causes a run-time panic.)
delete(m, k) removes the element with key equal to k.
range m yields the key and value of each element in m.

mdempsky · 2018-02-22T21:25:59Z

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value.

It's about the dynamic type of the value, which I think is still arguably a value-based restriction from the perspective of the map key type, even if the spec isn't currently worded to emphasize that.

Currently we incompletely support NaN keys (e.g., delete doesn't work, and package reflect can't access NaN-keyed values). I'd argue we should either explicitly disallow them or we should completely support them (e.g., @bcmills's representation-based proposal above).

That said, I'm not particularly fond of the representation-based approach. It feels like an abstraction violation to me, but that's an admittedly rather subjective concern.

@ianlancetaylor This was tagged for Go2, but now it's closed. Does that mean it's been rejected for consideration from Go2?

cznic · 2018-02-22T21:31:59Z

Currently we incompletely support NaN keys (e.g., delete doesn't work, ...

I don't think delete does not work with NaN keys. I think that delete with a NaN key works actually exactly as specified. Related comment.

bcmills · 2018-02-22T21:36:17Z

I think that delete with a NaN key works actually exactly as specified.

“Works in a way that is not useful to anybody” is the same as “doesn't work” in terms of user value.

I have yet to see a program in which the current behavior of delete for NaN keys is useful and desired: if you have one in mind, that would be a great example to look at.

bcmills · 2018-02-23T16:15:24Z

I am strongly against changing maps to work on the NaN representation. I care less about having map operations panic on NaN, but I continue to find the arguments above to be unconvincing.

I don't have a strong preference between panicking on NaNs or using representations instead.

If we decide to do neither and maintain the status quo, I do think we would need to address the other NaN-key issues in the standard library (#11104, #14427, and #24075 that I know of).

The exact code that people will have to write if map operations panic on NaN will work today without changing the language.

That's also true of a number of other run-time panics: for example, division by zero, out-of-bounds slice access, failed type-assertions, races on map operations, and attempts to put incomparable keys into interface-keyed maps.
We panic on behaviors because they indicate likely bugs, not because those bugs cannot be avoided.

bcmills · 2020-12-16T15:40:02Z

#43207 is another example of a package in the standard library erroneously assuming that map values can be looked up by key.

deanveloper · 2021-07-27T17:59:26Z

When discussing the antireflexivity of NaN, I think it’s important to discuss why NaN is antireflexive.

NaN is supposed to bubble out of mathematical operations. You have an invalid-decoded float, bad operation with an Infinity, etc. You did something bad and now you have a non-real or otherwise invalid float. So, all operations on NaN evaluate to NaN. (And if NaN == NaN then that would mean NaN-NaN==0, NaN/NaN==1, etc) Your application should see this NaN and do something about it!

However, instead Go maps essentially hide NaNs away to never be seen again, they cannot be indexed nor deleted (they can only be detected by range and kind-of len()). The entire point of the antireflexivity of NaNs is to make them seen, but instead maps do the opposite, and I consider this a major flaw.

DeedleFake · 2021-07-28T00:09:38Z

I've done a quick survey of map implementations and how they handle NaN keys in a number of other languages for reference:

Rust: Rust avoids the problem completely by defining two levels of equality based on the Eq and PartialEq traits. The first is required for using a type as a map key, but only the second is defined for floats.
Python: Python instantiates NaN values as float('NaN'). Values of this kind are not equal, even to themselves, but the same instance of a NaN is equal to itself for purposes of maps. They are not equal to other instances, however.
Java and Kotlin: Java and Kotlin treat NaN values properly as being unequal under normal circumstances, but treats them as equal for the sake of map key usage, allowing them to be looked up and disallowing 'duplicates'. It is possible that it's actually doing the same thing as Python, but NaN is a constant in Java so it's mitigated somewhat in some circumstances.
Ruby: Ruby is the same as Java as far as I can tell, meaning that NaN == NaN is false, but is treated as a distinct value that is equal to itself for the purposes of map keys. It also has a constant that it uses for NaN values, rather than instantiating new ones.
Clojure: Clojure does the same thing as Go, and it even has an entire section of its manual dedicated to explaining the odd behavior.

rsc · 2021-07-28T00:19:39Z

There are only bad answers.
The activation energy (the coefficient of static friction?)
required to migrate from one bad answer to a different one is very high.
It seems unlikely it would be reached at this point.

deanveloper · 2021-07-28T00:21:43Z

To add on to @DeedleFake, here is JavaScript Map:

NaN is considered the same as NaN (even though NaN !== NaN) and all other values are considered equal according to the semantics of the === operator.

bcmills · 2021-08-06T15:50:11Z

This comment thread on CL 338609 is another interesting data point.

(It seems that the behavior of NaN map keys runs counter to the intuition of even @robpike!)

robpike · 2021-08-06T21:13:08Z

I wouldn't say that. It was how NaNs worked (or not) in reflect.DeepEqual I didn't know.

docmerlin · 2022-09-12T15:22:02Z

I don't like the idea of runtime panics on NaN being assigned to maps. Instead we should change the map behavior to allow NaN to be a valid key that can be queried by a NaN.

docmerlin · 2022-09-12T15:37:37Z

IMO behind the scenes it should be doing something like what was proposed above for a workaround:

  if k == 0 { // -0.0 or +0.0
    m[math.Float64bits(0)] = v
  } else {
    m[math.Float64bits(k)] = v
  }

robpike · 2022-09-13T00:47:10Z

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. I'd really like to avoid introducing and explaining new value-based restrictions.

I'd really like to avoid introducing NaNs, but I missed that boat. :(

atdiar · 2022-09-29T11:03:29Z

It's interesting that if we consider a value to be/have a subtype (if a type is a set of values, a subtype can be represented as a subset... e.g. float64 with the added constraint that the set of values is restricted to NaN), we can reproduce the same logic used for the comparable interface.

I don't know if this would be backward-compatible though (because I think it would require some additional checks, equivalent to implementing a negated constraint such as non-reflexive comparability constraint checks). Could think about it.

ianlancetaylor · 2023-01-24T00:53:29Z

Just a note that as of the upcoming 1.21 release the language will have a new builtin, clear, which removes a part of the argument for this change. (But there are other arguments for this change, and we could in principle still adopt it.)_

bcmills · 2023-10-13T17:36:23Z

#63312 is another issue of a package in the standard library neglecting to document an exception to its behavior for NaN keys.

The documentation for maps.DeleteFunc says “DeleteFunc deletes any key/value pairs from m for which del returns true”, but today it does not delete such pairs if the key for the pair is not equal to itself.

candlerb · 2023-10-13T19:55:29Z

Panicking on m[k] = v if k != k seems sane and reasonable to me, and although it would be backwards-incompatible, it would be in such a way as to draw attention to itself anywhere that it mattered.

The utility of being able to store multiple values in a hash with the "same" key, which you can't actually retrieve (without iterating over the whole map) or delete (without clearing the whole map) is very low. Anybody who currently does this, very likely does so by accident IMO.

Special-casing NaN when used as a hash key sounds appealing at first, but it would also have to apply to NaNs deeply nested in structs.

Nobody likes the idea of another line of code that could panic at runtime, but this is a similar case to interface comparisons, which are almost always allowed but can panic in certain cases (https://go.dev/play/p/eK9pF8naT8q) - or indeed a divide-by-zero error or a nil pointer dereference.

I suppose a counter-argument is it might encourage more use of panic/recover in a try/catch kind of way, like this, but again, I don't suppose NaN map keys are used much.

There is another possible behaviour: m[k] = v silently does nothing if k != k (i.e. nothing is inserted into the map, and no panic). It gives roughly the same semantics as today: m[k] does not return v; delete(m, k) does nothing. The only difference is that you won't find k/v when iterating over the map.

gopherbot added this to the Proposal milestone Jun 13, 2017

gopherbot added the Proposal label Jun 13, 2017

dsnet added the v2 An incompatible library change label Jun 13, 2017

dsnet changed the title ~~proposal: Go 2: disallow NaN keys in maps~~ proposal: disallow NaN keys in maps Jun 13, 2017

dsnet added the LanguageChange Suggested changes to the Go language label Jun 13, 2017

rsc changed the title ~~proposal: disallow NaN keys in maps~~ proposal: spec: disallow NaN keys in maps Jun 16, 2017

bcmills mentioned this issue Jul 28, 2017

proposal: hash: export a built-in hash function for comparable values #21195

Closed

ianlancetaylor closed this as completed Feb 20, 2018

bcmills mentioned this issue Feb 22, 2018

spec: document special cases for floating-point maps #24049

Open

bcmills mentioned this issue Mar 6, 2018

proposal: spec: improve for-loop ergonomics #24282

Open

bcmills mentioned this issue Jun 20, 2018

proposal: spec: define _ on rhs as zero value #19642

Closed

gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Aug 16, 2019

gopherbot added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Sep 3, 2019

bcmills mentioned this issue Dec 16, 2020

encoding/json: panics on NaN map keys #43207

Closed

ianlancetaylor mentioned this issue Sep 14, 2022

reflect: add Value.Clear #55002

Closed

Merovius mentioned this issue Nov 19, 2022

spec: add clear(x) builtin, to clear map, zero content of slice #56351

Closed

ianlancetaylor removed v2 An incompatible library change NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Aug 23, 2023

ianlancetaylor added this to Proposals Aug 23, 2023

ianlancetaylor moved this to Incoming in Proposals Aug 23, 2023

bcmills mentioned this issue Oct 13, 2023

maps: document that DeleteFunc fails in presence of NaN keys #63312

Closed

ianlancetaylor added the LanguageChangeReview Discussed by language change review committee label Aug 6, 2024

proposal: spec: disallow NaN keys in maps #20660

proposal: spec: disallow NaN keys in maps #20660

Comments

bcmills commented Jun 13, 2017 • edited Loading

randall77 commented Jun 13, 2017

dsnet commented Jun 13, 2017

bcmills commented Jun 13, 2017

bcmills commented Jun 13, 2017 • edited Loading

randall77 commented Jun 13, 2017

cznic commented Jun 14, 2017

rsc commented Jun 14, 2017

bcmills commented Jun 14, 2017

bcmills commented Jun 14, 2017

rsc commented Jun 14, 2017

zombiezen commented Dec 21, 2017

bcmills commented Dec 21, 2017 • edited Loading

bcmills commented Jan 2, 2018

ianlancetaylor commented Feb 20, 2018 • edited Loading

griesemer commented Feb 20, 2018

bcmills commented Feb 21, 2018 • edited Loading

ianlancetaylor commented Feb 22, 2018

bcmills commented Feb 22, 2018 • edited Loading

mdempsky commented Feb 22, 2018

cznic commented Feb 22, 2018

bcmills commented Feb 22, 2018 • edited Loading

bcmills commented Feb 23, 2018 • edited Loading

bcmills commented Dec 16, 2020

deanveloper commented Jul 27, 2021 • edited Loading

DeedleFake commented Jul 28, 2021

rsc commented Jul 28, 2021 • edited Loading

deanveloper commented Jul 28, 2021

bcmills commented Aug 6, 2021

robpike commented Aug 6, 2021 • edited Loading

docmerlin commented Sep 12, 2022

docmerlin commented Sep 12, 2022 • edited Loading

robpike commented Sep 13, 2022

atdiar commented Sep 29, 2022 • edited Loading

ianlancetaylor commented Jan 24, 2023

bcmills commented Oct 13, 2023

candlerb commented Oct 13, 2023 • edited Loading

bcmills commented Jun 13, 2017 •

edited

Loading

bcmills commented Jun 13, 2017 •

edited

Loading

bcmills commented Dec 21, 2017 •

edited

Loading

ianlancetaylor commented Feb 20, 2018 •

edited

Loading

bcmills commented Feb 21, 2018 •

edited

Loading

bcmills commented Feb 22, 2018 •

edited

Loading

bcmills commented Feb 22, 2018 •

edited

Loading

bcmills commented Feb 23, 2018 •

edited

Loading

deanveloper commented Jul 27, 2021 •

edited

Loading

rsc commented Jul 28, 2021 •

edited

Loading

robpike commented Aug 6, 2021 •

edited

Loading

docmerlin commented Sep 12, 2022 •

edited

Loading

atdiar commented Sep 29, 2022 •

edited

Loading

candlerb commented Oct 13, 2023 •

edited

Loading