Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: maps: helper to loop over the maps in predictable order #68598

Open
giautm opened this issue Jul 26, 2024 · 21 comments
Open

proposal: maps: helper to loop over the maps in predictable order #68598

giautm opened this issue Jul 26, 2024 · 21 comments
Labels
Milestone

Comments

@giautm
Copy link

giautm commented Jul 26, 2024

Proposal Details

I propose a function to allow a loop over a map in stable order. It takes all keys of the map, sorts them, and loops over the slides, each loop, yields a key-pair.

// SortedByKey returns an iterator for the given map that
// yields the key-value pairs in sorted order.
func SortedByKey[Map ~map[K]V, K cmp.Ordered, V any](m Map) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		for _, k := range slices.Sorted(Keys(m)) {
			if !yield(k, m[k]) {
				return
			}
		}
	}
}
@gopherbot gopherbot added this to the Proposal milestone Jul 26, 2024
@giautm
Copy link
Author

giautm commented Jul 26, 2024

The Iter func will be helpful for the case you need to build an output (text or HCL) that comes from a Map.

func attributes() map[string]string {
	return map[string]string{
		"key1": "val1",
		"key2": "val2",
	}
}

func Example_SortedByKey() {
	for k, v := range SortedByKey(attributes()) {
		fmt.Println(k, v)
	}
	// Output:
	// key1 val1
	// key2 val2
}

Or processing a map with API call.

https://docs.temporal.io/develop/go/core-application#workflow-logic-requirements

In Go, Workflow Definition code cannot directly do the following:

  • Iterate over maps using range, because with range the order of the map's iteration is randomized.
    Instead you can collect the keys of the map, sort them, and then iterate over the sorted keys to access the map.
    This technique provides deterministic results.
    You can also use a Side Effect or an Activity to process the map instead.
  • Call an external API, conduct a file I/O operation, talk to another service, etc. (Use an Activity for these.)

@giautm
Copy link
Author

giautm commented Jul 26, 2024

@cuonglm Sent me this comment, but I think it's for very specific case that only need process on keys. The Iter or SortedIter would be nice for reduce function calls, shorter syntax, drop-in-replacement to fix bugs with loop over map.

You've more than welcome to raise your consent here.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Jul 26, 2024
@ianlancetaylor
Copy link
Member

It's a bit unfortunate that slices.Sorted and maps.Sorted would have rather different behavior.

@gazerro
Copy link
Contributor

gazerro commented Jul 26, 2024

It's a bit unfortunate that slices.Sorted and maps.Sorted would have rather different behavior.

It could be called SortedByKey

for k, v := range maps.SortedByKey(m) {
     fmt.Println(k, v)
}

in this way, you could also have SortedByValue.

@gophun
Copy link

gophun commented Jul 26, 2024

It doesn't feel right to have a function that does an allocation and a full copy just for iteration. Those needing such functionality should likely opt for an ordered map data structure from the start.

@apparentlymart
Copy link

While I do agree that using this function in a hot path would probably not be a good idea, I've read and written many variations of this behavior in non-performance-sensitive situations, such as to produce predictable results in a test case, and so I think it would be nice to have a more readable shorthand for this seemingly-common need.

There are also plenty of situations where the input map is small, and where the allocation/sorting cost is likely immaterial.

I guess the tradeoff here is, as usual, whether it seems likely that offering this utility in stdlib would make people more likely to use it in situations where the performance tradeoffs are not appropriate. Personally I think in this case it would be sufficient to mention in the function's documentation that it allocates a slice containing all of the keys and sorts it using the natural ordering of the key type, leaving potential callers to make their own conclusions about whether that's acceptable.

(I also prefer the name SortedByKey to make the behavior of the function more explicit at the callsite.)

@jimmyfrasche
Copy link
Member

If there were

package maps
func Index[Map ~map[K]V, Slice ~[]K, K comparable, V any](m Map, ks Slice) iter.Seq2[K, V] {
	retun func(yield func(K, V) bool) {
		for _, k := range ks {
			if v, ok := m[k]; ok {
				if !yield(k, v) {
					return
				}
			}
		}
	}
}

then, in the cases where you do need it, it would just be

maps.Index(m, slices.Sorted(maps.Keys(m))...)

@cespare
Copy link
Contributor

cespare commented Jul 29, 2024

I also regularly have the experience of writing the boilerplate code to pull all the keys out of a map into a slice, sort it, and then iterate the map. However, I agree with @gophun: the right answer here is not to make this easier, but to add a sorted map type to the standard library. If it were just as convenient to use the sorted map as a regular map, then there'd be no reason not to reach for that and write the better code right off the bat rather than doing the post hoc key-sorting.

And now that we have both generics and iterator functions, I think we're at the point where we can write an ergonomic implementation that would be appropriate for the standard library. (#60630 is one existing proposal.)

@tdakkota
Copy link

tdakkota commented Jul 29, 2024

I would like to note that prior to Go 1.23, to iterate over map in predictable order you have to write something like:

import (
	"slices"

	"golang.org/x/exp/maps"
)

...
keys := maps.Keys(m)
slices.Sort(keys)
for _, k := range keys {
    v := m[k]
}

But since Go 1.23, you can just write:

import (
	"maps"
	"slices"
)
...
for _, k := range slices.Sorted(maps.Keys(m)) {
    v := m[k]
}

which seems good enough for me.

@giautm
Copy link
Author

giautm commented Jul 29, 2024

which seems good enough for me.

It doesn't change the fact - that we still allocate a new slice for the keys. maps.SortedByKey is convenient for processing the map right returned from the functions.

All of you guys have the right points about this function. Yes, it allocates memory doesn't seem to be the correct way to implement this, but I think we can change it in the future by optimizing from compile.

Creating another map structure sounds like the right way, but the are plenty of libraries that won't change to new maps. - And it should be for memory sensitive optimize

In a lot of cases, maps with under 100 items didn't need such optimization.

import (
	"maps"
	"slices"
)
// ...
m := getattributes()
for _, k := range slices.Sorted(maps.Keys(m)) {
	v := m[k]
	// ...
}

for k, v := range maps.SortedByKey(getattributes()) {
	// ...
}

@apparentlymart
Copy link

The point about slices.Sorted(maps.Keys(m)) is fair. I hadn't noticed that combination was now possible in Go 1.23.

One small advantage of maps.SortedByKey is that it can know exactly how large a slice it needs to allocate because a map has a known length, whereas slices.Sorted is backed by slices.AppendSeq that can't predict how long its input iterator will be and so could potentially reallocate multiple times.

However, we already discussed that this proposed function is not intended for performance-sensitive situations anyway, so that argument doesn't seem very convincing to me.

I do still think that this is a common enough pattern that it could be justified to offer it as a single function, and so I'm leaving my 👍 upvote on it, but I'd also be sympathetic to the argument that we should wait and see how the ecosystem responds to slices.Sorted(maps.Keys(m)) before assuming that a single-function shorthand is warranted. 🤔

@josharian
Copy link
Contributor

the right answer here is not to make this easier, but to add a sorted map type to the standard library.

I respectfully disagree. There are lots of places where I already have a plain map already, including deserialized spots etc, and I'm in non-performance-critical code where I just want to iterate over the thing I already have.

for _, k := range slices.Sorted(maps.Keys(m)) { v := m[k] }

This is not much code, true. But this is also a super, super common thing to do. I'd much rather write for k, v := range maps.SortedIter { } any day of the week. (Or whatever it ends up being named.)

@adonovan
Copy link
Member

adonovan commented Dec 28, 2024

This proposal addresses a common and real need, and does so with maximum brevity and efficiency, unlike the alternatives:

  • slices.Sorted + maps.Keys loses information about the correct array size to allocate.
  • the maps.Index idea is clever but maps.Index(m, slices.Sorted(maps.Keys(m))...) is a lot to type and remember.
  • a hypothetical future sorted map type (if it even comes to pass) is not always appropriate for reasons of efficiency (logarithmic time access) and compatibility (all existing APIs return maps).

So this proposal is not redundant. I support it.

Whatever name is ultimately chosen (maps.Sorted, say), there should also be maps.SortedFunc variant.

@jimmyfrasche
Copy link
Member

Those are fair points but I think the "materializing the sorted keys behind your back" objection is still reasonable.

Maybe that just needs to be its own thing: maps.SortedKeys and maps.SortedKeysFunc that both return a slice.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/639135 mentions this issue: gopls/internal/util/moremaps: use Sorted throughout gopls

gopherbot pushed a commit to golang/tools that referenced this issue Jan 2, 2025
This CL (written in passing while thinking about modernizers)
replaces each place that allocates a slice of map keys, sorts them,
and ranges over the slice, with "for k, v := range moremaps.Sorted(m)".

It also adds moremaps.SortedFunc for symmetry.

Updates golang/go#68598

Change-Id: I526ddf2398e7cee77f8369fc7a977e6f64727f27
Reviewed-on: https://go-review.googlesource.com/c/tools/+/639135
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@mxey
Copy link

mxey commented Jan 6, 2025

Maybe that just needs to be its own thing: maps.SortedKeys and maps.SortedKeysFunc that both return a slice.

I don't think it should return a slice of keys, because then it's no better than slices.Sorted(maps.Keys(m)). The whole point of an extra iterator function is to naturally provide key and value as the iterator values.

@ianlancetaylor
Copy link
Member

The advantage of maps.SortedKeys would be that it can preallocate the slice, which slices.Sorted can't do. It's not a huge advantage but it's not nothing.

@adonovan
Copy link
Member

adonovan commented Jan 6, 2025

I do wonder whether range-over-func was the right decision. The approach taken by some languages is to indicate iterability through methods, not through the underlying representation; this would have allowed iterable types representing known-length sequences to implement a more specific interface that also provides as Len() int method, which an operation such as the proposed SortedByKey could query. Presumably the reason this was rejected is that it would set a precedent of the compiler treating some methods specially.

@apparentlymart
Copy link

There was some discussion about the possibility of using "special" methods instead of range-over-func in one of the earlier iterator proposals, FWIW: #43557 (comment)

I mention that only to try to link to the historical context. Since this decision has already been made and shipped, I assume it's no longer up for debate anyway.

@adonovan
Copy link
Member

adonovan commented Jan 6, 2025

Thanks. I wasn't trying to re-litigate, only to understand and reflect.

@jimmyfrasche
Copy link
Member

@mxey In addition to preallocating the slice it's also shorter to type and you may not even need to (directly) import slices. That makes it "easy enough" to do (imo) without having the easy-to-misuse API front and center. You can also reuse the slice if you need to iterate over the sames keys more than once. I think it's useful enough on its own and it certainly wouldn't prevent adding a more helpful helper later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests