Commit d013fc0

doc: clean up and expand on traversal pkg docs
1 parent 4888b08 commit d013fc0

2 files changed: +147 -63 lines changed

traversal/doc.go

+91-46
@@ -1,48 +1,94 @@
1-
// This package provides functional utilities for traversing and transforming
2-
// IPLD nodes.
3-
//
4-
// The traversal.Path type provides a description of how to perform
5-
// several steps across a Node tree. These are dual purpose:
6-
// Paths can be used as instructions to do some traversal, and
7-
// Paths are accumulated during traversals as a log of progress.
8-
//
9-
// "Focus" functions provide syntactic sugar for using ipld.Path to jump
10-
// to a Node deep in a tree of other Nodes.
11-
//
12-
// "FocusedTransform" functions can do the same such deep jumps, and support
13-
// mutation as well!
14-
// (Of course, since ipld.Node is an immutable interface, more precisely
15-
// speaking, "transformations" are implemented rebuilding trees of nodes to
16-
// emulate mutation in a copy-on-write way.)
17-
//
18-
// "Walk" functions perform a walk of a Node graph, and apply visitor
19-
// functions multiple Nodes. The more advanced Walk functions can be guided
20-
// by Selectors, which provide a declarative mechanism for guiding the
21-
// traversal and filtering which Nodes are of interest.
22-
// (See the selector sub-package for more detail.)
23-
//
24-
// "WalkTransforming" is similar to Traverse, but with support for mutations.
25-
// Like "FocusTransform", "WalkTransforming" operates in a copy-on-write way.
26-
//
27-
// All of these functions -- the "Focus*" and "Walk*" family alike --
28-
// work via callbacks: they do the traversal, and call a user-provided function
29-
// with a handle to the reached Node. Further "Focus" and "Walk" can be used
30-
// recursively within this callback.
31-
//
32-
// All of these functions -- the "Focus*" and "Walk*" family alike --
33-
// include support for automatic resolution and loading of new Node trees
34-
// whenever IPLD Links are encountered. This can be configured freely
35-
// by providing LinkLoader interfaces to the traversal.Config.
36-
//
37-
// Some notes on the limits of usage:
38-
//
39-
// The "*Transform" family of methods is most appropriate for patterns of usage
40-
// which resemble point mutations.
41-
// More general transformations -- zygohylohistomorphisms, etc -- will be best
42-
// implemented by composing the read-only systems (e.g. Focus, Traverse) and
43-
// handling the accumulation in the visitor functions.
44-
//
45-
// (Why? The "point mutation" use-case gets core library support because
1+
// Package traversal provides functional utilities for traversing and
2+
// transforming IPLD graphs.
3+
//
4+
// Two primary types of traversal are implemented in this package: "Focus" and
5+
// "Walk". Both types have a "Transforming" variant, which supports mutation
6+
// through emulated copy-on-write tree rebuilding.
7+
//
8+
// Traversal operations use the Progress type for configuration and state
9+
// tracking. Helper functions such as Focus and Walk exist to avoid manual setup
10+
// of a Progress struct, but they cannot cross link boundaries without a
11+
// LinkSystem, which needs to be configured on the Progress struct.
12+
//
13+
// A typical traversal operation involves creating a Progress struct, setting up
14+
// the LinkSystem, and calling one of the Focus or Walk functions on the
15+
// Progress object. Various other configuration options are available when
16+
// traversing this way.
17+
//
18+
// # Focus
19+
//
20+
// "Focus" and "Get" functions provide syntactic sugar for using ipld.Path to
21+
// access Nodes deep within a graph.
22+
//
23+
// "FocusedTransform" resembles "Focus" but supports user-defined mutation using
24+
// its TransformFn.
25+
//
26+
// # Walk
27+
//
28+
// "Walk" functions perform a recursive walk of a Node graph, applying visitor
29+
// functions to matched parts of the graph.
30+
//
31+
// The selector sub-package offers a declarative mechanism for guiding
32+
// traversals and filtering relevant Nodes.
33+
// (Refer to the selector sub-package for more details.)
34+
//
35+
// "WalkLocal" is a special case of Walk that doesn't require a selector. It
36+
// walks a local graph, not crossing link boundaries, and calls its VisitFn for
37+
// each encountered Node.
38+
//
39+
// "WalkMatching" traverses according to a selector, calling the VisitFn for
40+
// each match based on the selector's matching rules.
41+
//
42+
// "WalkAdv" performs the same traversal as WalkMatching, but calls its
43+
// AdvVisitFn on every Node, regardless of whether it matches the selector.
44+
//
45+
// "WalkTransforming" resembles "WalkMatching" but supports user-defined
46+
// mutation using its TransformFn.
47+
//
48+
// # Usage Notes
49+
//
50+
// These functions work via callbacks, performing traversal and calling a
51+
// user-provided function with a handle to the reached Node(s). Further "Focus"
52+
// and "Walk" operations can be performed recursively within this callback if
53+
// desired.
54+
//
55+
// All traversal functions operate on a Progress object which, except in the
56+
// case of "WalkLocal", can be configured with a LinkSystem for automatic
57+
// resolution and loading of new Node trees when IPLD Links are encountered.
58+
//
59+
// The "*Transform" methods are best suited for point-mutation patterns. For
60+
// more general transformations, use the read-only systems (e.g., Focus,
61+
// Walk) and handle accumulation in the visitor functions.
62+
//
63+
// A common use case for walking traversal is running a selector over a graph
64+
// and noting all the blocks it uses. This is achieved by configuring a
65+
// LinkSystem that can handle and observe block loads. Be aware that a selector
66+
// might visit the same block multiple times during a traversal, as IPLD graphs
67+
// often form "diamond patterns" with the same block referenced from multiple
68+
// locations.
69+
//
70+
// The LinkVisitOnlyOnce option can be used to avoid duplicate loads, but it
71+
// must be used carefully with non-trivial selectors, where repeat visits of
72+
// the same block may be essential for traversal or visit callbacks.
73+
//
74+
// A Budget can be set at the beginning of a traversal to limit the number of
75+
// Nodes and/or Links encountered before failing the traversal (with the
76+
// ErrBudgetExceeded error).
77+
//
78+
// The "Preloader" option provides a way to parallelize block loading in
79+
// environments where block loading is a high-latency operation (such as
80+
// fetching over the network).
81+
// The traversal operation itself is not parallel and will proceed strictly
82+
// according to path or selector order. However, a Preloader can be used to load
83+
// blocks asynchronously, and prepare the LinkSystem that the traversal is using
84+
// with already-loaded blocks.
85+
//
86+
// A Preloader and a Budget option can be used on the same traversal, BUT the
87+
// Preloader may not receive the same links that the traversal wants to load
88+
// from the LinkSystem. Use with care. See the documentation on Progress.
89+
package traversal
90+
91+
// Why only "point-mutation"? This use-case gets core library support because
4692
// it's both high utility and highly clear how to implement it.
4793
// More advanced transformations are nontrivial to provide generalized support
4894
// for, for three reasons: efficiency is hard; not all existing research into
@@ -53,4 +99,3 @@
5399
// Therefore, attempts at generalization are not included here; handling these
54100
// issues in concrete cases is easy, so we call it an application logic concern.
55101
// However, exploring categorical recursion schemes as a library is encouraged!)
56-
package traversal
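
The copy-on-write behaviour that the package docs describe for the "Transforming" variants can be illustrated without the library. The following is a minimal sketch, assuming a hypothetical immutable `Node` type and a hypothetical `focusedTransform` helper (neither is part of the traversal package): only the nodes along the path are rebuilt, and every untouched subtree is shared with the original tree.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an immutable tree node:
// a leaf value plus an optional map of named children.
type Node struct {
	Value    string
	Children map[string]*Node
}

// focusedTransform returns a new tree in which the node reached by path
// has been replaced by fn(node). The input tree is never modified.
func focusedTransform(n *Node, path []string, fn func(*Node) *Node) (*Node, error) {
	if len(path) == 0 {
		return fn(n), nil
	}
	child, ok := n.Children[path[0]]
	if !ok {
		return nil, fmt.Errorf("no such path segment %q", path[0])
	}
	newChild, err := focusedTransform(child, path[1:], fn)
	if err != nil {
		return nil, err
	}
	// Shallow-copy this node: sibling subtrees are shared, not cloned.
	cp := &Node{Value: n.Value, Children: make(map[string]*Node, len(n.Children))}
	for k, v := range n.Children {
		cp.Children[k] = v
	}
	cp.Children[path[0]] = newChild
	return cp, nil
}

// buildTree returns a small example tree: {a: {b: "old"}, c: "untouched"}.
func buildTree() *Node {
	return &Node{Children: map[string]*Node{
		"a": {Children: map[string]*Node{"b": {Value: "old"}}},
		"c": {Value: "untouched"},
	}}
}

func main() {
	root := buildTree()
	updated, err := focusedTransform(root, []string{"a", "b"}, func(*Node) *Node {
		return &Node{Value: "new"}
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(root.Children["a"].Children["b"].Value)      // old
	fmt.Println(updated.Children["a"].Children["b"].Value)   // new
	fmt.Println(updated.Children["c"] == root.Children["c"]) // true
}
```

This shape is why point mutations are cheap to support: the cost is proportional to the path length, not to the size of the tree.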

traversal/fns.go

+56-17
@@ -26,9 +26,12 @@ type AdvVisitFn func(Progress, datamodel.Node, VisitReason) error
2626
type VisitReason byte
2727

2828
const (
29-
VisitReason_SelectionMatch VisitReason = 'm' // Tells AdvVisitFn that this node was explicitly selected. (This is the set of nodes that VisitFn is called for.)
30-
VisitReason_SelectionParent VisitReason = 'p' // Tells AdvVisitFn that this node is a parent of one that will be explicitly selected. (These calls only happen if the feature is enabled -- enabling parent detection requires a different algorithm and adds some overhead.)
31-
VisitReason_SelectionCandidate VisitReason = 'x' // Tells AdvVisitFn that this node was visited while searching for selection matches. It is not necessarily implied that any explicit match will be a child of this node; only that we had to consider it. (Merkle-proofs generally need to include any node in this group.)
29+
// VisitReason_SelectionMatch tells AdvVisitFn that this node was explicitly selected. (This is the set of nodes that VisitFn is called for.)
30+
VisitReason_SelectionMatch VisitReason = 'm'
31+
// VisitReason_SelectionParent tells AdvVisitFn that this node is a parent of one that will be explicitly selected. (These calls only happen if the feature is enabled -- enabling parent detection requires a different algorithm and adds some overhead.)
32+
VisitReason_SelectionParent VisitReason = 'p'
33+
// VisitReason_SelectionCandidate tells AdvVisitFn that this node was visited while searching for selection matches. It is not necessarily implied that any explicit match will be a child of this node; only that we had to consider it. (Merkle-proofs generally need to include any node in this group.)
34+
VisitReason_SelectionCandidate VisitReason = 'x'
3235
)
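
The difference between a visit function that sees only matches and an advanced visit function that also sees candidates can be sketched with stand-in types. This is a simplified illustration, not the library's algorithm: `node`, `visitReason`, `walkAdv`, and `walkMatching` below are hypothetical names, the "selector" is reduced to a plain predicate, and every non-matching node is reported as a candidate.

```go
package main

import "fmt"

// visitReason mirrors the idea of a one-byte reason code.
type visitReason byte

const (
	reasonMatch     visitReason = 'm' // node was explicitly selected
	reasonCandidate visitReason = 'x' // node was visited while searching
)

// node is a hypothetical tree node used only for this sketch.
type node struct {
	name     string
	children []*node
}

// walkAdv visits every node depth-first and reports whether the
// (hypothetical) matches predicate selected it.
func walkAdv(n *node, matches func(*node) bool, visit func(*node, visitReason)) {
	reason := reasonCandidate
	if matches(n) {
		reason = reasonMatch
	}
	visit(n, reason)
	for _, c := range n.children {
		walkAdv(c, matches, visit)
	}
}

// walkMatching is walkAdv filtered down to explicit matches.
func walkMatching(n *node, matches func(*node) bool, visit func(*node)) {
	walkAdv(n, matches, func(n *node, r visitReason) {
		if r == reasonMatch {
			visit(n)
		}
	})
}

func main() {
	root := &node{name: "root", children: []*node{
		{name: "a", children: []*node{{name: "leaf"}}},
		{name: "b"},
	}}
	isLeaf := func(n *node) bool { return len(n.children) == 0 }

	walkAdv(root, isLeaf, func(n *node, r visitReason) {
		fmt.Printf("%s %c\n", n.name, r)
	})
	walkMatching(root, isLeaf, func(n *node) {
		fmt.Println("match:", n.name)
	})
}
```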
3336

3437
// Progress tracks a traversal as it proceeds. It is used initially to begin a traversal, and it is then passed to the visit function as the traversal proceeds.
@@ -46,25 +49,56 @@ const (
4649
// Currently a best-guess approach is used to try and have the preloader adhere to the budget, but with typical real-world graphs, this is likely to be inaccurate.
4750
// In the case of inaccuracies, the budget will be properly applied to the traversal-proper, but the preloader may receive a different set of links than the traversal-proper will.
4851
type Progress struct {
49-
Cfg *Config
50-
Path datamodel.Path // Path is how we reached the current point in the traversal.
51-
LastBlock struct { // LastBlock stores the Path and Link of the last block edge we had to load. (It will always be zero in traversals with no linkloader.)
52+
// Cfg is the configuration for the traversal, set by user.
53+
Cfg *Config
54+
55+
// Budget, if present, tracks "budgets" for how many more steps we're willing to take before we should halt.
56+
// Budget is initially set by user, but is then updated as the traversal proceeds.
57+
Budget *Budget
58+
59+
// Path is how we reached the current point in the traversal.
60+
Path datamodel.Path
61+
62+
// LastBlock stores the Path and Link of the last block edge we had to load. (It will always be zero in traversals with no linkloader.)
63+
LastBlock struct {
5264
Path datamodel.Path
5365
Link datamodel.Link
5466
}
55-
PastStartAtPath bool // Indicates whether the traversal has progressed passed the StartAtPath in the config -- use to avoid path checks when inside a sub portion of a DAG that is entirely inside the "not-skipped" portion of a traversal
56-
Budget *Budget // If present, tracks "budgets" for how many more steps we're willing to take before we should halt.
57-
SeenLinks map[datamodel.Link]struct{} // Set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
67+
68+
// PastStartAtPath indicates whether the traversal has progressed past the StartAtPath in the config -- use to avoid path checks when inside a sub portion of a DAG that is entirely inside the "not-skipped" portion of a traversal.
69+
PastStartAtPath bool
70+
71+
// SeenLinks is a set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
72+
SeenLinks map[datamodel.Link]struct{}
5873
}
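
The effect of a seen-set such as SeenLinks on a "diamond pattern" can be shown with a small model. The sketch below assumes a hypothetical `graph` type (a map from link name to the links inside that block) and a hypothetical `countLoads` helper; it does not use the library's types and only counts how often each block would be loaded.

```go
package main

import "fmt"

// graph maps a hypothetical link name to the links found inside that block.
type graph map[string][]string

// countLoads walks depth-first from root and returns how many times each
// block was loaded. When visitOnlyOnce is true, a seen-set suppresses
// repeat loads of the same link.
func countLoads(g graph, root string, visitOnlyOnce bool) map[string]int {
	loads := map[string]int{}
	seen := map[string]struct{}{}
	var walk func(link string)
	walk = func(link string) {
		if visitOnlyOnce {
			if _, ok := seen[link]; ok {
				return
			}
			seen[link] = struct{}{}
		}
		loads[link]++
		for _, child := range g[link] {
			walk(child)
		}
	}
	walk(root)
	return loads
}

func main() {
	// A diamond: both "a" and "b" reference the same "shared" block.
	g := graph{
		"root": {"a", "b"},
		"a":    {"shared"},
		"b":    {"shared"},
	}
	fmt.Println(countLoads(g, "root", false)["shared"]) // 2
	fmt.Println(countLoads(g, "root", true)["shared"])  // 1
}
```

Skipping the second visit is only safe when nothing depends on reaching the block via the second path, which is the caveat the option's documentation raises for non-trivial selectors.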
5974

6075
// Config is a set of options for a traversal. Set a Config on a Progress to customize the traversal.
6176
type Config struct {
62-
Ctx context.Context // Context carried through a traversal. Optional; use it if you need cancellation.
63-
LinkSystem linking.LinkSystem // LinkSystem used for automatic link loading, and also any storing if mutation features (e.g. traversal.Transform) are used.
64-
LinkTargetNodePrototypeChooser LinkTargetNodePrototypeChooser // Chooser for Node implementations to produce during automatic link traversal.
65-
LinkVisitOnlyOnce bool // By default, we visit across links wherever we see them again, even if we've visited them before, because the reason for visiting might be different than it was before since we got to it via a different path. If set to true, track links we've seen before in Progress.SeenLinks and do not visit them again. Note that sufficiently complex selectors may require valid revisiting of some links, so setting this to true can change behavior noticably and should be done with care.
66-
StartAtPath datamodel.Path // If set, causes a traversal to skip forward until passing this path, and only then begins calling visit functions. Block loads will also be skipped wherever possible.
67-
Preloader preload.Loader // Receives a list of links within each block prior to traversal-proper. This can be used to asynchronously load blocks that will be required at a later phase of the retrieval, or even to load blocks in a different order than the traversal would otherwise do. Preload calls are not de-duplicated, it is up to the receiver to do so if desired. Beware of using both Budget and Preloader! See the documentation on Progress for more information.
77+
// Ctx is the context carried through a traversal.
78+
// Optional; use it if you need cancellation.
79+
Ctx context.Context
80+
81+
// LinkSystem is used for automatic link loading, and also any storing if mutation features (e.g. traversal.Transform) are used.
82+
LinkSystem linking.LinkSystem
83+
84+
// LinkTargetNodePrototypeChooser is a chooser for Node implementations to produce during automatic link traversal.
85+
LinkTargetNodePrototypeChooser LinkTargetNodePrototypeChooser
86+
87+
// LinkVisitOnlyOnce controls repeat-link visitation.
88+
// By default, we visit across links wherever we see them again, even if we've visited them before, because the reason for visiting might be different than it was before since we got to it via a different path.
89+
// If set to true, track links we've seen before in Progress.SeenLinks and do not visit them again.
90+
// Note that sufficiently complex selectors may require valid revisiting of some links, so setting this to true can change behavior noticeably and should be done with care.
91+
LinkVisitOnlyOnce bool
92+
93+
// StartAtPath, if set, causes a traversal to skip forward until passing this path, and only then begins calling visit functions.
94+
// Block loads will also be skipped wherever possible.
95+
StartAtPath datamodel.Path
96+
97+
// Preloader receives the links within each block prior to traversal-proper: the block is first scanned laterally, without descending into the links themselves, before backing up and performing the traversal-proper.
98+
// This can be used to asynchronously load blocks that will be required at a later phase of the retrieval, or even to load blocks in a different order than the traversal would otherwise do.
99+
// Preload calls are not de-duplicated; it is up to the receiver to do so if desired.
100+
// Beware of using both Budget and Preloader! See the documentation on Progress for more information on this usage and the likely surprising effects.
101+
Preloader preload.Loader
68102
}
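
The ordering that the Preloader comment describes (a lateral scan of a block's links before the traversal descends) can be sketched as follows. The `block` type and `traverse` function are hypothetical stand-ins, not the library's API; the callbacks simply record what a preloader and a loader would be asked for.

```go
package main

import "fmt"

// block is a hypothetical block that contains only links to other blocks.
type block struct{ links []string }

// traverse loads blocks depth-first. Before descending into a block's
// links, it hands the full list of those links to preload, so a receiver
// could start fetching them ahead of the traversal.
func traverse(store map[string]block, root string, preload func([]string), load func(string)) {
	load(root)
	b := store[root]
	if len(b.links) > 0 && preload != nil {
		preload(b.links)
	}
	for _, l := range b.links {
		traverse(store, l, preload, load)
	}
}

func main() {
	store := map[string]block{
		"root": {links: []string{"a", "b"}},
		"a":    {links: []string{"c"}},
		"b":    {links: []string{"c"}},
		"c":    {},
	}
	traverse(store, "root",
		func(links []string) { fmt.Println("preload:", links) },
		func(link string) { fmt.Println("load:", link) },
	)
}
```

In this model the link "c" is announced to the preload callback twice, once per referencing block, which matches the note that de-duplication is left to the receiver.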
69103

70104
// Budget is a set of monotonically-decrementing "budgets" for how many more steps we're willing to take before we should halt.
@@ -75,9 +109,14 @@ type Config struct {
75109
// If you set any budgets (by having a non-nil Progress.Budget field), you must set some value for all of them.
76110
// Traversal halts when _any_ of the budgets reaches zero.
77111
// The max value of an int64 (math.MaxInt64) is acceptable for any budget you don't care about.
112+
//
113+
// Beware of using both Budget and Preloader! See the documentation on Progress for more information on this usage and the likely surprising effects.
78114
type Budget struct {
79-
NodeBudget int64 // A monotonically-decrementing "budget" for how many more nodes we're willing to visit before halting.
80-
LinkBudget int64 // A monotonically-decrementing "budget" for how many more links we're willing to load before halting. (This is not aware of any caching; it's purely in terms of links encountered and traversed.)
115+
// NodeBudget is a monotonically-decrementing "budget" for how many more nodes we're willing to visit before halting.
116+
NodeBudget int64
117+
// LinkBudget is a monotonically-decrementing "budget" for how many more links we're willing to load before halting.
118+
// (This is not aware of any caching; it's purely in terms of links encountered and traversed.)
119+
LinkBudget int64
81120
}
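
A monotonically-decrementing budget can be modelled in a few lines. This is a sketch under stated assumptions: `budget`, `node`, `walk`, and `errBudgetExceeded` are hypothetical names chosen for illustration, only the node budget is spent here, and the link budget is set to math.MaxInt64 to show the "don't care" convention mentioned in the comment.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errBudgetExceeded is a hypothetical sentinel error for this sketch.
var errBudgetExceeded = errors.New("traversal budget exceeded")

// budget holds the remaining allowances; both fields only ever decrease.
type budget struct {
	nodeBudget int64
	linkBudget int64
}

// node is a hypothetical tree node with no payload.
type node struct{ children []*node }

// walk visits n and its descendants depth-first, spending one unit of
// nodeBudget per node. A nil budget means "unlimited".
func walk(n *node, b *budget, visited *int) error {
	if b != nil {
		if b.nodeBudget <= 0 {
			return errBudgetExceeded
		}
		b.nodeBudget--
	}
	*visited++
	for _, c := range n.children {
		if err := walk(c, b, visited); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Five nodes in total: root, two children, two grandchildren.
	tree := &node{children: []*node{
		{children: []*node{{}, {}}},
		{},
	}}

	visited := 0
	err := walk(tree, &budget{nodeBudget: 3, linkBudget: math.MaxInt64}, &visited)
	fmt.Println(visited, err) // 3 traversal budget exceeded

	visited = 0
	err = walk(tree, nil, &visited)
	fmt.Println(visited, err) // 5 <nil>
}
```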
82121

83122
// Clone returns a copy of the budget.
