feat: Add support for branchable collections #3216
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #3216      +/-   ##
===========================================
+ Coverage    77.38%   77.45%    +0.06%
===========================================
  Files          378      382        +4
  Lines        34941    35178      +237
===========================================
+ Hits         27039    27244      +205
- Misses        6274     6304       +30
- Partials      1628     1630        +2
```

Flags with carried forward coverage won't be shown.

... and 8 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
client/collection_description.go
Outdated
```diff
@@ -88,6 +88,21 @@ type CollectionDescription struct {
 	// At the moment this can only be set to `false` if this collection sources its data from
 	// another collection/query (is a View).
 	IsMaterialized bool

+	// IsMaterialized defines whether the history of this collection is tracked as a single,
```
typo: says IsMaterialized
"typo" 😁 Cheers, fixing now
- fix typo
LGTM! Good work, Andy.
internal/core/crdt/collection.go
Outdated
```go
)

// Collection is a simple CRDT type that tracks changes to the contents of a
// collection in a similar way to a document composite commit.
```
suggestion: this documentation doesn't make it clear how it's different from composite. Would be nice to add some more details.
Okay - cheers, I'll see what I can do :)
- Expand/clarify doc
```diff
@@ -31,13 +31,18 @@ type HeadFetcher struct {
 func (hf *HeadFetcher) Start(
 	ctx context.Context,
 	txn datastore.Txn,
-	prefix keys.HeadStoreKey,
+	prefix immutable.Option[keys.HeadstoreKey],
```
nitpick: I don't think making `prefix` optional makes sense here, as an empty prefix also has a meaning and `None` means there is no value. I would say it's reasonable to always pass some value for `prefix`, and it keeps the API cleaner.
None of the implementations of `HeadstoreKey` would result in an empty prefix that allows scanning across the entire headstore (required in order to return collection and doc commits). We could pass `nil` and check for that, or we could implement a Nil/All `HeadstoreKey` type (very weird), but both of those seem a lot less pleasant than using `Option`.

It does sound like why this param is optional should be documented on `Start` though, so I'll add something there.
- Doc prefix param
> If None, the entire headstore will be scanned - for example, in order to fetch document and collection heads.

Is that something we even want to allow?
Yes, most of the tests in this PR rely on that, and it is the default behaviour of `commits` queries.
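To make the discussed semantics concrete, here is a minimal, self-contained sketch (not the PR's actual `Start` implementation); `resolvePrefix` is a hypothetical helper and uses a plain string where the real code uses `keys.HeadstoreKey`:

```go
package main

import (
	"fmt"

	"github.com/sourcenetwork/immutable"
)

// resolvePrefix sketches the Option semantics under discussion: None means
// "no prefix at all", so the scan covers the entire headstore (returning both
// document and collection heads), while Some(key) scopes the scan to that prefix.
func resolvePrefix(prefix immutable.Option[string]) string {
	if !prefix.HasValue() {
		// An empty prefix scans across the whole headstore.
		return ""
	}
	return prefix.Value()
}

func main() {
	// None: scan everything (the default behaviour of `commits` queries).
	fmt.Printf("%q\n", resolvePrefix(immutable.None[string]()))
	// Some: scoped to a single (hypothetical) headstore prefix.
	fmt.Printf("%q\n", resolvePrefix(immutable.Some("example/headstore/prefix")))
}
```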
```go
updateEvent := event.Update{
	Cid:        link.Cid,
	SchemaRoot: c.Schema().Root,
	Block:      headNode,
}

txn.OnSuccess(func() {
	c.db.events.Publish(event.NewMessage(event.UpdateName, updateEvent))
})
```
question: do we really need to send another `Update` event? Can't the receiver of the event simply check on its side if the collection is branchable and handle it correspondingly?

suggestion: if we need to send another event, then I would say it serves a different purpose, and in this case we should introduce another event type.
> Can't the receiver of the event simply check on its side if the collection is branchable and handle it correspondingly?

I hadn't considered that, I'll think about that and get back to you :)
- Consider event stuff
> if we need to send another event, then I would say it serves a different purpose and in this case we should introduce another event type.

I disagree with this; it is an update event, an update of a collection. It needs to be handled for exactly the same reasons that an update (doc) event is currently handled, and in pretty much the same way (some rework required, as P2P is very focused on docIDs etc. at the moment, as noted in the PR description).
> I hadn't considered that, I'll think about that and get back to you

I think I prefer the extra events. It feels more correct to send independent updates for the doc and the collection, especially so as the event bus is public and can be subscribed to by users.

Keeping as-is, unless dissuaded by future argument :)
I also prefer the extra event, but the reason why I prefer it is missing from this implementation.

I would expect a `CreateMany` to generate only one collection block linking to all the docs created (same for updates or a mix of creates and updates). This means that we would have one update event for every doc and one update event at the end for the collection.
> I would expect a CreateMany to generate only one collection block linking to all the docs created (same for updates or mix of creates and updates). This means that we would have one update event for every doc and one update event at the end for the collection.

I was planning on building that here, but then I realised it is not specific to collection-level commits. The same system can (should) be applied to all crdt/blocks.
> The same system can (should) be applied to all crdt/blocks.

Can you expand on this? Why would that apply to composites or field level CRDTs?
Just as multiple collection blocks can be created in a single transaction, so too can multiple composite and field level blocks. Grouping any of these together into a single block should work consistently regardless of type.
E.g. if, in a single transaction, `Age` on `Document1` in branchable collection 1 is updated twice, the same mechanic for rolling the would-be two collection-level commits into a single collection commit should also apply to the field and composite level commits.
There is nothing special about collection-level commits here, although the need for it is possibly higher.
I would say this is definitely not the case for field level CRDTs. They only point to their previous head (they don't have links to other items), plus they don't cause network updates on their own. It may apply to composites for multiple updates to the same document in one transaction, but I'm not sure there is much benefit to it, as it would create multiple heads for the updated fields in question only to save a composite block.
> I would say this is definitely not the case for field level CRDTs.

We disagree, and it makes me very glad I did not try and do that work within this PR :)
```go
if c.def.Description.IsBranchable {
	collectionCRDT := merklecrdt.NewMerkleCollection(
		txn,
		keys.NewCollectionSchemaVersionKey(c.Schema().VersionID, c.ID()),
		keys.NewHeadstoreColKey(c.def.Description.RootID),
	)

	link, headNode, err := collectionCRDT.Save(ctx, []coreblock.DAGLink{{Link: link}})
	if err != nil {
		return err
	}

	updateEvent := event.Update{
		Cid:        link.Cid,
		SchemaRoot: c.Schema().Root,
		Block:      headNode,
	}

	txn.OnSuccess(func() {
		c.db.events.Publish(event.NewMessage(event.UpdateName, updateEvent))
	})
}
```
nitpick: extract duplicated code
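Something like the following could work - a rough sketch only (shown without imports), where the helper name, receiver, and the `[]byte` type for `headNode` are assumptions rather than details taken from the PR:

```go
// publishCollectionUpdate is a hypothetical helper extracting the duplicated
// event-publishing code above into one place.
func (c *collection) publishCollectionUpdate(
	txn datastore.Txn,
	link cidlink.Link, // as returned by Save
	headNode []byte, // assumed type of the head block returned by Save
) {
	updateEvent := event.Update{
		Cid:        link.Cid,
		SchemaRoot: c.Schema().Root,
		Block:      headNode,
	}

	// Publish only once the enclosing transaction commits successfully.
	txn.OnSuccess(func() {
		c.db.events.Publish(event.NewMessage(event.UpdateName, updateEvent))
	})
}
```

Both the document path and the `IsBranchable` branch could then call this with their respective `link`/`headNode` pair.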
```go
Results: map[string]any{
	"commits": []map[string]any{
		{
			"cid": testUtils.NewUniqueCid("collection"),
```
suggestion (out of scope): this string always makes me think that it has a meaning.
Yeah, I understand that - I think when I next use `NewUniqueCid` I should force myself to use another type (probs int) to see how that feels.
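For example, assuming `NewUniqueCid` accepts any comparable value as its key (a hypothetical usage, not from this PR), the assertion above could become:

```go
// An int key reads as an opaque identity rather than a meaningful string.
"cid": testUtils.NewUniqueCid(1),
```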
```go
// TODO: This test documents an unimplemented feature. Tracked by:
// https://github.com/sourcenetwork/defradb/issues/3212
func TestQueryCommitsBranchables_AcrossPeerConnection(t *testing.T) {
```
suggestion: for complex tests I would like to see a more descriptive name, so that I don't need to dig into the test's body to understand what it tests.
I'll expand this one, probably to TestQueryCommitsBranchables_SyncsAcrossPeerConnection
- Expand test name, make sure you update the linked ticket too
```go
		GQL queries.

		Currently this property is immutable and can only be set on collection creation, however
		that will change in the future.`,
```
suggestion: We should probably add the todo comment with the linked issue above so that we don't forget to edit this once we add the feature.
Not a problem, although that issue does not exist yet, so I'll be adding another item to sit in our long backlog.
- Create and link issue for toggling branchableness
I thought it did already exist 🤷‍♂️
Overall this is well done. I like the tests that document the unimplemented features as well.
Just a few comments to resolve before approval.
internal/core/crdt/collection.go
Outdated
```go
// collection in a similar way to a document composite commit, only simpler,
// without the need to track status and a simpler [Merge] function.
type Collection struct {
	store datastore.DSReaderWriter
```
todo: This field can be removed as it's never used.
Nice spot :)
- Rm col.store
```go
	// schemaVersionKey is the schema version datastore key at the time of commit.
	//
	// It can be used to identify the collection datastructure state at the time of commit.
	schemaVersionKey keys.CollectionSchemaVersionKey
```
suggestion: I don't think there is any reason to create this schema version key only to extract the `schemaVersionID` later on. I think going directly with the `schemaVersionID` would be preferable.
Will have a look; not a lot of thought went into this, I copy-pasted from another CRDT and dropped the unused props.
- rm col.schemaversionKey?
Leaving as-is; changing it would introduce a significant difference between the usage patterns of this type and the other CRDTs. And the correctness of moving it also depends on the untested `Merge` function (interfaced, so cannot be easily changed).
internal/core/crdt/collection.go
Outdated
```go
	return nil
}

func (c *Collection) Append() *CollectionDelta {
```
suggestion: `Set` might be a better name for this. There is no reason for it to be different than the composite CRDT.
Can I change the function name of the composite? `Set` is a really bad name for a function that doesn't set anything.
- Rename something
I renamed composite.Set to Append
Append is equally a bad name for something that doesn't append anything though. Set does set fields in the delta object.
A new delta object; it sets nothing on the host. Although `Append` doesn't really append to the host either. Both names are bad, but they are now consistent at least.

The two functions are constructors, nothing more. They could perhaps be renamed `NewDelta`.

Note: I don't see this as something terribly important, and am not very keen on thinking much about it.
Can you leave it as `Set` for now? It doesn't make sense to me to choose the objectively less accurate name `Append` when `Set` was there previously. I have a vague memory of John deciding to use `Set` because it would have been the correct naming in CRDT lingo. If that's not the case and you really don't want to use `Set`, `NewDelta` is definitely the better choice.
done :)
- Will soon be adding collections to the headstore, and we don't want the keys to intermingle
- In order to accommodate new headstore key types

Compare cf03cdf to d05f74a
LGTM. I think the CRDT code is starting to be much more approachable 👍
bug bash: Manually tested running 2 nodes, creating docs on a branchable collection, running standard and time-travelling queries on collection commits, and inspecting commit queries. All works as expected.
Relevant issue(s)
Resolves #3038
Description
Adds support for branchable collections.
Does not add support for syncing the collection level commits via the P2P system (broken out: lots of people are working in that space, significant changes are required, and I'm not so familiar with that part of the code, so it will take longer). This does mean that the somewhat surprising (to me at least) implementation of `Collection.Merge` is currently untested.

Commits are queryable via the `commits` GQL queries.

Time travel queries do not work due to an existing bug: #3214 - once that bug is fixed I expect there to be some more work (definitely testing) in order to get it to work with branchable collections.
This is a breaking change due to the moving/namespacing of the existing document headstore keys.
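For a sense of what this adds to `commits` results, here is an illustrative assertion shape in the same testUtils style quoted above; the exact set of commits and the keys passed to `NewUniqueCid` are assumptions rather than copies of the PR's tests, showing the collection-level head appearing alongside the usual per-document commits:

```go
Results: map[string]any{
	"commits": []map[string]any{
		{
			// The new collection-level head introduced by IsBranchable.
			"cid": testUtils.NewUniqueCid("collection"),
		},
		{
			// The per-document composite commit, as before.
			"cid": testUtils.NewUniqueCid("composite"),
		},
		{
			// A field-level commit, as before.
			"cid": testUtils.NewUniqueCid("field"),
		},
	},
},
```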