refactor: Decouple db.db from gql #912

Merged: 8 commits, Oct 28, 2022
1 change: 0 additions & 1 deletion client/db.go
@@ -26,7 +26,6 @@ type DB interface {
GetCollectionByName(context.Context, string) (Collection, error)
GetCollectionBySchemaID(context.Context, string) (Collection, error)
GetAllCollections(ctx context.Context) ([]Collection, error)
GetRelationshipIdField(fieldName, targetType, thisType string) (string, error)

Root() ds.Batching
Blockstore() blockstore.Blockstore
12 changes: 12 additions & 0 deletions client/descriptions.go
@@ -40,6 +40,18 @@ func (col CollectionDescription) GetField(name string) (FieldDescription, bool)
return FieldDescription{}, false
}

// GetRelation returns the field that supports the relation of the given name.
func (col CollectionDescription) GetRelation(name string) (FieldDescription, bool) {
if !col.Schema.IsEmpty() {
for _, field := range col.Schema.Fields {
if field.RelationName == name {
return field, true
}
}
}
return FieldDescription{}, false
}

func (col CollectionDescription) GetPrimaryIndex() IndexDescription {
return col.Indexes[0]
}
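For illustration, a hypothetical caller of the new GetRelation helper (the collection value and relation name here are made up, not taken from the PR; GetField works the same way for lookups by plain field name):

package example

import (
    "fmt"

    "github.com/sourcenetwork/defradb/client"
)

// printRelationField looks up the field backing a named relation via the
// GetRelation helper added above. col and relation are hypothetical
// inputs supplied by the caller.
func printRelationField(col client.CollectionDescription, relation string) {
    if field, ok := col.GetRelation(relation); ok {
        fmt.Printf("relation %q is backed by field %q\n", relation, field.Name)
        return
    }
    fmt.Printf("no field supports relation %q\n", relation)
}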
52 changes: 52 additions & 0 deletions core/parser.go
@@ -0,0 +1,52 @@
// Copyright 2022 Democratized Data Foundation
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package core

import (
"context"

"github.com/sourcenetwork/defradb/client"
"github.com/sourcenetwork/defradb/client/request"
)

type SchemaDefinition struct {
// The name of this schema definition.
Name string

// The serialized definition of this schema definition.
// todo: this needs to be properly typed and handled according to
// https://github.com/sourcenetwork/defradb/issues/863
Body []byte
}

type Schema struct {
// The individual declarations of types defined by this schema.
Definitions []SchemaDefinition

// The collection descriptions created from/defined by this schema.
Descriptions []client.CollectionDescription
}
Comment on lines +30 to +36

Collaborator:

thought: It seems weird to me that a Schema can have multiple definitions and descriptions. Maybe it's because I'm missing something.

Contributor (Author):

They were defined and iterated over separately before I made the change; I think I mostly just needed/wanted a single return type instead of returning two params.

Collaborator:

maybe then it would be appropriate to name it Schemas?

Contributor (Author):

A single Schema supports multiple collections - this struct is a single Schema. SchemaDefinition and Definitions are a bit poorly named, but they are the old names and I want to avoid fixing everything in this PR. If any suggestions pop up I'll probably change these though if I'm about anyway.

Member:

Agreed, they can be refactored in a future PR. Was actually thinking about just this the other day when working through some other stuff.

AndrewSisley (Contributor, Author), Oct 28, 2022:

I'll create a ticket and sort this out pretty soon after merge. It is important (as per the Discord discussion), but it is localized and not worth blocking the rest of the changes from going in.

  • create ticket once other convos resolved

Contributor (Author):

Ticket: #922


// Parser represents the object responsible for handling stuff specific to
// a query language. This includes schema and query parsing, and introspection.
type Parser interface {
Collaborator:

question: Can you explain why you decided to put this interface in the core package? Ideally, interfaces are defined in the package where they are used. Since it's used in the db package, why not define it there?

AndrewSisley (Contributor, Author), Oct 27, 2022:

> Ideally, interfaces are defined in the package where they are used

This is very much a different mindset to most other langs IMO :) I can for sure see the merit in the way Golang's magic/inferred interfaces work though. I'll revisit this once the other structural threads have been resolved (in case I would otherwise end up changing this multiple times).

- [ ] Move Parser interface to db package?

Update: Leaving as is as mentioned deeper in the convo

Collaborator:

I know it's different :) I may sound like a broken record at this point because I say this often, but I think it's important to follow the language's recommended best practices as much as possible regardless of our habits with other langs. Bringing other cool patterns in as additional gravy is fine though if it makes sense (e.g. the Option type).

I agree with waiting on the other threads to be resolved.

AndrewSisley (Contributor, Author), Oct 27, 2022:

Ah :) I wasn't arguing with you on this one, as it is more a structural thing than just a styling thing - it'll probably be moved if it survives.

Similar to the (positive) mindset shift RE pointers/ownership etc I experienced when learning Rust

Collaborator:

I know you weren't ;)

Member:

I would also prefer to put these internal use packages into an internal/ directory.

Collaborator:

> Eg: If I (or anyone) wanted to play around in a new repo on some experimental tooling that maybe interacts with the core.Replicated type I wouldn't be able to as the internal stuff would block it.

Yes, that's the point of internal. It blocks access by external packages. It does make the type accessible to any package within the tree rooted at the parent of the internal directory. So if internal is at the root of defradb, then all our packages within defradb can access internal packages, but internal packages won't be part of our public APIs.

Member:

Lol.. yes, I'm aware; I'm saying I don't want that, as internal is too strict. I want the freedom to play with core (and other types) from other repos.

Collaborator:

That does come with the requirement that we maintain those exposed types more carefully because external users can interact with them.

AndrewSisley (Contributor, Author), Oct 28, 2022:

> That does come with the requirement that we maintain those exposed types more carefully because external users can interact with them.

50-50, we can declare that directly using anything but a subset of our packages is unsupported, but it is definitely not going to be clear enough to avoid the occasional support ticket coming through from someone external using pseudo-internal code, as well as occasional confusion for source devs.

// Returns true if the given string is an introspection request.
IsIntrospection(request string) bool

// Executes the given introspection request.
ExecuteIntrospection(request string) *client.QueryResult

// Parses the given request, returning a strongly typed model of that request.
Parse(request string) (*request.Request, []error)

// Adds the given schema to this parser's model.
AddSchema(ctx context.Context, schema string) (*Schema, error)
Member:

thought: Might be able to drop the AddSchema API if, based on my other comments, we don't remove the schema manager from the DB.

Contributor (Author):

parking this thought until the rest is resolved

Member:

👍

}
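Taken together, these methods let the db layer drive any query language through a single seam. A rough sketch of the intended parse-then-plan flow (the planner.New and RunRequest signatures are inferred from the diffs below and may not match exactly; error handling is trimmed, and in real code this would live in db rather than beside the interface, to avoid an import cycle):

// Sketch only, not part of the PR. p is any core.Parser implementation
// (e.g. the graphql one constructed in db.go below).
func runQuery(ctx context.Context, p Parser, db client.DB, txn datastore.Txn, query string) (any, error) {
    // Introspection requests never reach the planner.
    if p.IsIntrospection(query) {
        return p.ExecuteIntrospection(query), nil
    }

    // Parse into the query-language-agnostic request model...
    req, errs := p.Parse(query)
    if len(errs) > 0 {
        return nil, errs[0]
    }

    // ...then hand that model to the planner, which knows nothing of gql.
    pl := planner.New(ctx, db, txn)
    return pl.RunRequest(ctx, req)
}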
64 changes: 23 additions & 41 deletions db/collection_update.go
@@ -23,7 +23,6 @@ import (
"github.com/sourcenetwork/defradb/core"
"github.com/sourcenetwork/defradb/datastore"
"github.com/sourcenetwork/defradb/errors"
"github.com/sourcenetwork/defradb/mapper"
"github.com/sourcenetwork/defradb/planner"
"github.com/sourcenetwork/defradb/query/graphql/parser"
)
@@ -598,81 +597,64 @@ func (c *collection) makeSelectionQuery(
txn datastore.Txn,
filter any,
) (planner.Query, error) {
mapping := c.createMapping()
var f *mapper.Filter
var f client.Option[request.Filter]
var err error
switch fval := filter.(type) {
case string:
if fval == "" {
return nil, errors.New("invalid filter")
}
var p client.Option[request.Filter]
p, err = parser.NewFilterFromString(fval)

f, err = parser.NewFilterFromString(fval)
if err != nil {
return nil, err
}
f = mapper.ToFilter(p, mapping)
case *mapper.Filter:
case client.Option[request.Filter]:
f = fval
default:
return nil, errors.New("invalid filter")
}
if filter == "" {
return nil, errors.New("invalid filter")
}
slct, err := c.makeSelectLocal(f, mapping)
slct, err := c.makeSelectLocal(f)
if err != nil {
return nil, err
}

return c.db.queryExecutor.MakeSelectQuery(ctx, c.db, txn, slct)
planner := planner.New(ctx, c.db, txn)
return planner.MakePlan(&request.Request{
Queries: []*request.OperationDefinition{
{
Selections: []request.Selection{
slct,
},
},
},
})
Comment on lines +626 to +634

Member:

thought: Might be nice to have some helpers on the request package to remove the labor in making such a request struct.

Contributor (Author):

It's a personal preference I think - I prefer this atm as you can see the structure that you are using.

Member:

sounds good.

}
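For reference, the kind of helper floated in the thread above might look something like this (hypothetical; it is not part of this PR or of the request package):

package request

// NewSingleSelectionRequest is a hypothetical convenience constructor:
// it wraps one selection in a single-query Request, replacing the
// nested literal built in makeSelectionQuery above.
func NewSingleSelectionRequest(slct Selection) *Request {
    return &Request{
        Queries: []*OperationDefinition{
            {Selections: []Selection{slct}},
        },
    }
}

With it, the return above would collapse to planner.MakePlan(request.NewSingleSelectionRequest(slct)).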

func (c *collection) makeSelectLocal(filter *mapper.Filter, mapping *core.DocumentMapping) (*mapper.Select, error) {
slct := &mapper.Select{
Targetable: mapper.Targetable{
Field: mapper.Field{
Name: c.Name(),
},
Filter: filter,
func (c *collection) makeSelectLocal(filter client.Option[request.Filter]) (*request.Select, error) {
slct := &request.Select{
Field: request.Field{
Name: c.Name(),
},
Fields: make([]mapper.Requestable, len(c.desc.Schema.Fields)),
DocumentMapping: *mapping,
Filter: filter,
Fields: make([]request.Selection, 0),
}

for _, fd := range c.Schema().Fields {
if fd.IsObject() {
continue
}
index := int(fd.ID)
slct.Fields = append(slct.Fields, &mapper.Field{
Index: index,
Name: fd.Name,
slct.Fields = append(slct.Fields, &request.Field{
Name: fd.Name,
})
}

return slct, nil
}

func (c *collection) createMapping() *core.DocumentMapping {
mapping := core.NewDocumentMapping()
mapping.Add(core.DocKeyFieldIndex, request.DocKeyFieldName)
for _, fd := range c.Schema().Fields {
if fd.IsObject() {
continue
}
index := int(fd.ID)
mapping.Add(index, fd.Name)
mapping.RenderKeys = append(mapping.RenderKeys,
core.RenderKey{
Index: index,
Key: fd.Name,
},
)
}
return mapping
}

// getTypeAndCollectionForPatch parses the Patch op path values
// and compares it against the collection schema.
// If it's within the schema, then patchIsSubType is false
37 changes: 5 additions & 32 deletions db/db.go
@@ -30,8 +30,7 @@ import (
"github.com/sourcenetwork/defradb/events"
"github.com/sourcenetwork/defradb/logging"
"github.com/sourcenetwork/defradb/merkle/crdt"
"github.com/sourcenetwork/defradb/planner"
"github.com/sourcenetwork/defradb/query/graphql/schema"
"github.com/sourcenetwork/defradb/query/graphql"
)

var (
@@ -61,8 +60,7 @@ type db struct {

events client.Events

schema *schema.SchemaManager
Member:

thought: I think it's OK to keep the GQL coupling on the schema level, as I don't see us supporting multiple schema languages. That isn't to say the query language can't have multiple implementations.

Therefore, I think it's OK to keep a reference to the schema (manager) on the DB.

Member:

TLDR: The GQL coupling applies to queries, not schemas (imo)

Contributor (Author):

> Therefore, I think it's OK to keep a reference to the schema (manager) on the DB.

What for? It would be dead code atm (not to mention it contains types from the gql lib).

Collaborator:

I'm in favour of decoupling if it doesn't impede our functionalities.

Member:

> What for? It would be dead code atm (not to mention it contains types from the gql lib)

It would only be dead code if you kept everything about this PR the same. But this PR is two things: it decouples Schema from GQL, and it decouples the Query/Request language from GQL.

The query language should be decoupled as written, but I don't think the schema should be decoupled. That's a far more complex problem - being able to support different schema systems in one system. It should be a single consistent Schema approach, which can then be queried with whatever language(s) we implement.

There's already enough complexity on the roadmap around schemas; I don't want to add this to it, if that makes sense. Perhaps once we get to a good place w.r.t. migration design.

For example/context: Fauna DB supports both FQL (their custom language) and GQL, so they've decoupled their query language, but they only have a single language for schema definition. Dgraph does the same thing: it supports DQL (their custom language) and GQL, but again, only a single schema definition language.

AndrewSisley (Contributor, Author), Oct 27, 2022:

> From what I can tell, your (hoped) future plans would do the first step twice.

It would do a small subset of it (very roughly speaking) twice: FromSDL includes all the query input types (the bulk of the effort), whereas the schema creation code only cares about field names and types (a simplification, but you get my point hopefully).

Member:

> then we are not decoupled from gql

This is my point though: "decoupled" is a somewhat loaded term; it is referring to two diff systems, schema and query. I'd prefer not to decouple the schema, as I feel it's unnecessarily complicated and we don't get anything, as there are no plans to ever not use SDL.

I'm 100% in favor of this PR as it relates to decoupling the query from GQL.

jsimnz (Member), Oct 27, 2022:

> Yes there would be duplication of effort. At the moment the code is conflating two different concepts and is using gql types (query-language dependent) to define collectionDescriptions (query-language agnostic).

Indeed, the code conflates the two, as there was no need/goal to separate them, but internally they are actually separated into two steps, as seen here:

func (g *Generator) buildTypesFromAST(

which is only responsible for taking the SDL and returning structured GQL objects (without any query-type generation like filtering, sorting, aggregating, etc) - just pure developer types-as-text (SDL) -> structured objects.

Additionally, the collection descriptions use only the gql objects defined above, and only to then generate the descriptions, which themselves are technically free from GQL types (e.g. we have our own definitions for scalar types, field definitions, etc., independent from the GQL package):
https://github.com/sourcenetwork/defradb/blob/develop/query/graphql/schema/descriptions.go

Contributor (Author):

> but internally they are actually separated into two steps, as seen here

It is broken into two funcs, sure, but if the first func calls the second it is not separate.

We have two entirely separate systems - collectionDescription generation and query-type generation - which have no reason to be coupled to each other; however they are, and the collectionDescription code has been made dependent on the query-type generation code.

Conceptually there is no reason for this, and it causes some confusion. Technically the current system saves a few lines of code, but it binds the two together - making substitution much harder and increasing the damage done by any given bug (if query-generation fails in some way, collectionDescription-generation will likely also be impacted) - as well as causing conceptual confusion. It also represents a significant complication in maintaining the code, as we now have one code block that must support the requirements of two entirely different systems/requirements/concepts (this, IMO, is the number one cause of crap and hard-to-maintain code in projects).

Contributor (Author):

Had a good chat on Zoom with the team. Consensus largely achieved; there are some ways in which both the current and long-term goals can be more clearly described, but they will be handled in a new issue (same issue as the Schema type convo) to be picked up shortly after this is merged.

queryExecutor *planner.QueryExecutor
parser core.Parser

// The options used to init the database
options any
@@ -92,14 +90,7 @@ func newDB(ctx context.Context, rootstore ds.Batching, options ...Option) (*db,
multistore := datastore.MultiStoreFrom(root)
crdtFactory := crdt.DefaultFactory.WithStores(multistore)

log.Debug(ctx, "Loading: schema manager")
sm, err := schema.NewSchemaManager()
if err != nil {
return nil, err
}

log.Debug(ctx, "Loading: query executor")
exec, err := planner.NewQueryExecutor(sm)
parser, err := graphql.NewParser()
if err != nil {
return nil, err
}
@@ -110,9 +101,8 @@

crdtFactory: &crdtFactory,

schema: sm,
queryExecutor: exec,
options: options,
parser: parser,
options: options,
}

// apply options
@@ -190,23 +180,6 @@ func (db *db) PrintDump(ctx context.Context) error {
return printStore(ctx, db.multistore.Rootstore())
}

func (db *db) Executor() *planner.QueryExecutor {
return db.queryExecutor
}

func (db *db) GetRelationshipIdField(fieldName, targetType, thisType string) (string, error) {
rm := db.schema.Relations
rel := rm.GetRelationByDescription(fieldName, targetType, thisType)
if rel == nil {
return "", errors.New("relation does not exists")
}
subtypefieldname, _, ok := rel.GetFieldFromSchemaType(targetType)
if !ok {
return "", errors.New("relation is missing referenced field")
}
return subtypefieldname, nil
}

// Close is called when we are shutting down the database.
// This is the place for any last minute cleanup or releasing of resources (i.e.: Badger instance).
func (db *db) Close(ctx context.Context) {
52 changes: 29 additions & 23 deletions db/query.go
@@ -14,10 +14,9 @@ import (
"context"
"strings"

gql "github.com/graphql-go/graphql"

"github.com/sourcenetwork/defradb/client"
"github.com/sourcenetwork/defradb/datastore"
"github.com/sourcenetwork/defradb/planner"
)

func (db *db) ExecQuery(ctx context.Context, query string) *client.QueryResult {
@@ -34,7 +33,18 @@ func (db *db) ExecQuery(ctx context.Context, query string) *client.QueryResult {
}
defer txn.Discard(ctx)

results, err := db.queryExecutor.ExecQuery(ctx, db, txn, query)
request, errors := db.parser.Parse(query)
if len(errors) > 0 {
errorStrings := make([]any, len(errors))
for i, err := range errors {
errorStrings[i] = err.Error()
}
res.Errors = errorStrings
return res
}

planner := planner.New(ctx, db, txn)
results, err := planner.RunRequest(ctx, request)
if err != nil {
res.Errors = []any{err.Error()}
return res
@@ -54,13 +64,24 @@ func (db *db) ExecTransactionalQuery(
query string,
txn datastore.Txn,
) *client.QueryResult {
res := &client.QueryResult{}
// check if its Introspection query
if strings.Contains(query, "IntrospectionQuery") {
if db.parser.IsIntrospection(query) {
return db.ExecIntrospection(query)
}

results, err := db.queryExecutor.ExecQuery(ctx, db, txn, query)
res := &client.QueryResult{}

request, errors := db.parser.Parse(query)
if len(errors) > 0 {
errorStrings := make([]any, len(errors))
for i, err := range errors {
errorStrings[i] = err.Error()
}
res.Errors = errorStrings
return res
}

planner := planner.New(ctx, db, txn)
results, err := planner.RunRequest(ctx, request)
if err != nil {
res.Errors = []any{err.Error()}
return res
@@ -71,20 +92,5 @@
}

func (db *db) ExecIntrospection(query string) *client.QueryResult {
schema := db.schema.Schema()
// t := schema.Type("userFilterArg")
// spew.Dump(t.(*gql.InputObject).Fields())
params := gql.Params{Schema: *schema, RequestString: query}
r := gql.Do(params)

res := &client.QueryResult{
Data: r.Data,
Errors: make([]any, len(r.Errors)),
}

for i, err := range r.Errors {
res.Errors[i] = err
}

return res
return db.parser.ExecuteIntrospection(query)
}
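ExecQuery and ExecTransactionalQuery now repeat an identical loop converting parse errors into the []any shape of client.QueryResult.Errors; a follow-up could factor it into a small helper along these lines (hypothetical sketch, not in this PR):

// errorsToAny flattens parser errors into the []any shape used by
// client.QueryResult.Errors.
func errorsToAny(errs []error) []any {
    out := make([]any, len(errs))
    for i, err := range errs {
        out[i] = err.Error()
    }
    return out
}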