Missing edge categories? #88

Open
1 of 3 tasks
caufieldjh opened this issue Jan 30, 2023 · 6 comments
Comments

@caufieldjh
Contributor

caufieldjh commented Jan 30, 2023

@kevinschaper reports that kg-phenio may be missing edge categories.

  • Check if edge categories are present in the KGX output.
  • If not, investigate the transform and the PHENIO build.
  • If so, ensure they are consistent with expectations (e.g., in the expected column).
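
A minimal sketch of the first check, assuming the KGX edge file is tab-separated and has a `category` column. The function name `count_blank_categories` and the sample rows are made up for illustration; they are not real KG-Phenio data.

```python
import csv
import io


def count_blank_categories(tsv_text):
    """Return (blank, total) edge counts; raise if there is no category column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "category" not in (reader.fieldnames or []):
        raise ValueError("edge file has no 'category' column")
    blank = total = 0
    for row in reader:
        total += 1
        # Treat empty strings and whitespace-only values as blank.
        if not (row["category"] or "").strip():
            blank += 1
    return blank, total


# Hypothetical sample rows (assumption, for illustration only).
SAMPLE = (
    "id\tsubject\tpredicate\tobject\tcategory\n"
    "e1\tGO:1\tbiolink:subclass_of\tGO:2\t\n"
    "e2\tHP:1\tbiolink:related_to\tUBERON:1\tbiolink:Association\n"
)

blank, total = count_blank_categories(SAMPLE)
print(f"{blank} of {total} edges have a blank category")
```

For a real file, the same function could be fed the text of the edges TSV produced by the build.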
@caufieldjh
Contributor Author

KG-Phenio has edge properties; they're just quite minimal. This is the header:

id	subject	predicate	object	category	relation	knowledge_source

Not much going on there!

For comparison, here's the Monarch graph header:

id	original_subject	predicate	original_object	category	aggregator_knowledge_source	primary_knowledge_source	publications	qualifiers	provided_by	has_evidence	stage_qualifier	relation	knowledge_source	negated	frequency_qualifier	onset_qualifier	sex_qualifier	evidence	subject	object

Not all properties will be necessary for PHENIO, but the knowledge sources can certainly be expanded.
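
To make the gap concrete, here is a small sketch that lists the Monarch columns absent from the KG-Phenio header. Both column lists are copied from the two headers above; the variable names are just for illustration.

```python
# Column names copied from the two edge-file headers quoted above.
PHENIO_HEADER = "id subject predicate object category relation knowledge_source".split()
MONARCH_HEADER = (
    "id original_subject predicate original_object category "
    "aggregator_knowledge_source primary_knowledge_source publications "
    "qualifiers provided_by has_evidence stage_qualifier relation "
    "knowledge_source negated frequency_qualifier onset_qualifier "
    "sex_qualifier evidence subject object"
).split()

# Keep Monarch's column order so the result is easy to compare by eye.
missing = [col for col in MONARCH_HEADER if col not in PHENIO_HEADER]
for col in missing:
    print(col)
```

Which of these are actually worth adding to PHENIO is a separate decision; the list only shows what differs.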

@caufieldjh
Contributor Author

The merged Upheno mapping table also needs a knowledge_source added, but that can just be added at transform time as a KGX argument.

@caufieldjh
Contributor Author

Also want:

caufieldjh added a commit that referenced this issue Feb 9, 2023
- Address some of #88 
- Add primary_knowledge_source and aggregate_knowledge_source to edges as appropriate
- Add knowledge source to provided_by for nodes
@kevinschaper
Collaborator

I was just poking at some KGX validation output and noticed complaints about ZFA being an invalid prefix in ZP->ZFA associations, which might have something to do with the blank categories. Here's a summary of edges with blank categories:

| category | subject_namespace | predicate | object_namespace | primary_knowledge_source | count(*) |
| --- | --- | --- | --- | --- | --- |
| | GO | biolink:subclass_of | GO | infores:go | 83592 |
| | ZP | biolink:subclass_of | ZP | infores:zp | 60886 |
| | FBbt | biolink:related_to | FBbt | infores:fbbt | 52322 |
| | MONDO | biolink:subclass_of | MONDO | infores:mondo | 38795 |
| | XPO | biolink:subclass_of | XPO | infores:xpo | 35717 |
| | FBbt | biolink:subclass_of | FBbt | infores:fbbt | 31665 |
| | UBERON | biolink:subclass_of | UBERON | infores:uberon | 23079 |
| | HP | biolink:subclass_of | HP | infores:hp | 22532 |
| | GO | biolink:related_to | GO | infores:go | 20260 |
| | UBERON | biolink:related_to | UBERON | infores:uberon | 19247 |
| | MP | biolink:subclass_of | MP | infores:mp | 18334 |
| | WBbt | biolink:subclass_of | WBbt | infores:wbbt | 8168 |
| | EMAPA | biolink:related_to | EMAPA | infores:emapa | 7037 |
| | WBbt | biolink:related_to | WBbt | infores:wbbt | 6924 |
| | CHEBI | biolink:subclass_of | CHEBI | infores:chebi | 6559 |
| | ZP | biolink:related_to | ZFA | infores:upheno | 5840 |
| | ZP | biolink:related_to | GO | infores:upheno | 5822 |
| | CHEBI | biolink:related_to | CHEBI | infores:chebi | 5043 |
| | EMAPA | biolink:subclass_of | EMAPA | infores:emapa | 4545 |
| | EMAPA | biolink:subclass_of | UBERON | infores:emapa | 4477 |
| | WBPhenotype | biolink:subclass_of | WBPhenotype | infores:wbphenotype | 3364 |
| | ZFA | biolink:subclass_of | ZFA | infores:zfa | 3199 |
| | MONDO | biolink:related_to | UBERON | infores:mondo | 3027 |
| | WBbt | biolink:subclass_of | GO | infores:wbbt | 2766 |
| | ZFA | biolink:related_to | ZFA | infores:zfa | 2752 |
| | MP | biolink:related_to | UBERON | infores:upheno | 2577 |
| | MONDO | biolink:related_to | MONDO | infores:mondo | 2458 |
| | GO | biolink:related_to | CHEBI | infores:go | 2090 |
| | ZFA | biolink:subclass_of | UBERON | infores:zfa | 2071 |
| | HP | biolink:related_to | UBERON | infores:upheno | 1588 |
| | MONDO | biolink:related_to | HP | infores:mondo | 1450 |
| | GO | biolink:related_to | UBERON | infores:go | 1124 |
| | MPATH | biolink:subclass_of | MPATH | infores:mpath | 946 |
| | MP | biolink:related_to | GO | infores:upheno | 870 |
| | FBbt | biolink:related_to | GO | infores:fbbt | 571 |
| | ZP | biolink:related_to | CHEBI | infores:upheno | 455 |
| | MONDO | biolink:related_to | GO | infores:mondo | 432 |
| | UBERON | biolink:related_to | GO | infores:uberon | 423 |
| | FBbt | biolink:subclass_of | UBERON | infores:fbbt | 369 |
| | HP | biolink:related_to | CHEBI | infores:upheno | 359 |
| | WBPhenotype | biolink:related_to | GO | infores:upheno | 325 |
| | HP | biolink:related_to | GO | infores:upheno | 279 |
| | WBPhenotype | biolink:related_to | WBbt | infores:upheno | 264 |
| | MP | biolink:related_to | CHEBI | infores:upheno | 191 |
| | XPO | biolink:related_to | GO | infores:upheno | 141 |
| | ZP | biolink:related_to | MPATH | infores:upheno | 137 |
| | MP | biolink:related_to | MPATH | infores:upheno | 134 |
| | HP | biolink:related_to | MPATH | infores:upheno | 71 |
| | MONDO | biolink:related_to | CHEBI | infores:mondo | 49 |
| | WBbt | biolink:subclass_of | UBERON | infores:wbbt | 49 |
| | MP | biolink:related_to | MP | infores:upheno | 32 |
| | UBERON | biolink:related_to | CHEBI | infores:uberon | 32 |
| | WBPhenotype | biolink:related_to | CHEBI | infores:upheno | 21 |
| | MPATH | biolink:related_to | MPATH | infores:mpath | 3 |
| | HP | biolink:related_to | HP | infores:upheno | 2 |
| | UBERON | biolink:subclass_of | GO | infores:uberon | 2 |
| | WBPhenotype | biolink:related_to | UBERON | infores:upheno | 1 |
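
A summary like this one could be produced roughly as follows. This is a sketch, not the query that was actually run: it assumes edges are available as dicts with `subject`, `predicate`, `object`, `category`, and `primary_knowledge_source` keys, and that the namespace is simply the CURIE prefix. The function names and the sample edges are hypothetical.

```python
from collections import Counter


def namespace(curie):
    """Return the prefix of a CURIE, e.g. 'ZP' for 'ZP:0000001'."""
    return curie.split(":", 1)[0]


def summarize_blank_category_edges(edges):
    """Count edges with a blank category, grouped by namespaces, predicate, and source."""
    counts = Counter()
    for edge in edges:
        if (edge.get("category") or "").strip():
            continue  # category is set, so this edge is not part of the summary
        key = (
            namespace(edge["subject"]),
            edge["predicate"],
            namespace(edge["object"]),
            edge.get("primary_knowledge_source", ""),
        )
        counts[key] += 1
    # Largest groups first, like the ordering of the summary.
    return counts.most_common()


# Hypothetical sample edges (assumption, for illustration only).
SAMPLE_EDGES = [
    {"subject": "ZP:0000001", "predicate": "biolink:related_to", "object": "ZFA:0000001",
     "category": "", "primary_knowledge_source": "infores:upheno"},
    {"subject": "ZP:0000002", "predicate": "biolink:related_to", "object": "ZFA:0000002",
     "category": "", "primary_knowledge_source": "infores:upheno"},
    {"subject": "GO:0000001", "predicate": "biolink:subclass_of", "object": "GO:0000002",
     "category": "", "primary_knowledge_source": "infores:go"},
    {"subject": "HP:0000001", "predicate": "biolink:related_to", "object": "UBERON:0000001",
     "category": "biolink:Association", "primary_knowledge_source": "infores:upheno"},
]

for key, count in summarize_blank_category_edges(SAMPLE_EDGES):
    print(*key, count)
```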

@cmungall

cmungall commented Mar 16, 2023

I am not sure we have an implemented strategy for populating edge categories when going from OWL to KGX.

This could be done in KGX by inference, e.g., from a class definition like this one:

  gene to phenotypic feature association:
    is_a: association
    exact_mappings:
      - WBVocab:Gene-Phenotype-Association
    defining_slots:
      - subject
      - object
    mixins:
      - entity to phenotypic feature association mixin
      - gene to entity association mixin
    slot_usage:
      subject:
        range: gene or gene product
        description: "gene in which variation is correlated with the phenotypic feature"
        examples:
          - value: HGNC:2197
            description: "COL1A1 (Human)"
      object:
        range: phenotypic feature

However, I would do this with linkml:classification_rules now.

This will probably not be straightforward to add to KGX. @kevinschaper, how much does our validation strategy depend on this being present?
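
For illustration only, the inference idea can be pictured as a lookup from the subject and object node categories to an association class. This is a rough sketch, not an existing KGX feature and not linkml classification_rules. The rule table, the function name, and the node category assignments are assumptions, and matching on exact categories ignores the class hierarchy and mixins (e.g., the `gene or gene product` range above).

```python
# Hypothetical rule table: (subject category, object category) -> edge category.
# The single rule loosely mirrors the defining slots of the class shown above.
RULES = {
    ("biolink:Gene", "biolink:PhenotypicFeature"):
        "biolink:GeneToPhenotypicFeatureAssociation",
}
DEFAULT_CATEGORY = "biolink:Association"


def infer_edge_category(edge, node_categories):
    """Pick an edge category from the categories of the edge's subject and object."""
    key = (
        node_categories.get(edge["subject"]),
        node_categories.get(edge["object"]),
    )
    # Fall back to the generic association when no rule matches.
    return RULES.get(key, DEFAULT_CATEGORY)


# Hypothetical node categories (assumption, for illustration only).
NODE_CATEGORIES = {
    "HGNC:2197": "biolink:Gene",
    "HP:0000001": "biolink:PhenotypicFeature",
    "GO:0000001": "biolink:BiologicalProcess",
}

gene_edge = {"subject": "HGNC:2197", "object": "HP:0000001"}
other_edge = {"subject": "GO:0000001", "object": "HP:0000001"}
print(infer_edge_category(gene_edge, NODE_CATEGORIES))
print(infer_edge_category(other_edge, NODE_CATEGORIES))
```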

@kevinschaper
Collaborator

It looks like we're hitting some KGX validation issues within Translator infrastructure that might be coming from blank category fields on edges. Would it work to just fill in biolink:Association rather than nulls?

(I think I might do that in my phenio kgx massaging, and of course it won't do anything once they're set)
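
Roughly what that fill-in could look like, as a sketch that assumes edges are dicts with an optional `category` key. The function name and the sample edges are hypothetical. Categories that are already set are left untouched, so running it again changes nothing.

```python
def fill_blank_categories(edges, default="biolink:Association"):
    """Set a default category on edges whose category is missing or blank."""
    for edge in edges:
        if not (edge.get("category") or "").strip():
            edge["category"] = default
    return edges


# Hypothetical sample edges (assumption, for illustration only).
SAMPLE_EDGES = [
    {"id": "e1", "category": ""},
    {"id": "e2", "category": None},
    {"id": "e3"},
    {"id": "e4", "category": "biolink:GeneToPhenotypicFeatureAssociation"},
]

filled = fill_blank_categories(SAMPLE_EDGES)
for edge in filled:
    print(edge["id"], edge["category"])
```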


3 participants