Advanced export functionalities #74

Merged on Jan 20, 2025 (18 commits)

Member

@MBueschelberger MBueschelberger commented Dec 19, 2024

Changes from this PR:

Previously, the parsed metadata from the source file was exposed as a flat dictionary, which is used, for example, by the DSMS Python SDK to push entries into the custom properties field.

For example:

assert pipeline.plain_metadata == {
    "SampleIdentifier-2": "Probentyp_2",
    "OriginalGaugeLength": 80,
}

This plain_metadata was previously dynamically flattened from the pipeline.general_metadata:

assert pipeline.general_metadata == [
    PropertyGraph(
        iri="https://w3id.org/steel/ProcessOntology/SampleIdentifier-2",
        suffix="SampleIdentifier-2",
        key="Probenkennung 2",
        value="Probentyp_2",
        annotation=None,
        value_relation="rdfs:label",
    ),
    QuantityGraph(
        iri="https://w3id.org/steel/ProcessOntology/OriginalGaugeLength",
        suffix="OriginalGaugeLength",
        key="Messlänge Standardweg",
        unit="http://qudt.org/vocab/unit/MilliM",
        value=80,
        unit_relation="qudt:hasUnit",
        value_relation="qudt:value",
    ),
]
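
The flattening described above can be sketched roughly as follows. This is only an illustration: PropertySketch, QuantitySketch and flatten_metadata are hypothetical stand-ins invented here, not the data2rdf classes or the actual implementation, and they assume that each entry contributes its suffix as the key and its bare value as the value.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PropertySketch:
    """Hypothetical stand-in for a PropertyGraph entry."""

    suffix: str
    key: str
    value: Any


@dataclass
class QuantitySketch:
    """Hypothetical stand-in for a QuantityGraph entry."""

    suffix: str
    key: str
    value: Any
    unit: Optional[str] = None


def flatten_metadata(entries: List[Any]) -> Dict[str, Any]:
    """Map each entry's suffix to its bare value; units and relations are dropped."""
    return {entry.suffix: entry.value for entry in entries}


general_metadata = [
    PropertySketch(
        suffix="SampleIdentifier-2",
        key="Probenkennung 2",
        value="Probentyp_2",
    ),
    QuantitySketch(
        suffix="OriginalGaugeLength",
        key="Messlänge Standardweg",
        value=80,
        unit="http://qudt.org/vocab/unit/MilliM",
    ),
]

plain_metadata = flatten_metadata(general_metadata)
```

In this sketch the unit of OriginalGaugeLength is lost in the flat form, which is the kind of information the richer to_dict output keeps.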

With this release, plain_metadata is deprecated and replaced by to_dict, which accepts an optional callable as a keyword argument to transform the output into a desired shape:

assert pipeline.to_dict() == {
    "SampleIdentifier-2": {
        "label": "SampleIdentifier-2",
        "value": "Probentyp_2",
    },
    "OriginalGaugeLength": {
        "label": "OriginalGaugeLength",
        "measurement_unit": {
            "iri": "http://qudt.org/vocab/unit/MilliM",
            "label": "Millimetre",
            "namespace": "http://qudt.org/vocab/unit",
            "symbol": "mm",
        },
        "value": 80,
    },
}
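
One way an export method with an optional callable and a deprecated property could fit together is sketched below. PipelineSketch is a hypothetical class written for this illustration, not the data2rdf pipeline, and it assumes that the callable receives the list of entries; the real signature may differ.

```python
import warnings
from typing import Any, Callable, Dict, List, Optional

Entry = Dict[str, Any]


class PipelineSketch:
    """Hypothetical stand-in for the pipeline; not the data2rdf class."""

    def __init__(self, entries: List[Entry]) -> None:
        # Each entry is a plain dict with at least "label" and "value".
        self._entries = entries

    def to_dict(self, schema: Optional[Callable[[List[Entry]], Any]] = None) -> Any:
        """Return metadata keyed by label, or whatever `schema` builds from the entries."""
        if schema is not None:
            # Pass copies so that the callable cannot mutate internal state.
            return schema([dict(entry) for entry in self._entries])
        return {entry["label"]: dict(entry) for entry in self._entries}

    @property
    def plain_metadata(self) -> Dict[str, Any]:
        """Deprecated flat view, kept so that existing callers keep working."""
        warnings.warn(
            "plain_metadata is deprecated; use to_dict() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return {entry["label"]: entry["value"] for entry in self._entries}


pipeline = PipelineSketch(
    [
        {"label": "SampleIdentifier-2", "value": "Probentyp_2"},
        {"label": "OriginalGaugeLength", "value": 80},
    ]
)

default_shape = pipeline.to_dict()
labels_only = pipeline.to_dict(schema=lambda entries: [e["label"] for e in entries])
```

Keeping the old property as a thin wrapper that emits a DeprecationWarning lets downstream code migrate to the new method gradually.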
 

With the schema keyword argument:

from dsms.knowledge.utils import sectionize_metadata

pipeline.to_dict(schema=sectionize_metadata) == {
    "sections": [
        {
            "entries": [
                {
                    "id": "id1733827725155y6c07h",
                    "label": "SampleIdentifier-2",
                    "value": "Probentyp_2",
                },
                {
                    "id": "id1733827725155ob0izy",
                    "label": "OriginalGaugeLength",
                    "measurement_unit": {
                        "iri": "http://qudt.org/vocab/unit/MilliM",
                        "label": "Millimetre",
                        "namespace": "http://qudt.org/vocab/unit",
                        "symbol": "mm",
                    },
                    "value": 80,
                },
            ],
            "name": "General",
            "id": "id1733827725155yhxu2o",
        },
    ],
}
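
A schema callable of this kind could look roughly like the sketch below. It is not the SDK's sectionize_metadata: sectionize_sketch and make_id are hypothetical names, the id format is an assumption (the ids shown above follow a different pattern), and the callable is assumed to receive a list of entry dictionaries.

```python
import uuid
from typing import Any, Dict, List


def make_id() -> str:
    """Hypothetical id helper; the ids in the PR output use a different format."""
    return "id" + uuid.uuid4().hex


def sectionize_sketch(
    entries: List[Dict[str, Any]], name: str = "General"
) -> Dict[str, Any]:
    """Wrap all entries in one named section and give each item its own id."""
    return {
        "sections": [
            {
                "entries": [{"id": make_id(), **entry} for entry in entries],
                "name": name,
                "id": make_id(),
            }
        ]
    }


result = sectionize_sketch(
    [
        {"label": "SampleIdentifier-2", "value": "Probentyp_2"},
        {"label": "OriginalGaugeLength", "value": 80},
    ]
)
```

Because this sketch generates fresh ids on every call, its output cannot be compared against a fixed literal; a check on labels, values and section names is more robust.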

@MBueschelberger MBueschelberger marked this pull request as draft December 19, 2024 09:36
Contributor

github-actions bot commented Dec 19, 2024

Coverage Report

File                  Stmts   Miss   Cover
data2rdf
   __init__.py            5      0    100%
   config.py             27      1     96%
   utils.py              12      2     83%
   warnings.py            3      0    100%
data2rdf/models
   __init__.py            3      0    100%
   base.py               53      5     91%
   graph.py             144      8     94%
   mapping.py            47      1     98%
   utils.py              58     14     76%
data2rdf/modes
   __init__.py            4      0    100%
data2rdf/parsers
   __init__.py            6      0    100%
   base.py              153     17     89%
   csv.py               168     20     88%
   excel.py             183     17     91%
   json.py              229     39     83%
   utils.py              90     10     89%
data2rdf/pipelines
   __init__.py            2      0    100%
   main.py               95     16     83%
data2rdf/qudt
   __init__.py            0      0    100%
   utils.py              55     16     71%
TOTAL                  1337    166     88%

Tests Skipped Failures Errors Time
121 0 💤 0 ❌ 0 🔥 3m 28s ⏱️

@MBueschelberger MBueschelberger self-assigned this Dec 19, 2024
@MBueschelberger MBueschelberger added the 📈 enhancement New feature or request label Dec 19, 2024
@MBueschelberger MBueschelberger changed the title Refactor/custom properties Advanced export functionalities Dec 19, 2024
@MBueschelberger MBueschelberger marked this pull request as ready for review December 19, 2024 16:49
Member

@yoavnash yoavnash left a comment

This is a comment for the SDK, but I’d like to mention it here as well: I think a better name for sectionize_metadata would be CustomPropertiesSchema (I imagine camel case fits better here than snake case).

@MBueschelberger
Member Author

why camel case? It is a method and hence the convention is snake case.

@yoavnash
Member

I'm not sure why I didn't use snake case, but the point still holds.

@MBueschelberger
Member Author

Sure, the name makes sense and I would rename it in the SDK. I was just wondering about the casing.

@MBueschelberger
Member Author

Maybe make_custom_properties_schema would be even better, because it's a function?

@MBueschelberger MBueschelberger merged commit 289f69b into main Jan 20, 2025
10 checks passed
@MBueschelberger MBueschelberger deleted the refactor/custom_properties branch January 20, 2025 09:58
@MBueschelberger MBueschelberger restored the refactor/custom_properties branch January 20, 2025 10:01
2 participants