preprocess_bsdd_v2 #66

Ghesselink · 2023-03-03T01:25:31Z

Continuation of #55

Output :

application/main.py

aothms · 2023-03-03T10:35:58Z

application/main.py

+        if model.status_bsdd != 'n':
+            preprocessed_bsdd_data = {
+                'bSDD classification found': {
+                    'name': [r['classification_name'] for r in bsdd_results][0],


Why do we only take [0] here? There can definitely be multiple classifications in a file. But this might be something that more fundamentally also affects the report.

Changed in Ghesselink@ad6cc14
Refers to the slack thread https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909

Done, see https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909
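
A minimal sketch of keeping every distinct classification name instead of only the first one. It assumes `bsdd_results` is a list of dictionaries with a `classification_name` key, as in the diff above; the helper name `distinct_classification_names` is hypothetical and not part of the PR.

```python
def distinct_classification_names(bsdd_results):
    """Return each classification name once, in order of first appearance."""
    names = (r.get('classification_name') for r in bsdd_results)
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(name for name in names if name))


results = [
    {'classification_name': 'Wall'},
    {'classification_name': 'Door'},
    {'classification_name': 'Wall'},
]
print(distinct_classification_names(results))
```

Whether the report should then show all names or group results per classification is a separate design question.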

aothms · 2023-03-03T10:36:26Z

application/main.py

+                'bSDD classification found': {
+                    'name': [r['classification_name'] for r in bsdd_results][0],
+                    'Release data': 'n.a.',
+                    'Organisation': 'BuildingSMART',


I wouldn't hardcode something like that; if we don't know, just leave it out.

I think we should have default values for every field of the object in case something goes wrong or a piece of information is not retrieved.

Done Ghesselink@b161001

I think we should have default values for every field of the object in case something goes wrong or a piece of information is not retrieved.

This is already the case:

The default value of validity is 'invalid'.

The default of 'domain_source' is 'classification_not_found': https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L61

All the counts default to 0.

'classification_name' defaults to 'name not found': https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L86

Or do you think the default values should be different?
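
For illustration, one way to apply such defaults in a single place might look like the sketch below. The default values are the ones quoted in this thread; the count keys `valid` and `invalid` and the helper `with_defaults` are hypothetical names, not taken from bsdd_utils.py.

```python
# Defaults quoted in the thread; the two count keys are assumed names.
DEFAULTS = {
    'validity': 'invalid',
    'domain_source': 'classification_not_found',
    'classification_name': 'name not found',
    'valid': 0,
    'invalid': 0,
}


def with_defaults(row):
    """Fill in any field that is missing or None with its default."""
    filled = dict(DEFAULTS)
    filled.update({key: value for key, value in row.items() if value is not None})
    return filled


print(with_defaults({'classification_name': None, 'valid': 3}))
```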

aothms · 2023-03-03T10:39:49Z

application/bsdd_utils.py

+    inst = get_inst(session, bsdd_result['instance_id'])
+    observed_type = inst.ifc_type
+    required_type = bsdd_result['bsdd_type_constraint']
+    validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid'


Keep in mind that there can be many many classifications in a model. We need to test with large models and see the performance implications.

While this is a clever way to express this, it's not really scalable and also not fully correct.

We can't assume IFC4X3: entities that are valid in IFC2X3 might have been deleted since.

Creating a full model just to test this is not very performance friendly. Instead, use ifcopenshell.ifcopenshell_wrapper.schema_by_name(...).declaration_by_name(observed_type).supertype().name() and keep looping over supertype() until it returns None.

Fixed in Ghesselink@9236a20

Done, see https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909

It was also taken out again later, following Johan's suggestion below.

aothms · 2023-03-03T10:40:05Z

application/bsdd_utils.py

@@ -42,4 +50,45 @@ def get_inst(instance_id):

    return hierarchical_bsdd_results     

+def bsdd_data_processing(bsdd_task, bsdd_results, session):


This is not a very informative name. What does this function return?

The function returns data for the 'bsdd data' table in a file metrics report. More specifically, it returns a list of dictionaries containing the classifications (IFC instances, in this case) and their valid/invalid counts.
I've renamed the function and left a comment.

aothms · 2023-03-03T10:43:40Z

application/bsdd_utils.py

+            domain_sources.append(bsdd_uri)
+        else:
+            parse = urlparse(bsdd_uri)
+            parsed_domain_file = ''.join(char for char in result[domain_file] if char.isalnum()).lower()


I don't really understand what's going on here, but has this approach of constructing the URL really been tested on a series of different classifications?

It's changed in https://github.com/Ghesselink/ifc-pipeline-validation/blob/5f5d70eab82e6668745c997f25fae974f58865c5/application/bsdd_utils.py#L108

More in line with Johan's earlier work
https://github.com/Ghesselink/ifc-pipeline-validation/blob/5f5d70eab82e6668745c997f25fae974f58865c5/application/checks/check_bsdd_v2.py#L16

aothms · 2023-03-03T10:44:03Z

application/bsdd_utils.py

+            url = parse.scheme + '/' + parse.netloc + '/' + 'uri' + '/' + domain_part + '/'
+            domain_sources.append(url)
+    sources = list(filter(lambda x: x != default, domain_sources))
+    return sources[0] if sources else default


Again, we're taking [0] here and potentially discarding a lot of other domains.

The classification name is now handled with an approach similar to the one used for the domain URL:
Ghesselink@f4fdee4

aothms · 2023-03-03T10:45:04Z

application/bsdd_utils.py


+def bsdd_report_quantity(bsdd_task, item):
+    return sum(bool(bsdd_result.serialize().get(item)) for bsdd_result in bsdd_task.results)


Calling serialize() is potentially expensive, better restructure the code so that we only do it once.

Done, and also in another function with the same issue: Ghesselink@5d3759a
serialize() is now only called here:
https://github.com/Ghesselink/ifc-pipeline-validation/blob/5d3759a4493752129a9e7af670333b1d5296c3d9/application/main.py#L807
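
A rough sketch of the "serialize once" idea, assuming only that each result object exposes a `serialize()` method returning a dictionary, as in the diff above. `count_truthy_fields` and `FakeResult` are hypothetical names used for illustration, not the code in the linked commit.

```python
def count_truthy_fields(bsdd_results, items):
    """Count, per item, how many results carry a truthy value for it."""
    # Serialize every result exactly once and reuse the dictionaries.
    serialized = [result.serialize() for result in bsdd_results]
    return {item: sum(bool(row.get(item)) for row in serialized) for item in items}


class FakeResult:
    """Stand-in for a database row; only serialize() is assumed."""

    def __init__(self, **fields):
        self.fields = fields
        self.calls = 0

    def serialize(self):
        self.calls += 1
        return dict(self.fields)


rows = [FakeResult(val_ifc_type=1, val_property_set=0), FakeResult(val_ifc_type=1)]
print(count_truthy_fields(rows, ['val_ifc_type', 'val_property_set']))
```

Counting several items in one call keeps the number of `serialize()` calls equal to the number of results, regardless of how many items are reported.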


johltn

Great work: I like how your code is structured. I made a couple of comments regarding how to get the validity information from a bsdd_result object. Plus minor comments about formatting.

johltn · 2023-03-03T17:39:12Z

application/bsdd_utils.py

+def get_inst(session, instance_id):
+    return session.query(database.ifc_instance).filter(database.ifc_instance.id == instance_id).all()[0]
+
+def get_domain(bsdd_results):


To get the domain from the classification URI you can use the same function as what is used in https://github.com/AECgeeks/ifc-pipeline-validation/blob/development/application/checks/check_bsdd_v2.py#L14

That works, and I've implemented those changes in Ghesselink@58a98f1

I understand that it's preferable to avoid duplicating the code. However, the current implementation (again invoking the get_domain function) makes a request for every classification, which significantly increases the loading time for a BSDD report. This may lead to frustration among users.

Wouldn't it be better to add this to the database at check_bsdd_v2.py and retrieve at the flask route? This avoids both duplicating code as well as long loading times.

But it's removed from the implementation since it's not needed.
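
As an aside, the cost of one request per classification could also be reduced in memory by looking up each distinct URI only once. This is a different technique from the database approach proposed above, sketched under the assumption that the lookup can be passed in as a callable; `domains_by_uri`, `fetch_domain` and the example URIs are hypothetical.

```python
def domains_by_uri(classification_uris, fetch_domain):
    """Look up each distinct URI once and return a uri -> domain mapping."""
    cache = {}
    for uri in classification_uris:
        if uri not in cache:
            # fetch_domain stands in for the function that performs the request.
            cache[uri] = fetch_domain(uri)
    return cache


requested = []


def fake_fetch(uri):
    """Record the call and return a placeholder domain."""
    requested.append(uri)
    return 'domain-for-' + uri


uris = ['https://example.org/a', 'https://example.org/b', 'https://example.org/a']
print(domains_by_uri(uris, fake_fetch))
print(len(requested))
```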

johltn · 2023-03-03T17:54:53Z

application/bsdd_utils.py

+    inst = get_inst(session, bsdd_result['instance_id'])
+    observed_type = inst.ifc_type
+    required_type = bsdd_result['bsdd_type_constraint']
+    validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid'


There is no need to recompute the validity of an instance with regard to a constraint on its type; this information is already present in a bsdd_result object. You can see a bsdd_result object as one constraint that an entity has to respect. Such a constraint is part of the requirements imposed by a classification, which itself belongs to a domain.

I see, it's duplicating the work already done. I've changed it in Ghesselink@19deaf9

johltn · 2023-03-03T18:03:45Z

application/main.py

+                'bSDD classification found': {
+                    'name': [r['classification_name'] for r in bsdd_results][0],
+                    'Release data': 'n.a.',
+                    'Organisation': 'BuildingSMART',


I think we should have default values for every field of the object in case something goes wrong or a piece of information is not retrieved.

johltn · 2023-03-06T01:07:55Z

application/utils.py

@@ -78,3 +78,12 @@ def send_message(msg_content, user_email, html=None):
              "html":html,
              "subject": "Validation Service update",
              "text": msg_content})
+
+def do_try(fn, default=None):


I think you can remove this change now

I cannot comment on your line above.

I think we should have default values for every field of the object in case something goes wrong or a piece of information is not retrieved.

This is already the case:

The default value of validity is 'invalid'.

The default of 'domain_source' is 'classification_not_found': https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L61

All the counts default to 0.

'classification_name' defaults to 'name not found': https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L86

Or do you think the default values should be different?

johltn · 2023-03-06T01:27:36Z

application/bsdd_utils.py

+    observed_type = inst.ifc_type
+    required_type = bsdd_result['bsdd_type_constraint']
+    domain_source = bsdd_result['bsdd_classification_uri']
+    validity = "valid" if required_type in instance_supertypes(observed_type, schema) else 'invalid'


We don't determine the validity of a bsdd_result with respect to the type alone.

For a bsdd_result object to be valid, all of the following attributes of the object need to be set to 1: val_ifc_type, val_property_set, val_property_name, val_property_type, val_property_value

Ghesselink@19deaf9
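
The rule above could be expressed roughly as follows. The attribute names come from the comment; `is_valid` and the `SimpleNamespace` stand-ins are assumptions for illustration and are not the code in the referenced commit.

```python
from types import SimpleNamespace

VALIDITY_FIELDS = (
    'val_ifc_type',
    'val_property_set',
    'val_property_name',
    'val_property_type',
    'val_property_value',
)


def is_valid(bsdd_result):
    """A result is valid only when every val_* attribute equals 1."""
    return all(getattr(bsdd_result, field, None) == 1 for field in VALIDITY_FIELDS)


all_ok = SimpleNamespace(**{field: 1 for field in VALIDITY_FIELDS})
wrong_type = SimpleNamespace(**{**{field: 1 for field in VALIDITY_FIELDS}, 'val_ifc_type': 0})
print(is_valid(all_ok), is_valid(wrong_type))
```

A missing attribute is treated as invalid here, which matches the 'invalid' default mentioned earlier in the review.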

johltn · 2023-03-06T01:30:27Z

application/bsdd_utils.py

+
+    while True:
+        try:
+            result = (lambda x: ifcopenshell.ifcopenshell_wrapper.schema_by_name(schema).declaration_by_name(x).supertype().name())(allowed_types[-1])


Not sure I understand what this line does.
Also, there is no need to recompute the validity with respect to the type (or any other criterion); it's available as bsdd_result.val_ifc_type.

Done Ghesselink@19deaf9

Maybe a nice-to-know for somewhere in the future. It's a solution based on what Thomas suggested earlier in this PR. The line you mentioned computed the supertype of the last element in a list. It's the equivalent of (e.g.) 'IfcWallStandardCase'.is_a('IfcWall') without loading an IFC model and slowing everything down. In the current example, the list starts as follows:

allowed_types = ['IfcWallStandardCase']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3').declaration_by_name('IfcWallStandardCase').supertype().name()  # that's allowed_types[-1], results in 'IfcWall'
new allowed_types = ['IfcWallStandardCase', 'IfcWall']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3').declaration_by_name('IfcWall').supertype().name()  # that's allowed_types[-1], results in 'IfcBuiltElement'
new allowed_types = ['IfcWallStandardCase', 'IfcWall', 'IfcBuiltElement']

So then, x in allowed_types is equivalent to x.is_a(allowed_type) in the sense that it takes inheritance into consideration.
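
The walk described above can be sketched without loading ifcopenshell by relying only on the two methods used in this thread, `name()` and `supertype()`. `supertype_names` and `FakeDeclaration` are hypothetical names, and the three-level chain is a shortened stand-in for a real schema, where the chain continues further up.

```python
def supertype_names(declaration):
    """Return the name of `declaration` followed by all of its supertypes."""
    names = []
    # supertype() is expected to return None once the root is reached.
    while declaration is not None:
        names.append(declaration.name())
        declaration = declaration.supertype()
    return names


class FakeDeclaration:
    """Minimal stand-in exposing the two methods the walk relies on."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parent = parent

    def name(self):
        return self._name

    def supertype(self):
        return self._parent


built = FakeDeclaration('IfcBuiltElement')
wall = FakeDeclaration('IfcWall', built)
case = FakeDeclaration('IfcWallStandardCase', wall)

allowed_types = supertype_names(case)
print('IfcWall' in allowed_types)
```

With ifcopenshell, the starting declaration would come from `ifcopenshell.ifcopenshell_wrapper.schema_by_name(schema).declaration_by_name(observed_type)`, with `schema` taken from the file rather than hardcoded, as requested earlier in the review.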

johltn · 2023-03-06T01:34:04Z

application/main.py

@@ -794,6 +794,30 @@ def get_model(fn):
        return response
    else:
        return send_file(path)
+
+@application.route('/api/bsdd/statistics/<id>', methods=['GET'])


For consistency, I would prefer /api/bsdd_statistics/<id> over /api/bsdd/statistics/<id>.

Done Ghesselink@b6a316e

Ghesselink added 2 commits March 2, 2023 19:21

add preprocessed_bsdd_data

280bba2

remove duplicate

9b888a0

Ghesselink mentioned this pull request Mar 3, 2023

add preprocess_bsdd json #55

Closed

aothms requested changes Mar 3, 2023


aothms requested a review from johltn March 3, 2023 10:45

Ghesselink and others added 10 commits March 3, 2023 07:35

Update application/main.py

65182b7

Validate logged user Co-authored-by: Thomas Krijnen <t.krijnen@gmail.com>

Update application/main.py

a28a04c

update endpoint url Co-authored-by: Thomas Krijnen <t.krijnen@gmail.com>

consider variation of domain sources

ad6cc14

update allowed instane types

9236a20

add allowed_types to lambda func

7906332

consider variation of classification names

f4fdee4

remove hardcoded unknown values

b161001

leave comment bsdd_table functionality

14265e6

update comment

5f5d70e

serialize results just once

5d3759a

johltn requested changes Mar 6, 2023


Ghesselink added 4 commits March 5, 2023 20:30

remove do_try func

816f747

change endpoint name

b6a316e

duplicate get_domain function of bsdd_check_v2.py

58a98f1

modify check validity of bsdd classification

19deaf9

Ghesselink requested a review from johltn March 6, 2023 15:04
