-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preprocess_bsdd_v2 #66
base: development
Are you sure you want to change the base?
Conversation
application/main.py
Outdated
if model.status_bsdd != 'n': | ||
preprocessed_bsdd_data = { | ||
'bSDD classification found': { | ||
'name': [r['classification_name'] for r in bsdd_results][0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only take [0]
here? There can definitely be multiple classifications in a file. But this might be something that more fundamentally also affects the report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in Ghesselink@ad6cc14
Refers to the slack thread https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
application/main.py
Outdated
'bSDD classification found': { | ||
'name': [r['classification_name'] for r in bsdd_results][0], | ||
'Release data': 'n.a.', | ||
'Organisation': 'BuildingSMART', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't hardcode something like that, if we don't know then just leave it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done Ghesselink@b161001
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.
This is already the case.
-
The default value of validity is invalid
-
Default of 'domain_source' is 'classification_not_found' https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L61
-
All the counts defaults 0
-
Classification_name defaults 'name not found' https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L86
Or do you think the default values should be different?
application/bsdd_utils.py
Outdated
inst = get_inst(session, bsdd_result['instance_id']) | ||
observed_type = inst.ifc_type | ||
required_type = bsdd_result['bsdd_type_constraint'] | ||
validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that there can be many many classifications in a model. We need to test with large models and see the performance implications.
While this is a clever way to express this, it's not really scalable and also not fully correct.
- We can't assume IFC4X3, entities valid in IFC2x3 might have been deleted.
- Creating a full model just to test this is not very performance friendly, instead use
ifcopenshell.ifcopenshell_wrapper.schema_by_name(...).declaration_by_name(observed_type).supertype().name()
and keep looping over supertype until you get to supertype() is None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in Ghesselink@9236a20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, see https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909
But also out due to Johan's suggestion below
application/bsdd_utils.py
Outdated
@@ -42,4 +50,45 @@ def get_inst(instance_id): | |||
|
|||
return hierarchical_bsdd_results | |||
|
|||
def bsdd_data_processing(bsdd_task, bsdd_results, session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a very informative name, what does this function return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns data for the 'bsdd data' table in a file metrics report. More specifically,it returns a list of dictionaries containing the classifications (ifc instances, in this case) and their valid/invalid counts.
I've modified the function name and left a comment.
application/bsdd_utils.py
Outdated
domain_sources.append(bsdd_uri) | ||
else: | ||
parse = urlparse(bsdd_uri) | ||
parsed_domain_file = ''.join(char for char in result[domain_file] if char.isalnum()).lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand what's going on here, but does this approach of constructing the url really tested on a series of different classifications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's changed in https://github.com/Ghesselink/ifc-pipeline-validation/blob/5f5d70eab82e6668745c997f25fae974f58865c5/application/bsdd_utils.py#L108
More in line with Johan's earlier work
https://github.com/Ghesselink/ifc-pipeline-validation/blob/5f5d70eab82e6668745c997f25fae974f58865c5/application/checks/check_bsdd_v2.py#L16
application/bsdd_utils.py
Outdated
url = parse.scheme + '/' + parse.netloc + '/' + 'uri' + '/' + domain_part + '/' | ||
domain_sources.append(url) | ||
sources = list(filter(lambda x: x != default, domain_sources)) | ||
return sources[0] if sources else default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we're taking [0] here and potentially discarding a lot of other domains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is now a similar approach as with the domain url, but this time with classification name
Ghesselink@f4fdee4
application/bsdd_utils.py
Outdated
|
||
def bsdd_report_quantity(bsdd_task, item): | ||
return sum(bool(bsdd_result.serialize().get(item)) for bsdd_result in bsdd_task.results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling serialize()
is potentially expensive, better restructure the code so that we only do it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also in another function with the same issue Ghesselink@5d3759a
serialize()
is now only done here
https://github.com/Ghesselink/ifc-pipeline-validation/blob/5d3759a4493752129a9e7af670333b1d5296c3d9/application/main.py#L807
Validate logged user Co-authored-by: Thomas Krijnen <t.krijnen@gmail.com>
update endpoint url Co-authored-by: Thomas Krijnen <t.krijnen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work: I like how your code is structured. I made a couple of comments regarding how to get the validity information from a bsdd_result object. Plus minor comments about formatting.
application/bsdd_utils.py
Outdated
def get_inst(session, instance_id): | ||
return session.query(database.ifc_instance).filter(database.ifc_instance.id == instance_id).all()[0] | ||
|
||
def get_domain(bsdd_results): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get the domain from the classification URI you can use the same function as what is used in https://github.com/AECgeeks/ifc-pipeline-validation/blob/development/application/checks/check_bsdd_v2.py#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, and I've implemented those changes in Ghesselink@58a98f1
I understand that it's preferable to avoid duplicating the code. However, the current implementation (again invoking the get_domain function) makes a request for every classification, which significantly increases the loading time for a BSDD report. This may lead to frustration among users.
Wouldn't it be better to add this to the database at check_bsdd_v2.py
and retrieve at the flask route? This avoids both duplicating code as well as long loading times.
But it's removed from the implementation since it's not needed.
application/bsdd_utils.py
Outdated
inst = get_inst(session, bsdd_result['instance_id']) | ||
observed_type = inst.ifc_type | ||
required_type = bsdd_result['bsdd_type_constraint'] | ||
validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed to recompute the validity of an instance regarding a constraint on its type, this information is already present in a bsdd_result object. You can see a bsdd_result object as one constraint to respect for an entity. Such a constraint is part of requirements imposed by a classification which itself belongs to a domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it's duplicating the work already done. I've changed it in Ghesselink@19deaf9
application/main.py
Outdated
'bSDD classification found': { | ||
'name': [r['classification_name'] for r in bsdd_results][0], | ||
'Release data': 'n.a.', | ||
'Organisation': 'BuildingSMART', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.
application/utils.py
Outdated
@@ -78,3 +78,12 @@ def send_message(msg_content, user_email, html=None): | |||
"html":html, | |||
"subject": "Validation Service update", | |||
"text": msg_content}) | |||
|
|||
def do_try(fn, default=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this change now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot comment on your line above.
I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.
This is already the case.
- The default value of validity is invalid
- Default of 'domain_source' is 'classification_not_found https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L61
- All the counts defaults 0
- Classification_name defaults 'name not found' https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L86
Or do you think the default values should be different?
application/bsdd_utils.py
Outdated
observed_type = inst.ifc_type | ||
required_type = bsdd_result['bsdd_type_constraint'] | ||
domain_source = bsdd_result['bsdd_classification_uri'] | ||
validity = "valid" if required_type in instance_supertypes(observed_type, schema) else 'invalid' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We don't determine the validity of a bsdd_result only with respect to the type
- To check if a bsdd_result object is valid, the following attributes of the object need to be set to 1: val_ifc_type, val_property_set, val_property_name, val_property_type, val_property_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
application/bsdd_utils.py
Outdated
|
||
while True: | ||
try: | ||
result = (lambda x: ifcopenshell.ifcopenshell_wrapper.schema_by_name(schema).declaration_by_name(x).supertype().name())(allowed_types[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what this line does.
Also, no need to recompute the validity with respect to the type (nor any other criteria) it's available by doing bsdd_result.val_ifc_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done Ghesselink@19deaf9
Maybe a nice-to-know for somewhere in the future. It's a solution based on what Thomas suggested earlier in this PR. The line you mentioned calculated the supertype of the last element in a table. It's the equivalent of (e.g.) 'IfcWallStandardCase'.is_a('IfcWall') without loading an ifc model and slowing everything down. In the current example, the table starts as follows
allowed types = ['IfcStandardWallCase']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3).declaration_by_name(IfcStandardWallCase).supertype().name() # that's allowed_types [-1], results in 'IfcWall'
new allowed_types = ['IfcWallStandardCase, 'IfcWall']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3).declaration_by_name('IfcWall').supertype().name() # that's allowed_types [-1], results in 'IfcBuiltElement'
new allowed_types = ['IfcWallStandardCase, 'IfcWall, 'IfcBuiltElement'']
So then, x in allowed_types
is equivalent to x.is_a(allowed_type)
in the sense it takes inheritance into consideration
application/main.py
Outdated
@@ -794,6 +794,30 @@ def get_model(fn): | |||
return response | |||
else: | |||
return send_file(path) | |||
|
|||
@application.route('/api/bsdd/statistics/<id>', methods=['GET']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer for coherence to have /api/bsdd_statistics/<id>
instead of /api/bsdd/statistics/<id>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done Ghesselink@b6a316e
Continuation of #55
Output :
