Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preprocess_bsdd_v2 #66

Open
wants to merge 16 commits into
base: development
Choose a base branch
from

Conversation

Ghesselink
Copy link
Contributor

Continuation of #55

Output :
afbeelding

if model.status_bsdd != 'n':
preprocessed_bsdd_data = {
'bSDD classification found': {
'name': [r['classification_name'] for r in bsdd_results][0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only take [0] here? There can definitely be multiple classifications in a file. But this might be something that more fundamentally also affects the report.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'bSDD classification found': {
'name': [r['classification_name'] for r in bsdd_results][0],
'Release data': 'n.a.',
'Organisation': 'BuildingSMART',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't hardcode something like that, if we don't know then just leave it out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@Ghesselink Ghesselink Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.

This is already the case.

  • The default value of validity is invalid
    
  • Default of 'domain_source' is 'classification_not_found' https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L61
    
  • All the counts defaults 0
    
  • Classification_name defaults 'name not found' https://github.com/Ghesselink/ifc-pipeline-validation/blob/19deaf9e934d8e6e79445b9bf86657cf68aeba32/application/bsdd_utils.py#L86
    

Or do you think the default values should be different?

inst = get_inst(session, bsdd_result['instance_id'])
observed_type = inst.ifc_type
required_type = bsdd_result['bsdd_type_constraint']
validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that there can be many many classifications in a model. We need to test with large models and see the performance implications.

While this is a clever way to express this, it's not really scalable and also not fully correct.

  • We can't assume IFC4X3, entities valid in IFC2x3 might have been deleted.
  • Creating a full model just to test this is not very performance friendly, instead use ifcopenshell.ifcopenshell_wrapper.schema_by_name(...).declaration_by_name(observed_type).supertype().name() and keep looping over supertype until you get to supertype() is None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@Ghesselink Ghesselink Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see https://bsi-technicalservices.slack.com/archives/C04PPUCMED9/p1677890032160909

But also out due to Johan's suggestion below

@@ -42,4 +50,45 @@ def get_inst(instance_id):

return hierarchical_bsdd_results

def bsdd_data_processing(bsdd_task, bsdd_results, session):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a very informative name, what does this function return?

Copy link
Contributor Author

@Ghesselink Ghesselink Mar 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function returns data for the 'bsdd data' table in a file metrics report. More specifically,it returns a list of dictionaries containing the classifications (ifc instances, in this case) and their valid/invalid counts.
I've modified the function name and left a comment.

domain_sources.append(bsdd_uri)
else:
parse = urlparse(bsdd_uri)
parsed_domain_file = ''.join(char for char in result[domain_file] if char.isalnum()).lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what's going on here, but does this approach of constructing the url really tested on a series of different classifications?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url = parse.scheme + '/' + parse.netloc + '/' + 'uri' + '/' + domain_part + '/'
domain_sources.append(url)
sources = list(filter(lambda x: x != default, domain_sources))
return sources[0] if sources else default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we're taking [0] here and potentially discarding a lot of other domains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now a similar approach as with the domain url, but this time with classification name
Ghesselink@f4fdee4


def bsdd_report_quantity(bsdd_task, item):
return sum(bool(bsdd_result.serialize().get(item)) for bsdd_result in bsdd_task.results)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling serialize() is potentially expensive, better restructure the code so that we only do it once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aothms aothms requested a review from johltn March 3, 2023 10:45
Copy link
Collaborator

@johltn johltn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work: I like how your code is structured. I made a couple of comments regarding how to get the validity information from a bsdd_result object. Plus minor comments about formatting.

def get_inst(session, instance_id):
return session.query(database.ifc_instance).filter(database.ifc_instance.id == instance_id).all()[0]

def get_domain(bsdd_results):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the domain from the classification URI you can use the same function as what is used in https://github.com/AECgeeks/ifc-pipeline-validation/blob/development/application/checks/check_bsdd_v2.py#L14

Copy link
Contributor Author

@Ghesselink Ghesselink Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works, and I've implemented those changes in Ghesselink@58a98f1

I understand that it's preferable to avoid duplicating the code. However, the current implementation (again invoking the get_domain function) makes a request for every classification, which significantly increases the loading time for a BSDD report. This may lead to frustration among users.

Wouldn't it be better to add this to the database at check_bsdd_v2.py and retrieve at the flask route? This avoids both duplicating code as well as long loading times.

But it's removed from the implementation since it's not needed.

inst = get_inst(session, bsdd_result['instance_id'])
observed_type = inst.ifc_type
required_type = bsdd_result['bsdd_type_constraint']
validity = "valid" if utils.do_try(lambda: ifcopenshell.template.create(schema_identifier="IFC4X3").create_entity(observed_type).is_a(required_type), 'invalid') else 'invalid'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed to recompute the validity of an instance regarding a constraint on its type, this information is already present in a bsdd_result object. You can see a bsdd_result object as one constraint to respect for an entity. Such a constraint is part of requirements imposed by a classification which itself belongs to a domain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's duplicating the work already done. I've changed it in Ghesselink@19deaf9

'bSDD classification found': {
'name': [r['classification_name'] for r in bsdd_results][0],
'Release data': 'n.a.',
'Organisation': 'BuildingSMART',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.

@@ -78,3 +78,12 @@ def send_message(msg_content, user_email, html=None):
"html":html,
"subject": "Validation Service update",
"text": msg_content})

def do_try(fn, default=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this change now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@Ghesselink Ghesselink Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot comment on your line above.

I think we should have default values for every field of the object in case something goes wrong/an information is not retrieved.

This is already the case.

Or do you think the default values should be different?

observed_type = inst.ifc_type
required_type = bsdd_result['bsdd_type_constraint']
domain_source = bsdd_result['bsdd_classification_uri']
validity = "valid" if required_type in instance_supertypes(observed_type, schema) else 'invalid'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We don't determine the validity of a bsdd_result only with respect to the type
  2. To check if a bsdd_result object is valid, the following attributes of the object need to be set to 1: val_ifc_type, val_property_set, val_property_name, val_property_type, val_property_value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


while True:
try:
result = (lambda x: ifcopenshell.ifcopenshell_wrapper.schema_by_name(schema).declaration_by_name(x).supertype().name())(allowed_types[-1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what this line does.
Also, no need to recompute the validity with respect to the type (nor any other criteria) it's available by doing bsdd_result.val_ifc_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done Ghesselink@19deaf9

Maybe a nice-to-know for somewhere in the future. It's a solution based on what Thomas suggested earlier in this PR. The line you mentioned calculated the supertype of the last element in a table. It's the equivalent of (e.g.) 'IfcWallStandardCase'.is_a('IfcWall') without loading an ifc model and slowing everything down. In the current example, the table starts as follows

allowed types = ['IfcStandardWallCase']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3).declaration_by_name(IfcStandardWallCase).supertype().name() # that's allowed_types [-1], results in 'IfcWall'
new allowed_types = ['IfcWallStandardCase, 'IfcWall']
ifcopenshell.ifcopenshell_wrapper.schema_by_name('IFC4X3).declaration_by_name('IfcWall').supertype().name() # that's allowed_types [-1], results in 'IfcBuiltElement'
new allowed_types = ['IfcWallStandardCase, 'IfcWall, 'IfcBuiltElement'']

So then, x in allowed_types is equivalent to x.is_a(allowed_type) in the sense it takes inheritance into consideration

@@ -794,6 +794,30 @@ def get_model(fn):
return response
else:
return send_file(path)

@application.route('/api/bsdd/statistics/<id>', methods=['GET'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer for coherence to have /api/bsdd_statistics/<id> instead of /api/bsdd/statistics/<id>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ghesselink Ghesselink requested a review from johltn March 6, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants