Add initial version of ccpp_track_variables.py #419

Merged
merged 34 commits into from
May 11, 2022
Changes from 19 commits
Commits
f2619d9
Initial shell version of ccpp_track_variables.py
mkavulich Oct 14, 2021
2f209a5
A few structure changes, add function for parsing arguments and check…
mkavulich Oct 14, 2021
5715006
Change "xml" to "sdf" to better reflect other script conventions, fle…
mkavulich Nov 1, 2021
4160be2
Starting to make use of existing objects
mkavulich Nov 2, 2021
e9da740
Create new method and attribute for Suite class that creates a call t…
mkavulich Nov 2, 2021
31245e1
Changing directions a little: user must provide path to directory wit…
mkavulich Nov 9, 2021
83f733f
Add logging routines; debug flag (not utilized yet)
mkavulich Nov 11, 2021
6cb14c9
working on getting dictionary of schemes <--> meta filenames
mkavulich Nov 11, 2021
0ceb853
Read in config file instead of metadata_path (maybe can revisit this …
mkavulich Nov 12, 2021
a332867
Need to add call to gather_variable_definitions in order to get new m…
mkavulich Nov 12, 2021
e64dbe3
Finally got to where I thought I should be! I now have a calling tree…
mkavulich Nov 12, 2021
64dad53
Getting close to finished now; using parse_metadata_file to return Me…
mkavulich Nov 15, 2021
67229d2
Find if variable matches in subroutine, if so, output variable name, …
mkavulich Nov 15, 2021
36a23e6
Working prototype complete! The script now takes in an SDF, config fi…
mkavulich Nov 15, 2021
5f9e029
Code cleanup with feedback from pylint
mkavulich Nov 16, 2021
5e03756
Remove unneeded debug changes
mkavulich Nov 16, 2021
8ec769d
Add new "--draw" argument and a stub (for now) subroutine that will g…
mkavulich Nov 16, 2021
6c7cfe9
Convert var_graph from Ordered Dictionary to list of tuples to preser…
mkavulich Nov 18, 2021
9fa53bb
Improve function descriptions, remove bits of draw routine until later
mkavulich Nov 18, 2021
d64a0fb
Don't raise exception if partial matches found
mkavulich Feb 3, 2022
c2012ba
Merge remote-tracking branch 'origin/main' into feature/track_variabl…
mkavulich Feb 24, 2022
08c2027
Explicitly shebang to python3, adopt new environment and logging stru…
mkavulich Feb 24, 2022
566e485
Remove unnecessary "success" variables and handle exceptions in the a…
mkavulich Feb 24, 2022
9e6b59a
modify --> use; this script tracks variables that are both intent(in)…
mkavulich Feb 24, 2022
bc6963a
Convert remaining old-format strings to f-strings
mkavulich Feb 24, 2022
1193e44
Incorporate reviewer suggestion for more robust directory name parsing
mkavulich Feb 24, 2022
c7cc6e4
A few more fixes from pylint, remove redundant "action='store_true'" …
mkavulich Mar 3, 2022
12d9078
Move parsing of command-line arguments to "parse_arguments" function,…
mkavulich Mar 3, 2022
ee99e7e
Add a bit more information about call to "gather_variable_definitions…
mkavulich Mar 3, 2022
b8654c4
Assign "call_tree" attribute as an empty list rather than Nonetype in…
mkavulich Mar 3, 2022
5f545d7
restore accidentally removed store_true action from debug argument
mkavulich Mar 3, 2022
c27f06f
Suggestion from Dom: move creation of scheme call tree to "parse" met…
mkavulich Mar 24, 2022
3f1dd5b
Standardize spaces around `=` character
mkavulich Apr 21, 2022
948b620
Merge remote-tracking branch 'origin/main' into feature/track_variabl…
mkavulich May 10, 2022
220 changes: 220 additions & 0 deletions scripts/ccpp_track_variables.py
@@ -0,0 +1,220 @@
#!/usr/bin/env python

# Standard modules
import argparse
import logging
import collections
import glob

# CCPP framework imports
from metadata_table import find_scheme_names, parse_metadata_file
from ccpp_prebuild import import_config, gather_variable_definitions
from mkstatic import Suite
from parse_checkers import registered_fortran_ddt_names

###############################################################################
# Set up the command line argument parser and other global variables #
###############################################################################

parser = argparse.ArgumentParser()
parser.add_argument('-s', '--sdf', action='store', \
                    help='suite definition file to parse', required=True)
parser.add_argument('-m', '--metadata_path', action='store', \
                    help='path to CCPP scheme metadata files', required=True)
parser.add_argument('-c', '--config', action='store', \
                    help='path to CCPP prebuild configuration file', required=True)
parser.add_argument('-v', '--variable', action='store', \
                    help='variable to track through CCPP suite', required=True)
#parser.add_argument('--draw', action='store_true', \
#                    help='draw graph of calling tree for given variable', default=False)
parser.add_argument('--debug', action='store_true', help='enable debugging output', default=False)
args = parser.parse_args()

###############################################################################
# Functions and subroutines #
###############################################################################

def parse_arguments(args):
    """Parse command line arguments."""
    success = True
    sdf = args.sdf
    var = args.variable
    configfile = args.config
    metapath = args.metadata_path
    debug = args.debug
    return(success,sdf,var,configfile,metapath,debug)

def setup_logging(debug):
    """Sets up the logging module and logging level."""
    success = True
    if debug:
        level = logging.DEBUG
    else:
        level = logging.WARNING
    logging.basicConfig(format='%(levelname)s: %(message)s', level=level)
    if debug:
        logging.info('Logging level set to DEBUG')
    return success

def parse_suite(sdf):
    """Reads the provided sdf, parses into a Suite data structure, including the "call tree":
    the ordered list of schemes for the suite specified by the provided sdf"""
    logging.info(f'Reading sdf {sdf} and populating Suite object')
    suite = Suite(sdf_name=sdf)
    success = suite.parse()
    if not success:
        logging.error(f'Parsing suite definition file {sdf} failed.')
        success = False
        return (success, suite)

In this type of error handling situation, it could be useful to use a try/except block to get more information about how the SDF parsing failed instead of only reporting that it didn't work.

This is also a nice example of how one who isn't regularly checking the value of success in the caller of this function could just power through the rest of main with an un-parsable Suite object.

Collaborator Author

This is a limitation of how errors are handled in the existing object definitions; since they handle errors through a "success" flag I don't want to try to mess with that, especially since this code will be superseded in the (hopefully) near-future and need updating regardless.
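
The reviewer's suggestion can be sketched as follows. This is a minimal illustration rather than the framework's code: `FakeSuite` and `parse_suite_checked` are hypothetical names, with `FakeSuite` standing in for the `Suite` class from `mkstatic`, and the only assumption about `parse()` is that it returns a boolean success flag.

```python
class FakeSuite:
    """Hypothetical stand-in for mkstatic.Suite; parse() returns a success flag."""

    def __init__(self, sdf_name, parse_ok=True):
        self.sdf_name = sdf_name
        self._parse_ok = parse_ok

    def parse(self):
        return self._parse_ok


def parse_suite_checked(suite):
    """Turn a False success flag into an exception so the caller cannot ignore it."""
    try:
        success = suite.parse()
    except Exception as err:  # for example, malformed XML raised by the parser
        raise RuntimeError(f"Parsing {suite.sdf_name} failed: {err}") from err
    if not success:
        raise RuntimeError(f"Parsing {suite.sdf_name} failed (success flag was False)")
    return suite
```

Wrapping the flag-based call this way leaves the existing object definitions untouched, which is the constraint the author describes, while still stopping `main` from continuing with an unparsed suite.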

    logging.info(f'Successfully read sdf {suite.sdf_name}')
    logging.info(f'Creating calling tree of schemes for suite {suite.name}')
    success = suite.make_call_tree()
    if not success:
        logging.error('Parsing suite definition file {0} failed.'.format(sdf))
        success = False
    return (success, suite)

def create_metadata_filename_dict(metapath):
    """Given a path, read all .meta files in that directory and add them to a dictionary: the keys
    are the name of the scheme, and the values are the filename of the .meta file associated
    with that scheme"""

    success = True
    metadata_dict = {}
    scheme_filenames=glob.glob(metapath + "*.meta")
    if not scheme_filenames:
        logging.error(f'No files found in {metapath} with ".meta" extension')
        success = False

    for scheme_fn in scheme_filenames:
        schemes=find_scheme_names(scheme_fn)
        # The above returns a list of schemes in each filename, but
        # we want a dictionary of schemes associated with filenames:
        for scheme in schemes:
            metadata_dict[scheme]=scheme_fn

    return (metadata_dict, success)


def create_var_graph(suite, var, config, metapath):
    """Given a suite, variable name, a 'config' dictionary, and a path to .meta files:
         1. Creates a dictionary associating schemes with their .meta files
         2. Loops through the call tree of the provided suite
         3. For each scheme, reads .meta file for said scheme, checks for variable within that
            scheme, and if it exists, adds an entry to a list of tuples, where each tuple includes
            the name of the scheme and the intent of the variable within that scheme"""

    success = True

    # Create a list of tuples that will hold the in/out information for each scheme
    var_graph=[]

Collaborator

Is there a pattern where you use spaces around the = symbol and where you omit them?

Collaborator Author

Just regular sloppiness :) I've standardized this to use spaces except when in keyword arguments; I believe that's consistent with PEP8


    logging.debug("reading .meta files in path:\n {0}".format(metapath))
    (metadata_dict, success)=create_metadata_filename_dict(metapath)
    if not success:
        raise Exception('Call to create_metadata_filename_dict failed')

    logging.debug(f"reading metadata files for schemes defined in config file: "
                  f"{config['scheme_files']}")

    # Loop through call tree, find matching filename for scheme via dictionary schemes_in_files,
    # then parse that metadata file to find variable info
    partial_matches = {}

Collaborator

Do you have examples of partial matches? How does tracking them help?

Collaborator Author

This case is for when a user inputs something like "latent_heat" as their variable, which matches multiple standard names (e.g. latent_heat_of_vaporization_of_water_at_0c, surface_upward_potential_latent_heat_flux, etc.). This makes it easier for users who might not know the exact standard name of the variable they are looking for, or for something like, for example, any variable containing the word "temperature".

The final section in my PR message ("Partial match for variable") gives an example of this.
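
The distinction between exact and partial matches described in this reply can be illustrated with a small sketch. `match_standard_names` is a hypothetical helper, not part of the script, and `air_temperature` is included only as an illustrative non-matching name; the two `latent_heat` names are the ones quoted above.

```python
def match_standard_names(query, standard_names):
    """Return (exact, partial): whether query equals a standard name,
    and the list of standard names that merely contain it as a substring."""
    exact = query in standard_names
    partial = [name for name in standard_names if query in name and name != query]
    return exact, partial


names = [
    "latent_heat_of_vaporization_of_water_at_0c",
    "surface_upward_potential_latent_heat_flux",
    "air_temperature",  # illustrative entry that should not match
]

exact, partial = match_standard_names("latent_heat", names)
print(f"exact match: {exact}")
for name in partial:
    print(f"partial match: {name}")
```

A query such as `latent_heat` has no exact match here but two partial ones, which is the case the script reports to the user as candidates of interest instead of failing silently.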

    for scheme in suite.call_tree:
        logging.debug("reading meta file for scheme {0} ".format(scheme))

        if scheme in metadata_dict:
            scheme_filename = metadata_dict[scheme]
        else:
            raise Exception(f"Error, scheme '{scheme}' from suite '{suite.sdf_name}' "
                            f"not found in metadata files in {metapath}")

        logging.debug("reading metadata file {0} for scheme {1}".format(scheme_filename, scheme))

        new_metadata_headers = parse_metadata_file(scheme_filename, \
                                                   known_ddts=registered_fortran_ddt_names(), \
                                                   logger=logging.getLogger(__name__))
        for scheme_metadata in new_metadata_headers:
            for section in scheme_metadata.sections():
                found_var = []
                intent = ''
                for scheme_var in section.variable_list():
                    exact_match = False
                    if var == scheme_var.get_prop_value('standard_name'):
                        logging.debug("Found variable {0} in scheme {1}".format(var,section.title))
                        found_var=var
                        exact_match = True
                        intent = scheme_var.get_prop_value('intent')
                        break
                    scheme_var_standard_name = scheme_var.get_prop_value('standard_name')
                    if scheme_var_standard_name.find(var) != -1:
                        logging.debug(f"{var} matches {scheme_var_standard_name}")
                        found_var.append(scheme_var_standard_name)
                if not found_var:
                    logging.debug(f"Did not find variable {var} in scheme {section.title}")
                elif exact_match:
                    logging.debug(f"Exact match found for variable {var} in scheme {section.title},"
                                  f" intent {intent}")
                    #print(f"{var_graph=}")
                    var_graph.append((section.title,intent))
                else:
                    logging.debug(f"Found inexact matches for variable(s) {var} "
                                  f"in scheme {section.title}:\n{found_var}")
                    partial_matches[section.title] = found_var
    if var_graph:
        logging.debug("Successfully generated variable graph for sdf {0}\n".format(suite.sdf_name))
    else:
        success = False
        logging.error(f"Variable {var} not found in any suites for sdf {suite.sdf_name}\n")
        if partial_matches:
            print("Did find partial matches that may be of interest:\n")
            for key in partial_matches:
                print("In {0} found variable(s) {1}".format(key, partial_matches[key]))

    return (success,var_graph)

def draw_var_graph(var_graph):
    """Draw a graphical representation of the variable graph (not yet implemented)"""

    success = True

    return success

    return

def main():
    """Main routine that traverses a CCPP suite and outputs the list of schemes that modify given variable"""

    (success, sdf, var, configfile, metapath, debug) = parse_arguments(args)
    if not success:
        raise Exception('Call to parse_arguments failed.')

    success = setup_logging(debug)
    if not success:
        raise Exception('Call to setup_logging failed.')

    (success, suite) = parse_suite(sdf)
    if not success:
        raise Exception('Call to parse_suite failed.')

    (success, config) = import_config(configfile, None)
    if not success:
        raise Exception('Call to import_config failed.')

    # Variables defined by the host model
    (success, _, _) = gather_variable_definitions(config['variable_definition_files'], config['typedefs_new_metadata'])

A comment about this magic could be helpful. I assume given its name, it will update the config data structure in place, but it's not obvious. You are also opting (I'm assuming intentionally) to not store the output, which throws a wrench in the obvious nature of this call.

Collaborator Author

I'll be honest, I'm not 100% sure exactly what this step is doing, but it is necessary for converting metadata from an old format. I've added a comment trying to be as informative as possible.

    if not success:
        raise Exception('Call to gather_variable_definitions failed.')

    (success, var_graph) = create_var_graph(suite, var, config, metapath)
    if not success:
        raise Exception('Call to create_var_graph failed.')

    print(f"For suite {suite.sdf_name}, the following schemes (in order) "
          f"modify the variable {var}:")
    for entry in var_graph:
        print(f"{entry[0]} (intent {entry[1]})")


if __name__ == '__main__':
    main()
2 changes: 1 addition & 1 deletion scripts/metadata_parser.py
@@ -19,7 +19,7 @@
# Output: This routine converts the argument tables for all subroutines / typedefs / kind / module variables
# into dictionaries suitable to be used with ccpp_prebuild.py (which generates the fortran code for the caps)

-# Items in this dictionary are used for checking valid entries in metadata tables. For columsn with no keys/keys
+# Items in this dictionary are used for checking valid entries in metadata tables. For columns with no keys/keys
# commented out, no check is performed. This is the case for 'type' and 'kind' right now, since models use their
# own derived data types and kind types.
VALID_ITEMS = {
38 changes: 38 additions & 0 deletions scripts/mkstatic.py
@@ -504,6 +504,7 @@ def __init__(self, **kwargs):
        self._sdf_name = None
        self._all_schemes_called = None
        self._all_subroutines_called = None
        self._call_tree = None
        self._caps = None
        self._module = None
        self._subroutines = None
@@ -604,6 +605,38 @@ def parse(self):

        return success

    def make_call_tree(self):
        '''Create a call tree (list of schemes as they are called) for the specified suite definition file'''
        success = True

        if not os.path.exists(self._sdf_name):
            logging.critical("Suite definition file {0} not found.".format(self._sdf_name))
            success = False
            return success

        tree = ET.parse(self._sdf_name)
        suite_xml = tree.getroot()
        self._name = suite_xml.get('name')
        # Validate name of suite in XML tag against filename; could be moved to common.py
        if not (os.path.basename(self._sdf_name) == 'suite_{}.xml'.format(self._name)):
            logging.critical("Invalid suite name {0} in suite definition file {1}.".format(
                self._name, self._sdf_name))
            success = False
            return success

        # Call tree of all schemes in SDF (with duplicates and subcycles)

Collaborator

The first 20 lines are identical with the first twenty lines of the parse routine. Can this be combined to avoid code duplication?

Collaborator

Maybe have an optional (or mandatory) argument create_call_tree for the parse routine, that switches between what is done in the second half of the parse routine?

Collaborator Author

This is a good idea. I initially had the naive idea to try to not modify any existing routines so that I wouldn't have to run as many tests, but it makes more sense the way you suggested. I have implemented this change, let me know how it looks

Collaborator

This looks great, thanks for making the change.
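
The refactoring discussed in this thread, a single parse routine with an optional switch for building the call tree, might look roughly like the sketch below. It is an assumption-laden illustration, not the merged code: `parse_sdf` and `SDF_TEXT` are hypothetical, the argument name `create_call_tree` follows the reviewer's wording, and the XML is a simplified SDF containing only group, subcycle, and scheme elements.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical suite definition file content.
SDF_TEXT = """
<suite name="example" version="1">
  <group name="physics">
    <subcycle loop="1">
      <scheme>scheme_a</scheme>
    </subcycle>
    <subcycle loop="2">
      <scheme>scheme_b</scheme>
      <scheme>scheme_c</scheme>
    </subcycle>
  </group>
</suite>
"""


def parse_sdf(sdf_text, create_call_tree=False):
    """Parse SDF text once; optionally build the ordered call tree.

    The shared setup (reading the XML, extracting the suite name) runs in
    both modes, so it no longer needs to be duplicated in a second routine.
    """
    suite_xml = ET.fromstring(sdf_text.strip())
    name = suite_xml.get('name')

    call_tree = []
    if create_call_tree:
        # Walk group -> subcycle -> scheme, repeating each subcycle 'loop' times
        for group_xml in suite_xml:
            for subcycle_xml in group_xml:
                for _ in range(int(subcycle_xml.get('loop'))):
                    for scheme_xml in subcycle_xml:
                        call_tree.append(scheme_xml.text)
    return name, call_tree


suite_name, tree = parse_sdf(SDF_TEXT, create_call_tree=True)
print(suite_name, tree)
```

Because the second subcycle has `loop="2"`, its schemes appear twice in the resulting list, which is the "with duplicates and subcycles" behavior the diff comment describes.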

        self._call_tree = []

        # Populate call tree from SDF's hierarchical structure, including multiple calls in subcycle loops
        for group_xml in suite_xml:
            for subcycle_xml in group_xml:
                for loop in range(0,int(subcycle_xml.get('loop'))):
                    for scheme_xml in subcycle_xml:
                        self._call_tree.append(scheme_xml.text)

        return success


    def print_debug(self):
        '''Basic debugging output about the suite.'''
        print("ALL SUBROUTINES:")
@@ -618,6 +651,11 @@ def all_schemes_called(self):
        '''Get the list of all schemes.'''
        return self._all_schemes_called

    @property
    def call_tree(self):
        '''Get the call tree of the suite (all schemes, in order, with duplicates and loops).'''
        return self._call_tree

    @property
    def all_subroutines_called(self):
        '''Get the list of all subroutines.'''