
Slowdown from 1.0.2 to 1.1 #99

Closed
jacobwilliams opened this issue Jun 9, 2019 · 9 comments

Comments

@jacobwilliams

It seems like v1.1 is ~10% slower parsing a large namelist file than v1.0.2. My test case is the file here. I haven't done extensive testing with other files though.
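A ~10% slowdown like this is easiest to confirm with a best-of-N timing run against both versions in separate environments. A minimal sketch of such a harness (the `f90nml.read` call and `test.nml` path in the usage note are assumptions from this thread; any parse callable works):

```python
import timeit

def time_parse(parse, path, repeats=5):
    """Best-of-N wall-clock time (seconds) for a single parse(path) call.

    Taking the minimum over several runs reduces noise from other
    processes competing for the CPU.
    """
    return min(timeit.repeat(lambda: parse(path), number=1, repeat=repeats))

# Usage sketch (assumes f90nml is installed):
#   import f90nml
#   print(time_parse(f90nml.read, 'test.nml'))
```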

@marshallward
Owner

marshallward commented Jun 9, 2019 via email

@marshallward
Owner

marshallward commented Jun 9, 2019 via email

@jacobwilliams
Author

It's the test.nml one in that directory. The others may also have the same issue but I haven't checked.

@marshallward
Owner

marshallward commented Jun 9, 2019

These seem to be the bottlenecks:

test.nml

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290     73023    4138229.0     56.7     17.6              toks = tokenizer.parse(line)
...
   357     72688      69443.0      1.0      0.3                      v_name, v_values = self._parse_variable(
   358     72688      61842.0      0.9      0.3                          g_vars,
   359     72688    9365743.0    128.8     39.9                          patch_nml=grp_patch
   360                                                               )
...
   397       112    8340883.0  74472.2     35.5                      nmls[g_name] = g_update

test4.nml

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290     73023    7421571.0    101.6      5.6              toks = tokenizer.parse(line)
...
   357     72688      71568.0      1.0      0.1                      v_name, v_values = self._parse_variable(
   358     72688      59002.0      0.8      0.0                          g_vars,
   359     72688   23687196.0    325.9     18.0                          patch_nml=grp_patch
   360                                                               )
...
   397       112   95230652.0 850273.7     72.5                      nmls[g_name] = g_update

The first two are more or less understood bottlenecks (too much small IO and single-byte or single-token parsing) and need an overall rethink. But the third one looks new.
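(The per-line numbers above come from line_profiler's `@profile` decorator run under `kernprof`. For a coarser function-level view using only the standard library, something like this sketch with `cProfile` also works; the `profile_call` helper is hypothetical, not part of f90nml:)

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return a text report of the top ten
    entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(10)
    return buf.getvalue()

# Usage sketch (assumes f90nml is installed):
#   import f90nml
#   print(profile_call(f90nml.read, 'test.nml'))
```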

There have been some changes in __setitem__, example profile in test4.nml:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   109                                               @profile
   110                                               def __setitem__(self, key, value):
   111                                                   """Case-insensitive interface to OrderedDict.
   112                                           
   113                                                   Python dict inputs to the Namelist, such as derived types, are also
   114                                                   converted into Namelists.
   115                                                   """
   116   5462208    2941308.0      0.5      6.6          if isinstance(value, dict) and not isinstance(value, Namelist):
   117                                                       value = Namelist(value,
   118                                                                        default_start_index=self.default_start_index)
   119                                           
   120   5462208   14101689.0      2.6     31.5          elif is_nullable_list(value, dict):
   121   5368630    3248339.0      0.6      7.2              for i, v in enumerate(value):
   122   2687423    1012680.0      0.4      2.3                  if v is not None:
   123   2687423     968887.0      0.4      2.2                      value[i] = Namelist(
   124   2687423     895073.0      0.3      2.0                          v,
   125   2687423   16359632.0      6.1     36.5                          default_start_index=self.default_start_index
   126                                                               )
   127                                                           else:
   128                                                               value[i] = None
   129                                           
   130   5462208    5306041.0      1.0     11.8          super(Namelist, self).__setitem__(key.lower(), value)

The is_nullable_list calls, and the conditional Namelist() constructors, seem to be new and a real problem. The changes date to around Jan 2018, about 1.5 years ago. That is around when 1.0.2 was released, though it seems these changes were slipped in for that release.

So I am not sure the regression is exactly due to the 1.0.2 → 1.1 change, but it is certainly from around that time, and it does seem new. I will keep looking.

@marshallward
Owner

marshallward commented Jun 9, 2019

test.nml and test4.nml seem largely similar in size; the biggest difference seems to be that test4.nml creates nearly 3M Namelist objects, whereas test.nml never even gets to this point. (It always returns false on the second logical check of is_nullable_list, and thus never even runs the additional all() test.)
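For reference, the short-circuit described above follows from the shape of the check. This is only a sketch of `is_nullable_list` consistent with the behavior described in this thread; the actual implementation in f90nml may differ in detail:

```python
def is_nullable_list(value, vtype):
    """True if value is a list of vtype instances, possibly containing None.

    The any() check short-circuits to False for lists with no vtype
    entries at all, so the more expensive all() scan never runs then.
    """
    return (isinstance(value, list)
            and any(isinstance(v, vtype) for v in value)
            and all(isinstance(v, vtype) or v is None for v in value))
```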

I can't recall the specifics of this yet, but I believe it was mostly to convert internal dicts into Namelists, to avoid the many problems related to dict-Namelist conversion and interaction.

It's possible these problems don't exist anymore; this is overall a much safer operation nowadays. It's also possible that this should be interpreted as a null operation if the input is already a Namelist.

Again, will keep looking...

@marshallward
Owner

Confirmed that this block of code is making your case about 6x slower:

        elif is_nullable_list(value, dict):
            for i, v in enumerate(value):
                if v is not None:
                    value[i] = Namelist(
                        v,
                        default_start_index=self.default_start_index
                    )
                else:
                    value[i] = None

This ensures that a potential list of dict types is properly converted into derived-type Namelist types. But it is probably far too aggressive, and there is probably a simpler way to achieve this.

Completely removing the block only breaks one of my 80+ tests, so there's probably a lot of room for improvement here.

@marshallward
Owner

marshallward commented Jul 17, 2019

This change appears to work, and it matches the timing I get if I just remove the code block:

        elif is_nullable_list(value, dict):
            for i, v in enumerate(value):
                if v is None:
                    value[i] = None
                elif isinstance(v, Namelist):
                    # Assume Namelists do not need to be recursively validated.
                    value[i] = v
                else:
                    # value is a non-Namelist dict
                    value[i] = Namelist(
                        v,
                        default_start_index=self.default_start_index
                    )

It also passes my tests.

This removes the redundant passing of a Namelist through its own constructor, which is generally a good idea, but it also means we are not validating that any internal dicts are also Namelists. I don't think this is a problem, since each sub-namelist ought to validate this within its own __setitem__, but surprises abound.
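The passthrough principle can be illustrated with a toy stand-in (hypothetical names, not the f90nml code): a dict subclass that converts plain dicts on assignment but leaves instances of itself untouched, so nothing is redundantly reconstructed:

```python
from collections import OrderedDict

class CaseDict(OrderedDict):
    """Toy stand-in for Namelist: lowercases keys and converts nested dicts."""

    def __setitem__(self, key, value):
        # Convert plain dicts; a CaseDict passes through unchanged,
        # avoiding a redundant trip through its own constructor.
        if isinstance(value, dict) and not isinstance(value, CaseDict):
            value = CaseDict(value)
        super().__setitem__(key.lower(), value)
```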

In any case, I will probably merge this change very soon and will just deal with any potential fallout.

@marshallward
Owner

I've pushed this change (907aa4c)

@marshallward
Owner

Hoping this has been resolved, I will close this for now but feel free to re-open if it's not sufficient.

As mentioned in this issue, there are some other obvious performance bottlenecks, but probably best to deal with them separately.
