
Slowdown from 1.0.2 to 1.1 #99

Closed
jacobwilliams opened this issue Jun 9, 2019 · 9 comments

Comments

@jacobwilliams

It seems like v1.1 is ~10% slower parsing a large namelist file than v1.0.2. My test case is the file here. I haven't done extensive testing with other files though.
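A ~10% slowdown like this is easiest to confirm with a best-of-N timing run against both versions in separate environments. A minimal sketch of such a harness (the `f90nml.read` call and `test.nml` path in the usage note are assumptions from this thread; any parse callable works):

```python
import timeit

def time_parse(parse, path, repeats=5):
    """Best-of-N wall-clock time (seconds) for a single parse(path) call.

    Taking the minimum over several runs reduces noise from other
    processes competing for the CPU.
    """
    return min(timeit.repeat(lambda: parse(path), number=1, repeat=repeats))

# Usage sketch (assumes f90nml is installed):
#   import f90nml
#   print(time_parse(f90nml.read, 'test.nml'))
```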

@marshallward
Owner

marshallward commented Jun 9, 2019 via email

@marshallward
Owner

marshallward commented Jun 9, 2019 via email

@jacobwilliams
Author

It's the test.nml one in that directory. The others may also have the same issue but I haven't checked.

@marshallward
Owner

marshallward commented Jun 9, 2019

These seem to be the bottlenecks:

test.nml

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290     73023    4138229.0     56.7     17.6              toks = tokenizer.parse(line)
...
   357     72688      69443.0      1.0      0.3                      v_name, v_values = self._parse_variable(
   358     72688      61842.0      0.9      0.3                          g_vars,
   359     72688    9365743.0    128.8     39.9                          patch_nml=grp_patch
   360                                                               )
...
   397       112    8340883.0  74472.2     35.5                      nmls[g_name] = g_update

test4.nml

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   290     73023    7421571.0    101.6      5.6              toks = tokenizer.parse(line)
...
   357     72688      71568.0      1.0      0.1                      v_name, v_values = self._parse_variable(
   358     72688      59002.0      0.8      0.0                          g_vars,
   359     72688   23687196.0    325.9     18.0                          patch_nml=grp_patch
   360                                                               )
...
   397       112   95230652.0 850273.7     72.5                      nmls[g_name] = g_update

The first two are more or less understood bottlenecks (too much small IO and single-byte or single-token parsing) and need an overall rethink. But the third one looks new.
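(The per-line numbers above come from line_profiler's `@profile` decorator run under `kernprof`. For a coarser function-level view using only the standard library, something like this sketch with `cProfile` also works; the `profile_call` helper is hypothetical, not part of f90nml:)

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return a text report of the top ten
    entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(10)
    return buf.getvalue()

# Usage sketch (assumes f90nml is installed):
#   import f90nml
#   print(profile_call(f90nml.read, 'test.nml'))
```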

There have been some changes in __setitem__, example profile in test4.nml:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   109                                               @profile
   110                                               def __setitem__(self, key, value):
   111                                                   """Case-insensitive interface to OrderedDict.
   112                                           
   113                                                   Python dict inputs to the Namelist, such as derived types, are also
   114                                                   converted into Namelists.
   115                                                   """
   116   5462208    2941308.0      0.5      6.6          if isinstance(value, dict) and not isinstance(value, Namelist):
   117                                                       value = Namelist(value,
   118                                                                        default_start_index=self.default_start_index)
   119                                           
   120   5462208   14101689.0      2.6     31.5          elif is_nullable_list(value, dict):
   121   5368630    3248339.0      0.6      7.2              for i, v in enumerate(value):
   122   2687423    1012680.0      0.4      2.3                  if v is not None:
   123   2687423     968887.0      0.4      2.2                      value[i] = Namelist(
   124   2687423     895073.0      0.3      2.0                          v,
   125   2687423   16359632.0      6.1     36.5                          default_start_index=self.default_start_index
   126                                                               )
   127                                                           else:
   128                                                               value[i] = None
   129                                           
   130   5462208    5306041.0      1.0     11.8          super(Namelist, self).__setitem__(key.lower(), value)

The is_nullable_list calls, and the conditional Namelist() constructors, seem to be new and a real problem. The changes date to around Jan 2018, about 1.5 years ago. That is around when 1.0.2 was released, though it seems these changes were slipped in for that release.

So I am not sure the regression is exactly due to the 1.0.2 → 1.1 change, but it is certainly from around that time, and it does seem new. I will keep looking.

@marshallward
Owner

marshallward commented Jun 9, 2019

test.nml and test4.nml seem largely similar in size; the biggest difference seems to be that test4.nml creates nearly 3M Namelist objects, whereas test.nml never even gets to this point. (It always returns false on the second logical check of is_nullable_list, and thus never even runs the additional all() test.)
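For reference, the short-circuit described above follows from the shape of the check. This is only a sketch of `is_nullable_list` consistent with the behavior described in this thread; the actual implementation in f90nml may differ in detail:

```python
def is_nullable_list(value, vtype):
    """True if value is a list of vtype instances, possibly containing None.

    The any() check short-circuits to False for lists with no vtype
    entries at all, so the more expensive all() scan never runs then.
    """
    return (isinstance(value, list)
            and any(isinstance(v, vtype) for v in value)
            and all(isinstance(v, vtype) or v is None for v in value))
```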

I can't recall the specifics of this yet, but I believe it was mostly to convert internal dicts into Namelists, to avoid the many problems related to dict-Namelist conversion and interaction.

It's possible these problems don't exist anymore; this is overall a much safer operation nowadays. It's also possible that this should be interpreted as a null operation if the input is already a Namelist.

Again, will keep looking...

@marshallward
Owner

Confirmed that this block of code is making your case about 6x slower:

        elif is_nullable_list(value, dict):
            for i, v in enumerate(value):
                if v is not None:
                    value[i] = Namelist(
                        v,
                        default_start_index=self.default_start_index
                    )
                else:
                    value[i] = None

This ensures that a potential list of dict types is properly converted into derived-type Namelist types. But it is probably far too aggressive, and there is probably a simpler way to achieve this.

Completely removing the block only breaks one of my 80+ tests, so there's probably a lot of room for improvement here.

@marshallward
Owner

marshallward commented Jul 17, 2019

This change appears to work, and it matches the timing I get if I just remove the code block:

        elif is_nullable_list(value, dict):
            for i, v in enumerate(value):
                if v is None:
                    value[i] = None
                elif isinstance(v, Namelist):
                    # Assume Namelists do not need to be recursively validated.
                    value[i] = v
                else:
                    # value is a non-Namelist dict
                    value[i] = Namelist(
                        v,
                        default_start_index=self.default_start_index
                    )

It also passes my tests.

This removes the redundant passing of a Namelist through its own constructor, which is generally a good idea, but it also means we are not validating that any internal dicts are also Namelists. I don't think this is a problem, since each sub-namelist ought to validate this within its own __setitem__, but surprises abound.
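The passthrough principle can be illustrated with a toy stand-in (hypothetical names, not the f90nml code): a dict subclass that converts plain dicts on assignment but leaves instances of itself untouched, so nothing is redundantly reconstructed:

```python
from collections import OrderedDict

class CaseDict(OrderedDict):
    """Toy stand-in for Namelist: lowercases keys and converts nested dicts."""

    def __setitem__(self, key, value):
        # Convert plain dicts; a CaseDict passes through unchanged,
        # avoiding a redundant trip through its own constructor.
        if isinstance(value, dict) and not isinstance(value, CaseDict):
            value = CaseDict(value)
        super().__setitem__(key.lower(), value)
```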

In any case, I will probably merge this change very soon and will just deal with any potential fallout.

@marshallward
Owner

I've pushed this change (907aa4c)

@marshallward
Owner

Hoping this has been resolved, I will close this for now but feel free to re-open if it's not sufficient.

As mentioned in this issue, there are some other obvious performance bottlenecks, but probably best to deal with them separately.
