
enhancement: Use BTreeMap instead of HashMap for logs and metrics #1838

Merged · 7 commits into master from btree-map-for-logs-and-metrics · Feb 21, 2020

Conversation

@ghost commented Feb 18, 2020

Closes #1428, closes #1439. Expected to be merged after #1836.

This PR changes the map type used for logs and metrics from HashMap to BTreeMap. It does so for both logs and metrics at once because this keeps the changes to Protobuf deserialization as simple as possible (the default map type is switched from HashMap to BTreeMap for all Protobuf structures in build.rs).
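
For reference, the build.rs change described above is roughly of this shape (a minimal sketch assuming prost-build's Config::btree_map option; the proto paths below are placeholders, not the actual files in the repository):

```rust
// build.rs (sketch)
fn main() {
    let mut config = prost_build::Config::new();
    // Generate BTreeMap instead of the default HashMap for every
    // `map<...>` field; the path "." matches all message types.
    config.btree_map(&["."]);
    config
        .compile_protos(&["proto/event.proto"], &["proto/"])
        .expect("failed to compile protobuf definitions");
}
```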

I wanted to introduce dedicated type aliases for BTreeMap<Atom, String> in log events and BTreeMap<String, String> in metrics, as proposed in #1662 (comment), but that would still leave multiple mentions of BTreeMap across the codebase. Because of that, I think such a replacement is better done as a separate issue, probably as part of the Tech-debt payment #1 milestone, rather than in this PR, to keep things scoped.

See #1662 (comment) for details.
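
For illustration, the aliases mentioned above could look roughly like this (a hypothetical sketch with made-up names; the Atom import is assumed to match the interned string type already used for log keys):

```rust
use std::collections::BTreeMap;

use string_cache::DefaultAtom as Atom; // assumption: the existing Atom key type

// Hypothetical aliases so that BTreeMap is spelled out in one place only.
pub type LogFieldMap = BTreeMap<Atom, String>;
pub type MetricTagMap = BTreeMap<String, String>;
```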

@binarylogic (Contributor)

Would it make sense to add tests here that assert the output is sorted in some way? For example, the console sink should be sorting keys now, correct? Maybe this should be added as a behavior test?

@Hoverbear (Contributor) left a review comment

Overall the code looks good.

I'm not entirely clear on why adding ValueMap and ValueArray is necessary as part of this change, but I feel like I'm missing some context around that. (Edit: I see now why, in #1428.)

Having a test for sorted-ness would be useful as one of us may try to get clever one day and optimize things, breaking the sorting.

@ghost (Author) commented Feb 18, 2020

@binarylogic

Would it make sense to add tests here that assert the output is sorted in some way? For example, the console sink should be sorting keys now, correct? Maybe this should be added as a behavior test?

I've added unit tests which check the serialization order for the console sink and the iteration order for fields in log events. The behavior tests are currently not able to check the field order.
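
For reference, the ordering checks are conceptually along these lines (a standalone sketch using serde_json, not the exact tests added in this PR):

```rust
#[cfg(test)]
mod ordering_tests {
    use std::collections::BTreeMap;

    // Keys inserted out of order should come back sorted, both when
    // iterating over the map and when serializing it to JSON.
    #[test]
    fn keys_are_sorted() {
        let mut fields = BTreeMap::new();
        fields.insert("zebra", "3");
        fields.insert("apple", "1");
        fields.insert("mango", "2");

        let keys: Vec<_> = fields.keys().copied().collect();
        assert_eq!(keys, vec!["apple", "mango", "zebra"]);

        let json = serde_json::to_string(&fields).unwrap();
        assert_eq!(json, r#"{"apple":"1","mango":"2","zebra":"3"}"#);
    }
}
```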

@Hoverbear

I'm not entirely clear on why adding ValueMap and ValueArray is necessary as part of this change, but I feel like I'm missing some context around that. (Edit: I see now why, in #1428.)

The first commit of this PR, 1cc021b, is supposed to be merged as #1836, and then this PR will be rebased on top of master. It contains 1cc021b only so as not to be blocked waiting for #1836 to be merged.

@bruceg (Member) commented Feb 19, 2020

Out of curiosity, has anybody benchmarked how this might affect performance?

@binarylogic (Contributor)

Excellent question! This is a perfect PR for trying out #1416.

@binarylogic (Contributor)

I'm also curious if we should be comparing the /benches results?

@LucioFranco (Contributor) commented Feb 19, 2020

I'll run some benchmarks, but it looks like some of our benches currently don't run, so I will figure that out and then check against this branch. I should have a decent machine for benching now...

@LucioFranco (Contributor) commented Feb 19, 2020

Overall the benches look very positive. My system is somewhat noisy, but these are macro benchmarks anyway:

pipe                    time:   [147.74 ms 152.41 ms 157.02 ms]
                        thrpt:  [60.734 MiB/s 62.571 MiB/s 64.549 MiB/s]
                 change:
                        time:   [-25.235% -21.497% -17.903%] (p = 0.00 < 0.05)
                        thrpt:  [+21.807% +27.384% +33.752%]
                        Performance has improved.

Benchmarking pipe_with_tiny_lines: Warming up for 3.0000 s
pipe_with_tiny_lines    time:   [109.46 ms 114.05 ms 118.78 ms]
                        thrpt:  [822.14 KiB/s 856.27 KiB/s 892.17 KiB/s]
                 change:
                        time:   [-24.639% -20.734% -16.496%] (p = 0.00 < 0.05)
                        thrpt:  [+19.755% +26.158% +32.694%]
                        Performance has improved.

Benchmarking pipe_with_huge_lines: Warming up for 3.0000 s
pipe_with_huge_lines    time:   [717.95 ms 723.49 ms 726.37 ms]
                        thrpt:  [262.59 MiB/s 263.63 MiB/s 265.67 MiB/s]
                 change:
                        time:   [+1.2460% +2.3198% +3.3286%] (p = 0.00 < 0.05)
                        thrpt:  [-3.2214% -2.2672% -1.2306%]
                        Change within noise threshold.

Benchmarking pipe_with_many_writers: Warming up for 3.0000 s
pipe_with_many_writers  time:   [299.94 ms 302.09 ms 304.37 ms]
                        thrpt:  [31.333 MiB/s 31.569 MiB/s 31.795 MiB/s]
                 change:
                        time:   [+1.6253% +4.0722% +7.1013%] (p = 0.00 < 0.05)
                        thrpt:  [-6.6305% -3.9129% -1.5993%]
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmarking interconnected: Warming up for 3.0000 s
interconnected          time:   [237.65 ms 240.24 ms 244.07 ms]
                        thrpt:  [78.146 MiB/s 79.395 MiB/s 80.257 MiB/s]
                 change:
                        time:   [+6.3640% +8.4617% +10.522%] (p = 0.00 < 0.05)
                        thrpt:  [-9.5202% -7.8016% -5.9832%]
                        Performance has regressed.

Benchmarking transforms: Warming up for 3.0000 s
transforms              time:   [310.74 ms 316.92 ms 323.36 ms]
                        thrpt:  [32.442 MiB/s 33.101 MiB/s 33.759 MiB/s]
                 change:
                        time:   [-21.314% -19.847% -18.331%] (p = 0.00 < 0.05)
                        thrpt:  [+22.446% +24.761% +27.087%]
                        Performance has improved.

@LucioFranco (Contributor) left a review comment

Overall, looks fantastic! Benches look good, just a few questions, otherwise LGTM!

@bruceg (Member) commented Feb 20, 2020

Overall the benches look very positive. My system is somewhat noisy, but these are macro benchmarks anyway:

They do look positive, surprisingly so (at least to me). I'm wondering if these benches have a small set of keys that would end up fitting in a single B-tree node and so don't actually stress the harder cases much. I see the transforms bench at least adds a couple more fields. The regex benchmark parses a lot of fields out of the message, but that's just as likely to stress the regex crate as our code. Do you have numbers for that one too? If not, I'll check myself.

@LucioFranco (Contributor)

@bruceg Yeah, B for Rust's BTreeMap is 6, so I think pushing events to contain more fields than that, enough to force a node split, could be good.
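
A rough sketch of the kind of event map that would exceed a single node (assuming the standard library's B = 6, i.e. at most 2·B − 1 = 11 entries per node, which is an internal detail and may change):

```rust
use std::collections::BTreeMap;

// With at most 11 entries per node, 16 fields guarantee at least one
// node split, which the current benches may not be exercising.
fn many_field_event() -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    for i in 0..16 {
        fields.insert(format!("field_{:02}", i), format!("value_{}", i));
    }
    fields
}

fn main() {
    let event = many_field_event();
    assert_eq!(event.len(), 16);
    // Iteration is in lexicographic key order regardless of insertion order.
    assert_eq!(event.keys().next().map(String::as_str), Some("field_00"));
}
```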

@bruceg (Member) commented Feb 20, 2020

The regex benchmark adds 10 fields, so that's a start. I guess many simple uses of Vector will involve just a few fields. I was just thinking of sources like journald, which can have 25-30 fields per record.

@bruceg (Member) commented Feb 20, 2020

I'm also uncertain why this is marked as a breaking change. What user-facing behavior is being changed here that would break existing uses? I agree this changes our internal API, but I think we have reserved the breaking-change indicator for things that change user-visible behavior in ways users have to compensate for (i.e. config fields changing, etc.).

@Hoverbear (Contributor)

@a-rodin did you have a chance to look at IndexMap? https://docs.rs/indexmap/1.3.2/indexmap/ It also looks suitable for this.

@Hoverbear (Contributor) commented Feb 20, 2020

While this is a breaking change (we're changing the output order of fields), JSON is by its nature unordered, and the internal ordering is left up to the implementer. It may be worth noting to users that the ordering may change in the future.

We can also consider that users may be using text or, in the future, protobufs or some other data type.

@LucioFranco (Contributor)

There were no ordering guarantees before, so adding them now shouldn't be a breaking change?

@ghost (Author) commented Feb 20, 2020

I'm also uncertain why this is marked as a breaking change. What user-facing behavior is being changed here that would break existing uses? I agree this changes our internal API, but I think we have reserved the breaking-change indicator for things that change user-visible behavior in ways users have to compensate for (i.e. config fields changing, etc.).

There were no ordering guarantees before, so adding them now shouldn't be a breaking change?

My reasoning was that it is observable in principle, but since we didn't have any guarantees about it, it doesn't break anything.

@ghost changed the title from "enhancement!: Use BTreeMap instead of HashMap for logs and metrics" to "enhancement: Use BTreeMap instead of HashMap for logs and metrics" on Feb 20, 2020.
@ghost (Author) commented Feb 20, 2020

@Hoverbear

@a-rodin did you have a chance to look at IndexMap? https://docs.rs/indexmap/1.3.2/indexmap/ It also looks suitable for this.

We discussed it, but the main issue is that prost doesn't support IndexMap. See #1428 (comment).

@bruceg (Member) commented Feb 20, 2020

I ran the benchmarks (or at least I tried, see below) and discovered that either we don't have good benchmarks or they are very susceptible to external noise. I set up an EC2 instance (m5.2xlarge, 8-thread CPU, 32 GB RAM) with nothing else running. I noticed that one of the benchmarks regressed by almost 20% (lua_add_fields/lua), so I reran it. Here are some of the results of running it in a loop:

lua_add_fields/lua      time:   [275.77 ms 283.80 ms 292.01 ms]
lua_add_fields/lua      time:   [266.07 ms 274.29 ms 282.18 ms]
lua_add_fields/lua      time:   [275.07 ms 285.47 ms 292.59 ms]
lua_add_fields/lua      time:   [300.52 ms 307.70 ms 311.71 ms]
lua_add_fields/lua      time:   [258.50 ms 264.29 ms 269.05 ms]
lua_add_fields/lua      time:   [290.66 ms 296.30 ms 304.42 ms]
lua_add_fields/lua      time:   [260.00 ms 268.36 ms 279.30 ms]

The estimate of our run time varied from 264.29 ms to 307.70 ms, which swamps most estimates of improvement or regression. Increasing the measurement time to 180 s (more than 10x the default) doesn't help much:

lua_add_fields/lua      time:   [273.25 ms 274.81 ms 276.60 ms]
lua_add_fields/lua      time:   [272.05 ms 274.54 ms 276.68 ms]
lua_add_fields/lua      time:   [283.73 ms 285.96 ms 289.96 ms]
lua_add_fields/lua      time:   [281.14 ms 284.54 ms 287.78 ms]
lua_add_fields/lua      time:   [286.95 ms 289.36 ms 291.80 ms]

So, what do we know? This change likely doesn't have a massive impact on performance. I don't think we should hold up this needed change to wait for us to figure out what the actual impact is. We need a better benchmarking system to be able to prove anything. Benchmarking is hard. :(

(Note that this is when running 6dc3823 cherry-picked on top of the last commit that would run benchmarks. I might have handled the merge conflicts wrong, but that shouldn't affect the reliability of the performance numbers. See #1872.)
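
For anyone reproducing this, the longer measurement window can be set through Criterion's group configuration; here is a minimal sketch (the benchmark name and body are placeholders, not the actual lua_add_fields benchmark):

```rust
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder benchmark body standing in for lua_add_fields/lua.
fn bench_lua_add_fields(c: &mut Criterion) {
    c.bench_function("lua_add_fields/lua", |b| {
        b.iter(|| {
            // ... run the transform under test ...
        })
    });
}

criterion_group! {
    name = benches;
    // Raise the measurement window from the default 5 s to 180 s.
    config = Criterion::default().measurement_time(Duration::from_secs(180));
    targets = bench_lua_add_fields
}
criterion_main!(benches);
```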

@ghost mentioned this pull request on Feb 20, 2020.
@ghost (Author) commented Feb 21, 2020

I set up an EC2 instance (m5.2xlarge, 8-thread CPU, 32 GB RAM) with nothing else running.

Just a thought: aren't EC2 instances intrinsically noisy because the underlying hardware is shared between multiple VMs? I think that to make really precise benchmarks it is better to run them on bare metal.

@ghost (Author) commented Feb 21, 2020

Because #1838 (comment) and #1838 (comment) suggest that there were no blocking performance regressions, and both #1838 (comment) and #1838 (comment) are now resolved, I'm merging this.

@ghost merged commit 17340e2 into master on Feb 21, 2020.
@LucioFranco (Contributor)

@bruceg I would say that, in my experience, some of our benchmarks can show variance, but in general I try to rely on the pipe-based ones and the HTTP ones. Those have always been super consistent for me over multiple runs.

Also, one thing to note: m5s, from what I understand, are not the right machines, since their CPUs will be somewhat noisy. c5s do a much better job of not being noisy, and that's generally what we've used in the past to run benchmarks.

@bruceg deleted the btree-map-for-logs-and-metrics branch on February 21, 2020 at 18:08.
Successfully merging this pull request may close these issues:
- Consider ordered data structure for metric tags
- Sorted json keys