@@ -10,21 +10,24 @@ For an 80MB gzipped log file containing 915,427 JSON event objects (which is 1.0
10
10
- 5.0 seconds total to also parse every JSON object into a Rust struct
- 7.8 seconds total to further parse every User Agent field for Bundler, RubyGems, and Ruby versions and other metrics

This is... very good. For comparison, a Python script that used AWS Glue to do something similar took about _30 minutes_. My first approach of writing a `nom` parser-combinator to parse the User Agent field, instead of using a regex, took 18.7 seconds. Processing a gigabyte of almost a million JSON objects into useful histograms in less than 8 seconds just blows my mind. But then I figured out how to use Rayon, and now it can parse 8 gzipped log files in parallel on an 8-core MacBook Pro, and that's super fast.
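
To make the regex approach concrete, here's a minimal sketch of pulling those versions out of a User Agent string with the `regex` crate. The `UaVersions` struct, the exact pattern, and the sample agent string are illustrative assumptions, not this repo's actual code:

```rust
use regex::Regex;

/// Versions pulled out of one User Agent string (illustrative struct,
/// not this repo's actual types).
#[derive(Debug, Default)]
struct UaVersions {
    bundler: Option<String>,
    rubygems: Option<String>,
    ruby: Option<String>,
}

fn parse_user_agent(re: &Regex, ua: &str) -> UaVersions {
    // Named capture groups keep the extraction readable; a non-matching
    // agent just yields an empty struct.
    match re.captures(ua) {
        Some(caps) => UaVersions {
            bundler: caps.name("bundler").map(|m| m.as_str().to_string()),
            rubygems: caps.name("rubygems").map(|m| m.as_str().to_string()),
            ruby: caps.name("ruby").map(|m| m.as_str().to_string()),
        },
        None => UaVersions::default(),
    }
}

fn main() {
    // Compile the regex once and reuse it for every record; recompiling
    // per record is the easiest way to make the regex approach slow.
    let re = Regex::new(
        r"bundler/(?P<bundler>\S+) rubygems/(?P<rubygems>\S+) ruby/(?P<ruby>\S+)",
    )
    .unwrap();
    let ua = "bundler/1.16.2 rubygems/2.7.6 ruby/2.5.1 (x86_64-apple-darwin17)";
    println!("{:?}", parse_user_agent(&re, ua));
}
```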
### Wait, _how_ fast?
- ~525 records/second/cpu in Python on AWS Glue
- ~300,000 records/second/cpu in Rust with regex
### Are you kidding me?
No. The latest version (which I am now benchmarking without also running `cargo build` 🤦🏻‍♂️) can parse records really, really fast.

- ~4,200 records/second in Python with 8 worker instances on AWS Glue
- ~1,085,000 records/second in Rust with 8 cores and rayon on a MacBook Pro
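
As a rough sketch of where that parallelism comes from: one rayon task per gzipped file, so eight files saturate eight cores. This assumes the `rayon` and `flate2` crates, and `parse_record` is a hypothetical stand-in for the real JSON and User Agent work:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

use flate2::read::GzDecoder;
use rayon::prelude::*;

// Hypothetical stand-in for the real per-record work (JSON + UA parsing).
fn parse_record(_line: &str) -> u64 {
    1
}

fn main() {
    let paths: Vec<String> = std::env::args().skip(1).collect();

    // One file per rayon task; with 8 files and 8 cores, each core
    // streams and parses one gzipped log end to end.
    let total: u64 = paths
        .par_iter()
        .map(|path| {
            let file = File::open(path).expect("open log file");
            let reader = BufReader::new(GzDecoder::new(file));
            reader
                .lines()
                .map(|line| parse_record(&line.expect("read line")))
                .sum::<u64>()
        })
        .sum();

    println!("parsed {} records", total);
}
```

Keeping the unit of work at "one whole file per core" avoids any cross-thread coordination inside the hot loop, which is why the per-core throughput holds up as cores are added.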
### What does it calculate?
It counts Bundler, RubyGems, and Ruby versions, in hourly buckets, and prints those out as nested JSON to stdout.
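
A minimal sketch of that output shape, assuming `serde_json`: nested maps keyed by hour, field, and version serialize directly to nested JSON objects. The bucket format and key names are illustrative, not the actual schema:

```rust
use std::collections::BTreeMap;

// hour bucket -> field -> version -> count; all names illustrative.
type Histogram = BTreeMap<String, BTreeMap<String, BTreeMap<String, u64>>>;

fn main() {
    let mut hourly: Histogram = BTreeMap::new();

    // Recording one parsed record: bump a counter per version seen that hour.
    let bucket = hourly.entry("2018-10-25T14:00:00Z".to_string()).or_default();
    for (field, version) in [("bundler", "1.16.2"), ("rubygems", "2.7.6"), ("ruby", "2.5.1")] {
        *bucket
            .entry(field.to_string())
            .or_default()
            .entry(version.to_string())
            .or_insert(0) += 1;
    }

    // Nested maps serialize directly as nested JSON objects on stdout.
    println!("{}", serde_json::to_string_pretty(&hourly).unwrap());
}
```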
### Tell me more about how this happened.
Okay, I wrote [a blog post with details about creating this library](https://andre.arko.net/2018/10/25/parsing-logs-230x-faster-with-rust/).