
Commit 7a70eb0

update readme for micro-opts and fixed benchmarks
1 parent c5cc1d1

1 file changed (+9 -6 lines)


README.md

@@ -10,21 +10,24 @@ For an 80MB gzipped log file containing 915,427 JSON event objects (which is 1.0
 - 5.0 seconds total to also parse every JSON object into a Rust struct
 - 7.8 seconds total to further parse every User Agent field for Bundler, RubyGems, and Ruby versions and other metrics
 
-This is... very good. For comparison, a Python script that used AWS Glue to do something similar took about _30 minutes_. My first approach of writing a `nom` parser-combinator to parse the User Agent field, instead of using a regex, took 18.7 seconds. Processing a gigabyte of almost a million JSON objects into useful histograms in less than 8 seconds just blows my mind. But then I figured out how to use Rayon, and now if you give it 8 gzipped log files on an 8-core MacBook Pro, it can parse 399,300 JSON objects per second.
+This is... very good. For comparison, a Python script that used AWS Glue to do something similar took about _30 minutes_. My first approach of writing a `nom` parser-combinator to parse the User Agent field, instead of using a regex, took 18.7 seconds. Processing a gigabyte of almost a million JSON objects into useful histograms in less than 8 seconds just blows my mind. But then I figured out how to use Rayon, and now it can parse 8 gzipped log files in parallel on an 8-core MacBook Pro, and that's super fast.
 
 ### Wait, _how_ fast?
 
 ~525 records/second/cpu in Python on AWS Glue
-50,534 records/second/cpu in Rust with nom
-121,153 records/second/cpu in Rust with regex
+~300,000 records/second/cpu in Rust with regex
 
 ### Are you kidding me?
 
-No. It gets even better if you have multiple cores.
+No. The latest version (which I am now benchmarking without also running `cargo build` 🤦🏻‍♂️) can parse records really, really fast.
 
 ~4,200 records/second in Python with 8 worker instances on AWS Glue
-399,300 records/second in Rust with 8 cores and rayon on a MacBook Pro
+~1,085,000 records/second in Rust with 8 cores and rayon on a MacBook Pro
 
 ### What does it calculate?
 
-It counts Bundler, RubyGems, and Ruby versions, in hourly buckets, and prints those out as nested JSON to stdout.
+It counts Bundler, RubyGems, and Ruby versions, in hourly buckets, and prints those out as nested JSON to stdout.
+
+### Tell me more about how this happened.
+
+Okay, I wrote [a blog post with details about creating this library](https://andre.arko.net/2018/10/25/parsing-logs-230x-faster-with-rust/).
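
To make the numbers in the updated README concrete, here is a minimal sketch of the kind of pipeline it describes: decompress each gzipped log, parse each line as JSON, pull Bundler/RubyGems/Ruby versions out of the User Agent with a regex, count them into hourly buckets, and let Rayon process one file per core before merging. Only `rayon` and `regex` are named in the README; the `flate2` and `serde_json` crates, the field names (`timestamp`, `user_agent`), and the User Agent shape are assumptions for illustration, not the library's actual code.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

use flate2::read::GzDecoder;
use rayon::prelude::*;
use regex::Regex;

// hour bucket -> "tool/version" -> count
type Histogram = HashMap<String, HashMap<String, u64>>;

fn count_versions(path: &str, re: &Regex) -> Histogram {
    let mut counts = Histogram::new();
    let file = File::open(path).expect("open log file");
    let reader = BufReader::new(GzDecoder::new(file));
    for line in reader.lines().flatten() {
        // Skip malformed records instead of aborting the whole file.
        let event: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue,
        };
        // Assumed schema: an ISO-8601 "timestamp" field, truncated to the
        // hour ("2018-10-25T14"), and a "user_agent" field.
        let hour: String = event["timestamp"].as_str().unwrap_or("").chars().take(13).collect();
        if let Some(ua) = event["user_agent"].as_str() {
            if let Some(caps) = re.captures(ua) {
                let bucket = counts.entry(hour).or_default();
                for (i, tool) in ["bundler", "rubygems", "ruby"].iter().enumerate() {
                    *bucket.entry(format!("{}/{}", tool, &caps[i + 1])).or_insert(0) += 1;
                }
            }
        }
    }
    counts
}

fn main() {
    // Hypothetical User Agent shape: "bundler/1.16.2 rubygems/2.7.6 ruby/2.5.1 ...".
    let re = Regex::new(r"bundler/(\S+) rubygems/(\S+) ruby/(\S+)").unwrap();
    let files: Vec<String> = std::env::args().skip(1).collect();

    // One Rayon task per file; each builds a private histogram, and the
    // partial histograms are merged afterwards, so the hot loop needs no locks.
    let merged = files
        .par_iter()
        .map(|path| count_versions(path, &re))
        .reduce(Histogram::new, |mut acc, partial| {
            for (hour, versions) in partial {
                let bucket = acc.entry(hour).or_default();
                for (key, n) in versions {
                    *bucket.entry(key).or_insert(0) += n;
                }
            }
            acc
        });

    // Nested JSON to stdout, as described under "What does it calculate?".
    println!("{}", serde_json::to_string_pretty(&merged).unwrap());
}
```

Keeping each file's histogram thread-local and merging at the end is what makes the per-core throughput scale across 8 files on 8 cores: the parallel phase shares nothing, so no locking is needed until the cheap final reduce.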
