measuring performance against tungstenite-rs #302

tdudz · 2022-08-30T23:24:49Z

I'm working on a project using tungstenite right now, and after profiling my code I noticed it's doing a lot of allocations, so I'm looking to replace it with something faster and more lightweight. I put together a quick benchmark of tungstenite and websocket-lite clients reading a bunch of JSON from a simple server and compared the results; the difference was not as large as I had hoped. Do you have any benchmarks of your own I can compare against? Or maybe my benchmark was not representative of websocket-lite's actual performance?

Criterion code, which I ran with cargo bench:

use std::{
    thread::{sleep, spawn},
    time::Duration,
};

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tungstenite::connect;
use websocket_lite::ClientBuilder;
use ws_benchmark_experiment::{server, ADDR};

pub fn criterion_benchmark(c: &mut Criterion) {
    spawn(server);

    let mut client_tungstenite = connect(format!("ws://{}", ADDR)).unwrap().0;
    let mut client_lite = ClientBuilder::new(&format!("ws://{}", ADDR))
        .unwrap()
        .connect()
        .unwrap();

    sleep(Duration::from_secs(1));

    let mut group = c.benchmark_group("read latency");

    group.bench_function("tungstenite", |b| {
        b.iter(|| {
            let msg = client_tungstenite.read_message().unwrap();
            black_box(msg);
        })
    });
    group.bench_function("websocket_lite", |b| {
        b.iter(|| {
            let msg = client_lite.receive().unwrap();
            black_box(msg);
        })
    });
    // Finish the group explicitly so criterion emits its summary for it.
    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

// Library crate imported by the benchmark above as ws_benchmark_experiment.
use std::{net::TcpListener, thread::spawn};

use tungstenite::{accept, Message};

pub static ADDR: &'static str = "127.0.0.1:9119";
pub static ITERATIONS: u32 = 10_000_000;
pub static MESSAGE_SIZE: usize = 100;

pub fn server() {
    let server = TcpListener::bind(ADDR).unwrap();

    for stream in server.incoming() {
        // One thread per client, each flooding its connection with the same message.
        spawn(move || {
            let mut websocket = accept(stream.unwrap()).unwrap();
            // Build the payload once; every send clones it.
            let payload = Message::Text(
                r#"{ "foo": "bar", "baz": 123, "quux": false }"#.repeat(MESSAGE_SIZE),
            );

            // Keep sending until the client disconnects, rather than ignoring write errors.
            while websocket.write_message(payload.clone()).is_ok() {}
        });
    }
}


1tgr · 2022-08-31T19:01:44Z

I think the websocket-lite benchmark should be ok. You should expect zero memory allocations from the loop being profiled (that is, after the initial client connect). When developing websocket clients I used https://github.com/KDE/heaptrack to verify this manually. (The idea is that every Message struct has a reference-counted reference back to a single buffer owned by the codec.)

I'll try to find time to run the benchmark through heaptrack and check for any rogue memory allocations.
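
For anyone who wants to check the zero-allocation claim from inside the benchmark process rather than with heaptrack, a counting global allocator is one option. The sketch below is an assumption-laden illustration, not something either library ships: `CountingAllocator` and `allocations_during` are hypothetical names, and it counts only allocations made on the calling thread.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread allocation counter. It is const-initialised and has no
    // destructor, so touching it from inside the allocator does not allocate.
    static ALLOCATIONS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper around the system allocator that counts
/// allocation and reallocation calls made on each thread.
pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCATIONS.try_with(|n| n.set(n.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let _ = ALLOCATIONS.try_with(|n| n.set(n.get() + 1));
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns how many allocations the current thread made while it ran.
pub fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.with(|n| n.get());
    f();
    ALLOCATIONS.with(|n| n.get()) - before
}
```

Wrapping a few thousand calls to `client_lite.receive()` or `client_tungstenite.read_message()` from the benchmark above in `allocations_during` should then give a per-library allocation count for the read path, with the caveat that allocations made on other threads are not seen.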

smabie · 2022-10-10T17:48:43Z

I guess the bigger point is: if the performance of tungstenite and websocket-lite isn't significantly different, then what's the point of this project?

Perhaps something is off with the benchmark?

1tgr · 2022-10-10T21:43:08Z

This is an interesting result, thanks for putting together the benchmark.

The initial focus of websocket-lite was to have zero memory allocations after initialisation. Allocations were inherent in the tungstenite design at the time (you have to call into_text or into_vec, which consume the Message), and malloc and free still appear in the profile of the tungstenite benchmark, whereas they do not appear when calling websocket-lite.

Even without the overhead of memory allocations, parsing of messages in websocket-lite seems a little slower. Hopefully I can see if the two libraries are doing anything different.

1tgr · 2022-10-11T19:12:17Z

I think the timings are close because they're dominated by the network itself, even on localhost; that is, the recv() call shows up in the profiler as the hottest function.

In terms of elapsed time, the two libraries are close for me, with websocket-lite marginally faster when you include the recv() call (which is what the criterion output shows).

With recv() removed, my profiler shows websocket-lite being twice as fast as tungstenite. Apologies for not sharing the data; at home I'm using Instruments.app on macOS and I don't think I have the data in a form I can share easily.

In terms of context - the aim of websocket-lite is to give you a parsed message as quickly as possible, with predictable timings consistent from one message to the next. That is, if we can measure the interval between the last byte of the frame and the return of the client_lite.receive() function, it aims to minimise the mean and the variance of this interval.
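
To make the mean and variance goal concrete, here is a rough sketch of how per-call latency statistics could be collected. It uses only the standard library; `time_calls`, `latency_stats`, and `LatencyStats` are hypothetical helpers, not part of websocket-lite or criterion. Note the limitation: timing a whole call includes any time spent blocked in recv(), so this only approximates the "last byte of the frame to return" interval described above.

```rust
use std::time::Instant;

/// Summary of a set of latency samples, in microseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LatencyStats {
    pub count: usize,
    pub mean_us: f64,
    /// Population variance (divides by `count`, not `count - 1`).
    pub variance_us2: f64,
}

/// Calls `op` `iterations` times and records how long each call took, in microseconds.
pub fn time_calls<F: FnMut()>(iterations: usize, mut op: F) -> Vec<f64> {
    // Allocate the sample buffer up front so the timed loop itself does not grow it.
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        op();
        samples.push(start.elapsed().as_secs_f64() * 1e6);
    }
    samples
}

/// Computes the mean and population variance in one pass (Welford's algorithm).
/// Returns `None` when there are no samples.
pub fn latency_stats(samples_us: &[f64]) -> Option<LatencyStats> {
    if samples_us.is_empty() {
        return None;
    }
    let mut mean = 0.0;
    let mut m2 = 0.0; // running sum of squared deviations from the mean
    for (i, &x) in samples_us.iter().enumerate() {
        let delta = x - mean;
        mean += delta / (i as f64 + 1.0);
        m2 += delta * (x - mean);
    }
    Some(LatencyStats {
        count: samples_us.len(),
        mean_us: mean,
        variance_us2: m2 / samples_us.len() as f64,
    })
}
```

Passing a closure that calls `client_lite.receive()` (or the tungstenite equivalent) to `time_calls` and feeding the result to `latency_stats` would give one mean and variance per library to compare.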
