-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rust: implement data provider server with fake data #4355
Conversation
Summary: The data provider API uses `list_plugins` to populate the list of active dashboards. I forgot to add a corresponding RPC in #4314, since I’d hacked around it in my prototypes, but we do want to be able to implement this cleanly. Test Plan: It builds: `bazel build //tensorboard/data/proto:protos_all_py_pb2_grpc`. wchargin-branch: data-list-plugins-rpc wchargin-source: 6ab2a2acb154faac3310c4bac690554d7a9e1e74
Summary: The `//tensorboard/data/server` binary now provides a gRPC server that implements the `TensorBoardDataProvider` service. It only supports scalars, and it has entirely fake data. But it’s enough to be wired up to TensorBoard and show real charts in the UI. Test Plan: Run the server with `bazel run -c opt //tensorboard/data/server`. (Running with `-c opt` is not necessary, but the server is notably faster that way.) Then, in a different shell, use `grpc_cli`: ``` $ grpc_cli --channel_creds_type=insecure \ > --protofiles tensorboard/data/proto/data_provider.proto \ > call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null runs { name: "train" start_time: 1605752017 } ``` Interestingly, this takes 13.9 ± 4.9 ms on my machine, whereas the demo server added in #4318 took only 5.2 ± 2.6 ms. Both are basically doing no work on the server, so I suspect that the difference may be due to `grpc_cli` having to do more work to parse our real proto files. And, indeed, making the calls from Python instead takes only 0.8–1.1 ms. wchargin-branch: rust-real-fake-data-provider wchargin-source: 67e69e8512a96443568ca916dd9559f4457ddf27
tensorboard/data/server/main.rs
Outdated
type ReadBlobStream = | ||
Pin<Box<dyn Stream<Item = Result<data::ReadBlobResponse, Status>> + Send + Sync + 'static>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flagging this type signature, since it’s imposing and will probably
raise questions…
This Pin<T>
business is mostly to do with async code. In any function,
you can hold a variable on your stack, take a reference to it, and use
that reference:
fn foo() -> (i32, i32, i32) {
let mut xyz = (1, 2, 3);
let x = &mut xyz.0;
*x = 777;
xyz // returns (777, 2, 3)
}
But the defining characteristic of async functions is that they can be
suspended and resumed at await
points. So what happens if there’s an
await
point in there?
async fn foo(new_x: impl Future<Output = i32>) -> (i32, i32, i32) {
let mut xyz = (1, 2, 3);
let x = &mut xyz.0;
*x = new_x.await;
xyz
}
When we invoked foo
, it runs until the new_x.await
expression, then
suspends until the new_x
future resolves. When it suspends, its stack
frame is recorded and stashed away. Once new_x
resolves, the event
loop can resume the invocation of foo
by restoring the contents of its
stack frame. But the stack frame may be at a different physical address
now, which means that while xyz
is still valid (it’s just a 12-byte
block of memory with three integers in it), the reference x
will not
be valid, because it pointed into the old location of xyz
on the old
stack frame.
Yet if you try it out you’ll see that the above function
compiles and runs just fine. How can this be?
The trick is that the “stack frame” used for the async function is
pinned. A Pin<&mut T>
is a mutable reference to T
whose
value can never be moved until it is dropped, or unless it is
“move-safe”. A value is move-safe if it doesn’t have any references into
itself—so most types are move-safe, but the stack frame for our function
looks kind of like
struct FooFuture {
xyz: (i32, i32, i32),
x: &'self mut xyz,
}
where &'self
is made-up syntax for “reference into self”. The
technical name for “move-safe” is Unpin
.
So, you can read the ReadBlobStream
type definition as “a unique owned
pointer to a non-relocatable, dynamically dispatched stream, whose items
are either ReadBlobResponse
s or Status
es, where the stream itself
can be transferred across threads, can be shared among threads, and has
only static references (i.e., is permitted to live forever)”. A lot, to
be sure, but undeniably precise.
In practice, actually creating the stream is pretty straightforward
and doesn’t require any weird types.
I hope that this helps describe the problem that pinning is meant to
solve. I haven’t gone into the details of how exactly it works. IMHO,
pinning is the most complicated corner of Rust, and I can’t claim to
understand it entirely. But if you have a vague understanding of the
concepts and how they relate to memory layout, then the compiler errors
and documentation are generally good enough to guide you to a solution
that works. I expect that over time I’ll come to really grok it more
deeply. :-)
wchargin-branch: rust-real-fake-data-provider wchargin-source: f681817f2483ced7ea49b5757fad20ae4fbd0655
wchargin-branch: rust-real-fake-data-provider wchargin-source: 8b0ae806f174d87b71211f679fa67099b2a22062
wchargin-branch: rust-real-fake-data-provider wchargin-source: d19fc0c8657e93aad6f3b8a426cc6c4e87b92d44
wchargin-branch: rust-real-fake-data-provider wchargin-source: d19fc0c8657e93aad6f3b8a426cc6c4e87b92d44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a commit to add tests, which I had planned for later but decided
to add now in case it helps your review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading articles about Pin, Send, Static, things make sense. I was able to repro the successful test after building OSS grpc_cli as well.
tensorboard/data/server/server.rs
Outdated
fn block_on<F: Future>(future: F) -> F::Output { | ||
let mut rt = Runtime::new().unwrap(); | ||
rt.block_on(future) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought to self: rustc doesn't allow tests to be async fn
s, but I wonder if it will be allowed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. (I did try it.)
You should be able to write an async test yourself with a bit of
desugaring:
#[test]
fn test() {
async fn body() {
// put whatever you want here
}
let mut rt = Runtime::new().unwrap();
rt.block_on(body())
}
For #[test]
to support that natively, it’d need some way to get ahold
of a runtime. (Rust’s futures are runtime-agnostic.) I wouldn’t be
surprised to see #[tokio::test]
pop up for that, though.
…oh, look: #[tokio::test]
exists, and it does just that.
Neat. I guess that is a bit nicer, so I’ll switch over. Thanks!
} | ||
|
||
type ReadBlobStream = | ||
Pin<Box<dyn Stream<Item = Result<data::ReadBlobResponse, Status>> + Send + Sync + 'static>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation around this type (the explanation is longer than the PR itself!). I'm glad that creating a stream is simpler.
iiuc, the Send + Sync + 'static trait bounds all come directly from the type in the generated code from Prost in tensorboard.pb.rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The trait definition specifies
type ReadBlobStream: Stream<Item = Result<super::ReadBlobResponse, tonic::Status>>
+ Send
+ Sync
+ 'static;
which means that our trait implementation must provide a type
ReadBlobStream
that implements the Stream
trait with the given
Item
and also meets the Send + Sync + 'static
bounds.
wchargin-branch: rust-real-fake-data-provider wchargin-source: 298acfd7f14bdc2896f04ac37a87481673564e8f
wchargin-branch: rust-real-fake-data-provider wchargin-source: 298acfd7f14bdc2896f04ac37a87481673564e8f
Summary:
The
//tensorboard/data/server
binary now provides a gRPC server thatimplements the
TensorBoardDataProvider
service. It only supportsscalars, and it has entirely fake data. But it’s enough to be wired up
to TensorBoard and show real charts in the UI.
Test Plan:
Run the server with
bazel run -c opt //tensorboard/data/server
.(Running with
-c opt
is not necessary, but the server is notablyfaster that way.) Then, in a different shell, use
grpc_cli
:Interestingly, this takes 13.9 ± 4.9 ms on my machine, whereas the demo
server added in #4318 took only 5.2 ± 2.6 ms. Both are basically doing
no work on the server, so I suspect that the difference may be due to
grpc_cli
having to do more work to parse our real proto files. And,indeed, making the calls from Python instead takes only 0.8–1.1 ms.
wchargin-branch: rust-real-fake-data-provider