Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: implement data provider server with fake data #4355

Merged
merged 8 commits into from
Nov 24, 2020

Conversation

wchargin
Copy link
Contributor

Summary:
The //tensorboard/data/server binary now provides a gRPC server that
implements the TensorBoardDataProvider service. It only supports
scalars, and it has entirely fake data. But it’s enough to be wired up
to TensorBoard and show real charts in the UI.

Test Plan:
Run the server with bazel run -c opt //tensorboard/data/server.
(Running with -c opt is not necessary, but the server is notably
faster that way.) Then, in a different shell, use grpc_cli:

$ grpc_cli --channel_creds_type=insecure \
>   --protofiles tensorboard/data/proto/data_provider.proto \
>   call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null
runs {
  name: "train"
  start_time: 1605752017
}

Interestingly, this takes 13.9 ± 4.9 ms on my machine, whereas the demo
server added in #4318 took only 5.2 ± 2.6 ms. Both are basically doing
no work on the server, so I suspect that the difference may be due to
grpc_cli having to do more work to parse our real proto files. And,
indeed, making the calls from Python instead takes only 0.8–1.1 ms.

wchargin-branch: rust-real-fake-data-provider

Summary:
The data provider API uses `list_plugins` to populate the list of active
dashboards. I forgot to add a corresponding RPC in #4314, since I’d
hacked around it in my prototypes, but we do want to be able to
implement this cleanly.

Test Plan:
It builds: `bazel build //tensorboard/data/proto:protos_all_py_pb2_grpc`.

wchargin-branch: data-list-plugins-rpc
wchargin-source: 6ab2a2acb154faac3310c4bac690554d7a9e1e74
Summary:
The `//tensorboard/data/server` binary now provides a gRPC server that
implements the `TensorBoardDataProvider` service. It only supports
scalars, and it has entirely fake data. But it’s enough to be wired up
to TensorBoard and show real charts in the UI.

Test Plan:
Run the server with `bazel run -c opt //tensorboard/data/server`.
(Running with `-c opt` is not necessary, but the server is notably
faster that way.) Then, in a different shell, use `grpc_cli`:

```
$ grpc_cli --channel_creds_type=insecure \
>   --protofiles tensorboard/data/proto/data_provider.proto \
>   call localhost:6806 TensorBoardDataProvider.ListRuns '' 2>/dev/null
runs {
  name: "train"
  start_time: 1605752017
}
```

Interestingly, this takes 13.9 ± 4.9 ms on my machine, whereas the demo
server added in #4318 took only 5.2 ± 2.6 ms. Both are basically doing
no work on the server, so I suspect that the difference may be due to
`grpc_cli` having to do more work to parse our real proto files. And,
indeed, making the calls from Python instead takes only 0.8–1.1 ms.

wchargin-branch: rust-real-fake-data-provider
wchargin-source: 67e69e8512a96443568ca916dd9559f4457ddf27
@wchargin wchargin added the core:rustboard //tensorboard/data/server/... label Nov 19, 2020
@google-cla google-cla bot added the cla: yes label Nov 19, 2020
@wchargin wchargin requested a review from psybuzz November 19, 2020 17:34
Base automatically changed from wchargin-data-list-plugins-rpc to master November 20, 2020 00:05
Comment on lines 127 to 128
type ReadBlobStream =
Pin<Box<dyn Stream<Item = Result<data::ReadBlobResponse, Status>> + Send + Sync + 'static>>;
Copy link
Contributor Author

@wchargin wchargin Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging this type signature, since it’s imposing and will probably
raise questions…

This Pin<T> business is mostly to do with async code. In any function,
you can hold a variable on your stack, take a reference to it, and use
that reference:

fn foo() -> (i32, i32, i32) {
    let mut xyz = (1, 2, 3);
    let x = &mut xyz.0;
    *x = 777;
    xyz  // returns (777, 2, 3)
}

But the defining characteristic of async functions is that they can be
suspended and resumed at await points. So what happens if there’s an
await point in there?

async fn foo(new_x: impl Future<Output = i32>) -> (i32, i32, i32) {
    let mut xyz = (1, 2, 3);
    let x = &mut xyz.0;
    *x = new_x.await;
    xyz
}

When we invoked foo, it runs until the new_x.await expression, then
suspends until the new_x future resolves. When it suspends, its stack
frame is recorded and stashed away. Once new_x resolves, the event
loop can resume the invocation of foo by restoring the contents of its
stack frame. But the stack frame may be at a different physical address
now, which means that while xyz is still valid (it’s just a 12-byte
block of memory with three integers in it), the reference x will not
be valid, because it pointed into the old location of xyz on the old
stack frame.

Yet if you try it out you’ll see that the above function
compiles and runs just fine. How can this be?

The trick is that the “stack frame” used for the async function is
pinned. A Pin<&mut T> is a mutable reference to T whose
value can never be moved until it is dropped, or unless it is
“move-safe”. A value is move-safe if it doesn’t have any references into
itself—so most types are move-safe, but the stack frame for our function
looks kind of like

struct FooFuture {
    xyz: (i32, i32, i32),
    x: &'self mut xyz,
}

where &'self is made-up syntax for “reference into self”. The
technical name for “move-safe” is Unpin.

So, you can read the ReadBlobStream type definition as “a unique owned
pointer to a non-relocatable, dynamically dispatched stream, whose items
are either ReadBlobResponses or Statuses, where the stream itself
can be transferred across threads, can be shared among threads, and has
only static references (i.e., is permitted to live forever)”. A lot, to
be sure, but undeniably precise.

In practice, actually creating the stream is pretty straightforward
and doesn’t require any weird types.

I hope that this helps describe the problem that pinning is meant to
solve. I haven’t gone into the details of how exactly it works. IMHO,
pinning is the most complicated corner of Rust, and I can’t claim to
understand it entirely. But if you have a vague understanding of the
concepts and how they relate to memory layout, then the compiler errors
and documentation are generally good enough to guide you to a solution
that works. I expect that over time I’ll come to really grok it more
deeply. :-)

wchargin-branch: rust-real-fake-data-provider
wchargin-source: f681817f2483ced7ea49b5757fad20ae4fbd0655
wchargin-branch: rust-real-fake-data-provider
wchargin-source: 8b0ae806f174d87b71211f679fa67099b2a22062
wchargin-branch: rust-real-fake-data-provider
wchargin-source: d19fc0c8657e93aad6f3b8a426cc6c4e87b92d44
wchargin-branch: rust-real-fake-data-provider
wchargin-source: d19fc0c8657e93aad6f3b8a426cc6c4e87b92d44
Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit to add tests, which I had planned for later but decided
to add now in case it helps your review.

Copy link
Contributor

@psybuzz psybuzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading articles about Pin, Send, Static, things make sense. I was able to repro the successful test after building OSS grpc_cli as well.

fn block_on<F: Future>(future: F) -> F::Output {
let mut rt = Runtime::new().unwrap();
rt.block_on(future)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought to self: rustc doesn't allow tests to be async fns, but I wonder if it will be allowed in the future.

Copy link
Contributor Author

@wchargin wchargin Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. (I did try it.)

You should be able to write an async test yourself with a bit of
desugaring:

#[test]
fn test() {
    async fn body() {
        // put whatever you want here
    }
    let mut rt = Runtime::new().unwrap();
    rt.block_on(body())
}

For #[test] to support that natively, it’d need some way to get ahold
of a runtime. (Rust’s futures are runtime-agnostic.) I wouldn’t be
surprised to see #[tokio::test] pop up for that, though.

…oh, look: #[tokio::test] exists, and it does just that.
Neat. I guess that is a bit nicer, so I’ll switch over. Thanks!

}

type ReadBlobStream =
Pin<Box<dyn Stream<Item = Result<data::ReadBlobResponse, Status>> + Send + Sync + 'static>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation around this type (the explanation is longer than the PR itself!). I'm glad that creating a stream is simpler.

iiuc, the Send + Sync + 'static trait bounds all come directly from the type in the generated code from Prost in tensorboard.pb.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The trait definition specifies

type ReadBlobStream: Stream<Item = Result<super::ReadBlobResponse, tonic::Status>>
    + Send
    + Sync
    + 'static;

which means that our trait implementation must provide a type
ReadBlobStream that implements the Stream trait with the given
Item and also meets the Send + Sync + 'static bounds.

wchargin-branch: rust-real-fake-data-provider
wchargin-source: 298acfd7f14bdc2896f04ac37a87481673564e8f
wchargin-branch: rust-real-fake-data-provider
wchargin-source: 298acfd7f14bdc2896f04ac37a87481673564e8f
@wchargin wchargin merged commit c52e185 into master Nov 24, 2020
@wchargin wchargin deleted the wchargin-rust-real-fake-data-provider branch November 24, 2020 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes core:rustboard //tensorboard/data/server/...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants