---
title: "Can We Improve Process Per Request Performance in Node"
date: 2024-07-12T00:00:00
draft: false
toc: true
images: []
tags:
  - node
  - js
  - performance
  - bun
  - go
  - deno
---

How fast can an HTTP server in Node run if we spawn a process for every request?

```js
import { spawn } from "node:child_process";
import http from "node:http";
http
  .createServer((req, res) => spawn("echo", ["hi"]).stdout.pipe(res))
  .listen(8001);
```

You should avoid spawning a new process for every HTTP request if at all
possible. Creating a new process or thread is expensive and can easily become
your core bottleneck. At [Val Town](https://val.town) there are many request
types where we spawn a new process to handle the request. While we're working to
reduce this, it is likely that we'll always have some requests that spawn a
process, and we'd like them to be fast.

When under load, a single one of Val Town's Node servers cannot exceed 40 req/s,
and it spends 30% of its time blocked on calls to `spawn`. Why is it so slow?
Can we make it any faster?

Let's write up some baseline examples and run them in Node, Deno, Bun, Go, and
Rust, and see how fast we can get them.

I am running all of these on a Hetzner CCX33 with 8 vCPUs and 32 GB of RAM. I am
benchmarking with [bombardier](https://github.com/codesenberg/bombardier)
running on the same machine. The command I'll run to benchmark each server is
`bombardier -c 30 -n 10000 http://localhost:8001`: 10,000 total requests over 30
connections. I prewarm each server before running the benchmark. I'm using Go
v1.22.2, Rust v1.77.2, Node v22.3.0, Bun v1.1.20, and Deno v1.44.2.

Each implementation will run an HTTP server, spawn `echo hi` for each request,
and respond with the stdout of the command. The Node/Bun/Deno server source is
at the beginning of this post. The Go source is
[here](https://github.com/maxmcd/process-per-request/blob/fb2f5f9518d62f058f7e587580c302b56f7a5781/go/main.go)
and the Rust source is
[here](https://github.com/maxmcd/process-per-request/blob/0a6442f656fe7bc8f6c61ef2c5fdef65c6afa0f1/rust/src/main.rs).

Here are the results:

| Language/Runtime | Req/s | Command                            |
| ---------------- | ----- | ---------------------------------- |
| Node             | 651   | `node baseline.js`                 |
| Deno             | 2,290 | `deno run --allow-all baseline.js` |
| Bun              | 2,208 | `bun run baseline.js`              |
| Go               | 5,227 | `go run go/main.go`                |
| Rust (tokio)     | 5,466 | `cd rust && cargo run --release`   |

Ok, so Node is slow. Deno and Bun have figured out how to make this faster, and
the compiled languages with their thread-pool schedulers are much faster still.

Node's `spawn` performance does seem to be notably bad. [This
thread](https://github.com/nodejs/node/issues/14917) was an interesting read,
and while in my testing things have improved since then, Node still spends an
awful lot of time blocking the main thread on each `spawn` call.

Switching to Bun or Deno would improve this a lot. That is great to know, but
let's try to improve things within Node.

## Node `cluster` Module

The simplest thing we can do is spawn more processes and run an HTTP server
per process using Node's `cluster` module.
Like so:

```js
import { spawn } from "node:child_process";
import http from "node:http";
import cluster from "node:cluster";
import { availableParallelism } from "node:os";

if (cluster.isPrimary) {
  for (let i = 0; i < availableParallelism(); i++) cluster.fork();
} else {
  http
    .createServer((req, res) => spawn("echo", ["hi"]).stdout.pipe(res))
    .listen(8001);
}
```

Node shares the network socket between processes here, so all of our processes
can listen on `:8001` and requests will be routed to them round-robin.

The main issue with this approach for me is that each HTTP server is isolated in
its own process. This can complicate things if you manage any kind of in-memory
caching or global state that needs to be shared between these processes. Ideally
I'd find a way to keep the single-threaded execution model of JavaScript and
still make spawns fast.

Here are the results:

| Language/Runtime | Req/s | Command                                      |
| ---------------- | ----- | -------------------------------------------- |
| Node             | 1,766 | `node cluster.js`                            |
| Deno             | 2,133 | `deno run --allow-all cluster.js`            |
| Bun              | n/a   | "node:cluster is not yet implemented in Bun" |

Super weird. Deno is slower than its baseline, Bun doesn't work just yet, and
Node has improved a lot, though I would have expected it to be even faster.

Nice to know there is some speedup here. We'll move on from it for now.

## Move the Spawn Calls to Worker Threads

If the `spawn` calls are blocking the main thread, let's move them to worker
threads.

Here's our `worker-threads/worker.js` code. We listen for messages with a
command and an id, run the command, and post the result back. We're using
`execFile` here for convenience, but it is just an abstraction on top of
`spawn`.

```js
import { execFile } from "node:child_process";
import { parentPort } from "node:worker_threads";

parentPort.on("message", (message) => {
  const [id, cmd, ...args] = message;

  execFile(cmd, args, (_error, stdout, _stderr) => {
    parentPort.postMessage([id, stdout]);
  });
});
```

And here's our `worker-threads/index.js`. We create 8 worker threads. When we
want to handle a request, we send a message to a thread to make the spawn call
and send back the output. Once we get the response back, we respond to the HTTP
request.

```js
import assert from "node:assert";
import http from "node:http";
import { EventEmitter } from "node:events";
import { Worker } from "node:worker_threads";

const newWorker = () => {
  const worker = new Worker("./worker-threads/worker.js");
  const ee = new EventEmitter();
  // Emit messages from the worker to the EventEmitter by id.
  worker.on("message", ([id, msg]) => ee.emit(id, msg));
  return { worker, ee };
};

// Spawn 8 worker threads.
const workers = Array.from({ length: 8 }, newWorker);
const randomWorker = () => workers[Math.floor(Math.random() * workers.length)];

const spawnInWorker = async () => {
  const worker = randomWorker();
  const id = Math.random();
  // Send and wait for our response.
  worker.worker.postMessage([id, "echo", "hi"]);
  return new Promise((resolve) => {
    worker.ee.once(id, (msg) => {
      resolve(msg);
    });
  });
};

http
  .createServer(async (_, res) => {
    let resp = await spawnInWorker();
    assert.equal(resp, "hi\n"); // no cheating!
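    // resp is the worker's buffered stdout; the id-keyed EventEmitter
    // guarantees it belongs to this request even when many concurrent
    // requests share a worker.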
    res.end(resp);
  })
  .listen(8001);
```

Results!

| Language/Runtime | Req/s | Command                                        |
| ---------------- | ----- | ---------------------------------------------- |
| Node             | 426   | `node worker-threads/index.js`                 |
| Deno             | 3,601 | `deno run --allow-all worker-threads/index.js` |
| Bun              | 2,898 | `bun run worker-threads/index.js`              |

Node is slower! Ok, so presumably we are not bypassing Node's bottleneck by
using threads; we're doing the same work with the added overhead of
coordinating with the worker threads. Bummer.

Deno loves this, and Bun likes it a little more. It's also nice to see that the
extra coordination doesn't hurt Bun and Deno: they're already doing a good job
of keeping the syscall overhead off of the execution thread.

Onward.

## Move the Spawn Calls to Child Processes

If threads are not going to work, let's use child processes to do the work.

This is quite easy. We simply swap out the worker threads for processes spawned
by `child_process.fork` and change how we send and receive messages.

```diff
$ git diff --unified=1 --no-index ./worker-threads/ ./child-process/
diff --git a/./worker-threads/index.js b/./child-process/index.js
index 52a93fe..0ed206e 100644
--- a/./worker-threads/index.js
+++ b/./child-process/index.js
@@ -3,6 +3,6 @@ import http from "node:http";
 import { EventEmitter } from "node:events";
-import { Worker } from "node:worker_threads";
+import { fork } from "node:child_process";
 
 const newWorker = () => {
-  const worker = new Worker("./worker-threads/worker.js");
+  const worker = fork("./child-process/worker.js");
   const ee = new EventEmitter();
@@ -21,3 +21,3 @@ const spawnInWorker = async () => {
   // Send and wait for our response.
-  worker.worker.postMessage([id, "echo", "hi"]);
+  worker.worker.send([id, "echo", "hi"]);
   return new Promise((resolve) => {
diff --git a/./worker-threads/worker.js b/./child-process/worker.js
index 5f025ca..9b3fcf5 100644
--- a/./worker-threads/worker.js
+++ b/./child-process/worker.js
@@ -1,5 +1,4 @@
 import { execFile } from "node:child_process";
-import { parentPort } from "node:worker_threads";
 
-parentPort.on("message", (message) => {
+process.on("message", (message) => {
   const [id, cmd, ...args] = message;
@@ -7,3 +6,3 @@ parentPort.on("message", (message) => {
   execFile(cmd, args, (_error, stdout, _stderr) => {
-    parentPort.postMessage([id, stdout]);
+    process.send([id, stdout]);
   });
```

Nice. And the results:

| Language/Runtime | Req/s | Command                                       |
| ---------------- | ----- | --------------------------------------------- |
| Node             | 2,209 | `node child-process/index.js`                 |
| Deno             | 3,800 | `deno run --allow-all child-process/index.js` |
| Bun              | 3,871 | `bun run child-process/index.js`              |

Good speedups all around. I am very curious what the bottleneck is that is
preventing Deno and Bun from getting to Rust/Go speeds. Please let me know if
you have suggestions for how to dig into that!

One fun thing here is that we can mix Node and Bun. Bun implements the Node IPC
protocol, so we can configure Node to spawn Bun child processes. Let's try that.

Update the `fork` arguments to use the `bun` binary instead of Node:

```js
const worker = fork("./child-process/worker.js", {
  execPath: "/home/maxm/.bun/bin/bun",
});
```

| Language/Runtime | Req/s | Command                       |
| ---------------- | ----- | ----------------------------- |
| Node + Bun       | 3,853 | `node child-process/index.js` |

Hah, cool. I get to use Node on the main thread and leverage Bun's performance.
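If you want to reproduce this, note that the `execPath` above is hardcoded to
my machine. A more portable variant (my own tweak, not something the benchmark
needs) resolves `bun` from `PATH`:

```js
import { execSync, fork } from "node:child_process";

// Resolve the bun binary from PATH instead of hardcoding an absolute path.
const bunPath = execSync("which bun", { encoding: "utf8" }).trim();
const worker = fork("./child-process/worker.js", { execPath: bunPath });
```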
## Stdio

Logs. The previous implementations assume there will be minimal log output, but
what if there's a lot? We could send the logs using `process.send`, but that
will be quite expensive if our output bytes are serialized to JSON.

I spent a lot of time in this rabbit hole. Here's a rough summary of the things
I tried:

1. Passing file descriptors between processes, e.g. passing the child's
   stdout/stderr back up to the parent process. I tried this a few different
   ways but couldn't get it working so that we'd always capture all the bytes
   written.
2. Just using `process.send`. This works, but is only performant if you use
   `serialization: "advanced"` so that bytes can be sent without being
   serialized to JSON. This doesn't work in Deno and Bun.
3. I created a pair of [Abstract
   Sockets](https://man7.org/linux/man-pages/man7/unix.7.html) for each spawn
   call and sent the logs over the socket. This spends too much time setting up
   the sockets to be worth it.

Also, abstract sockets are crazy. I'm familiar with [Unix Domain
Sockets](https://en.wikipedia.org/wiki/Unix_domain_socket), where you have a
file called (e.g.) `something.sock` and you can listen on it and connect to it
just like a network address. It turns out that if you use a Unix socket whose
name starts with a null byte, like `\0foo`, the socket will not exist on the
filesystem and it'll be automatically removed when no longer used. Weird! Cool!
(There's a small runnable sketch of this after the results below.)

After all this testing I have two approaches that work pretty well:

1. Set up a pool of processes with `.fork()` and also set up a separate abstract
   socket for each one to send logs.
2. Simply use `process.send`, but with `serialization: "advanced"`.

Let's see how those work out.

We'll need something that outputs a lot of logs, so I grabbed the `main.c` file
from SQLite's source. It's a 163 KB file. We'll run the command `cat main.c` to
print it out.

Here's our `baseline.js` again with that update:

```js
import { spawn } from "node:child_process";
import http from "node:http";
http
  .createServer((_, res) => spawn("cat", ["main.c"]).stdout.pipe(res))
  .listen(8001);
```

I've updated the Go and Rust code as well. Let's see how they do:

| Language/Runtime | Req/s | Command                            |
| ---------------- | ----- | ---------------------------------- |
| Node             | 374   | `node baseline.js`                 |
| Deno             | 667   | `deno run --allow-all baseline.js` |
| Bun              | 1,374 | `bun run baseline.js`              |
| Go               | 2,757 | `go run go/main.go`                |
| Rust (tokio)     | 3,535 | `cd rust && cargo run --release`   |

Fascinating. It's cool to see Bun and Rust pull ahead here compared to the
previous benchmarks. Node is still very slow, and Deno is surprisingly unhappy
with this workload.

Next, let's try my abstract socket communication channel implementation. It's
getting quite complex so I won't post it here, but you can [take a look
here](https://github.com/maxmcd/process-per-request/tree/7528cd8045c998c8b5451961e0818473b4a81860/child-process-comm-channel).

| Language/Runtime | Req/s | Command                                                    |
| ---------------- | ----- | ---------------------------------------------------------- |
| Node             | 1,336 | `node child-process-comm-channel/index.js`                 |
| Node + Bun       | 2,635 | `node child-process-comm-channel/index.js`                 |
| Deno             | 862   | `deno run --allow-all child-process-comm-channel/index.js` |
| Bun              | 1,833 | `bun child-process-comm-channel/index.js`                  |

Haha. I had seen some random benchmark results where Node+Bun was faster than
Bun alone, but it never netted out in the final runs.
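As promised, here's the abstract-socket behavior in a tiny runnable sketch
(Linux-only; `\0demo-socket` is an arbitrary name I made up):

```js
import net from "node:net";

// A leading null byte puts the socket in Linux's abstract namespace:
// nothing appears on the filesystem, and the name is released
// automatically when the last reference to it closes.
const server = net.createServer((conn) => conn.pipe(conn));
server.listen("\0demo-socket", () => {
  const client = net.connect("\0demo-socket");
  client.on("data", (data) => {
    console.log(data.toString()); // "hi"
    client.end();
    server.close();
  });
  client.write("hi");
});
```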
The Deno results are quite perplexing. In implementing this example I had a
"bug" where I was buffering the response as a string. Here's the diff of me
fixing it:

```diff
@@ -88,9 +88,8 @@ const spawnInWorker = async (res) => {
   worker.child.send([id, "spawn", ["cat", ["main.c"]]]);
-  let resp = "";
   worker.ee.on(id, (msg, data) => {
     if (msg == MessageType.STDOUT) {
-      resp += data.toString();
+      res.write(data);
     }
     if (msg == MessageType.STDOUT_CLOSE) {
-      res.end(resp);
+      res.end();
       worker.requests -= 1;
```

Deno performs far better before this fix! Node and Bun both perform better once
the string buffer is removed.

| Language/Runtime     | Req/s | Command                                                    |
| -------------------- | ----- | ---------------------------------------------------------- |
| Deno + string buffer | 1,453 | `deno run --allow-all child-process-comm-channel/index.js` |

Weird!

Finally, here is the `process.send` implementation. It is fast and also
incredibly simple to implement. I am a little unexcited about this solution
because it is slower than I'd like, doesn't work with Deno and Bun, and leaves
very little room to improve things. However, this implementation is deeply
practical and easy to understand, which is beautiful. Here's the source of
`worker.js`; the rest [is
here](https://github.com/maxmcd/process-per-request/tree/7528cd8045c998c8b5451961e0818473b4a81860/child-process-send-logs).

```js
import { spawn } from "node:child_process";
import process from "node:process";

process.on("message", (message) => {
  const [id, cmd, ...args] = message;
  const cp = spawn(cmd, args);
  cp.stdout.on("data", (data) => process.send([id, "stdout", data]));
  cp.stderr.on("data", (data) => process.send([id, "stderr", data]));
  cp.on("close", (code, signal) => process.send([id, "exit", code, signal]));
});
```

| Language/Runtime | Req/s | Command                                 |
| ---------------- | ----- | --------------------------------------- |
| Node             | 1,179 | `node child-process-send-logs/index.js` |

Very nice, and probably the practical choice if you are only targeting Node.

## Load Balancing

A quick note on load balancing between processes. Both Go and Rust [have
complicated schedulers](https://rakyll.org/scheduler/) that [distribute work
efficiently](https://tokio.rs/blog/2019-10-scheduler). So far, when picking a
worker I've been grabbing a random one:

```js
const workers = await Promise.all(Array.from({ length: 8 }, newWorker));
const randomWorker = () => workers[Math.floor(Math.random() * workers.length)];
```

However, we can also implement round-robin and least-connections style load
balancing. [See a wonderful writeup on those
here](https://samwho.dev/load-balancing/).

```js
let count = 0;
const pickWorkerInOrder = () => workers[(count += 1) % workers.length];
const pickWorkerWithLeastRequests = () =>
  workers.reduce((selectedWorker, worker) =>
    worker.requests < selectedWorker.requests ? worker : selectedWorker
  );
```

Sadly, I didn't see consistent performance improvements with these approaches.
They all perform about the same. Maybe more typical workloads, where the spawn
calls are not entirely uniform, would benefit more from these changes.
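For `pickWorkerWithLeastRequests` to do anything useful, each worker has to
track its in-flight count (the comm-channel diff above already decrements
`worker.requests`). Here's a minimal sketch of that bookkeeping, assuming the
child-process `spawnInWorker` shape from earlier and workers created with
`requests: 0`:

```js
const spawnInWorker = async () => {
  const worker = pickWorkerWithLeastRequests();
  // Claim the worker before awaiting anything so the count reflects
  // in-flight work.
  worker.requests += 1;
  const id = Math.random();
  worker.worker.send([id, "echo", "hi"]);
  return new Promise((resolve) => {
    worker.ee.once(id, (msg) => {
      worker.requests -= 1; // Release once the reply arrives.
      resolve(msg);
    });
  });
};
```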
## Library?

It seems possible, given all of these findings, to write a library that exposes
the same API surface as `node:child_process` but farms the spawn calls out to a
process pool. Maybe I will write that, or maybe you will. Please [let me
know](https://x.com/mxmcd) if there's interest.

## Final Thoughts

We're sadly at the limits of my knowledge/experimentation, but I wonder what
could unlock more performance.

It was really fun to see what improved performance and what didn't, and the
random moments where Deno/Bun/Node were affected differently.

Using Node and Bun together is a fun pattern and it's nice to see it lead to
such a speedup. Please support Node's IPC, Deno!

Let me know if there's anything else I should experiment with here! See you next
time :)