
Commit a347e7e

vault backup: 2023-03-11 23:56:23 with 5 changed
Affected files:
.obsidian/core-plugins.json
2023 - Archive/Draft - Accidentally Database.md
A - Mixtape/Mixtape Daily Technical notes.md
Draft - Accidentally Database.md
R - Investing & Ideas/ai paas.md
1 parent d7c1770 commit a347e7e


5 files changed (+77, -0 lines)


.obsidian/core-plugins.json

+1
@@ -4,6 +4,7 @@
 "switcher",
 "graph",
 "backlink",
+"canvas",
 "outgoing-link",
 "tag-pane",
 "page-preview",

2023 - Archive/Draft - Accidentally Database.md

+72
@@ -0,0 +1,72 @@
# Oops, You Wrote a Database

Dear Madam,

**I am afraid to inform you that you have written a database.** I know you just wanted some "simple persistence" and figured a basic "key-value store" would do. Maybe keep it in memory as an `object`, or read and write simple JSON files on disk or in a cloud KV store. You said that "Postgres is overkill" and "ORMs create impedance mismatches", and yet, six months later, you have a mountain of application code dedicated to caching, updating, and defensively reading your data, and it breaks every time you change the data model. You moved 30% faster on your one-month MVP, but now it is slowing you down 30% per team member per month.
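To make the starting point concrete, here is a minimal sketch of that "simple persistence", assuming the whole store is one JSON blob (an in-memory string stands in for the file on disk or the cloud KV value) and a hypothetical `user` record whose shape changed from a single `name` to `firstName`/`lastName`. `JsonKV` and `readUser` are illustrative names, not from any library; `readUser` is the kind of defensive-read code that grows with every model change.

```typescript
// Hypothetical "simple persistence": the whole store is one JSON blob.
// In a real app, `blob` might be a file on disk or a value in a cloud KV store.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

class JsonKV {
  private blob = "{}"; // stands in for the persisted JSON document

  get(key: string): Json | undefined {
    const data = JSON.parse(this.blob) as { [key: string]: Json };
    return data[key];
  }

  set(key: string, value: Json): void {
    const data = JSON.parse(this.blob) as { [key: string]: Json };
    data[key] = value;
    this.blob = JSON.stringify(data); // rewrite the whole blob on every write
  }
}

// v1 stored { name }, v2 stores { firstName, lastName }. Old records were
// never migrated, so every reader has to cope with both shapes.
interface User {
  firstName: string;
  lastName: string;
}

function readUser(kv: JsonKV, id: string): User | undefined {
  const raw = kv.get(`user:${id}`);
  if (raw === undefined || raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    return undefined; // missing or corrupt record
  }
  const firstName = raw["firstName"];
  if (typeof firstName === "string") {
    // v2 shape
    const lastName = raw["lastName"];
    return { firstName, lastName: typeof lastName === "string" ? lastName : "" };
  }
  const name = raw["name"];
  if (typeof name === "string") {
    // v1 shape: split the old single field at the first space
    const idx = name.indexOf(" ");
    return idx === -1
      ? { firstName: name, lastName: "" }
      : { firstName: name.slice(0, idx), lastName: name.slice(idx + 1) };
  }
  return undefined; // a shape nobody remembers writing
}
```

Every later change to `User` adds another branch to `readUser` (or a one-off migration script), which is where that mountain of application code starts.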

Surely, you've read [Reddit has two tables](https://news.ycombinator.com/item?id=32407873) and Dan Pritchett's [BASE: An ACID Alternative](https://dl.acm.org/doi/10.1145/1394127.1394128), and you don't mind writing some extra migration and defensive code in userland to be [web scale](https://www.youtube.com/watch?v=HdnDXsqiPYo). But after working on the app for several weeks and hiring more people, you have trouble remembering what goes where, so you start writing down **a list of all the important entities, their attributes, and the range of their values**. Perhaps you maintain it by hand, or you pull in something like [Prisma](https://www.prisma.io/) or [Apollo GraphQL](https://www.apollographql.com/) to get some extra dev tooling and codegen.
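That hand-maintained list of entities, attributes, and value ranges is a schema. Here is a sketch of what it tends to look like before anyone calls it that, assuming a hypothetical `user` entity and a home-grown `validate` helper (both illustrative; tools like Prisma generate something richer from their own schema files):

```typescript
// Hypothetical hand-maintained schema: entities, their attributes,
// and the range of values each attribute may take.
type FieldSpec =
  | { kind: "string"; maxLength: number }
  | { kind: "int"; min: number; max: number }
  | { kind: "enum"; values: string[] };

type EntitySchema = { [field: string]: FieldSpec };

const schema: { [entity: string]: EntitySchema } = {
  user: {
    email: { kind: "string", maxLength: 254 },
    plan: { kind: "enum", values: ["free", "pro", "enterprise"] },
    seats: { kind: "int", min: 1, max: 500 },
  },
};

// Returns human-readable problems; an empty array means the record is valid.
function validate(entity: string, record: { [field: string]: unknown }): string[] {
  const spec = schema[entity];
  if (!spec) return [`unknown entity: ${entity}`];
  const errors: string[] = [];

  for (const field of Object.keys(spec)) {
    const rule = spec[field];
    if (!rule) continue;
    const value = record[field];
    const name = `${entity}.${field}`;
    switch (rule.kind) {
      case "string":
        if (typeof value !== "string" || value.length > rule.maxLength) {
          errors.push(`${name} must be a string of at most ${rule.maxLength} characters`);
        }
        break;
      case "int":
        if (typeof value !== "number" || !Number.isInteger(value) || value < rule.min || value > rule.max) {
          errors.push(`${name} must be an integer in [${rule.min}, ${rule.max}]`);
        }
        break;
      case "enum":
        if (typeof value !== "string" || rule.values.indexOf(value) === -1) {
          errors.push(`${name} must be one of ${rule.values.join(", ")}`);
        }
        break;
    }
  }

  // Attributes nobody wrote down are the ones that drift.
  for (const field of Object.keys(record)) {
    if (!(field in spec)) errors.push(`${entity}.${field} is not in the schema`);
  }
  return errors;
}
```

Calling `validate("user", record)` before every write is a userland version of what a database does with column types and `CHECK` constraints.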

Soon you ran into another problem: weird Heisenbugs cropping up in your KV stores, where a user's update would go through and show up in one feed but not in another, especially when multiple users and apps access the same sets of data. Your new team members suggest adding a `Pending` state to all the fast updates, then waiting for the slow or error-prone updates to succeed before doing a second update to `Complete`. We're moving fast, and we're ensuring consistency in userland while keeping things simple. Maybe we'll need this in like 5 places throughout the app, so you extract some utility **code to wait for confirmation that updates succeeded, or roll them back on partial failure**. Maybe you even have someone split the `Pending` updates out into a separate "log", since if the app crashes we don't want to lose any user data. Maybe when updates happen you want other things to happen too, so you devise an ingenious "hook" system that **triggers more code to run when your not-a-database code finishes**.
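The `Pending`/`Complete` states, rollback on partial failure, the separate log, and the hooks add up to a transaction manager, a write-ahead log, and triggers. A deliberately small sketch, assuming a synchronous `Map`-backed store and hypothetical `TinyStore`/`transact`/`onWrite` names; a real write-ahead log would also be flushed to durable storage before any data is touched, which this in-memory version does not do.

```typescript
// Hypothetical userland "transaction" over a Map-backed store.
type Write = { key: string; value: string; check?: (value: string) => void };
type Status = "Pending" | "Complete" | "RolledBack";
type LogEntry = { id: number; writes: Write[]; status: Status };
type Hook = (key: string, value: string) => void;

class TinyStore {
  readonly data = new Map<string, string>();
  readonly log: LogEntry[] = []; // the "separate log" of intended writes
  private hooks: Hook[] = [];
  private nextId = 1;

  // Register a hook (a trigger) that runs after each committed write.
  onWrite(hook: Hook): void {
    this.hooks.push(hook);
  }

  // Apply all writes or none of them. Returns true on commit, false on rollback.
  transact(writes: Write[]): boolean {
    // 1. Record intent as Pending before touching the data.
    const entry: LogEntry = { id: this.nextId++, writes, status: "Pending" };
    this.log.push(entry);

    const previous: Array<[string, string | undefined]> = [];
    try {
      for (const w of writes) {
        // 2. A slow or error-prone step (remote call, validation) may throw here.
        if (w.check) w.check(w.value);
        previous.push([w.key, this.data.get(w.key)]);
        this.data.set(w.key, w.value);
      }
    } catch {
      // 3. Partial failure: restore old values, newest first, then mark the entry.
      for (const [key, old] of previous.slice().reverse()) {
        if (old === undefined) this.data.delete(key);
        else this.data.set(key, old);
      }
      entry.status = "RolledBack";
      return false;
    }

    // 4. Success: mark Complete, then fire hooks for every write.
    entry.status = "Complete";
    for (const w of writes) {
      for (const hook of this.hooks) hook(w.key, w.value);
    }
    return true;
  }
}
```

Undoing in reverse order matters when one transaction writes the same key twice: the last restore puts back the value from before the transaction started.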

Then you notice that some bits of data always get accessed and updated together. They are inconsistently named, and it's tiring to write the same 3 lines of code to join them every time. In the spirit of keeping things DRY, you write a class with all these CRUD operations across fields, taking care to give it intuitive, [guessable](https://www.johno.com/guessable) API naming [with a consistent grammar](https://www.youtube.com/watch?v=18F5v1diO_A). Maybe it needs to be learnable by others, and maybe you want to expose it to end users (your app's users) so they can make their own queries, whether through plain text or an autogenerated UI.
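That DRY class is a hand-rolled data-access layer, the start of an ORM or query language. A sketch assuming two hypothetical KV "tables", `users` and `profiles`, that are always read together and use inconsistent key names (`user_id` vs `uid`); the illustrative `UserRepository` hides the join behind a consistent get/list/create/update/delete grammar.

```typescript
// Two hypothetical KV "tables" that are always used together, named inconsistently.
type UserRow = { user_id: string; email: string };
type ProfileRow = { uid: string; displayName: string };

// The joined, consistently named shape the rest of the app sees.
type User = { id: string; email: string; displayName: string };

// Verb + entity naming (getUser, listUsers, ...) keeps the API guessable.
class UserRepository {
  constructor(
    private users: Map<string, UserRow>,
    private profiles: Map<string, ProfileRow>,
  ) {}

  getUser(id: string): User | undefined {
    const row = this.users.get(id);
    if (!row) return undefined;
    const profile = this.profiles.get(id); // the join the app used to repeat everywhere
    return { id: row.user_id, email: row.email, displayName: profile ? profile.displayName : "" };
  }

  listUsers(): User[] {
    const result: User[] = [];
    this.users.forEach((_row, id) => {
      const user = this.getUser(id);
      if (user) result.push(user);
    });
    return result;
  }

  createUser(user: User): void {
    this.writeRows(user);
  }

  updateUser(id: string, patch: Partial<Omit<User, "id">>): User | undefined {
    const current = this.getUser(id);
    if (!current) return undefined;
    const next: User = { ...current, ...patch, id };
    this.writeRows(next);
    return next;
  }

  deleteUser(id: string): boolean {
    this.profiles.delete(id);
    return this.users.delete(id);
  }

  // Split the app-facing shape back into the two stored rows.
  private writeRows(user: User): void {
    this.users.set(user.id, { user_id: user.id, email: user.email });
    this.profiles.set(user.id, { uid: user.id, displayName: user.displayName });
  }
}
```

Note that `writeRows` updates two rows with no atomicity between them, so a crash in between leaves a half-written user unless the transaction code from earlier gets wired in too.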

Your app came out of beta, and you got real users! Performance became an issue. There are myriad ways to tackle it, and your growing team wrote more code to use them all:

- **The same queries get asked again and again?** Maybe we can just memoize the reads: save the results and return them without rerunning the read, join, and aggregate code.
- **We can predict which queries get asked again and again?** Maybe we can just pick a few of those queries and precompute their results. This way, we're fast on ***initial*** queries, not just subsequent ones.
- **We can't predict which queries will happen?** But we want to make sure we don't request data we don't need? And nested data dependencies mean some queries wait on other queries? Maybe we'll build a little graph and caching layer that batches and deduplicates those lookups. Maybe give it an unassuming name like [Dataloader](https://leebyron.com/dataloader-v2/).
- **Some writes are slow?** Like [this one](https://twitter.com/pdrmnvd/status/1628927239265091585?s=20)? When your team's book club read [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321), someone had the bright idea of splitting out some high fan-out writes and pushing some of the load into [third normal form](https://en.wikipedia.org/wiki/Third_normal_form).
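The first and third ideas combine into what DataLoader does: collect the keys requested together, fetch them in one batch, and memoize the results. This is a simplified, synchronous sketch with hypothetical names (`BatchingLoader`, `dispatch`, `clear`); the real DataLoader's `load(key)` returns a Promise and automatically batches keys requested in the same tick of the event loop, whereas here `dispatch()` is called by hand.

```typescript
// Simplified, synchronous take on DataLoader's batching + caching idea.
// The batch function must return exactly one value per key, in key order.
type BatchFn<K, V> = (keys: K[]) => V[];

class BatchingLoader<K, V> {
  private cache = new Map<K, V>(); // memoized results: repeat reads skip the backend
  private queue: K[] = []; // keys requested since the last dispatch
  batchCalls = 0; // number of round trips to the backend

  constructor(private batchFn: BatchFn<K, V>) {}

  // Ask for a key. Cached or already-queued keys are not requested again.
  load(key: K): void {
    if (!this.cache.has(key) && this.queue.indexOf(key) === -1) {
      this.queue.push(key);
    }
  }

  // Fetch every queued key in a single batch and memoize the results.
  dispatch(): void {
    if (this.queue.length === 0) return;
    const keys = this.queue;
    this.queue = [];
    this.batchCalls++;
    const values = this.batchFn(keys);
    if (values.length !== keys.length) {
      throw new Error("batch function must return one value per key");
    }
    keys.forEach((key, i) => this.cache.set(key, values[i] as V));
  }

  // Read a memoized value; undefined until its key has been dispatched.
  get(key: K): V | undefined {
    return this.cache.get(key);
  }

  // Writes must invalidate the memoized read, or readers see stale data.
  clear(key: K): void {
    this.cache.delete(key);
  }
}
```

The `clear` method is the catch: once reads are memoized, every write path has to remember to invalidate them.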

In the last leg of your journey to avoid using a database, your new, high-paying Enterprise customers demand assurance that you have taken the necessary security measures:

- How do you ensure users can't edit documents they don't own?
- How do you ensure that a devious hacker snooping around your undocumented but publicly exposed APIs can't read what they shouldn't?
- How do you ensure that people who shouldn't have access to your customers' data don't?
- How do you reassure them that YOU don't have access to their data?
- When something bad happens, how do you go back in time to figure out what went wrong? How do you know who did what, and when?
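The first, second, and last questions map to authorization checks and an audit log. A sketch assuming an owner-only access rule and a hypothetical `DocService`; the injectable clock exists only to make timestamps deterministic.

```typescript
// Hypothetical document service with owner-only access and an append-only audit log.
type Doc = { id: string; ownerId: string; body: string };
type AuditEntry = { at: string; actor: string; action: string; docId: string; allowed: boolean };

class DocService {
  private docs = new Map<string, Doc>();
  readonly audit: AuditEntry[] = []; // append-only: entries are never edited or removed

  // The clock is injectable so tests get deterministic timestamps.
  constructor(private now: () => Date = () => new Date()) {}

  // Authorization rule: only the owner may read or edit a document.
  private can(actor: string, doc: Doc | undefined): doc is Doc {
    return doc !== undefined && doc.ownerId === actor;
  }

  // Denied attempts are logged too; they matter most after an incident.
  private record(actor: string, action: string, docId: string, allowed: boolean): void {
    this.audit.push({ at: this.now().toISOString(), actor, action, docId, allowed });
  }

  create(actor: string, id: string, body: string): boolean {
    const allowed = !this.docs.has(id); // never overwrite someone else's document
    this.record(actor, "create", id, allowed);
    if (allowed) this.docs.set(id, { id, ownerId: actor, body });
    return allowed;
  }

  read(actor: string, id: string): string | undefined {
    const doc = this.docs.get(id);
    if (!this.can(actor, doc)) {
      this.record(actor, "read", id, false);
      return undefined;
    }
    this.record(actor, "read", id, true);
    return doc.body;
  }

  update(actor: string, id: string, body: string): boolean {
    const doc = this.docs.get(id);
    if (!this.can(actor, doc)) {
      this.record(actor, "update", id, false);
      return false;
    }
    doc.body = body;
    this.record(actor, "update", id, true);
    return true;
  }
}
```

Every new method has to remember to call both `can` and `record`; forgetting either one is exactly the kind of gap a security audit looks for.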

Your engineers sigh, but those enterprise contracts are juicy. You write more and more and more code and have it audited by a fancy security firm to get the thumbs up.

A schema, a transaction manager, a write-ahead log, a query language, caching, indexing, query planning, security/authorization, and an audit log.

Dear Madam, you have written a database.

## internal notes

Planning
- start with kv store
- schema
- transactions
- UX
- query language?
- Perf
- query planner
- caching? indexing? normalizing
- https://twitter.com/pdrmnvd/status/1628927239265091585?s=20
- security/authz
- change data capture
- logging and auditing
- storage engine?
- backup?

https://rachitnigam.com/post/you-have-built-a-compiler/

- i can just store it in files
- need transactions
- nosql
- reference the reddit has two tables post
- https://news.ycombinator.com/item?id=32407873
- graphql
- query planning, caching, query cost, auth
- redux
- oh i'm just doing optimistic updates for UX
- oh i'm just normalizing
- route53 DNS

database reinvention https://news.ycombinator.com/item?id=34573776

A - Mixtape/Mixtape Daily Technical notes.md

+2
@@ -70,6 +70,8 @@ the timnit gebru story and stochastic parrots https://overcast.fm/+Y_EFdRvB8/02:
 
 how heroku made postgres https://overcast.fm/+HaNOAkh8M/04:41
 
+snowflake early days https://overcast.fm/+x8z9PdSqA/07:00 from kent graziano
+
 
 auren hoffman on types of data https://overcast.fm/+w94V5_Ttw/9:00 you can sell.
 - at 36mins, data coop model

Draft - Accidentally Database.md

Whitespace-only changes.

R - Investing & Ideas/ai paas.md

+2
@@ -19,6 +19,8 @@ trends
 - OpenAI has great brand but
 - FB released LLaMA https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
 - Google released FLAN 20B fully open source
+- foundation models
+- https://txt.cohere.ai/ai-is-eating-the-world/
 
 alex bowe
 https://github.com/FMInference/FlexGen
