Hi all—I'm the EM for the Search team at Notion, and I want to chime in to clear up one unfortunate misconception I've seen a few times in this thread.
Notion does not sell its users' data.
Instead, I want to expand on one of the first use-cases for the Notion data lake, which was by my team. This is an elaboration of the description in TFA under the heading "Use case support".
As is described there, Notion's block permissions are highly normalized at the source of truth. This is usually quite efficient and generally brings along all the benefits of normalization in application databases. However, we need to _denormalize_ all the permissions that relate to a specific document when we index it into our search index.
When we transactionally reindex a document "online", this is no problem. However, when we need to reindex an entire search cluster from scratch, loading every ancestor of each page in order to collect all of its permissions is far too expensive.
Thus, one of the primary needs that my team had from the new data lake is "tree traversal and permission data construction for each block". We rewrote our "offline" reindexer to read from the data lake instead of reading from RDS instances serving database snapshots. This allowed us to dramatically reduce the impact of iterating through every page when spinning up a new cluster (not to mention save a boatload in spinning up those ad-hoc RDS instances).
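For readers who want something concrete, here is a minimal sketch of what "tree traversal and permission data construction for each block" could look like as a batch job. It is purely illustrative and not Notion's code: the `Block` shape, the `grants` field, and the rule that a block's effective permissions are the union of its own grants and those of every ancestor are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical normalized row: a block stores only its *own* grants."""
    id: str
    parent_id: Optional[str]
    grants: FrozenSet[str] = frozenset()


def denormalize_permissions(blocks: List[Block]) -> Dict[str, FrozenSet[str]]:
    """Return the effective (inherited + own) grants for every block.

    Assumes the blocks form a forest (no cycles) and that every parent_id
    refers to a block in the input. Each block is resolved exactly once,
    so the whole batch is linear in the number of blocks.
    """
    by_id = {b.id: b for b in blocks}
    resolved: Dict[str, FrozenSet[str]] = {}

    for block in blocks:
        # Walk up until we reach a root or an ancestor that is already resolved.
        chain = []
        current = block.id
        while current is not None and current not in resolved:
            chain.append(by_id[current])
            current = by_id[current].parent_id

        inherited = resolved[current] if current is not None else frozenset()

        # Unwind from the topmost unresolved ancestor down to the block itself.
        for b in reversed(chain):
            inherited = inherited | b.grants
            resolved[b.id] = inherited

    return resolved


blocks = [
    Block("paragraph", "page"),
    Block("page", "workspace", frozenset({"bob"})),
    Block("workspace", None, frozenset({"alice"})),
]
perms = denormalize_permissions(blocks)
print(sorted(perms["paragraph"]))
```

The point of the sketch is the memoization: resolving each page in isolation means re-reading its whole ancestor chain every time, whereas a batch pass over a snapshot can reuse every ancestor's result.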
I hope this miniature deep dive gives a little bit more color on the uses of this data store—as it is emphatically _not_ to sell our users' data!
They didn’t say the quiet part out loud, which is almost certainly that the Fivetran and Snowflake bills for what they were doing were probably enormous and those were undoubtedly what got management’s attention about fixing this.
This was a nice read, interesting to see how far Postgres (largely alone) can get you.
Also, we see how self-hosting within a startup can make perfect sense. :)
DevOps that abstracts things away to the cloud might, in some cases, just add architectural and technical debt later, without the history of learning that comes from working through the challenges.
Still, it might have been a great opportunity to figure out offline-first use of Notion.
I have been forced to use Anytype instead of Notion for exactly this offline-first reason. Time to check out the source code to learn how they handle storage.
crux ·106 days ago
SOLAR_FIELDS ·107 days ago
adolph ·107 days ago
What does a backing data lake afford a Notion user that can’t be done in a similar product, like Obsidian?
methou ·107 days ago
This isn't something I would want to hear if I were still using Notion. It's very bold to publish something like this on their own website.
j45 ·107 days ago