The post mentions the idea that querying a database D can be understood algebraically as enumerating all morphisms Q -> D, where Q is the "classifying" database of the query, i.e. a minimal database instance that admits a single "generic" result of the query. You can use this to give a neat formulation of Datalog evaluation. A Datalog rule then corresponds a morphism P -> H, where P is the classifying database instance of the rule body and H is the classifying database instance for matches of both body and head. For example, for the the transitivity rule
edge(x, z) :- edge(x, y), edge(y, z).
you'd take for P the database instance containing two rows (a_1, a_2) and (a_2, a_3), and the database instance H contains additionally (a_1, a_3). Now saying that a Database D satisfies this rule means that every morphism P -> D (i.e., every match of the premise of the rule) can be completed to a commuting diagram
P --> D
| ^
| /
⌄ /
Q
where the additional map is the arrow Q -> D, which corresponds to a match of both body and head.
This kind of phenomenon is known in category theory as a "lifting property", and there's rich theory around it. For example, you can show in great generality that there's always a "free" way to add data to a database D so that it satisfies the lifting property (the orthogonal reflection construction/the small object argument). Those are the theoretical underpinnings of the Datalog engine I'm sometimes working on [1], and there they allow you to prove that Datalog evaluation is also well-defined if you allow adjoining new elements during evaluation in a controlled way. I believe the author of this post is involved in the egglog project [2], which might have similar features as well.
For anyone curious: the performance difference between Clang and GCC on the example C solution for verbal arithmetic comes down to Clang's auto-vectorisation (deducing SIMD) whilst GCC here sticks with scalar, which is why the counter brings Clang closer in line to GCC (https://godbolt.org/z/xfdxGvMYP), and it's actually a pretty nice example of auto-vectorisation (and its limitations) in action, which is a fun tangent from this article (given its relevance to high-performance SMT/SAT solving for CSP)
Someone had an issue because SQLite failed to optimize the following query
select * from t where x = 'x' or '' = 'x'
Someone said that SQLite could not optimize out the "or '' = 'x'" because it would be too expensive to compute. Which is obviously true only for huge queries on tiny datasets.
mbid ·11 hours ago
This kind of phenomenon is known in category theory as a "lifting property", and there's rich theory around it. For example, you can show in great generality that there's always a "free" way to add data to a database D so that it satisfies the lifting property (the orthogonal reflection construction/the small object argument). Those are the theoretical underpinnings of the Datalog engine I'm sometimes working on [1], and there they allow you to prove that Datalog evaluation is also well-defined if you allow adjoining new elements during evaluation in a controlled way. I believe the author of this post is involved in the egglog project [2], which might have similar features as well.
[1] https://github.com/eqlog/eqlog [2] https://github.com/egraphs-good/egglog
Show replies
babel_ ·12 hours ago
lovasoa ·10 hours ago
Someone had an issue because SQLite failed to optimize the following query
Someone said that SQLite could not optimize out the "or '' = 'x'" because it would be too expensive to compute. Which is obviously true only for huge queries on tiny datasets.Show replies