If you had asked me to make a wild guess as to how Cloudflare stores internal headers and then removes them, I would have come up with some options:
- An entire separate dictionary or other data structure.
- One single header containing all internal metadata.
- All headers have a prefix, and the internal ones start with I and the external ones start with E.
- All internal headers start with “CFInt”.
I would not have come up with a scheme in which headers in a particular list are internal. (What if someone else uses this name? What if something forgets to sanitize? What if different simultaneously running programs disagree in the list? What if the Connection header names a Cloudflare-internal header? What if the set-difference algorithm is annoyingly slow?)
The web is already full of obnoxiously ambiguous in-band signaling and header naming, and I find it bizarre that a company with Cloudflare’s scale uses such a tedious and error-prone mechanism internally.
It took me a moment to ponder the effectiveness of mapping utf8 characters into a bitmask. At first, it seemed unwise. Then I realized that 32 bits would get you `a-z` plus six special characters, and 64 bits would get you 'A-Z' (uppercase) plus six more. That's plenty of room for HTTP headers, and for a blazing fast matching algorithm since it just masks and compares a couple integers. Even figuring out which bit corresponds to which character is a simple lookup into a 256-word table.
One thing the author leaves out is this technique is technically a Bloom Filter. I find these kinds of things fascinating since they came about in an era where computing was far more resource bound than today (1970 in this case). But here we are, still finding real-world corners to use the same old optimizations.
The presented data for demonstrating a win doesn't have enough power to actually show this - not enough samples were taken.
A very simple analysis in R:
> prop.test(c(9, 4), c(1103,1171))
2-sample test for equality of proportions with continuity correction
data: c(9, 4) out of c(1103, 1171)
X-squared = 1.4915, df = 1, p-value = 0.222
alternative hypothesis: two.sided
95 percent confidence interval:
-0.002409831 0.011897193
sample estimates:
prop 1 prop 2
0.008159565 0.003415884
A p-value of 0.22 isn't below the magic 0.05 and the 95% confidence interval suggests that the trie might actually be slightly worse.
I imagine the trie is better, given the prior analysis, and there is weak evidence for this. But take (a lot) more samples and know with confidence how much better.
It's silly to use Big-O to describe the number of comparisons when the goal is to analyze performance. Comparisons cost 1 cycle, and you can do many of them per cycle with instruction level parallelism and SIMD. The main bottleneck, the actual source of slowness, is memory. It costs thousands of cycles to go to memory, sometimes tens of thousands or hundreds of thousands (if you need a TLB walk or OS interrupt). If you want to use Big-O, use it to estimate the number of cache misses.
I'd probably go with a custom perfect hash function. And Phil Bagwell's popcount trick. That would be faster than everyone else's solution that involves multiple lookups in memory.
I'm not very well versed in data structure optimization, but I was surprised they dismissed hash tables so quickly, especially when the table they are searching through is static. I find it somewhat hard to believe that a specially optimized hash table would not be faster than their trie implementation.
amluto ·126 days ago
- An entire separate dictionary or other data structure.
- One single header containing all internal metadata.
- All headers have a prefix, and the internal ones start with I and the external ones start with E.
- All internal headers start with “CFInt”.
I would not have come up with a scheme in which headers in a particular list are internal. (What if someone else uses this name? What if something forgets to sanitize? What if different simultaneously running programs disagree in the list? What if the Connection header names a Cloudflare-internal header? What if the set-difference algorithm is annoyingly slow?)
The web is already full of obnoxiously ambiguous in-band signaling and header naming, and I find it bizarre that a company with Cloudflare’s scale uses such a tedious and error-prone mechanism internally.
Show replies
pragma_x ·126 days ago
One thing the author leaves out is this technique is technically a Bloom Filter. I find these kinds of things fascinating since they came about in an era where computing was far more resource bound than today (1970 in this case). But here we are, still finding real-world corners to use the same old optimizations.
https://en.wikipedia.org/wiki/Bloom_filter
Show replies
gregsexton ·125 days ago
A very simple analysis in R:
A p-value of 0.22 isn't below the magic 0.05 and the 95% confidence interval suggests that the trie might actually be slightly worse.I imagine the trie is better, given the prior analysis, and there is weak evidence for this. But take (a lot) more samples and know with confidence how much better.
Validark ·125 days ago
I'd probably go with a custom perfect hash function. And Phil Bagwell's popcount trick. That would be faster than everyone else's solution that involves multiple lookups in memory.
The CPU is fast, memory is slow.
mightyham ·126 days ago
Show replies