I find their critique compelling, particularly their emphasis on the disconnect between CoT’s algorithmic mimicry and true cognitive exploration. The authors illustrate this with examples from advanced mathematics, such as the "windmill problem" from the International Mathematical Olympiad, a puzzle whose solution eludes brute-force sequential thinking. These cases underscore the limits of a framework that relies on static datasets and rigid generative processes. CoT, as they demonstrate, falters not because it cannot generate solutions, but because it cannot conceive of them in ways that mirror human ingenuity.
As they say - "Superintelligence isn't about discovering new things; it's about discovering new ways to discover."
This is the big idea in the paper: CoT is limited because there is a class of complex problems for which there is no 'textbook' way to find a solution. These are novel problems that need a unique methodology. "Essentially, to start generating the solution requires that we already know the full approach. The underlying generative process of the solution is not auto-regressive from left-to-right."
Mathematical meaning:
We can formalize this argument through the interpretation of reasoning as a latent variable process (Phan et al., 2023). In particular, classical CoT can be viewed as

p(a | q) = Σ_{s_1, ..., s_n} p(a | q, s_1, ..., s_n) · p(s_1, ..., s_n | q)

i.e., the probability of the final answer being produced by a marginalization over latent reasoning chains.

We claim that for complex problems, the true solution generating process should be viewed as

p(a, s_1, ..., s_n | q) = Σ_{z_1, ..., z_m} p(a, s_1, ..., s_n | q, z_1, ..., z_m) · p(z_1, ..., z_m | q)

i.e., the joint probability distribution of the solution (a, s_1, ..., s_n) is conditioned on the latent generative process (z_1, ..., z_m). Notice that this argument is a meta-generalization of the prior CoT argument, hence why we will refer to the process q → z_1 → ... → z_m as Meta-CoT.
I think this is seminal. It is getting at the heart of some issues. Ask o1-pro how you could make a 1550 nm laser diode operating at 1 GHz have low geometric loss without an expensive collimator, using commodity materials or novel manufacturing approaches grounded in first-principles physics, and the illusion that o1-pro is a big deal is lost. 'Novel' engineering is out of reach because there is no textbook on how to do novel engineering, and this class of problems is 'not auto-regressive from left-to-right'.
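To make the two factorizations above concrete, here is a toy numerical sketch (the distributions are entirely made up for illustration, not taken from the paper): classical CoT marginalizes the final answer over latent reasoning chains s, while Meta-CoT first draws a latent generative process z and only then scores the whole solution (s, a) conditioned on it.

```python
# Toy illustration of the two factorizations (all probabilities are made up).
#   Classical CoT:  p(a | q)    = sum_s p(a | q, s) * p(s | q)
#   Meta-CoT:       p(a, s | q) = sum_z p(a, s | q, z) * p(z | q)

# Hypothetical question q with two candidate reasoning chains and two answers.
p_s_given_q = {"chain_A": 0.7, "chain_B": 0.3}            # p(s | q)
p_a_given_qs = {                                          # p(a | q, s)
    "chain_A": {"answer_1": 0.9, "answer_2": 0.1},
    "chain_B": {"answer_1": 0.2, "answer_2": 0.8},
}

def cot_answer_prob(answer):
    """Classical CoT: marginalize the final answer over latent reasoning chains."""
    return sum(p_a_given_qs[s][answer] * p_s for s, p_s in p_s_given_q.items())

# Meta-CoT: a latent generative process z (e.g. "search", "work backwards") is
# drawn first, and the *whole* solution (s, a) is scored conditioned on it.
p_z_given_q = {"search": 0.6, "work_backwards": 0.4}      # p(z | q)
p_sol_given_qz = {                                        # p(a, s | q, z)
    "search":         {("chain_A", "answer_1"): 0.8, ("chain_B", "answer_2"): 0.2},
    "work_backwards": {("chain_A", "answer_1"): 0.1, ("chain_B", "answer_2"): 0.9},
}

def meta_cot_solution_prob(solution):
    """Meta-CoT: marginalize the joint solution over the latent generative process."""
    return sum(p_sol_given_qz[z].get(solution, 0.0) * p_z
               for z, p_z in p_z_given_q.items())

print(cot_answer_prob("answer_1"))                      # 0.9*0.7 + 0.2*0.3 = 0.69
print(meta_cot_solution_prob(("chain_A", "answer_1")))  # 0.8*0.6 + 0.1*0.4 = 0.52
```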
> That is, language models learn the implicit meaning in text, as opposed to the early belief some researchers held that sequence-to-sequence models (including transformers) simply fit correlations between sequential words.
Is this so? Is the research community agreed on this point? Are there papers discussing this topic?
>> Behind this approach is a simple principle often abbreviated as "compression is intelligence", or the model must approximate the distribution of data and perform implicit reasoning in its activations in order to predict the next token (see Solomonoff Induction; Solomonoff 1964)
For the record, the word "intelligence" appears in the two parts of "A Formal Theory of Inductive Inference" (referenced above; [1], [2]) a total of 0 times. The word "compression" appears a total of 0 times. The word "reasoning" appears once, in the phrase "using similar reasoning".
Unsurprisingly, Solomonoff's work was preoccupied with Inductive Inference. I don't know that he ever said anything about "compression is intelligence", but I believe this is an idea, and a slogan, that was developed only much later. I am not sure where it originally comes from.
It is correct that Solomonoff induction was very much about predicting the next symbol in a sequence of symbols, and not necessarily linguistic tokens, either. The common claim that LLMs are "in their infancy", or similar, is dead wrong. Language modelling is basically ancient (in CS terms), and we have long since crossed into the era of its technological maturity.
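That said, the narrow technical claim behind the slogan is uncontroversial: an ideal entropy coder driven by a predictive model spends about -log2 p(symbol | context) bits per symbol, so the total code length equals the model's negative log-likelihood, and better next-symbol prediction is literally better compression. A minimal sketch with a made-up predictor (hypothetical names, not tied to Solomonoff's papers):

```python
import math

def code_length_bits(sequence, model):
    """Ideal code length (in bits) of `sequence` under predictive `model`.
    `model(prefix)` returns a dict mapping the next symbol to its probability;
    an arithmetic coder driven by `model` would spend about this many bits."""
    total = 0.0
    for t, symbol in enumerate(sequence):
        p = model(sequence[:t]).get(symbol, 1e-12)  # guard against log(0)
        total += -math.log2(p)
    return total

# Hypothetical predictor: a fixed unigram distribution over three symbols.
def unigram_model(prefix):
    return {"a": 0.5, "b": 0.25, "c": 0.25}

print(code_length_bits(list("aabca"), unigram_model))  # 1+1+2+2+1 = 7.0 bits
```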
Congrats to the authors for a thoughtful work! I have been thinking about and working on related ideas for a few months now, but I have not yet spent commensurate compute on them and might have gone in a different direction; this work certainly helps create better baselines along the way toward making better use of decoder transformer architectures. Please keep it coming!
_______________
[1] https://raysolomonoff.com/publications/1964pt1.pdf
[2] https://raysolomonoff.com/publications/1964pt2.pdf