Calling it now - RL finally "just works" for any domain where answers are easily verifiable. Verifiability was always a prerequisite, but the difference from prior generations (not just AlphaGo, but any nontrivial RL process prior to roughly mid-2024) is that the reasoning traces and/or intermediate steps can be open-ended with potentially infinite branching, no clear notion of "steps" or nodes and edges in the game tree, and a wide range of equally valid solutions. As long as the quality of the end result can be evaluated cleanly, LLM-based RL is good to go.
As a corollary, once you add in self-play with random variation, the synthetic data problem is solved for coding, math, and some classes of scientific reasoning. No more mode collapse, no more massive teams of PhDs needed for human labeling, as long as you have a reliable metric for answer quality.
This isn't just neat, it's important - as we run out of useful human-generated data, RL scaling is the best candidate to take over where pretraining left off.
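To make that concrete, here's a minimal sketch of the loop I mean: plain rejection sampling against an automatic verifier, with the verified outputs kept as synthetic training data and the pass/fail signal doubling as the reward. `sample_candidates` and the toy model are placeholders for whatever API you'd actually call; this is not any lab's real pipeline.

```python
import random

def verify(problem, candidate) -> bool:
    """Automatic checker: run unit tests, check a proof, compare against a
    numeric ground truth, etc. This is the only supervision signal needed."""
    return candidate == problem["answer"]  # toy stand-in for a real checker

def sample_candidates(model, problem, n=8):
    """Placeholder for sampling n open-ended reasoning traces from an LLM."""
    return [model(problem) for _ in range(n)]

def generate_verified_data(model, problems, n=8):
    """Rejection sampling: keep only candidates the verifier accepts.
    The surviving (problem, solution) pairs become synthetic training data."""
    dataset = []
    for p in problems:
        for cand in sample_candidates(model, p, n):
            if verify(p, cand):
                dataset.append((p["prompt"], cand))
    return dataset

# Toy usage: a "model" that just guesses small integers.
problems = [{"prompt": "2+2?", "answer": 4}, {"prompt": "3*3?", "answer": 9}]
toy_model = lambda p: random.randint(0, 10)
print(len(generate_verified_data(toy_model, problems)))
```

A real pipeline would use the verifier's signal for policy-gradient updates rather than just filtering, but the dependence on a clean end-result check is the same.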
Are there platforms that make such training more streamlined? Say I have some definition of success for a given problem and its data; how do I go about generating such an RL model as quickly and easily as possible?
We're working on an OSS, industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solve with substantial vendor lock-in.
Most things are verifiable, just not with code. I'm not particularly excited for a world where everything is predictable. This is coming from a guy who loves forecasting/prediction modeling too, but one thing I hate about prediction modeling, especially from a hobbyist standpoint, is data. It's very hard to get useful data. Investors will literally buy into hospital groups to get medical data, for example.
There are monopolies on the coolest sets of data in almost all industries; all the RL in the world won't do us any good if the companies doing the data hoarding only use it to forecast outcomes that will make them more money, not to figure out what can be done to better society.
> AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models
> In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.
> And in 20% of cases, AlphaEvolve improved the previously best known solutions
These sound like incredible results. I'd be curious what kind of improvements were made and what they actually looked like.
Like, was that "up to a 32.5% speedup" on some weird edge case with negligible speedup otherwise? Would love to see the benchmarks.
Remember that GPUs have cache hierarchies, and matching block sizes to optimally hit those caches is a big win that you often don't get by default, simply because the number of important kernels, times the number of important GPUs, times the effort to properly tune one, is greater than what people are willing to do for others for free in open source. Not to mention kernel fusion, and API boundaries that socially force suboptimal choices for the sake of clarity and simplicity.
It's a very impressive result, but not magic, but also not cheating!
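As a toy CPU-side illustration of the tuning problem (not a real GPU kernel, and numpy already does its own blocking internally, so treat this as the shape of the idea only): a crude sweep over block sizes for a tiled matmul. On a GPU the analogous knobs are tile shape, shared-memory staging, and launch configuration, and the optimum depends on the specific chip.

```python
import time
import numpy as np

def blocked_matmul(A, B, bs):
    """Tiled matrix multiply: work on bs x bs blocks so each block's
    working set fits in a given level of cache."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

n = 512
A, B = np.random.rand(n, n), np.random.rand(n, n)
for bs in (16, 64, 256, 512):            # crude autotuning sweep
    t0 = time.perf_counter()
    blocked_matmul(A, B, bs)
    print(f"block size {bs:4d}: {time.perf_counter() - t0:.3f}s")
```

The absolute numbers will differ wildly by machine; the point is only that the best block size is hardware- and kernel-specific and is usually found by exactly this kind of sweep.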
100%. LLMs are extremely useful for doing obvious but repetitive optimizations that a human might miss.
For the people awaiting the singularity, lines like this read as if lifted straight from science fiction:
> By suggesting modifications in the standard language of chip designers, AlphaEvolve promotes a collaborative approach between AI and hardware engineers to accelerate the design of future specialized chips.
Interesting that this wasn't tested on ARC-AGI. Francois has always said he believed program search of this type was the key to solving it.
The paper does not give that many details about the evolution part. Normally, evolutionary algorithms contain some cross-over component where solutions can breed with each other. Otherwise it's better classified as hill climbing / beam search.
There are also 'evolution strategies' algorithms that do not use the typical mutation and crossover, but instead use a population of candidates (search samples) to essentially approximate the gradient landscape.
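For reference, a minimal numpy sketch of that evolution-strategies idea on a toy objective: Gaussian perturbations of the current parameters are scored, and the noise-weighted scores form an estimate of the gradient of expected fitness, with no crossover and no backprop.

```python
import numpy as np

def es_step(theta, fitness, pop=50, sigma=0.1, lr=0.05):
    """One evolution-strategies update: perturb theta, score each candidate,
    and combine the scores into an approximate gradient direction."""
    eps = np.random.randn(pop, theta.size)            # population of perturbations
    scores = np.array([fitness(theta + sigma * e) for e in eps])
    scores = scores - scores.mean()                   # simple variance-reduction baseline
    grad_est = eps.T @ scores / (pop * sigma)         # estimate of d E[fitness] / d theta
    return theta + lr * grad_est

# Toy objective: move theta toward a fixed target vector.
target = np.array([3.0, -2.0, 0.5])
fitness = lambda x: -np.sum((x - target) ** 2)

theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, fitness)
print(theta)  # should end up near [3.0, -2.0, 0.5]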
This is very neat work! I'll be interested in how they make this sort of thing available to the public, but it is clear from some of the results they mention that search + LLM is one path to the production of net-new knowledge from AI systems.
Interestingly, they improved matrix multiplication, and there was a paper on arXiv a few days ago [1] that also improved matrix multiplication. The only case common to both is <4,5,6> (multiplying a 4x5 matrix by a 5x6 matrix), and they both improved it from 93 to 90 multiplications.
[1]: https://arxiv.org/html/2505.05896v1
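For context on what "from 93 to 90" counts: a fast <4,5,6> algorithm is a bilinear decomposition that computes the product with 90 scalar multiplications instead of the naive 4*5*6 = 120. Below is a hedged sketch of how such a decomposition is checked numerically; the factors shown are just the trivial rank-120 ones, since the actual 90-multiplication factors aren't reproduced in this thread.

```python
import numpy as np

def naive_factors(n, m, p):
    """Trivial rank-(n*m*p) decomposition: one product A[i,k]*B[k,j] per term.
    A real <4,5,6> result would replace these with only 90 (or 93) columns."""
    R = n * m * p
    U, V, W = np.zeros((n * m, R)), np.zeros((m * p, R)), np.zeros((n * p, R))
    r = 0
    for i in range(n):
        for j in range(p):
            for k in range(m):
                U[i * m + k, r] = 1      # picks A[i, k]
                V[k * p + j, r] = 1      # picks B[k, j]
                W[i * p + j, r] = 1      # adds the product into C[i, j]
                r += 1
    return U, V, W

def check(U, V, W, n, m, p, trials=10):
    """Verify that summing the rank-1 terms reproduces A @ B on random inputs.
    The number of columns equals the number of scalar multiplications used."""
    for _ in range(trials):
        A, B = np.random.randn(n, m), np.random.randn(m, p)
        prods = (U.T @ A.ravel()) * (V.T @ B.ravel())   # one scalar product per term
        C = (W @ prods).reshape(n, p)
        if not np.allclose(C, A @ B):
            return False
    return True

U, V, W = naive_factors(4, 5, 6)
print(U.shape[1], "multiplications, correct:", check(U, V, W, 4, 5, 6))  # 120, True
```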
Would love for AI to kill the leetcode interview
https://www.interviewcoder.co/ already served that.
Good method to generate synthetic training data, but only works for domains where validation can be scaled up.
Software engineering will be completely solved. Even systems like v0 are astounding in their ability to generate code, and they're primitive compared to what's coming. I get downvoted on HN for this opinion, but it's truly going to happen. Any system that can produce code, test the code, and iterate if needed will eventually outperform humans. Add in reinforcement learning, where they can run the code and train the model when it gets code generation right, and we are on our way to a whole different world.
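For what it's worth, the loop being described is easy to sketch; whether it "completely solves" software engineering is the part people dispute. The `llm_generate` function below is a placeholder, not any real API.

```python
import subprocess
import sys
import tempfile
import textwrap

def llm_generate(prompt: str) -> str:
    """Placeholder for a model call; a real system would query an LLM here."""
    return textwrap.dedent("""
        def add(a, b):
            return a + b
    """)

def run_tests(code: str, tests: str):
    """Write the candidate code plus the tests to a temp file and execute it."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def generate_and_iterate(spec: str, tests: str, max_rounds: int = 5):
    """Produce code, test it, and feed failures back into the prompt."""
    prompt = spec
    for _ in range(max_rounds):
        code = llm_generate(prompt)
        ok, err = run_tests(code, tests)
        if ok:
            return code                        # candidate verified against the tests
        prompt = f"{spec}\nPrevious attempt failed with:\n{err}\nPlease fix it."
    return None

tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(generate_and_iterate("Write a function add(a, b).", tests) is not None)  # True
```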
What about brownfield development though? What about vague requirements or cases with multiple potential paths or cases where some technical choices might have important business consequences that shareholders might need to know about? Can we please stop pretending that software engineering happens in a vacuum?
> Any system that can produce code, test the code, and iterate if needed
That isn't every problem in software engineering.
It's not that you get downvoted because people don't understand you; it's because you sell your opinion as fact, like an apostle. For example, what does it mean that software engineering is "solved"?
> it is because you sell your opinion as fact.
The guy's making a prediction. Classifying it as some kind of religious zealotry isn't fair to his point or him.
Check his profile.
> about: I believe in the creation of a machine god
Sounds about right.
Prophets are always beaten by average citizens, because prophecy is always unpleasant; it can't be otherwise. At the same time, you can't tell right away whether a person is really a prophet, because that only becomes known much later. That's probably why beating them (the simplest solution) turns out to be the most common response.
> because prophecy is always unpleasant.
Not necessarily. 'Gospel' translates as "good news". The unpleasantness tends to be directed at those within the power structure that the prophet challenges.
It's a known idiom; it means an optimal algorithm has been found, as in "tic-tac-toe is a solved problem".
The sleeper has awakened.
Maybe this one can stop writing a fucking essay in code comments.
I'm no longer surprised at just how consistently all the Gemini models overcomplicate coding challenges or just plain get them wrong.
Claude is just consistently spot on. A few salient comments for tricky code, instead of incessantly telling me what it's changed and what I might want to do, making incorrect assumptions when it already has the code or it's something we've discussed, or changing large amounts of unrelated code (e.g. styles). I could go on.
Shame I'm too tight to pay for Claude RN though...