xianshou 38 minutes ago

Calling it now - RL finally "just works" for any domain where answers are easily verifiable. Verifiability was always a prerequisite, but the difference from prior generations (not just AlphaGo, but any nontrivial RL process prior to roughly mid-2024) is that the reasoning traces and/or intermediate steps can be open-ended with potentially infinite branching, no clear notion of "steps" or nodes and edges in the game tree, and a wide range of equally valid solutions. As long as the quality of the end result can be evaluated cleanly, LLM-based RL is good to go.

As a corollary, once you add in self-play with random variation, the synthetic data problem is solved for coding, math, and some classes of scientific reasoning. No more mode collapse, no more massive teams of PhDs needed for human labeling, as long as you have a reliable metric for answer quality.

This isn't just neat, it's important - as we run out of useful human-generated data, RL scaling is the best candidate to take over where pretraining left off.
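The "verifiable reward" setup xianshou describes can be sketched as a reward function that ignores the free-form reasoning trace and scores only the final answer. A minimal illustration in Python; the `ANSWER:` marker and function names are hypothetical, not from any specific RL framework:

```python
# Sketch of a verifiable-reward check for RL on math-style problems:
# the reasoning trace is open-ended; only the final answer is graded.

def extract_final_answer(trace: str) -> str:
    """Take whatever follows the last 'ANSWER:' marker in a free-form trace."""
    marker = "ANSWER:"
    return trace.rsplit(marker, 1)[-1].strip() if marker in trace else ""

def reward(trace: str, expected: str) -> float:
    """Binary reward: 1.0 if the verifiable final answer matches, else 0.0.
    Intermediate steps are never graded, so any valid path scores equally."""
    return 1.0 if extract_final_answer(trace) == expected else 0.0

# Two different reasoning paths, same verifiable result:
trace_a = "sqrt(144): 144 = 12 * 12, so the root is 12. ANSWER: 12"
trace_b = "144 = 12^2 by inspection. ANSWER: 12"
assert reward(trace_a, "12") == 1.0
assert reward(trace_b, "12") == 1.0
assert reward("ANSWER: 13", "12") == 0.0
```

The point of the sketch: the grader never constrains the path, only the endpoint, which is what lets the trace be open-ended with no game-tree structure.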

  • smattiso 22 minutes ago

    Are there platforms that make such training more streamlined? Say I have some definition of success for a given problem and its data: how do I go about generating such an RL model as quickly and easily as possible?

    • vrm 16 minutes ago

      We're working on an OSS, industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solve with substantial vendor lock-in.

  • TechDebtDevin 26 minutes ago

    Most things are verifiable, just not with code. I'm not particularly excited for a world where everything is predictable. This is coming from a guy who loves forecasting/prediction modeling too, but one thing I hate about prediction modeling, especially from a hobbyist standpoint, is data. It's very hard to get useful data. Investors will literally buy into hospital groups to get medical data, for example.

    There are monopolies on the coolest sets of data in almost all industries, and all the RL in the world won't do us any good if the companies hoarding that data only use it to forecast outcomes that make them more money, not to figure out what could be done to better society.

jasonjmcghee 23 minutes ago

> AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models

> In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.

> And in 20% of cases, AlphaEvolve improved the previously best known solutions

These sound like incredible results. I'd be curious what kind of improvements were made / what the improvements were.

Like, was that "up to a 32.5% speedup" on some weird edge case, with negligible speedup otherwise? Would love to see the benchmarks.

  • schmidtleonard 10 minutes ago

    Remember that GPUs have cache hierarchies, and matching block sizes to optimally hit those caches is a big win that you often don't get by default, simply because (number of important kernels) x (number of important GPUs) x (effort to properly tune one) is more work than people are willing to do for others for free in open source. Not to mention kernel fusion, and API boundaries that socially force suboptimal choices for the sake of clarity and simplicity.

    It's a very impressive result: not magic, but not cheating either!
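The block-size tuning schmidtleonard describes can be illustrated with a toy CPU analogue: a blocked matmul where the tile size is a free parameter. On a GPU the "right" tile matches shared memory and cache sizes; this pure-Python sketch only shows that correctness is tile-independent while the tile stays tunable per hardware. It is illustrative, not a real kernel:

```python
# Toy blocked matrix multiply with a tunable tile size.
# Any tile size gives the same result; only the speed differs on real
# hardware, which is exactly what makes tile size a tuning knob.

def blocked_matmul(a, b, tile):
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # loop over output row tiles
        for j0 in range(0, m, tile):      # loop over output column tiles
            for k0 in range(0, k, tile):  # loop over inner-dimension tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            s += a[i][kk] * b[kk][j]
                        c[i][j] += s
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert blocked_matmul(a, b, 1) == blocked_matmul(a, b, 2) == [[19.0, 22.0], [43.0, 50.0]]
```

Sweeping `tile` over the sizes that fit each level of the cache hierarchy and timing each is the kind of repetitive search that, per the comment above, rarely gets done by hand for every kernel/GPU pair.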

    • hiddencost 7 minutes ago

      100%. LLMs are extremely useful for doing obvious but repetitive optimizations that a human might miss.

moritonal 4 minutes ago

For the people awaiting the singularity, lines like this read almost as if straight from science fiction:

> By suggesting modifications in the standard language of chip designers, AlphaEvolve promotes a collaborative approach between AI and hardware engineers to accelerate the design of future specialized chips.

modeless 2 minutes ago

Interesting that this wasn't tested on ARC-AGI. Francois has always said he believed program search of this type was the key to solving it.

markisus 28 minutes ago

The paper does not give that many details about the evolution part. Normally, evolutionary algorithms contain some cross-over component where solutions can breed with each other. Otherwise it's better classified as hill climbing / beam search.

  • mattdesl 10 minutes ago

    There are also "evolution strategies" algorithms that skip the typical crossover and instead use a population of perturbed candidates (search samples) to basically approximate the gradient landscape.
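The evolution-strategies idea mattdesl mentions can be shown in a few lines: perturb a candidate with Gaussian noise, and use the population of fitness differences as a finite-difference gradient estimate. A minimal, purely illustrative sketch on a 1-D toy objective (hyperparameters are arbitrary):

```python
# Minimal evolution-strategies sketch: a population of random perturbations
# approximates the gradient, with no crossover at all.
import random

def es_minimize(f, x0, sigma=0.1, lr=0.05, pop=50, steps=300, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        grad_est = 0.0
        for _ in range(pop):
            eps = rng.gauss(0.0, 1.0)
            # Antithetic pair: evaluate +eps and -eps perturbations.
            grad_est += eps * (f(x + sigma * eps) - f(x - sigma * eps))
        grad_est /= (2 * sigma * pop)
        x -= lr * grad_est  # descend the estimated gradient
    return x

# Toy landscape with its minimum at x = 3.
x_star = es_minimize(lambda x: (x - 3.0) ** 2, x0=0.0)
assert abs(x_star - 3.0) < 0.1
```

This is the sense in which a population "approximates the gradient landscape": no derivative of `f` is ever taken, only black-box evaluations.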

vrm an hour ago

This is very neat work! Will be interested in how they make this sort of thing available to the public, but it is clear from some of the results they mention that search + LLM is one path to the production of net-new knowledge from AI systems.

HappyPanacea 5 minutes ago

Interestingly, they improved matrix multiplication, and a paper on arXiv a few days ago [1] also improved matrix multiplication. The only case common to both is <4,5,6> (multiplying a 4x5 matrix by a 5x6 matrix), and both improved it from 93 to 90.

[1]: https://arxiv.org/html/2505.05896v1
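For intuition on what "improving <4,5,6> from 93 to 90" means: those numbers count the scalar multiplications in a bilinear matrix-multiplication scheme. The classic, exactly checkable example of the same idea is Strassen's 2x2 scheme, which uses 7 multiplications instead of the naive 8:

```python
# Strassen's 2x2 scheme: 7 scalar multiplications instead of the naive 8.
# The <4,5,6>: 93 -> 90 results shave multiplications the same way, just
# for larger shapes.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)   # the 7 products, each one scalar multiply
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    # Textbook formula: 8 scalar multiplications.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
assert strassen_2x2(A, B) == naive_2x2(A, B)
```

Shaving even a few multiplications matters because such schemes are applied recursively, so the savings compound at every level.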

visarga an hour ago

Good method to generate synthetic training data, but only works for domains where validation can be scaled up.

ldjkfkdsjnv an hour ago

Software engineering will be completely solved. Even systems like v0 are astounding in their ability to generate code, and they are very primitive compared to what's coming. I get downvoted on HN for this opinion, but it's truly going to happen. Any system that can produce code, test the code, and iterate if needed will eventually outperform humans. Add in reinforcement learning, where they can run the code and train the model when it gets code generation right, and we are on our way to a whole different world.
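The produce/test/iterate loop described above can be sketched in a few lines. The "model" here is a stub returning canned candidates; in a real system it would be an LLM, with passing runs doubling as the RL reward signal. All names are hypothetical:

```python
# Sketch of a generate -> test -> iterate loop over code candidates.

def run_tests(src: str) -> bool:
    """Execute a candidate implementation and check it against a unit test."""
    ns = {}
    try:
        exec(src, ns)
        return ns["add"](2, 3) == 5
    except Exception:
        return False

def stub_model(attempt: int) -> str:
    # Canned candidates standing in for successive LLM generations.
    candidates = [
        "def add(a, b): return a - b",   # wrong: fails the test
        "def add(a, b): return a + b",   # right: passes the test
    ]
    return candidates[min(attempt, len(candidates) - 1)]

def generate_test_iterate(max_attempts=5):
    for attempt in range(max_attempts):
        src = stub_model(attempt)
        if run_tests(src):
            return src, attempt  # the reward = 1 case in an RL setup
    return None, max_attempts

src, attempts_used = generate_test_iterate()
assert src is not None and attempts_used == 1
```

The loop itself is trivial; the open question in the thread below is whether the test oracle can be made this clean outside of self-contained problems.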

  • bossyTeacher 3 minutes ago

    What about brownfield development though? What about vague requirements or cases with multiple potential paths or cases where some technical choices might have important business consequences that shareholders might need to know about? Can we please stop pretending that software engineering happens in a vacuum?

  • IncreasePosts 33 minutes ago

    > Any system that can produce code, test the code, and iterate if needed

    That isn't every problem in software engineering.

  • nevertoolate 40 minutes ago

    It is not that you get downvoted because people don't understand you; it is because you sell your opinion as fact, like an apostle. For example, what does it mean that software engineering is solved?

    • squidbeak 5 minutes ago

      > it is because you sell your opinion as fact.

      The guy's making a prediction. Classifying it as some kind of religious zealotry isn't fair to his point or him.

    • jpnc 36 minutes ago

      Check his profile.

      > about: I believe in the creation of a machine god

      Sounds about right.

    • sannysanoff 29 minutes ago

      Prophets are always beaten by average citizens, because prophecy is always unpleasant. It can't be otherwise. At the same time, you can't tell right away whether a person is really a prophet; that only becomes known much later. That's probably why beating them (the simplest solution) turns out to be the most common response.

      • handfuloflight 26 minutes ago

        > because prophecy is always unpleasant.

        Not necessarily. "Gospel" translates as "good news." The unpleasantness tends to fall on those within the power structure that the prophet challenges.

    • sannysanoff 35 minutes ago

      It's a known idiom; it means an optimal algorithm has been found, as in "tic-tac-toe is a solved problem."

nprateem 33 minutes ago

Maybe this one can stop writing a fucking essay in code comments.

I'm no longer surprised by just how consistently all the Gemini models overcomplicate coding challenges or just plain get them wrong.

Claude is just consistently spot on: a few salient comments for tricky code instead of incessantly telling me what it's changed and what I might want to do next; no incorrect assumptions when it already has the code or we've discussed something; no changing large amounts of unrelated code (e.g. styles). I could go on.

Shame I'm too tight to pay for Claude RN though...