Five years ago the best AI systems came apart on any task that takes a person more than a couple of seconds. By early 2025, one of OpenAI’s reasoning models had reached a Codeforces rating of 2724, which put it above roughly 99.8% of the humans who write code under a clock, and it earned a gold medal at the International Olympiad in Informatics. That summer, general reasoning models from OpenAI and Google DeepMind both scored at the gold-medal line at the International Mathematical Olympiad, solving five of six problems that stop almost every teenager who qualifies to sit the exam.

That is one of the steepest capability ramps any technology has ever had, and it happened in the time it takes to finish high school.

Now point that same engine at the physical world. The AI that learned to reason its way through a proof is the AI now being built into humanoid robots, and the robots are already good. They follow spoken instructions, work out what is in front of them, and handle objects they were never specifically trained on. The question worth asking is no longer whether they keep getting smarter. It is how far this goes, and on that, the honest answer is that nobody can really picture it yet.

The intelligence curve is the steepest thing in tech

METR, a nonprofit that evaluates AI systems, measures progress in a way that travels well beyond software: the length of a task, in human time, that a model can finish on its own about half the time. That figure has been doubling roughly every seven months for years. A model stuck on two-second tasks not long ago can now work through an hour or more of real engineering before it loses the thread. METR is careful about the caveats: the number depends on which tasks and which humans you measure against, and the team says the trend could bend. It has not bent yet.
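
To make the doubling concrete, here is a small back-of-the-envelope sketch. It assumes a clean exponential with a seven-month doubling time, and it takes the two-second and one-hour figures as illustrative endpoints, not as METR's fitted data. The function names are made up for this example.

```python
import math


def doublings_needed(start_seconds: float, target_seconds: float) -> float:
    """How many times the horizon must double to grow from start to target."""
    return math.log2(target_seconds / start_seconds)


def months_to_reach(start_seconds: float, target_seconds: float,
                    doubling_months: float = 7.0) -> float:
    """Months required at a constant doubling time (an assumption, not a law)."""
    return doublings_needed(start_seconds, target_seconds) * doubling_months


def horizon_after(start_seconds: float, months: float,
                  doubling_months: float = 7.0) -> float:
    """Task horizon in seconds after `months` of steady doubling."""
    return start_seconds * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Illustrative endpoints: a two-second task and a one-hour task.
    months = months_to_reach(2, 3600)
    print(f"doublings: {doublings_needed(2, 3600):.1f}")
    print(f"months at a 7-month doubling time: {months:.0f}")
    print(f"years: {months / 12:.1f}")
```

Under those assumptions the gap between two seconds and an hour is a little under eleven doublings, or a bit more than six years. Change the doubling time and the calendar moves a lot, which is why the caveats matter.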

A lot of that jump came from one idea: reinforcement learning at scale. OpenAI was blunt about it. The leap from its earlier models to o3 came from scaling general-purpose reinforcement learning and letting the model spend more time thinking, not from hand-built tricks. The system learns from its own attempts, keeps what works, and improves. Hold onto that idea, because it is the same one now reshaping robotics.

The models people actually use have moved in step. OpenAI shipped GPT-5.5 in April 2026, aimed squarely at agentic work, where the model drives software across many steps until a job is done. Anthropic released Claude Opus 4.8 in May, then Claude Fable 5 in June, built for long-horizon agentic work and capable enough that the company bolted on hard safety limits. Read the makers’ own benchmark claims with a discount. The independent measurements (the Codeforces percentile, the olympiad scores, METR’s horizons) do not depend on anyone’s marketing, and they all point the same way.

The robots are already running on it

This intelligence is already out of the lab and inside machines that move.

In March 2025, Google DeepMind released Gemini Robotics, a model built on Gemini 2.0 that controls a robot directly, takes instructions in ordinary conversational language, and adapts to objects and situations it was not trained on. DeepMind says it more than doubled performance on a generalization benchmark over earlier robot models, and a companion model, Gemini Robotics-ER, is built for embodied reasoning: working out the steps a task needs before acting on it. Figure’s Helix, announced a month earlier, splits the work the way people do, with a slower system that reasons about the scene and the request and a fast one that runs the body across 35 joints in real time. Physical Intelligence’s π0, trained across seven robot types and dozens of tasks and then open-sourced, folds laundry pulled from a dryer and keeps going when someone yanks the shirt out of its hands mid-fold.

These are early models, and their makers say so. DeepMind calls Gemini Robotics still “on the path to real-world applications.” Look at what already works, though. A robot that understands what you said, reasons about how to do it, generalizes to a cup it has never seen, and recovers when something goes wrong is a genuinely capable system, and an early one. It will get much better, for the same reason the chatbots did.

They learn now; they are not programmed

The old way to make a robot do something was to write the motion out by hand, line by line, for one task in one fixed setup. That world is ending. Modern robots are trained, the way the language models are, and that single shift is what makes the next few years so hard to bound.

Some of the training is human demonstration. A person guides the robot through a task, the model learns to imitate, and then it generalizes. π0 can pick up a new skill from as little as one to twenty hours of that data. More of it, increasingly, is simulation. Robots now practice inside fast, photorealistic copies of the world, running a skill millions of times overnight and carrying what they learn back to the real machine. At its 2025 developer conference, NVIDIA released GR00T N1, an open foundation model for humanoids, alongside tools that generated 780,000 synthetic training examples in eleven hours, work that would have taken months by hand. Mixing that simulated practice with real data improved the model’s performance by 40% over real data alone.

This is where reinforcement learning comes back. In simulation a robot can try a task, fail, adjust, and try again, with nothing to break and no human to wait on, which is the same loop that carried AI from two-second tasks to olympiad medals. Humanoids are now learning to walk over rough ground, stay upright when shoved, and handle objects through that kind of trial and error, with a growing amount of it transferring from simulation to real hardware on the first try. A robot that learns from experience does not have a fixed list of things it can do. The list grows.
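
The try, fail, adjust loop can be shown in miniature. The sketch below is an epsilon-greedy bandit, about the simplest form of learning from trial and error. It is a toy, not the method any robotics lab uses. The simulator, its three candidate "strategies," and their success rates are all invented for illustration.

```python
import random

# Hypothetical success rates for three candidate strategies in a toy simulator.
SUCCESS_PROB = [0.2, 0.5, 0.8]


def simulate_trial(action: int, rng: random.Random) -> float:
    """Stand-in for a simulator run: 1.0 on success, 0.0 on failure."""
    return 1.0 if rng.random() < SUCCESS_PROB[action] else 0.0


def learn_by_trial_and_error(trials: int = 5000, epsilon: float = 0.1,
                             seed: int = 0):
    """Epsilon-greedy loop: mostly repeat what has worked, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(SUCCESS_PROB)
    counts = [0] * n_actions    # how often each strategy was tried
    values = [0.0] * n_actions  # running estimate of each success rate
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random strategy
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = simulate_trial(action, rng)
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed outcome.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


if __name__ == "__main__":
    values, counts = learn_by_trial_and_error()
    print("estimated success rates:", [round(v, 2) for v in values])
    print("times each strategy was tried:", counts)
```

Nothing tells the learner in advance which strategy is best. It finds out by failing cheaply many times, which is the property that makes simulation such a good fit.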

From tools to coworkers

Put the pieces together and the near future comes into focus. The intelligence is climbing a steep curve. It already runs real robot bodies. And those bodies now learn instead of waiting to be reprogrammed. The endpoint of that is something closer to a coworker than an appliance.

METR’s horizon number is the signal to watch. As the span of work a system can carry on its own keeps doubling, you cross from a machine that needs a person for every step to one you can hand a goal and leave alone. Applied to a humanoid, that looks like a robot you ask to unload the truck, restock the shelves, and flag whatever is running low. It sets the order itself, makes the small calls along the way, and adjusts when a box turns up in the wrong place. The embodied-reasoning models are early versions of exactly that judgment. They are built to decide, not only to follow.

The word people will hesitate over is independent, and within limits it is the right one. A near-future humanoid will make its own choices about how to carry out a job, learn the quirks of your warehouse or your kitchen by doing the work, and get better at it week over week without an engineer touching the code. That is less a leap of faith than the straight-line continuation of trends you can already measure, pointed at a body that can finally learn.

The honest part: the direction is sure, the timing is not

Balance matters here, because the same evidence that makes the direction obvious says almost nothing reliable about the calendar.

The physical world is a harder teacher than text. The internet handed the language models a near-infinite supply of free examples. Robots have no such library, as Epoch AI and a stack of robotics research keep pointing out, so every hour of real experience has to be earned, on real hardware or in a simulation good enough to count. That is why the most consumer-ready home humanoid so far, 1X’s NEO, ships in 2026 with a human teleoperator on call to take over the hard tasks. There is a reason for that, and it is the flywheel: every chore a person guides the robot through becomes training data for the autonomy meant to replace the guidance. The hype also runs ahead of the floor. Tesla wanted thousands of its Optimus robots doing real work by the end of 2025, and on its Q4 2025 call the company conceded that none of them were yet, in any material way.

So here is the careful version. That humanoids get a lot more capable, more autonomous, and more useful is about as close to a sure thing as this field offers, because every input that drives it (the models, the training methods, the simulation, the money) is improving at the same time. Exactly when a robot becomes a dependable coworker rather than a supervised one is genuinely uncertain, and anyone who hands you a confident year is guessing.

What nobody can fully picture

This is the part worth sitting with. In the span of a few years, AI went from barely stringing a paragraph together to out-reasoning most people at a fast-growing list of cognitive tasks, and the people building it were repeatedly surprised by their own systems. That same engine (the scaling, the reinforcement learning, the learning from raw experience) is now wired into machines that can walk into a room and act.

We are not at the start of that story for robots. We are at the part of the curve where it bends upward. A humanoid today understands you, reasons about the task, and improves by doing it. Give that a few more turns of the same crank that produced olympiad-level reasoning, and you land somewhere that honestly resists prediction: machines that not only do the work but figure out how, learn the job, and keep getting better at it on their own.

The one thing not in doubt is the direction. These robots are going to be far more capable than they are today, more independent, and much better at making their own decisions, and the distance between the version in the lab and the version that lives in the world is closing fast. How far, how fast, and how strange it gets is the most interesting open question in technology right now, and for once the honest answer is that the ceiling is nowhere in sight.