Robots that walk like us and talk like us are lining up to be the next world-shaping tech. 

We’ve been dreaming of mechanical, artificial versions of ourselves for at least a hundred years. Even longer, if you count the clockwork, flute-playing automatons of the 18th and 19th centuries. Early depictions in films like Fritz Lang’s Metropolis helped solidify humanoid robots in the popular consciousness, whether serving us breakfast or rising up against us.

But they remained science fiction to most of us until ASIMO, Honda’s pint-sized pioneer that debuted in 2000. Agile, coordinated, and objectively cute, it was light years ahead of the awkward, clanking prototypes that had graced expos in the years prior. It told the world that, just maybe, humanoid robots might not be science fiction after all.

Yet the vision ASIMO promised – a world where robots walked among us – fizzled away in a sea of technical difficulties. For you see, Dorothy, there was one thing that ASIMO lacked – and it was not the Tin Man’s heart.

ASIMO captured our hearts but not our markets.

The yellow brick road to robotic reality

The Wizard of Oz was released in August 1939, famed for its dramatic use of full Technicolor – and for the Tin Man, one of its colourful cast. But far fewer people know about the Tin Man’s chain-smoking, 7ft tall brother.

At the same time as The Wizard of Oz’s release, the 1939 World’s Fair was taking place in New York. And the star of the show was Elektro. Built by manufacturing giant Westinghouse, he was arguably the first humanoid robot to fulfil all the main criteria of what we would consider a functional robot.

Able to walk (barely), count on his fingers, and smoke cigarettes in response to voice commands, he was an astonishing proof of concept for the time. But more than anything, he shows just how long we’ve been tinkering and dreaming of making humanoid robots a reality.

Elektro: more functional than some bar hoppers on a night out.

It took 60 years from Elektro to reach ASIMO, a prototype still far removed from commercial viability. And 25 years after ASIMO, we are still wrestling with many of the same issues we were in 1939. This is an astonishingly long lead time for a technology, illustrating how complex the technical barriers are. 

Nevertheless, a huge number of hardware hurdles have been cleared. We now have the capability to build dextrous, fluid humanoid movement. The real obstacle is software. In the end, what ASIMO truly lacked was the Scarecrow’s brain.

In search of a brain

Robots have been able to ‘see’ for a long time indeed. Even Elektro had sensors allowing him to detect green or red light. Back in 1966, SRI’s ‘Shakey’ was able to navigate structured environments and plan routes based on what it could see. Robotic arms have been operating using rudimentary machine vision since the 80s.

However, none of the above could meaningfully ‘understand’ what they saw beyond carefully scripted presets. The groundwork to change all that was laid in 2012 with ImageNet – or more specifically, the first deep learning neural networks trained on ImageNet’s vast database of labelled imagery, which shattered records in object recognition. Think CAPTCHA-solving algorithms on a massive scale.
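To give a flavour of what that breakthrough enables, here’s a minimal sketch of ImageNet-style object recognition using an off-the-shelf pretrained network (torchvision’s ResNet-18; the image file name is hypothetical):

```python
# A minimal sketch of ImageNet-style object recognition using a pretrained
# network. Assumes torch, torchvision, and Pillow are installed; the image
# file name is hypothetical.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing: resize, crop, convert, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()  # inference mode: no dropout, fixed batch-norm statistics

image = Image.open("banana.jpg").convert("RGB")  # hypothetical input
batch = preprocess(image).unsqueeze(0)           # add a batch dimension

with torch.no_grad():
    logits = model(batch)
probs = torch.softmax(logits, dim=1)
confidence, class_index = probs.max(dim=1)
print(f"predicted ImageNet class {class_index.item()} "
      f"with confidence {confidence.item():.2f}")
```

A few dozen lines and a downloaded set of weights: that, in essence, is what separates post-2012 machine vision from everything that came before.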

Google’s DeepDream used a quirk of neural network pattern recognition to generate strange (mostly dog-based) psychedelia.

For the first time, machines could begin to reliably identify, distinguish, and learn different objects. The development and refinement of neural networks kickstarted many of the most influential innovations of the present day, from facial recognition to autonomous vehicles. And most pertinently of all, it laid the foundations for modern AI.

AI takes neural networks and supercharges them. For robots, it means being able to visually parse environments and understand instructions – the necessary spark for a robotic ‘brain’. Previously, robots were generally confined to structured or static environments – factory floor robots or household Roombas – because they simply could not cope with the unexpected. Behaviour was painstakingly programmed in the lab through iterative testing.

AI, however, gives robots the tools to adapt to different situations – precisely what general purpose humanoid robots need to enter our messy, messy world. And now they’re starting to use it.

AI and the great humanoid robot hope

The 2010s saw a revival of interest in robotics, encapsulated by the popular fascination with the slightly unsettling videos released by Boston Dynamics (now part of Hyundai, by the way, after spells under Google and then SoftBank). Its robots’ increasingly advanced abilities were in large part due to advances in neural-network-powered computer vision.

Boston Dynamics have not been alone in reaping the rewards of neural networks. Those rewards have translated into major growth in service robot uptake. From warehouses to chain restaurants, hotel corridors to university campuses, robots are an increasingly common sight – cleaning streets, delivering packages, and more.

Every one of these specialist areas represents a multi-billion dollar prize. But since the advent of modern AI, the holy grail has become humanoid robots. At stake? A huge slice of the world’s entire labour market.

In response, the money is moving. US tech leaders like Alphabet, Microsoft and Meta are ploughing significant funding into internal development efforts and acquisitions, while VC backing in robotics in 2024 was tenfold what it was in 2014. Tech figureheads from Elon Musk to Jensen Huang are extremely bullish, as are investors.

How bullish are we talking? Some predictions are truly stratospheric, with Morgan Stanley projecting a $5 trillion market by 2050 for humanoid robots alone – rivalling or even surpassing the automotive industry. Figure’s CEO has spoken of a $40 trillion opportunity. Even conservative estimates are eyeing a $1 trillion market. Whichever you believe, humanoid robots have momentum and a trajectory straight toward becoming the next truly novel product category and major household purchase.

If you’re a sceptic of US tech barons and conference hype, just know that the US numbers pale in comparison to China’s. The Chinese government handed out more than $20 billion in state subsidies and grants specifically aimed at humanoid robot firms in 2024 alone. That comes alongside a new 1 trillion yuan (~$137 billion) fund to spur startups in AI and robotics.

The results speak for themselves. In 2024, 31 Chinese companies unveiled 36 different humanoid robot prototypes; US companies unveiled just eight.

Culturally and economically, China is throwing itself towards robotics in all forms, transforming them from novelty to normal. Robots are visible both in public spaces and private ones, as delivery drones and as companions for children. Little wonder Western media frequently throws up accounts of executives returning from business trips more than a little chastened.

The World Humanoid Robot Games kicked off in China this year.

Physical intelligence: the final hurdle

But not so fast. As a few minutes trawling internet comment sections can attest, just having a brain is not sufficient to have intelligence.

AI in its most familiar form – the LLM chatbot – is a long way from the kind of intelligence humanoid robots will need to function effectively among us. What is required is physical intelligence: the ability to make reasoned decisions in a 3D environment. How much force is too much force to peel a banana, or ice a cake?
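To make the banana problem concrete, here’s a toy sketch of one small piece of physical intelligence: force-limited grasping, where a gripper closes on feedback from a force sensor rather than to a fixed position. Every name here is hypothetical; real robots use far more sophisticated compliance control and learned policies.

```python
# A toy sketch of force-aware grasping: close the gripper until a force
# sensor reading crosses a threshold, rather than to a fixed position.
# All names (gripper, read_force_newtons, etc.) are hypothetical.

MAX_GRIP_FORCE_N = 2.0   # e.g. enough to hold a banana without bruising it
STEP_MM = 0.5            # close the fingers in small increments

def grasp_gently(gripper):
    """Close until contact force reaches the limit, then stop."""
    while gripper.opening_mm > 0:
        force = gripper.read_force_newtons()  # hypothetical sensor call
        if force >= MAX_GRIP_FORCE_N:
            return True   # firm enough: object held without crushing it
        gripper.close_by(STEP_MM)
    return False          # fingers fully closed, nothing was grasped
```

Even this crude loop needs a well-chosen force threshold – and the right threshold differs for a banana, a cake, and a coffee mug. That, multiplied across every object and action in the world, is the scale of the problem.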

Slick demonstrations commonly fool people into thinking robots are innately, impossibly precise and dextrous at everything. But the truth is that these examples are deeply misleading – the product of extensive training within very constrained parameters.

Physical intelligence is currently not easily transferable from one robot to the next, unlike successive generations of LLMs. But even if this can be accounted for, the data required for a general purpose humanoid robot will be orders of magnitude greater than anything ever attempted.

Training autonomous vehicles has been the most data-intensive task in history, even more so than multimodal AI (though that is fast catching up). Millions of hours of raw footage and exabytes of synthetic data have been used to train AVs. This data is not semantically very diverse (it is primarily cars moving within road markings) and it is easy to gather via on-vehicle cameras and sensors. Even so, over a decade on, kinks are still being ironed out, and edge cases are still throwing up issues.

By comparison, the data humanoid robots will need is significantly more complex and far harder to gather. In a real sense, the world is all edge cases. Unlike scraping all the world’s publicly available written material as a training set for a text model, there is no equivalent dataset for peeling bananas. Data on each activity must be gathered thousands of times over for reliable replication – to say nothing of the challenge of performing an action with 100% safety. The occasional hallucination isn’t acceptable when robots are handling human care, or wielding knives.

This is not an insurmountable challenge, though it is an extremely daunting one. And the work has already begun. Robotics AI specifically received $1 billion in funding in 2024, 4x more than the previous five years combined. Some efforts are already paying off.

It is hard to overstate how interlinked the future development of AI and robotics has become. Before DALL-E and ChatGPT entered our lives, OpenAI were also working on robotics – notably using a robot hand to solve a Rubik’s cube. The two are interdependent technologies: robots need AI to understand their environments, and AI needs robots to gather data on those environments.

Where are we now?

Google DeepMind released its first Gemini robotics model in March 2025. It showed impressive results and some of the first green shoots of AI translating effective machine vision into task success outside of its training – for example, following instructions to pick up and move a pink cloth on a never-before-seen cluttered surface. It could also merge verbal and visual reasoning in tasks like ‘put the sweetest snack in the bowl’. In both cases, the model yielded higher success rates than ever seen before.
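Under the hood, models like this can be pictured as a vision-language-action loop: observe the scene, let the model pick an action conditioned on both pixels and the instruction, execute, repeat. The sketch below is purely illustrative – every name in it is hypothetical, and real systems are far more involved:

```python
# An illustrative vision-language-action (VLA) control loop. Every name
# here is hypothetical pseudocode, not a real API – the point is the
# shape of the pipeline, not any particular implementation.

def run_task(robot, camera, vla_model, instruction: str):
    """Observe the scene, let the model choose an action, execute, repeat."""
    while True:
        image = camera.capture()  # current view of the scene
        # The model reasons jointly over pixels and language,
        # e.g. instruction = "put the sweetest snack in the bowl"
        action = vla_model.predict(image, instruction)
        if action.is_done:        # model judges the task complete
            break
        robot.execute(action)     # e.g. target joint angles for one step
```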

Yet even with careful training and very controlled environments, even small things could still lead to failure – an object facing the wrong direction, variations in lighting or background. At this stage, it is very much a proof of concept. But it is the first step to a kind of universal baseline lens through which robots can perceive and interact with the world.

Figure have released their 03 model, with demos showing it performing numerous household tasks such as loading a dishwasher and folding laundry. The scene recognition and manual dexterity it displays (in a model far cheaper to produce than the 02) shows extremely rapid progress in realistically moving humanoid robots into a household setting, even if demonstrations were tightly controlled.

In the limelight. Photograph by Spencer Lowell for TIME

What’s clear is that in the 86 years since Elektro took his first drag, humanoid robots have come a long way indeed. Although if that length of time can teach us anything, it is not to underestimate the complexity involved in turning this particular piece of science fiction into reality.

However, as AI and hardware capabilities improve, and supply chain costs fall, in the short term we will see far more specialist robots in our day-to-day lives, performing routine, labour-saving tasks in the world around us.

And general purpose humanoid robots look poised to eclipse all of these in the longer term. It’s an opportunity we’re extremely excited about. They will likely remain out of the mainstream for a while yet, though, as training data collection reaches critical mass and they prove themselves absolutely reliable – two things that can’t easily be rushed.

In other words: you should probably go and put your laundry out.
