Google DeepMind Trained Humanoid Robots to Play Soccer Using Egocentric Vision

Robot soccer is a challenging task that requires agile movement, long-term planning, coordination with teammates, and scoring goals against an opponent. Previous research focused on learning these skills by giving robots access to the exact positions of the ball, goals, and other robots. However, this is not realistic for real robots, which have no such ground-truth information and must rely on onboard sensors such as cameras. Recently, Google DeepMind researchers trained robots to play soccer using only images from their onboard cameras, much as human players rely on their own eyesight. They call this approach vision-based soccer. The goal was to learn the same complex soccer-playing abilities as in previous work, but entirely from raw camera inputs, making the approach more practical for real robots.

Training Vision-Based DeepMind Robot Soccer Agents

1. Deep Reinforcement Learning

The researchers trained their robot soccer agents entirely in a simulated environment using a technique called deep reinforcement learning. In reinforcement learning, agents learn skills by trial and error while receiving rewards for desirable behaviours. The researchers developed a multi-stage training process in which robots first learn basic skills individually and then learn to play together through self-play.

A key challenge with vision-based learning is the limited field of view of cameras. To address this, the agents were given memory so they could recall past observations. The researchers also randomized aspects of training, such as the robot bodies, lighting, and background, to improve real-world performance.
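The randomization step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EpisodeConfig` fields, the value ranges, and the `sample_episode_config` helper are hypothetical and are not taken from the DeepMind work. The idea shown is only that each training episode draws fresh body, lighting, and background settings.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeConfig:
    """Hypothetical per-episode settings; names and ranges are illustrative."""
    mass_scale: float       # multiplier on the robot's nominal link masses
    light_intensity: float  # relative brightness of the scene lighting
    background_id: int      # index of the background used for rendering


def sample_episode_config(rng: random.Random, num_backgrounds: int = 4) -> EpisodeConfig:
    """Draw one random configuration for a new training episode."""
    return EpisodeConfig(
        mass_scale=rng.uniform(0.9, 1.1),
        light_intensity=rng.uniform(0.5, 1.5),
        background_id=rng.randrange(num_backgrounds),
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
configs = [sample_episode_config(rng) for _ in range(3)]
for config in configs:
    print(config)
```

Because the policy never sees the same body and scene settings twice in a row, it cannot overfit to one exact simulator configuration, which is the usual motivation for this kind of randomization.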

2. NeRF Rendering and Sim-to-Real Transfer

Google DeepMind uses Neural Radiance Fields (NeRFs) to generate realistic camera observations in simulation. Multiple NeRFs of the real soccer field are captured from different angles and under different lighting conditions. During training, a NeRF is randomly selected to render the static scene, and dynamic objects such as the ball and the opponent, which are simulated in MuJoCo, are overlaid on top.
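The overlay step can be pictured as simple mask-based compositing. The sketch below is an illustration under stated assumptions, not the researchers' pipeline: the constant-colour arrays stand in for NeRF renders of the static scene, the white patch stands in for a rendered ball, and the `composite` helper is a hypothetical name.

```python
import numpy as np


def composite(background: np.ndarray, foreground: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Overlay foreground pixels onto the background wherever mask is True."""
    # mask has shape (H, W); add a channel axis so it broadcasts over RGB.
    return np.where(mask[..., None], foreground, background)


rng = np.random.default_rng(0)

# Stand-ins for pre-rendered static scenes, one per captured NeRF (assumption).
backgrounds = [np.full((4, 6, 3), fill, dtype=np.uint8) for fill in (40, 120, 200)]
background = backgrounds[rng.integers(len(backgrounds))]  # pick one at random

# Stand-in for the rendered dynamic objects: a white "ball" patch and its mask.
foreground = np.zeros((4, 6, 3), dtype=np.uint8)
foreground[1:3, 2:4] = (255, 255, 255)
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:4] = True

frame = composite(background, foreground, mask)
print(frame.shape, frame.dtype)
```

Keeping the static scene and the dynamic objects in separate layers means the expensive background render can vary independently of the physics simulation.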

To reduce the sim-to-real gap, the NeRF colours are calibrated, and several image augmentations are applied. This realistic rendering, combined with domain randomization, enables zero-shot transfer of vision-based policies to physical robots.
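The article does not list which augmentations were used, so the following is only a sketch of what image augmentation commonly looks like. The choice of brightness shift, contrast scaling, and Gaussian pixel noise, the ranges, and the `augment` function name are all assumptions.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random brightness, contrast, and noise to a uint8 RGB image."""
    img = image.astype(np.float32) / 255.0          # work in [0, 1]
    brightness = rng.uniform(-0.1, 0.1)             # additive shift (assumed range)
    contrast = rng.uniform(0.8, 1.2)                # scale around mid-grey (assumed range)
    img = (img - 0.5) * contrast + 0.5 + brightness
    img = img + rng.normal(0.0, 0.02, size=img.shape)  # per-pixel sensor-like noise
    # Clip back into the valid range and convert to 8-bit pixels.
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)


rng = np.random.default_rng(0)
image = np.full((4, 6, 3), 128, dtype=np.uint8)  # flat grey test image
augmented = augment(image, rng)
print(augmented.shape, augmented.dtype)
```

Applying a fresh random perturbation to every frame exposes the policy to a spread of appearances, so small colour or exposure differences on the real camera are less likely to fall outside what it saw in training.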

DeepMind Soccer Robots' Emergent Vision-Based Behavior

Surprisingly, complex multi-agent behaviours emerged during training even though vision skills were never explicitly rewarded. For example, agents learned to search for the ball when it was hidden from view, track opponent movements, block shots, and coordinate plays, all from raw pixels.

The researchers analyzed the learned policies to better understand these emergent behaviours. They found that the agents developed an internal representation of the field from distinctive visual elements, such as goals. The robots also robustly tracked the moving ball, opponents, and their own position, even when objects were occluded or moved out of view.

Analysis showed agents actively controlled camera movements to maintain objects like moving balls within their field of view. This demonstrates the emergence of active perception skills through task rewards alone.

Comparable Agility with Vision

Various tests showed that vision-based agents maintained agility on par with previous state-based policies. Their maximum walking speed, kicking power, and accuracy at scoring penalties matched the levels previously achieved with full state information. Some drop in real-world performance was expected, because cameras introduce additional noise sources such as motion blur and lighting changes.

In simulation, agents learned effective shooting through smoothly coordinated body and leg motions, paralleling real human soccer skills. Tests confirmed that the learned policies transferred successfully to real robots using only camera inputs, a first at this level of task complexity.

Applications and Future Work

This research addresses several practical challenges for real-world deployment, such as limited onboard sensing and uncertainty about the state of the world. Because the policies are trained directly from raw inputs, the techniques could apply to a wider range of robots that lack state estimators. The emergence of active perceptual behaviours also simplifies the design process, since those behaviours did not have to be specified by hand.

Conclusion

DeepMind presents the first demonstration of multi-agent humanoid robot soccer trained end-to-end from pixels via deep reinforcement learning and self-play. The research represents an important step towards general robot learning by successfully transferring end-to-end visuomotor policies to real robots in a challenging multi-agent domain. The techniques could enable progress in other robotics applications.


Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!