Introduction
This blog post is a write-up of the talk I gave live at Sheffield AI on the current state of robotics and AI, focusing on Hugging Face's LeRobot. The video and the presentation can be found below.
Robots Are Already Here
Do you have a robot at home? You might think not, but what about a dishwasher? A washing machine or dryer? These are all forms of robots – machines designed to do a specific, repetitive job. But the robotics revolution we’re seeing now is about something different: AI-powered robots. These aren’t just single-task machines; they are more general-purpose, capable of learning and adapting (as opposed to being explicitly programmed), bringing automation potential to a much wider range of tasks, both at home and at work.
My own journey into this field comes from 14 years in manufacturing, watching traditional factory robots weld and move parts, combined with a passion for the intersection of the physical and virtual worlds, explored over the last 7 years through AI, IoT, and robotics projects, both professionally and as a hobbyist at places like Sheffield Hackspace. This post isn’t about selling you anything, but about sharing ideas and showing how accessible building your own AI-enabled robots has become.
Why Robotics, Why Now? The AI Advantage
Several factors are converging to make this the right time for a robotics surge, especially outside of large factories:
- Cost Efficiency: Think about making sandwiches on a production line or picking crops in agriculture. Human labor is expensive. Automating these tasks with robots can lower the cost of goods. Imagine robots precisely targeting weeds with herbicide instead of blanket spraying – better for costs and health.
- Time Saving: We all have repetitive tasks that annoy us. While a washing machine helps, it doesn’t empty itself, hang the clothes, and put them away. AI robots could handle the entire pipeline, freeing up significant time for more creative, enjoyable, or important activities. Imagine a robot tidying up while you sleep or work!
- Personalized Help: From keeping the house tidy (making partners happy!) to potentially assisting with personalized shopping or pet care, the scope for helpful home automation is vast.
- Accessibility: Building and training robots is no longer solely the domain of highly specialized engineers with massive budgets. Affordable hardware and open-source AI tools are democratizing the field.
- Competitiveness: For regions like Sheffield with its manufacturing history (and the UK as a whole), embracing robotics is crucial for staying competitive globally. While the UK has lagged, the potential for AI to accelerate adoption is huge.
The New Paradigm: AI-Guided Robotics
Traditionally, building robots involved:
- Mechanical/Electrical Engineering focus.
- High-precision (and expensive) motors.
- Manual programming of exact coordinates (e.g., to pick up a cup). If the cup moved slightly, the robot often failed.
The new paradigm, driven by AI, is different:
- Camera-Guided: Robots use cameras (often simple, cheap ones) to see their environment.
- AI for Control: Instead of rigid coordinates, AI models interpret the visual input and generate motor commands dynamically. Think about how you catch a ball – you constantly adjust based on visual feedback. AI allows robots to do the same, fine-tuning movements as they get closer to an object (sketched in code after this list).
- Less Precision Needed: Because the AI adapts, the motors themselves don’t need to be ultra-precise, lowering hardware costs.
- Skill Shift: While traditional engineering skills are still valuable, AI and Computer Science skills are becoming central.
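To make this loop concrete, here is a minimal Python sketch of the see-predict-act pattern. The Camera, Policy, and Motors classes are hypothetical stand-ins, not any particular library’s API:

```python
import numpy as np

class Camera:
    """Hypothetical camera; a real one would return webcam frames."""
    def read(self):
        return np.zeros((480, 640, 3), dtype=np.uint8)  # dummy RGB frame

class Policy:
    """Hypothetical trained model: image in, target joint angles out."""
    def predict(self, image):
        return np.zeros(6)  # e.g. six joint angles for a 6-DoF arm

class Motors:
    """Hypothetical motor driver; a real one would talk over USB/serial."""
    def set_positions(self, angles):
        print("moving joints to", angles)

def control_loop(camera, policy, motors, steps=100):
    # See, predict, act, repeat: the robot corrects itself from fresh
    # visual feedback on every step, like you do when catching a ball.
    for _ in range(steps):
        frame = camera.read()
        target = policy.predict(frame)
        motors.set_positions(target)

control_loop(Camera(), Policy(), Motors(), steps=3)
```

The point is the constant feedback: the model never needs a perfect plan up front, because it re-predicts from a fresh frame on every step.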
Democratizing Robotics: Hugging Face & LeRobot
A key player in making this accessible is Hugging Face, known for its open-source AI platform. They have a dedicated robotics team in France and other locations working on the “LeRobot” project (hence the name!).
- Goal: To make AI robotics easy to learn and use.
- Resources: They provide open-source libraries (mostly Python), pre-trained models, datasets, and educational materials.
- Community: Users can contribute code, models, and crucial training data.
How it Works (Simplified): LeRobot simplifies the complex setup often associated with systems like ROS (Robot Operating System). You typically:
- Clone their Python library from GitHub.
- Connect USB devices (cameras, robot arm motors).
- Use the library to record training data and train/run AI models.
- The AI model takes camera images as input and outputs motor positions/paths.
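As an illustration of the recording step, a demonstration boils down to logging synchronized camera frames and motor positions at each timestep. This sketch uses made-up FakeCamera and FakeArm stand-ins rather than LeRobot’s actual recording scripts (see its documentation for those):

```python
import numpy as np

class FakeCamera:
    """Hypothetical stand-in for a real USB webcam."""
    def read(self):
        return np.zeros((480, 640, 3), dtype=np.uint8)  # dummy RGB frame

class FakeArm:
    """Hypothetical stand-in for a real motor bus."""
    def read_positions(self):
        return np.zeros(6)  # six joint angles

def record_episode(camera, arm, num_steps=100):
    """One demonstration = synchronized (image, joint state) pairs."""
    episode = []
    for t in range(num_steps):
        episode.append({
            "timestep": t,
            "image": camera.read(),         # what the robot saw
            "state": arm.read_positions(),  # where the motors were
        })
    return episode

demo = record_episode(FakeCamera(), FakeArm(), num_steps=5)
print(f"recorded {len(demo)} frames")
```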
A Concrete Example: An engineer at Hugging Face (formerly at Tesla) trained a robot arm to fold clothes using the LeRobot framework. It took only around 50 demonstrations (showing the robot what to do) for the AI to learn the task reliably. The number of examples needed is rapidly decreasing, similar to how language models have improved.
Getting Started: Hardware and Skills
You don’t need a massive lab budget:
- Affordable Hardware: Kits like the SO-ARM100 are excellent starting points. Designed by a community member, it’s open-source (3D models and parts list on GitHub), can be largely 3D printed, and costs around £110-£150 per arm (at the time of recording). You typically need two for the initial training method (leader/follower), but alternative control methods are emerging.
- Basic Requirements:
- A laptop with a modest GPU (e.g., 4GB VRAM) for running the AI models (inference).
- Ability to clone a GitHub repository and run Python scripts.
- Editing simple text configuration files.
- Basic electronics assembly (connecting motors/wires).
- Advanced Skills (Optional): 3D modeling (getting easier with AI tools!), data science practices, reinforcement learning, contributing Python code to the libraries.
Training Your Robot: Learning from Demonstration & Simulation
How do these robots learn?
- Imitation Learning (Behaviour Cloning): This is the primary method used initially with LeRobot (a minimal training sketch follows this list).
- You manually control one robot arm (the “leader”).
- Cameras record another identical arm (the “follower”) copying the leader’s movements to perform a task (e.g., pick up a cup, fold a cloth).
- The AI model learns to map the camera views to the required motor actions based on these recorded demonstrations.
- Emerging Trend: Using game controllers (like an Xbox controller) or other devices (teleoperation) to guide the robot, potentially removing the need for a dedicated leader arm.
- Reinforcement Learning:
- Train the robot in a virtual simulation (like Nvidia Isaac Sim).
- Define a goal (e.g., getting the object into the cup).
- The AI gets “rewards” for actions that bring it closer to the goal. It learns through trial and error, much faster than in the real world (see the reward sketch after this list).
- Often requires some fine-tuning on the real robot afterwards to account for real-world physics differences.
- Synthetic Data: Using AI video generation models (like Sora) to create videos of desired robot actions, which can then be used as additional training data.
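Under the hood, behaviour cloning is plain supervised learning: the model is trained so its predicted action matches the demonstrated action for each camera frame. Here is a minimal sketch using a linear policy and NumPy in place of a real neural network and the actual LeRobot training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend demonstrations: 500 flattened image features -> 6 joint targets.
X = rng.normal(size=(500, 64))   # stand-in for camera-frame features
Y = rng.normal(size=(500, 6))    # demonstrated motor positions

W = np.zeros((64, 6))            # a linear policy (real systems use deep nets)
lr = 0.01
for epoch in range(100):
    pred = X @ W                          # predicted actions
    grad = X.T @ (pred - Y) / len(X)      # gradient of mean squared error
    W -= lr * grad                        # nudge the policy toward the demos

print("final MSE:", float(np.mean((X @ W - Y) ** 2)))
```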
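For reinforcement learning, the crucial ingredient is the reward function. Here is a hedged sketch of what a reward for “object in the cup” might look like inside a simulator (the positions and thresholds are invented for illustration):

```python
import numpy as np

def reward(object_pos, cup_pos):
    """Dense reward: closer to the cup is better, inside the cup is best."""
    distance = float(np.linalg.norm(object_pos - cup_pos))
    bonus = 10.0 if distance < 0.02 else 0.0   # within 2 cm counts as "in"
    return -distance + bonus

# The RL algorithm tries huge numbers of simulated actions, keeping the
# behaviours that earn higher reward; fine-tuning on the real robot then
# corrects for physics the simulator got wrong.
print(reward(np.array([0.1, 0.0, 0.0]), np.array([0.0, 0.0, 0.0])))
```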
The AI Brains: Model Types
Two main types of AI models are commonly used:
- Diffusion Models: Similar to AI image generators. They start with noise and iteratively refine an output – in this case, the entire plan or path for the robot’s motors (sketched in code after this list). They are generally less resource-intensive, making them suitable for running on laptops.
- Vision-Language Models (VLMs) / Transformers: Similar to models like ChatGPT. They often predict actions step-by-step. They can leverage powerful pre-trained models (trained on vast internet data) and then be fine-tuned for specific robot tasks. Examples include RT-X/Octo (used in Google’s robotics efforts) and the Pi-Zero model, which combines Google’s SigLIP (vision) and Gemma (language) models. While potentially more powerful, they often require more computational resources.
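The diffusion idea can be sketched in a few lines: start from random noise and repeatedly refine it into a short trajectory of motor positions, conditioned on what the camera sees. This is purely conceptual; a real diffusion policy uses a trained noise-prediction network and a proper noise schedule, and fake_denoiser below is just a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

def fake_denoiser(traj, image_features, step):
    """Toy stand-in for a trained network that predicts the noise to remove."""
    return 0.1 * traj  # shrink the trajectory toward zero each step

def generate_plan(image_features, horizon=16, dof=6, steps=10):
    traj = rng.normal(size=(horizon, dof))      # begin with pure noise
    for step in reversed(range(steps)):
        noise = fake_denoiser(traj, image_features, step)
        traj = traj - noise                     # iteratively refine the plan
    return traj  # a whole motor path, not just the next single action

plan = generate_plan(image_features=np.zeros(64))
print(plan.shape)  # (16, 6): 16 future timesteps x 6 joints
```

A VLM/transformer policy, by contrast, would emit one action at a time, much as a chat model emits one token at a time.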
The key takeaway is that robotics is piggybacking on the massive advancements (and investment) in general AI and large language models.
Expanding Capabilities: Sensors and Safety
- Sensors: While standard cameras are the primary input (two are often sufficient for 3D perception), the models can incorporate data from any sensor: infrared, touch/tactile sensors (like those being developed for better grip), depth sensors, etc. It’s about fusing sensor data into the model (see the sketch after this list).
- Compute for Training: Training huge base models (like GPT-4) takes immense resources. However, fine-tuning a pre-trained model for a specific task (like folding clothes with 50 examples) or training diffusion models for simpler tasks can often be done on a regular gaming laptop.
- Risks and Safety: As robots become more autonomous and capable in the real world, safety and ethics are critical. Regulations are evolving (e.g., for autonomous vehicles). In industrial settings, “cobots” (collaborative robots) are designed with safety features like stopping immediately on contact. Thoughtful design and regulation are essential.
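On the sensor-fusion point above, fusing often amounts to concatenating each sensor’s feature vector into one observation before it enters the model. A minimal sketch (the feature sizes are arbitrary; real systems often learn a separate encoder per sensor first):

```python
import numpy as np

def fuse(camera_feats, tactile_feats, depth_feats):
    """Concatenate per-sensor features into one input vector for the policy."""
    return np.concatenate([camera_feats, tactile_feats, depth_feats])

fused = fuse(np.zeros(64), np.zeros(8), np.zeros(16))
print(fused.shape)  # (88,) – one combined observation vector
```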
Join the Robotics Revolution
The barriers to entry in AI-powered robotics are lower than ever. With affordable hardware, powerful open-source tools like LeRobot, and a growing community sharing data and knowledge, building robots to solve real-world problems is within reach. All you need is curiosity, a willingness to learn and experiment, and perhaps a task you’d really like to automate!
Find the LeRobot project and resources via Hugging Face. Check out the SO-ARM100 on GitHub for an affordable hardware starting point.