On Wednesday, Hugging Face revealed an expansion of its LeRobot platform, introducing a comprehensive dataset focused on automotive automation. The dataset, developed in partnership with the AI startup Yaak, is titled Learning to Drive (L2D) and was compiled from a range of sensors fitted on 60 electric vehicles (EVs) over three years. This open-source dataset is designed to assist developers and the robotics community in creating spatial intelligence solutions for the automotive sector.
Hugging Face Incorporates L2D Dataset into LeRobot
In a blog announcement, the company described the AI dataset as “the world’s largest multimodal dataset aimed at building an open-sourced spatial intelligence for the automotive domain.” The dataset exceeds 1 PB (one petabyte) in size and was sourced from sensor suites mounted on 60 EVs operated by driving schools across 30 cities in Germany over a three-year period. The same sensor setup was used throughout so that the data gathered is uniform.
The LeRobot platform, introduced last year, is a repository of open-source AI models, datasets, and tools designed to help developers build AI-driven robotics systems.
The Learning to Drive dataset
Photo Credit: Hugging Face
The dataset categorizes driving behaviors into two distinct groups: expert policies and student policies. Expert policies draw from driving instructors, showcasing optimal driving with zero errors, while student policies reflect the performance of learner drivers, including recognizable sub-optimal behaviors. Both categories feature natural language instructions relevant to various driving tasks.
Each category encompasses all necessary driving scenarios for obtaining a driving license in the European Union (EU), including maneuvers such as overtaking, navigating roundabouts, and track driving.
Hugging Face also detailed the sensor suite used to collect the L2D dataset. Each of the 60 Kia Niro EVs was outfitted with six RGB cameras for 360-degree environmental imaging, on-board GPS for vehicle tracking, and an inertial measurement unit (IMU) for capturing vehicle dynamics. All data was recorded with precise timestamps.
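For developers wondering what one timestamped sample from such a sensor suite might look like in code, the Python sketch below models a single frame. It is only an illustration built from the description in this article: the class name, field names, camera names, and units are all assumptions, and it does not reflect the actual L2D schema or the LeRobot API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical camera names: the article only states that there are six RGB cameras.
CAMERA_NAMES = (
    "front_left", "front_center", "front_right",
    "rear_left", "rear_center", "rear_right",
)
# The two driving-behavior groups described in the article.
POLICIES = ("expert", "student")


@dataclass(frozen=True)
class SensorFrame:
    """One timestamped sample (illustrative layout, not the real L2D schema)."""
    timestamp_ns: int                       # capture time, assumed nanoseconds
    images: Dict[str, bytes]                # camera name -> encoded RGB image
    gps: Tuple[float, float]                # (latitude, longitude) in degrees
    imu_accel: Tuple[float, float, float]   # assumed m/s^2
    imu_gyro: Tuple[float, float, float]    # assumed rad/s
    policy: str                             # "expert" or "student"
    instruction: str                        # natural language driving instruction


def frame_problems(frame: SensorFrame) -> List[str]:
    """Return a list of human-readable problems; an empty list means the frame looks sane."""
    problems = []
    missing = [name for name in CAMERA_NAMES if name not in frame.images]
    if missing:
        problems.append(f"missing cameras: {', '.join(missing)}")
    if frame.policy not in POLICIES:
        problems.append(f"unknown policy: {frame.policy}")
    lat, lon = frame.gps
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        problems.append("GPS fix out of range")
    return problems


# Usage: a complete frame with placeholder (empty) image payloads.
frame = SensorFrame(
    timestamp_ns=1_700_000_000_000_000_000,
    images={name: b"" for name in CAMERA_NAMES},
    gps=(52.52, 13.405),
    imu_accel=(0.0, 0.0, 9.81),
    imu_gyro=(0.0, 0.0, 0.0),
    policy="expert",
    instruction="Take the second exit at the roundabout.",
)
print(frame_problems(frame))  # prints []
```

Keeping every sensor reading under a single timestamp, as this sketch does, is one simple way to represent synchronized multimodal data. The released dataset may organize its streams differently.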
This dataset aims to support developers and robotics researchers in creating end-to-end self-driving AI models that could contribute to the development of fully autonomous vehicle systems.
Hugging Face announced that the L2D dataset will be released in stages, with each release building on the previous one to make access easier for users. The platform is also encouraging the community to propose models for closed-loop testing, which will be carried out with a safety driver, starting in the summer of 2025.