Apple has introduced Matrix3D, an artificial intelligence (AI) model that can generate 3D views of an object or scene from multiple 2D images. Developed by Apple’s Machine Learning team in collaboration with researchers from Nanjing University and the Hong Kong University of Science and Technology (HKUST), the model has been released for public use and is available on Apple’s GitHub repository.
Apple’s Matrix3D Innovates Multi-Task Photogrammetry
In a recent blog post, Apple describes the research behind Matrix3D. Many 3D reconstruction models already exist, but Matrix3D stands out by unifying the 3D generation pipeline in a single model, which reduces the risk of errors compounding from one separate stage to the next. Instead of relying on a different model for each task, this one model handles several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis.
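One way to picture a single model covering several subtasks is a shared input format in which each task only changes which pieces of information are supplied and which must be predicted. The sketch below is a hypothetical illustration, not Apple’s code: the function names `task_mask` and `predicted_slots`, the modality names, and the choice of the last view as the novel-view target are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Slot = Tuple[int, str]  # (view index, modality); modality names are hypothetical


def task_mask(task: str, num_views: int) -> Dict[Slot, bool]:
    """Map each (view, modality) slot to True if it is given as input,
    or False if the model must predict it. Unused slots are omitted."""
    if num_views < 1:
        raise ValueError("need at least one view")
    mask: Dict[Slot, bool] = {}
    for v in range(num_views):
        if task == "pose_estimation":
            # Images in, camera poses out.
            mask[(v, "image")] = True
            mask[(v, "pose")] = False
        elif task == "depth_prediction":
            # Images and poses in, per-view depth maps out.
            mask[(v, "image")] = True
            mask[(v, "pose")] = True
            mask[(v, "depth")] = False
        elif task == "novel_view_synthesis":
            # Assumption: the last view is the unseen target; its pose is
            # known, but its image has to be generated.
            mask[(v, "pose")] = True
            mask[(v, "image")] = v != num_views - 1
        else:
            raise ValueError(f"unknown task: {task}")
    return mask


def predicted_slots(mask: Dict[Slot, bool]) -> List[Slot]:
    """Slots the single shared model is asked to fill in, in sorted order."""
    return sorted(slot for slot, given in mask.items() if not given)
```

For example, `predicted_slots(task_mask("novel_view_synthesis", 3))` returns `[(2, 'image')]`: the same model is reused, and only the split between known and unknown inputs changes from task to task.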
Photogrammetry is the practice of deriving precise measurements and 3D representations of physical objects and environments by analyzing photographs. It is commonly used to produce maps, 3D models, and measurements from 2D images captured from different angles.
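The basic geometry behind measuring from images can be shown with the textbook pinhole camera model. This sketch is not part of Matrix3D; it assumes an idealized, rectified two-camera setup with a known focal length, baseline, and principal point (all illustrative values), recovers depth from disparity with Z = f * B / d, and back-projects a pixel to a 3D point.

```python
from typing import Tuple


def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) at a known depth to a 3D point in camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(x: float, y: float, z: float,
            fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point back onto the image plane."""
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)
```

With an 800-pixel focal length, a 0.1 m baseline, and a 20-pixel disparity, the point lies 4 m away; back-projecting pixel (420, 240) with the principal point at (320, 240) gives the camera-space point (0.5, 0.0, 4.0), and projecting it again returns the original pixel.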
A research paper describing the model has also been published on the preprint server arXiv. Matrix3D is built on a multimodal diffusion transformer (DiT) architecture, which lets it combine data from multiple sources, including images, camera parameters, and depth maps.
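To make the idea of combining several data sources more concrete, the sketch below turns each view’s image and depth map into patch tokens and appends a camera token, producing one sequence that a transformer could attend over jointly. It is a simplified, hypothetical illustration rather than Matrix3D’s actual tokenizer: the `Token` type, the patch size, single-channel grids, and representing a camera pose as one flat 12-number token are all assumptions. A real DiT would also project every token to a shared embedding width and add modality and position embeddings before attention.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Grid = List[List[float]]


@dataclass
class Token:
    modality: str        # "image", "depth" or "pose"
    view: int            # index of the input view the token came from
    values: List[float]  # raw numbers; a real model would embed these


def patchify(grid: Grid, patch: int) -> List[List[float]]:
    """Split an H x W single-channel grid into flattened patch x patch tiles, row-major."""
    h, w = len(grid), len(grid[0])
    if h % patch or w % patch:
        raise ValueError("grid size must be divisible by the patch size")
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append([grid[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tiles


def build_sequence(views: Sequence[Dict], patch: int = 2) -> List[Token]:
    """Merge whatever each view provides into one token sequence.

    Each view is a dict that may contain an "image" grid, a "depth" grid and a
    "pose" (assumed here to be a flattened 3x4 camera matrix, 12 numbers)."""
    seq: List[Token] = []
    for v, data in enumerate(views):
        for name in ("image", "depth"):
            if name in data:
                seq.extend(Token(name, v, tile) for tile in patchify(data[name], patch))
        if "pose" in data:
            seq.append(Token("pose", v, list(data["pose"])))
    return seq
```

Because views may supply different subsets of modalities (here, one view with an image and pose, another with an image and depth), a single sequence of tagged tokens lets one network handle whatever combination is available.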
The paper highlights a masked learning strategy used during training: parts of the image are hidden, forcing the model to predict the missing pixels needed to fill in the gaps. According to the researchers, Matrix3D can produce a complete 3D view of an object or scene from just three images taken from different perspectives.
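A minimal sketch of the masked-training idea, assuming the input has already been split into tokens: a random subset is hidden, and the loss is computed only on the hidden positions, so the model is scored on how well it fills the gaps. The helper names and the zero-fill plus mean-squared-error objective are simplifying assumptions; a diffusion model such as Matrix3D would more likely add noise to the hidden content and learn to remove it than regress it directly.

```python
import random
from typing import List


def random_mask(num_tokens: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Hide about mask_ratio of the positions (at least one); True = hidden."""
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    num_hidden = max(1, round(num_tokens * mask_ratio))
    hidden = set(rng.sample(range(num_tokens), num_hidden))
    return [i in hidden for i in range(num_tokens)]


def apply_mask(tokens: List[List[float]], mask: List[bool], fill: float = 0.0) -> List[List[float]]:
    """Replace hidden tokens with a constant so the model cannot see them."""
    return [[fill] * len(t) if hidden else list(t) for t, hidden in zip(tokens, mask)]


def masked_mse(pred: List[List[float]], target: List[List[float]], mask: List[bool]) -> float:
    """Mean squared error over hidden tokens only; visible tokens are not scored."""
    total, count = 0.0, 0
    for p, t, hidden in zip(pred, target, mask):
        if hidden:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0
```

Scoring only the hidden positions is the design choice that matters: copying the visible input earns nothing, so the model has to infer the missing content from what it can see.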
Although Apple has not disclosed the training dataset, the model can be downloaded from Apple’s GitHub listing and, under its permissive license, freely modified and redistributed.