Latent Action Pretraining for General Action models (LAPA): An Unsupervised Method for Pretraining Vision-Language-Action (VLA) Models without Ground-Truth Robot Action Labels
Vision-Language-Action Models (VLA) for robotics are trained by combining large language models with vision encoders and then fine-tuning them on...