With an innovative combination of robotics and artificial intelligence, Boston Dynamics has reimagined its four-legged mechanical wonder, Spot, as a charismatic tour guide.
Armed with OpenAI’s ChatGPT and other large language models (LLMs), Spot has been transformed from an inspection assistant into an interactive robot that can chat, answer questions, and give tours with a touch of fun and nuance.
This evolution in Spot’s capabilities is the result of Boston Dynamics exploring the broad potential of foundation models—complex AI systems trained on extensive data sets that can exhibit emergent behavior.
From Control to Interaction
Previously known for its inspection skills, Spot gains new abilities as it wanders the halls of Boston Dynamics. Equipped with an array of sensors and AI-powered speech and text recognition tools, Spot demonstrates a remarkable ability to interact with people in real time. This interaction isn’t just about presenting dry facts; it’s about creating an engaging, informative experience that may include some impromptu role-playing and even humor.
Technical Setup
This transformation required Spot to be equipped with a vibration-resistant speaker housing to project its new voice. Controlled by an external computer using the Spot SDK, the robot integrates OpenAI’s ChatGPT API (upgraded to GPT-4) alongside various open-source LLMs. Spot’s tour-guide persona is also enhanced by visual question answering (VQA) models that allow it to identify objects it “sees” with its cameras and answer questions about them.
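Boston Dynamics has not published the code behind this setup, but the pattern described above—an external computer that gathers camera context, queries an LLM in a chosen persona, and forwards the reply to the robot—can be sketched roughly as follows. All class and function names here are hypothetical, and the LLM backend is injected as a plain callable so it could wrap a real chat-completions API or a stub:

```python
# Minimal sketch (not Boston Dynamics' actual code) of the control pattern
# described above: gather sensor context, ask an LLM for a reply in a
# chosen persona, and hand the text off to the robot's speaker.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TourGuide:
    persona: str                      # system prompt defining the guide's character
    llm: Callable[[List[dict]], str]  # e.g. a thin wrapper around a chat API
    history: List[dict] = field(default_factory=list)

    def respond(self, visitor_utterance: str, seen_objects: List[str]) -> str:
        """Build a chat request from persona, camera context, and dialogue history."""
        context = f"Objects currently visible: {', '.join(seen_objects) or 'none'}."
        messages = (
            [{"role": "system", "content": f"{self.persona}\n{context}"}]
            + self.history
            + [{"role": "user", "content": visitor_utterance}]
        )
        reply = self.llm(messages)
        # Keep a rolling transcript so follow-up questions stay coherent.
        self.history += [
            {"role": "user", "content": visitor_utterance},
            {"role": "assistant", "content": reply},
        ]
        return reply

# With OpenAI's Python client the backend could look roughly like:
#   client = OpenAI()
#   llm = lambda msgs: client.chat.completions.create(
#       model="gpt-4", messages=msgs).choices[0].message.content
# Here we use a trivial offline stub instead.
def stub_llm(messages):
    return f"(persona reply to: {messages[-1]['content']})"

guide = TourGuide(persona="You are Spot, a playful robot tour guide.", llm=stub_llm)
print(guide.respond("What is that machine?", ["Stretch robot", "charging dock"]))
```

Injecting the backend keeps the robot-side loop independent of any particular model, which matches the article’s mention of swapping between the ChatGPT API and open-source LLMs.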
Emerging Behaviors
Spot’s interactions during the tours revealed unexpected behavior, such as independently asking for help or identifying ‘parents’ among older robot models. While the Boston Dynamics team is quick to clarify that this doesn’t mean LLMs are conscious or intelligent in a human-like way, these actions highlight AI’s capacity to make statistical associations and adapt to new contexts.
Human Touch
To make Spot’s interactions more human-like, the team used text-to-speech services and programmed body language into the robot, allowing its robotic arm to turn toward people and ‘talk’ to them by mimicking the movements of a human mouth.
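One simple way to achieve the mouth-mimicking effect described above is to drive the gripper’s opening from the loudness of the synthesized speech, frame by frame. The sketch below illustrates that idea; the frame size and gain are illustrative choices, not Boston Dynamics’ published parameters:

```python
# Hedged sketch of the "talking mouth" effect: map each audio frame's RMS
# loudness to a gripper opening fraction in [0, 1], so the arm appears to
# speak in time with the text-to-speech output.

import math
from typing import List

def mouth_openness(samples: List[float], frame_size: int = 160) -> List[float]:
    """Return one gripper-opening value per audio frame."""
    openness = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        openness.append(min(1.0, rms * 3.0))  # gain of 3 is an arbitrary tuning knob
    return openness

# Example: a quiet frame followed by a loud one yields a small opening,
# then a clearly larger one.
audio = [0.01] * 160 + [0.3] * 160
print(mouth_openness(audio))
```

A real implementation would stream these values to the arm’s gripper controller alongside audio playback; only the amplitude-to-opening mapping is shown here.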
Challenges and Prospects
Despite the successes, the team also acknowledges limitations, such as the LLM’s tendency to fabricate answers and the awkward latency of its responses. However, the team is optimistic about the future, envisioning a world where robots understand and act on verbal instructions, reducing the learning curve for human users and increasing the utility of robots in a variety of fields.
Spot’s new role as a tour guide represents a significant step in the ongoing convergence of artificial intelligence and robotics. It highlights the potential of these technologies to provide not only functional benefits, but also cultural context and a whimsical touch to our interactions with machines. The experience gained from this proof of concept promises to pave the way for even more sophisticated and seamless human-robot collaborations in the future.