Spot the Robot Tour Guide: Boston Dynamics Combines AI and Robotics for Engaging Cultural Experience

Boston Dynamics, renowned for its cutting-edge robotic technology, has taken its quadruped robot, Spot, to a whole new level. Combining artificial intelligence (AI) and robotics, the company has transformed Spot into a robot tour guide, offering a unique cultural experience.

The integration of Spot with ChatGPT and other AI models serves as a proof of concept, showcasing potential applications of foundation models in robotics. Recent advances in generative AI have been driven largely by the emergence of large foundation models (FMs): massive AI systems trained on vast amounts of data scraped from many sources. Such training gives rise to emergent behaviors, meaning the models can perform tasks beyond their direct training and adapt to a wide range of applications, serving as a foundation for other algorithms.

The Boston Dynamics team spent the summer creating proof-of-concept demonstrations using FMs specifically for robotic applications. Building upon these demos during an internal hackathon, they focused on showcasing Spot’s ability to make real-time decisions based on the output of FMs.

The team was particularly interested in leveraging Large Language Models (LLMs) like ChatGPT. These LLMs function as highly capable autocomplete algorithms that take in a stream of text and predict the next part of it. In this project, the Boston Dynamics team saw the potential of LLMs to role-play, replicate culture and nuance, make plans, and maintain coherence over time. They were also inspired by Visual Question Answering (VQA) models, which can caption images and answer simple questions about them.

To put their concepts to the test, the team decided to build a robotic tour guide. Spot would walk around the facility, observe objects in the environment, and use a VQA or image-captioning model together with an LLM to describe them. The robot would interact with the tour audience by answering questions, elaborating on what it saw, and planning its next actions.

According to the team, the LLM plays the role of an improv actor. The engineers provide a broad script, and the LLM fills in the details on the fly. Rather than seeking purely factual information, the team aimed for entertainment, interactivity, and nuance in the tour experience.

To enable Spot to hear the group's questions and prompts and talk back, the Boston Dynamics team 3D printed a vibration-resistant mount for a ReSpeaker V2 microphone array, attached to Spot's EAP 2 payload over USB. Spot is controlled from an offboard computer, either a desktop PC or a laptop, which communicates with the robot through the Spot SDK; the team added a Spot SDK service to handle audio communication with the EAP 2 payload.
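For readers curious about the plumbing, here is a minimal sketch of how an offboard computer might connect to Spot using the Python Spot SDK. The hostname and credentials are placeholders, and the custom audio service for the EAP 2 payload is not shown.

```python
# Minimal sketch: connecting an offboard computer to Spot with the
# Python Spot SDK (bosdyn-client). Hostname and credentials are
# placeholders; the audio service for the EAP 2 payload is omitted.
import bosdyn.client

sdk = bosdyn.client.create_standard_sdk('TourGuideClient')
robot = sdk.create_robot('192.168.80.3')   # robot IP on the local network
robot.authenticate('user', 'password')     # replace with real credentials
robot.time_sync.wait_for_sync()            # required before sending commands

# Query a standard service as a connectivity check.
state_client = robot.ensure_client('robot-state')
print(state_client.get_robot_state().battery_states)
```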

While Spot now had audio capabilities, it required conversation skills. The team initially used OpenAI's ChatGPT API with gpt-3.5 and later upgraded to gpt-4 when it became available. They also conducted tests using smaller open-source LLMs.

Drawing inspiration from Microsoft's research, the team prompted GPT as though it were generating the next line of a Python script. They supplied English documentation to the LLM in the form of code comments and evaluated the LLM's output as if it were Python code.
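A hypothetical sketch of this prompt framing follows, using the OpenAI Python client. The action names and prompt text are illustrative assumptions, not Boston Dynamics' actual prompt.

```python
# Hypothetical sketch of the "next line of a Python script" prompt trick.
# The prompt frames the tour as code plus comments; the LLM's completion
# is then treated as the next Python statement. Function names (say,
# go_to, ask) are illustrative, not Boston Dynamics' API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_HEADER = '''\
# You are a robot tour guide. Available actions:
#   say(text)       -- speak to the audience
#   go_to(waypoint) -- walk to a named waypoint on the tour map
#   ask(question)   -- ask the audience a question
# Continue the script with exactly one action.
say("Welcome to Boston Dynamics!")
go_to("lobby")
'''

def next_action(observations: str) -> str:
    """Ask the LLM for the next 'line of code' given current observations."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT_HEADER + f"# Camera sees: {observations}\n"}],
        max_tokens=60,
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

print(next_action("a shelf of vintage robot prototypes"))
```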

To enhance the robot’s conversational abilities, the team gave the LLM access to the robot’s SDK, a tour site map with brief descriptions of each location, and the capability to ask questions or utter phrases. This integration involved incorporating a VQA model and speech-to-text software.
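The site map and action dispatch might look roughly like the following sketch; the waypoint names, descriptions, and regex-based parsing are assumptions for illustration.

```python
# Illustrative sketch of a tour site map and a dispatcher that turns the
# LLM's pseudo-Python output into robot actions. Waypoint names,
# descriptions, and the regex parsing are assumptions.
import re

SITE_MAP = {
    "lobby":     "Main entrance, where visitors check in.",
    "atlas_lab": "Workspace where the Atlas humanoid is tested.",
    "museum":    "Display of older robots, including Spot V1 and BigDog.",
}

ACTION_RE = re.compile(r'(say|go_to|ask)\("(.+?)"\)')

def dispatch(llm_line: str) -> None:
    """Parse one LLM-generated 'line of code' and run the matching action."""
    match = ACTION_RE.search(llm_line)
    if not match:
        return  # ignore lines that are not well-formed actions
    action, arg = match.groups()
    if action == "go_to" and arg in SITE_MAP:
        print(f"[navigate] -> {arg}: {SITE_MAP[arg]}")
    elif action in ("say", "ask"):
        print(f"[speak] {arg}")

dispatch('go_to("museum")')
dispatch('say("This is where our older robots live.")')
```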

Images from Spot's gripper camera and front body camera were fed into BLIP-2, running in either visual-question-answering mode or image-captioning mode. This process ran roughly once per second, and the results were fed directly into the prompt.
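A once-per-second captioning loop along these lines can be built with BLIP-2 from Hugging Face Transformers; the checkpoint choice and the camera-frame stub below are assumptions rather than the team's exact setup.

```python
# Sketch: captioning a camera frame with BLIP-2 via Hugging Face
# Transformers, roughly once per second. The checkpoint name and the
# camera-capture stub are assumptions.
import time
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

def caption(image: Image.Image) -> str:
    """Run BLIP-2 in image-captioning mode (no question prompt)."""
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(out, skip_special_tokens=True)[0].strip()

while True:
    frame = Image.open("gripper_camera_frame.jpg")  # stand-in for a live frame
    print("Camera sees:", caption(frame))           # fed into the LLM prompt
    time.sleep(1.0)                                 # ~once per second
```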

For audio input, the team used OpenAI's Whisper to convert microphone data into English text. Spot waited for a wake-up phrase, such as "Hey, Spot," before incorporating the transcribed text into the prompt, and it muted its microphone while it was speaking.
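A simple version of this wake-word gating, using OpenAI's open-source Whisper package, might look like the sketch below; the model size and matching logic are assumptions.

```python
# Sketch: transcribing microphone audio with OpenAI's open-source Whisper
# model and gating on a wake word. Audio capture is stubbed out as a WAV
# file; the model size and wake-word matching are assumptions.
import whisper

model = whisper.load_model("base")
WAKE_WORDS = ("hey, spot", "hey spot")

def handle_audio(wav_path: str) -> str | None:
    """Return the transcribed command if it starts with a wake word."""
    text = model.transcribe(wav_path)["text"].strip().lower()
    for wake in WAKE_WORDS:
        if text.startswith(wake):
            return text[len(wake):].strip(" ,.")  # command text for the prompt
    return None  # no wake word: ignore the utterance

print(handle_audio("mic_capture.wav"))
```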

Since ChatGPT generates text, the team used a text-to-speech tool to let Spot respond to the audience aloud. After trying various off-the-shelf options, they settled on ElevenLabs, a cloud service. To minimize latency, they streamed the text to the service as phrases in parallel and played back the generated audio as it arrived.
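For illustration, here is a minimal, non-streaming request to ElevenLabs' public text-to-speech REST endpoint. The voice ID and API key are placeholders, and the parallel phrase streaming the team used to cut latency is omitted for brevity.

```python
# Sketch: synthesizing one phrase with ElevenLabs' REST text-to-speech
# endpoint. Voice ID and API key are placeholders; the parallel phrase
# streaming used to reduce latency is not shown.
import requests

API_KEY = "YOUR_ELEVENLABS_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"        # placeholder

def synthesize(text: str, out_path: str = "phrase.mp3") -> str:
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)     # MP3 audio bytes
    return out_path

synthesize("Welcome to the Boston Dynamics tour!")
```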

To give Spot more natural-looking body language, the team capitalized on a feature introduced in the Spot 3.3 update that lets the robot detect and track moving objects, so it could locate the nearest person and turn its arm toward them. By applying a low-pass filter to the generated speech audio, they drove the gripper to open and close in time with the words, mimicking a mouth; costumes and googly eyes strengthened the illusion.
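Conceptually, the "mouth" signal can be derived by rectifying and low-pass filtering the speech waveform, as in this sketch; the cutoff frequency and the mapping to gripper opening are guesses, not the team's actual values.

```python
# Sketch: deriving a "mouth" signal for the gripper by low-pass filtering
# the amplitude envelope of the speech audio. The 10 Hz cutoff and the
# mapping to gripper opening are guesses. Assumes a mono WAV file.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

rate, audio = wavfile.read("phrase.wav")
envelope = np.abs(audio.astype(np.float64))      # rectified amplitude

# Low-pass filter (assumed 10 Hz) smooths the envelope into slow motion.
b, a = butter(2, 10.0 / (rate / 2.0), btype="low")
mouth = filtfilt(b, a, envelope)
mouth = np.clip(mouth / max(mouth.max(), 1e-9), 0.0, 1.0)  # normalize to [0, 1]

# Each value would map to a gripper open fraction sent to the robot.
print("first gripper openings:", mouth[:: rate // 10][:5])
```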

During the experiments, new behaviors quickly emerged from the robot's basic action space. When asked about Mark Raibert, the robot admitted it did not know and volunteered to ask the IT help desk, despite not being programmed to do so. Similarly, when asked about its "parents," Spot walked to the display area where its predecessors, Spot V1 and BigDog, were exhibited.

Although these behaviors demonstrate the power of statistical associations between concepts (a help desk with asking questions, "parents" with older models), the team emphasized that the LLM's abilities do not indicate consciousness or human-like intelligence.

While the LLM performed admirably, it occasionally provided inaccurate information during the tour. For instance, it repeatedly claimed that Stretch, Boston Dynamics’ logistics robot, is used for yoga.

Moving forward, the Boston Dynamics team is committed to further exploring the fusion of artificial intelligence and robotics. Robotics offers a tangible way to ground large foundation models in the real world, while these models provide crucial cultural context, general knowledge, and flexibility that can serve a wide range of robotic applications.

In conclusion, Boston Dynamics’ innovative venture of combining AI and robotics has resulted in Spot the Robot Tour Guide, a fascinating and engaging cultural experience. By integrating Spot with advanced AI models, the company has paved the way for future applications while showcasing the benefits and potential of large foundation models in the field of robotics.
