FACEBOOK OPEN-SOURCES “SOUNDSPACES” TO ADVANCE AI ASSISTANTS
Facebook recently open-sourced a new artificial intelligence development tool to help researchers create robots capable of carrying out tasks that require both visual and auditory sensing, such as “Get my ringing phone” or “Open the door where the person is knocking.”
The open-sourced tool, called SoundSpaces, is a research dataset designed for “embodied AI,” a field of artificial intelligence that focuses on fitting physical bodies, such as robots, with software before training the systems in real-life environments.
The sound-rendering tool, according to MIT Technology Review, allows researchers to add highly realistic acoustics to any given physical environment.
It can also render the sounds produced by hitting different pieces of furniture, or the difference between heels and sneakers on a floor.
SoundSpaces provides a collection of audio files that AI developers can use to train sound-aware AI models in a simulation. These audio files are not simple recordings but rather “geometrical acoustic simulations,” Silicon Angle reported.
The simulations include information on how sound waves reflect off surfaces such as walls, how they interact with different materials, and other data that developers can use to create realistic-sounding simulations for training AI models.
“To accomplish a task like checking to see whether you locked the front door or retrieving a cell phone that’s ringing in an upstairs bedroom, AI assistants of the future must learn to plan their route, navigate effectively, look around their physical environment, listen to what’s happening around them, and build memories of the 3D space,” Facebook research scientists Kristen Grauman and Dhruv Batra wrote in a blog post.
“These smarter assistants will require new advances in embodied AI, which seeks to teach machines to understand and interact with the complexities of the physical world as people do.”
With what Facebook calls the “first audio-visual platform for embodied AI,” researchers can train AI agents in 3D environments with highly realistic acoustics, which opens up an array of new embodied AI tasks, such as navigating to a sound-emitting target, learning from echolocation, or exploring with multimodal sensors.
According to Facebook, adding sound not only yields faster training and more accurate navigation at inference, but also enables the agent to discover the goal on its own from afar.
To further improve navigation accuracy, Facebook also open-sourced a tool called Semantic MapNet, which developers can use to give their models a spatial memory.
“Semantic MapNet sets a new state of the art for predicting where particular objects, such as a sofa or a kitchen sink, are located on the pixel-level, top-down map that it creates,” Grauman and Batra wrote.
“It outperforms the previous approaches and baselines on mean-IoU, a widely used metric for the overlap between prediction and ground truth.”
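For readers unfamiliar with the metric, mean-IoU (mean intersection over union) can be sketched in a few lines of Python. This is only an illustration of the general metric, not Facebook's evaluation code: the function name mean_iou, the tiny 3x3 maps, and the label scheme (0 for floor, 1 for sofa, 2 for sink) are all assumptions made up for this example.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class intersection-over-union of two label maps.

    pred and truth are equally sized 2D lists of integer class labels.
    Classes that appear in neither map are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                in_pred = p == c
                in_truth = t == c
                if in_pred and in_truth:
                    intersection += 1
                if in_pred or in_truth:
                    union += 1
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical top-down maps: 0 = floor, 1 = sofa, 2 = sink.
truth = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
pred = [[0, 0, 1],
        [0, 0, 1],
        [2, 2, 1]]

score = mean_iou(pred, truth, num_classes=3)
print(round(score, 3))
```

In this toy case the per-class overlaps are 3/4 for floor, 2/4 for sofa and 2/3 for sink, so the mean comes out at roughly 0.639. A perfect prediction would score 1.0.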