
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech

Learning to understand grounded language—language that occurs in the context of, and refers to, the broader world—is a well-known area of research in robotics. The bulk of current work in this area still operates on textual data, which limits the ability to deploy agents in realistic environments.

Digital analysis of end-user speech (or raw speech) is a vital part of robotics. Image credit: Kaufdex via Pixabay, free license

A recent article posted on arXiv.org proposes to acquire grounded language directly from end-user speech using a relatively small number of data points instead of relying on intermediate textual representations.

The paper presents a detailed analysis of natural language grounding from raw speech to robotic sensor data of everyday objects using state-of-the-art speech representation models. An analysis of the audio and speech characteristics of individual participants demonstrates that learning directly from raw speech improves performance for users with accented speech compared to relying on automatic transcriptions.
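As a rough, hypothetical illustration (not the authors' exact pipeline), deep acoustic representations of an untranscribed utterance can be extracted with a self-supervised speech model such as wav2vec 2.0, for instance via the HuggingFace transformers library. The checkpoint name, audio file, and mean-pooling step below are assumptions for the sketch:

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load a pretrained self-supervised speech encoder (wav2vec 2.0 base is one option).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
encoder.eval()

# Read a raw spoken description of an object (this model expects 16 kHz mono audio).
# The filename is a placeholder.
waveform, sample_rate = sf.read("spoken_object_description.wav")

inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    frames = encoder(**inputs).last_hidden_state   # (1, num_frames, 768)

# Mean-pool frame-level features into a single utterance-level acoustic embedding.
utterance_embedding = frames.mean(dim=1)           # (1, 768)
```

Such an embedding can then stand in for the textual transcription that earlier grounded language systems depended on.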

Learning to understand grounded language, which connects natural language to percepts, is a critical research area. Prior work in grounded language acquisition has focused primarily on textual inputs. In this work we demonstrate the feasibility of performing grounded language acquisition on paired visual percepts and raw speech inputs. This will allow interactions in which language about novel tasks and environments is learned from end users, reducing dependence on textual inputs and potentially mitigating the effects of demographic bias found in widely available speech recognition systems. We leverage recent work in self-supervised speech representation models and show that learned representations of speech can make language grounding systems more inclusive toward specific groups while maintaining or even increasing general performance.
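The paper describes its own grounding architecture in detail; purely as a hypothetical sketch of how pooled speech features might be aligned with visual percept features in a shared embedding space, one could train a small projection network with a triplet-style objective. All dimensions, module names, and the choice of loss below are assumptions, not the authors' design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundingHead(nn.Module):
    """Projects speech and visual features into a shared embedding space.
    Dimensions are illustrative placeholders."""
    def __init__(self, speech_dim=768, vision_dim=2048, shared_dim=256):
        super().__init__()
        self.speech_proj = nn.Sequential(nn.Linear(speech_dim, shared_dim), nn.ReLU(),
                                         nn.Linear(shared_dim, shared_dim))
        self.vision_proj = nn.Sequential(nn.Linear(vision_dim, shared_dim), nn.ReLU(),
                                         nn.Linear(shared_dim, shared_dim))

    def forward(self, speech_feats, vision_feats):
        return (F.normalize(self.speech_proj(speech_feats), dim=-1),
                F.normalize(self.vision_proj(vision_feats), dim=-1))

# Triplet-style objective: pull a speech embedding toward the percept it
# describes and push it away from the percept of a different object.
model = GroundingHead()
loss_fn = nn.TripletMarginLoss(margin=0.4)

speech = torch.randn(8, 768)      # pooled acoustic features (stand-in data)
pos_vis = torch.randn(8, 2048)    # visual features of the described objects
neg_vis = torch.randn(8, 2048)    # visual features of mismatched objects

s, p = model(speech, pos_vis)
_, n = model(speech, neg_vis)
loss = loss_fn(s, p, n)
loss.backward()
```

The point of the sketch is only that no transcription is required anywhere in the loop: the spoken description is grounded to the visual percept directly through learned acoustic representations.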

Research paper: Youssouf Kebe, G., Richards, L. E., Raff, E., Ferraro, F., and Matuszek, C., “Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech”, 2021. Link: https://arxiv.org/abs/2112.13758