Noise-canceling headphones have become increasingly popular for creating a quiet auditory space for their wearers. Letting certain sounds from the environment still come through, such as an important conversation, remains a challenge, however. The latest version of Apple’s AirPods Pro adjusts sound levels automatically, but wearers have little control over whom they listen to or when this happens. Researchers are working on the problem, among them a University of Washington team that has developed an artificial intelligence system to address it.
The University of Washington team has created an artificial intelligence system called “Target Speech Hearing” (TSH) that lets a headphone wearer enroll a specific speaker by looking at them for three to five seconds. Once enrolled, the system cancels all other sounds in the environment and plays only the enrolled speaker’s voice in real time, even as the listener moves around noisy places and no longer faces the speaker. The system was presented at the ACM CHI Conference on Human Factors in Computing Systems, and the code for the proof-of-concept device is available for others to build on.
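To make the enrollment idea concrete, here is a minimal PyTorch sketch of how a short, noisy binaural clip might be turned into a fixed-size speaker embedding. The `SpeakerEncoder` network, its dimensions, and the feature pipeline are hypothetical stand-ins invented for illustration, not the team’s published architecture.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Hypothetical stand-in for the enrollment network: it maps a short,
    noisy binaural clip of the target speaker to a fixed embedding."""
    def __init__(self, n_fft=512, embed_dim=256):
        super().__init__()
        self.n_fft = n_fft
        freq_bins = n_fft // 2 + 1
        # Two channels (left/right ear microphones) of magnitude spectra.
        self.proj = nn.Sequential(
            nn.Linear(2 * freq_bins, 512),
            nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, binaural):  # binaural: (2, samples)
        window = torch.hann_window(self.n_fft)
        spec = torch.stft(binaural, self.n_fft, window=window,
                          return_complex=True).abs()   # (2, freq, frames)
        feats = spec.permute(2, 0, 1).reshape(spec.shape[2], -1)
        # Average over time so the clip length does not matter.
        return self.proj(feats).mean(dim=0)            # (embed_dim,)

# Enrollment: the wearer looks at the speaker for a few seconds, so the
# target voice reaches both ear microphones nearly simultaneously; that
# directional cue is what separates it from other voices in the scene.
encoder = SpeakerEncoder()
enrollment_clip = torch.randn(2, 3 * 16000)   # ~3 s of stereo audio, 16 kHz
target_embedding = encoder(enrollment_clip)   # reusable voice "fingerprint"
print(target_embedding.shape)                 # torch.Size([256])
```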
Senior author Shyam Gollakota, a professor in the UW Paul G. Allen School of Computer Science & Engineering, emphasizes that the project uses AI to modify the auditory perception of headphone wearers based on their preferences: users can now hear a single speaker clearly even in a noisy environment with many other people talking. Enrollment works by tapping a button while facing the speaker. Because the wearer’s head is directed at the target, that speaker’s voice reaches the microphones on both sides of the headset nearly simultaneously, and the system uses this cue to learn the speaker’s vocal characteristics. It then latches onto that voice and continues to play it back to the listener.
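Once a speaker is enrolled, the headphones must filter incoming audio continuously. The sketch below, again using hypothetical PyTorch stand-ins rather than the team’s published models, shows the general shape of such a pipeline: an untrained mask-estimation network conditioned on the enrollment embedding, applied chunk by chunk so the output can keep pace with the input.

```python
import torch
import torch.nn as nn

class TargetExtractor(nn.Module):
    """Hypothetical stand-in for the separation network: given a noisy
    binaural chunk and the enrolled speaker's embedding, estimate a
    per-frequency mask that keeps only that speaker's voice."""
    def __init__(self, n_fft=512, embed_dim=256):
        super().__init__()
        self.n_fft = n_fft
        freq_bins = n_fft // 2 + 1
        # Conditioning on the embedding lets one network target
        # whichever voice was enrolled.
        self.mask_net = nn.Sequential(
            nn.Linear(2 * freq_bins + embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 2 * freq_bins),
            nn.Sigmoid(),   # per-bin gain in [0, 1]
        )

    def forward(self, chunk, embedding):  # chunk: (2, samples)
        window = torch.hann_window(self.n_fft)
        spec = torch.stft(chunk, self.n_fft, window=window,
                          return_complex=True)          # (2, freq, frames)
        frames = spec.shape[2]
        feats = spec.abs().permute(2, 0, 1).reshape(frames, -1)
        cond = embedding.expand(frames, -1)
        mask = self.mask_net(torch.cat([feats, cond], dim=1))
        mask = mask.reshape(frames, 2, -1).permute(1, 2, 0)
        return torch.istft(spec * mask, self.n_fft, window=window,
                           length=chunk.shape[1])       # cleaned audio

# Streaming loop: processing short chunks keeps latency low enough to
# play the filtered voice back in real time (weights here are untrained).
extractor = TargetExtractor()
target_embedding = torch.randn(256)   # would come from the enrollment step
for chunk in torch.randn(10, 2, 4096).unbind(0):   # stand-in mic frames
    cleaned = extractor(chunk, target_embedding)   # only the enrolled voice
    # `cleaned` would be sent to the headphone drivers here
```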
The system was tested on 21 subjects, who on average rated the clarity of the enrolled speaker’s voice nearly twice as high as that of the unfiltered audio. Currently, TSH can enroll only one speaker at a time, and only when no other loud voice is coming from the same direction as the target speaker’s. The team is working on expanding these capabilities and aims to eventually bring the system to earbuds and hearing aids, making it accessible to a wider range of users who may benefit from the technology.
This research builds on the team’s earlier work in “semantic hearing,” which let users select specific sound classes, such as birds or voices, that they wanted to hear while canceling other sounds in their environment. With Target Speech Hearing, the team has taken a significant step toward a more personalized and customizable auditory experience for headphone users across a range of settings.
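For contrast with speaker-level targeting, a sound-class gate in the spirit of semantic hearing might look like the following toy sketch; the allow-list, the `classify` stand-in, and its threshold are all invented for illustration.

```python
import torch

ALLOWED = {"speech", "birds"}   # classes the wearer opted to hear

def classify(chunk):
    """Stand-in for a trained sound-event classifier."""
    return "speech" if chunk.abs().mean() > 0.5 else "background"

def semantic_gate(chunk):
    # Pass audio through only when it belongs to an allowed class;
    # everything else is silenced, as in the team's earlier system.
    return chunk if classify(chunk) in ALLOWED else torch.zeros_like(chunk)

print(semantic_gate(torch.ones(2, 4096)).any().item())          # True: passes
print(semantic_gate(torch.full((2, 4096), 0.1)).any().item())   # False: gated out
```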
The research was supported by a Moore Inventor Fellow award, a Thomas J. Cable Endowed Professorship, and a UW CoMotion Innovation Gap Fund. The team plans to continue refining and expanding the capabilities of Target Speech Hearing, a system that offers a glimpse of how headphone listening could become a far more personalized experience in a variety of environments.