A machine-learning model captures how sound in a room propagates through the space | Image credit: MIT News

Yilun Du, a grad student in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and his collaborators have developed a machine-learning technique that accurately captures and models the underlying acoustics of a scene from only a limited number of sound recordings. The system can simulate how a listener would hear sound from any point in a room.

Joining Du on the work are lead author Andrew Luo, a graduate student at Carnegie Mellon University (CMU); Michael J. Tarr, the Kavčić-Moura Professor of Cognitive and Brain Science at CMU; and senior authors Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in MIT’s Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Antonio Torralba, the Delta Electronics Professor of Electrical Engineering and Computer Science and a member of CSAIL; and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.

The researchers also investigated how spatial acoustic information could help robots better understand their environments. Their machine-learning model captures how any sound in a room propagates through the space, which lets it simulate what a listener would hear at different positions.

In computer vision research, a type of machine-learning model called an implicit neural representation model has been used to generate smooth, continuous reconstructions of 3D scenes from images. These models use neural networks, which are made up of layers of interconnected nodes, or neurons, that process data to complete a task.

The MIT researchers used the same kind of model to capture how sound travels continuously through a scene.
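To make the idea of an implicit representation concrete, here is a minimal sketch in plain Python. It is not the researchers' model: the network size, the random (untrained) weights, and the choice of a single scalar output such as loudness are all assumptions made for illustration. The point it shows is that the scene is stored as a function that can be queried at any continuous position rather than as a grid of samples.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Build random weights for a small fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Evaluate the network at one input vector x."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # nonlinearity on hidden layers only
            x = [math.tanh(v) for v in x]
    return x

# Hypothetical field: (listener_x, listener_y) -> one scalar, e.g. loudness.
# The weights are random here; a real model would learn them from recordings.
field = make_mlp([2, 16, 16, 1])

# The field is defined at every continuous position, not just on a grid,
# and nearby positions give nearby values.
for pos in [(0.0, 0.0), (0.25, 0.1), (0.2501, 0.1)]:
    print(pos, forward(field, list(pos))[0])
```

In a trained model the weights would be fitted so that the function's output matches the recordings at the measured positions, and the network would then fill in the positions in between.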

However, they found that vision models benefit from a property known as photometric consistency, which does not carry over to sound: the same object looks roughly the same when viewed from two different locations. With sound, by contrast, changing location can produce a completely different result because of obstacles, distance, and so on. This makes predicting audio very difficult.

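To overcome this problem, the researchers incorporated two properties of acoustics into their model: the reciprocal nature of sound and the influence of local geometric features. Sound is reciprocal in the sense that if a source and a listener swap positions, what the listener hears is unchanged.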
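The two properties can be illustrated with a small sketch. Everything in it is hypothetical: the 3x3 feature grid, the stand-in predictor `raw_model`, and the choice to enforce reciprocity by averaging both input orderings are assumptions for illustration, since the article does not say how the researchers built these properties into their network. Local features are looked up by interpolating a grid at the emitter and listener positions, so what sits near each point matters more than geometry far away.

```python
import math

def bilinear(grid, x, y):
    """Interpolate a 2D grid of scalars at continuous (x, y) in grid units."""
    x0 = min(max(int(math.floor(x)), 0), len(grid[0]) - 2)
    y0 = min(max(int(math.floor(y)), 0), len(grid) - 2)
    tx, ty = x - x0, y - y0
    top = (1 - tx) * grid[y0][x0] + tx * grid[y0][x0 + 1]
    bottom = (1 - tx) * grid[y0 + 1][x0] + tx * grid[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom

# Hypothetical grid of local features (these would be learned in practice);
# the large centre value stands in for something like a nearby doorway.
local_features = [
    [0.0, 0.2, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.0],
]

def raw_model(emitter, listener):
    """Stand-in predictor that is deliberately NOT symmetric in its inputs."""
    fe = bilinear(local_features, *emitter)
    fl = bilinear(local_features, *listener)
    dist = math.dist(emitter, listener)
    return (2.0 * fe + fl) / (1.0 + dist)

def reciprocal_model(emitter, listener):
    """Average both orderings so swapping emitter and listener changes nothing."""
    return 0.5 * (raw_model(emitter, listener) + raw_model(listener, emitter))

a, b = (0.5, 0.5), (1.8, 1.2)
print(raw_model(a, b), raw_model(b, a))                # the two orderings differ
print(reciprocal_model(a, b), reciprocal_model(b, a))  # the two orderings match
```

Averaging is only one way to get a symmetric predictor; sorting the two positions into a canonical order before feeding them to the network would be another.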

“If you imagine standing near a doorway, what most strongly affects what you hear is the presence of that doorway, not necessarily geometric features far away from you on the other side of the room. We found this information enables better generalization than a simple fully connected network,” says lead author Andrew Luo, a graduate student at Carnegie Mellon University (CMU).

In addition, they discovered that incorporating the acoustic data their model picks up into a computer vision model can improve the visual reconstruction of the scene.

“When you only have a sparse set of views, using these acoustic features enables you to capture boundaries more sharply, for instance. And maybe this is because to accurately render the acoustics of a scene, you have to capture the underlying 3D geometry of that scene,” Du says.

The researchers plan to keep improving the model so that it can generalize to brand-new scenes. They also want to apply the technique to more complex impulse responses and larger scenes, such as entire buildings or even a town or city.
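In acoustics, a room impulse response describes how a space transforms a brief sound travelling from one emitter position to one listener position. Given such a response, what the listener hears is the original sound convolved with it. The sketch below illustrates that standard relationship with made-up numbers; the signal and the impulse response are hypothetical and do not come from the researchers' data.

```python
def convolve(signal, impulse_response):
    """Discrete convolution: apply a room's impulse response to a dry signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical dry sound: a short click that decays over two samples.
dry = [1.0, 0.5, 0.0, 0.0]

# Hypothetical impulse response for one emitter/listener pair:
# the direct path, then two progressively weaker echoes.
impulse_response = [1.0, 0.0, 0.6, 0.0, 0.3]

heard = convolve(dry, impulse_response)
print(heard)  # the click followed by delayed, quieter copies of itself
```

A model that can predict the impulse response for any pair of positions can therefore render any sound at any listener location by running this one operation.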

“This new technique might open up new opportunities to create a multimodal immersive experience in the metaverse application,” adds co-author Chuang Gan of the MIT-IBM Watson AI Lab.
