Code related to "Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications"

doi:10.4121/36b9065e-278e-40ee-b359-6cd734561f86.v1
The DOI above refers to this specific version of the dataset, which is currently the latest. Newer versions may be published in the future. For a link that always points to the latest version, please use:
doi: 10.4121/36b9065e-278e-40ee-b359-6cd734561f86
Datacite citation style:
de Groot, Dimme (2025): Code related to "Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications". Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/36b9065e-278e-40ee-b359-6cd734561f86.v1
Software

This repository contains MATLAB code implementing the loudspeaker spotformer proposed in "Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications". See also https://github.com/D1mme/LoudspeakerBeamformingForVoiceDrivenApplications. The article has been accepted but not yet published; its DOI will be added once available. Note that some parts of the data and code in this repository are not my own and are published under different but permissive licenses. For the applicable licenses, please refer to the corresponding directories.


Abstract:

In this paper, we propose a robust loudspeaker beamforming algorithm which is used to enhance the performance of voice driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g. when music is playing loudly. The loudspeaker beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the device that implements automatic speech recognition for a voice driven application (VDA). The algorithm utilises a distortion measure based on human auditory perception to limit the distortion perceived by human listeners. Simulations and real-world experiments show that the proposed loudspeaker beamformer improves the speech recognition performance in all tested scenarios. Moreover, the algorithm allows the acoustic energy around the VDA device to be reduced further, at the expense of reduced objective audio quality at the listener’s location.
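
As a rough illustration of the trade-off described in the abstract, the sketch below lowers the sound pressure at a single point while penalising changes to the playback signals. It is not the algorithm from the paper and not the MATLAB code in this repository. It makes several assumptions: a plain squared-error penalty stands in for the perceptual distortion measure, only one frequency bin is considered, and the names `quiet_zone_weights`, `pressure`, `h`, `w0` and `mu` are hypothetical. It is written in Python with only the standard library so that it runs without any toolboxes.

```python
# Illustrative sketch only: NOT the paper's algorithm or the repository code.
# Assumptions: one frequency bin, and a squared-error penalty in place of
# the perceptual distortion measure used in the paper.

def pressure(h, w):
    """Complex sound pressure at the VDA microphone: sum_k h[k] * w[k]."""
    return sum(hk * wk for hk, wk in zip(h, w))


def quiet_zone_weights(h, w0, mu):
    """Minimise |pressure(h, w)|^2 + mu * ||w - w0||^2 over the weights w.

    h  : hypothetical transfer functions, loudspeaker k -> VDA microphone
    w0 : original (unmodified) loudspeaker weights
    mu : positive penalty weight; a larger mu keeps w closer to w0
    """
    # Closed-form minimiser (rank-one update, Sherman-Morrison identity):
    #   w = w0 - conj(h) * pressure(h, w0) / (mu + ||h||^2)
    p0 = pressure(h, w0)
    gain = sum(abs(hk) ** 2 for hk in h)
    return [wk - hk.conjugate() * p0 / (mu + gain) for hk, wk in zip(h, w0)]


if __name__ == "__main__":
    h = [1 + 0j, 0.5j]       # made-up transfer functions for two loudspeakers
    w0 = [1 + 0j, 1 + 0j]    # both loudspeakers play the original signal
    for mu in (10.0, 1.0, 0.1):
        w = quiet_zone_weights(h, w0, mu)
        change = sum(abs(a - b) ** 2 for a, b in zip(w, w0))
        print(mu, abs(pressure(h, w)), change)
```

In this simplified model the residual pressure equals the original pressure scaled by `mu / (mu + ||h||^2)`, so lowering `mu` reduces the energy at the device further while moving the playback signals further from the originals. This mirrors the trade-off stated in the last sentence of the abstract, although the paper measures the cost with a perceptual distortion measure rather than a squared error.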

history
  • 2025-01-02 first online, published, posted
publisher
4TU.ResearchData
format
MATLAB scripts (.m, .mexa64), audio files (.wav)
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems

DATA

To access the source code, use the following command:

git clone https://data.4tu.nl/v3/datasets/fbc606b8-548e-40ca-ab39-50f5145c3db3.git

Or download the latest commit as a ZIP.