RU2022105152A

RU2022105152A - ACOUSTIC ECHO CANCELLATION CONTROL FOR DISTRIBUTED AUDIO DEVICES

Info

Publication number: RU2022105152A
Application number: RU2022105152A
Authority: RU
Inventors: Glenn N. Dickins; Christopher Graham Hines; David Gunawan; Richard J. Cartwright; Alan J. Seefeldt; Daniel Arteaga; Mark R. P. Thomas; Joshua B. Lando
Original assignee: Dolby Laboratories Licensing Corporation; Dolby International AB
Priority date: 2019-07-30
Filing date: 2020-07-29
Publication date: 2023-08-28

Claims

1. A method for managing an audio session, comprising:

receiving output signals from each microphone of a plurality of microphones in an audio environment, each microphone of the plurality of microphones being located at a microphone location in the audio environment, the output signals including signals corresponding to a current speech fragment spoken by a person;

determining, based on the output signals, one or more aspects of context information relating to the person, the context information comprising at least one of an estimated current location of the person or an estimated current proximity of the person to one or more microphone locations;

determining a nearest speaker-equipped audio device, which is the speaker-equipped audio device closest to the microphone location that is closest to the estimated current location of the person;

selecting two or more audio devices based at least in part on the one or more aspects of context information, wherein each of the two or more audio devices includes at least one speaker, and wherein the two or more audio devices include the nearest speaker-equipped audio device;

determining one or more types of audio processing changes to be applied to audio data rendered into speaker signals for the two or more audio devices, wherein the audio processing changes result in an increase in a speech-to-echo ratio at the microphone closest to the estimated current location of the person, wherein the echo contains at least some portion of the audio data output by the two or more audio devices, wherein at least one of the audio processing changes for the nearest speaker-equipped audio device is different from an audio processing change for a second audio device of the two or more audio devices, and wherein the one or more types of audio processing changes cause a reduction in speaker reproduction level for the nearest speaker-equipped audio device; and

causing the one or more types of audio processing changes to be applied.
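
As a reading aid for claim 1, the selection and level-reduction steps can be sketched as follows. This is only a minimal illustration under stated assumptions: the room layout, the device and microphone names, the 2-D coordinates, and the gain values (`max_cut_db`, `other_cut_db`) are all hypothetical and do not come from the claims, and distance ranking is just one possible way to select the devices.

```python
import math

# Hypothetical layout in metres; names and positions are illustrative assumptions.
mic_locations = {"mic_kitchen": (0.0, 0.0), "mic_sofa": (4.0, 1.0), "mic_desk": (6.0, 5.0)}
speaker_devices = {"spk_a": (0.5, 0.5), "spk_b": (4.5, 1.5), "spk_c": (6.0, 4.0)}

def nearest(point, candidates):
    """Return the key in `candidates` whose (x, y) position is closest to `point`."""
    return min(candidates, key=lambda name: math.dist(point, candidates[name]))

def plan_level_changes(person_xy, n_devices=2, max_cut_db=-12.0, other_cut_db=-3.0):
    """Select speaker devices and assign a per-device gain change in dB."""
    # Microphone location closest to the estimated location of the person.
    mic_xy = mic_locations[nearest(person_xy, mic_locations)]
    # Rank speaker-equipped devices by distance to that microphone location.
    ranked = sorted(speaker_devices, key=lambda d: math.dist(mic_xy, speaker_devices[d]))
    selected = ranked[:n_devices]
    # The nearest device receives a different (here: deeper) cut than the others.
    return {dev: (max_cut_db if i == 0 else other_cut_db) for i, dev in enumerate(selected)}

plan = plan_level_changes((3.8, 1.2))
```

In this sketch the nearest device gets a larger level reduction than the second selected device, which mirrors the requirement that the change for the nearest device differ from the change for a second device.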

2. The method of claim 1, wherein one or more types of audio processing changes include spectral modification.

3. A method for managing an audio session, comprising:

selecting two or more audio devices based at least in part on one or more aspects of context information, wherein each of the two or more audio devices includes at least one speaker;

determining one or more types of audio processing changes to be applied to audio data rendered into speaker signals for the two or more audio devices, wherein the audio processing changes result in an increase in a speech-to-echo ratio at one or more microphones of a plurality of microphones, and wherein the one or more types of audio processing changes include spectral modification; and

causing one or more types of audio processing changes to be applied.

4. The method of claim 3, wherein at least one of the audio processing changes for the first audio device is different from the audio processing change for the second audio device.

5. The method of any one of claims 1-4, wherein the one or more types of audio processing changes cause a reduction in speaker reproduction level for the speakers of the two or more audio devices.

6. The method of any one of claims 1-5, wherein selecting the two or more audio devices includes selecting N speaker-equipped audio devices, where N is an integer greater than 2.

7. The method of any one of claims 1-6, wherein selecting the two or more audio devices is based at least in part on an estimated current location of the person relative to at least one of a microphone location or a speaker-equipped audio device location.

8. The method of claim 7 when dependent on claim 3, further comprising determining a nearest speaker-equipped audio device that is closest to the estimated current location of the person or to the microphone location closest to the estimated current location of the person, wherein the two or more audio devices include the nearest speaker-equipped audio device.

9. The method of any one of claims 1-8, wherein the one or more types of audio processing changes include changing a rendering process to warp the rendering of the audio signals away from the estimated current location of the person.

10. The method of any one of claims 2-9, wherein the spectral modification includes reducing the level of audio data in the frequency band from 500 Hz to 3 kHz.
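
The band-limited level reduction of claim 10 can be illustrated with a simple per-bin gain mask. This is a sketch under assumptions: the bin centre frequencies and the 10 dB depth of the cut are hypothetical, since the claim specifies only the 500 Hz to 3 kHz band and not the amount of reduction or the filter design.

```python
def band_cut_gains(freqs_hz, low_hz=500.0, high_hz=3000.0, cut_db=-10.0):
    """Linear gain per frequency bin: attenuate bins inside [low_hz, high_hz]."""
    cut = 10.0 ** (cut_db / 20.0)  # convert dB to a linear amplitude factor
    return [cut if low_hz <= f <= high_hz else 1.0 for f in freqs_hz]

# Hypothetical bin centre frequencies in Hz.
freqs = [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
gains = band_cut_gains(freqs)
```

Multiplying each bin of a playback magnitude spectrum by the matching entry of `gains` lowers the playback energy in the band where speech energy is concentrated and leaves the other bins untouched.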

11. The method of any one of claims 1-10, wherein the one or more types of audio processing changes include inserting at least one gap into at least one selected frequency band of an audio playback signal.
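
One way to picture the gap insertion of claim 11 is as a short mute applied to one band of a frames-by-bands gain grid. The grid size, the chosen band, and the gap duration below are hypothetical; the claim does not specify how a gap is realized.

```python
def insert_gap(band_gains, band_index, start_frame, n_frames):
    """Return a copy of a frames x bands gain grid with one band muted for n_frames."""
    out = [row[:] for row in band_gains]  # copy so the input grid is not modified
    for t in range(start_frame, min(start_frame + n_frames, len(out))):
        out[t][band_index] = 0.0
    return out

grid = [[1.0] * 4 for _ in range(6)]  # 6 frames, 4 bands, unity gain everywhere
gapped = insert_gap(grid, band_index=2, start_frame=1, n_frames=3)
```

During the muted frames the selected band carries no playback energy, so a microphone signal in that band contains no echo from this device for the duration of the gap.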

12. The method of any one of claims 1-11, wherein the one or more types of audio processing changes include dynamic range compression.
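
For claim 12, a static compressor curve gives a compact picture of dynamic range compression. The threshold and ratio values are assumptions chosen for illustration; the claim does not name a compressor design or its parameters.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: above the threshold, output rises at 1/ratio of the input rate."""
    if level_db <= threshold_db:
        return level_db  # below threshold the level passes unchanged
    return threshold_db + (level_db - threshold_db) / ratio

loud_out = compress_db(-8.0)    # 12 dB over threshold, compressed to 3 dB over
quiet_out = compress_db(-30.0)  # below threshold, unchanged
```

Applied to playback, a curve like this lowers the loudest passages, which reduces the peak echo level at the microphones while quieter content is left as it was.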

13. The method of any one of claims 1-12, wherein selecting the two or more audio devices is based at least in part on an estimate of a signal-to-echo ratio for one or more microphone locations.

14. The method of claim 13, wherein selecting the two or more audio devices is based at least in part on determining whether the signal-to-echo ratio estimate is less than or equal to a signal-to-echo ratio threshold.
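
The threshold test of claims 13 and 14 can be sketched as a ratio of power estimates expressed in decibels. How the speech and echo powers are estimated is outside this sketch, and the 0 dB default threshold is a hypothetical value, not one given in the claims.

```python
import math

def ser_db(speech_power, echo_power):
    """Signal-to-echo ratio in dB from two power estimates (assumed positive)."""
    return 10.0 * math.log10(speech_power / echo_power)

def needs_processing(speech_power, echo_power, threshold_db=0.0):
    """True when the estimated ratio is less than or equal to the threshold."""
    return ser_db(speech_power, echo_power) <= threshold_db

echo_dominated = needs_processing(1.0, 10.0)  # echo ten times stronger than speech
speech_dominated = needs_processing(10.0, 1.0)
```

A device whose echo pushes the estimate at a microphone location to or below the threshold would then be a candidate for selection.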

15. The method of claim 13, wherein the determination of one or more types of audio processing changes is based on optimization of a cost function that is at least in part based on an estimate of the signal-to-echo ratio.

16. The method of claim 15, wherein the cost function is at least partially based on the rendering performance.
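
Claims 15 and 16 describe a cost function that depends on the signal-to-echo ratio estimate and on rendering performance. The sketch below is one hypothetical form of such a cost, minimized by a grid search. The target ratio, the weights, and the simplifying assumption that each decibel of playback cut raises the ratio by one decibel are illustrative only and are not taken from the claims.

```python
def cost(cut_db, ser_db, target_ser_db=6.0, w_ser=1.0, w_render=0.3):
    """Hypothetical cost: remaining SER shortfall plus a penalty for cutting playback."""
    # Assumption: a cut of cut_db decibels raises the SER by the same amount.
    shortfall = max(0.0, target_ser_db - (ser_db + cut_db))
    return w_ser * shortfall + w_render * cut_db

def best_cut_db(ser_db, max_cut_db=20):
    """Grid search over whole-decibel cuts for the lowest cost."""
    return min(range(0, max_cut_db + 1), key=lambda c: cost(float(c), ser_db))

cut_when_low = best_cut_db(-4.0)  # SER well below target
cut_when_high = best_cut_db(8.0)  # SER already above target
```

The rendering penalty stops the search from cutting more than is needed to reach the target, which is the trade-off between echo reduction and playback quality that these two claims point to.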

17. The method of any one of claims 1-16, wherein selecting the two or more audio devices is based at least in part on a proximity estimate.

18. The method of any one of claims 1-17, further comprising:

determining a plurality of current acoustic features from the output signals of each microphone;

applying a classifier to the plurality of current acoustic features, wherein applying the classifier includes applying a model trained on previously determined acoustic features derived from a plurality of previous speech fragments spoken by the person in a plurality of user zones in the environment; and

wherein determining the one or more aspects of context information relating to the person includes determining, based at least in part on output from the classifier, an estimate of the user zone in which the person is currently located.
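
The zone estimation of claim 18 can be illustrated with a nearest-centroid classifier. This technique is chosen here only for brevity; the claim does not name a classifier type. The zone labels and feature values are hypothetical, with each vector standing for per-microphone features (for example, relative levels) measured during a speech fragment.

```python
import math

def train_centroids(labelled_features):
    """Average the feature vectors previously recorded in each user zone."""
    centroids = {}
    for zone, vectors in labelled_features.items():
        n = len(vectors)
        centroids[zone] = [sum(col) / n for col in zip(*vectors)]
    return centroids

def classify_zone(features, centroids):
    """Return the zone whose centroid is closest to the current feature vector."""
    return min(centroids, key=lambda z: math.dist(features, centroids[z]))

# Hypothetical training data: features from previous speech fragments, by zone.
training = {
    "sofa": [[0.9, 0.3, 0.1], [0.8, 0.4, 0.2]],
    "kitchen": [[0.2, 0.9, 0.3], [0.1, 0.8, 0.4]],
}
centroids = train_centroids(training)
zone = classify_zone([0.85, 0.35, 0.15], centroids)
```

The model in this sketch uses only feature vectors and zone labels; no microphone coordinates enter the training or the classification.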

19. The method of claim 18, wherein the estimate of the user zone is determined without reference to the geometric locations of the plurality of microphones.

20. The method of claim 18 or 19, wherein the current speech fragment and the previous speech fragments include speech fragments containing a wake word.

21. The method of any one of claims 1-20, further comprising selecting at least one microphone according to the one or more aspects of context information.

22. The method of any one of claims 1-21, wherein the one or more microphones are located in a plurality of audio devices of the audio environment.

23. The method of any one of claims 1-22, wherein the one or more microphones are located in a single audio device of the audio environment.

24. The method of any one of claims 1-23, wherein at least one of the one or more microphone locations corresponds to a plurality of microphones of a single audio device.

25. An apparatus configured to perform the method of any one of claims 1-24.

26. A system configured to perform the method of any one of claims 1-24.

27. One or more non-transitory storage media having software stored thereon, wherein the software includes instructions for controlling one or more devices to perform the method of any one of claims 1-24.