WO2021212905A1 - Audio processing method, device, electronic equipment, and storage medium - Google Patents

Audio processing method, device, electronic equipment, and storage medium

Info

Publication number
WO2021212905A1
Authority
WO
WIPO (PCT)
Prior art keywords
echo
audio data
audio
data
configuration parameter
Prior art date
Application number
PCT/CN2020/140641
Other languages
English (en)
French (fr)
Inventor
唐杰
张洋
陈彦宇
马雅奇
叶盛世
Original Assignee
珠海格力电器股份有限公司
珠海联云科技有限公司
Application filed by 珠海格力电器股份有限公司, 珠海联云科技有限公司
Publication of WO2021212905A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Definitions

  • the present disclosure relates to the technical field of sound signal processing, and in particular to an audio processing method, device, electronic equipment, and storage medium.
  • the electronic device can realize the cooperative work of the microphone and the speaker, that is, the electronic device can collect the sound of the user through the microphone while playing audio through the speaker. Since sound waves have the characteristics of reflection and propagation, the audio data collected by the microphone not only includes the user's audio data, but also includes the echo data of the audio played by the speaker.
  • the electronic device may collect the audio played by the speaker through the sampling loop to obtain the first audio data. Then, the electronic device can perform echo cancellation processing on the mixed audio data and the first audio data collected by the microphone to obtain the second audio data. After that, the electronic device may use the second audio data as target audio data containing only the user's voice.
  • the echo data of the audio played by the speaker includes direct echo data and indirect echo data.
  • the direct echo data refers to the audio data collected by the microphone when the sound waves of the audio played by the speaker reach the microphone without being reflected.
  • the indirect echo data refers to the audio data collected by the microphone only after the sound waves of the audio played by the speaker have been reflected multiple times in the current scene.
  • the first audio data acquired by the electronic device only contains direct echo data. Therefore, the above-mentioned echo cancellation processing can only remove the influence of the direct echo data, and cannot eliminate the indirect echo data in the mixed audio data, resulting in poor echo cancellation effect.
  • the purpose of the embodiments of the present disclosure is to provide an audio processing method, device, electronic device, and storage medium to solve the problem of poor echo cancellation effect.
  • the specific technical solutions are as follows:
  • an audio processing method includes:
  • the simulating and calculating the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data to obtain the simulated echo data includes:
  • the echo data generated by the sound waves of the played audio propagating in the current scene are simulated and calculated to obtain simulated echo data.
  • the method before determining the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between the scene and the configuration parameter, the method further includes:
  • if the volume decibel is greater than the preset volume threshold, determining that the current scene is the first scene
  • if the volume decibel is not greater than the preset volume threshold, determining that the current scene is the second scene.
  • the corresponding relationship between the scenario and the configuration parameter includes:
  • the configuration parameter corresponding to the first scenario is the first configuration parameter
  • the configuration parameter corresponding to the second scenario is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
  • the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter; the configuration parameter includes at least one of the number of iterations and the expected value.
  • the echo data includes direct echo data and indirect echo data.
  • in a second aspect, an audio processing device is provided, and the device includes:
  • the first acquiring module is configured to acquire the first audio data of the played audio
  • the calculation module is configured to simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data, to obtain simulated echo data;
  • the collection module is configured to collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by sound waves of the played audio propagating in the current scene;
  • the echo cancellation module is configured to perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain the second audio data.
  • the calculation module includes:
  • the determining sub-module is configured to determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters;
  • the setting sub-module is configured to set the configuration parameters in the echo simulation algorithm according to the target configuration parameters
  • the calculation sub-module is configured to simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data and the echo simulation algorithm set with the target configuration parameters to obtain simulated echo data.
  • the device further includes:
  • the second acquiring module is configured to acquire the volume decibel of the collected mixed audio data
  • a determining module configured to determine that the current scene is the first scene when the volume decibel is greater than a preset volume threshold
  • the determining module is further configured to determine that the current scene is the second scene when the volume decibel is not greater than a preset volume threshold.
  • the corresponding relationship between the scenario and the configuration parameter includes:
  • the configuration parameter corresponding to the first scenario is the first configuration parameter
  • the configuration parameter corresponding to the second scenario is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
  • the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter; the configuration parameter includes at least one of the number of iterations and the expected value.
  • the echo data includes direct echo data and indirect echo data.
  • an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
  • the memory is set to store computer programs
  • the processor is configured to implement any of the method steps described in the first aspect when executing the program stored in the memory.
  • a computer-readable storage medium is provided, and a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps in any one of the first aspects are implemented.
  • a computer program product containing instructions which when run on a computer, causes the computer to execute any of the method steps described in the first aspect.
  • the embodiments of the present disclosure provide an audio processing method, device, electronic equipment, and storage medium, which can obtain the first audio data of the played audio and, based on the first audio data, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene to obtain simulated echo data. Then, the audio data in the current scene is collected through the microphone to obtain the mixed audio data.
  • the mixed audio data includes the second audio data and the real echo data generated by the sound waves of the played audio propagating in the current scene. Echo cancellation processing is performed on the mixed audio data based on the simulated echo data to obtain the second audio data.
  • because the simulated echo data is obtained first and the mixed audio data collected by the microphone is then echo-canceled based on the simulated echo data, both the direct echo data and the indirect echo data contained in the mixed audio data can be removed, which improves the effect of echo cancellation.
  • FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the disclosure
  • FIG. 2 is a flowchart of another audio processing method provided by an embodiment of the disclosure.
  • FIG. 3 is a flowchart of another audio processing method provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic structural diagram of an audio processing device provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure.
  • the embodiments of the present disclosure provide an audio processing method, which may be applied to an electronic device, and the electronic device may include an audio collection component and an audio playback component.
  • the electronic device may be a mobile phone, a tablet computer, etc.
  • the audio collection component may be a microphone
  • the audio playback component may be a speaker.
  • the embodiments of the present disclosure describe the specific processing process of the audio processing method by taking a microphone as an example of the audio collection component and a speaker as an example of the audio playback component.
  • the electronic device can realize the cooperative work of the microphone and the speaker. For example, the electronic device can collect the voice instructions issued by the user through the microphone while playing music through the speaker. Or, in the process of the user making a call through the electronic device, the electronic device may play the voice of the far-end user through a speaker, and collect the voice of the near-end user through a microphone.
  • the audio data played by the speaker is called the first audio data; the audio data that the microphone needs to collect is called the second audio data; the audio data actually collected by the audio collection component is called mixed audio data.
  • the mixed audio data includes not only the second audio data, but also the direct echo data and indirect echo data generated by the sound waves of the first audio data propagating in the current scene.
  • the direct echo data and the indirect echo data contained in the mixed audio data can be removed, and the effect of echo cancellation can be improved.
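  • The relationship between the three kinds of audio data can be written as a simple additive signal model. This is only a sketch: the symbols x(n) (first audio data), s(n) (second audio data), d(n) (mixed audio data), the impulse responses h_dir and h_ind, and the assumption that each echo path acts as a linear convolution are notation introduced here for illustration and do not appear in the original disclosure.

```latex
\begin{aligned}
d(n) &= s(n) + e_{\mathrm{dir}}(n) + e_{\mathrm{ind}}(n), \\
e_{\mathrm{dir}}(n) &= (h_{\mathrm{dir}} * x)(n), \qquad
e_{\mathrm{ind}}(n) = (h_{\mathrm{ind}} * x)(n), \\
\hat{s}(n) &= d(n) - \hat{e}(n), \qquad
\hat{e}(n) \approx e_{\mathrm{dir}}(n) + e_{\mathrm{ind}}(n).
\end{aligned}
```

  • Under this model, the recovered signal approaches the second audio data only to the extent that the simulated echo covers both echo terms, which is the motivation stated above.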
  • Step 101 Acquire first audio data of the played audio.
  • the electronic device in the process of playing a certain audio, can use the audio data of the audio as the first audio data of the played audio.
  • the audio played can be a song or the voice of a remote user in a call scene.
  • the electronic device can obtain audio data of a certain audio in a variety of ways.
  • the electronic device can collect the audio played by the speaker through a sampling loop after the audio is played by the speaker to obtain the first audio data.
  • the electronic device may use the original audio data of the audio as the first audio data, and the original audio data is the audio data that is not played by the speaker.
  • the electronic device may use the audio data transmitted to the speaker as the first audio data of the played audio.
  • the collected audio data may contain electronic noise.
  • the audio data transmitted to the speaker is used as the first audio data of the played audio, which can avoid the problem that the audio data collected after the speaker is played may contain electronic noise. Further, it is beneficial to improve the closeness of the simulated echo data obtained by the simulation calculation based on the first audio data to the real echo data.
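  • One way to use the audio data transmitted to the speaker as the first audio data is to copy each frame on its way to the playback driver. The following is a minimal sketch under that assumption; `PlaybackTap` and the `speaker_write` callback are hypothetical names, and a plain list stands in for the real speaker.

```python
from typing import Callable, List, Sequence


class PlaybackTap:
    """Forwards frames to the speaker and keeps a copy as the first audio data."""

    def __init__(self, speaker_write: Callable[[Sequence[float]], None]) -> None:
        # Hypothetical callback that hands a frame to the speaker driver.
        self._speaker_write = speaker_write
        self.first_audio: List[float] = []

    def write(self, frame: Sequence[float]) -> None:
        # Copy the samples before playback, so that noise picked up by a
        # post-speaker sampling loop never enters the reference signal.
        self.first_audio.extend(frame)
        self._speaker_write(frame)


played: List[float] = []  # stands in for the real speaker in this sketch
tap = PlaybackTap(played.extend)
tap.write([0.1, -0.2])
tap.write([0.3])
```

  • After the two writes, `tap.first_audio` holds exactly the samples that were sent to the speaker.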
  • Step 102 Based on the first audio data, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene to obtain simulated echo data.
  • an echo simulation algorithm can be preset in the electronic device.
  • the echo simulation algorithm can be an RLS (Recursive Least Squares) adaptive filter, an LMS (Least Mean Squares) algorithm, an NLMS (Normalized Least Mean Squares) algorithm, etc.
  • the electronic device may perform simulation calculation on the echo data generated by the sound wave of the played audio propagating in the current scene based on the echo simulation algorithm and the first audio data to obtain the simulated echo data.
  • for the specific process by which the electronic device, based on the echo simulation algorithm and the first audio data, simulates and calculates the echo data generated by the sound wave of the played audio propagating in the current scene, reference may be made to the process in the related art of calculating the echo data of given audio data based on an echo simulation algorithm, which is not repeated in this disclosure.
  • the echo data includes direct echo data and indirect echo data.
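  • Step 102 can be illustrated with one of the algorithms named above. The sketch below uses an NLMS adaptive filter written in plain Python; it is not the disclosure's implementation. The function name, the tap count, the step size, the synthetic echo path, and the choice to adapt against microphone samples captured while the user is silent are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def nlms_simulate_echo(
    first_audio: List[float],
    mixed_audio: List[float],
    num_taps: int = 8,
    step_size: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Return (simulated_echo, weights) estimated by an NLMS adaptive filter."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    simulated_echo: List[float] = []
    for x, d in zip(first_audio, mixed_audio):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # simulated echo sample
        simulated_echo.append(y)
        e = d - y  # residual after removing the simulated echo
        norm = eps + sum(h * h for h in history)
        weights = [w + step_size * e * h / norm for w, h in zip(weights, history)]
    return simulated_echo, weights


# Synthetic check: a hypothetical echo path with a direct component (0.6)
# and one delayed reflection (0.3); the user is assumed to be silent.
rng = random.Random(0)
first_audio = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.0, 0.3]
mixed_audio = [
    sum(h * first_audio[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(first_audio))
]
simulated_echo, weights = nlms_simulate_echo(first_audio, mixed_audio)
```

  • In this noise-free setting the learned weights approach the synthetic echo path, so the simulated echo covers both the direct and the reflected component.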
  • Step 103 Collect audio data in the current scene through a microphone to obtain mixed audio data.
  • the mixed audio data includes the second audio data and the real echo data generated by the sound waves of the played audio propagating in the current scene.
  • the mixed audio data collected by the microphone includes not only the voice commands issued by the user, but also real echo data generated by the sound waves of the played audio propagating in the current scene.
  • the real echo data includes the direct echo data generated when the sound wave of the audio played by the speaker directly enters the microphone, and the indirect echo data generated when the sound wave enters the microphone after multiple reflections in the current scene.
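  • The composition of the mixed audio data can be made concrete with a small synthetic example. This is a sketch only: the helper names and the two impulse responses are hypothetical, and modelling each echo path as a causal convolution is an assumption rather than something stated in the disclosure.

```python
from typing import List


def convolve(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Causal convolution, truncated to the length of the input signal."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def mix_microphone_signal(
    user_voice: List[float],
    played_audio: List[float],
    direct_path: List[float],
    indirect_path: List[float],
) -> List[float]:
    """Mixed audio = second audio data + direct echo + indirect echo."""
    direct_echo = convolve(played_audio, direct_path)
    indirect_echo = convolve(played_audio, indirect_path)
    return [s + a + b for s, a, b in zip(user_voice, direct_echo, indirect_echo)]


played_audio = [1.0, 0.0, 0.0, 0.0]   # a single click from the speaker
user_voice = [0.0, 0.0, 0.0, 0.5]     # the user speaks at the last sample
mixed = mix_microphone_signal(
    user_voice,
    played_audio,
    direct_path=[0.5],                # arrives without delay
    indirect_path=[0.0, 0.0, 0.2],    # arrives two samples later, attenuated
)
```

  • Here `mixed` contains the click at sample 0 (direct echo), its reflection at sample 2 (indirect echo), and the user's voice at sample 3.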
  • the embodiment of the present disclosure does not specifically limit the execution order of step 102 and step 103.
  • Step 104 Perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain second audio data.
  • an AEC (Acoustic Echo Cancellation) algorithm may be preset in the electronic device.
  • the AEC algorithm may be an RLS adaptive filter, an LMS algorithm, an NLMS algorithm, etc. If the AEC algorithm has an echo simulation function, you can choose the AEC algorithm as the echo simulation algorithm.
  • the electronic device can perform echo cancellation processing based on the AEC algorithm, the mixed audio data, and the simulated echo data.
  • for the specific processing process, refer to the process in the related art of performing echo cancellation based on an AEC algorithm, mixed audio data, and direct echo data, which is not repeated in this disclosure.
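  • In its simplest form, echo cancellation based on simulated echo data amounts to a sample-wise subtraction. The sketch below shows only that core idea; a practical AEC algorithm does considerably more, and the function name and sample values are hypothetical.

```python
from typing import List


def cancel_echo(mixed_audio: List[float], simulated_echo: List[float]) -> List[float]:
    """Subtract the simulated echo from the mixed audio, sample by sample."""
    if len(mixed_audio) != len(simulated_echo):
        raise ValueError("mixed audio and simulated echo must have equal length")
    return [d - e for d, e in zip(mixed_audio, simulated_echo)]


mixed_audio = [0.5, 0.0, 0.2, 0.5]     # user voice plus direct and indirect echo
simulated_echo = [0.5, 0.0, 0.2, 0.0]  # echo estimate covering both components
second_audio = cancel_echo(mixed_audio, simulated_echo)
```

  • With a simulated echo that matches both echo components, only the user's sample at index 3 remains in `second_audio`.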
  • the embodiments of the present disclosure provide an audio processing method, which can obtain the first audio data of the played audio; based on the first audio data, simulate and calculate the echo data generated by the sound wave of the played audio propagating in the current scene to obtain simulated echo data; then collect the audio data in the current scene through the microphone to obtain mixed audio data, where the mixed audio data includes the second audio data and the real echo data generated by the sound waves of the played audio propagating in the current scene; and perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain the second audio data.
  • because the simulated echo data is obtained first and the mixed audio data collected by the microphone is then echo-canceled based on the simulated echo data, both the direct echo data and the indirect echo data contained in the mixed audio data can be removed, which improves the effect of echo cancellation.
  • the user may be in different scenes, and the requirements for the echo cancellation effect in different scenes are also different.
  • for example, when the user is in a speech recognition scene, semantic analysis needs to be performed based on the second audio data, so the real echo data in the mixed audio data must be removed as much as possible.
  • the requirements for the echo cancellation effect can be lower compared to the voice recognition scene.
  • the electronic device may pre-store the corresponding relationship between the scene and the configuration parameter, and the configuration parameter may be at least one of the number of iterations and the desired value.
  • the electronic device can simulate and calculate different simulated echo data based on the corresponding configuration parameters, so as to achieve different echo cancellation effects.
  • the specific processing process can include:
  • Step 201 Determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between the scene and the configuration parameter.
  • the electronic device can determine the current scene, and then the electronic device can determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between the scene and the configuration parameters.
  • the electronic device can determine the current scene in a variety of ways.
  • the electronic device may determine the current scene according to the received control instruction. For example, if the electronic device receives a voice control instruction, the electronic device can determine that the current scene is a voice recognition scene; if the electronic device receives a call control instruction, the electronic device can determine that the current scene is a call scene.
  • the electronic device can determine the current scene according to the volume of the collected mixed audio data, and the specific processing process will be described in detail later.
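  • The first way of determining the scene, from the received control instruction, is essentially a lookup. A minimal sketch follows; the instruction strings and the `Scene` enumeration are hypothetical names chosen for illustration, while the two scenes themselves come from the example above.

```python
from enum import Enum


class Scene(Enum):
    VOICE_RECOGNITION = "voice_recognition"
    CALL = "call"


# Hypothetical instruction identifiers; the disclosure does not name them.
INSTRUCTION_TO_SCENE = {
    "voice_control": Scene.VOICE_RECOGNITION,
    "call_control": Scene.CALL,
}


def scene_from_instruction(instruction_type: str) -> Scene:
    """Map a received control instruction to the current scene."""
    try:
        return INSTRUCTION_TO_SCENE[instruction_type]
    except KeyError:
        raise ValueError(f"unknown control instruction: {instruction_type}") from None


current_scene = scene_from_instruction("call_control")
```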
  • Step 202 Set configuration parameters in the echo simulation algorithm according to the target configuration parameters.
  • the electronic device may set the parameter value of the configuration parameter in the echo simulation algorithm to the determined parameter value of the target configuration parameter.
  • for example, if the current scene is a call scene, the target configuration parameters corresponding to the current scene include: the number of iterations is 2, and the expected value is 0.8. The electronic device can then set the number of iterations in the echo simulation algorithm to 2 and the expected value to 0.8.
  • Step 203 Based on the first audio data and the echo simulation algorithm set with target configuration parameters, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene to obtain simulated echo data.
  • for the processing procedure of this step, refer to the processing procedure of step 102, which is not repeated here.
  • the electronic device may determine the target configuration parameter corresponding to the current scene according to the corresponding relationship between the pre-stored scene and the configuration parameter. Then, the electronic device can set the configuration parameters in the echo simulation algorithm according to the target configuration parameters. After that, the electronic device can simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data and the echo simulation algorithm set with the target configuration parameters to obtain the simulated echo data.
  • the simulated echo data is simulated and calculated based on the first audio data and the echo simulation algorithm set with the target configuration parameters. Therefore, different simulated echo data can be determined for different scenes, thereby achieving different echo cancellation effects. In scenes with higher requirements for echo cancellation effects, the requirements for echo cancellation can be met; in scenes with lower requirements for echo cancellation effects, the processing speed of echo cancellation can be increased.
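  • Steps 201 and 202 can be sketched as a table lookup followed by a parameter assignment. The class and function names below are hypothetical. The call-scene values (2 iterations, expected value 0.8) come from the example above; the voice-recognition values are an assumption added only so the table has a second entry.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfigParams:
    iterations: int
    expected_value: float


@dataclass
class EchoSimulationAlgorithm:
    """Holds the configuration parameters of a (hypothetical) echo simulator."""
    iterations: int = 1
    expected_value: float = 1.0

    def set_params(self, params: ConfigParams) -> None:
        self.iterations = params.iterations
        self.expected_value = params.expected_value


# Pre-stored correspondence between scenes and configuration parameters.
SCENE_TO_PARAMS: Dict[str, ConfigParams] = {
    "call": ConfigParams(iterations=2, expected_value=0.8),               # from the text
    "voice_recognition": ConfigParams(iterations=3, expected_value=1.0),  # assumed values
}


def configure_for_scene(algorithm: EchoSimulationAlgorithm, scene: str) -> ConfigParams:
    target = SCENE_TO_PARAMS[scene]  # Step 201: determine target configuration parameters
    algorithm.set_params(target)     # Step 202: set them in the echo simulation algorithm
    return target


algorithm = EchoSimulationAlgorithm()
target_params = configure_for_scene(algorithm, "call")
```

  • Step 203 would then run the configured algorithm on the first audio data.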
  • a preset volume threshold may be preset in the electronic device, and the preset volume threshold may be 50 dB.
  • the electronic device can determine the current scene based on the preset volume threshold and the volume of the mixed audio data, as shown in FIG. 3, including the following steps:
  • Step 301 Acquire the volume of the collected mixed audio data in decibels.
  • the electronic device can detect the volume of the collected mixed audio data in decibels while collecting the mixed audio data.
  • Step 302 Determine whether the volume decibel is greater than a preset volume threshold.
  • the electronic device can determine whether the volume decibel is greater than a preset volume threshold. If the volume decibel is greater than the preset volume threshold, the electronic device may perform step 303; if the volume decibel is not greater than the preset volume threshold, the electronic device may perform step 304.
  • Step 303 Determine that the current scene is the first scene.
  • Step 304 Determine that the current scene is the second scene.
  • the electronic device can obtain the volume decibel of the collected mixed audio data and then determine whether the volume decibel is greater than the preset volume threshold. When the volume decibel is greater than the preset volume threshold, the current scene is determined to be the first scene; when the volume decibel is not greater than the preset volume threshold, the current scene is determined to be the second scene. In this way, the current scene can be judged based on the volume level. This makes it convenient to subsequently simulate and calculate the simulated echo data based on the target configuration parameters corresponding to the current scene, and to perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain second audio data that meets the echo cancellation requirements of the current scene.
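  • Steps 301 to 304 can be sketched as follows. The 50 dB threshold comes from the text; how the volume decibel is computed is not specified, so the RMS-based formula and the `reference` level (the amplitude treated as 0 dB) are assumptions, as are the function names.

```python
import math
from typing import Sequence

PRESET_VOLUME_THRESHOLD_DB = 50.0  # example threshold given in the text


def volume_decibels(samples: Sequence[float], reference: float = 1.0) -> float:
    """Step 301: volume of the mixed audio data in dB relative to `reference`."""
    if not samples:
        raise ValueError("no samples collected")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / reference)


def determine_scene(volume_db: float,
                    threshold_db: float = PRESET_VOLUME_THRESHOLD_DB) -> str:
    """Steps 302-304: first scene if strictly louder than the threshold."""
    return "first" if volume_db > threshold_db else "second"


loud_db = volume_decibels([1000.0, -1000.0, 1000.0, -1000.0])  # about 60 dB
quiet_db = volume_decibels([10.0, -10.0])                      # about 20 dB
loud_scene = determine_scene(loud_db)
quiet_scene = determine_scene(quiet_db)
```

  • A volume exactly equal to the threshold is "not greater than" it and therefore maps to the second scene.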
  • the configuration parameter corresponding to the first scenario is the first configuration parameter
  • the configuration parameter corresponding to the second scenario is the second configuration parameter.
  • the second configuration parameter is smaller than the first configuration parameter.
  • the first scene may indicate a scene with higher requirements for echo cancellation effects
  • the second scene may indicate a scene with lower requirements for echo cancellation effects
  • the first scene is a public place scene
  • the first configuration parameter corresponding to the first scene is: the number of iterations is 3, and the expected value is 1.
  • the second scene is a bedroom scene, and the second configuration parameters corresponding to the second scene are: the number of iterations is 2, and the expected value is 0.8.
  • the closeness of the simulated echo data obtained by simulation calculation based on the first audio data to the real echo data can be improved. Therefore, the second configuration parameter can be set for the second scene, which has lower requirements for the echo cancellation effect, and the first configuration parameter, which is larger than the second configuration parameter, can be set for the first scene, which has higher requirements for the echo cancellation effect. As a result, different echo cancellation effects can be achieved for different scenes. In scenes with higher requirements for echo cancellation effects, the requirements for echo cancellation can be met; in scenes with lower requirements for echo cancellation effects, the processing speed of echo cancellation can be increased.
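  • The trade-off between accuracy and processing speed can be illustrated with a small experiment. This sketch rests on an assumption: it interprets the "number of iterations" as the number of passes an NLMS adaptive filter makes over the same frame, which the disclosure does not state. The expected value is not modelled, and the echo path, frame length, and function names are hypothetical.

```python
import random
from typing import List


def nlms_passes(reference: List[float], target: List[float], num_taps: int,
                passes: int, step_size: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Run `passes` NLMS sweeps over one frame and return the filter weights."""
    weights = [0.0] * num_taps
    for _ in range(passes):
        history = [0.0] * num_taps  # restart the delay line for each sweep
        for x, d in zip(reference, target):
            history = [x] + history[:-1]
            e = d - sum(w * h for w, h in zip(weights, history))
            norm = eps + sum(h * h for h in history)
            weights = [w + step_size * e * h / norm for w, h in zip(weights, history)]
    return weights


def misalignment(weights: List[float], true_path: List[float]) -> float:
    """Squared distance between the estimated and the true echo path."""
    return sum((w - t) ** 2 for w, t in zip(weights, true_path))


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(40)]
true_path = [0.6, 0.0, 0.3, 0.0]  # hypothetical direct path plus one reflection

# Echo-only target, generated with the same zero-initialised delay line.
target, history = [], [0.0] * len(true_path)
for x in reference:
    history = [x] + history[:-1]
    target.append(sum(t * h for t, h in zip(true_path, history)))

err_first_scene = misalignment(nlms_passes(reference, target, 4, passes=3), true_path)
err_second_scene = misalignment(nlms_passes(reference, target, 4, passes=2), true_path)
```

  • In this noise-free setting each NLMS update cannot increase the misalignment, so three passes (first scene) end at least as close to the true echo path as two passes (second scene), at the cost of roughly 50% more computation.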
  • the embodiments of the present disclosure also provide an audio processing device. As shown in FIG. 4, the device includes:
  • the first acquiring module 410 is configured to acquire the first audio data of the played audio
  • the calculation module 420 is configured to simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data, to obtain simulated echo data;
  • the collection module 430 is configured to collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
  • the echo cancellation module 440 is configured to perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain the second audio data.
  • the calculation module includes:
  • the determining sub-module is configured to determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters;
  • the setting sub-module is configured to set the configuration parameters in the echo simulation algorithm according to the target configuration parameters
  • the calculation sub-module is configured to simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data and the echo simulation algorithm set with the target configuration parameters to obtain simulated echo data.
  • the device further includes:
  • the second acquiring module is configured to acquire the volume decibel of the collected mixed audio data
  • a determining module configured to determine that the current scene is the first scene when the volume decibel is greater than a preset volume threshold
  • the determining module is further configured to determine that the current scene is the second scene when the volume decibel is not greater than a preset volume threshold.
  • the corresponding relationship between the scenario and the configuration parameter includes:
  • the configuration parameter corresponding to the first scenario is the first configuration parameter
  • the configuration parameter corresponding to the second scenario is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
  • the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter; the configuration parameter includes at least one of the number of iterations and the expected value.
  • the echo data includes direct echo data and indirect echo data.
  • the embodiment of the present disclosure provides an audio processing device that can obtain the first audio data of the played audio; based on the first audio data, simulate and calculate the echo data generated by the sound wave of the played audio propagating in the current scene to obtain simulated echo data; then collect the audio data in the current scene through the microphone to obtain mixed audio data, where the mixed audio data includes the second audio data and the real echo data generated by the sound waves of the played audio propagating in the current scene; and perform echo cancellation processing on the mixed audio data based on the simulated echo data to obtain the second audio data.
  • because the simulated echo data is obtained first and the mixed audio data collected by the microphone is then echo-canceled based on the simulated echo data, both the direct echo data and the indirect echo data contained in the mixed audio data can be removed, which improves the effect of echo cancellation.
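  • The module structure of the device can be sketched as a class with one method per module. This is an illustrative arrangement, not the disclosed implementation: the class name, the method names, and the injected `simulate_echo` callable (here a fixed 0.5 gain standing in for a real echo simulation algorithm) are assumptions.

```python
from typing import Callable, List, Sequence


class AudioProcessingDevice:
    """One method per module: acquire (410), calculate (420), collect (430), cancel (440)."""

    def __init__(self, simulate_echo: Callable[[Sequence[float]], List[float]]) -> None:
        self._simulate_echo = simulate_echo  # hypothetical echo simulation algorithm

    def acquire_first_audio(self, playback_buffer: Sequence[float]) -> List[float]:
        return list(playback_buffer)

    def calculate(self, first_audio: Sequence[float]) -> List[float]:
        return self._simulate_echo(first_audio)

    def collect(self, microphone_samples: Sequence[float]) -> List[float]:
        return list(microphone_samples)

    def cancel(self, mixed: Sequence[float], simulated: Sequence[float]) -> List[float]:
        return [d - e for d, e in zip(mixed, simulated)]

    def process(self, playback_buffer: Sequence[float],
                microphone_samples: Sequence[float]) -> List[float]:
        first_audio = self.acquire_first_audio(playback_buffer)
        simulated_echo = self.calculate(first_audio)
        mixed_audio = self.collect(microphone_samples)
        return self.cancel(mixed_audio, simulated_echo)


device = AudioProcessingDevice(lambda audio: [0.5 * v for v in audio])
second_audio = device.process(
    playback_buffer=[1.0, 0.0, 2.0],
    microphone_samples=[0.5, 0.25, 1.0],  # echo plus a user sample at index 1
)
```

  • Injecting the echo simulation as a callable keeps the calculation module replaceable, for example by an RLS, LMS, or NLMS implementation.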
  • the embodiment of the present disclosure also provides an electronic device, as shown in FIG.
  • the memories 503 communicate with each other through the communication bus 504,
  • the memory 503 is set to store computer programs
  • the simulating and calculating the echo data generated by the sound waves of the played audio propagating in the current scene based on the first audio data to obtain the simulated echo data includes:
  • the echo data generated by the sound waves of the played audio propagating in the current scene are simulated and calculated to obtain simulated echo data.
  • the method before determining the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between the scene and the configuration parameter, the method further includes:
  • if the volume decibel is greater than the preset volume threshold, determining that the current scene is the first scene
  • if the volume decibel is not greater than the preset volume threshold, determining that the current scene is the second scene.
  • the corresponding relationship between the scenario and the configuration parameter includes:
  • the configuration parameter corresponding to the first scenario is the first configuration parameter
  • the configuration parameter corresponding to the second scenario is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
  • the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter; the configuration parameter includes at least one of the number of iterations and the expected value.
  • the echo data includes direct echo data and indirect echo data.
  • the communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the communication bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is set for communication between the above-mentioned electronic device and other devices.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located far away from the aforementioned processor.
  • the above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a computer-readable storage medium is also provided, in which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the above-mentioned audio processing methods are implemented.
  • a computer program product containing instructions is also provided, which when running on a computer, causes the computer to execute any audio processing method in the above-mentioned embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

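An audio processing method and apparatus, an electronic device, and a storage medium, belonging to the technical field of sound signal processing. The method includes: acquiring first audio data of played audio (101); based on the first audio data, simulating and calculating echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data (102); collecting audio data in the current scene through a microphone to obtain mixed audio data (103), where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene; and performing echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data (104), which can improve the effect of echo cancellation.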

Description

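Audio processing method and apparatus, electronic device, and storage medium
The present disclosure claims priority to Chinese patent application No. 202010317867.5, filed with the China Patent Office on April 21, 2020 and entitled "Audio processing method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of sound signal processing, and in particular to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
In a voice interaction scenario, an electronic device can have its microphone and speaker work together; that is, the electronic device can collect the sound made by a user through the microphone while playing audio through the speaker. Because sound waves are reflected as they propagate, the audio data collected by the microphone contains not only the user's audio data but also echo data of the audio played by the speaker.
In the related art, to cancel the echo, the electronic device can capture the audio played by the speaker through a sampling loop to obtain first audio data. The electronic device can then perform echo cancellation on the mixed audio data collected by the microphone and the first audio data to obtain second audio data. After that, the electronic device can use the second audio data as target audio data containing only the user's voice.
However, the echo data of the audio played by the speaker includes direct echo data and indirect echo data. Direct echo data refers to audio data collected by the microphone directly, without the sound waves of the played audio being reflected. Indirect echo data refers to audio data collected by the microphone only after the sound waves of the played audio have been reflected multiple times in the current scene. The first audio data acquired by the electronic device contains only direct echo data, so the above echo cancellation can remove only the influence of the direct echo data and cannot eliminate the indirect echo data in the mixed audio data, which results in a poor echo cancellation effect.
Summary
The purpose of the embodiments of the present disclosure is to provide an audio processing method and apparatus, an electronic device, and a storage medium, so as to solve the problem of poor echo cancellation. The specific technical solutions are as follows: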
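In a first aspect, an audio processing method is provided, the method including:
acquiring first audio data of played audio;
based on the first audio data, simulating and calculating echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
collecting audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
performing echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
In some implementations, the simulating and calculating, based on the first audio data, the echo data generated by the sound waves of the played audio propagating in the current scene to obtain the simulated echo data includes:
determining a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
setting a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
based on the first audio data and the echo simulation algorithm set with the target configuration parameter, simulating and calculating the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
In some implementations, before the determining the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters, the method further includes:
acquiring the volume, in decibels, of the collected mixed audio data;
if the volume in decibels is greater than a preset volume threshold, determining that the current scene is a first scene;
if the volume in decibels is not greater than the preset volume threshold, determining that the current scene is a second scene.
In some implementations, the correspondence between scenes and configuration parameters includes:
the configuration parameter corresponding to the first scene is a first configuration parameter;
the configuration parameter corresponding to the second scene is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
In some implementations, the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter, and the configuration parameter includes at least one of a number of iterations and an expected value.
In some implementations, the echo data includes direct echo data and indirect echo data.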
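In a second aspect, an audio processing apparatus is provided, the apparatus including:
a first acquisition module, configured to acquire first audio data of played audio;
a calculation module, configured to simulate and calculate, based on the first audio data, echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
a collection module, configured to collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
an echo cancellation module, configured to perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
In some implementations, the calculation module includes:
a determination submodule, configured to determine a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
a setting submodule, configured to set a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
a calculation submodule, configured to simulate and calculate, based on the first audio data and the echo simulation algorithm set with the target configuration parameter, the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
In some implementations, the apparatus further includes:
a second acquisition module, configured to acquire the volume, in decibels, of the collected mixed audio data;
a determination module, configured to determine that the current scene is a first scene when the volume in decibels is greater than a preset volume threshold;
the determination module being further configured to determine that the current scene is a second scene when the volume in decibels is not greater than the preset volume threshold.
In some implementations, the correspondence between scenes and configuration parameters includes:
the configuration parameter corresponding to the first scene is a first configuration parameter;
the configuration parameter corresponding to the second scene is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
In some implementations, the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter, and the configuration parameter includes at least one of a number of iterations and an expected value.
In some implementations, the echo data includes direct echo data and indirect echo data.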
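In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any implementation of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of any implementation of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the method steps of any implementation of the first aspect.
Beneficial effects of the embodiments of the present disclosure:
The embodiments of the present disclosure provide an audio processing method and apparatus, an electronic device, and a storage medium, which can acquire first audio data of played audio; based on the first audio data, simulate and calculate echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data; then collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene; and perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
Because the echo data generated by the played audio propagating in the current scene is simulated and calculated to obtain simulated echo data, and echo cancellation is then performed on the mixed audio data collected by the microphone based on the simulated echo data, both the direct echo data and the indirect echo data in the mixed audio data can be removed, which improves the echo cancellation effect.
Of course, implementing any product or method of the present disclosure does not necessarily require achieving all of the advantages described above at the same time.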
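Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of another audio processing method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another audio processing method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the protection scope of the present disclosure.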
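The embodiments of the present disclosure provide an audio processing method that can be applied to an electronic device, and the electronic device may include an audio collection component and an audio playback component. The electronic device may be a mobile phone, a tablet computer, or the like; the audio collection component may be a microphone, and the audio playback component may be a speaker. The embodiments of the present disclosure take a microphone as the audio collection component and a speaker as the audio playback component as an example to describe the specific processing of the audio processing method.
The electronic device can have the microphone and the speaker work together. For example, the electronic device can collect a voice command issued by the user through the microphone while playing music through the speaker. Alternatively, while the user is on a call through the electronic device, the electronic device can play the voice of the far-end user through the speaker and collect the voice of the near-end user through the microphone.
For the case where the microphone and the speaker work together, and for ease of distinction, the audio data played by the speaker is called first audio data; the audio data that the microphone is intended to collect is called second audio data; and the audio data actually collected by the audio collection component is called mixed audio data. The mixed audio data contains not only the second audio data but also the direct echo data and indirect echo data generated by the sound waves of the first audio data propagating in the current scene.
With the technical solutions provided by the embodiments of the present disclosure, the direct echo data and indirect echo data contained in the mixed audio data can be removed, improving the echo cancellation effect.
The audio processing method provided by the embodiments of the present disclosure is described in detail below with reference to specific implementations. As shown in FIG. 1, the specific steps are as follows:
Step 101: acquire first audio data of the played audio.
In implementation, while a certain audio is being played, the electronic device can use the audio data of that audio as the first audio data of the played audio. The played audio may be a song, or the voice of a far-end user in a call scenario.
The electronic device can acquire the audio data of an audio in multiple ways. In one feasible implementation, after the speaker plays the audio, the electronic device can capture the audio played by the speaker through a sampling loop to obtain the first audio data.
In another feasible implementation, the electronic device can use the original audio data of the audio as the first audio data, where the original audio data is audio data that has not been played by the speaker. For example, the electronic device can use the audio data transmitted to the speaker as the first audio data of the played audio.
In the related art, if audio data is captured after the speaker plays the audio, the captured audio data may contain electronic noise. In the embodiments of the present disclosure, using the audio data transmitted to the speaker as the first audio data of the played audio avoids the problem that audio data captured after playback by the speaker may contain electronic noise. Furthermore, this helps bring the simulated echo data calculated from the first audio data closer to the real echo data.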
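Step 102: based on the first audio data, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain simulated echo data.
An echo simulation algorithm may be preset in the electronic device. The echo simulation algorithm may be an RLS (Recursive Least Squares) adaptive filter, an LMS (Least Mean Squares) algorithm, an NLMS (Normalized Least Mean Squares) algorithm, or the like.
In implementation, the electronic device can, based on the echo simulation algorithm and the first audio data, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
For the specific process by which the electronic device simulates and calculates, based on the echo simulation algorithm and the first audio data, the echo data generated by the sound waves of the played audio propagating in the current scene, reference may be made to the process in the related art of calculating the echo data of given audio data based on an echo simulation algorithm and that audio data, which is not repeated here.
In some implementations, the echo data includes direct echo data and indirect echo data.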
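Step 102 leaves the details of the echo simulation algorithm to the related art. As a rough illustration only, the sketch below uses NLMS, one of the algorithms the text names, to estimate the echo of the first audio data that appears in the microphone signal. The function name, the filter length `taps`, the step size `step`, and the example echo path are all assumptions made for this sketch and do not come from the original disclosure.

```python
import random


def nlms_simulate_echo(reference, mixed, taps=8, step=0.5, eps=1e-8):
    """Adaptively estimate the echo of `reference` that is present in `mixed`.

    Returns (simulated_echo, weights); `weights` approximates the echo path.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    simulated_echo = []
    for x, d in zip(reference, mixed):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # simulated echo sample
        error = d - y  # part of the microphone signal not yet explained
        energy = sum(h * h for h in history) + eps  # normalisation term
        weights = [w + step * error * h / energy for w, h in zip(weights, history)]
        simulated_echo.append(y)
    return simulated_echo, weights


# Illustrative use: white-noise playback and a hypothetical echo path made of
# a direct component (delay 0) and one reflection (delay 2).
random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.0, 0.3]
mixed = [
    sum(h * reference[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(reference))
]
simulated_echo, weights = nlms_simulate_echo(reference, mixed)
```

Because the filter has several taps, it can in principle model delayed reflections as well as the direct path, which is the property the text relies on when it says the simulated echo covers both direct and indirect echo data.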
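Step 103: collect audio data in the current scene through the microphone to obtain mixed audio data.
The mixed audio data includes the second audio data and the real echo data generated by the sound waves of the played audio propagating in the current scene.
Taking the microphone collecting a voice command issued by the user as an example, the mixed audio data collected by the microphone contains not only the voice command issued by the user but also the real echo data generated by the sound waves of the played audio propagating in the current scene. The real echo data includes the direct echo data produced when the sound waves of the audio played by the speaker enter the microphone directly, and the indirect echo data produced when those sound waves enter the microphone after being reflected multiple times in the current scene.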
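The composition of the mixed audio data described here can be written as a simple signal model. The notation is assumed for illustration and does not appear in the original: x(n) is the first audio data, s(n) the second audio data, d(n) the mixed audio data, and h_dir and h_ind are impulse responses of the direct and reflected paths, under the assumption that the acoustic paths are linear and time-invariant.

```latex
d(n) = s(n)
     + \underbrace{\sum_{k} h_{\mathrm{dir}}(k)\, x(n-k)}_{\text{direct echo}}
     + \underbrace{\sum_{k} h_{\mathrm{ind}}(k)\, x(n-k)}_{\text{indirect echo}}
```

Under this model, echo cancellation amounts to estimating both sums from x(n) and subtracting them from d(n), leaving s(n).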
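The embodiments of the present disclosure do not specifically limit the execution order of step 102 and step 103.
Step 104: perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
In implementation, an AEC (Acoustic Echo Cancellation) algorithm may be preset in the electronic device. The AEC algorithm may be an RLS adaptive filter, an LMS algorithm, an NLMS algorithm, or the like. If the AEC algorithm has an echo simulation function, the AEC algorithm can be used as the echo simulation algorithm.
The electronic device can perform echo cancellation based on the AEC algorithm, the mixed audio data, and the simulated echo data. For the specific process, reference may be made to the process in the related art of performing echo cancellation based on an AEC algorithm, mixed audio data, and direct echo data, which is not repeated here.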
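The text does not spell out the cancellation step itself. In the simplest case it can be pictured as a sample-by-sample subtraction of the simulated echo from the mixed audio, as in the sketch below. The helper names `convolve` and `cancel_echo` and the echo path values are hypothetical, and the sketch assumes the simulated echo is already time-aligned with the microphone signal; a real AEC algorithm would do considerably more.

```python
def convolve(signal, impulse_response):
    """Causal convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out


def cancel_echo(mixed, simulated_echo):
    """Subtract the simulated echo from the mixed audio, sample by sample."""
    return [m - s for m, s in zip(mixed, simulated_echo)]


# Illustrative data: a single played impulse, a hypothetical echo path with a
# direct component and one reflection, and a near-end signal from the user.
first_audio = [1.0, 0.0, 0.0, 0.0]
echo_path = [0.5, 0.0, 0.25]
second_audio = [0.0, 1.0, 0.0, 0.0]

simulated_echo = convolve(first_audio, echo_path)
mixed_audio = [s + e for s, e in zip(second_audio, simulated_echo)]
recovered = cancel_echo(mixed_audio, simulated_echo)
```

In this idealised case the simulated echo equals the real echo exactly, so `recovered` matches `second_audio`; in practice the quality of the result depends on how closely the simulated echo approximates the real one.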
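The embodiments of the present disclosure provide an audio processing method, which can acquire first audio data of played audio; based on the first audio data, simulate and calculate echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data; then collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene; and perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
Because the echo data generated by the played audio propagating in the current scene is simulated and calculated to obtain simulated echo data, and echo cancellation is then performed on the mixed audio data collected by the microphone based on the simulated echo data, both the direct echo data and the indirect echo data in the mixed audio data can be removed, which improves the echo cancellation effect.
In some implementations, the user may be in different scenes, and the requirements on the echo cancellation effect differ between scenes. For example, when the user is in a speech recognition scene, semantic analysis needs to be performed based on the second audio data, so the real echo data in the mixed audio data needs to be removed as completely as possible. When the user is in a call scene, the collected voice of the near-end user is transmitted to the far-end user, so the requirement on the echo cancellation effect can be lower than in the speech recognition scene.
A correspondence between scenes and configuration parameters may be pre-stored in the electronic device, and the configuration parameter may be at least one of a number of iterations and an expected value. For different scenes, the electronic device can simulate and calculate different simulated echo data based on the corresponding configuration parameters, thereby achieving different echo cancellation effects. As shown in FIG. 2, the specific process may include:
Step 201: determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters.
In implementation, the electronic device can determine the current scene, and then determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters.
In the embodiments of the present disclosure, the electronic device can determine the current scene in multiple ways. In one feasible implementation, the electronic device can determine the current scene according to a received control instruction. For example, if the electronic device receives a voice control instruction, it can determine that the current scene is a speech recognition scene; if it receives a call control instruction, it can determine that the current scene is a call scene. In another feasible implementation, the electronic device can determine the current scene according to the volume of the collected mixed audio data; the specific process is described in detail later.
Step 202: set the configuration parameter in the echo simulation algorithm according to the target configuration parameter.
In implementation, the electronic device can set the value of the configuration parameter in the echo simulation algorithm to the value of the determined target configuration parameter.
For example, the current scene is a call scene, and the target configuration parameters corresponding to the current scene are: a number of iterations of 2 and an expected value of 0.8. The electronic device can set the number-of-iterations parameter in the echo simulation algorithm to 2 and the expected-value parameter to 0.8.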
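Steps 201 and 202 amount to a table lookup followed by a parameter assignment, which can be sketched as follows. The names `EchoConfig`, `SCENE_TO_CONFIG`, and `EchoSimulator` are hypothetical. The call-scene values (2 and 0.8) are taken from the example in the text, while the speech-recognition values are an assumption added only to make the table non-trivial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoConfig:
    iterations: int        # the "number of iterations" parameter in the text
    expected_value: float  # the "expected value" parameter in the text


# Pre-stored correspondence between scenes and configuration parameters.
# Only the call-scene entry comes from the text; the other is illustrative.
SCENE_TO_CONFIG = {
    "call": EchoConfig(iterations=2, expected_value=0.8),
    "speech_recognition": EchoConfig(iterations=3, expected_value=1.0),
}


class EchoSimulator:
    """Hypothetical holder for the echo simulation algorithm's settings."""

    def __init__(self):
        self.config = None

    def configure(self, scene):
        # Step 201: look up the target parameters for the current scene.
        target = SCENE_TO_CONFIG[scene]
        # Step 202: apply them to the echo simulation algorithm.
        self.config = target
        return target


simulator = EchoSimulator()
simulator.configure("call")
```

How the two parameters influence a concrete RLS, LMS, or NLMS implementation is not specified in the text, so the sketch stops at storing them.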
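Step 203: based on the first audio data and the echo simulation algorithm set with the target configuration parameter, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
In implementation, for the processing of this step, reference may be made to the processing of step 102, which is not repeated here.
In the embodiments of the present disclosure, the electronic device can determine the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters. The electronic device can then set the configuration parameter in the echo simulation algorithm according to the target configuration parameter. After that, the electronic device can, based on the first audio data and the echo simulation algorithm set with the target configuration parameter, simulate and calculate the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
Because the target configuration parameter corresponding to the current scene is determined, and the simulated echo data is calculated based on the first audio data and the echo simulation algorithm set with the target configuration parameter, different simulated echo data can be determined for different scenes, thereby achieving different echo cancellation effects. In scenes with a higher requirement on the echo cancellation effect, the echo cancellation requirement can be met; in scenes with a lower requirement, the processing speed of echo cancellation can be improved.
In some implementations, a preset volume threshold may be preset in the electronic device, and the preset volume threshold may be 50 dB. The electronic device can determine the current scene based on the preset volume threshold and the volume of the mixed audio data. As shown in FIG. 3, this includes the following steps:
Step 301: acquire the volume, in decibels, of the collected mixed audio data.
In implementation, the electronic device can detect the volume, in decibels, of the collected mixed audio data while collecting the mixed audio data.
Step 302: judge whether the volume in decibels is greater than the preset volume threshold.
In implementation, the electronic device can judge whether the volume in decibels is greater than the preset volume threshold. If it is greater than the preset volume threshold, the electronic device can execute step 303; if it is not greater than the preset volume threshold, the electronic device can execute step 304.
Step 303: determine that the current scene is the first scene.
Step 304: determine that the current scene is the second scene.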
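Steps 302 to 304 reduce to a single threshold comparison, sketched below. The function name and the scene labels are hypothetical; the 50 dB default is the example threshold mentioned in the text. The sketch takes an already measured volume as input, since the text does not describe how the decibel value is obtained from the collected samples.

```python
PRESET_VOLUME_THRESHOLD_DB = 50.0  # example threshold given in the text


def determine_scene(volume_db, threshold_db=PRESET_VOLUME_THRESHOLD_DB):
    """Steps 302 to 304: map a measured volume to the first or second scene."""
    if volume_db > threshold_db:
        return "first_scene"   # step 303
    return "second_scene"      # step 304


loud_scene = determine_scene(62.0)
quiet_scene = determine_scene(41.5)
```

Note that a volume exactly equal to the threshold falls into the second scene, matching the "not greater than" wording of step 302.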
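In the embodiments of the present disclosure, the electronic device can acquire the volume, in decibels, of the collected mixed audio data. It then judges whether the volume is greater than the preset volume threshold; when it is greater, the current scene is determined to be the first scene, and when it is not greater, the current scene is determined to be the second scene. In this way, the current scene can be determined based on the volume. This facilitates subsequently calculating the simulated echo data based on the target configuration parameter corresponding to the current scene, and performing echo cancellation on the mixed audio data based on the simulated echo data, to obtain second audio data that meets the echo cancellation requirement of the current scene.
In some implementations, in the correspondence between scenes and configuration parameters, the configuration parameter corresponding to the first scene is the first configuration parameter, and the configuration parameter corresponding to the second scene is the second configuration parameter. The second configuration parameter is smaller than the first configuration parameter.
The first scene may represent a scene with a higher requirement on the echo cancellation effect, and the second scene may represent a scene with a lower requirement.
For example, the first scene is a public place scene, and the first configuration parameters corresponding to the first scene are: a number of iterations of 3 and an expected value of 1. The second scene is a bedroom scene, and the second configuration parameters corresponding to the second scene are: a number of iterations of 2 and an expected value of 0.8.
In the embodiments of the present disclosure, increasing the configuration parameter can bring the simulated echo data calculated from the first audio data closer to the real echo data. Therefore, the second configuration parameter is set for the second scene, which has a lower requirement on the echo cancellation effect, and the first configuration parameter, which is larger than the second configuration parameter, can be set for the first scene, which has a higher requirement. In this way, different echo cancellation effects can be achieved for different scenes. In scenes with a higher requirement on the echo cancellation effect, the echo cancellation requirement can be met; in scenes with a lower requirement, the processing speed of echo cancellation can be improved.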
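The relationship between the two parameter sets can be made concrete with the example values from the text (3 and 1 for the public place scene, 2 and 0.8 for the bedroom scene). The sketch below stores both sets and checks that the second is smaller than the first. Reading "smaller" as "smaller in every parameter" is an assumption of this sketch, as are the dictionary layout and the function names.

```python
# Example values from the text: public place (first scene), bedroom (second scene).
FIRST_CONFIG = {"iterations": 3, "expected_value": 1.0}
SECOND_CONFIG = {"iterations": 2, "expected_value": 0.8}

CONFIG_BY_SCENE = {
    "first_scene": FIRST_CONFIG,
    "second_scene": SECOND_CONFIG,
}


def is_smaller(second, first):
    """Assumed reading of "smaller": every parameter is strictly smaller."""
    return all(second[name] < first[name] for name in first)


def config_for(scene):
    """Return the configuration parameters stored for the given scene."""
    return CONFIG_BY_SCENE[scene]


ordering_holds = is_smaller(SECOND_CONFIG, FIRST_CONFIG)
```

The trade-off the text describes follows from this ordering: the larger first configuration favours accuracy of the simulated echo, while the smaller second configuration favours processing speed.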
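Based on the same technical concept, the embodiments of the present disclosure further provide an audio processing apparatus. As shown in FIG. 4, the apparatus includes:
a first acquisition module 410, configured to acquire first audio data of played audio;
a calculation module 420, configured to simulate and calculate, based on the first audio data, echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
a collection module 430, configured to collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
an echo cancellation module 440, configured to perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
In some implementations, the calculation module includes:
a determination submodule, configured to determine a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
a setting submodule, configured to set a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
a calculation submodule, configured to simulate and calculate, based on the first audio data and the echo simulation algorithm set with the target configuration parameter, the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
In some implementations, the apparatus further includes:
a second acquisition module, configured to acquire the volume, in decibels, of the collected mixed audio data;
a determination module, configured to determine that the current scene is a first scene when the volume in decibels is greater than a preset volume threshold;
the determination module being further configured to determine that the current scene is a second scene when the volume in decibels is not greater than the preset volume threshold.
In some implementations, the correspondence between scenes and configuration parameters includes:
the configuration parameter corresponding to the first scene is a first configuration parameter;
the configuration parameter corresponding to the second scene is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
In some implementations, the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter, and the configuration parameter includes at least one of a number of iterations and an expected value.
In some implementations, the echo data includes direct echo data and indirect echo data.
The embodiments of the present disclosure provide an audio processing apparatus, which can acquire first audio data of played audio; based on the first audio data, simulate and calculate echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data; then collect audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene; and perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
Because the echo data generated by the played audio propagating in the current scene is simulated and calculated to obtain simulated echo data, and echo cancellation is then performed on the mixed audio data collected by the microphone based on the simulated echo data, both the direct echo data and the indirect echo data in the mixed audio data can be removed, which improves the echo cancellation effect.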
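Based on the same technical concept, the embodiments of the present disclosure further provide an electronic device. As shown in FIG. 5, the electronic device includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the following steps when executing the program stored in the memory 503:
acquiring first audio data of played audio;
based on the first audio data, simulating and calculating echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
collecting audio data in the current scene through a microphone to obtain mixed audio data, where the mixed audio data includes second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
performing echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
In some implementations, the simulating and calculating, based on the first audio data, the echo data generated by the sound waves of the played audio propagating in the current scene to obtain the simulated echo data includes:
determining a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
setting a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
based on the first audio data and the echo simulation algorithm set with the target configuration parameter, simulating and calculating the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
In some implementations, before the determining the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters, the steps further include:
acquiring the volume, in decibels, of the collected mixed audio data;
if the volume in decibels is greater than a preset volume threshold, determining that the current scene is a first scene;
if the volume in decibels is not greater than the preset volume threshold, determining that the current scene is a second scene.
In some implementations, the correspondence between scenes and configuration parameters includes:
the configuration parameter corresponding to the first scene is a first configuration parameter;
the configuration parameter corresponding to the second scene is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
In some implementations, the echo simulation algorithm includes a recursive least squares (RLS) adaptive filter, and the configuration parameter includes at least one of a number of iterations and an expected value.
In some implementations, the echo data includes direct echo data and indirect echo data.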
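The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk storage. In some implementations, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above audio processing methods.
In yet another embodiment provided by the present disclosure, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute any one of the audio processing methods in the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific implementations of the present disclosure, which enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.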

Claims (10)

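  1. An audio processing method, the method comprising:
    acquiring first audio data of played audio;
    based on the first audio data, simulating and calculating echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
    collecting audio data in the current scene through a microphone to obtain mixed audio data, wherein the mixed audio data comprises second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
    performing echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
  2. The method according to claim 1, wherein the simulating and calculating, based on the first audio data, the echo data generated by the sound waves of the played audio propagating in the current scene to obtain the simulated echo data comprises:
    determining a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
    setting a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
    based on the first audio data and the echo simulation algorithm set with the target configuration parameter, simulating and calculating the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
  3. The method according to claim 2, wherein before the determining the target configuration parameter corresponding to the current scene according to the pre-stored correspondence between scenes and configuration parameters, the method further comprises:
    acquiring the volume, in decibels, of the collected mixed audio data;
    if the volume in decibels is greater than a preset volume threshold, determining that the current scene is a first scene;
    if the volume in decibels is not greater than the preset volume threshold, determining that the current scene is a second scene.
  4. The method according to claim 3, wherein the correspondence between scenes and configuration parameters comprises:
    the configuration parameter corresponding to the first scene is a first configuration parameter;
    the configuration parameter corresponding to the second scene is a second configuration parameter, and the second configuration parameter is smaller than the first configuration parameter.
  5. The method according to any one of claims 2 to 4, wherein the echo simulation algorithm comprises a recursive least squares (RLS) adaptive filter, and the configuration parameter comprises at least one of a number of iterations and an expected value.
  6. The method according to claim 1, wherein the echo data comprises direct echo data and indirect echo data.
  7. An audio processing apparatus, the apparatus comprising:
    a first acquisition module, configured to acquire first audio data of played audio;
    a calculation module, configured to simulate and calculate, based on the first audio data, echo data generated by sound waves of the played audio propagating in a current scene, to obtain simulated echo data;
    a collection module, configured to collect audio data in the current scene through a microphone to obtain mixed audio data, wherein the mixed audio data comprises second audio data and real echo data generated by the sound waves of the played audio propagating in the current scene;
    an echo cancellation module, configured to perform echo cancellation on the mixed audio data based on the simulated echo data to obtain the second audio data.
  8. The apparatus according to claim 7, wherein the calculation module comprises:
    a first determination submodule, configured to determine a target configuration parameter corresponding to the current scene according to a pre-stored correspondence between scenes and configuration parameters;
    a setting submodule, configured to set a configuration parameter in an echo simulation algorithm according to the target configuration parameter;
    a calculation submodule, configured to simulate and calculate, based on the first audio data and the echo simulation algorithm set with the target configuration parameter, the echo data generated by the sound waves of the played audio propagating in the current scene, to obtain the simulated echo data.
  9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
    the memory is configured to store a computer program;
    the processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
  10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 6.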
PCT/CN2020/140641 2020-04-21 2020-12-29 Audio processing method and apparatus, electronic device, and storage medium WO2021212905A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010317867.5 2020-04-21
CN202010317867.5A CN111583950B (zh) 2020-04-21 2020-04-21 Audio processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021212905A1 true WO2021212905A1 (zh) 2021-10-28

Family

ID=72113106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140641 WO2021212905A1 (zh) 2020-04-21 2020-12-29 Audio processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111583950B (zh)
WO (1) WO2021212905A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
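CN111583950B (zh) * 2020-04-21 2024-05-03 珠海格力电器股份有限公司 Audio processing method and apparatus, electronic device, and storage medium
CN113160790A (zh) * 2021-04-08 2021-07-23 维沃移动通信有限公司 Echo cancellation method and apparatus, electronic device, and storage medium
CN114596871B (zh) * 2022-03-22 2023-03-28 镁佳(北京)科技有限公司 Vehicle head unit volume adjustment method and apparatus, and electronic device
CN117880696A (zh) * 2022-10-12 2024-04-12 广州开得联软件技术有限公司 Audio mixing method and apparatus, computer device, and storage medium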

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050047611A1 (en) * 2003-08-27 2005-03-03 Xiadong Mao Audio input system
US20140274218A1 (en) * 2013-03-12 2014-09-18 Motorola Mobility Llc Apparatus with Adaptive Acoustic Echo Control for Speakerphone Mode
CN109166589A (zh) * 2018-08-13 2019-01-08 深圳市腾讯网络信息技术有限公司 Application sound suppression method, apparatus, medium, and device
US20190392853A1 (en) * 2018-06-26 2019-12-26 Google Llc Multi-channel echo cancellation with scenario memory
CN110956973A (zh) * 2018-09-27 2020-04-03 深圳市冠旭电子股份有限公司 Echo cancellation method and apparatus, and intelligent terminal
CN111583950A (zh) * 2020-04-21 2020-08-25 珠海格力电器股份有限公司 Audio processing method and apparatus, electronic device, and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
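CN102014205A (zh) * 2010-11-19 2011-04-13 中兴通讯股份有限公司 Method and apparatus for processing voice call quality
TWI469650B (zh) * 2012-11-29 2015-01-11 Quanta Comp Inc Echo cancellation system
CN103312913B (zh) * 2013-07-03 2015-12-23 苏州科达科技股份有限公司 System and method for echo cancellation
CN106910510A (zh) * 2017-02-16 2017-06-30 智车优行科技(北京)有限公司 Vehicle-mounted power amplifier device, vehicle, and audio playback processing method thereof
CN109961797B (zh) * 2017-12-25 2023-07-18 阿里巴巴集团控股有限公司 Echo cancellation method and apparatus, and electronic device
CN108630219B (zh) * 2018-05-08 2021-05-11 北京小鱼在家科技有限公司 Processing system, method, and apparatus for echo suppression audio signal feature tracking
CN109767777A (zh) * 2019-01-31 2019-05-17 迅雷计算机(深圳)有限公司 Audio mixing method for live streaming software
CN209994549U (zh) * 2019-08-16 2020-01-24 深圳市技湛科技有限公司 Audio interaction host and audio interaction device
CN110930987B (zh) * 2019-12-11 2021-01-08 腾讯科技(深圳)有限公司 Audio processing method, apparatus, and storage medium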

Also Published As

Publication number Publication date
CN111583950B (zh) 2024-05-03
CN111583950A (zh) 2020-08-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932050

Country of ref document: EP

Kind code of ref document: A1