WO2020125325A1 - Method for eliminating echo and device - Google Patents


Info

Publication number
WO2020125325A1
Authority
WO
WIPO (PCT)
Prior art keywords
echo
audio
signal
reference signal
audio reference
Application number
PCT/CN2019/120452
Other languages
French (fr)
Chinese (zh)
Inventor
张真赫
刘安
熊张亮
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
2018-12-17
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • The terminal device 101 outputs the audio content signal of the audio/video program to the speaker 102, and also outputs the audio reference signal to the speaker.
  • The audio reference signal is usually a high-frequency signal whose frequency lies above the range of frequencies audible to the human ear.
  • The human ear can generally hear sounds from 20 Hz to 20,000 Hz, so the frequency of the audio reference signal can be chosen above 20,000 Hz.
  • The terminal device also collects the audio input signal from the microphone and processes it to eliminate the echo mixed into that signal, recovering the user's voice input.
  • The speaker 102 plays the audio signals output by the terminal device, including audio content signals and audio reference signals.
  • The user can hear the played audio content signal but not the played audio reference signal, so the reference signal does not affect the user experience.
  • The sound of either signal played by the speaker propagates to the microphone 103 and produces an echo.
  • The microphone 103 receives the user's voice during voice interaction with the terminal device.
  • The sound received by the microphone may therefore be mixed with the echo of the audio content signal or the echo of the audio reference signal played by the speaker.
  • The sound output from the speaker produces an echo in the microphone because of diffraction and reflection of the sound, among other causes.
  • The echo signal can be regarded as the audio signal after it has passed through an echo channel.
  • The echo channel affects sound in two ways: time delay and energy attenuation.
  • The echo channel affects the audio content signal in a similar way to the audio reference signal. The audio reference signal can therefore be analyzed to obtain the echo channel characteristic parameters, namely the time delay and the attenuation coefficient, and these two parameters can then be used to eliminate the echo of the audio content signal.
  • The terminal device 101 outputs an audio signal to the speaker 102: either an audio content signal $X_0(n)$ or an audio reference signal $C_0(n)$.
  • The sound from the speaker propagates to the microphone, where it produces the echo signal $X(n)$ of the audio content signal or the echo signal $C(n)$ of the audio reference signal.
  • The microphone 103 collects the user's voice input $S_0(n)$; the collected voice signal $S(n)$ therefore contains $S_0(n)$ plus any echo $X(n)$ of the audio content signal.
  • The terminal device needs to eliminate the echo signal $X(n)$ from the collected voice signal $S(n)$, that is, to compute Formula 1: $S_0(n) = S(n) - X(n)$.
  • An embodiment of the present invention provides an echo cancellation method which, as shown in FIG. 2, includes the following steps.
  • The frequency of the audio reference signal $C_0(n)$ is usually chosen in a high-frequency band inaudible to the human ear; for example, 20 kHz may be chosen. If the terminal device is playing audio and video program content, the audio reference signal can be superimposed on the audio content signal and output together with it without affecting the user's listening to the program.
  • An example of $C_0(n)$ is a single tone determined by the following parameters:
  • $a_0$ is the amplitude of the audio reference signal;
  • $f_0$ is the frequency of the audio reference signal;
  • $f_s$ is the sampling frequency of the digitized system.
  • The sampling frequency of the system must be greater than twice the frequency of the audio reference signal (the Nyquist criterion). For example, when the frequency of the audio reference signal is 20 kHz, the commonly used sampling frequency of 44.1 kHz meets this requirement, since 44.1 kHz > 2 × 20 kHz = 40 kHz.
  • The audio reference signal can be output when the terminal device is powered on, and the characteristic parameters of the echo channel determined at that time. Once they have been determined, output of the audio reference signal can stop, and subsequent echo cancellation of voice input is performed with the determined parameters.
  • The system can also output the audio reference signal periodically and re-determine the echo channel characteristic parameters, continually updating them to adapt to changes in the surroundings of the terminal device.
  • In addition to any voice input from the user of the terminal device, the microphone's audio input signal $S(n)$ contains the echo $C(n)$ of the audio reference signal after it has passed through the echo channel.
  • The time $T_1$ at which output of the audio reference signal starts is recorded.
  • DFT: discrete Fourier transform
  • After a Fourier transform, the echo of the audio reference signal is a pulse function in the frequency domain, in which:
  • $f_0$, the frequency of the original audio reference signal, is the main frequency after the Fourier transform;
  • $a_1$ is the amplitude at the main frequency $f_0$;
  • the remaining pulses are at sub-frequencies, which arise from the spectral response characteristics of the speaker, the microphone, and the environment;
  • in practical applications, the amplitudes at the sub-frequencies are usually negligible.
  • The attenuation coefficient $r$ of the echo channel, that is, the ratio of the amplitude of the echo of the audio reference signal to the amplitude of the original reference signal, can therefore be expressed as $r = a_1 / a_0$.
  • After determining the delay $t$ and the attenuation coefficient $r$ of the echo channel as described above, the terminal device, during subsequent voice interaction with the user, removes the echo of the played audio content signal from the microphone's input voice signal and thereby obtains the user's voice input: $S_0(n) = S(n) - r\,X_0(n - t f_s)$,
  • where $f_s$ is the sampling frequency of the system.
  • After echo cancellation, the user's voice input can be used as the input to speech recognition.
  • The audio input signal collected in step 202 above may be band-pass filtered to extract the echo signal of the audio reference signal.
  • The discrete Fourier transform in step 203 then operates only on the echo signal of the audio reference signal, which greatly speeds up the subsequent Fourier transform calculation.
  • The system can set the bandwidth $f_B$ of the band-pass filter according to the frequency $f_0$ of the audio reference signal.
  • Bandpass filtering can be expressed as:
  • Alternatively, the root-mean-square (RMS) value of the filtered output signal can be calculated directly in the time domain, giving the energy average $E_1$ of the echo of the audio reference signal.
  • Likewise, the RMS value is used to calculate the energy average $E_0$ of the original audio reference signal.
  • the attenuation coefficient r of the echo channel that is, the ratio of the amplitude of the echo of the audio reference signal to the amplitude of the original audio reference signal, can be expressed as:
  • In this way, echo cancellation requires no FFT calculation, which further speeds up the system's echo cancellation computation.
  • By determining the echo channel characteristic parameters through the audio reference signal, the method achieves echo cancellation, reduces the interference of echo with the user's voice input, and improves the quality of the input voice.
  • An embodiment of the present invention also provides a terminal device, whose structure is shown schematically in FIG. 3, including an audio output unit 301, an audio input unit 302, and a processing unit 303, where:
  • the audio output unit is configured to output the audio reference signal;
  • the audio input unit is configured to collect the audio input signal, which includes the echo of the audio reference signal;
  • the processing unit is configured to determine the delay and attenuation coefficient of the echo channel according to the echo of the audio reference signal, and to eliminate the echo of the audio content signal in the audio input signal according to the delay and attenuation coefficient.
  • Here, the terminal device is presented in the form of functional units.
  • A "unit" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions.
  • The terminal device may be implemented using a processor, a memory, and a communication interface.
  • The terminal device in this embodiment of the present invention may also be implemented as the computer device (or system) shown in FIG. 4.
  • FIG. 4 is a schematic diagram of a computer device provided by an embodiment of the present invention.
  • The computer device includes at least one processor 401, a communication bus 402, a memory 403, and at least one communication interface 404, and may further include an I/O interface 405.
  • The processor may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the solution of the present invention.
  • The communication bus may include a path for transferring information between the aforementioned components.
  • The communication interface uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The memory may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • The memory may exist independently and be connected to the processor through the bus, or it may be integrated with the processor.
  • The memory stores the application program code for executing the solution of the present invention, and its execution is controlled by the processor.
  • The processor executes the application program code stored in the memory.
  • The processor may include one or more CPUs, each of which may be a single-core or multi-core processor.
  • A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
  • The computer device may further include an input/output (I/O) interface.
  • The output device may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • The input device may be a mouse, a keyboard, a touch screen device, a sensing device, or at least two imaging sensors.
  • The aforementioned computer device may be a general-purpose computer device or a dedicated computer device.
  • The computer device may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in FIG. 4.
  • This embodiment of the present invention does not limit the type of computer device.
  • The terminal device in FIG. 1 may be the device shown in FIG. 4, with one or more software modules stored in its memory.
  • The terminal device can run these software modules through the processor and the program code in the memory to carry out the above method.
  • An embodiment of the present invention also provides a computer storage medium for storing computer software instructions for the device shown in FIG. 3 or FIG. 4 above, which includes a program designed to execute the above method embodiment. By executing the stored program, the above method can be realized.
  • The embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • The computer program is stored or distributed in a suitable medium and provided together with other hardware or as part of the hardware, and may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
  • Each flow and/or block in the flowcharts and/or block diagrams, and each combination of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

A method for eliminating echo, applied to a terminal device. The method comprises: outputting an audio reference signal (201); acquiring an audio input signal, the audio input signal comprising an echo of the audio reference signal (202); determining a time delay and an attenuation coefficient of an echo channel according to the echo of the audio reference signal (203); and eliminating an echo of an audio content signal according to the time delay and the attenuation coefficient (204). The method eliminates the interference of echo with a user's voice input and improves the quality of the input voice.

Description

一种消除回声的方法和设备Method and equipment for eliminating echo
本申请要求于2018年12月17日提交中国国家知识产权局、申请号为201811542603.9、发明名称为“一种消除回声的方法和设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims priority to Chinese Patent Application No. 201811542603.9, filed with the China National Intellectual Property Administration on December 17, 2018 and entitled "Method and device for eliminating echo", which is incorporated herein by reference in its entirety.
技术领域Technical field
本发明涉及信息处理领域,尤其涉及一种消除回声的方法和设备。The invention relates to the field of information processing, in particular to a method and device for eliminating echo.
背景技术Background
语音作为当前一种人机交互技术,使用越来越广泛。目前市场上有许多通过语音进行交互的终端设备,如移动电话、智能音箱、机顶盒、智能电视,智能遥控器等。As a current human-computer interaction technology, voice is more and more widely used. At present, there are many terminal devices on the market that interact via voice, such as mobile phones, smart speakers, set-top boxes, smart TVs, and smart remote controls.
终端设备与用户通过语音进行交流,首先需要进行语音的获取与识别。终端设备在与用户进行语音交互的过程中,常常同时也播放着音视频内容,播放的声音会在麦克风中产生回声,影响用户的语音输入,进而影响语音识别的准确性。When the terminal device communicates with the user through voice, it is necessary to acquire and recognize voice first. In the process of voice interaction with the user, the terminal device often plays audio and video content at the same time. The played sound will generate an echo in the microphone, which affects the user's voice input and thus affects the accuracy of voice recognition.
现有技术中,有一些回声消除方法,如自适应滤波算法,可一定程度消除回声,但计算复杂,效果比较差。In the prior art, there are some echo cancellation methods, such as an adaptive filtering algorithm, which can cancel the echo to a certain extent, but the calculation is complicated and the effect is relatively poor.
发明内容Summary of the invention
本发明实施例提供一种消除回声的方法和终端设备,减少回声对用户语音输入的干扰,提高输入语音的质量。Embodiments of the present invention provide a method and terminal device for eliminating echoes to reduce the interference of echoes on user voice input and improve the quality of input voice.
第一方面,本发明实施例提供一种消除回声的方法,应用于终端设备,包括:输出音频参考信号;采集音频输入信号,所述音频输入信号中包含了音频参考信号的回声;根据音频参考信号的回声确定回声信道的时延和衰减系数;根据所述时延和衰减系数消除音频输入信号中的音频内容信号的回声。In a first aspect, an embodiment of the present invention provides an echo cancellation method, applied to a terminal device, including: outputting an audio reference signal; collecting an audio input signal, the audio input signal including an echo of the audio reference signal; determining a delay and an attenuation coefficient of an echo channel according to the echo of the audio reference signal; and eliminating, according to the delay and the attenuation coefficient, an echo of an audio content signal in the audio input signal.
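The first-aspect method models the echo channel by two parameters, a delay and an attenuation coefficient. The following is a minimal Python sketch of that model and of the subtraction it implies, assuming the delay has already been converted to a whole number of samples (delay in seconds multiplied by the sampling frequency) and that the processor has access to the content signal it sent to the speaker. The functions `echo_channel` and `cancel_echo` are hypothetical names, not taken from the patent.

```python
import math
from typing import List, Sequence


def echo_channel(x0: Sequence[float], delay_samples: int, r: float) -> List[float]:
    """Hypothetical echo-channel model: X(n) = r * X0(n - delay_samples).

    Samples before the delay has elapsed are taken as zero (no echo yet).
    """
    return [r * x0[n - delay_samples] if n >= delay_samples else 0.0
            for n in range(len(x0))]


def cancel_echo(s: Sequence[float], x0: Sequence[float],
                delay_samples: int, r: float) -> List[float]:
    """Subtract the modelled echo of the played content from the mic signal.

    Implements S0(n) = S(n) - r * X0(n - delay_samples) sample by sample.
    """
    echo_estimate = echo_channel(x0, delay_samples, r)
    return [s_n - e_n for s_n, e_n in zip(s, echo_estimate)]


if __name__ == "__main__":
    fs = 16_000                      # assumed sampling frequency for the demo
    n_samples = 1_600                # 100 ms
    content = [math.sin(2 * math.pi * 440 * n / fs) for n in range(n_samples)]
    voice = [0.1 * math.sin(2 * math.pi * 200 * n / fs) for n in range(n_samples)]
    # Microphone signal: user voice plus content echo (delay 80 samples, r = 0.3).
    mic = [v + e for v, e in zip(voice, echo_channel(content, 80, 0.3))]
    recovered = cancel_echo(mic, content, 80, 0.3)
    print("max residual:", max(abs(a - b) for a, b in zip(recovered, voice)))
```

A real room spreads the echo over many reflections, so on recorded audio the residual would not be zero; the sketch only mirrors the delay-plus-attenuation model described here.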
上述方法利用音频参考信号,得到回声信道的特征参数,从而消除回声,提高语音输入质量。The above method uses the audio reference signal to obtain the characteristic parameters of the echo channel, thereby eliminating the echo and improving the voice input quality.
在一个可能的设计中,确定回声信道的衰减系数包括:对音频输入信号通过傅里叶变换计算出在音频参考信号频率上的回声信号幅值;所述音频参考信号频率上的回声信号幅值与所述输出的音频参考信号的信号幅值比值即为回声信号的衰减系数。In a possible design, determining the attenuation coefficient of the echo channel includes: calculating, by a Fourier transform of the audio input signal, the amplitude of the echo signal at the frequency of the audio reference signal; the ratio of the amplitude of the echo signal at the frequency of the audio reference signal to the signal amplitude of the output audio reference signal is the attenuation coefficient of the echo signal.
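A minimal sketch of the Fourier-transform design, assuming the analysis window begins after the echo has arrived and spans a whole number of reference-tone cycles (441 samples at 44.1 kHz hold exactly 200 cycles of 20 kHz), so no window function is needed. Only the single DFT coefficient at the reference frequency is evaluated; `amplitude_at` and `attenuation_from_dft` are hypothetical helpers.

```python
import cmath
import math
from typing import Sequence


def amplitude_at(x: Sequence[float], f0: float, fs: float) -> float:
    """Amplitude of the f0 component of x, from a single DFT coefficient.

    For a real tone a*sin(2*pi*f0*n/fs) that spans an integer number of
    cycles in the window, |X(f0)| = a*N/2, so the amplitude is 2*|X(f0)|/N.
    """
    n_samples = len(x)
    coeff = sum(x[n] * cmath.exp(-2j * math.pi * f0 * n / fs)
                for n in range(n_samples))
    return 2.0 * abs(coeff) / n_samples


def attenuation_from_dft(mic: Sequence[float], f0: float, fs: float,
                         a0: float) -> float:
    """Attenuation coefficient r = a1 / a0, with a1 the echo amplitude at f0."""
    return amplitude_at(mic, f0, fs) / a0


if __name__ == "__main__":
    fs, f0, a0 = 44_100.0, 20_000.0, 1.0
    n_samples = 441              # 10 ms: exactly 200 cycles of 20 kHz
    # Mic window: reference echo (amplitude 0.25) plus audible content at 1 kHz.
    mic = [0.25 * a0 * math.sin(2 * math.pi * f0 * n / fs)
           + 0.5 * math.sin(2 * math.pi * 1_000 * n / fs)
           for n in range(n_samples)]
    print(attenuation_from_dft(mic, f0, fs, a0))
```

Because the 1 kHz content also completes a whole number of cycles in the window, it contributes nothing at 20 kHz. A full FFT would give the same coefficient but computes every bin.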
在另一个可能的设计中,上述方法还包括将音频输入信号通过带通滤波器进行滤波,获得所述音频参考信号的回声。In another possible design, the above method further includes filtering the audio input signal through a band-pass filter to obtain the echo of the audio reference signal.
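The patent does not specify the band-pass filter, so the sketch below swaps in a common choice: a second-order (biquad) band-pass filter using the coefficients from Robert Bristow-Johnson's "Audio EQ Cookbook" (constant 0 dB peak gain variant). Setting Q = f0 / f_B from the bandwidth f_B is an assumption; this close to half the sampling rate, bilinear-transform warping can make the realized bandwidth differ from f_B. The function name `bandpass` is hypothetical.

```python
import math
from typing import List, Sequence


def bandpass(x: Sequence[float], f0: float, bandwidth: float,
             fs: float) -> List[float]:
    """Second-order band-pass filter centred on f0 (0 dB gain at f0).

    Coefficients follow the "Audio EQ Cookbook" band-pass biquad with
    constant 0 dB peak gain; Q is set to f0 / bandwidth (an approximation).
    """
    w0 = 2.0 * math.pi * f0 / fs
    q = f0 / bandwidth
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha

    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0          # filter state: previous inputs/outputs
    for xn in x:
        # Direct form I difference equation, normalised by a0.
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


if __name__ == "__main__":
    fs, f0, f_b = 44_100.0, 20_000.0, 1_000.0
    n = range(4_410)                  # 100 ms, long enough for the filter to settle
    ref_echo = [math.sin(2 * math.pi * f0 * k / fs) for k in n]
    content = [math.sin(2 * math.pi * 1_000 * k / fs) for k in n]
    tail = slice(-441, None)          # look only at the settled last 10 ms
    print("gain at f0:   ", max(abs(v) for v in bandpass(ref_echo, f0, f_b, fs)[tail]))
    print("gain at 1 kHz:", max(abs(v) for v in bandpass(content, f0, f_b, fs)[tail]))
```

Checking only the last 441 samples skips the start-up transient of the filter.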
在另一个可能的设计中,确定回声信道的衰减系数包括:通过均方根值方式计算出在音频参考信号频率上的回声信号幅值;所述音频参考信号频率上的回声信号幅值与所述输出的音频参考信号的信号幅值比值即为回声信号的衰减系数。In another possible design, determining the attenuation coefficient of the echo channel includes: calculating, by means of a root-mean-square value, the amplitude of the echo signal at the frequency of the audio reference signal; the ratio of the amplitude of the echo signal at the frequency of the audio reference signal to the signal amplitude of the output audio reference signal is the attenuation coefficient of the echo signal.
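A minimal sketch of the RMS design, assuming E1 and E0 are RMS values (amplitude-like quantities), so that r = E1 / E0; if "energy average" were instead meant as a mean square, the amplitude ratio would be the square root of that ratio. `rms` and `attenuation_from_rms` are hypothetical names.

```python
import math
from typing import Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square value of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def attenuation_from_rms(filtered_echo: Sequence[float],
                         reference: Sequence[float]) -> float:
    """r = E1 / E0, taking E1 and E0 as the RMS values of the band-passed
    echo and of the played reference. For sinusoids RMS = amplitude / sqrt(2),
    so the ratio of RMS values equals the ratio of amplitudes."""
    return rms(filtered_echo) / rms(reference)


if __name__ == "__main__":
    fs, f0 = 44_100.0, 20_000.0
    reference = [math.sin(2 * math.pi * f0 * n / fs) for n in range(441)]
    echo = [0.3 * v for v in reference]   # stand-in for a band-passed echo
    print(attenuation_from_rms(echo, reference))
```

RMS over whole cycles does not depend on phase, so the echo block need not be time-aligned with the reference block, and no FFT is involved.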
在另一个可能的设计中,确定回声信道的时延包括:记录开始输出音频参考信号的第一时间,并记录检测到音频输入信号中开始出现音频参考信号的回声的第二时间;所述时延为所述第二时间与第一时间的时间差。In another possible design, determining the delay of the echo channel includes: recording a first time at which output of the audio reference signal starts, and recording a second time at which the echo of the audio reference signal is first detected in the audio input signal; the delay is the time difference between the second time and the first time.
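A minimal sketch of the delay measurement, assuming playback and capture run on a shared sample clock so that the first time T1 can be expressed as a capture-buffer index, and that the input has already been band-pass filtered around the reference frequency. A fixed amplitude threshold is a simplification of "starts to be detected"; the source does not say how detection is done. All names are hypothetical.

```python
import math
from typing import Optional, Sequence


def first_onset(samples: Sequence[float], threshold: float,
                start: int = 0) -> Optional[int]:
    """Index of the first sample at or after `start` whose magnitude exceeds
    `threshold`, or None if the echo never appears."""
    for n in range(start, len(samples)):
        if abs(samples[n]) > threshold:
            return n
    return None


def echo_delay(t1_index: int, filtered_input: Sequence[float],
               threshold: float, fs: float) -> Optional[float]:
    """Delay t = T2 - T1 in seconds.

    t1_index: capture-buffer index at which reference output started (T1).
    T2 is the first sample of the band-passed input above the threshold.
    """
    t2_index = first_onset(filtered_input, threshold, start=t1_index)
    if t2_index is None:
        return None
    return (t2_index - t1_index) / fs


if __name__ == "__main__":
    fs = 48_000.0
    # Band-passed mic signal: silence, then a 20 kHz echo arriving at sample 600.
    captured = [0.0] * 600 + [0.2 * math.cos(2 * math.pi * 20_000 * n / fs)
                              for n in range(100)]
    t = echo_delay(t1_index=120, filtered_input=captured, threshold=0.05, fs=fs)
    print(t, round(t * fs))     # delay in seconds and in samples
```

The delay in samples, round(t * fs), is the quantity a sample-domain cancellation step would use.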
在另一个可能的设计中,所述音频参考信号的频率大于人耳可听见声音的频率范围。In another possible design, the frequency of the audio reference signal lies above the range of frequencies audible to the human ear.
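A sketch of generating the reference tone and checking the two frequency constraints described in this document: the tone sits at or above the 20,000 Hz limit of hearing, and the sampling frequency exceeds twice the tone frequency. The sine form a0 * sin(2*pi*f0*n/fs) is an assumption built from the parameters a0, f0 and fs that the description defines; `make_reference_tone` and `AUDIBLE_UPPER_HZ` are hypothetical. The check admits exactly 20 kHz because the description uses that value as its example.

```python
import math
from typing import List

AUDIBLE_UPPER_HZ = 20_000.0   # upper limit of hearing cited in the description


def make_reference_tone(a0: float, f0: float, fs: float,
                        n_samples: int) -> List[float]:
    """Hypothetical generator for C0(n) = a0 * sin(2*pi*f0*n/fs).

    Rejects tones below the audible upper limit and sampling rates that
    violate fs > 2*f0.
    """
    if f0 < AUDIBLE_UPPER_HZ:
        raise ValueError(f"f0={f0} Hz is inside the audible band")
    if fs <= 2.0 * f0:
        raise ValueError(f"fs={fs} Hz must exceed 2*f0={2.0 * f0} Hz")
    return [a0 * math.sin(2.0 * math.pi * f0 * n / fs) for n in range(n_samples)]


if __name__ == "__main__":
    tone = make_reference_tone(a0=0.1, f0=20_000.0, fs=44_100.0, n_samples=441)
    print(len(tone), max(tone))
    try:
        make_reference_tone(a0=0.1, f0=20_000.0, fs=32_000.0, n_samples=10)
    except ValueError as err:
        print("rejected:", err)
```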
在另一个可能的设计中,所述输出音频参考信号在所述终端设备开机时进行,或周期性地进行。In another possible design, the audio reference signal is output when the terminal device is powered on, or periodically.
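A minimal sketch of when to (re)measure the echo channel: once at power-on, and optionally every `period_s` seconds thereafter. The class `EchoChannelState` and the 10-minute period in the example are hypothetical; the source does not give a period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EchoChannelState:
    """Hypothetical holder for echo-channel parameters and their refresh policy."""
    period_s: Optional[float] = None          # None: calibrate only at power-on
    last_calibration_s: Optional[float] = None
    delay_s: Optional[float] = None
    attenuation: Optional[float] = None

    def needs_calibration(self, now_s: float) -> bool:
        """True at first use (power-on) and, if a period is set, when it elapses."""
        if self.last_calibration_s is None:
            return True
        if self.period_s is None:
            return False
        return now_s - self.last_calibration_s >= self.period_s

    def update(self, now_s: float, delay_s: float, attenuation: float) -> None:
        """Store freshly measured parameters; reference output can then stop."""
        self.last_calibration_s = now_s
        self.delay_s = delay_s
        self.attenuation = attenuation


if __name__ == "__main__":
    state = EchoChannelState(period_s=600.0)   # re-measure every 10 minutes (assumed)
    for now in (0.0, 300.0, 600.0):
        if state.needs_calibration(now):
            # Here the device would output the reference tone and measure t and r.
            state.update(now, delay_s=0.01, attenuation=0.3)
            print(f"t={now:>5.0f} s: calibrated")
        else:
            print(f"t={now:>5.0f} s: reusing t={state.delay_s}, r={state.attenuation}")
```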
第二方面,本发明实施例提供了一种终端设备,具有实现上述方法的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的单元,如包括音频输出单元,音频输入单元,和处理单元。In a second aspect, an embodiment of the present invention provides a terminal device that has the function of implementing the above method. The function can be realized by hardware, or can also be realized by hardware executing corresponding software. The hardware or software includes one or more units corresponding to the above functions, such as an audio output unit, an audio input unit, and a processing unit.
在一个可能的设计中,终端设备的结构中包括处理器和存储器,所述存储器用于存储支持上述方法的应用程序代码,所述处理器被配置为用于执行所述存储器中存储的程序。In a possible design, the structure of the terminal device includes a processor and a memory, where the memory is used to store application program codes that support the above method, and the processor is configured to execute the program stored in the memory.
第三方面,本发明实施例提供了一种计算机存储介质,用于储存为上述终端设备所用的计算机软件指令,其包含用于执行上述方法所设计的程序。In a third aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the above-mentioned terminal device, which includes a program designed to execute the above-mentioned method.
本发明实施例提供的上述消除回声的方法和终端设备,通过输出音频回声参数并采集其回声,从而确定回声信道的特征参数,实现了回声消除。极大的减小了回声对用户语音输入的干扰,提高输入语音的质量。从而可以提高后续的语音处理,如语音识别等的质量和性能。The above method and terminal device for echo cancellation provided by an embodiment of the present invention achieve echo cancellation by outputting audio echo parameters and collecting their echoes, thereby determining characteristic parameters of the echo channel. It greatly reduces the interference of echo to the user's voice input and improves the quality of the input voice. This can improve the quality and performance of subsequent speech processing, such as speech recognition.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic diagram of a system architecture for echo cancellation according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an echo cancellation method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another terminal device according to an embodiment of the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present invention clearer, the following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
When a terminal device interacts with a user by voice, it may be playing audio or video content at the same time. The played sound produces an echo in the microphone, and the user's voice input is often corrupted by this echo, which degrades the terminal device's ability to recognize the voice input.
The echo cancellation method provided by the embodiments of the present invention is applied to the system shown in FIG. 1, which includes a terminal device 101, a speaker 102, and a microphone 103. The terminal device in FIG. 1 may be a personal computer (PC), a mobile phone, a set-top box, a smart speaker, a smart TV, or a similar device. The terminal device may have the speaker 102 and the microphone 103 built in, as a mobile phone does. It may also use an external speaker and microphone: for example, a personal computer connected to an external speaker and microphone, or a set-top box that uses a connected television as its audio and video playback device.
The terminal device 101 outputs the audio content signal of audio/video program content to the speaker 102, and also outputs an audio reference signal to the speaker. The audio reference signal is usually a high-frequency signal whose frequency lies above the range audible to the human ear. The human ear typically hears sounds from 20 Hz to 20,000 Hz, so the frequency of the audio reference signal can be chosen above 20,000 Hz. The terminal device also captures the audio input signal from the microphone and processes it to remove the echo mixed into it, recovering the user's voice input.
The speaker 102 plays the audio signals output by the terminal device, including the audio content signal and the audio reference signal. The user can hear the played audio content signal but not the audio reference signal, so the reference signal does not affect the user experience. The sound of either signal propagates to the microphone 103 and produces an echo there.
The microphone 103 receives the user's speech during voice interaction with the terminal device. The sound it receives may also contain the echo of the audio content signal played by the speaker, or the echo of the audio reference signal.
The sound output by the speaker produces an echo in the microphone through effects such as diffraction and reflection. The echo signal can be regarded as the audio signal after it has passed through an echo channel. The echo channel affects the sound in two ways: it introduces a time delay and it attenuates the energy. In general, the echo channel affects the audio content signal in much the same way as it affects the audio reference signal. The audio reference signal can therefore be analyzed to obtain the echo channel characteristic parameters, namely the time delay and the attenuation coefficient, and these two parameters can then be used to cancel the echo of the audio content signal.
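As a minimal sketch of this two-parameter channel model, assuming the echo path is exactly a whole-sample delay plus a single scalar gain (real rooms add multipath and frequency-dependent losses that this ignores), the echo of a played signal can be simulated as follows. The function name `echo_channel` and its parameters are illustrative, not taken from the patent.

```python
def echo_channel(x0: list[float], delay_samples: int, r: float) -> list[float]:
    """Simulate the two-parameter echo channel: X(n) = r * X0(n - d).

    Samples before the echo arrives (n < d) are zero.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    return [
        r * x0[n - delay_samples] if n >= delay_samples else 0.0
        for n in range(len(x0))
    ]


# A 4-sample delay with attenuation 0.5.
print(echo_channel([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 4, 0.5))
# [0.0, 0.0, 0.0, 0.0, 0.5, 1.0]
```

Because the model has only two parameters, measuring them once with a known reference tone is enough to predict the echo of any other played signal, which is the premise of the method.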
As shown in FIG. 1, let the terminal device 101 output an audio signal to the speaker 102: either the audio content signal $X_0(n)$ or the audio reference signal $C_0(n)$. The sound from the speaker propagates to the microphone, producing the echo $X(n)$ of the audio content signal or the echo $C(n)$ of the audio reference signal. When the user interacts with the system, the user's voice input $S_0(n)$ is captured by the microphone 103, and the captured signal $S(n)$ contains both $S_0(n)$ and, possibly, the echo $X(n)$ of the audio content signal. The terminal device needs to remove the echo $X(n)$ from the captured signal $S(n)$, that is, to compute formula (1):
$$S_0(n) = S(n) - X(n) \qquad (1)$$
Applied to the system shown in FIG. 1, an embodiment of the present invention provides an echo cancellation method. As shown in FIG. 2, it includes the following steps.
201. Output an audio reference signal.
As described above, so as not to disturb the user, the frequency of the audio reference signal $C_0(n)$ is usually chosen in a high-frequency band inaudible to the human ear, for example 20 kHz. If the terminal device is playing audio/video program content, the audio reference signal can be superimposed on the audio content signal and output together with it without affecting the user's listening. An example of $C_0(n)$ is:
$$C_0(n) = A_0 \sin\left(2\pi \frac{f_0}{f_s} n\right) \qquad (2)$$
where $A_0$ is the amplitude of the audio reference signal, $f_0$ is its frequency, and $f_s$ is the system's digital sampling frequency.
The system sampling frequency must be greater than twice the frequency of the audio reference signal. For example, when the reference frequency is 20 kHz, the common sampling frequency of 44.1 kHz meets this requirement.
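Formula (2) and the sampling requirement can be sketched in Python as follows. The amplitude 0.1 and the 256-sample length are illustrative values chosen for this sketch, not values given in the text, and `reference_signal` is a hypothetical helper name.

```python
import math


def reference_signal(a0: float, f0: float, fs: float, num_samples: int) -> list[float]:
    """Generate C0(n) = A0 * sin(2*pi*f0/fs * n), as in formula (2)."""
    # Sampling theorem: fs must exceed 2*f0, otherwise the tone aliases.
    if fs <= 2 * f0:
        raise ValueError(f"fs={fs} Hz must be greater than 2*f0={2 * f0} Hz")
    return [a0 * math.sin(2 * math.pi * f0 / fs * n) for n in range(num_samples)]


# 20 kHz tone at 44.1 kHz: 44.1 kHz > 40 kHz, so the check passes.
c0 = reference_signal(a0=0.1, f0=20_000, fs=44_100, num_samples=256)
```

The guard makes the Nyquist condition explicit: a 20 kHz reference at a 32 kHz sampling rate, for instance, would raise `ValueError` instead of silently producing an aliased tone.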
The audio reference signal can be output when the terminal device is powered on, to determine the echo channel characteristic parameters; once they have been determined, output of the reference signal can stop, and echo cancellation of subsequent voice input uses the determined parameters.
The system can also output the audio reference signal periodically and re-determine the echo channel characteristic parameters, continually updating them to track possible changes in the terminal device's surroundings.
202. Capture the audio input signal.
In addition to any voice input from the user of the terminal device, the microphone's audio input signal $S(n)$ contains the echo $C(n)$ of the audio reference signal after it has passed through the echo channel.
203. Determine the time delay and attenuation coefficient of the echo channel from the echo of the audio reference signal.
When output of the audio reference signal starts in step 201, the start time $T_1$ is recorded.
Repeatedly perform a discrete Fourier transform (DFT) on the captured microphone audio input signal $S(n)$. For example, for an audio input signal sampled at 44.1 kHz, every 5.8 ms of captured data (256 samples) supports one 256-point fast Fourier transform (FFT). When the FFT result contains a component at the reference signal frequency, the captured microphone input is considered to contain the echo of the audio reference signal. Because the reference frequency is higher than that of ordinary sounds, the played audio content signal contains no component at that frequency, so any input at the reference frequency in the captured signal comes from the echo of the audio reference signal.
Record this time $T_2$, the moment the microphone starts to receive the echo of the audio reference signal. The delay of the echo channel is:
$$t = T_2 - T_1 \qquad (3)$$
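A minimal sketch of the frame-by-frame detection and formula (3), assuming microphone capture begins at the same instant $T_1$ as reference playback and that a fixed magnitude threshold decides whether the tone is present (the text does not say how presence is decided). It evaluates only the single DFT bin nearest $f_0$ rather than a full FFT, which yields the same value for that bin. Because $T_2$ is taken at the start of the first frame that contains the tone, the delay is quantized to the frame length (256 samples, about 5.8 ms at 44.1 kHz). `detect_echo_delay` and `threshold` are hypothetical names.

```python
import cmath
import math
from typing import Optional


def bin_magnitude(frame: list[float], k: int) -> float:
    """|X[k]| of one frame, computed as a single-bin DFT."""
    n_pts = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_pts)
                   for n, x in enumerate(frame)))


def detect_echo_delay(mic: list[float], f0: float, fs: float, t1: float,
                      frame_len: int = 256,
                      threshold: float = 1.0) -> Optional[float]:
    """Return t = T2 - T1 in seconds, or None if the tone is never detected.

    mic[0] is assumed to be captured at time t1, when reference output starts.
    """
    k = round(f0 * frame_len / fs)  # DFT bin closest to the reference frequency
    for start in range(0, len(mic) - frame_len + 1, frame_len):
        if bin_magnitude(mic[start:start + frame_len], k) > threshold:
            t2 = t1 + start / fs  # start of the first frame containing the echo
            return t2 - t1        # formula (3)
    return None
```

For example, with 1024 silent samples followed by a 20 kHz echo, the function reports a delay of 1024 / 44100 s, about 23.2 ms.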
After the Fourier transform, the echo of the audio reference signal is a sum of impulses in the frequency domain:
$$|c(f)| = \sum_i A_i \, \delta(f - i f_0) \qquad (4)$$
where $f_0$ is the frequency of the original audio reference signal, that is, the fundamental frequency after the Fourier transform, and $A_1$ is the amplitude at the fundamental $f_0$. The remaining terms are secondary frequency components; given the spectral response characteristics of the speaker, the microphone, and the environment, their amplitudes can usually be neglected in practice.
The attenuation coefficient $r$ of the echo channel, that is, the ratio of the amplitude of the audio reference signal's echo to the amplitude of the original reference signal, can then be expressed as:
$$r = \frac{A_1}{A_0} \qquad (5)$$
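Formula (5) can be sketched by reading the amplitude $A_1$ off the DFT bin at $f_0$. For a sinusoid of amplitude $A$ that falls exactly on bin $k$ of an $N$-point DFT, $|X[k]| = AN/2$, so $A = 2|X[k]|/N$. Tones that fall between bins read low, so this sketch assumes a frame length that places $f_0$ on a bin: at 44.1 kHz, a 441-sample frame has 100 Hz bins and puts 20 kHz exactly on bin 200. The function names are illustrative.

```python
import cmath
import math


def tone_amplitude(frame: list[float], f0: float, fs: float) -> float:
    """Estimate the amplitude of the f0 component as 2*|X[k]|/N."""
    n_pts = len(frame)
    k = round(f0 * n_pts / fs)
    xk = sum(x * cmath.exp(-2j * math.pi * k * n / n_pts)
             for n, x in enumerate(frame))
    return 2 * abs(xk) / n_pts


def attenuation_fft(echo_frame: list[float], a0: float, f0: float, fs: float) -> float:
    """Formula (5): r = A1 / A0."""
    return tone_amplitude(echo_frame, f0, fs) / a0


fs, f0, a0 = 44_100, 20_000, 0.1
# Simulated echo: amplitude 0.03 with an arbitrary phase shift, 441 samples.
echo = [0.03 * math.sin(2 * math.pi * f0 / fs * n + 0.7) for n in range(441)]
r = attenuation_fft(echo, a0, f0, fs)  # approximately 0.3
```

The magnitude of the bin does not depend on the echo's phase, which is why the delay and the attenuation can be estimated independently.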
204. Cancel the echo of the audio content signal in the audio input signal according to the time delay and attenuation coefficient.
Once the delay $t$ and the attenuation coefficient $r$ of the echo channel have been determined by the above steps, during subsequent voice interaction the terminal device removes the echo of the played audio content signal from the microphone's input signal to obtain the user's voice input.
That is, the echo of the audio content signal can be expressed as $X(n) = r \cdot X_0(n - t f_s)$, and the user's voice input is:
$$S_0(n) = S(n) - r \cdot X_0(n - t f_s) \qquad (6)$$
where $f_s$ is the system sampling frequency. The user's voice input after echo cancellation can be used as the input to speech recognition.
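Formula (6) can be sketched as follows, assuming the played content $X_0(n)$ and the microphone signal $S(n)$ are indexed on a common sample clock (index 0 of both lists is the same instant) and rounding $t f_s$ to the nearest whole sample, since the formula needs an integer index. `cancel_echo` is an illustrative name.

```python
def cancel_echo(s: list[float], x0: list[float], t: float, r: float,
                fs: float) -> list[float]:
    """Formula (6): S0(n) = S(n) - r * X0(n - t*fs)."""
    d = round(t * fs)  # echo delay in whole samples
    out = []
    for n, s_n in enumerate(s):
        m = n - d
        echo = r * x0[m] if 0 <= m < len(x0) else 0.0  # no echo before it arrives
        out.append(s_n - echo)
    return out


fs = 44_100
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # played content X0(n)
voice = [0.1] * 8                                # user's speech S0(n)
s = [v + (0.5 * x0[n - 2] if n >= 2 else 0.0)    # mic: speech + echo (d=2, r=0.5)
     for n, v in enumerate(voice)]
s0 = cancel_echo(s, x0, t=2 / fs, r=0.5, fs=fs)  # recovers ~0.1 at every sample
```

When the true delay is not a whole number of samples, rounding leaves a small residual echo; a fractional-delay filter would reduce it, but that goes beyond what the text describes.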
Preferably, the audio input signal captured in step 202 may first be band-pass filtered to extract the echo of the audio reference signal. The discrete Fourier transform in step 203 then operates only on the echo of the audio reference signal, which greatly speeds up the subsequent Fourier transform computation.
Based on the frequency $f_0$ of the audio reference signal, the system can set the bandwidth $f_B$ of the band-pass filter. The band-pass filtering can be expressed as:
$$C(n) = \mathrm{bandpass}\left(S(n), f_0, f_B\right) \qquad (7)$$
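The text does not specify how the band-pass filter in formula (7) is designed. As one possible choice (an assumption, not the patent's design), the sketch below uses a single second-order band-pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook, with unity gain at $f_0$ and $Q \approx f_0 / f_B$. A sharper response would need a higher-order filter.

```python
import math


def bandpass(s: list[float], f0: float, f_b: float, fs: float) -> list[float]:
    """Formula (7) sketched as one band-pass biquad (Audio EQ Cookbook).

    Centre frequency f0, bandwidth roughly f_b (Q = f0 / f_b), gain 1 at f0.
    """
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * f_b / (2 * f0)       # sin(w0) / (2Q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0            # b1 = 0 for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in s:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2  # direct form I
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


fs, f0 = 44_100, 20_000
# Microphone input: a loud 1 kHz "voice" tone plus a weak 20 kHz reference echo.
s = [math.sin(2 * math.pi * 1_000 / fs * n) + 0.1 * math.sin(2 * math.pi * f0 / fs * n)
     for n in range(4_410)]
c = bandpass(s, f0=f0, f_b=1_000, fs=fs)  # mostly the 20 kHz echo remains
```

With these settings the 1 kHz component is attenuated by more than 60 dB, while the 20 kHz component passes at unity gain once the filter's start-up transient has died out.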
Further, for the echo of the audio reference signal output by the band-pass filter, the root-mean-square (RMS) value of the filtered signal can be computed directly in the time domain, giving the mean energy $E_1$ of the reference echo. Likewise, the mean energy $E_0$ of the original audio reference signal is computed in the time domain from its RMS value. Because amplitude scales with the square root of energy, the attenuation coefficient $r$ of the echo channel, that is, the ratio of the amplitude of the audio reference signal's echo to the amplitude of the original audio reference signal, can be expressed as:
$$r = \left(\frac{E_1}{E_0}\right)^{1/2} \qquad (8)$$
The delay of the echo channel is still obtained with formula (3).
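Formula (8) can be sketched in the time domain as follows. Here $E$ is taken as the mean energy (mean square) of a block, so the RMS value is $\sqrt{E}$ and the amplitude ratio is $\sqrt{E_1/E_0}$. The sketch assumes both blocks hold a steady tone at the same frequency, and that the echo block is taken after the echo has arrived (after the delay of formula (3)) and after the band-pass filter has settled. The function names are illustrative.

```python
import math


def mean_energy(x: list[float]) -> float:
    """Mean energy (mean square) of a block; its square root is the RMS value."""
    return sum(v * v for v in x) / len(x)


def attenuation_rms(echo_block: list[float], reference_block: list[float]) -> float:
    """Formula (8): r = (E1 / E0) ** (1/2), with no FFT required."""
    return math.sqrt(mean_energy(echo_block) / mean_energy(reference_block))


fs, f0 = 44_100, 20_000
ref = [0.1 * math.sin(2 * math.pi * f0 / fs * n) for n in range(441)]         # C0(n)
echo = [0.03 * math.sin(2 * math.pi * f0 / fs * n + 0.7) for n in range(441)]  # filtered C(n)
r = attenuation_rms(echo, ref)  # approximately 0.3
```

Only a sum of squares per block is needed, which is why this variant avoids the FFT of the earlier approach.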
In this way, echo cancellation requires no FFT computation, which further speeds up the system's echo cancellation.
In the above embodiments of the present invention, the echo channel characteristic parameters are determined from the audio reference signal, which achieves echo cancellation, reduces echo interference with the user's voice input, and improves the quality of the input speech.
An embodiment of the present invention further provides a terminal device, whose structure is shown schematically in FIG. 3. It includes an audio output unit 301, an audio input unit 302, and a processing unit 303, where:
the audio output unit is configured to output an audio reference signal;
the audio input unit is configured to capture an audio input signal, the audio input signal containing the echo of the audio reference signal;
the processing unit is configured to determine the time delay and attenuation coefficient of the echo channel from the echo of the audio reference signal, and to cancel the echo of the audio content signal in the audio input signal according to the time delay and attenuation coefficient.
These units further implement the related functions of the foregoing method, which are not repeated here.
In this embodiment, the terminal device is presented in the form of functional units. A "unit" here may be an application-specific integrated circuit (ASIC), a circuit, a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions. In a simple embodiment, those skilled in the art will appreciate that the terminal device may be implemented with a processor, a memory, and a communication interface.
The terminal device in the embodiments of the present invention may also be implemented as the computer device (or system) in FIG. 4, which is a schematic diagram of a computer device according to an embodiment of the present invention. The computer device includes at least one processor 401, a communication bus 402, a memory 403, and at least one communication interface 404, and may further include an I/O interface 405.
The processor may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solution of the present invention.
The communication bus may include a path for transferring information between the above components. The communication interface uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without being limited thereto. The memory may exist independently and be connected to the processor through the bus, or it may be integrated with the processor.
The memory stores application program code for executing the solution of the present invention, and execution is controlled by the processor. The processor is configured to execute the application program code stored in the memory.
In a specific implementation, the processor may include one or more CPUs, each of which may be a single-core or a multi-core processor. The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the computer device may further include an input/output (I/O) interface. For example, an output device may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. An input device may be a mouse, a keyboard, a touchscreen device, a sensing device, at least two imaging sensors, or the like.
The above computer device may be a general-purpose or a special-purpose computer device. In a specific implementation, it may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in FIG. 4. The embodiments of the present invention do not limit the type of computer device.
The terminal device in FIG. 1 may be the device shown in FIG. 4, with one or more software modules stored in its memory. The terminal device can implement the software modules through the processor and the program code in the memory to perform the above method.
An embodiment of the present invention further provides a computer storage medium for storing computer software instructions used by the device shown in FIG. 3 or FIG. 4, including a program designed to execute the above method embodiment. The above method can be implemented by executing the stored program.
Although the present invention is described herein with reference to various embodiments, in practicing the claimed invention, those skilled in the art can understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to advantage.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code. The computer program may be stored in or distributed on a suitable medium, provided together with or as part of other hardware, or distributed in other forms, such as over the Internet or other wired or wireless telecommunication systems.
The present invention is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the present invention has been described with reference to specific features and embodiments thereof, various modifications and combinations can obviously be made to it. Accordingly, the specification and drawings are merely exemplary illustrations of the invention as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the invention. Those skilled in the art can make various changes and variations to the present invention without departing from its scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (14)

  1. A method for eliminating echo, applied to a terminal device, comprising:
    outputting an audio reference signal;
    capturing an audio input signal, the audio input signal containing the echo of the audio reference signal;
    determining the time delay and attenuation coefficient of an echo channel according to the echo of the audio reference signal; and
    cancelling the echo of an audio content signal in the audio input signal according to the time delay and the attenuation coefficient.
  2. The method according to claim 1, wherein determining the attenuation coefficient of the echo channel comprises:
    calculating, by a Fourier transform of the audio input signal, the amplitude of the echo signal at the frequency of the audio reference signal;
    wherein the ratio of the amplitude of the echo signal at the frequency of the audio reference signal to the amplitude of the output audio reference signal is the attenuation coefficient of the echo signal.
  3. The method according to claim 1, further comprising filtering the audio input signal through a band-pass filter to obtain the echo of the audio reference signal.
  4. The method according to claim 3, wherein determining the attenuation coefficient of the echo channel comprises:
    calculating the amplitude of the echo signal at the frequency of the audio reference signal by means of a root-mean-square value;
    wherein the ratio of the amplitude of the echo signal at the frequency of the audio reference signal to the amplitude of the output audio reference signal is the attenuation coefficient of the echo signal.
  5. The method according to any one of claims 1 to 4, wherein determining the time delay of the echo channel comprises:
    recording a first time at which output of the audio reference signal starts, and recording a second time at which the echo of the audio reference signal is first detected in the audio input signal, the time delay being the difference between the second time and the first time.
  6. The method according to any one of claims 1 to 5, wherein the frequency of the audio reference signal is above the frequency range audible to the human ear.
  7. The method according to any one of claims 1 to 6, wherein the audio reference signal is output when the terminal device is powered on, or periodically.
  8. 一种终端设备,其特征在于,包括:音频输出单元,音频输入单元和处理单元;其中:A terminal device is characterized by comprising: an audio output unit, an audio input unit and a processing unit; wherein:
    所述音频输出单元,用于输出音频参考信号;The audio output unit is used to output an audio reference signal;
    所述音频输入单元,用于采集音频输入信号,所述音频输入信号中包含了音频参考信号的回声;The audio input unit is used to collect audio input signals, and the audio input signals include echoes of audio reference signals;
    所述处理单元,用于根据音频参考信号的回声确定回声信道的时延和衰减系数,并根据所述时延和衰减系数消除音频输入信号中的音频内容信号的回声。The processing unit is configured to determine the delay and attenuation coefficient of the echo channel according to the echo of the audio reference signal, and eliminate the echo of the audio content signal in the audio input signal according to the delay and attenuation coefficient.
  9. 如权利要求8所述的终端设备,其特征在于,所述处理单元用于确定回声信道的衰减系数具体包括:The terminal device according to claim 8, wherein the processing unit for determining the attenuation coefficient of the echo channel specifically includes:
    所述处理单元进一步用于对音频输入信号通过傅里叶变换计算出在音频参考信号频率上的回声信号幅值;The processing unit is further used to calculate the amplitude of the echo signal at the frequency of the audio reference signal by Fourier transform of the audio input signal;
    所述音频参考信号频率上的回声信号幅值与所述输出的音频参考信号的信号幅值比值即为回声信号的衰减系数。The ratio of the amplitude of the echo signal at the frequency of the audio reference signal to the amplitude of the signal of the output audio reference signal is the attenuation coefficient of the echo signal.
  10. 如权利要求8所述的终端设备,其特征在于,所述所述处理单元进一步用于将音频输入信号通过带通滤波器进行滤波,获得所述音频参考信号的回声。The terminal device according to claim 8, wherein the processing unit is further configured to filter the audio input signal through a band-pass filter to obtain the echo of the audio reference signal.
  11. The terminal device according to claim 10, wherein the determining, by the processing unit, of the attenuation coefficient of the echo channel specifically comprises:
    the processing unit is further configured to calculate the amplitude of the echo signal at the frequency of the audio reference signal as a root-mean-square (RMS) value; and
    the ratio of the echo signal amplitude at the frequency of the audio reference signal to the amplitude of the output audio reference signal is the attenuation coefficient of the echo signal.
  12. The terminal device according to any one of claims 8 to 11, wherein the determining, by the processing unit, of the delay of the echo channel comprises:
    the processing unit is further configured to record a first time at which output of the audio reference signal starts, and a second time at which the echo of the audio reference signal is first detected in the audio input signal; the delay is the time difference between the second time and the first time.
  13. The terminal device according to any one of claims 8 to 12, wherein the frequency of the audio reference signal lies above the frequency range audible to the human ear.
  14. The terminal device according to any one of claims 8 to 13, wherein the audio output unit outputs the audio reference signal when the terminal device is powered on, or periodically.
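Claim 8 implies a single-tap model of the echo channel: once the delay D and the attenuation coefficient a are known, the echo of the played content can be estimated as a times the content delayed by D and subtracted from the microphone signal. The Python sketch below illustrates that reading under stated assumptions: playback and capture share one sample clock, D is expressed in samples, and the function name `cancel_echo` is hypothetical, not taken from the patent.

```python
def cancel_echo(mic, playback, delay_samples, attenuation):
    """Remove the echo of the played-back content from the microphone signal.

    Assumed (hypothetical) single-tap echo channel:
        echo[n] = attenuation * playback[n - delay_samples]
    """
    cleaned = []
    for n, x in enumerate(mic):
        k = n - delay_samples
        # Before the echo arrives (or after playback ends) there is nothing to subtract.
        echo = attenuation * playback[k] if 0 <= k < len(playback) else 0.0
        cleaned.append(x - echo)
    return cleaned
```

A delay measured in seconds would be converted with `round(delay * sample_rate)` before the call. Real echo paths usually contain many reflections, which is why conventional cancellers rely on adaptive multi-tap filters; this single-tap subtraction only mirrors the delay-and-attenuation model stated in the claims.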
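Claim 9 derives the attenuation coefficient from the Fourier transform of the captured signal at the reference frequency. Because only one frequency is of interest, a single DFT bin is enough. The sketch below assumes a reference tone of known amplitude, a capture window holding a whole number of tone periods (so the bin is exact and other components do not leak into it), and hypothetical function names.

```python
import math


def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of the sinusoid at freq_hz, from a single DFT bin.

    For A*sin(2*pi*f*t + phi) spanning a whole number of periods, the bin
    magnitude is A*N/2, so the amplitude is 2*|X| / N.
    """
    n = len(samples)
    re = 0.0
    im = 0.0
    for k, x in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * k / sample_rate
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return 2.0 * math.hypot(re, im) / n


def attenuation_from_dft(captured, ref_amplitude, ref_freq_hz, sample_rate):
    """Echo amplitude at the reference frequency divided by the emitted amplitude."""
    return tone_amplitude(captured, ref_freq_hz, sample_rate) / ref_amplitude
```

For example, with a 48 kHz sample rate, a 21 kHz reference of amplitude 1.0 and a 0.1 s window (4800 samples), a captured echo of amplitude 0.3 mixed with a 1 kHz content tone yields an attenuation coefficient of about 0.3, independent of the echo's phase.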
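Claims 10 and 11 describe an alternative route: isolate the reference echo with a band-pass filter, then measure its level as a root-mean-square value. A minimal sketch follows. It assumes a second-order band-pass biquad taken from the widely used RBJ audio-EQ cookbook (the patent does not name a filter design), a Q of 10, and that the first 480 output samples are discarded while the filter settles; these values and the function names are illustrative. For a sinusoid the peak amplitude equals sqrt(2) times the RMS value, which converts the RMS reading back to an amplitude before dividing by the emitted amplitude.

```python
import math


def bandpass(samples, center_hz, sample_rate, q=10.0):
    """Second-order IIR band-pass (RBJ cookbook, 0 dB gain at center_hz)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def attenuation_from_rms(captured, ref_amplitude, ref_freq_hz, sample_rate, settle=480):
    """Band-pass the capture, skip the filter transient, convert RMS to amplitude."""
    echo = bandpass(captured, ref_freq_hz, sample_rate)[settle:]
    rms = math.sqrt(sum(v * v for v in echo) / len(echo))
    # Peak amplitude of a sinusoid = sqrt(2) * RMS.
    return math.sqrt(2.0) * rms / ref_amplitude
```

Compared with the single-bin Fourier approach, the filter-plus-RMS route is cheap to run sample by sample, at the cost of a settling period and some residual leakage from content near the reference frequency.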
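Claim 12 defines the delay as the difference between the time the reference output starts and the time its echo first appears in the capture. The sketch below measures both times as sample indices on a shared clock and detects the echo with a simple amplitude threshold. It assumes the capture is otherwise quiet during calibration (or has already been band-passed around the reference frequency); the threshold of 0.05 and the function names are illustrative assumptions.

```python
def first_echo_index(captured, threshold):
    """Index of the first sample whose magnitude exceeds threshold, else None."""
    for i, x in enumerate(captured):
        if abs(x) > threshold:
            return i
    return None


def echo_delay_seconds(emit_start_index, captured, sample_rate, threshold=0.05):
    """Delay = second time (echo first detected) - first time (reference output starts).

    Both indices are assumed to count samples on the same clock, i.e. playback
    and capture were started together. Returns None if no echo is detected.
    """
    detect_index = first_echo_index(captured, threshold)
    if detect_index is None:
        return None
    first_time = emit_start_index / sample_rate
    second_time = detect_index / sample_rate
    return second_time - first_time
```

Because the tone may cross zero right at its onset, threshold detection can lag the true arrival by a sample or two; at 48 kHz one sample is about 21 microseconds.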

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811542603.9A CN111402910B (en) 2018-12-17 2018-12-17 Method and equipment for eliminating echo
CN201811542603.9 2018-12-17

Publications (1)

Publication Number Publication Date
WO2020125325A1 (en) 2020-06-25

Family ID: 71100733

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120452 WO2020125325A1 (en) 2018-12-17 2019-11-23 Method for eliminating echo and device

Country Status (2)

Country Link
CN (1) CN111402910B (en)
WO (1) WO2020125325A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362819B (en) * 2021-05-14 2022-06-14 歌尔股份有限公司 Voice extraction method, device, equipment, system and storage medium
CN113938746B (en) * 2021-09-28 2023-10-27 广州华多网络科技有限公司 Network live broadcast audio processing method and device, equipment, medium and product thereof
CN113891152A (en) * 2021-09-28 2022-01-04 广州华多网络科技有限公司 Audio playing control method and device, equipment, medium and product thereof

Citations (5)

Publication number Priority date Publication date Assignee Title
JPH0837480A (en) * 1994-07-22 1996-02-06 Fujitsu Ltd Echo canceler
CN101114844A * 2006-07-26 2008-01-30 冲电气工业株式会社 Resonance monitoring system and method
CN101312372A (en) * 2008-05-12 2008-11-26 北京创毅视讯科技有限公司 Echo eliminator and echo eliminating method
CN103391381A (en) * 2012-05-10 2013-11-13 中兴通讯股份有限公司 Method and device for canceling echo
CN106657507A (en) * 2015-11-03 2017-05-10 中移(杭州)信息技术有限公司 Acoustic echo cancellation method and device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JP4165578B2 (en) * 1997-02-25 2008-10-15 日本ビクター株式会社 Digital audio signal processing recording medium, digital audio signal communication method and reception method, and digital audio recording medium
JP2000049668A (en) * 1998-07-29 2000-02-18 Oki Electric Ind Co Ltd Echo canceler
US6256161B1 (en) * 1999-05-20 2001-07-03 Agere Systems Guardian Corp. Echo cancellation for disk drive read circuit
JP4192483B2 (en) * 2002-03-25 2008-12-10 ソニー株式会社 Echo canceller and echo canceling method
CN108133712B (en) * 2016-11-30 2021-02-12 华为技术有限公司 Method and device for processing audio data
CN106898359B (en) * 2017-03-24 2020-03-17 上海智臻智能网络科技股份有限公司 Audio signal processing method and system, audio interaction device and computer equipment
CN108322859A (en) * 2018-02-05 2018-07-24 北京百度网讯科技有限公司 Equipment, method and computer readable storage medium for echo cancellor

Also Published As

Publication number Publication date
CN111402910B (en) 2023-09-01
CN111402910A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
US11605393B2 (en) Audio cancellation for voice recognition
WO2020125325A1 (en) Method for eliminating echo and device
JP5497217B2 (en) Headphone correction system
CN109285554B (en) Echo cancellation method, server, terminal and system
CN106535039A (en) Loudness-Based Audio-Signal Compensation
US8498429B2 (en) Acoustic correction apparatus, audio output apparatus, and acoustic correction method
US11632200B2 (en) Measuring and evaluating a test signal generated by a device under test (DUT)
CN101909191B (en) Video processing apparatus and video processing method
JP2023520570A (en) Volume automatic adjustment method, device, medium and equipment
CN113707183A (en) Audio processing method and device in video
WO2018133247A1 (en) Abnormal sound detection method and apparatus
CN114223219A (en) Audio processing method and device
WO2020073564A1 (en) Method and apparatus for detecting loudness of audio signal
WO2022083502A1 (en) Voice interaction method and related apparatus, and method for establishing correspondence
CN114678038A (en) Audio noise detection method, computer device and computer program product
US20210385595A1 (en) Audio calibration method and device
CN112307161B (en) Method and apparatus for playing audio
WO2020087788A1 (en) Audio processing method and device
CN113470673A (en) Data processing method, device, equipment and storage medium
CN113055809B (en) 5.1 sound channel signal generation method, equipment and medium
CN111145792B (en) Audio processing method and device
WO2023245700A1 (en) Audio energy analysis method and related apparatus
CN107277687A (en) Audio device and processing method therefor
WO2024131371A1 (en) Voice processing method and apparatus, and electronic device
WO2016143276A1 (en) Acoustic device and correction method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19898404

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19898404

Country of ref document: EP

Kind code of ref document: A1