WO2023040322A1 - Echo cancellation method, and terminal device and storage medium - Google Patents

Echo cancellation method, and terminal device and storage medium

Info

Publication number
WO2023040322A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
terminal device
echo
signal
correlation
Prior art date
Application number
PCT/CN2022/093959
Other languages
French (fr)
Chinese (zh)
Inventor
王清泉
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023040322A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • The embodiments of the present application relate to, but are not limited to, the communication field, and in particular to an echo cancellation method, a terminal device, and a storage medium.
  • Echo cancellation is a key technology for real-time voice transmission. Without it, the local end always hears the voice that it transmitted to the peer end during voice interaction, which seriously degrades the listening experience and can even make effective communication impossible.
  • Traditional echo cancellation algorithms are limited by the length of the filter: if the echo tail is longer than the filter supports, the echo cannot be effectively cancelled. In addition, when the delay jitters, the filter parameters need a certain amount of time to converge, and echo leakage occurs during that time.
  • The main purpose of the embodiments of the present application is to provide an echo cancellation method, a terminal device, and a storage medium.
  • In a first aspect, an embodiment of the present application provides an echo cancellation method applied to a first terminal device. The method includes: acquiring first voice data output from a second terminal device, the first voice data being the voice data that the second terminal device replies with after receiving second voice data from the first terminal device; separating the first voice data with a time-domain audio separation model to obtain a plurality of branch voice data; performing a correlation determination between each of the plurality of branch voice data and a plurality of reference signals at different times to determine a residual echo signal; and filtering the residual echo signal out of the first voice data to obtain target voice data.
  • In a second aspect, an embodiment of the present application provides an echo cancellation device, including: an acquisition module configured to acquire first voice data output from a second terminal device, the first voice data being the voice data that the second terminal device replies with after receiving second voice data from the first terminal device; a separation module configured to separate the first voice data with a time-domain audio separation model to obtain a plurality of branch voice data; a determination module configured to perform a correlation determination between each of the plurality of branch voice data and the reference signal of the first voice data to determine a residual echo signal; and a filtering module configured to filter the residual echo signal out of the first voice data to obtain target voice data.
  • An embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the echo cancellation method of the first aspect.
  • A computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to execute the echo cancellation method of the first aspect.
  • FIG. 1 is a schematic diagram of a system architecture platform for performing an echo cancellation method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of an echo cancellation method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of determining a residual echo signal in an echo cancellation method provided by an embodiment of the present application.
  • FIG. 4 is another flowchart of an echo cancellation method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of an echo cancellation algorithm for adaptive filtering in an echo cancellation method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an echo cancellation device provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • The embodiments of the present application provide an echo cancellation method, a terminal device, and a storage medium.
  • The echo cancellation method includes, but is not limited to, the following steps: the first terminal device acquires the first voice data output from the second terminal device; the first voice data is separated by a time-domain audio separation model to obtain a plurality of branch voice data; each of the plurality of branch voice data is checked for correlation against a plurality of reference signals at different times to determine the residual echo signal; and the residual echo signal in the first voice data is filtered out to obtain the target voice data.
  • By separating the first voice data with the time-domain audio separation model, checking the resulting branch voice data against the reference signal to determine the residual echo signal, and then filtering out that residual echo signal, the method can address the problem of echo leakage and provide higher-quality voice output.
  • The branch voice data obtained by separating the first voice data with a real-time, fast-responding time-domain audio separation model can be checked against the reference signal to obtain the residual echo signal. Filtering out the residual echo signal then yields more accurate, higher-quality target voice data, which addresses the problem of echo leakage.
  • FIG. 1 is a functional block diagram of a system architecture platform 100 applicable to an embodiment of the present application.
  • The system architecture platform 100 includes a first terminal device 110 and a second terminal device 120. The first terminal device 110 includes a first echo cancellation device 111; the second terminal device 120 includes a speaker 121, a pickup 122, and a second echo cancellation device 123.
  • The echo cancellation method provided by the embodiment of the present application is applied to the system architecture platform 100 shown in FIG. 1. The terminal devices shown in FIG. 1 (the first terminal device 110 and the second terminal device 120) can be personal computers (PCs), mobile phones, set-top boxes, smart speakers, smart TVs, and other devices.
  • The speaker 121 and the pickup 122 may be built directly into the terminal device, as in a mobile phone.
  • The terminal device can also be connected to an external speaker 121 and pickup 122, for example a personal computer connected to a speaker 121 and a pickup 122, or a set-top box connected to a TV that serves as the audio and video playback device.
  • The first terminal device 110 and the second terminal device 120 may be the same device or different devices, which is not specifically limited in this embodiment.
  • The first terminal device 110 is configured to send an audio signal (the second voice data and/or the reference signal) to the second terminal device 120, and the first echo cancellation device 111 of the first terminal device 110 is configured to perform residual echo cancellation on the first voice signal transmitted by the second terminal device 120.
  • The second terminal device 120 is configured to output the second voice data sent from the first terminal device 110 to the speaker 121, and also to output the reference signal to the speaker 121.
  • The reference signal is usually a high-frequency signal whose frequency lies above the range audible to the human ear. The human ear can generally hear sounds from 20 Hz to 20,000 Hz, so the frequency of the reference signal can be chosen above 20,000 Hz.
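As a rough illustration of the inaudible reference signal described above, the sketch below synthesizes a sine tone above 20,000 Hz. The helper name `make_reference_tone`, the sine waveform, the 21 kHz frequency, and the 48 kHz sample rate are all assumptions made for illustration; the source only states that the reference frequency can be chosen above 20,000 Hz.

```python
import math


def make_reference_tone(freq_hz: float, sample_rate_hz: int,
                        n_samples: int, amplitude: float = 0.1) -> list[float]:
    """Return a sine tone meant to be inaudible (hypothetical helper)."""
    if freq_hz <= 20_000:
        raise ValueError("reference tone should lie above the audible range")
    if sample_rate_hz <= 2 * freq_hz:
        # Nyquist criterion: a tone is only representable if fs > 2 * f.
        raise ValueError("sample rate too low to represent this tone")
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [amplitude * math.sin(step * k) for k in range(n_samples)]


# 10 ms of a 21 kHz tone at an assumed 48 kHz sample rate.
tone = make_reference_tone(21_000, 48_000, 480)
```

One practical consequence of the sampling constraint is that the sample rate must exceed twice the tone frequency, so a 16 kHz voice pipeline could not carry such a tone unchanged.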
  • The second echo cancellation device 123 of the second terminal device 120 is configured to collect the audio input signal of the pickup 122 (which includes the sound that the speaker 121 outputs from the second voice signal as well as the user's voice) and to cancel the echo mixed into that audio input signal, thereby obtaining the first voice data.
  • The speaker 121 is configured to play the signal that the second terminal device 120 acquires from the first terminal device 110, including the second voice signal and/or the reference signal.
  • The user can hear the played second voice signal but cannot hear the played audio reference signal, so the user experience is not affected.
  • The sound of the second voice signal or the audio reference signal output by the speaker 121 propagates to the pickup 122 and produces an echo.
  • The pickup 122 is configured to receive a mixed voice signal, which includes at least the voice signal uttered by the user and the second voice signal output by the speaker 121 of the second terminal device 120.
  • The echo of the second voice signal output by the speaker 121, or the echo of the reference signal, may be mixed into the sound received by the pickup 122.
  • The second voice signal and the reference signal output by the speaker 121 produce an echo at the pickup 122; the causes include diffraction and reflection of the sound.
  • The echo signal can be regarded as the sound signal obtained after the audio signal passes through the echo channel.
  • The effect of the echo channel on sound includes a delay in time and an attenuation in energy.
  • The echo channel affects the audio content signal in a similar way to the audio reference signal. Therefore, the reference signal can be analyzed to obtain the characteristic parameters of the echo channel, namely the time delay and the attenuation coefficient, and the echo of the audio content signal can then be cancelled using these two parameters.
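To make the two echo-channel parameters concrete, here is a minimal sketch that estimates a delay and an attenuation coefficient from a known reference signal. It assumes a single-path channel (one delayed, scaled copy of the reference), picks the delay by maximizing a plain cross-correlation, and fits the attenuation by least squares; the function name `estimate_echo_channel` and the synthetic data are hypothetical and not taken from the source.

```python
import random


def estimate_echo_channel(reference: list[float], received: list[float],
                          max_delay: int) -> tuple[int, float]:
    """Estimate (delay in samples, attenuation) of a single-path echo channel."""
    def overlap_dot(delay: int) -> float:
        # Inner product of the reference with the received signal shifted by `delay`.
        return sum(r * received[k + delay]
                   for k, r in enumerate(reference)
                   if k + delay < len(received))

    delay = max(range(max_delay + 1), key=overlap_dot)
    ref_energy = sum(r * r for k, r in enumerate(reference)
                     if k + delay < len(received))
    # Least-squares gain that best maps the reference onto the delayed echo.
    attenuation = overlap_dot(delay) / ref_energy if ref_energy else 0.0
    return delay, attenuation


# Synthetic check: the "echo" is the reference delayed by 5 samples, scaled by 0.4.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(256)]
echo = [0.0] * 5 + [0.4 * r for r in ref] + [0.0] * 11
delay, attenuation = estimate_echo_channel(ref, echo, max_delay=16)
```

A real acoustic path has many reflections, so this single-tap model is only a first approximation of the delay and attenuation the text refers to.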
  • The system architecture platform 100 can be applied to 3G cellular communication, for example Code Division Multiple Access (CDMA), EVDO, or Global System for Mobile Communications (GSM)/General Packet Radio Service (GPRS); to 4G cellular communication, such as Long Term Evolution (LTE); to 5G cellular communication; or to subsequently evolved mobile communication networks. This embodiment does not specifically limit it.
  • The system architecture platform 100 shown in FIG. 1 does not constitute a limitation on the embodiments of the present application; it may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • FIG. 2 is a flowchart of an echo cancellation method provided by an embodiment of the present application. The echo cancellation method is applied to the first terminal device and includes, but is not limited to, step S100, step S200, step S300, and step S400.
  • Step S100: acquire the first voice data output from the second terminal device, the first voice data being the voice data that the second terminal device replies with after receiving the second voice data from the first terminal device.
  • While the first terminal device and the second terminal device are interacting by voice, the first terminal device sends the second voice data to the second terminal device. After receiving the second voice data, the second terminal device plays it through the speaker; at the same time, the pickup of the second terminal device receives both the second voice data played by the speaker and other sounds.
  • The second echo cancellation device of the second terminal device filters the echo signal out of the pickup signal, generates the first voice data, and sends it to the first terminal device. The first terminal device thereby acquires the first voice data output from the second terminal device.
  • Step S200: separate the first voice data with a time-domain audio separation model to obtain a plurality of branch voice data.
  • The first terminal device can separate the first voice data with the time-domain audio separation model to obtain a plurality of branch voice data. Because the model can separate audio in the time domain, the residual echo signal is contained among the branch voice data it produces. The time-domain audio separation model is obtained by training on mixed voice data.
  • The time-domain audio separation model can be a Time-domain Audio Separation Network (TasNet) model, a fully convolutional time-domain audio separation network (Conv-TasNet) model, or another time-domain audio separation model, which is not specifically limited in this embodiment.
  • Step S300: perform a correlation determination between each of the plurality of branch voice data and a plurality of reference signals at different times, and determine the residual echo signal.
  • The first terminal device may perform a correlation determination between each of the plurality of branch voice data and a plurality of reference signals at different times, and determine the residual echo signal from the correlation result.
  • Step S400: filter the residual echo signal in the first voice data to obtain target voice data.
  • The first terminal device filters the residual echo signal out of the first voice data to obtain the target voice data.
  • The target voice data is clean voice data, which addresses the problem of echo leakage.
  • The residual echo signal in the first voice data may be filtered by spectral subtraction or by another filtering method, which is not limited in this embodiment.
  • Spectral subtraction subtracts the spectrum of the noise signal from the spectrum of the noisy signal. Assuming that the noise in the speech is additive and that the noise signal is stationary or slowly varying, the noise spectrum can be subtracted from the noisy speech spectrum to obtain clean speech.
  • The formula is as follows: D(w) = Ps(w) - Pn(w)
  • Ps(w) is the spectrum of the input noisy speech (the first voice data).
  • Pn(w) is the spectrum of the estimated noise (the residual echo signal).
  • D(w) is the difference spectrum obtained by subtracting the two. Because the subtraction can produce negative values, a condition can be added that sets all negative values to 0; the result is the spectrum of the final denoised output speech (the target voice data).
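The subtraction and zero-clamping described above can be sketched as follows. The function operates on per-bin spectrum values that are assumed to be already computed (the transform that produces them is outside this sketch), and the name `spectral_subtract` is hypothetical.

```python
def spectral_subtract(noisy_spectrum: list[float],
                      noise_spectrum: list[float]) -> list[float]:
    """Compute D(w) = Ps(w) - Pn(w) per bin and set negative results to 0."""
    if len(noisy_spectrum) != len(noise_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(ps - pn, 0.0)
            for ps, pn in zip(noisy_spectrum, noise_spectrum)]


# Three example bins: only the first keeps energy after subtraction.
cleaned = spectral_subtract([4.0, 1.0, 2.5], [1.0, 3.0, 2.5])
```

Clamping at zero is the simplest possible rule; it matches the condition stated in the text but may leave isolated spectral peaks in practice.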
  • The first terminal device acquires the first voice data output from the second terminal device. The first voice data has already undergone echo cancellation in the second terminal device, but that echo cancellation is limited by the length of the filter: if the echo tail is longer than the filter supports, the echo cannot be effectively cancelled. In addition, the filter parameters need a certain amount of time to converge while the delay jitters, and during that time echo leaks and a residual echo signal is produced. The first voice data therefore contains a residual echo signal.
  • The first terminal device can then use the TasNet model to separate the first voice data. The plurality of branch voice data obtained from the TasNet model include the residual echo signal. To identify the residual echo signal among the branch voice data, the first terminal device can determine the correlation between each of the branch voice data and the reference signal of the first voice data, determine the residual echo signal from the correlation result, and then filter the residual echo signal out of the first voice data to obtain the target voice data.
  • The target voice data is clean voice data.
  • In this way, the first terminal device provides a local echo cancellation mechanism that does not rely on the echo cancellation performed at the remote end, which safeguards the voice quality at the local end and addresses the problem of echo leakage.
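The local flow of steps S200 to S400 can be outlined as a small sketch under several assumptions. The separation model is passed in as a callable (`separate`, a hypothetical stand-in for a trained model), the correlation is a plain normalized inner product, and the final filtering is a simple time-domain subtraction rather than the spectral subtraction named in the text, purely to keep the example short.

```python
import math
from typing import Callable, Sequence

Signal = Sequence[float]


def _corr(a: Signal, b: Signal) -> float:
    """Absolute normalized correlation of two equally aligned frames."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (na * nb) if na and nb else 0.0


def cancel_residual_echo(first_voice: Signal,
                         reference_frames: Sequence[Signal],
                         separate: Callable[[Signal], Sequence[Signal]],
                         threshold: float = 0.5) -> list[float]:
    branches = separate(first_voice)                      # S200
    best_score, best_branch = -1.0, None
    for branch in branches:                               # S300
        for ref in reference_frames:
            score = _corr(branch, ref)
            if score > best_score:
                best_score, best_branch = score, branch
    if best_branch is None or best_score <= threshold:
        return list(first_voice)                          # no residual echo found
    # S400 (simplified): remove the identified branch in the time domain.
    return [v - e for v, e in zip(first_voice, best_branch)]


# Toy data: a "speech" branch plus an echo that is a scaled copy of one reference frame.
speech = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
ref_frame = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
echo = [0.3 * v for v in ref_frame]
mixed = [s + e for s, e in zip(speech, echo)]
target = cancel_residual_echo(mixed, [ref_frame], lambda _: [speech, echo])
```

The dummy `separate` callable simply returns the two known components; in the method described here that role is played by the trained time-domain audio separation model.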
  • Step S300 includes, but is not limited to, steps S410 to S430.
  • Step S410: perform a correlation calculation between each of the plurality of branch voice data and a plurality of reference signals at different times to obtain the reference signal with the maximum correlation.
  • Step S420: when the maximum correlation is greater than a preset threshold, determine the reference signal corresponding to the maximum correlation as the target reference signal.
  • Step S430: determine the branch voice data corresponding to the target reference signal as the residual echo signal.
  • The reference signal is s(t).
  • The branch voice data is yn(t), where n denotes the n-th channel signal.
  • Because the residual echo in yn(t) is delayed relative to s(t), historical information of s(t) can be retained, with si(t) denoting the reference signal of the i-th frame before the current moment.
  • To find the si(t) corresponding to yn(t), all si(t) are traversed and their correlation with yn(t) is calculated to obtain the si(t) with the maximum correlation. If that correlation is greater than the preset threshold cohne, the si(t) with the maximum correlation is considered a valid reference frame.
  • The yn(t) with the highest correlation is the residual echo signal.
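Steps S410 to S430 can be sketched as an exhaustive search over every pair of branch yn(t) and historical reference frame si(t). The correlation measure here is a plain normalized inner product, an assumption made to keep the example self-contained; it is not the WebRTC cross-correlation. The names `normalized_correlation` and `find_residual_echo` are hypothetical.

```python
import math
from typing import Optional, Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute normalized inner product of two frames, in [0, 1] up to rounding."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (na * nb) if na and nb else 0.0


def find_residual_echo(branches: Sequence[Sequence[float]],
                       reference_frames: Sequence[Sequence[float]],
                       threshold: float) -> Optional[tuple[int, int]]:
    """Return (branch index n, frame index i) of the best match, or None."""
    best = (-1.0, 0, 0)
    for n, y in enumerate(branches):               # S410: traverse all pairs
        for i, s in enumerate(reference_frames):
            score = normalized_correlation(y, s)
            if score > best[0]:
                best = (score, n, i)
    if best[0] > threshold:                        # S420: threshold check
        return best[1], best[2]                    # S430: matching branch is the echo
    return None


# Toy data: frame 2 of the reference history matches the echo branch.
speech = [1, -1, 1, -1, 1, -1, 1, -1]
frames = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, 1, -1, -1, 1, 1, -1, -1],
          [1, 2, 3, 4, 4, 3, 2, 1]]
echo = [0.3 * v for v in frames[2]]
match = find_residual_echo([speech, echo], frames, threshold=0.5)
no_match = find_residual_echo([speech], frames, threshold=0.5)
```

In this toy setup the speech branch is orthogonal to every reference frame, so only the echo branch can exceed the threshold.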
  • The correlation can be calculated with the cross-correlation method used in Web Real-Time Communication (WebRTC), which strikes a good balance between algorithmic complexity and performance.
  • FFT: Fast Fourier Transform.
  • The correlation between each si(t) and the unseparated voice data y(t) can be calculated first, and the si(t) with the highest correlation selected. That si(t) is then compared with each separated branch voice data yn(t), and the yn(t) with the highest correlation with si(t) is determined to be the residual echo signal.
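The two-stage search described above can be sketched as follows: first choose the reference frame that best matches the unseparated signal y(t), then choose the branch that best matches that frame. As before, the normalized inner product is an assumed stand-in for the correlation measure, and the function names are hypothetical.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute normalized inner product of two frames."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return abs(dot) / (na * nb) if na and nb else 0.0


def two_stage_search(mixture: Sequence[float],
                     branches: Sequence[Sequence[float]],
                     reference_frames: Sequence[Sequence[float]]) -> tuple[int, int]:
    """Return (branch index, frame index) using one pass over frames, one over branches."""
    # Stage 1: the reference frame most correlated with the unseparated signal.
    frame = max(range(len(reference_frames)),
                key=lambda i: normalized_correlation(mixture, reference_frames[i]))
    # Stage 2: the branch most correlated with that frame.
    branch = max(range(len(branches)),
                 key=lambda n: normalized_correlation(branches[n],
                                                      reference_frames[frame]))
    return branch, frame


speech = [1, -1, 1, -1, 1, -1, 1, -1]
frames = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, 1, -1, -1, 1, 1, -1, -1],
          [1, 2, 3, 4, 4, 3, 2, 1]]
echo = [0.3 * v for v in frames[2]]
mixture = [s + e for s, e in zip(speech, echo)]
result = two_stage_search(mixture, [speech, echo], frames)
```

This version evaluates one correlation per frame plus one per branch, instead of one per frame-branch pair as in an exhaustive search.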
  • The first voice data is the voice data obtained after the second terminal device applies an adaptive-filter echo cancellation algorithm to the echo data produced by the second voice data. The adaptive-filter echo cancellation algorithm includes, but is not limited to, step S510, step S520, step S530, step S540, and step S550.
  • Step S510: the second terminal device obtains estimated delay information from the second voice data and the reference signal.
  • Step S520: obtain the estimated echo signal from the estimated delay information, the preset adaptive filter coefficients, and the reference signal.
  • Step S530: perform a correlation calculation on the second voice data and the estimated echo signal to obtain correlation coefficients at different time points.
  • Step S540: determine the echo input signal according to the correlation coefficients.
  • Step S550: filter the echo input signal to obtain the first voice data.
  • The portion whose correlation coefficient is higher than the threshold can be judged to be near-end input, and the portion whose correlation coefficient is lower than the threshold can be judged to be far-end echo input.
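The source does not name the adaptation rule behind the adaptive filter coefficients. As one common possibility, the sketch below uses a normalized least-mean-squares (NLMS) update to build the estimated echo and subtract it from the pickup signal. The choice of NLMS, the tap count, the step size, and the name `nlms_echo_cancel` are all assumptions for illustration.

```python
import random


def nlms_echo_cancel(far_end: list[float], mic: list[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> tuple[list[float], list[float]]:
    """Return (residual signal, final filter weights) from an NLMS echo canceller."""
    weights = [0.0] * taps
    residual = []
    for k, sample in enumerate(mic):
        # Most recent `taps` far-end samples, newest first (zeros before the start).
        x = [far_end[k - j] if k - j >= 0 else 0.0 for j in range(taps)]
        estimated_echo = sum(w * xi for w, xi in zip(weights, x))
        error = sample - estimated_echo
        energy = sum(xi * xi for xi in x) + eps
        # Normalized update: step size scaled by the input energy.
        weights = [w + mu * error * xi / energy for w, xi in zip(weights, x)]
        residual.append(error)
    return residual, weights


# Synthetic echo path: the far-end signal delayed by 3 samples and scaled by 0.5.
rng = random.Random(1)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.5 * far[k - 3] if k >= 3 else 0.0 for k in range(len(far))]
residual, weights = nlms_echo_cancel(far, mic)
```

With no near-end speech and an echo path shorter than the filter, the weights converge toward the true path. The limitation discussed in the text appears when the echo tail is longer than `taps` or the delay changes faster than the filter can adapt.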
  • A noise source is generated by a random number generation algorithm and then subjected to low-frequency filtering and amplitude limiting to produce comfort noise, which is used in the first voice data sent to the first terminal device.
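A minimal sketch of the comfort-noise generation described above is given below. It assumes that the low-frequency filtering is a one-pole low-pass filter and that amplitude limiting is a hard clip; the name `comfort_noise` and the parameter values are hypothetical.

```python
import random
from typing import Optional


def comfort_noise(n_samples: int, limit: float = 0.01,
                  smoothing: float = 0.9,
                  seed: Optional[int] = None) -> list[float]:
    """Generate low-pass filtered, amplitude-limited pseudo-random noise."""
    rng = random.Random(seed)
    state = 0.0
    out = []
    for _ in range(n_samples):
        white = rng.uniform(-1.0, 1.0)                          # random noise source
        state = smoothing * state + (1.0 - smoothing) * white  # one-pole low-pass
        out.append(max(-limit, min(limit, state)))              # amplitude limiting
    return out


noise = comfort_noise(160, seed=42)  # e.g. 10 ms at an assumed 16 kHz rate
```

Fixing the seed makes the output reproducible for testing; a deployed system would normally leave it unset.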
  • The echo cancellation device includes an acquisition module 610, a separation module 620, a determination module 630, and a filtering module 640.
  • The acquisition module 610 is configured to acquire the first voice data output from the second terminal device, the first voice data being the voice data that the second terminal device replies with after receiving the second voice data from the first terminal device.
  • The separation module 620 is configured to separate the first voice data with the time-domain audio separation model to obtain a plurality of branch voice data. The determination module 630 is configured to perform a correlation determination between each of the plurality of branch voice data and the reference signal of the first voice data to determine the residual echo signal. The filtering module 640 is configured to filter the residual echo signal out of the first voice data to obtain the target voice data.
  • The separation module 620 is also configured to separate the first voice data with a TasNet model trained on mixed voice data to obtain a plurality of branch voice data.
  • The determination module 630 is also configured to calculate the correlation between each of the plurality of branch voice data and the reference signal of the first voice data, obtaining a correlation for each branch voice data, and, when the maximum correlation is greater than the preset threshold, to determine the branch voice data with the highest correlation as the residual echo signal.
  • The determination module 630 is also configured to use the cross-correlation function of WebRTC to calculate the correlation between each of the plurality of branch voice data and the reference signal of the first voice data.
  • The filtering module 640 is further configured to use spectral subtraction to filter the residual echo signal out of the first voice data to obtain the target voice data.
  • The echo cancellation device uses the same technical means, solves the same technical problem, and achieves the same technical effects as the echo cancellation method in the above embodiment, so these are not repeated here.
  • The terminal device 700 includes a memory 720, a processor 710, and a computer program stored in the memory 720 and executable on the processor 710.
  • The processor 710 and the memory 720 may be connected via a bus or in other ways.
  • The memory 720 can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • The memory 720 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device.
  • The memory 720 may include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the echo cancellation method of the above embodiment are stored in the memory 720 and, when executed by the processor 710, perform the echo cancellation method of the above embodiment, for example method steps S100 to S400 in FIG. 2, method steps S410 to S430 in FIG. 4, and method steps S510 to S550 in FIG. 5.
  • An embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions. When the computer-executable instructions are executed by a processor or a controller, for example by a processor in the communication device of the above embodiment, they cause the processor to execute the echo cancellation method corresponding to the terminal device side in the above embodiment, for example method steps S100 to S400 in FIG. 2, method steps S410 to S430 in FIG. 4, and method steps S510 to S550 in FIG. 5.
  • The embodiment of the present application includes: the first terminal device acquires the first voice data output from the second terminal device, the first voice data being the voice data that the second terminal device replies with after receiving the second voice data from the first terminal device; the first voice data is separated by a time-domain audio separation model to obtain a plurality of branch voice data; each of the plurality of branch voice data is checked for correlation against a plurality of reference signals at different times to determine the residual echo signal; and the residual echo signal in the first voice data is filtered out to obtain the target voice data.
  • By separating the first voice data with the time-domain audio separation model, checking the resulting branch voice data against the reference signal to determine the residual echo signal, and then filtering out that residual echo signal, the method can address the problem of echo leakage and provide higher-quality voice output.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)

Abstract

An echo cancellation method, and a terminal device and a storage medium. The echo cancellation method comprises: a first terminal device acquiring first voice data output from a second terminal device (S100); performing separation processing on the first voice data by means of a time-domain audio separation model, so as to obtain a plurality of pieces of branch voice data (S200); respectively performing relevancy determination on each of the plurality of pieces of branch voice data and a plurality of reference signals at different moments, so as to determine a residual echo signal (S300); and filtering the residual echo signal in the first voice data to obtain target voice data (S400).

Description

回声消除方法、终端设备及存储介质Echo cancellation method, terminal equipment and storage medium
相关申请的交叉引用Cross References to Related Applications
本申请基于申请号为202111073754.6、申请日为2021年09月14日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。This application is based on a Chinese patent application with application number 202111073754.6 and a filing date of September 14, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.
技术领域technical field
本申请实施例涉及但不限于通信领域,尤其涉及回声消除方法、终端设备及存储介质。The embodiments of the present application relate to but are not limited to the communication field, and in particular, relate to an echo cancellation method, a terminal device, and a storage medium.
背景技术Background technique
回声消除是实时语音传输的关键技术,没有回声消除,语音交互中,本端始终可以听到自己传输给对端的语音,极大地影响听觉效果,甚至无法有效地进行语音沟通。而传统回声消除算法受限于滤波器的长度,回声尾长超过滤波器支持的长度则无法有效消除回声,同时滤波器在延时抖动的过程中,滤波器参数收敛需要一定的时长,该过程会出现回声泄露的情况。Echo cancellation is a key technology for real-time voice transmission. Without echo cancellation, during voice interaction, the local end can always hear the voice it transmits to the peer end, which greatly affects the auditory effect and even makes it impossible to communicate effectively. However, the traditional echo cancellation algorithm is limited by the length of the filter. If the echo tail length exceeds the length supported by the filter, the echo cannot be effectively eliminated. At the same time, it takes a certain amount of time for the filter parameters to converge during the delay and jittering process of the filter. Echo leakage will occur.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The main purpose of the embodiments of the present application is to provide an echo cancellation method, a terminal device, and a storage medium.
In a first aspect, an embodiment of the present application provides an echo cancellation method applied to a first terminal device, the method including: acquiring first voice data output by a second terminal device, the first voice data being voice data returned by the second terminal device after receiving second voice data from the first terminal device; performing separation processing on the first voice data by means of a time-domain audio separation model to obtain a plurality of pieces of branch voice data; performing a correlation determination between each of the pieces of branch voice data and a plurality of reference signals at different moments to determine a residual echo signal; and filtering the residual echo signal out of the first voice data to obtain target voice data.
In a second aspect, an embodiment of the present application provides an echo cancellation apparatus, including: an acquisition module configured to acquire first voice data output by a second terminal device, the first voice data being voice data returned by the second terminal device after receiving second voice data from the first terminal device; a separation module configured to perform separation processing on the first voice data by means of a time-domain audio separation model to obtain a plurality of pieces of branch voice data; a determination module configured to perform a correlation determination between each of the pieces of branch voice data and a reference signal of the first voice data to determine a residual echo signal; and a filtering module configured to filter the residual echo signal out of the first voice data to obtain target voice data.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the echo cancellation method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing computer-executable instructions for performing the echo cancellation method according to the first aspect.
Other features and advantages of the present application will be set forth in the description that follows and will in part become apparent from the description or be understood by practicing the present application. The objectives and other advantages of the present application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture platform for performing an echo cancellation method according to an embodiment of the present application;
FIG. 2 is a flowchart of an echo cancellation method according to an embodiment of the present application;
FIG. 3 is a flowchart of determining a residual echo signal in an echo cancellation method according to an embodiment of the present application;
FIG. 4 is another flowchart of an echo cancellation method according to an embodiment of the present application;
FIG. 5 is a flowchart of an adaptive-filter echo cancellation algorithm in an echo cancellation method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an echo cancellation apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and not to limit it.
It should be noted that although functional modules are divided in the apparatus diagram and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flowcharts. The terms "first", "second" and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
Embodiments of the present application provide an echo cancellation method, a terminal device, and a storage medium. The echo cancellation method includes, but is not limited to, the following steps: a first terminal device acquires first voice data output by a second terminal device; separation processing is performed on the first voice data by means of a time-domain audio separation model to obtain a plurality of pieces of branch voice data; a correlation determination is performed between each of the pieces of branch voice data and a plurality of reference signals at different moments to determine a residual echo signal; and the residual echo signal is filtered out of the first voice data to obtain target voice data. Because the time-domain audio separation model responds quickly and in real time, the branch voice data it produces from the first voice data can be compared with the reference signals to identify the residual echo signal. Filtering out that residual echo signal yields higher-quality target voice data and thereby addresses the problem of echo leakage.
The technical solutions of the embodiments of the present application are introduced below with reference to the accompanying drawings.
FIG. 1 is a functional block diagram of a system architecture platform 100 to which the embodiments of the present application are applicable. In an embodiment, the system architecture platform 100 includes a first terminal device 110 and a second terminal device 120; the first terminal device 110 includes a first echo cancellation apparatus 111, and the second terminal device 120 includes a speaker 121, a sound pickup 122, and a second echo cancellation apparatus 123.
The echo cancellation method provided by the embodiments of the present application is applied to the system architecture platform 100 shown in FIG. 1. The terminal devices shown in FIG. 1 (the first terminal device 110 and the second terminal device 120) may be personal computers (PCs), mobile phones, set-top boxes, smart speakers, smart TVs, and similar devices. A terminal device may include the speaker 121 and the sound pickup 122 directly, as a mobile phone does. A terminal device may also use an external speaker 121 and sound pickup 122, as when a personal computer is connected to an external speaker and sound pickup, or a set-top box is connected to a television that serves as the audio and video playback device. It can be understood that the first terminal device 110 and the second terminal device 120 may be the same kind of device or different kinds of device, which is not specifically limited in this embodiment.
The first terminal device 110 is configured to send an audio signal (the second voice data and/or the reference signal) to the second terminal device 120. The first echo cancellation apparatus 111 of the first terminal device 110 is configured to perform residual echo cancellation on the first voice signal sent by the second terminal device 120.
The second terminal device 120 is configured to output the second voice data sent by the first terminal device 110 to the speaker 121, and also to output the reference signal to the speaker 121. The reference signal is typically a high-frequency signal whose frequency lies above the range audible to the human ear. Since the audible range is generally 20 Hz to 20,000 Hz, the frequency of the reference signal may be chosen above 20,000 Hz.
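To illustrate the frequency choice described above, the following Python sketch generates a short sine tone above 20,000 Hz. It is only a sketch under assumptions that the application does not state: the function name `make_reference_tone`, the 21,000 Hz tone, the 48,000 Hz sample rate, the duration, and the amplitude are all hypothetical. The one added constraint is the standard sampling requirement that the tone lie below half the sample rate.

```python
import math

def make_reference_tone(freq_hz=21_000.0, sample_rate_hz=48_000,
                        duration_s=0.01, amplitude=0.1):
    """Return samples of a sine tone above the audible range.

    All default values are illustrative assumptions, not values from the application.
    """
    if freq_hz <= 20_000.0:
        raise ValueError("reference tone should lie above the audible range")
    if freq_hz >= sample_rate_hz / 2:
        raise ValueError("tone must stay below half the sample rate (Nyquist)")
    n_samples = int(round(sample_rate_hz * duration_s))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]

# Example: 10 ms of a 21 kHz tone sampled at 48 kHz.
tone = make_reference_tone()
```

A sample rate of 44,100 Hz would also satisfy the Nyquist check for a 21,000 Hz tone, but leaves very little margin for the playback hardware.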
The second echo cancellation apparatus 123 of the second terminal device 120 is configured to collect the audio input signal of the sound pickup 122 (including the voice signal output by the speaker 121 according to the second voice signal, and the voice of the user), process it, and cancel the echo mixed into the audio input signal, thereby obtaining the first voice data.
The speaker 121 is configured to play the signals that the second terminal device 120 acquires from the first terminal device 110, including the second voice signal and/or the reference signal. The played second voice signal can be heard by the user, whereas the played audio reference signal is inaudible to the user and therefore does not affect the user experience. The sound of the second voice signal or the audio reference signal output by the speaker 121 propagates to the sound pickup 122 and produces an echo.
The sound pickup 122 is configured to receive a mixed voice signal that includes at least the voice signal uttered by the user and the second voice signal output by the speaker 121 of the second terminal device 120. The sound received by the sound pickup 122 may therefore contain the echo of the second voice signal output by the speaker 121, or the echo of the reference signal.
The second voice signal and the reference signal output by the speaker 121 produce echoes at the sound pickup 122 for reasons that include diffraction and reflection of the sound. An echo signal can be regarded as the sound signal obtained after an audio signal passes through an echo channel. The echo channel affects the sound in two ways: it introduces a delay in time and an attenuation in energy. In general, the effect of the echo channel on the audio content signal is similar to its effect on the audio reference signal. The reference signal can therefore be analyzed to obtain the echo channel characteristic parameters, namely the delay and the attenuation coefficient, and these two parameters can then be used to cancel the echo of the audio content signal.
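One way to picture how the two echo channel parameters could be estimated is sketched below in Python. This is an illustrative sketch, not the procedure of the application: it assumes a single-path echo channel, picks the delay that maximizes the cross-correlation between the emitted reference and the captured signal, and then takes the least-squares gain at that delay as the attenuation coefficient. The function name and the toy signals are hypothetical.

```python
def estimate_delay_and_attenuation(reference, captured, max_delay):
    """Estimate (delay in samples, attenuation) of a single-path echo channel.

    Assumes `captured` is long enough to hold the reference at every tested delay.
    """
    ref_energy = sum(x * x for x in reference)
    if ref_energy == 0.0:
        raise ValueError("reference must not be silent")
    best_delay, best_corr = 0, float("-inf")
    for delay in range(max_delay + 1):
        # Cross-correlation of the reference with the captured signal at this lag.
        corr = sum(r * captured[k + delay]
                   for k, r in enumerate(reference)
                   if k + delay < len(captured))
        if corr > best_corr:
            best_delay, best_corr = delay, corr
    # Least-squares gain a that minimizes sum((captured[k + d] - a * reference[k])**2).
    attenuation = best_corr / ref_energy
    return best_delay, attenuation

# Toy example: the echo is the reference delayed by 3 samples and scaled by 0.5.
reference = [1.0, -0.5, 0.25, 0.75]
captured = [0.0, 0.0, 0.0] + [0.5 * x for x in reference] + [0.0, 0.0]
delay, attenuation = estimate_delay_and_attenuation(reference, captured, max_delay=5)
```

In this toy case the estimate recovers a delay of 3 samples and an attenuation of 0.5. A real room has many reflection paths, which is why practical systems model the channel with a multi-tap filter rather than a single delay and gain.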
Those skilled in the art will understand that the system architecture platform 100 can be applied to 3G cellular communication, for example Code Division Multiple Access (CDMA), EV-DO, or Global System for Mobile Communications (GSM)/General Packet Radio Service (GPRS); to 4G cellular communication, for example Long Term Evolution (LTE); to 5G cellular communication; or to subsequently evolved mobile communication networks, which is not specifically limited in this embodiment.
Those skilled in the art will understand that the system architecture platform 100 shown in FIG. 1 does not limit the embodiments of the present application, and may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Based on the above system architecture platform, embodiments of the echo cancellation method of the present application are presented below.
FIG. 2 is a flowchart of an echo cancellation method according to an embodiment of the present application. The echo cancellation method is applied to the first terminal device and includes, but is not limited to, step S100, step S200, step S300, and step S400.
Step S100: acquire first voice data output by the second terminal device, the first voice data being voice data returned by the second terminal device after receiving second voice data from the first terminal device.
The first terminal device and the second terminal device are engaged in a voice interaction. The first terminal device sends the second voice data to the second terminal device, and the second terminal device plays the second voice data through its speaker after receiving it. The sound pickup of the second terminal device then receives both the second voice data played by the speaker and other sounds. Normally, the second echo cancellation apparatus of the second terminal device filters the echo signal out of the sound pickup input, generates the first voice data, and sends it to the first terminal device. The first terminal device thus acquires the first voice data output by the second terminal device.
Step S200: perform separation processing on the first voice data by means of a time-domain audio separation model to obtain a plurality of pieces of branch voice data.
The echo cancellation processing of the second terminal device is limited by the filter length: when the echo tail exceeds the length supported by the filter, the echo cannot be cancelled effectively. In addition, when the delay jitters, the filter parameters need a certain time to converge, and echo leaks through during that time, producing a residual echo signal. The first terminal device can perform separation processing on the first voice data by means of the time-domain audio separation model to obtain a plurality of pieces of branch voice data. Because the time-domain audio separation model is capable of separating audio in the time domain, the pieces of branch voice data it produces include the residual echo signal. The time-domain audio separation model is obtained by training on mixed voice data.
It should be noted that the time-domain audio separation model may be a Time-domain Audio Separation Network (TasNet) model, a fully convolutional time-domain audio separation network (Convolution Time-domain Audio Separation Network, Conv-TasNet), or another time-domain audio separation model, which is not specifically limited in this embodiment.
Step S300: perform a correlation determination between each of the pieces of branch voice data and a plurality of reference signals at different moments to determine the residual echo signal.
To identify the residual echo signal among the pieces of branch voice data, the first terminal device may perform a correlation determination between each piece of branch voice data and a plurality of reference signals at different moments, and determine the residual echo signal according to the correlation results.
Step S400: filter the residual echo signal out of the first voice data to obtain target voice data.
The first terminal device filters the residual echo signal out of the first voice data to obtain the target voice data. The target voice data is clean voice data, which addresses the problem of echo leakage.
It should be noted that the method used to filter the residual echo signal out of the first voice data may be spectral subtraction or another filtering method, which is not limited in this embodiment.
It can be understood that spectral subtraction subtracts the spectrum of the noise signal from the spectrum of the noisy signal. Assuming that the noise in the speech is additive and that the noise signal is stationary or slowly varying, subtracting the noise spectrum from the noisy speech spectrum yields clean speech. The formula is as follows:
$$D(w) = P_s(w) - P_n(w)$$
$$P_{\text{out}}(w) = \begin{cases} D(w), & D(w) > 0 \\ 0, & \text{otherwise} \end{cases}$$
Here, P_s(w) is the spectrum of the input noisy speech (the first voice data), P_n(w) is the spectrum of the estimated noise (the residual echo signal), and subtracting the two gives the difference spectrum D(w). Because the subtraction may produce negative values, a condition is added that sets all negative values to 0; the result is the spectrum of the final denoised output speech (the target voice data).
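The clamped subtraction described above can be written in a few lines of Python. This is a minimal sketch operating on per-bin power values that are assumed to have been computed already; the function name `spectral_subtract` and the sample numbers are illustrative only.

```python
def spectral_subtract(noisy_power, noise_power):
    """Per-bin D(w) = Ps(w) - Pn(w), with negative differences set to zero."""
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [max(ps - pn, 0.0) for ps, pn in zip(noisy_power, noise_power)]

# Three frequency bins: the second would go negative and is clamped to 0.
cleaned = spectral_subtract([4.0, 1.0, 2.5], [1.0, 3.0, 2.5])
```

Clamping to zero keeps the output a valid power spectrum, at the cost of discarding any information in bins where the noise estimate exceeds the observed power.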
In an embodiment, referring to FIG. 3, the first terminal device acquires the first voice data output by the second terminal device. The first voice data is voice data that has already undergone echo cancellation in the second terminal device, but that echo cancellation is limited by the filter length: when the echo tail exceeds the length supported by the filter, the echo cannot be cancelled effectively. In addition, when the delay jitters, the filter parameters need a certain time to converge, and echo leaks through during that time, producing a residual echo signal. The first voice data therefore carries a residual echo signal. The first terminal device can perform separation processing on the first voice data by means of a TasNet model to obtain a plurality of pieces of branch voice data; because the TasNet model is capable of separating audio in the time domain, the pieces of branch voice data it produces include the residual echo signal. To identify the residual echo signal among the pieces of branch voice data, the first terminal device can perform a correlation determination between each piece of branch voice data and the reference signal of the first voice data, determine the residual echo signal according to the correlation results, and then filter the residual echo signal out of the first voice data to obtain the target voice data, which is clean voice data. This method can still rely on a traditional echo cancellation algorithm to cover most normal usage scenarios, while compensating for the echo leakage that the traditional algorithm produces during delay changes and jitter. At the same time, the first terminal device is given a local echo cancellation mechanism that does not depend on the quality of the far-end echo cancellation, which safeguards the voice quality at the local end and addresses the problem of echo leakage.
Referring to FIG. 4, in an embodiment, step S300 includes, but is not limited to, steps S410 to S430.
Step S410: perform a correlation calculation between each of the pieces of branch voice data and a plurality of reference signals at different moments to obtain the reference signal with the maximum correlation;
Step S420: when the maximum correlation is greater than a preset threshold, determine the reference signal corresponding to the maximum correlation as the target reference signal;
Step S430: determine the branch voice data corresponding to the target reference signal as the residual echo signal.
Let the reference signal be s(t) and the branch voice data be yn(t), where n denotes the n-th branch signal. Because the residual echo in yn(t) is delayed relative to s(t), historical information needs to be retained for s(t); si(t) denotes the reference signal of the i-th historical frame relative to the current moment. The si(t) corresponding to yn(t) is then estimated from the correlation between voice frames: all si(t) are traversed and correlated with yn(t) to find the si(t) with the maximum correlation. If that correlation is greater than the preset threshold cohne, the si(t) with the maximum correlation is considered a valid reference frame. The si(t) with the maximum correlation is then correlated with each yn(t) in turn, and the yn(t) with the highest correlation is the residual echo signal. In other words, an i*n traversal over si(t) and yn(t) finds the pair of si(t) and yn(t) with the highest correlation.
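The i*n traversal described above can be sketched as follows. This is an illustrative sketch only: it uses normalized correlation (cosine similarity) between time-domain frames as the correlation measure, which is an assumption made here for clarity and not the measure named in the application, and the function names and toy frames are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Cosine similarity between two equal-length frames (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def find_residual_echo(reference_history, branches, threshold):
    """Search every (i, n) pair; return (i, n, score) for the best pair,
    or None if even the best score does not exceed the threshold."""
    best = None
    for i, ref in enumerate(reference_history):      # historical reference frames si(t)
        for n, branch in enumerate(branches):        # separated branches yn(t)
            score = normalized_correlation(ref, branch)
            if best is None or score > best[2]:
                best = (i, n, score)
    if best is None or best[2] <= threshold:
        return None
    return best

# Toy data: branch 1 is a scaled copy of historical reference frame 1.
reference_history = [[1.0, 0.0, -1.0, 0.0],
                     [0.0, 1.0, 0.0, -1.0],
                     [1.0, 1.0, -1.0, -1.0]]
branches = [[0.2, 0.1, 0.3, 0.1],
            [0.0, 0.3, 0.0, -0.3]]
match = find_residual_echo(reference_history, branches, threshold=0.5)
```

Here `match` identifies reference frame 1 and branch 1, so branch 1 would be treated as the residual echo. The cost grows with the product of the history length and the number of branches, which motivates a cheaper correlation measure for real-time use.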
To meet real-time requirements, the correlation can be calculated with the cross-correlation method used in Web Real-Time Communication (WebRTC), which strikes a good balance between algorithmic complexity and performance. si(t) and yn(t) are converted to the frequency domain by a Fast Fourier Transform (FFT) and divided evenly into 64 sub-bands, of which the 32 most important bands of the spectrum (bands 12 to 43) are used. The algorithm then estimates the mean of the spectrum, threshold_spectrum, and uses it as the threshold: a band whose value is greater than the threshold is set to 1, and otherwise it is set to 0. This efficiently yields binarized spectra for the far-end and near-end signals. A bitwise XOR is applied to the two binarized values, and the number of 1s in the result gives a measure of the correlation between si(t) and yn(t): the count is the number of bands in which the two binarized spectra differ, so a smaller count indicates a closer match.
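The binarize-and-XOR comparison can be sketched in Python as below. This sketch does not reproduce the WebRTC implementation: it assumes the 64 sub-band energies and the per-band thresholds have already been computed and are passed in by the caller, and the function names are hypothetical. Only bands 12 to 43 are used, as in the text.

```python
FIRST_BAND, LAST_BAND = 12, 43  # the 32 bands named in the text

def binary_spectrum(band_energies, thresholds):
    """Pack bands 12..43 into a 32-bit integer: a bit is 1 where the band
    energy exceeds its threshold, and 0 otherwise."""
    bits = 0
    for band in range(FIRST_BAND, LAST_BAND + 1):
        if band_energies[band] > thresholds[band]:
            bits |= 1 << (band - FIRST_BAND)
    return bits

def mismatch_count(bits_a, bits_b):
    """Number of bands in which two binary spectra differ (0 means identical)."""
    return bin(bits_a ^ bits_b).count("1")

# Toy example with 64 sub-bands and a flat threshold of 1.0 per band.
thresholds = [1.0] * 64
energies_a = [0.0] * 64
for band in range(12, 28):          # bands 12..27 are above the threshold
    energies_a[band] = 2.0
energies_b = list(energies_a)
energies_b[12] = 0.0                # one band drops below the threshold
energies_b[40] = 2.0                # one band rises above the threshold

bits_a = binary_spectrum(energies_a, thresholds)
bits_b = binary_spectrum(energies_b, thresholds)
differing_bands = mismatch_count(bits_a, bits_b)
```

In this example the two spectra differ in exactly two bands. Because each comparison reduces to one XOR and one bit count on a 32-bit value, many candidate frames can be compared cheaply.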
In an embodiment, using this fast correlation calculation, the correlation between each si(t) and the unseparated voice data y(t) is first calculated by traversal, and the si(t) with the highest correlation is selected. That si(t) is then compared in turn with each branch yn(t) separated from y(t), and the yn(t) with the highest correlation to si(t) is determined as the residual echo signal.
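The two-stage search in this embodiment needs only i + n comparisons instead of i * n. The Python sketch below shows the control flow under stated assumptions: the similarity function is passed in by the caller, and the plain dot product used in the example is a stand-in chosen for brevity, not the measure of the application. All names are hypothetical.

```python
def two_stage_search(reference_history, mixture, branches, similarity):
    """Stage 1: pick the reference frame most similar to the unseparated mixture.
    Stage 2: pick the branch most similar to that reference frame.
    Returns (reference_index, branch_index)."""
    best_i = max(range(len(reference_history)),
                 key=lambda i: similarity(reference_history[i], mixture))
    best_n = max(range(len(branches)),
                 key=lambda n: similarity(reference_history[best_i], branches[n]))
    return best_i, best_n

def dot(a, b):
    """Stand-in similarity measure for the example."""
    return sum(x * y for x, y in zip(a, b))

# Toy data: the mixture is the sum of the two branches; branch 1 echoes reference 1.
reference_history = [[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]]
branches = [[0.0, 0.0, 0.0, 2.0],
            [0.0, 0.5, 0.0, 0.0]]
mixture = [a + b for a, b in zip(*branches)]
indices = two_stage_search(reference_history, mixture, branches, dot)
```

With this toy data the search selects reference frame 1 and branch 1. The trade-off is that stage 1 compares against the mixture, so a loud near-end talker could mask a weak echo when the best reference frame is chosen.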
Referring to FIG. 5, in an embodiment, the first voice data is voice data obtained after the second terminal device processes the echo data produced by the second voice data with an adaptive-filter echo cancellation algorithm. The adaptive-filter echo cancellation algorithm includes, but is not limited to, step S510, step S520, step S530, step S540, and step S550.
Step S510: the second terminal device obtains estimated delay information according to the second voice data and the reference signal;
Step S520: obtain an estimated echo signal according to the estimated delay information, preset adaptive filter coefficients, and the reference signal;
Step S530: perform a correlation calculation between the second voice data and the estimated echo signal to obtain correlation coefficients at different time points;
Step S540: determine the echo input signal according to the correlation coefficients;
Step S550: filter the echo input signal to obtain the first voice data.
In the echo cancellation algorithm, the second terminal device obtains the estimated delay information from the second voice data and the reference signal; that is, the delay can be calculated from the correlation between these two signals. A set of initial adaptive filter coefficients is preset and, based on the reference signal, iterated until the mean square error is minimized or the upper limit on the number of iterations is reached; this step outputs the estimated echo signal. The estimated echo signal is then correlated with the input signal to obtain correlation coefficients at different time points, and a strategy is applied to them: for example, the correlation coefficients are raised to the n-th power (n >= 2) and compared with a set threshold. The portion whose correlation coefficient is above the threshold can be judged to be near-end input, and the portion whose correlation coefficient is below the threshold can be judged to be far-end echo input. A noise source is generated by a random number generation algorithm and subjected to low-frequency filtering and amplitude limiting to produce comfort noise, yielding the first voice data to be sent to the first terminal device.
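The adaptive-filter step, in which coefficients are iterated to reduce the mean square error, can be illustrated with a normalized least-mean-squares (NLMS) update. The application does not name a specific update rule, so NLMS is an assumption made here for illustration, as are the function name, the tap count, the step size, and the synthetic echo path in the example.

```python
import random

def nlms_echo_canceller(reference, microphone, taps=4, step=0.5, eps=1e-8):
    """Adapt filter weights so the filtered reference tracks the echo in the
    microphone signal. Returns (error_signal, final_weights), where the error
    signal is the microphone signal minus the estimated echo."""
    weights = [0.0] * taps
    history = [0.0] * taps              # most recent reference sample first
    errors = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))   # estimated echo
        error = d - estimate
        power = sum(h * h for h in history) + eps                 # input power for normalization
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        errors.append(error)
    return errors, weights

# Synthetic echo path: 0.6 times the reference delayed by one sample,
# plus 0.2 times the reference delayed by two samples, with no near-end talker.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
microphone = [0.6 * (reference[k - 1] if k >= 1 else 0.0)
              + 0.2 * (reference[k - 2] if k >= 2 else 0.0)
              for k in range(500)]
errors, weights = nlms_echo_canceller(reference, microphone)
```

After adaptation the weights approach the synthetic echo path (about 0.6 and 0.2 at taps 1 and 2) and the error approaches zero. The sketch also shows the limitation discussed in the background: an echo that arrives later than the number of taps cannot be modelled, and the weights need time to re-converge whenever the delay changes.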
基于上述回声消除方法,下面分别提出本申请的回声消除装置、终端设备和计算机可读存储介质的各个实施例。Based on the above-mentioned echo cancellation method, various embodiments of the echo cancellation device, the terminal device and the computer-readable storage medium of the present application are respectively proposed below.
本申请的一个实施例还提供了一种回声消除装置,如图6所示,该回声消除装置包括:获取模块610、分离模块620、判定模块630和过滤模块640。其中获取模块610被设置为获取来自第二终端设备的所输出的第一语音数据,第一语音数据为第二终端设备在接收到来自第一终端设备的第二语音数据后所回复的语音数据;分离模块620被设置为通过时域音频分离模型对第一语音数据进行分离处理,得到多个分路语音数据;判定模块630被设置为将多个分路语音数据分别与第一语音数据的参考信号进行相关度判定,确定残余回声信号;过滤模块640被设置为将第一语音数据中的残余回声信号进行过滤,得到目标语音数据。An embodiment of the present application also provides an echo canceling device. As shown in FIG. 6 , the echo canceling device includes: an acquiring module 610 , a separating module 620 , a judging module 630 and a filtering module 640 . Wherein the obtaining module 610 is configured to obtain the outputted first voice data from the second terminal device, the first voice data is the voice data replied by the second terminal device after receiving the second voice data from the first terminal device The separation module 620 is set to separate the first voice data through the time-domain audio separation model to obtain a plurality of branch voice data; the determination module 630 is set to separate the multiple branch voice data with the first voice data respectively The reference signal is used for correlation determination to determine the residual echo signal; the filtering module 640 is configured to filter the residual echo signal in the first voice data to obtain the target voice data.
The separation module 620 is further configured to separate the first voice data using a TasNet model trained on mixed speech data, obtaining a plurality of branch voice data.
The determination module 630 is further configured to compute the correlation between each of the branch voice data and the reference signal of the first voice data, obtaining one correlation value per branch. When the largest correlation value is greater than a preset threshold, the branch voice data corresponding to that value is determined to be the residual echo signal.
The determination module 630 is further configured to use the cross-correlation function of Web Real-Time Communication (WebRTC) to compute the correlation between each of the branch voice data and the reference signal of the first voice data.
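The decision rule of the determination module (pick the branch with the largest correlation, and accept it only if that correlation exceeds a preset threshold) can be sketched as below. This is an illustration under stated assumptions: it uses a plain normalized correlation computed with NumPy rather than WebRTC's cross-correlation routine, it compares against a single time-aligned reference, and the names `normalized_correlation`, `pick_residual_echo`, and the default threshold of 0.5 are hypothetical.

```python
import numpy as np

def normalized_correlation(a, b):
    """Absolute normalized correlation of two signals, in the range [0, 1]."""
    n = min(len(a), len(b))
    a = a[:n] - np.mean(a[:n])
    b = b[:n] - np.mean(b[:n])
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(abs(a @ b) / denom)

def pick_residual_echo(branches, reference, threshold=0.5):
    """Return the index of the branch judged to be residual echo, or None."""
    scores = [normalized_correlation(b, reference) for b in branches]
    best = int(np.argmax(scores))
    # Only accept the best-matching branch if it clears the preset threshold.
    return best if scores[best] > threshold else None
```

Returning `None` when no branch clears the threshold reflects the case in which the separated branches contain no detectable residual echo, so nothing would be filtered.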
The filtering module 640 is further configured to filter the residual echo signal out of the first voice data using spectral subtraction, obtaining the target voice data.
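A minimal sketch of magnitude spectral subtraction is given below to make the filtering step concrete. The source does not specify framing, windowing, or flooring, so the non-overlapping rectangular frames, the frame length, the `floor` parameter, and the function name `spectral_subtract` are assumptions made for illustration only.

```python
import numpy as np

def spectral_subtract(mixed, echo, frame=256, floor=0.0):
    """Subtract the echo magnitude spectrum from the mixed signal, per frame."""
    n = (min(len(mixed), len(echo)) // frame) * frame
    out = np.zeros(n)
    for start in range(0, n, frame):
        M = np.fft.rfft(mixed[start:start + frame])
        E = np.fft.rfft(echo[start:start + frame])
        # Subtract magnitudes, never going below a fraction of the original.
        mag = np.maximum(np.abs(M) - np.abs(E), floor * np.abs(M))
        # Reuse the phase of the mixed signal for resynthesis.
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(M)), n=frame)
    return out
```

A practical implementation would more likely use overlapping windowed frames to reduce block-edge artifacts; the non-overlapping version is kept here because it is the shortest correct form of the idea.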
It should be noted that the echo cancellation apparatus is configured to employ the same technical means as the echo cancellation method of the above embodiments, solves the same technical problem, and achieves the same technical effect; these are therefore not described again here.
An embodiment of the present application further provides a terminal device. As shown in FIG. 7, the terminal device 700 includes a memory 720, a processor 710, and a computer program stored in the memory 720 and executable on the processor 710.
The processor 710 and the memory 720 may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory 720 can be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 720 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 720 may include memory located remotely from the processor, and such remote memory may be connected to the processor over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the echo cancellation method of the above embodiments are stored in the memory 720. When executed by the processor 710, they carry out the echo cancellation method of the above embodiments, for example steps S100 to S400 of the method in FIG. 2, steps S410 to S430 of the method in FIG. 4, and steps S510 to S550 of the method in FIG. 5, as described above.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor or controller, for example by a processor in the communication device of the above embodiments, the instructions cause the processor to carry out the terminal-device-side echo cancellation method of the above embodiments, for example steps S100 to S400 of the method in FIG. 2, steps S410 to S430 of the method in FIG. 4, and steps S510 to S550 of the method in FIG. 5, as described above.
Embodiments of the present application include the following: the first terminal device acquires first voice data output by the second terminal device, the first voice data being the voice data returned by the second terminal device after it receives second voice data from the first terminal device; the first voice data is separated using a time-domain audio separation model to obtain a plurality of branch voice data; each of the branch voice data is subjected to a correlation determination against a plurality of reference signals at different times to identify the residual echo signal; and the residual echo signal is filtered out of the first voice data to obtain the target voice data. Separating the first voice data with the time-domain audio separation model and comparing the resulting branch voice data with the reference signal makes it possible to identify the residual echo signal, which is then filtered out. This addresses the problem of echo leakage and can provide higher-quality voice output.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Several embodiments of the present application have been described in detail above, but the present application is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (10)

  1. An echo cancellation method applied to a first terminal device, the method comprising:
    acquiring first voice data output by a second terminal device, the first voice data being the voice data returned by the second terminal device after receiving second voice data from the first terminal device;
    separating the first voice data using a time-domain audio separation model to obtain a plurality of branch voice data;
    performing a correlation determination between each of the plurality of branch voice data and a plurality of reference signals at different times to determine a residual echo signal; and
    filtering the residual echo signal out of the first voice data to obtain target voice data.
  2. The echo cancellation method according to claim 1, wherein performing the correlation determination between each of the plurality of branch voice data and the plurality of reference signals at different times to determine the residual echo signal comprises:
    computing the correlation between each of the plurality of branch voice data and the plurality of reference signals at different times to obtain the reference signal with the maximum correlation;
    when the maximum correlation is greater than a preset threshold, determining the reference signal corresponding to the maximum correlation to be a target reference signal; and
    determining the branch voice data corresponding to the target reference signal to be the residual echo signal.
  3. The echo cancellation method according to claim 2, wherein performing the correlation determination between each of the plurality of branch voice data and the plurality of reference signals at different times comprises:
    using the cross-correlation function of Web Real-Time Communication (WebRTC) to perform the correlation determination between each of the plurality of branch voice data and the plurality of reference signals at different times.
  4. The echo cancellation method according to claim 1, wherein the first voice data is voice data obtained after the second terminal device processes, by means of an adaptive-filtering echo cancellation algorithm, the echo data produced by the second voice data.
  5. The echo cancellation method according to claim 4, wherein the adaptive-filtering echo cancellation algorithm comprises:
    obtaining, by the second terminal device, estimated delay information from the second voice data and the reference signal;
    obtaining an estimated echo signal from the estimated delay information, preset adaptive filter coefficients, and the reference signal;
    computing the correlation between the second voice data and the estimated echo signal to obtain correlation coefficients at different time points;
    determining an echo input signal according to the correlation coefficients; and
    filtering the echo input signal to obtain the first voice data.
  6. The echo cancellation method according to claim 1, wherein the time-domain audio separation model is a TasNet model trained on mixed speech data.
  7. The echo cancellation method according to claim 1, wherein filtering the residual echo signal out of the first voice data to obtain the target voice data comprises:
    filtering the residual echo signal out of the first voice data using spectral subtraction to obtain the target voice data.
  8. An echo cancellation apparatus, comprising:
    an acquisition module configured to acquire first voice data output by a second terminal device, the first voice data being the voice data returned by the second terminal device after receiving second voice data from the first terminal device;
    a separation module configured to separate the first voice data using a time-domain audio separation model to obtain a plurality of branch voice data;
    a determination module configured to perform a correlation determination between each of the plurality of branch voice data and a plurality of reference signals at different times to determine a residual echo signal; and
    a filtering module configured to filter the residual echo signal out of the first voice data to obtain target voice data.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the echo cancellation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the echo cancellation method according to any one of claims 1 to 7.
PCT/CN2022/093959 2021-09-14 2022-05-19 Echo cancellation method, and terminal device and storage medium WO2023040322A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111073754.6A CN115810361A (en) 2021-09-14 2021-09-14 Echo cancellation method, terminal device and storage medium
CN202111073754.6 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023040322A1 true WO2023040322A1 (en) 2023-03-23

Family

ID=85481395

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093959 WO2023040322A1 (en) 2021-09-14 2022-05-19 Echo cancellation method, and terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN115810361A (en)
WO (1) WO2023040322A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108881652A (en) * 2018-07-11 2018-11-23 北京大米科技有限公司 Echo detection method, storage medium and electronic equipment
US20190066713A1 (en) * 2016-06-14 2019-02-28 The Trustees Of Columbia University In The City Of New York Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
CN109410975A (en) * 2018-10-31 2019-03-01 歌尔科技有限公司 A kind of voice de-noising method, equipment and storage medium
CN110047497A (en) * 2019-05-14 2019-07-23 腾讯科技(深圳)有限公司 Background audio signals filtering method, device and storage medium
CN111341336A (en) * 2020-03-16 2020-06-26 北京字节跳动网络技术有限公司 Echo cancellation method, device, terminal equipment and medium
CN112929506A (en) * 2019-12-06 2021-06-08 阿里巴巴集团控股有限公司 Audio signal processing method and apparatus, computer storage medium, and electronic device

Also Published As

Publication number Publication date
CN115810361A (en) 2023-03-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22868699

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE