CN113763980A - Echo cancellation method - Google Patents

Echo cancellation method

Info

Publication number
CN113763980A
CN113763980A (application CN202111277825.4A)
Authority
CN
China
Prior art keywords
domain signal, frequency domain, ref, signal, vec
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111277825.4A
Other languages
Chinese (zh)
Other versions
CN113763980B (en)
Inventor
刘文通
万东琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chipintelli Technology Co Ltd
Original Assignee
Chipintelli Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipintelli Technology Co Ltd filed Critical Chipintelli Technology Co Ltd
Priority to CN202111277825.4A priority Critical patent/CN113763980B/en
Publication of CN113763980A publication Critical patent/CN113763980A/en
Application granted granted Critical
Publication of CN113763980B publication Critical patent/CN113763980B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An echo cancellation method, comprising the steps of: S1, acquiring a digital microphone signal and a digital reference signal through a microphone array; S2, converting the digital time domain signals into frequency domain signals; S3, performing linear prediction caching and nonlinear expansion on the reference frequency domain signal to obtain a reference frequency domain signal matrix; S4, calculating an autocorrelation diagonalization matrix; S5, calculating an echo cancellation gain vector for each frequency point and performing echo cancellation on the microphone frequency domain signal obtained in step S2; and S6, outputting the final output frequency domain signal and converting it into a time domain signal. Compared with traditional echo cancellation methods, the method mitigates the impact of system nonlinear distortion on the processing result, reduces computational resource consumption by design, effectively improves the signal-to-noise ratio of the processed speech signal, and thereby improves the echo cancellation effect.

Description

Echo cancellation method
Technical Field
The invention belongs to the technical field of audio processing, and particularly relates to an echo cancellation method.
Background
Echo cancellation technology is widely applied in audio systems that contain both a loudspeaker and a microphone. With the rapid development of artificial intelligence and the Internet of Things, practical products place ever stricter requirements on echo cancellation quality, computing power and memory.
A common echo cancellation method estimates the echo channel with an adaptive filter and then cancels the echo; the adaptive filter in such methods involves matrix inversion and nonlinear suppression, which increases the hardware cost of a product. In recent years, many echo cancellation methods based on deep neural networks have appeared. These can further improve the echo cancellation effect and handle nonlinear distortion, reverberation and environmental noise to some degree, but in complex application environments the selection of a training set is a challenge that directly affects stability in practice, and the computing power and memory demanded by deep-learning-based echo cancellation limit its wide application.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention discloses an echo cancellation method.
The echo cancellation method of the invention comprises the following steps:
S1, acquiring an analog microphone signal and an analog reference signal through a microphone array, converting them into digital time domain signals, and obtaining a digital microphone signal and a digital reference signal respectively;
wherein the analog microphone signal is the electrical signal output by the microphone after it picks up the sound emitted by the loudspeaker, and the analog reference signal is the electrical signal input into the loudspeaker;
s2, converting the digital microphone signal and the digital reference signal in the form of digital time domain signal into a microphone frequency domain signal and a reference frequency domain signal respectively by adopting short-time Fourier transform technology;
s3, performing linear prediction caching and nonlinear expansion on the reference frequency domain signal to obtain a reference frequency domain signal matrix, wherein the reference frequency domain signal matrix is composed of a plurality of reference frequency domain signal vectors;
The calculation process of the reference frequency domain signal vector REF_VEC_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is:
S31, setting a linear prediction length LP and building the prediction buffer vector for the kth frequency point of the lth frame of the qth reference channel:
REF_VEC_PRE_q(k,l) = [Ref_q(k,l), Ref_q(k,l-1), …, Ref_q(k,l-LP+1)]
where Ref_q(k,l) is the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on;
S32, applying nonlinear expansion to the reference frequency domain signals stored in the prediction buffer vector REF_VEC_PRE_q(k,l), obtaining the nonlinearly expanded reference frequency domain signal vector Ref_VEC_q(k,l):
[equation image not reproduced: Ref_VEC_q(k,l) collects the expanded terms ref_vec_p_{q,p}(k,l) of the buffered signals]
p1, p2, … p_LP are the orders of the nonlinear expansion; ref_vec_p_{q,p}(k,l) denotes the p-order expanded reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on. The reference frequency domain signal Ref_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is nonlinearly expanded through an odd power series, specifically:
ref_vec_p_{q,p}(k,l) = Ref_q(k,l)^(2p-1), and so on;
s33, traversing each frame, frequency point and reference channel, and combining all reference frequency domain signal vectors to obtain a reference frequency domain signal matrix REF _ VEC;
s4, calculating an autocorrelation diagonalization matrix R _ IVM of the reference frequency domain signal matrix REF _ VEC;
s5, calculating echo cancellation gain vector W of each frequency point, and performing echo cancellation on the microphone frequency domain signal obtained in the step S2;
S51, calculating the cross-correlation vector R_MIC_REF_q(k,l) between the microphone frequency domain signal Mic_n(k,l) of the kth frequency point of the lth frame of the nth microphone channel and the reference frequency domain signal of the kth frequency point of the lth frame of the qth channel;
S52, traversing all reference frequency domain signal vectors Ref_VEC_q(k,l) multiple times, taking the element of the nonlinearly expanded reference frequency domain signal vector used in the jth traversal as ref_vec_n_q(j),
The specific process of each traversal is as follows:
S521. R_y_ref = y_{n,j-1}(k,l) * conj(ref_vec_n_q(j));
where conj denotes the conjugate, y_{n,j-1}(k,l) is the residual speech frequency domain signal of the kth frequency point of the lth frame from the (j-1)th traversal of the nth microphone channel, and R_y_ref is a traversal intermediate variable;
when j = 1, y_{n,j-1}(k,l) = Mic_n(k,l);
S522. The smoothed cross-correlation signal of the kth frequency point of the lth frame in the jth traversal is r_cm_q(k,l,j) = λ * r_cm_q(k,l-1,j) + (1-λ) * R_y_ref, λ being a smoothing factor;
S523. Echo cancellation gain in the jth traversal:
W(j) = r_cm_q(k,l,j) / [r_ivm_q(k,l,j) + δ], where δ is a small value preventing the denominator from being zero;
r_ivm_q(k,l,j) is the autocorrelation diagonalization signal of the kth frequency point reference frequency domain signal of the lth frame of the qth channel in the jth traversal, taken from the autocorrelation diagonalization matrix R_IVM obtained in step S4;
S524. Perform the echo cancellation processing: the result y_{n,j-1}(k,l) of the previous traversal is used in the current traversal, and the residual speech frequency domain signal computed in the current traversal is
y_{n,j}(k,l) = y_{n,j-1}(k,l) - W(j) * ref_vec_n_q(j);
S6, after all traversals are finished, outputting the final output frequency domain signal obtained from the last traversal and converting it into a time domain signal.
Preferably, the step S4 specifically includes:
S41, the diagonal reduced matrix of the kth frequency point of the lth frame of the qth reference channel in the reference frequency domain signal matrix REF_VEC is R_Ref_q(k,l) = Ref_VEC_q(k,l) * Ref_VEC_q(k,l)^H,
where Ref_VEC_q(k,l) is the reference frequency domain signal vector obtained by nonlinear expansion at the kth frequency point of the lth frame of the qth reference channel, the superscript H denotes the conjugate transpose, and * denotes the dot product;
S42, the autocorrelation diagonalization vector of the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel is updated recursively:
R_IVM_q(k,l) = λ * R_IVM_q(k,l-1) + (1-λ) * R_Ref_q(k,l);
λ is the smoothing factor;
s43, traversing each frame, frequency point and reference channel, and combining all the reference frequency domain signal autocorrelation diagonalization vectors to obtain an autocorrelation diagonalization matrix R _ IVM.
Preferably, the smoothing factor λ takes a value of 0.7 to 0.99.
Preferably, in step S6, the frequency domain signal after the echo cancellation is converted into a time domain signal by using an inverse short-time fourier transform module.
Compared with the traditional echo cancellation method, the scheme of the invention utilizes a rapid echo cancellation algorithm, the signal-to-noise ratio of the processed voice signal is higher, and the echo cancellation effect can be effectively improved.
Drawings
Fig. 1 is a flow chart of an embodiment of the echo cancellation method according to the present invention;
FIG. 2 is a schematic flow chart of an echo cancellation method according to the present invention;
FIG. 3 is a waveform diagram of a time domain signal before echo cancellation processing in an embodiment of the present invention;
In fig. 3, the (A1) signal is the microphone signal acquired by the microphone array, and the (A2) signal is the reference signal;
FIG. 4 is a schematic diagram illustrating a comparison of waveforms obtained by performing echo cancellation processing on the signal in FIG. 3 according to the prior art and the present invention;
In fig. 4, (A3) is the output waveform processed by a prior art echo cancellation method, and (A4) is the output waveform processed by the echo cancellation device of the present invention shown in fig. 2;
in fig. 3 and 4, the abscissa represents time, and the ordinate represents voltage amplitude.
Detailed Description
The following provides a more detailed description of the present invention.
The echo cancellation method of the invention comprises the following steps:
S1, acquiring an analog microphone signal and an analog reference signal through a microphone array, converting them into digital time domain signals, and obtaining a digital microphone signal and a digital reference signal respectively;
wherein the analog microphone signal is the electrical signal output by the microphone after it picks up the sound emitted by the loudspeaker, and the analog reference signal is the electrical signal input into the loudspeaker;
s2, converting the digital microphone signal and the digital reference signal in the form of digital time domain signal into a microphone frequency domain signal and a reference frequency domain signal respectively by adopting short-time Fourier transform technology;
s3, performing linear prediction caching and nonlinear expansion on the reference frequency domain signal to obtain a reference frequency domain signal matrix, wherein the reference frequency domain signal matrix is composed of a plurality of reference frequency domain signal vectors;
The calculation process of the reference frequency domain signal vector REF_VEC_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is:
S31, setting a linear prediction length LP and storing the prediction buffer vector:
REF_VEC_PRE_q(k,l) = [Ref_q(k,l), Ref_q(k,l-1), …, Ref_q(k,l-LP+1)]
where Ref_q(k,l) is the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on;
S32, applying nonlinear expansion to the reference frequency domain signals stored in the prediction buffer vector REF_VEC_PRE_q(k,l), obtaining the nonlinearly expanded reference frequency domain signal vector Ref_VEC_q(k,l):
[equation image not reproduced: Ref_VEC_q(k,l) collects the expanded terms ref_vec_p_{q,p}(k,l) of the buffered signals]
p1, p2, … p_LP are the orders of the nonlinear expansion; ref_vec_p_{q,p}(k,l) denotes the p-order expanded reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on. The reference frequency domain signal Ref_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is nonlinearly expanded through an odd power series, specifically:
ref_vec_p_{q,p}(k,l) = Ref_q(k,l)^(2p-1), and so on;
s33, traversing each frame, frequency point and reference channel, and combining all reference frequency domain signal vectors to obtain a reference frequency domain signal matrix REF _ VEC;
s4, calculating an autocorrelation diagonalization matrix R _ IVM of the reference frequency domain signal matrix REF _ VEC;
s5, calculating echo cancellation gain vector W of each frequency point, and performing echo cancellation on the microphone frequency domain signal obtained in the step S2;
S51, calculating the cross-correlation vector R_MIC_REF_q(k,l) between the microphone frequency domain signal Mic_n(k,l) of the kth frequency point of the lth frame of the nth microphone channel and the reference frequency domain signal of the kth frequency point of the lth frame of the qth channel;
S52, traversing all reference frequency domain signal vectors Ref_VEC_q(k,l) multiple times, taking the element of the nonlinearly expanded reference frequency domain signal vector used in the jth traversal as ref_vec_n_q(j),
The specific process of each traversal is as follows:
S521. Traversal intermediate variable R_y_ref = y_{n,j-1}(k,l) * conj(ref_vec_n_q(j));
where conj denotes the conjugate and y_{n,j-1}(k,l) is the residual speech frequency domain signal of the kth frequency point of the lth frame from the (j-1)th traversal of the nth microphone channel;
when j = 1, y_{n,j-1}(k,l) = Mic_n(k,l);
ref_vec_n_q(j) is the jth element of the reference frequency domain signal vector Ref_VEC_q(k,l);
S522. The smoothed cross-correlation signal of the kth frequency point of the lth frame in the jth traversal is r_cm_q(k,l,j) = λ * r_cm_q(k,l-1,j) + (1-λ) * R_y_ref, λ being a smoothing factor;
S523. Echo cancellation gain in the jth traversal:
W(j) = r_cm_q(k,l,j) / [r_ivm_q(k,l,j) + δ], where δ is a small value preventing the denominator from being zero;
r_ivm_q(k,l,j) is the autocorrelation diagonalization signal of the kth frequency point reference frequency domain signal of the lth frame of the qth channel in the jth traversal, taken from the autocorrelation diagonalization matrix R_IVM obtained in step S4;
S524. Perform the echo cancellation processing: the result y_{n,j-1}(k,l) of the previous traversal is used in the current traversal, and the residual speech frequency domain signal computed in the current traversal is
y_{n,j}(k,l) = y_{n,j-1}(k,l) - W(j) * ref_vec_n_q(j);
S6, after all traversals are finished, outputting the final output frequency domain signal obtained from the last traversal and converting it into a time domain signal.
One embodiment, as shown in fig. 1, may be implemented by the following steps:
s1, acquiring an analog microphone signal and an analog reference signal through a microphone array, and then converting the analog microphone signal and the analog reference signal into a digital time domain signal by adopting an analog-to-digital converter (ADC) to obtain a digital microphone signal and a digital reference signal;
wherein the analog microphone signal is the electrical signal output by the microphone after it picks up the sound emitted by the loudspeaker, and the analog reference signal is the electrical signal input into the loudspeaker;
S2, converting the digital time domain signals into digital frequency domain signals using the short-time Fourier transform:
the digital microphone signal and the digital reference signal obtained in step S1 are converted into frequency domain signals with K frequency points.
To detail the embodiment, the minimum system is taken as an example: a single-microphone, single-loudspeaker system with the number of microphones N = 1 and the number of reference channels Q = 1. The microphone time domain signal of the current lth frame of the digital microphone signal is converted into a microphone frequency domain signal, and the reference time domain signal of the current lth frame is converted into a reference frequency domain signal.
In a specific embodiment, a 512-point short-time Fourier transform is adopted, so the number of frequency points is K = 257; the microphone frequency domain signal and the reference frequency domain signal of each frame are then each of dimension 1 × K.
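As a sketch only (not the patent's own code), the 512-point short-time Fourier analysis described above can be illustrated in Python/NumPy; the function name, Hann window and hop size of 256 samples are assumptions:

```python
import numpy as np

def stft_frames(x, n_fft=512, hop=256):
    """Split a time-domain signal into windowed frames and return
    K = n_fft // 2 + 1 frequency points per frame (one-sided real FFT)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    spec = np.empty((n_frames, n_fft // 2 + 1), dtype=np.complex128)
    for l in range(n_frames):
        frame = x[l * hop : l * hop + n_fft] * window
        spec[l] = np.fft.rfft(frame)
    return spec

x = np.random.randn(16000)   # 1 s of audio at 16 kHz (illustrative)
Mic = stft_frames(x)
print(Mic.shape[1])          # 257 frequency points, matching K = 257
```

With a 512-point transform the one-sided spectrum has 512/2 + 1 = 257 bins, which is where K = 257 comes from.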
and S3, performing linear prediction caching and nonlinear expansion on the reference frequency domain signal to obtain a reference frequency domain signal matrix, wherein the reference frequency domain signal matrix is composed of a plurality of reference frequency domain signal vectors.
For convenience of description, the calculation process of the reference frequency domain signal vector REF_VEC_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is described in detail; the value of q is at least 1.
Because the loudspeaker signal collected by the microphone has a strong linear correlation with the original reference signal, this correlation can be approximated using linear prediction. Implementing linear prediction requires buffering past frame signals, so the prediction buffer vector REF_VEC_PRE_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is used for buffering; the linear prediction length is set to LP = 4 in this embodiment:
REF_VEC_PRE_q(k,l) = [Ref_q(k,l), Ref_q(k,l-1), …, Ref_q(k,l-4+1)];
Ref_q(k,l) is the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on.
In practical applications, especially in embedded devices using micro-speakers, nonlinearity is inevitable. To weaken the influence of system nonlinear distortion on the echo cancellation effect, the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel is nonlinearly expanded through an odd power series, giving the nonlinearly expanded reference frequency domain signal:
ref_vec_p_{q,i}(k,l) = Ref_q(k,l)^(2i-1), i = 1, 2, … m, where m is the expansion order; the vector of nonlinear expansion orders is P = [2,2,1,1].
To account for both linear prediction and nonlinear expansion, the reference frequency domain signals stored in the prediction buffer vector REF_VEC_PRE_q(k,l) are nonlinearly expanded to obtain the nonlinearly expanded reference frequency domain signal vector Ref_VEC_q(k,l):
[equation image not reproduced: Ref_VEC_q(k,l) stacks the odd-power expanded terms of the LP buffered signals]
Since P = [2,2,1,1], the rows are expanded from top to bottom with orders 2, 2, 1 and 1; ref_vec_p_{q,2}(k,l) denotes the 2-order expanded reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on;
traversing each frame, frequency point and reference channel, and combining all reference frequency domain signal vectors to obtain a reference frequency domain signal matrix REF _ VEC;
In this embodiment, the resulting Ref_VEC_q(k,l) has dimension 1 × 6.
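Steps S31 and S32 can be sketched as follows, assuming a single reference channel and a single frequency bin; the function name and the example values are illustrative, and the expansion-order vector defaults to the P = [2,2,1,1] used above:

```python
import numpy as np

def build_ref_vec(ref_hist, P=(2, 2, 1, 1)):
    """ref_hist: buffered reference spectra [Ref(k,l), Ref(k,l-1), ...] with
    len(ref_hist) == LP.  Each buffered value is expanded through an odd
    power series, the p-order term being Ref ** (2p - 1).  Returns the
    concatenated nonlinearly expanded vector Ref_VEC(k,l)."""
    out = []
    for ref, order in zip(ref_hist, P):
        for p in range(1, order + 1):
            out.append(ref ** (2 * p - 1))   # Ref, Ref^3, Ref^5, ...
    return np.asarray(out)

# LP = 4 buffered frames at one frequency bin (complex spectra, made up)
hist = [0.5 + 0.1j, 0.4 - 0.2j, 0.3 + 0.0j, 0.2 + 0.3j]
vec = build_ref_vec(hist)
print(len(vec))   # 2 + 2 + 1 + 1 = 6 elements, matching the 1 x 6 dimension
```

The first buffered frame contributes its 1st and 3rd powers, the second likewise, and the last two frames contribute only their linear terms.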
And S4, calculating an autocorrelation diagonalization matrix R _ IVM of the reference frequency domain signal matrix REF _ VEC.
For convenience of description, the calculation process of the autocorrelation diagonalization vector R_IVM_q(k,l) of the kth frequency point of the lth frame of the qth reference channel is described in detail.
Usually, an autocorrelation matrix of the reference signal is calculated in an adaptive filter to analyze the correlation between the microphone signal and the reference signal, which would require computing the autocorrelation matrix R_Ref_VEC_q(k,l) of the kth frequency point prediction buffer vector of the lth frame of the qth reference channel. Since the vector Ref_VEC_q(k,l) has dimension 1 × 6, the matrix R_Ref_VEC_q(k,l) has dimension 6 × 6. When the order is set large to obtain a good echo cancellation effect, this creates a huge burden for the subsequent calculation, and the memory space it requires is prohibitive for practical embedded products. The matrix is therefore approximated by its diagonal sequence: after reduction, the diagonal reduced matrix of the kth frequency point of the lth frame of the qth reference channel is R_Ref_q(k,l) = Ref_VEC_q(k,l) * Ref_VEC_q(k,l)^H,
where the superscript H denotes the conjugate transpose and * denotes the dot product.
The resulting R_Ref_q(k,l) has dimension 1 × 6; compared with the 6 × 6 matrix R_Ref_VEC_q(k,l), the diagonal approximation greatly reduces both memory and the subsequent amount of calculation.
Because the amplitude of the reference signal fluctuates strongly in actual processing, R_Ref_q(k,l) is smoothed for system stability, giving the smoothed autocorrelation diagonalized vector of the reference frequency domain signal:
R_IVM_q(k,l) = λ * R_IVM_q(k,l-1) + (1-λ) * R_Ref_q(k,l)
where λ is a smoothing factor, generally between 0.7 and 0.999; λ = 0.99 in the present embodiment;
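Steps S41 and S42 can be sketched for one frequency bin as follows; the function name is an assumption, the recursion assumes the previous-frame value feeds the smoothing, and λ = 0.99 as in the embodiment:

```python
import numpy as np

def update_r_ivm(r_ivm_prev, ref_vec, lam=0.99):
    """Diagonal approximation of the reference autocorrelation: instead of
    the full outer product (a 6x6 matrix for a 1x6 vector), keep only the
    element-wise power |Ref_VEC|^2 (a 1x6 vector), smoothed across frames."""
    r_ref = ref_vec * np.conj(ref_vec)        # Ref_VEC .* Ref_VEC^H (diagonal)
    return lam * r_ivm_prev + (1.0 - lam) * r_ref

# a made-up expanded reference vector held fixed over a few frames
vec = np.array([0.5 + 0.1j, 0.125, 0.4 - 0.2j, 0.064, 0.3, 0.2 + 0.3j])
r_ivm = np.zeros(6, dtype=np.complex128)
for _ in range(3):                             # three frames of smoothing
    r_ivm = update_r_ivm(r_ivm, vec)
print(bool(np.all(r_ivm.real >= 0)))           # True: power terms stay non-negative
```

Keeping only the diagonal trades some accuracy for a six-fold memory reduction per bin (1 × 6 instead of 6 × 6), which is the point made in the text above.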
s5, calculating echo cancellation gain vector W of each frequency point, and performing echo cancellation on the microphone frequency domain signal obtained in the step S2;
S51, calculating the cross-correlation vector R_MIC_REF_q(k,l) between the microphone frequency domain signal Mic_n(k,l) of the kth frequency point of the lth frame of the nth microphone channel and the reference frequency domain signal of the kth frequency point of the lth frame of the qth channel; for a single microphone system, n = 1;
S52, traversing all reference frequency domain signal vectors Ref_VEC_q(k,l) multiple times, taking the element of the nonlinearly expanded reference frequency domain signal vector used in the jth traversal as ref_vec_n_q(j),
The specific process of each traversal is as follows:
S521. R_y_ref = y_{n,j-1}(k,l) * conj(ref_vec_n_q(j));
where conj denotes the conjugate and y_{n,j-1}(k,l) is the residual speech frequency domain signal of the kth frequency point of the lth frame from the (j-1)th traversal of the nth microphone channel;
when j = 1, y_{n,j-1}(k,l) = Mic_n(k,l);
ref_vec_n_q(j) is the jth element of the reference frequency domain signal vector Ref_VEC_q(k,l);
S522. The smoothed cross-correlation signal of the kth frequency point of the lth frame in the jth traversal:
r_cm_q(k,l,j) = λ * r_cm_q(k,l-1,j) + (1-λ) * R_y_ref, λ being the smoothing factor;
S523. Echo cancellation gain in the jth traversal:
W(j) = r_cm_q(k,l,j) / [r_ivm_q(k,l,j) + δ], where δ is a small value preventing the denominator from being zero; δ = 10^-6 may be taken;
r_ivm_q(k,l,j) is the autocorrelation diagonalization signal of the kth frequency point reference frequency domain signal of the lth frame of the qth channel in the jth traversal, taken from the autocorrelation diagonalization matrix R_IVM obtained in step S4;
S524. Perform the echo cancellation processing: the result y_{n,j-1}(k,l) of the previous traversal is used in the current traversal, and the residual speech frequency domain signal computed in the current traversal is
y_{n,j}(k,l) = y_{n,j-1}(k,l) - W(j) * ref_vec_n_q(j);
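The traversal of steps S521 to S524 for a single frequency bin can be sketched as follows; this is a single-channel illustration with made-up numbers, the function name is an assumption, and the cross-correlation state r_cm is assumed initialized to zero:

```python
import numpy as np

def cancel_bin(mic, ref_vec, r_ivm, r_cm, lam=0.99, delta=1e-6):
    """One frame of echo cancellation at one frequency bin.
    mic:     microphone spectrum Mic_n(k,l)         (complex scalar)
    ref_vec: nonlinearly expanded reference vector  (length J)
    r_ivm:   smoothed autocorrelation diagonal      (length J)
    r_cm:    smoothed cross-correlation state       (length J, updated in place)
    Returns the residual speech spectrum after traversing all J elements."""
    y = mic
    for j in range(len(ref_vec)):
        r_y_ref = y * np.conj(ref_vec[j])                    # S521
        r_cm[j] = lam * r_cm[j] + (1.0 - lam) * r_y_ref      # S522
        w = r_cm[j] / (r_ivm[j] + delta)                     # S523
        y = y - w * ref_vec[j]                               # S524
    return y

ref_vec = np.array([0.5, 0.125, 0.4, 0.064, 0.3, 0.2], dtype=np.complex128)
r_ivm = np.abs(ref_vec) ** 2          # pre-smoothed diagonal, held fixed here
r_cm = np.zeros_like(ref_vec)
for _ in range(50):                   # repeated frames let the gains adapt
    y_out = cancel_bin(0.8 + 0.0j, ref_vec, r_ivm, r_cm)
print(bool(abs(y_out) < 0.8))         # True: the residual shrinks vs. the input
```

Note that each element's subtraction uses the residual left by the previous element, so the traversal acts as a cascade of one-tap cancellers rather than a joint solve, which is what avoids the matrix inversion mentioned in the background.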
S6, after all traversals are finished, outputting the final output frequency domain signal obtained from the last traversal and converting it into a time domain signal.
In step S6, an inverse short-time Fourier transform (ISTFT) module may be used to convert the echo-cancelled frequency domain signal into a time domain signal, which can then be transmitted directly to the next processing module in the system.
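The ISTFT conversion of step S6 can be sketched as inverse real FFTs followed by overlap-add; the function name, synthesis window and hop size are assumptions matching the earlier 512-point analysis sketch:

```python
import numpy as np

def istft_frames(spec, n_fft=512, hop=256):
    """Inverse of a one-sided STFT: inverse real FFT per frame, then
    overlap-add with a synthesis window to rebuild the time-domain signal."""
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    window = np.hanning(n_fft)
    for l in range(n_frames):
        out[l * hop : l * hop + n_fft] += np.fft.irfft(spec[l], n_fft) * window
    return out

# one synthetic frame: the spectrum of a Hann-windowed constant signal
spec = np.fft.rfft(np.hanning(512) * np.ones(512))[None, :]
y = istft_frames(spec)
print(y.shape[0])   # 512 samples reconstructed from a single frame
```

A production implementation would also compensate for the analysis-times-synthesis window gain; this sketch omits that normalization for brevity.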
Compared with conventional echo cancellation methods, the scheme of the invention uses a fast echo cancellation algorithm to effectively improve the echo cancellation effect. Figs. 3 and 4 show a specific embodiment of the invention, with the echo cancellation processing performed by the echo cancellation device shown in fig. 2. Fig. 3 shows the time domain signals before echo cancellation: the (A1) signal is the microphone signal acquired by the microphone array, containing ambient noise, sound played by the loudspeaker, and the target human voice, the target human voice being a command word played by a sound source 3 m from the microphone; the (A2) signal is the reference signal, i.e. the signal input to the loudspeaker by the audio source shown in fig. 2.
Fig. 4 shows the waveforms obtained after echo cancellation processing: part (A4) is the output processed by the echo cancellation device of the present invention shown in fig. 2, and part (A3) is the output processed by a prior-art RLS (recursive least squares) echo cancellation method. As can be seen from fig. 4, with the present invention the difference between the target speech (the parts with larger voltage amplitude in the waveforms) and the echo residual (the parts with smaller amplitude) is larger; that is, the signal-to-noise ratio of the processed speech signal is higher, indicating a better echo cancellation effect.
The foregoing describes preferred embodiments of the present invention. These preferred embodiments may be combined in any manner that is not obviously contradictory and does not violate a prerequisite of a particular embodiment. The specific parameters in the examples serve only to clearly illustrate the inventor's verification process and are not intended to limit the scope of patent protection of the present invention, which is defined by the claims; equivalent structural changes based on the content of the description also fall within the protection scope of the present invention.

Claims (4)

1. An echo cancellation method, comprising the steps of:
S1, acquiring an analog microphone signal through a microphone array and an analog reference signal, converting the analog microphone signal and the analog reference signal into digital time domain signals, and obtaining a digital microphone signal and a digital reference signal, respectively;
wherein the analog microphone signal is the electrical signal output by the microphone after receiving the sound emitted by the loudspeaker, and the analog reference signal is the electrical signal input into the loudspeaker;
S2, converting the digital microphone signal and the digital reference signal from digital time domain form into a microphone frequency domain signal and a reference frequency domain signal, respectively, using a short-time Fourier transform;
S3, performing linear prediction caching and nonlinear expansion on the reference frequency domain signal to obtain a reference frequency domain signal matrix, wherein the reference frequency domain signal matrix is composed of a plurality of reference frequency domain signal vectors;
the reference frequency domain signal vector Ref_VEC_q(k,l) of frequency point k of frame l of the qth reference channel is calculated as follows:
S31, setting a linear prediction length LP, and building the prediction buffer for the kth frequency point of the lth frame of the qth reference channel:
REF_VEC_PRE_q(k,l) = [Ref_q(k,l), Ref_q(k,l-1), …, Ref_q(k,l-LP+1)]
where Ref_q(k,l) is the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel, and so on for the other elements;
S32, performing nonlinear expansion on the reference frequency domain signals stored in the prediction buffer vector REF_VEC_PRE_q(k,l) to obtain the nonlinearly expanded reference frequency domain signal vector:
Ref_VEC_q(k,l) = [ref_vec_p_{q,p1}(k,l), ref_vec_p_{q,p2}(k,l), …, ref_vec_p_{q,pLP}(k,l)]
where p1, p2, …, pLP are the orders of the nonlinear expansion, LP is the linear prediction length, and ref_vec_p_{q,p}(k,l) denotes the p-order expanded reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel; the reference frequency domain signal Ref_q(k,l) is nonlinearly expanded through an odd power series, specifically:
ref_vec_p_{q,p}(k,l) = Ref_q(k,l)^(2p-1), and so on for the other elements;
S33, traversing each frame, frequency point, and reference channel, and combining all reference frequency domain signal vectors to obtain a reference frequency domain signal matrix REF_VEC;
S4, calculating an autocorrelation diagonalization matrix R_IVM of the reference frequency domain signal matrix REF_VEC;
S5, calculating an echo cancellation gain vector W for each frequency point, and performing echo cancellation on the microphone frequency domain signal obtained in step S2;
S51, calculating the cross-correlation vector R_MIC_Ref_q(k,l) between the microphone frequency domain signal Mic_n(k,l) of the kth frequency point of the lth frame of the nth microphone channel and the reference frequency domain signal of the kth frequency point of the lth frame of the qth channel;
S52, traversing all the reference frequency domain signal vectors Ref_VEC_q(k,l) multiple times, taking the element of the nonlinearly expanded reference frequency domain signal vector used in the jth traversal as ref_vec_n_q(j).
The specific process of each traversal is as follows:
S521. R_y_ref = y_{n,j-1}(k,l) * conj(ref_vec_n_q(j));
where conj denotes the conjugate, y_{n,j-1}(k,l) is the residual voice frequency domain signal of the kth frequency point of the lth frame of the nth microphone channel after the (j-1)th traversal, and R_y_ref is a traversal intermediate variable;
when j = 1, y_{n,j-1}(k,l) = Mic_n(k,l);
S522, the smoothed cross-correlation signal of the kth frequency point of the lth frame in the jth traversal is r_cm_q(k,l,j) = λ * r_cm_q(k,l-1,j) + (1-λ) * R_y_ref, where λ is a smoothing factor;
S523, the echo cancellation gain in the jth traversal:
W(j) = r_cm_q(k,l,j) / [r_ivm_q(k,l,j) + δ], where δ is a small value that prevents the denominator from being zero;
and r_ivm_q(k,l,j) is the autocorrelation diagonalization signal of the reference frequency domain signal of the kth frequency point of the lth frame of the qth channel in the jth traversal, taken from the autocorrelation diagonalization matrix R_IVM obtained in step S4;
S524, performing the echo cancellation processing: the result y_{n,j-1}(k,l) of the previous traversal of the residual voice frequency domain signal is used in the current traversal, and the residual voice frequency domain signal y_{n,j}(k,l) of the current traversal is calculated as
y_{n,j}(k,l) = y_{n,j-1}(k,l) - W(j) * ref_vec_n_q(j);
S6, after all traversals are finished, outputting the final output frequency domain signal obtained by the last traversal and converting it into a time domain signal.
2. The echo cancellation method according to claim 1, wherein step S4 specifically comprises:
S41, calculating the diagonal reduced matrix of the kth frequency point of the lth frame of the qth reference channel in the reference frequency domain signal matrix REF_VEC:
R_Ref_q(k,l) = Ref_VEC_q(k,l) * Ref_VEC_q(k,l)^H
where Ref_VEC_q(k,l) is the reference frequency domain signal vector obtained by nonlinear expansion of the kth frequency point of the lth frame of the qth reference channel, the superscript H denotes the conjugate transpose, and * denotes a dot product;
S42, computing the autocorrelation diagonalization vector of the reference frequency domain signal of the kth frequency point of the lth frame of the qth reference channel:
R_IVM_q(k,l) = λ * R_IVM_q(k,l-1) + (1-λ) * R_Ref_q(k,l);
where λ is the smoothing factor and R_Ref_q(k,l) is the diagonal reduced matrix of the kth frequency point of the lth frame of the qth reference channel;
S43, traversing each frame, frequency point, and reference channel, and combining all the reference frequency domain signal autocorrelation diagonalization vectors to obtain the autocorrelation diagonalization matrix R_IVM.
3. The echo cancellation method according to claim 1, wherein the smoothing factor λ is in the range of 0.7-0.99.
4. The echo cancellation method according to claim 1, wherein in step S6, the frequency domain signal after echo cancellation processing is converted into a time domain signal using an inverse short-time Fourier transform module.
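The per-frequency-bin processing of claim 1 (prediction buffering, odd-power expansion, recursive smoothing, and the traversal loop S521-S524) can be sketched as follows. This is a minimal single-channel sketch, not the patented implementation: the values of `LP`, `lam` (λ), and `delta` (δ) are assumptions (claim 3 only bounds λ to 0.7-0.99), and taking the expansion orders p1…pLP as 1…LP, one per buffered frame, is one reading of step S32.

```python
import numpy as np

def cancel_bin(mic, ref, LP=4, lam=0.9, delta=1e-8):
    """Echo cancellation for one frequency bin k, following claim 1 (S3-S6).

    mic, ref: complex STFT values of the bin over successive frames l
    (one microphone channel n, one reference channel q).
    """
    orders = np.arange(1, LP + 1)          # assumed orders p1, p2, ..., pLP
    r_ivm = np.zeros(LP)                   # diagonal autocorrelation (S4/S42)
    r_cm = np.zeros(LP, dtype=complex)     # smoothed cross-correlations (S522)
    out = np.empty(len(mic), dtype=complex)
    for l in range(len(mic)):
        # S31: prediction buffer [Ref(k,l), Ref(k,l-1), ..., Ref(k,l-LP+1)]
        buf = np.array([ref[l - i] if l - i >= 0 else 0.0 for i in range(LP)],
                       dtype=complex)
        # S32: odd-power nonlinear expansion, Ref^(2p-1)
        ref_vec = buf ** (2 * orders - 1)
        # S42: recursive smoothing of the diagonal of Ref_VEC * Ref_VEC^H
        r_ivm = lam * r_ivm + (1 - lam) * np.abs(ref_vec) ** 2
        # S52: LP traversals; y_{n,0}(k,l) = Mic_n(k,l)
        y = mic[l]
        for j in range(LP):
            r_y_ref = y * np.conj(ref_vec[j])                # S521
            r_cm[j] = lam * r_cm[j] + (1 - lam) * r_y_ref    # S522
            W = r_cm[j] / (r_ivm[j] + delta)                 # S523
            y = y - W * ref_vec[j]                           # S524
        out[l] = y   # S6: residual after the last traversal
    return out

# toy check: a purely linear echo (mic = 0.8 * ref) should be removed
rng = np.random.default_rng(0)
ref = 0.5 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))
mic = 0.8 * ref
res = cancel_bin(mic, ref)
```

With a purely linear echo, the first traversal's gain W converges to the echo path gain, so the residual `res` decays toward zero once the smoothed statistics settle; the higher odd-power terms only contribute when the loudspeaker path is nonlinear.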
CN202111277825.4A 2021-10-30 2021-10-30 Echo cancellation method Active CN113763980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111277825.4A CN113763980B (en) 2021-10-30 2021-10-30 Echo cancellation method


Publications (2)

Publication Number Publication Date
CN113763980A true CN113763980A (en) 2021-12-07
CN113763980B CN113763980B (en) 2023-05-12

Family

ID=78784583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111277825.4A Active CN113763980B (en) 2021-10-30 2021-10-30 Echo cancellation method

Country Status (1)

Country Link
CN (1) CN113763980B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270935A1 (en) * 2016-03-18 2017-09-21 Qualcomm Incorporated Audio signal decoding
WO2018195299A1 (en) * 2017-04-21 2018-10-25 Qualcomm Incorporated Non-harmonic speech detection and bandwidth extension in a multi-source environment
CN112820311A (en) * 2021-04-16 2021-05-18 成都启英泰伦科技有限公司 Echo cancellation method and device based on spatial prediction
CN113114865A (en) * 2021-04-09 2021-07-13 苏州大学 Combined function linkage type kernel self-response nonlinear echo cancellation method


Non-Patent Citations (2)

Title
QIAN ZHANG: "Orthogonal Least Squares Based Incremental Echo State Networks for Nonlinear Time Series Data Analysis", IEEE ACCESS *
YAN TAO: "Research on Adaptive Filtering Methods for Acoustic Echo Cancellation", China Master's Theses Full-text Database (Information Science and Technology) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant