CN109102821B - Time delay estimation method, time delay estimation system, storage medium and electronic equipment - Google Patents

Info

Publication number
CN109102821B
CN109102821B CN201811049712.7A CN201811049712A CN109102821B CN 109102821 B CN109102821 B CN 109102821B CN 201811049712 A CN201811049712 A CN 201811049712A CN 109102821 B CN109102821 B CN 109102821B
Authority
CN
China
Prior art keywords
reference signal
frequency
delay
signal
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811049712.7A
Other languages
Chinese (zh)
Other versions
CN109102821A (en)
Inventor
何赛娟
张华兵
周强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd
Priority to CN201811049712.7A
Publication of CN109102821A
Application granted
Publication of CN109102821B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a time delay estimation method comprising the following steps: acquiring a reference signal and a microphone signal collected by a microphone, and applying a fast Fourier transform to each to obtain a frequency-domain reference signal and a frequency-domain microphone signal; inputting the frequency-domain reference signal to an adaptive filter to obtain the correlated frequency-domain reference signal, that is, the component of the frequency-domain microphone signal that corresponds to the frequency-domain reference signal, wherein the frequency-domain microphone signal is used to update the adaptive filter; and calculating the filter energy from the correlated frequency-domain reference signal output by the adaptive filter in order to determine a delay value. The method addresses two problems of the prior-art cross-correlation method: its performance degrades sharply under environmental interference, and its delay estimates are very unstable in complex environments or during double-talk.

Description

Time delay estimation method, time delay estimation system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of audio signal processing technologies, and in particular, to a time delay estimation method, a time delay estimation system, a storage medium, and an electronic device.
Background
With the continuous development of information technology, distributed intelligent hardware is increasingly widely used in many fields. Echo cancellation, an indispensable link in interaction with intelligent devices, has long been a research focus for technicians in the field. Take a voice-controlled television set-top box as an example: the signal collected by the microphone mixes the user's voice commands with the sound of the television program, so the television's sound must be removed in order to isolate the user's commands. The accuracy with which the delay between the television's audio signal and its arrival at the microphone is estimated therefore greatly affects the quality of echo cancellation.
Echo cancellation prevents far-end sound from being returned by cancelling or removing the far-end audio signal picked up by the local microphone. A typical existing echo cancellation method relies on cross-correlation time delay estimation: it computes the linear correlation between the reference signal and the microphone signal and selects the delay at which the cross-correlation is largest as the device delay used for echo cancellation.
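For orientation, the prior-art cross-correlation estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the invention: the function name `xcorr_delay`, the lag range and the synthetic white-noise signals are all hypothetical.

```python
import random

def xcorr_delay(ref, mic, max_lag):
    """Return the lag (in samples) at which the cross-correlation is largest."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the microphone signal shifted by `lag`.
        val = sum(ref[n] * mic[n + lag] for n in range(len(ref) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

# Synthetic test case (assumption): the microphone hears the reference 5 samples late.
rng = random.Random(1)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [0.0] * 5 + ref[:-5]
estimated = xcorr_delay(ref, mic, max_lag=20)
```

In this clean case the peak is unambiguous; the weaknesses discussed in the text arise when interference or near-end speech is added to `mic`.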
Although the cross-correlation method is simple in principle, it has inherent weaknesses: its performance degrades sharply under environmental interference, and its delay estimates are very unstable in complex environments or during double-talk. In addition, when the delay to be estimated is long, a large number of FFT points is needed, so both the complexity and the CPU occupancy are high.
Disclosure of Invention
Embodiments of the present invention provide a method, system, electronic device, and storage medium for time delay estimation, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for estimating a time delay, including:
acquiring a reference signal and a microphone signal acquired by a microphone, and performing fast Fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal;
inputting the frequency-domain reference signal to an adaptive filter to obtain the correlated frequency-domain reference signal, that is, the component of the frequency-domain microphone signal that corresponds to the frequency-domain reference signal, wherein the frequency-domain microphone signal is used to update the adaptive filter;
calculating filter energy from the correlated frequency domain reference signal output by the adaptive filter for use in determining a delay value.
In a second aspect, an embodiment of the present invention provides a delay estimation system, including:
the signal acquisition program module is used for acquiring a reference signal and a microphone signal acquired by a microphone and performing fast Fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal;
an adaptive filter program module, configured to input the frequency-domain reference signal to an adaptive filter to obtain the correlated frequency-domain reference signal, that is, the component of the frequency-domain microphone signal that corresponds to the frequency-domain reference signal, wherein the frequency-domain microphone signal is used to update the adaptive filter;
a delay determination program module for calculating filter energy for determining a delay value from the correlated frequency domain reference signal output by the adaptive filter.
In a third aspect, an embodiment of the present invention provides a storage medium, where one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the latency estimation methods described above in the present invention.
In a fourth aspect, an electronic device is provided, comprising: at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform any of the time delay estimation methods of the present invention described above.
In a fifth aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a storage medium, and the computer program includes program instructions, which, when executed by a computer, cause the computer to execute any one of the latency estimation methods described above.
The embodiment of the invention uses an adaptive filter to determine the correlated frequency-domain reference signal, i.e. the component of the microphone signal collected by the microphone that is correlated with the reference signal, and then determines the delay value of the reference signal by calculating the energy of the adaptive filter. This addresses two problems of the prior-art cross-correlation method: its performance degrades sharply under environmental interference, and its delay estimates are very unstable in complex environments or during double-talk.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of a delay estimation method of the present invention;
FIG. 2 is a flowchart of an embodiment of step S10 of the present invention;
FIG. 3 is a flowchart of an embodiment of step S30 of the present invention;
FIG. 4 is a flowchart of another embodiment of step S30 of the present invention;
FIG. 5 is a flowchart of another embodiment of step S30 of the present invention;
FIG. 6 is a flow chart of another embodiment of a delay estimation method of the present invention;
FIG. 7 is a functional block diagram of an embodiment of a delay estimation system of the present invention;
FIG. 8 is a functional block diagram of one embodiment of a signal acquisition program module in the delay estimation system of the present invention;
FIG. 9 is a functional block diagram of one embodiment of a delay determination program module in the delay estimation system of the present invention;
FIG. 10 is a functional block diagram of another embodiment of a delay determination program module in the inventive delay estimation system;
FIG. 11 is a functional block diagram of yet another embodiment of a delay determination program module in the inventive delay estimation system;
FIG. 12 is a schematic structural diagram of an embodiment of an electronic device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present invention provides a delay estimation method for echo cancellation of an electronic device (e.g., a smart speaker, a voice-controlled television set-top box, etc.), including the following steps:
S10, acquiring a reference signal and a microphone signal collected by a microphone, and performing a fast Fourier transform to obtain a frequency-domain reference signal and a frequency-domain microphone signal;
S20, inputting the frequency-domain reference signal to an adaptive filter to obtain the correlated frequency-domain reference signal, that is, the component of the frequency-domain microphone signal that corresponds to the frequency-domain reference signal, wherein the frequency-domain microphone signal is used to update the adaptive filter;
S30, calculating the adaptive filter energy from the correlated frequency-domain reference signal output by the adaptive filter in order to determine a delay value.
The embodiment of the invention uses an adaptive filter to determine the correlated frequency-domain reference signal, i.e. the component of the microphone signal collected by the microphone that is correlated with the reference signal, and then determines the delay value of the reference signal by calculating the energy of the adaptive filter. This addresses two problems of the prior-art cross-correlation method: its performance degrades sharply under environmental interference, and its delay estimates are very unstable in complex environments or during double-talk.
In addition, because the adaptive filter is updated block by block in the frequency domain, based on the frequency-domain reference signal and the frequency-domain microphone signal, rather than sample by sample, the cost of updating the adaptive filter is reduced, and with it the complexity of time delay estimation. Adaptive filtering of the reference signal identifies the block, or the sampling point, that best matches the reference-related component contained in the microphone signal, and this gives the delay. Because the filter adapts continuously, adding some post-processing yields a more stable result.
As shown in fig. 2, in some embodiments of the invention, the step S10: the acquiring a reference signal and a microphone signal collected by a microphone, and performing fast fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal includes:
S11, acquiring a pre-stored reference signal and a microphone signal collected by a microphone. For example, when a smart speaker plays a song, the played song is stored as the reference signal, and the user's voice command picked up by the smart speaker's microphone is the microphone signal.
S12, inputting the reference signal and the microphone signal into a low-pass filter for filtering. The low-pass filter is a 15th-order FIR filter and, as a trade-off between complexity and stability, the down-sampling factor is eight.
S13, down-sampling the filtered reference signal and the filtered microphone signal respectively. Specifically, to prevent frequency aliasing, the microphone signal and the reference signal are first low-pass filtered and only then down-sampled. Because the filter removes the high-frequency content, the spectrum does not overlap itself when it spreads out during down-sampling; without the filter, aliasing would fold high-frequency content into the low-frequency band.
S14, performing a fast Fourier transform on the down-sampled reference signal and the down-sampled microphone signal respectively to obtain the frequency-domain reference signal and the frequency-domain microphone signal. The down-sampled signals are converted from the time domain to the frequency domain to reduce the complexity of data processing. The frame length of the fast Fourier transform (FFT) may be 128, 256, 512, or another size; the present invention does not limit it.
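Steps S12 and S13 can be sketched as follows. This is only an illustration under stated assumptions: the text specifies a 15th-order FIR filter and eight-times down-sampling but not the filter design, so the Hamming-windowed sinc design, the cutoff at the post-decimation Nyquist frequency, and the names `lowpass_taps` and `decimate` are hypothetical choices.

```python
import math

def lowpass_taps(order=15, factor=8):
    """Hamming-windowed sinc low-pass FIR (design method is an assumption).

    The cutoff is placed at the Nyquist frequency of the down-sampled signal.
    """
    n_taps = order + 1
    fc = 0.5 / factor                      # cutoff in cycles per input sample
    mid = (n_taps - 1) / 2.0
    taps = []
    for n in range(n_taps):
        t = n - mid
        sinc = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]        # normalise to unity gain at DC

def decimate(signal, factor=8, order=15):
    """Low-pass filter, then keep every `factor`-th output sample."""
    taps = lowpass_taps(order, factor)
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:                 # samples before the start count as zero
                acc += h * signal[i - k]
        out.append(acc)
    return out

taps = lowpass_taps()
constant = decimate([1.0] * 64)            # a DC input should pass unchanged
```

Only the retained output samples are computed, so the filter runs at the reduced rate. The same `decimate` call would be applied to both the reference signal and the microphone signal so that their block lengths stay equal.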
As shown in fig. 3, in some embodiments of the invention, the above step S30: said calculating filter energy from said correlated frequency domain reference signal output by said adaptive filter for determining a delay value comprises:
S31, calculating the energy of each filter block from the correlated frequency-domain reference signal in the frequency domain. For example, if the values of a filter block in the frequency domain are the complex points $[a_1 + j b_1,\ a_2 + j b_2,\ \ldots,\ a_n + j b_n]$, the energy of the block can be expressed as $a_1^2 + b_1^2 + a_2^2 + b_2^2 + \cdots + a_n^2 + b_n^2$, i.e. the sum of the squared magnitudes (moduli) of the complex values.
S32, determining the delay value from the maximum of the filter block energies. For example, if the filter block energies are [1, 2, 4, 2, 1], the maximum energy is 4 (the 3rd value from the left), so the corresponding delay is 3 (the index of the value 4, counting from 1), which means the delay is 3 blocks.
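The block-energy peak search of S31 and S32 can be sketched as follows. The function names and the example block values are hypothetical; the blocks are chosen so that their energies reproduce the [1, 2, 4, 2, 1] example, and the index is counted from 1 to match the text.

```python
def block_energy(block):
    """Sum of squared magnitudes of the complex points in one filter block."""
    return sum(c.real ** 2 + c.imag ** 2 for c in block)

def block_delay(filter_blocks):
    """Return (1-based index of the highest-energy block, list of energies)."""
    energies = [block_energy(b) for b in filter_blocks]
    return energies.index(max(energies)) + 1, energies

# Illustrative blocks (assumption) whose energies are [1, 2, 4, 2, 1].
blocks = [[1 + 0j], [1 + 1j], [2 + 0j], [1 + 1j], [0 + 1j]]
delay_in_blocks, energies = block_delay(blocks)
```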
This embodiment determines the delay value by calculating the energy of the filter blocks in the frequency domain; little data has to be processed, which reduces the complexity of estimating the delay value.
As shown in fig. 4, in some embodiments of the invention, the above step S30: said calculating filter energy from said correlated frequency domain reference signal output by said adaptive filter for determining a delay value comprises:
S31', applying an inverse Fourier transform to the correlated frequency-domain reference signal to obtain the corresponding correlated time-domain reference signal;
S32', calculating the energy of each sampling point from the correlated time-domain reference signal in the time domain. For example, for the input samples [1, 2, 3, 4], the energies of the sampling points are $1^2, 2^2, 3^2, 4^2$, i.e. the square of each sample.
S33', determining the delay value from the maximum of the sampling-point energies. For example, if the energies of the sampling points are [1, 2, 4, 2, 1], the maximum energy is 4 (the 3rd value from the left), so the corresponding delay is 3 (the index of the value 4, counting from 1), which means the delay is 3 sampling points. The unit differs from that of the block delay: to convert a block delay into a sampling-point delay, multiply the block delay by the block size.
This embodiment estimates the delay value by calculating the energy of the sampling points in the time domain; because the energy is evaluated for individual sampling points, the precision of the delay estimate is improved.
As shown in fig. 5, in some embodiments of the invention, the above step S30: said calculating filter energy from said correlated frequency domain reference signal output by said adaptive filter for determining a delay value comprises:
S31', calculating the energy of each filter block from the correlated frequency-domain reference signal in the frequency domain;
S32', determining a first delay value from the maximum of the filter block energies;
S33', applying an inverse Fourier transform to the correlated frequency-domain reference signal to obtain the corresponding correlated time-domain reference signal;
S34', calculating the energy of each sampling point from the correlated time-domain reference signal in the time domain;
S35', determining a second delay value from the maximum of the sampling-point energies;
S36', determining the delay value from the first delay value and the second delay value. For example, suppose each filter block has 512 points, the sampling-point delay estimate is 1024, and the block delay estimate is 1. Converting the sampling-point delay to a block delay gives 1024/512 = 2, which does not equal the estimated block delay, so the estimate is discarded and the previous delay value is output again. If instead the block delay estimate is also 2, the block delay agrees with the sampling-point delay estimate and the current result can be output.
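The consistency check of S36' can be sketched as follows. The function name `reconcile` and the value used for the previous output are hypothetical, and returning the sampling-point delay when the two branches agree is an assumption, since the text only says that the current result can be output.

```python
def reconcile(sample_delay, block_delay, block_size, previous):
    """Accept the new estimate only if both branches point at the same block.

    The sampling-point delay is converted to block units by integer division;
    on disagreement the previously output delay is repeated.
    """
    if sample_delay // block_size == block_delay:
        return sample_delay
    return previous

# The worked example from the text: 512-point blocks, sampling-point delay 1024.
rejected = reconcile(1024, 1, 512, previous=900)   # 1024 / 512 = 2, not 1
accepted = reconcile(1024, 2, 512, previous=900)   # both branches agree on block 2
```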
In this embodiment, the sampling-point delay and the block delay are both obtained by a peak search over the filter coefficients W: each is the delay corresponding to the position of the peak. This embodiment considers the sampling-point delay and the block delay together, using each as a reference for the other, which further improves the precision of the time delay estimation.
FIG. 6 shows a flowchart of another embodiment of the delay estimation method of the present invention, which comprises the following steps: down-sampling, Fourier transform, adaptive filtering, peak search, and post-processing. The input consists of two signals (a microphone signal and a reference signal), and a result is output for each frame. Each step is described below.
1) Down-sampling
The reference signal and the microphone signal are low-pass filtered and then down-sampled, which reduces the complexity of the algorithm.
To prevent frequency aliasing, the signals first pass through a low-pass filter; a 15th-order or 7th-order FIR filter may be used, and the present invention does not limit the choice. For the down-sampling factor, complexity and stability are weighed against each other, and a factor of 8 or 4 may be used; again, the present invention does not limit the choice. The microphone signal and the reference signal are down-sampled together, so that the length of data processed each time is the same for both; the down-sampling step comprises both the low-pass filtering and the sample-rate reduction.
Because the filter removes the high-frequency content, the spectrum does not overlap itself when it spreads out during down-sampling; without the filter, aliasing would fold high-frequency content into the low-frequency band.
2) FFT (Fourier transform)
The down-sampled microphone signal and the down-sampled reference signal are each transformed with an FFT. Processing is carried out in the frequency domain to reduce complexity, which is why the FFT is required. The frame length of the FFT may be 128, 256, 512, or another size; the present invention does not limit it.
3) Adaptive filtering
Circular convolution is used here in place of linear convolution, implemented with the overlap-save method with 50% overlap. The purpose of the adaptive filtering is to estimate the part of the microphone signal that is correlated with the reference signal. The input of the adaptive filter is the reference signal, and its output is the estimate of the correlated part of the microphone signal.
For the k-th block of the filter and of the reference signal, the filtered output of the reference signal is:
y(k) = second half of IFFT[X(k)W(k)],
where X(k) is the far-end block signal and W(k) holds the filter block coefficients. Only the second half of the elements is retained because, with circular convolution, the first half is corrupted by wrap-around and only the second half equals the linear convolution result. Here an element is a sampling point, and the far-end block signal is the reference signal divided into blocks.
The time domain block error signal is:
e(k)=d(k)-y(k),
where d (k) represents a microphone signal.
The frequency domain block error signal is:
E(k)=FFT[0, e(k)],
where 0 denotes a block of zeros, half the FFT length, placed before e(k).
Normalizing E(k) gives:
Figure BDA0001794196900000091
where |X(k)| represents the smoothed energy of the reference signal and δ is a fixed value that prevents the filter from diverging.
The update amount of the filter, Φ(k), is the first half of the elements of:
Figure BDA0001794196900000092
where μ is the step-size factor. Only the first half of the result is correct, so the second half is discarded. This is the overlap-save method, so called because only part of each result is kept.
The filter update formula is:
W(k+1)=W(k)+FFT[Φ(k), 0],
where 0 denotes a block of zeros, half the FFT length, appended after Φ(k).
Updating the filter is a critical step. Here Φ(k) depends on the error
Figure BDA0001794196900000093
which in turn depends on
Figure BDA0001794196900000094
on the near-end signal, and on the estimate of the reference signal at the near end; obtaining that estimate requires filtering the reference signal. The update is therefore an iterative process.
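The adaptive filtering recursion can be sketched in code. Because the equation images of the source are not reproduced here, the sketch below follows the standard textbook form of a partitioned overlap-save frequency-domain adaptive filter, which is an assumption about the exact expressions. Further simplifications are labelled in the comments: the normalisation uses the instantaneous rather than the smoothed reference energy, and the values of N, P, mu and delta as well as all function names are hypothetical. Indices are 0-based here, so the peak index equals the delay in samples or blocks.

```python
import cmath
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in X])]

def adapt(x, d, N=8, P=4, mu=0.5, delta=1.0):
    """Return P frequency-domain filter blocks W[p], each of length 2N."""
    W = [[0j] * (2 * N) for _ in range(P)]
    X_hist = [[0j] * (2 * N) for _ in range(P)]    # X_hist[p]: spectrum p blocks ago
    prev = [0.0] * N
    for start in range(0, len(x) - N + 1, N):
        cur = x[start:start + N]
        X_hist = [fft(prev + cur)] + X_hist[:-1]    # 50% overlap: previous + current block
        prev = cur
        # y(k): second half of IFFT[X(k)W(k)], summed over the filter blocks.
        Y = [sum(X_hist[p][b] * W[p][b] for p in range(P)) for b in range(2 * N)]
        y = [v.real for v in ifft(Y)[N:]]
        e = [d[start + i] - y[i] for i in range(N)]             # e(k) = d(k) - y(k)
        E = fft([0.0] * N + e)                                  # E(k) = FFT[0, e(k)]
        # Assumption: instantaneous (not smoothed) reference energy plus delta.
        power = [sum(abs(X_hist[p][b]) ** 2 for p in range(P)) + delta
                 for b in range(2 * N)]
        for p in range(P):
            g = ifft([X_hist[p][b].conjugate() * E[b] / power[b] for b in range(2 * N)])
            phi = [mu * v.real for v in g[:N]]                  # keep the first half only
            dW = fft(phi + [0.0] * N)                           # FFT[phi(k), 0]
            W[p] = [W[p][b] + dW[b] for b in range(2 * N)]
    return W

# Synthetic check (assumption): the microphone hears the reference 11 samples late.
rng = random.Random(0)
N, P, true_delay = 8, 4, 11
x = [rng.gauss(0.0, 1.0) for _ in range(N * 400)]
d = [0.0] * true_delay + x[:-true_delay]
W = adapt(x, d, N, P)
w = [v.real for Wp in W for v in ifft(Wp)[:N]]                  # time-domain coefficients
sample_delay = max(range(len(w)), key=lambda i: w[i] ** 2)
block_energies = [sum(abs(c) ** 2 for c in Wp) for Wp in W]
block_delay = block_energies.index(max(block_energies))
```

With a delay of 11 samples and 8-sample blocks, the largest coefficient sits in the second filter block, so the two peak searches are consistent with each other.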
4) Peak search
Both a sampling-point delay and a block delay can be output, so two separate branches are needed:
The sampling-point delay branch applies an IFFT to the whole filter to bring it into the time domain, calculates the energy of each filter coefficient, and selects the sampling point with the largest energy as the estimated sampling-point delay.
The block delay branch calculates the energy of each filter block in the frequency domain and takes the block with the largest energy as the block delay.
Both the sampling-point delay and the block delay are results of a peak search over the filter coefficients W: each is the delay corresponding to the position of the peak. The two can be used together or separately to estimate the delay. The block delay has low resolution and a larger error but low complexity; the sampling-point delay has a small error but is more complex to compute. The two can also be estimated together and used as references for each other, which further improves the precision.
5) Post-treatment
The post-treatment is mainly considered from two aspects: (1) the case where the delay is outside the filter length; (2) short time delay is abnormal jitter.
The current approach to alleviate the first problem is to perform short-time analysis on 20 consecutive frames, obtain the average energy and energy peak of the filter for each frame, then average the 20 frames, compare the average energy and peak energy, and consider the estimate of the 20 frames to be unreliable if the peak energy is less than some multiple of the average energy. Therefore, the one-time statistics of 20 frames is to prevent sudden and sporadic time delay values in a short time, which has no meaning on subsequent AEC filter adjustment and can also reduce certain complexity.
The second problem is alleviated by comparing the current estimated delay with the average estimated delay of the previous 20 frames, and if a threshold is exceeded, the current frame estimation result is deemed to be unreliable, and the previous frame result is output. If the current frame is deemed to be credible and the continuous multiframes are deemed to be credible, the time delay result of the current frame is output and multiplied by the down-sampling times to be used as the finally output sampling point time delay result.
And post-processing on block delay is relatively simple, because the estimation result of the block delay has small floating, and excessive post-processing can affect the accuracy, only considering the problem (1), comparing the energy of the current block with the average energy of the whole filter, and if the energy is smaller than a certain threshold value, considering that the current delay result is invalid.
A frame here is the number of samples (the length) of the reference signal and the microphone signal processed at one time. Because real-time output is required, an audio segment cannot be processed as a whole; output must be produced while processing. A frame has the same length as the block described above, while the filter consists of several such blocks. The input to the post-processing is the delay result previously estimated for each frame. Operations such as smoothing are applied to the multi-frame results, and this post-processing is essential for making the results more stable. The output is also a delay result, but one that combines multiple frames.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 7, an embodiment of the present invention further provides a delay estimation system 700, including:
a signal obtaining program module 710, configured to obtain a reference signal and a microphone signal collected by a microphone, and perform fast fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal;
an adaptive filter program module 720, configured to input the frequency-domain reference signal to an adaptive filter to obtain a correlated frequency-domain reference signal, that is, the component contained in the frequency-domain microphone signal that corresponds to the frequency-domain reference signal, where the frequency-domain microphone signal is used to update the adaptive filter;
a delay determination module 730, configured to calculate filter energy for determining a delay value according to the correlated frequency domain reference signal output by the adaptive filter.
The embodiment of the invention uses an adaptive filter to determine the correlated frequency-domain reference signal, that is, the component of the microphone signal collected by the microphone that is related to the reference signal, and then determines the delay value of the reference signal by calculating the energy of the adaptive filter. This solves the problems of the prior-art cross-correlation method, whose performance degrades sharply under environmental interference and whose delay estimates are very unstable in complex environments or under double-talk.
In addition, because the adaptive filter is updated block by block in the frequency domain, based on the frequency-domain reference signal and the frequency-domain microphone signal, rather than sample by sample, the complexity of updating the filter is reduced, and with it the complexity of delay estimation. Adaptively filtering the reference signal yields the block or sampling point that best matches (is closest to) the reference-related component contained in the microphone signal, from which the delay is obtained; continuous adaptation combined with some post-processing gives a more stable result.
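For illustration only, a block-wise frequency-domain update of this kind may be sketched with a partitioned-block normalized LMS filter using overlap-save. This is a commonly used structure swapped in for the sketch, not necessarily the update rule of the embodiment; the class name `BlockFrequencyDomainFilter`, the step size `mu`, the block length and the number of blocks are all assumptions.

```python
import numpy as np


class BlockFrequencyDomainFilter:
    """Partitioned-block frequency-domain NLMS filter (overlap-save)."""

    def __init__(self, block_len=16, num_blocks=4, mu=0.5, eps=1e-8):
        self.B, self.mu, self.eps = block_len, mu, eps
        n = 2 * block_len                                  # FFT size
        self.W = np.zeros((num_blocks, n), dtype=complex)  # weights per block
        self.X = np.zeros((num_blocks, n), dtype=complex)  # reference history
        self.prev_x = np.zeros(block_len)

    def process(self, x_block, d_block):
        """Filter one reference block, update once per block, return error."""
        B = self.B
        # Shift the history and store the spectrum of [previous, current].
        self.X = np.roll(self.X, 1, axis=0)
        self.X[0] = np.fft.fft(np.concatenate([self.prev_x, x_block]))
        self.prev_x = np.asarray(x_block, dtype=float).copy()
        # Overlap-save: only the last B output samples are valid.
        y = np.fft.ifft((self.W * self.X).sum(axis=0)).real[B:]
        e = d_block - y
        E = np.fft.fft(np.concatenate([np.zeros(B), e]))
        # Per-bin power normalization across all blocks.
        power = (np.abs(self.X) ** 2).sum(axis=0) + self.eps
        grad = np.fft.ifft(np.conj(self.X) * E / power, axis=1)
        grad[:, B:] = 0.0                  # gradient constraint
        self.W += self.mu * np.fft.fft(grad, axis=1)
        return e

    def taps(self):
        """Time-domain filter taps, block after block."""
        return np.fft.ifft(self.W, axis=1).real[:, :self.B].reshape(-1)
```

When the microphone signal is simply the reference delayed by 37 samples, the adapted weights of such a filter concentrate at tap 37, that is, in block 2 for a block length of 16, which is the peak that the delay search then locates.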
As shown in fig. 8, in some embodiments of the invention, the signal acquisition program module 710 includes:
a signal acquisition program unit 711 for acquiring a reference signal stored in advance and a microphone signal acquired by a microphone;
a filter processing program unit 712, configured to input the reference signal and the microphone signal to a low-pass filter for filter processing;
a down-sampling program unit 713, configured to down-sample the filtered reference signal and the filtered microphone signal, respectively;
a fourier transform program unit 714, configured to perform fast fourier transform on the down-sampled reference signal and the down-sampled microphone signal obtained by down-sampling to obtain the frequency-domain reference signal and the frequency-domain microphone signal, respectively.
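The chain of low-pass filtering, down-sampling and fast Fourier transform may be sketched as follows. The windowed-sinc filter design, the function names `lowpass_fir` and `to_frequency_domain`, the down-sampling factor of 4, the frame length of 64 and the filter length of 63 taps are assumptions chosen for this sketch, not values from the embodiment.

```python
import numpy as np


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of Nyquist."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # unity gain at DC


def to_frequency_domain(signal, factor=4, frame_len=64, num_taps=63):
    """Low-pass filter, down-sample by `factor`, and FFT each frame."""
    h = lowpass_fir(num_taps, 1.0 / factor)         # anti-aliasing filter
    filtered = np.convolve(signal, h, mode="same")
    decimated = filtered[::factor]                  # keep every factor-th sample
    n_frames = len(decimated) // frame_len
    frames = decimated[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)              # one spectrum per frame
```

The same function would be applied to both the reference signal and the microphone signal, so that the filtering affects both identically and their relative delay is preserved.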
As shown in fig. 9, in some embodiments of the invention, the delay determination module 730 includes:
an energy calculation program unit 731 for calculating the energy of each filter block in the frequency domain from the correlated frequency domain reference signal;
a delay determination unit 732, configured to determine the delay value according to a maximum value of the energy of each block of the filter.
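The two units above may be illustrated by the following minimal sketch, which assumes that the frequency-domain filter weights are available as a list of blocks, each block being a list of complex values. The function name `block_delay` and this data layout are assumptions of the sketch.

```python
def block_delay(W):
    """Return (index of the strongest block, list of block energies).

    W: list of filter blocks; each block is a list of complex
       frequency-domain weights.
    """
    # Energy of a block is the sum of squared magnitudes of its weights.
    energies = [sum(abs(w) ** 2 for w in block) for block in W]
    best = max(range(len(energies)), key=energies.__getitem__)
    return best, energies
```

Multiplying the returned block index by the block length gives a coarse delay in samples at the down-sampled rate.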
As shown in fig. 10, in some embodiments of the invention, the delay determination module 730 includes:
a signal conversion program unit 731' for performing inverse fourier transform on the correlated frequency domain reference signal to obtain a corresponding correlated time domain reference signal;
an energy calculation program unit 732' for calculating the energy of each sample point in the time domain from the correlated time domain reference signal;
a delay determination program unit 733' for determining said delay value from a maximum value of the energy of said each sample point.
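The sampling-point variant may be sketched as follows. A direct inverse discrete Fourier transform is used here only to keep the sketch self-contained (a fast transform would be used in practice); the function names and the optional `num_taps` argument, which restricts the search to the valid taps when an overlap-save filter is used, are assumptions.

```python
import cmath


def inverse_dft(X):
    """Direct inverse DFT of a list of complex values."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def sample_delay(W_block, num_taps=None):
    """Return (index of the strongest tap, list of tap energies)."""
    taps = inverse_dft(W_block)
    if num_taps is not None:
        taps = taps[:num_taps]  # keep only the valid time-domain taps
    energy = [abs(v) ** 2 for v in taps]
    peak = max(range(len(energy)), key=energy.__getitem__)
    return peak, energy
```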
As shown in fig. 11, in some embodiments of the invention, the delay determination module 730 includes:
a first energy calculation program unit 731' for calculating the energy per filter block in the frequency domain from the correlated frequency domain reference signal;
a first delay determination program unit 732' for determining a first delay value from a maximum of the energy of said each block of filters;
an inverse fourier transform program unit 733' for performing an inverse fourier transform on the correlated frequency domain reference signal to obtain a corresponding correlated time domain reference signal;
a second energy calculation program unit 734' for calculating the energy of each sampling point in the time domain according to the correlated time domain reference signal;
a second delay determining program unit 735' for determining a second delay value based on a maximum value of the energy of said each sample point;
a delay determining program unit 736' configured to determine the delay value according to the first delay value and the second delay value.
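The embodiment does not specify how the first and second delay values are combined. One possible rule, given purely as an assumption for illustration, is to accept the sampling-point delay only when it falls inside the block selected by the block search and otherwise to fall back on the start of that block. The function name `combine_delays` and the `tolerance` parameter are hypothetical.

```python
def combine_delays(block_index, sample_index, block_len, tolerance=0):
    """Cross-check a block delay against a sampling-point delay.

    Returns (delay_in_samples, consistent), where `consistent` tells
    whether the two estimates agree.
    """
    low = block_index * block_len - tolerance
    high = (block_index + 1) * block_len + tolerance
    if low <= sample_index < high:
        return sample_index, True            # fine estimate confirmed
    return block_index * block_len, False    # coarse fallback
```

For example, with a block length of 32, a block index of 2 and a sampling-point index of 70 agree, since 70 lies in the range 64 to 95.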
In some embodiments, the present invention provides a non-transitory computer-readable storage medium storing one or more programs including executable instructions, where the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the delay estimation methods of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the delay estimation methods described above.
In some embodiments, the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the delay estimation method.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the delay estimation method.
The delay estimation system of the embodiment of the present invention may be used to execute the delay estimation method of the embodiment of the present invention and accordingly achieves the technical effects of that method, which are not described here again. In the embodiment of the present invention, the relevant functional modules may be implemented by a hardware processor.
Fig. 12 is a schematic diagram of the hardware structure of an electronic device for executing a delay estimation method according to another embodiment of the present application. As shown in fig. 12, the electronic device includes:
one or more processors 1210 and a memory 1220; fig. 12 takes one processor 1210 as an example.
The device for performing the delay estimation method may further include: an input device 1230 and an output device 1240.
The processor 1210, the memory 1220, the input device 1230, and the output device 1240 may be connected by a bus or by other means; fig. 12 takes a bus connection as an example.
The memory 1220 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the delay estimation method in the embodiment of the present application. The processor 1210 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 1220, thereby implementing the delay estimation method of the above method embodiment.
The memory 1220 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the delay estimation device, and the like. Further, the memory 1220 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 1220 may optionally include memory located remotely from the processor 1210, and such remote memory may be connected to the delay estimation device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1230 may receive input numeric or character information and generate signals related to user settings and function control of the delay estimation device. The output device 1240 may include a display device such as a display screen.
The one or more modules are stored in the memory 1220 and, when executed by the one or more processors 1210, perform the delay estimation method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: a server has an architecture similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (6)

1. A method of delay estimation, comprising:
acquiring a reference signal and a microphone signal acquired by a microphone, and performing fast Fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal;
inputting the frequency-domain reference signal to an adaptive filter to obtain a relevant frequency-domain reference signal corresponding to the frequency-domain reference signal contained in the frequency-domain microphone signal, wherein the frequency-domain microphone signal is used for updating the adaptive filter;
calculating the energy of each filter block according to the related frequency domain reference signal in the frequency domain;
determining a first delay value according to a maximum value among the energies of the filter blocks;
performing inverse Fourier transform on the relevant frequency domain reference signal to obtain a corresponding relevant time domain reference signal;
calculating the energy of each sampling point according to the relevant time domain reference signal in the time domain;
determining a second time delay value according to the maximum value in the energy of each sampling point;
determining a delay value according to the first delay value and the second delay value;
performing short-time analysis on 20 consecutive frames, obtaining the average energy and the peak energy of the filter for each frame, averaging them over the 20 frames, comparing the average energy with the peak energy, and if the peak energy is less than a certain multiple of the average energy, determining that the estimate for the 20 frames is unreliable;
comparing the current estimated delay with the average estimated delay of the previous 20 frames, and if the difference exceeds a certain threshold, determining that the estimation result of the current frame is not credible and outputting the result of the previous frame; if the current frame is deemed credible and a plurality of consecutive frames are deemed credible, outputting the delay result of the current frame multiplied by the down-sampling factor as the finally output sampling-point delay result.
2. The method of claim 1, wherein the acquiring the reference signal and the microphone signal collected by the microphone and performing the fast fourier transform to obtain the frequency-domain reference signal and the frequency-domain microphone signal comprises:
acquiring a pre-stored reference signal and a microphone signal acquired by a microphone;
inputting the reference signal and the microphone signal to a low-pass filter for filtering;
respectively performing down-sampling on the filtered reference signal and the microphone signal;
and respectively carrying out fast Fourier transform on the down-sampled reference signal and the down-sampled microphone signal obtained by down-sampling to obtain the frequency domain reference signal and the frequency domain microphone signal.
3. A delay estimation system, comprising:
the signal acquisition program module is used for acquiring a reference signal and a microphone signal acquired by a microphone and performing fast Fourier transform to obtain a frequency domain reference signal and a frequency domain microphone signal;
an adaptive filter program module, configured to input the frequency-domain reference signal to an adaptive filter to obtain a relevant frequency-domain reference signal corresponding to the frequency-domain reference signal, where the frequency-domain reference signal is included in the frequency-domain microphone signal, and the frequency-domain microphone signal is used to update the adaptive filter;
a delay determination program module comprising:
a first energy calculation program unit for calculating the energy of each filter block in the frequency domain from the associated frequency domain reference signal;
a first delay determining program unit for determining a first delay value according to a maximum value among the energies of each block of filters;
an inverse fourier transform program unit, configured to perform inverse fourier transform on the relevant frequency domain reference signal to obtain a corresponding relevant time domain reference signal;
a second energy calculation program unit, configured to calculate, in the time domain, an energy of each sampling point according to the correlated time domain reference signal;
a second time delay determining program unit, configured to determine a second time delay value according to a maximum value in the energy of each sampling point;
a delay determining program unit, configured to determine a delay value according to the first delay value and the second delay value;
performing short-time analysis on 20 consecutive frames, obtaining the average energy and the peak energy of the filter for each frame, averaging them over the 20 frames, comparing the average energy with the peak energy, and if the peak energy is less than a certain multiple of the average energy, determining that the estimate for the 20 frames is unreliable;
comparing the current estimated delay with the average estimated delay of the previous 20 frames, and if the difference exceeds a certain threshold, determining that the estimation result of the current frame is not credible and outputting the result of the previous frame; if the current frame is deemed credible and a plurality of consecutive frames are deemed credible, outputting the delay result of the current frame multiplied by the down-sampling factor as the finally output sampling-point delay result.
4. The system of claim 3, wherein the signal acquisition program module comprises:
a signal acquisition program unit for acquiring a reference signal stored in advance and a microphone signal acquired by a microphone;
a filter processing program unit, which is used for inputting the reference signal and the microphone signal to a low-pass filter for filter processing;
the down-sampling program unit is used for respectively down-sampling the reference signal and the microphone signal after the filtering processing;
and the Fourier transform program unit is used for respectively carrying out fast Fourier transform on the down-sampled reference signal and the down-sampled microphone signal obtained by down-sampling so as to obtain the frequency domain reference signal and the frequency domain microphone signal.
5. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-2.
6. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1-2.
CN201811049712.7A 2018-09-10 2018-09-10 Time delay estimation method, time delay estimation system, storage medium and electronic equipment Active CN109102821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811049712.7A CN109102821B (en) 2018-09-10 2018-09-10 Time delay estimation method, time delay estimation system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811049712.7A CN109102821B (en) 2018-09-10 2018-09-10 Time delay estimation method, time delay estimation system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109102821A CN109102821A (en) 2018-12-28
CN109102821B true CN109102821B (en) 2021-05-25

Family

ID=64865653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811049712.7A Active CN109102821B (en) 2018-09-10 2018-09-10 Time delay estimation method, time delay estimation system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109102821B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Time delay estimation methods, systems, storage media, and electronic devices

Effective date of registration: 20230726

Granted publication date: 20210525

Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch

Pledgor: Sipic Technology Co.,Ltd.

Registration number: Y2023980049433