CN107770683B - Method and device for detecting the audio acquisition state in an echo scenario - Google Patents


Publication number
CN107770683B
Authority
CN
China
Prior art keywords
signal
difference
present frame
audio collection
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710948010.1A
Other languages
Chinese (zh)
Other versions
CN107770683A (en)
Inventor
陈超
邓滨
宋晨枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Fish In Home Technology Co Ltd
Application filed by Beijing Fish In Home Technology Co Ltd
Priority to CN201710948010.1A
Publication of CN107770683A
Application granted
Publication of CN107770683B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/12 Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups

Abstract

Embodiments of the invention disclose a method and device for detecting the audio acquisition state in an echo scenario. The method includes: obtaining the far-end signal and the near-end signal of the current frame, and determining an error signal from the far-end signal and the near-end signal, where the far-end signal and the near-end signal are the frequency-domain signals corresponding to the time-domain signals of the current frame; determining, from the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal; determining the coherence difference and the difference tracking value of the current frame from the first coherence coefficient and the second coherence coefficient, where the difference tracking value of the current frame is determined from the coherence difference of the current frame and the difference tracking value of the previous frame; and determining the audio acquisition state of the current frame from the coherence difference and the difference tracking value of the current frame. Because the comparison threshold varies with each frame of the audio signal, the detection accuracy of the terminal's audio acquisition state is improved.

Description

Method and device for detecting the audio acquisition state in an echo scenario
Technical field
Embodiments of the present invention relate to audio signal processing, and in particular to a method and device for detecting the audio acquisition state in an echo scenario.
Background
With the continuous development of terminals, more and more terminals provide both audio input and audio output. Because the output audio is picked up again by the audio input device, an echo is formed.
Referring to Fig. 1, Fig. 1 is a schematic diagram of a prior-art electronic device with echo characteristics. The electronic device 110 captures the acoustic information 101 in the environment through the microphone 130 and, after the necessary audio signal processing, converts it into the transmitted audio signal 103, which is sent to the audio acquisition network or the subsequent processing module 150. The playback module of the electronic device 110 obtains the received audio signal 104 and plays it through the loudspeaker 140. Because the sound played by the loudspeaker 140 is picked up at the same time by the microphone 130 of the same device, the sound captured by the microphone 130 contains not only the acoustic information 101 but also the echo signal 105, and the presence of the echo signal degrades the sound quality.
Echo suppression in a terminal is usually implemented with an adaptive filter. In the single-talk state, the adaptive filter gradually converges to a steady state and effectively cancels the echo signal. The double-talk state, however, disturbs the convergence of the adaptive filter and can even cause the filter to diverge, so different echo suppression algorithms need to be configured for different audio acquisition states. At present, methods for detecting the audio acquisition state of a terminal have low precision and large errors, which seriously degrades the quality of the audio signal.
Summary of the invention
The present invention provides a method and device for detecting the audio acquisition state in an echo scenario, so as to improve the detection accuracy of the terminal's audio acquisition state.
In a first aspect, an embodiment of the invention provides a method for detecting the audio acquisition state in an echo scenario, the method including:
Obtaining the far-end signal and the near-end signal of the current frame, and determining an error signal according to the far-end signal and the near-end signal, where the far-end signal, the near-end signal and the error signal are the frequency-domain signals corresponding to the time-domain signals of the current frame;
Determining, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal;
Determining the coherence difference and the difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient, where the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame;
Determining the audio acquisition state of the current frame according to the coherence difference and the difference tracking value of the current frame.
Further, after obtaining the far-end signal and the near-end signal of the current frame, the method further includes:
Obtaining a far-end subband signal, a near-end subband signal and an error subband signal according to a preset frequency range;
Correspondingly, determining, according to the far-end signal, the near-end signal and the error signal, the first coherence coefficient between the near-end signal and the error signal and the second coherence coefficient between the far-end signal and the error signal includes:
Determining the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal.
Further, determining the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal includes:
Determining, according to the far-end subband signal, the near-end subband signal and the error subband signal, a far-end subband auto-power spectrum, a near-end subband auto-power spectrum, an error subband auto-power spectrum, a first cross-power spectrum of the far-end subband signal and the error subband signal, and a second cross-power spectrum of the near-end subband signal and the error subband signal;
Determining the second coherence coefficient according to the far-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum, and determining the first coherence coefficient according to the near-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum.
Further, determining the audio acquisition state of the current frame according to the coherence difference and the difference tracking value includes:
Determining a second difference according to the coherence difference and the difference tracking value;
If the second difference is greater than or equal to a preset threshold, determining that the audio acquisition state of the current frame is the double-talk state;
If the second difference is less than the preset threshold, determining that the audio acquisition state of the current frame is the single-talk state.
Further, after obtaining the far-end signal and the near-end signal of the current frame, the method further includes:
Obtaining the energy of the far-end signal;
If the energy is less than an energy threshold, determining that the current frame is not in an echo scenario, and stopping the detection of the audio acquisition state for the current frame;
If the energy is greater than or equal to the energy threshold, judging the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
Further, after determining the audio acquisition state of the current frame according to the coherence difference and the difference tracking value, the method further includes:
When it is detected that the audio acquisition state of the current frame is inconsistent with that of the previous frame, detecting whether the blocking count of the current frame is greater than zero;
If so, switching the audio acquisition state of the current frame, and taking the blocking count minus one as the blocking count of the next frame;
If not, keeping the audio acquisition state of the current frame.
Further, when it is detected that the audio acquisition state of the current frame is consistent with that of the previous frame, or when the blocking count of the current frame is zero, the blocking count is reset to its initial preset value.
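As an illustration only, the blocking-count rule described above can be sketched in Python. The function name `apply_blocking`, the integer state encoding (0 for single-talk, 1 for double-talk) and the reading of "switching" as falling back to the previous frame's state are assumptions made for this sketch; the text itself does not spell them out.

```python
def apply_blocking(current, previous, count, initial_count):
    """Return (state used for this frame, blocking count for the next frame).

    current / previous: 0 = single-talk, 1 = double-talk (assumed encoding).
    """
    if current != previous and count > 0:
        # Assumed reading of "switching": fall back to the previous state
        # and use up one unit of the blocking count.
        return previous, count - 1
    # States agree, or the count has reached zero: keep the current
    # decision and reset the count to its initial preset value.
    return current, initial_count


# A raw decision that flips from 0 to 1 and then stays at 1.
state, count = 0, 2
history = []
for raw in [1, 1, 1, 1]:
    state, count = apply_blocking(raw, state, count, initial_count=2)
    history.append(state)
```

Under these assumptions, a state change is accepted only after it has persisted for more frames than the initial blocking count, which suppresses isolated one-frame flips.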
In a second aspect, an embodiment of the invention further provides a device for detecting the audio acquisition state in an echo scenario, the device including:
A signal acquisition module, configured to obtain the far-end signal and the near-end signal of the current frame and to determine an error signal according to the far-end signal and the near-end signal, where the far-end signal, the near-end signal and the error signal are the frequency-domain signals corresponding to the time-domain signals of the current frame;
A coherence coefficient determining module, configured to determine, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal;
A difference determining module, configured to determine the coherence difference and the difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient, where the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame;
A first audio acquisition state determining module, configured to determine the audio acquisition state of the current frame according to the coherence difference and the difference tracking value of the current frame.
Further, the device further includes:
A subband signal obtaining module, configured to obtain, after the far-end signal and the near-end signal of the current frame are obtained, a far-end subband signal, a near-end subband signal and an error subband signal according to a preset frequency range;
Correspondingly, the coherence coefficient determining module is configured to:
Determine the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal.
Further, the coherence coefficient determining module is specifically configured to:
Determine, according to the far-end subband signal, the near-end subband signal and the error subband signal, a far-end subband auto-power spectrum, a near-end subband auto-power spectrum, an error subband auto-power spectrum, a first cross-power spectrum of the far-end subband signal and the error subband signal, and a second cross-power spectrum of the near-end subband signal and the error subband signal;
Determine the second coherence coefficient according to the far-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum, and determine the first coherence coefficient according to the near-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum.
Further, the first audio acquisition state determining module includes:
A second difference determining unit, configured to determine a second difference according to the coherence difference and the difference tracking value;
A first audio acquisition state determining unit, configured to determine that the audio acquisition state of the current frame is the double-talk state if the second difference is greater than or equal to a preset threshold, and to determine that the audio acquisition state of the current frame is the single-talk state if the second difference is less than the preset threshold.
Further, the device further includes:
An energy obtaining module, configured to obtain the energy of the far-end signal after the far-end signal and the near-end signal of the current frame are obtained;
A second audio acquisition state determining module, configured to determine, if the energy is less than an energy threshold, that the current frame is not in an echo scenario and to stop the detection of the audio acquisition state for the current frame; and, if the energy is greater than or equal to the energy threshold, to judge the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
Further, the device further includes:
A blocking count determining module, configured to detect, after the audio acquisition state of the current frame is determined according to the coherence difference and the difference tracking value and when the audio acquisition state of the current frame is inconsistent with that of the previous frame, whether the blocking count of the current frame is greater than zero;
A third audio acquisition state determining module, configured to switch the audio acquisition state of the current frame if the blocking count is greater than zero, and to take the blocking count minus one as the blocking count of the next frame; and to keep the audio acquisition state of the current frame if the blocking count is equal to zero.
Further, when it is detected that the audio acquisition state of the current frame is consistent with that of the previous frame, or when the blocking count of the current frame is zero, the blocking count is reset to its initial preset value.
In the embodiments of the present invention, the first coherence coefficient between the near-end signal and the error signal and the second coherence coefficient between the far-end signal and the error signal are determined from the frequency-domain near-end signal, the far-end signal and the corresponding error signal; the coherence difference and the corresponding difference tracking value are then determined from the first coherence coefficient and the second coherence coefficient; and the audio acquisition state of the terminal is determined from the coherence difference and the difference tracking value of each frame. This solves the prior-art problem of large errors in detecting the terminal's audio acquisition state, provides a comparison threshold that varies with each frame of the audio signal, and improves the detection accuracy of the terminal's audio acquisition state.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art electronic device with echo characteristics;
Fig. 2 is a flowchart of a method for detecting the audio acquisition state in an echo scenario according to Embodiment One of the present invention;
Fig. 3 is a flowchart of a method for detecting the audio acquisition state in an echo scenario according to Embodiment Two of the present invention;
Fig. 4 is a schematic structural diagram of a device for detecting the audio acquisition state in an echo scenario according to Embodiment Three of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 2 is a flowchart of a method for detecting the audio acquisition state in an echo scenario according to Embodiment One of the present invention. This embodiment is applicable to judging the audio acquisition state of a terminal when an echo is present during two-way audio acquisition. The method may be executed by the audio acquisition state detection device provided by the embodiments of the present invention, and the device may be implemented in software and/or hardware. The method is applicable to terminal devices with a loudspeaker-microphone loop; for example, the terminal device may be an audio acquisition device such as a smartphone, a smart band, a smart speaker or a smart TV. Referring to Fig. 2, the method specifically includes:
S110: obtain the far-end signal and the near-end signal of the current frame, and determine an error signal according to the far-end signal and the near-end signal, where the far-end signal, the near-end signal and the error signal are the frequency-domain signals corresponding to the time-domain signals of the current frame.
Here, the far-end signal is the audio signal received by the terminal's signal receiving module for playback, and the near-end signal is the audio signal captured by the terminal's signal acquisition module. Optionally, the near-end signal may include the user's useful signal, the ambient noise signal, and the echo signal captured again when the terminal plays the far-end signal.
In this embodiment, the error signal is related to both the far-end signal and the near-end signal. Optionally, the far-end signal is filtered by a filter to obtain an estimated echo signal, and the difference between the near-end signal and the estimated echo signal is taken as the error signal.
In this embodiment, the time-domain far-end signal, near-end signal and error signal are obtained and converted into frequency-domain signals, where the frequency-domain signal can be obtained by applying a Fourier transform to the time-domain signal. For example, let the time-domain far-end signal be $x(k)$, the time-domain near-end signal be $d(k)$ and the time-domain error signal be $e(k)$; the Fourier transform of these time-domain signals is then:
$$x(f,i) = \mathrm{STFT}(x(k)), \quad d(f,i) = \mathrm{STFT}(d(k)), \quad e(f,i) = \mathrm{STFT}(e(k)) \tag{1}$$
where STFT is the Fourier transform function, $x(f,i)$ is the frequency-domain far-end signal, $d(f,i)$ is the frequency-domain near-end signal, $e(f,i)$ is the frequency-domain error signal, $f$ is the frequency of the Fourier transform, and $i$ is the frame index of the signal in each Fourier transform.
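The per-frame transform can be sketched as follows. This is a minimal illustration that assumes a Hann window and a naive DFT over one frame; the helper name `frame_spectrum`, the 16-sample frame length and the test signals are hypothetical and are not taken from the patent.

```python
import cmath
import math


def frame_spectrum(frame):
    """Return the one-sided DFT of one Hann-windowed time-domain frame."""
    n = len(frame)
    # Hann window (an assumption; the text does not name a window).
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / n))
                for k, s in enumerate(frame)]
    # Naive O(n^2) DFT kept for clarity; a real system would use an FFT.
    return [sum(w * cmath.exp(-2j * math.pi * b * k / n)
                for k, w in enumerate(windowed))
            for b in range(n // 2 + 1)]


# Hypothetical 16-sample frames standing in for x(k), d(k) and e(k).
n = 16
x_frame = [math.sin(2.0 * math.pi * 2 * k / n) for k in range(n)]
d_frame = [0.5 * s for s in x_frame]
e_frame = [0.1 * s for s in x_frame]
x_f, d_f, e_f = (frame_spectrum(fr) for fr in (x_frame, d_frame, e_frame))
```

Each returned list plays the role of $x(f,i)$, $d(f,i)$ or $e(f,i)$ for a single frame index $i$.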
Optionally, after obtaining the far-end signal and the near-end signal of the current frame, the method further includes: obtaining a far-end subband signal, a near-end subband signal and an error subband signal according to a preset frequency range.
Here, the preset frequency range is a frequency range that is less affected by the noise signal. Optionally, the preset frequency range may be determined by identifying the noise signal in the near-end signal. In this embodiment, after the preset frequency range is determined, the frequency-domain far-end signal, near-end signal and error signal may be truncated according to the preset frequency range to obtain the corresponding subband signals, for example as in formula (2). Alternatively, the far-end subband signal and the near-end subband signal may be determined according to the preset frequency range first, and the error subband signal then determined from the far-end subband signal and the near-end subband signal.
$$x_b(f,i) = x(f,i), \quad d_b(f,i) = d(f,i), \quad e_b(f,i) = e(f,i), \quad f \in (F_{bl}, F_{bh}) \tag{2}$$
where $x_b(f,i)$, $d_b(f,i)$ and $e_b(f,i)$ are respectively the far-end subband signal, the near-end subband signal and the error subband signal of the i-th frame, $F_{bl}$ is the minimum of the preset frequency range, and $F_{bh}$ is the maximum of the preset frequency range.
In this embodiment, a preset frequency range with little noise is set and the frequency-domain signals are truncated to that range, so that the resulting subband signals contain only a small amount of noise. This reduces the interference of the noise signal with each frequency-domain signal and further helps improve the detection accuracy of the terminal's audio acquisition state.
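A minimal sketch of the subband truncation follows. It assumes uniformly spaced bins and treats both band edges as included, which matches the worked example later in this embodiment (500 Hz to 1000 Hz at 100 Hz resolution giving six frequencies); the helper name `select_subband` and the 41-bin spectrum are hypothetical.

```python
def select_subband(spectrum, bin_hz, f_low, f_high):
    """Keep the (frequency, value) pairs whose frequency lies in [f_low, f_high].

    spectrum: one-sided frequency-domain frame, bin b sits at b * bin_hz.
    Including both band edges is an assumption of this sketch.
    """
    return [(b * bin_hz, value)
            for b, value in enumerate(spectrum)
            if f_low <= b * bin_hz <= f_high]


# Hypothetical spectrum with 41 bins spaced 100 Hz apart (0 Hz to 4000 Hz).
spectrum = [complex(b, 0) for b in range(41)]
sub = select_subband(spectrum, bin_hz=100, f_low=500, f_high=1000)
freqs = [f for f, _ in sub]
```

The same selection would be applied to the far-end, near-end and error spectra so that all three subband signals cover identical frequencies.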
S120: determine, according to the far-end signal, the near-end signal and the error signal, the first coherence coefficient between the near-end signal and the error signal and the second coherence coefficient between the far-end signal and the error signal.
Here, the first coherence coefficient characterizes the coherence between the near-end signal and the error signal, and the second coherence coefficient characterizes the coherence between the far-end signal and the error signal; the larger a coherence coefficient, the higher the coherence between the two corresponding signals. For example, the larger the first coherence coefficient, the higher the coherence between the near-end signal and the error signal, and a first coherence coefficient of 1 indicates that there is no far-end signal. Similarly, the larger the second coherence coefficient, the higher the coherence between the far-end signal and the error signal, and a second coherence coefficient of 1 indicates that there is no near-end signal.
Optionally, after the far-end subband signal, the near-end subband signal and the error subband signal are determined according to the preset frequency range, the first coherence coefficient and the second coherence coefficient are determined from the far-end subband signal, the near-end subband signal and the error subband signal. This reduces the noise in each subband signal and improves the accuracy of the first coherence coefficient and the second coherence coefficient.
Optionally, the first coherence coefficient and the second coherence coefficient may be determined as follows: determine, from the far-end subband signal, the near-end subband signal and the error subband signal, the far-end subband auto-power spectrum, the near-end subband auto-power spectrum, the error subband auto-power spectrum, the second cross-power spectrum of the far-end subband signal and the error subband signal, and the first cross-power spectrum of the near-end subband signal and the error subband signal; then determine the second coherence coefficient from the far-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum, and determine the first coherence coefficient from the near-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum.
For example, the auto-power spectrum of each subband signal is:
$$P_x(f,i) = |x_b(f,i)|^2, \quad P_d(f,i) = |d_b(f,i)|^2, \quad P_e(f,i) = |e_b(f,i)|^2 \tag{3}$$
where $P_x(f,i)$ is the far-end subband auto-power spectrum of the i-th frame, $P_d(f,i)$ is the near-end subband auto-power spectrum of the i-th frame, and $P_e(f,i)$ is the error subband auto-power spectrum of the i-th frame.
Optionally, the second cross-power spectrum is, for example, $P_{xe}(f,i) = x_b(f,i) \cdot e_b(f,i)$, and the first cross-power spectrum is, for example, $P_{de}(f,i) = d_b(f,i) \cdot e_b(f,i)$, where $f \in (F_{bl}, F_{bh})$ and $0 \le F_{bl} < F_{bh} \le F_{max}$.
Optionally, each subband auto-power spectrum, the first cross-power spectrum and the second cross-power spectrum are smoothed, where the smoothed power spectrum of the current frame depends on the power spectrum of the current frame and the power spectrum of the previous frame, for example as in formula (4), where $P_{sx}(f,i)$ is the smoothed far-end subband auto-power spectrum of the i-th frame, $P_{se}(f,i)$ is the smoothed error subband auto-power spectrum of the i-th frame, $P_{sd}(f,i)$ is the smoothed near-end subband auto-power spectrum of the i-th frame, $P_{sde}(f,i)$ is the first smoothed cross-power spectrum of the i-th frame, $P_{sxe}(f,i)$ is the second smoothed cross-power spectrum of the i-th frame, and $\alpha_{s1}$ is the first preset smoothing factor, with $0 < \alpha_{s1} < 1$. It should be noted that the subband auto-power spectra and cross-power spectra of frame 0 are 0.
In this embodiment, smoothing each subband auto-power spectrum, the first cross-power spectrum and the second cross-power spectrum reduces the power spectrum error between adjacent frames and improves the precision of the power spectra of each frame, which helps improve the precision of the first coherence coefficient and the second coherence coefficient.
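One common way to realize such smoothing is a first-order recursive average, sketched below. The weighting (the smoothing factor applied to the previous frame's smoothed value), the complex conjugate in the cross-power terms, the default `alpha_s1=0.9` and the names `smooth`, `initial_state` and `update_power_spectra` are all assumptions of this sketch; the text only states that the smoothed spectrum depends on the current and previous frames and that frame 0 starts from zero.

```python
def smooth(previous, current, alpha):
    """First-order recursive smoothing of one spectrum, bin by bin."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]


def initial_state(n_bins):
    """Smoothed spectra of frame 0, all zero as stated in the text."""
    return {key: [0.0] * n_bins for key in ("x", "d", "e", "de", "xe")}


def update_power_spectra(state, x_b, d_b, e_b, alpha_s1=0.9):
    """Return the smoothed auto- and cross-power spectra for the new frame."""
    instant = {
        "x": [abs(v) ** 2 for v in x_b],
        "d": [abs(v) ** 2 for v in d_b],
        "e": [abs(v) ** 2 for v in e_b],
        # Conjugating the error term is an assumption (usual cross-spectrum).
        "de": [d * e.conjugate() for d, e in zip(d_b, e_b)],
        "xe": [x * e.conjugate() for x, e in zip(x_b, e_b)],
    }
    return {key: smooth(state[key], instant[key], alpha_s1) for key in instant}


# Hypothetical two-bin subband frames.
state = initial_state(2)
state = update_power_spectra(state, [1 + 0j, 2j], [1 + 0j, 1 + 0j], [1 + 0j, 1j])
```

With this weighting, a larger smoothing factor makes the spectra change more slowly from frame to frame.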
Optionally, the first coherence coefficient is directly proportional to the first cross-power spectrum and inversely proportional to the product of the near-end subband auto-power spectrum and the error subband auto-power spectrum; similarly, the second coherence coefficient is directly proportional to the second cross-power spectrum and inversely proportional to the product of the far-end subband auto-power spectrum and the error subband auto-power spectrum. It should be noted that determining the first coherence coefficient and the second coherence coefficient from the smoothed cross-power spectra and auto-power spectra helps improve their precision.
Optionally, the first coherence coefficient and the second coherence coefficient are smoothed to reduce the error of the coherence coefficients of each frame; for example, the second smoothed coherence coefficient may be determined by formula (5) and the first smoothed coherence coefficient by formula (6), where $h_{xe}(f,i)$ is the second smoothed coherence coefficient for each frequency $f$ of the i-th frame, $h_{de}(f,i)$ is the first smoothed coherence coefficient for each frequency $f$ of the i-th frame, and $\alpha_{s2}$ is the second preset smoothing factor, with $0 < \alpha_{s2} < 1$. It should be noted that the coherence coefficients of frame 0 are 0.
Optionally, the first coherence coefficient average and the second coherence coefficient average over the frequencies $f$ of the i-th frame are obtained. For example, the first coherence coefficient average is $\bar{h}_{de}(i) = \frac{1}{n}\sum_{f} h_{de}(f,i)$ and the second coherence coefficient average is $\bar{h}_{xe}(i) = \frac{1}{n}\sum_{f} h_{xe}(f,i)$, where $n$ is the number of frequencies $f$ in the i-th frame. For example, assuming a frequency resolution of 100 Hz, $F_{bl} = 500$ Hz and $F_{bh} = 1000$ Hz, the frequencies are $f$ = 500 Hz, 600 Hz, 700 Hz, 800 Hz, 900 Hz and 1000 Hz, so $n = 6$.
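The coherence computation and band averaging can be sketched as below. The magnitude-squared coherence form, the recursive smoothing with the factor applied to the previous frame's value, the default `alpha_s2=0.8`, the small `eps` guard and the function names are assumptions of this sketch; the text only fixes the proportionality relations and the averaging over the $n$ frequencies.

```python
def coherence(p_cross, p_a, p_b, eps=1e-12):
    """Per-bin coherence from smoothed cross- and auto-power spectra.

    Uses the magnitude-squared form |P_ab|^2 / (P_a * P_b), an assumed
    choice that grows with the cross-power and shrinks with the product
    of the auto-powers. eps avoids division by zero in silent bins.
    """
    return [abs(c) ** 2 / (a * b + eps) for c, a, b in zip(p_cross, p_a, p_b)]


def smooth_coherence(h_previous, c_current, alpha_s2=0.8):
    """Recursive smoothing of the per-bin coherence across frames."""
    return [alpha_s2 * h + (1.0 - alpha_s2) * c
            for h, c in zip(h_previous, c_current)]


def band_average(values):
    """Average over the n frequencies of the subband."""
    return sum(values) / len(values)


# Hypothetical six-bin subband (n = 6) in which the near-end and error
# subband signals are fully coherent.
p_de = [2 + 0j] * 6
p_d = [2.0] * 6
p_e = [2.0] * 6
c_de = coherence(p_de, p_d, p_e)
h_de = smooth_coherence([0.0] * 6, c_de)   # frame 0 coherence is 0
h_de_mean = band_average(h_de)
```

The second coherence coefficient would be obtained the same way from the far-end and error spectra.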
S130: determine the coherence difference and the difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient.
Here, the difference tracking value of the current frame is determined from the coherence difference of the current frame and the difference tracking value of the previous frame.
The coherence difference is directly proportional to the difference between the first coherence coefficient and the second coherence coefficient. For example, the coherence difference is $\xi(i) = \bar{h}_{de}(i) - \bar{h}_{xe}(i)$, where $\bar{h}_{de}(i)$ and $\bar{h}_{xe}(i)$ are the first and second coherence coefficient averages of the i-th frame.
In this embodiment, once it has been determined that a far-end signal is present, the coherence difference determined from the first coherence coefficient and the second coherence coefficient can be used to reflect changes in the near-end signal. Optionally, if the coherence difference is greater than a first judgment threshold, it is determined that a near-end signal is present, and further that the audio acquisition state of the terminal is the double-talk state.
The difference tracking value characterizes how the coherence difference changes from frame to frame. The difference tracking value of the current frame is determined from the coherence difference of the current frame and the difference tracking value of the previous frame, and depends on the trend of the coherence difference of the current frame. For example, the difference tracking value of the i-th frame may be determined according to formula (7), where $\alpha_{s3u}$ is the smoothing factor for the rising criterion, $\alpha_{s3d}$ is the smoothing factor for the falling criterion, $0 < \alpha_{s3u} < \alpha_{s3d} < 1$, and $\xi_s(i-1)$ is the difference tracking value of the (i-1)-th frame. It should be noted that $\xi_s(0)$ may be 0.
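A tracker with separate rising and falling smoothing factors can be sketched as follows. Treating "rising" as the current coherence difference exceeding the previous tracking value, applying the factor to the previous tracking value, and the defaults `alpha_up=0.9` and `alpha_down=0.99` are assumptions of this sketch rather than details given in the text.

```python
def track_difference(xi, xi_s_previous, alpha_up=0.9, alpha_down=0.99):
    """Return the difference tracking value of the current frame.

    xi: coherence difference of the current frame.
    xi_s_previous: difference tracking value of the previous frame.
    The rising branch is assumed to apply when xi exceeds the previous
    tracking value; otherwise the falling factor is used.
    """
    alpha = alpha_up if xi > xi_s_previous else alpha_down
    return alpha * xi_s_previous + (1.0 - alpha) * xi


# Tracking starts from xi_s(0) = 0, as the text allows.
xi_s = 0.0
trace = []
for xi in [0.2, 0.2, 0.8, 0.1]:
    xi_s = track_difference(xi, xi_s)
    trace.append(xi_s)
```

Because the tracking value only moves a fraction of the way toward each new coherence difference, a sudden jump in the coherence difference stands out clearly against it.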
S140: determine the audio acquisition state of the current frame according to the coherence difference and the difference tracking value of the current frame.
In this embodiment, for each frame of the audio signal, the corresponding audio acquisition state is determined from the difference between the corresponding coherence difference and the difference tracking value, that is, from the amount by which the coherence difference of that frame has changed. For example, if the coherence difference of the current frame changes abruptly relative to the difference tracking value, the current audio acquisition state of the terminal is determined to be the double-talk state; conversely, if the coherence difference of the current frame is unchanged relative to the difference tracking value or changes only gradually, the current audio acquisition state of the terminal is determined to be the single-talk state.
Optionally, step S140 includes: determining a second difference according to the coherence difference and the difference tracking value; if the second difference is greater than or equal to a preset threshold, determining that the audio acquisition state of the current frame is the double-talk state; and if the second difference is less than the preset threshold, determining that the audio acquisition state of the current frame is the single-talk state.
Illustratively, the second difference can be ξd(i)=ξ (i)-ξs(i), preset threshold is set as σT
Wherein, DTD (i) is for indicating the i-th frame audio collection state, if the second difference is more than or equal to preset threshold, DTD (i) it is 1, determines that the audio collection state of the i-th frame is double speaking state;If the second difference is less than preset threshold, DTD (i) is 0, The audio collection state for determining the i-th frame is singly to say state.Wherein, σTFor fixed value, and -2≤σT≤2。
In this embodiment, the difference tracking values of successive frames form a curve that follows the coherence difference, and the difference tracking value of each frame together with the fixed threshold $\sigma_T$ forms a time-varying comparison threshold. The coherence difference of each frame is compared with its own comparison threshold and the audio acquisition state is determined from the result, instead of comparing the coherence difference of every frame with one and the same threshold. Each frame of the audio signal thus gets its own comparison threshold, which makes the comparison threshold more flexible and improves the accuracy of the audio acquisition state decision.
Optionally, after step S140, a corresponding echo suppression algorithm is selected according to the audio acquisition state of the current frame and echo suppression is applied to the audio signal of the current frame, which reduces the interference of the echo signal with the useful audio signal and improves the quality of the audio signal.
In the technical solution of this embodiment, the first coherence coefficient between the near-end signal and the error signal and the second coherence coefficient between the far-end signal and the error signal are determined from the frequency-domain near-end signal, far-end signal and corresponding error signal; the coherence difference and the corresponding difference tracking value are then determined from the first and second coherence coefficients; and the audio acquisition state of the terminal is determined from the coherence difference and the difference tracking value of each frame. This addresses the large detection error of terminal audio acquisition state detection in the prior art: a time-varying comparison threshold is determined for each frame of the audio signal, which improves the detection accuracy of the terminal audio acquisition state.
On the basis of the above embodiments, after the far-end signal and the near-end signal of the current frame are obtained, the method further includes: obtaining the energy of the far-end signal; if the energy is less than an energy threshold, determining that the current frame is not in an echo scenario and stopping the detection of the audio acquisition state of the current frame; and if the energy is greater than or equal to the energy threshold, determining the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
In this embodiment, for each frame of the audio signal, if no far-end signal is present, the current frame is determined not to be in an echo scenario; no echo processing of the audio signal is needed, and the detection of the audio acquisition state of the current frame is stopped. For example, if the energy of the far-end signal is less than the energy threshold, the far-end signal is determined to be absent. If the energy of the far-end signal indicates that a far-end signal is present, the audio acquisition state of the current frame is determined with the detection method provided by the embodiments of the present invention. Note that if no far-end signal is present in the current frame, the current frame may be processed as if it were in the double-talk state.
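The far-end energy check described above can be sketched as follows. The energy definition (sum of squared magnitudes of the frequency-domain far-end frame), the function names and the threshold values are assumptions made for illustration; the patent text only requires that an energy be compared with an energy threshold.

```python
from typing import Sequence


def far_end_energy(far_spectrum: Sequence[complex]) -> float:
    """Energy of one frequency-domain far-end frame (assumed definition)."""
    return sum(abs(x) ** 2 for x in far_spectrum)


def in_echo_scene(far_spectrum: Sequence[complex],
                  energy_threshold: float) -> bool:
    """True if the far-end energy reaches the threshold, so that the
    audio acquisition state of the frame should be detected."""
    return far_end_energy(far_spectrum) >= energy_threshold


# Usage: an almost silent far-end frame is treated as "no echo scenario",
# so the detection for this frame would be skipped.
silent_frame = [0.001 + 0j] * 4
run_detection = in_echo_scene(silent_frame, energy_threshold=1e-3)
```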
Embodiment two
Fig. 3 is a flowchart of a method for detecting the audio acquisition state in an echo scenario provided by Embodiment 2 of the present invention. It refines the above embodiments; accordingly, the method specifically includes:
S210: Obtain the far-end signal and the near-end signal of the current frame, and determine an error signal according to the far-end signal and the near-end signal.
Here, the far-end signal, the near-end signal and the error signal are the frequency-domain signals corresponding to the time-domain signals of the current frame.
S220: Determine, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal.
S230: Determine the coherence difference and the difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient.
Here, the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame.
S240: Determine the audio acquisition state of the current frame according to the coherence difference and the difference tracking value of the current frame.
S250: Detect whether the audio acquisition state of the current frame is consistent with that of the previous frame; if so, execute step S280; if not, execute step S260.
In this embodiment, audio signals often contain pauses or trailing sounds in the user's speech, and when the audio acquisition state of each frame is determined by the above method, the state tends to switch frequently. For example, the terminal is in the double-talk state while the user is speaking, but frames in which the user pauses or the voice trails off are determined to be in the single-talk state; the audio acquisition state then switches frequently and the detection is not robust.
S260: Detect whether the obstruction count of the current frame is greater than zero; if not, execute step S280; if so, execute step S270.
S270: Switch the audio acquisition state of the current frame, and take the obstruction count minus one as the obstruction count of the next frame.
In this embodiment, when the audio acquisition state of the current frame is detected to be inconsistent with that of the previous frame, the obstruction count of the current frame is obtained, where the obstruction count is the number of times a change of the audio acquisition state is blocked.
For example, if the obstruction count is not zero, the audio acquisition state of the current frame should remain consistent with that of the previous frame: the audio acquisition state of the current frame is switched, and the obstruction count minus one is taken as the obstruction count of the next frame. Here, switching the audio acquisition state of the current frame means updating it to be consistent with the audio acquisition state of the previous frame.
For example, if the audio acquisition state of the previous frame is the double-talk state, the audio acquisition state of the current frame is the single-talk state, and the obstruction count of the current frame is greater than zero, say 5, then the final audio acquisition state of the current frame is determined to be the double-talk state and the obstruction count of the next frame is determined to be 4.
Note that the state compared with the audio acquisition state of the current frame is the final audio acquisition state of the previous frame.
S280: Keep the audio acquisition state of the current frame.
In this embodiment, if the audio acquisition state of the current frame is consistent with that of the previous frame, the two frames are determined to follow the same trend and the audio acquisition state of the current frame is left unchanged. For example, if the user keeps talking, adjacent frames have the same audio acquisition state, namely the double-talk state, and the audio acquisition state does not need to be switched.
If the audio acquisition state of the current frame is inconsistent with that of the previous frame and the obstruction count is zero, the current frame has left the hangover period of the previous audio acquisition state and may enter the next audio acquisition state, so the audio acquisition state of the current frame is kept. For example, some time after the user stops speaking, that is, once the obstruction count of the double-talk state has gradually decreased to zero, the single-talk state may be entered.
Optionally, when the audio acquisition state of the current frame is detected to be consistent with that of the previous frame, or when the obstruction count of the current frame is zero, the obstruction count is updated to its initial preset value.
The initial values of the obstruction count for the double-talk state and for the single-talk state may differ. For example, the rise time of an audio signal is typically 4-40 ms, so the obstruction count of the single-talk state may be set to 1-10; the trailing time of an audio signal is typically 40-400 ms, so the obstruction count of the double-talk state may be set to 10-100.
In this embodiment, the initial preset value of the obstruction count is its maximum value. For example, if the audio acquisition states of the current frame and the previous frame are consistent and both are the double-talk state, the obstruction count of the current frame is the maximum obstruction count of the double-talk state; if the audio acquisition state of the previous frame is the double-talk state, the audio acquisition state of the current frame is the single-talk state, and the obstruction count of the current frame is zero, the final audio acquisition state of the current frame is determined to be the single-talk state and the obstruction count of the current frame is updated to the maximum obstruction count of the single-talk state.
In this embodiment, the trend of the audio acquisition state around the current frame is identified and the audio acquisition state of the current frame is smoothed according to that trend, so that the audio signal within a preset number of frames stays in the same audio acquisition state. This avoids frequent switching of the audio acquisition state and improves the robustness of audio acquisition state detection.
In the technical solution of this embodiment, when the audio acquisition state of the current frame is found to differ from that of the previous frame and the obstruction count of the current frame is greater than zero, the audio acquisition state of the current frame is switched to be consistent with that of the previous frame. This addresses the frequent switching of the audio acquisition state caused by pauses or trailing sounds in the user's speech and improves the robustness of audio acquisition state detection.
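Steps S250 to S280, together with the optional reset of the obstruction count, can be sketched as a small state machine. This is a sketch under stated assumptions: the class name `StateSmoother`, the constants `SINGLE` and `DOUBLE`, the starting state and the default maximum counts (picked from the 1-10 and 10-100 ranges mentioned above) are illustrative and not part of the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict

SINGLE = 0  # single-talk state
DOUBLE = 1  # double-talk state


@dataclass
class StateSmoother:
    # Maximum (initial) obstruction count per state; assumed example values.
    max_count: Dict[int, int] = field(
        default_factory=lambda: {SINGLE: 5, DOUBLE: 50})
    prev_state: int = SINGLE          # final state of the previous frame
    count: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.count = self.max_count[self.prev_state]

    def update(self, raw_state: int) -> int:
        """Return the final state of the current frame (S250 to S280)."""
        if raw_state == self.prev_state:
            # S250 -> S280: consistent, keep the state and reset the count.
            final = raw_state
            self.count = self.max_count[final]
        elif self.count > 0:
            # S260 -> S270: block the change, follow the previous frame.
            final = self.prev_state
            self.count -= 1
        else:
            # S260 -> S280: count exhausted, accept the new state.
            final = raw_state
            self.count = self.max_count[final]
        self.prev_state = final
        return final


# Usage: the example from the description (count 5, double talk before).
s = StateSmoother(max_count={SINGLE: 2, DOUBLE: 5}, prev_state=DOUBLE)
final_state = s.update(SINGLE)   # stays DOUBLE, next count is 4
```

The comparison always uses the final state of the previous frame, which is why `prev_state` is overwritten with the smoothed result rather than with the raw per-frame decision.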
Embodiment three
Fig. 4 is a schematic structural diagram of a device for detecting the audio acquisition state in an echo scenario provided by Embodiment 3 of the present invention. The device specifically includes:
a signal acquisition module 310, configured to obtain the far-end signal and the near-end signal of the current frame and to determine an error signal according to the far-end signal and the near-end signal, where the far-end signal, the near-end signal and the error signal are the frequency-domain signals corresponding to the time-domain signals of the current frame;
a coherence coefficient determining module 320, configured to determine, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal;
a difference determining module 330, configured to determine the coherence difference and the difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient, where the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame;
a first audio acquisition state determining module 340, configured to determine the audio acquisition state of the current frame according to the coherence difference of the current frame and the difference tracking value.
Optionally, the device further includes:
a subband signal obtaining module, configured to obtain, after the far-end signal and the near-end signal of the current frame are obtained, a far-end subband signal, a near-end subband signal and an error subband signal of the far-end signal according to a preset frequency range.
Correspondingly, the coherence coefficient determining module 320 is configured to:
determine the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal.
Optionally, the coherence coefficient determining module 320 is specifically configured to:
determine, according to the far-end subband signal, the near-end subband signal and the error subband signal, a far-end subband auto-power spectrum, a near-end subband auto-power spectrum, an error subband auto-power spectrum, a first cross-power spectrum of the far-end subband signal and the error subband signal, and a second cross-power spectrum of the near-end subband signal and the error subband signal;
determine the first coherence coefficient according to the far-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum, and determine the second coherence coefficient according to the near-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum.
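To make the role of the auto-power and cross-power spectra concrete, the following sketch computes a band-level magnitude-squared coherence for one frame and a coherence difference from it. The normalization, the summation over the subband bins, the sign convention of the difference and the names `band_coherence` and `coherence_difference` are assumptions for illustration, not the formulas of the patent.

```python
from typing import Sequence


def band_coherence(a: Sequence[complex], b: Sequence[complex],
                   eps: float = 1e-12) -> float:
    """Magnitude-squared coherence of two subband spectra (assumed form).

    cross    -- cross-power summed over the subband bins
    p_a, p_b -- auto-power of each signal summed over the subband bins
    The result lies in [0, 1] by the Cauchy-Schwarz inequality.
    """
    cross = sum(x * y.conjugate() for x, y in zip(a, b))
    p_a = sum(abs(x) ** 2 for x in a)
    p_b = sum(abs(y) ** 2 for y in b)
    return abs(cross) ** 2 / (p_a * p_b + eps)


def coherence_difference(near: Sequence[complex], far: Sequence[complex],
                         err: Sequence[complex]) -> float:
    """Assumed convention: near-end/error coherence minus far-end/error
    coherence, which gives a value in [-1, 1]."""
    return band_coherence(near, err) - band_coherence(far, err)


# Usage: the error equals the near-end signal and is orthogonal to the
# far-end signal, so the difference is close to 1.
near = [1 + 0j, 0j]
far = [0j, 1 + 0j]
xi = coherence_difference(near, far, err=near)
```

Under this convention a coherence difference bounded by 1 in magnitude is consistent with a threshold on the second difference chosen between -2 and 2.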
Optionally, the first audio acquisition state determining module 340 includes:
a second difference determining unit, configured to determine a second difference according to the coherence difference and the difference tracking value;
a first audio acquisition state determining unit, configured to determine that the audio acquisition state of the current frame is the double-talk state if the second difference is greater than or equal to a preset threshold, and to determine that the audio acquisition state of the current frame is the single-talk state if the second difference is less than the preset threshold.
Optionally, the device further includes:
an energy obtaining module, configured to obtain the energy of the far-end signal after the far-end signal and the near-end signal of the current frame are obtained;
a second audio acquisition state determining module, configured to determine, if the energy is less than an energy threshold, that the current frame is not in an echo scenario and to stop the detection of the audio acquisition state of the current frame; and, if the energy is greater than or equal to the energy threshold, to determine the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
Optionally, the device further includes:
an obstruction count determining module, configured to detect, after the audio acquisition state of the current frame is determined according to the coherence difference and the difference tracking value, whether the obstruction count of the current frame is greater than zero when the audio acquisition state of the current frame is detected to be inconsistent with that of the previous frame;
a third audio acquisition state determining module, configured to switch the audio acquisition state of the current frame and take the obstruction count minus one as the obstruction count of the next frame if the obstruction count is greater than zero, and to keep the audio acquisition state of the current frame if the obstruction count is equal to zero.
Optionally, when the audio acquisition state of the current frame is detected to be consistent with that of the previous frame, or when the obstruction count of the current frame is zero, the obstruction count is updated to its initial preset value.
The device for detecting the audio acquisition state provided by the embodiments of the present invention can execute the method for detecting the audio acquisition state provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (12)

1. A method for detecting an audio acquisition state in an echo scenario, characterized by comprising:
obtaining a far-end signal and a near-end signal of a current frame, and determining an error signal according to the far-end signal and the near-end signal, wherein the far-end signal, the near-end signal and the error signal are frequency-domain signals corresponding to time-domain signals of the current frame;
determining, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal;
determining a coherence difference and a difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient, wherein the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame;
determining the audio acquisition state of the current frame according to the coherence difference of the current frame and the difference tracking value;
wherein the audio acquisition state is a single-talk state or a double-talk state;
and wherein determining the audio acquisition state of the current frame according to the coherence difference and the difference tracking value comprises:
determining a second difference according to the coherence difference and the difference tracking value;
if the second difference is greater than or equal to a preset threshold, determining that the audio acquisition state of the current frame is the double-talk state;
if the second difference is less than the preset threshold, determining that the audio acquisition state of the current frame is the single-talk state.
2. The method according to claim 1, characterized in that, after the far-end signal and the near-end signal of the current frame are obtained, the method further comprises:
obtaining a far-end subband signal, a near-end subband signal and an error subband signal of the far-end signal according to a preset frequency range;
correspondingly, determining, according to the far-end signal, the near-end signal and the error signal, the first coherence coefficient between the near-end signal and the error signal and the second coherence coefficient between the far-end signal and the error signal comprises:
determining the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal.
3. The method according to claim 2, characterized in that determining the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal comprises:
determining, according to the far-end subband signal, the near-end subband signal and the error subband signal, a far-end subband auto-power spectrum, a near-end subband auto-power spectrum, an error subband auto-power spectrum, a first cross-power spectrum of the far-end subband signal and the error subband signal, and a second cross-power spectrum of the near-end subband signal and the error subband signal;
determining the first coherence coefficient according to the far-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum, and determining the second coherence coefficient according to the near-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum.
4. The method according to claim 1, characterized in that, after the far-end signal and the near-end signal of the current frame are obtained, the method further comprises:
obtaining the energy of the far-end signal;
if the energy is less than an energy threshold, determining that the current frame is not in an echo scenario, and stopping the detection of the audio acquisition state of the current frame;
if the energy is greater than or equal to the energy threshold, determining the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
5. The method according to any one of claims 1 to 4, characterized in that, after the audio acquisition state of the current frame is determined according to the coherence difference and the difference tracking value, the method further comprises:
when the audio acquisition state of the current frame is detected to be inconsistent with the audio acquisition state of the previous frame, detecting whether the obstruction count of the current frame is greater than zero;
if so, switching the audio acquisition state of the current frame, and taking the obstruction count minus one as the obstruction count of the next frame;
if not, keeping the audio acquisition state of the current frame.
6. The method according to claim 5, characterized in that, when the audio acquisition state of the current frame is detected to be consistent with the audio acquisition state of the previous frame, or when the obstruction count of the current frame is zero, the obstruction count is updated to an initial preset value.
7. A device for detecting an audio acquisition state in an echo scenario, characterized by comprising:
a signal acquisition module, configured to obtain a far-end signal and a near-end signal of a current frame and to determine an error signal according to the far-end signal and the near-end signal, wherein the far-end signal, the near-end signal and the error signal are frequency-domain signals corresponding to time-domain signals of the current frame;
a coherence coefficient determining module, configured to determine, according to the far-end signal, the near-end signal and the error signal, a first coherence coefficient between the near-end signal and the error signal and a second coherence coefficient between the far-end signal and the error signal;
a difference determining module, configured to determine a coherence difference and a difference tracking value of the current frame according to the first coherence coefficient and the second coherence coefficient, wherein the difference tracking value of the current frame is determined according to the coherence difference of the current frame and the difference tracking value of the previous frame;
a first audio acquisition state determining module, configured to determine the audio acquisition state of the current frame according to the coherence difference of the current frame and the difference tracking value;
wherein the audio acquisition state is a single-talk state or a double-talk state;
and wherein the first audio acquisition state determining module comprises:
a second difference determining unit, configured to determine a second difference according to the coherence difference and the difference tracking value;
a first audio acquisition state determining unit, configured to determine that the audio acquisition state of the current frame is the double-talk state if the second difference is greater than or equal to a preset threshold, and to determine that the audio acquisition state of the current frame is the single-talk state if the second difference is less than the preset threshold.
8. The device according to claim 7, characterized by further comprising:
a subband signal obtaining module, configured to obtain, after the far-end signal and the near-end signal of the current frame are obtained, a far-end subband signal, a near-end subband signal and an error subband signal of the far-end signal according to a preset frequency range;
correspondingly, the coherence coefficient determining module is configured to:
determine the first coherence coefficient and the second coherence coefficient according to the far-end subband signal, the near-end subband signal and the error subband signal.
9. The device according to claim 8, characterized in that the coherence coefficient determining module is specifically configured to:
determine, according to the far-end subband signal, the near-end subband signal and the error subband signal, a far-end subband auto-power spectrum, a near-end subband auto-power spectrum, an error subband auto-power spectrum, a first cross-power spectrum of the far-end subband signal and the error subband signal, and a second cross-power spectrum of the near-end subband signal and the error subband signal;
determine the first coherence coefficient according to the far-end subband auto-power spectrum, the error subband auto-power spectrum and the first cross-power spectrum, and determine the second coherence coefficient according to the near-end subband auto-power spectrum, the error subband auto-power spectrum and the second cross-power spectrum.
10. The device according to claim 7, characterized by further comprising:
an energy obtaining module, configured to obtain the energy of the far-end signal after the far-end signal and the near-end signal of the current frame are obtained;
a second audio acquisition state determining module, configured to determine, if the energy is less than an energy threshold, that the current frame is not in an echo scenario and to stop the detection of the audio acquisition state of the current frame; and, if the energy is greater than or equal to the energy threshold, to determine the audio acquisition state of the current frame according to the far-end signal and the near-end signal of the current frame.
11. The device according to any one of claims 7 to 10, characterized in that the device further comprises:
an obstruction count determining module, configured to detect, after the audio acquisition state of the current frame is determined according to the coherence difference and the difference tracking value, whether the obstruction count of the current frame is greater than zero when the audio acquisition state of the current frame is detected to be inconsistent with the audio acquisition state of the previous frame;
a third audio acquisition state determining module, configured to switch the audio acquisition state of the current frame and take the obstruction count minus one as the obstruction count of the next frame if the obstruction count is greater than zero, and to keep the audio acquisition state of the current frame if the obstruction count is equal to zero.
12. The device according to claim 11, characterized in that, when the audio acquisition state of the current frame is detected to be consistent with the audio acquisition state of the previous frame, or when the obstruction count of the current frame is zero, the obstruction count is updated to an initial preset value.
CN201710948010.1A 2017-10-12 2017-10-12 A kind of detection method and device of echo scene subaudio frequency acquisition state Active CN107770683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710948010.1A CN107770683B (en) 2017-10-12 2017-10-12 A kind of detection method and device of echo scene subaudio frequency acquisition state

Publications (2)

Publication Number Publication Date
CN107770683A CN107770683A (en) 2018-03-06
CN107770683B true CN107770683B (en) 2019-10-11

Family

ID=61267213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710948010.1A Active CN107770683B (en) 2017-10-12 2017-10-12 A kind of detection method and device of echo scene subaudio frequency acquisition state

Country Status (1)

Country Link
CN (1) CN107770683B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108172233B (en) * 2017-12-12 2019-08-13 天格科技(杭州)有限公司 The echo cancel method of signal and error signal regression vectors is estimated based on distal end
CN108696648B (en) * 2018-05-16 2021-08-24 上海小度技术有限公司 Method, device, equipment and storage medium for processing short-time voice signal
WO2019223603A1 (en) * 2018-05-22 2019-11-28 出门问问信息科技有限公司 Voice processing method and apparatus and electronic device
CN108806713B (en) * 2018-05-22 2020-06-16 出门问问信息科技有限公司 Method and device for detecting double-speech state
CN109068012B (en) * 2018-07-06 2021-04-27 南京时保联信息科技有限公司 Double-end call detection method for audio conference system
CN111294473B (en) * 2019-01-28 2022-01-04 展讯通信(上海)有限公司 Signal processing method and device
CN112291676B (en) * 2020-05-18 2021-10-15 珠海市杰理科技股份有限公司 Method and system for inhibiting audio signal tailing, chip and electronic equipment
CN112397082A (en) * 2020-11-17 2021-02-23 北京达佳互联信息技术有限公司 Method, apparatus, electronic device and storage medium for estimating echo delay
CN114401399B (en) * 2022-03-28 2022-08-09 广州迈聆信息科技有限公司 Audio bidirectional delay estimation method and device, conference terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1917386A (en) * 2006-09-05 2007-02-21 华为技术有限公司 Method for detecting both speaking status in operatioon of echo cancel
CN1925346A (en) * 2006-09-05 2007-03-07 华为技术有限公司 Detecting method for double speaking state in echo wave counteract
US7333605B1 (en) * 2002-04-27 2008-02-19 Fortemedia, Inc. Acoustic echo cancellation with adaptive step size and stability control
CN102160296A (en) * 2009-01-20 2011-08-17 华为技术有限公司 Method and apparatus for detecting double talk
CN103325379A (en) * 2012-03-23 2013-09-25 杜比实验室特许公司 Method and device used for acoustic echo control
CN105338450A (en) * 2015-09-23 2016-02-17 苏州科达科技股份有限公司 Residual echo inhibition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于噪声估计和能量比的双讲检测方法;吴超;《第十二届全国人机语音通讯学术会议》;20131231;第1-5页 *

Similar Documents

Publication Publication Date Title
CN107770683B (en) A kind of detection method and device of echo scene subaudio frequency acquisition state
US10972837B2 (en) Robust estimation of sound source localization
US10079026B1 (en) Spatially-controlled noise reduction for headsets with variable microphone array orientation
CN110197669B (en) Voice signal processing method and device
CN105825864B (en) Both-end based on zero-crossing rate index is spoken detection and echo cancel method
CN104661153A (en) Earphone sound effect compensation method and device as well as earphone
CN107464565B (en) Far-field voice awakening method and device
CN107863099B (en) Novel double-microphone voice detection and enhancement method
CN108712703A (en) High-efficiency, low-power noise-reducing earphone and noise reduction system
CN112004177B (en) Howling detection method, microphone volume adjustment method and storage medium
CN111742541B (en) Acoustic echo cancellation method, acoustic echo cancellation device and storage medium
CN110782912A (en) Sound source control method and speaker device
US9330677B2 (en) Method and apparatus for generating a noise reduced audio signal using a microphone array
CN110335618A (en) Method and computer device for improving nonlinear suppression
WO2020232659A1 (en) Double talk detection method, double talk detection device and echo cancellation system
CN109961797A (en) Echo cancellation method, device and electronic equipment
CN110956975A (en) Echo cancellation method and device
CN205754809U (en) Robot adaptive volume control system
JP6179081B2 (en) Noise reduction device, voice input device, wireless communication device, and noise reduction method
CN112735370B (en) Voice signal processing method and device, electronic equipment and storage medium
CN111355855B (en) Echo processing method, device, equipment and storage medium
CN106534461A (en) Denoising system for earphone and denoising method thereof
CN110099328B (en) Intelligent sound box
JP5958218B2 (en) Noise reduction device, voice input device, wireless communication device, and noise reduction method
CN108833681A (en) Volume adjusting method and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210518

Address after: 201210 4/F, Building 1, 701 Naxian Road, Shanghai Pilot Free Trade Zone, Pudong New Area, Shanghai, China

Patentee after: Shanghai Xiaodu Technology Co., Ltd.

Address before: 100012 3rd Floor, Building 10, No. 18 Ziyue Road, Chaolai Science and Technology Industrial Park, No. 1 Laiguangying Middle Street, Chaoyang District, Beijing

Patentee before: AINEMO Inc.