CN109901113A - A kind of voice signal localization method, apparatus and system based on complex environment - Google Patents
- Publication number
- CN109901113A, CN201910190519.3A
- Authority
- CN
- China
- Prior art keywords
- way
- residual signals
- echo
- signal
- transmission function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
An embodiment of the present invention discloses a voice signal localization method, apparatus and system based on a complex environment. The method comprises: performing echo cancellation on each of at least two channels of desired signals according to a loudspeaker reference signal, to obtain at least two channels of first residual signals; performing echo suppression on each of the at least two channels of first residual signals according to a preset echo suppression transfer function, to obtain at least two channels of second residual signals; performing noise suppression on each of the at least two channels of second residual signals according to a preset noise suppression transfer function, to obtain at least two channels of third residual signals; and, when it is determined from the at least two channels of third residual signals that the ambient sound is currently in a voice state, localizing the voice signal in the environment. In this way, interference with sound source localization from signals outside the target voice state is reduced, the sound source is localized accurately, and the robustness of sound source localization is improved.
Description
Technical field
Embodiments of the present invention relate to the field of signal processing technology, and in particular to a voice signal localization method, apparatus and system based on a complex environment.
Background
Sound source localization algorithms are generally based on an array of multiple microphones and estimate the direction of the sound source from the phase differences between the multiple input audio signals. However, when the loudspeaker of the audio device is playing audio, or when the device is in a high-noise environment, the microphone signals contain high-energy echo and noise. These signals strongly interfere with the algorithm's localization of the target voice source, so that the computed direction information has large errors and robustness is low.
To solve this problem, the prior art adds algorithms such as echo cancellation and silence detection to the system. However, when the nonlinear echo is strong and the ambient noise is high, the robustness of this approach is still very low.
How to localize a sound source accurately, and improve the robustness of sound source localization, even in a complex environment with high noise and strong nonlinear echo is the technical problem that the present application urgently needs to solve.
Summary of the invention
To this end, embodiments of the present invention provide a voice signal localization method, apparatus and system based on a complex environment, to solve the prior-art problem of inaccurate sound source localization and low robustness in complex environments with high noise and strong nonlinear echo.
To achieve the above object, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a voice signal localization method based on a complex environment, the method comprising:
According to a pre-acquired loudspeaker reference signal, performing echo cancellation on each channel of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;
According to a preset echo suppression transfer function, performing echo suppression on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;
According to a preset noise suppression transfer function, performing noise suppression on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;
According to the at least two channels of third residual signals, detecting the current state of the ambient sound;
When the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals.
An embodiment of the present invention is further characterized in that, when the current state of the ambient sound is determined to be a silent state, the result of the previous localization of the voice signal in the environment is used as the current localization result.
An embodiment of the present invention is further characterized in that localizing the voice signal in the environment according to the at least two channels of third residual signals, when the current state of the ambient sound is determined to be a voice state, specifically comprises:
Smoothing each channel of third residual signal separately, to obtain at least two channels of smoothed signals;
Performing sound source localization according to the at least two channels of smoothed signals, to determine the position of the voice signal in the environment.
An embodiment of the present invention is further characterized in that performing echo cancellation on each channel of the at least two channels of desired signals collected by microphones according to the pre-acquired loudspeaker reference signal, to obtain at least two channels of first residual signals, specifically comprises:
Inputting the loudspeaker reference signal into the n-th adaptive filter, to obtain the n-th output signal;
Subtracting the n-th output signal from the n-th desired signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.
An embodiment of the present invention is further characterized in that the preset echo suppression transfer function is: the echo suppression transfer function used when echo suppression is performed on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal. Performing echo suppression on each channel of the at least two channels of first residual signals according to the preset echo suppression transfer function, to obtain at least two channels of second residual signals, specifically comprises:
According to the loudspeaker reference signal and the first-channel desired signal, performing echo suppression on the first-channel first residual signal to obtain the first-channel second residual signal, and recording the echo suppression transfer function used in this processing, where the first-channel first residual signal is any one channel of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal;
According to the echo suppression transfer function, performing echo suppression on each channel of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one channel of second residual signal.
An embodiment of the present invention is further characterized in that the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal. Performing noise suppression on each channel of the at least two channels of second residual signals according to the preset noise suppression transfer function, to obtain at least two channels of third residual signals, specifically comprises:
Performing noise suppression on the first-channel second residual signal to obtain the first-channel third residual signal, and recording the noise suppression transfer function used in this processing;
According to the noise suppression transfer function, performing noise suppression on each channel of second residual signal other than the first-channel second residual signal, to obtain at least one channel of third residual signal.
In a second aspect, an embodiment of the present invention further provides a voice signal localization apparatus based on a complex environment, the apparatus comprising:
An echo cancellation module, configured to perform, according to a pre-acquired loudspeaker reference signal, echo cancellation on each channel of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;
An echo suppression module, configured to perform, according to a preset echo suppression transfer function, echo suppression on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;
A noise suppression module, configured to perform, according to a preset noise suppression transfer function, noise suppression on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;
A state detection module, configured to detect the current state of the ambient sound according to the at least two channels of third residual signals;
A voice signal localization module, configured to localize the voice signal in the environment according to the at least two channels of third residual signals when the current state of the ambient sound is determined to be a voice state.
An embodiment of the present invention is further characterized in that the voice signal localization module is further configured to, when the current state of the ambient sound is determined to be a silent state, use the result of the previous localization of the voice signal in the environment as the current sound source localization result.
An embodiment of the present invention is further characterized in that the voice signal localization module is specifically configured to:
Smooth each channel of third residual signal separately, to obtain at least two channels of smoothed signals;
Localize the voice signal in the environment according to the at least two channels of smoothed signals.
An embodiment of the present invention is further characterized in that the echo cancellation module is specifically configured to:
Input the loudspeaker reference signal into the n-th adaptive filter, to obtain the n-th output signal;
Subtract the n-th output signal from the n-th desired signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.
An embodiment of the present invention is further characterized in that the preset echo suppression transfer function is: the echo suppression transfer function used when echo suppression is performed on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal;
The echo suppression module is specifically configured to perform, according to the loudspeaker reference signal and the first-channel desired signal, echo suppression on the first-channel first residual signal to obtain the first-channel second residual signal, and to record the echo suppression transfer function used in this processing, where the first-channel first residual signal is any one channel of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal;
According to the echo suppression transfer function, to perform echo suppression on each channel of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one channel of second residual signal.
An embodiment of the present invention is further characterized in that the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal;
The noise suppression module is specifically configured to perform noise suppression on the first-channel second residual signal to obtain the first-channel third residual signal, and to record the noise suppression transfer function used in this processing;
According to the noise suppression transfer function, to perform noise suppression on each channel of second residual signal other than the first-channel second residual signal, to obtain at least one channel of third residual signal.
In a third aspect, an embodiment of the present invention further provides a voice signal localization system based on a complex environment, the system comprising a processor and a memory;
The memory is configured to store one or more program instructions;
The processor is configured to run the one or more program instructions, to execute any of the method steps of the voice signal localization method based on a complex environment described above.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium. The computer storage medium contains one or more program instructions, which are used by a voice signal localization system based on a complex environment to execute any of the method steps of the voice signal localization method based on a complex environment of the first aspect.
Embodiments of the present invention have the following advantages. An echo cancellation method is used to linearly cancel, according to the loudspeaker reference signal, the echo in each channel of the at least two channels of desired signals collected by microphones, which achieves a degree of echo cancellation. Echo suppression is then performed on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals. This suppresses the nonlinear components in the first residual signals. Moreover, because the same echo suppression transfer function is applied to all of the first residual signals, the voice phase information in every channel undergoes identical nonlinear processing; that is, the nonlinear impairment is consistent across channels. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system in noise suppression. Noise suppression is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system in stationary noise. As with echo suppression, the same noise suppression transfer function is applied to all of the second residual signals, avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals. Only when the ambient sound is in a voice state is each channel of third residual signal smoothed and the sound source then localized, which further reduces the interference with sound source localization from signals outside the target voice state.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary; those of ordinary skill in the art can derive other drawings from the drawings provided without creative effort.
Fig. 1 is a schematic flowchart of a voice signal localization method based on a complex environment according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the principle structure of voice signal localization based on a complex environment according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice signal localization apparatus based on a complex environment according to another embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a voice signal localization system based on a complex environment according to another embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are illustrated below by specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1 of the present invention provides a voice signal localization method based on a complex environment, as shown in Fig. 1 and Fig. 2. Fig. 1 shows a schematic flowchart of the method, and Fig. 2 shows a schematic diagram of the principle structure of voice signal localization based on a complex environment. The method comprises:
Step 110: according to a pre-acquired loudspeaker reference signal, perform echo cancellation on each channel of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals.
Specifically, an acoustic echo cancellation (Acoustic Echo Cancellation, AEC) method can be used to remove the linear echo part of the desired signals. As shown in Fig. 2, $X_{ref}(z)$ is the reference signal that the audio device feeds to the loudspeaker, and $D_0(z), D_1(z), \ldots, D_{n-1}(z)$ are the at least two channels of audio signals received by the at least two microphones, which serve as the desired signals in the echo cancellation algorithm, where n is a positive integer greater than or equal to 1.
The principle is as follows: the loudspeaker reference signal is input into the n-th adaptive filter to obtain the n-th output signal;
The n-th output signal is subtracted from the n-th desired signal to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.
Let $W_{aec,n-1}(z)$ be the transfer function of the n-th adaptive filter and $E_{c,n-1}(z)$ the n-th first residual signal. Then: $E_{c,n-1}(z) = D_{n-1}(z) - W_{aec,n-1}(z)\,X_{ref}(z)$;
That is:
$E_{c,0}(z) = D_0(z) - W_{aec,0}(z)\,X_{ref}(z)$,
$E_{c,1}(z) = D_1(z) - W_{aec,1}(z)\,X_{ref}(z)$,
...
$E_{c,n-1}(z) = D_{n-1}(z) - W_{aec,n-1}(z)\,X_{ref}(z)$
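The per-channel difference operation above can be sketched in the time domain as follows. This is a minimal illustration, not the patent's implementation: the text does not say how the adaptive filters are updated, so the normalized LMS (NLMS) update, the function names, and the parameters `taps`, `mu` and `eps` are all assumptions.

```python
import numpy as np

def nlms_echo_cancel(x_ref, d, taps=64, mu=0.5, eps=1e-8):
    """First residual e[k] = d[k] - w^T x[k] for one microphone channel.

    NLMS is an assumed adaptation rule; the source only specifies
    'adaptive filter output subtracted from the desired signal'.
    """
    w = np.zeros(taps)      # adaptive filter coefficients (echo path estimate)
    buf = np.zeros(taps)    # most recent reference samples, newest first
    e = np.zeros(len(d))
    for k in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = x_ref[k]
        y = w @ buf                                  # n-th output signal (estimated linear echo)
        e[k] = d[k] - y                              # difference operation: first residual
        w += mu * e[k] * buf / (buf @ buf + eps)     # NLMS coefficient update (assumption)
    return e

def cancel_all_channels(x_ref, mics, **kwargs):
    """Run one independent adaptive filter per microphone channel."""
    return np.stack([nlms_echo_cancel(x_ref, d, **kwargs) for d in mics])
```

Each channel keeps its own filter here, matching the per-channel index on the adaptive filter transfer function in the formulas; because the filtering is linear, the inter-channel phase relationships of the remaining signal are not altered by this stage.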
Step 110 removes the linear echo part of the desired signals. Because this processing is linear, it does not destroy the phase information in the signals; that is, it cancels part of the echo without affecting the information needed for sound source localization. However, linear processing cannot eliminate the nonlinear echo caused by nonlinear distortion in the audio system. Steps 120 to 130 therefore need to be performed, using an acoustic echo suppression (Acoustic Echo Suppression, AES) method to remove the nonlinear echo part of the first residual signals.
Step 120: according to a preset echo suppression transfer function, perform echo suppression on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals.
Optionally, the preset echo suppression transfer function may be: the echo suppression transfer function used when echo suppression is performed on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal.
Step 120 may specifically comprise the following steps:
According to the loudspeaker reference signal and the first-channel desired signal, perform echo suppression on the first-channel first residual signal to obtain the first-channel second residual signal, and record the echo suppression transfer function used in this processing.
Here, the first-channel first residual signal is any one channel of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal.
According to the echo suppression transfer function, perform echo suppression on each channel of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one channel of second residual signal.
The processing is expressed by the following formulas:
$E_{s,n-1}(z) = W_{aes,n-1}(z)\,E_{c,n-1}(z)$;
That is,
$E_{s,0}(z) = W_{aes,0}(z)\,E_{c,0}(z)$;
$E_{s,1}(z) = W_{aes,1}(z)\,E_{c,1}(z)$;
...
$E_{s,n-1}(z) = W_{aes,n-1}(z)\,E_{c,n-1}(z)$.
Here $E_{s,n-1}(z)$ is the n-th channel of second residual signal, $W_{aes,n-1}(z)$ is the echo suppression transfer function, and $E_{c,n-1}(z)$ is the n-th channel of first residual signal. Because this embodiment applies the same echo suppression transfer function to every channel, $W_{aes,0}(z) = W_{aes,1}(z) = \cdots = W_{aes,n-1}(z)$.
Although the AES method can suppress the nonlinear components in the first residual signals, this nonlinear processing destroys the phase information in each channel. If a different echo suppression transfer function were used for each channel of first residual signal, the final sound source localization would inevitably be strongly affected. To solve this problem, this embodiment uses the same echo suppression transfer function, that is, the same filter transfer function, when performing echo suppression on every channel of first residual signal. This ensures that the AES method applies identical nonlinear processing to every channel, so that the nonlinear impairment is consistent across channels and the severe interference caused by inconsistent nonlinear distortion is avoided. Moreover, since the signals at the microphones of an audio device are highly similar, applying the filter derived from one channel to the other channels also achieves good echo suppression.
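The benefit of sharing one echo suppression transfer function can be made explicit with a short worked equation. The cross-spectrum notation, the channel indices $i, j$ and the symbol $W_{aes}(z)$ for the shared function are introduced here for illustration only and do not appear in the source; the derivation assumes $W_{aes}(z) \neq 0$ at the frequencies of interest.

```latex
\begin{aligned}
E_{s,i}(z)\,E_{s,j}^{*}(z)
  &= W_{aes}(z)\,E_{c,i}(z)\,\bigl(W_{aes}(z)\,E_{c,j}(z)\bigr)^{*}
   = \lvert W_{aes}(z)\rvert^{2}\,E_{c,i}(z)\,E_{c,j}^{*}(z),\\
\arg\bigl(E_{s,i}(z)\,E_{s,j}^{*}(z)\bigr)
  &= \arg\bigl(E_{c,i}(z)\,E_{c,j}^{*}(z)\bigr).
\end{aligned}
```

With a shared function the inter-channel phase difference, which is what phase-based localization relies on, is unchanged. If channels $i$ and $j$ instead used different functions $W_{aes,i}(z)$ and $W_{aes,j}(z)$, the phase difference would be shifted by $\arg W_{aes,i}(z) - \arg W_{aes,j}(z)$.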
After step 120 is executed, the nonlinear echo part of the first residual signals is removed. Next, noise suppression is performed to further improve the robustness of sound source localization; that is, step 130 is executed.
Step 130: according to a preset noise suppression transfer function, perform noise suppression on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals.
Optionally, the preset noise suppression transfer function may be: the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal.
Step 130 may specifically comprise the following steps:
Perform noise suppression on the first-channel second residual signal to obtain the first-channel third residual signal, and record the noise suppression transfer function used in this processing.
According to the noise suppression transfer function, perform noise suppression on each channel of second residual signal other than the first-channel second residual signal, to obtain at least one channel of third residual signal.
The processing is expressed by the following formulas:
$E_{r,n-1}(z) = W_{nr}(z)\,E_{s,n-1}(z)$;
That is:
$E_{r,0}(z) = W_{nr}(z)\,E_{s,0}(z)$,
$E_{r,1}(z) = W_{nr}(z)\,E_{s,1}(z)$,
...
$E_{r,n-1}(z) = W_{nr}(z)\,E_{s,n-1}(z)$
Here $E_{r,n-1}(z)$ is the n-th third residual signal, $W_{nr}(z)$ is the noise suppression transfer function, and $E_{s,n-1}(z)$ is the n-th second residual signal.
Noise suppression is also nonlinear processing and can likewise destroy the phase information of the signals. Therefore the same noise suppression transfer function, that is, the same filter transfer function, is applied to the at least two channels of second residual signals. This ensures that the nonlinear distortion of every channel of third residual signal is identical and reduces the interference caused by damage to the signal phase information. Noise suppression can be implemented with existing techniques such as noise reduction methods; common ones include spectral subtraction and Wiener filtering. Noise reduction also significantly reduces stationary noise, improving the robustness of the system in stationary noise.
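A minimal frequency-domain sketch of shared-gain noise suppression follows. The Wiener-style gain formula, the gain floor, and the function names are assumptions chosen for illustration; the source only states that one noise suppression transfer function, recorded from the first channel, is applied to every channel.

```python
import numpy as np

def wiener_gain(frame_spec, noise_psd, floor=0.1):
    """Real-valued Wiener-style gain from one channel's spectrum (assumed form)."""
    sig_psd = np.abs(frame_spec) ** 2
    # Crude SNR estimate: subtract the noise power and clamp at zero.
    snr = np.maximum(sig_psd - noise_psd, 0.0) / (noise_psd + 1e-12)
    # Gain floor keeps the gain strictly positive and limits musical noise.
    return np.maximum(snr / (1.0 + snr), floor)

def suppress_noise_shared(specs, noise_psd):
    """specs: complex array of shape (channels, bins) holding second residuals.

    The gain is computed from channel 0 only and reused for all channels,
    so every channel receives the same (nonlinear) processing.
    """
    gain = wiener_gain(specs[0], noise_psd)   # recorded transfer function
    return specs * gain                        # third residuals for all channels
```

Because the shared gain is real and positive, the phase of each bin, and therefore the phase difference between channels, is left untouched in this sketch.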
Step 140: according to the at least two channels of third residual signals, detect the current state of the ambient sound.
Specifically, a state detection method is used to determine the current state of the ambient sound from the at least two channels of third residual signals. The current state may be a voice state or a silent state. The silent state corresponds to "if voice=0" in Fig. 2, and the voice state corresponds to "if voice=1" in Fig. 2. The specific state detection method is conventional, for example a silence detection method based on signal-to-noise ratio or a silence detection method based on machine learning.
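A signal-to-noise-ratio-based state detector can be sketched as below. The threshold value, the way power is averaged over channels, and the assumption that a noise power estimate is already available are all illustrative choices that the source does not specify.

```python
import numpy as np

def detect_voice(frames, noise_power, snr_threshold_db=6.0):
    """Return 1 for a voice state and 0 for a silent state.

    frames: real array of shape (channels, samples) with one frame of
    third residual signals. noise_power is a hypothetical, externally
    supplied estimate of the background noise power.
    """
    power = np.mean(frames ** 2)   # average power over all channels and samples
    snr_db = 10.0 * np.log10(power / (noise_power + 1e-12) + 1e-12)
    return 1 if snr_db > snr_threshold_db else 0
```

The returned value plays the role of the voice flag in Fig. 2: localization runs only when it is 1.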
Step 150: when the current state of the ambient sound is determined to be a voice state, localize the voice signal in the environment according to the at least two channels of third residual signals.
In this embodiment, localizing the voice signal in the environment according to the at least two channels of third residual signals, when the current state of the ambient sound is determined to be a voice state, specifically comprises:
Smoothing each channel of third residual signal separately, to obtain at least two channels of smoothed signals;
Localizing the voice signal in the environment according to the at least two channels of smoothed signals.
Specifically, the at least two channels of third residual signals can be smoothed as follows, where each $E_i(z)$ on the right-hand side is the previously smoothed value of the same channel:
$E_0(z) = \mathrm{smoothfactor} \cdot E_0(z) + (1-\mathrm{smoothfactor}) \cdot E_{r,0}(z)$,
$E_1(z) = \mathrm{smoothfactor} \cdot E_1(z) + (1-\mathrm{smoothfactor}) \cdot E_{r,1}(z)$,
...
$E_{n-1}(z) = \mathrm{smoothfactor} \cdot E_{n-1}(z) + (1-\mathrm{smoothfactor}) \cdot E_{r,n-1}(z)$
Here smoothfactor is the smoothing factor. Its value is generally a real number in the interval (0, 1) and can be set in advance by the operator; in this embodiment, smoothfactor = 0.9 can be used.
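The recursive smoothing can be written as a one-line update. This sketch assumes that each channel is smoothed against its own previous smoothed value and that the signals are held as NumPy arrays of shape (channels, bins); the function name is hypothetical.

```python
import numpy as np

def smooth(prev_smoothed, third_residuals, smooth_factor=0.9):
    """Per-channel recursive smoothing: E_i <- a * E_i + (1 - a) * Er_i.

    Both inputs have shape (channels, bins); the same scalar factor is
    applied to every channel, so inter-channel relationships are kept.
    """
    return smooth_factor * prev_smoothed + (1.0 - smooth_factor) * third_residuals
```

With smooth_factor = 0.9, a single outlier frame contributes only 10% to the smoothed value, which is why short bursts of noise have limited effect on the localization input.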
The sound source localization method can be an existing method, for example the minimum variance distortionless response (Minimum Variance Distortionless Response, MVDR) algorithm and the multiple signal classification (Multiple Signal Classification, MUSIC) algorithm, both based on spatial spectrum estimation; the generalized cross-correlation (Generalized Cross Correlation, GCC) algorithm, based on time difference of arrival; and sound source localization methods based on beamforming and similar algorithms. The specific localization process is also prior art and is not described further here.
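As one concrete instance of a time-difference-of-arrival method, the sketch below estimates the delay between two channels with GCC and PHAT weighting. PHAT is only one possible weighting and is an assumption here; the function names, the far-field two-microphone geometry in `delay_to_angle`, and the speed of sound value are likewise illustrative.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay of sig relative to ref, in seconds, via GCC-PHAT."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def delay_to_angle(tau, mic_distance, c=343.0):
    """Far-field arrival angle in degrees for a two-microphone pair (assumed geometry)."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0))))
```

Since the method depends only on the phase of the cross-spectrum, it is directly sensitive to any inconsistent phase distortion introduced by earlier stages, which is the motivation for sharing suppression transfer functions across channels.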
Optionally, when the current state of the ambient sound is determined to be a silent state, no processing is applied to the at least two channels of third residual signals; they are output directly, and the previous sound source localization result is used as the current sound source localization result.
In this embodiment, only signals in the voice state are smoothed and then used as the input to sound source localization, in order to further reduce the interference with localization from signals outside the target voice state. Time smoothing also reduces, to some extent, the interference of short bursts of noise with sound source localization, making the algorithm's estimate for the target voice signal more stable.
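The gating behaviour described above (smooth and localize only in the voice state, otherwise hold the previous result) can be sketched as a small stateful wrapper. The class name, the `locate_fn` callback, and the choice to start the smoothed state at zero are hypothetical and not taken from the source.

```python
import numpy as np

class GatedLocalizer:
    """Holds the smoothed signals and the last localization result."""

    def __init__(self, locate_fn, smooth_factor=0.9):
        self.locate_fn = locate_fn          # hypothetical: maps smoothed signals to a direction
        self.smooth_factor = smooth_factor
        self.smoothed = None                # per-channel smoothed signals
        self.last_result = None             # previous localization result

    def update(self, third_residuals, voice):
        if voice == 1:
            if self.smoothed is None:
                # Assumed initial condition: smoothed state starts at zero.
                self.smoothed = np.zeros_like(third_residuals)
            a = self.smooth_factor
            self.smoothed = a * self.smoothed + (1.0 - a) * third_residuals
            self.last_result = self.locate_fn(self.smoothed)
        # In the silent state nothing is updated and the previous result is reused.
        return self.last_result
```

A caller would feed each frame of third residual signals together with the detected state, and read back either a fresh estimate or the held one.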
The voice signal localization method based on a complex environment provided by this embodiment of the present invention uses echo cancellation processing to linearly cancel, according to the loudspeaker reference signal, the echo signal in each channel of the at least two channels of desired signals collected by the microphones, thereby achieving a certain degree of echo cancellation. Echo suppression processing is then performed on each of the at least two channels of first residual signals to obtain at least two channels of second residual signals. In this way, the nonlinear components in the first residual signals can be suppressed. Moreover, the same echo suppression transfer function is used for all of the at least two channels of first residual signals, which ensures that the voice phase information in each channel of first residual signals undergoes the same nonlinear processing, that is, the nonlinear damage is consistent. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system when suppressing noise. Noise suppression processing is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system under stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the at least two channels of second residual signals, again avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals. Only when the ambient sound is in the voice state is each channel of third residual signals smoothed and the sound source then localized, which further reduces the interference of non-target voice state signals with sound source localization.
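The benefit of a shared transfer function can be seen from the inter-channel cross-spectrum, on which phase-based localization commonly relies. In the sketch below, $X_1(k)$ and $X_2(k)$ denote the spectra of two channels at frequency bin $k$, and $H$, $H_1$, $H_2$ denote suppression transfer functions at that bin. These symbols are introduced here for illustration only and do not appear in the original text.

```latex
\begin{aligned}
\text{shared } H:\quad & (H X_1)(H X_2)^{*} = |H|^{2}\, X_1 X_2^{*}, \\
 & \arg\!\left[(H X_1)(H X_2)^{*}\right] = \arg\!\left[X_1 X_2^{*}\right] \quad (H \neq 0), \\
\text{separate } H_1, H_2:\quad & (H_1 X_1)(H_2 X_2)^{*} = H_1 H_2^{*}\, X_1 X_2^{*}, \\
 & \arg\!\left[(H_1 X_1)(H_2 X_2)^{*}\right] = \arg\!\left[X_1 X_2^{*}\right] + \arg\!\left[H_1 H_2^{*}\right] \pmod{2\pi}.
\end{aligned}
```

With a shared function the inter-channel phase is untouched wherever $H \neq 0$. With separately derived functions an extra phase term appears whenever $H_1 H_2^{*}$ is not real and positive, and even real-valued gains that differ per channel change the relative weighting of the channels from bin to bin.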
Corresponding to Embodiment 1 above, an embodiment of the present invention further provides a voice signal localization apparatus based on a complex environment. As shown in Fig. 3, the apparatus includes: an echo cancellation module 301, an echo suppression module 302, a noise suppression module 303, a state detection module 304, and a voice signal localization module 305.
The echo cancellation module 301 is configured to perform echo cancellation processing, according to a pre-acquired loudspeaker reference signal, on each channel of the at least two channels of desired signals collected by the microphones, to obtain at least two channels of first residual signals.
The echo suppression module 302 is configured to perform echo suppression processing, according to a preset echo suppression transfer function, on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals.
Optionally, the preset echo suppression transfer function may be the echo suppression transfer function used when echo suppression processing is performed on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal.
The echo suppression module 302 is specifically configured to: perform echo suppression processing on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain the first-channel second residual signal, and record the echo suppression transfer function used during that processing, where the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal; and
perform echo suppression processing, according to the recorded echo suppression transfer function, on each channel of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one channel of second residual signals.
The noise suppression module 303 is configured to perform noise suppression processing, according to a preset noise suppression transfer function, on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals.
Optionally, the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression processing is performed on the first-channel second residual signal.
The noise suppression module 303 is specifically configured to: perform noise suppression processing on the first-channel second residual signal, to obtain the first-channel third residual signal, and record the noise suppression transfer function used during that processing; and
perform noise suppression processing, according to the recorded noise suppression transfer function, on each channel of second residual signals other than the first-channel second residual signal, to obtain at least one channel of third residual signals.
The state detection module 304 is configured to detect the current state of the ambient sound according to the at least two channels of third residual signals.
The voice signal localization module 305 is configured to, when the current state of the ambient sound is determined to be the voice state, perform sound source localization according to the at least two channels of third residual signals so as to localize the voice signal in the environment.
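The "derive once, reuse for every channel" pattern used by modules 302 and 303 can be sketched as follows. This is a minimal illustration under stated assumptions: the transfer function is modelled as a real-valued per-bin gain obtained with a Wiener-like rule, and `suppression_gain`, `apply_shared_gain`, `phase_differences`, `noise_power`, and `floor` are hypothetical names that do not come from the patent.

```python
import cmath


def suppression_gain(reference_spectrum, noise_power, floor=0.1):
    """Per-bin real gain derived from one channel (Wiener-like rule, assumed)."""
    gains = []
    for value, noise in zip(reference_spectrum, noise_power):
        power = abs(value) ** 2
        gain = max(floor, 1.0 - noise / power) if power > 0.0 else floor
        gains.append(gain)
    return gains


def apply_shared_gain(channels, gains):
    """Apply the same per-bin gain to every channel."""
    return [[g * value for g, value in zip(gains, spectrum)] for spectrum in channels]


def phase_differences(spectrum_a, spectrum_b):
    """Per-bin phase of the cross-spectrum between two channels."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spectrum_a, spectrum_b)]


# Two toy 4-bin spectra; channel 2 is a phase-shifted, attenuated copy of channel 1.
channel_1 = [1 + 1j, 2 - 1j, 0.5 + 0.2j, -1 + 3j]
channel_2 = [0.8 * value * cmath.exp(-0.3j * k) for k, value in enumerate(channel_1)]
noise_power = [0.5, 0.5, 0.5, 0.5]

gains = suppression_gain(channel_1, noise_power)  # derived from the first channel only
out_1, out_2 = apply_shared_gain([channel_1, channel_2], gains)

before = phase_differences(channel_1, channel_2)
after = phase_differences(out_1, out_2)
```

Comparing `before` and `after` serves as a sanity check that the inter-channel phase differences survive the shared processing.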
Optionally, the voice signal localization module 305 is further configured to, when the current state of the ambient sound is determined to be the silent state, directly output the at least two channels of third residual signals and use the result of the previous localization of the voice signal in the environment as the current sound source localization result.
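The state-dependent behaviour of module 305 can be sketched as a small gate around a localization routine. The class name `GatedLocalizer`, the `localize` callable, and the boolean `is_voice` flag are assumptions for illustration; the patent does not prescribe this interface.

```python
class GatedLocalizer:
    """Runs localization only in the voice state; otherwise reuses the last result."""

    def __init__(self, localize):
        self._localize = localize   # hypothetical callable: channels -> position estimate
        self._last_result = None    # no previous result exists before the first voice frame

    def process(self, channels, is_voice):
        if is_voice:
            self._last_result = self._localize(channels)
        # In the silent state the channels pass through unchanged and the
        # previous localization result is reported again.
        return channels, self._last_result


# Usage with a stand-in localization function that counts its invocations.
calls = []


def fake_localize(channels):
    calls.append(channels)
    return len(calls) * 10.0


localizer = GatedLocalizer(fake_localize)
_, first = localizer.process([[0.0]], is_voice=False)
_, second = localizer.process([[1.0]], is_voice=True)
passthrough, third = localizer.process([[0.0]], is_voice=False)
```

Keeping the last result as state means a silent frame never triggers a new estimate, which matches the fallback described above.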
Optionally, the voice signal localization module 305 is specifically configured to: smooth each channel of third residual signals separately, to obtain at least two channels of smoothed signals; and
localize the voice signal in the environment according to the at least two channels of smoothed signals.
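This passage does not state which localization algorithm is applied to the smoothed channels. As one common possibility, the sketch below estimates the time difference of arrival between two channels by cross-correlation and converts it to an arrival angle under a far-field assumption. The function `estimate_delay`, the sampling rate, and the microphone spacing are all assumptions made for illustration.

```python
import math
import random


def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, -math.inf
    n = len(reference)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += reference[i] * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(400)]
true_delay = 3
mic_1 = source
mic_2 = [0.0] * true_delay + source[:-true_delay]  # same signal, arriving 3 samples later

delay_samples = estimate_delay(mic_1, mic_2, max_lag=10)

# Convert the delay to an arrival angle for a two-microphone array (far-field assumption).
sample_rate = 16000.0   # Hz, assumed
mic_spacing = 0.1       # metres, assumed
speed_of_sound = 343.0  # m/s
sin_theta = delay_samples / sample_rate * speed_of_sound / mic_spacing
angle_deg = math.degrees(math.asin(max(-1.0, min(1.0, sin_theta))))
```

Because the estimate depends on the relative timing between channels, it is sensitive to any processing that alters the channels differently, which is why the earlier stages reuse one transfer function for all channels.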
Optionally, the echo cancellation module 301 is specifically configured to: input the loudspeaker reference signal into the n-th adaptive filter, to obtain the n-th output signal; and
perform a difference operation on the n-th desired signal and the n-th output signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.
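The per-channel structure described above (one adaptive filter per desired signal, followed by a difference operation) can be sketched as follows. The NLMS update rule, the filter length, and the step size are assumptions chosen for illustration, since the text only refers to an adaptive filter; `nlms_echo_cancel` and `echo_paths` are hypothetical names.

```python
import random


def nlms_echo_cancel(reference, desired, taps=8, step=0.5, eps=1e-8):
    """Return the first residual signal e[t] = d[t] - y[t] for one channel.

    y[t] is the output of an adaptive FIR filter driven by the loudspeaker
    reference. NLMS is an assumed update rule, used only for illustration.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, desired):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # n-th output signal
        e = d - y                                         # difference operation
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


random.seed(1)
reference = [random.gauss(0.0, 1.0) for _ in range(2000)]
echo_paths = [[0.6, 0.3, -0.1], [0.4, -0.2, 0.05]]  # hypothetical echo paths, one per microphone

desired_signals = []
residuals = []
for path in echo_paths:
    # Each desired signal here is pure echo: the reference filtered by the echo path.
    desired = [
        sum(path[k] * reference[t - k] for k in range(len(path)) if t - k >= 0)
        for t in range(len(reference))
    ]
    desired_signals.append(desired)
    residuals.append(nlms_echo_cancel(reference, desired))
```

In this toy setup the desired signals contain only echo, so each residual decays towards zero once its filter has converged. With near-end speech present, the residual would instead retain the speech plus whatever echo the linear filter cannot model.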
The functions performed by each component of the voice signal localization apparatus based on a complex environment provided by this embodiment of the present invention have been described in detail in Embodiment 1 above and are therefore not repeated here.
The voice signal localization apparatus based on a complex environment provided by this embodiment of the present invention uses echo cancellation processing to linearly cancel, according to the loudspeaker reference signal, the echo signal in each channel of the at least two channels of desired signals collected by the microphones, thereby achieving a certain degree of echo cancellation. Echo suppression processing is then performed on each of the at least two channels of first residual signals to obtain at least two channels of second residual signals. In this way, the nonlinear components in the first residual signals can be suppressed. Moreover, the same echo suppression transfer function is used for all of the at least two channels of first residual signals, which ensures that the voice phase information in each channel of first residual signals undergoes the same nonlinear processing, that is, the nonlinear damage is consistent. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system when suppressing noise. Noise suppression processing is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system under stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the at least two channels of second residual signals, again avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals. Only when the ambient sound is in the voice state is each channel of third residual signals smoothed and the sound source then localized, which further reduces the interference of non-target voice state signals with sound source localization.
Corresponding to Embodiment 1 above, an embodiment of the present invention further provides a voice signal localization system based on a complex environment. As shown in Fig. 4, the system includes a processor 401 and a memory 402.
The memory 402 is configured to store one or more program instructions.
The processor 401 is configured to run the one or more program instructions so as to execute the voice signal localization method based on a complex environment described in the embodiment above.
The voice signal localization system based on a complex environment provided by this embodiment of the present invention uses echo cancellation processing to linearly cancel, according to the loudspeaker reference signal, the echo signal in each channel of the at least two channels of desired signals collected by the microphones, thereby achieving a certain degree of echo cancellation. Echo suppression processing is then performed on each of the at least two channels of first residual signals to obtain at least two channels of second residual signals. In this way, the nonlinear components in the first residual signals can be suppressed. Moreover, the same echo suppression transfer function is used for all of the at least two channels of first residual signals, which ensures that the voice phase information in each channel of first residual signals undergoes the same nonlinear processing, that is, the nonlinear damage is consistent. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system when suppressing noise. Noise suppression processing is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system under stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the at least two channels of second residual signals, again avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals. Only when the ambient sound is in the voice state is each channel of third residual signals smoothed and the sound source then localized, which further reduces the interference of non-target voice state signals with sound source localization.
Corresponding to the above embodiments, an embodiment of the present invention further provides a computer storage medium containing one or more program instructions. The one or more program instructions are used by a voice signal localization system based on a complex environment to execute the voice signal localization method based on a complex environment described in Embodiment 1.
Although the present invention has been described in detail above by means of a general description and specific embodiments, some modifications or improvements can be made on the basis of the present invention, as will be apparent to those skilled in the art. Therefore, such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.
Claims (10)
1. A voice signal localization method based on a complex environment, characterized in that the method comprises:
performing echo cancellation processing, according to a pre-acquired loudspeaker reference signal, on each channel of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;
performing echo suppression processing, according to a preset echo suppression transfer function, on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;
performing noise suppression processing, according to a preset noise suppression transfer function, on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;
detecting the current state of the ambient sound according to the at least two channels of third residual signals; and
when the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals.
2. The method according to claim 1, characterized in that, when the current state of the ambient sound is determined to be a silent state, the result of the previous localization of the voice signal in the environment is used as the current localization result.
3. The method according to claim 1, characterized in that, when the current state of the ambient sound is determined to be the voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals specifically comprises:
smoothing each channel of third residual signals separately, to obtain at least two channels of smoothed signals; and
localizing the voice signal in the environment according to the at least two channels of smoothed signals.
4. The method according to any one of claims 1 to 3, characterized in that performing echo cancellation processing, according to the pre-acquired loudspeaker reference signal, on each channel of the at least two channels of desired signals collected by the microphones, to obtain at least two channels of first residual signals, specifically comprises:
inputting the loudspeaker reference signal into the n-th adaptive filter, to obtain the n-th output signal; and
performing a difference operation on the n-th desired signal and the n-th output signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of the desired signals.
5. The method according to any one of claims 1 to 3, characterized in that the preset echo suppression transfer function is the echo suppression transfer function used when echo suppression processing is performed on a first-channel first residual signal according to the loudspeaker reference signal and a first-channel desired signal; and performing echo suppression processing, according to the preset echo suppression transfer function, on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals, specifically comprises:
performing echo suppression processing on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain a first-channel second residual signal, and recording the echo suppression transfer function used when echo suppression processing is performed on the first-channel first residual signal, where the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal; and
performing echo suppression processing, according to the echo suppression transfer function, on each channel of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one channel of second residual signals.
6. The method according to any one of claims 1 to 3, characterized in that the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression processing is performed on a first-channel second residual signal; and performing noise suppression processing, according to the preset noise suppression transfer function, on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals, specifically comprises:
performing noise suppression processing on the first-channel second residual signal, to obtain a first-channel third residual signal, and recording the noise suppression transfer function used when noise suppression processing is performed on the first-channel second residual signal; and
performing noise suppression processing, according to the noise suppression transfer function, on each channel of second residual signals other than the first-channel second residual signal, to obtain at least one channel of third residual signals.
7. A voice signal localization apparatus based on a complex environment, characterized in that the apparatus comprises:
an echo cancellation module, configured to perform echo cancellation processing, according to a pre-acquired loudspeaker reference signal, on each channel of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;
an echo suppression module, configured to perform echo suppression processing, according to a preset echo suppression transfer function, on each channel of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;
a noise suppression module, configured to perform noise suppression processing, according to a preset noise suppression transfer function, on each channel of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;
a state detection module, configured to detect the current state of the ambient sound according to the at least two channels of third residual signals; and
a voice signal localization module, configured to, when the current state of the ambient sound is determined to be a voice state, localize the voice signal in the environment according to the at least two channels of third residual signals.
8. The apparatus according to claim 7, characterized in that the voice signal localization module is further configured to, when the current state of the ambient sound is determined to be a silent state, use the result of the previous localization of the voice signal in the environment as the current sound source localization result.
9. A voice signal localization system based on a complex environment, characterized in that the system comprises a processor and a memory;
the memory is configured to store one or more program instructions; and
the processor is configured to run the one or more program instructions so as to execute the method according to any one of claims 1 to 6.
10. A computer storage medium, characterized in that the computer storage medium contains one or more program instructions, and the one or more program instructions are used by a voice signal localization system based on a complex environment to execute the method steps according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910190519.3A CN109901113B (en) | 2019-03-13 | 2019-03-13 | Voice signal positioning method, device and system based on complex environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109901113A (en) | 2019-06-18 |
CN109901113B (en) | 2020-08-11 |
Family
ID=66952201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910190519.3A Active CN109901113B (en) | 2019-03-13 | 2019-03-13 | Voice signal positioning method, device and system based on complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109901113B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090310796A1 (en) * | 2006-10-26 | 2009-12-17 | Parrot | method of reducing residual acoustic echo after echo suppression in a "hands-free" device |
CN108538305A (en) * | 2018-04-20 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Audio recognition method, device, equipment and computer readable storage medium |
CN108766456A (en) * | 2018-05-22 | 2018-11-06 | 出门问问信息科技有限公司 | A kind of method of speech processing and device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112216295A (en) * | 2019-06-25 | 2021-01-12 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN112216295B (en) * | 2019-06-25 | 2024-04-26 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109901113B (en) | 2020-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||