CN109901113A

CN109901113A - Voice signal localization method, apparatus and system for complex environments

Info

Publication number: CN109901113A
Application number: CN201910190519.3A
Authority: CN
Inventors: 李勤; 李楠
Original assignee: Chumen Wenwen Information Technology Co Ltd
Current assignee: Chumen Wenwen Information Technology Co Ltd
Priority date: 2019-03-13
Filing date: 2019-03-13
Publication date: 2019-06-18
Anticipated expiration: 2039-03-13
Also published as: CN109901113B

Abstract

An embodiment of the present invention discloses a voice signal localization method, apparatus and system for complex environments. The method comprises: performing echo cancellation on each of at least two channels of desired signals according to a loudspeaker reference signal, to obtain at least two channels of first residual signals; performing echo suppression on each of the first residual signals according to a preset echo suppression transfer function, to obtain at least two channels of second residual signals; performing noise suppression on each of the second residual signals according to a preset noise suppression transfer function, to obtain at least two channels of third residual signals; and, when the ambient sound is determined from the third residual signals to be currently in a voice state, localizing the voice signal in the environment. This reduces the interference of non-target-speech signals with sound source localization, so that the sound source is localized accurately and localization robustness is improved.

Description

Voice signal localization method, apparatus and system for complex environments

Technical field

Embodiments of the present invention relate to the field of signal processing, and in particular to a voice signal localization method, apparatus and system for complex environments.

Background art

Sound source localization algorithms are generally based on an array of multiple microphones and estimate the direction of the sound source from the phase differences between the multi-channel input audio signals. However, when the loudspeaker of an audio device is playing audio and the device is in a high-noise environment, the microphone signals contain high-energy echo and noise. These signals interfere strongly with the algorithm's localization of the target speech source, producing large errors in the computed direction and low robustness.

To address this problem, the prior art adds algorithms such as echo cancellation and silence detection to the system. When the nonlinear echo is strong and the ambient noise is high, however, the robustness of this approach remains very low.

The technical problem to be solved by the present application is therefore how to localize a sound source accurately, and improve localization robustness, even in a complex environment with high noise and strong nonlinear echo.

Summary of the invention

To this end, embodiments of the present invention provide a voice signal localization method, apparatus and system for complex environments, to solve the prior-art problem that sound source localization is inaccurate and has low robustness in a complex environment with high noise and strong nonlinear echo.

To achieve the above object, embodiments of the present invention provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a voice signal localization method for complex environments, the method comprising:

performing, according to a pre-acquired loudspeaker reference signal, echo cancellation on each of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;

performing, according to a preset echo suppression transfer function, echo suppression on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;

performing, according to a preset noise suppression transfer function, noise suppression on each of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;

detecting the current state of the ambient sound according to the at least two channels of third residual signals;

when the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals.

In a further feature of the embodiment, when the current state of the ambient sound is determined to be a silent state, the result of the previous localization of the voice signal in the environment is used as the current localization result.

In a further feature of the embodiment, when the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals specifically comprises:

smoothing each channel of the third residual signals separately, to obtain at least two channels of smoothed signals;

performing sound source localization on the at least two channels of smoothed signals, to determine the position of the voice signal in the environment.

In a further feature of the embodiment, performing, according to the pre-acquired loudspeaker reference signal, echo cancellation on each of the at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals, specifically comprises:

inputting the loudspeaker reference signal into the n-th adaptive filter, to obtain the n-th output signal;

subtracting the n-th output signal from the n-th desired signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.

In a further feature of the embodiment, the preset echo suppression transfer function is the echo suppression transfer function used when echo suppression is performed on a first-channel first residual signal according to the loudspeaker reference signal and a first-channel desired signal; performing, according to the preset echo suppression transfer function, echo suppression on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals, specifically comprises:

performing echo suppression on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain a first-channel second residual signal, and recording the echo suppression transfer function used in doing so, where the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal;

performing, according to the recorded echo suppression transfer function, echo suppression on each of the first residual signals other than the first-channel first residual signal, to obtain at least one further channel of second residual signals.

In a further feature of the embodiment, the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal; performing, according to the preset noise suppression transfer function, noise suppression on each of the at least two channels of second residual signals, to obtain at least two channels of third residual signals, specifically comprises:

performing noise suppression on the first-channel second residual signal, to obtain a first-channel third residual signal, and recording the noise suppression transfer function used in doing so;

performing, according to the recorded noise suppression transfer function, noise suppression on each of the second residual signals other than the first-channel second residual signal, to obtain at least one further channel of third residual signals.

In a second aspect, an embodiment of the present invention further provides a voice signal localization apparatus for complex environments, the apparatus comprising:

an echo cancellation module, configured to perform, according to a pre-acquired loudspeaker reference signal, echo cancellation on each of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals;

an echo suppression module, configured to perform, according to a preset echo suppression transfer function, echo suppression on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals;

a noise suppression module, configured to perform, according to a preset noise suppression transfer function, noise suppression on each of the at least two channels of second residual signals, to obtain at least two channels of third residual signals;

a state detection module, configured to detect the current state of the ambient sound according to the at least two channels of third residual signals;

a voice signal localization module, configured to localize the voice signal in the environment according to the at least two channels of third residual signals when the current state of the ambient sound is determined to be a voice state.

In a further feature of the embodiment, the voice signal localization module is further configured to, when the current state of the ambient sound is determined to be a silent state, use the result of the previous localization of the voice signal in the environment as the current sound source localization result.

In a further feature of the embodiment, the voice signal localization module is specifically configured to:

localize the voice signal in the environment according to the at least two channels of smoothed signals.

In a further feature of the embodiment, the echo cancellation module is specifically configured to:

subtract the n-th output signal from the n-th desired signal, to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.

In a further feature of the embodiment, the preset echo suppression transfer function is the echo suppression transfer function used when echo suppression is performed on a first-channel first residual signal according to the loudspeaker reference signal and a first-channel desired signal;

the echo suppression module is specifically configured to perform echo suppression on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain a first-channel second residual signal, and to record the echo suppression transfer function used in doing so, where the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal;

In a further feature of the embodiment, the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal;

the noise suppression module is specifically configured to perform noise suppression on the first-channel second residual signal, to obtain a first-channel third residual signal, and to record the noise suppression transfer function used in doing so;

In a third aspect, an embodiment of the present invention further provides a voice signal localization system for complex environments, the system comprising a processor and a memory;

the memory is configured to store one or more program instructions;

the processor is configured to run the one or more program instructions, to execute any of the method steps of the voice signal localization method for complex environments described above.

In a fourth aspect, an embodiment of the present invention further provides a computer storage medium containing one or more program instructions, the one or more program instructions being used by a voice signal localization system for complex environments to execute any of the method steps of the voice signal localization method for complex environments of the first aspect.

Embodiments of the present invention have the following advantages. Using echo cancellation, the echo in each of the at least two channels of desired signals collected by microphones is linearly cancelled according to the loudspeaker reference signal, which achieves a degree of echo cancellation. Echo suppression is then performed on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals; this suppresses the nonlinear components in the first residual signals. Moreover, the same echo suppression transfer function is used for all of the first residual signals, which ensures that the speech phase information in every channel undergoes the same nonlinear processing, that is, the nonlinear impairment is consistent across channels. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system when suppressing noise. Noise suppression is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system in stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the second residual signals, avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals, and only when the ambient sound is in a voice state is each channel of the third residual signals smoothed and the sound source then localized, which further reduces the interference of non-target-speech signals with sound source localization.

Brief description of the drawings

To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely exemplary; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a voice signal localization method for complex environments according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the principle structure of voice signal localization for complex environments according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a voice signal localization apparatus for complex environments according to another embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a voice signal localization system for complex environments according to another embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the present invention are described below by way of specific examples, and a person skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The described embodiments are evidently some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1 of the present invention provides a voice signal localization method for complex environments, shown in Fig. 1 and Fig. 2. Fig. 1 is a schematic flowchart of the method, and Fig. 2 is a schematic diagram of the principle structure of voice signal localization for complex environments. The method comprises:

Step 110: according to a pre-acquired loudspeaker reference signal, perform echo cancellation on each of at least two channels of desired signals collected by microphones, to obtain at least two channels of first residual signals.

Specifically, an acoustic echo cancellation (AEC) method can be used to remove the linear echo component from the desired signals. In Fig. 2, $X_{ref}(z)$ is the reference signal that the audio device feeds to the loudspeaker, and $D_0(z), D_1(z), \ldots, D_{n-1}(z)$ are the at least two channels of audio signals received by the at least two microphones, which serve as the desired signals in the echo cancellation algorithm, where n is a positive integer greater than or equal to 1.

The principle is that the loudspeaker reference signal is input into the n-th adaptive filter, to obtain the n-th output signal;

Let $W_{aec,n-1}(z)$ be the transfer function of the n-th adaptive filter and $E_{c,n-1}(z)$ the n-th first residual signal. Then: $E_{c,n-1}(z) = D_{n-1}(z) - W_{aec,n-1}(z) X_{ref}(z)$;

That is:

$E_{c,0}(z) = D_0(z) - W_{aec,0}(z) X_{ref}(z)$,

$E_{c,1}(z) = D_1(z) - W_{aec,1}(z) X_{ref}(z)$,

...

$E_{c,n-1}(z) = D_{n-1}(z) - W_{aec,n-1}(z) X_{ref}(z)$

Step 110 removes the linear echo component of the desired signals. Because this process is linear, it does not destroy the phase information in the signals; that is, it cancels a certain amount of echo without affecting the information needed for sound source localization. Linear processing, however, cannot remove the nonlinear echo component introduced by the nonlinear distortion of the audio system. Steps 120 to 130 are therefore performed, in which an acoustic echo suppression (AES) method is used to remove the nonlinear echo component from the first residual signals.
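The per-channel subtraction of step 110 can be sketched in the time domain as follows. The patent does not say how the adaptive filters are updated; the normalized least-mean-squares (NLMS) rule used here, the function names, the tap count and the step size are all assumptions made for illustration.

```python
import random


def nlms_echo_cancel(reference, desired, taps=8, step=0.5, eps=1e-8):
    """Return the first residual e[k] = d[k] - w^T x[k] for one microphone channel.

    NLMS is an assumed update rule; the patent only requires an adaptive filter.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, desired):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))  # filter output
        error = d - estimate                                     # first residual sample
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def cancel_all_channels(reference, desired_channels, **kwargs):
    """Run one independent adaptive filter per microphone channel."""
    return [nlms_echo_cancel(reference, d, **kwargs) for d in desired_channels]


# Synthetic check: each microphone hears only a delayed, scaled loudspeaker signal.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic0 = [0.6 * reference[k - 1] + 0.2 * reference[k - 3] if k >= 3 else 0.0
        for k in range(2000)]
mic1 = [0.4 * reference[k - 2] if k >= 2 else 0.0 for k in range(2000)]
residuals = cancel_all_channels(reference, [mic0, mic1])
```

Each channel keeps its own filter because the echo path from the loudspeaker to each microphone differs; the subtraction itself is linear, which is why the inter-channel phase relationship of any near-end speech is left intact.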

Step 120: according to a preset echo suppression transfer function, perform echo suppression on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals.

Optionally, the preset echo suppression transfer function may be the echo suppression transfer function used when echo suppression is performed on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal.

Step 120 may specifically include the following:

Perform echo suppression on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain a first-channel second residual signal, and record the echo suppression transfer function used in doing so.

Here, the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal.

The processing is expressed by the following formula:

$E_{s,n-1}(z) = W_{aes}(z) E_{c,n-1}(z)$;

That is,

$E_{s,0}(z) = W_{aes}(z) E_{c,0}(z)$;

$E_{s,1}(z) = W_{aes}(z) E_{c,1}(z)$;

...

$E_{s,n-1}(z) = W_{aes}(z) E_{c,n-1}(z)$.

Here, $E_{s,n-1}(z)$ is the n-th second residual signal, $W_{aes}(z)$ is the echo suppression transfer function shared by all channels, and $E_{c,n-1}(z)$ is the n-th first residual signal.

Although the AES method suppresses the nonlinear components in the first residual signals, this nonlinear processing destroys the phase information in each channel. If different echo suppression transfer functions were used for the different channels of first residual signals, the final sound source localization would inevitably be strongly affected. To solve this problem, this embodiment uses the same echo suppression transfer function, that is, the same filter transfer function, for the echo suppression of every channel of first residual signals. This guarantees that the AES method applies the same nonlinear processing to every channel, so the nonlinear impairment is consistent and the severe interference caused by inconsistent nonlinear distortion is avoided. In addition, because the signals at the microphones of one audio device are highly similar, applying the filter derived from one channel to the other channels still suppresses echo effectively.
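The benefit of a shared transfer function can be checked numerically. The sketch below treats each channel as a short list of complex frequency-bin values and multiplies every channel by the same per-bin factor; the bin values and the transfer function are made-up numbers, and the helper names are hypothetical. Because both channels are scaled and rotated identically in each bin, the phase of the cross-spectrum between them, which is what localization relies on, is unchanged.

```python
import cmath


def apply_shared_transfer(channels, transfer):
    """Multiply every channel, bin by bin, by the same transfer function W(z)."""
    return [[w * value for w, value in zip(transfer, channel)] for channel in channels]


def inter_channel_phase(channel_a, channel_b):
    """Phase of the cross-spectrum a * conj(b) in each frequency bin."""
    return [cmath.phase(a * b.conjugate()) for a, b in zip(channel_a, channel_b)]


# Two channels of first residuals in three frequency bins (illustrative values).
first_residuals = [
    [1 + 1j, 2 - 1j, -0.5 + 0.2j],
    [0.5 - 2j, -1 + 1j, 0.3 + 0.9j],
]
# Hypothetical shared echo suppression transfer function, one value per bin.
w_aes = [0.75 * cmath.exp(0.4j), 0.2 * cmath.exp(-1.1j), 0.5 + 0j]

second_residuals = apply_shared_transfer(first_residuals, w_aes)
phase_before = inter_channel_phase(first_residuals[0], first_residuals[1])
phase_after = inter_channel_phase(second_residuals[0], second_residuals[1])
```

Had each channel been given its own transfer function with a different phase response, `phase_after` would differ from `phase_before` by the difference of the two phase responses in every bin.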

After step 120, the nonlinear echo component of the first residual signals has been removed. Noise suppression is then performed to further improve the robustness of sound source localization, that is, step 130 is performed.

Step 130: according to a preset noise suppression transfer function, perform noise suppression on each of the at least two channels of second residual signals, to obtain at least two channels of third residual signals.

Step 130 may specifically include the following:

Perform noise suppression on the first-channel second residual signal, to obtain a first-channel third residual signal, and record the noise suppression transfer function used in doing so.

The processing is expressed by the following formula:

$E_{r,n-1}(z) = W_{nr}(z) E_{s,n-1}(z)$;

That is:

$E_{r,0}(z) = W_{nr}(z) E_{s,0}(z)$,

$E_{r,1}(z) = W_{nr}(z) E_{s,1}(z)$,

...

$E_{r,n-1}(z) = W_{nr}(z) E_{s,n-1}(z)$

Here, $E_{r,n-1}(z)$ is the n-th third residual signal, $W_{nr}(z)$ is the noise suppression transfer function, and $E_{s,n-1}(z)$ is the n-th second residual signal.

Noise suppression is likewise nonlinear processing and can likewise destroy the phase information of the signals. The at least two channels of second residual signals therefore use the same noise suppression transfer function, that is, the same filter transfer function, so that the nonlinear distortion of every channel of third residual signals is identical and the interference caused by the damage that nonlinear processing does to the signal phase information is reduced. Noise suppression can be implemented with existing techniques, for example a noise reduction method; common noise reduction methods include spectral subtraction and Wiener filtering. Noise reduction also significantly reduces stationary noise and improves the robustness of the system in stationary noise.
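A minimal sketch of step 130, assuming a power spectral subtraction gain: the gain is computed once from the first-channel second residual signal and a noise power estimate, recorded, and then applied unchanged to every channel. The gain formula, the spectral floor and the function names are illustrative assumptions; the patent only names spectral subtraction and Wiener filtering as possible methods.

```python
import math


def spectral_subtraction_gain(first_channel_bins, noise_power, floor=0.05):
    """Per-bin gain sqrt(max(1 - N/|Y|^2, floor^2)) from the first channel only."""
    gains = []
    for value, noise in zip(first_channel_bins, noise_power):
        power = abs(value) ** 2
        ratio = 1.0 - noise / power if power > 0.0 else 0.0
        gains.append(math.sqrt(max(ratio, floor * floor)))
    return gains


def suppress_noise(second_residuals, noise_power, floor=0.05):
    """Apply the recorded first-channel gain W_nr to all second residual channels."""
    gains = spectral_subtraction_gain(second_residuals[0], noise_power, floor)
    third_residuals = [
        [g * value for g, value in zip(gains, channel)]
        for channel in second_residuals
    ]
    return third_residuals, gains


# Two channels, two frequency bins; the second bin is dominated by noise.
second_residuals = [[3 + 4j, 0.1 + 0j], [4 - 3j, 0 + 0.1j]]
noise_power = [9.0, 1.0]  # assumed stationary-noise power estimate per bin
third_residuals, w_nr = suppress_noise(second_residuals, noise_power)
```

The spectral floor keeps a bin from being zeroed entirely, a common guard against musical-noise artifacts; its value here is arbitrary.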

Step 140, according at least two-way third residual signals, state is presently in ambient sound and is detected.

Specifically, according at least two-way third residual signals, determining that ambient sound is presently in using condition detection method State.It may include voice status or mute state that ambient sound, which is presently in state,.If mute state, then in Fig. 2 If voice=0；If voice status, then the if voice=1 in Fig. 2.Specific condition detection method is routine techniques, example As using mute detection method based on signal-to-noise ratio, or the mute detection method based on machine learning etc..
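As one possible reading of the signal-to-noise-ratio variant, the sketch below averages the frame power over all third residual channels and compares it with a noise floor. The threshold, the noise floor argument and the function name are assumptions; a production detector would normally also track the noise floor over time and apply hangover logic.

```python
import math


def detect_voice(third_residuals, noise_floor_power, snr_threshold_db=6.0):
    """Return 1 for the voice state and 0 for the silent state (the flag in Fig. 2)."""
    total = sum(sample * sample for channel in third_residuals for sample in channel)
    count = sum(len(channel) for channel in third_residuals)
    if count == 0 or total == 0.0:
        return 0
    frame_power = total / count
    snr_db = 10.0 * math.log10(frame_power / noise_floor_power)
    return 1 if snr_db >= snr_threshold_db else 0


loud_frame = [[0.5, -0.5, 0.5, -0.5], [0.4, -0.4, 0.4, -0.4]]
quiet_frame = [[0.1, -0.1], [0.1, -0.1]]
voice_loud = detect_voice(loud_frame, noise_floor_power=0.01)
voice_quiet = detect_voice(quiet_frame, noise_floor_power=0.01)
```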

Step 150: when the current state of the ambient sound is determined to be a voice state, localize the voice signal in the environment according to the at least two channels of third residual signals.

In this embodiment, when the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals specifically comprises smoothing each channel of the third residual signals separately and then performing sound source localization on the smoothed signals.

Specifically, the at least two channels of third residual signals can be smoothed as follows:

$E_0(z) = \mathrm{smoothfactor} \cdot E_0(z) + (1 - \mathrm{smoothfactor}) \cdot E_{r,0}(z)$,

$E_1(z) = \mathrm{smoothfactor} \cdot E_1(z) + (1 - \mathrm{smoothfactor}) \cdot E_{r,1}(z)$,

...

$E_{n-1}(z) = \mathrm{smoothfactor} \cdot E_{n-1}(z) + (1 - \mathrm{smoothfactor}) \cdot E_{r,n-1}(z)$

Here, smoothfactor is the smoothing factor, generally a real number in the interval (0, 1). Its value can be set in advance by an engineer; in this embodiment, smoothfactor = 0.9 may be used.
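The smoothing can be sketched as a first-order recursive average. This sketch assumes that each channel is smoothed against its own value from the previous frame and that a frame is a list of real or complex bin values; the function name is hypothetical.

```python
def smooth_channels(previous, third_residuals, smooth_factor=0.9):
    """Per channel and per bin: E = smooth_factor * E_prev + (1 - smooth_factor) * Er."""
    return [
        [smooth_factor * old + (1.0 - smooth_factor) * new
         for old, new in zip(prev_channel, residual_channel)]
        for prev_channel, residual_channel in zip(previous, third_residuals)
    ]


previous_frame = [[1.0, 0.0], [2.0 + 2.0j, 0.0]]
current_residuals = [[0.0, 1.0], [0.0, 1.0j]]
smoothed = smooth_channels(previous_frame, current_residuals)
```

With smooth_factor = 0.9 a new frame contributes only 10% of the smoothed value, so a single noisy frame shifts the localization input very little.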

The sound source localization method can be an existing one, such as the minimum variance distortionless response (MVDR) algorithm and the multiple signal classification (MUSIC) algorithm, both based on spatial spectrum estimation; the generalized cross-correlation (GCC) algorithm, based on time difference of arrival; or localization methods based on beamforming. The localization process itself is prior art and is not described further here.
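To make the time-difference-of-arrival idea concrete, the sketch below estimates the delay between two microphone signals by maximizing a plain time-domain cross-correlation and converts it to an angle. It is a simplification of GCC, with no frequency weighting such as PHAT, and it is not the patent's own procedure. The far-field two-microphone geometry, the sign convention (a positive lag means the sound reaches microphone A first) and all names are assumptions.

```python
import math
import random


def estimate_delay(signal_a, signal_b, max_lag):
    """Lag in samples by which signal_b trails signal_a (cross-correlation peak)."""
    best_lag, best_score = 0, -math.inf
    n = len(signal_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for k in range(n):
            j = k + lag
            if 0 <= j < n:
                score += signal_a[k] * signal_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Far-field arrival angle in degrees from broadside, for a two-microphone pair."""
    sine = lag * speed_of_sound / (sample_rate * mic_distance)
    sine = max(-1.0, min(1.0, sine))  # clamp against estimation error
    return math.degrees(math.asin(sine))


rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(256)]
mic_a = source
mic_b = [0.0] * 3 + source[:-3]  # the same signal arriving 3 samples later
lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = delay_to_angle(lag, sample_rate=16000, mic_distance=0.1)
```

This only works if the preceding stages left the relative delay between channels untouched, which is the reason the method applies identical suppression transfer functions to all channels.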

Optionally, when the current state of the ambient sound is determined to be a silent state, no processing is applied to the at least two channels of third residual signals; they are output directly, and the previous sound source localization result is used as the current result.
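The gating between the voice and silent states amounts to holding the last result. In the sketch below, `locate` stands for any localization routine supplied by the caller; the class and its names are hypothetical and only illustrate the control flow.

```python
class GatedLocalizer:
    """Run localization only in the voice state; otherwise repeat the last result."""

    def __init__(self, locate):
        self._locate = locate     # hypothetical callable: channels -> direction
        self.last_result = None   # nothing to report before the first voice frame

    def update(self, voice_flag, channels):
        if voice_flag == 1:
            self.last_result = self._locate(channels)
        return self.last_result


# Stand-in localization routine used only to exercise the gate.
localizer = GatedLocalizer(lambda channels: 10.0 * len(channels))
before_speech = localizer.update(0, [[0.0]])
during_speech = localizer.update(1, [[0.2], [0.3]])
after_speech = localizer.update(0, [[0.0]])
```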

In this embodiment, only signals in the voice state are smoothed and then used as input to sound source localization, which further reduces the interference of non-target-speech signals with localization. Temporal smoothing also reduces, to some extent, the interference of short bursts of noise, making the algorithm's estimate of the target voice signal more stable.

In the voice signal localization method for complex environments provided by this embodiment, echo cancellation is used to linearly cancel, according to the loudspeaker reference signal, the echo in each of the at least two channels of desired signals collected by microphones, which achieves a degree of echo cancellation. Echo suppression is then performed on each of the at least two channels of first residual signals, to obtain at least two channels of second residual signals; this suppresses the nonlinear components in the first residual signals. Moreover, the same echo suppression transfer function is used for all of the first residual signals, which ensures that the speech phase information in every channel undergoes the same nonlinear processing, that is, the nonlinear impairment is consistent across channels. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system when suppressing noise. Noise suppression is then performed on the at least two channels of second residual signals, which significantly reduces stationary noise and improves the robustness of the system in stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the second residual signals, avoiding the severe interference caused by nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals, and only when the ambient sound is in a voice state is each channel of the third residual signals smoothed and the sound source then localized, which further reduces the interference of non-target-speech signals with sound source localization.

Corresponding with above-described embodiment 1, it is fixed that the embodiment of the invention also provides a kind of voice signals based on complex environment Position device, specifically as shown in figure 3, the device includes: echo cancellation module 301, echo suppression module 302, noise suppression module 303, state detection module 304 and voice signal locating module 305.

Echo cancellation module 301, for the loudspeaker reference signal according to pre-acquiring, at least two-way microphone pick Echo cancellation process is carried out respectively per desired signal all the way in desired signal, obtains at least the first residual signals of two-way；

Echo suppression module 302, for inhibiting transmission function according to default echo, at least the first residual signals of two-way Echo inhibition processing is carried out respectively per the first residual signals all the way, obtains at least the second residual signals of two-way.

Optionally, presetting echo inhibits transmission function can be with are as follows: according to loudspeaker reference signal and first via desired signal, Carrying out echo to the first residual signals of the first via inhibits the echo used when processing to inhibit transmission function；

Echo suppression module 302 is specifically used for, according to loudspeaker reference signal and first via desired signal, to the first via One residual signals carry out echo inhibition processing, obtain the second residual signals of the first via, and record to the first residual signals of the first via Carrying out echo inhibits echo when processing to inhibit transmission function, wherein the first residual signals of the first via are that at least two-way first is residual The first residual signals of any road in difference signal, first via desired signal are expectation corresponding with the first residual signals of first via letter Number；

According to echo inhibit transmission function, respectively at least the first residual signals of two-way remove the first residual signals of the first via Except echo inhibition processing is carried out per the first residual signals all the way, obtain at least the second residual signals all the way；

The noise suppression module 303 is configured to perform noise suppression on each of the at least two channels of second residual signals according to a preset noise suppression transfer function, obtaining at least two channels of third residual signals.

Optionally, the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on the first-channel second residual signal.

The noise suppression module 303 is specifically configured to: perform noise suppression on the first-channel second residual signal, obtaining the first-channel third residual signal, and record the noise suppression transfer function used in doing so; and

according to the recorded noise suppression transfer function, perform noise suppression on each of the second residual signals other than the first-channel second residual signal, obtaining at least one further channel of third residual signal.
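
A minimal sketch of the shared noise suppression gain, under stated assumptions: the signals are per-bin complex spectra, a per-bin noise power estimate is already available (for example from earlier non-voice frames), and the spectral-subtraction-style rule, the function names, and the `floor` parameter are hypothetical and are not taken from this document.

```python
def noise_gain(first_channel_spec, noise_power, floor=0.05):
    """Hypothetical per-bin gain derived from the first-channel second residual
    signal only (assumed spectral-subtraction-style rule with a gain floor)."""
    gains = []
    for s, n in zip(first_channel_spec, noise_power):
        power = abs(s) ** 2 + 1e-12           # small constant avoids division by zero
        gains.append(max(floor, 1.0 - n / power))
    return gains

def suppress_noise(second_residuals, noise_power):
    """Record the gain computed on the first channel and reuse it unchanged
    for every other channel."""
    gains = noise_gain(second_residuals[0], noise_power)
    third = [[g * s for g, s in zip(gains, ch)] for ch in second_residuals]
    return third, gains

# Two channels, two frequency bins (made-up values for illustration)
second_residuals = [[2 + 0j, 1j], [0 + 2j, 1 + 0j]]
noise_power = [1.0, 2.0]
third_residuals, gains = suppress_noise(second_residuals, noise_power)
```

In the second bin the estimated noise exceeds the signal power, so the gain is clamped to the floor rather than going negative; the floor value here is an arbitrary choice.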

State detection module 304, for according at least two-way third residual signals, to ambient sound be presently in state into Row detection；

The voice signal localization module 305 is configured to, when the current state of the ambient sound is determined to be a voice state, perform sound source localization according to the at least two channels of third residual signals, thereby locating the voice signal in the environment.

Optionally, the voice signal localization module 305 is further configured to, when the current state of the ambient sound is determined to be a mute state, output the at least two channels of third residual signals directly and take the result of the previous localization of the voice signal in the environment as the current sound source localization result.
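
The gating between the voice state and the mute state can be sketched as below. The energy threshold detector, the class name `GatedLocalizer`, and the stand-in localizer `fake_locate` are assumptions made for illustration; this section does not say how the state is detected or how the position is computed.

```python
class GatedLocalizer:
    """Runs the localizer only in the voice state; in the mute state it
    returns the previous localization result unchanged."""

    def __init__(self, locate, energy_threshold=1e-3):
        self.locate = locate                  # callable: channels -> position
        self.energy_threshold = energy_threshold
        self.last_result = None               # no result before the first voice frame

    def is_voice(self, channels):
        # Assumed detector: mean sample energy across all channels
        count = sum(len(ch) for ch in channels)
        energy = sum(x * x for ch in channels for x in ch) / max(count, 1)
        return energy > self.energy_threshold

    def process(self, channels):
        if self.is_voice(channels):
            self.last_result = self.locate(channels)
        return self.last_result


def fake_locate(channels):
    # Hypothetical stand-in for a real localizer; returns a fixed angle
    return 42.0

localizer = GatedLocalizer(fake_locate)
loud = [[0.5, -0.5, 0.5], [0.4, -0.4, 0.4]]
quiet = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
first = localizer.process(loud)     # voice state: localizer runs
second = localizer.process(quiet)   # mute state: previous result is reused
```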

Optionally, the voice signal localization module 305 is specifically configured to smooth each channel of third residual signal separately, obtaining at least two channels of smoothed signals, which are then used to localize the sound source.

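Optionally, the echo cancellation module 301 is specifically configured to: input the loudspeaker reference signal into the n-th adaptive filter to obtain the n-th output signal; and compute the difference between the n-th desired signal and the n-th output signal to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.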
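
The per-channel adaptive filtering and difference operation (as recited in claim 4) can be sketched as follows. The document only states that an adaptive filter is used; the normalized LMS (NLMS) update, the function name, and the `taps`, `mu`, and `eps` parameters below are assumptions chosen for illustration, and the simulated echo paths are made up.

```python
import random

def nlms_echo_cancel(reference, desired, taps=4, mu=0.5, eps=1e-8):
    """Pass the reference through one adaptive filter and subtract its output
    from the desired signal. NLMS is an assumed choice of adaptation rule."""
    w = [0.0] * taps                          # filter coefficients
    buf = [0.0] * taps                        # most recent reference samples
    residual = []
    for x, d in zip(reference, desired):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # n-th output signal
        e = d - y                                     # n-th first residual signal
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

# Simulated example: each microphone hears a different short echo of the reference
rng = random.Random(0)
ref = [rng.uniform(-1, 1) for _ in range(600)]
mics = [
    [0.6 * ref[i] + (0.3 * ref[i - 1] if i >= 1 else 0.0) for i in range(600)],
    [0.4 * ref[i] + (0.2 * ref[i - 2] if i >= 2 else 0.0) for i in range(600)],
]
# One adaptive filter per desired signal
first_residuals = [nlms_echo_cancel(ref, d) for d in mics]
```

Each microphone channel gets its own filter here because the echo path from the loudspeaker differs per microphone; only the later suppression stages share a transfer function across channels.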

The functions performed by each component of the voice signal localization apparatus for complex environments provided by this embodiment have been described in detail in Embodiment 1 above and are therefore not repeated here.

The voice signal localization apparatus for complex environments provided by this embodiment first applies echo cancellation: according to the loudspeaker reference signal, the echo in each of the at least two channels of desired signals captured by the microphones is linearly cancelled, which achieves a degree of echo cancellation. Echo suppression is then performed on each of the at least two channels of first residual signals, obtaining at least two channels of second residual signals; this suppresses the nonlinear components in the first residual signals. Moreover, because the same echo suppression transfer function is used for all of the first residual signals, the speech phase information in every channel undergoes identical nonlinear processing, that is, the nonlinear impairment is consistent across channels. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system during suppression. Noise suppression is then performed on the at least two channels of second residual signals, which substantially reduces stationary noise and improves the robustness of the system under stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the second residual signals, again avoiding the severe interference caused by inconsistent nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals, and only when that state is a voice state is each channel of third residual signal smoothed and the sound source then localized, which further reduces the interference that signals outside the target voice state cause to sound source localization.

Corresponding to Embodiment 1 above, an embodiment of the present invention further provides a voice signal localization system for complex environments. As shown in Fig. 4, the system includes a processor 401 and a memory 402.

The memory 402 is configured to store one or more program instructions.

The processor 401 is configured to run the one or more program instructions so as to execute the voice signal localization method for complex environments described in the above embodiment.

The voice signal localization system for complex environments provided by this embodiment first applies echo cancellation: according to the loudspeaker reference signal, the echo in each of the at least two channels of desired signals captured by the microphones is linearly cancelled, which achieves a degree of echo cancellation. Echo suppression is then performed on each of the at least two channels of first residual signals, obtaining at least two channels of second residual signals; this suppresses the nonlinear components in the first residual signals. Moreover, because the same echo suppression transfer function is used for all of the first residual signals, the speech phase information in every channel undergoes identical nonlinear processing, that is, the nonlinear impairment is consistent across channels. This avoids the severe interference caused by inconsistent nonlinear distortion and improves the robustness of the system during suppression. Noise suppression is then performed on the at least two channels of second residual signals, which substantially reduces stationary noise and improves the robustness of the system under stationary noise. As with echo suppression, the same noise suppression transfer function is used for all of the second residual signals, again avoiding the severe interference caused by inconsistent nonlinear distortion. Finally, the current state of the ambient sound is detected according to the at least two channels of third residual signals, and only when that state is a voice state is each channel of third residual signal smoothed and the sound source then localized, which further reduces the interference that signals outside the target voice state cause to sound source localization.

Corresponding to the above embodiments, an embodiment of the present invention further provides a computer storage medium containing one or more program instructions. The one or more program instructions are used by a voice signal localization system for complex environments to execute the voice signal localization method for complex environments described in Embodiment 1.

Although the present invention has been described in detail above by way of general description and specific embodiments, modifications or improvements may be made on the basis of the present invention, as will be apparent to those skilled in the art. Accordingly, such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims

1. A voice signal localization method for complex environments, characterized in that the method comprises:

performing echo cancellation, according to a pre-acquired loudspeaker reference signal, on each of at least two channels of desired signals captured by microphones, to obtain at least two channels of first residual signals;

performing echo suppression on each of the at least two channels of first residual signals according to a preset echo suppression transfer function, to obtain at least two channels of second residual signals;

performing noise suppression on each of the at least two channels of second residual signals according to a preset noise suppression transfer function, to obtain at least two channels of third residual signals;

detecting the current state of ambient sound according to the at least two channels of third residual signals; and

when the current state of the ambient sound is determined to be a voice state, localizing the voice signal in the environment according to the at least two channels of third residual signals.

2. The method according to claim 1, characterized in that, when the current state of the ambient sound is determined to be a mute state, the result of the previous localization of the voice signal in the environment is taken as the current localization result.

3. The method according to claim 1, characterized in that localizing the voice signal in the environment according to the at least two channels of third residual signals, when the current state of the ambient sound is determined to be a voice state, specifically comprises:

smoothing each channel of third residual signal separately, to obtain at least two channels of smoothed signals;

4. The method according to any one of claims 1 to 3, characterized in that performing echo cancellation, according to the pre-acquired loudspeaker reference signal, on each of the at least two channels of desired signals captured by microphones, to obtain the at least two channels of first residual signals, specifically comprises:

inputting the loudspeaker reference signal into an n-th adaptive filter to obtain an n-th output signal; and

computing the difference between the n-th desired signal and the n-th output signal to obtain the n-th first residual signal, where n is a positive integer greater than or equal to 1 and less than or equal to the number of desired signals.

5. The method according to any one of claims 1 to 3, characterized in that the preset echo suppression transfer function is the echo suppression transfer function used when echo suppression is performed on a first-channel first residual signal according to the loudspeaker reference signal and a first-channel desired signal; and performing echo suppression on each of the at least two channels of first residual signals according to the preset echo suppression transfer function, to obtain the at least two channels of second residual signals, specifically comprises:

performing echo suppression on the first-channel first residual signal according to the loudspeaker reference signal and the first-channel desired signal, to obtain a first-channel second residual signal, and recording the echo suppression transfer function used when performing echo suppression on the first-channel first residual signal, where the first-channel first residual signal is any one of the at least two channels of first residual signals, and the first-channel desired signal is the desired signal corresponding to the first-channel first residual signal; and

performing echo suppression, according to the echo suppression transfer function, on each of the at least two channels of first residual signals other than the first-channel first residual signal, to obtain at least one further channel of second residual signal.

6. The method according to any one of claims 1 to 3, characterized in that the preset noise suppression transfer function is the noise suppression transfer function used when noise suppression is performed on a first-channel second residual signal; and performing noise suppression on each of the at least two channels of second residual signals according to the preset noise suppression transfer function, to obtain the at least two channels of third residual signals, specifically comprises:

performing noise suppression on the first-channel second residual signal, to obtain a first-channel third residual signal, and recording the noise suppression transfer function used when performing noise suppression on the first-channel second residual signal; and

performing noise suppression, according to the noise suppression transfer function, on each of the second residual signals other than the first-channel second residual signal, to obtain at least one further channel of third residual signal.

7. A voice signal localization apparatus for complex environments, characterized in that the apparatus comprises:

an echo cancellation module, configured to perform echo cancellation, according to a pre-acquired loudspeaker reference signal, on each of at least two channels of desired signals captured by microphones, to obtain at least two channels of first residual signals;

an echo suppression module, configured to perform echo suppression on each of the at least two channels of first residual signals according to a preset echo suppression transfer function, to obtain at least two channels of second residual signals;

a noise suppression module, configured to perform noise suppression on each of the at least two channels of second residual signals according to a preset noise suppression transfer function, to obtain at least two channels of third residual signals; and

a voice signal localization module, configured to, when the current state of the ambient sound is determined to be a voice state, localize the voice signal in the environment according to the at least two channels of third residual signals.

8. The apparatus according to claim 7, characterized in that the voice signal localization module is further configured to, when the current state of the ambient sound is determined to be a mute state, take the result of the previous localization of the voice signal in the environment as the current sound source localization result.

9. A voice signal localization system for complex environments, characterized in that the system comprises a processor and a memory;

the memory is configured to store one or more program instructions; and

the processor is configured to run the one or more program instructions so as to execute the method according to any one of claims 1 to 6.

10. A computer storage medium, characterized in that the computer storage medium contains one or more program instructions, and the one or more program instructions are used by a voice signal localization system for complex environments to execute the method steps according to any one of claims 1 to 6.