CN109102822A

CN109102822A - Filtering method and device based on fixed beamforming

Info

Publication number: CN109102822A
Application number: CN201810828327.6A
Authority: CN
Inventors: 孙思宁; 黄美玉
Original assignee: Chumen Wenwen Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2018-12-28
Anticipated expiration: 2038-07-25
Also published as: CN109102822B

Abstract

Embodiments of the present invention provide a filtering method and device based on fixed beamforming. The method includes: obtaining a multichannel speech signal to be processed, where the multichannel speech signal includes at least a speech signal from a target sound source and an interference signal from an interference sound source; performing fixed beamforming on the multichannel speech signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal. Because the beamformed speech signal is filtered with the post-filtering parameter, the speech signal from the direction of the target sound source remains undistorted while other interference signals are effectively suppressed.

Description

Filtering method and device based on fixed beamforming

Technical field

Embodiments of the present invention relate to the field of signal processing, and in particular to a filtering method and device based on fixed beamforming.

Background

With the rise of smart homes and the Internet of Things, electronic devices such as smart speakers, wearable devices, and smartphones have spread rapidly, and users expect ever more functionality and intelligence from them. To make human-computer interaction more natural and convenient, most electronic devices are equipped with a voice interaction function. However, when the user is far from the device and the device picks up sound at a distance through a sensor array (such as a microphone array), the real environment introduces many kinds of interference, including ambient noise (such as background music), other voices, and reverberation. As a result, the quality of the target user's speech signal collected by the device is poor, and speech recognition accuracy is low.

At present, beamforming is commonly used when capturing user speech. Beamforming is a signal processing technique for sensor arrays (such as microphone arrays) that receives signals directionally and applies appropriate processing to the received speech signals.

While studying beamforming, the inventors found that non-stationary interference signals cannot be effectively suppressed, because real environments often contain time-varying reverberation, beamforming algorithms have side lobes, and the approach is constrained by the geometry of the sensor array and the computing resources of the smart terminal.

Summary of the invention

In view of this, embodiments of the present invention provide a filtering method and device based on fixed beamforming. The main purpose is to ensure that the user speech signal from the direction of the target sound source is undistorted while interference signals from other spatial directions are effectively suppressed.

To achieve the above purpose, embodiments of the present invention mainly provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a filtering method based on fixed beamforming. The method includes: obtaining a multichannel speech signal to be processed, where the multichannel speech signal includes at least a speech signal from a target sound source and an interference signal from an interference sound source; performing fixed beamforming on the multichannel speech signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal.

In a second aspect, an embodiment of the present invention provides a filtering device based on fixed beamforming. The device includes: an obtaining unit, configured to obtain a multichannel speech signal to be processed, where the multichannel speech signal includes at least a speech signal from a target sound source and an interference signal from an interference sound source; a beamforming unit, configured to perform fixed beamforming on the multichannel speech signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; a calculation unit, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and a filtering unit, configured to filter the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium. The storage medium includes a stored program, and when the program runs, it controls the device on which the storage medium resides to execute the steps of the above filtering method based on fixed beamforming.

In a fourth aspect, an embodiment of the present invention provides an electronic device. The device includes: at least one processor; and at least one memory and a bus connected to the processor. The processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to execute the steps of the above filtering method based on fixed beamforming.

In the filtering method and device based on fixed beamforming provided by embodiments of the present invention, after a multichannel speech signal containing both the speech signal of a target sound source and the interference signal of an interference sound source is obtained, fixed beamforming is first performed on it based on at least two preset fixed beamforming coefficients pointing in different directions, yielding a speech estimation signal and an interference estimation signal. Next, a post-filtering parameter is calculated from the speech estimation signal and the interference estimation signal obtained by the fixed beamforming. Finally, the speech estimation signal is filtered with the post-filtering parameter to obtain the processed speech signal. Fixed beamforming first enhances the user speech signal from the direction of the target sound source and suppresses interference from other directions. Post-filtering the enhanced speech signal then effectively suppresses the large amount of residual interference that remains in the user speech signal after a single beamforming pass, so that interference from non-target directions is effectively suppressed. When applied to far-field sound pickup, the method therefore ensures that the user speech signal from the direction of the target sound source is undistorted while interference signals from other spatial directions are effectively suppressed.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:

Fig. 1 is a flow diagram of the filtering method based on fixed beamforming in embodiment one of the present invention;

Fig. 2A and Fig. 2B are schematic diagrams of the microphone array in embodiment one of the present invention;

Fig. 3 is a schematic diagram of the multichannel speech signal model in embodiment two of the present invention;

Fig. 4 is a schematic diagram of the multiple fixed beams in embodiment two of the present invention;

Fig. 5 is a structural diagram of the filtering device based on fixed beamforming in embodiment three of the present invention;

Fig. 6 is a structural diagram of the electronic device in embodiment four of the present invention.

Detailed description

Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention can be understood thoroughly and its scope conveyed fully to those skilled in the art.

Embodiment one

An embodiment of the present invention provides a filtering method based on fixed beamforming. In practice, the method can be applied wherever a speech signal needs to be filtered to obtain a clean speech signal. In speech recognition, for example, to improve recognition accuracy, a speech signal that was captured by a sensor array and contains interference must be preprocessed before recognition: the target user's speech is enhanced and interference such as ambient noise and other voices is removed, yielding a clean user speech signal.

Specifically, the method is executed by a filtering device based on fixed beamforming, which may be built into an electronic device or attached to it externally.

In practice, the electronic device can take many forms. For example, the electronic device described in the embodiments of the present invention may be a smart home device such as a smart speaker, smart television, or smart set-top box, or a portable device such as a smartphone, tablet computer, smartwatch, or smart band. It may also be another type of electronic device that captures and processes user speech, such as a laptop. The embodiments of the present invention do not limit the specific form of the electronic device.

Fig. 1 is a flow diagram of the filtering method based on fixed beamforming in embodiment one of the present invention. As shown in Fig. 1, the method may include:

S101: obtain a multichannel speech signal to be processed;

The multichannel speech signal includes at least a speech signal from a target sound source and an interference signal from an interference sound source.

Here, the target sound source generally refers to the user who is currently using the electronic device and producing sound, such as a person who is speaking. The interference sound source may be another person producing sound in the environment of the electronic device, such as someone who is singing, or a sound-producing electronic device used by another user in that environment, such as a speaker or mobile phone that is playing music.

There is one target sound source and one or more interference sound sources, for example two or three. In practice, besides the speech signal of the target sound source and the interference signal from the interference sound source, the obtained multichannel speech signal may also contain other types of interference, such as environmental noise and reverberation.

In practice, the target sound source and the interference sound source can be modeled as plane waves arriving from any angle between 0° and 180°.

Specifically, to obtain the multichannel speech signal to be processed, the signal can be captured by a sensor array arranged in the electronic device. In practice, the sensor array consists of a certain number of acoustic sensors (for example, microphones) and samples the spatial characteristics of the sound field.

As an example, assume the sensor array is implemented as a microphone array. The array may then consist of 4 microphones evenly spaced along a line (as shown in Fig. 2A), of 6 microphones evenly spaced along a line, or of 8 microphones evenly spaced around a circle (as shown in Fig. 2B). It may also consist of some other number and arrangement of microphones, for example 12 microphones evenly spaced in a circular, rectangular, or crescent layout. The embodiments of the present invention do not limit the number or arrangement of microphones in the microphone array.

In practice, given the characteristics of sound waves, an unsuitable distance between adjacent microphones in the array causes errors in focusing on and locating the sound source. The distance between adjacent microphones should therefore be neither too large nor too small. As an example, the uniform spacing between adjacent microphones can be set to less than 80 mm and greater than 30 mm.
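One common way to reason about the upper bound on spacing is the half-wavelength rule for uniform arrays, under which a spacing d samples the sound field without spatial aliasing up to roughly c / (2d). The source does not state this rule; the sketch below merely applies it to the 30 mm to 80 mm range, assuming a speed of sound of 343 m/s. The function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: speed of sound in air at about 20 degrees C


def spatial_aliasing_limit_hz(spacing_m: float) -> float:
    """Highest frequency free of spatial aliasing under the rule d <= wavelength / 2."""
    return SPEED_OF_SOUND_M_S / (2.0 * spacing_m)


# Evaluate the rule at a few spacings inside the range given in the text.
for spacing_mm in (30, 50, 80):
    limit_hz = spatial_aliasing_limit_hz(spacing_mm / 1000.0)
    print(f"{spacing_mm} mm spacing -> about {limit_hz:.0f} Hz")
```

Larger spacings lower this limit, while very small spacings reduce the phase differences between microphones at low frequencies, which is one plausible reading of why the text bounds the spacing on both sides.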

S102: perform fixed beamforming on the multichannel speech signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal;

In practice, the at least two preset fixed beamforming coefficients pointing in different directions perform fixed beamforming in a way that enhances the speech signal from the direction of the target sound source and suppresses all other interference signals. The speech signal from the target direction thus remains undistorted while signals from other directions are suppressed.

Specifically, after the multichannel speech signal to be processed is obtained, fixed beamforming can be performed on it with a set of fixed beamforming coefficients designed in advance. This enhances the energy of the speech signal from the target sound source direction and suppresses the energy of interference signals from other directions (such as the direction of an interference sound source). The result is the enhanced speech signal, that is, the speech estimation signal, together with the residual interference estimation signal left after suppression.

In a specific implementation, the number of preset fixed beamforming coefficients pointing in different directions may be two, three, four, and so on. Each fixed beamforming coefficient corresponds to a fixed beam with a particular direction, for example a fixed beam pointing to 0°, 30°, 53°, 60°, 80°, 90°, 120°, 150°, or 180°. Fixed beamforming is not limited to these angles; beams may point to other angles as well, and the embodiments of the present invention do not limit this.

In practice, the fixed beamforming coefficient for each direction is a matrix, and the number of beamforming parameter values it contains equals the number of microphones in the microphone array.

S103: calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal;

In practice, because of reverberation and the side lobes of the beamforming algorithm, beamforming the multichannel speech signal cannot completely suppress interference from non-target directions. After a single beamforming pass, the enhanced speech signal from the target direction still contains a large residue of noise or interfering sound from non-target directions. The speech estimation signal obtained in S102 therefore still contains residual interference, and a post-filtering parameter must be calculated from the speech estimation signal and the interference estimation signal so that the speech estimation signal can be post-filtered to obtain a purer speech signal.

In practice, the post-filtering parameter further suppresses the interference remaining in the enhanced target-direction speech signal, such as ambient noise, reverberation, and signals from the direction of an interference sound source, so that the final signal is undistorted and purer.

S104: filter the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal.

Specifically, once the post-filtering parameter is obtained, the speech estimation signal can be filtered further with it, suppressing the interference that still remains in the speech estimation signal after beam enhancement. This yields a purer final signal, that is, the processed speech signal.

As described above, in the filtering method based on fixed beamforming provided by this embodiment, after a multichannel speech signal containing both the speech signal of a target sound source and the interference signal of an interference sound source is obtained, fixed beamforming is first performed on it based on at least two preset fixed beamforming coefficients pointing in different directions, yielding a speech estimation signal and an interference estimation signal. Next, a post-filtering parameter is calculated from the speech estimation signal and the interference estimation signal. Finally, the speech estimation signal is filtered with the post-filtering parameter to obtain the processed speech signal. Fixed beamforming first enhances the user speech signal from the target direction and suppresses interference from other directions. Post-filtering the enhanced speech signal then effectively suppresses the large amount of residual interference that remains after a single beamforming pass, so that interference from non-target directions is effectively suppressed. When the method is applied to far-field sound pickup, it ensures that the user speech signal from the direction of the target sound source is undistorted while interference signals from other spatial directions are effectively suppressed.

Embodiment two

Building on the previous embodiment, this embodiment provides another filtering method based on fixed beamforming and uses a concrete example to describe each step of the previous embodiment in detail. The method is applied to the following scenario, shown in Fig. 3: the sensor array 30 consists of M microphones evenly spaced along a line, and the space contains one target sound source 31 and two interference sound sources 32. Here M is a positive integer, the angle of the target sound source 31 is $\theta^s$, and the two interference sound sources 32 are located at two other angles.

The multichannel speech signal of S101 is introduced first.

Specifically, the signal $x_m(t)$ received by the m-th microphone at time t can be written as formula (1):

$$x_m(t) = h_{sm}(t) * s(t) + \sum_{i=1}^{N} h_{im}(t) * n_i(t) \tag{1}$$

In formula (1), $*$ denotes convolution, $h_{sm}(t)$ is the impulse response from the target sound source to the m-th microphone, $s(t)$ is the speech signal produced by the target sound source, $h_{im}(t)$ is the impulse response from the i-th interference sound source to the m-th microphone, $n_i(t)$ is the interference signal produced by the i-th interference sound source, and i is the index of the interference sound source, with i ranging over [1, N].
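The signal model of formula (1) can be simulated directly: each microphone observes the target speech convolved with its impulse response, plus each interference signal convolved with its own impulse response. The following is a minimal sketch for one microphone using plain Python lists; the function names and the toy impulse responses are illustrative assumptions, not part of the source.

```python
def convolve(h, s):
    """Full linear convolution of two real-valued sequences."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, h_value in enumerate(h):
        for j, s_value in enumerate(s):
            out[i + j] += h_value * s_value
    return out


def microphone_signal(h_s, s, h_i_list, n_list):
    """x_m(t) = h_sm * s + sum_i h_im * n_i for a single microphone m."""
    parts = [convolve(h_s, s)]
    parts += [convolve(h_i, n_i) for h_i, n_i in zip(h_i_list, n_list)]
    x = [0.0] * max(len(p) for p in parts)
    for part in parts:
        for t, value in enumerate(part):
            x[t] += value
    return x


# Toy example: one target source and one interference source.
x_m = microphone_signal(
    h_s=[1.0, 0.5], s=[1.0, 2.0, 3.0],
    h_i_list=[[0.0, 1.0]], n_list=[[1.0, 1.0, 1.0]],
)
print(x_m)
```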

In practice, to simplify subsequent processing of the multichannel speech signal, the speech signal is first processed with a Fourier transform, which converts the time-domain signal, difficult to handle directly, into a frequency-domain signal that is easy to analyze. The principle of the Fourier transform is that any continuously measured time series or signal can be expressed as an infinite superposition of sine waves of different frequencies. Fourier transform algorithms built on this principle take the directly measured raw signal and compute, by accumulation, the frequency, amplitude, and phase of the different sine waves in that signal. The specific implementation of the Fourier transform is not described further here.

Next, transforming the time-domain signal $x_m(t)$ to the frequency domain yields the frequency-domain signal $X_m(t,k)$ given by formula (2).

In formula (2), t denotes the time frame index and k denotes the discrete frequency index (frequency bin index).
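A framed transform of this kind is usually computed as a short-time Fourier transform (STFT): the signal is cut into overlapping frames, each frame is windowed, and a discrete Fourier transform is taken per frame. The sketch below is a deliberately simple, dependency-free version; the Hann window, the frame length of 8, and the hop of 4 are assumptions chosen for illustration and are not specified by the source.

```python
import cmath
import math


def stft(x, frame_len=8, hop=4):
    """Return frames[t][k], a naive STFT of the real sequence x.

    t is the time frame index and k the frequency bin index (0..frame_len/2).
    """
    # Periodic Hann window (an assumed, common choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames
```

A production system would use an FFT rather than this direct DFT; the direct form is used here only to keep the indices t and k easy to follow.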

In the frequency domain, writing the observed signals of the M microphones in vector form gives the multichannel speech signal $\mathbf{X}(t,k)$ of formula (3):

$$\mathbf{X}(t,k) = [X_1(t,k), X_2(t,k), \ldots, X_M(t,k)]^T \tag{3}$$

In formula (3), $\mathbf{X}(t,k)$ denotes the multichannel speech signal, $X_1(t,k)$ the signal captured by the 1st microphone, $X_2(t,k)$ the signal captured by the 2nd microphone, and $X_M(t,k)$ the signal captured by the M-th microphone.

In another embodiment of the present invention, S102 can be implemented by, but is not limited to, the following method. Specifically, S102 may include: performing fixed beamforming on the multichannel speech signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain at least two beam signals; determining, among the at least two beam signals, the beam signal whose beam points toward the target sound source as the speech estimation signal; and determining the other beam signals, apart from the speech estimation signal, as the interference estimation signal.

Specifically, the number of preset fixed beamforming coefficients pointing in different directions may be two, three, four, and so on. Each fixed beamforming coefficient corresponds to a fixed beam with a particular direction. For example, as shown in Fig. 4, when fixed beamforming (also called beam enhancement) is performed on the multichannel speech signal, 7 fixed beams 40 can be used to enhance different directions separately. The beams can be set to point to 0°, 30°, 60°, 90°, 120°, 150°, and 180°, 7 directions in total, with the target sound source 31 located in the 90° direction.

In the embodiments of the present invention, a fixed beamforming technique with a white noise gain constraint can be used. The whole space is divided into P parts, and fixed beamforming coefficients $W^1(t,k), \ldots, W^P(t,k)$ pointing in P different directions are designed. Beam enhancement is applied to the multichannel speech signal with these coefficients so that signals from different directions are enhanced separately. Here $W^1(t,k)$ denotes the fixed beamforming coefficient for the 1st direction and $W^P(t,k)$ the fixed beamforming coefficient for the P-th direction.

As an example, take the target sound source direction $\theta^s$ and a microphone array with M microphones. For each time-frequency point, the fixed beamforming coefficient $W^{\theta^s}(t,k)$ corresponding to the target sound source direction is a matrix (a vector of M elements), as shown in formula (4):

$$W^{\theta^s}(t,k) = [W_1^{\theta^s}(t,k), W_2^{\theta^s}(t,k), \ldots, W_M^{\theta^s}(t,k)]^T \tag{4}$$

In formula (4), $\theta^s$ denotes the direction of the target sound source, $W_1^{\theta^s}(t,k)$ the 1st beamforming parameter for the target sound source direction, $W_2^{\theta^s}(t,k)$ the 2nd beamforming parameter for the target sound source direction, and $W_M^{\theta^s}(t,k)$ the M-th beamforming parameter for the target sound source direction.

Similarly, the other fixed beamforming coefficients $W^1(t,k), \ldots, W^P(t,k)$ have the same form as the fixed beamforming coefficient of formula (4) and are not described again here.
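To make the shape of these coefficients concrete, the sketch below builds one weight vector of M complex values per look direction for a uniform linear array. It uses delay-and-sum weights, which is a swapped-in, simpler design than the white-noise-gain-constrained design named in the text. The array geometry (4 microphones, 50 mm spacing), the 1 kHz frequency, the speed of sound, and all function names are assumptions for illustration.

```python
import cmath
import math


def steering_vector(theta_deg, freq_hz, num_mics, spacing_m, c=343.0):
    """Far-field steering vector of a uniform linear array for angle theta."""
    delay_step = spacing_m * math.cos(math.radians(theta_deg)) / c
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_step)
            for m in range(num_mics)]


def delay_and_sum_weights(theta_deg, freq_hz, num_mics, spacing_m):
    """w = d / M, so that w^H d = 1 (no distortion toward theta)."""
    d = steering_vector(theta_deg, freq_hz, num_mics, spacing_m)
    return [value / num_mics for value in d]


# One weight vector per fixed beam, for the 7 directions of the example.
directions = [0, 30, 60, 90, 120, 150, 180]
weights = {theta: delay_and_sum_weights(theta, 1000.0, 4, 0.05)
           for theta in directions}
```

Because such weights depend only on the array geometry, the look direction, and the frequency, they can be computed once and stored, which is what makes the beams "fixed".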

Next, the P designed fixed beamforming coefficients can be used to enhance the multichannel speech signal, yielding P beam signals.

Assume that, among the P beam signals, the beam signal pointing toward the target sound source is the q-th beam signal. The q-th beam signal can then be calculated from formulas (4) and (5):

$$Y^q(t,k) = \left(W^{\theta^s}(t,k)\right)^H \mathbf{X}(t,k) \tag{5}$$

In formula (5), $Y^q(t,k)$ denotes the q-th beam signal, $W^{\theta^s}(t,k)$ the fixed beamforming coefficient corresponding to the target sound source direction, and $\mathbf{X}(t,k)$ the multichannel speech signal.

Similarly, the other P-1 beam signals among the P beam signals, denoted $Y^p(t,k)$ with p = 1, ..., P and p ≠ q, are calculated in the same way as in formula (5); the only change is that the fixed beamforming coefficient for the target sound source direction is replaced by the corresponding one of the other P-1 fixed beamforming coefficients. This is not described again here.

Specifically, from the P beam signals, the beam signal pointing toward the target sound source, that is, the q-th beam signal, is taken as the estimate of the target sound source's speech signal, which gives the speech estimation signal $Y^q(t,k)$. The other P-1 beam signals $Y^p(t,k)$, with p = 1, ..., P and p ≠ q, are taken as estimates of the interference signal, which gives the interference estimation signal.
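For a single time-frequency point, the beam split described above can be sketched as follows. The sketch assumes that each beam output is the inner product of the conjugated weight vector with the microphone vector (the usual w^H x convention; the source does not confirm the conjugation convention), and the function names are hypothetical.

```python
def apply_beam(weights, x):
    """Beam output w^H x for one time-frequency point."""
    return sum(w.conjugate() * x_m for w, x_m in zip(weights, x))


def split_beams(beam_weights, x, target_index):
    """Return (speech estimate, list of interference estimates).

    beam_weights holds one weight vector per fixed beam; target_index is q,
    the index of the beam that points toward the target sound source.
    """
    beams = [apply_beam(w, x) for w in beam_weights]
    speech_estimate = beams[target_index]
    interference_estimates = [y for p, y in enumerate(beams)
                              if p != target_index]
    return speech_estimate, interference_estimates


# Toy example with M = 2 microphones and P = 2 beams.
speech, interference = split_beams(
    beam_weights=[[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, -0.5 + 0j]],
    x=[1 + 0j, 1 + 0j],
    target_index=0,
)
print(speech, interference)
```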

In another embodiment of the present invention, S103 can be implemented by, but is not limited to, the following method. In a specific implementation, to achieve a better filtering effect and obtain a purer speech signal, the post-filtering parameter can be realized as a frame-level post gain and a time-frequency-level post gain. Specifically, S103 may include the following steps A1 and A2:

Step A1: calculate the time-frequency-level post gain based on the speech estimation signal and the interference estimation signal;

Step A2: calculate the frame-level post gain based on the time-frequency-level post gain.

In a specific implementation, to calculate the time-frequency-level post gain, step A1 may further include the following steps B1 and B2:

Step B1: calculate a weighted sum of the interference estimation signals based on preset weight coefficients, to obtain an interference signal energy estimate;

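Specifically, after the interference estimation signals $Y^p(t,k)$, with p = 1, ..., P and p ≠ q, are obtained (where $Y^p(t,k)$ denotes the p-th beam signal and q is the index of the beam pointing toward the target sound source), they can be used to approximate the interference signal energy by formula (6), giving the interference signal energy estimate $\hat{N}(t,k)$:

$$\hat{N}(t,k) = \sum_{p=1,\, p \neq q}^{P} \alpha_p \left| Y^p(t,k) \right|^2 \tag{6}$$

In formula (6), $\alpha_p$ denotes the weight of the p-th beam, where $0 < \alpha_p < 1$.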

In practice, the weight coefficients are empirical values that those skilled in the art can set according to actual conditions in a specific implementation; the embodiments of the present invention do not limit them.
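Step B1 can be sketched for one time-frequency point as a weighted sum over the non-target beams. The sketch assumes that the energy of each beam is its squared magnitude, which the source implies by calling the result an energy estimate but does not spell out; the equal weights of 0.5 in the example are arbitrary placeholders, not recommended values.

```python
def interference_energy(interference_estimates, alphas):
    """Weighted sum of squared beam magnitudes for one time-frequency point.

    interference_estimates: complex outputs of the P-1 non-target beams.
    alphas: one empirical weight per non-target beam, each in (0, 1).
    """
    if len(interference_estimates) != len(alphas):
        raise ValueError("need exactly one weight per interference beam")
    return sum(alpha * abs(y) ** 2
               for alpha, y in zip(alphas, interference_estimates))


# Two non-target beams with placeholder weights.
energy = interference_energy([3 + 4j, 1j], [0.5, 0.5])
print(energy)
```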

Step B2: calculate the time-frequency-level post gain based on the speech estimation signal and the interference signal energy estimate.

In a specific implementation, the time-frequency-level post gain can be calculated from the speech estimation signal and the interference signal energy estimate in, but not limited to, the following two ways:

First implementation: calculate the sum of the speech estimation signal and the interference signal energy estimate to obtain a first value; then calculate the ratio of the speech estimation signal to the first value to obtain the time-frequency-level post gain.

Specifically, because the speech estimation signal has been obtained from formula (5) and the interference signal energy estimate from formula (6), the time-frequency-level post gain $G_{TF}(t,k)$ can be calculated by formula (7):

$$G_{TF}(t,k) = \frac{\left| Y^q(t,k) \right|^2}{\left| Y^q(t,k) \right|^2 + \hat{N}(t,k)} \tag{7}$$

In formula (7), $G_{TF}(t,k)$ denotes the time-frequency-level post gain, $Y^q(t,k)$ the speech estimation signal, and $\hat{N}(t,k)$ the interference signal energy estimate.
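The first implementation can be sketched for one time-frequency point as follows. The sketch assumes that the speech term enters the ratio as its energy (squared magnitude), so that it is commensurate with the interference energy estimate, and it adds a small constant `eps` to avoid division by zero; `eps` and the function name are assumptions and not part of the source.

```python
def tf_gain_ratio(speech_estimate, interference_energy_estimate, eps=1e-12):
    """Gain = speech energy / (speech energy + interference energy)."""
    speech_energy = abs(speech_estimate) ** 2
    first_value = speech_energy + interference_energy_estimate
    return speech_energy / (first_value + eps)


# Speech energy 4 against interference energy 12 gives a gain near 0.25.
gain = tf_gain_ratio(2 + 0j, 12.0)
print(round(gain, 6))
```

The gain lies between 0 and 1: it approaches 1 where the target beam dominates and approaches 0 where the other beams carry most of the energy.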

Second implementation: calculate the ratio of the speech estimation signal to the interference signal energy estimate to obtain a signal-to-noise ratio estimate; calculate the sum of the signal-to-noise ratio estimate and a preset constant to obtain a second value; then calculate the ratio of the signal-to-noise ratio estimate to the second value to obtain the time-frequency-level post gain.

Specifically, because the speech estimation signal has been calculated from formula (5) and the interference signal energy estimate from formula (6), the signal-to-noise ratio of each time-frequency point can first be estimated by formula (8), giving the signal-to-noise ratio estimate. The estimate is then used in formula (9) to compute a Wiener-form gain at the time-frequency level, which is the time-frequency-level post gain.

$$\hat{\xi}(t,k) = \frac{\left| Y^q(t,k) \right|^2}{\hat{N}(t,k)} \tag{8}$$

In formula (8), $\hat{\xi}(t,k)$ denotes the signal-to-noise ratio estimate, $Y^q(t,k)$ the speech estimation signal, and $\hat{N}(t,k)$ the interference signal energy estimate.

$$G_{TF}(t,k) = \frac{\hat{\xi}(t,k)}{\hat{\xi}(t,k) + C} \tag{9}$$

In formula (9), $G_{TF}(t,k)$ denotes the time-frequency-level post gain, $\hat{\xi}(t,k)$ the signal-to-noise ratio estimate, and C the preset constant.

In general, the preset constant can be set to 1.
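The second implementation can be sketched in the same way, again assuming that the speech term is taken as a squared magnitude and using a small `eps` guard that is not part of the source. With the preset constant equal to 1, the Wiener-form gain SNR / (SNR + 1) is algebraically the same as speech energy divided by the sum of speech energy and interference energy, so the two implementations then agree.

```python
def snr_estimate(speech_estimate, interference_energy_estimate, eps=1e-12):
    """Per-bin SNR estimate: speech energy over interference energy."""
    return abs(speech_estimate) ** 2 / (interference_energy_estimate + eps)


def tf_gain_wiener(snr, c=1.0):
    """Wiener-form gain snr / (snr + c), with c the preset constant."""
    return snr / (snr + c)


# Speech energy 9 against interference energy 3: SNR near 3, gain near 0.75.
snr = snr_estimate(3 + 0j, 3.0)
gain = tf_gain_wiener(snr)
print(round(snr, 6), round(gain, 6))
```

Choosing a constant larger than 1 would make the gain more aggressive at low signal-to-noise ratios, at the cost of attenuating more of the target speech.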

Specifically, after executing S102 and obtaining voice estimation signal and disturbance estimation signal, step can be first carried out A1 calculates time-frequency rank postposition gain, may then pass through sub-frame processing, executes step A2 and is increased according to the time-frequency rank postposition Benefit calculates the other postposition gain of frame level by following formula (10).

In formula (10), G_T(t) denotes the frame-level post-gain, G_TF(t, k) denotes the time-frequency-level post-gain, and K denotes the total number of frequency bins, with k = 0, ..., K-1.
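Formula (10) itself is not reproduced in this text. The following sketch therefore rests on an assumption: that the frame-level post-gain of frame t is the arithmetic mean of the time-frequency-level post-gains of that frame over the index k. The function name frame_gain is hypothetical.

```python
def frame_gain(tf_gains_for_frame):
    """Frame-level post-gain G_T(t) from the K gains G_TF(t, k) of one frame.

    Assumption: formula (10) is an arithmetic mean over k = 0, ..., K-1.
    """
    count = len(tf_gains_for_frame)
    if count == 0:
        raise ValueError("a frame must contain at least one gain value")
    return sum(tf_gains_for_frame) / count
```

Under this assumption, a frame dominated by interference has small gains at most values of k and therefore receives a small frame-level gain.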

It should of course be noted that, besides being the combination of the frame-level post-gain and the time-frequency-level post-gain, the post-filtering parameter may also be either one of the frame-level post-gain and the time-frequency-level post-gain.

In another embodiment of the present invention, S104 may be implemented by, but is not limited to, the following method. In a specific implementation, S104 may include: calculating the product of the frame-level post-gain, the time-frequency-level post-gain, and the voice estimation signal to obtain the processed voice signal.

Specifically, when the post-filtering parameter is implemented by the frame-level post-gain together with the time-frequency-level post-gain, the processed voice signal can be obtained by the following formula (11).

In formula (11), the left-hand side denotes the processed voice signal, G_T(t) denotes the frame-level post-gain, and G_TF(t, k) denotes the time-frequency-level post-gain.
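The product described for formula (11) can be sketched as follows. The data layout (a list of frames, each a list of complex values indexed by k) and the function name apply_post_filter are assumptions made for illustration only.

```python
def apply_post_filter(speech_est, tf_gain, frame_gains):
    """Multiply each value of the voice estimation signal by both gains.

    speech_est[t][k]: voice estimation signal at frame t, index k (complex).
    tf_gain[t][k]: time-frequency-level post-gain G_TF(t, k).
    frame_gains[t]: frame-level post-gain G_T(t).
    """
    processed = []
    for t, frame in enumerate(speech_est):
        # Processed value = G_T(t) * G_TF(t, k) * voice estimate at (t, k).
        processed.append(
            [frame_gains[t] * tf_gain[t][k] * y for k, y in enumerate(frame)]
        )
    return processed
```

Because both gains are real-valued in this sketch, only the magnitude of each value changes and its phase is left untouched.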

At this point, the multi-beam spatial post-filtering of the voice signal is complete.

As can be seen from the above, in the embodiments of the present invention, fixed beamforming is first applied to the voice signal for beam enhancement, which enhances the user voice signal from the direction of the target sound source and suppresses interference signals from other directions. The enhanced voice signal is then post-filtered with the frame-level post-gain and the time-frequency-level post-gain, which effectively suppresses the interference remaining in the enhanced user voice signal, such as ambient noise, reverberation, and interference signals from the direction of the interference sound source. This achieves a better filtering effect and effectively suppresses interference signals from non-target sound source directions. Thus, when the method is applied to remote sound pickup, the user voice signal from the direction of the target sound source remains undistorted, while interference signals from other spatial directions are effectively suppressed.

Embodiment three

Based on the same inventive concept, and as an implementation of the above method, an embodiment of the present invention provides a filtering device based on fixed beamforming. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can correspondingly implement everything in the foregoing method embodiment.

Fig. 5 is a structural schematic diagram of the filtering device based on fixed beamforming in Embodiment three of the present invention. Referring to Fig. 5, the device includes: an obtaining unit 501, configured to obtain a multi-channel voice signal to be processed, where the multi-channel voice signal includes at least a voice signal from a target sound source and an interference signal from an interference sound source; a beamforming unit 502, configured to perform fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a voice estimation signal and a disturbance estimation signal; a computing unit 503, configured to calculate a post-filtering parameter based on the voice estimation signal and the disturbance estimation signal; and a filter unit 504, configured to filter the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal.

In an embodiment of the present invention, the beamforming unit is configured to: perform fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain at least two beam signals; determine, among the at least two beam signals, the beam signal whose beam points in the direction of the target sound source as the voice estimation signal; and determine the other beam signals, apart from the voice estimation signal, as the disturbance estimation signal.
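The beam selection performed by the beamforming unit can be sketched for a single time-frequency point. The sketch assumes that each beam output is the sum over channels of the conjugated coefficient times the channel value, which is a common convention and is not stated in the text; the names fixed_beamform, split_beams, and target_index are hypothetical.

```python
def fixed_beamform(coefficients, channels):
    """Return one output per fixed beam for a single time-frequency point.

    coefficients[b][m]: preset coefficient of beam b for channel m.
    channels[m]: value of channel m of the multi-channel voice signal.
    Assumption: output = sum over m of conj(coefficient) * channel value.
    """
    return [
        sum(w.conjugate() * x for w, x in zip(beam, channels))
        for beam in coefficients
    ]


def split_beams(beams, target_index):
    """Split beam outputs into the voice estimate and disturbance estimates.

    target_index: index of the beam pointing at the target sound source.
    """
    voice_estimate = beams[target_index]
    disturbance_estimates = [
        b for i, b in enumerate(beams) if i != target_index
    ]
    return voice_estimate, disturbance_estimates
```

How the beam pointing at the target sound source is identified is outside the scope of this sketch; target_index is simply taken as given.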

In an embodiment of the present invention, the computing unit is configured to calculate the time-frequency-level post-gain based on the voice estimation signal and the disturbance estimation signal, and to calculate the frame-level post-gain based on the time-frequency-level post-gain.

In an embodiment of the present invention, the filter unit is configured to calculate the product of the frame-level post-gain, the time-frequency-level post-gain, and the voice estimation signal to obtain the processed voice signal.

In an embodiment of the present invention, the computing unit is configured to calculate a weighted sum of the disturbance estimation signals based on preset weight coefficients to obtain the interference signal energy estimate, and to calculate the time-frequency-level post-gain based on the voice estimation signal and the interference signal energy estimate.
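The weighted sum can be sketched for a single time-frequency point. Formula (6) is not reproduced in this text, so the sketch assumes that each preset weight coefficient multiplies the power (squared magnitude) of one disturbance beam signal; the function name interference_energy and its argument names are hypothetical.

```python
def interference_energy(disturbance_beams, weights):
    """Interference signal energy estimate for one time-frequency point.

    disturbance_beams: values of the disturbance estimation signals.
    weights: preset weight coefficients, one per disturbance beam.
    Assumption: the weighted sum is taken over squared magnitudes.
    """
    if len(disturbance_beams) != len(weights):
        raise ValueError("one weight is required per disturbance beam")
    return sum(a * abs(b) ** 2 for a, b in zip(weights, disturbance_beams))
```

The result can then be used as the interference term when the time-frequency-level post-gain is calculated.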

In an embodiment of the present invention, the computing unit is configured to calculate the sum of the voice estimation signal and the interference signal energy estimate to obtain a first value, and to calculate the ratio of the voice estimation signal to the first value to obtain the time-frequency-level post-gain.

In an embodiment of the present invention, the computing unit is configured to: calculate the ratio of the voice estimation signal to the interference signal energy estimate to obtain an SNR estimate; calculate the sum of the SNR estimate and a preset constant to obtain a second value; and calculate the ratio of the SNR estimate to the second value to obtain the time-frequency-level post-gain.

Since the filtering device based on fixed beamforming described in this embodiment is a device capable of executing the filtering method based on fixed beamforming in the embodiments of the present invention, those skilled in the art can, on the basis of that method, understand the specific implementation of the device and its variations. How the device implements the method is therefore not described in detail here. Any device used by those skilled in the art to implement the filtering method based on fixed beamforming in the embodiments of the present invention falls within the scope of protection of this application.

Embodiment four

Based on the same inventive concept, an embodiment of the present invention provides an electronic device. Fig. 6 is a structural schematic diagram of the electronic device in Embodiment four of the present invention. Referring to Fig. 6, the electronic device includes: at least one processor 601; and at least one memory 602 and a bus 603 connected to the processor 601; where the processor 601 and the memory 602 communicate with each other through the bus 603, and the processor 601 is configured to call program instructions in the memory 602 to execute the following steps: obtaining a multi-channel voice signal to be processed, where the multi-channel voice signal includes at least a voice signal from a target sound source and an interference signal from an interference sound source; performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a voice estimation signal and a disturbance estimation signal; calculating a post-filtering parameter based on the voice estimation signal and the disturbance estimation signal; and filtering the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal.

In an embodiment of the present invention, the processor may further execute the following steps when calling the program instructions: performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain at least two beam signals; determining, among the at least two beam signals, the beam signal whose beam points in the direction of the target sound source as the voice estimation signal; and determining the other beam signals, apart from the voice estimation signal, as the disturbance estimation signal.

In an embodiment of the present invention, the processor may further execute the following steps when calling the program instructions: calculating the time-frequency-level post-gain based on the voice estimation signal and the disturbance estimation signal; and calculating the frame-level post-gain based on the time-frequency-level post-gain.

In an embodiment of the present invention, the processor may further execute the following step when calling the program instructions: calculating the product of the frame-level post-gain, the time-frequency-level post-gain, and the voice estimation signal to obtain the processed voice signal.

In an embodiment of the present invention, the processor may further execute the following steps when calling the program instructions: calculating a weighted sum of the disturbance estimation signals based on preset weight coefficients to obtain the interference signal energy estimate; and calculating the time-frequency-level post-gain based on the voice estimation signal and the interference signal energy estimate.

In an embodiment of the present invention, the processor may further execute the following steps when calling the program instructions: calculating the sum of the voice estimation signal and the interference signal energy estimate to obtain a first value; and calculating the ratio of the voice estimation signal to the first value to obtain the time-frequency-level post-gain.

In an embodiment of the present invention, the processor may further execute the following steps when calling the program instructions: calculating the ratio of the voice estimation signal to the interference signal energy estimate to obtain an SNR estimate; calculating the sum of the SNR estimate and a preset constant to obtain a second value; and calculating the ratio of the SNR estimate to the second value to obtain the time-frequency-level post-gain.

An embodiment of the present invention further provides a processor configured to run a program, where the program, when run, executes the steps of the filtering method based on fixed beamforming in the above embodiments.

The processor may be implemented by a central processing unit (Central Processing Unit, CPU), a microprocessor (Micro Processor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field-programmable gate array (Field Programmable Gate Array, FPGA), or the like. The memory may include non-permanent memory in a computer-readable medium, random access memory (Random Access Memory, RAM), and/or non-volatile memory, such as read-only memory (Read Only Memory, ROM) or flash memory (Flash RAM); the memory includes at least one memory chip.

Embodiment five

Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium that includes a stored program, where, when the program runs, the device in which the storage medium is located is controlled to execute the steps of the filtering method based on fixed beamforming in the above embodiments.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage, and the like) that contain computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, and the instruction means implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may include non-permanent memory in a computer-readable medium, RAM, and/or non-volatile memory, such as ROM or Flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. The computer-readable storage medium may be a memory such as ROM, programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), flash memory (Flash Memory), magnetic surface storage, an optical disc, or CD-ROM (Compact Disc Read-Only Memory, CD-ROM); it may also be flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, a magnetic cassette, magnetic tape or magnetic disk storage or another magnetic storage device, or any other non-transmission medium that can be used to store information accessible by a computing device; it may also be any of various electronic devices that include one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.

Those skilled in the art will understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.

The above are merely embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A filtering method based on fixed beamforming, characterized in that the method comprises:

obtaining a multi-channel voice signal to be processed, wherein the multi-channel voice signal includes at least a voice signal from a target sound source and an interference signal from an interference sound source;

performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a voice estimation signal and a disturbance estimation signal;

calculating a post-filtering parameter based on the voice estimation signal and the disturbance estimation signal;

filtering the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal.

2. The method according to claim 1, characterized in that the performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a voice estimation signal and a disturbance estimation signal, comprises:

performing fixed beamforming on the multi-channel voice signal based on the at least two preset fixed beamforming coefficients pointing in different directions, to obtain at least two beam signals;

determining, among the at least two beam signals, the beam signal whose beam points in the direction of the target sound source as the voice estimation signal;

determining the other beam signals among the at least two beam signals, apart from the voice estimation signal, as the disturbance estimation signal.

3. The method according to claim 1 or 2, characterized in that the calculating a post-filtering parameter based on the voice estimation signal and the disturbance estimation signal comprises:

calculating a time-frequency-level post-gain based on the voice estimation signal and the disturbance estimation signal;

calculating a frame-level post-gain based on the time-frequency-level post-gain.

4. The method according to claim 3, characterized in that the filtering the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal comprises:

calculating the product of the frame-level post-gain, the time-frequency-level post-gain, and the voice estimation signal to obtain the processed voice signal.

5. The method according to claim 3, characterized in that the calculating a time-frequency-level post-gain based on the voice estimation signal and the disturbance estimation signal comprises:

calculating a weighted sum of the disturbance estimation signals based on preset weight coefficients to obtain an interference signal energy estimate;

calculating the time-frequency-level post-gain based on the voice estimation signal and the interference signal energy estimate.

6. The method according to claim 5, characterized in that the calculating the time-frequency-level post-gain based on the voice estimation signal and the interference signal energy estimate comprises:

calculating the sum of the voice estimation signal and the interference signal energy estimate to obtain a first value;

calculating the ratio of the voice estimation signal to the first value to obtain the time-frequency-level post-gain.

7. The method according to claim 5, characterized in that the calculating the time-frequency-level post-gain based on the voice estimation signal and the interference signal energy estimate comprises:

calculating the ratio of the voice estimation signal to the interference signal energy estimate to obtain an SNR estimate;

calculating the sum of the SNR estimate and a preset constant to obtain a second value;

calculating the ratio of the SNR estimate to the second value to obtain the time-frequency-level post-gain.

8. A filtering device based on fixed beamforming, characterized in that the device comprises:

an obtaining unit, configured to obtain a multi-channel voice signal to be processed, wherein the multi-channel voice signal includes at least a voice signal from a target sound source and an interference signal from an interference sound source;

a beamforming unit, configured to perform fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a voice estimation signal and a disturbance estimation signal;

a computing unit, configured to calculate a post-filtering parameter based on the voice estimation signal and the disturbance estimation signal;

a filter unit, configured to filter the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal.

9. A computer-readable storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to execute the steps of the filtering method based on fixed beamforming according to any one of claims 1 to 7.

10. An electronic device, characterized in that the device comprises:

at least one processor;

and at least one memory and a bus connected to the processor;

wherein the processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to execute the steps of the filtering method based on fixed beamforming according to any one of claims 1 to 7.