CN109102822B

CN109102822B - Filtering method and device based on fixed beam forming

Info

Publication number: CN109102822B
Application number: CN201810828327.6A
Authority: CN
Inventors: 孙思宁; 黄美玉
Original assignee: Mobvoi Information Technology Co Ltd
Current assignee: Volkswagen China Investment Co Ltd; Mobvoi Innovation Technology Co Ltd
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2020-07-28
Anticipated expiration: 2038-07-25
Also published as: CN109102822A

Abstract

Embodiments of the invention provide a filtering method and device based on fixed beamforming. The method comprises: obtaining a multi-channel speech signal to be processed, the multi-channel speech signal comprising at least a speech signal from a target sound source and an interference signal from an interfering sound source; performing fixed beamforming on the multi-channel speech signal, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal. The beamformed speech signal is thus filtered with the post-filtering parameters, so that the speech signal from the direction of the target sound source is not distorted while other interference signals are effectively suppressed.

Description

Filtering method and device based on fixed beam forming

Technical Field

The embodiment of the invention relates to the technical field of signal processing, in particular to a filtering method and a filtering device based on fixed beam forming.

Background

With the rise of smart homes and the Internet of Things, electronic devices such as smart speakers, wearable devices and smartphones have spread rapidly, and users expect ever more functionality and intelligence from them. To make human-computer interaction more natural and simple, most of these devices provide an intelligent voice interaction function. However, when the user is far from the device and the device picks up sound at a distance through a sensor array (e.g., a microphone array), the real environment contains various kinds of interference, including background noise (e.g., background music), other human voices and reverberation. As a result, the speech signal of the target user acquired by the device is of poor quality, and speech recognition accuracy is low.

Currently, beamforming, a signal processing technique used with sensor arrays (e.g., microphone arrays) for directional signal reception, is commonly used when capturing user speech to receive sound from a given direction and process the received sound signals accordingly.

In studying beamforming, the inventors found that non-stationary interference signals cannot be effectively suppressed, because real environments often contain time-varying reverberation, beamforming algorithms have side lobes, and the design is constrained by the sensor array geometry and the computing resources of the smart terminal.

Disclosure of Invention

In view of this, embodiments of the present invention provide a filtering method and device based on fixed beamforming, whose main aim is to ensure that the user speech signal from the direction of the target sound source is not distorted while interference signals from other spatial directions are effectively suppressed.

In order to achieve the above purpose, the embodiments of the present invention mainly provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a filtering method based on fixed beamforming, the method including: obtaining a multi-channel speech signal to be processed, where the multi-channel speech signal includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; performing fixed beamforming on the multi-channel speech signal, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal.

In a second aspect, an embodiment of the present invention provides a filtering device based on fixed beamforming, the device including: an obtaining unit, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; a beamforming unit, configured to perform fixed beamforming on the multi-channel speech signal, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; a calculation unit, configured to calculate post-filtering parameters based on the speech estimation signal and the interference estimation signal; and a filtering unit, configured to filter the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus where the storage medium is located is controlled to execute the steps of the filtering method based on fixed beam forming.

In a fourth aspect, an embodiment of the present invention provides an electronic device, the electronic device including: at least one processor; and at least one memory and a bus connected to the processor. The processor and the memory communicate with each other through the bus, and the processor is configured to invoke program instructions in the memory to perform the steps of the filtering method based on fixed beamforming described above.

In the filtering method and device based on fixed beamforming according to the embodiments of the present invention, a multi-channel speech signal containing both a speech signal from a target sound source and an interference signal from an interfering sound source is first obtained, and fixed beamforming is performed on it, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal. Post-filtering parameters are then calculated from the speech estimation signal and the interference estimation signal. Finally, the speech estimation signal is filtered with the post-filtering parameters to obtain a processed speech signal. Fixed beamforming thus enhances the user speech signal from the direction of the target sound source and suppresses interference signals from other directions, and post-filtering then effectively suppresses the large amount of interference that remains in the enhanced speech signal after a single beamforming pass. Interference signals from non-target directions are therefore effectively suppressed, and when the method is applied to far-field sound pickup, the user speech signal from the direction of the target sound source is not distorted.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a schematic flowchart of a filtering method based on fixed beamforming according to a first embodiment of the present invention;

FIGS. 2A to 2B are schematic diagrams of a microphone array according to a first embodiment of the present invention;

FIG. 3 is a diagram illustrating a multi-channel speech signal model according to a second embodiment of the present invention;

FIG. 4 is a diagram illustrating multiple fixed beams according to a second embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a filtering device based on fixed beamforming according to a third embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

Example one

In practical applications, the filtering method based on fixed beamforming can be applied wherever a speech signal needs to be filtered to obtain a clean speech signal. For example, in speech recognition, to improve recognition accuracy, the speech signal collected by a sensor array, which contains interference, must be preprocessed before recognition so as to enhance the speech signal of the target user and remove interference such as environmental noise and other voices, yielding a clean user speech signal.

Specifically, the fixed beam forming based filtering method is executed by a fixed beam forming based filtering apparatus, and the fixed beam forming based filtering apparatus can be built in or externally connected to an electronic device.

In practical applications, the electronic device may be implemented in various forms. For example, the electronic device described in the embodiment of the present invention may include smart home devices such as a smart speaker, a smart television, a smart set-top box, etc., and personal devices such as a smart phone, a tablet computer, a smart watch, a smart band, etc. Of course, other types of electronic devices with user voice collection and processing functions, such as notebook computers, etc., are also possible. Here, a specific implementation form of the electronic device in the embodiment of the present invention is not particularly limited.

FIG. 1 is a schematic flowchart of a filtering method based on fixed beamforming according to a first embodiment of the present invention. Referring to FIG. 1, the method may include:

S101: obtaining a multi-channel voice signal to be processed;

wherein the multi-channel speech signal comprises at least a speech signal from a target sound source and an interfering signal from an interfering sound source.

Here, the target sound source generally refers to a user who is currently making a sound using the electronic device, such as a person who is speaking; the interfering sound source may refer to another person who is making a sound in the current environment where the electronic device is located, such as another person singing, or may refer to an electronic device that is making a sound and is used by another user in the current environment where the electronic device is located, such as a sound box or a mobile phone that is playing music.

Here, the number of the target sound sources is one, and the number of the interfering sound sources is one or more, such as two, three, and the like. Of course, in practical applications, the obtained multi-channel speech signal may include other types of interference signals, such as ambient noise, reverberation interference, etc., besides the speech signal of the target sound source and the interference signal from the interference sound source.

In practical applications, the sound from the target sound source and from the interfering sound source may arrive as plane waves from any angle between 0° and 180°.

In particular, in order to obtain a multi-channel speech signal to be processed, the multi-channel speech signal may be acquired by a sensor array provided in the electronic device. In practical applications, the sensor array is composed of a number of acoustic sensors (e.g., microphones) for sampling the spatial characteristics of the sound field.

For example, assuming that the sensor array is implemented as a microphone array, it may be an array of 4 microphones arranged linearly at uniform spacing (as shown in FIG. 2A), an array of 6 microphones arranged linearly at uniform spacing (as shown in FIG. 2B), an array of 8 microphones arranged in a circle at uniform spacing (as shown in FIG. 2B), or an array with another number and arrangement of microphones, for example an array of 12 microphones uniformly spaced in a circular, rectangular or crescent layout. The number and arrangement of the microphones in the microphone array are not specifically limited in the embodiments of the present invention.

In practical applications, given the characteristics of sound waves, an unsuitable spacing between adjacent microphones in the array introduces errors in focusing on and localizing the sound source, so the spacing should be neither too large nor too small. Illustratively, the uniform spacing between adjacent microphones may be set to greater than 30 mm and less than 80 mm.

S102: based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on a multi-channel voice signal to obtain a voice estimation signal and an interference estimation signal;

In practical applications, the at least two preset fixed beamforming coefficients pointing in different directions are used to perform fixed beamforming in a way that enhances the speech signal from the direction of the target sound source and suppresses all other interference signals, so that the speech signal from the target direction is not distorted while signals from other directions are suppressed.

Specifically, after obtaining the multi-channel voice signal to be processed, the multi-channel voice signal may be fixed beamformed according to a set of fixed beamforming coefficients designed in advance, so as to enhance the energy of the voice signal in the target sound source direction and suppress the energy of the interfering signal in the directions (such as the interfering sound source direction) other than the target sound source direction. In this way, an enhanced speech signal, i.e. a speech estimation signal, can be obtained, and a suppressed residual interference estimation signal can be obtained.

In a specific implementation process, the number of the preset fixed beamforming coefficients pointing to different directions may be two, three, four, and the like. Each fixed beamforming coefficient refers to a fixed beam with a certain pointing direction, for example, may be a fixed beam pointing in the directions of 0 °, 30 °, 53 °, 60 °, 80 °, 90 °, 120 °, 150 °, 180 °, and so on. However, it should be noted that the fixed beamforming is not limited to the above-mentioned angle, and may be directed to other angles, and the embodiments of the present invention are not limited thereto.

In practical applications, the fixed beamforming coefficient for each direction is a matrix whose number of beamforming parameter values corresponds to the number of microphones in the microphone array.

S103: calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal;

In practical applications, because of reverberation and the side lobes of the beamforming algorithm, interference signals from non-target directions cannot be completely suppressed by beamforming the multi-channel speech signal, so the speech signal enhanced in the target direction by a single beamforming pass still contains a large amount of residual noise or interfering sound from non-target directions. That is, after S102 is executed, residual interference remains in the speech estimation signal. Post-filtering parameters therefore need to be calculated based on the speech estimation signal and the interference estimation signal, so that the speech estimation signal can be post-filtered to obtain a cleaner speech signal.

In practical application, the post-filter parameter can further suppress residual interference signals, such as environmental noise, reverberation interference, interference signals directed by an interference sound source, and the like, in the enhanced voice signal directed by the target sound source, so that the obtained final signal is not distorted and is more pure.

S104: and based on the post-filtering parameters, filtering the voice estimation signal to obtain a processed voice signal.

Specifically, after the post-filtering parameter is obtained, the speech estimation signal can be further filtered according to the post-filtering parameter to further suppress the interference signal still remaining in the speech estimation signal after the beam enhancement, so that a purer final signal, i.e., the processed speech signal, can be obtained.

As can be seen from the above, in the filtering method based on fixed beamforming according to the embodiment of the present invention, a multi-channel speech signal containing both a speech signal from a target sound source and an interference signal from an interfering sound source is first obtained, and fixed beamforming is performed on it, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal. Post-filtering parameters are then calculated from the speech estimation signal and the interference estimation signal. Finally, the speech estimation signal is filtered with the post-filtering parameters to obtain a processed speech signal. Fixed beamforming thus enhances the user speech signal from the direction of the target sound source and suppresses interference signals from other directions, and post-filtering then effectively suppresses the large amount of interference that remains in the enhanced speech signal after a single beamforming pass. Interference signals from non-target directions are therefore effectively suppressed, and when the method is applied to far-field sound pickup, the user speech signal from the direction of the target sound source is not distorted.

Example two

Based on the foregoing embodiment, the embodiments of the present invention further provide another filtering method based on fixed beamforming, in which the steps of the foregoing embodiment are described in detail with specific examples. The method is applied to the following scenario: referring to FIG. 3, assume that the sensor array 30 is composed of M microphones distributed linearly at equal intervals, where M is a positive integer, and that a target sound source 31 and two interfering sound sources 32 are present in the space. The angle of the target sound source 31 is $\theta^{s}$, and each of the two interfering sound sources 32 has its own angle.

The following describes the multi-channel speech signal in S101 described above.

Specifically, the signal $x_m(t)$ received by the m-th microphone at time t can be represented by formula (1).

$$x_m(t) = h_{sm}(t) * s(t) + \sum_{i=1}^{N} h_{im}(t) * n_i(t) \tag{1}$$

In formula (1), $*$ denotes convolution, $h_{sm}(t)$ represents the impulse response from the target sound source to the m-th microphone, $s(t)$ is the speech signal generated by the target sound source, $h_{im}(t)$ represents the impulse response from the i-th interfering sound source to the m-th microphone, $n_i(t)$ is the interference signal generated by the i-th interfering sound source, and i is the index of the interfering sound source, with values in the range $[1, N]$.
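The signal model described for formula (1) can be illustrated with a minimal sketch in plain Python. The function names `convolve` and `microphone_signal` and the toy impulse responses are illustrative assumptions, not part of the patent; the sketch only shows that each microphone observes the target speech and every interference signal, each convolved with its own impulse response, summed together.

```python
from typing import List, Sequence


def convolve(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Full discrete linear convolution of impulse response h with signal x."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            out[i + j] += hv * xv
    return out


def microphone_signal(
    h_s: Sequence[float],
    s: Sequence[float],
    h_i: Sequence[Sequence[float]],
    n: Sequence[Sequence[float]],
) -> List[float]:
    """Signal at one microphone: h_s * s plus the sum over i of h_i[i] * n[i].

    h_s / s   : impulse response and speech of the target source
    h_i / n   : one impulse response and one signal per interfering source
    (hypothetical helper; all inputs are assumed non-empty)
    """
    parts = [convolve(h_s, s)]
    parts += [convolve(h, sig) for h, sig in zip(h_i, n)]
    length = max(len(p) for p in parts)
    # Sum the components sample by sample, treating shorter ones as zero-padded.
    return [sum(p[t] for p in parts if t < len(p)) for t in range(length)]


# Toy example: one target source and one interfering source.
x_m = microphone_signal(
    h_s=[1.0, 0.5], s=[1.0, 2.0, 3.0],
    h_i=[[0.0, 0.2]], n=[[1.0, 1.0, 1.0]],
)
print(x_m)
```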

In practical applications, to facilitate subsequent processing, the multi-channel speech signal is processed with the Fourier transform, which converts a time-domain signal that is difficult to process directly into a frequency-domain signal that is easy to analyze. The underlying principle is that any continuously measured time sequence or signal can be represented as a superposition of infinitely many sine waves of different frequencies; based on this principle, the Fourier transform algorithm computes the frequency, amplitude and phase of the individual sine-wave components by accumulation over the directly measured original signal. The detailed implementation of the Fourier transform is not described here.

Next, after the time-domain signal $x_m(t)$ is transformed into the frequency domain, the frequency-domain signal $X_m(t,k)$ shown in formula (2) is obtained.

In formula (2), t denotes the time frame index and k denotes the discrete frequency index (frequency bin index).
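A short-time Fourier transform is one common way to obtain a frame index t and a frequency bin index k of the kind described above. The sketch below is a deliberately naive, standard-library-only version; the Hann window, the frame length, the hop size and the function name `stft` are all assumptions, since the patent does not state which transform parameters are used.

```python
import cmath
import math
from typing import List, Sequence


def stft(x: Sequence[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Return X[t][k] with frame index t and frequency bin k in 0..frame_len // 2.

    Assumed setup: periodic Hann window and a direct (slow) DFT per frame.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames: List[List[complex]] = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# A cosine with exactly 4 cycles per 32-sample frame concentrates around bin k = 4.
signal = [math.cos(2 * math.pi * 4 * n / 32) for n in range(128)]
X_m = stft(signal, frame_len=32, hop=16)
print(len(X_m), "frames,", len(X_m[0]), "bins per frame")
```

In practice an FFT routine would replace the direct DFT; the loop form is kept here only so that the indices t and k are explicit.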

In the frequency domain, the observation signals of M microphones are expressed in a vector form, so that a multichannel speech signal X (t, k) shown in the following formula (3) can be obtained.

$$X(t,k) = [X_1(t,k), X_2(t,k), \ldots, X_M(t,k)]^{T} \tag{3}$$

In formula (3), $X(t,k)$ represents the multi-channel speech signal, $X_1(t,k)$ represents the signal picked up by the 1st microphone, $X_2(t,k)$ represents the signal picked up by the 2nd microphone, and $X_M(t,k)$ represents the signal picked up by the M-th microphone.

In another embodiment of the present invention, S102 can be implemented by, but is not limited to, the following method. Specifically, S102 may include: performing fixed beamforming on the multi-channel speech signal, based on at least two preset fixed beamforming coefficients pointing in different directions, to obtain at least two beam signals; determining, among the at least two beam signals, the beam signal whose beam points at the target sound source as the speech estimation signal; and determining the beam signals other than the speech estimation signal as interference estimation signals.

Specifically, the number of preset fixed beamforming coefficients pointing in different directions may be two, three, four, and so on, and each fixed beamforming coefficient corresponds to a fixed beam with a certain directivity. For example, referring to FIG. 4, when fixed beamforming (also referred to as beam enhancement) is performed on a multi-channel speech signal, 7 fixed beams 40 may be used to enhance different directions, with the beams pointing in the 7 directions 0°, 30°, 60°, 90°, 120°, 150° and 180°, and with the target sound source 31 located in the 90° direction.

In the embodiment of the invention, a fixed beamforming technique with a white noise gain constraint can be used to divide the whole space into P parts, and fixed beamforming coefficients $W^{1}(t,k), \ldots, W^{P}(t,k)$ pointing in P different directions are designed to perform beam enhancement on the multi-channel speech signal so as to enhance signals from different directions respectively, where $W^{1}(t,k)$ represents the fixed beamforming coefficient for the 1st direction and $W^{P}(t,k)$ represents the fixed beamforming coefficient for the P-th direction.

Illustratively, the target sound source points at angle $\theta^{s}$. For each time-frequency point, the fixed beamforming coefficient corresponding to the target sound source direction, $W^{\theta^{s}}(t,k)$, is a matrix that can be expressed as formula (4).

$$W^{\theta^{s}}(t,k) = \left[w_1^{\theta^{s}}(t,k), w_2^{\theta^{s}}(t,k), \ldots, w_M^{\theta^{s}}(t,k)\right]^{T} \tag{4}$$

In formula (4), $\theta^{s}$ represents the direction of the target sound source, $w_1^{\theta^{s}}(t,k)$ represents the 1st beamforming parameter for the target sound source direction, $w_2^{\theta^{s}}(t,k)$ represents the 2nd beamforming parameter for the target sound source direction, and $w_M^{\theta^{s}}(t,k)$ represents the M-th beamforming parameter for the target sound source direction.

Similarly, the other fixed beamforming coefficients, such as $W^{1}(t,k)$ and $W^{P}(t,k)$, have the same form as the fixed beamforming coefficient shown in formula (4) and are not described further here.
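To make the idea of one coefficient vector per look direction concrete, the sketch below computes delay-and-sum weights for a uniform linear array at a single frequency. This is a swapped-in, simpler design: the patent describes a design with a white noise gain constraint and does not give its coefficients. The speed of sound, the 50 mm spacing, the 2000 Hz frequency and all function names are assumptions chosen for illustration.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(theta_deg: float, freq_hz: float,
                    num_mics: int, spacing_m: float) -> List[complex]:
    """Plane-wave response of a uniform linear array (90 degrees is broadside)."""
    delay = spacing_m * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay)
            for m in range(num_mics)]


def delay_and_sum_weights(theta_deg: float, freq_hz: float,
                          num_mics: int, spacing_m: float) -> List[complex]:
    """One fixed coefficient vector [w_1, ..., w_M] steered at theta_deg.

    Delay-and-sum stand-in, not the patent's white-noise-gain-constrained design.
    """
    a = steering_vector(theta_deg, freq_hz, num_mics, spacing_m)
    return [am / num_mics for am in a]


def beam_response(weights: Sequence[complex], theta_deg: float,
                  freq_hz: float, spacing_m: float) -> float:
    """Magnitude of w^H a(theta): how strongly the beam passes direction theta."""
    a = steering_vector(theta_deg, freq_hz, len(weights), spacing_m)
    return abs(sum(w.conjugate() * am for w, am in zip(weights, a)))


# Seven look directions as in the FIG. 4 example, for a 4-microphone array.
directions = [0, 30, 60, 90, 120, 150, 180]
bank = [delay_and_sum_weights(d, 2000.0, 4, 0.05) for d in directions]
for d, w in zip(directions, bank):
    print(d, round(beam_response(w, 90.0, 2000.0, 0.05), 3))
```

Each printed value is the response of one beam to a source at 90°; the beam steered at 90° passes it with unit gain, while the others attenuate it to varying degrees because of their main-lobe width and side lobes.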

Next, the P designed fixed beamforming coefficients may be used to enhance the multi-channel speech signal, yielding P beam signals.

Assuming that the beam signal pointing at the target sound source is the q-th of the P beam signals, the q-th beam signal can be calculated from formulas (4) and (5).

$$Y^{q}(t,k) = \left(W^{\theta^{s}}(t,k)\right)^{H} X(t,k) \tag{5}$$

In formula (5), $Y^{q}(t,k)$ represents the q-th beam signal, $W^{\theta^{s}}(t,k)$ represents the fixed beamforming coefficient corresponding to the target sound source direction, $(\cdot)^{H}$ denotes the conjugate transpose, and $X(t,k)$ represents the multi-channel speech signal.

Similarly, the other P−1 beam signals $Y^{p}(t,k)$ among the P beam signals, where $p = 1, \ldots, P$ and $p \neq q$, are calculated in the same way as in formula (5), except that the fixed beamforming coefficient corresponding to the target sound source direction is replaced by each of the other P−1 fixed beamforming coefficients. This is not described further here.

Specifically, the q-th beam signal, that is, the beam signal pointing at the target sound source among the P beam signals, is taken as the estimate of the speech signal of the target sound source, which gives the speech estimation signal $\hat{S}(t,k) = Y^{q}(t,k)$. The other P−1 beam signals $Y^{p}(t,k)$, where $p = 1, \ldots, P$ and $p \neq q$, are taken as estimates of the interference signal, which gives the interference estimation signals.
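The beam computation and the split into speech and interference estimates can be sketched for a single time-frequency point as follows. The conjugate-transpose form of the beamformer output is a conventional assumption, and the function names `beam_signals` and `split_beams` as well as the toy weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def apply_beamformer(w: Sequence[complex], x: Sequence[complex]) -> complex:
    """Beam output w^H x for one time-frequency point (assumed convention)."""
    if len(w) != len(x):
        raise ValueError("weights and observations must have the same length")
    return sum(complex(wm).conjugate() * xm for wm, xm in zip(w, x))


def beam_signals(weights: Sequence[Sequence[complex]],
                 x: Sequence[complex]) -> List[complex]:
    """Apply P fixed coefficient vectors to one M-channel observation."""
    return [apply_beamformer(w, x) for w in weights]


def split_beams(beams: Sequence[complex], q: int) -> Tuple[complex, List[complex]]:
    """Beam q is the speech estimate; the other P - 1 beams estimate interference."""
    interference = [b for p, b in enumerate(beams) if p != q]
    return beams[q], interference


# Toy example with M = 2 microphones and P = 2 beams; beam index 1 is the target.
weights = [[0.5, 0.5], [0.5, 0.5j]]
x = [1 + 0j, 1j]
speech, interference = split_beams(beam_signals(weights, x), q=1)
print(speech, interference)
```

Indices here are zero-based, so the q-th beam of the text corresponds to list position q − 1 in a one-based reading.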

In another embodiment of the present invention, S103 can be implemented by, but is not limited to, the following method. In a specific implementation, to achieve a better filtering effect and obtain a cleaner speech signal, the post-filtering parameters may be implemented as a frame-level post gain and a time-frequency-level post gain. Specifically, S103 may include the following steps A1 to A2:

Step A1: calculating the time-frequency-level post gain based on the speech estimation signal and the interference estimation signal;

Step A2: calculating the frame-level post gain based on the time-frequency-level post gain.

In a specific implementation, to calculate the time-frequency-level post gain, step A1 may further include the following steps B1 to B2:

Step B1: calculating a weighted sum of the interference estimation signals based on preset weight coefficients to obtain an interference signal energy estimate.

Specifically, after the interference estimation signals $Y^{p}(t,k)$, where $p = 1, \ldots, P$ and $p \neq q$, are obtained, they can be used to estimate the interference signal energy by formula (6), which gives the interference signal energy estimate $\hat{E}_N(t,k)$.

$$\hat{E}_N(t,k) = \sum_{p=1,\, p \neq q}^{P} \alpha_p \left|Y^{p}(t,k)\right|^{2} \tag{6}$$

In formula (6), $\alpha_p$ represents the weight of the p-th beam, where $0 < \alpha_p < 1$.

In practical applications, the weighting factor is an empirical value, and can be set by a person skilled in the art in a specific implementation process according to practical situations, and the embodiment of the present invention is not limited in particular here.
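Step B1 can be sketched as a weighted sum of the energies of the interference beams at one time-frequency point. Reading the weighted sum as acting on squared magnitudes is an assumption, as are the function name `interference_energy` and the example weights, which are arbitrary values in the stated range rather than values from the patent.

```python
from typing import Sequence


def interference_energy(interference_beams: Sequence[complex],
                        alphas: Sequence[float]) -> float:
    """Weighted sum of |beam|^2 over the P - 1 interference beams.

    Assumption: the weighted sum of formula (6) is taken over beam energies.
    """
    if len(interference_beams) != len(alphas):
        raise ValueError("need exactly one weight per interference beam")
    if any(not (0.0 < a < 1.0) for a in alphas):
        raise ValueError("each weight must satisfy 0 < alpha < 1")
    return sum(a * abs(y) ** 2 for a, y in zip(alphas, interference_beams))


# Two interference beams with illustrative empirical weights.
print(interference_energy([3 + 4j, 1j], [0.5, 0.25]))
```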

Step B2: calculating the time-frequency-level post gain based on the speech estimation signal and the interference signal energy estimate.

In a specific implementation, the time-frequency-level post gain may be calculated from the speech estimation signal and the interference signal energy estimate in ways that include, but are not limited to, the following two implementations.

First implementation: calculating the sum of the energy of the speech estimation signal and the interference signal energy estimate to obtain a first value; and calculating the ratio of the energy of the speech estimation signal to the first value to obtain the time-frequency-level post gain.

Specifically, since the speech estimation signal is obtained according to formula (5) and the interference signal energy estimate is obtained according to formula (6), the time-frequency-level post gain $G_{TF}(t,k)$ can be calculated by formula (7).

$$G_{TF}(t,k) = \frac{\left|\hat{S}(t,k)\right|^{2}}{\left|\hat{S}(t,k)\right|^{2} + \hat{E}_N(t,k)} \tag{7}$$

In formula (7), $G_{TF}(t,k)$ represents the time-frequency-level post gain, $\hat{S}(t,k)$ represents the speech estimation signal, and $\hat{E}_N(t,k)$ represents the interference signal energy estimate.
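The first implementation can be sketched as an energy ratio for one time-frequency point. Treating the speech term as a squared magnitude and adding a small constant `eps` to avoid division by zero are both assumptions, and the function name `tf_post_gain_ratio` is hypothetical.

```python
def tf_post_gain_ratio(speech_estimate: complex,
                       interference_energy: float,
                       eps: float = 1e-12) -> float:
    """Speech energy divided by (speech energy + interference energy).

    eps is an assumed safeguard for silent time-frequency points.
    """
    speech_energy = abs(speech_estimate) ** 2
    return speech_energy / (speech_energy + interference_energy + eps)


# Speech energy 25 against interference energy 75: the gain is close to 0.25.
print(tf_post_gain_ratio(3 + 4j, 75.0))
```

The gain stays between 0 and 1: it approaches 1 where the target beam dominates and approaches 0 where the interference beams carry most of the energy.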

Second implementation: calculating the ratio of the energy of the speech estimation signal to the interference signal energy estimate to obtain a signal-to-noise ratio estimate; calculating the sum of the signal-to-noise ratio estimate and a preset constant to obtain a second value; and calculating the ratio of the signal-to-noise ratio estimate to the second value to obtain the time-frequency-level post gain.

Specifically, since the speech estimation signal is calculated according to formula (5) and the interference signal energy estimate is obtained according to formula (6), the signal-to-noise ratio of each time-frequency point can be estimated by formula (8), and the resulting signal-to-noise ratio estimate can then be used in formula (9) to estimate the time-frequency-level Wiener gain, which gives the time-frequency-level post gain.

$$\hat{\xi}(t,k) = \frac{\left|\hat{S}(t,k)\right|^{2}}{\hat{E}_N(t,k)} \tag{8}$$

In formula (8), $\hat{\xi}(t,k)$ represents the signal-to-noise ratio estimate, $\hat{S}(t,k)$ represents the speech estimation signal, and $\hat{E}_N(t,k)$ represents the interference signal energy estimate.

$$G_{TF}(t,k) = \frac{\hat{\xi}(t,k)}{\hat{\xi}(t,k) + C} \tag{9}$$

In formula (9), $G_{TF}(t,k)$ represents the time-frequency-level post gain, $\hat{\xi}(t,k)$ represents the signal-to-noise ratio estimate, and C represents a preset constant.

In general, the predetermined constant value may take 1.
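The second implementation can be sketched in two small functions: a signal-to-noise ratio estimate followed by a Wiener-style gain with the preset constant C. The energy reading of the speech term, the `eps` safeguard and the function names are assumptions made for illustration.

```python
def snr_estimate(speech_estimate: complex,
                 interference_energy: float,
                 eps: float = 1e-12) -> float:
    """Speech energy over interference energy (eps is an assumed safeguard)."""
    return abs(speech_estimate) ** 2 / (interference_energy + eps)


def wiener_post_gain(snr: float, c: float = 1.0) -> float:
    """Wiener-style gain snr / (snr + C), with C = 1 as the usual choice."""
    return snr / (snr + c)


snr = snr_estimate(3 + 4j, 5.0)
print(snr, wiener_post_gain(snr))
```

Under this energy reading, choosing C = 1 makes the second implementation algebraically equal to the ratio of speech energy to the sum of speech and interference energy, so the two implementations differ only when C takes another value.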

Specifically, after the speech estimation signal and the interference estimation signal are obtained in S102, step A1 may be executed to calculate the time-frequency-level post gain, and step A2 may then be executed to calculate the frame-level post gain from the time-frequency-level post gain, through framing, using formula (10).

In formula (10), $G_T(t)$ denotes the frame-level post gain, $G_{TF}(t,k)$ denotes the time-frequency-level post gain, and K denotes the total frame number, with the index k starting from 0.
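Formula (10) itself is not reproduced in this text, so the following is only one plausible reading: the frame-level post gain of frame t is taken as the mean of the time-frequency-level post gains of that frame over the index k. Both this averaging rule and the function name `frame_post_gain` are assumptions and may differ from the patent's actual formula.

```python
from typing import Sequence


def frame_post_gain(tf_gains_for_frame: Sequence[float]) -> float:
    """Assumed reading of formula (10): mean of G_TF(t, k) over k for one frame t."""
    if not tf_gains_for_frame:
        raise ValueError("a frame needs at least one time-frequency gain")
    return sum(tf_gains_for_frame) / len(tf_gains_for_frame)


# One frame with three time-frequency gains.
print(frame_post_gain([0.2, 0.4, 0.9]))
```

Under this reading, a frame dominated by interference gets a low average gain and is attenuated as a whole, in addition to the per-bin attenuation.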

It should be noted that, besides the combination of the frame level post gain and the time-frequency level post gain, the post-filtering parameter may also be either one of the two gains alone.

In another embodiment of the present invention, the above S104 can be implemented by, but is not limited to, the following method. In a specific implementation process, step S104 may include: calculating the product of the frame level post gain, the time-frequency level post gain and the voice estimation signal to obtain a processed voice signal.

Specifically, when the post-filtering parameter consists of the frame level post gain and the time-frequency level post gain, the processed speech signal can be obtained by the following formula (11).

In formula (11), the left-hand side represents the processed speech signal, G_T(t) denotes the frame level post gain, and G_TF(t, k) represents the time-frequency level post gain.

Thus, the multi-beam space post-filtering process of the voice signals is completed.
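The product described for S104 can be sketched for one frame as follows. The function name `apply_post_filter` and the list-based representation of a frame (one complex coefficient per index k) are assumptions made for illustration.

```python
def apply_post_filter(speech_frame, tf_gains, frame_gain):
    """Scale each coefficient of one frame by G_T(t) * G_TF(t, k).

    speech_frame: complex coefficients of the speech estimation signal for frame t.
    tf_gains: time-frequency level post gains G_TF(t, k), one per coefficient.
    frame_gain: frame level post gain G_T(t), shared by the whole frame.
    """
    if len(speech_frame) != len(tf_gains):
        raise ValueError("one time-frequency gain is required per coefficient")
    return [frame_gain * g * y for g, y in zip(tf_gains, speech_frame)]


# Two coefficients in one frame, with toy gain values.
filtered = apply_post_filter([2 + 0j, 1j], [0.5, 1.0], 0.5)
```

Because both gains are real and non-negative, the post-filter only scales the magnitude of each coefficient and leaves its phase unchanged.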

As can be seen from the above, in the embodiment of the present invention, the voice signal is first enhanced by fixed beam forming, so that the user's voice signal from the direction of the target sound source is enhanced and interference signals from other directions are suppressed. The enhanced voice signal is then post-filtered with the frame level post gain and the time-frequency level post gain, so that residual interference in the user's voice signal, such as environmental noise, reverberation, and interference from the direction of the interference sound source, can be effectively suppressed and a better filtering effect is achieved. Interference signals from non-target sound source directions are thereby effectively suppressed. When the method is applied to long-distance sound pickup, the user's voice signal from the direction of the target sound source can be kept undistorted while interference signals from other spatial directions are effectively suppressed.

Example three

Based on the same inventive concept, as an implementation of the foregoing method, an embodiment of the present invention provides a filtering apparatus based on fixed beam forming, where the apparatus embodiment corresponds to the foregoing method embodiment, and for convenience of reading, details in the foregoing method embodiment are not repeated in this apparatus embodiment one by one, but it should be clear that the apparatus in this embodiment can correspondingly implement all the contents in the foregoing method embodiment.

Fig. 5 is a schematic structural diagram of a filtering apparatus based on fixed beam forming according to a third embodiment of the present invention. Referring to fig. 5, the apparatus 50 includes: an obtaining unit 501, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal at least includes a speech signal from a target sound source and an interference signal from an interference sound source; a beam forming unit 502, configured to perform fixed beam forming on the multi-channel speech signal based on at least two preset fixed beam forming coefficients pointing to different directions, so as to obtain a speech estimation signal and an interference estimation signal; a calculating unit 503, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and a filtering unit 504, configured to perform filtering processing on the speech estimation signal based on the post-filtering parameter, so as to obtain a processed speech signal.

In the embodiment of the present invention, the beamforming unit is configured to perform fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions, so as to obtain at least two beam signals; determine, among the at least two beam signals, the beam signal whose beam points to the target sound source as the voice estimation signal; and determine the beam signals other than the voice estimation signal among the at least two beam signals as interference estimation signals.
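The behaviour of the beamforming unit can be sketched for a single time-frequency point as follows. The sketch assumes a filter-and-sum form in which each beam is the sum of the conjugated coefficients multiplied by the microphone coefficients; this convention, the function names, and the toy coefficient values are illustrative assumptions rather than the coefficients of the embodiment.

```python
def fixed_beamform(coeff_sets, mic_bins):
    """Return one beam signal per preset fixed coefficient set.

    coeff_sets: list of coefficient lists, one list per beam direction.
    mic_bins: complex coefficients of each microphone channel at one (t, k).
    """
    beams = []
    for coeffs in coeff_sets:
        if len(coeffs) != len(mic_bins):
            raise ValueError("one coefficient is required per microphone channel")
        # Assumed convention: beam = sum over channels of conj(w_m) * x_m.
        beams.append(sum(w.conjugate() * x for w, x in zip(coeffs, mic_bins)))
    return beams


def split_beams(beams, target_index):
    """Pick the beam pointing to the target; treat the rest as interference."""
    speech = beams[target_index]
    interference = [b for i, b in enumerate(beams) if i != target_index]
    return speech, interference


# Toy two-microphone example: a sum beam and a difference beam.
coeff_sets = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, -0.5 + 0j]]
mic_bins = [1 + 0j, 1 + 0j]
beams = fixed_beamform(coeff_sets, mic_bins)
speech, interference = split_beams(beams, target_index=0)
```

In this toy case the two channels are identical, so the sum beam passes the signal while the difference beam cancels it.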

In the embodiment of the present invention, the calculating unit is configured to calculate a time-frequency level post gain based on the speech estimation signal and the interference estimation signal; and calculating the frame level post gain based on the time frequency level post gain.

In the embodiment of the present invention, the filtering unit is configured to calculate a product of the frame-level post gain, the time-frequency-level post gain, and the speech estimation signal, and obtain the processed speech signal.

In the embodiment of the present invention, the calculating unit is configured to calculate a weighted sum of interference estimation signals based on a preset weight coefficient, so as to obtain an interference signal energy estimation value; and calculating the time-frequency level post gain based on the voice estimated signal and the interference signal energy estimated value.
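As a sketch of the weighted sum, the following assumes that each preset weight coefficient is applied to the squared magnitude of the corresponding interference estimation signal at one time-frequency point. The function name `interference_energy_estimate` and the weight values are hypothetical.

```python
def interference_energy_estimate(interference_beams, weights):
    """Weighted sum of the interference beam energies at one (t, k).

    interference_beams: complex interference estimation signals, one per beam.
    weights: preset weight coefficients, one per interference beam.
    """
    if len(interference_beams) != len(weights):
        raise ValueError("one weight is required per interference beam")
    # Assumption: the energy of each beam is its squared magnitude.
    return sum(w * abs(b) ** 2 for w, b in zip(weights, interference_beams))


# Two interference beams with equal toy weights.
energy = interference_energy_estimate([3 + 4j, 1 + 0j], [0.5, 0.5])
```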

In the embodiment of the present invention, the calculating unit is configured to calculate the sum of the voice estimation signal and the interference signal energy estimation value to obtain a first value; and calculate the ratio of the voice estimation signal to the first value to obtain the time-frequency level post gain.
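This ratio form can be sketched for one time-frequency point as follows, assuming that the voice estimation signal enters the sum and the ratio through its energy (squared magnitude). The function name `ratio_post_gain` and the floor `eps` are hypothetical.

```python
def ratio_post_gain(speech_tf, interference_energy, eps=1e-12):
    """Time-frequency level post gain as speech energy over the first value."""
    speech_energy = abs(speech_tf) ** 2                # assumption: energy = |Y|^2
    first_value = speech_energy + interference_energy  # the "first value"
    return speech_energy / max(first_value, eps)       # eps guards against 0 / 0


# Speech energy 25 against interference energy 75.
gain = ratio_post_gain(3 + 4j, 75.0)
```

When the preset constant of the signal-to-noise-ratio form is 1, the two forms are algebraically identical, since S / (S + N) equals (S / N) / (S / N + 1) for N greater than 0.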

In the embodiment of the invention, the calculating unit is used for calculating the ratio of the voice estimated signal to the interference signal energy estimated value to obtain a signal-to-noise ratio estimated value; calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value; and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.

Since the filtering apparatus based on fixed beamforming described in the embodiment of the present invention is a device that can execute the filtering method based on fixed beamforming in the embodiment of the present invention, based on the filtering method based on fixed beamforming described in the embodiment of the present invention, those skilled in the art can understand the specific implementation manner and various variations of the filtering apparatus based on fixed beamforming in the embodiment of the present invention, and therefore, how the filtering apparatus based on fixed beamforming implements the filtering method based on fixed beamforming in the embodiment of the present invention is not described in detail herein. The scope of the present application is not limited to the specific embodiments of the present invention, and other embodiments of the present invention are also within the scope of the present invention.

Example four

Based on the same inventive concept, the embodiment of the invention provides an electronic device. Fig. 6 is a schematic structural diagram of an electronic device in a fourth embodiment of the present invention. Referring to fig. 6, the electronic device 60 includes: at least one processor 601; and at least one memory 602 and a bus 603 connected to the processor 601; the processor 601 and the memory 602 communicate with each other through the bus 603; the processor 601 is used to call the program instructions in the memory 602 to perform the following steps: obtaining a multi-channel voice signal to be processed, wherein the multi-channel voice signal at least comprises a voice signal from a target sound source and an interference signal from an interference sound source; based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on the multi-channel voice signal to obtain a voice estimation signal and an interference estimation signal; calculating post-filtering parameters based on the voice estimation signal and the interference estimation signal; and based on the post-filtering parameters, filtering the voice estimation signal to obtain a processed voice signal.

In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on the multi-channel voice signal to obtain at least two beam signals; determining, among the at least two beam signals, the beam signal whose beam points to the target sound source as the voice estimation signal; and determining the beam signals other than the voice estimation signal among the at least two beam signals as interference estimation signals.

In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating time-frequency level post gain based on the voice estimation signal and the interference estimation signal; and calculating the frame level post gain based on the time frequency level post gain.

In the embodiment of the present invention, when the processor calls the program instruction, the following step may be further performed: calculating the product of the frame level post gain, the time-frequency level post gain and the voice estimation signal to obtain a processed voice signal.

In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating the weighted sum of interference estimation signals based on a preset weight coefficient to obtain an interference signal energy estimation value; and calculating the time-frequency level post gain based on the voice estimated signal and the interference signal energy estimated value.

In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating the sum of the voice estimation signal and the interference signal energy estimation value to obtain a first value; and calculating the ratio of the voice estimation signal to the first value to obtain the time-frequency level post gain.

In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating the ratio of the energy estimation value of the voice estimation signal to the energy estimation value of the interference signal to obtain the signal-to-noise ratio estimation value; calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value; and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.

The embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the steps of the filtering method based on fixed beam forming in the foregoing embodiment when running.

The processor may be implemented by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like. The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (Flash RAM); the memory includes at least one memory chip.

Example five

Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus in which the storage medium is located is controlled to execute the steps of the fixed beamforming-based filtering method in the above embodiments.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, Compact disk Read-Only Memory (CD-ROM), optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of volatile memory in a computer readable medium, RAM and/or non-volatile memory, such as ROM or Flash RAM. The memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. The computer-readable storage medium may be ROM, Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Ferromagnetic Random Access Memory (FRAM), flash memory, magnetic surface memory, optical disk, or Compact Disc Read-Only Memory (CD-ROM); or other memory technology, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information and which can be accessed by a computing device; it may also be various electronic devices, such as mobile phones, computers, tablet devices, and personal digital assistants, that include one or any combination of the above-mentioned memories. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A method of fixed beamforming based filtering, the method comprising:

obtaining a multi-channel voice signal to be processed, wherein the multi-channel voice signal at least comprises a voice signal from a target sound source and an interference signal from an interference sound source;

based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on the multi-channel voice signal to obtain a voice estimation signal and an interference estimation signal; the fixed beamforming coefficients are fixed beams with a certain pointing direction;

calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal;

based on the post-filtering parameter, filtering the voice estimation signal to obtain a processed voice signal;

wherein, the performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing to different directions to obtain a voice estimation signal and an interference estimation signal includes:

based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on the multi-channel voice signal to obtain at least two beam signals; the preset fixed beam forming coefficients pointing to different directions can enhance the voice signal pointed by the target sound source and carry out fixed beam forming in a mode of inhibiting other interference signals except the voice signal pointed by the target sound source;

determining, among the at least two beam signals, the beam signal whose beam points to the target sound source as the voice estimation signal;

determining other beam signals except the voice estimation signal in the at least two beam signals as the interference estimation signal;

wherein, the filtering the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal includes:

and calculating the product of the frame level post gain, the time-frequency level post gain and the voice estimation signal to obtain the processed voice signal.

2. The method of claim 1, wherein the calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal comprises:

calculating a time-frequency level post-gain based on the speech estimation signal and the interference estimation signal;

and calculating the frame level post gain based on the time frequency level post gain.

3. The method of claim 2, wherein calculating a time-frequency level post-gain based on the speech estimation signal and the interference estimation signal comprises:

calculating the weighted sum of the interference estimation signals based on a preset weight coefficient to obtain an interference signal energy estimation value;

and calculating the time-frequency level post-gain based on the voice estimation signal and the interference signal energy estimation value.

4. The method of claim 3, wherein the computing the time-frequency level post-gain based on the speech estimation signal and the interference signal energy estimation value comprises:

calculating the sum of the voice estimation signal and the interference signal energy estimation value to obtain a first value;

and calculating the ratio of the voice estimation signal to the first value to obtain the time-frequency level post gain.

5. The method of claim 3, wherein the computing the time-frequency level post-gain based on the speech estimation signal and the interference signal energy estimation value comprises:

calculating the ratio of the voice estimation signal to the interference signal energy estimation value to obtain a signal-to-noise ratio estimation value;

calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value;

and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.

6. A fixed beamforming based filtering apparatus, the apparatus comprising:

an obtaining unit, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal at least includes a speech signal from a target sound source and an interference signal from an interference sound source;

the beam forming unit is used for carrying out fixed beam forming on the multi-channel voice signal based on at least two preset fixed beam forming coefficients pointing to different directions to obtain a voice estimation signal and an interference estimation signal; the fixed beamforming coefficients are fixed beams with a certain pointing direction;

a calculation unit, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal;

the filtering unit is used for carrying out filtering processing on the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal;

the beamforming unit is specifically configured to perform fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions to obtain at least two beam signals; the preset fixed beam forming coefficients pointing to different directions can enhance the voice signal pointed by the target sound source and carry out fixed beam forming in a mode of inhibiting other interference signals except the voice signal pointed by the target sound source; determine, among the at least two beam signals, the beam signal whose beam points to the target sound source as the speech estimation signal; and determine the beam signals other than the speech estimation signal among the at least two beam signals as the interference estimation signal;

the filtering unit is specifically configured to calculate a product of the frame-level post gain, the time-frequency-level post gain, and the speech estimation signal, and obtain a processed speech signal.

7. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus on which the storage medium is located to perform the steps of the fixed beamforming based filtering method according to any of claims 1 to 5.

8. An electronic device, characterized in that the device comprises:

at least one processor;

and at least one memory and a bus connected with the processor;

the processor and the memory complete mutual communication through the bus; the processor is configured to invoke program instructions in the memory to perform the steps of the fixed beamforming based filtering method according to any of the claims 1 to 5.