CN109102822B - Filtering method and device based on fixed beam forming - Google Patents

Publication number
CN109102822B
Authority
CN
China
Prior art keywords
signal
interference
voice
estimation
filtering
Legal status
Active
Application number
CN201810828327.6A
Other languages
Chinese (zh)
Other versions
CN109102822A (en
Inventor
孙思宁
黄美玉
Current Assignee
Volkswagen China Investment Co Ltd
Mobvoi Innovation Technology Co Ltd
Original Assignee
Mobvoi Information Technology Co Ltd
Application filed by Mobvoi Information Technology Co Ltd
Priority to CN201810828327.6A
Publication of CN109102822A
Application granted
Publication of CN109102822B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

An embodiment of the invention provides a filtering method and device based on fixed beam forming. The method comprises: obtaining a multi-channel speech signal to be processed, which includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; performing fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal. Filtering the beamformed speech signal with the post-filtering parameters thus keeps the speech signal from the target sound source direction undistorted while effectively suppressing other interference signals.

Description

Filtering method and device based on fixed beam forming
Technical Field
The embodiment of the invention relates to the technical field of signal processing, in particular to a filtering method and a filtering device based on fixed beam forming.
Background
With the rise of smart homes and the Internet of Things, electronic devices such as smart speakers, wearable devices and smartphones have spread rapidly, and users expect ever more functionality and intelligence from them. To make human-computer interaction more natural and simple, most such devices provide an intelligent voice interaction function. However, when the user is far from the device and the device picks up sound at a distance through a sensor array (e.g., a microphone array), interference in the real environment, including background noise (e.g., background music), other human voices and reverberation, degrades the quality of the target user's speech signal acquired by the device and lowers speech recognition accuracy.
Currently, beamforming, a signal processing technique for sensor arrays (e.g., microphone arrays), is commonly used when capturing user speech, in order to receive signals directionally and process the received sound signals appropriately.
In the course of studying beamforming, the inventors found that non-stationary interference signals cannot be effectively suppressed, because time-varying reverberation often exists in real environments, beamforming algorithms have side lobes, and beamforming is constrained by the sensor array geometry and the computing resources of the smart terminal.
Disclosure of Invention
In view of this, embodiments of the present invention provide a filtering method and device based on fixed beam forming, whose main purpose is to keep the user's speech signal from the target sound source direction undistorted while effectively suppressing interference signals from other spatial directions.
To achieve the above purpose, the embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a filtering method based on fixed beam forming, where the method includes: obtaining a multi-channel speech signal to be processed, wherein the multi-channel speech signal includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; performing fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal; calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal.
In a second aspect, an embodiment of the present invention provides a filtering apparatus based on fixed beam forming, where the apparatus includes: an obtaining unit, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal at least includes a speech signal from a target sound source and an interference signal from an interference sound source; the beam forming unit is used for carrying out fixed beam forming on the multi-channel voice signal based on at least two preset fixed beam forming coefficients pointing to different directions to obtain a voice estimation signal and an interference estimation signal; a calculation unit, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and the filtering unit is used for carrying out filtering processing on the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus where the storage medium is located is controlled to execute the steps of the filtering method based on fixed beam forming.
In a fourth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes: at least one processor; and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the steps of the filtering method based on fixed beam forming described above.
After obtaining a multi-channel speech signal that includes both a speech signal from a target sound source and an interference signal from an interfering sound source, the filtering method and device based on fixed beam forming according to the embodiments of the present invention perform fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal. Post-filtering parameters are then calculated from the speech estimation signal and the interference estimation signal. Finally, the speech estimation signal is filtered with the post-filtering parameters to obtain a processed speech signal. Fixed beam forming enhances the user's speech signal from the target sound source direction and suppresses interference signals from other directions, and post-filtering the enhanced speech signal effectively suppresses the large amount of interference that remains after a single beam forming pass. Interference signals from non-target directions are thereby effectively suppressed. When the method is applied to long-distance sound pickup, the user's speech signal from the target sound source direction is therefore kept undistorted while interference signals from other spatial directions are effectively suppressed.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a schematic flowchart of a filtering method based on fixed beam forming according to a first embodiment of the present invention;
Fig. 2A to 2B are schematic diagrams of a microphone array according to a first embodiment of the invention;
Fig. 3 is a diagram illustrating a multi-channel speech signal model according to a second embodiment of the present invention;
Fig. 4 is a diagram illustrating multiple fixed beams according to a second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a filtering apparatus based on fixed beam forming according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device in a fourth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Embodiment one
In practical applications, the filtering method based on fixed beam forming can be applied wherever a speech signal needs to be filtered to obtain a clean speech signal. In speech recognition, for example, to improve recognition accuracy, the speech signal collected by the sensor array, which contains interference, must be preprocessed before recognition in order to enhance the target user's speech, remove interference such as environmental noise and other voices, and obtain a clean user speech signal.
Specifically, the fixed beam forming based filtering method is executed by a fixed beam forming based filtering apparatus, and the fixed beam forming based filtering apparatus can be built in or externally connected to an electronic device.
In practical applications, the electronic device may be implemented in various forms. For example, the electronic device described in the embodiment of the present invention may include smart home devices such as a smart speaker, a smart television, a smart set-top box, etc., and personal devices such as a smart phone, a tablet computer, a smart watch, a smart band, etc. Of course, other types of electronic devices with user voice collection and processing functions, such as notebook computers, etc., are also possible. Here, a specific implementation form of the electronic device in the embodiment of the present invention is not particularly limited.
Fig. 1 is a schematic flowchart of the filtering method based on fixed beam forming according to the first embodiment of the present invention. Referring to Fig. 1, the method may include:
S101: obtaining a multi-channel speech signal to be processed;
wherein the multi-channel speech signal comprises at least a speech signal from a target sound source and an interfering signal from an interfering sound source.
Here, the target sound source generally refers to a user who is currently making a sound using the electronic device, such as a person who is speaking; the interfering sound source may refer to another person who is making a sound in the current environment where the electronic device is located, such as another person singing, or may refer to an electronic device that is making a sound and is used by another user in the current environment where the electronic device is located, such as a sound box or a mobile phone that is playing music.
Here, the number of the target sound sources is one, and the number of the interfering sound sources is one or more, such as two, three, and the like. Of course, in practical applications, the obtained multi-channel speech signal may include other types of interference signals, such as ambient noise, reverberation interference, etc., besides the speech signal of the target sound source and the interference signal from the interference sound source.
In practical applications, the target sound source and the interfering sound sources may lie in any direction from 0° to 180°, with their sound arriving as plane waves.
In particular, in order to obtain a multi-channel speech signal to be processed, the multi-channel speech signal may be acquired by a sensor array provided in the electronic device. In practical applications, the sensor array is composed of a number of acoustic sensors (e.g., microphones) for sampling the spatial characteristics of the sound field.
For example, if the sensor array is implemented as a microphone array, it may be an array of 4 microphones arranged in a line at equal spacing (as shown in Fig. 2A), an array of 6 microphones arranged in a line at equal spacing (as shown in Fig. 2B), an array of 8 microphones arranged in a circle at equal spacing (as shown in Fig. 2B), or an array with another number and arrangement of microphones, such as 12 microphones equally spaced in a circular, rectangular or crescent layout. The number and arrangement of the microphones in the microphone array are not specifically limited in the embodiments of the present invention.
In practical applications, given the characteristics of sound waves, an unsuitable spacing between adjacent microphones causes errors in focusing on and locating the sound source, so the spacing should be neither too large nor too small. Illustratively, the uniform spacing between adjacent microphones may be set to more than 30 mm and less than 80 mm.
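One way to see why the spacing cannot be too large is the common half-wavelength rule of thumb for uniform arrays, which is not stated in the patent and is used here only as an illustrative assumption: above the frequency whose half wavelength equals the spacing, a uniform array can no longer distinguish some directions. The sketch below evaluates that frequency for spacings in the 30 mm to 80 mm range; the function name and the speed of sound value are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_alias_free_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) for which the spacing is at most half a wavelength."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


# Evaluate the rule of thumb across the spacing range given in the text.
for spacing_mm in (30, 50, 80):
    f_max = max_alias_free_frequency(spacing_mm / 1000.0)
    print(f"{spacing_mm} mm spacing: alias-free up to about {f_max:.0f} Hz")
```

Smaller spacings push this limit higher but shrink the array aperture, which weakens directivity at low frequencies, so a bounded range such as the one above is a plausible compromise.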
S102: performing fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal;
In practical applications, fixed beam forming with at least two preset fixed beam forming coefficients pointing in different directions enhances the speech signal from the target sound source direction and suppresses interference signals from other directions, so that the speech signal from the target direction is not distorted while signals from other directions are suppressed.
Specifically, after obtaining the multi-channel voice signal to be processed, the multi-channel voice signal may be fixed beamformed according to a set of fixed beamforming coefficients designed in advance, so as to enhance the energy of the voice signal in the target sound source direction and suppress the energy of the interfering signal in the directions (such as the interfering sound source direction) other than the target sound source direction. In this way, an enhanced speech signal, i.e. a speech estimation signal, can be obtained, and a suppressed residual interference estimation signal can be obtained.
In a specific implementation process, the number of the preset fixed beamforming coefficients pointing to different directions may be two, three, four, and the like. Each fixed beamforming coefficient refers to a fixed beam with a certain pointing direction, for example, may be a fixed beam pointing in the directions of 0 °, 30 °, 53 °, 60 °, 80 °, 90 °, 120 °, 150 °, 180 °, and so on. However, it should be noted that the fixed beamforming is not limited to the above-mentioned angle, and may be directed to other angles, and the embodiments of the present invention are not limited thereto.
In practical applications, the fixed beamforming coefficients for each direction form a matrix whose number of beamforming parameter values equals the number of microphones in the microphone array.
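As a rough illustration of "one coefficient per microphone for each direction", the sketch below builds delay-and-sum weights for a uniform linear array at a single frequency. Delay-and-sum is a simple stand-in chosen for this example, not the design used in the patent, and the array size, spacing, frequency and function names are all assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_vector(num_mics, spacing_m, angle_deg, freq_hz, c=SPEED_OF_SOUND):
    """Far-field steering vector of a uniform linear array (0 deg = along the array)."""
    delay_step = spacing_m * math.cos(math.radians(angle_deg)) / c
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_step) for m in range(num_mics)]


def delay_and_sum_weights(num_mics, spacing_m, look_deg, freq_hz):
    """One complex coefficient per microphone, with unit gain toward look_deg."""
    d = steering_vector(num_mics, spacing_m, look_deg, freq_hz)
    return [x / num_mics for x in d]


def beam_response(weights, num_mics, spacing_m, angle_deg, freq_hz):
    """Magnitude of w^H d(angle): gain for a plane wave arriving from angle_deg."""
    d = steering_vector(num_mics, spacing_m, angle_deg, freq_hz)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, d)))


M, SPACING, FREQ = 4, 0.05, 2000.0  # hypothetical array and analysis frequency
w = delay_and_sum_weights(M, SPACING, 90.0, FREQ)
print("coefficients per direction:", len(w))
print("gain toward 90 deg:", round(beam_response(w, M, SPACING, 90.0, FREQ), 3))
print("gain toward 30 deg:", round(beam_response(w, M, SPACING, 30.0, FREQ), 3))
```

The coefficient vector has exactly M entries, and the response is largest in the look direction and smaller elsewhere, which is the behaviour the fixed beams rely on.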
S103: calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal;
In practical applications, because of reverberation and the side lobes of the beamforming algorithm, beamforming cannot completely suppress interference signals from non-target directions, so after a single beam forming pass the enhanced speech signal from the target direction still contains a large amount of residual noise or interfering sound from non-target directions. The speech estimation signal obtained in S102 therefore still contains residual interference, and post-filtering parameters must be calculated from the speech estimation signal and the interference estimation signal so that the speech estimation signal can be post-filtered into a cleaner speech signal.
In practical application, the post-filter parameter can further suppress residual interference signals, such as environmental noise, reverberation interference, interference signals directed by an interference sound source, and the like, in the enhanced voice signal directed by the target sound source, so that the obtained final signal is not distorted and is more pure.
S104: filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal.
Specifically, after the post-filtering parameter is obtained, the speech estimation signal can be further filtered according to the post-filtering parameter to further suppress the interference signal still remaining in the speech estimation signal after the beam enhancement, so that a purer final signal, i.e., the processed speech signal, can be obtained.
As can be seen from the above, after obtaining a multi-channel speech signal that includes both a speech signal from a target sound source and an interference signal from an interfering sound source, the filtering method based on fixed beam forming according to the embodiment of the present invention performs fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain a speech estimation signal and an interference estimation signal. Post-filtering parameters are then calculated from the speech estimation signal and the interference estimation signal. Finally, the speech estimation signal is filtered with the post-filtering parameters to obtain a processed speech signal. Fixed beam forming enhances the user's speech signal from the target sound source direction and suppresses interference signals from other directions, and post-filtering the enhanced speech signal effectively suppresses the large amount of interference that remains after a single beam forming pass. Interference signals from non-target directions are thereby effectively suppressed. When the method is applied to long-distance sound pickup, the user's speech signal from the target sound source direction is kept undistorted while interference signals from other spatial directions are effectively suppressed.
Embodiment two
Based on the foregoing embodiment, an embodiment of the present invention further provides another filtering method based on fixed beam forming, in which the steps of the foregoing embodiment are described in detail with specific examples. The method is applied in the following scenario: referring to Fig. 3, assume that the sensor array 30 consists of $M$ microphones arranged in a line at equal spacing, where $M$ is a positive integer, and that one target sound source 31 and two interfering sound sources 32 are present in the space. The angle of the target sound source 31 is $\theta_s$, and the angles of the interfering sound sources 32 are $\theta_1$ and $\theta_2$, respectively.
The multi-channel speech signal in S101 is described first.
Specifically, the signal $x_m(t)$ received by the $m$-th microphone at time $t$ can be expressed by the following formula (1).
$$x_m(t) = h_{sm}(t) * s(t) + \sum_{i=1}^{N} h_{im}(t) * n_i(t) \tag{1}$$
In formula (1), $*$ denotes convolution, $h_{sm}(t)$ represents the impulse response from the target sound source to the $m$-th microphone, $s(t)$ is the speech signal generated by the target sound source, $h_{im}(t)$ represents the impulse response from the $i$-th interfering sound source to the $m$-th microphone, $n_i(t)$ is the interference signal generated by the $i$-th interfering sound source, and $i$ is the index of the interfering sound source, with value range $[1, N]$.
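The signal model can be made concrete with a small numerical sketch: one microphone signal is the target speech convolved with its impulse response, plus each interference signal convolved with its own impulse response. The toy sequences and function names below are hypothetical and serve only to illustrate the structure of formula (1).

```python
def convolve(h, x):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            out[i + j] += hv * xv
    return out


def microphone_signal(h_s, s, h_i_list, n_list):
    """x_m = h_sm * s + sum over i of h_im * n_i, for one microphone m."""
    x = convolve(h_s, s)
    for h_i, n_i in zip(h_i_list, n_list):
        contrib = convolve(h_i, n_i)
        # Pad to a common length before adding this interferer's contribution.
        if len(contrib) > len(x):
            x.extend([0.0] * (len(contrib) - len(x)))
        for t, v in enumerate(contrib):
            x[t] += v
    return x


s = [1.0, 0.0, 0.0, 0.0]   # toy target speech: a single impulse
n1 = [0.0, 1.0, 0.0, 0.0]  # toy interference: a delayed impulse
h_s = [1.0, 0.5]           # direct path plus one reflection (assumed)
h_1 = [0.2]                # attenuated direct path for the interferer (assumed)
x_m = microphone_signal(h_s, s, [h_1], [n1])
print(x_m)
```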
In practical applications, to facilitate subsequent processing, the multi-channel speech signal is Fourier transformed, converting a time-domain signal that is hard to process into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time sequence or signal can be represented as an infinite superposition of sine waves of different frequencies; the transform computes, by accumulation over the directly measured original signal, the frequency, amplitude and phase of each sine wave component. The detailed implementation of the Fourier transform is not described here.
Next, transforming the time-domain signal $x_m(t)$ into the frequency domain yields the frequency-domain signal $X_m(t,k)$ shown in the following formula (2).
$$X_m(t,k) = H_{sm}(k)\,S(t,k) + \sum_{i=1}^{N} H_{im}(k)\,N_i(t,k) \tag{2}$$
In formula (2), $t$ denotes the time frame index, $k$ denotes the discrete frequency bin index, and upper-case symbols denote the frequency-domain counterparts of the quantities in formula (1).
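To show what the frame index t and the frequency bin index k refer to, the following is a minimal short-time Fourier transform sketch using only the standard library. The Hann window, frame length, hop size and sampling rate are assumptions made for the example; the patent does not specify them.

```python
import cmath
import math


def stft(signal, frame_len, hop):
    """Return X[t][k]: DFT of Hann-windowed frames (t = frame index, k = bin index)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)  # keep non-negative frequencies only
        ]
        frames.append(spectrum)
    return frames


FS = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(256)]
X = stft(tone, frame_len=64, hop=32)
peak_bin = max(range(len(X[0])), key=lambda k: abs(X[0][k]))
print("frames:", len(X), "bins:", len(X[0]), "peak bin of frame 0:", peak_bin)
```

With a 64-sample frame at 8000 Hz the bins are 125 Hz apart, so the 1000 Hz test tone falls on bin 8. A practical implementation would use an FFT rather than this direct summation.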
In the frequency domain, the observation signals of the $M$ microphones are written in vector form, which gives the multi-channel speech signal $X(t,k)$ shown in the following formula (3).
$$X(t,k) = [X_1(t,k), X_2(t,k), \ldots, X_M(t,k)]^T \tag{3}$$
In formula (3), $X(t,k)$ represents the multi-channel speech signal, $X_1(t,k)$ represents the signal picked up by the 1st microphone, $X_2(t,k)$ represents the signal picked up by the 2nd microphone, and $X_M(t,k)$ represents the signal picked up by the $M$-th microphone.
In another embodiment of the present invention, S102 can be implemented by, but is not limited to, the following method. Specifically, S102 may include: performing fixed beam forming on the multi-channel speech signal, based on at least two preset fixed beam forming coefficients pointing in different directions, to obtain at least two beam signals; determining the beam signal whose beam points at the target sound source, among the at least two beam signals, as the speech estimation signal; and determining the other beam signals among the at least two beam signals as interference estimation signals.
Specifically, the number of preset fixed beamforming coefficients pointing in different directions may be two, three, four, and the like. Each fixed beamforming coefficient refers to a fixed beam with a certain directivity. For example, referring to fig. 4, when performing fixed beam forming (also referred to as beam enhancement) on a multi-channel voice signal, 7 fixed beams 40 may be used to enhance different directions, respectively, wherein the fixed beams with beam pointing in 7 directions of 0 °, 30 °, 60 °, 90 °, 120 °, 150 °, and 180 ° may be set, and wherein the direction of the target sound source 31 is the 90 ° direction.
In the embodiment of the invention, a fixed beam forming technique with a white noise gain constraint may be used to divide the whole space into $P$ parts, and fixed beam forming coefficients $W_1(t,k), \ldots, W_P(t,k)$ pointing in $P$ different directions are designed to beam-enhance the multi-channel speech signal, enhancing the signals from the different directions respectively. Here $W_1(t,k)$ represents the fixed beam forming coefficient for the 1st direction and $W_P(t,k)$ represents the fixed beam forming coefficient for the $P$-th direction.
In practical applications, the fixed beamforming coefficients for each direction form a matrix whose number of beamforming parameter values equals the number of microphones in the microphone array.
Illustratively, if the target sound source points at angle $\theta_s$, then for each time-frequency point the fixed beam forming coefficient $W_{\theta_s}(t,k)$ corresponding to the target sound source direction is a matrix that can be expressed as the following formula (4).
$$W_{\theta_s}(t,k) = [w_{\theta_s}^{1}(t,k), w_{\theta_s}^{2}(t,k), \ldots, w_{\theta_s}^{M}(t,k)]^T \tag{4}$$
In formula (4), $\theta_s$ represents the pointing direction of the target sound source, $w_{\theta_s}^{1}(t,k)$ represents the 1st beamforming parameter for the target sound source direction, $w_{\theta_s}^{2}(t,k)$ represents the 2nd beamforming parameter for the target sound source direction, and $w_{\theta_s}^{M}(t,k)$ represents the $M$-th beamforming parameter for the target sound source direction.
Similarly, the other fixed beam forming coefficients, such as $W_1(t,k)$ and $W_P(t,k)$, have the same form as the coefficient shown in formula (4) and are not described further here.
Next, the multi-channel speech signal may be enhanced using the P designed beamforming parameters to obtain P beam signals.
Assuming that the beam signal pointing at the target sound source is the $q$-th of the $P$ beam signals, the $q$-th beam signal can be calculated from the coefficients of formula (4) using formula (5).
$$Y_q(t,k) = W_{\theta_s}^{H}(t,k)\,X(t,k) \tag{5}$$
In formula (5), $Y_q(t,k)$ represents the $q$-th beam signal, $W_{\theta_s}(t,k)$ represents the fixed beam forming coefficients corresponding to the target sound source direction, $(\cdot)^H$ denotes the conjugate transpose, and $X(t,k)$ represents the multi-channel speech signal.
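At a single time-frequency point, applying a fixed beam reduces to an inner product between the coefficient vector and the microphone observations. The sketch below assumes the common conjugate (Hermitian) convention for that product; the three-microphone values are hypothetical and chosen so that the coefficients match the phase pattern of the observations.

```python
def beam_output(weights, observations):
    """Y = W^H X: conjugate each coefficient, multiply, and sum over microphones."""
    if len(weights) != len(observations):
        raise ValueError("need exactly one coefficient per microphone")
    return sum(w.conjugate() * x for w, x in zip(weights, observations))


# Hypothetical values for M = 3 microphones at one (t, k) point.
W_target = [1 / 3 + 0j, -1j / 3, -1 / 3 + 0j]  # coefficients for the target direction
X_tk = [2 + 0j, -2j, -2 + 0j]                  # observations sharing that phase pattern
Y_q = beam_output(W_target, X_tk)
print(Y_q)
```

Because the observations are phase-aligned with the coefficients, the three terms add coherently; a signal from another direction would have a different phase pattern and partly cancel.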
Similarly, the other $P-1$ beam signals $Y_p(t,k)$ among the $P$ beam signals, where $p = 1, \ldots, P$ and $p \neq q$, are calculated in the same way as formula (5), except that the fixed beam forming coefficients corresponding to the target sound source direction are replaced by the other $P-1$ fixed beam forming coefficients. This is not described further here.
Specifically, the $q$-th beam signal $Y_q(t,k)$, that is, the beam signal pointing at the target sound source among the $P$ beam signals, is taken as the estimate of the target sound source's speech signal, which gives the speech estimation signal. The other $P-1$ beam signals $Y_p(t,k)$, where $p = 1, \ldots, P$ and $p \neq q$, are taken as estimates of the interference signals, which gives the interference estimation signals.
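The selection step can be sketched as a simple split of the P beam outputs: the beam pointing at the target becomes the speech estimate and the remaining beams become the interference estimates. The beam angles follow the seven-beam example in the text, while the complex beam values and the function name are hypothetical.

```python
def split_beams(beam_signals, target_index):
    """Return (speech_estimate, interference_estimates) from P beam outputs."""
    if not 0 <= target_index < len(beam_signals):
        raise IndexError("target_index must select one of the beams")
    speech = beam_signals[target_index]
    interference = [y for p, y in enumerate(beam_signals) if p != target_index]
    return speech, interference


beam_angles = [0, 30, 60, 90, 120, 150, 180]  # seven fixed beams, in degrees
beams = [0.1 + 0j, 0.2j, 0.3 + 0j, 1.0 + 0.5j, 0.2 + 0j, 0.1j, 0.05 + 0j]  # toy values
q = beam_angles.index(90)                     # the target source is at 90 degrees
speech_est, interference_est = split_beams(beams, q)
print(speech_est, len(interference_est))
```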
In another embodiment of the present invention, S103 can be implemented by, but is not limited to, the following method. In a specific implementation, to achieve a better filtering effect and obtain a cleaner speech signal, the post-filtering parameters may consist of a frame-level post-gain and a time-frequency-level post-gain. Specifically, S103 may include the following steps A1 to A2:
Step A1: calculating the time-frequency-level post-gain based on the speech estimation signal and the interference estimation signal;
Step A2: calculating the frame-level post-gain based on the time-frequency-level post-gain.
In a specific implementation, to calculate the time-frequency-level post-gain, step A1 may further include the following steps B1 to B2:
Step B1: calculating a weighted sum of the interference estimation signals, based on preset weight coefficients, to obtain the interference signal energy estimate;
Specifically, after the interference estimation signals $Y_p(t,k)$, where $p = 1, \ldots, P$ and $p \neq q$, are obtained, they can be used to estimate the interference signal energy by the following formula (6), which gives the interference signal energy estimate $\hat{\phi}_N(t,k)$.
$$\hat{\phi}_N(t,k) = \sum_{p=1,\,p \neq q}^{P} \alpha_p\,|Y_p(t,k)|^2 \tag{6}$$
In formula (6), $\alpha_p$ represents the weight of the $p$-th beam, where $0 < \alpha_p < 1$.
In practical applications, the weighting factor is an empirical value, and can be set by a person skilled in the art in a specific implementation process according to practical situations, and the embodiment of the present invention is not limited in particular here.
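A minimal sketch of step B1 follows, assuming that the weighted sum is taken over the squared magnitudes of the interference beams. The beam values and weights are hypothetical; the only constraint taken from the text is that each weight lies strictly between 0 and 1.

```python
def interference_energy(interference_beams, weights):
    """Weighted sum of |Y_p|^2 over the non-target beams at one (t, k) point."""
    if len(interference_beams) != len(weights):
        raise ValueError("need exactly one weight per interference beam")
    if any(not 0.0 < a < 1.0 for a in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    return sum(a * abs(y) ** 2 for a, y in zip(weights, interference_beams))


beams = [3 + 4j, 1 + 0j]  # toy interference beam values
alphas = [0.5, 0.25]      # empirical weights, chosen arbitrarily here
phi_n = interference_energy(beams, alphas)
print(phi_n)
```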
Step B2: calculate the time-frequency-level post gain based on the speech estimation signal and the interference signal energy estimate.
In a specific implementation, the time-frequency-level post gain can be calculated from the speech estimation signal and the interference signal energy estimate in, but not limited to, the following two ways.
First implementation: calculate the sum of the energy estimates of the speech estimation signal and of the interference signal to obtain a first value; then calculate the ratio of the speech estimation signal to the first value to obtain the time-frequency-level post gain.
Specifically, since the speech estimation signal is obtained from formula (5) and the interference signal energy estimate from formula (6), the time-frequency-level post gain G_TF(t, k) can be calculated by formula (7) below.
[Formula (7): image BDA0001742993910000105 in the original publication]
In formula (7), G_TF(t, k) denotes the time-frequency-level post gain, and the other two symbols denote the speech estimation signal and the interference signal energy estimate.
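A minimal sketch of the first implementation follows. It assumes that formula (7), which appears here only as an image reference, compares energies, that is, the squared magnitude of the speech estimate against the interference energy estimate. The function name `tf_gain_ratio` and the small constant `eps`, added to avoid division by zero, are not part of the patent text.

```python
from typing import List, Sequence


def tf_gain_ratio(speech: Sequence[complex],
                  noise_energy: Sequence[float],
                  eps: float = 1e-12) -> List[float]:
    """Per-bin gain: speech energy / (speech energy + interference energy)."""
    gains = []
    for s, n in zip(speech, noise_energy):
        s_energy = abs(s) ** 2             # assumed energy of the speech estimate
        first_value = s_energy + n + eps   # the "first value" (eps is an added safeguard)
        gains.append(s_energy / first_value)
    return gains
```

Under this reading the gain lies between 0 and 1: it approaches 1 when the target beam dominates a time-frequency point and approaches 0 when the interference beams dominate.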
Second implementation: calculate the ratio of the energy estimate of the speech estimation signal to the interference signal energy estimate to obtain a signal-to-noise ratio (SNR) estimate; calculate the sum of the SNR estimate and a preset constant to obtain a second value; then calculate the ratio of the SNR estimate to the second value to obtain the time-frequency-level post gain.
Specifically, since the speech estimation signal is obtained from formula (5) and the interference signal energy estimate from formula (6), the SNR of each time-frequency point can be estimated by formula (8) below to obtain the SNR estimate, and the time-frequency-level Wiener gain can then be estimated from this SNR estimate by formula (9) below to obtain the time-frequency-level post gain.
[Formula (8): image BDA0001742993910000111 in the original publication]
In formula (8), the symbols denote the SNR estimate, the speech estimation signal, and the interference signal energy estimate.
[Formula (9): image BDA0001742993910000115 in the original publication]
In formula (9), G_TF(t, k) denotes the time-frequency-level post gain, the remaining symbol denotes the SNR estimate, and C denotes the preset constant.
In general, the preset constant may be set to 1.
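The second implementation can be sketched as follows, again under assumptions: formulas (8) and (9) are available here only as image references, so the SNR is taken to be the squared magnitude of the speech estimate divided by the interference energy estimate, and the gain is taken to be SNR / (SNR + C) as the text describes. The function name `wiener_tf_gain` and the safeguard `eps` are hypothetical.

```python
from typing import List, Sequence


def wiener_tf_gain(speech: Sequence[complex],
                   noise_energy: Sequence[float],
                   c: float = 1.0,
                   eps: float = 1e-12) -> List[float]:
    """Per-bin Wiener-style gain SNR / (SNR + c) for one frame."""
    gains = []
    for s, n in zip(speech, noise_energy):
        snr = abs(s) ** 2 / (n + eps)   # SNR estimate (eps avoids division by zero)
        second_value = snr + c          # the "second value" from the text
        gains.append(snr / second_value)
    return gains
```

Under this reading, with C = 1 the second implementation coincides with the first: writing ξ = S/N for speech energy S and interference energy N, ξ / (ξ + 1) = S / (S + N). Choosing C greater than 1 attenuates low-SNR time-frequency points more strongly.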
Specifically, after the speech estimation signal and the interference estimation signal are obtained in S102, step A1 may be executed to calculate the time-frequency-level post gain, and step A2 may then be executed to calculate the frame-level post gain from the time-frequency-level post gain by formula (10) below through framing.
[Formula (10): image BDA0001742993910000117 in the original publication]
In formula (10), G_T(t) denotes the frame-level post gain, G_TF(t, k) denotes the time-frequency-level post gain, K denotes the total frame number, and k = 0, ...
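Because formula (10) is available here only as an image reference, its exact form cannot be confirmed from this text. The sketch below assumes one common choice: the frame-level gain of frame t is the arithmetic mean of the time-frequency-level gains of that frame over its K values of k. The function name `frame_gain` is hypothetical.

```python
from typing import Sequence


def frame_gain(tf_gains: Sequence[float]) -> float:
    """Assumed frame-level gain: mean of the time-frequency gains of one frame."""
    if not tf_gains:
        raise ValueError("at least one time-frequency gain is required")
    return sum(tf_gains) / len(tf_gains)
```

Under this assumption, a frame whose time-frequency points are mostly dominated by interference receives a small overall gain, while a frame dominated by the target keeps a gain close to 1.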
Note that the post-filtering parameter may also be either the frame-level post gain or the time-frequency-level post gain alone, rather than the combination of the two.
In practice, the post-filtering parameters further suppress the interference remaining in the enhanced speech signal from the direction of the target sound source, such as environmental noise, reverberation, and interference from the directions of interfering sound sources, so that the final signal is cleaner without being distorted.
In another embodiment of the present invention, S104 can be implemented by, but is not limited to, the following method. In a specific implementation, step S104 may include: calculating the product of the frame-level post gain, the time-frequency-level post gain, and the speech estimation signal to obtain the processed speech signal.
Specifically, when the post-filtering parameters consist of the frame-level post gain and the time-frequency-level post gain, the processed speech signal can be obtained by formula (11) below.
[Formula (11): image BDA0001742993910000121 in the original publication]
In formula (11), the symbol on the left-hand side denotes the processed speech signal, G_T(t) denotes the frame-level post gain, and G_TF(t, k) denotes the time-frequency-level post gain.
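The product described for S104 can be sketched in a few lines of Python. The function name `apply_post_filter` and the per-frame list layout are assumptions for illustration; the multiplication itself follows the description of formula (11) in the text.

```python
from typing import List, Sequence


def apply_post_filter(speech: Sequence[complex],
                      tf_gains: Sequence[float],
                      g_frame: float) -> List[complex]:
    """Scale each bin of the speech estimate by G_T(t) * G_TF(t, k)."""
    if len(speech) != len(tf_gains):
        raise ValueError("one time-frequency gain per frequency bin is required")
    return [g_frame * g * s for g, s in zip(tf_gains, speech)]
```

Both gains are real-valued, so the filter changes only the magnitude of each bin and leaves the phase of the speech estimate untouched.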
This completes the multi-beam spatial post-filtering of the speech signal.
As can be seen from the above, in this embodiment of the present invention the speech signal is first enhanced by fixed beamforming, which enhances the user's speech from the direction of the target sound source and suppresses interference from other directions. The enhanced speech signal is then post-filtered with the frame-level post gain and the time-frequency-level post gain, which effectively suppresses the interference remaining in the user's speech, such as environmental noise, reverberation, and interference from the directions of interfering sound sources, and achieves a better filtering effect. Interference from non-target directions is thereby effectively suppressed. When applied to long-distance sound pickup, the method keeps the user's speech from the direction of the target sound source undistorted while effectively suppressing interference from other spatial directions.
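For orientation, the chain of quantities described in formulas (6) to (11) can be written out as follows. This is a reconstruction from the verbal descriptions only, not a transcription of the original formula images: the symbols $Y_p$ (beam outputs), $\hat{S}$ (speech estimation signal), $\hat{\sigma}_I^2$ (interference signal energy estimate), $\hat{\xi}$ (SNR estimate) and $\hat{S}_{\mathrm{out}}$ (processed speech signal) are notation assumed here, as are the use of squared magnitudes as energies and the averaging form of the frame-level gain.

```latex
\begin{align}
\hat{\sigma}_I^2(t,k) &= \sum_{p=1,\,p\neq q}^{P} \alpha_p \,\lvert Y_p(t,k)\rvert^2,
  \qquad 0 < \alpha_p < 1
  && \text{(cf. formula (6))} \\
G_{TF}(t,k) &= \frac{\lvert \hat{S}(t,k)\rvert^2}{\lvert \hat{S}(t,k)\rvert^2 + \hat{\sigma}_I^2(t,k)}
  && \text{(cf. formula (7))} \\
\hat{\xi}(t,k) &= \frac{\lvert \hat{S}(t,k)\rvert^2}{\hat{\sigma}_I^2(t,k)}
  && \text{(cf. formula (8))} \\
G_{TF}(t,k) &= \frac{\hat{\xi}(t,k)}{\hat{\xi}(t,k) + C}
  && \text{(cf. formula (9))} \\
G_T(t) &= \frac{1}{K} \sum_{k=0}^{K-1} G_{TF}(t,k)
  && \text{(cf. formula (10), assumed form)} \\
\hat{S}_{\mathrm{out}}(t,k) &= G_T(t)\, G_{TF}(t,k)\, \hat{S}(t,k)
  && \text{(cf. formula (11))}
\end{align}
```

Under these assumptions, setting $C = 1$ in the fourth line gives the same gain as the second line.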
Example three
Based on the same inventive concept, and as an implementation of the foregoing method, an embodiment of the present invention provides a filtering apparatus based on fixed beamforming. This apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the apparatus in this embodiment can implement everything in the foregoing method embodiment.
Fig. 5 is a schematic structural diagram of a filtering apparatus based on fixed beamforming according to the third embodiment of the present invention. Referring to Fig. 5, the apparatus 50 includes: an obtaining unit 501, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; a beamforming unit 502, configured to perform fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions, to obtain a speech estimation signal and an interference estimation signal; a calculating unit 503, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal; and a filtering unit 504, configured to filter the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal.
In the embodiment of the present invention, the beamforming unit is configured to: perform fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions, to obtain at least two beam signals; determine, among the at least two beam signals, the beam signal pointing to the target sound source as the speech estimation signal; and determine the beam signals other than the speech estimation signal among the at least two beam signals as the interference estimation signals.
In the embodiment of the present invention, the calculating unit is configured to calculate a time-frequency level post gain based on the speech estimation signal and the interference estimation signal; and calculating the frame level post gain based on the time frequency level post gain.
In the embodiment of the present invention, the filtering unit is configured to calculate a product of the frame-level post gain, the time-frequency-level post gain, and the speech estimation signal, and obtain the processed speech signal.
In the embodiment of the present invention, the calculating unit is configured to calculate a weighted sum of interference estimation signals based on a preset weight coefficient, so as to obtain an interference signal energy estimation value; and calculating the time-frequency level post gain based on the voice estimated signal and the interference signal energy estimated value.
In the embodiment of the present invention, the calculating unit is configured to calculate the sum of the energy estimates of the speech estimation signal and of the interference signal to obtain a first value, and to calculate the ratio of the speech estimation signal to the first value to obtain the time-frequency-level post gain.
In the embodiment of the invention, the calculating unit is used for calculating the ratio of the voice estimated signal to the interference signal energy estimated value to obtain a signal-to-noise ratio estimated value; calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value; and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.
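To make the division of work among the units concrete, the following Python sketch mirrors the beamforming, calculating, and filtering units for a single frame. It is a hypothetical illustration, not the apparatus of the patent: the class and method names, the nested-list layout of coefficients and spectra, the convention that each beam is the sum over microphones of the conjugated coefficient times the microphone spectrum, the Wiener-style gain with constant `c`, the mean-based frame-level gain, and the safeguard 1e-12 are all assumptions.

```python
from typing import List, Sequence, Tuple

Spectrum = List[complex]  # one frame, indexed by frequency bin


class FixedBeamPostFilter:
    """Hypothetical single-frame pipeline: beamform, compute gains, filter."""

    def __init__(self, coeffs: Sequence[Sequence[Sequence[complex]]],
                 target: int, weights: Sequence[float], c: float = 1.0):
        self.coeffs = coeffs    # coeffs[p][m][k]: beam p, microphone m, bin k
        self.target = target    # 0-based index of the beam aimed at the target
        self.weights = weights  # one weight per interference beam
        self.c = c              # preset constant of the Wiener-style gain

    def beamform(self, frame: Sequence[Spectrum]) -> List[Spectrum]:
        """Beamforming unit: one output spectrum per fixed beam."""
        bins = range(len(frame[0]))
        return [[sum(w_m[k].conjugate() * x_m[k] for w_m, x_m in zip(w, frame))
                 for k in bins] for w in self.coeffs]

    def gains(self, beams: Sequence[Spectrum]) -> Tuple[List[float], float]:
        """Calculating unit: time-frequency gains and their mean (frame gain)."""
        speech = beams[self.target]
        others = [b for p, b in enumerate(beams) if p != self.target]
        tf = []
        for k, s in enumerate(speech):
            noise = sum(a * abs(b[k]) ** 2 for a, b in zip(self.weights, others))
            snr = abs(s) ** 2 / (noise + 1e-12)
            tf.append(snr / (snr + self.c))
        return tf, sum(tf) / len(tf)

    def process(self, frame: Sequence[Spectrum]) -> Spectrum:
        """Filtering unit: apply both gains to the speech estimate."""
        beams = self.beamform(frame)
        tf, g_frame = self.gains(beams)
        return [g_frame * g * s for g, s in zip(tf, beams[self.target])]
```

For example, with two microphones, a sum beam `[[0.5], [0.5]]` as the target and a difference beam `[[0.5], [-0.5]]` as the only interference beam, a frame in which both microphones carry the same value passes almost unchanged, because the difference beam is zero and the gains are close to 1.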
Since the filtering apparatus based on fixed beamforming described in this embodiment can execute the filtering method based on fixed beamforming of the embodiments of the present invention, a person skilled in the art can, from that method, understand the specific implementations of the apparatus and their variations; how the apparatus implements the method is therefore not described in detail here. The scope of the present application is not limited to the specific embodiments of the present invention, and other embodiments of the present invention also fall within its scope.
Example four
Based on the same inventive concept, an embodiment of the present invention provides an electronic device. Fig. 6 is a schematic structural diagram of the electronic device in the fourth embodiment of the present invention. Referring to Fig. 6, the electronic device 60 includes: at least one processor 601; and at least one memory 602 and a bus 603 connected to the processor 601. The processor 601 and the memory 602 communicate with each other through the bus 603. The processor 601 is configured to call the program instructions in the memory 602 to perform the following steps: obtaining a multi-channel speech signal to be processed, where the multi-channel speech signal includes at least a speech signal from a target sound source and an interference signal from an interfering sound source; performing fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions, to obtain a speech estimation signal and an interference estimation signal; calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal; and filtering the speech estimation signal based on the post-filtering parameters to obtain a processed speech signal.
In the embodiment of the present invention, when the processor calls the program instructions, the following steps may further be performed: performing fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions, to obtain at least two beam signals; determining, among the at least two beam signals, the beam signal pointing to the target sound source as the speech estimation signal; and determining the beam signals other than the speech estimation signal among the at least two beam signals as the interference estimation signals.
In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating time-frequency level post gain based on the voice estimation signal and the interference estimation signal; and calculating the frame level post gain based on the time frequency level post gain.
In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: and calculating the product of the frame level post gain, the time-frequency level post gain and the voice estimation signal to obtain a processed voice signal.
In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating the weighted sum of interference estimation signals based on a preset weight coefficient to obtain an interference signal energy estimation value; and calculating the time-frequency level post gain based on the voice estimated signal and the interference signal energy estimated value.
In the embodiment of the present invention, when the processor calls the program instructions, the following steps may further be performed: calculating the sum of the energy estimates of the speech estimation signal and of the interference signal to obtain a first value; and calculating the ratio of the speech estimation signal to the first value to obtain the time-frequency-level post gain.
In the embodiment of the present invention, when the processor calls the program instruction, the following steps may be further performed: calculating the ratio of the energy estimation value of the voice estimation signal to the energy estimation value of the interference signal to obtain the signal-to-noise ratio estimation value; calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value; and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.
The embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the steps of the filtering method based on fixed beam forming in the foregoing embodiment when running.
The processor may be implemented by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like. The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (Flash RAM); the memory includes at least one memory chip.
Example five
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus in which the storage medium is located is controlled to execute the steps of the fixed beamforming-based filtering method in the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, Compact disk Read-Only Memory (CD-ROM), optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, RAM and/or non-volatile memory, such as ROM or Flash RAM. The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. The computer-readable storage medium may be ROM, Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Ferromagnetic Random Access Memory (FRAM), flash memory, magnetic surface memory, an optical disc, or Compact Disc Read-Only Memory (CD-ROM); or other memory technology, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device; it may also be any of various electronic devices, such as mobile phones, computers, tablet devices, and personal digital assistants, that include one or any combination of the above memories. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (8)

1. A method of fixed beamforming based filtering, the method comprising:
obtaining a multi-channel voice signal to be processed, wherein the multi-channel voice signal at least comprises a voice signal from a target sound source and an interference signal from an interference sound source;
based on at least two preset fixed beam forming coefficients pointing to different directions, performing fixed beam forming on the multi-channel voice signal to obtain a voice estimation signal and an interference estimation signal; the fixed beamforming coefficients are fixed beams with a certain pointing direction;
calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal;
based on the post-filtering parameter, filtering the voice estimation signal to obtain a processed voice signal;
wherein, the performing fixed beamforming on the multi-channel voice signal based on at least two preset fixed beamforming coefficients pointing to different directions to obtain a voice estimation signal and an interference estimation signal includes:
based on at least two preset fixed beamforming coefficients pointing to different directions, performing fixed beamforming on the multi-channel voice signal to obtain at least two beam signals; the preset fixed beamforming coefficients pointing to different directions perform fixed beamforming in a manner that enhances the voice signal from the direction of the target sound source and suppresses interference signals other than that voice signal;
determining, among the at least two beam signals, the beam signal pointing to the target sound source as the voice estimation signal;
determining other beam signals except the voice estimation signal in the at least two beam signals as the interference estimation signal;
wherein, the filtering the speech estimation signal based on the post-filtering parameter to obtain a processed speech signal includes:
and calculating the product of the frame level post gain, the time-frequency level post gain and the voice estimation signal to obtain the processed voice signal.
2. The method of claim 1, wherein the calculating post-filtering parameters based on the speech estimation signal and the interference estimation signal comprises:
calculating a time-frequency level post-gain based on the speech estimation signal and the interference estimation signal;
and calculating the frame level post gain based on the time frequency level post gain.
3. The method of claim 2, wherein calculating a time-frequency level post-gain based on the speech estimation signal and the interference estimation signal comprises:
calculating the weighted sum of the interference estimation signals based on a preset weight coefficient to obtain an interference signal energy estimation value;
and calculating the time-frequency level post-gain based on the voice estimation signal and the interference signal energy estimation value.
4. The method of claim 3, wherein the computing the time-frequency level post-gain based on the speech estimation signal and the interference signal energy estimation value comprises:
calculating the sum of the voice estimation signal and the interference signal energy estimation value to obtain a first value;
and calculating the ratio of the voice estimation signal to the first value to obtain the time-frequency level post gain.
5. The method of claim 3, wherein the computing the time-frequency level post-gain based on the speech estimation signal and the interference signal energy estimation value comprises:
calculating the ratio of the voice estimation signal to the interference signal energy estimation value to obtain a signal-to-noise ratio estimation value;
calculating the sum of the signal-to-noise ratio estimation value and a preset constant value to obtain a second value;
and calculating the ratio of the signal-to-noise ratio estimation value to the second value to obtain the time-frequency level post-gain.
6. A fixed beamforming based filtering apparatus, the apparatus comprising:
an obtaining unit, configured to obtain a multi-channel speech signal to be processed, where the multi-channel speech signal at least includes a speech signal from a target sound source and an interference signal from an interference sound source;
the beam forming unit is used for carrying out fixed beam forming on the multi-channel voice signal based on at least two preset fixed beam forming coefficients pointing to different directions to obtain a voice estimation signal and an interference estimation signal; the fixed beamforming coefficients are fixed beams with a certain pointing direction;
a calculation unit, configured to calculate a post-filtering parameter based on the speech estimation signal and the interference estimation signal;
the filtering unit is used for carrying out filtering processing on the voice estimation signal based on the post-filtering parameter to obtain a processed voice signal;
the beamforming unit is specifically configured to perform fixed beamforming on the multi-channel speech signal based on at least two preset fixed beamforming coefficients pointing to different directions to obtain at least two beam signals; the preset fixed beamforming coefficients pointing to different directions perform fixed beamforming in a manner that enhances the voice signal from the direction of the target sound source and suppresses interference signals other than that voice signal; determine, among the at least two beam signals, the beam signal pointing to the target sound source as the voice estimation signal; and determine the beam signals other than the voice estimation signal among the at least two beam signals as the interference estimation signals;
the filtering unit is specifically configured to calculate a product of the frame-level post gain, the time-frequency-level post gain, and the speech estimation signal, and obtain a processed speech signal.
7. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus on which the storage medium is located to perform the steps of the fixed beamforming based filtering method according to any of claims 1 to 5.
8. An electronic device, characterized in that the device comprises:
at least one processor;
and at least one memory and a bus connected with the processor;
the processor and the memory complete mutual communication through the bus; the processor is configured to invoke program instructions in the memory to perform the steps of the fixed beamforming based filtering method according to any of the claims 1 to 5.
CN201810828327.6A 2018-07-25 2018-07-25 Filtering method and device based on fixed beam forming Active CN109102822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810828327.6A CN109102822B (en) 2018-07-25 2018-07-25 Filtering method and device based on fixed beam forming

Publications (2)

Publication Number Publication Date
CN109102822A CN109102822A (en) 2018-12-28
CN109102822B CN109102822B (en) 2020-07-28

Family

ID=64847409

Country Status (1)

Country Link
CN (1) CN109102822B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8249862B1 (en) * 2009-04-15 2012-08-21 Mediatek Inc. Audio processing apparatuses
CN102969002B (en) * 2012-11-28 2014-09-03 厦门大学 Microphone array speech enhancement device capable of suppressing mobile noise
CN107301869B (en) * 2017-08-17 2021-01-29 珠海全志科技股份有限公司 Microphone array pickup method, processor and storage medium thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
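Speech enhancement technology based on microphone arrays; Zou Ling, et al.; 《语音技术》; 2007-12-31; Vol. 31 (No. 12); pp. 47-50 *
Design of a speech enhancement system based on a microphone array; Zhu Xingyu; 《中国优秀硕士学位论文全文数据库信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology); 2012-10-31; pp. 17-36, 37-40, 42, 47-49 *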

Also Published As

Publication number Publication date
CN109102822A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN109102822B (en) Filtering method and device based on fixed beam forming
CN107221336B (en) Device and method for enhancing target voice
CN106710601B (en) Noise-reduction and pickup processing method and device for voice signals and refrigerator
US9100734B2 (en) Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
EP2647221B1 (en) Apparatus and method for spatially selective sound acquisition by acoustic triangulation
CN108831498B (en) Multi-beam beamforming method and device and electronic equipment
US20140169576A1 (en) Spatial interference suppression using dual-microphone arrays
CN104699445A (en) Audio information processing method and device
CN107369460B (en) Voice enhancement device and method based on acoustic vector sensor space sharpening technology
CN110660404B (en) Voice communication and interactive application system and method based on null filtering preprocessing
CN112017681A (en) Directional voice enhancement method and system
Derkx et al. Theoretical analysis of a first-order azimuth-steerable superdirective microphone array
Pan et al. On the design of target beampatterns for differential microphone arrays
CN113767432A (en) Audio processing method, audio processing device and electronic equipment
CN116312602B (en) Voice signal beam forming method based on interference noise space spectrum matrix
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program
Markovich‐Golan et al. Spatial filtering
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
CN113491137B (en) Flexible differential microphone array with fractional order
CN106448693A (en) Speech signal processing method and apparatus
CN112712818A (en) Voice enhancement method, device and equipment
Bagekar et al. Dual channel coherence based speech enhancement with wavelet denoising
CN110661510A (en) Beam former forming method, beam forming device and electronic equipment
CN117037836B (en) Real-time sound source separation method and device based on signal covariance matrix reconstruction
Liu et al. Sound source localization and speech enhancement algorithm based on fixed beamforming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230530

Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Patentee after: New Technology Co.,Ltd.

Patentee after: VOLKSWAGEN (CHINA) INVESTMENT Co.,Ltd.

Address before: 100094 1001, 10th floor, office building a, 19 Zhongguancun Street, Haidian District, Beijing

Patentee before: MOBVOI INFORMATION TECHNOLOGY Co.,Ltd.