CN118116398A - Microphone array-based adaptive voice enhancement method and related device thereof - Google Patents


Info

Publication number
CN118116398A
Authority
CN
China
Prior art keywords
domain data
frequency domain
microphone array
filter
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410402981.6A
Other languages
Chinese (zh)
Inventor
邓刚
赵宏亮
欧阳梓俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Changfeng Imaging Equipment Co ltd
Original Assignee
Shenzhen Changfeng Imaging Equipment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Changfeng Imaging Equipment Co ltd filed Critical Shenzhen Changfeng Imaging Equipment Co ltd
Priority to CN202410402981.6A priority Critical patent/CN118116398A/en
Publication of CN118116398A publication Critical patent/CN118116398A/en
Pending legal-status Critical Current

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to the field of audio processing, and discloses a microphone array-based adaptive voice enhancement method, device and equipment and a storage medium. The method comprises the following steps: acquiring a plurality of audio signals; determining first time domain data from the audio signal; transforming the first time domain data into first frequency domain data; filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data; the forward frequency domain data and the reverse frequency domain data are respectively overlapped and added to obtain second frequency domain data; transforming the second frequency domain data into second time domain data; and adaptively adjusting the gain of each output according to the second time domain data. The invention enhances the voice enhancement effect.

Description

Microphone array-based adaptive voice enhancement method and related device thereof
Technical Field
The present invention relates to the field of audio processing, and in particular, to a microphone array-based adaptive speech enhancement method, apparatus, device, and storage medium.
Background
With the development of technology, daily communication and human-machine interaction place increasing emphasis on the quality of audio signals, and this demand drives the development of microphone voice enhancement technology. Traditional recording approaches are relatively simple: portable terminals often adopt a single-microphone design, so the pickup range is small and the voice enhancement effect is poor.
Disclosure of Invention
The invention mainly aims to solve the technical problem of poor voice enhancement effect of a microphone.
The first aspect of the present invention provides a microphone array-based adaptive speech enhancement method, which includes:
acquiring a plurality of audio signals;
determining first time domain data according to the audio signals;
transforming the first time domain data into first frequency domain data;
filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
overlap-adding the forward frequency domain data and the reverse frequency domain data, respectively, to obtain second frequency domain data;
transforming the second frequency domain data into second time domain data;
and adaptively adjusting the gain of each output according to the second time domain data.
Optionally, in a first implementation manner of the first aspect of the present invention, the step of filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data includes:
And filtering the first frequency domain data according to a preset forward filter and a preset reverse filter to obtain forward frequency domain data and reverse frequency domain data.
Optionally, in a second implementation manner of the first aspect of the present invention, before the step of filtering the first frequency domain data according to a preset forward filter and a preset inverse filter to obtain forward frequency domain data and inverse frequency domain data, the method further includes:
a forward filter and a backward filter are generated.
Optionally, in a third implementation manner of the first aspect of the present invention, the step of generating the forward filter and the backward filter includes:
taking a preset minimum norm filter as a forward filter;
and generating a reverse filter according to the forward filter.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the step of generating a backward filter according to the forward filter includes:
Performing inverse sequence arrangement operation on the forward filter coefficients of the forward filter to obtain inverse filter coefficients;
And generating an inverse filter according to the inverse filter coefficient.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the step of adaptively adjusting the gain of each output according to the second time domain data includes:
Detecting a preset gain value adjusted in real time;
When the preset gain value adjusted in real time is detected, multiplying the reverse data of the second time domain data with the preset gain value to obtain target data;
subtracting the target data from the forward data of the second time domain data to obtain enhanced audio data;
adaptively adjusting the preset gain value in real time according to the enhanced audio data, and returning to the step of detecting the preset gain value adjusted in real time.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the step of acquiring a plurality of audio signals includes:
And acquiring a plurality of audio signals in the same direction.
A second aspect of the present invention provides an adaptive speech enhancement device based on a microphone array, comprising:
An acquisition module for acquiring a plurality of audio signals;
A determining module, configured to determine first time domain data according to the audio signal;
the transformation module is used for transforming the first time domain data into first frequency domain data;
the filtering module is used for filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
the calculation module is used for overlapping and adding the forward frequency domain data and the reverse frequency domain data respectively to obtain second frequency domain data;
an inverse transform module for transforming the second frequency domain data into second time domain data;
and the output module is used for adaptively adjusting the gain of each output according to the second time domain data.
A third aspect of the present invention provides an adaptive speech enhancement device based on a microphone array, comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the microphone array based adaptive speech enhancement device to perform the microphone array based adaptive speech enhancement method described above.
A fourth aspect of the invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the microphone array based adaptive speech enhancement method described above.
In the present embodiment, a plurality of audio signals are acquired; determining first time domain data according to the audio signal; transforming the first time domain data into first frequency domain data; filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data; the forward frequency domain data and the reverse frequency domain data are respectively overlapped and added to obtain second frequency domain data; transforming the second frequency domain data into second time domain data; and adaptively adjusting the gain of each output according to the second time domain data. The adaptive voice enhancement device based on the microphone array can acquire a plurality of audio signals, so that the pick-up range is increased; the sound in front of the user can be effectively picked up by filtering and overlap-adding based on a plurality of audio signals, and interference of environmental noise in other directions is eliminated; the obtained second time domain signal is used for outputting gain, so that spatial domain information can be better utilized, and compared with a single microphone, the voice enhancement effect is improved by adopting a multi-microphone array design.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a microphone array-based adaptive speech enhancement method according to an embodiment of the present invention;
FIG. 2 is a reference diagram of one embodiment of a microphone array-based adaptive speech enhancement method in accordance with an embodiment of the present invention;
FIG. 3 is a reference diagram of another embodiment of a microphone array-based adaptive speech enhancement method in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a microphone array-based adaptive speech enhancement method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of an adaptive speech enhancement device based on a microphone array in an embodiment of the invention;
Fig. 6 is a schematic diagram of an embodiment of an adaptive speech enhancement device based on a microphone array in an embodiment of the invention.
Detailed Description
The embodiment of the invention provides a microphone array-based adaptive voice enhancement method, a microphone array-based adaptive voice enhancement device, microphone array-based adaptive voice enhancement equipment and a storage medium.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to fig. 1, and an embodiment of a microphone array-based adaptive speech enhancement method in an embodiment of the present invention includes:
101. Acquiring a plurality of audio signals;
Specifically, the microphone array-based adaptive voice enhancement device provides an adaptive differential microphone array pickup mode, which achieves sound source localization and voice enhancement by capturing sound information in the spatial domain. A microphone array with fixed microphone spacing, arranged along a single straight line, can effectively capture the sound signal from the front, improve the voice quality of the front signal, eliminate interference from ambient noise in other directions, and improve the accuracy and quality of voice recognition.
Optionally, the adaptive differential microphone array uses the time and amplitude differences between a plurality of microphone elements to estimate and track the sound source position. By dynamically adjusting the phase and gain between the different microphones, the array forms a beam pointing directly in front of it, enhancing reception of the target sound source signal while suppressing noise interference from other directions. The clarity and strength of the voice signal are thereby effectively improved, as are the results of voice communication and recognition. The adaptive differential microphone array pickup mode combines the processing of spatial-domain sound information with enhanced front directivity, and provides efficient voice enhancement.
102. Determining first time domain data according to the audio signal;
Specifically, the collected audio signals are processed and analyzed to obtain a plurality of first time domain data. Time domain data describe how a signal changes along the time axis and reflect characteristics such as its amplitude, frequency and phase. The analog signal may be converted into a digital signal, and the digital signal may then be converted into frequency domain data using, for example, a Fourier transform.
103. Transforming the first time domain data into first frequency domain data;
Optionally, a windowing operation is performed on each time domain data frame to reduce spectral leakage. Suitable windows include, but are not limited to, the Hann (Hanning) window and the Hamming window.
A short-time Fourier transform is applied to each windowed time domain data frame. The short-time Fourier transform converts each time domain data frame into frequency domain data, yielding the corresponding spectrum information.
After the short-time Fourier transform has been applied to each time domain data frame, each first time domain data is converted into corresponding first frequency domain data. Thus, if there are a plurality of first time domain data, each of them produces its own first frequency domain data, giving a plurality of first frequency domain data.
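The windowing and short-time Fourier transform of step 103 can be sketched as follows. This is a minimal illustration in Python with NumPy, not the applicant's implementation: the function name `stft_frames`, the frame length of 256 samples, the hop of 128 samples, the 16 kHz sample rate and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into overlapping frames, apply a Hann window, and FFT each frame.

    frame_len and hop are illustrative values, not taken from the patent.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)                 # Hann window to reduce spectral leakage
    n_frames = 1 + (len(x) - frame_len) // hop     # assumes len(x) >= frame_len
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)    # shape: (n_frames, frame_len // 2 + 1)

# One set of first frequency domain data per microphone channel (6 test channels)
fs = 16000
t = np.arange(fs) / fs
mics = np.stack([np.sin(2 * np.pi * 1000 * t) for _ in range(6)])
X = np.stack([stft_frames(ch) for ch in mics])     # shape: (n_mics, n_frames, n_bins)
```

Each row of `X` holds the frame-by-frame spectra of one microphone, which is the form the later filtering sketches expect.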
104. Filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
In particular, a complete audio signal may be represented in the time domain or in the frequency domain.
In the time domain, a complete audio signal refers to a sound pressure waveform that varies over time, and can be represented by a waveform diagram showing the amplitude variation of sound at different points in time.
In the frequency domain, a complete audio signal may then be represented by forward frequency domain data, i.e. data that converts the time domain signal into a frequency domain representation. This frequency domain representation shows the components of sound at different frequencies, and can show the various frequency components contained in the audio signal and their intensities.
The inverse frequency domain data may then be used to reconstruct the original time domain audio signal, i.e. converted back from the frequency domain data to the time domain representation. Thus, the forward frequency domain data and the reverse frequency domain data provide a way to convert between the time domain and the frequency domain. The filter may be used to process the frequency domain data to obtain a forward frequency domain data and a reverse frequency domain data.
Optionally, filtering the first frequency domain data according to a preset forward filter and a preset reverse filter to obtain forward frequency domain data and reverse frequency domain data. And applying a forward filter to each first frequency domain data to obtain corresponding forward frequency domain data. And applying an inverse filter to the same first frequency domain data to obtain corresponding inverse frequency domain data.
Optionally, a forward filter and a backward filter are generated. Specifically, a preset minimum norm filter is used as a forward filter; and generating a reverse filter according to the forward filter. The minimum norm filter is a digital filter design method, and the coefficient vector of the filter is adjusted so that the norm of the coefficient vector of the filter is minimized under given constraint conditions. In this embodiment, a minimum-norm filter is designed as the forward filter, and the coefficients of the backward filter are obtained by arranging the coefficients of the forward filter in the reverse order based on the forward filter.
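Generating the reverse filter from the forward filter amounts to reversing the order of the coefficients across microphones. A minimal sketch, assuming the forward filter is stored as a NumPy array with one row per microphone and one column per frequency bin (the layout and the name `make_backward_filter` are assumptions for illustration):

```python
import numpy as np

def make_backward_filter(h_forward):
    """Return the forward filter coefficients in reverse microphone order.

    h_forward: array of shape (n_mics, n_bins); row m holds the coefficients
    applied to microphone m+1. The result applies the last row to MIC1, and so on.
    """
    return np.asarray(h_forward)[::-1].copy()

# Example with 6 microphones and 2 frequency bins
h_fwd = np.arange(12).reshape(6, 2)
h_bwd = make_backward_filter(h_fwd)
```

Because the reversal is a fixed permutation, the reverse filter can be generated once, alongside the forward filter, before any audio is processed.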
105. The forward frequency domain data and the reverse frequency domain data are respectively overlapped and added to obtain second frequency domain data;
Specifically, for each set of forward frequency domain data and corresponding inverse frequency domain data, it is considered as complex form, including amplitude and phase. And respectively adding the amplitude and the phase of the corresponding positions of the forward frequency domain data and the reverse frequency domain data to obtain synthesized complex frequency domain data. And adding the complex frequency domain data of all groups to obtain synthesized second frequency domain data.
Optionally, referring to FIG. 2, a microphone array composed of 6 microphones is illustrated with a 6-channel signal model. In practical use the number of microphones is not limited to 6; it may be any number n of 2 or more (2, 3, 4, 5, 6, 7, 8, 9, 10, ...), and the specific number is not limited herein.
Specifically, all microphones of the microphone array-based adaptive voice enhancement device in this embodiment need to lie on the same straight line, and the distance between every two adjacent microphones is fixed and equal. Referring to FIG. 3, in the cardioid mode, because the microphone array uses fixed beamforming, sound from behind the array is greatly attenuated. Matching the distance between microphones to the electrical delay is critical to a good beamforming array. Cardioid pickup means that the microphone array behaves as a cardioid directional microphone; as the name implies, its pickup pattern is heart-shaped. It is most sensitive to the area in front of the microphone and slightly less sensitive at the sides, while the rear can be ignored. In video shooting or studio recording, cardioid microphones are very practical when ambient noise should not be picked up. In video shooting, especially outdoors, sound arrives from all directions, and a cardioid microphone can reject the ambient noise made by passers-by, spectators and the like, achieving accurate pickup. In studio recording, a cardioid microphone can effectively reduce surrounding sound and reflected sound, and can present the human voice well even in a less than ideal environment.
Specifically, the following terms are defined:
MIC1-MIC6: microphones 1 to 6;
δ (delta): the distance between two adjacent microphones;
x1-x6 (lowercase): the external signals received by microphones 1 to 6, respectively;
STFT: short-time Fourier transform;
X1-X6 (uppercase): the microphone signals x1-x6 after the STFT, i.e. converted from the time domain to the frequency domain;
H1-H6: the forward filters, applied from left to right in the order H1, H2, H3, H4, H5, H6;
H6-H1: the reverse filters, applied from left to right in the order H6, H5, H4, H3, H2, H1;
Gain: the output gain, defaulting to 1.
The microphone in the adaptive speech enhancement device based on the microphone array of this embodiment adopts fixed beam forming, taking heart shape as an example:
the gain reaches its maximum of 1 at 0°, i.e. in front of MIC1 (solid line in FIG. 3);
the gain reaches its minimum of 0 at 180°, i.e. behind MIC6 (solid line in FIG. 3);
a first-order differential microphone is described by two parameters, α and β;
α is the cosine of the angle at which the null is located; in the cardioid example above the null is at 180°, and since cos 180° = -1, α = -1;
in a first-order differential microphone, the null lies between 1° and 180° because the beam pattern is symmetric;
the directivity pattern of the first-order differential microphone is specified by the two-row, one-column vector [1; 0], i.e. unit response toward the front and zero response at the null.
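As a point of reference for the cardioid example, the ideal first-order cardioid pattern is commonly written as 0.5 · (1 + cos θ). That closed form is a standard textbook expression and is not stated in this document; the sketch below only checks that it matches the gains quoted above (1 at 0° and 0 at 180°). The function name `cardioid_gain` is hypothetical.

```python
import numpy as np

def cardioid_gain(theta_deg):
    """Ideal first-order cardioid gain at angle theta (degrees from the front)."""
    theta = np.deg2rad(theta_deg)
    return 0.5 * (1.0 + np.cos(theta))

front = cardioid_gain(0.0)     # maximum, in front of MIC1
side = cardioid_gain(90.0)     # reduced sensitivity at the sides
rear = cardioid_gain(180.0)    # null, behind the last microphone
```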
Two constraints are defined:
$$d^T(\omega, \cos 0^\circ)\,h(\omega) = d^T(\omega, 1)\,h(\omega) = 1$$
$$d^T(\omega, \alpha_{1,1})\,h(\omega) = \beta_{1,1}$$
the following terms are defined:
the steering vector toward the front is d(ω, cos 0°) = d(ω, 1);
τ is the delay of sound arrival between adjacent microphones;
c is the speed of sound, here defaulting to 340 m/s;
δ (delta) is the fixed distance between two adjacent microphones, defaulting to 0.01 m (i.e. 1 cm);
so that τ = δ/c.
In the end-fire direction, ideally in front of the first microphone:
$$d^T(\omega, 1) = [\,1 \quad e^{-j\omega\tau}\,]$$
Likewise, ideally behind the last microphone:
$$d^T(\omega, \alpha) = [\,1 \quad e^{-j\omega\tau\alpha}\,]$$
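The steering vectors above can be evaluated numerically as in the following sketch. It uses the default values given in the text (c = 340 m/s, spacing 0.01 m); the function name `steering_vector` and the extension to more than two elements, which assumes a uniform linear array with delay m·τ·α at microphone m, are assumptions for illustration.

```python
import numpy as np

C_SOUND = 340.0   # speed of sound in m/s (default from the text)
SPACING = 0.01    # microphone spacing in m (default from the text)

def steering_vector(omega, alpha, n_mics=2, spacing=SPACING, c=C_SOUND):
    """Return d(omega, alpha) = [1, exp(-j*omega*tau*alpha), ...] for n_mics elements.

    omega is the angular frequency in rad/s and alpha = cos(theta).
    """
    tau = spacing / c                 # inter-microphone delay, tau = delta / c
    m = np.arange(n_mics)             # microphone index 0, 1, ...
    return np.exp(-1j * omega * tau * alpha * m)

omega = 2 * np.pi * 1000.0            # 1 kHz, chosen only for the example
d_front = steering_vector(omega, 1.0)    # toward the front (cos 0 deg = 1)
d_rear = steering_vector(omega, -1.0)    # toward the rear (cos 180 deg = -1)
```

With two elements the result has exactly the form of the two vectors written above.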
from the vandermonde determinant (Vandermonde matrix), we find:
Since beamforming is fixed, H1, H2, H3, H4, H5, H6 are common.
In addition, to enhance stability, the algorithm uses a minimum-norm filter (minimum-norm solution), which satisfies:
$$D(\omega, \alpha)\,h(\omega) = \beta$$
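Since the algorithm of this embodiment uses only a first-order differential design, D(ω, α) has two rows, one for each of the two constraints defined above, and β is the vector [1; 0] obtained before.
Since beamforming is fixed, D(ω, α) can be computed in advance, and h(ω, α, β) is then determined from D(ω, α):
$$h(\omega, \alpha, \beta) = D^T(\omega, \alpha)\,[\,D(\omega, \alpha)\,D^T(\omega, \alpha)\,]^{-1}\,\beta$$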
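A minimum-norm filter of this kind can be computed for one frequency as sketched below. Several points are assumptions rather than statements of the patent: D is built by stacking the front and null steering vectors as its two rows, the array is taken to be uniform with 6 microphones, and the conjugate transpose is used in place of the plain transpose, which is the usual choice for complex-valued D. The name `min_norm_filter` is hypothetical.

```python
import numpy as np

def min_norm_filter(omega, alpha=-1.0, beta=(1.0, 0.0), n_mics=6,
                    spacing=0.01, c=340.0):
    """Minimum-norm solution h of D h = beta at angular frequency omega.

    Row 0 of D is the steering vector toward the front (cos 0 deg = 1),
    row 1 the steering vector toward the null (alpha = cos of the null angle).
    """
    tau = spacing / c
    m = np.arange(n_mics)
    D = np.stack([np.exp(-1j * omega * tau * m),
                  np.exp(-1j * omega * tau * alpha * m)])   # shape: (2, n_mics)
    b = np.asarray(beta, dtype=complex)
    DH = D.conj().T                                         # conjugate transpose (assumption)
    return DH @ np.linalg.solve(D @ DH, b)                  # shape: (n_mics,)

omega = 2 * np.pi * 1000.0        # 1 kHz, chosen only for the example
h = min_norm_filter(omega)
```

At omega = 0 the two rows of D coincide and the 2x2 system becomes singular, so a practical implementation would need to treat the DC bin separately; how the patent handles this is not stated.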
Filtering and output: the preceding steps yield X1, X2, X3, X4, X5, X6 and H1, H2, H3, H4, H5, H6.
The frequency domain outputs are overlap-added: X1·H1 + X2·H2 + X3·H3 + X4·H4 + X5·H5 + X6·H6.
The sum of these outputs, i.e. the overlap-add result, is the second frequency domain data.
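The filter-and-sum operation above can be sketched with a single NumPy contraction. The array layout (microphone, frame, frequency bin) and the name `filter_and_sum` are assumptions for illustration; each product is taken element-wise per frequency bin.

```python
import numpy as np

def filter_and_sum(X, H):
    """Compute X1*H1 + X2*H2 + ... per frame and frequency bin.

    X: (n_mics, n_frames, n_bins) spectra of the microphone signals.
    H: (n_mics, n_bins) filter coefficients, one row per microphone.
    Returns an array of shape (n_frames, n_bins).
    """
    return np.einsum("mfb,mb->fb", X, H)

# Tiny example: 6 microphones, 3 frames, 4 bins, all spectra equal to 1
X_demo = np.ones((6, 3, 4), dtype=complex)
H_demo = np.repeat(np.arange(1, 7, dtype=complex)[:, None], 4, axis=1)  # H_m = m
Y_demo = filter_and_sum(X_demo, H_demo)   # every entry is 1 + 2 + ... + 6
```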
106. Transforming the second frequency domain data into second time domain data;
Optionally, an inverse fast Fourier transform is performed on the second frequency domain data to obtain the second time domain data. Specifically, the overlap-add result above is transformed by an inverse fast Fourier transform (IFFT) into the time domain forward-filtered data x_forward. The IFFT (Inverse Fast Fourier Transform) is also used, for example, in OFDM technology: the quadrature modulation and demodulation of each subchannel can be achieved with the IDFT (Inverse Discrete Fourier Transform) and the DFT, respectively, and in a system with a large number of subcarriers they can be implemented with the IFFT and the FFT.
Likewise, the frequency domain outputs are overlap-added as X1·H6 + X2·H5 + X3·H4 + X4·H3 + X5·H2 + X6·H1, and the result is transformed by an inverse fast Fourier transform into the time domain backward-filtered data x_backward.
The time domain forward-filtered data x_forward and the time domain backward-filtered data x_backward together form the second time domain data.
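The two branches and their inverse transforms can be sketched together as follows. The frame length, hop size, array layout and function names are assumptions for illustration, and stitching the inverse-transformed frames together by time-domain overlap-add is one common choice that the patent does not spell out.

```python
import numpy as np

def istft_frames(Y, frame_len=256, hop=128):
    """Inverse-FFT each frame and overlap-add the frames into one signal."""
    frames = np.fft.irfft(Y, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

def beamform_to_time(X, H, frame_len=256, hop=128):
    """Return (x_forward, x_backward) from spectra X and forward filter H.

    X: (n_mics, n_frames, n_bins); H: (n_mics, n_bins).
    The backward branch reuses H with the microphone order reversed.
    """
    y_fwd = np.einsum("mfb,mb->fb", X, H)          # X1*H1 + ... + X6*H6
    y_bwd = np.einsum("mfb,mb->fb", X, H[::-1])    # X1*H6 + ... + X6*H1
    return istft_frames(y_fwd, frame_len, hop), istft_frames(y_bwd, frame_len, hop)

# Example with random spectra and a filter that selects only one microphone
rng = np.random.default_rng(0)
X_rand = rng.standard_normal((6, 10, 129)) + 1j * rng.standard_normal((6, 10, 129))
H_sel = np.zeros((6, 129), dtype=complex)
H_sel[0] = 1.0                                     # forward branch passes MIC1 only
x_forward, x_backward = beamform_to_time(X_rand, H_sel)
```

With this selector filter the forward branch reproduces the first microphone and the backward branch the last one, which makes the reversal easy to verify.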
107. And adaptively adjusting the gain of each output according to the second time domain data.
In the embodiment of the invention, a plurality of audio signals are acquired; determining first time domain data according to the audio signal; transforming the first time domain data into first frequency domain data; filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data; the forward frequency domain data and the reverse frequency domain data are respectively overlapped and added to obtain second frequency domain data; transforming the second frequency domain data into second time domain data; and adaptively adjusting the gain of each output according to the second time domain data. The adaptive voice enhancement device based on the microphone array can acquire a plurality of audio signals, so that the pick-up range is increased; the sound in front of the user can be effectively picked up by filtering and overlap-adding based on a plurality of audio signals, and interference of environmental noise in other directions is eliminated; the obtained second time domain signal is used for outputting gain, so that spatial domain information can be better utilized, and compared with a single microphone, the voice enhancement effect is improved by adopting a multi-microphone array design.
Referring to fig. 4, another embodiment of the adaptive speech enhancement method based on a microphone array according to an embodiment of the present invention includes:
201. Detecting a preset gain value adjusted in real time;
202. When the preset gain value adjusted in real time is detected, multiplying the reverse data of the second time domain data with the preset gain value to obtain target data;
203. Subtracting the target data from the forward data of the second time domain data to obtain enhanced audio data;
204. Adaptively adjusting the preset gain value in real time according to the enhanced audio data, and returning to the step of detecting the preset gain value adjusted in real time.
Specifically, the preset gain value input by the user is expressed as a scalar. And multiplying the reverse data of each sampling point of the second time domain data with a corresponding preset gain value to obtain target data. And subtracting the corresponding target data from the forward data of each sampling point of the second time domain data to obtain the enhanced audio data. And implementing an adaptive gain adjustment algorithm according to the characteristics of the enhanced audio data.
For example, referring to FIG. 4, the gain of each output is adjusted by the adaptive filter of the microphone array-based adaptive voice enhancement device:
Inputs of the NLMS adaptive filter: x_backward and e,
where e(n) = x_forward(n) - gain * x_backward(n), consistent with steps 202 and 203;
the preset gain is constrained to 0 <= gain <= 1;
μ is a preset step size, defaulting to 0.01;
the gain is updated as gain = gain + μ * x_backward(n) / (x_backward(n) * x_backward(n)) * e(n).
Each update of the gain is equivalent to adjusting the gain of the sound according to its position, thereby achieving better directivity.
Finally, the output is e.
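The single-tap NLMS loop described above can be sketched as follows. The error is computed as in steps 202 and 203 (the backward data scaled by the gain and subtracted from the forward data), the step size and gain limits follow the defaults in the text, and the small constant `eps` that guards against division by zero is an addition made for this example. The function name `adaptive_gain` is hypothetical.

```python
import numpy as np

def adaptive_gain(x_forward, x_backward, mu=0.01, gain=1.0, eps=1e-12):
    """Single-tap NLMS: returns the enhanced output e and the final gain."""
    x_forward = np.asarray(x_forward, dtype=float)
    x_backward = np.asarray(x_backward, dtype=float)
    e = np.empty_like(x_forward)
    for n in range(len(x_forward)):
        # Steps 202-203: scale the backward data and subtract it from the forward data
        e[n] = x_forward[n] - gain * x_backward[n]
        # Step 204: normalized LMS update of the scalar gain (eps is an assumption)
        gain += mu * x_backward[n] * e[n] / (x_backward[n] ** 2 + eps)
        # Keep the gain inside the stated constraint 0 <= gain <= 1
        gain = min(max(gain, 0.0), 1.0)
    return e, gain

# Synthetic check: the forward branch is exactly half of the backward branch
rng = np.random.default_rng(0)
x_b = rng.standard_normal(5000)
x_f = 0.5 * x_b
e_out, g_final = adaptive_gain(x_f, x_b)
```

In this noise-free example the gain settles at 0.5 and the residual e decays toward zero, which is the behaviour expected when the backward branch fully explains the forward branch.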
In the embodiment of the invention, because the adaptive filter has a single tap, the gain is a scalar rather than a vector, so the method can be executed in the time domain and is easy to port to devices.
The foregoing describes a microphone array-based adaptive speech enhancement method in the embodiment of the present invention, and the following describes a microphone array-based adaptive speech enhancement device in the embodiment of the present invention, referring to fig. 5, and one embodiment of the microphone array-based adaptive speech enhancement device in the embodiment of the present invention includes:
An acquisition module 301, configured to acquire a plurality of audio signals;
a determining module 302, configured to determine first time domain data according to the audio signal;
a transforming module 303, configured to transform the first time domain data into first frequency domain data;
the filtering module 304 is configured to filter the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
A calculation module 305, configured to overlap-add the forward frequency domain data and the reverse frequency domain data to obtain second frequency domain data;
An inverse transform module 306 for transforming the second frequency domain data into second time domain data;
And an output module 307, configured to adaptively adjust a gain of each output according to the second time domain data.
Optionally, the filtering module 304 may be further specifically configured to:
And filtering the first frequency domain data according to a preset forward filter and a preset reverse filter to obtain forward frequency domain data and reverse frequency domain data.
Optionally, the filtering module 304 may be further specifically configured to:
a forward filter and a backward filter are generated.
Optionally, the filtering module 304 may be further specifically configured to:
taking a preset minimum norm filter as a forward filter;
and generating a reverse filter according to the forward filter.
Optionally, the filtering module 304 may be further specifically configured to:
Performing inverse sequence arrangement operation on the forward filter coefficients of the forward filter to obtain inverse filter coefficients;
And generating an inverse filter according to the inverse filter coefficient.
Optionally, the output module 307 may be further specifically configured to:
Detecting a preset gain value adjusted in real time;
When the preset gain value adjusted in real time is detected, multiplying the reverse data of the second time domain data with the preset gain value to obtain target data;
subtracting the target data from the forward data of the second time domain data to obtain enhanced audio data;
adaptively adjusting the preset gain value in real time according to the enhanced audio data, and returning to the step of detecting the preset gain value adjusted in real time.
Optionally, the obtaining module 301 may be further specifically configured to:
And acquiring a plurality of audio signals in the same direction.
In the embodiment of the invention, a plurality of audio signals are acquired; determining first time domain data according to the audio signal; transforming the first time domain data into first frequency domain data; filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data; the forward frequency domain data and the reverse frequency domain data are respectively overlapped and added to obtain second frequency domain data; transforming the second frequency domain data into second time domain data; and adaptively adjusting the gain of each output according to the second time domain data. The adaptive voice enhancement device based on the microphone array can acquire a plurality of audio signals, so that the pick-up range is increased; the sound in front of the user can be effectively picked up by filtering and overlap-adding based on a plurality of audio signals, and interference of environmental noise in other directions is eliminated; the obtained second time domain signal is used for outputting gain, so that spatial domain information can be better utilized, and compared with a single microphone, the voice enhancement effect is improved by adopting a multi-microphone array design.
The foregoing fig. 5 describes the microphone array-based adaptive voice enhancement device in the embodiment of the present invention in detail from the point of view of a modularized functional entity, and the following describes the microphone array-based adaptive voice enhancement device in the embodiment of the present invention in detail from the point of view of hardware processing.
Fig. 6 is a schematic structural diagram of a microphone array-based adaptive voice enhancement device 500 according to an embodiment of the present invention. The device 500 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 510, a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the microphone array-based adaptive voice enhancement device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the microphone array-based adaptive voice enhancement device 500, the series of instruction operations in the storage medium 530.
The microphone array-based adaptive voice enhancement device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device structure shown in fig. 6 does not limit the microphone array-based adaptive voice enhancement device, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be either a non-volatile or a volatile computer readable storage medium. The computer readable storage medium has instructions stored therein which, when executed on a computer, cause the computer to perform the steps of the microphone array-based adaptive speech enhancement method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A microphone array-based adaptive speech enhancement method, the microphone array-based adaptive speech enhancement method comprising:
acquiring a plurality of audio signals;
determining first time domain data according to the audio signals;
transforming the first time domain data into first frequency domain data;
filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
performing overlap-add on the forward frequency domain data and the reverse frequency domain data respectively to obtain second frequency domain data;
transforming the second frequency domain data into second time domain data;
and adaptively adjusting the gain of each output according to the second time domain data.
2. The microphone array-based adaptive speech enhancement method of claim 1, wherein the step of filtering the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data comprises:
filtering the first frequency domain data according to a preset forward filter and a preset reverse filter to obtain the forward frequency domain data and the reverse frequency domain data.
3. The microphone array-based adaptive speech enhancement method of claim 2, wherein before the step of filtering the first frequency domain data according to the preset forward filter and the preset reverse filter to obtain the forward frequency domain data and the reverse frequency domain data, the method further comprises:
generating the forward filter and the reverse filter.
4. The microphone array-based adaptive speech enhancement method of claim 3, wherein the step of generating the forward filter and the reverse filter comprises:
taking a preset minimum norm filter as the forward filter;
and generating the reverse filter according to the forward filter.
5. The microphone array-based adaptive speech enhancement method of claim 4, wherein the step of generating the reverse filter according to the forward filter comprises:
arranging the forward filter coefficients of the forward filter in reverse order to obtain reverse filter coefficients;
and generating the reverse filter according to the reverse filter coefficients.
6. The microphone array-based adaptive speech enhancement method according to claim 1, wherein the step of adaptively adjusting the gain of each output according to the second time domain data comprises:
detecting a preset gain value adjusted in real time;
when the preset gain value adjusted in real time is detected, multiplying the reverse data of the second time domain data by the preset gain value to obtain target data;
subtracting the target data from the forward data of the second time domain data to obtain enhanced audio data;
and adaptively adjusting the preset gain value in real time according to the enhanced audio data, and returning to the step of detecting the preset gain value adjusted in real time.
7. The microphone array-based adaptive speech enhancement method according to any of claims 1-6, wherein the step of acquiring a plurality of audio signals comprises:
acquiring a plurality of audio signals in the same direction.
8. An adaptive speech enhancement device based on a microphone array, the adaptive speech enhancement device based on a microphone array comprising:
an acquisition module, configured to acquire a plurality of voice signals;
a determining module, configured to determine first time domain data according to the voice signals;
a transformation module, configured to transform the first time domain data into first frequency domain data;
a filtering module, configured to filter the first frequency domain data to obtain forward frequency domain data and reverse frequency domain data;
a calculation module, configured to perform overlap-add on the forward frequency domain data and the reverse frequency domain data to obtain second frequency domain data;
an inverse transformation module, configured to transform the second frequency domain data into second time domain data;
and an output module, configured to adaptively adjust the gain of each output according to the second time domain data.
9. An adaptive speech enhancement device based on a microphone array, the adaptive speech enhancement device based on a microphone array comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the microphone array-based adaptive speech enhancement device to perform the microphone array-based adaptive speech enhancement method of any of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the microphone array based adaptive speech enhancement method according to any of claims 1-7.
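
The gain adaptation recited in claim 6 (multiply the reverse data by a gain, subtract the product from the forward data, then adapt the gain from the enhanced output) can be illustrated with the short Python sketch below. The claims do not state the update rule, so a normalized least-mean-squares (NLMS) update is swapped in here purely as an assumption; the function name `adaptive_subtract` and the parameters `step` and `eps` are likewise hypothetical.

```python
def adaptive_subtract(forward, reverse, step=0.1, eps=1e-8):
    """Subtract a gain-scaled reverse signal from the forward signal.

    forward, reverse: equal-length sequences of time-domain samples
                      (the forward and reverse data of the second time domain data).
    Returns (enhanced_samples, final_gain).
    """
    gain = 0.0  # initial value of the gain (assumption)
    enhanced = []
    for f, r in zip(forward, reverse):
        target = gain * r  # reverse data multiplied by the current gain
        e = f - target     # enhanced sample: forward data minus target data
        enhanced.append(e)
        # Assumed NLMS update: move the gain so that the part of the forward
        # signal that is correlated with the reverse signal is cancelled.
        gain += step * e * r / (r * r + eps)
    return enhanced, gain
```

In this sketch, if the forward signal contains only a scaled copy of the reverse signal, the gain converges to that scale factor and the enhanced output decays toward zero. A practical system would typically also bound the gain and pause adaptation while the target talker is active, neither of which is shown here.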

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410402981.6A CN118116398A (en) 2024-04-03 2024-04-03 Microphone array-based adaptive voice enhancement method and related device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410402981.6A CN118116398A (en) 2024-04-03 2024-04-03 Microphone array-based adaptive voice enhancement method and related device thereof

Publications (1)

Publication Number Publication Date
CN118116398A true CN118116398A (en) 2024-05-31

Family

ID=91215543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410402981.6A Pending CN118116398A (en) 2024-04-03 2024-04-03 Microphone array-based adaptive voice enhancement method and related device thereof

Country Status (1)

Country Link
CN (1) CN118116398A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination