CN114220450A - Method for suppressing strong noise in an air-based command-and-control environment - Google Patents
Method for suppressing strong noise in an air-based command-and-control environment
- Publication number
- CN114220450A (application CN202111370832.9A)
- Authority
- CN
- China
- Prior art keywords
- noise
- voice signal
- deep learning
- voice
- noise reduction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L21/0216: Noise filtering characterised by the method used for estimating noise (speech enhancement, G10L21/02)
- G10L15/063: Training of speech recognition systems (creation of reference templates, G10L15/06)
- G10L21/0224: Noise filtering with processing in the time domain
- G10L21/0232: Noise filtering with processing in the frequency domain
- G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166: Microphone arrays; beamforming
Abstract
The application belongs to the technical field of aviation display and control, and relates to a method for suppressing strong noise in an air-based command-and-control environment. The method comprises the following steps: step one, optimizing a microphone array for air-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directive beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth; step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal; step three, training a deep learning noise reduction model, applying it to the third voice signal to suppress cabin noise and obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine. The method improves the quality of the captured speech, thereby raising the operator's speech recognition accuracy and advancing the engineering application of voice interaction technology.
Description
Technical Field
The application belongs to the technical field of aviation display and control, and relates to a method for suppressing strong noise in an air-based command-and-control environment.
Background
In the deployment of air-based command-and-control systems and existing manned cockpits, an operator commands and controls an unmanned aerial vehicle through speech recognition under highly dynamic conditions. The strong noise in existing cockpits means that audio captured by conventional helmet pickup equipment cannot meet the requirements of speech recognition, and typical aerodynamic noise in the cockpit cannot be eliminated by simple physical noise reduction or by conventional back-end active noise reduction methods.
Accordingly, a technical solution is desired to overcome or at least alleviate at least one of the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The application aims to provide a method for suppressing strong noise in an air-based command-and-control environment, so as to solve at least one problem in the prior art.
The technical scheme of the application is as follows:
A method for suppressing strong noise in an air-based command-and-control environment comprises the following steps:
step one, optimizing a microphone array for air-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directive beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise reduction model, applying it to the third voice signal to suppress cabin noise and obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
In at least one embodiment of the present application, in step one, optimizing the microphone array for air-based command and control comprises: arranging four microphones in a uniform linear layout at the earphone pickup position of the pilot's helmet, and mounting one microphone on the side of the helmet.
In at least one embodiment of the present application, in step one, the super-directive beamformer is obtained as follows:
Assume a uniform linear array of M omnidirectional microphones with element spacing δ. In the presence of spherically isotropic (diffuse) noise, a super-directive beamformer is designed whose array gain in the end-fire direction approaches M². The element delay differences are
δ_ij = (i - j)δ
and the diffuse-noise pseudo-coherence matrix is [Γ(ω)]_ij = sinc(ωδ_ij / c), where c is the speed of sound, d_L(ω, θ) is the array steering vector, and θ is the desired direction.
The filter that makes the array gain ζ_{L,dn}[h(ω)] maximal is found by solving
max_h |h^H(ω) d_L(ω, θ)|² / (h^H(ω) Γ(ω) h(ω)),
whose solution is h(ω) = α Γ⁻¹(ω) d_L(ω, θ) for an arbitrary complex scalar α. Imposing the distortionless constraint h^H(ω) d_L(ω, θ) = 1 yields the maximum signal-to-noise ratio filter, i.e. the super-directive beamformer:
h_SD(ω) = Γ⁻¹(ω) d_L(ω, θ) / (d_L^H(ω, θ) Γ⁻¹(ω) d_L(ω, θ)).
in at least one embodiment of the present application, the method further comprises processing the super-directional beamformer to obtain a robust super-directional beamformer:
acquiring a white noise gain:
maximizing the directivity factor under the constraint of white noise gain is equivalent to minimizing the following equation:
wherein epsilon is a Lagrange multiplier, and a robust super-directional beam former is obtained through constraint of a distortion-free criterion:
in at least one embodiment of the present application, in the second step, the removing the engine mechanical noise in the second speech signal by using an adaptive filter to obtain a third speech signal includes:
obtaining an adaptive filter;
removing the engine mechanical noise in the second voice signal through an adaptive filter to obtain a third voice signal:
e(n)=d(n)-y(n)
wherein, x (N) is engine mechanical noise, d (N) is a second voice signal, h (N) is an adaptive filter, e (N) is a third voice signal, and N is a filter length.
In at least one embodiment of the present application, in step three, training the deep learning noise reduction model comprises:
training the deep learning noise reduction model with the mixed speech y output by the signal processing module and the corresponding clean speech s:
extracting the acoustic feature F of the mixed speech y:
F = log(mel(STFT(y)))
where STFT is the short-time Fourier transform and mel is the Mel filterbank (Mel spectral feature);
normalizing the acoustic feature F:
F = (F - mean) / var
where mean is the mean of the training set features and var is the standard deviation of the training set features;
feeding the normalized acoustic feature F into the deep learning noise reduction model as input, and guiding the update of the model parameters with the MSE error e between the model output s_ and the actual clean speech s:
e = ||s_ - ISTFT(STFT(y) * model(F))||²
where ISTFT is the inverse short-time Fourier transform, model denotes the deep learning noise reduction model, and * denotes element-wise (dot) multiplication;
when the value of the MSE error e stabilizes, the deep learning noise reduction model has converged and is saved.
In at least one embodiment of the present application, the method further comprises compressing the deep learning noise reduction model, specifically:
applying 16-bit fixed-point quantization to all parameters of the deep learning noise reduction model to accelerate computation;
applying SVD (singular value decomposition) to the weight matrices to reduce the amount of computation.
The invention has at least the following beneficial technical effects:
the method for suppressing strong noise in an air-based command-and-control environment improves the quality of the captured speech, thereby raising the operator's speech recognition accuracy and advancing the engineering application of voice interaction technology.
Drawings
FIG. 1 is a flow chart of a method for suppressing strong noise in an air-based command-and-control environment according to an embodiment of the present application;
FIG. 2 is a super-directive beam pattern according to an embodiment of the present application;
FIG. 3 is a flow diagram of an adaptive filter process according to an embodiment of the present application;
FIG. 4 is an adaptive filter according to an embodiment of the present application;
FIG. 5 is a frequency domain block adaptive filter according to an embodiment of the present application;
FIG. 6 is a flow chart of deep learning noise reduction model training according to an embodiment of the present application.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are a subset of the embodiments in the present application and not all embodiments in the present application. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In the description of the present application, it is to be understood that the terms "center", "longitudinal", "lateral", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience in describing the present application and for simplifying the description, and do not indicate or imply that the referenced device or element must have a particular orientation, be constructed in a particular orientation, and be operated, and therefore should not be construed as limiting the scope of the present application.
The present application is described in further detail below with reference to fig. 1 to 6.
The application provides a method for suppressing strong noise in an air-based command-and-control environment which, as shown in FIG. 1, comprises the following steps:
step one, optimizing a microphone array for air-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directive beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise reduction model, applying it to the third voice signal to suppress cabin noise and obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
In the above method, in step one, optimizing the microphone array for air-based command and control comprises: arranging four microphones in a uniform linear layout at the earphone pickup position of the pilot's helmet to collect the pilot's voice control instructions, and mounting one microphone on the side of the helmet to collect the mechanical noise of the aircraft engine. The voice signals collected by the microphone array are processed by the super-directive beamformer, which filters out the wind noise generated by airflow over the front of the aircraft and yields a noise-reduced signal steered toward the pilot's mouth.
Super-directive beamforming is a type of fixed beamforming; compared with conventional beamforming it has a narrower main lobe and stronger noise suppression. In one embodiment of the present application, in step one, the super-directive beamformer is obtained as follows:
Assume a uniform linear array of M omnidirectional microphones with element spacing δ. In the presence of spherically isotropic (diffuse) noise, a super-directive beamformer is designed whose array gain in the end-fire direction approaches M². The element delay differences are
δ_ij = (i - j)δ
and the diffuse-noise pseudo-coherence matrix is [Γ(ω)]_ij = sinc(ωδ_ij / c), where c is the speed of sound, d_L(ω, θ) is the array steering vector, and θ is the desired direction.
The filter that makes the array gain ζ_{L,dn}[h(ω)] maximal is found by solving
max_h |h^H(ω) d_L(ω, θ)|² / (h^H(ω) Γ(ω) h(ω)),
whose solution is h(ω) = α Γ⁻¹(ω) d_L(ω, θ) for an arbitrary complex scalar α. Imposing the distortionless constraint h^H(ω) d_L(ω, θ) = 1 yields the maximum signal-to-noise ratio filter, i.e. the super-directive beamformer:
h_SD(ω) = Γ⁻¹(ω) d_L(ω, θ) / (d_L^H(ω, θ) Γ⁻¹(ω) d_L(ω, θ)).
the maximum snr filter is the super-directional beamformer (main lobe direction θ). But the super-directional beamformer has a white noise gain at low frequencies much less than 1 and therefore in practical applications will cause an amplification of the white noise, especially at low frequencies. Therefore, in this embodiment, it is preferable that the processing of the super-directional beamformer results in a robust super-directional beamformer:
acquiring a white noise gain:
maximizing the directivity factor under the constraint of white noise gain is equivalent to minimizing the following equation:
wherein epsilon is a Lagrange multiplier, and a robust super-directional beam former is obtained through constraint of a distortion-free criterion:
on the basis of the robust super-directional beam former, low-frequency sidelobe constraint is continuously added, so that the beam direction is narrower at low frequency, and the suppression on noise is enhanced. Through the beam forming processing of the stage, the integral signal-to-noise ratio is improved by about 10-15 dB.
Fig. 2 shows the super-directional beam pattern designed in this embodiment, and it can be seen that the super-directional beam former has limited noise suppression capability at low frequencies. In this regard, noise will be further suppressed by incorporating adaptive noise cancellation methods.
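As a concrete illustration of the formulas above, the following NumPy sketch computes (robust) super-directive weights for a uniform linear array in diffuse noise. This is a minimal sketch under stated assumptions, not the patent's implementation: the array size M = 4, the 2 cm spacing, the 1 kHz frequency, and the regularization value ε are illustrative choices.

```python
import numpy as np

def steering_vector(M, delta, omega, theta, c=343.0):
    # ULA steering vector d_L(omega, theta); theta = 0 is the end-fire direction
    i = np.arange(M)
    return np.exp(-1j * omega * i * delta * np.cos(theta) / c)

def superdirective_weights(M, delta, omega, theta=0.0, eps=0.0, c=343.0):
    # Diffuse-noise pseudo-coherence matrix: Gamma_ij = sinc(omega * delta_ij / (pi * c))
    # (np.sinc(x) = sin(pi x) / (pi x), hence the division by pi)
    ij = np.subtract.outer(np.arange(M), np.arange(M))
    Gamma = np.sinc(omega * ij * delta / (np.pi * c))
    d = steering_vector(M, delta, omega, theta, c)
    A = Gamma + eps * np.eye(M)          # eps > 0 gives the robust (regularized) version
    num = np.linalg.solve(A, d)          # (Gamma + eps I)^-1 d
    h = num / (d.conj() @ num)           # distortionless constraint: h^H d = 1
    return h, Gamma, d

M, delta, omega = 4, 0.02, 2 * np.pi * 1000.0   # illustrative assumptions
h, Gamma, d = superdirective_weights(M, delta, omega, eps=1e-3)
df = 1.0 / np.real(h.conj() @ Gamma @ h)   # directivity factor (diffuse-noise array gain)
wng = 1.0 / np.real(h.conj() @ h)          # white noise gain, at most M for a distortionless filter
```

Sweeping ε trades directivity against white noise gain, which is exactly the robustness trade-off the description discusses for low frequencies.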
In the aircraft cockpit environment, wind noise occupies a relatively wide spectrum without obvious spectral structure, whereas engine mechanical noise usually has a clear spectral structure, which makes adaptive noise cancellation of engine noise in the cockpit feasible. In this method, a microphone mounted on the side of the pilot's helmet collects the reference noise; the reference contains both wind noise and engine mechanical noise, but it is the spectral structure of the engine noise that the adaptive filter learns and tracks. As shown in FIG. 3, the microphone array at the helmet earphone collects a noisy signal in which the voice command is mixed with the engine noise picked up by the reference microphone through an acoustic path H; as long as changes in H can be accurately estimated and tracked, the engine noise in the array signal can be effectively cancelled.
In this embodiment, in step two, removing the engine mechanical noise from the second voice signal with the adaptive filter to obtain the third voice signal comprises:
obtaining an adaptive filter;
removing the engine mechanical noise from the second voice signal through the adaptive filter to obtain the third voice signal:
y(n) = Σ_{k=0}^{N-1} h_k(n) x(n - k)
e(n) = d(n) - y(n)
where x(n) is the engine mechanical noise, d(n) is the second voice signal, h(n) is the adaptive filter, e(n) is the third voice signal, and N is the filter length. A block diagram of the adaptive cancellation process is shown in FIG. 4, where s(n) denotes the near-end speaker's speech.
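The cancellation equations above can be sketched with a normalized LMS (NLMS) adaptation of h(n). This is a toy example, not the patent's filter: the synthetic noise path, the sinusoidal stand-in for the pilot's speech, the filter length, and the step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 20000, 16                      # samples, adaptive filter length
x = rng.standard_normal(N)            # reference engine noise (side microphone)
H = rng.standard_normal(8) * 0.5      # unknown acoustic path into the headset mic (assumed)
noise = np.convolve(x, H)[:N]         # engine noise as received by the array
s = 0.1 * np.sin(2 * np.pi * 0.01 * np.arange(N))   # stand-in for the pilot's speech
d = s + noise                         # second voice signal: speech + engine noise

h = np.zeros(L)                       # adaptive filter h(n)
e = np.zeros(N)
mu = 0.5                              # NLMS step size
for n in range(L, N):
    xn = x[n - L + 1:n + 1][::-1]     # most recent L reference samples, newest first
    y = h @ xn                        # y(n) = sum_k h_k(n) x(n - k)
    e[n] = d[n] - y                   # third voice signal e(n) = d(n) - y(n)
    h += mu * e[n] * xn / (xn @ xn + 1e-8)   # NLMS coefficient update

# after convergence the residual e should be much closer to the clean speech than d was
err_before = np.mean((d[-2000:] - s[-2000:]) ** 2)
err_after = np.mean((e[-2000:] - s[-2000:]) ** 2)
```

Because the speech s is uncorrelated with the reference x, adaptation converges toward the noise path H and leaves the speech in the residual, which is the principle FIG. 4 depicts.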
In a preferred embodiment, the complex real-world cockpit environment requires fast tracking and convergence of the filter. Time-domain processing usually introduces unacceptable delay, whereas frequency-domain processing offers lower computational complexity and faster convergence, and allows each frequency band to be controlled precisely during convergence for overall optimal filtering. This embodiment therefore adopts the partitioned-block frequency-domain adaptive filter (PBFDAF) structure for adaptive noise cancellation, shown in FIG. 5.
The coefficient vector of the time-domain adaptive filter can be written as
w(n) = [w_0(n) … w_{M-1}(n)]^T
and the time-domain error vector as
e(n) = [e(n) … e(n + M - 1)]^T.
The input signal matrix in the frequency domain is
X(k) = diag{X_0(k) … X_{2M-1}(k)} = diag{F [x(kM - M) … x(kM + M - 1)]^T},
and the frequency-domain filter coefficients and error signal vector are
W(k) = [W_0(k) … W_{2M-1}(k)]^T = F [w^T(kM) 0 … 0]^T
E(k) = [E_0(k) … E_{2M-1}(k)]^T = F [0 … 0 e^T(kM)]^T
where F and F^{-1} are the 2M × 2M DFT and IDFT matrices, respectively. The frequency-domain adaptive iteration of the FDAF can thus be written as
W(k + 1) = W(k) + 2 μ(k) X^H(k) E(k)
where μ(k) = diag{μ_0(k) … μ_{2M-1}(k)} is the normalized step-size matrix and Λ(k) = diag{P_0(k) … P_{2M-1}(k)} is the input signal power matrix; their entries are, respectively,
μ_i(k) = α / P_i(k)
P_i(k) = λ P_i(k - 1) + (1 - λ)|X_i(k)|²
where α is a fixed step size and λ is a forgetting factor.
after the self-adaptive filter processing, the overall signal-to-noise ratio can be improved by about 15-20 dB.
The third voice signal is then further processed by the deep learning noise reduction model to suppress cabin noise. Single-channel noise reduction algorithms fall into two broad categories: conventional methods and deep learning based methods. Conventional methods can be classified as parametric, nonparametric, or statistical-model-based; the typical deep learning approach is based on deep neural networks (DNN). Conventional methods are simple, easy to implement, and computationally cheap, and handle stationary noise well at high signal-to-noise ratios, but their performance degrades sharply at low signal-to-noise ratios and they handle non-stationary noise poorly. DNN-based noise reduction compensates for these shortcomings, but it is a data-driven supervised learning method: a large amount of data is needed to train the model to the expected performance, data preparation and training take time, and the computational load is higher than that of conventional methods. Model compression is therefore used to reduce the computational load while preserving performance.
In a preferred embodiment of the present application, in step three, training the deep learning noise reduction model comprises:
training the deep learning noise reduction model with the mixed speech y output by the signal processing module and the corresponding clean speech s (the module's input is the speech output by the signal processing module, and its output is sent to the next module for subsequent processing):
extracting the acoustic feature F of the mixed speech y:
F = log(mel(STFT(y)))
where STFT is the short-time Fourier transform and mel is the Mel filterbank (Mel spectral feature);
normalizing the acoustic feature F:
F = (F - mean) / var
where mean is the mean of the training set features and var is the standard deviation of the training set features;
feeding the normalized acoustic feature F into the deep learning noise reduction model as input, and guiding the update of the model parameters with the MSE error e between the model output s_ and the actual clean speech s:
e = ||s_ - ISTFT(STFT(y) * model(F))||²
where ISTFT is the inverse short-time Fourier transform, model denotes the deep learning noise reduction model, and * denotes element-wise (dot) multiplication;
when the value of the MSE error e stabilizes, the deep learning noise reduction model has converged and is saved.
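The feature pipeline F = log(mel(STFT(y))) and its normalization can be sketched in NumPy as follows. This is a minimal textbook construction, not the patent's front end: the window, hop, filterbank parameters, and the use of per-utterance statistics (the patent normalizes with training-set statistics) are illustrative assumptions.

```python
import numpy as np

def stft(y, n_fft=512, hop=128):
    # Hann-windowed short-time Fourier transform, shape (frames, n_fft//2 + 1)
    win = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * win for i in range(0, len(y) - n_fft, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    # Triangular filters spaced uniformly on the mel scale (minimal construction)
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

y = np.random.default_rng(2).standard_normal(16000)   # 1 s stand-in for mixed speech
S = np.abs(stft(y))                                   # magnitude spectrogram
F = np.log(mel_filterbank() @ S.T + 1e-8)             # F = log(mel(STFT(y)))
F = (F - F.mean()) / (F.std() + 1e-8)                 # F = (F - mean) / var normalization
```

The normalized F is what would be fed to the noise reduction model as input; the model's output mask is then applied back to STFT(y) before the ISTFT, as in the error expression above.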
This embodiment further includes model testing. The mixed speech is processed by the signal processing module to obtain y; features extracted from y are fed to the model to obtain the corresponding output mask; the mask is multiplied element-wise with the spectrum of y, and the speech is reconstructed to obtain the noise-reduced signal, which is output to the next module. The model framework is shown in FIG. 6.
Because the original model has a large number of parameters and the computing resources of mobile equipment are limited, the model must be compressed to reduce the computational load. In this embodiment, the deep learning noise reduction model is compressed as follows: 16-bit fixed-point quantization is applied to all model parameters to accelerate computation, and SVD is applied to the weight matrices to reduce the amount of computation. The computational load is greatly reduced while the loss in model performance is very small. After the preceding signal processing stages, model processing can additionally improve the signal-to-noise ratio by about 10 dB.
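The two compression steps above (16-bit fixed-point quantization of parameters and SVD factorization of weight matrices) can be sketched as follows. The layer size and retained rank are illustrative assumptions; in practice the rank is tuned per layer against the accuracy loss.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((512, 512))          # stand-in dense layer weight matrix

# SVD low-rank factorization: W ≈ A @ B with two thin matrices
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64                                        # retained rank (assumed)
A = U[:, :r] * s[:r]                          # shape (512, r)
B = Vt[:r]                                    # shape (r, 512)

x = rng.standard_normal(512)
y_full = W @ x                                # one dense matmul: 512*512 multiplies
y_lr = A @ (B @ x)                            # two thin matmuls: 2*512*64 multiplies

params_full = W.size                          # 262144 parameters
params_lr = A.size + B.size                   # 65536 parameters, a 4x reduction here

# 16-bit fixed-point sketch: round parameters to a uniform grid spanning the range
scale = np.abs(A).max() / (2 ** 15 - 1)
A_q = np.round(A / scale).astype(np.int16)    # stored as int16
A_deq = A_q.astype(np.float32) * scale        # dequantized for computation
```

Note that a random matrix has no low-rank structure, so y_lr is a poor approximation here; trained weight matrices are typically much closer to low rank, which is what makes the technique effective.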
The method for suppressing strong noise in an air-based command-and-control environment can improve noise suppression by about 30-45 dB, with an optimization rate of up to 100%.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. A method for suppressing strong noise in an air-based command-and-control environment, characterized by comprising the following steps:
step one, optimizing a microphone array for air-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directive beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise reduction model, applying it to the third voice signal to suppress cabin noise and obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
2. The method according to claim 1, wherein in step one, optimizing the microphone array for air-based command and control comprises: arranging four microphones in a uniform linear layout at the earphone pickup position of the pilot's helmet, and mounting one microphone on the side of the helmet.
3. The method according to claim 1, wherein in step one the super-directive beamformer is obtained as follows:
assume a uniform linear array of M omnidirectional microphones with element spacing δ; in the presence of isotropic noise, a super-directive beamformer is designed whose array gain in the end-fire direction approaches M², with element delay differences
δ_ij = (i - j)δ
and diffuse-noise pseudo-coherence matrix [Γ(ω)]_ij = sinc(ωδ_ij / c), where c is the speed of sound, d_L(ω, θ) is the array steering vector, and θ is the desired direction;
the filter that makes the array gain ζ_{L,dn}[h(ω)] maximal is found by solving
max_h |h^H(ω) d_L(ω, θ)|² / (h^H(ω) Γ(ω) h(ω)),
whose solution is h(ω) = α Γ⁻¹(ω) d_L(ω, θ) for an arbitrary complex scalar α; applying the distortionless constraint yields the maximum signal-to-noise ratio filter, i.e. the super-directive beamformer:
h_SD(ω) = Γ⁻¹(ω) d_L(ω, θ) / (d_L^H(ω, θ) Γ⁻¹(ω) d_L(ω, θ)).
4. The method according to claim 3, further comprising processing the super-directive beamformer to obtain a robust super-directive beamformer:
acquiring the white noise gain
W[h(ω)] = |h^H(ω) d_L(ω, θ)|² / (h^H(ω) h(ω));
maximizing the directivity factor under a white noise gain constraint, which is equivalent to minimizing
h^H(ω) [Γ(ω) + ε I] h(ω),
where ε is a Lagrange multiplier; and applying the distortionless constraint to obtain the robust super-directive beamformer:
h_RSD(ω) = [Γ(ω) + ε I]⁻¹ d_L(ω, θ) / (d_L^H(ω, θ) [Γ(ω) + ε I]⁻¹ d_L(ω, θ)).
5. The method according to claim 4, wherein in step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal comprises:
obtaining an adaptive filter;
removing the engine mechanical noise from the second voice signal through the adaptive filter to obtain the third voice signal:
y(n) = Σ_{k=0}^{N-1} h_k(n) x(n - k)
e(n) = d(n) - y(n)
where x(n) is the engine mechanical noise, d(n) is the second voice signal, h(n) is the adaptive filter, e(n) is the third voice signal, and N is the filter length.
6. The method for suppressing strong noise in a space-based finger-controlled environment according to claim 1, wherein in step three, training the deep learning noise reduction model comprises:
training the deep learning noise reduction model with the mixed speech y output by the signal processing module and the corresponding clean speech s:
extracting an acoustic feature F of the mixed voice y:
F=log(mel(STFT(y)))
wherein STFT is the short-time Fourier transform, and mel denotes Mel filterbank (spectral) feature extraction;
the acoustic feature F is normalized:
F=(F-mean)/var
wherein mean is the mean of the training set features, and var is the standard deviation of the training set features;
sending the normalized acoustic features F into the deep learning noise reduction model as input, and guiding the update of the model parameters with the MSE error e between the model output s_ and the actual clean speech s:

s_ = ISTFT(STFT(y) * model(F))

e = ||s - s_||²

wherein ISTFT is the inverse short-time Fourier transform, model(F) is the time-frequency mask output by the deep learning noise reduction model, and * denotes point-wise (element-wise) multiplication;
training continues until the value of the MSE error e stabilizes, at which point the deep learning noise reduction model has converged, and the converged model is saved.
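The feature pipeline and masking objective of claim 6 can be sketched end to end. This is a simplified illustration: the STFT uses non-overlapping rectangular frames (so it inverts exactly), the Mel filterbank is omitted (log-magnitude only), and an oracle magnitude-ratio mask stands in for model(F) since no trained network is available here:

```python
import numpy as np

def stft(x, n=256):
    # simplified invertible STFT: non-overlapping rectangular frames
    return np.fft.rfft(x[:len(x) // n * n].reshape(-1, n), axis=1)

def istft(X, n=256):
    return np.fft.irfft(X, n=n, axis=1).ravel()

rng = np.random.default_rng(1)
s = rng.standard_normal(4096)                  # stand-in clean speech s
y = s + 0.3 * rng.standard_normal(4096)        # mixed speech y from the signal chain
Y = stft(y)

# features as in the claim, F = log(mel(STFT(y))), then normalized
F = np.log(np.abs(Y) + 1e-8)
F = (F - F.mean()) / (F.std() + 1e-8)

# oracle mask standing in for model(F): clipped magnitude ratio per T-F bin
mask = np.clip(np.abs(stft(s)) / (np.abs(Y) + 1e-8), 0.0, 1.0)
s_hat = istft(Y * mask)                        # s_ = ISTFT(STFT(y) * model(F))
e = np.mean((s - s_hat) ** 2)                  # MSE training objective
e_mixed = np.mean((s - y) ** 2)                # error of the unprocessed mixture
```

Even this oracle mask only attenuates magnitudes, so the masked output's residual error comes mostly from the noisy phase that is kept from Y.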
7. The method for suppressing strong noise in a space-based finger-controlled environment according to claim 6, further comprising performing model compression on the deep learning noise reduction model, specifically:

quantizing all parameters of the deep learning noise reduction model to 16-bit fixed point to accelerate computation;

applying singular value decomposition (SVD) to the weight matrices to reduce the amount of computation.
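Both compression steps of claim 7 can be sketched on a toy weight matrix; the Q3.12 fixed-point format and the rank chosen below are assumptions, since the claim only specifies 16-bit fixed point and SVD:

```python
import numpy as np

def quantize_16bit(w, frac_bits=12):
    """16-bit fixed-point quantization (Q3.12 here, an assumption)."""
    scale = float(1 << frac_bits)
    q = np.clip(np.round(w * scale), -32768, 32767).astype(np.int16)
    return q.astype(np.float32) / scale

def svd_factorize(w, rank):
    """Replace an m x n weight matrix by two factors totalling
    (m + n) * rank parameters, cutting multiply-adds accordingly."""
    u, sv, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * sv[:rank], vt[:rank, :]

rng = np.random.default_rng(2)
w = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix
a, b = svd_factorize(w, rank=64)
params_saved = w.size - (a.size + b.size)               # 65536 - 32768
wq = quantize_16bit(w)
max_quant_err = float(np.max(np.abs(wq - w)))           # bounded by 0.5 / 2^12
```

A rank-64 factorization halves the parameter count of this layer; in practice the rank would be chosen by inspecting the decay of the singular values.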
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111370832.9A CN114220450A (en) | 2021-11-18 | 2021-11-18 | Method for restraining strong noise of space-based finger-controlled environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114220450A true CN114220450A (en) | 2022-03-22 |
Family
ID=80697615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111370832.9A Pending CN114220450A (en) | 2021-11-18 | 2021-11-18 | Method for restraining strong noise of space-based finger-controlled environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114220450A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115273850A (en) * | 2022-09-28 | 2022-11-01 | 科大讯飞股份有限公司 | Autonomous mobile equipment voice control method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9002027B2 (en) | Space-time noise reduction system for use in a vehicle and method of forming same | |
US10123113B2 (en) | Selective audio source enhancement | |
US8583428B2 (en) | Sound source separation using spatial filtering and regularization phases | |
CN107993670B (en) | Microphone array speech enhancement method based on statistical model | |
US9197975B2 (en) | System for detecting and reducing noise via a microphone array | |
KR101339592B1 (en) | Sound source separator device, sound source separator method, and computer readable recording medium having recorded program | |
CN111261138B (en) | Noise reduction system determination method and device, and noise processing method and device | |
CN110517701B (en) | Microphone array speech enhancement method and implementation device | |
KR20050115857A (en) | System and method for speech processing using independent component analysis under stability constraints | |
US9078057B2 (en) | Adaptive microphone beamforming | |
Li et al. | Geometrically constrained independent vector analysis for directional speech enhancement | |
CN112349292B (en) | Signal separation method and device, computer readable storage medium and electronic equipment | |
US20180308503A1 (en) | Real-time single-channel speech enhancement in noisy and time-varying environments | |
CN114220450A (en) | Method for restraining strong noise of space-based finger-controlled environment | |
Li et al. | Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. | |
WO2023108864A1 (en) | Regional pickup method and system for miniature microphone array device | |
Priyanka et al. | Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement | |
CN113658605B (en) | Speech enhancement method based on deep learning assisted RLS filtering processing | |
US11721353B2 (en) | Spatial audio wind noise detection | |
CN113223552B (en) | Speech enhancement method, device, apparatus, storage medium, and program | |
CN113838472A (en) | Voice noise reduction method and device | |
Song et al. | Drone ego-noise cancellation for improved speech capture using deep convolutional autoencoder assisted multistage beamforming | |
US11282531B2 (en) | Two-dimensional smoothing of post-filter masks | |
Li et al. | An overview of speech dereverberation | |
CN116206603A (en) | Voice control method and system for transfer robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||