CN114220450A - Method for suppressing strong noise in a space-based command-and-control environment - Google Patents

Method for suppressing strong noise in a space-based command-and-control environment

Info

Publication number
CN114220450A
Authority
CN
China
Prior art keywords
noise
voice signal
deep learning
voice
noise reduction
Prior art date
Legal status
Pending
Application number
CN202111370832.9A
Other languages
Chinese (zh)
Inventor
刘泽石
李思凝
张世辉
张佳鹏
姜博文
Current Assignee
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Original Assignee
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority date
Filing date
Publication date
Application filed by Shenyang Aircraft Design and Research Institute, Aviation Industry Corporation of China (AVIC)
Priority to CN202111370832.9A
Publication of CN114220450A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/063: Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L 21/0216: Noise filtering characterised by the method used for estimating noise (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 21/0224: Processing in the time domain
    • G10L 21/0232: Processing in the frequency domain
    • G10L 2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L 2021/02166: Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application belongs to the technical field of aviation display and control, and in particular relates to a method for suppressing strong noise in a space-based command-and-control environment. The method comprises the following steps: step one, optimizing the microphone array for space-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directional beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth; step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal; step three, training a deep learning noise-reduction model, performing cockpit noise suppression on the third voice signal with the deep learning noise-reduction model to obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine. The method improves the quality of the collected voice data, thereby improving the operator's speech recognition accuracy and promoting the engineering application of voice-interaction technology.

Description

Method for suppressing strong noise in a space-based command-and-control environment
Technical Field
The application belongs to the technical field of aviation display and control, and in particular relates to a method for suppressing strong noise in a space-based command-and-control environment.
Background
In the deployment of space-based command-and-control systems alongside existing manned cockpits, operators carry out the command-and-control functions of unmanned aerial vehicles through speech recognition in a highly dynamic environment. The strong noise environment of existing cockpits means that the audio captured by conventional helmet pickup equipment cannot meet the requirements of speech recognition, and the aerodynamic noise typical of the cockpit cannot be eliminated at the back end by simple physical noise reduction or conventional active noise cancellation.
Accordingly, a technical solution is desired to overcome or at least alleviate at least one of the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The application aims to provide a method for suppressing strong noise in a space-based command-and-control environment, so as to solve at least one problem in the prior art.
The technical scheme of the application is as follows:
a method for suppressing strong noise in a space-based command-and-control environment comprises the following steps:
step one, optimizing the microphone array for space-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directional beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise-reduction model, performing cockpit noise suppression on the third voice signal with the deep learning noise-reduction model to obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
In at least one embodiment of the present application, in step one, optimizing the microphone array for space-based command and control comprises: arranging four microphones uniformly in a line at the headset pickup position of the pilot's helmet, and mounting one microphone on the side of the pilot's helmet.
In at least one embodiment of the present application, in step one, the super-directional beamformer is obtained as follows:

Assume a uniform linear array of M omnidirectional microphones with element spacing δ. In the presence of isotropic (diffuse) noise, a super-directional beamformer is designed whose array gain in the end-fire direction reaches M². The array gain is defined as:

$$\zeta_{L,dn}[h(\omega)] = \frac{\left| h^H(\omega)\, d_L(\omega,\theta) \right|^2}{h^H(\omega)\, \Gamma_{dn}(\omega)\, h(\omega)}$$

$$\left[ \Gamma_{dn}(\omega) \right]_{ij} = \frac{\sin(\omega \delta_{ij}/c)}{\omega \delta_{ij}/c}$$

$$\delta_{ij} = (i - j)\,\delta$$

where d_L(ω, θ) is the array steering vector, θ is the desired direction, and c is the speed of sound.

The filter that maximizes the array gain ζ_{L,dn}[h(ω)] is:

$$h_{\max}(\omega) = \alpha\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)$$

where α is any complex number. Applying the distortionless constraint h^H(ω) d_L(ω, θ) = 1 yields the maximum signal-to-noise-ratio filter, i.e. the super-directional beamformer:

$$h_{SD}(\omega) = \frac{\Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}{d_L^H(\omega,\theta)\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}$$
in at least one embodiment of the present application, the method further comprises processing the super-directional beamformer to obtain a robust super-directional beamformer:
acquiring a white noise gain:
Figure BDA0003362019780000026
maximizing the directivity factor under the constraint of white noise gain is equivalent to minimizing the following equation:
Figure BDA0003362019780000027
wherein epsilon is a Lagrange multiplier, and a robust super-directional beam former is obtained through constraint of a distortion-free criterion:
Figure BDA0003362019780000028
in at least one embodiment of the present application, in the second step, the removing the engine mechanical noise in the second speech signal by using an adaptive filter to obtain a third speech signal includes:
obtaining an adaptive filter;
removing the engine mechanical noise in the second voice signal through an adaptive filter to obtain a third voice signal:
e(n)=d(n)-y(n)
Figure BDA0003362019780000031
wherein, x (N) is engine mechanical noise, d (N) is a second voice signal, h (N) is an adaptive filter, e (N) is a third voice signal, and N is a filter length.
In at least one embodiment of the present application, in step three, training the deep learning noise-reduction model comprises:

training the deep learning noise-reduction model with the mixed speech y output by the signal-processing module and the corresponding clean speech s:

extracting the acoustic features F of the mixed speech y:

$$F = \log(\mathrm{mel}(\mathrm{STFT}(y)))$$

where STFT is the short-time Fourier transform and mel(·) extracts Mel spectral features;

normalizing the acoustic features F:

$$F = (F - \mathrm{mean}) / \mathrm{var}$$

where mean is the mean of the training-set features and var is the standard deviation of the training-set features;

feeding the normalized acoustic features F into the deep learning noise-reduction model as input, and using the MSE error e between the model output s_ and the actual clean speech s to guide the update of the model parameters:

$$e = \left\| s\_ - \mathrm{ISTFT}\left( \mathrm{STFT}(y) \odot \mathrm{model}(F) \right) \right\|^2$$

where ISTFT is the inverse short-time Fourier transform, model(F) is the mask output by the model, and ⊙ denotes element-wise (dot) multiplication;

training continues until the value of the MSE error e stabilizes, at which point the deep learning noise-reduction model has converged and is saved.
In at least one embodiment of the present application, the method further comprises compressing the deep learning noise-reduction model, specifically:

applying 16-bit fixed-point quantization to all parameters of the deep learning noise-reduction model to accelerate computation;

applying SVD decomposition to the weight matrices to reduce the amount of computation.
The invention has at least the following beneficial technical effects:
the method for restraining the strong noise of the space-based finger-controlled environment can improve the quality of the collected voice data of the human voice, further improve the voice recognition accuracy of an operator and promote the engineering application of voice interaction related technology.
Drawings
FIG. 1 is a flow chart of the method for suppressing strong noise in a space-based command-and-control environment according to an embodiment of the present application;
FIG. 2 is a super-directional beam pattern according to an embodiment of the present application;
FIG. 3 is a flow diagram of an adaptive filter process according to an embodiment of the present application;
FIG. 4 is an adaptive filter according to an embodiment of the present application;
FIG. 5 is a frequency domain block adaptive filter according to an embodiment of the present application;
FIG. 6 is a flow chart of deep learning noise reduction model training according to an embodiment of the present application.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are a subset of the embodiments in the present application and not all embodiments in the present application. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In the description of the present application, it is to be understood that the terms "center", "longitudinal", "lateral", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience in describing the present application and for simplifying the description, and do not indicate or imply that the referenced device or element must have a particular orientation, be constructed in a particular orientation, and be operated, and therefore should not be construed as limiting the scope of the present application.
The present application is described in further detail below with reference to FIGS. 1 to 6.
The application provides a method for suppressing strong noise in a space-based command-and-control environment, as shown in FIG. 1, comprising the following steps:
step one, optimizing the microphone array for space-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directional beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise-reduction model, performing cockpit noise suppression on the third voice signal with the deep learning noise-reduction model to obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
In the method for suppressing strong noise in the space-based finger control environment, in the first step, optimizing a microphone array for space-based finger control includes: four microphones are uniformly distributed in a linear mode at the headset pickup position of the helmet of the driver and used for collecting voice control instructions of the pilot, and meanwhile, one microphone is arranged on the side face of the helmet of the driver and used for collecting mechanical noise of an aircraft engine. The voice signals collected by the microphone array are processed by the super-directional beam former, wind noise generated after the front end of the airplane is rubbed with air is filtered, and noise-reduced signals pointing to the mouth of a pilot can be obtained.
Super-directional beamforming is a type of fixed beamforming with a narrower main lobe and stronger noise suppression than conventional beamforming. In one embodiment of the present application, in step one, the super-directional beamformer is obtained as follows.

Assume a uniform linear array of M omnidirectional microphones with element spacing δ. In the presence of isotropic (diffuse) noise, a super-directional beamformer is designed whose array gain in the end-fire direction reaches M². The array gain is defined as:

$$\zeta_{L,dn}[h(\omega)] = \frac{\left| h^H(\omega)\, d_L(\omega,\theta) \right|^2}{h^H(\omega)\, \Gamma_{dn}(\omega)\, h(\omega)}$$

$$\left[ \Gamma_{dn}(\omega) \right]_{ij} = \frac{\sin(\omega \delta_{ij}/c)}{\omega \delta_{ij}/c}$$

$$\delta_{ij} = (i - j)\,\delta$$

where d_L(ω, θ) is the array steering vector, θ is the desired direction, and c is the speed of sound.

The filter that maximizes the array gain ζ_{L,dn}[h(ω)] is:

$$h_{\max}(\omega) = \alpha\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)$$

where α is any complex number. Applying the distortionless constraint h^H(ω) d_L(ω, θ) = 1 yields the maximum signal-to-noise-ratio filter, i.e. the super-directional beamformer:

$$h_{SD}(\omega) = \frac{\Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}{d_L^H(\omega,\theta)\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}$$
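For illustration only, the following minimal numerical sketch (not part of the original disclosure) builds the steering vector, the diffuse-noise coherence matrix, and the super-directional weights; the array spacing, analysis frequency, and speed of sound are assumed example values:

```python
import numpy as np

# Minimal sketch of the super-directional beamformer above for an assumed
# example ULA; M, delta, f and c are illustrative values, not from the patent.
M = 4            # microphones in the uniform linear array
delta = 0.02     # element spacing in metres (assumed)
c = 343.0        # speed of sound, m/s
f = 1000.0       # analysis frequency, Hz (assumed)
omega = 2.0 * np.pi * f

# End-fire (theta = 0) steering vector d_L(omega, theta)
m = np.arange(M)
d = np.exp(-1j * omega * m * delta / c)

# Diffuse-noise coherence matrix: Gamma_ij = sin(x)/x with x = omega*delta_ij/c
x = omega * (m[:, None] - m[None, :]) * delta / c
gamma = np.ones_like(x)
nz = x != 0
gamma[nz] = np.sin(x[nz]) / x[nz]

# Distortionless max-SNR filter: h = Gamma^{-1} d / (d^H Gamma^{-1} d)
gi_d = np.linalg.solve(gamma.astype(complex), d)
h = gi_d / (d.conj() @ gi_d)

# With h^H d = 1 the array gain reduces to 1 / (h^H Gamma h)
gain = 1.0 / np.real(h.conj() @ gamma @ h)
print(f"end-fire array gain: {gain:.2f} (end-fire limit M^2 = {M * M})")
```

As the spacing shrinks, the printed gain approaches the theoretical end-fire limit M²; the coherence matrix simultaneously becomes nearly singular, which is precisely what motivates the robust variant discussed next.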
the maximum snr filter is the super-directional beamformer (main lobe direction θ). But the super-directional beamformer has a white noise gain at low frequencies much less than 1 and therefore in practical applications will cause an amplification of the white noise, especially at low frequencies. Therefore, in this embodiment, it is preferable that the processing of the super-directional beamformer results in a robust super-directional beamformer:
acquiring a white noise gain:
Figure BDA0003362019780000062
maximizing the directivity factor under the constraint of white noise gain is equivalent to minimizing the following equation:
Figure BDA0003362019780000063
wherein epsilon is a Lagrange multiplier, and a robust super-directional beam former is obtained through constraint of a distortion-free criterion:
Figure BDA0003362019780000064
on the basis of the robust super-directional beam former, low-frequency sidelobe constraint is continuously added, so that the beam direction is narrower at low frequency, and the suppression on noise is enhanced. Through the beam forming processing of the stage, the integral signal-to-noise ratio is improved by about 10-15 dB.
FIG. 2 shows the super-directional beam pattern designed in this embodiment; it can be seen that the noise suppression capability of the super-directional beamformer is limited at low frequencies. Noise is therefore further suppressed by incorporating an adaptive noise cancellation method.
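To make the white-noise-gain trade-off concrete, the following sketch (same assumed array as the previous sketch; the ε values are illustrative) sweeps the diagonal loading and prints the resulting directivity factor and white noise gain:

```python
import numpy as np

# Sketch of the robust super-directional beamformer: the Lagrange multiplier
# epsilon enters as diagonal loading. Same assumed ULA as the previous sketch.
M, delta, c, f = 4, 0.02, 343.0, 1000.0
omega = 2.0 * np.pi * f
m = np.arange(M)
d = np.exp(-1j * omega * m * delta / c)                # end-fire steering vector
x = omega * (m[:, None] - m[None, :]) * delta / c
gamma = np.ones_like(x)
gamma[x != 0] = np.sin(x[x != 0]) / x[x != 0]          # diffuse-noise coherence

for eps in (0.0, 1e-3, 1e-1):
    a = (gamma + eps * np.eye(M)).astype(complex)
    gi_d = np.linalg.solve(a, d)
    h = gi_d / (d.conj() @ gi_d)                       # distortionless: h^H d = 1
    df = 1.0 / np.real(h.conj() @ gamma @ h)           # directivity factor
    wng = 1.0 / np.real(h.conj() @ h)                  # white noise gain
    print(f"eps={eps:g}: directivity factor={df:.2f}, white noise gain={wng:.4f}")
```

Larger ε trades directivity for robustness to sensor self-noise, which is the design choice the robust beamformer formalizes.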
In the method of the present application, under the aircraft-cockpit environment, wind noise occupies a relatively wide spectrum without obvious spectral features, whereas engine mechanical noise usually has obvious spectral features; this makes adaptive noise cancellation targeting engine mechanical noise in the cockpit feasible. In this method, the microphone mounted on the side of the pilot's helmet collects the reference noise; the collected reference noise contains both wind noise and engine mechanical noise, but it is the spectral structure of the engine mechanical noise that is critical for the learning and tracking of the adaptive filter. As shown in FIG. 3, the noisy signal captured by the microphone array at the helmet's headset position is the voice command mixed with the engine mechanical noise after it passes through an acoustic path H; as long as the changes in H can be accurately estimated and tracked, the engine mechanical noise picked up by the microphones can be effectively cancelled.
In this embodiment, in step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal comprises:

obtaining an adaptive filter;

removing the engine mechanical noise from the second voice signal with the adaptive filter to obtain the third voice signal:

$$e(n) = d(n) - y(n)$$

$$y(n) = \sum_{k=0}^{N-1} h_k(n)\, x(n-k)$$

where x(n) is the engine mechanical noise, d(n) is the second voice signal, h(n) is the adaptive filter, e(n) is the third voice signal, and N is the filter length. A block diagram of the adaptive cancellation process is shown in FIG. 4, with s(n) denoting the near-end speaker's speech.
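As a minimal sketch of this cancellation step: the application does not specify the time-domain adaptation rule, so a standard normalized-LMS (NLMS) update is assumed here for illustration:

```python
import numpy as np

# Time-domain sketch of the adaptive noise canceller. The patent's preferred
# embodiment uses a frequency-domain block filter (see below); a normalized
# LMS (NLMS) update is assumed here as the simplest adaptation rule.
def anc_nlms(d: np.ndarray, x: np.ndarray, n_taps: int = 64,
             mu: float = 0.1, eps: float = 1e-8) -> np.ndarray:
    """d: second voice signal (speech plus coupled engine noise),
    x: reference engine noise from the helmet-side microphone.
    Returns e(n) = d(n) - y(n), i.e. the third voice signal."""
    h = np.zeros(n_taps)                       # adaptive filter h(n)
    e = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        xn = x[n - n_taps + 1:n + 1][::-1]     # x(n), x(n-1), ..., x(n-N+1)
        y = h @ xn                             # y(n) = sum_k h_k(n) x(n-k)
        e[n] = d[n] - y
        h = h + mu * e[n] * xn / (xn @ xn + eps)   # NLMS coefficient update
    return e
```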
In a preferred embodiment of the present application, the complex real-world conditions of the cockpit require fast tracking and convergence of the filter. Time-domain processing usually introduces unacceptable delay, whereas frequency-domain processing offers lower computational complexity and faster convergence, and each frequency band can be controlled precisely during convergence to achieve overall optimal filtering. This embodiment therefore adopts the structure of a partitioned-block frequency-domain adaptive filter (PBFDAF) for adaptive noise cancellation, as shown in FIG. 5.
The coefficient vector of the time-domain adaptive filter can be written as:

$$w(n) = \left[ w_0(n)\ \cdots\ w_{M-1}(n) \right]^T$$

The time-domain error vector is:

$$e(n) = \left[ e(n)\ \cdots\ e(n+M-1) \right]^T$$

The input signal matrix in the frequency domain is:

$$X(k) = \mathrm{diag}\{ X_0(k)\ \cdots\ X_{2M-1}(k) \} = \mathrm{diag}\left\{ F \left[ x(kM-M)\ \cdots\ x(kM+M-1) \right]^T \right\}$$

The filter coefficients and error signal vector in the frequency domain are:

$$W(k) = \left[ W_0(k)\ \cdots\ W_{2M-1}(k) \right]^T = F \left[ w^T(kM)\ 0\ \cdots\ 0 \right]^T$$

$$E(k) = \left[ E_0(k)\ \cdots\ E_{2M-1}(k) \right]^T = F \left[ 0\ \cdots\ 0\ e^T(kM) \right]^T$$

where F and F⁻¹ are the 2M × 2M DFT and IDFT matrices, respectively. The frequency-domain adaptive iteration of the FDAF can thus be written as:

$$W(k+1) = W(k) + \mu(k)\, X^H(k)\, E(k)$$

where μ(k) = diag{μ₀(k) ⋯ μ₂ₘ₋₁(k)} is the normalized step-size matrix and Λ(k) = diag{P₀(k) ⋯ P₂ₘ₋₁(k)} is the input-signal power matrix; the step sizes μ_m(k) and power estimates P_m(k) are respectively:

$$\mu_m(k) = \frac{\mu_0}{P_m(k)}$$

$$P_m(k) = \lambda\, P_m(k-1) + (1-\lambda) \left| X_m(k) \right|^2$$

where μ₀ is a fixed step size and λ is a forgetting factor.
after the self-adaptive filter processing, the overall signal-to-noise ratio can be improved by about 15-20 dB.
In the method of the present application, cockpit noise in the third voice signal is further suppressed by the deep learning noise-reduction model. Single-channel noise-reduction algorithms fall into two broad categories: conventional methods and deep-learning-based methods. Conventional methods can be classified into parametric methods, non-parametric methods, and statistical-model-based methods; the typical representative of the deep-learning approach is the deep neural network (DNN). Conventional methods are simple, easy to implement, and computationally light, and they handle stationary noise well in high-signal-to-noise-ratio environments, but their performance degrades sharply at low signal-to-noise ratios and they handle non-stationary noise poorly. DNN-based noise reduction compensates for exactly these shortcomings, but it is a data-driven supervised-learning method: a large amount of data is needed to train the model to the expected performance, data preparation and training take more time, and the computational load is larger than that of conventional methods. A model-compression method is therefore used to reduce the computational load while preserving performance.
In a preferred embodiment of the present application, in step three, training the deep learning noise-reduction model comprises:

training the deep learning noise-reduction model with the mixed speech y output by the signal-processing module and the corresponding clean speech s (this module's input speech is the output of the signal-processing module, and its output speech is sent to the next module for subsequent processing):

extracting the acoustic features F of the mixed speech y:

$$F = \log(\mathrm{mel}(\mathrm{STFT}(y)))$$

where STFT is the short-time Fourier transform and mel(·) extracts Mel spectral features;

normalizing the acoustic features F:

$$F = (F - \mathrm{mean}) / \mathrm{var}$$

where mean is the mean of the training-set features and var is the standard deviation of the training-set features;

feeding the normalized acoustic features F into the deep learning noise-reduction model as input, and using the MSE error e between the model output s_ and the actual clean speech s to guide the update of the model parameters:

$$e = \left\| s\_ - \mathrm{ISTFT}\left( \mathrm{STFT}(y) \odot \mathrm{model}(F) \right) \right\|^2$$

where ISTFT is the inverse short-time Fourier transform, model(F) is the mask output by the model, and ⊙ denotes element-wise (dot) multiplication;

training continues until the value of the MSE error e stabilizes, at which point the deep learning noise-reduction model has converged and is saved.
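A minimal single-utterance training-step sketch of this procedure follows; the mask-estimating network, feature dimensions, STFT parameters, and optimizer are assumed placeholders (the application specifies none of them), while the loss mirrors the error expression e above:

```python
import torch
import torchaudio

# Sketch of one training step for the mask-based noise-reduction model.
# n_fft, hop, n_mels, the network, and the optimizer are assumed values.
n_fft, hop, n_mels = 512, 128, 64
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
model = torch.nn.Sequential(                      # stand-in mask estimator
    torch.nn.Linear(n_mels, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, n_fft // 2 + 1), torch.nn.Sigmoid())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(y: torch.Tensor, s: torch.Tensor,
               mean: torch.Tensor, var: torch.Tensor) -> float:
    """y: mixed speech, s: clean speech (1-D tensors of equal length);
    mean/var: precomputed training-set feature statistics, shape (n_mels, 1)."""
    Y = torch.stft(y, n_fft, hop_length=hop, return_complex=True)
    F = torch.log(mel(y) + 1e-8)                  # F = log(mel(STFT(y)))
    F = ((F - mean) / var).transpose(0, 1)        # normalize -> (frames, n_mels)
    mask = model(F).transpose(0, 1)               # mask -> (freq bins, frames)
    s_hat = torch.istft(Y * mask, n_fft, hop_length=hop, length=y.shape[0])
    loss = torch.mean((s_hat - s) ** 2)           # MSE error e
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```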
This embodiment further includes a model test. The mixed speech is processed by the signal-processing module to obtain y; features are extracted from y and input to the model to obtain the corresponding output mask; the mask is point-multiplied with the spectrum of y and the speech is reconstructed to obtain the noise-reduced signal, which is output to the next module. The model framework is shown in FIG. 6.
In the method of the present application, because the original model has a large number of parameters and mobile devices have limited computing resources, the model must be compressed to reduce the computational load. In this embodiment, the deep learning noise-reduction model is compressed as follows: 16-bit fixed-point quantization is applied to all model parameters to accelerate computation, and SVD decomposition is applied to the weight matrices to reduce the amount of computation. The computational load is greatly reduced while the loss in model performance is very small. Processing the signal from the preceding signal-processing stages through the model can additionally improve the signal-to-noise ratio by about 10 dB.
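The two compression steps can be sketched as follows; the matrix shape, retained rank, and Q3.12 fixed-point format are assumed example values, and the random matrix merely stands in for a trained weight matrix:

```python
import numpy as np

# Sketch of the SVD compression step: a dense weight matrix W (m x n) is
# replaced by a rank-r factorization. Shapes, rank, and the Q3.12 fixed-point
# format are assumed example values.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))          # stand-in model weight matrix
U, svals, Vt = np.linalg.svd(W, full_matrices=False)

r = 64                                       # retained rank (assumed)
A = U[:, :r] * svals[:r]                     # (m x r) factor
B = Vt[:r, :]                                # (r x n) factor

x_in = rng.standard_normal(512)
y_full = W @ x_in                            # original layer: m*n multiplies
y_low = A @ (B @ x_in)                       # factorized: r*(m+n) multiplies
print("relative error:", np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))

# 16-bit fixed-point quantization of a factor (Q3.12 format assumed)
scale = 2 ** 12
A_q = np.clip(np.round(A * scale), -32768, 32767).astype(np.int16)
```

Replacing an m×n layer by rank-r factors cuts the multiply count from m·n to r·(m+n), which is the source of the computational saving; trained weight matrices are typically closer to low-rank than the random stand-in, so the approximation error is smaller in practice.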
The method for suppressing strong noise in a space-based command-and-control environment of the present application can improve noise suppression by about 30–45 dB overall, with an optimization rate of up to 100%.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A method for suppressing strong noise in a space-based command-and-control environment, characterized by comprising the following steps:
step one, optimizing the microphone array for space-based command and control, acquiring a first voice signal and engine mechanical noise through the microphone array, and processing the first voice signal with a super-directional beamformer to obtain a noise-reduced second voice signal steered toward the pilot's mouth;
step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal;
step three, training a deep learning noise-reduction model, performing cockpit noise suppression on the third voice signal with the deep learning noise-reduction model to obtain a fourth voice signal, and sending the fourth voice signal to a recognition engine.
2. The method for suppressing strong noise in a space-based command-and-control environment according to claim 1, wherein in step one, optimizing the microphone array for space-based command and control comprises: arranging four microphones uniformly in a line at the headset pickup position of the pilot's helmet, and mounting one microphone on the side of the pilot's helmet.
3. The method according to claim 1, wherein in step one, the super-directional beamformer is obtained as follows:

assume a uniform linear array of M omnidirectional microphones with element spacing δ; in the presence of isotropic noise, a super-directional beamformer is designed whose array gain in the end-fire direction reaches M², the array gain being defined as:

$$\zeta_{L,dn}[h(\omega)] = \frac{\left| h^H(\omega)\, d_L(\omega,\theta) \right|^2}{h^H(\omega)\, \Gamma_{dn}(\omega)\, h(\omega)}$$

$$\left[ \Gamma_{dn}(\omega) \right]_{ij} = \frac{\sin(\omega \delta_{ij}/c)}{\omega \delta_{ij}/c}$$

$$\delta_{ij} = (i - j)\,\delta$$

where d_L(ω, θ) is the array steering vector, θ is the desired direction, and c is the speed of sound;

the filter that maximizes the array gain ζ_{L,dn}[h(ω)] is:

$$h_{\max}(\omega) = \alpha\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)$$

where α is any complex number; applying the distortionless constraint yields the maximum signal-to-noise-ratio filter, i.e. the super-directional beamformer:

$$h_{SD}(\omega) = \frac{\Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}{d_L^H(\omega,\theta)\, \Gamma_{dn}^{-1}(\omega)\, d_L(\omega,\theta)}$$
4. The method according to claim 3, further comprising processing the super-directional beamformer to obtain a robust super-directional beamformer:

acquiring the white noise gain:

$$\mathcal{W}[h(\omega)] = \frac{\left| h^H(\omega)\, d_L(\omega,\theta) \right|^2}{h^H(\omega)\, h(\omega)}$$

maximizing the directivity factor under a white-noise-gain constraint, which is equivalent to minimizing:

$$h^H(\omega)\left[ \Gamma_{dn}(\omega) + \epsilon I \right] h(\omega)$$

where ε is a Lagrange multiplier; applying the distortionless constraint yields the robust super-directional beamformer:

$$h_R(\omega) = \frac{\left[ \Gamma_{dn}(\omega) + \epsilon I \right]^{-1} d_L(\omega,\theta)}{d_L^H(\omega,\theta) \left[ \Gamma_{dn}(\omega) + \epsilon I \right]^{-1} d_L(\omega,\theta)}$$
5. The method according to claim 4, wherein in step two, removing the engine mechanical noise from the second voice signal with an adaptive filter to obtain a third voice signal comprises:

obtaining an adaptive filter;

removing the engine mechanical noise from the second voice signal with the adaptive filter to obtain the third voice signal:

$$e(n) = d(n) - y(n)$$

$$y(n) = \sum_{k=0}^{N-1} h_k(n)\, x(n-k)$$

where x(n) is the engine mechanical noise, d(n) is the second voice signal, h(n) is the adaptive filter, e(n) is the third voice signal, and N is the filter length.
6. The method for suppressing strong noise in a space-based command-and-control environment according to claim 1, wherein in step three, training the deep learning noise-reduction model comprises:

training the deep learning noise-reduction model with the mixed speech y output by the signal-processing module and the corresponding clean speech s:

extracting the acoustic features F of the mixed speech y:

$$F = \log(\mathrm{mel}(\mathrm{STFT}(y)))$$

where STFT is the short-time Fourier transform and mel(·) extracts Mel spectral features;

normalizing the acoustic features F:

$$F = (F - \mathrm{mean}) / \mathrm{var}$$

where mean is the mean of the training-set features and var is the standard deviation of the training-set features;

feeding the normalized acoustic features F into the deep learning noise-reduction model as input, and using the MSE error e between the model output s_ and the actual clean speech s to guide the update of the model parameters:

$$e = \left\| s\_ - \mathrm{ISTFT}\left( \mathrm{STFT}(y) \odot \mathrm{model}(F) \right) \right\|^2$$

where ISTFT is the inverse short-time Fourier transform, model(F) is the mask output by the model, and ⊙ denotes element-wise (dot) multiplication;

training until the value of the MSE error e stabilizes, at which point the deep learning noise-reduction model has converged and is saved.
7. The method for suppressing strong noise in a space-based command-and-control environment according to claim 6, further comprising compressing the deep learning noise-reduction model, specifically:

applying 16-bit fixed-point quantization to all parameters of the deep learning noise-reduction model to accelerate computation;

applying SVD decomposition to the weight matrices to reduce the amount of computation.
CN202111370832.9A 2021-11-18 2021-11-18 Method for suppressing strong noise in a space-based command-and-control environment (Pending) CN114220450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111370832.9A CN114220450A (en) Method for suppressing strong noise in a space-based command-and-control environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111370832.9A CN114220450A (en) Method for suppressing strong noise in a space-based command-and-control environment

Publications (1)

Publication Number Publication Date
CN114220450A true CN114220450A (en) 2022-03-22

Family

ID=80697615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111370832.9A Pending CN114220450A (en) 2021-11-18 2021-11-18 Method for restraining strong noise of space-based finger-controlled environment

Country Status (1)

Country Link
CN (1) CN114220450A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273850A (en) * 2022-09-28 2022-11-01 科大讯飞股份有限公司 Autonomous mobile equipment voice control method and system


Similar Documents

Publication Publication Date Title
US9002027B2 (en) Space-time noise reduction system for use in a vehicle and method of forming same
US10123113B2 (en) Selective audio source enhancement
US8583428B2 (en) Sound source separation using spatial filtering and regularization phases
CN107993670B (en) Microphone array speech enhancement method based on statistical model
US9197975B2 (en) System for detecting and reducing noise via a microphone array
KR101339592B1 (en) Sound source separator device, sound source separator method, and computer readable recording medium having recorded program
CN111261138B (en) Noise reduction system determination method and device, and noise processing method and device
CN110517701B (en) Microphone array speech enhancement method and implementation device
KR20050115857A (en) System and method for speech processing using independent component analysis under stability constraints
US9078057B2 (en) Adaptive microphone beamforming
Li et al. Geometrically constrained independent vector analysis for directional speech enhancement
CN112349292B (en) Signal separation method and device, computer readable storage medium and electronic equipment
US20180308503A1 (en) Real-time single-channel speech enhancement in noisy and time-varying environments
CN114220450A (en) Method for suppressing strong noise in a space-based command-and-control environment
Li et al. Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis.
WO2023108864A1 (en) Regional pickup method and system for miniature microphone array device
Priyanka et al. Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement
CN113658605B (en) Speech enhancement method based on deep learning assisted RLS filtering processing
US11721353B2 (en) Spatial audio wind noise detection
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program
CN113838472A (en) Voice noise reduction method and device
Song et al. Drone ego-noise cancellation for improved speech capture using deep convolutional autoencoder assisted multistage beamforming
US11282531B2 (en) Two-dimensional smoothing of post-filter masks
Li et al. An overview of speech dereverberation
CN116206603A (en) Voice control method and system for transfer robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination