CN115132219A

CN115132219A - Speech recognition method and system based on quadratic spectral subtraction under complex noise background

Info

Publication number: CN115132219A
Application number: CN202210711617.9A
Authority: CN
Inventors: 邵鹏志; 谢志豪; 王乃正; 孟英谦; 彭龙; 李胜昌; 宋彪; 邬书豪; 李泽宇; 张世超; 魏中锐; 任智颖; 葛祥雨; 胡明哲; 霸建民; 高圣楠; 张敏
Original assignee: China North Computer Application Technology Research Institute
Current assignee: China North Computer Application Technology Research Institute
Priority date: 2022-06-22
Filing date: 2022-06-22
Publication date: 2022-09-30

Abstract

The invention relates to a speech recognition method and system based on quadratic spectral subtraction under a complex noise background, and belongs to the technical field of speech enhancement. The method comprises the following steps: selecting noisy historical audio and pure noise audio recorded under a complex noise background, and computing a historical noise estimate from them; framing the audio to be recognized under the complex noise background to obtain multiple audio frames; and processing each frame in sequence: determining a historical noise removal factor and a current-frame noise removal factor based on the historical noise estimate and the noise estimate of the current frame audio, and performing secondary spectral subtraction on the current frame audio to obtain its noise-reduced speech spectrum. The method addresses the problem that, for complex real-world background noise, the prior art cannot reduce residual noise to a low level.

Description

Speech recognition method and system based on quadratic spectral subtraction under complex noise background

Technical Field

The invention belongs to the technical field of speech enhancement, and particularly relates to a speech recognition method and system based on secondary spectral subtraction under a complex noise background.

Background

Spectral subtraction is one of the speech enhancement algorithms. Speech enhancement is a technique for extracting the useful speech signal from a noisy speech signal, and for suppressing and reducing noise interference, when the speech signal is disturbed or even buried by various noises. However, noise interference is usually random, and it is almost impossible to extract a completely clean speech signal from noisy speech. Speech enhancement therefore has two main purposes: first, to improve speech quality by removing background noise, which is a subjective measure; and second, to improve the performance of speech tasks such as speech recognition and speaker recognition, which is an objective measure. These two purposes often cannot both be met. For example, some speech enhancement algorithms significantly reduce background noise and improve speech quality, yet do not improve speech recognition performance and may even degrade it slightly.

Spectral subtraction is a traditional and effective speech enhancement algorithm for broadband noise. Its basic idea is to subtract the noise power spectrum from the power spectrum of the noisy speech signal, under the assumption that the additive noise and the short-time stationary speech signal are independent of each other, so as to obtain a cleaner speech spectrum. Its outstanding advantage is low computational complexity, which makes it suitable for real-time processing. Its disadvantage is that the processed signal retains a relatively large amount of residual noise, referred to as musical noise.

In order to attenuate the musical noise caused by spectral subtraction, Berouti proposed a spectral subtraction algorithm that uses a noise removal factor to reduce the amplitude of the broadband spectral peaks remaining after spectral subtraction, and fills the spectral valleys (negative results of the subtraction) with a minimum audio energy, thereby controlling the amount of residual noise and the magnitude of the musical noise. The over-subtraction spectral subtraction expression is as follows:

wherein $P_y(\omega)$, $P_s(\omega)$ and $P_n(\omega)$ represent the power spectra of the noisy signal, the clean speech signal and the noise signal, respectively; $\alpha$ is the noise removal factor, i.e. the coefficient applied to the noise spectrum when it is subtracted from the audio spectrum; and $b$ represents the minimum audio energy retained in the audio.

Both spectral subtraction and over-subtraction are valid only in a stationary background noise environment, i.e. when the noise affects all spectral components of the speech equally. However, real-world background noise varies with time, and different interfering noises affect each frequency band of the speech differently, so over-subtraction still cannot reduce the residual noise to a low level.
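The over-subtraction rule described above can be illustrated with a short sketch. Since the expression itself is not reproduced in this text, the code assumes the commonly cited form of Berouti's rule: subtract $\alpha P_n(\omega)$ from $P_y(\omega)$ and, wherever the result falls below a floor, use the floor instead. Taking the floor as $b \cdot P_n(\omega)$ is an assumption, as are the function name `over_subtract` and the sample values.

```python
import numpy as np

def over_subtract(p_noisy, p_noise, alpha=3.0, b=0.02):
    """Over-subtraction with a spectral floor, applied per frequency bin.

    p_noisy : power spectrum of the noisy signal, P_y(w)
    p_noise : estimated noise power spectrum, P_n(w)
    alpha   : noise removal (over-subtraction) factor
    b       : floor factor; flooring against b * P_n is an assumption
    """
    p_noisy = np.asarray(p_noisy, dtype=float)
    p_noise = np.asarray(p_noise, dtype=float)
    diff = p_noisy - alpha * p_noise   # may go negative in spectral valleys
    floor = b * p_noise                # minimum energy kept in each bin
    return np.where(diff > floor, diff, floor)

# Three bins with the same noise level but decreasing signal power.
p_y = np.array([10.0, 4.0, 1.0])
p_n = np.array([1.0, 1.0, 1.0])
p_s = over_subtract(p_y, p_n, alpha=3.0, b=0.02)
```

In the third bin the subtraction would be negative, so the floor value is used. A fixed `alpha` and `b` treat every frame alike, which is the limitation the paragraph above points out for time-varying noise.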

Disclosure of Invention

In view of the above analysis, the present invention aims to provide a speech recognition method based on quadratic spectral subtraction under a complex noise background. The method estimates noise from noisy historical audio and from the current audio, determines a historical noise removal factor and a current-frame noise removal factor by calculation, and performs secondary spectral subtraction on the noisy audio to be recognized, so as to solve the problem that real-world background noise is complex and the prior art cannot reduce the residual noise to a low level.

In one aspect, the invention provides a speech recognition method based on secondary spectral subtraction under a complex noise background, which comprises the following steps:

obtaining a historical noise estimation of a complex noise background based on a noisy historical audio and a clean noise audio under the complex noise background;

performing framing processing on the audio to be identified under the complex noise background to obtain multi-frame audio;

processing each frame of audio in sequence to obtain noise-reduced speech; wherein processing the current frame audio includes: performing secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

Further, the obtaining of the historical noise estimate of the complex noise background based on the noisy historical audio and the clean noise audio under the complex noise background includes:

framing each piece of the historical audio containing noise, and processing to obtain a power spectrum of each frame signal of the historical audio containing noise;

selecting, for each audio, the preset number of frames with the lowest power spectrum as pure noise, and estimating the per-frame average noise power spectrum $B_i(\omega)$ of each noisy historical audio, where $i = 1, 2, 3, \ldots, n$ and $n$ is the number of noisy historical audios;

framing each pure noise audio, and processing to obtain the per-frame average noise power spectrum $B_j(\omega)$ of each pure noise audio, where $j = 1, 2, 3, \ldots, k$ and $k$ is the number of pure noise audios;

averaging $B_i(\omega)$ and $B_j(\omega)$ to obtain the historical noise estimate.
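The steps above can be sketched as follows, assuming the power spectra of each recording are already available as arrays of shape (frames, frequency bins). Ranking frames by their total power and giving every recording equal weight in the final average are assumptions; the text only says "lowest power spectrum" and "averaging". The function names are hypothetical.

```python
import numpy as np

def lowest_power_noise(power_frames, n_lowest=100):
    """Average the n_lowest frames with the smallest total power.

    power_frames : array of shape (num_frames, num_bins)
    Returns one noise power spectrum of shape (num_bins,), i.e. B_i(w).
    """
    power_frames = np.asarray(power_frames, dtype=float)
    n = min(n_lowest, power_frames.shape[0])
    frame_energy = power_frames.sum(axis=1)      # ranking criterion (assumed)
    quietest = np.argsort(frame_energy)[:n]      # indices of the quietest frames
    return power_frames[quietest].mean(axis=0)

def historical_noise_estimate(noisy_recordings, noise_recordings, n_lowest=100):
    """Equal-weight average of all B_i (noisy history) and B_j (pure noise)."""
    b_i = [lowest_power_noise(p, n_lowest) for p in noisy_recordings]
    b_j = [np.asarray(p, dtype=float).mean(axis=0) for p in noise_recordings]
    return np.mean(b_i + b_j, axis=0)

# One noisy recording (3 frames, 2 bins) and one pure noise recording (2 frames).
noisy = [np.array([[1.0, 1.0], [9.0, 9.0], [3.0, 3.0]])]
noise = [np.array([[2.0, 2.0], [4.0, 4.0]])]
b_hist = historical_noise_estimate(noisy, noise, n_lowest=2)
```

Here the loud middle frame of the noisy recording is excluded, so its noise spectrum comes from the two quiet frames only.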

Further, the noise estimation of the current frame audio includes:

selecting the preset number of frames with the lowest power spectrum in the audio to be recognized as pure noise, and estimating the per-frame average noise power spectrum of the audio to be recognized, which is the noise estimate of the current frame audio.

Further, performing secondary spectral subtraction on the current frame audio by using the following formula to obtain a noise-reduced speech spectrum of the current frame audio:

wherein the quantity obtained represents the power spectrum estimate of the current frame audio; $Y_{n+1}(\omega, m)$ represents the frequency spectrum of the current frame audio; $\psi_{n+1}(\omega, m)$ represents the phase information of the current frame audio; $\alpha_m$ and $\beta_m$ are the historical noise removal factor and the current frame audio noise removal factor, respectively; and $b_m$ is the minimum spectral factor of the audio signal.
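Because the formula is not reproduced in this text, the following is only one plausible reading of the secondary spectral subtraction, written from the symbol definitions above: both noise estimates are subtracted from the frame's power spectrum, each scaled by its own removal factor, the result is floored, and the original phase is reattached. The exact combination, the choice of $b_m \cdot |Y|^2$ as the floor, and all names in the code are assumptions.

```python
import numpy as np

def secondary_spectral_subtraction(y_spec, b_hist, b_cur, alpha, beta, b_floor):
    """Subtract two noise estimates from one frame (assumed form).

    y_spec  : complex spectrum of the current frame, Y_{n+1}(w, m)
    b_hist  : historical noise power spectrum
    b_cur   : noise power spectrum estimated from the current audio
    alpha   : historical noise removal factor, alpha_m
    beta    : current-frame noise removal factor, beta_m
    b_floor : minimum spectral factor, b_m
    """
    y_spec = np.asarray(y_spec, dtype=complex)
    p_y = np.abs(y_spec) ** 2                     # frame power spectrum
    psi = np.angle(y_spec)                        # phase is kept unchanged
    diff = (p_y
            - alpha * np.asarray(b_hist, dtype=float)
            - beta * np.asarray(b_cur, dtype=float))
    p_hat = np.maximum(diff, b_floor * p_y)       # spectral floor (assumed form)
    return np.sqrt(p_hat) * np.exp(1j * psi)      # magnitude with original phase

y = np.array([3.0 + 4.0j, 1.0 + 0.0j])
x_hat = secondary_spectral_subtraction(
    y, b_hist=[2.0, 2.0], b_cur=[1.0, 1.0], alpha=2.0, beta=3.0, b_floor=0.1
)
```

The first bin keeps 25 - 4 - 3 = 18 units of power; the second would go negative and is floored at 0.1.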

Further, $\alpha_m$, $\beta_m$ and $b_m$ are calculated by the following formula:

wherein $c$ is a constant; $\xi_m$ is the frequency-domain posterior signal-to-noise ratio of the current frame audio signal; $\alpha_{min}$ and $\alpha_{max}$ represent the minimum and maximum values of $\alpha_m$, respectively; $\beta_{min}$ and $\beta_{max}$ represent the minimum and maximum values of $\beta_m$, respectively; and $b_{min}$ and $b_{max}$ represent the minimum and maximum values of $b_m$, respectively.

Further, $\xi_m$ is calculated by the following formula:

wherein $k$ is the frequency bin index, $\sum_k |Y_{n+1}(\omega_k, m)|$ represents the audio spectral intensity of the current frame, and the remaining term represents the spectral intensity of the historical noise estimate.

Further, the maximum and minimum values of $\alpha_m$, $\beta_m$ and $b_m$ are limited, including: $\alpha_{max} = 3$, $\alpha_{min} = 1$, $\beta_{max} = 3$, $\beta_{min} = 1$, $b_{max} = 0.1$, $b_{min} = 0.02$.

In another aspect, the invention also provides a speech recognition system based on quadratic spectral subtraction under a complex noise background, which comprises:

a historical noise estimation module, configured to obtain a historical noise estimate of the complex noise background based on noisy historical audio and pure noise audio under the complex noise background;

a noise reduction processing module for the audio to be recognized, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, and to process each frame in sequence to obtain noise-reduced speech; wherein processing the current frame audio includes: performing secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

Further, the historical noise estimation module includes the following modules:

a noisy historical audio processing module, configured to frame each noisy historical audio and process it to obtain the power spectrum of each frame signal of the noisy historical audio; and to select, for each audio, the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum $B_n(\omega)$ of each noisy historical audio, wherein $n$ denotes the $n$-th noisy historical audio;

a pure noise audio processing module, configured to frame each pure noise audio and process it to obtain the per-frame average noise power spectrum $B_k(\omega)$ of each pure noise audio, wherein $k$ denotes the $k$-th pure noise audio;

a noise estimation module, configured to average $B_k(\omega)$ and $B_n(\omega)$ to obtain the historical noise estimate.

Further, the noise reduction processing module for the audio to be recognized includes:

a current audio noise estimation module, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, select the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum of the audio to be recognized, which is the noise estimate of the current frame audio;

a noise reduction processing module, configured to perform secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

The invention can realize at least one of the following beneficial effects:

1. A large number of historical noisy audios with complex noise backgrounds from different scenes are collected and their noise is weighted and averaged to obtain a historical noise estimate, and specific frames are extracted from the current audio to be recognized to obtain a current noise estimate. This addresses the problem that, because complex noise backgrounds differ between scenes, a noise estimate may deviate from the noise actually present in the audio to be recognized.

2. The historical noise removal factor and the current frame audio noise removal factor are determined by calculation, and secondary spectral subtraction is performed on the noisy audio to be recognized, which improves the accuracy of noise reduction for the audio to be recognized.

3. Each time, the noise estimate of the processed audio is iterated into the historical noise estimate, so that the historical noise estimate is continuously optimized and the accuracy of noise reduction by secondary spectral subtraction is further improved.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram showing the detailed steps of the method of the present invention;

FIG. 3 is a comparison of waveforms before and after processing a piece of audio according to an embodiment of the present invention.

Detailed Description

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.

Method embodiment

Example 1

The invention discloses a speech recognition method based on quadratic spectral subtraction under complex noise background, which comprises the following steps:

Step S1: selecting noisy historical audio and pure noise audio under a complex noise background, and obtaining a historical noise estimate through calculation.

A complex noise background means that the speech signal environment contains various noises that are present regardless of whether a signal is present and that change continuously. Noisy historical audio refers to audio that contains both complex noise and a clean speech signal, and pure noise audio refers to noise audio captured in a complex noise background that contains no speech signal. The method is suitable for steady-state noise speech enhancement scenarios in the military field, such as speech recognition while an armored vehicle is operating; in that case, the noisy historical audio is audio containing the noise generated by the operating armored vehicle together with a clean speech signal, and the pure noise audio is the noise generated by the operating armored vehicle.

Specifically, step S1 includes the following steps:

Step S11: selecting $n$ noisy historical audios, indexed by $i = 1, 2, 3, \ldots, n$, and framing the $i$-th noisy audio, illustratively with a frame length of 25 ms and a frame shift of 10 ms; the clean audio, the noisy audio and the noise signal at the $m$-th frame of the $i$-th audio are denoted by $X(i, m)$, $Y(i, m)$ and $B(i, m)$, respectively, and the corresponding frequency spectra are denoted by $X(\omega, m)$, $Y(\omega, m)$ and $B(\omega, m)$, respectively;

Step S12: performing a short-time Fourier transform on each frame of audio to obtain the power spectrum $X_i(\omega, m)$ and the phase spectrum $\psi_i(\omega, m)$ of each frame signal;

Step S13: selecting the 100 frames with the lowest power spectrum $X_i(\omega, m)$ over the entire audio as pure noise, and estimating the per-frame average noise power spectrum $B_i(\omega)$ of the audio to be denoised;

Step S14: selecting $k$ pure noise audios, indexed by $j = 1, 2, 3, \ldots, k$, framing the $j$-th pure noise audio with a frame length of 25 ms and a frame shift of 10 ms, performing a short-time Fourier transform on each frame of audio, and calculating a weighted power spectrum to obtain the per-frame average noise power spectrum $B_j(\omega)$;

Step S15: averaging $B_i(\omega)$ and $B_j(\omega)$ to obtain the historical noise estimate.
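The framing and transform of steps S11 and S12 can be sketched with NumPy as below. The 25 ms frame length and 10 ms shift come from the text; the 16 kHz sample rate of the test signal, the Hann window, dropping the incomplete tail, and the function names are assumptions made for the illustration.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=25, shift_ms=10):
    """Cut a 1-D signal into overlapping frames; the incomplete tail is dropped."""
    x = np.asarray(x, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(x) - frame_len) // shift
    # Row r holds samples r*shift .. r*shift + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + shift * np.arange(num_frames)[:, None]
    return x[idx]

def power_and_phase(frames):
    """Per-frame power spectrum and phase spectrum via a real FFT."""
    window = np.hanning(frames.shape[1])          # window choice is an assumption
    spec = np.fft.rfft(frames * window, axis=1)
    return np.abs(spec) ** 2, np.angle(spec)

# One second of synthetic noise at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
frames = frame_signal(signal, sample_rate=16000)
power, phase = power_and_phase(frames)
```

At 16 kHz this gives 400-sample frames shifted by 160 samples, so one second yields 98 frames of 201 frequency bins each. The `power` array is what steps S13 and S14 would then average.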

Step S2: framing the audio to be recognized under the complex noise background, and processing each frame of audio in sequence to obtain noise-reduced speech, which specifically comprises the following steps:

Step S21: letting the serial number of the audio to be recognized be $n+1$, i.e. $i = n+1$, executing the same operations as steps S11, S12 and S13, and estimating the per-frame average noise power spectrum of the audio to be recognized;

Step S22: calculating the parameters required for performing secondary spectral subtraction on the current frame audio, namely the values of $\xi_m$, $\alpha_m$, $\beta_m$ and $b_m$; wherein $\xi_m$ is the frequency-domain posterior signal-to-noise ratio of the current frame audio signal; $\alpha_m$ and $\beta_m$ are the historical noise removal factor and the current frame audio noise removal factor, respectively; and $b_m$ is the minimum spectral factor of the audio signal;

wherein $k$ is the frequency bin index, $\sum_k |Y_{n+1}(\omega_k, m)|$ represents the audio spectral intensity of the current frame, and the remaining term represents the spectral intensity of the historical noise estimate;

Step S23: performing secondary spectral subtraction on the current frame of the audio to be recognized by using the following formula to obtain the noise-reduced speech spectrum of the current frame audio:

It should be noted that, the next time an audio is recognized, the current audio to be recognized is added to the historical noisy audio as the $(n+1)$-th audio, and its noise estimate is iterated into the historical noise estimate, thereby optimizing the historical noise estimate.
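The iteration of a new noise estimate into the historical estimate can be pictured as an incremental mean. The text does not give the update rule, so the equal-weight running average below, the bookkeeping variable `count` (the number of spectra already averaged), and the function name are all assumptions.

```python
def update_historical_estimate(hist_estimate, count, new_estimate):
    """Fold one new noise spectrum into a running mean (assumed update rule).

    hist_estimate : current historical noise estimate, one value per bin
    count         : number of spectra already averaged into hist_estimate
    new_estimate  : noise estimate of the audio that was just processed
    Returns the updated estimate and the updated count.
    """
    if len(hist_estimate) != len(new_estimate):
        raise ValueError("spectra must have the same number of bins")
    new_count = count + 1
    updated = [h + (b - h) / new_count
               for h, b in zip(hist_estimate, new_estimate)]
    return updated, new_count

# Historical estimate built from 3 spectra, then one new spectrum arrives.
hist, n_seen = update_historical_estimate([2.0, 4.0], 3, [6.0, 0.0])
```

Updating in this form avoids storing every past spectrum: only the running estimate and a counter are kept between recognitions.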

Example 2

The invention further discloses a speech recognition method based on quadratic spectral subtraction under a complex noise background, which comprises the following steps:

Step S4: selecting, as noisy historical audio, 100 noisy human voice audios recorded on site at an armored vehicle training field, each with a sampling rate of 44 kHz and a length of 1 minute; the background noise scenes include: engine noise alone while the armored vehicle is stationary, engine noise while the armored vehicle accelerates, the noise of tracks and wheels striking each other or the ground while the vehicle is moving, vehicle running noise while the armored vehicle decelerates, and the like; selecting, as pure noise audio, 300 armored vehicle noise recordings made on site at the training field, each with a sampling rate of 44 kHz and a length of 1 minute; and obtaining the historical noise estimate through calculation, which specifically comprises the following steps:

Step S41: framing each noisy audio with a frame length of 25 ms and a frame shift of 10 ms; the clean audio, the noisy audio and the noise signal at the $m$-th frame of the $i$-th audio ($i = 1, 2, 3, \ldots, 100$) are denoted by $X(i, m)$, $Y(i, m)$ and $B(i, m)$, respectively, and the corresponding frequency spectra are denoted by $X(\omega, m)$, $Y(\omega, m)$ and $B(\omega, m)$, respectively;

Step S42: performing a short-time Fourier transform on each frame of audio to obtain the power spectrum $X_i(\omega, m)$ and the phase spectrum $\psi_i(\omega, m)$ of each frame signal;

Step S43: selecting the 100 frames with the lowest power spectrum $X_i(\omega, m)$ over the entire audio as pure noise, and estimating the per-frame average noise power spectrum $B_i(\omega)$ of the audio to be denoised;

Step S44: framing each pure noise audio with a frame length of 25 ms and a frame shift of 10 ms, performing a short-time Fourier transform on each frame of audio, and calculating a weighted power spectrum to obtain the per-frame average noise power spectrum $B_j(\omega)$ ($j = 1, 2, 3, \ldots, 80$);

Step S45: averaging $B_i(\omega)$ and $B_j(\omega)$ to obtain the historical noise estimate.

Step S5: selecting, as audio to be recognized, 50 noisy voice audios recorded on site at the armored vehicle training field, each with a sampling rate of 44 kHz and a length of 1 minute; framing the audio to be recognized, and processing each frame of audio in sequence to obtain noise-reduced speech, which specifically comprises the following steps:

Step S51: letting the serial number of the current audio to be recognized be $n+1$, i.e. $i = n+1$, executing steps S41, S42 and S43, and estimating the per-frame average noise power spectrum of the audio to be recognized;

Step S52: calculating the parameters required for performing secondary spectral subtraction on the current frame audio, namely the values of $\xi_m$, $\alpha_m$, $\beta_m$ and $b_m$; wherein $\xi_m$ is the frequency-domain posterior signal-to-noise ratio of the current frame audio signal; $\alpha_m$ and $\beta_m$ are the historical noise removal factor and the current frame audio noise removal factor, respectively; $b_m$ is the minimum spectral factor of the audio signal; and the remaining term represents the spectral intensity of the historical noise estimate;

Step S53: performing secondary spectral subtraction on the current frame of the audio to be recognized by using the following formula to obtain the noise-reduced speech spectrum of the current frame audio:

Step S54: iterating the noise estimate of the current audio to be recognized into the historical noise estimate, thereby optimizing the historical noise estimate;

Step S55: executing steps S51, S52, S53 and S54 on the next audio to be recognized, until all 50 voice audios to be recognized have been processed.

The noise reduction effect of the embodiment is verified by adopting the following method:

The speaker's voice in the 50 voice audios to be recognized is labeled with a labeling tool using 10 ms time slots: each time slot is labeled 1 if speech is present and 0 otherwise. Each audio file corresponds to one label file.

For each audio, the audio length is fixed, so the noise-reduced result has the same length as the label data. The result data is compared with the label data, the number $a$ of time slots whose contents differ is counted, and the error rate is computed against the total labeled length $b$ of the audio as $e = a/b$.
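The error rate $e = a/b$ can be computed as in the sketch below. It assumes the noise-reduced result has already been turned into one 0/1 speech decision per 10 ms slot; the text does not say how that decision is made, and the function name and sample labels are illustrative only.

```python
def error_rate(predicted, labels):
    """Fraction of time slots where the prediction differs from the label.

    predicted : 0/1 speech decisions derived from the noise-reduced audio
    labels    : 0/1 manual annotations, one per 10 ms slot
    """
    if len(predicted) != len(labels):
        raise ValueError("prediction and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    a = sum(1 for p, t in zip(predicted, labels) if p != t)  # differing slots
    b = len(labels)                                          # total labeled slots
    return a / b

# Five slots, one of which disagrees with the annotation.
e = error_rate([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
accuracy = 1.0 - e
```

Accuracy is the complement of the error rate, so an error rate of 14.92% corresponds to an accuracy of about 85.1%.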

According to the calculation, with the speech recognition method based on quadratic spectral subtraction under a complex noise background disclosed in this embodiment, the error rate after denoising the audio to be recognized is 14.92% and the accuracy is 85.1%; with the traditional power-based spectral subtraction method, the error rate after denoising the 50 audios to be recognized is 19.10% and the accuracy is 80.9%. Compared with classical spectral subtraction, the speech recognition method of this embodiment therefore improves accuracy by about 4.2 percentage points.

FIG. 3 compares the waveforms before and after processing a piece of audio with the speech recognition method of this embodiment. The waveforms before and after noise reduction can be compared at the marked positions in the figure, where the removal of background noise is evident.

System embodiment

A speech recognition system based on quadratic spectral subtraction on a complex noise background, comprising:

a historical noise estimation module: obtaining a historical noise estimation of a complex noise background based on a noisy historical audio and a clean noise audio under the complex noise background;

a noise reduction processing module for the audio to be recognized, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, and to process each frame in sequence to obtain noise-reduced speech; wherein processing the current frame audio includes: performing secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

The historical noise estimation module comprises the following modules:

a noisy historical audio processing module, configured to frame each noisy historical audio and process it to obtain the power spectrum of each frame signal of the noisy historical audio; and to select, for each audio, the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum $B_n(\omega)$ of each noisy historical audio, wherein $n$ denotes the $n$-th noisy historical audio;

a pure noise audio processing module, configured to frame each pure noise audio and process it to obtain the per-frame average noise power spectrum $B_k(\omega)$ of each pure noise audio, wherein $k$ denotes the $k$-th pure noise audio;

a noise estimation module, configured to compute a weighted average of $B_k(\omega)$ and $B_n(\omega)$ to obtain the historical noise estimate.

The noise reduction processing module for the audio to be recognized comprises:

a current audio noise estimation module, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, select the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum of the audio to be recognized, which is the noise estimate of the current frame audio.

Because the speech recognition system and the speech recognition method based on quadratic spectral subtraction under a complex noise background are based on the same inventive concept, their related parts may be referred to each other, and they achieve the same technical effect.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims

1. A speech recognition method based on quadratic spectral subtraction under a complex noise background is characterized by comprising the following steps:

processing each frame of audio in sequence to obtain noise-reduced speech; wherein processing the current frame audio includes: performing secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

2. The speech recognition method of claim 1, wherein obtaining the historical noise estimate for the complex noise background based on noisy historical audio and clean noise audio in the complex noise background comprises:

selecting, for each audio, the preset number of frames with the lowest power spectrum as pure noise, and estimating the per-frame average noise power spectrum $B_i(\omega)$ of each noisy historical audio, where $i = 1, 2, 3, \ldots, n$ and $n$ is the number of noisy historical audios;

framing each pure noise audio, and processing to obtain the per-frame average noise power spectrum $B_j(\omega)$ of each pure noise audio, where $j = 1, 2, 3, \ldots, k$ and $k$ is the number of pure noise audios;

averaging $B_i(\omega)$ and $B_j(\omega)$ to obtain the historical noise estimate.

3. The speech recognition method of claim 2, wherein the noise estimation of the current frame audio comprises:

selecting the preset number of frames with the lowest power spectrum in the audio to be recognized as pure noise; and estimating, based on the pure noise, the per-frame average noise power spectrum of the audio to be recognized, which is the noise estimate of the current frame audio.

4. The speech recognition method of claim 3, wherein the current frame audio is subjected to secondary spectral subtraction by using the following formula to obtain a power spectrum estimate of the current frame audio, that is, a noise-reduced speech spectrum of the current frame audio:

wherein the quantity obtained represents the power spectrum estimate of the current frame audio; $m$ represents the sequence number of the current frame audio; $Y_{n+1}(\omega, m)$ represents the frequency spectrum of the current frame audio; $\psi_{n+1}(\omega, m)$ represents the phase information of the current frame audio; $\alpha_m$ and $\beta_m$ are the historical noise removal factor and the current frame audio noise removal factor, respectively; and $b_m$ is the minimum spectral factor of the audio signal.

5. The speech recognition method of claim 4, wherein $\alpha_m$, $\beta_m$ and $b_m$ are calculated using the following formula:

wherein $c$ is a constant; $\xi_m$ is the frequency-domain posterior signal-to-noise ratio of the current frame audio signal; $\alpha_{min}$ and $\alpha_{max}$ represent the minimum and maximum values of $\alpha_m$, respectively; $\beta_{min}$ and $\beta_{max}$ represent the minimum and maximum values of $\beta_m$, respectively; and $b_{min}$ and $b_{max}$ represent the minimum and maximum values of $b_m$, respectively.

6. The speech recognition method of claim 5, wherein $\xi_m$ is calculated by the following formula:

wherein one term represents the spectral intensity of the historical noise estimate.

7. The speech recognition method of claim 6, wherein the maximum and minimum values of $\alpha_m$, $\beta_m$ and $b_m$ are limited, including:

$\alpha_{max} = 3$, $\alpha_{min} = 1$, $\beta_{max} = 3$, $\beta_{min} = 1$, $b_{max} = 0.1$, $b_{min} = 0.02$.

8. A speech recognition system based on quadratic spectral subtraction on a complex noise background, comprising:

a noise reduction processing module for the audio to be recognized, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, and to process each frame in sequence to obtain noise-reduced speech; wherein processing the current frame audio includes: performing secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.

9. The speech recognition system of claim 8, wherein the historical noise estimation module comprises:

a noisy historical audio processing module, configured to frame each noisy historical audio and process it to obtain the power spectrum of each frame signal of the noisy historical audio; and to select, for each audio, the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum $B_n(\omega)$ of each noisy historical audio, wherein $n$ denotes the $n$-th noisy historical audio;

a pure noise audio processing module, configured to frame each pure noise audio and process it to obtain the per-frame average noise power spectrum $B_k(\omega)$ of each pure noise audio, wherein $k$ denotes the $k$-th pure noise audio;

a noise estimation module, configured to average $B_k(\omega)$ and $B_n(\omega)$ to obtain the historical noise estimate.

10. The speech recognition system of claim 9, wherein the audio to be recognized noise reduction processing module comprises:

a current audio noise estimation module, configured to frame the audio to be recognized under the complex noise background to obtain multiple audio frames, select the preset number of frames with the lowest power spectrum as pure noise, and estimate the per-frame average noise power spectrum of the audio to be recognized, which is the noise estimate of the current frame audio;

and a noise reduction processing module, configured to perform secondary spectral subtraction on the current frame audio based on the historical noise estimate and the noise estimate of the current frame audio, to obtain the noise-reduced speech spectrum of the current frame audio.