CN112242147B - Voice gain control method and computer storage medium - Google Patents
- Publication number
- CN112242147B (application number CN202011098089.1A / CN202011098089A)
- Authority
- CN
- China
- Prior art keywords
- voice
- neural network
- network model
- signal
- domain signal
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
Abstract
The invention relates to the technical field of voice communication and discloses a voice gain control method and a computer storage medium. The method comprises the following steps: framing the voice signal and applying a Fourier transform to obtain the original amplitude spectrum and original phase spectrum of the frequency-domain signal; preprocessing the original amplitude spectrum with a neural network model, suppressing the transient-noise amplitude-spectrum components to obtain a speech-enhanced amplitude spectrum; performing AGC processing and correction processing on the time-domain signal, where the AGC processing frames the time-domain signal, extracts its envelope, and derives a gain coefficient from the envelope; and finally applying the gain coefficient to the time-domain signal to complete gain control of the voice signal. After preprocessing by the deep-learning neural network model, the amplitude of transient noise is greatly reduced, falling far below the gain-amplification threshold in the AGC, and the gain-correction processing reduces the risk that transient noise is mistakenly amplified by the AGC algorithm, thereby improving voice-call quality.
Description
Technical Field
The present invention relates to the field of voice communication technologies, and in particular to a voice gain control method for voice communication and a computer storage medium.
Background
In conference calls, the occurrence of transient noise distracts listeners, and handling transient noise properly is a technical problem that conference-call systems need to solve. The well-known open-source project WebRTC applies an automatic gain control (AGC) algorithm to conference calls. For example, patent application CN201010181204.1, entitled "speech noise reduction device," discloses noise control using AGC, which further suppresses noise intensity by reducing the gain when target speech is absent; the probability that target speech is present is supplied by a speech detection unit. Likewise, the invention with application number CN201910860097.6, entitled "microphone automatic gain control method, device and storage medium," discloses controlling microphone volume through AGC to achieve noise suppression.
The existing AGC algorithm mainly takes the time-domain amplitude envelope of the input signal, compares the envelope with a given target amplitude, and computes the corresponding gain, so that signals above the target amplitude are attenuated and signals below it are boosted. The conventional algorithm has a drawback: when transient noise occurs during speech, the amplitude judgment is easily disturbed, so a voice signal that has not reached the target amplitude is judged to exceed it because of the transient noise and is therefore attenuated. Moreover, the smoothing in the algorithm suppresses a short segment of speech after the transient noise occurs, so the voice volume sounds uneven and call quality degrades.
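The envelope-versus-target rule described above can be sketched as a small function. The target level, clamp limits, and function names below are illustrative assumptions, not values taken from the patent.

```python
def agc_gain(envelope_db, target_db, max_gain_db=12.0, max_atten_db=-12.0):
    """Classic AGC rule: compare the frame envelope with the target level,
    clamp the corrective step, and convert the dB difference to a linear gain."""
    diff_db = target_db - envelope_db              # positive: boost, negative: attenuate
    diff_db = max(max_atten_db, min(max_gain_db, diff_db))
    return 10.0 ** (diff_db / 20.0)

# Frames above the target are attenuated (gain < 1); quiet frames are boosted.
loud_gain = agc_gain(envelope_db=-6.0, target_db=-18.0)
quiet_gain = agc_gain(envelope_db=-30.0, target_db=-18.0)
```

This also illustrates the failure mode the patent targets: a transient-noise burst inflates `envelope_db`, pushing the gain below 1 for a frame that is actually quiet speech.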
Disclosure of Invention
Therefore, it is necessary to provide a voice gain control method to solve the above technical problem of poor transient-noise handling in existing voice processing.
To achieve the above object, the present invention provides a voice gain control method comprising the following steps:
framing the voice signal, applying a Fourier transform, and converting the result to polar form to obtain the original amplitude spectrum and original phase spectrum of the frequency-domain signal;
preprocessing the original amplitude spectrum through a neural network model, wherein the preprocessing comprises the step of suppressing transient noise amplitude spectrum components in the original amplitude spectrum to obtain a voice enhancement amplitude spectrum; restoring the preprocessed voice enhancement amplitude spectrum into a time domain signal by combining the original phase spectrum;
calculating the energy ratio of the voice signal before preprocessing and the time domain signal after preprocessing;
performing AGC processing and correction processing on the time-domain signal: framing the time-domain signal, extracting its envelope, and performing AGC processing on the envelope to obtain a gain coefficient; if the energy ratio is greater than a first preset threshold, the gain coefficient is not corrected; if the energy ratio is less than the first preset threshold and greater than a second preset threshold, the gain coefficient is corrected to at most half of its value; and if the energy ratio is less than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio;
and finally, applying the gain coefficient to the time domain signal to complete gain control of the voice signal.
Further, the neural network model structure includes an input layer, an output layer, a first fully connected layer, a second fully connected layer, a first LSTM layer, and a second LSTM layer, and is trained on open-source and in-house data sets.
Further, the input layer has 128 neurons, the 128 neurons corresponding to 128 amplitude spectrum values; the output layer has 128 neurons, the 128 neurons corresponding to 128 speech enhancement magnitude spectrum values, the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
Further, the energy ratio used in the correction processing is obtained by taking the logarithm of the ratio of the voice signal's energy before and after preprocessing and comparing it with the preset thresholds. After preprocessing by the neural network model, the energy ratio falls below the second preset threshold in the correction processing; when the energy ratio is below the second preset threshold, it must first be converted from the logarithmic to the linear scale before being multiplied by the gain coefficient.
Further, the step of preprocessing the frequency domain signal through a neural network model to suppress a noise signal in the frequency domain signal includes the steps of:
dividing the frequency domain signal into a voice section containing human voice and a voice-free section not containing human voice, wherein the suppression intensity of the noise signal in the voice section is greater than or equal to 12dB, and the suppression intensity of the noise signal in the voice-free section is greater than or equal to 24dB.
Further, after obtaining the speech-enhanced amplitude spectrum, the method further comprises: smoothing the speech-enhanced amplitude spectrum across frames with a smoothing coefficient below 0.1.
Further, the smoothing coefficient is 0.
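The inter-frame smoothing just described can be sketched as a first-order recursion. The patent does not give the exact recursion, so this form and the placement of the coefficient are assumptions.

```python
def smooth_spectrum(prev_frame, cur_frame, alpha=0.05):
    """Blend the previous frame's enhanced magnitude spectrum into the current
    one. alpha < 0.1 keeps the smoothing light, and alpha = 0 (the preferred
    value above) disables smoothing entirely, passing the current frame through."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_frame, cur_frame)]

no_smooth = smooth_spectrum([1.0, 1.0], [3.0, 5.0], alpha=0.0)  # current frame unchanged
```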
Further, the neural network model is an LSTM neural network model through deep training, different data sets are adopted in different training stages, and the LSTM neural network model deep training comprises the following steps:
training with speech free of transient noise, using its features simultaneously as the input and the output of an LSTM neural network model, and obtaining parameter set A after the model converges;
and then using noisy speech features as the input of the LSTM neural network model and transient-noise-free speech features as its output, continuing to train the model from parameter set A, obtaining parameter set B after the model converges, and adopting parameter set B as the final parameters of the LSTM neural network model.
In order to solve the technical problems, the invention also provides a technical scheme:
a computer storage medium having a program stored therein, which when executed by a processor performs the speech gain control method described in any one of the above claims.
Compared with the prior art, this scheme provides a voice gain control method based on a neural network model. The voice signal is framed, Fourier-transformed, and converted to polar form, yielding the original amplitude spectrum and original phase spectrum of the corresponding frequency-domain signal. Before AGC processing, transient noise is detected by the neural network model; when it is detected, its components in the original amplitude spectrum are suppressed, so the transient-noise amplitude is reduced without affecting normal AGC processing of the voice signal. Furthermore, after processing by the deep-learning neural network model, the transient-noise amplitude is far below the gain-amplification threshold in the AGC (i.e., the second preset threshold in the AGC processing), and the gain-correction processing reduces the risk that transient noise is mistakenly amplified by the AGC algorithm, thereby improving voice-call quality.
Drawings
FIG. 1 is a flow chart of a method of controlling speech gain according to an embodiment;
FIG. 2 is a schematic diagram of a neural network model according to an embodiment;
fig. 3 is a schematic diagram of a computer storage medium according to an embodiment.
Detailed Description
To describe the technical content, structural features, objects, and effects of the technical solution in detail, the following description is given with reference to specific embodiments and the accompanying drawings.
The existing AGC algorithm mainly computes the gain of the current frame by combining the amplitude characteristics of the voice signal with the voice activity detection (VAD) probability of the current frame. In segments without human voice, transient noise, if present, is attenuated to avoid its mistaken amplification. In segments containing human voice, however, transient noise is often difficult for the VAD to detect, and even when detected it is difficult for the AGC to suppress. The reason is that transient noise usually lasts 5-50 ms, a very short duration, while the short-time Fourier transform segment is also 10-20 ms, so the transient-noise spectrum overlaps the speech spectrum heavily and is difficult to strip from it.
In addition, the AGC usually adapts its response to the amplitude variation of the input signal: when transient noise occurs during speech, the gain coefficient is reduced to bring the volume back to the target level, which suppresses the speech segment where the transient noise occurred. With the smoothing strategy added, this suppression persists for a short period, so the voice audibly becomes quieter after the transient noise, degrading the actual call experience.
To address the interference of transient noise with AGC, this embodiment provides a voice gain control method that preprocesses the signal before the AGC algorithm. Referring to figs. 1 and 2, the method can be used in voice communication such as conference calls for noise suppression and automatic gain control of voice signals, improving voice-call quality. As shown in fig. 1, the voice gain control method includes the following steps:
S101, framing and Fourier-transforming the voice signal, then converting to polar form to obtain the original amplitude spectrum (S102) and original phase spectrum (S103) of the frequency-domain signal;
S104, preprocessing the original amplitude spectrum through a neural network model and suppressing the transient-noise amplitude-spectrum components to obtain a speech-enhanced amplitude spectrum;
S105, restoring the preprocessed speech-enhanced amplitude spectrum to a time-domain signal using the original phase spectrum, and calculating the ratio of the voice signal's energy before and after preprocessing;
S106, performing AGC processing and correction processing on the time-domain signal: framing the time-domain signal, extracting its envelope, and performing AGC processing on the envelope to obtain a gain coefficient; if the energy ratio is greater than a first preset threshold, the gain coefficient is not corrected; if the energy ratio is less than the first preset threshold and greater than a second preset threshold, the gain coefficient is corrected to at most half of its value; and if the energy ratio is less than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio;
and finally, applying a gain coefficient to the time domain signal to complete gain control of the voice signal.
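The three-way correction rule of step S106 maps directly to a small piecewise function. The threshold values below are placeholders, since the patent does not specify concrete values; only the structure (thr1 > thr2, and the three branches) comes from the description.

```python
def correct_gain(gain, energy_ratio, thr1, thr2):
    """Gain correction from step S106 (thr1 > thr2 assumed).
    energy_ratio compares the signal energy after preprocessing with the energy
    before it: a small ratio means the frame was mostly transient noise."""
    if energy_ratio > thr1:        # little energy removed: trust the AGC gain
        return gain
    if energy_ratio > thr2:        # moderate removal: cap the gain at half
        return gain * 0.5
    return gain * energy_ratio     # heavy removal: scale the gain right down

uncorrected = correct_gain(2.0, 0.9, thr1=0.8, thr2=0.3)
halved = correct_gain(2.0, 0.5, thr1=0.8, thr2=0.3)
scaled = correct_gain(2.0, 0.1, thr1=0.8, thr2=0.3)
```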
In this embodiment, before the AGC algorithm runs, the deep-learning neural network model determines whether the current speech frame contains transient noise; if so, the noise is suppressed, otherwise no processing is applied, avoiding interference of transient noise with the call and improving the call experience.
In step S101, the input speech signal is first framed and Fourier-transformed to obtain the corresponding frequency-domain signal, which is converted to polar form to obtain the original amplitude spectrum and original phase spectrum. Only the original amplitude spectrum is modified; the original phase spectrum remains unchanged. Step S104 follows step S101.
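Step S101 can be sketched with a naive DFT on a single frame (standard library only; a real implementation would use an FFT). Converting each complex bin to polar form separates the magnitude, which the network will modify, from the phase, which is kept unchanged.

```python
import cmath
import math

def frame_to_polar(frame):
    """Naive DFT of one frame, then polar conversion: returns the magnitude
    spectrum and the phase spectrum as two lists."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

# One full sine cycle per frame concentrates the energy in bins 1 and n-1.
n = 16
mags, phases = frame_to_polar([math.sin(2 * math.pi * t / n) for t in range(n)])
```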
In step S104, the original amplitude spectrum is preprocessed by the neural network model, that is, AGC preprocessing is performed, and an instantaneous noise amplitude spectrum component in the original amplitude spectrum is suppressed, so as to obtain a voice enhancement amplitude spectrum.
Preferably, as shown in fig. 2, the neural network model is a deep-learning LSTM neural network model (but is not limited to an LSTM), also called a long short-term memory network. Compared with a conventional recurrent neural network, the LSTM has a more elaborate internal structure, adding three gates (an input gate i_t, a forget gate f_t, and an output gate o_t) and an internal memory cell c_t. The input gate controls how much of the newly computed state is written into the memory cell; the forget gate controls how much of the previous memory cell's content is forgotten; and the output gate controls how much of the current output depends on the current memory cell.
In this embodiment, the LSTM neural network model includes: the input layer, the output layer, the first full-connection layer, the second full-connection layer, the first LSTM layer and the second LSTM layer. The input layer has 128 neurons, the 128 neurons corresponding to 128 amplitude spectrum values; the output layer has 128 neurons, the 128 neurons of the output layer correspond to 128 speech enhancement magnitude spectrum values, the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
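A quick way to sanity-check such a topology is to count its parameters. The layer ordering below (input 128, fully connected 64, LSTM 64, LSTM 128, fully connected 128 as output) is an assumption, since the patent lists the layers and their widths but not their exact order.

```python
def fc_params(n_in, n_out):
    # weight matrix plus bias vector
    return n_in * n_out + n_out

def lstm_params(n_in, n_hidden):
    # four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((n_in + n_hidden) * n_hidden + n_hidden)

total = (fc_params(128, 64)       # input 128 -> first fully connected layer (64)
         + lstm_params(64, 64)    # first LSTM layer (64 units)
         + lstm_params(64, 128)   # second LSTM layer (128 units)
         + fc_params(128, 128))   # second fully connected layer -> 128 outputs
```

Under this assumed ordering the model stays small (on the order of 150k weights), which fits the patent's real-time call-processing setting.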
The deep-learning LSTM neural network model has two phases: training and inference. Before use, the model must be trained in advance on a data set; after deep-learning training, it can preprocess the frequency-domain signal and suppress the noise components in it. Training of the LSTM neural network model includes:
1) Collecting clean speech and extracting its features (the signal spectrum), using the speech features as both the input and the output of the neural network model, and obtaining parameter set A after convergence;
2) On the basis of parameter set A, switching data sets: collecting noisy speech signals and extracting their features (the signal spectrum), using the noisy speech features as the input of the model and the clean speech features as its output, continuing to train the model from parameter set A, and obtaining parameter set B after the model converges.
Parameter set B is the final parameter set; this training regime accelerates convergence and shortens training time.
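The two-stage schedule can be expressed framework-agnostically. Here `train_step` is a hypothetical stand-in for one optimisation pass of any LSTM training loop; only the control flow (stage 1 on clean pairs, stage 2 warm-started from A on noisy pairs) comes from the description.

```python
def two_stage_training(train_step, clean_pairs, noisy_pairs):
    """Stage 1: fit (clean, clean) pairs, yielding parameters A.
    Stage 2: continue from A on (noisy, clean) pairs, yielding parameters B."""
    params = None
    for x, y in clean_pairs:
        params = train_step(params, x, y)
    params_a = params                      # snapshot after stage 1
    for x, y in noisy_pairs:
        params = train_step(params, x, y)
    return params_a, params                # (A, B)

# Dummy train_step that just counts updates, to show the control flow.
count_step = lambda p, x, y: (p or 0) + 1
a, b = two_stage_training(count_step, [(0, 0)] * 3, [(1, 0)] * 2)
```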
Training the LSTM neural network model involves forward propagation and backpropagation, and the weight parameters are extracted when the model's loss function converges; at inference time the weights are used directly in a forward pass to obtain the desired features, with no backpropagation needed.
In the above embodiment, the voice signal may be a full-band voice signal. Existing deep-learning audio algorithms typically take Mel-frequency cepstral coefficients (MFCC) as one of the feature values, dividing the full voice band into several sub-bands and representing each sub-band by its log energy. This coarse-resolution analysis works well for semantic analysis and speech recognition, but it cannot suppress transient noise finely, because the transient-noise spectrum overlaps the speech spectrum heavily. The symptom is that considerable transient noise remains in the speech-bearing signal, and the noise is clearly perceived as it comes and goes. This embodiment therefore adopts the complete spectrum as the voice feature, which effectively strengthens transient-noise suppression and weakens the human ear's perception of the transient noise.
Also in the above embodiments, the full-band amplitude spectrum may be used as the signal feature. If the full-band complex spectrum were used with, say, 256 Fourier-transform points, the effective spectrum would be 128 complex bins, totalling 256 features across real and imaginary parts. Since the human ear perceives noise phase only weakly, this embodiment departs from complex-spectrum features and adopts the amplitude spectrum as the signal feature, i.e., 128 amplitude values, halving the number of input features, while the original phase information is retained unchanged.
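The feature-count arithmetic is easy to verify: with 256 FFT points a real-valued signal has 128 useful complex bins, and keeping only each bin's magnitude halves the feature vector from 256 real values to 128.

```python
import math

n_fft = 256
n_bins = n_fft // 2                  # effective spectrum of a real-valued signal
complex_features = 2 * n_bins        # real + imaginary parts per bin: 256 values
bins = [complex(1.0, 1.0)] * n_bins  # toy complex spectrum
magnitudes = [abs(c) for c in bins]  # one magnitude per bin: 128 values
```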
Specifically, during training the spectrum of the noisy speech signal serves as the input feature and the spectrum of the clean speech signal as the target feature. The feature is the signal spectrum: the Fourier-transform result is converted to a polar-coordinate representation, the phase information is discarded because the human ear is insensitive to it, and the amplitude values are used as the input features.
Step S105 is performed after step S104, and AGC processing and correction processing are applied to the time-domain signal. The preprocessed speech-enhanced amplitude spectrum is combined with the original phase spectrum and converted back to a time-domain signal; the time-domain signal is framed, its envelope extracted, and AGC processing applied to the envelope to obtain a gain coefficient. If the energy ratio is greater than the first preset threshold, the gain coefficient is not corrected; if it is less than the first preset threshold and greater than the second preset threshold, the gain coefficient is corrected to at most half of its value; and if it is less than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio. Through AGC processing (the automatic gain control algorithm), the volume of the voice signal becomes more uniform and consistent, improving call quality. In this embodiment, because preprocessing precedes the AGC processing, the effective voice signal is prevented from being suppressed when transient noise occurs, and the energy of the transient noise is reduced, so the correction processing prevents the transient noise from being mistakenly amplified during AGC.
Preferably, in the above embodiment, to effectively suppress transient-noise interference, after the preprocessing by the neural network model in step S104 the energy ratio falls below the second preset threshold used in the correction processing.
In one embodiment, the step of preprocessing the frequency domain signal through a neural network model to suppress a noise signal in the frequency domain signal includes the steps of:
dividing the frequency domain signal into a voice section containing human voice and a voice-free section not containing human voice, wherein the suppression intensity of the noise signal in the voice section is greater than or equal to 12dB, and the suppression intensity of the noise signal in the voice-free section is greater than or equal to 24dB.
In this embodiment, transient-noise suppression is applied before AGC processing: in voice-free segments the noise is suppressed by more than 24 dB so that it is not mistakenly amplified by the AGC, and in voiced segments it is suppressed by 12 dB or more so that the noise is not treated as a voice signal, which would distort the AGC's amplitude judgment, suppress the speech segment carrying the transient noise, and produce an audible interruption.
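To see what those suppression depths mean as linear factors (interpreting the dB figures as amplitude attenuation, which is an assumption here):

```python
def db_to_linear(att_db):
    """Attenuation in dB converted to a linear amplitude factor."""
    return 10.0 ** (-att_db / 20.0)

voiced_factor = db_to_linear(12.0)     # >= 12 dB suppression inside voiced segments
voiceless_factor = db_to_linear(24.0)  # >= 24 dB suppression in voice-free segments
```

So 12 dB leaves roughly a quarter of the noise amplitude, and 24 dB, being twice the depth in dB, squares that factor, leaving about 6 percent.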
Further, in the above embodiment, when AGC processing is applied to the preprocessed signal, the gain suppression coefficient and the gain amplification coefficient are not smoothed. For steady-state noise suppression, gain smoothing makes the processed voice stable and natural and avoids artifacts caused by large inter-frame gain differences. However, transient noise occurs far less frequently than steady-state noise, and smoothing would weaken its suppression; in particular, under repeatedly occurring transient noise (such as knocking on a table), smoothing delays the suppression, so the first few knocks are suppressed only weakly.
As shown in fig. 3, in another embodiment, a computer storage medium 300 is provided, in which a program is stored, which when executed by a processor, performs the speech gain control method described in any of the above embodiments.
It should be noted that although the foregoing embodiments have been described herein, the scope of the present invention is not limited thereby. Any alterations and modifications of the embodiments described herein based on the innovative concepts of the present invention, or equivalent structures or equivalent process transformations made using the contents of the description and drawings, applied directly or indirectly in other related technical fields, fall within the scope of the invention.
Claims (9)
1. A method for controlling speech gain, comprising the steps of:
carrying out framing and Fourier transformation on the voice signal, and then converting the voice signal into a polar coordinate form to obtain an original amplitude spectrum and an original phase spectrum of the frequency domain signal;
preprocessing the original amplitude spectrum through a neural network model, wherein the preprocessing comprises the step of suppressing transient noise amplitude spectrum components in the original amplitude spectrum to obtain a voice enhancement amplitude spectrum; restoring the preprocessed voice enhancement amplitude spectrum into a time domain signal by combining the original phase spectrum;
calculating the energy ratio of the voice signal before preprocessing and the time domain signal after preprocessing;
performing AGC processing and correction processing on the time domain signal, including: framing the time domain signal, solving an envelope, and performing AGC (automatic gain control) processing on the envelope to obtain a gain coefficient; if the energy ratio is larger than a first preset threshold, the gain coefficient is not corrected, if the energy ratio is smaller than the first preset threshold and larger than a second preset threshold, the gain coefficient is corrected to be not more than half of the gain coefficient, and if the energy ratio is smaller than the second preset threshold, the gain coefficient is corrected to be the product of the gain coefficient and the energy ratio;
and finally, applying the gain coefficient to the time domain signal to complete gain control of the voice signal.
2. The voice gain control method of claim 1, wherein the neural network model structure comprises an input layer, an output layer, a first fully connected layer, a second fully connected layer, a first LSTM layer, and a second LSTM layer, and is trained on open-source and in-house data sets.
3. The speech gain control method of claim 2, wherein the input layer has 128 neurons, the 128 neurons corresponding to 128 magnitude spectrum values; the output layer has 128 neurons, the 128 neurons corresponding to 128 speech enhancement magnitude spectrum values; the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
4. The method according to claim 1, wherein the energy ratio used in the correction process is obtained by taking the logarithm of the ratio of the signal energies before and after preprocessing, and this logarithmic value is compared with the second preset threshold; if, after preprocessing by the neural network model, the energy ratio is smaller than the second preset threshold, the energy ratio is first converted back to linear scale and then multiplied by the gain coefficient.
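The log-domain comparison and linear-scale conversion in this claim can be sketched as follows; the threshold value in dB is a hypothetical example, not taken from the patent.

```python
import math

def correct_gain(gain, e_pre, e_post, thr2_db=-20.0):
    """Sketch of claim 4: compare the log energy ratio against the
    second preset threshold (here a hypothetical -20 dB); if it is
    below the threshold, convert the ratio back to linear scale and
    multiply it with the gain coefficient."""
    ratio_db = 10.0 * math.log10(max(e_post, 1e-12) / max(e_pre, 1e-12))
    if ratio_db < thr2_db:
        ratio_lin = 10.0 ** (ratio_db / 10.0)  # back to linear scale
        gain = gain * ratio_lin
    return gain
```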
5. The method according to claim 1, wherein the step of preprocessing the frequency domain signal by a neural network model to suppress a noise signal in the frequency domain signal comprises:
dividing the frequency domain signal into voice segments containing human voice and non-voice segments not containing human voice, wherein the suppression intensity applied to the noise signal in the voice segments is greater than or equal to 12 dB, and the suppression intensity applied to the noise signal in the non-voice segments is greater than or equal to 24 dB.
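The segment-dependent suppression depths could be applied as in the sketch below. The voice-activity decision and the noise magnitude estimate are assumed to come from upstream stages; only the per-segment attenuation itself follows the claim.

```python
import numpy as np

def apply_segment_suppression(noise_mag, is_speech):
    """Attenuate the estimated noise magnitude per frame:
    >= 12 dB in voice segments, >= 24 dB in non-voice segments
    (claim 5). `is_speech` is a per-frame boolean VAD decision."""
    supp_db = np.where(is_speech, 12.0, 24.0)
    return noise_mag * 10.0 ** (-supp_db / 20.0)
```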
6. The voice gain control method of claim 1, further comprising the step of, after obtaining the voice enhancement amplitude spectrum: smoothing the voice enhancement amplitude spectrum between adjacent frames with a smoothing coefficient below 0.1.
7. The speech gain control method of claim 6, wherein the smoothing coefficient is 0.
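The inter-frame smoothing of claims 6 and 7 can be read as a first-order recursive filter, sketched below; with a coefficient of 0, as in claim 7, the operation reduces to the identity.

```python
import numpy as np

def smooth_frames(mag_frames, alpha=0.05):
    """First-order recursive smoothing across frames.
    alpha is the smoothing coefficient (below 0.1 per claim 6);
    alpha = 0 (claim 7) returns the input unchanged."""
    out = np.empty_like(mag_frames)
    out[0] = mag_frames[0]
    for t in range(1, len(mag_frames)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * mag_frames[t]
    return out
```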
8. The method of claim 1, wherein the neural network model is a deep-trained LSTM neural network model, and deep training the LSTM neural network model with different data sets in different training stages comprises:
first, training with the features of speech free of transient noise as both the input and the output of the LSTM neural network model, and obtaining parameter set A after the LSTM neural network model converges;
then, using the features of noisy speech as the input of the LSTM neural network model and the features of the speech free of transient noise as the output, continuing to train the neural network model from parameter set A, obtaining parameter set B after the LSTM neural network model converges, and determining parameter set B as the parameters of the LSTM neural network model.
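The two training stages above could be sketched as the following PyTorch loop. The loss function, optimizer, epoch counts, and convergence criterion are all assumptions not stated in the claim; `clean_feats` and `noisy_feats` are illustrative names for the transient-noise-free and noisy feature tensors.

```python
import torch
import torch.nn as nn

def two_stage_training(model, clean_feats, noisy_feats, epochs=5, lr=1e-3):
    """Sketch of the claimed two-stage deep training."""
    loss_fn = nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    # Stage 1: clean features as both input and target -> parameter set A.
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(clean_feats), clean_feats)
        loss.backward()
        opt.step()

    # Stage 2: noisy input, clean target, continuing from A -> parameter set B.
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(noisy_feats), clean_feats)
        loss.backward()
        opt.step()
    return model
```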
9. A computer storage medium, characterized in that the computer storage medium has stored therein a program which, when executed by a processor, performs the speech gain control method according to any of the preceding claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011098089.1A CN112242147B (en) | 2020-10-14 | 2020-10-14 | Voice gain control method and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112242147A CN112242147A (en) | 2021-01-19 |
CN112242147B true CN112242147B (en) | 2023-12-19 |
Family
ID=74169185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011098089.1A Active CN112242147B (en) | 2020-10-14 | 2020-10-14 | Voice gain control method and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112242147B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113823312B (en) * | 2021-02-19 | 2023-11-07 | 北京沃东天骏信息技术有限公司 | Speech enhancement model generation method and device, and speech enhancement method and device |
CN113436640B (en) * | 2021-06-28 | 2022-11-25 | 歌尔科技有限公司 | Audio noise reduction method, device and system and computer readable storage medium |
CN113470691A (en) * | 2021-07-08 | 2021-10-01 | 浙江大华技术股份有限公司 | Automatic gain control method of voice signal and related device thereof |
CN113823309B (en) * | 2021-11-22 | 2022-02-08 | 成都启英泰伦科技有限公司 | Noise reduction model construction and noise reduction processing method |
CN113921030B (en) * | 2021-12-07 | 2022-06-07 | 江苏清微智能科技有限公司 | Speech enhancement neural network training method and device based on weighted speech loss |
CN114566152B (en) * | 2022-04-27 | 2022-07-08 | 成都启英泰伦科技有限公司 | Voice endpoint detection method based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105515597A (en) * | 2015-12-02 | 2016-04-20 | 中国电子科技集团公司第四十一研究所 | Automatic gain control circuit for receivers |
CN108877775A (en) * | 2018-06-04 | 2018-11-23 | 平安科技(深圳)有限公司 | Voice data processing method, device, computer equipment and storage medium |
CN110036440A (en) * | 2016-10-18 | 2019-07-19 | 弗劳恩霍夫应用研究促进协会 | Device and method for handling audio signal |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9173025B2 (en) * | 2012-02-08 | 2015-10-27 | Dolby Laboratories Licensing Corporation | Combined suppression of noise, echo, and out-of-location signals |
EP3053356B8 (en) * | 2013-10-30 | 2020-06-17 | Cerence Operating Company | Methods and apparatus for selective microphone signal combining |
US11017798B2 (en) * | 2017-12-29 | 2021-05-25 | Harman Becker Automotive Systems Gmbh | Dynamic noise suppression and operations for noisy speech signals |
Non-Patent Citations (1)
Title |
---|
Sparsity-based phase spectrum compensation speech enhancement algorithm; Zhang Tianqi et al.; Journal of Signal Processing (《信号处理》); Vol. 36, No. 11; pp. 1867-1876 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||