CN112242147B - Voice gain control method and computer storage medium - Google Patents


Info

Publication number
CN112242147B
Authority
CN
China
Prior art keywords
voice
neural network
network model
signal
domain signal
Prior art date
Legal status
Active
Application number
CN202011098089.1A
Other languages
Chinese (zh)
Other versions
CN112242147A (en)
Inventor
陈东敏
陈荣观
薛建清
陈玉龙
Current Assignee
Fujian Xingwang Intelligent Technology Co ltd
Original Assignee
Fujian Xingwang Intelligent Technology Co ltd
Priority date
2020-10-14
Filing date
2020-10-14
Publication date
2023-12-19
Application filed by Fujian Xingwang Intelligent Technology Co ltd
Priority to CN202011098089.1A
Publication of CN112242147A
Application granted
Publication of CN112242147B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain

Abstract

The invention relates to the technical field of voice communication and discloses a voice gain control method and a computer storage medium. The method comprises the following steps: framing the voice signal and applying a Fourier transform to obtain an original amplitude spectrum and an original phase spectrum of the frequency domain signal; preprocessing the original amplitude spectrum with a neural network model to suppress transient noise amplitude spectrum components in it, yielding a voice enhancement amplitude spectrum; performing AGC processing and correction processing on the resulting time domain signal, where the AGC processing frames the time domain signal to obtain an envelope and performs AGC on the envelope to obtain a gain coefficient; and finally applying the gain coefficient to the time domain signal to complete gain control of the voice signal. After preprocessing by the deep-learning neural network model, the amplitude of transient noise is greatly reduced and lies far below the gain amplification threshold of the AGC, and the gain correction processing reduces the risk that the AGC algorithm mistakenly amplifies the transient noise, thereby improving voice call quality.

Description

Voice gain control method and computer storage medium
Technical Field
The present invention relates to the field of voice communication technologies, and in particular to a method for controlling voice gain in voice communication and to a computer storage medium.
Background
In conference calls, transient noise can distract listeners, and handling it properly is a technical problem that conference-call systems need to solve. The well-known open-source WebRTC project uses an automatic gain control (AGC) algorithm in conference calls. For example, patent application CN201010181204.1, entitled "Speech noise reduction device", discloses noise control using AGC: when target speech is absent, the gain is reduced to further suppress the noise intensity, and the probability that target speech is present is determined by a speech detection unit. In addition, the invention with application number CN201910860097.6, entitled "Microphone automatic gain control method, device and storage medium", discloses controlling microphone volume through AGC to achieve noise suppression.
Existing AGC algorithms mainly take the time-domain amplitude envelope of the input signal, compare it with a given target amplitude, and compute a corresponding gain, so that signals above the target amplitude are attenuated and signals below it are boosted. However, this conventional approach has a drawback: when transient noise occurs in speech, the amplitude judgment is easily disturbed, so a speech signal that has not reached the target amplitude is judged to exceed it because of the transient noise and is therefore attenuated. Moreover, the smoothing in the algorithm continues to suppress the weaker speech that follows the transient noise, making the perceived speech volume inconsistent and degrading call quality.
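To make this failure mode concrete, here is a minimal Python sketch of the conventional envelope-based AGC just described. The peak-per-frame envelope, the frame length, the target amplitude of 0.5 and the gain cap of 4 are all hypothetical choices for illustration; this is not the AGC used by WebRTC or by the cited applications.

```python
def frame_envelope(signal, frame_len):
    """Peak absolute amplitude of each non-overlapping frame (a simple envelope)."""
    env = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        env.append(max(abs(s) for s in frame))
    return env


def agc_gains(envelope, target=0.5, max_gain=4.0, floor=1e-6):
    """Per-frame gain that moves each frame's envelope toward the target amplitude."""
    gains = []
    for e in envelope:
        g = target / max(e, floor)      # > 1 boosts quiet frames, < 1 attenuates loud ones
        gains.append(min(g, max_gain))  # cap amplification so near-silence is not blown up
    return gains


# Quiet speech (amplitude 0.1) with one short transient spike of 0.9 in the third frame
signal = [0.1] * 8 + [0.9] + [0.1] * 7
env = frame_envelope(signal, 4)
print(env)
print(agc_gains(env))
```

The frame containing the short spike receives a gain below 1 even though the speech in it is as quiet as in the neighbouring frames, which is the misjudgment described above.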
Disclosure of Invention
Therefore, a voice gain control method is needed to solve the above technical problem that existing voice noise processing performs poorly.
To achieve the above object, the present invention provides a speech gain control method comprising the following steps:
carrying out framing and Fourier transformation on the voice signal, and then converting the voice signal into a polar coordinate form to obtain an original amplitude spectrum and an original phase spectrum of the frequency domain signal;
preprocessing the original amplitude spectrum through a neural network model, wherein the preprocessing comprises the step of suppressing transient noise amplitude spectrum components in the original amplitude spectrum to obtain a voice enhancement amplitude spectrum; restoring the preprocessed voice enhancement amplitude spectrum into a time domain signal by combining the original phase spectrum;
calculating the energy ratio of the voice signal before preprocessing and the time domain signal after preprocessing;
performing AGC processing and correction processing on the time domain signal, wherein the AGC processing frames the time domain signal, obtains its envelope, and performs AGC on the envelope to obtain a gain coefficient; if the energy ratio is greater than a first preset threshold, the gain coefficient is not corrected; if the energy ratio is smaller than the first preset threshold and greater than a second preset threshold, the gain coefficient is corrected to no more than half of its value; and if the energy ratio is smaller than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio;
and finally, applying the gain coefficient to the time domain signal to complete gain control of the voice signal.
Further, the neural network model structure includes an input layer, an output layer, a first fully connected layer, a second fully connected layer, a first LSTM layer and a second LSTM layer, and the model is trained on open-source and self-developed datasets.
Further, the input layer has 128 neurons, the 128 neurons corresponding to 128 amplitude spectrum values; the output layer has 128 neurons, the 128 neurons corresponding to 128 speech enhancement magnitude spectrum values, the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
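The layer sizes above can be sanity-checked with a short parameter count. This is a sketch under stated assumptions: the text gives the sizes but not the order of the layers, so the order FC1, LSTM1, LSTM2, FC2 is hypothetical; the 128-neuron input and output layers are treated as the 128-value feature vectors entering and leaving the stack; and each LSTM gate uses a single bias vector (Keras-style). PyTorch's `nn.LSTM` keeps two bias vectors per gate, which would add 4 × hidden parameters per LSTM layer.

```python
def dense_params(n_in, n_out):
    # weight matrix plus one bias per output neuron
    return n_in * n_out + n_out


def lstm_params(n_in, n_hidden):
    # four gate blocks (input, forget, output, candidate), each with input
    # weights, recurrent weights and one bias vector
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


# Hypothetical layer order; only the sizes come from the text.
LAYERS = [
    ("fc1", "dense", 64),
    ("lstm1", "lstm", 64),
    ("lstm2", "lstm", 128),
    ("fc2", "dense", 128),
]


def summarize(n_features=128):
    """Return (name, inputs, units, parameter count) for each layer in order."""
    n_in, rows = n_features, []
    for name, kind, units in LAYERS:
        p = dense_params(n_in, units) if kind == "dense" else lstm_params(n_in, units)
        rows.append((name, n_in, units, p))
        n_in = units
    return rows


for name, n_in, units, p in summarize():
    print(f"{name}: {n_in} -> {units}, {p} parameters")
print("total:", sum(row[3] for row in summarize()))
```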
Further, in the correction processing, the energy ratio of the voice signal before and after preprocessing is obtained and its logarithm is compared with the preset thresholds; after preprocessing by the neural network model, the energy ratio is smaller than the second preset threshold in the correction processing, and when the energy ratio is smaller than the second preset threshold, it is first converted back to a linear scale and then multiplied by the gain coefficient.
Further, the step of preprocessing the frequency domain signal through a neural network model to suppress a noise signal in the frequency domain signal includes the steps of:
dividing the frequency domain signal into a voice section containing human voice and a voice-free section not containing human voice, wherein the suppression intensity of the noise signal in the voice section is greater than or equal to 12dB, and the suppression intensity of the noise signal in the voice-free section is greater than or equal to 24dB.
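In linear terms, the minimum suppression depths above translate into caps on the gain that a transient-noise magnitude bin may keep. A minimal sketch, assuming the dB figures refer to amplitude on the magnitude spectrum (the 20·log10 convention); if they referred to power, the factors would be 10^(-12/10) and 10^(-24/10) instead. The function and constant names are hypothetical.

```python
def db_to_amplitude(db):
    """Convert a level in dB to a linear amplitude factor (20*log10 convention)."""
    return 10 ** (db / 20)


# Minimum suppression depths stated in the text
SPEECH_SUPPRESSION_DB = 12.0
NONSPEECH_SUPPRESSION_DB = 24.0


def max_noise_gain(is_speech):
    """Largest magnitude gain a transient-noise bin may keep in this segment type."""
    depth = SPEECH_SUPPRESSION_DB if is_speech else NONSPEECH_SUPPRESSION_DB
    return db_to_amplitude(-depth)


print(round(max_noise_gain(True), 3))   # 0.251: noise keeps at most ~25% of its amplitude
print(round(max_noise_gain(False), 4))  # 0.0631: noise keeps at most ~6% of its amplitude
```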
Further, after the speech enhancement amplitude spectrum is obtained, the method further comprises the step of smoothing the speech enhancement magnitude spectrum from frame to frame with a smoothing coefficient below 0.1.
Further, the smoothing coefficient is 0.
Further, the neural network model is an LSTM neural network model obtained through deep training, with different datasets used in different training stages; the deep training of the LSTM neural network model comprises the following steps:
training with speech free of transient noise and the features of that speech as the input and the output of the LSTM neural network model, respectively, and obtaining parameter A after the LSTM neural network model converges;
and using noisy speech features as the input of the LSTM neural network model and features of speech free of transient noise as its output, continuing to train the model starting from parameter A, obtaining parameter B after the LSTM neural network model converges, and taking parameter B as the parameters of the LSTM neural network model.
To solve the above technical problem, the invention further provides the following technical solution:
a computer storage medium having a program stored therein, which when executed by a processor performs the speech gain control method described in any one of the above claims.
Compared with the prior art, this technical solution provides a voice gain control method based on a neural network model. The voice signal is framed, Fourier transformed and converted into polar form to obtain the original amplitude spectrum and original phase spectrum of the corresponding frequency domain signal. Before AGC processing, transient noise is detected by the neural network model, and when it is detected, the transient noise component in the original amplitude spectrum is suppressed, which reduces the amplitude of the transient noise without affecting normal AGC processing of the speech. Furthermore, after processing by the deep-learning neural network model, the amplitude of the transient noise is greatly reduced and lies far below the gain amplification threshold of the AGC (namely the second preset threshold in the AGC processing), and the gain correction processing reduces the risk that the AGC algorithm mistakenly amplifies the transient noise, thereby improving voice call quality.
Drawings
FIG. 1 is a flow chart of a method of controlling speech gain according to an embodiment;
FIG. 2 is a schematic diagram of a neural network model according to an embodiment;
FIG. 3 is a schematic diagram of a computer storage medium according to an embodiment.
Detailed Description
To describe the technical content, structural features, objectives and effects of the technical solution in detail, the following description is given with reference to specific embodiments and the accompanying drawings.
Existing AGC algorithms mainly compute the gain of the current frame from the amplitude characteristics of the voice signal combined with the speech presence probability of the current frame given by voice activity detection (VAD). In segments without human voice, transient noise is attenuated to avoid amplifying it by mistake. In segments containing human voice, however, transient noise is often hard for the VAD to detect, and even when it is detected, it is hard for the AGC to suppress. The reason is that transient noise typically lasts only 5 to 50 ms, comparable to the 10 to 20 ms segment length of the short-time Fourier transform, and its spectrum overlaps heavily with the speech spectrum, so it is difficult to separate from the speech in the spectral domain.
In addition, AGC usually adapts its response to amplitude changes in the input signal. When transient noise occurs during speech, the gain coefficient is reduced so that the volume level reaches the target level, which suppresses the speech segment containing the transient noise. Because a smoothing strategy is also applied, this suppression persists for a short time, so the speech sounds quieter after the transient noise appears, degrading the actual call experience.
To address the interference of transient noise with AGC, this embodiment provides a voice gain control method that preprocesses the signal before the AGC algorithm. Referring to FIG. 1 and FIG. 2, the method can be used in voice communication such as conference calls for noise suppression and automatic gain control of voice signals, thereby improving voice call quality. As shown in FIG. 1, the voice gain control method includes the following steps:
S101: frame the voice signal, apply a Fourier transform, and convert the result into polar form to obtain the original amplitude spectrum (S102) and the original phase spectrum (S103) of the frequency domain signal;
S104: preprocess the original amplitude spectrum with the neural network model, suppressing transient noise amplitude spectrum components in it to obtain a voice enhancement amplitude spectrum;
S105: restore the preprocessed voice enhancement amplitude spectrum into a time domain signal using the original phase spectrum, and calculate the ratio of the energy of the voice signal before and after preprocessing;
S106: perform AGC processing and correction processing on the time domain signal: frame the time domain signal to obtain an envelope and perform AGC on the envelope to obtain a gain coefficient; if the energy ratio is greater than a first preset threshold, leave the gain coefficient uncorrected; if it is smaller than the first preset threshold and greater than a second preset threshold, correct the gain coefficient to no more than half of its value; and if it is smaller than the second preset threshold, correct the gain coefficient to the product of the gain coefficient and the energy ratio;
finally, apply the gain coefficient to the time domain signal to complete gain control of the voice signal.
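The energy ratio of step S105 and the three-way gain correction of step S106 can be sketched in a few lines of Python. Assumptions, none of which the text fixes: the ratio is preprocessed energy over original energy expressed in dB (claim 4 compares its logarithm with the thresholds); the threshold values of -3 dB and -12 dB are hypothetical; "no more than half" is implemented as exactly half; and a ratio exactly equal to a threshold falls into the lower branch, since the text only states strict inequalities.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(s * s for s in x)


def energy_ratio_db(original, enhanced, eps=1e-12):
    """10*log10(E_enhanced / E_original); 0 dB means preprocessing removed nothing."""
    return 10.0 * math.log10((energy(enhanced) + eps) / (energy(original) + eps))


def correct_gain(gain, ratio_db, t1_db=-3.0, t2_db=-12.0):
    """Gain correction rule; t1_db > t2_db are hypothetical threshold values."""
    if ratio_db > t1_db:
        return gain                           # little was removed: keep the AGC gain
    if ratio_db > t2_db:
        return 0.5 * gain                     # moderate removal: at most half the gain
    return gain * 10.0 ** (ratio_db / 10.0)   # heavy removal: gain times the linear ratio


r = energy_ratio_db([1.0, 1.0], [0.5, 0.5])
print(round(r, 2), correct_gain(2.0, r))
```

Converting the dB ratio back with 10^(ratio/10) before multiplying is the "convert to a linear scale first" step from claim 4; a frame whose energy the network cut by 20 dB ends up with its AGC gain scaled by 0.01, so the residual transient is not boosted.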
In this embodiment, before AGC processing, the deep-learning neural network model determines whether the current speech frame contains transient noise; if it does, the noise is suppressed, otherwise the frame is left unprocessed. This prevents transient noise from interfering with the call and improves the call experience.
In step S101, the input speech signal is first framed and Fourier transformed to obtain the corresponding frequency domain signal, which is converted into polar form to obtain the original amplitude spectrum and original phase spectrum; only the original amplitude spectrum is modified, while the original phase spectrum remains unchanged. Step S104 is performed after step S101.
In step S104, the original amplitude spectrum is preprocessed by the neural network model (that is, AGC preprocessing is performed) to suppress transient noise amplitude spectrum components in it and obtain the voice enhancement amplitude spectrum.
Preferably, as shown in FIG. 2, the neural network model is a deep-learning LSTM (long short-term memory) neural network model, although it is not limited to LSTM. Compared with a conventional recurrent neural network, the LSTM has a more elaborate internal structure, adding three gates, namely an input gate i_t, a forget gate f_t and an output gate o_t, together with an internal memory cell c_t. The input gate controls how much of the newly computed state is written into the memory cell; the forget gate controls how much information in the memory cell from the previous step is forgotten; and the output gate controls how much the current output depends on the current memory cell.
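Step S101 and its inverse can be sketched in pure Python with a naive DFT. This is illustrative only: it assumes non-overlapping rectangular frames and a full N-point spectrum, whereas a practical system would typically use windowed, overlapping frames, an FFT and the half spectrum. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive N-point discrete Fourier transform."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping the real part of the result."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def analyze(signal, frame_len):
    """Split into non-overlapping frames; return per-frame magnitudes and phases (polar form)."""
    mags, phases = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spec = dft(signal[start:start + frame_len])
        mags.append([abs(c) for c in spec])
        phases.append([cmath.phase(c) for c in spec])
    return mags, phases


def synthesize(mags, phases):
    """Recombine (possibly modified) magnitudes with the original phases."""
    out = []
    for m, p in zip(mags, phases):
        out.extend(idft([cmath.rect(a, ph) for a, ph in zip(m, p)]))
    return out


signal = [math.sin(2 * math.pi * t / 8) + 0.3 * math.cos(2 * math.pi * 3 * t / 8) for t in range(32)]
mags, phases = analyze(signal, 8)
rebuilt = synthesize(mags, phases)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)) < 1e-9)
```

Because only magnitudes are changed and the original phases are reused, scaling every magnitude by 0.5 simply halves each reconstructed frame, which makes a convenient sanity check for the "modify amplitude, keep phase" design.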
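For reference, the gates described above correspond to the common textbook LSTM formulation below. The patent gives no equations, so the exact variant is an assumption; x_t is the input feature vector, h_t the hidden state, W, U and b are learned weights and biases, σ is the logistic sigmoid and ⊙ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Here i_t scales how much of the candidate state enters the memory cell, f_t scales how much of c_{t-1} is kept, and o_t scales how much of the squashed cell state appears in the output, matching the three roles listed in the text.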
In this embodiment, the LSTM neural network model includes: the input layer, the output layer, the first full-connection layer, the second full-connection layer, the first LSTM layer and the second LSTM layer. The input layer has 128 neurons, the 128 neurons corresponding to 128 amplitude spectrum values; the output layer has 128 neurons, the 128 neurons of the output layer correspond to 128 speech enhancement magnitude spectrum values, the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
The deep-learning LSTM neural network model involves two phases: training and use. Before use, the model must be trained on a dataset; after this deep-learning training, it can preprocess the frequency domain signal and suppress noise signals in it. Training of the LSTM neural network model includes:
1) Collect clean speech and extract its features (the signal spectrum); use the speech and its features as the input and output of the neural network model, and obtain parameter A after convergence;
2) Starting from parameter A, switch to a different dataset: collect noisy speech signals and extract their features (the signal spectrum); use the noisy speech and its features as the input of the network model and the clean speech and its corresponding features as the output, continue training the neural network model from parameter A, and obtain parameter B after the model converges.
Parameter B is the final parameter set. This training scheme speeds up convergence and reduces training time.
Training the LSTM neural network model involves forward propagation and backpropagation, and the weight parameters are extracted once the loss function of the model converges; in use, the weights are applied directly in forward propagation to obtain the desired features, and no backpropagation is needed.
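The two-stage scheme amounts to a warm start: stage 2 begins from the weights that stage 1 converged to. The toy sketch below shows only that structure. It swaps the LSTM for a hypothetical scalar linear map y = w*x + b trained by plain SGD on synthetic data, so it says nothing about the real model's accuracy; the data, learning rate and epoch count are all assumptions.

```python
import random


def train(pairs, w, b, lr=0.05, epochs=200):
    """Plain SGD on a scalar linear map y_hat = w*x + b with squared error."""
    for _ in range(epochs):
        for x, y in pairs:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(pairs, w, b):
    """Mean squared error of the linear map on (input, target) pairs."""
    return sum(((w * x + b) - y) ** 2 for x, y in pairs) / len(pairs)


rng = random.Random(0)
# Hypothetical clean magnitudes and a noisy copy with occasional large bursts
clean = [rng.uniform(0.0, 1.0) for _ in range(200)]
noisy = [c + (2.0 if rng.random() < 0.05 else 0.0) for c in clean]

# Stage 1: clean -> clean, starting from scratch, gives "parameter A"
w_a, b_a = train(list(zip(clean, clean)), w=0.0, b=0.0)

# Stage 2: noisy -> clean, warm-started from A, gives "parameter B"
stage2 = list(zip(noisy, clean))
w_b, b_b = train(stage2, w=w_a, b=b_a)

print(f"A: w={w_a:.3f}, b={b_a:.3f}")
print(f"B: w={w_b:.3f}, b={b_b:.3f}")
print(f"stage-2 MSE at A: {mse(stage2, w_a, b_a):.4f}, at B: {mse(stage2, w_b, b_b):.4f}")
```

Stage 1 drives the map to roughly the identity, and stage 2 then only has to learn the correction from noisy to clean input, which is the intuition behind the claimed faster convergence.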
In the above embodiment, the voice signal may be a full-band voice signal. Existing deep-learning audio algorithms typically use Mel-frequency cepstral coefficients (MFCC) as one of the features: the full speech band is divided into several sub-bands, and each sub-band is represented by its logarithmic energy and related features. This is a coarse-resolution analysis that works well for semantic analysis and speech recognition. However, because the spectrum of transient noise overlaps heavily with that of speech, such features cannot suppress transient noise finely: a considerable amount of transient noise remains in signals containing speech, and the noise is clearly perceived as switching between present and absent. Therefore, this embodiment uses the complete spectrum as the speech feature, which effectively strengthens suppression of transient noise and reduces its perception by the human ear.
In the above embodiments, the full-band amplitude spectrum may be used as the signal feature. If the full-band complex spectrum were used with, say, a 256-point Fourier transform, there would be 128 effective complex spectral values, giving 256 features in total from the real and imaginary parts. Because the human ear is not sensitive to noise phase, this embodiment uses the amplitude spectrum rather than complex spectrum features as the signal feature, namely 128 amplitude values, halving the number of input features, while the original phase information is reused as the phase.
Specifically, during training the spectrum of the noisy speech signal serves as the input feature and the spectrum of the clean speech signal serves as the desired feature. The feature is the signal spectrum: the Fourier transform result is converted into polar form, the phase information is discarded because the human ear is insensitive to it, and the amplitude values are used as the features.
Step S105 is performed after step S104, followed by AGC processing and correction processing on the time domain signal in step S106. The preprocessed voice enhancement amplitude spectrum is converted into a time domain signal using the original phase spectrum; the time domain signal is framed to obtain an envelope, and AGC (automatic gain control) processing is performed on the envelope to obtain a gain coefficient. If the energy ratio is greater than the first preset threshold, the gain coefficient is not corrected; if it is smaller than the first preset threshold and greater than the second preset threshold, the gain coefficient is corrected to no more than half of its value; and if it is smaller than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio. AGC processing (the automatic gain control algorithm) makes the volume of the voice signal more consistent and improves call quality. In this embodiment, because preprocessing is performed before AGC processing, effective speech is prevented from being suppressed when transient noise occurs, and the energy of the transient noise is reduced, so the correction processing prevents the transient noise from being mistakenly amplified during AGC processing.
Preferably, in the above embodiment, to suppress transient noise interference effectively, the preprocessing by the neural network model in step S104 is such that the energy ratio falls below the second preset threshold in the correction processing.
In one embodiment, the step of preprocessing the frequency domain signal through a neural network model to suppress a noise signal in the frequency domain signal includes the steps of:
dividing the frequency domain signal into a voice section containing human voice and a voice-free section not containing human voice, wherein the suppression intensity of the noise signal in the voice section is greater than or equal to 12dB, and the suppression intensity of the noise signal in the voice-free section is greater than or equal to 24dB.
In this embodiment, transient noise suppression is performed before AGC processing. In voice-free sections containing no human voice, the noise is suppressed by 24 dB or more so that the AGC does not mistakenly amplify it; in voice sections containing human voice, it is suppressed by 12 dB or more so that the noise is not treated as speech and does not distort the AGC's judgment of the speech amplitude, which would otherwise cause the speech containing transient noise to be suppressed and sound interrupted.
Further, in the above embodiment, neither the gain suppression coefficient nor the gain amplification coefficient is smoothed in the AGC processing of the preprocessed signal. When suppressing steady-state noise, gain smoothing is normally used so that the processed speech sounds stable and natural and to avoid artifacts caused by large gain differences between frames. However, transient noise occurs far less frequently than steady-state noise, and smoothing weakens its suppression. In particular, when transient noise occurs repeatedly (for example, knocking on a table), smoothing delays the suppression, so the first few knocks are suppressed less effectively.
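The delay effect can be illustrated with first-order recursive gain smoothing, a common choice that is assumed here because the text does not give the smoothing formula. The per-frame target gains (unity between knocks, 0.1 on knock frames), the smoothing factor 0.8 and the initial gain of 1 are hypothetical.

```python
def smooth_gains(raw_gains, alpha, initial=1.0):
    """First-order recursive smoothing: g[t] = alpha * g[t-1] + (1 - alpha) * raw[t]."""
    smoothed, prev = [], initial
    for g in raw_gains:
        prev = alpha * prev + (1.0 - alpha) * g
        smoothed.append(prev)
    return smoothed


# Hypothetical per-frame target gains: unity between knocks, 0.1 on knock frames
raw = [1.0, 0.1, 1.0, 1.0, 0.1, 1.0, 1.0, 0.1]

unsmoothed = smooth_gains(raw, alpha=0.0)  # follows raw exactly: every knock gets 0.1
smoothed = smooth_gains(raw, alpha=0.8)    # knock frames keep a much higher gain,
                                           # and the gain stays below 1 after each knock
for r, s in zip(raw, smoothed):
    print(f"target {r:.2f}  smoothed {s:.3f}")
```

With smoothing, the first knock keeps a gain of about 0.82 instead of 0.1 and later knocks are suppressed progressively more, while the frames after each knock stay below unity gain; with alpha = 0 (no smoothing) each knock is suppressed immediately and the following frames recover at once.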
As shown in fig. 3, in another embodiment, a computer storage medium 300 is provided, in which a program is stored, which when executed by a processor, performs the speech gain control method described in any of the above embodiments.
It should be noted that, although the foregoing embodiments have been described herein, the scope of the present invention is not limited thereby. Therefore, based on the innovative concepts of the present invention, alterations and modifications to the embodiments described herein, or equivalent structures or equivalent flow transformations made by the present description and drawings, apply the above technical solution, directly or indirectly, to other relevant technical fields, all of which are included in the scope of the invention.

Claims (9)

1. A method for controlling speech gain, comprising the steps of:
carrying out framing and Fourier transformation on the voice signal, and then converting the voice signal into a polar coordinate form to obtain an original amplitude spectrum and an original phase spectrum of the frequency domain signal;
preprocessing the original amplitude spectrum through a neural network model, wherein the preprocessing comprises the step of suppressing transient noise amplitude spectrum components in the original amplitude spectrum to obtain a voice enhancement amplitude spectrum; restoring the preprocessed voice enhancement amplitude spectrum into a time domain signal by combining the original phase spectrum;
calculating the energy ratio of the voice signal before preprocessing and the time domain signal after preprocessing;
performing AGC processing and correction processing on the time domain signal, including: framing the time domain signal, obtaining an envelope, and performing AGC (automatic gain control) processing on the envelope to obtain a gain coefficient; if the energy ratio is greater than a first preset threshold, the gain coefficient is not corrected; if the energy ratio is smaller than the first preset threshold and greater than a second preset threshold, the gain coefficient is corrected to no more than half of its value; and if the energy ratio is smaller than the second preset threshold, the gain coefficient is corrected to the product of the gain coefficient and the energy ratio;
and finally, applying the gain coefficient to the time domain signal to complete gain control of the voice signal.
2. The voice gain control method of claim 1, wherein the neural network model structure comprises an input layer, an output layer, a first fully connected layer, a second fully connected layer, a first LSTM layer and a second LSTM layer, and the model is trained on open-source and self-developed datasets.
3. The speech gain control method of claim 2, wherein the input layer has 128 neurons, the 128 neurons corresponding to 128 magnitude spectrum values; the output layer has 128 neurons, the 128 neurons corresponding to 128 speech enhancement magnitude spectrum values; the first fully connected layer has 64 neurons, the second fully connected layer has 128 neurons, the first LSTM layer has 64 neurons, and the second LSTM layer has 128 neurons.
4. The method according to claim 1, wherein, in the correction processing, the energy ratio of the speech signal before and after preprocessing is obtained and its logarithm is compared with the second preset threshold; after preprocessing by the neural network model, the energy ratio is smaller than the second preset threshold in the correction processing, and if the energy ratio is smaller than the second preset threshold, it is first converted to a linear scale and then multiplied by the gain coefficient.
5. The method according to claim 1, wherein the step of preprocessing the frequency domain signal by a neural network model to suppress a noise signal in the frequency domain signal, comprises the steps of:
dividing the frequency domain signal into a voice section containing human voice and a voice-free section not containing human voice, wherein the suppression intensity of the noise signal in the voice section is greater than or equal to 12dB, and the suppression intensity of the noise signal in the voice-free section is greater than or equal to 24dB.
6. The voice gain control method of claim 1, further comprising, after the speech enhancement magnitude spectrum is obtained, smoothing the speech enhancement magnitude spectrum from frame to frame with a smoothing coefficient below 0.1.
7. The speech gain control method of claim 6, wherein the smoothing coefficient is 0.
8. The method of claim 1, wherein the neural network model is a deeply trained LSTM neural network model, and deep training of the LSTM neural network model with different data sets in different training phases comprises:
training the LSTM neural network model with speech free of transient noise as the input and the features of that speech as the output, and obtaining a parameter A after the LSTM neural network model converges; and
using features of noisy speech as the input of the LSTM neural network model and features of speech free of transient noise as its output, continuing to train the LSTM neural network model from the parameter A, obtaining a parameter B after the LSTM neural network model converges, and determining the parameter B as the parameter of the LSTM neural network model.
9. A computer storage medium, characterized in that the computer storage medium has stored therein a program which, when executed by a processor, performs the speech gain control method according to any of the preceding claims 1 to 8.
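Claims 2 and 3 give the width of each layer but, in this excerpt, not the order in which the layers are connected. The Python sketch below assumes one plausible order (input, first fully connected, first LSTM, second LSTM, second fully connected, output), treats the output layer as a dense 128-unit layer, and counts trainable parameters with the common single-bias LSTM formula 4·h·(i + h + 1). The ordering, the dense output layer and the parameter convention are assumptions rather than details taken from the patent; the names `LAYERS` and `count_params` are illustrative.

```python
# Layer widths from claim 3; the connection order below is an ASSUMPTION.
INPUT_BINS = 128  # one input neuron per magnitude-spectrum value

# (kind, units) in assumed forward order after the 128-neuron input layer
LAYERS = [
    ("dense", 64),   # first fully connected layer
    ("lstm", 64),    # first LSTM layer
    ("lstm", 128),   # second LSTM layer
    ("dense", 128),  # second fully connected layer
    ("dense", 128),  # output layer (assumed dense): 128 enhanced magnitudes
]


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def lstm_params(n_in, hidden):
    """Four gates, each with input weights, recurrent weights and one bias."""
    return 4 * hidden * (n_in + hidden + 1)


def count_params(n_in=INPUT_BINS, layers=LAYERS):
    """Return per-layer and total trainable parameter counts."""
    per_layer = []
    for kind, units in layers:
        if kind == "dense":
            per_layer.append(dense_params(n_in, units))
        else:
            per_layer.append(lstm_params(n_in, units))
        n_in = units  # the next layer consumes this layer's output
    return per_layer, sum(per_layer)


per_layer, total = count_params()
print(per_layer, total)  # [8256, 33024, 98816, 16512, 16512] 173120
```

Under these assumptions the model is small (about 173k parameters), with the second LSTM layer holding most of them. Frameworks that keep two bias vectors per LSTM gate report slightly larger counts.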
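Claim 4 leaves several details open: the direction of the energy ratio, the base and scaling of the logarithm, and what happens when the ratio is not below the second preset threshold. The sketch below assumes the ratio is the energy after neural-network preprocessing divided by the energy before it, expressed in dB (10·log10), and that the gain is left unchanged when the ratio is at or above the threshold. The -6 dB threshold in the usage line is purely illustrative, and `correct_gain` is a hypothetical name.

```python
import math


def correct_gain(energy_before, energy_after, gain, threshold_db):
    """Hypothetical correction step for claim 4 (assumed conventions).

    energy_before: frame energy before neural-network preprocessing
    energy_after:  frame energy after neural-network preprocessing
    gain:          gain coefficient computed elsewhere in the method
    threshold_db:  the "second preset threshold", assumed to be in dB
    """
    eps = 1e-12  # avoids log10(0) and division by zero on silent frames
    ratio_db = 10.0 * math.log10((energy_after + eps) / (energy_before + eps))
    if ratio_db < threshold_db:
        # Convert the logarithmic ratio back to a linear scale, then scale the gain.
        linear_ratio = 10.0 ** (ratio_db / 10.0)
        return gain * linear_ratio
    return gain  # ASSUMPTION: gain left unchanged when not below the threshold


# Preprocessing removed 90% of the energy (-10 dB), below a -6 dB threshold.
print(correct_gain(100.0, 10.0, 2.0, -6.0))  # approximately 0.2
```

The effect, under these assumptions, is that frames from which the model removed a large share of the energy (mostly noise) receive proportionally less gain, so the gain stage does not re-amplify what the suppression stage took out.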
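Claim 5 sets minimum suppression depths for the noise signal: at least 12 dB in segments containing human voice and at least 24 dB in segments without it. One way to read these figures is as an acceptance check on a single segment: measure the noise energy entering and leaving the preprocessing stage, express the attenuation in dB, and compare it with the floor for that segment's class. The sketch assumes a separate voice activity decision supplies `is_speech` and that the noise energies are measurable (for example, in a test with a known added noise); all function names are hypothetical.

```python
import math

# Minimum noise suppression from claim 5, in dB.
MIN_SUPPRESSION_SPEECH_DB = 12.0      # segments containing human voice
MIN_SUPPRESSION_NON_SPEECH_DB = 24.0  # segments without human voice


def suppression_db(noise_energy_in, noise_energy_out):
    """Attenuation of the noise component, as a positive number of dB."""
    return 10.0 * math.log10(noise_energy_in / noise_energy_out)


def meets_claimed_suppression(noise_energy_in, noise_energy_out, is_speech):
    """Check one segment against the floor that applies to its class.

    is_speech is assumed to come from a separate voice activity decision.
    """
    if is_speech:
        required = MIN_SUPPRESSION_SPEECH_DB
    else:
        required = MIN_SUPPRESSION_NON_SPEECH_DB
    return suppression_db(noise_energy_in, noise_energy_out) >= required


# 1.0 -> 0.05 is about 13 dB: enough for a speech segment, not for a non-speech one.
print(meets_claimed_suppression(1.0, 0.05, is_speech=True))   # True
print(meets_claimed_suppression(1.0, 0.05, is_speech=False))  # False
```

The stricter floor for non-speech segments reflects that there is no voice to protect there, so the noise can be driven down harder without audible damage to speech.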
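Claims 6 and 7 do not spell out the smoothing recursion. A common choice is first-order recursive (exponential) smoothing across frames, S_t = α·S_(t-1) + (1 - α)·X_t, where α is the weight on the previous smoothed frame. Under that assumed convention, a coefficient below 0.1 keeps the output close to the current frame, and the value 0 from claim 7 passes each frame through unchanged. If the patent defines the coefficient the other way round, the roles of α and 1 - α swap. `smooth_frames` is a hypothetical name.

```python
def smooth_frames(frames, alpha):
    """First-order recursive smoothing of magnitude spectra across frames.

    frames: list of frames, each a list of per-bin magnitudes
    alpha:  weight on the previous smoothed frame (ASSUMED convention)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    smoothed = []
    prev = None
    for frame in frames:
        if prev is None:
            current = list(frame)  # the first frame has no history to blend with
        else:
            current = [alpha * p + (1.0 - alpha) * x for p, x in zip(prev, frame)]
        smoothed.append(current)
        prev = current
    return smoothed


frames = [[1.0, 0.0], [0.0, 1.0]]
print(smooth_frames(frames, 0.05))           # second frame becomes [0.05, 0.95]
print(smooth_frames(frames, 0.0) == frames)  # True: alpha = 0 disables smoothing
```

A small α limits how far energy from one frame leaks into the next, which keeps transients and word onsets sharp at the cost of less frame-to-frame stability.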
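Claim 8 describes a two-phase schedule: first train on speech free of transient noise to obtain parameter A, then continue from A with noisy input and clean targets to obtain parameter B. The toy below shows only that schedule. A scalar linear model y = w·x + b, trained by full-batch gradient descent on mean squared error, stands in for the LSTM, and the "features" are synthetic numbers in which the clean features are simply the clean values. The model, data, learning rate and step count are all assumptions chosen so the toy converges; `train` is a hypothetical helper.

```python
def train(params, inputs, targets, lr=0.02, steps=5000):
    """Full-batch gradient descent on mean squared error for y = w * x + b.

    A scalar linear model stands in for the LSTM here (toy assumption);
    `params` is the starting point, which is what makes warm-starting possible.
    """
    w, b = params
    n = len(inputs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(inputs, targets)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, inputs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Synthetic features: "clean" values and a "noisy" version of them (hypothetical data).
clean = [0.0, 0.5, 1.0, 1.5, 2.0]
noisy = [2.0 * x + 1.0 for x in clean]

# Phase 1: clean input, clean target, trained from scratch -> parameter A.
param_a = train((0.0, 0.0), clean, clean)

# Phase 2: noisy input, clean target, continued from parameter A -> parameter B.
param_b = train(param_a, noisy, clean)

print(param_a, param_b)  # approximately (1.0, 0.0) and (0.5, -0.5)
```

The usual motivation for such a warm start is that phase 2 begins from a model that already reproduces clean features, so it only has to learn how to undo the corruption. In this convex toy both phases reach the same optimum regardless of the starting point, so the sketch illustrates the schedule, not its benefit.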

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011098089.1A CN112242147B (en) 2020-10-14 2020-10-14 Voice gain control method and computer storage medium

Publications (2)

Publication Number Publication Date
CN112242147A 2021-01-19
CN112242147B 2023-12-19

Family

ID=74169185

Country Status (1)

Country Link
CN (1) CN112242147B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113823312B (en) * 2021-02-19 2023-11-07 北京沃东天骏信息技术有限公司 Speech enhancement model generation method and device, and speech enhancement method and device
CN113436640B (en) * 2021-06-28 2022-11-25 歌尔科技有限公司 Audio noise reduction method, device and system and computer readable storage medium
CN113470691A (en) * 2021-07-08 2021-10-01 浙江大华技术股份有限公司 Automatic gain control method of voice signal and related device thereof
CN113823309B (en) * 2021-11-22 2022-02-08 成都启英泰伦科技有限公司 Noise reduction model construction and noise reduction processing method
CN113921030B (en) * 2021-12-07 2022-06-07 江苏清微智能科技有限公司 Speech enhancement neural network training method and device based on weighted speech loss
CN114566152B (en) * 2022-04-27 2022-07-08 成都启英泰伦科技有限公司 Voice endpoint detection method based on deep learning

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105515597A (en) * 2015-12-02 2016-04-20 中国电子科技集团公司第四十一研究所 Automatic gain control circuit for receivers
CN108877775A (en) * 2018-06-04 2018-11-23 平安科技(深圳)有限公司 Voice data processing method, device, computer equipment and storage medium
CN110036440A (en) * 2016-10-18 2019-07-19 弗劳恩霍夫应用研究促进协会 Device and method for handling audio signal

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9173025B2 (en) * 2012-02-08 2015-10-27 Dolby Laboratories Licensing Corporation Combined suppression of noise, echo, and out-of-location signals
EP3053356B8 (en) * 2013-10-30 2020-06-17 Cerence Operating Company Methods and apparatus for selective microphone signal combining
US11017798B2 (en) * 2017-12-29 2021-05-25 Harman Becker Automotive Systems Gmbh Dynamic noise suppression and operations for noisy speech signals

Non-Patent Citations (1)

Title
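Phase spectrum compensation speech enhancement algorithm based on sparsity; 张天骐 (Zhang Tianqi) et al.; 信号处理 (Journal of Signal Processing), Vol. 36, No. 11, pp. 1867-1876 *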

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant