CN110660408B - Method and device for digital automatic gain control - Google Patents

Publication number
CN110660408B
Authority
CN
China
Prior art keywords
gain
signal data
speech
signal
amplitude
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910860075.XA
Other languages
Chinese (zh)
Other versions
CN110660408A (en
Inventor
何志辉
林立峰
康元勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yealink Network Technology Co Ltd
Original Assignee
Xiamen Yealink Network Technology Co Ltd
Application filed by Xiamen Yealink Network Technology Co Ltd
Priority to CN201910860075.XA
Publication of CN110660408A
Priority to EP20195635.6A (EP3792918B1)
Application granted
Publication of CN110660408B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application discloses a method and a device for digital automatic gain control. One embodiment of the method comprises: calculating the speech probability P_n of each segment of the read signal data using a speech probability calculation model; performing speech envelope estimation on the signal data to obtain a speech envelope amplitude; calculating a first gain according to the deviation of the speech envelope amplitude from the desired amplitude; classifying the signal data based on the speech probability P_n, and counting the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain; and adjusting the signal amplitude of the signal data using the first gain and the second gain. This embodiment helps achieve an overall amplification effect and reduces the distortion caused by automatic gain processing; when the user changes, the signal gain can be adjusted quickly to suit the new user, and when nearby noise is excessive, the device also adjusts the gain automatically to reduce noise amplification.

Description

Method and device for digital automatic gain control
Technical Field
The present application relates to the field of signal processing, and in particular, to a method and apparatus for digitally and automatically controlling gain.
Background
In a traditional video conference system, the level of the voice data picked up by the microphone depends on the distance between the speaker and the microphone and on how loudly the speaker talks. If the voice data is transmitted directly, the volume heard by the receiver tends to fluctuate between loud and quiet, which seriously degrades call quality. The main solution to this problem is to control the sending volume with an automatic gain control (AGC) algorithm, which applies different gains to input signals of different amplitudes: for example, a weak signal receives a larger gain and a strong signal a smaller gain, so that the amplitude of the transmitted signal is finally stabilized within a certain range.
Some digital automatic gain control schemes already exist. They typically calculate the gain from the deviation of the module input amplitude from the desired amplitude and then adjust the output amplitude of the signal. Because the amplitude of a speech signal changes rapidly, the current input signal is usually not used directly as the AGC module input; the envelope of the input signal is used instead. The envelope is generally obtained by smoothing the peak signal. Since the envelope varies with the speech signal, the calculated gain also varies with it, which inevitably distorts the processed speech signal. Moreover, the envelope value is small during non-speech, so the calculated gain becomes too large and produces loud output noise. There are also schemes that identify the user with voiceprint recognition. When the device recognizes that a user is speaking for the first time, it calculates the steady-state gain with standard AGC and stores it in memory; when the device recognizes a user with a usage history, it retrieves the historical gain directly from memory. This method requires voiceprint recognition, so its computational cost is high and its result is sensitive to the recognition accuracy.
Disclosure of Invention
The present application is directed to an improved method and apparatus for digitally and automatically controlling gain to solve the above problems.
According to a first aspect of the present application, there is provided a method of digitally automatically controlling gain, the method comprising: calculating the speech probability P_n of each segment of the read signal data using a speech probability calculation model; performing speech envelope estimation on the signal data to obtain a speech envelope amplitude; calculating a first gain according to the deviation of the speech envelope amplitude from the desired amplitude; classifying the signal data based on the speech probability P_n, and counting the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain; and adjusting the signal amplitude of the signal data using the first gain and the second gain.
In some embodiments, performing speech envelope estimation on the signal data to obtain the speech envelope amplitude specifically comprises: performing speech envelope estimation on each segment of signal data, where the speech envelope estimation formula is:
Figure BDA0002199478630000021
where A[n] is the speech envelope estimate at the n-th processing step, E[n] is the maximum value of a segment of signal data, α is the update coefficient, and P_T1 is the speech update threshold;
the update coefficient α is calculated as follows:
Figure BDA0002199478630000022
where Δ is the adjustment amount of the update coefficient and P_T2 is the update threshold;
after the update coefficient α is obtained, its maximum and minimum values are limited:
Figure BDA0002199478630000023
where α_0 is the lower threshold and α_0 < 1.
In some embodiments, calculating the first gain according to the deviation of the speech envelope amplitude from the desired amplitude specifically uses the following first gain formula:
Figure BDA0002199478630000024
where A_T is the desired amplitude.
In some embodiments, classifying the signal data based on the speech probability P_n and counting the number of occurrences of noise in the signal data based on the classification result, thereby calculating the second gain, specifically comprises: classifying the signal data according to the speech probability P_n as follows:
Figure BDA0002199478630000031
where P_T3 is the classification threshold and T[n] is the classification result, with 1 representing speech and 0 representing noise;
calculating the second gain from the number c of noise occurrences in the voice data, counted based on the classification result:
Figure BDA0002199478630000032
where c1 and c2 are thresholds with c1 < c2, and g_min is the minimum gain with g_min < 1.
In some embodiments, adjusting the signal amplitude of the signal data using the first gain and the second gain specifically uses the following signal amplitude adjustment formula:
x_o = g_1 * g_2 * x
where x is the signal data and x_o is the amplitude-adjusted output.
In some embodiments, the method further comprises: dynamically compressing the signal data with a dynamic compressor, where the dynamic compression formula is:
Figure BDA0002199478630000033
where T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
According to a second aspect of the present application, there is provided an apparatus for digitally and automatically controlling gain, the apparatus comprising: a speech probability calculation module configured to calculate the speech probability P_n of each segment of the read signal data using a speech probability calculation model; a speech envelope estimation module configured to perform speech envelope estimation on the signal data to obtain a speech envelope amplitude; a first gain calculation module configured to calculate a first gain according to the deviation of the speech envelope amplitude from the desired amplitude; a second gain calculation module configured to classify the signal data based on the speech probability P_n and count the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain; and a signal amplitude adjustment module configured to adjust the signal amplitude of the signal data using the first gain and the second gain.
In some embodiments, the apparatus further comprises: a signal data classification module configured to classify the signal data according to the speech probability P_n as follows:
Figure BDA0002199478630000041
where P_T3 is the classification threshold and T[n] is the classification result, with 1 representing speech and 0 representing noise.
In some embodiments, the apparatus further comprises: a dynamic compression module configured to dynamically compress the signal data with a dynamic compressor, where the dynamic compression formula is:
Figure BDA0002199478630000042
where T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
According to a third aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the method and the device for digital automatic gain control, the first gain and the second gain are obtained through speech probability calculation, speech envelope estimation and signal data classification, and are applied to the signal data to adjust the signal amplitude. This helps achieve an overall amplification effect and reduces the distortion caused by automatic gain processing; when the user changes, the signal gain can be adjusted quickly to suit the new user. In addition, the device automatically adjusts the gain when nearby noise is excessive, which reduces noise amplification.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of digitally automatically controlling gain according to the present application;
FIG. 3 is a flow chart of yet another embodiment of a method of digitally automatically controlling gain according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of an apparatus for digitally and automatically controlling gain according to the present application;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which the method of digitally automatically controlling gain of embodiments of the present application may be applied.
As shown in FIG. 1, system architecture 100 may include a data server 101, a network 102, and a host server 103. Network 102 serves as a medium for providing a communication link between data server 101 and host server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The main server 103 may be a server that provides various services, such as a data processing server that processes information uploaded by the data server 101. The data processing server can process the received event information and store the processing result (such as element information set and label) in the event information base in an associated manner.
It should be noted that the method for digitally and automatically controlling gain provided in the embodiment of the present application is generally executed by the host server 103, and accordingly, the apparatus for digitally and automatically controlling gain is generally disposed in the host server 103.
The data server and the main server may be hardware or software. When the hardware is used, the hardware can be implemented as a distributed server cluster consisting of a plurality of servers, or can be implemented as a single server. When software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module.
It should be understood that the number of data servers, networks, and host servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a digital automatic control gain method according to the present application is shown. The method comprises the following steps:
Step 201, calculating the speech probability P_n of each segment of the read signal data using a speech probability calculation model.
As an example, each segment of signal data is set to 10 ms of voice data; the segment length may be set according to the actual situation and the usage scenario.
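The segmentation in this step can be illustrated with a minimal Python sketch. It assumes the signal is a list of float samples; the helper names `split_into_segments` and `segment_peak` are hypothetical, and taking the largest absolute sample as the per-segment maximum is an assumption about how the maximum value used later for envelope estimation is obtained. The speech probability model itself is not specified in the text and is not modeled here.

```python
import math


def split_into_segments(samples, sample_rate, segment_ms=10):
    """Split a list of samples into consecutive segments of segment_ms milliseconds.

    Trailing samples that do not fill a whole segment are dropped.
    """
    size = sample_rate * segment_ms // 1000  # samples per segment
    if size <= 0:
        raise ValueError("segment length must cover at least one sample")
    count = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(count)]


def segment_peak(segment):
    """Largest absolute sample value in one segment (assumed per-segment maximum)."""
    return max(abs(x) for x in segment)


# One second of a 440 Hz tone sampled at 16 kHz: 100 segments of 160 samples each.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
segments = split_into_segments(tone, sample_rate)
peaks = [segment_peak(s) for s in segments]
```

At 16 kHz a 10 ms segment holds 160 samples; other sample rates or segment lengths only change `size`.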
Step 202, performing speech envelope estimation on the signal data to obtain a speech envelope magnitude.
In this embodiment, each section of signal data is respectively subjected to speech envelope estimation processing, and the speech envelope estimation calculation formula is:
Figure BDA0002199478630000061
where A[n] is the speech envelope estimate at the n-th processing step, E[n] is the maximum value of a segment of signal data, α is the update coefficient, and P_T1 is the speech update threshold.
As an example, when the speech probability of the signal data is greater than the threshold, the input signal data is more likely to be speech, and only then is the speech envelope updated. This prevents a noise signal from making the estimated envelope value too small, which would cause the noise to be amplified. The value of the update coefficient α controls the smoothness of the envelope: the larger the value, the more the envelope fluctuates and the faster the estimate approaches the maximum value of the current input signal; the smaller the value, the smoother the envelope.
The update coefficient α is calculated as follows:
Figure BDA0002199478630000062
where Δ is the adjustment amount of the update coefficient and P_T2 is the update threshold;
after the update coefficient α is obtained, its maximum and minimum values are limited:
Figure BDA0002199478630000071
where α_0 is the lower threshold and α_0 < 1.
As an example, the value of α increases when the speech probability of the input signal is low and decreases when it is high. When the input signal is continuous speech, α gradually decreases, the envelope updates more and more slowly, and the estimated envelope becomes more stable; when the input signal is non-speech many times in a row, α gradually increases, the envelope updates faster and faster, and the estimate tracks the current input maximum more closely.
In some alternative implementations of the present embodiment, when the signal data comes from the same person, the smaller the variation of the processing gain the better, so the envelope estimate should be smooth. When the user changes, the gain needs to be adjusted quickly, so the envelope estimate needs to change quickly. Under normal conditions, when the same person is speaking the signal data is highly continuous, and a stable envelope is obtained in the manner described above; after one person stops speaking, the update coefficient gradually increases, so that when another person starts speaking the envelope estimate can quickly track the input signal and the envelope changes quickly, then settles as that user continues to speak.
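The behaviour described for step 202 can be sketched in Python as follows. The formula images are not reproduced in this text, so the sketch rests on assumptions: the envelope is taken as the recursive average A[n] = (1 - α)·A[n-1] + α·E[n] when P_n > P_T1 and is held otherwise; α is decreased by Δ when P_n > P_T2 and increased by Δ otherwise; and α is limited to the range [α_0, 1]. The class name `EnvelopeEstimator` and all numeric defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnvelopeEstimator:
    p_t1: float = 0.5      # speech update threshold P_T1 (assumed value)
    p_t2: float = 0.5      # coefficient update threshold P_T2 (assumed value)
    delta: float = 0.05    # adjustment amount of the update coefficient (assumed value)
    alpha_0: float = 0.05  # lower threshold alpha_0 < 1 (assumed value)
    alpha: float = 1.0     # update coefficient, starting at its upper limit
    envelope: float = 0.0  # current envelope estimate A[n]

    def update(self, peak, speech_prob):
        """Process one segment: peak is E[n], speech_prob is P_n."""
        # Likely speech slows the update; likely noise speeds it up (assumed rule).
        if speech_prob > self.p_t2:
            self.alpha -= self.delta
        else:
            self.alpha += self.delta
        # Limit the coefficient to [alpha_0, 1].
        self.alpha = min(1.0, max(self.alpha_0, self.alpha))
        # Update the envelope only for segments that are likely speech.
        if speech_prob > self.p_t1:
            self.envelope = (1.0 - self.alpha) * self.envelope + self.alpha * peak
        return self.envelope


estimator = EnvelopeEstimator()
for _ in range(50):                       # first talker, steady level
    estimator.update(peak=0.2, speech_prob=0.9)
for _ in range(30):                       # pause: the envelope is held, alpha grows
    estimator.update(peak=0.01, speech_prob=0.1)
after_switch = estimator.update(peak=0.8, speech_prob=0.9)  # louder second talker
```

With these assumed values the estimate settles at 0.2 for the first talker, is held during the pause, and jumps to about 0.77 on the first segment of the louder talker because α has recovered to nearly 1.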
Step 203, a first gain is calculated according to the deviation of the speech envelope amplitude from the desired amplitude.
In this embodiment, a first gain is calculated based on the deviation between the speech envelope amplitude and the desired amplitude, and the first gain calculation formula is:
Figure BDA0002199478630000072
where A_T is the desired amplitude.
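A minimal Python sketch of step 203 follows. The first gain formula is an image that is not reproduced in this text, so the sketch assumes the simplest form consistent with the description, the ratio of the desired amplitude A_T to the envelope estimate A[n]. The function name `first_gain`, the `max_gain` cap and the `eps` guard against division by zero are hypothetical additions, not taken from the source.

```python
def first_gain(envelope, target_amplitude, max_gain=30.0, eps=1e-9):
    """Gain that would bring the envelope to the desired amplitude (assumed g1 = A_T / A[n])."""
    gain = target_amplitude / max(envelope, eps)  # eps avoids dividing by zero
    return min(gain, max_gain)                    # cap is an assumption, not from the source


quiet_talker = first_gain(envelope=0.1, target_amplitude=0.5)  # boosted
loud_talker = first_gain(envelope=1.0, target_amplitude=0.5)   # attenuated
```

Under this assumption a quiet talker with an envelope of 0.1 receives a gain of about 5, while a loud talker with an envelope of 1.0 receives a gain of 0.5, so both outputs move toward the same desired amplitude.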
Step 204, classifying the signal data based on the speech probability P_n, and counting the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain.
In the present embodiment, the signal data is classified according to the speech probability P_n as follows:
Figure BDA0002199478630000073
where P_T3 is the classification threshold and T[n] is the classification result, with 1 representing speech and 0 representing noise.
The second gain is calculated from the number c of noise occurrences in the voice data, counted based on the classification result:
Figure BDA0002199478630000081
where c1 and c2 are thresholds with c1 < c2, and g_min is the minimum gain with g_min < 1.
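Step 204 can be sketched in Python as follows. The classification and second gain formulas are images that are not reproduced in this text, so several points are assumptions: a segment is labelled speech when P_n > P_T3; the noise count c is taken over a sliding window of recent segments; and the second gain is 1 up to c1, g_min from c2 onward, and linearly interpolated in between. The class name `SecondGain`, the window length and all default values are hypothetical.

```python
from collections import deque


class SecondGain:
    def __init__(self, p_t3=0.5, c1=20, c2=80, g_min=0.3, window=100):
        if not c1 < c2:
            raise ValueError("c1 must be smaller than c2")
        self.p_t3 = p_t3                     # classification threshold P_T3
        self.c1, self.c2 = c1, c2            # noise count thresholds
        self.g_min = g_min                   # minimum gain, g_min < 1
        self.history = deque(maxlen=window)  # recent labels T[n] (window is an assumption)

    def update(self, speech_prob):
        """Classify one segment and return the second gain g2."""
        label = 1 if speech_prob > self.p_t3 else 0  # 1 = speech, 0 = noise
        self.history.append(label)
        c = self.history.count(0)                    # noise occurrences in the window
        if c <= self.c1:
            return 1.0
        if c >= self.c2:
            return self.g_min
        # Assumed linear transition from 1 down to g_min between c1 and c2.
        fraction = (c - self.c1) / (self.c2 - self.c1)
        return 1.0 - fraction * (1.0 - self.g_min)


second_gain = SecondGain()
gains = [second_gain.update(0.9) for _ in range(100)]   # speech only
gains += [second_gain.update(0.1) for _ in range(100)]  # then noise only
```

In this run the gain stays at 1 while speech dominates the window and falls gradually to g_min as noise segments replace speech segments, which avoids an abrupt level change at the transition.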
Step 205, performing signal gain amplitude adjustment on the signal data by using the first gain and the second gain.
In this embodiment, signal gain amplitude adjustment is performed on the signal data based on the first gain and the second gain, using the following formula:
x_o = g_1 * g_2 * x
where x is the signal data and x_o is the gain-adjusted output.
As an example, if noise occurs many times within a period of time, it is likely that no one is speaking locally; g_2 is then less than 1 and the final gain is less than g_1, which reduces the gain applied to noise while no one is speaking locally. Conversely, if the noise count within the statistics period is below the threshold, it is likely that someone is speaking locally; g_2 equals 1, the final gain is g_1, and the output signal amplitude tends toward the desired amplitude.
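The adjustment of step 205 is the product x_o = g_1 * g_2 * x given in the text. The short Python sketch below applies it to one segment and contrasts the two cases just described; the function name `adjust_amplitude` and the numeric gain values are hypothetical.

```python
def adjust_amplitude(segment, g1, g2):
    """Scale every sample x of one segment to x_o = g1 * g2 * x."""
    gain = g1 * g2
    return [gain * x for x in segment]


segment = [0.05, -0.1, 0.08]
# Someone is speaking locally: g2 = 1, so the final gain is g1.
speech_out = adjust_amplitude(segment, g1=4.0, g2=1.0)
# Mostly noise in the statistics period: g2 < 1 lowers the final gain below g1.
noise_out = adjust_amplitude(segment, g1=4.0, g2=0.3)
```

With g_1 = 4 the speech case is amplified fourfold, while the noise case is scaled by only about 1.2 because g_2 = 0.3.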
According to the method and the device for digital automatic gain control, the first gain and the second gain are obtained through speech probability calculation, speech envelope estimation and signal data classification, and are applied to the signal data to adjust the signal amplitude. This helps achieve an overall amplification effect and reduces the distortion caused by automatic gain processing; when the user changes, the signal gain can be adjusted quickly to suit the new user. In addition, the device automatically adjusts the gain when nearby noise is excessive, which reduces noise amplification.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method of digitally automatically controlling gain according to the present application is shown. The method comprises the following steps:
Step 301, calculating the speech probability P_n of each segment of the read signal data using a speech probability calculation model.
In this embodiment, step 301 is substantially the same as step 201 in the embodiment corresponding to fig. 2, and is not described here again.
Step 302, performing speech envelope estimation on the signal data to obtain a speech envelope magnitude.
In this embodiment, step 302 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 303, a first gain is calculated according to the deviation between the speech envelope amplitude and the desired amplitude.
In this embodiment, step 303 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 304, classifying the signal data based on the speech probability P_n, and counting the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain.
In this embodiment, step 304 is substantially the same as step 204 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 305, performing signal gain amplitude adjustment on the signal data by using the first gain and the second gain.
In this embodiment, step 305 is substantially the same as step 205 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 306, dynamically compressing the signal data by the dynamic compressor.
In this embodiment, the signal data is dynamically compressed by a dynamic compressor, and the dynamic compression formula is:
Figure BDA0002199478630000091
where T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
As an example, when the user speaks steadily, the envelope estimate obtained during that period is also steady, so an overall amplification effect can be achieved. However, when a user speaks, the amplitude of individual sounds is often significantly higher than that of the others, and if the overall amplification is still applied, these high-amplitude sounds may clip when output. Likewise, when a user quickly moves from far away to close to the microphone, the update coefficient cannot be raised to 1 in time, so the envelope estimate is smaller than the input maximum and the output signal amplitude may become too large.
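The dynamic compression of step 306 can be sketched in Python as follows. The compression formula is an image that is not reproduced in this text, so the sketch substitutes the widely used soft-knee static compression curve in the decibel domain, with T as the threshold, R as the ratio and W as the knee (transition) width; whether the patent's formula matches this curve exactly is an assumption. The function names, the default parameter values and the per-sample application without attack or release smoothing are also assumptions.

```python
import math


def compress_db(x_db, threshold_db, ratio, knee_db):
    """Soft-knee static curve: input level in dB to output level in dB."""
    over = x_db - threshold_db
    if 2 * over < -knee_db:
        return x_db                               # below the knee: unchanged
    if knee_db > 0 and 2 * abs(over) <= knee_db:
        # Inside the knee: quadratic blend between the two straight segments.
        return x_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    return threshold_db + over / ratio            # above the knee: slope 1/R


def compress_sample(x, threshold_db=-6.0, ratio=4.0, knee_db=6.0, floor=1e-9):
    """Apply the static curve to one gain-adjusted sample x_o and return y."""
    if x == 0:
        return 0.0
    level_db = 20 * math.log10(max(abs(x), floor))
    out_db = compress_db(level_db, threshold_db, ratio, knee_db)
    return math.copysign(10 ** (out_db / 20), x)


quiet = compress_sample(0.1)   # -20 dB, below the knee, passes unchanged
peak = compress_sample(-1.0)   # 0 dB, above the knee, reduced to -4.5 dB
```

With these assumed settings, a sample at full scale is reduced to about 0.6 while quiet samples pass through untouched, which limits the isolated high-amplitude sounds described above without altering the overall amplification.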
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the method for digitally and automatically controlling gain in the present embodiment highlights the step of dynamically compressing the signal data with the dynamic compressor. The scheme described in this embodiment can therefore reduce clipping caused by excessive gain and helps keep the voice signal output stable.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for digitally and automatically controlling gain, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 4, the apparatus 400 for digitally and automatically controlling gain of the present embodiment includes: a speech probability calculation module 401 configured to calculate the speech probability P_n of each segment of the read signal data using a speech probability calculation model; a speech envelope estimation module 402 configured to perform speech envelope estimation on the signal data to obtain a speech envelope amplitude; a first gain calculation module 403 configured to calculate a first gain according to the deviation of the speech envelope amplitude from the desired amplitude; a second gain calculation module 404 configured to classify the signal data based on the speech probability P_n and count the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain; and a signal gain amplitude adjustment module 405 configured to perform signal gain amplitude adjustment on the signal data using the first gain and the second gain.
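The module structure of apparatus 400 can be sketched in Python as a small class that wires the five modules together. This is only an illustration of the data flow between modules 401 to 405; the class name `GainControlDevice`, the callable signatures and the stand-in lambdas used in the demo are hypothetical and do not reproduce the patent's formulas.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class GainControlDevice:
    speech_probability: Callable[[Sequence[float]], float]        # module 401
    envelope_estimate: Callable[[Sequence[float], float], float]  # module 402
    first_gain: Callable[[float], float]                          # module 403
    second_gain: Callable[[float], float]                         # module 404

    def process(self, segment: Sequence[float]) -> List[float]:
        """Run one segment through modules 401 to 405 and return the adjusted samples."""
        p_n = self.speech_probability(segment)
        envelope = self.envelope_estimate(segment, p_n)
        g1 = self.first_gain(envelope)
        g2 = self.second_gain(p_n)
        return [g1 * g2 * x for x in segment]                     # module 405


# Demo with trivial stand-ins for each module (assumptions for illustration only).
device = GainControlDevice(
    speech_probability=lambda seg: 1.0,                  # pretend every segment is speech
    envelope_estimate=lambda seg, p: max(abs(x) for x in seg),
    first_gain=lambda env: 0.5 / env,                    # desired amplitude 0.5
    second_gain=lambda p: 1.0,
)
adjusted = device.process([0.1, -0.25])
```

Keeping each module behind a callable makes it possible to swap in a different speech probability model or gain rule without touching the rest of the chain.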
In this embodiment, the speech probability calculation module 401 may calculate the speech probability P_n of each segment of the read signal data using the speech probability calculation model.
As an example, each segment of signal data is set to 10 ms of voice data; the segment length may be set according to the actual situation and the usage scenario.
In this embodiment, the speech envelope estimation module 402 may perform speech envelope estimation on the signal data to obtain a speech envelope magnitude.
Specifically, the speech envelope estimation module 402 performs speech envelope estimation processing on each section of signal data, and the speech envelope estimation calculation formula is as follows:
Figure BDA0002199478630000101
where A[n] is the speech envelope estimate at the n-th processing step, E[n] is the maximum value of a segment of signal data, α is the update coefficient, and P_T1 is the speech update threshold.
As an example, when the speech probability of the signal data is greater than the threshold, the input signal data is more likely to be speech, and only then is the speech envelope updated. This prevents a noise signal from making the estimated envelope value too small, which would cause the noise to be amplified. The value of the update coefficient α controls the smoothness of the envelope: the larger the value, the more the envelope fluctuates and the faster the estimate approaches the maximum value of the current input signal; the smaller the value, the smoother the envelope.
The update coefficient α is calculated as follows:
Figure BDA0002199478630000111
wherein Δ is the adjustment amount of the update coefficient and P_T2 is the update threshold;
after the update coefficient α is obtained, its maximum and minimum values are limited:
Figure BDA0002199478630000112
wherein α_0 is the lower threshold and α_0 < 1.
As an example, the α value increases when the speech probability of the input signal is low and decreases when it is high. When the input signal is continuous speech, α gradually decreases, the envelope is updated more and more slowly, and the estimated envelope becomes more stable; when the input signal is repeatedly non-speech, α gradually increases, the envelope is updated faster and faster, and the envelope estimate moves closer to the current input maximum.
In some alternative implementations of the present embodiment, when the signal data comes from the same person, the smaller the gain variation during processing the better, so the envelope estimate should be as smooth as possible. When the speaker changes, the gain needs to be adjusted quickly, so the envelope estimate needs to change quickly. Under normal conditions, when the same person is speaking, the signal data is highly continuous and a stable envelope is obtained in the manner described above. After one person stops speaking, the update coefficient gradually increases; if another person then starts speaking, the envelope estimate quickly tracks the input signal and the envelope changes quickly, after which it stabilizes again as that person continues speaking.
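The behaviour described above can be sketched in Python. Because the formulas themselves appear only as figures, the code below is one plausible reading and not the patent's exact formula: the exponential-smoothing form of the envelope update, the additive ±Δ step, the upper clamp of 1, and all default values (`p_t1`, `p_t2`, `delta`, `alpha_min`) are assumptions made for illustration.

```python
def update_alpha(alpha, p_speech, p_t2=0.5, delta=0.05, alpha_min=0.1):
    """Lower alpha on likely speech, raise it on likely non-speech, then clamp.

    The clamp range [alpha_min, 1.0] is an assumption of this sketch.
    """
    if p_speech > p_t2:
        alpha -= delta  # continuous speech: slower, smoother envelope
    else:
        alpha += delta  # repeated non-speech: faster tracking later
    return min(1.0, max(alpha_min, alpha))


def update_envelope(envelope, segment_peak, alpha, p_speech, p_t1=0.5):
    """Move the envelope toward the segment peak only when speech is likely."""
    if p_speech > p_t1:
        return (1.0 - alpha) * envelope + alpha * segment_peak
    return envelope  # noise-like segment: keep the previous estimate
```

With α close to 1 the estimate jumps almost directly to the current segment peak; with α close to the lower threshold it changes slowly, which matches the qualitative description in the text.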
In this embodiment, the first gain calculation module 403 may calculate the first gain according to a deviation of the speech envelope amplitude from the desired amplitude.
Specifically, the first gain calculating module 403 calculates a first gain based on the deviation between the speech envelope amplitude and the desired amplitude, where the first gain calculating formula is:
Figure BDA0002199478630000113
wherein A_T is the desired amplitude.
In this embodiment, the second gain calculation module 404 may classify the signal data based on the speech probability P_n and count the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain.
Specifically, the second gain calculation module 404 classifies the signal data according to the speech probability P_n in the following manner:
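As a hedged illustration of a gain derived from the deviation between the envelope and the desired amplitude, the sketch below uses the simple ratio of desired amplitude to envelope estimate. The ratio form, the function name `first_gain`, and the small floor `eps` that avoids division by zero are assumptions; the formula in the figure may differ.

```python
def first_gain(envelope, desired_amplitude, eps=1e-9):
    """Gain that would scale the current envelope to the desired amplitude.

    The ratio form and the eps floor are assumptions of this sketch.
    """
    return desired_amplitude / max(envelope, eps)
```

For example, an envelope of 2000 with a desired amplitude of 8000 gives a gain of 4, while an envelope above the desired amplitude gives a gain below 1.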
Figure BDA0002199478630000121
wherein P_T3 is the classification threshold and T[n] is the classification result, where 1 represents speech and 0 represents noise.
The second gain is calculated from the number c of noise occurrences in the signal data, counted based on the classification result:
Figure BDA0002199478630000122
wherein c1 and c2 are thresholds with c1 < c2, and g_min is the minimum gain with g_min < 1.
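The classification and noise-count steps can be sketched as follows. The figures carrying the formulas are not reproduced in the text, so the details here are assumptions: the strict comparison against the classification threshold, the sliding window over recent segments used to obtain c, and the linear interpolation of the second gain between 1 and g_min for counts between c1 and c2. The names `classify`, `NoiseCounter`, and `second_gain` are hypothetical.

```python
from collections import deque


def classify(p_speech, p_t3=0.5):
    """Return 1 for speech and 0 for noise."""
    return 1 if p_speech > p_t3 else 0


class NoiseCounter:
    """Count noise-labelled segments over a sliding window (assumed scheme)."""

    def __init__(self, window=100):
        self.labels = deque(maxlen=window)  # oldest labels fall out

    def push(self, label):
        self.labels.append(label)
        return sum(1 for t in self.labels if t == 0)


def second_gain(c, c1, c2, g_min):
    """1 below c1, g_min above c2, linear in between (assumed shape)."""
    if c <= c1:
        return 1.0
    if c >= c2:
        return g_min
    return 1.0 - (1.0 - g_min) * (c - c1) / (c2 - c1)
```

A piecewise shape like this keeps the gain at 1 while noise is rare and lowers it gradually rather than abruptly as noise segments accumulate.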
In this embodiment, the signal gain and amplitude adjustment module 405 may perform signal gain and amplitude adjustment on the signal data by using the first gain and the second gain.
Specifically, the signal gain amplitude adjustment module 405 adjusts the signal gain amplitude of the signal data based on the first gain and the second gain, where the signal gain amplitude adjustment formula is as follows:
x_o = g_1 * g_2 * x
wherein x is the signal data, g_1 is the first gain, g_2 is the second gain, and x_o is the gain-amplitude-adjusted output.
As an example, if noise occurs many times within a period of time, it is more likely that nobody is speaking locally; g_2 is then less than 1 and the final gain is less than g_1, which reduces the amplification of noise while nobody is speaking locally. Conversely, if the number of noise occurrences within the counting period is below the threshold, it is more likely that someone is speaking locally; g_2 equals 1, the final gain is g_1, and the output signal amplitude tends toward the desired amplitude.
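The adjustment formula can be applied to one segment as in the short Python sketch below. The function name `apply_gains` and the use of a plain list of samples are assumptions for illustration.

```python
def apply_gains(segment, g1, g2):
    """Scale every sample of a segment by the product of the two gains."""
    gain = g1 * g2
    return [gain * x for x in segment]


# Someone is likely speaking: g2 == 1, so the final gain equals g1.
speech_out = apply_gains([100.0, -200.0], g1=4.0, g2=1.0)

# Nobody is likely speaking: g2 < 1, so the final gain is below g1.
noise_out = apply_gains([100.0, -200.0], g1=4.0, g2=0.25)
```

In the second call the same first gain produces a much smaller output, which is the noise-reduction effect described in the text.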
In some optional implementations of this embodiment, the apparatus 400 may further include: the dynamic compression module is used for dynamically compressing the signal data through a dynamic compressor, and the dynamic compression formula is as follows:
Figure BDA0002199478630000123
wherein T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
As an example, when the user speaks at a steady level, the envelope estimate obtained during that period is also steady, so the signal can be amplified as a whole. However, individual sounds in speech often have a significantly higher amplitude than the others, and if the same overall amplification is applied, these high-amplitude sounds may be distorted (clipped) at the output. In addition, when a user moves quickly from far away to close by, the update coefficient cannot be adjusted to 1 in time, so the envelope estimate is smaller than the input maximum and the output signal amplitude may become too large. Dynamic compression limits the output amplitude in both cases.
The apparatus provided by the above embodiment of the present application calculates a first gain and a second gain for the signal data through speech probability calculation, speech envelope estimation, and signal data classification, and uses them to adjust the signal gain amplitude. This helps achieve an overall amplification effect and reduces the distortion introduced by automatic gain processing; when the user changes, the signal gain amplitude is also adjusted quickly for the new user. In addition, the apparatus automatically adjusts the gain when nearby noise is excessive, reducing the amplification of noise.
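A compressor with a threshold T, a ratio R, and a transition (knee) width W is commonly written as a soft-knee curve in the decibel domain, and the Python sketch below follows that common textbook form. Whether the formula in the figure operates in decibels, and whether it matches this curve exactly, is not stated in the text, so this is an assumption; the names `compress_db` and `compress_sample` and the 16-bit full-scale value are hypothetical.

```python
import math


def compress_db(level_db, threshold_db, ratio, knee_db):
    """Soft-knee static compression curve on a level given in dB."""
    over = level_db - threshold_db
    if knee_db > 0 and 2 * abs(over) <= knee_db:
        # Inside the knee: blend smoothly between the two straight lines.
        return level_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if 2 * over > knee_db:
        # Above the knee: reduce the overshoot by the ratio.
        return threshold_db + over / ratio
    return level_db  # below the knee: unchanged


def compress_sample(x, threshold_db, ratio, knee_db, full_scale=32768.0):
    """Apply the curve to one sample, preserving its sign."""
    if x == 0:
        return 0.0
    level_db = 20 * math.log10(abs(x) / full_scale)
    out_db = compress_db(level_db, threshold_db, ratio, knee_db)
    return math.copysign(full_scale * 10 ** (out_db / 20), x)
```

Samples well below the threshold pass through unchanged, while peaks above it are scaled down, which limits the output when the overall gain would otherwise push individual sounds too high.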
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a speech probability calculation unit, a speech envelope estimation unit, a first gain calculation unit, a second gain calculation unit, and a signal gain amplitude adjustment unit.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: calculate, through a speech probability calculation model, the speech probability P_n of each segment of the read signal data; perform speech envelope estimation on the signal data to obtain a speech envelope amplitude; calculate a first gain according to the deviation of the speech envelope amplitude from the desired amplitude; classify the signal data based on the speech probability P_n, and count the number of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain; and perform signal gain amplitude adjustment on the signal data by using the first gain and the second gain.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (8)

1. A method for digitally and automatically controlling gain, said method comprising the steps of:
calculating, through a speech probability calculation model, the speech probability P_n of each segment of the read signal data;
performing speech envelope estimation on the signal data to obtain a speech envelope amplitude;
calculating a first gain according to the deviation between the speech envelope amplitude and the expected amplitude, wherein the calculation formula of the first gain is as follows:
Figure FDA0003385074820000011
wherein A_T is the desired amplitude and A[n] is the speech envelope estimate at the n-th processing step;
classifying the signal data based on the speech probability P_n, and counting the number c of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain, wherein the calculation formula of the second gain is as follows:
Figure FDA0003385074820000012
wherein c1 and c2 are thresholds with c1 < c2, and g_min is the minimum gain with g_min < 1;
and performing signal gain amplitude adjustment on the signal data by using the first gain and the second gain, wherein the signal gain amplitude adjustment formula is as follows: x_o = g_1 * g_2 * x, wherein x is the signal data and x_o is the gain-amplitude-adjusted output.
2. The method of claim 1, wherein the step of performing speech envelope estimation on the signal data to obtain speech envelope magnitude specifically comprises:
respectively carrying out voice envelope estimation processing on each section of signal data, wherein the voice envelope estimation calculation formula is as follows:
Figure FDA0003385074820000013
wherein A[n] is the speech envelope estimate at the n-th processing step, E[n] is the maximum value of the signal data, α is the update coefficient, and P_T1 is the speech update threshold;
wherein the calculation formula of the update coefficient α is:
Figure FDA0003385074820000021
wherein Δ is the adjustment amount of the update coefficient and P_T2 is the update threshold;
after the update coefficient α is obtained, limiting the maximum value and the minimum value of the update coefficient:
Figure FDA0003385074820000022
wherein α_0 is the lower threshold and α_0 < 1.
3. The method for digitally and automatically controlling gain according to claim 1, wherein the step of classifying the signal data based on the speech probability P_n and counting the number of occurrences of noise in the signal data based on the classification result to calculate a second gain comprises:
classifying the signal data according to the speech probability P_n in the following manner:
Figure FDA0003385074820000023
wherein P_T3 is the classification threshold and T[n] is the classification result, where 1 represents speech and 0 represents noise.
4. The method of digitally automatically controlling gain according to claim 1, further comprising:
dynamically compressing the signal data by a dynamic compressor, wherein the dynamic compression formula is as follows:
Figure FDA0003385074820000024
wherein T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
5. An apparatus for digitally and automatically controlling gain, comprising:
a speech probability calculation module configured to calculate, through a speech probability calculation model, the speech probability P_n of each segment of the read signal data;
a speech envelope estimation module configured to perform speech envelope estimation on the signal data to obtain a speech envelope amplitude;
the first gain calculation module is configured to calculate a first gain according to a deviation between the speech envelope amplitude and the expected amplitude, where a calculation formula of the first gain is:
Figure FDA0003385074820000031
wherein A_T is the desired amplitude and A[n] is the speech envelope estimate at the n-th processing step;
a second gain calculation module configured to classify the signal data based on the speech probability P_n, and to count the number c of occurrences of noise in the signal data based on the classification result, thereby calculating a second gain, wherein the calculation formula of the second gain is as follows:
Figure FDA0003385074820000032
wherein c1 and c2 are thresholds with c1 < c2, and g_min is the minimum gain with g_min < 1;
a signal gain amplitude adjustment module configured to perform signal gain amplitude adjustment on the signal data by using the first gain and the second gain, wherein the signal gain amplitude adjustment formula is as follows: x_o = g_1 * g_2 * x, wherein x is the signal data, x_o is the gain-amplitude-adjusted output, g_1 is the first gain, and g_2 is the second gain.
6. The apparatus for digitally and automatically controlling gain according to claim 5, further comprising:
a signal data classification module configured to classify the signal data according to the speech probability P_n in the following manner:
Figure FDA0003385074820000033
wherein P_T3 is the classification threshold and T[n] is the classification result, where 1 represents speech and 0 represents noise.
7. The apparatus for digitally and automatically controlling gain according to claim 5, further comprising:
a dynamic compression module configured to dynamically compress the signal data by a dynamic compressor, wherein the dynamic compression formula is as follows:
Figure FDA0003385074820000041
wherein T is the compression threshold, R is the compression coefficient, W is the compression transition range, x_o is the gain-adjusted signal, and y is the output signal.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN201910860075.XA 2019-09-11 2019-09-11 Method and device for digital automatic gain control Active CN110660408B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910860075.XA CN110660408B (en) 2019-09-11 2019-09-11 Method and device for digital automatic gain control
EP20195635.6A EP3792918B1 (en) 2019-09-11 2020-09-11 Digital automatic gain control method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910860075.XA CN110660408B (en) 2019-09-11 2019-09-11 Method and device for digital automatic gain control

Publications (2)

Publication Number Publication Date
CN110660408A CN110660408A (en) 2020-01-07
CN110660408B true CN110660408B (en) 2022-02-22

Family

ID=69037225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860075.XA Active CN110660408B (en) 2019-09-11 2019-09-11 Method and device for digital automatic gain control

Country Status (2)

Country Link
EP (1) EP3792918B1 (en)
CN (1) CN110660408B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112216302A (en) * 2020-09-09 2021-01-12 深圳市欢太科技有限公司 Audio signal processing method and device, electronic equipment and readable storage medium
CN112700785A (en) * 2020-12-21 2021-04-23 苏州科达特种视讯有限公司 Voice signal processing method and device and related equipment
CN113470691A (en) * 2021-07-08 2021-10-01 浙江大华技术股份有限公司 Automatic gain control method of voice signal and related device thereof

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101009099A (en) * 2007-01-26 2007-08-01 北京中星微电子有限公司 Digital auto gain control method and device
CN101415045A (en) * 2007-10-17 2009-04-22 北京三星通信技术研究有限公司 Method and apparatus for implementing intelligent automatic level control in communication network
CN101447771A (en) * 2008-12-24 2009-06-03 北京中星微电子有限公司 Method and system for automatically controlling gains
CN201600893U (en) * 2009-12-30 2010-10-06 比亚迪股份有限公司 Device for adjusting the input signal dynamic range
CN104471855A (en) * 2012-07-12 2015-03-25 Dts公司 Loudness control with noise detection and loudness drop detection
CN104823236A (en) * 2013-11-07 2015-08-05 株式会社东芝 Speech processing system
CN105355197A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Gain processing method and device for speech recognition system
CN105490654A (en) * 2016-01-20 2016-04-13 山东大学 Automatic gain controller control method of voice acquisition system and circuit thereof
CN106782593A (en) * 2017-02-27 2017-05-31 重庆邮电大学 A kind of many band structure sef-adapting filter changing methods eliminated for acoustic echo
CN108711435A (en) * 2018-05-30 2018-10-26 中南大学 A kind of high efficiency audio control method towards loudness
CN109767780A (en) * 2019-03-14 2019-05-17 苏州科达科技股份有限公司 A kind of audio signal processing method, device, equipment and readable storage medium storing program for executing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5937377A (en) * 1997-02-19 1999-08-10 Sony Corporation Method and apparatus for utilizing noise reducer to implement voice gain control and equalization
US8290181B2 (en) * 2005-03-19 2012-10-16 Microsoft Corporation Automatic audio gain control for concurrent capture applications
CN104078050A (en) * 2013-03-26 2014-10-01 杜比实验室特许公司 Device and method for audio classification and audio processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
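Research and DSP Implementation of AGC Algorithms in VoIP Systems; 李菁菁; China Master's Theses Full-text Database, Information Science and Technology Series; 20110315 (No. 03); pp. 61-70 *
Research on Speech Enhancement Methods Based on Short-Time Spectral Estimation; 陈照平; China Master's Theses Full-text Database, Information Science and Technology Series; 20081015 (No. 10); pp. 66-70 *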

Also Published As

Publication number Publication date
EP3792918A1 (en) 2021-03-17
CN110660408A (en) 2020-01-07
EP3792918B1 (en) 2023-11-01

Similar Documents

Publication Publication Date Title
CN110660408B (en) Method and device for digital automatic gain control
US10579327B2 (en) Speech recognition device, speech recognition method and storage medium using recognition results to adjust volume level threshold
EP2592546B1 (en) Automatic Gain Control in a multi-talker audio system
US9349384B2 (en) Method and system for object-dependent adjustment of levels of audio objects
CN101689373A (en) Intelligent gradient noise reduction system
CN110650410B (en) Microphone automatic gain control method, device and storage medium
US20120189147A1 (en) Sound processing apparatus, sound processing method and hearing aid
JP3255584B2 (en) Sound detection device and method
US20060122831A1 (en) Speech recognition system for automatically controlling input level and speech recognition method using the same
CN111048118B (en) Voice signal processing method and device and terminal
US20240088856A1 (en) Long-term signal estimation during automatic gain control
KR20080059881A (en) Apparatus for preprocessing of speech signal and method for extracting end-point of speech signal thereof
US8243955B2 (en) System for attenuating noise in an input signal
US9972338B2 (en) Noise suppression device and noise suppression method
US20220277766A1 (en) Dialog enhancement using adaptive smoothing
CN113329372A (en) Method, apparatus, device, medium and product for vehicle-mounted call
JP2002091487A (en) Device, method and program for voice recognition
CN112558916B (en) Audio adjustment method, device, electronic equipment and storage medium
US11837254B2 (en) Frontend capture with input stage, suppression module, and output stage
CN112700785A (en) Voice signal processing method and device and related equipment
CN117894329A (en) Hardware noise suppression method based on time domain
CN114694638A (en) Voice awakening method, terminal and storage medium
CN117499838A (en) Audio processing method and device and non-volatile computer readable storage medium
CN116132862A (en) Microphone control method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant