CN112564655A - Audio signal gain control method, device, equipment and storage medium - Google Patents

Audio signal gain control method, device, equipment and storage medium

Info

Publication number
CN112564655A
Authority
CN
China
Prior art keywords
energy
audio
gain control
frame
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910920120.6A
Other languages
Chinese (zh)
Inventor
杨晓霞
刘溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd filed Critical Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN201910920120.6A priority Critical patent/CN112564655A/en
Publication of CN112564655A publication Critical patent/CN112564655A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G1/00Details of arrangements for controlling amplification
    • H03G1/04Modifications of control circuit to reduce distortion caused by control
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers without distortion of the input signal
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • H03G3/3005Automatic control in amplifiers having semiconductor devices in amplifiers suitable for low-frequencies, e.g. audio amplifiers
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G5/00Tone control or bandwidth control in amplifiers
    • H03G5/16Automatic control

Abstract

The embodiment of the invention discloses an audio signal gain control method, an audio signal gain control device, audio signal gain control equipment and a storage medium. The audio signal gain control method comprises the following steps: collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames; determining a gain control type according to the signal energy of each audio frame currently cached in the audio signal cache region in real time; and when the audio frame accumulation reference condition is determined to be met, acquiring the audio frame which is firstly cached from the audio signal cache region, and performing gain control processing matched with the current gain control type. According to the technical scheme of the embodiment of the invention, the gain of the audio signal is adjusted by continuously monitoring the signal energy value of the audio frame, so that the amplification of the voice signal is realized, the residual interference and noise signals are inhibited, and the success rate of voice recognition is improved.

Description

Audio signal gain control method, device, equipment and storage medium
Technical Field
The present invention relates to voice interaction technologies, and in particular, to a method, an apparatus, a device, and a storage medium for controlling gain of an audio signal.
Background
In-vehicle use is another important application scenario for voice interaction systems, following home scenarios such as smart speakers.
In the prior art, in order to improve the accuracy of voice recognition, the audio signal recorded by a microphone undergoes a series of processing steps such as noise reduction and echo cancellation. After this processing some residual interference noise still remains, for example the voices of rear-seat passengers or sound played by the in-vehicle loudspeaker that was not fully cancelled, and this residual interference noise can greatly reduce the recognition accuracy of an in-vehicle voice recognition system.
Disclosure of Invention
The embodiment of the invention provides a gain control method, a gain control device, gain control equipment and a storage medium of an audio signal, wherein the gain control type is determined according to the signal energy value of the current cached audio frame so as to perform gain control processing matched with the current gain control type on the audio frame cached firstly, thereby realizing the amplification of a voice signal, simultaneously inhibiting residual interference and noise and improving the success rate of voice recognition.
In a first aspect, an embodiment of the present invention provides a method for controlling gain of an audio signal, where the method includes:
collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames;
determining a gain control type according to the signal energy of each audio frame currently cached in the audio signal cache region in real time;
and when the audio frame accumulation reference condition is determined to be met, acquiring the audio frame which is firstly cached from the audio signal cache region, and performing gain control processing matched with the current gain control type.
In a second aspect, an embodiment of the present invention further provides an apparatus for controlling gain of an audio signal, where the apparatus includes:
the audio signal storage module is used for collecting audio signals in real time and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames;
the gain control type determining module is used for determining the gain control type according to the signal energy of each audio frame currently cached in the audio signal cache region in real time;
and the gain control processing module is used for acquiring the audio frame cached firstly from the audio signal cache region and carrying out gain control processing matched with the current gain control type when the audio frame accumulation reference condition is determined to be met.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for gain control of an audio signal according to any of the embodiments of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for controlling gain of an audio signal according to any embodiment of the present invention.
According to the technical scheme of the embodiment of the invention, the audio signals are collected in real time and are sequentially stored in the audio signal buffer area according to the form of the audio frames, meanwhile, the gain control type is determined in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer area, when the audio frame accumulation reference condition is determined to be met, the audio frame firstly buffered is obtained from the audio signal buffer area, and the gain control processing matched with the current gain control type is carried out, so that the adjustment of the gain of the audio signal is realized by continuously monitoring the signal energy value of the audio frame, the voice signal can be amplified, the residual interference and the noise signal are inhibited, and the success rate of voice recognition is improved.
Drawings
FIG. 1a is a flowchart of a method for controlling gain of an audio signal according to a first embodiment of the present invention;
FIG. 1b is a schematic diagram of the action positions of the method for controlling the gain of an audio signal in a complete speech recognition system according to a first embodiment of the present invention;
FIG. 2a is a flowchart of a method for controlling gain of an audio signal according to a second embodiment of the present invention;
FIG. 2b is a flowchart of determining the type of gain control in the method for controlling the gain of an audio signal according to the second embodiment of the present invention;
FIG. 3a is a flowchart of a method for controlling gain of an audio signal according to a third embodiment of the present invention;
FIG. 3b is a schematic flowchart of a method for controlling gain of an audio signal according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for controlling gain of an audio signal according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a method for controlling gain of an audio signal according to a first embodiment of the present invention, where the technical solution of this embodiment is suitable for performing gain control processing on an audio frame stored in an audio signal buffer according to a signal energy value of a current audio frame, and the method may be executed by a device for controlling gain of an audio signal, where the device may be implemented by software and/or hardware, and may be integrated in various general-purpose computer devices.
As shown in FIG. 1b, the audio signal gain control method provided by the present application acts at the tail end of the whole audio signal processing pipeline. That is, after the audio signal received by the microphone has been subjected to noise reduction, echo cancellation, and the like, some interference noise still remains; this interference noise may be the voices of rear-seat passengers, sound played by the in-vehicle loudspeaker that was not completely cancelled, and the like.
The method for controlling the gain of the audio signal specifically comprises the following steps:
Step 110, collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames. In this embodiment, an audio signal buffer area with a set duration is established to buffer the received audio signal, so that the corresponding gain control processing can be performed on the audio frames in the audio signal buffer area. For example, an audio signal buffer with a length of 80 ms may be established; if the length of each audio frame is set to 10 ms, the audio signal buffer can hold 8 audio frames.
Step 120, determining a gain control type in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer area.
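The buffer sizing in this example can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the 16 kHz sampling rate, the helper `split_into_frames`, and the use of a `deque` as the buffer are assumptions introduced here; only the 80 ms and 10 ms figures come from the text.

```python
from collections import deque

BUFFER_MS = 80  # buffer duration from the example in the text
FRAME_MS = 10   # frame duration from the example in the text
CAPACITY = BUFFER_MS // FRAME_MS  # number of frames the buffer holds

# Assumption for illustration only: 16 kHz mono audio.
SAMPLE_RATE = 16000
frame_len = SAMPLE_RATE * FRAME_MS // 1000  # samples per frame


def split_into_frames(samples, length):
    """Split a flat sample sequence into consecutive full frames (hypothetical helper)."""
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, length)]


# FIFO buffer: frames are appended in arrival order and later read from the left.
frame_buffer = deque()
signal = [0.0] * (frame_len * 3)  # three frames of silence as stand-in input
for frame in split_into_frames(signal, frame_len):
    frame_buffer.append(frame)
```

With these values `CAPACITY` is 8, matching the 8 frames mentioned in the text.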
In this embodiment, the gain control of the audio signal is a method for automatically adjusting the control gain of the audio frame in the audio signal buffer area according to the energy of the audio frame stored in the audio signal buffer area, and the final purpose of the method is to amplify the voice signal input by the user and suppress the noise signal.
The technical scheme in the embodiment of the invention can be applied to a single-microphone vehicle-mounted audio signal processing system, namely a scene in which the vehicle carries only one microphone. The currently cached audio frame is acquired in real time, its signal energy is calculated in a set manner, and the gain control type is determined according to that signal energy. In this way, by continuously monitoring the signal energy of the received audio frames, it is judged whether the received audio signal is a voice signal or a noise signal, and the corresponding gain control type is selected according to the type of the audio signal. Illustratively, the gain control type may be to amplify a voice signal by increasing the gain or to suppress a noise signal by decreasing the gain.
Step 130, when the audio frame accumulation reference condition is determined to be met, acquiring the audio frame buffered first from the audio signal buffer area, and performing gain control processing matched with the current gain control type.
In this embodiment, when it is determined that the audio frame accumulation reference condition is satisfied, the audio frame that was stored earliest in the audio signal buffer is processed in the gain control processing manner that matches the current gain control type. Illustratively, when 8 audio frames are stored in the audio signal buffer area, the signal energy of the first 8 audio frames and of the 9th audio frame currently being buffered is continuously monitored during buffering. If it is determined that audio frames 1 to 9 belong to the voice signal, the current gain control type is determined to be amplifying the voice signal by increasing the gain; the gain control processing matched with the current gain control type is then performed on the 1st audio frame in the audio signal buffer area, and the processed 1st audio frame is output. When the 10th audio frame is detected to still belong to the voice signal, gain control processing matched with the current gain control type is performed on the frame that is now 1st in the audio signal buffer area (that is, the frame that was 2nd before the original 1st frame was output), and so on until all audio frames have been output.
Optionally, the audio frame accumulation reference condition includes:
the audio signal buffer area is cached with audio frames of accumulated reference quantity, or no new audio frame is cached in the audio signal buffer area.
In this optional embodiment, the audio frame accumulation reference condition is defined: either audio frames of the accumulation reference number are cached in the audio signal buffer area, or, once the microphone stops acquiring audio signals, no new audio frame is cached in the audio signal buffer area. In the technical scheme of this embodiment, the audio frames input into the audio signal buffer area are counted cumulatively by class according to their signal energy, the gain control type is determined from those counts, and gain control processing matching the current gain control type is finally performed on the audio frame buffered first. Illustratively, when 8 audio frames are stored in the audio signal buffer, the corresponding gain control type is used to perform gain control processing on the 1st audio frame stored in the current audio signal buffer, and the processed audio frame is output. At the end of audio entry, when no new audio frame is cached any more, the audio frames are acquired from the audio signal buffer area in the order in which they were stored, and gain control processing matched with the current gain control type is performed, so that all the audio frames in the audio signal buffer area are processed and output. For example, if no new audio frame is input into the audio signal buffer area within a set time period (for example, 3 seconds), gain control processing is performed directly on the remaining audio frames in the audio signal buffer area, and the gain value at this time may be a preset fixed value, for example, 1.
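The two branches of the accumulation reference condition can be sketched as a small generator. This is only an illustration under stated assumptions: the name `delayed_frames` and the boolean "draining" flag are hypothetical, the end of input stands in for the "no new audio frame" case, and the 3-second timeout mentioned above is not modeled.

```python
from collections import deque


def delayed_frames(frames, reference_count=8):
    """Yield (frame, draining) pairs in arrival order.

    A frame is released only once `reference_count` frames are buffered
    (first branch of the condition). When the input ends, the remaining
    frames are released in stored order with draining=True (second branch).
    """
    buf = deque()
    for frame in frames:
        buf.append(frame)
        if len(buf) >= reference_count:
            # Accumulation reference number reached: release the oldest frame.
            yield buf.popleft(), False
    while buf:
        # No new frames arrive: flush the buffer in FIFO order.
        yield buf.popleft(), True


# Ten numbered stand-in frames: three are released while input is still
# arriving, the other seven are flushed after the input ends.
released = list(delayed_frames(range(10)))
```

A caller could apply the gain for the current gain control type when `draining` is false and fall back to a fixed gain such as 1 when it is true, as the text suggests.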
Optionally, after obtaining the audio frame buffered first from the audio signal buffer and performing gain control processing matching with the current gain control type, the method further includes:
and inputting the audio frame subjected to the gain control processing in the audio signal buffer area to a voice recognition module.
In this optional embodiment, in order to improve the accuracy of speech recognition, the received audio signal is first stored in an audio signal buffer, then the audio frames in the audio signal buffer are sequentially subjected to gain control processing, and finally the audio frames subjected to gain control processing are input to a speech recognition module for speech recognition, so as to perform corresponding operations according to the speech signal input by the user.
According to the technical scheme of the embodiment of the invention, the audio signals are collected in real time and are sequentially stored in the audio signal buffer area according to the form of the audio frames, meanwhile, the gain control type is determined in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer area, when the condition that the audio frame accumulation reference condition is met is determined, the audio frame firstly buffered is obtained from the audio signal buffer area, and the gain control processing matched with the current gain control type is carried out, so that the adjustment of the audio signal gain is realized by continuously monitoring the signal energy value of the audio frame, the voice signals can be amplified, the residual interference and noise signals are inhibited, and the success rate of voice recognition is improved.
Example two
Fig. 2a is a flowchart of a method for controlling gain of an audio signal according to a second embodiment of the present invention, which is further refined based on the above embodiments and provides a specific step of determining a gain control type in real time according to signal energy of each audio frame currently buffered in the audio signal buffer. A method for controlling gain of an audio signal according to a second embodiment of the present invention is described with reference to fig. 2a, which includes the following steps:
Step 210, collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames.
Step 220, acquiring a new audio frame currently buffered in the audio signal buffer area in real time, and calculating the signal energy of the new audio frame.
In this embodiment, in order to continuously monitor whether a received audio frame belongs to a speech signal or a noise signal, each time a new audio frame is stored in the audio signal buffer, that frame is acquired and its signal energy is calculated, so that the type of the current audio frame can be determined from the signal energy according to a set rule.
Step 230, determining whether the signal energy is greater than or equal to a global energy threshold, and updating a high-energy frame count result and a low-energy frame count result matched with the audio signal buffer according to the determination result.
The global energy threshold is calculated by using an energy-based Voice Activity Detection (VAD) method; specifically, the interference section may be estimated by using the VAD method, and its root mean square is then taken as the global energy threshold. The high-energy frame count result and the low-energy frame count result respectively represent the accumulated number of audio frames whose signal energy is greater than or equal to the global energy threshold and the accumulated number of audio frames whose signal energy is less than the global energy threshold.
In this embodiment, the received audio frames are divided into two types by comparing with the global energy threshold, the first type is an audio frame with signal energy greater than or equal to the global energy threshold, a high energy frame counter is used for counting, the second type is an audio frame with signal energy less than the global energy threshold, a low energy frame counter is used for counting, and when a new audio frame is cached in the audio signal cache region, the corresponding counter value is updated.
Optionally, updating the high-energy frame count result and the low-energy frame count result matched with the audio signal buffer according to the determination result, including:
when the judgment result is that the signal energy is larger than or equal to the global energy threshold, accumulating the counting result of the high-energy frames; if the accumulated result is determined to exceed a first threshold value, setting the counting result of the low-energy frames as a set initial value;
when the judgment result is that the signal energy is smaller than a global energy threshold value, accumulating the counting result of the low-energy frames; and if the accumulated result is determined to exceed the second threshold value, setting the high-energy frame counting result as a set initial value.
In this optional embodiment, a manner of updating the counter values according to the result of comparing the signal energy of the current audio frame with the global energy threshold is provided. Specifically, as shown in FIG. 2b, if the signal energy P of the current audio frame is greater than or equal to the global energy threshold A, the high-energy frame counter H is increased by 1 and the updated value H is compared with a preset first threshold B; when the updated H is greater than B, the low-energy frame counter L is set to zero. If the signal energy P of the current audio frame is less than the global energy threshold A, the low-energy frame counter L is increased by 1 and the updated value L is compared with a preset second threshold C; when the updated L is greater than C, the high-energy frame counter H is set to zero. For example, the first threshold B may be set to 8, and the second threshold C may be set to 15.
Step 240, determining the gain control type according to the high-energy frame counting result and the low-energy frame counting result.
In this embodiment, in order to determine the type of the audio signal according to the signal energy values of a plurality of consecutive audio frames and further determine the type of the gain control according to the type of the audio signal, the gain control type is determined according to the counting result of the counter on the basis of updating the counting result of the high-energy frame and the counting result of the low-energy frame in step 230. Illustratively, when the high-energy frame count result is greater than the first threshold, it indicates that the signal energy of more than 8 audio frames (with the first threshold set to 8) is greater than or equal to the global energy threshold; it is then determined that the segment of audio signal belongs to a speech signal, and that the gain control type is to amplify the speech signal by increasing the gain.
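The counter update described for FIG. 2b can be sketched as below. The function name `update_counters` and its argument names are hypothetical; the comparison rules and the example thresholds B = 8 and C = 15 are taken from the text.

```python
def update_counters(energy, high, low, threshold, first=8, second=15):
    """Return the updated (high, low) frame counters for one new frame.

    energy    -- signal energy P of the new frame
    high, low -- counters H and L before this frame
    threshold -- global energy threshold A
    first     -- first threshold B
    second    -- second threshold C
    """
    if energy >= threshold:
        high += 1
        if high > first:
            low = 0  # a sustained high-energy run clears the low-energy count
    else:
        low += 1
        if low > second:
            high = 0  # a sustained low-energy run clears the high-energy count
    return high, low


# Nine consecutive high-energy frames, starting from H = 0 and L = 5:
# L is only cleared once H exceeds B, that is on the ninth frame.
h, l = 0, 5
history = []
for _ in range(9):
    h, l = update_counters(1.0, h, l, threshold=0.5)
    history.append((h, l))
```

Note that a counter is reset only when the opposite counter exceeds its threshold, so short bursts of the other class do not discard the accumulated count.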
Optionally, determining a gain control type according to the high-energy frame counting result and the low-energy frame counting result, including:
determining the gain control type to be an amplified audio frame signal if it is determined that the high-energy frame count result exceeds the first threshold;
determining the gain control type to be a suppressed audio frame signal if it is determined that the low energy frame count result exceeds the second threshold;
determining the gain control type to leave the audio frame signal unchanged if it is determined that the high-energy frame count result does not exceed the first threshold or it is determined that the low-energy frame count result does not exceed the second threshold.
In this optional embodiment, a manner of determining a gain control type according to a high-energy frame counting result and a low-energy frame counting result is provided. Specifically, if it is determined that the high-energy frame counting result exceeds the first threshold, it is determined that the current segment of audio signal belongs to a speech signal, and thus that the gain control type is to amplify the audio frame signal; if it is determined that the low-energy frame counting result exceeds the second threshold, it is determined that the current segment of audio signal belongs to a noise signal, and thus that the gain control type is to suppress the audio frame signal; and if the energy of the current audio frame is greater than or equal to the global energy threshold but the high-energy frame counting result does not exceed the first threshold, or the energy of the current audio frame is lower than the global energy threshold but the low-energy frame counting result does not exceed the second threshold, it is determined that the gain control type is to keep the audio frame signal unchanged.
Specifically, as shown in FIG. 2b, if the signal energy P of the current audio frame is greater than or equal to the global energy threshold A and the updated high-energy frame counter value H is greater than the first threshold B, that is, when the steps of the left branch of the flowchart are executed, it is determined that the gain control type is to amplify the audio signal by increasing the gain. If the signal energy P of the current audio frame is smaller than the global energy threshold A and the updated low-energy frame counter value L is greater than the second threshold C, that is, when the steps of the right branch of the flowchart are executed, it is determined that the gain control type is to suppress the audio signal by reducing the gain. If the energy P of the current audio frame is greater than or equal to the global energy threshold A but the high-energy frame counting result H does not exceed the first threshold B, or the energy P of the current audio frame is lower than the global energy threshold A but the low-energy frame counting result L does not exceed the second threshold C, that is, when the conditions of neither the left branch nor the right branch are met, it is determined that the gain control type is to keep the audio frame signal unchanged.
Step 250, when the audio frame accumulation reference condition is determined to be met, acquiring the audio frame which is firstly cached from the audio signal cache region, and performing gain control processing matched with the current gain control type.
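The three-way decision just described can be written as a short function. This is a sketch only: the string labels and the name `gain_control_type` are illustrative choices, and the counters passed in are assumed to have already been updated for the current frame.

```python
AMPLIFY, SUPPRESS, UNCHANGED = "amplify", "suppress", "unchanged"


def gain_control_type(energy, high, low, threshold, first=8, second=15):
    """Choose the gain control type from P, the updated H and L, and A, B, C."""
    if energy >= threshold and high > first:
        return AMPLIFY    # left branch of FIG. 2b: sustained speech
    if energy < threshold and low > second:
        return SUPPRESS   # right branch of FIG. 2b: sustained noise
    return UNCHANGED      # neither branch condition is met


decisions = [
    gain_control_type(1.0, high=9, low=0, threshold=0.5),
    gain_control_type(0.1, high=0, low=16, threshold=0.5),
    gain_control_type(1.0, high=8, low=0, threshold=0.5),
    gain_control_type(0.1, high=9, low=3, threshold=0.5),
]
```

The last case shows that a single low-energy frame in the middle of speech leaves the signal unchanged rather than suppressing it, because L has not yet exceeded C.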
According to the technical scheme of the embodiment of the invention, firstly, audio signals are collected in real time, the audio signals are sequentially stored in an audio signal buffer area according to the form of audio frames, then the signal energy value of one currently buffered audio frame is calculated, the high-energy frame counting result and the low-energy frame counting result are updated according to the comparison result of the signal energy value and a global energy threshold value, finally, the gain control type is determined according to the updated counting result, the audio frame firstly buffered is obtained from the audio signal buffer area, the gain control processing matched with the current gain control type is carried out, the voice signals and the noise signals can be distinguished according to the energy of the continuously monitored audio frames, different gain control processing is carried out on different signals accurately, and the condition that voice recognition fails due to residual noise interference is avoided.
Example three
Fig. 3a is a flowchart of a method for controlling gain of an audio signal according to a third embodiment of the present invention, which is further refined on the basis of the foregoing embodiments and provides specific steps after determining a gain control type according to the high-energy frame count result and the low-energy frame count result. In the following, a method for controlling gain of an audio signal according to a third embodiment of the present invention is described with reference to fig. 3a, including the following steps:
Step 310, collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer area according to the form of audio frames.
Step 320, obtaining a new audio frame currently buffered in the audio signal buffer area in real time, and calculating the signal energy of the new audio frame.
Optionally, acquiring a new audio frame currently buffered in the audio signal buffer in real time, and calculating a signal energy of the new audio frame, including:
based on a new audio frame currently buffered, using the formula $P(t) = \alpha X^{2}(t) + (1-\alpha) X^{2}(t-1)$ to calculate the signal energy of the new audio frame;
wherein P(t) represents the signal energy of the currently buffered t-th audio frame, α is a constant, X(t) represents the t-th audio frame, and t represents the frame number of the audio signal.
In this optional embodiment, a specific method for calculating the signal energy of each audio frame buffered in the audio signal buffer is provided: the energy of the audio frame currently input to the audio signal buffer is calculated from the signals of the current audio frame and the previous audio frame, weighted by a set constant, using the following formula:
$P(t) = \alpha X^{2}(t) + (1-\alpha) X^{2}(t-1)$
wherein P(t) represents the signal energy of the currently buffered t-th audio frame, α is a constant, X(t) represents the t-th audio frame signal, and t represents the frame number of the current audio frame. Illustratively, α is 0.98.
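A minimal numerical sketch of this formula follows. The text does not say how a whole frame X(t) is squared, so the sketch assumes that X²(t) means the mean of the squared samples of frame t; that interpretation, and the names `mean_square` and `frame_energy`, are assumptions made here for illustration.

```python
def mean_square(frame):
    """Mean of squared samples; assumed meaning of X^2(t) for a whole frame."""
    return sum(x * x for x in frame) / len(frame)


def frame_energy(current, previous, alpha=0.98):
    """P(t) = alpha * X^2(t) + (1 - alpha) * X^2(t-1), with alpha = 0.98 as in the text."""
    return alpha * mean_square(current) + (1.0 - alpha) * mean_square(previous)


# A frame of constant amplitude 2 following a frame of constant amplitude 1:
# 0.98 * 4 + 0.02 * 1 = 3.94 (up to floating-point rounding).
energy = frame_energy([2.0, 2.0], [1.0, 1.0])
```

With α close to 1 the result is dominated by the current frame, and the previous frame contributes only a small smoothing term.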
Step 330, determining whether the signal energy is greater than or equal to a global energy threshold, and updating a high-energy frame counting result and a low-energy frame counting result matched with the audio signal buffer according to the determination result.
Step 340, determining the gain control type according to the high energy frame counting result and the low energy frame counting result.
Step 350, determining a gain adjustment value corresponding to the gain control type according to a high energy frame counting result or a low energy frame counting result corresponding to the gain control type.
In this embodiment, after the gain control type is determined, a gain adjustment value is calculated according to the high-energy frame count result or the low-energy frame count result by using a gain adjustment value calculation method corresponding to the gain control type.
Optionally, determining a gain adjustment value corresponding to the gain control type according to a high energy frame count result or a low energy frame count result corresponding to the gain control type includes:
in determining that the gain control type is an amplified audio frame signal, using the formula:
Figure BDA0002217302200000131
calculating the gain adjustment value;
where Gain is the amplification gain value, a is the gain amplification factor, H is the high-energy frame count result, and B is the first threshold.
Optionally, determining a gain adjustment value corresponding to the gain control type according to a high energy frame count result or a low energy frame count result corresponding to the gain control type includes:
upon determining that the gain control type is to suppress an audio frame signal, using the formula $\mathrm{Gain} = 0.95^{(L-C)}$ to calculate the gain adjustment value;
where Gain is the suppression gain value, L is the low-energy frame count result, and C is the second threshold.
The two optional embodiments above each provide a specific way of determining the gain adjustment value corresponding to the gain control type, according to the high-energy frame count result and the low-energy frame count result respectively. When the gain control type is an amplified audio frame signal, the following formula is used:
Figure BDA0002217302200000132
to calculate the gain adjustment value; when the gain control type is to suppress the audio frame signal, the formula $\mathrm{Gain} = 0.95^{(L-C)}$ is used to calculate the gain adjustment value. With this calculation method, the gain is adjusted smoothly, abrupt changes in the audio signal are avoided, and little distortion is introduced into the audio signal.
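As a rough illustration of why the suppression gain changes smoothly, the sketch below reads the suppression formula as 0.95 raised to the power (L − C), where L is the low-energy frame count result and C is the second threshold. The function name is hypothetical. The amplification formula appears only as a figure in the source, so it is not reproduced here.

```python
def suppression_gain(low_count: int, second_threshold: int,
                     base: float = 0.95) -> float:
    """Suppression gain read as base ** (L - C); base 0.95 is from the text."""
    return base ** (low_count - second_threshold)
```

Under this reading, each additional low-energy frame beyond the threshold multiplies the gain by 0.95, so the gain decays geometrically rather than dropping in one step.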
Step 360, obtaining the earliest-buffered audio frame from the audio signal buffer, and performing gain control processing on the obtained audio frame using the gain adjustment value corresponding to the current gain control type.
In this embodiment, after the gain adjustment value corresponding to the gain control type is obtained, it is used to perform gain control processing on an audio frame stored in the audio signal buffer. Illustratively, the current gain adjustment value is multiplied by the audio frame signal that precedes the current frame by a set number of frames, yielding the gain-processed audio frame. For example, the gain adjustment value is applied to the audio data in the current audio signal buffer that is a set number of frames (for example, 8 frames) earlier than the current frame number. The specific calculation formula is as follows:
out(t-d)=Gain(t)×X(t-d)
where t represents the frame number of the audio frame, i.e., the current audio frame is the t-th frame; out(t-d) is the audio frame signal, after gain control processing, of the audio frame d frames earlier than the current audio frame; Gain(t) is the gain adjustment value calculated from the current audio frame; and X(t-d) is the audio frame d frames earlier than the current audio frame.
As shown in Fig. 3b, the method for controlling the gain of an audio signal provided in this embodiment specifically applies the gain adjustment value calculated from the current audio frame to the audio frame in the current audio signal buffer that is d frames earlier than the current frame number, and sends the audio signal subjected to gain control processing to a speech recognition module.
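The delayed application out(t-d) = Gain(t) × X(t-d) can be sketched with a first-in, first-out buffer. This is an illustrative sketch under stated assumptions: the class name `DelayedGain` and its `push` method are hypothetical, frames are plain lists of samples, and the default delay of 8 frames is the example value from the text.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class DelayedGain:
    """Applies the gain computed at frame t to the frame buffered d frames earlier."""

    def __init__(self, delay_frames: int = 8) -> None:
        self.delay = delay_frames
        self.buffer: Deque[List[float]] = deque()

    def push(self, frame: Sequence[float], gain: float) -> Optional[List[float]]:
        """Buffer X(t); once d earlier frames exist, return Gain(t) * X(t-d)."""
        self.buffer.append(list(frame))
        if len(self.buffer) <= self.delay:
            return None  # not enough history yet, nothing to output
        oldest = self.buffer.popleft()  # X(t-d), the earliest-buffered frame
        return [gain * sample for sample in oldest]
```

Because the gain is computed from frames that arrive after the frame being scaled, the decision for an early frame can take into account what follows it, at the cost of d frames of latency.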
According to the technical solution of this embodiment, after the gain control type is determined according to the high-energy frame count result and the low-energy frame count result, the gain adjustment value corresponding to the gain control type is calculated by a set formula from the high-energy frame count result or the low-energy frame count result corresponding to that type. Finally, when it is determined that the audio frame accumulation reference condition is met, the earliest-buffered audio frame is obtained from the audio signal buffer and subjected to gain control processing matched with the current gain control type. The gain adjustment value thus acts with a delay on the audio frames stored in the audio signal buffer, so that the voice signal is amplified smoothly or the noise signal is suppressed, little distortion is introduced into the voice signal, and the success rate of voice recognition is improved.
Example Four
Fig. 4 is a schematic structural diagram of an audio signal gain control apparatus according to a fourth embodiment of the present invention, where the audio signal gain control apparatus includes: an audio signal storage module 410, a gain control type determination module 420, and a gain control processing module 430.
The audio signal storage module 410 is configured to collect audio signals in real time, and sequentially store the audio signals in an audio signal buffer according to an audio frame format;
a gain control type determining module 420, configured to determine a gain control type in real time according to signal energy of each currently buffered audio frame in the audio signal buffer;
and the gain control processing module 430 is configured to, when it is determined that the audio frame accumulation reference condition is satisfied, obtain the audio frame that is buffered first from the audio signal buffer, and perform gain control processing that matches the current gain control type.
According to the technical solution of this embodiment of the invention, audio signals are collected in real time and stored sequentially in the audio signal buffer in the form of audio frames, while the gain control type is determined in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer. When it is determined that the audio frame accumulation reference condition is met, the earliest-buffered audio frame is obtained from the audio signal buffer and subjected to gain control processing matched with the current gain control type. The audio signal gain is thus adjusted by continuously monitoring the signal energy of the audio frames, so that voice signals can be amplified, residual interference and noise signals are suppressed, and the success rate of voice recognition is improved.
Optionally, the gain control type determining module 420 includes:
a signal energy calculating unit, configured to obtain, in real time, a new audio frame currently buffered in the audio signal buffer, and calculate the signal energy of the new audio frame;
a counting result updating unit, configured to determine whether the signal energy is greater than or equal to a global energy threshold, and update a high-energy frame count result and a low-energy frame count result matched with the audio signal buffer according to the determination result;
and a gain control type determining unit, configured to determine the gain control type according to the high-energy frame count result and the low-energy frame count result.
Optionally, the counting result updating unit is specifically configured to:
when the judgment result is that the signal energy is larger than or equal to the global energy threshold, accumulating the counting result of the high-energy frames; if the accumulated result is determined to exceed a first threshold value, setting the counting result of the low-energy frames as a set initial value;
when the judgment result is that the signal energy is smaller than a global energy threshold value, accumulating the counting result of the low-energy frames; and if the accumulated result is determined to exceed the second threshold value, setting the high-energy frame counting result as a set initial value.
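The counter update described above can be sketched as follows. The names `FrameCounters` and `update_counters` are hypothetical, and the "set initial value" is assumed to be 0, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class FrameCounters:
    high: int = 0  # high-energy frame count result
    low: int = 0   # low-energy frame count result


def update_counters(c: FrameCounters, energy: float, global_threshold: float,
                    first_threshold: int, second_threshold: int,
                    initial_value: int = 0) -> FrameCounters:
    """Accumulate one counter per frame and reset the opposite one past its threshold."""
    if energy >= global_threshold:
        c.high += 1
        if c.high > first_threshold:
            c.low = initial_value   # sustained high energy clears the low count
    else:
        c.low += 1
        if c.low > second_threshold:
            c.high = initial_value  # sustained low energy clears the high count
    return c
```

A counter is reset only after the opposite counter has exceeded its threshold, so a short burst of the opposite kind of frame does not immediately clear the accumulated count.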
Optionally, the gain control type determining unit is specifically configured to:
determining the gain control type to be an amplified audio frame signal if it is determined that the high-energy frame count result exceeds the first threshold;
determining the gain control type to be a suppressed audio frame signal if it is determined that the low energy frame count result exceeds the second threshold;
determining the gain control type to leave the audio frame signal unchanged if it is determined that the high-energy frame count result does not exceed the first threshold or it is determined that the low-energy frame count result does not exceed the second threshold.
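The three-way decision above can be sketched as below. The enum and function names are hypothetical. Checking the high-energy count first is an assumption: the text does not give a precedence for the case where both counts exceed their thresholds.

```python
from enum import Enum


class GainControlType(Enum):
    AMPLIFY = "amplify"      # amplify the audio frame signal
    SUPPRESS = "suppress"    # suppress the audio frame signal
    UNCHANGED = "unchanged"  # leave the audio frame signal unchanged


def gain_control_type(high_count: int, low_count: int,
                      first_threshold: int,
                      second_threshold: int) -> GainControlType:
    """Map the two frame count results to a gain control type."""
    if high_count > first_threshold:
        return GainControlType.AMPLIFY
    if low_count > second_threshold:
        return GainControlType.SUPPRESS
    return GainControlType.UNCHANGED
```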
Optionally, the gain control type determining module 420 further includes:
a gain adjustment value determining unit, configured to, after the gain control type is determined according to the high-energy frame count result and the low-energy frame count result, determine a gain adjustment value corresponding to the gain control type according to the high-energy frame count result or the low-energy frame count result corresponding to the gain control type;
correspondingly, the gain control processing module 430 is specifically configured to:
obtain the earliest-buffered audio frame from the audio signal buffer, and perform gain control processing on the obtained audio frame using the gain adjustment value corresponding to the current gain control type.
Optionally, the apparatus for controlling gain of the audio signal further includes:
an audio frame input module, configured to, after the earliest-buffered audio frame is obtained from the audio signal buffer and gain control processing matched with the current gain control type is performed, input the gain-processed audio frame to the voice recognition module.
Optionally, the gain adjustment value determining unit is specifically configured to:
in determining that the gain control type is an amplified audio frame signal, using the formula:
Figure BDA0002217302200000171
calculating the gain adjustment value;
where Gain1 is an amplification Gain value, a is a Gain amplification factor, H is the high-energy frame count result, and B is the first threshold.
Optionally, the gain adjustment value determining unit is specifically configured to:
upon determining that the gain control type is to suppress an audio frame signal, using the formula $\mathrm{Gain2} = 0.95^{(L-C)}$ to calculate the gain adjustment value;
where Gain2 is the suppression gain value, L is the low-energy frame count result, and C is the second threshold.
Optionally, the signal energy calculating unit is specifically configured to:
based on the new audio frame currently buffered, using the formula $P(t) = \alpha X^2(t) + (1-\alpha) X^2(t-1)$ to calculate the signal energy of the new audio frame;
where P(t) represents the signal energy of the currently buffered t-th audio frame, α is a constant, X(t) represents the t-th audio frame, and t represents the frame number of the audio signal.
Optionally, the audio frame accumulation reference condition includes:
the audio signal buffer holds the accumulated reference number of audio frames, or no new audio frame is being buffered in the audio signal buffer.
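One possible reading of the accumulation reference condition is sketched below. The generator `ready_frames` and the flag `input_ended` are hypothetical names, and treating the condition as "at least the reference number of frames" is an assumption.

```python
from collections import deque
from typing import Deque, Iterator, List


def ready_frames(buffer: Deque[List[float]], reference_count: int,
                 input_ended: bool) -> Iterator[List[float]]:
    """Yield earliest-buffered frames while the accumulation condition holds."""
    while buffer and (len(buffer) >= reference_count or input_ended):
        yield buffer.popleft()
```

While audio is still arriving, frames are released one at a time as the buffer reaches the reference count. Once no new frame is being buffered, the remaining frames are drained in order.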
The gain control device for the audio signal provided by the embodiment of the invention can execute the gain control method for the audio signal provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example Five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. As shown in Fig. 5, the electronic device includes a processor 50 and a memory 51. The number of processors 50 in the device may be one or more, and one processor 50 is taken as an example in Fig. 5. The processor 50 and the memory 51 in the device may be connected by a bus or other means; a bus connection is taken as an example in Fig. 5.
The memory 51 serves as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a gain control method for an audio signal in an embodiment of the present invention (for example, an audio signal storage module 410, a gain control type determination module 420, and a gain control processing module 430 in a gain control apparatus for an audio signal). The processor 50 executes various functional applications of the device and data processing, i.e., implements the above-described gain control method of the audio signal, by executing software programs, instructions, and modules stored in the memory 51.
The method comprises the following steps:
collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer in the form of audio frames;
determining a gain control type in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer;
and when it is determined that the audio frame accumulation reference condition is met, obtaining the earliest-buffered audio frame from the audio signal buffer, and performing gain control processing matched with the current gain control type.
The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 51 may further include memory located remotely from the processor 50, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example Six
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer processor, is configured to perform a method of gain control of an audio signal, the method including:
collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer in the form of audio frames;
determining a gain control type in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer;
and when it is determined that the audio frame accumulation reference condition is met, obtaining the earliest-buffered audio frame from the audio signal buffer, and performing gain control processing matched with the current gain control type.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the method for controlling the gain of an audio signal provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the above embodiment of the gain control apparatus for an audio signal, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (19)

1. A method for gain control of an audio signal, comprising:
collecting audio signals in real time, and sequentially storing the audio signals in an audio signal buffer in the form of audio frames;
determining a gain control type in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer;
and when it is determined that the audio frame accumulation reference condition is met, obtaining the earliest-buffered audio frame from the audio signal buffer, and performing gain control processing matched with the current gain control type.
2. The method of claim 1, wherein determining the gain control type in real time based on the signal energy of each of the audio frames currently buffered in the audio signal buffer comprises:
obtaining, in real time, a new audio frame currently buffered in the audio signal buffer, and calculating the signal energy of the new audio frame;
judging whether the signal energy is greater than or equal to a global energy threshold value, and updating a high-energy frame counting result and a low-energy frame counting result which are matched with the audio signal buffer area according to a judgment result;
and determining the gain control type according to the high-energy frame counting result and the low-energy frame counting result.
3. The method of claim 2, wherein updating the high-energy frame count result and the low-energy frame count result that match the audio signal buffer according to the determination comprises:
when the judgment result is that the signal energy is larger than or equal to the global energy threshold, accumulating the counting result of the high-energy frames; if the accumulated result is determined to exceed a first threshold value, setting the counting result of the low-energy frames as a set initial value;
when the judgment result is that the signal energy is smaller than a global energy threshold value, accumulating the counting result of the low-energy frames; and if the accumulated result is determined to exceed the second threshold value, setting the high-energy frame counting result as a set initial value.
4. The method of claim 3, wherein determining the gain control type based on the high energy frame count result and the low energy frame count result comprises:
determining the gain control type to be an amplified audio frame signal if it is determined that the high-energy frame count result exceeds the first threshold;
determining the gain control type to be a suppressed audio frame signal if it is determined that the low energy frame count result exceeds the second threshold;
determining the gain control type to leave the audio frame signal unchanged if it is determined that the high-energy frame count result does not exceed the first threshold or it is determined that the low-energy frame count result does not exceed the second threshold.
5. The method of claim 4, further comprising, after determining a gain control type based on the high energy frame count result and the low energy frame count result:
determining a gain adjustment value corresponding to the gain control type according to a high energy frame counting result or a low energy frame counting result corresponding to the gain control type;
wherein obtaining the earliest-buffered audio frame from the audio signal buffer and performing gain control processing matched with the current gain control type comprises:
obtaining the earliest-buffered audio frame from the audio signal buffer, and performing gain control processing on the obtained audio frame using the gain adjustment value corresponding to the current gain control type.
6. The method of claim 5, wherein determining the gain adjustment value corresponding to the gain control type according to the high energy frame count result or the low energy frame count result corresponding to the gain control type comprises:
in determining that the gain control type is an amplified audio frame signal, using the formula:
Figure FDA0002217302190000021
calculating the gain adjustment value;
where Gain is an amplification Gain value, a is a Gain amplification factor, H is the high-energy frame count result, and B is the first threshold.
7. The method of claim 5, wherein determining the gain adjustment value corresponding to the gain control type according to the high energy frame count result or the low energy frame count result corresponding to the gain control type comprises:
upon determining that the gain control type is to suppress an audio frame signal, using the formula $\mathrm{Gain} = 0.95^{(L-C)}$ to calculate the gain adjustment value;
where Gain is the suppression gain value, L is the low-energy frame count result, and C is the second threshold.
8. The method of claim 2, wherein obtaining a new audio frame currently buffered in the audio signal buffer in real-time and calculating the signal energy of the new audio frame comprises:
based on the new audio frame currently buffered, using the formula $P(t) = \alpha X^2(t) + (1-\alpha) X^2(t-1)$ to calculate the signal energy of the new audio frame;
where P(t) represents the signal energy of the currently buffered t-th audio frame, α is a constant, X(t) represents the t-th audio frame signal, and t represents the frame number of the current audio frame.
9. The method of claim 1, wherein the audio frame accumulation reference condition comprises:
the audio signal buffer holds the accumulated reference number of audio frames, or no new audio frame is being buffered in the audio signal buffer.
10. An audio signal gain control apparatus, comprising:
an audio signal storage module, configured to collect audio signals in real time and sequentially store the audio signals in an audio signal buffer in the form of audio frames;
a gain control type determining module, configured to determine a gain control type in real time according to the signal energy of each audio frame currently buffered in the audio signal buffer;
and a gain control processing module, configured to, when it is determined that the audio frame accumulation reference condition is met, obtain the earliest-buffered audio frame from the audio signal buffer and perform gain control processing matched with the current gain control type.
11. The apparatus of claim 10, wherein the gain control type determining module comprises:
a signal energy calculating unit, configured to obtain, in real time, a new audio frame currently buffered in the audio signal buffer, and calculate the signal energy of the new audio frame;
a counting result updating unit, configured to determine whether the signal energy is greater than or equal to a global energy threshold, and update a high-energy frame count result and a low-energy frame count result matched with the audio signal buffer according to the determination result;
and a gain control type determining unit, configured to determine the gain control type according to the high-energy frame count result and the low-energy frame count result.
12. The apparatus according to claim 11, wherein the counting result updating unit is specifically configured to:
when the judgment result is that the signal energy is larger than or equal to the global energy threshold, accumulating the counting result of the high-energy frames; if the accumulated result is determined to exceed a first threshold value, setting the counting result of the low-energy frames as a set initial value;
when the judgment result is that the signal energy is smaller than a global energy threshold value, accumulating the counting result of the low-energy frames; and if the accumulated result is determined to exceed the second threshold value, setting the high-energy frame counting result as a set initial value.
13. The apparatus according to claim 12, wherein the gain control type determining unit is specifically configured to:
determining the gain control type to be an amplified audio frame signal if it is determined that the high-energy frame count result exceeds the first threshold;
determining the gain control type to be a suppressed audio frame signal if it is determined that the low energy frame count result exceeds the second threshold;
determining the gain control type to leave the audio frame signal unchanged if it is determined that the high-energy frame count result does not exceed the first threshold or it is determined that the low-energy frame count result does not exceed the second threshold.
14. The apparatus of claim 13, wherein the gain control type determination module further comprises:
a gain adjustment value determining unit, configured to, after the gain control type is determined according to the high-energy frame count result and the low-energy frame count result, determine a gain adjustment value corresponding to the gain control type according to the high-energy frame count result or the low-energy frame count result corresponding to the gain control type;
the gain control processing module is specifically configured to:
obtain the earliest-buffered audio frame from the audio signal buffer, and perform gain control processing on the obtained audio frame using the gain adjustment value corresponding to the current gain control type.
15. The apparatus according to claim 14, wherein the gain adjustment value determining unit is specifically configured to:
in determining that the gain control type is an amplified audio frame signal, using the formula:
Figure FDA0002217302190000051
calculating the gain adjustment value;
where Gain1 is an amplification Gain value, a is a Gain amplification factor, H is the high-energy frame count result, and B is the first threshold.
16. The apparatus according to claim 14, wherein the gain adjustment value determining unit is specifically configured to:
upon determining that the gain control type is to suppress an audio frame signal, using the formula $\mathrm{Gain2} = 0.95^{(L-C)}$ to calculate the gain adjustment value;
where Gain2 is the suppression gain value, L is the low-energy frame count result, and C is the second threshold.
17. The apparatus according to claim 11, wherein the signal energy calculation unit is specifically configured to:
based on the new audio frame currently buffered, using the formula $P(t) = \alpha X^2(t) + (1-\alpha) X^2(t-1)$ to calculate the signal energy of the new audio frame;
where P(t) represents the signal energy of the currently buffered t-th audio frame, α is a constant, X(t) represents the t-th audio frame, and t represents the frame number of the audio signal.
18. An electronic device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio signal gain control method of any one of claims 1-9.
19. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method for gain control of an audio signal as claimed in any one of the claims 1 to 9.
CN201910920120.6A 2019-09-26 2019-09-26 Audio signal gain control method, device, equipment and storage medium Pending CN112564655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910920120.6A CN112564655A (en) 2019-09-26 2019-09-26 Audio signal gain control method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112564655A true CN112564655A (en) 2021-03-26

Family

ID=75030222

Country Status (1)

Country Link
CN (1) CN112564655A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination