CN113473316B - Audio signal processing method, device and storage medium - Google Patents

Info

Publication number
CN113473316B
Authority
CN
China
Prior art keywords
audio signal
gain
interval
threshold interval
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110737797.3A
Other languages
Chinese (zh)
Other versions
CN113473316A (en)
Inventor
严涛
修平平
朱赛男
浦宏杰
鄢仁祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd
Priority to CN202110737797.3A
Publication of CN113473316A
Application granted
Publication of CN113473316B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides an audio signal processing method, an audio signal processing device and a storage medium. The audio signal processing method comprises the following steps: acquiring a current frame audio signal as a first audio signal; and performing first automatic gain processing on the first audio signal to obtain a second audio signal. The first automatic gain processing specifically includes: dividing the first audio signal into audio signals of a plurality of first subframes; calculating a first parameter value from the audio parameter of each first subframe; comparing each first parameter value with a first threshold interval, and counting the number M of subframes whose first parameter value is not in the first threshold interval; and when M is greater than a first statistic value, applying a signal gain to the sample points of the first audio signal simultaneously. Because the sample points of the first audio signal are gained simultaneously, the dynamic range of the audio is maintained, the natural fluctuation of its volume is kept, and the voice quality is improved.

Description

Audio signal processing method, device and storage medium
Technical Field
The present invention relates to the field of audio technologies, and in particular, to a method and an apparatus for processing an audio signal, and a storage medium.
Background
With the wide use of modern communication technologies, competition among communication enterprises keeps intensifying. To strengthen their competitive position, these enterprises need to improve the quality of their communication signals and the stability, safety and efficiency of each index of the communication system.
In audio signal processing, an automatic gain control (AGC) algorithm is adopted to improve the stability of the audio signal system and of the audio signal output, and to address the signal distortion that remains after AGC tuning. At present, most AGC algorithms process each sample point in a frame of the voice signal point by point, which loses voice dynamics and flattens the natural fluctuation of the voice.
Disclosure of Invention
In view of this, embodiments of the present invention provide an audio signal processing method, an audio signal processing apparatus, and a storage medium, so as to solve the technical problem in the prior art that a loss of speech dynamics is caused after audio signal processing.
In a first aspect, the present invention provides an audio signal processing method, including the steps of: acquiring a current frame audio signal as a first audio signal; performing first automatic gain processing on the first audio signal to obtain a second audio signal; wherein the step of performing the first automatic gain processing on the first audio signal to obtain the second audio signal specifically includes: dividing the first audio signal into audio signals of a plurality of first subframes; calculating a first parameter value from the audio parameter of each first subframe; comparing each first parameter value with a first threshold interval, and counting the number M of subframes whose first parameter value is not in the first threshold interval; and when M is greater than a first statistic value, applying a signal gain to the sample points of the first audio signal simultaneously.
Further, after the step of performing the first automatic gain processing on the first audio signal to obtain the second audio signal, the method further includes: carrying out echo cancellation processing on the second audio signal to obtain a third audio signal; performing second automatic gain processing on the third audio signal to obtain a fourth audio signal; and outputting the fourth audio signal.
Further, when voice activity detection (VAD) is performed on the third audio signal, if the VAD does not detect speech, the second automatic gain processing is: applying a signal gain to the sample points of the third audio signal simultaneously, wherein the gain value of the second automatic gain is the gain value used for the previous frame of the third audio signal.
Further, when the VAD is performed on the third audio signal, if the VAD detects speech, the second automatic gain processing specifically includes the following steps: dividing the third audio signal into audio signals of a plurality of second subframes; calculating a second parameter value from the audio parameter of each second subframe; comparing each second parameter value with a second threshold interval, and counting the number N of subframes whose second parameter value is not in the second threshold interval; judging whether N is greater than a second statistic value; and if so, applying a signal gain to the sample points of the third audio signal simultaneously.
Further, the step of applying a signal gain to the sample points of the first audio signal simultaneously specifically includes the following steps: dividing the M first subframes that are not in the first threshold interval into two classes, wherein the first class is the number Ma of subframes to the left of the first threshold interval and the second class is the number Mb of subframes to the right of the first threshold interval, where Ma + Mb = M; comparing Ma with Mb; when Ma is greater than Mb, determining that the gain value of the first gain is positive, and increasing the gain; and when Ma is less than Mb, determining that the gain value of the first gain is negative, and reducing the gain.
Further, the step of applying a signal gain to the sample points of the third audio signal simultaneously specifically includes the following steps: dividing the N second subframes that are not in the second threshold interval into two classes, wherein the first class is the number Na of subframes to the left of the second threshold interval and the second class is the number Nb of subframes to the right of the second threshold interval, where Na + Nb = N; comparing Na with Nb; when Na is greater than Nb, determining that the gain value of the second gain is positive, and increasing the gain; and when Na is less than Nb, determining that the gain value of the second gain is negative, and reducing the gain.
Further, the first threshold interval is located in a central interval of the volume interval; the difference between the values of the left end point of the first threshold interval and the left end point of the volume interval is equal to the difference between the values of the right end point of the first threshold interval and the right end point of the volume interval.
Further, the volume interval includes n consecutive subintervals, the first threshold interval is the kth subinterval, and the kth subinterval is the central interval of the volume interval; then, the step of counting the number M of subframes whose first parameter value is not in the first threshold interval further includes: counting, for each of the n subintervals, the number Q of first parameter values that fall in it, wherein Q is less than or equal to M; and the step of applying a signal gain to the sample points of the first audio signal simultaneously when M is greater than the first statistic value includes: when the number Q of at least one subinterval is greater than the corresponding third statistic value, applying a signal gain to the sample points of the first audio signal simultaneously; wherein the third statistic value corresponding to each subinterval is negatively correlated with the distance from that subinterval to the first threshold interval.
In a second aspect, the present invention provides an audio signal processing apparatus comprising: the acquisition module is used for acquiring a current frame audio signal as a first audio signal; the first automatic gain processing module is used for carrying out first automatic gain processing on the first audio signal to obtain a second audio signal; wherein, the first automatic gain processing module specifically comprises: a dividing unit that divides the first audio signal into audio signals of a plurality of first sub-frames; the calculating unit is used for calculating to obtain a first parameter value according to the audio parameter of each first subframe; the statistical unit is used for comparing the size of the first parameter value with a first threshold interval and accumulating to obtain the number M of the subframes of which the first parameter value is not in the first threshold interval; and the judging unit is used for simultaneously carrying out signal gain on the sample points of the first audio signal when the M is larger than a first statistic value.
In a third aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the audio signal processing method.
The technical scheme of the invention has the following advantages:
The invention provides an audio signal processing method, an audio signal processing device and a storage medium, wherein a frame of audio signal is divided into a plurality of subframes and the audio parameter of each subframe is judged against the threshold interval. The sample points of the frame are then gained simultaneously, so the dynamic range of the audio and the natural fluctuation of its volume are preserved, and the voice quality can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an audio signal processing method provided according to an embodiment of the present invention;
Fig. 2 is a flowchart of an audio signal processing method provided according to an embodiment of the present invention;
Fig. 3 is a diagram illustrating a first threshold interval provided in accordance with an embodiment of the present invention;
Fig. 4 is a flowchart of an audio signal processing method provided according to an embodiment of the present invention;
Fig. 5 is a diagram illustrating a second threshold interval provided in accordance with an embodiment of the present invention;
Fig. 6 is a schematic diagram of an audio signal processing apparatus provided according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an audio signal processing apparatus provided according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of an audio signal processing system provided in accordance with an embodiment of the invention;
Fig. 9 is a schematic diagram of an electronic device provided in accordance with an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In one embodiment, as shown in fig. 1 and fig. 2, the present invention provides an audio signal processing method that divides the first audio signal of the current frame into a plurality of subframes, performs a threshold determination on the audio parameter of each subframe, and applies the updated gain to all sample points of the current frame simultaneously, so that the dynamic, continuous character of the audio signal is taken into account during processing. The audio signal processing method includes the following steps S1 to S5.
S1, acquiring a current frame audio signal as a first audio signal.
The first audio signal contains a plurality of sample points, and the number of sample points is set according to actual needs. In one embodiment, one frame of audio at a 48 kHz sample rate contains 512 sample points; in other embodiments it may contain 480 sample points.
And S2, carrying out first automatic gain processing on the first audio signal to obtain a second audio signal.
This process is used to eliminate the problem of popping before the audio signal is input. The step of performing the first automatic gain processing on the first audio signal to obtain the second audio signal specifically includes S201 to S204.
S201, the first audio signal is divided into a plurality of audio signals of first sub-frames.
In the present embodiment, the slicing criterion is equal slicing, for example, equal slicing is performed according to the total duration of the current frame audio signal, i.e. the first audio signal, into a plurality of first subframes with the same duration. For example, in one embodiment, the first audio signal is equally divided into audio signals of 16 first sub-frames, and 32 sample points are taken per sub-frame. Other ways of segmenting are included in other embodiments.
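The equal slicing described above can be sketched as follows. This is a minimal illustration, assuming the frame is held as a plain Python list of samples; the helper name `split_into_subframes` is hypothetical and not taken from the patent.

```python
def split_into_subframes(frame, num_subframes=16):
    """Split one frame into equal-length subframes (hypothetical helper).

    Assumes the frame length is an exact multiple of num_subframes,
    as in the 512-sample / 16-subframe embodiment.
    """
    if len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a multiple of num_subframes")
    size = len(frame) // num_subframes
    return [frame[i * size:(i + 1) * size] for i in range(num_subframes)]


frame = [0.0] * 512                      # one frame of 512 sample points
subframes = split_into_subframes(frame)  # 16 subframes of 32 samples each
print(len(subframes), len(subframes[0]))
```

With the embodiment's figures, 512 sample points split into 16 subframes gives 32 sample points per subframe.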
S202, a first parameter value is obtained through calculation according to the audio parameter of each first subframe.
The audio parameters include the mean energy and/or the envelope peak. If the audio parameter is the mean energy (RMS), it characterizes the amount of energy in the signal, and the first parameter value is calculated by the formula
$$\mathrm{RMS} = \sqrt{\frac{1}{k}\sum_{i=1}^{k} y_i^{2}}$$
where $y_i$ represents the audio amplitude value of the i-th sample point in the subframe, and k is the number of sample points in the subframe. If the audio parameter is the envelope peak, the first parameter value is the maximum amplitude value among the sample points of the subframe.
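As a rough illustration of the two audio parameters, the sketch below computes a root-mean-square value and a peak value for one subframe. The conversion to dBFS is an assumption added here because the threshold intervals in this description are given in dBfs; the function names, the full-scale value of 1.0 and the -120 dB floor are all hypothetical choices.

```python
import math


def rms(subframe):
    """Root-mean-square amplitude of one subframe."""
    return math.sqrt(sum(y * y for y in subframe) / len(subframe))


def envelope_peak(subframe):
    """Largest absolute amplitude in one subframe."""
    return max(abs(y) for y in subframe)


def to_dbfs(value, floor_db=-120.0):
    """Convert a linear amplitude to dBFS, assuming full scale is 1.0.

    The floor avoids log10(0) on digital silence (assumed behaviour).
    """
    if value <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(value))


subframe = [0.5, -0.5, 0.5, -0.5]
level_db = to_dbfs(rms(subframe))  # roughly -6.02 dBFS
print(round(level_db, 2))
```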
S203, comparing the first parameter value with a first threshold interval, and counting the number M of first subframes whose first parameter value is not in the first threshold interval.
In an embodiment, the first threshold interval is the central interval of a volume interval and itself represents a range of volume: the difference between the left end point of the first threshold interval and the left end point of the volume interval equals the difference between the right end point of the first threshold interval and the right end point of the volume interval. If the target speech level at the center point is set to -22 dBfs, the left end point of the first threshold interval may be obtained by subtracting one unit from -22 dBfs, and the right end point by adding one unit to -22 dBfs.
Specifically, once the range of the first threshold interval is determined, the first parameter value of each first subframe is compared with the first threshold interval in turn. If the first parameter value of the current first subframe does not fall within the first threshold interval, the count is incremented by 1; if it does, the count is left unchanged. The first parameter value of the next first subframe is then obtained and the comparison is repeated. Counting stops either when the count M exceeds the first statistic value, or when the first parameter values of all first subframes of the first audio signal have been compared with the first threshold interval, which yields the count M.
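The counting loop above, including the early stop, might look like the following sketch. The interval (-24, -20) is borrowed from the example values used elsewhere in this description; treating the end points as inclusive and the name `count_outside` are assumptions.

```python
def count_outside(levels_db, low=-24.0, high=-20.0, first_statistic_value=None):
    """Count subframes whose level lies outside [low, high].

    If first_statistic_value is given, counting stops as soon as the
    count exceeds it, mirroring the early-stop condition in the text.
    """
    m = 0
    for level in levels_db:
        if not (low <= level <= high):
            m += 1
            if first_statistic_value is not None and m > first_statistic_value:
                break
    return m


levels = [-30.0, -22.0, -21.0, -10.0, -26.0, -23.0]
print(count_outside(levels))                           # three levels are outside
print(count_outside(levels, first_statistic_value=1))  # stops once the count exceeds 1
```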
In another embodiment, the sum of the audio parameters of the first subframe is calculated, and then the volume interval corresponding to the audio parameters is divided into a plurality of consecutive subintervals. For example, the volume interval includes n consecutive subintervals, the first threshold interval is the kth subinterval, and the kth subinterval is the central interval of the volume interval. The size of each subinterval in the volume interval is two unit lengths, and the unit length can be set according to actual needs, for example 2 dBfs. The optimal voice level is used as the center point of the volume interval. For example, if the target voice level of the center point is set to -22 dBfs, the left end point of the first threshold interval is obtained by subtracting one unit from -22 dBfs and the right end point by adding one unit to -22 dBfs, so the first threshold interval is (-24 dBfs, -20 dBfs). The subinterval adjacent to the right end point of the first threshold interval is (-20 dBfs, -16 dBfs), the subinterval adjacent to the left end point is (-28 dBfs, -24 dBfs), and so on, which gives the range of each subinterval in the volume interval.
The values of the subintervals above are only an example. In other embodiments, the size of the first threshold interval may be two unit lengths while every other subinterval is one unit length, so that the first threshold interval is (-24 dBfs, -20 dBfs), the subinterval adjacent to its right end point is (-20 dBfs, -18 dBfs), the subinterval adjacent to its left end point is (-26 dBfs, -24 dBfs), and so on. The length of each subinterval can be set according to actual needs. In one embodiment, as shown in fig. 3, the volume interval is divided into 7 subintervals, and the 4th subinterval is used as the first threshold interval.
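For the variant in which every subinterval has the same width, the subinterval boundaries can be generated from the center level. This is a sketch under stated assumptions: an odd number of subintervals so that one of them is centered on the target level, a width of 4 dB (two unit lengths of 2 dBfs), and a hypothetical function name.

```python
def build_subintervals(center_db=-22.0, width_db=4.0, n=7):
    """Return n equal-width (low, high) subintervals centered on center_db.

    n is assumed odd so that the middle subinterval is the threshold
    interval around the target level.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a central subinterval exists")
    start = center_db - width_db * n / 2
    return [(start + i * width_db, start + (i + 1) * width_db) for i in range(n)]


subintervals = build_subintervals()
threshold_interval = subintervals[len(subintervals) // 2]
print(threshold_interval)  # the central subinterval around -22 dBFS
```

With 7 subintervals the list index 3 corresponds to the 4th subinterval, which matches the arrangement described for fig. 3.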
When the volume interval includes a plurality of sub-intervals, the step of accumulating the number M of the sub-frames of which the first parameter value is not in the first threshold interval includes: and counting the number Q of the first parameter values in the n subintervals respectively, wherein Q is less than or equal to M.
Specifically, the number of first parameter values corresponding to each subframe in the first audio signal falling in each subinterval is counted, and each subinterval corresponds to a count value.
In the embodiment of the invention, the audio parameter of each first subframe serves as the algorithm parameter and is compared against the first threshold interval, which defines a target dynamic range, to judge whether gain adjustment is needed.
And S204, when the M is larger than a first statistic numerical value, simultaneously performing signal gain on the sample points of the first audio signal.
Specifically, the first statistic value is a preset count threshold set according to the tolerable degree of deviation of the audio signal. That is, the audio signal of a first subframe whose first parameter value falls within the first threshold interval is regarded as a standard audio signal, and the audio signal of a first subframe whose first parameter value falls outside the first threshold interval is regarded as a non-standard audio signal. When the count of first subframes in the current frame (the first audio signal) whose first parameter value falls outside the first threshold interval exceeds the first statistic value, the first audio signal contains a large proportion of non-standard audio, and gain adjustment needs to be performed on it so that it meets the audio quality requirement. Because the decision combines the statistics of a plurality of first subframes, rather than adjusting the gain whenever a single first subframe fails to meet the optimal standard, frequent gain adjustment of the audio signal is avoided; the statistical value is also more representative, which ensures the accuracy of the gain adjustment.
In one embodiment, simultaneously signal-gaining sample points of the first audio signal when M is greater than the first statistical value comprises: dividing the number M of first subframes which are not in the first threshold interval into two types, wherein the first type is the number Ma of the subframes on the left side of the first threshold interval; wherein the second type is the number Mb on the right side of the first threshold interval, where Ma + Mb = M; comparing the magnitude of Ma with Mb; when Ma is larger than Mb, judging that the first gain value is a positive number, and increasing the gain at the moment; and when the Ma is smaller than the Mb, judging that the first gain value is a negative number, and reducing the gain.
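The Ma/Mb comparison can be illustrated with a short sketch. Levels below the interval count toward Ma (too quiet) and levels above it toward Mb (too loud). The text does not say what happens when Ma equals Mb, so returning 0 (no change) in that case is an assumption, as is the function name.

```python
def gain_direction(levels_db, low=-24.0, high=-20.0):
    """Return +1 to raise the gain, -1 to lower it, 0 on a tie (assumed)."""
    ma = sum(1 for v in levels_db if v < low)   # left of the threshold interval
    mb = sum(1 for v in levels_db if v > high)  # right of the threshold interval
    if ma > mb:
        return 1
    if ma < mb:
        return -1
    return 0


print(gain_direction([-30.0, -28.0, -22.0, -10.0]))  # mostly too quiet: raise
print(gain_direction([-30.0, -12.0, -10.0, -22.0]))  # mostly too loud: lower
```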
In another embodiment, the volume interval includes n consecutive sub-intervals, and a third statistical value corresponding to each sub-interval is respectively set, so that when M is greater than the first statistical value, the step of performing signal gain on the sample points of the first audio signal simultaneously includes: when the number Q of at least one subinterval is larger than the corresponding third statistic numerical value, simultaneously performing signal gain on the sample points of the first audio signal; wherein the third count value is inversely related to the distance from the subinterval to the first threshold interval.
Specifically, different subintervals correspond to different volume ranges and therefore differ from the standard volume by different amounts. The larger the difference from the standard volume, the greater the influence on the audio quality, and the lower the tolerance. If the first threshold interval is the interval of the standard volume, an audio signal falling into it needs no gain processing, so the third statistic value of the subinterval corresponding to the first threshold interval can be set to infinity. A subinterval close to the first threshold interval deviates less from the standard volume, so the third statistic value of a subinterval adjacent to the first threshold interval is set to a first preset value, for example 20 times. The third statistic value of a subinterval separated from the first threshold interval by one subinterval is set to a second preset value that is smaller than the first preset value, for example 10 times, and so on, so that each subinterval is assigned a third statistic value according to its distance from the first threshold interval.
In one embodiment, as shown in fig. 3, the first threshold interval is the 4th subinterval [-23, -21], and the other subintervals are (-21, -19), (-19, -17), (-25, -23) and (-27, -25). The third statistic value corresponding to the subintervals (-21, -19) and (-25, -23) is set to 20 times, and the third statistic value corresponding to the subintervals (-19, -17) and (-27, -25) is set to 10 times. The first parameter value of each first subframe in the first audio signal is calculated, the subinterval into which each first parameter value falls is determined, and the number of times each subinterval is hit is accumulated. For example, if the first subframes of the first audio signal fall into the subinterval (-21, -19) 12 times and into the subinterval (-27, -25) 10 times, the count for (-27, -25) has reached the third statistic value of that subinterval, so a signal gain is applied to the sample points of the first audio signal simultaneously. That is, as soon as the count for any one subinterval reaches the third statistic value corresponding to that subinterval, a signal gain is applied to the sample points of the first audio signal simultaneously.
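The worked example above can be reproduced with a small sketch. The subinterval bounds and per-subinterval limits are the ones quoted in the example; the half-open boundary handling, the use of "reaches" (>=) as the trigger condition following the worked example, and the names `SUBINTERVALS`, `count_per_subinterval` and `gain_needed` are assumptions.

```python
import math

# (low, high) bounds in dB and the per-subinterval limit from the example.
# The central threshold interval gets an infinite limit: it never triggers.
SUBINTERVALS = [
    ((-27.0, -25.0), 10),
    ((-25.0, -23.0), 20),
    ((-23.0, -21.0), math.inf),
    ((-21.0, -19.0), 20),
    ((-19.0, -17.0), 10),
]


def count_per_subinterval(levels_db):
    """Count how many levels fall in each subinterval (low <= x < high, assumed)."""
    counts = [0] * len(SUBINTERVALS)
    for level in levels_db:
        for idx, ((low, high), _limit) in enumerate(SUBINTERVALS):
            if low <= level < high:
                counts[idx] += 1
                break
    return counts


def gain_needed(counts):
    """True as soon as any subinterval count reaches its own limit."""
    return any(c >= limit for c, (_bounds, limit) in zip(counts, SUBINTERVALS))


levels = [-20.0] * 12 + [-26.0] * 10   # 12 hits in (-21, -19), 10 in (-27, -25)
counts = count_per_subinterval(levels)
print(counts, gain_needed(counts))
```

Giving the outer subintervals lower limits means large deviations from the target level trigger a gain change sooner than small ones.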
The first gain value can be obtained by looking up a preset gain table. For example, the gain value at the middle position of the gain table is 1; the gain value decreases by 0.2 for each position moved forward and increases by 0.2 for each position moved backward. Suppose the gain value at the previous moment is at the middle position. If Ma is greater than the first statistic value and Ma is greater than Mb at this moment, the gain is positive, and the new gain value obtained by moving one position backward from the middle position of the gain table is 1.2. If Mb is greater than the first statistic value and Ma is far less than Mb at this moment, the gain is negative, and the new gain value obtained by moving one position forward from the middle position of the gain table is 0.8. The gain value of the first automatic gain is less than or equal to 1.
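The table lookup and the frame-wide application of a single gain value can be sketched as follows. The table length, the clamping at the table ends, and the names `GAIN_TABLE`, `step_gain_index` and `apply_frame_gain` are assumptions; only the middle value of 1 and the step of 0.2 come from the text.

```python
# Hypothetical gain table: middle entry 1.0, neighbours differ by 0.2.
GAIN_TABLE = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
MIDDLE = len(GAIN_TABLE) // 2


def step_gain_index(index, direction):
    """Move one position in the table (+1 raises, -1 lowers), clamped to its ends."""
    return min(len(GAIN_TABLE) - 1, max(0, index + direction))


def apply_frame_gain(frame, gain):
    """Apply one gain value to every sample point of the frame."""
    return [gain * y for y in frame]


raised = GAIN_TABLE[step_gain_index(MIDDLE, +1)]   # one step up from 1.0
lowered = GAIN_TABLE[step_gain_index(MIDDLE, -1)]  # one step down from 1.0
print(raised, lowered)
print(apply_frame_gain([0.5, -0.25], lowered))
```

Because the same gain multiplies every sample point in the frame, the relative levels inside the frame are unchanged, which is how the method keeps the short-term dynamics that point-by-point AGC would flatten.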
And S3, carrying out echo cancellation processing on the second audio signal to obtain a third audio signal.
Echo cancellation on the second audio signal prevents a far-end participant in a teleconference from hearing an echo of his or her own voice. A telephone call or teleconference has a near end and a far end: the near end is the local site of the call, and the far end is the site of the other participants. Each site has at least one microphone and one speaker. In other embodiments, other processing such as denoising may also be performed on the second audio signal in step S3.
And S4, performing second automatic gain processing on the third audio signal to obtain a fourth audio signal.
And performing automatic gain processing on the third audio signal again to obtain a fourth audio signal, and outputting the fourth audio signal, that is, performing automatic gain processing again before the audio signal is output, so that the tone quality of the output audio signal can be improved.
The decision steps of the second automatic gain processing are the same as those of the first automatic gain processing, but the processing procedure differs slightly. As shown in fig. 4, the second automatic gain processing includes the following steps S401 to S405.
S401, VAD detection is performed on the third audio signal, and if VAD detects speech, the process proceeds to step S402.
When the VAD detection is performed on the third audio signal, if the VAD does not detect the voice, the second automatic gain processing is: and performing signal gain on the third audio signal, wherein the gain value of the second automatic gain is the gain value of the last frame of the third audio signal.
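The VAD branch can be sketched as a small stateful helper that keeps the gain of the previous frame. The VAD decision itself is passed in as a boolean because the patent does not specify a detector; the class name, the initial gain of 1.0 and the `update_gain` callback are hypothetical.

```python
class SecondAgcState:
    """Holds the most recent gain so non-speech frames can reuse it."""

    def __init__(self, initial_gain=1.0):
        self.gain = initial_gain  # assumed starting gain

    def process(self, frame, is_speech, update_gain):
        """Apply one gain to the whole frame.

        When speech is detected, update_gain(frame, previous_gain) decides
        the new gain; otherwise the previous frame's gain is reused.
        """
        if is_speech:
            self.gain = update_gain(frame, self.gain)
        return [self.gain * y for y in frame]


state = SecondAgcState()
speech_out = state.process([0.1, 0.2], True, lambda f, g: 2.0)   # gain updated
silence_out = state.process([0.1], False, lambda f, g: 5.0)      # gain held
print(speech_out, silence_out, state.gain)
```

Holding the gain during non-speech frames avoids re-estimating it from background noise.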
S402, the third audio signal is divided into a plurality of audio signals of second sub-frames.
And S403, calculating to obtain a second parameter value according to the audio parameter of each second subframe. The calculation method of the second parameter value refers to the calculation method of the first parameter value in step S202.
S404, comparing the second parameter value with a second threshold interval, and counting the number N of second subframes whose second parameter value is not in the second threshold interval.
As shown in fig. 5, in an embodiment, the second threshold interval is the central interval of a volume interval and itself represents a range of volume: the difference between the left end point of the second threshold interval and the left end point of the volume interval equals the difference between the right end point of the second threshold interval and the right end point of the volume interval. If the target speech level at the center point is set to -22 dBfs, the left end point of the second threshold interval may be obtained by subtracting two units from -22 dBfs, and the right end point by adding two units to -22 dBfs.
In another embodiment, the volume interval includes n consecutive subintervals, the second threshold interval is the kth subinterval, and the kth subinterval is the central interval of the volume interval; then, the step of counting the number N of subframes whose second parameter value is not in the second threshold interval further includes: counting, for each of the n subintervals, the number q of second parameter values that fall in it, wherein q is less than or equal to N.
And S405, when N is larger than a second statistic numerical value, simultaneously performing signal gain on the sample points of the third audio signal.
Specifically, the step of simultaneously performing signal gain on the sample points of the third audio signal includes: dividing the number N of second subframes not in the second threshold interval into two types, wherein the first type is the number Na of subframes to the left of the second threshold interval and the second type is the number Nb of subframes to the right of the second threshold interval, where Na + Nb = N; and comparing Na with Nb. When Na is greater than Nb, the gain value of the second gain is determined to be positive and the gain is increased; when Na is less than Nb, the gain value of the second gain is determined to be negative and the gain is reduced.
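The comparison of Na and Nb can be sketched as below. The function name `gain_direction` and the return convention (+1 increase, -1 reduce, 0 keep) are illustrative assumptions. The text does not say what happens when Na equals Nb; the sketch assumes the gain is left unchanged in that case.

```python
from typing import List


def gain_direction(levels: List[float], low: float, high: float,
                   min_count: int) -> int:
    """Return +1 to increase the gain, -1 to reduce it, 0 to keep it.

    levels    : one parameter value per subframe
    low, high : end points of the threshold interval
    min_count : the statistical value that N = Na + Nb must exceed
    """
    na = sum(1 for v in levels if v < low)   # left of the interval
    nb = sum(1 for v in levels if v > high)  # right of the interval
    if na + nb <= min_count:
        return 0   # too few subframes outside the interval: no adjustment
    if na > nb:
        return 1   # mostly too quiet: positive gain
    if na < nb:
        return -1  # mostly too loud: negative gain
    return 0       # Na == Nb is unspecified in the text; assumed no change
```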
In an embodiment in which the volume interval includes n consecutive sub-intervals and the number q of second parameter values falling in each of the n sub-intervals is counted, the step of simultaneously performing signal gain on the sample points of the third audio signal when N is greater than the second statistical value includes: when the number q of at least one sub-interval is greater than the corresponding fourth statistical value, simultaneously performing signal gain on the sample points of the third audio signal; wherein the fourth statistical value is negatively correlated with the distance from the sub-interval to the second threshold interval.
In this embodiment, the manner of acquiring the range of each sub-interval and the manner of acquiring the fourth statistical number corresponding to each sub-interval are the same as the manner of acquiring the range of each sub-interval related to the first threshold interval and the manner of acquiring the third statistical number corresponding to each sub-interval in the above embodiment, and are not described herein again.
Specifically, as shown in fig. 5, the second threshold interval is the (4)th sub-interval, [-24, -20]; the other sub-intervals are (-20, -18), (-18, -16), (-26, -24) and (-28, -26). The fourth statistical value corresponding to the sub-intervals (-20, -18) and (-26, -24) is set to 15, and the fourth statistical value corresponding to the sub-intervals (-18, -16) and (-28, -26) is set to 5. The second parameter value of each second subframe in the third audio signal is calculated, the sub-interval into which each second parameter value falls is determined, and the number of times each sub-interval is hit is accumulated. For example, if the second subframes fall into the sub-interval (-26, -24) 15 times and into the sub-interval (-28, -26) 4 times, the count for (-26, -24) has reached the fourth statistical value corresponding to that sub-interval, so signal gain is performed simultaneously on the sample points of the third audio signal. That is, as long as the count for any one sub-interval reaches the fourth statistical value corresponding to that sub-interval, signal gain is performed simultaneously on the sample points of the third audio signal.
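The per-sub-interval test can be sketched with the example figures given above. The names `SUBINTERVALS` and `should_adjust` are hypothetical. The sketch triggers when a count reaches its statistical value (>=), following the worked example in which 15 hits against a value of 15 triggers the gain; a strict "greater than" reading would use > instead.

```python
from typing import List, Tuple

# ((left end point, right end point), statistical value) per sub-interval,
# taken from the example: sub-intervals farther from [-24, -20] need fewer hits.
SUBINTERVALS: List[Tuple[Tuple[float, float], int]] = [
    ((-28.0, -26.0), 5),
    ((-26.0, -24.0), 15),
    ((-20.0, -18.0), 15),
    ((-18.0, -16.0), 5),
]


def should_adjust(levels: List[float],
                  subintervals=SUBINTERVALS) -> Tuple[bool, List[int]]:
    """Count hits per sub-interval and report whether any count is reached."""
    counts = [0] * len(subintervals)
    for v in levels:
        for i, ((left, right), _limit) in enumerate(subintervals):
            if left < v < right:  # open intervals, as written in the example
                counts[i] += 1
                break
    triggered = any(c >= limit for c, (_, limit) in zip(counts, subintervals))
    return triggered, counts
```

With 15 values in (-26, -24) and 4 values in (-28, -26), the second counter reaches 15 and the function reports that the gain should be adjusted.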
The gain value of the second gain may be obtained by looking up a preset gain table. For example, suppose the gain value at the middle position of the gain table is 1, moving one position forward decreases the gain value by 0.2, and moving one position backward increases the gain value by 0.2. If the position at the previous moment was the middle position and, at the current moment, Na is greater than the first statistical value and Na is greater than Nb, the gain is positive, and moving one position backward from the middle position of the gain table gives a new gain value of 1.2; if, at the current moment, Nb is greater than the first statistical value and Na is less than Nb, the gain is negative, and moving one position forward from the middle position of the gain table gives a new gain value of 0.8.
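The table lookup can be sketched as follows. Only the middle value 1 and the step of 0.2 come from the text; the table length, the clamping at both ends and the names `GAIN_TABLE`, `MIDDLE` and `step_gain_index` are assumptions for illustration.

```python
from typing import List

# Hypothetical gain table: middle entry 1.0, neighbours differ by 0.2.
GAIN_TABLE: List[float] = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
MIDDLE: int = GAIN_TABLE.index(1.0)


def step_gain_index(index: int, direction: int,
                    table: List[float] = GAIN_TABLE) -> int:
    """Move one position in the table.

    direction = +1 moves backward (larger gain), -1 moves forward
    (smaller gain), 0 keeps the position. The index is clamped to the
    table bounds, which is an assumption not stated in the text.
    """
    return max(0, min(len(table) - 1, index + direction))
```

Starting from `MIDDLE`, a positive direction selects 1.2 and a negative direction selects 0.8, matching the example above.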
S5, outputting the fourth audio signal.
In one embodiment, if the first automatic gain processing step does not include VAD detection, the second automatic gain processing step includes VAD detection. This is because noisy conditions must be taken into account: the gain should remain constant in the presence of stationary or non-stationary noise. Voice activity detection (VAD), also called voice endpoint detection or voice boundary detection, aims to identify and eliminate long silent periods from the voice signal stream, saving channel resources without reducing the quality of service.
The invention provides an audio signal processing method in which a frame of audio signal is divided into a plurality of subframes and the audio parameter of each subframe is compared against the threshold interval.
In this embodiment, the first automatic gain processing is only used to attenuate large-volume voice signals and leaves small-volume or normal-volume voice signals unchanged, whereas the second automatic gain processing must handle both large and small volumes. The gain value of the second automatic gain processing therefore varies between negative and positive values: if the volume is large, the gain is negative; if the volume is small, the gain is positive.
In one embodiment, as shown in fig. 6, the present invention provides an audio signal processing apparatus including: an acquisition module 11, a first automatic gain module 12, a signal processing module 13, and a second automatic gain module 14.
The acquisition module 11 is configured to obtain a current frame audio signal as a first audio signal to be input.
The first automatic gain module 12 is configured to perform a first automatic gain processing on the first audio signal to obtain a second audio signal.
The signal processing module 13 is configured to perform echo cancellation processing on the second audio signal to obtain a third audio signal.
The second automatic gain module 14 is configured to perform a second automatic gain processing on the third audio signal to obtain a fourth audio signal.
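The chaining of the modules can be sketched as a simple composition of stages. The three stage callables are placeholders supplied by the caller; the function name `process_frame` and the list-of-floats frame type are assumptions, and no particular echo cancellation or gain algorithm is implied.

```python
from typing import Callable, List

Frame = List[float]
Stage = Callable[[Frame], Frame]


def process_frame(first_signal: Frame,
                  first_agc: Stage,
                  echo_cancel: Stage,
                  second_agc: Stage) -> Frame:
    """Run one frame through the chain described for modules 12 to 14."""
    second_signal = first_agc(first_signal)     # first automatic gain module 12
    third_signal = echo_cancel(second_signal)   # signal processing module 13
    fourth_signal = second_agc(third_signal)    # second automatic gain module 14
    return fourth_signal
```

Keeping each stage as an independent callable mirrors the modular structure of the apparatus and makes the order of the two gain stages around echo cancellation explicit.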
In one embodiment, as shown in fig. 7, the first automatic gain module 12 specifically includes: a segmentation unit 101, a calculation unit 102, a statistics unit 103, and a judgment unit 104.
The segmentation unit 101 is configured to divide the first audio signal into audio signals of a plurality of first subframes.
The calculation unit 102 is configured to calculate a first parameter value according to the audio parameter of each first subframe.
The statistics unit 103 is configured to compare the first parameter value with a first threshold interval, and accumulate the number M of subframes whose first parameter value is not within the first threshold interval.
The judgment unit 104 is configured to simultaneously perform signal gain on the sample points of the first audio signal when M is greater than a first statistical value.
The invention provides an audio signal processing device in which an automatic gain module divides a frame of audio signal into a plurality of subframes and compares the audio parameter of each subframe against a threshold interval. Gain is then applied to the sample points of the whole frame of audio signal, so that the dynamic range of the audio and the natural variation of its volume are preserved, and the voice quality can be improved.
In one embodiment, as shown in fig. 8, the present invention provides an audio signal processing system 200 comprising: an input device 201, a processor 202, and an output device 203.
The input device 201 is connected with the processor 202, and the output device 203 is connected with the processor 202. The input device 201 comprises a microphone and the output device 203 comprises a speaker.
The processor 202 may call program instructions to implement the audio signal processing method as shown in the embodiments of fig. 1 and fig. 2 of the present application.
The specific method of using the audio signal processing system 200 comprises the following steps:
the input device 201 obtains the current frame audio signal as a first audio signal and inputs the first audio signal to the processor 202;
the processor 202 performs a first automatic gain processing on the first audio signal to obtain a second audio signal;
the processor 202 performs echo cancellation processing on the second audio signal to obtain a third audio signal;
the processor 202 performs a second automatic gain processing on the third audio signal to obtain a fourth audio signal, and sends the fourth audio signal to the output device 203;
the output device 203 outputs the fourth audio signal.
The steps of the processor 202 performing the first automatic gain processing on the first audio signal to obtain the second audio signal are steps S201 to S204 in the above embodiment.
The invention provides an audio signal processing system 200 that performs first automatic gain processing on the first audio signal acquired by the input device in order to prevent pops: loud sound is given a low gain and quiet sound is left unchanged. Second automatic gain processing is then performed on the third audio signal obtained after echo cancellation; it boosts low volume and attenuates high volume, which can improve the voice quality.
In the first or the second automatic gain processing, one frame of audio signal is divided into a plurality of subframes, and the audio parameter of each subframe is compared against the threshold interval. The gain is then applied to the whole frame of audio signal, so that the dynamic range of the audio and the natural variation of its volume are preserved.
In an embodiment, referring to fig. 9, which is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, the electronic device may include: at least one processor 51, such as a CPU (Central Processing Unit), at least one communication interface 53, a memory 54, and at least one communication bus 52. The communication bus 52 is used to enable connection and communication between these components. The communication interface 53 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The memory 54 may be a high-speed RAM (volatile random access memory) or a non-volatile memory, such as at least one disk memory. The memory 54 may alternatively be at least one memory device located remotely from the processor 51. The processor 51 may be combined with the apparatus described in fig. 6 and fig. 7; the memory 54 stores an application program, and the processor 51 calls the program code stored in the memory 54 to execute the steps of the audio signal processing method according to any of the above embodiments.
The communication bus 52 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 52 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The memory 54 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 54 may also comprise a combination of the above types of memory.
The processor 51 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 51 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 54 is also used to store program instructions. The processor 51 may call program instructions to implement an audio signal processing method as shown in any of the embodiments of the present application.
Embodiments of the present invention further provide a non-transitory computer storage medium storing computer-executable instructions which, when executed, perform the audio signal processing method in any of the method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. An audio signal processing method, comprising the steps of:
acquiring a current frame audio signal as a first audio signal;
carrying out first automatic gain processing on the first audio signal to obtain a second audio signal;
the step of performing a first automatic gain process on the first audio signal to obtain a second audio signal specifically includes:
dividing the first audio signal into audio signals of a plurality of first sub-frames;
calculating to obtain a first parameter value according to the audio parameter of each first subframe; wherein the audio parameters include: mean energy, and/or envelope peak;
comparing the size of the first parameter value with a first threshold interval, and accumulating to obtain the number M of the subframes of which the first parameter value is not in the first threshold interval;
and when M is larger than a first statistic numerical value, simultaneously performing signal gain on the sample points of the first audio signal.
2. The audio signal processing method of claim 1, further comprising, after the step of performing the first automatic gain processing on the first audio signal to obtain the second audio signal:
carrying out echo cancellation processing on the second audio signal to obtain a third audio signal;
performing second automatic gain processing on the third audio signal to obtain a fourth audio signal; and
outputting the fourth audio signal.
3. The audio signal processing method of claim 2, wherein when VAD detection is performed on the third audio signal, if the VAD does not detect voice, the second automatic gain processing is: simultaneously performing signal gain on the sample points of the third audio signal, wherein the gain value of the second automatic gain is the gain value of the previous frame of the third audio signal.
4. The audio signal processing method of claim 2, wherein when VAD detection is performed on the third audio signal, if the VAD detects voice, the second automatic gain processing specifically includes the following steps:
slicing the third audio signal into audio signals of a plurality of second sub-frames;
calculating to obtain a second parameter value according to the audio parameter of each second subframe;
comparing the size of the second parameter value with a second threshold interval, and accumulating to obtain the number N of the subframes of which the second parameter value is not in the second threshold interval;
judging whether N is larger than a second statistic numerical value; and if so, simultaneously performing signal gain on the sample points of the third audio signal.
5. The audio signal processing method according to claim 2,
in the step of performing signal gain on the sample points of the first audio signal simultaneously, the method specifically includes the following steps:
dividing the number M of first subframes which are not in the first threshold interval into two types, wherein the first type is the number Ma on the left side of the first threshold interval; wherein the second class is the number Mb to the right of the first threshold interval, where Ma + Mb = M;
comparing the Ma and Mb sizes; when Ma is larger than Mb, judging that the gain value of the first automatic gain is a positive number, and increasing the gain at the moment; and when the Ma is smaller than the Mb, judging that the gain value of the first automatic gain is negative, and reducing the gain.
6. The audio signal processing method according to claim 4,
in the step of performing signal gain on the sample points of the third audio signal simultaneously, the method specifically includes the following steps:
dividing the number N of second subframes not in the second threshold interval into two types, wherein the first type is the number Na on the left side of the second threshold interval; the second type is the number Nb on the right side of the second threshold interval, where Na + Nb = N;
comparing the sizes of Na and Nb; when Na is larger than Nb, judging that the gain value of the second automatic gain is a positive number, and increasing the gain at the moment; and when Na is smaller than Nb, judging that the gain value of the second automatic gain is negative, and reducing the gain at the moment.
7. The audio signal processing method according to claim 1, wherein the first threshold interval is located in a center interval of a volume interval; the difference between the values of the left end point of the first threshold interval and the left end point of the volume interval is equal to the difference between the values of the right end point of the first threshold interval and the right end point of the volume interval.
8. The audio signal processing method of claim 7, wherein the volume interval comprises n consecutive sub-intervals, the first threshold interval is a k-th sub-interval, and the k-th sub-interval is a central interval of the volume interval; then, in the step of cumulatively obtaining the number M of the subframes of which the first parameter value is not within the first threshold interval, the method further includes:
counting the number Q of the first parameter values in the n subintervals respectively, wherein Q is less than or equal to M;
when M is greater than a first statistical count value, the step of performing signal gain on the sample points of the first audio signal simultaneously includes: when the number Q of at least one subinterval is larger than the corresponding third statistic numerical value, simultaneously performing signal gain on the sample points of the first audio signal; and the third statistical number value corresponding to each subinterval is in negative correlation with the distance from the subinterval to the first threshold interval.
9. An audio signal processing apparatus, comprising:
the acquisition module is used for acquiring a current frame audio signal as a first audio signal;
the first automatic gain processing module is used for carrying out first automatic gain processing on the first audio signal to obtain a second audio signal;
wherein, the first automatic gain processing module specifically comprises:
a slicing unit slicing the first audio signal into audio signals of a plurality of first subframes;
the calculating unit is used for calculating to obtain a first parameter value according to the audio parameter of each first subframe; wherein the audio parameters include: mean energy, and/or envelope peak;
the statistical unit is used for comparing the size of the first parameter value with a first threshold interval and accumulating to obtain the number M of the subframes of which the first parameter value is not in the first threshold interval;
and the judging unit is used for simultaneously carrying out signal gain on the sample points of the first audio signal when the M is larger than a first statistic numerical value.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the audio signal processing method according to any one of claims 1 to 8.
CN202110737797.3A 2021-06-30 2021-06-30 Audio signal processing method, device and storage medium Active CN113473316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110737797.3A CN113473316B (en) 2021-06-30 2021-06-30 Audio signal processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113473316A CN113473316A (en) 2021-10-01
CN113473316B true CN113473316B (en) 2023-01-31

Family

ID=77876654

Country Status (1)

Country Link
CN (1) CN113473316B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant