CN115910094A

CN115910094A - Audio frame processing method and device, electronic equipment and storage medium

Info

Publication number: CN115910094A
Application number: CN202211168036.1A
Authority: CN
Inventors: 马路; 魏伟
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-04-04

Abstract

The embodiments of the invention disclose an audio frame processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a first audio frame and determining a first amplitude gain corresponding to the first audio frame; smoothing the first amplitude gain according to a second amplitude gain corresponding to a second audio frame, and/or according to a third amplitude gain corresponding to a third audio frame, where the second audio frame is adjacent to and precedes the first audio frame, and the third audio frame is adjacent to and follows the first audio frame; and adjusting the amplitude of the first audio frame based on the smoothed first amplitude gain to obtain a target audio frame. The technical solution of the embodiments solves the problem of audio discontinuity caused by per-frame gain adjustment and improves the subjective auditory quality of the audio.

Description

Audio frame processing method and device, electronic equipment and storage medium

Technical Field

Embodiments of the present invention relate to audio processing technologies, and in particular, to an audio frame processing method and apparatus, an electronic device, and a storage medium.

Background

An audio Automatic Gain Control (AGC) algorithm expands or compresses an audio signal by a gain value to increase or decrease the audio volume. In the prior art, a gain value is determined for each audio frame, and the volume of each audio frame is adjusted using its own gain value.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

the prior-art method is prone to abrupt changes of the gain value at the junction of adjacent audio frames, which cause large instantaneous changes in the audio, make the audio discontinuous, and degrade its subjective auditory quality.

Disclosure of Invention

The embodiments of the invention provide an audio frame processing method and apparatus, an electronic device, and a storage medium, which solve the problem of audio discontinuity caused by audio frame gain adjustment and improve the subjective auditory quality of the audio.

In a first aspect, an embodiment of the present invention provides an audio frame processing method, where the method includes:

acquiring a first audio frame, and determining a first amplitude gain corresponding to the first audio frame;

according to a second amplitude gain corresponding to a second audio frame, smoothing the first amplitude gain; and/or smoothing the first amplitude gain according to a third amplitude gain corresponding to a third audio frame; wherein the second audio frame is an audio frame adjacent to and preceding the first audio frame, and the third audio frame is an audio frame adjacent to and following the first audio frame;

and adjusting the amplitude of the first audio frame based on the first amplitude gain after the smoothing processing to obtain a target audio frame.

In a second aspect, an embodiment of the present invention further provides an audio frame processing apparatus, where the apparatus includes:

the first amplitude gain determining module is used for acquiring a first audio frame and determining a first amplitude gain corresponding to the first audio frame;

the first amplitude gain smoothing module is used for smoothing the first amplitude gain according to a second amplitude gain corresponding to a second audio frame; and/or smoothing the first amplitude gain according to a third amplitude gain corresponding to a third audio frame; wherein the second audio frame is an audio frame adjacent to and preceding the first audio frame, and the third audio frame is an audio frame adjacent to and following the first audio frame;

and the first audio frame gain module is used for adjusting the amplitude of the first audio frame based on the first amplitude gain after the smoothing processing to obtain the target audio frame.

In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:

one or more processors;

a storage device configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the audio frame processing method according to any one of the embodiments of the present invention.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the audio frame processing method according to any one of the embodiments of the present invention.

According to the technical solution of the embodiments of the present invention, a first audio frame is acquired and its first amplitude gain, i.e., the amplitude gain value of each sampling point in the first audio frame, is determined. The first amplitude gain is then smoothed according to the second amplitude gain of the second audio frame and/or the third amplitude gain of the third audio frame, so that the amplitude gain value changes gradually. Finally, the amplitude of the first audio frame is adjusted based on the smoothed first amplitude gain to obtain the target audio frame. This solves the problems of abrupt gain changes at the junction of adjacent audio frames, audio discontinuity, and poor audio quality, eliminates gain discontinuities, and improves the subjective auditory quality of the audio.

Drawings

In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings used in describing the embodiments are briefly introduced below. The described figures show only some, not all, of the embodiments of the invention, and a person skilled in the art can derive other figures from them without inventive effort.

Fig. 1 is a flowchart of an audio frame processing method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of gain smoothing when the first amplitude gain is greater than the second amplitude gain according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of gain smoothing when the first amplitude gain is greater than the third amplitude gain according to an embodiment of the present invention;

Fig. 4 is a flowchart of another audio frame processing method according to an embodiment of the present invention;

Fig. 5 is a flowchart of another audio frame processing method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an audio frame processing apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.

Fig. 1 is a schematic flowchart of an audio frame processing method according to an embodiment of the present invention. This embodiment is applicable to performing gain adjustment on an audio frame. The method may be performed by an audio frame processing apparatus, which may be implemented in software and/or hardware; the hardware may be an electronic device such as a mobile terminal, a PC terminal, or a server.

As shown in fig. 1, the method of the embodiment may specifically include:

S110, obtaining a first audio frame, and determining a first amplitude gain corresponding to the first audio frame.

The first audio frame may be the audio frame currently to be processed; it may be an original audio frame or an audio frame that has undergone audio processing other than gain processing. The first amplitude gain may be the amplitude gain value of the first audio frame obtained by data processing or the like. It can be understood that the first amplitude gain provides the amplitude gain value applied to the amplitude of each sampling point in the first audio frame.

Specifically, during audio frame processing, the first audio frame may be acquired for subsequent processing. The first audio frame may then be processed to obtain its amplitude gain value, which is used as the first amplitude gain.

In a specific implementation, in order to improve the accuracy of the first amplitude gain and to process the audio frame more accurately, the first amplitude gain corresponding to the first audio frame may be determined through the following steps:

Step one: determine a first average amplitude and a first maximum amplitude of the first audio frame according to the amplitude of each sampling point in the first audio frame.

The first average amplitude may be an average of amplitudes of the sampling points in the first audio frame. The first maximum amplitude may be a maximum of amplitudes of the sampling points in the first audio frame.

Specifically, after the first audio frame is obtained, the amplitude of each sampling point in the first audio frame may be determined; the average of these amplitudes is taken as the first average amplitude, and their maximum is taken as the first maximum amplitude.

It should be noted that the first average amplitude and the first maximum amplitude can be determined by the following formulas:

$$\bar{A}_k = \frac{1}{N}\sum_{n=0}^{N-1} \lvert x_k(n) \rvert, \qquad A_k^{\max} = \max_{0 \le n \le N-1} \lvert x_k(n) \rvert$$

where $x_k(n)$ represents the amplitude of the $n$-th sampling point in the first audio frame, $N$ represents the number of sampling points in the first audio frame, $\bar{A}_k$ represents the first average amplitude of the first audio frame, and $A_k^{\max}$ represents the first maximum amplitude of the first audio frame.
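The two statistics in step one can be computed as in the following minimal Python sketch. It assumes a frame is a sequence of floating-point samples and takes the amplitude of a sample as its absolute value; the function name `frame_amplitude_stats` is hypothetical.

```python
from typing import Sequence, Tuple


def frame_amplitude_stats(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (first average amplitude, first maximum amplitude) of a frame.

    The amplitude of a sample is taken as its absolute value (assumption).
    """
    if len(frame) == 0:
        raise ValueError("a frame must contain at least one sampling point")
    magnitudes = [abs(sample) for sample in frame]
    average_amplitude = sum(magnitudes) / len(magnitudes)  # mean over the N samples
    maximum_amplitude = max(magnitudes)                    # peak over the N samples
    return average_amplitude, maximum_amplitude


avg_amp, max_amp = frame_amplitude_stats([0.1, -0.3, 0.2, -0.4])
print(avg_amp, max_amp)
```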

Step two: determine a first average amplitude gain according to the first average amplitude and a preset average amplitude.

The preset average amplitude may be a received, manually set average amplitude. It should be noted that the preset average amplitude may be specific to each first audio frame, i.e., different first audio frames may have the same or different preset average amplitudes; or it may apply to the entire audio, i.e., every first audio frame has the same preset average amplitude. The first average amplitude gain is the gain used to adjust the first average amplitude toward the preset average amplitude.

Specifically, the first average amplitude and the preset average amplitude are processed, for example by calculating their ratio, to obtain the first average amplitude gain.

For example, the first average amplitude gain may be determined according to the first average amplitude and the preset average amplitude by the following formula:

$$G_k^{\mathrm{avg}} = \frac{A_{\mathrm{ref}}^{\mathrm{avg}}}{\bar{A}_k}$$

where $\bar{A}_k$ represents the first average amplitude of the first audio frame, $A_{\mathrm{ref}}^{\mathrm{avg}}$ represents the preset average amplitude, and $G_k^{\mathrm{avg}}$ represents the first average amplitude gain.
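Assuming the ratio takes the form preset average amplitude divided by first average amplitude (by analogy with the maximum amplitude gain, which the description later defines as the ratio of the preset maximum amplitude to the first maximum amplitude), step two reduces to a single division. The function name and the `eps` guard for silent frames are assumptions, not part of the source.

```python
def average_amplitude_gain(average_amplitude: float,
                           preset_average_amplitude: float,
                           eps: float = 1e-9) -> float:
    """First average amplitude gain: scales the frame's mean amplitude to the preset target."""
    # eps (hypothetical) keeps an all-zero frame from causing a division by zero.
    return preset_average_amplitude / max(average_amplitude, eps)


print(average_amplitude_gain(0.25, 0.5))  # 2.0
```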

Step three: determine a first maximum amplitude gain according to the first maximum amplitude and a preset maximum amplitude.

The preset maximum amplitude may be a received maximum amplitude set by an operator. It should be noted that the preset maximum amplitude may be specific to each first audio frame, i.e., different first audio frames may have the same or different preset maximum amplitudes; or it may apply to the entire audio, i.e., every first audio frame has the same preset maximum amplitude. The first maximum amplitude gain is the gain used to adjust the first maximum amplitude toward the preset maximum amplitude.

Specifically, the first maximum amplitude and the preset maximum amplitude are processed, for example by calculating their ratio, to obtain the first maximum amplitude gain.

For example, the first maximum amplitude gain may be determined according to the first maximum amplitude and the preset maximum amplitude by the following formula:

$$G_k^{\max} = \frac{A_{\mathrm{ref}}^{\max}}{A_k^{\max}}$$

where $A_k^{\max}$ represents the first maximum amplitude of the first audio frame, $A_{\mathrm{ref}}^{\max}$ represents the preset maximum amplitude, and $G_k^{\max}$ represents the first maximum amplitude gain.
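Step three follows the same pattern. The sketch below (hypothetical function name, same assumed `eps` guard) also illustrates the property that motivates this gain: multiplying the frame's peak by it places the peak exactly at the preset maximum amplitude.

```python
def maximum_amplitude_gain(maximum_amplitude: float,
                           preset_maximum_amplitude: float,
                           eps: float = 1e-9) -> float:
    """First maximum amplitude gain: preset maximum amplitude / first maximum amplitude."""
    return preset_maximum_amplitude / max(maximum_amplitude, eps)


peak, preset_peak = 0.45, 0.9
gain = maximum_amplitude_gain(peak, preset_peak)
print(gain, peak * gain)  # the scaled peak equals the preset maximum amplitude
```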

Step four: determine the first amplitude gain corresponding to the first audio frame according to the first average amplitude gain and the first maximum amplitude gain.

Specifically, the first average amplitude gain and the first maximum amplitude gain may be analyzed, for example with a preset algorithm or a preset model, and the analysis result is used as the first amplitude gain corresponding to the first audio frame.

Illustratively, the smaller of the first average amplitude gain and the first maximum amplitude gain may be taken as the first amplitude gain corresponding to the first audio frame, that is:

$$G_k = \min\{G_k^{\mathrm{avg}},\ G_k^{\max}\}$$

where $G_k^{\mathrm{avg}}$ represents the first average amplitude gain, $G_k^{\max}$ represents the first maximum amplitude gain, and $G_k$ represents the first amplitude gain of the first audio frame.
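A short worked example, with illustrative numbers that are not taken from the source and assuming both gains are ratios of the preset amplitude to the measured amplitude, shows why the smaller gain is chosen. Suppose the first average amplitude is 0.1 and the preset average amplitude is 0.3, while the first maximum amplitude is 0.6 and the preset maximum amplitude is 0.9; writing $G^{\mathrm{avg}}$ and $G^{\max}$ for the first average amplitude gain and the first maximum amplitude gain:

$$G^{\mathrm{avg}} = \frac{0.3}{0.1} = 3, \qquad G^{\max} = \frac{0.9}{0.6} = 1.5, \qquad G_k = \min\{3,\ 1.5\} = 1.5$$

Applying the average amplitude gain of 3 alone would raise the frame's peak to $0.6 \times 3 = 1.8$, twice the preset maximum amplitude; taking the minimum keeps the peak at $0.6 \times 1.5 = 0.9$ while still raising the average amplitude from 0.1 to 0.15.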

It should be noted that the first amplitude gain may also be determined in other ways according to the current volume, for example by using a preset audio gain, by looking up a preset correspondence between volume and audio gain, or by using a preset deep learning gain model.

S120, smoothing the first amplitude gain according to a second amplitude gain corresponding to the second audio frame; and/or smoothing the first amplitude gain according to a third amplitude gain corresponding to a third audio frame.

The second audio frame is adjacent to and precedes the first audio frame, and the third audio frame is adjacent to and follows the first audio frame. The second amplitude gain may be the amplitude gain value obtained by processing the second audio frame, i.e., the amplitude gain value of the second audio frame that is buffered after the second audio frame has been processed. Likewise, the third amplitude gain may be the amplitude gain value obtained by processing the third audio frame and buffered afterwards. The smoothing may be performed according to a preset smoothing algorithm or a preset curve shape.

Specifically, the first amplitude gain may be smoothed according to the second amplitude gain of the second audio frame to obtain the amplitude gain value of each sampling point in the first audio frame; the per-sample gain values then vary smoothly, so the gain of the first audio frame transitions smoothly from that of the second audio frame. Alternatively, the first amplitude gain may be smoothed according to the third amplitude gain of the third audio frame, so that the gain of the first audio frame transitions smoothly into that of the third audio frame. The two can also be combined: the amplitude gain values of some sampling points in the first audio frame are smoothed according to the second amplitude gain, and those of other sampling points according to the third amplitude gain, so that the gain of the first audio frame transitions smoothly into the gains of both the second and the third audio frames.

Illustratively, the amplitude gain values of a front portion of the sampling points in the first audio frame are processed according to the second amplitude gain, and those of a rear portion are processed according to the third amplitude gain, where the total number of sampling points in the two portions is less than or equal to the total number of sampling points in the first audio frame.
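A minimal sketch of how the two portions could be indexed, assuming the front portion holds the first `front_count` samples and the rear portion the last `rear_count` samples (both hypothetical parameter names). The check enforces the constraint that the two portions together do not exceed the frame length, so they never overlap.

```python
from typing import Tuple


def split_smoothing_ranges(num_samples: int, front_count: int,
                           rear_count: int) -> Tuple[range, range]:
    """Return (front, rear) index ranges of the samples to be smoothed."""
    if front_count < 0 or rear_count < 0:
        raise ValueError("sample counts must be non-negative")
    if front_count + rear_count > num_samples:
        raise ValueError("front and rear portions would overlap")
    front = range(0, front_count)                        # smoothed toward the previous frame
    rear = range(num_samples - rear_count, num_samples)  # smoothed toward the next frame
    return front, rear


front, rear = split_smoothing_ranges(160, 40, 40)
print(list(front)[:3], list(rear)[-3:])  # samples 40..119 keep the unsmoothed gain
```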

On the basis of the above example, to limit which sampling points in the first audio frame are smoothed toward the gain of the second audio frame, and thereby determine precisely which amplitude gain values are to be smoothed, the following may be performed:

and according to the second amplitude gain corresponding to the second audio frame, smoothing the first gain value in the first amplitude gain.

The first gain value is an amplitude gain value of a first preset number of sampling points adjacent to the second audio frame in the first audio frame. It is understood that the first gain value is an amplitude gain value of a first preset number of samples located at the front end after the samples in the first audio frame are sorted according to the time stamp. The first preset number may be a preset number of amplitude gain values of the sample points to be smoothed at the front end in the first audio frame.

Specifically, the amplitude gain values of the first preset number of sampling points at the front end are selected from the amplitude gain values of the sampling points in the first audio frame and used as the first gain value, which is then smoothed according to the second amplitude gain. If the first preset number is smaller than the total number of sampling points in the first audio frame, only the amplitude gain values of those front-end sampling points are smoothed; if it equals the total number of sampling points, the amplitude gain values of all sampling points in the first audio frame are smoothed.

On the basis of the above example, in a case where the first amplitude gain is greater than a second amplitude gain corresponding to the second audio frame, the first gain value in the first amplitude gain may be smoothed according to the second amplitude gain.

Specifically, when the first amplitude gain is greater than the second amplitude gain, the smoothing process is performed on the first gain value in the first amplitude gain, and when the first amplitude gain is less than or equal to the second amplitude gain, the smoothing process is not performed on the first gain value in the first amplitude gain.

It should be noted that this condition is set to avoid smoothing the same frame boundary twice. If the first amplitude gain is less than the second amplitude gain, the second amplitude gain is the larger of the two, so the rear-end amplitude gain values of the second audio frame are already smoothed according to the first amplitude gain when the second audio frame is processed (under the condition on the third amplitude gain described below). Restricting front-end smoothing to the case where the first amplitude gain is greater than the second amplitude gain therefore avoids the poor audio quality that repeated smoothing would cause.
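The decision rule can be sketched as follows (hypothetical function name; a missing neighbouring frame is passed as `None`). Because the front of a frame is smoothed only when its gain exceeds the previous frame's gain, and the rear only when it exceeds the next frame's gain, every boundary between two frames with different gains is smoothed by exactly one of the two frames.

```python
from typing import Optional, Tuple


def smoothing_sides(prev_gain: Optional[float], cur_gain: float,
                    next_gain: Optional[float]) -> Tuple[bool, bool]:
    """Return (smooth_front, smooth_rear) for the current frame."""
    # Front edge: smooth only if the current gain is larger than the previous one.
    smooth_front = prev_gain is not None and cur_gain > prev_gain
    # Rear edge: smooth only if the current gain is larger than the next one.
    smooth_rear = next_gain is not None and cur_gain > next_gain
    return smooth_front, smooth_rear


gains = [1.0, 3.0, 2.0, 2.0, 4.0]
for k, g in enumerate(gains):
    prev_g = gains[k - 1] if k > 0 else None
    next_g = gains[k + 1] if k + 1 < len(gains) else None
    print(k, smoothing_sides(prev_g, g, next_g))
```

In this run, frame 1 smooths both of its edges, frame 4 smooths only its front, and the boundary between the two frames with equal gain (frames 2 and 3) is left untouched.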

For example, Fig. 2 shows a gain smoothing diagram for the case where the first amplitude gain is greater than the second amplitude gain, where $G_k$ represents the first amplitude gain corresponding to the first audio frame and $G_{k-1}$ represents the second amplitude gain corresponding to the second audio frame.

On the basis of the above example, the smoothing may be based on a sigmoid smoothing algorithm, i.e., $G_k = \mathrm{sigmoid}(G_k, G_{k-1})$. For example, the first gain value in the first amplitude gain may be smoothed according to the second amplitude gain based on the following formulas:

$$\Delta G_k = G_k - G_{k-1}$$

[equation image for $x(m)$ and the smoothed $G_k(m)$ not available]

$$G_k(m) = \max\{G_k(m),\ 1.0\}$$

where $G_k$ represents the first amplitude gain corresponding to the first audio frame, $G_{k-1}$ represents the second amplitude gain corresponding to the second audio frame, $\Delta G_k$ is the difference between the first amplitude gain and the second amplitude gain, $M$ is the first preset number, $x(m)$ is the smoothing coefficient corresponding to the $m$-th sampling point, $\alpha$ and $\beta$ are preset smoothing parameters, and $G_k(m)$ is the smoothed amplitude gain value corresponding to the $m$-th sampling point.

It should be noted that the smoothing parameters in the above formula can be set according to actual requirements, for example $\alpha = -5.0$ and $\beta = 10.0$; their specific values are not limited in this embodiment.
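The formula defining the smoothing coefficient x(m) and the smoothed gain is missing from this text, so the sketch below assumes a logistic curve, x(m) = 1 / (1 + exp(-(α + β·m/M))), which with the example values α = -5.0 and β = 10.0 rises from about 0.007 at m = 0 to about 0.99 near m = M. The smoothed gain is assumed to be G_{k-1} + ΔG_k·x(m), so the front samples start close to the previous frame's gain and approach the first amplitude gain; the max{·, 1.0} clamp is taken from the text. Function and parameter names are hypothetical.

```python
import math
from typing import List


def smooth_front(cur_gain: float, prev_gain: float, num_samples: int,
                 front_count: int, alpha: float = -5.0,
                 beta: float = 10.0) -> List[float]:
    """Per-sample gains for one frame with a sigmoid ramp at its front edge.

    Assumed form: x(m) = 1 / (1 + exp(-(alpha + beta * m / M))),
                  G_k(m) = G_{k-1} + dG * x(m), with dG = G_k - G_{k-1}.
    """
    if not 0 < front_count <= num_samples:
        raise ValueError("front_count must be in 1..num_samples")
    delta = cur_gain - prev_gain              # dG_k = G_k - G_{k-1}
    gains = [cur_gain] * num_samples          # samples outside the ramp keep G_k
    for m in range(front_count):
        x = 1.0 / (1.0 + math.exp(-(alpha + beta * m / front_count)))
        gains[m] = max(prev_gain + delta * x, 1.0)  # clamp given in the text
    return gains


ramp = smooth_front(cur_gain=4.0, prev_gain=2.0, num_samples=160, front_count=80)
print(round(ramp[0], 3), round(ramp[79], 3), ramp[80])
```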

It should be further noted that, when the smoothing processing is performed in the above manner, the second amplitude gain corresponding to the second audio frame may be buffered in advance.

On the basis of the above example, to limit which sampling points in the first audio frame are smoothed toward the gain of the third audio frame, and thereby determine precisely which amplitude gain values are to be smoothed, the following may be performed:

and according to a third amplitude gain corresponding to a third audio frame, smoothing a second gain value in the first amplitude gain.

The second gain value is the amplitude gain values of a second preset number of sampling points in the first audio frame adjacent to the third audio frame, that is, the amplitude gain values of the last second-preset-number sampling points when the sampling points of the first audio frame are sorted by timestamp. The second preset number is the preset number of rear-end sampling points whose amplitude gain values are to be smoothed.

Specifically, the amplitude gain values of a second preset number of sampling points at the rear end are determined from the amplitude gain values corresponding to the sampling points in the first audio frame, and the amplitude gain values are used as second gain values. And further, smoothing the second gain value according to the third amplitude gain. If the second preset number is smaller than the total number of the sampling points in the first audio frame, smoothing the amplitude gain values of the rear-end sampling points in the first audio frame; and if the second preset number is equal to the total number of the sampling points in the first audio frame, smoothing the amplitude gain values of all the sampling points in the first audio frame.

On the basis of the above example, it may also be that, in a case where the first amplitude gain is greater than a corresponding third amplitude gain of the third audio frame, a second gain value in the first amplitude gain is smoothed according to the third amplitude gain.

Specifically, when the first amplitude gain is greater than the third amplitude gain, the second gain value in the first amplitude gain is smoothed, and when the first amplitude gain is less than or equal to the third amplitude gain, the second gain value in the first amplitude gain is not smoothed.

It should be noted that this condition is set to avoid smoothing the same frame boundary twice. If the first amplitude gain is less than the third amplitude gain, the third amplitude gain is the larger of the two, so the front-end amplitude gain values of the third audio frame are smoothed according to the first amplitude gain when the third audio frame is processed. Restricting rear-end smoothing to the case where the first amplitude gain is greater than the third amplitude gain therefore avoids the poor audio quality that repeated smoothing would cause.

For example, Fig. 3 shows a gain smoothing diagram for the case where the first amplitude gain is greater than the third amplitude gain, where $G_k$ represents the first amplitude gain corresponding to the first audio frame and $G_{k+1}$ represents the third amplitude gain corresponding to the third audio frame.

On the basis of the above example, the smoothing may likewise be based on a sigmoid smoothing algorithm, i.e., $G_k = \mathrm{sigmoid}(G_k, G_{k+1})$. For example, the second gain value in the first amplitude gain may be smoothed according to the third amplitude gain based on the following formulas:

$$\Delta G_k = G_{k+1} - G_k$$

[equation image for $x(m)$ and the smoothed $G_k(m)$ not available]

$$G_k(m) = \max\{G_k(m),\ 1.0\}$$

where $G_k$ represents the first amplitude gain corresponding to the first audio frame, $G_{k+1}$ represents the third amplitude gain corresponding to the third audio frame, $\Delta G_k$ is the difference between the third amplitude gain and the first amplitude gain, $M$ is the second preset number, $x(m)$ is the smoothing coefficient corresponding to the $m$-th of the second preset number of sampling points, $\alpha$ and $\beta$ are preset smoothing parameters, and $G_k(m)$ is the smoothed amplitude gain value corresponding to that $m$-th sampling point.
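Under the same assumed logistic coefficient (the source formula is not legible here), rear-edge smoothing can be sketched by running the ramp over the last M samples with ΔG_k = G_{k+1} - G_k, which is negative whenever this smoothing applies, so the gain starts near the first amplitude gain and ends close to the next frame's gain. Names are hypothetical; the 1.0 clamp follows the text.

```python
import math
from typing import List


def smooth_rear(cur_gain: float, next_gain: float, num_samples: int,
                rear_count: int, alpha: float = -5.0,
                beta: float = 10.0) -> List[float]:
    """Per-sample gains for one frame with a sigmoid ramp at its rear edge.

    Assumed form: x(m) = 1 / (1 + exp(-(alpha + beta * m / M))),
                  G_k(m) = G_k + dG * x(m), with dG = G_{k+1} - G_k.
    """
    if not 0 < rear_count <= num_samples:
        raise ValueError("rear_count must be in 1..num_samples")
    delta = next_gain - cur_gain              # dG_k = G_{k+1} - G_k
    gains = [cur_gain] * num_samples
    start = num_samples - rear_count          # first sample of the rear portion
    for m in range(rear_count):
        x = 1.0 / (1.0 + math.exp(-(alpha + beta * m / rear_count)))
        gains[start + m] = max(cur_gain + delta * x, 1.0)  # clamp given in the text
    return gains


ramp = smooth_rear(cur_gain=4.0, next_gain=2.0, num_samples=160, rear_count=80)
print(ramp[79], round(ramp[80], 3), round(ramp[-1], 3))
```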

It should be further noted that, when the smoothing processing is performed in the above manner, the third amplitude gain corresponding to the third audio frame may be buffered in advance.

S130, adjusting the amplitude of the first audio frame based on the first amplitude gain after the smoothing processing to obtain the target audio frame.

The target audio frame may be an audio frame obtained by performing gain processing on the amplitude of the first audio frame, that is, the first audio frame after the gain processing.

Specifically, an amplitude gain value corresponding to each sampling point in the first audio frame is determined according to the smoothed first amplitude gain, and then the amplitude of each sampling point in the first audio frame is multiplied by the amplitude gain value corresponding to each sampling point in the first audio frame, so that the amplitude of each sampling point after gain processing is obtained. And taking the first audio frame obtained after the gain processing is carried out on each sampling point as a target audio frame.
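Step S130 is an element-wise product of the samples and their per-sample gains. A minimal sketch (hypothetical function name), assuming the smoothed gains have already been expanded to one value per sampling point:

```python
from typing import List, Sequence


def apply_gains(frame: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Multiply each sample by its own amplitude gain to form the target frame."""
    if len(frame) != len(gains):
        raise ValueError("need exactly one gain value per sampling point")
    return [sample * gain for sample, gain in zip(frame, gains)]


print(apply_gains([0.25, -0.125], [2.0, 4.0]))  # [0.5, -0.5]
```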

Fig. 4 is a flowchart of another audio frame processing method according to an embodiment of the present invention. This embodiment is refined on the basis of the above technical solutions. Optionally, after the first amplitude gain is determined and before it is smoothed, the first amplitude gain may be updated; and before the first amplitude gain of the first audio frame is determined, how the first audio frame is to be processed may be decided by a speech presence algorithm. Terms that are the same as or correspond to those in the above embodiments are not explained again here.

As shown in fig. 4, the method of this embodiment may specifically include:

S210, obtaining a first audio frame, and determining the speech presence probability of the first audio frame according to a preset speech presence algorithm.

The preset speech presence algorithm may be a predetermined method for estimating the probability that speech is present in the first audio frame, such as a conventional signal processing method or a neural network model. The speech presence probability is the probability that speech is present in the first audio frame, i.e., the result of applying the preset speech presence algorithm to the first audio frame.

Specifically, the first audio frame is obtained and input into the preset speech presence algorithm, and the output is used as the speech presence probability of the first audio frame.

Illustratively, the speech presence probability of the first audio frame is determined by the following formula:

$$p_k = f\big(x_k(n)\big), \quad n = 0, \ldots, N-1$$

where $x_k(n)$ represents the sequence of amplitudes of the sampling points in the first audio frame, $N$ represents the number of sampling points in the first audio frame, $f(\cdot)$ represents the preset speech presence algorithm, and $p_k$ represents the speech presence probability of the first audio frame.

S220, determining whether the speech presence probability reaches a preset speech threshold; if so, performing S230; if not, performing S270.

The preset speech threshold may be a preset threshold for determining whether speech exists in the first audio frame.

Specifically, whether the speech presence probability reaches the preset speech threshold can be determined as follows:

$$b_k = \begin{cases} 1, & p_k > p_{\mathrm{th}} \\ 0, & \text{otherwise} \end{cases}$$

where $b_k$ is a flag indicating whether speech is present in the first audio frame ("1" indicates that speech is present and "0" that it is absent), $p_k$ represents the speech presence probability of the first audio frame, and $p_{\mathrm{th}}$ represents the preset speech threshold.

In practical applications, $b_k$ may also be smoothed to some extent to eliminate the influence of short pauses or gaps in speech.

It should be noted that, before gain smoothing is performed, the preset speech presence algorithm may be used to determine whether the first audio frame contains speech that requires gain processing. If it does, the first amplitude gain is determined and smoothed; if it does not, the first audio frame contains no meaningful speech and gain processing may be skipped, which reduces the amount of data processing and improves processing efficiency.
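The text does not specify how $b_k$ is smoothed. One common option, used here purely as an illustration, is a hangover: once speech is detected, the flag stays at 1 for a few further frames so that brief pauses do not switch gain processing off. The threshold test mirrors the formula for $b_k$; `hangover_frames` and the function name are hypothetical.

```python
from typing import Iterable, List, Optional


def speech_flags(probabilities: Iterable[float], p_th: float = 0.5,
                 hangover_frames: int = 3) -> List[int]:
    """Per-frame speech flags b_k with a simple hangover.

    b_k = 1 if p_k > p_th else 0; after the last frame flagged as speech,
    the flag is held at 1 for up to `hangover_frames` further frames.
    """
    flags: List[int] = []
    frames_since_speech: Optional[int] = None  # None until speech is first seen
    for p_k in probabilities:
        if p_k > p_th:                         # raw decision from the threshold test
            frames_since_speech = 0
            flags.append(1)
        elif frames_since_speech is not None and frames_since_speech < hangover_frames:
            frames_since_speech += 1           # bridge a short pause
            flags.append(1)
        else:
            flags.append(0)
    return flags


print(speech_flags([0.9, 0.1, 0.1, 0.1, 0.1, 0.8], p_th=0.5, hangover_frames=2))  # [1, 1, 1, 0, 0, 1]
```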

S230, determining a first amplitude gain corresponding to the first audio frame, and performing S240.

S240, smoothing the amplitude gain value of each sampling point in the first amplitude gain according to the second amplitude gain corresponding to the second audio frame, so as to update the first amplitude gain, and performing S250.

Specifically, the smoothing in this step may be a weighted average: the second amplitude gain and the first amplitude gain are averaged with weights, and the result is used as the smoothed amplitude gain value of each sampling point in the first amplitude gain. The first amplitude gain is then updated with these smoothed per-sample values, for example according to the following equation:

$$G_k = \alpha \cdot G_{k-1} + (1-\alpha) \cdot G_k$$

where $G_k$ denotes the first amplitude gain (on the right-hand side, its value before the update), $G_{k-1}$ denotes the second amplitude gain, and $\alpha$ denotes a preset weighting coefficient.

It should be noted that the preset weighting coefficient may be set according to actual requirements, for example 0.5 or 0.6; its specific value is not limited in this embodiment.
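The recursive update above can be sketched as follows, assuming that a frame's gain is stored as one value per sampling point and that the second amplitude gain is either a single value or a per-sample list of the same length. The function name `recursive_smooth` is hypothetical.

```python
from typing import List, Sequence, Union


def recursive_smooth(curr_gain: Sequence[float],
                     prev_gain: Union[float, Sequence[float]],
                     alpha: float = 0.5) -> List[float]:
    """Per-sample update G_k = alpha * G_{k-1} + (1 - alpha) * G_k."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if isinstance(prev_gain, (int, float)):
        # Assumed: a single previous-frame gain is applied to every sample.
        prev = [float(prev_gain)] * len(curr_gain)
    else:
        if len(prev_gain) != len(curr_gain):
            raise ValueError("per-sample gains must have equal length")
        prev = [float(g) for g in prev_gain]
    return [alpha * gp + (1.0 - alpha) * gc for gp, gc in zip(prev, curr_gain)]


print(recursive_smooth([2.0, 2.0], 1.0, alpha=0.5))  # [1.5, 1.5]
```

A larger `alpha` gives more weight to the previous frame, so the gain moves more slowly from frame to frame; `alpha = 0` disables the smoothing.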

On the basis of the above example, to avoid speech distortion caused by an excessively large first amplitude gain, as well as inaccuracy in the determined amplitude gain, the first amplitude gain may be clipped using the first maximum amplitude gain of the first audio frame. Specifically:

The first amplitude gain is updated according to the first amplitude gain and the first maximum amplitude gain of the first audio frame.

Here, the first maximum amplitude gain is a gain value determined from the ratio of a preset maximum amplitude to the first maximum amplitude of the first audio frame.

Specifically, after the first amplitude gain and the first maximum amplitude gain of the first audio frame are determined, the first amplitude gain may be processed so that it is less than or equal to the first maximum amplitude gain, and the first amplitude gain is then updated with the processed value.

Optionally, to avoid the resource usage and time consumption caused by heavy data processing, the smaller of the first amplitude gain and the first maximum amplitude gain may be taken as the updated first amplitude gain.

Specifically:

The first amplitude gain is updated to the minimum of the first amplitude gain and the first maximum amplitude gain of the first audio frame.

Specifically, if the first amplitude gain is less than or equal to the first maximum amplitude gain, the first amplitude gain is kept unchanged; if the first amplitude gain is greater than the first maximum amplitude gain, the first maximum amplitude gain is taken as the updated first amplitude gain. The first amplitude gain may be updated based on the following equation:

$$G_k = \min\left(G_k,\ G_k^{\max}\right)$$

where $G_k$ denotes the first amplitude gain and $G_k^{\max}$ denotes the first maximum amplitude gain.
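A sketch of the clipping step, assuming samples are floats in [-1, 1] and a hypothetical preset maximum amplitude of 0.9. `max_amplitude_gain` computes the first maximum amplitude gain as the ratio of the preset maximum amplitude to the frame's peak absolute sample value, and `clip_gain` takes the per-sample minimum described above. Both names, the default value, and the `eps` guard for silent frames are assumptions.

```python
from typing import List, Sequence


def max_amplitude_gain(frame: Sequence[float], preset_max: float = 0.9,
                       eps: float = 1e-9) -> float:
    """First maximum amplitude gain: preset maximum amplitude / frame peak."""
    peak = max((abs(s) for s in frame), default=0.0)
    return preset_max / max(peak, eps)  # eps guards against an all-zero frame


def clip_gain(gain: Sequence[float], g_max: float) -> List[float]:
    """G_k = min(G_k, G_max), applied to every sample's gain."""
    return [min(g, g_max) for g in gain]


frame = [0.1, -0.3, 0.2]
g_max = max_amplitude_gain(frame)  # about 3.0 (0.9 / 0.3)
print(clip_gain([2.0, 4.0, 3.5], g_max))
```

Because the clipped gain never exceeds `preset_max / peak`, the scaled frame's peak stays at or below `preset_max`, which is what prevents clipping distortion.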

It should be noted that the step of updating the first amplitude gain according to the first amplitude gain and the first maximum amplitude gain of the first audio frame may be performed between S230 and S240 and/or between S240 and S250.

It should be noted that the clipping may make the amplitude gain values of two adjacent speech frames discontinuous, which in turn makes the audio discontinuous; therefore, the amplitude gain values may be processed further.

S250, smoothing the first amplitude gain according to a second amplitude gain corresponding to the second audio frame; and/or performing smoothing processing on the first amplitude gain according to a third amplitude gain corresponding to a third audio frame, and executing S260.

S260, adjusting the amplitude of the first audio frame based on the smoothed first amplitude gain to obtain the target audio frame.

S270, adjusting the amplitude of the first audio frame based on the preset gain to obtain the target audio frame.

Here, the preset gain may be an amplitude gain value set for a first audio frame in which no speech is present.

Specifically, when the speech presence probability does not reach the preset speech threshold, it may be determined that no speech is present in the first audio frame, and the target audio frame may be obtained by multiplying the amplitude of the first audio frame by the preset gain.

It should be noted that a preset gain of 1 means that no gain processing is performed on the first audio frame; the first audio frame may then be used directly as the target audio frame without adjusting its amplitude.

According to the technical solution of this embodiment of the invention, a first audio frame is obtained, its speech presence probability is determined with a preset speech presence algorithm, and the probability is compared with a preset speech threshold. If the threshold is not reached, the amplitude of the first audio frame is adjusted based on a preset gain to obtain the target audio frame, so that the frame is processed quickly with less resource usage and time. If the threshold is reached, the first amplitude gain corresponding to the first audio frame is determined, and the amplitude gain value of each sampling point in the first amplitude gain is smoothed according to the second amplitude gain corresponding to the second audio frame to update the first amplitude gain, which is a preliminary smoothing. The first amplitude gain is then smoothed again according to the second amplitude gain corresponding to the second audio frame and/or the third amplitude gain corresponding to the third audio frame, so that the amplitude gain value changes gradually. Finally, the amplitude of the first audio frame is adjusted based on the smoothed first amplitude gain to obtain the target audio frame. This solves the problems of audio discontinuity and poor audio quality caused by abrupt changes in the amplitude gain value at the junction of adjacent audio frames, eliminates gain discontinuity, and further improves the subjective auditory quality of the audio.

Fig. 5 is a flowchart illustrating another audio frame processing method according to an embodiment of the invention. As shown in Fig. 5, the method of this embodiment may specifically proceed as follows:

A first audio frame is acquired and processed with a preset speech presence algorithm to calculate the speech presence probability. The speech presence probability is compared with a preset speech threshold to determine whether speech is present, so that the speech presence probability controls the gain adjustment of the valid speech portions.

If no speech is present, a preset gain is obtained and the first audio frame is adjusted according to the preset gain to obtain the target audio frame.

If speech is present, a first average amplitude gain and a first maximum amplitude gain are calculated from the first audio frame, and the first amplitude gain of the first audio frame is calculated from these two gains. At this point, the first amplitude gain may be buffered for later use in smoothing the amplitude gain of the adjacent audio frame. The first amplitude gain is then updated using the second amplitude gain corresponding to the second audio frame, and/or clipped using the first maximum amplitude gain, so that the gain changes gradually between adjacent frames. Next, the first amplitude gain is smoothed according to the second amplitude gain corresponding to the second audio frame and/or the third amplitude gain corresponding to the third audio frame, and the amplitude of the first audio frame is adjusted based on the smoothed first amplitude gain to obtain the target audio frame.
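The Fig. 5 flow can be illustrated with the following offline sketch, which processes a list of frames in two passes so that each frame's third amplitude gain (the next frame's gain) is already known. Only the order of the steps comes from the text; everything else is an assumption: the energy-based detector stands in for the unspecified speech presence algorithm, the first amplitude gain is taken as the minimum of the average and maximum amplitude gains, each frame has one scalar gain before boundary smoothing, and the boundary smoothing is a linear ramp over `m_edge` samples. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def _presence(frame: Sequence[float], noise_rms: float) -> float:
    """Hypothetical speech presence detector (energy + logistic curve)."""
    rms = math.sqrt(sum(s * s for s in frame) / max(len(frame), 1))
    snr_db = 20.0 * math.log10((rms + 1e-12) / noise_rms)
    z = max(-60.0, min(60.0, snr_db))
    return 1.0 / (1.0 + math.exp(-z))


def process_frames(frames: Sequence[Sequence[float]], p_th: float = 0.5,
                   alpha: float = 0.5, preset_gain: float = 1.0,
                   preset_avg: float = 0.1, preset_max: float = 0.9,
                   m_edge: int = 16, noise_rms: float = 0.01) -> List[List[float]]:
    eps = 1e-9
    # Pass 1: one buffered gain per frame.
    gains: List[float] = []
    prev = preset_gain
    for frame in frames:
        if frame and _presence(frame, noise_rms) > p_th:   # speech present
            abs_vals = [abs(s) for s in frame]
            g_avg = preset_avg / max(sum(abs_vals) / len(abs_vals), eps)
            g_max = preset_max / max(max(abs_vals), eps)
            g = min(g_avg, g_max)                        # first amplitude gain (assumed rule)
            g = alpha * prev + (1.0 - alpha) * g         # update with second amplitude gain
            g = min(g, g_max)                            # clip to first maximum amplitude gain
        else:
            g = preset_gain                              # no speech: preset gain
        gains.append(g)
        prev = g
    # Pass 2: per-sample boundary ramps, then apply the gain.
    out: List[List[float]] = []
    for k, frame in enumerate(frames):
        n = len(frame)
        g_samples = [gains[k]] * n
        m = min(m_edge, n // 2)                          # head and tail ramps never overlap
        if k > 0 and gains[k] > gains[k - 1]:            # ramp up from the previous frame's gain
            for i in range(m):
                w = (i + 1) / (m + 1)
                g_samples[i] = (1.0 - w) * gains[k - 1] + w * gains[k]
        if k + 1 < len(frames) and gains[k] > gains[k + 1]:  # ramp down toward the next gain
            for j in range(m):
                w = (j + 1) / (m + 1)
                g_samples[n - m + j] = (1.0 - w) * gains[k] + w * gains[k + 1]
        out.append([s * gi for s, gi in zip(frame, g_samples)])
    return out


out = process_frames([[0.0] * 32, [0.05] * 32])
print(out[1][0], out[1][-1])  # second frame's gain ramps from about 1.03 up to 1.5
```

Because the head ramp is applied only when a frame's gain exceeds the previous frame's gain, and the tail ramp only when it exceeds the next frame's gain, the ramps in this sketch only ever lower a frame's gain near its boundaries, never raise it. A real-time version would need one frame of look-ahead, and hence one frame of latency, to know the third amplitude gain.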

According to the technical solution of this embodiment of the invention, a first audio frame is obtained and processed with a preset speech presence algorithm. If no speech is present, the first audio frame is adjusted according to a preset gain to obtain the target audio frame. If speech is present, a first average amplitude gain and a first maximum amplitude gain of the first audio frame are calculated and used to calculate the first amplitude gain of the first audio frame; the first amplitude gain is then updated using the second amplitude gain corresponding to the second audio frame, and/or clipped using the first maximum amplitude gain. Further, the first amplitude gain is smoothed according to the second amplitude gain corresponding to the second audio frame and/or the third amplitude gain corresponding to the third audio frame, and the amplitude of the first audio frame is adjusted based on the smoothed first amplitude gain to obtain the target audio frame. This solves the problems of discontinuous audio and poor audio quality caused by abrupt changes in the amplitude gain value at the junction of adjacent audio frames, and improves audio quality.

Fig. 6 is a schematic structural diagram of an audio frame processing apparatus according to an embodiment of the present invention, the apparatus including: a first amplitude gain determination module 310, a first amplitude gain smoothing module 320, and a first audio frame gain module 330.

The first amplitude gain determining module 310 is configured to obtain a first audio frame, and determine a first amplitude gain corresponding to the first audio frame; a first amplitude gain smoothing module 320, configured to smooth the first amplitude gain according to a second amplitude gain corresponding to a second audio frame; and/or smoothing the first amplitude gain according to a third amplitude gain corresponding to a third audio frame; wherein the second audio frame is an audio frame adjacent to and preceding the first audio frame, and the third audio frame is an audio frame adjacent to and following the first audio frame; the first audio frame gain module 330 is configured to adjust an amplitude of the first audio frame based on the smoothed first amplitude gain, so as to obtain a target audio frame.

Optionally, the first amplitude gain smoothing module 320 is further configured to perform smoothing processing on a first gain value in the first amplitude gain according to a second amplitude gain corresponding to a second audio frame; the first gain value is an amplitude gain value of a first preset number of sampling points adjacent to the second audio frame in the first audio frame.

Optionally, the first amplitude gain smoothing module 320 is further configured to, when the first amplitude gain is greater than a second amplitude gain corresponding to the second audio frame, perform smoothing processing on a first gain value in the first amplitude gain according to the second amplitude gain.

Optionally, the first amplitude gain smoothing module 320 is further configured to perform smoothing processing on a second gain value in the first amplitude gain according to a third amplitude gain corresponding to a third audio frame; the second gain value is an amplitude gain value of a second preset number of sampling points adjacent to the third audio frame in the first audio frame.

Optionally, the first amplitude gain smoothing module 320 is further configured to, when the first amplitude gain is greater than a corresponding third amplitude gain of the third audio frame, perform smoothing processing on a second gain value in the first amplitude gain according to the third amplitude gain.
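One way the conditional boundary smoothing of module 320 could look is sketched below. The text does not give the smoothing formula, so this sketch assumes a linear cross-fade, treats the second and third amplitude gains as single values, applies the "greater than" condition per sample, and uses hypothetical defaults of 32 samples for the first and second preset numbers (`m_head`, `m_tail`). The function name is also hypothetical.

```python
from typing import List, Optional, Sequence


def smooth_boundaries(gain: Sequence[float],
                      prev_gain: Optional[float] = None,
                      next_gain: Optional[float] = None,
                      m_head: int = 32, m_tail: int = 32) -> List[float]:
    """Linear cross-fade of a frame's per-sample gain near its boundaries.

    Head: samples among the first m_head whose gain exceeds prev_gain are
    blended from prev_gain toward their own gain.
    Tail: samples among the last m_tail whose gain exceeds next_gain are
    blended from their own gain toward next_gain.
    If the two regions overlap, the ramps are applied in sequence.
    """
    g = list(gain)
    n = len(g)
    if prev_gain is not None and m_head > 0:
        m = min(m_head, n)
        for i in range(m):
            if g[i] > prev_gain:
                w = (i + 1) / (m + 1)  # weight of the frame's own gain, rising toward 1
                g[i] = (1.0 - w) * prev_gain + w * g[i]
    if next_gain is not None and m_tail > 0:
        m = min(m_tail, n)
        for j in range(m):
            idx = n - m + j
            if g[idx] > next_gain:
                w = (j + 1) / (m + 1)  # weight of next_gain, rising toward 1 at the frame end
                g[idx] = (1.0 - w) * g[idx] + w * next_gain
    return g


print(smooth_boundaries([2.0] * 4, prev_gain=1.0, m_head=3))  # [1.25, 1.5, 1.75, 2.0]
```

When the current frame's gain is lower than a neighbour's, this side is left unchanged; in that case the neighbour's own tail or head ramp is the one that bridges the junction.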

Optionally, the apparatus further comprises a first updating module, configured to smooth the amplitude gain value of each sampling point in the first amplitude gain according to a second amplitude gain corresponding to a second audio frame, so as to update the first amplitude gain.

Optionally, the apparatus further comprises: a second updating module, configured to update the first amplitude gain according to the first amplitude gain and a first maximum amplitude gain of the first audio frame, where the first maximum amplitude gain is a gain value determined according to a ratio of a preset maximum amplitude to the first maximum amplitude of the first audio frame.

Optionally, the second updating module is further configured to update the first amplitude gain according to a minimum value of the first amplitude gain and a first maximum amplitude gain of the first audio frame.

Optionally, the first amplitude gain determining module 310 is further configured to determine a first average amplitude and a first maximum amplitude of the first audio frame according to amplitudes of sampling points in the first audio frame; determining a first average amplitude gain according to the first average amplitude and a preset average amplitude; determining a first maximum amplitude gain according to the first maximum amplitude and a preset maximum amplitude; and determining a first amplitude gain corresponding to the first audio frame according to the first average amplitude gain and the first maximum amplitude gain.
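A sketch of the gain determination performed by module 310. The text states which quantities are combined but not how; this sketch assumes that the average amplitude is the mean absolute sample value, that each gain is a preset target amplitude divided by the measured amplitude, and that the two gains are combined by taking the smaller one so the peak limit is respected. The preset values (0.1 and 0.9 for float samples in [-1, 1]) and the name `frame_gain` are hypothetical.

```python
from typing import Sequence, Tuple


def frame_gain(frame: Sequence[float], preset_avg: float = 0.1,
               preset_max: float = 0.9,
               eps: float = 1e-9) -> Tuple[float, float, float]:
    """Return (first amplitude gain, first average amplitude gain, first maximum amplitude gain)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    abs_vals = [abs(s) for s in frame]
    avg_amp = sum(abs_vals) / len(abs_vals)  # first average amplitude (mean |x|, assumed)
    max_amp = max(abs_vals)                  # first maximum amplitude
    g_avg = preset_avg / max(avg_amp, eps)   # moves the average level toward preset_avg
    g_max = preset_max / max(max_amp, eps)   # largest gain keeping the peak at preset_max
    return min(g_avg, g_max), g_avg, g_max   # combination rule assumed: the smaller gain


g, g_avg, g_max = frame_gain([0.6, 0.0, 0.0, 0.0], preset_avg=0.3)
print(g, g_avg, g_max)  # the peak limit wins here: g equals g_max (about 1.5)
```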

Optionally, the apparatus further comprises a speech presence judging module, configured to determine the speech presence probability of the first audio frame according to a preset speech presence algorithm, and, when the speech presence probability reaches a preset speech threshold, to trigger the operation of determining the first amplitude gain corresponding to the first audio frame.

Optionally, the apparatus further comprises a preset processing module, configured to adjust the amplitude of the first audio frame based on a preset gain when the speech presence probability does not reach the preset speech threshold, so as to obtain a target audio frame.

According to the technical solution of this embodiment, a first audio frame is obtained and its first amplitude gain, that is, the amplitude gain value corresponding to each sampling point in the first audio frame, is determined. The first amplitude gain is smoothed according to the second amplitude gain corresponding to the second audio frame and/or the third amplitude gain corresponding to the third audio frame, so that the amplitude gain value changes gradually, and the amplitude of the first audio frame is adjusted based on the smoothed first amplitude gain to obtain the target audio frame. This solves the problems of audio discontinuity and poor audio quality caused by abrupt changes in the amplitude gain value at the junction of adjacent audio frames, eliminates gain discontinuity, and improves the subjective auditory quality of the audio.

The audio frame processing device provided by the embodiment of the invention can execute the audio frame processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. FIG. 7 illustrates a block diagram of an exemplary electronic device 40 suitable for use in implementing embodiments of the present invention. The electronic device 40 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 7, electronic device 40 is in the form of a general purpose computing device. The components of the electronic device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples various system components including the system memory 402 and the processing unit 401.

Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 40 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 404 and/or cache 405. The electronic device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7 and commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. System memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in the system memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which or some combination of which may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.

The electronic device 40 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with the electronic device 40, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may be performed through an I/O interface (input/output interface) 411. Also, the electronic device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 412. As shown, the network adapter 412 communicates with the other modules of the electronic device 40 over the bus 403. It should be appreciated that although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 401 executes various functional applications and data processing, for example, implementing an audio frame processing method provided by an embodiment of the present invention, by running a program stored in the system memory 402.

Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method of audio frame processing, the method comprising:

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. An audio frame processing method, comprising:

2. The method of claim 1, wherein smoothing the first amplitude gain according to a second amplitude gain corresponding to a second audio frame comprises:

smoothing a first gain value in the first amplitude gain according to the second amplitude gain corresponding to the second audio frame, wherein the first gain value is the amplitude gain value of a first preset number of sampling points in the first audio frame that are adjacent to the second audio frame.

3. The method of claim 2, wherein smoothing a first gain value of the first amplitude gain according to a second amplitude gain corresponding to a second audio frame comprises:

smoothing the first gain value in the first amplitude gain according to the second amplitude gain when the first amplitude gain is greater than the second amplitude gain corresponding to the second audio frame.

4. The method of claim 1, wherein smoothing the first amplitude gain according to a third amplitude gain corresponding to a third audio frame comprises:

smoothing a second gain value in the first amplitude gain according to the third amplitude gain corresponding to the third audio frame, wherein the second gain value is the amplitude gain value of a second preset number of sampling points in the first audio frame that are adjacent to the third audio frame.

5. The method of claim 4, wherein smoothing a second gain value of the first amplitude gain according to a third amplitude gain corresponding to a third audio frame comprises:

smoothing the second gain value in the first amplitude gain according to the third amplitude gain when the first amplitude gain is greater than the third amplitude gain corresponding to the third audio frame.

6. The method according to claim 1, wherein, after obtaining the first audio frame and determining the first amplitude gain corresponding to the first audio frame, and before smoothing the first amplitude gain according to the second amplitude gain corresponding to the second audio frame and/or according to the third amplitude gain corresponding to the third audio frame, the method further comprises:

smoothing the amplitude gain value of each sampling point in the first amplitude gain according to the second amplitude gain corresponding to the second audio frame, so as to update the first amplitude gain.

7. The method of claim 1, further comprising, after obtaining the first audio frame and determining the first amplitude gain corresponding to the first audio frame:

updating the first amplitude gain according to the first amplitude gain and a first maximum amplitude gain of the first audio frame, wherein the first maximum amplitude gain is a gain value determined according to a ratio of a preset maximum amplitude to the first maximum amplitude of the first audio frame.

8. The method of claim 7, wherein updating the first amplitude gain based on the first amplitude gain and a first maximum amplitude gain for the first audio frame comprises:

updating the first amplitude gain according to a minimum of the first amplitude gain and a first maximum amplitude gain of the first audio frame.

9. The method of claim 1, wherein determining the first amplitude gain for the first audio frame comprises:

determining a first average amplitude and a first maximum amplitude of the first audio frame according to the amplitude of each sampling point in the first audio frame;

determining a first average amplitude gain according to the first average amplitude and a preset average amplitude;

determining a first maximum amplitude gain according to the first maximum amplitude and a preset maximum amplitude;

determining the first amplitude gain corresponding to the first audio frame according to the first average amplitude gain and the first maximum amplitude gain.

10. The method of claim 1, further comprising, before determining the first amplitude gain corresponding to the first audio frame:

determining the voice existence probability of the first audio frame according to a preset voice existence algorithm;

triggering the operation of determining the first amplitude gain corresponding to the first audio frame when the speech presence probability reaches a preset speech threshold.

11. The method of claim 10, further comprising:

adjusting the amplitude of the first audio frame based on a preset gain to obtain a target audio frame when the speech presence probability does not reach the preset speech threshold.

12. An audio frame processing apparatus, comprising:

13. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a storage device for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio frame processing method of any one of claims 1 to 11.

14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out an audio frame processing method according to any one of claims 1 to 11.