CN112927715B - Audio processing method, equipment and computer readable storage medium

Info

Publication number: CN112927715B
Application number: CN202110217049.2A
Authority: CN
Inventors: 张超鹏
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2024-06-14
Anticipated expiration: 2041-02-26
Also published as: CN112927715A

Abstract

The application discloses an audio processing method, a device, and a medium. The method comprises: acquiring target audio to be processed; calculating a power spectrum of the target audio; if it is determined that a frequency value to be analyzed, at which the power variation amplitude is largest, exists in the power spectrum, judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable; and, if that power spectrum is stable, determining the frequency value to be analyzed as the cut-off frequency, so that the target audio can be processed based on the cut-off frequency. The application is based on the observation that the power spectrum changes most sharply near the cut-off frequency and is stable beyond it, and it therefore determines the cut-off frequency of the audio from the variation amplitude of the power spectrum. Compared with the prior art, which directly takes half of the audio sampling frequency as the cut-off frequency, the application obtains an accurate cut-off frequency, and processing the target audio based on this cut-off frequency improves the accuracy of audio processing.

Description

Audio processing method, equipment and computer readable storage medium

Technical Field

The present application relates to the field of audio processing technology, and more particularly, to an audio processing method, system, device, and computer readable storage medium.

Background

Currently, when processing audio such as music or human voice, it is sometimes necessary to know the cut-off frequency of the audio, that is, the upper frequency limit of the effective band that contains sound. The cut-off frequency may be determined by the Nyquist criterion, under which it is taken to be half of the audio sampling frequency. However, owing to limitations of the sampling method, the equipment, and the like, the audio actually obtained may not satisfy the Nyquist criterion, so the cut-off frequency determined in this way may not match the actual situation, which ultimately degrades the accuracy of audio processing.

In summary, the inventor found that the prior art has at least the problem of poor audio processing accuracy.

Disclosure of Invention

In view of the foregoing, it is an object of the present invention to provide an audio processing method, apparatus, device, and computer readable storage medium, which can improve the accuracy of audio processing. The specific scheme is as follows:

In a first aspect, the present application discloses an audio processing method, including:

acquiring target audio to be processed;

calculating a power spectrum of the target audio;

if it is determined that a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable;

and if the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable, determining the frequency value to be analyzed as the cut-off frequency, so as to process the target audio based on the cut-off frequency.

Optionally, the method further comprises:

if it is determined that the frequency value to be analyzed does not exist in the power spectrum, determining half of the sampling frequency of the target audio as the cut-off frequency.

Optionally, the method further comprises:

if the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is not stable, determining half of the sampling frequency of the target audio as the cut-off frequency.

Optionally, determining whether the frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum includes:

calculating a difference value of the power spectrum with respect to frequency;

judging whether there exists a frequency value to be analyzed that corresponds to the smallest difference value and for which the difference values corresponding to the frequency values on its left and right sides are both greater than the difference value corresponding to the frequency value to be analyzed;

if no such frequency value exists, determining that the frequency value to be analyzed does not exist in the power spectrum;

and if such a frequency value exists, determining that the frequency value to be analyzed exists in the power spectrum.

Optionally, judging whether there exists a frequency value to be analyzed that corresponds to the smallest difference value and for which the difference values corresponding to the frequency values on its left and right sides are both greater than the difference value corresponding to the frequency value to be analyzed includes:

judging whether a minimum trough point exists in the relation between the difference value and frequency;

if the minimum trough point does not exist, judging that the frequency value to be analyzed does not exist;

and if the minimum trough point exists, determining the frequency corresponding to the minimum trough point as the frequency value to be analyzed.

Optionally, judging whether the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable includes:

judging whether, among the frequency values greater than the frequency value to be analyzed, there is a frequency value at which the power spectrum increases; if there is no such frequency value, selecting a frequency value to be operated from the frequency values greater than the frequency value to be analyzed; if there is such a frequency value, determining the frequency value at which the power spectrum increases as the frequency value to be operated;

calculating the standard deviation of the power spectrum corresponding to the frequency values greater than or equal to the frequency value to be operated;

judging whether the standard deviation is smaller than a preset threshold;

if the standard deviation is smaller than the preset threshold, determining that the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable;

and if the standard deviation is greater than or equal to the preset threshold, determining that the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is not stable.

Optionally, calculating the difference value of the power spectrum with respect to frequency includes:

performing silence suppression on the power spectrum to obtain a processed power spectrum;

calculating the difference value of the processed power spectrum with respect to frequency.

Optionally, performing silence suppression on the power spectrum to obtain the processed power spectrum includes:

smoothing the power spectrum to obtain a smoothed power spectrum;

and performing silence suppression on the smoothed power spectrum to obtain the processed power spectrum.

Optionally, calculating the power spectrum of the target audio includes:

performing silence suppression on the target audio based on the time-domain signal energy of the target audio to obtain processed audio;

calculating the power spectrum of the processed audio.

Optionally, calculating the difference value of the power spectrum with respect to frequency includes:

calculating an average power spectrum of each frame of audio based on the power spectrum;

calculating the difference value, with respect to frequency, of the average power spectrum of all frames of the audio.

Optionally, calculating the average power spectrum of each frame of audio includes:

for each frame of the audio, acquiring the power spectrum over a preset length at a preset frequency of the audio, and taking the average value of the acquired power spectrum as the average power spectrum of that frame.

Optionally, calculating the power spectrum of the target audio includes:

calculating the logarithmic power spectrum of the target audio.

In a second aspect, the present application discloses an audio processing apparatus comprising:

an audio acquisition module, used for acquiring target audio to be processed;

a power spectrum calculation module, used for calculating the power spectrum of the target audio;

a cut-off frequency judging module, used for, if it is determined that a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable; and, if that power spectrum is stable, determining the frequency value to be analyzed as the cut-off frequency, so as to process the target audio based on the cut-off frequency.

In a third aspect, the present application discloses an electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the previously disclosed audio processing method.

In a fourth aspect, the present application discloses a computer readable storage medium storing a computer program which, when executed by a processor, implements the previously disclosed audio processing method.

In the present application, after the target audio to be processed is acquired, the power spectrum of the target audio is calculated. If the cut-off frequency is not half of the audio sampling frequency, the variation amplitude of the power spectrum is largest near the cut-off frequency, so it is necessary to determine whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum. If such a frequency value exists, it must be further verified that it is indeed the cut-off frequency. Because the power spectrum beyond the cut-off frequency is in a stable state, this can be done by judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable. If it is stable, the frequency value to be analyzed is determined as the cut-off frequency, so that an accurate cut-off frequency is obtained, and processing the target audio based on this cut-off frequency improves the accuracy of audio processing.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a system framework to which the audio processing scheme provided by the present application is applicable;

FIG. 2 is a flowchart of an audio processing method according to an embodiment of the present application;

FIG. 3 is a flowchart of an audio processing method according to an embodiment of the present application;

FIG. 4 is a diagram of a power spectrum and its difference values;

FIG. 5 is a flowchart of an audio processing method according to an embodiment of the present application;

FIG. 6 is a flowchart of a power spectrum stability judging method according to the present application;

FIG. 7 is a schematic illustration of a smoothing process of a power spectrum;

FIG. 8 is a flowchart of silence suppression;

FIG. 9 is a graph of frequency and power spectrum information for a target audio;

FIG. 10 is a graph of frequency and power spectrum information for a target audio;

FIG. 11 is a diagram of a power spectrum and its difference values;

FIG. 12 is a schematic structural diagram of an audio processing device according to the present application;

FIG. 13 is a block diagram of an electronic device according to the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Currently, when processing audio such as music or human voice, it is sometimes necessary to know the cut-off frequency of the audio, that is, the upper frequency limit of the effective band that contains sound. The cut-off frequency may be determined by the Nyquist criterion, under which it is taken to be half of the audio sampling frequency. However, owing to limitations of the sampling method, the equipment, and the like, the audio actually obtained may not satisfy the Nyquist criterion, so the cut-off frequency determined in this way may not match the actual situation, which ultimately degrades the accuracy of audio processing. To overcome this technical problem, the present application provides an audio processing scheme that can improve the accuracy of audio processing.

In the audio processing scheme of the present application, the system framework adopted may be as shown in FIG. 1 and may include a background server 01 and a number of clients 02 that establish a communication connection with the background server 01.

In the present application, the background server 01 is used for executing the audio processing steps, which include: obtaining target audio to be processed; calculating a power spectrum of the target audio; if it is determined that a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable; and, if that power spectrum is stable, determining the frequency value to be analyzed as the cut-off frequency, so as to process the target audio based on the cut-off frequency.

Furthermore, the background server 01 may also be provided with an audio database, a power spectrum database, a database of frequency values to be analyzed, a cut-off frequency database, and the like. The audio database is used for storing various target audio, such as music audio data and voice audio data. The power spectrum database can be used for storing the power spectrum data calculated from the target audio. The database of frequency values to be analyzed can be used for storing the frequency values to be analyzed, that is, those for which it must be judged whether they are the cut-off frequency. The cut-off frequency database can be used for storing the cut-off frequency of the target audio.

Of course, in the present application the target audio database may also be set up in a third-party service server, which can be dedicated to collecting uploaded target audio data. In this way, when the background server 01 needs to use the target audio, it can obtain the corresponding target audio by sending a target audio call request to the service server. In the present application, the background server 01 can respond to audio processing requests from one or more clients 02.

Fig. 2 is a flowchart of an audio processing method according to an embodiment of the present application. Referring to fig. 2, the audio processing method includes:

Step S11: obtaining target audio to be processed.

In this embodiment, the target audio to be processed is audio whose cut-off frequency needs to be determined. Because the cut-off frequency is affected by the audio acquisition mode and the audio acquisition device, it may not be half of the audio sampling frequency. If such audio is processed directly, for example by time-frequency masking, the signal power (or amplitude) between the cut-off frequency and half of the sampling frequency is 0, and an abnormality occurs when this value is used as a divisor. Even if the abnormality is handled during processing, for example by substituting a very small value, the substituted value no longer reflects the real spectrum information, so the processing of the target audio is inaccurate. Therefore, to improve processing accuracy, the cut-off frequency of the target audio needs to be determined accurately by the method provided in the present application, so that the target audio can be processed accurately based on the cut-off frequency.
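The divisor problem described above can be illustrated with a minimal sketch. It assumes a hypothetical ratio-style mask (target power divided by mixture power); the function name `ratio_mask`, the `eps` floor, and the toy values are illustrative assumptions and do not come from the application.

```python
def ratio_mask(target_power, mixture_power, eps=None):
    """Per-bin ratio target/mixture; eps (if given) floors the divisor."""
    mask = []
    for t, m in zip(target_power, mixture_power):
        divisor = m if eps is None else max(m, eps)
        mask.append(t / divisor)
    return mask


# Bins 0-2 lie below the cut-off; bins 3-4 lie above it and carry no energy.
target = [4.0, 2.0, 1.0, 0.0, 0.0]
mixture = [8.0, 4.0, 4.0, 0.0, 0.0]

try:
    ratio_mask(target, mixture)
    failed = False
except ZeroDivisionError:
    failed = True  # the empty band produces a division by zero

# Flooring the divisor avoids the exception, but the values produced for the
# empty band are placeholders rather than real spectral information.
floored = ratio_mask(target, mixture, eps=1e-12)

# With a known cut-off bin (assumed to be 3 here), processing can be limited
# to the effective band instead.
cutoff_bin = 3
valid = ratio_mask(target[:cutoff_bin], mixture[:cutoff_bin])
```

Restricting the computation to bins below the cut-off is one possible use of the cut-off frequency; the application itself leaves the downstream processing open.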

It will be appreciated that the type of the target audio may be determined according to the actual application scenario, and the present application is not particularly limited herein.

Step S12: a power spectrum of the target audio is calculated.

In this embodiment, since the band above the cut-off frequency contains no audio, the power spectrum changes most sharply across the cut-off frequency. Therefore, when determining the cut-off frequency, the power spectrum of the target audio may be calculated first, and the cut-off frequency then determined from the power spectrum.

It can be understood that the method for calculating the power spectrum of the target audio may be chosen according to actual needs. For example, a short-time Fourier transform may be applied to the target audio to obtain the corresponding power spectrum; this involves framing, windowing, and a discrete Fourier transform of the target audio, followed by calculation of the power spectrum. The frame shift, frame length, and other parameters of the short-time Fourier transform may be determined according to the target audio or the processing needs; for example, the frame shift may be 20 ms and the frame length 30 ms. In the present application, the whole short-time Fourier transform process may be as follows:

framing the target audio by the framing formula $x_n(i) = x(n \cdot M + i)$, where $n$ denotes the $n$-th frame, $M$ denotes the frame shift, $i$ denotes the index of a sample within the $n$-th frame and takes the values $0, 1, 2, \ldots, L-1$, and $L$ denotes the frame length;

windowing each frame by the windowing formula $xw_n(i) = x_n(i) \cdot w(i)$, where $xw_n(i)$ denotes the windowed result and $w(i)$ denotes the window function. The type of window function may be chosen according to actual needs; for example, a Hanning window may be used in the present application, with the expression:

$$w(i) = 0.5\left(1 - \cos\frac{2\pi i}{L-1}\right), \quad i = 0, 1, \ldots, L-1;$$

transforming the windowed result by the discrete Fourier transform formula, which may be as follows:

$$X(n, k) = \sum_{i=0}^{N-1} xw_n(i)\, e^{-j 2\pi k i / N};$$

where $X(n, k)$ denotes the Fourier transform result of the $n$-th frame; $k$ denotes the frequency point; and $N$ denotes the number of points of the Fourier transform. When the frame length $L$ is smaller than $N$, $xw_n(i)$ needs to be zero-padded; when $L$ is larger than $N$, $xw_n(i)$ needs to be truncated, that is, $N$ points of $xw_n(i)$ are taken for the Fourier transform;

calculating the power spectrum by the power spectrum formula $P(n, k) = |X(n, k)|^2$, where $P(n, k)$ denotes the power spectrum of the $k$-th frequency point of the $n$-th frame.
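The framing, windowing, transform, and power steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, under stated assumptions: the symmetric form of the Hann window is used, only the non-negative frequency points 0 to N/2 are kept (sufficient for a real-valued signal), and the direct O(N²) DFT is chosen for readability rather than speed. The function names and the toy signal are hypothetical.

```python
import cmath
import math


def hann_window(length):
    """Symmetric Hann window w(i) = 0.5 * (1 - cos(2*pi*i / (L - 1)))."""
    if length == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (length - 1)))
            for i in range(length)]


def stft_power_spectrum(x, frame_len, frame_shift, n_fft):
    """Return P[n][k] = |X(n, k)|^2 for frequency points k = 0 .. n_fft // 2."""
    window = hann_window(frame_len)
    power = []
    n = 0
    while n * frame_shift + frame_len <= len(x):
        # Framing and windowing: xw_n(i) = x(n*M + i) * w(i)
        xw = [x[n * frame_shift + i] * window[i] for i in range(frame_len)]
        # Zero-pad when L < N, truncate to N points when L > N
        xw = (xw + [0.0] * max(0, n_fft - frame_len))[:n_fft]
        frame_power = []
        for k in range(n_fft // 2 + 1):
            # Direct DFT: X(n, k) = sum_i xw_n(i) * exp(-j*2*pi*k*i/N)
            X = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i in range(n_fft))
            frame_power.append(abs(X) ** 2)
        power.append(frame_power)
        n += 1
    return power


# Toy input: a sinusoid that falls exactly on frequency point 4 of a 16-point DFT.
signal = [math.sin(2.0 * math.pi * 4 * t / 16) for t in range(64)]
P = stft_power_spectrum(signal, frame_len=16, frame_shift=8, n_fft=16)
peak_bin = max(range(len(P[0])), key=lambda k: P[0][k])
```

In practice a fast Fourier transform routine would normally replace the inner loop; the frame length and frame shift are given in samples here, so values such as 30 ms and 20 ms would first be multiplied by the sampling frequency.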

It can be understood that, when calculating the power spectrum of the target audio, the logarithmic power spectrum may be calculated instead for convenience of display or calculation. For example, if $P_{\log}(n, k)$ denotes the logarithmic power spectrum of the $k$-th frequency point of the $n$-th frame, the logarithmic power spectrum may be calculated as follows:

$$P_{\log}(n, k) = 10 \log_{10} P(n, k).$$
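A minimal sketch of the logarithmic power spectrum follows, assuming the common decibel convention of ten times the base-10 logarithm. The `floor` parameter is an added assumption, not part of the application: it guards against taking the logarithm of zero in bins that carry no energy.

```python
import math


def log_power_spectrum(power, floor=1e-12):
    """Convert P[n][k] to decibels; values below `floor` are clamped first."""
    return [[10.0 * math.log10(max(p, floor)) for p in frame]
            for frame in power]


# One frame with three frequency points, the last of which is empty.
p_log = log_power_spectrum([[1.0, 100.0, 0.0]])
```

The floor determines the level reported for empty bins, so it also sets how deep the drop at the cut-off appears in the logarithmic spectrum.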

Step S13: if it is determined that a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, executing step S14.

In this embodiment, after the power spectrum of the target audio is calculated, if it is determined that a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, it must be further judged whether that frequency value is the cut-off frequency, because such a frequency value may exist in the power spectrum when the cut-off frequency is not half of the sampling frequency. Accordingly, if it is determined that the frequency value to be analyzed does not exist in the power spectrum, half of the sampling frequency of the target audio may be directly determined as the cut-off frequency.

Step S14: judging whether a power spectrum corresponding to a frequency value larger than the frequency value to be analyzed is stable or not; if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable, step S15 is executed.

Step S15: the frequency value to be analyzed is determined as a cut-off frequency to process the target audio based on the cut-off frequency.

In this embodiment, when the frequency value to be analyzed exists in the power spectrum, analysis of the power spectrum shows that the stability of the power spectrum above the frequency value to be analyzed differs depending on whether or not the cut-off frequency is half of the sampling frequency. Specifically, when the cut-off frequency is half of the sampling frequency, the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is not stable; when the cut-off frequency is not half of the sampling frequency, that power spectrum is stable. Therefore, to determine the cut-off frequency accurately, it is also necessary to judge whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable. If it is stable, the frequency value to be analyzed can be determined as the cut-off frequency, and the target audio can then be processed based on the cut-off frequency; for example, when the cut-off frequency is not half of the sampling frequency, the target audio may be downsampled to save audio storage space or transmission bytes. Accordingly, if that power spectrum is not stable, half of the sampling frequency of the target audio may be determined as the cut-off frequency.

In the present application, after the target audio to be processed is acquired, the power spectrum of the target audio is calculated. When the cut-off frequency is not half of the audio sampling frequency, the variation amplitude of the power spectrum is largest near the cut-off frequency, so it is necessary to determine whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum. If such a frequency value exists, it must be further verified that it is indeed the cut-off frequency. Because the power spectrum beyond the cut-off frequency is in a stable state, this can be done by judging whether the power spectrum corresponding to frequency values greater than the frequency value to be analyzed is stable. If it is stable, the frequency value to be analyzed is determined as the cut-off frequency, so that an accurate cut-off frequency is obtained, and processing the target audio based on this cut-off frequency improves the accuracy of audio processing.

Fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application. Referring to fig. 3, the audio processing method includes:

Step S21: obtaining target audio to be processed.

Step S22: calculating a power spectrum of the target audio.

Step S23: calculating a difference value of the power spectrum with respect to frequency.

Step S24: judging whether there exists a frequency value to be analyzed that corresponds to the smallest difference value and for which the difference values corresponding to the frequency values on its left and right sides are both greater than the difference value corresponding to the frequency value to be analyzed; if the frequency value to be analyzed does not exist, executing step S25; if the frequency value to be analyzed exists, executing step S26.

In this embodiment, the frequency value to be analyzed is the frequency value at which the power spectrum changes most sharply, and this is reflected in the difference values of the power spectrum. As shown in FIG. 4, where the abscissa represents frequency and the ordinate represents power, the difference values corresponding to the frequency values on the left and right sides of the frequency value to be analyzed are both greater than the difference value corresponding to the frequency value to be analyzed. Therefore, to determine whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, the difference value of the power spectrum with respect to frequency can be calculated, and it can then be judged whether there exists a frequency value that corresponds to the smallest difference value and whose left and right neighbours have larger difference values. If no such frequency value exists, it is determined that the frequency value to be analyzed does not exist in the power spectrum; if it exists, it is determined that the frequency value to be analyzed exists in the power spectrum. In this way, whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum can be judged quickly.
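The check in step S24 can be sketched as below, assuming the power spectrum is given as a list of per-frequency-point values (for example, logarithmic power). The function name `find_candidate_bin` is hypothetical, and treating a minimum that sits at either end of the difference sequence as "does not exist" is an assumption of this sketch, since such a point has no neighbour on one side.

```python
def find_candidate_bin(p_log):
    """Return the index of the smallest first-order difference if both of its
    neighbouring differences are larger; otherwise return None."""
    diff = [p_log[k + 1] - p_log[k] for k in range(len(p_log) - 1)]
    if len(diff) < 3:
        return None
    k_min = min(range(len(diff)), key=lambda k: diff[k])
    if k_min == 0 or k_min == len(diff) - 1:
        return None  # no neighbour on one side (assumed to mean "not found")
    if diff[k_min - 1] > diff[k_min] and diff[k_min + 1] > diff[k_min]:
        return k_min
    return None


# Toy spectrum with a steep drop after frequency point 2, then a flat floor.
with_drop = [-20.0, -21.0, -22.0, -60.0, -90.0, -90.5, -90.2, -90.4]
candidate = find_candidate_bin(with_drop)

# Toy spectrum that decays evenly: no single point stands out.
even_decay = [-20.0, -21.0, -22.0, -23.0, -24.0]
no_candidate = find_candidate_bin(even_decay)
```

The smallest difference is the most negative one, which is why a sharp fall in power at the cut-off shows up as the minimum.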

Step S25: half of the target audio sampling frequency is determined as the cut-off frequency.

Step S26: judging whether a power spectrum corresponding to a frequency value larger than the frequency value to be analyzed is stable or not; if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is not stable, executing the step S27; if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable, step S28 is executed.

Step S27: half of the target audio sampling frequency is determined as the cut-off frequency.

Step S28: the frequency value to be analyzed is determined as a cut-off frequency to process the target audio based on the cut-off frequency.
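The branching of steps S24 to S28 can be summarised in a short sketch. The function name and arguments are hypothetical; `candidate_hz` stands for the frequency value to be analyzed (or `None` when it does not exist), and `tail_stable` for the outcome of the stability judgment in step S26.

```python
def choose_cutoff(candidate_hz, tail_stable, sample_rate_hz):
    """Return the cut-off frequency following the S24-S28 decision flow."""
    half_rate = sample_rate_hz / 2.0
    if candidate_hz is None:
        return half_rate      # S24 -> S25: no frequency value to be analyzed
    if not tail_stable:
        return half_rate      # S26 -> S27: spectrum above it is not stable
    return candidate_hz       # S26 -> S28: candidate accepted as cut-off


no_candidate = choose_cutoff(None, False, 44100)
unstable_tail = choose_cutoff(16000.0, False, 44100)
accepted = choose_cutoff(16000.0, True, 44100)
```

Both fallback branches return the same value, half of the sampling frequency, which matches steps S25 and S27.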

Fig. 5 is a flowchart of an audio processing method according to an embodiment of the present application. Referring to fig. 5, the audio processing method includes:

Step S31: obtaining target audio to be processed.

Step S32: calculating a power spectrum of the target audio.

Step S33: calculating a difference value of the power spectrum with respect to frequency.

Step S34: judging whether a minimum trough point exists in the relation between the difference value and frequency; if the minimum trough point does not exist, executing step S35; if the minimum trough point exists, executing step S36.

In this embodiment, analysis of FIG. 4 shows that, once the relation between the difference value and frequency is obtained, the frequency value to be analyzed can be determined directly from that relation, because the difference value corresponding to the frequency value to be analyzed appears as the minimum trough point of the relation. Therefore, when judging whether there exists a frequency value to be analyzed that corresponds to the smallest difference value and whose left and right neighbours have larger difference values, it can be judged whether a minimum trough point exists in the relation between the difference value and frequency, so that the existence of the frequency value to be analyzed is determined quickly. If the minimum trough point does not exist, it is judged that the frequency value to be analyzed does not exist; if the minimum trough point exists, the frequency corresponding to the minimum trough point is determined as the frequency value to be analyzed.

It can be understood that, in practical application, the minimum trough point, and hence the frequency value to be analyzed, can be determined quickly and conveniently by a minimum trough point formula such as the following:

$$Pd_{\log}(k) = P_{\log}(k+1) - P_{\log}(k);$$

$$k_{\min} = \arg\min_{k} Pd_{\log}(k);$$

where $\arg\min$ returns the value of $k$ at which the function attains its minimum, and $k_{\min}$ denotes the frequency point corresponding to the frequency value to be analyzed. The frequency point can be converted to the corresponding frequency by the formula $f = k_{\min} \cdot f_s / N$, where $f_s$ denotes the sampling frequency.
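As a minimal sketch of this step: the first-order difference is formed, the frequency point with the smallest difference is located, and that point is converted to hertz. The conversion uses the standard DFT relation (frequency point times sampling frequency divided by the number of DFT points), which is assumed here; the function name and the toy values are hypothetical.

```python
def min_trough_frequency(p_log, fs, n_fft):
    """Return (k_min, frequency in Hz) for the smallest first-order difference."""
    # Pd_log(k) = P_log(k + 1) - P_log(k)
    pd_log = [p_log[k + 1] - p_log[k] for k in range(len(p_log) - 1)]
    # k_min = argmin over k of Pd_log(k)
    k_min = min(range(len(pd_log)), key=lambda k: pd_log[k])
    return k_min, k_min * fs / n_fft


# Nine frequency points of a 16-point DFT at a 16 kHz sampling frequency,
# with a steep drop after frequency point 4.
spectrum = [-20.0, -21.0, -22.0, -23.0, -24.0, -70.0, -90.0, -90.0, -90.0]
k_min, freq_hz = min_trough_frequency(spectrum, fs=16000, n_fft=16)
```

Whether the cut-off is reported at the point where the drop starts or one point later is a matter of convention; this sketch keeps the index of the difference itself.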

Step S35: half of the target audio sampling frequency is determined as the cut-off frequency.

Step S36: judging whether a power spectrum corresponding to a frequency value larger than the frequency value to be analyzed is stable or not; if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is not stable, executing step S37; if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable, step S38 is performed.

Step S37: half of the target audio sampling frequency is determined as the cut-off frequency.

Step S38: the frequency value to be analyzed is determined as a cut-off frequency to process the target audio based on the cut-off frequency.

Fig. 6 is a flowchart of a power spectrum stability judging method in the present application. Referring to fig. 6, the process of judging whether the power spectrum is stationary in the audio processing method may include the steps of:

Step S41: judging whether, among the frequency values in the power spectrum that are greater than the frequency value to be analyzed, there is a frequency value at which the power spectrum increases; if there is no such frequency value, step S42 is performed; if there is such a frequency value, step S43 is performed.

Step S42: selecting a frequency value to be operated from the frequency values greater than the frequency value to be analyzed, and executing step S44.

Step S43: the frequency value corresponding to the power spectrum increase is determined as the frequency value to be operated, and step S44 is performed.

In this embodiment, in the process of judging whether the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable, a frequency value at which the power spectrum increases has an obvious influence on the stability of the power spectrum. Therefore, to judge quickly whether the power spectrum is stable, such a frequency value can be determined first and the stability judged from it. That is, it is judged whether, among the frequency values in the power spectrum greater than the frequency value to be analyzed, there is a frequency value at which the power spectrum increases; if there is none, the frequency value to be operated is selected from the frequency values greater than the frequency value to be analyzed; if there is one, it is determined as the frequency value to be operated.

It can be understood that, in the case where there is no frequency value at which the power spectrum increases, the frequency value one frequency point above the frequency value to be analyzed may be taken as the frequency value to be operated, etc.; in the case where there is a frequency value at which the power spectrum increases, the one closest to the frequency value to be analyzed may be determined as the frequency value to be operated, etc.; the present application is not particularly limited herein.

In practical application, in order to facilitate determining the frequency value corresponding to the power spectrum increase, the frequency value corresponding to the power spectrum increase may be determined by the following formula:

wherein k_v represents the frequency point corresponding to the frequency value at which the power value increases.

Step S44: calculating a standard deviation value of the power spectrum corresponding to the frequency values greater than or equal to the frequency value to be operated.

Step S45: judging whether the standard deviation value is smaller than a preset threshold value; if the standard deviation value is smaller than the preset threshold value, executing step S46; if the standard deviation value is greater than or equal to the preset threshold value, executing step S47.

Step S46: determining that the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable.

Step S47: determining that the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is unstable.

In this embodiment, after determining the frequency value to be operated, in order to quickly determine whether the power spectrum is stable according to the frequency value to be operated, a standard deviation value of the power spectrum corresponding to the frequency value greater than or equal to the frequency value to be operated may be calculated; judging whether the standard deviation value is smaller than a preset threshold value or not; if the standard deviation value is smaller than the preset threshold value, determining that the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable; if the standard deviation value is greater than or equal to a preset threshold value, determining that the power spectrum corresponding to the frequency value greater than the frequency value to be analyzed is unstable.

It can be understood that the magnitude of the preset threshold value can be determined according to actual needs, for example, the preset threshold value can be 0.1; of course, in practical application, other parameters that reflect whether the power spectrum is stable, such as the variance of the power spectrum corresponding to the frequency values greater than or equal to the frequency value to be operated, can also be calculated, and the application is not specifically limited herein.
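The stability judgment of steps S41 to S47, together with the fallback of steps S35 to S38, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the spectrum is a list of log power values indexed by frequency point, a "rise" at point k is taken to mean P_log(k+1) > P_log(k), the population standard deviation is used, and all function names are hypothetical.

```python
from statistics import pstdev


def pick_operating_bin(p_log, k_min):
    """Steps S41 to S43: choose the point from which stability is measured."""
    for k in range(k_min + 1, len(p_log) - 1):
        if p_log[k + 1] > p_log[k]:   # assumed meaning of "power spectrum increases"
            return k                  # rising point closest to k_min
    return k_min + 1                  # no rise found: one point above k_min


def tail_is_stable(p_log, k_min, threshold=0.1):
    """Steps S44 to S47: a small standard deviation means a stable tail."""
    k_v = pick_operating_bin(p_log, k_min)
    tail = p_log[k_v:]
    return bool(tail) and pstdev(tail) < threshold


def choose_cutoff_hz(candidate_hz, stable, fs):
    """Steps S35 to S38: use fs / 2 unless a stable candidate exists."""
    if candidate_hz is None or not stable:
        return fs / 2
    return candidate_hz
```

The default threshold of 0.1 follows the example value given in the text; replacing pstdev with a variance would correspond to the alternative parameter mentioned there.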

In the audio processing method provided by the disclosure, the target audio may contain silence or other audio information that is unnecessary for audio processing. Therefore, to facilitate subsequent audio processing, silence suppression (VAD) may be performed on the power spectrum in the process of calculating the differential value of the power spectrum with frequency, so as to obtain a processed power spectrum; the differential value of the processed power spectrum with frequency is then calculated.

It can be understood that the manner of silence suppression for the power spectrum can be determined according to actual needs. For example, a minimum power value and a maximum power value of the effective audio can be set for silence suppression, the part of the power spectrum between the minimum power value and the maximum power value is kept, and the rest of the power spectrum is removed, so as to obtain the effective audio and its power spectrum information. In practical application, a length preset value and an adjacent distance can also be set for the effective audio: an effective audio whose length does not reach the length preset value can be removed, and two adjacent effective audio segments whose interval is smaller than the adjacent distance can be spliced into one effective audio, and the like.

In the audio processing method provided by the disclosure, the calculated power spectrum may contain noise information, and processing the power spectrum directly would affect the processing accuracy. Therefore, to ensure the processing accuracy, in the process of performing silence suppression on the power spectrum to obtain the processed power spectrum, the power spectrum can first be smoothed to obtain a smoothed power spectrum, and silence suppression is then performed on the smoothed power spectrum to obtain the processed power spectrum.

It can be understood that in practical application, the power spectrum may be smoothed by convolution operation, and the duration of the frame signal applied in this process may be determined according to practical needs, for example, the duration is 0.400s, etc., and then, on the basis of the calculated log power spectrum, the process of smoothing the log power spectrum may be as follows:

Determining the number of frame points of the smoothing operation, M = 0.400/0.02 = 20, according to the frame shift and the duration of the frame signal;

Calculating B, where ⌈·⌉ represents rounding up;

Determining a smoothing kernel function S_b(m) of length M+1 according to M and B;

Normalizing S_b(m);

Applying the normalized S_b(m) to smooth the log power spectrum;

wherein Ps_log(n) represents the smoothed log power spectrum, and the smoothing process of the power spectrum can be shown in fig. 7;
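The smoothing steps above (choose M, build a kernel of length M+1, normalize it, convolve) can be illustrated with the following Python sketch. The kernel shape is not recoverable from the text, so a triangular kernel is used purely as a stand-in; the edge handling (repeating the first and last values) and all names are likewise assumptions.

```python
def normalized_kernel(m_points):
    """Build an illustrative kernel of length m_points + 1 that sums to 1."""
    half = m_points / 2
    # Triangular shape: an assumption, not the kernel S_b(m) of the application.
    raw = [1.0 - abs(m - half) / (half + 1) for m in range(m_points + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def smooth(values, kernel):
    """Convolve values with kernel, keeping the input length."""
    offset = len(kernel) // 2
    last = len(values) - 1
    out = []
    for n in range(len(values)):
        acc = 0.0
        for m, w in enumerate(kernel):
            # Clamp indices so the edges reuse the first/last value.
            idx = min(max(n + m - offset, 0), last)
            acc += w * values[idx]
        out.append(acc)
    return out


# M = 0.400 s / 0.02 s = 20 frame points, as in the example above.
M = round(0.400 / 0.02)
kernel = normalized_kernel(M)
```

Because the kernel is normalized, a constant sequence passes through unchanged, while an isolated spike is spread over neighbouring frames.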

In this process, the logarithm of the mean value of the power spectrum over the frequency points can be used as the power value of the current frame, and the formula for smoothing the log power spectrum is adapted accordingly.

Accordingly, in the process of performing silence suppression on the smoothed power spectrum to obtain the processed power spectrum, the minimum power value thrL of the effective audio may be set empirically, for example thrL = 30 dB, and half of the average value of the logarithmic power values over all T frames may be set as the maximum power value of the effective audio, where T represents the total number of frames of the current signal. With n representing the frame index, idx the index of the detected valid audio segment, ns the start frame of the current valid audio segment, ne the end frame of the current valid audio segment, n_last the length preset value, n_pre the adjacent distance, VAD(idx, 0) the start frame position of the idx-th valid audio, and VAD(idx, 1) the end frame index of that valid audio, the valid audio determination process may be as shown in fig. 8 and includes the following steps:

Step S101: initializing n, idx, ns and ne to 0.

Step S102: judging whether the power value of frame n is greater than the maximum power value and whether the value of ns is 0; if so, executing step S103; if not, executing step S104.

Step S104: judging whether the power value of frame n is smaller than the minimum power value; if not, executing step S103; if so, executing step S105.

Step S105: judging whether n is greater than the sum of ns and n_last; if not, executing step S106; if so, executing step S107.

Step S106: setting the value of ns to 0, and executing step S103.

Step S107: assigning the larger of 0 and (ns - n_pre) to ns, and executing step S108.

Step S108: judging whether the value of ne is greater than 0 and ns is less than ne; if so, executing step S109; if not, executing step S110.

Step S109: setting the end frame index of the valid audio to n-1, and executing step S111.

Step S110: adding 1 to the value of idx, setting the start frame position of the valid audio to ns, setting the end frame index of the valid audio to n-1, and executing step S111.

Step S111: setting ne to n-1, setting ns to 0, and executing step S103.

Step S103: assigning the value of n to ns, adding 1 to the value of n, and executing step S112.

Step S112: judging whether n is greater than or equal to T; if not, returning to step S102; if so, executing step S113.

Step S113: judging whether the value of ns is not equal to 0; if so, adding 1 to the value of idx, outputting ns as the start frame position of the valid audio, outputting T-1 as the end frame index of the valid audio, and ending; if not, ending directly.
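The following Python sketch shows a simplified two-threshold segment detector in the spirit of the steps above. It is not a transcription of fig. 8: it uses None instead of 0 as the "no open segment" marker, stores segments in a list instead of VAD(idx, 0) and VAD(idx, 1), and merges a segment with the previous one when their frame ranges touch or overlap. The function and parameter names are hypothetical.

```python
def detect_segments(frame_power, thr_high, thr_low, n_last, n_pre):
    """Return (start, end) frame pairs of valid audio, ends inclusive."""
    segments = []
    start = None                          # start frame of the open segment
    for n, power in enumerate(frame_power):
        if start is None:
            if power > thr_high:          # segment opens above the upper threshold
                start = n
        elif power < thr_low:             # segment closes below the lower threshold
            if n - start > n_last:        # keep only sufficiently long segments
                begin = max(0, start - n_pre)
                end = n - 1
                if segments and begin <= segments[-1][1]:
                    segments[-1][1] = end         # splice with previous segment
                else:
                    segments.append([begin, end])
            start = None                  # short segments are discarded
    if start is not None:                 # segment still open at the last frame
        segments.append([start, len(frame_power) - 1])
    return [tuple(s) for s in segments]
```

With thr_low fixed empirically (for example 30 dB) and thr_high derived from the mean log power as described above, the returned pairs play the role of the start and end frame indices of each valid audio segment.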

It should be noted that, in the audio processing method provided by the present disclosure, silence suppression may also be performed on the target audio before the power spectrum is calculated; that is, in the process of calculating the power spectrum of the target audio, silence suppression may be performed on the target audio based on its time domain signal energy to obtain processed audio, and the power spectrum of the processed audio is then calculated.

In the audio processing method provided by the disclosure, in order to improve the calculation speed, the average power spectrum of each frame of audio can be calculated as the power spectrum information of the whole audio to perform corresponding calculation, that is, in the process of calculating the difference value of the power spectrum along with the frequency, the average power spectrum of each frame of audio can be calculated based on the power spectrum; the difference value of the average power spectrum of all the audios with frequency is calculated.

In this embodiment, in the process of calculating the average power spectrum of each frame of audio, for each frame of audio, a power spectrum with a preset length may be obtained at a preset frequency of the audio, and the average value of the obtained power spectrum is taken as the average power spectrum of the audio. For example, at frequencies corresponding to positions of 0.25, 0.5 and 0.75 times of the length of the audio, power spectrums of 0.5s length can be obtained respectively, and then an average value of the obtained power spectrums is used as an average power spectrum of the audio, so that in the case of taking the logarithmic power spectrum as a calculation basis, an average power spectrum calculation formula can be as follows:

where T_1.5 denotes the number of audio frames over the acquired 1.5 s period.

Of course, the preset length and the preset frequency can be determined according to actual requirements, and the application is not specifically limited herein. For example, on the premise that the distribution of the target audio is known to be stable, the average power spectrum can be calculated from an audio power spectrum of 0.5 s, or even shorter, extracted at random according to the actual requirement and the processing speed; preferably, however, the extracted time span is greater than 10 frames, that is, greater than 0.2 s.
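As a rough illustration of the excerpt-based averaging described above, the Python sketch below selects 0.5 s of frames around the positions at 0.25, 0.5 and 0.75 of the audio length and averages their log power spectra per frequency point. Centring each excerpt on its position, the 0.02 s frame shift, and the function names are assumptions made for the example.

```python
def excerpt_frame_indices(total_frames, frame_shift_s=0.02, excerpt_s=0.5,
                          positions=(0.25, 0.5, 0.75)):
    """Return sorted frame indices covering one excerpt per position."""
    per_excerpt = round(excerpt_s / frame_shift_s)   # 25 frames for 0.5 s
    indices = set()
    for pos in positions:
        centre = int(pos * total_frames)
        # Keep the excerpt inside the signal.
        start = max(0, min(centre - per_excerpt // 2,
                           total_frames - per_excerpt))
        indices.update(range(start, min(start + per_excerpt, total_frames)))
    return sorted(indices)


def average_log_spectrum(p_log_frames, indices):
    """Average the selected frames' log power spectra per frequency point."""
    n_bins = len(p_log_frames[0])
    return [sum(p_log_frames[n][k] for n in indices) / len(indices)
            for k in range(n_bins)]
```

For a sufficiently long signal the three excerpts do not overlap, so the number of selected frames equals the 1.5 s frame count T_1.5 mentioned above (75 frames at a 0.02 s frame shift).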

It can be understood that in the process of calculating the differential value of the average power spectrum of all the audio with frequency, the average power spectrum can first be smoothed, with the related smoothing parameters as in the above embodiment, and the differential value of the smoothed average power spectrum with frequency is then calculated, namely:

Pd_log(k) = Ps_log(k+1) - Ps_log(k), from which the differential values are obtained, etc.

The following describes a technical scheme of the present application by taking an audio processing procedure of an APP of a certain music client as an example. The process may include the steps of:

Step S201: acquiring target audio to be processed in the music client APP;

Step S202: calculating a logarithmic power spectrum of the target audio;

Specifically, the target audio is framed by the framing formula x_n(i) = x(n·M + i), wherein n represents the n-th frame signal, M represents the frame shift, i represents the sample index within the n-th frame signal and takes the values 0, 1, 2, …, L-1, and L represents the frame length;

windowing the frame signal obtained by framing by the windowing formula xw_n(i) = x_n(i)·w(i), wherein xw_n(i) represents the windowing result, and w(i) represents the window function, which is a Hanning window;

calculating the power spectrum by the power spectrum calculation formula P(n, k) = |X(n, k)|², wherein P(n, k) represents the power spectrum of the k-th frequency point of the n-th frame;

calculating the logarithmic power spectrum of the target audio from P(n, k);
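The framing, windowing and power spectrum steps of step S202 can be sketched in Python as follows. The symmetric Hann window form, the direct (slow) DFT, the 10·log10 scaling and the small eps that guards against log(0) are assumptions chosen for a self-contained illustration; a practical implementation would normally use an FFT library.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """x_n(i) = x(n * hop + i) for i = 0 .. frame_len - 1."""
    if len(x) < frame_len:
        return []
    count = 1 + (len(x) - frame_len) // hop
    return [x[n * hop:n * hop + frame_len] for n in range(count)]


def hann(frame_len):
    """Symmetric Hann window (assumed form); requires frame_len > 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
            for i in range(frame_len)]


def log_power_spectrum(frame, window, eps=1e-12):
    """Return 10 * log10(|X(k)|^2) for k = 0 .. frame_len / 2."""
    xw = [s * w for s, w in zip(frame, window)]      # xw_n(i) = x_n(i) * w(i)
    size = len(xw)
    spectrum = []
    for k in range(size // 2 + 1):
        acc = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / size)
                  for i in range(size))
        spectrum.append(10 * math.log10(abs(acc) ** 2 + eps))
    return spectrum
```

For a sinusoid whose frequency falls exactly on a frequency point, the largest value of the resulting log power spectrum appears at that point, with reduced values on the two neighbouring points because of the window.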

step S203: smoothing the logarithmic power spectrum to obtain a smoothed power spectrum;

Specifically, the smoothed power spectrum is obtained by the smoothing formula;

Step S204: performing silence suppression on the smooth power spectrum to obtain a processing power spectrum;

Step S205: based on the logarithmic power spectrum, for each frame of audio, acquiring a power spectrum with a preset length at a preset frequency of the audio, and taking an average value of the acquired power spectrum as an average power spectrum of the audio;

Specifically, the average power spectrum is calculated by the average power spectrum calculation formula;

Step S206: smoothing the average power spectrum, and calculating the difference value of the average power spectrum after all the audio smoothing processing along with the frequency;

Specifically, the average power spectrum is smoothed by the smoothing formula;

Step S207: judging whether a minimum wave trough point exists in the change relation of the differential value along with the frequency; if the minimum trough point does not exist, determining half of the target audio sampling frequency as a cut-off frequency; if the minimum wave trough point exists, judging whether a frequency value corresponding to the increase of the power spectrum exists in a frequency value larger than the frequency value to be analyzed in the power spectrum; if the frequency value corresponding to the power spectrum increase does not exist, selecting the frequency value to be operated from the frequency value larger than the frequency value to be analyzed; if the frequency value corresponding to the power spectrum increase exists, determining the frequency value corresponding to the power spectrum increase as the frequency value to be operated;

Specifically, the minimum trough point is determined by the formula k_min = arg min_k Pd_log(k), wherein Pd_log(k) = Ps_log(k+1) - Ps_log(k);

the frequency value corresponding to the increase in power value is determined by the formula for k_v;

step S208: calculating a standard deviation value of a power spectrum corresponding to a frequency value which is larger than or equal to the frequency value to be operated;

Specifically, the standard deviation value is calculated by the standard deviation formula;

step S209: judging whether the standard deviation value is smaller than a preset threshold value or not; if the standard deviation value is smaller than a preset threshold value, determining the frequency value to be analyzed as a cut-off frequency; and if the standard deviation value is greater than or equal to a preset threshold value, determining half of the target audio sampling frequency as the cut-off frequency.

For ease of understanding, assume that the frequency and power spectrum information of the target audio is as shown in fig. 9, where the pulses in the graph represent the audio; the corresponding parameter information obtained when processing according to the present application may be as shown in fig. 4, and in this case the method of the present application explicitly determines the cut-off frequency to be half of the sampling frequency. For the frequency and power spectrum information of the target audio shown in fig. 10, the corresponding parameter information may be as shown in fig. 11, and in this case the method explicitly determines the cut-off frequency to be the frequency value to be analyzed, that is, the frequency value corresponding to the minimum trough point of the power spectrum differential value, which corresponds to the true cut-off frequency value. The present application can therefore determine the cut-off frequency accurately, so that the audio can be processed accurately.

Referring to fig. 12, an embodiment of the present application also discloses an audio processing apparatus, applied to a background server, including:

An audio acquisition module 11, configured to acquire target audio to be processed;

a power spectrum calculation module 12 for calculating a power spectrum of the target audio;

the cut-off frequency judging module 13 is used for determining whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum, and judging whether the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable or not; and if the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable, determining the frequency value to be analyzed as a cut-off frequency, and processing the target audio based on the cut-off frequency.

In this embodiment, after the target audio to be processed is obtained, the power spectrum of the target audio needs to be calculated. If the cut-off frequency is not half of the audio sampling frequency, the power spectrum variation amplitude is largest near the cut-off frequency, so it is necessary to judge whether a frequency value to be analyzed with the largest power variation amplitude exists in the power spectrum. If such a frequency value exists, whether it really is the cut-off frequency needs to be further verified; because the power spectrum beyond the cut-off frequency is in a stable state, this can be done by judging whether the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is stable. If that power spectrum is stable, the frequency value to be analyzed is determined as the cut-off frequency, so that an accurate cut-off frequency is obtained, and the accuracy of audio processing can be improved when the target audio is processed based on this cut-off frequency.

In some embodiments, the cutoff frequency determining module may be specifically configured to: if it is determined that no frequency value to be analyzed exists in the power spectrum, determine half of the target audio sampling frequency as the cut-off frequency.

In some embodiments, the cutoff frequency determining module may be specifically configured to: if the power spectrum corresponding to the frequency values greater than the frequency value to be analyzed is not stable, determine half of the target audio sampling frequency as the cut-off frequency.

In some embodiments, the cutoff frequency determining module may be specifically configured to: calculating a differential value of the power spectrum with frequency; judging whether the frequency value to be analyzed corresponding to the difference value with the minimum value exists or not, wherein the difference values corresponding to the frequency values at the left side and the right side of the frequency to be analyzed are larger than the difference values corresponding to the frequency value to be analyzed; if the frequency value to be analyzed does not exist, determining that the frequency value to be analyzed does not exist in the power spectrum; if the frequency value to be analyzed exists, determining that the frequency value to be analyzed exists in the power spectrum.

In some embodiments, the cutoff frequency determining module may be specifically configured to: judging whether a minimum wave trough point exists in the change relation of the differential value along with the frequency; if the minimum trough point does not exist, judging that the frequency value to be analyzed does not exist; if the minimum wave trough point exists, determining the frequency corresponding to the minimum wave trough point as a frequency value to be analyzed.

In some embodiments, the cut-off frequency judging module may be specifically configured to judge, in the power spectrum, whether a frequency value corresponding to the increase of the power spectrum exists in a frequency value greater than the frequency value to be analyzed; if the frequency value corresponding to the power spectrum increase does not exist, selecting the frequency value to be operated from the frequency value larger than the frequency value to be analyzed; if the frequency value corresponding to the power spectrum increase exists, determining the frequency value corresponding to the power spectrum increase as the frequency value to be operated; calculating a standard deviation value of a power spectrum corresponding to a frequency value which is larger than or equal to the frequency value to be operated; judging whether the standard deviation value is smaller than a preset threshold value or not; if the standard deviation value is smaller than the preset threshold value, determining that the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable; if the standard deviation value is greater than or equal to a preset threshold value, determining that the power spectrum corresponding to the frequency value greater than the frequency value to be analyzed is unstable.

In some embodiments, the cutoff frequency determining module may be specifically configured to: performing silence suppression on the power spectrum to obtain a processing power spectrum; a differential value of the processing power spectrum with frequency is calculated.

In some embodiments, the cutoff frequency determining module may be specifically configured to: smoothing the power spectrum to obtain a smooth power spectrum; and performing silence suppression on the smooth power spectrum to obtain a processing power spectrum.

In some embodiments, the power spectrum calculation module may be specifically configured to: performing silence suppression on the target audio based on the time domain signal energy of the target audio to obtain processed audio; a power spectrum of the processed audio is calculated.

In some embodiments, the cutoff frequency determining module may be specifically configured to: calculating an average power spectrum of each frame of audio based on the power spectrum; the difference value of the average power spectrum of all the audios with frequency is calculated.

In some embodiments, the cutoff frequency determining module may be specifically configured to: for each frame of audio, acquiring a power spectrum with a preset length at a preset frequency of the audio, and taking an average value of the acquired power spectrum as an average power spectrum of the audio.

In some embodiments, the power spectrum calculation module may be specifically configured to: a log power spectrum of the target audio is calculated.

Further, the embodiment of the application also provides electronic equipment. Fig. 13 is a block diagram of an electronic device 20, according to an exemplary embodiment, and the contents of the diagram should not be construed as limiting the scope of use of the present application in any way.

Fig. 13 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the audio processing method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be a server.

In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.

The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, video data 223, and the like, and the storage may be temporary storage or permanent storage.

The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so as to implement the operation and processing of the massive video data 223 in the memory 22 by the processor 21; the operating system may be Windows Server, NetWare, Unix, Linux, etc. In addition to the computer program that can be used to perform the audio processing method performed by the electronic device 20 as disclosed in any of the previous embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks. The data 223 may include various video data collected by the electronic device 20.

Further, the embodiment of the application also discloses a storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the steps of the audio processing method disclosed in any one of the previous embodiments are realized.

In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments, so that the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An audio processing method, comprising:

acquiring target audio to be processed;

calculating a power spectrum of the target audio;

If the power spectrum corresponding to the frequency value larger than the frequency value to be analyzed is stable, determining the frequency value to be analyzed as a cut-off frequency, and processing the target audio based on the cut-off frequency;

wherein the determining whether the power spectrum corresponding to the frequency value greater than the frequency value to be analyzed is stable includes:

In the power spectrum, determining a frequency value of which the frequency value is larger than the frequency value to be analyzed and corresponds to the power spectrum increase, and determining the frequency value of which the corresponding power spectrum increases as the frequency value to be operated;

2. The method as recited in claim 1, further comprising:

3. The method as recited in claim 1, further comprising:

4. The method of claim 1, wherein determining whether the frequency value to be analyzed with the greatest power variation amplitude exists in the power spectrum comprises:

Calculating a differential value of the power spectrum with frequency;

5. The method according to claim 4, wherein the determining whether there exists a frequency value to be analyzed that corresponds to the smallest differential value, with the differential values corresponding to the frequency values on the left and right sides of the frequency value to be analyzed both being greater than the differential value corresponding to the frequency value to be analyzed, includes:

6. The method of claim 5, wherein the determining a frequency value that is greater than the frequency value to be analyzed and at which the power spectrum increases, and determining that frequency value as the frequency value to be operated, comprises:

judging whether, among the frequency values greater than the frequency value to be analyzed, there exists a frequency value at which the power spectrum increases; if no such frequency value exists, selecting a frequency value to be operated from the frequency values greater than the frequency value to be analyzed; and if such a frequency value exists, determining it as the frequency value to be operated.
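By way of non-limiting illustration, the selection recited in claim 6 can be sketched as a scan over the bins above the frequency value to be analyzed. The function name is hypothetical, "increase" is assumed here to mean that a bin's power exceeds that of the bin immediately below it, and the fallback to the highest bin is also an assumption, since the claim does not state how the fallback value is selected.

```python
def select_bin_to_operate(power_db, candidate_bin):
    """Choose the bin above candidate_bin to use as the frequency value to be operated."""
    first_upper = candidate_bin + 1
    if first_upper >= len(power_db):
        raise ValueError("no bins above the candidate bin")
    for k in range(first_upper, len(power_db)):
        # Assumed meaning of "power spectrum increase": louder than the previous bin.
        if power_db[k] > power_db[k - 1]:
            return k
    # Assumption: the claim leaves the fallback choice open; use the highest bin.
    return len(power_db) - 1


# The power rises again at bin 4, so bin 4 is returned.
print(select_bin_to_operate([-20.0, -21.0, -60.0, -61.0, -59.0, -62.0], 1))  # prints 4
# No rise above the candidate, so the sketch falls back to the last bin.
print(select_bin_to_operate([-20.0, -21.0, -60.0, -61.0, -62.0], 1))  # prints 4
```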

7. The method of claim 6, wherein said calculating a differential value of said power spectrum with frequency comprises:

8. The method of claim 7, wherein the muting of the power spectrum to obtain a processed power spectrum comprises:

smoothing the power spectrum to obtain a smoothed power spectrum;

9. The method of claim 5, wherein said calculating a power spectrum of said target audio comprises:

calculating the power spectrum of the processed audio.

10. The method according to any one of claims 7 to 9, wherein said calculating a differential value of the power spectrum with frequency comprises:

11. The method of claim 10, wherein said calculating an average power spectrum for each frame of audio comprises:

12. The method of claim 10, wherein the calculating the power spectrum of the target audio comprises:

calculating the logarithmic power spectrum of the target audio.
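By way of non-limiting illustration, a logarithmic power spectrum of a single frame can be sketched as follows. The decibel scaling (10·log10), the normalization by the frame length, the numerical floor, and the naive discrete Fourier transform are all assumptions chosen to keep the sketch self-contained; the claim specifies only that the power spectrum is logarithmic.

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-12):
    """Log power spectrum in dB for the non-negative frequency bins of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for a short illustrative frame, O(n^2) in general.
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2 / n
        # The floor avoids taking the logarithm of zero for empty bins.
        spectrum.append(10.0 * math.log10(max(power, floor)))
    return spectrum


# A 32-sample sine with exactly 4 cycles concentrates its energy in bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = log_power_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin)  # prints 4
```

In practice a fast Fourier transform from a numerical library would replace the inner sum; the logarithmic scale compresses the dynamic range so that a drop near the cut-off frequency is easier to compare against the level of the bins above it.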

13. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the audio processing method of any one of claims 1 to 12.

14. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the audio processing method according to any one of claims 1 to 12.