CN111862995A

CN111862995A - Code rate determination model training method, code rate determination method and device

Info

Publication number: CN111862995A
Application number: CN202010575623.7A
Authority: CN
Inventors: 郑羲光; 董培; 张晨
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2020-10-30

Abstract

The disclosure relates to a code rate determination model training method, a code rate determination method and a device. The training method comprises the following steps: acquiring an audio sample data set; acquiring characteristic information of each audio signal and a target coding rate corresponding to the audio signal; inputting the acquired characteristic information into a code rate determination model to be trained to obtain a coding rate output by the code rate determination model to be trained; obtaining a loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate; and adjusting model parameters of the code rate determination model to be trained according to the loss value until the loss value is lower than a preset threshold value, and then taking the code rate determination model to be trained as the trained code rate determination model. Therefore, when an audio signal to be coded is subsequently coded, the code rate determination model can provide a coding rate of appropriate size while the audio quality of the coded audio signal is still ensured.

Description

Code rate determination model training method, code rate determination method and device

Technical Field

The application relates to the technical field of audio and video, in particular to a code rate determination model training method, a code rate determination method and a code rate determination device.

Background

With the development of the mobile internet, more and more users need to use audio on terminals. To save transmission resources and storage resources, audio signals need to be encoded for transmission and storage. Audio coding techniques can be classified into lossless coding and lossy coding. With lossless coding, a terminal can perfectly restore the original audio signal through a decoder; with lossy coding, the audio signal that the terminal decodes through the decoder is degraded to varying degrees relative to the original.

In the related art, when an audio signal is encoded, a code rate is usually specified, an encoder may encode according to the specified code rate, and in order to ensure the quality of the encoded audio signal, a high code rate is usually specified to encode the audio signal.

Thus, a higher transmission bandwidth may be required when transmitting the encoded audio signal; in addition, when the encoded audio signal is stored, a large storage space is required, which results in waste of transmission resources and storage resources.

Disclosure of Invention

In order to solve the technical problem that transmission resources and storage resources are wasted when the coded audio signal is transmitted and stored due to high coding rate of the audio signal in the related art, the present disclosure provides a code rate determination model training method, a code rate determination method and a device, and the technical scheme of the present disclosure is as follows:

According to a first aspect of the embodiments of the present disclosure, there is provided a method for training a code rate determination model, the method including:

acquiring an audio sample data set, wherein the audio sample data set comprises different types of audio signals;

acquiring characteristic information of each audio signal and a target coding rate corresponding to the audio signal, wherein the characteristic information is associated with the type of the audio signal, and the target coding rate is the lowest coding rate when the audio signal meets the target audio quality;

inputting the acquired characteristic information into a code rate determination model to be trained to obtain a coding code rate output by the code rate determination model to be trained;

obtaining the loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;

and adjusting model parameters of the code rate determination model to be trained according to the loss value until the loss value is lower than a preset threshold value, and taking the code rate determination model to be trained as the trained code rate determination model.

Optionally, obtaining the target coding rate corresponding to the audio signal includes:

Encoding the audio signal according to a preset code rate to obtain an encoded audio signal;

calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;

and when the quality loss value is smaller than a quality loss threshold value and the quality loss value is the minimum quality loss value, determining the preset code rate as the target coding code rate corresponding to the audio signal.

Optionally, when the quality loss value is smaller than a quality loss threshold and the quality loss value is a minimum quality loss value, determining the preset code rate as a target coding code rate corresponding to the audio signal, including:

when the quality loss value is smaller than a quality loss threshold, reducing the preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;

and taking the previous reduced preset code rate as a target coding code rate.

Optionally, the acquiring the feature information of each audio signal includes:

and acquiring amplitude information and phase information of each audio signal in a time-frequency domain, and determining characteristic information of the audio signal according to the amplitude information and/or the phase information.

Optionally, the obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal includes:

acquiring characteristic information of each frame of signal of each audio signal and a target coding rate corresponding to each frame of signal of each audio signal;

or acquiring the characteristic information of each frame signal in each audio signal, taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal, and acquiring the target coding rate corresponding to the characteristic information of the audio signal.

According to a second aspect of the embodiments of the present disclosure, there is provided a code rate determining method, the method including:

acquiring characteristic information of an audio signal to be encoded;

inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain an encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.

Optionally, the obtaining the feature information of the audio signal to be encoded includes:

acquiring amplitude information and phase information of the audio signal to be coded in a time-frequency domain, and determining characteristic information of the audio signal to be coded according to the amplitude information and/or the phase information.

acquiring characteristic information of each frame of signal of an audio signal to be coded;

or acquiring the characteristic information of each frame signal in the audio signal to be coded, and taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal to be coded.

According to a third aspect of the embodiments of the present disclosure, there is provided a code rate determination model training apparatus, the apparatus including:

the audio signal acquisition module is configured to execute acquisition of an audio sample data set, wherein the audio sample data set comprises different types of audio signals;

an information and code rate obtaining module configured to perform obtaining feature information of each audio signal and a target coding rate corresponding to the audio signal, where the feature information is associated with a type of the audio signal, and the target coding rate is a lowest coding rate at which the audio signal meets a target audio quality;

the code rate obtaining module is configured to input the obtained characteristic information into a code rate determination model to be trained to obtain a code rate output by the code rate determination model to be trained;

the loss value obtaining module is configured to obtain the loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;

and the model parameter adjusting module is configured to adjust the model parameters of the code rate determining model to be trained according to the loss value until the loss value is lower than a preset threshold value, and the code rate determining model to be trained is used as the trained code rate determining model.

Optionally, the information and code rate obtaining module includes:

an audio signal encoding unit configured to perform encoding of the audio signal according to a preset code rate to obtain an encoded audio signal;

a quality loss value calculation unit configured to perform calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;

and the target coding rate determining unit is configured to determine the preset coding rate as the target coding rate corresponding to the audio signal when the quality loss value is smaller than a quality loss threshold and the quality loss value is the minimum quality loss value.

Optionally, the target coding rate determining unit is specifically configured to perform:

and taking the previous reduced preset code rate as a target coding code rate.

Optionally, the information and code rate obtaining module is specifically configured to perform:

According to a fourth aspect of the embodiments of the present disclosure, there is provided a code rate determining apparatus, the apparatus including:

a feature information acquisition module configured to perform acquisition of feature information of an audio signal to be encoded;

and the coding rate determination module is configured to execute the step of inputting the characteristic information of the audio signal to be coded into the rate determination model in the third aspect to obtain a coding rate corresponding to the audio signal to be coded, so as to code the audio signal to be coded according to the coding rate corresponding to the audio signal to be coded.

Optionally, the characteristic information obtaining module is specifically configured to perform:

According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the code rate determination model training method of the first aspect.

According to a sixth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the code rate determination method according to the second aspect.

According to a seventh aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination model training method according to the first aspect.

According to an eighth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of the second aspect.

According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to implement the code rate determining model training method of the first aspect.

According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions that, when run on a computer, cause the computer to implement the code rate determination method of the second aspect.

Therefore, according to the technical scheme provided by the embodiments of the disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. As a result, when audio data to be coded is subsequently coded, the code rate determination model can provide a coding rate of appropriate size while the audio quality of the coded audio data is still ensured. Unlike the related art, in which a high coding rate is specified, this saves both the transmission bandwidth needed when the coded audio data is transmitted and the storage space needed when it is stored.

Drawings

FIG. 1 is a flow diagram illustrating a code rate determination model training method in accordance with an exemplary embodiment;

FIG. 2 is a diagram illustrating a code rate determination model training process according to an example embodiment;

FIG. 3 is a flowchart illustrating obtaining a target encoding rate for an audio signal according to an example embodiment;

FIG. 4 is a diagram illustrating a process of obtaining a target coding rate for an audio signal according to an example embodiment;

FIG. 5 is a flow diagram illustrating a method for code rate determination in accordance with an exemplary embodiment;

FIG. 6 is a block diagram illustrating a code rate determination model training apparatus in accordance with an exemplary embodiment;

FIG. 7 is a block diagram illustrating a code rate determination apparatus according to an example embodiment;

FIG. 8 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;

FIG. 9 is a block diagram illustrating another electronic device in accordance with an exemplary embodiment;

FIG. 10 is a block diagram illustrating a code rate determination model training apparatus or a code rate determination apparatus according to an example embodiment;

FIG. 11 is a block diagram illustrating another code rate determination model training apparatus or code rate determination apparatus according to an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In order to solve the technical problem that transmission resources and storage resources are wasted when the coded audio signal is transmitted and stored due to the fact that the coding rate of the audio signal is high in the related art, the disclosure provides a code rate determination model training method, a code rate determination method and a device.

In a first aspect, a method for training a code rate determination model provided in an embodiment of the present disclosure will be described in detail.

As shown in fig. 1, a method for training a code rate determination model according to an embodiment of the present disclosure may include the following steps:

in step S11, an audio sample data set is acquired.

Wherein, the audio sample data set comprises different types of audio signals.

Specifically, when the code rate determination model is trained, a large amount of sample data needs to be acquired, that is, an audio sample data set needs to be acquired. Also, different types of audio signals may be included in the set of audio sample data.

For example, the audio sample data set may include different types of audio signals, such as a speech signal, a music signal, and a background environment sound signal, and the type of the audio signal included in the audio sample data set is not specifically limited in the embodiments of the present disclosure.

In step S12, the feature information of each audio signal and the target coding rate corresponding to the audio signal are obtained.

The characteristic information of the audio signal is associated with the type of the audio signal, and the target coding rate is the lowest coding rate when the audio signal meets the target audio quality.

Specifically, after the audio sample data set is obtained, the feature information and the target coding rate of each audio signal in the audio sample data set may be obtained.

The characteristic information of an audio signal is associated with the type of audio signal, and is usually different for different types of audio signals. Specifically, when the type of the audio signal is a voice signal, the feature information of the audio signal is the feature information of the voice signal; when the type of the audio signal is a music signal, the characteristic information of the audio signal is the characteristic information of the music signal; similarly, when the type of the audio signal is a background environment sound signal, the characteristic information of the audio signal is the characteristic information of the background environment sound signal. The characteristic information of the audio signal may be amplitude information, phase information, and the like of the audio signal in a time-frequency domain, and the embodiment of the present disclosure does not specifically limit the characteristic information of the audio signal.

In addition, in order to ensure that the transmission bandwidth of the encoded audio signal during transmission and the storage space of the encoded audio signal during storage can be saved as much as possible when the encoded audio signal meets the specified audio quality, the target encoding rate is required to be the lowest encoding rate when the audio signal meets the target audio quality. The target audio quality may be a designated audio quality, and the designated audio quality may be determined according to actual conditions, for example, for an audio signal of which the type is background environmental sound, the designated audio quality may be lower; for audio signals of the type music signals, the specified audio quality may be higher. The size of the target audio quality is not particularly limited in the embodiments of the present disclosure.

For clarity of the description of the scheme, the following embodiments will explain in detail specific implementations of obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal.

In step S13, the obtained feature information is input into the code rate determination model to be trained, so as to obtain the coding code rate output by the code rate determination model to be trained.

After the feature information of the audio signal and the target coding rate corresponding to the audio signal are obtained, the code rate determination model may be trained. Specifically, the obtained feature information of the audio signal may be input to the code rate determination model to be trained, and the coding code rate of the audio signal may be output from the code rate determination model to be trained.

In step S14, the loss value of the code rate determination model to be trained is obtained according to the code rate output by the code rate determination model to be trained and the target code rate.

Specifically, since the target output of the code rate determination model is the target coding rate, after the coding rate output by the code rate determination model to be trained is obtained, the loss value of the code rate determination model to be trained can be calculated from that output coding rate and the target coding rate.

It can be understood that the loss value of the code rate determination model to be trained can be used to characterize the magnitude of the difference between the coding code rate output from the code rate determination model to be trained and the target coding code rate. If the loss value of the code rate determination model to be trained is larger, the difference value between the coding code rate output by the code rate determination model to be trained and the target coding code rate is larger; and if the loss value of the code rate determination model to be trained is smaller, the difference value between the coding code rate output by the code rate determination model to be trained and the target coding code rate is smaller.
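As a minimal illustration of this relationship, the sketch below uses mean squared error as the loss. The disclosure does not name a specific loss function, so the choice of mean squared error, the function name `rate_loss`, and the example rates in kbps are all assumptions made for illustration.

```python
def rate_loss(predicted_kbps, target_kbps):
    """Mean squared error between predicted and target coding rates.

    MSE is an assumed choice; the disclosure only requires that the loss
    reflect how far the output rate is from the target rate.
    """
    if len(predicted_kbps) != len(target_kbps) or not predicted_kbps:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted_kbps, target_kbps)]
    return sum(squared) / len(squared)


# Hypothetical target coding rates for three audio signals.
targets = [24.0, 64.0, 96.0]

# Outputs close to the targets give a small loss value.
close = rate_loss([26.0, 62.0, 97.0], targets)

# Outputs far from the targets give a large loss value.
far = rate_loss([48.0, 32.0, 128.0], targets)
```

With any loss of this kind, a larger gap between the output rate and the target rate yields a larger loss value, which is the property the training procedure relies on.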

In step S15, the model parameters of the code rate determination model to be trained are adjusted according to the loss value until the loss value is lower than the preset threshold value, and the code rate determination model to be trained is then used as the trained code rate determination model.

Specifically, if the loss value of the to-be-trained code rate determination model is large, it indicates that the difference between the coding code rate output from the to-be-trained code rate determination model and the target coding code rate is large, and in order to make the coding code rate output from the to-be-trained code rate determination model approach the target coding code rate, the model parameters of the to-be-trained code rate determination model may be adjusted.

After the model parameters of the code rate determination model to be trained are adjusted, the characteristic information of the audio signal is input into the code rate determination model to be trained again, the coding rate output by the model is obtained again, and the loss value is recalculated from the newly output coding rate and the target coding rate. When the loss value is smaller than the preset threshold value, the coding rate output by the code rate determination model to be trained is close to the target coding rate, and the code rate determination model to be trained can then be used as the trained code rate determination model.

It should be noted that the preset threshold may be determined according to actual conditions, and the size of the preset threshold is not specifically limited in the embodiment of the present disclosure.
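Steps S13 to S15 can be sketched as a simple loop. The sketch below is illustrative only: it substitutes a small linear model for the neural network, uses mean squared error and plain gradient descent, and trains on made-up, normalized feature vectors and rates. The names `predict` and `train_rate_model`, the learning rate, the threshold, and the step limit are all assumptions, not details taken from the disclosure.

```python
def predict(weights, bias, features):
    """Linear stand-in for the code rate determination model (assumption)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_rate_model(samples, threshold=1e-3, lr=0.1, max_steps=10000):
    """Train on (feature_vector, target_rate) pairs until loss < threshold.

    Returns (weights, bias, loss). All hyperparameters are hypothetical.
    """
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    n = len(samples)
    loss = float("inf")
    for _ in range(max_steps):
        # S13: feed the feature information to the model to get output rates.
        errors = [predict(weights, bias, x) - y for x, y in samples]
        # S14: loss value from the output rates and the target rates (MSE).
        loss = sum(e * e for e in errors) / n
        if loss < threshold:
            break  # S15: loss is below the preset threshold, training is done.
        # S15: adjust the model parameters according to the loss value.
        for j in range(dim):
            grad_j = 2.0 * sum(e * x[j] for e, (x, _) in zip(errors, samples)) / n
            weights[j] -= lr * grad_j
        bias -= lr * 2.0 * sum(errors) / n
    return weights, bias, loss


# Toy data: two normalized features per signal, target rate normalized to [0, 1].
samples = [
    ((0.0, 0.0), 0.2),
    ((1.0, 0.0), 0.7),
    ((0.0, 1.0), 0.5),
    ((1.0, 1.0), 1.0),
    ((0.5, 0.5), 0.6),
]
weights, bias, final_loss = train_rate_model(samples)
```

A step limit is included so that the loop terminates even if the threshold is never reached, which is a practical safeguard rather than something the disclosure describes.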

In order to more intuitively and clearly understand the training process of the code rate determination model, the following describes the training process of the code rate determination model with reference to a specific example, as shown in fig. 2.

When a code rate determination model is trained, an audio signal is first obtained. Features of the audio signal are then extracted, and the coding rate of the audio signal is estimated based on comprehensive analysis, that is, the target coding rate corresponding to the audio signal is obtained; this target coding rate is the target output of the code rate determination model.

A neural network, namely the code rate determination model to be trained, is then trained based on the extracted features and the target coding rate of the audio signal. The parameters of the neural network are continuously optimized during training; when the loss value of the neural network is smaller than a preset threshold value, the optimal parameters of the neural network are obtained, and the trained neural network is taken as the trained code rate determination model.

For clarity of the description of the scheme, a detailed description will be given below of a specific implementation of obtaining a target coding rate corresponding to an audio signal.

In one embodiment, obtaining the target coding rate corresponding to the audio signal, as shown in fig. 3, may include the following steps:

in step S31, the audio signal is encoded according to a preset code rate to obtain an encoded audio signal.

Specifically, when the target coding rate of the audio signal is determined, the audio signal may be encoded according to a preset code rate to obtain the encoded audio signal. There may be multiple preset code rates: specifically, a predetermined, relatively large initial code rate and the code rates obtained by successively reducing that initial code rate.

It should be noted that the size of the preset code rate may be set according to an actual situation, and the size of the preset code rate is not specifically limited in the embodiment of the present disclosure.

In step S32, a quality loss value of the encoded audio signal is calculated from the audio signal and the encoded audio signal.

Specifically, after obtaining the encoded audio signal, the audio quality of the encoded audio signal may be obtained, and the quality loss value of the encoded audio signal may be determined according to the audio quality of the encoded audio signal and the signal quality of the audio signal before encoding. The quality loss value may be used to measure the degree of loss of audio quality in the encoded audio signal compared to the audio signal before encoding.

The quality loss value of the encoded audio signal may be calculated as follows: an audio quality evaluation method is used to evaluate the quality loss between the audio quality of the encoded audio signal and the audio quality of the audio signal before encoding, yielding the quality loss value of the encoded audio signal.

The audio quality evaluation method may be an objective evaluation method or a subjective evaluation method. For example, the objective evaluation method may be PEAQ (Perceptual Evaluation of Audio Quality), and the subjective evaluation method may be MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). The embodiments of the present disclosure do not specifically limit the audio quality evaluation method. For example, when PEAQ is used to evaluate the quality loss between the audio quality of the encoded audio signal and that of the audio signal before encoding, the resulting quality loss value of the encoded audio signal may be 0.1.

In step S33, when the quality loss value is smaller than the quality loss threshold and the quality loss value is the minimum quality loss value, the preset code rate is determined as the target coding rate corresponding to the audio signal.

Wherein the quality loss threshold may be a magnitude of a difference between the audio quality of the unencoded audio signal and the target audio quality.

Specifically, if the audio signal is encoded by using a preset code rate, the quality loss value of the encoded audio signal is smaller than the quality loss threshold, and the quality loss value is the minimum quality loss value, it indicates that after the audio signal is encoded by using the preset code rate, the audio quality of the encoded audio signal can meet the target audio quality, and the audio quality of the encoded audio signal just meets the target audio quality, that is, the preset code rate is the lowest encoding code rate when the target audio quality is met. Therefore, the preset code rate can be determined as the target coding rate corresponding to the audio signal.

As an implementation manner of the embodiment of the present disclosure, when the quality loss value is smaller than the quality loss threshold and the quality loss value is the minimum quality loss value, determining the preset code rate as the target coding code rate corresponding to the audio signal may include the following steps:

When the quality loss value is smaller than the quality loss threshold, reducing a preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;

and taking the previous reduced preset code rate as a target coding code rate.

Specifically, when determining the target coding rate of the audio signal, the audio signal may be encoded according to a larger preset coding rate, that is, the initial coding rate. The initial coding rate may be a coding rate that can ensure the audio quality of the encoded audio signal to the greatest extent.

After the audio signal is encoded according to the initial coding rate, the quality loss value of the encoded audio signal can be compared with the quality loss threshold. If the quality loss value is smaller than the quality loss threshold, the quality loss is small, and the preset code rate can be reduced further while the target audio quality is still met. The audio signal is then re-encoded according to the reduced preset code rate, and the quality loss value of the re-encoded audio signal is compared with the quality loss threshold. If it is still smaller than the quality loss threshold, the preset code rate continues to be reduced, until the quality loss value of the encoded audio signal is larger than the quality loss threshold. At that point, the preceding reduced preset code rate is the lowest coding rate at which the encoded audio signal meets the target audio quality, and it is therefore taken as the target coding rate.

It can be seen that the target coding rate determined by the implementation manner is the lowest coding rate when the encoded audio signal meets the target audio quality.
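The rate-reduction search described above can be sketched as follows. The encoder and the quality evaluation are passed in as functions; `toy_encode` and `toy_quality_loss` are hypothetical stand-ins (not a real codec, and not PEAQ or MUSHRA) chosen only so the example runs on its own. The fixed reduction `step`, the `min_rate` floor, and the decision to stop when the loss equals the threshold are also assumptions, since the disclosure does not specify them.

```python
def find_target_rate(signal, encode, quality_loss, initial_rate, step,
                     loss_threshold, min_rate):
    """Lower the code rate until the quality loss is no longer below the threshold.

    Returns the last rate whose quality loss was below the threshold,
    or None if even the initial rate does not meet it.
    """
    rate = initial_rate
    last_ok = None
    while rate >= min_rate:
        decoded = encode(signal, rate)
        if quality_loss(signal, decoded) >= loss_threshold:
            break  # quality no longer meets the target; stop reducing
        last_ok = rate  # this rate still meets the target audio quality
        rate -= step    # reduce the preset code rate and try again
    return last_ok


def toy_encode(signal, rate_kbps):
    """Hypothetical codec: distortion grows as the rate shrinks."""
    distortion = 1.0 / rate_kbps
    return [s + distortion for s in signal]


def toy_quality_loss(original, decoded):
    """Hypothetical quality loss: mean absolute sample difference."""
    return sum(abs(o - d) for o, d in zip(original, decoded)) / len(original)


signal = [0.0, 0.5, -0.25, 0.75]
target_rate = find_target_rate(signal, toy_encode, toy_quality_loss,
                               initial_rate=128, step=16,
                               loss_threshold=0.02, min_rate=16)
```

With these toy functions the quality loss equals 1 / rate, so rates of 128 down to 64 stay below the 0.02 threshold, 48 does not, and the search returns 64.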

In order to more intuitively and clearly understand the process of obtaining the target coding rate of the audio signal, the following describes a process of obtaining the target coding rate of the audio signal with reference to a specific example, as shown in fig. 4.

First, an audio encoder encodes the audio signal at the initial code rate, and a quality damage assessment is performed on the encoded audio signal.

Second, it is determined whether the quality damage value is above the threshold. If not, that is, if the quality damage value is below the threshold, the code rate is updated by reducing the initial code rate.

Third, the audio encoder encodes the audio signal at the updated code rate to obtain newly encoded audio, and the quality damage of this encoded audio signal is assessed.

Fourth, it is determined again whether the quality damage value is above the threshold. If not, that is, if the quality damage value is still below the threshold, the code rate is updated again by reducing the current code rate, and the audio encoder encodes the audio signal at the updated code rate. This repeats until the determination is yes, that is, until the quality damage value is found to be above the threshold. The last code rate that satisfied the threshold, namely the last code rate whose quality damage value was below the threshold, is then output as the target coding rate.
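The search described above can be sketched as follows. This is only an illustrative sketch: the function name `find_target_bitrate`, the list of candidate rates, and the `encode` and `quality_loss` callables are hypothetical stand-ins, since the disclosure does not name a particular encoder, quality metric, or step size for reducing the code rate.

```python
from typing import Callable, Sequence


def find_target_bitrate(
    signal: Sequence[float],
    candidate_bitrates: Sequence[int],
    encode: Callable[[Sequence[float], int], Sequence[float]],
    quality_loss: Callable[[Sequence[float], Sequence[float]], float],
    loss_threshold: float,
) -> int:
    """Return the last (lowest) tried rate whose quality loss is below the threshold."""
    # Try the rates from highest (the initial code rate) to lowest.
    ordered = sorted(candidate_bitrates, reverse=True)
    # Assumption: the initial rate meets the target quality, so it is the fallback.
    target = ordered[0]
    for bitrate in ordered:
        decoded = encode(signal, bitrate)
        if not quality_loss(signal, decoded) < loss_threshold:
            # Threshold no longer satisfied: keep the previous passing rate.
            break
        target = bitrate
    return target
```

The loop stops at the first rate that fails the threshold rather than testing every candidate, which mirrors the stepwise reduction in the example and assumes that quality loss does not decrease as the code rate is lowered.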

For clarity, a specific embodiment for acquiring the feature information of each audio signal is described in detail below.

In one embodiment, obtaining the feature information of each audio signal may include:

Specifically, the audio signal may be converted to the time-frequency domain by a time-frequency transform, such as the short-time Fourier transform, to obtain a complex signal S(n, k):

S(n, k) = A(n, k) · e^{iθ(n, k)}

where A(n, k) is the amplitude information and θ(n, k) is the phase information.

As one implementation of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, the amplitude information may be used directly as the feature information of the audio signal; alternatively, the phase information may be used as the feature information of the audio signal; alternatively, both the amplitude information and the phase information may be used as the feature information of the audio signal.

As another implementation of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, other feature information of the audio signal may be derived by applying preset processing to the amplitude information and/or the phase information. The other feature information may include Mel-frequency cepstral coefficients (MFCC), the Mel spectrogram, spectral contrast, and the like; the embodiments of the present disclosure do not specifically limit these other audio features. In this case, any one or more of the amplitude information, the phase information, and the other feature information may be used as the feature information of the audio signal.

Therefore, the technical scheme provided by the embodiment can accurately show the characteristic information of the audio signal.
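As a rough illustration of the amplitude and phase extraction, the sketch below computes A(n, k) and θ(n, k) with a naive short-time Fourier transform written against the Python standard library only. The function name `stft_amplitude_phase`, the Hann window, and the frame and hop lengths are assumptions made for this example; the disclosure does not specify them.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def stft_amplitude_phase(
    samples: Sequence[float], frame_len: int = 8, hop: int = 4
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return A(n, k) and theta(n, k), indexed as [frame n][frequency bin k]."""
    # Periodic Hann window; the window choice is an assumption of this sketch.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    amplitude: List[List[float]] = []
    phase: List[List[float]] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        amp_row, phase_row = [], []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame at bin k gives the complex S(n, k).
            s = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            amp_row.append(abs(s))            # A(n, k)
            phase_row.append(cmath.phase(s))  # theta(n, k)
        amplitude.append(amp_row)
        phase.append(phase_row)
    return amplitude, phase
```

The naive per-bin DFT keeps the example dependency-free; a practical implementation would more likely rely on an FFT routine from a numerical library.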

For clarity of the description of the scheme, a detailed description will be given below of a specific implementation for obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal.

In one embodiment, obtaining the characteristic information of each audio signal and the target coding rate corresponding to the audio signal may include the following steps:

and acquiring the characteristic information of each frame of signal of each audio signal and the target coding rate corresponding to each frame of signal of each audio signal.

In this embodiment, when training the code rate determination model, the feature information of each frame of each audio signal and the corresponding target coding rate may be obtained. This provides more training data for the code rate determination model, so the accuracy of the trained code rate determination model is higher.

Therefore, by the technical scheme of the embodiment, the accuracy of the trained code rate determination model is high.

In another embodiment, obtaining the characteristic information of each audio signal and the target coding rate corresponding to the audio signal may include the following steps:

The method comprises the steps of obtaining the characteristic information of each frame of signal in each audio signal, taking the average value of the characteristic information of each frame of signal as the characteristic information of the audio signal, and obtaining the target coding rate corresponding to the characteristic information of the audio signal.

In practical applications, in order to reduce the amount of computation involved in training the code rate determination model, the dimensionality of the feature information of the audio signal may be reduced.

For example, if an audio signal consists of 30 consecutive frames, the feature information of the 30 frames may be averaged to obtain feature information with the length of a single frame. This averaged feature information is taken as the feature information of the audio signal, and the target coding rate is the coding rate corresponding to this feature information.

Therefore, the technical scheme of this embodiment can reduce the amount of computation involved in training the code rate determination model.
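The frame averaging in the 30-frame example can be sketched in a few lines. The function name `average_frame_features` is hypothetical, and each frame is assumed to carry a feature vector of the same length.

```python
from typing import List, Sequence


def average_frame_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-frame feature vectors into one vector by averaging over frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    dim = len(frames[0])
    # Element-wise mean: the result has the length of a single frame's features.
    return [sum(frame[i] for frame in frames) / n_frames for i in range(dim)]
```

With 30 frames of d features each, the model input shrinks from 30 × d values to d values, at the cost of discarding variation between frames.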

In a second aspect, a method for determining a code rate provided by an embodiment of the present disclosure will be described in detail.

As shown in fig. 5, the method for determining a code rate according to the embodiment of the present disclosure may include the following steps:

in step S51, feature information of the audio signal to be encoded is acquired.

Specifically, when audio signals are transmitted or stored, they need to be encoded in order to reduce the transmission bandwidth or storage space. The audio signals that are awaiting encoding are referred to as audio signals to be encoded.

In order to accurately obtain the coding rate corresponding to the audio signal to be coded, the feature information of the audio signal to be coded needs to be obtained, so that in the subsequent step, the feature information of the audio signal to be coded can be input into the code rate determination model described in the first aspect, and the coding rate corresponding to the audio signal to be coded is obtained.

In one embodiment, obtaining the feature information of the audio signal to be encoded may include the following steps:

acquiring amplitude information and phase information of the audio signal to be encoded in a time-frequency domain, and determining characteristic information of the audio signal to be encoded according to the amplitude information and/or the phase information.

Specifically, the audio signal to be encoded may be converted to the time-frequency domain by a time-frequency transform, such as the short-time Fourier transform, to obtain a complex signal S(n, k):

S(n, k) = A(n, k) · e^{iθ(n, k)}

where A(n, k) is the amplitude information and θ(n, k) is the phase information.

As one implementation of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, the amplitude information may be used directly as the feature information of the audio signal to be encoded; alternatively, the phase information may be used as the feature information of the audio signal to be encoded; alternatively, both the amplitude information and the phase information may be used as the feature information of the audio signal to be encoded.

As another implementation of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, other feature information of the audio signal to be encoded may be derived by applying preset processing to the amplitude information and/or the phase information. The other feature information may include Mel-frequency cepstral coefficients (MFCC), the Mel spectrogram, spectral contrast, and the like; the embodiments of the present disclosure do not specifically limit these other audio features. In this case, any one or more of the amplitude information, the phase information, and the other feature information may be used as the feature information of the audio signal to be encoded.

In step S52, the feature information of the audio signal to be encoded is input into the code rate determination model described in the first aspect, so as to obtain an encoding code rate corresponding to the audio signal to be encoded, and encode the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.

After obtaining the feature information of the audio signal to be encoded, the feature information of the audio signal to be encoded may be input into the code rate determination model obtained by the training of the first aspect, so as to obtain the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the rate determination model is appropriate, and the audio quality of the coded audio signal can be ensured.

According to the technical scheme provided by the embodiment of the disclosure, the characteristic information of the audio signal to be coded is obtained; inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain the encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the code rate determination model is proper, and the audio quality of the coded audio signal can be ensured, so that the transmission bandwidth during transmission of the coded audio signal and the storage space during storage of the coded audio signal can be saved.
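Steps S51 and S52 amount to a short pipeline, sketched below. All names here (`encode_with_predicted_rate`, `extract_features`, `rate_model`, `encode`) are hypothetical placeholders for the feature extractor, the trained code rate determination model, and the audio encoder; none of them is defined by the disclosure.

```python
from typing import Callable, List, Sequence, Tuple


def encode_with_predicted_rate(
    signal: Sequence[float],
    extract_features: Callable[[Sequence[float]], List[float]],
    rate_model: Callable[[List[float]], float],
    encode: Callable[[Sequence[float], float], bytes],
) -> Tuple[float, bytes]:
    """Predict a coding rate from the signal's features, then encode at that rate."""
    # Step S51: obtain the feature information of the audio signal to be encoded.
    features = extract_features(signal)
    # Step S52: the trained model maps the features to a coding rate ...
    bitrate = rate_model(features)
    # ... and the signal is encoded at that rate.
    return bitrate, encode(signal, bitrate)
```

Passing the three stages in as callables keeps the sketch independent of any particular feature set, model type, or codec.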

In one embodiment, acquiring the feature information of the audio signal to be encoded may include the following step:

characteristic information of each frame signal of the audio signal to be encoded is acquired.

In this embodiment, the feature information of each frame of signal of the audio signal to be encoded may be obtained, so that, in the subsequent step, by inputting the feature information of each frame of signal of the audio signal to be encoded into the code rate determination model, the accuracy of the obtained code rate corresponding to the audio signal to be encoded is relatively higher.

In another embodiment, obtaining the feature information of the audio signal to be encoded may include the following steps:

acquiring the feature information of each frame of the audio signal to be encoded, and taking the average of the feature information of the frames as the feature information of the audio signal to be encoded.

In practical applications, in order to reduce the workload of acquiring the feature information of the audio signal to be encoded, the dimensionality of that feature information may be reduced.

For example, if an audio signal to be encoded consists of 30 consecutive frames, the feature information of the 30 frames may be averaged to obtain feature information with the length of a single frame, and this averaged feature information is taken as the feature information of the audio signal to be encoded.

According to a third aspect of the embodiments of the present disclosure, there is provided a code rate determination model training apparatus, as shown in fig. 6, the apparatus includes:

an audio signal acquisition module 610 configured to perform acquisition of an audio sample data set, where the audio sample data set includes audio signals of different types;

An information and code rate obtaining module 620, configured to perform obtaining feature information of each audio signal and a target coding rate corresponding to the audio signal, where the feature information is associated with a type of the audio signal, and the target coding rate is a lowest coding rate at which the audio signal meets a target audio quality;

a coding rate obtaining module 630, configured to input the obtained feature information into a to-be-trained code rate determination model, so as to obtain a coding rate output by the to-be-trained code rate determination model;

a loss value obtaining module 640, configured to obtain a loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;

and the model parameter adjusting module 650 is configured to perform adjustment of the model parameters of the to-be-trained code rate determination model according to the loss value until the loss value is lower than a preset threshold, and use the to-be-trained code rate determination model as the trained code rate determination model.
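The interplay of modules 630 to 650 can be illustrated with a deliberately small training loop. This is a sketch under stated assumptions: a linear model, a mean squared error loss, and plain gradient descent stand in for whatever model structure, loss function, and optimizer are actually used, which this passage does not specify. The function name `train_rate_model` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_rate_model(
    features: Sequence[Sequence[float]],
    target_rates: Sequence[float],
    loss_threshold: float = 1e-4,
    learning_rate: float = 0.05,
    max_epochs: int = 10000,
) -> Tuple[List[float], float, float]:
    """Fit rate = w . x + b by gradient descent; stop once the loss is below the threshold."""
    dim = len(features[0])
    n = len(features)
    weights, bias = [0.0] * dim, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        # Module 630: coding rate output by the model being trained.
        outputs = [sum(w * x for w, x in zip(weights, row)) + bias for row in features]
        errors = [out - tgt for out, tgt in zip(outputs, target_rates)]
        # Module 640: loss value against the target coding rates (MSE is an assumption).
        loss = sum(e * e for e in errors) / n
        if loss < loss_threshold:
            break  # training is considered complete
        # Module 650: adjust the model parameters along the negative gradient.
        for i in range(dim):
            grad_i = 2.0 * sum(e * row[i] for e, row in zip(errors, features)) / n
            weights[i] -= learning_rate * grad_i
        bias -= learning_rate * 2.0 * sum(errors) / n
    return weights, bias, loss
```

The `max_epochs` cap is a safeguard added for the sketch so that the loop terminates even if the loss never drops below the threshold.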

Optionally, the information and code rate obtaining module includes:

and taking the previous reduced preset code rate as a target coding code rate.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a code rate determining apparatus, as shown in fig. 7, the apparatus including:

a feature information obtaining module 710 configured to perform obtaining feature information of an audio signal to be encoded;

the encoding rate determining module 720 is configured to execute the step of inputting the feature information of the audio signal to be encoded into the rate determining model described in the third aspect to obtain the encoding rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding rate corresponding to the audio signal to be encoded.

According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, as shown in fig. 8, including:

a processor 810;

a memory 820 for storing the processor-executable instructions;

According to a sixth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus, as shown in fig. 9, including:

a processor 910;

a memory 920 for storing the processor-executable instructions;

Fig. 10 is a block diagram illustrating an apparatus 1000 for training a coding rate determination model, or determining a coding rate, according to an example embodiment. For example, the apparatus 1000 may be provided as a server. Referring to fig. 10, the apparatus 1000 includes a processing component 1022 that further includes one or more processors and memory resources, represented by memory 1032, for storing instructions, such as application programs, that are executable by the processing component 1022. The application programs stored in memory 1032 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 1022 is configured to execute instructions to perform the code rate determination model training method of the first aspect, or the code rate determination method of the second aspect.

The apparatus 1000 may also include a power supply component 1026 configured to perform power management for the apparatus 1000, a wired or wireless network interface 1050 configured to connect the apparatus 1000 to a network, and an input/output (I/O) interface 1058. The apparatus 1000 may operate based on an operating system stored in memory 1032, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Fig. 11 is a block diagram illustrating an apparatus for training a coding rate determination model, or an apparatus 1100 for determining a coding rate according to an example embodiment. For example, the apparatus 1100 may be a mobile phone, a computer, a digital broadcast electronic device, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 11, apparatus 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.

The processing component 1102 generally controls the overall operation of the device 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 1102 may include one or more processors 1120 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.

The memory 1104 is configured to store various types of data to support operation at the device 1100. Examples of such data include instructions for any application or method operating on device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 1106 provides power to the various components of the device 1100. The power component 1106 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 1100.

The multimedia component 1108 includes a screen that provides an output interface between the device 1100 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1108 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 1100 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a microphone (MIC) configured to receive external audio signals when the apparatus 1100 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio component 1110 further includes a speaker for outputting audio signals.

The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 1114 includes one or more sensors for providing various aspects of state assessment for the apparatus 1100. For example, the sensor assembly 1114 may detect an open/closed state of the apparatus 1100 and the relative positioning of components, such as a display and keypad of the apparatus 1100. The sensor assembly 1114 may also detect a change in position of the apparatus 1100 or a component of the apparatus 1100, the presence or absence of user contact with the apparatus 1100, an orientation or acceleration/deceleration of the apparatus 1100, and a change in temperature of the apparatus 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 1116 is configured to facilitate wired or wireless communication between the apparatus 1100 and other devices. The apparatus 1100 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1116 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1116 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the code rate determination model training method of the first aspect, or the code rate determination method of the second aspect.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 1104 comprising instructions that are executable by the processor 1120 of the apparatus 1100 to perform the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for rate-determining model training, the method comprising:

2. The method of claim 1, wherein obtaining the target coding rate corresponding to the audio signal comprises:

3. The method of claim 2, wherein when the quality loss value is smaller than a quality loss threshold and the quality loss value is a minimum quality loss value, determining the preset code rate as a target coding code rate corresponding to the audio signal comprises:

and taking the previous reduced preset code rate as a target coding code rate.

4. A method for determining a code rate, the method comprising:

acquiring characteristic information of an audio signal to be encoded;

inputting the characteristic information of the audio signal to be encoded into the code rate determination model of any one of claims 1 to 3 to obtain the encoding code rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.

5. An apparatus for rate-determining model training, the apparatus comprising:

6. An apparatus for determining a code rate, the apparatus comprising:

an encoding rate determination module, configured to input the feature information of the audio signal to be encoded into the rate determination model of claim 5, to obtain an encoding rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding rate corresponding to the audio signal to be encoded.

7. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the code rate determination model training method of any of claims 1 to 3.

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the code rate determination method of claim 4.

9. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination model training method of any of claims 1 to 3.

10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of claim 4.