CN111862995A - Code rate determination model training method, code rate determination method and device - Google Patents

Info

Publication number
CN111862995A
Authority
CN
China
Prior art keywords
code rate
audio signal
audio
encoded
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010575623.7A
Other languages
Chinese (zh)
Inventor
郑羲光
董培
张晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010575623.7A
Publication of CN111862995A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The disclosure relates to a code rate determination model training method, a code rate determination method and a device. The training method comprises the following steps: acquiring an audio sample data set; acquiring characteristic information of each audio signal and a target coding rate corresponding to the audio signal; inputting the acquired characteristic information into a code rate determination model to be trained to obtain a coding rate output by the code rate determination model to be trained; obtaining a loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate; and adjusting model parameters of the code rate determination model to be trained according to the loss value until the loss value is lower than a preset threshold value, and then taking the code rate determination model to be trained as the trained code rate determination model. Therefore, when an audio signal to be coded is subsequently coded, the code rate determination model can provide a coding rate of suitable size, and the audio quality of the coded audio signal can be ensured.

Description

Code rate determination model training method, code rate determination method and device
Technical Field
The application relates to the technical field of audio and video, in particular to a code rate determination model training method, a code rate determination method and a code rate determination device.
Background
With the development of the mobile internet, more and more users need to use audio on terminals, and in order to save transmission resources and storage resources, audio signals need to be encoded for transmission and storage. Audio coding techniques can be classified into lossless coding and lossy coding. With lossless coding, a terminal can perfectly restore the original audio signal through a decoder; with lossy coding, the audio signal decoded by the terminal through the decoder differs from the original to varying degrees, because information is discarded during compression.
In the related art, when an audio signal is encoded, a code rate is usually specified, an encoder may encode according to the specified code rate, and in order to ensure the quality of the encoded audio signal, a high code rate is usually specified to encode the audio signal.
Thus, a higher transmission bandwidth may be required when transmitting the encoded audio signal; in addition, when the encoded audio signal is stored, a large storage space is required, which results in waste of transmission resources and storage resources.
Disclosure of Invention
In order to solve the technical problem that transmission resources and storage resources are wasted when the coded audio signal is transmitted and stored due to high coding rate of the audio signal in the related art, the present disclosure provides a code rate determination model training method, a code rate determination method and a device, and the technical scheme of the present disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a method for training a code rate determination model, the method including:
acquiring an audio sample data set, wherein the audio sample data set comprises different types of audio signals;
acquiring characteristic information of each audio signal and a target coding rate corresponding to the audio signal, wherein the characteristic information is associated with the type of the audio signal, and the target coding rate is the lowest coding rate when the audio signal meets the target audio quality;
inputting the acquired characteristic information into a code rate determination model to be trained to obtain a coding code rate output by the code rate determination model to be trained;
obtaining the loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;
and adjusting model parameters of the code rate determination model to be trained according to the loss value until the loss value is lower than a preset threshold value, and taking the code rate determination model to be trained as the trained code rate determination model.
Optionally, obtaining the target coding rate corresponding to the audio signal includes:
encoding the audio signal according to a preset code rate to obtain an encoded audio signal;
calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;
and when the quality loss value is smaller than a quality loss threshold value and the quality loss value is the minimum quality loss value, determining the preset code rate as the target coding code rate corresponding to the audio signal.
Optionally, when the quality loss value is smaller than a quality loss threshold and the quality loss value is a minimum quality loss value, determining the preset code rate as a target coding code rate corresponding to the audio signal, including:
when the quality loss value is smaller than a quality loss threshold, reducing the preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;
and taking the previous reduced preset code rate as a target coding code rate.
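The optional search described above (reduce the preset code rate step by step until the quality loss exceeds the threshold, then keep the previous rate) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `find_target_rate`, the `quality_loss_at` callback, the candidate rate list, and the toy loss curve are all assumptions introduced here.

```python
from typing import Callable, Optional, Sequence


def find_target_rate(
    quality_loss_at: Callable[[int], float],
    candidate_rates: Sequence[int],
    loss_threshold: float,
) -> Optional[int]:
    """Return the lowest candidate rate whose quality loss stays below the threshold.

    quality_loss_at(rate) is a hypothetical callback that encodes the signal at
    `rate` and returns the measured quality loss of the result.
    """
    target = None
    # Walk from the highest preset rate downwards.
    for rate in sorted(candidate_rates, reverse=True):
        if quality_loss_at(rate) < loss_threshold:
            target = rate      # still acceptable: remember it and keep reducing
        else:
            break              # threshold reached: the previous rate is the target
    return target


# Toy stand-in for "encode, then measure": the loss shrinks as the rate grows.
def toy_loss(rate_kbps: int) -> float:
    return 8.0 / rate_kbps


rates = [128, 96, 64, 48, 32, 24, 16]
target = find_target_rate(toy_loss, rates, loss_threshold=0.2)
```

The function returns `None` when even the highest candidate rate fails the threshold, which leaves the caller to decide whether to raise the initial rate or relax the quality requirement.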
Optionally, the acquiring the feature information of each audio signal includes:
and acquiring amplitude information and phase information of each audio signal in a time-frequency domain, and determining characteristic information of the audio signal according to the amplitude information and/or the phase information.
Optionally, the obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal includes:
acquiring characteristic information of each frame of signal of each audio signal and a target coding rate corresponding to each frame of signal of each audio signal;
or acquiring the characteristic information of each frame signal in each audio signal, taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal, and acquiring the target coding rate corresponding to the characteristic information of the audio signal.
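The two alternatives above differ only in the granularity of a training example: one example per frame, or one example per signal built from the averaged frame features. A small sketch of both, assuming the frame features are already available as a matrix with one row per frame (the function names and the sample values are illustrative only):

```python
import numpy as np


def per_frame_examples(frame_features, frame_targets):
    """Option 1: one training example per frame."""
    frame_features = np.asarray(frame_features, dtype=float)
    frame_targets = np.asarray(frame_targets, dtype=float)
    if len(frame_features) != len(frame_targets):
        raise ValueError("need exactly one target rate per frame")
    return frame_features, frame_targets


def per_signal_example(frame_features, signal_target):
    """Option 2: average the frame features into one vector per signal."""
    frame_features = np.asarray(frame_features, dtype=float)
    return frame_features.mean(axis=0), float(signal_target)


# Three frames with four (hypothetical) feature values each.
frames = np.array([[1.0, 2.0, 3.0, 4.0],
                   [3.0, 2.0, 1.0, 0.0],
                   [2.0, 2.0, 2.0, 2.0]])
x_frames, y_frames = per_frame_examples(frames, [32, 48, 32])
x_signal, y_signal = per_signal_example(frames, 48)
```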
According to a second aspect of the embodiments of the present disclosure, there is provided a code rate determining method, the method including:
acquiring characteristic information of an audio signal to be encoded;
inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain an encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.
Optionally, the obtaining the feature information of the audio signal to be encoded includes:
acquiring amplitude information and phase information of the audio signal to be coded in a time-frequency domain, and determining characteristic information of the audio signal to be coded according to the amplitude information and/or the phase information.
Optionally, the obtaining the feature information of the audio signal to be encoded includes:
acquiring characteristic information of each frame of signal of an audio signal to be coded;
or acquiring the characteristic information of each frame signal in the audio signal to be coded, and taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal to be coded.
According to a third aspect of the embodiments of the present disclosure, there is provided a code rate determination model training apparatus, the apparatus including:
the audio signal acquisition module is configured to execute acquisition of an audio sample data set, wherein the audio sample data set comprises different types of audio signals;
an information and code rate obtaining module configured to perform obtaining feature information of each audio signal and a target coding rate corresponding to the audio signal, where the feature information is associated with a type of the audio signal, and the target coding rate is a lowest coding rate at which the audio signal meets a target audio quality;
the code rate obtaining module is configured to input the obtained characteristic information into a code rate determination model to be trained to obtain a code rate output by the code rate determination model to be trained;
the loss value obtaining module is configured to obtain the loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;
and the model parameter adjusting module is configured to adjust the model parameters of the code rate determining model to be trained according to the loss value until the loss value is lower than a preset threshold value, and the code rate determining model to be trained is used as the trained code rate determining model.
Optionally, the information and code rate obtaining module includes:
an audio signal encoding unit configured to perform encoding of the audio signal according to a preset code rate to obtain an encoded audio signal;
a quality loss value calculation unit configured to perform calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;
and the target coding rate determining unit is configured to determine the preset coding rate as the target coding rate corresponding to the audio signal when the quality loss value is smaller than a quality loss threshold and the quality loss value is the minimum quality loss value.
Optionally, the target coding rate determining unit is specifically configured to perform:
when the quality loss value is smaller than a quality loss threshold, reducing the preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;
and taking the previous reduced preset code rate as a target coding code rate.
Optionally, the information and code rate obtaining module is specifically configured to perform:
and acquiring amplitude information and phase information of each audio signal in a time-frequency domain, and determining characteristic information of the audio signal according to the amplitude information and/or the phase information.
Optionally, the information and code rate obtaining module is specifically configured to perform:
acquiring characteristic information of each frame of signal of each audio signal and a target coding rate corresponding to each frame of signal of each audio signal;
or acquiring the characteristic information of each frame signal in each audio signal, taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal, and acquiring the target coding rate corresponding to the characteristic information of the audio signal.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a code rate determining apparatus, the apparatus including:
a feature information acquisition module configured to perform acquisition of feature information of an audio signal to be encoded;
and the coding rate determination module is configured to execute the step of inputting the characteristic information of the audio signal to be coded into the rate determination model in the third aspect to obtain a coding rate corresponding to the audio signal to be coded, so as to code the audio signal to be coded according to the coding rate corresponding to the audio signal to be coded.
Optionally, the characteristic information obtaining module is specifically configured to perform:
acquiring amplitude information and phase information of the audio signal to be coded in a time-frequency domain, and determining characteristic information of the audio signal to be coded according to the amplitude information and/or the phase information.
Optionally, the characteristic information obtaining module is specifically configured to perform:
acquiring characteristic information of each frame of signal of an audio signal to be coded;
or acquiring the characteristic information of each frame signal in the audio signal to be coded, and taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal to be coded.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the code rate determination model training method of the first aspect.
According to a sixth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the code rate determination method according to the second aspect.
According to a seventh aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination model training method according to the first aspect.
According to an eighth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of the second aspect.
According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to implement the code rate determining model training method of the first aspect.
According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions that, when run on a computer, cause the computer to implement the code rate determination method of the second aspect.
Therefore, according to the technical scheme provided by the embodiment of the disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. In the subsequent steps, when audio data to be coded is coded, the code rate determination model can therefore obtain a coding rate of suitable size, and the audio quality of the coded audio data can be ensured. Unlike the related art, in which a high coding rate is specified, this saves transmission bandwidth when the coded audio data is transmitted and storage space when it is stored.
Drawings
FIG. 1 is a flow diagram illustrating a code rate determination model training method in accordance with an exemplary embodiment;
FIG. 2 is a diagram illustrating a code rate determination model training process according to an example embodiment;
FIG. 3 is a flowchart illustrating obtaining a target encoding rate for an audio signal according to an example embodiment;
FIG. 4 is a diagram illustrating a process of obtaining a target coding rate for an audio signal according to an example embodiment;
FIG. 5 is a flow diagram illustrating a method for code rate determination in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a code rate determination model training apparatus in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating a code rate determination apparatus according to an example embodiment;
FIG. 8 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
FIG. 9 is a block diagram illustrating another electronic device in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating a code rate determination model training apparatus or a code rate determination apparatus according to an example embodiment;
FIG. 11 is a block diagram illustrating another code rate determination model training apparatus or code rate determination apparatus according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to solve the technical problem that transmission resources and storage resources are wasted when the coded audio signal is transmitted and stored due to the fact that the coding rate of the audio signal is high in the related art, the disclosure provides a code rate determination model training method, a code rate determination method and a device.
In a first aspect, a method for training a code rate determination model provided in an embodiment of the present disclosure will be described in detail.
As shown in fig. 1, a method for training a code rate determination model according to an embodiment of the present disclosure may include the following steps:
In step S11, an audio sample data set is acquired.
Wherein, the audio sample data set comprises different types of audio signals.
Specifically, when the code rate determination model is trained, a large amount of sample data needs to be acquired, that is, an audio sample data set needs to be acquired. Also, different types of audio signals may be included in the set of audio sample data.
For example, the audio sample data set may include different types of audio signals, such as a speech signal, a music signal, and a background environment sound signal, and the type of the audio signal included in the audio sample data set is not specifically limited in the embodiments of the present disclosure.
In step S12, the feature information of each audio signal and the target coding rate corresponding to the audio signal are obtained.
The characteristic information of the audio signal is associated with the type of the audio signal, and the target coding rate is the lowest coding rate when the audio signal meets the target audio quality.
Specifically, after the audio sample data set is obtained, the feature information and the target coding rate of each audio signal in the audio sample data set may be obtained.
The characteristic information of an audio signal is associated with the type of audio signal, and is usually different for different types of audio signals. Specifically, when the type of the audio signal is a voice signal, the feature information of the audio signal is the feature information of the voice signal; when the type of the audio signal is a music signal, the characteristic information of the audio signal is the characteristic information of the music signal; similarly, when the type of the audio signal is a background environment sound signal, the characteristic information of the audio signal is the characteristic information of the background environment sound signal. The characteristic information of the audio signal may be amplitude information, phase information, and the like of the audio signal in a time-frequency domain, and the embodiment of the present disclosure does not specifically limit the characteristic information of the audio signal.
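As an illustration of amplitude and phase information in the time-frequency domain, the sketch below computes a short-time Fourier transform with NumPy and splits it into magnitude and phase. The frame length, hop size, Hann window, and the 1 kHz test tone are assumptions chosen for the example; the disclosure does not prescribe a particular transform or parameters.

```python
import numpy as np


def stft_magnitude_phase(signal, frame_len=512, hop=256):
    """Return (magnitude, phase), each shaped (n_frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # one complex spectrum per frame
    return np.abs(spectrum), np.angle(spectrum)


# One second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
magnitude, phase = stft_magnitude_phase(tone)
# Frequency of the strongest bin, averaged over all frames.
peak_hz = magnitude.mean(axis=0).argmax() * sample_rate / 512
```

Either array, or both, could then serve as the characteristic information fed to the model, matching the "and/or" wording used in the disclosure.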
In addition, in order to ensure that the transmission bandwidth of the encoded audio signal during transmission and the storage space of the encoded audio signal during storage can be saved as much as possible when the encoded audio signal meets the specified audio quality, the target encoding rate is required to be the lowest encoding rate when the audio signal meets the target audio quality. The target audio quality may be a designated audio quality, and the designated audio quality may be determined according to actual conditions, for example, for an audio signal of which the type is background environmental sound, the designated audio quality may be lower; for audio signals of the type music signals, the specified audio quality may be higher. The size of the target audio quality is not particularly limited in the embodiments of the present disclosure.
For clarity of the description of the scheme, the following embodiments will explain in detail specific implementations of obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal.
In step S13, the obtained feature information is input into the code rate determination model to be trained, so as to obtain the coding code rate output by the code rate determination model to be trained.
After the feature information of the audio signal and the target coding rate corresponding to the audio signal are obtained, the code rate determination model may be trained. Specifically, the obtained feature information of the audio signal may be input to the code rate determination model to be trained, and the coding code rate of the audio signal may be output from the code rate determination model to be trained.
In step S14, the loss value of the code rate determination model to be trained is obtained according to the code rate output by the code rate determination model to be trained and the target code rate.
Specifically, since the target output of the code rate determination model is the target coding rate, once the coding rate output by the code rate determination model to be trained has been obtained, the loss value of the code rate determination model to be trained can be calculated from that output coding rate and the target coding rate.
It can be understood that the loss value of the code rate determination model to be trained can be used to characterize the magnitude of the difference between the coding code rate output from the code rate determination model to be trained and the target coding code rate. If the loss value of the code rate determination model to be trained is larger, the difference value between the coding code rate output by the code rate determination model to be trained and the target coding code rate is larger; and if the loss value of the code rate determination model to be trained is smaller, the difference value between the coding code rate output by the code rate determination model to be trained and the target coding code rate is smaller.
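The disclosure does not name a specific loss function. As one possible choice, the sketch below uses the mean squared difference between the predicted and target coding rates, which grows as the two diverge and shrinks as they approach each other. The function name and the sample rates are illustrative assumptions.

```python
import numpy as np


def rate_loss(predicted_rates, target_rates):
    """Mean squared difference between predicted and target coding rates."""
    predicted = np.asarray(predicted_rates, dtype=float)
    target = np.asarray(target_rates, dtype=float)
    return float(np.mean((predicted - target) ** 2))


targets = [32.0, 48.0, 64.0]
far_loss = rate_loss([64.0, 64.0, 64.0], targets)    # predictions far from targets
near_loss = rate_loss([33.0, 47.0, 64.0], targets)   # predictions close to targets
```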
In step S15, the model parameters of the code rate determination model to be trained are adjusted according to the loss value until the loss value is lower than the preset threshold value, and the code rate determination model to be trained is then used as the trained code rate determination model.
Specifically, if the loss value of the to-be-trained code rate determination model is large, it indicates that the difference between the coding code rate output from the to-be-trained code rate determination model and the target coding code rate is large, and in order to make the coding code rate output from the to-be-trained code rate determination model approach the target coding code rate, the model parameters of the to-be-trained code rate determination model may be adjusted.
After the model parameters of the code rate determination model to be trained are adjusted, the characteristic information of the audio signal is input into the code rate determination model to be trained again, a new coding rate output by the model is obtained, and the loss value of the code rate determination model to be trained is calculated again from the newly obtained coding rate and the target coding rate. When the loss value is smaller than the preset threshold value, the coding rate output by the code rate determination model to be trained is close to the target coding rate, and at this point the code rate determination model to be trained can be used as the trained code rate determination model.
It should be noted that the preset threshold may be determined according to actual conditions, and the size of the preset threshold is not specifically limited in the embodiment of the present disclosure.
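Steps S13 to S15 form a loop: predict, compute the loss, adjust the parameters, and stop once the loss falls below the preset threshold. The following sketch shows that loop with a deliberately simple linear model trained by gradient descent on synthetic data. The linear model, the mean squared error loss, the learning rate, and the data are all assumptions for illustration; the disclosure itself describes a neural network and leaves these details open.

```python
import numpy as np


def train_rate_model(features, target_rates, loss_threshold=1e-3,
                     learning_rate=0.1, max_steps=10000):
    """Fit rate = features @ w + b by gradient descent on the mean squared error."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(target_rates, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        error = x @ w + b - y                 # model output minus target rate
        loss = float(np.mean(error ** 2))     # loss value of the current model
        if loss < loss_threshold:             # stop once below the preset threshold
            break
        # Adjust the model parameters along the negative gradient of the loss.
        w -= learning_rate * 2.0 * x.T @ error / len(y)
        b -= learning_rate * 2.0 * error.mean()
    return w, b, loss


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))                      # synthetic feature vectors
targets = features @ np.array([4.0, -2.0, 1.0]) + 48.0    # synthetic target rates
w, b, final_loss = train_rate_model(features, targets)
```

The `max_steps` cap is a practical safeguard added here so that the loop terminates even if the threshold is never reached.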
Therefore, according to the technical scheme provided by the embodiment of the disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. In the subsequent steps, when audio data to be coded is coded, the code rate determination model can therefore obtain a coding rate of suitable size, and the audio quality of the coded audio data can be ensured. Unlike the related art, in which a high coding rate is specified, this saves transmission bandwidth when the coded audio data is transmitted and storage space when it is stored.
In order to more intuitively and clearly understand the training process of the code rate determination model, the following describes the training process of the code rate determination model with reference to a specific example, as shown in fig. 2.
When a code rate determination model is trained, an audio signal is first obtained. Features of the audio signal are extracted, and the coding rate of the audio signal is estimated based on comprehensive analysis, that is, the target coding rate corresponding to the audio signal is obtained; this target coding rate is the target output of the code rate determination model.
A neural network, namely the code rate determination model to be trained, is then trained on the extracted features and the coding rate of the audio signal. The parameters of the neural network are continuously optimized during training; when the loss value of the neural network is smaller than a preset threshold value, the optimal parameters of the neural network are obtained, and the trained neural network is determined as the trained code rate determination model.
For clarity of the description of the scheme, a detailed description will be given below of a specific implementation of obtaining a target coding rate corresponding to an audio signal.
In one embodiment, obtaining the target coding rate corresponding to the audio signal, as shown in fig. 3, may include the following steps:
In step S31, the audio signal is encoded according to a preset code rate to obtain an encoded audio signal.
Specifically, when the target coding rate of the audio signal is determined, the audio signal may be coded according to a predetermined preset coding rate, so as to obtain the coded audio signal. The number of the preset code rates may be multiple, and specifically, the multiple preset code rates may be an initial code rate with a predetermined larger value and a code rate obtained by reducing the initial code rate.
It should be noted that the size of the preset code rate may be set according to an actual situation, and the size of the preset code rate is not specifically limited in the embodiment of the present disclosure.
In step S32, a quality loss value of the encoded audio signal is calculated from the audio signal and the encoded audio signal.
Specifically, after obtaining the encoded audio signal, the audio quality of the encoded audio signal may be obtained, and the quality loss value of the encoded audio signal may be determined according to the audio quality of the encoded audio signal and the signal quality of the audio signal before encoding. The quality loss value may be used to measure the degree of loss of audio quality in the encoded audio signal compared to the audio signal before encoding.
The quality loss value of the encoded audio signal may be calculated as follows: an audio quality evaluation method is used to compare the audio quality of the encoded audio signal with the audio quality of the audio signal before encoding, yielding the quality loss value of the encoded audio signal.
The audio quality evaluation method may be an objective evaluation method or a subjective evaluation method. For example, the objective evaluation method may be PEAQ (Perceptual Evaluation of Audio Quality), and the subjective evaluation method may be MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). The audio quality evaluation method is not specifically limited in the embodiment of the present disclosure. For example, when PEAQ is used to compare the audio quality of the encoded audio signal with the audio quality of the audio signal before encoding, the resulting quality loss value of the encoded audio signal may be 0.1.
In step S33, when the quality loss value is smaller than the quality loss threshold and the quality loss value is the minimum quality loss value, the preset code rate is determined as the target coding rate corresponding to the audio signal.
Wherein the quality loss threshold may be a magnitude of a difference between the audio quality of the unencoded audio signal and the target audio quality.
Specifically, if the audio signal is encoded by using a preset code rate, the quality loss value of the encoded audio signal is smaller than the quality loss threshold, and the quality loss value is the minimum quality loss value, it indicates that after the audio signal is encoded by using the preset code rate, the audio quality of the encoded audio signal can meet the target audio quality, and the audio quality of the encoded audio signal just meets the target audio quality, that is, the preset code rate is the lowest encoding code rate when the target audio quality is met. Therefore, the preset code rate can be determined as the target coding rate corresponding to the audio signal.
As an implementation manner of the embodiment of the present disclosure, when the quality loss value is smaller than the quality loss threshold and the quality loss value is the minimum quality loss value, determining the preset code rate as the target coding code rate corresponding to the audio signal may include the following steps:
When the quality loss value is smaller than the quality loss threshold, reducing a preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;
and taking the previous reduced preset code rate as a target coding code rate.
Specifically, when determining the target coding rate of the audio signal, the audio signal may be encoded according to a larger preset coding rate, that is, the initial coding rate. The initial coding rate may be a coding rate that can ensure the audio quality of the encoded audio signal to the greatest extent.
After the audio signal is encoded according to the initial coding rate, the quality loss value of the encoded audio signal may be compared with the quality loss threshold. If the quality loss value is smaller than the quality loss threshold, the quality loss is small, and the preset code rate can be reduced further while the target audio quality is still met. The audio signal is then re-encoded according to the reduced preset code rate, and the quality loss value of the re-encoded audio signal is compared with the quality loss threshold. If it is still smaller than the quality loss threshold, the preset code rate continues to be reduced, until the quality loss value of the encoded audio signal is larger than the quality loss threshold. At that point, the previous reduced preset code rate is the lowest coding rate at which the encoded audio signal meets the target audio quality, and it is therefore taken as the target coding rate.
It can be seen that the target coding rate determined by the implementation manner is the lowest coding rate when the encoded audio signal meets the target audio quality.
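The descending search described above can be sketched as follows. This is a minimal sketch under stated assumptions: `encode` and `quality_loss` are hypothetical callables supplied by the caller (for example an audio encoder and a PEAQ-style evaluator), the code rate is lowered by a fixed `step`, and a `min_rate` floor is added so that the loop always terminates. None of these names or choices come from the disclosure, which also does not say how a loss exactly equal to the threshold is handled; here it is treated as not meeting the threshold.

```python
from typing import Any, Callable, Optional


def find_target_code_rate(
    signal: Any,
    encode: Callable[[Any, int], Any],          # hypothetical: (signal, rate) -> encoded signal
    quality_loss: Callable[[Any, Any], float],  # hypothetical: (signal, encoded) -> loss value
    loss_threshold: float,
    initial_rate: int,
    step: int,
    min_rate: int,
) -> Optional[int]:
    """Return the lowest tried code rate whose quality loss stays below the threshold.

    Returns None if even the initial code rate does not meet the threshold.
    """
    rate = initial_rate
    last_ok: Optional[int] = None               # the "previous reduced preset code rate"

    while rate >= min_rate:
        encoded = encode(signal, rate)
        loss = quality_loss(signal, encoded)
        if loss >= loss_threshold:              # threshold no longer met: stop searching
            break
        last_ok = rate                          # this rate still meets the target quality
        rate -= step                            # reduce the preset code rate and retry

    return last_ok
```

With a stand-in loss of `100 / rate`, a threshold of 2.0, an initial rate of 128, and a step of 16, the search accepts 128, 112, 96, 80, and 64, rejects 48, and returns 64.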
In order to more intuitively and clearly understand the process of obtaining the target coding rate of the audio signal, the following describes a process of obtaining the target coding rate of the audio signal with reference to a specific example, as shown in fig. 4.
First, an audio encoder encodes the audio signal according to an initial code rate, and a quality loss evaluation is performed on the encoded audio signal.
Second, it is determined whether the quality loss value is above a threshold. If not, that is, the quality loss value is below the threshold, the code rate is updated, that is, the initial code rate is reduced.
Third, the audio encoder encodes the audio signal according to the updated code rate to obtain the encoded audio signal again, and the quality loss of the encoded audio signal is evaluated.
Fourth, it is determined again whether the quality loss value is above the threshold. If not, that is, the quality loss value is below the threshold, the code rate is updated again, that is, the current code rate is further reduced, and the audio encoder encodes the audio signal according to the updated code rate. This repeats until the determination is yes, that is, until the quality loss value is above the threshold. The last code rate that satisfied the threshold, that is, the last code rate whose quality loss value was below the threshold, is then output as the target coding rate.
For clarity of description of the scheme, a detailed description of a specific embodiment for acquiring the feature information of each audio signal will be described below.
In one embodiment, obtaining the feature information of each audio signal may include:
and acquiring amplitude information and phase information of each audio signal in a time-frequency domain, and determining characteristic information of the audio signal according to the amplitude information and/or the phase information.
Specifically, the audio signal may be converted to the time-frequency domain by using a time-frequency conversion method, such as the short-time Fourier transform, to obtain a complex signal $S(n,k)$:
$$S(n,k) = A(n,k)\,e^{i\theta(n,k)}$$
where $A(n,k)$ is the amplitude information and $\theta(n,k)$ is the phase information.
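The decomposition of S(n, k) into amplitude A(n, k) and phase θ(n, k) can be illustrated with a short NumPy sketch. It assumes that n indexes frames and k indexes frequency bins, and it uses a Hann window with a frame length of 512 samples and a hop of 256 samples; these parameters and the function name `stft_amplitude_phase` are illustrative choices, not values from the disclosure.

```python
import numpy as np


def stft_amplitude_phase(signal, frame_len=512, hop=256):
    """Return (A, theta), each of shape (num_frames, frame_len // 2 + 1).

    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop

    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])

    spectrum = np.fft.rfft(frames, axis=1)   # complex S(n, k)
    amplitude = np.abs(spectrum)             # A(n, k)
    phase = np.angle(spectrum)               # theta(n, k), in radians
    return amplitude, phase
```

Multiplying `amplitude` by `np.exp(1j * phase)` recovers the complex spectrum, which is the relation stated in the formula.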
Moreover, as an implementation manner of the embodiment of the present disclosure, after obtaining the amplitude information and the phase information, the amplitude information may be directly used as the feature information of the audio signal; alternatively, the phase information may be used as characteristic information of the audio signal; alternatively, it is reasonable to use both amplitude information and phase information as the characteristic information of the audio signal.
As another implementation manner of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, other feature information of the audio signal may be obtained by performing preset processing on the amplitude information and/or the phase information. The other feature information may include MFCCs (Mel-Frequency Cepstral Coefficients), the mel spectrogram, spectral contrast, and the like; the other audio features are not specifically limited in the embodiment of the present disclosure. In this case, any one or more of the amplitude information, the phase information, and the other feature information may reasonably be used as the feature information of the audio signal.
Therefore, the technical scheme provided by the embodiment can accurately show the characteristic information of the audio signal.
For clarity of the description of the scheme, a detailed description will be given below of a specific implementation for obtaining the feature information of each audio signal and the target coding rate corresponding to the audio signal.
In one embodiment, obtaining the characteristic information of each audio signal and the target coding rate corresponding to the audio signal may include the following steps:
and acquiring the characteristic information of each frame of signal of each audio signal and the target coding rate corresponding to each frame of signal of each audio signal.
In this embodiment, when the code rate determination model is trained, the feature information of each frame of each audio signal and the corresponding coding rate may be obtained. This provides more training data for the code rate determination model, so the accuracy of the trained code rate determination model is higher.
Therefore, by the technical scheme of the embodiment, the accuracy of the trained code rate determination model is high.
In another embodiment, obtaining the characteristic information of each audio signal and the target coding rate corresponding to the audio signal may include the following steps:
The method comprises the steps of obtaining the characteristic information of each frame of signal in each audio signal, taking the average value of the characteristic information of each frame of signal as the characteristic information of the audio signal, and obtaining the target coding rate corresponding to the characteristic information of the audio signal.
In practical applications, in order to reduce the amount of computation required to train the code rate determination model, the dimensionality of the feature information of the audio signal may be reduced.
For example, if an audio signal consists of 30 consecutive frames, the feature information of the 30 frames may be averaged to obtain feature information with the length of a single frame. This averaged feature information is determined as the feature information of the audio signal, and the target coding rate is the coding rate corresponding to this feature information.
Therefore, the technical solution of this embodiment can reduce the amount of computation required to train the code rate determination model.
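The averaging step in the 30-frame example can be sketched as below. The array layout (one row per frame) and the function name `average_frame_features` are assumptions made for illustration.

```python
import numpy as np


def average_frame_features(frame_features):
    """Average per-frame features into a single feature vector.

    frame_features: array of shape (num_frames, feature_dim),
                    for example (30, feature_dim) for a 30-frame signal.
    Returns an array of shape (feature_dim,), the length of one frame's features.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    return frame_features.mean(axis=0)
```

For a 30-frame signal this reduces 30 feature vectors to one, so the model sees one training pair per signal instead of one per frame.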
In a second aspect, a method for determining a code rate provided by an embodiment of the present disclosure will be described in detail.
As shown in fig. 5, the method for determining a code rate according to the embodiment of the present disclosure may include the following steps:
In step S51, feature information of the audio signal to be encoded is acquired.
Specifically, when audio signals are transmitted or stored, in order to reduce transmission bandwidth or storage space, the audio signals need to be encoded, and these audio signals to be encoded may be referred to as audio signals to be encoded.
In order to accurately obtain the coding rate corresponding to the audio signal to be coded, the feature information of the audio signal to be coded needs to be obtained, so that in the subsequent step, the feature information of the audio signal to be coded can be input into the code rate determination model described in the first aspect, and the coding rate corresponding to the audio signal to be coded is obtained.
In one embodiment, obtaining the feature information of the audio signal to be encoded may include the following steps:
acquiring amplitude information and phase information of the audio signal to be encoded in a time-frequency domain, and determining characteristic information of the audio signal to be encoded according to the amplitude information and/or the phase information.
Specifically, the audio signal to be encoded may be converted to the time-frequency domain by using a time-frequency conversion method, such as the short-time Fourier transform, to obtain a complex signal $S(n,k)$:
$$S(n,k) = A(n,k)\,e^{i\theta(n,k)}$$
where $A(n,k)$ is the amplitude information and $\theta(n,k)$ is the phase information.
Moreover, as an implementation manner of the embodiment of the present disclosure, after obtaining the amplitude information and the phase information, the amplitude information may be directly used as the feature information of the audio signal to be encoded; alternatively, the phase information may be used as characteristic information of the audio signal to be encoded; alternatively, it is also possible to use both amplitude information and phase information as characteristic information of the audio signal to be encoded, which is reasonable.
As another implementation manner of the embodiment of the present disclosure, after the amplitude information and the phase information are obtained, other feature information of the audio signal to be encoded may be obtained by performing preset processing on the amplitude information and/or the phase information. The other feature information may include MFCCs (Mel-Frequency Cepstral Coefficients), the mel spectrogram, spectral contrast, and the like; the other audio features are not specifically limited in the embodiment of the present disclosure. In this case, any one or more of the amplitude information, the phase information, and the other feature information may reasonably be used as the feature information of the audio signal to be encoded.
In step S52, the feature information of the audio signal to be encoded is input into the code rate determination model described in the first aspect, so as to obtain an encoding code rate corresponding to the audio signal to be encoded, and encode the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.
After obtaining the feature information of the audio signal to be encoded, the feature information of the audio signal to be encoded may be input into the code rate determination model obtained by the training of the first aspect, so as to obtain the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the rate determination model is appropriate, and the audio quality of the coded audio signal can be ensured.
According to the technical scheme provided by the embodiment of the disclosure, the characteristic information of the audio signal to be coded is obtained; inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain the encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the code rate determination model is proper, and the audio quality of the coded audio signal can be ensured, so that the transmission bandwidth during transmission of the coded audio signal and the storage space during storage of the coded audio signal can be saved.
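Steps S51 and S52 can be sketched end to end as follows. This is an illustration only: it assumes the feature information is the frame-averaged STFT amplitude, and it treats the trained code rate determination model as a hypothetical callable `model` that maps a feature vector to a coding rate. The function names and the frame parameters are not taken from the disclosure.

```python
import numpy as np


def extract_feature(signal, frame_len=512, hop=256):
    """Step S51 (sketch): frame-averaged STFT amplitude as feature information."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    amplitude = np.abs(np.fft.rfft(frames, axis=1))   # A(n, k)
    return amplitude.mean(axis=0)                      # average over frames


def determine_code_rate(signal, model):
    """Step S52 (sketch): feed the feature information to the trained model.

    model is a hypothetical callable: feature vector -> coding rate.
    """
    feature = extract_feature(signal)
    return float(model(feature))
```

The returned value would then be passed to the audio encoder as the coding rate for the signal.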
And, in an embodiment, acquiring the feature information of the audio signal to be encoded may include the steps of:
characteristic information of each frame signal of the audio signal to be encoded is acquired.
In this embodiment, the feature information of each frame of signal of the audio signal to be encoded may be obtained, so that, in the subsequent step, by inputting the feature information of each frame of signal of the audio signal to be encoded into the code rate determination model, the accuracy of the obtained code rate corresponding to the audio signal to be encoded is relatively higher.
In another embodiment, obtaining the feature information of the audio signal to be encoded may include the following steps:
acquiring the feature information of each frame of the audio signal to be encoded, and taking the average value of the feature information of the frames as the feature information of the audio signal to be encoded.
In practical applications, in order to reduce the workload of acquiring the feature information of the audio signal to be encoded, the dimensionality of the feature information of the audio signal to be encoded may be reduced.
For example, if an audio signal to be encoded consists of 30 consecutive frames, the feature information of the 30 frames may be averaged to obtain feature information with the length of a single frame, and this averaged feature information is determined as the feature information of the audio signal to be encoded.
According to a third aspect of the embodiments of the present disclosure, there is provided a code rate determination model training apparatus, as shown in fig. 6, the apparatus includes:
an audio signal acquisition module 610 configured to perform acquisition of an audio sample data set, where the audio sample data set includes audio signals of different types;
An information and code rate obtaining module 620, configured to perform obtaining feature information of each audio signal and a target coding rate corresponding to the audio signal, where the feature information is associated with a type of the audio signal, and the target coding rate is a lowest coding rate at which the audio signal meets a target audio quality;
a coding rate obtaining module 630, configured to input the obtained feature information into a to-be-trained code rate determination model, so as to obtain a coding rate output by the to-be-trained code rate determination model;
a loss value obtaining module 640, configured to perform obtaining a loss value of the to-be-trained code rate determination model according to the coding rate output by the to-be-trained code rate determination model and the target coding rate;
and the model parameter adjusting module 650 is configured to perform adjustment of the model parameters of the to-be-trained code rate determination model according to the loss value until the loss value is lower than a preset threshold, and use the to-be-trained code rate determination model as the trained code rate determination model.
Therefore, according to the technical solution provided by the embodiment of the present disclosure, the target output of the code rate determination model is the target coding rate, which is the lowest coding rate at which the audio signal meets the target audio quality. Consequently, when audio data to be encoded is subsequently encoded, the code rate determination model can provide a coding rate of appropriate size that ensures the audio quality of the encoded audio data. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when it is stored.
Optionally, the information and code rate obtaining module includes:
an audio signal encoding unit configured to perform encoding of the audio signal according to a preset code rate to obtain an encoded audio signal;
a quality loss value calculation unit configured to perform calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;
and the target coding rate determining unit is configured to determine the preset coding rate as the target coding rate corresponding to the audio signal when the quality loss value is smaller than a quality loss threshold and the quality loss value is the minimum quality loss value.
Optionally, the target coding rate determining unit is specifically configured to perform:
when the quality loss value is smaller than a quality loss threshold, reducing the preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;
and taking the previous reduced preset code rate as a target coding code rate.
Optionally, the information and code rate obtaining module is specifically configured to perform:
And acquiring amplitude information and phase information of each audio signal in a time-frequency domain, and determining characteristic information of the audio signal according to the amplitude information and/or the phase information.
Optionally, the information and code rate obtaining module is specifically configured to perform:
acquiring characteristic information of each frame of signal of each audio signal and a target coding rate corresponding to each frame of signal of each audio signal;
or acquiring the characteristic information of each frame signal in each audio signal, taking the average value of the characteristic information of each frame signal as the characteristic information of the audio signal, and acquiring the target coding rate corresponding to the characteristic information of the audio signal.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a code rate determining apparatus, as shown in fig. 7, the apparatus including:
a feature information obtaining module 710 configured to perform obtaining feature information of an audio signal to be encoded;
the encoding rate determining module 720 is configured to execute the step of inputting the feature information of the audio signal to be encoded into the rate determining model described in the third aspect to obtain the encoding rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding rate corresponding to the audio signal to be encoded.
According to the technical scheme provided by the embodiment of the disclosure, the characteristic information of the audio signal to be coded is obtained; inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain the encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the code rate determination model is proper, and the audio quality of the coded audio signal can be ensured, so that the transmission bandwidth during transmission of the coded audio signal and the storage space during storage of the coded audio signal can be saved.
Optionally, the characteristic information obtaining module is specifically configured to perform:
acquiring amplitude information and phase information of the audio signal to be coded in a time-frequency domain, and determining characteristic information of the audio signal to be coded according to the amplitude information and/or the phase information.
Optionally, the characteristic information obtaining module is specifically configured to perform:
acquiring characteristic information of each frame of signal of an audio signal to be coded;
or acquiring the feature information of each frame of the audio signal to be encoded, and taking the average value of the feature information of the frames as the feature information of the audio signal to be encoded.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, as shown in fig. 8, including:
a processor 810;
a memory 820 for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the code rate determination model training method of the first aspect.
Therefore, according to the technical solution provided by the embodiment of the present disclosure, the target output of the code rate determination model is the target coding rate, which is the lowest coding rate at which the audio signal meets the target audio quality. Consequently, when audio data to be encoded is subsequently encoded, the code rate determination model can provide a coding rate of appropriate size that ensures the audio quality of the encoded audio data. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when it is stored.
According to a sixth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus, as shown in fig. 9, including:
a processor 910;
a memory 920 for storing the processor-executable instructions;
Wherein the processor is configured to execute the instructions to implement the code rate determination method according to the second aspect.
According to the technical scheme provided by the embodiment of the disclosure, the characteristic information of the audio signal to be coded is obtained; inputting the characteristic information of the audio signal to be encoded into the code rate determination model of the first aspect to obtain the encoding code rate corresponding to the audio signal to be encoded, and encoding the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded. The coding rate corresponding to the audio signal to be coded obtained by the code rate determination model is proper, and the audio quality of the coded audio signal can be ensured, so that the transmission bandwidth during transmission of the coded audio signal and the storage space during storage of the coded audio signal can be saved.
Fig. 10 is a block diagram illustrating an apparatus 1000 for training a coding rate determination model, or determining a coding rate, according to an example embodiment. For example, the apparatus 1000 may be provided as a server. Referring to fig. 10, the apparatus 1000 includes a processing component 1022 that further includes one or more processors and memory resources, represented by memory 1032, for storing instructions, such as application programs, that are executable by the processing component 1022. The application programs stored in memory 1032 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 1022 is configured to execute instructions to perform the code rate determination model training method of the first aspect, or the code rate determination method of the second aspect.
The apparatus 1000 may also include a power supply component 1026 configured to perform power management for the apparatus 1000, a wired or wireless network interface 1050 configured to connect the apparatus 1000 to a network, and an input/output (I/O) interface 1058. The apparatus 1000 may operate based on an operating system stored in the memory 1032, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Therefore, according to the technical solution provided by the embodiment of the present disclosure, the target output of the code rate determination model is the target coding rate, which is the lowest coding rate at which the audio signal meets the target audio quality. Consequently, when audio data to be encoded is subsequently encoded, the code rate determination model can provide a coding rate of appropriate size that ensures the audio quality of the encoded audio data. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when it is stored.
Fig. 11 is a block diagram illustrating an apparatus 1100 for training a code rate determination model, or for determining a coding rate, according to an example embodiment. For example, the apparatus 1100 may be a mobile phone, a computer, a digital broadcast electronic device, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 11, apparatus 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.
The processing component 1102 generally controls the overall operation of the apparatus 1100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support operation at the device 1100. Examples of such data include instructions for any application or method operating on device 1100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1104 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 1106 provides power to the various components of the apparatus 1100. The power component 1106 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 1100.
The multimedia component 1108 includes a screen that provides an output interface between the device 1100 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1108 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 1100 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a microphone (MIC) configured to receive external audio signals when the apparatus 1100 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 1104 or transmitted via the communication component 1116. In some embodiments, the audio component 1110 further includes a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1114 includes one or more sensors for providing various aspects of state assessment for the apparatus 1100. For example, the sensor assembly 1114 may detect an open/closed state of the apparatus 1100 and the relative positioning of components, such as the display and keypad of the apparatus 1100. The sensor assembly 1114 may also detect a change in position of the apparatus 1100 or a component of the apparatus 1100, the presence or absence of user contact with the apparatus 1100, an orientation or acceleration/deceleration of the apparatus 1100, and a change in temperature of the apparatus 1100. The sensor assembly 1114 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the apparatus 1100 and other devices. The apparatus 1100 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1116 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1116 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the code rate determination model training method of the first aspect, or the code rate determination method of the second aspect.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as the memory 1104 comprising instructions, the instructions being executable by the processor 1120 of the apparatus 1100 to perform the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Therefore, in the technical solution provided by the embodiments of the present disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. When audio data to be encoded is subsequently encoded, the code rate determination model can therefore yield a coding rate of appropriate size while the audio quality of the encoded audio data is still ensured. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when the encoded audio data is stored.
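As an illustration of how a "lowest coding rate that still meets the target audio quality" label might be obtained for one training sample, the following is a minimal sketch and not the implementation of the disclosure. It lowers a preset rate step by step and keeps the last rate whose quality loss stays below a threshold. The functions `toy_round_trip`, `quality_loss` and `find_target_rate`, the quantization-based stand-in for a codec, the mean-squared-error loss, and all numeric values are assumptions made for the example only.

```python
import random


def toy_round_trip(signal, bitrate_kbps):
    """Hypothetical stand-in for encode + decode: uniform quantization
    whose step shrinks as the bitrate grows (not a real audio codec)."""
    levels = max(2, int(bitrate_kbps))
    step = 2.0 / levels
    return [round(x / step) * step for x in signal]


def quality_loss(original, decoded):
    """Hypothetical quality loss: mean squared error between the signals."""
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def find_target_rate(signal, start_kbps=128, step_kbps=8, min_kbps=8,
                     loss_threshold=1e-4):
    """Lower the preset rate until the loss reaches the threshold and
    return the last rate that still met it (None if no rate did)."""
    rate, target = start_kbps, None
    while rate >= min_kbps:
        loss = quality_loss(signal, toy_round_trip(signal, rate))
        if loss >= loss_threshold:
            break              # quality no longer acceptable: stop lowering
        target = rate          # remember the last acceptable rate
        rate -= step_kbps
    return target


random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(4096)]
print(find_target_rate(signal))
```

In practice the round trip would use the real encoder and a perceptual quality measure; the search loop itself would stay the same.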
According to a seventh aspect of embodiments of the present disclosure, there is provided a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination model training method according to the first aspect.
Therefore, in the technical solution provided by the embodiments of the present disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. When audio data to be encoded is subsequently encoded, the code rate determination model can therefore yield a coding rate of appropriate size while the audio quality of the encoded audio data is still ensured. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when the encoded audio data is stored.
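The training method of the first aspect (feed characteristic information to the model, compare the output rate with the target coding rate, and adjust parameters until the loss value falls below a preset threshold) can be pictured with a small sketch. This is only an illustration under stated assumptions: the disclosure does not specify the model type here, so a linear model with a mean-squared-error loss and plain gradient descent is swapped in, and the two-dimensional features, the synthetic target rates, and the names `predict` and `train` are hypothetical.

```python
import random


def predict(weights, bias, features):
    """Linear stand-in for the code rate determination model."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(samples, targets, lr=0.2, loss_threshold=1e-3, max_steps=10000):
    """Gradient descent on mean squared error; stops once the loss value
    drops below the preset threshold (or max_steps is reached)."""
    n, dim = len(samples), len(samples[0])
    weights, bias = [0.0] * dim, 0.0
    loss = float("inf")
    for _ in range(max_steps):
        errors = [predict(weights, bias, x) - y
                  for x, y in zip(samples, targets)]
        loss = sum(e * e for e in errors) / n
        if loss < loss_threshold:
            break
        # Adjust model parameters according to the loss gradient.
        for j in range(dim):
            grad_j = 2.0 * sum(e * x[j] for e, x in zip(errors, samples)) / n
            weights[j] -= lr * grad_j
        bias -= lr * 2.0 * sum(errors) / n
    return weights, bias, loss


# Synthetic data: two hypothetical features per audio signal and a
# made-up target coding rate in kbps (illustration only).
random.seed(0)
samples = [[random.random(), random.random()] for _ in range(200)]
targets = [32.0 + 40.0 * x[0] + 20.0 * x[1] for x in samples]

weights, bias, final_loss = train(samples, targets)
print(final_loss < 1e-3)
```

A real system would more likely use a neural network and labels produced by an actual encoder, but the stopping rule on the loss value is the same.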
According to an eighth aspect of embodiments of the present disclosure, there is provided a storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of the second aspect.
In the technical solution provided by the embodiments of the present disclosure, characteristic information of the audio signal to be encoded is acquired; the characteristic information is input into the code rate determination model of the first aspect to obtain the coding rate corresponding to the audio signal to be encoded; and the audio signal to be encoded is encoded according to that coding rate. Because the coding rate obtained from the code rate determination model is appropriate while the audio quality of the encoded audio signal is still ensured, the transmission bandwidth needed to transmit the encoded audio signal and the storage space needed to store it can be saved.
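The determination flow described above (acquire characteristic information, query the model, encode at the returned rate) might look like the following sketch. Everything concrete in it is an assumption for illustration: the two features (frame energy and zero-crossing rate), the fixed linear `toy_model` standing in for a trained model, the `SUPPORTED_KBPS` table, and the choice to round the prediction up to the next supported rate are not taken from the disclosure.

```python
# Hypothetical set of rates an encoder might accept, in kbps.
SUPPORTED_KBPS = (16, 24, 32, 48, 64, 96, 128)


def extract_features(frame):
    """Hypothetical characteristic information: mean energy and
    zero-crossing rate of one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


def toy_model(features):
    """Stand-in for a trained code rate determination model."""
    energy, zcr = features
    return 16.0 + 60.0 * energy + 100.0 * zcr


def determine_rate(frame, model=toy_model):
    """Return the smallest supported rate not below the model's prediction,
    so the chosen rate does not undercut the predicted one."""
    predicted = model(extract_features(frame))
    for rate in SUPPORTED_KBPS:
        if rate >= predicted:
            return rate
    return SUPPORTED_KBPS[-1]


silence = [0.0] * 160
busy = [1.0, -1.0] * 80
print(determine_rate(silence), determine_rate(busy))
```

The returned rate would then be passed to the actual audio encoder as its bitrate setting.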
According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to implement the code rate determining model training method of the first aspect.
Therefore, in the technical solution provided by the embodiments of the present disclosure, the target output of the code rate determination model is the target coding rate, and the target coding rate is the lowest coding rate at which the audio signal meets the target audio quality. When audio data to be encoded is subsequently encoded, the code rate determination model can therefore yield a coding rate of appropriate size while the audio quality of the encoded audio data is still ensured. Unlike the related art, which determines a higher coding rate, this saves transmission bandwidth when the encoded audio data is transmitted and storage space when the encoded audio data is stored.
According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions that, when run on a computer, cause the computer to implement the code rate determination method of the second aspect.
In the technical solution provided by the embodiments of the present disclosure, characteristic information of the audio signal to be encoded is acquired; the characteristic information is input into the code rate determination model of the first aspect to obtain the coding rate corresponding to the audio signal to be encoded; and the audio signal to be encoded is encoded according to that coding rate. Because the coding rate obtained from the code rate determination model is appropriate while the audio quality of the encoded audio signal is still ensured, the transmission bandwidth needed to transmit the encoded audio signal and the storage space needed to store it can be saved.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A code rate determination model training method, the method comprising:
acquiring an audio sample data set, wherein the audio sample data set comprises different types of audio signals;
acquiring characteristic information of each audio signal and a target coding rate corresponding to the audio signal, wherein the characteristic information is associated with the type of the audio signal, and the target coding rate is the lowest coding rate when the audio signal meets the target audio quality;
inputting the acquired characteristic information into a code rate determination model to be trained to obtain a coding code rate output by the code rate determination model to be trained;
obtaining a loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;
and adjusting model parameters of the code rate determination model to be trained according to the loss value until the loss value is lower than a preset threshold value, and taking the code rate determination model to be trained as the trained code rate determination model.
2. The method of claim 1, wherein obtaining the target coding rate corresponding to the audio signal comprises:
encoding the audio signal according to a preset code rate to obtain an encoded audio signal;
calculating a quality loss value of the encoded audio signal from the audio signal and the encoded audio signal;
and when the quality loss value is smaller than a quality loss threshold value and the quality loss value is the minimum quality loss value, determining the preset code rate as the target coding code rate corresponding to the audio signal.
3. The method of claim 2, wherein when the quality loss value is smaller than a quality loss threshold and the quality loss value is a minimum quality loss value, determining the preset code rate as a target coding code rate corresponding to the audio signal comprises:
when the quality loss value is smaller than a quality loss threshold, reducing the preset code rate, and encoding the audio signal according to the reduced preset code rate to obtain an encoded audio signal until the audio quality loss value is larger than the quality loss threshold;
and taking the previous reduced preset code rate as a target coding code rate.
4. A method for determining a code rate, the method comprising:
acquiring characteristic information of an audio signal to be encoded;
inputting the characteristic information of the audio signal to be encoded into the code rate determination model of any one of claims 1 to 3 to obtain the encoding code rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding code rate corresponding to the audio signal to be encoded.
5. A code rate determination model training apparatus, the apparatus comprising:
the audio signal acquisition module is configured to execute acquisition of an audio sample data set, wherein the audio sample data set comprises different types of audio signals;
an information and code rate obtaining module configured to perform obtaining feature information of each audio signal and a target coding rate corresponding to the audio signal, where the feature information is associated with a type of the audio signal, and the target coding rate is a lowest coding rate at which the audio signal meets a target audio quality;
the code rate obtaining module is configured to input the obtained characteristic information into a code rate determination model to be trained to obtain a code rate output by the code rate determination model to be trained;
the loss value obtaining module is configured to obtain the loss value of the code rate determination model to be trained according to the coding rate output by the code rate determination model to be trained and the target coding rate;
and the model parameter adjusting module is configured to adjust the model parameters of the code rate determining model to be trained according to the loss value until the loss value is lower than a preset threshold value, and the code rate determining model to be trained is used as the trained code rate determining model.
6. An apparatus for determining a code rate, the apparatus comprising:
a feature information acquisition module configured to perform acquisition of feature information of an audio signal to be encoded;
an encoding rate determination module, configured to input the feature information of the audio signal to be encoded into the rate determination model of claim 5, to obtain an encoding rate corresponding to the audio signal to be encoded, so as to encode the audio signal to be encoded according to the encoding rate corresponding to the audio signal to be encoded.
7. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the code rate determination model training method of any of claims 1 to 3.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the code rate determination method of claim 4.
9. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination model training method of any of claims 1 to 3.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of claim 4.
CN202010575623.7A 2020-06-22 2020-06-22 Code rate determination model training method, code rate determination method and device Pending CN111862995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575623.7A CN111862995A (en) 2020-06-22 2020-06-22 Code rate determination model training method, code rate determination method and device

Publications (1)

Publication Number Publication Date
CN111862995A true CN111862995A (en) 2020-10-30

Family

ID=72988049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575623.7A Pending CN111862995A (en) 2020-06-22 2020-06-22 Code rate determination model training method, code rate determination method and device

Country Status (1)

Country Link
CN (1) CN111862995A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767956A (en) * 2021-04-09 2021-05-07 腾讯科技(深圳)有限公司 Audio encoding method, apparatus, computer device and medium
CN113194320A (en) * 2021-04-30 2021-07-30 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device
CN115334349A (en) * 2022-07-15 2022-11-11 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
WO2023077707A1 (en) * 2021-11-02 2023-05-11 深圳市中兴微电子技术有限公司 Video encoding method, model training method, device, and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07160297A (en) * 1993-12-10 1995-06-23 Nec Corp Voice parameter encoding system
WO1997031367A1 (en) * 1996-02-26 1997-08-28 At & T Corp. Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
US6839674B1 (en) * 1998-01-12 2005-01-04 Stmicroelectronics Asia Pacific Pte Limited Method and apparatus for spectral exponent reshaping in a transform coder for high quality audio
JP2007017659A (en) * 2005-07-07 2007-01-25 Fujitsu Ltd Audio encoding method and device
US20110125506A1 (en) * 2009-11-26 2011-05-26 Research In Motion Limited Rate-distortion optimization for advanced audio coding
US20140249806A1 (en) * 2011-10-28 2014-09-04 Panasonic Corporation Audio encoding apparatus, audio decoding apparatus, audio encoding method, and audio decoding method
US20170104552A1 (en) * 2015-10-10 2017-04-13 Dolby Laboratories Licensing Corporation Near Optimal Forward Error Correction System and Method
CN110300315A (en) * 2019-07-24 2019-10-01 北京达佳互联信息技术有限公司 A kind of video code rate determines method, apparatus, electronic equipment and storage medium
CN110992963A (en) * 2019-12-10 2020-04-10 腾讯科技(深圳)有限公司 Network communication method, device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767956A (en) * 2021-04-09 2021-05-07 腾讯科技(深圳)有限公司 Audio encoding method, apparatus, computer device and medium
CN112767956B (en) * 2021-04-09 2021-07-16 腾讯科技(深圳)有限公司 Audio encoding method, apparatus, computer device and medium
WO2022213787A1 (en) * 2021-04-09 2022-10-13 腾讯科技(深圳)有限公司 Audio encoding method, audio decoding method, apparatus, computer device, storage medium, and computer program product
CN113194320A (en) * 2021-04-30 2021-07-30 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device
CN113194320B (en) * 2021-04-30 2022-11-22 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device
WO2023077707A1 (en) * 2021-11-02 2023-05-11 深圳市中兴微电子技术有限公司 Video encoding method, model training method, device, and storage medium
CN115334349A (en) * 2022-07-15 2022-11-11 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
CN115334349B (en) * 2022-07-15 2024-01-02 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109801644B (en) Separation method, separation device, electronic equipment and readable medium for mixed sound signal
CN111862995A (en) Code rate determination model training method, code rate determination method and device
CN110827253A (en) Training method and device of target detection model and electronic equipment
CN108346425B (en) Voice activity detection method and device and voice recognition method and device
CN113362812B (en) Voice recognition method and device and electronic equipment
CN110853664B (en) Method and device for evaluating performance of speech enhancement algorithm and electronic equipment
CN110650370B (en) Video coding parameter determination method and device, electronic equipment and storage medium
CN111583944A (en) Sound changing method and device
CN109360197B (en) Image processing method and device, electronic equipment and storage medium
CN108364635B (en) Voice recognition method and device
CN111210844B (en) Method, device and equipment for determining speech emotion recognition model and storage medium
CN110033784B (en) Audio quality detection method and device, electronic equipment and storage medium
CN105721656B (en) Ambient noise generation method and device
CN113707134B (en) Model training method and device for model training
CN110930978A (en) Language identification method and device and language identification device
CN110415702A (en) Training method and device, conversion method and device
CN113362813A (en) Voice recognition method and device and electronic equipment
CN107437412B (en) Acoustic model processing method, voice synthesis method, device and related equipment
CN116741191A (en) Audio signal processing method, device, electronic equipment and storage medium
CN115052150A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN109754816B (en) Voice data processing method and device
CN111209429B (en) Unsupervised model training method and unsupervised model training device for measuring coverage of voice database
CN112201267A (en) Audio processing method and device, electronic equipment and storage medium
CN115039169A (en) Voice instruction recognition method, electronic device and non-transitory computer readable storage medium
CN107564534B (en) Audio quality identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination