CN117612566A - Audio quality assessment method and related product

Info

Publication number: CN117612566A
Application number: CN202311530611.2A
Authority: CN
Inventors: 武倩平
Original assignee: Shuhang Technology Beijing Co ltd
Current assignee: Shuhang Technology Beijing Co ltd
Priority date: 2023-11-16
Filing date: 2023-11-16
Publication date: 2024-02-27
Anticipated expiration: 2043-11-16
Also published as: CN117612566B

Abstract

The application discloses an audio quality assessment method and related products. The method comprises the following steps: acquiring original audio and audio to be evaluated, wherein the audio to be evaluated is obtained by performing audio processing on the original audio; dividing the original audio into n frames of first audio frames, wherein n is an integer greater than 1; dividing the audio to be evaluated into n frames of second audio frames; and determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, wherein the quality of the audio to be evaluated is inversely related to the first difference.

Description

Audio quality assessment method and related product

Technical Field

The present disclosure relates to the field of audio processing technologies, and in particular, to an audio quality assessment method and related products.

Background

Audio quality assessment is one of the basic techniques in audio processing and is widely used in that field. How to assess the quality of audio is therefore of great importance.

Disclosure of Invention

The application provides an audio quality assessment method and related products, wherein the related products comprise an audio quality assessment device, electronic equipment, a computer readable storage medium and a computer program product.

In a first aspect, there is provided an audio quality assessment method, the method comprising:

acquiring original audio and audio to be evaluated, wherein the audio to be evaluated is obtained by performing audio processing on the original audio;

dividing the original audio into n frames of first audio frames, wherein n is an integer greater than 1;

dividing the audio to be evaluated into n frames of second audio frames;

and determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, wherein the quality of the audio to be evaluated is inversely related to the first difference.

In combination with any one of the embodiments of the present application, before determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, the method further includes:

obtaining n short-time scores of the audio to be evaluated according to second differences of the audio frames at the same positions in the n frames of first audio frames and the n frames of second audio frames, wherein the quality of the audio to be evaluated is in negative correlation with the second differences, and the short-time scores represent the quality of the audio to be evaluated;

The determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames comprises:

obtaining a long-time score of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, wherein the long-time score characterizes the quality of the audio to be evaluated;

and determining the quality of the audio to be evaluated according to the n short-time scores and the long-time scores.

In combination with any one of the embodiments of the present application, the obtaining the long-term score of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames includes:

performing feature extraction processing on the n frames of first audio frames to obtain n first feature values of the n frames of first audio frames;

performing feature extraction processing on the n frames of second audio frames to obtain n second feature values of the n frames of second audio frames;

and obtaining the long-term score according to the third difference between the time domain information of the n first characteristic values and the time domain information of the n second characteristic values, wherein the quality of the audio to be evaluated is inversely related to the third difference.

In combination with any one of the embodiments of the present application, the determining the quality of the audio to be evaluated according to the n short-time scores and the long-time scores includes:

calculating the variance of the n short-time scores;

determining a stability score of the audio to be evaluated according to the variance, wherein the stability score characterizes the stability of the quality of the audio to be evaluated;

and determining the quality of the audio to be evaluated according to the stability score and the long-term score.

calculating an average value of the n short-time scores;

and determining the quality of the audio to be evaluated according to the average value and the long-term score.

calculating the maximum value of the n short-time scores;

and determining the quality of the audio to be evaluated according to the maximum value and the long-term score.
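The three options above (variance, average value, maximum value) can be illustrated with a minimal sketch. The application does not state how the summary of the short-time scores is mapped to a stability score or how it is merged with the long-term score, so the mapping `1 / (1 + variance)`, the weighted sum, the `weight` parameter and all function names below are assumptions made for illustration only. The sketch also assumes that every score lies in [0, 1] and that a higher score means higher quality.

```python
from statistics import mean, pvariance


def stability_score(short_scores):
    """Higher when the short-time scores fluctuate less (assumed mapping)."""
    return 1.0 / (1.0 + pvariance(short_scores))


def overall_quality(short_scores, long_score, mode="stability", weight=0.5):
    """Combine a summary of the short-time scores with the long-term score.

    The weighted sum is an assumption; the source only says that the quality
    is determined "according to" the two quantities.
    """
    if mode == "stability":
        summary = stability_score(short_scores)
    elif mode == "mean":
        summary = mean(short_scores)
    elif mode == "max":
        summary = max(short_scores)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return weight * summary + (1.0 - weight) * long_score


short_scores = [0.5, 0.5, 0.5, 0.5]   # perfectly stable short-time scores
long_score = 0.8

q_stability = overall_quality(short_scores, long_score, mode="stability")
q_mean = overall_quality(short_scores, long_score, mode="mean")
q_max = overall_quality(short_scores, long_score, mode="max")
print(q_stability, q_mean, q_max)
```

With constant short-time scores the variance is zero, so the assumed stability score takes its largest value, while the mean and the maximum coincide.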

In combination with any one of the embodiments of the present application, the dividing the audio to be evaluated into n frames of second audio frames includes:

Aligning the audio to be evaluated to the original audio to obtain aligned audio;

and dividing the aligned audio into n frames to obtain n frames of second audio frames.

In combination with any embodiment of the present application, the audio quality evaluation method is applied to an audio quality evaluation device, where the audio quality evaluation device operates an audio on demand platform, and the audio to be evaluated is the audio to be released to the audio on demand platform;

after determining the quality of the audio to be evaluated, the method further comprises:

and determining the processing strategy of the audio to be evaluated on the audio on-demand platform according to the quality of the audio to be evaluated.

In combination with any one of the embodiments of the present application, the determining, according to the quality of the audio to be evaluated, a processing policy of the audio to be evaluated on the audio on demand platform includes:

performing a target operation on the audio to be evaluated in the case that the quality of the audio to be evaluated reaches a high quality threshold, the target operation comprising one or more of: publishing the audio to be evaluated to the audio on-demand platform, and pushing the audio to be evaluated on the audio on-demand platform;

outputting alarm information when the quality of the audio to be evaluated does not reach a low quality threshold, wherein the alarm information indicates that the quality of the audio to be evaluated is lower than the quality requirement of the audio on-demand platform;

Reducing the push amount on the basis of the reference push amount to obtain a target push amount of the audio to be evaluated under the condition that the quality of the audio to be evaluated does not reach the low quality threshold, wherein the reference push amount is the push amount under the condition that the quality of the audio reaches the quality threshold;

and under the condition that the quality of the audio to be evaluated reaches the low quality threshold and does not reach the high quality threshold, determining the audio to be evaluated as audio to be monitored, wherein the audio to be monitored is the audio needing manual auditing.
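The threshold logic described above can be sketched as a single decision function. The threshold values, the reference push amount, the reduction factor and the action labels are hypothetical; the application names the thresholds but gives no numbers, and it presents the alarm and the push reduction as separate options, which this sketch simply applies together.

```python
def processing_strategy(quality, low_threshold=0.4, high_threshold=0.8,
                        reference_push=1000, reduction_factor=0.5):
    """Return a dict describing how the platform might treat the audio.

    All default values are illustrative assumptions, not values from the source.
    """
    if quality >= high_threshold:
        # Quality reaches the high quality threshold: perform the target operation.
        return {"action": "publish_and_push", "push_amount": reference_push}
    if quality < low_threshold:
        # Quality does not reach the low quality threshold: alarm and push less.
        return {
            "action": "alert_and_reduce_push",
            "push_amount": int(reference_push * reduction_factor),
        }
    # Between the two thresholds: mark as audio to be monitored (manual review).
    return {"action": "manual_review", "push_amount": None}


for q in (0.9, 0.6, 0.2):
    print(q, processing_strategy(q))
```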

In a second aspect, there is provided an audio quality assessment apparatus, the apparatus comprising:

an acquisition unit, configured to acquire original audio and audio to be evaluated, wherein the audio to be evaluated is obtained by performing audio processing on the original audio;

a dividing unit, configured to divide the original audio into n frames of first audio frames, where n is an integer greater than 1;

the dividing unit being further configured to divide the audio to be evaluated into n frames of second audio frames;

and the determining unit is used for determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, and the quality of the audio to be evaluated is inversely related to the first difference.

In combination with any one of the embodiments of the present application, the apparatus further includes: the processing unit is used for obtaining n short-time scores of the audio to be evaluated according to the second difference of the audio frames at the same position in the n frames of first audio frames and the n frames of second audio frames, wherein the quality of the audio to be evaluated is in negative correlation with the second difference, and the short-time scores represent the quality of the audio to be evaluated;

the determining unit is used for:

In combination with any one of the embodiments of the present application, the determining unit is configured to:

calculating variances of the n short-time scores;

calculating an average value of the n short-time scores;

calculating the maximum value of the n short-time scores;

In combination with any one of the embodiments of the present application, the dividing unit is configured to:

In combination with any one of the embodiments of the present application, the audio quality evaluation device operates an audio on demand platform, and the audio to be evaluated is the audio to be released to the audio on demand platform;

The determining unit is further configured to determine a processing policy of the audio to be evaluated on the audio on-demand platform according to the quality of the audio to be evaluated.

In a third aspect, an electronic device is provided, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform a method as described in the first aspect and any one of its possible implementations.

In a fourth aspect, there is provided another electronic device comprising: a processor, a transmitting means, an input means, an output means and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the first aspect and any implementation thereof as described above.

In a fifth aspect, there is provided a computer readable storage medium having stored therein a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the first aspect and any implementation thereof as described above.

In a sixth aspect, there is provided a computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the first aspect and any embodiments thereof.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

In the application, the audio to be evaluated is obtained by performing audio processing on the original audio. After acquiring the original audio and the audio to be evaluated, the audio quality evaluation device divides the original audio into n frames of first audio frames and divides the audio to be evaluated into n frames of second audio frames. In this way, the time domain information of the n frames of first audio frames includes the time sequence variation of the information carried by the n frames of first audio frames, the time domain information of the n frames of second audio frames includes the time sequence variation of the information carried by the n frames of second audio frames, and the first difference between the two can represent the difference between the time sequence variation of the information carried by the original audio and the time sequence variation of the information carried by the audio to be evaluated. Because the quality of the audio to be evaluated is inversely related to the first difference, the audio quality evaluation device can determine the quality of the audio to be evaluated according to the first difference, thereby improving the accuracy of the quality assessment of the audio to be evaluated.

Drawings

In order to more clearly describe the technical solutions in the embodiments or the background of the present application, the following description will describe the drawings that are required to be used in the embodiments or the background of the present application.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the technical aspects of the application.

Fig. 1 is a flow chart of an audio quality evaluation method according to an embodiment of the present application;

Fig. 2 is a flow chart of another audio quality assessment method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a neural network according to an embodiment of the present application;

Fig. 4 is a flowchart of another audio quality assessment method according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an audio quality evaluation apparatus according to an embodiment of the present application;

Fig. 6 is a schematic hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without creative effort fall within the scope of the present disclosure.

The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The execution subject of the embodiments of the application is an audio quality evaluation device, which may be any electronic device capable of executing the technical solutions disclosed in the method embodiments of the application. Optionally, the audio quality evaluation device may be one of the following: a computer or a server.

It should be understood that the method embodiments of the present application may also be implemented by way of a processor executing computer program code. Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. Referring to fig. 1, fig. 1 is a flowchart of an audio quality evaluation method according to an embodiment of the present application.

101. The original audio and the audio to be evaluated are acquired.

In the embodiment of the application, the audio to be evaluated is obtained by performing audio processing on the original audio, where the original audio may be any piece of audio. For example, the original audio may be a piece of speech, a piece of audio in a video, or a piece of music. The audio processing may be any processing of audio, for example, noise reduction, loudness equalization, or dynamic range control. As another example, the audio processing may combine noise reduction with loudness equalization.

In one implementation of acquiring the original audio, the audio quality assessment device receives the original audio input by a user through an input component. The input component includes at least one of: a keyboard, a mouse, a touch screen, a touch pad, and an audio input device.

In another implementation of obtaining the original audio, the audio quality assessment device receives the original audio sent by a terminal. The terminal may be any of the following: a mobile phone, a computer, a tablet computer, or a server.

In one implementation of acquiring audio to be evaluated, an audio quality evaluation device receives audio to be evaluated input by a user through an input component.

In another implementation manner of obtaining the audio to be evaluated, the audio quality evaluation device receives the audio to be evaluated sent by the terminal.

In still another implementation manner of obtaining the audio to be evaluated, the audio quality evaluation device obtains the audio to be evaluated by performing audio processing on the original audio after obtaining the original audio.

It should be understood that, in the embodiment of the present application, the step of acquiring the original audio and the step of acquiring the audio to be evaluated by the audio quality evaluation device may be performed separately or simultaneously, which is not limited in this application.

102. Dividing the original audio into n frames of first audio frames.

In the embodiment of the present application, n is an integer greater than 1. The audio quality assessment apparatus may obtain n frames of the first audio frame by dividing the original audio into n segments.

In one possible implementation, the audio quality assessment device obtains the length of the original audio and divides the original audio into n segments according to that length to obtain the n frames of first audio frames. Optionally, any two of the n first audio frames do not overlap.

For example, if the playing time of the original audio is 50 seconds, that is, the length of the original audio is 50 seconds, the audio quality evaluation device may divide the original audio into 25 segments of the first audio frame having a length of 2 seconds. At this time, n is 25, and the length of each first audio frame is 2 seconds.

For another example, if the playing duration of the original audio is 51 seconds, that is, the length of the original audio is 51 seconds, the audio quality evaluation apparatus may divide the original audio into 25 first audio frames having a length of 2 seconds and 1 first audio frame having a length of 1 second. At this time, n is 26, i.e., there are 26 first audio frames in total.
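The two examples above can be reproduced with a minimal framing sketch. The function name `split_into_frames`, the list-of-samples representation and the toy sample rate are assumptions made for illustration; the application does not prescribe a frame length or a data format.

```python
def split_into_frames(samples, sample_rate, frame_seconds=2.0):
    """Split a sample sequence into consecutive, non-overlapping frames.

    The last frame is shorter when the audio length is not a multiple of the
    frame length, as in the 51-second example.
    """
    frame_len = int(round(sample_rate * frame_seconds))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


# Toy "audio" at 4 samples per second to keep the example small.
sample_rate = 4
audio_50s = [0.0] * (50 * sample_rate)
audio_51s = [0.0] * (51 * sample_rate)

frames_50 = split_into_frames(audio_50s, sample_rate)
frames_51 = split_into_frames(audio_51s, sample_rate)
print(len(frames_50))                       # 25
print(len(frames_51), len(frames_51[-1]))   # 26 4  (last frame lasts 1 second)
```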

103. Dividing the audio to be evaluated into n frames of second audio frames.

The implementation manner of the audio quality evaluation device for dividing the audio to be evaluated into n frames of second audio frames is the same as the implementation manner of dividing the original audio into n frames of first audio frames, and will not be described again.

In one possible implementation, the audio quality assessment device aligns the audio to be assessed to the original audio to obtain aligned audio, in which the phonemes are aligned with the same phonemes in the original audio. The aligned audio is then divided into n frames to obtain the n frames of second audio frames, so that each of the n first audio frames is aligned with the second audio frame that contains the same phonemes.
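The application does not say how the alignment is performed. As a rough illustration only, the sketch below swaps in a brute-force cross-correlation lag search, which compensates for a constant delay but is not a phoneme-level alignment. The names `best_lag`, `align_to_reference` and `max_lag` are hypothetical.

```python
def best_lag(reference, degraded, max_lag):
    """Return the shift (in samples) of `degraded` that best matches `reference`."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]   # cross-correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


def align_to_reference(reference, degraded, max_lag):
    """Shift `degraded` by the best lag and match its length to `reference`."""
    lag = best_lag(reference, degraded, max_lag)
    if lag > 0:
        shifted = degraded[lag:]               # drop the leading delay
    else:
        shifted = [0.0] * (-lag) + degraded    # pad when degraded starts early
    shifted = shifted[:len(reference)]
    return shifted + [0.0] * (len(reference) - len(shifted))


original = [0.0, 1.0, -1.0, 0.5, 0.0, 0.0]
delayed = [0.0, 0.0] + original[:-2]   # same content, two samples late
aligned = align_to_reference(original, delayed, max_lag=3)
print(best_lag(original, delayed, 3))  # 2
print(aligned == original)             # True
```

After this step the aligned audio has the same length as the original audio, so the same framing routine yields the same number n of frames for both.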

104. And determining the quality of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames.

In this embodiment of the present application, the time domain information of the n frames of first audio frames is the time domain information of the original audio, and it includes the time sequence changes of the information carried by the n frames of first audio frames. For example, if the information carried by each first audio frame includes loudness, the time domain information of the n frames of first audio frames may include: the loudness across the n first audio frames first increases and then decreases. For another example, if the information carried by each first audio frame includes frequency, the time domain information of the n frames of first audio frames may include: the frequency across the n first audio frames first decreases, then increases, and finally decreases. Similarly, the time domain information of the n frames of second audio frames is the time domain information of the audio to be evaluated, and it includes the time sequence changes of the information carried by the n frames of second audio frames.

Since the audio frames in audio are played sequentially in time order, the quality of the audio perceived by a listener is related to the time domain information of the audio frames. For example, in the n frames of second audio frames, the second audio frame a and the second audio frame b are two audio frames with adjacent time stamps. If the quality of the second audio frame a and the quality of the second audio frame b are both high, but their loudness differs greatly, the listener's experience is poor when the second audio frame a is played and then the second audio frame b is played. The listening experience therefore cannot be reflected by a single audio frame, but it can be determined from the time domain information of the audio frames. Accordingly, the audio quality evaluation apparatus may determine the quality of the audio to be evaluated from the viewpoint of the listener's perception based on the time domain information of the n frames of second audio frames.

Since the audio processing is performed on the original audio to obtain the audio to be evaluated, the information of the original audio may be lost or changed, and thus a difference exists between the audio to be evaluated and the original audio, the original audio can be used as a reference, and the quality of the audio to be evaluated can be measured. Because the time domain information of the n frames of first audio frames is the time domain information of the original audio, and the time domain information of the n frames of second audio frames is the time domain information of the audio to be evaluated, the quality of the audio to be evaluated can be determined through the difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames.

In this embodiment of the present application, the difference between the time domain information of the n frames of the first audio frame and the time domain information of the n frames of the second audio frame is the first difference. The larger the first difference is, the larger the difference between the time domain information of the original audio and the time domain information of the audio to be evaluated is, and accordingly, the lower the quality of the audio to be evaluated is, whereas the smaller the first difference is, the smaller the difference between the time domain information of the original audio and the time domain information of the audio to be evaluated is, and accordingly, the higher the quality of the audio to be evaluated is. Thus, the quality of the audio to be evaluated is inversely related to the first difference.
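A minimal sketch of one way to compute such a first difference follows. It assumes that the information carried by each frame is loudness, approximated here by the root-mean-square level, and that the time domain information is the frame-to-frame change of that level. The distance measure, the mapping `1 / (1 + difference)` and all function names are assumptions for illustration; the application does not fix any of them.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame, used here as a loudness proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def temporal_changes(frames):
    """Frame-to-frame change of the per-frame loudness proxy."""
    levels = [rms(f) for f in frames]
    return [b - a for a, b in zip(levels, levels[1:])]


def first_difference(first_frames, second_frames):
    """Mean absolute gap between the two temporal-change sequences (n > 1)."""
    d1 = temporal_changes(first_frames)
    d2 = temporal_changes(second_frames)
    return sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1)


def quality_from_difference(diff):
    """Map a non-negative difference to (0, 1]; larger difference, lower quality."""
    return 1.0 / (1.0 + diff)


original_frames = [[0.1, -0.1], [0.5, -0.5], [0.2, -0.2]]
identical_frames = [list(f) for f in original_frames]
flattened_frames = [[0.3, -0.3], [0.3, -0.3], [0.3, -0.3]]  # loudness contour removed

q_same = quality_from_difference(first_difference(original_frames, identical_frames))
q_flat = quality_from_difference(first_difference(original_frames, flattened_frames))
print(q_same)            # 1.0
print(q_flat < q_same)   # True
```

The second case shows why the temporal view matters: every flattened frame is clean on its own, yet the loudness contour of the original is lost, so the first difference grows and the quality drops.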

In this embodiment of the present application, the audio to be evaluated is obtained by performing audio processing on the original audio. After acquiring the original audio and the audio to be evaluated, the audio quality evaluation device divides the original audio into n frames of first audio frames and divides the audio to be evaluated into n frames of second audio frames. In this way, the time domain information of the n frames of first audio frames includes the time sequence variation of the information carried by the n frames of first audio frames, the time domain information of the n frames of second audio frames includes the time sequence variation of the information carried by the n frames of second audio frames, and the first difference between the two can represent the difference between the time sequence variation of the information carried by the original audio and the time sequence variation of the information carried by the audio to be evaluated. Because the quality of the audio to be evaluated is inversely related to the first difference, the audio quality evaluation device can determine the quality of the audio to be evaluated according to the first difference, thereby improving the accuracy of the quality assessment of the audio to be evaluated.

Moreover, audio quality evaluation indexes such as the perceptual evaluation of speech quality (PESQ) are defined on the basis of speech quality, so they evaluate the quality of audio by evaluating the quality of the speech in the audio. In contrast, when the audio quality evaluation device determines the quality of the audio to be evaluated according to the first difference, it can evaluate the quality of any type of audio to be evaluated. For example, when the audio to be evaluated is music audio, the audio quality evaluation device can still accurately evaluate its quality.

As an alternative embodiment, the audio quality assessment apparatus further performs the following steps before performing step 104:

201. Obtaining n short-time scores of the audio to be evaluated according to the second differences between the audio frames at the same positions in the n frames of first audio frames and the n frames of second audio frames.

In this embodiment of the present application, n frames of first audio frames belong to original audio, and when the original audio is played, the playing order of different first audio frames is different, and likewise, n frames of second audio frames belong to audio to be evaluated, and when the audio to be evaluated is played, the playing order of different second audio frames is different. The audio frames with the same positions in the n-frame first audio frames and the n-frame second audio frames are audio frames with the same playing sequence in the n-frame first audio frames and the n-frame second audio frames. That is, for the audio frames at the same position in the n-frame first audio frame and the n-frame second audio frame, the second audio frame is obtained by audio processing the first audio frame.

For example, n is 2, and the n frames of first audio frames include a first audio frame a and a first audio frame b, where the first audio frame a and the first audio frame b are sequentially played when the original audio is played. The n frames of second audio frames comprise a second audio frame c and a second audio frame d, wherein the second audio frame c and the second audio frame d are sequentially played when the audio to be evaluated is played. At this time, the playing order of the first audio frame a and the playing order of the second audio frame c are the same, and the playing order of the first audio frame b and the playing order of the second audio frame d are the same. The position of the first audio frame a in the n-frame first audio frame is the same as the position of the second audio frame c in the n-frame second audio frame, and the position of the first audio frame b in the n-frame first audio frame is the same as the position of the second audio frame d in the n-frame second audio frame. The second audio frame c is obtained by audio processing the first audio frame a, and the second audio frame d is obtained by audio processing the first audio frame b.

In this embodiment of the present application, the second difference is a difference between audio frames at the same position in the n frames of first audio frames and the n frames of second audio frames. If a first audio frame and a second audio frame at the same position are referred to as a set of co-located audio frames, then n sets of co-located audio frames exist in the n frames of first audio frames and the n frames of second audio frames, and there is a second difference between the two audio frames in each set. For example, the n frames of first audio frames include a first audio frame a and a first audio frame b, and the n frames of second audio frames include a second audio frame c and a second audio frame d, where the first audio frame a and the second audio frame c are a set of co-located audio frames, and the first audio frame b and the second audio frame d are a set of co-located audio frames. There is a second difference between the first audio frame a and the second audio frame c, and a second difference between the first audio frame b and the second audio frame d.

The larger the second difference is, the larger the difference between the first audio frame and the second audio frame in a set of co-located audio frames is, and the larger the difference between the original audio and the audio to be evaluated is. Conversely, the smaller the second difference is, the smaller the difference between the first audio frame and the second audio frame in a set of co-located audio frames is, and the smaller the difference between the original audio and the audio to be evaluated is. Thus, the quality of the audio to be evaluated is inversely related to the second difference.

The audio quality evaluation device can obtain one short-time score of the audio to be evaluated according to one second difference, and can therefore obtain n short-time scores of the audio to be evaluated according to the n second differences, wherein each short-time score represents the quality of the audio to be evaluated. In one possible implementation, the short-time score is positively correlated with the second difference; since the quality of the audio to be evaluated is negatively correlated with the second difference, the short-time score is then negatively correlated with the quality of the audio to be evaluated. In another possible implementation, the short-time score is negatively correlated with the second difference; since the quality of the audio to be evaluated is negatively correlated with the second difference, the short-time score is then positively correlated with the quality of the audio to be evaluated.
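As a minimal sketch of this step, the following Python code computes one second difference per set of co-located audio frames and maps it to a short-time score. The mean absolute sample difference and the mapping 1 / (1 + d) are assumptions chosen for illustration; the application does not specify how the second difference or the short-time score is computed, and the function names are hypothetical.

```python
from typing import List, Sequence


def second_difference(frame_a: Sequence[float], frame_b: Sequence[float]) -> float:
    """Mean absolute sample difference between two co-located frames (assumed metric)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("co-located frames must have the same length")
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)


def short_time_scores(first_frames: Sequence[Sequence[float]],
                      second_frames: Sequence[Sequence[float]],
                      inverse: bool = True) -> List[float]:
    """Return n short-time scores, one per set of co-located frames.

    inverse=True:  the score falls as the difference grows, so it is
                   positively correlated with quality.
    inverse=False: the score equals the difference, so it is negatively
                   correlated with quality.
    """
    if len(first_frames) != len(second_frames):
        raise ValueError("both audios must be divided into the same number of frames")
    scores = []
    for first, second in zip(first_frames, second_frames):
        d = second_difference(first, second)
        scores.append(1.0 / (1.0 + d) if inverse else d)
    return scores


# n = 2: frames a, b come from the original audio, frames c, d from the audio to be evaluated.
first = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
second = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
scores = short_time_scores(first, second)
```

In this toy input, frame c is identical to frame a and therefore receives the highest possible score under the inverse mapping, while frame d differs from frame b and receives a lower score.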

After n short-time scores are obtained, the audio quality evaluation apparatus performs the following steps in the process of performing step 104:

202. Obtaining the long-term score of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames.

In the embodiment of the application, the long-term score characterizes the quality of the audio to be evaluated, and the audio quality evaluation device can obtain the long-term score of the audio to be evaluated according to the first difference. In one possible implementation, the long-term score is positively correlated with the first difference; since the quality of the audio to be evaluated is negatively correlated with the first difference, the long-term score is then negatively correlated with the quality of the audio to be evaluated. In another possible implementation, the long-term score is negatively correlated with the first difference; since the quality of the audio to be evaluated is negatively correlated with the first difference, the long-term score is then positively correlated with the quality of the audio to be evaluated.

It should be appreciated that while both the short-time scores and the long-term score can characterize the quality of the audio to be evaluated, they characterize it from different angles. Specifically, the long-term score is determined according to the first difference, and therefore reflects the quality of the audio to be evaluated based on the difference between the time domain information of the original audio and the time domain information of the audio to be evaluated. The short-time scores are determined according to the n second differences, and therefore reflect the quality of the audio to be evaluated based on the differences between the audio frames at the same position in the original audio and in the audio to be evaluated.

For example, the n-frame first audio frame includes a first audio frame a, a first audio frame b, and the n-frame second audio frame includes a second audio frame c, and a second audio frame d, where a position of the first audio frame a in the n-frame first audio frame is the same as a position of the second audio frame c in the n-frame second audio frame, and a position of the first audio frame b in the n-frame first audio frame is the same as a position of the second audio frame d in the n-frame second audio frame.

In this case, the short-time scores are based on the second difference between the first audio frame a and the second audio frame c and the second difference between the first audio frame b and the second audio frame d, which are used to determine the difference between the audio to be evaluated and the original audio, and further the quality of the audio to be evaluated. The time domain information of the original audio is the relation e between the first audio frame a and the first audio frame b in the time domain, and the time domain information of the audio to be evaluated is the relation f between the second audio frame c and the second audio frame d in the time domain. The long-term score is based on the difference between the relation e and the relation f, which is used to determine the difference between the audio to be evaluated and the original audio, and further the quality of the audio to be evaluated.
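The distinction can be illustrated with a small sketch. Here the relation between consecutive frames in the time domain is assumed to be the change in frame energy from one frame to the next, and the first difference is assumed to be the mean absolute difference between the two sequences of changes. Both choices, and all function names, are illustrative assumptions; the application does not define how the time domain information is represented.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def temporal_relation(frames: Sequence[Sequence[float]]) -> List[float]:
    """Energy change between each pair of consecutive frames (assumed time domain information)."""
    energies = [frame_energy(f) for f in frames]
    return [later - earlier for earlier, later in zip(energies, energies[1:])]


def first_difference(first_frames: Sequence[Sequence[float]],
                     second_frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference between relation e (original) and relation f (evaluated)."""
    relation_e = temporal_relation(first_frames)
    relation_f = temporal_relation(second_frames)
    return sum(abs(e - f) for e, f in zip(relation_e, relation_f)) / len(relation_e)


def long_term_score(first_frames: Sequence[Sequence[float]],
                    second_frames: Sequence[Sequence[float]]) -> float:
    """Score in (0, 1] that decreases as the first difference grows (assumed mapping)."""
    return 1.0 / (1.0 + first_difference(first_frames, second_frames))


# Original audio: a loud frame a followed by a silent frame b.
original = [[1.0, 1.0], [0.0, 0.0]]
# Audio to be evaluated: the silence in frame d was replaced by a loud signal.
evaluated = [[1.0, 1.0], [1.0, 1.0]]
score = long_term_score(original, evaluated)
```

The sketch requires n greater than 1, as the method itself does, because a single frame has no relation to a neighbouring frame.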

203. Determining the quality of the audio to be evaluated according to the n short-time scores and the long-term score.

The audio quality assessment apparatus can determine the quality of the audio to be evaluated by fusing the n short-time scores and the long-term score. In one possible implementation manner, in the case that the short-time scores and the long-term score are both positively correlated with the quality of the audio to be evaluated, or are both negatively correlated with the quality of the audio to be evaluated, the audio quality evaluation device determines the quality of the audio to be evaluated by performing weighted summation on the n short-time scores and the long-term score.

In another possible implementation manner, in the case that the short-time scores and the long-term score are both positively correlated with the quality of the audio to be evaluated, or are both negatively correlated with the quality of the audio to be evaluated, the audio quality evaluation device determines the quality of the audio to be evaluated by performing weighted averaging on the n short-time scores and the long-term score.
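The two fusion options above can be sketched as follows. The sketch assumes that all n short-time scores share one weight and that the long-term score has its own weight; the application does not state how the weights are assigned, so `short_weight` and `long_weight` are hypothetical parameters.

```python
from typing import Sequence


def weighted_sum(short_scores: Sequence[float], long_score: float,
                 short_weight: float, long_weight: float) -> float:
    """Weighted summation of the n short-time scores and the long-term score."""
    return short_weight * sum(short_scores) + long_weight * long_score


def weighted_average(short_scores: Sequence[float], long_score: float,
                     short_weight: float, long_weight: float) -> float:
    """Weighted average: the weighted sum divided by the total weight."""
    total_weight = short_weight * len(short_scores) + long_weight
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return weighted_sum(short_scores, long_score, short_weight, long_weight) / total_weight


# Both kinds of score must point in the same direction (both positively or
# both negatively correlated with quality) before they are fused.
short = [0.8, 0.6]
fused_sum = weighted_sum(short, 0.9, short_weight=1.0, long_weight=2.0)
fused_avg = weighted_average(short, 0.9, short_weight=1.0, long_weight=2.0)
```

Unlike the weighted sum, the weighted average stays within the range of the input scores, which makes results comparable across audios divided into different numbers of frames.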

In this embodiment, the audio quality evaluation device obtains the long-term score of the audio to be evaluated according to the first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, and obtains the n short-time scores of the audio to be evaluated according to the second differences between the audio frames at the same position in the n frames of first audio frames and the n frames of second audio frames. The short-time scores and the long-term score represent the quality of the audio to be evaluated from different dimensions, so determining the quality of the audio to be evaluated according to both the n short-time scores and the long-term score can improve the accuracy of the determined quality.

As an alternative embodiment, the audio quality assessment apparatus performs the following steps in performing step 202:

301. Carrying out feature extraction processing on the n frames of first audio frames to obtain n first feature values of the n frames of first audio frames.

In this embodiment of the present application, the first feature value carries audio information of the first audio frame. The first feature values are in one-to-one correspondence with the first audio frames, i.e., each first audio frame has one first feature value. Optionally, the first feature value is the Mel-frequency cepstral coefficients (MFCCs) of the first audio frame. Optionally, one first feature value includes 50 MFCCs.

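By performing feature extraction processing on the n frames of first audio frames, n first feature values carrying the audio information of the n frames of first audio frames can be obtained, and the data volume of the n first feature values is smaller than that of the n frames of first audio frames.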

In one possible implementation, the audio quality assessment apparatus may obtain the first characteristic value of the first audio frame by performing Mel-frequency cepstrum (Mel-Frequency Cepstrum) processing on the first audio frame. The audio quality evaluation device obtains n first characteristic values of the n first audio frames by respectively carrying out Mel frequency cepstrum processing on the n first audio frames.
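For illustration, a textbook MFCC computation for a single frame can be sketched in plain Python as below. This is not taken from the application: the Hamming window, the naive DFT, the 16 kHz sample rate, the 20 triangular mel filters, and the 13 output coefficients are all assumptions, and a practical implementation would normally rely on an FFT and a dedicated audio library.

```python
import math
from typing import List, Sequence


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters: int, num_bins: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centre frequencies are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = (sample_rate / 2) / (num_bins - 1)
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame: Sequence[float], sample_rate: int = 16000,
         num_filters: int = 20, num_coeffs: int = 13) -> List[float]:
    """One feature value (a vector of MFCCs) for one audio frame."""
    if num_coeffs > num_filters:
        raise ValueError("num_coeffs must not exceed num_filters")
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(spectrum), sample_rate)
    # Small offset avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    # DCT-II of the log filterbank energies yields the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# One 256-sample frame of a 440 Hz tone becomes a 13-element feature value.
frame = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(256)]
feature_value = mfcc(frame)
```

Here a frame of 256 samples is reduced to 13 numbers, which illustrates why a feature value has a smaller data volume than the audio frame it describes.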

302. Carrying out feature extraction processing on the n frames of second audio frames to obtain n second feature values of the n frames of second audio frames.

In this embodiment of the present application, the second feature value carries audio information of the second audio frame. The second characteristic values are in one-to-one correspondence with the second audio frames, i.e. one frame of the second audio frames has one second characteristic value. Optionally, the second characteristic value is MFCCs of the second audio frame.

In one possible implementation manner, the audio quality evaluation device may obtain the second characteristic value of the second audio frame by performing mel frequency cepstrum processing on the second audio frame. The audio quality evaluation device obtains n second characteristic values of the n second audio frames by respectively carrying out Mel frequency cepstrum processing on the n second audio frames.

303. Obtaining the long-term score according to the third difference between the time domain information of the n first feature values and the time domain information of the n second feature values.

In this embodiment of the present application, the third difference is a difference between the time domain information of the n first feature values and the time domain information of the n second feature values. Since the n first feature values carry the audio information of the n frames of first audio frames, the n second feature values carry the audio information of the n frames of second audio frames, and the third difference can represent the difference between the time domain information of the n first audio frames and the time domain information of the n second audio frames, the third difference can be used for representing the quality of the audio to be evaluated, wherein the quality of the audio to be evaluated is inversely related to the third difference. Thus, the audio quality evaluation apparatus can obtain a long-term score based on the third difference.

In this embodiment, the audio quality evaluation device obtains n first feature values of the n first audio frames by performing feature extraction processing on the n first audio frames. Because a first feature value has a smaller data volume than the first audio frame it is extracted from, the n first feature values characterize the audio information of the n first audio frames with a smaller data volume. Similarly, the audio quality evaluation device obtains n second feature values of the n second audio frames by performing feature extraction processing on the n second audio frames, and the n second feature values characterize the audio information of the n second audio frames with a smaller data volume. Thus, the third difference between the time domain information of the n first feature values and the time domain information of the n second feature values can characterize the difference between the time domain information of the n first audio frames and the time domain information of the n second audio frames, i.e., the third difference can be used to characterize the quality of the audio to be evaluated. Therefore, given that the quality of the audio to be evaluated is inversely related to the third difference, the audio quality evaluation device obtains the long-term score characterizing the quality of the audio to be evaluated according to the third difference, which reduces the data processing amount for obtaining the long-term score.

As an alternative embodiment, after obtaining the n first feature values of the n first audio frames and the n second feature values of the n second audio frames, the audio quality evaluation device obtains n short-time scores of the audio to be evaluated according to the fourth differences between the feature values at the same position in the n first feature values and the n second feature values.

In this embodiment of the present application, the position of a first feature value in the n first feature values is the same as the position of the first audio frame corresponding to that first feature value in the n frames of first audio frames, and the position of a second feature value in the n second feature values is the same as the position of the second audio frame corresponding to that second feature value in the n frames of second audio frames.

For example, n is 2, the n-frame first audio frame includes a first audio frame a and a first audio frame b, and the n-frame second audio frame includes a second audio frame c and a second audio frame d. The n first feature values comprise a first feature value e and a first feature value f, and the n second feature values comprise a second feature value g and a second feature value h, wherein the first feature value e is the first feature value of the first audio frame a, the first feature value f is the first feature value of the first audio frame b, the second feature value g is the second feature value of the second audio frame c, and the second feature value h is the second feature value of the second audio frame d.

Then, the position of the first feature value e among the n first feature values is the same as the position of the first audio frame a among the n first audio frames. The position of the first feature value f in the n first feature values is the same as the position of the first audio frame b in the n first audio frames. The position of the second feature value g in the n second feature values is the same as the position of the second audio frame c in the n second audio frames. The position of the second feature value h in the n second feature values is the same as the position of the second audio frame d in the n second audio frames.

There is a fourth difference between the first and second eigenvalues of the audio frames at the same location. For example, the n frames of first audio frames include a first audio frame a, a first audio frame b, and the n frames of second audio frames include a second audio frame c and a second audio frame d, where a first feature value of the first audio frame a is a first feature value e, a first feature value of the first audio frame b is a first feature value f, a second feature value of the second audio frame c is a second feature value g, and a second feature value of the second audio frame d is a second feature value h.

The first audio frame a and the second audio frame c are a group of audio frames at the same position, and the first audio frame b and the second audio frame d are a group of audio frames at the same position. Then there is a fourth difference between the first characteristic value e and the second characteristic value g and a fourth difference between the first characteristic value f and the second characteristic value h.

The larger the fourth difference is, the larger the difference between the audio information carried by the first audio frame and the audio information carried by the second audio frame in the same-position audio frame is, that is, the larger the difference between the original audio and the audio to be evaluated is, otherwise, the smaller the fourth difference is, the smaller the difference between the audio information carried by the first audio frame and the audio information carried by the second audio frame in the same-position audio frame is, that is, the smaller the difference between the original audio and the audio to be evaluated is. Therefore, the quality of the audio to be evaluated is inversely related to the fourth difference, the audio quality evaluation device may determine the short-term score of the audio to be evaluated according to the fourth difference, specifically, the audio quality evaluation device may obtain a short-term score of the audio to be evaluated according to one fourth difference, and may obtain n short-term scores of the audio to be evaluated according to n fourth differences.

In this embodiment, after obtaining n first feature values and n second feature values, the audio quality evaluation apparatus obtains a short-time score for characterizing the quality of audio to be evaluated according to a fourth difference between the n first feature values and the n second feature values, so that the data throughput for obtaining the short-time score can be reduced.

As an alternative embodiment, the audio quality assessment apparatus performs the following steps in performing step 203:

401. Calculating the variance of the n short-time scores.

402. Determining the stability score of the audio to be evaluated according to the variance.

In the embodiment of the present application, the stability score characterizes the stability of the quality of the audio to be evaluated: the larger the stability score, the better the stability, that is, the smaller the fluctuation of quality across the second audio frames of the audio to be evaluated. The larger the variance of the n short-time scores, the larger that fluctuation. Thus, the audio quality assessment apparatus may determine the stability score of the audio to be evaluated from the variance; specifically, the stability score of the audio to be evaluated is inversely related to the variance of the n short-time scores.

403. Determining the quality of the audio to be evaluated according to the stability score and the long-term score.

In the embodiment of the present application, the better the stability of the quality of the audio to be evaluated, the higher the quality of the audio to be evaluated, and therefore, the stability score is positively correlated with the quality of the audio to be evaluated. In one possible implementation, the audio quality assessment means determines the quality of the audio to be assessed by weighted summing the stability score and the long term score.

In this embodiment, the audio quality evaluation apparatus calculates the variance of the n short-time scores, determines the stability score of the audio to be evaluated according to the variance, and finally determines the quality of the audio to be evaluated according to the stability score and the long-term score, so that the accuracy of the quality of the audio to be evaluated can be improved.
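Steps 401 to 403 can be sketched as follows. The mapping 1 / (1 + variance) is only one assumed way to make the stability score inversely related to the variance, and the equal default weights are hypothetical. The sketch further assumes that the long-term score is positively correlated with the quality of the audio to be evaluated, so that it can be summed with the stability score.

```python
from statistics import pvariance
from typing import Sequence


def stability_score(short_scores: Sequence[float]) -> float:
    """Stability score in (0, 1]; it falls as the variance of the short-time scores grows."""
    return 1.0 / (1.0 + pvariance(short_scores))


def quality_from_stability(short_scores: Sequence[float], long_score: float,
                           stability_weight: float = 0.5,
                           long_weight: float = 0.5) -> float:
    """Weighted summation of the stability score and the long-term score."""
    return stability_weight * stability_score(short_scores) + long_weight * long_score


steady = [0.5, 0.5, 0.5]    # identical short-time scores: no fluctuation
unsteady = [0.0, 1.0]       # strongly fluctuating short-time scores
quality_steady = quality_from_stability(steady, long_score=0.8)
quality_unsteady = quality_from_stability(unsteady, long_score=0.8)
```

With the same long-term score, the audio whose short-time scores fluctuate less receives the higher quality.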

501. Calculating the average value of the n short-time scores.

In this embodiment, the average value of the n short-time scores may represent an average difference between audio frames at the same position in the n-frame first audio frame and the n-frame second audio frame.

502. Determining the quality of the audio to be evaluated according to the average value and the long-term score.

In this embodiment of the present application, the smaller the average difference between the audio frames at the same position in the n frames of the first audio frames and the n frames of the second audio frames, the higher the quality of the audio to be evaluated, and therefore, the average value of the n short-time scores is inversely related to the quality of the audio to be evaluated. In one possible implementation, the audio quality assessment means determines the quality of the audio to be assessed by weighted summation of the average of the n short-term scores and the long-term score.

In this embodiment, after calculating the average value of the n short-time scores, the audio quality evaluation apparatus may determine the quality of the audio to be evaluated based on the average value of the n short-time scores and the long-term score.

601. Calculating the maximum value of the n short-time scores.

In this embodiment, the n short-time scores include a minimum value of the n short-time scores and a maximum value of the n short-time scores.

602. Determining the quality of the audio to be evaluated according to the maximum value and the long-term score.

The maximum value of the n short-time scores may be used to characterize a limit of the quality of the audio to be evaluated. Specifically, the minimum value of the n short-time scores may be used to characterize the highest value of the quality of the audio to be evaluated, and the maximum value of the n short-time scores may be used to characterize the lowest value of the quality of the audio to be evaluated. Accordingly, the audio quality evaluation apparatus can determine the quality of the audio to be evaluated based on the maximum value of the n short-time scores and the long-term score. In one possible implementation, the audio quality assessment apparatus determines the quality of the audio to be evaluated by weighted summation of the maximum value of the n short-time scores and the long-term score.

In this embodiment, after calculating the maximum value of the n short-time scores, the audio quality evaluation apparatus may determine the quality of the audio to be evaluated based on the maximum value of the n short-time scores and the long-term score.

As an alternative implementation manner, after the n short-time scores are obtained, the audio quality evaluation device calculates the variance of the n short-time scores, the average value of the n short-time scores, and the maximum value of the n short-time scores, and obtains the quality of the audio to be evaluated according to the variance, the average value, the maximum value, and the long-term score, so that the accuracy of the quality of the audio to be evaluated can be improved.

As an alternative embodiment, after obtaining the n short-time scores, the variance of the n short-time scores, the average value of the n short-time scores, the maximum value of the n short-time scores, and the long-term score, the audio quality evaluation apparatus stores the n short-time scores, the variance of the n short-time scores, the maximum value of the n short-time scores, and the long-term score as information of the audio to be evaluated, so that the information of the audio to be evaluated can be queried later.

As an alternative implementation manner, the audio quality evaluation device operates an audio on-demand platform through which audio can be released, and the audio to be evaluated is audio to be released to the audio on-demand platform. In one possible implementation, after a user uploads the original audio to the audio on-demand platform, the audio quality assessment device obtains the audio to be evaluated by performing audio processing on the original audio. It should be appreciated that the released content may be the audio in a video. A user of the audio on-demand platform may play any audio on the platform on demand. After determining the quality of the audio to be evaluated, the audio quality evaluation device determines the processing strategy of the audio to be evaluated on the audio on-demand platform according to that quality.

In one possible implementation, the processing strategy of the audio to be evaluated on the audio on-demand platform includes: in the case where the quality of the audio to be evaluated reaches a high quality threshold, the audio quality evaluation device performs a target operation on the audio to be evaluated, the target operation including one or more of: releasing the audio to be evaluated to the audio on-demand platform, and pushing the audio to be evaluated on the audio on-demand platform. In this way, the quality of the audio on the audio on-demand platform may be improved.

In another possible implementation, the processing strategy of the audio to be evaluated on the audio on-demand platform includes: outputting alarm information in the case that the quality of the audio to be evaluated does not reach a low quality threshold, wherein the alarm information indicates that the quality of the audio to be evaluated is lower than the quality requirement of the audio on-demand platform, so that related personnel can handle the audio in time, for example, by re-processing the original audio to obtain target audio with a quality higher than that of the audio to be evaluated and then releasing the target audio to the audio on-demand platform.

In yet another possible implementation, the processing strategy of the audio to be evaluated on the audio on-demand platform includes: in the case that the quality of the audio to be evaluated does not reach the low quality threshold, reducing the push quantity on the basis of a reference push quantity to obtain a target push quantity of the audio to be evaluated, wherein the reference push quantity is the push quantity in the case that the quality of the audio reaches the quality threshold.

In yet another possible implementation, the processing strategy of the audio to be evaluated on the audio on-demand platform includes: in the case that the quality of the audio to be evaluated reaches the low quality threshold but does not reach the high quality threshold, determining the audio to be evaluated as audio to be monitored, wherein the audio to be monitored is audio that requires manual auditing. At this time, the audio to be evaluated is manually audited to further confirm whether the audio to be evaluated is released to the audio on-demand platform.

The audio on-demand platform needs to perform audio processing on the audio uploaded by a user before that audio is released to the audio on-demand platform. The audio processing may cause information loss in the uploaded audio, and thus a difference between the audio uploaded by the user and the audio released to the audio on-demand platform. When a user releases audio through the audio on-demand platform, the user generally expects the released audio to be the same as the uploaded audio, i.e., the smaller the difference between the audio released by the audio on-demand platform and the uploaded audio, the better.

Therefore, before the audio to be evaluated is released to the audio on-demand platform, the audio quality evaluation device determines the quality of the audio to be evaluated based on the method, and further determines the processing strategy of the audio to be evaluated on the audio on-demand platform according to the quality of the audio to be evaluated, so that the quality of the audio released on the audio on-demand platform can be improved, and further user experience is improved.

Referring to fig. 2, fig. 2 is a flowchart illustrating another audio quality evaluation method according to an embodiment of the present application. In the processing flow shown in fig. 2, after the audio quality evaluation device acquires the original audio, on the one hand, it divides the original audio into n frames of first audio frames and performs feature extraction processing on the n frames of first audio frames to obtain n first feature values of the n frames of first audio frames. On the other hand, it performs audio processing on the original audio to obtain the audio to be evaluated, divides the audio to be evaluated into n frames of second audio frames, and performs feature extraction processing on the n frames of second audio frames to obtain n second feature values of the n frames of second audio frames. The n first feature values are divided into at least one first analysis frame, and the n second feature values are divided into at least one second analysis frame. For example, when n is 200, taking every 50 consecutive first feature values as one first analysis frame yields 4 first analysis frames, and taking every 50 consecutive second feature values as one second analysis frame yields 4 second analysis frames. The at least one first analysis frame and the at least one second analysis frame are each taken as one channel to obtain a dual-channel feature frame. The dual-channel feature frame is processed through deep learning to obtain the quality of the audio to be evaluated; for this processing, reference may be made to the above-described process of obtaining the quality of the audio to be evaluated. Optionally, the dual-channel feature frame is input to a neural network to obtain the quality of the audio to be evaluated.
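The grouping of feature values into analysis frames and their pairing into two channels can be sketched as below. The function names are hypothetical, and dropping a trailing group that holds fewer than `frame_size` feature values is an assumption; the application does not say how a remainder is handled.

```python
from typing import List, Sequence, Tuple


def split_into_analysis_frames(features: Sequence, frame_size: int) -> List[list]:
    """Group consecutive feature values into analysis frames of frame_size values each.

    A trailing group with fewer than frame_size values is dropped (assumption).
    """
    count = len(features) // frame_size
    return [list(features[i * frame_size:(i + 1) * frame_size]) for i in range(count)]


def dual_channel_frames(first_features: Sequence, second_features: Sequence,
                        frame_size: int = 50) -> List[Tuple[list, list]]:
    """Pair each first analysis frame with the second analysis frame at the same position.

    Each pair holds channel 0 (original audio) and channel 1 (audio to be evaluated).
    """
    first = split_into_analysis_frames(first_features, frame_size)
    second = split_into_analysis_frames(second_features, frame_size)
    return list(zip(first, second))


# n = 200 feature values per audio; integers stand in for the real feature values.
first_features = list(range(200))
second_features = list(range(200, 400))
frames = dual_channel_frames(first_features, second_features, frame_size=50)
```

With n equal to 200 and 50 feature values per analysis frame, `frames` holds 4 dual-channel feature frames, matching the example in the text.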

Referring to fig. 3, fig. 3 is a schematic structural diagram of a neural network according to an embodiment of the present application. As shown in fig. 3, the neural network includes a convolutional neural network (CNN), a gated recurrent unit (GRU), and a compression module (dense). After at least one first analysis frame and at least one second analysis frame are input to the neural network, the CNN performs feature extraction and feature fusion on the at least one first analysis frame to obtain at least one first analysis feature, and performs feature extraction and feature fusion on the at least one second analysis frame to obtain at least one second analysis feature. The GRU processes the at least one first analysis feature and the at least one second analysis feature to obtain the long-term score, and the compression module processes the at least one first analysis feature and the at least one second analysis feature to obtain the n short-time scores.

Finally, by executing the quality judging step, the processing strategy of the audio to be evaluated on the audio on-demand platform is determined according to the quality of the audio to be evaluated. As shown in fig. 2, through quality judgment, the audio to be evaluated can be classified into three cases: too low quality, low quality, and higher quality, wherein too low quality means that the quality of the audio to be evaluated does not reach the low quality threshold, low quality means that the quality of the audio to be evaluated reaches the low quality threshold but does not reach the high quality threshold, and higher quality means that the quality of the audio to be evaluated reaches the high quality threshold. In the case of too low quality, the audio quality assessment device raises an alarm by outputting alarm information, reduces the push quantity on the basis of the reference push quantity to obtain the target push quantity of the audio to be evaluated, and additionally submits the audio to be evaluated to the background for auditing. In the case of low quality, the audio quality evaluation device outputs prompt information indicating that the quality of the audio to be evaluated is low, and determines the audio to be evaluated as audio to be monitored, so that the audio to be evaluated is in a to-be-audited state. In the case of higher quality, the target operation is performed on the audio to be evaluated, and no special operation is required for the audio to be evaluated at this time.
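The quality judging step can be sketched as a three-way classification followed by a lookup of the corresponding actions. In this sketch, "reaching" a threshold is read as being greater than or equal to it, and the push quantity is reduced by a fixed fraction; both are assumptions, as are the names and the action strings.

```python
from enum import Enum
from typing import Dict


class QualityLevel(Enum):
    TOO_LOW = "too low"
    LOW = "low"
    HIGHER = "higher"


def classify(quality: float, low_threshold: float, high_threshold: float) -> QualityLevel:
    """Classify the quality; 'reaches a threshold' is read as quality >= threshold."""
    if quality < low_threshold:
        return QualityLevel.TOO_LOW
    if quality < high_threshold:
        return QualityLevel.LOW
    return QualityLevel.HIGHER


def processing_strategy(quality: float, low_threshold: float, high_threshold: float,
                        reference_push: int, reduction: float = 0.5) -> Dict[str, object]:
    """Return the quality level, the actions to take, and the push quantity to use."""
    level = classify(quality, low_threshold, high_threshold)
    if level is QualityLevel.TOO_LOW:
        actions = ["output alarm information", "submit to background for auditing"]
        # Reduce the push quantity on the basis of the reference push quantity.
        push_quantity = int(reference_push * (1.0 - reduction))
    elif level is QualityLevel.LOW:
        actions = ["output prompt information", "mark as audio to be monitored"]
        push_quantity = reference_push
    else:
        actions = ["release normally"]
        push_quantity = reference_push
    return {"level": level, "actions": actions, "push_quantity": push_quantity}


strategy = processing_strategy(0.2, low_threshold=0.4, high_threshold=0.7,
                               reference_push=1000)
```

In this example the quality of 0.2 is below the low quality threshold, so the audio is treated as too low quality and its push quantity is halved relative to the reference push quantity.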

Referring to fig. 4, fig. 4 is a flowchart illustrating another audio quality evaluation method according to an embodiment of the present application. As shown in fig. 4, the processing flow includes three stages: preparation and preprocessing, obtaining the quality of the audio to be evaluated, and formulating a processing strategy, wherein the preparation and preprocessing stage includes channel matching, framing, and feature extraction. Specifically, in the case where the original audio includes at least two channels, the audio to be evaluated also includes at least two channels. At this time, the corresponding channels in the original audio and the audio to be evaluated may be extracted by channel matching, or the at least two channels of the original audio may be mixed into one channel and the at least two channels of the audio to be evaluated may be mixed into one channel by channel matching. Then, by framing, the channel of the original audio is divided into n frames of first audio frames, and the channel of the audio to be evaluated is divided into n frames of second audio frames. Then, n first feature values of the n first audio frames and n second feature values of the n second audio frames are obtained through feature extraction.
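The channel mixing and framing steps can be sketched as follows. Mixing by averaging the samples of all channels and discarding samples that do not fill the last frame are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List, Sequence


def downmix(channels: Sequence[Sequence[float]]) -> List[float]:
    """Mix at least two channels into one by averaging the samples at each position."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]


def split_into_frames(samples: Sequence[float], n: int) -> List[List[float]]:
    """Divide one channel into n frames of equal length.

    Samples that do not fill the last frame are discarded (assumption).
    """
    frame_length = len(samples) // n
    if frame_length == 0:
        raise ValueError("not enough samples for n frames")
    return [list(samples[i * frame_length:(i + 1) * frame_length]) for i in range(n)]


# A two-channel original audio is mixed into one channel and divided into n = 2 frames.
left = [1.0, 0.0, 1.0, 0.0]
right = [0.0, 1.0, 1.0, 0.0]
mono = downmix([left, right])
first_frames = split_into_frames(mono, n=2)
```

Applying the same two functions to the audio to be evaluated yields n second audio frames whose positions correspond to those of the n first audio frames.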

In the stage of obtaining the quality of the audio to be evaluated, the n first characteristic values and the n second characteristic values are input into a neural network, and a long-time score of the audio to be evaluated and n short-time scores of the audio to be evaluated are obtained through model calculation. Score statistics are then performed on the n short-time scores to obtain the average value (namely the short-time average value), the maximum value (namely the short-time maximum value), the minimum value (namely the short-time minimum value), and the variance (namely the short-time variance) of the n short-time scores. The quality of the audio to be evaluated is then obtained by fusing the short-time average value, the short-time maximum value, the short-time minimum value, the short-time variance, and the long-time score. It should be appreciated that, after the quality of each channel of the audio to be evaluated is determined separately, the quality of the audio to be evaluated may be obtained by determining the average of the quality of all channels of the audio to be evaluated.
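The score statistics and fusion step might look like the sketch below. The statistics themselves follow the description; the fusion by a weighted sum and the default weight values are purely assumptions for illustration, because the application does not state how the five quantities are fused. The population variance is used here, which is also an assumption.

```python
from statistics import mean, pvariance
from typing import Dict, Optional, Sequence


def short_time_statistics(scores: Sequence[float]) -> Dict[str, float]:
    """Average, maximum, minimum, and variance of the n short-time scores."""
    if len(scores) < 2:
        raise ValueError("n must be greater than 1")
    return {
        "mean": mean(scores),
        "max": max(scores),
        "min": min(scores),
        "variance": pvariance(scores),  # population variance (assumed)
    }


def fuse_quality(stats: Dict[str, float],
                 long_time_score: float,
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Fuse the statistics and the long-time score with a weighted sum.

    The weighted sum and the default weights are hypothetical.
    """
    w = weights or {"mean": 0.3, "max": 0.1, "min": 0.2,
                    "variance": -0.1, "long": 0.5}
    return (w["mean"] * stats["mean"]
            + w["max"] * stats["max"]
            + w["min"] * stats["min"]
            + w["variance"] * stats["variance"]
            + w["long"] * long_time_score)


def overall_quality(channel_qualities: Sequence[float]) -> float:
    """Average the per-channel qualities into one quality value."""
    return mean(channel_qualities)
```

A negative weight on the variance is one way to express that strongly fluctuating short-time scores should lower the fused quality, but any such choice would need to be validated against real data.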

In the stage of formulating the processing strategy, the processing strategy is determined according to the quality of the audio to be evaluated. Specifically, in the case of too low quality, the audio quality assessment device raises an alarm by outputting alarm information, obtains the target pushing quantity of the audio to be evaluated by reducing the pushing quantity on the basis of the reference pushing quantity, and submits the audio to be evaluated to the background for auditing, so that the audio to be evaluated is in a to-be-processed state. In the case of low quality, processing of the audio to be evaluated is not urgent; the audio quality assessment device prompts that the quality of the audio to be evaluated is low by outputting prompt information, and determines the audio to be evaluated as audio to be detected, so that the audio to be evaluated can be watched and tracked. In the case of high quality, the audio to be evaluated is published normally.

It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.

The foregoing details the method of embodiments of the present application, and the apparatus of embodiments of the present application is provided below.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an audio quality assessment apparatus according to an embodiment of the present application. The audio quality assessment apparatus 1 includes an obtaining unit 11, a dividing unit 12, and a determining unit 13. Optionally, the audio quality assessment apparatus 1 further includes a processing unit 14. Specifically:

an obtaining unit 11, configured to obtain an original audio and an audio to be evaluated, where the audio to be evaluated is obtained by performing audio processing on the original audio;

a dividing unit 12 for dividing the original audio into n frames of first audio frames, where n is an integer greater than 1;

the dividing unit 12 is configured to divide the audio to be evaluated into n frames of second audio frames;

a determining unit 13, configured to determine a quality of the audio to be evaluated according to a first difference between the time domain information of the n frames of first audio frames and the time domain information of the n frames of second audio frames, where the quality of the audio to be evaluated is inversely related to the first difference.
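The inverse relation between the quality and the first difference can be illustrated with the sketch below. It assumes, for illustration only, that the time domain information of a frame is its root-mean-square amplitude, that the first difference is the mean absolute difference of that value over the n frame pairs, and that the quality is mapped as 1 / (1 + difference). None of these choices is stated in the application, and the function names are hypothetical.

```python
import math
from typing import Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame (assumed time domain info)."""
    if not frame:
        raise ValueError("frame must not be empty")
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def first_difference(first_frames: Sequence[Sequence[float]],
                     second_frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference of the per-frame RMS over n frame pairs."""
    if len(first_frames) != len(second_frames):
        raise ValueError("both audios must be divided into the same n frames")
    if len(first_frames) < 2:
        raise ValueError("n must be greater than 1")
    total = sum(abs(frame_rms(a) - frame_rms(b))
                for a, b in zip(first_frames, second_frames))
    return total / len(first_frames)


def quality_from_difference(difference: float) -> float:
    """Map a non-negative difference to a quality in (0, 1].

    The mapping is monotonically decreasing, so the quality is
    inversely related to the difference.
    """
    if difference < 0:
        raise ValueError("difference must be non-negative")
    return 1.0 / (1.0 + difference)
```

With this mapping, audio to be evaluated that is identical to the original audio obtains a quality of 1.0, and the quality decreases as the time domain information drifts further from that of the original audio.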

In combination with any one of the embodiments of the present application, the audio quality assessment apparatus 1 further includes: a processing unit 14, configured to obtain n short-time scores of the audio to be evaluated according to a second difference of the audio frames at the same position in the n first audio frames and the n second audio frames, where the quality of the audio to be evaluated is inversely related to the second difference, and the short-time scores represent the quality of the audio to be evaluated;

The determining unit 13 is configured to:

In combination with any one of the embodiments of the present application, the determining unit 13 is configured to:

calculating variances of the n short-time scores;

calculating an average value of the n short-time scores;

calculating the maximum value of the n short-time scores;

In combination with any one of the embodiments of the present application, the dividing unit 12 is configured to:

In combination with any embodiment of the present application, the audio quality assessment apparatus 1 operates an audio on demand platform, where the audio to be assessed is audio to be released to the audio on demand platform;

the determining unit 13 is further configured to determine a processing policy of the audio to be evaluated on the audio on-demand platform according to the quality of the audio to be evaluated.

Moreover, an audio quality evaluation index (such as PESQ) is determined based on the quality of speech, so evaluating the quality of audio with such an index evaluates only the quality of the speech in the audio. By contrast, in the case where the quality of the audio to be evaluated is determined based on the first difference, the audio quality assessment apparatus can evaluate the quality of any type of audio to be evaluated. For example, in the case where the audio to be evaluated is music audio, the audio quality assessment apparatus can still accurately evaluate the quality of the audio to be evaluated.

In some embodiments, functions or modules included in the apparatus provided in the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application. The electronic device 2 includes a processor 21 and a memory 22. Optionally, the electronic device 2 further includes an input device 23 and an output device 24. The processor 21, the memory 22, the input device 23, and the output device 24 are coupled by connectors, which include various interfaces, transmission lines, buses, and the like; this is not limited in the embodiments of the present application. It should be understood that, in the various embodiments of the present application, coupling means interconnection in a particular manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, or buses.

The processor 21 may comprise one or more processors, for example one or more central processing units (central processing unit, CPU), which in the case of a CPU may be a single core CPU or a multi core CPU. Alternatively, the processor 21 may be a processor group constituted by a plurality of CPUs, the plurality of processors being coupled to each other through one or more buses. In the alternative, the processor may be another type of processor, and the embodiment of the present application is not limited.

The memory 22 may be used to store computer program instructions as well as various types of computer program code, including the program code for executing the solutions of the present application. Optionally, the memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and is used for storing the associated instructions and data.

The input means 23 are for inputting data and/or signals and the output means 24 are for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or may be an integral device.

It will be appreciated that in the embodiments of the present application, the memory 22 may be used to store not only relevant instructions, but also relevant data, and the embodiments of the present application are not limited to the data specifically stored in the memory.

It will be appreciated that fig. 6 shows only a simplified design of an electronic device. In practical applications, the electronic device may further include other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all electronic devices that may implement the embodiments of the present application are within the scope of protection of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described systems, apparatuses, and units may refer to the corresponding procedures in the foregoing method embodiments, and are not repeated herein. It will be further apparent to those skilled in the art that the descriptions of the various embodiments of the present application each have their own emphasis. For convenience and brevity of description, the same or similar parts may not be repeated in different embodiments; therefore, for parts that are not described, or not described in detail, in one embodiment, reference may be made to the descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Those of ordinary skill in the art will appreciate that all or part of the flows of the above-described method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above-described method embodiments. The aforementioned storage medium includes: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

Claims

1. A method of audio quality assessment, the method comprising:

dividing the audio to be evaluated into n frames of second audio frames;

2. The method of claim 1, wherein prior to determining the quality of the audio to be evaluated based on a first difference in time domain information of the n frames of first audio frames and time domain information of the n frames of second audio frames, the method further comprises:

3. The method of claim 2, wherein the obtaining the long-term score of the audio to be evaluated based on the first difference between the time-domain information of the n first audio frames and the time-domain information of the n second audio frames comprises:

4. A method according to claim 2 or 3, wherein said determining the quality of the audio to be evaluated from the n short-time scores and the long-time score comprises:

calculating variances of the n short-time scores;

5. A method according to claim 2 or 3, wherein said determining the quality of the audio to be evaluated from the n short-time scores and the long-time score comprises:

calculating an average value of the n short-time scores;

6. A method according to claim 2 or 3, wherein said determining the quality of the audio to be evaluated from the n short-time scores and the long-time score comprises:

calculating the maximum value of the n short-time scores;

7. A method according to any one of claims 1 to 3, wherein said dividing the audio to be evaluated into n frames of second audio frames comprises:

8. A method according to any one of claims 1 to 3, wherein the audio quality assessment method is applied to an audio quality assessment device, the audio quality assessment device operating an audio on demand platform, the audio to be assessed being audio that is to be published to the audio on demand platform;

9. The method of claim 8, wherein the determining a processing strategy of the audio to be evaluated on the audio on-demand platform according to the quality of the audio to be evaluated comprises:

10. An audio quality assessment apparatus, the apparatus comprising:

11. An electronic device, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 9.

12. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 9.

13. A computer program product, characterized in that the computer program product comprises a computer program or instructions; the computer program or instructions, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.