CN112261437A - Audio and video quality evaluation method and device, readable storage medium and electronic equipment

Info

Publication number: CN112261437A
Application number: CN202011299725.7A
Authority: CN
Inventors: 徐亮
Original assignee: Beike Technology Co Ltd
Current assignee: Beike Technology Co Ltd
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2021-01-22

Abstract

The embodiments of the present disclosure disclose an audio and video quality evaluation method and device, a readable storage medium, and electronic equipment. The method comprises: obtaining dimension information of a plurality of evaluation dimensions of the audio and video to be scored; determining, based on the dimension information corresponding to each of the plurality of evaluation dimensions, a dimension score corresponding to that evaluation dimension, so as to obtain a plurality of dimension scores; and inputting the plurality of dimension scores into a neural network to obtain a comprehensive score of the audio and video. Because the comprehensive score is determined from the plurality of dimension scores, it can indicate the direction in which the audio and video should be optimized.

Description

Audio and video quality evaluation method and device, readable storage medium and electronic equipment

Technical Field

The present disclosure relates to audio and video quality evaluation technologies, and in particular, to an audio and video quality evaluation method and apparatus, a readable storage medium, and an electronic device.

Background

With the rapid development of the mobile internet, a large number of new audio and video services have emerged, in particular mobile live-streaming services, which have become extremely popular because of the epidemic. The quality of the audio and video directly determines the success or failure of these services.

An effective audio and video evaluation method is therefore needed to evaluate the various audio and video services and, through that evaluation, to better drive system optimization and bring users the best possible viewing experience.

Disclosure of Invention

The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides an audio and video quality evaluation method and device, a readable storage medium and electronic equipment.

According to an aspect of the embodiments of the present disclosure, there is provided an audio and video quality evaluation method, including:

obtaining dimension information of a plurality of evaluation dimensions of the audio and video to be scored;

determining a dimension score corresponding to the evaluation dimension based on the dimension information corresponding to each evaluation dimension in the plurality of evaluation dimensions, so as to obtain a plurality of dimension scores;

and inputting the plurality of dimension scores into a neural network to obtain the comprehensive score of the audio and video.

Optionally, before inputting the plurality of dimension scores into the neural network to obtain the comprehensive score of the audio and video, the method further includes:

training the neural network by using a training set; the training set comprises a plurality of sample audios and videos, and each sample audio and video comprises a corresponding user score.

Optionally, the training the neural network with a training set includes:

determining a plurality of sample dimension scores of a plurality of evaluation dimensions of the sample audio and video, and inputting the plurality of sample dimension scores into the neural network to obtain a predicted comprehensive score;

determining a network loss based on the difference between the predicted comprehensive score and the user score;

adjusting a network parameter of the neural network based on the network loss.

Optionally, before determining the network loss based on the difference between the predicted comprehensive score and the user score, the method further includes:

and determining the user score corresponding to the sample audio/video according to the feedback score of at least one user when the sample audio/video is played.

Optionally, the determining, based on the dimension information corresponding to each of the plurality of evaluation dimensions, a dimension score corresponding to the evaluation dimension includes:

determining whether the dimension information is abnormal or not according to the relationship between the dimension information of the evaluation dimension and a dimension threshold corresponding to the evaluation dimension;

and determining a dimension score corresponding to the evaluation dimension based on the number of times the dimension information of the evaluation dimension is abnormal for the audio and video.

Optionally, the determining, based on the number of times the dimension information of the evaluation dimension is abnormal for the audio and video, a dimension score corresponding to the evaluation dimension includes:

dividing the audio and video into at least one time interval according to time;

in each time interval, determining a deduction score corresponding to the evaluation dimension according to the number of times the dimension information of the evaluation dimension is abnormal in the time interval;

and determining a dimension score corresponding to the evaluation dimension based on at least one deduction score corresponding to the at least one time interval.

Optionally, the determining, according to the number of times the dimension information of the evaluation dimension is abnormal in the time interval, a deduction score corresponding to the evaluation dimension includes:

determining whether the number of times the dimension information of the evaluation dimension is abnormal reaches a first set value;

in response to the number of abnormal occurrences reaching the first set value in the time interval, taking a first preset score as the deduction score;

and in response to the number of abnormal occurrences not reaching the first set value in the time interval, obtaining the deduction score by multiplying the number of abnormal occurrences by a second preset score.

Optionally, the determining, based on at least one of the deducted scores corresponding to the at least one time interval, a dimension score corresponding to the evaluation dimension includes:

determining whether the sum of the at least one deduction score is greater than or equal to a third set score;

in response to the sum of the at least one deduction score being greater than or equal to the third set score, determining the dimension score of the evaluation dimension to be a fourth set score; wherein the fourth set score is the difference between the initial score of the evaluation dimension and the third set score;

in response to the sum of the at least one deduction score being less than the third set score, subtracting the sum of the at least one deduction score from the initial score of the evaluation dimension and taking the resulting difference as the dimension score of the evaluation dimension.

Optionally, the evaluation dimensions comprise at least two of: video bit rate, frame rate, resolution, audio bit rate, volume, packet loss, transmission delay, and CPU usage.

Optionally, the method further comprises:

determining the quality grade of the audio and video based on the comprehensive score of the audio and video; the quality grades comprise at least two grades, and each grade corresponds to different comprehensive scores.

According to another aspect of the embodiments of the present disclosure, there is provided an audio and video quality evaluation device, including:

the dimension information acquisition module is used for acquiring dimension information of a plurality of evaluation dimensions of the audio/video to be scored;

the single-dimension scoring module is used for determining a dimension score corresponding to each evaluation dimension based on the dimension information corresponding to each evaluation dimension in the plurality of evaluation dimensions to obtain a plurality of dimension scores;

and the comprehensive scoring module is used for inputting the plurality of dimension scores into a neural network to obtain the comprehensive score of the audio and video.

Optionally, the apparatus further comprises:

the network training module is used for training the neural network by utilizing a training set; the training set comprises a plurality of sample audios and videos, and each sample audio and video comprises a corresponding user score.

Optionally, the network training module is specifically configured to: determine a plurality of sample dimension scores of a plurality of evaluation dimensions of the sample audio and video, and input the plurality of sample dimension scores into the neural network to obtain a predicted comprehensive score; determine a network loss based on the difference between the predicted comprehensive score and the user score; and adjust a network parameter of the neural network based on the network loss.

Optionally, the network training module is further configured to determine a user score corresponding to the sample audio/video according to a feedback score of at least one user when the sample audio/video is played.

Optionally, the single-dimension scoring module includes:

the anomaly determination unit is used for determining whether the dimension information is abnormal or not according to the relationship between the dimension information of the evaluation dimension and the dimension threshold corresponding to the evaluation dimension;

and the abnormality scoring unit is used for determining the dimension score corresponding to the evaluation dimension based on the number of times the dimension information of the evaluation dimension is abnormal for the audio and video.

Optionally, the abnormality scoring unit is specifically configured to: divide the audio and video into at least one time interval according to time; in each time interval, determine a deduction score corresponding to the evaluation dimension according to the number of times the dimension information of the evaluation dimension is abnormal in the time interval; and determine a dimension score corresponding to the evaluation dimension based on at least one deduction score corresponding to the at least one time interval.

Optionally, when determining the deduction score corresponding to the evaluation dimension according to the number of times the dimension information of the evaluation dimension is abnormal in the time interval, the abnormality scoring unit is configured to: determine whether the number of abnormal occurrences reaches a first set value; in response to the number of abnormal occurrences reaching the first set value in the time interval, take a first preset score as the deduction score; and in response to the number of abnormal occurrences not reaching the first set value in the time interval, obtain the deduction score by multiplying the number of abnormal occurrences by a second preset score.

Optionally, when determining the dimension score corresponding to the evaluation dimension based on at least one deduction score corresponding to the at least one time interval, the abnormality scoring unit is configured to: determine whether the sum of the at least one deduction score is greater than or equal to a third set score; in response to the sum being greater than or equal to the third set score, determine the dimension score of the evaluation dimension to be a fourth set score, wherein the fourth set score is the difference between the initial score of the evaluation dimension and the third set score; and in response to the sum being less than the third set score, subtract the sum of the at least one deduction score from the initial score of the evaluation dimension and take the resulting difference as the dimension score of the evaluation dimension.

Optionally, the apparatus further comprises:

the grade determining module is used for determining the quality grade of the audio and video based on the comprehensive score of the audio and video; the quality grades comprise at least two grades, and each grade corresponds to different comprehensive scores.

According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, where the storage medium stores a computer program for executing the audio and video quality evaluation method according to any of the embodiments.

According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to read the executable instruction from the memory, and execute the instruction to implement the audio/video quality evaluation method according to any of the embodiments.

The audio and video quality evaluation method and device, the readable storage medium, and the electronic device provided by the above embodiments of the present disclosure obtain dimension information of a plurality of evaluation dimensions of the audio and video to be scored; determine, based on the dimension information corresponding to each of the plurality of evaluation dimensions, a dimension score corresponding to that evaluation dimension, so as to obtain a plurality of dimension scores; and input the plurality of dimension scores into a neural network to obtain a comprehensive score of the audio and video. Because the comprehensive score is determined from the plurality of dimension scores, it can indicate the direction in which the audio and video should be optimized.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a schematic flow diagram of an audio and video quality evaluation method according to an exemplary embodiment of the present disclosure.

Fig. 2 is a schematic flowchart of a training method of a neural network in an example of an audio/video quality evaluation method provided by an exemplary embodiment of the present disclosure.

Fig. 3 is a schematic flow chart of step 104 in the embodiment shown in fig. 1 of the present disclosure.

Fig. 4 is a schematic flowchart of step 1042 in the embodiment shown in fig. 3 according to the present disclosure.

Fig. 5 is a schematic structural diagram of an audio/video quality evaluation device according to an exemplary embodiment of the present disclosure.

Fig. 6 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.

It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and means that three kinds of relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Exemplary method

Fig. 1 is a schematic flow diagram of an audio and video quality evaluation method according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, as shown in fig. 1, and includes the following steps:

Step 102, obtaining dimension information of a plurality of evaluation dimensions of the audio and video to be scored.

Many factors affect audio and video quality, and each of them can serve as an evaluation dimension for which corresponding dimension information is obtained. For example, when the method provided by this embodiment is applied to a live audio and video streaming scenario, the fluency of the stream, the clarity of the picture, the clarity of the sound, the real-time interactivity of the broadcast, and similar issues need to be considered. Alternatively, a live broadcast may simply be broken down into encoding/decoding and transmission: because audio and video codec schemes are already quite mature, only the influence of the CPU on encoding is considered, while the influence of the transmission link on audio and video quality may include the audio and video bit rate, frame rate, packet loss rate, transmission delay, and the like.

Step 104, determining a dimension score corresponding to the evaluation dimension based on the dimension information corresponding to each evaluation dimension in the plurality of evaluation dimensions, so as to obtain a plurality of dimension scores.

Optionally, in this embodiment, a dimension score is determined separately for each evaluation dimension, so that the audio and video is evaluated on each evaluation dimension through its dimension score. Because the comprehensive score is derived from the plurality of dimension scores, when the comprehensive score is poor (e.g., low), the dimension scores indicate which evaluation dimension should be adjusted to improve it.

Step 106, inputting the plurality of dimension scores into a neural network to obtain the comprehensive score of the audio and video.

According to the audio and video quality evaluation method provided by this embodiment of the disclosure, dimension information of a plurality of evaluation dimensions of the audio and video to be scored is obtained; a dimension score corresponding to each evaluation dimension is determined based on the dimension information corresponding to that evaluation dimension, so as to obtain a plurality of dimension scores; and the plurality of dimension scores are input into a neural network to obtain a comprehensive score of the audio and video. Because the comprehensive score is determined from the plurality of dimension scores, it can indicate the direction in which the audio and video should be optimized.

In some optional embodiments, before performing step 106, the method may further include:

and training the neural network by utilizing the training set.

The training set comprises a plurality of sample audios and videos, and each sample audio and video comprises a corresponding user score.

In this embodiment, the user score is used as the supervision information for training the neural network, so that the comprehensive score produced by the trained neural network fits the subjective experience of the user. This addresses the problems that existing objective evaluation methods cannot approximate the real user experience and cannot indicate how to improve it.

As shown in fig. 2, in the audio/video quality evaluation method provided by the present application, the process of training the neural network by using the training set may include the following steps:

Step 2011, determining a plurality of sample dimension scores of a plurality of evaluation dimensions of the sample audio and video, and inputting the plurality of sample dimension scores into the neural network to obtain a predicted comprehensive score.

Step 2012, a network loss is determined based on the difference between the predicted composite score and the user score.

Step 2013, adjusting network parameters of the neural network based on the network loss.

In this embodiment, the sample audio and video may already have a plurality of known sample dimension scores, or a sample dimension score may be determined for each of the plurality of evaluation dimensions of the sample audio and video. The plurality of sample dimension scores are processed by the neural network to be trained to obtain a predicted comprehensive score, and the network parameters are updated iteratively so that the network fits the mapping between the evaluation dimensions and the subjective evaluation, yielding the trained neural network. Optionally, the neural network may be a BP (back-propagation) neural network, and the conditions for ending training may include, but are not limited to: the change in network loss between two consecutive iterations being smaller than a preset value, the number of iterative updates reaching a set number of iterations, and the like.
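The training loop of steps 2011 to 2013 can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the single tanh hidden layer, the mean-squared-error loss, the learning rate, the scaling of scores to [0, 1], and the name `train_bp_network` are all assumptions introduced here. The two stopping conditions mirror the ones named above (a small change in loss between consecutive iterations, or a set number of iterations).

```python
import math
import random


def train_bp_network(samples, targets, hidden=4, lr=0.05,
                     max_iters=5000, tol=1e-9, seed=0):
    """Fit dimension scores -> user score with one tanh hidden layer.

    samples: dimension-score vectors, scaled to [0, 1] (assumption).
    targets: matching user scores, scaled to [0, 1] (assumption).
    Returns (predict, losses); losses holds the mean squared error
    measured at the start of each iteration.
    """
    rng = random.Random(seed)
    n_in, n = len(samples[0]), len(samples)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Step 2011: sample dimension scores -> predicted comprehensive score.
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    losses = []
    for _ in range(max_iters):
        gw1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gw2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, y in zip(samples, targets):
            h, pred = forward(x)
            err = pred - y
            loss += err * err / n          # Step 2012: network loss (MSE)
            gb2 += 2 * err / n
            for j in range(hidden):        # back-propagate the error
                gw2[j] += 2 * err * h[j] / n
                dh = 2 * err * w2[j] * (1 - h[j] * h[j]) / n
                gb1[j] += dh
                for i in range(n_in):
                    gw1[j][i] += dh * x[i]
        losses.append(loss)
        # Stop when the loss change between two iterations is below tol.
        if len(losses) > 1 and abs(losses[-2] - loss) < tol:
            break
        # Step 2013: adjust the network parameters by gradient descent.
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i]
        b2 -= lr * gb2

    return (lambda x: forward(x)[1]), losses
```

A call such as `predict, losses = train_bp_network(samples, targets)` returns a function that maps a vector of dimension scores to a predicted comprehensive score on the same [0, 1] scale; multiplying by 100 would recover a 100-point score under the scaling assumed here.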

Optionally, in the network training process provided in the foregoing embodiment, the user score used as supervision information may be obtained by determining the user score corresponding to the sample audio and video according to the feedback score of at least one user while the sample audio and video is played, or by asking staff to score manually according to a set standard. Collecting user feedback while the sample audio and video is played adds acquisition channels for the supervision information and increases the number of user scores acquired; it also brings the acquired user scores closer to real user evaluation, so the comprehensive score produced by the trained neural network is likewise closer to real user evaluation. Optionally, the average of the scores given by a plurality of users to the same sample audio and video is used as the user score of that sample audio and video; averaging improves the robustness and practicability of the user score.

As shown in fig. 3, based on the embodiment shown in fig. 1, step 104 may include the following steps:

Step 1041, determining whether the dimension information is abnormal according to the relationship between the dimension information of the evaluation dimension and the dimension threshold corresponding to the evaluation dimension.

Step 1042, determining a dimension score corresponding to the evaluation dimension based on the number of times the dimension information of the evaluation dimension is abnormal for the audio and video.

In this embodiment, a different dimension threshold may be set for each evaluation dimension, for example: the transmission delay threshold is 200 ms (abnormal when above the threshold); the video bit rate threshold is 400 kbps (abnormal when below the threshold); the frame rate threshold is 12 fps (abnormal when below the threshold); the audio bit rate threshold is 30 kbps (abnormal when below the threshold); the volume threshold is 40 dB (abnormal when below the threshold); the packet loss threshold (covering uplink and downlink packet loss) is 15% (abnormal when above the threshold); the CPU threshold is 80% (abnormal when above the threshold); and the resolution threshold is 720p (abnormal when below the threshold). If the raw dimension information were used directly for comprehensive scoring without this processing, features with larger numerical magnitudes would obtain larger weight coefficients when the mapping model is built, reducing the accuracy of the system. Therefore, this embodiment scores the dimension information of each evaluation dimension individually before inputting it into the neural network.
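The threshold comparison of step 1041 might look like the sketch below. The numeric limits are the example values quoted above; the dictionary keys, the `Threshold` class, and the function names are hypothetical, and reading "Kbs" as kilobits per second is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    limit: float
    abnormal_when_above: bool  # True: abnormal if value > limit; False: if value < limit


# Example thresholds from the text; key names are hypothetical.
THRESHOLDS = {
    "transmission_delay_ms": Threshold(200, True),
    "video_bitrate_kbps": Threshold(400, False),
    "frame_rate_fps": Threshold(12, False),
    "audio_bitrate_kbps": Threshold(30, False),
    "volume_db": Threshold(40, False),
    "packet_loss_pct": Threshold(15, True),
    "cpu_pct": Threshold(80, True),
    "resolution_p": Threshold(720, False),
}


def is_abnormal(dimension: str, value: float) -> bool:
    """Compare one reported value with the threshold of its dimension."""
    t = THRESHOLDS[dimension]
    return value > t.limit if t.abnormal_when_above else value < t.limit


def abnormal_flags(dimension: str, values: list) -> list:
    """One boolean per reported point: True where the value is abnormal."""
    return [is_abnormal(dimension, v) for v in values]
```

A value exactly equal to the threshold is treated as normal here, which follows the "above" and "below" wording of the examples.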

As shown in fig. 4, based on the embodiment shown in fig. 3, step 1042 may include the following steps:

Step 401, dividing the audio and video into at least one time interval according to time.

Step 402, in each time interval, determining a deduction score corresponding to the evaluation dimension according to the number of times the dimension information of the evaluation dimension is abnormal in the time interval.

Step 403, determining a dimension score corresponding to the evaluation dimension based on at least one deducted score corresponding to at least one time interval.

In this embodiment, segmenting the audio and video improves the efficiency of computing deduction scores and avoids the problem of having to define separate rules for audio and video of uncertain duration; determining the deduction score separately in each time interval also improves accuracy. Optionally, data for the audio and video are uploaded to the evaluation system once every set time (e.g., 2 seconds), and each upload is treated as one point (corresponding to the set time). A time interval is then defined as a preset number of points (e.g., 5 points), the deduction score for the audio and video is determined in each time interval, and the dimension score corresponding to the evaluation dimension is determined from the deduction scores of all time intervals. When the audio and video is shorter than one time interval, its full duration is treated as a single time interval.
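Step 401 can be illustrated with a short sketch that groups the per-point abnormality flags into intervals of a preset number of points. The function name `split_into_intervals` is hypothetical, and treating a trailing remainder shorter than one interval as its own interval is an assumption; the text only states that a clip shorter than one interval is handled as a single interval.

```python
def split_into_intervals(flags, points_per_interval=5):
    """Group per-point abnormality flags into consecutive time intervals.

    flags: one boolean per upload point (e.g., one point every 2 seconds).
    A clip shorter than one interval yields a single, shorter interval.
    """
    if points_per_interval <= 0:
        raise ValueError("points_per_interval must be positive")
    return [flags[i:i + points_per_interval]
            for i in range(0, len(flags), points_per_interval)]
```

With one point every 2 seconds and 5 points per interval, each interval covers 10 seconds of audio and video.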

Optionally, step 402 in the above embodiment may include:

in response to the number of abnormal occurrences reaching a first set value in the time interval, taking a first preset score as the deduction score;

and in response to the number of abnormal occurrences not reaching the first set value in the time interval, obtaining the deduction score by multiplying the number of abnormal occurrences by a second preset score.

In this embodiment, user experience is taken into account when evaluating the audio and video: points are deducted from the single-dimension score both for sustained quality degradation (the number of abnormal occurrences within the set time reaches the first set value, for example 3 abnormal points out of 5) and for glitches (isolated abnormal occurrences). For example, in an optional example, 10 seconds is used as the time interval and 3 as the first set value; when three abnormal occurrences are detected, 8 points (on a 100-point scale) are deducted for that time interval. When fewer than 3 abnormal occurrences take place within the 10 seconds, a second preset score (for example, 2 points) is deducted for each occurrence; for example, if 2 abnormal occurrences take place within the 10 seconds, the deduction score is the second preset score multiplied by 2 (for example, 2 × 2 = 4 points). The deduction score corresponding to the evaluation dimension in the time interval is thereby determined. Accumulating the at least one deduction score over the at least one time interval corresponding to the evaluation dimension gives the total deduction score for that dimension; subtracting the total from the initial score of the evaluation dimension (for example, 100 points on a 100-point scale) gives the dimension score, which evaluates the quality of the audio and video in that evaluation dimension.
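The per-interval rule of step 402 reduces to a few lines. The default values below are the example figures from the text (first set value 3, first preset score 8, second preset score 2); the function and parameter names are hypothetical.

```python
def interval_deduction(abnormal_count, first_set_value=3,
                       first_preset_score=8, second_preset_score=2):
    """Deduction score for one time interval.

    Sustained degradation (count reaches first_set_value) costs a fixed
    first_preset_score; otherwise each isolated abnormal occurrence costs
    second_preset_score.
    """
    if abnormal_count < 0:
        raise ValueError("abnormal_count must be non-negative")
    if abnormal_count >= first_set_value:
        return first_preset_score
    return abnormal_count * second_preset_score
```

With these defaults, two abnormal points in an interval cost 4 points, while three or more cost a flat 8 points.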

Optionally, step 403 in the above embodiment may include:

in response to the sum of the at least one deduction score being greater than or equal to a third set score, determining the dimension score of the evaluation dimension to be a fourth set score, the fourth set score being the difference between the initial score of the evaluation dimension and the third set score;

and in response to the sum of the at least one deduction score being less than the third set score, subtracting the sum of the at least one deduction score from the initial score of the evaluation dimension and taking the difference as the dimension score of the evaluation dimension.

In the foregoing embodiment, the deduction score is not limited, so the dimension score of one or more evaluation dimensions may end up as 0. Although such a score does not prevent the subsequent determination of the comprehensive score, this embodiment uses the third set score to keep the score of each evaluation dimension from falling too low, so that the importance of each dimension in the comprehensive score can be further determined. Optionally, the third set score may be set to 70 (on a 100-point scale): when the total deduction is greater than or equal to 70, the dimension score of the evaluation dimension is directly determined to be 30; when the total deduction is less than 70, the actual total deduction is subtracted.
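The capping rule can be illustrated with a short Python sketch. The function name and parameter names are hypothetical, and the defaults (initial score 100, third set score 70) are only the example values mentioned in this embodiment.

```python
def capped_dimension_score(deduction_scores, initial_score=100, third_set_score=70):
    """Dimension score with the total deduction capped at the third set score."""
    total_deduction = sum(deduction_scores)
    if total_deduction >= third_set_score:
        # Fourth set score: initial score minus the third set score.
        return initial_score - third_set_score
    return initial_score - total_deduction
```

With these defaults, ten intervals that each deduct 8 points give a total deduction of 80, which is capped, so the dimension score is 30 rather than 20. The rule is equivalent to subtracting `min(total_deduction, third_set_score)` from the initial score.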

In some optional embodiments, the method may further include:

and determining the quality grade of the audio and video based on the comprehensive score of the audio and video.

The quality grades comprise at least two grades, and each grade corresponds to different comprehensive scores.

Subjective assessment in the field of audio and video evaluation generally uses the MOS (mean opinion score) value, which has five grades: 5 (excellent), 4 (good), 3 (fair), 2 (poor) and 1 (bad). The present method uses a 100-point scoring rule. With reference to the MOS scoring rule, and in order to better fit the subjective experience of users in a live video scenario, the MOS value may be determined from the comprehensive score through a mapping. For example, the mapping may be established as follows: 0-60 points corresponds to an MOS value of 1, indicating very poor quality with severe stalling of the audio and video; 60-70 points corresponds to an MOS value of 2, indicating poor quality that is watchable but has severe stalling part of the time; 70-80 points corresponds to an MOS value of 3, indicating average audio and video quality with slight delay; 80-90 points corresponds to an MOS value of 4, indicating good audio and video quality with no obvious delay or stalling; 90-100 points corresponds to an MOS value of 5, indicating excellent audio and video quality, with an excellent sound and picture experience throughout. Determining the quality grade through this mapping ties the method to existing audio and video evaluation practice, so that the quality of the audio and video is clear to those skilled in the art, the method is more consistent with the conventional evaluation system, and the application range and application scenarios of the method provided by this embodiment are expanded.
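The example mapping from a comprehensive score to an MOS value can be sketched in Python as follows. The function name is hypothetical. The ranges in the example share their endpoints (60, 70, 80, 90), and the text does not say which grade a boundary score belongs to; this sketch assumes that a boundary score belongs to the higher grade.

```python
def mos_from_comprehensive_score(score):
    """Map a comprehensive score on a 100-point scale to an MOS value from 1 to 5."""
    if not 0 <= score <= 100:
        raise ValueError("comprehensive score must be between 0 and 100")
    # Assumption: each boundary score (60, 70, 80, 90) falls into the higher grade.
    if score < 60:
        return 1  # very poor: severe stalling
    if score < 70:
        return 2  # poor: watchable, severe stalling part of the time
    if score < 80:
        return 3  # average: slight delay
    if score < 90:
        return 4  # good: no obvious delay or stalling
    return 5      # excellent throughout
```

Under this assumption a comprehensive score of 86 maps to an MOS value of 4, and a score of exactly 90 maps to 5.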

Any audio and video quality evaluation method provided by the embodiments of the present disclosure may be executed by any suitable device with data processing capability, including but not limited to terminal equipment, a server and the like. Alternatively, any audio and video quality evaluation method provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any audio and video quality evaluation method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. This is not described in further detail below.

Exemplary devices

Fig. 5 is a schematic structural diagram of an audio/video quality evaluation device according to an exemplary embodiment of the present disclosure. As shown in fig. 5, the apparatus provided in this embodiment includes:

The dimension information obtaining module 51 is configured to obtain dimension information of a plurality of evaluation dimensions of the audio and video to be scored.

The single-dimension scoring module 52 is configured to determine a dimension score corresponding to each evaluation dimension based on the dimension information corresponding to each evaluation dimension in the plurality of evaluation dimensions, so as to obtain a plurality of dimension scores.

The comprehensive scoring module 53 is configured to input the plurality of dimension scores into the neural network to obtain a comprehensive score of the audio and video.

The audio and video quality evaluation device provided by the embodiment of the present disclosure obtains dimension information of a plurality of evaluation dimensions of the audio and video to be scored; determines, based on the dimension information corresponding to each evaluation dimension in the plurality of evaluation dimensions, a dimension score corresponding to that evaluation dimension, so as to obtain a plurality of dimension scores; and inputs the plurality of dimension scores into a neural network to obtain a comprehensive score of the audio and video. Because the comprehensive score is determined from the plurality of dimension scores, the comprehensive score of the audio and video can give guidance on the direction of optimization.

In some optional embodiments, the apparatus provided in this embodiment further includes:

a network training module configured to train the neural network using a training set.

Optionally, the network training module is specifically configured to: determine a plurality of sample dimension scores of a plurality of evaluation dimensions of the sample audio and video, and input the plurality of sample dimension scores into the neural network to obtain a predicted comprehensive score; determine a network loss based on the difference between the predicted comprehensive score and the user score; and adjust network parameters of the neural network based on the network loss.

Optionally, the network training module is further configured to determine a user score corresponding to the sample audio/video according to the feedback score of at least one user when the sample audio/video is played.
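The training procedure described for the network training module (predict a comprehensive score from the sample dimension scores, compute a loss from its difference to the user score, adjust the parameters) can be sketched in Python as follows. The text does not specify the network architecture, the loss function or the optimizer, so this sketch substitutes the simplest stand-in: a single linear unit trained with a mean squared error loss and plain gradient descent. All names, the learning rate, the epoch count and the scaling of scores to the range 0 to 1 are assumptions made for illustration.

```python
import random


def mse(weights, bias, xs, ys):
    """Mean squared difference between predicted and user scores (scaled to 0..1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * v for w, v in zip(weights, x)) + bias
        total += (pred - y) ** 2
    return total / len(xs)


def train_scorer(sample_dimension_scores, user_scores, lr=0.1, epochs=2000, seed=0):
    """Fit a single linear unit; returns (weights, bias, loss history)."""
    # Scale 100-point scores to 0..1 so a fixed learning rate stays stable.
    xs = [[v / 100.0 for v in row] for row in sample_dimension_scores]
    ys = [y / 100.0 for y in user_scores]
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in xs[0]]
    bias = 0.0
    n = len(xs)
    losses = [mse(weights, bias, xs, ys)]
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Network loss is driven by (predicted score - user score).
            err = sum(w * v for w, v in zip(weights, x)) + bias - y
            for i, v in enumerate(x):
                grad_w[i] += 2.0 * err * v / n
            grad_b += 2.0 * err / n
        # Adjust the network parameters against the gradient of the loss.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
        losses.append(mse(weights, bias, xs, ys))
    return weights, bias, losses


def predict_score(weights, bias, dimension_scores):
    """Predicted comprehensive score on the 100-point scale."""
    scaled = sum(w * v / 100.0 for w, v in zip(weights, dimension_scores))
    return 100.0 * (scaled + bias)
```

A practical system would more likely use a multi-layer network from an established deep learning framework; the sketch only shows the flow of prediction, loss and parameter update.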

In some alternative embodiments, the single-dimension scoring module 52 includes:

an abnormality scoring unit configured to determine the dimension score corresponding to the evaluation dimension based on the number of abnormalities in the dimension information of the audio and video in that evaluation dimension.

Optionally, the abnormality scoring unit is specifically configured to: divide the audio and video into at least one time interval by time; in each time interval, determine a deduction score corresponding to the evaluation dimension according to the number of abnormalities in the dimension information of the evaluation dimension within that time interval; and determine the dimension score corresponding to the evaluation dimension based on the at least one deduction score corresponding to the at least one time interval.

Optionally, when determining the deduction score corresponding to the evaluation dimension according to the number of abnormalities in the dimension information of the evaluation dimension within the time interval, the abnormality scoring unit is configured to: determine whether the number of abnormalities in the dimension information of the evaluation dimension reaches a first set value; in response to the number of abnormalities reaching the first set value within the time interval, take a first preset score as the deduction score; and in response to the number of abnormalities not reaching the first set value within the time interval, obtain the deduction score by multiplying the number of abnormalities by a second preset score.

Optionally, when determining the dimension score corresponding to the evaluation dimension based on the at least one deduction score corresponding to the at least one time interval, the abnormality scoring unit is configured to: determine whether the sum of the at least one deduction score is greater than or equal to a third set score; in response to the sum of the at least one deduction score being greater than or equal to the third set score, determine the dimension score of the evaluation dimension to be a fourth set score, the fourth set score being the difference between the initial score of the evaluation dimension and the third set score; and in response to the sum of the at least one deduction score being less than the third set score, subtract the sum of the at least one deduction score from the initial score of the evaluation dimension and take the difference as the dimension score of the evaluation dimension.

In some alternative embodiments, the evaluation dimensions include at least two of: video bitrate, frame rate, resolution, audio bitrate, volume, packet loss, transmission delay, and CPU.

Exemplary electronic device

Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 6. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.

FIG. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.

As shown in fig. 6, the electronic device 60 includes one or more processors 61 and a memory 62.

The processor 61 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 60 to perform desired functions.

Memory 62 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 61 to implement the audio-video quality evaluation methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.

In one example, the electronic device 60 may further include: an input device 63 and an output device 64, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

For example, when the electronic device is the first device 100 or the second device 200, the input device 63 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 63 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.

The input device 63 may also include, for example, a keyboard, a mouse, and the like.

The output device 64 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 64 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.

Of course, for simplicity, only some of the components of the electronic device 60 relevant to the present disclosure are shown in fig. 6, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 60 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the audio-visual quality assessment method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.

The program code of the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the audio-video quality evaluation method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The block diagrams of devices, apparatuses and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses and systems may be connected, arranged and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. An audio and video quality evaluation method is characterized by comprising the following steps:

2. The method of claim 1, before inputting the plurality of dimension scores into a neural network to obtain a composite score of the audio/video, further comprising:

3. The method of claim 2, wherein the training the neural network with a training set comprises:

adjusting a network parameter of the neural network based on the network loss.

4. The method of claim 3, further comprising, prior to determining the network loss based on a difference between the predicted composite score and the user score:

5. The method according to any one of claims 1-4, wherein the determining a dimension score corresponding to each evaluation dimension based on the dimension information corresponding to the evaluation dimension comprises:

6. The method according to claim 5, wherein the determining the dimension score corresponding to the evaluation dimension based on the abnormal times of the audio/video in the dimension information of the evaluation dimension comprises:

dividing the audio and video into at least one time interval according to time;

7. The method according to claim 6, wherein the determining a deduction score corresponding to the evaluation dimension according to the number of times of abnormality of the dimension information of the evaluation dimension in the time interval comprises:

8. An audio/video quality evaluation device characterized by comprising:

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the audio-video quality evaluation method according to any one of claims 1 to 7.

10. An electronic device, characterized in that the electronic device comprises:

a processor;

a memory for storing the processor-executable instructions;

the processor is used for reading the executable instructions from the memory and executing the instructions to realize the audio and video quality evaluation method of any one of the claims 1 to 7.