CN112950567A - Quality evaluation method, quality evaluation device, electronic device, and storage medium


Info

Publication number
CN112950567A
CN112950567A (application CN202110211837.0A)
Authority
CN
China
Prior art keywords
quality evaluation
model
sample
quality
evaluation result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110211837.0A
Other languages
Chinese (zh)
Inventor
鲁方波
汪贤
樊鸿飞
蔡媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110211837.0A
Publication of CN112950567A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Abstract

The quality evaluation method, quality evaluation device, electronic device and storage medium disclosed herein perform quality evaluation on the object to be measured with a first quality evaluation model and a second quality evaluation model respectively, thereby realizing a multi-stage (two-stage), coarse-to-fine quality prediction scheme that can effectively solve the quality evaluation problem of multimedia data such as images or videos. The scheme first performs coarse-grained quality evaluation on the object to be measured using the first quality evaluation model, then takes the coarse-grained quality evaluation result as prior knowledge and uses the second quality evaluation model to fine-tune the evaluation, finally obtaining the fine-grained quality evaluation result of the object to be measured. High-precision quality evaluation of the object to be measured can thus be achieved.

Description

Quality evaluation method, quality evaluation device, electronic device, and storage medium
Technical Field
The present application relates to the field of quality detection of multimedia data, and in particular, to a quality evaluation method and apparatus, an electronic device, and a storage medium.
Background
Multimedia such as digital images and videos serves as an important medium for visualization and information interaction and is widely used in industries such as video conferencing and television broadcasting. In addition, as internet and multimedia technology continues to advance, demand for various kinds of multimedia data keeps growing.
However, during acquisition, encoding, transmission and other links, multimedia data such as images or videos generally suffer a corresponding degree of quality loss. Such loss usually degrades the quality of the video or image significantly, and low-quality images or videos seriously harm the viewing experience. Therefore, in order to improve the visual experience of the user, a solution that can effectively evaluate the quality of multimedia data such as images or videos is of great significance.
Disclosure of Invention
In view of the above, the present application provides a quality evaluation method, apparatus, electronic device and storage medium, which perform quality prediction on objects such as multimedia data with a multi-stage strategy and thereby realize a coarse-to-fine, high-precision quality evaluation scheme suitable for such objects.
The specific technical scheme is as follows:
a quality evaluation method comprising:
acquiring a target object to be evaluated;
performing quality evaluation on the target object by using a first quality evaluation model to obtain a first evaluation result;
performing quality evaluation on the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result;
the second quality evaluation model is trained based on the samples of the first quality evaluation model and the first evaluation results produced by the first quality evaluation model for those samples, so that the quality evaluation fine granularity corresponding to the second evaluation result of the second quality evaluation model is higher than the quality evaluation fine granularity corresponding to the first evaluation result of the first quality evaluation model.
Optionally, the first quality evaluation model includes a first feature extraction layer, a first generalization processing layer, and a softmax layer;
the quality evaluation of the target object by using the first quality evaluation model to obtain a first evaluation result includes:
inputting the target object into the first feature extraction layer, and performing feature extraction processing on the target object by using the first feature extraction layer to obtain a first feature;
refining the first characteristic by using the first generalization processing layer to obtain a second characteristic;
mapping the second features into confidence degrees corresponding to the first granularity scores in the sample mark space by using the softmax layer; the first evaluation result includes: and the target first granularity score with the highest confidence level in the first granularity scores.
Optionally, the second quality evaluation model includes a second feature extraction layer, a second generalization processing layer, and a full connection layer;
the performing, by using the second quality evaluation model, quality evaluation on the target object according to the first evaluation result to obtain a second evaluation result includes:
inputting the target object into the second feature extraction layer, and performing feature extraction processing on the target object by using the second feature extraction layer to obtain a third feature;
refining the third characteristic by using the second generalization processing layer to obtain a fourth characteristic;
inputting the fourth feature and the first evaluation result into the full-connection layer, and mapping the fourth feature and the first evaluation result to a sample mark space by using the full-connection layer to obtain a second granularity score corresponding to the target object in the sample mark space;
wherein the second evaluation result comprises the second granularity score; the second granularity score is composed of the target first granularity score and a fine adjustment score, and the fine adjustment score is a non-negative value smaller than the absolute value of the difference between two adjacent first granularity scores.
Optionally:
the number of the first generalization processing layers is one;
the number of the second generalization processing layers is multiple.
Optionally, the second quality evaluation model is: a model obtained by training one branch of the two model branches of the twin network model;
the second quality evaluation model is obtained by training the branch by utilizing each sample in the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample;
the twin network model is obtained by performing sequencing training on a twin structure network determined based on a predetermined classification network by using each sample pair in the sample set of the first quality evaluation model and a first evaluation result of the first quality evaluation model on each sample in the sample pair;
wherein the sample pair is a sample pair formed by any two samples in the sample set.
Optionally, before the obtaining of the target object to be evaluated, the method further includes:
constructing the first quality evaluation model;
and constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set.
Optionally, the constructing the first quality evaluation model, and the constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set, includes:
training a model network determined based on a first classification network based on a sample set and a quality evaluation label labeled for each sample in the sample set to obtain a first quality evaluation model;
based on each sample pair in the sample set, a quality evaluation label labeled for each sample in the sample pair and a first evaluation result of the first quality evaluation model on each sample in the sample pair, performing sequencing training on a twin structure network determined based on a second classification network to obtain a twin network model;
training one branch in the twin network model based on the sample set, the quality evaluation label labeled for each sample in the sample set and the first evaluation result of the first quality evaluation model for each sample in the sample set to obtain the second quality evaluation model;
wherein the sample pair is a sample pair formed by any two samples in the sample set.
Optionally, the quality assessment label labeled for a sample in the sample set is a second granularity score;
the first evaluation result comprises a first granularity score; the evaluation result obtained by evaluating the quality of the sample by any branch in the twin network model comprises a second granularity score; the second evaluation result comprises a second granularity score;
and the quality evaluation fine granularity corresponding to the second granularity score is higher than the quality evaluation fine granularity corresponding to the first granularity score.
A quality evaluation apparatus comprising:
an acquisition unit configured to acquire a target object to be evaluated;
the first evaluation unit is used for evaluating the quality of the target object by utilizing a first quality evaluation model to obtain a first evaluation result;
the second evaluation unit is used for evaluating the quality of the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result;
the second quality evaluation model is trained based on the samples of the first quality evaluation model and the first evaluation results produced by the first quality evaluation model for those samples, so that the quality evaluation fine granularity corresponding to the second evaluation result of the second quality evaluation model is higher than the quality evaluation fine granularity corresponding to the first evaluation result of the first quality evaluation model.
An electronic device, comprising:
a memory for storing a set of computer instructions;
a processor for implementing the quality assessment method as described in any one of the above by executing the instruction set stored in the memory.
A computer readable storage medium having stored therein a set of computer instructions which, when executed by a processor, implement a quality assessment method as claimed in any one of the preceding claims.
According to the above scheme, the quality evaluation method, quality evaluation device, electronic device and storage medium provided by the application use a first quality evaluation model and a second quality evaluation model to evaluate the quality of the object to be measured, realizing a multi-stage (two-stage), coarse-to-fine quality prediction scheme that can effectively solve the quality evaluation problem of multimedia data such as images or videos. By first performing coarse-grained quality evaluation on the object to be measured with the first quality evaluation model, then taking the coarse-grained quality evaluation result as prior knowledge and fine-tuning the evaluation with the second quality evaluation model (finally obtaining the fine-grained quality evaluation result of the object to be measured), high-precision quality evaluation of the object to be measured can be achieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart of a quality evaluation method provided in an embodiment of the present application;
fig. 2 is a schematic model structure diagram of a first quality evaluation model provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a quality evaluation of a target object by using a first quality evaluation model according to an embodiment of the present application;
fig. 4 is a schematic model structure diagram of a second quality evaluation model provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of quality evaluation of a target object by using a second quality evaluation model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of another quality evaluation method provided in the embodiments of the present application;
FIG. 7 is a schematic flowchart of a method for constructing a multi-stage quality evaluation model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a model structure of a twin network model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a quality evaluation device provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another quality evaluation device provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to improve the visual experience of the user, a solution that can effectively evaluate the quality of multimedia data such as images or videos is of great significance. The inventors found that in the conventional technology the quality score of a video or image is generally obtained directly in a single-stage task processing mode. The precision of the quality evaluation result obtained in this way is not high; in particular, for two videos or images with similar quality, their relative quality cannot be measured accurately.
Therefore, the application discloses a quality evaluation method, a quality evaluation device, an electronic device and a storage medium, which predict the quality of objects such as multimedia data with a multi-stage strategy, realizing a coarse-to-fine, high-precision quality evaluation scheme for such objects and improving, at least to a certain extent, the precision of the quality evaluation results for multimedia data such as images and videos compared with the conventional technology.
Referring to fig. 1, a schematic flow chart of a quality evaluation method provided in an optional embodiment of the present application is shown. The method is applicable to, but not limited to, a terminal device with a data processing function, such as a mobile phone, a tablet computer, or a personal computer (e.g., a notebook, an all-in-one machine, or a desktop), or to a physical machine with a data processing function, such as one corresponding to a private cloud/public cloud platform or a local area network/internet server.
As shown in fig. 1, in this embodiment, the quality evaluation method specifically includes:
step 101, obtaining a target object to be evaluated.
The target object to be evaluated may be, but is not limited to, multimedia data such as an image or a video. In the case that the target object is a video, it may be a complete video, or may be any video segment obtained by cutting (or slicing) the complete video.
Typically, the method can be applied to a filtering scene of multimedia data such as images/videos, and in the scene, quality evaluation is performed on uploaded data such as images or videos by using the method, and data which do not meet quality requirements are filtered out, so that the visual impression of human eyes of a user is improved. Accordingly, in this scenario, the target objects to be evaluated may be individual pictures or video/video clips uploaded to a network (e.g., to a video client backend server).
And 102, evaluating the quality of the target object by using a first quality evaluation model to obtain a first evaluation result.
The first quality evaluation model is used in the first stage of the multi-stage quality prediction of the present application, and is a coarse-grained quality evaluation model, which may be constructed in advance based on a predetermined classification network, such as, but not limited to, a convolutional neural network such as ResNet50, Vgg16, and the like.
The model structure of the first quality evaluation model, as shown in fig. 2, includes a first feature extraction layer (denoted as CNN Features in fig. 2), a first generalization processing layer (denoted as FC Block in fig. 2), and a softmax layer; the number of first generalization processing layers can be one or more.
In implementation, the number of first generalization processing layers can be set according to actual needs while balancing model efficiency and effect. For the coarse-grained first quality evaluation model, fine-grained learning of the potential rule between object features and object quality is not needed, so increasing the number of first generalization processing layers does not bring an obvious improvement in model effect.
Based on the model structure of the first quality evaluation model shown in fig. 2, referring to fig. 3, a first evaluation result can be obtained by performing the following quality evaluation process on the target object using the first quality evaluation model:
step 301, inputting a target object into a first feature extraction layer, and performing feature extraction processing on the target object by using the first feature extraction layer to obtain a first feature;
specifically, a target object such as an image or a video may be input into a first feature extraction layer of the first quality evaluation model, and the first feature extraction layer may be specifically implemented to include one or more convolution layers for extracting underlying features of the target object, for example, the target object is the image or the video, and the extracted underlying features may include, but are not limited to, underlying features of image frames in the image or the video in terms of color, brightness, edges, corners, and/or texture.
If the target object input to the first quality evaluation model is a video, the frame-level features of at least a part of video frames (e.g., all video frames of the video) of the video are averaged to obtain the underlying features of the video. For convenience of description, the present embodiment refers to the above-described bottom-layer features extracted by the first feature extraction layer as first features.
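As an illustration of the frame-level averaging just described, the following is a minimal sketch assuming a PyTorch-style implementation; the function name and tensor shapes are illustrative assumptions, not taken from the patent.

```python
import torch

def video_level_feature(frame_features: torch.Tensor) -> torch.Tensor:
    """Average frame-level CNN features into one video-level (first) feature.

    frame_features: tensor of shape (num_frames, feature_dim), one row per
    sampled video frame; for a single image this is simply (1, feature_dim).
    """
    # The mean over the frame dimension yields the underlying feature of the video.
    return frame_features.mean(dim=0)
```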
Step 302, refining the first characteristic by using a first generalization processing layer to obtain a second characteristic;
the first generalization processing layer receives the bottom layer features (first features) output by the first feature extraction layer, and performs further feature refinement processing on the received bottom layer features to prevent the first quality evaluation model from being over-fitted, and promote the generalization capability of the first quality evaluation model.
Referring to fig. 2, in this embodiment, the first generalization processing layer is composed of a full connection layer (denoted as FC in fig. 2), a ReLU active layer (denoted as ReLU in fig. 2), a normalization layer (denoted as Batch Norm in fig. 2), and a Dropout layer, and further feature refinement of the bottom layer features extracted by the first feature extraction layer is implemented through the functional layers included in the first generalization processing layer, so as to obtain a second feature, thereby achieving the purposes of preventing overfitting of the model and improving the generalization capability of the model. Regarding the specific functions of each functional layer in the fully-connected layer, the ReLU active layer, the normalization layer, and the Dropout layer, reference may be made to the functional design of each corresponding layer in the existing neural network, and details thereof are not described.
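A minimal sketch of one such generalization processing block, assuming PyTorch; the layer sizes and dropout probability are illustrative assumptions.

```python
import torch.nn as nn

class FCBlock(nn.Module):
    """FC Block of fig. 2: fully connected layer -> ReLU -> Batch Norm -> Dropout."""

    def __init__(self, in_dim: int, out_dim: int, p_drop: float = 0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(in_dim, out_dim),   # fully connected layer (FC)
            nn.ReLU(inplace=True),        # ReLU activation layer
            nn.BatchNorm1d(out_dim),      # normalization layer (Batch Norm)
            nn.Dropout(p_drop),           # Dropout layer against overfitting
        )

    def forward(self, x):
        # x: (batch, in_dim) first features; returns the refined (second) features.
        return self.block(x)
```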
Step 303, mapping the second features to confidence degrees corresponding to the first granularity scores in the sample mark space by using the softmax layer, where the first evaluation result includes the target first granularity score with the highest confidence among the first granularity scores.
And the softmax layer receives the second features output by the first generalization processing layer and maps the second features into confidence degrees corresponding to the first granularity scores in the sample mark space. And the target first granularity score with the highest confidence coefficient is a first evaluation result obtained by processing the target object by the first quality evaluation model.
The first granularity score is a coarse-grained score in the sample mark space. Assuming that the subjective annotation score of a sample in the sample mark space ranges from 1 to 5, and that the annotated score is specifically a floating point number, for example 1.0, 1.1, 1.7, 3.5, 4.9, and the like, the first granularity scores (coarse-grained scores) corresponding to the sample mark space may specifically be: 1, 2, 3 and 4. The first granularity scores 1, 2, 3 and 4 logically correspond to the subjective annotation score segments 1-2, 2-3, 3-4 and 4-5 respectively; that is, if the target first granularity score with the highest confidence output by the first quality evaluation model is 1, the actual quality score of the target object is between 1 and 2, and may specifically be any score value of 1.0, 1.1, …, 1.9.
It should be noted that the above sample mark space and the corresponding first granularity scores (coarse-grained scores) are merely exemplary. In practical applications, other implementation forms are also possible; for example, the sample mark space may be the natural numbers 1 to 100, with corresponding first granularity scores of 0, 1, 2, …, 9 or of 0, 10, 20, …, 90, etc., which logically correspond to actual annotated score segments of 1-9, 10-19, 20-29, …, 90-99, respectively. This is not limited here.
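For the 1-5 example above, the mapping from a floating-point subjective score to a coarse-grained (first granularity) score can be sketched as follows; the handling of the exact upper boundary 5.0 is an assumption, since the patent does not specify it.

```python
def to_coarse_score(subjective_score: float) -> int:
    """Map a subjective annotation score in [1.0, 5.0] to a coarse bin 1-4.

    1-2 -> 1, 2-3 -> 2, 3-4 -> 3, 4-5 -> 4 (a score of exactly 5.0 stays in bin 4).
    """
    return min(int(subjective_score), 4)


assert to_coarse_score(1.7) == 1 and to_coarse_score(3.5) == 3 and to_coarse_score(4.9) == 4
```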
And 103, evaluating the quality of the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result.
The second quality evaluation model is trained based on the samples of the first quality evaluation model and the first evaluation results produced by the first quality evaluation model for those samples, so that the quality evaluation fine granularity corresponding to the second evaluation result of the second quality evaluation model is higher than the quality evaluation fine granularity corresponding to the first evaluation result of the first quality evaluation model.
The second quality evaluation model, used in the second stage of the multi-stage quality prediction of the present application, is a fine-grained quality evaluation model. It may also be constructed in advance based on a predetermined classification network, which may be, but is not limited to, a convolutional neural network such as ResNet50 or Vgg16. The classification network on which the second quality evaluation model is constructed may be the same as or different from the classification network on which the first quality evaluation model is constructed; for example, both quality evaluation models may be constructed based on ResNet50, or one may be constructed based on ResNet50 and the other based on Vgg16.
Referring to fig. 4, a model structure of the second quality evaluation model is shown, including a second feature extraction layer (denoted as CNN Features in fig. 4), a second generalization processing layer (denoted as FC Block in fig. 4) and a full connection layer (denoted as FC in fig. 4).
Similarly, in implementation, the number of second generalization processing layers can be set according to actual needs while balancing model efficiency and effect. For the fine-grained second quality evaluation model, in view of the need to learn the potential rule between the features of the object (image, video or the like) and the object quality at fine granularity, the number of second generalization processing layers in the second quality evaluation model is preferably set to more than one.
Based on the model structure of the second quality evaluation model shown in fig. 4, referring to fig. 5, a second evaluation result can be obtained by performing the following quality evaluation process on the target object using the second quality evaluation model:
step 501, inputting the target object into the second feature extraction layer, and performing feature extraction processing on the target object by using the second feature extraction layer to obtain a third feature.
Similar to the function of the first feature extraction layer in the first quality evaluation model, the second feature extraction layer in the second quality evaluation model may also be implemented to include one or more convolution layers for extracting underlying features of the target object, such as underlying features of an image frame in an image or video in terms of color, brightness, edges, corners, and/or texture, etc., and the underlying features extracted by the second feature extraction layer are described as third features.
If the target object input to the second quality evaluation model is a video, the frame-level features of at least part of video frames (e.g., all video frames) of the video are also averaged to obtain the underlying features of the video.
Step 502, refining the third characteristic by using a second generalization layer to obtain a fourth characteristic;
Similarly, the second generalization processing layer may also include a full connection layer (FC), a ReLU activation layer (ReLU), a normalization layer (Batch Norm) and a Dropout layer, and is configured to further refine, through these layers, the underlying feature (third feature) of the target object extracted by the second feature extraction layer to obtain the fourth feature, so as to prevent overfitting of the second quality evaluation model and improve its generalization capability.
The difference lies in what is learned: in the model training stage, the first quality evaluation model learns the coarse-grained rule between the features of the object (image, video or the like) and the object quality, while the second quality evaluation model learns the fine-grained rule between the object features and the object quality. This difference is reflected in the models themselves, namely in the network parameters and/or the specific processing procedures of each functional layer in the model (such as convolution-based feature extraction and feature refinement).
Step 503, inputting the fourth feature and the first evaluation result into the full-link layer, and mapping the fourth feature and the first evaluation result to the sample mark space by using the full-link layer to obtain a second granularity score corresponding to the target object in the sample mark space.
And the second evaluation result output by the second quality evaluation model for quality evaluation of the target object comprises the second granularity score.
Specifically, the fully-connected layer receives the fourth feature output by the second generalization processing layer and, as shown in fig. 4, also takes as input the first evaluation result output by the first quality evaluation model (MOS1 input to FC in fig. 4). The two inputs are then fused, for example by concatenating (concat) the corresponding vectors, and the fusion result is mapped to the sample mark space to obtain the second granularity score of the target object in the sample mark space.
And the quality evaluation fine granularity corresponding to the second granularity score is higher than the quality evaluation fine granularity corresponding to the first granularity score, and the second granularity score and the subjective marking score value of the sample in the sample marking space belong to the same fine granularity. Still following the example above, assuming that the target first granularity score output by the first quality assessment model with the highest confidence is 1, the second granularity score output by the second quality assessment model may specifically be 1.7.
In the present application, the first evaluation result of the first quality evaluation model and the target object to be evaluated are both used as input to the second quality evaluation model, and the second quality evaluation model evaluates the quality of the target object in combination with the first evaluation result. The essential purpose is to provide the first evaluation result of the first quality evaluation model as prior knowledge to the second quality evaluation model, so that the second quality evaluation model no longer needs to perform coarse-grained quality evaluation on the target object and only fine-tunes on the basis of the prior coarse-grained result (the target first granularity score with the highest confidence), thereby obtaining a matching fine-grained quality evaluation result (the second granularity score in the second evaluation result). In essence, therefore, the second granularity score in the second evaluation result is composed of the target first granularity score with the highest confidence in the first evaluation result and a fine-tuning score, where the fine-tuning score is a non-negative value smaller than the absolute value of the difference between two adjacent first granularity scores.
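The fusion in the fully-connected layer can be sketched as below, assuming PyTorch; the class name, feature dimension and the choice of a single output unit are illustrative assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

class FineGrainedHead(nn.Module):
    """Final FC of fig. 4: fuse the refined (fourth) feature with MOS1 and regress MOS2."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # +1 input unit for the scalar coarse-grained score concatenated to the feature.
        self.fc = nn.Linear(feature_dim + 1, 1)

    def forward(self, fourth_feature: torch.Tensor, mos1: torch.Tensor) -> torch.Tensor:
        # fourth_feature: (batch, feature_dim); mos1: (batch, 1) coarse score used as prior.
        fused = torch.cat([fourth_feature, mos1], dim=1)   # concat fusion
        # The output is the second granularity score; in essence it equals
        # MOS1 plus a fine adjustment within one coarse-grained interval.
        return self.fc(fused)
```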
It is easy to understand that in the model training stage, the second quality evaluation model also uses the coarse-grained quality evaluation result output by the first quality evaluation model as prior knowledge and, in a targeted manner, learns only the fine-grained potential rule between the features of the object (image, video or the like) and the object quality. The model therefore has the function of fine-tuning the quality score according to the input prior coarse-grained quality evaluation result, while the coarse-grained potential rule between object features and object quality is learned by the first quality evaluation model.
According to the above scheme, the quality evaluation method of the embodiment implements a quality evaluation scheme for performing coarse-to-fine quality prediction on an object to be measured by using a multi-stage (two-stage) strategy by performing quality evaluation on the object to be measured by using the first quality evaluation model and the second quality evaluation model, and can effectively solve the quality evaluation problem of multimedia data such as images or videos.
The conventional technology directly obtains the quality score of a video or image in a single-stage task processing mode, and the precision of the quality evaluation result is not high. Through research, the inventors found the root cause to be as follows: the quality evaluation task is essentially a regression task, i.e., a video or image is input and a refined quality result (such as a refined quality score) is regressed by a model, and the higher the precision required of the regression task, the greater the difficulty. In other words, a single model in single-stage task processing can hardly learn, with high accuracy, the potential rule between the object features of an image/video and its quality evaluation score over the full sample mark space. The present application instead adopts a multi-stage (two-stage) strategy to perform coarse-to-fine quality prediction on the target object: one model learns the coarse-grained potential rule between object features and object quality (preliminarily determining which coarse-grained score the object belongs to based on coarse-grained object features), and another model uses the coarse-grained quality evaluation result as prior knowledge and learns, in a targeted manner, the fine-grained potential rule between object features and object quality (further determining the fine-grained score on the basis of the coarse-grained score). This resolves the above problem and improves the quality evaluation precision for images or videos.
In addition, in the single-stage task processing mode of the conventional technology, for two videos or images with similar quality, the quality of the two videos or images cannot be measured accurately.
To further address this problem, optionally, when constructing the second quality evaluation model in the embodiment of the present application, a twin (Siamese) structure network designed based on the predetermined classification network is first subjected to ranking training, using the sample pairs from the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model for each sample in each pair, to obtain a twin network model.
The predetermined classification network on which the twin network model is built may be, but is not limited to, a convolutional neural network such as ResNet50 or Vgg16, and the classification network on which the twin network model is built may be the same as or different from the classification network on which the first quality evaluation model is built. Two branches in the twin network model share network parameters, and based on sequencing training, each branch in the twin network model can effectively measure the quality of two videos or images with similar quality.
On this basis, one branch is selected and further trained using each sample in the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model for each sample. Through this training, the selected branch learns the fine-grained potential rule between object features and object quality with the coarse-grained quality evaluation result as prior knowledge, finally yielding the second quality evaluation model.
The following embodiment will explain the construction process of each stage model referred to in the present application in detail.
In this embodiment, the second quality evaluation model has a function of measuring the quality of two objects, such as videos or images, with similar quality, and a function of evaluating the fine-grained quality based on the coarse-grained quality evaluation result as a prior result, so that the quality evaluation accuracy of the objects, such as images or videos, is improved, and the purpose of accurately measuring the quality of the two objects, such as videos or images, with similar quality can be further achieved.
The implementation of the method of the present application needs to be premised on that a first quality evaluation model and a second quality evaluation model are constructed in advance, and thus, in an optional embodiment, as shown in the flowchart of the quality evaluation method shown in fig. 6, before step 101, the quality evaluation method may further include:
step 601, constructing a first quality evaluation model; and constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set.
Referring to fig. 7, the process of constructing the first quality evaluation model and the second quality evaluation model includes:
step 701, training the twin structure network determined based on the first classification network based on the sample set and the quality evaluation labels labeled for the samples in the sample set to obtain the first quality evaluation model.
For the construction of the first quality evaluation model, a sample space and a sample mark space first need to be constructed. A series of images or videos can be selected as the sample set, and each sample is quality-labeled. This embodiment assumes that the subjective annotation score ranges from 1 to 5 when the sample set is quality-labeled, where the annotated score values are floating point numbers, for example 1.0, 1.1, 1.7, 3.5, 4.9, etc. All subjective annotation score values in the range 1-5 form the sample mark space. The subjective annotation score values in the sample mark space are divided into four groups, i.e., 1-2, 2-3, 3-4 and 4-5, and the subjective annotation score of a video or image in the corresponding score segment is mapped to one of the 4 coarse-grained scores 1, 2, 3, 4 (i.e., the first granularity scores above).
Then, each sample is input into a network structure designed based on a first classification network such as ResNet50 or Vgg16. The designed model network structure, as shown in fig. 2, includes a first feature extraction layer, a first generalization processing layer and a softmax layer, wherein the first feature extraction layer can be implemented to include one or more convolution layers, the number of first generalization processing layers can be one or more (preferably one), and the first generalization processing layer includes a full connection layer, a ReLU activation layer, a normalization layer and a Dropout layer. When a sample is input, it is specifically input into the first feature extraction layer of the network.
In the model training process, the coarse-grained scores (such as 1, 2, 3 and 4 above) obtained by mapping the subjective annotation score values of the videos or images in the corresponding score segments are taken as the ground truth; the model loss is measured against them, and model parameters are adjusted and the model optimized on that basis until the model converges or the preset number of iterations is reached. The trained coarse-grained quality evaluation network is then obtained and used as the first quality evaluation model of the present application.
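A minimal training sketch of this stage, assuming PyTorch/torchvision with a ResNet50 backbone; the hidden size, dropout rate, optimizer and learning rate are assumptions, and the softmax of fig. 2 is folded into the cross-entropy loss.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet50 backbone with its classifier replaced by one generalization block
# and a 4-way head for the coarse bins 1-4 (encoded here as classes 0-3).
backbone = models.resnet50()
feat_dim = backbone.fc.in_features
backbone.fc = nn.Sequential(
    nn.Linear(feat_dim, 512), nn.ReLU(inplace=True),
    nn.BatchNorm1d(512), nn.Dropout(0.5),
    nn.Linear(512, 4),
)

criterion = nn.CrossEntropyLoss()   # applies softmax internally
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def coarse_train_step(images: torch.Tensor, coarse_labels: torch.Tensor) -> float:
    """images: (batch, 3, H, W); coarse_labels: (batch,) long tensor with values 0..3."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), coarse_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```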
Step 702, performing sequencing training on the twin structure network determined based on the second classification network based on each sample pair in the sample set, the quality evaluation label labeled for each sample in the sample pair, and the first evaluation result of the first quality evaluation model for each sample in the sample pair to obtain a twin network model;
optionally, in order to effectively measure the quality of two video or image objects to be measured with similar quality, for the construction of the second quality evaluation model, in this embodiment, a network model of a twin network structure is trained first, and then a branch of the twin network model is further selected to train the second quality evaluation model.
To train the twin network model, each sample pair formed by two samples in the sample set of the first quality evaluation model is used as a training sample, and the ordering of the two subjective annotation scores corresponding to the sample pair in the sample mark space of the first quality evaluation model (representing the relative quality of the two samples) is used as the label. In practice, to obtain a good modeling effect, the sample set preferably contains a sufficient number of samples with similar quality.
Then, the two samples in each sample pair are respectively input into the two branches of a twin network structure designed based on the selected second classification network such as ResNet50 or Vgg16. The network structure of each branch, as shown in fig. 8, includes a second feature extraction layer, a second generalization processing layer and a full connection layer, wherein the second feature extraction layer may be implemented to include one or more convolution layers, the number of second generalization processing layers may be one or more (preferably more than one), and the second generalization processing layer includes a full connection layer, a ReLU activation layer, a normalization layer and a Dropout layer. When a sample is input, it is specifically input into the second feature extraction layer of the corresponding branch.
The two branches of the twin network model share network parameters during training, and a ranking loss function is adopted as the loss function.
In addition, in the model training process, the first evaluation result produced by the first quality evaluation model for the sample input to a branch (the target first granularity score with the highest confidence) is added to the input of that branch's full connection layer; it is fused with the features output by the functional layers preceding the full connection layer in the branch structure, and model training then proceeds in the full connection layer. After the two samples of a sample pair are input into the two branches of the twin network model respectively, each branch outputs a quality evaluation value for its input sample, the corresponding loss value is calculated through the ranking loss function, and the loss is fed back to the model for network parameter updating until the model converges or the preset number of iterations is reached. The trained twin network model is obtained accordingly.
Compared with the first quality evaluation model, the twin network model evaluates the quality of objects such as images or videos at a finer granularity: during training, the quality evaluation value output by each branch of the twin network model is a fine-grained quality score such as 1.2 or 1.3.
It should be noted that in the twin network model training stage, the model learns the potential rule between the object features of the two similar-quality samples in a sample pair and their quality. During training, the difference between the quality ranking of the two samples represented by the outputs of the two branches and the quality ranking represented by their subjective annotation scores in the sample mark space is measured through the ranking loss function, and model loss measurement, model parameter adjustment and model optimization are performed on that basis, so that two model branches sharing network parameters are finally obtained. That is, during training, the measurement of the model loss does not consider the numerical difference between the model branch output score and the annotation score, but only the difference between the ranking results. For example, if the annotation scores of the two samples in a sample pair are 1.1 and 1.2 respectively and the scores output by the two branches of the model are 1.2 and 1.3, the two ranking results are the same, so the model output is considered consistent with the annotation result and there is no loss.
Step 703, training a branch in the twin network model based on the sample set, the quality evaluation label labeled for each sample in the sample set, and the first evaluation result of each sample in the sample set by the first quality evaluation model, to obtain the second quality evaluation model.
After the twin network model is trained, one branch (any branch can be used) is selected for further model training to construct a second quality evaluation model.
The sample set on which this training is based is still the sample set used for training the first quality evaluation model. The sample mark space is the label space corresponding to the above subjective annotation score range, i.e., the fine-grained floating-point quality scores in the range 1-5. The model structure follows the structure of the model branch in the twin network model and, as shown in fig. 4, includes a second feature extraction layer, a second generalization processing layer and a full connection layer.
In the training process, each sample in the sample set is input into the selected branch, specifically into its second feature extraction layer. After feature extraction by the second feature extraction layer and feature refinement by the second generalization processing layer, the result is output to the full connection layer. In addition, the first evaluation result produced by the first quality evaluation model for the input sample (the target first granularity score with the highest confidence, such as MOS1 input to the FC in fig. 4) is added to the input of the full connection layer and fused with the features output by the functional layers preceding the full connection layer in the branch structure, and model training then proceeds in the full connection layer. Training is iterated by calculating the loss between the model output and the subjective annotation score (the originally annotated fine-grained score, i.e., the second granularity score, such as 1.1, 1.7, 3.5, 4.9, etc.), so that the model is fine-tuned and its prediction keeps approaching the fine-grained subjective annotation score until the model converges or the preset number of iterations is reached. The trained fine-grained quality evaluation network, i.e., the second quality evaluation model, is obtained accordingly.
The model loss can be measured by L1 loss or L2 loss, where L1 loss refers to the Mean Absolute Error (MAE), a loss function used for regression models, and L2 loss refers to the Mean Squared Error (MSE), another loss function used for regression models.
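In standard notation (a restatement of the usual definitions, not formulas taken from the patent), with y_i the subjective annotation score of sample i and ŷ_i the model prediction:

```latex
\mathrm{L1} = \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| y_i - \hat{y}_i \right|,
\qquad
\mathrm{L2} = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left( y_i - \hat{y}_i \right)^{2}
```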
Through the training in this stage, on the basis of being able to effectively rank the quality of two objects with similar quality such as videos or images, the model further learns the fine-grained potential rule between object features and object quality, using the coarse-grained quality evaluation result output by the first quality evaluation model as prior knowledge, so that the model output gets closer to the actual annotation score. For example, a fully trained twin network model can correctly rank two objects with similar quality, but may predict a quality score of 1.2 or 1.3 for an object whose annotated quality score is 1.1; the training in this stage further corrects the selected branch of the twin network model so that its output approaches the actual annotation result.
In the model using stage, an object such as a video or an image is input into the trained first quality evaluation model to obtain a coarse-grained quality evaluation result (the first evaluation result above); the object is then further input into the trained second quality evaluation model, with the coarse-grained quality evaluation result of the first quality evaluation model added as an input to the full connection layer of the second quality evaluation model; finally, the second quality evaluation model outputs the fine-grained quality evaluation result (the second evaluation result above) of the object.
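The two-stage use described above can be sketched as follows, assuming PyTorch; the model interfaces (per-bin confidences from the first model, an (object, MOS1) input pair for the second model) and the class-to-score offset are illustrative assumptions.

```python
import torch

@torch.no_grad()
def evaluate_quality(first_model, second_model, target_object: torch.Tensor):
    """target_object: preprocessed image tensor (or averaged video-frame features)."""
    confidences = first_model(target_object)                       # softmax over coarse bins
    mos1 = confidences.argmax(dim=1, keepdim=True).float() + 1.0   # classes 0..3 -> scores 1..4
    mos2 = second_model(target_object, mos1)                       # fine-grained second result
    return mos1, mos2
```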
Based on the model construction process provided by this embodiment, coarse-to-fine quality evaluation based on a multi-stage evaluation strategy can be performed on an object to be measured such as an image or a video; the precision of the quality evaluation result is higher than that of the single-stage quality evaluation mode of the conventional technology, and the quality of two videos or images with similar quality can be effectively measured.
Corresponding to the quality evaluation method, an embodiment of the present application further provides a quality evaluation apparatus, as shown in fig. 9, the apparatus may include:
an acquiring unit 901 configured to acquire a target object to be evaluated;
a first evaluation unit 902, configured to perform quality evaluation on the target object by using a first quality evaluation model to obtain a first evaluation result;
a second evaluation unit 903, configured to perform quality evaluation on the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result;
the second quality evaluation model is obtained by training a first evaluation result of a sample of the first quality evaluation model based on the sample and the first quality evaluation model, so that the quality evaluation fine granularity corresponding to the second evaluation result of the second quality evaluation model is higher than the quality evaluation fine granularity corresponding to the first evaluation result of the first quality evaluation model.
In an optional implementation manner of an embodiment of the present application, the first quality evaluation model includes a first feature extraction layer, a first generalization processing layer, and a softmax layer;
the first evaluation unit 902 is specifically configured to:
inputting the target object into the first feature extraction layer, and performing feature extraction processing on the target object by using the first feature extraction layer to obtain a first feature;
refining the first characteristic by using the first generalization processing layer to obtain a second characteristic;
mapping the second features into confidence degrees corresponding to the first granularity scores in the sample mark space by using the softmax layer; the first evaluation result includes: and the target first granularity score with the highest confidence level in the first granularity scores.
In an optional implementation manner of the embodiment of the present application, the second quality evaluation model includes a second feature extraction layer, a second generalization processing layer, and a full connection layer;
the second evaluation unit 903 is specifically configured to:
inputting the target object into the second feature extraction layer, and performing feature extraction processing on the target object by using the second feature extraction layer to obtain a third feature;
refining the third characteristic by using the second generalization processing layer to obtain a fourth characteristic;
inputting the fourth feature and the first evaluation result into the full-connection layer, and mapping the fourth feature and the first evaluation result to a sample mark space by using the full-connection layer to obtain a second granularity score corresponding to the target object in the sample mark space;
wherein the second evaluation result comprises the second granularity score; the second granularity score is composed of the target first granularity score and a fine adjustment score, and the fine adjustment score is a non-negative value smaller than the absolute value of the difference between two adjacent first granularity scores.
In an alternative implementation of the embodiments of the present application:
the number of the first generalization processing layers is one;
the number of the second generalization processing layers is more than one.
In an optional implementation manner of the embodiment of the present application, the second quality evaluation model is a model obtained by training one of the two model branches of the twin network model;
the second quality evaluation model is obtained by training that branch by using each sample in the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model for each sample;
the twin network model is obtained by performing ranking training on a twin-structure network determined based on a predetermined classification network by using each sample pair in the sample set of the first quality evaluation model and the first evaluation results of the first quality evaluation model for the samples in the sample pair;
wherein a sample pair is formed by any two samples in the sample set.
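The twin-network relationship can be pictured with the short sketch below: two branches share one set of weights, and after ranking training the shared branch itself serves as the second quality evaluation model. The wrapper class and its call signature are illustrative assumptions; any branch module that accepts a sample and its first evaluation result would do.

import torch.nn as nn

class TwinQualityNetwork(nn.Module):
    """Illustrative twin (two-branch) structure with shared weights."""

    def __init__(self, branch: nn.Module):
        super().__init__()
        # Reusing a single module for both branches is what makes the weights shared.
        self.branch = branch

    def forward(self, sample_a, first_result_a, sample_b, first_result_b):
        # Each branch scores one sample of the pair, conditioned on its coarse result.
        score_a = self.branch(sample_a, first_result_a)
        score_b = self.branch(sample_b, first_result_b)
        return score_a, score_b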
In an optional implementation manner of the embodiment of the present application, referring to fig. 10, the quality evaluation apparatus of the present application may further include:
a model building unit 904 for: before the target object to be evaluated is obtained, the first quality evaluation model is constructed; and constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set.
In an optional implementation manner of the embodiment of the present application, the model building unit 904 is specifically configured to:
training, based on a sample set and the quality evaluation labels labeled for the samples in the sample set, a model network determined based on a first classification network, so as to obtain the first quality evaluation model;
performing ranking training on a twin-structure network determined based on a second classification network, based on each sample pair in the sample set, the quality evaluation labels labeled for the samples in the sample pair, and the first evaluation results of the first quality evaluation model on the samples in the sample pair, so as to obtain the twin network model;
training one branch of the twin network model based on the sample set, the quality evaluation labels labeled for the samples in the sample set, and the first evaluation results of the first quality evaluation model for the samples in the sample set, so as to obtain the second quality evaluation model;
wherein the sample pair is a sample pair formed by any two samples in the sample set.
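For the ranking-training stage mentioned above, one training step on a batch of sample pairs might look like the sketch below. The combination of a margin ranking loss with per-branch regression terms is an assumption of this sketch (the application only specifies that the twin-structure network is ranking-trained on sample pairs, their labels, and the first evaluation results), and twin is assumed to be a two-branch module such as the one sketched earlier.

import torch
import torch.nn.functional as F

def twin_ranking_step(twin, optimizer, batch, margin=0.0):
    """One illustrative ranking-training step on a batch of sample pairs."""
    # batch: (sample_a, coarse_a, label_a, sample_b, coarse_b, label_b), where coarse_*
    # are first-model evaluation results and label_* are fine-granularity labels.
    sample_a, coarse_a, label_a, sample_b, coarse_b, label_b = batch
    score_a, score_b = twin(sample_a, coarse_a, sample_b, coarse_b)
    # Ranking target: +1 where sample_a is labeled at least as good as sample_b, else -1.
    rank_target = torch.where(label_a >= label_b,
                              torch.ones_like(label_a),
                              -torch.ones_like(label_a))
    # The ranking loss enforces the labeled ordering within each pair; the regression
    # terms (an extra assumption of this sketch) keep each branch close to its fine label.
    loss = F.margin_ranking_loss(score_a, score_b, rank_target, margin=margin)
    loss = loss + F.mse_loss(score_a, label_a) + F.mse_loss(score_b, label_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

After this stage, one branch (twin.branch in the sketch above) would be further trained on individual samples, their labels, and the first evaluation results to obtain the second quality evaluation model.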
In an optional implementation manner of the embodiment of the present application, the quality evaluation label marked for the sample in the sample set is a second granularity score;
the first evaluation result comprises a first granularity score; the evaluation result obtained by evaluating the quality of the sample by any branch in the twin network model comprises a second granularity score; the second evaluation result comprises a second granularity score;
and the quality evaluation granularity corresponding to the second granularity score is finer than the quality evaluation granularity corresponding to the first granularity score.
Because the quality evaluation apparatus disclosed in the embodiment of the present application corresponds to the quality evaluation method disclosed in the method embodiments above, its description is relatively brief; for the relevant similarities, refer to the description of the corresponding method embodiment above, and details are not repeated here.
The embodiment of the application further discloses an electronic device, which may be, but is not limited to, a terminal device with a data processing function, such as a mobile phone, a tablet computer, or a personal computer (e.g., a notebook, an all-in-one machine, or a desktop), or a corresponding physical machine with a data processing function, such as a private cloud/public cloud platform or a local area network/Internet server.
As shown in the structural schematic diagram of fig. 11, the electronic device at least includes:
a memory 1101 for storing a set of computer instructions;
the set of computer instructions may be embodied in the form of a computer program.
The memory 1101 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
A processor 1102, configured to implement the quality evaluation method of any one of the above method embodiments by executing the set of computer instructions stored in the memory.
The processor 1102 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or another programmable logic device.
In addition, the electronic device may further include a communication interface, a communication bus, and the like. The memory, the processor, and the communication interface communicate with each other via the communication bus.
The communication interface is used for communication between the electronic device and other devices. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and the like.
In this embodiment, when the processor in the electronic device executes the computer instruction set stored in the memory, the quality of the object to be measured is evaluated by the first quality evaluation model and the second quality evaluation model respectively, thereby implementing a quality evaluation scheme that performs coarse-to-fine quality prediction on the object to be measured with a multi-stage (two-stage) strategy. This scheme can effectively solve the quality evaluation problem of multimedia data such as images or videos.
In addition, the embodiment of the present application further discloses a computer-readable storage medium, in which a computer instruction set is stored, and when the computer instruction set is executed by a processor, the quality evaluation method disclosed in any one of the above method embodiments is implemented.
Specifically, when the computer instruction set in the computer-readable storage medium of this embodiment is executed by a processor, the quality of the object to be measured is evaluated by the first quality evaluation model and the second quality evaluation model respectively, thereby implementing a quality evaluation scheme that performs coarse-to-fine quality prediction on the object to be measured with a multi-stage (two-stage) strategy. This scheme can effectively solve the quality evaluation problem of multimedia data such as images or videos.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
For convenience of description, the above system or apparatus is described as being divided into various modules or units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application may be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied, in essence or in the part contributing to the prior art, in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or some parts of the embodiments of the present application.
Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (11)

1. A quality evaluation method characterized by comprising:
acquiring a target object to be evaluated;
performing quality evaluation on the target object by using a first quality evaluation model to obtain a first evaluation result;
performing quality evaluation on the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result;
the second quality evaluation model is obtained by training based on samples of the first quality evaluation model and the first evaluation results of the first quality evaluation model for those samples, so that the quality evaluation granularity corresponding to the second evaluation result of the second quality evaluation model is finer than the quality evaluation granularity corresponding to the first evaluation result of the first quality evaluation model.
2. The method of claim 1, wherein the first quality evaluation model comprises a first feature extraction layer, a first generalization processing layer, and a softmax layer;
the quality evaluation of the target object by using the first quality evaluation model to obtain a first evaluation result includes:
inputting the target object into the first feature extraction layer, and performing feature extraction processing on the target object by using the first feature extraction layer to obtain a first feature;
refining the first feature by using the first generalization processing layer to obtain a second feature;
mapping the second feature into confidence degrees corresponding to the first granularity scores in the sample label space by using the softmax layer; the first evaluation result includes: the target first granularity score with the highest confidence among the first granularity scores.
3. The method of claim 2, wherein the second quality evaluation model comprises a second feature extraction layer, a second generalization processing layer, and a fully connected layer;
the performing, by using the second quality evaluation model, quality evaluation on the target object according to the first evaluation result to obtain a second evaluation result includes:
inputting the target object into the second feature extraction layer, and performing feature extraction processing on the target object by using the second feature extraction layer to obtain a third feature;
refining the third feature by using the second generalization processing layer to obtain a fourth feature;
inputting the fourth feature and the first evaluation result into the fully connected layer, and mapping the fourth feature and the first evaluation result to the sample label space by using the fully connected layer to obtain a second granularity score corresponding to the target object in the sample label space;
wherein the second evaluation result comprises the second granularity score; the second granularity score is composed of the target first granularity score and a fine adjustment score, and the fine adjustment score is a non-negative value smaller than the absolute value of the difference between two adjacent first granularity scores.
4. The method of claim 3, wherein:
the number of the first generalization processing layers is one;
the number of the second generalization processing layers is multiple.
5. The method of claim 1, wherein the second quality evaluation model is a model obtained by training one of the two model branches of a twin network model;
the second quality evaluation model is obtained by training the branch by utilizing each sample in the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample;
the twin network model is obtained by performing ranking training on a twin-structure network determined based on a predetermined classification network by using each sample pair in the sample set of the first quality evaluation model and the first evaluation results of the first quality evaluation model on the samples in the sample pair;
wherein the sample pair is a sample pair formed by any two samples in the sample set.
6. The method according to claim 1, further comprising, before the obtaining a target object to be evaluated:
constructing the first quality evaluation model;
and constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set.
7. The method of claim 6, wherein the constructing the first quality evaluation model, and the constructing the second quality evaluation model based on the sample set of the first quality evaluation model and the first evaluation result of the first quality evaluation model on each sample in the sample set, comprise:
training a model network determined based on a first classification network based on a sample set and a quality evaluation label labeled for each sample in the sample set to obtain a first quality evaluation model;
based on each sample pair in the sample set, the quality evaluation labels labeled for the samples in the sample pair, and the first evaluation results of the first quality evaluation model on the samples in the sample pair, performing ranking training on a twin-structure network determined based on a second classification network to obtain a twin network model;
training one branch in the twin network model based on the sample set, the quality evaluation label labeled for each sample in the sample set and the first evaluation result of the first quality evaluation model for each sample in the sample set to obtain the second quality evaluation model;
wherein the sample pair is a sample pair formed by any two samples in the sample set.
8. The method of claim 7, wherein the quality evaluation labels labeled for the samples in the sample set are second granularity scores;
the first evaluation result comprises a first granularity score; the evaluation result obtained by evaluating the quality of a sample by either branch of the twin network model comprises a second granularity score; the second evaluation result comprises a second granularity score;
and the quality evaluation granularity corresponding to the second granularity score is finer than the quality evaluation granularity corresponding to the first granularity score.
9. A quality evaluation apparatus, comprising:
an acquisition unit configured to acquire a target object to be evaluated;
the first evaluation unit is used for evaluating the quality of the target object by utilizing a first quality evaluation model to obtain a first evaluation result;
the second evaluation unit is used for evaluating the quality of the target object according to the first evaluation result by using a second quality evaluation model to obtain a second evaluation result;
the second quality evaluation model is obtained by training a first evaluation result of the sample based on the sample of the first quality evaluation model and the first quality evaluation model, so that the quality evaluation fine granularity corresponding to the second evaluation result of the second quality evaluation model is higher than the quality evaluation fine granularity corresponding to the first evaluation result of the first quality evaluation model.
10. An electronic device, comprising:
a memory for storing a set of computer instructions;
a processor, configured to implement the quality evaluation method of any one of claims 1 to 8 by executing the set of instructions stored in the memory.
11. A computer-readable storage medium having stored therein a set of computer instructions which, when executed by a processor, implements the quality evaluation method of any one of claims 1 to 8.
CN202110211837.0A 2021-02-25 2021-02-25 Quality evaluation method, quality evaluation device, electronic device, and storage medium Pending CN112950567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110211837.0A CN112950567A (en) 2021-02-25 2021-02-25 Quality evaluation method, quality evaluation device, electronic device, and storage medium


Publications (1)

Publication Number Publication Date
CN112950567A true CN112950567A (en) 2021-06-11

Family

ID=76246190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110211837.0A Pending CN112950567A (en) 2021-02-25 2021-02-25 Quality evaluation method, quality evaluation device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112950567A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102204265A (en) * 2010-12-31 2011-09-28 华为技术有限公司 Method, terminal, server, and system for evaluating video quality
US20180144214A1 (en) * 2016-11-23 2018-05-24 General Electric Company Deep learning medical systems and methods for image reconstruction and quality evaluation
US20180286032A1 (en) * 2017-04-04 2018-10-04 Board Of Regents, The University Of Texas System Assessing quality of images or videos using a two-stage quality assessment
WO2019047949A1 (en) * 2017-09-08 2019-03-14 众安信息技术服务有限公司 Image quality evaluation method and image quality evaluation system
CN109754391A (en) * 2018-12-18 2019-05-14 北京爱奇艺科技有限公司 A kind of image quality evaluating method, device and electronic equipment
CN109714591A (en) * 2019-01-09 2019-05-03 淮海工学院 Based on the picture quality subjective evaluation method and system to assessment label
CN110473181A (en) * 2019-07-31 2019-11-19 天津大学 Screen content image based on edge feature information without ginseng quality evaluating method
CN111028216A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Image scoring method and device, storage medium and electronic equipment
CN111429402A (en) * 2020-02-25 2020-07-17 西北大学 Image quality evaluation method for fusing advanced visual perception features and depth features
CN111640099A (en) * 2020-05-29 2020-09-08 北京金山云网络技术有限公司 Method and device for determining image quality, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SIMONE BIANCO ET AL.: "On the Use of Deep Learning for Blind Image Quality Assessment", arXiv:1602.05531v5 [cs.CV], 4 April 2017 (2017-04-04), pages 1-8 *
XIALEI LIU ET AL.: "RankIQA: Learning From Rankings for No-Reference Image Quality Assessment", Proceedings of the IEEE International Conference on Computer Vision (ICCV), 31 October 2017 (2017-10-31), pages 1040-1049, XP033282961, DOI: 10.1109/ICCV.2017.118 *
FANG YUMING ET AL.: "Research Progress on No-Reference Image Quality Assessment", Journal of Image and Graphics, vol. 26, no. 2, 9 February 2021 (2021-02-09), pages 265-286 *
LI GUOQING ET AL.: "Full-Reference Image Quality Assessment Based on Multi-Layer Perceptual Decomposition", Journal of Image and Graphics, vol. 24, no. 1, 16 January 2019 (2019-01-16), pages 149-158 *

Similar Documents

Publication Publication Date Title
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
US20210027048A1 (en) Human face image classification method and apparatus, and server
CN107735795B (en) Method and system for social relationship identification
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
US11587356B2 (en) Method and device for age estimation
CN109345553B (en) Palm and key point detection method and device thereof, and terminal equipment
US11455831B2 (en) Method and apparatus for face classification
CN112070044B (en) Video object classification method and device
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
CN107766316B (en) Evaluation data analysis method, device and system
CN114330670A (en) Graph neural network training method, device, equipment and storage medium
JP2021526678A (en) Image processing methods, devices, electronic devices and storage media
CN111522979B (en) Picture sorting recommendation method and device, electronic equipment and storage medium
CN110209860B (en) Template-guided interpretable garment matching method and device based on garment attributes
Pan et al. SMILE: Cost-sensitive multi-task learning for nuclear segmentation and classification with imbalanced annotations
CN111104941A (en) Image direction correcting method and device and electronic equipment
Zong et al. A cascaded refined rgb-d salient object detection network based on the attention mechanism
WO2023217117A1 (en) Image assessment method and apparatus, and device, storage medium and program product
CN112100509A (en) Information recommendation method, device, server and storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN111275683A (en) Image quality grading processing method, system, device and medium
CN116152577A (en) Image classification method and device
CN112950567A (en) Quality evaluation method, quality evaluation device, electronic device, and storage medium
CN114912540A (en) Transfer learning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination