CN111199541A - Image quality evaluation method, image quality evaluation device, electronic device, and storage medium


Info

Publication number
CN111199541A
CN111199541A
Authority
CN
China
Prior art keywords
image
processed
semantic segmentation
sample
quality evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911379479.3A
Other languages
Chinese (zh)
Inventor
彭冬炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911379479.3A
Publication of CN111199541A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection

Abstract

The present disclosure provides an image quality evaluation method, an image quality evaluation apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of image processing. The image quality evaluation method includes: acquiring an image to be processed, and acquiring a semantic segmentation image corresponding to the image to be processed; performing prediction processing on the image to be processed and the semantic segmentation image, and determining scoring information of the image to be processed; and performing aesthetic quality evaluation on the image to be processed according to the scoring information. The method and the device can improve the comprehensiveness and accuracy of image quality evaluation.

Description

Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image quality evaluation method, an image quality evaluation device, an electronic device, and a computer-readable storage medium.
Background
With the development of image technology, the quality of an image can be scored so as to facilitate beautification of the image or its subsequent processing.
In the related art, a model trained on a given set of data is used for the quality evaluation of an image. The data used in this way may not be comprehensive, resulting in inaccurate and unreasonable quality evaluation results for the image.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide an image quality evaluation method, apparatus, electronic device, and computer-readable storage medium, thereby overcoming, at least to some extent, the problem of inaccurate image quality evaluation due to limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided an image quality evaluation method including: acquiring an image to be processed, and acquiring a semantic segmentation image corresponding to the image to be processed; performing prediction processing on the image to be processed and the semantic segmentation image, and determining scoring information of the image to be processed; and performing aesthetic quality evaluation on the image to be processed according to the scoring information.
According to an aspect of the present disclosure, there is provided an image quality evaluation apparatus including: a semantic segmentation module for acquiring an image to be processed and acquiring a semantic segmentation image corresponding to the image to be processed; a score determining module for performing prediction processing on the image to be processed and the semantic segmentation image and determining scoring information of the image to be processed; and an image evaluation module for performing aesthetic quality evaluation on the image to be processed according to the scoring information.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the image quality assessment methods described above via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image quality evaluation method of any one of the above.
In the image quality evaluation method, the apparatus, the electronic device, and the computer-readable storage medium provided by the present exemplary embodiment, on the one hand, by extracting the scoring information of the image to be processed in combination with the image itself and its semantic segmentation image, feature data can be acquired from two dimensions, the image to be processed itself and the pixel-level semantic segmentation image, to determine the scoring information of the image to be processed and thereby perform aesthetic quality evaluation on it. Because the pixel distribution and the abstract semantic information of the image to be processed are considered at the same time, the aesthetic quality evaluation can be performed from both the pixel aspect and the semantic aspect, the image data used to obtain the scoring information is more comprehensive, and the image to be processed can be accurately identified by combining the semantic segmentation image, avoiding the misevaluation and limitation caused by relying on image labels alone, and improving the accuracy and comprehensiveness of the aesthetic quality evaluation. On the other hand, evaluating the image to be processed by combining the semantic segmentation image with the image itself means that scoring can be performed not only through the pixel color distribution of the image to be processed but also through aesthetic evaluation of abstract significance with reference to the semantic information represented by the semantic segmentation image, which improves the rationality of the aesthetic quality evaluation of the image and, at the same time, the efficiency of evaluating image quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically shows the system architecture for implementing the image quality evaluation method.
Fig. 2 schematically illustrates the flow of an image quality evaluation method in an exemplary embodiment of the present disclosure.
Fig. 3 schematically shows an overall flow diagram of quality evaluation in an exemplary embodiment of the present disclosure.
Fig. 4 schematically illustrates a specific flowchart of processing a sample image in an exemplary embodiment of the present disclosure.
Fig. 5 schematically illustrates a specific flowchart for training a model in an exemplary embodiment of the present disclosure.
Fig. 6 schematically illustrates a detailed flowchart of identifying an image in an exemplary embodiment of the present disclosure.
Fig. 7 schematically shows a block diagram of an image quality evaluation apparatus in an exemplary embodiment of the present disclosure.
Fig. 8 schematically shows a schematic view of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, first, a system architecture for performing the image quality evaluation method is provided. Referring to fig. 1, a system architecture 100 may include a first end 101, a network 102, and a second end 103. The first end 101 may be a client, for example, various handheld devices (such as smart phones) having a photographing function and an image display function, desktop computers, vehicle-mounted devices, wearable devices, and the like. The network 102 serves as a medium for providing a communication link between the first end 101 and the second end 103, and may include various connection types; in the embodiment of the present disclosure, the network 102 between the first end 101 and the second end 103 may be a wired communication link, such as a communication link provided by a serial connection line, or a wireless communication link, such as a communication link provided by a wireless network. The second end 103 may be a client, for example, a terminal device with a data processing function such as a portable computer, a desktop computer, or a smart phone, for performing feature extraction and scoring processing on an input image. When the first end and the second end are both clients, they may be the same client.
It should be understood that the number of first ends, networks and second ends in fig. 1 is merely illustrative. There may be any number of clients, networks, and servers, as desired for an implementation.
It should be noted that the image quality evaluation method provided by the embodiment of the present disclosure may be executed entirely by the second end or the first end, or jointly by the first end and the second end; the execution subject of the image quality evaluation method is not particularly limited here. Accordingly, the image quality evaluation apparatus may be disposed in the second end 103 or in the first end 101.
Based on the system architecture, the embodiment of the present disclosure provides an image quality evaluation method, which may be applied to any scene for evaluating the quality of a photo, a video, or a picture. Next, the image quality evaluation method in the present exemplary embodiment is explained in detail with reference to fig. 2. The detailed description is as follows:
in step S210, an image to be processed is obtained, and a semantic segmentation image corresponding to the image to be processed is obtained.
In the embodiment of the present disclosure, the image to be processed may be an image captured by using any camera, or an image downloaded from a network, or an image acquired from another storage device. The image to be processed may be a still image or an image in a moving state, and the like. One or more objects may be included in the image to be processed, for example, when an image of a person is captured, an environment around the person, such as a car, a tree, and the like, may be captured, and the person, the car, the tree, and the like are all the objects included in the image to be processed.
The semantic segmentation image refers to an image containing the semantic information corresponding to the image to be processed, and can be obtained by performing semantic segmentation on the image to be processed. Semantic segmentation refers to grouping or segmenting the pixels in the image to be processed according to the different semantic meanings expressed in the image, and labeling different classes with different colors. When performing semantic segmentation, the image to be processed can be input directly into a trained semantic segmentation model, which performs pixel-level semantic segmentation according to image characteristic information to obtain the semantic segmentation image. The image characteristic information may be used to divide the image to be processed into different objects, i.e., to represent the categories to which the pixels belong, and may include, for example, people, vehicles, buildings, and the like. All pixels of the image to be processed can thus be grouped or segmented according to the meaning they express (the image characteristic information), realizing pixel-level semantic segmentation to obtain the semantic segmentation image.
In the embodiment of the present disclosure, the semantic segmentation model refers to a model for performing semantic segmentation on an image. Semantic segmentation classifies and recognizes an image at the pixel level: pixels belonging to the same class are grouped into one class, so that the image is understood from the pixel level. For example, pixels belonging to people may all be classified into a first category, pixels belonging to buildings into a second category, and pixels belonging to cars into a third category, thereby identifying the different categories of objects in the image. The present exemplary embodiment may train a semantic segmentation model on a large amount of training data and its corresponding class labels. The image to be processed is then processed by the semantic segmentation model to determine the class information of one or more objects in it; the model can assign each pixel in the image to be processed to a certain object category. Performing semantic segmentation on the image to be processed essentially segments the region occupied by each object it contains so as to identify the class of that object. Specifically, a semantic label, such as road, sky, person, or building, may be assigned to each pixel in the image to be processed, and these semantic labels may be used to represent the category information of one or more objects in the image to be processed.
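For concreteness (this sketch is not part of the patent text), pixel-level semantic segmentation might look as follows in PyTorch. The choice of DeepLabV3 with a MobileNetV3 backbone is an assumption; the disclosure does not name a specific segmentation network:

    import torch
    from PIL import Image
    from torchvision import transforms
    from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

    # Any trained segmentation model would do here; this one is a stand-in.
    model = deeplabv3_mobilenet_v3_large(weights="DEFAULT").eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("photo.jpg").convert("RGB")  # the image to be processed
    x = preprocess(img).unsqueeze(0)              # 1 x 3 x H x W
    with torch.no_grad():
        logits = model(x)["out"]                  # 1 x C x H x W class scores
    seg = logits.argmax(dim=1)                    # 1 x H x W: one class id per pixel

Each entry of seg is the semantic label of one pixel (person, vehicle, building, and so on), which is exactly the pixel-level category information described above.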
In the embodiment of the disclosure, in order to broaden the application range and allow deployment on convenient mobile terminals, a lightweight semantic segmentation model may be used. In particular, a lightweight semantic segmentation model may include an encoder and a decoder: the encoder down-samples the image to be processed to obtain intermediate feature data corresponding to it, and the decoder up-samples the intermediate feature data to obtain the category information of each object in the image to be processed. The encoder and decoder may have a symmetric or asymmetric configuration. The encoder can adopt a convolutional neural network that down-samples the input image through convolution and pooling operations, extracting features at the level of abstract image semantics and performing feature learning; the decoder can gradually recover the detailed features of the image to be processed through deconvolution and similar operations, further learning features at different scales, and finally output a pixel classification result at the same resolution as the image to be processed. From encoder to decoder there is usually a direct information connection to help the decoder better recover the details of the output target result.
Further, in order to ensure the accuracy of image upsampling during the decoding process and improve the depth of feature learning, the decoder may adopt a pyramid structure, that is, the intermediate feature data is gradually restored to the image features of the initial resolution by the combined arrangement of a plurality of deconvolution layers. In addition, in order to improve the segmentation and recognition capability of the semantic segmentation model on the infrared image, an attention layer can be added in a decoder, so that the obtained output result has higher accuracy, and the discrimination between similar images and the generalization capability of the model are improved.
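A minimal sketch of such an encoder-decoder, under stated assumptions: a symmetric design, two deconvolution (transposed convolution) layers standing in for the pyramid decoder, a single skip connection as the direct encoder-to-decoder information link, and the attention layer omitted for brevity. Layer sizes are illustrative only:

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        """Toy encoder-decoder: conv+pool down-sampling, deconv up-sampling."""
        def __init__(self, num_classes=21):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                      nn.ReLU(), nn.MaxPool2d(2))   # H/2
            self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                                      nn.ReLU(), nn.MaxPool2d(2))   # H/4
            self.up1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2),
                                     nn.ReLU())                     # back to H/2
            # 32 in-channels = 16 from the decoder + 16 from the skip connection
            self.up2 = nn.ConvTranspose2d(32, num_classes, 2, stride=2)

        def forward(self, x):
            f1 = self.enc1(x)                # encoder features at H/2
            f2 = self.enc2(f1)               # intermediate feature data at H/4
            u1 = self.up1(f2)                # first step of the decoder pyramid
            u1 = torch.cat([u1, f1], dim=1)  # direct encoder-to-decoder link
            return self.up2(u1)              # per-pixel logits at full resolution

The torch.cat line realizes the "direct information connection" mentioned above; a real lightweight model would add more stages and, as described, an attention layer in the decoder.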
The training process of the semantic segmentation model can comprise the following steps: the images used for training are input into the machine learning model, which takes the image characteristic data of the images as input and outputs a classification result indicating the class of each object in the training images; by adjusting the model parameters, the output classification result can be brought closer to the class labels, and when the accuracy of the model reaches a certain standard, the training can be considered finished.
In order to reduce the computational complexity of the semantic segmentation model, the semantic segmentation model can be deployed into different application scenes. When training the semantic segmentation model, one or more of network pruning, network quantization and weight sharing can be performed on the semantic segmentation model. The network pruning refers to compressing the trained neural network so as to achieve the purpose of reducing the complexity of the model. Network quantization is a generic term for a model acceleration method. Weight sharing refers to setting a part of the same weight in a model to achieve the purpose of sharing, and is also a means for simplifying the model.
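For illustration only, two of the three simplification techniques named above map onto built-in PyTorch utilities; the toy scoring head here is hypothetical and exists only so both calls have something to act on:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    toy = nn.Sequential(nn.Flatten(),
                        nn.Linear(8 * 32 * 32, 128), nn.ReLU(),
                        nn.Linear(128, 10))

    # Network pruning: zero out the 30% smallest-magnitude weights of a layer,
    # compressing the trained network to reduce model complexity.
    prune.l1_unstructured(toy[1], name="weight", amount=0.3)

    # Network quantization: run the linear layers in int8 to accelerate inference.
    toy_q = torch.quantization.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)

Weight sharing, the third technique, ties a portion of the weights to common values; it has no one-line utility and is usually implemented in the model definition itself.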
Semantic segmentation refers to identifying an image at the pixel level, namely marking the object class to which each pixel in the image belongs, and thereby analyzing the semantic information in the image. After the trained semantic segmentation model is obtained, the image to be processed can be input into it for image processing, so that the image to be processed is semantically segmented at the pixel level, the category of each pixel is marked, and the semantic segmentation image corresponding to the image to be processed is obtained. Fig. 3 shows a schematic diagram of obtaining the score information; referring to fig. 3, image 301 is the image to be processed, and image 303 is its semantic segmentation image. By acquiring the semantic segmentation image, the semantics contained in the image can be understood more accurately, and prediction on the image can in turn be performed more accurately.
With continuing reference to fig. 2, in step S220, the to-be-processed image and the semantically segmented image are subjected to prediction processing, and score information of the to-be-processed image is determined.
In the embodiment of the disclosure, after the semantic segmentation image used for representing the semantic information of the image to be processed is obtained, the image to be processed and the semantic segmentation image can be subjected to prediction processing by a trained machine learning model. In particular, the machine learning model may be any suitable model that can be used for classification, for example a decision tree, a convolutional neural network model, a linear regression analysis model, a support vector machine, or a random forest model; the machine learning model is taken as an example for the description herein. The trained machine learning model incorporates both semantic information and pixel information, and is obtained by joint training on sample images and the semantic segmentation images of those sample images.
In order to improve the accuracy of the model, the machine learning model may first be trained to obtain a trained machine learning model, which may then be used to score the image to be processed. The machine learning model can be trained according to the sample images and the scoring information of the sample images. The scoring information of a sample image may be used to characterize the class label of that sample image.
Fig. 4 schematically shows the process of training the model; referring to fig. 4, it mainly includes the following steps:
in step S410, the sample image is acquired.
In the embodiment of the present disclosure, a sample image refers to an image for which scoring information and image quality have already been obtained. The objects included in a sample image may be the same as or different from those in the image to be processed, the scene of a sample image may be the same as or different from that of the image to be processed, and the acquisition conditions of a sample image may be the same as or different from those of the image to be processed; for example, a sample image may belong to a dark scene while the image to be processed belongs to a normal scene, and so on. There can be multiple sample images, so as to improve the accuracy and reliability of training.
In step S420, performing semantic segmentation on the sample image to obtain a sample semantic segmentation image of the sample image.
In the embodiment of the disclosure, after the sample images are acquired, semantic segmentation can be performed on each sample image by the trained semantic segmentation model to obtain a sample semantic segmentation image for each sample image, so that the semantic information of each sample image is represented by its sample semantic segmentation image. The sample semantic segmentation image represents pixel-level recognition: for each pixel of the input sample image, the output carries a corresponding label indicating the object or category to which that pixel most probably belongs.
In step S430, the machine learning model is trained according to the predicted score information of the sample image, the manually labeled score information of the sample image, and the sample semantic segmentation image, so as to obtain the trained machine learning model.
In the embodiment of the present disclosure, the scoring information of the sample image may be used to characterize the class label of the sample image. The class label refers to the scoring information manually labeled for the sample image. The class label may be the real scoring information of the sample image, and the real scoring information may be a real score, specifically any number between 1 and 10, such as 1 or 10, and so on. The real score may be determined according to a plurality of grades and a weight parameter for each grade. The plurality of grades may be, for example, a first grade to a tenth grade, represented by 1 to 10 points. The weight parameter refers to the fraction of the total number of raters who chose each grade when a plurality of users score a given sample image. The real scoring information may also be a score distribution graph, a score type, and the like, where the score distribution graph refers to a histogram over the grades, and the score type indicates, for example, whether the image is beautiful or not.
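Read this way, the real score is a weighted sum of the grades, with each grade weighted by the fraction of raters who chose it. A short sketch (the function name and the example counts are hypothetical):

    def true_score(level_counts):
        """level_counts[i] = number of raters who gave grade i + 1 (grades 1..10)."""
        total = sum(level_counts)
        # Weight of each grade = share of raters choosing it; the real score
        # is the grade values weighted by those shares.
        return sum((i + 1) * c / total for i, c in enumerate(level_counts))

    # 10 raters, most choosing grades 6 and 7:
    print(true_score([0, 0, 0, 1, 1, 3, 3, 1, 1, 0]))  # 6.5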
Further, the model training can be performed by combining the sample image, the scoring information of the sample image, and the sample semantic segmentation image. Fig. 5 schematically illustrates the flow of this joint training; referring to fig. 5, it may include the following steps:
in step S510, the sample image and the sample semantic segmentation image are fused to obtain a fused sample image.
In the embodiment of the disclosure, the sample image and the sample semantic segmentation image corresponding to it can be fused to obtain a fused sample image. It should be noted that the fusion here may be stitching or another way of combining two images; stitching is taken as the example in the embodiment of the present disclosure. Specifically, the sample image and the sample semantic segmentation image may each include a plurality of color channels, for example four color channels: an R channel, a G channel, a B channel, and an A (alpha) channel. Based on this, the sample image and the sample semantic segmentation image can be fused channel by channel, that is, the data of the same color channel in the two images can be fused, so that the sample image and the sample semantic segmentation image are combined into one integral fused sample image across all color channels. The fused sample image contains the classification at the pixel level, so that model training can be carried out comprehensively.
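One plausible reading of this channel-wise fusion (an assumption; the disclosure only says the two images are fused, e.g. stitched, per color channel) is concatenating the two four-channel arrays along the channel axis:

    import numpy as np

    def fuse(image_rgba, seg_rgba):
        """Both inputs are H x W x 4 RGBA arrays; output is H x W x 8."""
        assert image_rgba.shape == seg_rgba.shape
        # Stack raw pixels and pixel-level class colors into one input tensor.
        return np.concatenate([image_rgba, seg_rgba], axis=-1)

The fused array then feeds the machine learning model as a single input whose channels carry both the raw pixel data and the pixel-level semantic classes.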
In step S520, feature extraction is performed on the fused sample image through the machine learning model to obtain image features.
In the embodiment of the present disclosure, after the fused sample image is obtained, it may be input to the machine learning model, so that the image features of the fused sample image are extracted through the convolution layers, pooling layers, and fully connected layers of the model; specifically, the image features are represented as two-dimensional or higher-dimensional feature data.
In step S530, the image features are input to the machine learning model to determine the prediction score information of the sample image, and parameters of the machine learning model are adjusted by using the manually labeled score information of the sample image as a training target, so as to obtain the trained machine learning model.
In the embodiment of the present disclosure, after the image features of the fused sample image are extracted, they may be used as the input of the model, and an output result may be determined for them by the classifier of the machine learning model. This output result is the prediction scoring information, that is, the scoring information automatically generated by the machine learning model. The prediction scoring information may be the same as or different from the real scoring information, depending on the model. On this basis, the real scoring information (i.e., the manually labeled scoring information) of each fused sample image is used as the training target, and the parameters of the machine learning model are adjusted until the parameters converge and the obtained prediction scoring information matches the real scoring information, yielding the trained machine learning model.
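A minimal training-loop sketch under stated assumptions: scorer is any CNN that accepts the 8-channel fused input and emits one logit per grade, loader yields fused sample images with their manually labeled grade indices, and cross-entropy is chosen here as one reasonable stand-in for driving the predictions toward the manually labeled scoring information:

    import torch
    import torch.nn as nn

    # `scorer` and `loader` are assumed to exist; see the lead-in above.
    optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):
        for fused, label in loader:        # label: manually annotated grade, 0..9
            optimizer.zero_grad()
            logits = scorer(fused)         # prediction scoring information
            loss = loss_fn(logits, label)  # gap to the manually labeled target
            loss.backward()
            optimizer.step()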
In the embodiment of the disclosure, the model is trained on both the original sample image and its sample semantic segmentation image, so the sample image can be classified at both the image level and the pixel level. Training the machine learning model according to the prediction scoring information and the real scoring information of the sample image therefore enables the trained model to accurately understand the image categories and the pixel categories of the sample image at the same time, and thus its semantic information. This avoids the problem in the related art that a model based entirely on image pixel distribution is easily misled, enables the model to understand abstract semantic content, improves the accuracy and comprehensiveness of the trained machine learning model, realizes prediction processing that considers both pixels and semantics, and increases the application range.
After obtaining the trained machine learning model, the trained machine learning model may be distributed to a terminal or a device including a processor, such as a server, so as to perform prediction processing on the image to be processed according to the model. Fig. 6 schematically shows a schematic diagram of the prediction process, and referring to fig. 6, the prediction process mainly includes the following steps:
in step S610, the image to be processed and the semantic segmentation image are fused to obtain a fused image.
In the embodiment of the present disclosure, the image to be processed and the semantic segmentation image may likewise include a plurality of color channels, which may again be an R channel, a G channel, a B channel, and an A (alpha) channel. Based on this, the image to be processed and the semantic segmentation image can be fused according to the same color channels, where the fusion may be a stitching operation. Specifically, the image to be processed and the semantic segmentation image may be fused along the R, G, B, and A channels respectively, so that the two are combined into one integral fused image across all color channels. The fused image contains the classification at the pixel level, so it can comprehensively represent the image to be processed, improving accuracy and increasing comprehensiveness.
In step S620, feature extraction is performed on the fused image to obtain feature data.
After the fused image is obtained in the embodiment of the disclosure, it may be input to the trained machine learning model, so as to extract the feature data of the fused image through the convolution layers, pooling layers, and fully connected layers of the model. It should be noted that, since the trained machine learning model is obtained through both pixel-level and image-level recognition training and incorporates the image information and semantic information of the image to be processed, the abstract semantic information in the image to be processed can be accurately understood through the trained model; the limitation of recognizing pixels alone, and the misrecognition it causes, can thus be avoided, and the feature data of the image to be processed can be accurately extracted.
In step S630, a prediction process is performed on the feature data to determine the score information.
In the embodiment of the present disclosure, after the feature data is obtained, it may be subjected to prediction processing by the trained machine learning model to obtain an output result representing the scoring information. The category of the output result may be a prediction label or a prediction probability, depending on the kind of trained machine learning model. Based on this, the recognition result of the image to be processed can be determined from the category of the output result, and the aesthetic quality evaluation result of the image to be processed can be determined in turn.
The presentation form of the scoring information may be any one of the following: a score distribution graph, a score type, or a score. The specific form can be determined according to actual requirements or reference information, and each type of reference information can correspond to one or more presentation forms. The reference information may be the type of application scenario and is not particularly limited here. For example, when the reference information is A1, the scoring information is a score distribution graph; when the reference information is A2, the scoring information is a score type; and when the reference information is A3, the scoring information is a score. In particular, the score may be any number between 1 and 10, determined as a weighted sum over a plurality of grades with a weight parameter for each grade. The plurality of grades may be, for example, a first grade to a tenth grade, represented by 1 to 10 points; here the weight parameter is the predicted probability that the image to be processed belongs to each grade. The score is an aesthetic quality score given to the image, appearing as a continuous number. The score distribution graph refers to a distribution such as a histogram over the grades, for example a histogram of the aesthetic quality scores of the image over the first to tenth grades (1 to 10 points). The score type refers to outputting, for a given image, "good" or "bad", aesthetic quality "high" or "low", or the beautiful and not-beautiful categories, and so on. Different kinds of scoring information thus represent the evaluation of the image with different labels.
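If the model outputs a probability per grade, both the score and the score type fall out directly; the 5.0 threshold below is an arbitrary assumption for illustration:

    def summarize(probs, threshold=5.0):
        """probs[i] = predicted probability of grade i + 1; probs is the histogram."""
        score = sum((i + 1) * p for i, p in enumerate(probs))  # weighted mean grade
        score_type = "beautiful" if score > threshold else "not beautiful"
        return score, score_type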
The aesthetic quality evaluation is used for determining the degree of beauty or ugliness of the image according to the scoring information, so as to evaluate the quality of the image and further improve it. When the scoring information is a score, images above a certain threshold may be considered beautiful and images below it not beautiful, and the larger the score, the higher the quality or the more beautiful the image. In addition, the image quality can be described through the aesthetic distribution, avoiding the influence of subjective factors and improving accuracy: taking the score distribution over the image as the evaluation standard allows an all-around, accurate evaluation of aesthetic quality. Given an image, the system can produce an aesthetic evaluation of reference significance according to the trained model, and the evaluation result includes not only a specific score but also the distribution of scores, which has stronger guiding significance in real scenes. The influence of image semantic information on image quality classification can be eliminated, and comprehensive, reasonable, and fine image quality classification realized.
Referring again to the schematic diagram of obtaining the scoring information shown in fig. 3, the main process of performing the aesthetic quality evaluation of an image includes: acquiring an image 301, which represents the image to be processed; inputting the image into a semantic segmentation model 302 to obtain a semantic segmentation image 303 of the image 301; and inputting the image 301 and the semantic segmentation image 303 together into a machine learning model 304 for joint recognition, obtaining the scoring information 305 of the image 301.
The technical scheme provided by the embodiment of the disclosure provides an aesthetic quality evaluation system based on image semantic information. The original pixel image and the corresponding foreground semantic segmentation information are input into the model for prediction, and the semantic information of the image is emphasized at the input end, so that the model can both learn the pixel color distribution of the image for scoring and perform aesthetic evaluation of abstract significance by referring to the foreground semantic information, thereby avoiding limitation and improving comprehensiveness and accuracy. At the cost of a small amount of additional computing resources, the machine learning model considers pixel distribution and abstract semantic information at the same time, and evaluates the aesthetic quality of images according to the original images and their semantic segmentation, which reduces resource consumption, improves efficiency, gives more reasonable aesthetic evaluation scores or distributions, and improves the accuracy of the aesthetic quality evaluation of images.
In the present exemplary embodiment, there is also provided an image quality evaluation apparatus, as shown in fig. 7, the apparatus 700 may include: the semantic segmentation module 701 is used for acquiring an image to be processed and acquiring a semantic segmentation image corresponding to the image to be processed; a score determining module 702, configured to perform prediction processing on the image to be processed and the semantic segmentation image, and determine score information of the image to be processed; and the image evaluation module 703 is configured to perform aesthetic quality evaluation on the image to be processed according to the scoring information.
In an exemplary embodiment of the present disclosure, the semantic segmentation module includes: and the segmentation control module is used for inputting the image to be processed into the trained semantic segmentation model, and performing semantic segmentation on the image to be processed at a pixel level according to the image characteristic information to obtain the semantic segmentation image.
In an exemplary embodiment of the present disclosure, the score determining module includes: the image fusion module is used for fusing the image to be processed and the semantic segmentation image to obtain a fusion image; the feature extraction module is used for extracting features of the fused image to obtain feature data; and the characteristic processing module is used for carrying out prediction processing on the characteristic data so as to determine the scoring information.
In an exemplary embodiment of the present disclosure, the image fusion module includes: and the fusion control module is used for fusing the image to be processed and the semantic segmentation image according to the same color channel to obtain the fused image.
In an exemplary embodiment of the present disclosure, the apparatus further includes: the sample prediction module is used for predicting the sample image according to the machine learning model to obtain prediction scoring information; and the model training module is used for training the machine learning model based on the predicted grading information and the manually marked grading information of the sample image to obtain a trained machine learning model for performing prediction processing on the image to be processed and the semantic segmentation image.
In an exemplary embodiment of the present disclosure, the model training module includes: the image segmentation module is used for performing semantic segmentation on the sample image to obtain a sample semantic segmentation image of the sample image; and the training control module is used for training the machine learning model according to the prediction scoring information of the sample image, the manually marked scoring information of the sample image and the sample semantic segmentation image so as to obtain the trained machine learning model.
In an exemplary embodiment of the disclosure, the training control module is configured to: fusing the sample image and the sample semantic segmentation image to obtain a fused sample image; performing feature extraction on the fused sample image through the machine learning model to obtain image features; inputting the image characteristics into the machine learning model to determine the prediction scoring information of the sample image, and adjusting parameters of the machine learning model by taking the manually marked scoring information of the sample image as a training target to obtain the trained machine learning model.
It should be noted that, the specific details of each module in the image quality evaluation apparatus have been elaborated in the corresponding method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 800 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, a bus 830 connecting various system components (including the memory unit 820 and the processing unit 810), and a display unit 840.
Wherein the storage unit stores program code that is executable by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 810 may perform the steps as shown in fig. 2: in step S210, acquiring an image to be processed, and acquiring a semantic segmentation image corresponding to the image to be processed; in step S220, performing prediction processing on the image to be processed and the semantic segmentation image, and determining score information of the image to be processed; in step S230, performing aesthetic quality evaluation on the image to be processed according to the scoring information.
The storage unit 820 may include readable media in the form of volatile memory units such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The display unit 840 may be a display having a display function, used to show, through the display, the processing result obtained by the processing unit 810 performing the method in the present exemplary embodiment. The display includes, but is not limited to, a liquid crystal display or other displays.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
According to the program product for realizing the method, the portable compact disc read only memory (CD-ROM) can be adopted, the program code is included, and the program product can be operated on terminal equipment, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. An image quality evaluation method is characterized by comprising:
acquiring an image to be processed, and acquiring a semantic segmentation image corresponding to the image to be processed;
performing prediction processing on the image to be processed and the semantic segmentation image, and determining scoring information of the image to be processed;
and performing aesthetic quality evaluation on the image to be processed according to the scoring information.
2. The image quality evaluation method according to claim 1, wherein the obtaining of the semantic segmentation image corresponding to the image to be processed comprises:
and inputting the image to be processed into a trained semantic segmentation model, and performing semantic segmentation on the image to be processed at a pixel level according to image characteristic information to obtain the semantic segmentation image.
3. The image quality evaluation method according to claim 1, wherein the performing prediction processing on the image to be processed and the semantic segmentation image to determine the score information of the image to be processed comprises:
fusing the image to be processed and the semantic segmentation image to obtain a fused image;
performing feature extraction on the fused image to obtain feature data;
and performing prediction processing on the characteristic data to determine the scoring information.
4. The image quality evaluation method according to claim 3, wherein the fusing the to-be-processed image and the semantic segmentation image to obtain a fused image comprises:
and fusing the to-be-processed image comprising a plurality of color channels and the semantic segmentation image according to the same color channel to obtain the fused image.
5. The image quality evaluation method according to claim 1, characterized in that the method further comprises:
performing prediction processing on a sample image through a machine learning model to obtain prediction scoring information;
and training the machine learning model based on the prediction scoring information and the manually marked scoring information of the sample image to obtain a trained machine learning model for performing prediction processing on the image to be processed and the semantic segmentation image.
6. The image quality evaluation method according to claim 5, wherein the training of the machine learning model based on the prediction score information and the manually labeled score information of the sample image to obtain a trained machine learning model for performing prediction processing on the image to be processed and the semantic segmentation image comprises:
performing semantic segmentation on the sample image to obtain a sample semantic segmentation image of the sample image;
and training the machine learning model according to the prediction scoring information of the sample image, the manually marked scoring information of the sample image and the sample semantic segmentation image so as to obtain the trained machine learning model.
7. The image quality evaluation method according to claim 6, wherein the training of the machine learning model according to the prediction score information of the sample image, the manually-labeled score information of the sample image, and the sample semantic segmentation image to obtain the trained machine learning model comprises:
fusing the sample image and the sample semantic segmentation image to obtain a fused sample image;
performing feature extraction on the fused sample image through the machine learning model to obtain image features;
and inputting the image features into the machine learning model to determine the prediction scoring information of the sample image, and adjusting parameters of the machine learning model by taking the manually marked scoring information of the sample image as a training target, to obtain the trained machine learning model.
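Claims 5 to 7 together describe a supervised regression loop. A compact sketch under the same assumptions as above (the MSE loss and all names are choices made here for illustration; the claims fix neither):

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, sample_img, sample_seg, human_score):
        # sample_img: [N, 3, H, W]; sample_seg: [N, 1, H, W];
        # human_score: [N, 1] manually labelled scores.
        fused = torch.cat([sample_img, sample_seg], dim=1)  # fused sample image
        pred = model(fused)                   # prediction scoring information
        loss = F.mse_loss(pred, human_score)  # fit to the manual labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                      # adjust model parameters
        return loss.item()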
8. An image quality evaluation apparatus, comprising:
the semantic segmentation module is used for acquiring an image to be processed and acquiring a semantic segmentation image corresponding to the image to be processed;
the score determining module is used for performing prediction processing on the image to be processed and the semantic segmentation image and determining score information of the image to be processed;
and the image evaluation module is used for performing aesthetic quality evaluation on the image to be processed according to the scoring information.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the image quality evaluation method of any one of claims 1 to 7 via execution of the executable instructions.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the image quality evaluation method according to any one of claims 1 to 7.
CN201911379479.3A 2019-12-27 2019-12-27 Image quality evaluation method, image quality evaluation device, electronic device, and storage medium Pending CN111199541A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379479.3A CN111199541A (en) 2019-12-27 2019-12-27 Image quality evaluation method, image quality evaluation device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN111199541A true CN111199541A (en) 2020-05-26

Family

ID=70746210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379479.3A Pending CN111199541A (en) 2019-12-27 2019-12-27 Image quality evaluation method, image quality evaluation device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN111199541A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427951A (en) * 2018-02-08 2018-08-21 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 A kind of image cropping method and device based on aesthetics
CN109801256A (en) * 2018-12-15 2019-05-24 华南理工大学 A kind of image aesthetic quality appraisal procedure based on area-of-interest and global characteristics

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614110A (en) * 2020-12-24 2021-04-06 Oppo(重庆)智能科技有限公司 Method and device for evaluating image quality and terminal equipment
CN112614110B (en) * 2020-12-24 2022-11-04 Oppo(重庆)智能科技有限公司 Method and device for evaluating image quality and terminal equipment
CN112839167A (en) * 2020-12-30 2021-05-25 Oppo(重庆)智能科技有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN113158777A (en) * 2021-03-08 2021-07-23 佳都新太科技股份有限公司 Quality scoring method, quality scoring model training method and related device
CN113378885A (en) * 2021-05-13 2021-09-10 武汉科技大学 New user aesthetic preference calibration and classification method based on preferred image pair
CN113378885B (en) * 2021-05-13 2022-10-14 武汉科技大学 New user aesthetic preference calibration and classification method based on preferred image pair
CN115457614A (en) * 2021-05-20 2022-12-09 马上消费金融股份有限公司 Image quality evaluation method, model training method and device
CN116152233A (en) * 2023-04-17 2023-05-23 北京邮电大学 Image processing method, intelligent terminal and storage medium
CN116152233B (en) * 2023-04-17 2023-09-26 北京邮电大学 Image processing method, intelligent terminal and storage medium
CN116843643A (en) * 2023-07-03 2023-10-03 北京语言大学 Video aesthetic quality evaluation data set construction method
CN116843643B (en) * 2023-07-03 2024-01-16 北京语言大学 Video aesthetic quality evaluation data set construction method

Similar Documents

Publication Publication Date Title
CN111199541A (en) Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
US11436739B2 (en) Method, apparatus, and storage medium for processing video image
CN110084172B (en) Character recognition method and device and electronic equipment
CN110827236B (en) Brain tissue layering method, device and computer equipment based on neural network
CN107273895B (en) Method for recognizing and translating real-time text of video stream of head-mounted intelligent device
CN110334753B (en) Video classification method and device, electronic equipment and storage medium
CN110717470A (en) Scene recognition method and device, computer equipment and storage medium
WO2024051609A1 (en) Advertisement creative data selection method and apparatus, model training method and apparatus, and device and storage medium
CN112749695A (en) Text recognition method and device
CN115393606A (en) Method and system for image recognition
CN117372570A (en) Advertisement image generation method and device
CN108665769B (en) Network teaching method and device based on convolutional neural network
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN113223011A (en) Small sample image segmentation method based on guide network and full-connection conditional random field
CN112364933A (en) Image classification method and device, electronic equipment and storage medium
CN116958512A (en) Target detection method, target detection device, computer readable medium and electronic equipment
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
CN115828889A (en) Text analysis method, emotion classification model, device, medium, terminal and product
CN115909357A (en) Target identification method based on artificial intelligence, model training method and device
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN111914850B (en) Picture feature extraction method, device, server and medium
CN115376137A (en) Optical character recognition processing and text recognition model training method and device
CN111291758B (en) Method and device for recognizing seal characters
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination