CN114402356A - Network model training method, image processing method and device and electronic equipment - Google Patents

Network model training method, image processing method and device and electronic equipment

Info

Publication number
CN114402356A
CN114402356A (application CN201980100428.4A)
Authority
CN
China
Prior art keywords
scoring
distribution data
image
network model
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980100428.4A
Other languages
Chinese (zh)
Inventor
郭子亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN114402356A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/24 — Aligning, centring, orientation detection or correction of the image

Abstract

The embodiments of the present application disclose a network model training method, an image processing method and apparatus, and an electronic device. The network model training method includes: acquiring an image sample set; constructing a basic network model and a plurality of loss functions corresponding to the basic network model; training the basic network model according to the image sample set and the plurality of loss functions until the basic network model converges; and using the converged basic network model as a scoring model for image aesthetic scoring.

Description

Network model training method, image processing method and device and electronic equipment Technical Field
The present application relates to the field of machine learning, and in particular, to a network model training method, an image processing method and apparatus, and an electronic device.
Background
With the rapid development of the mobile internet and the rapid popularization of smartphones, visual content data such as images and videos are increasing day by day, and the perception and understanding of visual content has become an interdisciplinary research direction spanning computer vision, computational photography, human psychology, and other fields. In particular, image aesthetic assessment is a recent research hotspot in visual perception and understanding. Image aesthetics reflect the human visual pursuit of "beautiful" things, so visual aesthetic evaluation is of great significance in fields such as photography, advertisement design, and the production of artistic works.
The rapid development of machine learning in recent years has greatly promoted the development of objective, reproducible image aesthetic evaluation methods. Machine learning systems, and deep learning systems in particular, can efficiently and accurately approximate human judgment, so aesthetic evaluation of images using machine learning or deep learning methods is an important research topic.
Disclosure of Invention
The embodiment of the application provides a network model training method, an image processing device and electronic equipment, and can improve the accuracy of network model training.
In a first aspect, an embodiment of the present application provides a method for training a network model, including:
acquiring an image sample set, wherein the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
training the basic network model according to the grading distribution data, the initial grading distribution data and the loss functions until the basic network model converges;
the converged underlying network model is used as a scoring model for aesthetically scoring the images.
In a second aspect, an embodiment of the present application provides an image processing method, including:
receiving an aesthetic scoring request;
acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
calling a pre-trained scoring model;
performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
the scoring model is obtained by training with the network model training method provided in the embodiments of the present application.
In a third aspect, an embodiment of the present application provides a training apparatus for a network model, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an image sample set, and the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
the system comprises a construction module, a data processing module and a data processing module, wherein the construction module is used for constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
the first scoring module is used for inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
the training module is used for training the basic network model according to the score distribution data, the initial score distribution data and the loss functions until the basic network model converges;
and the determining module is used for taking the converged basic network model as a grading model for performing aesthetic grading on the image.
In a fourth aspect, an embodiment of the present application provides an apparatus for processing an image, including:
a receiving module for receiving an aesthetic scoring request;
the second acquisition module is used for acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
the calling module is used for calling a pre-trained scoring model;
the second scoring module is used for performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
the scoring model is obtained by training with the network model training method provided in the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, wherein when the computer program is executed on a computer, the computer is caused to execute the training method of the network model or the processing method of the image provided by the embodiment.
In a sixth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to, by calling the computer program stored in the memory, execute:
acquiring an image sample set, wherein the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
training the basic network model according to the grading distribution data, the initial grading distribution data and the loss functions until the basic network model converges;
the converged underlying network model is used as a scoring model for aesthetically scoring the images.
In a seventh aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor, by calling the computer program stored in the memory, is configured to perform:
receiving an aesthetic scoring request;
acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
calling a pre-trained scoring model;
performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
the scoring model is obtained by training through the network model training method provided by the embodiment of the application.
Drawings
The technical solutions and advantages of the present application will become apparent from the following detailed description of specific embodiments of the present application when taken in conjunction with the accompanying drawings.
Fig. 1 is a first flowchart of a method for training a network model according to an embodiment of the present disclosure.
Fig. 2 is a second flowchart of a method for training a network model according to an embodiment of the present disclosure.
Fig. 3 is a distribution diagram of initial score distribution data and expected score distribution data provided by an embodiment of the present application.
Fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
The embodiment of the application provides a network model training method. The execution subject of the network model training method may be the network model training device provided in the embodiments of the present application, or an electronic device integrated with the network model training device. The training device of the network model can be realized in a hardware or software mode, and the electronic equipment can be equipment with processing capability, such as a smart phone, a tablet computer, a palm computer, a notebook computer, a desktop computer and the like, which are provided with a processor. For convenience of description, the execution subject of the training method of the network model will be exemplified as the electronic device in the following.
Referring to fig. 1, fig. 1 is a first flowchart illustrating a network model training method according to an embodiment of the present disclosure. The process of the network model training method can comprise the following steps:
101. a sample set of images is acquired.
The electronic device can acquire the image sample set through a wired or wireless connection. The image sample set can adopt the existing large-scale aesthetic quality assessment database AVA (A Large-Scale Database for Aesthetic Visual Analysis). For ease of description, this database will be referred to hereinafter as the AVA public data set, and all sample images in the AVA public data set are taken as the images to be scored in the present application. It should be noted that the AVA public data set is a database for aesthetic quality assessment containing about 256,000 sample images, each of which has been aesthetically scored by a plurality of different users, where the score of the aesthetic scoring is a natural number in [1,10]; for example, the score may be 1, 2 or 10. Further, initial score distribution data corresponding to each sample image can be generated from the score data of the plurality of different users, the initial score distribution data including the number of scorers corresponding to each score. It will be appreciated that a higher aesthetic score indicates a higher aesthetic quality of the sample image.
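As a sketch of how the per-score vote counts described above can be turned into initial score distribution data, consider the following; the vote counts are hypothetical, while a real AVA entry lists the number of voters for each score from 1 to 10:

```python
# Sketch: turning hypothetical per-score vote counts into initial score
# distribution data (one probability per score 1..10). The counts below
# are made up for illustration; real AVA entries list actual voter counts.
def to_distribution(vote_counts):
    """Normalize per-score vote counts into probabilities summing to 1."""
    total = sum(vote_counts)
    return [count / total for count in vote_counts]

# Hypothetical counts for scores 1..10 from 210 voters (the AVA average).
votes = [2, 5, 18, 40, 60, 45, 25, 10, 4, 1]
initial_distribution = to_distribution(votes)
```

The resulting vector sums to 1 and serves as the training target for the basic network model.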
It should be noted that the number of users who scored each sample image in the AVA public data set is between 78 and 539, with an average of about 210 users participating in the aesthetic scoring of each sample image. This large amount of aesthetic scoring data per sample image better reflects the public's evaluation and perception of an image, which is why the data set is a widely accepted standard test set in the field of image aesthetic evaluation. Training an original network model on the AVA public data set can therefore yield a more accurate scoring model for image aesthetic scoring.
102. And constructing a basic network model and a plurality of loss functions corresponding to the basic network model.
A lightweight deep neural network model is smaller and faster than a full-sized model, and is therefore widely applied in embedded electronic devices such as smartphones. A lightweight deep neural network model can thus be constructed and used as the basic network model, so that the trained model can be deployed on electronic devices such as smartphones, enabling a smartphone to aesthetically score images and further improving its intelligence. One such family of lightweight models is MobileNets, whose core is the depthwise separable convolution, consisting of a depthwise convolution (depthwise conv) and a pointwise convolution (pointwise conv); this structure allows MobileNets models to reduce network parameters and computation without reducing network performance.
In some embodiments, a MobileNetV2 network model may be built and used as the base network model. Further, in order to keep the score distribution data output by the base network model within [0,1], a softmax function may be used as the output function of the output layer of the base network model, so that the score distribution data output by the base network model includes a probability value corresponding to each score.
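The softmax output function described above can be sketched as follows; the logits are hypothetical raw outputs of the network's final layer, with N = 10 score bins:

```python
import numpy as np

# Sketch: the softmax output function applied to hypothetical raw outputs
# (logits) of the final layer. Softmax squashes the logits into
# probabilities in [0, 1] that sum to 1, one per score bin.
def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [0.3, 0.1, 1.2, 2.0, 2.5, 2.1, 1.0, 0.4, 0.2, 0.1]  # hypothetical
dist = softmax(logits)
```

Each entry of `dist` is the predicted probability of the corresponding score, exactly the form the score distribution data takes in this application.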
In some embodiments, after the MobileNetV2 network model is constructed, it is pre-trained on the ImageNet database, and the pre-trained MobileNetV2 network model is used as the base network model. It should be noted that the ImageNet database is a large visual database for visual object recognition research. It contains over 15 million images, of which 1.2 million images are divided into 1,000 categories (approximately 1 million images contain bounding boxes and annotations). Pre-training the MobileNetV2 network model on the ImageNet database yields a model with good initial network parameters, and using this pre-trained model as the base network model can greatly shorten its training time.
Since the score distribution data output by the base network model is a probability distribution over the scores, loss functions that operate on probability distributions need to be chosen as the plurality of loss functions corresponding to the base network model. Through the combined action of the plurality of loss functions, the difference between the score distribution data of each image to be scored and its initial score distribution data can be better captured, and the score distribution data can be better fitted. The base network model can therefore be trained more effectively through the plurality of loss functions to obtain a more accurate scoring model.
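The patent does not name the specific distribution losses at this point. As an illustration only, two common choices for matching score distributions are KL divergence and a squared earth-mover's-distance term over the cumulative distributions; their weighted sum is a hypothetical combined loss:

```python
import numpy as np

# Illustrative sketch (the patent does not name its losses here): two
# common distribution-matching losses and a hypothetical weighted sum.
def kl_divergence(p, q, eps=1e-8):
    """KL divergence between distributions p (target) and q (prediction)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def emd_loss(p, q):
    """Squared earth mover's distance over ordered scores, via CDFs."""
    cdf_p, cdf_q = np.cumsum(p), np.cumsum(q)
    return float(np.mean((cdf_p - cdf_q) ** 2))

def total_loss(p, q, w_kl=1.0, w_emd=1.0):
    """Hypothetical combined target loss from several distribution losses."""
    return w_kl * kl_divergence(p, q) + w_emd * emd_loss(p, q)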
103. And inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored.
The images to be scored in the image sample set are input into a basic network model, such as a MobileNetV2 network model, for aesthetic scoring, and the basic network model outputs score distribution data corresponding to each image to be scored. The score distribution data includes a probability value corresponding to each score. For example, the output score distribution data is (p_s1, p_s2, …, p_s(N-1), p_sN), where p_s1 is the probability value when the score is 1, p_s2 is the probability value when the score is 2, p_s(N-1) is the probability value when the score is N-1, and p_sN is the probability value when the score is N.
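One common way (assumed here for illustration; the patent itself only requires the distribution) to collapse the output distribution (p_s1 … p_sN) into a single aesthetic score is the probability-weighted mean of the scores:

```python
import numpy as np

# Sketch: collapsing a score distribution (p_s1 .. p_sN) into a single
# aesthetic score via the probability-weighted mean. The distribution
# below is a hypothetical softmax output with N = 10.
def mean_score(dist):
    scores = np.arange(1, len(dist) + 1)
    return float((scores * np.asarray(dist, dtype=float)).sum())

dist = [0.01, 0.02, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.015, 0.005]
score = mean_score(dist)
```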
104. And training the basic network model according to the grading distribution data, the initial grading distribution data and the plurality of loss functions until the basic network model converges.
In order to make the score distribution data output by the basic network model consistent with the initial score distribution data derived from real user scores, the basic network model can be trained through the plurality of loss functions. The specific training process may be as follows. After the aesthetic scoring of a batch of images to be scored is completed, the score distribution data and the initial score distribution data of the batch are input into the plurality of loss functions for calculation to obtain a corresponding target loss value. Whether the basic network model meets the convergence condition is then determined from the target loss value: the model is considered converged when the target loss value gradually approaches, or fluctuates around, a certain value and the change in loss is smaller than a small positive number. If the target loss value does not meet this condition, the basic network model has not converged; the target loss value is then propagated back through the basic network model, and the network parameters, namely the weights and bias values, are adjusted according to the back-propagation algorithm. Images to be scored that have not yet been aesthetically scored by the basic network model are then obtained, and the adjusted basic network model continues to be trained on them with the plurality of loss functions until the basic network model converges.
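The convergence test described above can be sketched as a check that the last few target loss values all differ by less than a small positive number; the loss values, window size, and epsilon below are hypothetical:

```python
# Sketch of the convergence test described above: training stops when the
# target loss value settles around a fixed level, i.e. the change in loss
# between consecutive checks stays below a small positive number (epsilon).
# Loss values, window size, and epsilon are hypothetical.
def has_converged(loss_history, epsilon=1e-3, window=3):
    """True when the last `window` loss changes are all smaller than epsilon."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < epsilon
               for i in range(window))

losses = [0.90, 0.55, 0.31, 0.2005, 0.2001, 0.2002, 0.2000]
```

In a real training loop this check would run after each batch or epoch, and training would continue with parameter updates until it returns true.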
105. The converged underlying network model is used as a scoring model for aesthetically scoring the images.
Wherein the converged base network model is used as a scoring model for aesthetic scoring of the image. The scoring model may be applied to the electronic device to aesthetically score a plurality of images stored in the electronic device by the user according to the scoring model, sort the plurality of images according to the scoring scores, and display the images according to the result of the sorting process, so that the user may preferentially browse images with high aesthetic scoring scores, i.e., images with higher aesthetic quality.
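The display step above can be sketched as follows; the image names and scores are hypothetical outputs of the scoring model:

```python
# Sketch of the display step: score each stored image with the scoring
# model and show the highest-rated images first. The file names and the
# per-image scores below are hypothetical model outputs.
images = ["beach.jpg", "cat.jpg", "blurry.jpg", "sunset.jpg"]
scores = {"beach.jpg": 7.8, "cat.jpg": 6.2, "blurry.jpg": 3.1, "sunset.jpg": 8.4}

# Sort images by predicted aesthetic score, best first.
ranked = sorted(images, key=lambda name: scores[name], reverse=True)
```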
As can be seen from the above, in the training method of the network model provided in the embodiment of the present application, an image sample set is obtained, where the image sample set includes a plurality of images to be scored, which have initial scoring distribution data; constructing a basic network model and a plurality of loss functions corresponding to the basic network model; inputting the image sample set into a basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored; training the basic network model according to the grading distribution data, the initial grading distribution data and the plurality of loss functions until the basic network model converges; the converged underlying network model is used as a scoring model for aesthetically scoring the images. Therefore, the basic network model can be trained through a plurality of loss functions to obtain a scoring model with higher accuracy, and the accuracy of the scoring model is improved.
Referring to fig. 2, fig. 2 is a second flowchart illustrating a network model training method according to an embodiment of the present disclosure. The process of the network model training method can comprise the following steps:
201. an initial image sample set is acquired.
The initial image sample set includes a plurality of sample images with initial score distribution data and may adopt the existing AVA public data set. The AVA public data set is a database for aesthetic quality assessment containing about 256,000 sample images, each of which has been aesthetically scored by a plurality of different users, where the score of the aesthetic scoring is a natural number in [1,10]; for example, the score may be 1, 2 or 10. Further, initial score distribution data corresponding to each sample image can be generated from the score data of the plurality of different users, the initial score distribution data including the number of scorers corresponding to each score. It will be appreciated that a higher aesthetic score indicates a higher aesthetic quality of the sample image.
It should be noted that the number of users who scored each sample image in the AVA public data set is between 78 and 539, with an average of about 210 users participating in the aesthetic scoring of each sample image. This large amount of aesthetic scoring data per sample image better reflects the public's evaluation and perception of an image, which is why the data set is a widely accepted standard test set in the field of image aesthetic evaluation. Training an original network model on the AVA public data set can therefore yield a more accurate scoring model for image aesthetic scoring.
202. And carrying out image preprocessing on each sample image to obtain a plurality of first sample images with initial grading distribution data.
The image preprocessing includes conventional multi-scale scaling, random flipping, random translation, random cropping and the like, so that a plurality of first sample images are obtained by preprocessing the sample images; the first sample images are then input into the basic network model as images to be scored for aesthetic scoring, which enlarges the amount of input image data. However, considering that information such as the composition and color of an image serves as the basis for aesthetic scoring, the present application does not apply large-scale random cropping or random color changes, so as to ensure that each first sample image remains similar to its corresponding sample image in composition, color and similar information. In addition, it can be understood that multi-scale scaling, random flipping and random translation do not change the composition, color and other information of a sample image too much. That is, each first sample image is similar to its corresponding sample image in such information, and therefore the initial score distribution data of the corresponding sample image may be adopted as the initial score distribution data of the first sample image.
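The light augmentations above (flipping, translation) can be sketched on a toy image array as follows; the array, shift range, and flip probability are assumed values, and heavier crops and colour changes are deliberately left out, matching the constraint above:

```python
import numpy as np

# Sketch of light augmentations that preserve composition and colour:
# a horizontal flip and a small circular translation. The toy "image",
# shift range, and flip probability are assumed values for illustration.
def random_flip(img, rng):
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_translate(img, rng, max_shift=2):
    dx = int(rng.integers(-max_shift, max_shift + 1))
    dy = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

rng = np.random.default_rng(0)
img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
aug = random_translate(random_flip(img, rng), rng)
```

Both operations rearrange pixels without altering their values, which is why the initial score distribution of the source image can be reused for the augmented one.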
203. And adding random noise into the first sample image to obtain a first target sample image.
Since each first sample image is obtained by randomly altering a sample image, for example by multi-scale scaling, random flipping, random translation or random cropping, a slight difference necessarily exists between the first sample image and the sample image. Although this slight difference produces no visually recognizable change, it directly affects the score distribution data output by the basic network model, so the score distribution data corresponding to the first sample image differs from that of the sample image. Therefore, after image preprocessing is performed on each sample image, the aesthetic score of the obtained first sample image is affected by the slight differences introduced by the preprocessing, which makes the score distribution data predicted by the basic network model unstable.
To address this problem, small random noise, such as Gaussian noise, may be added to each pixel of the first sample image to obtain a first target sample image. It will be appreciated that for a typical image, small variations in pixel values do not produce a recognizable visual difference, i.e. such variations are difficult for the human eye to distinguish. The first target sample image is therefore visually identical to the first sample image, and a user's scores for the two would not differ. Accordingly, in an ideal setting, the score distribution data output by the basic network model for the first sample image and the first target sample image should not differ greatly. Training the basic network model on the first target sample images thus allows it to adapt to small changes in the input sample images, making the model more robust to small pixel-value changes and the predicted score distribution data more stable.
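The noise-injection step can be sketched as follows; the noise standard deviation (2 grey levels out of 255) and the uniform-grey test image are assumed values chosen so the change stays invisible to a viewer:

```python
import numpy as np

# Sketch: adding small Gaussian noise to every pixel, as described above.
# The standard deviation (2 grey levels out of 255) is an assumed value,
# small enough that the perturbation is not visually recognizable.
def add_gaussian_noise(img, sigma=2.0, rng=None):
    rng = rng or np.random.default_rng()
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 255.0)   # keep values in the valid pixel range

rng = np.random.default_rng(42)
img = np.full((4, 4), 128.0)            # uniform grey test "image"
noisy = add_gaussian_noise(img, sigma=2.0, rng=rng)
```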
204. And taking the sample image and the first target sample image as images to be scored to obtain an image sample set.
By taking the sample images and the first target sample images together as images to be scored, an image sample set is obtained whose amount of data is larger than that of the initial image sample set. An image sample set with a larger amount of data is better suited to training the basic network model, so that the prediction results of the trained scoring model are more accurate. The sample images and the first target sample images are therefore used as the images to be scored for training the basic network model.
205. And performing exponential processing on each scoring data in the initial scoring distribution data corresponding to each image to be scored to obtain corresponding expected scoring distribution data.
Although the initial score distribution data of each image to be scored can well reflect the public's evaluation and perception of an image, its standard deviation is relatively large and the distribution is relatively flat, with most scores concentrated in the middle range, namely scores in [3,7]. Therefore, when the basic network model is trained on the initial score distribution data, the score distribution data output by the trained scoring model also has a large standard deviation. As a result, images with different scores appear similar in aesthetic quality, and fine distinctions in aesthetic quality are difficult to make; for example, when the trained scoring model outputs scores between 3 and 7, it is hard to judge from the score alone whether the aesthetic quality of the corresponding image is good or poor.
Therefore, it is necessary to apply exponentiation to each score datum in the initial score distribution data to obtain expected score distribution data with a smaller standard deviation. For example, each score datum in the initial score distribution data may be raised to a power, and the result renormalized, to obtain the expected score distribution data.
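A worked sketch of this exponentiation step, with the exponent fixed at 2 (squaring, as in the example below the figure) and a hypothetical input distribution:

```python
import numpy as np

# Worked sketch of the exponentiation step: each probability in the
# initial score distribution is raised to a power (squared here, i.e.
# exponent 2) and the result renormalized. The input is hypothetical.
def sharpen(dist, exponent=2.0):
    powered = np.asarray(dist, dtype=float) ** exponent
    return powered / powered.sum()

def std_of(dist):
    """Standard deviation of the score under a distribution over 1..N."""
    scores = np.arange(1, len(dist) + 1)
    mean = float((scores * dist).sum())
    return float(np.sqrt((((scores - mean) ** 2) * dist).sum()))

initial = np.array([0.02, 0.05, 0.15, 0.20, 0.25, 0.18,
                    0.10, 0.03, 0.015, 0.005])
expected = sharpen(initial, exponent=2.0)
```

Squaring concentrates probability mass around the mode, so the expected distribution has a smaller standard deviation while keeping the same peak score, which is exactly the effect shown as curve B in fig. 3.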
Referring to fig. 3, fig. 3 is a schematic distribution diagram of initial score distribution data and expected score distribution data according to an embodiment of the present application. As shown in the figure, curve A is the initial score distribution data corresponding to a certain image to be scored; the abscissa in fig. 3 is the score and the ordinate is the probability value corresponding to the score. When the exponent is 2, each score datum in the initial score distribution data is squared to obtain expected score distribution data with a smaller standard deviation, i.e. curve B shown in fig. 3. As can be seen from fig. 3, the probability distribution of the scores in curve B is more concentrated, i.e. the standard deviation of curve B is smaller than that of curve A. Thus, each score datum in the initial score distribution data can be exponentiated to obtain corresponding expected score distribution data with a smaller standard deviation. When the basic network model is then trained on expected score distribution data with a smaller standard deviation, a scoring model whose output has a smaller standard deviation can be obtained.
In some embodiments, continuing to refer to fig. 3, a Gaussian function over the scores may also be obtained and multiplied by the initial score distribution data to obtain the expected score distribution data, i.e. curve C shown in fig. 3. The Gaussian function used has a small standard deviation, i.e. its probability distribution is concentrated, so multiplying the initial score distribution data by such a concentrated Gaussian function reduces the standard deviation and yields the expected score distribution data. However, the probability distribution of a Gaussian function is strictly unimodal, with a single peak near the middle, whereas the probability distribution of image scores may have multiple peaks, i.e. polarized cases where opinions are sharply divided. For example, when the probability value is large both at a score of 3 and at a score of 8, two peaks form at those two scores in the initial score distribution data. In such cases, using a Gaussian function to reduce the standard deviation pulls the high-probability scores toward the middle value, for example toward a score of 5, so the resulting expected score distribution data is inaccurate. Therefore, whether to adjust the initial score distribution data with a Gaussian function should be decided according to the actual shape of the distribution.
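The Gaussian alternative can be sketched as multiplying the initial distribution by a narrow Gaussian centred on its mean score and renormalizing; the input distribution and the Gaussian width (sigma) are assumed values for illustration:

```python
import numpy as np

# Sketch of the Gaussian alternative: multiply the initial distribution
# by a narrow Gaussian centred on its mean score, then renormalize.
# The distribution and the Gaussian width (sigma) are assumed values.
def gaussian_reweight(dist, sigma=1.5):
    scores = np.arange(1, len(dist) + 1)
    mean = float((scores * dist).sum())
    weights = np.exp(-0.5 * ((scores - mean) / sigma) ** 2)
    reweighted = dist * weights
    return reweighted / reweighted.sum()

def std_of(dist):
    """Standard deviation of the score under a distribution over 1..N."""
    scores = np.arange(1, len(dist) + 1)
    mean = float((scores * dist).sum())
    return float(np.sqrt((((scores - mean) ** 2) * dist).sum()))

initial = np.array([0.02, 0.05, 0.15, 0.20, 0.25, 0.18,
                    0.10, 0.03, 0.015, 0.005])
expected = gaussian_reweight(initial)
```

This works well for a unimodal distribution like the one above; as the text notes, for a bimodal (polarized) distribution the reweighting would wrongly pull both peaks toward the middle score.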
206. And constructing a basic network model and a first loss function and a second loss function corresponding to the basic network model.
A lightweight deep neural network model is smaller and faster than a full-sized model, and is therefore widely applied to embedded electronic devices such as smart phones. A lightweight deep neural network model such as MobileNets can thus be constructed and used as the basic network model, so that the trained basic network model can be applied to electronic equipment such as a smart phone, allowing the smart phone to perform aesthetic scoring on images and further improving its intelligence. The core of MobileNets is the depthwise separable convolution, consisting of a depthwise convolution (depthwise conv) and a pointwise convolution (pointwise conv); this structure allows the MobileNets model to reduce network parameters and computation without reducing network performance.
In some embodiments, a MobileNetV2 network model may be built and used as the basic network model. Further, in order to keep the score distribution data output by the basic network model within [0, 1], a softmax function may be used as the output function of the output layer of the basic network model, so that the score distribution data output by the basic network model comprises a probability value corresponding to each score.
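The role of the softmax output layer can be sketched as follows; the logits are hypothetical raw outputs of the network's last layer for a 10-score scale:

```python
import math

def softmax(logits):
    """Map raw network outputs to a probability distribution over scores."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.1, 0.3, 0.8, 1.5, 2.9, 2.4, 1.1, 0.5, 0.2, 0.0]  # hypothetical
dist = softmax(logits)
```

Each entry of `dist` lies in [0, 1] and the entries sum to 1, so the output can be read directly as the probability value of each score.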
In some embodiments, after the MobileNetV2 network model is constructed, it is pre-trained on ImageNet, and the pre-trained MobileNetV2 network model is used as the basic network model. It should be noted that ImageNet is a large visual database for visual object recognition research. The ImageNet database contains over 15 million images, of which 1.2 million images are divided into 1000 categories (approximately 1 million images contain bounding boxes and annotations). Pre-training the MobileNetV2 network model on ImageNet yields a model with good initial parameters after training is finished, and taking this pre-trained MobileNetV2 network model as the basic network model can greatly shorten the training time of the basic network model.
Since the score distribution data output by the basic network model comprises probability distribution data for each score, different functions for comparing probability distributions need to be obtained as the first loss function and the second loss function of the basic network model. Through the combined action of the first loss function and the second loss function, the difference between the score distribution data of each image to be scored and the initial score distribution data can be better captured, and the score distribution data can be better fitted. Training the basic network model with a plurality of loss functions therefore yields a more accurate scoring model.
Specifically, a first loss function may be constructed using a function that measures the distance between two distributions. The first loss function may be an Earth Mover's Distance (EMD) loss function:

EMD(p, p̂) = ( (1/N) · Σ_{k=1}^{N} | CDF_p(k) − CDF_p̂(k) |^l )^{1/l}

wherein p denotes the score distribution data, p̂ denotes the expected score distribution data, N denotes the number of scores (N = 10), k is a score, and l is an exponent value; in some embodiments, l = 2. CDF_p(k) and CDF_p̂(k) are cumulative probability distribution functions: CDF_p(k) represents the cumulative probability value when the score is k in the score distribution data, and CDF_p̂(k) represents the cumulative probability value when the score is k in the expected score distribution data:

CDF_p(k) = Σ_{i=1}^{k} p_{s_i}

wherein p_{s_i} represents the probability value of score i in the score distribution data;

CDF_p̂(k) = Σ_{i=1}^{k} p̂_{s_i}

wherein p̂_{s_i} represents the probability value of score i in the expected score distribution data.
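The EMD loss can be sketched in plain Python as follows; the two distributions are hypothetical, and l = 2 as in the embodiment:

```python
def cdf(dist):
    """Cumulative probability values CDF(1), ..., CDF(N) of a score distribution."""
    out, running = [], 0.0
    for p in dist:
        running += p
        out.append(running)
    return out

def emd_loss(p, p_hat, l=2):
    """Earth Mover's Distance between a predicted and an expected distribution."""
    n = len(p)
    total = sum(abs(a - b) ** l for a, b in zip(cdf(p), cdf(p_hat)))
    return (total / n) ** (1.0 / l)

# Hypothetical predicted and expected score distributions (scores 1..10).
predicted = [0.05, 0.05, 0.10, 0.15, 0.30, 0.15, 0.10, 0.05, 0.03, 0.02]
expected_ = [0.02, 0.03, 0.05, 0.10, 0.40, 0.20, 0.10, 0.05, 0.03, 0.02]
loss = emd_loss(predicted, expected_)
```

The loss is zero exactly when the two cumulative distributions coincide, and grows as probability mass must be "moved" further along the score axis.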
Specifically, the second loss function may be a CJS (cumulative Jensen-Shannon divergence) loss function:

CJS(p, p̂) = Σ_{k=1}^{N} [ CDF_p(k) · log( 2·CDF_p(k) / (CDF_p(k) + CDF_p̂(k)) ) + CDF_p̂(k) · log( 2·CDF_p̂(k) / (CDF_p(k) + CDF_p̂(k)) ) ]

wherein p denotes the score distribution data, p̂ denotes the expected score distribution data, N denotes the number of scores (N = 10), and k is a score. CDF_p(k) and CDF_p̂(k) are cumulative probability distribution functions: CDF_p(k) = Σ_{i=1}^{k} p_{s_i} represents the cumulative probability value when the score is k in the score distribution data, wherein p_{s_i} represents the probability value of score i in the score distribution data; CDF_p̂(k) = Σ_{i=1}^{k} p̂_{s_i} represents the cumulative probability value when the score is k in the expected score distribution data, wherein p̂_{s_i} represents the probability value of score i in the expected score distribution data.
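A sketch of the CJS loss implemented as the cumulative Jensen-Shannon divergence follows; the exact form used in a given implementation may differ, and the distributions are hypothetical:

```python
import math

def cdf(dist):
    """Cumulative probability values CDF(1), ..., CDF(N) of a score distribution."""
    out, running = [], 0.0
    for p in dist:
        running += p
        out.append(running)
    return out

def cjs_loss(p, p_hat):
    """Cumulative Jensen-Shannon divergence between two score distributions."""
    total = 0.0
    for cp, cq in zip(cdf(p), cdf(p_hat)):
        m = 0.5 * (cp + cq)              # midpoint of the two cumulative values
        if cp > 0.0:
            total += cp * math.log(cp / m)
        if cq > 0.0:
            total += cq * math.log(cq / m)
    return total

# Hypothetical predicted and expected score distributions (scores 1..10).
predicted = [0.05, 0.05, 0.10, 0.15, 0.30, 0.15, 0.10, 0.05, 0.03, 0.02]
expected_ = [0.02, 0.03, 0.05, 0.10, 0.40, 0.20, 0.10, 0.05, 0.03, 0.02]
loss = cjs_loss(predicted, expected_)
```

By the log-sum inequality each per-score term is non-negative, so the divergence is zero only when the two cumulative distributions are identical.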
207. And inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored.
And inputting the images to be scored in the image sample set into a basic network model, such as the MobileNetV2 network model, for aesthetic scoring to obtain the score distribution data corresponding to each image to be scored, which is output by the basic network model. The score distribution data comprises a probability value corresponding to each score. For example, the output score distribution data is

p_s = [ p_{s_1}, p_{s_2}, …, p_{s_{N−1}}, p_{s_N} ]

wherein p_{s_1} is the probability value when the score is 1, p_{s_2} is the probability value when the score is 2, p_{s_{N−1}} is the probability value when the score is N−1, and p_{s_N} is the probability value when the score is N.
208. And inputting the score distribution data and the expected score distribution data into a first loss function to obtain a first loss value.
And inputting the score distribution data and the expected score distribution data corresponding to each image to be scored into a first loss function to obtain a first loss value corresponding to each image to be scored.
209. And inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value.
And inputting the score distribution data and the expected score distribution data corresponding to each image to be scored into a second loss function to obtain a second loss value corresponding to each image to be scored.
210. A target loss value is determined based on the first loss value and the second loss value.
After obtaining the first loss value and the second loss value, the electronic device may determine a target loss value according to the first loss value and the second loss value. Specifically, the first loss value may be multiplied by a first weight value to obtain a third loss value, and the second loss value may be multiplied by a second weight value to obtain a fourth loss value. Finally, the third loss value and the fourth loss value are added to obtain the target loss value. The first weight value and the second weight value can be set according to actual conditions, and may be equal or different. For example, the first weight value may be 0.6 and the second weight value 0.4, or both weight values may be 0.5.
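The weighted combination can be written as a short sketch; the 0.6/0.4 weights are the example values from the text:

```python
def target_loss(first_loss, second_loss, first_weight=0.6, second_weight=0.4):
    """Combine the two loss values into the target loss value."""
    third_loss = first_loss * first_weight     # weighted first (EMD) loss value
    fourth_loss = second_loss * second_weight  # weighted second (CJS) loss value
    return third_loss + fourth_loss
```

With equal weights of 0.5, `target_loss` reduces to the plain average of the two loss values.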
211. And adjusting parameters of the basic network model according to the target loss value until the basic network model converges.
And determining whether the basic network model meets a convergence condition according to the target loss value: when the target loss value gradually approaches a certain numerical value or fluctuates around it, and the change in loss is smaller than a small positive number, convergence of the basic network model can be confirmed. If the target loss value does not meet these conditions, the basic network model has not converged; the target loss value is then propagated back through the basic network model, and the network parameters, namely the weights and bias values of the basic network model, are adjusted according to the back propagation algorithm. The adjusted basic network model continues to be trained until the basic network model converges.
In some embodiments, after a basic network model is trained by inputting a batch of images to be scored, a network model after parameter adjustment can be obtained, and the electronic device can obtain a batch of verification images from a verification set and input the network model after parameter adjustment so as to verify the accuracy of the network model after parameter adjustment. When the accuracy obtained this time is greater than the accuracy obtained last time, the electronic device may store the parameter of the network model after the parameter adjustment. When the accuracy obtained this time is smaller than the accuracy obtained last time, the electronic device may not store the network model after the parameter adjustment. When the accuracy of the network model after the parameter adjustment obtained for multiple times is not increased, for example, the accuracy of the network model after the parameter adjustment obtained for multiple times is respectively: 87%, 86.9%, 86.7%, and 86.8%, the electronic device may confirm that the training of the base network model is completed, i.e., the base network model converges.
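The checkpoint-keeping logic described above can be sketched as follows; the `train_step`/`validate` callables and the patience value are assumptions standing in for a real training loop:

```python
def train_with_validation(train_step, validate, max_epochs=100, patience=4):
    """Keep the parameters with the best validation accuracy; stop training
    when the accuracy has not improved for `patience` consecutive checks."""
    best_acc, best_params, stale = 0.0, None, 0
    for _ in range(max_epochs):
        params = train_step()          # train on one batch of images to be scored
        acc = validate(params)         # accuracy on a batch of verification images
        if acc > best_acc:
            best_acc, best_params, stale = acc, params, 0   # store these parameters
        else:
            stale += 1                 # accuracy did not increase this time
            if stale >= patience:
                break                  # accuracy no longer rising: model converged
    return best_params, best_acc

# Simulated accuracy history matching the 87%, 86.9%, 86.7%, 86.8% example.
history = iter([0.85, 0.87, 0.869, 0.867, 0.868, 0.866, 0.99])
result = train_with_validation(lambda: object(), lambda _: next(history))
```

In the simulated run, training stops after four non-improving checks, so the stored parameters are those from the 87% evaluation.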
212. The converged underlying network model is used as a scoring model for aesthetically scoring the images.
The converged basic network model is obtained and used as a scoring model for aesthetic scoring of images. The scoring model may be applied to the electronic device to aesthetically score the images stored in the electronic device by the user, sort the images according to their scores, and display the images according to the sorting result, so that the user can preferentially browse images with high aesthetic scores, i.e., images with higher aesthetic quality.
As can be seen from the above, in the training method of the network model provided in the embodiment of the present application, an image sample set is obtained, where the image sample set includes a plurality of images to be scored, which have initial scoring distribution data; constructing a basic network model and a plurality of loss functions corresponding to the basic network model; inputting the image sample set into a basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored; training the basic network model according to the grading distribution data, the initial grading distribution data and the plurality of loss functions until the basic network model converges; the converged underlying network model is used as a scoring model for aesthetically scoring the images. Therefore, the basic network model can be trained through a plurality of loss functions to obtain a scoring model with higher accuracy, and the accuracy of the scoring model is improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating an image processing method according to an embodiment of the present disclosure. The flow of the image processing method may include:
301. an aesthetic scoring request is received.
The aesthetic scoring request may be triggered when the electronic device detects a touch operation on a target component, a preset voice command, a preset start instruction of a target application, or the like. In addition, the electronic device can automatically trigger the generation of the aesthetic scoring request at preset time intervals or based on certain trigger rules. For example, when the electronic device detects that the current display interface includes a plurality of images, such as when a browser application is started to browse an article page containing images, the electronic device may automatically trigger the generation of an aesthetic scoring request to aesthetically score the plurality of images in the current page according to a scoring model. The electronic equipment can then sort the plurality of images according to their score values and preferentially display the images with high scores, namely those with good aesthetic quality.
302. And acquiring a target image needing aesthetic scoring according to the aesthetic scoring request.
The target image may be an image stored in the electronic device, and in this case, the aesthetic scoring request includes path information for indicating a location where the target image is stored, and the electronic device may obtain the target image to be aesthetically scored through the path information. Of course, when the target image is not an image stored in the electronic device, the electronic device may acquire the target image to be aesthetically scored through a wired connection or a wireless connection according to the aesthetic scoring request.
303. A pre-trained scoring model is invoked.
The scoring model is obtained by training by adopting the training method of the network model provided by the embodiment. For a training process of a specific network model, reference may be made to the related description of the above embodiments, and details are not repeated herein.
304. And performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image.
And inputting the target image into a scoring model for aesthetic scoring to obtain a target scoring score corresponding to the target image. The target scoring score may represent an aesthetic quality of the target image. A higher target score indicates a higher aesthetic quality of the target image, i.e., indicates that the target image is more aesthetically pleasing to the public.
In some embodiments, the step of performing an aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image includes: performing aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image, wherein the target scoring distribution data are probability distribution data of each scoring score; and taking the score with the maximum probability value in the target score distribution data as a target score.
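Taking the score with the maximum probability value can be sketched as follows; the distribution is hypothetical and scores are assumed to run from 1 to 10:

```python
def target_score(dist):
    """Return the score whose probability value is largest (scores 1..N)."""
    best_index = max(range(len(dist)), key=lambda i: dist[i])
    return best_index + 1              # indices are 0-based, scores start at 1

# Hypothetical target score distribution data output by the scoring model.
target_dist = [0.01, 0.02, 0.04, 0.08, 0.15, 0.30, 0.20, 0.10, 0.06, 0.04]
score = target_score(target_dist)
```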
In some embodiments, when the electronic device scores a plurality of target images stored in an album or image library of the electronic device, after obtaining the target score corresponding to each target image, it may further detect whether each target score is greater than a preset score, and delete the target images whose target score is less than or equal to the preset score. It can be understood that when the target score is less than or equal to the preset score, the target image has poor aesthetic quality, i.e., it may be blurred or poorly composed. Such images are often produced when a user inadvertently presses the capture key, are not images the user needs to save, and occupy the storage space of the electronic device. Therefore, the electronic device can periodically trigger corresponding aesthetic scoring requests to screen the plurality of images stored in the electronic device through the scoring model, and intelligently delete the target images whose aesthetic score is less than or equal to the preset score, thereby helping the user manage the images in the electronic device more intelligently and saving the memory space of the electronic device.
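The screening step can be sketched as follows; the paths, scores, and the preset score of 4 are illustrative assumptions:

```python
def images_to_delete(scored_images, preset_score=4):
    """Return the paths whose target score is less than or equal to the
    preset score, i.e. candidates for intelligent deletion."""
    return [path for path, score in scored_images.items() if score <= preset_score]

# Hypothetical album mapping image paths to their target scores.
album = {"IMG_001.jpg": 8, "IMG_002.jpg": 3, "IMG_003.jpg": 6, "IMG_004.jpg": 2}
to_delete = images_to_delete(album)
```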
As can be seen from the above, the image processing method provided in the embodiment of the present application receives an aesthetic scoring request; acquires a target image needing aesthetic scoring according to the aesthetic scoring request; calls a pre-trained scoring model; and performs aesthetic scoring on the target image according to the scoring model to obtain a target score corresponding to the target image. In this way, the aesthetic score of the target image is obtained through the pre-trained scoring model.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present disclosure. The training device of the network model may include: a first acquisition module 41, a construction module 42, a first scoring module 43, a training module 44, and a determination module 45.
The first obtaining module 41 is configured to obtain an image sample set, where the image sample set includes a plurality of images to be scored with initial scoring distribution data.
A building module 42 is configured to build a base network model and a plurality of loss functions corresponding to the base network model.
The first scoring module 43 is configured to input the image sample set into the basic network model for aesthetic scoring, so as to obtain scoring distribution data corresponding to each image to be scored.
A training module 44, configured to train the basic network model according to the score distribution data, the initial score distribution data, and the loss functions until the basic network model converges.
A determination module 45 for using the converged base network model as a scoring model for aesthetic scoring of the image.
In some embodiments, the first obtaining module 41 is specifically configured to obtain an initial image sample set, where the initial image sample set includes a plurality of sample images with initial score distribution data; carrying out image preprocessing on each sample image to obtain a plurality of first sample images with initial grading distribution data; adding random noise into the first sample image to obtain a first target sample image; and obtaining an image sample set according to the sample image and the first target sample image, and taking the sample image and the first target sample image as images to be scored.
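The noise-augmentation step performed by the first obtaining module can be sketched as follows; a flat list of 8-bit pixel values stands in for a real image, and the noise amplitude is an assumption:

```python
import random

def add_random_noise(pixels, amplitude=8, seed=0):
    """Add bounded uniform random noise to 8-bit pixel values, clamping the
    result to [0, 255], to obtain a first target sample image from a first
    sample image."""
    rng = random.Random(seed)          # seeded for reproducibility of the sketch
    return [min(255, max(0, p + rng.randint(-amplitude, amplitude)))
            for p in pixels]

sample_image = [0, 64, 128, 200, 255]  # hypothetical pixel values
noisy_image = add_random_noise(sample_image)
```

The noisy copy keeps the same initial score distribution data as its source image, enlarging the sample set without requiring new annotation.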
In some embodiments, before the step of constructing the base network model and the plurality of loss functions corresponding to the base network model, the construction module 42 is further configured to: and adjusting the initial grading distribution data corresponding to each image to be graded to obtain corresponding expected grading distribution data, wherein the standard deviation of the expected grading distribution data corresponding to each image to be graded is smaller than the standard deviation of the initial grading distribution data.
In some embodiments, the building module 42 is specifically configured to perform indexing processing on each scoring data in the initial scoring distribution data corresponding to each image to be scored to obtain corresponding expected scoring distribution data when the initial scoring distribution data corresponding to each image to be scored is adjusted to obtain the corresponding expected scoring distribution data.
In some embodiments, training module 44 is specifically configured to train the base network model according to the score distribution data, the expected score distribution data, and the plurality of loss functions.
In some embodiments, the plurality of loss functions includes a first loss function and a second loss function, and the training module 44 is specifically configured to input the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value; inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value; determining a target loss value according to the first loss value and the second loss value; and adjusting parameters of the basic network model according to the target loss value.
In some embodiments, training module 44 multiplies the first loss value by a first weight value to obtain a third loss value; multiplying the second loss value by a second weight value to obtain a fourth loss value; and adding the third loss value and the fourth loss value to obtain a target loss value.
As can be seen from the above, in the training device of the network model provided in the embodiment of the present application, the first obtaining module 41 obtains the image sample set, where the image sample set includes a plurality of images to be scored, which have initial scoring distribution data; the construction module 42 constructs a base network model and a plurality of loss functions corresponding to the base network model; the first scoring module 43 inputs the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored; the training module 44 trains the basic network model according to the score distribution data, the initial score distribution data and the plurality of loss functions until the basic network model converges; the determination module 45 takes the converged base network model as a scoring model for aesthetic scoring of the image. Therefore, the basic network model can be trained through a plurality of loss functions to obtain a scoring model with higher accuracy, and the accuracy of the scoring model is improved.
It should be noted that the training apparatus for a network model provided in the embodiment of the present application and the training method for a network model in the foregoing embodiment belong to the same concept, and any method provided in the training method embodiment for a network model may be run on the training apparatus for a network model, and a specific implementation process thereof is described in detail in the training method embodiment for a network model, and is not described herein again.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an image processing apparatus 500 according to an embodiment of the present disclosure. The image processing apparatus may include: a receiving module 51, a second obtaining module 52, a calling module 53 and a second scoring module 54.
A receiving module 51, configured to receive an aesthetic scoring request;
a second obtaining module 52, configured to obtain a target image to be aesthetically scored according to the aesthetic scoring request;
a calling module 53 for calling a pre-trained scoring model;
the second scoring module 54 is configured to perform aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
the scoring model is obtained by training through the network model training method provided by the embodiment of the application.
In some embodiments, the second scoring module 54 is specifically configured to perform aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image; and taking the score with the maximum probability value in the target score distribution data as a target score.
As can be seen from the above, in the image processing apparatus provided in the embodiment of the present application, the receiving module 51 receives the aesthetic scoring request; the second obtaining module 52 obtains a target image needing to be aesthetically scored according to the aesthetic scoring request; the calling module 53 calls a pre-trained scoring model; and the second scoring module 54 performs aesthetic scoring on the target image according to the scoring model to obtain a target score corresponding to the target image. In this way, the aesthetic score of the target image is obtained through the pre-trained scoring model.
It should be noted that the image processing apparatus provided in the embodiment of the present application and the image processing method in the foregoing embodiment belong to the same concept, and any method provided in the embodiment of the image processing method may be run on the image processing apparatus, and a specific implementation process thereof is described in detail in the embodiment of the image processing method, and is not described herein again.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program stored in the storage medium is executed on a computer, the computer is caused to execute a training method of a network model or a processing method of an image as provided in the present application.
The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the training method of the network model or the image processing method provided in the embodiment of the present application by calling the computer program stored in the memory.
For example, the electronic device may be a mobile terminal such as a tablet computer or a smart phone. Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
The electronic device 600 may include components such as a memory 601, a processor 602, and the like. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 7 does not constitute a limitation of the electronic device and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
The memory 601 may be used to store software programs and modules, and the processor 602 executes various functional applications and data processing by operating the computer programs and modules stored in the memory 601. The memory 601 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a computer program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like.
The processor 602 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing an application program stored in the memory 601 and calling the data stored in the memory 601, thereby performing overall monitoring of the electronic device.
Further, the memory 601 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. Accordingly, the memory 601 may also include a memory controller to provide the processor 602 with access to the memory 601.
In this embodiment, the processor 602 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601, thereby implementing the following processes:
acquiring an image sample set, wherein the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
training the basic network model according to the grading distribution data, the initial grading distribution data and the loss functions until the basic network model converges;
the converged underlying network model is used as a scoring model for aesthetically scoring the images.
In some embodiments, before the processor 602 performs the building of the base network model and the plurality of loss functions corresponding to the base network model, it may perform:
and adjusting the initial grading distribution data corresponding to each image to be graded to obtain corresponding expected grading distribution data, wherein the standard deviation of the expected grading distribution data corresponding to each image to be graded is smaller than the standard deviation of the initial grading distribution data.
In some embodiments, when processor 602 performs training of the base network model according to the score distribution data, the initial score distribution data, and the plurality of loss functions, it may perform:
and training the basic network model according to the grading distribution data, the expected grading distribution data and the loss functions.
In some embodiments, the plurality of loss functions includes a first loss function and a second loss function, and the training of the base network model according to the score distribution data, the expected score distribution data, and the plurality of loss functions performed by processor 602 may include:
inputting the score distribution data and the expected score distribution data into a first loss function to obtain a first loss value;
inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value;
determining a target loss value according to the first loss value and the second loss value;
and adjusting parameters of the basic network model according to the target loss value.
In some embodiments, when processor 602 performs determining the target loss value according to the first loss value and the second loss value, it may perform:
multiplying the first loss value by a first weight value to obtain a third loss value;
multiplying the second loss value by a second weight value to obtain a fourth loss value;
and adding the third loss value and the fourth loss value to obtain a target loss value.
In some embodiments, when the processor 602 performs adjustment on the initial score distribution data corresponding to each image to be scored to obtain corresponding expected score distribution data, the following may be performed:
and performing exponential processing on each scoring data in the initial scoring distribution data corresponding to each image to be scored to obtain corresponding expected scoring distribution data.
In some embodiments, when the processor 602 performs acquiring the image sample set, it may perform:
acquiring an initial image sample set, wherein the initial image sample set comprises a plurality of sample images with initial grading distribution data;
carrying out image preprocessing on each sample image to obtain a plurality of first sample images with initial grading distribution data;
adding random noise into the first sample image to obtain a first target sample image;
and obtaining an image sample set according to the sample image and the first target sample image, and taking the sample image and the first target sample image as images to be scored.
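The sample-set construction above can be sketched as follows. The normalization used as "image preprocessing" and the Gaussian noise level are assumptions; the embodiment does not specify either.

```python
import numpy as np

def build_image_sample_set(labeled_images, noise_std=5.0, seed=0):
    """Sketch of the image-sample-set construction described above.

    Each sample image is preprocessed (here simply normalized to
    [0, 1], a stand-in for whatever preprocessing is actually used),
    then a copy with added random noise becomes the first target
    sample image; both keep the original initial score distribution.
    """
    rng = np.random.default_rng(seed)
    sample_set = []
    for image, initial_score_dist in labeled_images:
        first_sample = image.astype(np.float32) / 255.0  # first sample image
        noisy = first_sample + rng.normal(0.0, noise_std / 255.0, first_sample.shape)
        first_target = np.clip(noisy, 0.0, 1.0)          # first target sample image
        sample_set.append((first_sample, initial_score_dist))
        sample_set.append((first_target, initial_score_dist))
    return sample_set

# One toy 4x4 grayscale "image" with a uniform initial score distribution.
toy = [(np.full((4, 4), 128, dtype=np.uint8), np.full(10, 0.1))]
samples = build_image_sample_set(toy)
```

Both the clean and the noisy copy enter training as images to be scored, which doubles the sample count while reusing the existing score labels.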
In this embodiment, the processor 602 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and runs the application programs stored in the memory 601, thereby implementing the following processes:
receiving an aesthetic scoring request;
acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
calling a pre-trained scoring model;
performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
wherein the scoring model is obtained through training by the network model training method provided in the embodiments of the present application.
In some embodiments, when the processor 602 performs aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image, the following steps may be performed:
performing aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image;
and taking the score with the maximum probability value in the target score distribution data as a target score.
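Mapping the model's output distribution to a single target score is a plain argmax over the score buckets; a minimal sketch, assuming a 1..10 score scale (the scale is not stated in this passage):

```python
import numpy as np

def target_score(target_score_distribution, scores=None):
    """Return the score whose probability value is largest in the
    target score distribution, as described above."""
    dist = np.asarray(target_score_distribution)
    if scores is None:
        scores = np.arange(1, len(dist) + 1)  # assumed 1..N score scale
    return int(np.asarray(scores)[int(np.argmax(dist))])

# Toy model output over scores 1..10.
dist = [0.01, 0.02, 0.05, 0.10, 0.15, 0.30, 0.20, 0.10, 0.05, 0.02]
print(target_score(dist))  # score 6 has the largest probability
```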
Referring to fig. 8, fig. 8 is a second schematic structural diagram of an electronic device according to an embodiment of the present disclosure. It differs from the electronic device shown in fig. 7 in that the electronic device further includes: a display 603, a radio frequency circuit 604, an audio circuit 605, and a power supply 606. The display 603, the radio frequency circuit 604, the audio circuit 605, and the power supply 606 are each electrically connected to the processor 602.
The display 603 may be used to display information entered by or provided to the user, as well as various graphical user interfaces, which may be made up of graphics, text, icons, video, and any combination thereof. The display 603 may include a display panel; in some embodiments, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
The radio frequency circuit 604 may be used to transmit and receive radio frequency signals, so as to establish wireless communication with a network device or other electronic devices and exchange signals with them.
The audio circuit 605 may be used to provide an audio interface between the user and the electronic device through a speaker and a microphone.
The power supply 606 may be used to power the various components of the electronic device 600. In some embodiments, the power supply 606 may be logically coupled to the processor 602 through a power management system, so that functions such as managing charging, discharging, and power consumption are performed through the power management system.
Although not shown in fig. 8, the electronic device 600 may further include a camera assembly, a Bluetooth module, and the like. The camera assembly may include image processing circuitry, which may be implemented using hardware and/or software components and may include various processing units that define an Image Signal Processing (ISP) pipeline. The image processing circuitry may include at least: a plurality of cameras, an Image Signal Processor (ISP processor), control logic, an image memory, and a display. Each camera may include at least one or more lenses and an image sensor. The image sensor may include a color filter array (e.g., a Bayer filter) and may acquire the light intensity and wavelength information captured by each imaging pixel of the image sensor, providing a set of raw image data that can be processed by the image signal processor.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in a given embodiment, reference may be made to the detailed description of the network model training method/image processing method above, which is not repeated here.
The network model training apparatus/image processing apparatus provided in the embodiments of the present application shares the same concept as the network model training method/image processing method in the above embodiments. Any method provided in the method embodiments may run on the corresponding apparatus, and its specific implementation process is described in detail in the method embodiments, so it is not repeated here.
It should be noted that, for the network model training method/image processing method described in the embodiments of the present application, those skilled in the art will understand that all or part of the process of implementing the method can be completed by a computer program controlling the related hardware. The computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and its execution may include the processes of the embodiments of the method. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
For the network model training apparatus/image processing apparatus in the embodiments of the present application, each functional module may be integrated into one processing chip, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The network model training method, image processing method, apparatus, storage medium, and electronic device provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A method for training a network model, comprising:
    acquiring an image sample set, wherein the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
    constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
    inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
    training the basic network model according to the scoring distribution data, the initial scoring distribution data and the plurality of loss functions until the basic network model converges;
    and taking the converged basic network model as a scoring model for performing aesthetic scoring on images.
  2. The method of claim 1, wherein the step of constructing a base network model and a plurality of loss functions corresponding to the base network model is preceded by the step of:
    adjusting initial score distribution data corresponding to each image to be scored to obtain corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data corresponding to each image to be scored is smaller than the standard deviation of the initial score distribution data;
    the step of training the base network model according to the score distribution data, the initial score distribution data, and the plurality of loss functions includes:
    and training the basic network model according to the score distribution data, the expected score distribution data and the plurality of loss functions.
  3. The method of claim 2, wherein the plurality of loss functions includes a first loss function and a second loss function, and the step of training the base network model based on the score distribution data, the expected score distribution data, and the plurality of loss functions comprises:
    inputting the score distribution data and the expected score distribution data into a first loss function to obtain a first loss value;
    inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value;
    determining a target loss value according to the first loss value and the second loss value;
    and adjusting parameters of the basic network model according to the target loss value.
  4. The method of claim 3, wherein the step of determining a target loss value from the first loss value and the second loss value comprises:
    multiplying the first loss value by a first weight value to obtain a third loss value;
    multiplying the second loss value by a second weight value to obtain a fourth loss value;
    and adding the third loss value and the fourth loss value to obtain a target loss value.
  5. The method according to claim 2, wherein the step of adjusting the initial score distribution data corresponding to each image to be scored to obtain the corresponding expected score distribution data comprises:
    and performing exponential processing on each scoring data in the initial scoring distribution data corresponding to each image to be scored to obtain corresponding expected scoring distribution data.
  6. The method of any of claims 3 to 5, wherein the first loss function is:
    Figure PCTCN2019118135-APPB-100001
    wherein
    Figure PCTCN2019118135-APPB-100002
    denotes the expected score distribution data, p denotes the score distribution data, N denotes the number of scoring scores, the function CDFp(k) and the function
    Figure PCTCN2019118135-APPB-100003
    are cumulative probability distribution functions, k is a score, CDFp(k) represents the cumulative probability value when the score is k in the score distribution data,
    Figure PCTCN2019118135-APPB-100004
    represents the cumulative probability value when the score is k in the expected score distribution data, and l is an exponent value;
    the second loss function is:
    Figure PCTCN2019118135-APPB-100005
    wherein
    Figure PCTCN2019118135-APPB-100006
    denotes the expected score distribution data, p denotes the score distribution data, N denotes the number of scoring scores, the function CDFp(k) and the function
    Figure PCTCN2019118135-APPB-100007
    are cumulative probability distribution functions, k is a score, CDFp(k) represents the cumulative probability value when the score is k in the score distribution data, and
    Figure PCTCN2019118135-APPB-100008
    represents the cumulative probability value when the score is k in the expected score distribution data.
  7. The method of any of claims 1 to 5, wherein the step of obtaining a sample set of images comprises:
    acquiring an initial image sample set, wherein the initial image sample set comprises a plurality of sample images with initial scoring distribution data;
    carrying out image preprocessing on each sample image to obtain a plurality of first sample images with initial scoring distribution data;
    adding random noise into the first sample image to obtain a first target sample image;
    and taking the sample image and the first target sample image as images to be scored to obtain an image sample set.
  8. A method of processing an image, comprising:
    receiving an aesthetic scoring request;
    acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
    calling a pre-trained scoring model;
    performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
    wherein the scoring model is obtained by training the network model according to any one of claims 1 to 7.
  9. The method of claim 8, wherein the step of performing an aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image comprises:
    performing aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image;
    and taking the score with the maximum probability value in the target score distribution data as a target score.
  10. An apparatus for training a network model, comprising:
    the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an image sample set, and the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
    the system comprises a construction module, a data processing module and a data processing module, wherein the construction module is used for constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
    the first scoring module is used for inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
    the training module is used for training the basic network model according to the score distribution data, the initial score distribution data and the loss functions until the basic network model converges;
    and the determining module is used for taking the converged basic network model as a grading model for performing aesthetic grading on the image.
  11. An apparatus for processing an image, comprising:
    a receiving module for receiving an aesthetic scoring request;
    the second acquisition module is used for acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
    the calling module is used for calling a pre-trained scoring model;
    the second scoring module is used for performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
    wherein the scoring model is obtained by training the network model according to any one of claims 1 to 7.
  12. A storage medium, wherein a computer program is stored in the storage medium, which when run on a computer causes the computer to execute the method of training a network model according to any one of claims 1 to 7 or the method of processing an image according to any one of claims 8 to 9.
  13. An electronic device, wherein the electronic device comprises a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute, by calling the computer program stored in the memory:
    acquiring an image sample set, wherein the image sample set comprises a plurality of images to be scored with initial scoring distribution data;
    constructing a basic network model and a plurality of loss functions corresponding to the basic network model;
    inputting the image sample set into the basic network model for aesthetic scoring to obtain scoring distribution data corresponding to each image to be scored;
    training the basic network model according to the scoring distribution data, the initial scoring distribution data and the plurality of loss functions until the basic network model converges;
    and taking the converged basic network model as a scoring model for performing aesthetic scoring on images.
  14. The electronic device of claim 13, wherein the processor is configured to perform:
    adjusting initial score distribution data corresponding to each image to be scored to obtain corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data corresponding to each image to be scored is smaller than the standard deviation of the initial score distribution data;
    and training the basic network model according to the score distribution data, the expected score distribution data and the plurality of loss functions.
  15. The electronic device of claim 14, wherein the processor is configured to perform:
    inputting the score distribution data and the expected score distribution data into a first loss function to obtain a first loss value;
    inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value;
    determining a target loss value according to the first loss value and the second loss value;
    and adjusting parameters of the basic network model according to the target loss value.
  16. The electronic device of claim 15, wherein the processor is configured to perform:
    multiplying the first loss value by a first weight value to obtain a third loss value;
    multiplying the second loss value by a second weight value to obtain a fourth loss value;
    and adding the third loss value and the fourth loss value to obtain a target loss value.
  17. The electronic device of claim 14, wherein the processor is configured to perform:
    and performing exponential processing on each scoring data in the initial scoring distribution data corresponding to each image to be scored to obtain corresponding expected scoring distribution data.
  18. The electronic device of any of claims 13-17, wherein the processor is configured to perform:
    acquiring an initial image sample set, wherein the initial image sample set comprises a plurality of sample images with initial scoring distribution data;
    carrying out image preprocessing on each sample image to obtain a plurality of first sample images with initial scoring distribution data;
    adding random noise into the first sample image to obtain a first target sample image;
    and taking the sample image and the first target sample image as images to be scored to obtain an image sample set.
  19. An electronic device, wherein the electronic device comprises a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute, by calling the computer program stored in the memory:
    receiving an aesthetic scoring request;
    acquiring a target image needing aesthetic scoring according to the aesthetic scoring request;
    calling a pre-trained scoring model;
    performing aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
    wherein the scoring model is obtained by training the network model according to any one of claims 1 to 7.
  20. The electronic device of claim 19, wherein the processor is configured to perform:
    performing aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image;
    and taking the score with the maximum probability value in the target score distribution data as a target score.
CN201980100428.4A 2019-11-13 2019-11-13 Network model training method, image processing method and device and electronic equipment Pending CN114402356A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/118135 WO2021092808A1 (en) 2019-11-13 2019-11-13 Network model training method, image processing method and device, and electronic device

Publications (1)

Publication Number Publication Date
CN114402356A true CN114402356A (en) 2022-04-26

Family

ID=75911632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980100428.4A Pending CN114402356A (en) 2019-11-13 2019-11-13 Network model training method, image processing method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114402356A (en)
WO (1) WO2021092808A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554327A (en) * 2021-07-29 2021-10-26 上海千内云教育软件科技有限公司 Sketch work intelligent grading and quantitative scoring method based on deep learning
CN113962965B (en) * 2021-10-26 2023-06-09 腾讯科技(深圳)有限公司 Image quality evaluation method, device, equipment and storage medium
CN114186497B (en) * 2021-12-15 2023-03-24 湖北工业大学 Intelligent analysis method, system, equipment and medium for value of art work
CN117315438A (en) * 2023-09-25 2023-12-29 北京邮电大学 Image color aesthetic evaluation method, device and equipment based on interest points

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303977B2 (en) * 2016-06-28 2019-05-28 Conduent Business Services, Llc System and method for expanding and training convolutional neural networks for large size input images
CN106651830A (en) * 2016-09-28 2017-05-10 华南理工大学 Image quality test method based on parallel convolutional neural network
CN108520213B (en) * 2018-03-28 2021-10-19 五邑大学 Face beauty prediction method based on multi-scale depth
CN109344855B (en) * 2018-08-10 2021-09-24 华南理工大学 Depth model face beauty evaluation method based on sequencing guided regression
CN109801256B (en) * 2018-12-15 2023-05-26 华南理工大学 Image aesthetic quality assessment method based on region of interest and global features
CN109902912B (en) * 2019-01-04 2023-04-07 中国矿业大学 Personalized image aesthetic evaluation method based on character features
CN110223292B (en) * 2019-06-20 2022-01-25 厦门美图之家科技有限公司 Image evaluation method, device and computer readable storage medium

Also Published As

Publication number Publication date
WO2021092808A1 (en) 2021-05-20

Similar Documents

Publication Publication Date Title
CN107943860B (en) Model training method, text intention recognition method and text intention recognition device
CN109461167B (en) Training method, matting method, device, medium and terminal of image processing model
CN110473141B (en) Image processing method, device, storage medium and electronic equipment
CN114402356A (en) Network model training method, image processing method and device and electronic equipment
JP2021056991A (en) Recommendation method, device, electronic device, storage medium, and program
US20210224592A1 (en) Method and device for training image recognition model, and storage medium
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
WO2021138855A1 (en) Model training method, video processing method and apparatus, storage medium and electronic device
WO2021114847A1 (en) Internet calling method and apparatus, computer device, and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN113160819B (en) Method, apparatus, device, medium, and product for outputting animation
CN111709398A (en) Image recognition method, and training method and device of image recognition model
CN107291772A (en) One kind search access method, device and electronic equipment
CN114840734B (en) Training method of multi-modal representation model, cross-modal retrieval method and device
CN111382644A (en) Gesture recognition method and device, terminal equipment and computer readable storage medium
CN111275683B (en) Image quality grading processing method, system, device and medium
CN110866114B (en) Object behavior identification method and device and terminal equipment
CN112488157A (en) Dialog state tracking method and device, electronic equipment and storage medium
CN113762585B (en) Data processing method, account type identification method and device
WO2021147421A1 (en) Automatic question answering method and apparatus for man-machine interaction, and intelligent device
CN116259083A (en) Image quality recognition model determining method and related device
CN114358102A (en) Data classification method, device, equipment and storage medium
CN103164504A (en) Smartphone refined picture searching system and method
CN113806532B (en) Training method, device, medium and equipment for metaphor sentence judgment model
CN110909190B (en) Data searching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination