WO2021092808A1 - Training method for network model, image processing method, apparatus, and electronic device - Google Patents

Training method for network model, image processing method, apparatus, and electronic device

Publication number
WO2021092808A1
Authority
WO
WIPO (PCT)
Prior art keywords
scoring
distribution data
image
network model
score
Prior art date
Application number
PCT/CN2019/118135
Other languages
English (en)
French (fr)
Inventor
郭子亮
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司
Priority to PCT/CN2019/118135
Priority to CN201980100428.4A (CN114402356A)
Publication of WO2021092808A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image

Definitions

  • the present invention relates to the field of machine learning, in particular to a training method of a network model, an image processing method, device and electronic equipment.
  • image aesthetics assessment is a recent research hotspot in the direction of computer vision perception and comprehension.
  • Image aesthetics reflects the human visual pursuit of, and yearning for, "beautiful" things. Evaluating visual aesthetics is therefore of great significance in fields such as photography, advertising design, and artistic production.
  • the embodiments of the present application provide a network model training method, image processing method, device, and electronic equipment, which can improve the accuracy of network model training.
  • an embodiment of the present application provides a method for training a network model, including:
  • the image sample set including a plurality of images to be scored with initial score distribution data
  • the convergent basic network model is used as a scoring model for aesthetic scoring of images.
  • an image processing method including:
  • the scoring model is obtained by training using the training method of the network model provided in this embodiment.
  • an embodiment of the present application provides a network model training device, including:
  • the first acquisition module is configured to acquire an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data;
  • the first scoring module is configured to input the image sample set into the basic network model for aesthetic scoring, and obtain score distribution data corresponding to each image to be scored;
  • a training module configured to train the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges;
  • the determining module is used to use the convergent basic network model as a scoring model for aesthetic scoring of the image.
  • an image processing device including:
  • the receiving module is used to receive aesthetic scoring requests
  • the second acquisition module is configured to acquire the target image that needs to be aesthetically scored according to the aesthetic score request;
  • the second scoring module is configured to perform aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
  • the scoring model is obtained by training using the training method of the network model provided in this embodiment.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, wherein when the computer program is executed on a computer, the computer is caused to execute the network model training method or the image processing method provided in this embodiment.
  • an embodiment of the present application provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:
  • the image sample set including a plurality of images to be scored with initial score distribution data
  • the convergent basic network model is used as a scoring model for aesthetic scoring of images.
  • an embodiment of the present application provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:
  • the scoring model is obtained by training using the network model training method provided in the embodiment of the present application.
  • FIG. 1 is a schematic diagram of the first flow of a training method for a network model provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of the second flow of the network model training method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the distribution of initial score distribution data and expected score distribution data provided by an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a training device for a network model provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the first structure of an electronic device provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides a method for training a network model.
  • the execution subject of the training method of the network model may be the training device of the network model provided in the embodiment of the present application, or an electronic device integrated with the training device of the network model.
  • the training device of the network model can be implemented in hardware or software, and the electronic device can be a device equipped with a processor and processing capability, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
  • the following will take the electronic device as an example for the execution of the network model training method.
  • FIG. 1 is a schematic diagram of the first process of the network model training method provided by the embodiment of the present application.
  • the process of the training method of the network model may include:
  • the electronic device can obtain the image sample set through a wired connection or a wireless connection.
  • the image sample set can use the existing AVA public data set (A Large-Scale Database for Aesthetic Visual Analysis, AVA), the first large-scale aesthetic quality evaluation database.
  • the AVA public data set contains approximately 256,000 sample images, and each sample image is aesthetically scored by multiple different users, where each scoring score is a natural number in [1,10]; for example, the scoring score can be 1, 2, or 10.
  • the initial score distribution data corresponding to the sample image can be generated according to the score data of a plurality of different users, and the initial score distribution data includes the number of scores corresponding to each score. It is understandable that the higher the aesthetic score, the higher the aesthetic quality of the sample image.
  • the number of people who rated each sample image in the AVA public data set is between 78 and 539, and on average 210 people participate in the aesthetic scoring of each sample image. This large amount of aesthetic scoring data per sample image better reflects the public's evaluation and perception of an image, which is why the data set is a recognized benchmark test set in the field of image aesthetics evaluation. This application therefore uses the AVA public data set to train the original network model, to obtain a more accurate scoring model for scoring image aesthetics.
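  • As a minimal sketch of how per-score vote counts could be turned into initial score distribution data, the snippet below normalizes the counts to probabilities. The normalization step, the function name `counts_to_distribution`, and the vote counts are assumptions for illustration; the text only states that the initial score distribution data records the number of votes for each score.

```python
def counts_to_distribution(counts):
    """Turn per-score vote counts (index 0 = score 1) into probabilities."""
    total = sum(counts)
    if total == 0:
        raise ValueError("an image needs at least one vote")
    return [c / total for c in counts]


# Hypothetical vote counts for scores 1..10 (210 voters in total).
votes = [0, 2, 10, 30, 60, 55, 35, 12, 5, 1]
initial_distribution = counts_to_distribution(votes)
```

  • Normalizing makes the labels directly comparable with a model output that consists of probability values.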
  • because lightweight deep neural network models are smaller and faster, they are widely used in embedded electronic devices such as smart phones. Therefore, a lightweight deep neural network model can be constructed and used as the basic network model, so that the trained basic network model can be deployed on electronic devices such as smart phones, which can then aesthetically score images, further improving the smart phone's intelligence.
  • the MobileNets model can be used as the basic network model.
  • the core of MobileNets is the depthwise separable convolution, composed of a depthwise convolution (depthwise conv) and a pointwise convolution (pointwise conv). It is this depthwise separable convolution structure that enables the MobileNets model to reduce network parameters and computation without reducing network performance.
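  • The parameter saving of a depthwise separable convolution can be illustrated with a small count. The sketch below ignores bias terms, and the layer sizes (a 3x3 kernel, 32 input channels, 64 output channels) are hypothetical values chosen only for the arithmetic.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64
standard = standard_conv_params(k, c_in, c_out)          # 18432
separable = depthwise_separable_params(k, c_in, c_out)   # 2336
ratio = separable / standard                             # equals 1/c_out + 1/k**2
```

  • For these sizes the separable layer needs roughly one eighth of the weights of the standard layer.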
  • the MobileNetV2 network model may be constructed as the basic network model. Further, in order to make the score distribution data output by the basic network model lie in [0,1], the softmax function can be used as the output function of the output layer of the basic network model, so that the score distribution data output by the basic network model includes the probability value corresponding to each scoring score.
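  • A minimal sketch of the softmax output function follows. It maps the raw outputs of the last layer (called `logits` here, a name not used in the text) to ten probability values that lie in [0,1] and sum to 1. Subtracting the maximum is a common numerical-stability step and is an addition of this sketch.

```python
import math


def softmax(logits):
    """Map raw output-layer values to probabilities that sum to 1."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for scoring scores 1..10.
logits = [0.1, 0.3, 1.2, 2.0, 2.5, 2.4, 1.8, 0.9, 0.2, 0.0]
probabilities = softmax(logits)
```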
  • the ImageNet database is a large-scale visual database for research on visual object recognition software.
  • the ImageNet database contains more than 15 million images, of which 1.2 million images are divided into 1000 categories (approximately 1 million images contain bounding boxes and annotations).
  • the MobileNetV2 network model is pre-trained on the ImageNet database, and a MobileNetV2 network model with better network parameters is obtained at the end of that training. Using the pre-trained MobileNetV2 network model as the basic network model can greatly shorten the training time of the basic network model.
  • since the score distribution data output by the basic network model includes probability distribution data over the scoring scores, functions suited to processing probability distributions need to be obtained as the multiple loss functions of the basic network model.
  • with multiple loss functions, the difference between the score distribution data of each image to be scored and the initial score distribution data can be better captured, and the score distribution data can be better fitted. The basic network model can therefore be better trained through multiple loss functions to obtain a more accurate scoring model.
  • the images to be scored in the image sample set are input into a basic network model such as the MobileNetV2 network model for aesthetic scoring, and the score distribution data corresponding to each image to be scored output by the basic network model is obtained.
  • the score distribution data includes the probability value corresponding to each score.
  • the output score distribution data is $p = [p_{s_1}, p_{s_2}, \ldots, p_{s_{N-1}}, p_{s_N}]$, where $p_{s_1}$ is the probability value when the scoring score is 1, $p_{s_2}$ is the probability value when the scoring score is 2, $p_{s_{N-1}}$ is the probability value when the scoring score is $N-1$, and $p_{s_N}$ is the probability value when the scoring score is $N$.
  • in order to keep the score distribution data output by the basic network model consistent with the initial score distribution data derived from users' actual scores, the basic network model can be trained through multiple loss functions.
  • the specific training process can be as follows: after the aesthetic scoring of a batch of images to be scored is completed, the score distribution data and the initial score distribution data of the batch can be input into the multiple loss functions to calculate the corresponding target loss value. Whether the basic network model meets the convergence condition is then determined according to the target loss value. When the target loss value gradually approaches a certain value, or fluctuates around a certain value, and the loss change is less than a small positive number, the basic network model can be confirmed to have converged.
  • if the target loss value does not meet the above conditions, the basic network model does not meet the convergence condition. In that case the target loss value is returned to the basic network model, and the network parameters of the basic network model, namely the weights and bias values, are adjusted according to the backpropagation algorithm. Images to be scored that have not yet been aesthetically scored by the basic network model are then obtained, and the adjusted basic network model continues to be trained on those images with the multiple loss functions until the basic network model converges.
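  • The convergence condition described above can be sketched as a check on the recent history of target loss values. The function name `has_converged`, the threshold `eps`, and the window of five consecutive changes are assumptions for illustration; the text only requires that the loss change be less than a small positive number.

```python
def has_converged(losses, eps=1e-4, window=5):
    """Return True when the last `window` loss changes are all below eps."""
    if len(losses) < window + 1:
        return False                 # not enough history yet
    recent = losses[-(window + 1):]
    return all(abs(b - a) < eps for a, b in zip(recent, recent[1:]))


# Hypothetical loss histories.
still_training = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
settled = [0.31, 0.20001, 0.20002, 0.20001, 0.20003, 0.20002, 0.20001]
```

  • Requiring several consecutive small changes, rather than a single one, guards against stopping on a temporary plateau.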
  • the convergent basic network model is used as a scoring model for aesthetic scoring of the image.
  • the scoring model can be applied to an electronic device to perform aesthetic scoring on multiple images that the user has stored in the electronic device, sort the images according to their scoring scores, and display the images according to the sorting result. This allows users to preferentially browse images with higher aesthetic scores, that is, images with higher aesthetic quality.
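  • A minimal sketch of the sorting step follows. It assumes that a single scoring score is derived from the predicted distribution as its mean, which the text does not specify; the file names and distributions are hypothetical.

```python
def mean_score(dist):
    """Expected score of a distribution over scores 1..N."""
    return sum(score * p for score, p in enumerate(dist, start=1))


def rank_by_score(predictions):
    """Return image names sorted from highest to lowest mean score."""
    return sorted(predictions,
                  key=lambda name: mean_score(predictions[name]),
                  reverse=True)


# Hypothetical predicted score distributions for three stored images.
predictions = {
    "a.jpg": [0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0],
    "b.jpg": [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0],
    "c.jpg": [0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
}
display_order = rank_by_score(predictions)
```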
  • the network model training method obtains an image sample set, which includes multiple images to be scored with initial score distribution data; constructs a basic network model and multiple loss functions corresponding to the basic network model; inputs the image sample set into the basic network model for aesthetic scoring and obtains the score distribution data corresponding to each image to be scored; trains the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges; and uses the converged basic network model as a scoring model for aesthetic scoring of images.
  • the basic network model can be trained through multiple loss functions to obtain a more accurate scoring model, which improves the accuracy of the scoring model.
  • FIG. 2 is a schematic diagram of a second flow of the network model training method provided by an embodiment of the present application.
  • the process of the training method of the network model may include:
  • the initial image sample set includes multiple sample images with initial score distribution data.
  • the initial image sample set can use the existing AVA public data set.
  • the AVA public data set is a database for aesthetic quality evaluation.
  • the AVA public data set includes approximately 256,000 sample images, and each sample image is aesthetically scored by a number of different users.
  • each scoring score is a natural number in [1,10], so there can be 10 possible scoring scores.
  • the initial score distribution data corresponding to the sample image can be generated according to the score data of a plurality of different users, and the initial score distribution data includes the number of scores corresponding to each score. It is understandable that the higher the aesthetic score, the higher the aesthetic quality of the sample image.
  • the number of people who rated each sample image in the AVA public data set is between 78 and 539.
  • on average, 210 people participate in the aesthetic scoring of each sample image, so the large amount of aesthetic scoring data per sample image better reflects the public's evaluation and perception of an image. The data set is therefore a recognized benchmark test set in the field of image aesthetics evaluation, and this application uses the AVA public data set to train the original network model to obtain a more accurate scoring model for scoring image aesthetics.
  • the image preprocessing methods include conventional multi-scale scaling, random flipping, random translation, random cropping, and the like. Image preprocessing of the sample images yields multiple first sample images, and the first sample images are also input into the basic network model for aesthetic scoring as images to be scored, which increases the amount of input image data, that is, the number of images to be scored.
  • this application does not perform large-scale random cropping or random color changes during image preprocessing, to ensure that the first sample image and the corresponding sample image are similar in composition and color information.
  • random scaling, random flipping, and random translation of the sample image will not change the composition and color information of the sample image too much. That is, the obtained first sample image and the corresponding sample image are similar in composition, color and other information, so the initial score distribution data of the corresponding sample image can be used as the initial score distribution data of the first sample image.
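  • The mild preprocessing described above can be sketched on a tiny grayscale image stored as a list of rows. Only random horizontal flipping and a small horizontal translation are shown; the helper names, the zero padding, and the one-pixel shift limit are assumptions for illustration. The augmented image reuses the label of its source image, as the text describes.

```python
import random


def horizontal_flip(img):
    """Mirror every row of the image."""
    return [row[::-1] for row in img]


def translate(img, dx, fill=0):
    """Shift rows right by dx pixels (left if negative), padding with fill."""
    width = len(img[0])
    if abs(dx) >= width:
        raise ValueError("shift must be smaller than the image width")
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]


def augment(img, rng, max_shift=1):
    """Randomly flip, then shift by at most max_shift pixels."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    return translate(img, rng.randint(-max_shift, max_shift))


sample_image = [[10, 20, 30], [40, 50, 60]]                 # hypothetical pixels
sample_label = [0, 0, 0.1, 0.2, 0.4, 0.2, 0.1, 0, 0, 0]     # hypothetical label
first_sample = (augment(sample_image, random.Random(0)), sample_label)
```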
  • the first sample image is obtained by randomly changing the sample image, for example through multi-scale scaling, random flipping, random translation, and random cropping.
  • although the resulting small difference does not cause a difference in visual recognition, it directly affects the score distribution data output by the basic network model, so that the score distribution data corresponding to the first sample image differs from the score distribution data of the corresponding sample image. Therefore, after image preprocessing is performed on each sample image, the small differences introduced by the preprocessing affect the aesthetic score of the resulting first sample image, making the score distribution data predicted by the basic network model unstable.
  • a tiny random noise such as Gaussian noise can be added to each pixel of the first sample image to obtain the first target sample image.
  • small changes in image pixels will not cause a difference in visual recognition, that is, small changes in image pixel values are difficult for the user's eyes to distinguish. That is to say, the first target sample image and the first sample image are the same in the user's vision, so the user's score on the first target sample image and the first sample image will not change. Therefore, in an ideal environment, the score distribution data of the basic network model corresponding to the first sample image and the first target sample image should not be very different. Therefore, the basic network model needs to be trained through the first target sample image, so that the basic network model can adapt to small changes in the input sample image, making the basic network model more robust to small changes in pixel values, thereby making the predicted score distribution The data is more stable.
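  • Adding tiny Gaussian noise to every pixel can be sketched as follows. The noise standard deviation `sigma`, the 8-bit pixel range, and the rounding and clipping are assumptions for illustration.

```python
import random


def add_gaussian_noise(img, sigma=1.0, rng=None):
    """Add zero-mean Gaussian noise to each pixel and clip to [0, 255]."""
    rng = rng or random.Random()
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in img
    ]


first_sample_image = [[0, 128, 255], [64, 32, 200]]   # hypothetical pixels
first_target_sample = add_gaussian_noise(first_sample_image,
                                         sigma=1.0,
                                         rng=random.Random(42))
```

  • The first target sample image keeps the label of the first sample image, since the two are visually indistinguishable to a user.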
  • the sample image and the first target sample image are both used as the image to be scored, and the image sample set is obtained, so that the data amount of the image to be scored in the finally obtained image sample set is greater than the initial image sample set.
  • the image sample set with a larger amount of data is more suitable for training the basic network model, so that the prediction result of the scoring model after the training is completed is more accurate.
  • the sample image and the first target sample image are used as images to be scored to train the basic network model.
  • although the initial score distribution data of each image to be scored reflects the public's evaluation and perception of an image, its standard deviation is large and the scores are mostly concentrated in the middle part, that is, between the scores [3, 7], so the initial score distribution data is relatively flat. Therefore, when the basic network model is trained on the initial score distribution data, the standard deviation of the score distribution data output by the trained scoring model will also be large. As a result, images with different scoring scores have similar aesthetic quality, and it is difficult to make fine distinctions in aesthetic quality. For example, images that the trained scoring model scores from 3 to 7 are similar in aesthetic quality, so the scores cannot adequately reflect the aesthetic quality of the image.
  • each score datum in the initial score distribution data needs to be exponentiated to obtain expected score distribution data with a smaller standard deviation. For example, a power calculation is performed on each score datum in the initial score distribution data to obtain the expected score distribution data.
  • FIG. 3 is a schematic diagram of the distribution of initial score distribution data and expected score distribution data provided by an embodiment of the application.
  • the curve A is the initial score distribution data corresponding to a certain image to be scored provided by the embodiment of the application.
  • the abscissa in FIG. 3 is the scoring score, and the ordinate is the probability value corresponding to the scoring score.
  • when the exponent is 2, each score datum in the initial score distribution data is squared to obtain expected score distribution data with a smaller standard deviation, that is, curve B shown in FIG. 3. It can be seen from FIG. 3 that the probability distribution of the scores in curve B is more concentrated, which means that the standard deviation of curve B is smaller than that of curve A.
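  • The squaring step can be sketched as below. Renormalizing the squared values so that they sum to 1 is an assumption of this sketch (the text only states that each score datum is squared), and the example distribution is hypothetical.

```python
def sharpen(dist, exponent=2.0):
    """Raise each probability to `exponent`, then renormalize to sum to 1."""
    powered = [p ** exponent for p in dist]
    total = sum(powered)
    return [p / total for p in powered]


def std_dev(dist):
    """Standard deviation of a distribution over scores 1..N."""
    mean = sum(s * p for s, p in enumerate(dist, start=1))
    var = sum(p * (s - mean) ** 2 for s, p in enumerate(dist, start=1))
    return var ** 0.5


# Hypothetical initial score distribution (curve A) and its squared version (curve B).
curve_a = [0.01, 0.03, 0.08, 0.15, 0.23, 0.23, 0.15, 0.08, 0.03, 0.01]
curve_b = sharpen(curve_a, exponent=2.0)
```

  • For this example the standard deviation drops from about 1.7 to about 1.2, so curve B is more concentrated than curve A.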
  • each score datum in the initial score distribution data can be exponentiated to obtain expected score distribution data with a smaller standard deviation.
  • when the basic network model is trained on expected score distribution data with a smaller standard deviation, a scoring model whose output has a smaller standard deviation can be obtained.
  • the Gaussian function used here has a small standard deviation, that is, its probability distribution is relatively concentrated. Therefore, the initial score distribution data can be multiplied by the Gaussian function to reduce the standard deviation of the initial score distribution data and obtain the expected score distribution data.
  • the probability distribution of the Gaussian function is unimodal, that is, it has only a single peak, close to the middle value.
  • the probability distribution of image scores, however, may have multiple peaks, that is, the scores may be polarized, with some people finding the image good-looking and others finding it bad-looking. For example, the probability value may be large when the score is 3 and also large when the score is 8, so that the initial score distribution data forms two peaks, at the scores 3 and 8. In this case, when the Gaussian function is used to reduce the standard deviation of the initial score distribution data, the probability mass is pulled toward the middle value, such as the score 5, which makes the expected score distribution data inaccurate. Therefore, whether to adjust the initial score distribution data with the Gaussian function needs to be decided according to the actual situation.
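  • The effect on a polarized distribution can be sketched numerically. The Gaussian centre `mu`, its standard deviation `sigma`, the renormalization, and the bimodal example are assumptions for illustration.

```python
import math


def gaussian_weights(n, mu, sigma):
    """Unnormalized Gaussian values at the scores 1..n."""
    return [math.exp(-((s - mu) ** 2) / (2 * sigma ** 2))
            for s in range(1, n + 1)]


def reweight(dist, weights):
    """Multiply a distribution by weights elementwise and renormalize."""
    product = [p * w for p, w in zip(dist, weights)]
    total = sum(product)
    return [x / total for x in product]


# Hypothetical polarized distribution with peaks at the scores 3 and 8.
bimodal = [0.02, 0.10, 0.30, 0.06, 0.02, 0.02, 0.08, 0.30, 0.08, 0.02]
adjusted = reweight(bimodal, gaussian_weights(10, mu=5.5, sigma=1.5))

middle_before = sum(bimodal[3:7])    # probability mass on the scores 4..7
middle_after = sum(adjusted[3:7])
```

  • In this example the probability mass on the scores 4 to 7 more than doubles, even though few voters actually chose those scores, which illustrates why the Gaussian adjustment can distort polarized score data.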
  • because lightweight deep neural network models are smaller and faster, they are widely used in embedded electronic devices such as smart phones. Therefore, a lightweight deep neural network model can be constructed and used as the basic network model, so that the trained basic network model can be deployed on electronic devices such as smart phones, which can then aesthetically score images, further improving the smart phone's intelligence.
  • the MobileNets model can be used as the basic network model.
  • the core of MobileNets is the depthwise separable convolution, composed of a depthwise convolution (depthwise conv) and a pointwise convolution (pointwise conv). It is this depthwise separable convolution structure that enables the MobileNets model to reduce network parameters and computation without reducing network performance.
  • the MobileNetV2 network model may be constructed as the basic network model. Further, in order to make the score distribution data output by the basic network model lie in [0,1], the softmax function can be used as the output function of the output layer of the basic network model, so that the score distribution data output by the basic network model includes the probability value corresponding to each scoring score.
  • the MobileNetV2 network model needs to be pre-trained on ImageNet, and the pre-trained MobileNetV2 network model is used as the basic network model.
  • ImageNet is a large-scale visual database for research on visual object recognition software.
  • the ImageNet database contains more than 15 million images, of which 1.2 million images are divided into 1000 categories (approximately 1 million images contain bounding boxes and annotations).
  • the MobileNetV2 network model is pre-trained on ImageNet. At the end of the training, a MobileNetV2 network model with better model parameters can be obtained.
  • Using the pre-trained MobileNetV2 network model as the basic network model can greatly shorten the training time of the basic network model.
  • since the score distribution data output by the basic network model includes probability distribution data over the scoring scores, different functions suited to processing probability distributions need to be obtained as the first loss function and the second loss function of the basic network model.
  • with multiple loss functions, the difference between the score distribution data of each image to be scored and the initial score distribution data can be better captured, and the score distribution data can be better fitted. The basic network model can therefore be better trained through multiple loss functions to obtain a more accurate scoring model.
  • a function used to measure the distance between two distributions can be used to construct a first loss function.
  • the first loss function can be an Earth Mover's Distance (EMD) loss function.
  • the first loss function is:
  • where $p$ is the score distribution data; $N$ is the number of scoring scores, with $N = 10$; $k$ is the scoring score; $CDF_p(k)$ represents the cumulative probability value when the scoring score is $k$ in the score distribution data; $p_{s_i}$ represents the probability value of score $i$ in the score distribution data; and the corresponding term for the expected score distribution data represents the probability value of score $i$ in the expected score distribution data.
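  • The formula of the first loss function did not survive in this text, so the following is only a sketch of one widely used form of an EMD loss between two discrete score distributions: the normalized $r$-norm of the difference between their cumulative distributions. The exponent `r=2` and the function names are assumptions and may differ from the formula in the application.

```python
from itertools import accumulate


def emd_loss(pred, target, r=2):
    """r-norm distance between the CDFs of two score distributions."""
    n = len(pred)
    cdf_pred = list(accumulate(pred))       # CDF_p(k) = sum of p up to score k
    cdf_target = list(accumulate(target))
    total = sum(abs(a - b) ** r for a, b in zip(cdf_pred, cdf_target))
    return (total / n) ** (1.0 / r)


# Hypothetical three-score example: moving mass further costs more.
near = emd_loss([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
far = emd_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

  • Because it compares cumulative values, this loss penalizes a prediction more the further its probability mass lies from the expected scores, which a per-score comparison would not do.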
  • the second loss function may be a CJS loss function, and the second loss function is:
  • where $p$ is the score distribution data; $N$ is the number of scoring scores, with $N = 10$; $k$ is the scoring score; $CDF_p(k)$ represents the cumulative probability value when the scoring score is $k$ in the score distribution data; $p_{s_i}$ represents the probability value of score $i$ in the score distribution data; and the corresponding term for the expected score distribution data represents the probability value of score $i$ in the expected score distribution data.
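  • The formula of the second loss function is likewise missing from this text. As a hedged sketch, the code below implements one symmetric formulation of a cumulative Jensen-Shannon (CJS) divergence, in which the Jensen-Shannon form is applied to the cumulative distributions rather than to the probability values. This particular formulation and the function name are assumptions and may differ from the formula in the application.

```python
import math
from itertools import accumulate


def cjs_loss(pred, target):
    """Symmetric cumulative Jensen-Shannon divergence between two distributions."""
    cdf_pred = list(accumulate(pred))
    cdf_target = list(accumulate(target))
    total = 0.0
    for a, b in zip(cdf_pred, cdf_target):
        m = 0.5 * (a + b)                 # midpoint of the two CDF values
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return 0.5 * total


same = cjs_loss([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
different = cjs_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```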
  • the images to be scored in the image sample set are input into a basic network model such as the MobileNetV2 network model for aesthetic scoring, and the score distribution data corresponding to each image to be scored output by the basic network model is obtained.
  • the score distribution data includes the probability value corresponding to each score.
  • the output score distribution data is $p = [p_{s_1}, p_{s_2}, \ldots, p_{s_{N-1}}, p_{s_N}]$, where $p_{s_1}$ is the probability value when the scoring score is 1, $p_{s_2}$ is the probability value when the scoring score is 2, $p_{s_{N-1}}$ is the probability value when the scoring score is $N-1$, and $p_{s_N}$ is the probability value when the scoring score is $N$.
  • the score distribution data and expected score distribution data corresponding to each image to be scored are input into the first loss function, and the first loss value corresponding to each image to be scored is obtained.
  • the score distribution data and expected score distribution data corresponding to each image to be scored are input into the second loss function to obtain the second loss value corresponding to each image to be scored.
  • the electronic device can determine the target loss value according to the first loss value and the second loss value. Specifically, the first loss value can be multiplied by the first weight value to obtain the third loss value. And multiply the second loss value by the second weight value to obtain the fourth loss value. Finally, the third loss value and the fourth loss value are added to obtain the target loss value.
  • the first weight value and the second weight value can be set according to actual conditions, and the first weight value and the second weight value can be two values with equal values, or two values with different values. For example, the first weight value may be 0.6, and the second weight value may be 0.4. Or the first weight value and the second weight value may both be 0.5.
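  • The weighted combination of the two loss values can be written directly from the description above. The function name and the example loss values are hypothetical; the weights 0.6 and 0.4 are the example values given in the text.

```python
def target_loss(first_loss, second_loss, first_weight=0.6, second_weight=0.4):
    """Weighted sum of the first and second loss values."""
    third_loss = first_loss * first_weight      # weighted first loss
    fourth_loss = second_loss * second_weight   # weighted second loss
    return third_loss + fourth_loss


# Hypothetical loss values for one batch.
loss_unequal = target_loss(1.0, 2.0)             # weights 0.6 and 0.4
loss_equal = target_loss(1.0, 2.0, 0.5, 0.5)     # equal weights
```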
  • whether the basic network model meets the convergence condition is determined according to the target loss value.
  • when the target loss value gradually approaches a certain value, or fluctuates near a certain value, and the loss change is less than a small positive number, the basic network model can be confirmed to have converged. If the target loss value does not meet the above conditions, the basic network model does not meet the convergence condition.
  • in that case the target loss value is returned to the basic network model, and the network parameters of the basic network model, namely the weights and bias values, are adjusted according to the backpropagation algorithm. The adjusted basic network model then continues to be trained until the basic network model converges.
  • after the parameters are adjusted, a parameter-adjusted network model is obtained, and the electronic device can obtain a batch of verification images from the verification set and input them into the parameter-adjusted network model.
  • the electronic device can save the parameters of the network model after the parameter adjustment.
  • the electronic device may not save the network model after the parameter adjustment.
  • the electronic device can confirm the completion of the basic network model training, that is, the basic network model converges.
  • a convergent basic network model is obtained, and the convergent basic network model is used as a scoring model for aesthetic scoring of the image.
  • The scoring model can be applied to electronic devices to aesthetically score the user's images stored in the electronic device, sort the images by score, and display them according to the sorting result, so that the user first browses the images with higher aesthetic scores, that is, the images with higher aesthetic quality.
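The sort-then-display step can be sketched as below, assuming the scores have already been produced by the scoring model. The `(path, score)` tuple layout and the name `sort_by_score` are hypothetical.

```python
def sort_by_score(scored_images):
    """Order (image_path, score) pairs from highest to lowest aesthetic score."""
    return sorted(scored_images, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    album = [("a.jpg", 4), ("b.jpg", 8), ("c.jpg", 6)]
    for path, score in sort_by_score(album):
        print(path, score)
```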
  • The network model training method obtains an image sample set, which includes multiple images to be scored with initial score distribution data; constructs a basic network model and multiple loss functions corresponding to the basic network model; inputs the image sample set into the basic network model for aesthetic scoring to obtain the score distribution data corresponding to each image to be scored; trains the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges; and uses the converged basic network model as a scoring model for aesthetic scoring of images.
  • the basic network model can be trained through multiple loss functions to obtain a more accurate scoring model, which improves the accuracy of the scoring model.
  • FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the flow of the image processing method may include:
  • When the electronic device receives a touch operation on a target component, a preset voice operation, or an instruction to open a preset target application, generation of an aesthetic scoring request is triggered.
  • The electronic device can also automatically trigger generation of the aesthetic scoring request at a preset time interval or based on a certain trigger rule. For example, when the electronic device detects that the current display interface includes multiple images, such as when a browser application is launched to browse an article page containing images, it can automatically trigger generation of an aesthetic scoring request and aesthetically score the multiple images on the current page according to the scoring model. This allows the electronic device to sort the images by score and prioritize the display of images with high scores, that is, with good aesthetic quality.
  • the target image may be an image stored in an electronic device.
  • The aesthetic scoring request includes path information indicating the location where the target image is stored, and the electronic device can use the path information to obtain the target image that needs aesthetic scoring.
  • the electronic device can obtain the target image that needs to be aesthetically scored through a wired connection or a wireless connection according to the aesthetic score request.
  • the scoring model is obtained by training using the network model training method provided in this embodiment.
  • the target image is input to the scoring model for aesthetic scoring, so as to obtain the target scoring score corresponding to the target image.
  • the target scoring score can represent the aesthetic quality of the target image. The higher the target score, the higher the aesthetic quality of the target image, which means that the target image is more in line with the public's aesthetics.
  • The step of aesthetically scoring the target image according to the scoring model to obtain the target scoring score corresponding to the target image includes: performing aesthetic scoring on the target image according to the scoring model to obtain target scoring distribution data corresponding to the target image, the target scoring distribution data being the probability distribution data over the scoring scores; and using the scoring score with the largest probability value in the target scoring distribution data as the target scoring score.
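Selecting the score with the largest probability can be sketched as follows. The list layout (index 0 holds the probability of the lowest score) and the name `target_score` are assumptions; the example distribution is made up for illustration.

```python
def target_score(distribution, min_score=1):
    """Return the score whose probability is largest.

    `distribution[i]` is assumed to be the probability of score `min_score + i`.
    On ties, the lowest such score is returned.
    """
    best_index = max(range(len(distribution)), key=lambda i: distribution[i])
    return min_score + best_index


if __name__ == "__main__":
    probs = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.20, 0.08, 0.03, 0.01]
    print(target_score(probs))
```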
  • When the electronic device scores multiple target images in an album or image library stored on the device, after obtaining the target score for each target image it can also check whether the target score is greater than a preset score value, and delete the target images whose target score is less than or equal to the preset score value. A target score less than or equal to the preset score value indicates that the aesthetic quality of the target image is low, that is, the target image may be blurred or incompletely composed. Such images are very likely invalid images captured when the user pressed the capture button by accident, so they are not images the user needs to keep.
  • the electronic device of the present application can periodically trigger the corresponding aesthetic scoring request to filter multiple images stored in the electronic device through the scoring model, and intelligently delete the target image with the aesthetic score less than or equal to the preset score value. It can help users manage images in electronic devices more intelligently and save the memory space of electronic devices.
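The threshold rule above (keep images scoring strictly above the preset value, mark the rest for deletion) can be sketched as below. The sketch only partitions the image paths and does not touch the file system; the dictionary layout and the name `split_by_threshold` are hypothetical.

```python
def split_by_threshold(scores, preset_score):
    """Partition image paths into (keep, delete) lists.

    `scores` maps an image path to its target score. Images with a score
    less than or equal to `preset_score` are marked for deletion.
    """
    keep = [path for path, score in scores.items() if score > preset_score]
    delete = [path for path, score in scores.items() if score <= preset_score]
    return keep, delete


if __name__ == "__main__":
    keep, delete = split_by_threshold({"a.jpg": 7, "b.jpg": 3, "c.jpg": 4}, preset_score=4)
    print("keep:", keep)
    print("delete:", delete)
```

Returning the deletion candidates rather than removing them immediately leaves room for a confirmation step before any image is actually deleted; that step is a design choice of this sketch, not something the text prescribes.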
  • The image processing method provided by the embodiment of the application receives an aesthetic scoring request; obtains, according to the aesthetic scoring request, the target image that needs to be aesthetically scored; calls a pre-trained scoring model; and aesthetically scores the target image according to the scoring model to obtain the target score corresponding to the target image. In this way, the target image is aesthetically scored by a scoring model obtained through prior training.
  • FIG. 5 is a schematic structural diagram of a training device for a network model provided by an embodiment of the application.
  • the training device of the network model may include: a first acquisition module 41, a construction module 42, a first scoring module 43, a training module 44, and a determination module 45.
  • the first acquisition module 41 is configured to acquire an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data.
  • the construction module 42 is used to construct a basic network model and multiple loss functions corresponding to the basic network model.
  • the first scoring module 43 is configured to input the image sample set into the basic network model for aesthetic scoring, and obtain score distribution data corresponding to each image to be scored.
  • the training module 44 is configured to train the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges.
  • the determining module 45 is configured to use the convergent basic network model as a scoring model for aesthetic scoring of the image.
  • The first acquisition module 41 is specifically configured to: acquire an initial image sample set, the initial image sample set including a plurality of sample images with initial score distribution data; perform image preprocessing on each of the sample images to obtain a plurality of first sample images with initial score distribution data; add random noise to the first sample image to obtain a first target sample image; and obtain an image sample set according to the sample image and the first target sample image, the sample image and the first target sample image being used as the images to be scored.
  • The construction module 42 is also used to adjust the initial score distribution data corresponding to each image to be scored to obtain the corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data corresponding to each image to be scored is smaller than the standard deviation of the initial score distribution data.
  • When the construction module 42 adjusts the initial score distribution data corresponding to each image to be scored to obtain the corresponding expected score distribution data, it is specifically used to apply exponentiation to each scoring datum in the initial score distribution data corresponding to each image to be scored, obtaining the corresponding expected score distribution data.
  • the training module 44 is specifically configured to train the basic network model according to the score distribution data, the expected score distribution data, and the multiple loss functions.
  • the multiple loss functions include a first loss function and a second loss function.
  • The training module 44 is specifically configured to: input the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value; input the score distribution data and the expected score distribution data into the second loss function to obtain a second loss value; determine a target loss value according to the first loss value and the second loss value; and adjust the parameters of the basic network model according to the target loss value.
  • the training module 44 multiplies the first loss value by a first weight value to obtain a third loss value; and multiplies the second loss value by a second weight value to obtain a fourth loss value; The third loss value and the fourth loss value are added to obtain a target loss value.
  • The network model training device obtains an image sample set through the first acquisition module 41, the image sample set including a plurality of images to be scored with initial score distribution data; the construction module 42 constructs the basic network model and multiple loss functions corresponding to the basic network model; the first scoring module 43 inputs the image sample set into the basic network model for aesthetic scoring and obtains the score distribution data corresponding to each image to be scored; the training module 44 trains the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges; and the determining module 45 uses the converged basic network model as a scoring model for aesthetic scoring of the image.
  • the basic network model can be trained through multiple loss functions to obtain a more accurate scoring model, which improves the accuracy of the scoring model.
  • The network model training device provided in the embodiment of this application belongs to the same concept as the network model training method in the above embodiment. Any method provided in the embodiment of the network model training method can be run on the network model training device; for the specific implementation process, see the embodiment of the network model training method, which is not repeated here.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus 500 provided by an embodiment of the present application.
  • The image processing device may include: a receiving module 51, a second acquiring module 52, a calling module 53, and a second scoring module 54.
  • the receiving module 51 is configured to receive aesthetic scoring requests
  • the second acquiring module 52 is configured to acquire a target image that needs to be aesthetically scored according to the aesthetic score request;
  • the second scoring module 54 is configured to perform aesthetic scoring on the target image according to the scoring model to obtain a target scoring score corresponding to the target image;
  • the scoring model is obtained by training using the network model training method provided in the embodiment of the present application.
  • The second scoring module 54 is specifically configured to perform aesthetic scoring on the target image according to the scoring model to obtain target score distribution data corresponding to the target image, and to use the score with the largest probability value in the target score distribution data as the target score.
  • The image processing device receives the aesthetic scoring request through the receiving module 51; the second acquiring module 52 acquires, according to the aesthetic scoring request, the target image that needs to be aesthetically scored; the calling module 53 calls the pre-trained scoring model; and the second scoring module 54 performs aesthetic scoring on the target image according to the scoring model to obtain the target score corresponding to the target image. In this way, the target image is aesthetically scored by a scoring model obtained through prior training.
  • the image processing device provided in the embodiment of this application belongs to the same concept as the image processing method in the above embodiment, and any method provided in the embodiment of the image processing method can be run on the image processing device.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed on a computer, the computer executes the network model training method or the image processing method provided in the embodiments of the present application.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), etc.
  • An embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory.
  • The processor is configured to execute, by calling the computer program stored in the memory, the network model training method or the image processing method provided in the embodiments of the present application.
  • the above electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 7 is a schematic diagram of the first structure of an electronic device provided by an embodiment of this application.
  • the electronic device 600 may include components such as a memory 601 and a processor 602. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 7 does not constitute a limitation on the electronic device, and may include more or fewer components than shown in the figure, or a combination of certain components, or different component arrangements.
  • the memory 601 may be used to store software programs and modules.
  • the processor 602 executes various functional applications and data processing by running the computer programs and modules stored in the memory 601.
  • the memory 601 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, a computer program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data created through use of the electronic device, and the like.
  • The processor 602 is the control center of the electronic device. It uses various interfaces and lines to connect the parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 601 and calling the data stored in the memory 601, thereby monitoring the electronic device as a whole.
  • the memory 601 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the memory 601 may further include a memory controller to provide the processor 602 with access to the memory 601.
  • The processor 602 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601, thereby implementing the following process:
  • the image sample set including a plurality of images to be scored with initial score distribution data
  • the convergent basic network model is used as a scoring model for aesthetic scoring of images.
  • the processor 602 may execute:
  • the initial score distribution data corresponding to each image to be scored is adjusted to obtain corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data corresponding to each image to be scored is smaller than the standard deviation of the initial score distribution data.
  • the processor 602 when the processor 602 executes training the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions, it may execute:
  • The plurality of loss functions include a first loss function and a second loss function. When the processor 602 trains the basic network model according to the score distribution data, the expected score distribution data, and the plurality of loss functions, it may execute:
  • the parameters of the basic network model are adjusted according to the target loss value.
  • the processor 602 when the processor 602 determines the target loss value according to the first loss value and the second loss value, it may execute:
  • the third loss value and the fourth loss value are added to obtain a target loss value.
  • When the processor 602 adjusts the initial score distribution data corresponding to each image to be scored to obtain the corresponding expected score distribution data, it may execute:
  • When the processor 602 acquires the image sample set, it may execute:
  • the initial image sample set including a plurality of sample images with initial score distribution data
  • Image preprocessing is performed on each of the sample images to obtain a plurality of first sample images with initial score distribution data
  • An image sample set is obtained according to the sample image and the first target sample image, and the sample image and the first target sample image are used as images to be scored.
  • The processor 602 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601, thereby implementing the following process:
  • the scoring model is obtained by training using the network model training method provided in the embodiment of the present application.
  • When the processor 602 performs aesthetic scoring on the target image according to the scoring model to obtain the target scoring score corresponding to the target image, it may execute:
  • the scoring score with the largest probability value in the target scoring distribution data is used as the target scoring score.
  • FIG. 8 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application.
  • The electronic device further includes a display 603, a radio frequency circuit 604, an audio circuit 605, and a power supply 606.
  • the display 603, the radio frequency circuit 604, the audio circuit 605, and the power supply 606 are electrically connected to the processor 602, respectively.
  • the display 603 may be used to display information input by the user or information provided to the user, and various graphical user interfaces. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof.
  • the display 603 may include a display panel.
  • the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • The radio frequency circuit 604 may be used to transmit and receive radio frequency signals, so as to establish wireless communication with network equipment or other electronic equipment and exchange signals with them.
  • the audio circuit 605 can be used to provide an audio interface between the user and the electronic device through a speaker or a microphone.
  • the power supply 606 can be used to power various components of the electronic device 600.
  • the power supply 606 may be logically connected to the processor 602 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the electronic device 600 may also include a camera component, a Bluetooth module, etc.
  • The camera component may include an image processing circuit, which may be implemented by hardware and/or software components and may include various processing units that define an image signal processing (ISP) pipeline.
  • the image processing circuit may at least include: multiple cameras, an image signal processor (Image Signal Processor, ISP processor), a control logic, an image memory, a display, and the like.
  • Each camera may include at least one or more lenses and image sensors.
  • the image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured with each imaging pixel of the image sensor, and provide a set of raw image data that can be processed by the image signal processor.
  • The network model training device and the image processing device provided by the embodiments of the application belong to the same concept as the network model training method and the image processing method in the above embodiments. Any method provided in the corresponding method embodiment can be run on the corresponding device; for the specific implementation process, see the network model training method and image processing method embodiments, which are not repeated here.
  • The computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor; its execution may include the flow of the embodiments of the network model training method or the image processing method.
  • the storage medium may be a magnetic disk, an optical disc, a read only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), etc.
  • Each functional module can be integrated in one processing chip, or each module can exist alone physically, or two or more modules can be integrated in one module.
  • The above-mentioned integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

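The embodiments of the present application disclose a training method of a network model, an image processing method, a device, and an electronic device. The training method of the network model includes: obtaining an image sample set; constructing a basic network model and multiple loss functions corresponding to the basic network model; training the basic network model according to the image sample set and the multiple loss functions until the basic network model converges; and using the converged basic network model as a scoring model for image aesthetic scoring.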

Description

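Training method of a network model, image processing method, device, and electronic device
Technical Field
The present invention relates to the field of machine learning, and in particular to a training method of a network model, an image processing method, a device, and an electronic device.
Background
With the rapid development of the mobile Internet and the rapid spread of smartphones, visual content such as images and video grows daily, and the perception and understanding of this visual content has become a research direction spanning computer vision, computational photography, human psychology, and other disciplines. Among these, image aesthetics assessment is a recent research hotspot in computer vision perception and understanding. Image aesthetics reflects the human visual pursuit of and yearning for "beautiful" things, so visual aesthetic evaluation is of great significance in fields such as photography, advertising design, and the production of artistic works.
The rapid development of machine learning in recent years has greatly promoted reproducible, objective methods for evaluating image aesthetics. Machine learning, and deep learning systems in particular, can efficiently and accurately imitate the way humans think and process information, so using machine learning or deep learning methods to evaluate image aesthetics is an important research topic.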
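Summary of the Invention
The embodiments of the present application provide a training method of a network model, an image processing method, a device, and an electronic device, which can improve the accuracy of network model training.
In a first aspect, an embodiment of the present application provides a training method of a network model, including:
obtaining an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data;
constructing a basic network model and multiple loss functions corresponding to the basic network model;
inputting the image sample set into the basic network model for aesthetic scoring to obtain score distribution data corresponding to each image to be scored;
training the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges;
using the converged basic network model as a scoring model for aesthetic scoring of images.
In a second aspect, an embodiment of the present application provides an image processing method, including:
receiving an aesthetic scoring request;
obtaining, according to the aesthetic scoring request, a target image that needs to be aesthetically scored;
calling a pre-trained scoring model;
aesthetically scoring the target image according to the scoring model to obtain a target score corresponding to the target image;
wherein the scoring model is obtained by training with the training method of the network model provided in this embodiment.
In a third aspect, an embodiment of the present application provides a training device for a network model, including:
a first acquisition module, configured to obtain an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data;
a construction module, configured to construct a basic network model and multiple loss functions corresponding to the basic network model;
a first scoring module, configured to input the image sample set into the basic network model for aesthetic scoring to obtain score distribution data corresponding to each image to be scored;
a training module, configured to train the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges;
a determining module, configured to use the converged basic network model as a scoring model for aesthetic scoring of images.
In a fourth aspect, an embodiment of the present application provides an image processing device, including:
a receiving module, configured to receive an aesthetic scoring request;
a second acquiring module, configured to obtain, according to the aesthetic scoring request, a target image that needs to be aesthetically scored;
a calling module, configured to call a pre-trained scoring model;
a second scoring module, configured to aesthetically score the target image according to the scoring model to obtain a target score corresponding to the target image;
wherein the scoring model is obtained by training with the training method of the network model provided in this embodiment.
In a fifth aspect, an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the training method of the network model or the image processing method provided in this embodiment.
In a sixth aspect, an embodiment of the present application provides an electronic device including a memory and a processor, the memory storing a computer program, and the processor being configured to execute the following by calling the computer program stored in the memory:
obtaining an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data;
constructing a basic network model and multiple loss functions corresponding to the basic network model;
inputting the image sample set into the basic network model for aesthetic scoring to obtain score distribution data corresponding to each image to be scored;
training the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges;
using the converged basic network model as a scoring model for aesthetic scoring of images.
In a seventh aspect, an embodiment of the present application provides an electronic device including a memory and a processor, the memory storing a computer program, and the processor being configured to execute the following by calling the computer program stored in the memory:
receiving an aesthetic scoring request;
obtaining, according to the aesthetic scoring request, a target image that needs to be aesthetically scored;
calling a pre-trained scoring model;
aesthetically scoring the target image according to the scoring model to obtain a target score corresponding to the target image;
wherein the scoring model is obtained by training with the training method of the network model provided in the embodiments of the present application.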
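Brief Description of the Drawings
The technical solutions of the present application and their beneficial effects will become apparent from the following detailed description of specific embodiments in conjunction with the accompanying drawings.
FIG. 1 is a first schematic flowchart of the training method of a network model provided by an embodiment of the present application.
FIG. 2 is a second schematic flowchart of the training method of a network model provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of the distributions of the initial score distribution data and the expected score distribution data provided by an embodiment of the present application.
FIG. 4 is a schematic flowchart of the image processing method provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of the training device for a network model provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of the image processing device provided by an embodiment of the present application.
FIG. 7 is a first schematic structural diagram of the electronic device provided by an embodiment of the present application.
FIG. 8 is a second schematic structural diagram of the electronic device provided by an embodiment of the present application.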
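Detailed Description
Referring to the drawings, in which the same reference numerals denote the same components, the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed here.
An embodiment of the present application provides a training method of a network model. The method may be executed by the training device for a network model provided in the embodiments of the present application, or by an electronic device integrating that training device. The training device may be implemented in hardware or software, and the electronic device may be a device equipped with a processor and thus having processing capability, such as a smartphone, tablet computer, palmtop computer, notebook computer, or desktop computer. For ease of description, the following takes the electronic device as the execution subject of the training method.
Please refer to FIG. 1, which is a first schematic flowchart of the training method of a network model provided by an embodiment of the present application. The flow of the training method may include: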
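101. Obtain an image sample set.
The electronic device may obtain the image sample set through a wired or wireless connection. The image sample set may use the existing AVA database (A Large-Scale Database for Aesthetic Visual Analysis), the first large-scale aesthetic quality assessment database, referred to below as the AVA public data set. All sample images in the AVA public data set are used as the images to be scored in this application. The AVA public data set is a database for aesthetic quality assessment containing about 256,000 sample images, each of which has been aesthetically scored by multiple different users. The scores are natural numbers in [1, 10]; for example, a score may be 1, 2, or 10. Further, the initial score distribution data of a sample image can be generated from the scoring data of the different users; the initial score distribution data includes the number of raters for each score. A higher aesthetic score indicates higher aesthetic quality of the sample image.
Each sample image in the AVA public data set was scored by between 78 and 539 people, with an average of 210 people per sample image, so the large amount of scoring data for each sample image reflects the public's evaluation and perception of the image fairly well. The data set is therefore a recognized benchmark in the field of image aesthetics assessment, and training the original network model on the AVA public data set can yield a more accurate scoring model for image aesthetic scoring.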
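Since the initial score distribution data stores rater counts while the network outputs probabilities, one would typically normalize the counts before comparing the two. The sketch below assumes that normalization step, which the text does not spell out; the function name `counts_to_distribution` and the example vote counts are hypothetical.

```python
def counts_to_distribution(vote_counts):
    """Convert per-score rater counts into a probability distribution.

    `vote_counts[i]` is assumed to be the number of raters who gave score i + 1.
    """
    total = sum(vote_counts)
    if total == 0:
        raise ValueError("an image needs at least one rating")
    return [count / total for count in vote_counts]


if __name__ == "__main__":
    # Made-up counts for scores 1..10 from 210 raters.
    counts = [0, 2, 8, 20, 50, 60, 40, 20, 8, 2]
    print(counts_to_distribution(counts))
```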
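102. Construct a basic network model and multiple loss functions corresponding to the basic network model.
Because lightweight deep neural network models are smaller and faster, they are widely used in embedded electronic devices such as smartphones. A lightweight deep neural network model can therefore be constructed and used as the basic network model, so that the trained basic network model can be deployed on electronic devices such as smartphones, enabling the smartphone to aesthetically score images and further improving its intelligence. A MobileNets model may be used as the basic network model. Specifically, the core of MobileNets is the depthwise separable convolution, composed of a depthwise convolution (depthwise conv) and a pointwise convolution (point conv); it is this structure that allows the MobileNets model to reduce network parameters and computation without reducing network performance.
In some embodiments, a MobileNetV2 network model may be constructed as the basic network model. Further, so that the score distribution data output by the basic network model lies in [0, 1], the softmax function may be used as the output function of the output layer of the basic network model, so that the output score distribution data includes a probability value for each score.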
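The effect of a softmax output layer can be illustrated with a standalone function. This is a plain-Python sketch rather than the layer of any particular framework; the input `logits` stands for the raw outputs of the last layer and is a hypothetical name.

```python
import math


def softmax(logits):
    """Map raw outputs to probabilities in [0, 1] that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Ten raw outputs, one per score 1..10 (values are made up).
    raw = [0.1, 0.3, 0.8, 1.5, 2.2, 2.4, 1.9, 1.0, 0.4, 0.1]
    print(softmax(raw))
```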
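In some embodiments, after the MobileNetV2 network model is constructed, it also needs to be pre-trained on the ImageNet database, and the pre-trained MobileNetV2 network model is used as the basic network model. ImageNet is a large visual database used in research on visual object recognition software. It contains more than 15 million images, of which 1.2 million are divided into 1000 categories (about 1 million images have bounding boxes and annotations). Pre-training the MobileNetV2 network model on ImageNet yields a MobileNetV2 network model with good network parameters at the end of training, and using this pre-trained model as the basic network model can greatly shorten the training time of the basic network model.
Because the score distribution data output by the basic network model is a probability distribution over the scores, functions suited to probability distributions need to be chosen as the multiple loss functions of the basic network model. Acting together, the multiple loss functions can better capture the difference between the score distribution data of each image to be scored and its initial score distribution data, and better fit the score distribution data. The basic network model can therefore be trained better with multiple loss functions, yielding a more accurate scoring model.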
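103. Input the image sample set into the basic network model for aesthetic scoring to obtain the score distribution data corresponding to each image to be scored.
The images to be scored in the image sample set are input into the basic network model, such as the MobileNetV2 network model, for aesthetic scoring, and the basic network model outputs the score distribution data corresponding to each image to be scored. The score distribution data includes a probability value for each score. For example, the output score distribution data is $p = [p_{s_1}, p_{s_2}, \ldots, p_{s_{N-1}}, p_{s_N}]$, where $p_{s_1}$ is the probability value for a score of 1, $p_{s_2}$ is the probability value for a score of 2, $p_{s_{N-1}}$ is the probability value for a score of N-1, and $p_{s_N}$ is the probability value for a score of N.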
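104. Train the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges.
So that the score distribution data output by the basic network model stays consistent with the initial score distribution data from users' real scores, the basic model can be trained with multiple loss functions. The training process may be as follows. After a batch of images to be scored has been aesthetically scored, the score distribution data and the initial score distribution data of the batch are input into the multiple loss functions to compute the corresponding target loss value. Whether the basic network model meets the convergence condition is determined according to the target loss value: when the target loss value gradually approaches a certain value, or fluctuates near a certain value with the loss change smaller than a small positive number, the basic network model can be confirmed to have converged. If the target loss value does not meet these conditions, the basic network model does not meet the convergence condition; the target loss value is then propagated back to the basic network model, and the network parameters of the basic network model, namely the weights and biases, are adjusted according to the backpropagation algorithm. Images to be scored that have not yet been scored by the basic network model are then obtained, and the adjusted basic network model continues to be trained on these images with the multiple loss functions until the basic network model converges.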
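105. Use the converged basic network model as a scoring model for aesthetic scoring of images.
The converged basic network model is used as the scoring model for aesthetic scoring of images. The scoring model can be applied in an electronic device to aesthetically score multiple images the user has stored on the device, sort the images by score, and display them according to the sorting result, so that the user first browses the images with higher aesthetic scores, that is, higher aesthetic quality.
As can be seen from the above, the training method of a network model provided by the embodiments of the present application obtains an image sample set, the image sample set including a plurality of images to be scored with initial score distribution data; constructs a basic network model and multiple loss functions corresponding to the basic network model; inputs the image sample set into the basic network model for aesthetic scoring to obtain the score distribution data corresponding to each image to be scored; trains the basic network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the basic network model converges; and uses the converged basic network model as a scoring model for aesthetic scoring of images. Training the basic network model with multiple loss functions in this way yields a more accurate scoring model.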
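Please refer to FIG. 2, which is a second schematic flowchart of the training method of a network model provided by an embodiment of the present application. The flow of the training method may include:
201. Obtain an initial image sample set.
The initial image sample set includes a plurality of sample images with initial score distribution data. The initial image sample set may use the existing AVA public data set, a database for aesthetic quality assessment containing about 256,000 sample images, each of which has been aesthetically scored by multiple different users. The scores are natural numbers in [1, 10], that is, there may be 10 possible scores. Further, the initial score distribution data of a sample image can be generated from the scoring data of the different users; the initial score distribution data includes the number of raters for each score. A higher aesthetic score indicates higher aesthetic quality of the sample image.
Each sample image in the AVA public data set was scored by between 78 and 539 people, with an average of 210 people per sample image, so the large amount of scoring data for each sample image reflects the public's evaluation and perception of the image fairly well. The data set is therefore a recognized benchmark in the field of image aesthetics assessment, and training the original network model on the AVA public data set can yield a more accurate scoring model for image aesthetic scoring.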
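202. Perform image preprocessing on each sample image to obtain a plurality of first sample images with initial score distribution data.
Image preprocessing includes conventional multi-scale scaling, random flipping, random translation, random cropping, and the like. Preprocessing the sample images yields multiple first sample images, which are also input to the basic model as images to be scored, increasing the amount of input data. However, because information such as composition and color must serve as the basis for aesthetic scoring, this application does not apply large-scale random cropping or random color changes, so that each first sample image stays close to its corresponding sample image in composition and color. Random scaling, random flipping, and random translation do not change the composition or color of a sample image much. In other words, the first sample image is close to the corresponding sample image in composition, color, and similar information, so the initial score distribution data of the corresponding sample image can be used as the initial score distribution data of the first sample image.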
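203. Add random noise to the first sample image to obtain a first target sample image.
Because the first sample image is obtained by randomly altering the sample image (for example by multi-scale scaling, random flipping, random translation, or random cropping), small differences necessarily exist between the first sample image and the sample image. Although these small differences are not visually distinguishable, they directly affect the score distribution data output by the basic network model, so that the score distribution data of the first sample image differs from that of the corresponding sample image. The small differences introduced by preprocessing therefore affect the aesthetic score of the resulting first sample image and make the score distribution data predicted by the basic network model unstable.
To address this, small random noise such as Gaussian noise can be added to every pixel of the first sample image to obtain the first target sample image. For ordinary images, small changes in pixel values produce no visually distinguishable difference; that is, they are very hard for the human eye to detect. To the user, the first target sample image looks the same as the first sample image, so the user's scores for the two would not differ. Under ideal conditions, the score distribution data that the basic network model produces for the first sample image and for the first target sample image should therefore not differ much either. Training the basic network model with the first target sample images lets it adapt to small changes in the input sample images, making it more robust to small changes in pixel values and making the predicted score distribution data more stable.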
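Adding small Gaussian noise to every pixel can be sketched as follows. The sketch assumes an 8-bit grayscale image stored as a nested list, and the noise standard deviation `sigma` and the name `add_gaussian_noise` are hypothetical; the text does not specify the noise magnitude.

```python
import random


def add_gaussian_noise(image, sigma=2.0, seed=None):
    """Return a copy of `image` with zero-mean Gaussian noise added to each pixel.

    `image` is assumed to be a list of rows of integer pixel values in 0..255.
    Results are rounded and clipped back into the 0..255 range.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            value = pixel + rng.gauss(0.0, sigma)
            noisy_row.append(min(255, max(0, int(round(value)))))
        noisy.append(noisy_row)
    return noisy


if __name__ == "__main__":
    first_sample_image = [[0, 128, 255], [10, 20, 30]]
    print(add_gaussian_noise(first_sample_image, sigma=2.0, seed=0))
```

A small `sigma` keeps the change below what a viewer would notice, which is what allows the noisy image to reuse the label of the image it was derived from.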
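204. Use both the sample images and the first target sample images as images to be scored to obtain the image sample set.
Using both the sample images and the first target sample images as images to be scored gives an image sample set containing more images to be scored than the initial image sample set. This larger image sample set is better suited to training the basic network model and makes the predictions of the trained scoring model more accurate. The sample images and the first target sample images are then used as images to be scored to train the basic network model.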
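205. Apply exponentiation to each scoring datum in the initial score distribution data corresponding to each image to be scored to obtain the corresponding expected score distribution data.
Although the initial score distribution data of each image to be scored reflects the public's evaluation and perception of the image fairly well, its standard deviation is large and the scores are mostly concentrated in the middle, that is, in [3, 7], so the initial score distribution data is relatively flat. When the basic network model is trained on this initial score distribution data, the score distribution data output by the trained scoring model will also have a large standard deviation. Images with different scores then have similar aesthetic quality and are hard to tell apart in detail. For example, images that the trained scoring model scores anywhere from 3 to 7 all have about the same aesthetic quality, so the score cannot reflect well how good or bad an image's aesthetic quality is.
Exponentiation therefore needs to be applied to each scoring datum in the initial score distribution data to obtain expected score distribution data with a smaller standard deviation. For example, each scoring datum in the initial score distribution data is raised to a power to obtain the expected score distribution data.
Please refer to FIG. 3, which is a schematic diagram of the distributions of the initial score distribution data and the expected score distribution data provided by an embodiment of the present application. Curve A is the initial score distribution data of an image to be scored; the abscissa in FIG. 3 is the score and the ordinate is the probability value for that score. With an exponent of 2, each scoring datum in the initial score distribution data is squared to obtain expected score distribution data with a smaller standard deviation, shown as curve B in FIG. 3. As FIG. 3 shows, the probability distribution over scores in curve B is more concentrated, which indicates that the standard deviation of curve B is smaller than that of curve A. Exponentiation can therefore be applied to each scoring datum in the initial score distribution data to obtain expected score distribution data with a smaller standard deviation, and training the basic network model on this expected score distribution data yields a scoring model whose outputs have a smaller standard deviation.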
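The squaring step can be sketched as below. The renormalization so that the result sums to 1 is an assumption of this sketch (FIG. 3 plots probabilities on the ordinate, but the text does not state the normalization explicitly), and the function names `sharpen` and `std_dev` as well as the example distribution are hypothetical.

```python
def sharpen(distribution, exponent=2.0):
    """Raise each probability to `exponent`, then renormalize (assumed step)."""
    powered = [p ** exponent for p in distribution]
    total = sum(powered)
    return [p / total for p in powered]


def std_dev(distribution, min_score=1):
    """Standard deviation of the score under `distribution`."""
    scores = range(min_score, min_score + len(distribution))
    mean = sum(s * p for s, p in zip(scores, distribution))
    variance = sum(p * (s - mean) ** 2 for s, p in zip(scores, distribution))
    return variance ** 0.5


if __name__ == "__main__":
    # Made-up, fairly flat initial distribution over scores 1..10 ("curve A").
    initial = [0.01, 0.02, 0.07, 0.15, 0.25, 0.25, 0.15, 0.07, 0.02, 0.01]
    expected = sharpen(initial, exponent=2.0)  # "curve B"
    print(std_dev(initial), std_dev(expected))
```

Because exponentiation acts on each score independently, a distribution with two peaks keeps both peaks after sharpening.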
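In some embodiments, still referring to FIG. 3, a Gaussian function of the score can instead be obtained and multiplied by the initial score distribution data to obtain the expected score distribution data, shown as curve C in FIG. 3. This Gaussian function has a small standard deviation, that is, its probability distribution is concentrated, so multiplying it by the initial score distribution data reduces the standard deviation of the initial score distribution data and yields the expected score distribution data. However, the probability distribution of a Gaussian function is rigid: it has only a single peak near the middle value. The probability distribution of an image's scores may instead have multiple peaks, for example in a polarized case where some people find the image very attractive and others find it very unattractive. If the probability value is large at a score of 3 and also large at a score of 8, the initial score distribution data has two peaks, at scores 3 and 8. Using a Gaussian function to reduce the standard deviation in this case pulls the high-probability scores toward the middle value, for example a score of 5, making the resulting expected score distribution data inaccurate. Whether to adjust the initial score distribution data with a Gaussian function therefore needs to be decided according to the actual situation.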
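The caveat about multi-peak distributions can be seen in a small sketch. The Gaussian's center (5.5) and standard deviation (1.5), the renormalization step, and the function names are all assumptions chosen for illustration; the text gives no concrete parameters.

```python
import math


def gaussian_weights(num_scores=10, center=5.5, sigma=1.5):
    """Unnormalized Gaussian weight for each score 1..num_scores (assumed parameters)."""
    return [math.exp(-((k - center) ** 2) / (2 * sigma ** 2))
            for k in range(1, num_scores + 1)]


def reweight_with_gaussian(distribution, center=5.5, sigma=1.5):
    """Multiply a score distribution by a Gaussian and renormalize (assumed step)."""
    weights = gaussian_weights(len(distribution), center, sigma)
    product = [p * w for p, w in zip(distribution, weights)]
    total = sum(product)
    return [v / total for v in product]


if __name__ == "__main__":
    # Made-up polarized distribution with peaks at scores 3 and 8.
    bimodal = [0.02, 0.08, 0.30, 0.08, 0.02, 0.02, 0.08, 0.30, 0.08, 0.02]
    adjusted = reweight_with_gaussian(bimodal)
    # Probability mass on the middle scores 4..7 before and after reweighting.
    print(sum(bimodal[3:7]), sum(adjusted[3:7]))
```

In this example the mass on the middle scores grows after reweighting even though few raters actually chose them, which is the distortion the paragraph above warns about.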
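206. Construct a base network model and a first loss function and a second loss function for the base network model.
Lightweight deep neural network models are smaller and faster, and are therefore widely used on embedded electronic devices such as smartphones. A lightweight deep neural network model can thus be constructed and used as the base network model, so that the trained model can run on an electronic device such as a smartphone, enabling the smartphone to score images aesthetically and making it more intelligent. A MobileNets model can be used as the base network model. The core of MobileNets is the depthwise separable convolution, composed of a depthwise convolution and a pointwise convolution; this structure lets MobileNets reduce the number of parameters and the amount of computation without degrading network performance.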
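In some implementations, a MobileNetV2 network model can be constructed and used as the base network model. Further, so that every value in the output score distribution data lies in [0, 1], the softmax function can be used as the output function of the output layer, so that the score distribution data output by the base network model contains the probability of each score.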
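As a reminder of what the softmax output layer does, here is a minimal standalone sketch that turns raw per-score activations into probabilities. The input values are made up, and subtracting the maximum is a standard numerical-stability step rather than something the text specifies.

```python
import math


def softmax(logits):
    """Map raw activations to probabilities that lie in (0, 1) and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical activations for four score bins.
probs = softmax([0.5, 1.5, -1.0, 2.0])
```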
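In some implementations, after the MobileNetV2 network model is constructed, it is also pre-trained on ImageNet, and the pre-trained MobileNetV2 network model is used as the base network model. ImageNet is a large visual database for research on visual object recognition software. It contains more than 15 million images, of which 1.2 million are divided into 1000 categories (about 1 million images have bounding boxes and annotations). Pre-training on ImageNet gives the MobileNetV2 network model good initial parameters, and using the pre-trained model as the base network model can greatly shorten the training time of the base network model.
Because the score distribution data output by the base network model is a probability distribution over the scores, different functions suited to probability distributions are used as the first loss function and the second loss function of the base network model. Acting together, the two loss functions better capture the difference between the predicted score distribution data of each image to be scored and its initial score distribution data, and fit the score distribution data better. Training the base network model with multiple loss functions can therefore yield a more accurate scoring model.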
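Specifically, the first loss function can be built from a function that measures the distance between two distributions. The first loss function can be the Earth Mover's Distance (EMD) loss function, given by:
Figure PCTCN2019118135-appb-000005
where $\hat{p}$ is the expected score distribution data, $p$ is the score distribution data, $N$ is the number of scores ($N = 10$), $CDF_p(k)$ and $CDF_{\hat{p}}(k)$ are cumulative distribution functions, $k$ is a score, $CDF_p(k)$ is the cumulative probability of the score distribution data at score $k$, $CDF_{\hat{p}}(k)$ is the cumulative probability of the expected score distribution data at score $k$, and $l$ is the exponent. In some implementations, $l = 2$.
Here $CDF_p(k) = \sum_{i=1}^{k} p_{s_i}$, where $k$ is a score and $p_{s_i}$ is the probability of score $i$ in the score distribution data; and $CDF_{\hat{p}}(k) = \sum_{i=1}^{k} \hat{p}_{s_i}$, where $k$ is a score and $\hat{p}_{s_i}$ is the probability of score $i$ in the expected score distribution data.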
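Since the loss equation itself is only available as an image, the sketch below assumes the form in which EMD losses on score distributions are commonly written, $EMD(p, \hat{p}) = \left(\frac{1}{N}\sum_{k=1}^{N} \left|CDF_p(k) - CDF_{\hat{p}}(k)\right|^{l}\right)^{1/l}$, which uses exactly the symbols defined above. Treat it as an illustration under that assumption; the names `cdf` and `emd_loss` are hypothetical.

```python
def cdf(dist):
    """Cumulative sums: cdf(dist)[k-1] = dist[0] + ... + dist[k-1]."""
    total, out = 0.0, []
    for p in dist:
        total += p
        out.append(total)
    return out


def emd_loss(pred, target, l=2):
    """Assumed EMD form: l-norm of the CDF difference, averaged over N scores."""
    n = len(pred)
    diffs = [abs(a - b) ** l for a, b in zip(cdf(pred), cdf(target))]
    return (sum(diffs) / n) ** (1.0 / l)


# Moving all mass from score 1 to score 3 costs more than moving it to score 2.
far = emd_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
near = emd_loss([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Working on cumulative values is what makes the loss sensitive to how far apart two scores are, which a bin-by-bin comparison would ignore.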
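Specifically, the second loss function can be the CJS loss function, given by:
Figure PCTCN2019118135-appb-000013
where $\hat{p}$ is the expected score distribution data, $p$ is the score distribution data, $N$ is the number of scores ($N = 10$), $CDF_p(k)$ and $CDF_{\hat{p}}(k)$ are cumulative distribution functions, $k$ is a score, $CDF_p(k)$ is the cumulative probability of the score distribution data at score $k$, $CDF_{\hat{p}}(k)$ is the cumulative probability of the expected score distribution data at score $k$, and $l$ is the exponent. In some implementations, $l = 2$.
Here $CDF_p(k) = \sum_{i=1}^{k} p_{s_i}$, where $k$ is a score and $p_{s_i}$ is the probability of score $i$ in the score distribution data; and $CDF_{\hat{p}}(k) = \sum_{i=1}^{k} \hat{p}_{s_i}$, where $k$ is a score and $\hat{p}_{s_i}$ is the probability of score $i$ in the expected score distribution data.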
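The text names the second loss only as "CJS", and its equation is available only as an image. Assuming CJS refers to a cumulative Jensen-Shannon divergence, one symmetric Jensen-Shannon-style form computed on the two cumulative distributions can be sketched as follows. The exact form, the 0.5 weighting, and the names are assumptions, and the exponent $l$ mentioned in the text is not used here.

```python
import math


def cdf(dist):
    """Cumulative sums of a discrete distribution over scores 1..N."""
    total, out = 0.0, []
    for p in dist:
        total += p
        out.append(total)
    return out


def cjs_loss(pred, target):
    """Symmetric Jensen-Shannon-style divergence between the two CDFs (assumed form)."""
    loss = 0.0
    for a, b in zip(cdf(pred), cdf(target)):
        m = 0.5 * (a + b)
        # Terms with a zero cumulative value contribute nothing (0 * log 0 := 0).
        if a > 0:
            loss += 0.5 * a * math.log(a / m)
        if b > 0:
            loss += 0.5 * b * math.log(b / m)
    return loss


p = [0.1, 0.2, 0.4, 0.3]
q = [0.3, 0.4, 0.2, 0.1]
value = cjs_loss(p, q)
```

In this form each per-score term is non-negative and vanishes only when the two cumulative values agree, so the loss is zero exactly when the two distributions coincide.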
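207. Input the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored.
The images to be scored in the image sample set are input into the base network model, for example the MobileNetV2 network model, which outputs the score distribution data of each image to be scored. The score distribution data contains the probability of each score. For example, the output score distribution data is $p = \{p_{s_1}, p_{s_2}, \ldots, p_{s_{N-1}}, p_{s_N}\}$, where $p_{s_1}$ is the probability of score 1, $p_{s_2}$ is the probability of score 2, $p_{s_{N-1}}$ is the probability of score $N-1$, and $p_{s_N}$ is the probability of score $N$.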
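208. Input the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value.
The score distribution data and the expected score distribution data of each image to be scored are input into the first loss function to obtain the first loss value of that image.
209. Input the score distribution data and the expected score distribution data into the second loss function to obtain a second loss value.
The score distribution data and the expected score distribution data of each image to be scored are input into the second loss function to obtain the second loss value of that image.
210. Determine a target loss value from the first loss value and the second loss value.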
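After the first loss value and the second loss value are obtained, the electronic device can determine the target loss value from them. Specifically, the first loss value is multiplied by a first weight value to obtain a third loss value, the second loss value is multiplied by a second weight value to obtain a fourth loss value, and the third and fourth loss values are added to obtain the target loss value. The first and second weight values can be set as needed and may be equal or different. For example, the first weight value can be 0.6 and the second weight value 0.4, or both can be 0.5.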
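The weighted combination is simple enough to state directly. The function name and the example loss values below are illustrative; the default weights 0.6 and 0.4 are the ones given as an example in the text.

```python
def target_loss(first_loss, second_loss, w1=0.6, w2=0.4):
    """Weighted sum of the two loss values."""
    third_loss = w1 * first_loss    # first loss value times first weight value
    fourth_loss = w2 * second_loss  # second loss value times second weight value
    return third_loss + fourth_loss


# Hypothetical loss values for one image.
total = target_loss(2.0, 1.0)
equal_weights = target_loss(2.0, 1.0, w1=0.5, w2=0.5)
```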
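211. Adjust the parameters of the base network model according to the target loss value until the base network model converges.
Whether the base network model satisfies the convergence condition is determined from the target loss value. The base network model can be considered converged when the target loss value gradually approaches a certain value, or fluctuates around a certain value with a change in loss smaller than some small positive number. Otherwise the base network model has not converged; the target loss value is propagated back through the base network model, and the network parameters, that is, the weights and biases, are adjusted with the backpropagation algorithm. Training then continues on the adjusted base network model until it converges.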
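In some implementations, each time a batch of images to be scored has been used to train the base network model, a network model with adjusted parameters is obtained. The electronic device can take a batch of validation images from a validation set and input them into the adjusted network model to measure its accuracy. If the accuracy is higher than the previous one, the electronic device can save the adjusted parameters; if it is lower, the electronic device need not save them. When the accuracy of the adjusted network model stops increasing over several rounds, for example when the successive accuracies are 87%, 86.9%, 86.7%, and 86.8%, the electronic device can conclude that training is finished, that is, the base network model has converged.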
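The two stopping criteria described above (a nearly constant loss, or a validation accuracy that no longer improves) might be checked along these lines. The tolerance `tol`, the window size, and the `patience` count are hypothetical parameters; the text gives no concrete thresholds.

```python
def loss_converged(losses, tol=1e-4, window=3):
    """True when the last `window` loss changes are all smaller than `tol`."""
    if len(losses) <= window:
        return False
    recent = losses[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tol for i in range(window))


def accuracy_plateaued(accuracies, patience=3):
    """True when none of the last `patience` accuracies beats the earlier best."""
    if len(accuracies) <= patience:
        return False
    best_before = max(accuracies[:-patience])
    return all(a <= best_before for a in accuracies[-patience:])


# The accuracy sequence used as an example in the text.
stop = accuracy_plateaued([0.87, 0.869, 0.867, 0.868])
```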
212. Use the converged base network model as the scoring model for aesthetically scoring images.
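The converged base network model is obtained and used as the scoring model for aesthetically scoring images. The scoring model can be deployed on an electronic device to score the images the user has stored on the device, sort the images by score, and display them in that order, so that the user sees images with higher aesthetic scores, that is, higher aesthetic quality, first.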
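The display ordering can be sketched as a plain descending sort. The file names and scores are made up, and `rank_by_score` is an illustrative name.

```python
def rank_by_score(scored_images):
    """Sort (name, score) pairs from highest to lowest score; ties keep input order."""
    return sorted(scored_images, key=lambda item: item[1], reverse=True)


# Hypothetical album with already-computed aesthetic scores.
album = [("a.jpg", 4), ("b.jpg", 8), ("c.jpg", 6)]
ordered = rank_by_score(album)
```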
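As can be seen from the above, the network model training method provided by the embodiments of this application obtains an image sample set that includes multiple images to be scored with initial score distribution data; constructs a base network model and multiple loss functions for it; inputs the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored; trains the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges; and uses the converged base network model as the scoring model for aesthetically scoring images. Training the base network model with multiple loss functions in this way yields a scoring model with higher accuracy.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of the image processing method provided by an embodiment of this application. The flow of the image processing method may include:
301. Receive an aesthetic scoring request.
The aesthetic scoring request is generated when the electronic device receives a touch operation on a target component, a preset voice operation, or an instruction to open a preset target application. The electronic device can also generate the request automatically at preset intervals or according to a trigger rule. For example, when the electronic device detects that the current display interface contains multiple images, such as when a browser application is opened to view an article page containing images, it can automatically generate an aesthetic scoring request and score the images on the current page with the scoring model. The electronic device can then sort the images by score and show images with higher scores, that is, better aesthetic quality, first.
302. Obtain the target image to be aesthetically scored according to the aesthetic scoring request.
The target image can be an image stored on the electronic device. In that case the aesthetic scoring request includes path information indicating where the target image is stored, and the electronic device uses the path information to obtain the target image. When the target image is not stored on the electronic device, the electronic device can obtain it over a wired or wireless connection according to the aesthetic scoring request.
303. Invoke the pre-trained scoring model.
The scoring model is trained with the network model training method provided by this embodiment. For the training process, see the description of the embodiments above; it is not repeated here.
304. Aesthetically score the target image with the scoring model to obtain the target score of the target image.
The target image is input into the scoring model for aesthetic scoring to obtain its target score. The target score represents the aesthetic quality of the target image: the higher the target score, the higher the aesthetic quality, that is, the better the image matches public taste.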
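In some implementations, the step of aesthetically scoring the target image with the scoring model to obtain the target score includes: scoring the target image with the scoring model to obtain the target score distribution data of the target image, which is the probability distribution over the scores; and taking the score with the largest probability in the target score distribution data as the target score.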
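Picking the target score from the output distribution amounts to an argmax. The sketch assumes scores are numbered from 1, as in the description of the model output; the distribution values are made up.

```python
def target_score(distribution):
    """Return the score (1-based) whose probability is largest."""
    best_index = max(range(len(distribution)), key=lambda i: distribution[i])
    return best_index + 1


# Hypothetical target score distribution data over scores 1..4.
score = target_score([0.1, 0.2, 0.5, 0.2])
```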
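In some implementations, when the electronic device scores multiple target images in an album or image library stored on the device, after obtaining the target score of each target image it can also check whether the target score is greater than a preset score value and delete target images whose target score is less than or equal to the preset score value. A target score at or below the preset value indicates low aesthetic quality: the image may be blurry or incompletely composed. Such images were most likely captured by accident when the user pressed the shutter button unintentionally; the user does not need to keep them, and they take up storage space on the electronic device. The electronic device can therefore periodically trigger an aesthetic scoring request, screen the stored images with the scoring model, and automatically delete target images whose aesthetic score is less than or equal to the preset score value, helping the user manage images more intelligently and saving storage space.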
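The screening rule can be sketched as a filter that only selects candidates and leaves the actual deletion to the caller. The file names, scores, and the preset score value of 4 are made up for illustration.

```python
def select_for_deletion(scores, preset_score):
    """Names of images whose target score is less than or equal to the preset value."""
    return [name for name, score in scores.items() if score <= preset_score]


# Hypothetical target scores for three stored images.
stored = {"blurry.jpg": 2, "sunset.jpg": 7, "cropped.jpg": 4}
to_delete = select_for_deletion(stored, preset_score=4)
```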
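As can be seen from the above, the image processing method provided by the embodiments of this application receives an aesthetic scoring request; obtains the target image to be aesthetically scored according to the request; invokes the pre-trained scoring model; and scores the target image with the scoring model to obtain its target score. The target image is thus aesthetically scored by the pre-trained scoring model.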
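Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the network model training apparatus provided by an embodiment of this application. The training apparatus may include a first acquisition module 41, a construction module 42, a first scoring module 43, a training module 44, and a determination module 45.
The first acquisition module 41 is configured to obtain an image sample set that includes multiple images to be scored with initial score distribution data.
The construction module 42 is configured to construct a base network model and multiple loss functions for the base network model.
The first scoring module 43 is configured to input the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored.
The training module 44 is configured to train the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges.
The determination module 45 is configured to use the converged base network model as the scoring model for aesthetically scoring images.
In some implementations, the first acquisition module 41 is specifically configured to: obtain an initial image sample set that includes multiple sample images with initial score distribution data; preprocess each sample image to obtain multiple first sample images with initial score distribution data; add random noise to the first sample images to obtain first target sample images; and obtain the image sample set from the sample images and the first target sample images, using both as images to be scored.
In some implementations, before constructing the base network model and the multiple loss functions, the construction module 42 is further configured to adjust the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data, where the standard deviation of the expected score distribution data of each image is smaller than that of its initial score distribution data.
In some implementations, when adjusting the initial score distribution data of each image to be scored, the construction module 42 is specifically configured to exponentiate each score datum in the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data.
In some implementations, the training module 44 is specifically configured to train the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions.
In some implementations, the multiple loss functions include a first loss function and a second loss function, and the training module 44 is specifically configured to: input the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value; input the score distribution data and the expected score distribution data into the second loss function to obtain a second loss value; determine a target loss value from the first loss value and the second loss value; and adjust the parameters of the base network model according to the target loss value.
In some implementations, the training module 44 is specifically configured to: multiply the first loss value by a first weight value to obtain a third loss value; multiply the second loss value by a second weight value to obtain a fourth loss value; and add the third loss value and the fourth loss value to obtain the target loss value.
As can be seen from the above, in the network model training apparatus provided by the embodiments of this application, the first acquisition module 41 obtains an image sample set that includes multiple images to be scored with initial score distribution data; the construction module 42 constructs a base network model and multiple loss functions for it; the first scoring module 43 inputs the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored; the training module 44 trains the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges; and the determination module 45 uses the converged base network model as the scoring model for aesthetically scoring images. Training the base network model with multiple loss functions in this way yields a scoring model with higher accuracy.
It should be noted that the network model training apparatus provided by the embodiments of this application is based on the same concept as the network model training method in the embodiments above. Any method provided in the training method embodiments can run on the training apparatus; for the implementation details, see the training method embodiments, which are not repeated here.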
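Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the image processing apparatus 500 provided by an embodiment of this application. The image processing apparatus may include a receiving module 51, a second acquisition module 52, an invoking module 53, and a second scoring module 54.
The receiving module 51 is configured to receive an aesthetic scoring request.
The second acquisition module 52 is configured to obtain the target image to be aesthetically scored according to the aesthetic scoring request.
The invoking module 53 is configured to invoke the pre-trained scoring model.
The second scoring module 54 is configured to aesthetically score the target image with the scoring model to obtain the target score of the target image.
The scoring model is trained with the network model training method provided by the embodiments of this application.
In some implementations, the second scoring module 54 is specifically configured to score the target image with the scoring model to obtain the target score distribution data of the target image, and to take the score with the largest probability in the target score distribution data as the target score.
As can be seen from the above, in the image processing apparatus provided by the embodiments of this application, the receiving module 51 receives an aesthetic scoring request; the second acquisition module 52 obtains the target image to be aesthetically scored according to the request; the invoking module 53 invokes the pre-trained scoring model; and the second scoring module 54 scores the target image with the scoring model to obtain its target score. The target image is thus aesthetically scored by the pre-trained scoring model.
It should be noted that the image processing apparatus provided by the embodiments of this application is based on the same concept as the image processing method in the embodiments above. Any method provided in the image processing method embodiments can run on the image processing apparatus; for the implementation details, see the image processing method embodiments, which are not repeated here.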
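An embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed on a computer, causes the computer to perform the network model training method or the image processing method provided by the embodiments of this application.
The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
An embodiment of this application further provides an electronic device including a memory and a processor. The memory stores a computer program, and the processor invokes the computer program stored in the memory to perform the network model training method or the image processing method provided by the embodiments of this application.
For example, the electronic device may be a mobile terminal such as a tablet computer or a smartphone. Referring to FIG. 7, FIG. 7 is a first schematic structural diagram of the electronic device provided by an embodiment of this application.
The electronic device 600 may include components such as a memory 601 and a processor 602. A person skilled in the art will understand that the structure shown in FIG. 7 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 601 can store software programs and modules, and the processor 602 performs various functional applications and data processing by running the computer programs and modules stored in the memory 601. The memory 601 may mainly include a program storage area and a data storage area. The program storage area can store an operating system and the computer programs required by at least one function (such as a sound playback function or an image playback function); the data storage area can store data created through use of the electronic device.
The processor 602 is the control center of the electronic device. It connects the parts of the electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing the application programs stored in the memory 601 and invoking the data stored in the memory 601, thereby monitoring the electronic device as a whole.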
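In addition, the memory 601 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 601 may further include a memory controller to provide the processor 602 with access to the memory 601.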
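In this embodiment, the processor 602 of the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601 to implement the following flow:
obtaining an image sample set that includes multiple images to be scored with initial score distribution data;
constructing a base network model and multiple loss functions for the base network model;
inputting the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored;
training the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges;
using the converged base network model as the scoring model for aesthetically scoring images.
In some implementations, before constructing the base network model and the multiple loss functions for the base network model, the processor 602 may perform:
adjusting the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data, where the standard deviation of the expected score distribution data of each image is smaller than that of its initial score distribution data.
In some implementations, when training the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions, the processor 602 may perform:
training the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions.
In some implementations, the multiple loss functions include a first loss function and a second loss function, and when training the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions, the processor 602 may perform:
inputting the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value;
inputting the score distribution data and the expected score distribution data into the second loss function to obtain a second loss value;
determining a target loss value from the first loss value and the second loss value;
adjusting the parameters of the base network model according to the target loss value.
In some implementations, when determining the target loss value from the first loss value and the second loss value, the processor 602 may perform:
multiplying the first loss value by a first weight value to obtain a third loss value;
multiplying the second loss value by a second weight value to obtain a fourth loss value;
adding the third loss value and the fourth loss value to obtain the target loss value.
In some implementations, when adjusting the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data, the processor 602 may perform:
exponentiating each score datum in the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data.
In some implementations, when obtaining the image sample set, the processor 602 may perform:
obtaining an initial image sample set that includes multiple sample images with initial score distribution data;
preprocessing each sample image to obtain multiple first sample images with initial score distribution data;
adding random noise to the first sample images to obtain first target sample images;
obtaining the image sample set from the sample images and the first target sample images, using both as images to be scored.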
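In this embodiment, the processor 602 of the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601 to implement the following flow:
receiving an aesthetic scoring request;
obtaining the target image to be aesthetically scored according to the aesthetic scoring request;
invoking the pre-trained scoring model;
aesthetically scoring the target image with the scoring model to obtain the target score of the target image;
where the scoring model is trained with the network model training method provided by the embodiments of this application.
In some implementations, when aesthetically scoring the target image with the scoring model to obtain the target score of the target image, the processor 602 may perform:
scoring the target image with the scoring model to obtain the target score distribution data of the target image;
taking the score with the largest probability in the target score distribution data as the target score.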
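Referring to FIG. 8, FIG. 8 is a second schematic structural diagram of the electronic device provided by an embodiment of this application. It differs from the electronic device shown in FIG. 7 in that the electronic device further includes a display 603, a radio frequency circuit 604, an audio circuit 605, and a power supply 606, each of which is electrically connected to the processor 602.
The display 603 can display information entered by the user or provided to the user, as well as various graphical user interfaces, which can be composed of graphics, text, icons, video, and any combination of these. The display 603 may include a display panel, which in some implementations can take the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
The radio frequency circuit 604 can transmit and receive radio frequency signals to establish wireless communication with a network device or another electronic device and exchange signals with it.
The audio circuit 605 can provide an audio interface between the user and the electronic device through a speaker and a microphone.
The power supply 606 can power the components of the electronic device 600. In some embodiments, the power supply 606 can be logically connected to the processor 602 through a power management system, which manages charging, discharging, and power consumption.
Although not shown in FIG. 8, the electronic device 600 may further include a camera assembly, a Bluetooth module, and the like. The camera assembly may include an image processing circuit, which can be implemented with hardware and/or software components and may include various processing units that define an image signal processing pipeline. The image processing circuit may include at least multiple cameras, an image signal processor (ISP), control logic, an image memory, and a display. Each camera may include at least one or more lenses and an image sensor. The image sensor may include a color filter array (such as a Bayer filter). The image sensor can capture the light intensity and wavelength information recorded by each of its imaging pixels and provide a set of raw image data that the image signal processor can process.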
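In the embodiments above, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, see the detailed description of the network model training method / image processing method above; they are not repeated here.
The apparatus for the network model training method / image processing method provided by the embodiments of this application is based on the same concept as the network model training method / image processing method in the embodiments above. Any method provided in those method embodiments can run on the apparatus; for the implementation details, see the method embodiments, which are not repeated here.
It should be noted that a person of ordinary skill in the art will understand that all or part of the flow of the network model training method / image processing method described in the embodiments of this application can be carried out by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and its execution may include the flow of the embodiments of the network model training method / image processing method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the apparatus of the network model training method / image processing method of the embodiments of this application, the functional modules can be integrated in one processing chip, can each exist physically on their own, or can be combined with two or more modules integrated into one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The network model training method, image processing method, apparatus, storage medium, and electronic device provided by the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.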

Claims (20)

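  1. A method for training a network model, comprising:
    obtaining an image sample set, the image sample set including multiple images to be scored with initial score distribution data;
    constructing a base network model and multiple loss functions for the base network model;
    inputting the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored;
    training the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges;
    using the converged base network model as a scoring model for aesthetically scoring images.
  2. The method according to claim 1, wherein before the step of constructing a base network model and multiple loss functions for the base network model, the method further comprises:
    adjusting the initial score distribution data of each image to be scored to obtain corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data of each image to be scored is smaller than the standard deviation of the initial score distribution data;
    and the step of training the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions comprises:
    training the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions.
  3. The method according to claim 2, wherein the multiple loss functions include a first loss function and a second loss function, and the step of training the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions comprises:
    inputting the score distribution data and the expected score distribution data into the first loss function to obtain a first loss value;
    inputting the score distribution data and the expected score distribution data into the second loss function to obtain a second loss value;
    determining a target loss value from the first loss value and the second loss value;
    adjusting the parameters of the base network model according to the target loss value.
  4. The method according to claim 3, wherein the step of determining a target loss value from the first loss value and the second loss value comprises:
    multiplying the first loss value by a first weight value to obtain a third loss value;
    multiplying the second loss value by a second weight value to obtain a fourth loss value;
    adding the third loss value and the fourth loss value to obtain the target loss value.
  5. The method according to claim 2, wherein the step of adjusting the initial score distribution data of each image to be scored to obtain corresponding expected score distribution data comprises:
    exponentiating each score datum in the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data.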
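  6. The method according to any one of claims 3 to 5, wherein the first loss function is:
    Figure PCTCN2019118135-appb-100001
    where $\hat{p}$ is the expected score distribution data, $p$ is the score distribution data, $N$ is the number of scores, $CDF_p(k)$ and $CDF_{\hat{p}}(k)$ are cumulative distribution functions, $k$ is a score, $CDF_p(k)$ is the cumulative probability of the score distribution data at score $k$, $CDF_{\hat{p}}(k)$ is the cumulative probability of the expected score distribution data at score $k$, and $l$ is the exponent;
    and the second loss function is:
    Figure PCTCN2019118135-appb-100005
    where $\hat{p}$ is the expected score distribution data, $p$ is the score distribution data, $N$ is the number of scores, $CDF_p(k)$ and $CDF_{\hat{p}}(k)$ are cumulative distribution functions, $k$ is a score, $CDF_p(k)$ is the cumulative probability of the score distribution data at score $k$, and $CDF_{\hat{p}}(k)$ is the cumulative probability of the expected score distribution data at score $k$.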
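  7. The method according to any one of claims 1 to 5, wherein the step of obtaining an image sample set comprises:
    obtaining an initial image sample set, the initial image sample set including multiple sample images with initial score distribution data;
    preprocessing each sample image to obtain multiple first sample images with initial score distribution data;
    adding random noise to the first sample images to obtain first target sample images;
    using both the sample images and the first target sample images as images to be scored to obtain the image sample set.
  8. An image processing method, comprising:
    receiving an aesthetic scoring request;
    obtaining a target image to be aesthetically scored according to the aesthetic scoring request;
    invoking a pre-trained scoring model;
    aesthetically scoring the target image with the scoring model to obtain a target score of the target image;
    wherein the scoring model is trained with the method for training a network model according to any one of claims 1 to 7.
  9. The method according to claim 8, wherein the step of aesthetically scoring the target image with the scoring model to obtain a target score of the target image comprises:
    scoring the target image with the scoring model to obtain target score distribution data of the target image;
    taking the score with the largest probability in the target score distribution data as the target score.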
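  10. An apparatus for training a network model, comprising:
    a first acquisition module, configured to obtain an image sample set, the image sample set including multiple images to be scored with initial score distribution data;
    a construction module, configured to construct a base network model and multiple loss functions for the base network model;
    a first scoring module, configured to input the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored;
    a training module, configured to train the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges;
    a determination module, configured to use the converged base network model as a scoring model for aesthetically scoring images.
  11. An image processing apparatus, comprising:
    a receiving module, configured to receive an aesthetic scoring request;
    a second acquisition module, configured to obtain a target image to be aesthetically scored according to the aesthetic scoring request;
    an invoking module, configured to invoke a pre-trained scoring model;
    a second scoring module, configured to aesthetically score the target image with the scoring model to obtain a target score of the target image;
    wherein the scoring model is trained with the method for training a network model according to any one of claims 1 to 7.
  12. A storage medium storing a computer program which, when run on a computer, causes the computer to perform the method for training a network model according to any one of claims 1 to 7 or the image processing method according to any one of claims 8 to 9.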
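  13. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor invokes the computer program stored in the memory to perform:
    obtaining an image sample set, the image sample set including multiple images to be scored with initial score distribution data;
    constructing a base network model and multiple loss functions for the base network model;
    inputting the image sample set into the base network model for aesthetic scoring to obtain the score distribution data of each image to be scored;
    training the base network model according to the score distribution data, the initial score distribution data, and the multiple loss functions until the base network model converges;
    using the converged base network model as a scoring model for aesthetically scoring images.
  14. The electronic device according to claim 13, wherein the processor is configured to perform:
    adjusting the initial score distribution data of each image to be scored to obtain corresponding expected score distribution data, wherein the standard deviation of the expected score distribution data of each image to be scored is smaller than the standard deviation of the initial score distribution data;
    training the base network model according to the score distribution data, the expected score distribution data, and the multiple loss functions.
  15. The electronic device according to claim 14, wherein the processor is configured to perform:
    inputting the score distribution data and the expected score distribution data into a first loss function to obtain a first loss value;
    inputting the score distribution data and the expected score distribution data into a second loss function to obtain a second loss value;
    determining a target loss value from the first loss value and the second loss value;
    adjusting the parameters of the base network model according to the target loss value.
  16. The electronic device according to claim 15, wherein the processor is configured to perform:
    multiplying the first loss value by a first weight value to obtain a third loss value;
    multiplying the second loss value by a second weight value to obtain a fourth loss value;
    adding the third loss value and the fourth loss value to obtain the target loss value.
  17. The electronic device according to claim 14, wherein the processor is configured to perform:
    exponentiating each score datum in the initial score distribution data of each image to be scored to obtain the corresponding expected score distribution data.
  18. The electronic device according to any one of claims 13 to 17, wherein the processor is configured to perform:
    obtaining an initial image sample set, the initial image sample set including multiple sample images with initial score distribution data;
    preprocessing each sample image to obtain multiple first sample images with initial score distribution data;
    adding random noise to the first sample images to obtain first target sample images;
    using the sample images and the first target sample images as images to be scored to obtain the image sample set.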
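  19. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor invokes the computer program stored in the memory to perform:
    receiving an aesthetic scoring request;
    obtaining a target image to be aesthetically scored according to the aesthetic scoring request;
    invoking a pre-trained scoring model;
    aesthetically scoring the target image with the scoring model to obtain a target score of the target image;
    wherein the scoring model is trained with the method for training a network model according to any one of claims 1 to 7.
  20. The electronic device according to claim 19, wherein the processor is configured to perform:
    scoring the target image with the scoring model to obtain target score distribution data of the target image;
    taking the score with the largest probability in the target score distribution data as the target score.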
PCT/CN2019/118135 2019-11-13 2019-11-13 网络模型的训练方法、图像的处理方法、装置及电子设备 WO2021092808A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/118135 WO2021092808A1 (zh) 2019-11-13 2019-11-13 网络模型的训练方法、图像的处理方法、装置及电子设备
CN201980100428.4A CN114402356A (zh) 2019-11-13 2019-11-13 网络模型的训练方法、图像的处理方法、装置及电子设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/118135 WO2021092808A1 (zh) 2019-11-13 2019-11-13 网络模型的训练方法、图像的处理方法、装置及电子设备

Publications (1)

Publication Number Publication Date
WO2021092808A1 true WO2021092808A1 (zh) 2021-05-20

Family

ID=75911632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118135 WO2021092808A1 (zh) 2019-11-13 2019-11-13 网络模型的训练方法、图像的处理方法、装置及电子设备

Country Status (2)

Country Link
CN (1) CN114402356A (zh)
WO (1) WO2021092808A1 (zh)

Also Published As

Publication number Publication date
CN114402356A (zh) 2022-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19952830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19952830

Country of ref document: EP

Kind code of ref document: A1
