CN113592765A - Image processing method, device, equipment and storage medium

Info

Publication number
CN113592765A
CN113592765A
Authority
CN
China
Prior art keywords
image
sample
evaluation
network module
twin network
Prior art date
Legal status
Pending
Application number
CN202110133070.4A
Other languages
Chinese (zh)
Inventor
高洵 (Gao Xun)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110133070.4A
Publication of CN113592765A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application relate to the technical field of image processing and disclose an image processing method, apparatus, device, and storage medium. The method includes: acquiring an image to be processed; and performing image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed. The image evaluation model is constructed on a trained twin network module, which is obtained by training the twin network module on the evaluation results of image sample pairs. An evaluation result indicates the evaluation levels of the positive sample and the negative sample in an image sample pair and is obtained from the pair's fusion feature; the fusion feature is obtained from the positive sample feature and the negative sample feature, which are extracted by the first and second feature extraction branches of the twin network module, respectively. The scheme improves the generalization of image processing.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
In the information age, images are the most intuitive form of presentation, and images with a reasonable layout or rich colors generally attract more attention and carry higher use value. As a result, many real-life scenarios now process images with artificial intelligence technology.
In existing practice, images are scored according to manually established scoring rules, which are strongly subjective: different people hold different views of the scoring standard in different application scenarios, so manually designed scoring rules tend to generalize poorly. It is therefore worthwhile to study how to improve the generalization of image processing.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, image processing equipment and a storage medium, and the generalization of image processing can be improved.
In one aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed;
performing image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module. The trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair, where the evaluation result indicates the evaluation levels of the positive sample and the negative sample in the image sample pair. The evaluation result is obtained based on a fusion feature of the image sample pair; the fusion feature is obtained based on a positive sample feature and a negative sample feature; the positive sample feature is obtained by performing feature extraction on the positive sample using a first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample using a second feature extraction branch of the twin network module.
Correspondingly, an embodiment of the present application provides an image processing apparatus, including:
the acquisition unit is used for acquiring an image to be processed;
the processing unit is used for carrying out image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module. The trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair, where the evaluation result indicates the evaluation levels of the positive sample and the negative sample in the image sample pair. The evaluation result is obtained based on a fusion feature of the image sample pair; the fusion feature is obtained based on a positive sample feature and a negative sample feature; the positive sample feature is obtained by performing feature extraction on the positive sample using a first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample using a second feature extraction branch of the twin network module.
Accordingly, an embodiment of the present application provides an image processing device, where the device includes an input interface and further includes:
a processor adapted to implement one or more instructions;
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to execute the above-described image processing method.
Accordingly, an embodiment of the present application further provides a computer storage medium, where the computer storage medium stores one or more instructions, and the one or more instructions are adapted to be loaded by a processor to execute the image processing method described above.
Accordingly, embodiments of the present application provide a computer program product or a computer program comprising a computer program stored in a computer storage medium; a processor reads the computer program from the computer storage medium and executes it, causing the image processing device to perform the image processing method described above.
In the embodiments of this application, the image to be processed is acquired and an image evaluation model is then invoked to evaluate it, yielding the evaluation value of the image to be processed. The image evaluation model is constructed based on a trained twin network module, the twin network module is trained based on evaluation results of image sample pairs, and each image sample pair comprises a positive sample and a negative sample. Because each training pass of the twin network module operates on a pair of image samples (a positive sample and a negative sample), a model built on the twin network learns the nuances between images, which enhances the generalization of image processing.
Drawings
To illustrate the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a twin network module training method provided in an embodiment of the present application;
FIG. 2a is a schematic diagram of a model training process for image processing provided in an embodiment of the present application;
FIG. 2b is a schematic diagram of a feature extraction process provided in an embodiment of the present application;
FIG. 2c is a schematic diagram of an image evaluation model construction method provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image evaluation model training method provided in an embodiment of the present application;
FIG. 4a is a schematic diagram of an image evaluation model training process provided in an embodiment of the present application;
FIG. 4b is a schematic diagram of an image to be processed provided in an embodiment of the present application;
FIG. 5 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an image processing device provided in an embodiment of the present application.
Detailed Description
Artificial intelligence technology uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, yielding theories, methods, techniques, and application systems that can perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied across the fields of artificial intelligence and is an interdisciplinary subject spanning probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. Machine learning studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures, so that the computer's performance continually improves.
The embodiments of this application make full use of artificial intelligence and machine learning technology and provide an image processing method. The method mainly performs image evaluation processing on an image to be processed through an image evaluation model to obtain an evaluation value of that image, where the image to be processed can be any image and the image evaluation model is constructed based on a trained twin network module. Before constructing the image evaluation model, the embodiments of this application also train the twin network module with a deep learning algorithm on the evaluation results of a large number of image sample pairs. An image sample pair comprises a positive sample and a negative sample: the positive sample may be, for example, a manually selected higher-quality image (such as one with high definition and bright, vivid colors), and the negative sample may be an image of lower quality than the positive sample (such as one with low definition and dim colors). The evaluation result can be understood as the comparison result of the positive sample and the negative sample, for example: the quality level of the positive sample is higher than that of the negative sample, or vice versa. The twin network module is obtained by inputting paired data for comparative training (such as comparing the quality of the positive and negative samples in an image sample pair); the specific comparative training process (the twin network module training process) is described in detail in the following embodiments and is not repeated here. It can therefore be understood that the image evaluation model constructed on the trained twin network module carries an objective evaluation system, and evaluating images with this model significantly improves the generalization of image processing.
In one embodiment, the image processing method mainly analyzes and evaluates the content dimensions of the image to be processed to obtain its evaluation value, so it can be applied in a variety of image processing scenarios that require image evaluation, such as scoring picture content (poster picture preference, video cover selection, and the like) and scoring picture quality. For video cover selection, an image cluster can be assembled from the images corresponding to a number of frames extracted from a video, and the image processing method then scores the content of every image in the cluster so that the highest-scoring image is chosen as the video cover; for poster selection, the method can pick clear images with prominent themes out of a cluster of poster images. The evaluation value of the image to be processed may indicate its quality or its suitability (its applicability in a given scene); for example, it may evaluate the richness of colors, the integrity of the subject, brightness, and sharpness, or the degree of fit and vividness of the subject in the image. A sketch of the cover-selection flow is given below.
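As a concrete illustration of the video-cover scenario, the following Python sketch scores each extracted frame with the trained image evaluation model and keeps the highest-scoring frame as the cover. All names here (such as score_image) are hypothetical; the patent defines no concrete API, and the frame-extraction step is assumed to happen elsewhere.

```python
# Hypothetical sketch of the video-cover selection flow described above.
# `score_image` stands in for the trained image evaluation model.
from typing import Any, Callable, List

def pick_video_cover(frames: List[Any],
                     score_image: Callable[[Any], float]) -> Any:
    """Score every frame in the image cluster and return the best one."""
    best_frame, best_score = None, float("-inf")
    for frame in frames:
        score = score_image(frame)
        if score > best_score:
            best_frame, best_score = frame, score
    return best_frame
```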
In an embodiment, the image processing method may be carried out entirely in one image processing device; for example, the device may be a server that acquires the image to be processed and evaluates it with the image evaluation model. In another embodiment, the method may be split across different image processing devices; for example, a first device may be a terminal and a second device may be a server in communication with the terminal, where the terminal acquires the image to be processed and the server performs the evaluation.
Before the image processing device applies the image processing method, the embodiments of this application may also have a model training device train the twin network module, by training an image prediction model constructed on the twin network module, and then train the image evaluation model. In an embodiment, the model training device and the image processing device may be the same device or different devices; the embodiments of this application are not limited in this respect.
The embodiments of this application are described below with the image processing device and the model training device being the same device.
Referring to fig. 1, fig. 1 is a schematic flowchart of a twin network module training method provided in an embodiment of this application. As shown in fig. 1, the method includes:
s101, acquiring an image sample pair.
In one embodiment, the image sample pair comprises a positive sample and a negative sample that differ noticeably, the positive sample being of better quality (in brightness, sharpness, color richness, subject integrity, and so on) than the negative sample. If data from online services (existing image processing services such as video cover selection or book cover selection) is used directly as training data (image sample pairs), noise in the service data may harm training. Taking image sample pairs drawn from a video file as an example: if the positive sample is selected manually and the negative sample is selected at random, the randomly selected negative sample may actually be of better quality than the manually selected positive sample, violating the assumption that the positive sample is the better image and thereby degrading the training of the image prediction model. To avoid this, and to enlarge the number of image sample pairs, the image processing device in this embodiment builds an image cluster from frames randomly captured from a video and selects negative samples from the cluster images other than the positive sample; in this way the training data can reach the million level, solving the problem of insufficient training data.
In another embodiment, because a video may be largely flat and unchanging overall, the positive and negative samples in a selected pair may not differ noticeably, which again harms the training of the image prediction model. Therefore, when selecting a negative sample, the image processing device first computes the similarity between the positive sample and each image in the image cluster, keeps the images whose similarity is below a similarity threshold, and selects the negative sample from those images, so that each pair exhibits a clear quality difference. Specifically, the image processing device applies a difference hash algorithm to obtain hash values for the two images, computes the Hamming distance between the images from those hash values, and determines the positive and negative samples of the pair based on that distance, further avoiding the unsatisfactory training described above. Optionally, the image processing device may take two images whose Hamming distance is below a distance threshold as an image sample pair; for example, it may combine two images whose Hamming distance is smaller than 50 into one pair. The difference hash algorithm proceeds by scaling the image, converting it to grayscale, computing the differences between adjacent pixel values, and computing the fingerprint (the key information of the image).
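The difference-hash step can be pictured with a short Python sketch. This is a minimal, conventional dHash implementation; the 8x8 fingerprint geometry and the use of the PIL library are assumptions, since the patent names only the steps (scale, grayscale, adjacent-pixel differences, fingerprint), not these parameters.

```python
# Minimal difference-hash (dHash) sketch; the 9x8 geometry and PIL are
# conventional assumptions, not specified by the patent.
from PIL import Image

def dhash(image: Image.Image, hash_size: int = 8) -> int:
    # Scale the image and convert it to grayscale.
    small = image.convert("L").resize((hash_size + 1, hash_size))
    pixels = list(small.getdata())
    fingerprint = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            # Each bit records whether a pixel is brighter than its neighbour.
            fingerprint = (fingerprint << 1) | int(left > right)
    return fingerprint

def hamming_distance(a: int, b: int) -> int:
    # Number of differing bits between the two fingerprints.
    return bin(a ^ b).count("1")
```

Two images whose fingerprints lie within the distance threshold (the text uses 50 as an example) would then be combined into an image sample pair.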
From the embodiments described above it follows that, in a specific embodiment, the image processing device may acquire unlabeled samples as follows: it first acquires an image cluster, takes the image that the user selected within the cluster as the positive sample, determines a negative sample among the remaining images of the cluster, and generates an image sample pair from the positive and negative samples. Each positive sample thus has a corresponding image sample set, which may contain N image sample pairs (N > 1, N an integer). For example, an image sample set may contain 20 pairs (N = 20) formed as: image 1 + random image 1, image 1 + random image 2, ..., image 1 + random image 20.
Illustratively, take the cover image of video A (a video in the cover picture service) as an example and assume the similarity threshold is 25%. Video A includes image 1, image 2, image 3, and image 4, so the image cluster can be represented as {image 1, image 2, image 3, image 4}. Image 1 is the cover picture selected by a user in the service, while images 2, 3, and 4 were captured at random from video A. The image processing device takes image 1 as the positive sample and computes its similarity to each of the other images; suppose the results are: image 1 and image 2, 90%; image 1 and image 3, 20%; image 1 and image 4, 10%. The images whose similarity falls below the threshold are then images 3 and 4, so the device selects either image 3 or image 4 as the negative sample; if it selects image 3, it combines image 1 and image 3 into one image sample pair. In this way the image processing device can obtain more image clusters and hence more image sample pairs. Because the device draws its training data from image data already present in the video cover picture service, it fully exploits the service's advantages and saves labor cost (for example, by avoiding manually creating training data). A sketch of this pair-building step follows.
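Putting the selection logic above into code, a hedged sketch of the pair-building step might look as follows. The 25% threshold and the 20-pair count come from the examples above; the similarity function and all helper names are assumptions supplied for illustration.

```python
# Hypothetical sketch of image-sample-pair construction: the user-selected
# cover is the positive sample; negatives are drawn only from images whose
# similarity to the positive falls below the threshold.
import random
from typing import Any, Callable, List, Tuple

def build_image_sample_pairs(positive: Any,
                             cluster: List[Any],
                             similarity: Callable[[Any, Any], float],
                             threshold: float = 0.25,
                             n_pairs: int = 20) -> List[Tuple[Any, Any]]:
    candidates = [img for img in cluster
                  if img is not positive and similarity(positive, img) < threshold]
    negatives = random.sample(candidates, min(n_pairs, len(candidates)))
    return [(positive, neg) for neg in negatives]
```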
S102, performing feature extraction on the positive sample in the image sample pair through the first feature extraction branch of the current twin network module to obtain positive sample features, and performing feature extraction on the negative sample in the image sample pair through the second feature extraction branch of the current twin network module to obtain negative sample features.
In one embodiment, referring to fig. 2a, the twin network module 201 is a component of the image prediction model and contains two parallel network branches (a first feature extraction branch and a second feature extraction branch) that share network weights. The two branches map the image sample pair into a new space, where the pair forms a new representation; that is, the two branches perform feature extraction on the positive and negative samples in the image sample pair to obtain their feature representations (such as feature maps) within the twin network module.
In a specific embodiment, the processing relationship between the two branches of the twin network module and the positive and negative samples is shown in fig. 2b. As shown there, the image processing device does not fix which branch processes which sample: in the feature extraction process, the first feature extraction branch may extract the features of the negative sample or those of the positive sample. For example, in one training pass, when the first branch extracts the negative sample features, the second branch extracts the positive sample features.
Illustratively, the two network branches in the twin network module may be built from a classical neural network, such as the Visual Geometry Group network (VGGNet) or GoogLeNet. In the embodiments of this application, each network branch in the twin network module is an SE-ResNet-101 (Squeeze-and-Excitation ResNet-101) network: its structure is ResNet-101 (a 101-layer residual network) augmented with attention modules (Squeeze-and-Excitation blocks, SE blocks), which introduce an attention mechanism over the channel dimension. For example, the image processing device may set the output dimension of both branches to 100; if the outputs of the two branches are merged (the positive sample feature and the negative sample feature are fused), the resulting feature dimension for the image sample pair is 200. The layer-by-layer structure of SE-ResNet-101 is shown in the following table:
(Table omitted: the layer-by-layer structure of SE-ResNet-101 appears only as an image, Figure BDA0002924583870000081, in the original publication.)
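For orientation, a minimal PyTorch sketch of the twin module follows. A torchvision ResNet-101 stands in for SE-ResNet-101 (the squeeze-and-excitation attention blocks are omitted for brevity), and the 100-dimensional branch output matches the dimension stated above; treat this as an assumption-laden sketch, not the patented network.

```python
# Sketch of a twin (Siamese) module: two branches realized by one shared
# backbone. ResNet-101 is used here as a stand-in for SE-ResNet-101.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class TwinNetworkModule(nn.Module):
    def __init__(self, feature_dim: int = 100):
        super().__init__()
        backbone = resnet101(weights=None)
        # Replace the classifier with a 100-dim feature projection.
        backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)
        self.backbone = backbone  # one weight set shared by both branches

    def forward(self, image_a: torch.Tensor, image_b: torch.Tensor):
        # The first and second feature extraction branches share weights,
        # so the same backbone maps both inputs into a common feature space.
        return self.backbone(image_a), self.backbone(image_b)
```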
S103, fusing the positive sample features and the negative sample features to obtain the fusion feature of the image sample pair, and analyzing the fusion feature to obtain the evaluation result of the image sample pair.
In an embodiment, continuing with fig. 2a, the image prediction model further includes a feature fusion module and a first feature analysis module. After obtaining the positive sample feature and the negative sample feature of the image sample pair, the image processing device fuses them to obtain the fusion feature of the pair, and the first feature analysis module then analyzes the fusion feature to obtain the evaluation result of the pair. The evaluation result may be, for example, a comparison of the fitness levels of the positive and negative samples: 1 may represent that the input picture of the first feature extraction branch has better fitness than that of the second feature extraction branch, and 0 may represent the reverse.
In a specific embodiment, the first feature analysis module may be, for example, a fully connected layer. The feature fusion module may predict the fitness ranking of the positive and negative samples and then fuse the features according to that ranking. Optionally, the fusion may be a concatenation of the two images' features: the image feature A of the image with the higher fitness is placed at the front of the fusion feature, and the image feature B of the image with the lower fitness is concatenated after it as the rear part of the fusion feature.
For example, assume an image sample pair comprising positive sample A and negative sample B, with possible evaluation results 1 and 0, where 1 means the input picture of the first feature extraction branch has better fitness than that of the second and 0 means the reverse. Feature extraction is performed on positive sample A through the first branch of the twin network module to obtain positive sample feature A1, and on negative sample B through the second branch to obtain negative sample feature B1. A1 and B1 are then input into the feature fusion module: if the module predicts that the image fitness corresponding to A1 is higher than that corresponding to B1, it generates fusion feature A1B1; if it predicts that B1's image fitness is higher, it generates fusion feature B1A1. The first feature analysis module then parses the ordering of the positive and negative sample features within the fusion feature to obtain the fitness ranking of samples A and B: from A1B1 it obtains evaluation result 1 (the fitness of positive sample A is better than that of negative sample B), and from B1A1 it obtains evaluation result 0 (the fitness of negative sample B is better than that of positive sample A). A simplified sketch of this step follows.
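In the sketch below, the order-aware stitching described above is reduced to a plain concatenation of the two 100-dimensional branch outputs, followed by a fully connected head that emits logits over the two evaluation results {0, 1}. The layer sizes and this simplification are assumptions, not the patent's exact modules.

```python
# Sketch of feature fusion (concatenation into a 200-dim feature) plus the
# first feature analysis module (a fully connected layer over two classes).
import torch
import torch.nn as nn

class FusionAndAnalysis(nn.Module):
    def __init__(self, feature_dim: int = 100):
        super().__init__()
        self.analysis = nn.Linear(2 * feature_dim, 2)

    def forward(self, feat_first: torch.Tensor, feat_second: torch.Tensor):
        fused = torch.cat([feat_first, feat_second], dim=1)  # 200-dim fusion
        return self.analysis(fused)  # logits: 1 = first branch's input better
```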
S104, training the current twin network module based on the evaluation result.
In one embodiment, training the twin network module based on the evaluation result means that the image processing device performs one training pass on the twin network module. Since the twin network module is a component of the image prediction model, one training pass of the twin network module is also one training pass of the image prediction model.
In one embodiment, as noted above, the image sample set to which the image sample pair belongs includes N image sample pairs, N being an integer greater than 1, and the pair used for one training pass may be the Mth pair in the set (1 ≤ M ≤ N, M an integer). When M = 1, the current twin network module is the original twin network module; when 1 < M ≤ N, the current twin network module is the module obtained after the (M-1)th training pass. After training the current module based on the evaluation result to obtain the trained module, the image processing device increments M by 1 and triggers the acquisition of the next image sample pair so as to train the module again.
In a specific embodiment, since each image sample pair is built so that the positive sample is of higher quality than the negative sample, one training pass based on the evaluation result can be implemented as follows: the image processing device trains the image prediction model once with a first loss function on the target difference between the evaluation result of the pair and the standard evaluation result (that the positive sample's quality is higher than the negative sample's), thereby adjusting once the parameters of each module in the image prediction model (the twin network module, the feature fusion module, and the first feature analysis module), and in particular the twin network module. The first loss function may be, for example, a cross-entropy loss. Optionally, for a better training effect, the image processing device may prepare several different videos for acquiring image sample pairs, with N pairs drawn from each video. When the method is applied mainly to video cover selection, this way of obtaining sample pairs greatly improves the fit between the method and the service. One such training step is sketched below.
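Combining the earlier sketches, one training pass against the standard evaluation result can be written as follows. The constant label 1 encodes the assumption that the positive sample (fed to the first branch) is the better image, and cross-entropy is the first loss function named above; everything else is illustrative.

```python
# Hypothetical single training step for the image prediction model.
import torch
import torch.nn.functional as F

def train_step(twin, fusion, optimizer, positives, negatives):
    optimizer.zero_grad()
    feat_pos, feat_neg = twin(positives, negatives)
    logits = fusion(feat_pos, feat_neg)
    # Standard evaluation result: the positive sample is the better image.
    labels = torch.ones(logits.size(0), dtype=torch.long, device=logits.device)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```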
S105, constructing an image evaluation model based on the trained twin network module.
In one embodiment, as shown at 202 in fig. 2a, once the image processing device has the trained twin network module, an image evaluation model may be constructed from either feature extraction branch followed by a second feature analysis module; alternatively, an image evaluation model may be constructed from the entire trained twin network module, as shown in fig. 2c. The image processing device can then evaluate the image to be processed through the image evaluation model to obtain its evaluation value. In a specific embodiment, the evaluation value may concern, for example, the clarity of the subject in the image, the richness of its colors, or the sharpness of the image, and the second feature analysis module may be, for example, a fully connected layer distinct from the first feature analysis module; a sketch of this construction follows.
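The single-branch construction can be sketched as below. The scalar output of the second analysis head is an assumption; the text says only that the head is a fully connected layer different from the first feature analysis module.

```python
# Sketch: image evaluation model built from one trained twin branch plus a
# second feature analysis module.
import torch
import torch.nn as nn

class ImageEvaluationModel(nn.Module):
    def __init__(self, trained_branch: nn.Module, feature_dim: int = 100):
        super().__init__()
        self.branch = trained_branch               # either trained branch
        self.analysis = nn.Linear(feature_dim, 1)  # second analysis module

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.analysis(self.branch(image))  # evaluation value
```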
The embodiments of this application use an image comparison method and save labor cost by collecting large amounts of paired data (image sample pairs) from an existing service as training data. The image processing device inputs each image sample pair into the twin network module for feature extraction to obtain the pair's fusion feature, derives the comparison result (evaluation result) of the positive and negative samples from that fusion feature, and trains the twin network module on the comparison result to obtain the trained module. The device then builds an image evaluation model on the trained twin network module, establishing a relatively objective image evaluation system and enhancing the generalization of the image evaluation model.
Referring to fig. 3, fig. 3 is a schematic flowchart of an image evaluation model training method provided in an embodiment of this application. As shown in fig. 3, the method includes:
and S301, acquiring a labeling sample.
In one embodiment, as described above, the image processing device trains the image prediction model by inputting paired images. When the image evaluation model is constructed from either feature extraction branch of the twin network module in the image prediction model, its processing object is a single image; therefore, after construction, the image evaluation model needs to be trained with a small number of annotation samples so that the branch suits feature extraction in a single-image evaluation scenario. An annotation sample comprises an annotation image and a first evaluation value of that image, the first evaluation value being obtained by a user annotating the image; "a small number" may mean, for example, on the order of ten thousand annotation samples. Illustratively, the training process of the image evaluation model is shown in fig. 4a; the annotation image may be any image in any image cluster.
In a specific embodiment, the first evaluation value of the annotation image may be a distribution of scores from multiple users or a single score from one user. For example, for the annotation image shown in fig. 4b, the first evaluation value may be user A's scores for the image content, such as: subject integrity 85 points, subject position rationality 90 points, and color richness 95 points; alternatively, users A and B may each score the image, and the final score data is derived from both users' scores.
S302, extracting features from the annotation image through the image evaluation model to obtain the image features of the annotation image, and analyzing the image features through the image evaluation model to obtain a second evaluation value of the annotation image.
In one embodiment, when the image evaluation model is constructed from either feature extraction branch of the twin network module (hereinafter the twin network branch), the model performs feature extraction on the annotation image through the twin network branch to obtain the image features, and the second feature analysis module then analyzes those features to obtain the second evaluation value of the annotation image.
S303, training the image evaluation model based on the first evaluation value and the second evaluation value to obtain the trained image evaluation model.
In one embodiment, when the first evaluation value is a distribution score, the loss function used to train the image evaluation model may be, for example, an earth mover's loss, which measures the distance between two distributions, here the predicted score distribution and the labeled score distribution. When the first evaluation value is a single score, the second loss function used in training may remain the cross-entropy loss; alternatively, the single score may first be converted into a distribution score by fitting a normal distribution, after which the earth mover's loss above can be used. A sketch of the earth mover's loss follows.
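The patent names the earth mover's loss but gives no formula, so the cumulative form below (familiar from NIMA-style aesthetic scoring) is an assumption of how it might be computed over score histograms.

```python
# Hedged sketch of an earth mover's (EMD) loss between two score
# distributions, computed via their cumulative distributions.
import torch

def emd_loss(pred: torch.Tensor, target: torch.Tensor, r: float = 2.0):
    """pred, target: (batch, buckets) probability distributions over scores."""
    cdf_gap = torch.cumsum(pred, dim=1) - torch.cumsum(target, dim=1)
    return (cdf_gap.abs() ** r).mean() ** (1.0 / r)
```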
In yet another embodiment, suppose the first evaluation value of annotation image A is 95 points and the second evaluation value is 70 points. The image processing device may then use the second loss function, driven by the gap between the two values, to train the image evaluation model, and in particular the twin network branch within it, yielding a feature extraction module suited to image evaluation and hence the trained image evaluation model. When the image evaluation model is constructed from either feature extraction branch of the trained twin network module, training the image evaluation model mainly fine-tunes the weights of that one branch: the image processing device only needs to adjust the branch's trained parameters by a small amplitude to obtain the network module that extracts features from the image to be processed during evaluation. This training approach speeds up training convergence, where convergence means that the loss value produced by the loss function keeps decreasing until it reaches the expected value.
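In practice, the small-amplitude adjustment described above amounts to fine-tuning with a small learning rate; a sketch under that assumption (the 1e-5 value is illustrative, not from the patent):

```python
# Hypothetical fine-tuning setup: the retained branch and the new analysis
# head are updated gently so the trained weights shift only slightly.
import torch

def make_finetune_optimizer(model: torch.nn.Module, lr: float = 1e-5):
    return torch.optim.Adam(model.parameters(), lr=lr)
```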
In this way, by inputting a small number of annotation images labeled with first evaluation values into the image evaluation model for training with a score-comparison approach, the parameters of the twin network module are further adjusted to obtain the trained image evaluation model, which speeds up image processing while preserving the operability of the model's scoring.
Referring to fig. 5, fig. 5 is a schematic flowchart of an image processing method provided in an embodiment of this application. As shown in fig. 5, the method includes:
S501, acquiring an image to be processed.
In one embodiment, the image to be processed may be any image requiring image evaluation processing; for example, in magazine cover selection, any of the images contained in the magazine. In another embodiment, the image to be processed may be any image from a video requiring video evaluation processing; for example, in video screening, the image processing device extracts frames from the video to obtain video pictures.
S502, performing image evaluation processing on the image to be processed through the image evaluation model to obtain an evaluation value of the image to be processed.
In one embodiment, when the image evaluation model is constructed from either feature extraction branch of the twin network module, it can evaluate a single image to be processed; the method of this embodiment thus frees the twin network module inside the image evaluation model from depending on paired-data comparison. For example, when user A needs to evaluate an image, the image evaluation model can be invoked to process that image directly and return its evaluation value, without first pairing the image with another image of differing quality. A hypothetical inference sketch follows.
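For the single-image case, inference reduces to one forward pass through the single-branch model; everything here is an illustrative assumption layered on the earlier sketches.

```python
# Hypothetical inference: score one image directly, with no paired input.
import torch

@torch.no_grad()
def evaluate_single_image(model: torch.nn.Module,
                          image: torch.Tensor) -> float:
    model.eval()
    return model(image.unsqueeze(0)).item()  # add a batch dimension
```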
In yet another embodiment, when the image evaluation model is constructed from the entire trained twin network module, it can evaluate an image to be processed that comprises two sub-images, which may be any two images. For example, suppose a photo contest compares entries pairwise, with the winner advancing to the next round. If image 1 uploaded by user A and image 2 uploaded by user B form a competing pair, the image evaluation model can evaluate them together to obtain the pair's evaluation value. In this case the evaluation value may be per-dimension scores for each entry, such as: entry 1, brightness 95 points, color vividness 75 points; entry 2, brightness 89 points, color vividness 96 points. It may also be the name of the higher-scoring entry in each dimension, such as: "brightness: entry 1; color vividness: entry 2" (entry 1 is brighter, and entry 2 has more vivid colors). This construction of the image evaluation model can effectively speed up its image evaluation processing.
With the image processing method provided by the embodiments of this application, the image to be processed is acquired and the image evaluation model is invoked to evaluate it. Because the image evaluation model is constructed on the trained twin network module, it carries a relatively objective image evaluation system, and the method therefore improves the generalization of image processing.
Based on the description of the above embodiment of the image processing method, the embodiment of the present application also discloses an image processing apparatus, which may be a computer program (including program code) running in the above-mentioned server. The image processing apparatus may perform the method shown in fig. 1, 3 or 5. Referring to fig. 6, the image processing apparatus 60 may at least include: an acquisition unit 601 and a processing unit 602.
An acquisition unit 601 configured to acquire an image to be processed;
a processing unit 602, configured to perform image evaluation processing on the image to be processed through an image evaluation model, so as to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module. The trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair, where the evaluation result indicates the evaluation levels of the positive sample and the negative sample in the image sample pair. The evaluation result is obtained based on a fusion feature of the image sample pair; the fusion feature is obtained based on a positive sample feature and a negative sample feature; the positive sample feature is obtained by performing feature extraction on the positive sample using a first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample using a second feature extraction branch of the twin network module.
In an embodiment, the image processing apparatus 60 further comprises a training unit 603; before the processing unit 602 performs image evaluation processing on the image to be processed through an image evaluation model, so as to obtain an evaluation value of the image to be processed, the training unit 603 is configured to perform:
obtaining a pair of image samples, the pair of image samples comprising the positive sample and the negative sample;
performing feature extraction on positive samples in the image sample pairs through a first feature extraction branch of a current twin network module to obtain positive sample features, and performing feature extraction on negative samples in the image sample pairs through a second feature extraction branch of the current twin network module to obtain negative sample features;
performing fusion processing on the positive sample features and the negative sample features to obtain fusion features of the image sample pairs, and obtaining evaluation results of the image sample pairs based on the fusion features, wherein the evaluation results are used for indicating evaluation levels of the positive samples and the negative samples in the image sample pairs;
training the current twin network module based on the evaluation result to obtain a trained twin network module;
the processing unit 602 is further specifically configured to: and constructing the image evaluation model based on the trained twin network module.
In another embodiment, when constructing the image evaluation model based on the trained twin network module, the processing unit 602 is specifically configured to: construct the image evaluation model based on either feature extraction branch of the trained twin network module.
In another embodiment, after the processing unit 602 constructs the image evaluation model based on the trained twin network module, the training unit 603 is further configured to execute:
acquiring an annotation sample, wherein the annotation sample comprises an annotation image and a first evaluation value of the annotation image, and the first evaluation value is obtained by annotating the annotation image by a user;
extracting the features of the annotated image through the image evaluation model to obtain the image features of the annotated image, and analyzing and processing the image features through the image evaluation model to obtain a second evaluation value of the annotated image;
training the image evaluation model based on the first evaluation value and the second evaluation value to obtain a trained image evaluation model;
then, when the image evaluation processing is performed on the image to be processed by the image evaluation model to obtain an evaluation value of the image to be processed, the processing unit 602 is specifically configured to perform:
and performing image evaluation processing on the image to be processed through the trained image evaluation model to obtain an evaluation value of the image to be processed.
In another embodiment, the acquiring unit 601, when acquiring the pair of image samples, is specifically configured to:
acquiring an image cluster, wherein the image cluster comprises a plurality of images; taking the image selected by the user in the image cluster as a positive sample; determining a negative sample in images of the image cluster other than the positive sample; generating an image sample pair comprising the positive sample and the negative sample.
In another embodiment, when determining a negative sample in the images other than the positive sample in the image cluster, the obtaining unit 601 is specifically configured to:
obtaining the similarity between the positive sample and any image in the image cluster; and selecting the negative sample in the image with the similarity smaller than the similarity threshold value.
In yet another embodiment, the image sample set to which the image sample pairs belong includes N image sample pairs, N being an integer greater than 1; the image sample pair is the Mth sample pair in the image sample set, M is more than or equal to 1 and less than or equal to N, and M is an integer; when M is 1, the current twin network module is the twin network module; when M is more than 1 and less than or equal to N, the current twin network module is the twin network module obtained after the M-1 training;
the training unit 603 is further configured to, after training the current twin network module based on the evaluation result to obtain the trained twin network module: increment M by 1 and trigger execution of the step of acquiring an image sample pair.
According to an embodiment of this application, the steps involved in the methods shown in fig. 1, 3, and 5 may be performed by the units of the image processing apparatus 60 shown in fig. 6. For example, steps S101 to S104 shown in fig. 1 may be performed by the training unit 603 of the image processing apparatus 60 of fig. 6, and step S105 by the processing unit 602; steps S301 to S303 shown in fig. 3 may be performed by the training unit 603; step S501 shown in fig. 5 may be performed by the acquisition unit 601, and step S502 by the processing unit 602.
According to another embodiment of this application, the units of the image processing apparatus 60 shown in fig. 6 are divided by logical function. Some or all of them may be combined into one or several other units, or a unit may be further split into functionally smaller units, which can achieve the same operation without affecting the technical effects of the embodiments of this application. In other embodiments, the image processing apparatus may include other units; in practical applications, these functions may also be realized with the assistance of other units and implemented cooperatively by multiple units.
According to another embodiment of this application, the image processing apparatus 60 shown in fig. 6 may be implemented by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 1, fig. 3, or fig. 5 on a general-purpose computing device, such as a computer comprising processing elements such as a central processing unit (CPU) and storage elements such as random access memory (RAM) and read-only memory (ROM); in this way the image processing method, the twin network module training method, and the image evaluation model training method of the embodiments of this application can be realized. The computer program may be recorded on, for example, a computer storage medium, and loaded into and executed by the computing device above via that medium.
In the embodiments of this application, the acquisition unit 601 acquires the image to be processed, and the processing unit 602 then performs image evaluation processing on it through the image evaluation model to obtain its evaluation value. The image evaluation model is constructed based on the trained twin network module, which is obtained by training the twin network module based on the evaluation results of image sample pairs; an evaluation result indicates the evaluation levels of the positive and negative samples in a pair and is obtained from the pair's fusion feature, which in turn is obtained from the positive sample feature and the negative sample feature, extracted respectively by the first and second feature extraction branches of the twin network module. Because the twin network processes image sample pairs during feature extraction, the model established on it has the capacity to learn the nuances between images, which enhances the generalization of image processing.
Based on the descriptions of the method embodiments and the apparatus embodiment above, an embodiment of the present application further provides an image processing device. Referring to fig. 7, the image processing apparatus 70 includes at least a processor 701, an input interface 702, and a computer storage medium 703, which may be connected by a bus or by other means.
The computer storage medium 703 is a memory device in the image processing apparatus 70 for storing programs and data. The computer storage medium 703 here may include a built-in storage medium of the image processing apparatus and may also include an extended storage medium that the apparatus supports. The computer storage medium 703 provides storage space that stores the operating system of the image processing apparatus, as well as one or more instructions suitable for loading and execution by the processor 701; these instructions may be one or more computer programs (including program code). The computer storage medium may be a high-speed RAM, or a non-volatile memory such as at least one disk memory; optionally, it may be at least one computer storage medium located remotely from the processor. The processor 701 (Central Processing Unit, CPU) is the computing and control core of the image processing apparatus and is adapted to implement one or more instructions, specifically to load and execute the one or more instructions so as to realize the corresponding method flows or functions.
In one embodiment, one or more instructions stored in the computer storage medium 703 may be loaded and executed by the processor 701 to implement the corresponding steps of the method embodiments shown in fig. 1, 3 and 5. In a particular implementation, the one or more instructions in the computer storage medium 703 are loaded by the processor 701 to perform the following steps:
acquiring an image to be processed through the input interface 702;
performing image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module, and the trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair. The evaluation result is used for indicating evaluation levels of the positive sample and the negative sample in the image sample pair and is obtained based on a fusion feature of the image sample pair; the fusion feature is obtained based on a positive sample feature and a negative sample feature, wherein the positive sample feature is obtained by performing feature extraction on the positive sample with the first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample with the second feature extraction branch of the twin network module.
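As a usage sketch, the two steps above might be exercised as follows; the preprocessing pipeline and the helper name evaluate_image are hypothetical, and model stands for the image evaluation model built later from the trained twin network module.

import torch
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def evaluate_image(model, path):
    # Acquire the image to be processed and score it with the model.
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(image).item()  # evaluation value of the image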
In one embodiment, before the instruction in the computer storage medium 703 for performing image evaluation processing on the image to be processed through the image evaluation model to obtain an evaluation value of the image to be processed is executed, the processor 701 is further configured to load and execute the following steps:
obtaining a pair of image samples, the pair of image samples comprising the positive sample and the negative sample;
performing feature extraction on the positive sample in the image sample pair through a first feature extraction branch of a current twin network module to obtain a positive sample feature, and performing feature extraction on the negative sample in the image sample pair through a second feature extraction branch of the current twin network module to obtain a negative sample feature;
performing fusion processing on the positive sample feature and the negative sample feature to obtain a fusion feature of the image sample pair, and obtaining an evaluation result of the image sample pair based on the fusion feature, wherein the evaluation result is used for indicating evaluation levels of the positive sample and the negative sample in the image sample pair;
training the current twin network module based on the evaluation result to obtain a trained twin network module;
and constructing the image evaluation model based on the trained twin network module.
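A single training iteration over one image sample pair could be sketched as below, reusing the TwinNetworkModule sketch above. The application does not name a loss function; the binary cross-entropy on the evaluation result, with "the positive sample outranks the negative sample" as the target, is an illustrative choice.

import torch
import torch.nn as nn

def train_step(twin, optimizer, positive, negative):
    optimizer.zero_grad()
    evaluation = twin(positive, negative)  # fusion feature -> evaluation result
    target = torch.ones_like(evaluation)   # positive should outrank negative
    loss = nn.functional.binary_cross_entropy_with_logits(evaluation, target)
    loss.backward()
    optimizer.step()
    return loss.item()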
In another embodiment, when the instruction in the computer storage medium 703 for constructing the image evaluation model based on the trained twin network module is executed, the processor 701 specifically loads and executes: constructing the image evaluation model based on either feature extraction branch of the trained twin network module.
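Concretely, taking one branch might look like the following sketch; the scoring head is an assumption, since the application only states that the model is built on a feature extraction branch of the trained twin network module.

import torch.nn as nn

class ImageEvaluationModel(nn.Module):
    def __init__(self, trained_twin, feature_dim=128):
        super().__init__()
        # Either branch works: the two branches share weights.
        self.features = trained_twin.backbone
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, image):
        return self.score(self.features(image))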
In yet another embodiment, after the instruction in the computer storage medium 703 for constructing the image evaluation model based on the trained twin network module is executed, the processor 701 is further configured to load and execute the following steps:
acquiring an annotation sample, wherein the annotation sample includes an annotation image and a first evaluation value of the annotation image, the first evaluation value being obtained by a user annotating the annotation image;
performing feature extraction on the annotation image through the image evaluation model to obtain an image feature of the annotation image, and analyzing the image feature through the image evaluation model to obtain a second evaluation value of the annotation image;
training the image evaluation model based on the first evaluation value and the second evaluation value to obtain a trained image evaluation model;
the image evaluation processing of the image to be processed through the image evaluation model to obtain the evaluation value of the image to be processed includes:
and performing image evaluation processing on the image to be processed through the trained image evaluation model to obtain an evaluation value of the image to be processed.
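A fine-tuning step matching the embodiment above might be sketched as follows, regressing the model's second evaluation value toward the user-annotated first evaluation value; the mean squared error is an assumed choice of regression loss, and first_evaluation_value is assumed to be a tensor.

import torch.nn as nn

def finetune_step(model, optimizer, annotation_image, first_evaluation_value):
    optimizer.zero_grad()
    second_evaluation_value = model(annotation_image)
    loss = nn.functional.mse_loss(
        second_evaluation_value,
        first_evaluation_value.view_as(second_evaluation_value))
    loss.backward()
    optimizer.step()
    return loss.item()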
In yet another embodiment, when the instruction for obtaining an image sample pair in the computer storage medium 703 is executed, the processor 701 specifically loads and executes the following steps:
acquiring an image cluster, wherein the image cluster comprises a plurality of images;
taking the image selected by the user in the image cluster as a positive sample;
determining a negative sample from the images in the image cluster other than the positive sample;
generating an image sample pair comprising the positive sample and the negative sample.
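A sketch of this pair generation, where cluster is a list of images and user_selected the index of the image the user chose (both hypothetical names):

def build_sample_pairs(cluster, user_selected):
    positive = cluster[user_selected]  # user-selected image is the positive sample
    candidates = [img for i, img in enumerate(cluster) if i != user_selected]
    # Each remaining image in the cluster is a candidate negative sample.
    return [(positive, negative) for negative in candidates]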
In yet another embodiment, when the instruction in the computer storage medium 703 for determining a negative sample from the images in the image cluster other than the positive sample is executed, the processor 701 specifically loads and executes the following steps:
obtaining the similarity between the positive sample and any image in the image cluster;
and selecting the negative sample from the images whose similarity to the positive sample is smaller than the similarity threshold.
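The similarity measure is not fixed by the application; a sketch using the cosine similarity of extracted features, with positive_feat of shape (D,) and candidate_feats of shape (N, D), might read:

import torch.nn.functional as F

def select_negatives(positive_feat, candidate_feats, threshold=0.8):
    # Keep only candidates whose similarity to the positive sample falls
    # below the threshold, so the pair is not a near-duplicate.
    sims = F.cosine_similarity(positive_feat.unsqueeze(0), candidate_feats)
    keep = sims < threshold
    return keep.nonzero(as_tuple=True)[0]  # indices of usable negatives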
In yet another embodiment, the image sample set to which the image sample pair belongs includes N image sample pairs, N being an integer greater than 1; the image sample pair is the M-th sample pair in the image sample set, where 1 ≤ M ≤ N and M is an integer; when M = 1, the current twin network module is the twin network module; when 1 < M ≤ N, the current twin network module is the twin network module obtained after the (M-1)-th training;
after the instruction in the computer storage medium 703 for training the current twin network module based on the evaluation result to obtain the trained twin network module is executed, the processor 701 is further configured to load and execute:
incrementing M by 1, and triggering execution of the step of obtaining an image sample pair.
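The iteration over the whole image sample set can then be sketched as a plain loop, reusing the train_step sketch above; sample_set is a hypothetical list of (positive, negative) tensor pairs.

def train_over_sample_set(twin, optimizer, sample_set):
    # The M-th pair trains the module produced by the (M-1)-th iteration;
    # after each pair, M is incremented and the next pair is fetched.
    for m, (positive, negative) in enumerate(sample_set, start=1):
        loss = train_step(twin, optimizer, positive, negative)
        print(f"pair {m}/{len(sample_set)}: loss={loss:.4f}")
    return twin  # the trained twin network module after the N-th pair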
In the embodiment of the present application, the input interface 702 obtains an image to be processed, and the processor 701 then performs image evaluation processing on the image through an image evaluation model to obtain an evaluation value of the image. As above, the image evaluation model is constructed based on a trained twin network module obtained by training the twin network module on the evaluation results of image sample pairs, where each evaluation result indicates the evaluation levels of the positive sample and the negative sample in a pair and is derived from the pair's fusion feature, itself obtained from the positive sample feature and the negative sample feature extracted by the first and second feature extraction branches, respectively. Because the feature extraction process runs the image sample pairs through a twin network, a model built on that network learns the nuances between images, enhancing the generalization of image processing.
An embodiment of the present application further provides a computer storage medium that stores a computer program of the image processing method; the computer program includes program instructions, and when one or more processors load and execute the program instructions, the image processing method described in the embodiments above may be implemented, which is not repeated here. The description of the beneficial effects of the method is likewise not repeated. It will be understood that the program instructions may be deployed to be executed on one device, or on multiple devices capable of communicating with each other.
It should be noted that, according to one aspect of the present application, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the image processing apparatus reads the computer instructions from the computer-readable storage medium and executes them, enabling the apparatus to perform the methods provided in the various alternatives of the image processing method embodiments shown in fig. 1, 3 and 5.
It will be understood by those skilled in the art that all or part of the processes of the above method embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed, may include the processes of the embodiments of the image processing method described above. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed;
performing image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module, the trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair, the evaluation result is used for indicating evaluation grades of a positive sample and a negative sample in the image sample pair, the evaluation result is obtained based on a fusion feature of the image sample pair, the fusion feature is obtained based on a positive sample feature and a negative sample feature, the positive sample feature is obtained by performing feature extraction on the positive sample by adopting a first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample by adopting a second feature extraction branch of the twin network module.
2. The method according to claim 1, wherein before the image evaluation processing is performed on the image to be processed by the image evaluation model to obtain an evaluation value of the image to be processed, the method further comprises:
obtaining a pair of image samples, the pair of image samples comprising the positive sample and the negative sample;
performing feature extraction on the positive sample in the image sample pair through a first feature extraction branch of a current twin network module to obtain a positive sample feature, and performing feature extraction on the negative sample in the image sample pair through a second feature extraction branch of the current twin network module to obtain a negative sample feature;
performing fusion processing on the positive sample feature and the negative sample feature to obtain a fusion feature of the image sample pair, and obtaining an evaluation result of the image sample pair based on the fusion feature, wherein the evaluation result is used for indicating evaluation levels of the positive sample and the negative sample in the image sample pair;
training the current twin network module based on the evaluation result to obtain a trained twin network module;
and constructing the image evaluation model based on the trained twin network module.
3. The method of claim 2, wherein constructing an image evaluation model based on the trained twin network module comprises:
and constructing the image evaluation model based on any feature extraction branch of the trained twin network module.
4. The method of claim 2, wherein after the constructing the image evaluation model based on the trained twin network module, the method further comprises:
acquiring an annotation sample, wherein the annotation sample comprises an annotation image and a first evaluation value of the annotation image, and the first evaluation value is obtained by annotating the annotation image by a user;
extracting the features of the annotated image through the image evaluation model to obtain the image features of the annotated image, and analyzing and processing the image features through the image evaluation model to obtain a second evaluation value of the annotated image;
training the image evaluation model based on the first evaluation value and the second evaluation value to obtain a trained image evaluation model;
the image evaluation processing of the image to be processed through the image evaluation model to obtain the evaluation value of the image to be processed includes:
and performing image evaluation processing on the image to be processed through the trained image evaluation model to obtain an evaluation value of the image to be processed.
5. The method of claim 2, wherein said obtaining a pair of image samples comprises:
acquiring an image cluster, wherein the image cluster comprises a plurality of images;
taking the image selected by the user in the image cluster as the positive sample;
determining a negative sample from the images in the image cluster other than the positive sample;
generating an image sample pair comprising the positive sample and the negative sample.
6. The method of claim 5, wherein the determining a negative sample from the images in the image cluster other than the positive sample comprises:
obtaining the similarity between the positive sample and any image in the image cluster;
and selecting the negative sample from the images in the image cluster whose similarity is smaller than the similarity threshold.
7. The method according to claim 2, wherein the image sample set to which the image sample pair belongs comprises N image sample pairs, N being an integer greater than 1; the image sample pair is the M-th sample pair in the image sample set, where 1 ≤ M ≤ N and M is an integer; when M = 1, the current twin network module is the twin network module; when 1 < M ≤ N, the current twin network module is the twin network module obtained after the (M-1)-th training;
the training of the current twin network module based on the evaluation result to obtain the trained twin network module further comprises:
incrementing M by 1, and triggering execution of the step of obtaining an image sample pair.
8. An image processing apparatus characterized by comprising:
the acquisition unit is used for acquiring an image to be processed;
the processing unit is used for carrying out image evaluation processing on the image to be processed through an image evaluation model to obtain an evaluation value of the image to be processed;
the image evaluation model is constructed based on a trained twin network module, the trained twin network module is obtained by training the twin network module based on an evaluation result of an image sample pair, the evaluation result is used for indicating evaluation grades of a positive sample and a negative sample in the image sample pair, the evaluation result is obtained based on a fusion feature of the image sample pair, the fusion feature is obtained based on a positive sample feature and a negative sample feature, the positive sample feature is obtained by performing feature extraction on the positive sample by adopting a first feature extraction branch of the twin network module, and the negative sample feature is obtained by performing feature extraction on the negative sample by adopting a second feature extraction branch of the twin network module.
9. An image processing apparatus, characterized in that the apparatus comprises an input interface, and further comprises:
a processor adapted to implement one or more instructions;
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to execute the image processing method according to any of claims 1-7.
10. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor to execute the image processing method according to any one of claims 1-7.
CN202110133070.4A 2021-01-29 2021-01-29 Image processing method, device, equipment and storage medium Pending CN113592765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110133070.4A CN113592765A (en) 2021-01-29 2021-01-29 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110133070.4A CN113592765A (en) 2021-01-29 2021-01-29 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113592765A true CN113592765A (en) 2021-11-02

Family

ID=78238061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110133070.4A Pending CN113592765A (en) 2021-01-29 2021-01-29 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113592765A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114372974A (en) * 2022-01-12 2022-04-19 北京字节跳动网络技术有限公司 Image detection method, device, equipment and storage medium
CN114372974B (en) * 2022-01-12 2024-03-08 抖音视界有限公司 Image detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20230353828A1 (en) Model-based data processing method and apparatus
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
CN112232164A (en) Video classification method and device
CN111681177B (en) Video processing method and device, computer readable storage medium and electronic equipment
CN110807757A (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN113761253A (en) Video tag determination method, device, equipment and storage medium
CN113505854A (en) Method, device, equipment and medium for constructing facial image quality evaluation model
CN111282281B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN113850828B (en) Image processing method, image processing apparatus, electronic device, storage medium, and program product
CN116403063A (en) No-reference screen content image quality assessment method based on multi-region feature fusion
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
CN113592765A (en) Image processing method, device, equipment and storage medium
CN111652073B (en) Video classification method, device, system, server and storage medium
CN108665455B (en) Method and device for evaluating image significance prediction result
CN115936980B (en) Image processing method and device, electronic equipment and storage medium
CN111651626B (en) Image classification method, device and readable storage medium
CN110721471B (en) Virtual application object output method and device and computer storage medium
CN113569809A (en) Image processing method, device and computer readable storage medium
CN114548229A (en) Training data augmentation method, device, equipment and storage medium
CN113822521A (en) Method and device for detecting quality of question library questions and storage medium
CN114697741A (en) Multimedia information playing control method and related equipment
CN116030040B (en) Data processing method, device, equipment and medium
CN116777914B (en) Data processing method, device, equipment and computer readable storage medium
CN113573153B (en) Image processing method, device and equipment
CN113392865A (en) Picture processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055374

Country of ref document: HK

SE01 Entry into force of request for substantive examination