CN111402151A - Image processing method, image processing device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN111402151A
Authority
CN
China
Prior art keywords
loss
network
image
target
generated image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010158768.7A
Other languages
Chinese (zh)
Inventor
李华夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010158768.7A
Publication of CN111402151A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing method, an image processing device, an electronic device and a computer readable medium. When the target stylized special effect network is trained by the method, the first generation network and the second generation network in the countermeasure generation network perform opposite style change processing, so that the countermeasure generation network can be trained on unmatched original style sheets and target style sheets. Training of a target stylized special effect network with a better effect is thus completed using more limited training samples, and when the target stylized special effect network performs target stylization processing on an image to be processed, a clearer special effect with a higher sharpening degree can be obtained compared with the prior art.

Description

Image processing method, image processing device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable medium.
Background
With the rapid development of computer technology and communication technology, the use of intelligent terminals is widely popularized, and more application programs are developed to facilitate and enrich the work and life of people. Currently, many applications are dedicated to providing more personalized visual special effects with better visual perception for intelligent terminal users, such as filter effects, sticker effects, deformation effects, and the like.
Changing the style of an image is a common visual special effect: by changing attributes of the image such as color and texture, the image can be changed into another style.
In the prior art, a Convolutional Neural Network (CNN) is usually trained directly to implement the image style change special effect. During training, paired images are required to generate the corresponding training data; the paired images have different styles but the same content, for example the same face image in the target style and in another style. The purpose of training is that the network can change images from other styles to the target style.
For this training mode, the quality and quantity of the training samples directly affect the training effect of the network. If images of either style are lacking from the training samples, the training effect is poor and the accuracy of the trained network is low. In reality, however, training samples containing images of both styles are difficult to collect at the same time: the usable data scale is small and the data matching degree is low. How to complete network training for the image style change special effect when training samples are insufficient has therefore become a problem to be solved.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:
in a first aspect, the present disclosure provides an image processing method, including:
acquiring an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by performing countermeasure training on two generation networks and two discrimination networks included in a countermeasure generation network, and the two generation networks correspond to opposite style change processing;
and carrying out target stylization processing on the image to be processed through the target stylization special effect network to obtain a target style image.
In a second aspect, the present disclosure provides an image processing apparatus comprising:
an acquisition module, configured to acquire an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by performing countermeasure training on two generation networks and two discrimination networks included in a countermeasure generation network, and the two generation networks correspond to opposite style change processing;
and the special effect processing module is used for carrying out target stylization processing on the image to be processed through the target stylized special effect network to obtain a target style image.
In a third aspect, the present disclosure provides a training apparatus, comprising:
a sample acquisition module, configured to acquire a training sample set, wherein each set of training samples in the training sample set comprises an original style sheet and a target style sheet;
a network acquisition module, configured to acquire a pre-constructed countermeasure generation network, wherein the countermeasure generation network comprises a first generation network, a second generation network, a first discrimination network and a second discrimination network;
the first generation network is used for performing target stylization processing on the original style sheet in each set of training samples to obtain a corresponding first generated image, and the second generation network is used for performing style restoration processing on the first generated image to obtain a corresponding second generated image;
the second generation network is used for performing style restoration processing on the target style sheet in each set of training samples to obtain a corresponding third generated image, and the first generation network is used for performing target stylization processing on the third generated image to obtain a corresponding fourth generated image;
the first discrimination network is used for discriminating the authenticity of the target style sheet and the first generated image in each set of training samples to obtain a corresponding first discrimination result;
the second discrimination network is used for discriminating the authenticity of the original style sheet and the third generated image in each set of training samples to obtain a corresponding second discrimination result;
and a network training module, configured to perform countermeasure training on the countermeasure generation network based on the first discrimination result, the second generated image and the fourth generated image respectively corresponding to each set of training samples, and determine the trained first generation network as the target stylized special effect network.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the method as set forth in the first aspect of the disclosure.
In a fifth aspect, the present disclosure provides a computer readable medium for storing computer instructions, a program, a code set or an instruction set which, when run on a computer, causes the computer to perform the method as set forth in the first aspect of the disclosure.
According to the image processing method, the image processing device, the electronic device and the computer readable medium, when the target stylized special effect network is trained, the first generation network and the second generation network in the countermeasure generation network perform opposite style change processing, so that the countermeasure generation network can be trained on unmatched original style sheets and target style sheets. The purpose of completing the training of a target stylized special effect network with a better effect using more limited training samples is thereby achieved, and when the target stylized special effect network performs target stylization processing on an image to be processed, a clearer special effect with a higher sharpening degree can be obtained compared with the prior art.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a countermeasure generation network provided by an embodiment of the present disclosure;
fig. 3a is an exemplary diagram of an image to be processed provided by an embodiment of the present disclosure;
FIG. 3b is an exemplary diagram of a black and white line-style image provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a training model provided in the embodiments of the present disclosure;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing features, data, elements, devices, modules or units from one another; they neither limit these to specific different features, data, elements, devices, modules or units, nor limit the sequence or interdependence of the functions they perform.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
An embodiment of the present disclosure provides an image processing method, as shown in fig. 1, the method including:
step S100: acquiring an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by performing countermeasure training on two generation networks and two discrimination networks included in an countermeasure generation network, and the two generation networks correspond to opposite style change processing;
Step S200: performing target stylization processing on the image to be processed through the target stylized special effect network to obtain a target style image.
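For illustration, a minimal sketch of step S200 in PyTorch follows, assuming the trained target stylized special effect network is an image-to-image generator operating on a normalized image tensor; the function name and tensor convention are assumptions, not prescribed by the disclosure:

```python
import torch

def apply_target_style(generator: torch.nn.Module,
                       image: torch.Tensor) -> torch.Tensor:
    """Step S200 as a single forward pass: the image to be processed
    (shape (1, 3, H, W), values in [-1, 1]) goes through the trained
    target stylized special effect network and comes out as a target
    style image."""
    generator.eval()
    with torch.no_grad():
        return generator(image)
```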
In the image processing method provided by the embodiment of the disclosure, the first generation network and the second generation network in the countermeasure generation network perform opposite style change processing, so that the countermeasure generation network can be trained on unmatched original style sheets and target style sheets, and a target stylized special effect network with a better effect can be trained from more limited training samples. When the target stylized special effect network performs target stylization processing on an image to be processed, a clearer special effect with a higher sharpening degree can be obtained compared with the prior art.
Specifically, the target stylized special effect network is obtained by training the following steps:
step S110: acquiring a training sample set, wherein each group of training samples in the training sample set comprises an original style sheet and a target style sheet;
in the embodiment of the present disclosure, the specific type of the original style or the target style is not limited, and for example, the style may be a color photograph style, a black and white photograph style, a color oil painting style, a black and white line style, and the like. The skilled person in the art can determine the original style and the target style according to the actual requirement, and select the corresponding original style diagram and the target style diagram to train the target stylized special effect network. For example, if the training target stylized special effect network can convert an image from a color photo style to a black and white line style, when the training sample set is selected, the target stylized special effect network may be trained by using the color photo style image as an original style chart and the black and white line style image as a target style chart.
The original style sheet and the target style sheet included in a set of training samples may be matched, that is, a set of training samples may be composed of a corresponding original style sheet and target style sheet that have the same image content. Alternatively, the original style sheet and the target style sheet included in a set of training samples may be unmatched. In practical application, an original style sheet and a target style sheet can simply be selected from the data set to form a set of training samples, without paying attention to the relationship between the two.
It will be appreciated by those skilled in the art that in this case there may be repeated original style sheets or target style sheets in different sets of training samples. Illustratively, the training is based on m sets of training samples, including the original style sheets {a1, a2, …, am} and the target style sheets {b1, b2, …, bm}, in which, for example, b1 and bm may be repeated; alternatively, the style sheets in different sets of training samples may be completely different, e.g. b1 to bm are all different. This is not limited in the present disclosure.
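As an illustration only (the disclosure does not prescribe a sampling procedure), forming unmatched training samples can be as simple as drawing the two style sheets independently:

```python
import random

def sample_training_set(original_pool, target_pool, m):
    """Form m sets of training samples {(a_i, b_i)} by drawing one
    original style sheet and one target style sheet independently;
    no content matching is required, and since sampling is with
    replacement, the same style sheet may recur in different sets."""
    return [(random.choice(original_pool), random.choice(target_pool))
            for _ in range(m)]
```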
Step S120: acquiring a pre-constructed countermeasure generation network, wherein the countermeasure generation network comprises a first generation network, a second generation network, a first discrimination network and a second discrimination network; performing target stylization processing on the original style sheet in each set of training samples through the first generation network to obtain a corresponding first generated image, and performing style restoration processing on the first generated image through the second generation network to obtain a corresponding second generated image; performing style restoration processing on the target style sheet in each set of training samples through the second generation network to obtain a corresponding third generated image, and performing target stylization processing on the third generated image through the first generation network to obtain a corresponding fourth generated image; discriminating the authenticity of the target style sheet and the first generated image in each set of training samples through the first discrimination network to obtain a corresponding first discrimination result; discriminating the authenticity of the original style sheet and the third generated image in each set of training samples through the second discrimination network to obtain a corresponding second discrimination result;
the challenge generation network may be constructed based on various types of challenge generation networks (GAN), and the main structure of GAN includes a generator g (generator) and a discriminator d (discriminator).
For the disclosed embodiment, as shown in fig. 2, two generators G are defined, namely a first generation network and a second generation network: the first generation network is responsible for changing images from the original style to the target style, and the second generation network is responsible for changing images from the target style to the original style.
Thus, in the disclosed embodiments, the first and fourth generated images output by the first generation network are of a target style, and the second and third generated images output by the second generation network are of an original style.
For the embodiment of the present disclosure, as shown in fig. 2, two discriminators D are further defined, namely a first discrimination network and a second discrimination network, respectively, where the first discrimination network is used for discriminating the authenticity of the target style sheet and the first generated image in the training sample, and since the target style sheet and the first generated image are both of the target style, the first discrimination network can be directly used for judging whether the target style sheet is true (Real) or false (Fake) and whether the first generated image is true or false. Similarly, the second judging network is used for judging the authenticity of the original style sheet and the third generated image in the training sample, and since the original style sheet and the third generated image are both of original style, the second judging network can be directly used for judging whether the original style sheet is true or false and whether the third generated image is true or false.
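The disclosure does not fix a concrete architecture for the four networks. The following deliberately small PyTorch sketch shows one possible instantiation of the two generators and two discriminators; the layer sizes, normalization and patch-score output are assumptions for illustration only:

```python
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Generator(nn.Module):
    """Image-to-image generator; used both as the first generation
    network (original -> target style) and as the second generation
    network (target -> original style)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 64), conv_block(64, 64),
            nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Outputs a per-patch score map; values near 1 mean Real and
    values near 0 mean Fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 64), conv_block(64, 64),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

G1, G2 = Generator(), Generator()          # first / second generation network
D1, D2 = Discriminator(), Discriminator()  # first / second discrimination network
```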
Step S130: performing countermeasure training on the countermeasure generation network based on the first discrimination result, the second generated image and the fourth generated image respectively corresponding to each set of training samples, and determining the trained first generation network as the target stylized special effect network.
Specifically, the countermeasure training may employ the following procedure:
Initializing the network parameters of the first generation network, the second generation network, the first discrimination network and the second discrimination network.
Based on the m sets of training samples, including the original style sheets {a1, a2, …, am} and the target style sheets {b1, b2, …, bm}, the m first generated images {c1, c2, …, cm} and m fourth generated images {f1, f2, …, fm} obtained from the first generation network, and the m second generated images {d1, d2, …, dm} and m third generated images {e1, e2, …, em} obtained from the second generation network, the countermeasure training is carried out.
The first discrimination network is trained to distinguish the real samples (target style sheets) from the generated samples (first generated images) as accurately as possible; the first generation network is trained to reduce the style gap between the generated samples (first generated images) and the real samples (target style sheets) as much as possible, which also means making the first discrimination network discriminate incorrectly as often as possible. Likewise, the second discrimination network is trained to distinguish the real samples (original style sheets) from the generated samples (third generated images) as accurately as possible; the second generation network is trained to reduce the style gap between the generated samples (third generated images) and the real samples (original style sheets) as much as possible, which also means making the second discrimination network discriminate incorrectly as often as possible. That is, the four networks respectively improve their generation capability and discrimination capability in the process of the countermeasure training.
After multiple update iterations, the final ideal case is that the first and second discrimination networks cannot discriminate whether the sample is a generated sample or a real sample.
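For one set of training samples (a, b), the forward passes described above can be summarized in the following sketch, reusing the hypothetical G1 and G2 generators from the sketch earlier:

```python
def forward_pass(G1, G2, a, b):
    """a: original style sheet, b: target style sheet (unmatched).
    Returns the four generated images used in the countermeasure training."""
    c = G1(a)  # first generated image: target stylization of a
    d = G2(c)  # second generated image: style restoration, should match a
    e = G2(b)  # third generated image: style restoration of b
    f = G1(e)  # fourth generated image: target stylization, should match b
    return c, d, e, f
```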
Because the generation capabilities of the first generation network and the second generation network reach an ideal state through the countermeasure training, determining the trained first generation network as the target stylized special effect network achieves a good target stylization special effect.
In practical applications, if the image needs to be changed from the target style to the original style, the trained second generation network may be used for processing, and a good inverse processing effect of the target style image can be achieved.
In the image processing method provided by the embodiment of the disclosure, when the target stylized special effect network is trained in the pre-constructed countermeasure generation network, the original style sheet is converted, through two cyclic changes by the first generation network and the second generation network, into a second generated image of the consistent style, and the target style sheet is converted into a fourth generated image of the consistent style. The countermeasure generation network can thereby be trained on unmatched original style sheets and target style sheets, and the first generation network in the trained countermeasure generation network is determined as the target stylized special effect network. Training of a target stylized special effect network with a better effect is thus completed using more limited training samples, and when the target stylized special effect network performs target stylization processing on an image to be processed, a clearer special effect with a higher sharpening degree can be obtained compared with the prior art.
In the prior art, a convolutional neural network is usually trained directly to realize the style change special effect of images. However, because training errors and generalization errors exist in neural network training, realizing the style change special effect through a simple convolutional neural network yields a poor special effect and affects user experience.
In the training method for the target stylized special effect network provided by the embodiment of the disclosure, the generation capability of the first generation network is improved by training the pre-constructed countermeasure generation network, and the trained first generation network is determined as the target stylized special effect network. This can effectively improve the training effect of the network, and compared with the prior art, a clearer special effect with a higher sharpening degree can be obtained.
In the embodiment of the disclosure, a corresponding loss function is provided for the countermeasure training process, so as to better optimize the countermeasure generation network in the training process.
Specifically, step S130 includes the steps of:
step S131: determining corresponding first countermeasure loss according to the first discrimination result corresponding to each group of training samples;
In the embodiment of the present disclosure, the first countermeasure loss includes a true sample loss corresponding to the target style sheet, a false sample true loss corresponding to the first generated image, and a false sample false loss corresponding to the first generated image.
Since the first discrimination network needs to judge all of the m target style sheets as true samples (i.e., real samples, for which the probability of being true is 1), but in the actual training process the probability that each target style sheet is discriminated as true by the first discrimination network may not be 1, a first countermeasure loss may be determined based on the judged true and false probabilities of the target style sheets. In the embodiment of the present disclosure this is defined as the true sample loss corresponding to the target style sheet, hereinafter abbreviated as L12_loss1 for convenience of description.
Since the first discrimination network needs to judge all of the m first generated images as false samples (i.e., generated samples, for which the probability of being true is 0), but in the actual training process the probability that each first generated image is discriminated as true may not be 0, another first countermeasure loss may be determined based on the judged true and false probabilities of the first generated images. In the embodiment of the present disclosure this is defined as the false sample true loss corresponding to the first generated image, hereinafter abbreviated as L12_loss2.
Since the first generation network needs to reduce the style gap between the generated samples (first generated images) and the real samples (target style sheets) as much as possible, that is, the first generation network makes the first discrimination network misjudge as much as possible so that all of the m first generated images are judged as true samples, a further first countermeasure loss may be determined based on the judgment (misjudgment) of the true and false probabilities of the first generated images caused by the first generation network. In the embodiment of the present disclosure this is defined as the false sample false loss corresponding to the first generated image, hereinafter abbreviated as L12_loss3.
In practical applications, all three losses can be calculated based on a least squares loss function.
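Consistent with the paragraph above, a least-squares sketch of the three first countermeasure losses follows; the second countermeasure losses L22_loss1 to L22_loss3 mirror these with D2, the original style sheet and the third generated image:

```python
def first_countermeasure_losses(D1, b, c):
    """b: target style sheet (real sample), c: first generated image.
    Least-squares losses with targets 1 = true and 0 = false."""
    L12_loss1 = ((D1(b) - 1) ** 2).mean()     # true sample loss (trains D1)
    L12_loss2 = (D1(c.detach()) ** 2).mean()  # false sample true loss (trains D1;
                                              # detach keeps gradients out of G1)
    L12_loss3 = ((D1(c) - 1) ** 2).mean()     # false sample false loss (trains G1)
    return L12_loss1, L12_loss2, L12_loss3
```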
Step S132: determining a first image loss between the original style sheet and the second generated image in each set of training samples;
As is clear to those skilled in the art, since each second generated image is obtained from the corresponding original style sheet by passing once through the first generation network and then once through the second generation network, the image sizes of the original style sheet and the corresponding second generated image in each set of training samples are the same; for example, the sizes of a1 and d1 are the same. However, in the actual training process the original style sheet in each set of training samples may differ from the corresponding second generated image. The pixels at the same positions in the original style sheet and the corresponding second generated image may be compared one by one to determine the difference value of each pixel, and the first image loss between the original style sheet and the second generated image is then determined according to the difference values of the pixels.
In one possible implementation, the difference values of the pixels are summed to obtain the first image loss between the original style sheet and the second generated image.
For convenience of description hereinafter, the first image loss between the original style sheet and the second generated image will be referred to simply as L11_loss.
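Under these definitions, a sketch of the image losses follows; the disclosure only requires a per-pixel difference value, and the absolute difference used here is one feasible choice:

```python
def image_loss(x, y):
    """Pixel-wise difference between two same-sized images, summed over
    all pixels; L11_loss = image_loss(a, d) and, later in step S134,
    L21_loss = image_loss(b, f)."""
    return (x - y).abs().sum()
```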
Step S133: determining a corresponding second countermeasure loss according to the second discrimination result corresponding to each set of training samples;
In the embodiment of the present disclosure, the second countermeasure loss includes a true sample loss corresponding to the original style sheet, a false sample true loss corresponding to the third generated image, and a false sample false loss corresponding to the third generated image.
Since the second discrimination network needs to judge all of the m original style sheets as true samples (i.e., real samples, for which the probability of being true is 1), but in the actual training process the probability that each original style sheet is discriminated as true may not be 1, a second countermeasure loss may be determined based on the judged true and false probabilities of the original style sheets. In the embodiment of the present disclosure this is defined as the true sample loss corresponding to the original style sheet, hereinafter abbreviated as L22_loss1 for convenience of description.
Since the second discrimination network needs to judge all of the m third generated images as false samples (i.e., generated samples, for which the probability of being true is 0), but in the actual training process the probability that each third generated image is discriminated as true may not be 0, another second countermeasure loss may be determined based on the judged true and false probabilities of the third generated images. In the embodiment of the present disclosure this is defined as the false sample true loss corresponding to the third generated image, hereinafter abbreviated as L22_loss2.
Since the second generation network needs to reduce the style gap between the generated samples (third generated images) and the real samples (original style sheets) as much as possible, that is, the second generation network makes the second discrimination network misjudge as much as possible so that all of the m third generated images are judged as true samples, a further second countermeasure loss may be determined based on the judgment (misjudgment) of the true and false probabilities of the third generated images caused by the second generation network. In the embodiment of the present disclosure this is defined as the false sample false loss corresponding to the third generated image, hereinafter abbreviated as L22_loss3.
In practical applications, all three losses can be calculated based on a least squares loss function.
Step S134: determining a second image loss between the target style sheet and the fourth generated image in each set of training samples;
As is clear to those skilled in the art, since each fourth generated image is obtained from the corresponding target style sheet by passing once through the second generation network and then once through the first generation network, the image sizes of the target style sheet and the corresponding fourth generated image in each set of training samples are the same; for example, the sizes of b1 and f1 are the same. However, in the actual training process the target style sheet in each set of training samples may differ from the corresponding fourth generated image. The pixels at the same positions in the target style sheet and the fourth generated image may be compared one by one to determine the difference value of each pixel, and the second image loss between the target style sheet and the corresponding fourth generated image is then determined according to the difference values of the pixels.
In a possible implementation, the difference values of the pixels are summed to obtain the second image loss between the target style sheet and the corresponding fourth generated image.
For convenience of description hereinafter, the second image loss between the target style sheet and the corresponding fourth generated image will be referred to simply as L21_loss.
Step S135: optimizing the countermeasure generation network according to the first countermeasure loss, the first image loss, the second countermeasure loss and the second image loss corresponding to each set of training samples.
That is, the countermeasure generation network is optimized according to the true sample loss corresponding to the target style sheet, the false sample true loss corresponding to the first generated image, the false sample false loss corresponding to the first generated image, the first image loss, the true sample loss corresponding to the original style sheet, the false sample true loss corresponding to the third generated image, the false sample false loss corresponding to the third generated image and the second image loss corresponding to each set of training samples.
In the embodiment of the present disclosure, a feasible implementation manner is provided for step S135, and specifically, step S135 may include the following steps:
step S1351: according to the weight corresponding to each loss, carrying out weighted fusion processing on the true sample loss corresponding to the target style diagram, the false sample true loss corresponding to the first generated image, the false sample false loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and carrying out weighted fusion processing on the true sample loss corresponding to the original style diagram, the false sample true loss corresponding to the third generated image, the false sample false loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
step S1352: and optimizing the antibiotic network according to the first total loss and the second total loss.
Considering that the true sample loss corresponding to the target style sheet, the false sample true loss corresponding to the first generated image, the false sample false loss corresponding to the first generated image, the first image loss, the true sample loss corresponding to the original style sheet, the false sample true loss corresponding to the third generated image, the false sample false loss corresponding to the third generated image and the second image loss corresponding to each set of training samples contribute to network optimization to different degrees, in the embodiment of the present disclosure a weight corresponding to each loss is set to represent the importance degree of that loss.
In practical application, for different sets of training samples the same weight may be used for the same type of loss, whether it is the true sample loss corresponding to the target style sheet, the false sample true loss corresponding to the first generated image, the false sample false loss corresponding to the first generated image, the first image loss, the true sample loss corresponding to the original style sheet, the false sample true loss corresponding to the third generated image, the false sample false loss corresponding to the third generated image, or the second image loss.
Those skilled in the art may adjust the weights respectively corresponding to these eight losses according to the actual situation, which is not limited in the embodiment of the present disclosure.
For convenience of description, the weights corresponding to the true sample loss of the target style sheet, the false sample true loss of the first generated image, the false sample false loss of the first generated image, the first image loss, the true sample loss of the original style sheet, the false sample true loss of the third generated image, the false sample false loss of the third generated image and the second image loss are respectively denoted w11, w12, w13, w14, w21, w22, w23 and w24.
Then, for the disclosed embodiment, for each set of training samples, the first total loss Loss1 is:
Loss1=w14×L11_loss+w11×L12_loss1+w12×L12_loss2+w13×L12_loss3
and the second total loss Loss2 is:
Loss2=w24×L21_loss+w21×L22_loss1+w22×L22_loss2+w23×L22_loss3
In the training process, the network parameters of the first generation network, the second generation network, the first discrimination network and the second discrimination network are adjusted according to the first total loss Loss1 and the second total loss Loss2 corresponding to each set of training samples, thereby optimizing the countermeasure generation network. After adjustment over multiple sets of training samples the losses converge, and the training of the countermeasure generation network is completed.
Through a large number of experiments, the inventor of the present disclosure found that the best training result can be obtained when, in each set of training samples, the ratio of the weights corresponding to the true sample loss of the target style sheet, the false sample true loss of the first generated image, the false sample false loss of the first generated image and the first image loss is 1:1:2:100, and the ratio of the weights corresponding to the true sample loss of the original style sheet, the false sample true loss of the third generated image, the false sample false loss of the third generated image and the second image loss is 1:1:2:100.
Therefore, in step S1351, the weight values corresponding to the true sample loss of the target style sheet, the false sample true loss, the false sample false loss and the first image loss in each set of training samples may be 1, 1, 2 and 100 respectively, and the weight values corresponding to the true sample loss of the original style sheet, the false sample true loss, the false sample false loss and the second image loss in each set of training samples may be 1, 1, 2 and 100 respectively.
The first total loss can be obtained as follows:
Loss1=100×L11_loss+1×L12_loss1+1×L12_loss2+2×L12_loss3
the second total loss is:
Loss2=100×L21_loss+1×L22_loss1+1×L22_loss2+2×L22_loss3
In step S1352, the countermeasure generation network is optimized according to the first total loss and the second total loss corresponding to each set of training samples, so as to obtain the best training effect.
In practical application, the first total loss and the second total loss corresponding to each set of training samples may also be fused, for example by addition, averaging or other fusion methods, to obtain a corresponding final total loss, and in step S1352 the countermeasure generation network is optimized according to the final total loss corresponding to each set of training samples, so as to obtain the best training effect.
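Putting steps S1351 and S1352 together, the following sketch shows the weighted fusion with the 1:1:2:100 weight ratio and fusion of the two total losses by addition; the variable names are assumptions, and optimizing would then backpropagate this loss, with the discriminator-side and generator-side terms applied to their respective network parameters in alternation, as is usual for adversarial training:

```python
w11, w12, w13, w14 = 1.0, 1.0, 2.0, 100.0  # weights for the first total loss
w21, w22, w23, w24 = 1.0, 1.0, 2.0, 100.0  # weights for the second total loss

def final_total_loss(L11_loss, L12_loss1, L12_loss2, L12_loss3,
                     L21_loss, L22_loss1, L22_loss2, L22_loss3):
    """Weighted fusion per set of training samples (step S1351), then
    fusion of the two total losses by addition (one feasible option)."""
    Loss1 = (w14 * L11_loss + w11 * L12_loss1
             + w12 * L12_loss2 + w13 * L12_loss3)
    Loss2 = (w24 * L21_loss + w21 * L22_loss1
             + w22 * L22_loss2 + w23 * L22_loss3)
    return Loss1 + Loss2
```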
When the target stylized special effect network trained in this way is used for target stylization special effect processing of an image, the clearest effect with the highest sharpening degree can be obtained.
Based on the above embodiments of the present disclosure, in the embodiments of the present disclosure, the processing instruction of the target stylized special effect may be issued through an operation of the user on the terminal device. The terminal devices include, but are not limited to, mobile terminals, smart terminals, and the like, such as mobile phones, smart phones, tablet computers, notebook computers, personal digital assistants, portable multimedia players, navigation devices, and the like. It will be understood by those skilled in the art that the configuration according to the embodiments of the present disclosure can be applied to a fixed type terminal such as a digital television, a desktop computer, etc., in addition to elements particularly used for mobile purposes.
In the embodiment of the present disclosure, the execution subject of the method may be the terminal device or an application installed on the terminal device. Specifically, after receiving a processing instruction of a target stylized special effect, in step S100, an image to be processed corresponding to the processing instruction is obtained, and a target stylized special effect network obtained by training through the training step provided in any embodiment of the present disclosure is obtained, where the image to be processed is an image of an original style corresponding to the target stylized special effect network. In step S200, a target stylized image can be obtained by performing target stylization processing on the image to be processed through the target stylized special effect network.
Further, after obtaining the target style image, the method may further include the steps of: and displaying the target style image on a display screen.
Alternatively, the execution subject of the method may be a server. After receiving a processing instruction of a target stylized special effect sent by a terminal device, the server receives the image to be processed corresponding to the processing instruction, acquires the target stylized special effect network trained through the training steps provided in any embodiment of the present disclosure, performs target stylization processing on the image to be processed through the target stylized special effect network to obtain a target style image, and sends the target style image to the terminal device for display.
In practical applications, the number of images to be processed may be one or more. When there are multiple images to be processed, they may also constitute a video to be processed; each frame of image in the video to be processed is processed by the above image processing method to obtain a target style video, as in the sketch below.
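Illustratively, extending the earlier inference sketch to a video to be processed is a frame-by-frame loop; frame decoding and encoding are omitted, and apply_target_style is the hypothetical helper defined above:

```python
def stylize_video(generator, frames):
    """Apply target stylization to each frame of the video to be
    processed; the stylized frames together form the target style video."""
    for frame in frames:
        yield apply_target_style(generator, frame)
```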
Taking a face image as the image to be processed and a black and white line style as the target style: for the image to be processed shown in fig. 3a, the black and white line style image shown in fig. 3b can be obtained by processing it with a target stylized special effect network (a black and white line special effect network) trained through the training steps provided in any embodiment of the present disclosure.
Compared with the prior art, the target style image obtained by processing through the processing method of the target stylized special effect provided by the embodiment of the disclosure is clearer and has higher sharpening degree.
In the embodiment of the present disclosure, a second generation network trained through the training steps provided in any of the above embodiments may also be acquired to convert an image of the target style back into an image of the original style. For example, the black and white line style image shown in fig. 3b can be processed by the trained second generation network to obtain the image shown in fig. 3a. For the specific process, reference may be made to the above description of the target stylized special effect, and details are not repeated here.
Based on the foregoing embodiments of the present disclosure, an embodiment of the present disclosure further provides a training model. The training model includes a first generation network, a second generation network, a first discrimination network and a second discrimination network. The first generation network is used for performing target stylization processing on the original style sheet in each set of training samples to obtain a corresponding first generated image, and the second generation network is used for performing style restoration processing on the first generated image to obtain a corresponding second generated image; the second generation network is used for performing style restoration processing on the target style sheet in each set of training samples to obtain a corresponding third generated image, and the first generation network is used for performing target stylization processing on the third generated image to obtain a corresponding fourth generated image; the first discrimination network is used for discriminating the authenticity of the target style sheet and the first generated image in each set of training samples to obtain a corresponding first discrimination result; and the second discrimination network is used for discriminating the authenticity of the original style sheet and the third generated image in each set of training samples to obtain a corresponding second discrimination result.
as shown in fig. 4, the first generation network is connected to the second generation network and the first discrimination network, and the second generation network is further connected to the second discrimination network, so that the countermeasure training is performed on the countermeasure network based on the first discrimination result, the second generated image, and the fourth generated image respectively corresponding to each set of training samples, and the trained first generation network is obtained.
The implementation principle and technical effect of the training model provided in the embodiments of the present disclosure are the same as those of the countermeasure generation network in the foregoing method embodiments. For brevity, for parts not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments, and details are not repeated here.
The embodiment of the present disclosure also provides an image processing apparatus, as shown in fig. 5, the image processing apparatus 50 may include: an obtaining module 501 and a special effects processing module 502, wherein,
the obtaining module 501 is configured to obtain an image to be processed and a pre-trained target stylized special effect network, where the target stylized special effect network is obtained by performing countermeasure training on two generation networks and two discrimination networks included in a countermeasure generation network, and the two generation networks correspond to opposite style change processing;
the special effect processing module 502 is configured to perform target stylization processing on the image to be processed through the target stylized special effect network to obtain a target style image.
In an alternative implementation, the target stylized special effects network is trained by the following steps:
acquiring a training sample set, wherein each group of training samples in the training sample set comprises an original style sheet and a target style sheet;
acquiring a pre-constructed countermeasure generation network, wherein the countermeasure generation network comprises a first generation network, a second generation network, a first discrimination network and a second discrimination network;
performing target stylization processing on the original style sheet in each set of training samples through the first generation network to obtain a corresponding first generated image, and performing style restoration processing on the first generated image through the second generation network to obtain a corresponding second generated image;
performing style restoration processing on the target style sheet in each set of training samples through the second generation network to obtain a corresponding third generated image, and performing target stylization processing on the third generated image through the first generation network to obtain a corresponding fourth generated image;
discriminating the authenticity of the target style sheet and the first generated image in each set of training samples through the first discrimination network to obtain a corresponding first discrimination result;
discriminating the authenticity of the original style sheet and the third generated image in each set of training samples through the second discrimination network to obtain a corresponding second discrimination result;
and performing countermeasure training on the countermeasure generation network based on the first discrimination result, the second generated image and the fourth generated image respectively corresponding to each set of training samples, and determining the trained first generation network as the target stylized special effect network.
In an optional implementation manner, the process of performing countermeasure training on the countermeasure generation network based on the first discrimination result, the second generated image and the fourth generated image respectively corresponding to each set of training samples includes:
determining a corresponding first countermeasure loss according to the first discrimination result corresponding to each set of training samples;
determining a first image loss between the original style sheet and the second generated image in each set of training samples;
determining a corresponding second countermeasure loss according to the second discrimination result corresponding to each set of training samples;
determining a second image loss between the target style sheet and the fourth generated image in each set of training samples;
and optimizing the countermeasure generation network according to the first countermeasure loss, the first image loss, the second countermeasure loss and the second image loss corresponding to each set of training samples.
In an alternative implementation, the first countermeasure loss includes a true sample loss corresponding to the target style sheet, a false sample true loss corresponding to the first generated image, and a false sample false loss corresponding to the first generated image;
the second countermeasure loss includes a true sample loss corresponding to the original style sheet, a false sample true loss corresponding to the third generated image, and a false sample false loss corresponding to the third generated image.
In an alternative implementation, the process of optimizing the countermeasure generation network according to the first countermeasure loss, the first image loss, the second countermeasure loss and the second image loss corresponding to each set of training samples includes:
according to the weight corresponding to each loss, performing weighted fusion processing on the true sample loss corresponding to the target style sheet, the false sample true loss corresponding to the first generated image, the false sample false loss corresponding to the first generated image and the first image loss in each set of training samples to obtain a first total loss, and performing weighted fusion processing on the true sample loss corresponding to the original style sheet, the false sample true loss corresponding to the third generated image, the false sample false loss corresponding to the third generated image and the second image loss in each set of training samples to obtain a second total loss;
and optimizing the countermeasure generation network according to the first total loss and the second total loss.
In an alternative implementation, the weight corresponding to each loss includes:
the ratio of the weights corresponding to the true sample loss of the target style sheet, the false sample true loss of the first generated image, the false sample false loss of the first generated image and the first image loss in each set of training samples is 1:1:2:100;
the ratio of the weights corresponding to the true sample loss of the original style sheet, the false sample true loss of the third generated image, the false sample false loss of the third generated image and the second image loss in each set of training samples is 1:1:2:100.
The image processing apparatus provided in the embodiment of the present disclosure may be specific hardware on the device, or software or firmware installed on the device. Its implementation principle and technical effect are the same as those of the foregoing method embodiments; for brevity, for parts not mentioned in this apparatus embodiment, reference may be made to the corresponding content in the foregoing method embodiments, and details are not repeated here.
For training the target stylized special effect network, an embodiment of the present disclosure further provides a training apparatus, which may include a sample acquisition module, a network acquisition module and a network training module, wherein:
the sample acquisition module is configured to acquire a training sample set, where each group of training samples in the training sample set includes an original-style image and a target-style image;
the network acquisition module is configured to acquire a pre-constructed generative adversarial network, which includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
the first generation network is configured to perform target stylization processing on the original-style image in each group of training samples to obtain a corresponding first generated image, and the second generation network is configured to perform style restoration processing (converting the image back to the original style) on the first generated image to obtain a corresponding second generated image;
the second generation network is configured to perform style restoration processing on the target-style image in each group of training samples to obtain a corresponding third generated image, and the first generation network is configured to perform target stylization processing on the third generated image to obtain a corresponding fourth generated image;
the first discrimination network is configured to discriminate the authenticity of the target-style image and of the first generated image in each group of training samples to obtain a corresponding first discrimination result;
the second discrimination network is configured to discriminate the authenticity of the original-style image and of the third generated image in each group of training samples to obtain a corresponding second discrimination result;
the network training module is configured to perform adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and to determine the trained first generation network as the target stylized special effect network. A sketch of the four-network forward pass follows.
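Concretely, the four sub-networks trace a CycleGAN-style cycle. A minimal PyTorch container, assuming the generator and discriminator architectures are supplied from elsewhere (the disclosure does not fix them), might be:

```python
import torch.nn as nn

class AdversarialGenerationNetwork(nn.Module):
    """Illustrative container for the four sub-networks described above."""

    def __init__(self, g1: nn.Module, g2: nn.Module, d1: nn.Module, d2: nn.Module):
        super().__init__()
        self.g1, self.g2 = g1, g2  # target stylization / style restoration generators
        self.d1, self.d2 = d1, d2  # target-style / original-style discriminators

    def forward(self, original, target):
        first = self.g1(original)  # original style -> target style  (first generated image)
        second = self.g2(first)    # back to original style          (second generated image)
        third = self.g2(target)    # target style -> original style  (third generated image)
        fourth = self.g1(third)    # back to target style            (fourth generated image)
        return first, second, third, fourth
```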
In an optional implementation, when performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, the network training module is specifically configured to:
determine a corresponding first adversarial loss according to the first discrimination result for each group of training samples;
determine a first image loss between the original-style image and the second generated image in each group of training samples;
determine a corresponding second adversarial loss according to the second discrimination result for each group of training samples;
determine a second image loss between the target-style image and the fourth generated image in each group of training samples;
and optimize the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples.
In an alternative implementation, the first adversarial loss includes a real-sample loss corresponding to the target-style image, a fake-as-real loss corresponding to the first generated image, and a fake-as-fake loss corresponding to the first generated image;
the second adversarial loss includes a real-sample loss corresponding to the original-style image, a fake-as-real loss corresponding to the third generated image, and a fake-as-fake loss corresponding to the third generated image.
In an optional implementation, when optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples, the network training module is specifically configured to:
perform, according to the weight corresponding to each loss, weighted fusion of the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and weighted fusion of the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
and optimize the generative adversarial network according to the first total loss and the second total loss.
In an alternative implementation, the weights corresponding to the losses are as follows:
the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples are weighted in the ratio 1:1:2:100;
the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples are weighted in the ratio 1:1:2:100.
The training apparatus provided in the embodiments of the present disclosure may likewise be dedicated hardware on a device, or software or firmware installed on a device; its implementation principle and technical effect are the same as those of the foregoing method embodiments.
Referring now to FIG. 6, a schematic diagram of an electronic device 60 suitable for implementing embodiments of the present disclosure is shown. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., car navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is only an example and does not limit the functions or scope of use of the embodiments of the present disclosure.
The electronic device includes: a memory and a processor, wherein the processor may be referred to as the processing device 601 hereinafter, and the memory may include at least one of a Read Only Memory (ROM)602, a Random Access Memory (RAM)603 and a storage device 608 hereinafter, which are specifically shown as follows:
as shown in fig. 6, the electronic device 60 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. Various programs and data necessary for the operation of the electronic device 60 are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, magnetic tape, hard disk, etc.; and communication devices 609. The communication devices 609 may allow the electronic device 60 to communicate wirelessly or by wire with other devices to exchange data.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communications networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet) and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the image processing method shown in any of the above embodiments of the present disclosure.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. The designation of a module or unit does not, in some cases, constitute a limitation of the unit itself.
For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, according to one or more embodiments of the present disclosure, an image processing method including:
acquiring an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by adversarial training of the two generation networks and two discrimination networks included in a generative adversarial network, and the two generation networks correspond to mutually inverse style change processing;
and performing target stylization processing on the image to be processed through the target stylized special effect network to obtain a target-style image. An inference sketch follows.
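At inference time only the trained first generation network is needed. A minimal sketch, assuming a 256x256 input size and standard torchvision preprocessing (neither is specified by the disclosure):

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical preprocessing; the disclosure fixes neither size nor normalization.
preprocess = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

def stylize(image_path: str, generator: torch.nn.Module) -> Image.Image:
    """Apply the trained target stylized special effect network to one image."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    generator.eval()
    with torch.no_grad():
        styled = generator(image).squeeze(0).clamp(0.0, 1.0)
    return transforms.ToPILImage()(styled)
```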
In an alternative implementation, the target stylized special effect network is trained by the following steps (a training-loop sketch follows this list):
acquiring a training sample set, wherein each group of training samples in the training sample set includes an original-style image and a target-style image;
acquiring a pre-constructed generative adversarial network, wherein the generative adversarial network includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
performing target stylization processing on the original-style image in each group of training samples through the first generation network to obtain a corresponding first generated image, and performing style restoration processing on the first generated image through the second generation network to obtain a corresponding second generated image;
performing style restoration processing on the target-style image in each group of training samples through the second generation network to obtain a corresponding third generated image, and performing target stylization processing on the third generated image through the first generation network to obtain a corresponding fourth generated image;
discriminating the authenticity of the target-style image and of the first generated image in each group of training samples through the first discrimination network to obtain a corresponding first discrimination result;
discriminating the authenticity of the original-style image and of the third generated image in each group of training samples through the second discrimination network to obtain a corresponding second discrimination result;
and performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and determining the trained first generation network as the target stylized special effect network.
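Tying the earlier sketches together, one training loop consistent with these steps might look as follows. For brevity the generators and discriminators are stepped from the same fused losses; production CycleGAN-style code usually alternates separate generator and discriminator updates, so treat this as a sketch under that simplification, not as the patented procedure.

```python
import itertools
import torch

def train(gan, loader, epochs: int = 1, lr: float = 2e-4):
    """gan: an AdversarialGenerationNetwork; loader yields (original, target) batches."""
    g_opt = torch.optim.Adam(itertools.chain(gan.g1.parameters(), gan.g2.parameters()), lr=lr)
    d_opt = torch.optim.Adam(itertools.chain(gan.d1.parameters(), gan.d2.parameters()), lr=lr)
    for _ in range(epochs):
        for original, target in loader:
            first, second, third, fourth = gan(original, target)
            # First direction: D1 judges target-style images; the cycle is original -> second.
            first_losses = adversarial_and_image_losses(gan.d1, target, first, original, second)
            # Second direction: D2 judges original-style images; the cycle is target -> fourth.
            second_losses = adversarial_and_image_losses(gan.d2, original, third, target, fourth)
            loss = total_loss(*first_losses) + total_loss(*second_losses)
            g_opt.zero_grad(); d_opt.zero_grad()
            loss.backward()
            g_opt.step(); d_opt.step()
    # The trained first generation network is the target stylized special effect network.
    return gan.g1
```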
In an optional implementation, performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples includes:
determining a corresponding first adversarial loss according to the first discrimination result for each group of training samples;
determining a first image loss between the original-style image and the second generated image in each group of training samples;
determining a corresponding second adversarial loss according to the second discrimination result for each group of training samples;
determining a second image loss between the target-style image and the fourth generated image in each group of training samples;
and optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples.
In an alternative implementation, the first adversarial loss includes a real-sample loss corresponding to the target-style image, a fake-as-real loss corresponding to the first generated image, and a fake-as-fake loss corresponding to the first generated image;
the second adversarial loss includes a real-sample loss corresponding to the original-style image, a fake-as-real loss corresponding to the third generated image, and a fake-as-fake loss corresponding to the third generated image.
In an alternative implementation, optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples includes:
performing, according to the weight corresponding to each loss, weighted fusion of the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and weighted fusion of the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
and optimizing the generative adversarial network according to the first total loss and the second total loss.
In an alternative implementation, the weights corresponding to the losses are as follows:
the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples are weighted in the ratio 1:1:2:100;
the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples are weighted in the ratio 1:1:2:100.
Example 2 provides, according to one or more embodiments of the present disclosure, an image processing apparatus for performing the method of Example 1, the apparatus including:
an acquisition module, configured to acquire an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by adversarial training of the two generation networks and two discrimination networks included in a generative adversarial network, and the two generation networks correspond to mutually inverse style change processing;
and a special effect processing module, configured to perform target stylization processing on the image to be processed through the target stylized special effect network to obtain a target-style image.
In an alternative implementation, the target stylized special effect network is trained by the following steps:
acquiring a training sample set, wherein each group of training samples in the training sample set includes an original-style image and a target-style image;
acquiring a pre-constructed generative adversarial network, wherein the generative adversarial network includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
performing target stylization processing on the original-style image in each group of training samples through the first generation network to obtain a corresponding first generated image, and performing style restoration processing on the first generated image through the second generation network to obtain a corresponding second generated image;
performing style restoration processing on the target-style image in each group of training samples through the second generation network to obtain a corresponding third generated image, and performing target stylization processing on the third generated image through the first generation network to obtain a corresponding fourth generated image;
discriminating the authenticity of the target-style image and of the first generated image in each group of training samples through the first discrimination network to obtain a corresponding first discrimination result;
discriminating the authenticity of the original-style image and of the third generated image in each group of training samples through the second discrimination network to obtain a corresponding second discrimination result;
and performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and determining the trained first generation network as the target stylized special effect network.
In an optional implementation, the process of performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples includes:
determining a corresponding first adversarial loss according to the first discrimination result for each group of training samples;
determining a first image loss between the original-style image and the second generated image in each group of training samples;
determining a corresponding second adversarial loss according to the second discrimination result for each group of training samples;
determining a second image loss between the target-style image and the fourth generated image in each group of training samples;
and optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples.
In an alternative implementation, the first adversarial loss includes a real-sample loss corresponding to the target-style image, a fake-as-real loss corresponding to the first generated image, and a fake-as-fake loss corresponding to the first generated image;
the second adversarial loss includes a real-sample loss corresponding to the original-style image, a fake-as-real loss corresponding to the third generated image, and a fake-as-fake loss corresponding to the third generated image.
In an alternative implementation, the process of optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples includes:
performing, according to the weight corresponding to each loss, weighted fusion of the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and weighted fusion of the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
and optimizing the generative adversarial network according to the first total loss and the second total loss.
In an alternative implementation, the weights corresponding to the losses are as follows:
the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples are weighted in the ratio 1:1:2:100;
the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples are weighted in the ratio 1:1:2:100.
Example 3 provides, according to one or more embodiments of the present disclosure, a training apparatus, including:
a sample acquisition module, configured to acquire a training sample set, wherein each group of training samples in the training sample set includes an original-style image and a target-style image;
a network acquisition module, configured to acquire a pre-constructed generative adversarial network, wherein the generative adversarial network includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
the first generation network being configured to perform target stylization processing on the original-style image in each group of training samples to obtain a corresponding first generated image, and the second generation network being configured to perform style restoration processing on the first generated image to obtain a corresponding second generated image;
the second generation network being configured to perform style restoration processing on the target-style image in each group of training samples to obtain a corresponding third generated image, and the first generation network being configured to perform target stylization processing on the third generated image to obtain a corresponding fourth generated image;
the first discrimination network being configured to discriminate the authenticity of the target-style image and of the first generated image in each group of training samples to obtain a corresponding first discrimination result;
the second discrimination network being configured to discriminate the authenticity of the original-style image and of the third generated image in each group of training samples to obtain a corresponding second discrimination result;
and a network training module, configured to perform adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and to determine the trained first generation network as the target stylized special effect network.
In an optional implementation, when performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, the network training module is specifically configured to:
determine a corresponding first adversarial loss according to the first discrimination result for each group of training samples;
determine a first image loss between the original-style image and the second generated image in each group of training samples;
determine a corresponding second adversarial loss according to the second discrimination result for each group of training samples;
determine a second image loss between the target-style image and the fourth generated image in each group of training samples;
and optimize the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples.
In an alternative implementation, the first adversarial loss includes a real-sample loss corresponding to the target-style image, a fake-as-real loss corresponding to the first generated image, and a fake-as-fake loss corresponding to the first generated image;
the second adversarial loss includes a real-sample loss corresponding to the original-style image, a fake-as-real loss corresponding to the third generated image, and a fake-as-fake loss corresponding to the third generated image.
In an optional implementation, when optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples, the network training module is specifically configured to:
perform, according to the weight corresponding to each loss, weighted fusion of the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and weighted fusion of the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
and optimize the generative adversarial network according to the first total loss and the second total loss.
In an alternative implementation, the weights corresponding to the losses are as follows:
the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples are weighted in the ratio 1:1:2:100;
the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples are weighted in the ratio 1:1:2:100.
Example 4 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:
a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement a method as shown in example 1 or any of the alternative implementations of example 1 of the present disclosure.
Example 5 provides a computer readable medium storing computer instructions, a program, a code set or an instruction set which, when run on a computer, causes the computer to perform the method shown in Example 1 or any of the alternative implementations of Example 1 of the present disclosure.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by adversarial training of the two generation networks and two discrimination networks included in a generative adversarial network, and the two generation networks correspond to mutually inverse style change processing;
and performing target stylization processing on the image to be processed through the target stylized special effect network to obtain a target-style image.
2. The image processing method of claim 1, wherein the target stylized special effect network is trained by:
acquiring a training sample set, wherein each group of training samples in the training sample set includes an original-style image and a target-style image;
acquiring a pre-constructed generative adversarial network, wherein the generative adversarial network includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
performing target stylization processing on the original-style image in each group of training samples through the first generation network to obtain a corresponding first generated image, and performing style restoration processing on the first generated image through the second generation network to obtain a corresponding second generated image;
performing style restoration processing on the target-style image in each group of training samples through the second generation network to obtain a corresponding third generated image, and performing target stylization processing on the third generated image through the first generation network to obtain a corresponding fourth generated image;
discriminating the authenticity of the target-style image and of the first generated image in each group of training samples through the first discrimination network to obtain a corresponding first discrimination result;
discriminating the authenticity of the original-style image and of the third generated image in each group of training samples through the second discrimination network to obtain a corresponding second discrimination result;
and performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and determining the trained first generation network as the target stylized special effect network.
3. The image processing method according to claim 2, wherein the performing adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples comprises:
determining a corresponding first adversarial loss according to the first discrimination result for each group of training samples;
determining a first image loss between the original-style image and the second generated image in each group of training samples;
determining a corresponding second adversarial loss according to the second discrimination result for each group of training samples;
determining a second image loss between the target-style image and the fourth generated image in each group of training samples;
and optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples.
4. The image processing method of claim 3, wherein the first adversarial loss comprises a real-sample loss corresponding to the target-style image, a fake-as-real loss corresponding to the first generated image, and a fake-as-fake loss corresponding to the first generated image;
and the second adversarial loss comprises a real-sample loss corresponding to the original-style image, a fake-as-real loss corresponding to the third generated image, and a fake-as-fake loss corresponding to the third generated image.
5. The image processing method of claim 4, wherein the optimizing the generative adversarial network according to the first adversarial loss, the first image loss, the second adversarial loss and the second image loss for each group of training samples comprises:
performing, according to the weight corresponding to each loss, weighted fusion of the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples to obtain a first total loss, and weighted fusion of the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples to obtain a second total loss;
and optimizing the generative adversarial network according to the first total loss and the second total loss.
6. The image processing method according to claim 5, wherein:
the real-sample loss corresponding to the target-style image, the fake-as-real loss corresponding to the first generated image, the fake-as-fake loss corresponding to the first generated image and the first image loss in each group of training samples are weighted in the ratio 1:1:2:100;
and the real-sample loss corresponding to the original-style image, the fake-as-real loss corresponding to the third generated image, the fake-as-fake loss corresponding to the third generated image and the second image loss in each group of training samples are weighted in the ratio 1:1:2:100.
7. An image processing apparatus characterized by comprising:
an acquisition module, configured to acquire an image to be processed and a pre-trained target stylized special effect network, wherein the target stylized special effect network is obtained by adversarial training of the two generation networks and two discrimination networks included in a generative adversarial network, and the two generation networks correspond to mutually inverse style change processing;
and a special effect processing module, configured to perform target stylization processing on the image to be processed through the target stylized special effect network to obtain a target-style image.
8. A training apparatus, comprising:
a sample acquisition module, configured to acquire a training sample set, wherein each group of training samples in the training sample set includes an original-style image and a target-style image;
a network acquisition module, configured to acquire a pre-constructed generative adversarial network, wherein the generative adversarial network includes a first generation network, a second generation network, a first discrimination network and a second discrimination network;
the first generation network being configured to perform target stylization processing on the original-style image in each group of training samples to obtain a corresponding first generated image, and the second generation network being configured to perform style restoration processing on the first generated image to obtain a corresponding second generated image;
the second generation network being configured to perform style restoration processing on the target-style image in each group of training samples to obtain a corresponding third generated image, and the first generation network being configured to perform target stylization processing on the third generated image to obtain a corresponding fourth generated image;
the first discrimination network being configured to discriminate the authenticity of the target-style image and of the first generated image in each group of training samples to obtain a corresponding first discrimination result;
the second discrimination network being configured to discriminate the authenticity of the original-style image and of the third generated image in each group of training samples to obtain a corresponding second discrimination result;
and a network training module, configured to perform adversarial training on the generative adversarial network based on the first discrimination result, the second discrimination result, the second generated image and the fourth generated image corresponding to each group of training samples, and to determine the trained first generation network as the target stylized special effect network.
9. An electronic device, comprising:
a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of any of claims 1-5.
10. A computer readable medium for storing a computer instruction, a program, a set of codes, or a set of instructions, which when run on a computer, causes the computer to perform the method of any one of claims 1-5.
CN202010158768.7A 2020-03-09 2020-03-09 Image processing method, image processing device, electronic equipment and computer readable medium Pending CN111402151A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010158768.7A CN111402151A (en) 2020-03-09 2020-03-09 Image processing method, image processing device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010158768.7A CN111402151A (en) 2020-03-09 2020-03-09 Image processing method, image processing device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN111402151A true CN111402151A (en) 2020-07-10

Family

ID=71430612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010158768.7A Pending CN111402151A (en) 2020-03-09 2020-03-09 Image processing method, image processing device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN111402151A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241709A (en) * 2020-10-21 2021-01-19 北京字跳网络技术有限公司 Image processing method, and training method and device of beard transformation network
CN112329932A (en) * 2020-10-30 2021-02-05 深圳市优必选科技股份有限公司 Training method and device for generating countermeasure network and terminal equipment
CN113793258A (en) * 2021-09-18 2021-12-14 超级视线科技有限公司 Privacy protection method and device for monitoring video image
CN114154613A (en) * 2020-09-07 2022-03-08 北京达佳互联信息技术有限公司 Training method and device for image style migration model and electronic equipment
CN114418835A (en) * 2022-01-25 2022-04-29 北京字跳网络技术有限公司 Image processing method, apparatus, device and medium
WO2023116744A1 (en) * 2021-12-21 2023-06-29 北京字跳网络技术有限公司 Image processing method and apparatus, device, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190035118A1 (en) * 2017-07-28 2019-01-31 Shenzhen United Imaging Healthcare Co., Ltd. System and method for image conversion
CN107862668A (en) * 2017-11-24 2018-03-30 河海大学 A kind of cultural relic images restored method based on GNN
CN107945133A (en) * 2017-11-30 2018-04-20 北京小米移动软件有限公司 Image processing method and device
US20190392624A1 (en) * 2018-06-20 2019-12-26 Ahmed Elgammal Creative gan generating art deviating from style norms
CN109816589A (en) * 2019-01-30 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating cartoon style transformation model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUN-YAN ZHU ET AL.: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", arXiv:1703.10593v6 [cs.CV] *
CAO YANGJIE ET AL.: "A Survey of Generative Adversarial Networks and Their Computer Vision Applications", Journal of Image and Graphics *

Similar Documents

Publication Publication Date Title
CN111402151A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111402112A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN109583391B (en) Key point detection method, device, equipment and readable medium
CN114282581B (en) Training sample acquisition method and device based on data enhancement and electronic equipment
CN111402113B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112381717A (en) Image processing method, model training method, device, medium, and apparatus
CN111209377B (en) Text processing method, device, equipment and medium based on deep learning
CN114494071A (en) Image processing method, device, equipment and storage medium
CN112418249A (en) Mask image generation method and device, electronic equipment and computer readable medium
CN115272182A (en) Lane line detection method, lane line detection device, electronic device, and computer-readable medium
CN110956128A (en) Method, apparatus, electronic device, and medium for generating lane line image
CN111402159B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113592033B (en) Oil tank image recognition model training method, oil tank image recognition method and device
CN111402133A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113570510A (en) Image processing method, device, equipment and storage medium
CN117114306A (en) Information generation method, apparatus, electronic device and computer readable medium
CN111402154A (en) Image beautifying method and device, electronic equipment and computer readable storage medium
CN111369429B (en) Image processing method, image processing device, electronic equipment and computer readable medium
US20240290135A1 (en) 2024-08-29 Method, electronic device, and storage medium for image processing
CN111369468B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN110956127A (en) Method, apparatus, electronic device, and medium for generating feature vector
CN115408609A (en) Parking route recommendation method and device, electronic equipment and computer readable medium
CN113031950B (en) Picture generation method, device, equipment and medium
CN115471477A (en) Scanning data denoising method, scanning device, scanning equipment and medium
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination