CN113470124B - Training method and device for special effect model, and special effect generation method and device - Google Patents

Training method and device for special effect model, and special effect generation method and device

Info

Publication number
CN113470124B
Authority
CN
China
Prior art keywords
value
loss function
image
feature vector
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110736335.XA
Other languages
Chinese (zh)
Other versions
CN113470124A (en)
Inventor
赵松涛
宋丛礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110736335.XA
Publication of CN113470124A
Application granted
Publication of CN113470124B
Legal status: Active


Classifications

    • G06T 7/90: Image analysis; Determination of colour characteristics
    • G06N 3/045: Neural networks; Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 7/10: Image analysis; Segmentation; Edge detection
    • G06T 2207/10024: Image acquisition modality; Color image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/30196: Subject of image; Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to a training method and device for a special effect model, and a special effect generation method and device. The training method of the special effect model comprises the following steps: acquiring a training sample set, wherein the training sample set comprises a plurality of training images; inputting a training image and a first mask map into the special effect model to obtain an output image in which a special effect has been added to a preset object in the training image, and a second mask map, wherein the first mask map is the mask map of the preset object in the training image, and the second mask map is the mask map of the preset object with the special effect added in the output image; determining a value of a target loss function based on the training image, the output image, the first mask map, and the second mask map; and adjusting parameters of the special effect model based on the value of the target loss function to train the special effect model. The method and the device can solve the problem in the related art that GAN special effects are not ideal in terms of attributes such as color and texture.

Description

Training method and device for special effect model, and special effect generation method and device
Technical Field
The disclosure relates to the field of image processing, and in particular relates to a training method and device for a special effect model, and a special effect generation method and device.
Background
At present, special effects based on generative adversarial networks (Generative Adversarial Networks, abbreviated as GAN) are increasingly common in popular short-video and camera applications, such as aging, child-face, cartoon-face and hand-drawing effects. Because they change certain attributes or styles (such as facial attributes or styles) in a very natural and realistic way, these special effects are deeply favored by users.
In the related art, an image-domain (domain) transfer scheme such as CycleGAN or StarGAN is generally used to control the attributes of an object to be processed and thereby change those attributes, but the performance of such schemes on attributes such as color and texture is not ideal. Taking the hair-growth special effect as an example, the color transition of the newly grown hair is not ideal during the growth process, especially for non-black hair (such as red, yellow or brown): the color of the grown hair does not match the hair color in the original picture to some extent, for example black hair grows out on the basis of originally red hair, and a color fault (discontinuity) appears.
Disclosure of Invention
The disclosure provides a training method and device for a special effect model, and a special effect generation method and device, so as to at least solve the problem that GAN special effects in the related technology are not ideal in terms of color, texture and other attributes.
According to a first aspect of an embodiment of the present disclosure, there is provided a training method of a special effect model, including: acquiring a training sample set, wherein the training sample set comprises a plurality of training images; inputting a training image and a first mask map into the special effect model to obtain an output image with special effects added to a preset object in the training image and a second mask map, wherein the first mask map is a mask map of the preset object in the training image, and the second mask map is a mask map of the preset object with special effects added to the output image; determining a value of a target loss function based on the training image, the output image, the first mask map, and the second mask map; and adjusting parameters of the special effect model based on the value of the target loss function, and training the special effect model.
Optionally, determining the value of the target loss function based on the training image, the output image, the first mask map, and the second mask map includes: obtaining a real feature vector of the preset object in the training image based on the training image and the first mask map; obtaining a first estimated feature vector of the preset object in the output image based on the output image and the first mask map; obtaining a second estimated feature vector of the preset object with the special effect added in the output image based on the output image and the second mask map; and determining the value of the target loss function according to the real feature vector, the first estimated feature vector, and the second estimated feature vector.
Optionally, determining the value of the objective loss function based on the real feature vector, the first estimated feature vector, and the second estimated feature vector includes: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; determining a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determining the value of the target loss function from the value of the first target loss function, the value of the second target loss function, and the value of the third target loss function.
Optionally, determining the value of the objective loss function based on the real feature vector, the first estimated feature vector, and the second estimated feature vector includes: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; the value of the target loss function is determined from the value of the first target loss function and the value of the second target loss function.
Optionally, determining the value of the objective loss function based on the real feature vector, the first estimated feature vector, and the second estimated feature vector includes: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determining the value of the target loss function according to the value of the first target loss function and the value of the third target loss function.
Optionally, the value of the third objective loss function is determined by the following formula:
L3 = Σ_i Σ_j ((Z_{i,j+1} - Z_{i,j})^2 + (Z_{i+1,j} - Z_{i,j})^2)^(β/2)
wherein Z_{i,j} is the feature value in the i-th row and j-th column of the second estimated feature vector, and β is a value determined according to actual needs.
Optionally, the first mask image is obtained by inputting the training image into an image segmentation model.
Optionally, the special effects model includes a StarGAN model.
According to a second aspect of the embodiments of the present disclosure, there is provided a special effect generation method, including: acquiring an image to be processed; and inputting the image to be processed and a mask map of a preset object of the image to be processed into a special effect model to obtain an output image with a special effect added to the preset object of the image to be processed, wherein the special effect model is trained by adopting the training method of the special effect model described above.
Optionally, the mask map of the predetermined object of the image to be processed is obtained by inputting the image to be processed into the image segmentation model.
Optionally, the special effects model includes a StarGAN model.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for a special effect model, including: a sample acquisition unit configured to acquire a training sample set, wherein the training sample set contains a plurality of training images; the special effect adding unit is configured to input a training image and a first mask image into the special effect model to obtain an output image with special effects added to a preset object in the training image and a second mask image, wherein the first mask image is a mask image of the preset object in the training image, and the second mask image is a mask image of the preset object with special effects added to the output image; a loss function determining unit configured to determine a value of the target loss function based on the training image, the output image, the first mask map, and the second mask map; and the special effect model training unit is configured to adjust parameters of the special effect model based on the value of the target loss function and train the special effect model.
Optionally, the loss function determining unit is further configured to obtain a real feature vector of a predetermined object in the training image based on the training image and the first mask map; obtaining a first estimated feature vector of a preset object in the output image based on the output image and the first mask image; obtaining a second estimated feature vector of the preset object added with the special effect in the output image based on the output image and the second mask image; and determining the value of the target loss function according to the real feature vector, the first estimated feature vector and the second estimated feature vector.
Optionally, the loss function determining unit is further configured to determine a value of the first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; determine a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function from the value of the first target loss function, the value of the second target loss function, and the value of the third target loss function.
Optionally, the loss function determining unit is further configured to determine a value of the first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; the value of the target loss function is determined from the value of the first target loss function and the value of the second target loss function.
Optionally, the loss function determining unit is further configured to determine a value of the first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function according to the value of the first target loss function and the value of the third target loss function.
Optionally, the value of the third objective loss function is determined by the following formula:
L3 = Σ_i Σ_j ((Z_{i,j+1} - Z_{i,j})^2 + (Z_{i+1,j} - Z_{i,j})^2)^(β/2)
wherein Z_{i,j} is the feature value in the i-th row and j-th column of the second estimated feature vector, and β is a value determined according to actual needs.
Optionally, the first mask image is obtained by inputting the training image into an image segmentation model.
Optionally, the special effects model includes a StarGAN model.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a special effect generating apparatus, including: an image acquisition unit configured to acquire an image to be processed; the special effect adding unit is configured to input the to-be-processed image and the mask image of the preset object of the to-be-processed image into the special effect model to obtain an output image after adding the special effect to the preset object of the to-be-processed image, wherein the special effect model is trained by adopting the training method of the special effect model.
Optionally, the mask map of the predetermined object of the image to be processed is obtained by inputting the image to be processed into the image segmentation model.
Optionally, the special effects model includes a StarGAN model.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement a training method or a special effect generation method of the special effect model according to the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the training method of the special effect model or the special effect generation method according to the present disclosure.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a training method or a special effect generation method of a special effect model according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the training method and device for the special effect model and the special effect generation method and device provided by the embodiments of the disclosure, mask maps representing a preset object (such as a hair region) are added to both the input and the output of the special effect model during training. That is, the input of the special effect model is changed from the original training image alone to the training image together with a first mask map, and the output is changed from the original output image alone to the output image together with a second mask map. Parameters of the special effect model can therefore be adjusted based on the training image, the output image and the mask maps of the preset object, so that the trained special effect model renders attributes such as the color and texture of the preset object more naturally and achieves a more realistic effect. This solves the problem in the related art that GAN special effects are not ideal in terms of attributes such as color and texture.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is an implementation scenario diagram illustrating a training method of a special effects model according to an exemplary embodiment of the present disclosure;
fig. 2 is an effect diagram illustrating an effect generating method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a training method for a special effects model, according to an exemplary embodiment;
FIG. 4 is a diagram illustrating a comparison of a generated network in accordance with an exemplary embodiment;
FIG. 5 is a flowchart illustrating a method of effect generation, according to an exemplary embodiment;
FIG. 6 is a block diagram of a training apparatus for a special effects model, shown in accordance with an exemplary embodiment;
fig. 7 is a block diagram of an effect generation apparatus according to an exemplary embodiment;
fig. 8 is a block diagram of an electronic device 800 according to an embodiment of the disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The embodiments described in the examples below are not representative of all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "a combination of some of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "at least one of step one and step two is executed" covers the following three parallel cases: (1) executing step one; (2) executing step two; (3) executing step one and step two.
The training method for the special effect model and the special effect generation method of the present disclosure can make the transition of attributes such as color and texture of a preset object more natural and achieve a more realistic effect. They can be applied to, but are not limited to, the following preset objects: hair, lips, eyebrows, and so on. The hair-growth special effect is taken as an example in the following description.
Fig. 1 is a schematic diagram illustrating an implementation scenario of a training method of a special effect model according to an exemplary embodiment of the present disclosure, and as illustrated in fig. 1, the implementation scenario includes a server 100, a user terminal 110, and a user terminal 120, where the user terminals are not limited to 2, including but not limited to a mobile phone, a personal computer, and other devices, and the user terminal may be installed with a camera for acquiring an image including hair, where the server may be one server, or a server cluster formed by a plurality of servers, or may be a cloud computing platform or a virtualization center.
After receiving a request for training the special effect model sent by the user terminal 110 or 120, the server 100 collects the images containing hair historically received from the user terminals 110 and 120 and combines them into a training sample set. The server then inputs a training image from the training sample set and the mask map of the hair in the training image into the special effect model, and obtains an output image with a special effect added to the hair in the training image and the mask map of the hair with the special effect added in the output image. The value of the target loss function can thus be determined based on the training image, the output image, the mask map of the hair in the training image, and the mask map of the hair with the special effect added in the output image, after which the parameters of the special effect model are adjusted based on the value of the target loss function to train the special effect model. The special effect model trained in this way makes the transition of color, texture and other attributes of the growing hair more natural and achieves a more realistic effect.
Fig. 2 is an effect diagram illustrating a special effect generation method according to an exemplary embodiment of the present disclosure. After the special effect model has been trained and the user terminal 110 or 120 acquires a picture to be processed (the image on the left side of fig. 2), the picture to be processed is directly input into the special effect model, and an output image (the image on the right side of fig. 2) with the special effect added to the hair of the image to be processed is obtained; the color, texture and other attributes of the hair in the output image are more natural and more realistic.
Next, a training method and apparatus, and an effect generation method and apparatus of an effect model according to an exemplary embodiment of the present disclosure will be described in detail with reference to fig. 3 to 7.
FIG. 3 is a flowchart illustrating a method of training a special effects model, according to an exemplary embodiment, as shown in FIG. 3, comprising the steps of:
in step S301, a training sample set is acquired, wherein the training sample set contains a plurality of training images. For example, images historically containing hair may be counted and combined together as a training sample set.
In step S302, the training image and the first mask map are input into the special effect model, and an output image with the special effect added to the predetermined object in the training image and a second mask map are obtained, where the first mask map is the mask map of the predetermined object in the training image, and the second mask map is the mask map of the predetermined object with the special effect added in the output image. Before the training image and the first mask map are input into the special effect model, the feature vector of the training image and the feature vector of the first mask map may be spliced (concatenated), for example using a Concat function, and the spliced feature vector is then input into the special effect model. It should be noted that the splicing function is not limited to the Concat function and may be any other function capable of realizing the splicing.
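As a minimal sketch of this splicing step (assuming a PyTorch-style pipeline; the tensor shapes and the effect_model interface are illustrative assumptions, not the actual implementation of the disclosure):

    import torch

    # image: (N, 3, H, W) RGB training image; mask1: (N, 1, H, W) first mask map of the
    # predetermined object (e.g. the hair region), e.g. produced by an image segmentation model.
    image = torch.rand(1, 3, 256, 256)                    # placeholder training image
    mask1 = (torch.rand(1, 1, 256, 256) > 0.5).float()    # placeholder first mask map

    # Splice the image and the mask along the channel dimension (the Concat step above),
    # then feed the spliced tensor to the special effect model, which returns the output
    # image and the second mask map.
    model_input = torch.cat([image, mask1], dim=1)        # shape (N, 4, H, W)
    # output, mask2 = effect_model(model_input)           # hypothetical model interface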
According to an exemplary embodiment of the present disclosure, the above-described first mask image may be obtained by inputting a training image into an image segmentation model, and the image segmentation model may automatically identify a predetermined object in the training image, thereby obtaining a corresponding first mask image. By this embodiment, the first mask image can be quickly acquired by means of the image segmentation model.
According to an exemplary embodiment of the present disclosure, a StarGAN model may be adopted as the special effect model, so that a better special effect can be achieved. For example, taking the hair-growth effect as an example and comparing the generation network of the related art with the generation network of the present disclosure (see fig. 4), the input of the special effect model is changed from an Input image alone (i.e., the training image described above) to the Input image together with the mask map mask1 of the hair region corresponding to the Input image; for example, the Input image and mask1 are spliced by a Concat function and then input into the special effect model. The output of the special effect model is adaptively changed from an Output image alone to the Output image together with the mask map mask2 of the hair region corresponding to the Output image. It should be noted that any other model suitable for the present disclosure may be used as the special effect model, which is not limited in this disclosure.
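For intuition only, the change of inputs and outputs described above can be mirrored by widening the first and last layers of a StarGAN-style generator from 3 to 4 channels (RGB plus one mask channel). The toy module below is an assumption about one possible layout, not the actual network of the disclosure:

    import torch
    import torch.nn as nn

    class MaskedGenerator(nn.Module):
        # Toy generator whose input and output both carry an extra mask channel.
        def __init__(self, base_channels=64):
            super().__init__()
            self.encode = nn.Conv2d(3 + 1, base_channels, kernel_size=7, padding=3)  # image + mask1 in
            self.body = nn.Sequential(
                nn.ReLU(),
                nn.Conv2d(base_channels, base_channels, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            self.decode = nn.Conv2d(base_channels, 3 + 1, kernel_size=7, padding=3)  # image + mask2 out

        def forward(self, x):
            y = self.decode(self.body(self.encode(x)))
            out_image = y[:, :3]                 # output image with the special effect added
            mask2 = torch.sigmoid(y[:, 3:])      # second mask map of the object after the effect
            return out_image, mask2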
In step S303, the value of the target loss function is determined based on the training image, the output image, the first mask map, and the second mask map.
According to an exemplary embodiment of the present disclosure, determining a value of a target loss function based on a training image, an output image, a first mask map, and a second mask map may be achieved by: based on the training image and the first mask map, obtaining a real feature vector of a preset object in the training image; obtaining a first estimated feature vector of a preset object in the output image based on the output image and the first mask image; obtaining a second estimated feature vector of the preset object added with the special effect in the output image based on the output image and the second mask image; and determining the value of the target loss function according to the real feature vector, the first estimated feature vector and the second estimated feature vector. By the embodiment, the value of the target loss function can be conveniently determined.
For example, taking the hair-growth effect as an example, the Input image (i.e., the training image) multiplied by the hair region mask map mask1 corresponding to the Input image may be used as the real feature vector, the Output image multiplied by mask1 may be used as the first estimated feature vector, and the Output image multiplied by the hair region mask map mask2 corresponding to the Output image may be used as the second estimated feature vector.
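A short sketch of these three feature vectors; the placeholder tensors stand in for the training image, the first mask map, and the output image and second mask map returned by the (hypothetical) effect model:

    import torch

    # Placeholders; in a real training loop these come from the data and the effect model.
    image = torch.rand(1, 3, 256, 256)
    output = torch.rand(1, 3, 256, 256)
    mask1 = (torch.rand(1, 1, 256, 256) > 0.5).float()
    mask2 = (torch.rand(1, 1, 256, 256) > 0.5).float()

    X = image * mask1      # real feature vector (original hair region of the input image)
    Y = output * mask1     # first estimated feature vector (original hair region of the output image)
    Z = output * mask2     # second estimated feature vector (grown hair region of the output image)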
According to an exemplary embodiment of the present disclosure, determining the value of the objective loss function from the real feature vector, the first estimated feature vector, and the second estimated feature vector may include: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; determining a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determining the value of the target loss function from the value of the first target loss function, the value of the second target loss function, and the value of the third target loss function. Through this embodiment, the values of the three target loss functions jointly constrain the preset object, which not only ensures the continuity of the color and texture of the preset object in the output image but also ensures their smoothness, so that the special effect of the output image is more natural and lifelike.
According to an exemplary embodiment of the present disclosure, the value of the first target loss function described above may be determined by the following formula:
L1=|X-Y| (1)
wherein X is the true feature vector and Y is the first estimated feature vector.
It should be noted that the above formula may be replaced by any function, such as an L1 constraint, that can constrain the region of the predetermined object in the output image so that it covers the predetermined object of the input image. For example, taking the hair-growth special effect as an example, the above formula can ensure that the region corresponding to the original hair in the output image is completely consistent with the original hair region in the input image.
According to an exemplary embodiment of the present disclosure, the value of the above-described second objective loss function may be determined by the following formula:
L2 = (X - Z)^2 (2)
wherein X is the true feature vector and Z is the second estimated feature vector.
It should be noted that the above formula may be replaced by any function, such as a VGG loss, that can constrain the information of the predetermined object of the input image against the information of the predetermined object of the output image, so as to ensure the continuity of attributes such as the color and texture of the predetermined object. For example, taking the hair-growth special effect as an example, the above formula can ensure that the hair grown in the output image is continuous in color and texture with the original hair in the input image, without any color fault.
According to an exemplary embodiment of the present disclosure, the value of the above-described third objective loss function may be determined by the following formula:
L3 = Σ_i Σ_j ((Z_{i,j+1} - Z_{i,j})^2 + (Z_{i+1,j} - Z_{i,j})^2)^(β/2) (3)
wherein Z_{i,j} is the feature value in the i-th row and j-th column of the second estimated feature vector, and β is a value determined according to actual needs, for example, 2 may be taken.
It should be noted that the above formula may be replaced by any function that adds a regularization term over the information of the predetermined object of the input image and the information of the predetermined object of the output image to suppress noise and ensure the smoothness of the image. For example, taking the hair-growth special effect as an example, it can ensure that the color and texture of the hair grown in the output image are as smooth as those of the original hair in the input image, without noise points.
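Assuming formula (3) is the total-variation-style regularizer given above, a sketch of the third target loss function could be written as follows (beta defaults to the example value 2):

    def third_target_loss(Z, beta=2.0):
        # Formula (3): squared differences of adjacent feature values along each row and
        # each column of the second estimated feature vector Z (shape (..., H, W)).
        row_diff = Z[..., :, 1:] - Z[..., :, :-1]     # adjacent values on each row vector
        col_diff = Z[..., 1:, :] - Z[..., :-1, :]     # adjacent values on each column vector
        l3 = (row_diff[..., :-1, :] ** 2 + col_diff[..., :, :-1] ** 2) ** (beta / 2.0)
        return l3.sum()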
Determining the value of the target loss function according to the values of the first, second and third target loss functions may be performed by summing the three values with respective preset weights to obtain the value of the target loss function. The respective preset weights may be set according to actual needs, or suitable weights may be selected by comparing multiple iteration results during training. For example, the formula for the value of the target loss function may be as follows:
Loss=lambda1×L1+lambda2×L2+lambda3×L3 (4)
The weights lambda1, lambda2 and lambda3 are set according to actual needs, or suitable weights are selected by comparing multiple iteration results during training.
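Putting formulas (1), (2) and (4) together with the sketch above, the full objective might be assembled as below; the plain squared error standing in for a perceptual (VGG) loss and the default weights are placeholder choices, and third_target_loss is the function sketched earlier:

    def target_loss(X, Y, Z, lambda1=1.0, lambda2=1.0, lambda3=1.0, beta=2.0):
        l1 = (X - Y).abs().sum()            # formula (1): original hair region must be preserved
        l2 = ((X - Z) ** 2).sum()           # formula (2): colour/texture continuity of the grown hair
        l3 = third_target_loss(Z, beta)     # formula (3): smoothness regularizer
        return lambda1 * l1 + lambda2 * l2 + lambda3 * l3   # formula (4)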
According to an exemplary embodiment of the present disclosure, determining the value of the target loss function from the real feature vector, the first estimated feature vector, and the second estimated feature vector may further include: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; and determining the value of the target loss function from the value of the first target loss function and the value of the second target loss function. Through this embodiment, the values of the two target loss functions jointly constrain the preset object, which ensures the continuity of the color and texture of the preset object in the output image, so that the special effect of the output image is more natural and lifelike. The values of the first target loss function and the second target loss function are discussed in detail above and will not be repeated here.
According to an exemplary embodiment of the present disclosure, determining the value of the target loss function from the real feature vector, the first estimated feature vector, and the second estimated feature vector may further include: determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determining a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determining the value of the target loss function according to the value of the first target loss function and the value of the third target loss function. Through this embodiment, the values of the two target loss functions jointly constrain the preset object, which ensures the smoothness of the color and texture of the preset object in the output image, so that the special effect of the output image is more natural and lifelike. The values of the first target loss function and the third target loss function are discussed in detail above and will not be repeated here.
In step S304, parameters of the special effect model are adjusted based on the value of the objective loss function, and the special effect model is trained.
In summary, during the training of the special effect model, the mask map of the preset object in the input image and the mask map of the preset object in the output image are added, and the special effect model is constrained based on these two mask maps, which ensures that the transition of the color, texture and other attributes of the preset object in the output image of the trained special effect model is more continuous, stable and natural, bringing the user a more realistic hair-growth experience. Furthermore, by means of the three loss functions constraining the preset object, the continuity, stability and naturalness of the transition between the preset object of the output image and the preset object of the input image are further guaranteed, so that the special effect is more realistic and natural and the user experience is better.
Fig. 5 is a flowchart illustrating a special effect generation method according to an exemplary embodiment, and as shown in fig. 5, the special effect generation method includes the steps of:
in step 501, an image to be processed is acquired. The image to be processed can be an image acquired through a camera on the terminal, or can be an image stored locally in advance.
In step 502, the image to be processed and the mask map of the predetermined object of the image to be processed are input into the special effect model, and an output image with the special effect added to the predetermined object of the image to be processed is obtained, where the special effect model is trained by using the training method of the special effect model described above. The mask map of the predetermined object of the image to be processed is obtained by inputting the image to be processed into an image segmentation model, which can automatically identify the predetermined object in the image to be processed and thereby produce the corresponding mask map. For example, taking the hair-growth effect as an example, the special effect model may be a StarGAN model: the image to be processed and the mask map mask1 of the hair region corresponding to the image to be processed are input into the trained StarGAN model, for example by splicing the image and mask1 with a Concat function before inputting them, and the StarGAN model outputs an output image with the special effect added to the hair of the image to be processed. It should be noted that any other model suitable for the present disclosure may be used as the special effect model, which is not limited in this disclosure.
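For illustration, inference with the trained model could follow the same splicing convention as training; the segmentation_model and effect_model interfaces below are the same hypothetical ones used in the earlier sketches:

    import torch

    def add_hair_effect(image, segmentation_model, effect_model):
        # image: (N, 3, H, W) image to be processed.
        mask = segmentation_model(image)                  # mask map of the predetermined object (hair)
        model_input = torch.cat([image, mask], dim=1)     # splice image and mask, as during training
        output_image, _ = effect_model(model_input)       # the second mask map is not needed here
        return output_image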
FIG. 6 is a block diagram of a training apparatus for a special effects model, according to an exemplary embodiment. Referring to fig. 6, the apparatus includes a sample acquisition unit 60, a special effect adding unit 62, a loss function determining unit 64, and a special effect model training unit 66.
A sample acquisition unit 60 configured to acquire a training sample set, wherein the training sample set contains a plurality of training images; the special effect adding unit 62 is configured to input a training image and a first mask map into the special effect model, and obtain an output image with special effects added to a predetermined object in the training image and a second mask map, wherein the first mask map is a mask map of the predetermined object in the training image, and the second mask map is a mask map of the predetermined object with special effects added to the output image; a loss function determining unit 64 configured to determine a value of the target loss function based on the training image, the output image, the first mask map, and the second mask map; the special effects model training unit 66 is configured to adjust parameters of the special effects model based on the value of the objective loss function, and train the special effects model.
According to an embodiment of the present disclosure, the loss function determining unit 64 is further configured to obtain a true feature vector of the predetermined object in the training image based on the training image and the first mask map; obtaining a first estimated feature vector of a preset object in the output image based on the output image and the first mask image; obtaining a second estimated feature vector of the preset object added with the special effect in the output image based on the output image and the second mask image; and determining the value of the target loss function according to the real feature vector, the first estimated feature vector and the second estimated feature vector.
According to an embodiment of the present disclosure, the loss function determination unit 64 is further configured to determine a value of the first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; determine a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function from the value of the first target loss function, the value of the second target loss function, and the value of the third target loss function.
According to an embodiment of the present disclosure, the loss function determination unit 64 is further configured to determine a value of the first target loss function based on an absolute value of a difference value of the real feature vector and the first estimated feature vector; determining a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; the value of the target loss function is determined from the value of the first target loss function and the value of the second target loss function.
According to an embodiment of the present disclosure, the loss function determination unit 64 is further configured to determine a value of the first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function according to the value of the first target loss function and the value of the third target loss function.
According to an embodiment of the present disclosure, the value of the third objective loss function is determined by the following formula:
L3 = Σ_i Σ_j ((Z_{i,j+1} - Z_{i,j})^2 + (Z_{i+1,j} - Z_{i,j})^2)^(β/2)
wherein Z_{i,j} is the feature value in the i-th row and j-th column of the second estimated feature vector, and β is a value determined according to actual needs.
According to an embodiment of the present disclosure, the first mask image is obtained by inputting a training image into the image segmentation model.
According to an embodiment of the present disclosure, the special effects model includes a StarGAN model.
Fig. 7 is a block diagram of an effect generation apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus includes an image acquisition unit 70 and a special effect adding unit 72.
An image acquisition unit 70 configured to acquire an image to be processed; the special effect adding unit 72 is configured to input the mask image of the image to be processed and the predetermined object of the image to be processed into a special effect model, so as to obtain an output image after adding the special effect to the predetermined object of the image to be processed, wherein the special effect model is trained by using the training method of the special effect model as described above.
According to an embodiment of the present disclosure, a mask map of a predetermined object of an image to be processed is obtained by inputting the image to be processed into an image segmentation model.
According to an embodiment of the present disclosure, the special effects model includes a StarGAN model.
According to embodiments of the present disclosure, an electronic device may be provided. Fig. 8 is a block diagram of an electronic device 800 according to an embodiment of the present disclosure, which includes at least one memory 801 storing a set of computer-executable instructions and at least one processor 802; when the set of instructions is executed by the at least one processor, the training method of the special effect model or the special effect generation method according to an embodiment of the present disclosure is performed.
By way of example, the electronic device 800 may be a PC, a tablet device, a personal digital assistant, a smart phone, or any other device capable of executing the above-described set of instructions. Here, the electronic device 800 is not necessarily a single electronic device; it may also be any apparatus or collection of circuits capable of executing the above-described instructions (or instruction sets) individually or in combination. The electronic device 800 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In electronic device 800, processor 802 may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor 802 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor 802 may execute instructions or code stored in the memory, wherein the memory 801 may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory 801 may be integrated with the processor 802, for example as RAM or flash memory arranged within an integrated circuit microprocessor or the like. In addition, the memory 801 may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory 801 and the processor 802 may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor 802 can read files stored in the memory 801.
In addition, the electronic device 800 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein the instructions in the computer-readable storage medium, when executed by the at least one processor, cause the at least one processor to perform the training method or the special effect generation method of the special effect model of the embodiment of the present disclosure. Examples of the computer readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, nonvolatile memory, CD-ROM, CD-R, CD + R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD + R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, blu-ray or optical disk storage, hard Disk Drives (HDD), solid State Disks (SSD), card memory (such as multimedia cards, secure Digital (SD) cards or ultra-fast digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage, hard disks, solid state disks, and any other means configured to store computer programs and any associated data, data files and data structures in a non-transitory manner and to provide the computer programs and any associated data, data files and data structures to a processor or computer to enable the processor or computer to execute the programs. The computer programs in the computer readable storage media described above can be run in an environment deployed in a computer device, such as a client, host, proxy device, server, etc., and further, in one example, the computer programs and any associated data, data files, and data structures are distributed across networked computer systems such that the computer programs and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a training method or a special effect generation method of a special effect model of an embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (24)

1. A method for training a special effect model, comprising:
acquiring a training sample set, wherein the training sample set comprises a plurality of training images;
Inputting a training image and a first mask map into a special effect model to obtain an output image and a second mask map after adding special effects to a preset object in the training image, wherein the first mask map is the mask map of the preset object in the training image, and the second mask map is the mask map of the preset object after adding special effects in the output image;
determining a value of an objective loss function based on the training image, the output image, a first mask map, and a second mask map;
and adjusting parameters of the special effect model based on the value of the target loss function, and training the special effect model.
2. The training method of claim 1, wherein the determining the value of the objective loss function based on the training image, the output image, the first mask map, and the second mask map comprises:
based on the training image and the first mask map, obtaining a real feature vector of the predetermined object in the training image;
obtaining a first estimated feature vector of the preset object in the output image based on the output image and the first mask map;
obtaining a second estimated feature vector of the preset object added with the special effect in the output image based on the output image and the second mask map;
And determining the value of the target loss function according to the real feature vector, the first estimated feature vector and the second estimated feature vector.
3. The training method of claim 2 wherein said determining a value of a target loss function from said true feature vector, said first estimated feature vector, and said second estimated feature vector comprises:
determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector;
determining a value of a second objective loss function based on a square value of a difference between the real feature vector and the second estimated feature vector;
determining a value of a third target loss function based on the square value of the difference between two adjacent feature values on each row vector of the second estimated feature vector and the square value of the difference between two adjacent feature values on each column vector of the second estimated feature vector;
and determining the value of the target loss function according to the value of the first target loss function, the value of the second target loss function and the value of the third target loss function.
4. The training method of claim 2 wherein said determining a value of a target loss function from said true feature vector, said first estimated feature vector, and said second estimated feature vector comprises:
Determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector;
determining a value of a second objective loss function based on a square value of a difference between the real feature vector and the second estimated feature vector;
and determining the value of the target loss function according to the value of the first target loss function and the value of the second target loss function.
5. The training method of claim 2, wherein the determining the value of the target loss function according to the real feature vector, the first estimated feature vector, and the second estimated feature vector comprises:
determining a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector;
determining a value of a third target loss function based on square values of differences between adjacent feature values on each row vector of the second estimated feature vector and square values of differences between adjacent feature values on each column vector of the second estimated feature vector;
and determining the value of the target loss function according to the value of the first target loss function and the value of the third target loss function.
6. The training method according to claim 3 or 5, wherein the value of the third target loss function is determined by the following formula:
L_3 = λ Σ_{i,j} [ (x_{i,j+1} − x_{i,j})² + (x_{i+1,j} − x_{i,j})² ]
wherein x_{i,j} is the value at the i-th row and j-th column of the second estimated feature vector, and λ is a value determined according to actual needs.
7. The training method of claim 1, wherein the first mask map is obtained by inputting the training image into an image segmentation model.
8. The training method of claim 1, wherein the special effect model comprises a StarGAN model.
9. A special effect generation method, characterized by comprising:
acquiring an image to be processed;
inputting the image to be processed and a mask map of a predetermined object of the image to be processed into a special effect model to obtain an output image in which a special effect is added to the predetermined object of the image to be processed, wherein the special effect model is trained by the training method of the special effect model according to any one of claims 1 to 8.
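For illustration only, the generation method of claims 9 and 10 might be invoked roughly as follows; the segmentation model that produces the mask map and the exact model signatures are assumptions made for this sketch.

import torch

@torch.no_grad()
def generate_special_effect(effect_model, segmentation_model, image_to_process):
    # Mask map of the predetermined object, obtained from an assumed image
    # segmentation model as described in claim 10.
    object_mask = segmentation_model(image_to_process)
    # The trained special effect model returns the output image with the effect
    # added to the predetermined object (the second mask map is not needed here).
    output_image, _ = effect_model(image_to_process, object_mask)
    return output_image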
10. The special effect generation method according to claim 9, wherein the mask map of the predetermined object of the image to be processed is obtained by inputting the image to be processed into an image segmentation model.
11. The special effect generation method of claim 9 or 10, wherein the special effect model comprises a StarGAN model.
12. A training device for a special effect model, comprising:
a sample acquisition unit configured to acquire a training sample set, wherein the training sample set comprises a plurality of training images;
a special effect adding unit configured to input a training image and a first mask map into a special effect model to obtain an output image in which a special effect is added to a predetermined object in the training image and a second mask map, wherein the first mask map is a mask map of the predetermined object in the training image, and the second mask map is a mask map of the predetermined object with the special effect added in the output image;
a loss function determining unit configured to determine a value of a target loss function based on the training image, the output image, the first mask map, and the second mask map;
and a special effect model training unit configured to adjust parameters of the special effect model based on the value of the target loss function to train the special effect model.
13. The training apparatus of claim 12, wherein the loss function determining unit is further configured to obtain a real feature vector of the predetermined object in the training image based on the training image and the first mask map; obtain a first estimated feature vector of the predetermined object in the output image based on the output image and the first mask map; obtain a second estimated feature vector of the predetermined object with the special effect added in the output image based on the output image and the second mask map; and determine the value of the target loss function according to the real feature vector, the first estimated feature vector, and the second estimated feature vector.
14. The training apparatus of claim 13, wherein the loss function determining unit is further configured to determine a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; determine a value of a third target loss function based on square values of differences between adjacent feature values on each row vector of the second estimated feature vector and square values of differences between adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function according to the value of the first target loss function, the value of the second target loss function, and the value of the third target loss function.
15. The training apparatus of claim 13, wherein the loss function determining unit is further configured to determine a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a second target loss function based on a square value of a difference between the real feature vector and the second estimated feature vector; and determine the value of the target loss function according to the value of the first target loss function and the value of the second target loss function.
16. The training apparatus of claim 13, wherein the loss function determining unit is further configured to determine a value of a first target loss function based on an absolute value of a difference between the real feature vector and the first estimated feature vector; determine a value of a third target loss function based on square values of differences between adjacent feature values on each row vector of the second estimated feature vector and square values of differences between adjacent feature values on each column vector of the second estimated feature vector; and determine the value of the target loss function according to the value of the first target loss function and the value of the third target loss function.
17. The training apparatus according to claim 14 or 16, wherein the value of the third target loss function is determined by the following formula:
L_3 = λ Σ_{i,j} [ (x_{i,j+1} − x_{i,j})² + (x_{i+1,j} − x_{i,j})² ]
wherein x_{i,j} is the value at the i-th row and j-th column of the second estimated feature vector, and λ is a value determined according to actual needs.
18. The training apparatus of claim 12, wherein the first mask map is obtained by inputting the training image into an image segmentation model.
19. The training apparatus of claim 12, wherein the special effect model comprises a StarGAN model.
20. A special effect generation apparatus, comprising:
an image acquisition unit configured to acquire an image to be processed;
a special effect adding unit configured to input the image to be processed and a mask map of a predetermined object of the image to be processed into a special effect model to obtain an output image in which a special effect is added to the predetermined object of the image to be processed, wherein the special effect model is trained by the training method of the special effect model according to any one of claims 1 to 8.
21. The special effect generation apparatus according to claim 20, wherein the mask map of the predetermined object of the image to be processed is obtained by inputting the image to be processed into an image segmentation model.
22. The special effect generation apparatus of claim 20 or 21, wherein the special effect model comprises a StarGAN model.
23. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the special effects model of any one of claims 1 to 8 or the special effects generation method of any one of claims 9 to 11.
24. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the training method of the special effects model of any one of claims 1 to 8 or the special effects generation method of any one of claims 9 to 11.
CN202110736335.XA 2021-06-30 2021-06-30 Training method and device for special effect model, and special effect generation method and device Active CN113470124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110736335.XA CN113470124B (en) 2021-06-30 2021-06-30 Training method and device for special effect model, and special effect generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110736335.XA CN113470124B (en) 2021-06-30 2021-06-30 Training method and device for special effect model, and special effect generation method and device

Publications (2)

Publication Number Publication Date
CN113470124A CN113470124A (en) 2021-10-01
CN113470124B true CN113470124B (en) 2023-09-22

Family

ID=77876522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110736335.XA Active CN113470124B (en) 2021-06-30 2021-06-30 Training method and device for special effect model, and special effect generation method and device

Country Status (1)

Country Link
CN (1) CN113470124B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092678A (en) * 2021-11-29 2022-02-25 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191514A (en) * 2018-10-23 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating depth detection model
CN109215080A (en) * 2018-09-25 2019-01-15 清华大学 6D Attitude estimation network training method and device based on deep learning Iterative matching
CN110503097A (en) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method, device and the storage medium of image processing model
CN110929651A (en) * 2019-11-25 2020-03-27 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111667011A (en) * 2020-06-08 2020-09-15 平安科技(深圳)有限公司 Damage detection model training method, damage detection model training device, damage detection method, damage detection device, damage detection equipment and damage detection medium
CN111783986A (en) * 2020-07-02 2020-10-16 清华大学 Network training method and device and posture prediction method and device
CN111985608A (en) * 2019-05-23 2020-11-24 宏达国际电子股份有限公司 Method for training generation of confrontation network and method for generating image
CN112309426A (en) * 2020-11-24 2021-02-02 北京达佳互联信息技术有限公司 Voice processing model training method and device and voice processing method and device
CN112528858A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Training method, device, equipment, medium and product of human body posture estimation model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215080A (en) * 2018-09-25 2019-01-15 清华大学 6D Attitude estimation network training method and device based on deep learning Iterative matching
CN109191514A (en) * 2018-10-23 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating depth detection model
CN111985608A (en) * 2019-05-23 2020-11-24 宏达国际电子股份有限公司 Method for training generation of confrontation network and method for generating image
CN110503097A (en) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method, device and the storage medium of image processing model
CN110929651A (en) * 2019-11-25 2020-03-27 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111667011A (en) * 2020-06-08 2020-09-15 平安科技(深圳)有限公司 Damage detection model training method, damage detection model training device, damage detection method, damage detection device, damage detection equipment and damage detection medium
CN111783986A (en) * 2020-07-02 2020-10-16 清华大学 Network training method and device and posture prediction method and device
CN112309426A (en) * 2020-11-24 2021-02-02 北京达佳互联信息技术有限公司 Voice processing model training method and device and voice processing method and device
CN112528858A (en) * 2020-12-10 2021-03-19 北京百度网讯科技有限公司 Training method, device, equipment, medium and product of human body posture estimation model

Also Published As

Publication number Publication date
CN113470124A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
KR102170105B1 (en) Method and apparatus for generating neural network structure, electronic device, storage medium
US20190318268A1 (en) Distributed machine learning at edge nodes
WO2018184140A1 (en) Facial image replacement using 3-dimensional modelling techniques
CN111476871B (en) Method and device for generating video
CN111260754B (en) Face image editing method and device and storage medium
CN111275784B (en) Method and device for generating image
CN114187633B (en) Image processing method and device, and training method and device for image generation model
CN109389072B (en) Data processing method and device
US20170169608A1 (en) Maintaining consistent boundaries in parallel mesh simplification
CN110070076B (en) Method and device for selecting training samples
CN112785493A (en) Model training method, style migration method, device, equipment and storage medium
CN113470124B (en) Training method and device for special effect model, and special effect generation method and device
US8824778B2 (en) Systems and methods for depth map generation
CN109241930B (en) Method and apparatus for processing eyebrow image
CN113052962B (en) Model training method, information output method, device, equipment and storage medium
CN114266937A (en) Model training method, image processing method, device, equipment and storage medium
US10372571B2 (en) Method and apparatus for testing compatibility of 3D engine
CN109034085B (en) Method and apparatus for generating information
CN113538467A (en) Image segmentation method and device and training method and device of image segmentation model
CN113223128B (en) Method and apparatus for generating image
CN114529649A (en) Image processing method and device
JP2022152367A (en) Machine learning program, machine learning method, and information processing device
US10360509B2 (en) Apparatus and method for generating an optimal set of choices
US20230419561A1 (en) Three-dimensional model rendering method and apparatus, device, storage medium, and program product
EP3931749B1 (en) 3d hand pose estimation based on depth-image guided adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant