
Image processing method, image processing system and electronic equipment

Info

Publication number: CN114266695A
Application number: CN202111602629.XA
Authority: CN (China)
Language: Chinese (zh)
Inventor: 王哲
Assignee: Alibaba China Co Ltd
Legal status: Pending
Prior art keywords: image, parameters, avatar, image processing

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application provide an image processing method, an image processing system, and an electronic device. In the image processing method provided by the embodiments, a first avatar parameter corresponding to an avatar adapted to a user image is determined, and the first avatar parameter is optimized in combination with feature information of the user image, so that a first virtual pictogram is generated according to the optimized first avatar parameter. Because the scheme optimizes the first avatar parameter with features of the user image before generating the corresponding first virtual pictogram, the generated virtual pictogram has a high similarity to the user image and a good visual effect.

Description

Image processing method, image processing system and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing system, and an electronic device.
Background
Currently, many game, e-commerce, or social applications provide a function by which a user can generate a cartoon-style virtual pictogram that may resemble the user or another person. This function is often referred to in the industry as "face pinching", which broadly refers to performing custom data operations on the appearance of a virtual character.
As a low-threshold avatar creation technique, pinching faces based on real photos is receiving increasing attention and research in the industry. At present, however, most schemes that pinch faces based on real photos suffer from a poor virtual image effect.
Disclosure of Invention
The present application provides an image processing method, an image processing system, and an electronic device that solve, or at least partially solve, the above problems.
In one embodiment of the present application, an image processing method is provided. The image processing method comprises the following steps:
acquiring a user image; determining a first avatar parameter corresponding to an avatar adapted to the user image; extracting feature information of the user image; optimizing the first avatar parameter in combination with the feature information; and generating a first virtual pictogram according to the optimized first virtual character parameters.
In another embodiment of the present application, an image processing method is provided. The image processing method comprises the following steps:
responding to an input operation triggered by a user, and acquiring a user image; determining a face pinching parameter corresponding to the user image; extracting feature information of the user image; optimizing the face pinching parameters by combining the characteristic information; calling an image generation engine to enable the image generation engine to generate a virtual pictogram according to the optimized face pinching parameters; and displaying the virtual pictogram.
In yet another embodiment of the present application, an image processing method is also provided. The image processing method comprises the following steps:
acquiring a first image; determining image parameters corresponding to the first image converted into the target style image by using an image processing model; extracting feature information of the first image; optimizing the image parameters by combining the characteristic information of the first image; and generating a second image of the target style according to the optimized image parameters.
In one embodiment of the present application, an image processing system is provided. The image processing system includes:
the client is used for responding to the operation of the user and sending the user image to the server;
the server is used for acquiring the user image and determining a first virtual image parameter corresponding to a virtual image matched with the character image in the user image; extracting characteristic information of the character image; optimizing the first avatar parameter in combination with the feature information; generating a first virtual pictogram according to the optimized first virtual character parameters;
and the client is used for displaying the first virtual pictogram.
In an embodiment of the present application, an electronic device is provided. The electronic device comprises a processor and a memory, wherein the memory is used for storing one or more computer instructions; the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement the steps in the above-described method embodiments.
In yet another embodiment of the present application, a computer program product is also provided. The computer program product comprises computer programs or instructions which, when executed by a processor, cause the processor to carry out the steps in the above-described method embodiments.
In the technical scheme provided by the embodiments of the present application, a first avatar parameter corresponding to an avatar adapted to the user image is determined, and the first avatar parameter is optimized in combination with the feature information of the user image, so that a first virtual pictogram is generated according to the optimized first avatar parameter. Optimizing the parameters with features of the user image improves the similarity between the virtual pictogram and the user image. Further, the embodiments can determine the first avatar parameter corresponding to the avatar adapted to the user image with a single image processing model (that is, an end-to-end model), which reduces the impact of inconsistent training targets, accumulated errors, and similar problems that arise when multiple steps or multiple models work together, giving good performance at low complexity. The model can also be deployed on the client side in implementation, which helps to improve the generation efficiency of the virtual image.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating a client side generating a virtual pictogram from a user image by using an image processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an image processing system provided by an embodiment of the present application;
FIG. 3 shows a schematic diagram of an image processing system provided by another embodiment of the present application;
FIG. 4 is a flowchart illustrating an image processing method according to an embodiment of the present application;
FIG. 5 shows a schematic diagram of a loop generation countermeasure network as referred to in an embodiment of the present application;
FIG. 6 is a flow chart illustrating an image processing method according to another embodiment of the present application;
FIG. 7 is a flow chart illustrating an image processing method according to another embodiment of the present application;
fig. 8 is a schematic structural diagram illustrating an image processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present application;
fig. 10 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the prior art, the technique of pinching faces based on real faces is mainly classified into two types:
the first type: scheme based on template matching
In the template-matching scheme, facial key points and facial semantics are used to extract the facial contour and colors from a real-person photo, and template matching is then performed. The advantage of this method is strong interpretability. Its obvious defects are that template matching is not robust to face pose and is prone to wrong matches, that it is easily affected by illumination, and that it cannot identify the continuous parameters among the face-pinching parameters accurately enough, so the finally generated virtual pictogram has a poor effect.
The second type: scheme based on face reconstruction
This scheme reconstructs a real face with a face reconstruction algorithm and can generate a face image of high fidelity together with the corresponding texture map. Its defect is that the result does not necessarily match the style of the avatar, precisely because the fidelity is too high. Face-reconstruction-based schemes are therefore better suited to scenarios that build realistic digital humans.
In view of this, the following embodiments provide a scheme with a good virtual image generation effect. To make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.

In some of the flows described in the specification, claims, and figures of the present application, a number of operations appear in a particular order; these operations may be executed out of that order or in parallel. The sequence numbers of the operations, e.g., 101, 102, etc., are used merely to distinguish the operations and do not by themselves represent any order of execution. The flows may also include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions of "first", "second", etc. herein are used to distinguish different modules, models, devices, and the like; they do not represent a sequential order, nor do they require that "first" and "second" be of different types.

In addition, the embodiments described below are only a part, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort fall within the protection scope of the present application.
First, the following terms mentioned herein are simply explained to facilitate understanding.
Avatar (virtual image): refers to a virtual character image, such as a game character or a character head portrait.
Face pinching: an avatar system usually includes an editable sub-system; face pinching refers to the process of generating a virtual pictogram by adjusting the parameters and attributes of this editable system.
Stylization (or style migration): a technique for transforming one style into another; in the image field, for example, turning a real face into a cartoon style. Face stylization converts a face portrait into a portrait of a specific style, such as a sketch portrait style, a cartoon image style, or an oil painting style.
The technical scheme provided by the embodiment of the application can be applied to various applications such as game applications, e-commerce applications, social applications and the like. For example, in game applications, the game applications provide functions implemented by the scheme of the embodiment of the application for users, and users can generate game role diagrams in game scenes by using their own photos by using the functions. For another example, in the social application, the social application provides a function implemented by the scheme of the embodiment of the application for the user, and the user can use the function to generate a cartoon head portrait, an emoticon, a social role and the like by using a photo of the user.
In the scenario shown in fig. 1, a user can tap a control corresponding to this function on a client (e.g., a smart phone, a desktop computer, a tablet computer, etc.); the client starts the camera to take a photo of the user and then, using the scheme provided by the embodiments of the present application, generates and displays an avatar diagram from the captured photo. Because the avatar parameters on which the virtual pictogram is based incorporate the feature information of the user photo, the virtual pictogram has a higher similarity to the user photo and a better effect.
Besides generating the adapted virtual pictogram based on the user image, the technical scheme provided by the embodiment of the application is also suitable for style conversion of other images, such as generating an animation style building chart based on the building photo and generating an animation style animal chart based on the animal photo. Besides the animation and cartoon styles, the painting style and the sketch style can also be included, which is not limited in this embodiment.
The technical scheme provided by the embodiment of the application can be realized based on the system shown in fig. 2. Specifically, as shown in fig. 2, the image processing system includes a client 11 and a server 12. The client 11 is configured to obtain a user image; determining a first avatar parameter corresponding to an avatar adapted to the user image; extracting feature information of the user image; optimizing the first avatar parameter in combination with the feature information; and generating a first virtual pictogram according to the optimized first virtual character parameters. The server 12 is configured to train the image processing model, where a training sample of the image processing model is constructed by using a stylized model, and the stylized model is obtained by training an avatar image randomly generated by using an image generation engine and a character image obtained from a network side.
Specifically, the client may determine the first avatar parameter using an image processing model. The training process of the image processing model will be described in detail below, and please refer to the following content.
The server 12 is responsible for training the model, and the image processing model after training is sent to the client 11, so that the client 11 can locally determine the virtual image parameters of the user image input by the user of the client 11 by using the image processing model; and then optimizing the virtual image parameters by combining the characteristic information of the user image, and generating a virtual image graph based on the optimized virtual image parameters.
Alternatively, as shown in fig. 3, the client 11 is configured to send a user image to the server 12, and display an avatar image returned by the server 12; other processing is handled by the server 12. As shown in FIG. 3, the server 12 is used for training the stylized model, for constructing the training sample, for training the image processing model using the training sample, and for generating the virtual pictogram using the image processing model in combination with the user image feature extraction. Specifically, the client 11 is configured to send a user image to the server 12 in response to an operation of the user. The server 12 is configured to obtain a user image; determining a first avatar parameter corresponding to an avatar adapted to the user image using an image processing model; extracting feature information of the user image; optimizing the first avatar parameter in combination with the feature information; and generating a first virtual pictogram according to the optimized first virtual character parameters. The client is further configured to display the first virtual pictogram.
The client in this embodiment may be a smart phone, a desktop computer, a notebook computer, a tablet computer, an intelligent wearable device, and the like, and the server side may be a single server, a server cluster, a virtual server, a cloud, and the like; this embodiment does not limit these. In addition to the functions described above, the client 11 and the server 12 in this embodiment may also implement the corresponding functions of the steps in the following method embodiments; see the corresponding embodiments below for details.
Fig. 4 shows a flowchart of an image processing method according to an embodiment of the present application. The execution subject of the method provided by this embodiment may be a client or a server in the image processing system. As shown in fig. 4, the method includes:
101. acquiring a user image;
102. determining a first avatar parameter corresponding to an avatar adapted to the user image;
103. extracting feature information of the user image;
104. optimizing the first avatar parameter in combination with the feature information;
105. and generating a first virtual pictogram according to the optimized first virtual character parameters.
Referring to the example shown in fig. 1, a user may input a user image through a client. For example, the user image is captured by an image capture device on the client, and the user image may also be imported from the gallery through an interactive interface on the client. The gallery may be locally stored (such as an album) in an execution main body of the method in this embodiment, or may be stored in other devices on the network side, which is not limited in this embodiment.
For the above 102, in implementation, the first avatar parameter may be determined using an image processing model. The image processing model may be a deep learning model; the deep learning model may include a convolutional neural network, and stacking convolution layers forms a deep convolutional neural network that can be trained for the avatar parameter prediction task. The convolution layers can be further divided into conventional convolution layers, deconvolution layers, dilated (atrous) convolution layers, and the like. The network structure of the deep learning model may also include other network layers such as residual networks. Of course, the image processing model in this embodiment may also be implemented with other types of networks, which are not enumerated here.
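As a concrete illustration only, the following is a minimal sketch of such a prediction network. It assumes PyTorch with a torchvision ResNet-18 backbone, and every parameter name and size in it is a made-up example rather than a definition from this application:

```python
# A minimal sketch, assuming PyTorch and a torchvision ResNet-18 backbone.
# All parameter names and sizes (e.g. 10 hair styles, 8 continuous values)
# are illustrative assumptions, not values taken from this application.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AvatarParameterNet(nn.Module):
    def __init__(self, discrete_sizes=None, num_continuous=8):
        super().__init__()
        # assumed catalogue of discrete parameters and their class counts
        self.discrete_sizes = discrete_sizes or {"hair_style": 10, "hair_color": 6}
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d pooled feature
        self.backbone = backbone
        # one classification head per discrete parameter
        self.discrete_heads = nn.ModuleDict(
            {name: nn.Linear(512, n) for name, n in self.discrete_sizes.items()})
        # one regression head shared by the continuous parameters
        self.continuous_head = nn.Linear(512, num_continuous)

    def forward(self, image):                  # image: (B, 3, H, W)
        feat = self.backbone(image)
        discrete = {name: head(feat) for name, head in self.discrete_heads.items()}
        continuous = torch.sigmoid(self.continuous_head(feat))  # values in [0, 1]
        return discrete, continuous
```

The split into per-parameter classification heads and a shared regression head mirrors the discrete/continuous distinction drawn later in this embodiment.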
The feature information extraction in 103 can be realized with person key points and a semantic segmentation model; that is, feature analysis is performed on the user image through the person key points and the semantic segmentation model to extract human body contour information, color information, and the like.
In fact, what the user usually cares most about is whether the face of the avatar is similar to his or her own; accordingly, the extracted feature information in this embodiment may be facial feature information. That is, the above person key points and semantic segmentation model may specifically be face key points and a face semantic segmentation model. Feature analysis is performed on the user image through the face key points and the face semantic segmentation model to extract information about the five facial features, color information, and the like.
The facial feature information may include, but is not limited to: hair style profile, lip profile, nose profile, eyebrow profile, eye angle, eyelid pattern, eyebrow width, eyelash visible or invisible attributes, and the like. Color information may include, but is not limited to: skin color, lip color, iris color, hair color, and the like.
What needs to be added here is: the person key points and the semantic segmentation model can be two separate models. Similarly, the face key points and the face semantic segmentation model may be two models, for example, a key point extraction model and a semantic segmentation model. In this embodiment, existing key point extraction models and semantic segmentation models can be used directly; for their details, reference may be made to the existing literature, which is not specifically limited and not repeated herein.
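For illustration, the sketch below shows one way color features such as skin color, lip color, and hair color could be pooled from a face parsing mask. The class indices and the numpy-based pooling are assumptions, since this application does not fix a particular segmentation model:

```python
# Minimal sketch, assuming a face-parsing mask in which each pixel holds a
# class index; the index values below are illustrative assumptions.
import numpy as np

SKIN, LIP, HAIR = 1, 2, 3   # assumed class indices in the parsing mask

def mean_region_color(image, mask, label):
    """Average RGB color of the pixels whose mask value equals `label`."""
    region = mask == label
    if not region.any():
        return None                       # feature absent in this image
    return image[region].mean(axis=0)     # (3,) RGB vector

def extract_color_features(image, mask):
    # image: (H, W, 3) array; mask: (H, W) integer array
    return {
        "skin_color": mean_region_color(image, mask, SKIN),
        "lip_color":  mean_region_color(image, mask, LIP),
        "hair_color": mean_region_color(image, mask, HAIR),
    }
```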
In the above 104, the first avatar parameter may include: a plurality of first discrete parameters and a plurality of first continuous parameters reflecting the characteristics of the avatar. For example, hair style, hair color, skin color, lip color, and the like are discrete parameters; face width, face length, eye angle, eyebrow width, and the like are continuous parameters.
Accordingly, in an implementation solution, the step 104 "combine the feature information to optimize the first avatar parameter" may include the following steps:
1041. performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize the values of at least part of the plurality of first discrete parameters;
1042. and carrying out shape-beautifying optimization on the plurality of first continuous parameters and the plurality of first discrete parameters subjected to value optimization processing by using a shape-beautifying optimization algorithm.
Step 1041 optimizes the values of the discrete parameters so that they better fit the feature information of the user image, which allows the subsequently generated virtual pictogram to have a higher similarity to the user image. Step 1042 performs the aesthetic optimization, whose essence is to make the finally generated virtual pictogram look pleasing. In a specific implementation, some aesthetic parameters can be preset, such as the proportion of the forehead on the face or the distance between the eyes; the beauty optimization algorithm then adjusts the plurality of first continuous parameters and the plurality of first discrete parameters according to these preset aesthetic parameters, so that the first virtual pictogram generated from the adjusted parameters meets the aesthetic standard, i.e., is considered good-looking.
With the first avatar parameter optimized through the above steps 1041 and 1042, the generated first virtual pictogram has a high similarity to the user image and is also more beautiful (i.e., nicer looking).
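One way to read the aesthetic adjustment of step 1042 is as a soft pull of each continuous parameter toward a preset aesthetic target. The sketch below follows that reading; the target values, parameter names, and blend strength are all illustrative assumptions:

```python
# Minimal sketch of the kind of aesthetic adjustment described above.
# The target values and the blend strength are illustrative assumptions.
AESTHETIC_TARGETS = {"forehead_ratio": 0.33, "eye_distance": 0.46}

def beautify(continuous_params, strength=0.3):
    """Blend each continuous parameter toward its preset aesthetic target.

    continuous_params: dict name -> value in [0, 1]
    strength: 0 keeps full similarity to the user, 1 enforces the preset.
    """
    adjusted = dict(continuous_params)
    for name, target in AESTHETIC_TARGETS.items():
        if name in adjusted:
            adjusted[name] = (1 - strength) * adjusted[name] + strength * target
    return adjusted
```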
Specifically, in this embodiment, the step 1041 "performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize the values of at least part of the plurality of first discrete parameters" may be implemented by:
s11, generating a characteristic template according to the characteristic information;
s12, determining a target discrete parameter corresponding to the feature template in the plurality of first discrete parameters;
s13, searching target materials matched with the characteristic template in the characteristic material set by using a template matching algorithm;
S14, setting the value of the target discrete parameter to the material identifier corresponding to the target material.
The above steps S11 to S14 show the optimization of only one first discrete parameter; the other first discrete parameters among the plurality can be optimized through the same steps S11 to S14.
Taking the face-pinching system in a game, e-commerce, or social application as an example, some discrete parameters in the face-pinching system correspond to material libraries. When an image generation engine (in a game application this may be the game engine) generates a virtual pictogram from the discrete parameters, it retrieves the corresponding materials from the material libraries according to the values (i.e., material identifiers) of the discrete parameters and then generates the virtual pictogram using those materials. For example, the hair style parameter corresponds to a hair style material library in the face-pinching system.
In order to ensure that the avatar parameters conform to the input of the image generation engine and reduce the occurrence of error conditions, the foregoing S11 to S14 are adopted in this embodiment to optimize the values of at least some of the first discrete parameters in the plurality of first discrete parameters.
In the above S11, "generating a feature template according to the feature information" may include, but is not limited to, at least one of the following:
generating a corresponding hair style template, lip template, and so on according to features such as the hair style and lips in the feature information;
generating a corresponding hair color template, skin color template, lip color template, and so on according to features such as the hair color, skin color, and lip color in the feature information.
In S12, the target discrete parameter corresponding to the feature template may be determined by checking whether the attribute of the feature template matches or is similar to the parameter name of a discrete parameter. For example, if the attribute of the hair style template is "hair style" and the parameter name of a certain discrete parameter is "hair style", then the target discrete parameter corresponding to the hair style template is that discrete parameter.
In S13, the target material matching the feature template is retrieved from the feature material set using a template matching algorithm. For example, if the feature template is a hair style template, the template matching algorithm may be a shape matching algorithm that retrieves similar hair style materials from the hair style material library. One realizable shape matching algorithm aligns the hair style template with each hair style material in the library and then measures the difference; the hair style material with the smallest difference from the hair style template is the target material to be retrieved. For another example, if the feature template is a lip color template, the template matching algorithm may be a color matching algorithm that retrieves similar lip color materials from the lip color material library. Color matching algorithms may include, but are not limited to, color histogram algorithms.
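As an illustration of the color-matching branch, the sketch below selects the material whose color histogram is closest to the feature template. It assumes OpenCV histograms and a material library modeled as in-memory image patches; neither is prescribed by this application:

```python
# Minimal sketch of histogram-based material matching, assuming OpenCV.
# The material library is modeled as a dict of id -> BGR image patch.
import cv2
import numpy as np

def color_histogram(patch):
    # 3-D BGR histogram, 8 bins per channel, normalized and flattened
    hist = cv2.calcHist([patch], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def match_material(template_patch, material_library):
    """Return the id of the material most similar to the feature template."""
    target = color_histogram(template_patch)
    best_id, best_score = None, -1.0
    for material_id, patch in material_library.items():
        score = cv2.compareHist(target.astype(np.float32),
                                color_histogram(patch).astype(np.float32),
                                cv2.HISTCMP_CORREL)   # higher = more similar
        if score > best_score:
            best_id, best_score = material_id, score
    return best_id   # becomes the value of the target discrete parameter (S14)
```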
In 105, the image generating engine may be directly invoked, so that the image generating engine generates the first avatar diagram according to the optimized first avatar parameter.
In the technical scheme provided by this embodiment, an image processing model is used to determine a first avatar parameter corresponding to an avatar adapted to the user image, and the first avatar parameter is optimized in combination with the feature information of the user image, so that the first virtual pictogram is generated according to the optimized first avatar parameter. Because the avatar parameters predicted by the image processing model are further optimized with the user image features, the virtual pictogram has a high similarity to the user image and a good effect. In addition, this embodiment performs parameter prediction with a single image processing model (i.e., an end-to-end model), which reduces the impact of inconsistent training targets and accumulated errors that arise when multiple steps or multiple models work together, giving good performance and low complexity; in implementation, the model can also be deployed on the client side to improve the generation efficiency of the virtual image.
Further, the avatar parameter in step 102 of the embodiment of the present application is determined by using an image processing model; that is, step 102 may specifically be "determining a first avatar parameter corresponding to an avatar adapted to the user image by using the image processing model". Correspondingly, the image processing method provided by this embodiment may further include the following steps:
106. inputting the second virtual pictogram into a stylized model, and executing the stylized model to output a character image;
107. acquiring a second virtual image parameter corresponding to the second virtual image;
108. adding the character image and the second avatar parameter as a sample pair to a training sample;
109. and training the image processing model by using the sample pairs in the training samples.
The above steps 106 to 109 describe the training scheme of the image processing model in this embodiment. Referring to fig. 1 and 2, the training samples of the image processing model in this embodiment are obtained using a stylized model. Paired avatar parameters and character images are difficult to obtain, so this embodiment uses the stylized model to generate the character image corresponding to the second virtual pictogram, and then adds the second avatar parameter corresponding to the second virtual pictogram together with the generated character image to the training samples as a sample pair. Training with such sample pairs makes the resulting image processing model more accurate and better performing.
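Expressed as pseudocode, the sample-pair construction of steps 106 to 108 might look like the sketch below, in which the engine and model interfaces are hypothetical placeholders:

```python
# Minimal sketch of building (character image, avatar parameters) pairs.
# `sample_random_parameters`, `render_avatar` and `stylized_model` stand in
# for the image generation engine and the trained stylized model; they are
# hypothetical interfaces, not APIs defined by this application.
def build_training_pairs(num_samples, sample_random_parameters,
                         render_avatar, stylized_model):
    pairs = []
    for _ in range(num_samples):
        params = sample_random_parameters()             # second avatar parameters
        avatar_image = render_avatar(params)            # second virtual pictogram
        character_image = stylized_model(avatar_image)  # step 106
        pairs.append((character_image, params))         # step 108: one sample pair
    return pairs
```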
For example, the second avatar parameter includes a plurality of second discrete parameters and a plurality of second continuous parameters. Accordingly, the step 109 "training the image processing model by using the sample pairs in the training samples" may include the following steps:
1091. inputting the character images in the sample pair into the image processing model, executing the image processing model and outputting a third virtual character parameter; wherein the third avatar parameter includes a plurality of third discrete parameters and a plurality of third continuous parameters;
1092. determining a discrete parameter loss based on the plurality of third discrete parameters and the plurality of second discrete parameters in the sample pair;
1093. determining a continuous parameter loss from the third plurality of continuous parameters and the second plurality of continuous parameters in the sample pair;
1094. and optimizing the image processing model based on the discrete parameter loss and the continuous parameter loss.
In the above steps 1092 and 1093, a corresponding loss function is constructed for each discrete parameter, and likewise a corresponding loss function is constructed for each continuous parameter. For example, the hair style parameter corresponds to a hair style loss function, the face length (or width) parameter corresponds to a face shape loss function, and so on. Therefore, when step 1092 is executed, each discrete parameter selects its corresponding loss function and its discrete parameter loss is computed; similarly, each continuous parameter selects its corresponding loss function and its continuous parameter loss is computed.
In 1094, the total loss is calculated according to all the losses, and then the parameters in the image processing model are optimized by using the total loss. The specific implementation steps for optimizing the model parameters by using the loss can be referred to in the existing literature, and are not specifically limited and are not repeated herein.
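For illustration, the sketch below combines per-parameter losses in the spirit of steps 1092 to 1094. Cross-entropy for the discrete heads and an L1 loss for the continuous head are assumed choices, since this application only requires that each parameter have a corresponding loss function:

```python
# Minimal sketch of the joint parameter loss, assuming PyTorch and the
# AvatarParameterNet sketched earlier. Cross-entropy / L1 are assumed choices.
import torch
import torch.nn.functional as F

def parameter_loss(pred_discrete, pred_continuous,
                   target_discrete, target_continuous):
    # per-parameter discrete losses (step 1092)
    loss_d = sum(F.cross_entropy(logits, target_discrete[name])
                 for name, logits in pred_discrete.items())
    # continuous regression loss (step 1093)
    loss_c = F.l1_loss(pred_continuous, target_continuous)
    return loss_d + loss_c            # total loss used to optimize in step 1094
```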
Still further, the image processing method provided in the embodiment of the present application further includes the following steps:
110. placing the randomly generated third virtual pictogram into the first data set;
111. placing the figure image obtained from the network side into a second data set;
112. and training a stylized model by utilizing the first data set and the second data set.
As shown in fig. 1 and 2, the randomly generated third avatar graphic may be generated by an image generation engine according to random avatar parameters. The character graph can be obtained from open source data on the network side.
The randomly generated third virtual pictogram and the character images obtained from the network side form unpaired data sets, so the stylized model in this embodiment can be implemented with the cycle generation countermeasure network shown in fig. 5, which can be trained in an unsupervised manner on unpaired data sets. Specifically, the step 112 "training the stylized model by using the first data set and the second data set" may specifically include:
1121. inputting a third virtual pictogram in the first data set into the loop generation countermeasure network, and executing the loop generation countermeasure network to obtain a first output result;
1122. determining a first cycle consistency loss according to the first output result and the third virtual pictogram;
1123. determining a first global apparent loss of the first output result and the third avatar using a character recognition model;
1124. determining a first local feature loss between the first output result and the third virtual pictogram by using a human body semantic segmentation model;
1125. inputting the figure images in the second data set into the loop generation countermeasure network, and executing the loop generation countermeasure network to obtain a second output result;
1126. determining a second cycle consistency loss according to the second output result and the input person image;
1127. determining a second global apparent loss of the second output result and the human image by utilizing a human recognition model;
1128. determining a second local feature loss between the second output result and the character image by using a human body semantic segmentation model;
1129. optimizing the cycle generating countermeasure network based on the first cycle consistency penalty, the first global apparent penalty, the first local feature penalty, the second cycle consistency penalty, the second global apparent penalty, and the second local feature penalty.
Referring to fig. 5, a cycle generation countermeasure network (CycleGAN) performs two transformations on an image x from the source domain X: the image x is first mapped to the target domain Y, and then mapped back to the source domain X to obtain a secondarily generated image x̂. This eliminates the requirement for paired images in the target domain Y; because of this cyclic structure, the network is called CycleGAN. As can be seen from fig. 5, CycleGAN actually consists of two unidirectional GANs in opposite directions, which share the two generators and each have their own discriminator, adding up to two discriminators and two generators in total.
Referring to fig. 5, X and Y represent the images of the two domains. CycleGAN includes two generators, G and F, for the X-to-Y generation and the Y-to-X generation respectively, and two discriminators, D_X and D_Y. The loss is as follows:

L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, X, Y) + λ·L_cyc(F, G)

where λ is a hyper-parameter, the first two terms are the adversarial losses, and L_cyc is the cycle-consistency loss of CycleGAN. Specifically, the L_cyc loss is:

L_cyc(F, G) = E_{x∼p_data(x)}[‖F(G(x)) − x‖₁] + E_{y∼p_data(y)}[‖G(F(y)) − y‖₁]
where x ∼ p_data(x) indicates that the image x comes from the X domain, and y ∼ p_data(y) indicates that the image y comes from the Y domain. Referring to fig. 5, F(G(x)) means that the image x is first mapped by the generator G to ŷ = G(x), which is then mapped back by the generator F to x̂ = F(G(x)); the corresponding expectation term pulls x̂ toward x. Likewise, G(F(y)) means that the image y is first mapped by the generator F to x̂ = F(y), which is then mapped back by the generator G to ŷ = G(F(y)); the corresponding expectation term pulls ŷ toward y.
Because CycleGAN lacks an understanding of face semantic information, the effect after training in this plain form is poor. Constraints on the five facial features therefore need to be added so that semantic information can be understood and a reasonable layout of the facial features is guaranteed. For example, a key-point constraint method restricts the layout of the facial features in the output image by adding a key point prediction task; or, in an attention-based method, the effective area of the face is learned with an attention mechanism. Alternatively, two further loss functions can be added that measure the global appearance and the local details, respectively. For example, a face recognition model is introduced to measure the global apparent loss between two faces, such as the face shape and the approximate expression. For instance, the face recognition model Light CNN-29v2 extracts a 256-dimensional embedded representation of the face features, and the cosine distance between the features of two images is then computed as their similarity. This loss may be called the face identity loss; its function is to judge whether the generated image x̂ and the image x, or the generated image ŷ and the image y, belong to the same identity.
In addition to the face identity loss, local facial features can be extracted with a face semantic segmentation model, and a face content loss can be defined by computing the pixel-level error of these local facial features. The face content loss can be viewed as a constraint on the shape and displacement of the different facial components, e.g., the eyes, mouth, and nose, in the two images; what it cares about most are the feature differences between the facial images. For example, ResNet-50 may be used as the semantic segmentation model.
In fact, more complex losses can be designed so that the stylized model better captures the features of both the virtual pictogram and the character image, enabling finer stylization control. The selection and construction of the losses are not specifically limited in this embodiment and may be designed as needed.
Finally, the total loss function of the stylized model may be a linear combination of the above L(G, F, D_X, D_Y), the face identity loss, the face content loss, and so on.
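Schematically, such a linear combination could be assembled as in the sketch below, where the individual terms are assumed to be already-computed tensors and the weights are illustrative hyper-parameters:

```python
# Minimal sketch of the stylized model's total objective. The individual
# terms are assumed to be precomputed scalars/tensors; the lambda_* weights
# are illustrative hyper-parameters, not values given in this application.
def stylized_total_loss(loss_gan_G, loss_gan_F, loss_cycle,
                        loss_face_identity, loss_face_content,
                        lambda_cyc=10.0, lambda_id=1.0, lambda_content=1.0):
    # L(G, F, D_X, D_Y) = adversarial terms + lambda * cycle consistency,
    # extended with the face identity and face content losses described above
    return (loss_gan_G + loss_gan_F
            + lambda_cyc * loss_cycle
            + lambda_id * loss_face_identity
            + lambda_content * loss_face_content)
```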
Based on the stylized model, a large number of virtual pictograms can be stylized into corresponding character images, which makes it convenient to build a data set of paired character images and avatar parameters; these paired data serve as training samples and provide the data basis for subsequently training the image processing model. The method can also be applied to various face-pinching systems that have a certain feature similarity with human faces, which greatly widens the application range of the intelligent face-pinching scheme and allows it to be extended to automatically generate avatars of different styles.
Furthermore, in this embodiment, a stylized model is used to construct pairs of face images and avatar parameters, and then these pairs of data are used to train an image processing model that can output corresponding avatar parameters when inputting a human image.
In summary, the technical solution provided by the embodiments of the present application ingeniously designs a stylized model and uses it to capture the similar features of avatar images and real character images (such as photos), so as to generate the character image corresponding to each avatar image; the image processing model that derives avatar parameters from a character image is then trained on the generated character images and the avatar parameters of the corresponding avatar images. This decoupled design shows an excellent generalization effect in actual tests, and the stylization capability also allows the scheme to generalize to more abstract avatar scenarios, such as anime-style ("2D") avatars.
In fact, the direction can also be reversed: a character image may be input into a stylized model to output a virtual pictogram, and the paired virtual pictograms and corresponding avatar coefficients are then used to train a prediction model; in that case two models, the stylized model and the prediction model, are needed in actual use. Specifically, an image processing method provided in another embodiment of the present application includes:
A. acquiring a user image;
B. inputting the user image into a stylized model, executing the stylized model and outputting a corresponding fourth virtual pictogram;
C. inputting the fourth virtual pictogram into a prediction model, and executing the prediction model to output the corresponding fourth avatar parameter;
D. extracting feature information of the user image;
E. optimizing the fourth avatar parameter in combination with the feature information;
F. calling an image generation engine, so that the image generation engine generates the corresponding fifth virtual pictogram according to the optimized fourth avatar parameter.
It should be added here that the terms fourth virtual pictogram, fourth avatar parameter, and fifth virtual pictogram in this embodiment are used only to distinguish them from the virtual pictograms and avatar parameters discussed above.
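For clarity, the two-model variant of steps A to F can be summarized as the following sketch, in which every callable is a hypothetical placeholder rather than an interface defined by this application:

```python
# Minimal sketch of the two-model variant: stylize first, then predict.
# Every callable here is a hypothetical placeholder, not a defined API.
def two_model_pipeline(user_image, stylized_model, prediction_model,
                       extract_features, optimize_parameters, engine):
    avatar_image = stylized_model(user_image)        # step B: fourth pictogram
    params = prediction_model(avatar_image)          # step C: fourth parameters
    features = extract_features(user_image)          # step D
    params = optimize_parameters(params, features)   # step E
    return engine.render(params)                     # step F: fifth pictogram
```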
The game industry typically creates and customizes rather complex face-pinching systems for avatars to satisfy users' demand for a personalized image. The technical scheme provided by the embodiments of the present application can also be applied to such existing face-pinching systems. Fig. 6 shows a flowchart of an image processing method according to another embodiment of the present application. As shown in fig. 6, the execution subject of the method according to this embodiment may be a client in the image processing system. Specifically, the method comprises the following steps:
201. responding to an input operation triggered by a user, and acquiring a user image;
202. determining a face pinching parameter corresponding to the user image;
203. extracting feature information of the user image;
204. optimizing the face pinching parameters by combining the characteristic information;
205. calling an image generation engine to enable the image generation engine to generate a virtual pictogram according to the optimized face pinching parameters;
206. and displaying the virtual pictogram.
For more details on the above steps 201 to 205, reference may be made to the description of the above embodiments. The face pinching parameters in this embodiment are similar to the avatar parameters in the above-described embodiment.
Further, the method of this embodiment may further include the following steps:
207. acquiring an image processing model which is trained from a server; the training sample of the image processing model is constructed by utilizing a stylized model, and the stylized model is obtained by utilizing an avatar image randomly generated by the image generation engine and a character image obtained from a network side through training;
208. and storing the image processing model in a local place.
Accordingly, step 202 in this embodiment may specifically be "determining the face pinching parameters corresponding to the user image by using the image processing model".
For the training content of the image processing model and the training content of the stylized model, reference may be made to the related description in the foregoing embodiments, which is not repeated herein.
Further, the pinching parameters may include a plurality of discrete parameters and a plurality of continuous parameters. Accordingly, the aforementioned step 204 "of optimizing the face-pinching parameters in combination with the feature information" may include the steps of:
2041. performing value optimization processing on the plurality of discrete parameters based on the characteristic information to optimize the values of at least part of the plurality of discrete parameters;
2042. and carrying out shape-beautifying optimization on the plurality of continuous parameters and the plurality of discrete parameters after value optimization processing by using a shape-beautifying optimization algorithm.
Still further, in step 2041, "performing value optimization processing on the plurality of discrete parameters based on the feature information to optimize the value of at least some of the plurality of discrete parameters" may be implemented by:
generating a characteristic template according to the characteristic information;
determining a target discrete parameter corresponding to the feature template in the plurality of discrete parameters;
searching target materials matched with the characteristic template in a characteristic material set corresponding to the image generation engine by utilizing a template matching algorithm;
and assigning the value of the target discrete parameter as a material identifier corresponding to the target material.
Similarly, the detailed contents of the above steps 2041 and 2042 can be referred to the contents of the above embodiments, and the detailed description thereof is omitted here.
The scheme provided by the embodiment of the application can also be applied to wider image style conversion scenes. For example, fig. 7 shows a flowchart of an image processing method according to another embodiment of the present application. As shown, the method comprises:
301. acquiring a first image;
302. determining image parameters corresponding to the first image converted into the target style image by using an image processing model;
303. extracting feature information of the first image;
304. optimizing the image parameters by combining the characteristic information of the first image;
305. and generating a second image of the target style according to the optimized image parameters.
The first image may be a human face image, an animal image, or the like, which is not limited in this embodiment. Target styles may include, but are not limited to: sketch styles, oil painting styles, animated cartoon styles, and the like. In specific implementation, the image processing model can be obtained by training sample pairs in training sets corresponding to different styles.
For example, the trained stylized model generates image parameter pairs corresponding to the character image and the sketch style chart, and the image parameter pairs are added to the training sample as sample pairs. The image processing model is trained by using the training sample, and the image processing model after training can determine the image parameters corresponding to the sketch style according to the input first image. Then, the image parameters are optimized by combining the characteristic information of the first image, and the image generation engine can generate a second image with a target style according to the optimized image parameters.
Wherein the training data set of the stylized model may include: the sketch style diagram generated by the image generation engine at random and the person image obtained from the network side. The stylized model performs unsupervised training using unpaired, randomly generated sketch style maps and the person images obtained from the network side. After the training is finished, the stylized model can be used for constructing paired samples, so that the training of the image processing model is facilitated.
In this embodiment, the contents of the steps, the training of the image processing model, and the training of the stylized model may refer to the corresponding contents in the above, which are not described herein again.
The technical scheme provided by each embodiment of the application has the following advantages:
1. In the technical scheme provided by the embodiments of the present application, the image processing model adopts an end-to-end learning mode, is robust to the input character image, and can better determine the discrete parameters, and especially the continuous coefficients, among the avatar parameters; through this training and learning method, the relationship between the character image and the avatar parameters is captured well.
2. According to the technical scheme provided by the embodiment of the application, training samples do not need to be constructed manually to train the image processing model.
3. The technical scheme provided by the embodiments of the present application is strongly robust to noise, achieves high similarity, and has very good universality and generalization.
Fig. 8 shows a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus includes: the system comprises a first obtaining module 21, a first determining module 22, a first extracting module 23, a first optimizing module 24 and a generating module 25. The first obtaining module 21 is configured to obtain an image of a user. The first determination module 22 is configured to determine a first avatar parameter corresponding to an avatar adapted to the user image. The first extraction module 23 is configured to extract feature information of the user image. The first optimization module 24 is configured to optimize the first avatar parameter in combination with the feature information. The generating module 25 is configured to generate a first avatar map according to the optimized first avatar parameter.
Further, the first avatar parameter includes: a plurality of first discrete parameters and a plurality of first continuous parameters reflecting the characteristics of the avatar. Correspondingly, when optimizing the first avatar parameter in combination with the feature information, the first optimization module 24 is specifically configured to:
performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize the values of at least part of the plurality of first discrete parameters; and carrying out shape-beautifying optimization on the plurality of first continuous parameters and the plurality of first discrete parameters subjected to value optimization processing by using a shape-beautifying optimization algorithm.
Further, the first optimization module 24 is specifically configured to, when performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize values of at least some of the plurality of first discrete parameters:
generating a characteristic template according to the characteristic information; determining a target discrete parameter corresponding to the feature template in the plurality of first discrete parameters; searching target materials matched with the characteristic template in the characteristic material set by utilizing a template matching algorithm; and assigning the value of the target discrete parameter as a material identifier corresponding to the target material.
Further, the first determining module 22 in the image processing apparatus provided in this embodiment is specifically configured to determine, by using an image processing model, a first avatar parameter corresponding to an avatar adapted to the user image. Correspondingly, the image processing apparatus provided in this embodiment may further include an execution module, an adding module, and a first training module. The execution module is used for inputting the second virtual pictogram into the stylized model and executing the stylized model to output the character image. The first obtaining module 21 is further configured to obtain a second avatar parameter corresponding to the second avatar diagram. The adding module is used for adding the character image and the second virtual image parameter as a sample pair to the training sample. The first training module is used for training the image processing model by utilizing sample pairs in training samples.
Still further, the second avatar parameter includes a plurality of second discrete parameters and a plurality of second continuous parameters. Correspondingly, when the first training module trains the image processing model by using a sample pair in training samples, the first training module is specifically configured to:
inputting the character images in the sample pair into the image processing model, executing the image processing model and outputting a third virtual character parameter; wherein the third avatar parameter includes a plurality of third discrete parameters and a plurality of third continuous parameters;
determining a discrete parameter loss based on the plurality of third discrete parameters and the plurality of second discrete parameters in the sample pair;
determining a continuous parameter loss from the third plurality of continuous parameters and the second plurality of continuous parameters in the sample pair;
and optimizing the image processing model based on the discrete parameter loss and the continuous parameter loss.
Further, the image processing apparatus provided in this embodiment further includes a data preparation module and a second training module. The data preparation module is used for placing a third virtual pictogram which is generated randomly in the first data set and placing a person image obtained from the network side in the second data set. The second training module is used for training the stylized model by utilizing the first data set and the second data set.
Further, the stylized model includes a cycle generating countermeasure network. Correspondingly, the second training module is specifically configured to:
inputting a third virtual pictogram in the first data set into the loop generation countermeasure network, and executing the loop generation countermeasure network to obtain a first output result;
determining a first cycle consistency loss according to the first output result and the third virtual pictogram;
determining a first global apparent loss of the first output result and the third avatar using a character recognition model;
determining a first local feature loss between the first output result and the third virtual pictogram by using a human body semantic segmentation model;
inputting the figure images in the second data set into the loop generation countermeasure network, and executing the loop generation countermeasure network to obtain a second output result;
determining a second cycle consistency loss according to the second output result and the input person image;
determining a second global apparent loss of the second output result and the human image by utilizing a human recognition model;
determining a second output result and a second local characteristic loss of the character image by using a human body semantic segmentation model;
optimizing the cycle generating countermeasure network based on the first cycle consistency penalty, the first global apparent penalty, the first local feature penalty, the second cycle consistency penalty, the second global apparent penalty, and the second local feature penalty.
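A hedged sketch of how the six losses above might be combined in one training step is given below; the generator names, the embedding and segmentation interfaces, and the loss weights are all illustrative assumptions rather than details fixed by the disclosure.

    import torch.nn.functional as F

    def stylization_loss(avatar_batch, person_batch, g_a2p, g_p2a,
                         id_model, seg_model, w=(10.0, 1.0, 1.0)):
        # g_a2p maps avatar images to realistic images; g_p2a maps back.
        # id_model returns a global appearance embedding per image;
        # seg_model returns per-pixel semantic features.
        fake_person = g_a2p(avatar_batch)
        cycle_a = F.l1_loss(g_p2a(fake_person), avatar_batch)                  # first cycle consistency loss
        global_a = F.l1_loss(id_model(fake_person), id_model(avatar_batch))   # first global appearance loss
        local_a = F.l1_loss(seg_model(fake_person), seg_model(avatar_batch))  # first local feature loss

        fake_avatar = g_p2a(person_batch)
        cycle_b = F.l1_loss(g_a2p(fake_avatar), person_batch)                 # second cycle consistency loss
        global_b = F.l1_loss(id_model(person_batch), id_model(fake_avatar))   # second global appearance loss
        local_b = F.l1_loss(seg_model(person_batch), seg_model(fake_avatar))  # second local feature loss

        return (w[0] * (cycle_a + cycle_b)
                + w[1] * (global_a + global_b)
                + w[2] * (local_a + local_b))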
Here, it should be noted that: the image processing apparatus provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, and is not described herein again.
Fig. 9 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present application. As shown in fig. 9, the image processing apparatus includes: a second obtaining module 31, a second determining module 32, a second extracting module 33, a second optimizing module 34, a calling module 35 and a display module 36. The second obtaining module 31 is configured to acquire a user image in response to an input operation triggered by a user. The second determining module 32 is configured to determine a face-pinching parameter corresponding to the user image. The second extracting module 33 is configured to extract feature information of the user image. The second optimizing module 34 is configured to optimize the face-pinching parameter in combination with the feature information. The calling module 35 is configured to call an image generation engine so that the image generation engine generates an avatar image according to the optimized face-pinching parameter. The display module 36 is configured to display the avatar image.
Further, the image processing apparatus provided in this embodiment may further include a storage module. The second obtaining module 31 is further configured to acquire the trained image processing model from the server, where the training samples of the image processing model are constructed using a stylized model, and the stylized model is trained using avatar images randomly generated by the image generation engine and character images obtained from the network side. The storage module is configured to store the image processing model locally.
Correspondingly, in this embodiment, the second determining module 32 may be specifically configured to determine the face-pinching parameter corresponding to the user image using the locally stored image processing model.
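A minimal sketch of this download-once, run-locally pattern follows, assuming a TorchScript export of the model and a placeholder URL (neither is specified by the disclosure):

    import os
    import urllib.request
    import torch

    MODEL_URL = "https://example.com/face_pinch_model.pt"  # placeholder URL
    LOCAL_PATH = "face_pinch_model.pt"

    def load_face_pinch_model():
        # Fetch the trained model from the server once, then reuse the
        # locally stored copy on later runs.
        if not os.path.exists(LOCAL_PATH):
            urllib.request.urlretrieve(MODEL_URL, LOCAL_PATH)
        model = torch.jit.load(LOCAL_PATH)  # assumes a TorchScript export
        model.eval()
        return model

    def predict_face_pinch_parameters(model, user_image_tensor):
        # user_image_tensor: preprocessed (C, H, W) tensor of the user image.
        with torch.no_grad():
            return model(user_image_tensor.unsqueeze(0))  # add batch dimension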
Further, the face-pinching parameters include a plurality of discrete parameters and a plurality of continuous parameters. Correspondingly, when optimizing the face-pinching parameters in combination with the feature information, the second optimizing module 34 is specifically configured to:
performing value optimization processing on the plurality of discrete parameters based on the feature information to optimize the values of at least some of the discrete parameters; and performing shape-beautifying optimization on the plurality of continuous parameters and the value-optimized discrete parameters using a shape-beautifying optimization algorithm.
Still further, when performing value optimization processing on the plurality of discrete parameters based on the feature information to optimize the values of at least some of the discrete parameters, the second optimizing module 34 is specifically configured to:
generating a feature template from the feature information;
determining, among the plurality of discrete parameters, a target discrete parameter corresponding to the feature template;
searching the feature material set corresponding to the image generation engine, using a template-matching algorithm, for a target material matching the feature template;
and assigning the material identifier corresponding to the target material as the value of the target discrete parameter.
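The disclosure does not define the shape-beautifying optimization algorithm itself. A speculative sketch, assuming it blends the predicted continuous parameters toward an artist-tuned aesthetic preset and clamps them to the engine's valid ranges, is:

    import numpy as np

    def beautify(continuous_params, preset, low, high, alpha=0.3):
        # continuous_params: values predicted by the image processing model.
        # preset, low, high, alpha: assumed aesthetic template, valid-range
        # bounds, and blend strength; none of these come from the disclosure.
        blended = ((1.0 - alpha) * np.asarray(continuous_params)
                   + alpha * np.asarray(preset))
        return np.clip(blended, low, high)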
Here, it should be noted that: the image processing apparatus provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, and is not described herein again.
The application further provides an image processing apparatus whose structure is similar to that shown in fig. 9. Specifically, the image processing apparatus includes: a first acquisition module, a first determination module, a first extraction module, a first optimization module and a generation module. The first acquisition module is configured to acquire a first image. The first determination module is configured to determine, using an image processing model, image parameters for converting the first image into an image of a target style. The first extraction module is configured to extract feature information of the first image. The first optimization module is configured to optimize the image parameters in combination with the feature information of the first image. The generation module is configured to generate a second image of the target style according to the optimized image parameters.
Here, it should be noted that: the determining apparatus of the first image synthesis model for image processing provided in the foregoing embodiment may implement the technical solutions described in the foregoing corresponding method embodiments, and the specific implementation principles of the modules or units may refer to the corresponding contents in the foregoing corresponding method embodiments, and are not described herein again.
Fig. 10 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device includes a processor 42 and a memory 41, where the memory 41 is configured to store one or more computer instructions, and the processor 42, coupled to the memory 41, is configured to execute the one or more computer instructions (e.g., computer instructions implementing data storage logic) to implement the steps in the above-described method embodiments.
The memory 41 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Further, as shown in fig. 10, the electronic device further includes: a communication component 43, a power component 45 and a display 44. Only some components are shown schematically in fig. 10, which does not mean that the electronic device includes only the components shown in fig. 10.
Yet another embodiment of the present application provides a computer program product (not shown in any figure of the drawings). The computer program product comprises computer programs or instructions which, when executed by a processor, cause the processor to carry out the steps in the above-described method embodiments.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a computer, can implement the method steps or functions provided by the foregoing embodiments.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. An image processing method, comprising:
acquiring a user image;
determining a first avatar parameter corresponding to an avatar adapted to the user image;
extracting feature information of the user image;
optimizing the first avatar parameter in combination with the feature information;
and generating a first avatar image according to the optimized first avatar parameter.
2. The method of claim 1, wherein the first avatar parameter comprises: a plurality of first discrete parameters and a plurality of first continuous parameters reflecting the characteristics of the avatar; and
optimizing the first avatar parameter in combination with the feature information comprises:
performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize the values of at least some of the first discrete parameters;
and performing shape-beautifying optimization on the plurality of first continuous parameters and the value-optimized first discrete parameters using a shape-beautifying optimization algorithm.
3. The method according to claim 2, wherein performing value optimization processing on the plurality of first discrete parameters based on the feature information to optimize the values of at least some of the first discrete parameters comprises:
generating a feature template from the feature information;
determining, among the plurality of first discrete parameters, a target discrete parameter corresponding to the feature template;
searching the feature material set, using a template-matching algorithm, for a target material matching the feature template;
and assigning the material identifier corresponding to the target material as the value of the target discrete parameter.
4. The method according to any one of claims 1 to 3, wherein the first avatar parameter is determined using an image processing model; and the method further comprises:
inputting a second avatar image into a stylized model and executing the stylized model to output a character image;
acquiring a second avatar parameter corresponding to the second avatar image;
adding the character image and the second avatar parameter, as a sample pair, to training samples;
and training the image processing model using the sample pairs in the training samples.
5. The method of claim 4, wherein the second avatar parameter comprises a plurality of second discrete parameters and a plurality of second continuous parameters; and
training the image processing model using the sample pairs in the training samples comprises:
inputting the character image in the sample pair into the image processing model and executing the image processing model to output a third avatar parameter, wherein the third avatar parameter includes a plurality of third discrete parameters and a plurality of third continuous parameters;
determining a discrete parameter loss based on the plurality of third discrete parameters and the plurality of second discrete parameters in the sample pair;
determining a continuous parameter loss based on the plurality of third continuous parameters and the plurality of second continuous parameters in the sample pair;
and optimizing the image processing model based on the discrete parameter loss and the continuous parameter loss.
6. The method of claim 4, further comprising:
placing randomly generated third avatar images into a first data set;
placing character images obtained from the network side into a second data set;
and training the stylized model using the first data set and the second data set.
7. The method of claim 6, wherein the stylized model comprises a cycle-consistent generative adversarial network; and
training the stylized model using the first data set and the second data set comprises:
inputting a third avatar image in the first data set into the cycle-consistent generative adversarial network and executing the network to obtain a first output result;
determining a first cycle consistency loss from the first output result and the third avatar image;
determining a first global appearance loss between the first output result and the third avatar image using a person recognition model;
determining a first local feature loss between the first output result and the third avatar image using a human-body semantic segmentation model;
inputting the character images in the second data set into the cycle-consistent generative adversarial network and executing the network to obtain a second output result;
determining a second cycle consistency loss from the second output result and the input character image;
determining a second global appearance loss between the second output result and the character image using the person recognition model;
determining a second local feature loss between the second output result and the character image using the human-body semantic segmentation model;
and optimizing the cycle-consistent generative adversarial network based on the first cycle consistency loss, the first global appearance loss, the first local feature loss, the second cycle consistency loss, the second global appearance loss, and the second local feature loss.
8. An image processing method, comprising:
acquiring a user image in response to an input operation triggered by a user;
determining a face-pinching parameter corresponding to the user image;
extracting feature information of the user image;
optimizing the face-pinching parameter in combination with the feature information;
calling an image generation engine so that the image generation engine generates an avatar image according to the optimized face-pinching parameter;
and displaying the avatar image.
9. The method of claim 8, further comprising:
acquiring a trained image processing model from a server, wherein the training samples of the image processing model are constructed using a stylized model, and the stylized model is trained using avatar images randomly generated by the image generation engine and character images obtained from the network side;
and storing the image processing model locally, so that the face-pinching parameter corresponding to the user image is determined using the image processing model.
10. The method according to claim 8 or 9, wherein the face-pinching parameters comprise: a plurality of discrete parameters and a plurality of continuous parameters; and
optimizing the face-pinching parameters in combination with the feature information comprises:
performing value optimization processing on the plurality of discrete parameters based on the feature information to optimize the values of at least some of the discrete parameters;
and performing shape-beautifying optimization on the plurality of continuous parameters and the value-optimized discrete parameters using a shape-beautifying optimization algorithm.
11. The method of claim 10, wherein performing value optimization processing on the plurality of discrete parameters based on the feature information to optimize the values of at least some of the discrete parameters comprises:
generating a feature template from the feature information;
determining, among the plurality of discrete parameters, a target discrete parameter corresponding to the feature template;
searching the feature material set corresponding to the image generation engine, using a template-matching algorithm, for a target material matching the feature template;
and assigning the material identifier corresponding to the target material as the value of the target discrete parameter.
12. An image processing method, comprising:
acquiring a first image;
determining, using an image processing model, image parameters for converting the first image into an image of a target style;
extracting feature information of the first image;
optimizing the image parameters in combination with the feature information of the first image;
and generating a second image of the target style according to the optimized image parameters.
13. An image processing system, comprising:
a client, configured to acquire a user image; determine a first avatar parameter corresponding to an avatar adapted to the user image; extract feature information of the user image; optimize the first avatar parameter in combination with the feature information; and generate a first avatar image according to the optimized first avatar parameter; and
a server, configured to train an image processing model, wherein training samples of the image processing model are constructed using a stylized model, and the stylized model is trained using avatar images randomly generated by an image generation engine and character images obtained from the network side.
14. An electronic device comprising a processor and a memory, wherein,
the memory to store one or more computer instructions;
the processor, coupled to the memory, configured to execute the one or more computer instructions to implement the steps of the method of any one of claims 1 to 7, or to implement the steps of the method of any one of claims 8 to 11, or to implement the steps of the method of claim 12.
CN202111602629.XA 2021-12-24 2021-12-24 Image processing method, image processing system and electronic equipment Pending CN114266695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111602629.XA CN114266695A (en) 2021-12-24 2021-12-24 Image processing method, image processing system and electronic equipment

Publications (1)

Publication Number Publication Date
CN114266695A true CN114266695A (en) 2022-04-01

Family

ID=80829965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111602629.XA Pending CN114266695A (en) 2021-12-24 2021-12-24 Image processing method, image processing system and electronic equipment

Country Status (1)

Country Link
CN (1) CN114266695A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115936970A (en) * 2022-06-27 2023-04-07 北京字跳网络技术有限公司 Virtual face image generation method and device, electronic equipment and storage medium
CN115392216A (en) * 2022-10-27 2022-11-25 科大讯飞股份有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN115392216B (en) * 2022-10-27 2023-03-14 科大讯飞股份有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN115775024A (en) * 2022-12-09 2023-03-10 支付宝(杭州)信息技术有限公司 Virtual image model training method and device
CN115775024B (en) * 2022-12-09 2024-04-16 支付宝(杭州)信息技术有限公司 Virtual image model training method and device
CN115810073A (en) * 2022-12-19 2023-03-17 支付宝(杭州)信息技术有限公司 Virtual image generation method and device
CN116110099A (en) * 2023-01-19 2023-05-12 北京百度网讯科技有限公司 Head portrait generating method and head portrait replacing method
CN117037048A (en) * 2023-10-10 2023-11-10 北京乐开科技有限责任公司 Social interaction method and system based on virtual image
CN117037048B (en) * 2023-10-10 2024-01-09 北京乐开科技有限责任公司 Social interaction method and system based on virtual image

Similar Documents

Publication Publication Date Title
CN114266695A (en) Image processing method, image processing system and electronic equipment
Tolosana et al. Deepfakes and beyond: A survey of face manipulation and fake detection
US10853987B2 (en) Generating cartoon images from photos
CN110390704B (en) Image processing method, image processing device, terminal equipment and storage medium
US10789453B2 (en) Face reenactment
Sharma et al. 3d face reconstruction in deep learning era: A survey
CN111553267B (en) Image processing method, image processing model training method and device
WO2023050992A1 (en) Network training method and apparatus for facial reconstruction, and device and storage medium
CN111325851A (en) Image processing method and device, electronic equipment and computer readable storage medium
Liu et al. A3GAN: An attribute-aware attentive generative adversarial network for face aging
JP2024500896A (en) Methods, systems and methods for generating 3D head deformation models
CN116997933A (en) Method and system for constructing facial position map
WO2022257766A1 (en) Image processing method and apparatus, device, and medium
CN114937115A (en) Image processing method, face replacement model processing method and device and electronic equipment
JP2024503794A (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
JP2024506170A (en) Methods, electronic devices, and programs for forming personalized 3D head and face models
CN110598097B (en) Hair style recommendation system, method, equipment and storage medium based on CNN
CN117876557A (en) Cascading domain bridging for image generation
CN117726897A (en) Training data generation method, device, electronic equipment and storage medium
Clocchiatti et al. Character Animation Pipeline based on Latent Diffusion and Large Language Models
Jia et al. Facial expression synthesis based on motion patterns learned from face database
CN115909430A (en) Image processing method, device, storage medium and computer program product
Zhang et al. Zero-Shot Real Facial Attribute Separation and Transfer at Novel Views
CN108509838A (en) A method of carrying out group's dressing parsing under combination condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40071592

Country of ref document: HK