CN114782240A - Picture processing method and device

Info

Publication number: CN114782240A
Application number: CN202110014386.1A
Authority: CN (China)
Prior art keywords: picture, initial, initial object, target, model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 郎一宁, 何源, 薛晖, 杨帆
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0641 Shopping interfaces
    • G06Q30/0643 Graphical representation of items or shoppers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/02 Non-photorealistic rendering


Abstract

Embodiments of the present specification provide a picture processing method and a picture processing apparatus. The picture processing method comprises the following steps: receiving a generation request for an initial picture containing a first initial object, where the generation request carries attribute information of the first initial object; inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object; and inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object, where the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.

Description

Picture processing method and device
Technical Field
Embodiments of the present specification relate to the field of computer technology, and in particular to a picture processing method. One or more embodiments of the present specification also relate to a picture processing apparatus, an application program, a computing device, and a computer-readable storage medium.
Background
With the rapid development of e-commerce platforms, virtual model (fitting) technology has begun to be applied on large platforms and in some offline shopping experience stores. Virtual model technology puts a garment from a picture onto a real-world person or onto a model in a picture; because both the garment and the model can be virtual, it is also called virtual fitting. In the prior art, fitting is usually realized with a real model to ensure the authenticity of the real-model picture, but the generated real-model pictures have blurred edges and low image resolution, and complex clothing textures cannot be handled.
Disclosure of Invention
In view of this, the present specification provides a picture processing method. One or more embodiments of the present specification also relate to a picture processing apparatus, an application program, a computing device, and a computer-readable storage medium, to solve the technical deficiencies in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a picture processing method, including:
receiving a generation request of an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object;
inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object;
inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object,
wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
Optionally, the inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object includes:
inputting the attribute information of the first initial object into the picture generation model to obtain the initial picture containing the first initial object and feature information of the first initial object.
Optionally, the inputting the initial picture and the candidate picture including the second initial object and the third initial object into a picture synthesis model to obtain a target picture including a target object includes:
adjusting feature information of a first initial object in the initial picture based on a preset requirement condition to obtain at least one adjusted initial picture;
inputting the adjusted at least one initial picture and the candidate pictures containing the second initial object and the third initial object into a picture synthesis model to obtain at least one target picture containing a target object.
Optionally, the picture generation model includes a picture generation network and a picture identification network,
the training steps of the picture generation network and the picture identification network are as follows:
acquiring a sample picture training set, wherein the sample picture training set comprises sample pictures and sample labels corresponding to the sample pictures;
training an initial picture generation network based on the sample picture and the sample label to obtain a picture generation network;
inputting the sample picture into the picture generation network to obtain a sample target picture;
training an initial picture identification network based on the sample target picture and the sample label to obtain a picture identification network, and obtaining a trained picture generation model based on the picture generation network and the picture identification network.
Optionally, the picture synthesis model comprises an encoder and at least one decoder,
the training steps of the picture synthesis model are as follows:
acquiring a sample data set, wherein the sample data set comprises a first sample object and a second sample object;
inputting the first sample object and the second sample object into the encoder for encoding, and outputting a first encoding vector of the first sample object and a second encoding vector of the second sample object;
and inputting the first encoding vector and the second encoding vector into the decoder for decoding, determining a loss function of the decoding result based on the decoding result, and adjusting network parameters of the encoder based on the loss function to obtain a trained picture synthesis model.
Optionally, the adjusting the feature information of the first initial object in the initial picture based on the preset requirement condition to obtain at least one adjusted initial picture includes:
encoding feature information of the first initial object in the initial picture based on a preset requirement condition;
decoding the encoded feature information of the first initial object by using the picture generation model to obtain at least one initial picture of the first initial object with the target feature information.
Optionally, the obtaining a target picture including a target object includes:
and rendering a target object of the target picture to obtain the rendered target picture containing the target object.
Optionally, after obtaining the target picture including the target object, the method further includes:
acquiring a target picture which meets a preset condition and contains a target object, and acquiring a picture containing a fourth initial object matched with the first initial object;
inputting the target picture containing the target object which meets the preset condition and the picture containing the fourth initial object into the picture synthesis model, and obtaining a target picture containing a target object composed of the first initial object and the fourth initial object.
Optionally, the first initial object includes a human face or clothing.
According to a second aspect of embodiments herein, there is provided a picture processing apparatus including:
the image processing device comprises a receiving module, a processing module and a processing module, wherein the receiving module is configured to receive a generation request of an initial image containing a first initial object, and the generation request carries attribute information of the first initial object;
a picture generation module configured to input attribute information of the first initial object into a picture generation model, and obtain the initial picture containing the first initial object;
a picture synthesis module configured to input the initial picture and a candidate picture including a second initial object and a third initial object into a picture synthesis model, and obtain a target picture including a target object, wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
According to a third aspect of embodiments herein, there is provided an application program including:
receiving a generation request of an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object;
inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object;
inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object,
wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
According to a fourth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, wherein the processor implements the steps of the picture processing method when executing the computer-executable instructions.
According to a fifth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the picture processing methods.
One embodiment of the present specification realizes: receiving a generation request for an initial picture containing a first initial object, where the generation request carries attribute information of the first initial object; inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object; and inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object, where the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
In the picture processing method, the attribute information of the first initial object is input into the picture generation model to obtain the initial picture containing the first initial object, and the initial picture and the candidate picture are both input into the picture synthesis model to obtain the target picture containing the target object. An initial picture containing the first initial object can thus be obtained and mapped onto the candidate picture containing the second initial object and the third initial object, which avoids generating a picture that still contains the third initial object, ensures that the generated picture looks realistic, yields clear edges and high resolution after processing, and provides a better experience for the user.
Drawings
Fig. 1 is a flowchart of a picture processing method according to an embodiment of the present specification;
Fig. 2 is a schematic diagram of a picture processing method generating a target picture containing a target object according to an embodiment of the present specification;
Fig. 3 is a schematic diagram of a picture processing method generating an adjusted picture containing a first initial object according to an embodiment of the present specification;
Fig. 4 is a schematic diagram of a target picture containing a target object generated by a picture processing method according to an embodiment of the present specification;
Fig. 5 is a flowchart illustrating a processing procedure of a picture processing method according to an embodiment of the present specification;
Fig. 6 is a schematic structural diagram of a picture processing apparatus according to an embodiment of the present specification;
Fig. 7 is a block diagram of a computing device according to an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be implemented in many ways other than those specifically set forth herein, and those skilled in the art will appreciate that the present description is susceptible to similar generalizations without departing from the scope of the description, and thus is not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first could be termed a second and, similarly, a second could be termed a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
A virtual model: a model image synthesized by an algorithm, with a face and body close to a real person's; it can be posed in different postures and can wear different virtual clothes.
Virtual model face: a model face synthesized by an algorithm. The algorithm is trained on a large number of model face photos and can finally synthesize high-definition face photos that do not exist in the real world.
Generative adversarial network (GAN model): a newer network structure in the computer vision field, used by a large number of generative vision tasks in recent years and now a mainstream algorithm structure in the field; it includes adversarial network structures such as StyleGAN2, ProGAN, and StyleGAN.
Self-coding model: a network structure in the computer vision field that generally encodes a photo into a string of information; the module that realizes this process is called the encoder, and the module that decodes the information string, restoring it into a photo, is called the decoder.
Computer Graphics (CG): the science of converting two-dimensional or three-dimensional graphics into a raster form for computer displays using mathematical algorithms.
3D rendering: rendering is the final CG step, the last stage of making the image conform to the 3D scene.
In practical applications, the virtual model serves two purposes. First, it synthesizes a set of seller-show photos that meet a merchant's needs through an algorithm, reducing the merchant's cost of hiring a professional model, while ensuring that the synthesized model and clothing are of high quality and high realism so that they reach the same level as real photos. Second, after a buyer receives the clothing or goods, a set of photos meeting the buyer's needs can likewise be synthesized through the algorithm, so that the buyer's real face does not appear in buyer-show photos, eliminating the risk of leaking the buyer's privacy. Existing virtual model (fitting) techniques mainly fall into two categories. The first is virtual fitting based on electronic fitting mirrors (comprising various sensors): the earliest virtual fitting technology, in which a user stands in front of an electronic fitting mirror, a sensor acquires key points of the human body, these are aligned with key points of the clothing, and the clothing is attached to the human body; the model of this technique is the user. The second is virtual fitting based on generative adversarial networks (GAN), a computer vision technique that has emerged in the past two years; it uses a generative adversarial network to complete the migration of model clothing in 2D pictures, and the models are generally real-person pictures. However, both methods have major problems and limitations, such as edge blurring, low image resolution, and inability to handle complex clothing textures.
Embodiments of the present specification provide a virtual-model synthesis technique based on generative adversarial networks and three-dimensional rendering. It mainly relies on a generative adversarial network to synthesize the model's face, maps the synthesized face onto a 3D model, and then renders the virtual model and clothing with traditional Computer Graphics (CG) methods. The advantages are that the face of the virtual model does not exist in the real world, so there is no portrait-right risk, yet it still looks realistic, and the clothing has a strong stereoscopic appearance after three-dimensional reconstruction while the virtual model can change posture. It should be noted that the virtual image can be changed according to the requirements of the specific application; the virtual model may be not only a human-face model but also a cartoon-image model, where the cartoon image may be one that does not exist in the real world or an authorized cartoon image, which this specification does not limit.
In the present specification, a picture processing method is provided, and the present specification relates to a picture processing apparatus, an application program, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 shows a flowchart of a picture processing method provided in an embodiment of the present specification, which specifically includes the following steps.
Step 102: receiving a generation request of an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object.
The first initial object may be understood as the object contained in the initial picture that the user needs to generate; taking the initial picture as a virtual face picture as an example, the first initial object is a virtual face. The attribute information is the basic attribute information of the first initial object itself; for example, if the first initial object is a human face, the attribute information may be basic face attributes such as yellow skin, big eyes, and a high nose bridge.
Specifically, in practical application, the server receives a picture generation request sent by a user for generating a virtual face. The request carries the attribute information of the virtual face, i.e. basic attributes such as skin color, eye shape and size, and mouth shape and size. For example, when the server receives a request to generate a virtual Asian face, the request carries the user's generation requirements, i.e. attribute information fitting an Asian face such as yellow skin, big eyes, a high nose bridge, and a small mouth.
It should be noted that the generation request for an initial picture containing the first initial object is determined by the user's actual requirements; in a practical virtual-model face synthesis application, the user may adjust the model's face attribute information as needed, which this specification does not limit.
Step 104: and inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object.
The picture generation model may be understood as a network model that generates a picture meeting the user's requirements according to the attribute information of the first initial object carried in the user's picture generation request. This specification takes a generative adversarial network model as an example of the picture generation model, without limiting the model's type in any way.
In specific implementation, the attribute information of the first initial object is input into the picture generation model, and an initial picture containing the first initial object can be obtained through it, where the picture generation model may be a generative adversarial network model. Specifically, the picture generation model comprises a picture generation network and a picture identification network,
the training steps of the picture generation network and the picture identification network are as follows:
acquiring a sample picture training set, wherein the sample picture training set comprises sample pictures and sample labels corresponding to the sample pictures;
training an initial picture generation network based on the sample picture and the sample label to obtain a picture generation network;
inputting the sample picture into the picture generation network to obtain a sample target picture;
training an initial picture identification network based on the sample target picture and the sample label to obtain a picture identification network, and obtaining a trained picture generation model based on the picture generation network and the picture identification network.
The picture generation network and the picture identification network can be understood as the generative model and the discriminative model in a GAN. The generative model, through model training, learns to generate data such as text, pictures, and video from input data according to the task; the discriminative model does not reflect the characteristics of the training data itself but reflects the differences between heterogeneous data, in terms of the optimal classification boundary between classes.
Specifically, a sample picture training set is acquired, where the training set comprises sample pictures and the labels of the sample picture objects. An initial picture generation network is trained on the sample pictures and sample labels to obtain the picture generation network; the output sample target pictures and the sample labels are then used to train an initial picture identification network to obtain the picture identification network, and the picture generation model is obtained from the picture generation network and the picture identification network. A picture containing the first initial object can then be generated from the attribute information of the first initial object carried in a picture generation request.
It should be noted that the picture generation model provided in the embodiments of this specification may specifically use the StyleGAN2 adversarial network structure, whose performance is more stable and less prone to image noise than other adversarial network structures of the same type; however, the picture generation model is not limited to this structure and also includes various adversarial network structures such as ProGAN and StyleGAN, which this specification does not limit.
In practical application, for example, a picture generation model can be trained to generate pictures containing virtual faces by acquiring thousands or even hundreds of millions of pictures containing faces together with the labels corresponding to those face pictures. If a model with an Asian face needs to be generated, face pictures of Asian model faces need to be collected, and conditions such as other ethnicities and ages can be matched correspondingly in the training data.
For example, the labels corresponding to the face pictures might be white skin, yellow skin, age 20-25, big eyes, low nose bridge, and the like. Through a large number of pictures containing faces and their corresponding labels, the picture generation network is trained on the prior distribution of face features so that it generates pictures similar to the sample pictures. The picture identification network compares the similarity of the pictures generated by the generation network, and the generation effect is calculated through a loss function, i.e. it judges whether the pictures containing faces generated by the generation network are face pictures of the same type as the faces in the training samples. It should be noted that at the start of training, the pictures generated by the generation network may not yet have the shape and features of a face; but as the identification network keeps distinguishing the generated pictures during continued training, the pictures generated by the generation network gradually take on the characteristics of human faces and can even match the labels corresponding to the pictures. On this basis, the above steps are repeated in a loop, and the picture generation network and the picture identification network undergo iterative, interactive adversarial training until a Nash equilibrium is finally reached, so that the performance of the picture generation network is gradually enhanced and virtual faces meeting the user's requirements can be generated.
In this embodiment of the specification, the generative adversarial network model obtained through training makes it possible to input the attribute information of the first initial object and output a picture of the first initial object corresponding to that attribute information. Not only do the generated pictures meet the user's requirements, but pictures containing the first initial object can also be generated automatically in batches by the trained model for subsequent use.
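To make the adversarial training described above concrete, the following is a minimal PyTorch sketch of a picture generation network and a picture identification network trained against each other. The tiny fully connected networks, the image and latent sizes, and the hyper-parameters are illustrative assumptions; the StyleGAN2-style structure preferred above is far more elaborate.

    import torch
    import torch.nn as nn

    latent_dim, img_dim = 128, 64 * 64 * 3  # assumed sizes for this sketch

    generator = nn.Sequential(               # picture generation network
        nn.Linear(latent_dim, 512), nn.ReLU(),
        nn.Linear(512, img_dim), nn.Tanh(),
    )
    discriminator = nn.Sequential(           # picture identification network
        nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
        nn.Linear(512, 1),
    )

    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_faces: torch.Tensor) -> None:
        """One round of the iterative, interactive adversarial training."""
        batch = real_faces.size(0)
        z = torch.randn(batch, latent_dim)

        # Train the identification network to tell real faces from generated ones.
        fake_faces = generator(z).detach()
        d_loss = (bce(discriminator(real_faces), torch.ones(batch, 1))
                  + bce(discriminator(fake_faces), torch.zeros(batch, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Train the generation network to fool the identification network.
        g_loss = bce(discriminator(generator(z)), torch.ones(batch, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Repeating this step over a large face data set drives the two networks toward the Nash equilibrium described above, after which the generation network alone is kept for producing virtual faces.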
After the trained picture generation model has the capability of generating a target picture, inputting the attribute information of the user for the first initial object into the picture generation model, and obtaining a picture containing the first initial object; specifically, the inputting the attribute information of the first initial object into an image generation model to obtain the initial image containing the first initial object includes:
inputting the attribute information of the first initial object into a picture generation model, and obtaining the initial picture containing the first initial object and the characteristic information of the first initial object.
In specific implementation, the server inputs the attribute information of the first initial object carried in the user's picture generation request into the trained picture generation model, and obtains through the model the initial picture containing the first initial object together with the feature information of the first initial object.
Taking the picture generation request sent by the user as a request to generate a virtual face as an example, the first initial object is a virtual face. For instance, the attribute information of the virtual face carried in the request received by the server is basic information such as white skin, age 20-25, big eyes, a high nose, and a cherry mouth; inputting this information into the picture generation model yields a picture of a virtual face with these attributes, and a feature-code record of the generated picture containing the virtual face is stored. The feature code may include head-pose code information, eye-distance code information, face-width code information, and so on, and the dimensionality of the picture's feature-code information can be set as required, which this embodiment of the specification does not limit.
In this embodiment of the specification, by inputting the user's attribute information for the first initial object into the picture generation model, the initial picture containing the first initial object and the feature information of the first initial object can be obtained quickly, so that the initial picture can subsequently be synthesized with a template picture to generate a target picture containing the target object.
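As an illustration of such a feature-code record, the sketch below groups the sub-codes named above into one structure. The field names follow the examples in this paragraph; the sub-code dimensions are assumptions, since the specification leaves the dimensionality open.

    from dataclasses import dataclass
    import torch

    @dataclass
    class FaceFeatureCode:
        head_pose: torch.Tensor     # e.g. a sub-code covering yaw/pitch/roll
        eye_distance: torch.Tensor
        face_width: torch.Tensor

        def flatten(self) -> torch.Tensor:
            """Concatenate the sub-codes into the picture's full feature code."""
            return torch.cat([self.head_pose, self.eye_distance, self.face_width])

    record = FaceFeatureCode(torch.zeros(8), torch.zeros(2), torch.zeros(2))
    full_code = record.flatten()  # 12-dimensional in this illustrative setup

Keeping the sub-codes separate in this way is what later allows one of them, such as the head pose, to be edited without disturbing the others.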
Step 106: inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model, and obtaining a target picture containing a target object, wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
The second initial object may be understood as one part of the object content in the candidate template picture, and the third initial object as another part of that object content; the candidate picture contains a complete object composed of the second initial object and the third initial object.
It should be noted that the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object. If the first initial object is a face and the object in the candidate picture is a complete portrait, then the third initial object is the face part of the portrait in the candidate picture and the second initial object is its body part.
In specific implementation, the initial picture containing the first initial object generated by the picture generation model and the candidate picture containing the second and third initial objects are input into the picture synthesis model, which synthesizes the two pictures. Because the first and third initial objects are of the same type, the picture synthesis model can replace the third initial object with the first initial object, obtaining a target object that combines the first initial object and the second initial object; the picture containing this target object is the target picture.
In practical applications, taking the application of the picture processing method provided in this embodiment to virtual-model synthesis as an example: the candidate picture is a model picture with a real body, and the first initial object is a virtual model face. The picture with the virtual model face and the candidate picture with the real model are input into a self-coding model, which replaces the face in the real-model picture with the virtual model face, obtaining a synthetic model picture with the virtual model face and the real model's body.
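The overall flow of steps 102 to 106 can be sketched as follows. Every callable here is a trivial placeholder standing in for the trained models and the renderer; the names and signatures are assumptions for illustration, not APIs defined by this specification.

    from typing import Any, Dict, Tuple

    def picture_generation_model(attrs: Dict[str, str]) -> Tuple[Any, Any]:
        # Returns the initial picture containing the first initial object and its feature code.
        return {"virtual_face": attrs}, {"head_pose": 0.0}

    def picture_synthesis_model(initial_picture: Any, candidate_picture: Any) -> Any:
        # Replaces the candidate's third initial object (its face) with the first
        # initial object, keeping the second initial object (the body).
        return {"body": candidate_picture, "face": initial_picture["virtual_face"]}

    def render_3d(target_picture: Any) -> Any:
        return target_picture  # stand-in for the CG/3D rendering stage

    def process_picture(attrs: Dict[str, str], candidate_picture: Any) -> Any:
        initial_picture, feature_code = picture_generation_model(attrs)  # steps 102-104
        target_picture = picture_synthesis_model(initial_picture, candidate_picture)  # step 106
        return render_3d(target_picture)

    target = process_picture(
        {"skin": "yellow", "eyes": "big", "nose_bridge": "high"},  # attribute information
        "real-model template picture",
    )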
In particular, the picture synthesis model comprises an encoder and at least one decoder,
the training steps of the picture synthesis model are as follows:
acquiring a sample data set, wherein the sample data set comprises a first sample object and a second sample object;
inputting the first sample object and the second sample object into the encoder for encoding, and outputting a first encoding vector of the first sample object and a second encoding vector of the second sample object;
inputting the first encoding vector and the second encoding vector into the decoder for decoding, determining a loss function of the decoding result based on the decoding result, and adjusting a network parameter of the encoder based on the loss function to obtain a trained picture synthesis model.
In a specific implementation, the picture synthesis model provided in this specification may be, but is not limited to, a self-coding model. In the training process, two pictures are used as training data to train one encoder and two decoders. The encoder receives a data input and compresses it into a small code, from which the original input data is then regenerated. Through continuous training, the encoder keeps learning to create a code from which the input original picture can be regenerated; as long as there is a large amount of image data, the encoder can learn to create such a code. The encoded content is then decoded by the two decoder modules, restoring the output image data to the original input picture data.
In this embodiment of the specification, the trained model uses the initial picture containing the first initial object to replace the part of the content corresponding to the third initial object, which is of the same type as the first initial object, so that replacement between objects of equivalent type can be carried out automatically for the first initial object, meeting the user's requirements for the synthesized picture.
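A minimal PyTorch sketch of such a self-coding face-swap model, with one shared encoder and two decoders, might look as follows. The architecture sizes and the L1 reconstruction loss are assumptions; the training step above speaks of adjusting the encoder's network parameters, while this sketch updates encoder and decoders jointly, which is a common training choice.

    import torch
    import torch.nn as nn

    img_dim, code_dim = 64 * 64 * 3, 256  # assumed flattened image and code sizes

    encoder = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(),
                            nn.Linear(1024, code_dim))
    decoder_a = nn.Sequential(nn.Linear(code_dim, 1024), nn.ReLU(),
                              nn.Linear(1024, img_dim), nn.Tanh())  # restores face A
    decoder_b = nn.Sequential(nn.Linear(code_dim, 1024), nn.ReLU(),
                              nn.Linear(1024, img_dim), nn.Tanh())  # restores face B

    params = (list(encoder.parameters()) + list(decoder_a.parameters())
              + list(decoder_b.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    l1 = nn.L1Loss()

    def train_step(faces_a: torch.Tensor, faces_b: torch.Tensor) -> None:
        """Each decoder learns to regenerate its own face from the shared code."""
        loss = (l1(decoder_a(encoder(faces_a)), faces_a)
                + l1(decoder_b(encoder(faces_b)), faces_b))
        opt.zero_grad(); loss.backward(); opt.step()

After training, encoding a picture of one face and decoding it with the other identity's decoder performs the replacement between objects of the same type.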
Referring to fig. 2, fig. 2 is a schematic diagram illustrating that the picture processing method provided in an embodiment of the present specification generates a target picture containing a target object.
Fig. 2 takes the generation of a virtual model in response to a user's picture generation request as an example. The model image shown in Fig. 2 is a virtual model image generated by the picture generation model and the picture synthesis model. Part a in Fig. 2 is a virtual face generated by the picture generation model rather than a real model's face, while the parts other than part a are the template picture of a real model. To protect the real model's portrait right, the face part is replaced with the virtual face; this both protects the model's portrait right and automatically achieves a quick face change for the real model through the method provided in this embodiment.
By rendering the generated target picture containing the target object, the picture processing method provided by the embodiment can present a high-definition picture with a good display effect for the user, so that the user can further use the picture.
Because the initial picture generated by the picture generation model contains only a single object form, the generated picture has a single representation when applied to a 3D stereo graph. To provide richer presentations of the target object, the feature information of the first initial object is adjusted so that the picture synthesis model can present rich and varied forms. Specifically, the inputting the initial picture and the candidate picture including the second initial object and the third initial object into a picture synthesis model to obtain the target picture including the target object includes:
adjusting feature information of the first initial object in the initial picture based on a preset requirement condition to obtain at least one adjusted initial picture;
inputting the adjusted at least one initial picture and the candidate picture containing the second initial object and the third initial object into the picture synthesis model to obtain at least one target picture containing a target object.
The preset requirement condition may be understood as the condition under which the user needs to adjust the picture containing the first initial object generated by the picture generation model. Different first initial objects require different adjustments: if the first initial object is a human face, the preset requirement condition may be the angle at which the face is presented; if the first initial object is a garment, it may be the color of the garment, and so on. The specific preset requirement condition is not limited in this embodiment of the specification.
In specific implementation, the server adjusts the feature information of the first initial object in the initial picture based on the user's preset requirement condition to obtain at least one adjusted initial picture, and then inputs the adjusted picture(s) together with the candidate picture containing the second and third initial objects into the picture synthesis model, obtaining at least one target picture containing a target object.
For example, take generating a virtual model with rich poses. According to the user's picture generation request, the picture generation model generates a virtual face picture meeting the request, and the feature information of the virtual face, such as head-pose information, mouth-state information, and eye-state information, is recorded and stored. If the preset requirement condition is to generate virtual-model images with several head poses, the head-pose information in the virtual face's feature information is adjusted: virtual faces with several head poses can be generated through the picture generation model, and the picture synthesis model then synthesizes these virtual faces with the real model of the candidate picture, yielding virtual models with multiple head poses.
In this embodiment of the specification, adjusting the first initial object according to the preset requirement condition generates several adjusted pictures containing the first initial object, so that a series of target pictures with varied target objects can subsequently be generated quickly through the picture synthesis model, meeting the user's requirements.
Further, the adjusting the feature information of the first initial object in the initial picture based on the preset requirement condition to obtain at least one adjusted initial picture includes:
encoding feature information of the first initial object in the initial picture based on a preset requirement condition;
decoding the encoded feature information of the first initial object by using the picture generation model to obtain at least one initial picture of the first initial object with the target feature information.
In specific implementation, the feature information of the first initial object in the initial picture is encoded based on the user's preset requirement condition, and the encoded feature information of the first initial object is decoded with the picture generation model to obtain at least one initial picture of the first initial object having the target feature information.
Following the above example, the feature code of the head-pose information in the virtual face's feature information is adjusted: for instance, the feature vector controlling the head pose is shifted to obtain feature codes for several head poses, and the encoded head feature information is then decoded with the picture generation model, generating virtual faces with one, two, or more head poses.
In this embodiment of the specification, by performing secondary encoding on the feature information of the first initial object, pictures containing diversified first initial objects are output and then synthesized with the candidate picture, so that diverse target pictures containing target objects can subsequently be presented to the user quickly.
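The secondary encoding can be sketched as follows: the stored feature code is shifted along a direction assumed to control head pose, and each edited code would then be decoded back into a photo by the picture generation model. The random direction here is a placeholder; in practice such a direction is identified in the trained generator's latent space.

    import torch

    code_dim = 512                           # assumed feature-code dimensionality
    torch.manual_seed(0)
    pose_direction = torch.randn(code_dim)
    pose_direction /= pose_direction.norm()  # unit direction assumed to control head pose

    def multi_pose_codes(face_code, offsets=(-2.0, -1.0, 0.0, 1.0, 2.0)):
        """Edit only along the head-pose direction, leaving other features untouched."""
        return [face_code + o * pose_direction for o in offsets]

    face_code = torch.randn(code_dim)        # feature code stored at generation time
    edited = multi_pose_codes(face_code)
    # Decoding each code in `edited` with the generation model would yield the same
    # virtual face under a different head pose, e.g. faces = [generator(c) for c in edited]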
Referring to fig. 3, fig. 3 is a schematic diagram illustrating that the picture processing method provided in an embodiment of the present specification generates an adjusted picture containing the first initial object.
Fig. 3 takes the first initial object being a virtual model face as a detailed example. Part a in Fig. 3 is a picture containing a virtual face generated by the picture generation model, and part b in Fig. 3 is the virtual face picture obtained by adjusting the head feature information of that virtual face: the head pose is tilted by a certain angle through the adjusted head feature information, so that the adjusted virtual model meets the user's requirements.
After the target picture containing the target object is obtained, in order to give the target picture a better display effect and improve picture quality, image rendering can be performed on the target picture; specifically, the obtaining a target picture including a target object includes:
and rendering a target object of the target picture to obtain the rendered target picture containing the target object.
In specific implementation, the target object in the obtained target picture is subjected to 3D rendering, during which the image resolution can be adjusted automatically, and the rendered target picture containing the target object is obtained.
It should be noted that the resolution adjusted through rendering can reach up to 4K definition, much higher than the image quality produced by the picture generation model alone. As for the problem of blurred image edges, pictures synthesized with the 3D rendering technique provided in this embodiment of the specification have a very strong stereoscopic impression at the edges, improving the user experience.
In practical application, in the field of virtual-model generation, clothing is generally synthesized onto a model's body and returned to the user after manual post-processing, i.e. picture refinement. With the picture processing method provided in this embodiment of the specification, the synthesized virtual-model pictures to be rendered are screened manually and then processed through the rendering technique, obtaining high-definition, high-quality pictures and solving problems such as picture quality.
In this embodiment of the specification, rendering the obtained target picture containing the target object makes up for defects in picture quality, the rendering technique enhances the stereoscopic impression and edge definition of the picture, and automated rendering saves the time of manual post-processing.
In addition, after a target picture containing a target object is generated for the user through the picture generation model and the picture synthesis model, the target picture is only a virtual synthetic picture with a single form of expression. To obtain richer target pictures with a better display effect, a picture of a fourth initial object matched with the first initial object needs to be acquired and a further picture synthesis performed. Specifically, after obtaining the target picture including the target object, the method further includes:
acquiring a target picture which meets a preset condition and contains a target object, and acquiring a candidate picture containing a fourth initial object matched with the first initial object;
inputting the target picture containing the target object which meets the preset condition and the candidate picture containing the fourth initial object into the picture synthesis model, and obtaining a target picture containing a target object composed of the first initial object and the fourth initial object.
The fourth initial object may be understood as an initial object matched with the first initial object. For example, if the first initial object is a virtual face, then a front face is matched with a front pose, and a left-side face is matched with a left-side pose.
In specific implementation, a target picture containing a target object is generated through the picture generation model and the picture synthesis model, and the target picture meeting the preset condition is acquired, where the preset condition may be the user's actual requirement. A candidate picture containing a fourth initial object matched with the first initial object is also acquired. The target picture meeting the preset condition and the candidate picture containing the fourth initial object are input into the picture synthesis model, obtaining a target picture containing a target object with the fourth initial object.
Taking the obtained target picture being a virtual-model picture as an example: to obtain a target picture of a multi-pose virtual model, a virtual face picture with an adjusted head pose is acquired, say the head pose of a left-side face, together with a candidate picture containing the left-side body pose matched to that head pose, where the candidate picture is a complete real-model picture. Inputting the virtual face picture and the candidate picture into the picture synthesis model yields a target picture with a left-side head pose and a left-side body pose.
It should be noted that there may be several virtual face pictures with adjusted head poses, virtual face pictures at different angles may be produced according to user requirements, and there may likewise be several complete real-model pictures among the candidate pictures.
In this embodiment of the specification, a target picture meeting a preset condition and a candidate picture with a fourth initial object that matches the first initial object are acquired and synthesized through the picture synthesis model, so that a target picture containing the first initial object and the fourth initial object can be obtained quickly, realizing diversified target pictures that meet user requirements.
It should be noted that, in the picture processing method provided in this embodiment of the present specification, the first initial object includes a human face or clothes.
In practical applications, the method provided by this embodiment of the specification can also be applied to generating virtual clothing and synthesizing it onto the body of a virtual model to implement a virtual fitting technique; the application scenario is not limited here.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a target picture containing a target object generated by the picture processing method provided in an embodiment of the present specification.
Fig. 4 takes the target object being a virtual model as an example. Part a of Fig. 4 shows pictures of several virtual models whose faces are virtual faces generated by the picture generation model; because the body poses of the models in the candidate pictures differ, several virtual models with virtual faces and different body poses can be generated from template pictures with different poses. Part b of Fig. 4 shows several virtual-model pictures whose faces are virtual faces and whose head features are the head poses displayed after adjusting the head feature information; again because the body poses of the candidate-picture models differ, several virtual-model pictures with head poses different from those in part a and with different body poses can be generated.
To sum up, in the embodiments of this specification, the attribute information of the first initial object is input into the picture generation model to obtain the initial picture containing the first initial object, and the initial picture and the candidate picture are both input into the picture synthesis model to obtain the target picture containing the target object. Model face synthesis can thus be realized and the synthesized face mapped onto a 3D model, which not only avoids the portrait-right problem of the real model but also ensures that the generated picture looks realistic, with clear edges and high resolution after processing, providing a better experience for the user.
In addition, the picture processing method provided by the embodiments of this specification includes three innovations. First, based on the generative adversarial network structure and with model face pictures as training data, a model face generator is obtained through training; using this generator, model face photos can be generated automatically in batches as the image basis of the virtual model. Second, based on a self-coding model and with paired face pictures as training data, one encoder and two decoders are trained; the trained auto-encoder model can complete the face change between an original face and a target face, and in the application targeted by this patent the model face is specifically changed onto the 3D model of the original template. Through this operation, a 3D model with a synthetic face is obtained for rendering the virtual fitting. Third, another innovation of these embodiments is multi-pose rendering of the virtual model. The model photos output by the generative adversarial network have only a single frontal head pose; to provide richer model poses, the method performs secondary encoding on the image features output by the adversarial network and adjusts the feature vector controlling the head pose, realizing feature-level editing of the face image, and finally outputs model photos in various poses in cooperation with the 3D pose of the human body.
The following further describes the picture processing method with reference to Fig. 5, taking the application of the picture processing method provided in this specification to virtual-model synthesis as an example. Fig. 5 shows a processing flowchart of a picture processing method provided in an embodiment of the present specification.
In this embodiment of the specification, based on the generative adversarial network structure and with model face pictures as training data, a model face generator is obtained through training; using this generator, model face pictures can be generated automatically in batches as the visual basis of the virtual model. In the selection of training data, matching can be done according to the type of target model: if a model with an Asian face needs to be generated, face pictures of Asian model faces need to be collected, conditions such as other ethnicities and ages can be matched correspondingly in the training data, and the feature prior of the trained face generator comes entirely from the training data set. The generative adversarial network is structurally divided into a generator and a discriminator. The generator generates face photos based on the prior distribution of face features; the discriminator compares the similarity of those face photos with the real face photos in the training data, and the generation effect is calculated through a loss function, i.e. it judges whether face photos of the same type as the training data have been generated. During the iterative, interactive adversarial training of the generator and the discriminator, the generator's performance is gradually enhanced. A virtual face is generated based on the face generator and the face discriminator, i.e. a virtual face picture meeting the user's requirements is output.
Part b of Fig. 5 shows the process of face-changing the virtual model through the virtual-model face-changing network. This embodiment trains one encoder and two decoders based on a self-coding model, with paired face pictures (face A and face B) as training data. This "auto-encoder" is in fact a deep neural network that receives a data input, compresses it into a small code, and then regenerates the original input data from this code. In this standard auto-encoder setup, the encoder tries to learn to create an encoding from which the network can regenerate the original input picture. As long as there is enough image data, the encoder can learn to create this code, and the two decoder modules decode it: one restores it to the original face A and the other to the target face B. Using the trained auto-encoder model, the virtual face picture generated in part a of Fig. 5 is input into the encoder for encoding and decoded by a decoder; according to the template picture of the real model, the virtual face is substituted into the real model's template picture, and a model picture with a virtual face is output, which can be used for rendering the virtual fitting.
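The face-change inference in part b of Fig. 5 then reduces to encoding with the shared encoder and decoding with the target identity's decoder, as in the sketch below; the network sizes are illustrative assumptions mirroring the training sketch given earlier.

    import torch
    import torch.nn as nn

    img_dim, code_dim = 64 * 64 * 3, 256

    encoder = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(),
                            nn.Linear(1024, code_dim))
    decoder_b = nn.Sequential(nn.Linear(code_dim, 1024), nn.ReLU(),
                              nn.Linear(1024, img_dim), nn.Tanh())

    @torch.no_grad()
    def change_face(virtual_face: torch.Tensor) -> torch.Tensor:
        """Encode the generated virtual face, then restore it as target face B."""
        return decoder_b(encoder(virtual_face))

    swapped = change_face(torch.randn(1, img_dim))  # then pasted into the real-model template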
Fig. 5 c shows the process of synthesizing the virtual model in multiple poses. The model photo output by the generative adversarial network has only a single, frontal head pose. In order to provide richer model poses, in the virtual-model multi-pose synthesis part, the image features output by the generative adversarial network are secondarily encoded and the feature vector controlling the head pose is adjusted, so as to realize feature-level editing of the face image; finally, model photos in multiple poses are output in cooperation with the 3D body pose of the virtual model. While outputting the virtual model face image, the generative adversarial network also retains the feature-code record of the image. Although the output face image cannot be modified directly, the feature code of the image can be modified directionally; for example, by editing the feature code only along the feature-vector direction that controls the head pose, feature codes of various head poses can be generated, and these codes are then restored into photos, outputting model faces in multiple head poses. Finally, the head poses are matched with diversified body poses, the required model poses are screened manually, and the result is rendered and output.
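Directional editing of the retained feature code can be sketched as below. The pretrained `generator`, the learned `pose_direction` vector and the step sizes are assumptions introduced for illustration; the specification does not disclose how the pose direction is obtained.

```python
# Sketch of feature-level pose editing (PyTorch assumed). `generator` and
# `pose_direction` are hypothetical: a pretrained GAN generator and a vector
# in its latent space taken to control head pose.
import torch

def multi_pose_faces(generator, code: torch.Tensor,
                     pose_direction: torch.Tensor,
                     steps=(-2.0, -1.0, 0.0, 1.0, 2.0)) -> list[torch.Tensor]:
    """Move the retained feature code along the pose direction only,
    then restore each edited code into a photo."""
    photos = []
    with torch.no_grad():
        for s in steps:
            edited = code + s * pose_direction   # other attributes unchanged
            photos.append(generator(edited))
    return photos
```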
In the embodiments of the present specification, a virtual face meeting the requirements is generated by the face generator, and the virtual face is substituted into the template picture of a real model by the trained encoder, so that an automatic face-changing process for the virtual model is realized and a virtual model in a single pose can be rendered. Model faces in multiple poses are realized by changing the feature information of the virtual face, and when these model faces are matched with body poses, pictures of the virtual model in multiple poses can be obtained quickly.
Corresponding to the above method embodiment, this specification further provides an embodiment of a picture processing apparatus, and fig. 6 shows a schematic structural diagram of a picture processing apparatus provided in an embodiment of this specification. As shown in fig. 6, the apparatus includes:
a receiving module 602, configured to receive a generation request for an initial picture including a first initial object, where the generation request carries attribute information of the first initial object;
a picture generation module 604, configured to input attribute information of the first initial object into a picture generation model, and obtain the initial picture containing the first initial object;
a picture synthesis module 606 configured to input the initial picture and a candidate picture including a second initial object and a third initial object into a picture synthesis model, and obtain a target picture including a target object, where the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
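Read together, the three modules form a small pipeline. The following is a hypothetical sketch of how they might be wired; the class and parameter names are illustrative and do not appear in this specification.

```python
# Hypothetical wiring of the three modules (all names are illustrative).
from dataclasses import dataclass

@dataclass
class GenerationRequest:
    attributes: dict          # attribute information of the first initial object

class PictureProcessingApparatus:
    def __init__(self, generation_model, synthesis_model):
        self.generation_model = generation_model   # used by module 604
        self.synthesis_model = synthesis_model     # used by module 606

    def handle(self, request: GenerationRequest, candidate_picture):
        # Receiving module (602): the request carries the attribute information.
        attrs = request.attributes
        # Picture generation module (604): attributes -> initial picture.
        initial_picture = self.generation_model(attrs)
        # Picture synthesis module (606): initial + candidate -> target picture,
        # in which the first initial object replaces the same-type third object.
        return self.synthesis_model(initial_picture, candidate_picture)
```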
Optionally, the picture generation module 604 is further configured to:
inputting the attribute information of the first initial object into the picture generation model to obtain the initial picture containing the first initial object and feature information of the first initial object.
Optionally, the picture synthesis module 606 is further configured to:
adjusting the feature information of the first initial object in the initial picture based on a preset requirement condition to obtain at least one adjusted initial picture;
inputting the adjusted at least one initial picture and the candidate picture containing the second initial object and the third initial object into the picture synthesis model to obtain at least one target picture containing a target object.
Optionally, in the picture generation module 604, the picture generation model comprises a picture generation network and a picture identification network,
and the picture generation network and the picture identification network are trained through the following steps (a hedged sketch follows the listed steps):
acquiring a sample picture training set, wherein the sample picture training set comprises sample pictures and sample labels corresponding to the sample pictures;
training an initial picture generation network based on the sample picture and the sample label to obtain a picture generation network;
inputting the sample picture into the picture generation network to obtain a sample target picture;
training an initial picture identification network based on the sample target picture and the sample label to obtain a picture identification network, and obtaining a trained picture generation model based on the picture generation network and the picture identification network.
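A hedged sketch of this two-stage procedure is given below, assuming PyTorch, treating the sample labels as tensors that both condition the generation network and serve as targets for the identification network, and simplifying each stage to a single pass; all of these are illustrative assumptions, not the configuration of this specification.

```python
# Sketch of the two-stage training listed above (PyTorch assumed; the loss
# choices, shapes and single-pass loops are illustrative simplifications).
import torch

def train_picture_generation_model(gen_net, idf_net, sample_pictures, sample_labels):
    """Stage 1: fit the generation network on the sample pictures and labels.
    Stage 2: feed the samples through it to obtain sample target pictures,
    then fit the identification network on those pictures and the labels."""
    gen_opt = torch.optim.Adam(gen_net.parameters())
    idf_opt = torch.optim.Adam(idf_net.parameters())
    mse = torch.nn.MSELoss()
    bce = torch.nn.BCEWithLogitsLoss()

    for pic, label in zip(sample_pictures, sample_labels):      # stage 1
        loss = mse(gen_net(label), pic)     # label-conditioned generation
        gen_opt.zero_grad(); loss.backward(); gen_opt.step()

    for pic, label in zip(sample_pictures, sample_labels):      # stage 2
        sample_target = gen_net(label).detach()                 # sample target picture
        loss = bce(idf_net(sample_target), label)
        idf_opt.zero_grad(); loss.backward(); idf_opt.step()

    return gen_net, idf_net   # together they form the trained picture generation model
```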
Optionally, in the picture synthesis module 606, the picture synthesis model comprises an encoder and at least one decoder,
and the picture synthesis model is trained through the following steps:
acquiring a sample data set, wherein the sample data set comprises a first sample object and a second sample object;
inputting the first sample object and the second sample object into the encoder for encoding, and outputting a first encoding vector of the first sample object and a second encoding vector of the second sample object;
inputting the first encoding vector and the second encoding vector into the decoder for decoding, determining a loss value based on the decoding result, and adjusting the network parameters of the encoder based on the loss value to obtain a trained picture synthesis model.
Optionally, the picture generation module 604 is further configured to:
encoding the feature information of the first initial object in the initial picture based on a preset requirement condition;
decoding the encoded feature information of the first initial object by using the picture generation model to obtain at least one initial picture in which the first initial object has the target feature information.
Optionally, the picture synthesis module 606 is further configured to:
rendering the target object of the target picture to obtain a rendered target picture containing the target object.
Optionally, the picture synthesis module 606 is further configured to:
acquiring a target picture that meets a preset condition and contains a target object, and acquiring a candidate picture containing a fourth initial object matched with the first initial object;
inputting the target picture containing the target object that meets the preset condition and the candidate picture containing the fourth initial object into the picture synthesis model to obtain a target picture containing a target object composed of the first initial object and the fourth initial object.
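As a hypothetical usage sketch, this second pass simply reuses the trained synthesis model on the screened target picture and the new candidate picture; the function and argument names below are illustrative.

```python
# Hypothetical second synthesis pass (names illustrative): the screened target
# picture is combined with a candidate containing the matched fourth object
# (e.g., clothing matched to the generated face).
def compose_with_fourth_object(synthesis_model, screened_target, fourth_candidate):
    """screened_target: target picture meeting the preset condition;
    fourth_candidate: candidate picture containing the matched fourth object."""
    return synthesis_model(screened_target, fourth_candidate)
```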
Optionally, in the picture processing apparatus, the first initial object comprises a human face or clothing.
In the picture processing apparatus provided in the embodiments of the present specification, attribute information of a first initial object is input into a picture generation model to obtain an initial picture containing the first initial object, and the initial picture and a candidate picture are both input into the picture synthesis model to obtain a target picture containing a target object, thereby realizing model face synthesis and mapping the synthesized face onto a 3D model. This not only avoids the portrait-right problem of real-person models, but also ensures that the generated picture is realistic, with clear edges and high resolution after processing, providing a better experience for users.
The above is a schematic scheme of a picture processing apparatus of this embodiment. It should be noted that the technical solution of the image processing apparatus and the technical solution of the image processing method belong to the same concept, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the image processing method.
An embodiment of the present specification further provides an application program storing computer instructions which, when executed by a processor, implement the steps of the picture processing method.
In a specific implementation, the application program may be, for example, an e-commerce application program to which the picture processing method is applicable.
The above is a schematic scheme of an application program of this embodiment. It should be noted that the technical solution of the application program and the technical solution of the above-mentioned picture processing method belong to the same concept, and for details that are not described in detail in the technical solution of the application program, reference can be made to the description of the technical solution of the picture processing method.
FIG. 7 illustrates a block diagram of a computing device 700 provided in accordance with an embodiment of the present specification. The components of the computing device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 is coupled to the memory 710 via a bus 730, and a database 750 is used to store data.
The computing device 700 also includes an access device 740 that enables the computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Other components may be added or replaced as desired by those skilled in the art.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions, and the steps of the picture processing method are implemented when the processor executes the computer-executable instructions.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the image processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the image processing method.
An embodiment of the present specification further provides a computer readable storage medium storing computer instructions, which when executed by a processor, are used to implement the steps of the picture processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above-mentioned image processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned image processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, and to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (13)

1. A picture processing method comprises the following steps:
receiving a generation request of an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object;
inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object;
inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object,
wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
2. The picture processing method according to claim 1, wherein said inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object comprises:
inputting the attribute information of the first initial object into the picture generation model to obtain the initial picture containing the first initial object and feature information of the first initial object.
3. The picture processing method according to claim 2, wherein said inputting the initial picture and a candidate picture including a second initial object and a third initial object into a picture synthesis model to obtain a target picture including a target object comprises:
adjusting the feature information of the first initial object in the initial picture based on a preset requirement condition to obtain at least one adjusted initial picture;
inputting the adjusted at least one initial picture and the candidate picture containing the second initial object and the third initial object into the picture synthesis model to obtain at least one target picture containing a target object.
4. The picture processing method according to claim 1 or 2, wherein the picture generation model comprises a picture generation network and a picture identification network,
the training steps of the picture generation network and the picture identification network are as follows:
acquiring a sample picture training set, wherein the sample picture training set comprises sample pictures and sample labels corresponding to the sample pictures;
training an initial picture generation network based on the sample picture and the sample label to obtain a picture generation network;
inputting the sample picture into the picture generation network to obtain a sample target picture;
training an initial picture identification network based on the sample target picture and the sample label to obtain a picture identification network, and obtaining a trained picture generation model based on the picture generation network and the picture identification network.
5. The picture processing method according to claim 1 or 3, wherein the picture synthesis model comprises an encoder and at least one decoder,
the training steps of the picture synthesis model are as follows:
acquiring a sample data set, wherein the sample data set comprises a first sample object and a second sample object;
inputting the first sample object and the second sample object into the encoder for encoding, and outputting a first encoding vector of the first sample object and a second encoding vector of the second sample object;
inputting the first encoding vector and the second encoding vector into the decoder for decoding, determining a loss value based on the decoding result, and adjusting a network parameter of the encoder based on the loss value to obtain a trained picture synthesis model.
6. The picture processing method according to claim 3, wherein the adjusting the feature information of the first initial object in the initial picture based on the preset requirement condition to obtain at least one adjusted initial picture comprises:
encoding the feature information of the first initial object in the initial picture based on the preset requirement condition;
decoding the encoded feature information of the first initial object by using the picture generation model to obtain at least one initial picture in which the first initial object has target feature information.
7. The picture processing method according to claim 1, wherein the obtaining a target picture containing a target object comprises:
rendering the target object of the target picture to obtain a rendered target picture containing the target object.
8. The picture processing method according to claim 7, further comprising, after obtaining a target picture containing a target object:
acquiring a target picture which meets a preset condition and contains a target object, and acquiring a candidate picture which contains a fourth initial object matched with the first initial object;
inputting the target picture containing the target object that meets the preset condition and the candidate picture containing the fourth initial object into the picture synthesis model to obtain a target picture containing a target object composed of the first initial object and the fourth initial object.
9. The picture processing method according to claim 1, wherein the first initial object comprises a human face or clothing.
10. A picture processing apparatus comprising:
a receiving module, configured to receive a generation request for an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object;
a picture generation module, configured to input the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object;
a picture synthesis module configured to input the initial picture and a candidate picture including a second initial object and a third initial object into a picture synthesis model, and obtain a target picture including a target object, wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
11. An application program, comprising:
receiving a generation request of an initial picture containing a first initial object, wherein the generation request carries attribute information of the first initial object;
inputting the attribute information of the first initial object into a picture generation model to obtain the initial picture containing the first initial object;
inputting the initial picture and a candidate picture containing a second initial object and a third initial object into a picture synthesis model to obtain a target picture containing a target object,
wherein the first initial object and the third initial object are of the same type, and the target object is composed of the first initial object and the second initial object.
12. A computing device, comprising:
a memory and a processor;
the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, wherein the processor executes the computer-executable instructions to realize the steps of the picture processing method according to any one of claims 1 to 9.
13. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the picture processing method of any one of claims 1 to 9.
CN202110014386.1A 2021-01-06 2021-01-06 Picture processing method and device Pending CN114782240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110014386.1A CN114782240A (en) 2021-01-06 2021-01-06 Picture processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110014386.1A CN114782240A (en) 2021-01-06 2021-01-06 Picture processing method and device

Publications (1)

Publication Number Publication Date
CN114782240A true CN114782240A (en) 2022-07-22

Family

ID=82407764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110014386.1A Pending CN114782240A (en) 2021-01-06 2021-01-06 Picture processing method and device

Country Status (1)

Country Link
CN (1) CN114782240A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220191027A1 (en) * 2020-12-16 2022-06-16 Kyndryl, Inc. Mutual multi-factor authentication technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination