CN114004772A - Image processing method, image synthesis model determining method, system and equipment


Info

Publication number
CN114004772A
CN114004772A (application CN202111162107.2A)
Authority
CN
China
Prior art keywords
image
sample
model
graph
image synthesis
Prior art date
Legal status
Pending
Application number
CN202111162107.2A
Other languages
Chinese (zh)
Inventor
赵帅帅 (Zhao Shuaishuai)
Current Assignee
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority claimed from CN202111162107.2A
Publication of CN114004772A
Legal status: Pending

Classifications

    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/10: Image analysis; segmentation; edge detection
    • G06T 2207/10004: Image acquisition modality; still image; photographic image
    • G06T 2207/20221: Special algorithmic details; image fusion; image merging
    • G06T 2207/30196: Subject of image; human being; person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application provide an image processing method, an image synthesis model determining method, and a corresponding system and device. The image processing method includes the following steps: acquiring a first image and a second image in response to an operation of a user; determining a first image synthesis model, where the first image synthesis model has learned, through a training process, the image synthesis results of a second image synthesis model, the second image synthesis model being a pre-trained model; inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image; and displaying the synthesized image. The technical solution provided by the embodiments of the present application achieves a good image synthesis effect with small time delay and good real-time performance.

Description

Image processing method, image synthesis model determining method, system and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image synthesis model determining method, and a corresponding system and device.
Background
For promoting its clothing, a merchant achieves a better and more intuitive effect by displaying the clothes on a real person than by displaying the clothes alone. However, merchants carry many styles of clothing. Hiring a model to photograph every garment is costly; styles change very quickly, and every change would require a new shoot at additional cost; moreover, not every merchant has the means to arrange model photography.
In the prior art, some techniques allow a merchant (also called a user) to generate, by automatic image generation means, an image that simulates a model being photographed while trying on the corresponding clothing, without hiring a real model. However, the images generated by the prior art either take a long time to generate or show an unrealistic dressing effect on the model.
Disclosure of Invention
The present application provides an image processing method, an image synthesis model determining method, and a corresponding system and device that solve, or at least partially solve, the above problems.
In one embodiment of the present application, an image processing method is provided. The method comprises the following steps:
acquiring a first image and a second image in response to an operation of a user;
determining a first image synthesis model, wherein the first image synthesis model has learned, through a training process, the image synthesis results of a second image synthesis model, and the second image synthesis model is a pre-trained model;
inputting the first image and the second image into the trained first image synthesis model, and outputting a synthesized image;
and displaying the synthesized image.
In another embodiment of the present application, a further image processing method is provided. The method comprises the following steps:
acquiring a first exhibit image and a model image in response to an image input operation of a user;
determining a first image synthesis model, wherein the first image synthesis model has learned, through a training process, the image synthesis results of a second image synthesis model, and the second image synthesis model is a pre-trained model;
inputting the first exhibit image and the model image into the first image synthesis model, and outputting a synthesized image of the model exhibiting the first exhibit;
and displaying the synthesized image.
In yet another embodiment of the present application, a method for determining an image synthesis model for image processing is also provided. The method comprises the following steps:
acquiring a second image synthesis model trained in advance, a first image synthesis model to be trained, and a training sample; wherein the training sample comprises a first sample graph, at least one first feature map associated with the first sample graph, a second sample graph, at least one second feature map associated with the second sample graph, a third sample graph, and a fourth sample graph; a graph synthesized from the first sample graph and the second sample graph is related in graph content to the third sample graph; and a graph synthesized from the fourth sample graph and the third sample graph is related in graph content to the first sample graph;
inputting the first sample graph, the at least one first feature map, the second sample graph, and the at least one second feature map into the second image synthesis model to synthesize the first sample graph and the second sample graph with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
inputting the first output map, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output map for knowledge distillation;
optimizing the first image synthesis model based on the second output map and the first sample graph;
wherein the optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
In yet another embodiment of the present application, an image processing system is also provided. The image processing system includes:
a data layer configured to interact with a database to store and acquire data, the database storing data that can serve as training samples;
a processing layer provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model, configured to generate training samples from the data set acquired by the data layer, and to respectively train the at least one first image synthesis model using the training samples and the at least one second image synthesis model, thereby obtaining at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model;
an application layer configured to receive a first image and a second image input by a user;
the processing layer being further configured to synthesize the first image and the second image using at least part of the at least one first image synthesis model to obtain a synthesized image;
the application layer being further configured to send the synthesized image to a client device corresponding to the user.
In yet another embodiment of the present application, an image processing system is also provided. The image processing system includes:
a server configured to respectively train at least one first image synthesis model serving as a student model, using training samples and at least one second image synthesis model serving as a teacher model, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model;
a client configured to locally deploy at least part of the at least one first image synthesis model; to acquire a first image and a second image in response to an operation of a user; and to determine a first image synthesis model, input the first image and the second image into the first image synthesis model, output a synthesized image, and display the synthesized image.
In yet another embodiment of the present application, an electronic device is also provided. The electronic device includes a processor and a memory, wherein,
the memory is configured to store one or more computer instructions;
the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement the steps in the above-described method embodiments.
In yet another embodiment of the present application, a computer program product is also provided. The computer program product comprises computer programs or instructions which, when executed by a processor, cause the processor to carry out the steps in the above-described method embodiments.
According to the technical solution provided by the embodiments of the present application, knowledge distillation is used to train the first image synthesis model with the second image synthesis model, so that the first image synthesis model attains precision and performance close to those of the second image synthesis model. In implementation, a model with good precision and performance can therefore be selected as the second image synthesis model (in the field of knowledge distillation, the second image synthesis model may be called the teacher model), and the first image synthesis model is trained through the knowledge distillation process. Although the original first image synthesis model has low performance and accuracy, after the knowledge distillation training process it attains high accuracy and high performance close to those of the second image synthesis model, so that image synthesis using the first image synthesis model requires only two images and achieves a good effect. In addition, most models with high precision and good performance suffer from time delay, whereas the first image synthesis model of this embodiment combines good precision and performance with a small parameter quantity, and therefore offers high processing efficiency when images need to be processed in batches, small time delay, and good real-time performance.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart illustrating an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a technical solution provided by an embodiment of the present application from an application interface perspective;
fig. 3 is a schematic diagram illustrating the principle of a spatial transformation network module in the image processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the transformation of diagram B, guided by diagram A, using the spatial transformation network module;
fig. 5 is a schematic diagram illustrating the principle of a GAN module in the image processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an image processing method according to an embodiment of the present application;
FIG. 7 is a flow chart illustrating an image processing method according to another embodiment of the present application;
FIG. 8 is a schematic diagram illustrating the principle of training a first image synthesis model through a knowledge distillation process based on a second image synthesis model according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating a method for determining a first image synthesis model for image processing according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an image processing system provided by an embodiment of the present application;
FIG. 11 illustrates an architectural diagram of an image processing system provided from the system software architecture level in accordance with an embodiment of the present application;
fig. 12 is a schematic structural diagram illustrating an image processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram illustrating a first image synthesis model determining apparatus for image processing according to another embodiment of the present application;
fig. 14 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the prior art, there is a scheme that performs generation using a three-dimensional image rendering technology. The scheme mainly includes two steps: generating a virtual three-dimensional model, and then rendering the garment onto the three-dimensional model. This scheme requires a large amount of human-body three-dimensional information at the early stage, at high cost. The generation period at the later usage stage is long, and real-time generation cannot be achieved. The following embodiments are therefore provided to solve these problems in the prior art: a scheme capable of quickly generating the display effect and replacing the display effect as the garment style changes.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. In some of the flows described in the specification, claims, and above-described figures of the present application, a number of operations are included that occur in a particular order, which operations may be performed out of order or in parallel as they occur herein. The sequence numbers of the operations, e.g., 101, 102, etc., are used merely to distinguish between the various operations, and do not represent any order of execution per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different modules, models, devices, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different. In addition, the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Before introducing the embodiments provided by the present application, the scenarios to which the technical solution of the embodiments is adapted are briefly introduced. One applicable scenario is a model outfit-changing scenario, such as the scenario shown in fig. 2. For example, the technical solution provided by the embodiments of the present application can provide a commodity-image synthesis service for merchants. A merchant user may upload a model image (e.g., a previously taken photograph of a model wearing garment A) and a garment image (which may be a flat-lay image of the garment) via a client (e.g., a smartphone, a desktop computer, a tablet computer, etc.). The merchant user then clicks a "composite" control on the interactive interface of the client device to obtain a composite image of the model wearing the garment. Because the first image synthesis model used to produce the composite image is obtained by training with a knowledge distillation technique, its performance and precision are high, and the composite image has a good effect. Suppose the merchant user not only wants to put the composite image on the e-commerce platform for promotion, but also wants to make an advertising poster for offline promotion. Because the first image synthesis model is trained on high-resolution sample graphs, the composite image also has a high resolution, meets the resolution requirement of an advertising poster, and yields a very good poster.
In addition to garment changing, the technical solutions provided by the embodiments of the present application are also applicable to other types of goods, such as shoes and boots, bags (handbags, suitcases, backpacks, etc.), scarves, hats, gloves, belts, ornaments (such as bracelets, rings, necklaces, earrings, and headwear), watches, and handheld electronic devices (such as mobile phones, notebook computers, and tablet computers).
The training of the first image synthesis model used for image synthesis in the present application may be performed by a server. For example, the server trains a first image synthesis model and then sends the trained first image synthesis model to a client, so that the client can locally synthesize two images input by the client user using the first image synthesis model. Alternatively, both the training of the first image synthesis model and the synthesis of the two images using it are performed by the server; the client sends the first image and the second image input by the user to the server and then receives the composite image fed back by the server.
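A minimal sketch of the first deployment mode just described (server-side training, client-side inference), assuming PyTorch; the stand-in model class, file name, and tensor sizes are illustrative assumptions, not taken from the application:

```python
import torch
import torch.nn as nn

class FirstImageSynthesisModel(nn.Module):
    """Stand-in for the trained student model (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, first_image, second_image):
        # the real model would run its STN and GAN modules here
        return self.net(torch.cat([first_image, second_image], dim=1))

# server side: after training, export the weights
server_model = FirstImageSynthesisModel()
torch.save(server_model.state_dict(), "first_model.pt")

# client side: deploy locally and synthesize two user-provided images
client_model = FirstImageSynthesisModel()
client_model.load_state_dict(torch.load("first_model.pt"))
client_model.eval()
with torch.no_grad():
    first = torch.rand(1, 3, 512, 384)    # e.g. a model image
    second = torch.rand(1, 3, 512, 384)   # e.g. a garment image
    composite = client_model(first, second)
```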
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present application. The method provided by this embodiment may be executed by a client, and the client may be a smartphone, a desktop computer, a notebook computer, a tablet computer, a smart wearable device, or the like, which is not limited in this embodiment. As shown in fig. 1, the method includes:
101. acquiring a first image and a second image in response to an operation of a user;
102. determining a first image synthesis model, wherein the first image synthesis model has learned, through a training process, the image synthesis results of a second image synthesis model, and the second image synthesis model is a pre-trained model;
103. inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image;
104. displaying the synthesized image.
Referring to the example shown in fig. 2, a user may input a first image and a second image through the interactive interface shown in fig. 2. The first image and the second image may be selected by the user from a gallery. The gallery may be stored locally (e.g., an album) on the device executing the method of this embodiment, or may be stored on other devices on the network side, which is not limited in this embodiment.
In 102, the accuracy and performance of the second image synthesis model are higher than those of the first image synthesis model. It should be noted that, as of the filing date of this application, it is generally recognized in the art that the performance and accuracy of models of large model magnitude surpass those of models of small model magnitude. The magnitudes of two models can be compared along multiple dimensions, such as the number of parameters in the model, the number of layers the model contains, and the number of inputs to the model: a model of large magnitude has a large parameter quantity, many layers, many inputs, and so on. In this embodiment, the model magnitude of the second image synthesis model is therefore larger than that of the first image synthesis model.
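As an illustrative sketch (an assumption, not part of the application), one of the magnitude dimensions above, the parameter count, can be compared directly in PyTorch:

```python
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    # number of learnable parameters: one dimension of model magnitude
    return sum(p.numel() for p in model.parameters())

# e.g. a larger (teacher-like) network vs. a smaller (student-like) one
big = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))
small = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
print(param_count(big), param_count(small))  # the first is much larger
```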
The first image synthesis model is obtained using a knowledge distillation technique. Knowledge distillation refers to the migration of parameters between two models in deep learning, or the migration of information learned by a large model to a small model. Large and small are relative concepts: of the two models in knowledge distillation, the one with the large parameter quantity can be understood as the large model and the one with the small parameter quantity as the small model. Alternatively, of the two models, the one of large magnitude (e.g., more layers, more parameters, more inputs) is the large model, and the one of small magnitude (e.g., fewer layers, fewer parameters, fewer inputs) is the small model.
In knowledge distillation, a lightweight small model is constructed and trained using the supervision information of a better-performing large model, so as to achieve better performance and precision. The large model is called the teacher model, and the small model is called the student model.
In an embodiment of the application, the first image synthesis model is trained by knowledge distillation based on the second image synthesis model. The knowledge distillation process is further described below in the context of the scheme of the present application.
The knowledge distillation process may follow several distillation modes: offline distillation, semi-supervised distillation, self-supervised distillation, and the like.
Offline distillation: a teacher model is trained in advance, and when the student model is trained, the teacher model is used for supervised training to achieve the purpose of distillation. The higher the training precision of the teacher model is above that of the student model, the larger the gap and the more pronounced the distillation effect. Generally, the parameters of the teacher model are kept unchanged during distillation training, so as to achieve the purpose of training the student model. A distillation loss function computes the difference between the predicted values output by the teacher model and the student model; this difference is added to the student loss, and the sum serves as the loss of the whole training for gradient updates, finally yielding a student model with higher performance and precision (a minimal sketch of this combined loss follows below).
Semi-supervised distillation: the prediction information of the teacher model is used as labels to supervise the learning of the student model. Before training the student model, part of the unlabeled data is input into the teacher network, and the labels output by the teacher network are used as supervision information input to the student model to complete the distillation process. A data set with a smaller annotation amount can thus be used while still improving model precision.
Self-supervised distillation: there is no need to train a teacher model in advance; instead, the training of the student model itself completes the distillation process. There are many ways of self-supervised distillation, e.g., first train the student model, and in the last epochs of the whole training process (1 epoch means one pass over all samples of the training set), use the previously trained student as the supervising model to distill the model in the remaining epochs. The advantage is that no teacher model needs to be trained in advance; training and distillation proceed together, saving the training time of the whole distillation process.
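A minimal sketch of the offline distillation step just described, assuming PyTorch modules and L1 losses (the application does not fix the loss form; all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def offline_distillation_step(teacher, student, optimizer, x, target):
    teacher.eval()
    with torch.no_grad():                     # teacher parameters stay unchanged
        t_out = teacher(x)
    s_out = student(x)
    distill_loss = F.l1_loss(s_out, t_out)    # teacher/student prediction gap
    student_loss = F.l1_loss(s_out, target)   # ordinary supervised loss
    loss = distill_loss + student_loss        # loss of the whole training
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```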
In this embodiment, a knowledge distillation technique is adopted: the first image synthesis model (the student model) is trained by distilling knowledge from the high-precision second image synthesis model (the teacher model), so as to improve the precision of the first image synthesis model. The first image synthesis model trained in this way can be used to perform synthesis computation on two images. Compared with the second image synthesis model, the first image synthesis model has a small model magnitude, so the time delay of processing images with the trained first image synthesis model is small. In addition, knowledge distillation can also compress the number of inputs of the second image synthesis model; the trained first image synthesis model requires fewer inputs than the second image synthesis model, which is convenient for users.
More prominently, with the technical solution provided by the embodiments of the present application, a user can obtain a good synthesis effect by simply providing two images; the requirements on the input images are low and the operation is simple. In particular, a merchant user only needs to provide a picture of the new commodity and a picture of an existing model, without arranging a model photo shoot, so the cost is low, the operation is convenient, and goods can be put on the shelves efficiently.
Further, the first image synthesis model described in this embodiment includes a spatial transformation network module and an image synthesis network module. The first image synthesis model may include a plurality of layers; the spatial transformation network module may be one part of these layers, and the image synthesis network module another part. Besides the layers corresponding to these two modules, the first image synthesis model may further include other layers, such as an input layer, convolution layers, and an output layer, which is not limited in this embodiment. Accordingly, in the method of this embodiment, step 103 "inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image" may specifically include:
1031. inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing coordinate transformation on at least part of the pixel points in the second image with reference to the first image;
1032. inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image;
wherein the image synthesis network module shares at least part of its parameters with the second image synthesis model. That the image synthesis network module shares parameters with the second image synthesis model can be understood as follows: the second image synthesis model contains the same parameters as the image synthesis network module; or the second image synthesis model has network layers identical to those of the image synthesis network module.
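A minimal sketch of such parameter sharing, assuming both models are PyTorch modules; the submodule names are hypothetical and not taken from the application:

```python
import torch.nn as nn

def share_synthesis_params(teacher: nn.Module, student_synth: nn.Module):
    # copy across every teacher parameter whose name and shape match a
    # parameter of the student's image synthesis network module
    t_state = teacher.state_dict()
    s_state = student_synth.state_dict()
    shared = {k: v for k, v in t_state.items()
              if k in s_state and s_state[k].shape == v.shape}
    student_synth.load_state_dict(shared, strict=False)

# alternatively, "identical network layers" can be realized by reusing
# the very same submodule object, so the parameters are truly shared:
#   student.synth = teacher.synth
```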
The Spatial Transformer Network (STN) module is a module providing a spatial transformation function. As a special network module, the STN can be embedded into a certain layer of the first image synthesis model, so that spatial transformations (affine transformation, projection transformation, and the like) are supported, and characteristics such as rotation invariance and robustness to translation and deformation are added to the first image synthesis model. For example, the STN module may transform the pose of an object in an image input to the first image synthesis model. Referring to fig. 3, the principle structure of the STN module may include a regression network, a grid generator, and a sampler (see the sketch after this list), where:
Regression network (localization network): the input original image U undergoes several convolution operations and is then regressed through fully connected layers to 6 transformation values (assuming an affine transformation), i.e., a 2 × 3 matrix.
Grid generator: computes, by matrix operation, for each position in the target image V the corresponding coordinate position in the original image U, i.e., generates T(G).
Sampler: samples the original image U based on the coordinate information in T(G), and copies the sampled pixels of the original image U into the target image V.
For example, for the two diagrams shown in fig. 4, diagram A and diagram B are input to the STN module, which transforms the garment in diagram B based on the pose of the model in diagram A to obtain diagram C. In practice, a GMM (Geometric Matching Module) may be used in addition to the STN. A GMM is an end-to-end neural network trained with a pixel-wise L1 loss (a loss function that computes the loss between pixels of the predicted and target images); it can likewise be used to align an input garment C with a person representation P (e.g., a pose key point heat map, a body shape mask, a preserved region of the original image, etc.) and to generate an image of garment C fitted to the person's pose, body shape, and so on. For more details of the GMM, reference may be made to the relevant literature; no further details are given here.
In a specific implementation, the image synthesis network module in this embodiment may be implemented using a Generative Adversarial Network (GAN). A GAN aims at generating data through an adversarial game between neural networks. Owing to its strong image generation capability, the GAN is widely applied in image synthesis, image inpainting, super-resolution, sketch-based image restoration, and the like. As shown in fig. 5, the GAN model consists of two parts: a generator and a discriminator. As can be seen from fig. 5, the input information passes through the generator to obtain a generated image (fake), which is fed as one input to the discriminator; the other input of the discriminator comes from a real image (real), and the discriminator is made to distinguish between the two inputs. During training, there is a game relationship between the discriminator and the generator, and in the course of continued training the performance of both is significantly enhanced.
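A minimal sketch of one adversarial training step matching the fig. 5 description; generator and discriminator definitions are omitted, and any modules G and D returning an image and logits, respectively, would fit (all names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, cond, real):
    # discriminator update: one input is the generated (fake) image,
    # the other comes from a real image
    fake = G(cond).detach()
    real_logits, fake_logits = D(real), D(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator update: try to make the discriminator judge fakes as real
    fake_logits = D(G(cond))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```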
In an implementation solution, the first image synthesis model in this embodiment may be obtained by the following steps.
Specifically, as shown in fig. 6, the determining process of the first image synthesis model may include:
1021. acquiring a second image synthesis model trained in advance, a first image synthesis model to be trained, and a training sample.
The training sample comprises a first sample graph, at least one first feature map associated with the first sample graph, a second sample graph, at least one second feature map associated with the second sample graph, a third sample graph, and a fourth sample graph. A graph synthesized from the first sample graph and the second sample graph is related in graph content to the third sample graph; a graph synthesized from the fourth sample graph and the third sample graph is related in graph content to the first sample graph.
1022. inputting the first sample graph, the at least one first feature map, the second sample graph, and the at least one second feature map into the second image synthesis model to synthesize the first sample graph and the second sample graph with reference to the at least one first feature map and the at least one second feature map, so as to obtain a first output map.
1023. inputting the first output map, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output map for knowledge distillation.
1024. optimizing the first image synthesis model based on the second output map and the first sample graph.
The technical solution provided by this embodiment can be applied to image synthesis in many scenarios, for example: synthesizing the human face in image a onto the human face in image b; synthesizing the clothing in image c onto the person in image d; synthesizing the watch in image e onto the wrist of the person in image f; and the like, which is not limited by this embodiment.
The graphs in the training sample are described below with reference to a specific example. In order to train a first image synthesis model suitable for a model outfit-changing scenario, the training sample selected here includes: a first sample graph in which the model displays a second exhibit, at least one first feature map associated with the first sample graph, a second sample graph corresponding to a third exhibit, at least one second feature map associated with the second sample graph, a third sample graph in which the model displays the third exhibit, and a fourth sample graph corresponding to the second exhibit.
From the above example it is convenient to understand the statement that "a graph synthesized from the first sample graph and the second sample graph is related in graph content to the third sample graph, and a graph synthesized from the fourth sample graph and the third sample graph is related in graph content to the first sample graph." Here, "related in graph content" can be understood as: the content of the composite graph of the first sample graph and the second sample graph is the same as or similar to the graph content of the third sample graph, which is not limited in this embodiment.
The at least one first feature map mentioned above may include, but is not limited to, at least one of: at least one segmentation map obtained by segmenting the first sample graph using at least one segmentation method; and a feature point distribution map of the first sample graph. The at least one second feature map includes, but is not limited to: at least one segmentation map obtained by segmenting the second sample graph using at least one segmentation method.
For example, the first sample graph is a model graph, and the corresponding at least one segmentation map can be obtained by performing human body detection and segmentation on the model graph with at least one of the following methods: template-based methods, model-based methods, parallel-line-based methods, edge-contour-based methods, image-blocking-based methods, and the like, which are not limited in this embodiment. The feature point distribution map of the first sample graph may be a human pose key point map, which can be obtained by detecting the person (i.e., the model) in the first sample graph with a human body key point detection method. There are many methods for detecting human body key points; neither these nor the above segmentation methods are described in detail here, and reference may be made to other literature. In practical implementation, DensePose (a human pose recognition system) may be chosen in this embodiment to recognize the human pose in the image, thereby generating the human pose information of the first sample graph (which may be the human pose key point map mentioned above). To obtain a more accurate pose estimate, DensePose proposes a dense human pose estimation method that maps every pixel to a dense pose point. Garment region warping and pose alignment are performed using the estimated dense pose, which provides richer pose details for pose-guided synthesis. The DensePose parsing result contains body segmentation and grid coordinates, providing richer information usable for realistic pose-guided synthesis; the parsing and the grid coordinates provide dense pseudo-three-dimensional information that can represent pose details.
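As an illustrative sketch only (the application names DensePose; torchvision's pretrained Keypoint R-CNN is used here merely as a readily available stand-in for keypoint extraction), a human pose key point map for a sample graph could be produced as follows; the image path is hypothetical:

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import keypointrcnn_resnet50_fpn

model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = read_image("model_photo.jpg").float() / 255.0   # hypothetical path

with torch.no_grad():
    pred = model([img])[0]            # detections for this single image
keypoints = pred["keypoints"][0]      # (17, 3): x, y, visibility per joint
```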
For another example, the second sample graph is a garment graph, and the corresponding at least one segmentation map of the garment can be obtained by at least one of the following methods: edge-contour-based methods, image-blocking-based methods, and the like.
Further, as shown in fig. 8, the first image synthesis model to be trained in this embodiment includes a spatial transformation network module and an image synthesis network module. Specifically, the spatial transformation network module may be the STN module or the GMM module mentioned above, and the image synthesis network module may be a GAN module. Accordingly, step 1023 "inputting the first output map, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output map for knowledge distillation" specifically includes:
S1, inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output map obtained by performing coordinate transformation on at least part of the pixels in the fourth sample graph based on the third sample graph;
S2, inputting the first output map and the third output map into the image synthesis network module to obtain the second output map for knowledge distillation.
A more specific application scenario is introduced below, which helps in understanding the relationship between the above sample graphs and the knowledge distillation method for training the first image synthesis model provided in this embodiment. For details, reference may be made to the examples hereinafter.
When applying the solution provided by the embodiments of the present application, it is found that the resolution of the images in existing data sets is low, for example only 256 × 192. If such low-resolution images are used as training samples, the resolution of the synthesized image output by the trained first image synthesis model is also low. A low-resolution composite image performs poorly if it is to be displayed by external projection (e.g., made into a shop-window advertisement or a newspaper, or otherwise displayed). Likewise, if the composite image is displayed as a commodity image on the merchant's e-commerce page and a buyer wants to zoom in on the clothing details, the details cannot be shown clearly because of the low resolution. Therefore, the method provided by the embodiments of the present application can improve the resolution with the following steps when training the second image synthesis model and when training the first image synthesis model by knowledge distillation using the trained second image synthesis model. That is, the method provided by this embodiment may further include the following steps:
105. acquiring a data set;
106. determining, based on the data set, an atlas capable of serving as training samples;
107. processing the images in the atlas to improve their resolution, the processed images serving as sample images in the training samples;
108. analyzing the processed images in the atlas to obtain at least one feature map corresponding to each image.
The data set in 105 may be obtained from a common data set published on the network. A public data set is a data set that its holder makes publicly available for mass reading. The choice of public data set is not specifically limited in this embodiment.
In practical implementation, in 106 the atlas suitable for serving as training samples may be selected according to the actual image synthesis requirement. For example, if in this embodiment the first image synthesis model is to synthesize a model image and a clothing image into an image of the model having changed into the clothing, then the model images, clothing images, changed images, and the like in the data set are all suitable as training samples. As another example, if the first image synthesis model is to synthesize two person images into a face-swapped composite image, the person images in the data set are all suitable as training samples. As yet another example, if the first image synthesis model is to synthesize an ornament image and a person image into an image of the person wearing the ornament, then the ornament images, person images, and the like in the data set are all suitable as training samples.
In practical implementation, in 107 image super-resolution reconstruction means can be used to improve the quality of the images in the data set; that is, the detail information of an image is reconstructed from the low-resolution image by software processing, thereby obtaining a higher-quality super-resolution image.
The resolution of the images in the above step can be improved by the following means:
1. Interpolation-based super-resolution reconstruction estimates the value of the current pixel point by an algorithm from the pixel values of several known neighboring positions, thereby obtaining an image of higher resolution, as sketched below. Interpolation-based algorithms are simple, have low computational complexity, and are very widely applicable.
2. Reconstruction-based super-resolution reconstruction rests on the assumption that "more of the missing original detail features can be captured and estimated from the low-resolution image". Many super-resolution reconstruction models, represented by the spatial-domain method and the frequency-domain method, are developed from this assumption.
3. Learning-based super-resolution reconstruction collects and establishes a database of learning image material; the algorithm accumulates prior knowledge through extensive model training and image reconstruction, and better captures and restores the detail information of the original image by adjusting certain parameter settings, thereby improving the reconstruction effect. A classic example is the Super-Resolution Using a Generative Adversarial Network (SRGAN) algorithm, based on generative adversarial networks. The main idea of SRGAN is: through training and learning, the generator generates a high-resolution image approximating the effect of a real image, making it as hard as possible for the discriminator to judge whether an input high-resolution image comes from an original real high-resolution image or is a generated one. When the discriminator cannot discriminate the authenticity of the image, the generator network is producing high-quality high-resolution images.
For example, an image with a resolution of 256 × 192 acquired from an existing common data set may be reconstructed by any of the methods described above into an image with a resolution of 512 × 384. The 512 × 384 images are used as sample images in the training samples; training the model with such high-resolution sample images improves the resolution of the images output by the model, with a better effect.
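A minimal sketch of the interpolation-based route (item 1 above), upscaling 256 × 192 to 512 × 384 with bicubic interpolation in PyTorch; a learning-based route such as SRGAN would replace this single call with a trained generator network:

```python
import torch
import torch.nn.functional as F

low = torch.rand(1, 3, 256, 192)   # dummy 256 x 192 low-resolution image
high = F.interpolate(low, size=(512, 384), mode="bicubic", align_corners=False)
print(high.shape)                  # torch.Size([1, 3, 512, 384])
```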
Of course, if high-resolution images are available in this embodiment, they can be selected directly as sample images, and there is no need to improve image resolution by the above methods.
In 108, analyzing the processed images in the atlas may include segmenting the images, detecting key points in the images, and so on, as mentioned above, to obtain at least one feature map corresponding to each image, such as: at least one segmentation map obtained by segmenting the image with at least one segmentation method, a key point map obtained by detecting the image with a key point detection method, and the like.
In addition, the method provided by this embodiment may further include the following steps:
109. in response to the user's feedback on the composite image, in the case that the feedback indicates that the synthesis effect is unsatisfactory, re-determining the first image synthesis model, so as to compute a composite image of the first image and the second image using the re-determined first image synthesis model; wherein the re-determined first image synthesis model is obtained by training the first image synthesis model through a knowledge distillation process based on a third model.
In a specific implementation, the model magnitude of the third model may be larger than the model magnitude of the second image synthesis model. Alternatively, the third model is of the same magnitude as the second image synthesis model, but belongs to a different type of model (e.g., different hierarchy and order).
Alternatively, there may be more than one first image synthesis model for image processing in this embodiment. For example, different teacher models (such as the second image synthesis model and the third model mentioned above) are used in advance to train the same student model through knowledge distillation, obtaining different first image synthesis models. As another example, different teacher models are used in advance to train different student models through knowledge distillation, obtaining different first image synthesis models. For instance, model A and model B can be used as teacher models in knowledge distillation, and model a and model b as student models. Training model a by knowledge distillation using model A yields a first image synthesis model z; training model b by knowledge distillation using model B yields a first image synthesis model x. When training each student model by knowledge distillation, different training sample sets can be used so as to suit the corresponding scenario requirements. In this way, when the composite image is unsatisfactory, the user may reselect one of the first image synthesis models to regenerate a desired composite image. That is, the step of "determining a first image synthesis model" in the method provided by this embodiment may specifically be:
selecting one model from a plurality of candidate models as the first image synthesis model in response to a model selection operation by the user; wherein the plurality of candidate models comprises at least one of: models obtained by training the same student model through a knowledge distillation process based on teacher models of different model magnitudes, and models obtained by training different student models through a knowledge distillation process based on teacher models of different model magnitudes.
The technical solution provided by the present application is described below with reference to a specific application scenario, such as a scenario of changing the exhibit displayed by a model. Fig. 7 is a flowchart of an image processing method according to another embodiment of the present application. The method provided by this embodiment may be executed by a client, and the client may be a smartphone, a desktop computer, a notebook computer, a tablet computer, a smart wearable device, or the like, which is not limited in this embodiment. Specifically, the method comprises the following steps:
201. acquiring a first exhibit image and a model image in response to an image input operation of a user;
202. determining a first image synthesis model, wherein the first image synthesis model has learned, through a training process, the image synthesis results of a second image synthesis model, and the second image synthesis model is a pre-trained model;
203. inputting the first exhibit image and the model image into the first image synthesis model, and outputting a synthesized image of the model exhibiting the first exhibit;
204. displaying the synthesized image.
The exhibit in this embodiment may be a garment, an accessory, an electronic product, a bag (such as a handbag, a suitcase, or a backpack), boots, a hat, a scarf, gloves, and the like. The accessories may be necklaces, watches, rings, headwear, earrings, etc., which is not limited in this embodiment.
As shown in fig. 2, a user may input a first image (e.g., a model image) and a second image (e.g., an image of a white jacket) via the interactive interface. After clicking the composite control on the interactive interface, the user can see the composite image. The device executing the method of this embodiment (such as a mobile phone or a computer) determines a first image synthesis model after the user inputs the first image and the second image, and then processes the first image and the second image using the first image synthesis model to obtain a composite image (e.g., a composite image of the model wearing the white jacket).
For the model exhibit-changing scenario corresponding to this embodiment, as shown in fig. 8, the method according to this embodiment may determine the first image synthesis model by the following steps:
2021. acquiring a second image synthesis model trained in advance, a first image synthesis model to be trained, and a training sample.
The training sample comprises a first sample graph in which the model displays a second exhibit, at least one first feature map associated with the first sample graph, a second sample graph corresponding to a third exhibit, at least one second feature map associated with the second sample graph, a third sample graph in which the model displays the third exhibit, and a fourth sample graph corresponding to the second exhibit.
2022. inputting the first sample graph, the at least one first feature map, the second sample graph, and the at least one second feature map into the second image synthesis model to obtain a first output map in which the model displays the third exhibit;
2023. inputting the first output map, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output map, used for knowledge distillation, in which the model displays the second exhibit;
2024. optimizing the first image synthesis model based on the second output map and the first sample graph.
The at least one first feature map may include, but is not limited to, at least one of: at least one segmentation map obtained by segmenting the model in the first sample graph using at least one segmentation method; and a feature point distribution map of the model in the first sample graph. The at least one second feature map may include, but is not limited to: at least one segmentation map obtained by segmenting the third exhibit in the second sample graph using at least one segmentation method.
FIG. 8 illustrates the training process of a first image synthesis model suitable for image processing in a model outfit-changing scenario. The first image synthesis model comprises a spatial transformation network module and an image synthesis network module. In the example shown in fig. 8, the at least one first feature map corresponding to the first sample graph may include: two segmentation maps obtained by segmenting the human body of the model in the first sample graph with two segmentation methods, and a key point map representing the pose of the model. The second image synthesis model at the top of fig. 8 has already been trained, acts as the teacher model in the knowledge distillation process, and has more layers. The first sample graph, the two model human-body segmentation maps, and the model pose key point map are input into the second image synthesis model to obtain a first output map. The third sample graph and the fourth sample graph are input into the first image synthesis model, whose spatial transformation network module executes to obtain a third output map (i.e., the deformed garment image fitted to the model's pose). The third output map and the first output map produced by the second image synthesis model serve as inputs of the image synthesis network module of the first image synthesis model, which executes to obtain the second output map. Finally, a knowledge distillation loss is computed based on the second output map and the first sample graph, so as to optimize the first image synthesis model based on this loss; more specifically, the parameters of the image synthesis network module can be optimized according to the knowledge distillation loss.
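A hedged end-to-end sketch of this fig. 8 training step, assuming PyTorch; the module names and the L1 form of the distillation loss are illustrative assumptions, and only the data flow follows the description above:

```python
import torch
import torch.nn.functional as F

def fig8_distillation_step(teacher, stn, gan, optimizer,
                           first_sample, first_feat_maps,    # model wearing exhibit 2, + feature maps
                           second_sample, second_feat_maps,  # flat image of exhibit 3, + feature maps
                           third_sample,                     # model wearing exhibit 3
                           fourth_sample):                   # flat image of exhibit 2
    teacher.eval()
    with torch.no_grad():   # the pre-trained teacher stays fixed
        first_out = teacher(first_sample, first_feat_maps,
                            second_sample, second_feat_maps)
    # student, part 1 (S1): warp the flat garment to the model's pose
    third_out = stn(third_sample, fourth_sample)
    # student, part 2 (S2): synthesize the second output map
    second_out = gan(first_out, third_out)
    # distillation loss against the first sample graph as label
    loss = F.l1_loss(second_out, first_sample)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```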
It is mentioned above that the second image synthesis model shares part of the parameters with the image synthesis network module, so that in the above optimization process, if the parameters shared by the second image synthesis model are optimized, the parameters in the second image synthesis model are also optimized accordingly.
The reason why this embodiment does this is that: the second image synthesis model is difficult to generate a synthesis graph with good effect in one step, so that a first output graph output by the second image synthesis model is used as the input of the first image synthesis model, and then the first sample graph in the training sample is used as a label to calculate loss so as to optimize the first image synthesis model, so that the first image synthesis model can be well learned.
Here, it should be noted that this embodiment places no limitation on how the knowledge distillation loss is calculated or how the model parameters are optimized with that loss. Likewise, this embodiment does not specifically limit the selection or design (hierarchical structure) of the second image synthesis model or of the first image synthesis model (the spatial transformation network module and the image synthesis network module). The embodiments of the present application focus on the innovation in knowledge distillation, namely the scheme of using the output of the second image synthesis model (that is, the teacher model) as the input for training the first image synthesis model, and on the innovation of applying knowledge distillation to image processing (e.g., image synthesis) scenarios.
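For concreteness, one plausible rendering of the fig. 8 training step in PyTorch-style code is sketched below. The module names (`teacher`, `stn`, `synthesis_net`), the batch keys, and the plain L1 distillation loss are all assumptions for illustration; as just noted, the application leaves the loss formulation and network design open.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(teacher: nn.Module, stn: nn.Module,
                      synthesis_net: nn.Module, batch: dict,
                      optimizer: torch.optim.Optimizer) -> float:
    """One training step following fig. 8 (sketch; all names are assumptions).

    Sample semantics, restated from the training-sample definition:
      first_sample   - model displaying the second exhibit (also the label)
      second_sample  - tiled image of the third exhibit
      third_sample   - model displaying the third exhibit
      fourth_sample  - tiled image of the second exhibit
    """
    # 1. The pre-trained teacher consumes the sample maps and their feature
    #    maps and emits the first output map. no_grad() blocks gradients
    #    through the teacher's forward pass; parameters it shares with
    #    synthesis_net still receive gradients from step 3 below.
    with torch.no_grad():
        first_output = teacher(batch["first_sample"], batch["first_features"],
                               batch["second_sample"], batch["second_features"])

    # 2. The student's spatial transformation network warps the tiled
    #    second-exhibit image to the pose shown in the third sample map,
    #    yielding the third output map (the deformed garment image).
    third_output = stn(batch["third_sample"], batch["fourth_sample"])

    # 3. The student's image synthesis network fuses the teacher output and
    #    the warped garment into the second output map.
    second_output = synthesis_net(first_output, third_output)

    # 4. Knowledge distillation loss against the first sample map as label.
    #    The application leaves the loss open; plain L1 is an assumed example.
    loss = F.l1_loss(second_output, batch["first_sample"])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```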
In outline, the above process is as follows: the pose of the model is analyzed, a spatial transformation network module deforms the exhibit image (such as a tiled garment image) so that it matches the pose of the model in the model image, and the deformed exhibit image is then synthesized with the model image to obtain an image of the model displaying the object in the exhibit image. In the technical solutions provided by the embodiments of the present application, knowledge distillation brings two benefits: on the one hand, the required input is greatly reduced, since the trained first image synthesis model only needs a model image and a flat (tiled) exhibit image to perform image synthesis, which greatly simplifies the application flow; on the other hand, the quality of the generated result is further improved.
Fig. 9 is a flowchart illustrating a method for determining a first image synthesis model for image processing according to an embodiment of the present application. As shown in fig. 9, the method includes:
401. acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training sample comprises a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to the graph content of the first sample graph;
402. inputting the first sample map, the at least one first feature map, the second sample map, and the at least one second feature map into the second image synthesis model to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
403. inputting the first output map, the third sample map, and the fourth sample map into the first image synthesis model, outputting a second output map for knowledge distillation;
404. optimizing the first image synthesis model based on the second output map and the first sample map;
and the optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
For the content of steps 401 to 402, reference may be made to the corresponding content above, which is not repeated here.
Further, the first image synthesis model includes a spatial transformation network module and an image synthesis network module. Accordingly, step 403, "inputting the first output map, the third sample map and the fourth sample map into the first image synthesis model, and outputting a second output map for knowledge distillation", may include:
4031. inputting the third sample map and the fourth sample map into the spatial transformation network module, and outputting a third output map obtained by performing a coordinate transformation on at least part of the pixel points in the fourth sample map with reference to the third sample map;
4032. inputting the first output map and the third output map into the image synthesis network module to obtain a second output map for knowledge distillation.
Similarly, for steps 4031 and 4032, reference may be made to the corresponding content above, which is not repeated here.
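To make step 4031 concrete, the sketch below shows one conventional way a spatial transformation network can perform such a coordinate transformation, using an affine transform built on PyTorch's `affine_grid`/`grid_sample`. The application does not prescribe this design; a garment-warping module would typically regress a richer (e.g. thin-plate-spline) transform, but the coordinate-transformation principle is the same.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    """Minimal affine spatial transformer (illustrative design only)."""
    def __init__(self, in_channels: int = 6):
        super().__init__()
        # A small localisation network regresses the 2x3 affine matrix
        # from the concatenated reference image and image to be warped.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(32 * 4 * 4, 6)
        # Initialise to the identity transform.
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, reference: torch.Tensor, to_warp: torch.Tensor):
        x = torch.cat([reference, to_warp], dim=1)
        theta = self.fc(torch.flatten(self.localization(x), 1)).view(-1, 2, 3)
        # Coordinate transformation of the pixels of `to_warp`,
        # guided by the reference image (step 4031).
        grid = F.affine_grid(theta, to_warp.size(), align_corners=False)
        return F.grid_sample(to_warp, grid, align_corners=False)
```

Calling `AffineSTN()(third_sample, fourth_sample)` then plays the role of step 4031, and its output, together with the first output map, would feed the image synthesis network module of step 4032.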
Because the technical solutions provided by the embodiments of the present application adopt knowledge distillation, the lightweight first image synthesis model can achieve good performance. In practical applications, the teacher model used in the knowledge distillation process can therefore be trained on the server side, where it attains strong performance, and the student model is then trained through the knowledge distillation process based on the trained teacher model. Because it is lightweight, the trained student model is deployed on the client side, so image processing is faster and more responsive. That is, the server is responsible for training the first image synthesis model, and the trained student model is deployed to the client. The user can then complete image processing directly with the first image synthesis model stored locally on the client, which is simple, convenient and fast.
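As a hedged illustration of this server-side training / client-side deployment split (the TorchScript export format, function names and file path are assumptions, not part of the disclosure), the trained student model could be serialized on the server and loaded locally by the client roughly as follows:

```python
import torch

# --- server side: after the distillation training finishes ---
def export_student(student: torch.nn.Module, path: str = "student_model.pt") -> None:
    student.eval()
    scripted = torch.jit.script(student)  # or torch.jit.trace(...) with example inputs
    torch.jit.save(scripted, path)

# --- client side: lightweight local inference ---
def run_locally(path: str, model_image: torch.Tensor,
                garment_image: torch.Tensor) -> torch.Tensor:
    student = torch.jit.load(path, map_location="cpu")
    with torch.no_grad():
        return student(model_image, garment_image)  # the composite image
```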
Accordingly, an embodiment of the present application provides an image processing system. As shown in fig. 2, the image processing system includes a server 302 and a client 301. The server 302 is configured to train at least one first image synthesis model serving as a student model by using the training samples and at least one second image synthesis model serving as a teacher model, respectively, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model. The client 301 is configured to locally deploy at least part of the at least one first image synthesis model; acquire a first image and a second image in response to a user operation; determine a first image synthesis model; input the first image and the second image into the first image synthesis model; output a composite image; and display the composite image.
As shown in fig. 6, the first image synthesis model trained by the server may be deployed on the client. The client may be, but is not limited to, a desktop computer, a notebook computer, a mobile phone, a tablet computer, and the like. The client may request the first image synthesis model from the server, then obtain and load it. Alternatively, the server pushes the first image synthesis model to the client, and the client, after receiving it, automatically loads it locally or uses it to update the locally existing first image synthesis model.
Still alternatively, the image processing system may be as shown in fig. 10. The server 302 is configured to train at least one first image synthesis model serving as a student model by using training samples and at least one second image synthesis model serving as a teacher model, respectively, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model; it is further configured to receive a first image and a second image sent by a client, determine a first image synthesis model, input the first image and the second image into the first image synthesis model, output a composite image, and feed the composite image back to the client 301. The client 301 is configured to send the first image and the second image input by the user to the server 302 in response to a user operation, and to receive and display the composite image fed back by the server 302.
The technical solutions provided by the present application can also be implemented with the system architecture shown in fig. 11. The system corresponding to this architecture may be deployed on the server side, on a single server, a server cluster, or a virtual server or cloud running on a server; it may also be deployed on a computer on the client side. As shown in fig. 11, the image processing system includes a data layer, a processing layer and an application layer. The data layer is used for storing data in, and acquiring data from, a database; the database stores data that can serve as training samples. The processing layer is provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model, and is used for generating training samples from the data set acquired by the data layer, and for training the at least one first image synthesis model with the training samples and the at least one second image synthesis model, respectively, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model. Each of the models shown in fig. 11 (the second image synthesis model, the second model, the third model, ..., the N-th model) may serve as the teacher model in one knowledge distillation training task or as the student model in the next. The application layer is used for receiving the first image and the second image input by the user. The processing layer is further configured to synthesize the first image and the second image with at least part of the at least one first image synthesis model to obtain a composite image; for example, one first image synthesis model may be selected at a time to obtain a single composite image, or several first image synthesis models may be selected at a time to obtain several composite images (see the sketch below), allowing the user to pick the one that satisfies them most. How many first image synthesis models participate in image processing in a specific implementation may be decided by the user. The application layer is further used for sending the composite image to the client device corresponding to the user.
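The "one or several student models per request" behaviour described above could be exposed by the processing layer roughly as in the following sketch, under the assumption that each deployed student model is a callable taking an image pair:

```python
import torch

def synthesize(students: dict, first_image: torch.Tensor,
               second_image: torch.Tensor, chosen=None) -> dict:
    """Run one or several first image synthesis models on the same inputs.

    students: mapping from a model name to a deployed student network.
    chosen:   names selected for this request; None means run all of them,
              so the user can pick the most satisfactory composite image.
    """
    names = chosen if chosen is not None else list(students)
    results = {}
    with torch.no_grad():
        for name in names:
            results[name] = students[name](first_image, second_image)
    return results
```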
The following describes the technical solutions provided by the embodiments of the present application with reference to specific application scenarios. With these technical solutions, a merchandise-image synthesis service can be provided for merchants. For example, a merchant user may upload a model image (such as a previously taken photograph of a model wearing garment A) and a garment image (which may be a flat drawing of a garment) through a client, then click the "composite" control to obtain a composite image of the model wearing the garment. Because the first image synthesis model used to produce the composite image is trained with knowledge distillation, its performance and precision are high, and the composite image looks good. The merchant user not only wants to put the composite image on the e-commerce platform for promotion, but also wants to make an advertising poster for offline promotion. Because the first image synthesis model is trained on high-resolution sample maps, the composite image also has a high resolution, which meets the resolution requirements of advertising posters and yields a very good poster.
In fact, based on the technical solutions provided by the embodiments of the present application, a batch image synthesis service can also be provided for merchants. For example, suppose there are 10 new items this season: 3 suit the temperament and pose of model x, 4 suit model y, and 3 suit model z. In this case, the user may upload the model image of model x and the images of the 3 garments at one time and then click the "batch processing" control; the back end of the client device or of the server device processes the model image of model x with each of the 3 garment images in turn, yielding composite images of model x wearing each of the 3 garments (a loop of this kind is sketched below). The other items can be batch-processed in the same way, so the user does not need to repeat the operation many times, which simplifies operation and improves efficiency.
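On the back end, the batch service described in this paragraph reduces to iterating one synthesis call over the uploaded garment images, as in this sketch (names hypothetical):

```python
import torch

def batch_synthesize(student, model_image: torch.Tensor,
                     garment_images: list) -> list:
    """Compose one model image with each uploaded garment image in turn,
    e.g. model x with the 3 garments that suit her temperament and pose."""
    composites = []
    with torch.no_grad():
        for garment in garment_images:
            composites.append(student(model_image, garment))
    return composites
```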
Therefore, the technical solutions provided by the embodiments of the present application save the photographing cost of garment display, adapt more quickly to changes in garment styles, and allow the display effect to be updated in real time as styles change.
The above description is given only in conjunction with the model garment-changing scenario; the technical solutions provided by the embodiments of the present application may also be applied to other scenarios to serve users with different types of requirements, which are not enumerated here.
Fig. 12 is a schematic structural diagram illustrating an image processing apparatus according to an embodiment of the present application. As shown in fig. 12, the apparatus includes an acquisition module 11, a determining module 12, a calculation module 13 and a display module 14. The acquisition module 11 is configured to acquire a first image and a second image in response to a user operation. The determining module 12 is configured to determine a first image synthesis model, where the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model. The calculation module 13 is configured to input the first image and the second image into the first image synthesis model and output a composite image. The display module 14 is configured to display the composite image.
Further, the calculation module 13 includes a spatial transformation network module and an image synthesis network module. Correspondingly, when inputting the first image and the second image into the first image synthesis model and outputting a composite image, the calculation module 13 is specifically configured to:
inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing coordinate transformation on at least part of pixel points in the second image by referring to the first image; inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image; wherein the image synthesis network module shares at least part of the parameters with the second image synthesis model.
Further, when determining the first image synthesis model, the determining module 12 is specifically configured to:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training sample comprises a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to the graph content of the first sample graph;
inputting the first sample map, the at least one first feature map, the second sample map, and the at least one second feature map into the second image synthesis model to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
inputting the first output map, the third sample map, and the fourth sample map into the first image synthesis model, outputting a second output map for knowledge distillation;
optimizing the first image synthesis model based on the second output map and the first sample map.
Still further, the at least one first feature map includes at least one of: at least one segmentation map obtained by segmenting the first sample map with at least one segmentation method, and a feature point distribution map of the first sample map. The at least one second feature map includes at least one of: at least one segmentation map obtained by segmenting the second sample map with at least one segmentation method.
Further, the first image synthesis model includes, but is not limited to, a spatial transformation network module and an image synthesis network module. Accordingly, when the first output map, the third sample map and the fourth sample map are input into the first image synthesis model and the second output map for knowledge distillation is output, the determining module 12 is specifically configured to:
inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph based on the third sample graph;
and inputting the first output diagram and the third output diagram into the image synthesis network module to obtain a second output diagram for knowledge distillation.
Further, the apparatus provided in this embodiment also includes a data preparation module. The data preparation module is used for acquiring a data set; determining, based on the data set, an image set capable of serving as training samples; processing the images in the image set to improve their resolution, the processed images serving as sample maps in the training samples; and analyzing each processed image in the image set to obtain at least one feature map corresponding to that image.
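A possible shape for this data preparation module is sketched below; `sr_model` is a hypothetical super-resolution network, since the application does not name a specific one:

```python
def prepare_training_samples(data_set, sr_model, analyzers):
    """Sketch of the data preparation module described above.

    data_set:  iterable of raw images usable as training samples
    sr_model:  any super-resolution network that improves image resolution
               (hypothetical; the application does not name a specific one)
    analyzers: callables (segmentation / keypoint models), each producing
               one feature map for a given image
    """
    samples = []
    for image in data_set:
        hi_res = sr_model(image)                       # improve the resolution
        feature_maps = [a(hi_res) for a in analyzers]  # >= 1 feature map per image
        samples.append((hi_res, feature_maps))
    return samples
```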
Here, it should be noted that: the image processing apparatus provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, and is not described herein again.
Another embodiment of the present application provides an image processing apparatus whose structure is similar to that shown in fig. 12. The image processing apparatus includes an acquisition module, a determining module, a calculation module and a display module. The acquisition module is used for acquiring a first exhibit image and a model image in response to an image input operation of a user. The determining module is configured to determine a first image synthesis model, where the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model. The calculation module is used for inputting the first exhibit image and the model image into the first image synthesis model and outputting a composite image of the model displaying the first exhibit. The display module is used for displaying the composite image.
Further, when determining the first image synthesis model, the determining module is specifically configured to:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training samples comprise a first sample graph corresponding to a second exhibit displayed by the model, at least one first feature graph associated with the first sample graph, a second sample graph corresponding to a third exhibit, at least one second feature graph associated with the second sample graph, a third sample graph corresponding to the third exhibit displayed by the model, and a fourth sample graph corresponding to the second exhibit;
inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to obtain a first output graph corresponding to the third exhibit displayed by the model;
inputting the first output image, the third sample image and the fourth sample image into the first image synthesis model, and outputting a second output image, used for knowledge distillation, corresponding to the model displaying the second exhibit;
optimizing the first image synthesis model based on the second output map and the first sample map.
Further, the at least one first feature map comprises at least one of: at least one segmentation map obtained by segmenting the model in the first sample map with at least one segmentation method, and a feature point distribution map of the model in the first sample map. The at least one second feature map comprises at least one of: at least one segmentation map obtained by segmenting the third exhibit in the second sample map with at least one segmentation method.
Here, it should be noted that: the image processing apparatus provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, and is not described herein again.
Fig. 13 is a schematic structural diagram illustrating an apparatus for determining a first image synthesis model for image processing according to a further embodiment of the present application. As shown in fig. 13, the determining apparatus includes an acquisition module 21, a first calculation module 22, a second calculation module 23 and an optimization module 24. The acquisition module 21 is configured to acquire a second image synthesis model trained in advance, a first image synthesis model to be trained, and a training sample, wherein the training sample comprises a first sample map, at least one first feature map associated with the first sample map, a second sample map, at least one second feature map associated with the second sample map, a third sample map and a fourth sample map; the first sample map and the second sample map are synthesized to obtain a composite map related to the content of the third sample map; the fourth sample map and the third sample map are synthesized to obtain a composite map related to the content of the first sample map. The first calculation module 22 is configured to input the first sample map, the at least one first feature map, the second sample map and the at least one second feature map into the second image synthesis model, so as to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map. The second calculation module 23 is configured to input the first output map, the third sample map and the fourth sample map into the first image synthesis model and output a second output map for knowledge distillation. The optimization module 24 is configured to optimize the first image synthesis model according to the second output map and the first sample map. The optimized first image synthesis model is used for processing two input images to obtain a composite image.
Further, the first image synthesis model includes a spatial transformation network module and an image synthesis network module. Accordingly, when the first output map, the third sample map and the fourth sample map are input into the first image synthesis model and the second output map for knowledge distillation is output, the second calculation module 23 is specifically configured to:
inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph by referring to the third sample graph;
and inputting the first output diagram and the third output diagram into the image synthesis network module to obtain a second output diagram for knowledge distillation.
Here, it should be noted that: the determining apparatus of the first image synthesis model for image processing provided in the foregoing embodiment may implement the technical solutions described in the foregoing corresponding method embodiments, and the specific implementation principles of the modules or units may refer to the corresponding contents in the foregoing corresponding method embodiments, and are not described herein again.
Fig. 14 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device includes a processor 31 and a memory 33. The memory 33 is configured to store one or more computer instructions; the processor 31, coupled with the memory 33, is configured to execute the one or more computer instructions (e.g., computer instructions implementing data storage logic) to implement:
responding to the operation of a user, and acquiring a first image and a second image;
determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image;
and displaying the composite image.
Here, it should be noted that: the processor may also implement other method steps provided in the above data processing method embodiment besides the above steps, which may be referred to in detail in the above embodiments and will not be described herein again. The memory 33 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Further, as shown in fig. 14, the electronic device also includes a communication component 35, a power component 32, and a display 34. Only some of the components are schematically shown in fig. 14; this does not mean that the electronic device includes only the components shown.
Another embodiment of the present application provides an electronic device, a schematic structural diagram of which is shown in fig. 14. Specifically, the electronic device comprises a processor and a memory. The memory is configured to store one or more computer instructions; the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement:
responding to an image input operation of a user, and acquiring a first exhibit image and a model image;
determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first exhibit image and the model image into the first image synthesis model, and outputting a synthesis image of the model exhibiting the first exhibit;
and displaying the composite image.
Here, it should be noted that: the processor may also implement other method steps provided in the above data processing method embodiment besides the above steps, which may be referred to in detail in the above embodiments and will not be described herein again.
In another embodiment of the present application, an electronic device is provided, a schematic structural diagram of which is shown in fig. 14. Specifically, the electronic device comprises a processor and a memory. The memory is configured to store one or more computer instructions; the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training sample comprises a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to the graph content of the first sample graph;
inputting the first sample map, the at least one first feature map, the second sample map, and the at least one second feature map into the second image synthesis model to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
inputting the first output map, the third sample map, and the fourth sample map into the first image synthesis model, outputting a second output map for knowledge distillation;
knowledge distillation is performed using the second output map and the first sample map to optimize the first image synthesis model;
and taking the optimized first image synthesis model as the first image synthesis model, which is used for processing two input images to obtain a composite image.
Here, it should be noted that: the processor may also implement other method steps provided in the above data processing method embodiment besides the above steps, which may be referred to in detail in the above embodiments and will not be described herein again.
Yet another embodiment of the present application provides a computer program product (not illustrated in the drawings). The computer program product comprises computer programs or instructions which, when executed by a processor, cause the processor to carry out the steps in the above-described method embodiments.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the method steps or functions provided by the foregoing embodiments when executed by a computer.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. An image processing method, comprising:
responding to the operation of a user, and acquiring a first image and a second image;
determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first image and the second image into the trained first image synthesis model, and outputting a composite image;
and displaying the composite image.
2. The method of claim 1, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first image and the second image into the first image synthesis model and outputting a composite image comprises:
inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing coordinate transformation on at least part of pixel points in the second image by referring to the first image;
inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image;
wherein the image synthesis network module shares at least part of the parameters with the second image synthesis model.
3. The method of claim 1 or 2, wherein determining the first image synthesis model comprises:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training sample comprises a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to the graph content of the first sample graph;
inputting the first sample map, the at least one first feature map, the second sample map, and the at least one second feature map into the second image synthesis model to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
inputting the first output map, the third sample map, and the fourth sample map into the first image synthesis model, outputting a second output map for knowledge distillation;
optimizing the first image synthesis model based on the second output map and the first sample map.
4. The method of claim 3, wherein the at least one first feature map comprises at least one of: at least one segmentation map obtained by segmenting the first sample map with at least one segmentation method, and a feature point distribution map of the first sample map;
the at least one second feature map comprises at least one of: at least one segmentation map obtained by segmenting the second sample map with at least one segmentation method.
5. The method of claim 3, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first output map, the third sample map and the fourth sample map into the first image synthesis model and outputting a second output map for knowledge distillation comprises:
inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph based on the third sample graph;
and inputting the first output diagram and the third output diagram into the image synthesis network module to obtain a second output diagram for knowledge distillation.
6. The method of claim 3, further comprising:
acquiring a data set;
determining, based on the data set, an image set capable of serving as training samples;
processing the images in the image set to improve their resolution; wherein the processed images are used as sample maps in the training samples;
and analyzing each processed image in the image set to obtain at least one feature map corresponding to that image.
7. An image processing method, comprising:
responding to an image input operation of a user, and acquiring a first exhibit image and a model image;
determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first exhibit image and the model image into the first image synthesis model, and outputting a synthesis image of the model exhibiting the first exhibit;
and displaying the composite image.
8. The method of claim 7, wherein determining the first image synthesis model comprises:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training samples comprise a first sample graph corresponding to a second exhibit displayed by the model, at least one first feature graph associated with the first sample graph, a second sample graph corresponding to a third exhibit, at least one second feature graph associated with the second sample graph, a third sample graph corresponding to the third exhibit displayed by the model, and a fourth sample graph corresponding to the second exhibit;
inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to obtain a first output graph corresponding to the third exhibit displayed by the model;
inputting the first output image, the third sample image and the fourth sample image into the first image synthesis model, and outputting a second output image, used for knowledge distillation, corresponding to the model displaying the second exhibit;
optimizing the first image synthesis model based on the second output map and the first sample map.
9. The method of claim 8, wherein the at least one first feature map comprises at least one of: at least one segmentation map obtained by segmenting the model in the first sample map with at least one segmentation method, and a feature point distribution map of the model in the first sample map;
the at least one second feature map comprises at least one of: at least one segmentation map obtained by segmenting the third exhibit in the second sample map with at least one segmentation method.
10. A method for determining an image composition model for image processing, comprising:
acquiring a second image synthesis model which is trained in advance, a first image synthesis model to be trained and a training sample; wherein the training sample comprises a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to the graph content of the first sample graph;
inputting the first sample map, the at least one first feature map, the second sample map, and the at least one second feature map into the second image synthesis model to synthesize the first sample map and the second sample map with reference to the at least one first feature map and the at least one second feature map, thereby obtaining a first output map;
inputting the first output map, the third sample map, and the fourth sample map into the first image synthesis model, outputting a second output map for knowledge distillation;
optimizing the first image synthesis model based on the second output map and the first sample map;
and the optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
11. The method of claim 10, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first output map, the third sample map and the fourth sample map into the first image synthesis model and outputting a second output map for knowledge distillation comprises:
inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph by referring to the third sample graph;
and inputting the first output diagram and the third output diagram into the image synthesis network module to obtain a second output diagram for knowledge distillation.
12. An image processing system, comprising:
the data layer is used for storing data in, and acquiring data from, a database; the database stores data that can serve as training samples;
the processing layer is provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model and used for generating training samples according to the data set acquired by the data layer; respectively training the at least one first image synthesis model by using the training sample and the at least one second image synthesis model to obtain at least one first image synthesis model for learning the image synthesis result of the corresponding second image synthesis model;
the application layer is used for receiving a first image and a second image input by a user;
the processing layer is further configured to synthesize the first image and the second image by using at least a part of the at least one first image synthesis model to obtain a synthesized image;
the application layer is further configured to send the composite image to a client device corresponding to the user.
13. An image processing system, comprising:
the server is used for respectively training at least one first image synthesis model serving as a student model by using the training samples and at least one second image synthesis model serving as a teacher model to obtain at least one first image synthesis model for learning the image synthesis result of the corresponding second image synthesis model;
a client for locally deploying at least part of the at least one first image synthesis model; acquiring a first image and a second image in response to a user operation; determining a first image synthesis model; inputting the first image and the second image into the first image synthesis model and outputting a composite image; and displaying the composite image.
14. An electronic device comprising a processor and a memory, wherein,
the memory to store one or more computer instructions;
the processor, coupled to the memory, is configured to execute the one or more computer instructions to implement the steps in the method of any one of claims 1 to 6, or the steps in the method of any one of claims 7 to 9, or the steps in the method of claim 10 or 11.
CN202111162107.2A 2021-09-30 2021-09-30 Image processing method, image synthesis model determining method, system and equipment Pending CN114004772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162107.2A CN114004772A (en) 2021-09-30 2021-09-30 Image processing method, image synthesis model determining method, system and equipment

Publications (1)

Publication Number Publication Date
CN114004772A true CN114004772A (en) 2022-02-01

Family

ID=79922200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162107.2A Pending CN114004772A (en) 2021-09-30 2021-09-30 Image processing method, image synthesis model determining method, system and equipment

Country Status (1)

Country Link
CN (1) CN114004772A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471799A (en) * 2022-09-21 2022-12-13 首都师范大学 Vehicle re-identification method and system using attitude estimation and data enhancement
CN115471799B (en) * 2022-09-21 2024-04-30 首都师范大学 Vehicle re-recognition method and system enhanced by using attitude estimation and data
WO2024099004A1 (en) * 2022-11-09 2024-05-16 腾讯科技(深圳)有限公司 Image processing model training method and apparatus, and electronic device, computer-readable storage medium and computer program product


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220926

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou City, Zhejiang Province, 310052

Applicant after: Alibaba (China) Network Technology Co.,Ltd.

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Alibaba (China) Co.,Ltd.