CN115272632A - Virtual fitting method based on posture migration - Google Patents

Virtual fitting method based on posture migration

Info

Publication number
CN115272632A
Authority
CN
China
Prior art keywords
image
analysis
garment
fitting
clothing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210795212.8A
Other languages
Chinese (zh)
Other versions
CN115272632B (en)
Inventor
朱佳龙 (Zhu Jialong)
姜明华 (Jiang Minghua)
史衍康 (Shi Yankang)
陈子宜 (Chen Ziyi)
刘军 (Liu Jun)
余锋 (Yu Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Textile University
Priority to CN202210795212.8A
Publication of CN115272632A
Application granted
Publication of CN115272632B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0641 Shopping interfaces
    • G06Q 30/0643 Graphical representation of items or shoppers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a virtual fitting method based on posture migration, comprising the following steps: acquiring an original parse map and a try-on person image, extracting the garment pixel information from the try-on image, and performing texture restoration to obtain a fine garment image; inputting the original parse map and the target posture into a parsing guidance network to obtain a parsing guidance map; preliminarily limiting the warping range of the garment according to the parsing guidance map; acquiring the target posture and preprocessing it to obtain a parse map with the lower body removed, then obtaining a warped garment image through a garment warping network; and generating the try-on result in the target posture from the parsing guidance map, the warped garment image, the target posture and the try-on person image. By feeding the parsing guidance map, the warped target garment image, the target posture and the try-on person image into the neural network model simultaneously, the method obtains the try-on effect image in the target posture, improves the try-on effect, and avoids the confusion of skin and fabric pixels caused by changes in the fitting posture.

Description

Virtual fitting method based on posture migration
Technical Field
The invention belongs to the field of garment image processing, and particularly relates to a virtual fitting method based on posture migration.
Background
In recent years, shopping has shifted from offline to online, and online clothing shopping is favored by consumers. It has the drawback, however, that consumers cannot try garments on and therefore cannot judge how a garment would look on them. Virtual fitting lets sellers present the advantages of their garments more objectively, so that both parties to a transaction obtain information more intuitively; this facilitates transactions, reduces unnecessary work, improves efficiency, and meets user demand.
Existing virtual fitting methods that fuse virtual fitting with posture migration to achieve multi-posture try-on fall into two categories: 2D-image-based and 3D-reconstruction-based. Multi-posture fitting directly on 2D images has been studied less, and its results suffer from confusion between skin and fabric pixels and from loss of detail. 3D reconstruction gives better results, but its demands on computing power and on the performance and quality of the generated model are comparatively high, which hinders the popularization and application of the technology.
Chinese patent publication No. CN 108734787A discloses a picture-synthesis virtual fitting method based on multiple postures and part decomposition, which synthesizes pictures using multiple postures and part decomposition rather than simply synthesizing the whole clothing picture, and thus achieves a more realistic virtual fitting effect. That technique, however, does not consider the confusion between skin and fabric pixels and the loss of detail caused by posture transformation, which greatly degrade the fitting effect.
Disclosure of Invention
The invention aims to solve the above problems and provides a virtual fitting method based on posture migration. The method uses a parsing guidance map to limit the warping range of the target garment image, preventing the garment image from being over-warped as it follows the target posture. From the target posture and a parse map with the lower body removed, a garment warping network produces a target garment image warped to the target posture. The try-on person image, the warped target garment image, the target posture and the parsing guidance map are then input simultaneously into a try-on image generation network, which produces the try-on result in the target posture. This improves the try-on effect, avoids the confusion of skin and fabric pixels caused by changes in the try-on posture, and preserves more garment texture detail.
The technical scheme of the invention is a virtual fitting method based on posture migration, which comprises the following steps:
step 1, acquiring the original parse map and the try-on person image, extracting the garment pixel information from the try-on image to obtain a coarse garment image, and then performing texture restoration to obtain a fine garment image;
step 2, inputting the original parse map, the target garment and the target posture into a parsing guidance network to obtain a parsing guidance map;
step 3, preliminarily limiting the warping range of the target garment according to the parsing guidance map;
step 4, acquiring the target posture, preprocessing it to obtain a parse map with the lower body removed, and obtaining a warped target garment image through a garment warping network;
and step 5, generating the try-on result in the target posture through a try-on image generation network, according to the parsing guidance map, the warped target garment image, the target posture and the try-on person image.
Further, step 1 performs pixel-level restoration on the garment image. The restoration process is as follows:
the edge information features of the garment image are learned by a convolutional neural layer, with attention paid to regions where pixel values change sharply; interpolation is then used to repair the pixels in those regions, so that the garment edge is smooth and transitions naturally into the background.
Preferably, step 1 comprises the following sub-steps:
first, extracting the pixel information of the corresponding region in the try-on person image according to the garment semantic information in the original parse map, to obtain a coarse garment image whose edges may be blurred or have gaps;
then, filling pixels in the blurred and gap regions of the garment image by interpolation, to obtain a finer garment image.
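For illustration, the sketch below shows one way to implement this extract-and-repair step. The numeric parse label and the choice of OpenCV's Telea inpainting as the interpolation method are assumptions; the patent fixes neither.

```python
import cv2
import numpy as np

# Hypothetical label id of the upper-garment region in the parse map;
# the patent does not publish numeric label assignments.
UPPER_GARMENT_LABEL = 5

def extract_and_repair_garment(person_img: np.ndarray, parse_map: np.ndarray) -> np.ndarray:
    """person_img: HxWx3 uint8 try-on person image; parse_map: HxW uint8 labels.
    Returns a garment image whose blurred/gap edge regions were repaired."""
    mask = (parse_map == UPPER_GARMENT_LABEL).astype(np.uint8)
    coarse = person_img * mask[..., None]            # coarse garment image
    # Band around the garment edge, where pixel values change sharply.
    kernel = np.ones((5, 5), np.uint8)
    edge_band = cv2.dilate(mask, kernel) - cv2.erode(mask, kernel)
    # Interpolate pixels inside that band (Telea inpainting here), so the
    # garment edge becomes smooth and transitions naturally to background.
    fine = cv2.inpaint(coarse, edge_band, 3, cv2.INPAINT_TELEA)
    return fine
```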
Further, the specific process of step 2 is as follows:
first, the original parse map and the target posture are input into the parsing guidance network, and image features are extracted by its multi-layer convolutional network; a residual module and a wavelet sampling layer are added to the network to extract higher-level semantic structure, so that the network learns the detailed relationships among the parts of the human body. The wavelet sampling layer transforms the feature map into the frequency domain by wavelet transform for downsampling, which preserves texture information better;
then, the extracted image features are input into the multi-layer deconvolutional network of the parsing guidance network to upsample the image; a normalization layer is added between deconvolution layers to enhance the fusion of global and local features, and a normalization constraint loss function is introduced to keep more semantic detail during upsampling;
finally, the spatial positions in the generated parsing guidance map are compared with the target posture to ensure that every semantic part is aligned with its corresponding posture keypoint, the overlap between arms and garment is handled better, and the semantic positions are fine-tuned to obtain a more regular parsing guidance map.
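The wavelet sampling layer mentioned in the first sub-step can be sketched as a single-level Haar transform; the concrete wavelet basis is an assumption, since the patent only specifies frequency-domain downsampling by wavelet transform:

```python
import torch
import torch.nn as nn

class HaarWaveletDown(nn.Module):
    """Single-level Haar DWT: halves the spatial size and stacks the four
    frequency sub-bands (LL, LH, HL, HH) on the channel axis, so the
    high-frequency texture information survives the downsampling."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = x[:, :, 0::2, 0::2]
        b = x[:, :, 1::2, 0::2]
        c = x[:, :, 0::2, 1::2]
        d = x[:, :, 1::2, 1::2]
        ll = (a + b + c + d) / 2
        lh = (-a - b + c + d) / 2
        hl = (-a + b - c + d) / 2
        hh = (a - b - c + d) / 2
        return torch.cat([ll, lh, hl, hh], dim=1)   # (N, 4C, H/2, W/2)
```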
Preferably, the normalization constraint loss function is as follows:

L_nor = λg · L_glo(G, G') + λl · L_loc(L, L')

where L_nor denotes the normalization constraint loss function, G and G' denote the global features of the image before and after parsing, L and L' denote the local features of the image before and after parsing, L_glo denotes the global feature matching loss of the image before and after parsing, L_loc denotes the local feature matching loss of the image before and after parsing, and λg and λl are learning coefficients that adjust the relative importance of the global and local features.
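Under this reconstruction, the loss is a weighted sum of a global and a local feature-matching term. The sketch below assumes L1 feature matching, which the patent does not specify, and registers the learning coefficients as trainable parameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizationConstraintLoss(nn.Module):
    """L_nor = λg·L_glo(G, G') + λl·L_loc(L, L'); λg and λl are learning
    coefficients, so they are registered as trainable parameters here."""
    def __init__(self):
        super().__init__()
        self.lam_g = nn.Parameter(torch.tensor(1.0))
        self.lam_l = nn.Parameter(torch.tensor(1.0))

    def forward(self, G, G_prime, L, L_prime):
        loss_glo = F.l1_loss(G_prime, G)   # global feature matching term
        loss_loc = F.l1_loss(L_prime, L)   # local feature matching term
        return self.lam_g * loss_glo + self.lam_l * loss_loc
```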
The parsing guidance map contains semantic segmentation information, specifically: face, hair, neck, upper-garment region, left arm, right arm, left hand, right hand, left shoulder, right shoulder, lower-garment region.
Preferably, the target posture contains 18 keypoints, specifically: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear, left ear.
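For use in the later sketches, the semantic parts and the 18 keypoints can be written down as constants; the ordering and any numeric label ids are illustrative assumptions, not fixed by the patent:

```python
# Semantic parts of the parsing guidance map; index = assumed label id.
PARSE_LABELS = [
    "background", "face", "hair", "neck", "upper_garment_region",
    "left_arm", "right_arm", "left_hand", "right_hand",
    "left_shoulder", "right_shoulder", "lower_garment_region",
]

# The 18 target-posture keypoints, in the order listed above.
POSE_KEYPOINTS = [
    "nose", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip",
    "right_knee", "right_ankle", "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]
assert len(POSE_KEYPOINTS) == 18
```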
Further, the specific process of step 4 is as follows:
first, the lower-body semantic information is removed according to the differences in pixel values of the semantic information in the parsing guidance map, giving a parse map with the lower body removed;
then, the overall contour of the warped garment image is constrained by the lower-body-removed parse map and the target posture, which keeps the warping network from deforming the garment image by force and avoids over-warping the garment;
finally, the garment image is deformed by the warping network, with a planar deformation loss function introduced, to obtain the warped garment image.
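A minimal sketch of the lower-body removal, distinguishing semantics by pixel value as described; the lower-body label ids are assumptions tied to the illustrative PARSE_LABELS above:

```python
import numpy as np

# Assumed ids of lower-body semantics; in practice these follow whatever
# label map the parsing network actually uses.
LOWER_BODY_LABELS = np.array([11])   # "lower_garment_region" above

def remove_lower_body(parse_guide: np.ndarray) -> np.ndarray:
    """Zero out lower-body semantics in the parsing guidance map so that
    only the upper body constrains the garment warping network."""
    out = parse_guide.copy()
    out[np.isin(out, LOWER_BODY_LABELS)] = 0   # 0 = background
    return out
```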
Preferably, the planar deformation loss function is as follows:

L_flat = Σ_x Σ_y [ γ( |Cx(x+i, y) - Cx(x, y)| + |Cx(x-i, y) - Cx(x, y)| ) + δ( |Cy(x, y+j) - Cy(x, y)| + |Cy(x, y-j) - Cy(x, y)| ) ]

where Cx(x, y) and Cy(x, y) denote the x and y coordinates of the sampling parameters respectively, |Cx(x+i, y) - Cx(x, y)| denotes the Euclidean distance between two nodes, i and j are deformation variables, and γ and δ are deformation coefficients.
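A sketch of this loss over a sampling grid of shape (N, H, W, 2), such as the grid a warping network resamples the garment with; unit offsets i = j = 1 are assumed, and each one-sided difference covers both symmetric terms of the reconstructed sum:

```python
import torch

def planar_deformation_loss(grid: torch.Tensor, i: int = 1, j: int = 1,
                            gamma: float = 1.0, delta: float = 1.0) -> torch.Tensor:
    """grid: (N, H, W, 2) sampling coordinates; Cx = grid[..., 0],
    Cy = grid[..., 1]. Penalizes abrupt coordinate changes between
    neighbouring nodes so the garment is not over-warped."""
    cx, cy = grid[..., 0], grid[..., 1]
    dx = (cx[:, :, i:] - cx[:, :, :-i]).abs()   # |Cx(x+i, y) - Cx(x, y)|
    dy = (cy[:, j:, :] - cy[:, :-j, :]).abs()   # |Cy(x, y+j) - Cy(x, y)|
    return gamma * dx.sum() + delta * dy.sum()
```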
Further, in step 5, the try-on image generation network is an end-to-end network comprising a generator and a discriminator. The generator takes the parsing guidance map, the warped garment image and the try-on person image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result image from the pixel information of the warped garment image and the try-on person image. The discriminator, with a feature point matching loss function introduced, judges whether the coarse try-on result conforms to the target posture and extracts more arm-region features, continuously strengthening the details of the coarse result and improving image clarity.
Preferably, the feature point matching loss function is as follows:

L_point = Σ_{i=1..n} ( α·|Wi(x) - Mi(x)| + β·|Wi(y) - Mi(y)| )

where L_point denotes the feature point matching loss function, W denotes the human-body posture coordinate points in the coarse try-on result image, M denotes the coordinate points of the target posture, Wi(x) and Wi(y) denote the abscissa and ordinate of coordinate point i in the coarse try-on result image, Mi(x) and Mi(y) denote the abscissa and ordinate of coordinate point i in the target posture map, n denotes the total number of feature points, |Wi(x) - Mi(x)| denotes the Euclidean distance along the x axis between keypoints of the same body part, and α and β are adjustment coefficients with α + β = 1.
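A sketch under the reconstruction above, with keypoints given as (n, 2) tensors of (x, y) coordinates; reading α and β as the weights of the x-axis and y-axis errors is an interpretation of the α + β = 1 constraint:

```python
import torch

def feature_point_matching_loss(W: torch.Tensor, M: torch.Tensor,
                                alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """W: (n, 2) keypoints of the coarse try-on result; M: (n, 2) keypoints
    of the target posture; alpha + beta = 1."""
    assert abs(alpha + beta - 1.0) < 1e-6
    dx = (W[:, 0] - M[:, 0]).abs()   # per-keypoint x-axis distance
    dy = (W[:, 1] - M[:, 1]).abs()   # per-keypoint y-axis distance
    return (alpha * dx + beta * dy).sum()
```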
Compared with the prior art, the invention has the following beneficial effects:
(1) The parsing guidance map containing semantic segmentation information, the target garment image warped to the target posture, the target posture and the try-on person image are input simultaneously into the try-on image generation network, which produces the try-on effect image in the target posture. This greatly improves the try-on effect, avoids the confusion of skin and fabric pixels caused by posture changes, keeps more garment texture detail in the result, and improves the virtual fitting experience.
(2) The parsing guidance map containing semantic segmentation information limits the warping range of the target garment image, preventing it from being over-warped as it follows the target posture and making the virtual fitting effect more realistic.
(3) The garment image is obtained from the try-on person image, and texture refinement of the blurred and gap regions yields a finer garment image. This alleviates the lack of garment images in the training dataset, aids the training of the try-on image generation network, the parsing guidance network and the garment warping network, and enhances the robustness of the fitting method.
(4) In the parsing guidance process that produces the parsing guidance map, a normalization layer and a normalization constraint loss function are introduced, enhancing the fusion of global and local features while keeping more semantic detail during upsampling.
(5) A feature point matching loss function is introduced into the try-on image generation network to judge whether the preliminary try-on result conforms to the target posture, which effectively avoids cross-occlusion between arms and garment and further improves the virtual fitting effect.
Drawings
The invention is further illustrated by the following figures and examples.
Fig. 1 is a schematic flow chart of a virtual fitting method according to an embodiment of the present invention.
Fig. 2 is a structure diagram of the parsing guidance network of the virtual fitting method according to the embodiment of the present invention.
Fig. 3 is a structure diagram of the garment warping network of the virtual fitting method according to the embodiment of the present invention.
Fig. 4 is a structure diagram of the try-on image generation network of the virtual fitting method according to the embodiment of the present invention.
Fig. 5 is a schematic view of a virtual fitting system according to an embodiment of the present invention.
Detailed Description
Example one
As shown in fig. 1, the virtual fitting method based on posture migration includes the following steps:
(1) Acquiring the original parse map and the try-on person image, extracting the garment pixel information from the try-on image to obtain a coarse garment image, and then performing texture restoration to obtain a fine garment image;
The garment image acquisition process is as follows: first, according to the garment semantic information in the original parse map, the pixel information of the corresponding region in the try-on person image is extracted to obtain a coarse garment image whose edges are blurred and have gaps. Then texture restoration is performed on the garment image: the blurred and gap regions of the coarse garment image are filled by interpolation to obtain a finer garment image.
The original parse map contains semantic information for each part of the try-on person, comprising: face, hair, neck, upper-garment region, left arm, right arm, left hand, right hand, left shoulder, right shoulder, lower-garment region.
The texture restoration learns the edge information features of the garment image through a convolutional neural network, pays attention to regions where pixel values change sharply, and repairs the pixels in those regions by interpolation, so that the garment edge is smooth and transitions naturally into the background.
(2) Acquiring the original parse map and the target posture, and inputting them into the parsing guidance network to obtain the parsing guidance map;
The parsing guidance map shows the semantic segmentation information after the posture of the try-on person has changed, including the face, hair, neck, upper garment, arms and lower garment.
The target posture consists of 18 keypoints: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear and left ear.
As shown in fig. 2, the parsing guidance network consists of a multi-layer convolutional network and a multi-layer deconvolutional network; its inputs are the original parse map, the target garment and the target posture, and its output is the parsing guidance map.
The parsing guidance process is as follows: first, the original parse map and the target posture are input, and image features are extracted by the multi-layer convolutional network; a residual module and a wavelet sampling layer are added to the parsing guidance network to extract higher-level semantic structure, so that the network learns the detailed relationships among the parts of the human body, the wavelet sampling layer transforming the feature map into the frequency domain by wavelet transform for downsampling, which preserves texture information better. Then the extracted image features are input into the multi-layer deconvolutional network to upsample the image; a normalization layer is added between deconvolution layers to enhance the fusion of global and local features, and a normalization constraint loss function is introduced to keep more semantic detail during upsampling. Finally, the spatial positions in the generated parsing guidance map are compared with the target posture to ensure that every semantic part is aligned with its corresponding posture keypoint, the overlap between arms and garment is handled better, and the semantic positions are fine-tuned to obtain a more regular parsing guidance map.
In the normalization layer, the features obtained by the previous deconvolution layer are regarded as local features and the features obtained by the next deconvolution layer as global features, and the normalization constraint loss function controls the influence of the current local and global features on the subsequent fusion result.
Wherein the normalization constraint loss function is expressed as:

L_nor = λg · L_glo(G, G') + λl · L_loc(L, L')

where L_nor denotes the normalization constraint loss function, G and G' denote the global features of the image before and after parsing, L and L' denote the local features of the image before and after parsing, L_glo denotes the global feature matching loss of the image before and after parsing, L_loc denotes the local feature matching loss of the image before and after parsing, and λg and λl are learning coefficients that adjust the relative importance of the global and local features.
(3) Preliminarily limiting the warping range of the garment according to the parsing guidance map;
(4) Acquiring the target posture, preprocessing it to obtain a parse map with the lower body removed, and obtaining the warped garment image through the garment warping network shown in fig. 3;
The specific process of obtaining the warped garment image is as follows: first, the lower-body semantic information is removed according to the differences in pixel values of the semantic information in the parsing guidance map, giving a parse map with the lower body removed; then the overall contour of the warped garment image is constrained by the lower-body-removed parse map and the target posture, which keeps the warping network from deforming the garment image by force and avoids over-warping the garment; finally, the garment image is deformed by the warping network, with a planar deformation loss function introduced, to obtain the warped garment image.
Wherein the planar deformation loss function is expressed as:

L_flat = Σ_x Σ_y [ γ( |Cx(x+i, y) - Cx(x, y)| + |Cx(x-i, y) - Cx(x, y)| ) + δ( |Cy(x, y+j) - Cy(x, y)| + |Cy(x, y-j) - Cy(x, y)| ) ]

where Cx(x, y) and Cy(x, y) denote the x and y coordinates of the sampling parameters respectively, |Cx(x+i, y) - Cx(x, y)| denotes the Euclidean distance between two nodes, i and j are deformation variables, and γ and δ are deformation coefficients.
(5) Generating the try-on result in the target posture through the try-on image generation network, from the parsing guidance map, the warped garment image, the target posture and the try-on person image;
As shown in fig. 4, the try-on image generation network is an end-to-end network composed of a generator and a discriminator, the generator consisting of an encoder and a decoder. The generator takes the parsing guidance map, the warped garment image and the try-on person image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result image from the pixel information of the warped garment image and the try-on person image. The human-body posture map of the coarse result is then obtained, and the discriminator, with a feature point matching loss function introduced, judges whether the coarse result conforms to the target posture and is encouraged to extract more arm-region features, continuously strengthening the details of the coarse result and improving image clarity.
Wherein the feature point matching loss function is as follows:

L_point = Σ_{i=1..n} ( α·|Wi(x) - Mi(x)| + β·|Wi(y) - Mi(y)| )

where L_point denotes the feature point matching loss function, W denotes the human-body posture coordinate points in the coarse try-on result image, M denotes the coordinate points of the target posture, Wi(x) and Wi(y) denote the abscissa and ordinate of coordinate point i in the coarse try-on result image, Mi(x) and Mi(y) denote the abscissa and ordinate of coordinate point i in the target posture map, n denotes the total number of feature points, |Wi(x) - Mi(x)| denotes the Euclidean distance along the x axis between keypoints of the same body part, and α and β are adjustment coefficients with α + β = 1.
Example two
As shown in fig. 5, the virtual fitting system for posture migration comprises a parsing guidance module, a garment matching module and an image fusion module.
The parsing guidance module first performs pixel extraction and texture restoration according to the original parse map, the try-on person image and the target posture, and then generates the parsing guidance map through the parsing guidance network;
the garment matching module obtains the warped garment image through the garment warping network, according to the parsing guidance map, the target posture and the parse map with the lower body removed;
and the image fusion module generates the try-on result in the target posture through the try-on image generation network, according to the parsing guidance map, the warped garment image, the target posture and the try-on person image.
As shown in fig. 2, the input of the parsing guidance network is the original parse map and the target posture map, and the output is the parsing guidance map, i.e. the parse map after posture migration. The original parse map and the target posture map are each processed by 5 sequentially connected residual blocks; each residual block extracts features with 3×3 convolutions, adjacent residual blocks are connected by a wavelet layer, and the wavelet layer downsamples the feature map in the frequency-domain space. The residual block at the end is followed by a normalization layer, which enhances the feature fusion between global and local features; a normalization constraint loss function is introduced to keep more semantic detail during upsampling. After normalization, the data pass through 5 sequentially connected deconvolution layers; adjacent deconvolution layers are connected by an inverse wavelet layer used for upsampling, and the final deconvolution layer outputs the parsing guidance map.
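A structural sketch of this pipeline is given below. nn.PixelUnshuffle/nn.PixelShuffle stand in for the wavelet and inverse-wavelet layers (they reproduce the shape behaviour; the Haar sketch in step 2 gives a faithful frequency-domain version), the two inputs are concatenated rather than given separate residual branches for brevity, and all channel widths are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block extracting features with 3x3 convolutions."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ParsingGuidanceNet(nn.Module):
    """5 residual blocks joined by downsampling layers, a normalization
    layer, then 5 deconvolution layers joined by upsampling layers.
    Input spatial size must be divisible by 32 (five halvings)."""
    def __init__(self, in_ch: int = 1 + 18, base: int = 16, n_classes: int = 12):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, base, 3, padding=1)
        enc, ch = [], base
        for _ in range(5):
            enc += [ResBlock(ch),
                    nn.PixelUnshuffle(2),            # wavelet-layer stand-in
                    nn.Conv2d(ch * 4, ch * 2, 1)]    # fuse the four sub-bands
            ch *= 2
        self.encoder = nn.Sequential(*enc)
        self.norm = nn.InstanceNorm2d(ch)            # normalization layer
        dec = []
        for _ in range(5):
            dec += [nn.PixelShuffle(2),              # inverse-wavelet stand-in
                    nn.ConvTranspose2d(ch // 4, ch // 2, 3, padding=1),
                    nn.ReLU(inplace=True)]
            ch //= 2
        self.decoder = nn.Sequential(*dec)
        self.head = nn.Conv2d(ch, n_classes, 3, padding=1)

    def forward(self, parse_map, pose_map):
        # parse_map: (N, 1, H, W); pose_map: (N, 18, H, W) keypoint heatmaps
        x = self.stem(torch.cat([parse_map, pose_map], dim=1))
        return self.head(self.decoder(self.norm(self.encoder(x))))
```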
As shown in fig. 3, the input of the garment warping network is the parsing guidance map and the garment image, and the output is the warped garment image. First, the parsing guidance map and the garment image are encoded separately by encoders that extract their respective image features; then a deformation coefficient θ is computed from the two sets of features, and the overall contour of the warped garment image is constrained by the parsing guidance map and the target posture, which keeps the warping network from deforming the garment image by force and avoids over-warping the garment; finally, a warping operation with a planar deformation loss function introduced deforms the garment image to obtain the warped garment image.
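A structural sketch of such a warping module: two encoders, a correlation-style feature fusion, a regressed deformation coefficient theta, and grid resampling. Reading theta as an affine transform is an assumption made for brevity; garment warping systems commonly regress TPS parameters instead. The returned grid is what the planar deformation loss above penalizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def encoder(in_ch: int) -> nn.Sequential:
    """Small strided-convolution feature encoder."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True))

class GarmentWarpNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_parse = encoder(1)    # parsing guidance map branch
        self.enc_cloth = encoder(3)    # garment image branch
        self.regress = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 6))
        # Start from the identity transform so early training cannot
        # deform the garment by force.
        self.regress[-1].weight.data.zero_()
        self.regress[-1].bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, parse_guide, cloth):
        fp = self.enc_parse(parse_guide)
        fc = self.enc_cloth(cloth)
        fused = fp * fc                               # correlation-style fusion
        theta = self.regress(fused).view(-1, 2, 3)    # deformation coefficients
        grid = F.affine_grid(theta, list(cloth.shape), align_corners=False)
        warped = F.grid_sample(cloth, grid, align_corners=False)
        return warped, grid    # grid feeds planar_deformation_loss(...)
```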
As shown in fig. 4, the input of the try-on image generation network is the parsing guidance map, the warped garment image and the try-on person image, and the output is the try-on image. The network is end-to-end and comprises a generator and a discriminator, the generator consisting of an encoder and a decoder. The generator takes the parsing guidance map, the warped garment image and the try-on person image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result image from the warped garment image and the try-on person image; the discriminator, with a feature point matching loss function introduced, judges whether the coarse result conforms to the target posture and extracts more arm-region features, strengthening the details of the coarse result and improving image clarity.
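A compact sketch of such a generator/discriminator pair; layer counts and channel widths are assumptions, and the discriminator is conditioned on an 18-channel pose map so that "real" means "consistent with the target posture":

```python
import torch
import torch.nn as nn

class TryOnGenerator(nn.Module):
    """Encoder-decoder generator: parsing guidance map (1 ch), warped
    garment (3 ch) and try-on person image (3 ch) are concatenated and
    decoded into the coarse try-on result."""
    def __init__(self, in_ch: int = 1 + 3 + 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, parse_guide, warped_cloth, person):
        x = torch.cat([parse_guide, warped_cloth, person], dim=1)
        return self.decoder(self.encoder(x))

class PoseConditionedDiscriminator(nn.Module):
    """PatchGAN-style discriminator that sees the coarse result together
    with the target-pose heatmaps, so it judges posture conformity."""
    def __init__(self, in_ch: int = 3 + 18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, 1, 1))

    def forward(self, img, pose_map):
        return self.net(torch.cat([img, pose_map], dim=1))
```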
The virtual fitting system for posture migration adopts the same virtual fitting method as Example one.
The implementation results show that the method not only achieves higher semantic segmentation precision but also increases the robustness of garment deformation; the try-on result image retains more detail, the virtual fitting effect on high-resolution 2D images is greatly improved, and the fitting effect and user experience are improved.

Claims (9)

1. A virtual fitting method based on posture migration, characterized by comprising the following steps:
step 1, acquiring the original parse map and the try-on person image, extracting the garment pixel information from the try-on image to obtain a coarse garment image, and then performing texture restoration to obtain a fine garment image;
step 2, inputting the original parse map, the target garment and the target posture into a parsing guidance network to obtain a parsing guidance map;
step 3, preliminarily limiting the warping range of the target garment according to the parsing guidance map;
step 4, acquiring the target posture, preprocessing it to obtain a parse map with the lower body removed, and obtaining a warped garment image through a garment warping network;
and step 5, generating the try-on result in the target posture through a try-on image generation network, according to the parsing guidance map, the warped target garment image, the target posture and the try-on person image.
2. The virtual fitting method according to claim 1, wherein step 1 performs pixel-level restoration on the garment image, the restoration process comprising: learning the edge information features of the garment image through a convolutional neural layer, paying attention to regions where pixel values change sharply; and then repairing the pixels in those regions by interpolation, so that the garment edge is smooth and transitions naturally into the background.
3. The virtual fitting method according to claim 2, wherein step 1 comprises the following sub-steps:
first, extracting the pixel information of the corresponding region in the try-on person image according to the garment semantic information in the original parse map, to obtain a coarse garment image whose edges may be blurred or have gaps;
and then performing texture restoration on the garment image, filling pixels in the blurred and gap regions of the coarse garment image by interpolation to obtain a finer garment image.
4. The virtual fitting method according to claim 3, wherein the specific process of step 2 is as follows:
first, the original parse map and the target posture are input into the parsing guidance network, and image features are extracted by its multi-layer convolutional network; a residual module and a wavelet sampling layer are added to the network to extract higher-level semantic structure, so that the network learns the detailed relationships among the parts of the human body, the wavelet sampling layer transforming the feature map into the frequency domain by wavelet transform for downsampling, which preserves texture information better;
then, the extracted image features are input into the multi-layer deconvolutional network of the parsing guidance network to upsample the image; a normalization layer is added between deconvolution layers to enhance the fusion of global and local features, and a normalization constraint loss function is introduced to keep more semantic detail during upsampling;
finally, the spatial positions in the generated parsing guidance map are compared with the target posture to ensure that every semantic part is aligned with its corresponding posture keypoint, the overlap between arms and garment is handled better, and the semantic positions are fine-tuned to obtain a more regular parsing guidance map.
5. The virtual fitting method according to claim 4, wherein the normalization constraint loss function is as follows:

L_nor = λg · L_glo(G, G') + λl · L_loc(L, L')

where L_nor denotes the normalization constraint loss function, G and G' denote the global features of the image before and after parsing, L and L' denote the local features of the image before and after parsing, L_glo denotes the global feature matching loss of the image before and after parsing, L_loc denotes the local feature matching loss of the image before and after parsing, and λg and λl are learning coefficients.
6. The virtual fitting method according to claim 4, wherein the specific process of step 4 is as follows:
first, the lower-body semantic information is removed according to the differences in pixel values of the semantic information in the parsing guidance map, giving a parse map with the lower body removed;
then, the overall contour of the warped garment image is constrained by the lower-body-removed parse map and the target posture, which keeps the warping network from deforming the garment image by force and avoids over-warping the garment;
finally, the garment image is deformed by the warping network, with a planar deformation loss function introduced, to obtain the warped garment image.
7. The virtual fitting method according to claim 6, wherein the planar deformation loss function is as follows:
L_flat = Σ_x Σ_y [ γ( |Cx(x+i, y) - Cx(x, y)| + |Cx(x-i, y) - Cx(x, y)| ) + δ( |Cy(x, y+j) - Cy(x, y)| + |Cy(x, y-j) - Cy(x, y)| ) ]

where Cx(x, y) and Cy(x, y) denote the x and y coordinates of the sampling parameters respectively, |Cx(x+i, y) - Cx(x, y)| denotes the Euclidean distance between two nodes, i and j are deformation variables, and γ and δ are deformation coefficients.
8. The virtual fitting method according to claim 6, wherein in step 5 the try-on image generation network comprises a generator and a discriminator; the generator takes the parsing guidance map, the warped garment image and the try-on person image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result image from the pixel information of the warped garment image and the try-on person image; the discriminator, with a feature point matching loss function introduced, then judges whether the coarse try-on result conforms to the target posture and extracts more arm-region features, strengthening the details of the coarse result and improving image clarity.
9. The virtual fitting method according to claim 8, wherein the feature point matching loss function is as follows:
L_point = Σ_{i=1..n} ( α·|Wi(x) - Mi(x)| + β·|Wi(y) - Mi(y)| )

where L_point denotes the feature point matching loss function, W denotes the human-body posture coordinate points in the coarse try-on result image, M denotes the coordinate points of the target posture, Wi(x) and Wi(y) denote the abscissa and ordinate of posture coordinate point i in the coarse try-on result image, Mi(x) and Mi(y) denote the abscissa and ordinate of coordinate point i in the target posture map, n denotes the total number of feature points, |Wi(x) - Mi(x)| denotes the Euclidean distance along the x axis between keypoints of the same body part, and α and β are adjustment coefficients with α + β = 1.
CN202210795212.8A 2022-07-07 2022-07-07 Virtual fitting method based on gesture migration Active CN115272632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210795212.8A CN115272632B (en) 2022-07-07 2022-07-07 Virtual fitting method based on gesture migration


Publications (2)

Publication Number Publication Date
CN115272632A (en) 2022-11-01
CN115272632B CN115272632B (en) 2023-07-18

Family

ID=83764879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210795212.8A Active CN115272632B (en) 2022-07-07 2022-07-07 Virtual fitting method based on gesture migration

Country Status (1)

Country Link
CN (1) CN115272632B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120287122A1 (en) * 2011-05-09 2012-11-15 Telibrahma Convergent Communications Pvt. Ltd. Virtual apparel fitting system and method
US20160300393A1 (en) * 2014-02-27 2016-10-13 Yasuo Kinoshita Virtual trial-fitting system, virtual trial-fitting program, virtual trial-fitting method, and storage medium in which virtual fitting program is stored
CN110211196A (en) * 2019-05-28 2019-09-06 山东大学 A kind of virtually trying method and device based on posture guidance
CN113297944A (en) * 2020-12-28 2021-08-24 武汉纺织大学 Human body posture transformation method and system for virtual fitting of clothes
CN113052980A (en) * 2021-04-27 2021-06-29 云南大学 Virtual fitting method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈亚东 等 (Chen Yadong et al.): "针对多姿态迁移的虚拟试衣算法研究" (Research on a virtual try-on algorithm for multi-posture migration) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206332A (en) * 2023-01-31 2023-06-02 北京数美时代科技有限公司 Pedestrian re-recognition method, system and storage medium based on attitude estimation
CN116206332B (en) * 2023-01-31 2023-08-08 北京数美时代科技有限公司 Pedestrian re-recognition method, system and storage medium based on attitude estimation
CN116824002A (en) * 2023-06-19 2023-09-29 深圳市毫准科技有限公司 AI clothing try-on result output method based on fake model and related equipment
CN116824002B (en) * 2023-06-19 2024-02-20 深圳市毫准科技有限公司 AI clothing try-on result output method based on fake model and related equipment

Also Published As

Publication number Publication date
CN115272632B (en) 2023-07-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant