CN115272632B - Virtual fitting method based on gesture migration - Google Patents
Virtual fitting method based on gesture migration
- Publication number
- CN115272632B (application CN202210795212.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- clothing
- analysis
- network
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 230000005012 migration Effects 0.000 title claims abstract description 8
- 238000013508 migration Methods 0.000 title claims abstract description 8
- 238000010586 diagram Methods 0.000 claims abstract description 7
- 230000009466 transformation Effects 0.000 claims abstract description 4
- 230000006870 function Effects 0.000 claims description 37
- 230000008569 process Effects 0.000 claims description 15
- 238000005070 sampling Methods 0.000 claims description 15
- 238000010606 normalization Methods 0.000 claims description 14
- 230000004927 fusion Effects 0.000 claims description 9
- 230000008439 repair process Effects 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 3
- 230000007704 transition Effects 0.000 claims description 3
- 230000002708 enhancing effect Effects 0.000 claims description 2
- 230000001537 neural effect Effects 0.000 claims description 2
- 238000007781 pre-processing Methods 0.000 claims 1
- 230000000694 effects Effects 0.000 abstract description 14
- 239000004744 fabric Substances 0.000 abstract description 5
- 238000003062 neural network model Methods 0.000 abstract 1
- 230000011218 segmentation Effects 0.000 description 5
- 210000003423 ankle Anatomy 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 210000003127 knee Anatomy 0.000 description 4
- 210000000707 wrist Anatomy 0.000 description 4
- 238000012546 transfer Methods 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000005728 strengthening Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/04—Indexing scheme for image data processing or generation, in general involving 3D image data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Architecture (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a virtual fitting method based on pose transfer, comprising: obtaining an original parsing map and a wearer image; first extracting the clothing pixel information from the wearer image and then performing texture restoration to obtain a refined clothing image; inputting the original parsing map and the target pose into a parsing guidance network to obtain a parsing guidance map; preliminarily limiting the warping range of the clothing according to the parsing guidance map; obtaining the target pose and preprocessing it to obtain a parsing map with the lower body removed, and obtaining the warped clothing image through a clothing warping network; and generating the try-on result in the target pose from the parsing guidance map, the warped clothing image, the target pose and the wearer image. By feeding the parsing guidance map, the warped target clothing image, the target pose and the wearer image simultaneously into a neural network model, the invention obtains a fitting effect image in the target pose, improves the fitting effect, and solves the problem of skin and cloth pixels becoming confused when the try-on pose changes.
Description
Technical Field
The invention belongs to the field of clothing image processing, and in particular relates to a virtual fitting method based on pose transfer.
Background Art
In recent years, as shopping has shifted from offline to online, online clothing shopping has been favored by consumers. However, garments bought online cannot be tried on, so consumers cannot see how the clothing looks on themselves. Virtual fitting allows sellers to present the advantages of their garments more objectively, lets both parties to a transaction grasp the relevant information more intuitively, facilitates transactions, reduces unnecessary work, improves efficiency, and meets user needs.
Existing techniques that combine virtual fitting with pose transfer to achieve multi-pose virtual try-on fall mainly into two categories: methods based on 2D images and methods based on 3D reconstruction. Multi-pose fitting techniques that work directly on 2D images are still few, and their results suffer from confusion between skin and cloth pixels and from loss of detail. Methods based on 3D reconstruction give better results, but they place relatively high demands on device computing power, performance and the quality of the generated models, which hinders the promotion and popularization of the technology.
Chinese patent publication CN 108734787A discloses "a picture-synthesis virtual fitting method based on multi-pose and part decomposition", which synthesizes the result by decomposing poses and body parts instead of simply compositing the whole clothing picture, and therefore achieves a more realistic virtual fitting effect. However, that technique does not address the confusion between skin and cloth pixels or the loss of detail caused by pose changes, which greatly degrades the fitting effect.
Summary of the Invention
The object of the present invention is to address the above problems by providing a virtual fitting method based on pose transfer. A parsing guidance map is used to limit the warping range of the target clothing image, preventing it from being excessively distorted as it follows the target pose. From the target pose and a parsing map with the lower body removed, a clothing warping network produces the target clothing image warped to the target pose. The wearer image, the warped target clothing image, the target pose and the parsing guidance map are then fed simultaneously into a try-on image generation network, which produces the try-on result in the target pose. This improves the fitting effect, avoids the confusion of skin and cloth pixels caused by changing the try-on pose, and preserves more clothing texture detail.
The technical solution of the present invention is a virtual fitting method based on pose transfer, comprising the following steps:
Step 1: obtain the original parsing map and the wearer image; first extract the clothing pixel information from the wearer image to obtain a rough clothing image, then perform texture restoration to obtain a refined clothing image.
Step 2: input the original parsing map, the target clothing and the target pose into the parsing guidance network to obtain the parsing guidance map.
Step 3: preliminarily limit the warping range of the target clothing according to the parsing guidance map.
Step 4: obtain the target pose and preprocess it to obtain a parsing map with the lower body removed, then obtain the warped target clothing image through the clothing warping network.
Step 5: generate the try-on result in the target pose through the image generation network from the parsing guidance map, the warped target clothing image, the target pose and the wearer image.
Further, step 1 performs pixel-level restoration on the clothing image. The restoration process is as follows:
Convolutional neural layers first learn the edge information of the clothing image, focusing on regions where pixel values change sharply; interpolation is then used to repair the pixels in those regions, ensuring that the clothing edges are smooth and transition naturally into the background.
Preferably, step 1 includes the following sub-steps:
First, according to the clothing semantic information in the original parsing map, the pixel information of the corresponding region of the wearer image is extracted to obtain a preliminary clothing image; this image may have blurred edges or gaps along the edges.
Then, interpolation is used to fill the blurred and missing regions of the clothing image, yielding a more refined clothing image, as sketched below.
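A minimal sketch of this extract-and-repair step is given below. It assumes the parsing map is a label image in which the upper-garment region carries a known label value; that label constant and the use of OpenCV inpainting as the interpolation step are illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np

UPPER_CLOTHES_LABEL = 5  # assumed label value of the upper-garment region in the parsing map

def extract_and_repair_clothing(wearer_img, parsing_map):
    """Extract clothing pixels via the parsing map, then repair blurred/missing edge regions."""
    # 1. Mask out everything except the upper-garment region.
    mask = (parsing_map == UPPER_CLOTHES_LABEL).astype(np.uint8)
    rough_clothing = wearer_img * mask[..., None]

    # 2. Locate edge regions where pixel values change sharply (candidate blur/gap areas).
    edges = cv2.Canny(cv2.cvtColor(rough_clothing, cv2.COLOR_BGR2GRAY), 50, 150)
    repair_mask = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)

    # 3. Interpolate pixels inside the repair mask so the clothing edges are smooth
    #    and transition naturally into the background.
    refined_clothing = cv2.inpaint(rough_clothing, repair_mask, 3, cv2.INPAINT_TELEA)
    return refined_clothing
```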
Further, the specific process of step 2 is as follows:
First, the original parsing map and the target pose are input into the parsing guidance network, whose multi-layer convolutional network extracts image features. A residual module and a wavelet sampling layer are added to the parsing guidance network to extract higher-level semantic structure, so that the network learns the detailed relationships between the parts of the human body; the wavelet sampling layer converts feature maps into the frequency domain via the wavelet transform for downsampling, which better preserves texture information.
Then, the extracted image features are fed into the network's multi-layer deconvolutional network to upsample the image, and a normalization layer is inserted between the deconvolutions to strengthen the fusion of global and local features; a normalization constraint loss function is introduced so that more semantic detail is preserved during upsampling.
Finally, the spatial positions in the generated parsing guidance map are compared with the target pose to ensure that each semantic part aligns with its corresponding pose keypoints, which better handles overlaps between the arms and the clothing; the semantic positions are then fine-tuned to obtain a more regular parsing guidance map.
Preferably, the normalization constraint loss function is defined as follows:
In the formula, G denotes the global features of the image before parsing, G′ the global features of the parsed image, L the local features of the image before parsing, and L′ the local features of the parsed image; the loss combines a global feature matching term between the images before and after parsing with a local feature matching term, and two learning coefficients adjust the relative importance of the global and local features.
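The formula itself appears as an image in the original and is not reproduced in this text. From the variable definitions above, a plausible form (a reconstruction, with λ₁ and λ₂ standing in for the unnamed learning coefficients) is:

$$\mathcal{L}_{\mathrm{norm}} = \lambda_{1}\,\mathcal{L}_{G}(G, G') + \lambda_{2}\,\mathcal{L}_{L}(L, L')$$

where $\mathcal{L}_{G}$ and $\mathcal{L}_{L}$ are the global and local feature matching losses between the image before and after parsing.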
The parsing guidance map contains semantic segmentation information, specifically: face, hair, neck, upper-garment region, left arm, right arm, left hand, right hand, left shoulder, right shoulder, and lower-garment region.
Preferably, the target pose contains 18 keypoints: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear, and left ear.
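The patent lists the 18 keypoints but does not state how the target pose is fed to the networks; a common encoding (an assumption here, not part of the patent text) is one Gaussian heatmap channel per keypoint, as sketched below.

```python
import numpy as np

KEYPOINTS = ["nose", "neck", "r_shoulder", "r_elbow", "r_wrist",
             "l_shoulder", "l_elbow", "l_wrist", "r_hip", "r_knee",
             "r_ankle", "l_hip", "l_knee", "l_ankle", "r_eye",
             "l_eye", "r_ear", "l_ear"]  # order follows the list in the text

def pose_to_heatmaps(keypoints_xy, height, width, sigma=6.0):
    """Encode 18 (x, y) keypoints as an 18-channel Gaussian heatmap tensor."""
    maps = np.zeros((len(KEYPOINTS), height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for k, (x, y) in enumerate(keypoints_xy):
        if x < 0 or y < 0:  # convention for a missing keypoint
            continue
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps
```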
Further, the specific process of step 4 is as follows:
First, using the differences between the pixel values of the semantic labels in the parsing guidance map, the semantic information of the lower body is removed, yielding a parsing map with the lower body removed.
Then, the parsing map with the lower body removed and the target pose are used to constrain the overall contour within which the clothing image may be warped, preventing the warping network from forcibly deforming the clothing image and avoiding excessive distortion of the clothing.
Finally, the clothing image is deformed by the warping network under a plane deformation loss function, yielding the warped clothing image.
Preferably, the plane deformation loss function is defined as follows:
In the formula, C_x(x) and C_y(x) denote the x and y coordinates of the sampling parameters respectively, |C_x(x+i, y) - C_x(x, y)| denotes the Euclidean distance between two nodes, i and j are deformation variables, and γ and δ are deformation coefficients.
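The formula is again supplied as an image in the original. From the definitions above, a plausible reconstruction, written in the style of the usual smoothness constraint on neighbouring sampling-grid nodes (the exact grouping of terms is an assumption), is:

$$\mathcal{L}_{\mathrm{flat}} = \sum_{x,y}\Big(\gamma\,\big|C_{x}(x+i,\,y)-C_{x}(x,\,y)\big| + \delta\,\big|C_{y}(x,\,y+j)-C_{y}(x,\,y)\big|\Big)$$

which penalizes abrupt changes between the coordinates of neighbouring grid nodes and thus discourages excessive local warping.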
Further, in step 5, the try-on image generation network is an end-to-end network comprising a generator and a discriminator. The generator takes the parsing guidance map, the warped clothing image and the wearer image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result from the pixel information of the warped clothing image and the wearer image. The coarse result is then passed through the discriminator, with a feature point matching loss function, to judge whether it conforms to the target pose and to extract more arm-region features, continuously refining the details of the coarse try-on result and improving image clarity.
Preferably, the feature point matching loss function is defined as follows:
In the formula, W denotes the human pose coordinate points in the coarse try-on result, M denotes the coordinate points of the target pose, W_i(x) denotes the abscissa of coordinate point i in the coarse try-on result, M_i(x) denotes the abscissa of coordinate point i in the target pose map, n denotes the total number of feature points, |W_i(x) - M_i(x)| denotes the distance along the x axis between keypoints of the same body part, and α and β are adjustment coefficients with α + β = 1.
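As with the other losses, only the variable definitions survive in this text. A plausible reconstruction, adding the y-coordinate term as the natural counterpart of the x-coordinate term defined above (the pairing with α and β is an assumption), is:

$$\mathcal{L}_{\mathrm{point}} = \frac{1}{n}\sum_{i=1}^{n}\Big(\alpha\,\big|W_{i}(x)-M_{i}(x)\big| + \beta\,\big|W_{i}(y)-M_{i}(y)\big|\Big),\qquad \alpha+\beta=1.$$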
Compared with the prior art, the beneficial effects of the present invention include:
(1) By feeding the parsing guidance map containing semantic segmentation information, the target clothing image warped to the target pose, the target pose and the wearer image simultaneously into the try-on image generation network, the invention obtains a fitting result image of the wearer in the target pose. This substantially improves the fitting effect, solves the confusion between skin and cloth pixels caused by changing the try-on pose, preserves more clothing texture detail in the fitting result, and improves the virtual try-on experience.
(2) The invention uses the parsing guidance map containing semantic segmentation information to limit the warping range of the target clothing image, preventing excessive distortion as the clothing follows the target pose and making the virtual try-on result more realistic.
(3) The invention obtains the clothing image from the wearer image and refines the texture of its blurred and missing regions to obtain a finer clothing image. This alleviates the lack of clothing images in the training dataset, strengthens the training of the try-on image generation network, the parsing guidance network and the clothing warping network, and enhances the robustness of the fitting method.
(4) In the parsing guidance process that produces the parsing guidance map, the invention introduces a normalization layer and a normalization constraint loss function, which strengthen the fusion of global and local features while preserving more semantic detail during upsampling.
(5) The invention introduces a feature point matching loss function into the try-on image generation network to judge whether the preliminary try-on result conforms to the target pose, which effectively avoids cross-occlusion between arms and clothing and further improves the virtual fitting effect.
Brief Description of the Drawings
The present invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a schematic flowchart of the virtual fitting method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of the parsing guidance network of the virtual fitting method according to an embodiment of the present invention.
Fig. 3 is a structural diagram of the clothing warping network of the virtual fitting method according to an embodiment of the present invention.
Fig. 4 is a structural diagram of the try-on image generation network of the virtual fitting method according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the virtual fitting system according to an embodiment of the present invention.
Detailed Description of the Embodiments
Embodiment 1
As shown in Fig. 1, the virtual fitting method based on pose transfer comprises the following steps:
(1) Obtain the original parsing map and the wearer image; first extract the clothing pixel information from the wearer image to obtain a rough clothing image, then perform texture restoration to obtain a refined clothing image.
The clothing image is obtained as follows. First, according to the clothing semantic information in the original parsing map, the pixel information of the corresponding region of the wearer image is extracted to obtain a rough clothing image whose edges are blurred and contain gaps. Then the texture of the clothing image is restored: interpolation is used to fill the blurred and missing regions of the rough clothing image, yielding a more refined clothing image.
The original parsing map contains the semantic information of each part of the wearer, including: face, hair, neck, upper-garment region, left arm, right arm, left hand, right hand, left shoulder, right shoulder, and lower-garment region.
In the texture restoration, a convolutional neural network first learns the edge information of the clothing image, focusing on regions where pixel values change sharply; interpolation is then used to repair the pixels in those regions, ensuring that the clothing edges are smooth and transition naturally into the background.
(2) Obtain the original parsing map and the target pose, input them into the parsing guidance network, and obtain the parsing guidance map.
The parsing guidance map shows the semantic segmentation of the wearer after the pose change, including the face, hair, neck, top, arms and lower garment.
The target pose consists of 18 keypoints: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear, and left ear.
As shown in Fig. 2, the parsing guidance network consists of a multi-layer convolutional network and a multi-layer deconvolutional network; its inputs are the original parsing map, the target clothing and the target pose, and its output is the parsing guidance map.
The parsing guidance process is as follows. First, the original parsing map and the target pose are input, and image features are extracted by the multi-layer convolutional network; a residual module and a wavelet sampling layer are added to the parsing guidance network to extract higher-level semantic structure, so that the network learns the detailed relationships between the parts of the human body, the wavelet sampling layer converting feature maps into the frequency domain via the wavelet transform for downsampling, which better preserves texture information. Then, the extracted image features are fed into the multi-layer deconvolutional network to upsample the image, and a normalization layer is inserted between the deconvolutions to strengthen the fusion of global and local features; a normalization constraint loss function is introduced so that more semantic detail is preserved during upsampling. Finally, the spatial positions in the generated parsing guidance map are compared with the target pose to ensure that each semantic part aligns with its corresponding pose keypoints, which better handles overlaps between the arms and the clothing; the semantic positions are fine-tuned to obtain a more regular parsing guidance map.
In the normalization layer, the features produced by the preceding deconvolution are treated as local features and the features produced by the following deconvolution as global features, and the normalization constraint loss function controls the influence of the current local and global features on the subsequent fusion result.
The normalization constraint loss function is expressed as:
In the formula, G denotes the global features of the image before parsing, G′ the global features of the parsed image, L the local features of the image before parsing, and L′ the local features of the parsed image; the loss combines a global feature matching term between the images before and after parsing with a local feature matching term, and two learning coefficients adjust the relative importance of the global and local features.
(3) According to the parsing guidance map, preliminarily limit the warping range of the clothing.
(4) Obtain the target pose and preprocess it to obtain a parsing map with the lower body removed, then obtain the warped clothing image through the clothing warping network, as shown in Fig. 3.
The warped clothing image is obtained as follows. First, using the differences between the pixel values of the semantic labels in the parsing guidance map, the semantic information of the lower body is removed, yielding a parsing map with the lower body removed. Then, the parsing map with the lower body removed and the target pose are used to constrain the overall contour within which the clothing image may be warped, preventing the warping network from forcibly deforming the clothing image and avoiding excessive distortion of the clothing. Finally, the clothing image is deformed by the warping network under the plane deformation loss function, yielding the warped clothing image.
The plane deformation loss function is expressed as:
In the formula, C_x(x) and C_y(x) denote the x and y coordinates of the sampling parameters respectively, |C_x(x+i, y) - C_x(x, y)| denotes the Euclidean distance between two nodes, i and j are deformation variables, and γ and δ are deformation coefficients.
(5) Generate the try-on result in the target pose through the image generation network from the parsing guidance map, the warped clothing image, the target pose and the wearer image.
As shown in Fig. 4, the image generation network is an end-to-end network composed of a generator and a discriminator, the generator itself consisting of an encoder and a decoder. The generator takes the parsing guidance map, the warped clothing image and the wearer image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result from the pixel information of the warped clothing image and the wearer image; the human pose map of the coarse try-on result is obtained, and the result is then passed through the discriminator, with the feature point matching loss function, to judge whether it conforms to the target pose and to encourage the extraction of more arm-region features, continuously refining the details of the coarse try-on result and improving image clarity.
The feature point matching loss function is as follows:
In the formula, W denotes the human pose coordinate points in the coarse try-on result, M denotes the coordinate points of the target pose, W_i(x) denotes the abscissa of coordinate point i in the coarse try-on result, M_i(x) denotes the abscissa of coordinate point i in the target pose map, n denotes the total number of feature points, |W_i(x) - M_i(x)| denotes the distance along the x axis between keypoints of the same body part, and α and β are adjustment coefficients with α + β = 1.
Embodiment 2
As shown in Fig. 5, the virtual fitting system for pose transfer comprises a parsing guidance module, a clothing matching module and an image fusion module.
The parsing guidance module performs pixel extraction and texture restoration from the original parsing map, the wearer image and the target pose, and then generates the parsing guidance map through the parsing guidance network.
The clothing matching module obtains the warped clothing image through the clothing warping network from the parsing guidance map, the target pose and the parsing map with the lower body removed.
The image fusion module generates the try-on result in the target pose through the try-on image generation network from the parsing guidance map, the warped clothing image, the target pose and the wearer image.
As shown in Fig. 2, the inputs of the semantic parsing network are the original parsing map and the target pose map, and its output is the parsing guidance map, i.e. the parsing map after pose transfer. The original parsing map and the target pose map are each processed by five sequentially connected residual blocks; each residual block extracts features with 3×3 convolutions, and the residual blocks are linked by wavelet layers that downsample the feature maps in the frequency domain. The final residual block is followed by a normalization layer, which strengthens the fusion of global and local features, and a normalization constraint loss function is introduced so that more semantic detail is preserved during upsampling. After normalization, the features pass through five sequentially connected deconvolution layers; adjacent deconvolution layers are linked by inverse wavelet layers used for upsampling, and the final deconvolution layer outputs the parsing guidance map.
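A compact PyTorch-style sketch of the structure just described follows. Channel widths, the Haar wavelet, the instance-normalization choice, the transposed-convolution stand-in for the inverse wavelet layers, and folding the two input branches into one concatenated encoder are illustrative assumptions; the patent specifies none of them.

```python
import torch
import torch.nn as nn

class HaarDown(nn.Module):
    """Wavelet sampling layer: a 2x2 Haar transform used as frequency-domain downsampling
    (the four sub-bands are stacked along the channel axis)."""
    def forward(self, x):
        a = x[:, :, 0::2, 0::2]; b = x[:, :, 0::2, 1::2]
        c = x[:, :, 1::2, 0::2]; d = x[:, :, 1::2, 1::2]
        return torch.cat([a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d], dim=1) / 2

class ResBlock(nn.Module):
    """Residual block built from 3x3 convolutions."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class ParsingGuidanceNet(nn.Module):
    """Encoder of residual blocks + wavelet downsampling, a normalization layer, then five
    deconvolution layers for upsampling. The patent describes parallel branches for the parsing
    map and the pose map; they are concatenated into a single branch here for brevity."""
    def __init__(self, in_ch, out_ch, base=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, base, 3, padding=1)
        enc, ch = [], base
        for _ in range(5):                                  # 5 residual blocks linked by wavelet layers
            enc += [ResBlock(ch), HaarDown(), nn.Conv2d(ch * 4, ch * 2, 1)]
            ch *= 2
        self.encoder = nn.Sequential(*enc)
        self.norm = nn.InstanceNorm2d(ch)                   # normalization layer before the decoder
        dec = []
        for _ in range(5):                                  # 5 deconvolution (upsampling) layers
            dec += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1), nn.ReLU(inplace=True)]
            ch //= 2
        self.decoder = nn.Sequential(*dec)
        self.head = nn.Conv2d(ch, out_ch, 3, padding=1)     # per-pixel semantic logits

    def forward(self, parsing_map, target_pose):
        x = torch.cat([parsing_map, target_pose], dim=1)
        return self.head(self.decoder(self.norm(self.encoder(self.stem(x)))))
```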
As shown in Fig. 3, the inputs of the clothing warping network are the parsing guidance map and the clothing image, and its output is the warped clothing image. First, the parsing guidance map and the clothing image are each encoded by an encoder to extract their image features; then the deformation coefficients θ are computed from the two sets of features, while the parsing guidance map and the target pose constrain the overall contour within which the clothing image may be warped, preventing the warping network from forcibly deforming the clothing image and avoiding excessive distortion; finally, the clothing image is deformed by the warping operation under the plane deformation loss function, yielding the warped clothing image.
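A sketch of this warping stage is given below, assuming a TPS-like control-grid parameterisation; the patent only speaks of deformation coefficients θ, so the grid size, encoder widths and the bilinear approximation of the warp are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_encoder(in_ch, base=32):
    """Small strided-convolution encoder that pools to a feature vector."""
    layers, ch = [], in_ch
    for out in (base, base * 2, base * 4, base * 8):
        layers += [nn.Conv2d(ch, out, 4, stride=2, padding=1), nn.ReLU(inplace=True)]
        ch = out
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten()]
    return nn.Sequential(*layers), ch

class ClothingWarpNet(nn.Module):
    """Encode the parsing guidance map and the clothing image, regress deformation
    parameters theta for a control-point grid, and warp the clothing with a sampling grid."""
    def __init__(self, guide_ch, cloth_ch, grid=5):
        super().__init__()
        self.enc_guide, g = conv_encoder(guide_ch)
        self.enc_cloth, c = conv_encoder(cloth_ch)
        self.grid = grid
        self.theta = nn.Linear(g + c, 2 * grid * grid)  # 2D offsets for grid x grid control points

    def forward(self, guide_map, cloth_img):
        feat = torch.cat([self.enc_guide(guide_map), self.enc_cloth(cloth_img)], dim=1)
        offsets = self.theta(feat).view(-1, 2, self.grid, self.grid)      # deformation coefficients
        # Upsample the coarse control-point offsets to a dense flow field (TPS approximated bilinearly).
        flow = F.interpolate(offsets, size=cloth_img.shape[-2:], mode="bilinear", align_corners=True)
        base = F.affine_grid(torch.eye(2, 3, device=cloth_img.device).unsqueeze(0)
                             .repeat(cloth_img.size(0), 1, 1), cloth_img.size(), align_corners=True)
        warped = F.grid_sample(cloth_img, base + flow.permute(0, 2, 3, 1),
                               mode="bilinear", align_corners=True)
        return warped
```

In training, the plane deformation loss above would be applied to the regressed offsets so that neighbouring control points do not move apart abruptly.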
As shown in Fig. 4, the inputs of the try-on image generation network are the parsing guidance map, the warped clothing image and the wearer image, and its output is the try-on image. The try-on image generation network is an end-to-end network comprising a generator and a discriminator, the generator consisting of an encoder and a decoder. The generator takes the parsing guidance map, the warped clothing image and the wearer image as input and, under the constraint of the parsing guidance map, generates a coarse try-on result from the pixel information of the warped clothing image and the wearer image; the result then passes through the discriminator, with the feature point matching loss function, to judge whether it conforms to the target pose and to extract more arm-region features, enhancing the details of the coarse try-on result and improving image clarity.
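A condensed sketch of the generator/discriminator pairing follows. The layer counts, instance normalization and PatchGAN-style discriminator are assumptions chosen only to make the end-to-end shape concrete; the class names are illustrative, not the patent's.

```python
import torch
import torch.nn as nn

def down(ic, oc): return nn.Sequential(nn.Conv2d(ic, oc, 4, 2, 1), nn.InstanceNorm2d(oc), nn.ReLU(True))
def up(ic, oc):   return nn.Sequential(nn.ConvTranspose2d(ic, oc, 4, 2, 1), nn.InstanceNorm2d(oc), nn.ReLU(True))

class TryOnGenerator(nn.Module):
    """Encoder-decoder generator: inputs are the parsing guidance map, the warped
    clothing image and the wearer image, concatenated channel-wise."""
    def __init__(self, in_ch, base=64):
        super().__init__()
        self.encoder = nn.Sequential(down(in_ch, base), down(base, base * 2), down(base * 2, base * 4))
        self.decoder = nn.Sequential(up(base * 4, base * 2), up(base * 2, base), up(base, base),
                                     nn.Conv2d(base, 3, 3, 1, 1), nn.Tanh())
    def forward(self, guide_map, warped_cloth, wearer_img):
        x = torch.cat([guide_map, warped_cloth, wearer_img], dim=1)
        return self.decoder(self.encoder(x))

class PoseDiscriminator(nn.Module):
    """PatchGAN-style discriminator scoring the coarse try-on result conditioned on the target pose."""
    def __init__(self, in_ch, base=64):
        super().__init__()
        self.net = nn.Sequential(down(in_ch, base), down(base, base * 2),
                                 nn.Conv2d(base * 2, 1, 3, 1, 1))
    def forward(self, try_on_img, target_pose):
        return self.net(torch.cat([try_on_img, target_pose], dim=1))
```

The generator would be trained with the adversarial signal from the discriminator together with the feature point matching loss, while the discriminator judges the coarse result against the target pose.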
The virtual fitting system for pose transfer adopts the same virtual fitting method as Embodiment 1.
The implementation results show that the invention not only achieves higher semantic segmentation accuracy but also increases the robustness of clothing deformation, so that the try-on result images retain more detail. This greatly improves the virtual try-on effect on high-resolution 2D images and improves both the fitting result and the user experience.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210795212.8A CN115272632B (en) | 2022-07-07 | 2022-07-07 | Virtual fitting method based on gesture migration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210795212.8A CN115272632B (en) | 2022-07-07 | 2022-07-07 | Virtual fitting method based on gesture migration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115272632A (en) | 2022-11-01
CN115272632B (en) | 2023-07-18
Family
ID=83764879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210795212.8A Active CN115272632B (en) | 2022-07-07 | 2022-07-07 | Virtual fitting method based on gesture migration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115272632B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116206332B (en) * | 2023-01-31 | 2023-08-08 | 北京数美时代科技有限公司 | Pedestrian re-recognition method, system and storage medium based on attitude estimation |
CN116824002B (en) * | 2023-06-19 | 2024-02-20 | 深圳市毫准科技有限公司 | AI clothing try-on result output method based on fake model and related equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120287122A1 (en) * | 2011-05-09 | 2012-11-15 | Telibrahma Convergent Communications Pvt. Ltd. | Virtual apparel fitting system and method |
JP5605885B1 (en) * | 2014-02-27 | 2014-10-15 | 木下 泰男 | Virtual try-on system and virtual try-on program |
CN110211196B (en) * | 2019-05-28 | 2021-06-15 | 山东大学 | A kind of virtual try-on method and device based on posture guidance |
CN113297944A (en) * | 2020-12-28 | 2021-08-24 | 武汉纺织大学 | Human body posture transformation method and system for virtual fitting of clothes |
CN113052980B (en) * | 2021-04-27 | 2022-10-14 | 云南大学 | Virtual fitting method and system |
-
2022
- 2022-07-07 CN CN202210795212.8A patent/CN115272632B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN115272632A (en) | 2022-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110853119B (en) | Reference picture-based makeup transfer method with robustness | |
CN115272632B (en) | Virtual fitting method based on gesture migration | |
CN113222875B (en) | Image harmonious synthesis method based on color constancy | |
CN114663552B (en) | Virtual fitting method based on 2D image | |
Li et al. | Line drawing guided progressive inpainting of mural damage | |
CN118608910A (en) | Virtual try-on method and system based on adaptive multimodal fusion and dynamic feature enhancement | |
CN118314038A (en) | A method for image shadow removal based on mask thinning | |
CN117893673A (en) | Method and system for generating an animated three-dimensional head model from a single image | |
CN117196935A (en) | Drama makeup migration method based on UV space mapping | |
CN113516604B (en) | Image restoration method | |
CN118037898B (en) | Text generation video method based on image guided video editing | |
CN112241708A (en) | Method and apparatus for generating new person image from original person image | |
CN117593178A (en) | Virtual fitting method based on feature guidance | |
CN117333604A (en) | A method of character facial reenactment based on semantic perception neural radiation field | |
CN116824669A (en) | A face de-occlusion method based on feature reconstruction | |
CN116645451A (en) | High-precision garment texture virtual fitting method and system | |
Chen et al. | A robust transformer GAN for unpaired data makeup transfer | |
Zhang et al. | Norest-net: Normal estimation neural network for 3-D noisy point clouds | |
Chen et al. | NeuralReshaper: single-image human-body retouching with deep neural networks | |
Wang et al. | Uncouple generative adversarial networks for transferring stylized portraits to realistic faces | |
Li et al. | HUMOD: High-Quality Human Modeling From Monocular Virtual Try-On Image | |
CN119888028B (en) | A digital human reconstruction method focusing on the face | |
CN119206101B (en) | Editable facial three-dimensional reconstruction method, system and storage medium | |
CN117274059B (en) | Low-resolution image reconstruction method and system based on image coding and decoding | |
CN118446930B (en) | Monocular video dressing human body space-time feature learning method based on nerve radiation field |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |