CN110288677A

CN110288677A - A Pedestrian Image Generation Method and Device Based on Deformable Structure

Info

Publication number: CN110288677A
Application number: CN201910425357.7A
Authority: CN
Inventors: 田永鸿; 常亦谦; 翟云鹏; 史业民; 王耀威
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2019-09-27
Anticipated expiration: 2039-05-21
Also published as: CN110288677B

Abstract

The invention relates to the field of image generation, and in particular to a pedestrian image generation method and device based on a deformable structure. The method comprises the following steps: step 1, segment the pedestrian picture and the target pose picture according to the part structure and extract masks; step 2, perform the part generation operation to obtain part-generated pictures; step 3, perform the structured merging operation on the part-generated pictures to obtain a structured merged picture; step 4, perform the overall generation operation to obtain the generated picture. By taking the deformable structure of the human body into account, the invention reduces the cost of training and improves the performance of the algorithm.

Description

A Pedestrian Image Generation Method and Device Based on Deformable Structure

Technical field

The invention relates to the field of image generation, and in particular to a pedestrian image generation method and device based on a deformable structure.

Background

Pedestrian image generation is the problem of converting one picture of a pedestrian into another picture of that pedestrian in a given pose. It is a subfield of image generation, and it is harder than ordinary image generation because it must handle more complex scenes and a wide variety of deformable poses.

Pedestrian image generation can be approached with conventional image generation techniques. For example, a conditional generative adversarial network can take a full-body source picture as the condition and guide the network to generate a new-pose picture with the appearance of the source picture. A cycle generative adversarial network can also be used to replace the background and lighting of a pedestrian picture, generating pedestrian pictures in new poses and environments while preserving the person's characteristics. The main problem with such methods is that they are hard to train: the human body is too complex as a deformable object, and learning the complex picture-to-picture mapping requires a very large number of training samples.

A better approach is to introduce human body information into the generation process, for example by supplying pose information as part of the input to act as a prior. The deformable complexity of the human body comes mainly from the diversity of poses, so prior guidance from pose information effectively reduces the difficulty of generation and yields more realistic pedestrian pictures. The same problem remains, however: full-body pose transformation is still complex, and generating more realistic pictures still requires a massive number of training samples.

Summary of the invention

Embodiments of the present invention provide a pedestrian image generation method and device based on a deformable structure which, by taking the deformable structure of the human body into account, reduce the cost of training and improve the performance of the algorithm.

According to a first aspect of the embodiments of the present invention, a pedestrian image generation method based on a deformable structure comprises the following steps:

Step 1. For the input pedestrian picture and target pose picture, segment both pictures according to the part structure to obtain part pedestrian pictures and part target pose pictures. Then apply the mask extraction operation to the pedestrian picture, the target pose picture, the part pedestrian pictures and the part target pose pictures, obtaining a pedestrian mask picture, a target pose mask picture, part pedestrian mask pictures and part target pose mask pictures;

Step 2. Preprocess the part pedestrian pictures, then perform the part generation operation on the preprocessed part pedestrian pictures, the part target pose pictures and the part target pose mask pictures to obtain part-generated pictures;

Step 3. Perform the structured merging operation on the part-generated pictures obtained in step 2 to obtain a structured merged picture;

Step 4. Preprocess the original pedestrian picture, take the preprocessed pedestrian picture, the merged picture from step 3 and the target pose picture as input, and perform the overall generation operation to obtain the generated picture.

In step 1, the segmentation operation comprises the following steps:

1.1 Apply a joint point detection algorithm to the pedestrian picture and the target pose picture to find the joint points of each input picture;

1.2 Determine from the positions and confidence scores of the joint points whether the extracted joint points are usable;

1.3 If the joint points are usable, divide the picture into 3 parts using the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the part above the average shoulder height is the first part, the part between the average shoulder height and the average hip height is the second part, and the part below the average hip height is the third part. If the joint points are not usable, divide the picture into 3 parts of fixed size, which are, from top to bottom, the first part, the second part and the third part.

Step 2 comprises the following sub-steps:

2.1 One independent generation network is used for each generated part, giving 3 independent generation networks that correspond to the first, second and third parts from step 1;

2.2 The i-th independent generation network comprises a generator and a discriminator. The segmented part pedestrian picture x_i, the segmented target pose mask picture p_i and the segmented target pose picture y_i are input to the generator and the discriminator, and training yields a part-generated picture G_pi(x_i, p_i) that is consistent with the target pose;

2.3 Repeat step 2.2 for each of the 3 independent generation networks in turn to obtain all of the part-generated pictures.

In step 3, the structured merging operation comprises the following sub-steps:

3.1 Scale the 3 part-generated pictures, corresponding to the first, second and third parts, according to the part dimensions h_{T,i} and w_T in the original picture, obtaining 3 scaled part-generated pictures;

3.2 Merge the scaled part-generated pictures vertically, following the positional relationship of the part structure in the original picture, into a structured merged part-generated picture;

3.3 Adjust the color and edge-connection information of the structured merged part-generated picture, where Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of part i, to obtain a more realistic structured merged picture A_w.

Step 2.2 comprises the following sub-steps:

2.2a) Input the segmented part pedestrian picture x_i into the generator to obtain the generated picture G_pi(x_i), and input the part pedestrian picture x_i together with the target pose mask picture p_i into the generator to obtain the generated picture G_pi(x_i, p_i);

2.2b) Input the part pedestrian picture x_i and the target pose picture y_i into the discriminator to obtain its output for the real pair, and input the generated picture G_pi(x_i, p_i) and the part target pose mask picture p_i into the discriminator to obtain its output for the generated pair;

2.2c) Compute the mask-L1 loss function of the part target pose picture y_i, the generated picture G_pi(x_i) and the part target pose mask picture p_i, where ⊙ denotes element-wise multiplication between two matrices of the same size, ||*||₁ is the 1-norm, and Mask is the target pose mask picture matrix;

2.2d) Compute the adversarial loss function V_pi between the generated picture G_pi(x_i) and the real picture, in which the expectation is taken as the mean;

2.2e) Combine the two loss functions above into the loss function L_i of the i-th independent generation network;

2.2f) Update the generator by minimizing the loss function L_i;

2.2g) Update the discriminator by maximizing the adversarial loss function;

2.2k) Return to 2.2a) and continue updating until the loss function L_i falls to the threshold or the required number of iterations is reached, then output the part-generated picture G_pi(x_i, p_i), which is consistent with the target pose.
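The formulas for the two losses in steps 2.2c) to 2.2e) are not reproduced in this text. The following is a minimal sketch of how they could be evaluated, assuming the mask-L1 loss is the 1-norm of the masked difference, the adversarial loss has the standard GAN form, and L_i is their unweighted sum (by analogy with the stated L_w). The function names, the `EPS` guard and these three forms are assumptions, not the patent's own definitions.

```python
import numpy as np

EPS = 1e-8  # numerical guard for log(); an assumption, not part of the source


def mask_l1(generated, target, mask):
    """Assumed form: || (generated - target) ⊙ mask ||_1."""
    return float(np.abs((generated - target) * mask).sum())


def adversarial_value(d_real, d_fake):
    """Assumed standard GAN value: mean log D(real) + mean log(1 - D(fake))."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS)))


def part_loss(generated, target, mask, d_real, d_fake):
    """Assumed unweighted sum L_i = V + M for the i-th part network."""
    return adversarial_value(d_real, d_fake) + mask_l1(generated, target, mask)
```

In a training loop the generator parameters would be updated to decrease `part_loss`, and the discriminator parameters to increase `adversarial_value`, alternating until the stopping condition in 2.2k) is met.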

In step 4, the overall generation operation comprises the following sub-steps:

4.1 Input the pedestrian picture x into the generator G_w to obtain the generated picture G_w(x), and input the pedestrian picture x, the target pose mask picture p and the merged picture A_w into the generator G_w to obtain the generated picture G_w(x, p, A_w);

4.2 Input the target pose picture y into the discriminator D_w to obtain D_w(y), and input the generated picture G_w(x, p, A_w) into the discriminator D_w to obtain D_w(G_w(x, p, A_w));

4.3 Compute the mask-L1 loss function M(G_w) of the target pose picture y, the generated picture G_w(x) and the mask picture p, where ⊙ denotes element-wise multiplication between two matrices of the same size and ||*||₁ is the 1-norm;

4.4 Compute the identity classification loss C(G_w, cl), using an identity classification network as guidance, where cl is the identity class label of the target person, Q_c = 1 if the class label predicted by the classification network matches cl and Q_c = 0 otherwise, and P(G_w(x, p, A_w)) is the output probability distribution of the classification network;

4.5 Compute the adversarial loss function V_w(D_w, G_w);

4.6 The loss function L_w of the overall generation network is:

L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)

4.7 Update the generator G_w by minimizing the loss function L_w;

4.8 Update the discriminator D_w by maximizing the adversarial loss function V_w(D_w, G_w);

4.9 Return to step 4.1 and continue updating until the loss function L_w falls within an acceptable range or the required number of iterations is reached, then output the generated picture G_w(x, p, A_w).
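The formula for the identity term in step 4.4 is not reproduced in this text. As a hedged illustration, the sketch below treats it as a cross-entropy between the classifier's output distribution and an indicator Q that is 1 at the target label cl. That reading of Q_c is an assumption. The sum in `overall_loss` follows the stated L_w directly.

```python
import numpy as np


def identity_loss(probs: np.ndarray, cl: int) -> float:
    """Assumed cross-entropy form: C = -sum_c Q_c * log P_c, with Q one-hot at cl."""
    q = np.zeros_like(probs, dtype=float)
    q[cl] = 1.0
    return float(-(q * np.log(probs + 1e-8)).sum())


def overall_loss(v_w: float, m: float, c: float) -> float:
    """L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl), as stated in step 4.6."""
    return v_w + m + c
```

Under this reading the identity term is small when the classifier assigns high probability to the target person's label, which pushes the generator to preserve identity.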

In step 1, the mask extraction operation is as follows:

Apply a mask detection algorithm to the input picture to obtain the corresponding mask picture, in which the detected object is uniformly white and the background is uniformly black.

In step 3, the formula for computing A_w is defined in terms of the following quantities: h_T and w_T are the height and width of the target picture, and h_{T,i} is the height of the i-th body part in the target picture; R(pic, h, w) denotes resizing a picture to h*w, and O(h*w) denotes a zero matrix of size h*w. The positions of the part pictures are reorganized according to the part structure of the target picture. To keep the joins between parts smooth, Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of part i.

A pedestrian image generation device based on a deformable structure comprises:

Image preprocessing module: for the input original pedestrian picture and target pose picture, performs the segmentation operation by part structure and the mask extraction operation on each picture, obtaining three sets of preprocessed part pedestrian mask pictures, part target pose mask pictures, part pedestrian pictures and part target pose pictures;

Part generation module: preprocesses the segmented part pedestrian pictures with the part pedestrian mask pictures, then performs the part generation operation on the part target pose mask pictures, part pedestrian pictures and part target pose pictures, obtaining three part-generated pictures;

Structured merging module: performs the structured merging operation on the three part-generated pictures produced by the part generation operation, obtaining one structured merged picture;

Overall generation module: takes the structured merged picture, the original picture and the target pose as input and performs the overall generation operation, obtaining the final generated pedestrian picture.

Both the part generation module and the overall generation module contain a generator and a discriminator.

The technical solutions provided by the embodiments of the present invention may have the following beneficial effects:

By decomposing the complex problem of pedestrian image generation into pose-to-pose conversion problems for several part pictures, the invention reduces the number of training samples the generation network needs, and it uses more local features as generation criteria, improving the quality of the generated pictures while remaining efficient. Specifically: the segmentation and mask extraction operations give the pictures a prior processing that matches the structure of the human body; the part generation operation generates the different part pictures, decomposing the complex full-body pose correspondence; the structured merging operation combines the generated part pictures to give strong guidance for full-body generation; and the overall generation operation generates a more realistic and credible pedestrian image while preserving local information and identity information. In summary, the method provided by the embodiments of the present invention improves both the efficiency of the pedestrian image generation algorithm and the realism of its output.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a flow chart of the pedestrian image generation method based on a deformable structure according to the present invention;

Fig. 2 is a comparison diagram for the pedestrian image generation method based on a deformable structure according to the present invention;

Fig. 3 is an overall schematic diagram of the pedestrian image generation method based on a deformable structure in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the segmentation and mask extraction operations of pedestrian image generation based on a deformable structure in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the structured merging operation of pedestrian image generation based on a deformable structure in an embodiment of the present invention;

Fig. 6 is a structural block diagram of the pedestrian image generation device based on a deformable structure according to the present invention.

Detailed description

Embodiment 1

As shown in Figs. 1 and 2, the present invention provides a pedestrian image generation method based on a deformable structure that can produce an optimized target pose picture. It comprises the following steps:

Step 1. As shown in Fig. 4, for the input pedestrian picture and target pose picture, segment both pictures according to the part structure to obtain part pedestrian pictures and part target pose pictures. Then apply the mask extraction operation to the pedestrian picture, the target pose picture, the part pedestrian pictures and the part target pose pictures, obtaining a pedestrian mask picture, a target pose mask picture, part pedestrian mask pictures and part target pose mask pictures;

The segmentation operation comprises the following steps:

1.1 Apply a joint point detection algorithm to the pedestrian picture and the target pose picture, first finding the 14 joint points of each input picture;

1.2 Determine from the positions and confidence scores of the joint points whether the extracted joint points are usable. They are usable when more than 8 joint points have a confidence above 0.6 and the minimum vertical distance between the shoulder joint points and the hip joint points exceeds 1/3 of the total picture height;

1.3 If the joint points are usable, divide the picture into 3 parts using the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the part above the average shoulder height is the first part, the part between the average shoulder height and the average hip height is the second part, and the part below the average hip height is the third part. If the joint points are not usable, divide the picture into 3 parts of fixed size, cutting it vertically from top to bottom: the first part (head) takes 1/4 of the total picture height, the second part (upper body) takes 3/8, and the third part (lower body) takes 3/8.
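The usability test in 1.2 and the two splitting rules in 1.3 can be sketched as a single function that returns the two cut rows. This is only an illustration: the joint record layout `(x, y, confidence)`, the joint indices used for shoulders and hips, and the function name are assumptions, since the text does not name a particular joint detection algorithm or joint ordering.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical joint record: (x, y, confidence); y grows downward, in pixels.
Joint = Tuple[float, float, float]


def split_rows(
    height: int,
    joints: Optional[Sequence[Joint]],
    shoulders: Tuple[int, int] = (2, 5),   # assumed indices of the two shoulder joints
    hips: Tuple[int, int] = (8, 11),       # assumed indices of the two hip joints
    min_conf: float = 0.6,
    min_confident: int = 8,
) -> Tuple[int, int]:
    """Return (shoulder_row, hip_row), the two cut rows separating the three parts."""
    # Fixed split: head 1/4, upper body 3/8, lower body 3/8 of the height.
    fallback = (height // 4, height // 4 + (3 * height) // 8)
    if joints is None:
        return fallback
    confident = sum(1 for _, _, c in joints if c > min_conf)
    shoulder_ys = [joints[i][1] for i in shoulders]
    hip_ys = [joints[i][1] for i in hips]
    # Smallest vertical gap between any shoulder joint and any hip joint.
    min_gap = min(h - s for s in shoulder_ys for h in hip_ys)
    if not (confident > min_confident and min_gap > height / 3):
        return fallback
    shoulder_row = int(round(sum(shoulder_ys) / 2))
    hip_row = int(round(sum(hip_ys) / 2))
    return shoulder_row, hip_row
```

Rows above `shoulder_row` form the first part, rows from `shoulder_row` up to `hip_row` the second, and the remaining rows the third.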

The mask extraction operation is as follows:

Apply a mask detection algorithm to the input picture to obtain the corresponding mask picture;

Set the detected object to white and the background to black throughout the mask picture, and output the result as the final mask picture.

Step 2. Preprocess the part pedestrian pictures, then perform the part generation operation on the preprocessed part pedestrian pictures, the part target pose pictures and the part target pose mask pictures to obtain part-generated pictures;

Preprocessing multiplies the part pedestrian mask picture by the original part pedestrian picture to obtain a part pedestrian picture with the background removed;
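The white-on-black mask convention and the mask multiplication used for preprocessing can be illustrated with NumPy. This is a minimal sketch: the soft mask input, the 0.5 threshold and both function names are assumptions, because the text does not specify which mask detection algorithm is used or what it outputs.

```python
import numpy as np


def binarize_mask(soft_mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map a soft mask in [0, 1] to a white-on-black uint8 mask (255 / 0)."""
    return np.where(soft_mask > threshold, 255, 0).astype(np.uint8)


def remove_background(picture: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply an H x W x 3 picture by an H x W mask so background pixels become 0."""
    keep = (mask > 0).astype(picture.dtype)      # 1 on the person, 0 elsewhere
    return picture * keep[:, :, None]            # broadcast over the color channels
```

Converting the 255/0 mask to 1/0 before multiplying keeps the person's pixel values unchanged while zeroing the background.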

The part generation operation takes as input one segmented part pedestrian picture, the mask picture corresponding to the segmented part target pose, and the segmented part target pose picture, and outputs one part-generated picture consistent with the target pose. It comprises the following steps:

2.2 The i-th independent generation network comprises one generator and one discriminator. One segmented part pedestrian picture x_i, the mask picture p_i corresponding to the segmented target pose picture, and the segmented target pose picture y_i are input, and training yields one part-generated picture G_pi(x_i, p_i) consistent with the target pose;

2.3 Repeat step 2.2 for each of the 3 independent generation networks in turn to obtain all of the part-generated pictures;

2.2a) Input the segmented part pedestrian picture x_i into the generator to obtain the generated picture G_pi(x_i), and input the part pedestrian picture x_i together with the target pose mask picture p_i into the generator to obtain the generated picture G_pi(x_i, p_i);

2.2k) Return to 2.2a) and continue updating until the loss function L_i falls to the threshold or the required number of iterations is reached, then output the part-generated picture G_pi(x_i, p_i).

Step 3. Perform the structured merging operation on the part-generated pictures obtained by the part generation operation;

The structured merging operation comprises the following sub-steps:

3.1 Scale the 3 part-generated pictures, corresponding to the first, second and third parts, according to the part dimensions h_{T,i} and w_T in the original picture, obtaining the scaled part-generated pictures;

3.2 Merge the 3 scaled part-generated pictures vertically into one picture, following the positional relationship of the part structure in the original picture; this is the structured merged part-generated picture;

3.3 Adjust information such as the color and edge connections of the structured merged part-generated picture to obtain a more realistic structured merged picture A_w. Here Δh_i is a height offset adjustment, found by repeated trials, and c_i is the color balance adjustment factor for the picture of part i; preferably, c_i is obtained by dividing the color mean of each of the three pictures by the overall color mean of the three pictures;

The formula used to obtain A_w is defined in terms of the following quantities: h_T and w_T are the height and width of the target picture, and h_{T,i} is the height of the i-th body part in the target picture; R(pic, h, w) denotes resizing a picture to h*w, and O(h*w) denotes a zero matrix of size h*w. The positions of the part pictures are reorganized according to the part structure of the target picture. To keep the joins between parts smooth, Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of part i, as shown in Fig. 5;
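The resize, color-balance and vertical merge steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the patent's formula for A_w: nearest-neighbor sampling stands in for R(pic, h, w), the overall color mean is taken as the mean of the three part means, each part is divided by c_i to pull its mean toward the overall mean, and the height offset Δh_i and the zero matrix O(h*w) are left out.

```python
import numpy as np


def resize_nearest(pic: np.ndarray, h: int, w: int) -> np.ndarray:
    """R(pic, h, w): resize an H x W x C picture to h x w by nearest-neighbor sampling."""
    rows = np.arange(h) * pic.shape[0] // h
    cols = np.arange(w) * pic.shape[1] // w
    return pic[rows][:, cols]


def structured_merge(parts, part_heights, width, color_factors=None):
    """Resize each part to (h_{T,i}, w_T), color-balance it, and stack top to bottom."""
    if color_factors is None:
        # c_i = color mean of part i / overall color mean (mean of part means assumed).
        means = np.array([p.mean() for p in parts], dtype=float)
        color_factors = means / max(means.mean(), 1e-8)
    color_factors = np.maximum(color_factors, 1e-8)      # avoid division by zero
    scaled = [
        resize_nearest(p.astype(float), h, width) / c    # dividing by c_i is an assumption
        for p, h, c in zip(parts, part_heights, color_factors)
    ]
    return np.clip(np.concatenate(scaled, axis=0), 0, 255)
```

With three flat parts of brightness 100, 200 and 300, the factors come out as 0.5, 1.0 and 1.5, so all three parts are brought to the common level 200 before stacking.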

Step 4. Preprocess the original picture, then take the merged picture, the pedestrian picture and the target pose picture as input and perform the overall generation operation;

Preprocessing multiplies the pedestrian mask picture by the original pedestrian picture to obtain a pedestrian picture with the background removed;

The overall generation operation takes as input one original picture, the target pose mask picture, the target pose picture and the structured merged part-generated picture. It comprises the following sub-steps:

4.1 Input the pedestrian picture x into the generator G_w to obtain the generated picture G_w(x), and input the pedestrian picture x, the target pose mask picture p and the merged picture A_w into the generator G_w to obtain the generated picture G_w(x, p, A_w);

4.2 Input the target pose picture y into the discriminator D_w to obtain D_w(y), and input the generated picture G_w(x, p, A_w) into the discriminator D_w to obtain D_w(G_w(x, p, A_w));

Embodiment 2

As shown in Fig. 3, a pedestrian image generation method based on a deformable structure according to the present invention comprises the following steps:

Step 1. Through the segmentation and mask extraction operations, give the pictures a prior processing that matches the structure of the human body;

Step 2. Through the part generation operation, generate the different part pictures, decomposing the complex full-body pose correspondence;

Step 3. Through the structured merging operation, combine the generated part pictures to give strong guidance for full-body generation;

Step 4. Through the overall generation operation, generate a more realistic and credible pedestrian image while preserving local information and identity information.

The invention proposes a pedestrian image generation method based on a deformable structure that decomposes the complex problem of pedestrian image generation into pose-to-pose conversion problems for several part pictures, solving pedestrian image generation by divide and conquer. The invention reduces the number of training samples the generation network needs, and it uses more local features as generation criteria, improving the quality of the generated pictures while remaining efficient. The structured merging operation of pedestrian image generation based on a deformable structure in this embodiment is described in detail below.

Fig. 5 is an exemplary flow chart of the structured merging operation of step 3 in this embodiment of the present invention:

3.1 Scale the 3 part-generated pictures obtained by the part generation operation according to the size ratios of the different parts in the original picture;

3.2 Merge the 3 scaled part-generated pictures vertically into one picture, following the positional relationship of the part structure in the original picture;

3.3 Fine-tune the positions of the 3 part-generated pictures in the merged picture so that their edges join smoothly, producing a smoother, more unified picture; then adjust the color and brightness weights of the 3 part-generated pictures according to the overall color and lighting conditions of the merged picture, producing a color-balanced picture.

As shown in Fig. 6, a pedestrian image generation device based on a deformable structure according to the present invention works as follows:

First, the original pedestrian picture and the target pose picture are input to the image preprocessing module, which applies the segmentation and mask extraction operations to give the pictures a prior processing that matches the structure of the human body. This yields three sets of preprocessed part pedestrian mask pictures, part target pose mask pictures, part pedestrian pictures and part target pose pictures. Each set of part pedestrian picture and part target pose picture is then input to the part generation module, which performs the part generation operation to generate the different part pictures, decomposing the complex full-body pose correspondence and yielding three part-generated pictures for the different parts. The three part-generated pictures are input to the structured merging module, which merges them to give strong guidance for full-body generation and yields one structured merged picture. Finally, the structured merged picture, the original picture and the target pose are input to the overall generation module, which performs the overall generation operation and generates a more realistic and credible pedestrian image while preserving local information and identity information.
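The data flow through the four modules can be sketched as a chain of callables. Every callable below is a hypothetical stand-in for the corresponding module, and the triple layout `(part picture, part pose picture, part pose mask)` is an assumption made for illustration. The sketch shows only the wiring at generation time, not training.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Picture = np.ndarray


def generate_pedestrian(
    source: Picture,
    target_pose: Picture,
    preprocess: Callable[[Picture, Picture], List[Tuple[Picture, Picture, Picture]]],
    part_generators: Sequence[Callable[[Picture, Picture], Picture]],
    merge: Callable[[List[Picture]], Picture],
    overall_generator: Callable[[Picture, Picture, Picture], Picture],
) -> Picture:
    """Chain the four modules; every callable is a hypothetical stand-in."""
    # Module 1: one (part picture, part pose picture, part pose mask) triple per part.
    triples = preprocess(source, target_pose)
    # Module 2: an independent generator per part, fed the part picture and pose mask.
    parts = [g(x_i, p_i) for g, (x_i, _y_i, p_i) in zip(part_generators, triples)]
    # Module 3: assemble the parts into the structured merged picture A_w.
    merged = merge(parts)
    # Module 4: generate the whole figure from source, target pose and A_w.
    return overall_generator(source, target_pose, merged)
```

Passing the modules in as arguments keeps the three part generators independent of each other, which matches the one-network-per-part design described in the text.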

Preferably, both the part generation module and the overall generation module include a generator and a discriminator.

The generator includes:

an input processing structure that stacks multiple input pictures along the third dimension;

an encoder composed of multiple convolutional layers connected in series;

a decoder composed of multiple deconvolutional layers connected in series;

a U-shaped structure formed by directly connecting the corresponding layers of the encoder and the decoder;

an output structure that outputs the generated picture and the generation loss.
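The input processing structure can be illustrated with a small channel-stacking routine. This is a minimal sketch that assumes the "third dimension" is the channel axis of a height x width x channels layout; the name `stack_channels` is hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
from typing import List

Image3D = List[List[List[float]]]  # height x width x channels


def stack_channels(images: List[Image3D]) -> Image3D:
    """Concatenate images of equal height and width along the channel axis."""
    if not images:
        raise ValueError("at least one image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share height and width")
    return [
        [
            [value for img in images for value in img[r][c]]
            for c in range(width)
        ]
        for r in range(height)
    ]
```

For example, stacking a 3-channel pedestrian picture with a 1-channel mask yields a 4-channel input for the encoder.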

The discriminator includes:

an input processing structure for the picture to be discriminated and the expected label;

a feature extraction network composed of several convolutional layers and fully connected layers;

an output structure that outputs the discrimination label result and the discrimination loss.

The loss function of the generator of the part generation module is:

[formula not reproduced]

where the expectation symbols denote means; y_i is the part target pose picture; p_i is the part target pose mask picture; G_pi(x_i) is the generated picture obtained by inputting the part pedestrian picture x_i into the generator; G_pi(x_i, p_i) is the generated picture obtained by inputting the part pedestrian picture x_i and the target pose mask picture p_i into the generator; the first discriminator output is the discrimination result obtained by inputting the part pedestrian picture x_i and the target pose picture y_i into the discriminator; the second discriminator output is the discrimination result obtained by inputting the generated picture G_pi(x_i, p_i) and the part target pose mask picture p_i into the discriminator; ⊙ denotes element-wise multiplication between two matrices of the same size; ||*||₁ is the 1-norm; and i indexes the part, corresponding to the three segmented parts.
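The mask L1 term can be illustrated for a single sample. The sketch below assumes the term is the entrywise 1-norm of the difference between the generated picture and the target picture, multiplied element-wise by the mask; the exact expression is not available in this text, and the name `masked_l1` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def masked_l1(generated: Matrix, target: Matrix, mask: Matrix) -> float:
    """Entrywise 1-norm of (generated - target) ⊙ mask for same-sized matrices."""
    if not (len(generated) == len(target) == len(mask)):
        raise ValueError("matrices must have the same number of rows")
    total = 0.0
    for g_row, t_row, m_row in zip(generated, target, mask):
        if not (len(g_row) == len(t_row) == len(m_row)):
            raise ValueError("rows must have the same length")
        for g, t, m in zip(g_row, t_row, m_row):
            total += abs((g - t) * m)
    return total
```

With a binary mask, pixels outside the mask contribute nothing, so the generator is penalized only where the mask is non-zero.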

The discriminant function of the discriminator of the part generation module is:

[formula not reproduced]

where the expectation symbols denote means; y_i is the part target pose picture; p_i is the part target pose mask picture; G_pi(x_i, p_i) is the generated picture obtained by inputting the part pedestrian picture x_i and the target pose mask picture p_i into the generator; the first discriminator output is the discrimination result obtained by inputting the part pedestrian picture x_i and the target pose picture y_i into the discriminator; the second discriminator output is the discrimination result obtained by inputting the generated picture G_pi(x_i, p_i) and the part target pose mask picture p_i into the discriminator; and i indexes the part, corresponding to the three segmented parts.
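For orientation, a discriminator objective of this kind is often written as the standard GAN value function: the mean of log D over real inputs plus the mean of log(1 - D) over generated inputs. The sketch below assumes that standard form, which this text does not confirm; `adversarial_value` and its arguments are hypothetical names, and the scores are assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def adversarial_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Mean log D(real) plus mean log(1 - D(fake)); scores lie in (0, 1)."""
    if not d_real or not d_fake:
        raise ValueError("both score batches must be non-empty")
    real_term = sum(math.log(s) for s in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - s) for s in d_fake) / len(d_fake)
    return real_term + fake_term
```

Under this assumption the discriminator is trained to increase the value, while the generator is trained to decrease it.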

The loss function of the generator G_w of the overall generation module is:

L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)

where the component terms are:

[formulas not reproduced]

where x is the pedestrian picture; y is the target pose picture; p is the target pose mask picture; A_w is the merged picture; G_w(x) is the generated picture obtained by inputting the pedestrian picture x into the generator G_w; G_w(x, p, A_w) is the generated picture obtained by inputting the pedestrian picture x, the target pose mask picture p and the merged picture A_w into the generator G_w; D_w(y) is the discrimination result obtained by inputting the target pose picture y into the discriminator D_w; D_w(G_w(x, p, A_w)) is the discrimination result obtained by inputting the generated picture G_w(x, p, A_w) into the discriminator D_w; the expectation symbols denote the corresponding means; ⊙ denotes element-wise multiplication between two matrices of the same size; ||*||₁ is the 1-norm; Mask is the matrix corresponding to the target pose mask picture p; cl is the identity class label of the target person; Q_c = 1 if the class label predicted by the classification network agrees with cl, and Q_c = 0 otherwise; and P(G_w(x, p, A_w)) is the output probability distribution of the classification network.
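The identity term is described only through the indicator Q_c and the class distribution P(G_w(x, p, A_w)). One plausible reading, assumed here and not confirmed by the text, is a cross-entropy in which Q_c is 1 only for the class equal to cl. The second function simply adds the three terms, following the unweighted sum L_w = V_w + M + C stated in the claims. Both function names are hypothetical.

```python
import math
from typing import Sequence


def identity_loss(probs: Sequence[float], cl: int) -> float:
    """Cross-entropy -sum_c Q_c * log P_c with Q_c = 1 only for c == cl."""
    if not 0 <= cl < len(probs):
        raise ValueError("cl must index a class in probs")
    eps = 1e-12  # guards against log(0)
    return -sum(
        (1.0 if c == cl else 0.0) * math.log(max(p, eps))
        for c, p in enumerate(probs)
    )


def overall_generator_loss(v_w: float, m: float, c: float) -> float:
    """L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl), unweighted."""
    return v_w + m + c
```

Under this reading, the term is small when the classification network assigns high probability to the target person's identity for the generated picture.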

The discriminant function of the discriminator D_w of the overall generation module is:

[formula not reproduced]

where x is the pedestrian picture; y is the target pose picture; p is the target pose mask picture; A_w is the merged picture; G_w(x, p, A_w) is the generated picture obtained by inputting the pedestrian picture x, the target pose mask picture p and the merged picture A_w into the generator G_w; D_w(y) is the discrimination result obtained by inputting the target pose picture y into the discriminator D_w; D_w(G_w(x, p, A_w)) is the discrimination result obtained by inputting the generated picture G_w(x, p, A_w) into the discriminator D_w; and the expectation symbols denote the corresponding means.

The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this application.

Claims

1. A pedestrian image generation method based on a deformable structure, characterized in that it comprises the following steps:

Step 1: for the input pedestrian picture and target pose picture, segment both pictures according to the part structure to obtain part pedestrian pictures and part target pose pictures; then apply the mask extraction operation to the pedestrian picture, the target pose picture, the part pedestrian pictures and the part target pose pictures to obtain the pedestrian mask picture, the target pose mask picture, the part pedestrian mask pictures and the part target pose mask pictures;

Step 2: preprocess the part pedestrian pictures, and perform the part generation operation on the preprocessed part pedestrian pictures, the part target pose pictures and the part target pose mask pictures to obtain the part generated pictures;

Step 3: perform the structured merging operation on the part generated pictures obtained by the part generation operation in step 2 to obtain a structured merged picture;

Step 4: preprocess the original pedestrian picture, and, taking the preprocessed pedestrian picture, the merged picture from step 3 and the target pose picture as input, perform the overall generation operation to obtain the generated picture.

2. The pedestrian image generation method based on a deformable structure according to claim 1, characterized in that, in step 1, the segmentation operation comprises the following steps:

1.1 For the pedestrian picture and the target pose picture, use a joint point detection algorithm to find the joint points of the input picture;

1.2 Judge, from the positions and confidence of the joint points, whether the extracted joint points are usable;

1.3 If the joint points are usable, divide the picture into 3 parts according to the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the part above the average height of the 2 shoulder joint points is the first part, the part between the average height of the 2 shoulder joint points and the average height of the 2 hip joint points is the second part, and the part below the average height of the 2 hip joint points is the third part; if the joint points are not usable, divide the picture into three parts of fixed size, which from top to bottom are the first part, the second part and the third part.
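Steps 1.2 and 1.3 can be sketched as a function that returns the two cut rows. The usability test is an assumption: the claim only says that position and confidence are considered, so the threshold `min_conf`, the equal-thirds fallback and the name `split_rows` are all hypothetical.

```python
from typing import Optional, Tuple

Joint = Tuple[float, float, float]  # (x, y, confidence); y grows downward


def split_rows(
    height: int,
    shoulders: Optional[Tuple[Joint, Joint]],
    hips: Optional[Tuple[Joint, Joint]],
    min_conf: float = 0.5,  # hypothetical confidence threshold
) -> Tuple[int, int]:
    """Return (upper_cut, lower_cut): rows [0, upper_cut) are part 1,
    [upper_cut, lower_cut) part 2 and [lower_cut, height) part 3."""
    fallback = (height // 3, 2 * height // 3)  # fixed-size split
    if shoulders is None or hips is None:
        return fallback
    joints = (*shoulders, *hips)
    if any(conf < min_conf or not 0 <= y < height for _x, y, conf in joints):
        return fallback
    # Average height of the two shoulder joints and of the two hip joints.
    upper = round((shoulders[0][1] + shoulders[1][1]) / 2)
    lower = round((hips[0][1] + hips[1][1]) / 2)
    if not 0 < upper < lower < height:
        return fallback
    return upper, lower
```

The same cut rows would be applied to the picture and to its mask so that the three parts stay aligned.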

3. The pedestrian image generation method based on a deformable structure according to claim 2, characterized in that step 2 comprises the following sub-steps:

2.1 Three independent generation networks are set up, one per generated part, corresponding to the first part, the second part and the third part in step 1;

2.2 For the i-th independent generation network, which comprises a generator G_pi and a discriminator, input the segmented part pedestrian picture x_i, the segmented target pose mask picture p_i and the segmented target pose picture y_i into the generator and the discriminator, and obtain through training the part generated picture G_pi(x_i, p_i) that is consistent with the target pose;

2.3 Repeat step 2.2 for each of the three independent generation networks in turn to obtain all part generated pictures.

4. The pedestrian image generation method based on a deformable structure according to claim 3, characterized in that, in step 3, the structured merging operation comprises the following sub-steps:

3.1 For the three part generated pictures obtained for the first part, the second part and the third part, scale each one according to the size proportions h_{T,i} and w_T of the corresponding part in the original picture, obtaining three scaled part generated pictures;

3.2 According to the positional relationship of the part structure in the original picture, merge the scaled part generated pictures vertically to obtain the structured merged part picture;

3.3 Adjust the color and edge-connection information of the structured merged part picture, where Δh_i is the height offset adjustment and c_i is the color balance adjustment factor of each part of the picture; applying Δh_i and c_i yields a more realistic structured merged picture A_w.

5. The pedestrian image generation method based on a deformable structure according to claim 4, characterized in that step 2.2 comprises the following sub-steps:

2.2a) Input the segmented part pedestrian picture x_i into the generator to obtain the generated picture G_pi(x_i), and input the part pedestrian picture x_i and the target pose mask picture p_i into the generator to obtain the generated picture G_pi(x_i, p_i);

2.2b) Input the part pedestrian picture x_i and the target pose picture y_i into the discriminator to obtain the corresponding discrimination result, and input the generated picture G_pi(x_i, p_i) and the part target pose mask picture p_i into the discriminator to obtain the corresponding discrimination result;

2.2c) Calculate the mask L1 loss function of the part target pose picture y_i, the generated picture G_pi(x_i) and the part target pose mask picture p_i, where ⊙ denotes element-wise multiplication between two matrices of the same size, ||*||₁ is the 1-norm, Mask is the matrix of the target pose mask picture, and the expectation symbol denotes the mean;

2.2d) Calculate the adversarial loss function V_pi between the generated picture and the real picture, where the expectation symbols denote means;

2.2e) Combining the above two loss functions, the loss function L_i of the i-th independent generation network is:

[formula not reproduced]

2.2f) Update the generator by minimizing the loss function L_i;

2.2g) Update the discriminator by maximizing the adversarial loss function;

2.2k) Return to 2.2a) and continue updating until the loss function L_i falls to the threshold or the number of iterations meets the requirement, then output the part generated picture G_pi(x_i, p_i) that is consistent with the target pose.

6. The pedestrian image generation method based on a deformable structure according to claim 5, characterized in that, in step 4, the overall generation operation comprises the following sub-steps:

4.1 Input the pedestrian picture x into the generator G_w to obtain the generated picture G_w(x), and input the pedestrian picture x, the target pose mask picture p and the merged picture A_w into the generator G_w to obtain the generated picture G_w(x, p, A_w);

4.2 Input the target pose picture y into the discriminator D_w to obtain D_w(y), and input the generated picture G_w(x, p, A_w) into the discriminator D_w to obtain D_w(G_w(x, p, A_w));

4.3 Calculate the mask L1 loss function M(G_w) of the target pose picture y, the generated picture G_w(x) and the mask picture p:

[formula not reproduced]

where ⊙ denotes element-wise multiplication between two matrices of the same size and ||*||₁ is the 1-norm;

4.4 Calculate the identity classification loss C(G_w, cl), which uses an identity classification network as guidance:

[formula not reproduced]

where cl is the identity class label of the target person; Q_c = 1 if the class label predicted by the classification network agrees with cl, and Q_c = 0 otherwise; and P(G_w(x, p, A_w)) is the output probability distribution of the classification network;

4.5 Calculate the adversarial loss function V_w:

[formula not reproduced]

4.6 The loss function L_w of the overall generation network is:

L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)

4.7 Update the generator G_w by minimizing the loss function L_w;

4.8 Update the discriminator D_w by maximizing the adversarial loss function V_w(D_w, G_w);

4.9 Return to step 4.1 and continue updating until the loss function L_w falls to an acceptable range or the number of iterations meets the requirement, then output the generated picture G_w(x, p, A_w).
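The alternating update in steps 4.7 to 4.9 can be sketched as a loop over two callbacks. This is a schematic only: `generator_step` and `discriminator_step` are hypothetical callbacks that would each perform one optimizer update, and the "acceptable range" is modeled here as a single upper bound `loss_threshold`.

```python
from typing import Callable, Tuple


def train_alternating(
    generator_step: Callable[[], float],
    discriminator_step: Callable[[], float],
    loss_threshold: float,
    max_iters: int,
) -> Tuple[int, float]:
    """Alternate updates until L_w <= loss_threshold or max_iters is reached.

    generator_step performs one update that lowers L_w and returns the new L_w;
    discriminator_step performs one update that raises V_w and returns V_w.
    Both are hypothetical callbacks supplied by the caller.
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    loss = float("inf")
    for iteration in range(1, max_iters + 1):
        loss = generator_step()      # step 4.7: minimize L_w
        discriminator_step()         # step 4.8: maximize V_w
        if loss <= loss_threshold:   # step 4.9: stopping rule
            return iteration, loss
    return max_iters, loss
```

The function returns the number of iterations run and the last generator loss, which makes it easy to see which stopping condition applied.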

7. The pedestrian image generation method based on a deformable structure according to claim 6, characterized in that, in step 1, the mask extraction operation is specifically:

For the input picture, use a mask detection algorithm to obtain the corresponding mask picture, in which the detected object is uniformly white and the background is uniformly black.
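The white-object, black-background convention can be illustrated by thresholding a per-pixel foreground score. The claim does not name the mask detection algorithm, so the sketch assumes its output is a foreground-probability map; the name `to_binary_mask` and the 0.5 threshold are hypothetical.

```python
from typing import List


def to_binary_mask(
    foreground_prob: List[List[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Map foreground probabilities to 255 (white object) or 0 (black background)."""
    return [
        [255 if p >= threshold else 0 for p in row]
        for row in foreground_prob
    ]
```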

8. The pedestrian image generation method based on a deformable structure according to claim 4, characterized in that, in step 3, A_w is computed as:

[formula not reproduced]

where h_T and w_T are the height and width of the target picture; h_{T,i} is the height of the i-th body part in the target picture; R(pic, h, w) denotes resizing a picture to h*w; and O(h*w) is the zero matrix of size h*w. The part pictures are rearranged according to the part structure relations of the target picture. To keep the joins between parts smooth, Δh_i is the height offset adjustment, and c_i is the color balance adjustment factor for each part of the picture.
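One way to read the merging operation is sketched below for single-channel pictures. It is an interpretation rather than the claimed formula: R is taken to be a nearest-neighbour resize, c_i is applied as a multiplicative factor, Δh_i as a vertical shift in rows, and parts are pasted from top to bottom onto the zero matrix O(h_T*w_T). The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def resize_nearest(pic: Matrix, h: int, w: int) -> Matrix:
    """R(pic, h, w): nearest-neighbour resize of a 2-D picture to h x w."""
    src_h, src_w = len(pic), len(pic[0])
    return [
        [pic[r * src_h // h][c * src_w // w] for c in range(w)]
        for r in range(h)
    ]


def structured_merge(
    parts: Sequence[Matrix],
    part_heights: Sequence[int],   # h_{T,i}
    h_t: int,                      # h_T
    w_t: int,                      # w_T
    dh: Sequence[int],             # Δh_i, height offset per part
    c: Sequence[float],            # c_i, color balance factor per part
) -> Matrix:
    """Paste scaled, color-adjusted parts top to bottom onto a zero canvas."""
    canvas = [[0.0] * w_t for _ in range(h_t)]  # O(h_T * w_T)
    top = 0
    for pic, h_i, dh_i, c_i in zip(parts, part_heights, dh, c):
        scaled = resize_nearest(pic, h_i, w_t)
        for r, row in enumerate(scaled):
            target = top + dh_i + r
            if 0 <= target < h_t:  # rows shifted off the canvas are dropped
                canvas[target] = [c_i * v for v in row]
        top += h_i
    return canvas
```

In this reading a positive Δh_i moves a part downward, so a later part overwrites any rows it shares with an earlier one.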

9. A pedestrian image generation device based on a deformable structure, characterized in that it comprises:

Image preprocessing module: for the input original pedestrian picture and target pose picture, perform the segmentation operation according to the part structure and the mask extraction operation on each picture, obtaining three groups of preprocessed part pedestrian mask pictures, part target pose mask pictures, part pedestrian pictures and part target pose pictures;

Part generation module: preprocess the segmented part pedestrian pictures with the part pedestrian mask pictures, and perform the part generation operation on the part target pose mask pictures, part pedestrian pictures and part target pose pictures to obtain three part generated pictures;

Structured merging module: perform the structured merging operation on the three part generated pictures obtained by the part generation operation to obtain a structured merged picture;

Overall generation module: take the structured merged picture, the original picture and the target pose as input, and perform the overall generation operation to obtain the final generated pedestrian image.

10. The pedestrian image generation device based on a deformable structure according to claim 9, characterized in that both the part generation module and the overall generation module include a generator and a discriminator.