CN110288677A - Pedestrian image generation method and device based on a deformable structure - Google Patents

Info

Publication number
CN110288677A
Authority
CN
China
Prior art keywords
picture
pedestrian
mask
target posture
pictures
Prior art date
Legal status
Granted
Application number
CN201910425357.7A
Other languages
Chinese (zh)
Other versions
CN110288677B (en)
Inventor
田永鸿
常亦谦
翟云鹏
史业民
王耀威
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN201910425357.7A
Publication of CN110288677A
Application granted
Publication of CN110288677B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of image generation, and in particular to a pedestrian image generation method and device based on a deformable structure. The method comprises the following steps: step 1, performing a segmentation operation according to the part structure and a mask extraction operation on a pedestrian picture and a target posture picture; step 2, performing a part generation operation to obtain part generation pictures; step 3, performing a structured merging operation on the part generation pictures to obtain a structured merged picture; step 4, performing an overall generation operation to obtain the generated picture. While taking the deformable structure of the human body into account, the invention reduces the training cost and improves the performance of the algorithm.

Description

Pedestrian image generation method and device based on deformable structure
Technical Field
The invention relates to the field of image generation, in particular to a pedestrian image generation method and device based on a deformable structure.
Background
Converting one pedestrian picture into another according to a given pose is known as the pedestrian image generation problem. It is a subfield of image generation and, compared with general image generation, it is more complex and challenging because it must handle more complex scenes and a wide variety of deformable postures.
The pedestrian image generation problem can be approached with conventional image generation ideas. For example, a conditional generative adversarial network can take a whole-body source picture of a person as the condition that guides the network to generate a picture in a new posture with the appearance of the source picture; alternatively, a cycle-consistent generative adversarial network can replace the background and illumination of a pedestrian picture, generating a pedestrian picture in a new posture and environment while preserving the person's characteristics. The biggest problem with these methods is that they are difficult to train: the human body is a highly complex deformable object, and learning such a complicated picture conversion relationship requires an extremely large number of training samples.
Introducing human body information into the generation process is a better solution, for example using posture information as part of the input to provide prior guidance. The key to the deformable complexity of the human body is the diversity of postures, and prior guidance from posture information can effectively reduce the generation complexity, so that more realistic pedestrian pictures can be generated. However, the same problem remains: whole-body posture conversion is still complex, and a large number of training samples is still needed to generate realistic pictures.
Disclosure of Invention
The embodiments of the invention provide a pedestrian image generation method and device based on a deformable structure, which reduce the training cost and improve the performance of the algorithm while taking the deformable structure of the human body into account.
According to a first aspect of the embodiments of the invention, the pedestrian image generation method based on a deformable structure specifically includes the following steps:
step one, for an input pedestrian picture and an input target posture picture, performing a segmentation operation on both pictures according to the part structure to obtain part pedestrian pictures and part target posture pictures, and performing a mask extraction operation on the pedestrian picture, the target posture picture, the part pedestrian pictures and the part target posture pictures to obtain a pedestrian mask picture, a target posture mask picture, part pedestrian mask pictures and part target posture mask pictures;
step two, preprocessing the part pedestrian pictures, and then performing a part generation operation on the preprocessed part pedestrian pictures, the part target posture pictures and the part target posture mask pictures to obtain part generation pictures;
step three, performing a structured merging operation on the part generation pictures obtained by the part generation operation of step two to obtain a structured merged picture;
step four, preprocessing the original pedestrian picture, taking the preprocessed pedestrian picture, the merged picture of step three and the target posture picture as input, and then performing an overall generation operation to obtain a generated picture.
In step one, the segmentation operation specifically includes the following steps:
1.1 applying a joint point detection algorithm to the pedestrian picture and the target posture picture to find the joint points of each input picture;
1.2 judging whether the extracted joint points are usable according to their positions and confidence scores;
1.3 if the joint points are usable, dividing the picture into 3 parts according to the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the part above the average shoulder height is the first part, the part between the average shoulder height and the average hip height is the second part, and the part below the average hip height is the third part; if the joint points are not usable, dividing the picture into 3 parts of fixed size, which are the first, second and third parts from top to bottom.
In step two, the following substeps are specifically included:
2.1 dividing the network into 3 independent generation networks according to the part to be generated, corresponding respectively to the first, second and third parts of step one;
2.2 the ith independent generation network comprises a generator G_pi and a discriminator D_pi; the segmented part pedestrian picture x_i, the segmented target posture mask picture p_i and the segmented target posture picture y_i are input into the generator G_pi and the discriminator D_pi, and training outputs a part generation picture G_pi(x_i, p_i) consistent with the target posture;
2.3 repeating step 2.2 for each of the 3 independent generation networks in turn to obtain all the part generation pictures.
In step three, the structured merging operation comprises the following substeps:
3.1 for the 3 generated part pictures corresponding to the first, second and third parts, scaling the generated part pictures according to the size ratios h_T,i and w_T of the different parts in the original picture to obtain 3 scaled generated part pictures;
3.2 longitudinally combining the scaled part pictures according to the positional relation of the part structure in the original picture to obtain a structured merged part generation picture;
3.3 adjusting the color and edge connection of the structured merged part generation picture, where Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of each part, to obtain a more realistic structured merged picture A_w.
Step 2.2 specifically comprises the following substeps:
2.2a) inputting the segmented part pedestrian picture x_i into the generator G_pi to obtain the generated picture G_pi(x_i); inputting the part pedestrian picture x_i and the target posture mask picture p_i into the generator G_pi to obtain the generated picture G_pi(x_i, p_i);
2.2b) inputting the part pedestrian picture x_i and the target posture picture y_i into the discriminator D_pi to obtain D_pi(x_i, y_i); inputting the generated picture G_pi(x_i, p_i) and the part target posture mask picture p_i into the discriminator D_pi to obtain D_pi(G_pi(x_i, p_i), p_i);
2.2c) calculating the mask L1 loss function between the part target posture picture y_i and the generated picture G_pi(x_i), weighted by the part target posture mask picture p_i, where ⊙ denotes element-wise multiplication of two matrices of the same size and ||·||_1 is the 1-norm; calculating the loss function V_pi of the generated picture G_pi(x_i) against the real picture, where Mask is the target posture mask picture matrix and E[·] denotes the mean;
2.2d) calculating the adversarial loss function, where E[·] denotes the mean;
2.2e) combining the two loss functions, the loss function L_i of the ith independent generation network is:
2.2f) updating the generator G_pi by minimizing the loss function L_i;
2.2g) updating the discriminator D_pi by maximizing the adversarial loss function;
2.2k) returning to 2.2a) and continuing to update until the loss function L_i falls to a threshold or the number of iterations meets the requirement, then outputting the part generation picture G_pi(x_i, p_i) consistent with the target posture.
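The formulas for steps 2.2c to 2.2e are not reproduced in this text. As a sketch only, assuming the standard conditional-GAN form suggested by the discriminator inputs named in steps 2.2a and 2.2b, the masked L1 term, the adversarial value and their sum L_i could be computed as below. The function names are hypothetical, and summing the two terms with equal weight is an assumption.

```python
import numpy as np


def masked_l1(target, generated, mask):
    """Assumed form M = E[ || (y_i - G(x_i)) ⊙ p_i ||_1 ].

    target, generated: N x H x W x C batches; mask: N x H x W x 1 with values in {0, 1}.
    The 1-norm is taken per sample and then averaged over the batch (the mean E).
    """
    diff = (target - generated) * mask  # ⊙: element-wise product
    return float(np.abs(diff).reshape(len(diff), -1).sum(axis=1).mean())


def adversarial_value(d_real, d_fake, eps=1e-8):
    """Assumed form V = E[log D(x_i, y_i)] + E[log(1 - D(G(x_i, p_i), p_i))].

    d_real / d_fake: discriminator probabilities on real and generated pairs.
    The discriminator maximizes V; the generator minimizes it.
    """
    d_real = np.clip(d_real, eps, 1 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))


def part_loss(target, generated, mask, d_real, d_fake):
    """L_i = V + M, assuming equal weighting of the two terms."""
    return adversarial_value(d_real, d_fake) + masked_l1(target, generated, mask)
```

In a real training setup these quantities would be computed on framework tensors so that gradients can flow; NumPy is used here only to make the arithmetic explicit.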
In step four, the overall generation operation comprises the following substeps:
4.1 inputting the pedestrian picture x into the generator G_w to obtain the generated picture G_w(x); inputting the pedestrian picture x, the target posture mask picture p and the merged picture A_w into the generator G_w to obtain the generated picture G_w(x, p, A_w);
4.2 inputting the target posture picture y into the discriminator D_w to obtain D_w(y); inputting the generated picture G_w(x, p, A_w) into the discriminator D_w to obtain D_w(G_w(x, p, A_w));
4.3 calculating the mask L1 loss function M(G_w) of the target posture picture y, the generated picture G_w(x) and the mask picture p, where ⊙ denotes element-wise multiplication of two matrices of the same size and ||·||_1 is the 1-norm;
4.4 computing the loss of an identity classification network used as guidance:
where cl is the identity class label of the target person; Q_c = 1 if the class c predicted by the classification network is consistent with cl, otherwise Q_c = 0; and P(G_w(x, p, A_w)) is the output probability distribution of the classification network;
4.5 calculating the adversarial loss function V_w;
4.6 the loss function L_w of the overall generation network is:
L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)
4.7 updating the generator G_w by minimizing the loss function L_w;
4.8 updating the discriminator D_w by maximizing the adversarial loss function V_w(D_w, G_w);
4.9 returning to step 4.1 and continuing to update until the loss function L_w falls to an acceptable range or the number of iterations meets the requirement, then outputting the generated picture G_w(x, p, A_w).
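The formula for C(G_w, cl) in step 4.4 is not reproduced here. Reading Q as the one-hot encoding of the label cl (an interpretation, not stated verbatim), C becomes the familiar cross-entropy C = -Σ_c Q_c log P_c. A minimal sketch under that assumption, with a hypothetical function name and a probability vector standing in for the classifier output:

```python
import numpy as np


def identity_loss(probs: np.ndarray, cl: int, eps: float = 1e-12) -> float:
    """Cross-entropy C = -sum_c Q_c * log P_c, with Q one-hot at the identity label cl.

    probs: output probability distribution P(G_w(x, p, A_w)) of an identity
    classification network over all identity classes (assumed to sum to 1).
    """
    q = np.zeros_like(probs, dtype=float)
    q[cl] = 1.0  # Q_c = 1 only for the target person's identity class
    return float(-(q * np.log(probs + eps)).sum())
```

The loss is small when the classifier assigns high probability to the target identity, which is what lets it act as identity-preserving guidance for G_w.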
In step one, the mask extraction operation specifically comprises the following steps:
for each input picture, applying a mask detection algorithm to obtain the corresponding mask picture; on the mask picture, detected objects are uniformly white and the background is uniformly black.
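The text does not name the mask detection algorithm. As a minimal sketch, assuming it yields a per-pixel foreground score in [0, 1], the white/black convention amounts to a threshold; the 0.5 threshold and the name `to_binary_mask` are hypothetical choices.

```python
import numpy as np


def to_binary_mask(score_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn an H x W foreground score map into a white/black mask picture.

    Assumption: score_map comes from some mask detection algorithm and holds
    values in [0, 1]; the threshold is illustrative, not taken from the patent.
    """
    # Detected object pixels become white (255); background pixels become black (0).
    return np.where(score_map > threshold, 255, 0).astype(np.uint8)
```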
In step three, A_w is calculated as follows:
where h_T and w_T are the height and width of the target picture and h_T,i is the height of the ith body part of the target picture; R(pic, h, w) denotes resizing a picture to h × w, and O(h × w) denotes a zero matrix of size h × w. The part pictures are repositioned according to the structural relationship of the parts in the target picture. To ensure smooth connections between parts, Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of each part.
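The A_w formula itself is missing from this text. The sketch below shows one plausible assembly from the symbols defined above. Three choices are assumptions: nearest-neighbour interpolation for R(pic, h, w), c_i acting as a multiplicative factor, and each part being pasted onto the zero canvas O(h_T × w_T) shifted down by Δh_i. The function names are hypothetical.

```python
import numpy as np


def resize_nearest(pic: np.ndarray, h: int, w: int) -> np.ndarray:
    """R(pic, h, w): resize an H x W x C picture to h x w by nearest-neighbour sampling."""
    rows = np.arange(h) * pic.shape[0] // h
    cols = np.arange(w) * pic.shape[1] // w
    return pic[rows][:, cols]


def structured_merge(parts, part_heights, w_t, offsets=None, color_factors=None):
    """Assemble a merged picture A_w from three generated part pictures.

    parts         : generated head, upper-body and lower-body pictures (H_i x W_i x C)
    part_heights  : target part heights h_T,i; their sum is the target height h_T
    w_t           : target width w_T
    offsets       : per-part vertical shifts delta_h_i (default 0)
    color_factors : per-part multiplicative colour factors c_i (default 1)
    """
    n = len(parts)
    offsets = [0] * n if offsets is None else offsets
    color_factors = [1.0] * n if color_factors is None else color_factors
    h_t = sum(part_heights)
    canvas = np.zeros((h_t, w_t, parts[0].shape[2]))  # zero matrix O(h_T x w_T)
    top = 0
    for pic, h_i, dh, c in zip(parts, part_heights, offsets, color_factors):
        resized = resize_nearest(pic, h_i, w_t).astype(float) * c
        start = top + dh
        # Clip to the canvas; where shifted parts overlap, the later part wins.
        lo, hi = max(start, 0), min(start + h_i, h_t)
        if hi > lo:
            canvas[lo:hi] = resized[lo - start:hi - start]
        top += h_i
    return canvas
```

Letting the later (lower) part overwrite any overlap is only one simple policy; blending the overlapping rows would be an equally reasonable way to smooth the seams.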
A pedestrian image generation apparatus based on a deformable structure, comprising:
an image preprocessing module: for the input original pedestrian picture and target posture picture, performs the segmentation operation according to the part structure and the mask extraction operation, obtaining three preprocessed groups, each containing a part pedestrian mask picture, a part target posture mask picture, a part pedestrian picture and a part target posture picture;
a part generation module: preprocesses the segmented part pedestrian pictures using the part pedestrian mask pictures, and performs the part generation operation on the part target posture mask pictures, the part pedestrian pictures and the part target posture pictures to obtain three part generation pictures;
a structured merging module: performs the structured merging operation on the three part generation pictures obtained by the part generation operation to obtain a structured merged picture;
an overall generation module: takes the structured merged picture, the original picture and the target posture as input and performs the overall generation operation to obtain the final generated pedestrian picture.
The part generation module and the overall generation module each comprise a generator and a discriminator.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
The complex pedestrian image generation problem is decomposed into posture conversion problems for several part pictures. This reduces the number of training samples the generation network requires, uses more local features as generation cues, and improves the quality of the generated pictures while remaining efficient. Specifically: the segmentation and mask extraction operations apply prior processing consistent with human body characteristics; the part generation operation generates the different part pictures, decomposing the complex whole-body posture correspondence; the structured merging operation combines the generated part pictures to provide strong guidance for whole-body generation; and the overall generation operation generates a more realistic and credible pedestrian image while preserving local and identity information. In conclusion, the method of the embodiments improves both the efficiency and the realism of pedestrian image generation.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a pedestrian image generation method based on a deformable structure according to the present invention;
FIG. 2 is a comparison diagram of a pedestrian image generation method based on a deformable structure according to the present invention;
FIG. 3 is a general schematic diagram of a pedestrian image generation method based on a deformable structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the segmentation and mask extraction operations in pedestrian image generation based on a deformable structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the structured merging operation in pedestrian image generation based on a deformable structure according to an embodiment of the present invention;
FIG. 6 is a block diagram of a pedestrian image generation apparatus based on a deformable structure according to the present invention.
Detailed Description
Example one
As shown in FIGS. 1 and 2, the invention provides a pedestrian image generation method based on a deformable structure, which obtains an optimized target posture picture and specifically comprises the following steps:
step one, as shown in FIG. 4, for an input pedestrian picture and an input target posture picture, performing a segmentation operation on both pictures according to the part structure to obtain part pedestrian pictures and part target posture pictures, and performing a mask extraction operation on the pedestrian picture, the target posture picture, the part pedestrian pictures and the part target posture pictures to obtain a pedestrian mask picture, a target posture mask picture, part pedestrian mask pictures and part target posture mask pictures;
the segmentation operation specifically comprises the following steps:
1.1 applying a joint point detection algorithm to the pedestrian picture and the target posture picture to first find the 14 joint points of each input picture;
1.2 judging whether the extracted joint points are usable according to their positions and confidence scores; they are usable if more than 8 joint points have a confidence greater than 0.6 and the minimum vertical distance between the shoulder and hip joint points exceeds 1/3 of the total picture height;
1.3 if the joint points are usable, dividing the picture into 3 parts according to the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the part above the average shoulder height is the first part, the part between the average shoulder height and the average hip height is the second part, and the part below the average hip height is the third part; if the joint points are not usable, dividing the picture into 3 parts of fixed size: the picture is divided vertically into three consecutive parts, where the first part (head) takes 1/4 of the total picture height, the second part (upper body) takes 3/8, and the third part (lower body) takes 3/8.
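The usability test and the two splitting rules above can be sketched as follows. The joint detector is not specified, so the array layout (14 rows of x, y, confidence) and the shoulder and hip indices are hypothetical and must be adapted to the detector actually used.

```python
import numpy as np

# Hypothetical joint indices for a 14-joint detector; adjust to the real keypoint layout.
SHOULDERS = [8, 9]
HIPS = [2, 3]


def joints_usable(joints: np.ndarray, img_h: int) -> bool:
    """joints: 14 x 3 array of (x, y, confidence) rows."""
    confident = int((joints[:, 2] > 0.6).sum())
    shoulder_y = joints[SHOULDERS, 1]
    hip_y = joints[HIPS, 1]
    # Smallest vertical distance over all shoulder/hip pairs.
    min_gap = np.abs(shoulder_y[:, None] - hip_y[None, :]).min()
    return bool(confident > 8 and min_gap > img_h / 3)


def part_boundaries(joints: np.ndarray, img_h: int) -> tuple[int, int]:
    """Row indices (b1, b2) splitting the picture into head, upper body and lower body."""
    if joints_usable(joints, img_h):
        b1 = int(round(joints[SHOULDERS, 1].mean()))  # average shoulder height
        b2 = int(round(joints[HIPS, 1].mean()))       # average hip height
    else:
        # Fixed split: heights of 1/4, 3/8 and 3/8 of the picture height.
        b1 = img_h // 4
        b2 = img_h // 4 + 3 * img_h // 8
    return b1, b2


def split_parts(img: np.ndarray, b1: int, b2: int):
    """Cut an H x W x C picture into the first, second and third parts."""
    return img[:b1], img[b1:b2], img[b2:]
```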
The mask extraction operation specifically comprises the following steps:
for each input picture, applying a mask detection algorithm to obtain a corresponding mask picture;
setting the detected objects on the mask picture uniformly to white and the background to black, and outputting the result as the final mask picture.
Step two, preprocessing the part pedestrian pictures, and performing the part generation operation on the preprocessed part pedestrian pictures, the part target posture pictures and the part target posture mask pictures to obtain part generation pictures;
the preprocessing multiplies the part pedestrian mask picture by the original part pedestrian picture to obtain a part pedestrian picture without background;
the part generation operation takes as input the segmented part pedestrian picture, the mask picture of the segmented target posture and the segmented target posture picture, and outputs a part generation picture consistent with the target posture; it specifically comprises the following steps:
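In steps two and four, the mask picture is multiplied with the corresponding pedestrian picture to remove the background. A minimal NumPy sketch of that preprocessing, assuming the white (255) / black (0) mask is first normalized to 1/0 (the text does not say how the mask values are scaled); `remove_background` is a hypothetical name.

```python
import numpy as np


def remove_background(picture: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply a picture by its mask so that background pixels become zero.

    picture: H x W x 3 uint8 image; mask: H x W uint8 with 255 = person, 0 = background.
    """
    weights = (mask > 0).astype(picture.dtype)[..., None]  # H x W x 1 with values 0/1
    return picture * weights  # broadcasts the mask over the colour channels
```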
2.1 dividing the network into 3 independent generation networks according to the part to be generated, corresponding respectively to the first, second and third parts of step one;
2.2 the ith independent generation network comprises a generator G_pi and a discriminator D_pi; given the segmented part pedestrian picture x_i, the mask picture p_i of the segmented target posture picture and the segmented target posture picture y_i as input, training outputs a part generation picture G_pi(x_i, p_i) consistent with the target posture;
2.3 repeating step 2.2 for each of the 3 independent generation networks in turn to obtain all the part generation pictures;
step 2.2 specifically comprises the following substeps:
2.2a) inputting the segmented part pedestrian picture x_i into the generator G_pi to obtain the generated picture G_pi(x_i); inputting the part pedestrian picture x_i and the target posture mask picture p_i into the generator G_pi to obtain the generated picture G_pi(x_i, p_i);
2.2b) inputting the part pedestrian picture x_i and the target posture picture y_i into the discriminator D_pi to obtain D_pi(x_i, y_i); inputting the generated picture G_pi(x_i, p_i) and the part target posture mask picture p_i into the discriminator D_pi to obtain D_pi(G_pi(x_i, p_i), p_i);
2.2c) calculating the mask L1 loss function between the part target posture picture y_i and the generated picture G_pi(x_i), weighted by the part target posture mask picture p_i, where ⊙ denotes element-wise multiplication of two matrices of the same size and ||·||_1 is the 1-norm; calculating the loss function V_pi of the generated picture G_pi(x_i) against the real picture, where Mask is the target posture mask picture matrix and E[·] denotes the mean;
2.2d) calculating the adversarial loss function, where E[·] denotes the mean;
2.2e) combining the two loss functions, the loss function L_i of the ith independent generation network is:
2.2f) updating the generator G_pi by minimizing the loss function L_i;
2.2g) updating the discriminator D_pi by maximizing the adversarial loss function;
2.2k) returning to 2.2a) and continuing to update until the loss function L_i falls to a threshold or the number of iterations meets the requirement, then outputting the part generation picture G_pi(x_i, p_i).
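Steps 2.2f to 2.2k (and likewise steps 4.7 to 4.9 of the overall network) describe alternating generator and discriminator updates with a "loss below threshold or enough iterations" stopping rule. A framework-agnostic sketch of that control flow follows; `update_generator` and `update_discriminator` are hypothetical hooks standing in for one optimizer step each.

```python
from typing import Callable


def train_alternating(
    update_generator: Callable[[], float],
    update_discriminator: Callable[[], None],
    threshold: float,
    max_iters: int,
) -> tuple[int, float]:
    """Alternate generator/discriminator updates until the stopping rule is met.

    update_generator     : one step that minimizes the generator loss; returns its value
    update_discriminator : one step that maximizes the adversarial value
    Returns (iterations run, last generator loss).
    """
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = update_generator()   # minimize L_i (or L_w) w.r.t. the generator
        update_discriminator()      # maximize the adversarial value w.r.t. the discriminator
        if loss <= threshold:       # loss has dropped to the threshold
            return it, loss
    return max_iters, loss          # iteration budget reached
```

For example, with a generator hook that returns 5.0, 3.0, 1.0 on successive calls and a threshold of 1.0, the loop stops after the third iteration.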
Step three, performing the structured merging operation on the part generation pictures obtained by the part generation operation;
the structured merging operation comprises the following substeps:
3.1 for the 3 generated part pictures corresponding to the first, second and third parts, scaling the generated part pictures according to the size ratios h_T,i and w_T of the different parts in the original picture to obtain scaled generated part pictures;
3.2 longitudinally combining the 3 scaled generated part pictures into one picture according to the positional relation of the part structure in the original picture, giving the structured merged part generation picture;
3.3 adjusting the color and edge connection of the structured merged part generation picture, where Δh_i is a height offset adjustment obtained through repeated trials and c_i is the color balance adjustment factor for the picture of each part, preferably obtained by dividing the mean color of each of the three pictures by the overall mean color of the three, to obtain a more realistic structured merged picture A_w;
A_w can be obtained with the following formula:
where h_T and w_T are the height and width of the target picture and h_T,i is the height of the ith body part of the target picture; R(pic, h, w) denotes resizing a picture to h × w, and O(h × w) denotes a zero matrix of size h × w. The part pictures are repositioned according to the structural relationship of the parts in the target picture. To ensure smooth connections between parts, Δh_i is a height offset adjustment and c_i is the color balance adjustment factor for the picture of each part, as shown in FIG. 5;
Step four, preprocessing the original picture, taking the merged picture, the pedestrian picture and the target posture picture as input, and performing the overall generation operation;
the preprocessing multiplies the pedestrian mask picture by the original pedestrian picture to obtain a pedestrian picture without background;
the inputs required by the overall generation operation are the original picture, the target posture mask picture, the target posture picture and the structured merged part generation picture; the operation comprises the following substeps:
4.1 inputting the pedestrian picture x into the generator G_w to obtain the generated picture G_w(x); inputting the pedestrian picture x, the target posture mask picture p and the merged picture A_w into the generator G_w to obtain the generated picture G_w(x, p, A_w);
4.2 inputting the target posture picture y into the discriminator D_w to obtain D_w(y); inputting the generated picture G_w(x, p, A_w) into the discriminator D_w to obtain D_w(G_w(x, p, A_w));
4.3 calculating the mask L1 loss function M(G_w) of the target posture picture y, the generated picture G_w(x) and the mask picture p, where ⊙ denotes element-wise multiplication of two matrices of the same size and ||·||_1 is the 1-norm;
4.4 computing the loss of an identity classification network used as guidance:
where cl is the identity class label of the target person; Q_c = 1 if the class c predicted by the classification network is consistent with cl, otherwise Q_c = 0; and P(G_w(x, p, A_w)) is the output probability distribution of the classification network;
4.5 calculating the adversarial loss function V_w;
4.6 the loss function L_w of the overall generation network is:
L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)
4.7 updating the generator G_w by minimizing the loss function L_w;
4.8 updating the discriminator D_w by maximizing the adversarial loss function V_w(D_w, G_w);
4.9 returning to step 4.1 and continuing to update until the loss function L_w falls to an acceptable range or the number of iterations meets the requirement, then outputting the generated picture G_w(x, p, A_w).
Example two
As shown in FIG. 3, the pedestrian image generation method based on a deformable structure of the invention specifically comprises the following steps:
step one, applying prior processing consistent with human body characteristics to the pictures through the segmentation and mask extraction operations;
step two, generating the different part pictures through the part generation operation, decomposing the complex whole-body posture correspondence;
step three, combining the generated part pictures through the structured merging operation, providing strong guidance for whole-body generation;
step four, generating a more realistic and credible pedestrian image through the overall generation operation while preserving local and identity information.
The invention provides a pedestrian image generation method based on a deformable structure that decomposes the complex pedestrian image generation problem into posture conversion problems among several part pictures, solving it by divide and conquer. This reduces the number of training samples the generation network requires, uses more local features as generation cues, and improves the quality of the generated pictures while remaining efficient. The structured merging operation in pedestrian image generation based on a deformable structure according to the embodiment is described in detail below.
As shown in FIG. 5, an exemplary flow of the structured merging operation of step three in the embodiment is as follows:
3.1 scaling the 3 generated part pictures obtained by the part generation operation according to the size ratios of the different parts in the original picture;
3.2 longitudinally combining the 3 scaled generated part pictures into one picture according to the positional relation of the part structure in the original picture;
3.3 finely adjusting the positions of the 3 generated part pictures in the merged picture so that their edges connect smoothly, splicing them into a smoother, unified picture; and adjusting the color and brightness weights of the 3 generated part pictures according to the overall color and illumination of the merged picture, splicing them into a color-balanced picture.
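Sub-step 3.3 of the first embodiment states that c_i is preferably the mean color of each part picture divided by the overall mean of the three. A minimal sketch of that computation follows. Two points are assumptions, not stated in the text: the means are taken per color channel, and each part is then divided by its c_i so that its mean moves toward the common mean. The function names are hypothetical.

```python
import numpy as np


def color_balance_factors(parts):
    """c_i = mean colour of part i / mean of the three part means (per channel).

    parts: list of H_i x W x 3 float arrays (assumed not entirely black).
    Returns a 3 x C array of factors.
    """
    part_means = np.stack([p.reshape(-1, p.shape[-1]).mean(axis=0) for p in parts])
    overall = part_means.mean(axis=0)  # overall mean of the three parts
    return part_means / overall


def apply_color_balance(parts, factors):
    # Dividing by c_i pulls each part's mean colour toward the common mean.
    return [p / f for p, f in zip(parts, factors)]
```

For three flat grey parts with values 60, 120 and 180, the factors are 0.5, 1.0 and 1.5, and all three balanced parts end up at 120.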
As shown in fig. 6, a pedestrian image generation apparatus based on a deformable structure according to the present invention includes:
an image preprocessing module: for the input original pedestrian picture and the input target posture picture, respectively carrying out segmentation operation and mask extraction operation on the original pedestrian picture and the target posture picture according to the part structure to obtain three groups of preprocessed part pedestrian mask pictures, part target posture mask pictures, part pedestrian pictures and part target posture pictures;
a part generation module: preprocessing the segmented part pedestrian pictures by using part pedestrian mask pictures, and performing part generation operation on the part target posture mask pictures, the part pedestrian pictures and the part target posture pictures to obtain three part generation pictures;
structural combination module: carrying out structural combination and operation on the three part generation pictures obtained by the part generation operation to obtain a structural combination picture;
an integral generation module: and taking the structural combination picture, the original picture and the target posture as input, and performing integral generation operation to obtain a final pedestrian generation picture.
Firstly, inputting an original pedestrian picture and a target posture picture into an image preprocessing module, and carrying out prior processing on the pictures according with human body characteristics through segmentation operation and mask extraction operation to obtain three groups of preprocessed part pedestrian mask pictures, part target posture mask pictures, part pedestrian pictures and part target posture pictures; inputting each group of the pedestrian pictures and the target posture pictures of the part into a part generating module, performing part generating operation to generate different part pictures, decomposing the complex whole body posture correspondence, and obtaining three part generating pictures of different parts; generating pictures for the three parts obtained by the part generating operation, combining the generated part pictures through an input structure combination module, and providing powerful guidance for the generation of the whole body to obtain a structure combination picture; and taking the structural combination picture, the original picture and the target posture as the input of the integral generation module, and finally generating a more real and credible pedestrian image on the premise of keeping local information and identity information through integral generation operation.
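The data flow between the four modules can be sketched as a plain function. Every callable here is a hypothetical stand-in for a module, not an interface defined by the patent; the sketch only fixes which outputs feed which inputs.

```python
from typing import Any, Callable, Sequence


def generate_pedestrian(
    pedestrian: Any,
    target_pose: Any,
    preprocess: Callable[[Any, Any], Sequence[Any]],
    part_generators: Sequence[Callable[[Any], Any]],
    merge: Callable[[list], Any],
    overall_generator: Callable[[Any, Any, Any], Any],
) -> Any:
    """Chain the four modules of the apparatus (all callables are hypothetical)."""
    # Image preprocessing module: one group of part pictures and masks per body part.
    groups = preprocess(pedestrian, target_pose)
    # Part generation module: one independent generator per body part.
    part_pictures = [gen(group) for gen, group in zip(part_generators, groups)]
    # Structured merging module: combine the three part pictures into A_w.
    merged = merge(part_pictures)
    # Overall generation module: original picture + target posture + merged picture.
    return overall_generator(pedestrian, target_pose, merged)
```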
Preferably, the part generation module and the integral generation module each include a generator and a discriminator.
The generator comprises:
an input processing structure that stacks the multiple input pictures along the third (channel) dimension;
an encoder composed of a plurality of convolutional layers connected in series;
a decoder composed of a plurality of deconvolution layers connected in series;
a U-shaped structure formed by directly connecting the corresponding levels of the encoder and the decoder;
and an output structure that outputs the generated picture and the generation loss.
The discriminator includes:
an input processing structure that receives the picture to be discriminated and the expected label;
a feature extraction network consisting of a plurality of convolutional layers and fully-connected layers;
and an output structure that outputs the discrimination label and the discrimination loss.
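The "third dimension superposition" performed by the generator's input processing structure can be read as channel-wise concatenation, which is common in U-Net-style conditional generators. The following minimal pure-Python sketch works under that reading. The nested-list H×W×C representation and the name `stack_channels` are illustrative choices, not part of the patent.

```python
from typing import List

# A picture is represented as rows -> pixels -> channel values (H x W x C).
Picture = List[List[List[float]]]


def stack_channels(*pictures: Picture) -> Picture:
    """Concatenate same-sized pictures along the third (channel) dimension."""
    if not pictures:
        raise ValueError("at least one picture is required")
    height, width = len(pictures[0]), len(pictures[0][0])
    for pic in pictures:
        if len(pic) != height or any(len(row) != width for row in pic):
            raise ValueError("all pictures must share the same height and width")
    return [
        [[value for pic in pictures for value in pic[r][c]] for c in range(width)]
        for r in range(height)
    ]


# Example: a 1x2 RGB part pedestrian picture x_i and a 1x2 single-channel mask p_i.
x_i = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
p_i = [[[1.0], [0.0]]]
stacked = stack_channels(x_i, p_i)
print(stacked[0][0])  # [0.1, 0.2, 0.3, 1.0]: each pixel now carries 4 channels
```

In a real network the stacked tensor would feed the first convolutional layer of the encoder. Its channel count equals the sum of the input pictures' channel counts.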
The loss function of the generator $G_{p_i}$ of the part generation module is:
$$L_i = V_{p_i}(D_{p_i}, G_{p_i}) + M(G_{p_i})$$
where $V_{p_i}(D_{p_i}, G_{p_i})$ is the adversarial loss and $M(G_{p_i})$ is the mask L1 loss. In these terms:
- $\mathbb{E}$ denotes the mean;
- $y_i$ is the part target posture picture and $p_i$ is the part target posture mask picture;
- $G_{p_i}(x_i)$ is the picture generated by inputting the part pedestrian picture $x_i$ into generator $G_{p_i}$;
- $G_{p_i}(x_i, p_i)$ is the picture generated by inputting $x_i$ and the target posture mask picture $p_i$ into $G_{p_i}$;
- $D_{p_i}(x_i, y_i)$ is the discrimination result obtained by inputting $x_i$ and the target posture picture $y_i$ into discriminator $D_{p_i}$;
- $D_{p_i}(G_{p_i}(x_i, p_i), p_i)$ is the discrimination result obtained by inputting the generated picture and the part target posture mask picture $p_i$ into $D_{p_i}$;
- ⊙ denotes element-wise multiplication of two matrices of the same size, and $\|\cdot\|_1$ is the 1-norm;
- $i$ indexes the part and corresponds to the three segmented parts.
The discriminant function of the discriminator $D_{p_i}$ of the part generation module is the adversarial loss $V_{p_i}(D_{p_i}, G_{p_i})$, with the following notation:
- $\mathbb{E}$ denotes the mean;
- $y_i$ is the part target posture picture and $p_i$ is the part target posture mask picture;
- $G_{p_i}(x_i, p_i)$ is the picture generated by inputting the part pedestrian picture $x_i$ and the target posture mask picture $p_i$ into generator $G_{p_i}$;
- $D_{p_i}(x_i, y_i)$ is the discrimination result obtained by inputting $x_i$ and the target posture picture $y_i$ into discriminator $D_{p_i}$;
- $D_{p_i}(G_{p_i}(x_i, p_i), p_i)$ is the discrimination result obtained by inputting the generated picture and the part target posture mask picture $p_i$ into $D_{p_i}$;
- $i$ indexes the part and corresponds to the three segmented parts.
The generator $G_w$ of the integral generation module has the loss function
$$L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)$$
where $V_w(D_w, G_w)$ is the adversarial loss, $M(G_w)$ is the mask L1 loss and $C(G_w, cl)$ is the identity classification loss.
In these terms:
- $x$ is the pedestrian picture, $y$ is the target posture picture, $p$ is the target posture mask picture and $A_w$ is the structural combination picture;
- $G_w(x)$ is the picture generated by inputting the pedestrian picture $x$ into generator $G_w$;
- $G_w(x, p, A_w)$ is the picture generated by inputting $x$, the target posture mask picture $p$ and the combination picture $A_w$ into $G_w$;
- $D_w(y)$ is the discrimination result obtained by inputting the target posture picture into discriminator $D_w$;
- $D_w(G_w(x, p, A_w))$ is the discrimination result obtained by inputting the generated picture $G_w(x, p, A_w)$ into $D_w$;
- each $\mathbb{E}$ denotes the corresponding mean;
- ⊙ denotes element-wise multiplication of two matrices of the same size, and $\|\cdot\|_1$ is the 1-norm;
- Mask is the matrix corresponding to the target posture mask picture $p$;
- $cl$ is the identity class label of the target person, and $Q_c = 1$ if the class label predicted by the classification network agrees with $cl$, otherwise $Q_c = 0$;
- $P(G_w(x, p, A_w))$ is the output probability distribution of the classification network.
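The definitions above name the ingredients of the identity term $C(G_w, cl)$: the indicator $Q_c$ and the classifier's output distribution $P$. A common way to combine them is the cross-entropy $-\sum_c Q_c \log P_c$. The sketch below assumes that form and treats $Q$ as one-hot at the class equal to $cl$. It then adds the three terms into $L_w$. The patent's exact formula may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def identity_loss(probs: Sequence[float], cl: int) -> float:
    """Assumed cross-entropy C = -sum_c Q_c * log P_c, with Q one-hot at cl."""
    q = [1.0 if c == cl else 0.0 for c in range(len(probs))]
    # Terms with Q_c = 0 contribute nothing, so skip them (this also avoids log(0)).
    return -sum(q_c * math.log(p_c) for q_c, p_c in zip(q, probs) if q_c > 0)


def integral_generator_loss(v_w: float, m_w: float, c_w: float) -> float:
    """L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)."""
    return v_w + m_w + c_w


probs = [0.1, 0.7, 0.2]           # classifier output P(G_w(x, p, A_w))
c_w = identity_loss(probs, cl=1)  # equals -log(0.7), roughly 0.357
print(integral_generator_loss(-1.2, 0.8, c_w))
```

Under this reading, the identity term grows as the classifier becomes less confident that the generated picture shows person $cl$. This pushes $G_w$ to preserve identity information.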
The discriminant function of the discriminator $D_w$ of the integral generation module is the adversarial loss $V_w(D_w, G_w)$, with the following notation:
- $x$ is the pedestrian picture, $y$ is the target posture picture, $p$ is the target posture mask picture and $A_w$ is the structural combination picture;
- $G_w(x, p, A_w)$ is the picture generated by inputting $x$, $p$ and $A_w$ into generator $G_w$;
- $D_w(y)$ is the discrimination result obtained by inputting the target posture picture $y$ into discriminator $D_w$;
- $D_w(G_w(x, p, A_w))$ is the discrimination result obtained by inputting the generated picture into $D_w$;
- each $\mathbb{E}$ denotes the corresponding mean.
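Both kinds of network alternate the same two updates:

- a generator update that minimizes the total loss;
- a discriminator update that maximizes the adversarial loss.

This holds for the part generation networks (claim 5, steps 2.2f to 2.2k) and for the integral generation network (claim 6, steps 4.7 to 4.9). Training stops when the loss reaches a threshold or the iteration budget is exhausted. The framework-agnostic sketch below shows only that control flow. `update_generator` and `update_discriminator` are hypothetical callables that stand in for one optimizer step each.

```python
from typing import Callable, Tuple


def train_until_converged(
    update_generator: Callable[[], float],
    update_discriminator: Callable[[], float],
    threshold: float,
    max_iters: int,
) -> Tuple[int, float]:
    """Alternate generator and discriminator updates until a stopping rule fires.

    update_generator performs one step that minimizes the generator loss
    (L_i or L_w) and returns its new value; update_discriminator performs one
    step that maximizes the adversarial loss V. Returns (iterations, last loss).
    """
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = update_generator()   # minimize the total generator loss
        update_discriminator()      # maximize the adversarial loss
        if loss <= threshold:       # stop once the loss reaches the threshold
            return it, loss
    return max_iters, loss          # otherwise stop when the budget is spent


# Toy usage: a fake generator whose loss shrinks each step.
losses = iter([1.0, 0.5, 0.2, 0.05])
print(train_until_converged(lambda: next(losses), lambda: 0.0, 0.1, 10))  # (4, 0.05)
```

In practice each callable would wrap a forward pass, a loss computation and an optimizer step in a deep learning framework.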
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A pedestrian image generation method based on a deformable structure is characterized by specifically comprising the following steps:
step one, for an input pedestrian picture and an input target posture picture, performing a segmentation operation on both pictures according to the part structure to obtain part pedestrian pictures and part target posture pictures, and performing a mask extraction operation on the pedestrian picture, the target posture picture, the part pedestrian pictures and the part target posture pictures to obtain a pedestrian mask picture, a target posture mask picture, part pedestrian mask pictures and part target posture mask pictures;
step two, preprocessing the part pedestrian picture, and performing part generation operation on the preprocessed part pedestrian picture, the preprocessed part target posture picture and the preprocessed part target posture mask picture to obtain a part generation picture;
step three, performing a structural combination operation on the part generation pictures obtained by the part generation operation in step two to obtain a structural combination picture;
step four, preprocessing the original pedestrian picture, taking the preprocessed pedestrian picture, the structural combination picture from step three and the target posture picture as input, and performing an integral generation operation to obtain a generated picture.
2. The pedestrian image generation method based on the deformable structure as claimed in claim 1, wherein in the first step, the segmentation operation specifically includes the following steps:
1.1 detecting the joint points in the pedestrian picture and in the target posture picture by using a joint point detection algorithm;
1.2 judging whether the extracted joint points are usable according to their positions and confidence scores;
1.3 if the joint points are usable, dividing the picture into 3 parts at the average height of the 2 shoulder joint points and the average height of the 2 hip joint points: the region above the shoulder average height is the first part, the region between the shoulder average height and the hip average height is the second part, and the region below the hip average height is the third part; if the joint points are not usable, dividing the picture into 3 parts of fixed size, which are the first, second and third parts from top to bottom.
3. The pedestrian image generation method based on the deformable structure as claimed in claim 2, wherein the second step specifically comprises the following sub-steps:
2.1 dividing the network into 3 independent generating networks according to different generating parts, wherein the generating networks respectively correspond to the first part, the second part and the third part in the first step;
2.2 for the ith independent generation network, which comprises a generator $G_{p_i}$ and a discriminator $D_{p_i}$, inputting the segmented part pedestrian picture $x_i$, the segmented target posture mask picture $p_i$ and the segmented target posture picture $y_i$ into $G_{p_i}$ and $D_{p_i}$, and training the network so that it outputs a part generation picture $G_{p_i}(x_i, p_i)$ consistent with the target posture;
2.3 repeating step 2.2 for 3 independent generating networks in sequence to obtain all the part generating pictures.
4. The pedestrian image generation method based on the deformable structure as claimed in claim 3, wherein in step three, the structural combination operation comprises the following sub-steps:
3.1 for the 3 generated part pictures corresponding to the first, second and third parts, resizing each generated part picture $G_{p_i}(x_i, p_i)$ according to the part sizes $h_{T,i}$ and $w_T$ to obtain 3 resized generated part pictures;
3.2 stacking the resized parts vertically, according to the positional relationship of the part structure in the original picture, into a structurally merged generated picture;
3.3 adjusting the color and edge-connection information of the structurally merged picture, where $\Delta h_i$ is a height offset adjustment and $c_i$ is the color balance adjustment factor of the picture at each part, and obtaining a more realistic structural combination picture $A_w$ according to $\Delta h_i$ and $c_i$.
5. A pedestrian image generation method based on a deformable structure according to claim 4, characterized in that said step 2.2 comprises in particular the sub-steps of:
2.2a) inputting the segmented part pedestrian picture $x_i$ into generator $G_{p_i}$ to obtain the generated picture $G_{p_i}(x_i)$, and inputting $x_i$ and the part target posture mask picture $p_i$ into $G_{p_i}$ to obtain the generated picture $G_{p_i}(x_i, p_i)$;
2.2b) inputting $x_i$ and the part target posture picture $y_i$ into discriminator $D_{p_i}$ to obtain $D_{p_i}(x_i, y_i)$, and inputting the generated picture $G_{p_i}(x_i, p_i)$ and $p_i$ into $D_{p_i}$ to obtain $D_{p_i}(G_{p_i}(x_i, p_i), p_i)$;
2.2c) calculating the mask L1 loss $M(G_{p_i})$ between the part target posture picture $y_i$ and the generated picture $G_{p_i}(x_i)$ using the part target posture mask picture $p_i$, where ⊙ denotes element-wise multiplication of two matrices of the same size, $\|\cdot\|_1$ is the 1-norm, Mask is the matrix of the target posture mask picture, and $\mathbb{E}$ denotes the mean;
2.2d) calculating the adversarial loss $V_{p_i}(D_{p_i}, G_{p_i})$ of the generated picture against the real picture, where $\mathbb{E}$ denotes the mean;
2.2e) combining the two loss functions, so that the loss function of the ith independent generation network is:
$$L_i = V_{p_i}(D_{p_i}, G_{p_i}) + M(G_{p_i})$$
2.2f) updating generator $G_{p_i}$ by minimizing the loss function $L_i$;
2.2g) updating discriminator $D_{p_i}$ by maximizing the adversarial loss $V_{p_i}(D_{p_i}, G_{p_i})$;
2.2k) returning to 2.2a) and continuing to update until the loss function $L_i$ falls to a threshold or the number of iterations meets the requirement, then outputting the part generation picture $G_{p_i}(x_i, p_i)$ consistent with the target posture.
6. The pedestrian image generation method based on the deformable structure as claimed in claim 5, wherein in step four, the integral generation operation comprises the following sub-steps:
4.1 inputting the pedestrian picture $x$ into generator $G_w$ to obtain the generated picture $G_w(x)$, and inputting the pedestrian picture $x$, the target posture mask picture $p$ and the combination picture $A_w$ into $G_w$ to obtain the generated picture $G_w(x, p, A_w)$;
4.2 inputting the target posture picture $y$ into discriminator $D_w$ to obtain $D_w(y)$, and inputting the generated picture $G_w(x, p, A_w)$ into $D_w$ to obtain $D_w(G_w(x, p, A_w))$;
4.3 calculating the mask L1 loss $M(G_w)$ of the target posture picture $y$ and the generated picture $G_w(x)$ using the target posture mask picture $p$,
where ⊙ denotes element-wise multiplication of two matrices of the same size and $\|\cdot\|_1$ is the 1-norm;
4.4 calculating the identity classification loss $C(G_w, cl)$, which uses an identity classification network as guidance,
where $cl$ is the identity class label of the target person, $Q_c = 1$ if the class label predicted by the classification network agrees with $cl$ and $Q_c = 0$ otherwise, and $P(G_w(x, p, A_w))$ is the output probability distribution of the classification network;
4.5 calculating the adversarial loss $V_w(D_w, G_w)$;
4.6 for the overall generation network, the loss function $L_w$ is:
$$L_w = V_w(D_w, G_w) + M(G_w) + C(G_w, cl)$$
4.7 updating generator $G_w$ by minimizing the loss function $L_w$;
4.8 updating discriminator $D_w$ by maximizing the adversarial loss $V_w(D_w, G_w)$;
4.9 returning to step 4.1 and continuing to update until the loss function $L_w$ falls within an acceptable range or the number of iterations meets the requirement, then outputting the generated picture $G_w(x, p, A_w)$.
7. The pedestrian image generation method based on the deformable structure as claimed in claim 6, wherein in the first step, the mask extracting operation is specifically:
for each input picture, a mask detection algorithm is applied to obtain the corresponding mask picture; on the mask picture, detected objects are uniformly white and the background is uniformly black.
8. The pedestrian image generation method based on the deformable structure as claimed in claim 4, wherein in step three, $A_w$ is calculated from the resized part pictures, with the following notation:
- $h_T$ and $w_T$ are the height and width of the target picture, and $h_{T,i}$ is the height of the ith body part of the target picture;
- $R(pic, h, w)$ denotes resizing a picture to $h \times w$, and $O(h \times w)$ is a zero matrix of size $h \times w$.
The part pictures are repositioned according to the structural relationship of the parts in the target picture. To ensure smooth connections between parts, $\Delta h_i$ is a height offset adjustment and $c_i$ is the color balance adjustment factor of the picture at each part.
9. A pedestrian image generation device based on a deformable structure, comprising:
an image preprocessing module: for the input original pedestrian picture and the input target posture picture, respectively carrying out segmentation operation and mask extraction operation on the original pedestrian picture and the target posture picture according to the part structure to obtain three groups of preprocessed part pedestrian mask pictures, part target posture mask pictures, part pedestrian pictures and part target posture pictures;
a part generation module: preprocesses the segmented part pedestrian pictures using the part pedestrian mask pictures, and performs the part generation operation on the part target posture mask pictures, the part pedestrian pictures and the part target posture pictures to obtain three part generation pictures;
a structural combination module: performs the structural combination operation on the three part generation pictures obtained by the part generation operation to obtain a structural combination picture;
an integral generation module: takes the structural combination picture, the original picture and the target posture as input and performs the integral generation operation to obtain the final generated pedestrian picture.
10. The pedestrian image generation apparatus based on the deformable structure as claimed in claim 9, wherein the part generation module and the integral generation module each comprise a generator and a discriminator.
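The following is a minimal sketch of the structural combination described in claims 4 and 8. It makes these assumptions:

- Pictures are grayscale and stored as nested lists.
- $R(pic, h, w)$ uses nearest-neighbour interpolation.
- $\Delta h_i$ shifts part $i$ vertically on an $O(h_T \times w_T)$ zero canvas, and later parts overwrite earlier ones where they overlap.
- $c_i$ scales the part's intensity.

The patent's exact formula for $A_w$ may differ, so these choices are illustrative only. The same applies to the names `resize_nearest` and `combine_parts`.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # grayscale picture: rows of pixel values


def resize_nearest(pic: Matrix, h: int, w: int) -> Matrix:
    """R(pic, h, w): nearest-neighbour resize of a picture to h x w."""
    src_h, src_w = len(pic), len(pic[0])
    return [[pic[r * src_h // h][c * src_w // w] for c in range(w)] for r in range(h)]


def combine_parts(parts: Sequence[Matrix], part_heights: Sequence[int], w_t: int,
                  offsets: Sequence[int], color_gains: Sequence[float]) -> Matrix:
    """Stack resized parts top to bottom on a zero canvas to form A_w (assumed form)."""
    h_t = sum(part_heights)                     # the three parts cover the picture
    canvas = [[0.0] * w_t for _ in range(h_t)]  # O(h_T x w_T)
    top = 0
    for pic, h_i, dh_i, c_i in zip(parts, part_heights, offsets, color_gains):
        resized = resize_nearest(pic, h_i, w_t)  # R(pic, h_{T,i}, w_T)
        start = top + dh_i                       # height offset adjustment
        for r, row in enumerate(resized):
            if 0 <= start + r < h_t:             # rows shifted off-canvas are dropped
                canvas[start + r] = [c_i * v for v in row]  # color balance factor
        top += h_i
    return canvas


parts = [[[1.0]], [[2.0, 2.0]], [[3.0], [3.0]]]
print(combine_parts(parts, [1, 2, 1], 2, [0, 0, 0], [1.0, 1.0, 0.5]))
# [[1.0, 1.0], [2.0, 2.0], [2.0, 2.0], [1.5, 1.5]]
```

A negative offset for a lower part makes it overlap the part above it. A positive offset leaves a zero-filled gap.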
CN201910425357.7A 2019-05-21 2019-05-21 Pedestrian image generation method and device based on deformable structure Active CN110288677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910425357.7A CN110288677B (en) 2019-05-21 2019-05-21 Pedestrian image generation method and device based on deformable structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910425357.7A CN110288677B (en) 2019-05-21 2019-05-21 Pedestrian image generation method and device based on deformable structure

Publications (2)

Publication Number Publication Date
CN110288677A true CN110288677A (en) 2019-09-27
CN110288677B CN110288677B (en) 2021-06-15

Family

ID=68002453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910425357.7A Active CN110288677B (en) 2019-05-21 2019-05-21 Pedestrian image generation method and device based on deformable structure

Country Status (1)

Country Link
CN (1) CN110288677B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915527A (en) * 2012-10-15 2013-02-06 中山大学 Face image super-resolution reconstruction method based on morphological component analysis
US20140176551A1 (en) * 2012-12-21 2014-06-26 Honda Motor Co., Ltd. 3D Human Models Applied to Pedestrian Pose Classification
CN107423707A (en) * 2017-07-25 2017-12-01 深圳帕罗人工智能科技有限公司 A kind of face Emotion identification method based under complex environment
KR101818129B1 (en) * 2017-04-25 2018-01-12 동국대학교 산학협력단 Device and method for pedestrian recognition using convolutional neural network
CN107808111A (en) * 2016-09-08 2018-03-16 北京旷视科技有限公司 For pedestrian detection and the method and apparatus of Attitude estimation
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108038862A (en) * 2017-12-11 2018-05-15 深圳市图智能科技有限公司 A kind of Interactive medical image intelligent scissor modeling method
CN108154104A (en) * 2017-12-21 2018-06-12 北京工业大学 A kind of estimation method of human posture based on depth image super-pixel union feature
CN108319932A (en) * 2018-03-12 2018-07-24 中山大学 A kind of method and device for the more image faces alignment fighting network based on production
CN108564119A (en) * 2018-04-04 2018-09-21 华中科技大学 A kind of any attitude pedestrian Picture Generation Method
CN108921064A (en) * 2018-06-21 2018-11-30 西安理工大学 Pedestrian based on multi-feature fusion recognition methods again
CN109376582A (en) * 2018-09-04 2019-02-22 电子科技大学 A kind of interactive human face cartoon method based on generation confrontation network
CN109472191A (en) * 2018-09-17 2019-03-15 西安电子科技大学 A kind of pedestrian based on space-time context identifies again and method for tracing
CN109711316A (en) * 2018-12-21 2019-05-03 广东工业大学 A kind of pedestrian recognition methods, device, equipment and storage medium again

Non-Patent Citations (3)

Title
CHANG YIQIAN等: "Bi-directional Re-ranking for Person Re-identification", 《2019 IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR)》 *
LI JIA 等: "Multi-Pose Learning based Head-Shoulder Re-identification", 《2018 IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL》 *
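王浩 (Wang Hao): "Research on Vision-Based Pedestrian Detection Technology", 《China Master's Theses Full-text Database, Information Science and Technology》 *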

Also Published As

Publication number Publication date
CN110288677B (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN109886121B (en) Human face key point positioning method for shielding robustness
Simo-Serra et al. Mastering sketching: adversarial augmentation for structured prediction
CN110543846B (en) Multi-pose face image obverse method based on generation countermeasure network
CN111444889A (en) Fine-grained action detection method of convolutional neural network based on multi-stage condition influence
CN108537743A (en) A kind of face-image Enhancement Method based on generation confrontation network
CN110796080A (en) Multi-pose pedestrian image synthesis algorithm based on generation of countermeasure network
CN114783024A (en) Face recognition system of gauze mask is worn in public place based on YOLOv5
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN112800937A (en) Intelligent face recognition method
CN109902667A (en) Human face in-vivo detection method based on light stream guide features block and convolution GRU
CN110033054A (en) Personalized handwritten form moving method and system based on collaboration stroke optimization
CN113808005A (en) Video-driving-based face pose migration method and device
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
CN115331003A (en) Single-stage instance segmentation method based on contour point representation mask under polar coordinates
CN114240811A (en) Method for generating new image based on multiple images
WO2024099026A1 (en) Image processing method and apparatus, device, storage medium and program product
CN114155540A (en) Character recognition method, device and equipment based on deep learning and storage medium
CN117422978A (en) Grounding visual question-answering method based on dynamic two-stage visual information fusion
CN110288677B (en) Pedestrian image generation method and device based on deformable structure
CN112307889A (en) Face detection algorithm based on small auxiliary network
Wang et al. Dense Hybrid Attention Network for Palmprint Image Super-Resolution
CN114120391B (en) Multi-pose face recognition system and method thereof
CN115761801A (en) Three-dimensional human body posture migration method based on video time sequence information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant