CN111460993B

CN111460993B - Human image generation method based on AND-OR graph AOG

Info

Publication number: CN111460993B
Application number: CN202010244323.0A
Authority: CN
Inventors: 吴炜; 陈灵超
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2023-04-07
Anticipated expiration: 2040-03-31
Also published as: CN111460993A

Abstract

The invention discloses a human image generation method based on an and-or graph AOG (automatic object graph), which mainly solves the problems that the prior art cannot simultaneously meet the requirements of human image data on appearance diversity, posture diversity and use simplicity. The implementation scheme is as follows: learning a human skeleton and or graph AOG model in a two-dimensional human pose dataset, sampling a three-dimensional human pose sample from the model; learning a human appearance and or image AOG model in a self-occlusion-free and background-free human image dataset with a two-dimensional pose label, and sampling a human appearance sample from the model; and constructing a pseudo three-dimensional human body model according to the three-dimensional human posture sample and the human appearance sample, carrying out perspective projection on the model, and sequentially carrying out width adjustment, fusion, interpolation and background addition on a projection result to generate a human image. The invention can effectively finish the human image generation with diversified appearances and postures while ensuring simple and convenient use, and can be used for human image data enhancement of machine learning.

Description

Human image generation method based on AND-OR graph AOG

Technical Field

The invention belongs to the technical field of image processing, and further relates to a human image generation method which can be used for image data enhancement in machine learning.

Background

In recent years, with the development of machine learning techniques, human image data has been used more and more widely. However, due to the variety of human garments and the variety of human poses, the variety of human images is particularly rich. And because the labeling of human image samples required by the prior art is more and more complex, the cost of manual real labeling is higher and higher. Therefore, many human image data sets have the problems of insufficient sample number and insufficient coverage rate, and the performance of a machine learning model based on human image data is directly influenced.

The human image data is acquired, and besides the expensive method of acquiring the real image and performing the real annotation, the method with lower cost can be generated manually.

The most common methods for expanding human image samples are general image data enhancement methods such as flipping, rotation, scaling, color dithering and noise addition, which can easily calculate the label of a new image after transformation through the label of an original image, however, due to the diversity of human images, the method has a limited effect on actually increasing the spatial coverage of the image samples.

A Human image generation method based on a three-dimensional Human body model and clothing textures is provided by Chen W, wang H and Li Y in Synthesis Training Images for Boosting Human 3D position Estimation. Although the generated image samples are indeed various in texture, the contour of all human images generated by the method is the contour of a three-dimensional human body model, which is fatal to some machine learning models sensitive to the contour, and the manufacturing cost of a garment model used by the method is quite expensive, and a more ideal method is to extract information from some input images for generation instead of generation based on a special garment model.

Although a method for generating human image samples based on three-dimensional human body models and multi-view images is proposed in "Learning peer detection models from raw viewing samples" by Pishchulin L, jain a, and Wojek C, this method does extract information from input pictures for image generation, but pictures of multiple views at the same time of the same target are required as input, which is impossible to implement in most cases.

Jain A proposes a human image generation method based on three-dimensional posture and geometric deformation in 'insulated peer detection and position estimation' which comprises the steps of firstly determining a target three-dimensional posture to be transformed according to a three-dimensional posture label of an input image, then projecting the target three-dimensional posture into a target two-dimensional posture, and then deforming the input image to the state of the target two-dimensional posture through a geometric deformation method according to a corresponding relation of joints between the two-dimensional posture of the input image and the target two-dimensional posture so as to generate a target human image. Although the method is low in cost, the diversity of human image samples which can be generated is very limited, the appearance is not changed, the outline details remain the original mode of the input image, and the generated target three-dimensional posture is very limited.

Furthermore, a series of Human Image Generation methods based on Generation of a confrontational network, such as Deformable GANs for dose-based Human Image Generation, have problems of high training cost, poor appearance diversity, and insufficient coverage of three-dimensional postures.

In summary, the prior art has poor appearance diversity, poor posture diversity or high use cost, and cannot simultaneously satisfy the data enhancement requirements of human image data in the aspects of appearance diversity, posture diversity and use convenience.

Disclosure of Invention

The invention aims to solve the defects of the prior art, introduce the concept of an and-or graph into a human image generation task, and provide a human image generation method based on an and-or graph AOG (automatic Generation) so as to simultaneously meet the data enhancement requirements of human image data in the aspects of appearance diversity, posture diversity and use simplicity.

In order to achieve the purpose, the technical scheme of the invention comprises the following steps:

(1) Defining a human skeleton and or map AOG model describing a three-dimensional human pose sample space, inputting a two-dimensional human pose dataset, and using a genetic algorithm to learn the human skeleton and or map AOG model in the two-dimensional human pose dataset;

(2) Defining a human appearance and OR graph AOG model describing a human appearance sample space, inputting a self-occlusion-free and background-free human image dataset with two-dimensional human posture labels, and learning the human appearance and OR graph AOG model in the dataset;

(3) Sampling from the learned human skeleton and/or map AOG model to generate a three-dimensional human pose sample;

(4) Sampling from the learned human appearance and/or map AOG model to generate a human appearance sample;

(5) Constructing a pseudo three-dimensional human body model according to a three-dimensional human posture sample and a human appearance sample obtained by sampling;

(6) Carrying out perspective projection on the pseudo-three-dimensional human body model, and carrying out width adjustment on a projection result to obtain a projection result after the width adjustment;

(7) Carrying out fusion processing on the projection result after the width adjustment to obtain a background-free human image;

(8) Carrying out interpolation processing on the human image without the background to obtain a human image without the background and the empty points;

(9) And adding a background to the human image without the background and the blank points to generate a human image.

According to the invention, the AND-OR graph AOG is used for human image generation, so that a three-dimensional human posture sample space with rich diversity and a human appearance sample space can be directly learned from a two-dimensional human posture data set and a non-self-shielding and non-background human image data set with a two-dimensional human posture label, the human image generation task with diversified appearances and postures can be effectively completed while the use simplicity is ensured, and the training effect of the classifier based on the human image data set is improved.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a no-self-occlusion no-background human image used by the present invention in human appearance and OR graph AOG model learning;

FIG. 3 is a human image generated by the present invention;

FIG. 4 is a graph comparing ROC curves for operating characteristics of classifiers trained using 241 real images and classifiers trained using 241 real images plus human images generated by the present invention, respectively;

FIG. 5 is a comparison graph of the ROC curves of the operating characteristics of a classifier trained using 120 real images and a classifier trained using the human image generated by the present invention.

Detailed Description

Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, the implementation steps of the invention are as follows:

the method comprises the following steps: learning of a human skeleton and/or a graph AOG model is performed in a two-dimensional human pose dataset.

The human skeleton and or map AOG model used in this example defines 19 components, respectively, "head", "upper torso", "lower torso", "left shoulder", "right shoulder", "left hip", "right hip", "left upper arm", "left lower arm", "left hand", "right upper arm", "right lower arm", "right hand", "left upper leg", "left lower leg", "left foot", "right upper leg", "right lower leg", "right foot";

tree-structure dependencies are defined among the 19 components, wherein the root node of the tree structure is a "lower torso" component, each component in the tree structure is a child node of the component on which it depends, and the dependencies of the components are respectively "head" dependent on "upper torso", "upper torso" dependent on "lower torso", "left shoulder" dependent on "upper torso", "right shoulder" dependent on "upper torso", "left hip" dependent on "lower torso", "right hip" dependent on "lower torso", "left upper arm" dependent on "left shoulder", "left lower arm" dependent on "left upper arm", "left hand" dependent on "left lower arm", "right upper arm" dependent on "right shoulder", "right lower arm" dependent on "right upper arm", "right hand" dependent on "right lower arm", "left upper leg" dependent on "left hip", "left lower leg" dependent on "left upper leg", "right upper leg" dependent on "right hip", "right lower leg" dependent on "right upper leg", "right lower leg" dependent on "right upper leg" dependent on "right upper leg", "right lower leg" dependent on "right leg", and "right lower leg dependent on" lower leg;

each of these 19 components defines 13 parameters, denoted r, θ, and,

n、rangeθ、

rangen、μ _θ 、σ _θ 、

μ _n 、σ _n Wherein:

r、θ、

is a spherical coordinate parameter representing a spatial position, r is a spherical coordinate distance parameter, theta is a spherical coordinate latitude angle parameter, and>

is a spherical coordinate longitude angle parameter, n is a parameter representing a normal vector position, mu _θ 、

μ _n Respectively represent theta,

n parameter of expected value, σ, subject to normal distribution _θ 、

σ _n Respectively represents theta and/or>

n is a parameter of standard deviation of normal distribution, and range theta is theta and mu _θ And σ _θ Is taken into consideration and/or is taken into consideration>

Is->

And &>

Range of (1), ran is n, mu _n And σ _n The value range of (a);

r, range θ, of each component in all parameters of the human skeleton and or map AOG model,

The rangen parameter is a hyper-parameter, theta, of each component>

n parameter is a variation parameter, and μ for each component _θ 、σ _θ 、

μ _n 、σ _n The parameters are parameters to be learned in the learning process of the human skeleton and/or graph AOG model;

the dataset used by the learning process of the human skeleton and or graph AOG model is a two-dimensional human pose dataset, each datum in the dataset consisting of 19 components, the 19 components being identical to the 19 components defined in the human skeleton and or graph AOG model, and each component of each datum in the dataset containing two joint point labels, respectively a component starting joint point SP _part And a component termination articulation point EP _part Starting the joint point SP of each component _part And a component termination articulation point EP _part The distance between the two is called the length of the assembly, and the joint point SP of each assembly is started by the assembly _part Pointing element termination articulation point EP _part The vector of (a) is referred to as the direction vector of the component;

the sampled samples of the human skeleton and or map AOG model are three-dimensional human pose samples, one three-dimensional human pose sample is composed of 19 components, the 19 components are the same as the 19 components defined in the human skeleton and or map AOG model, and each component contains three parameters, respectively, a three-dimensional pose component starting point SP _ske End point EP of three-dimensional gesture Assembly _ske And a three-dimensional pose component normal vector N _ske The three parameters use coordinates in a three-dimensional world coordinate system;

the step is to use a genetic algorithm to learn a human skeleton and/or image AOG model in a two-dimensional human posture data set, and the method is realized as follows:

1.1 ) a two-dimensional human posture data set to obtain an optimized target two-dimensional posture statistic F _t ：

1.1.1 ) counting the frequency of occurrence of relative rotation angles of each component in the two-dimensional human posture data set to obtain a first 190-dimensional frequency statistic:

1.1.1a) for each component of each data in the two-dimensional human posture data set, taking a direction vector of the component, if the component is a lower torso, jumping to 1.1.1b), otherwise, finding out the component depended on by the component according to the dependency relationship between the human skeleton and 19 components defined in the AOG model of the graph, and taking the direction vector of the component depended on by the component;

1.1.1b) for each component of each data in the two-dimensional human posture data set, according to the result of 1.1.1a), if the component is a lower torso, calculating a rotation angle required for clockwise rotating from the direction of the vector (0,1) to the direction of the component direction vector, and taking the rotation angle as the relative rotation angle of the component, otherwise, calculating a rotation angle required for clockwise rotating from the direction of the component dependent on the component to the direction of the component direction vector, and taking the rotation angle as the relative rotation angle of the component;

1.1.1c) carrying out frequency statistics on each component of each data in the two-dimensional human posture data set by using the relative rotation angle of the component according to the result of 1.1.1b), wherein the width of the statistical interval of the relative rotation angle of each component is 36 degrees, namely, the range of 360 degrees is divided into 10 intervals for statistics;

1.1.2 ) counting the occurrence frequency of the relative expansion ratio of each component in the two-dimensional human posture data set to obtain a second 190-dimensional frequency statistic:

1.1.2a) for each component of each data in the two-dimensional human posture data set, taking the length of the component, taking the length of a lower trunk component in the same data, taking a spherical coordinate distance parameter r value corresponding to the component in a human skeleton and a lower trunk component in a graph AOG model, and taking a spherical coordinate distance parameter r value corresponding to the human skeleton and the lower trunk component in the graph AOG model;

1.1.2b) for each component of each data in the two-dimensional human posture data set, according to the result of 1.1.1 a), firstly calculating the relative proportion between the length of the component and the value of the spherical coordinate distance parameter r corresponding to the component, namely the relative proportion of the component; then calculating the relative proportion between the length of the lower torso component and the corresponding spherical coordinate distance parameter r value of the lower torso component in the same data, and the relative proportion is called as the relative proportion of the lower torso; then calculating the relative proportion between the relative proportion of the component and the relative proportion of the lower torso, and taking the relative proportion as the relative expansion proportion of the component;

1.1.2c) for each component of each data in the two-dimensional human posture data set, according to the result of 1.1.1 b), if the relative expansion ratio of the component is greater than 1, performing frequency statistics by using a ratio value 1, otherwise, performing frequency statistics by using the relative expansion ratio of the component, wherein the width of a relative expansion ratio statistical interval of each component is 0.1, namely dividing the range from the ratio value 0 to the ratio value 1 into 10 intervals for statistics;

1.1.3 Two 190-dimensional frequency statistics obtained in 1.1.1) and 1.1.2) are connected in series to form a 380-dimensional frequency statistic, and each frequency value in the 380-dimensional frequency statistic is divided by the data volume of the two-dimensional human posture data set to obtain an optimized target two-dimensional posture statistic F _t ；

1.2 Set a viewpoint P in perspective projection parameters _view Visual plane V _view Setting population size pn, cross rate cr, variation rate mr, sampling sample number sn and maximum iteration number wn in genetic algorithm parameters;

1.3 Create initial population: each individual in the population corresponds to a human skeleton and or map AOG model, in which 19 components are defined, each groupThe element comprises 6 parameters to be learned, which are respectively the expected value parameters mu of the normal distribution obeyed by the spherical coordinate latitude angle parameter theta of the element _θ The normal distribution standard deviation value parameter sigma obeyed by the spherical coordinate latitude angle parameter theta of the component _θ Spherical coordinate longitude angle parameter of the assembly

Desired value parameter obeying a normal distribution>

Spherical coordinate longitude and latitude parameter of the assembly>

The criterion-difference value parameter obeying a normal distribution>

The normal distribution of the normal vector position parameter n of the component is obeyed by the expected value parameter mu _n And a normally distributed standard deviation parameter sigma obeyed by a normal vector position parameter n of the component _n Setting the genotype of each individual in the population as a 19 x6 =114-dimensional vector consisting of human skeletons and all parameters to be learned in an AOG model, wherein the initial population has pn individuals, and the genotype value of each individual is obtained by uniformly distributed random values;

1.4 ) crossover: combining individuals in the current population randomly pairwise, and performing cross operation on each combination according to the probability of a cross rate cr, namely half of genotypes of the two individuals in the combination are mutually exchanged;

1.5 Variation: carrying out mutation operation on each individual in the current population according to the probability of the mutation rate mr, namely randomly selecting one component from 19 components defined by a human skeleton and an AOG model, and carrying out uniformly distributed random dereferencing on 6 parameters corresponding to the component again;

1.6 Calculate fitness of each individual in the current population:

1.6.1 ) constructing a corresponding human skeleton and or map AOG model according to the genotype of the current operation individual, and generating sn three-dimensional human posture samples by using the human skeleton and or map AOG model for sampling, wherein the method comprises the following steps:

1.6.1a) taking the currently operated human skeleton and map AOG model, and for each component defined in the human skeleton and map AOG model, taking the mean parameter mu corresponding to the component _θ 、

And mu _n Taking the standard deviation parameter sigma corresponding to the component _θ 、

And σ _n Taking the value range parameter range theta and phi corresponding to the component>

And rangen, in μ of the assembly _θ And σ _θ A normal distribution is constructed as the mean and standard deviation, randomly sampling a theta value from the normal distribution that meets the range theta range requirement to determine whether the component is->

And &>

Constructing a normal distribution as the mean and standard deviation from which a sample satisfying->

Is claimed in>

Value of, in μ of the component _n And σ _n Constructing a normal distribution as a mean value and a standard deviation, and randomly sampling an n value meeting the range requirement of rangen from the normal distribution;

1.6.1b) according to the human skeleton and Or map AOG modelA tree structure of dependency relationships between 19 components is defined, starting from the root node in breadth-first traversal order, using θ, sampled in 1.6.1a),

n value sequentially calculates three-dimensional gesture component starting point SP of each component _ske Three-dimensional pose component end point EP _ske Three-dimensional attitude component normal vector N _ske The implementation is as follows:

1.6.1b1) if the current component is 'lower torso', jumping to 1.6.1b2), otherwise, finding the component depended on by the current component according to the dependency relationship between the human skeleton and 19 components defined by the graph AOG model, and taking the starting point SP of the three-dimensional posture component of the component depended on by the current component _ske Called dependent component starting point, taking the three-dimensional gesture component end point EP of the component on which the current component depends _ske Called dependent component endpoint, taking the normal vector N of the three-dimensional pose component of the component on which the current component depends _ske The vector is called a dependent component normal vector, and the direction of the vector from the dependent component starting point to the dependent component end point is called a dependent component direction;

1.6.1b2) if the current component is "lower torso" or "left hip" or "right hip", let the three-dimensional pose component starting point SP of the current component be _ske The position of the three-dimensional posture component starting point SP of the current component is the same as the position of the origin of the three-dimensional world coordinate system, otherwise, the three-dimensional posture component starting point SP of the current component is enabled _ske The same as the position of the dependent component endpoint;

1.6.1b3) if the current component is the lower torso, jump to 1.6.1b4), otherwise, calculating a uniquely determined coordinate system according to the results of 1.6.1b1) and 1.6.1b2), and referring the coordinate system as the current coordinate system, the origin position of the current coordinate system and the three-dimensional posture component starting point SP of the current component obtained in 1.6.1b2) _ske The x-axis direction of the current coordinate system is the direction of the dependent component normal vector in 1.6.1b 1), the z-axis direction of the current coordinate system is the direction of the dependent component in 1.6.1b 1), the y-axis direction of the current coordinate system is obtained by cross product operation of the z-axis direction and the x-axis direction, and the coordinate axis base vector of the current coordinate system is three-dimensionally arrangedThe coordinates in the world coordinate system are respectively expressed as

And &>

1.6.1b4) obtaining the spherical coordinate latitude angle parameter theta and the spherical coordinate longitude angle parameter theta corresponding to the current assembly according to the sampling in 1.6.1a)

And calculating a three-dimensional gesture component end point EP of the current component according to a spherical coordinate distance parameter r corresponding to the current component in the human skeleton and the AOG model _ske Coordinates EP 'in the Current coordinate System' _ske ：/>

Wherein r is a spherical coordinate distance parameter r corresponding to the human skeleton and the current component in the graph AOG model, theta is a value theta corresponding to the current component sampled in 1.6.1a),

is the ≧ corresponding for the current component sampled in 1.6.1a)>

A value;

1.6.1b5) if the current assembly is the lower torso, making the three-dimensional posture assembly end point of the current assembly be EP _ske ＝EP′ _ske Otherwise, the three-dimensional pose component end point EP of the current component is defined _ske Coordinates EP 'in the Current coordinate System' _ske Converting into three-dimensional world coordinate system to obtain three-dimensional posture component end point EP of current component _ske ：

Wherein EP' _ske Is 1.6.1b4) derived three-dimensional pose component end point EP for the current component _ske The coordinates in the current coordinate system are,

and &>

Is the coordinate of the coordinate axis base vector of the current coordinate system obtained by 1.6.1b3) in the three-dimensional world coordinate system, SP _ske Is 1.6.1b2) the three-dimensional pose component starting point of the current component;

1.6.1b6) if the current assembly is the lower trunk, jumping to 1.6.1b7), and otherwise, connecting the origin of the current coordinate system and the three-dimensional posture assembly end point EP of the current assembly obtained by 1.6.1b4) in the current coordinate system _ske Coordinates EP 'in the Current coordinate System' _ske Obtaining a straight line, searching a vector which is perpendicular to the straight line and has the smallest included angle with the x-axis direction of the current coordinate system in the current coordinate system, and expressing the vector as

1.6.1b7) if the current component is the lower torso, jumping to 1.6.1b8), otherwise, taking the n value corresponding to the current component sampled in 1.6.1a), and then taking the straight line obtained in 1.6.1b6) as the rotating axis in the current coordinate system, and taking the vector obtained in 1.6.1b6) as the rotating axis

Rotates n degrees clockwise and the rotated vector->

Is the normal vector N of the three-dimensional posture component of the current component _ske Coordinates in the current coordinate system, expressed as N' _ske ；

1.6.1b8) if the current component is "lower torso", take the current group sampled in 1.6.1a)Calculating the normal vector N of the three-dimensional posture component of the current component according to the N value corresponding to the component _ske ：

N _ske ＝(1*sin(90°)*cos(n),1*sin(90°)*sin(n),1*cos(90°))

Otherwise, obtaining the normal vector N of the three-dimensional posture component of the current component by 1.6.1b7) _ske Coordinate N 'in the current coordinate system' _ske Converting the current component into a three-dimensional world coordinate system to obtain a normal vector N of the three-dimensional posture component of the current component _ske ：

Wherein N' _ske Is 1.6.1b7) of the current component's three-dimensional pose component normal vector N _ske The coordinates in the current coordinate system are,

and &>

1.6.1c) three-dimensional attitude component starting point SP of all components obtained in 1.6.1b) _ske Three-dimensional pose component end point EP _ske Three-dimensional attitude component normal vector N _ske All the components are combined together to be used as a three-dimensional human posture sample obtained by sampling;

1.6.1d) iterating 1.6.1a) to 1.6.1c) until sn three-dimensional human pose samples are sampled;

1.6.2 All three-dimensional human gesture samples generated in 1.6.1) are reduced to two-dimensional human gesture samples, for each three-dimensional human gesture sample, at a viewpoint P _view Under conditions of (a), perspective projection of the three-dimensional human pose sample onto a viewing plane V _view Obtaining a projection result, namely the two-dimensional human posture sample after dimension reduction;

1.6.3 All the two-dimensional human posture samples obtained in 1.6.2) are taken as a data set, and the data set is counted to obtain an individual expression type two-dimensional posture statistic F _c Statistical methods and 1.1) statistical two-dimensional human posture data sets to obtain optimized target two-dimensional posture statistics F _t The method is the same;

1.6.4 Computing an individual's phenotype two-dimensional pose statistic F _c And optimizing target two-dimensional pose statistic F _t Taking the reciprocal of the error as the fitness of the current operation individual;

1.7 Finding out two individuals with lowest fitness in the current population, combining the other individuals in pairs at random, selecting two combinations at random to perform cross operation to obtain two new cross individuals, and updating the two individuals with lowest fitness in the current population by using the two new cross individuals;

1.8 Finding out the current generation optimal individual with highest fitness in the current population, if the current generation optimal individual is the initial generation population, recording the current generation optimal individual as the historical optimal individual, and otherwise, comparing the fitness of the current generation optimal individual with that of the historical optimal individual, and updating the individual with higher fitness as the historical optimal individual;

1.9 Iterate 1.4) to 1.8) until reaching the maximum iteration time wn, and the history optimal individual is obtained and output as a learning result.

Step two: learning of human appearance and/or image AOG models is performed in a self-occlusion free and background free human image dataset with two-dimensional human pose labels.

The human appearance and map AOG model used in this example defines 19 elements, which are the same as the 19 elements defined in the human skeleton and map AOG model, and defines an appearance array for each of the 19 elements, in each of which all appearances of the element are stored, each of the elements stored in the appearance array is composed of three parameters, namely, an element appearance pixel block I _psg And a component appearance starting point SP _psg And end of appearance EP of the assembly _psg Wherein the component appearance starting point SP _psg And end of appearance EP of the assembly _psg Are all component appearance pixel blocks I _psg The coordinates of the middle pixel points; the data set used in the learning process of the human appearance and OR map AOG model is a self-occlusion free and background free human image data set with two-dimensional human pose labels, each data in the data set is composed of two parts, respectively a human region pixel block I _hum And two-dimensional gesture annotation, wherein the two-dimensional gesture annotation is composed of 19 components, the 19 components have the same human appearance as the 19 components defined in the AOG model, and each component contains two joint point annotations, namely the starting point of the two-dimensional gesture component

And two-dimensional gesture component endpoint>

Wherein the two-dimensional gesture component origin SP _pos And two-dimensional gesture Assembly end point EP _pos Are all human region pixel blocks I _hum The coordinates of the middle pixel point are greater or less>

Is SP _pos Is greater than or equal to>

Is SP _pos Is on the ordinate and is greater or less>

Is EP _pos Is greater than or equal to>

Is EP _pos The ordinate of (a);

the step is to carry out the learning of human appearance and/or image AOG model in a non-self-shielding and non-background human image data set with a two-dimensional human posture label, and the following steps are realized:

2.1 ) set the standard length parameter L for each of the 19 components defined by the human appearance and/or map AOG model _s And a standard width parameter W _s ；

2.2 Scaling all image data in a self-occlusion free and background free human image dataset with a two-dimensional human pose tag such that a "lower torso" component length of all image data is equal to a standard length parameter L corresponding to the "lower torso" component _s And (3) equality: the "lower torso" component length for each image data refers to the two-dimensional pose component starting point SP of the "lower torso" component in that image data _pos With two-dimensional gesture component end point EP _pos The distance between the two, each image data is scaled by a factor equal to the standard length parameter L corresponding to the lower torso assembly _s Relative scale to the "lower torso" component length of the image data;

2.3 2.2) the scaled image data are subjected to component appearance extraction operation, and original component appearances corresponding to 19 components are extracted from each image data:

the appearance of each original assembly consists of four parts, namely an assembly pixel block, an assembly starting point, an assembly ending point and a name of the corresponding assembly, and the step is specifically realized as follows;

2.3.1 Calculate the masked area for each component:

for each component in the currently operated image data, the two-dimensional gesture component starting point SP of the component is taken _pos And two-dimensional gesture Assembly end point EP _pos Connecting SP _pos Point and EP _pos Obtaining a line segment by point, drawing a standard width parameter W with the width equal to the component corresponding to the line segment as the central line of the long edge _s The area covered by the rectangle is the mask area of the component, and 19 rectangular mask areas are obtained in total;

according to the two-dimensional gesture component starting point corresponding to each component

Two-dimensional gesture component endpoint

And a standard width parameter W _s Calculating to obtain the component placeCorresponding to the 4 vertex coordinates of the rectangle:

wherein

Is SP _pos Is greater than or equal to>

Is SP _pos Is on the ordinate and is greater or less>

Is EP _pos Is greater than or equal to>

Is EP _pos The ordinate of (a);

2.3.2 For a block I of human area pixels of currently operated-on image data using the mask area of each component obtained in 2.3.1)' _hum Performing mask processing, and collecting all pixel points covered by the mask area of each component into a pixel point array to obtain a mask processing result consisting of 19 pixel point arrays;

2.3.3 Based on the mask processing result, the human region pixel block I of the currently operated image data is processed _hum All the pixel points in the image are divided into three classes, the first class is not coveredPixel points covered by the mask region of any component, the second type is pixel points covered by the mask region of only one component, and the third type is pixel points covered by the mask regions of a plurality of components at the same time;

2.3.4 Taking all the second-class pixel points obtained by 2.3.3) classification as known conditions, and reclassifying all the first-class and third-class pixel points to enable each pixel point to be a pixel point covered by the mask region of only one component;

2.3.5 For each component in the currently operated image data, take the two-dimensional gesture component starting point SP of that component _pos And two-dimensional gesture Assembly end point EP _pos Then, all pixel points covered by the mask area of the component are taken out from the reclassification result of 2.3.4), the three parts and the component name of the component are combined together to obtain the original component appearance of the component, and the original component appearances of 19 components are extracted in total, wherein the two-dimensional gesture component starting point SP of each component _pos Component starting point, two-dimensional gesture component end point EP for each component corresponding to the extracted raw component appearance _pos Corresponding to the extracted component end point of the original component appearance, and corresponding to all pixel points covered by the mask area of each component to the extracted component pixel block of the original component appearance;

2.4 Scaling each original assembly appearance obtained in the step 2.3) in the length direction of the assembly to enable the length of the scaled original assembly appearance to be equal to the standard length parameter L corresponding to the assembly _s And (3) equality:

2.4.1 For each raw component appearance obtained in 2.3), calculating the distance between the component starting point and the component ending point of the raw component appearance, namely the length of the raw component appearance;

2.4.2 Calculate the standard length parameter L corresponding to the original component appearance _s The relative ratio to the length of the original assembly appearance, expressed as scale _y ；

2.4.3 Connecting the starting point and the terminal point of the original assembly appearance assembly to obtain a line segment, and calculating the angle of clockwise rotation required for rotating the line segment to the y-axis direction of the assembly pixel block of the original assembly appearance, wherein the angle is represented as gamma;

2.4.4 And transforming all the coordinates of the component pixel block, the component starting point and the component ending point of the original component appearance by using the following composite transformation matrix to finish the scaling of the component in the length direction:

2.5 The original component appearance obtained by the length scaling in the step 2.4) is updated into a component appearance array corresponding to the human appearance and OR diagram AOG model according to the name of the component to which the original component appearance obtained by the step 2.4) belongs, and the learning of the human appearance and OR diagram AOG model is completed, wherein a component pixel block of the original component appearance corresponds to a component appearance pixel block I of the appearance in the appearance array _psg The starting point of the original element appearance element corresponds to the starting point SP of the appearance element in the appearance array _psg The device appearance end point of the original device appearance corresponds to the device appearance end point EP of the appearance in the appearance array _psg 。

Step three: a three-dimensional human pose sample is generated from a human skeleton and or map AOG model.

In the step I, a three-dimensional human posture sample is generated by sampling the human skeleton and the image AOG model obtained by learning in the step I, and the sampling method is the same as the method for generating sn three-dimensional human posture samples by 1.6.1) sampling.

Step four: a human appearance sample is generated from the human appearance and or map AOG model.

The sampled samples of the human-appearance and-or-map AOG model are human-appearance samples, one human-appearance sample is composed of 19 components, the 19 components are the same as the 19 components defined in the human-appearance and-map AOG model, and each component contains three parameters, respectively a human-appearance component pixel block I _app Human appearance component starting point SP _app And human appearance assembly endpoint EP _app Wherein the human appearance component origin SP _app And human appearance assembly endpoint EP _app Are all human-appearance component pixel blocks I _app In (1)Pixel point coordinates;

the step is to sample from the human appearance and/or map AOG model obtained by learning in the step two to generate a human appearance sample, and the implementation is as follows:

4.1 Taking the human appearance and/or map AOG model obtained by learning in the step two, carrying out once uniformly distributed random sampling on each component appearance array in the human appearance and/or map AOG model, and obtaining the appearance of a sampling result to obtain 19 appearances from different component appearance arrays in total;

4.2 4.1) randomly sampled block I of component appearance pixels per component appearance _psg And a component appearance starting point SP _psg And end of appearance EP of the assembly _psg Blocks of human-appearance component pixels I, each being a corresponding component in a human-appearance sample _app Human appearance component starting point SP _app And human appearance assembly endpoint EP _app And combining to obtain a human appearance sample.

Step five: and constructing a pseudo three-dimensional human body model according to the three-dimensional human posture sample obtained by sampling in the third step and the human appearance sample obtained by sampling in the fourth step.

The pseudo three-dimensional human body model used in this example is a set of individual voxel points, each voxel point in the set includes a three-dimensional world coordinate system coordinate, a three-primary color RGB value, and an attributed human component name, and the process of constructing the pseudo three-dimensional human body model is a process of calculating information of each individual voxel point in the pseudo three-dimensional human body model, and is implemented as follows:

5.1 For each component in the human appearance sample, the human appearance component for that component is pixel-block I _app Placing on the spatial position of the corresponding component in the three-dimensional human gesture sample, i.e. block I of pixels of the human-looking component _app Human appearance component starting point SP on _app Three-dimensional gesture component start point SP aligning corresponding components in three-dimensional human gesture sample _ske The human appearance component pixel block I _app Human appearance assembly endpoint EP on _app Three-dimensional pose component end point EP for corresponding component in point-aligned three-dimensional human pose sample _ske Point, block I of the human-appearance component pixels _app Of a three-dimensional human gesture sample _ske Obtaining the pixel block I of the human appearance component after the placement _app The set of the corresponding voxel points of all the pixel points in the three-dimensional world coordinate system is called as the appearance voxel point set V of the component _app Obtaining a total of 19 sets of appearance voxel points V _app ；

5.2 ) set V of apparent voxel points for each component obtained in 5.1) _app Performing spatial position adjustment to obtain adjusted appearance voxel point set of the component

5.2.1 If the current component is any one of the components "upper torso", "left shoulder", "right shoulder", "left hip", and "right hip", the set of appearance voxel points V of the "lower torso" component is taken from the results of 5.1) _app Reconstructing a set of voxel points by all the voxel point coordinates in the set, called the observation voxel point set of the current component, otherwise, taking the appearance voxel point set V of the current component from the result of 5.1) _app Reconstructing an individual voxel point set by using all the voxel point coordinates in the set, wherein the individual voxel point set is called an observation voxel point set of the current component;

5.2.2 When the current component is any one of the components "upper torso", "left shoulder", "right shoulder", "left hip", and "right hip", the three-dimensional posture component starting point SP corresponding to the "lower torso" component is taken from the three-dimensional human posture sample _ske And three-dimensional pose component end point EP _ske Three-dimensional gesture component starting point SP connecting "lower torso" components _ske And three-dimensional pose component end point EP _ske Obtaining a straight line which is called as the adjusting rotating axis of the current assembly, otherwise, taking the starting point SP of the three-dimensional gesture assembly corresponding to the current assembly from the three-dimensional human gesture sample _ske And three-dimensional pose component end point EP _ske Connecting three-dimensional gesture component starting points SP of the current component _ske And a three-dimensional gesture componentEnd point EP _ske Obtaining a straight line which is called as an adjusting rotating shaft of the current assembly;

5.2.3 Using the adjustment rotation axis of the current component obtained in the step 5.2.2) as a rotation axis, clockwise rotating the observation voxel point set of the current component obtained in the step 5.2.1) to find out a observation voxel point set of the rotated current component on the view plane V _view The angle of rotation at which the visible area is maximized, which is referred to as the adjustment angle of the current component;

5.2.4 ) taking the set V of appearance voxel points of the current component from the results of 5.1) _app Using the adjustment rotation axis of the current component obtained in 5.2.2) as the rotation axis, using the adjustment angle of the current component obtained in 5.2.3) as the clockwise rotation angle, and collecting the appearance voxel points V of the current component _app Clockwise rotation is performed to obtain an adjusted appearance voxel point set of the current component

5.3 5.2) of the adjusted apparent voxel points of a total of 19 components

And combining the two to be used as a pseudo three-dimensional human body model to complete the construction of the pseudo three-dimensional human body model.

Step six: and carrying out perspective projection on the pseudo-three-dimensional human body model, and carrying out width adjustment on the projection result to obtain the projection result after the width adjustment.

And C, performing perspective projection on each individual pixel point in the pseudo three-dimensional human body model obtained in the step V, and performing width adjustment on projection coordinates of components of an upper trunk, a lower trunk, a left shoulder, a right shoulder, a left hip and a right hip in the projection result to enable the width of the components of the upper part of the human body to be matched with the width of the components of the lower part of the human body in the adjusted projection result, so that the following steps are realized:

6.1 Perspective projection: at viewpoint P _view Under the condition of (1), each individual pixel point coordinate in the pseudo three-dimensional human body model obtained in the step five is used

Perspective projection onto viewing plane V _view The resulting projection coordinates are denoted as p _pp ；

6.2 Obtain a correlation width value:

6.2.1 Take the pixel block I of the human appearance component of the 'lower torso' component from the human appearance sample obtained in step four _app The average value of the distances between the boundaries on both sides of the component counted in the pixel block is denoted as w _lb ；

6.2.2 Take the block of human-appearance component pixels I of the "upper left leg" component from the human-appearance sample obtained in step four _app The average value of the distances between the boundaries on both sides of the component counted in the pixel block is denoted as w _lul ；

6.2.3 Take the block of human-appearance component pixels I of the "upper right leg" component from the human-appearance sample obtained in step four _app The average value of the distances between the boundaries on both sides of the component counted in the pixel block is denoted as w _rul ；

6.2.4 Three-dimensional pose component end point EP for the "left hip" component from the three-dimensional human pose sample obtained in step three _ske Called the left hip joint point, and then the three-dimensional posture component end point EP of the right hip component _ske Called right hip joint point, respectively calculates the left hip joint point and the right hip joint point in the visual plane V _view And then calculating the distance between the two parallel projection coordinates, and recording as w _hip ；

6.3 Correlation width value w obtained according to 6.2) _lb 、w _lul 、w _rul And w _hip Calculating the scaling scale of the width adjustment _w ：

6.4 6.1) scaling the projection coordinates of all torso parts in the projection results to the original scale along the width direction of the lower torso assembly _w The number of times of the total number of the components,obtaining the projection result after width adjustment, and implementing the following steps:

6.4.1 Three-dimensional pose component starting point SP of 'lower torso' component taken from the three-dimensional human pose sample obtained in step three _ske And three-dimensional pose component end point EP _ske Calculating the three-dimensional gesture component origin SP using the same method as 6.1) _ske In the viewing plane V _view The three-dimensional pose component end point EP is calculated by the same method as that of 6.1) _ske Connecting the two perspective projection coordinates to obtain a line segment, and calculating to rotate the line segment to the viewing plane V _view The y-axis direction of (a) is the angle of the required clockwise rotation, denoted as β;

6.4.2 P) projection coordinates for all "upper torso", "lower torso", "left shoulder", "right shoulder", "left hip", and "right hip" components in the 6.1) projection results _pp The following composite transformation matrix is used for width adjustment to obtain the projection coordinate p after width adjustment _wa ：

6.4.3 With the width-adjusted projection coordinate p obtained in 6.4.2) _wa Updating 6.1) all corresponding projection coordinates in the projection result.

Step seven: and carrying out fusion processing on the projection result after the width adjustment to obtain a background-free human image.

And performing fusion processing on the projection result after the width adjustment in the step six, selecting a corresponding RGB value for each coordinate position in the view plane, and synthesizing the RGB values of all the selected coordinate positions into a human image without the background, wherein the method is realized as follows:

7.1 For each coordinate position on the view plane, finding out all voxel points projected to the coordinate position in the results of the step six, finding out a voxel point with the maximum x coordinate from the voxel points, if only one voxel point with the maximum x coordinate exists, taking the RGB value of the voxel point as the RGB value of the coordinate position in the view plane, and otherwise, taking the average RGB value of all voxel points with the maximum x coordinate at the coordinate position as the RGB value of the coordinate position in the view plane;

7.2 ) a background-free human image is synthesized from the RGB values at the coordinate positions in the viewing plane obtained in 7.1).

Step eight: and carrying out interpolation processing on the human image without the background to obtain the human image without the background and the empty points.

8.1 Carrying out primary median filtering on the human image without the background obtained in the seventh step to obtain a filtered image;

8.2 And) comparing 8.1) the obtained filtering image with the non-background human image obtained in the seventh step, and if no pixel exists at the coordinate position on the non-background human image and a pixel exists at the same coordinate position on the filtering image at each coordinate position in the non-background human image, adding the pixel at the position on the filtering image into the non-background human image to serve as the pixel at the position, so as to obtain the non-background non-empty-point human image.

Step nine: and adding a background to the human image without the background and the blank points to generate a human image.

9.1 Randomly selecting a natural image with the same size as the human image without the background and the empty point, and taking the natural image as a background image;

9.2 Taking the human image without background and empty point as foreground image, and synthesizing the foreground image and background image to obtain a final human image;

9.3 Output 9.2) the generated human image data samples.

The effects of the present invention can be further illustrated by the following experiments:

1. the experimental conditions are as follows:

the experiment of the invention is carried out in the environment of a Windows 10X 64 system, an Intercore i5-7300HQ, a 2.5GHz processor and an 8GB memory. The programming language is C + +, and the programming environment is VS2013+ opencv2.4.9.

The experiment of the invention skips the first concrete implementation step, and directly gives a group of learned parameters as the result of the first concrete implementation step, and the group of parameters is given in the reference number setting part later.

The data set used in the second step of the experiment specifically implemented in the invention is composed of 12 self-shielding-free human images selected from five data sets of VOC, OTB100, LSP, INRIA and DeepFashion, the 12 self-shielding-free human images are subjected to background removal by an interactive foreground extraction algorithm (GrabCT) based on iterative graph cutting to obtain 12 self-shielding-free background-free human images, and as shown in FIG. 2, the 12 self-shielding-free background-free human images are manually labeled with two-dimensional human posture labels, so that the self-shielding-free background-free human image data set with the two-dimensional human posture labels used in the second step of the experiment specifically implemented in the invention is obtained.

The experiment of the invention selects a real human image in an INRIA dataset as a contrast and a libsvm-3.24 open source code as a test classifier, wherein a train.exe program and a predict.exe program of the libsvm-3.24 are run by using default parameters, and all training images and test images are handed to the libsvm-3.24 to be used in the form of Histogram of Oriented Gradient (HOG) feature vectors.

2. Setting parameters:

in the experiments of the present invention, the hyper-parameters used and the result parameters of the specific implementation steps for the experimental tests are given in this section.

Human skeleton and or figure AOG model components "head", "upper torso", "lower torso", "left shoulder", "right shoulder", "left hip", "right hip", "left upper arm", "left lower arm", "left hand", "right upper arm", "right lower arm", "right hand", "left upper leg", "left lower leg", "left foot", "right upper leg", "right lower leg", and "right foot" have parameters r set to 60,60, 60, 42, 42, 30,30, 72, 72, 30,30, 90,90, 40, 90,90, 40, degrees, respectively; the parameters range θ are set to (0,45), (0,0), (0,0), (95,95), (95,95), (60,60), (60,60), (0,90), (0,150), (0,60), (0,90), (0,150), (0,60), (0,70), (0,135), (30,120), (0,70), (0,135), (30,120), respectively, in degrees; parameter(s)

Set to (-180, 180), (0,0), (0,0), (90,90), (-90 ), (-90, -90), (90,90), (-90,270), (0,0), (-180, 180), (-270,90), (0,0), (-180, 180), (-45,180), (180 ), (0,0), (-180,45), (180 ), (0,0) in degrees, respectively; the parameters range are set to (0,0), (-30,30), (-90,90), (0,0), (0,0), (0,0), (0,0), (-30,90), (0,0), (0,0), (-90,30), (0,0), (0,0), (-45,45), (-60,0), (0,0), (-45,45), (0,60), (0,0), respectively, in degrees.

Viewpoint P _view Set as coordinates (1000,0,0) of a three-dimensional world coordinate system, view plane V _view The plane x =200 in the three-dimensional world coordinate system is set, the population size pn is set to 16, the cross rate cr is set to 86%, the variation rate mr is set to 10%, the number of sampling samples sn is set to 1000, and the maximum number of iterations wn is set to 10000.

The standard length parameters L of the components "head", "upper trunk", "lower trunk", "left shoulder", "right shoulder", "left hip", "right hip", "left upper arm", "left lower arm", "left hand", "right upper arm", "right lower arm", "right hand", "left upper leg", "left lower leg", "left foot", "right upper leg", "right lower leg", "right foot" and "right foot _s Set as 60,60, 60, 42, 42, 30,30, 72, 72, 30, 72, 72, 30,90, 90, 40, 90,90, 40, respectively, in pixels; standard width parameter W _s Are respectively set to 54, 80, 80, 20, 20, 20, 27, 27, 27, 27, 27, 54, 54, 27, 54, 54, 27, 27, the unit is a pixel.

Human skeleton and or figure AOG model of the components "head", "upper torso", "lower torso", "left shoulder", "right shoulder", "left hip", "right hip", "left upper arm", "left lower arm", "left hand", "right upper arm", "right lower arm", "right hand", "left upper leg", "left lower leg", "left foot", "right upper leg", "right lower leg" and "right foot" of the parameter μ _θ Are respectively set to 0,0,0, 95,95, 60,60, 90,0,0, 90,0,0, 60,0, 45,60,0, 45 in degrees; parameter sigma _θ Set to 15,0,0,0,0,0,0, 15, 25, 10, 15, 25, 10, 10, 22, 12, 10, 22, 12, respectively, in degrees; parameter(s)

Set to 0,0,0, 90, -90, -90,90, 90,0,0, -90,0,0, 90, 180,0, -90, 180,0, respectively, in degrees; parameter->

Set to 180,0,0,0,0,0,0, 30,0, 180, 30,0, 180, 22,0,0, 22,0,0, respectively, in degrees; parameter mu _n Set to 0,0,0,0,0,0,0, 30,0,0, -30,0,0,0,0,0,0,0,0, respectively, in degrees; parameter sigma _n Set to 0, 10, 30,0,0,0,0, 10,0,0, 10,0,0,7, 20,0,7, 20,0, respectively, in degrees;

3. and (3) analyzing the experimental content and the result:

experiment 1, 800 human images were generated by repeatedly performing the second to ninth steps of this embodiment under the above-described parameter settings and experimental conditions, and a part of the human images are shown in fig. 3.

Experiment 2, taking the first 241 real human images of the training set in the INRIA data set, and taking the 241 real human images as an independent human image data set which is called an INRIA241 data set; the INRIA241 data set and 800 human images generated by the method of the present invention are combined together to form an independent human image data set, which is called INRIA241+ our data set. Training by using an INRIA241 data set to obtain a Support Vector Machine (SVM) classifier, testing a test set of the INRIA data set by using the classifier, recording an experimental result and drawing a working characteristic curve ROC of a subject; and then training by using an INRIA241+ our data set to obtain an SVM classifier, testing the test set of the INRIA data set by using the classifier, recording an experimental result and drawing an ROC curve. The ROC curves of the two classifiers compare the results, as shown in fig. 4.

Experiment 3, taking the first 120 real human images of the training set in the INRIA dataset, and taking the 120 real human images as an independent human image dataset which is called an INRIA120 dataset; the 800 human images generated by the method of the invention are taken as an independent human image data set, which is called a data set our. Training by using an INRIA120 data set to obtain an SVM classifier, testing a test set of the INRIA data set by using the classifier, recording an experimental result and drawing an ROC curve; and training by using a data set of our to obtain an SVM classifier, testing a test set of the INRIA data set by using the classifier, recording an experimental result and drawing an ROC curve. The ROC curves of the two classifiers compare the results, as shown in fig. 5.

As can be seen from fig. 4 and fig. 5, in experiment 2, the ROC curve quality of the classifier trained by the data set "INRIA241+ our" is significantly better than that of the classifier trained by the data set "INRIA 241"; in experiment 3, the quality of ROC curve of the classifier trained by the data set our "is also obviously better than that trained by the data set INRIA 120.

Experiments 2 and 3 verify the diversity and effectiveness of the human images generated by the invention.

From the two comparative experiments, experiment 2 and experiment 3, the accuracy results of the classification test are shown in table 1.

TABLE 1 different data set training resulting classifier performance

Data set	Classifier accuracy
		INRIA241	81.9272％
INRIA241+our800	91.0746％
		INRIA120	84.99％
our800	90.41％

As can be seen from table 1: in the result of experiment 2, the accuracy of the classifier obtained by training the data set INRIA241 is only 81.9272%, and the accuracy of the classifier obtained by training the data set INRIA241 is 91.0746% by adding the training set INRIA241+ our of the human image sample generated by the method of the present invention, and the training effect of the data set INRIA241+ our is much higher than that of the data set INRIA241, which shows that the human image sample image generated by the method of the present invention is effective and can significantly improve the training effect of the pedestrian classifier. In the result of experiment 3, the accuracy of the classifier trained by the data set "INRIA120" is also lower than that trained by the data set "our" which shows that the generated human images can produce a greater effect in the human classifier training task than the 120 real human images in the data set "INRIA120" under the condition of using only 12 non-self-shielding human images as input, which further proves the diversity of the human images generated by the method of the present invention.

In conclusion, the invention can effectively complete the data enhancement task of human image data, the learning effect of a machine learning model based on human image data is improved. Therefore, the human image generated by the present invention is diverse and effective.

Claims

1. A method for human image generation based on and-or graph AOG, comprising:

2. The method of claim 1, wherein the learning of the human skeleton and or map AOG model is performed in (1) a two-dimensional human pose dataset using a genetic algorithm as follows:

(1.1) counting a two-dimensional human posture data set to obtain an optimized target two-dimensional posture statistic F _t ；

(1.2) setting the viewpoint P in the perspective projection parameters _view Visual plane V _view Setting population size pn, cross rate cr, variation rate mr, sampling sample number sn and maximum iteration number wn in genetic algorithm parameters;

(1.3) creating an initial population: each individual in the population corresponds to a human skeleton and image AOG model, 19 components are defined in the human skeleton and image AOG model, each component comprises 6 parameters to be learned, and the parameters are respectively an expected value parameter mu of normal distribution obeyed by a spherical coordinate latitude angle parameter theta of the component _θ The normal distribution standard deviation value parameter sigma obeyed by the spherical coordinate latitude angle parameter theta of the component _θ Spherical coordinate longitude angle parameter of the assembly

Expected value parameter obeying normal distribution

Spherical coordinate longitude and latitude parameter of the assembly>

The criterion-difference value parameter obeying a normal distribution>

(1.4) crossing: combining individuals in the current population randomly pairwise, and performing cross operation on each combination according to the probability of a cross rate cr, namely half of genotypes of the two individuals in the combination are mutually exchanged;

(1.5) mutation: carrying out mutation operation on each individual in the current population according to the probability of the mutation rate mr, namely randomly selecting one component from 19 components defined by a human skeleton and an AOG model, and carrying out uniformly distributed random dereferencing on 6 parameters corresponding to the component again;

(1.6) calculating the fitness of each individual in the current population;

(1.7) selecting: finding out two individuals with lowest fitness in the current population, combining the other individuals in pairs at random, selecting two combinations at random to perform cross operation to obtain two new cross individuals, and updating the two individuals with lowest fitness in the current population by using the two new cross individuals;

(1.8) finding out the current generation optimal individual with the highest fitness in the current population, if the current generation optimal individual is the initial generation population, recording the current generation optimal individual as a history optimal individual, and if not, comparing the fitness of the current generation optimal individual and the history optimal individual, and updating the individual with higher fitness as the history optimal individual;

and (1.9) iterating (1.4) to (1.8) until the maximum iteration time wn is reached, and outputting the history optimal individual as a learning result.

3. The method of claim 2, wherein (1.1) the two-dimensional human pose dataset is statistically calculated to obtain an optimization target two-dimensional pose statistic F _t The implementation is as follows:

(1.1 a) counting the frequency of occurrence of relative rotation angles of each component in a two-dimensional human posture data set to obtain a first 190-dimensional frequency statistic;

(1.1 b) counting the occurrence frequency of the relative expansion ratio of each component in the two-dimensional human posture data set to obtain a second 190-dimensional frequency counting quantity;

(1.1 c) connecting the two 190-dimensional frequency statistics obtained in the step (1.1 a) and the step (1.1 b) in series to form a 380-dimensional frequency statistic, and dividing each frequency value in the 380-dimensional frequency statistic by the data volume of the two-dimensional human posture data set to obtain an optimized target two-dimensional posture statistic F _t 。

4. The method of claim 2, wherein the fitness for each individual in the current population is calculated in (1.6) as follows:

(1.6 a) constructing a corresponding human skeleton and map AOG model according to the genotype of the currently operated individual, and using the human skeleton and map AOG model for sampling to generate sn three-dimensional human posture samples;

(1.6 b) reducing all of the three-dimensional human gesture samples generated in (1.6 a) into two-dimensional human gesture samples;

(1.6 c) taking all the two-dimensional human posture samples obtained in (1.6 b) as a data set, and counting the data set to obtain an individual expression type two-dimensional posture statistic F _c ；

(1.6 d) calculating the Individual phenotype two-dimensional posture statistic F _c And optimizing target two-dimensional pose statistic F _t And taking the reciprocal of the error as the fitness of the current operation individual.

5. The method of claim 1, wherein the learning of the human appearance and/or map AOG model is performed in (2) a self-occlusion free and background free human image dataset with two-dimensional human pose labels, as follows:

(2.1) set the standard length parameter L for each of the 19 components defined by the human appearance and/or graph AOG model _s And a standard width parameter W _s ；

(2.2) carrying out equal scaling on all image data in the non-self-blocking and non-background human image dataset with the two-dimensional human posture label to ensure that the length of the lower trunk component of all the image data is equal to the standard length parameter L corresponding to the lower trunk component _s Equal;

(2.3) carrying out component appearance extraction operation on the image data after the (2.2) equal scaling, and extracting original component appearances corresponding to 19 components from each image data, wherein each original component appearance consists of four parts, namely a component pixel block, a component starting point, a component ending point and a name of the component to which the original component appearance belongs;

(2.4) zooming each original assembly appearance obtained in the step (2.3) in the length direction of the assembly to enable the length of the zoomed original assembly appearance to be in accordance with the standard length parameter L corresponding to the assembly _s Equal;

and (2.5) updating the original component appearance after the length scaling obtained in the step (2.4) into a corresponding component appearance array in the human appearance and NOR graph AOG model according to the name of the component to which the original component appearance belongs, and finishing the learning of the human appearance and NOR graph AOG model.

6. The method of claim 5, wherein the original component appearance corresponding to each of the 19 components is extracted from each of the image data in (2.3) as follows:

(2.3 a) calculating the mask area of each component: for each component in the currently operated image data, the two-dimensional gesture component starting point SP of the component is taken _pos And two-dimensional gesture Assembly end point EP _pos Connecting SP _pos Point and EP _pos Obtaining a line segment by point, drawing a standard width parameter W with the width equal to the component corresponding to the line segment as the central line of the long edge _s The area covered by the rectangle is the mask area of the component, and 19 rectangular mask areas are obtained in total;

(2.3 b) for the human region pixel block I of the currently operated image data using the mask region of each component obtained in (2.3 a) _hum Performing mask processing, and collecting all pixel points covered by the mask area of each component into a pixel point array to obtain a mask processing result consisting of 19 pixel point arrays;

(2.3 c) according to the mask processing result, the human region pixel block I of the image data operated at present _hum All the pixel points in the image are divided into three types, wherein the first type is the pixel points which are not covered by the mask area of any component, the second type is the pixel points which are only covered by the mask area of one component, and the third type is the pixel points which are simultaneously covered by the mask areas of a plurality of components;

(2.3 d) taking all the second-class pixel points obtained by classifying in the step (2.3 c) as known conditions, and reclassifying all the first-class pixel points and the third-class pixel points to enable each pixel point to be a pixel point covered by the mask region of one assembly;

(2.3 e) for each component in the currently operated image data, taking the two-dimensional gesture component starting point SP of the component _pos And two dimensionsPose component endpoint EP _pos And then all pixel points covered by the mask region of the component are taken out from the reclassification result of (2.3 d), the three parts and the component name of the component are combined together, namely the extracted original component appearance of the component, and the original component appearances of 19 components are extracted in total.

7. The method of claim 1, wherein (3) a three-dimensional human pose sample is generated from a human skeleton and/or map AOG model by:

(3.1) randomly sampling spherical coordinate latitude angle parameter theta and spherical coordinate longitude angle parameter of all components from normal distribution

And a normal vector position parameter n, sampled together to 19 groups θ, </or >>

n is the parameter value;

(3.2) according to a tree structure of dependencies between the human skeleton and 19 components defined by the graph AOG model, using theta, and delta, obtained by (3.1) sampling, in the order of breadth-first traversal from the root node,

n value sequentially calculates three-dimensional gesture component starting point SP of each component _ske Three-dimensional pose component end point EP _ske Three-dimensional attitude component normal vector N _ske ；

(3.3) three-dimensional attitude component starting points SP of all the components obtained in (3.2) _ske Three-dimensional pose component end point EP _ske Three-dimensional attitude component normal vector N _ske All of which are combined together as a sampled three-dimensional human gesture sample.

8. The method of claim 1, wherein the pseudo-three-dimensional human body model is constructed in (5) based on a three-dimensional human pose sample and a human appearance sample by:

(5.1) block I of human-appearance component pixels of components in a human-appearance sample _app Placing the three-dimensional human posture sample at the space position of the corresponding assembly to obtain the appearance voxel point set V of each assembly _app ；

(5.2) adjusting the appearance body point set V of each component according to the normal vector direction of each component in the three-dimensional human posture sample _app Respectively, the appearance body point set V of each component _app In the viewing plane V _view The visual area is maximized, and the appearance voxel point set of each adjusted component is expressed as

(5.3) collecting the appearance body points of each assembly adjusted in (5.2)

9. The method of claim 1, wherein the width of the projection result in (6) is adjusted to obtain a width-adjusted projection result, and the method is implemented as follows:

(6.1) taking the pixel block I of the human appearance component of the 'lower torso' component in the human appearance sample _app The average value of the distances between the boundaries on both sides of the component counted in the pixel block is denoted as w _lb ；

(6.2) taking the human appearance component pixel block I of the 'upper left leg' component in the human appearance sample _app The average value of the distances between the boundaries on both sides of the component counted in the pixel block is denoted as w _lul ；

(6.3) taking the human appearance component pixel block I of the 'upper right leg' component in the human appearance sample _app The average of the distances between the boundaries on both sides of the statistical component in this block of pixels is denoted w _rul ；

(6.4) taking three dimensionsThree-dimensional pose component endpoint EP for the left hip component in a human pose sample _ske Called the left hip joint point, and then taking the three-dimensional posture component end point EP of the right hip component in the three-dimensional human posture sample _ske Called right hip joint point, respectively calculates the left hip joint point and the right hip joint point in the visual plane V _view And then calculating the distance between the two parallel projection coordinates, and recording as w _hip ；

(6.5) calculating the scaling scale of the width adjustment according to the parameters obtained in (6.1) to (6.4) _w ：

(6.6) scaling the projection coordinates of all the torso parts in the projection result to the original scale along the width direction of the lower torso assembly _w And multiplying to obtain the projection result after the width is adjusted.