CN111950346A - Pedestrian detection data expansion method based on generative adversarial network - Google Patents

Pedestrian detection data expansion method based on generative adversarial network

Info

Publication number
CN111950346A
CN111950346A CN202010595052.3A
Authority
CN
China
Prior art keywords
pedestrian
layer
picture
network
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010595052.3A
Other languages
Chinese (zh)
Inventor
彭滢
吴杰
Current Assignee
China Electronic Technology Cyber Security Co Ltd
Original Assignee
China Electronic Technology Cyber Security Co Ltd
Priority date
Filing date
Publication date
Application filed by China Electronic Technology Cyber Security Co Ltd filed Critical China Electronic Technology Cyber Security Co Ltd
Priority to CN202010595052.3A priority Critical patent/CN111950346A/en
Publication of CN111950346A publication Critical patent/CN111950346A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods characterised by the process organisation or structure, e.g. boosting cascade
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Abstract

The invention relates to a pedestrian detection data expansion method based on a generative adversarial network, comprising the following steps: S1, build a three-layer cascaded generative adversarial neural network model and set an objective function for model training; each layer of the network adopts the BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer is a pedestrian instance mask picture together with the output of the previous layer; S2, preprocess the training data; S3, train the three-layer cascaded generative adversarial neural network model with the preprocessed data; S4, complete the expansion of pedestrian detection data through the trained model. With this scheme, the generated pedestrians blend more naturally into the background; improving the generator's U-net structure makes the details of generated pedestrians finer; generating multi-scale pedestrian pictures with the cascade structure improves the quality of large-size, high-resolution pedestrian pictures; and diverse pedestrians can be generated, improving the efficiency of data expansion.

Description

Pedestrian detection data expansion method based on generative adversarial network
Technical Field
The invention relates to the field of image processing, and in particular to a pedestrian detection data expansion method based on a generative adversarial network.
Background
Pedestrian detection is a basic task in video processing and is widely applied in scenarios such as intelligent video surveillance, autonomous driving, and robot automation; training a high-precision pedestrian detection model requires a large-scale, high-quality pedestrian picture data set. At present, research on pedestrian detection mainly uses existing public data sets, most of which come from large internet companies that invest heavily in manual labeling and correction to ensure the reliability of the data sets. When training models on these public data sets, researchers often augment the training pictures with traditional data augmentation methods such as flipping, random cropping, and color adjustment. However, these methods do not enrich the content of the data set itself, so the effect of data expansion is limited. In view of this problem, a generative adversarial neural network with a cascade structure is proposed, which can automatically generate pedestrians of various sizes, high quality, and different clothing, thereby achieving large-scale automatic expansion of a pedestrian detection data set.
A Generative Adversarial Network (GAN) is a deep learning network structure that contains two basic parts: a generator and a discriminator. When generating pictures with a GAN, the generator aims to produce pictures that are as realistic as possible, while the discriminator aims to judge which data are real and which are generated. Through training, the generator and discriminator continuously compete against each other, until the network learns to generate pictures close enough to real ones.
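The generator-vs-discriminator game above can be sketched in a few lines. This is a toy illustration of the adversarial objective only, not the patent's actual networks; all layer sizes and the data are made up.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; sizes are illustrative, not the patent's.
G = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(0.2), nn.Linear(32, 64))
D = nn.Sequential(nn.Linear(64, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(8, 64)   # stand-in for real pedestrian crops
z = torch.randn(8, 16)      # latent noise

# Discriminator step: push real samples toward "real" (1), generated toward "fake" (0).
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(G(z).detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator label generated samples "real".
g_loss = bce(D(G(z)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Alternating these two updates is the "continuous competition" the text describes.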
In recent years, data expansion based on GAN networks has become a research hotspot, with studies on generating plant pictures for plant identification, generating medical CT pictures to assist intelligent diagnosis, and so on; research on generating pedestrian pictures, however, is still scarce, and the quality of generated pedestrian pictures remains to be improved. The prior GAN-based pedestrian generation method mainly works as follows: provide a background picture (such as a street picture), add a pedestrian frame at the position where a pedestrian is to be generated, replace the background inside the frame with noise (such as Gaussian noise), and use this as the input of a GAN network; construct the generator on a U-net network structure; use a local discriminator to judge whether the generated pedestrian is real and a global discriminator to judge whether the whole picture is real; and handle multi-resolution pedestrians of different sizes with Spatial Pyramid Pooling. This approach has several problems. First, the border between the added frame and the background shows obvious edge traces, and the generated picture looks like a square sticker pasted on the background, which is not realistic. Second, the pedestrian details generated by the model are coarse and of low quality, which is particularly acute for large-size, high-resolution pedestrians. Third, the model trained this way lacks diversity: the generated pedestrians wear clothes of similar styles and colors, which is not good enough for data expansion.
Disclosure of Invention
The technical problems to be solved by the invention are: 1. the obvious edge traces where the pedestrian frame is fused with the background in generated pedestrian pictures; 2. the coarse detail of generated pedestrians; 3. the low quality of large-size, high-resolution generated pedestrians; 4. the lack of diversity in generated pedestrian pictures. In view of these problems, a pedestrian detection data expansion method based on a generative adversarial network is provided.
The technical scheme adopted by the invention is as follows: a pedestrian detection data expansion method based on a generative adversarial network, comprising the following steps:
S1, build a three-layer cascaded generative adversarial neural network model and set an objective function for model training; each layer of the network adopts the BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer is a pedestrian instance mask picture together with the output of the previous layer;
S2, preprocess the training data;
S3, train the three-layer cascaded generative adversarial neural network model with the preprocessed data;
S4, complete the expansion of pedestrian detection data through the trained three-layer cascaded generative adversarial neural network model.
Further, in S1, the specific process of building the three-layer cascaded generative adversarial neural network model includes:
S11, construct a generator with a residual U-net structure, in which the encoder part adds multi-scale residual blocks and the decoder part adds channel attention residual blocks. Specifically, the residual U-net generator improves on the U-net structure: in the encoder part, the second 3 x 3 convolution in each basic block of U-net is replaced with a multi-scale residual block to form a new basic block; in the decoder part, the first 3 x 3 convolution in each basic block of U-net is replaced with a channel attention residual block to form a new basic block; and a masked 16-dimensional hidden-layer vector is injected into each intermediate layer of the encoder.
S12, construct a discriminator based on the PatchGAN discriminator;
S13, construct an encoder based on a residual network;
S14, build each layer of the cascade network from the generator, discriminator, and encoder of S11, S12, and S13, where the input picture resolution of the first layer is 64 x 64, the second layer 128 x 128, and the third layer 256 x 256; adjacent layers of the generative adversarial network are connected through a convolution layer, forming the three-layer cascaded generative adversarial neural network;
S15, add a VGG-19-based perceptual loss to the BicycleGAN objective function to form the objective function of the three-layer cascaded generative adversarial neural network model.
Further, in step S11, each intermediate layer of the encoder part of the generator injects a 16-dimensional hidden-layer vector z that has been masked by the pedestrian instance mask.
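One plausible reading of the masked latent injection is to tile the 16-dimensional vector z over the spatial grid, zero it outside the pedestrian instance mask, and concatenate it to the layer's feature maps. The patent does not spell out the exact operation, so the sketch below is an assumption; the function name and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def inject_masked_latent(feat, z, mask):
    """Tile a 16-D latent z spatially, zero it outside the pedestrian
    instance mask, and concatenate it to the feature map (an assumed
    reading of the patent's "masked hidden-layer vector" injection)."""
    b, _, h, w = feat.shape
    z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
    # Downsample the full-resolution mask to this layer's spatial size.
    m = F.interpolate(mask, size=(h, w), mode="nearest")
    return torch.cat([feat, z_map * m], dim=1)

feat = torch.randn(2, 64, 16, 16)   # an intermediate encoder feature map
z = torch.randn(2, 16)              # 16-dimensional latent vector
mask = torch.ones(2, 1, 64, 64)     # pedestrian instance mask at input resolution
out = inject_masked_latent(feat, z, mask)
```

The concatenated result has 64 + 16 channels, so the latent influences only the masked (pedestrian) region of each layer.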
Further, the objective function of the three-layer cascaded generative adversarial neural network model in step S15 is specifically:

G*, E* = arg min_{G,E} max_{D_whole, D_local} L_GAN^VAE(G, D, E) + λ·L_1^VAE(G, E) + L_GAN(G, D) + λ_latent·L_1^latent(G, E) + λ_KL·L_KL(E) + λ_VGG·L_VGG

where G* and E* denote the generator and encoder respectively, D_whole is the global discriminator, D_local is the local discriminator, and L_GAN^VAE(·) and L_GAN(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in the BicycleGAN network structure, respectively; L_1^VAE is an L1 loss, which makes the output of the generator as similar as possible to the pedestrian sample picture; L_1^latent is also an L1 loss, which makes the output of the encoder as close to a Gaussian distribution as possible; L_KL is the KL distance in cLR-GAN, and L_VGG is the perceptual loss. λ, λ_KL, λ_latent, λ_VGG are hyper-parameters controlling the weights of the corresponding terms.
Further, S2 specifically includes:
S21, extract from the Cityscapes data set pedestrian sample pictures at the pixel sizes required by each layer of the generative adversarial network, obtaining a pedestrian sample picture set;
S22, obtain the instance label map corresponding to each pedestrian sample picture from the Cityscapes instance label map set and the pedestrian sample picture set obtained in step S21, aligning and cropping each label map with its corresponding sample picture; repeat this process to obtain an instance label map set L corresponding to the pedestrian sample picture set;
S23, in each instance label map, set the pixel values belonging to the pedestrian in the middle to 1 and all other pixel values to 0, obtaining the pedestrian instance mask M of each sample picture;
S24, apply the obtained pedestrian instance mask to the corresponding pedestrian sample picture, obtaining the masked picture B_M;
S25, align the instance label maps of the Cityscapes data set with the pedestrian sample picture set obtained in step S21, set the pixel values on the boundaries between instances to 1 and the pixel values inside instances to 0, obtaining the corresponding instance edge map E;
S26, concatenate, in order, the B_M, M, L, and E corresponding to each picture in the sets obtained in steps S22, S23, S24, and S25, obtaining the input set A of the three-layer cascaded generative adversarial neural network model, where A = {B_M, M, L, E}.
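The assembly of A = {B_M, M, L, E} in steps S23 to S26 amounts to channel-wise concatenation. A minimal NumPy sketch, with synthetic stand-ins for the Cityscapes label maps (the instance id 2 for the middle pedestrian and the white masking color are assumptions for illustration):

```python
import numpy as np

H = W = 64                                      # first-layer resolution
rng = np.random.default_rng(0)
sample = rng.random((H, W, 3))                  # pedestrian sample picture
label = rng.integers(0, 5, (H, W))              # instance label map L (synthetic)
mask = (label == 2).astype(np.float32)          # pedestrian instance mask M (id 2 assumed)

# Instance edge map E: 1 where horizontally adjacent labels differ (simplified).
edge = np.zeros((H, W), np.float32)
edge[:, :-1] = (label[:, :-1] != label[:, 1:]).astype(np.float32)

# B_M: the middle pedestrian masked out in white, the rest of the picture kept.
b_masked = sample * (1.0 - mask[..., None]) + mask[..., None]

# A = {B_M, M, L, E}, concatenated channel-wise.
A = np.concatenate([b_masked, mask[..., None],
                    label[..., None], edge[..., None]], axis=-1)
```

The model thus receives a 6-channel input per picture: 3 masked-background channels plus one channel each for M, L, and E.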
Further, in step S21, different pedestrian sample pictures are extracted for each layer of the generative adversarial neural network:
for the first-layer network, take pedestrian samples with heights of [64, 256] pixels in the data set, where each pedestrian sample is a square picture whose side length equals the pedestrian's height and whose center is the pedestrian's center, and resize the extracted pictures to 64 x 64 pixels;
for the second-layer network, take pedestrian samples with heights of [100, 1024] pixels and resize the pictures to 128 x 128 pixels;
for the third-layer network, take pedestrian samples with heights of [150, 1024] pixels and resize the pictures to 256 x 256 pixels.
Further, S3 specifically includes: the cascade network is trained for N rounds.
The goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1; the objective function does not use the perceptual loss when training the first layer.
When training the second layer, for the first N/2 rounds fix generator G_1 and encoder E_1 and update only the weights of generator G_2 and encoder E_2; for the last N/2 rounds, update the weights of generators G_1, G_2 and encoders E_1, E_2 simultaneously.
When training the third layer, for the first N/2 rounds fix generators G_1, G_2 and encoders E_1, E_2 and update only the weights of generator G_3 and encoder E_3; for the last N/2 rounds, update the weights of generators G_1, G_2, G_3 and encoders E_1, E_2, E_3 simultaneously.
Here G_1, G_2, G_3 denote the generators of the first-, second-, and third-layer generative adversarial neural networks, and E_1, E_2, E_3 denote the corresponding encoders.
Further, in S3, the weights are updated with the Adam optimization method, with learning rate w^(h-i)·lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the ordinal number of the layer currently being trained, and w is a weighting parameter.
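The per-layer learning rate w^(h-i)·lr can be written directly; the example values below (lr = 0.0002, h = 3, w = 0.01) are the ones given in the embodiment described later.

```python
def layer_lr(base_lr: float, h: int, i: int, w: float) -> float:
    """Learning rate w**(h - i) * base_lr for layer i of an h-layer cascade,
    as stated in step S3: lower (earlier) layers get smaller rates."""
    return (w ** (h - i)) * base_lr

# Embodiment values: lr = 0.0002, h = 3, w = 0.01.
rates = [layer_lr(0.0002, 3, i, 0.01) for i in (1, 2, 3)]
```

With these values the layer currently being trained (i = h) uses the full base rate, while already-trained lower layers are fine-tuned with rates scaled down by factors of w.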
Further, S4 specifically includes:
S41, establish the linear relationship between the pedestrian height P_h in the picture and the pedestrian position P_pos;
S42, obtain a road surface position coordinate set according to the instance labels provided by the data set;
S43, obtain a pedestrian position coordinate set by collecting, from the instance labels provided by the data set, the position coordinates within 10 x 10 pixels of the bottom edge of each existing pedestrian's bounding box;
S44, for a given picture I in which a pedestrian is to be generated, randomly select one of the two sets (the road surface position coordinate set or the pedestrian position coordinate set), and randomly select a position from it as the pedestrian position P_pos; calculate the height P_h of the newly generated pedestrian according to the linear relationship of step S41;
crop from picture I a background picture I_bg of size P_h x P_h whose center coincides with the center of the new pedestrian to be generated; randomly select a mask M from the pedestrian instance mask data set, together with its corresponding instance label map L and edge picture E; compute the masked picture B_M from I_bg and the mask M; input the mask M, instance label map L, edge picture E, and masked picture B_M together into the trained three-layer cascaded generative adversarial neural network model to obtain the generated picture I_ped; in picture I, replace the background picture I_bg pixel by pixel with the generated picture I_ped, completing one data expansion;
S45, repeat step S41 to obtain a large amount of expansion data.
Further, in S41, the linear relationship between the pedestrian height P_h and the pedestrian position P_pos is specifically:

P_h^global = a^global * P_pos^global + b^global

where P_h^global is the statistic of the pedestrian height P_h over the entire data set, and P_pos^global is the statistic of the pedestrian position P_pos over the entire data set.
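Fitting a^global and b^global from pedestrian-box statistics is an ordinary least-squares line fit. A sketch with synthetic stand-ins for the data-set labels (the box coordinates below are made up and lie exactly on a line so the fit is easy to check):

```python
import numpy as np

# Synthetic stand-ins for pedestrian-box labels: bottom-edge y-coordinates
# (P_pos) and box heights (P_h), constructed to lie exactly on P_h = 0.5*P_pos - 50.
pos = np.array([300.0, 400.0, 500.0, 600.0, 700.0])
height = 0.5 * pos - 50.0

# Fit P_h^global = a^global * P_pos^global + b^global by least squares.
a_global, b_global = np.polyfit(pos, height, 1)

# Predict the height of a new pedestrian placed at P_pos = 550.
predicted = a_global * 550.0 + b_global
```

On real labels the points would scatter around the line; polyfit still returns the least-squares a^global and b^global.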
Compared with the prior art, the beneficial effects of this technical scheme are as follows:
1. the pedestrian instance mask solves the problem of obvious edges where the pedestrian frame meets the background picture; the instance mask provides the pedestrian's shape, so the generated pedestrian's body edges are clearer and the posture more realistic;
2. multi-scale residual blocks and channel attention residual blocks are introduced to improve the generator's U-net structure, making the details of generated pedestrians finer;
3. generating multi-scale pedestrian pictures with the cascade structure improves the quality of large-size, high-resolution pedestrian pictures;
4. diverse pedestrians can be generated, improving the efficiency of data expansion.
Drawings
Fig. 1 is a schematic process diagram of the GAN-based pedestrian detection data expansion method of the invention.
Fig. 2 is a schematic diagram of the overall structure of the three-layer cascaded generative adversarial neural network of the invention.
Fig. 3 is a schematic diagram of the generator's residual U-net network structure.
Fig. 4 is a schematic diagram of the multi-scale residual block structure in the generator.
Fig. 5 is a schematic diagram of the channel attention residual block structure in the generator.
Fig. 6 is a schematic diagram of the discriminator structure.
Fig. 7 is a schematic diagram of the encoder structure.
Fig. 8 is a schematic diagram of the connection structure between cascaded layers.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The technical problems to be solved by the invention are as follows:
1. the problem that edge traces are obvious when a pedestrian frame and a background are fused in a generated pedestrian picture is solved;
2. the problem that the details of the generated pedestrian are rough is solved;
3. the problem of low mass of the pedestrians with large size and high resolution is solved;
4. the problem that the generated pedestrian pictures are lack of diversity is solved.
Based on the above, the invention provides a pedestrian detection data expansion method based on a generative adversarial network; the specific scheme is as follows:
Step 1: build the cascaded generative adversarial neural network. The scheme proposes a three-layer cascaded generative adversarial neural network (as shown in Fig. 2); each layer uses the BicycleGAN structure, but the generator's network structure is improved into a residual U-net network. The neural network of this embodiment learns the mapping from B_M to B, where B_M is the input-domain set, each element of which is a background picture masked by a pedestrian instance mask, and B is the output-domain set, each element of which is a pedestrian picture including background. To provide more information to the network, training inputs include not only B_M but also the corresponding pedestrian instance mask set M, the instance label map set L used to compute M, and the instance edge map set E obtained from L. The network input is therefore A = {B_M, M, L, E}, and the output is B. Note that the network of this embodiment learns a one-to-many mapping: one input has many possible outputs, but during training only one-to-one pairs are given, while many outputs can be obtained at test time. Specifically:
Step 11, construct the generator with residual U-net structure. The residual U-net generator improves on U-net: in the encoder part, the second 3 x 3 convolution in each basic block of U-net is replaced with a multi-scale residual block to form a new basic block; in the decoder part, the first 3 x 3 convolution in each basic block of U-net is replaced with a channel attention residual block to form a new basic block; and a masked 16-dimensional hidden-layer vector is injected into each intermediate layer of the encoder. For the three-layer cascaded generative adversarial neural network, in each layer the generator has equal numbers of basic blocks in the encoder and decoder parts, with totals n_1 = 12, n_2 = 14, n_3 = 16; the j-th and (n-j)-th basic blocks are connected with skip connections consistent with the original U-net.
Specifically, each multi-scale residual block of the encoder part is defined as:

S_1 = σ(w_{3x3}^1 * M_{n-1} + b^1)
P_1 = σ(w_{5x5}^1 * M_{n-1} + b^1)
S_2 = σ(w_{3x3}^2 * [S_1, P_1] + b^2)
P_2 = σ(w_{5x5}^2 * [P_1, S_1] + b^2)
S' = w_{1x1}^3 * [S_2, P_2] + b^3
M_n = S' + M_{n-1}

where w and b are weights and biases, the superscript indicates the position of the network layer, and the subscript indicates the convolution kernel size in the convolution network. Unlike the multi-scale residual block of the prior art, the activation function σ(·) of the invention is not ReLU but LeakyReLU; brackets [·] denote a concatenation operation; and M_n and M_{n-1} denote the output and input of the multi-scale residual block, respectively.
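The equations above can be sketched as a PyTorch module. This is an illustrative implementation under the stated structure (parallel 3x3/5x5 branches that exchange features, a 1x1 fusion, LeakyReLU, identity shortcut); the channel counts and LeakyReLU slope 0.2 are assumptions, since the patent does not fix them here.

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Sketch of the encoder's multi-scale residual block following the
    equations in the text: S1/P1 from 3x3 and 5x5 convs, cross-concatenated
    S2/P2, a 1x1 fusion S', and the identity shortcut M_n = S' + M_{n-1}."""
    def __init__(self, ch: int):
        super().__init__()
        self.c3_1 = nn.Conv2d(ch, ch, 3, padding=1)       # w^1_{3x3}
        self.c5_1 = nn.Conv2d(ch, ch, 5, padding=2)       # w^1_{5x5}
        self.c3_2 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # w^2_{3x3}
        self.c5_2 = nn.Conv2d(2 * ch, ch, 5, padding=2)   # w^2_{5x5}
        self.c1 = nn.Conv2d(2 * ch, ch, 1)                # w^3_{1x1}
        self.act = nn.LeakyReLU(0.2)                      # sigma: LeakyReLU, not ReLU

    def forward(self, m):
        s1 = self.act(self.c3_1(m))
        p1 = self.act(self.c5_1(m))
        s2 = self.act(self.c3_2(torch.cat([s1, p1], dim=1)))
        p2 = self.act(self.c5_2(torch.cat([p1, s1], dim=1)))
        return self.c1(torch.cat([s2, p2], dim=1)) + m    # S' + M_{n-1}

out = MultiScaleResidualBlock(8)(torch.randn(1, 8, 16, 16))
```

The padding values keep the spatial size fixed, so the residual addition is well-defined.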
In the decoder's channel attention residual block, channel attention CA(X) is defined in three steps:

y_c = AA(x_c) = (1 / (H x W)) * Σ_{i=1..H} Σ_{j=1..W} x_c(i, j)
s = f(W_U σ(W_D y))
x̂_c = s_c · x_c

where the input data X = [x_1, x_2, ..., x_C] consists of C feature maps of size H x W, y is the statistic of each channel, x_c(i, j) is the value at location (i, j), AA(·) is average pooling, f(·) is the sigmoid function, σ(·) is LeakyReLU, and W denotes weights. Based on CA(X), the channel attention residual block A(X) is expressed as:

A_n = CA(X) · X + A_{n-1}
X = W_2 σ(W_1 A_{n-1})

where W_1 and W_2 are the weights of the two convolution layers.
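The channel-attention residual block can likewise be sketched in PyTorch. The reduction ratio is r = 16 in the embodiment; r = 4 is used below only so the tiny illustrative channel count still divides, and the LeakyReLU slope 0.2 is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttentionResidualBlock(nn.Module):
    """Sketch of the decoder's channel-attention residual block: two convs
    produce X = W_2 sigma(W_1 A_{n-1}); channel attention CA(X) rescales X
    via globally average-pooled channel statistics; A_n = CA(X)*X + A_{n-1}."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),   # W_1
            nn.LeakyReLU(0.2),                 # sigma
            nn.Conv2d(ch, ch, 3, padding=1))   # W_2
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # AA: per-channel statistic y
            nn.Conv2d(ch, ch // r, 1),         # W_D (channel reduction)
            nn.LeakyReLU(0.2),                 # sigma
            nn.Conv2d(ch // r, ch, 1),         # W_U (channel restoration)
            nn.Sigmoid())                      # f

    def forward(self, a):
        x = self.body(a)
        return self.attn(x) * x + a            # CA(X)*X + A_{n-1}

out = ChannelAttentionResidualBlock(8)(torch.randn(1, 8, 16, 16))
```

The attention weights s lie in (0, 1) per channel, so each channel of X is rescaled before the shortcut addition.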
Fig. 3 is a schematic diagram of the generator structure in the first-layer generative adversarial neural network, where C(in, out, k, s) represents a convolution layer with in input channels, out output channels, kernel size k x k, and stride s; CT(in, out, k, s) represents a transposed convolution layer; CAT is the concatenation operation; DS(t) is a downsampling operation by a factor of t; M(in, out) is a multi-scale feature residual block; and A(in, inter) is a channel attention residual block. Fig. 4 is a schematic structural diagram of the multi-scale feature residual block, and Fig. 5 of the channel attention residual block, where inter is the number of channels of the intermediate convolution layer in the residual block, computed from the input channel number in and the reduction rate r; in this embodiment r = 16.
Step 12, construct the discriminator. All discriminators use the discriminator structure proposed by PatchGAN, shown schematically in Fig. 6, where C(in, out, k, s) represents a convolution layer and AvgPool(k, s) an average pooling layer.
The network uses LeakyReLU with parameter 0.2 as the activation function, and Instance Normalization for instance regularization.
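A PatchGAN-style discriminator outputs a map of per-patch realism scores rather than a single scalar. The depth and layer widths below are illustrative assumptions, not the exact layout of Fig. 6; the LeakyReLU(0.2) and instance normalization follow the text.

```python
import torch
import torch.nn as nn

def patch_discriminator(in_ch: int = 3, base: int = 64) -> nn.Sequential:
    """Sketch of a PatchGAN-style discriminator: strided convs with
    LeakyReLU(0.2) and InstanceNorm, ending in a 1-channel patch-score map."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
        nn.InstanceNorm2d(base * 2),
        nn.LeakyReLU(0.2),
        nn.Conv2d(base * 2, 1, 4, stride=1, padding=1))  # patch score map

scores = patch_discriminator()(torch.randn(1, 3, 64, 64))
```

Each element of the output map judges one receptive-field patch of the input, which is what lets the same structure serve as both the local and the global discriminator at different input crops.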
Step 13, construct the encoder of the BicycleGAN network. The encoder is based on a residual network, shown schematically in Fig. 7, where R(in, out, k, s) represents a standard residual block, Linear(in, out) represents a linear layer, and ⊕ represents element-wise addition. The network uses LeakyReLU with parameter 0.2 as the activation function, and Instance Normalization for instance regularization.
Step 14, build the three-layer cascaded generative adversarial neural network.
Each layer of the cascade network uses the generator, discriminator, and encoder constructed in steps 11, 12, and 13; the first layer takes input pictures at a resolution of 64 x 64, the second layer 128 x 128, and the third layer 256 x 256. Adjacent GAN layers are connected by a convolution layer, shown schematically in Fig. 8.
Step 15, set the objective function for model training. Since each layer of the cascade is a network with the BicycleGAN structure, this embodiment adopts the objective function proposed by BicycleGAN. In addition, to make the generated pedestrians more human-like, a VGG-19-based perceptual loss is added to the objective function. The final objective function is:

G*, E* = arg min_{G,E} max_{D_whole, D_local} L_GAN^VAE(G, D, E) + λ·L_1^VAE(G, E) + L_GAN(G, D) + λ_latent·L_1^latent(G, E) + λ_KL·L_KL(E) + λ_VGG·L_VGG

where G* and E* denote the generator and encoder respectively, D_whole is the global discriminator, D_local is the local discriminator, and L_GAN^VAE(·) and L_GAN(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in BicycleGAN, respectively. L_1^VAE is an L1 loss, which makes the output of the generator as similar as possible to the pedestrian sample picture; L_1^latent is also an L1 loss, which makes the output of the encoder as close to a Gaussian distribution as possible. L_KL is the KL distance in cLR-GAN, and L_VGG is the perceptual loss. λ, λ_KL, λ_latent, λ_VGG are hyper-parameters controlling the weights of the corresponding terms.
Step 2: data preprocessing. The scheme trains the model on the training set of the public Cityscapes data set and tests on its validation set. The resolution of each street scene picture in the data set is 1920 x 1080, and model training focuses only on the parts of the pictures containing pedestrians. The specific steps are as follows:
Step 21, extract pedestrian samples from the data set. To train the first-layer GAN network, take pedestrian samples with heights of [64, 256] pixels in the data set; each pedestrian sample is a square picture whose side length equals the pedestrian's height and whose center is the pedestrian's center, and resize the extracted pictures to 64 x 64 pixels. To train the second-layer GAN network, take pedestrian samples with heights of [100, 1024] pixels and resize the pictures to 128 x 128 pixels. To train the third-layer GAN network, take pedestrian samples with heights of [150, 1024] pixels and resize the pictures to 256 x 256 pixels.
Step 22, obtain the instance label map set L corresponding to the pedestrian samples. From the instance label map set provided by Cityscapes and the pedestrian sample picture set obtained in step 21, obtain the instance label map corresponding to each pedestrian sample picture; align and crop each label map with its sample picture to obtain the instance label map set.
Step 23, obtain the pedestrian instance mask M corresponding to each pedestrian sample. For each label map, set the pixel values belonging to the pedestrian in the middle to 1 and all other pixel values to 0, obtaining the pedestrian instance mask of each sample picture.
Step 24, obtain the picture B_M masked by the pedestrian instance mask. Using the pedestrian sample picture obtained in step 21 and the pedestrian instance mask obtained in step 23, obtain the masked picture: the pedestrian in the middle of the picture is masked out in white, while the other parts of the picture are retained.
Step 25, obtain the instance edge map set E corresponding to the pedestrian samples. Aligning the instance label maps provided by Cityscapes with the sample picture set obtained in step 21, set the pixel values on the boundaries between instances to 1 and the pixel values inside instances to 0, obtaining the corresponding instance edge maps.
Step 26, obtain the model input set A = {B_M, M, L, E}. Concatenate, in order, the B_M, M, L, and E corresponding to each picture in the sets obtained in steps 22, 23, 24, and 25, obtaining the input set.
Step 3: train the three-layer cascaded generative adversarial neural network model. The hyper-parameters λ, λ_KL, λ_latent, λ_VGG are set to 10, 0.01, 0.5, and 1 respectively, and the number of training rounds is 200.
No perceptual loss is used when training the first-layer GAN network, as this loss was found to cause training instability at that resolution.
When training the cascade network, the goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1, training for 200 rounds. When training the second layer, the first 100 rounds fix the weights of G_1 and E_1 and update only G_2 and E_2, while the last 100 rounds update G_1, E_1, G_2, and E_2 simultaneously; the third layer adopts the same strategy as the second layer.
The weights are updated with the Adam optimization method at learning rate w^(h-i)·lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the ordinal number of the layer currently being trained, and w is a weighting parameter; here lr = 0.0002, h = 3, i ∈ {1, 2, 3}, and w = 0.01.
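The freeze-then-unfreeze schedule can be expressed by toggling `requires_grad` on the lower layers' parameters. A minimal sketch for the third-layer phase, with tiny placeholder modules standing in for the actual generators:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    """Freeze (flag=False) or unfreeze (flag=True) all parameters."""
    for p in module.parameters():
        p.requires_grad = flag

# Placeholder stand-ins for the layer generators G1, G2, G3.
G1, G2, G3 = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

N = 200
schedule = []
for epoch in range(N):
    lower_frozen = epoch < N // 2        # first N/2 rounds: only G3 learns
    set_trainable(G1, not lower_frozen)  # last N/2 rounds: fine-tune G1, G2 too
    set_trainable(G2, not lower_frozen)
    set_trainable(G3, True)
    schedule.append(lower_frozen)
```

The same toggling applies to the encoders E_1, E_2, E_3; an optimizer built over all parameters simply skips the frozen ones.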
Step 4: expanding the CityPersons pedestrian detection data using the model trained in step 3. CityPersons is a public dataset extended from the Cityscapes dataset; it likewise provides city street pictures, instance label annotations, and so on.
The specific steps of data expansion are as follows:
Step 41, determining the relation between pedestrian height and pedestrian position in the dataset. Let P_h denote the height of a pedestrian and P_pos its position; the two are correlated: the closer a pedestrian is to the camera that took the picture, the greater its height. P_h and P_pos can be obtained from the pedestrian bounding-box labels provided by the dataset: P_h is the height of the pedestrian box, and P_pos is the coordinate of the bottom edge of the box on the vertical axis. Here the upper-left corner of the picture is taken as the origin, the line through the top edge of the picture is the horizontal axis with rightward as the positive direction, and the line through the left edge is the vertical axis with downward as the positive direction. From the statistical values P_h^global of P_h and P_pos^global of P_pos over the entire dataset, a global linear relationship can be fitted:
P_h^global = a_global · P_pos^global + b_global
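Fitting a_global and b_global from the bounding-box labels is an ordinary least-squares problem. A sketch with NumPy follows; the box tuple format `(x, y_top, width, height)` is an assumption for illustration, not the dataset's actual annotation schema:

```python
import numpy as np

def fit_height_position(boxes):
    """Sketch of step 41: fit the global linear relation P_h = a * P_pos + b.

    `boxes` is a hypothetical list of (x, y_top, width, height) pedestrian
    boxes in image coordinates (origin at the top-left, y pointing down),
    so the bottom-edge y-coordinate P_pos is y_top + height.
    """
    heights = np.array([h for (_, _, _, h) in boxes], dtype=float)        # P_h
    positions = np.array([y + h for (_, y, _, h) in boxes], dtype=float)  # P_pos
    a, b = np.polyfit(positions, heights, deg=1)
    return a, b
```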
Step 42, selecting positions suitable for generating pedestrians. By real-world knowledge, a pedestrian must appear on a sidewalk or road (collectively, the road surface) and cannot appear in unsuitable positions such as the sky or in trees. From the instance labels provided by the dataset, a road-surface position coordinate set {Ground} can therefore be obtained. On this basis, we assume that a newly generated pedestrian may appear either next to an existing pedestrian or at any position on the road surface. Again using the instance labels provided by the dataset, the position coordinates within 10 × 10 pixels of the bottom edge of each existing pedestrian's bounding box are collected as the position coordinate set {Person}.
Step 43, expanding the pedestrian data. For a given picture I in which a pedestrian is to be generated, randomly select one of the two sets {Ground} and {Person}, and then randomly select a position from it as the pedestrian position P_pos. Compute the height P_h of the newly generated pedestrian from the linear relation of step 41. Crop from picture I a background picture I_bg of size P_h × P_h whose center coincides with the center of the new pedestrian to be generated. Next, randomly select a mask M from the pedestrian instance mask dataset, together with its corresponding instance label map L and edge picture E, and compute the masked picture B_M from I_bg and the mask M. Feed the mask M, the instance label map L, the edge picture E and the masked picture B_M into the cascaded GAN model trained in step 3 to obtain a generated picture I_ped, and replace the background picture I_bg in picture I pixel by pixel with I_ped, completing one data expansion. By choosing I, P_pos and M according to actual demand and repeating step 43, a large amount of expanded data can be obtained.
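The crop-mask-generate-paste loop of step 43 can be sketched as follows, with the trained cascaded GAN abstracted behind a `generator` callable. All names and the exact crop geometry are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def expand_once(image, p_pos, p_h, mask, generator):
    """Sketch of one data expansion (step 43); the GAN is hidden behind `generator`.

    Assumptions: `image` is H x W x 3; (x, y) = p_pos is the bottom-centre of the
    new pedestrian; p_h is its height from the fitted linear relation; `mask` is
    a p_h x p_h pedestrian instance mask with 1 on pedestrian pixels.
    """
    x, y = p_pos
    top, left = y - p_h, x - p_h // 2            # P_h x P_h crop around the pedestrian
    i_bg = image[top:top + p_h, left:left + p_h].copy()
    b_m = i_bg.copy()
    b_m[mask.astype(bool)] = 255                 # whiten the pedestrian region -> B_M
    i_ped = generator(b_m, mask)                 # trained cascaded GAN would go here
    out = image.copy()
    out[top:top + p_h, left:left + p_h] = i_ped  # pixel-by-pixel replacement
    return out
```

In practice the `generator` call would also receive the instance label map L and edge picture E, as the text describes; they are omitted here to keep the sketch short.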
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed. Those skilled in the art to which the invention pertains will appreciate that insubstantial changes or modifications can be made without departing from the spirit of the invention as defined by the appended claims.
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.

Claims (10)

1. A pedestrian detection data expansion method based on a generative adversarial network, characterized by comprising the following steps:
S1, building a three-layer cascaded generative adversarial network model, and setting an objective function for model training; each layer of the cascaded network adopts a BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer consists of a pedestrian instance mask picture and the output of the previous layer;
S2, preprocessing training data;
S3, training the three-layer cascaded generative adversarial network model with the preprocessed data;
S4, completing the expansion of pedestrian detection data through the trained three-layer cascaded generative adversarial network model.
2. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 1, wherein in step S1, the specific process of building the three-layer cascaded generative adversarial network model comprises:
S11, constructing a generator with a residual U-net structure, wherein a multi-scale residual block is added to the encoder part of the generator and a channel attention residual block is added to the decoder part; the numbers of basic blocks of the generator in the first-, second- and third-layer networks are n_1 = 12, n_2 = 14 and n_3 = 16 respectively, with a skip connection between the j-th and the (n−j)-th basic blocks of each layer; each basic block comprises a multi-scale residual block and a channel attention residual block;
S12, constructing a discriminator based on the PatchGAN discriminator;
S13, constructing an encoder based on a residual network;
S14, building each layer of the cascaded network from the generator, discriminator and encoder of S11, S12 and S13, wherein the input picture resolution of the first layer is 64 × 64, that of the second layer is 128 × 128, and that of the third layer is 256 × 256; adjacent layers of the generative adversarial network are connected through a convolution layer to form the three-layer cascaded generative adversarial network;
S15, adding a VGG-19-based perceptual loss to the BicycleGAN objective function to obtain the objective function of the three-layer cascaded generative adversarial network model.
3. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 2, wherein in step S11, each intermediate layer of the encoder part of the generator injects a 16-dimensional latent vector z, and the latent vector z is masked by the pedestrian instance mask.
4. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 3, wherein the objective function of the three-layer cascaded generative adversarial network model in step S15 is specifically:

(G*, E*) = arg min_{G,E} max_{D_whole, D_local} [ L_GAN^cVAE(G, D, E) + λ · L1^cVAE(G, E) + L_GAN^cLR(G, D) + λ_latent · L1^latent(G, E) + λ_KL · L_KL(E) + λ_VGG · L_VGG(G, E) ]

wherein G*, E* denote the optimal generator and encoder respectively, D_whole is the global discriminator, and D_local is the local discriminator; L_GAN^cVAE(·) and L_GAN^cLR(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in the BicycleGAN network structure respectively, each evaluated against both D_whole and D_local; L1^cVAE(·) is an L1 loss that makes the output of the generator as similar as possible to the pedestrian sample picture; L1^latent(·) is also an L1 loss, which makes the output of the encoder as close as possible to the Gaussian-sampled latent vector; L_KL is the KL divergence term in cLR-GAN; L_VGG is the perceptual loss; and λ, λ_KL, λ_latent, λ_VGG are hyperparameters that control the weights of the corresponding terms.
5. The pedestrian detection data expansion method based on the generative adversarial network according to claim 1 or 4, wherein step S2 specifically comprises:
S21, taking from the Cityscapes dataset pedestrian sample maps at the pixel sizes required by each layer of the generative adversarial network, to obtain a pedestrian sample map set;
S22, obtaining the instance label map corresponding to each pedestrian sample map from the Cityscapes instance label map set and the pedestrian sample map set obtained in step S21, aligning and cropping each label map with the corresponding sample map, and repeating the process to obtain the instance label map set L corresponding to the pedestrian sample map set;
S23, setting the value of the pixels belonging to the pedestrian in the middle of each sample label map to 1 and the value of all other pixels to 0, so as to obtain the pedestrian instance mask M of each sample map;
S24, processing the corresponding pedestrian sample image with the obtained pedestrian instance mask, to obtain the masked image B_M;
S25, aligning the instance label maps of the Cityscapes dataset with the pedestrian sample map set obtained in step S21, setting the pixel value on the boundary between instances in the instance label map to 1 and the pixel value inside each instance to 0, so as to obtain the corresponding instance edge map E;
S26, concatenating in order the B_M, M, L and E corresponding to each picture in the sets obtained in steps S22, S23, S24 and S25, to obtain the input set A of the three-layer cascaded generative adversarial network model, where A = {B_M, M, L, E}.
6. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 5, wherein in step S21, different pedestrian sample maps are extracted for each layer of the generative adversarial network:
for the first-layer network, pedestrian samples with heights of [64, 256] pixels are taken from the dataset; each pedestrian sample is a square picture whose side length equals the pedestrian height and whose center is the center of the pedestrian, and the taken picture is resized to 64 × 64 pixels;
for the second-layer network, taking out a pedestrian sample with the height of [100, 1024] pixels, and adjusting the size of the picture to 128 × 128 pixels;
for the third tier network, take pedestrian samples at [150, 1024] pixel height and resize the picture to 256 × 256 pixels.
7. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 6, wherein step S3 specifically comprises: training the cascaded network for N epochs, wherein
the goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1, and the objective function does not use the perceptual loss when training the first layer;
when training the second layer, generator G_1 and encoder E_1 are frozen for the first N/2 epochs and only the weights of generator G_2 and encoder E_2 are updated; for the last N/2 epochs the weights of generators G_1, G_2 and encoders E_1, E_2 are updated simultaneously;
when training the third layer, generators G_1, G_2 and encoders E_1, E_2 are frozen for the first N/2 epochs and only the weights of generator G_3 and encoder E_3 are updated; for the last N/2 epochs the weights of generators G_1, G_2, G_3 and encoders E_1, E_2, E_3 are updated simultaneously;
wherein G_1, G_2, G_3 denote the generators of the first-, second- and third-layer generative adversarial networks respectively, and E_1, E_2, E_3 denote the corresponding encoders.
8. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 7, wherein in step S3 the weights are updated by the Adam optimization method at a learning rate of w^(h−i) · lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the index of the layer currently being trained, and w is a weighting parameter.
9. The pedestrian detection data expansion method based on the generative adversarial network according to claim 8, wherein step S4 specifically comprises:
S41, establishing the linear relationship between the pedestrian height P_h and the pedestrian position P_pos in the picture;
S42, obtaining a road-surface position coordinate set from the instance label annotations provided by the dataset;
S43, obtaining a pedestrian position coordinate set by collecting, from the instance labels provided by the dataset, the position coordinates within 10 × 10 pixels of the bottom edge of each existing pedestrian's bounding box;
S44, for a given picture I in which a pedestrian is to be generated, randomly selecting one of the road-surface position coordinate set and the pedestrian position coordinate set, and randomly selecting a position from it as the pedestrian position P_pos; computing the height P_h of the newly generated pedestrian according to the linear relation of step S41;
cropping from picture I a background picture I_bg of size P_h × P_h whose center coincides with the center of the new pedestrian to be generated; randomly selecting a mask M from the pedestrian instance mask dataset together with its corresponding instance label map L and edge picture E; computing the masked picture B_M from I_bg and the mask M; feeding the mask M, the instance label map L, the edge picture E and the masked picture B_M into the trained three-layer cascaded generative adversarial network model to obtain a generated picture I_ped; and replacing the background picture I_bg in picture I pixel by pixel with I_ped, completing one data expansion;
S45, repeating step S44 to obtain a large amount of expanded data.
10. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 9, wherein in step S41 the linear relationship between the pedestrian height P_h and the pedestrian position P_pos is specifically:

P_h^global = a_global · P_pos^global + b_global

wherein P_h^global is the statistical value of the pedestrian height P_h over the entire dataset, and P_pos^global is the statistical value of the pedestrian position P_pos over the entire dataset.
CN202010595052.3A 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network Pending CN111950346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010595052.3A CN111950346A (en) 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network


Publications (1)

Publication Number Publication Date
CN111950346A true CN111950346A (en) 2020-11-17

Family

ID=73337331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010595052.3A Pending CN111950346A (en) 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN111950346A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634284A (en) * 2020-12-22 2021-04-09 上海体素信息科技有限公司 Weight map loss-based staged neural network CT organ segmentation method and system
CN114519798A (en) * 2022-01-24 2022-05-20 东莞理工学院 Multi-target image data enhancement method based on antagonistic neural network
TWI779760B (en) * 2021-08-04 2022-10-01 瑞昱半導體股份有限公司 Method of data augmentation and non-transitory computer-readable medium
CN115526874A (en) * 2022-10-08 2022-12-27 哈尔滨市科佳通用机电股份有限公司 Round pin of brake adjuster control rod and round pin split pin loss detection method
WO2023246921A1 (en) * 2022-06-23 2023-12-28 京东方科技集团股份有限公司 Target attribute recognition method and apparatus, and model training method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2948816A1 (en) * 2009-07-30 2011-02-04 Univ Paris Sud ELECTRO-OPTICAL DEVICES BASED ON INDEX VARIATION OR ABSORPTION IN ISB TRANSITIONS.
US20120069342A1 (en) * 2010-04-19 2012-03-22 Fraser Dalgleish MEMS Microdisplay Optical Imaging and Sensor Systems for Underwater Scattering Environments
US20170365038A1 (en) * 2016-06-16 2017-12-21 Facebook, Inc. Producing Higher-Quality Samples Of Natural Images
CN109271895A (en) * 2018-08-31 2019-01-25 西安电子科技大学 Pedestrian's recognition methods again based on Analysis On Multi-scale Features study and Image Segmentation Methods Based on Features
CN110021051A (en) * 2019-04-01 2019-07-16 浙江大学 One kind passing through text Conrad object image generation method based on confrontation network is generated
CN110969589A (en) * 2019-12-03 2020-04-07 重庆大学 Dynamic scene fuzzy image blind restoration method based on multi-stream attention countermeasure network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE WU 等: "PMC-GANs:Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs", 《ARXIV》 *
梁礼明 等: "自适应尺度信息的U型视网膜血管分割算法", 《光学学报》 *


Similar Documents

Publication Publication Date Title
CN111950346A (en) Pedestrian detection data expansion method based on generation type countermeasure network
CN105894045B (en) A kind of model recognizing method of the depth network model based on spatial pyramid pond
CN105069746B (en) Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN112734845B (en) Outdoor monocular synchronous mapping and positioning method fusing scene semantics
CN111080659A (en) Environmental semantic perception method based on visual information
CN106022363B (en) A kind of Chinese text recognition methods suitable under natural scene
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN109002752A (en) A kind of complicated common scene rapid pedestrian detection method based on deep learning
CN110197152A (en) A kind of road target recognition methods for automated driving system
CN108416292A (en) A kind of unmanned plane image method for extracting roads based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN112633220B (en) Human body posture estimation method based on bidirectional serialization modeling
CN112288776B (en) Target tracking method based on multi-time step pyramid codec
CN107506765A (en) A kind of method of the license plate sloped correction based on neutral net
CN113076804B (en) Target detection method, device and system based on YOLOv4 improved algorithm
CN111209858A (en) Real-time license plate detection method based on deep convolutional neural network
CN115376024A (en) Semantic segmentation method for power accessory of power transmission line
CN109766790A (en) A kind of pedestrian detection method based on self-adaptive features channel
CN112560865A (en) Semantic segmentation method for point cloud under outdoor large scene
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114758178B (en) Hub real-time classification and air valve hole positioning method based on deep learning
CN112884893A (en) Cross-view-angle image generation method based on asymmetric convolutional network and attention mechanism
CN114399533B (en) Single-target tracking method based on multi-level attention mechanism
Li et al. Line drawing guided progressive inpainting of mural damages
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201117