CN111950346A - Pedestrian detection data expansion method based on generative adversarial network - Google Patents

Pedestrian detection data expansion method based on generative adversarial network

Info

Publication number
CN111950346A
CN111950346A CN202010595052.3A
Authority
CN
China
Prior art keywords
pedestrian
layer
picture
network
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010595052.3A
Other languages
Chinese (zh)
Inventor
彭滢
吴杰
Current Assignee
China Electronic Technology Cyber Security Co Ltd
Original Assignee
China Electronic Technology Cyber Security Co Ltd
Priority date
Filing date
Publication date
Application filed by China Electronic Technology Cyber Security Co Ltd filed Critical China Electronic Technology Cyber Security Co Ltd
Priority to CN202010595052.3A priority Critical patent/CN111950346A/en
Publication of CN111950346A publication Critical patent/CN111950346A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods characterised by the process organisation or structure, e.g. boosting cascade
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Abstract

The invention relates to a pedestrian detection data expansion method based on a generative adversarial network, comprising the following steps: S1, build a three-layer cascaded generative adversarial neural network model and set an objective function for model training; each layer of the network adopts the BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer is a pedestrian instance mask picture together with the output of the previous layer; S2, preprocess the training data; S3, train the three-layer cascaded generative adversarial neural network model with the preprocessed data; S4, complete the expansion of pedestrian detection data through the trained model. With this scheme, the generated pedestrians blend more naturally into the background; improving the generator's U-net structure makes the details of generated pedestrians finer; generating multi-scale pedestrian pictures with the cascade structure improves the quality of large-size, high-resolution pedestrian pictures; and diverse pedestrians can be generated, improving the efficiency of data expansion.

Description

Pedestrian detection data expansion method based on generative adversarial network
Technical Field
The invention relates to the field of image processing, and in particular to a pedestrian detection data expansion method based on a generative adversarial network.
Background
Pedestrian detection is a basic task in video processing and is widely applied in scenarios such as intelligent video surveillance, autonomous driving, and robot automation; training a high-precision pedestrian detection model requires a large-scale, high-quality pedestrian picture data set. At present, research on pedestrian detection mainly uses existing public data sets, most of which come from large internet companies that invest heavily in manual labeling and correction to ensure the reliability of the data sets. When training models on these public data sets, researchers often augment the training pictures with traditional data augmentation methods such as flipping, random cropping, and color adjustment. However, these methods do not enrich the content of the data set itself, so the effect of data expansion is limited. In view of this problem, a generative adversarial neural network with a cascade structure is proposed, which can automatically generate pedestrians of various sizes, high quality, and different clothing, thereby achieving large-scale automatic expansion of a pedestrian detection data set.
A Generative Adversarial Network (GAN) is a deep learning network structure that contains two basic parts: a generator and a discriminator. When generating pictures with a GAN, the generator aims to produce pictures that are as realistic as possible, while the discriminator aims to judge which data are real and which are generated. Through training, the generator and discriminator continuously compete against each other, until the network learns to generate pictures close enough to real ones.
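The generator-vs-discriminator game above can be sketched in a few lines. This is a toy illustration of the adversarial objective only, not the patent's actual networks; all layer sizes and the data are made up.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; sizes are illustrative, not the patent's.
G = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(0.2), nn.Linear(32, 64))
D = nn.Sequential(nn.Linear(64, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(8, 64)   # stand-in for real pedestrian crops
z = torch.randn(8, 16)      # latent noise

# Discriminator step: push real samples toward "real" (1), generated toward "fake" (0).
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(G(z).detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator label generated samples "real".
g_loss = bce(D(G(z)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Alternating these two updates is the "continuous competition" the text describes.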
In recent years, data expansion based on GAN networks has become a research hotspot, with studies on generating plant pictures for plant identification, generating medical CT pictures to assist intelligent diagnosis, and so on; research on generating pedestrian pictures, however, is still scarce, and the quality of generated pedestrian pictures remains to be improved. The prior GAN-based pedestrian generation method mainly works as follows: provide a background picture (such as a street picture), add a pedestrian frame at the position where a pedestrian is to be generated, replace the background inside the frame with noise (such as Gaussian noise), and use this as the input of a GAN network; construct the generator on a U-net network structure; use a local discriminator to judge whether the generated pedestrian is real and a global discriminator to judge whether the whole picture is real; and handle multi-resolution pedestrians of different sizes with Spatial Pyramid Pooling. This approach has several problems. First, the border between the added frame and the background shows obvious edge traces, and the generated picture looks like a square sticker pasted on the background, which is not realistic. Second, the pedestrian details generated by the model are coarse and of low quality, which is particularly acute for large-size, high-resolution pedestrians. Third, the model trained this way lacks diversity: the generated pedestrians wear clothes of similar styles and colors, which is not good enough for data expansion.
Disclosure of Invention
The technical problems to be solved by the invention are: 1. the obvious edge traces where the pedestrian frame is fused with the background in generated pedestrian pictures; 2. the coarse detail of generated pedestrians; 3. the low quality of large-size, high-resolution generated pedestrians; 4. the lack of diversity in generated pedestrian pictures. In view of these problems, a pedestrian detection data expansion method based on a generative adversarial network is provided.
The technical scheme adopted by the invention is as follows: a pedestrian detection data expansion method based on a generative adversarial network, comprising the following steps:
S1, build a three-layer cascaded generative adversarial neural network model and set an objective function for model training; each layer of the network adopts the BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer is a pedestrian instance mask picture together with the output of the previous layer;
S2, preprocess the training data;
S3, train the three-layer cascaded generative adversarial neural network model with the preprocessed data;
S4, complete the expansion of pedestrian detection data through the trained three-layer cascaded generative adversarial neural network model.
Further, in S1, the specific process of building the three-layer cascaded generative adversarial neural network model includes:
S11, construct a generator with a residual U-net structure, in which the encoder part adds multi-scale residual blocks and the decoder part adds channel attention residual blocks. Specifically, the residual U-net generator improves on the U-net structure: in the encoder part, the second 3 x 3 convolution in each basic block of U-net is replaced with a multi-scale residual block to form a new basic block; in the decoder part, the first 3 x 3 convolution in each basic block of U-net is replaced with a channel attention residual block to form a new basic block; and a masked 16-dimensional hidden-layer vector is injected into each intermediate layer of the encoder.
S12, construct a discriminator based on the PatchGAN discriminator;
S13, construct an encoder based on a residual network;
S14, build each layer of the cascade network from the generator, discriminator, and encoder of S11, S12, and S13, where the input picture resolution of the first layer is 64 x 64, the second layer 128 x 128, and the third layer 256 x 256; adjacent layers of the generative adversarial network are connected through a convolution layer, forming the three-layer cascaded generative adversarial neural network;
S15, add a VGG-19-based perceptual loss to the BicycleGAN objective function to form the objective function of the three-layer cascaded generative adversarial neural network model.
Further, in step S11, each intermediate layer of the encoder part of the generator injects a 16-dimensional hidden-layer vector z that has been masked by the pedestrian instance mask.
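One plausible reading of the masked latent injection is to tile the 16-dimensional vector z over the spatial grid, zero it outside the pedestrian instance mask, and concatenate it to the layer's feature maps. The patent does not spell out the exact operation, so the sketch below is an assumption; the function name and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def inject_masked_latent(feat, z, mask):
    """Tile a 16-D latent z spatially, zero it outside the pedestrian
    instance mask, and concatenate it to the feature map (an assumed
    reading of the patent's "masked hidden-layer vector" injection)."""
    b, _, h, w = feat.shape
    z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
    # Downsample the full-resolution mask to this layer's spatial size.
    m = F.interpolate(mask, size=(h, w), mode="nearest")
    return torch.cat([feat, z_map * m], dim=1)

feat = torch.randn(2, 64, 16, 16)   # an intermediate encoder feature map
z = torch.randn(2, 16)              # 16-dimensional latent vector
mask = torch.ones(2, 1, 64, 64)     # pedestrian instance mask at input resolution
out = inject_masked_latent(feat, z, mask)
```

The concatenated result has 64 + 16 channels, so the latent influences only the masked (pedestrian) region of each layer.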
Further, the objective function of the three-layer cascaded generative adversarial neural network model in step S15 is specifically:

G*, E* = arg min_{G,E} max_{D_whole, D_local} L_GAN^VAE(G, D, E) + λ·L_1^VAE(G, E) + L_GAN(G, D) + λ_latent·L_1^latent(G, E) + λ_KL·L_KL(E) + λ_VGG·L_VGG

where G* and E* denote the generator and encoder respectively, D_whole is the global discriminator, D_local is the local discriminator, and L_GAN^VAE(·) and L_GAN(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in the BicycleGAN network structure, respectively; L_1^VAE is an L1 loss, which makes the output of the generator as similar as possible to the pedestrian sample picture; L_1^latent is also an L1 loss, which makes the output of the encoder as close to a Gaussian distribution as possible; L_KL is the KL distance in cLR-GAN, and L_VGG is the perceptual loss. λ, λ_KL, λ_latent, λ_VGG are hyper-parameters controlling the weights of the corresponding terms.
Further, S2 specifically includes:
S21, extract from the Cityscapes data set pedestrian sample pictures at the pixel sizes required by each layer of the generative adversarial network, obtaining a pedestrian sample picture set;
S22, obtain the instance label map corresponding to each pedestrian sample picture from the Cityscapes instance label map set and the pedestrian sample picture set obtained in step S21, aligning and cropping each label map with its corresponding sample picture; repeat this process to obtain an instance label map set L corresponding to the pedestrian sample picture set;
S23, in each instance label map, set the pixel values belonging to the pedestrian in the middle to 1 and all other pixel values to 0, obtaining the pedestrian instance mask M of each sample picture;
S24, apply the obtained pedestrian instance mask to the corresponding pedestrian sample picture, obtaining the masked picture B_M;
S25, align the instance label maps of the Cityscapes data set with the pedestrian sample picture set obtained in step S21, set the pixel values on the boundaries between instances to 1 and the pixel values inside instances to 0, obtaining the corresponding instance edge map E;
S26, concatenate, in order, the B_M, M, L, and E corresponding to each picture in the sets obtained in steps S22, S23, S24, and S25, obtaining the input set A of the three-layer cascaded generative adversarial neural network model, where A = {B_M, M, L, E}.
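The assembly of A = {B_M, M, L, E} in steps S23 to S26 amounts to channel-wise concatenation. A minimal NumPy sketch, with synthetic stand-ins for the Cityscapes label maps (the instance id 2 for the middle pedestrian and the white masking color are assumptions for illustration):

```python
import numpy as np

H = W = 64                                      # first-layer resolution
rng = np.random.default_rng(0)
sample = rng.random((H, W, 3))                  # pedestrian sample picture
label = rng.integers(0, 5, (H, W))              # instance label map L (synthetic)
mask = (label == 2).astype(np.float32)          # pedestrian instance mask M (id 2 assumed)

# Instance edge map E: 1 where horizontally adjacent labels differ (simplified).
edge = np.zeros((H, W), np.float32)
edge[:, :-1] = (label[:, :-1] != label[:, 1:]).astype(np.float32)

# B_M: the middle pedestrian masked out in white, the rest of the picture kept.
b_masked = sample * (1.0 - mask[..., None]) + mask[..., None]

# A = {B_M, M, L, E}, concatenated channel-wise.
A = np.concatenate([b_masked, mask[..., None],
                    label[..., None], edge[..., None]], axis=-1)
```

The model thus receives a 6-channel input per picture: 3 masked-background channels plus one channel each for M, L, and E.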
Further, in step S21, different pedestrian sample pictures are extracted for each layer of the generative adversarial neural network:
for the first-layer network, take pedestrian samples with heights of [64, 256] pixels in the data set, where each pedestrian sample is a square picture whose side length equals the pedestrian's height and whose center is the pedestrian's center, and resize the extracted pictures to 64 x 64 pixels;
for the second-layer network, take pedestrian samples with heights of [100, 1024] pixels and resize the pictures to 128 x 128 pixels;
for the third-layer network, take pedestrian samples with heights of [150, 1024] pixels and resize the pictures to 256 x 256 pixels.
Further, S3 specifically includes: the cascade network is trained for N rounds.
The goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1; the objective function does not use the perceptual loss when training the first layer.
When training the second layer, for the first N/2 rounds fix generator G_1 and encoder E_1 and update only the weights of generator G_2 and encoder E_2; for the last N/2 rounds, update the weights of generators G_1, G_2 and encoders E_1, E_2 simultaneously.
When training the third layer, for the first N/2 rounds fix generators G_1, G_2 and encoders E_1, E_2 and update only the weights of generator G_3 and encoder E_3; for the last N/2 rounds, update the weights of generators G_1, G_2, G_3 and encoders E_1, E_2, E_3 simultaneously.
Here G_1, G_2, G_3 denote the generators of the first-, second-, and third-layer generative adversarial neural networks, and E_1, E_2, E_3 denote the corresponding encoders.
Further, in S3, the weights are updated with the Adam optimization method, with learning rate w^(h-i)·lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the ordinal number of the layer currently being trained, and w is a weighting parameter.
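The per-layer learning rate w^(h-i)·lr can be written directly; the example values below (lr = 0.0002, h = 3, w = 0.01) are the ones given in the embodiment described later.

```python
def layer_lr(base_lr: float, h: int, i: int, w: float) -> float:
    """Learning rate w**(h - i) * base_lr for layer i of an h-layer cascade,
    as stated in step S3: lower (earlier) layers get smaller rates."""
    return (w ** (h - i)) * base_lr

# Embodiment values: lr = 0.0002, h = 3, w = 0.01.
rates = [layer_lr(0.0002, 3, i, 0.01) for i in (1, 2, 3)]
```

With these values the layer currently being trained (i = h) uses the full base rate, while already-trained lower layers are fine-tuned with rates scaled down by factors of w.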
Further, S4 specifically includes:
S41, establish the linear relationship between the pedestrian height P_h in the picture and the pedestrian position P_pos;
S42, obtain a road surface position coordinate set according to the instance labels provided by the data set;
S43, obtain a pedestrian position coordinate set by collecting, from the instance labels provided by the data set, the position coordinates within 10 x 10 pixels of the bottom edge of each existing pedestrian's bounding box;
S44, for a given picture I in which a pedestrian is to be generated, randomly select one of the two sets (the road surface position coordinate set or the pedestrian position coordinate set), and randomly select a position from it as the pedestrian position P_pos; calculate the height P_h of the newly generated pedestrian according to the linear relationship of step S41;
crop from picture I a background picture I_bg of size P_h x P_h whose center coincides with the center of the new pedestrian to be generated; randomly select a mask M from the pedestrian instance mask data set, together with its corresponding instance label map L and edge picture E; compute the masked picture B_M from I_bg and the mask M; input the mask M, instance label map L, edge picture E, and masked picture B_M together into the trained three-layer cascaded generative adversarial neural network model to obtain the generated picture I_ped; in picture I, replace the background picture I_bg pixel by pixel with the generated picture I_ped, completing one data expansion;
S45, repeat step S41 to obtain a large amount of expansion data.
Further, in S41, the linear relationship between the pedestrian height P_h and the pedestrian position P_pos is specifically:

P_h^global = a^global * P_pos^global + b^global

where P_h^global is the statistic of the pedestrian height P_h over the entire data set, and P_pos^global is the statistic of the pedestrian position P_pos over the entire data set.
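Fitting a^global and b^global from pedestrian-box statistics is an ordinary least-squares line fit. A sketch with synthetic stand-ins for the data-set labels (the box coordinates below are made up and lie exactly on a line so the fit is easy to check):

```python
import numpy as np

# Synthetic stand-ins for pedestrian-box labels: bottom-edge y-coordinates
# (P_pos) and box heights (P_h), constructed to lie exactly on P_h = 0.5*P_pos - 50.
pos = np.array([300.0, 400.0, 500.0, 600.0, 700.0])
height = 0.5 * pos - 50.0

# Fit P_h^global = a^global * P_pos^global + b^global by least squares.
a_global, b_global = np.polyfit(pos, height, 1)

# Predict the height of a new pedestrian placed at P_pos = 550.
predicted = a_global * 550.0 + b_global
```

On real labels the points would scatter around the line; polyfit still returns the least-squares a^global and b^global.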
Compared with the prior art, the beneficial effects of this technical scheme are as follows:
1. the pedestrian instance mask solves the problem of obvious edges where the pedestrian frame meets the background picture; the instance mask provides the pedestrian's shape, so the generated pedestrian's body edges are clearer and the posture more realistic;
2. multi-scale residual blocks and channel attention residual blocks are introduced to improve the generator's U-net structure, making the details of generated pedestrians finer;
3. generating multi-scale pedestrian pictures with the cascade structure improves the quality of large-size, high-resolution pedestrian pictures;
4. diverse pedestrians can be generated, improving the efficiency of data expansion.
Drawings
Fig. 1 is a schematic process diagram of the GAN-based pedestrian detection data expansion method of the invention.
Fig. 2 is a schematic diagram of the overall structure of the three-layer cascaded generative adversarial neural network of the invention.
Fig. 3 is a schematic diagram of the generator's residual U-net network structure.
Fig. 4 is a schematic diagram of the multi-scale residual block structure in the generator.
Fig. 5 is a schematic diagram of the channel attention residual block structure in the generator.
Fig. 6 is a schematic diagram of the discriminator structure.
Fig. 7 is a schematic diagram of the encoder structure.
Fig. 8 is a schematic diagram of the connection structure between cascaded layers.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The technical problems to be solved by the invention are as follows:
1. the problem that edge traces are obvious when a pedestrian frame and a background are fused in a generated pedestrian picture is solved;
2. the problem that the details of the generated pedestrian are rough is solved;
3. the problem of low mass of the pedestrians with large size and high resolution is solved;
4. the problem that the generated pedestrian pictures are lack of diversity is solved.
Based on the above, the invention provides a pedestrian detection data expansion method based on a generative adversarial network; the specific scheme is as follows:
Step 1: build the cascaded generative adversarial neural network. The scheme proposes a three-layer cascaded generative adversarial neural network (as shown in Fig. 2); each layer uses the BicycleGAN structure, but the generator's network structure is improved into a residual U-net network. The neural network of this embodiment learns the mapping from B_M to B, where B_M is the input-domain set, each element of which is a background picture masked by a pedestrian instance mask, and B is the output-domain set, each element of which is a pedestrian picture including background. To provide more information to the network, training inputs include not only B_M but also the corresponding pedestrian instance mask set M, the instance label map set L used to compute M, and the instance edge map set E obtained from L. The network input is therefore A = {B_M, M, L, E}, and the output is B. Note that the network of this embodiment learns a one-to-many mapping: one input has many possible outputs, but during training only one-to-one pairs are given, while many outputs can be obtained at test time. Specifically:
Step 11, construct the generator with residual U-net structure. The residual U-net generator improves on U-net: in the encoder part, the second 3 x 3 convolution in each basic block of U-net is replaced with a multi-scale residual block to form a new basic block; in the decoder part, the first 3 x 3 convolution in each basic block of U-net is replaced with a channel attention residual block to form a new basic block; and a masked 16-dimensional hidden-layer vector is injected into each intermediate layer of the encoder. For the three-layer cascaded generative adversarial neural network, in each layer the generator has equal numbers of basic blocks in the encoder and decoder parts, with totals n_1 = 12, n_2 = 14, n_3 = 16; the j-th and (n-j)-th basic blocks are connected with skip connections consistent with the original U-net.
Specifically, each multi-scale residual block of the encoder part is defined as:

S_1 = σ(w_{3x3}^1 * M_{n-1} + b^1)
P_1 = σ(w_{5x5}^1 * M_{n-1} + b^1)
S_2 = σ(w_{3x3}^2 * [S_1, P_1] + b^2)
P_2 = σ(w_{5x5}^2 * [P_1, S_1] + b^2)
S' = w_{1x1}^3 * [S_2, P_2] + b^3
M_n = S' + M_{n-1}

where w and b are weights and biases, the superscript indicates the position of the network layer, and the subscript indicates the convolution kernel size in the convolution network. Unlike the multi-scale residual block of the prior art, the activation function σ(·) of the invention is not ReLU but LeakyReLU; brackets [·] denote a concatenation operation; and M_n and M_{n-1} denote the output and input of the multi-scale residual block, respectively.
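The equations above can be sketched as a PyTorch module. This is an illustrative implementation under the stated structure (parallel 3x3/5x5 branches that exchange features, a 1x1 fusion, LeakyReLU, identity shortcut); the channel counts and LeakyReLU slope 0.2 are assumptions, since the patent does not fix them here.

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Sketch of the encoder's multi-scale residual block following the
    equations in the text: S1/P1 from 3x3 and 5x5 convs, cross-concatenated
    S2/P2, a 1x1 fusion S', and the identity shortcut M_n = S' + M_{n-1}."""
    def __init__(self, ch: int):
        super().__init__()
        self.c3_1 = nn.Conv2d(ch, ch, 3, padding=1)       # w^1_{3x3}
        self.c5_1 = nn.Conv2d(ch, ch, 5, padding=2)       # w^1_{5x5}
        self.c3_2 = nn.Conv2d(2 * ch, ch, 3, padding=1)   # w^2_{3x3}
        self.c5_2 = nn.Conv2d(2 * ch, ch, 5, padding=2)   # w^2_{5x5}
        self.c1 = nn.Conv2d(2 * ch, ch, 1)                # w^3_{1x1}
        self.act = nn.LeakyReLU(0.2)                      # sigma: LeakyReLU, not ReLU

    def forward(self, m):
        s1 = self.act(self.c3_1(m))
        p1 = self.act(self.c5_1(m))
        s2 = self.act(self.c3_2(torch.cat([s1, p1], dim=1)))
        p2 = self.act(self.c5_2(torch.cat([p1, s1], dim=1)))
        return self.c1(torch.cat([s2, p2], dim=1)) + m    # S' + M_{n-1}

out = MultiScaleResidualBlock(8)(torch.randn(1, 8, 16, 16))
```

The padding values keep the spatial size fixed, so the residual addition is well-defined.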
In the decoder's channel attention residual block, channel attention CA(X) is defined in three steps:

y_c = AA(x_c) = (1 / (H x W)) * Σ_{i=1..H} Σ_{j=1..W} x_c(i, j)
s = f(W_U σ(W_D y))
x̂_c = s_c · x_c

where the input data X = [x_1, x_2, ..., x_C] consists of C feature maps of size H x W, y is the statistic of each channel, x_c(i, j) is the value at location (i, j), AA(·) is average pooling, f(·) is the sigmoid function, σ(·) is LeakyReLU, and W denotes weights. Based on CA(X), the channel attention residual block A(X) is expressed as:

A_n = CA(X) · X + A_{n-1}
X = W_2 σ(W_1 A_{n-1})

where W_1 and W_2 are the weights of the two convolution layers.
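The channel-attention residual block can likewise be sketched in PyTorch. The reduction ratio is r = 16 in the embodiment; r = 4 is used below only so the tiny illustrative channel count still divides, and the LeakyReLU slope 0.2 is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttentionResidualBlock(nn.Module):
    """Sketch of the decoder's channel-attention residual block: two convs
    produce X = W_2 sigma(W_1 A_{n-1}); channel attention CA(X) rescales X
    via globally average-pooled channel statistics; A_n = CA(X)*X + A_{n-1}."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),   # W_1
            nn.LeakyReLU(0.2),                 # sigma
            nn.Conv2d(ch, ch, 3, padding=1))   # W_2
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # AA: per-channel statistic y
            nn.Conv2d(ch, ch // r, 1),         # W_D (channel reduction)
            nn.LeakyReLU(0.2),                 # sigma
            nn.Conv2d(ch // r, ch, 1),         # W_U (channel restoration)
            nn.Sigmoid())                      # f

    def forward(self, a):
        x = self.body(a)
        return self.attn(x) * x + a            # CA(X)*X + A_{n-1}

out = ChannelAttentionResidualBlock(8)(torch.randn(1, 8, 16, 16))
```

The attention weights s lie in (0, 1) per channel, so each channel of X is rescaled before the shortcut addition.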
Fig. 3 is a schematic diagram of the generator structure in the first-layer generative adversarial neural network, where C(in, out, k, s) represents a convolution layer with in input channels, out output channels, kernel size k x k, and stride s; CT(in, out, k, s) represents a transposed convolution layer; CAT is the concatenation operation; DS(t) is a downsampling operation by a factor of t; M(in, out) is a multi-scale feature residual block; and A(in, inter) is a channel attention residual block. Fig. 4 is a schematic structural diagram of the multi-scale feature residual block, and Fig. 5 of the channel attention residual block, where inter is the number of channels of the intermediate convolution layer in the residual block, computed from the input channel number in and the reduction rate r; in this embodiment r = 16.
Step 12, construct the discriminator. All discriminators use the discriminator structure proposed by PatchGAN, shown schematically in Fig. 6, where C(in, out, k, s) represents a convolution layer and AvgPool(k, s) an average pooling layer.
The network uses LeakyReLU with parameter 0.2 as the activation function, and Instance Normalization for instance regularization.
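A PatchGAN-style discriminator outputs a map of per-patch realism scores rather than a single scalar. The depth and layer widths below are illustrative assumptions, not the exact layout of Fig. 6; the LeakyReLU(0.2) and instance normalization follow the text.

```python
import torch
import torch.nn as nn

def patch_discriminator(in_ch: int = 3, base: int = 64) -> nn.Sequential:
    """Sketch of a PatchGAN-style discriminator: strided convs with
    LeakyReLU(0.2) and InstanceNorm, ending in a 1-channel patch-score map."""
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
        nn.InstanceNorm2d(base * 2),
        nn.LeakyReLU(0.2),
        nn.Conv2d(base * 2, 1, 4, stride=1, padding=1))  # patch score map

scores = patch_discriminator()(torch.randn(1, 3, 64, 64))
```

Each element of the output map judges one receptive-field patch of the input, which is what lets the same structure serve as both the local and the global discriminator at different input crops.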
Step 13, construct the encoder of the BicycleGAN network. The encoder is based on a residual network, shown schematically in Fig. 7, where R(in, out, k, s) represents a standard residual block, Linear(in, out) represents a linear layer, and ⊕ represents element-wise addition. The network uses LeakyReLU with parameter 0.2 as the activation function, and Instance Normalization for instance regularization.
Step 14, build the three-layer cascaded generative adversarial neural network.
Each layer of the cascade network uses the generator, discriminator, and encoder constructed in steps 11, 12, and 13; the first layer takes input pictures at a resolution of 64 x 64, the second layer 128 x 128, and the third layer 256 x 256. Adjacent GAN layers are connected by a convolution layer, shown schematically in Fig. 8.
Step 15, set the objective function for model training. Since each layer of the cascade is a network with the BicycleGAN structure, this embodiment adopts the objective function proposed by BicycleGAN. In addition, to make the generated pedestrians more human-like, a VGG-19-based perceptual loss is added to the objective function. The final objective function is:

G*, E* = arg min_{G,E} max_{D_whole, D_local} L_GAN^VAE(G, D, E) + λ·L_1^VAE(G, E) + L_GAN(G, D) + λ_latent·L_1^latent(G, E) + λ_KL·L_KL(E) + λ_VGG·L_VGG

where G* and E* denote the generator and encoder respectively, D_whole is the global discriminator, D_local is the local discriminator, and L_GAN^VAE(·) and L_GAN(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in BicycleGAN, respectively. L_1^VAE is an L1 loss, which makes the output of the generator as similar as possible to the pedestrian sample picture; L_1^latent is also an L1 loss, which makes the output of the encoder as close to a Gaussian distribution as possible. L_KL is the KL distance in cLR-GAN, and L_VGG is the perceptual loss. λ, λ_KL, λ_latent, λ_VGG are hyper-parameters controlling the weights of the corresponding terms.
Step 2: data preprocessing. The scheme trains the model on the training set of the public Cityscapes data set and tests on its validation set. The resolution of each street scene picture in the data set is 1920 x 1080, and model training focuses only on the parts of the pictures containing pedestrians. The specific steps are as follows:
Step 21, extract pedestrian samples from the data set. To train the first-layer GAN network, take pedestrian samples with heights of [64, 256] pixels in the data set; each pedestrian sample is a square picture whose side length equals the pedestrian's height and whose center is the pedestrian's center, and resize the extracted pictures to 64 x 64 pixels. To train the second-layer GAN network, take pedestrian samples with heights of [100, 1024] pixels and resize the pictures to 128 x 128 pixels. To train the third-layer GAN network, take pedestrian samples with heights of [150, 1024] pixels and resize the pictures to 256 x 256 pixels.
Step 22, obtain the instance label map set L corresponding to the pedestrian samples. From the instance label map set provided by Cityscapes and the pedestrian sample picture set obtained in step 21, obtain the instance label map corresponding to each pedestrian sample picture; align and crop each label map with its sample picture to obtain the instance label map set.
Step 23, obtain the pedestrian instance mask M corresponding to each pedestrian sample. For each label map, set the pixel values belonging to the pedestrian in the middle to 1 and all other pixel values to 0, obtaining the pedestrian instance mask of each sample picture.
Step 24, obtain the picture B_M masked by the pedestrian instance mask. Using the pedestrian sample picture obtained in step 21 and the pedestrian instance mask obtained in step 23, obtain the masked picture: the pedestrian in the middle of the picture is masked out in white, while the other parts of the picture are retained.
Step 25, obtain the instance edge map set E corresponding to the pedestrian samples. Aligning the instance label maps provided by Cityscapes with the sample picture set obtained in step 21, set the pixel values on the boundaries between instances to 1 and the pixel values inside instances to 0, obtaining the corresponding instance edge maps.
Step 26, obtain the model input set A = {B_M, M, L, E}. Concatenate, in order, the B_M, M, L, and E corresponding to each picture in the sets obtained in steps 22, 23, 24, and 25, obtaining the input set.
Step 3: train the three-layer cascaded generative adversarial neural network model. The hyper-parameters λ, λ_KL, λ_latent, λ_VGG are set to 10, 0.01, 0.5, and 1 respectively, and the number of training rounds is 200.
No perceptual loss is used when training the first-layer GAN network, as this loss was found to cause training instability at that resolution.
When training the cascade network, the goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1, training for 200 rounds. When training the second layer, the first 100 rounds fix the weights of G_1 and E_1 and update only G_2 and E_2, while the last 100 rounds update G_1, E_1, G_2, and E_2 simultaneously; the third layer adopts the same strategy as the second layer.
The weights are updated with the Adam optimization method at learning rate w^(h-i)·lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the ordinal number of the layer currently being trained, and w is a weighting parameter; here lr = 0.0002, h = 3, i ∈ {1, 2, 3}, and w = 0.01.
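The freeze-then-unfreeze schedule can be expressed by toggling `requires_grad` on the lower layers' parameters. A minimal sketch for the third-layer phase, with tiny placeholder modules standing in for the actual generators:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    """Freeze (flag=False) or unfreeze (flag=True) all parameters."""
    for p in module.parameters():
        p.requires_grad = flag

# Placeholder stand-ins for the layer generators G1, G2, G3.
G1, G2, G3 = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

N = 200
schedule = []
for epoch in range(N):
    lower_frozen = epoch < N // 2        # first N/2 rounds: only G3 learns
    set_trainable(G1, not lower_frozen)  # last N/2 rounds: fine-tune G1, G2 too
    set_trainable(G2, not lower_frozen)
    set_trainable(G3, True)
    schedule.append(lower_frozen)
```

The same toggling applies to the encoders E_1, E_2, E_3; an optimizer built over all parameters simply skips the frozen ones.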
Step 4: expanding the CityPersons pedestrian detection data using the model trained in step 3. CityPersons is a public dataset extended from the Cityscapes dataset; it likewise provides city street pictures, instance label annotations, and so on.
The specific steps of data expansion are as follows:
Step 41, determining the relation between pedestrian height and pedestrian position in the dataset. Let P_h denote the height of a pedestrian and P_pos its position; the two are correlated: the closer a pedestrian is to the camera that took the picture, the greater its height. P_h and P_pos can be obtained from the pedestrian bounding-box labels provided by the dataset: P_h is the height of the pedestrian box, and P_pos is the coordinate of the bottom edge of the box on the vertical axis. Here the upper-left corner of the picture is taken as the origin, the line through the top edge of the picture is the horizontal axis with rightward as the positive direction, and the line through the left edge is the vertical axis with downward as the positive direction. From the statistical values P_h^global of P_h and P_pos^global of P_pos over the entire dataset, a global linear relationship can be fitted:
P_h^global = a_global · P_pos^global + b_global
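Fitting a_global and b_global from the bounding-box labels is an ordinary least-squares problem. A sketch with NumPy follows; the box tuple format `(x, y_top, width, height)` is an assumption for illustration, not the dataset's actual annotation schema:

```python
import numpy as np

def fit_height_position(boxes):
    """Sketch of step 41: fit the global linear relation P_h = a * P_pos + b.

    `boxes` is a hypothetical list of (x, y_top, width, height) pedestrian
    boxes in image coordinates (origin at the top-left, y pointing down),
    so the bottom-edge y-coordinate P_pos is y_top + height.
    """
    heights = np.array([h for (_, _, _, h) in boxes], dtype=float)        # P_h
    positions = np.array([y + h for (_, y, _, h) in boxes], dtype=float)  # P_pos
    a, b = np.polyfit(positions, heights, deg=1)
    return a, b
```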
Step 42, selecting positions suitable for generating pedestrians. By real-world knowledge, a pedestrian must appear on a sidewalk or road (collectively, the road surface) and cannot appear in unsuitable positions such as the sky or in trees. From the instance labels provided by the dataset, a road-surface position coordinate set {Ground} can therefore be obtained. On this basis, we assume that a newly generated pedestrian may appear either next to an existing pedestrian or at any position on the road surface. Again using the instance labels provided by the dataset, the position coordinates within 10 × 10 pixels of the bottom edge of each existing pedestrian's bounding box are collected as the position coordinate set {Person}.
Step 43, expanding the pedestrian data. For a given picture I in which a pedestrian is to be generated, randomly select one of the two sets {Ground} and {Person}, and then randomly select a position from it as the pedestrian position P_pos. Compute the height P_h of the newly generated pedestrian from the linear relation of step 41. Crop from picture I a background picture I_bg of size P_h × P_h whose center coincides with the center of the new pedestrian to be generated. Next, randomly select a mask M from the pedestrian instance mask dataset, together with its corresponding instance label map L and edge picture E, and compute the masked picture B_M from I_bg and the mask M. Feed the mask M, the instance label map L, the edge picture E and the masked picture B_M into the cascaded GAN model trained in step 3 to obtain a generated picture I_ped, and replace the background picture I_bg in picture I pixel by pixel with I_ped, completing one data expansion. By choosing I, P_pos and M according to actual demand and repeating step 43, a large amount of expanded data can be obtained.
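The crop-mask-generate-paste loop of step 43 can be sketched as follows, with the trained cascaded GAN abstracted behind a `generator` callable. All names and the exact crop geometry are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def expand_once(image, p_pos, p_h, mask, generator):
    """Sketch of one data expansion (step 43); the GAN is hidden behind `generator`.

    Assumptions: `image` is H x W x 3; (x, y) = p_pos is the bottom-centre of the
    new pedestrian; p_h is its height from the fitted linear relation; `mask` is
    a p_h x p_h pedestrian instance mask with 1 on pedestrian pixels.
    """
    x, y = p_pos
    top, left = y - p_h, x - p_h // 2            # P_h x P_h crop around the pedestrian
    i_bg = image[top:top + p_h, left:left + p_h].copy()
    b_m = i_bg.copy()
    b_m[mask.astype(bool)] = 255                 # whiten the pedestrian region -> B_M
    i_ped = generator(b_m, mask)                 # trained cascaded GAN would go here
    out = image.copy()
    out[top:top + p_h, left:left + p_h] = i_ped  # pixel-by-pixel replacement
    return out
```

In practice the `generator` call would also receive the instance label map L and edge picture E, as the text describes; they are omitted here to keep the sketch short.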
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed. Those skilled in the art to which the invention pertains will appreciate that insubstantial changes or modifications can be made without departing from the spirit of the invention as defined by the appended claims.
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.

Claims (10)

1. A pedestrian detection data expansion method based on a generative adversarial network, characterized by comprising the following steps:
S1, building a three-layer cascaded generative adversarial network model, and setting an objective function for model training; each layer of the cascaded network adopts a BicycleGAN structure, the generator adopts a residual U-net structure, and the input of each subsequent layer consists of a pedestrian instance mask picture and the output of the previous layer;
S2, preprocessing training data;
S3, training the three-layer cascaded generative adversarial network model with the preprocessed data;
S4, completing the expansion of pedestrian detection data through the trained three-layer cascaded generative adversarial network model.
2. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 1, wherein in step S1, the specific process of building the three-layer cascaded generative adversarial network model comprises:
S11, constructing a generator with a residual U-net structure, wherein a multi-scale residual block is added to the encoder part of the generator and a channel attention residual block is added to the decoder part; the numbers of basic blocks of the generator in the first-, second- and third-layer networks are n_1 = 12, n_2 = 14 and n_3 = 16 respectively, with a skip connection between the j-th and the (n−j)-th basic blocks of each layer; each basic block comprises a multi-scale residual block and a channel attention residual block;
S12, constructing a discriminator based on the PatchGAN discriminator;
S13, constructing an encoder based on a residual network;
S14, building each layer of the cascaded network from the generator, discriminator and encoder of S11, S12 and S13, wherein the input picture resolution of the first layer is 64 × 64, that of the second layer is 128 × 128, and that of the third layer is 256 × 256; adjacent layers of the generative adversarial network are connected through a convolution layer to form the three-layer cascaded generative adversarial network;
S15, adding a VGG-19-based perceptual loss to the BicycleGAN objective function to obtain the objective function of the three-layer cascaded generative adversarial network model.
3. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 2, wherein in step S11, each intermediate layer of the encoder part of the generator injects a 16-dimensional latent vector z, and the latent vector z is masked by the pedestrian instance mask.
4. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 3, wherein the objective function of the three-layer cascaded generative adversarial network model in step S15 is specifically:

(G*, E*) = arg min_{G,E} max_{D_whole, D_local} [ L_GAN^cVAE(G, D, E) + λ · L1^cVAE(G, E) + L_GAN^cLR(G, D) + λ_latent · L1^latent(G, E) + λ_KL · L_KL(E) + λ_VGG · L_VGG(G, E) ]

wherein G*, E* denote the optimal generator and encoder respectively, D_whole is the global discriminator, and D_local is the local discriminator; L_GAN^cVAE(·) and L_GAN^cLR(·) denote the adversarial loss objective functions of cVAE-GAN and cLR-GAN in the BicycleGAN network structure respectively, each evaluated against both D_whole and D_local; L1^cVAE(·) is an L1 loss that makes the output of the generator as similar as possible to the pedestrian sample picture; L1^latent(·) is also an L1 loss, which makes the output of the encoder as close as possible to the Gaussian-sampled latent vector; L_KL is the KL divergence term in cLR-GAN; L_VGG is the perceptual loss; and λ, λ_KL, λ_latent, λ_VGG are hyperparameters that control the weights of the corresponding terms.
5. The pedestrian detection data expansion method based on the generative adversarial network according to claim 1 or 4, wherein step S2 specifically comprises:
S21, taking from the Cityscapes dataset pedestrian sample maps at the pixel sizes required by each layer of the generative adversarial network, to obtain a pedestrian sample map set;
S22, obtaining the instance label map corresponding to each pedestrian sample map from the Cityscapes instance label map set and the pedestrian sample map set obtained in step S21, aligning and cropping each label map with the corresponding sample map, and repeating the process to obtain the instance label map set L corresponding to the pedestrian sample map set;
S23, setting the value of the pixels belonging to the pedestrian in the middle of each sample label map to 1 and the value of all other pixels to 0, so as to obtain the pedestrian instance mask M of each sample map;
S24, processing the corresponding pedestrian sample image with the obtained pedestrian instance mask, to obtain the masked image B_M;
S25, aligning the instance label maps of the Cityscapes dataset with the pedestrian sample map set obtained in step S21, setting the pixel value on the boundary between instances in the instance label map to 1 and the pixel value inside each instance to 0, so as to obtain the corresponding instance edge map E;
S26, concatenating in order the B_M, M, L and E corresponding to each picture in the sets obtained in steps S22, S23, S24 and S25, to obtain the input set A of the three-layer cascaded generative adversarial network model, where A = {B_M, M, L, E}.
6. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 5, wherein in step S21, different pedestrian sample maps are extracted for each layer of the generative adversarial network:
for the first-layer network, pedestrian samples with heights of [64, 256] pixels are taken from the dataset; each pedestrian sample is a square picture whose side length equals the pedestrian height and whose center is the center of the pedestrian, and the taken picture is resized to 64 × 64 pixels;
for the second-layer network, taking out a pedestrian sample with the height of [100, 1024] pixels, and adjusting the size of the picture to 128 × 128 pixels;
for the third tier network, take pedestrian samples at [150, 1024] pixel height and resize the picture to 256 × 256 pixels.
7. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 6, wherein step S3 specifically comprises: training the cascaded network for N epochs, wherein
the goal of the first-layer network is to learn the weights of generator G_1 and encoder E_1, and the objective function does not use the perceptual loss when training the first layer;
when training the second layer, generator G_1 and encoder E_1 are frozen for the first N/2 epochs and only the weights of generator G_2 and encoder E_2 are updated; for the last N/2 epochs the weights of generators G_1, G_2 and encoders E_1, E_2 are updated simultaneously;
when training the third layer, generators G_1, G_2 and encoders E_1, E_2 are frozen for the first N/2 epochs and only the weights of generator G_3 and encoder E_3 are updated; for the last N/2 epochs the weights of generators G_1, G_2, G_3 and encoders E_1, E_2, E_3 are updated simultaneously;
wherein G_1, G_2, G_3 denote the generators of the first-, second- and third-layer generative adversarial networks respectively, and E_1, E_2, E_3 denote the corresponding encoders.
8. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 7, wherein in step S3 the weights are updated by the Adam optimization method at a learning rate of w^(h−i) · lr, where lr is the base learning rate, h is the total number of layers in the cascade, i is the index of the layer currently being trained, and w is a weighting parameter.
9. The pedestrian detection data expansion method based on the generative adversarial network according to claim 8, wherein step S4 specifically comprises:
S41, establishing the linear relationship between the pedestrian height P_h and the pedestrian position P_pos in the picture;
S42, obtaining a road-surface position coordinate set from the instance label annotations provided by the dataset;
S43, obtaining a pedestrian position coordinate set by collecting, from the instance labels provided by the dataset, the position coordinates within 10 × 10 pixels of the bottom edge of each existing pedestrian's bounding box;
S44, for a given picture I in which a pedestrian is to be generated, randomly selecting one of the road-surface position coordinate set and the pedestrian position coordinate set, and randomly selecting a position from it as the pedestrian position P_pos; computing the height P_h of the newly generated pedestrian according to the linear relation of step S41;
cropping from picture I a background picture I_bg of size P_h × P_h whose center coincides with the center of the new pedestrian to be generated; randomly selecting a mask M from the pedestrian instance mask dataset together with its corresponding instance label map L and edge picture E; computing the masked picture B_M from I_bg and the mask M; feeding the mask M, the instance label map L, the edge picture E and the masked picture B_M into the trained three-layer cascaded generative adversarial network model to obtain a generated picture I_ped; and replacing the background picture I_bg in picture I pixel by pixel with I_ped, completing one data expansion;
S45, repeating step S44 to obtain a large amount of expanded data.
10. The pedestrian detection data expansion method based on the generative adversarial network as claimed in claim 9, wherein in step S41 the linear relationship between the pedestrian height P_h and the pedestrian position P_pos is specifically:

P_h^global = a_global · P_pos^global + b_global

wherein P_h^global is the statistical value of the pedestrian height P_h over the entire dataset, and P_pos^global is the statistical value of the pedestrian position P_pos over the entire dataset.
CN202010595052.3A 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network Pending CN111950346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010595052.3A CN111950346A (en) 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network


Publications (1)

Publication Number Publication Date
CN111950346A true CN111950346A (en) 2020-11-17

Family

ID=73337331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010595052.3A Pending CN111950346A (en) 2020-06-28 2020-06-28 Pedestrian detection data expansion method based on generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN111950346A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634284A (en) * 2020-12-22 2021-04-09 上海体素信息科技有限公司 Weight map loss-based staged neural network CT organ segmentation method and system
CN114519798A (en) * 2022-01-24 2022-05-20 东莞理工学院 Multi-target image data enhancement method based on antagonistic neural network
TWI779760B (en) * 2021-08-04 2022-10-01 瑞昱半導體股份有限公司 Method of data augmentation and non-transitory computer-readable medium
CN115526874A (en) * 2022-10-08 2022-12-27 哈尔滨市科佳通用机电股份有限公司 Round pin of brake adjuster control rod and round pin split pin loss detection method
WO2023246921A1 (en) * 2022-06-23 2023-12-28 京东方科技集团股份有限公司 Target attribute recognition method and apparatus, and model training method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2948816A1 (en) * 2009-07-30 2011-02-04 Univ Paris Sud ELECTRO-OPTICAL DEVICES BASED ON INDEX VARIATION OR ABSORPTION IN ISB TRANSITIONS.
US20120069342A1 (en) * 2010-04-19 2012-03-22 Fraser Dalgleish MEMS Microdisplay Optical Imaging and Sensor Systems for Underwater Scattering Environments
US20170365038A1 (en) * 2016-06-16 2017-12-21 Facebook, Inc. Producing Higher-Quality Samples Of Natural Images
CN109271895A (en) * 2018-08-31 2019-01-25 西安电子科技大学 Pedestrian's recognition methods again based on Analysis On Multi-scale Features study and Image Segmentation Methods Based on Features
CN110021051A (en) * 2019-04-01 2019-07-16 浙江大学 One kind passing through text Conrad object image generation method based on confrontation network is generated
CN110969589A (en) * 2019-12-03 2020-04-07 重庆大学 Dynamic scene fuzzy image blind restoration method based on multi-stream attention countermeasure network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE WU 等: "PMC-GANs:Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs", 《ARXIV》 *
梁礼明 等: "自适应尺度信息的U型视网膜血管分割算法", 《光学学报》 *


Similar Documents

Publication Publication Date Title
CN111950346A (en) Pedestrian detection data expansion method based on generation type countermeasure network
CN105894045B (en) A kind of model recognizing method of the depth network model based on spatial pyramid pond
CN105069746B (en) Video real-time face replacement method and its system based on local affine invariant and color transfer technology
CN112734845B (en) Outdoor monocular synchronous mapping and positioning method fusing scene semantics
CN111080659A (en) Environmental semantic perception method based on visual information
CN106022363B (en) A kind of Chinese text recognition methods suitable under natural scene
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN109002752A (en) A kind of complicated common scene rapid pedestrian detection method based on deep learning
CN110197152A (en) A kind of road target recognition methods for automated driving system
CN108416292A (en) A kind of unmanned plane image method for extracting roads based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN112633220B (en) Human body posture estimation method based on bidirectional serialization modeling
CN112288776B (en) Target tracking method based on multi-time step pyramid codec
CN107506765A (en) A kind of method of the license plate sloped correction based on neutral net
CN113076804B (en) Target detection method, device and system based on YOLOv4 improved algorithm
CN111209858A (en) Real-time license plate detection method based on deep convolutional neural network
CN115376024A (en) Semantic segmentation method for power accessory of power transmission line
CN109766790A (en) A kind of pedestrian detection method based on self-adaptive features channel
CN112560865A (en) Semantic segmentation method for point cloud under outdoor large scene
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114758178B (en) Hub real-time classification and air valve hole positioning method based on deep learning
CN112884893A (en) Cross-view-angle image generation method based on asymmetric convolutional network and attention mechanism
CN114399533B (en) Single-target tracking method based on multi-level attention mechanism
Li et al. Line drawing guided progressive inpainting of mural damages
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201117