CN109522857B - People-number estimation method based on a generative adversarial network model - Google Patents
People-number estimation method based on a generative adversarial network model
- Publication number: CN109522857B (application CN201811415565.0A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06V20/53 — Scenes; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention relates to a people-number estimation method based on a generative adversarial network model. Drawing on automatic feature extraction and multiple-regression techniques from deep learning, the method fully exploits the feature-representation capability of generative adversarial networks (GANs, Generative Adversarial Nets): a density map indicating local crowd density serves as the secondary supervision signal, the number of people in the image serves as the primary supervision signal, the network is trained with the back-propagation algorithm, and the resulting network parameters then initialize the network to predict the number of people in an unknown image.
Description
Technical Field
The invention relates to a people-number estimation method based on a generative adversarial network model, and belongs to the technical field of image processing.
Background
Directly estimating the number of people in an image is challenging because of illumination variation, perspective distortion, and noisy background disturbance (for example, a forest background, or a wall with strong reflections). The rise of deep learning in recent years has led researchers and engineers to adopt and extend deep network models widely, and automatic people-number estimation methods based on deep networks now achieve quite good performance in natural scenes; nevertheless, they retain the drawbacks described below.
Zhang et al. [1] propose a multi-column convolutional network, as shown in FIG. 1. The scheme is a single-image crowd-counting algorithm based on a multi-column convolutional neural network comprising three sub-networks of different structures and different convolution-kernel sizes. Each sub-network receives the same input image; after four convolutions and two pooling operations, the feature maps output by the three sub-networks are concatenated along the channel dimension, and a 1×1 convolution produces the crowd density map. However, the sub-networks are fused only at the top of the network, so the multi-scale features in the shallow layers are not fully combined; this loses geometric features and hurts the accuracy of the people-count estimate. The scheme also requires pre-training the three sub-networks before training the whole network, and training each sub-network takes no less than ten hours.
Daniel et al. [2] propose a multi-branch convolutional network based on multi-scale blocks, as shown in FIG. 2. The scheme consists of three sub-networks whose input blocks have different scales but whose structures are identical, and they too are fused only at the top of the network, so the multi-scale features in the shallow layers are not fully combined; this loses geometric features and hurts the accuracy of the people-count estimate. The scheme likewise requires pre-training the three sub-networks before training the whole network, and training each sub-network takes no less than ten hours.
Han et al. [3] propose a method combining a residual network (ResNet) with a fully connected network, as shown in FIG. 3. The scheme first samples overlapping blocks from each image, computes a prediction for each block with the residual network, and then feeds the block predictions into a conditional random field to compute the predicted number of people in the image. The scheme is therefore a two-stage pipeline: block predictions must first be computed with the residual network before the conditional random field can predict the image-level count, and the two stages cannot be merged into one.
Experiments show, however, that training these networks takes a long time, and the training time keeps growing as the network structure deepens. A deep network such as that of Han et al. [3] has a very deep structure with many parameters to learn, which not only takes long to train but also risks over-fitting; the schemes of Zhang et al. [1] and Daniel et al. [2], while not as deep, increase the breadth of the network, and each sub-network needs prior pre-training.
Disclosure of Invention
Aiming at these defects of existing automatic people-number estimation techniques based on deep network models, the invention provides a people-number estimation method based on a generative adversarial network model.
To reduce the number of network parameters, no convolution kernel in the proposed scheme is larger than 3×3; to reduce network width, the invention uses only a single-column network structure; and to preserve performance, the inputs of the regression network are given different weights to distinguish the importance of different features.
The invention draws on automatic feature extraction and multiple-regression techniques from deep learning and fully exploits the feature-representation capability of generative adversarial networks (GANs): a density map indicating local crowd density serves as the secondary supervision signal, the number of people in the image serves as the primary supervision signal, the network is trained with the back-propagation algorithm, and the resulting parameters then initialize the network to predict the number of people in an unknown image.
Interpretation of terms:
1. The batch normalization (Batch Normalization) process comprises the following four steps: compute the mean of each training batch; compute the variance of each training batch; normalize the batch with the obtained mean and variance, i.e., subtract the mean from each training sample and divide by the standard deviation; then multiply by a scaling factor gamma and add a translation factor beta.
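The four steps map directly onto a few lines of array code. A minimal NumPy sketch (the epsilon term is a conventional numerical-stability addition, not part of the patent's description):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features)."""
    mu = x.mean(axis=0)                    # 1. mean of the training batch
    var = x.var(axis=0)                    # 2. variance of the training batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # 3. subtract mean, divide by std
    return gamma * x_hat + beta            # 4. scale by gamma, shift by beta
```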
2. The linear rectification (ReLU) activation function: f(x) = max(0, x).
3. The max-pooling (i.e., "down-sampling") operation takes the maximum of the feature points within each local neighborhood.
5. The RMSprop optimization algorithm maintains a moving average of the squared gradients over the previous t steps; the current gradient is divided by the square root of this average, which rescales the learning rate and yields the new effective step size.
6. The Adam optimization algorithm is used for dynamically adjusting the learning rate of each parameter according to the first moment estimation and the second moment estimation of the gradient of each parameter by the loss function.
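For reference, a minimal NumPy sketch of one RMSprop update in its standard form; the decay rate rho and epsilon are conventional defaults, not values from the patent. Adam extends this scheme by additionally tracking a running mean (first moment) of the gradients:

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=0.0002, rho=0.9, eps=1e-8):
    """One RMSprop update: divide the gradient by the root of the running
    average of squared gradients, which rescales the effective step size."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg
```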
The technical scheme of the invention is as follows:
A people-number estimation method based on a generative adversarial network model.
The generative adversarial network model comprises three sub-networks: a generator network G, a discriminator network D, and a regression network R. The generator network G consists of four consecutive convolution + batch-normalization + max-pooling stages followed by one convolution + batch-normalization stage;
the discriminator network D consists of four consecutive upsampling + convolution stages, and the output of D yields an estimate of the density map;
the regression network R is a fully connected network. The regression network R has four different inputs: the generator network G's output after the second convolution + batch normalization + max pooling, its output after the third convolution + batch normalization + max pooling, its output after the fourth convolution + batch normalization + max pooling, and its output after the final convolution + batch normalization. Each of the four inputs passes through its own SE-Net to obtain a weighted input, and the four weighted inputs are fed into a three-layer fully connected network to obtain the predicted number of people.
The generative adversarial network model is inspired by the two-player zero-sum game of game theory and comprises a generative model (the generator network G) and a discriminative model (the discriminator network D). The generative model captures the distribution of the sample data; the discriminative model is a binary classifier that judges whether its input is real data or a generated sample. Optimizing the model is a binary minimax game: during training one side is fixed while the other model's parameters are updated, iterating alternately.
The method comprises the following steps:
A. training process
(1) Obtain multi-scale data, where the multi-scale data means a multi-scale training set (I, M, C) in which each sample is denoted (I_i, M_i, C_i): I_i represents image i, M_i the density map of image i, and C_i the number of people in image i;
preferably, in step (1), acquiring multi-scale data includes:
(i) randomly crop each image in the image database to obtain M image blocks of size a×b and N image blocks of size c×d, where M ranges over 1–100, N over 1–100, a over 1–320, b over 1–240, c over 1–320, d over 1–240, and a, b, c, d are in pixels;
further preferably, in step (i), each image in the image database is randomly cropped to obtain 5 image blocks with a size of 120 × 80 and 5 image blocks with a size of 150 × 100.
(ii) adjust the resolution of each image in the image database, and of each image block randomly cropped in step (i), to e×f, where e ranges over 80–640 and f over 60–480;
it is further preferable that in step (ii), the resolution of each image in the image database and each image block randomly cropped in step (i) is adjusted to 320 × 240.
(iii) apply horizontal flipping, vertical flipping, centrosymmetric transformation, and Gaussian-noise addition in sequence to each image and each image block in the image database; these 4 operations yield a new image set, denoted I (a sketch of the four operations follows below);
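A minimal NumPy sketch of the four operations of step (iii), for a single image; the Gaussian-noise standard deviation is an assumed value, since the patent does not specify it, and the annotations must of course be transformed consistently:

```python
import numpy as np

def augment(img):
    """Return the four augmented variants of step (iii)."""
    h_flip  = np.fliplr(img)             # horizontal flip
    v_flip  = np.flipud(img)             # vertical flip
    central = np.flipud(np.fliplr(img))  # central symmetry = 180-degree rotation
    noisy   = np.clip(img + np.random.normal(0.0, 5.0, img.shape), 0, 255)  # assumed sigma = 5.0
    return [h_flip, v_flip, central, noisy]
```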
(iv) mark the head positions in each image of the new image set I to obtain the annotation template set of image set I, denoted L, together with the set C of the people counts of all images in the new image set I;
(v) process each image in the annotation template set L with formula (II) to obtain the density-map set of image set I, denoted M:

M_i(x, y) = 0_{e×f} + Σ_{k=1}^{C_i} (1 / (2πσ²)) exp(−((x − x_k)² + (y − y_k)²) / (2σ²))   (II)

in formula (II), {(x_k, y_k), 0 ≤ k ≤ C_i} denotes the pixel positions of the people marked in image i, C_i represents the number of people in image i, M_i(x, y) represents the density map corresponding to image i, σ is the standard deviation, i is the index of the image, and 0_{e×f} represents an all-zero matrix of size e × f;
more preferably, σ is 3.0.
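A NumPy sketch of formula (II) as reconstructed above — the standard Gaussian-kernel density map used in crowd counting, with the preferred σ = 3.0:

```python
import numpy as np

def density_map(heads, e=320, f=240, sigma=3.0):
    """Sum a 2-D Gaussian of standard deviation sigma at each annotated
    head position (x_k, y_k); the output has shape (f, e)."""
    M = np.zeros((f, e))                  # the all-zero matrix 0_{e x f}
    ys, xs = np.mgrid[0:f, 0:e]
    for xk, yk in heads:
        M += np.exp(-((xs - xk) ** 2 + (ys - yk) ** 2) / (2 * sigma ** 2)) \
             / (2 * np.pi * sigma ** 2)
    return M                              # its pixel sum approximates C_i
```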
(vi) obtain the multi-scale training set (I, M, C), each sample denoted (I_i, M_i, C_i), where I_i represents image i, M_i the density map of image i, and C_i the number of people in image i;
(2) Use the generator network G to generate the feature-map sets of the images:
a. Adopt eight 3×3 matrices and sixteen 3×3 matrices as convolution kernels, initialized with random orthogonal matrices, where a random orthogonal matrix is obtained by applying SVD (singular value decomposition) to a matrix of random numbers uniformly distributed on [0, 1]. Convolve the input images of the new image set I with the respective convolution kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain an output image set, i.e., the step-a feature-map set;
b. adopt thirty-two 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-a feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-b feature-map set;
c. adopt sixty-four 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-b feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-c feature-map set;
d. adopt one hundred twenty-eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-c feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the feature-map set I_g (a PyTorch sketch of G follows below);
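A PyTorch sketch of the generator network G of steps a-d. The placement of the pooling layers and the wiring of the two kernel sets in step a are one plausible reading of the text, not a definitive reconstruction; the channel widths 16/32/64/128 are chosen to match the dimensions of the SE-Net feature vectors in step (4):

```python
import torch.nn as nn

def block(cin, cout, pool=True):
    layers = [nn.Conv2d(cin, cout, kernel_size=3, padding=1),  # 3x3 kernels only
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class Generator(nn.Module):
    """Steps a-d: four conv+BN+ReLU(+pool) stages producing the feature-map
    sets that feed the discriminator and the regressor."""
    def __init__(self):
        super().__init__()
        self.a = nn.Sequential(block(3, 8, pool=False), block(8, 16))  # step a: 8 then 16 kernels
        self.b = block(16, 32)    # step b
        self.c = block(32, 64)    # step c
        self.d = block(64, 128)   # step d -> I_g
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.orthogonal_(m.weight)  # random orthogonal kernel init
    def forward(self, x):
        fa = self.a(x)      # step-a feature-map set, 16 channels
        fb = self.b(fa)     # step-b, 32 channels
        fc = self.c(fb)     # step-c, 64 channels
        ig = self.d(fc)     # I_g, 128 channels
        return fa, fb, fc, ig
```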
(3) Use the discriminator network D to generate an estimated density map: adopt sixty-four 3×3 matrices, thirty-two 3×3 matrices, sixteen 3×3 matrices, and eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; upsample the feature-map set I_g, then convolve the upsampled I_g with the respective convolution kernels to obtain the output images, i.e., the estimated density maps corresponding to the input images of the new image set I (a matching sketch of D follows below);
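A matching sketch of the discriminator network D: four upsampling + convolution stages with 64/32/16/8 kernels that undo the four poolings of G. Collapsing the final 8 channels into a one-channel density map is an assumption; the patent does not spell that last step out:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Four upsample+conv stages mapping I_g (128 channels) back to an
    estimated density map at the input resolution."""
    def __init__(self):
        super().__init__()
        def up(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='nearest'),
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True))
        self.net = nn.Sequential(up(128, 64), up(64, 32), up(32, 16), up(16, 8),
                                 nn.Conv2d(8, 1, kernel_size=3, padding=1))  # assumed 1-channel head
    def forward(self, ig):
        return self.net(ig)  # estimated density map
```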
(4) Extract attention features with SE-Net:
e. Apply global average pooling to each of the four feature-map sets obtained in step (2): the step-a, step-b, and step-c sets yield one feature vector each, and I_g yields the feature vector v_g;
f. Use a multilayer perceptron with 16 input units, 1 hidden unit, and 16 output units: initialize its first-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize its bias term to 0, and apply the linear rectification (ReLU) activation function; then initialize its second-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize the bias term to 0, and apply the sigmoid activation function, obtaining a 16-dimensional feature vector.
In parallel, use a multilayer perceptron with 32 input units, 1 hidden unit, and 32 output units, initialized and activated in the same way, to obtain a 32-dimensional feature vector; a multilayer perceptron with 64 input units, 1 hidden unit, and 64 output units to obtain a 64-dimensional feature vector; and a multilayer perceptron MLP_g with 128 input units, 1 hidden unit, and 128 output units to obtain the 128-dimensional feature vector v'_g.
The extracted attention features comprise the 16-dimensional, 32-dimensional, and 64-dimensional feature vectors and the 128-dimensional feature vector v'_g;
(5) Re-weight the feature maps with the attention features: multiply all pixels of each image in the step-a feature-map set by the corresponding component of its 16-dimensional feature vector; likewise multiply the step-b set by its 32-dimensional vector and the step-c set by its 64-dimensional vector, obtaining the re-weighted feature-map sets; finally, multiply all pixels of each image in I_g by the corresponding components of v'_g to obtain the re-weighted feature-map set I'_g (see the SE-block sketch below);
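Steps (4) and (5) together form a squeeze-and-excitation block with a single hidden unit. A PyTorch sketch; the uniform weight-initialization bounds of step f are illegible in the source, so default initialization is used:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """SE-Net attention: squeeze by global average pooling (step e),
    excite through a 1-hidden-unit perceptron (step f), re-weight (step (5))."""
    def __init__(self, channels):
        super().__init__()
        self.fc1 = nn.Linear(channels, 1)  # C inputs -> 1 hidden unit
        self.fc2 = nn.Linear(1, channels)  # 1 hidden unit -> C outputs
        nn.init.zeros_(self.fc1.bias)      # bias terms initialized to 0
        nn.init.zeros_(self.fc2.bias)
    def forward(self, x):                  # x: (batch, C, H, W)
        v = x.mean(dim=(2, 3))             # global average pooling
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(v))))
        return x * w[:, :, None, None]     # multiply every pixel by its channel weight
```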
(6) Calculating the number of people in the image by using a regression network R;
g. Use a fully connected layer MLP_R with 26400 input units and 1 output unit; initialize the weight matrix W_R of the fully connected layer from a uniform distribution between given minimum and maximum values, and initialize the bias term b to 0;
h. use the fully connected layer MLP_R to process the four re-weighted feature-map sets, including I'_g, simultaneously, and apply the linear rectification (ReLU) activation function to obtain a 1-dimensional scalar; this scalar is the number of people in the image;
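A sketch of the regression network R of steps g-h. Step g describes a single 26400 → 1 fully connected layer (the architecture summary above speaks of three layers); how the 26400-dimensional input follows from the feature-map sizes is not stated, so it is treated here as a given constant:

```python
import torch
import torch.nn as nn

class Regressor(nn.Module):
    """Flatten and concatenate the four re-weighted feature-map sets and
    map them to a single ReLU-activated scalar, the predicted count."""
    def __init__(self, in_features=26400):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)
        nn.init.zeros_(self.fc.bias)  # bias term b initialized to 0
    def forward(self, feats):         # feats: the four re-weighted sets
        x = torch.cat([f.flatten(start_dim=1) for f in feats], dim=1)
        return torch.relu(self.fc(x)).squeeze(1)
```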
(7) network training;
i. Define the loss function, i.e., the objective function to be optimized, as shown in formula (I):
in formula (I), Loss represents the value of the loss function, λ₁ represents the weight of the error produced by the discriminator, G(I_i) represents the output of image I_i passed through the generator network G, λ₂ represents the weight of the error produced by the generator, D(G(I_i)) represents G(I_i) passed through the discriminator network D, and m denotes the number of samples after training-set augmentation, i.e., m = 70400; I_i represents an input image, c_i the number of people in the image, and M_i the corresponding density map; c_i is the primary supervision signal and M_i the secondary supervision signal;
j. The generator network G uses the Adam optimization algorithm with initial learning rate g_base_lr; the discriminator network D uses the RMSprop optimization algorithm with initial learning rate d_base_lr; the regression network R uses the Adam optimization algorithm with initial learning rate r_base_lr; g_base_lr ranges over 0.000001–1, d_base_lr over 0.000001–1, and r_base_lr over 0.000001–1;
further preferably, g _ base _ lr takes a value of 0.00001, d _ base _ lr takes a value of 0.0002, and r _ base _ lr takes a value of 0.0001.
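The optimizer assignments of step j with the further-preferred learning rates, in PyTorch, assuming the G/D/R module sketches above:

```python
import torch

G, D, R = Generator(), Discriminator(), Regressor()
opt_g = torch.optim.Adam(G.parameters(), lr=0.00001)    # g_base_lr
opt_d = torch.optim.RMSprop(D.parameters(), lr=0.0002)  # d_base_lr
opt_r = torch.optim.Adam(R.parameters(), lr=0.0001)     # r_base_lr
```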
① Randomly sample m images {I_1, I_2, …, I_m} from the training set;
② randomly sample the density maps {M_1, M_2, …, M_m} corresponding to the m images from the training set;
③ compute the gradient of the discriminator network D, i.e., the gradient of D's training error with respect to D's parameters θ_d;
④ update the parameters of D with the RMSprop optimization algorithm;
⑤ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑥ randomly sample the corresponding labels {C_1, C_2, …, C_m} from the training set;
⑨ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑩ randomly sample the people-count labels {C_1, C_2, …, C_m} corresponding to the m images from the training set;
then compute the gradient of the regression network R, i.e., the gradient of R's training error with respect to R's parameters θ_r (see the training-round sketch below);
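A compact sketch of one alternating training round over steps ①-⑩: each sub-network is updated on a freshly sampled random mini-batch while the others are held fixed. The individual squared-error terms follow the variable definitions of formula (I); how they are weighted and combined, and the shapes of the sampled tensors, are assumptions, and the SE re-weighting is omitted for brevity:

```python
import torch.nn as nn

def train_round(sample_batch, G, D, R, opt_g, opt_d, opt_r):
    """sample_batch() returns a random mini-batch (imgs, M, C), where M are
    ground-truth density maps shaped like D's output and C are head counts."""
    mse = nn.MSELoss()
    # steps 1-4: update the discriminator D on one random batch
    imgs, M, C = sample_batch()
    d_loss = mse(D(G(imgs)[-1]), M)            # density-map error
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # steps 5-6 (and the elided steps): update the generator G
    imgs, M, C = sample_batch()
    g_loss = mse(D(G(imgs)[-1]), M)            # same error, gradients now step G
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    # steps 9-10 and the final gradient step: update the regressor R
    imgs, M, C = sample_batch()
    r_loss = mse(R(list(G(imgs))), C.float())  # squared people-count error
    opt_r.zero_grad(); r_loss.backward(); opt_r.step()
```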
B. the testing process comprises the following steps:
Initialize the network with the network parameters obtained in step (7); feed the test image to the network as input, and the network directly outputs the number of people in the image.
The invention has the beneficial effects that:
1. The invention provides a feature-extraction algorithm based on a generative adversarial network that makes full use of the implicit feature-representation capability of the generative network and applies multi-task learning, giving the model stronger generalization;
2. the invention uses an attention model so that the adjustment of network parameters focuses on the features that affect accuracy;
3. the proposed training algorithm for the adversarial regression model adopts alternating training and random sampling, which avoids over-fitting.
Drawings
Figure 1 is an architectural diagram of a multi-column convolutional network proposed by Zhang et al.
Fig. 2 is an architecture diagram of a multi-branch convolutional network based on multi-scale blocks proposed by Daniel et al.
Fig. 3 is an architecture diagram of the combination of a residual network (ResNet), a fully connected network, and a Markov random field proposed by Han et al.
Fig. 4 is a structural block diagram of the generative adversarial network model proposed by the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples, without being limited thereto.
Example 1
A people-number estimation method based on a generative adversarial network model. As shown in FIG. 4, the generative adversarial network model comprises three sub-networks: a generator network G, a discriminator network D, and a regression network R.
The generator network G consists of four consecutive convolution + batch-normalization + max-pooling stages followed by one convolution + batch-normalization stage;
the discriminator network D consists of four consecutive upsampling + convolution stages, and the output of D yields an estimate of the density map;
the regression network R is a fully connected network with four different inputs: G's output after the second convolution + batch normalization + max pooling, after the third convolution + batch normalization + max pooling, after the fourth convolution + batch normalization + max pooling, and after the final convolution + batch normalization; each of the four inputs passes through its own SE-Net to obtain a weighted input, and the four weighted inputs are fed into a three-layer fully connected network to obtain the predicted number of people.
The generative adversarial network model is inspired by the two-player zero-sum game of game theory and comprises a generative model (the generator network G) and a discriminative model (the discriminator network D). The generative model captures the distribution of the sample data; the discriminative model is a binary classifier that judges whether its input is real data or a generated sample. Optimizing the model is a binary minimax game: during training one side is fixed while the other model's parameters are updated, iterating alternately.
The method comprises the following steps:
A. training process
(1) Obtain multi-scale data, where the multi-scale data means a multi-scale training set (I, M, C) in which each sample is denoted (I_i, M_i, C_i): I_i represents image i, M_i the density map of image i, and C_i the number of people in image i; the method comprises the following steps:
(i) randomly crop each image in the image database to obtain M image blocks of size a×b and N image blocks of size c×d, where M ranges over 1–100, N over 1–100, a over 1–320, b over 1–240, c over 1–320, d over 1–240, and a, b, c, d are in pixels;
(ii) adjust the resolution of each image in the image database, and of each image block randomly cropped in step (i), to e×f, where e ranges over 80–640 and f over 60–480;
(iii) apply horizontal flipping, vertical flipping, centrosymmetric transformation, and Gaussian-noise addition in sequence to each image and each image block in the image database; these 4 operations yield a new image set, denoted I;
(iv) mark the head positions in each image of the new image set I to obtain the annotation template set of image set I, denoted L, together with the set C of the people counts of all images in the new image set I;
(v) process each image in the annotation template set L with formula (II) to obtain the density-map set of image set I, denoted M:

M_i(x, y) = 0_{e×f} + Σ_{k=1}^{C_i} (1 / (2πσ²)) exp(−((x − x_k)² + (y − y_k)²) / (2σ²))   (II)

in formula (II), {(x_k, y_k), 0 ≤ k ≤ C_i} denotes the pixel positions of the people marked in image i, C_i represents the number of people in image i, M_i(x, y) represents the density map corresponding to image i, σ is the standard deviation, i is the index of the image, and 0_{e×f} represents an all-zero matrix of size e × f; σ is 3.0.
(vi) obtain the multi-scale training set (I, M, C), each sample denoted (I_i, M_i, C_i), where I_i represents image i, M_i the density map of image i, and C_i the number of people in image i;
(2) Use the generator network G to generate the feature-map sets of the images:
a. Adopt eight 3×3 matrices and sixteen 3×3 matrices as convolution kernels, initialized with random orthogonal matrices, where a random orthogonal matrix is obtained by applying SVD (singular value decomposition) to a matrix of random numbers uniformly distributed on [0, 1]. Convolve the input images of the new image set I with the respective convolution kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain an output image set, i.e., the step-a feature-map set;
b. adopt thirty-two 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-a feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-b feature-map set;
c. adopt sixty-four 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-b feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-c feature-map set;
d. adopt one hundred twenty-eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-c feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the feature-map set I_g;
(3) Use the discriminator network D to generate an estimated density map: adopt sixty-four 3×3 matrices, thirty-two 3×3 matrices, sixteen 3×3 matrices, and eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; upsample the feature-map set I_g, then convolve the upsampled I_g with the respective convolution kernels to obtain the output images, i.e., the estimated density maps corresponding to the input images of the new image set I;
(4) Extract attention features with SE-Net:
e. Apply global average pooling to each of the four feature-map sets obtained in step (2): the step-a, step-b, and step-c sets yield one feature vector each, and I_g yields the feature vector v_g;
f. use a multilayer perceptron with 16 input units, 1 hidden unit, and 16 output units: initialize its first-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize its bias term to 0, and apply the linear rectification (ReLU) activation function; then initialize its second-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize the bias term to 0, and apply the sigmoid activation function, obtaining a 16-dimensional feature vector.
In parallel, use a multilayer perceptron with 32 input units, 1 hidden unit, and 32 output units, initialized and activated in the same way, to obtain a 32-dimensional feature vector; a multilayer perceptron with 64 input units, 1 hidden unit, and 64 output units to obtain a 64-dimensional feature vector; and a multilayer perceptron MLP_g with 128 input units, 1 hidden unit, and 128 output units to obtain the 128-dimensional feature vector v'_g.
The extracted attention features comprise the 16-dimensional, 32-dimensional, and 64-dimensional feature vectors and the 128-dimensional feature vector v'_g;
(5) Re-weight the feature maps with the attention features: multiply all pixels of each image in the step-a feature-map set by the corresponding component of its 16-dimensional feature vector; likewise multiply the step-b set by its 32-dimensional vector and the step-c set by its 64-dimensional vector, obtaining the re-weighted feature-map sets; finally, multiply all pixels of each image in I_g by the corresponding components of v'_g to obtain the re-weighted feature-map set I'_g;
(6) Calculate the number of people in the image with the regression network R:
g. Use a fully connected layer MLP_R with 26400 input units and 1 output unit; initialize the weight matrix W_R of the fully connected layer from a uniform distribution between given minimum and maximum values, and initialize the bias term b to 0;
h. use the fully connected layer MLP_R to process the four re-weighted feature-map sets, including I'_g, simultaneously, and apply the linear rectification (ReLU) activation function to obtain a 1-dimensional scalar; this scalar is the number of people in the image;
(7) network training;
i. Define the loss function, i.e., the objective function to be optimized, as shown in formula (I):
in formula (I), Loss represents the value of the loss function, λ₁ represents the weight of the error produced by the discriminator, G(I_i) represents the output of image I_i passed through the generator network G, λ₂ represents the weight of the error produced by the generator, D(G(I_i)) represents G(I_i) passed through the discriminator network D, and m denotes the number of samples after training-set augmentation, i.e., m = 70400; I_i represents an input image, c_i the number of people in the image, and M_i the corresponding density map; c_i is the primary supervision signal and M_i the secondary supervision signal;
j. The generator network G uses the Adam optimization algorithm with initial learning rate g_base_lr; the discriminator network D uses the RMSprop optimization algorithm with initial learning rate d_base_lr; the regression network R uses the Adam optimization algorithm with initial learning rate r_base_lr; g_base_lr ranges over 0.000001–1, d_base_lr over 0.000001–1, and r_base_lr over 0.000001–1;
① Randomly sample m images {I_1, I_2, …, I_m} from the training set;
② randomly sample the density maps {M_1, M_2, …, M_m} corresponding to the m images from the training set;
③ compute the gradient of the discriminator network D, i.e., the gradient of D's training error with respect to D's parameters θ_d;
④ update the parameters of D with the RMSprop optimization algorithm;
⑤ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑥ randomly sample the corresponding labels {C_1, C_2, …, C_m} from the training set;
⑨ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑩ randomly sample the people-count labels {C_1, C_2, …, C_m} corresponding to the m images from the training set;
then compute the gradient of the regression network R, i.e., the gradient of R's training error with respect to R's parameters θ_r;
B. the testing process comprises the following steps:
Initialize the network with the network parameters obtained in step (7); feed the test image to the network as input, and the network directly outputs the number of people in the image.
Example 2
The people-number estimation method based on a generative adversarial network model according to Embodiment 1, characterized in that:
in step (i), each image in the image database is randomly cropped to obtain 5 image blocks of size 120 × 80 and 5 image blocks of size 150 × 100; this step applies only to the training set, not to the test set.
In step (ii), the resolution of each image in the image database, and of each image block randomly cropped in step (i), is adjusted to 320 × 240.
The value of g _ base _ lr is 0.00001, the value of d _ base _ lr is 0.0002, and the value of r _ base _ lr is 0.0001.
Algorithm 1 is applied to train the generative adversarial network model.
The method makes full use of the implicit feature-representation capability of the generative network and applies multi-task learning, giving the model stronger generalization; it uses an attention model so that the adjustment of network parameters focuses on the features that affect accuracy; and it adopts alternating training and random sampling, which avoids over-fitting.
The effects of the invention can be further illustrated by experiments. Table 1 compares the prediction error on the MALL test set of the present invention with the methods of Zhang et al., Daniel et al., and Han et al.; "(calculated using true density maps)" in Table 1 means that the sum of the pixels of the true density map is taken as the true number of people in the image.
TABLE 1
As can be seen from Table 1, the method of the present invention is more accurate than the other methods.
Claims (10)
1. A people-number estimation method based on a generative adversarial network model, characterized in that the generative adversarial network model comprises three sub-networks: a generator network G, a discriminator network D, and a regression network R; the generator network G consists of four consecutive convolution + batch-normalization + max-pooling stages followed by one convolution + batch-normalization stage; the discriminator network D consists of four consecutive upsampling + convolution stages, and the output of D yields an estimate of the density map; the regression network R is a fully connected network with four different inputs: G's output after the second convolution + batch normalization + max pooling, after the third convolution + batch normalization + max pooling, after the fourth convolution + batch normalization + max pooling, and after the final convolution + batch normalization; each of the four inputs passes through its own SE-Net to obtain a weighted input, and the four weighted inputs are fed into a three-layer fully connected network to obtain the predicted number of people; the method comprises the following steps:
A. training process
(1) obtain multi-scale data, where the multi-scale data means a multi-scale training set (I, M, C) in which each sample is denoted (I_i, M_i, C_i): I_i represents image i, M_i the density map of image i, and C_i the number of people in image i;
(4) extract attention features with SE-Net;
(5) re-weight the feature maps with the attention features;
(7) network training;
B. the testing process comprises the following steps:
initialize the network with the network parameters obtained in step (7); feed the test image to the network as input, and the network directly outputs the number of people in the image.
2. The people-number estimation method based on a generative adversarial network model as claimed in claim 1, wherein in step (2) the generator network G is used to generate the feature-map sets of the images, comprising the following steps:
a. adopt eight 3×3 matrices and sixteen 3×3 matrices as convolution kernels, initialized with random orthogonal matrices, where a random orthogonal matrix is obtained by applying SVD to a matrix of random numbers uniformly distributed on [0, 1]; convolve the input images of the new image set I with the respective convolution kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain an output image set, i.e., the step-a feature-map set;
b. adopt thirty-two 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-a feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-b feature-map set;
c. adopt sixty-four 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-b feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the step-c feature-map set;
d. adopt one hundred twenty-eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; convolve the step-c feature-map set with these kernels, and in sequence apply batch normalization, the linear rectification activation function, and max pooling to obtain the feature-map set I_g.
3. The method as claimed in claim 2, wherein in step (3) the discriminator network D is used to generate an estimated density map, comprising the following steps:
adopt sixty-four 3×3 matrices, thirty-two 3×3 matrices, sixteen 3×3 matrices, and eight 3×3 matrices as convolution kernels, initialized with random orthogonal matrices; upsample the feature-map set I_g, then convolve the upsampled I_g with the respective convolution kernels to obtain the output images, i.e., the estimated density maps corresponding to the input images of the new image set I.
4. The people-number estimation method based on a generative adversarial network model as claimed in claim 2, wherein step (4), extracting attention features with SE-Net, comprises the following steps:
e. apply global average pooling to each of the four feature-map sets of step (2): the step-a, step-b, and step-c sets yield one feature vector each, and I_g yields the feature vector v_g;
f. use a multilayer perceptron with 16 input units, 1 hidden unit, and 16 output units: initialize its first-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize its bias term to 0, and apply the linear rectification activation function; then initialize its second-layer weight matrix from a uniform distribution between given minimum and maximum values, initialize the bias term to 0, and apply the sigmoid activation function, obtaining a 16-dimensional feature vector;
in parallel, use a multilayer perceptron with 32 input units, 1 hidden unit, and 32 output units, initialized and activated in the same way, to obtain a 32-dimensional feature vector; a multilayer perceptron with 64 input units, 1 hidden unit, and 64 output units to obtain a 64-dimensional feature vector; and a multilayer perceptron MLP_g with 128 input units, 1 hidden unit, and 128 output units to obtain the 128-dimensional feature vector v'_g.
5. The people-number estimation method based on a generative adversarial network model as claimed in claim 4, wherein step (5), re-weighting the feature maps with the attention features, comprises the following steps:
multiply all pixels of each image in the step-a feature-map set by the corresponding component of its 16-dimensional feature vector; likewise multiply the step-b set by its 32-dimensional vector and the step-c set by its 64-dimensional vector, obtaining the re-weighted feature-map sets; finally, multiply all pixels of each image in I_g by the corresponding components of v'_g to obtain the re-weighted feature-map set I'_g.
6. The people-number estimation method based on a generative adversarial network model as claimed in claim 5, wherein in step (6) the regression network R is used to calculate the number of people in the image, comprising the following steps:
g. use a fully connected layer MLP_R with 26400 input units and 1 output unit; initialize the weight matrix W_R of the fully connected layer from a uniform distribution between given minimum and maximum values, and initialize the bias term b to 0.
7. The people-number estimation method based on a generative adversarial network model as claimed in claim 6, wherein in step (7) the network training comprises the following steps:
i. define the loss function, i.e., the objective function to be optimized, as shown in formula (II):
in formula (II), Loss represents the value of the loss function, λ₁ represents the weight of the error produced by the discriminator, G(I_i) represents the output of image I_i passed through the generator network G, λ₂ represents the weight of the error produced by the generator, D(G(I_i)) represents G(I_i) passed through the discriminator network D, m represents the number of samples after training-set augmentation, I_i represents an input image, c_i the number of people in the image, and M_i the corresponding density map;
j. the generator network G uses the Adam optimization algorithm with initial learning rate g_base_lr; the discriminator network D uses the RMSprop optimization algorithm with initial learning rate d_base_lr; the regression network R uses the Adam optimization algorithm with initial learning rate r_base_lr; g_base_lr ranges over 0.000001–1, d_base_lr over 0.000001–1, and r_base_lr over 0.000001–1;
① randomly sample m images {I_1, I_2, …, I_m} from the training set;
② randomly sample the density maps {M_1, M_2, …, M_m} corresponding to the m images from the training set;
③ compute the gradient of the discriminator network D, i.e., the gradient of D's training error with respect to D's parameters θ_d;
④ update the parameters of D with the RMSprop optimization algorithm;
⑤ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑥ randomly sample the corresponding labels {C_1, C_2, …, C_m} from the training set;
⑨ randomly sample m images {I_1, I_2, …, I_m} from the training set;
⑩ randomly sample the people-count labels {C_1, C_2, …, C_m} corresponding to the m images from the training set;
then compute the gradient of the regression network R, i.e., the gradient of R's training error with respect to R's parameters θ_r.
8. The people-number estimation method based on a generative adversarial network model as claimed in claim 1, wherein step (1), obtaining multi-scale data, comprises:
(i) randomly crop each image in the image database to obtain M image blocks of size a×b and N image blocks of size c×d, where M ranges over 1–100, N over 1–100, a over 1–320, b over 1–240, c over 1–320, d over 1–240, and a, b, c, d are in pixels;
(ii) adjust the resolution of each image in the image database, and of each image block randomly cropped in step (i), to e×f, where e ranges over 80–640 and f over 60–480;
(iii) apply horizontal flipping, vertical flipping, centrosymmetric transformation, and Gaussian-noise addition in sequence to each image and each image block in the image database to obtain a new image set, denoted I;
(iv) mark the head positions in each image of the new image set I to obtain the annotation template set of image set I, denoted L, together with the set C of the people counts of all images in the new image set I;
(v) process each image in the annotation template set L with formula (I) to obtain the density-map set of image set I, denoted M:

M_i(x, y) = 0_{e×f} + Σ_{k=1}^{C_i} (1 / (2πσ²)) exp(−((x − x_k)² + (y − y_k)²) / (2σ²))   (I)

in formula (I), {(x_k, y_k), 0 ≤ k ≤ C_i} denotes the pixel positions of the people marked in image i, C_i represents the number of people in image i, M_i(x, y) represents the density map corresponding to image i, σ is the standard deviation, i is the index of the image, and 0_{e×f} represents an all-zero matrix of size e × f;
(vi) obtain the multi-scale training set (I, M, C), each sample denoted (I_i, M_i, C_i), where I_i represents image i, M_i the density map of image i, and C_i the number of people in image i.
9. The people-number estimation method based on a generative adversarial network model as claimed in claim 8, wherein:
in step (i), each image in the image database is randomly cropped to obtain 5 image blocks of size 120 × 80 and 5 image blocks of size 150 × 100;
in step (ii), the resolution of each image in the image database, and of each image block randomly cropped in step (i), is adjusted to 320 × 240; σ is 3.0.
10. The people-number estimation method based on a generative adversarial network model as claimed in claim 7, wherein g_base_lr is 0.00001, d_base_lr is 0.0002, and r_base_lr is 0.0001.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811415565.0A (CN109522857B) | 2018-11-26 | 2018-11-26 | People-number estimation method based on a generative adversarial network model
Publications (2)
Publication Number | Publication Date
---|---
CN109522857A | 2019-03-26
CN109522857B | 2021-04-23
Family ID=65793346
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446302A (en) * | 2018-01-29 | 2018-08-24 | 东华大学 | A kind of personalized recommendation system of combination TensorFlow and Spark |
CN110008554B (en) * | 2019-03-27 | 2022-10-18 | 哈尔滨工业大学 | Method for optimizing technological parameters and welding tool structure of friction stir welding seam forming prediction based on numerical simulation and deep learning |
CN110097185B (en) * | 2019-03-29 | 2021-03-23 | 北京大学 | Optimization model method based on generation of countermeasure network and application |
CN109978807B (en) * | 2019-04-01 | 2020-07-14 | 西北工业大学 | Shadow removing method based on generating type countermeasure network |
CN110033043B (en) * | 2019-04-16 | 2020-11-10 | 杭州电子科技大学 | Radar one-dimensional range profile rejection method based on condition generation type countermeasure network |
CN110120020A (en) * | 2019-04-30 | 2019-08-13 | 西北工业大学 | A kind of SAR image denoising method based on multiple dimensioned empty residual error attention network |
CN110335212B (en) * | 2019-06-28 | 2021-01-15 | 西安理工大学 | Defect ancient book Chinese character repairing method based on condition confrontation network |
CN110705340B (en) * | 2019-08-12 | 2023-12-26 | 广东石油化工学院 | Crowd counting method based on attention neural network field |
CN110503049B (en) * | 2019-08-26 | 2022-05-03 | 重庆邮电大学 | Satellite video vehicle number estimation method based on generation countermeasure network |
CN111080501B (en) * | 2019-12-06 | 2024-02-09 | 中国科学院大学 | Real crowd density space-time distribution estimation method based on mobile phone signaling data |
CN111429436B (en) * | 2020-03-29 | 2022-03-15 | 西北工业大学 | Intrinsic image analysis method based on multi-scale attention and label loss |
CN112326276B (en) * | 2020-10-28 | 2021-07-16 | 北京航空航天大学 | High-speed rail steering system fault detection LSTM method based on generation countermeasure network |
CN112818945A (en) * | 2021-03-08 | 2021-05-18 | 北方工业大学 | Convolutional network construction method suitable for subway station crowd counting |
CN113421192B (en) * | 2021-08-24 | 2021-11-19 | 北京金山云网络技术有限公司 | Training method of object statistical model, and statistical method and device of target object |
CN114972111B (en) * | 2022-06-16 | 2023-01-10 | 慧之安信息技术股份有限公司 | Dense crowd counting method based on GAN image restoration |
CN115357218A (en) * | 2022-08-02 | 2022-11-18 | 北京航空航天大学 | High-entropy random number generation method based on chaos prediction antagonistic learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423701A (en) * | 2017-07-17 | 2017-12-01 | 北京智慧眼科技股份有限公司 | The non-supervisory feature learning method and device of face based on production confrontation network |
CN107451619A (en) * | 2017-08-11 | 2017-12-08 | 深圳市唯特视科技有限公司 | A kind of small target detecting method that confrontation network is generated based on perception |
CN108764085A (en) * | 2018-05-17 | 2018-11-06 | 上海交通大学 | Based on the people counting method for generating confrontation network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10474881B2 (en) * | 2017-03-15 | 2019-11-12 | Nec Corporation | Video retrieval system based on larger pose face frontalization |
Non-Patent Citations (2)
Title
---
Jie Hu et al.; Squeeze-and-Excitation Networks; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-06-23; pp. 7132-7141
He Qing et al.; Research on person re-identification across non-overlapping domains (非重叠域行人再识别算法研究); Information Technology (《信息技术》); 2018-07, No. 7; pp. 34-38
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant