CN112818884B - Crowd counting method - Google Patents

Crowd counting method

Info

Publication number
CN112818884B
CN112818884B (application CN202110169724.9A)
Authority
CN
China
Prior art keywords: sample, samples, training, similarity, counting
Prior art date
Legal status
Active
Application number
CN202110169724.9A
Other languages
Chinese (zh)
Other versions
CN112818884A (en)
Inventor
李国荣
刘心岩
苏荔
黄庆明
Current Assignee
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Priority date
Filing date
Publication date
Application filed by University of Chinese Academy of Sciences
Priority to CN202110169724.9A
Publication of CN112818884A
Application granted
Publication of CN112818884B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06F18/23 - Clustering techniques


Abstract

The invention discloses a crowd counting method comprising a training stage and a testing stage, wherein the training stage comprises the following steps: step 1, obtaining the similarity between training images and selecting training samples; step 2, clustering the selected training samples and storing a group of weights for each class; and step 3, training a weight retrieval module. The memory-augmented crowd counting method disclosed by the invention constructs a multi-weight network that exploits the relations among samples, improves the generalization capability of a single simple network by equipping it with a plurality of parameter sets, can be integrated with most existing methods, and significantly improves the performance of the single simple network.

Description

Crowd counting method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a crowd counting method.
Background
The crowd counting task estimates the number of objects in a picture, such as pedestrians, vehicles, or animals. This task has attracted increasing attention due to its wide application in a variety of scenarios, such as airports, stations, shopping malls, and pedestrian streets, where counting people is important. During a pandemic in particular, crowding remarkably increases the possibility of viral infection, and detecting and warning of crowds gathering in public areas plays an important role in controlling the spread of the disease.
Existing crowd counting methods have achieved reliable performance in constrained application contexts, such as uniform density or a fixed viewing angle. Without such constraints, however, their performance is greatly impaired, mainly because an unconstrained target scene is complex in many ways, including different viewing angles, variable scales, varying densities, and a wide range of brightness and contrast, which ultimately causes significant changes in the visual characteristics of the objects being counted.
Most existing approaches attempt to handle unconstrained situations using a single network with multiple channels, with different channels being used to handle data of different scales. However, related research has indicated that it is difficult to solve the population count problem with a single network, suggesting the use of multiple networks, where each network is responsible for a particular size or density. For example, Switch-CNN designs a Switch structure before CNN of multiple channels to find the optimal channel for a given picture, but because it is impractical to design multiple channels manually, Switch-CNN can only handle limited scale changes.
Furthermore, to handle the cross-scene counting task, one prior-art approach first pre-trains a model on a training data set, uses a coarse density map and a given perspective map during the inference phase to find training samples similar to the test image, and then fine-tunes the pre-trained model on these samples to obtain a model customized to the test image. However, perspective maps are not commonly available, and the similarity between density maps cannot describe the complex correlation between images.
It can be seen that most existing crowd counting methods adopt a complex structure and a backbone network with a large number of parameters to enhance generalization, but when tested on large-scale data sets, the improvement these methods bring is unsatisfactory. Therefore, there is a need for a new crowd counting method that solves the above problems.
Disclosure of Invention
In order to overcome the above problems, the present inventors conducted intensive studies and found the following: by analysing the relations among samples, a crowd counting network adopting a plurality of parameter groups can be established, with the network loading different parameters for different samples. Meanwhile, a task-driven similarity is proposed, based on the mutual enhancement between samples during fine-tuning; similar samples are clustered according to this similarity, and each cluster is used to acquire a group of specific parameters. The method exploits the relations among samples, improves the generalization capability of a single simple network by equipping it with a plurality of parameter sets, can be integrated with most existing methods, and remarkably improves the performance of the single simple network, whereby the invention was completed.
Specifically, the present invention aims to provide the following:
in a first aspect, there is provided a method of population counting using memory enhancement, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images and selecting a training sample;
step 2, clustering the selected training samples, and storing a group of weights for each class;
and 3, training a weight retrieval module.
In a second aspect, a computer readable storage medium is provided, storing a program for people counting using storage enhancement, which program, when executed by a processor, causes the processor to carry out the steps of the method for people counting using storage enhancement.
In a third aspect, a computer device is provided, comprising a memory storing a program for crowd counting using memory enhancement, and a processor, wherein the program, when executed by the processor, causes the processor to perform the steps of the method for crowd counting using memory enhancement.
The invention has the advantages that:
(1) the crowd counting method using storage enhancement provided by the invention constructs a multi-weight network, utilizes the relation among samples, improves the generalization capability of a single simple network with a plurality of parameter sets, can be integrated with most of the existing methods, and obviously improves the performance of the single simple network;
(2) according to the population counting method using storage enhancement, provided by the invention, a plurality of clusters of training data are obtained by adopting the mutual fine-tuning similarity and heuristic clustering method, and each cluster is used for learning a group of parameters, so that the method is beneficial to testing images similar to the clusters;
(3) according to the population counting method using storage enhancement, a simple and effective population counting model (FDC) is designed, a small density map regressor is provided, a plurality of FDCs (MFDCs) with a plurality of parameter sets are obtained through the proposed multi-parameter strategy, and the detection performance is remarkably improved.
Drawings
FIG. 1 illustrates a flow diagram of a population counting method using memory augmentation in accordance with a preferred embodiment of the present invention;
FIG. 2 is a diagram showing the improvement of the MFDC method over the FDC method according to the embodiment of the present invention;
fig. 3 is a graph comparing the parameter count and performance of the method according to the embodiment of the present invention with those of existing methods.
Detailed Description
The present invention will be described in further detail below with reference to preferred embodiments and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In a first aspect of the present invention, there is provided a method of population counting using memory enhancement, as shown in fig. 1, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images, and selecting a training sample according to the stability of pre-training;
step 2, clustering the selected training samples, training each type of sample and storing a group of weights;
and 3, training a weight retrieval module.
The steps of the training phase are described in further detail below:
step 1, establishing a crowd counting network model, and selecting a training sample according to the stability of pre-training.
Wherein, step 1 comprises the following substeps:
step 1-1, establishing a crowd counting network model.
A typical network model includes a feature extractor and a density map generator. The inventors observed that in the memory-augmented crowd counting method the density map generator needs to be fine-tuned multiple times; besides requiring a large parameter storage space, a model with a large density map generator is also prone to over-fitting when the training set is small.
Therefore, in order to solve the above problem, a simple basic model called FDC (i.e. a built population counting network model) is adopted in the present invention.
According to a preferred embodiment of the present invention, the created population counting network model is composed of a standard Feature Pyramid Network (FPN) as a basic feature extractor and a dilated convolution as a density map generator.
Preferably, the FPN in the FDC can adopt different networks as backbone networks, such as ResNet-18, ResNet-34, ResNet-50 and the like, and preferably ResNet-18.
More preferably, to align the output of the FPN, the deconvolution layer is used as an upsampler of the population counting network model.
The inventors found that, compared with most state-of-the-art methods (e.g. CSRNet, ACMNet, and DM-Count), the density map regressor of the FDC used in the present invention has fewer parameters yet remains sufficiently effective.
Specifically, during training, patches are cropped from the original image at sizes of 224 × 224, 448 × 448, and 896 × 896, and each patch is resized to 224 × 224. The FPN generates four feature maps with sizes ranging from 7 × 7 to 56 × 56, which are upsampled by multiple deconvolution layers of stride 2 to produce feature maps of size 56 × 56; these feature maps are then concatenated and fused through two 3 × 3 convolutional layers to generate the output density map.
And 1-2, obtaining the similarity between training image samples.
In general, given a well-trained crowd counting model (the base model) and a sample, if the model is fine-tuned on that sample, the performance of the fine-tuned model on that sample, and on other similar samples, will improve. The similarity between samples can therefore be defined based on this performance improvement.
Specifically, define T = {(x_i, y_i), i = 1, 2, ..., N} as a training set of N cropped samples (each sample with its corresponding point label); the feature extractor and density map generator of a given model are defined as f = Ψ(x, θ_1) and d = Φ(f, θ_2), respectively, where θ_1 and θ_2 are the parameters of the feature extractor and the density map generator; the loss function is expressed as

l = L(ŷ, y)

where ŷ = Φ(Ψ(x, θ_1), θ_2).
In the present invention, a novel metric is proposed to evaluate the similarity between training image samples, namely a similarity that depends directly on the specific task and model: the change in the loss of sample x_i when using density map generator parameters fine-tuned on other samples.
Preferably, the similarity between the training image samples is obtained by a method comprising the steps of:
step 1-2-1, the loss of the ith sample is obtained.
Wherein the loss of the ith sample (x_i, y_i) under the basic model (the crowd counting network model) is l_i = L(Φ(f_i, θ_2), y_i).
And 1-2-2, fine-tuning parameters of a density map generator of the crowd counting network model.
Wherein the parameters of the density map generator of the basic model are fine-tuned to obtain a specific and effective set of weights for the ith sample (x_i, y_i) and similar samples; the optimal parameters for sample i are

θ_2^i = argmin_{θ_2} L(Φ(f_i, θ_2), y_i).
And 1-2-3, obtaining the loss of the jth sample, and obtaining the fine-tuning similarity between the ith sample and the jth sample.
This yields a fine-tuned model Φ(·, θ_2^i), under which the loss of the jth sample (x_j, y_j) is

l_j^i = L(Φ(f_j, θ_2^i), y_j).
In the present invention, it can be seen from the above description that if the sample i and the sample j are similar, the fine tuning model of the ith sample will achieve performance improvement on the jth sample. The more similar between samples, the greater the improvement will be. Therefore, the degree of improvement after fine tuning in the present invention can be regarded as the similarity between samples.
According to a preferred embodiment of the present invention, the fine-tuning similarity between the ith sample and the jth sample is obtained as:

s(i, j) = (l_i − l_i^j)/l_i + (l_j − l_j^i)/l_j

where l_i^j denotes the loss of sample i under the model fine-tuned on sample j.
in the present invention, when the predictions for two pictures both improve under model weights fine-tuned from the base model on each other, the similarity between them is positive, and the larger the relative improvement, the larger the mutual similarity.
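As an illustration, the mutual fine-tuning similarity described above can be sketched in a few lines. The exact formula in the original publication is rendered as an image, so the relative-improvement form used below (the sum of each sample's relative loss reduction under the other sample's fine-tuned weights) is an assumption consistent with the surrounding text:

```python
def finetune_similarity(l_i, l_j, l_i_j, l_j_i):
    """Mutual fine-tuning similarity between samples i and j.

    l_i, l_j : losses of samples i and j under the base weights
    l_i_j    : loss of sample i under weights fine-tuned on sample j
    l_j_i    : loss of sample j under weights fine-tuned on sample i

    Positive exactly when fine-tuning on either sample reduces the loss
    of the other; grows with the relative improvement.
    """
    return (l_i - l_i_j) / l_i + (l_j - l_j_i) / l_j

# Mutual improvement gives a positive similarity
s_pos = finetune_similarity(1.0, 2.0, 0.5, 1.0)
# Mutual degradation gives a negative similarity
s_neg = finetune_similarity(1.0, 2.0, 1.5, 3.0)
```

With the toy losses above, s_pos = 0.5 + 0.5 = 1.0 and s_neg = -0.5 + -0.5 = -1.0, matching the sign behaviour described in the text.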
The inventors note that calculating the fine-tuning similarity between all pairs of training samples is time-consuming. Intuitively, difficult samples (samples whose predictions have a large error relative to the ground-truth labels) are important; however, during training there also exist unstable samples, whose loss under the base model fluctuates. Although some of these unstable samples are not difficult samples, fine-tuning on certain samples in the data set will greatly reduce their loss.
Therefore, in the present invention, it is preferable to calculate the fine-tuning similarity only between unstable samples. In addition, since the loss of a stable sample does not change much during training and fine-tuning on stable samples has little influence on the parameters, the fine-tuning similarity involving stable samples can preferably be estimated directly as 0.
And 1-3, selecting a training sample.
In the present invention, in order to evaluate the instability of the sample during training, it is preferable to use the sequence and inversion tests, and only consider the decreasing trend of the loss function.
According to a preferred embodiment of the present invention, the indicator I(i, m) of the decreasing trend of the ith sample in the mth training period is defined as follows:

I(i, m) = 1, if |ŷ_i^m − y_i| < |ŷ_i^(m−1) − y_i| + ε; otherwise I(i, m) = 0

where ŷ_i^m represents the prediction for the ith sample in the mth training period, y_i represents the true value of the ith sample, and ε represents a hyper-parameter.

In a further preferred embodiment, the hyper-parameter ε adjusts the tolerance to small variations, and the instability of a training sample is preferably obtained as:

instability(i) = 1 − (1/(M − 1)) Σ_{m=2..M} I(i, m)

where M represents the total number of periods; the closer this value is to 1, the greater the instability of the training sample.
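The instability criterion can be sketched as follows. Since the exact indicator in the original publication is rendered as an image, the per-epoch error sequence, the tolerance eps, and the averaging used below are assumptions consistent with the surrounding description:

```python
def instability(errors, eps=0.01):
    """Fraction of training periods in which a sample's absolute count
    error did NOT decrease (within a tolerance eps).

    errors[m] is |prediction_m - ground_truth| for the sample at epoch m;
    values near 1 indicate an unstable sample.
    """
    M = len(errors)
    decreasing = sum(1 for m in range(1, M) if errors[m] < errors[m - 1] + eps)
    return 1.0 - decreasing / (M - 1)

stable_u = instability([5.0, 4.0, 3.0, 2.0])    # error falls every epoch
unstable_u = instability([5.0, 9.0, 2.0, 8.0])  # oscillating error
eta = 0.5                                       # example threshold within the stated range
Q = [i for i, u in enumerate([stable_u, unstable_u]) if u > eta]
```

Here the monotonically improving sample gets instability 0 and stays out of the unstable set Q, while the oscillating sample exceeds the threshold η and enters Q.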
In the present invention, a threshold η is set, preferably in the range of 0 to 0.5; the samples whose instability is greater than the threshold form an unstable sample set, denoted Q.
In the present invention, all training samples in the unstable sample set Q are preferably selected for calculating the mutual fine-tuning similarity.
And 2, clustering the selected training samples, training each type of sample and storing a group of weights.
Preferably, the clustering is performed according to a method comprising the steps of:
step 2-1, obtaining, for each sample u in the unstable sample set Q, the sum of its similarities with all other samples in Q, and marking all samples as unprocessed;
step 2-2, performing descending order arrangement on all unstable samples according to the sum of the similarity, and traversing the samples;
and 2-3, clustering according to the processing state of the sample.
Preferably, during clustering, it is first judged whether a sample is unprocessed; if the sample has been processed, the next iteration begins. If it has not been processed, a new cluster is created, all unstable samples are sorted in descending order of their similarity to the currently processed sample, and the samples are traversed.
More preferably, within the newly created cluster, the processing state of each traversed sample is judged; if the sample has been processed, the next iteration begins. If the sample has not been processed, it is judged whether the sample is similar to all samples already in the cluster; if so, the sample is added to the cluster, and if not, the sample is skipped.
In the invention, the clustering method follows two principles, firstly, the fine tuning similarity of all samples in each cluster is positive; second, the number of clusters should be as small as possible to reduce the space required for model storage.
Compared with fine-tuning at test time, this heuristic clustering method saves time; each cluster is used to learn a group of parameters, which is very effective for test images similar to that cluster.
Wherein the samples not in the unstable sample set Q are assigned to one additional cluster, denoted S0. Each cluster is used to fine-tune the density map generator of the basic model, yielding K + 1 weight sets, where K represents the number of clusters obtained in step 2.
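The heuristic clustering of step 2 can be sketched as follows, under the two principles stated above (every pairwise similarity inside a cluster is positive, and clusters are grown greedily from the sample with the largest total similarity so that their number stays small). The traversal order and tie-breaking below are assumptions, not the patented procedure verbatim:

```python
import numpy as np

def heuristic_cluster(S):
    """Greedy clustering of unstable samples from a pairwise
    fine-tuning-similarity matrix S, with S[u][v] = s(u, v)."""
    n = len(S)
    totals = S.sum(axis=1)                 # sum of similarities per sample
    processed = [False] * n
    clusters = []
    for u in np.argsort(-totals, kind="stable"):   # descending total similarity
        if processed[u]:
            continue
        cluster = [int(u)]                 # seed a new cluster
        processed[u] = True
        for v in np.argsort(-S[u], kind="stable"): # descending similarity to u
            if processed[v]:
                continue
            # add v only if it is similar (positive) to every member
            if all(S[v][w] > 0 for w in cluster):
                cluster.append(int(v))
                processed[v] = True
        clusters.append(cluster)
    return clusters

# Two groups with positive similarity inside and negative across
S = np.array([[0.0, 3.0, -1.0, -1.0],
              [3.0, 0.0, -1.0, -1.0],
              [-1.0, -1.0, 0.0, 1.0],
              [-1.0, -1.0, 1.0, 0.0]])
clusters = heuristic_cluster(S)
```

On this toy matrix the procedure recovers the two mutually improving groups {0, 1} and {2, 3}.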
And 3, training a weight retrieval module.
In the present invention, in order to obtain the optimal weights for a test sample, each cluster is preferably regarded as a class and a multi-class classifier, namely the weight retrieval module, is trained; the established crowd counting network model equipped with multiple parameter sets is denoted MFDC.
According to a preferred embodiment of the present invention, when training the multi-class classifier, the soft label is computed as:

q(i, j) = exp(c(i, j)) / Σ_k exp(c(i, k)), with c(i, j) = (1/|S_j|) Σ_{x_m ∈ S_j} s(i, m)

wherein S_j is the jth cluster, x_i is the ith sample, s(i, m) is the fine-tuning similarity between sample i and sample m, c(i, j) is the average similarity between sample i and the samples in cluster S_j, and q(i, j) is the label of sample i on cluster j.
The inventors consider that a sample belonging to one cluster may have positive fine-tuning similarity with some samples in other clusters; therefore, in the present invention, the soft label described by the above formula, calculated from the average similarity between the sample and the samples in each cluster, is adopted instead of a simple hard label.
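A minimal sketch of the soft-label computation, assuming a softmax over the average per-cluster similarity (the exact formula is rendered as an image in the original publication, so this normalization is an assumption consistent with the text):

```python
import numpy as np

def soft_labels(sim, clusters):
    """Soft label of each sample over the clusters: the average
    fine-tuning similarity between the sample and each cluster's
    members, normalized with a softmax so every row sums to 1."""
    logits = np.stack(
        [sim[:, members].mean(axis=1) for members in clusters], axis=1
    )
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

sim = np.array([[0.0, 3.0, -1.0, -1.0],
                [3.0, 0.0, -1.0, -1.0],
                [-1.0, -1.0, 0.0, 1.0],
                [-1.0, -1.0, 1.0, 0.0]])
clusters = [[0, 1], [2, 3]]
q = soft_labels(sim, clusters)
```

Each sample's label mass concentrates on its own cluster while still assigning nonzero probability to the other, which is the point of using soft rather than hard labels.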
In a further preferred embodiment, ResNet-18 is employed as the backbone for the multi-class classifier.
Wherein the input to the classifier comprises the original training image and the output of the feature extractor in the base model.
Preferably, the original training image is aligned to the same size of the feature extractor output by the shallow CNN-Pool-CNN structure.
In a further preferred embodiment, the cross-entropy loss function for training the multi-class classifier is as follows:

L = −(1/T) Σ_{i=1..T} Σ_{j=1..N} q(i, j) log p(i, j)

wherein L is the loss function value, T is the total number of samples, N is the total number of classes, q(i, j) is the calculated soft label, and p(i, j) is the predicted probability that sample i is classified into cluster j.
When a test image is processed, the prediction of the weight retrieval module (the multi-class classifier) after training convergence represents the probability that the image belongs to each cluster. If every probability is small, the image is unlikely to belong to any of the clusters and is considered to come from cluster 0.
In the invention, the trained multi-class classifier can predict class labels of the test data, and the prediction result is used for retrieving the optimal weight so as to dynamically select a group of specific parameters according to the characteristics of the test image, thereby greatly improving the performance.
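A sketch of test-time weight retrieval based on the two preceding paragraphs; the confidence threshold min_conf below is an assumed hyper-parameter, not taken from the patent:

```python
def retrieve_weights(probs, weight_sets, min_conf=0.5):
    """Select a parameter set from the classifier's per-cluster
    probabilities. Weight set 0 belongs to the stable samples
    (cluster S0) and is used as a fallback when no cluster is
    predicted confidently; sets 1..K correspond to the K unstable
    clusters."""
    j = max(range(len(probs)), key=lambda k: probs[k])
    if probs[j] < min_conf:
        return 0, weight_sets[0]
    return j + 1, weight_sets[j + 1]

# Hypothetical weight sets: one per cluster plus the fallback
weight_sets = {0: "theta_S0", 1: "theta_S1", 2: "theta_S2"}
key_conf, _ = retrieve_weights([0.9, 0.1], weight_sets)    # confident prediction
key_unsure, _ = retrieve_weights([0.3, 0.3], weight_sets)  # all probabilities small
```

The confident image loads the weights of its predicted cluster, while the uncertain image falls back to the stable-sample weights, matching the behaviour described above.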
The memory-augmented crowd counting method adopts a multi-weight strategy for crowd counting. This strategy exploits the relations among samples, improves the generalization capability of a single simple network by equipping it with a plurality of parameter sets, can be integrated with most existing methods, and significantly improves their performance. Meanwhile, an effective task-driven similarity and clustering method is used to obtain a plurality of clusters of training images, each cluster being used to learn a group of parameters, which is very effective for test images similar to those clusters.
The invention also provides a computer readable storage medium storing a program for population counting using memory enhancement, which program, when executed by a processor, causes the processor to carry out the steps of the method for population counting using memory enhancement.
The crowd counting method using memory enhancement in the present invention can be implemented by means of software plus necessary general hardware platform, the software is stored in a computer readable storage medium (including ROM/RAM, magnetic disk, optical disk), and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) execute the method of the present invention.
The invention also provides a computer device comprising a memory and a processor, the memory storing a program for people counting using memory enhancement, which program, when executed by the processor, causes the processor to carry out the steps of the method for people counting using memory enhancement.
Examples
The present invention is further described below by way of specific examples, which are merely exemplary and do not limit the scope of the present invention in any way.
Example 1
1. Data set
This example was performed on three datasets: ShanghaiTech Part A, UCF-QNRF, and NWPU-Crowd.
Wherein the ShanghaiTech Part A dataset refers to desenzhou/ShanghaiTechDataset, the dataset associated with "Single-Image Crowd Counting via a Multi-Column Convolutional Neural Network" (MCNN) (github.com); the UCF-QNRF dataset refers to the CRCV Center for Research in Computer Vision at the University of Central Florida (ucf.edu); and the NWPU-Crowd dataset refers to the Crowd Benchmark.
A basic case introduction for these three data sets is shown in table 1.
TABLE 1
Name                  Number of pictures   Number of people
ShanghaiTech Part A   482                  241,667
UCF-QNRF              1,535                1.25 million
NWPU-Crowd            5,109                Unknown
2. Performance evaluation criteria:
performance indicators include Mean Absolute Error (MAE) and Mean Square Error (MSE).
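These metrics can be computed as follows; note that in the crowd-counting literature "MSE" conventionally denotes the root of the mean squared error over per-image counts:

```python
import numpy as np

def mae_mse(pred_counts, gt_counts):
    """Crowd-counting metrics over a test set:
    MAE = mean |pred - gt|
    MSE = sqrt(mean (pred - gt)^2)   (i.e. the root mean squared error)
    """
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = float(np.abs(pred - gt).mean())
    mse = float(np.sqrt(((pred - gt) ** 2).mean()))
    return mae, mse

# Hypothetical predicted vs ground-truth counts for three test images
mae, mse = mae_mse([100, 210, 305], [110, 200, 300])
```

For the toy counts above, MAE = 25/3 ≈ 8.33 and MSE = sqrt(75) ≈ 8.66; MSE penalizes large per-image errors more heavily than MAE.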
3. Task description
The network is trained using the training set provided by each public dataset, and predictions are made on the corresponding test set. For the ShanghaiTech Part A and UCF-QNRF datasets, the prediction indices are calculated on the public test sets. Results on the NWPU-Crowd dataset are submitted to the Crowd Benchmark to obtain feedback.
4. Results and analysis
The method provided by the invention is compared with the existing method on different data sets, and the comparison result of the average absolute error (MAE) and the Mean Square Error (MSE) is shown in tables 2-4.
Table 2 shows the comparison between the method of the present invention and existing methods on the ShanghaiTech Part A dataset, Table 3 shows the comparison on the UCF-QNRF dataset, and Table 4 shows the comparison on the NWPU-Crowd dataset.
TABLE 2
(The contents of Table 2 appear as an image in the original publication and are not reproduced here.)
The MCNN method is described in the literature "Zhang, Y.; Zhou, D.; Chen, S.; Gao, S. & Ma, Y. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589-597, 2016";
the CSRNet method is described in the literature "Li, Y.; Zhang, X. & Chen, D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018";
the ResSFCN-101 method is described in the literature "Laradji, I.H.; Rostamzadeh, N.; Pinheiro, P.O.; Vazquez, D. & Schmidt, M. Where Are the Blobs: Counting by Localization with Point Supervision. Proceedings of the European Conference on Computer Vision (ECCV), pp. 547-562, 2018";
the CAN method is described in the literature "Liu, W.; Salzmann, M. & Fua, P. Context-Aware Crowd Counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019";
the DM-Count method is described in the literature "Wang, B.; Liu, H.; Samaras, D. & Hoai, M. Distribution Matching for Crowd Counting. Advances in Neural Information Processing Systems, 2020";
the S-DCNet and SS-DCNet(cls) methods are described in the literature "Xiong, H.; Lu, H.; Liu, C.; Liang, L.; Cao, Z. & Shen, C. From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8362-8371, 2019";
M-MCNN is MCNN modified by the method of the invention;
M-CSRNet is CSRNet improved using the method of the invention;
FDC-18 is a method using a basic model;
MFDC-18 is the method of the present invention (employing FDC with multiple parameter sets).
TABLE 3
Method          MAE      MSE
MCNN            277      426
Switch-CNN      228      445
CAN             107      183
CSRNet          98.2     157.2
S-DCNet         97.7     167.6
DM-Count        85.6     148.3
SS-DCNet(cls)   81.9     143.8
M-MCNN          234.1    381.8
M-CSRNet        83.1     144.6
FDC-18          93.0     157.3
MFDC-18         76.2     121.5
The Switch-CNN method is described in the literature "Sam, D.B.; Surya, S. & Babu, R.V. Switching Convolutional Neural Network for Crowd Counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017".
TABLE 4
(The contents of Table 4 appear as an image in the original publication and are not reproduced here.)
Where O_MAE denotes the MAE averaged over pictures, O_MSE denotes the MSE averaged over pictures, O_NAE denotes the MAE normalized by the number of people, and Avg. MAE (S/L) denotes the MAE averaged over scene and luminance categories.
As can be seen from Tables 2-4, on the ShanghaiTech Part A and UCF-QNRF datasets, the MFDC-18 method disclosed by the invention achieves the lowest MAE and MSE values, and the MCNN and CSRNet variants improved by the method of the invention (M-MCNN and M-CSRNet) outperform the original methods.
On the NWPU-Crowd dataset, the disclosed method is greatly superior to previous methods on the O_MAE and O_MSE indices, illustrating its effectiveness; on the Avg. MAE (S/L) index, i.e. the test error broken down by scene and illumination category, it also remains greatly superior to previous methods, demonstrating its effectiveness across various scene categories.
Further, fig. 2 shows the improvement of the multi-weight method (MFDC) over the single-weight method (FDC) according to the present invention; as can be seen from fig. 2, the MFDC method greatly improves the prediction accuracy compared to the FDC method.
FIG. 3 compares the parameter count and performance of the memory-augmented crowd counting method of the present invention with those of prior methods (MCNN, SANet, PCCNet-light, CSRNet, SFCN-101, Bayesian, SCAR, CAN, S-DCNet, and DM-Count, respectively).
As can be seen from fig. 3, under the condition of similar parameter quantities, the method of the present invention greatly reduces the average error value compared with the conventional method.
The invention has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to be construed in a limiting sense. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, which fall within the scope of the present invention.

Claims (4)

1. A crowd counting method, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images and selecting a training sample;
step 1-1, establishing a crowd counting network model;
step 1-2, obtaining the similarity between training image samples;
define T = {(x_i, y_i)}, i = 1, 2, …, N, as a training set of N sample crops, where x_i is a sample crop and y_i is its point annotation; the feature extractor and the density map generator of a given model are defined as f = Ψ(x, θ_1) and d = Φ(f, θ_2), where θ_1 and θ_2 are the parameters of the feature extractor and of the density map generator, respectively; the loss function is expressed as L(ŷ, y), where ŷ = Φ(Ψ(x, θ_1), θ_2);
The similarity between the training image samples is obtained by a method comprising the steps of:
step 1-2-1, obtaining the loss of the i-th sample;
wherein the loss of the i-th sample (x_i, y_i) under the basic model is l_i = L(Φ(f_i, θ_2), y_i), the basic model being the crowd counting network model;
step 1-2-2, fine-tuning the parameters of the density map generator of the crowd counting network model;
wherein the parameters of the density map generator of the basic model are fine-tuned on the i-th sample to obtain the optimal parameters
θ̂_2^(i) = argmin_{θ_2} L(Φ(f_i, θ_2), y_i);
Step 1-2-3, obtaining the loss of the jth sample, and obtaining the fine tuning similarity between the ith sample and the jth sample;
the fine-tuning similarity between the ith and jth samples is obtained by:
Figure FDA0003280126690000013
li=L(Φ(fi,θ2),yi) I-th sample (x) of the population count network modeli,yi) Loss of (d);
Figure FDA0003280126690000021
representing fine-tuning models
Figure FDA0003280126690000022
Sample j (x)j,yj) Loss of (d);
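The fine-tuning similarity of steps 1-2-1 to 1-2-3 can be sketched as follows. This is a toy illustration, not the patented implementation: it assumes a linear density head trained by plain gradient descent, and reads s(i, j) as the relative loss reduction of sample j after fine-tuning the head on sample i alone (the exact expression in the source is an equation image).

```python
import numpy as np

def finetune_similarity(f, y, i, w0, steps=20, lr=0.05):
    """Toy sketch of the fine-tuning similarity s(i, j).

    f  : (N, D) per-sample features (frozen extractor output, f_k = Psi(x_k))
    y  : (N,) ground-truth counts
    w0 : (D,) shared density-head parameters (theta_2)

    Returns s[j] = (l_j - l_j_i) / l_j: the relative loss reduction of
    every sample j after fine-tuning the head on sample i alone.
    """
    def loss(w, fk, yk):            # L(Phi(f_k, w), y_k): squared count error
        return (fk @ w - yk) ** 2

    # l_j: loss of every sample under the shared head
    base = np.array([loss(w0, f[j], y[j]) for j in range(len(y))])

    # fine-tune theta_2 on sample i only (the extractor stays frozen)
    w = w0.copy()
    for _ in range(steps):
        grad = 2.0 * (f[i] @ w - y[i]) * f[i]
        w -= lr * grad

    # l_j_i: loss of every sample under the head fine-tuned on sample i
    tuned = np.array([loss(w, f[j], y[j]) for j in range(len(y))])
    return (base - tuned) / np.maximum(base, 1e-12)
```

Samples whose features align with sample i benefit from its update (similarity near 1), while unrelated samples are unaffected (similarity near 0).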
step 1-3, selecting a training sample;
an index I(i, m) of the descending tendency of the i-th sample in the m-th training period is defined as
I(i, m) = 1 if l_i^(m) < l_i^(m−1) − ε, and I(i, m) = 0 otherwise,
wherein x_i is the i-th sample, y_i represents the true value of the i-th sample, l_i^(m) is the loss of the i-th sample in the m-th training period, and ε represents a hyper-parameter;
the instability of a training sample is obtained as the proportion of adjacent training periods in which the index I(i, m) changes its value;
a threshold η with a value of 0-0.5 is set, and the samples whose instability is greater than the threshold are selected to form an unstable sample set, denoted Q;
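The instability-based selection of step 1-3 can be sketched as follows. Because the instability formula appears only as an equation image in the source, this sketch assumes the indicator I(i, m) flags a loss decrease of more than ε, and scores instability as how often that indicator flips across training periods.

```python
import numpy as np

def instability(losses, eps=1e-3):
    """Assumed sketch of the step 1-3 instability score.

    losses : (M, N) array, loss of each of N samples over M training periods.
    I[m, i] = 1 if sample i's loss descended by more than eps in period m;
    the instability of a sample is the fraction of adjacent periods in
    which that descent indicator flips.
    """
    I = (losses[1:] < losses[:-1] - eps).astype(float)   # (M-1, N) descent indicator
    flips = np.abs(np.diff(I, axis=0))                   # trend changes between periods
    return flips.mean(axis=0)

def select_unstable(losses, eta=0.3):
    """Samples whose instability exceeds the threshold eta form the set Q."""
    return np.where(instability(losses) > eta)[0]
```

A steadily descending sample scores 0 and is skipped; a sample whose loss oscillates every period scores 1 and enters Q.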
step 2, clustering the selected training samples, and storing a group of weights for each class;
the clustering is performed according to a method comprising the following steps:
step 2-1, obtaining, for each sample u in the unstable sample set Q, the sum of its similarities to all the other samples in Q, and marking all samples as unprocessed;
step 2-2, sorting all unstable samples in descending order of this similarity sum, and traversing the samples;
step 2-3, clustering is carried out according to the processing state of the sample;
in step 2,
when clustering, it is first judged whether a sample is unprocessed; if the sample has been processed, the next loop iteration is entered; if the sample has not been processed, a new cluster is created, all unstable samples are sorted in descending order of their similarity to the currently processed sample, and the samples are traversed;
during creation of the new cluster, the processing state of each sample is judged; if the sample has been processed, the next loop iteration is entered; if the sample has not been processed, it is judged whether the sample is similar to all samples already in the cluster; if so, the sample is added to the cluster, and if not, the sample is skipped;
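Steps 2-1 to 2-3 describe a greedy, similarity-driven clustering. A minimal sketch follows, assuming a threshold `thresh` decides when two samples count as "similar" (the source does not fix this criterion):

```python
def greedy_cluster(Q, sim, thresh=0.5):
    """Greedy clustering of the unstable set Q per steps 2-1 to 2-3.

    Q      : list of sample ids
    sim    : callable, sim(u, v) -> pairwise similarity
    thresh : assumed cut-off for "similar" (not specified in the source)
    """
    # step 2-1: sum of similarity of each sample to all others in Q
    total = {u: sum(sim(u, v) for v in Q if v != u) for u in Q}
    processed = set()
    clusters = []
    # step 2-2: traverse samples in descending order of that sum
    for u in sorted(Q, key=lambda s: -total[s]):
        if u in processed:                 # step 2-3: skip processed samples
            continue
        cluster = [u]                      # unprocessed sample seeds a new cluster
        processed.add(u)
        # re-sort the remaining samples by similarity to the current seed
        for v in sorted(Q, key=lambda s: -sim(u, s)):
            if v in processed:
                continue
            # join only if similar to *all* current members of the cluster
            if all(sim(v, w) > thresh for w in cluster):
                cluster.append(v)
                processed.add(v)
        clusters.append(cluster)
    return clusters
```

Each cluster then stores its own copy of the density-generator weights, as required by step 2.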
step 3, training a weight retrieval module;
in step 3, when the weight retrieval module is trained, a soft label of the following form is adopted:
q_{i,j} = s̄(i, S_j) / Σ_{j'=1}^{K} s̄(i, S_{j'}),
wherein S_j is the j-th cluster, x_i is the i-th sample, s̄(i, S_j) is the average similarity s(i, k) between sample i and the samples x_k in cluster S_j, q_{i,j} is the label of sample i on cluster j, and K represents the number of clusters obtained in step 2.
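The soft-label construction of step 3 can be sketched as follows. Because the label expression appears only as an equation image in the source, this sketch assumes q_{i,j} is sample i's mean similarity to cluster S_j, normalized over the K clusters:

```python
import numpy as np

def soft_labels(sim, clusters):
    """Assumed soft labels for training the weight-retrieval module.

    sim      : (N, N) pairwise similarity matrix s(i, k)
    clusters : list of K lists of sample indices (the clusters S_j)
    Returns q : (N, K), where q[i, j] is sample i's normalized affinity
    to cluster S_j; each row sums to 1.
    """
    N, K = sim.shape[0], len(clusters)
    q = np.zeros((N, K))
    for j, S in enumerate(clusters):
        q[:, j] = sim[:, S].mean(axis=1)   # mean similarity of each sample to S_j
    q /= q.sum(axis=1, keepdims=True)      # normalize over the K clusters
    return q
```

The retrieval module is then trained to regress these soft targets, so at test time it can pick the cluster weights best matched to an input image.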
2. The crowd counting method according to claim 1, wherein in step 1-1, the established crowd counting network model is composed of a standard Feature Pyramid Network (FPN) as the basic feature extractor and dilated convolutions as the density map generator.
3. A computer-readable storage medium in which a memory-enhanced crowd counting program is stored, the program, when executed by a processor, causing the processor to carry out the steps of the crowd counting method according to any one of claims 1 to 2.
4. A computer device comprising a memory and a processor, the memory storing a memory-enhanced crowd counting program which, when executed by the processor, causes the processor to carry out the steps of the crowd counting method according to any one of claims 1 to 2.
CN202110169724.9A 2021-02-07 2021-02-07 Crowd counting method Active CN112818884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110169724.9A CN112818884B (en) 2021-02-07 2021-02-07 Crowd counting method


Publications (2)

Publication Number Publication Date
CN112818884A CN112818884A (en) 2021-05-18
CN112818884B true CN112818884B (en) 2021-11-30





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Li Guorong; Liu Xinyan; Su Li; Huang Qingming
Inventor before: Liu Xinyan; Li Guorong; Su Li; Huang Qingming

GR01 Patent grant