CN112818884B - Crowd counting method - Google Patents
- Publication number
- CN112818884B (application CN202110169724.9A)
- Authority
- CN
- China
- Prior art keywords
- sample
- samples
- training
- similarity
- counting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a crowd counting method comprising a training stage and a testing stage, wherein the training stage comprises the following steps: step 1, obtaining the similarity between training images and selecting training samples; step 2, clustering the selected training samples and storing a group of weights for each class; step 3, training a weight retrieval module. The crowd counting method using memory enhancement disclosed by the invention constructs a multi-weight network that exploits the relations among samples, improves the generalization capability of a single simple network by giving it multiple parameter sets, can be integrated with most existing methods, and markedly improves the performance of the single simple network.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a crowd counting method.
Background
The crowd counting task estimates the number of objects in a picture, such as pedestrians, vehicles, or animals. The task has attracted increasing attention because of its wide application in scenarios such as airports, stations, shopping malls, and other places where counting people is important. During an epidemic in particular, crowds markedly increase the likelihood of viral infection, so detecting and warning of crowds gathering in public areas plays an important role in controlling the spread of the disease.
Existing crowd counting methods achieve reliable performance in certain application contexts, such as uniform density or a fixed viewing angle. In unconstrained settings, however, their performance degrades considerably, mainly because an unconstrained target scene is complex in many respects, including different viewing angles, variable scales, different densities, and a wide range of brightness and contrast, which ultimately causes significant changes in the visual characteristics of the objects being counted.
Most existing approaches attempt to handle unconstrained situations using a single network with multiple channels, with different channels handling data of different scales. However, related research has indicated that it is difficult to solve the crowd counting problem with a single network and suggests using multiple networks, each responsible for a particular scale or density. For example, Switch-CNN places a switch structure in front of a multi-channel CNN to find the optimal channel for a given picture; but because manually designing many channels is impractical, Switch-CNN can handle only limited scale changes.
Furthermore, to handle the cross-scene counting task, the prior art first pre-trains the model on a training data set, then, during the inference phase, uses a coarse density map and a given perspective map to find training samples similar to the test image, and finally fine-tunes the pre-trained model on these samples to obtain a model customized for the test image. However, perspective maps are not readily or commonly available, and the similarity between density maps does not capture the complex correlation between images.
It can be seen that most existing crowd counting methods adopt a complex structure and a backbone network with a large number of parameters to improve generalization, yet when they are tested on large-scale data sets the improvement they bring is unsatisfactory. Therefore, a new crowd counting method is needed to solve the above problems.
Disclosure of Invention
In order to overcome the above problems, the present inventors conducted intensive studies and found the following: by analyzing the relations among samples, a crowd counting network with multiple groups of parameters can be established, in which the network loads different parameters for different samples. In addition, a task-driven similarity is proposed, based on the mutual improvement between samples during fine-tuning; similar samples are grouped into a cluster according to this similarity, and each cluster is used to learn a specific group of parameters. The method exploits the relations among samples, improves the generalization capability of a single simple network by giving it multiple parameter sets, can be integrated with most existing methods, and markedly improves the performance of the single simple network. The invention was completed on this basis.
Specifically, the present invention aims to provide the following:
in a first aspect, there is provided a crowd counting method using memory enhancement, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images and selecting training samples;
step 2, clustering the selected training samples, and storing a group of weights for each class;
step 3, training a weight retrieval module.
In a second aspect, a computer readable storage medium is provided, storing a program for crowd counting using memory enhancement, which program, when executed by a processor, causes the processor to carry out the steps of the crowd counting method using memory enhancement.
In a third aspect, a computer device is provided, comprising a memory storing a program for crowd counting using memory enhancement, and a processor, wherein the program, when executed by the processor, causes the processor to perform the steps of the crowd counting method using memory enhancement.
The invention has the advantages that:
(1) the crowd counting method using memory enhancement provided by the invention constructs a multi-weight network that exploits the relations among samples, improves the generalization capability of a single simple network by giving it multiple parameter sets, can be integrated with most existing methods, and markedly improves the performance of the single simple network;
(2) in the crowd counting method using memory enhancement provided by the invention, multiple clusters of training data are obtained using the mutual fine-tuning similarity and a heuristic clustering method, and each cluster is used to learn a group of parameters, which benefits test images similar to that cluster;
(3) in the crowd counting method using memory enhancement provided by the invention, a simple and effective crowd counting model (FDC) with a small density map regressor is designed, an FDC with multiple parameter sets (MFDC) is obtained through the proposed multi-parameter strategy, and the counting performance is markedly improved.
Drawings
FIG. 1 illustrates a flow diagram of a crowd counting method using memory enhancement in accordance with a preferred embodiment of the present invention;
FIG. 2 is a diagram showing the improvement of the MFDC method over the FDC method according to the embodiment of the present invention;
FIG. 3 is a graph comparing the parameter count and performance of the method according to the embodiment of the present invention with those of existing methods.
Detailed Description
The present invention will be described in further detail below with reference to preferred embodiments and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In a first aspect of the present invention, there is provided a crowd counting method using memory enhancement, as shown in fig. 1, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images, and selecting training samples according to their stability during pre-training;
step 2, clustering the selected training samples, training on each class of samples and storing a group of weights;
step 3, training a weight retrieval module.
The steps of the training phase are described in further detail below:
step 1, establishing a crowd counting network model, and selecting a training sample according to the stability of pre-training.
Wherein, step 1 comprises the following substeps:
step 1-1, establishing a crowd counting network model.
A typical network model includes a feature extractor and a density map generator. The inventors note that in the crowd counting method using memory enhancement the density map generator must be fine-tuned many times; a large density map generator would therefore require a large parameter storage space and, if the training set is small, would easily overfit.
Therefore, to solve the above problem, the present invention adopts a simple basic model called FDC (i.e., the established crowd counting network model).
According to a preferred embodiment of the present invention, the established crowd counting network model is composed of a standard Feature Pyramid Network (FPN) as the basic feature extractor and dilated convolutions as the density map generator.
Preferably, the FPN in the FDC can adopt different networks as its backbone, such as ResNet-18, ResNet-34, or ResNet-50, and preferably ResNet-18.
More preferably, to align the outputs of the FPN, deconvolution layers are used as the upsampler of the crowd counting network model.
The inventors have found that the density map regressor of the FDC used in the present invention has fewer parameters than that of most state-of-the-art methods (e.g., CSRNet, ACMNet, DM-Count) yet remains sufficiently effective.
Specifically: during training, patches of size 224 × 224, 448 × 448, and 896 × 896 are cropped from the original image and resized to 224 × 224; the FPN generates four feature maps ranging in size from 7 × 7 to 56 × 56, which are then upsampled by multiple deconvolution layers with stride 2 to a size of 56 × 56; these feature maps are then concatenated and fused through two 3 × 3 convolutional layers to generate the output density map.
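The size bookkeeping in this paragraph can be checked with a minimal sketch. It assumes the four FPN levels sit at strides 4, 8, 16, and 32 (typical for a ResNet-based FPN; the text only states the 7 × 7 to 56 × 56 range for a 224 × 224 crop), and the function name `upsample_plan` is hypothetical.

```python
import math

def upsample_plan(input_size=224, strides=(4, 8, 16, 32)):
    """Return (feature_size, num_stride2_deconvs) for each FPN level.

    The strides are an assumption (typical ResNet-FPN values); the source
    only says the four maps range from 7x7 to 56x56 for a 224x224 crop.
    """
    sizes = [input_size // s for s in strides]
    target = max(sizes)  # every level is upsampled to the largest map
    plan = []
    for size in sizes:
        # each stride-2 deconvolution doubles the spatial size
        steps = int(math.log2(target // size))
        plan.append((size, steps))
    return plan

plan = upsample_plan()
```

Under these assumptions the 7 × 7 map needs three stride-2 deconvolutions to reach 56 × 56, while the 56 × 56 map needs none.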
Step 1-2, obtaining the similarity between training image samples.
In general, given a well-trained crowd counting model (the base model) and a sample, if the model is fine-tuned on that sample, the performance of the fine-tuned model on that sample and on other similar samples will improve. The similarity between samples can therefore be defined in terms of this performance improvement.
Specifically, $T = \{(x_i, y_i)\}_{i=1}^{N}$ is defined as a training set of $N$ cropped samples (each a sample together with its corresponding point annotations); the feature extractor and the density map generator of a given model are defined as $f = \Psi(x, \theta_1)$ and $d = \Phi(f, \theta_2)$, respectively, where $\theta_1$ and $\theta_2$ are the parameters of the feature extractor and the density map generator, respectively; the loss function is expressed as $L(\hat{y}, y)$, where $\hat{y} = \Phi(\Psi(x, \theta_1), \theta_2)$.
In the present invention, a novel metric is proposed to evaluate the similarity between training image samples, namely a similarity that depends directly on the specific task and model: the change in the loss of sample $x_i$ when the density map generator parameters fine-tuned on other samples are used.
Preferably, the similarity between the training image samples is obtained by a method comprising the steps of:
step 1-2-1, the loss of the ith sample is obtained.
Wherein the loss of the ith sample $(x_i, y_i)$ under the basic model (the crowd counting network model) is $l_i = L(\Phi(f_i, \theta_2), y_i)$.
Step 1-2-2, fine-tuning the parameters of the density map generator of the crowd counting network model.
Wherein the parameters of the density map generator of the basic model are fine-tuned on the ith sample $(x_i, y_i)$ to obtain a specific and effective set of weights for that sample and for similar samples; the optimal parameter for i is obtained as
Step 1-2-3, obtaining the loss of the jth sample, and obtaining the fine-tuning similarity between the ith sample and the jth sample.
As described above, if sample i and sample j are similar, the model fine-tuned on the ith sample will also perform better on the jth sample, and the more similar the samples, the greater the improvement. Therefore, in the present invention the degree of improvement after fine-tuning is taken as the similarity between samples.
According to a preferred embodiment of the present invention, the fine-tuning similarity between the ith sample and the jth sample is obtained by:
in the present invention, when the predictions for two pictures both improve under the model weights fine-tuned from the base model on the other picture, the similarity between them is positive, and the larger the ratio (i.e., the degree of mutual improvement), the larger the similarity.
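The formula itself is not reproduced in this text, so the following is only an illustrative sketch of a score with the stated property (positive exactly when fine-tuning on each sample improves the other). Taking the smaller of the two relative loss reductions is an assumption, not the patent's formula, and all function and argument names are hypothetical. Losses are assumed to be strictly positive.

```python
def relative_gain(base_loss, tuned_loss):
    """Relative loss reduction of one sample after fine-tuning on another."""
    return (base_loss - tuned_loss) / base_loss

def mutual_similarity(l_i, l_j, l_i_tuned_on_j, l_j_tuned_on_i):
    """Illustrative mutual fine-tuning similarity (NOT the patent's formula).

    Assumption: the score is the smaller of the two relative gains, so it is
    positive only when fine-tuning on each sample helps the other.
    """
    gain_on_i = relative_gain(l_i, l_i_tuned_on_j)
    gain_on_j = relative_gain(l_j, l_j_tuned_on_i)
    return min(gain_on_i, gain_on_j)

both_improve = mutual_similarity(10.0, 20.0, 8.0, 15.0)
one_degrades = mutual_similarity(10.0, 20.0, 12.0, 15.0)
```

With this choice, a pair in which either sample gets worse receives a negative score and would not be placed in the same cluster.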
The inventors consider that calculating the fine-tuning similarity for all training image samples is time-consuming. Intuitively, difficult samples (samples whose predictions deviate greatly from the true labels) are important; however, some unstable samples exist during training and make the loss of the basic model unstable. Although some of these unstable samples are not difficult samples, fine-tuning on certain samples in the data set greatly reduces their loss.
Therefore, in the present invention, the fine-tuning similarity is preferably calculated only between unstable samples. In addition, since the loss of a stable sample changes little during training and fine-tuning on a stable sample has little influence on the parameters, the fine-tuning similarity involving stable samples is preferably estimated directly as 0.
Step 1-3, selecting training samples.
In the present invention, in order to evaluate the instability of a sample during training, it is preferable to use the sequence and inversion tests, considering only the decreasing trend of the loss function.
According to a preferred embodiment of the present invention, the index $I(i, m)$ of the decreasing trend of the ith sample in the mth training period is defined as follows:
where the definition uses the predicted value of the mth training period for the ith sample, $y_i$ represents the true value of the ith sample, and $\epsilon$ represents a hyperparameter.
In a further preferred embodiment, the tolerance to small variations is adjusted using the hyperparameter $\epsilon$, and the instability of a training sample is preferably obtained by:
where M represents the total number of training periods; the closer the value is to 1, the greater the instability of the training sample.
In the present invention, a threshold η is set, preferably in the range of 0 to 0.5, and the samples whose instability is greater than the threshold form the unstable sample set, denoted Q.
In the present invention, all training samples in the unstable sample set Q are preferably selected for calculating the mutual fine-tuning similarity.
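Because the two formulas are missing from this text, the sample selection can only be sketched under assumptions. Here the indicator is taken to be 1 when a sample's absolute counting error drops by more than ε between consecutive training periods, and instability is taken to be the mean of that indicator; both choices, and every name in the code, are hypothetical stand-ins.

```python
def decrease_indicator(prev_error, curr_error, eps):
    """1 if the error dropped by more than eps between two periods, else 0."""
    return 1 if prev_error - curr_error > eps else 0

def instability(errors, eps):
    """Fraction of consecutive periods with an error decrease larger than eps.

    `errors` holds |prediction - ground truth| for one sample, one entry per
    training period. The averaging rule is an assumption standing in for the
    formula that is missing from the source text.
    """
    drops = [decrease_indicator(a, b, eps) for a, b in zip(errors, errors[1:])]
    return sum(drops) / len(drops)

def select_unstable(error_history, eps, eta):
    """Indices of samples whose instability exceeds the threshold eta."""
    return [i for i, errs in enumerate(error_history)
            if instability(errs, eps) > eta]

history = [[10, 10, 10, 10], [10, 5, 9, 3]]
unstable = select_unstable(history, eps=1, eta=0.5)
```

In this toy history the first sample never improves and is treated as stable, while the second drops sharply in two of three transitions and lands in Q.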
Step 2, clustering the selected training samples, training on each class of samples and storing a group of weights.
Preferably, the clustering is performed according to a method comprising the steps of:
step 2-1, obtaining, for each sample u in the unstable sample set Q, the sum of its similarities with all other samples in Q, and marking all samples as unprocessed;
step 2-2, sorting all unstable samples in descending order of the sum of similarities, and traversing the samples;
step 2-3, clustering according to the processing state of each sample.
Preferably, during clustering, it is first determined whether the current sample is unprocessed; if it has been processed, the next iteration begins; if it has not been processed, a new cluster is created, all unstable samples are sorted in descending order of their similarity to the current sample, and these samples are traversed.
More preferably, for the newly created cluster, the processing state of each traversed sample is checked; if the sample has been processed, the next iteration begins; if it has not been processed, it is determined whether the sample is similar to all samples already in the cluster; if so, the sample is added to the cluster, and if not, the sample is skipped.
In the invention, the clustering method follows two principles: first, the fine-tuning similarity between all samples in each cluster is positive; second, the number of clusters should be as small as possible to reduce the space required for model storage.
With this heuristic clustering method, the time cost of fine-tuning in real time is avoided; each cluster is used to learn a group of parameters, which is very effective for test images similar to that cluster.
Samples not in the unstable sample set Q are assigned to one cluster, denoted $S_0$. Each cluster is used to fine-tune the density map generator of the basic model, yielding K+1 weight sets, where K represents the number of clusters obtained in step 2.
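Steps 2-1 to 2-3 can be sketched as a greedy procedure over a precomputed similarity matrix. The sketch assumes that "similar to all samples in the cluster" means a strictly positive similarity with every current member, which matches the first clustering principle; the function name and the matrix layout are hypothetical.

```python
def heuristic_cluster(sim):
    """Greedy clustering of unstable samples from a similarity matrix.

    sim[u][v] is the mutual fine-tuning similarity between samples u and v.
    Assumption: a sample joins a cluster only if its similarity with every
    current member is strictly positive.
    """
    n = len(sim)
    # Step 2-1: similarity sums; all samples start unprocessed.
    totals = [sum(sim[u][v] for v in range(n) if v != u) for u in range(n)]
    processed = [False] * n
    # Step 2-2: traverse samples in descending order of the sums.
    order = sorted(range(n), key=lambda u: totals[u], reverse=True)
    clusters = []
    for u in order:
        if processed[u]:
            continue
        # Step 2-3: an unprocessed sample seeds a new cluster.
        cluster = [u]
        processed[u] = True
        candidates = sorted(range(n), key=lambda v: sim[u][v], reverse=True)
        for v in candidates:
            if processed[v]:
                continue
            if all(sim[v][w] > 0 for w in cluster):
                cluster.append(v)
                processed[v] = True
        clusters.append(cluster)
    return clusters

sim = [
    [0.0, 0.5, -0.1, -0.1],
    [0.5, 0.0, -0.1, -0.1],
    [-0.1, -0.1, 0.0, 0.4],
    [-0.1, -0.1, 0.4, 0.0],
]
clusters = heuristic_cluster(sim)
```

Skipped samples stay unprocessed, so they are picked up later as seeds of their own clusters and every unstable sample ends up in exactly one cluster.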
Step 3, training a weight retrieval module.
In the present invention, in order to obtain the optimal weights for a test sample, each cluster is preferably regarded as a class and a multi-class classifier, i.e., the weight retrieval module, is trained; the established crowd counting network model is then denoted MFDC, i.e., FDC with multiple parameter sets.
According to a preferred embodiment of the present invention, when training the multi-class classifier, the soft label is represented by the following formula:
where $S_j$ is the jth cluster, $x_i$ is the ith sample, $s(i, j)$ is the similarity between sample i and sample j, and the value given by the formula is the label of sample i on cluster j.
The present inventors consider that a sample belonging to one cluster may have positive mutual fine-tuning similarity with some samples in other clusters; therefore, instead of simply using a hard label, the present invention adopts the soft label described by the above formula, which is calculated from the average similarity between the sample and the samples in the cluster.
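A soft label built from average similarities might look like the sketch below. Only the averaging is taken from the text; excluding the sample itself from its own cluster's average, clipping negative averages to zero, and normalizing the labels to sum to 1 are all assumptions, and the names are hypothetical.

```python
def soft_labels(sim, clusters, i):
    """Soft label of sample i over all clusters, from average similarity.

    Assumptions (not stated in the source): the sample itself is excluded
    from the average, negative averages are clipped to 0, and the labels are
    normalized to sum to 1 when any of them is positive.
    """
    raw = []
    for members in clusters:
        others = [k for k in members if k != i]
        avg = sum(sim[i][k] for k in others) / len(others) if others else 0.0
        raw.append(max(avg, 0.0))
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw

sim = [
    [0.0, 0.6, 0.2, 0.0],
    [0.6, 0.0, -0.1, -0.1],
    [0.2, -0.1, 0.0, 0.4],
    [0.0, -0.1, 0.4, 0.0],
]
labels = soft_labels(sim, [[0, 1], [2, 3]], i=0)
```

Sample 0 belongs to the first cluster but keeps a small label on the second one because it has positive similarity with sample 2.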
In a further preferred embodiment, ResNet-18 is employed as the backbone for the multi-class classifier.
Wherein the input to the classifier comprises the original training image and the output of the feature extractor in the base model.
Preferably, the original training image is aligned to the size of the feature extractor output by a shallow CNN-Pool-CNN structure.
In a further preferred embodiment, the cross-entropy loss function for training the multi-class classifier is as follows:
where L is the loss function value, T is the total number of samples, N is the total number of classes, and the remaining two terms are the calculated soft label and the predicted probability that sample i is classified into cluster j.
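A cross-entropy with soft targets can be sketched as follows. The formula is not reproduced in this text, so averaging over the T samples and the small constant added inside the logarithm are assumptions; the function name is hypothetical.

```python
import math

def soft_cross_entropy(soft_targets, probs, tiny=1e-12):
    """Mean cross-entropy between soft labels and predicted probabilities.

    soft_targets[i][j]: soft label of sample i on cluster j.
    probs[i][j]: predicted probability that sample i belongs to cluster j.
    Averaging over samples (the 1/T factor) is an assumption; `tiny` only
    guards against log(0).
    """
    total = 0.0
    for y_row, p_row in zip(soft_targets, probs):
        total -= sum(y * math.log(p + tiny) for y, p in zip(y_row, p_row))
    return total / len(soft_targets)

uniform_loss = soft_cross_entropy([[1.0, 0.0]], [[0.5, 0.5]])
confident_loss = soft_cross_entropy([[1.0, 0.0]], [[0.9, 0.1]])
```

A prediction that puts more mass on the clusters with larger soft labels yields a lower loss, which is the behaviour the classifier is trained toward.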
When a test image is tested, the prediction of the weight retrieval module (the multi-class classifier) after training has converged gives the probability that the image belongs to each cluster. If every probability is small, the image is unlikely to belong to any of these clusters and is considered to come from cluster 0.
In the invention, the trained multi-class classifier predicts the class label of the test data, and the prediction is used to retrieve the optimal weights, so that a specific group of parameters is selected dynamically according to the characteristics of the test image, thereby greatly improving performance.
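The retrieval step at test time can be sketched as a lookup into the stored weight sets. The text does not say how small a probability must be to fall back to cluster 0, so the cut-off `min_prob` is a hypothetical parameter, as are the function name and the list layout of the weight bank.

```python
def retrieve_weights(cluster_probs, weight_bank, min_prob=0.5):
    """Pick a stored parameter set from the classifier's output.

    cluster_probs[k - 1] is the predicted probability for cluster k (k >= 1).
    weight_bank[0] holds the weights learned on cluster 0 (stable samples),
    weight_bank[k] those learned on cluster k.
    `min_prob` is a hypothetical cut-off for "every probability is small".
    """
    best = max(range(len(cluster_probs)), key=lambda k: cluster_probs[k])
    if cluster_probs[best] < min_prob:
        return 0, weight_bank[0]
    return best + 1, weight_bank[best + 1]

bank = ["w0", "w1", "w2"]  # placeholders for K + 1 = 3 stored weight sets
confident = retrieve_weights([0.1, 0.8], bank)
fallback = retrieve_weights([0.2, 0.3], bank)
```

The selected entry would then be loaded into the density map generator before predicting the density map for that image.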
The crowd counting method using memory enhancement adopts a multi-weight strategy for crowd counting; the strategy exploits the relations among samples, improves the generalization capability of a single simple network by giving it multiple parameter sets, can be integrated with most existing methods, and markedly improves their performance. In addition, multiple clusters of training images are obtained using an effective task-driven similarity and clustering method; each cluster is used to learn a group of parameters, which is very effective for test images similar to that cluster.
The invention also provides a computer readable storage medium storing a program for crowd counting using memory enhancement, which program, when executed by a processor, causes the processor to carry out the steps of the crowd counting method using memory enhancement.
The crowd counting method using memory enhancement in the present invention can be implemented by means of software plus a necessary general hardware platform; the software is stored in a computer readable storage medium (including ROM/RAM, magnetic disk, optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the method of the present invention.
The invention also provides a computer device comprising a memory and a processor, the memory storing a program for crowd counting using memory enhancement, which program, when executed by the processor, causes the processor to carry out the steps of the crowd counting method using memory enhancement.
Examples
The present invention is further described below by way of specific examples, which are merely exemplary and do not limit the scope of the present invention in any way.
Example 1
1. Data set
This example was performed on three datasets, ShanghaiTech Part A, UCF-QNRF, and NWPU-Crowd, in that order.
Wherein the ShanghaiTech Part A dataset refers to desenzhou/ShanghaiTechDataset, the dataset associated with Single Image Crowd Counting via Multi Column Convolutional Neural Network (MCNN) (github.com); the UCF-QNRF dataset refers to CRCV, Center for Research in Computer Vision at the University of Central Florida (ucf.edu); the NWPU-Crowd dataset refers to the Crowd Benchmark.
Basic information on these three datasets is shown in Table 1.
TABLE 1
Name | Number of pictures | Number of people |
ShanghaiTech Part A | 482 | 241667 |
UCF-QNRF | 1535 | 1.25 million |
NWPU-Crowd | 5109 | Unknown |
2. Performance evaluation criteria
Performance indicators include Mean Absolute Error (MAE) and Mean Square Error (MSE).
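The two indicators can be computed from per-image counts as in the sketch below. In the crowd counting literature the figure reported as MSE is commonly the square root of the mean squared error; the sketch assumes that convention, which this text does not state, and the function names are hypothetical.

```python
import math

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)

def mse(pred_counts, true_counts):
    """Root of the mean squared count error.

    Assumption: 'MSE' follows the common crowd counting convention of
    reporting the square root of the mean squared error.
    """
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))

pred = [103, 200]
true = [100, 204]
mae_value = mae(pred, true)
mse_value = mse(pred, true)
```

Because of the squaring, the second indicator penalizes a few large counting errors more heavily than many small ones.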
3. Task description
The network is trained using the training set provided by each public dataset, and predictions are made on the test set provided by that dataset. For the ShanghaiTech Part A dataset and the UCF-QNRF dataset, the evaluation metrics are calculated on the public test set. For the NWPU-Crowd dataset, the predictions are submitted to the Crowd Benchmark to obtain the evaluation results.
4. Results and analysis
The method provided by the invention is compared with existing methods on the different datasets; the comparison of mean absolute error (MAE) and mean square error (MSE) is shown in Tables 2-4.
Table 2 shows the comparison between the method of the present invention and existing methods on the ShanghaiTech Part A dataset, Table 3 shows the comparison on the UCF-QNRF dataset, and Table 4 shows the comparison on the NWPU-Crowd dataset.
TABLE 2
The MCNN method is described in "Zhang, Y.; Zhou, D.; Chen, S.; Gao, S. & Ma, Y. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 589-597, 2016";
the CSRNet method is described in "Li, Y.; Zhang, X. & Chen, D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018";
the ResSFCN-101 method is described in "Laradji, I.H.; Rostamzadeh, N.; Pinheiro, P.O.; Vazquez, D. & Schmidt, M. Where are the Blobs: Counting by Localization with Point Supervision. Proceedings of the European Conference on Computer Vision (ECCV), pp. 547-562, 2018";
the CAN method is described in "Liu, W.; Salzmann, M. & Fua, P. Context-Aware Crowd Counting. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2019-June, 2019";
the DM-Count method is described in "Wang, B.; Liu, H.; Samaras, D. & Hoai, M. Distribution Matching for Crowd Counting. Proceedings of Advances in Neural Information Processing Systems, 2020";
the S-DCNet and SS-DCNet (cls) methods are described in "Xiong, H.; Lu, H.; Liu, C.; Liang, L.; Cao, Z. & Shen, C. From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8362-8371, 2019";
M-MCNN is MCNN modified by the method of the invention;
M-CSRNet is CSRNet improved using the method of the invention;
FDC-18 is a method using a basic model;
MFDC-18 is the method of the present invention (employing FDC with multiple parameter sets).
TABLE 3
Method | MAE | MSE |
MCNN | 277 | 426 |
Switch-CNN | 228 | 445 |
CAN | 107 | 183 |
CSRNet | 98.2 | 157.2 |
S-DCNet | 97.7 | 167.6 |
DM-Count | 85.6 | 148.3 |
SS-DCNet(cls) | 81.9 | 143.8 |
M-MCNN | 234.1 | 381.8 |
M-CSRNet | 83.1 | 144.6 |
FDC-18 | 93.0 | 157.3 |
MFDC-18 | 76.2 | 121.5 |
The Switch-CNN method is described in "Sam, D.B.; Surya, S. & Babu, R.V. Switching Convolutional Neural Network for Crowd Counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2017-January, 2017".
TABLE 4
Where O_MAE denotes MAE averaged by picture, O_MSE denotes MSE averaged by picture, O_NAE denotes MAE normalized by number of people, and Avg. MAE (S/L) denotes MAE averaged over scene and illumination categories.
As can be seen from Tables 2-4, on the ShanghaiTech Part A and UCF-QNRF datasets the MFDC-18 method of the invention has the lowest MAE and MSE values, and MCNN and CSRNet improved with the method of the invention (i.e., M-MCNN and M-CSRNet) perform better than the original methods.
On the NWPU-Crowd dataset, the method of the invention is far superior to previous methods on the O_MAE and O_MSE metrics, which illustrates its effectiveness; on Avg. MAE (S/L), i.e., the test error broken down by scene and illumination category, it is also far superior to previous methods, which demonstrates its effectiveness across scene categories.
Further, fig. 2 shows the improvement of the multi-weight method (MFDC) of the invention over the single-weight method (FDC); as can be seen from fig. 2, the MFDC method greatly improves prediction accuracy compared with the FDC method.
FIG. 3 compares the parameter count and performance of the crowd counting method using memory enhancement of the present invention with those of prior-art methods (MCNN, SANet, PCCNet-light, CSRNet, SFCN-101, Bayesian, SCAR, CAN, S-DCNet, and DM-Count).
As can be seen from fig. 3, at similar parameter counts the method of the present invention greatly reduces the average error compared with conventional methods.
The invention has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to be construed in a limiting sense. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the technical solution of the present invention and its embodiments without departing from the spirit and scope of the present invention, which fall within the scope of the present invention.
Claims (4)
1. A crowd counting method, the method comprising a training phase and a testing phase, the training phase comprising the steps of:
step 1, obtaining the similarity between training images and selecting a training sample;
step 1-1, establishing a crowd counting network model;
step 1-2, obtaining the similarity between training image samples;
define $T = \{(x_i, y_i)\}_{i=1}^{N}$ as a training set of $N$ sample crops, each sample crop being a sample together with its corresponding point annotations; the feature extractor and the density map generator of a given model are defined as $f = \Psi(x, \theta_1)$ and $d = \Phi(f, \theta_2)$, respectively, wherein $\theta_1$ and $\theta_2$ are the parameters of the feature extractor and the density map generator, respectively; the loss function is expressed as $L(\hat{y}, y)$, where $\hat{y} = \Phi(\Psi(x, \theta_1), \theta_2)$;
The similarity between the training image samples is obtained by a method comprising the steps of:
step 1-2-1, obtaining the loss of the ith sample;
wherein the loss of the ith sample $(x_i, y_i)$ under the basic model is $l_i = L(\Phi(f_i, \theta_2), y_i)$, the basic model being the crowd counting network model;
step 1-2-2, fine-tuning parameters of a density map generator of the crowd counting network model;
wherein the parameters of the density map generator of the basic model are fine-tuned to obtain the optimal parameters
step 1-2-3, obtaining the loss of the jth sample, and obtaining the fine-tuning similarity between the ith sample and the jth sample;
the fine-tuning similarity between the ith and jth samples is obtained by:
wherein $l_i = L(\Phi(f_i, \theta_2), y_i)$ represents the loss of the crowd counting network model on the ith sample $(x_i, y_i)$, and the fine-tuned loss term represents the loss of the fine-tuned model on the jth sample $(x_j, y_j)$;
step 1-3, selecting a training sample;
an index $I(i, m)$ of the downward trend of the ith sample in the mth training epoch is defined as follows:
wherein $x_i$ is the ith sample, $y_i$ represents the ground truth of the ith sample, and $\epsilon$ represents a hyper-parameter;
the instability of the training sample is obtained by:
a threshold $\eta$ with a value between 0 and 0.5 is set, and the samples whose instability is greater than the threshold are selected to form an unstable sample set, denoted Q;
step 2, clustering the selected training samples, and storing a group of weights for each class;
the clustering is performed according to a method comprising the following steps:
step 2-1, for each sample u in the unstable sample set Q, obtaining the sum of the similarities between u and all other samples in Q, and marking all samples as unprocessed;
step 2-2, performing descending order arrangement on all unstable samples according to the sum of the similarity, and traversing the samples;
step 2-3, clustering is carried out according to the processing state of the sample;
in step 2, when clustering is carried out, it is first judged whether the current sample is unprocessed; if the sample has been processed, the next iteration is entered; if the sample has not been processed, a new cluster is created, all unstable samples are sorted in descending order of their similarity to the currently processed sample, and the samples are traversed;
when the new cluster is created, the processing state of each traversed sample is judged; if the sample has been processed, the next iteration is entered; if the sample has not been processed, it is judged whether the sample is similar to all samples in the cluster; if so, the sample is added to the cluster, and if not, the sample is skipped;
step 3, training a weight retrieval module;
in step 3, when the weight retrieval module is trained, the soft label shown as the following formula is adopted:
2. The crowd counting method according to claim 1, wherein in step 1-1, the constructed crowd counting network model is composed of a standard Feature Pyramid Network (FPN) as a basic feature extractor and a dilated convolution as a density map generator.
3. A computer-readable storage medium storing a memory-enhanced crowd counting program which, when executed by a processor, causes the processor to carry out the steps of the crowd counting method according to any one of claims 1 to 2.
4. A computer device comprising a memory and a processor, the memory storing a memory-enhanced crowd counting program which, when executed by the processor, causes the processor to carry out the steps of the crowd counting method according to any one of claims 1 to 2.
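For illustration only, the clustering of steps 2-1 to 2-3 in claim 1 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `cluster_unstable_samples`, the precomputed matrix `similarity`, and the cut-off `similar_threshold` used to decide whether two samples are "similar" are all hypothetical, since the claims do not state how similarity is tested. The sketch also assumes that a sample is marked as processed as soon as it joins a cluster.

```python
from typing import List, Sequence


def cluster_unstable_samples(
    similarity: Sequence[Sequence[float]],
    similar_threshold: float,
) -> List[List[int]]:
    """Greedily cluster the unstable sample set Q (steps 2-1 to 2-3).

    similarity[u][v] is assumed to hold the fine-tuning similarity between
    samples u and v of Q. similar_threshold is a hypothetical cut-off:
    v is treated as similar to w when similarity[w][v] >= similar_threshold.
    """
    n = len(similarity)

    # Step 2-1: sum of similarities between each sample and all other
    # samples in Q; every sample starts in the unprocessed state.
    sim_sum = [
        sum(similarity[u][v] for v in range(n) if v != u) for u in range(n)
    ]
    processed = [False] * n
    clusters: List[List[int]] = []

    # Step 2-2: traverse samples in descending order of similarity sum.
    for u in sorted(range(n), key=lambda k: sim_sum[k], reverse=True):
        # Step 2-3: a processed sample is skipped (next iteration).
        if processed[u]:
            continue

        # An unprocessed sample seeds a new cluster.
        cluster = [u]
        processed[u] = True

        # Candidates are visited in descending order of similarity to u.
        candidates = sorted(
            range(n), key=lambda k: similarity[u][k], reverse=True
        )
        for v in candidates:
            if processed[v]:
                continue
            # v joins only if it is similar to every sample already in
            # the cluster; otherwise it is skipped and stays unprocessed.
            if all(similarity[w][v] >= similar_threshold for w in cluster):
                cluster.append(v)
                processed[v] = True

        clusters.append(cluster)

    return clusters


# Toy similarity matrix for four unstable samples (made-up values).
toy_similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
toy_clusters = cluster_unstable_samples(toy_similarity, similar_threshold=0.5)
```

With the toy matrix above, samples 0 and 1 end up in one cluster and samples 2 and 3 in another. Requiring a candidate to be similar to every member of the cluster, rather than only to the seed, follows the wording of the claim and keeps each cluster mutually consistent, so that one group of stored weights can plausibly serve all of its members.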
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110169724.9A CN112818884B (en) | 2021-02-07 | 2021-02-07 | Crowd counting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110169724.9A CN112818884B (en) | 2021-02-07 | 2021-02-07 | Crowd counting method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818884A (en) | 2021-05-18 |
CN112818884B (en) | 2021-11-30 |
Family
ID=75862262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110169724.9A Active CN112818884B (en) | 2021-02-07 | 2021-02-07 | Crowd counting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818884B (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101295305B (en) * | 2007-04-25 | 2012-10-31 | 富士通株式会社 | Image retrieval device |
CN101464950B (en) * | 2009-01-16 | 2011-05-04 | 北京航空航天大学 | Video human face identification and retrieval method based on on-line learning and Bayesian inference |
CN102436589B (en) * | 2010-09-29 | 2015-05-06 | 中国科学院电子学研究所 | Complex object automatic recognition method based on multi-category primitive self-learning |
CN102306281B (en) * | 2011-07-13 | 2013-11-27 | 东南大学 | Multi-mode automatic estimating method for humage |
CN102799935B (en) * | 2012-06-21 | 2015-03-04 | 武汉烽火众智数字技术有限责任公司 | Human flow counting method based on video analysis technology |
CN103295031B (en) * | 2013-04-15 | 2016-12-28 | 浙江大学 | A kind of image object method of counting based on canonical risk minimization |
CN105631418B (en) * | 2015-12-24 | 2020-02-18 | 浙江宇视科技有限公司 | People counting method and device |
CN106874862B (en) * | 2017-01-24 | 2021-06-04 | 复旦大学 | Crowd counting method based on sub-model technology and semi-supervised learning |
CN107358596B (en) * | 2017-04-11 | 2020-09-18 | 阿里巴巴集团控股有限公司 | Vehicle loss assessment method and device based on image, electronic equipment and system |
CN107506703B (en) * | 2017-08-09 | 2020-08-25 | 中国科学院大学 | Pedestrian re-identification method based on unsupervised local metric learning and reordering |
- 2021-02-07: CN application CN202110169724.9A, patent CN112818884B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN112818884A (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107480261B (en) | Fine-grained face image fast retrieval method based on deep learning | |
WO2020073951A1 (en) | Method and apparatus for training image recognition model, network device, and storage medium | |
WO2020114378A1 (en) | Video watermark identification method and apparatus, device, and storage medium | |
Firpi et al. | Swarmed feature selection | |
CN110598598A (en) | Double-current convolution neural network human behavior identification method based on finite sample set | |
CN105809672B (en) | A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring | |
CN110322445B (en) | Semantic segmentation method based on maximum prediction and inter-label correlation loss function | |
CN107169117B (en) | Hand-drawn human motion retrieval method based on automatic encoder and DTW | |
CN111723829B (en) | Full-convolution target detection method based on attention mask fusion | |
WO2021051987A1 (en) | Method and apparatus for training neural network model | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN106682681A (en) | Recognition algorithm automatic improvement method based on relevance feedback | |
CN104966052A (en) | Attributive characteristic representation-based group behavior identification method | |
CN109190666B (en) | Flower image classification method based on improved deep neural network | |
CN111524140B (en) | Medical image semantic segmentation method based on CNN and random forest method | |
CN110163130B (en) | Feature pre-alignment random forest classification system and method for gesture recognition | |
CN114118207B (en) | Incremental learning image identification method based on network expansion and memory recall mechanism | |
CN115457332A (en) | Image multi-label classification method based on graph convolution neural network and class activation mapping | |
CN110969639B (en) | Image segmentation method based on LFMVO optimization algorithm | |
Chou et al. | A hierarchical multiple classifier learning algorithm | |
Mund et al. | Active online confidence boosting for efficient object classification | |
CN112818884B (en) | Crowd counting method | |
CN110751662B (en) | Image segmentation method and system for quantum-behaved particle swarm optimization fuzzy C-means | |
Wei et al. | Salient object detection based on weighted hypergraph and random walk | |
CN110941994B (en) | Pedestrian re-identification integration method based on meta-class-based learner |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB03 | Change of inventor or designer information | Inventor after: Li Guorong, Liu Xinyan, Su Li, Huang Qingming. Inventor before: Liu Xinyan, Li Guorong, Su Li, Huang Qingming |
| GR01 | Patent grant | |