CN111428587A - Crowd counting and density estimating method and device, storage medium and terminal

Info

Publication number
CN111428587A
CN111428587A (application CN202010162552.8A)
Authority
CN
China
Prior art keywords
crowd
density
labeled
model
image data
Prior art date
Legal status
Granted
Application number
CN202010162552.8A
Other languages
Chinese (zh)
Other versions
CN111428587B (en)
Inventor
李莉
赵震
林国义
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University
Priority to CN202010162552.8A
Publication of CN111428587A
Application granted
Publication of CN111428587B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a crowd counting and density estimation method and device, a storage medium, and a terminal. First, a number of images are randomly selected from an unlabeled crowd image dataset and annotated to generate corresponding crowd density labels, which are added to a labeled crowd image dataset; a crowd counting and density estimation model is trained until it converges; further images are then selected from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, given crowd density labels, and added to the labeled crowd image dataset; this iteration and optimization continues until the model meets the performance requirement; an adversarial learning branch based on feature mixing and gradient reversal is then added to the model, which is trained with both labeled and unlabeled crowd images; finally, crowd counting and density estimation are performed on the crowd image to be predicted. The invention fundamentally reduces the annotation workload for crowd images and improves the generalization performance of the model.

Description

Crowd counting and density estimating method and device, storage medium and terminal
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for crowd counting and density estimation, a storage medium, and a terminal.
Background
In recent years, convolutional neural networks have achieved great success in crowd counting and density estimation. Convolutional neural networks typically have training parameters on the order of tens of millions and usually require large amounts of labeled data to prevent overfitting. When constructing a dataset for the crowd counting and density estimation task, it is generally necessary to mark the center of every person's head in every image and record its coordinate position. Since each crowd image may contain a large number of heads, this labeling approach is time-consuming and labor-intensive. The mainstream approach is to label all of the acquired crowd images and train the model with fully supervised learning, but this inevitably requires a great deal of effort and even financial resources for data annotation. Consequently, for scenes in which a large number of crowd images cannot be labeled for some reason, fully supervised learning may cause the model's performance to drop sharply or even collapse due to the lack of sufficient labeled data. The application of convolutional neural networks to crowd counting and density estimation with scarce labeled images therefore remains to be further studied.
In real scenes, crowd images can be acquired in large quantities from video surveillance and other equipment; they provide the most intuitive crowd distribution information and have important applications in security, business discovery, people counting, and the like. In general, a model trained on crowd images with a particular feature distribution or camera parameters may degrade or even collapse in other scenes. In practical applications, therefore, to obtain optimal performance, crowd images closest to the future deployment scene are usually collected and labeled for model training. At the same time, if every acquired image is labeled, a great deal of annotation work inevitably results. Related research has found that many training samples only contribute redundant information during model training and may even harm it. Therefore, if the data most beneficial to model training can be selected by some means and unlabeled data can be fully exploited, annotation work can be fundamentally reduced and an effect even superior to full labeling may be achieved.
However, picking out the images the model needs most from a large pool of unlabeled images, and making full use of the unlabeled images, is a great challenge. At present, in the field of crowd counting and density estimation, there is little research on screening images from the unlabeled pool for annotation, and only a few studies explore how to fully exploit unlabeled images: for example, using a generative adversarial network and adding to the model a binary classification branch that distinguishes real images from fake images generated by the model, so that unlabeled real images can participate in model training and improve the generalization of the model's shared parts; designing a special multi-stage autoencoder structure so that most model parameters can be trained with unlabeled images and only a small number of parameters require labeled images; constructing a self-supervised ranking loss by counting selected regions of crowd images, a loss that can be applied to labeled and unlabeled images simultaneously; or improving model performance with additional artificial images and domain adaptation techniques. These studies, however, still have many shortcomings. First, the improvements they bring are limited and fall short of practical requirements. Second, in order to introduce unlabeled data into training, they may add unstable modules, making the whole model hard to train or the training time too long, which is unsuitable for real scenes. Furthermore, active learning, although almost unexplored in crowd counting and density estimation, has demonstrated its advantages in reducing data annotation workload and speeding up model training in many other areas.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a crowd counting and density estimation method, apparatus, storage medium, and terminal that solve the problems in the prior art.
To achieve the above and other related objects, a first aspect of the present invention provides a crowd counting and density estimation method, comprising: randomly selecting, subject to the labeling budget, a plurality of images from the unlabeled crowd image dataset for annotation to generate corresponding crowd density labels, and adding them to the labeled crowd image dataset; training a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges; based on the model, selecting a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy to generate corresponding crowd density labels, and adding them to the labeled crowd image dataset; continuing to iterate and optimize until the model meets the performance requirement; adding an adversarial learning branch based on feature mixing and gradient reversal to the model meeting the performance requirement, and training the model with the labeled and unlabeled crowd images; and performing crowd counting and density estimation on the crowd images to be predicted, using the model trained with the labeled and unlabeled crowd images.
In some embodiments of the first aspect of the present invention, selecting, based on the model, a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, generating corresponding crowd density labels, and adding them to the labeled crowd image dataset comprises: predicting the remaining unlabeled crowd image dataset with the model to obtain a density prediction map and a count prediction value for each unlabeled crowd image; partitioning the set of count prediction values into regions using a natural breakpoint method, and dividing the labeled and unlabeled crowd images according to the resulting regions; calculating, within each region, a grid-division-based similarity metric value between the labeled crowd image dataset and the unlabeled crowd image dataset; converting, within each region, the similarity metric value of each unlabeled crowd image into a probability value through normalization; and sampling the unlabeled crowd images in each region without replacement according to the probability values, annotating the heads in the sampled unlabeled crowd images to generate crowd density labels, and adding them to the labeled crowd image dataset.
In some embodiments of the first aspect of the present invention, the crowd density label is obtained by convolving the head annotations of the image with a Gaussian kernel or an adaptive Gaussian kernel.
In some embodiments of the first aspect of the present invention, the method of training a crowd counting and density estimation model based on the labeled crowd image dataset comprises a density map regression method.
In some embodiments of the first aspect of the present invention, the density map regression method can employ a mean squared error loss function, among others.
In some embodiments of the first aspect of the present invention, the adversarial learning branch comprises a feature mixing layer, a gradient reversal layer, and a subsequent binary classification module.
In some embodiments of the first aspect of the present invention, the adversarial learning branch can employ a binary cross-entropy loss function.
To achieve the above and other related objects, a second aspect of the present invention provides a crowd counting and density estimation apparatus, comprising: a labeling module, which randomly selects, subject to the labeling budget, a plurality of images from the unlabeled crowd image dataset for annotation, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset; a training module, which trains a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges; a screening module, which, based on the model, selects a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset; a loop module, which repeats the operations of the training module and the screening module until the model meets the performance requirement; a feature mixing module, which adds an adversarial learning branch based on feature mixing and gradient reversal to the model meeting the performance requirement and trains the model with the labeled and unlabeled crowd images; and a counting module, which performs crowd counting and density estimation on the crowd image to be predicted using the model trained with the labeled and unlabeled crowd images.
To achieve the above and other related objects, a third aspect of the present invention provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the crowd counting and density estimation method.
To achieve the above and other related objects, a fourth aspect of the present invention provides an electronic terminal comprising: a processor and a memory; the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the crowd counting and density estimation method.
As described above, the crowd counting and density estimating method, apparatus, storage medium and terminal according to the present invention have the following advantageous effects:
(1) The labeling cost of the method is low. In general, the main cost of deep learning comes from labeling the training data, and the labeling cost of a large dataset may reach tens of millions of dollars. The active learning method provided by the invention fundamentally reduces the annotation workload for crowd images while maintaining high model performance. In particular, small models and mobile-side models have fewer parameters and do not require an overly large amount of training data, so the training data selected by active learning plays an even greater role, and the model performance comes extremely close to, or even surpasses, fully supervised learning. Moreover, the semi-supervised learning based on feature mixing and gradient reversal makes full use of the acquired data, further improving the model with massive unlabeled images and achieving the practical goal of labeling only a small amount of data.
(2) The generalization performance of the model is good. Because only the data most beneficial to the model is labeled, the model is not affected by the negative influence of harmful data; at the same time, the semi-supervised learning based on feature mixing and gradient reversal allows the model to make full use of a large amount of multi-source unlabeled data, improving its generalization performance.
Drawings
FIG. 1 is a flow chart of the crowd counting and density estimation method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a crowd counting and density estimating apparatus according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the present invention. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present invention. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present invention is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "above," and the like, may be used herein to facilitate describing one element's or feature's relationship to another element or feature as illustrated in the figures.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," "retained," and the like are to be construed broadly, e.g., as meaning fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected or indirectly connected through intervening media; or as internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
Also, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and/or "including," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C." An exception to this definition occurs only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.
The query screening strategy is one of the most critical steps of active learning and determines the final performance of the model to the greatest extent; it is mainly divided into uncertainty strategies and diversity strategies. At present, active learning is applied mainly to classification; because the classification task is relatively simple, uncertainty and diversity can be defined in several intuitive ways, such as the entropy or confidence of the classification probabilities. The crowd counting and density estimation task, however, is a regression problem with dense predictions, and there has been no research on a query screening strategy for it.
The active learning strategy is designed around sample diversity. Repeated experiments show that images with different crowd densities play an important role in the generalization performance of the model. If the crowd density is too uniform, the model overfits more severely and performs worse on the test set; the invention therefore holds that sampling uniformly within each interval of crowd density has a positive effect on the model. Assuming the images are of equal size and the crowd distribution within each image is relatively uniform, the most intuitive expression of crowd density is the crowd count of the image. Meanwhile, if the screened images are too similar to one another, information redundancy results; the GDSIM similarity measurement scheme is therefore proposed to avoid such redundancy as much as possible.
Adversarial semi-supervised learning has been widely used in various research fields, yet it is almost unexplored in crowd counting and density estimation. Since the labeled and unlabeled samples come from the same domain, it is difficult for generic adversarial learning to provide much benefit. The invention therefore provides an adversarial module based on feature mixing and gradient reversal, which makes discrete data as continuous as possible to reduce the difficulty of feature alignment for the model.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are further described in detail by the following embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Embodiment 1
Fig. 1 is a schematic flow chart of the crowd counting and density estimation method in the present embodiment, which comprises the following specific steps:
step S11: randomly selecting a plurality of images from the image data set of the non-labeled crowd with the limitation of label quantity to label, generating corresponding crowd density labels, and adding the corresponding crowd density labels into the image data set of the labeled crowd.
In a preferred implementation of this embodiment, the crowd density label is obtained by convolving the head annotations of the image with a Gaussian kernel or an adaptive Gaussian kernel, that is:

D(P) = Σ_{i=1..N} Gσ0(P − pi)   (fixed Gaussian kernel)

D(P) = Σ_{i=1..N} Gσi(P − pi),  σi = β·d̄i,  d̄i = (1/K) Σ_{k=1..K} dik   (adaptive Gaussian kernel)

where σ0 denotes the fixed Gaussian kernel bandwidth, σi the adaptive Gaussian kernel bandwidth of head i, P any pixel coordinate in image x, pi the coordinate of head i in image x, Gσ a Gaussian kernel function with variance σ, and N the crowd count of image x; the K heads closest to head i in image x are selected by the K-nearest-neighbour (KNN) algorithm, dik is the pixel distance between head i and the k-th of these K heads, and β is the scaling factor applied to the average distance d̄i.
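The density-label construction above can be sketched as follows. This is a minimal NumPy/SciPy sketch, not the patent's implementation: head annotations are assumed to be (row, col) pixel coordinates, and the default values of beta, k and sigma0 are illustrative choices borrowed from common practice rather than values fixed by the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, img_shape, beta=0.3, k=3, sigma0=15.0, adaptive=True):
    """Build a crowd density label by placing a (possibly adaptive) Gaussian
    at every annotated head position."""
    h, w = img_shape
    density = np.zeros((h, w), dtype=np.float32)
    pts = np.asarray(head_points, dtype=np.float32)
    n = len(pts)
    if n == 0:
        return density
    for i, (r, c) in enumerate(pts):
        # Unit impulse at the head location, then smoothed by a Gaussian kernel.
        impulse = np.zeros((h, w), dtype=np.float32)
        impulse[int(min(max(r, 0), h - 1)), int(min(max(c, 0), w - 1))] = 1.0
        if adaptive and n > 1:
            # sigma_i = beta * mean distance to the K nearest neighbouring heads
            dists = np.sqrt(((pts - pts[i]) ** 2).sum(axis=1))
            nearest = np.sort(dists)[1:k + 1]          # skip distance to itself
            sigma = beta * nearest.mean()
        else:
            sigma = sigma0                              # fixed Gaussian kernel
        density += gaussian_filter(impulse, sigma, mode='constant')
    return density
```

By construction the returned map sums approximately to the head count N, which is what the density regression branch later learns to reproduce (mass lost at image borders by the truncated Gaussians is the usual caveat).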
Step S12: training a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges.
In a preferred embodiment of the present invention, the method for training the crowd counting and density estimation model based on the labeled crowd image dataset includes a density map regression method. The density map regression method can adopt a mean squared error loss function, specifically expressed as:

Lreg = (1/(2N)) Σ_{n=1..N} ‖D̂n − Dn‖²

where N is the batch size of the input data during training, and D̂n and Dn denote the density prediction map and the density label map of image n, respectively; the type of the model and whether it is pre-trained are not limited.
Specifically, the crowd counting and density estimation model comprises a feature extractor and a density regression module. The input of the feature extractor is a crowd image, and the input of the density regression module is the output feature of the feature extractor. The training process is end-to-end training.
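A minimal PyTorch sketch of this feature extractor / density regression split and of the Lreg loss above; the backbone layers and channel sizes are illustrative assumptions, since the patent leaves the model type and pre-training open, and the ground-truth density maps are assumed to be resized to the resolution of the predicted map.

```python
import torch
import torch.nn as nn

class CrowdCounter(nn.Module):
    """Feature extractor followed by a density regression head (1-channel density map)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(               # illustrative small backbone
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.regressor = nn.Sequential(               # density regression branch
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, x):
        feat = self.features(x)
        return self.regressor(feat), feat

def density_mse_loss(pred, target):
    # L_reg = 1/(2N) * sum_n ||D_hat_n - D_n||^2 over the batch
    n = pred.size(0)
    return ((pred - target) ** 2).sum() / (2 * n)
```

With model = CrowdCounter(), a forward pass pred, feat = model(images) followed by loss = density_mse_loss(pred, targets) is then the end-to-end training objective of step S12.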
Step S13: based on the model, selecting a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, generating corresponding crowd density labels, and adding them to the labeled crowd image dataset.
Optionally, step S13 in this embodiment can be implemented by steps S131 to S135 described below.
Step S131: predicting the remaining unlabeled crowd image dataset Ut with the model Rt to obtain, for each unlabeled crowd image u_j^t, a density prediction map D̂_j^t and a count prediction value Ĉ_j^t, where t is the active learning round and j is the index of the unlabeled crowd image.
Step S132: partitioning the set of count prediction values {Ĉ_j^t} into regions using a natural breakpoint method, and dividing the labeled and unlabeled crowd images according to the resulting regions. The natural breakpoint method classifies the data according to its statistical distribution, including its natural turning points and characteristic points, so that the differences between classes are maximized and the differences within a class are minimized. In practice, a frequency histogram, a gradient curve or a cumulative frequency histogram of the data all help locate the natural break points. The natural breakpoint method includes the Jenks natural breaks method.
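A self-contained sketch of the natural-breaks idea behind step S132, written here as Fisher's optimal one-dimensional partitioning, which minimizes the within-class sum of squared deviations (the criterion behind Jenks breaks); the number of regions is an assumption, as the patent does not fix it.

```python
import numpy as np

def natural_breaks(values, n_classes):
    """Partition 1-D count predictions into n_classes regions by minimising the
    within-class sum of squared deviations. Returns the upper bound of each class."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    n_classes = min(n_classes, n)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of squares

    def ssd(i, j):                                    # SSD of x[i..j], 0-based inclusive
        cnt = j - i + 1
        s = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - s * s / cnt

    # cost[j][c]: best SSD of x[0..j] split into c+1 classes; split[j][c]: start of last class
    cost = np.full((n, n_classes), np.inf)
    split = np.zeros((n, n_classes), dtype=int)
    for j in range(n):
        cost[j][0] = ssd(0, j)
    for c in range(1, n_classes):
        for j in range(c, n):
            for i in range(c, j + 1):                 # last class is x[i..j]
                cand = cost[i - 1][c - 1] + ssd(i, j)
                if cand < cost[j][c]:
                    cost[j][c] = cand
                    split[j][c] = i
    # Recover the break values (upper bound of each class but the last).
    breaks, j = [], n - 1
    for c in range(n_classes - 1, 0, -1):
        i = split[j][c]
        breaks.append(x[i - 1])
        j = i - 1
    return sorted(breaks) + [x[-1]]
```

Labeled and unlabeled images can then be assigned to regions by comparing their (ground-truth or predicted) counts against the returned break values, for example with np.digitize.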
Step S133: calculating, within each region Pk^t, a grid-division-based similarity metric value between the labeled crowd image dataset Vk^t and the unlabeled crowd image dataset Uk^t. Specifically, each image uj is evenly divided by area into 4^l blocks, with LA denoting the fine-granularity index of the grid; Ĉ_j^l denotes the count prediction value of the l-th block of the j-th unlabeled image, and C_i^l denotes the ground-truth count value of the l-th block of the i-th labeled image; the similarity metric value is computed from the block-wise differences between these count values.
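The exact GDSIM formula is only rendered as an image in the source text, so the sketch below is one plausible reading of the definitions above: per-block counts of an unlabeled image's predicted density map are compared against the labeled images of the same region, and the block-wise absolute count differences are aggregated over grid levels. The aggregation by summed absolute differences and the min over labeled images are assumptions, as is the convention that a larger value marks an image as more novel relative to what is already labeled.

```python
import numpy as np

def block_counts(density_map, level):
    """Sum a density map over a 2**level x 2**level grid (4**level blocks)."""
    h, w = density_map.shape
    g = 2 ** level
    counts = np.zeros((g, g), dtype=float)
    for a in range(g):
        for b in range(g):
            counts[a, b] = density_map[a * h // g:(a + 1) * h // g,
                                       b * w // g:(b + 1) * w // g].sum()
    return counts.ravel()

def gdsim(pred_density_u, gt_densities_labeled, max_level=2):
    """Grid-division score of one unlabeled image against a region's labeled images:
    block-wise count differences against the closest labeled image, summed over levels."""
    score = 0.0
    for level in range(1, max_level + 1):
        cu = block_counts(pred_density_u, level)
        diffs = [np.abs(cu - block_counts(g, level)).sum()
                 for g in gt_densities_labeled]
        score += min(diffs)
    return score
```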
Step S134: converting, within each region Pk^t, the similarity metric value s_j^t of each unlabeled crowd image u_j^t into a probability value through normalization. Specifically, the probability value can be computed as

p_j^t = s_j^t / Σ_{j′=1..J} s_{j′}^t

where J denotes the number of unlabeled images.
Step S135: in each region P according to the above probability valuesk tAnd performing non-return sampling on the image data of the non-labeled crowd, performing head labeling on the extracted image data of the non-labeled crowd, generating crowd density labels, and adding the crowd density labels into the image data set of the labeled crowd.
Step S14: and continuously iterating and optimizing until the model meets the performance requirement. In particular, the performance requirements include accuracy, sensitivity, specificity, and mahalanobis correlation coefficients of the model; error rate, accuracy, P-R curve, F1 metric, ROC curve, and cost curve are also included.
Step S15: and adding an antagonistic learning branch based on feature mixing and gradient inversion into the model meeting the performance requirement, and training the model by using the labeled crowd image and the unlabeled crowd image.
Specifically, the adversarial learning branch based on feature mixing and gradient reversal is attached after the feature extractor of the model, the density regression branch is left unchanged, and the two branches are independent of each other. The density regression branch is influenced only by the labeled crowd images, while the feature extractor and the adversarial learning branch are influenced by both the labeled and the unlabeled crowd images.
In a preferred embodiment of the present invention, the adversarial learning branch comprises a feature mixing layer, a gradient reversal layer and a subsequent binary classification module. The feature mixing layer can be expressed as:

λ ~ Beta(α, α);
λ′ = max(λ, 1 − λ);
F(x′) = λ′·F(x1) + (1 − λ′)·F(x2);
y′ = λ′·y1 + (1 − λ′)·y2

where Beta is the Beta distribution, α is the Beta distribution parameter, x1 and x2 are any two images, F(x) is the output feature of image x from the feature extractor, and y1 and y2 denote the class labels of x1 and x2, respectively.
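A PyTorch sketch of the two components described above: a gradient reversal layer (identity in the forward pass, negated and scaled gradient in the backward pass) and a mixup-style feature mixing step, followed by a small binary classification head. The value of alpha, the classifier architecture and the reversal strength lambd are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in backward."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def mix_features(f1, f2, y1, y2, alpha=0.2):
    """Feature-space mixup: lambda ~ Beta(alpha, alpha), lambda' = max(lambda, 1-lambda)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)
    mixed_f = lam * f1 + (1.0 - lam) * f2
    mixed_y = lam * y1 + (1.0 - lam) * y2
    return mixed_f, mixed_y

class DomainClassifier(nn.Module):
    """Binary classifier that follows the gradient reversal layer."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 1),                     # logit for "class 1"
        )

    def forward(self, feat, lambd=1.0):
        return self.head(GradReverse.apply(feat, lambd))
```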
Preferably, the adversarial learning branch adopts a binary cross-entropy loss function:

La = −(1/N) Σ_{n=1..N} [ yn·log(pn) + (1 − yn)·log(1 − pn) ]

where n is the data index, N is the batch size of the input data during training, yn is the class label of feature n, and pn is the predicted probability that feature n belongs to class 1.
Specifically, this embodiment adopts end-to-end training and jointly optimizes the model with the density regression branch and the adversarial learning branch; the total loss function is expressed as:

L = Lreg + λa·La

where λa is a scaling factor.
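A sketch of one joint training step combining the density regression loss and the adversarial loss with the weighting factor λa; it reuses the CrowdCounter, density_mse_loss, mix_features and DomainClassifier sketches above, assumes ground-truth density maps already match the prediction resolution, and assumes the convention that labeled images carry class label 1 and unlabeled images class label 0 (not stated explicitly in the patent).

```python
import torch
import torch.nn.functional as F

def train_step(model, domain_clf, optimizer, labeled_imgs, gt_density,
               unlabeled_imgs, lambda_a=0.1):
    optimizer.zero_grad()

    # Density regression branch: labeled images only.
    pred_density, feat_l = model(labeled_imgs)
    loss_reg = density_mse_loss(pred_density, gt_density)

    # Adversarial branch: labeled (assumed class 1) vs unlabeled (assumed class 0)
    # features, mixed in feature space before the gradient reversal layer.
    _, feat_u = model(unlabeled_imgs)
    y_l = torch.ones(feat_l.size(0), 1, device=feat_l.device)
    y_u = torch.zeros(feat_u.size(0), 1, device=feat_u.device)
    n = min(feat_l.size(0), feat_u.size(0))
    mixed_f, mixed_y = mix_features(feat_l[:n], feat_u[:n], y_l[:n], y_u[:n])
    logits = domain_clf(mixed_f)
    loss_adv = F.binary_cross_entropy_with_logits(logits, mixed_y)

    loss = loss_reg + lambda_a * loss_adv      # L = L_reg + lambda_a * L_a
    loss.backward()
    optimizer.step()
    return loss_reg.item(), loss_adv.item()
```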
Step S16: and performing crowd counting and density estimation on the crowd images to be predicted by using the model trained by using the labeled crowd images and the unlabeled crowd images. Specifically, the density regression branch of the model is utilized to perform population counting and density estimation on the image to be predicted.
In summary, the embodiments of the present invention build on an active learning idea: they use sample diversity to sample uniformly across crowd-density intervals and introduce the GDSIM similarity measurement scheme to avoid information redundancy as much as possible. They further provide a semi-supervised learning method based on feature mixing and gradient reversal that makes discrete data as continuous as possible, reducing the difficulty of feature alignment for the model and making full use of unlabeled image data. As a result, a crowd counting and density estimation model with good generalization performance can be obtained at an extremely low labeling cost.
Embodiment 2
Fig. 2 is a schematic structural diagram of a crowd counting and density estimation apparatus according to an embodiment of the invention, comprising: a labeling module 21, which randomly selects, subject to the labeling budget, a plurality of images from the unlabeled crowd image dataset for annotation, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset; a training module 22, which trains a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges; a screening module 23, which, based on the model, selects a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset; a loop module 24, which continues to iterate and optimize until the model meets the performance requirement; a feature mixing module 25, which adds an adversarial learning branch based on feature mixing and gradient reversal to the model meeting the performance requirement and trains the model with the labeled and unlabeled crowd images; and a counting module 26, which performs crowd counting and density estimation on the crowd image to be predicted using the model trained with the labeled and unlabeled crowd images.
It should be noted that the modules provided in this embodiment correspond to the method steps described above, so their detailed description is omitted. The division of the above apparatus into modules is only a logical division; in an actual implementation they may be wholly or partially integrated into one physical entity or kept physically separate. These modules may all be implemented as software invoked by a processing element, all as hardware, or partly as software invoked by a processing element and partly as hardware. For example, the labeling module may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus in the form of program code that a processing element of the apparatus calls to execute the module's functions; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability. During implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Embodiment 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the crowd counting and density estimation method.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Embodiment 4
Fig. 3 is a schematic structural diagram of an electronic terminal according to an embodiment of the present invention. This embodiment provides an electronic terminal, comprising: a processor 31, a memory 32, and a communicator 33; the memory 32 is connected to the processor 31 and the communicator 33 through a system bus for mutual communication; the memory 32 is used for storing a computer program, the communicator 33 is used for communicating with other devices, and the processor 31 is used for running the computer program so that the electronic terminal performs the steps of the above crowd counting and density estimation method.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the crowd counting and density estimating method, device, storage medium and terminal provided by the invention can fundamentally reduce the labeling workload of crowd images during crowd counting and improve the generalization performance of the model. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A crowd counting and density estimation method, comprising:
randomly selecting, subject to the labeling budget, a plurality of images from the unlabeled crowd image dataset for annotation to generate corresponding crowd density labels, and adding them to the labeled crowd image dataset;
training a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges;
based on the model, selecting a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy to generate corresponding crowd density labels, and adding them to the labeled crowd image dataset;
continuing to iterate and optimize until the model meets the performance requirement;
adding an adversarial learning branch based on feature mixing and gradient reversal to the model meeting the performance requirement, and training the model with the labeled and unlabeled crowd images;
and performing crowd counting and density estimation on the crowd images to be predicted, using the model trained with the labeled and unlabeled crowd images.
2. The method of claim 1, wherein selecting, based on the model, a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy to generate corresponding crowd density labels and adding them to the labeled crowd image dataset comprises:
predicting the remaining unlabeled crowd image dataset with the model to obtain a density prediction map and a count prediction value for each unlabeled crowd image;
partitioning the set of count prediction values into regions using a natural breakpoint method, and dividing the labeled and unlabeled crowd images according to the resulting regions;
calculating, within each region, a grid-division-based similarity metric value between the labeled crowd image dataset and the unlabeled crowd image dataset;
converting, within each region, the similarity metric value of each unlabeled crowd image into a probability value through normalization;
and sampling the unlabeled crowd images in each region without replacement according to the probability values, annotating the heads in the sampled unlabeled crowd images to generate crowd density labels, and adding them to the labeled crowd image dataset.
3. The method of claim 1, wherein the crowd density label is obtained by convolving the head annotations of the image with a Gaussian kernel or an adaptive Gaussian kernel.
4. The method of claim 1, wherein the method of training a crowd counting and density estimation model based on the labeled crowd image dataset comprises a density map regression method.
5. The method of claim 4, wherein the density map regression method can employ a mean squared error loss function.
6. The method of claim 1, wherein the adversarial learning branch comprises a feature mixing layer, a gradient reversal layer, and a subsequent binary classification module.
7. The method of claim 1, wherein the adversarial learning branch can employ a binary cross-entropy loss function.
8. A crowd counting and density estimation apparatus, comprising:
a labeling module, which randomly selects, subject to the labeling budget, a plurality of images from the unlabeled crowd image dataset for annotation, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset;
a training module, which trains a crowd counting and density estimation model based on the labeled crowd image dataset until the model converges;
a screening module, which, based on the model, selects a plurality of images from the remaining unlabeled crowd image dataset with a probability-weighted selection strategy, generates corresponding crowd density labels, and adds them to the labeled crowd image dataset;
a loop module, which continues to iterate and optimize until the model meets the performance requirement;
a feature mixing module, which adds an adversarial learning branch based on feature mixing and gradient reversal to the model meeting the performance requirement and trains the model with the labeled and unlabeled crowd images;
and a counting module, which performs crowd counting and density estimation on the crowd image to be predicted using the model trained with the labeled and unlabeled crowd images.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the crowd counting and density estimation method according to any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to perform the crowd counting and density estimation method according to any one of claims 1 to 7.
CN202010162552.8A 2020-03-10 2020-03-10 Crowd counting and density estimating method, device, storage medium and terminal Active CN111428587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010162552.8A CN111428587B (en) 2020-03-10 2020-03-10 Crowd counting and density estimating method, device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010162552.8A CN111428587B (en) 2020-03-10 2020-03-10 Crowd counting and density estimating method, device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN111428587A true CN111428587A (en) 2020-07-17
CN111428587B CN111428587B (en) 2022-07-29

Family

ID=71551532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010162552.8A Active CN111428587B (en) 2020-03-10 2020-03-10 Crowd counting and density estimating method, device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN111428587B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181312A (en) * 2020-10-23 2021-01-05 北京安石科技有限公司 Method and system for quickly reading hard disk data
CN112364788A (en) * 2020-11-13 2021-02-12 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN112906517A (en) * 2021-02-04 2021-06-04 广东省科学院智能制造研究所 Self-supervision power law distribution crowd counting method and device and electronic equipment
CN113361429A (en) * 2021-06-11 2021-09-07 长江大学 Analysis method and experimental device for movement behaviors of stored grain pests
CN113516029A (en) * 2021-04-28 2021-10-19 上海科技大学 Image crowd counting method, device, medium and terminal based on partial annotation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
EP2704060A2 (en) * 2012-09-03 2014-03-05 Vision Semantics Limited Crowd density estimation
CN106991444A (en) * 2017-03-31 2017-07-28 西南石油大学 The Active Learning Method clustered based on peak density
CN108804635A (en) * 2018-06-01 2018-11-13 广东电网有限责任公司 A kind of method for measuring similarity based on Attributions selection
CN110781941A (en) * 2019-10-18 2020-02-11 苏州浪潮智能科技有限公司 Human ring labeling method and device based on active learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
EP2704060A2 (en) * 2012-09-03 2014-03-05 Vision Semantics Limited Crowd density estimation
CN106991444A (en) * 2017-03-31 2017-07-28 西南石油大学 The Active Learning Method clustered based on peak density
CN108804635A (en) * 2018-06-01 2018-11-13 广东电网有限责任公司 A kind of method for measuring similarity based on Attributions selection
CN110781941A (en) * 2019-10-18 2020-02-11 苏州浪潮智能科技有限公司 Human ring labeling method and device based on active learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JENNIFER VANDONI, EMANUEL ALDEA: "Active Learning for High-Density Crowd Count Regression", IEEE *
张晓鹏: "Research on Active Learning and Semi-Supervised Learning Methods in Speech Emotion Recognition" (in Chinese), Wanfang Data *
韩新怡: "Research on Crowd Density Estimation Algorithms Based on Deep Learning" (in Chinese), Wanfang Data *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181312A (en) * 2020-10-23 2021-01-05 北京安石科技有限公司 Method and system for quickly reading hard disk data
CN112364788A (en) * 2020-11-13 2021-02-12 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN112364788B (en) * 2020-11-13 2021-08-03 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN112906517A (en) * 2021-02-04 2021-06-04 广东省科学院智能制造研究所 Self-supervision power law distribution crowd counting method and device and electronic equipment
CN112906517B (en) * 2021-02-04 2023-09-19 广东省科学院智能制造研究所 Self-supervision power law distribution crowd counting method and device and electronic equipment
CN113516029A (en) * 2021-04-28 2021-10-19 上海科技大学 Image crowd counting method, device, medium and terminal based on partial annotation
CN113516029B (en) * 2021-04-28 2023-11-07 上海科技大学 Image crowd counting method, device, medium and terminal based on partial annotation
CN113361429A (en) * 2021-06-11 2021-09-07 长江大学 Analysis method and experimental device for movement behaviors of stored grain pests
CN113361429B (en) * 2021-06-11 2022-11-04 长江大学 Analysis method and experimental device for movement behaviors of stored grain pests

Also Published As

Publication number Publication date
CN111428587B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111428587B (en) Crowd counting and density estimating method, device, storage medium and terminal
Amid et al. TriMap: Large-scale dimensionality reduction using triplets
Wang et al. Fast visual object counting via example-based density estimation
Zhu et al. Learning from labeled and unlabeled data with label propagation
Zhang et al. Unsupervised feature selection via adaptive graph learning and constraint
Asadi et al. A convolution recurrent autoencoder for spatio-temporal missing data imputation
Jiang et al. Gaussian-induced convolution for graphs
Yu et al. Dynamic background subtraction using histograms based on fuzzy c-means clustering and fuzzy nearness degree
Seo et al. Self-organizing maps and clustering methods for matrix data
Wang et al. Multi-attention mutual information distributed framework for few-shot learning
Hou et al. Network pruning via resource reallocation
Gao et al. Feature redundancy based on interaction information for multi-label feature selection
Montero et al. Efficient large-scale face clustering using an online Mixture of Gaussians
Zaidi et al. Distributed deep variational information bottleneck
Chen et al. Semi-supervised dual-branch network for image classification
CN111259938B (en) Manifold learning and gradient lifting model-based image multi-label classification method
Umer et al. Comparative analysis of extreme verification latency learning algorithms
Jin et al. Blind image quality assessment for multiple distortion image
Xu et al. Semi-supervised deep density clustering
Ding et al. Saliency detection via background prior and foreground seeds
Jia et al. Revisiting saliency metrics: Farthest-neighbor area under curve
CN111310842A (en) Density self-adaptive rapid clustering method
Li et al. Learning graph neural networks with approximate gradient descent
He et al. Large-scale Graph Sinkhorn Distance Approximation for Resource-constrained Devices
de Freitas et al. Compressed Hierarchical Representations for Multi-task Learning and Task Clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant