CN113822111A - Crowd detection model training method and device and crowd counting method and device - Google Patents

Crowd detection model training method and device and crowd counting method and device

Info

Publication number
CN113822111A
Authority
CN
China
Prior art keywords
head
detection
crowd
sense organs
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110067279.5A
Other languages
Chinese (zh)
Other versions
CN113822111B (en)
Inventor
谷爱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202110067279.5A
Publication of CN113822111A
Application granted
Publication of CN113822111B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a crowd detection model training method and device and a crowd counting method and device, wherein the model training method comprises the following steps: acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with a detection frame; training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises: detecting the head of the person and the five sense organs in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame; generating attention feature vectors of corresponding heads by utilizing a heuristic attention weighting network based on the head candidate detection boxes and the five sense organ candidate detection boxes; identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frame identified in the sample picture. By adopting the invention, the crowd counting accuracy can be improved.

Description

Crowd detection model training method and device and crowd counting method and device
Technical Field
The invention relates to the technical field of computers, in particular to a crowd detection model training method and device and a crowd counting method and device.
Background
People counting is an important computer vision technology for security. In the intelligent security field, the unmanned patrol car can effectively judge crowd gathering conditions through crowd counting, early warning is made in advance, and abnormal behaviors are prevented.
Human head detection is a common people counting method, which calculates the number of people by recognizing the head of people.
Disclosure of Invention
In view of the above, the present invention is directed to a method and an apparatus for training a crowd detection model, and a method and an apparatus for counting crowd, which can improve the accuracy of crowd counting.
In order to achieve the above purpose, the embodiment of the present invention provides a technical solution:
a method of crowd detection model training, the method comprising:
acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with a detection frame;
training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
detecting the head of the person and the five sense organs in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of corresponding heads by utilizing a heuristic attention weighting network based on the head candidate detection boxes and the five sense organ candidate detection boxes;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frame identified in the sample picture.
In one embodiment, the detecting the head and the five sense organs of the person in the sample picture to obtain a head candidate detection frame and a five sense organ candidate detection frame includes:
detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame;
obtaining a subgraph of the corresponding head based on the head candidate detection frame;
and detecting each five sense organs in the subgraph by using a five sense organs detection model to obtain the five sense organs candidate detection frame.
In one embodiment, the generating the attention feature vector for the respective head comprises:
extracting a corresponding head subregion characteristic matrix based on a first head candidate detection frame by utilizing a first interested region extraction layer of the heuristic attention weighting network;
utilizing a first global pooling layer of the heuristic attention weighting network to perform global average sampling on the head sub-region feature matrix to obtain a corresponding head average feature vector;
extracting a corresponding feature matrix of the sub-region of the facial features based on each of the facial feature candidate detection boxes by using a second region-of-interest extraction layer of the heuristic attention weighting network;
utilizing a second global pooling layer of the heuristic attention weighting network to carry out average sampling on the feature matrix of each facial feature subregion so as to obtain an average feature vector of the corresponding facial feature;
calculating an attention weight vector for each of the five sense organs in the respective head based on the average feature vector for the head and the average feature vector for the respective five sense organs;
and performing point multiplication on the head average feature vector and the attention weight vector of each corresponding five sense organs respectively, and summing the result of the point multiplication to obtain the attention feature vector of the head corresponding to the first head candidate detection box.
In one embodiment, said calculating an attention weight vector for each of said five sense organs in the respective head comprises:
if the average feature vector exists in the five sense organs, the average feature vector of the corresponding five sense organs is point-multiplied with the average feature vector of the head to obtain a corresponding attention weight vector;
if the average feature vector does not exist for the five sense organs, then the corresponding attention weight vector is zero.
In one embodiment, the adjusting the parameters of the crowd detection model comprises:
adjusting parameters in the head detection model, the facial feature detection model, the heuristic attention weighting network, and the classification network.
In one embodiment, the five sense organs include:
left eye, right eye, left ear, right ear, and mouth.
A crowd counting method, comprising:
acquiring a target detection picture;
detecting the head of a person in the target detection picture based on a crowd detection model, and counting the detected head to obtain the number of the person in the target detection picture;
wherein the confidence of the counting head is greater than a preset threshold; the crowd detection model is obtained by training in advance by adopting any crowd detection model training method.
A crowd detection model training apparatus comprising:
the sample data acquisition module is used for acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with a detection frame;
the model training module is used for training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
detecting the head of the person and the five sense organs in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of corresponding heads by utilizing a heuristic attention weighting network based on the head candidate detection boxes and the five sense organ candidate detection boxes;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frame identified in the sample picture.
A crowd counting device comprising:
the detection target acquisition module is used for acquiring a target detection picture;
the head detection module is used for detecting the head in the target detection picture based on a crowd detection model and counting the detected head to obtain the number of people in the target detection picture; wherein the confidence of the counting head is greater than a preset threshold; the crowd detection model is obtained by training by adopting any crowd detection model training method.
A crowd detection model training apparatus comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 6.
A computer readable storage medium having stored therein computer readable instructions for performing the crowd detection model training method as described above.
The embodiment of the invention also provides crowd counting equipment, which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method as described above.
Embodiments of the present invention also provide a computer-readable storage medium having stored therein computer-readable instructions for performing the crowd counting method as described above.
According to the technical scheme, in the model training method and device and the crowd counting method and device provided by the embodiment of the invention, in the process of training the crowd detection model by using the sample picture, the five sense organs are detected on the basis of the human head detection, and the results of the human head detection and the five sense organs are comprehensively processed by using a heuristic attention weighting mechanism to generate the attention characteristic vector of each head detected by the human head. Therefore, the difference between the human head and other similar objects in the shape can be improved by utilizing the result of the five sense organs detection and adopting a heuristic attention weighting mechanism, so that the accuracy of the attention feature vector input to the classification network can be improved, the false detection of the human head in the human head detection result is screened out, and the detection accuracy of the crowd detection model can be improved. Accordingly, the accuracy of people counting by using the people detection model is improved.
Drawings
FIG. 1 is a schematic flow chart of a method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a crowd detection model network structure according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a second embodiment of the method of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
In the process of realizing the invention, the inventor found that existing schemes for crowd counting based on human head detection suffer from a large counting error. Careful study and analysis identified the specific causes of the problem as follows:
Existing human head detection schemes are built on a general target detection framework. In such a scheme, target features are extracted first, and the category and position of a target are then obtained through classification and regression. In an actual application scenario, people may be at different angles relative to the camera, and the features of a human head differ considerably between angles; for example, a frontal face differs considerably from the back of a head. Because of this large variation, objects whose features differ only slightly from those of a human head are easily misdetected as human heads. In a real scenario it is inevitable that, at a certain angle, the difference in features between a human head and another object is smaller than the difference between human heads at different angles. For example, the head of a plush toy differs little from the back of a human head, so existing human head detection technology can easily identify the head of the plush toy as a human head. Existing human head detection schemes are therefore prone to false detections, which makes the crowd counting error large.
Fig. 1 is a schematic flow diagram of a model training method according to an embodiment of the present invention, and as shown in fig. 1, the model training method implemented in the embodiment mainly includes:
Step 101, obtaining a sample data set.
Wherein, the head and the five sense organs of the person in the sample picture are marked with a detection frame.
In practical application, those skilled in the art can select the five sense organs to be detected according to actual needs, as long as the following holds: regardless of the shooting angle, the image of the head of a real person includes at least one of the sense organs in the selected set. Preferably, in order to effectively screen out false detection results of human head detection, the five sense organs to be detected may include: left eye, right eye, left ear, right ear, and mouth. In this way, falsely detected heads can be screened out based on the five sense organs, which improves the detection accuracy.
In this step, the head and the preset five sense organs in each sample picture need to be marked with the detection frame identifiers, so that when the model is trained, the model parameters are adjusted based on the detection frame identifiers in the sample pictures and the detection results output by the model.
Step 102, training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model.
Based on the sample pictures in the sample data set, training the crowd detection model can be specifically realized by adopting the following steps:
detecting the head of the person and the five sense organs in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating an attention feature vector for each head using a heuristic attention weighting network based on the head candidate detection box and the facial feature candidate detection box for the respective head;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frame identified in the sample picture.
Here, the specific method for adjusting the parameters of the crowd detection model according to the recognition result and the detection frame identified in the sample picture is known to those skilled in the art, and is not described herein again.
In the above training method, both the head of the person in the sample picture and the five sense organs in that head are detected. By using the detection results of the five sense organs together with the heuristic attention weighting network, objects misjudged as human heads in the head detection result can be effectively screened out. This improves the accuracy of the head attention feature vector input to the classification network, hence the accuracy of the head identification result output by the classification network, and in turn the detection accuracy of the crowd detection model.
In addition, because the training method introduces the heuristic attention weighting mechanism to improve classification accuracy, once the attention feature vector of a head has been generated it only needs to be input into the classification network of the model to identify the authenticity of the head. Unlike existing head detection methods, the detection frame does not need to be fine-tuned through regression processing to improve detection accuracy, so the detection speed can be effectively improved compared with existing head detection methods.
The classification network is used for identifying the authenticity of the corresponding head based on the attention feature vector of each head. It can be implemented by using an existing classifier; for example, it may include two fully connected layers and a softmax activation function, but it is not limited to this and may also be implemented with one fully connected layer or with multiple fully connected layers.
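As an illustration of the classifier described above, the following is a minimal NumPy sketch of two fully connected layers followed by softmax. The layer sizes, the ReLU between the layers, the random weights and the function name `classify_head` are assumptions made for this example only; the source does not specify them.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def classify_head(attention_vec, w1, b1, w2, b2):
    """Return [P(not a real head), P(real head)] for one attention feature vector."""
    hidden = np.maximum(0.0, attention_vec @ w1 + b1)  # first FC layer; ReLU is an assumption
    logits = hidden @ w2 + b2                          # second FC layer, two output classes
    return softmax(logits)

# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
dim, hidden_dim = 8, 4
w1, b1 = rng.normal(size=(dim, hidden_dim)), np.zeros(hidden_dim)
w2, b2 = rng.normal(size=(hidden_dim, 2)), np.zeros(2)

probs = classify_head(rng.normal(size=dim), w1, b1, w2, b2)
print(probs)
```

The second probability could serve as the confidence of the head, although the source leaves the confidence calculation to existing methods.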
In one embodiment, in the above model training method, the following method may be adopted to detect the head and the five sense organs in the sample picture, and obtain the head candidate detection frame and the five sense organ candidate detection frame:
Step a1, detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame.
In this step, the detection frame of each head detected in the picture by the head detection model is used as a head candidate detection frame, so that the authenticity of the corresponding head can be verified in the subsequent steps.
Here, the head detection model is a model for detecting the head of a person in a picture.
Step a2, obtaining a subgraph of the corresponding head based on the head candidate detection frame.
In this step, the image inside the head candidate detection frame is used as the subgraph of the corresponding head, so that the five sense organs can be identified based on the subgraph in the subsequent step and a sub-region feature map of each sense organ can be obtained.
Step a3, detecting each of the five sense organs in the subgraph by using a five sense organs detection model to obtain the five sense organs candidate detection frame.
In this step, each preset sense organ in the head subgraph is detected, and each detected detection frame is used as the candidate detection frame of the corresponding sense organ. For example, if the five sense organs to be detected include the left eye, the right eye, the left ear, the right ear and the mouth, this step needs to detect these five sense organs in the subgraph to obtain the detection frame for the left eye, the detection frame for the right eye, the detection frame for the left ear, the detection frame for the right ear and the detection frame for the mouth.
It should be noted that, due to different shooting angles in practical applications, there is a possibility that an image of all the preset five sense organs cannot be included in one head sub-image, that is, there may be no detection frame for some preset five sense organs in the sub-image.
In the above method, both the head detection model and the five sense organs detection model may be implemented by using an existing target detection method, for example, by using a region proposal network (RPN).
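Steps a1 to a3 can be sketched as a two-stage cascade: detect heads, crop each head subgraph, then detect sense organs inside the crop. In the sketch below the two detectors are hypothetical stand-ins that return fixed boxes. The `(x1, y1, x2, y2)` box format and the mapping of organ boxes back to picture coordinates are likewise assumptions, not details given in the source.

```python
import numpy as np

def crop_subgraph(picture, box):
    """Return the image region inside box = (x1, y1, x2, y2), clipped to the picture."""
    height, width = picture.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    return picture[y1:y2, x1:x2], (x1, y1)

def to_picture_coords(organ_boxes, offset):
    """Shift organ boxes from subgraph coordinates to picture coordinates."""
    ox, oy = offset
    return {name: (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
            for name, (x1, y1, x2, y2) in organ_boxes.items()}

def detect_candidates(picture, head_detector, organ_detector):
    results = []
    for head_box in head_detector(picture):                    # step a1
        subgraph, offset = crop_subgraph(picture, head_box)    # step a2
        organ_boxes = organ_detector(subgraph)                 # step a3
        results.append((head_box, to_picture_coords(organ_boxes, offset)))
    return results

# Hypothetical stand-in detectors; a real system would use trained models.
def fake_head_detector(picture):
    return [(10, 20, 50, 60)]

def fake_organ_detector(subgraph):
    # Only two organs are visible here; missing organs are simply absent.
    return {"left_eye": (5, 5, 15, 12), "mouth": (10, 25, 30, 35)}

picture = np.zeros((100, 100, 3), dtype=np.uint8)
candidates = detect_candidates(picture, fake_head_detector, fake_organ_detector)
print(candidates)
```

Representing the organ boxes as a dictionary keyed by organ name makes the case of an undetected organ explicit: the key is simply missing.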
In one embodiment, in the above model training method, for each detected head, generating the attention feature vector of the head using a heuristic attention weighting network may employ the following method:
Step b1, extracting a corresponding head sub-region feature matrix based on the first head candidate detection box by using a first region of interest extraction layer (ROI Pooling) of the heuristic attention weighting network.
In this step, for each head candidate detection box detected in step a1, a corresponding head sub-region feature matrix (i.e., a head sub-region feature map) is extracted based on the head candidate detection box, so as to obtain a head average feature vector of the corresponding head. The first head candidate detection box represents any one of the head candidate detection boxes detected in step a1.
Step b2, utilizing a first Global Pooling layer (Global Pooling) of the heuristic attention weighting network to perform Global average sampling on the head sub-region feature matrix to obtain a corresponding head average feature vector.
Step b3, extracting a corresponding feature matrix of the sub-region of the five sense organs based on each of the candidate detection boxes of the five sense organs in the first candidate detection box of the head by using a second region of interest extraction layer of the heuristic attention weighting network.
In this step, the feature matrices of the five sense organs sub-regions of the head are extracted within the first head candidate detection frame. If a given sense organ has no candidate detection frame, the corresponding sub-region feature matrix does not exist.
Step b4, utilizing the second global pooling layer of the heuristic attention weighting network to perform average sampling on the feature matrix of each facial feature subregion so as to obtain an average feature vector of the corresponding facial feature.
In this step, the sub-region feature matrix of each sense organ in the first head candidate detection frame is average-sampled to obtain the average feature vector of the corresponding sense organ, so that attention weighting can be performed based on the average feature vectors to screen out the features of falsely detected heads.
Step b5, calculating an attention weight vector of each of the five sense organs in the head corresponding to the first head candidate detection box based on the head average feature vector and the average feature vector of the corresponding five sense organs.
In one embodiment, the attention weight vector of each of the five sense organs in the respective head may be calculated according to

$$
w_i =
\begin{cases}
m_i \odot h, & \text{if } m_i \text{ exists} \\
\mathbf{0}, & \text{otherwise}
\end{cases}
$$

where $w_i$ represents the attention weight vector of sense organ $i$, $m_i$ represents the average feature vector of sense organ $i$, $h$ represents the head average feature vector, $\odot$ denotes point multiplication, and $w_i$ has the same dimension as $h$.
In the above calculation method, if the average feature vector of a sense organ exists, it is point-multiplied with the head average feature vector to obtain the corresponding attention weight vector. If the average feature vector of a sense organ does not exist, the corresponding attention weight vector is zero. In this way, for an object erroneously detected as a human head, the attention weight vectors of all of the five sense organs are zero, because no detection frame of any of the five sense organs is detected in its subgraph.
Step b6, performing point multiplication on the head average feature vector and the attention weight vector of each corresponding five sense organs respectively, and summing the result of the point multiplication to obtain the attention feature vector of the head corresponding to the first head candidate detection box.
Here, as described in the above step, the attention weight vectors of the five sense organs of an object similar to a human head will be zero, and point-multiplying a zero vector with the head average feature vector yields a zero vector. The attention feature vector of such an object is therefore a zero vector, which increases the difference between a human head and other objects of similar shape, so that objects falsely detected as human heads can be effectively screened out by using step b6.
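Steps b1 to b6 can be sketched in NumPy as follows. The sketch reads "point multiplication" as an element-wise product, which fits the statement that the weight vector has the same dimension as the head vector. It also replaces ROI Pooling with a plain slice of the feature map, and assumes boxes are given in feature-map coordinates as `(x1, y1, x2, y2)` and are non-empty. These are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")

def roi_average(feature_map, box):
    """Slice the sub-region (simplified ROI extraction) and global-average it."""
    x1, y1, x2, y2 = box
    region = feature_map[y1:y2, x1:x2, :]      # sub-region feature matrix
    return region.mean(axis=(0, 1))            # average feature vector, shape (C,)

def attention_feature_vector(feature_map, head_box, organ_boxes):
    h = roi_average(feature_map, head_box)     # steps b1 and b2
    attention = np.zeros_like(h)
    for name in ORGANS:
        box = organ_boxes.get(name)
        if box is None:
            w = np.zeros_like(h)               # organ not detected: zero weight
        else:
            m = roi_average(feature_map, box)  # steps b3 and b4
            w = m * h                          # step b5: element-wise product
        attention += h * w                     # step b6: weight and sum
    return attention

rng = np.random.default_rng(0)
feature_map = rng.uniform(0.1, 1.0, size=(16, 16, 8))  # hypothetical H x W x C map
head_box = (2, 2, 14, 14)

real = attention_feature_vector(
    feature_map, head_box, {"left_eye": (4, 5, 7, 7), "mouth": (6, 10, 10, 12)})
fake = attention_feature_vector(feature_map, head_box, {})  # no organs detected

print(real)
print(fake)
```

With no detected sense organs every weight vector is zero, so `fake` is a zero vector, which is the behavior the text relies on to separate head-like objects from real heads.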
In one embodiment, in the above model training method, when adjusting parameters of the crowd detection model according to the result output by the classification network, parameters of the head detection model, the five sense organs detection model, the heuristic attention weighting network, and the classification network in the model are specifically optimized and adjusted. The specific adjustment method is known to those skilled in the art and will not be described herein.
In one embodiment, the heuristic attention weighting network and the classification network may be trained by optimizing a cross-entropy loss function with stochastic gradient descent, but the training is not limited thereto.
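As a rough illustration of cross-entropy training with stochastic gradient descent, the sketch below updates a single softmax layer on synthetic feature vectors. It is a simplified stand-in: the synthetic data, the learning rate, the batch size and the use of one linear layer are all assumptions, and a real system would also backpropagate through the attention network, typically with a deep learning framework.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class."""
    n = labels.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

def sgd_step(weights, bias, x, labels, lr=0.1):
    """One SGD update of a softmax layer on a mini-batch."""
    probs = softmax(x @ weights + bias)
    grad_logits = probs.copy()
    grad_logits[np.arange(len(labels)), labels] -= 1.0  # d(loss)/d(logits)
    grad_logits /= len(labels)
    weights = weights - lr * (x.T @ grad_logits)
    bias = bias - lr * grad_logits.sum(axis=0)
    return weights, bias

# Synthetic stand-in for attention feature vectors and real/fake labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
labels = (x[:, 0] > 0).astype(int)

weights, bias = np.zeros((8, 2)), np.zeros(2)
loss_before = cross_entropy(softmax(x @ weights + bias), labels)
for _ in range(200):
    idx = rng.choice(64, size=16, replace=False)         # random mini-batch
    weights, bias = sgd_step(weights, bias, x[idx], labels[idx])
loss_after = cross_entropy(softmax(x @ weights + bias), labels)

print(loss_before, loss_after)
```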
To facilitate a clear understanding of the crowd detection model structure provided by the embodiment of the invention, Fig. 2 shows a schematic diagram of a crowd detection model network structure obtained based on the above model training method. As shown in fig. 2, the model includes a head detection model, a five sense organs detection model, a heuristic attention weighting network, and a classification network. In the network structure example, the head detection model and the five sense organs detection model are both implemented by using RPN.
Based on the above embodiment of the model training method, an embodiment of the present invention further provides a crowd counting method. As shown in fig. 3, the crowd counting method includes:
Step 301, obtaining a target detection picture.
Step 302, detecting the heads of persons in the target detection picture based on a crowd detection model, and counting the detected heads to obtain the number of persons in the target detection picture.
Wherein the confidence of each counted head is greater than a preset threshold; the crowd detection model is obtained in advance by using the above embodiment of the crowd detection model training method.
In step 302, the heads detected by the crowd detection model are counted according to their confidence, that is, only heads whose confidence is greater than the preset threshold are counted. The confidence of a detection result can be calculated by using an existing method.
As described in the above analysis, because the crowd detection model used in this step introduces five sense organs detection and combines it with the heuristic attention mechanism, false detection results of human head detection can be effectively screened out, which ensures the detection accuracy of the crowd detection model. Therefore, in step 302, the crowd detection model trained according to the first embodiment of the present invention is used to detect the heads of persons in the target detection picture, and the number is counted according to the detection result, so that the accuracy of crowd counting can be improved.
Here, the threshold value, which is a constraint condition for limiting the head of the detected person to participate in counting, may be set to an appropriate value by those skilled in the art.
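The counting rule in step 302 can be sketched as a simple filter over the detections. The `(box, confidence)` tuple format, the example values and the default threshold of 0.5 are assumptions made for illustration; the source leaves the threshold to those skilled in the art.

```python
def count_heads(detections, threshold=0.5):
    """Count detections whose confidence is strictly greater than the threshold."""
    return sum(1 for _, confidence in detections if confidence > threshold)

# Hypothetical model output: (head box, confidence that the head is real).
detections = [
    ((10, 20, 50, 60), 0.93),
    ((70, 15, 110, 58), 0.41),   # below the threshold, not counted
    ((130, 22, 170, 64), 0.77),
]

count = count_heads(detections, threshold=0.5)
print(count)
```

A higher threshold trades missed heads for fewer false detections, so a suitable value would normally be chosen on validation data.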
Corresponding to the above embodiment of the model training method, an embodiment of the present invention further provides a model training apparatus. As shown in fig. 4, the apparatus includes:
a sample data obtaining module 401, configured to obtain a sample data set; wherein the head and the five sense organs of each person in the sample pictures of the sample data set are marked with detection frames;
a model training module 402, configured to train a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
detecting the head and the five sense organs of the person in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating an attention feature vector of the corresponding head by utilizing a heuristic attention weighting network based on the head candidate detection frame and the five sense organs candidate detection frame;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture.
Corresponding to the above embodiment of the crowd counting method, an embodiment of the present invention further provides a crowd counting apparatus. As shown in fig. 5, the crowd counting apparatus includes:
a detection target acquisition module 501, configured to acquire a target detection picture;
a head detection module 502, configured to detect the heads in the target detection picture based on a crowd detection model, and count the detected heads to obtain the number of persons in the target detection picture; wherein the confidence of each counted head is greater than a preset threshold; the crowd detection model is obtained by training with the crowd detection model training method described above.
It can be seen from the above embodiments that, in the process of training the crowd detection model with the sample pictures, the model training method introduces five sense organs detection on the basis of head detection, and comprehensively processes the results of the head detection and the five sense organs detection with a heuristic attention weighting mechanism to generate the attention feature vector of each detected head. By using the result of the five sense organs detection together with the heuristic attention weighting mechanism, the distinction between a human head and other objects of similar shape can be enhanced, so that the accuracy of the attention feature vector input to the classification network can be improved, false detections among the head detection results can be screened out, and the detection accuracy of the trained crowd detection model can be improved. Accordingly, the accuracy of crowd counting using the crowd detection model is improved.
The crowd detection model provided by the embodiment of the invention can effectively overcome the influence of the shooting angle on the detection accuracy, so that the crowd counting method based on the crowd detection model is applicable to a wider range of scenes, such as densely crowded scenes, sparsely crowded scenes, and scenes with diverse shooting angles.
Corresponding to the embodiment of the crowd detection model training method, the embodiment of the invention also provides crowd detection model training equipment, which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method as described above.
Embodiments of the present invention also provide a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are used for executing the crowd detection model training method described above.
Corresponding to the embodiment of the crowd counting method, the embodiment of the invention also provides crowd counting equipment, which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method as described above.
Embodiments of the present invention also provide a computer-readable storage medium having stored therein computer-readable instructions for performing the crowd counting method as described above.
In the above embodiments, the memory may be implemented as any of various storage media, such as an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or a Programmable Read Only Memory (PROM). The processor may be implemented to include one or more central processors or one or more field programmable gate arrays, wherein the field programmable gate arrays integrate one or more central processor cores. In particular, the central processor or central processor core may be implemented as a CPU or MCU.
It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division into modules is merely a functional division adopted for convenience of description; in actual implementation, one module may be divided into multiple modules, the functions of multiple modules may be implemented by the same module, and these modules may be located in the same device or in different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include specially designed permanent circuits or logic devices (e.g., a special purpose processor such as an FPGA or ASIC) for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general-purpose processor or other programmable processor) that are temporarily configured by software to perform certain operations. Whether to implement a hardware module mechanically, in a dedicated permanent circuit, or in a temporarily configured circuit (e.g., configured by software) may be determined based on cost and time considerations.
Examples of the storage medium for supplying the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-Rs, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD+RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud over a communication network.
Herein, "exemplary" means "serving as an example, instance, or illustration", and any illustration, embodiment, or step described as "exemplary" herein should not be construed as a preferred or advantageous alternative. For the sake of simplicity, the drawings only schematically show the parts relevant to the invention, and do not represent the actual structure of the product. In addition, to keep the drawings concise and understandable, where several components in a drawing have the same structure or function, only one of them is schematically illustrated or labeled. In this document, "a" does not limit the number of the relevant parts of the present invention to "only one", nor does it exclude the case of "more than one". In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used only to indicate relative positional relationships between relevant parts, and do not limit the absolute positions of those parts.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method for training a crowd detection model, the method comprising:
acquiring a sample data set; wherein the head and the five sense organs of each person in the sample pictures of the sample data set are marked with detection frames;
training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
detecting the head and the five sense organs of the person in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating an attention feature vector of the corresponding head by utilizing a heuristic attention weighting network based on the head candidate detection frame and the five sense organs candidate detection frame;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture.
2. The method of claim 1, wherein the detecting the head and the five sense organs of the person in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame comprises:
detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame;
obtaining a subgraph of the corresponding head based on the head candidate detection frame;
and detecting each five sense organs in the subgraph by using a five sense organs detection model to obtain the five sense organs candidate detection frame.
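The subgraph step of claim 2 can be sketched as a crop of the picture by the head candidate detection frame, with the sense organ frames detected inside the crop mapped back to picture coordinates. This is only an illustration under assumptions: the `(x1, y1, x2, y2)` frame layout, the clipping to the picture border, and the helper names `crop_subgraph` and `to_picture_coords` are hypothetical and not taken from the claims.

```python
import numpy as np

def crop_subgraph(picture, box):
    """Crop the region inside an (x1, y1, x2, y2) head candidate frame.

    The frame is clipped to the picture border (an assumption). Returns the
    cropped subgraph and the (x, y) offset of its top-left corner.
    """
    h, w = picture.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    return picture[y1:y2, x1:x2], (x1, y1)

def to_picture_coords(organ_box, offset):
    """Shift a frame detected inside the subgraph back into picture coordinates."""
    ox, oy = offset
    x1, y1, x2, y2 = organ_box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

# Hypothetical 100x200 picture and a head frame that extends past its border.
picture = np.zeros((100, 200, 3), dtype=np.uint8)
sub, offset = crop_subgraph(picture, (150, 60, 230, 120))
print(sub.shape)
print(to_picture_coords((5, 10, 15, 20), offset))
```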
3. The method of claim 1, wherein the generating the attention feature vector for the respective head comprises:
extracting a corresponding head sub-region feature matrix based on a first head candidate detection frame by utilizing a first region-of-interest extraction layer of the heuristic attention weighting network;
performing global average sampling on the head sub-region feature matrix by utilizing a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector;
extracting a corresponding five sense organs sub-region feature matrix based on each five sense organs candidate detection frame by utilizing a second region-of-interest extraction layer of the heuristic attention weighting network;
performing average sampling on each five sense organs sub-region feature matrix by utilizing a second global pooling layer of the heuristic attention weighting network to obtain an average feature vector of the corresponding five sense organs;
calculating an attention weight vector for each of the five sense organs in the respective head based on the head average feature vector and the average feature vector of the respective five sense organs;
and performing point multiplication on the head average feature vector and the attention weight vector of each corresponding five sense organs respectively, and summing the results of the point multiplication to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.
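The pooling and summation steps of claim 3 can be sketched with NumPy. The sketch assumes that a sub-region feature matrix has shape `(C, H, W)`, that the global average sampling is ordinary global average pooling over `H` and `W`, and that "point multiplication" is an element-wise product, since its result is again called a vector. The function names and the weight values in the example are illustrative only.

```python
import numpy as np

def global_average_pool(feature_matrix):
    """Average a (C, H, W) sub-region feature matrix over H and W, giving a length-C vector."""
    return feature_matrix.mean(axis=(1, 2))

def attention_feature_vector(head_avg, weight_vectors):
    """Sum the element-wise products of the head average vector with each weight vector."""
    total = np.zeros_like(head_avg)
    for w in weight_vectors:
        total = total + head_avg * w
    return total

# Hypothetical head sub-region features with 2 channels on a 2x2 grid.
head_matrix = np.arange(8, dtype=float).reshape(2, 2, 2)
head_avg = global_average_pool(head_matrix)

# Illustrative attention weight vectors for two sense organs.
weights = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
print(head_avg)
print(attention_feature_vector(head_avg, weights))
```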
4. The method of claim 3, wherein said calculating an attention weight vector for each of said five sense organs in the respective head comprises:
if an average feature vector exists for the sense organ, performing point multiplication on the average feature vector of the corresponding sense organ and the head average feature vector to obtain the corresponding attention weight vector;
if no average feature vector exists for the sense organ, the corresponding attention weight vector is zero.
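A minimal sketch of the weight rule in claim 4 follows. It assumes the point multiplication is element-wise and represents an undetected sense organ by a missing dictionary entry; the organ names mirror the list in claim 6, while the dictionary layout and the function name `attention_weights` are hypothetical.

```python
import numpy as np

# Sense organ names following claim 6; the string keys are an assumption.
ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")

def attention_weights(head_avg, organ_avgs):
    """Return one attention weight vector per sense organ.

    organ_avgs maps an organ name to its average feature vector; an organ
    that was not detected is simply absent and receives a zero vector.
    """
    weights = {}
    for name in ORGANS:
        organ_avg = organ_avgs.get(name)
        if organ_avg is None:
            weights[name] = np.zeros_like(head_avg)
        else:
            weights[name] = organ_avg * head_avg
    return weights

# Hypothetical vectors: only the left eye and the mouth were detected.
head_avg = np.array([1.0, 2.0])
organ_avgs = {"left_eye": np.array([3.0, 0.5]), "mouth": np.array([0.0, 1.0])}
w = attention_weights(head_avg, organ_avgs)
print(w["left_eye"], w["right_ear"], w["mouth"])
```

Because a missing organ contributes a zero weight vector, it adds nothing when the weighted head vectors are summed, so a candidate with few detected sense organs yields a weaker attention feature vector.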
5. The method of claim 2, wherein the adjusting the parameters of the crowd detection model comprises:
adjusting parameters in the head detection model, the five sense organs detection model, the heuristic attention weighting network, and the classification network.
6. The method of claim 1, wherein the five sense organs comprise:
left eye, right eye, left ear, right ear, and mouth.
7. A crowd counting method, comprising:
acquiring a target detection picture;
detecting the heads in the target detection picture based on a crowd detection model, and counting the detected heads to obtain the number of persons in the target detection picture;
wherein the confidence of each counted head is greater than a preset threshold; the crowd detection model is trained in advance by the method of any one of claims 1 to 6.
8. A crowd detection model training device, comprising:
the sample data acquisition module is used for acquiring a sample data set; wherein the head and the five sense organs of each person in the sample pictures of the sample data set are marked with detection frames;
the model training module is used for training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
detecting the head and the five sense organs of the person in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating an attention feature vector of the corresponding head by utilizing a heuristic attention weighting network based on the head candidate detection frame and the five sense organs candidate detection frame;
identifying the authenticity of the corresponding head by utilizing a classification network based on the attention feature vector; and adjusting parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture.
9. A people counting device, comprising:
the detection target acquisition module is used for acquiring a target detection picture;
the head detection module is used for detecting the heads in the target detection picture based on a crowd detection model and counting the detected heads to obtain the number of persons in the target detection picture; wherein the confidence of each counted head is greater than a preset threshold; the crowd detection model is trained by the method of any one of claims 1 to 6.
10. A crowd detection model training apparatus comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 6.
11. A computer-readable storage medium having computer-readable instructions stored thereon for performing the crowd detection model training method of any one of claims 1 to 6.
12. A crowd counting device comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method of claim 7.
13. A computer-readable storage medium having computer-readable instructions stored thereon for performing the crowd counting method of claim 7.
CN202110067279.5A 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device Active CN113822111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067279.5A CN113822111B (en) 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device


Publications (2)

Publication Number Publication Date
CN113822111A (en) 2021-12-21
CN113822111B (en) 2024-05-24

Family

ID=78912375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067279.5A Active CN113822111B (en) 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device

Country Status (1)

Country Link
CN (1) CN113822111B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140139633A1 (en) * 2012-11-21 2014-05-22 Pelco, Inc. Method and System for Counting People Using Depth Sensor
CN105303193A (en) * 2015-09-21 2016-02-03 重庆邮电大学 People counting system for processing single-frame image
WO2016183766A1 (en) * 2015-05-18 2016-11-24 Xiaogang Wang Method and apparatus for generating predictive models
CN106960195A (en) * 2017-03-27 2017-07-18 深圳市丰巨泰科电子有限公司 A kind of people counting method and device based on deep learning
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108985256A (en) * 2018-08-01 2018-12-11 曜科智能科技(上海)有限公司 Based on the multiple neural network demographic method of scene Density Distribution, system, medium, terminal
US20190251333A1 (en) * 2017-06-02 2019-08-15 Tencent Technology (Shenzhen) Company Limited Face detection training method and apparatus, and electronic device
CN111046747A (en) * 2019-11-21 2020-04-21 北京金山云网络技术有限公司 Crowd counting model training method, crowd counting method, device and server
WO2020207038A1 (en) * 2019-04-12 2020-10-15 深圳壹账通智能科技有限公司 People counting method, apparatus, and device based on facial recognition, and storage medium
CN111950507A (en) * 2020-08-25 2020-11-17 北京猎户星空科技有限公司 Data processing and model training method, device, equipment and medium
CN112232140A (en) * 2020-09-25 2021-01-15 浙江远传信息技术股份有限公司 Crowd counting method and device, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN113822111B (en) 2024-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant