CN112101122B - Weak supervision object number estimation method based on sorting network - Google Patents
Weak supervision object number estimation method based on sorting network
- Publication number
- CN112101122B (application CN202010845336.3A)
- Authority
- CN
- China
- Prior art keywords
- network
- ordering
- layer
- image
- objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a sorting (ranking) network. The method trains the model without relying on object position annotations, which saves human labor and improves the generality of the model. It comprises the following steps: extracting image features with a deep neural network and obtaining pyramid feature vectors with an adaptive pooling layer; regressing the number of objects with a fully connected layer; and training the model with a multi-branch ranking network, in which a Sinkhorn layer transforms the ranking result into an ordinal matrix and the loss is calculated using a soft-label transport matrix as the ground truth.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a sorting (ranking) network.
Background
Counting key objects such as people and vehicles through cameras in public places has important research value. For example, counts of people in a waiting hall and estimates of the number of vehicles at a traffic intersection can be used to optimize the dispatching of public transport; a sudden change in the number of people in an area may be either the cause or the result of an incident. Object number estimation in images and video is therefore valuable in the intelligent security field and is an important research topic in computer vision and intelligent video surveillance.
At present, object number estimation methods can be roughly divided into three types. 1) Object detection: this approach is relatively direct. In scenes with sparse objects, the number of objects is obtained by detecting the objects in the image, but the approach is not effective when objects are crowded. 2) Clustering of visual feature trajectories: for video surveillance, trajectories are generally obtained with a KLT tracker and then clustered, and the number of clusters is taken as the number of objects. 3) Feature-based regression: a regression model is built between image features and the number of objects in the image, and the number of objects in the scene is estimated by measuring the image features. The direct methods are easily affected by difficulties such as occlusion under crowded conditions, whereas the indirect methods can count objects at large scale from the overall characteristics of the object group.
Existing feature-regression algorithms have the following drawback. Annotating object locations is often expensive. Existing object number estimation datasets provide the location of every object in order to train a number regression network, yet in the evaluation phase these location labels are not considered, and only the accuracy of the estimated number of objects is evaluated. In fact, the locations are not needed: it is sufficient to annotate only the number of objects in each image and to train the object number estimation model with a more efficient weakly supervised method.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a weakly supervised object number estimation method based on a ranking network, which does not need object position annotations, saves human labor, and improves the accuracy of object number estimation.
The weakly supervised object number estimation method based on a ranking network of the invention comprises the following steps:
S1, extracting image features using a pre-trained deep neural network such as VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using an adaptive pooling layer to capture global and local information in the image, and inputting these features into a fully connected layer to regress the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer.
S2, learning the multi-scale features using an image object number ranking network, so that the multi-scale features are sensitive to the number of objects. The ranking network here is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the result of ranking the images by the number of objects they contain.
S3, using a Sinkhorn layer in the ranking network to turn the ranking features into an ordinal matrix, constructing a soft-label transport matrix using the true number of objects in each image, and training the ranking network with a cross-entropy loss to obtain features sensitive to the number of objects; then training a regression network to finally obtain the object number regression model.
In the method of the invention, the specific operations of step S1 are as follows: extracting image features using a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing the global sub-cluster layer from pooling layers with larger strides to extract global features from the density map, and constructing the local sub-cluster layer from pooling layers with smaller strides to extract local features from the density map.
In the method of the invention, the specific operations of step S2 are as follows: fine-tuning the feature extraction model using a multi-branch ranking network to obtain global and local features for the number of objects in the image.
In the method of the invention, the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to turn the ranking features into an ordinal matrix; constructing a more efficient soft-label transport matrix to train the ranking network; training the ranking network with a cross-entropy loss and the regression network with the mean squared error.
The beneficial effects of the invention are as follows. Through the relative relation between the numbers of objects in different images, the ranking network learns multi-scale features that are sensitive to the number of objects. These features are fed into the regression network, which avoids using object position information and removes the need for extensive manual annotation of object positions. The Sinkhorn layer is differentiable, so the network can be trained end to end. The soft-label transport matrix is constructed from the relative relation between the numbers of objects in the images, so it effectively reflects the complexity of the ranking task and improves the accuracy of object number estimation.
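As an illustration of why a Sinkhorn layer can sit inside an end-to-end trainable network, the following minimal Python sketch applies Sinkhorn normalization (alternating row and column normalization) to a score matrix. It is a sketch under stated assumptions, not the patented implementation: the layout of the scores as a square K x K matrix, the function name `sinkhorn`, the iteration count, and the example scores are all hypothetical.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Turn a square score matrix into an approximately doubly stochastic one.

    Assumption: `scores` is a K x K list of lists of real numbers; the source
    text does not spell out this layout or the number of iterations.
    """
    # Exponentiate so that every entry is strictly positive.
    matrix = [[math.exp(v) for v in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        matrix = [[v / sum(row) for v in row] for row in matrix]
        # Column normalization: each column sums to 1.
        col_sums = [sum(col) for col in zip(*matrix)]
        matrix = [[v / col_sums[j] for j, v in enumerate(row)] for row in matrix]
    return matrix


# Hypothetical scores for K = 3 images.
example_scores = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 1.8]]
P = sinkhorn(example_scores)
```

Each step consists only of exponentiation and division, so in an automatic differentiation framework gradients could flow through the normalization back to the scores.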
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to examples. The following examples illustrate the present invention but are not intended to limit its scope.
Examples
S1, extracting image features using a pre-trained deep neural network such as VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using a plurality of pooling layers to capture global and local information in the image, and inputting these features into a fully connected layer to regress the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer. The global sub-cluster layer uses three max-pooling layers with pooling strides of 8, 16 and 32 respectively; the local sub-cluster layer uses two average-pooling layers with pooling strides of 1 and 2;
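The multi-scale pooling of step S1 can be sketched in plain Python as follows. The strides (8, 16, 32 for max pooling; 1, 2 for average pooling) come from the text above. Everything else is an assumption made for illustration: the window sizes (equal to the stride for the global branch, 2 x 2 for the local branch), the 32 x 32 toy density map, and the helper names `pool2d` and `multi_scale_features`.

```python
def pool2d(grid, size, stride, reduce_fn):
    """Slide a size x size window over a 2-D list and reduce each window."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


def multi_scale_features(density_map):
    """Concatenate global (max-pooled) and local (average-pooled) features."""
    features = []
    # Global branch: max pooling with strides 8, 16 and 32.
    # Assumption: the window size equals the stride.
    for stride in (8, 16, 32):
        pooled = pool2d(density_map, stride, stride, max)
        features.extend(v for row in pooled for v in row)
    # Local branch: average pooling with strides 1 and 2.
    # Assumption: a 2 x 2 window.
    for stride in (1, 2):
        pooled = pool2d(density_map, 2, stride, mean)
        features.extend(v for row in pooled for v in row)
    return features


# Toy 32 x 32 density map filled with ones (hypothetical input).
toy_map = [[1.0] * 32 for _ in range(32)]
features = multi_scale_features(toy_map)
```

In this toy setting the concatenated vector would be the input to the fully connected regression layer; a real model would operate on the density map produced by the convolutional backbone.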
S2, learning the multi-scale features using an image object number ranking network, so that the multi-scale features are sensitive to the number of objects. The ranking network here is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the result of ranking the images by the number of objects they contain. Specifically, a K-branch network can be adopted to extract the multi-scale features $f_1, f_2, f_3, \dots, f_K$ of K images; the pairwise differences $f_1 - f_2, f_1 - f_3, \dots, f_1 - f_K, f_2 - f_3, \dots, f_2 - f_K, \dots, f_{K-1} - f_K$ are then computed and input into the ranking network to obtain a $K(K-1)$-dimensional ranking vector $f_d$;
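The pairwise feature differences described in step S2 can be enumerated as in the sketch below. The function name `pairwise_differences` and the toy 2-dimensional features are hypothetical; the sketch only covers the difference computation, not the ranking network that consumes the differences.

```python
from itertools import combinations


def pairwise_differences(features):
    """Return f_i - f_j for every pair i < j, in the order f_1-f_2, f_1-f_3, ..."""
    diffs = []
    for i, j in combinations(range(len(features)), 2):
        diffs.append([a - b for a, b in zip(features[i], features[j])])
    return diffs


# Hypothetical 2-dimensional features for K = 4 images.
feats = [[4.0, 1.0], [3.0, 0.5], [1.0, 0.0], [0.5, 2.0]]
diffs = pairwise_differences(feats)
```

For K images this enumeration yields K(K-1)/2 difference vectors, here 6 for K = 4.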
S3, using a Sinkhorn layer in the ranking network to turn the ranking feature $f_d$ into an ordinal matrix $P$, in which the element $P_{i,j}$ in the $i$-th row and $j$-th column is the probability that the $i$-th image is ranked in the $j$-th position; constructing a soft-label transport matrix using the true number of objects in each image.
The true ranking of the images is denoted by $\sigma$, whose $i$-th element $\sigma(i)$ indicates that the $i$-th image is ranked at position $\sigma(i)$. The elements of the soft-label matrix are calculated by a formula that depends on a predefined threshold $\Delta_{thr}$ [formula not reproduced in this text].
The ranking network is trained with a cross-entropy loss [formula not reproduced in this text], resulting in features that are sensitive to the number of objects.
A regression network is then trained with the mean squared error loss, finally yielding the object number regression model.
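The two losses named in step S3 can be sketched as follows. Because the soft-label formula is not available in this text, the sketch uses a hard one-hot target built from the true ranking as a stand-in; that substitution, the ascending ranking direction, the 0-based positions, and all function names are assumptions made for illustration only.

```python
import math


def true_ranking(counts):
    """sigma[i] is the 0-based position of image i when sorted by count.

    Assumption: ascending order; the text does not state the direction.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    sigma = [0] * len(counts)
    for position, image_index in enumerate(order):
        sigma[image_index] = position
    return sigma


def hard_target(sigma):
    """One-hot target matrix, a stand-in for the soft-label matrix."""
    k = len(sigma)
    return [[1.0 if j == sigma[i] else 0.0 for j in range(k)] for i in range(k)]


def cross_entropy(P, T, eps=1e-12):
    """Mean over rows of -sum_j T[i][j] * log(P[i][j])."""
    total = 0.0
    for p_row, t_row in zip(P, T):
        total -= sum(t * math.log(p + eps) for p, t in zip(p_row, t_row))
    return total / len(P)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and true counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical true object counts for K = 3 images.
counts = [12, 3, 40]
sigma = true_ranking(counts)
T = hard_target(sigma)
```

With a uniform ordinal matrix (every entry 1/3) the cross-entropy equals log 3, and it approaches 0 as the predicted matrix approaches the target.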
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as falling within the scope of protection of the invention.
Claims (4)
1. A weakly supervised object number estimation method based on a ranking network, comprising:
S1, extracting image features using a pre-trained deep neural network VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using an adaptive pooling layer to capture global and local information in the image, and inputting these features into a fully connected layer to regress the number of objects, wherein the adaptive pooling layer comprises two types of layers, a global sub-cluster layer and a local sub-cluster layer;
S2, learning the multi-scale features using an image object number ranking network so that the multi-scale features are sensitive to the number of objects, wherein the ranking network is a multi-branch network whose input is the multi-scale features of a plurality of images and whose output is the result of ranking the images by the number of objects they contain;
S3, using a Sinkhorn layer in the ranking network to turn the ranking features into an ordinal matrix, constructing a soft-label transport matrix using the true number of objects in each image, and training the ranking network with a cross-entropy loss to obtain features sensitive to the number of objects; and training a regression network to finally obtain the object number regression model.
2. The weakly supervised object number estimation method based on a ranking network according to claim 1, wherein the specific operations of step S1 are as follows: extracting image features using a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing the global sub-cluster layer from pooling layers with larger strides and extracting global features from the density map; and constructing the local sub-cluster layer from pooling layers with smaller strides and extracting local features from the density map.
3. The weakly supervised object number estimation method based on a ranking network according to claim 1, wherein the specific operations of step S2 are as follows: fine-tuning the feature extraction model using a multi-branch ranking network to obtain global and local features for the number of objects in the image.
4. The weakly supervised object number estimation method based on a ranking network according to claim 1, wherein the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to turn the ranking features into an ordinal matrix; constructing a more efficient soft-label transport matrix and training the ranking network with a cross-entropy loss; and training the regression network with the mean squared error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010845336.3A CN112101122B (en) | 2020-08-20 | 2020-08-20 | Weak supervision object number estimation method based on sorting network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101122A (en) | 2020-12-18 |
CN112101122B (en) | 2024-02-09 |
Family
ID=73753262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010845336.3A Active CN112101122B (en) | 2020-08-20 | 2020-08-20 | Weak supervision object number estimation method based on sorting network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101122B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301387A (en) * | 2017-06-16 | 2017-10-27 | 华南理工大学 | A kind of image Dense crowd method of counting based on deep learning |
CN111428733A (en) * | 2020-03-12 | 2020-07-17 | 山东大学 | Zero sample target detection method and system based on semantic feature space conversion |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11205103B2 (en) * | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
CN110678877B (en) * | 2017-03-16 | 2022-07-26 | 西门子股份公司 | System and method for visual localization in test images |
Non-Patent Citations (1)
Title |
---|
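Weakly supervised image semantic segmentation based on deep convolutional neural networks; Zheng Baoyu; Wang Yu; Wu Jinwen; Zhou Quan; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (05); full text * |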
Also Published As
Publication number | Publication date |
---|---|
CN112101122A (en) | 2020-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||