CN112101122B - Weak supervision object number estimation method based on sorting network

Publication number
CN112101122B
Authority
CN
China
Prior art keywords
network
ordering
layer
image
objects
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010845336.3A
Other languages
Chinese (zh)
Other versions
CN112101122A (en)
Inventor
李国荣
杨一帆
黄庆明
苏荔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Application filed by University of Chinese Academy of Sciences
Priority to CN202010845336.3A
Publication of CN112101122A
Application granted
Publication of CN112101122B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a sorting network. The method does not rely on object position annotations to train the model, which saves human resources and improves the generality of the model. The method comprises the following steps: extracting image features using a deep neural network and obtaining pyramid feature vectors using an adaptive pooling layer; regressing the number of objects using a fully connected layer; and training the model using a multi-branch sorting network, in which a Sinkhorn layer transforms the sorting result into an ordinal matrix and the loss is calculated using a soft label transport matrix as the ground truth.

Description

Weak supervision object number estimation method based on sorting network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a sorting network.
Background
Counting key objects such as people and vehicles through cameras in public places has important research value. For example, people counts in a waiting hall and vehicle count estimates at a traffic intersection can be used to optimize public transport scheduling; a sudden change in the number of people in an area may be either the cause of an incident or the result of one. Object number estimation in images and video therefore has important value in the intelligent security field, and is an important research topic in computer vision and intelligent video surveillance.
At present, object number estimation methods can be roughly divided into three types. 1) Object detection: this method is relatively direct; in a scene with sparse objects, the number of objects is obtained by detecting the objects in the image, but it is not effective when objects are crowded. 2) Clustering of visual feature trajectories: for video surveillance, trajectories are generally obtained with a KLT tracker and then clustered, and the number of clusters is taken as the number of objects. 3) Feature-based regression: a regression model between image features and the number of objects in the image is built, and the number of objects in the scene is estimated by measuring the image features. Direct methods are easily affected by difficulties such as occlusion under crowded conditions, whereas indirect methods can count objects at large scale from the overall characteristics of the object group.
Existing algorithms based on feature regression suffer from the following drawback: labeling object locations is often expensive. Existing object number estimation datasets provide the location of each object in order to train a number regression network, yet in the evaluation phase these location labels are not considered and only the accuracy of the estimated object number is evaluated. In fact, object locations are not needed: only the number of objects in each image needs to be annotated, and a more efficient weakly supervised method can be used to train the object number estimation model.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a weakly supervised object number estimation method based on a sorting network, which does not need object position annotations, saves human resources and improves the accuracy of object number estimation.
The weakly supervised object number estimation method based on a sorting network according to the invention comprises the following steps:
S1, extracting image features using a pre-trained deep neural network such as VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using an adaptive pooling layer to capture global and local information in the image, and feeding these features into a fully connected layer that regresses the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer.
S2, learning the multi-scale features using an image object number sorting network, so that the multi-scale features become sensitive to the number of objects. The sorting network here is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the result of ranking the images according to the number of objects in them.
S3, using a Sinkhorn layer in the sorting network to turn the sorting features into an ordinal matrix, constructing a soft label transport matrix from the true number of objects in each image, and training the sorting network using a cross entropy loss to obtain features sensitive to the number of objects; then training a regression network to finally obtain the object number regression model.
In the weakly supervised object number estimation method based on a sorting network according to the invention, the specific operations of step S1 are as follows: extracting image features using a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing a global sub-cluster layer from pooling layers with larger strides to extract global features from the density map, and constructing a local sub-cluster layer from pooling layers with smaller strides to extract local features from the density map.
In the weakly supervised object number estimation method based on a sorting network according to the invention, the specific operations of step S2 are as follows: fine-tuning the feature extraction model using the multi-branch sorting network to obtain global and local features that reflect the number of objects in the image.
In the weakly supervised object number estimation method based on a sorting network according to the invention, the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to turn the sorting features into an ordinal matrix; constructing a more efficient soft label transport matrix to train the sorting network; training the sorting network using a cross entropy loss and training the regression network using a mean squared error loss.
The beneficial effects of the invention are as follows. The sorting network learns multi-scale features that are sensitive to the number of objects from the relative relation between the object numbers of different images; these features are used as input to the regression network, so the position information of the objects is not used and no large amount of manual labor is needed to label object positions. The Sinkhorn layer is differentiable, so the network can be trained end to end. The soft label transport matrix is constructed from the relative relation between the object numbers of the images, so that the complexity of the sorting task is effectively reflected and the accuracy of object number estimation is improved.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to examples. The following examples illustrate the present invention but are not intended to limit its scope.
Examples
S1, extracting image features using a pre-trained deep neural network such as VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using a plurality of pooling layers to capture global and local information in the image, and feeding these features into a fully connected layer that regresses the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer. The global sub-cluster layer uses three max pooling layers with pooling strides of 8, 16 and 32 respectively; the local sub-cluster layer uses two average pooling layers with pooling strides of 1 and 2 respectively.
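The pooling configuration of this example can be illustrated with a minimal pure-Python sketch. It assumes that the pooling window size equals the stride (the text gives only the strides), treats the density map as a plain 2D list, and uses hypothetical helper names (`pool2d`, `multi_scale_features`); it is an illustration under these assumptions, not the patented implementation.

```python
def pool2d(grid, stride, reduce):
    """Pool a 2D list with a square window; window size == stride (assumption)."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - stride + 1, stride):
        pooled_row = []
        for c in range(0, cols - stride + 1, stride):
            window = [grid[r + i][c + j]
                      for i in range(stride) for j in range(stride)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


def multi_scale_features(density_map):
    """Concatenate global (max, strides 8/16/32) and local (average, strides 1/2) features."""
    features = []
    for stride in (8, 16, 32):          # global sub-cluster layer
        for row in pool2d(density_map, stride, max):
            features.extend(row)
    for stride in (1, 2):               # local sub-cluster layer
        for row in pool2d(density_map, stride, mean):
            features.extend(row)
    return features


# Toy 32x32 "density map" with made-up values, only for shape checking.
density_map = [[((r * 32 + c) % 7) / 10 for c in range(32)] for r in range(32)]
features = multi_scale_features(density_map)
# 16 + 4 + 1 global values, then 1024 + 256 local values
print(len(features))
```

With a window equal to the stride, average pooling at stride 1 leaves the map unchanged, so the local branch keeps the full-resolution density values alongside a 2x downsampled copy.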
S2, learning the multi-scale features using an image object number sorting network, so that the multi-scale features become sensitive to the number of objects. The sorting network here is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the result of ranking the images according to the number of objects in them. Specifically, a K-branch network can be adopted to extract the multi-scale features $f_1, f_2, f_3, \dots, f_K$ of K images; the differences $f_1 - f_2, f_1 - f_3, \dots, f_1 - f_K, f_2 - f_3, \dots, f_2 - f_K, \dots, f_{K-1} - f_K$ are then calculated and input into the sorting network to obtain a K(K-1)-dimensional sorting vector $f_d$.
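The pairwise feature differences described in step S2 can be sketched as follows. The function name `pairwise_differences` and the toy feature values are hypothetical; the sketch only forms the differences $f_i - f_j$ for $i < j$ and does not model how the sorting network maps them to the sorting vector.

```python
from itertools import combinations


def pairwise_differences(features):
    """Return f_i - f_j for every pair i < j, in the order (1,2), (1,3), ..., (K-1,K)."""
    diffs = []
    for f_i, f_j in combinations(features, 2):
        diffs.append([a - b for a, b in zip(f_i, f_j)])
    return diffs


# Toy multi-scale features for K = 4 images (made-up values).
features = [
    [1.0, 2.0],
    [0.5, 0.5],
    [3.0, 1.0],
    [0.0, 4.0],
]
diffs = pairwise_differences(features)
print(len(diffs))   # number of pairs with i < j
print(diffs[0])     # f_1 - f_2
```

For K images this produces K(K-1)/2 difference vectors, each with the same dimension as a single feature vector.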
S3, using a Sinkhorn layer in the sorting network to turn the sorting feature $f_d$ into an ordinal matrix $P$, in which the element $P_{i,j}$ in row $i$ and column $j$ denotes the probability that the $i$-th image is ranked in the $j$-th position; constructing a soft label transport matrix from the true number of objects in each image.
The true ranking of the images is denoted by $\sigma$, whose $i$-th element $\sigma(i)$ indicates that the $i$-th image is ranked in position $\sigma(i)$. The elements of the soft label matrix are calculated as follows:
[formula not preserved in this text]
where thr is a predefined threshold. The sorting network is trained using the following cross entropy loss, yielding features that are sensitive to the number of objects:
[formula not preserved in this text]
A regression network is then trained using the mean squared error loss to finally obtain the object number regression model.
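The role of the Sinkhorn layer in step S3 can be illustrated with a minimal pure-Python sketch of Sinkhorn normalization, which alternately normalizes the rows and columns of a positive matrix until it is approximately doubly stochastic. Several points are assumptions made for illustration only: the sorting scores are taken to be already arranged as a K x K matrix, images are ranked in ascending order of object number, and a hard one-hot permutation matrix stands in for the soft label transport matrix, whose formula is not reproduced in this text. All function names are hypothetical.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Turn a K x K score matrix into an approximately doubly stochastic matrix."""
    n = len(scores)
    mat = [[math.exp(v) for v in row] for row in scores]  # make entries positive
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        mat = [[v / sum(row) for v in row] for row in mat]
        # Column normalization: each column sums to 1.
        col_sums = [sum(mat[r][c] for r in range(n)) for c in range(n)]
        mat = [[mat[r][c] / col_sums[c] for c in range(n)] for r in range(n)]
    return mat


def true_ranking(counts):
    """sigma[i] = 0-based position of image i when sorted by ascending count (assumption)."""
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    sigma = [0] * len(counts)
    for position, image_index in enumerate(order):
        sigma[image_index] = position
    return sigma


def one_hot_target(sigma):
    """Hard permutation matrix; a stand-in for the soft label matrix (assumption)."""
    n = len(sigma)
    return [[1.0 if sigma[i] == j else 0.0 for j in range(n)] for i in range(n)]


def cross_entropy(pred, target, eps=1e-12):
    """Mean over images of -sum_j target[i][j] * log(pred[i][j])."""
    n = len(pred)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total -= target[i][j] * math.log(pred[i][j] + eps)
    return total / n


# Made-up scores for K = 3 images; entry [i][j] scores image i at position j.
scores = [[2.0, 0.5, 0.1],
          [0.3, 1.5, 0.2],
          [0.1, 0.4, 1.8]]
P = sinkhorn(scores)
sigma = true_ranking([3, 12, 40])   # made-up object numbers
loss = cross_entropy(P, one_hot_target(sigma))
print(sigma, round(loss, 4))
```

Because every operation in the normalization loop is a division or a sum, the same procedure is differentiable when written with a tensor library, which is what allows a ranking loss to be trained end to end.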
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principles of the present invention, and these modifications and variations should also be regarded as falling within the scope of the invention.

Claims (4)

1. A weakly supervised object number estimation method based on a sorting network, comprising:
S1, extracting image features using a pre-trained deep neural network VGG-16, and then regressing a density map using convolution operations; extracting multi-scale features from the density map using an adaptive pooling layer to capture global and local information in the image, and feeding these features into a fully connected layer that regresses the number of objects, wherein the adaptive pooling layer comprises two types of layers, namely a global sub-cluster layer and a local sub-cluster layer;
S2, learning the multi-scale features using an image object number sorting network so that the multi-scale features become sensitive to the number of objects, wherein the sorting network is a multi-branch network whose inputs are the multi-scale features of a plurality of images and whose output is the result of ranking the images according to the number of objects in them;
S3, using a Sinkhorn layer in the sorting network to turn the sorting features into an ordinal matrix, constructing a soft label transport matrix from the true number of objects in each image, and training the sorting network using a cross entropy loss to obtain features sensitive to the number of objects; and training a regression network to finally obtain the object number regression model.
2. The weakly supervised object number estimation method based on a sorting network as set forth in claim 1, wherein the specific operations of step S1 are as follows: extracting image features using a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing a global sub-cluster layer from pooling layers with larger strides to extract global features from the density map; and constructing a local sub-cluster layer from pooling layers with smaller strides to extract local features from the density map.
3. The weakly supervised object number estimation method based on a sorting network as set forth in claim 1, wherein the specific operations of step S2 are as follows: fine-tuning the feature extraction model using the multi-branch sorting network to obtain global and local features that reflect the number of objects in the image.
4. The weakly supervised object number estimation method based on a sorting network as set forth in claim 1, wherein the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to turn the sorting features into an ordinal matrix; constructing a more efficient soft label transport matrix and training the sorting network using a cross entropy loss; and training the regression network using a mean squared error loss.
CN202010845336.3A 2020-08-20 2020-08-20 Weak supervision object number estimation method based on sorting network Active CN112101122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845336.3A CN112101122B (en) 2020-08-20 2020-08-20 Weak supervision object number estimation method based on sorting network

Publications (2)

Publication Number Publication Date
CN112101122A (en) 2020-12-18
CN112101122B (en) 2024-02-09

Family

ID=73753262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845336.3A Active CN112101122B (en) 2020-08-20 2020-08-20 Weak supervision object number estimation method based on sorting network

Country Status (1)

Country Link
CN (1) CN112101122B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN111428733A (en) * 2020-03-12 2020-07-17 山东大学 Zero sample target detection method and system based on semantic feature space conversion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
CN110678877B (en) * 2017-03-16 2022-07-26 西门子股份公司 System and method for visual localization in test images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
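Weakly supervised image semantic segmentation based on deep convolutional neural networks; 郑宝玉; 王雨; 吴锦雯; 周全; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (05); full text *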

Also Published As

Publication number Publication date
CN112101122A (en) 2020-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant