CN112101122B

CN112101122B - Weak supervision object number estimation method based on sorting network

Info

Publication number: CN112101122B
Application number: CN202010845336.3A
Authority: CN
Inventors: 李国荣; 杨一帆; 黄庆明; 苏荔
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2024-02-09
Anticipated expiration: 2040-08-20
Also published as: CN112101122A

Abstract

The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a ranking network. The method trains the model without relying on object position annotations, which saves labeling effort and improves the generality of the model. The method comprises the following steps: extracting image features with a deep neural network and obtaining pyramid feature vectors with an adaptive pooling layer; regressing the number of objects with a fully connected layer; and training the model with a multi-branch ranking network, in which a Sinkhorn layer transforms the ranking result into an ordinal matrix and the loss is computed with a soft-label transport matrix as the ground truth.

Description

Weak supervision object number estimation method based on sorting network

Technical Field

The invention relates to the technical field of computer vision, and in particular to a weakly supervised object number estimation method based on a ranking network.

Background

Counting key objects, such as people and vehicles, with cameras in public places has important research value. For example, people counts in a waiting hall and vehicle counts at a traffic intersection can be used to optimize public transport scheduling, and a sudden change in the number of people in an area may be either the cause or the result of an incident. Estimating the number of objects in images and video is therefore valuable in the intelligent security field and is an important research topic in computer vision and intelligent video surveillance.

Current object number estimation methods can be roughly divided into three types. 1) Object detection: this approach is relatively direct; in scenes with sparse objects, the number of objects is obtained by detecting each object in the image, but it is not effective when objects are crowded. 2) Clustering of visual feature trajectories: for video surveillance, trajectories are typically obtained with a KLT tracker and then clustered, and the number of clusters is taken as the number of objects. 3) Feature-based regression: a regression model between image features and the number of objects is built, and the number of objects in the scene is estimated from measured image features. Direct methods are easily affected by difficulties such as occlusion in crowded scenes, whereas indirect methods can count objects at large scale from the overall characteristics of the object group.

Existing feature-regression algorithms have the following drawback: labeling object locations is expensive. Existing object number estimation datasets provide the location of every object to train a count regression network, yet in the evaluation phase these location labels are not considered and only the accuracy of the estimated number of objects is evaluated. In fact, location labels are not necessary: only the number of objects in each image needs to be annotated, and the object number estimation model can be trained with a more efficient weakly supervised method.

Disclosure of Invention

To solve the above technical problems, the invention provides a weakly supervised object number estimation method based on a ranking network, which requires no object position annotations, saves labeling effort, and improves the accuracy of object number estimation.

The weakly supervised object number estimation method based on a ranking network of the invention comprises the following steps:

S1, extracting image features with a pre-trained deep neural network such as VGG-16, and then regressing a density map with convolution operations; extracting multi-scale features from the density map with an adaptive pooling layer to capture global and local information in the image, and feeding these features into a fully connected layer to regress the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer.

S2, learning the multi-scale features with a ranking network over image object numbers, so that the multi-scale features become sensitive to the number of objects. The ranking network is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the ranking of those images by the number of objects they contain.

S3, using a Sinkhorn layer in the ranking network to convert the ranking features into an ordinal matrix, constructing a soft-label transport matrix from the true number of objects in each image, and training the ranking network with a cross-entropy loss to obtain features that are sensitive to the number of objects; then training a regression network to obtain the final object number regression model.

In the weakly supervised object number estimation method based on a ranking network of the invention, the specific operations of step S1 are as follows: extracting image features with a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing the global sub-cluster layer from pooling layers with larger strides to extract global features from the density map; and constructing the local sub-cluster layer from pooling layers with smaller strides to extract local features from the density map.

In the weakly supervised object number estimation method based on a ranking network of the invention, the specific operations of step S2 are as follows: fine-tuning the feature extraction model with a multi-branch ranking network to obtain global and local features for the number of objects in the image.

In the weakly supervised object number estimation method based on a ranking network of the invention, the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to convert the ranking features into an ordinal matrix; constructing a more effective soft-label transport matrix to train the ranking network; training the ranking network with a cross-entropy loss, and training the regression network with a mean squared error loss.
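As a minimal sketch of what a Sinkhorn-style normalisation does, the following standard-library Python alternately normalises the rows and columns of a square score matrix until it is approximately doubly stochastic. It assumes the ranking output has already been arranged as a K x K score matrix (the text does not spell out this step); the function name `sinkhorn`, the `temperature` parameter, and the toy scores are hypothetical and not taken from the patent.

```python
import math

def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Scale exp(scores / temperature) so that rows and columns each sum to ~1."""
    k = len(scores)
    # Subtract the global maximum before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    p = [[math.exp((s - m) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalisation: the rank probabilities of each image sum to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Column normalisation: each rank position receives total mass 1.
        col_sums = [sum(p[i][j] for i in range(k)) for j in range(k)]
        p = [[p[i][j] / col_sums[j] for j in range(k)] for i in range(k)]
    return p

# Hypothetical scores for K = 3 images (row = image, column = rank position).
scores = [[2.0, 0.5, 0.1],
          [0.3, 1.5, 0.2],
          [0.1, 0.4, 1.8]]
P = sinkhorn(scores)
```

Every step is a division or an exponential, so the same operations remain differentiable when written in an automatic differentiation framework, which is the property that allows end-to-end training.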

The beneficial effects of the invention are as follows. The ranking network learns multi-scale features that are sensitive to the number of objects from the relative relation between the object numbers of different images; these features are fed into the regression network, so the position information of objects is not used and no large manual effort is needed to annotate it. The Sinkhorn layer is differentiable, so the network can be trained end to end. The soft-label transport matrix is constructed from the relative relation between the object numbers of the images, so it effectively reflects the complexity of the ranking task and improves the accuracy of object number estimation.

Drawings

Fig. 1 is a schematic diagram of the present invention.

Detailed Description

The following describes the embodiments of the present invention in further detail with reference to examples. The following examples illustrate the present invention but are not intended to limit its scope.

Examples

S1, extracting image features with a pre-trained deep neural network such as VGG-16, and then regressing a density map with convolution operations; extracting multi-scale features from the density map with a plurality of pooling layers to capture global and local information in the image, and feeding these features into a fully connected layer to regress the number of objects. The adaptive pooling layer comprises a global sub-cluster layer and a local sub-cluster layer. The global sub-cluster layer uses three max pooling layers with pooling strides of 8, 16, and 32, respectively; the local sub-cluster layer uses two average pooling layers with pooling strides of 1 and 2.
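The multi-scale pooling in S1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the text gives only the strides, so the kernel sizes (equal to the stride for the max pooling layers, 2 for the average pooling layers), the 32 x 32 toy density map, and the names `pool2d` and `multi_scale_features` are all hypothetical.

```python
def pool2d(x, kernel, stride, reduce_fn):
    """Slide a kernel x kernel window over a 2-D list and reduce each window."""
    h, w = len(x), len(x[0])
    out = []
    for top in range(0, h - kernel + 1, stride):
        row = []
        for left in range(0, w - kernel + 1, stride):
            window = [x[top + di][left + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

def multi_scale_features(density_map):
    # Global branch: max pooling with strides 8, 16, 32 (kernel = stride is assumed).
    global_maps = [pool2d(density_map, s, s, max) for s in (8, 16, 32)]
    # Local branch: average pooling with strides 1, 2 (kernel size 2 is assumed).
    local_maps = [pool2d(density_map, 2, s, mean) for s in (1, 2)]
    # Flatten and concatenate all pooled maps into one feature vector.
    features = []
    for fmap in global_maps + local_maps:
        for row in fmap:
            features.extend(row)
    return features

# Toy 32 x 32 "density map" with a single peak.
density_map = [[0.0] * 32 for _ in range(32)]
density_map[5][7] = 1.0
features = multi_scale_features(density_map)
```

The concatenated vector is what would be passed to the fully connected layer; the coarse max-pooled maps summarise the whole image, while the fine average-pooled maps keep local detail.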

S2, learning the multi-scale features with a ranking network over image object numbers, so that the multi-scale features become sensitive to the number of objects. The ranking network is a multi-branch network whose inputs are the multi-scale features of multiple images and whose output is the ranking of those images by the number of objects they contain. Specifically, a K-branch network can be used to extract the multi-scale features $f_1, f_2, f_3, \dots, f_K$ of K images; the pairwise differences $f_1 - f_2, f_1 - f_3, \dots, f_1 - f_K, f_2 - f_3, \dots, f_2 - f_K, \dots, f_{K-1} - f_K$ are then computed and input into the ranking network to obtain a K(K-1)-dimensional ranking vector $f_d$.
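A short sketch of the pairwise-difference step, assuming each feature is a plain list of floats. The helper name `pairwise_differences` and the toy features are hypothetical. For K images there are K(K-1)/2 unordered pairs; how the ranking network maps these differences to the ranking vector is not shown here.

```python
from itertools import combinations

def pairwise_differences(features):
    """Return f_i - f_j for every pair i < j, in the order (1,2), (1,3), ..., (K-1,K)."""
    diffs = []
    for i, j in combinations(range(len(features)), 2):
        diffs.append([a - b for a, b in zip(features[i], features[j])])
    return diffs

# Hypothetical 2-dimensional features for K = 4 images.
feats = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [0.0, 0.0]]
diffs = pairwise_differences(feats)
```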

S3, using a Sinkhorn layer in the ranking network to convert the ranking feature $f_d$ into an ordinal matrix $P$, in which the element $P_{i,j}$ in row $i$ and column $j$ is the probability that the $i$-th image is ranked in the $j$-th position; constructing a soft-label transport matrix from the true number of objects in each image.

Let $\sigma$ denote the true ranking of the images, where the $i$-th element $\sigma(i)$ indicates that the $i$-th image is ranked at position $\sigma(i)$. The elements of the soft-label matrix are calculated as follows:

where

$\Delta_{thr}$ is a predefined threshold. The ranking network is trained with the following cross-entropy loss to obtain features that are sensitive to the number of objects.

The regression network is then trained with the mean squared error loss to obtain the final object number regression model.
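The two losses named in S3 can be sketched as follows. Because the formula for the soft-label transport matrix is not reproduced in this text, the sketch substitutes a hard one-hot target built from the true counts; this stand-in, the ascending rank order, the averaging over images, and all function names and toy numbers are assumptions for illustration only.

```python
import math

def hard_target(counts):
    """One-hot K x K target: row i has a 1 at the rank position of image i.

    Stand-in for the soft-label transport matrix (assumption); images are
    ranked in ascending order of their true object count.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    rank = {img: pos for pos, img in enumerate(order)}
    k = len(counts)
    return [[1.0 if rank[i] == j else 0.0 for j in range(k)] for i in range(k)]

def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy between a target matrix and an ordinal matrix, averaged over images."""
    k = len(target)
    total = sum(target[i][j] * math.log(pred[i][j] + eps)
                for i in range(k) for j in range(k))
    return -total / k

def mse(pred_counts, true_counts):
    """Mean squared error between predicted and true object counts."""
    return sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / len(true_counts)

# Hypothetical true counts for K = 3 images and a predicted ordinal matrix.
T = hard_target([30, 5, 12])
P = [[0.1, 0.2, 0.7],
     [0.8, 0.1, 0.1],
     [0.1, 0.7, 0.2]]
ranking_loss = cross_entropy(T, P)
regression_loss = mse([10.0, 20.0], [12.0, 17.0])
```

A soft target would spread part of each row's mass over neighbouring rank positions instead of using a single 1, and the same `cross_entropy` function would apply unchanged.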

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principles of the present invention, and these modifications and variations should also be regarded as falling within the scope of the invention.

Claims

1. A weakly supervised method for estimating the number of objects based on a ranking network, comprising:

S1, extracting image features with a pre-trained deep neural network VGG-16, and then regressing a density map with convolution operations; extracting multi-scale features from the density map with an adaptive pooling layer to capture global and local information in the image, and feeding these features into a fully connected layer to regress the number of objects, wherein the adaptive pooling layer comprises two types of layers, a global sub-cluster layer and a local sub-cluster layer;

S2, learning the multi-scale features with a ranking network over image object numbers so that the multi-scale features become sensitive to the number of objects, wherein the ranking network is a multi-branch network, the multi-scale features of a plurality of images are input into the ranking network, and the ranking of the images by the number of objects they contain is output;

S3, using a Sinkhorn layer in the ranking network to convert the ranking features into an ordinal matrix, constructing a soft-label transport matrix from the true number of objects in each image, and training the ranking network with a cross-entropy loss to obtain features that are sensitive to the number of objects; and training a regression network to obtain the final object number regression model.

2. The weakly supervised method for estimating the number of objects based on a ranking network as set forth in claim 1, wherein the specific operations of step S1 are as follows: extracting image features with a deep network model pre-trained on an image analysis task, and regressing a pseudo probability density map; then constructing the global sub-cluster layer from pooling layers with larger strides to extract global features from the density map; and constructing the local sub-cluster layer from pooling layers with smaller strides to extract local features from the density map.

3. The weakly supervised method for estimating the number of objects based on a ranking network as set forth in claim 1, wherein the specific operations of step S2 are as follows: fine-tuning the feature extraction model with a multi-branch ranking network to obtain global and local features for the number of objects in the image.

4. The weakly supervised method for estimating the number of objects based on a ranking network as set forth in claim 1, wherein the specific operations of step S3 are as follows: using a differentiable Sinkhorn layer to convert the ranking features into an ordinal matrix; constructing a more effective soft-label transport matrix and training the ranking network with a cross-entropy loss; and training the regression network with a mean squared error loss.