CN106446933B

CN106446933B - Multi-target detection method based on contextual information

Info

Publication number: CN106446933B
Application number: CN201610785155.XA
Authority: CN
Inventors: 李涛; 裴利沈; 赵雪专; 张栋梁; 李冬梅; 朱晓珺; 曲豪; 邹香玲; 高大伟; 刘永
Original assignee: HENAN RADIO & TELEVISION UNIVERSITY
Current assignee: HENAN RADIO & TELEVISION UNIVERSITY
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2019-08-02
Anticipated expiration: 2036-08-31
Also published as: CN106446933A

Abstract

The invention discloses a kind of multi-target detection methods based on contextual information, including Matching Model on training under line and line, using Gist this feature of input picture by the corresponding scene where at a distance from scene clustering center, selecting the picture, and obtain corresponding select probability；By running all existing simple target basis detector DPM of target class, corresponding target detection window and corresponding target detection score value are obtained, obtains the testing result of target in conjunction with Gist feature using trained context model.This method utilizes global context data separation different scenes, then according to mutuality of objectives under different scenes, corresponding target detection model is formed, interfering with each other between target between different scenes is effectively reduced, further improves the accuracy of multi-target detection.

Description

Multi-target detection method based on context information

Technical Field

The invention relates to a multi-target detection technology based on context information, which can be applied in a multi-target detection system in real time.

Background art:

image or video based object detection is a research hotspot in the field of computer vision for recent decades and for a considerable period of time afterwards, and is the basis of visual understanding. The technology can be widely applied to the fields of subjects and engineering application such as target tracking, object detection and identification, information safety, autonomous driving, image retrieval, robots, man-machine interaction, medical image analysis, Internet of things and the like.

The existing target detection system mainly realizes the identification and detection of different targets through the depiction of the appearance information of the targets. Currently, this type of system mainly uses artificially designed features (such as HOG, LBP, SIFT, etc.) or deep features directly obtained from the image itself through deep learning to depict the target appearance, and uses the target appearance to realize target detection. However, in the actual detection of daily life, the system is mostly an open environment without restriction, and is complex and changeable, and has interferences such as illumination change, view angle change, target shielding and the like.

The invention relates to a target identification method based on context constraint, which is invented by Wang Yue Ring, Liuchang, Chenjunling and the like of Chinese Huazhong university of science and technology, and the invention is applied to Chinese intellectual property office in 12 months and 7 days in 2012 and approved, and is disclosed in 4 months and 17 days in 2013 with the publication number of CN 103049763A.

The patent document discloses a target identification method based on context constraint, which is used for remote sensing image scene classification and target detection and identification. The method comprises the steps of firstly carrying out filtering processing on an image, then carrying out region segmentation, segmenting the image into a plurality of connected domains, marking each connected domain, secondly calculating a feature vector of each connected domain, inputting the feature vector into a classifier trained in advance for carrying out scene classification calculation, outputting a class labeled graph, secondly, on the basis, according to a target to be identified, delimiting a local region range where the target possibly exists on the labeled graph, carrying out preprocessing operation on the local region, calculating an interested region in the region, and finally, extracting features and inputting the extracted features into the classifier for identification. The invention provides a quick and effective scene classification method, and aims to provide effective context constraint for target identification and improve identification efficiency and accuracy. The algorithm flow is shown in figure 1 below.

The above patent technical documents still have drawbacks: although the method utilizes the segmentation region and labels to obtain the scene classification, the method carries out global context constraint on the classification basis to calculate the region of interest, obtains the relevant feature vector, and identifies the corresponding target through the trained classifier. However, the method only utilizes the global scene context to obtain the possible target area, considers the relative position distribution of the target based on the scene, and ignores the symbiotic depiction among the targets. In addition, when the information content of the target is small, the target cannot be accurately depicted, and corresponding target detection cannot be obtained through the classifier.

The invention content is as follows:

aiming at the problems of insufficient information of the target and the like, the invention directly or indirectly provides auxiliary information for target detection by means of related information from the outside of the target in the picture or video so as to improve the accuracy of target detection.

The technical scheme adopted for realizing the purpose of the invention is as follows: a multi-target detection method based on context information is characterized by comprising an offline training step and an online matching step, wherein the offline training step is used for obtaining a subtree model:

the method comprises the following steps: firstly, labeling image target classes in a training set by using LableMe software aiming at the training set to obtain a training set image of a target identifier; training DPM detectors of all targets in the image; step two: calculating Gist characteristics of the pictures in the training set to obtain global context information; then, scene division is realized by utilizing an improved spectral clustering method;

step three: representing scenes through hidden variables, and then acquiring symbiosis and position distribution information of targets according to the labeling results of the targets of the training pictures in different scenes;

step four: judging whether the two targets are consistent targets or not by calculating the mapping distribution of the target pairs in the two pictures in the training set in the conversion space to form consistent target pairs;

step five: learning a tree structure through a weighted Chow-Liu algorithm by using the symbiosis and position distribution information and the consistency target obtained in the third step and the fourth step, and then training parameters to obtain a subtree model;

and (3) matching models on line:

the method comprises the following steps: when detecting, firstly, calculating Gist characteristics of an input image;

step two: then, according to Gist characteristics of the input image, dividing the image into corresponding scene subspaces in training and obtaining probability distribution of the corresponding scene subspaces;

step three: secondly, obtaining detection scores and detection window information of all targets of the image through trained DPM detectors of different targets;

step four: and (4) calculating the maximum posterior estimation of the target detection and the probability of correctness by utilizing the scene probability distribution obtained in the second step and the third step, the target detection scores and the detection window information in an iteration mode and combining a subtree prior model obtained by the offline training part, thereby correcting the target detection results of various DPM detectors and obtaining the final multi-target detection result.

And obtaining 520-dimensional Gist characteristics of each picture in the training set in the second step of obtaining the subtree model through offline training, wherein the obtaining process comprises the following steps: firstly, filtering an image by utilizing a group of Gabor filter groups with different scales and directions to obtain a group of filtered images; then, respectively carrying out non-overlapping grid division on the filtered image according to a fixed size, and solving a mean value of each grid after image division; finally, cascading the mean values of each grid obtained from the image group to form a global feature, and obtaining a final 520-dimensional Gist feature of the image, wherein the expression is as follows:

wherein,gist feature representing the jth image, cat representing a cascade of features, I^jRepresenting the j-th image with a division grid of r × l, g_mnRespectively representing n-directional Gabor filters of m-scale,representing the convolution of the image with a Gabor filter, n_cThe number of filters representing convolution is m x n,has a dimension of r × l × n_c. The scheme adopts 4 Gabor filters with the scales and the directions of 8;

wherein, the step two of obtaining the sub-tree model by offline training adopts an improved spectral clustering method to obtain 6-8 types of sub-scenes, and the specific steps are as follows: firstly, inputting Gist characteristics of each image in a training set, and obtaining a similarity matrix representing the similarity between each image in the training set by using a Random Forest method; then, the similarity matrix is used as input, and a spectral clustering method is adopted to cluster the training set pictures, so as to realize scene division of different training set pictures.

Step three of obtaining the subtree model by offline training is to blend a subtree context model of the consistency target pair when the subtree model is trained, and the specific steps are as follows:

(1)

the components of the acquisition consistency object are expressed as follows

Wherein (l)_x(oik),l_y(oik)) represents the coordinates of the center position of the object box of the kth instance of the ith class of object in the o graph. Scale sc (oik) is expressed as the square root of the area of the target box for this example, and view angle p (oik) is obtained as the aspect ratio of the target box; similarly (l)_x(qil),l_y(qil)) represents the center position coordinates of the object frame of the i-th instance of the i-th class object in the q-graph. Scale sc (qi l), view angle p (qi l); using variablesRepresenting the corresponding changes of the same kind of target variables in the two graphs in different four-dimensional space, wherein R ∈ R represents the mutual relation, R represents the same kind of target corresponding relation set of the two graphs in each consistency target pair,showing the relationship of the mutual change of the target positions,showing the relationship of the target dimensions to each other,representing facies of a target perspectiveA tautomeric relationship; judging whether the corresponding target pair accords with consistency distribution or not through the mapping distribution calculated by the formula (2), if so, judging that the corresponding target pair belongs to the same target paradigm, namely a consistency target pair;

(2)

generating final target group sets under different subspaces by utilizing greedy clustering, and avoiding the sensitivity of conversion space region division and redundancy caused by similar target group generation by adopting a soft voting mode; if the frequency of the target occurrence is not more than 50% in the target group, the target is removed from the target group; finally forming target groups under different scene subspaces; on the basis of the formed target group, forming a consistent target pair in the same target group through pairwise combination of different types of targets;

(3)

the local context information between the targets is described and represented through the symbiosis and mutual position relation between the proposed consistent target pair and the single target; the method comprises the following steps: first, the correlation of the consistency target pair and the sub-scenes is characterized:

θ_it＝cf_it×isf_i (3)

wherein, cf_itRepresenting the frequency, isf, of occurrence of the ith coherent object pair in the tth sub-scene_iAn inverse scene frequency index representing the ith correspondence target pair, which is expressed as follows:

wherein T represents the total type number of the sub-scenes, T_tIndicating the number of sub-scene types containing the ith coincidence target pair, ξ is a minimum to avoid isf_iThe value is 0; all the correlation coefficients theta are obtained_itThen, carrying out normalization processing;

(4)

establishing a binary tree for describing the symbiosis of the targets and a Gaussian tree for describing the position relationship of the targets under different sub-scenes t by using the labeling information of the training set pictures, wherein the binary tree and the Gaussian tree describe the position relationship of the targets and together depict a prior subtree model;

the joint probability of whether all the targets appear in the binary tree is expressed as:

p(b|z_t)＝p(b_root|z_t)∏_ip(b_i|b_pa(i),z_t) (5)

where i represents a node in the tree, pa (i) represents a parent node of node i, b_iE {0,1} represents whether the target i appears in the image; with b ≡ { b ≡ b_iRepresents all target classes; b_rootRoot node, z, representing a subtree_tIs a discrete variable representing the tth sub-scene space;

position L of object i_iDepending on the occurrence of the target, the interdependencies between its locations have a binary tree structure consistent with the occurrence of the target, which is expressed as follows:

p(L|b)＝p(L_root|b_root)Π_ip(L_i|L_pa(i),b_i,b_pa(i)) (6)

wherein L is_rootIndicating the location of the root node, L_pa(i)Indicating the location of the parent node.

Then the joint distribution of the variable b and the position L is represented as:

whereinExpressed as:

(5) combining the detection result of the trained single target detector DPM with Gist

The global features are integrated into the prior model, and the global features are represented by g, then the joint distribution is represented as:

wherein,expressed as:

W_ikindicating the position of the kth candidate window obtained with a single target detection of the target class i,

s_ika score representing a kth candidate window obtained with a single target detection of the target class i; c. C_ikWhether the kth candidate window of the target class i is correctly detected is shown, if so, the value is 1, otherwise, the value is 0;

(6) the training subtree model mainly comprises the learning of a tree structure and the learning of related parameters; when the Chow-Liu algorithm is used for carrying out prior model learning of the tree structure, the correlation theta between the consistency target pair depicted in the formula (3) and the scene_itChanging the mutual information S of the parent and child nodes in the target pair_i：

S_i＝S_i×(1+sigm(θ_it)) (11)

Then, completing the structure learning of the subtree prior model according to the maximum weight;

for the learning of model parameters, first, p (b) in formula (8)_i|b_pa(i)) The method comprises the steps of obtaining through counting symbiosis and consistency target pairs of targets and mutual information change; p (L)_i|L_pa(i),b_i,b_pa(i)) Taking values according to the appearance of the parent-child nodes, co-occurrence of the common parent-child nodes, appearance of the child nodes and absence of the child nodes, and taking Gaussian distribution into consideration to obtain the values:

p (g | b) is estimated by the Gist global feature of each training image in equation (9)_i) Specifically obtained by the following formula:

for global feature g, p (b) is estimated by adopting a logistic regression method_i|g)；

Integrating the corresponding detection results of a single basis detector, first the probability p (c) of correct detection_ik|b_i) Its value is closely related to whether the target appears, the form is as follows:

when the target does not appear, the correct detection rate is 0, and when the target appears, the correct detection probability is the ratio of the number of correct detections to the total number of labels marked by the target in the training set;

then, the position probability p (W) of the window is detected_ik|c_ik,L_i) Is a Gaussian distribution, relying on correct detection of c_ikAnd location L of target class i_iIt is expressed as:

wherein, when the window is correctly detected, then W_ikConforming to a Gaussian distribution, Λ_iA variance representing a predicted position of the target; if the window is not correctly detected, W_ikNot dependent on L_iMay be expressed as a constant;

finally, the score probability p(s) for the basis detector_ik|c_ik) Depending on the result c of the correct detection_ikExpressed as:

wherein p (c)_ik|s_ik) And estimating by adopting a logistic regression method.

Wherein the on-line matching section:

(1) in the detection, firstly, the method in formula (1) is adopted to obtain Gist global feature for the input image j；

(2) Then, according to Gist characteristics of the input image, dividing the image into corresponding scene subspaces in training and obtaining probability distribution of the corresponding scene subspaces; the probability distribution of the corresponding sub-scene is specifically expressed as:

wherein,representing the inverse of the distance of the input picture j to the center of the t sub-scene clusters,represents the reciprocal sum of distances to all cluster centers; using normalized probability representationThe probability of belonging to a certain sub-scene;

(3) obtaining initial detection scores and detection window information of each target of the image by using the trained DPM detectors of different targets;

(4) by utilizing the probability distribution of the sub-scenes obtained in the steps (2) and (3), the detection scores of all targets and the detection window information, and combining a sub-tree prior model obtained by an offline training part, an iteration mode is adopted to obtain the maximum posterior estimation of the target detection and the probability of correctness, so that the target detection results of various DPM detectors are corrected to obtain the final multi-target detection result; obtained by iterative optimization of the following formula:

the invention has the beneficial effects that: in the system, aiming at the problems of insufficient information of the target and the like, by means of related information from outside the target in the picture or video, such as scene information of the target and interrelation among different targets, auxiliary information is directly or indirectly provided for target detection, so that the accuracy of target detection is improved. The system utilizes Gist global characteristics representing the context information of the global scene to realize scene selection, then, aiming at different scene subspaces, the concept of a consistent target pair is put forward while the symbiotic relationship and the position relationship between single targets are merged, and the concept is taken as important local context information to be merged into a target detection model of a corresponding subtree structure. And changing corresponding mutual information weight in the formation process of the subtree target detection model through the consistency target pair. Thereby changing the structure of the subtree target detection model using the local context information of the consistency target pair. The method utilizes the global context information to distinguish different scenes, then forms a corresponding target detection model according to the interrelation between targets in different scenes, effectively reduces the mutual interference between the targets in different scenes, enhances the mutual constraint between the targets by introducing a consistent target pair, provides more robust local context information, and further improves the accuracy of multi-target detection compared with the existing system.

Description of the drawings:

FIG. 1 is a flow chart of the prior art;

FIG. 2 is a flow chart of the present invention for a vehicle type identification algorithm;

FIG. 3 is a schematic diagram of the consistent target pair acquisition of the present invention;

FIG. 4 is a diagram of some detection results of the multi-target detection method based on context information according to the present invention.

The specific implementation mode is as follows:

by means of the related information from the outside of the target to be detected in the image or video, such as the scene information of the target to be detected, the mutual relation between different targets and the target to be detected, and the like, the auxiliary information can be directly or indirectly provided for the target to be detected, the target to be detected is more abundantly depicted, and therefore the accuracy of target detection can be improved. Based on the thought, the invention provides a multi-target detection system fusing various context information, and the system consists of a scene selection layer and a sub-tree layer. Firstly, a scene selection layer is obtained through Gist global characteristics, then, in a corresponding sub-scene, symbiosis and position relations between targets are described through a probability graph model of a tree structure by using a single target and a consistent target, a sub-tree layer is obtained, and therefore, multi-target detection is achieved through global and local context information. During training, firstly, representing global context information by using Gist characteristics in a scene selection layer, obtaining an initial scene subset by using the characteristics and adopting an improved spectral clustering method, and selecting root nodes of sub-trees under the subset; then, under the corresponding subset, the labeled training set images are utilized, the local context information between the targets is represented through the proposed symbiosis and mutual position relation between the consistent target pair and the single target, and different sub-tree models are obtained through training by utilizing the local information. When the target detection is carried out, firstly, the Gist characteristic of an input picture is calculated and obtained, in a scene selection layer, the corresponding scene where the picture is located is selected by utilizing the characteristic through the distance between the characteristic and a scene clustering center, and the corresponding selection probability is obtained; and then, operating existing single target basic detectors DPM of all target classes to obtain corresponding target detection windows and corresponding target detection scores, and obtaining target detection results by combining Gist characteristics through a trained context model. The method utilizes the acquired local and global context information to reduce or remove the detection result obtained according to the appearance target detector, and completes the correction of the single target detection result to obtain the final target detection result.

The embodiment realizes the steps of the context-based multi-target detection system:

an offline training part of the multi-target detection system, 1) firstly, labeling image target classes in a training set by using LableMe software aiming at the training set to obtain a training set image of a target identifier. 2) Calculating Gist characteristics of the pictures in the training set to obtain global context information; then, scene division is realized by utilizing an improved spectral clustering method; 3) representing scenes through hidden variables, and then acquiring symbiosis and position distribution information of targets according to the labeling results of the targets of the training pictures in different scenes; 4) judging whether the two targets are consistent targets or not by calculating the mapping distribution of the target pairs in the two pictures in the training set in the conversion space to form consistent target pairs; 5) using 3), the symbiosis and position distribution information obtained in 4) and the consistency target to learn a tree structure through a weighted Chow-Liu algorithm, and then training parameters to obtain a subtree model.

On-line matching model

1) Upon detection, first, Gist characteristics of an input image are calculated. 2) Then, according to Gist characteristics of the input image, dividing the image into corresponding scene subspaces in training and obtaining probability distribution of the corresponding scene subspaces; 3) secondly, obtaining detection scores and detection window information of all targets of the image through trained DPM detectors of different targets; 4) and (3) obtaining scene probability distribution, target detection scores and detection window information by using the scene probability distribution obtained in step 2) and step 3), solving maximum posterior estimation of target detection and probability of correctness by adopting an iteration mode and combining a subtree prior model obtained by an offline training part, and thus correcting target detection results of various DPM detectors to obtain a final multi-target detection result (DPM detector: DPM (Deformable Parts model) is a very successful target detection algorithm, and a detection champion of VOC (visual object class) is continuously obtained for years. It has become an important part of many classifiers, segmentation, human body posture and behavior classification. In 2010, the inventor, Pedro Felzenszwalb, was awarded a "lifelong achievement prize" by VOC. DPM can be viewed as an extension of HOG (histograms of organized gradients), with the general idea being consistent with HOG. Firstly, calculating a gradient direction histogram, and then training by using an SVM (support Vector machine) to obtain a gradient Model (Model) of the object. With such templates, it can be directly used for classification, and a simple understanding is that the model matches the target).

The implementation flow of the scheme is shown in fig. 2.

Aiming at the above flow, the scheme is elaborated in detail:

sub-line training to obtain sub-tree model

1) Firstly, target labeling is carried out on a training set image by using LabelMe software, a training set image containing target type and position information is obtained, and a DPM detector of each target in the image is trained.

2) And then, calculating Gist characteristics of the samples in the training set so as to obtain global context information of the sample images, and realizing different scene division by utilizing an improved spectral clustering method. The detailed steps are as follows:

(2.1)

obtain the Gist characteristic of 520 dimensions of each picture in the training set. The acquisition process comprises the following steps: firstly, filtering an image by utilizing a group of Gabor filter groups with different scales and directions to obtain a group of filtered images; then, respectively carrying out non-overlapping grid division on the filtered image according to a fixed size, and solving a mean value of each grid after image division; finally, cascading the mean values of each grid obtained from the image group to form a global feature, and obtaining a final 520-dimensional Gist feature of the image, wherein the expression is as follows:

wherein,gist feature representing the jth image, cat representing a cascade of features, I^jRepresenting the j-th image with a division grid of r × l, g_mnRespectively representing n-directional Gabor filters of m-scale,representing the convolution of the image with a Gabor filter, n_cThe number of filters representing convolution is m x n,has a dimension of r × l × n_c. The scheme adopts 4 Gabor filters with the size in the 8 directions.

And (2.2) aiming at the Gist characteristics of each picture of the obtained training set, obtaining 6-8 types of sub-scenes by adopting an improved spectral clustering method. The specific process is as follows: firstly, inputting Gist characteristics of each image in a training set, and obtaining a similarity matrix representing the similarity between each image in the training set by using a Random Forest method; then, the similarity matrix is used as input, and a spectral clustering method is adopted to cluster the training set pictures, so as to realize scene division of different training set pictures.

3) And in each different scene subspace, training a corresponding sub-tree model in the scene by using the image subset obtained in the scene subspace and adopting a probability graph model with a tree structure. In the scheme, when a subtree model is trained, a consistency target pair is integrated to describe pairwise relations between targets, and a subtree context model of the consistency target pair is provided. The specific process is as follows:

(3.1) firstly, according to the consistent distribution of two adjacent heterogeneous targets in two different images in the scene subspace on the spatial position, the scale and the visual angle, obtaining a consistent target pair in the scene subspace. The specific process of obtaining the consistency target pair is shown in fig. 3. Each component thereof is represented as follows

Wherein (l)_x(oik),l_y(oik)) represents the coordinates of the center position of the object box of the kth instance of the ith class of object in the o graph. Scale sc (oik) is expressed as the square root of the area of the target box for this example, and view angle p (oik) is obtained as the aspect ratio of the target box; similarly (l)_x(qil),l_y(qil)) represents the center position coordinates of the object frame of the i-th instance of the i-th class object in the q-graph. Scale sc (qi l), view angle p (qi l); using variablesRepresenting the corresponding changes of the same kind of target variables in the two graphs in different four-dimensional space, wherein R ∈ R represents the mutual relation, R represents the same kind of target corresponding relation set of the two graphs in each consistency target pair,showing the relationship of the mutual change of the target positions,showing the relationship of the target dimensions to each other,representing the mutual variation relation of the target view angles; judging whether the corresponding target pair accords with consistency distribution or not through the mapping distribution calculated by the formula (2), if so, judging that the corresponding target pair belongs to the same target paradigm, namely a consistency target pair;

(3.2)

and (5) generating a final target group set under different subspaces by means of greedy clustering. And the method adopts a soft voting mode to avoid the sensitivity of the division of the conversion space region and the redundancy caused by the generation of a similar target group. Meanwhile, in order to reduce the level number of the target group, if the frequency of the occurrence of the target is not more than 50% in the target group, the target is removed from the target group, and through the above operations, the target group under different scene subspaces is finally formed. On the basis of the formed target group, in the same target group, a consistency target pair is formed through pairwise combination of different types of targets.

(3.3)

The local context information between the targets is represented by the co-occurrence and mutual position relation between the proposed consistent target pair and the single target. First, the correlation of the consistency object pair and the sub-scenes is characterized. As shown in the following formula:

θ_it＝cf_it×isf_i (3)

wherein T represents the total type number of the sub-scenes, T_tIndicating the number of sub-scene types containing the ith coincidence target pair, ξ is a minimum value to avoid isf_iThe value is 0. All the correlation coefficients theta are obtained_itThen, normalization processing is performed.

(3.4)

And establishing a binary tree for describing the symbiosis of the targets and a Gaussian tree for describing the position relationship of the targets under different sub-scenes t by using the labeling information of the training set pictures, wherein the binary tree and the Gaussian tree are used for describing the position relationship of the targets and together depict a prior subtree model.

p(b|z_t)＝p(b_root|z_t)Π_ip(b_i|b_pa(i),z_t) (5)

where i represents a node in the tree, pa (i) represents a parent node of node i, b_iE {0,1} represents whether or not the object i appears in the image. With b ≡ { b ≡ b_iRepresents all target classes. b_rootRoot node, z, representing a subtree_tIs a discrete variable representing the t-th sub-scene space.

p(L|b)＝p(L_root|b_root)Π_ip(L_i|L_pa(i),b_i,b_pa(i)) (6)

whereinExpressed as:

(3.5) the detection result of the trained single target detector DPM and the Gist global feature are merged into a prior model, the global feature is represented by g, and the joint distribution is represented as:

wherein,expressed as:

W_ikrepresenting the position, s, of the kth candidate window obtained with a single target detection of the target class i_ikA score representing a kth candidate window obtained with a single target detection of the target class i; c. C_ikAnd whether the kth candidate window of the target class i is correctly detected is shown, if so, the value is 1, and otherwise, the value is 0.

And (3.6) training the subtree model mainly comprises learning of a tree structure and learning of related parameters. When using Chow-Liu algorithm to do prior model learning of tree structure, through

(3.3) correlation of the correspondence object pair depicted in (3) with the scene θ_itChanging the mutual information S of the parent and child nodes in the target pair_i. The concrete expression is as follows:

S_i＝S_i×(1+sigm(θ_it)) (11)

and then, completing the structure learning of the subtree prior model according to the maximum weight.

For the learning of model parameters, first, p (b) in the formula (8)_i|b_pa(i)) The method is obtained by counting the pair of the symbiosis and consistency targets and the mutual information change of the targets. p (L)_i|L_pa(i),b_i,b_pa(i)) And obtaining values by considering Gaussian distribution according to three conditions of occurrence of parent and child nodes, co-occurrence of common-division parent and child nodes, occurrence of child nodes and non-occurrence of child nodes. The specific form is as follows:

(9) where p (g | b) is estimated by the Gist global feature of each training image_i) Specifically obtained by the following formula:

for global feature g, p (b) is estimated by adopting a logistic regression method_i|g)。

and when the target does not appear, the correct detection rate is 0, and when the target appears, the correct detection probability is the ratio of the number of correct detections to the total number of labels marked on the target in the training set.

wherein, when the window is correctly detected, then W_ikConforming to a Gaussian distribution, Λ_iA variance representing a predicted position of the target; if the window is not correctly detected, W_ikNot dependent on L_iAnd may be expressed as a constant.

wherein p (c)_ik|s_ik) And estimating by adopting a logistic regression method.

Two-wire matching section

4) In the detection, the method in 2) is firstly adopted for the input image j to obtain the Gist global feature

5) Then, according to Gist characteristics of the input image, the image is divided into corresponding scene subspaces in training, and probability distribution of the corresponding scene subspaces is obtained. Wherein, the probability distribution of the corresponding sub-scene is specifically expressed as:

wherein,representing the inverse of the distance of the input picture j to the center of the t sub-scene clusters,representing the reciprocal sum of distances to all cluster centers. Using normalized probability representationThe probability of belonging to a certain sub-scene.

6) Obtaining initial detection scores and detection window information of each target of the image by using the trained DPM detectors of different targets;

7) and (3) solving the maximum posterior estimation of the target detection and the probability of correctness by utilizing the probability distribution of the sub-scenes obtained in the step 5) and the step 6) and the values of all target detection and the information of the detection windows in an iterative mode in combination with a sub-tree prior model obtained by an offline training part, thereby correcting the target detection results of various DPM detectors to obtain the final multi-target detection result. In particular by iterative optimization of the following formula.

The scheme integrates context information and enriches target expression, and as shown in fig. 4, the multi-target detection method based on the context information obtains satisfactory detection results.

Claims

1. A multi-target detection method based on context information is characterized by comprising an offline training model and an online matching model,

and (3) obtaining a subtree model by offline training:

the method comprises the following steps: firstly, labeling image target classes in a training set by using LableMe software aiming at the training set to obtain a training set image of a target identifier; training DPM detectors of all targets in the image;

step two: calculating Gist characteristics of the pictures in the training set to obtain global context information; then, scene division is realized by utilizing an improved spectral clustering method;

and (3) matching models on line:

2. A multi-target detection method based on context information is characterized in that 520-dimensional Gist characteristics of each picture in a training set are obtained in a second step of obtaining a subtree model through offline training, and the obtaining process comprises the following steps: firstly, filtering an image by utilizing a group of Gabor filter groups with different scales and directions to obtain a group of filtered images; then, respectively carrying out non-overlapping grid division on the filtered image according to a fixed size, and solving a mean value of each grid after image division; finally, cascading the mean values of each grid obtained from the image group to form a global feature, and obtaining a final 520-dimensional Gist feature of the image, wherein the expression is as follows:

wherein,gist feature representing the jth image, cat representing a cascade of features, I^jRepresenting the j-th image with a division grid of r × l, g_mnRespectively representing n-directional Gabor filters of m-scale,representing the convolution of the image with a Gabor filter, n_cThe number of filters representing convolution is m x n,has a dimension of r × l × n_c。

3. The multi-target detection method based on context information as claimed in claim 1, wherein: in the second step of obtaining the sub-tree model by offline training, an improved spectral clustering method is adopted to obtain 6-8 types of sub-scenes, and the specific steps are as follows: firstly, inputting Gist characteristics of each image in a training set, and obtaining a similarity matrix representing the similarity between each image in the training set by using a Random Forest method; then, the similarity matrix is used as input, and a spectral clustering method is adopted to cluster the training set pictures, so as to realize scene division of different training set pictures.

4. The multi-target detection method based on context information as claimed in claim 1, wherein: and step three of obtaining a subtree model by offline training, wherein scenes are represented by hidden variables, and then a subtree context model of a consistent target pair is merged into the scenes when symbiosis and position distribution information of targets are obtained according to the labeling result of the targets of the training pictures in different scenes, and the method specifically comprises the following steps of:

(1) firstly, obtaining a consistent target pair in a scene subspace according to consistent distribution of two adjacent heterogeneous targets in two different images in the scene subspace on the spatial position, the scale and the visual angle;

the components of the acquisition consistency object are expressed as follows

Wherein (l)_x(oik),l_y(oik)) coordinates of a center position of the object box representing the kth instance of the ith class of object in the o graph; scale sc (oik) is expressed as the square root of the area of the target box for this example, and view angle p (oik) is obtained as the aspect ratio of the target box; similarly (l)_x(qil),l_y(qil)) coordinates of the center position of the target frame of the l-th instance of the i-th class target in the q-graph; scale sc (qi l), view angle p (qi l); using variablesRepresenting the corresponding changes of the same kind of target variables in the two graphs in different four-dimensional space, wherein R ∈ R represents the mutual relation, R represents the same kind of target corresponding relation set of the two graphs in each consistency target pair,showing the relationship of the mutual change of the target positions,showing the relationship of the target dimensions to each other,representing the mutual variation of the view angles of an objectTransforming the relationship; judging whether the corresponding target pair accords with consistency distribution or not through the mapping distribution calculated by the formula (2), if so, judging that the corresponding target pair belongs to the same target paradigm, namely a consistency target pair;

(2) generating final target group sets under different subspaces by utilizing greedy clustering, and avoiding the sensitivity of conversion space region division and redundancy caused by similar target group generation by adopting a soft voting mode; if the frequency of the target occurrence is not more than 50% in the target group, the target is removed from the target group; finally forming target groups under different scene subspaces; on the basis of the formed target group, forming a consistent target pair in the same target group through pairwise combination of different types of targets;

(3) the local context information between the targets is described and represented through the symbiosis and mutual position relation between the proposed consistent target pair and the single target; the method comprises the following steps: first, the correlation of the consistency target pair and the sub-scenes is characterized:

θ_vt＝cf_vt×isf_v (3)

wherein, cf_vtRepresenting the frequency, isf, of occurrence of the vth coherent target pair in the tth sub-scene_vAn inverse scene frequency index representing the v-th correspondence target pair, which is expressed as follows:

wherein T represents the total type number of the sub-scenes, T_tIndicating the number of sub-scene types containing the vth pair of consistency objects, ξ is a minimum to avoid isf_vThe value is 0; all the correlation coefficients theta are obtained_vtThen, carrying out normalization processing;

(4) establishing a binary tree for describing the symbiosis of the targets and a Gaussian tree for describing the position relationship of the targets under different sub-scenes t by using the labeling information of the training set pictures, wherein the binary tree and the Gaussian tree describe the position relationship of the targets and together depict a prior subtree model;

p(b|z_t)＝p(b_root|z_t)∏_wp(b_w|b_pa(w),z_t) (5)

where w represents a node in the tree, pa (w) represents a parent node of node w, b_wE {0,1} represents whether the target w appears in the image; with b ≡ { b ≡ b_wRepresents all target classes; b_rootRoot node, z, representing a subtree_tIs a discrete variable representing the tth sub-scene space;

position L of target w_wDepending on the occurrence of the target, the interdependencies between its locations have a binary tree structure consistent with the occurrence of the target, which is expressed as follows:

p(L|b)＝p(L_root|b_root)∏_wp(L_w|L_pa(w),b_w,b_pa(w)) (6)

wherein L is_rootIndicating the location of the root node, L_pa(w)Representing the location of the parent node;

whereinExpressed as:

(5) and (3) integrating the detection result of the trained single target detector DPM and the Gist global feature into a prior model, wherein the global feature is represented by g, and the joint distribution is represented as:

wherein,expressed as:

W_wkrepresenting the position, s, of the kth candidate window obtained with a single target detection of the target class w_wkRepresents the score of the kth candidate window obtained with a single target detection of the target class w; c. C_wkWhether the kth candidate window of the target class w is correctly detected is shown, if so, the value is 1, otherwise, the value is 0; (6) the training subtree model mainly comprises the learning of a tree structure and the learning of related parameters; when the Chow-Liu algorithm is used for carrying out prior model learning of the tree structure, the correlation theta between the consistency target pair depicted in the formula (3) and the scene_wtChanging the mutual information S of the parent and child nodes in the target pair_w：

S_w＝S_w×(1+sigm(θ_wt)) (11)

for the learning of model parameters, first, p (b) in formula (8)_w|b_pa(w)) The method comprises the steps of obtaining through counting symbiosis and consistency target pairs of targets and mutual information change; p (L)_w|L_pa(w),b_w,b_pa(w)) Taking values according to the appearance of the parent-child nodes, co-occurrence of the common parent-child nodes, appearance of the child nodes and absence of the child nodes, and taking Gaussian distribution into consideration to obtain the values:

p (g | b) is estimated by the Gist global feature of each training image in equation (9)_w) Specifically obtained by the following formula:

for global feature g, p (b) is estimated by adopting a logistic regression method_w|g)；

Integrating the corresponding detection results of a single basis detector, first the probability p (c) of correct detection_wk|b_w) Its value is closely related to whether the target appears, the form is as follows:

then, the position probability p (W) of the window is detected_wk|c_wk,L_w) Is a Gaussian distribution, relying on correct detection of c_ikAnd the location L of the target class w_wIt is expressed as:

wherein, when the window is correctly detected, then W_wkConforming to a Gaussian distribution, Λ_wA variance representing a predicted position of the target; if the window is not correctly detected, W_wkNot dependent on L_wMay be expressed as a constant;

finally, the score probability p(s) for the basis detector_wk|c_wk) Depending on the result c of the correct detection_wkExpressed as:

wherein p (c)_wk|s_wk) And estimating by adopting a logistic regression method.