CN106446933B - Multi-target detection method based on contextual information


Info

Publication number
CN106446933B
Authority
CN
China
Prior art keywords
target
image
targets
representing
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610785155.XA
Other languages
Chinese (zh)
Other versions
CN106446933A (en)
Inventor
李涛
裴利沈
赵雪专
张栋梁
李冬梅
朱晓珺
曲豪
邹香玲
高大伟
刘永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HENAN RADIO & TELEVISION UNIVERSITY
Original Assignee
HENAN RADIO & TELEVISION UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HENAN RADIO & TELEVISION UNIVERSITY filed Critical HENAN RADIO & TELEVISION UNIVERSITY
Priority to CN201610785155.XA priority Critical patent/CN106446933B/en
Publication of CN106446933A publication Critical patent/CN106446933A/en
Application granted granted Critical
Publication of CN106446933B publication Critical patent/CN106446933B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/2323 Non-hierarchical clustering techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/25 Fusion techniques
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06V10/507 Summing image-intensity values; Histogram projection analysis
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V2201/07 Target detection


Abstract

The invention discloses a multi-target detection method based on contextual information, comprising offline training of a model and online matching of the model. Using the Gist feature of an input image, the scene the picture belongs to is selected according to the distance from that feature to the scene cluster centers, and the corresponding selection probability is obtained. By running the existing single-target base detectors (DPM) of all target classes, the corresponding target detection windows and detection scores are obtained, and the trained context model, combined with the Gist feature, yields the final target detection results. The method uses global context information to distinguish different scenes and then forms a corresponding target detection model from the interrelations among targets in each scene, effectively reducing mutual interference between targets of different scenes and further improving the accuracy of multi-target detection.

Description

Multi-target detection method based on context information
Technical Field
The invention relates to a multi-target detection technology based on context information, which can be applied in real-time multi-target detection systems.
Background Art:
Image- or video-based object detection has been a research hotspot in computer vision for decades and will remain one for a considerable time to come; it is the basis of visual understanding. The technology is widely applicable to fields such as target tracking, object detection and recognition, information security, autonomous driving, image retrieval, robotics, human-computer interaction, medical image analysis, and the Internet of Things.
Existing target detection systems mainly identify and detect different targets by characterizing target appearance. Such systems typically use hand-crafted features (such as HOG, LBP, or SIFT) or deep features learned directly from images through deep learning to describe target appearance, and then use that appearance to realize detection. In practice, however, detection takes place in open, unconstrained environments that are complex and changeable, with interference such as illumination change, viewpoint change, and target occlusion.
A related prior invention is a target identification method based on context constraint by Wang Yuehuan, Liu Chang, Chen Junling, et al. of Huazhong University of Science and Technology, China, filed with the Chinese Intellectual Property Office on December 7, 2012, granted, and published on April 17, 2013 with publication number CN103049763A.
That patent document discloses a target identification method based on context constraint for remote-sensing image scene classification and target detection and identification. The method first filters the image and performs region segmentation, dividing the image into several connected domains and labeling each one. It then computes a feature vector for each connected domain and feeds it into a pre-trained classifier for scene classification, outputting a class-label map. On this basis, according to the target to be identified, it delimits the local region where the target may exist on the label map, preprocesses that region, and computes regions of interest within it. Finally, it extracts features from the regions of interest and inputs them into a classifier for identification. That invention provides a fast and effective scene classification method intended to supply effective context constraints for target identification and to improve identification efficiency and accuracy. Its algorithm flow is shown in figure 1 below.
That patent still has drawbacks. Although it obtains a scene classification from segmented and labeled regions and, on that basis, applies a global context constraint to compute regions of interest, extract feature vectors, and identify targets with a trained classifier, it uses only the global scene context to locate possible target areas: it considers the scene-based relative position distribution of targets but ignores co-occurrence relations among targets. Moreover, when a target carries little information, it cannot be depicted accurately and the classifier cannot produce a corresponding detection.
Summary of the Invention:
To address problems such as insufficient target information, the invention draws on related information external to the target in the picture or video to directly or indirectly provide auxiliary information for target detection, thereby improving detection accuracy.
The technical scheme adopted to realize the purpose of the invention is as follows: a multi-target detection method based on context information, characterized by comprising an offline training step and an online matching step, the offline training step obtaining the subtree model:
step one: firstly, label the image target classes in the training set using LabelMe software to obtain annotated training-set images; train DPM detectors for all targets in the images;
step two: calculate the Gist features of the pictures in the training set to obtain global context information; then realize scene division using an improved spectral clustering method;
step three: represent scenes through hidden variables, then acquire the co-occurrence and position distribution information of targets according to the target annotations of the training pictures in the different scenes;
step four: judge whether two targets are consistent by calculating the mapping distribution, in the transform space, of target pairs from two training-set pictures, forming consistency target pairs;
step five: using the co-occurrence and position distribution information and the consistency target pairs obtained in steps three and four, learn a tree structure through a weighted Chow-Liu algorithm, then train the parameters to obtain the subtree model;
Online matching of the model:
step one: when detecting, firstly calculate the Gist feature of the input image;
step two: then, according to the Gist feature of the input image, assign the image to the corresponding scene subspace from training and obtain the probability distribution over the scene subspaces;
step three: next, obtain the detection scores and detection-window information of all targets of the image through the trained DPM detectors of the different targets;
step four: using the scene probability distribution obtained in step two, the target detection scores and detection-window information obtained in step three, and the subtree prior model obtained by the offline training part, iteratively calculate the maximum a posteriori estimate of target detection and the probability of correctness, thereby correcting the target detection results of the DPM detectors and obtaining the final multi-target detection result.
In step two of the offline training, a 520-dimensional Gist feature is obtained for each picture in the training set. The acquisition process is as follows: first, filter the image with a bank of Gabor filters of different scales and orientations to obtain a group of filtered images; then divide each filtered image into non-overlapping grids of fixed size and compute the mean value of each grid cell; finally, concatenate the grid means obtained from the image group into a global feature, yielding the final 520-dimensional Gist feature of the image:

$$G_j^{\text{Gist}} = \mathrm{cat}\big(I_j^{r \times l} \otimes g_{mn}\big) \tag{1}$$

where $G_j^{\text{Gist}}$ is the Gist feature of the j-th image, $\mathrm{cat}$ denotes feature concatenation, $I_j^{r \times l}$ is the j-th image with an $r \times l$ division grid, $g_{mn}$ is the Gabor filter with scale $m$ and orientation $n$, $\otimes$ denotes convolution of the image with a Gabor filter, $n_c = m \times n$ is the number of convolution filters, and $G_j^{\text{Gist}}$ has dimension $r \times l \times n_c$. The scheme adopts Gabor filters at 4 scales and 8 orientations;
In step two of the offline training, the improved spectral clustering method yields 6-8 sub-scene classes. The specific steps are: first, input the Gist features of each training-set image and use a Random Forest method to obtain a similarity matrix representing the similarity between training-set images; then, with the similarity matrix as input, cluster the training-set pictures by spectral clustering, realizing the scene division of the training-set pictures.
Step three of the offline training incorporates a consistency-target-pair subtree context model when training the subtree model. The specific steps are:
(1) The components of a consistency target pair are expressed as follows: $(l_x(o_{ik}), l_y(o_{ik}))$ denotes the center coordinates of the target box of the k-th instance of the i-th target class in image $o$; the scale $sc(o_{ik})$ is the square root of the target-box area, and the view angle $p(o_{ik})$ is the aspect ratio of the target box. Similarly, $(l_x(q_{il}), l_y(q_{il}))$ denotes the center coordinates of the target box of the l-th instance of the i-th target class in image $q$, with scale $sc(q_{il})$ and view angle $p(q_{il})$. The variables $(\Delta l_r, \Delta sc_r, \Delta p_r)$ represent the corresponding changes of same-class target variables between the two images in a four-dimensional transform space, where $r \in R$ indexes the correspondence and $R$ is the set of same-class target correspondences between the two images in each consistency target pair: $\Delta l_r$ describes the change in target position, $\Delta sc_r$ the change in target scale, and $\Delta p_r$ the change in target view angle. Whether a corresponding target pair conforms to a consistency distribution is judged through the mapping distribution calculated by formula (2); if so, the pair belongs to the same target paradigm, i.e., forms a consistency target pair;
(2) Generate the final target-group sets under the different subspaces with greedy clustering, adopting soft voting to avoid both sensitivity to the division of the transform space and redundancy caused by generating similar target groups. If a target occurs in no more than 50% of a target group, it is removed from that group, finally forming the target groups under the different scene subspaces. On the basis of the formed target groups, consistency target pairs are formed within the same target group through pairwise combination of different target classes;
(3) Local context information between targets is described through the co-occurrence and mutual position relations between the proposed consistency target pairs and single targets. First, the correlation of a consistency target pair with the sub-scenes is characterized:

$$\theta_{it} = cf_{it} \times isf_i \tag{3}$$

where $cf_{it}$ is the frequency with which the i-th consistency target pair occurs in the t-th sub-scene and $isf_i$ is the inverse scene frequency index of the i-th consistency target pair, by analogy with inverse document frequency:

$$isf_i = \log\frac{T}{T_t} + \xi \tag{4}$$

where $T$ is the total number of sub-scene types, $T_t$ is the number of sub-scene types containing the i-th consistency target pair, and $\xi$ is a small constant that keeps $isf_i$ from being 0. After all correlation coefficients $\theta_{it}$ are obtained, they are normalized;
(4) Using the annotation information of the training-set pictures, establish for each sub-scene $t$ a binary tree describing the co-occurrence of targets and a Gaussian tree describing the position relations of targets; together they depict the prior subtree model.
The joint probability of the appearance of all targets in the binary tree is expressed as:

$$p(b \mid z_t) = p(b_{root} \mid z_t) \prod_i p(b_i \mid b_{pa(i)}, z_t) \tag{5}$$

where $i$ denotes a node in the tree, $pa(i)$ the parent node of node $i$, and $b_i \in \{0, 1\}$ whether target $i$ appears in the image; $b \equiv \{b_i\}$ represents all target classes; $b_{root}$ is the root node of the subtree, and $z_t$ is a discrete variable representing the t-th sub-scene space.
The position $L_i$ of target $i$ depends on the appearance of the target; the interdependencies between positions have a binary-tree structure consistent with that of target appearance, expressed as:

$$p(L \mid b) = p(L_{root} \mid b_{root}) \prod_i p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)}) \tag{6}$$

where $L_{root}$ is the position of the root node and $L_{pa(i)}$ the position of the parent node.
The joint distribution of the appearance variables $b$ and positions $L$ is then:

$$p(b, L \mid z_t) = p(b \mid z_t)\, p(L \mid b) \tag{7}$$

which, expanded with (5) and (6), is:

$$p(b, L \mid z_t) = p(b_{root} \mid z_t)\, p(L_{root} \mid b_{root}) \prod_i p(b_i \mid b_{pa(i)}, z_t)\, p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)}) \tag{8}$$
(5) The detection results of the trained single-target detector DPM and the Gist global feature are integrated into the prior model. With the global feature denoted $g$, the joint distribution becomes:

$$p(b, L, g, W, s \mid z_t) = p(b, L \mid z_t)\, p(g \mid b)\, p(W, s, c \mid b, L) \tag{9}$$

where the detector term is expressed as:

$$p(W, s, c \mid b, L) = \prod_{i,k} p(c_{ik} \mid b_i)\, p(W_{ik} \mid c_{ik}, L_i)\, p(s_{ik} \mid c_{ik}) \tag{10}$$

Here $W_{ik}$ denotes the position of the k-th candidate window obtained with the single-target detector of target class $i$, $s_{ik}$ the score of that window, and $c_{ik}$ whether the k-th candidate window of target class $i$ is a correct detection (1 if correct, 0 otherwise);
(6) Training the subtree model mainly comprises learning the tree structure and learning the related parameters. When the Chow-Liu algorithm performs prior-model structure learning, the correlation $\theta_{it}$ between a consistency target pair and the scene, depicted in formula (3), changes the mutual information $S_i$ of the parent and child nodes in that target pair:

$$S_i = S_i \times (1 + \mathrm{sigm}(\theta_{it})) \tag{11}$$

where $\mathrm{sigm}$ is the logistic sigmoid. The structure learning of the subtree prior model is then completed according to the maximum weight;
For the learning of model parameters: first, $p(b_i \mid b_{pa(i)})$ in formula (8) is obtained by counting target co-occurrences and consistency target pairs together with the mutual-information change; $p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)})$, modeled with Gaussian distributions, takes its values according to three cases: parent and child nodes both appear, the child node appears without the parent, and the child node does not appear (formula (12));
In formula (9), $p(g \mid b_i)$ is estimated through the Gist global feature of each training image, specifically via Bayes' rule:

$$p(g \mid b_i) = \frac{p(b_i \mid g)\, p(g)}{p(b_i)} \propto \frac{p(b_i \mid g)}{p(b_i)} \tag{13}$$

where, for the global feature $g$, $p(b_i \mid g)$ is estimated with a logistic regression method;
Integrating the corresponding detection results of the single base detector: first, the probability $p(c_{ik} \mid b_i)$ of a correct detection is closely tied to whether the target appears:

$$p(c_{ik} = 1 \mid b_i) = \begin{cases} 0, & b_i = 0 \\ \dfrac{\#\text{correct detections of class } i}{\#\text{annotations of class } i \text{ in the training set}}, & b_i = 1 \end{cases} \tag{14}$$

that is, when the target does not appear the correct-detection probability is 0, and when it appears the probability is the ratio of the number of correct detections to the total number of annotations of that target in the training set;
Then, the position probability $p(W_{ik} \mid c_{ik}, L_i)$ of a detection window is a Gaussian distribution depending on the correct detection $c_{ik}$ and the position $L_i$ of target class $i$:

$$p(W_{ik} \mid c_{ik}, L_i) = \begin{cases} \mathcal{N}(W_{ik};\, L_i,\, \Lambda_i), & c_{ik} = 1 \\ \text{const}, & c_{ik} = 0 \end{cases} \tag{15}$$

where, when the window is a correct detection, $W_{ik}$ follows a Gaussian distribution with $\Lambda_i$ the variance of the predicted target position; when it is not, $W_{ik}$ does not depend on $L_i$ and can be expressed as a constant;
Finally, the score probability $p(s_{ik} \mid c_{ik})$ of the base detector depends on the correct-detection result $c_{ik}$ and, by Bayes' rule, is expressed as:

$$p(s_{ik} \mid c_{ik}) = \frac{p(c_{ik} \mid s_{ik})\, p(s_{ik})}{p(c_{ik})} \tag{16}$$

where $p(c_{ik} \mid s_{ik})$ is estimated with a logistic regression method.
The online matching section:
(1) At detection time, the Gist global feature $G_j^{\text{Gist}}$ of the input image $j$ is first obtained with the method of formula (1);
(2) then, according to the Gist feature of the input image, the image is assigned to the corresponding scene subspace from training, and the probability distribution over the sub-scenes is obtained:

$$p(z_t \mid g_j) = \frac{1/d_{jt}}{\sum_{t'} 1/d_{jt'}}$$

where $1/d_{jt}$ is the reciprocal of the distance from input picture $j$ to the center of the t-th sub-scene cluster and $\sum_{t'} 1/d_{jt'}$ is the sum of reciprocal distances to all cluster centers; the normalized probability represents the probability of the image belonging to a given sub-scene;
(3) the initial detection scores and detection-window information of each target in the image are obtained with the trained DPM detectors of the different targets;
(4) using the sub-scene probability distribution obtained in steps (2) and (3), the target detection scores and detection-window information, and the subtree prior model obtained by the offline training part, the maximum a posteriori estimate of target detection and the probability of correctness are computed iteratively, thereby correcting the detection results of the DPM detectors and yielding the final multi-target detection result; the estimate is obtained by iteratively optimizing the joint posterior.
the invention has the beneficial effects that: in the system, aiming at the problems of insufficient information of the target and the like, by means of related information from outside the target in the picture or video, such as scene information of the target and interrelation among different targets, auxiliary information is directly or indirectly provided for target detection, so that the accuracy of target detection is improved. The system utilizes Gist global characteristics representing the context information of the global scene to realize scene selection, then, aiming at different scene subspaces, the concept of a consistent target pair is put forward while the symbiotic relationship and the position relationship between single targets are merged, and the concept is taken as important local context information to be merged into a target detection model of a corresponding subtree structure. And changing corresponding mutual information weight in the formation process of the subtree target detection model through the consistency target pair. Thereby changing the structure of the subtree target detection model using the local context information of the consistency target pair. The method utilizes the global context information to distinguish different scenes, then forms a corresponding target detection model according to the interrelation between targets in different scenes, effectively reduces the mutual interference between the targets in different scenes, enhances the mutual constraint between the targets by introducing a consistent target pair, provides more robust local context information, and further improves the accuracy of multi-target detection compared with the existing system.
Description of the drawings:
FIG. 1 is a flow chart of the prior art;
FIG. 2 is a flow chart of the algorithm of the present invention;
FIG. 3 is a schematic diagram of consistency-target-pair acquisition in the present invention;
FIG. 4 shows some detection results of the multi-target detection method based on context information according to the present invention.
The specific implementation is as follows:
Related information external to the target to be detected in an image or video, such as the scene of the target and the interrelations between other targets and the target to be detected, can directly or indirectly provide auxiliary information and enrich the depiction of the target, thereby improving detection accuracy. Based on this idea, the invention provides a multi-target detection system fusing multiple kinds of context information, composed of a scene selection layer and a subtree layer. First, the scene selection layer is obtained through the Gist global feature; then, within the corresponding sub-scene, the co-occurrence and position relations between targets are described by a tree-structured probabilistic graphical model over single targets and consistency target pairs, giving the subtree layer, so that multi-target detection draws on both global and local context information.
During training, the scene selection layer first represents global context information with Gist features; with these features an improved spectral clustering method produces the initial scene subsets, and the root nodes of the subtrees are selected within each subset. Then, under the corresponding subset, the annotated training-set images are used to represent local context information between targets through the proposed co-occurrence and mutual position relations between consistency target pairs and single targets, and the different subtree models are trained from this local information.
At detection time, the Gist feature of the input picture is first computed; in the scene selection layer, the scene the picture belongs to is selected by the distance of this feature to the scene cluster centers, and the corresponding selection probability is obtained. Then the existing single-target base detectors (DPM) of all target classes are run to obtain the corresponding target detection windows and detection scores, and the trained context model, combined with the Gist feature, produces the target detection results. The acquired local and global context information reduces or removes erroneous detections from the appearance-based detectors, correcting the single-target detection results into the final target detection result.
This embodiment realizes the steps of the context-based multi-target detection system as follows:
Offline training part of the multi-target detection system: 1) first, label the image target classes in the training set using LabelMe software to obtain annotated training-set images. 2) Calculate the Gist features of the training-set pictures to obtain global context information; then realize scene division with the improved spectral clustering method. 3) Represent scenes through hidden variables; then acquire the co-occurrence and position distribution information of targets according to the target annotations of the training pictures in the different scenes. 4) Judge whether two targets form a consistency target pair by calculating the mapping distribution, in the transform space, of target pairs from two training-set pictures. 5) Using the co-occurrence and position distribution information and the consistency target pairs obtained in 3) and 4), learn the tree structure through the weighted Chow-Liu algorithm, then train the parameters to obtain the subtree models.
Online matching of the model
1) At detection time, first calculate the Gist feature of the input image. 2) Then, according to the Gist feature, assign the image to the corresponding scene subspace from training and obtain the probability distribution over scene subspaces. 3) Next, obtain the detection scores and detection-window information of all targets in the image through the trained DPM detectors of the different targets. 4) Using the scene probability distribution obtained in 2), the target detection scores and detection-window information obtained in 3), and the subtree prior model obtained by the offline training part, iteratively solve the maximum a posteriori estimate of target detection and the probability of correctness, thereby correcting the detection results of the DPM detectors into the final multi-target detection result. (DPM, the Deformable Parts Model, is a very successful target detection algorithm that won the PASCAL VOC detection challenge for several consecutive years and has become an important component of many classification, segmentation, human-pose, and behavior-classification systems; in 2010 its inventor, Pedro Felzenszwalb, was awarded a "lifetime achievement" prize by the VOC organizers. DPM can be viewed as an extension of HOG (Histograms of Oriented Gradients), and its general idea is consistent with HOG: first compute gradient-orientation histograms, then train an SVM (Support Vector Machine) to obtain a gradient template of the object; such templates can be used directly for classification, understood simply as matching the model against the target.)
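To make the DPM parenthetical concrete, here is a minimal Python sketch of the HOG-plus-linear-SVM pipeline it describes (a full DPM additionally learns deformable part filters with a latent-SVM objective, omitted here). The variables `pos_patches`, `neg_patches`, and `test_patch` are hypothetical placeholders for equally sized grayscale image patches.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patches):
    # 9-bin gradient-orientation histograms over 8x8-pixel cells
    return np.array([hog(p, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for p in patches])

X = np.vstack([hog_features(pos_patches), hog_features(neg_patches)])
y = np.hstack([np.ones(len(pos_patches)), np.zeros(len(neg_patches))])

clf = LinearSVC(C=0.01).fit(X, y)       # clf.coef_ is the gradient template
score = clf.decision_function(hog_features([test_patch]))  # detection score
```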
The implementation flow of the scheme is shown in fig. 2.
With respect to the above flow, the scheme is elaborated in detail below:
I. Offline training to obtain the subtree model
1) First, perform target labeling on the training-set images with LabelMe software to obtain training-set images containing target class and position information, and train a DPM detector for each target in the images.
2) Then calculate the Gist features of the training-set samples to obtain the global context information of the sample images, and realize the division into different scenes with the improved spectral clustering method. The detailed steps are:
(2.1) Obtain the 520-dimensional Gist feature of each picture in the training set. The acquisition process: first, filter the image with a bank of Gabor filters of different scales and orientations to obtain a group of filtered images; then divide each filtered image into non-overlapping grids of fixed size and compute the mean of each grid cell; finally, concatenate the grid means from the image group into a global feature, yielding the final 520-dimensional Gist feature of the image:

$$G_j^{\text{Gist}} = \mathrm{cat}\big(I_j^{r \times l} \otimes g_{mn}\big) \tag{1}$$

where $G_j^{\text{Gist}}$ is the Gist feature of the j-th image, $\mathrm{cat}$ denotes feature concatenation, $I_j^{r \times l}$ is the j-th image with an $r \times l$ division grid, $g_{mn}$ is the Gabor filter with scale $m$ and orientation $n$, $\otimes$ denotes convolution of the image with a Gabor filter, $n_c = m \times n$ is the number of convolution filters, and $G_j^{\text{Gist}}$ has dimension $r \times l \times n_c$. The scheme adopts Gabor filters at 4 scales and 8 orientations.
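A minimal Python sketch of the Gist computation in formula (1) follows, under stated assumptions: the Gabor frequencies are illustrative, the bank uses the 4 scales and 8 orientations named in the text, and a 4 x 4 grid is assumed for r x l.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

# Gabor bank g_mn: 4 scales x 8 orientations (frequencies are assumptions)
filters = [np.real(gabor_kernel(frequency=0.05 * 2**s, theta=o * np.pi / 8))
           for s in range(4) for o in range(8)]

def gist(img, filters, r=4, l=4):
    """Formula (1): convolve, average over an r x l grid, concatenate."""
    H, W = img.shape
    feats = []
    for g in filters:                                    # g_mn
        resp = np.abs(fftconvolve(img, g, mode="same"))  # I_j (x) g_mn
        for u in range(r):                               # non-overlapping grid
            for v in range(l):
                cell = resp[u*H//r:(u+1)*H//r, v*W//l:(v+1)*W//l]
                feats.append(cell.mean())                # mean per grid cell
    return np.array(feats)                               # length r*l*n_c
```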
(2.2) For the obtained Gist features of each training-set picture, 6-8 sub-scene classes are obtained with the improved spectral clustering method, as sketched below. The specific process: first, input the Gist features of each training-set image and use a Random Forest method to obtain a similarity matrix representing the similarity between training-set images; then, with the similarity matrix as input, cluster the training-set pictures by spectral clustering, realizing the scene division of the training-set pictures.
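A hedged sketch of this clustering step: the text does not specify how the Random Forest similarity is built, so the sketch assumes a standard forest proximity (the fraction of trees in which two images fall in the same leaf), computed with an unsupervised `RandomTreesEmbedding` and fed to spectral clustering as a precomputed affinity. `gist_features` is a hypothetical (n_images, 520) array.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.cluster import SpectralClustering

def rf_similarity(gists, n_trees=100):
    # leaf index of every sample in every tree -> (n_samples, n_trees)
    leaves = RandomTreesEmbedding(n_estimators=n_trees).fit(gists).apply(gists)
    # proximity: share of trees in which samples i and j share a leaf
    return np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

S = rf_similarity(gist_features)
scenes = SpectralClustering(n_clusters=7,        # 6-8 sub-scenes per the text
                            affinity="precomputed").fit_predict(S)
```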
3) In each scene subspace, use the image subset obtained in that subspace to train the corresponding subtree model with a tree-structured probabilistic graphical model. When training the subtree model, the scheme incorporates consistency target pairs to describe pairwise relations between targets, giving a consistency-target-pair subtree context model. The specific process:
(3.1) First, consistency target pairs in a scene subspace are obtained from the consistent distribution, over spatial position, scale, and view angle, of two adjacent heterogeneous targets across two different images in that subspace. The acquisition process is shown in fig. 3. The components are expressed as follows: $(l_x(o_{ik}), l_y(o_{ik}))$ denotes the center coordinates of the target box of the k-th instance of the i-th target class in image $o$; the scale $sc(o_{ik})$ is the square root of the target-box area, and the view angle $p(o_{ik})$ is the aspect ratio of the target box. Similarly, $(l_x(q_{il}), l_y(q_{il}))$ denotes the center coordinates of the target box of the l-th instance of the i-th target class in image $q$, with scale $sc(q_{il})$ and view angle $p(q_{il})$. The variables $(\Delta l_r, \Delta sc_r, \Delta p_r)$ represent the corresponding changes of same-class target variables between the two images in a four-dimensional transform space, where $r \in R$ indexes the correspondence and $R$ is the set of same-class target correspondences between the two images in each consistency target pair: $\Delta l_r$ describes the change in target position, $\Delta sc_r$ the change in target scale, and $\Delta p_r$ the change in target view angle. Whether a corresponding target pair conforms to a consistency distribution is judged through the mapping distribution calculated by formula (2); if so, the pair belongs to the same target paradigm, i.e., forms a consistency target pair.
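A small sketch of the four-dimensional transform variables used by the consistency test, assuming target boxes in (x, y, w, h) format; the use of ratios for the scale and view-angle changes is an assumption, since formula (2) itself is not reproduced in the text.

```python
import numpy as np

def box_vars(box):
    """Center l, scale sc (sqrt of area), view angle p (aspect ratio)."""
    x, y, w, h = box                      # (x, y, w, h) format assumed
    return np.array([x + w/2, y + h/2]), np.sqrt(w * h), w / h

def transform(box_o, box_q):
    """4-D change of a same-class target between images o and q."""
    l_o, sc_o, p_o = box_vars(box_o)
    l_q, sc_q, p_q = box_vars(box_q)
    d_l = l_q - l_o                       # position change (2 dims)
    d_sc = sc_q / sc_o                    # scale change
    d_p = p_q / p_o                       # view-angle (aspect) change
    return np.array([d_l[0], d_l[1], d_sc, d_p])
```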
(3.2) Generate the final target-group sets under the different subspaces with greedy clustering, adopting soft voting to avoid both sensitivity to the division of the transform space and redundancy caused by generating similar target groups. Meanwhile, to reduce the number of target-group levels, a target occurring in no more than 50% of a target group is removed from that group; these operations finally form the target groups under the different scene subspaces. On the basis of the formed target groups, consistency target pairs are formed within the same target group through pairwise combination of different target classes.
(3.3) Local context information between targets is represented through the co-occurrence and mutual position relations between the proposed consistency target pairs and single targets. First, the correlation of a consistency target pair with the sub-scenes is characterized:

$$\theta_{it} = cf_{it} \times isf_i \tag{3}$$

where $cf_{it}$ is the frequency with which the i-th consistency target pair occurs in the t-th sub-scene and $isf_i$ is the inverse scene frequency index of the i-th consistency target pair, by analogy with inverse document frequency:

$$isf_i = \log\frac{T}{T_t} + \xi \tag{4}$$

where $T$ is the total number of sub-scene types, $T_t$ is the number of sub-scene types containing the i-th consistency target pair, and $\xi$ is a small constant that keeps $isf_i$ from being 0. After all correlation coefficients $\theta_{it}$ are obtained, they are normalized.
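A short sketch of formulas (3)-(4) follows. The logarithmic form of the inverse scene frequency and the global normalization are assumptions consistent with the stated idf analogy and the definitions above.

```python
import numpy as np

def pair_scene_correlation(cf, T_t, T, xi=1e-6):
    """cf: (n_pairs, T) pair-in-scene frequencies; T_t: (n_pairs,) number
    of sub-scenes containing each pair; xi keeps isf away from zero."""
    isf = np.log(T / T_t) + xi            # formula (4), idf-style (assumed form)
    theta = cf * isf[:, None]             # formula (3)
    return theta / theta.sum()            # normalization (global, assumed)
```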
(3.4) Using the annotation information of the training-set pictures, establish for each sub-scene $t$ a binary tree describing the co-occurrence of targets and a Gaussian tree describing the position relations of targets; together they depict the prior subtree model.
The joint probability of the appearance of all targets in the binary tree is:

$$p(b \mid z_t) = p(b_{root} \mid z_t) \prod_i p(b_i \mid b_{pa(i)}, z_t) \tag{5}$$

where $i$ denotes a node in the tree, $pa(i)$ the parent of node $i$, and $b_i \in \{0, 1\}$ whether target $i$ appears in the image; $b \equiv \{b_i\}$ represents all target classes; $b_{root}$ is the root node of the subtree and $z_t$ a discrete variable representing the t-th sub-scene space.
The position $L_i$ of target $i$ depends on the appearance of the target; the interdependencies between positions have a binary-tree structure consistent with that of target appearance:

$$p(L \mid b) = p(L_{root} \mid b_{root}) \prod_i p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)}) \tag{6}$$

where $L_{root}$ is the position of the root node and $L_{pa(i)}$ the position of the parent node.
The joint distribution of the appearance variables $b$ and positions $L$ is then:

$$p(b, L \mid z_t) = p(b \mid z_t)\, p(L \mid b) \tag{7}$$

which, expanded with (5) and (6), is:

$$p(b, L \mid z_t) = p(b_{root} \mid z_t)\, p(L_{root} \mid b_{root}) \prod_i p(b_i \mid b_{pa(i)}, z_t)\, p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)}) \tag{8}$$
(3.5) The detection results of the trained single-target detector DPM and the Gist global feature are merged into the prior model. With the global feature denoted $g$, the joint distribution is:

$$p(b, L, g, W, s \mid z_t) = p(b, L \mid z_t)\, p(g \mid b)\, p(W, s, c \mid b, L) \tag{9}$$

where the detector term is expressed as:

$$p(W, s, c \mid b, L) = \prod_{i,k} p(c_{ik} \mid b_i)\, p(W_{ik} \mid c_{ik}, L_i)\, p(s_{ik} \mid c_{ik}) \tag{10}$$

Here $W_{ik}$ denotes the position of the k-th candidate window obtained with the single-target detector of target class $i$, $s_{ik}$ the score of that window, and $c_{ik}$ whether the k-th candidate window of target class $i$ is a correct detection (1 if correct, 0 otherwise).
(3.6) Training the subtree model mainly comprises learning the tree structure and learning the related parameters. When the Chow-Liu algorithm performs prior-model structure learning, the correlation $\theta_{it}$ between a consistency target pair and the scene, depicted in (3.3), changes the mutual information $S_i$ of the parent and child nodes in that target pair:

$$S_i = S_i \times (1 + \mathrm{sigm}(\theta_{it})) \tag{11}$$

where $\mathrm{sigm}$ is the logistic sigmoid. The structure learning of the subtree prior model is then completed according to the maximum weight.
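A sketch of the weighted Chow-Liu step: pairwise mutual information of target-occurrence variables is re-weighted per formula (11), and the tree structure is read off a maximum-weight spanning tree. Applying the boost only to entries that correspond to consistency target pairs is an assumption.

```python
import numpy as np
import networkx as nx

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def subtree_structure(mi, theta_t):
    """mi: (n, n) mutual information of target-occurrence variables;
    theta_t: (n, n) pair-scene correlations for this sub-scene, 0 where
    a pair is not a consistency target pair (an assumption)."""
    w = mi * np.where(theta_t > 0, 1 + sigm(theta_t), 1.0)  # formula (11)
    n = mi.shape[0]
    G = nx.Graph()
    G.add_weighted_edges_from((i, j, w[i, j])
                              for i in range(n) for j in range(i + 1, n))
    return nx.maximum_spanning_tree(G)    # structure by maximum weight
```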
For the learning of model parameters: first, $p(b_i \mid b_{pa(i)})$ in formula (8) is obtained by counting target co-occurrences and consistency target pairs together with the mutual-information change. $p(L_i \mid L_{pa(i)}, b_i, b_{pa(i)})$, modeled with Gaussian distributions, takes its values according to three cases: parent and child nodes both appear, the child node appears without the parent, and the child node does not appear (formula (12)).
In formula (9), $p(g \mid b_i)$ is estimated through the Gist global feature of each training image, specifically via Bayes' rule:

$$p(g \mid b_i) = \frac{p(b_i \mid g)\, p(g)}{p(b_i)} \propto \frac{p(b_i \mid g)}{p(b_i)} \tag{13}$$

where, for the global feature $g$, $p(b_i \mid g)$ is estimated with a logistic regression method.
Integrating the corresponding detection results of the single base detector: first, the probability $p(c_{ik} \mid b_i)$ of a correct detection is closely tied to whether the target appears:

$$p(c_{ik} = 1 \mid b_i) = \begin{cases} 0, & b_i = 0 \\ \dfrac{\#\text{correct detections of class } i}{\#\text{annotations of class } i \text{ in the training set}}, & b_i = 1 \end{cases} \tag{14}$$

that is, when the target does not appear the correct-detection probability is 0, and when it appears the probability is the ratio of the number of correct detections to the total number of annotations of that target in the training set.
Then, the position probability $p(W_{ik} \mid c_{ik}, L_i)$ of a detection window is a Gaussian distribution depending on the correct detection $c_{ik}$ and the position $L_i$ of target class $i$:

$$p(W_{ik} \mid c_{ik}, L_i) = \begin{cases} \mathcal{N}(W_{ik};\, L_i,\, \Lambda_i), & c_{ik} = 1 \\ \text{const}, & c_{ik} = 0 \end{cases} \tag{15}$$

where, when the window is a correct detection, $W_{ik}$ follows a Gaussian distribution with $\Lambda_i$ the variance of the predicted target position; when it is not, $W_{ik}$ does not depend on $L_i$ and can be expressed as a constant.
Finally, the score probability $p(s_{ik} \mid c_{ik})$ of the base detector depends on the correct-detection result $c_{ik}$ and, by Bayes' rule, is expressed as:

$$p(s_{ik} \mid c_{ik}) = \frac{p(c_{ik} \mid s_{ik})\, p(s_{ik})}{p(c_{ik})} \tag{16}$$

where $p(c_{ik} \mid s_{ik})$ is estimated with a logistic regression method.
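A two-line sketch of the logistic-regression calibration used for $p(c_{ik} \mid s_{ik})$, mapping raw DPM scores to probabilities of correct detection; `s_train`, `c_train`, and `s_test` are hypothetical arrays of scores and 0/1 correctness labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# s_train: (n, 1) raw DPM scores; c_train: (n,) 1 if the detection was
# correct against the training annotations -- hypothetical arrays
calib = LogisticRegression().fit(s_train, c_train)
p_correct = calib.predict_proba(s_test)[:, 1]    # p(c = 1 | s)
```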
II. Online matching section
4) At detection time, for the input image $j$, first obtain the Gist global feature $G_j^{\text{Gist}}$ with the method of 2).
5) Then, according to the Gist feature of the input image, assign the image to the corresponding scene subspace from training and obtain the probability distribution over the sub-scenes (sketched below):

$$p(z_t \mid g_j) = \frac{1/d_{jt}}{\sum_{t'} 1/d_{jt'}}$$

where $1/d_{jt}$ is the reciprocal of the distance from input picture $j$ to the center of the t-th sub-scene cluster and $\sum_{t'} 1/d_{jt'}$ is the sum of reciprocal distances to all cluster centers; the normalized probability represents the probability of the image belonging to a given sub-scene.
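A direct sketch of the sub-scene probability above: inverse distances from the input image's Gist feature to every cluster center, normalized to sum to one.

```python
import numpy as np

def scene_probabilities(gist_j, centers):
    """Inverse distances to all cluster centers, normalized to sum to 1."""
    d = np.linalg.norm(centers - gist_j, axis=1)  # distance to each center
    inv = 1.0 / d
    return inv / inv.sum()
```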
6) Obtain the initial detection scores and detection-window information of each target in the image with the trained DPM detectors of the different targets.
7) Using the sub-scene probability distribution obtained in 5), the target detection scores and detection-window information obtained in 6), and the subtree prior model obtained by the offline training part, iteratively solve the maximum a posteriori estimate of target detection and the probability of correctness, thereby correcting the detection results of the DPM detectors into the final multi-target detection result; the estimate is obtained by iteratively optimizing the joint posterior.
The scheme integrates context information and enriches the target representation; as shown in fig. 4, the multi-target detection method based on context information obtains satisfactory detection results.

Claims (4)

1. A multi-target detection method based on context information is characterized by comprising an offline training model and an online matching model,
the offline training obtaining a subtree model:
the method comprises the following steps: firstly, labeling image target classes in a training set by using LableMe software aiming at the training set to obtain a training set image of a target identifier; training DPM detectors of all targets in the image;
step two: calculating Gist features of the pictures in the training set to obtain global context information; then realizing scene division by using an improved spectral clustering method;
step three: representing scenes through hidden variables, and then acquiring co-occurrence and position distribution information of targets according to the target annotations of the training pictures in different scenes;
step four: judging whether two targets are consistent by calculating the mapping distribution, in the transform space, of target pairs from two training-set pictures, to form consistency target pairs;
step five: learning a tree structure through a weighted Chow-Liu algorithm by using the co-occurrence and position distribution information and the consistency target pairs obtained in steps three and four, and then training parameters to obtain the subtree model;
the online matching of models:
step one: when detecting, firstly calculating the Gist feature of an input image;
step two: then, according to the Gist feature of the input image, assigning the image to the corresponding scene subspace from training and obtaining the probability distribution over the scene subspaces;
step three: next, obtaining detection scores and detection-window information of all targets of the image through the trained DPM detectors of the different targets;
step four: using the scene probability distribution obtained in step two, the target detection scores and detection-window information obtained in step three, and the subtree prior model obtained by the offline training part, calculating the maximum a posteriori estimate of target detection and the probability of correctness in an iterative manner, thereby correcting the target detection results of the DPM detectors and obtaining the final multi-target detection result.
2. The multi-target detection method based on context information as claimed in claim 1, wherein a 520-dimensional Gist feature of each picture in the training set is obtained in step two of obtaining the subtree model through offline training, the acquisition process being: firstly, filtering the image with a bank of Gabor filters of different scales and orientations to obtain a group of filtered images; then dividing each filtered image into non-overlapping grids of fixed size and computing the mean value of each grid cell; finally, concatenating the grid means obtained from the image group into a global feature, yielding the final 520-dimensional Gist feature of the image:

$$G_j^{\text{Gist}} = \mathrm{cat}\big(I_j^{r \times l} \otimes g_{mn}\big) \tag{1}$$

where $G_j^{\text{Gist}}$ is the Gist feature of the j-th image, $\mathrm{cat}$ denotes feature concatenation, $I_j^{r \times l}$ is the j-th image with an $r \times l$ division grid, $g_{mn}$ is the Gabor filter with scale $m$ and orientation $n$, $\otimes$ denotes convolution of the image with a Gabor filter, $n_c = m \times n$ is the number of convolution filters, and $G_j^{\text{Gist}}$ has dimension $r \times l \times n_c$.
3. The multi-target detection method based on context information as claimed in claim 1, wherein: in the second step of obtaining the sub-tree model by offline training, an improved spectral clustering method is adopted to obtain 6-8 types of sub-scenes, and the specific steps are as follows: firstly, inputting Gist characteristics of each image in a training set, and obtaining a similarity matrix representing the similarity between each image in the training set by using a Random Forest method; then, the similarity matrix is used as input, and a spectral clustering method is adopted to cluster the training set pictures, so as to realize scene division of different training set pictures.
4. The multi-target detection method based on context information as claimed in claim 1, wherein in step three of obtaining the subtree model by offline training, scenes are represented by hidden variables, and a consistency-target-pair subtree context model is merged in when the co-occurrence and position distribution information of targets is acquired from the target annotations of the training pictures in different scenes, specifically comprising the following steps:
(1) firstly, obtaining consistency target pairs in a scene subspace from the consistent distribution, over spatial position, scale, and view angle, of two adjacent heterogeneous targets across two different images in the scene subspace;
the components of a consistency target pair are expressed as follows: $(l_x(o_{ik}), l_y(o_{ik}))$ denotes the center coordinates of the target box of the k-th instance of the i-th target class in image $o$; the scale $sc(o_{ik})$ is the square root of the target-box area, and the view angle $p(o_{ik})$ is the aspect ratio of the target box; similarly, $(l_x(q_{il}), l_y(q_{il}))$ denotes the center coordinates of the target box of the l-th instance of the i-th target class in image $q$, with scale $sc(q_{il})$ and view angle $p(q_{il})$; the variables $(\Delta l_r, \Delta sc_r, \Delta p_r)$ represent the corresponding changes of same-class target variables between the two images in a four-dimensional transform space, where $r \in R$ indexes the correspondence, $R$ is the set of same-class target correspondences between the two images in each consistency target pair, $\Delta l_r$ describes the change in target position, $\Delta sc_r$ the change in target scale, and $\Delta p_r$ the change in target view angle; whether a corresponding target pair conforms to a consistency distribution is judged through the mapping distribution calculated by formula (2); if so, the pair belongs to the same target paradigm, i.e., forms a consistency target pair;
(2) generating the final target-group sets under different subspaces with greedy clustering, adopting soft voting to avoid sensitivity to the division of the transform space and redundancy caused by generating similar target groups; if a target occurs in no more than 50% of a target group, it is removed from the target group, finally forming the target groups under the different scene subspaces; on the basis of the formed target groups, consistency target pairs are formed within the same target group through pairwise combination of different target classes;
(3) local context information between targets is described through the co-occurrence and mutual position relations between the proposed consistency target pairs and single targets, as follows: first, the correlation of a consistency target pair with the sub-scenes is characterized:

$$\theta_{vt} = cf_{vt} \times isf_v \tag{3}$$

where $cf_{vt}$ is the frequency with which the v-th consistency target pair occurs in the t-th sub-scene and $isf_v$ is the inverse scene frequency index of the v-th consistency target pair, by analogy with inverse document frequency:

$$isf_v = \log\frac{T}{T_t} + \xi \tag{4}$$

where $T$ is the total number of sub-scene types, $T_t$ is the number of sub-scene types containing the v-th consistency target pair, and $\xi$ is a small constant that keeps $isf_v$ from being 0; after all correlation coefficients $\theta_{vt}$ are obtained, they are normalized;
(4) establishing, with the annotation information of the training-set pictures, a binary tree describing the co-occurrence of targets and a Gaussian tree describing the position relations of targets under each sub-scene $t$, which together depict the prior subtree model;
the joint probability of the appearance of all targets in the binary tree is expressed as:

$$p(b \mid z_t) = p(b_{root} \mid z_t) \prod_w p(b_w \mid b_{pa(w)}, z_t) \tag{5}$$

where $w$ denotes a node in the tree, $pa(w)$ the parent of node $w$, and $b_w \in \{0, 1\}$ whether target $w$ appears in the image; $b \equiv \{b_w\}$ represents all target classes; $b_{root}$ is the root node of the subtree and $z_t$ a discrete variable representing the t-th sub-scene space;
the position $L_w$ of target $w$ depends on the appearance of the target; the interdependencies between positions have a binary-tree structure consistent with that of target appearance, expressed as:

$$p(L \mid b) = p(L_{root} \mid b_{root}) \prod_w p(L_w \mid L_{pa(w)}, b_w, b_{pa(w)}) \tag{6}$$

where $L_{root}$ is the position of the root node and $L_{pa(w)}$ the position of the parent node;
the joint distribution of the appearance variables $b$ and positions $L$ is then:

$$p(b, L \mid z_t) = p(b \mid z_t)\, p(L \mid b) \tag{7}$$

which, expanded with (5) and (6), is:

$$p(b, L \mid z_t) = p(b_{root} \mid z_t)\, p(L_{root} \mid b_{root}) \prod_w p(b_w \mid b_{pa(w)}, z_t)\, p(L_w \mid L_{pa(w)}, b_w, b_{pa(w)}) \tag{8}$$
(5) merging the detection results of the trained single-target detector DPM and the Gist global feature into the prior model; with the global feature denoted $g$, the joint distribution is:

$$p(b, L, g, W, s \mid z_t) = p(b, L \mid z_t)\, p(g \mid b)\, p(W, s, c \mid b, L) \tag{9}$$

where the detector term is expressed as:

$$p(W, s, c \mid b, L) = \prod_{w,k} p(c_{wk} \mid b_w)\, p(W_{wk} \mid c_{wk}, L_w)\, p(s_{wk} \mid c_{wk}) \tag{10}$$

here $W_{wk}$ denotes the position of the k-th candidate window obtained with the single-target detector of target class $w$, $s_{wk}$ the score of that window, and $c_{wk}$ whether the k-th candidate window of target class $w$ is a correct detection (1 if correct, 0 otherwise);
(6) training the subtree model mainly comprises learning the tree structure and learning the related parameters; when the Chow-Liu algorithm performs prior-model structure learning, the correlation $\theta_{wt}$ between a consistency target pair and the scene, depicted in formula (3), changes the mutual information $S_w$ of the parent and child nodes in that target pair:

$$S_w = S_w \times (1 + \mathrm{sigm}(\theta_{wt})) \tag{11}$$
Then, completing the structure learning of the subtree prior model according to the maximum weight;
for the learning of model parameters, first, p (b) in formula (8)w|bpa(w)) The method comprises the steps of obtaining through counting symbiosis and consistency target pairs of targets and mutual information change; p (L)w|Lpa(w),bw,bpa(w)) Taking values according to the appearance of the parent-child nodes, co-occurrence of the common parent-child nodes, appearance of the child nodes and absence of the child nodes, and taking Gaussian distribution into consideration to obtain the values:
p (g | b) is estimated by the Gist global feature of each training image in equation (9)w) Specifically obtained by the following formula:
for global feature g, p (b) is estimated by adopting a logistic regression methodw|g);
integrating the corresponding detection results of the single base detector: first, the probability $p(c_{wk} \mid b_w)$ of a correct detection is closely tied to whether the target appears:

$$p(c_{wk} = 1 \mid b_w) = \begin{cases} 0, & b_w = 0 \\ \dfrac{\#\text{correct detections of class } w}{\#\text{annotations of class } w \text{ in the training set}}, & b_w = 1 \end{cases} \tag{14}$$

that is, when the target does not appear the correct-detection probability is 0, and when it appears the probability is the ratio of the number of correct detections to the total number of annotations of that target in the training set;
then, the position probability $p(W_{wk} \mid c_{wk}, L_w)$ of a detection window is a Gaussian distribution depending on the correct detection $c_{wk}$ and the position $L_w$ of target class $w$:

$$p(W_{wk} \mid c_{wk}, L_w) = \begin{cases} \mathcal{N}(W_{wk};\, L_w,\, \Lambda_w), & c_{wk} = 1 \\ \text{const}, & c_{wk} = 0 \end{cases} \tag{15}$$

where, when the window is a correct detection, $W_{wk}$ follows a Gaussian distribution with $\Lambda_w$ the variance of the predicted target position; when it is not, $W_{wk}$ does not depend on $L_w$ and can be expressed as a constant;
finally, the score probability $p(s_{wk} \mid c_{wk})$ of the base detector depends on the correct-detection result $c_{wk}$ and, by Bayes' rule, is expressed as:

$$p(s_{wk} \mid c_{wk}) = \frac{p(c_{wk} \mid s_{wk})\, p(s_{wk})}{p(c_{wk})} \tag{16}$$

where $p(c_{wk} \mid s_{wk})$ is estimated with a logistic regression method.
CN201610785155.XA 2016-08-31 2016-08-31 Multi-target detection method based on contextual information Expired - Fee Related CN106446933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610785155.XA CN106446933B (en) 2016-08-31 2016-08-31 Multi-target detection method based on contextual information


Publications (2)

Publication Number Publication Date
CN106446933A CN106446933A (en) 2017-02-22
CN106446933B true CN106446933B (en) 2019-08-02

Family

ID=58091496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610785155.XA Expired - Fee Related CN106446933B (en) 2016-08-31 2016-08-31 Multi-target detection method based on contextual information

Country Status (1)

Country Link
CN (1) CN106446933B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951574B (en) * 2017-05-03 2019-06-14 牡丹江医学院 A kind of information processing system and method based on computer network
CN107832795B (en) * 2017-11-14 2021-07-27 深圳码隆科技有限公司 Article identification method and system and electronic equipment
CN108062531B (en) * 2017-12-25 2021-10-19 南京信息工程大学 Video target detection method based on cascade regression convolutional neural network
CN109977738B (en) * 2017-12-28 2023-07-25 深圳Tcl新技术有限公司 Video scene segmentation judging method, intelligent terminal and storage medium
CN108363992B (en) * 2018-03-15 2021-12-14 南京钜力智能制造技术研究院有限公司 Fire early warning method for monitoring video image smoke based on machine learning
CN109241819A (en) * 2018-07-07 2019-01-18 西安电子科技大学 Based on quickly multiple dimensioned and joint template matching multiple target pedestrian detection method
CN110288629B (en) * 2019-06-24 2021-07-06 湖北亿咖通科技有限公司 Target detection automatic labeling method and device based on moving object detection
CN110334639B (en) * 2019-06-28 2021-08-10 北京精英系统科技有限公司 Device and method for filtering error detection result of image analysis detection algorithm
CN111079674B (en) * 2019-12-22 2022-04-26 东北师范大学 Target detection method based on global and local information fusion
CN111080639A (en) * 2019-12-30 2020-04-28 四川希氏异构医疗科技有限公司 Multi-scene digestive tract endoscope image identification method and system based on artificial intelligence
CN111814885B (en) * 2020-07-10 2021-06-22 云从科技集团股份有限公司 Method, system, device and medium for managing image frames
CN112052350B (en) * 2020-08-25 2024-03-01 腾讯科技(深圳)有限公司 Picture retrieval method, device, equipment and computer readable storage medium
CN112148267A (en) * 2020-09-30 2020-12-29 深圳壹账通智能科技有限公司 Artificial intelligence function providing method, device and storage medium
CN112395974B (en) * 2020-11-16 2021-09-07 南京工程学院 Target confidence correction method based on dependency relationship between objects
CN113138924B (en) * 2021-04-23 2023-10-31 扬州大学 Thread safety code identification method based on graph learning
CN112906696B (en) * 2021-05-06 2021-08-13 北京惠朗时代科技有限公司 English image region identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577832A (en) * 2012-07-30 2014-02-12 华中科技大学 People flow statistical method based on spatio-temporal context
CN104778466A (en) * 2015-04-16 2015-07-15 北京航空航天大学 Detection method combining various context clues for image focus region
CN104933735A (en) * 2015-06-30 2015-09-23 中国电子科技集团公司第二十九研究所 A real time human face tracking method and a system based on spatio-temporal context learning
CN105631895A (en) * 2015-12-18 2016-06-01 重庆大学 Temporal-spatial context video target tracking method combining particle filtering
CN105740891A (en) * 2016-01-27 2016-07-06 北京工业大学 Target detection method based on multilevel characteristic extraction and context model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Tree-Based Context Model for Object Recognition; M. J. Choi et al.; IEEE TPAMI; Dec. 31, 2012; entire document
Multi-moving-target tracking algorithm based on linear fitting; Li Tao et al.; Journal of Southwest China Normal University; May 31, 2015; vol. 40, no. 5; entire document

Also Published As

Publication number Publication date
CN106446933A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN106446933B (en) Multi-target detection method based on contextual information
CN108470332B (en) Multi-target tracking method and device
Benedek et al. Change detection in optical aerial images by a multilayer conditional mixed Markov model
Zhao et al. Saliency detection by multi-context deep learning
Zhou et al. Salient object detection via fuzzy theory and object-level enhancement
Khan et al. An efficient contour based fine-grained algorithm for multi category object detection
Venugopal Automatic semantic segmentation with DeepLab dilated learning network for change detection in remote sensing images
Jia et al. Visual tracking via coarse and fine structural local sparse appearance models
Alidoost et al. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image
CN103425996B (en) A kind of large-scale image recognition methods of parallel distributed
Shahab et al. How salient is scene text?
Schwalbe Concept embedding analysis: A review
CN106056627B (en) A kind of robust method for tracking target based on local distinctive rarefaction representation
Jiang et al. Multi-feature tracking via adaptive weights
Naseer et al. Multimodal Objects Categorization by Fusing GMM and Multi-layer Perceptron
CN110909678B (en) Face recognition method and system based on width learning network feature extraction
Li et al. Automatic annotation algorithm of medical radiological images using convolutional neural network
Wang et al. Crop pest detection by three-scale convolutional neural network with attention
Mazzamuto et al. Weakly supervised attended object detection using gaze data as annotations
CN110516638A (en) A kind of sign Language Recognition Method based on track and random forest
Poetro et al. Advancements in Agricultural Automation: SVM Classifier with Hu Moments for Vegetable Identification
Han Image object tracking based on temporal context and MOSSE
Lu et al. Recognizing human actions by two-level Beta process hidden Markov model
Lu et al. Visual tracking via probabilistic hypergraph ranking
Guo et al. Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190802

Termination date: 20210831
