CN114913472A - Infrared video pedestrian saliency detection method combining graph learning and probability propagation - Google Patents

Infrared video pedestrian saliency detection method combining graph learning and probability propagation

Info

Publication number
CN114913472A
Authority
CN
China
Prior art keywords
motion, pixel, significance, super
Prior art date
Legal status
Pending
Application number
CN202210167951.2A
Other languages
Chinese (zh)
Inventor
李露
郑玉
刘博
罗晓燕
周付根
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210167951.2A
Publication of CN114913472A


Classifications

    • G06F18/23213 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation with a fixed number of clusters (e.g. K-means clustering)
    • G06T7/11 — Image analysis; region-based segmentation
    • G06T7/136 — Segmentation; edge detection involving thresholding
    • G06T7/187 — Segmentation involving region growing, region merging or connected component labelling
    • G06T7/246 — Analysis of motion using feature-based methods (e.g. tracking of corners or segments)
    • G06T7/269 — Analysis of motion using gradient-based methods
    • G06T2207/10016 — Image acquisition modality: video; image sequence
    • G06T2207/10048 — Image acquisition modality: infrared image
    • G06T2207/30196 — Subject of image: human being; person

Abstract

The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which can automatically locate pedestrian regions in an infrared video and accurately distinguish pedestrians from the background. The method comprises the following steps. Step one: generating candidate regions based on Boolean maps. Step two: calculating the motion saliency of each frame image. Step three: constructing a multi-view spatio-temporal graph structure. Step four: constructing and solving an energy function combining graph self-learning and saliency propagation. Through these steps, the method accurately and robustly extracts the spatio-temporal saliency of infrared pedestrian targets from cluttered backgrounds and multiple complex motions, suppresses the background almost completely, and has practical application value in other image processing fields such as target segmentation, target tracking and target retrieval.

Description

Infrared video pedestrian saliency detection method combining graph learning and probability propagation
Technical Field
The invention relates to an infrared video pedestrian saliency detection method combining graph learning and probability propagation, and belongs to the field of computer vision and digital image processing. The method has broad application prospects in fields such as target segmentation, recognition and tracking.
Background
Image saliency detection, an important research topic in computer vision, has been studied extensively and with good results, while video saliency detection remains a subject requiring further study. Video saliency aims to automatically find and locate the parts of a given video that most attract a viewer's attention. As an effective preprocessing step, it has important applications in target tracking, retargeting, video compression, video summarization and the like. Existing video saliency methods are essentially designed for visible-light videos, yet visible-light images often fail under challenging conditions such as poor illumination, extreme weather and lighting changes. Infrared imaging detects targets by passively receiving their thermal radiation; it is unaffected by weather and lighting conditions and can effectively compensate for the shortcomings of visible-light images, so it plays an increasingly important role in military, security, surveillance and intelligent transportation applications. Pedestrians in particular tend to be highly salient in infrared images owing to their own thermal radiation, so research on infrared video saliency detection has practical significance for the development of fields such as intelligent transportation and autonomous driving. However, applying video saliency to infrared images remains challenging for existing approaches, and no dedicated method has yet been proposed.
Classical video saliency detection models are generally built on low-level spatial and motion features (such as color, texture and motion vector fields), heuristic rules (such as contrast and similarity) and prior knowledge (such as foreground and background priors), fused by simple mathematical operations. However, because they operate frame by frame, such direct fusion methods often struggle to obtain robust detection results in videos containing complex situations such as background clutter, feature-poor objects and multiple motions. Considering the links between video frames, researchers have proposed trajectory-based methods, which extend video frames into spatio-temporal tubes of points or region blocks and use a series of trajectory descriptors as the basis for saliency measurement. Methods of this type capture the temporal and spatial consistency of video well, but trajectory clustering requires careful selection of a suitable model, which brings high computational complexity. Graph-based methods use probabilistic models to propagate saliency values in both the spatial and temporal domains, reducing the computational load through connection constraints when constructing the graph. Building a robust graph model for video saliency detection involves three key techniques: an accurate initial saliency measure, the construction of the graph, and the design of the energy function.
However, the above video saliency methods are usually based on features such as the color and texture of visible-light images, and they tend to fail on infrared images that lack these features; they therefore cannot be applied to infrared images directly, and robust features capable of describing infrared targets must be designed for the intrinsic characteristics of infrared video. On the other hand, video saliency detection often has to cope with complex motion states, including camera motion and dynamic backgrounds, so it is essential to describe the difference between target and background with the limited features of infrared images and to capture the spatial and temporal continuity of infrared video. Research on these problems is therefore of great significance.
Disclosure of Invention
(1) Objects of the invention
Infrared video pedestrian detection has important applications in intelligent transportation, for example in pedestrian monitoring systems and vehicle-mounted pedestrian detection systems. Because infrared devices work around the clock, they can largely compensate for situations in which visible light is unusable, such as poor illumination and bad weather. Saliency detection can automatically locate the salient objects in an image, highlighting the objects while suppressing the background, and pedestrians in infrared images possess exactly this salient property thanks to their thermal radiation. However, owing to the low contrast, lack of color and texture features, and low signal-to-noise ratio of infrared images, existing saliency detection methods designed for visible light are difficult to apply to them directly. In addition, when an infrared video involves camera motion and complex backgrounds, the motion saliency of pedestrians is difficult to extract accurately.
To solve these problems, so that saliency can be applied effectively to infrared pedestrian video, highlighting pedestrians and separating them from the background, the invention provides an infrared video pedestrian saliency detection method combining graph learning and probability propagation. First, considering that region-based saliency detection usually yields more accurate edge information, preserves the consistency of object interiors and reduces computation, the method proposes a Boolean-map-based candidate target region generation strategy for infrared images to obtain a series of regions that may contain salient targets, together with an objectness descriptor that measures the likelihood that each generated region represents a complete object. Then, considering that the gradient of the optical-flow motion field is relatively robust and that the image border often reflects the motion direction of the background, the method proposes infrared pedestrian saliency features based on motion-gradient contrast and motion-direction difference, which are combined with the objectness descriptor to obtain the probability that each region belongs to the foreground or background. Next, to better capture the temporal and spatial correlation of pedestrians in infrared video, the method builds a graph structure over the infrared video, describes the correlation between graph nodes from the multiple views of gray level, edge and motion, and combines these with the temporal correlation to construct a spatio-temporal graph structure. Finally, to avoid the errors of a manually set graph structure and to optimize the saliency detection, the method constructs an energy function combining graph-structure self-learning with saliency propagation and obtains the optimal result by iterative solution.
(2) Technical scheme
The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which comprises the following specific steps:
Step one: generating candidate regions based on Boolean maps. First, superpixel segmentation is performed on each frame of the infrared video; then Boolean maps are constructed and concatenated across levels to obtain a series of three-dimensional regions; finally, an objectness descriptor is computed for the sub-regions of each three-dimensional region and normalized.
In step one, "superpixel segmentation is performed on each frame of the infrared video" is done as follows: the SLIC method clusters adjacent pixels of the t-th frame image $I_t$ with similar gray level and structure into a set of irregular, visually meaningful pixel blocks

$$SP_t = \{sp_{t,i}\}_{i=1}^{N_t}$$

where $sp_{t,i}$ and $N_t$ denote the $i$-th superpixel of $I_t$ and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its interior pixels:

$$C_{t,i} = \frac{1}{|sp_{t,i}|} \sum_{p \in sp_{t,i}} C_t(p)$$

where $C_t(p)$ is the gray value of pixel $p$ in the t-th frame image and $|sp_{t,i}|$ is the area of superpixel $sp_{t,i}$;
In step one, "Boolean maps are constructed and concatenated across levels to obtain a series of three-dimensional regions" is done as follows: the superpixel-segmented t-th infrared frame is thresholded with every integer from 255 down to 0, yielding a series of binary maps that form the Boolean map sequence $B_t = \{B_{t,255}, B_{t,254}, \ldots, B_{t,0}\}$:

$$B_{t,\theta} = \xi(SP_t, \theta)$$

where $\xi$ is the thresholding operation that labels the superpixels of $SP_t$ with gray level below the threshold $\theta$ as 0 and the others as 1, and $B_{t,\theta}$ is the Boolean map at threshold $\theta$. From the gray-level distribution of infrared images, $B_{t,255}$ is either a completely black image or contains a few white regions; as the threshold decreases, the white regions of $B_{t,\theta}$ keep growing until they all merge into an all-white image.

Then, starting from $B_{t,255}$, the connected regions are numbered from 1, and all connected regions are assigned distinct consecutive integer numbers. When a brand-new region that does not overlap any numbered region appears in a later $B_{t,\theta}$, it is assigned the next unused number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. All connected regions of the whole Boolean map sequence are numbered by this rule, and the regions sharing one number across the Boolean map layers form a three-dimensional region:

$$R_t = \{r_{t,k}\}_{k=1}^{N_r}$$

where $N_r$ is the number of three-dimensional regions obtained from image $I_t$, and each three-dimensional region consists of several sub-region layers

$$r_{t,k} = \{r_{t,k}^{l}\}_{l}$$

where $l$ is the layer in which each sub-region lies.
In step one, "an objectness descriptor is computed for the sub-regions of each three-dimensional region and normalized" is done as follows. First, the gray contrast between the superpixels inside each sub-region $r_{t,k}^{l}$ and their adjacent outside superpixels is computed; the larger this contrast, the more likely the region represents a complete object:

$$ctr(r_{t,k}^{l}) = \sum_{sp_{t,i} \in r_{t,k}^{l}} \; \sum_{sp_{t,j} \notin r_{t,k}^{l}} \left| C_{t,i} - C_{t,j} \right| \, \delta\big(sp_{t,j} \in \Omega_{t,i}\big)$$

where $sp_{t,i}$ is a superpixel inside $r_{t,k}^{l}$ and $sp_{t,j}$ is a superpixel not belonging to $r_{t,k}^{l}$; $\delta(\cdot)$ is an indicator function whose value is 1 when $sp_{t,j}$ belongs to the set $\Omega_{t,i}$ of superpixels adjacent to $sp_{t,i}$, and 0 otherwise.

Then the internal consistency of each sub-region $r_{t,k}^{l}$ is computed from the gray-level differences among the superpixels inside it; the smaller these differences, the more likely the sub-region represents a complete object.

Next, the gradient information contained on the boundary of each sub-region $r_{t,k}^{l}$ is computed; the more gradient the boundary contains, the more likely the sub-region is a complete object:

$$bnd(r_{t,k}^{l}) = \frac{1}{\left| r_{t,k}^{l} \right|} \sum_{p \in \partial r_{t,k}^{l}} br_{t,p}$$

where $br_{t,p}$ is the response of the gradient map of image frame $I_t$ at pixel $p$, $\partial r_{t,k}^{l}$ is the set of boundary pixels of the sub-region, and $|r_{t,k}^{l}|$ is its area.

The objectness descriptor of each sub-region combines the three cues above into a single score according to these rules, and the objectness scores of all sub-regions within each three-dimensional region are finally normalized, giving the objectness value $O(r_{t,k}^{l})$ of each sub-region. The resulting objectness descriptor describes the likelihood that each region represents a complete target region.
Step two: calculating the motion saliency of each frame image. First, the optical flow field of the video sequence is extracted; then the local-gradient-based motion saliency of each region is computed from the motion gradient of the image; next, the motion-direction-based motion saliency is computed by extracting the main background motion directions; finally, the motion saliency is combined with the objectness descriptor to obtain the probability that each superpixel belongs to the foreground/background.
In step two, "extracting the optical flow field of the video sequence" means computing, with the LDOF (Large Displacement Optical Flow) method, the forward motion vector field $F_t = [Fx_t; Fy_t]$, with horizontal $x$ and vertical $y$ components, from the current image frame $I_t$ to its next frame $I_{t+1}$; LDOF is also applied to the video in reverse to obtain the backward motion vector field $B_t = [Bx_t; By_t]$ from the current frame $I_t$ to its previous frame $I_{t-1}$.
In step two, "computing the local-gradient-based motion saliency of each region from the motion gradient of the image" is done as follows. First, the gradient probability of image frame $I_t$ is computed by summing the gradients of the motion field:

$$Mg_{t,p} = \left\| \nabla_x F_{t,p} \right\| + \left\| \nabla_y F_{t,p} \right\|$$

where $\nabla_x$ and $\nabla_y$ denote the gradient along the horizontal direction and the gradient along the vertical direction, respectively.

Then, for each sub-region $r_{t,k}^{l}$, the gradient probabilities of the interior superpixels are added to obtain the motion-gradient probability of the sub-region:

$$Sg(r_{t,k}^{l}) = \sum_{p \in r_{t,k}^{l}} Mg_{t,p}$$

where $Mg_{t,p}$ is the motion-gradient probability of pixel $p$ and $Sg(r_{t,k}^{l})$ is the motion-gradient-based motion saliency value of $r_{t,k}^{l}$.
In step two, "computing the motion-direction-based motion saliency by extracting the main background motion directions" is done as follows. First, the set $SP_t^{b}$ of superpixels along the four borders of image $I_t$ is extracted as background superpixels; K-means clustering then divides the motion vectors of the background superpixels into $K$ classes $\{c_k\}_{k=1}^{K}$, giving $K$ cluster centers $\{\mu_k\}_{k=1}^{K}$ as the main motion direction of each class. Classes holding less than 1/6 of the total number of samples are deleted, and the remaining cluster centers serve as the main background motion directions.

Then the difference between the motion vectors of the pixels inside each sub-region $r_{t,k}^{l}$ and the main background motion directions is computed; the smaller this difference, the more likely the sub-region belongs to the background. Here $F_{t,p}$ is the motion vector of pixel $p$, and the resulting value $Sd(r_{t,k}^{l})$ is the motion-direction-based motion saliency value of $r_{t,k}^{l}$.
In step two, "combining the motion saliency with the objectness descriptor to obtain the probability that each superpixel belongs to the foreground/background" is done as follows. First, the motion-gradient saliency values and objectness values of all candidate regions containing a superpixel are accumulated to obtain the foreground probability of that superpixel:

$$P_{t,i}^{f} = \sum_{r_{t,k}^{l} \ni\, sp_{t,i}} Sg(r_{t,k}^{l}) \cdot O(r_{t,k}^{l})$$

where $P_{t,i}^{f}$ is the foreground probability value of superpixel $sp_{t,i}$.

Then the motion-direction saliency values and objectness values of all candidate regions containing the superpixel are accumulated to obtain its background probability:

$$P_{t,i}^{b} = \sum_{r_{t,k}^{l} \ni\, sp_{t,i}} Sd(r_{t,k}^{l}) \cdot O(r_{t,k}^{l})$$

where $P_{t,i}^{b}$ is the background probability value of superpixel $sp_{t,i}$.
Step three: constructing the multi-view spatio-temporal graph structure. First, the spatial neighborhood relationship of each node of the graph structure is constructed, and spatial correlation matrices are built from the three aspects of gray level, edge and motion; then the temporal correlation matrix between the nodes of adjacent frames is constructed. Finally, the temporal correlation matrix is combined with the spatial correlation matrices to obtain the spatio-temporal correlation matrices of the different views.
The "spatial neighborhood relationship of each node in the graph structure is constructed" in step three, which is performed as follows: each super pixel is taken as a node of the graph, and the set of the super pixels adjacent to the super pixel and the super pixels adjacent to the super pixels is taken as a spatial neighborhood of the node.
The method for "constructing the spatial correlation matrix from three aspects of gray scale, edge and motion" in step three includes the following steps: firstly, the gray scale difference value of each node and the neighborhood thereof is calculated to measure the gray scale similarity of each node, thereby constructing a gray scale correlation matrix
Figure BDA0003517322200000065
The calculation formula is as follows:
Figure BDA0003517322200000066
wherein
Figure BDA0003517322200000067
Is node sp t,i The neighborhood of (a) is determined,
Figure BDA0003517322200000068
is node sp t,i And sp t,j The gray scale dependency of (a).
Then, gradient calculation edge correlation matrix of common edges owned by each adjacent node is measured
Figure BDA0003517322200000069
The stronger the gradient information separates adjacent superpixels, the same object cannot be attributed to, and the calculation formula is:
Figure BDA00035173222000000610
wherein bp is t,i→j Is a super pixel sp t,i Neutral sp t,j Set of adjacent pixels, bp t,j→i Is a super pixel sp t,j Neutral sp t,i Neighboring sets of pixels br t,p And br t,q The gradient values of pixels p and q respectively,
Figure BDA0003517322200000071
then is sp with the super pixel t,j A set of directly adjacent superpixels.
Figure BDA0003517322200000072
Is node sp t,i And sp t,j The edge correlation of (1).
In step three, "the temporal correlation matrix between the nodes of adjacent frames is constructed" is done as follows. First, the two adjacent frames $I_t$ and $I_{t+1}$ are mapped to each other through the forward and backward optical flows:

$$\widetilde{pos}_p = pos_p + f_{t,p}, \qquad \widetilde{pos}_q = pos_q + b_{t+1,q}$$

where $pos_p$ and $pos_q$ are the position coordinate vectors of pixel $p$ in image $I_t$ and of pixel $q$ in image $I_{t+1}$, $\widetilde{pos}_p$ is the position obtained by mapping $pos_p$ into the next frame $I_{t+1}$ with the forward flow $f_{t,p}$, and $\widetilde{pos}_q$ is the position obtained by mapping $pos_q$ into the previous frame $I_t$ with the backward flow $b_{t+1,q}$. Then the motion difference between each pixel and its mapped pixel is computed to obtain the forward and backward motion probabilities:

$$P_{t,p}^{F} = \exp\!\big( -\big\| f_{t,p} + b_{t+1,\widetilde{pos}_p} \big\| \big), \qquad P_{t+1,q}^{B} = \exp\!\big( -\big\| b_{t+1,q} + f_{t,\widetilde{pos}_q} \big\| \big)$$

where $P_{t,p}^{F}$ is the forward motion probability and $P_{t+1,q}^{B}$ is the backward motion probability. Finally, the forward and backward motion probabilities over the overlap between each superpixel and the superpixels of the adjacent frame are accumulated to measure their motion consistency, giving the temporal correlation matrix $A^{T}$:

$$A_{ij}^{T} = \sum_{p \in sp_{t,i},\; \widetilde{pos}_p \in sp_{t+1,j}} P_{t,p}^{F} \;+ \sum_{q \in sp_{t+1,j},\; \widetilde{pos}_q \in sp_{t,i}} P_{t+1,q}^{B}$$

where $A_{ij}^{T}$ is the correlation value between superpixel $sp_{t,i}$ and superpixel $sp_{t+1,j}$.
In step three, "the temporal correlation matrix is combined with the spatial correlation matrices to obtain the spatio-temporal correlation matrices of the different views" is done as follows. First, the superpixels of the two consecutive frames are concatenated to obtain a graph containing $N = N_t + N_{t+1}$ nodes; then each of the gray, edge and motion spatial correlation matrices is combined with the temporal correlation matrix, yielding the spatio-temporal correlation matrices $\{H_v\}_{v=1}^{N_v}$, $N_v$ being the number of views, which describe the temporal and spatial consistency of the nodes from different views:

$$H_v = \begin{bmatrix} A_t^{v} & A^{T} \\ (A^{T})^{\top} & A_{t+1}^{v} \end{bmatrix}, \qquad v \in \{g, e, m\}$$
step four: and constructing and solving an energy function combining graph self-learning and significance propagation. An energy function containing graph learning, significance propagation and joint learning items is constructed on the basis of a graph structure, and variables in the energy function are sequentially solved by using an alternative optimization method, so that an optimized graph correlation matrix and a significance detection result are obtained.
In step four, "an energy function containing graph learning, saliency propagation and joint learning terms is constructed on the basis of the graph structure" as follows. First, the initial saliency vectors of the two frames $I_t$ and $I_{t+1}$ are concatenated to describe the nodes of the graph: the initial foreground vector is $y^{f} = [P_{t}^{f}; P_{t+1}^{f}]$ and the initial background vector is $y^{b} = [P_{t}^{b}; P_{t+1}^{b}]$.

Then, since the times $t$ and $t+1$ do not appear in the optimization of the energy function, the time indices are omitted from the variables to simplify the formulas, and the energy function is constructed as:

$$\min_{s^{f},\, s^{b},\, W,\, \eta}\; \Big\| W - \sum_{v=1}^{N_v} \eta_v H_v \Big\|_F^2 + \lambda_1 \sum_{i,j=1}^{N} W_{ij} \big( s_i^{f} - s_j^{f} \big)^2 + \lambda_1 \sum_{i,j=1}^{N} W_{ij} \big( s_i^{b} - s_j^{b} \big)^2 + \lambda_2 \Big( \big\| s^{f} - y^{f} \big\|_2^2 + \big\| s^{b} - y^{b} \big\|_2^2 \Big)$$

$$\text{s.t.} \quad W\mathbf{1} = \mathbf{1},\; W \ge 0,\; \eta^{\top}\mathbf{1} = 1,\; \eta \ge 0$$

The first term is the graph learning term: $W$ is the optimized spatio-temporal correlation matrix to be learned, and the optimal combination weights $\eta = [\eta_1, \ldots, \eta_{N_v}]$ are learned so that the linear combination of the multi-view correlation matrices $\{H_v\}$ fits the best $W$. The two $\lambda_1$ terms are the joint learning terms: they use the learned correlation matrix to ensure that nodes with high temporal or spatial consistency keep similar saliency values, and they act on both graph learning and saliency propagation. The $\lambda_2$ data terms are the saliency propagation terms: the optimized foreground and background saliency probabilities $s^{f}$ and $s^{b}$ must stay consistent with the initial foreground and background probabilities, respectively. The constraints require that each row of $W$ sums to 1 with no negative element, and that the view weights $\eta$ sum to 1 and are non-negative; $\lambda_1$ and $\lambda_2$ are trade-off weights.
In step four, "the variables are solved in turn by an alternating optimization method" means that, while one variable is solved, the other variables are treated as constants and held fixed; the variable is then updated and the next one is solved, so that $s^{f}$, $s^{b}$, $W$ and $\eta$ are solved from the energy function in turn as follows.

First, hold $s^{b}$, $W$ and $\eta$ constant and compute $s^{f}$; the energy function simplifies to

$$\min_{s^{f}}\; \lambda_1 \sum_{i,j=1}^{N} W_{ij} \big( s_i^{f} - s_j^{f} \big)^2 + \lambda_2 \big\| s^{f} - y^{f} \big\|_2^2$$

Setting the derivative to zero yields the closed-form solution

$$s^{f} = \lambda_2 \big( \lambda_1 M + \lambda_2 I \big)^{-1} y^{f}, \qquad M = D_r + D_c - W - W^{\top}$$

where $D_r$ and $D_c$ are the diagonal matrices of the row sums and column sums of $W$.

Then hold $s^{f}$, $W$ and $\eta$ constant and compute $s^{b}$; the energy simplifies to the analogous problem in $s^{b}$, with solution

$$s^{b} = \lambda_2 \big( \lambda_1 M + \lambda_2 I \big)^{-1} y^{b}$$

Then hold $s^{f}$, $s^{b}$ and $\eta$ constant and compute $W$: for each row vector $w_i$ of $W$ the energy function simplifies to

$$\min_{w_i}\; \Big\| w_i - \sum_{v=1}^{N_v} \eta_v h_{v,i} \Big\|_2^2 + \lambda_1 \sum_{j=1}^{N} w_{ij} d_{ij} \qquad \text{s.t.}\; w_i \mathbf{1} = 1,\; w_i \ge 0$$

where $h_{v,i}$ is the $i$-th row of $H_v$ and $d_{ij} = (s_i^{f} - s_j^{f})^2 + (s_i^{b} - s_j^{b})^2$. This minimization is solved with the Matlab Optimization Toolbox to obtain $w_i$, and solving in turn for $i = 1, \ldots, N$ yields the optimized $W$.

Then hold $s^{f}$, $s^{b}$ and $W$ constant and compute $\eta$: the energy function first simplifies to

$$\min_{\eta}\; \Big\| W - \sum_{v=1}^{N_v} \eta_v H_v \Big\|_F^2 \qquad \text{s.t.}\; \eta^{\top}\mathbf{1} = 1,\; \eta \ge 0$$

To solve for $\eta$, $W$ and the $H_v$ are reshaped into column vectors $w$ and $h_v$, and the problem is rewritten as

$$\min_{\eta}\; \big\| w - H \eta \big\|_2^2 \qquad \text{s.t.}\; \eta^{\top}\mathbf{1} = 1,\; \eta \ge 0$$

where $H = [h_1, \ldots, h_{N_v}]$. This minimization is likewise solved with the Matlab Optimization Toolbox to obtain $\eta$.

Finally, $s^{f}$, $s^{b}$, $W$ and $\eta$ in the original energy function are updated to the solved values and the computation above is repeated; the values solved from the energy function reach a steady state after about 7 iterations. The converged $s^{f}$ and $s^{b}$ are then combined to obtain the final saliency value of each node, a high foreground probability together with a low background probability yielding a high saliency value.
Through the above steps, a good saliency detection result is obtained for infrared pedestrian videos: pedestrians are highlighted completely and the background is suppressed almost entirely, which has practical application value in other image processing fields such as target segmentation, target tracking and target retrieval.
(3) Compared with the prior art, the invention has the advantages that:
the method provides a candidate region extraction method based on the Boolean diagram, can more completely maintain the edge information and the structural information of the object, and can better highlight the whole remarkable target.
Meanwhile, the method provides a saliency description that combines motion saliency with the objectness descriptor and can fully describe the saliency of moving objects in a video from both the spatial and temporal perspectives. Compared with previous methods, it can handle more complex backgrounds and difficult scenes such as camera motion.
Finally, the method provides an optimization model combining graph-structure self-learning with saliency propagation. Unlike previous graph models that use a manually set correlation matrix, the correlation matrix here corrects itself during learning using the saliency information, while the saliency propagation is optimized with the continuously revised graph structure. Compared with previous methods, more robust and more accurate detection results are obtained.
Drawings
FIG. 1 is a block diagram of the detection method of the present invention.
Detailed Description
For a better understanding of the technical solution of the present invention, embodiments of the invention are further described below with reference to the accompanying drawings.
The flow of the invention is shown in FIG. 1. The invention provides an infrared video pedestrian saliency detection method combining graph learning and probability propagation, implemented in the following steps:
Step one: generating candidate regions based on Boolean maps.
First, superpixel segmentation is performed on each frame of the infrared video: the SLIC method clusters adjacent pixels of the t-th frame image $I_t$ with similar gray level and structure into a set of irregular, visually meaningful pixel blocks

$$SP_t = \{sp_{t,i}\}_{i=1}^{N_t}$$

where $sp_{t,i}$ and $N_t$ denote the $i$-th superpixel of $I_t$ and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its interior pixels:

$$C_{t,i} = \frac{1}{|sp_{t,i}|} \sum_{p \in sp_{t,i}} C_t(p)$$

where $C_t(p)$ is the gray value of pixel $p$ in the t-th frame image and $|sp_{t,i}|$ is the area of superpixel $sp_{t,i}$.
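As an illustration, a minimal Python sketch of this segmentation step (assuming scikit-image's SLIC; the superpixel count and compactness are illustrative parameters, not values fixed by the patent):

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_segment(frame, n_segments=300):
    """Cluster a single-channel infrared frame (H x W, uint8) into SLIC
    superpixels; return the label map, each superpixel's mean gray level
    C_{t,i}, and its area |sp_{t,i}|."""
    labels = slic(frame.astype(float), n_segments=n_segments,
                  compactness=0.1, channel_axis=None, start_label=0)
    areas = np.bincount(labels.ravel())
    sums = np.bincount(labels.ravel(), weights=frame.ravel().astype(float))
    mean_gray = sums / areas                 # C_{t,i}: mean of interior pixels
    return labels, mean_gray, areas
```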
Then the Boolean maps are constructed: the superpixel-segmented t-th infrared frame is thresholded with every integer from 255 down to 0, yielding a series of binary maps that form the Boolean map sequence $B_t = \{B_{t,255}, B_{t,254}, \ldots, B_{t,0}\}$:

$$B_{t,\theta} = \xi(SP_t, \theta)$$

where $\xi$ is the thresholding operation that labels the superpixels of $SP_t$ with gray level below the threshold $\theta$ as 0 and the others as 1, and $B_{t,\theta}$ is the Boolean map at threshold $\theta$. From the gray-level distribution of infrared images, $B_{t,255}$ is either a completely black image or contains a few white regions; as the threshold decreases, the white regions of $B_{t,\theta}$ keep growing until they all merge into an all-white image.

Adjacent Boolean maps are then concatenated on this basis to construct three-dimensional candidate regions. Starting from $B_{t,255}$, the connected regions are numbered from 1, and all connected regions are assigned distinct consecutive integer numbers. When a brand-new region that does not overlap any numbered region appears in a later $B_{t,\theta}$, it is assigned the next unused number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. All connected regions of the whole Boolean map sequence are numbered by this rule, and the regions sharing one number across the Boolean map layers form a three-dimensional region:

$$R_t = \{r_{t,k}\}_{k=1}^{N_r}$$

where $N_r$ is the number of three-dimensional regions obtained from image $I_t$, and each three-dimensional region consists of several sub-region layers

$$r_{t,k} = \{r_{t,k}^{l}\}_{l}$$

where $l$ is the layer in which each sub-region lies.
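The thresholding and cross-threshold numbering can be sketched as follows (a simplified sketch assuming SciPy connected-component labelling; the largest-overlap rule below stands in for the patent's largest-area inheritance rule):

```python
import numpy as np
from scipy import ndimage

def boolean_maps(labels, mean_gray):
    """Boolean map sequence B_t: threshold the superpixel gray levels
    at every integer from 255 down to 0."""
    sp_gray = mean_gray[labels]              # paint pixels with superpixel means
    return [sp_gray >= theta for theta in range(255, -1, -1)]

def track_regions(bmaps):
    """Number connected regions across thresholds: a fresh number for a
    region overlapping nothing, otherwise the number of the overlapped
    region (largest overlap used here as a stand-in for largest area)."""
    next_id = 1
    prev = np.zeros(bmaps[0].shape, dtype=int)
    tracked = []
    for bm in bmaps:
        cur, n = ndimage.label(bm)
        out = np.zeros_like(prev)
        for c in range(1, n + 1):
            mask = cur == c
            overlapped = prev[mask]
            ids, counts = np.unique(overlapped[overlapped > 0],
                                    return_counts=True)
            if ids.size == 0:
                out[mask] = next_id
                next_id += 1
            else:
                out[mask] = ids[np.argmax(counts)]
        tracked.append(out)
        prev = out
    return tracked
```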
Finally, constructing a similar physical operator to calculate the value of each sub-region; first, each subregion is calculated
Figure BDA0003517322200000113
The greater the gray contrast of the inner superpixel to its neighboring outer superpixel, the more likely the region is to represent a complete object. The calculation formula is as follows:
Figure BDA0003517322200000114
wherein sp t,i Is a sub-region
Figure BDA0003517322200000115
Inner super-pixel, sp t,j Is not belonging to
Figure BDA0003517322200000116
The super pixel of (2); delta (-) is an indicator function when a superpixel sp t,j Belong to and sp t,i Neighboring super-pixel sets
Figure BDA0003517322200000117
When the value is 1, otherwise, it is 0.
Then calculating each sub-region
Figure BDA0003517322200000118
The smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
Figure BDA0003517322200000119
then calculating each sub-region
Figure BDA00035173222000001110
The gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
Figure BDA0003517322200000121
wherein br t,p Is an image frame I t The response of the gradient map at pixel p,
Figure BDA0003517322200000122
is a sub-region
Figure BDA0003517322200000123
The set of boundary pixels of (1).
Figure BDA0003517322200000124
Is a sub-region
Figure BDA0003517322200000125
The area of (a).
According to the above rule, the calculation formula of the similarity descriptor is as follows:
Figure BDA0003517322200000126
and finally, normalizing the similarity operators of all the sub-regions in each three-dimensional region, wherein the calculation formula is as follows:
Figure BDA0003517322200000127
the resulting similarity operator describes the likelihood that each region can represent the complete target region.
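Two of the three cues can be sketched as follows (illustrative helper names; `neighbors` is an assumed adjacency structure mapping each superpixel to its adjacent superpixels):

```python
import numpy as np
from scipy import ndimage

def contrast_cue(inside_ids, mean_gray, neighbors):
    """Gray contrast between superpixels inside a sub-region and their
    adjacent superpixels outside it (larger => more object-like)."""
    inside = set(inside_ids)
    return sum(abs(mean_gray[i] - mean_gray[j])
               for i in inside for j in neighbors[i] if j not in inside)

def boundary_gradient_cue(region_mask, grad_map):
    """Accumulated gradient response on the region boundary, normalized
    by the region area (the normalization is an assumption)."""
    boundary = region_mask & ~ndimage.binary_erosion(region_mask)
    return grad_map[boundary].sum() / max(int(region_mask.sum()), 1)
```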
Step two: calculating the motion saliency of each frame image.
First, the optical flow field of the video sequence is extracted: the LDOF (Large Displacement Optical Flow) method computes the forward motion vector field $F_t = [Fx_t; Fy_t]$, with horizontal $x$ and vertical $y$ components, from the current image frame $I_t$ to its next frame $I_{t+1}$; applying LDOF to the video in reverse yields the backward motion vector field $B_t = [Bx_t; By_t]$ from the current frame $I_t$ to its previous frame $I_{t-1}$.
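A sketch of the flow extraction, with OpenCV's Farnebäck dense flow standing in for LDOF (the patent specifies LDOF; the parameter values are illustrative):

```python
import cv2

def flow_fields(frames):
    """Forward flow F_t: I_t -> I_{t+1} and backward flow B_t: I_t -> I_{t-1}
    for a list of 8-bit single-channel frames; each flow is H x W x 2 (x, y)."""
    fb = dict(pyr_scale=0.5, levels=3, winsize=15,
              iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    fwd = {t: cv2.calcOpticalFlowFarneback(frames[t], frames[t + 1], None, **fb)
           for t in range(len(frames) - 1)}
    bwd = {t: cv2.calcOpticalFlowFarneback(frames[t], frames[t - 1], None, **fb)
           for t in range(1, len(frames))}
    return fwd, bwd
```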
Then, calculating the motion significance of each region based on the local gradient according to the motion gradient of the image; first, image frame I is calculated by summing the gradients of the motion fields t The gradient probability of (2) is calculated as follows
Figure BDA0003517322200000128
Wherein
Figure BDA0003517322200000129
And
Figure BDA00035173222000001210
the sub-tables represent the graduations along the horizontal direction and the graduations along the vertical direction.
Then for each sub-region
Figure BDA00035173222000001211
The gradient probabilities of the internal superpixels are added to obtain the motion gradient probability of each subarea, and the calculation formula is as follows:
Figure BDA00035173222000001212
wherein Mg t,p Is the motion gradient probability of the pixel p,
Figure BDA00035173222000001213
then is
Figure BDA00035173222000001214
A motion saliency value based on a motion gradient.
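A sketch of the per-pixel motion-gradient term and its accumulation over superpixels (the exact per-pixel formula is published as an image; the sum of horizontal- and vertical-direction flow-gradient magnitudes is assumed here):

```python
import numpy as np

def motion_gradient(flow, labels, n_sp):
    """Mg per pixel from the flow-field gradients, then summed over each
    superpixel/sub-region."""
    du_dy, du_dx = np.gradient(flow[..., 0])     # horizontal component u
    dv_dy, dv_dx = np.gradient(flow[..., 1])     # vertical component v
    mg = np.hypot(du_dx, dv_dx) + np.hypot(du_dy, dv_dy)
    sg = np.bincount(labels.ravel(), weights=mg.ravel(), minlength=n_sp)
    return mg, sg
```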
Secondly, calculating the motion significance based on the motion direction by extracting the background motion main direction; firstly, extracting an image I t Four-edge superpixel set
Figure BDA00035173222000001215
As background super-pixel, then using K-means clustering method to divide the motion vector of background super-pixel into K classes
Figure BDA00035173222000001216
And accordingly obtaining K clustering centers
Figure BDA00035173222000001217
And the main motion direction of each class is used, the class with the total number ratio of the classes less than 1/6 is deleted, and the residual clustering center is used as the background main motion direction.
Then, each is calculatedSub-area
Figure BDA0003517322200000131
The difference between the inner pixel and the background main motion direction is more likely to belong to the background significant value, and the calculation formula is as follows:
Figure BDA0003517322200000132
wherein F t,p Is a motion vector for the pixel p,
Figure BDA0003517322200000133
then is
Figure BDA0003517322200000134
A motion saliency value based on a direction of motion.
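A sketch of the background-direction extraction and the per-region direction difference (assuming scikit-learn's KMeans; the value of K and the pixel-level clustering are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

def background_directions(flow, labels, border_ids, K=4):
    """Cluster motion vectors of the four-border superpixels; keep centers
    of clusters holding at least 1/6 of the samples."""
    samples = flow[np.isin(labels, border_ids)]          # (n, 2) vectors
    km = KMeans(n_clusters=K, n_init=10).fit(samples)
    keep = [k for k in range(K) if np.mean(km.labels_ == k) >= 1 / 6]
    return km.cluster_centers_[keep]

def direction_difference(flow, region_mask, centers):
    """Mean distance of a sub-region's pixel motions to the nearest
    background direction; small values indicate background-like motion."""
    v = flow[region_mask]                                # (n, 2)
    d = np.linalg.norm(v[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).mean()
```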
Finally, combining the motion significance with the similarity operator to obtain the probability that each super pixel belongs to the foreground/background; firstly, motion gradient significant values and analog values of the superpixel in all candidate regions are accumulated to obtain the foreground probability of the superpixel, and the calculation formula is as follows:
Figure BDA0003517322200000135
Figure BDA0003517322200000136
then it is a super pixel sp t,i Foreground probability value of (2).
Then, the motion direction significant value and the similarity value of the superpixel in all the candidate areas are accumulated to obtain the background probability of the superpixel, and the calculation formula is as follows:
Figure BDA0003517322200000137
Figure BDA0003517322200000138
then it is a super pixel sp t,i The background probability value of (2).
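The accumulation into per-superpixel probabilities can be sketched as follows (the region record layout and the product combination of motion and objectness values are assumptions):

```python
import numpy as np

def superpixel_probabilities(regions, n_sp):
    """Accumulate region-level cues into foreground/background probabilities.
    Each region record holds its superpixel ids and its Sg, Sd, objectness."""
    pf, pb = np.zeros(n_sp), np.zeros(n_sp)
    for reg in regions:
        for i in reg["sp_ids"]:
            pf[i] += reg["sg"] * reg["obj"]      # motion gradient -> foreground
            pb[i] += reg["sd"] * reg["obj"]      # motion direction -> background
    pf /= pf.max() + 1e-12                       # normalize to [0, 1]
    pb /= pb.max() + 1e-12
    return pf, pb
```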
Step three: constructing the multi-view spatio-temporal graph structure.
First, the spatial neighborhood relationship of each node of the graph structure is constructed: each superpixel is taken as a node of the graph, and the set consisting of the superpixels adjacent to it, together with the superpixels adjacent to those, is taken as the node's spatial neighborhood $\mathcal{N}_{t,i}$.
Then constructing a multi-view spatial correlation matrix; firstly, the gray scale difference value of each node and the neighborhood thereof is calculated to measure the gray scale similarity of each node, thereby constructing a gray scale correlation matrix
Figure BDA0003517322200000139
The calculation formula is as follows:
Figure BDA00035173222000001310
wherein
Figure BDA00035173222000001311
Is node sp t,i The neighborhood of (a) is determined,
Figure BDA00035173222000001312
is node sp t,i And sp t,j The gray scale dependency of (a).
Then, gradient calculation edge correlation matrix of common edges owned by each adjacent node is measured
Figure BDA00035173222000001313
The stronger the gradient information separates adjacent superpixels, the same object cannot be attributed to, and the calculation formula is:
Figure BDA0003517322200000141
wherein bp is t,i→j Is a super pixel sp t,i Neutral sp t,j Set of adjacent pixels, bp t,j→i Is a super pixel sp t,j Neutral sp t,i Neighboring sets of pixels br t,p And br t,q The gradient values of the pixels p and q respectively,
Figure BDA0003517322200000142
then is sp with the super pixel t,j A set of directly adjacent superpixels.
Figure BDA0003517322200000143
Is node sp t,i And sp t,j The edge correlation of (1).
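The gray-correlation construction can be sketched as follows (the Gaussian form and scale are assumptions; `neighbors` maps each node to its 2-ring spatial neighborhood):

```python
import numpy as np

def gray_correlation(mean_gray, neighbors, sigma=10.0):
    """Sparse gray correlation matrix A^g over the spatial neighborhood."""
    n = len(mean_gray)
    A = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            A[i, j] = np.exp(-abs(mean_gray[i] - mean_gray[j]) / sigma)
    return A
```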
Then constructing a time correlation matrix between adjacent frame nodes; firstly, two adjacent continuous frames I t And I t+1 The forward and backward optical flows are mapped to each other, respectively, and the calculation formula is as follows:
Figure BDA0003517322200000144
pos p ,pos q respectively, is an image I t Middle pixel p and image I t+1 The position coordinate vector of the middle pixel q,
Figure BDA0003517322200000145
is to mix pos with p Using f t,p Mapping to the next frame I t+1 The position vector of (a) is determined,
Figure BDA0003517322200000146
then is the pos q By using b t+1,q Mapping to previous frame I t Is determined. Then, respectively calculating the motion difference between each pixel and the mapping pixel thereof to obtain the forward and backward motion probabilities:
Figure BDA0003517322200000147
Figure BDA0003517322200000148
wherein
Figure BDA0003517322200000149
In order to be the probability of a forward motion,
Figure BDA00035173222000001410
is the backward motion probability. Finally, the forward and backward motion probabilities in the super-pixel overlapping part of the super-pixel and its adjacent frame are accumulated to measure their motion consistency, thereby constructing a time correlation matrix
Figure BDA00035173222000001411
The calculation formula is as follows:
Figure BDA00035173222000001412
Figure BDA00035173222000001413
is a matrix
Figure BDA00035173222000001414
Middle super pixel sp t,i Neutral sp t+1,j The correlation value between them.
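The forward pass of the flow-consistency accumulation can be sketched as follows (the exponential form is an assumption; the backward pass is accumulated symmetrically):

```python
import numpy as np

def temporal_correlation(fwd, bwd, labels_t, labels_t1, n_t, n_t1):
    """Temporal correlation A^T between superpixels of adjacent frames
    from forward/backward flow consistency (forward pass only)."""
    H, W = labels_t.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xm = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, W - 1)
    ym = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, H - 1)
    consistency = fwd + bwd[ym, xm]          # ~0 where the flows agree
    p_fwd = np.exp(-np.linalg.norm(consistency, axis=2))
    A = np.zeros((n_t, n_t1))
    np.add.at(A, (labels_t, labels_t1[ym, xm]), p_fwd)
    return A
```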
Finally, combining the time correlation matrix with the space correlation matrix to obtain space-time correlation matrixes of different visual angles; firstly, serially connecting super pixels of two continuous frames of images to obtain a super pixel containing N-N t +N t+1 A graph of individual nodes; then, spatial correlation matrixes of gray scale, edge and motion are respectively combined with the temporal correlation matrix to obtain a space-time correlation matrix which describes the consistency of the nodes on time and space in different angles
Figure BDA0003517322200000151
N v The calculation formula is the number of the angles as follows:
Figure BDA0003517322200000152
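The block assembly of one view's spatio-temporal matrix can be sketched as (the block layout is an assumption):

```python
import numpy as np

def spatiotemporal_matrix(A_t, A_t1, A_temporal):
    """H_v for the concatenated graph of N_t + N_{t+1} nodes: spatial
    blocks on the diagonal, the temporal block off-diagonal."""
    return np.block([[A_t,          A_temporal],
                     [A_temporal.T, A_t1]])
```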
step four: and constructing and solving an energy function combining graph self-learning and significance propagation.
Firstly, two frames of images I are constructed t And I t+1 The cascaded initial significance vector describes each node of the graph, and the calculation formula is as follows:
Figure BDA0003517322200000153
then, since the times t and t +1 do not appear in the optimization process of the energy function, in order to simplify the formula, the time indices t and t +1 are omitted from the variables, so that the energy function is constructed as follows:
Figure BDA0003517322200000154
s.t.W1=1,W≥0 n ,η1=1,η≥0
wherein the first item in the first row is a graph learning item, W is an optimized space-time correlation matrix to be learned, and the optimal combination weight is obtained by learning the correlation matrix of multiple visual angles
Figure BDA0003517322200000155
To pair
Figure BDA0003517322200000156
Linear addition fits the best W. The second row of data items is then a significance propagation item,
Figure BDA0003517322200000157
and
Figure BDA0003517322200000158
then the optimized foreground and background probabilities of significance should be kept consistent with the initial foreground and background probabilities, respectively. The last two items in the first row are combined learning items, and the optimal correlation matrix obtained by learning is used for ensuring the learning in time or spaceNodes with high consistency should maintain similar significance values, which play a role in both graph learning and significance propagation. The third row is an optimization constraint that requires that the sum of the vectors of rows of W be 1 and that there cannot be elements less than zero. Meanwhile, the sum of the view angle weights eta is ensured to be 1 and not less than 0.
On the basis, the method utilizes an alternative optimization method to sequentially solve the variables in the method; first of all maintain
Figure BDA0003517322200000159
W and eta are constants, calculating
Figure BDA00035173222000001510
The energy function is simplified as:
Figure BDA00035173222000001511
solving by derivation
Figure BDA00035173222000001512
The calculation formula of (2) is as follows:
Figure BDA00035173222000001513
then hold
Figure BDA00035173222000001514
W and eta are constants, calculating
Figure BDA00035173222000001515
The energy function is simplified as:
Figure BDA0003517322200000161
the calculation formula is as follows:
Figure BDA0003517322200000162
then hold
Figure BDA0003517322200000163
And eta is a constant, and W is calculated by the following calculation method: first for each row vector W of W i The energy function is simplified to:
Figure BDA0003517322200000164
the minimization problem is solved by utilizing an Optimization toolbox of a Matlab program to obtain w i And sequentially solving the pairs from i to N to obtain the optimized W.
Then hold
Figure BDA0003517322200000165
And W is a constant, and eta is calculated by first simplifying the energy function to
Figure BDA0003517322200000166
To solve η, W and H v Conversion to column vectors
Figure BDA0003517322200000167
And
Figure BDA0003517322200000168
the equation is then rewritten as:
Figure BDA0003517322200000169
wherein
Figure BDA00035173222000001610
The minimization problem can also be solved to η using the Optimization toolkit of Matlab program.
Finally, theIn function of original energy
Figure BDA00035173222000001611
W and η are updated to the solved values and the above calculation is repeated. The value solved by the energy function will reach steady state after 7 repetitions. At this time, the final product will be
Figure BDA00035173222000001612
And combining the values to obtain a final significance value, wherein the calculation method comprises the following steps:
Figure BDA00035173222000001613
Figure BDA00035173222000001614
the final significance value is obtained.
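Two sub-steps of the alternating optimization can be sketched as follows (the trade-off weights and the projected-gradient stand-in for the Matlab toolbox solve are assumptions):

```python
import numpy as np

def propagate_saliency(W, y, lam1=1.0, lam2=1.0):
    """Closed-form solution of
    min lam1 * sum_ij W_ij (s_i - s_j)^2 + lam2 * ||s - y||^2."""
    Dr, Dc = np.diag(W.sum(axis=1)), np.diag(W.sum(axis=0))
    M = Dr + Dc - W - W.T                    # quadratic-form matrix
    return np.linalg.solve(lam1 * M + lam2 * np.eye(len(y)), lam2 * y)

def fit_view_weights(W, Hs, steps=200, lr=1e-3):
    """Fit simplex weights eta with sum_v eta_v H_v ~ W by projected
    gradient (clip-and-renormalize approximates the simplex projection)."""
    A = np.stack([H.ravel() for H in Hs], axis=1)    # columns h_v
    w = W.ravel()
    eta = np.full(len(Hs), 1.0 / len(Hs))
    for _ in range(steps):
        grad = A.T @ (A @ eta - w)
        eta = np.clip(eta - lr * grad / (np.linalg.norm(grad) + 1e-12), 0, None)
        eta /= eta.sum() + 1e-12
    return eta
```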

Claims (14)

1. An infrared video pedestrian saliency detection method combining graph learning and probability propagation, characterized by comprising the following steps:

step one: generating candidate regions based on Boolean maps: first, performing superpixel segmentation on each frame of the infrared video; then constructing Boolean maps and concatenating them across levels to obtain a series of three-dimensional regions; finally, computing an objectness descriptor for the sub-regions of each three-dimensional region and normalizing it;

step two: calculating the motion saliency of each frame image: first, extracting the optical flow field of the video sequence; then computing the local-gradient-based motion saliency of each region from the motion gradient of the image; next, computing the motion-direction-based motion saliency by extracting the main background motion directions; finally, combining the motion saliency with the objectness descriptor to obtain the probability that each superpixel belongs to the foreground/background;

step three: constructing a multi-view spatio-temporal graph structure: first, constructing the spatial neighborhood relationship of each node of the graph structure and building spatial correlation matrices from the three aspects of gray level, edge and motion; then constructing the temporal correlation matrix between the nodes of adjacent frames; finally, combining the temporal correlation matrix with the spatial correlation matrices to obtain the spatio-temporal correlation matrices of the different views;

step four: constructing and solving an energy function combining graph self-learning and saliency propagation: constructing, on the basis of the graph structure, an energy function containing graph learning, saliency propagation and joint learning terms, and solving its variables in turn by an alternating optimization method, thereby obtaining the optimized graph correlation matrix and the saliency detection result.
2. The infrared video pedestrian saliency detection method combining graph learning and probability propagation according to claim 1, characterized in that: "performing superpixel segmentation on each frame of the infrared video" in step one is done as follows: the SLIC algorithm clusters adjacent pixels of the t-th frame image $I_t$ with similar gray level and structure into a set of irregular, visually meaningful pixel blocks $SP_t = \{sp_{t,i}\}_{i=1}^{N_t}$, where $sp_{t,i}$ and $N_t$ denote the $i$-th superpixel of $I_t$ and the total number of superpixels, respectively; the gray level of each superpixel is the mean gray level of its interior pixels:

$$C_{t,i} = \frac{1}{|sp_{t,i}|} \sum_{p \in sp_{t,i}} C_t(p)$$

where $C_t(p)$ is the gray value of pixel $p$ in the t-th frame image and $|sp_{t,i}|$ is the area of superpixel $sp_{t,i}$.
3. The infrared video pedestrian saliency detection method combining graph learning and probability propagation according to claim 1, characterized in that: "constructing Boolean maps and concatenating them across levels to obtain a series of three-dimensional regions" in step one is done as follows: the superpixel-segmented t-th infrared frame is thresholded with every integer from 255 down to 0, yielding a series of binary maps that form the Boolean map sequence $B_t = \{B_{t,255}, B_{t,254}, \ldots, B_{t,0}\}$:

$$B_{t,\theta} = \xi(SP_t, \theta)$$

where $\xi$ is the thresholding operation that labels the superpixels of $SP_t$ with gray level below the threshold $\theta$ as 0 and the others as 1, and $B_{t,\theta}$ is the Boolean map at threshold $\theta$; from the gray-level distribution of infrared images, $B_{t,255}$ is either a completely black image or contains a few white regions, and as the threshold decreases the white regions of $B_{t,\theta}$ keep growing until they all merge into an all-white image;

then, starting from $B_{t,255}$, the connected regions are numbered from 1 and all connected regions are assigned distinct consecutive integer numbers; when a brand-new region that does not overlap any numbered region appears in a later $B_{t,\theta}$, it is assigned the next unused number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area; all connected regions of the whole Boolean map sequence are numbered by this rule, and the regions sharing one number across the Boolean map layers form a three-dimensional region $R_t = \{r_{t,k}\}_{k=1}^{N_r}$, where $N_r$ is the number of three-dimensional regions obtained from image $I_t$, each three-dimensional region consisting of several sub-region layers $r_{t,k} = \{r_{t,k}^{l}\}_{l}$, where $l$ is the layer in which each sub-region lies.
4. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: calculating similarity operators for sub-regions of each three-dimensional region as described in step one, andnormalization ", the calculation method is as follows: first, each subregion is calculated
Figure FDA0003517322190000023
The greater the gray contrast of the inner superpixel to its neighboring outer superpixel, the more likely the region is to represent a complete object. The calculation formula is as follows:
Figure FDA0003517322190000024
wherein sp t,i Is a sub-region
Figure FDA0003517322190000025
Inner super-pixel, sp t,j Is not subject to
Figure FDA0003517322190000026
The super pixel of (2); delta (-) is an indicator function when a superpixel sp t,j Belong to and sp t,i Neighboring super-pixel sets
Figure FDA0003517322190000027
When the value is 1, otherwise, it is 0.
Then calculating each sub-region
Figure FDA0003517322190000031
The smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
Figure FDA0003517322190000032
then calculating each sub-region
Figure FDA0003517322190000033
The gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
Figure FDA0003517322190000034
where br_{t,p} is the response of the gradient map of image frame I_t at pixel p, ∂r_{t,k}^l is the set of boundary pixels of the sub-region r_{t,k}^l, and |r_{t,k}^l| is its area.
Combining the above measures, the similarity operator is computed as follows:
[equation image in source]
Finally, the similarity operators of all sub-regions within each three-dimensional region are normalized, calculated as follows:
[equation image in source]
The resulting normalized similarity operator describes the likelihood that each sub-region represents the complete target region.
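The three measures can be combined in code roughly as follows. Since the exact formulas exist only as equation images in the source, the contrast, uniformity and boundary-gradient expressions below are plausible stand-ins, and all function and argument names are hypothetical.

import numpy as np

def similarity_operator(sub_gray, outer_gray, boundary_grad, area):
    # sub_gray: gray values of the superpixels inside the sub-region
    # outer_gray: gray values of the neighboring superpixels outside it
    # boundary_grad: gradient responses br_{t,p} on the boundary pixels
    # area: number of pixels in the sub-region
    contrast = np.abs(sub_gray.mean() - outer_gray.mean())   # inner/outer gray contrast
    uniformity = 1.0 / (1.0 + sub_gray.var())                # low internal variance -> high score
    edge = boundary_grad.sum() / area                        # boundary gradient energy per unit area
    return contrast * uniformity * edge

def normalize_per_region(scores):
    # Normalize the similarity operators of all sub-regions of one 3-D region
    s = np.asarray(scores, dtype=float)
    total = s.sum()
    return s / total if total > 0 else s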
5. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "extracting the optical flow field of the video sequence" in step two refers to computing, with the LDOF (Large Displacement Optical Flow) method, the forward motion vectors F_t = [Fx_t; Fy_t] in both the horizontal x and vertical y directions between the current image frame I_t and its next frame I_{t+1}, and computing the video in reverse with LDOF to obtain the backward motion vectors B_t = [Bx_t; By_t] from the current image frame I_t to its previous frame I_{t-1}.
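As an illustration, the sketch below extracts dense forward and backward flow fields with OpenCV. LDOF itself is not bundled with OpenCV, so cv2.calcOpticalFlowFarneback serves purely as a stand-in, and the frame variables are assumed to be single-channel grayscale arrays.

import cv2

def forward_backward_flow(frame_t, frame_next, frame_prev):
    # Forward flow F_t: I_t -> I_{t+1}; channels hold [Fx_t, Fy_t]
    F = cv2.calcOpticalFlowFarneback(frame_t, frame_next, None,
                                     0.5, 3, 15, 3, 5, 1.2, 0)
    # Backward flow B_t: I_t -> I_{t-1}; channels hold [Bx_t, By_t]
    B = cv2.calcOpticalFlowFarneback(frame_t, frame_prev, None,
                                     0.5, 3, 15, 3, 5, 1.2, 0)
    return F, B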
6. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "calculating the motion saliency of each region based on the local gradient, according to the motion gradient of the image" described in step two is computed as follows: first, the gradient probability of image frame I_t is obtained by summing the gradients of the motion field, calculated as follows:
[equation image in source]
where ∂F_t/∂x and ∂F_t/∂y denote the gradients of the motion field along the horizontal and vertical directions, respectively.
Then, for each sub-region r_{t,k}^l, the gradient probabilities of its internal superpixels are summed to obtain the motion-gradient probability of each sub-region, calculated as follows:
[equation image in source]
where Mg_{t,p} is the motion-gradient probability of pixel p, and the resulting sum is the motion-gradient-based motion saliency value of the sub-region r_{t,k}^l.
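A minimal Python sketch of this step follows; the per-pixel gradient-magnitude response and the normalization into a probability map are assumptions consistent with the prose, since the formulas are equation images in the source.

import numpy as np

def motion_gradient_saliency(flow, region_masks):
    # Per-pixel motion-gradient response: gradient magnitude of both flow channels
    dux, duy = np.gradient(flow[..., 0])
    dvx, dvy = np.gradient(flow[..., 1])
    mg = np.sqrt(dux**2 + duy**2) + np.sqrt(dvx**2 + dvy**2)
    mg = mg / (mg.sum() + 1e-12)           # normalize into a gradient probability Mg_{t,p}
    # Motion-gradient saliency of each sub-region: sum of Mg over its pixels
    return [mg[m].sum() for m in region_masks]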
7. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the motion saliency based on motion direction in step two is calculated by extracting the dominant background motion directions, as follows: first, the superpixels along the four borders of image I_t are extracted as background superpixels; the motion vectors of the background superpixels are then divided into K classes by K-means clustering, yielding K cluster centers that serve as the dominant motion direction of each class; classes whose share of the total number is less than 1/6 are deleted, and the remaining cluster centers are taken as the dominant background motion directions.
Then, the difference between the inner pixels of each sub-region r_{t,k}^l and the dominant background motion directions is calculated; the smaller this difference, the more likely the sub-region belongs to the background. The calculation formula is as follows:
[equation image in source]
where F_{t,p} is the motion vector of pixel p, and the resulting value is the motion-direction-based motion saliency of the sub-region r_{t,k}^l.
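A minimal sketch of this clustering step using scikit-learn's KMeans follows; the cluster count K = 4, the Euclidean distance measure, and the function names are assumptions (the 1/6 pruning ratio follows the claim).

import numpy as np
from sklearn.cluster import KMeans

def background_directions(border_vectors, K=4):
    # Cluster the motion vectors of the four-border superpixels into K classes
    km = KMeans(n_clusters=K, n_init=10).fit(border_vectors)
    centers = []
    for k in range(K):
        if np.mean(km.labels_ == k) >= 1.0 / 6.0:   # delete classes under a 1/6 share
            centers.append(km.cluster_centers_[k])
    return np.array(centers)                         # dominant background motion directions

def direction_difference(region_vectors, bg_dirs):
    # Deviation of each pixel's motion vector from the nearest dominant
    # background direction, accumulated over the sub-region
    d = np.linalg.norm(region_vectors[:, None, :] - bg_dirs[None, :, :], axis=2)
    return d.min(axis=1).sum()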
8. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "combining the motion saliency and the similarity operator to obtain the probability of each superpixel belonging to the foreground/background" described in step two is computed as follows: first, the motion-gradient saliency values and similarity values of the superpixel over all candidate regions containing it are accumulated to obtain its foreground probability, calculated as follows:
[equation image in source]
The result is the foreground probability value of superpixel sp_{t,i}.
Then, the motion-direction saliency values and similarity values of the superpixel over all candidate regions are accumulated to obtain its background probability, calculated as follows:
[equation image in source]
The result is the background probability value of superpixel sp_{t,i}.
9. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "constructing the spatial neighborhood relationship of each node of the graph structure" in step three is performed as follows: each superpixel is taken as a node of the graph, and the spatial neighborhood of a node is the set consisting of its adjacent superpixels together with the superpixels adjacent to those, i.e., its two-hop neighbors.
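A minimal sketch of this two-hop neighborhood, assuming a precomputed superpixel adjacency structure:

def two_hop_neighborhood(adj, i):
    # adj: dict mapping each superpixel id to the set of directly adjacent ids
    hood = set(adj[i])
    for j in adj[i]:
        hood |= adj[j]       # add neighbors of neighbors
    hood.discard(i)          # a node is not its own neighbor
    return hood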
10. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "constructing spatial correlation matrices from the three aspects of gray scale, edge and motion" in step three is computed as follows: first, the gray difference between each node and its neighborhood is calculated to measure the gray similarity of each node, thereby constructing the gray correlation matrix, calculated as follows:
[equation image in source]
where N(sp_{t,i}) denotes the neighborhood of node sp_{t,i}, and the corresponding matrix entry is the gray correlation between nodes sp_{t,i} and sp_{t,j}.
Then, the gradient along the common edge shared by each pair of adjacent nodes is measured to compute the edge correlation matrix; the stronger the gradient information between adjacent superpixels, the less likely they belong to the same object. The calculation formula is as follows:
[equation image in source]
where bp_{t,i→j} is the set of pixels in superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels in superpixel sp_{t,j} adjacent to sp_{t,i}, br_{t,p} and br_{t,q} are the gradient values at pixels p and q respectively, N(sp_{t,j}) is the set of superpixels directly adjacent to sp_{t,j}, and the resulting matrix entry is the edge correlation between nodes sp_{t,i} and sp_{t,j}.
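The two matrices can be sketched as follows; the Gaussian affinities and the bandwidth parameters sigma are assumptions standing in for the equation-image formulas.

import numpy as np

def gray_correlation(sp_gray, neighborhoods, sigma=10.0):
    # neighborhoods[i]: spatial neighborhood of node i (claim 9)
    n = len(sp_gray)
    A = np.zeros((n, n))
    for i, hood in enumerate(neighborhoods):
        for j in hood:
            A[i, j] = np.exp(-abs(sp_gray[i] - sp_gray[j])**2 / (2 * sigma**2))
    return A

def edge_correlation(shared_edge_grads, adjacency, sigma=1.0):
    # shared_edge_grads[(i, j)] (i < j): gradient values on the pixels
    # bordering superpixels i and j; adjacency[i]: directly adjacent nodes
    n = len(adjacency)
    A = np.zeros((n, n))
    for i, hood in enumerate(adjacency):
        for j in hood:
            g = np.mean(shared_edge_grads[tuple(sorted((i, j)))])
            A[i, j] = np.exp(-g**2 / (2 * sigma**2))   # strong edge -> low correlation
    return A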
11. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "constructing the temporal correlation matrix between nodes of adjacent frames" in step three is performed as follows: first, the pixels of two consecutive frames I_t and I_{t+1} are mapped to each other using the forward and backward optical flows, respectively, calculated as follows:
[equation image in source]
where pos_p and pos_q are the position coordinate vectors of pixel p in image I_t and pixel q in image I_{t+1}, respectively; the former is mapped with the forward flow f_{t,p} to a position in the next frame I_{t+1}, and the latter with the backward flow b_{t+1,q} to a position in the previous frame I_t. Then the motion difference between each pixel and its mapped pixel is calculated to obtain the forward and backward motion probabilities:
[equation images in source]
where the two resulting quantities are the forward motion probability and the backward motion probability, respectively. Finally, the forward and backward motion probabilities over the overlapping part of a superpixel and the superpixels of its adjacent frame are accumulated to measure their motion consistency, thereby constructing the temporal correlation matrix, calculated as follows:
[equation image in source]
The entry of this matrix indexed by (i, j) is the correlation value between superpixel sp_{t,i} and superpixel sp_{t+1,j}.
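The forward/backward consistency check behind this claim can be sketched as follows: each pixel is warped by the forward flow, the backward flow is sampled at the warped position, and the round-trip error is turned into a motion probability. The Gaussian form and sigma are assumptions; the patent's exact formula is an equation image.

import numpy as np

def forward_motion_probability(F, B_next, sigma=1.0):
    # F: flow I_t -> I_{t+1} (HxWx2, x then y); B_next: flow I_{t+1} -> I_t
    h, w = F.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xm = np.clip((xs + F[..., 0]).round().astype(int), 0, w - 1)   # warped x
    ym = np.clip((ys + F[..., 1]).round().astype(int), 0, h - 1)   # warped y
    err = F + B_next[ym, xm]              # a consistent round trip cancels to zero
    d = np.linalg.norm(err, axis=2)
    return np.exp(-d**2 / (2 * sigma**2))  # high probability for consistent motion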
12. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "combining the temporal correlation matrix and the spatial correlation matrices to obtain spatio-temporal correlation matrices of different view angles" in step three is performed as follows: first, the superpixels of two consecutive frames are concatenated to obtain a graph containing N = N_t + N_{t+1} nodes; then the spatial correlation matrices of gray scale, edge and motion are each combined with the temporal correlation matrix to obtain spatio-temporal correlation matrices H_v that describe the temporal and spatial consistency of the nodes from N_v different view angles, calculated as follows:
[equation image in source]
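One natural way to assemble such a matrix is the block layout sketched below; the layout itself is an assumption, since the combination formula is an equation image in the source.

import numpy as np

def spatio_temporal_matrix(A_spatial_t, A_spatial_t1, A_temporal):
    # A_spatial_t: N_t x N_t, A_spatial_t1: N_{t+1} x N_{t+1},
    # A_temporal: N_t x N_{t+1}
    nt, nt1 = A_spatial_t.shape[0], A_spatial_t1.shape[0]
    H = np.zeros((nt + nt1, nt + nt1))
    H[:nt, :nt] = A_spatial_t       # intra-frame links at time t
    H[nt:, nt:] = A_spatial_t1      # intra-frame links at time t+1
    H[:nt, nt:] = A_temporal        # inter-frame links (forward)
    H[nt:, :nt] = A_temporal.T      # inter-frame links (backward)
    return H

# One such matrix per view angle (gray, edge, motion) gives H_v, v = 1..N_v.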
13. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "constructing an energy function containing graph-learning, saliency-propagation and joint-learning terms on the basis of the graph structure" in step four is performed as follows: first, the initial saliency vectors of the two frames I_t and I_{t+1} are concatenated to describe each node of the graph, calculated as follows:
[equation image in source]
Then, since the times t and t+1 do not appear in the optimization of the energy function, the time indices t and t+1 are omitted from the variables to simplify the formulas, and the energy function is constructed as follows:
[equation image in source]
where the first term in the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, obtained by learning the optimal combination weights η_v over the multi-view correlation matrices H_v so that their linear combination fits the best W. The terms in the second row are the saliency-propagation terms: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms in the first row are the joint-learning terms: the learned correlation matrix ensures that nodes with high temporal or spatial consistency keep similar saliency values, so this term couples graph learning with saliency propagation. The third row gives the optimization constraints: each row vector of W must sum to 1 with no element below zero, and likewise the view-angle weights η must sum to 1 and be non-negative.
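For concreteness, the following is a plausible LaTeX reconstruction of the energy function described above; the patented formula exists only as an equation image, so the trade-off weights λ and μ, the squared-norm forms, and the symbols s^f, s^b (optimized foreground/background saliency) and s_0^f, s_0^b (initial probabilities) are assumptions consistent with the prose.

\begin{aligned}
\min_{s^f,\,s^b,\,W,\,\eta}\quad
  & \Big\| W - \sum_{v=1}^{N_v} \eta_v H_v \Big\|_F^2
    + \lambda \sum_{i,j} W_{ij} \big(s_i^f - s_j^f\big)^2
    + \lambda \sum_{i,j} W_{ij} \big(s_i^b - s_j^b\big)^2 \\
  & + \mu \,\big\| s^f - s_0^f \big\|_2^2
    + \mu \,\big\| s^b - s_0^b \big\|_2^2 \\
\text{s.t.}\quad
  & W\mathbf{1} = \mathbf{1},\quad W \ge 0,\quad
    \textstyle\sum_v \eta_v = 1,\quad \eta \ge 0 .
\end{aligned}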
14. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: "solving the variables in sequence using an alternating optimization method" in step four means that, when solving for one variable, the other variables are treated as constants and held fixed; the solved variable is then updated before solving for the next. The variables of the energy function (the foreground saliency s^f, the background saliency s^b, W and η) are solved in sequence as follows:
first of all maintain
Figure FDA0003517322190000078
W and eta are constants, calculating
Figure FDA0003517322190000079
The energy function is simplified as:
Figure FDA00035173221900000710
Setting the derivative to zero yields the solution for s^f:
[equation image in source]
Then s^f, W and η are held constant and s^b is computed; the energy function simplifies to:
[equation image in source]
and its solution is:
[equation image in source]
Then s^f, s^b and η are held constant and W is computed as follows: for each row vector w_i of W, the energy function simplifies to:
[equation image in source]
where the auxiliary quantities are likewise given as equation images in the source. This minimization problem is solved with the Optimization Toolbox of MATLAB to obtain w_i, and solving sequentially for i = 1 to N yields the optimized W.
Then s^f, s^b and W are held constant and η is computed. The energy function first simplifies to:
[equation image in source]
To solve for η, W and the H_v are converted to column vectors, and the equation is rewritten as:
[equation image in source]
This minimization problem can likewise be solved for η with the Optimization Toolbox of MATLAB.
Finally, s^f, s^b, W and η in the original energy function are updated to the solved values and the above calculations are repeated; the solution of the energy function reaches a steady state after 7 iterations. At that point the final s^f and s^b are combined to obtain the final saliency value, calculated as follows:
[equation image in source]
This gives the final saliency value.
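Putting the four updates together, the loop below is a rough Python sketch of the alternating scheme, built on the reconstructed energy given after claim 13. The closed-form s^f/s^b updates, the crude projections used for W and η (in place of the MATLAB Optimization Toolbox solves), the trade-off weights lam and mu, and the final fusion rule are all assumptions.

import numpy as np

def alternate_optimize(H_list, sf0, sb0, lam=1.0, mu=1.0, iters=7):
    # H_list: spatio-temporal matrices H_v (one per view angle), each N x N
    Nv, N = len(H_list), sf0.size
    eta = np.full(Nv, 1.0 / Nv)                   # uniform initial view weights
    W = sum(e * H for e, H in zip(eta, H_list))
    sf, sb = sf0.copy(), sb0.copy()
    for _ in range(iters):                        # steady state after 7 rounds (claim 14)
        # s^f and s^b updates: quadratic, solved in closed form
        L = np.diag(W.sum(1) + W.sum(0)) - (W + W.T)   # Laplacian of the learned graph
        A = lam * L + mu * np.eye(N)
        sf = np.linalg.solve(A, mu * sf0)
        sb = np.linalg.solve(A, mu * sb0)
        # W update: row-constrained fit, sketched here as a simple projection
        D = lam * (np.subtract.outer(sf, sf)**2 + np.subtract.outer(sb, sb)**2)
        W = np.maximum(sum(e * H for e, H in zip(eta, H_list)) - D / 2, 0)
        W /= W.sum(1, keepdims=True) + 1e-12      # rows sum to 1, entries >= 0
        # eta update: least-squares fit of the stacked H_v to W, then projection
        M = np.stack([H.ravel() for H in H_list], axis=1)
        eta, *_ = np.linalg.lstsq(M, W.ravel(), rcond=None)
        eta = np.maximum(eta, 0)
        eta /= eta.sum() + 1e-12
    return (sf + (1.0 - sb)) / 2.0                # assumed fusion into the final saliency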
CN202210167951.2A 2022-02-23 2022-02-23 Infrared video pedestrian significance detection method combining graph learning and probability propagation Pending CN114913472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210167951.2A CN114913472A (en) 2022-02-23 2022-02-23 Infrared video pedestrian significance detection method combining graph learning and probability propagation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210167951.2A CN114913472A (en) 2022-02-23 2022-02-23 Infrared video pedestrian significance detection method combining graph learning and probability propagation

Publications (1)

Publication Number Publication Date
CN114913472A true CN114913472A (en) 2022-08-16

Family

ID=82762817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210167951.2A Pending CN114913472A (en) 2022-02-23 2022-02-23 Infrared video pedestrian significance detection method combining graph learning and probability propagation

Country Status (1)

Country Link
CN (1) CN114913472A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117133443A (en) * 2023-08-29 2023-11-28 山东大学 Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator
CN117133443B (en) * 2023-08-29 2024-03-12 山东大学 Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination