CN114913472A - Infrared video pedestrian significance detection method combining graph learning and probability propagation - Google Patents
- Publication number: CN114913472A (application number CN202210167951.2A)
- Authority: CN (China)
- Prior art keywords: motion, follows, pixel, significance, super
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/23213—Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
- G06T7/11—Image analysis; segmentation; region-based segmentation
- G06T7/136—Segmentation; edge detection involving thresholding
- G06T7/187—Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/269—Analysis of motion using gradient-based methods
- G06T2207/10016—Image acquisition modality: video; image sequence
- G06T2207/10048—Image acquisition modality: infrared image
- G06T2207/30196—Subject of image: human being; person
Abstract
The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which can automatically locate pedestrian regions in infrared video and accurately distinguish pedestrians from the background. The method comprises four steps: first, generate candidate regions based on Boolean maps; second, compute the motion saliency of each frame image; third, construct a multi-view spatio-temporal graph structure; fourth, construct and solve an energy function combining graph self-learning and saliency propagation. Through these steps, the method accurately and robustly extracts the spatio-temporal saliency of infrared pedestrian targets from cluttered backgrounds and multiple complex motions, suppresses the background almost completely, and has practical application value in other image processing fields such as object segmentation, object tracking and object retrieval.
Description
Technical Field
The invention relates to an infrared video pedestrian saliency detection method combining graph learning and probability propagation, and belongs to the fields of computer vision and digital image processing. The method has broad application prospects in object segmentation, recognition, tracking and related fields.
Background
Image saliency detection has been widely studied as an important research topic in computer vision and has achieved good results, while video saliency detection remains an open problem. Video saliency aims to automatically find and locate the parts of a given video that are most attractive to a viewer. As an effective preprocessing step, it has important applications in target tracking, relocalization, video compression, video summarization and the like. Existing video saliency methods are designed almost exclusively for visible-light video, but visible-light images often fail under challenging conditions such as poor lighting, extreme weather and illumination change. Infrared imaging detects targets by passively receiving their thermal radiation; it is unaffected by weather and climate and can effectively compensate for the shortcomings of visible-light images, so it plays an increasingly important role in military, security, surveillance and intelligent transportation applications. Pedestrians in particular often exhibit high saliency in infrared images because of their own heat radiation, so research on infrared video saliency detection has practical significance for fields such as intelligent transportation and autonomous driving. However, applying video saliency to infrared images remains challenging for existing approaches, and no dedicated method has yet been proposed.
Classical video saliency detection models are generally built on low-level spatial and motion features (such as color, texture and motion vector fields), heuristic rules (such as contrast and similarity) and prior knowledge (such as foreground and background priors), fused by simple mathematical operations. However, because they operate frame by frame, these direct fusion methods often struggle to obtain robust detection results on videos containing complex situations such as background clutter, feature-poor targets and multiple motions. Exploiting the links between video frames, trajectory-based methods extend video frames into spatio-temporal tubes of points or region blocks and use a series of trajectory descriptors as the basis for saliency measurement. Such methods capture the temporal and spatial consistency of video well, but trajectory clustering requires careful selection of a suitable model, which brings high computational complexity. Graph-based methods use probabilistic models to propagate saliency values in both the spatial and temporal domains, reducing the computational load through connection constraints when constructing the graph. Building a robust graph model for video saliency detection involves three key elements: an accurate initial saliency measure, the construction of the graph, and the design of the energy function.
However, the above video saliency methods usually rely on features such as the color and texture of visible-light images, and they tend to fail on infrared images that lack these features; they therefore cannot be applied directly to infrared images, and robust features capable of describing infrared targets must be designed around the intrinsic characteristics of infrared video. On the other hand, video saliency detection often has to cope with complex motion, including camera motion and dynamic backgrounds, so it is essential to describe the difference between target and background with the limited features of infrared images and to capture the spatial and temporal continuity of infrared video. Research on these problems is therefore of great significance.
Disclosure of Invention
(1) Objects of the invention
Infrared video pedestrian detection has important applications in intelligent transportation, for example in pedestrian monitoring systems and vehicle-mounted pedestrian detection systems. Because infrared equipment works around the clock, it can to a great extent compensate for situations in which visible light is unusable, such as poor illumination and bad weather. Saliency detection can automatically locate the salient objects in an image, highlighting the objects and suppressing the background, and pedestrians in infrared images naturally possess this salient character thanks to their own radiation. However, because infrared images have low contrast, lack color and texture features, and have a low signal-to-noise ratio, existing saliency detection methods designed for visible light are difficult to apply directly to them. In addition, when an infrared video involves camera motion and a complex background, the motion saliency of pedestrians is difficult to extract accurately.
To solve these problems, so that saliency can be applied effectively to infrared pedestrian video, highlighting pedestrians and separating them from the background, the invention proposes an infrared video pedestrian saliency detection method combining graph learning and probability propagation. First, considering that region-based saliency detection methods usually provide more accurate edge information, preserve the internal consistency of objects and reduce computation, the method proposes a Boolean-map-based candidate target region generation strategy for infrared images, producing a series of regions that may contain salient targets. For the generated regions, a similarity operator is proposed to measure the likelihood that each region represents a complete object. Then, considering that the gradient of the optical-flow motion field is robust and that the image borders usually reflect the motion direction of the background, the method proposes infrared pedestrian saliency features based on motion gradient contrast and motion direction difference, which are combined with the similarity operator to obtain the probability of each region belonging to the foreground or the background. Next, to better capture the temporal and spatial correlation of pedestrians in infrared video, the method constructs a graph structure over the video, describing the correlation between graph nodes from the three views of gray level, edge and motion, and combining them with temporal correlation to form a spatio-temporal graph. Finally, to avoid the errors of a manually set graph structure and to optimize the saliency detection, the method constructs an energy function combining graph-structure self-learning and saliency propagation, and obtains the optimal result by iterative solution.
(2) Technical scheme
The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which comprises the following specific steps:
Step one: generate candidate regions based on Boolean maps. First, perform superpixel segmentation on each frame of the infrared video; then construct Boolean maps and concatenate the maps at all threshold levels to obtain a series of three-dimensional regions; finally, compute a similarity operator for the sub-regions of each three-dimensional region and normalize it.
In step one, "perform superpixel segmentation on each frame of the infrared video" is done as follows: using the SLIC method, cluster adjacent pixels of the t-th frame image I_t that are similar in gray level and structure into a set of irregular, visually meaningful pixel blocks SP_t = {sp_{t,1}, sp_{t,2}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t denote the i-th superpixel of I_t and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its internal pixels:

C_t(sp_{t,i}) = (1 / |sp_{t,i}|) Σ_{p ∈ sp_{t,i}} C_t(p)

where C_t(p) is the gray value of pixel p in the t-th frame and |sp_{t,i}| is the area (pixel count) of superpixel sp_{t,i}.
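As a minimal NumPy sketch of the averaging step above (in practice the label map would come from an SLIC implementation such as scikit-image's `slic`; the hand-made label map and all names here are illustrative, not from the patent):

```python
import numpy as np

def superpixel_means(gray, labels):
    """C_t(sp_{t,i}): mean gray value of the pixels inside each superpixel.
    `labels` is an SLIC-style label map with values 0..N_t-1."""
    counts = np.bincount(labels.ravel())                                 # |sp_{t,i}|
    sums = np.bincount(labels.ravel(), weights=gray.ravel().astype(float))
    return sums / counts

# toy 2x4 frame split into two superpixels
gray = np.array([[10, 10, 200, 200],
                 [10, 10, 200, 200]], dtype=np.uint8)
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
print(superpixel_means(gray, labels))  # [ 10. 200.]
```

`np.bincount` with `weights` gives the per-superpixel gray sums in one vectorized pass, which scales to real frames.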
In step one, "construct Boolean maps and concatenate the maps at all levels to obtain a series of three-dimensional regions" is done as follows: threshold the superpixel-segmented t-th infrared frame with every integer from 255 down to 0, obtaining a series of binary images that form the Boolean map sequence B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:

B_{t,θ} = ξ(SP_t, θ)

where ξ is the thresholding operation: superpixels of SP_t whose gray level is smaller than the threshold θ are labeled 0, the others are labeled 1, and B_{t,θ} is the Boolean map at threshold θ. From the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions of B_{t,θ} keep growing until they finally merge into an all-white image.
Then, starting from B_{t,255}, number the connected regions that appear, assigning distinct consecutive integers to all connected regions. When a brand-new region that overlaps no numbered region appears in a later B_{t,θ}, assign it the next new number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, the regions sharing the same number across different Boolean map layers form a three-dimensional region; N_r is the number of three-dimensional regions obtained from image I_t, and each three-dimensional region consists of several sub-region layers, where l indexes the layer of each sub-region.
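A toy Python sketch of the thresholding operation ξ(SP_t, θ) above: it sweeps decreasing thresholds over an already superpixel-averaged gray image and confirms that the white area can only grow as θ drops. The step size and names are illustrative; the patent sweeps every integer from 255 to 0.

```python
import numpy as np

def boolean_maps(sp_gray, step=64):
    """B_{t,theta} = xi(SP_t, theta): superpixels with gray level below
    the threshold theta become 0, the rest 1. A coarse threshold step is
    used here for brevity."""
    return {th: (sp_gray >= th).astype(np.uint8) for th in range(255, -1, -step)}

# gray levels of a 2x2 toy superpixel image
sp_gray = np.array([[10, 200],
                    [90, 240]])
maps = boolean_maps(sp_gray)
# white area is non-decreasing as the threshold falls: 0 -> 2 -> 2 -> 3 pixels
areas = [int(m.sum()) for th, m in sorted(maps.items(), reverse=True)]
print(areas)
```

The monotone growth of the white regions is what makes the later layer-by-layer region numbering well defined.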
In step one, "compute the similarity operator for the sub-regions of each three-dimensional region and normalize it" is done as follows: first compute, for each sub-region, the gray contrast between its inner superpixels and their adjacent outer superpixels; the greater the contrast, the more likely the region represents a complete object. The contrast is accumulated over pairs of an inner superpixel sp_{t,i} and an outer superpixel sp_{t,j}, gated by an indicator function δ(·) that equals 1 when sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then compute the internal gray consistency of each sub-region: the smaller the gray difference among the superpixels inside a sub-region, the more likely it represents a complete object.
then calculating each sub-regionThe gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
wherein br t,p Is an image frame I t The response of the gradient map at pixel p,is a sub-regionThe set of boundary pixels of (2).Is a sub-regionThe area of (a).
Combining the three cues above yields the similarity descriptor of each sub-region. Finally, the similarity operators of all sub-regions within each three-dimensional region are normalized. The resulting similarity operator describes the likelihood that each region represents a complete target region.
Step two: and calculating the motion significance of each frame image. Firstly, extracting an optical flow field of a video sequence; then, calculating the motion significance of each region based on the local gradient according to the motion gradient of the image; secondly, calculating the motion significance based on the motion direction by extracting the background motion main direction; and finally, combining the motion significance with the similarity operator to obtain the probability that each super pixel belongs to the foreground/background.
In step two, "extract the optical-flow field of the video sequence" means computing, with the LDOF (Large Displacement Optical Flow) method, the forward motion vectors F = [Fx_t; Fy_t] in the horizontal (x) and vertical (y) directions from the current frame I_t to the next frame I_{t+1}, and, by running LDOF on the reversed video, the backward motion vectors B = [Bx_t; By_t] from the current frame I_t to the previous frame I_{t-1}.
In step two, "compute the local-gradient-based motion saliency of each region from the motion gradients of the image" is done as follows: first compute the gradient probability of frame I_t by summing, at each pixel, the magnitudes of the motion-field gradients along the horizontal and vertical directions.
Then, for each sub-region, sum the gradient probabilities of its internal superpixels to obtain the sub-region's motion-gradient probability, where Mg_{t,p} is the motion-gradient probability of pixel p; the result is the sub-region's motion saliency value based on motion gradient.
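A NumPy sketch of one plausible reading of the motion-gradient step: per-pixel gradient magnitudes of the flow field are summed over each region. The patent's exact formula is not reproduced in this text, so the combination of flow gradients below is an assumption, and all names are illustrative.

```python
import numpy as np

def motion_gradient_saliency(fx, fy, labels):
    """Per-pixel motion-gradient magnitude (sum of absolute flow gradients
    along rows and columns; an assumed form of Mg_{t,p}), accumulated over
    each labeled region to score its motion saliency."""
    gr1, gc1 = np.gradient(fx.astype(float))
    gr2, gc2 = np.gradient(fy.astype(float))
    mg = np.abs(gr1) + np.abs(gc1) + np.abs(gr2) + np.abs(gc2)
    return np.bincount(labels.ravel(), weights=mg.ravel(),
                       minlength=labels.max() + 1)

# a small moving block (region 1) inside a static background (region 0)
fx = np.zeros((6, 6)); fx[2:4, 2:4] = 3.0
fy = np.zeros((6, 6))
labels = np.zeros((6, 6), dtype=int); labels[2:4, 2:4] = 1
s = motion_gradient_saliency(fx, fy, labels)
print(s[1] / 4 > s[0] / 32)  # per pixel, the moving region dominates: True
```

The gradient of the flow field (rather than the raw flow) is what makes the cue robust to smooth camera motion: a globally moving background has near-zero flow gradients.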
In step two, "compute the motion-direction-based saliency by extracting the principal directions of background motion" is done as follows: first extract the set of superpixels along the four borders of image I_t as background superpixels; then cluster the motion vectors of the background superpixels into K classes with the K-means method, obtaining K cluster centers as the principal motion direction of each class; delete any class holding less than 1/6 of the total samples, and keep the remaining cluster centers as the principal directions of background motion.
Then compute, for each sub-region, a motion-direction saliency value from the difference between the motion vectors F_{t,p} of its internal pixels p and the principal directions of background motion, which indicates how likely the sub-region belongs to the background.
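The background principal-direction extraction above can be sketched as follows (a toy K-means is written inline so the example is self-contained; a real implementation would typically use scikit-learn's `KMeans`; the 1/6 pruning rule follows the text, everything else is illustrative):

```python
import numpy as np

def background_motion_directions(vectors, k=3, iters=20, min_ratio=1/6, seed=0):
    """Cluster border-superpixel motion vectors with K-means; clusters
    holding fewer than min_ratio of the samples are discarded and the
    surviving centers are kept as background principal motion directions."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None] - centers[None], axis=2)
        lab = dists.argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):                 # leave empty clusters alone
                centers[j] = vectors[lab == j].mean(axis=0)
    ratios = np.bincount(lab, minlength=k) / len(vectors)
    return centers[ratios >= min_ratio]

# 18 border vectors: dominant rightward camera motion plus one outlier
v = np.vstack([np.tile([1.0, 0.0], (17, 1)), [[0.0, 9.0]]])
mains = background_motion_directions(v)
print(len(mains))  # the tiny outlier cluster (1/18 < 1/6) is pruned: 1
```

Pruning small clusters keeps a stray foreground object that touches the image border from being mistaken for a background motion mode.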
In step two, "combine the motion saliency with the similarity operator to obtain the probability of each superpixel belonging to the foreground/background" is done as follows: first, accumulate, over all candidate regions containing a superpixel, the region's motion-gradient saliency value together with its similarity value, giving the superpixel's foreground probability. Then accumulate, over the same candidate regions, the motion-direction saliency value together with the similarity value, giving the superpixel's background probability.
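A small sketch of the accumulation step, under the assumption (consistent with the text, but not confirmed by the missing formulas) that a superpixel's foreground probability is the sum, over every candidate region containing it, of the region's motion saliency times its similarity value; the final max-normalization is an illustrative choice:

```python
import numpy as np

def foreground_probability(region_members, region_saliency, region_similarity, n_sp):
    """Accumulate (motion saliency x similarity value) over all candidate
    regions that contain each superpixel, then normalize to [0, 1]."""
    fg = np.zeros(n_sp)
    for members, sal, sim in zip(region_members, region_saliency, region_similarity):
        for i in members:
            fg[i] += sal * sim
    m = fg.max()
    return fg / m if m > 0 else fg

# 3 candidate regions over 4 superpixels (lists give member superpixel ids)
regions = [[0, 1], [1, 2], [2]]
fg = foreground_probability(regions, [0.9, 0.2, 0.1], [1.0, 0.5, 0.5], 4)
print(fg)  # superpixel 1 appears in the two most salient regions
```

The same accumulation run with the motion-direction values would give the background probability.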
Step three: and constructing a multi-view space-time diagram structure. Firstly, constructing a spatial neighborhood relationship of each node of a graph structure, and then constructing a spatial correlation matrix from three aspects of gray scale, edge and motion; a time correlation matrix between adjacent frame nodes is then constructed. And finally, combining the time correlation matrix and the space correlation matrix to obtain the space-time correlation matrix of different visual angles.
In step three, "build the spatial neighborhood of each graph node" is done as follows: each superpixel is a node of the graph, and the set consisting of its adjacent superpixels and the superpixels adjacent to those is taken as the node's spatial neighborhood.
In step three, "construct the spatial correlation matrices from the three views of gray level, edge and motion" is done as follows: first, the gray similarity of each node is measured from the gray difference between the node and its neighborhood, giving the gray correlation matrix, whose entry for nodes sp_{t,i} and sp_{t,j} is their gray correlation, defined over the neighborhood of node sp_{t,i}.
Then the edge correlation matrix is computed from the gradient along the common border of each pair of adjacent nodes: the stronger the gradient separating two adjacent superpixels, the less likely they belong to the same object. Here bp_{t,i→j} is the set of pixels of superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels of sp_{t,j} adjacent to sp_{t,i}, br_{t,p} and br_{t,q} are the gradient values at pixels p and q, sp_{t,j} ranges over the superpixels directly adjacent to sp_{t,i}, and the resulting entry is the edge correlation of nodes sp_{t,i} and sp_{t,j}.
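A sketch of the gray-correlation matrix described above. The patent's exact similarity kernel is not reproduced in this text, so a Gaussian fall-off on the gray difference is assumed here; the neighborhood structure and all names are illustrative.

```python
import numpy as np

def gray_correlation(sp_gray, neighbors, sigma=10.0):
    """Spatial gray-correlation matrix: nodes are superpixels; each
    neighbor pair gets a similarity that decays with gray difference
    (assumed Gaussian kernel). Non-neighbors stay at zero, which is the
    sparsity that keeps saliency propagation cheap."""
    n = len(sp_gray)
    A = np.zeros((n, n))
    for i, nbrs in neighbors.items():
        for j in nbrs:
            A[i, j] = np.exp(-(sp_gray[i] - sp_gray[j]) ** 2 / (2 * sigma ** 2))
    return A

# three superpixels in a chain: 0-1 similar grays, 1-2 very different
sp_gray = np.array([100.0, 105.0, 220.0])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
A = gray_correlation(sp_gray, neighbors)
print(A[0, 1] > A[1, 2])  # similar neighbors correlate more strongly: True
```

The edge and motion views would fill matrices of the same shape from their own cues, giving the multi-view stack that step four learns to combine.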
In step three, "construct the temporal correlation matrix between nodes of adjacent frames" is done as follows: first, the forward and backward optical flows of two consecutive frames I_t and I_{t+1} are mapped to each other. Here pos_p and pos_q are the position coordinate vectors of pixel p in I_t and pixel q in I_{t+1}; pos_p is mapped into the next frame I_{t+1} using the forward flow f_{t,p}, and pos_q is mapped into the previous frame I_t using the backward flow b_{t+1,q}. Then the motion difference between each pixel and its mapped pixel is computed, yielding the forward and backward motion probabilities. Finally, the forward and backward motion probabilities are accumulated over the overlap between each superpixel and the superpixels of its adjacent frame, measuring their motion consistency and forming the temporal correlation matrix.
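The forward/backward mapping check can be sketched as follows (NumPy, nearest-pixel mapping; the exponential scoring of the round-trip error is an assumed stand-in for the patent's motion-probability formula):

```python
import numpy as np

def forward_backward_consistency(fwd, bwd):
    """For each pixel p in frame t, map pos_p with the forward flow,
    read the backward flow at the landing pixel q, and score how well the
    round trip cancels: |f_{t,p} + b_{t+1,q}| is zero when the flows agree."""
    h, w, _ = fwd.shape
    prob = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            qx = int(round(x + fwd[y, x, 0]))
            qy = int(round(y + fwd[y, x, 1]))
            if 0 <= qx < w and 0 <= qy < h:
                err = fwd[y, x] + bwd[qy, qx]
                prob[y, x] = np.exp(-np.linalg.norm(err))
            # pixels flowing out of the frame keep probability 0
    return prob

# consistent rightward flow everywhere except one corrupted landing pixel
fwd = np.zeros((4, 4, 2)); fwd[..., 0] = 1.0
bwd = np.zeros((4, 4, 2)); bwd[..., 0] = -1.0
bwd[2, 2] = [5.0, 0.0]
p = forward_backward_consistency(fwd, bwd)
print(p[2, 1] < p[0, 0])  # the pixel landing on the bad flow scores lower: True
```

Accumulating these per-pixel probabilities over superpixel overlaps gives the entries of the temporal correlation matrix between the two frames' nodes.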
In step three, "combine the temporal and spatial correlation matrices to obtain the spatio-temporal correlation matrices of the different views" is done as follows: first, the superpixels of two consecutive frames are concatenated into a graph containing N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, yielding spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from the different views, where N_v is the number of views.
step four: and constructing and solving an energy function combining graph self-learning and significance propagation. An energy function containing graph learning, significance propagation and joint learning items is constructed on the basis of a graph structure, and variables in the energy function are sequentially solved by using an alternative optimization method, so that an optimized graph correlation matrix and a significance detection result are obtained.
In step four, "construct an energy function containing graph-learning, saliency-propagation and joint-learning terms on the graph structure" is done as follows: first, the initial saliency vectors of the two frames I_t and I_{t+1} are concatenated to describe the nodes of the graph. Since the times t and t+1 play no role in the optimization of the energy function, the time indices t and t+1 are dropped from the variables to simplify notation, and the energy function is constructed subject to

s.t. W1 = 1, W ≥ 0, η1 = 1, η ≥ 0

The first term of the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, obtained by learning the optimal combination weights η over the multi-view correlation matrices so that their linear combination best fits W. The data term in the second row is the saliency-propagation term: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms of the first row are the joint-learning terms, which use the learned correlation matrix to ensure that nodes with high temporal or spatial consistency keep similar saliency values; these terms couple graph learning and saliency propagation. The third row gives the optimization constraints: every row vector of W must sum to 1 with no negative elements, and the view weights η must likewise sum to 1 and be non-negative.
In step four, "solve the variables in turn by alternating optimization" means that when solving for one variable, the other variables are treated as constants and kept fixed; the solved variable is then updated and the next one is solved, cycling through the foreground/background saliency vectors, W and η.
First, with W and η held fixed, the foreground and background saliency vectors are solved in closed form.
Then, keeping the saliency vectors and η constant, W is computed as follows: for each row vector w_i of W, the energy function reduces to a small constrained minimization problem, which is solved with the Optimization toolbox of a Matlab program to obtain w_i; solving successively for i = 1, ..., N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated; after about 7 repetitions the value of the energy function reaches a steady state. The final foreground and background values are then combined to obtain the final saliency value.
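The alternating-optimization pattern of step four can be illustrated on a stand-in objective. This is NOT the patent's energy function, whose exact form is not reproduced in this text; it only shows the solve-one-variable-with-the-others-fixed cycle and its convergence in a handful of rounds.

```python
def alternate_minimize(a, b, c, iters=7):
    """Alternating optimization on a stand-in quadratic
    E(x, y) = (x - y)^2 + a(x - b)^2 + a(y - c)^2:
    each update zeroes the partial derivative of E in one variable while
    the other is held fixed, mirroring the patent's cycle over the
    saliency vectors, W and eta (which reportedly stabilizes in ~7 rounds)."""
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = (y + a * b) / (1 + a)   # dE/dx = 0 with y fixed
        y = (x + a * c) / (1 + a)   # dE/dy = 0 with x fixed
    return x, y

x, y = alternate_minimize(a=1.0, b=0.0, c=4.0)
print(round(x, 3), round(y, 3))  # approaches the joint minimum (4/3, 8/3)
```

Each sub-problem is convex and easy, which is exactly why the energy function is split this way instead of being minimized jointly.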
Through the steps above, a good saliency detection result is obtained for infrared pedestrian video: pedestrians are highlighted completely and the background is almost entirely suppressed, and the method also has practical application value in other image processing fields such as object segmentation, object tracking and object retrieval.
(3) Advantages of the invention compared with the prior art:
the method provides a candidate region extraction method based on the Boolean diagram, can more completely maintain the edge information and the structural information of the object, and can better highlight the whole remarkable target.
Meanwhile, the method proposes a saliency description method combining motion saliency with the similarity operator, which fully describes the saliency of moving objects in video from both the spatial and the temporal perspective. Compared with previous methods, it can be applied to more complex backgrounds and to difficult scenes such as camera motion.
Finally, the method proposes an optimization model combining graph-structure self-learning and saliency propagation. Unlike previous graph models that use a manually set correlation matrix, the correlation matrix here corrects itself during learning using the saliency information, while the saliency propagation is optimized with the continuously revised graph structure. Compared with previous methods, this yields more robust and more accurate detection results.
Drawings
FIG. 1 is a block diagram of the detection method of the present invention.
Detailed Description
For a better understanding of the technical solutions of the invention, embodiments of the invention are further described below with reference to the accompanying drawings.
The flow of the invention is shown in FIG. 1. The invention provides an infrared video pedestrian saliency detection method combining graph learning and probability propagation, implemented in the following steps:
Step one: generate candidate regions based on Boolean maps;
First, perform superpixel segmentation on each frame of the infrared video: using the SLIC method, cluster adjacent pixels of the t-th frame image I_t that are similar in gray level and structure into a set of irregular, visually meaningful pixel blocks SP_t = {sp_{t,1}, sp_{t,2}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t denote the i-th superpixel of I_t and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its internal pixels:

C_t(sp_{t,i}) = (1 / |sp_{t,i}|) Σ_{p ∈ sp_{t,i}} C_t(p)

where C_t(p) is the gray value of pixel p in the t-th frame and |sp_{t,i}| is the area (pixel count) of superpixel sp_{t,i}.
Then construct the Boolean maps: threshold the superpixel-segmented t-th infrared frame with every integer from 255 down to 0, obtaining a series of binary images that form the Boolean map sequence B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:

B_{t,θ} = ξ(SP_t, θ)

where ξ is the thresholding operation: superpixels of SP_t whose gray level is smaller than the threshold θ are labeled 0, the others are labeled 1, and B_{t,θ} is the Boolean map at threshold θ. From the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions of B_{t,θ} keep growing until they finally merge into an all-white image.
Then concatenate adjacent Boolean maps to build the three-dimensional candidate regions: starting from B_{t,255}, number the connected regions that appear, assigning distinct consecutive integers to all connected regions. When a brand-new region that overlaps no numbered region appears in a later B_{t,θ}, assign it the next new number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, the regions sharing the same number across different Boolean map layers form a three-dimensional region; N_r is the number of three-dimensional regions obtained from image I_t, and each three-dimensional region consists of several sub-region layers, where l indexes the layer of each sub-region.
Finally, a similarity operator is constructed to score each sub-region. First, the contrast of each sub-region is computed: the greater the gray contrast between the superpixels inside the sub-region and their adjacent outside superpixels, the more likely the region represents a complete object. The calculation formula is as follows:
where sp_{t,i} is a superpixel inside the sub-region and sp_{t,j} is a superpixel that does not belong to it; δ(·) is an indicator function that takes the value 1 when superpixel sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then calculating each sub-regionThe smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
Next, the gradient information contained at the boundary of each sub-region is computed: a boundary containing more gradient is more likely to enclose a complete object. The calculation formula is as follows:
where br_{t,p} is the response of the gradient map of image frame I_t at pixel p; the remaining symbols denote the set of boundary pixels of the sub-region and the area of the sub-region, respectively.
According to the above rules, the calculation formula of the similarity operator is as follows:
Finally, the similarity operators of all sub-regions within each three-dimensional region are normalized; the calculation formula is as follows:
The resulting similarity operator describes how likely each region is to represent the complete target region.
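Since the exact combination formula is elided in the text, the following sketch assumes a simple product of the three cues (contrast, homogeneity, boundary gradient) followed by sum-normalization within the three-dimensional region:

```python
import numpy as np

def similarity_operator(contrast, homogeneity, boundary_grad):
    """Hedged combination of the three sub-region cues; the patent's exact
    formula is not reproduced here, so a plain product is assumed."""
    return contrast * homogeneity * boundary_grad

# Cue values for three sub-region layers of one 3-D region (made-up numbers).
con = np.array([0.8, 0.5, 0.2])
hom = np.array([0.9, 0.7, 0.4])
grad = np.array([0.6, 0.3, 0.1])

o = similarity_operator(con, hom, grad)
o_norm = o / o.sum()          # normalize over the layers of the region
print(o_norm.round(3))
```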
Step two: calculating the motion significance of each frame image;
firstly, extracting the optical flow field of the video sequence; the LDOF (Large Displacement Optical Flow) method computes, between the current image frame I_t and its next frame I_{t+1}, the forward motion vector F = [Fx_t; Fy_t] in both the horizontal x and vertical y directions. Running LDOF on the video in reverse yields the backward motion vector B = [Bx_t; By_t] from the current frame I_t to its previous frame I_{t-1}.
Then, the motion saliency of each region is computed from the local motion gradients of the image. First, the gradient probability of image frame I_t is computed by summing the gradients of the motion field; the calculation formula is as follows
where the two operators respectively denote the gradient along the horizontal direction and the gradient along the vertical direction.
The gradient probabilities of the superpixels inside each sub-region are then summed to obtain the motion-gradient probability of the sub-region; the calculation formula is as follows:
where Mg_{t,p} is the motion-gradient probability of pixel p, and the result is the sub-region's motion saliency value based on the motion gradient.
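A minimal sketch of the motion-gradient probability, assuming it sums the absolute gradients of both flow components (the exact expression is not shown in the text):

```python
import numpy as np

# Synthetic forward flow field F_t = [Fx; Fy] for a 5x5 frame: static
# background with a small patch moving right (the "pedestrian").
Fx = np.zeros((5, 5))
Fy = np.zeros((5, 5))
Fx[2:4, 2:4] = 2.0

# Assumed form: sum of the magnitudes of the horizontal and vertical
# gradients of both flow components, normalized to [0, 1].
gy_x, gx_x = np.gradient(Fx)   # np.gradient returns (d/dy, d/dx)
gy_y, gx_y = np.gradient(Fy)
Mg = np.abs(gx_x) + np.abs(gy_x) + np.abs(gx_y) + np.abs(gy_y)
Mg = Mg / (Mg.max() + 1e-12)

# Motion-gradient saliency of a sub-region = sum of its pixels' Mg values.
region_mask = np.zeros((5, 5), dtype=bool)
region_mask[2:4, 2:4] = True
print(Mg[region_mask].sum() > Mg[~region_mask].mean())
```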
Secondly, motion saliency based on motion direction is computed by extracting the main directions of the background motion. First, the set of superpixels along the four edges of image I_t is extracted as background superpixels; the K-means clustering method then divides the motion vectors of the background superpixels into K classes, yielding K cluster centers, i.e. the main motion direction of each class. Classes holding less than 1/6 of the total samples are deleted, and the remaining cluster centers serve as the main background motion directions.
Then, the difference between the pixels inside each sub-region and the main background motion directions is computed; the larger this difference, the less likely the sub-region belongs to the background, giving its saliency value. The calculation formula is as follows:
where F_{t,p} is the motion vector of pixel p, and the result is the sub-region's motion saliency value based on motion direction.
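The background-main-direction extraction can be sketched with a minimal K-means (Lloyd's algorithm); the deterministic initialization and the synthetic border-superpixel motion vectors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=50, init=None):
    """Minimal K-means (Lloyd's algorithm) on motion vectors X of shape (n, 2)."""
    centers = (X[:k] if init is None else init).astype(float).copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

# Made-up motion vectors of the four-edge (border) superpixels: mostly
# camera motion near (1, 0) with two outliers near (-3, 2).
bg = np.vstack([rng.normal([1.0, 0.0], 0.05, (20, 2)),
                rng.normal([-3.0, 2.0], 0.05, (2, 2))])
# Deterministic init (one seed per expected cluster) keeps the demo stable.
centers, assign = kmeans(bg, k=2, init=bg[[0, -1]])

# Delete classes holding less than 1/6 of the samples; the remaining
# cluster centers are the main background motion directions.
keep = [j for j in range(2) if (assign == j).sum() / len(bg) >= 1 / 6]
main_dirs = centers[keep]
print(len(main_dirs))  # 1 -- the outlier cluster is discarded
```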
Finally, the motion saliency is combined with the similarity operator to obtain the probability that each superpixel belongs to the foreground/background. First, the motion-gradient saliency values and similarity values of a superpixel over all candidate regions are accumulated to give its foreground probability; the calculation formula is as follows:
Then, the motion-direction saliency values and similarity values of the superpixel over all candidate regions are accumulated to give its background probability; the calculation formula is as follows:
Step three: constructing a multi-view space-time diagram structure;
firstly, constructing the spatial neighborhood relationship of each node of the graph structure: each superpixel serves as a node of the graph, and the union of its adjacent superpixels and the superpixels adjacent to those is taken as the node's spatial neighborhood.
Then constructing a multi-view spatial correlation matrix; firstly, the gray scale difference value of each node and the neighborhood thereof is calculated to measure the gray scale similarity of each node, thereby constructing a gray scale correlation matrixThe calculation formula is as follows:
whereinIs node sp t,i The neighborhood of (a) is determined,is node sp t,i And sp t,j The gray scale dependency of (a).
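A hedged sketch of a gray correlation matrix; the Gaussian kernel and `sigma` are assumptions, since the patent's exact expression is elided:

```python
import numpy as np

# Gray values of 4 superpixel nodes and their (symmetric) spatial neighborhoods.
gray = np.array([100.0, 110.0, 200.0, 205.0])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
sigma = 20.0  # assumed bandwidth

# Gray correlation: Gaussian of the gray difference between neighboring nodes.
A_gray = np.zeros((4, 4))
for i, nbrs in neighbors.items():
    for j in nbrs:
        A_gray[i, j] = np.exp(-(gray[i] - gray[j]) ** 2 / (2 * sigma ** 2))

print(A_gray[0, 1] > A_gray[1, 2])  # similar grays -> stronger correlation
```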
Then the edge correlation matrix is computed from the gradient measured along the common edge of each pair of adjacent nodes: the stronger the gradient information separating adjacent superpixels, the less likely they belong to the same object; the calculation formula is:
where bp_{t,i→j} is the set of pixels in superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels in superpixel sp_{t,j} adjacent to sp_{t,i}, and br_{t,p} and br_{t,q} are the gradient values of pixels p and q, respectively; the remaining symbols denote the set of superpixels directly adjacent to superpixel sp_{t,j} and the edge correlation between nodes sp_{t,i} and sp_{t,j}.
Then the temporal correlation matrix between nodes of adjacent frames is constructed. First, the forward and backward optical flows of two adjacent consecutive frames I_t and I_{t+1} are mapped onto each other; the calculation formula is as follows:
pos_p and pos_q are the position coordinate vectors of pixel p in image I_t and pixel q in image I_{t+1}, respectively; one mapped position is obtained by mapping pos_p with f_{t,p} into the next frame I_{t+1}, the other by mapping pos_q with b_{t+1,q} into the previous frame I_t. The motion difference between each pixel and its mapped pixel is then computed to obtain the forward and backward motion probabilities:
where the first is the forward motion probability and the second the backward motion probability. Finally, the forward and backward motion probabilities over the overlapping parts of a superpixel and the superpixels of its adjacent frame are accumulated to measure their motion consistency, giving the temporal correlation matrix; the calculation formula is as follows:
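The forward/backward mapping and motion-probability step can be sketched as a standard forward-backward flow consistency check (the Gaussian form of the probability is an assumption):

```python
import numpy as np

H, W = 4, 4
# Forward flow f_t (frame t -> t+1) and backward flow b_{t+1} (t+1 -> t):
# uniform motion of +1 pixel in x, so the two flows are exact inverses.
f = np.zeros((H, W, 2)); f[..., 0] = 1.0
b = np.zeros((H, W, 2)); b[..., 0] = -1.0

ys, xs = np.mgrid[0:H, 0:W]
pos = np.stack([xs, ys], axis=-1).astype(float)

fwd_mapped = pos + f                    # pos_p mapped into frame t+1
# Look up the backward flow at the mapped (rounded, clipped) positions.
mx = np.clip(fwd_mapped[..., 0].astype(int), 0, W - 1)
my = np.clip(fwd_mapped[..., 1].astype(int), 0, H - 1)
diff = np.linalg.norm(f + b[my, mx], axis=-1)   # forward-backward residual

# Motion probability: small residual -> consistent motion (assumed form).
prob_f = np.exp(-diff ** 2)
print(prob_f.min())  # 1.0 for perfectly consistent flow
```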
Finally, the temporal correlation matrix is combined with the spatial correlation matrices to obtain spatio-temporal correlation matrices from different views. First, the superpixels of the two consecutive frames are concatenated to obtain a graph with N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, giving spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from different views, N_v being the number of views; the calculation formula is as follows:
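Assembling a spatio-temporal matrix from one spatial view and the temporal matrix can be sketched as a block layout over the concatenated nodes (the block arrangement is an assumption consistent with the description; the values are made up):

```python
import numpy as np

Nt, Nt1 = 3, 2                 # superpixel counts of frames t and t+1
N = Nt + Nt1

# Toy per-view spatial matrices for the two frames and one temporal
# matrix linking them.
S_t = np.eye(Nt)
S_t1 = np.eye(Nt1)
T = np.full((Nt, Nt1), 0.5)

def spatiotemporal(S_t, S_t1, T):
    """Concatenate the two frames' nodes and combine spatial blocks with
    the temporal cross-frame block into one N x N matrix."""
    A = np.zeros((N, N))
    A[:Nt, :Nt] = S_t            # spatial links within frame t
    A[Nt:, Nt:] = S_t1           # spatial links within frame t+1
    A[:Nt, Nt:] = T              # temporal links t -> t+1
    A[Nt:, :Nt] = T.T            # and back
    return A

A = spatiotemporal(S_t, S_t1, T)
print(A.shape, np.allclose(A, A.T))
```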
step four: and constructing and solving an energy function combining graph self-learning and significance propagation.
Firstly, two frames of images I are constructed t And I t+1 The cascaded initial significance vector describes each node of the graph, and the calculation formula is as follows:
then, since the times t and t+1 do not appear in the optimization of the energy function, the time indices t and t+1 are omitted from the variables for brevity, and the energy function is constructed as follows:
s.t. W1 = 1, W ≥ 0_n, η1 = 1, η ≥ 0
where the first term in the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, and the optimal combination weights are learned so that a linear combination of the multi-view correlation matrices fits the best W. The data terms in the second row are the saliency-propagation terms: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms in the first row are joint-learning terms: the learned optimal correlation matrix ensures that nodes with high temporal or spatial consistency keep similar saliency values, and these terms act in both graph learning and saliency propagation. The third row gives the optimization constraints: each row vector of W must sum to 1 and contain no element less than zero, and likewise the view weights η must sum to 1 and be non-negative.
On the basis, the method utilizes an alternative optimization method to sequentially solve the variables in the method; first of all maintainW and eta are constants, calculatingThe energy function is simplified as:
the calculation formula is as follows:
W is then computed while holding the saliency variables and η constant: for each row vector w_i of W, the energy function simplifies to:
This minimization problem is solved with the Optimization toolbox of Matlab to obtain w_i, and solving in turn for i = 1 to N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated. The values solved from the energy function reach a steady state after 7 repetitions, at which point they are combined into the final saliency value; the calculation method is as follows:
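With W and η held fixed, the saliency-propagation part reduces to a quadratic whose minimizer has the standard manifold-regularization closed form sketched below; this is a simplified stand-in for the patent's full energy, which contains additional joint-learning terms:

```python
import numpy as np

# Fixing W, minimizing ||s - s0||^2 + mu * s^T L s over s gives
# s = (I + mu * L)^{-1} s0, where L is the graph Laplacian of W
# (standard label-propagation form; an assumption about the elided energy).
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian of W
s0 = np.array([1.0, 0.0, 0.0])          # initial foreground probability
mu = 1.0

s = np.linalg.solve(np.eye(3) + mu * L, s0)
print(s.round(3))  # saliency spreads from node 0 to its strong neighbor 1
```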
Claims (14)
1. An infrared video pedestrian significance detection method combining graph learning and probability propagation is characterized by comprising the following steps of:
step one: generating candidate regions based on Boolean maps; first, super-pixel segmentation is performed on each frame of the infrared video; then Boolean maps are constructed and the maps at all levels are cascaded to obtain a series of three-dimensional regions; finally, similarity operators are calculated for the sub-regions of each three-dimensional region and normalized;
step two: and calculating the motion significance of each frame image. Firstly, extracting an optical flow field of a video sequence; then, calculating the motion significance of each region based on the local gradient according to the motion gradient of the image; secondly, calculating the motion significance based on the motion direction by extracting the background motion main direction; finally, combining the motion significance with the similarity operator to obtain the probability that each super pixel belongs to the foreground/background;
step three: and constructing a multi-view space-time diagram structure. Firstly, constructing a spatial neighborhood relationship of each node of a graph structure, and then constructing a spatial correlation matrix from three aspects of gray scale, edge and motion; a time correlation matrix between adjacent frame nodes is then constructed. Finally, combining the time correlation matrix with the space correlation matrix to obtain space-time correlation matrixes of different visual angles;
step four: and constructing and solving an energy function combining graph self-learning and significance propagation. An energy function containing graph learning, significance propagation and joint learning items is constructed on the basis of a graph structure, and variables in the energy function are sequentially solved by using an alternative optimization method, so that an optimized graph correlation matrix and a significance detection result are obtained.
2. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "super-pixel segmentation of each frame image of the infrared video" in step one is performed as follows: the SLIC algorithm clusters adjacent pixels with similar gray level and structure in the t-th frame image I_t into a set of irregular pixel blocks with a certain visual significance, SP_t = {sp_{t,1}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t respectively denote the i-th superpixel of I_t and the total number of superpixels; the gray level of each superpixel is the average gray level of its internal pixels, and the calculation formula is as follows:
C_t(sp_{t,i}) = (1/|sp_{t,i}|) Σ_{p∈sp_{t,i}} C_t(p)
where C_t(p) represents the gray value of pixel p in the t-th frame image, and |sp_{t,i}| is the area of superpixel sp_{t,i}.
3. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step one, "building Boolean maps and concatenating the Boolean maps of various levels to obtain a series of three-dimensional regions" is performed as follows: the super-pixel-segmented t-th frame infrared image is thresholded with each integer from 255 down to 0, yielding a series of binary images that form the Boolean map set B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:
B_{t,θ} = ξ(SP_t, θ)
where ξ is the thresholding operation that labels the superpixels of SP_t smaller than threshold θ as 0 and the others as 1, and B_{t,θ} is the Boolean map at threshold θ. According to the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions in B_{t,θ} keep growing until they all merge into a fully white image.
Then, starting from B_{t,255}, the connected regions appearing in it are numbered from 1, all connected regions being assigned distinct consecutive integers. In the subsequent maps B_{t,θ}: when a brand-new region appears that does not overlap any numbered region, it is assigned a new number; when a region overlaps exactly one numbered region, it inherits that region's number; when a region overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, a series of regions bearing the same number in different Boolean map layers forms a three-dimensional region, N_r being the number of three-dimensional regions obtained for image I_t; each three-dimensional region consists of several sub-region layers, l being the layer in which each sub-region lies.
4. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "calculating similarity operators for sub-regions of each three-dimensional region, and normalizing" in step one is calculated as follows: first, the contrast of each sub-region is computed: the greater the gray contrast between the superpixels inside the sub-region and their adjacent outside superpixels, the more likely the region represents a complete object. The calculation formula is as follows:
where sp_{t,i} is a superpixel inside the sub-region and sp_{t,j} is a superpixel that does not belong to it; δ(·) is an indicator function that takes the value 1 when superpixel sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then calculating each sub-regionThe smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
then calculating each sub-regionThe gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
wherein br t,p Is an image frame I t The response of the gradient map at pixel p,is a sub-regionThe set of boundary pixels of (1).Is a sub-regionThe area of (a).
According to the above rule, the calculation formula of the similarity descriptor is as follows:
and finally, normalizing the similarity operators of all the sub-regions in each three-dimensional region, wherein the calculation formula is as follows:
the resulting similarity operator describes the likelihood that each region can represent the complete target region.
5. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "extracting the optical flow field of the video sequence" in step two means that the LDOF (Large Displacement Optical Flow) method computes, between the current image frame I_t and its next frame I_{t+1}, the forward motion vector F = [Fx_t; Fy_t] in both the horizontal x and vertical y directions, and that running LDOF on the video in reverse yields the backward motion vector B = [Bx_t; By_t] from the current frame I_t to its previous frame I_{t-1}.
6. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step two, "calculating the motion saliency of each region based on the local gradient according to the motion gradient of the image" is calculated as follows: first, the gradient probability of image frame I_t is computed by summing the gradients of the motion field, with the following calculation formula
where the two operators respectively denote the gradient along the horizontal direction and the gradient along the vertical direction.
The gradient probabilities of the superpixels inside each sub-region are then summed to obtain the motion-gradient probability of the sub-region; the calculation formula is as follows:
7. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step two, "the motion saliency based on the motion direction is calculated by extracting the background motion principal direction" is performed as follows: first, the set of superpixels along the four edges of image I_t is extracted as background superpixels; the K-means clustering method then divides the motion vectors of the background superpixels into K classes, yielding K cluster centers, i.e. the main motion direction of each class; classes holding less than 1/6 of the total samples are deleted, and the remaining cluster centers serve as the main background motion directions.
Then, the difference between the pixels inside each sub-region and the main background motion directions is computed; the larger this difference, the less likely the sub-region belongs to the background, giving its saliency value. The calculation formula is as follows:
8. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the calculation method for "combining the motion saliency and the similarity operator to obtain the probability of each super pixel belonging to the foreground/background" in step two is as follows: first, the motion-gradient saliency values and similarity values of a superpixel over all candidate regions are accumulated to give its foreground probability, with the following calculation formula:
Then, the motion direction significant value and the similarity value of the superpixel in all the candidate areas are accumulated to obtain the background probability of the superpixel, and the calculation formula is as follows:
9. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "spatial neighborhood relationship of each node of the graph structure is constructed" in step three, which is performed as follows: each super pixel is taken as a node of the graph, and the set of the super pixels adjacent to the super pixel and the super pixels adjacent to the super pixels is taken as a spatial neighborhood of the node.
10. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "spatial correlation matrix is constructed from three aspects of gray scale, edge and motion respectively" in step three is calculated as follows: first, the gray similarity of each node is measured by the gray difference between the node and its neighborhood, giving the gray correlation matrix; the calculation formula is as follows:
where the first symbol denotes the neighborhood of node sp_{t,i}, and the second denotes the gray correlation between nodes sp_{t,i} and sp_{t,j}.
Then the edge correlation matrix is computed from the gradient measured along the common edge of each pair of adjacent nodes: the stronger the gradient information separating adjacent superpixels, the less likely they belong to the same object; the calculation formula is as follows:
where bp_{t,i→j} is the set of pixels in superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels in superpixel sp_{t,j} adjacent to sp_{t,i}, and br_{t,p} and br_{t,q} are the gradient values of pixels p and q, respectively; the remaining symbols denote the set of superpixels directly adjacent to superpixel sp_{t,j} and the edge correlation between nodes sp_{t,i} and sp_{t,j}.
11. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "construction of the time correlation matrix between adjacent frame nodes" in step three is performed as follows: first, the forward and backward optical flows of two adjacent consecutive frames I_t and I_{t+1} are mapped onto each other; the calculation formula is as follows:
pos_p and pos_q are the position coordinate vectors of pixel p in image I_t and pixel q in image I_{t+1}, respectively; one mapped position is obtained by mapping pos_p with f_{t,p} into the next frame I_{t+1}, the other by mapping pos_q with b_{t+1,q} into the previous frame I_t. The motion difference between each pixel and its mapped pixel is then computed to obtain the forward and backward motion probabilities:
where the first is the forward motion probability and the second the backward motion probability. Finally, the forward and backward motion probabilities over the overlapping parts of a superpixel and the superpixels of its adjacent frame are accumulated to measure their motion consistency, giving the temporal correlation matrix; the calculation formula is as follows:
12. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "combining the temporal correlation matrix and the spatial correlation matrix to obtain the spatio-temporal correlation matrix of different view angles" in step three is performed as follows: first, the superpixels of the two consecutive frames are concatenated to obtain a graph with N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, giving spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from different views, N_v being the number of views; the calculation formula is as follows:
13. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "constructing an energy function including graph learning, significance propagation and joint learning terms on the basis of the graph structure" in step four is performed as follows: first, the concatenated initial saliency vector of the two frames I_t and I_{t+1} is constructed to describe each node of the graph; the calculation formula is as follows:
Then, since the times t and t+1 do not appear in the optimization of the energy function, the time indices t and t+1 are omitted from the variables for brevity, and the energy function is constructed as follows:
where the first term in the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, and the optimal combination weights are learned so that a linear combination of the multi-view correlation matrices fits the best W. The data terms in the second row are the saliency-propagation terms: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms in the first row are joint-learning terms: the learned optimal correlation matrix ensures that nodes with high temporal or spatial consistency keep similar saliency values, and these terms act in both graph learning and saliency propagation. The third row gives the optimization constraints: each row vector of W must sum to 1 and contain no element less than zero, and likewise the view weights η must sum to 1 and be non-negative.
14. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "solving the variables in sequence by using the alternating optimization method" in step four means that when solving one variable, the other variables are treated as constants and kept unchanged, after which the variable is updated and the next variable is solved; the variables in the energy function are solved in turn. First, W and η are held constant and the saliency variables are solved; the calculation formula is as follows:
W is then computed while holding the saliency variables and η constant: for each row vector w_i of W, the energy function simplifies to:
This minimization problem is solved with the Optimization toolbox of Matlab to obtain w_i, and solving in turn for i = 1 to N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated. The values solved from the energy function reach a steady state after 7 repetitions, at which point they are combined into the final saliency value; the calculation method is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210167951.2A CN114913472A (en) | 2022-02-23 | 2022-02-23 | Infrared video pedestrian significance detection method combining graph learning and probability propagation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114913472A true CN114913472A (en) | 2022-08-16 |
Family
ID=82762817
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117133443A (en) * | 2023-08-29 | 2023-11-28 | 山东大学 | Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator |
CN117133443B (en) * | 2023-08-29 | 2024-03-12 | 山东大学 | Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||