CN114913472A - Infrared video pedestrian significance detection method combining graph learning and probability propagation - Google Patents
- Publication number: CN114913472A (application number CN202210167951.2A)
- Authority: CN (China)
- Prior art keywords: motion, follows, pixel, significance, super
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/23213—Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
- G06T7/11—Image analysis; segmentation; region-based segmentation
- G06T7/136—Segmentation; edge detection involving thresholding
- G06T7/187—Segmentation; edge detection involving region growing, region merging or connected component labelling
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/269—Analysis of motion using gradient-based methods
- G06T2207/10016—Image acquisition modality: video; image sequence
- G06T2207/10048—Image acquisition modality: infrared image
- G06T2207/30196—Subject of image: human being; person
Abstract
The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which can automatically locate pedestrian regions in infrared video and accurately distinguish pedestrians from the background. The method comprises four steps: first, generate candidate regions based on Boolean maps; second, compute the motion saliency of each frame image; third, construct a multi-view spatio-temporal graph structure; fourth, construct and solve an energy function combining graph self-learning and saliency propagation. Through these steps, the method accurately and robustly extracts the spatio-temporal saliency of infrared pedestrian targets from cluttered backgrounds and multiple complex motions, suppresses the background almost completely, and has practical application value in other image processing fields such as object segmentation, object tracking and object retrieval.
Description
Technical Field
The invention relates to an infrared video pedestrian saliency detection method combining graph learning and probability propagation, and belongs to the fields of computer vision and digital image processing. The method has broad application prospects in object segmentation, recognition, tracking and related fields.
Background
Image saliency detection has been widely studied as an important research topic in computer vision and has achieved good results, while video saliency detection remains an open problem. Video saliency aims to automatically find and locate the parts of a given video that are most attractive to a viewer. As an effective preprocessing step, it has important applications in target tracking, relocalization, video compression, video summarization and the like. Existing video saliency methods are designed almost exclusively for visible-light video, but visible-light images often fail under challenging conditions such as poor lighting, extreme weather and illumination change. Infrared imaging detects targets by passively receiving their thermal radiation; it is unaffected by weather and climate and can effectively compensate for the shortcomings of visible-light images, so it plays an increasingly important role in military, security, surveillance and intelligent transportation applications. Pedestrians in particular often exhibit high saliency in infrared images because of their own heat radiation, so research on infrared video saliency detection has practical significance for fields such as intelligent transportation and autonomous driving. However, applying video saliency to infrared images remains challenging for existing approaches, and no dedicated method has yet been proposed.
Classical video saliency detection models are generally built on low-level spatial and motion features (such as color, texture and motion vector fields), heuristic rules (such as contrast and similarity) and prior knowledge (such as foreground and background priors), fused by simple mathematical operations. However, because they operate frame by frame, these direct fusion methods often struggle to obtain robust detection results on videos containing complex situations such as background clutter, feature-poor targets and multiple motions. Exploiting the links between video frames, trajectory-based methods extend video frames into spatio-temporal tubes of points or region blocks and use a series of trajectory descriptors as the basis for saliency measurement. Such methods capture the temporal and spatial consistency of video well, but trajectory clustering requires careful selection of a suitable model, which brings high computational complexity. Graph-based methods use probabilistic models to propagate saliency values in both the spatial and temporal domains, reducing the computational load through connection constraints when constructing the graph. Building a robust graph model for video saliency detection involves three key elements: an accurate initial saliency measure, the construction of the graph, and the design of the energy function.
However, the above video saliency methods usually rely on features such as the color and texture of visible-light images, and they tend to fail on infrared images that lack these features; they therefore cannot be applied directly to infrared images, and robust features capable of describing infrared targets must be designed around the intrinsic characteristics of infrared video. On the other hand, video saliency detection often has to cope with complex motion, including camera motion and dynamic backgrounds, so it is essential to describe the difference between target and background with the limited features of infrared images and to capture the spatial and temporal continuity of infrared video. Research on these problems is therefore of great significance.
Disclosure of Invention
(1) Objects of the invention
Infrared video pedestrian detection has important applications in intelligent transportation, for example in pedestrian monitoring systems and vehicle-mounted pedestrian detection systems. Because infrared equipment works around the clock, it can to a great extent compensate for situations in which visible light is unusable, such as poor illumination and bad weather. Saliency detection can automatically locate the salient objects in an image, highlighting the objects and suppressing the background, and pedestrians in infrared images naturally possess this salient character thanks to their own radiation. However, because infrared images have low contrast, lack color and texture features, and have a low signal-to-noise ratio, existing saliency detection methods designed for visible light are difficult to apply directly to them. In addition, when an infrared video involves camera motion and a complex background, the motion saliency of pedestrians is difficult to extract accurately.
To solve these problems, so that saliency can be applied effectively to infrared pedestrian video, highlighting pedestrians and separating them from the background, the invention proposes an infrared video pedestrian saliency detection method combining graph learning and probability propagation. First, considering that region-based saliency detection methods usually provide more accurate edge information, preserve the internal consistency of objects and reduce computation, the method proposes a Boolean-map-based candidate target region generation strategy for infrared images, producing a series of regions that may contain salient targets. For the generated regions, a similarity operator is proposed to measure the likelihood that each region represents a complete object. Then, considering that the gradient of the optical-flow motion field is robust and that the image borders usually reflect the motion direction of the background, the method proposes infrared pedestrian saliency features based on motion gradient contrast and motion direction difference, which are combined with the similarity operator to obtain the probability of each region belonging to the foreground or the background. Next, to better capture the temporal and spatial correlation of pedestrians in infrared video, the method constructs a graph structure over the video, describing the correlation between graph nodes from the three views of gray level, edge and motion, and combining them with temporal correlation to form a spatio-temporal graph. Finally, to avoid the errors of a manually set graph structure and to optimize the saliency detection, the method constructs an energy function combining graph-structure self-learning and saliency propagation, and obtains the optimal result by iterative solution.
(2) Technical scheme
The invention discloses an infrared video pedestrian saliency detection method combining graph learning and probability propagation, which comprises the following specific steps:
Step one: generate candidate regions based on Boolean maps. First, perform superpixel segmentation on each frame of the infrared video; then construct Boolean maps and concatenate the maps at all threshold levels to obtain a series of three-dimensional regions; finally, compute a similarity operator for the sub-regions of each three-dimensional region and normalize it.
In step one, "perform superpixel segmentation on each frame of the infrared video" is done as follows: using the SLIC method, cluster adjacent pixels of the t-th frame image I_t that are similar in gray level and structure into a set of irregular, visually meaningful pixel blocks SP_t = {sp_{t,1}, sp_{t,2}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t denote the i-th superpixel of I_t and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its internal pixels:

C_t(sp_{t,i}) = (1 / |sp_{t,i}|) Σ_{p ∈ sp_{t,i}} C_t(p)

where C_t(p) is the gray value of pixel p in the t-th frame and |sp_{t,i}| is the area (pixel count) of superpixel sp_{t,i}.
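As a minimal NumPy sketch of the averaging step above (in practice the label map would come from an SLIC implementation such as scikit-image's `slic`; the hand-made label map and all names here are illustrative, not from the patent):

```python
import numpy as np

def superpixel_means(gray, labels):
    """C_t(sp_{t,i}): mean gray value of the pixels inside each superpixel.
    `labels` is an SLIC-style label map with values 0..N_t-1."""
    counts = np.bincount(labels.ravel())                                 # |sp_{t,i}|
    sums = np.bincount(labels.ravel(), weights=gray.ravel().astype(float))
    return sums / counts

# toy 2x4 frame split into two superpixels
gray = np.array([[10, 10, 200, 200],
                 [10, 10, 200, 200]], dtype=np.uint8)
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
print(superpixel_means(gray, labels))  # [ 10. 200.]
```

`np.bincount` with `weights` gives the per-superpixel gray sums in one vectorized pass, which scales to real frames.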
In step one, "construct Boolean maps and concatenate the maps at all levels to obtain a series of three-dimensional regions" is done as follows: threshold the superpixel-segmented t-th infrared frame with every integer from 255 down to 0, obtaining a series of binary images that form the Boolean map sequence B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:

B_{t,θ} = ξ(SP_t, θ)

where ξ is the thresholding operation: superpixels of SP_t whose gray level is smaller than the threshold θ are labeled 0, the others are labeled 1, and B_{t,θ} is the Boolean map at threshold θ. From the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions of B_{t,θ} keep growing until they finally merge into an all-white image.
Then, starting from B_{t,255}, number the connected regions that appear, assigning distinct consecutive integers to all connected regions. When a brand-new region that overlaps no numbered region appears in a later B_{t,θ}, assign it the next new number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, the regions sharing the same number across different Boolean map layers form a three-dimensional region; N_r is the number of three-dimensional regions obtained from image I_t, and each three-dimensional region consists of several sub-region layers, where l indexes the layer of each sub-region.
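A toy Python sketch of the thresholding operation ξ(SP_t, θ) above: it sweeps decreasing thresholds over an already superpixel-averaged gray image and confirms that the white area can only grow as θ drops. The step size and names are illustrative; the patent sweeps every integer from 255 to 0.

```python
import numpy as np

def boolean_maps(sp_gray, step=64):
    """B_{t,theta} = xi(SP_t, theta): superpixels with gray level below
    the threshold theta become 0, the rest 1. A coarse threshold step is
    used here for brevity."""
    return {th: (sp_gray >= th).astype(np.uint8) for th in range(255, -1, -step)}

# gray levels of a 2x2 toy superpixel image
sp_gray = np.array([[10, 200],
                    [90, 240]])
maps = boolean_maps(sp_gray)
# white area is non-decreasing as the threshold falls: 0 -> 2 -> 2 -> 3 pixels
areas = [int(m.sum()) for th, m in sorted(maps.items(), reverse=True)]
print(areas)
```

The monotone growth of the white regions is what makes the later layer-by-layer region numbering well defined.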
In step one, "compute the similarity operator for the sub-regions of each three-dimensional region and normalize it" is done as follows: first compute, for each sub-region, the gray contrast between its inner superpixels and their adjacent outer superpixels; the greater the contrast, the more likely the region represents a complete object. The contrast is accumulated over pairs of an inner superpixel sp_{t,i} and an outer superpixel sp_{t,j}, gated by an indicator function δ(·) that equals 1 when sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then compute the internal gray consistency of each sub-region: the smaller the gray difference among the superpixels inside a sub-region, the more likely it represents a complete object.
then calculating each sub-regionThe gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
wherein br t,p Is an image frame I t The response of the gradient map at pixel p,is a sub-regionThe set of boundary pixels of (2).Is a sub-regionThe area of (a).
Combining the three cues above yields the similarity descriptor of each sub-region. Finally, the similarity operators of all sub-regions within each three-dimensional region are normalized. The resulting similarity operator describes the likelihood that each region represents a complete target region.
Step two: and calculating the motion significance of each frame image. Firstly, extracting an optical flow field of a video sequence; then, calculating the motion significance of each region based on the local gradient according to the motion gradient of the image; secondly, calculating the motion significance based on the motion direction by extracting the background motion main direction; and finally, combining the motion significance with the similarity operator to obtain the probability that each super pixel belongs to the foreground/background.
In step two, "extract the optical-flow field of the video sequence" means computing, with the LDOF (Large Displacement Optical Flow) method, the forward motion vectors F = [Fx_t; Fy_t] in the horizontal (x) and vertical (y) directions from the current frame I_t to the next frame I_{t+1}, and, by running LDOF on the reversed video, the backward motion vectors B = [Bx_t; By_t] from the current frame I_t to the previous frame I_{t-1}.
In step two, "compute the local-gradient-based motion saliency of each region from the motion gradients of the image" is done as follows: first compute the gradient probability of frame I_t by summing, at each pixel, the magnitudes of the motion-field gradients along the horizontal and vertical directions.
Then, for each sub-region, sum the gradient probabilities of its internal superpixels to obtain the sub-region's motion-gradient probability, where Mg_{t,p} is the motion-gradient probability of pixel p; the result is the sub-region's motion saliency value based on motion gradient.
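A NumPy sketch of one plausible reading of the motion-gradient step: per-pixel gradient magnitudes of the flow field are summed over each region. The patent's exact formula is not reproduced in this text, so the combination of flow gradients below is an assumption, and all names are illustrative.

```python
import numpy as np

def motion_gradient_saliency(fx, fy, labels):
    """Per-pixel motion-gradient magnitude (sum of absolute flow gradients
    along rows and columns; an assumed form of Mg_{t,p}), accumulated over
    each labeled region to score its motion saliency."""
    gr1, gc1 = np.gradient(fx.astype(float))
    gr2, gc2 = np.gradient(fy.astype(float))
    mg = np.abs(gr1) + np.abs(gc1) + np.abs(gr2) + np.abs(gc2)
    return np.bincount(labels.ravel(), weights=mg.ravel(),
                       minlength=labels.max() + 1)

# a small moving block (region 1) inside a static background (region 0)
fx = np.zeros((6, 6)); fx[2:4, 2:4] = 3.0
fy = np.zeros((6, 6))
labels = np.zeros((6, 6), dtype=int); labels[2:4, 2:4] = 1
s = motion_gradient_saliency(fx, fy, labels)
print(s[1] / 4 > s[0] / 32)  # per pixel, the moving region dominates: True
```

The gradient of the flow field (rather than the raw flow) is what makes the cue robust to smooth camera motion: a globally moving background has near-zero flow gradients.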
In step two, "compute the motion-direction-based saliency by extracting the principal directions of background motion" is done as follows: first extract the set of superpixels along the four borders of image I_t as background superpixels; then cluster the motion vectors of the background superpixels into K classes with the K-means method, obtaining K cluster centers as the principal motion direction of each class; delete any class holding less than 1/6 of the total samples, and keep the remaining cluster centers as the principal directions of background motion.
Then compute, for each sub-region, a motion-direction saliency value from the difference between the motion vectors F_{t,p} of its internal pixels p and the principal directions of background motion, which indicates how likely the sub-region belongs to the background.
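The background principal-direction extraction above can be sketched as follows (a toy K-means is written inline so the example is self-contained; a real implementation would typically use scikit-learn's `KMeans`; the 1/6 pruning rule follows the text, everything else is illustrative):

```python
import numpy as np

def background_motion_directions(vectors, k=3, iters=20, min_ratio=1/6, seed=0):
    """Cluster border-superpixel motion vectors with K-means; clusters
    holding fewer than min_ratio of the samples are discarded and the
    surviving centers are kept as background principal motion directions."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None] - centers[None], axis=2)
        lab = dists.argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):                 # leave empty clusters alone
                centers[j] = vectors[lab == j].mean(axis=0)
    ratios = np.bincount(lab, minlength=k) / len(vectors)
    return centers[ratios >= min_ratio]

# 18 border vectors: dominant rightward camera motion plus one outlier
v = np.vstack([np.tile([1.0, 0.0], (17, 1)), [[0.0, 9.0]]])
mains = background_motion_directions(v)
print(len(mains))  # the tiny outlier cluster (1/18 < 1/6) is pruned: 1
```

Pruning small clusters keeps a stray foreground object that touches the image border from being mistaken for a background motion mode.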
In step two, "combine the motion saliency with the similarity operator to obtain the probability of each superpixel belonging to the foreground/background" is done as follows: first, accumulate, over all candidate regions containing a superpixel, the region's motion-gradient saliency value together with its similarity value, giving the superpixel's foreground probability. Then accumulate, over the same candidate regions, the motion-direction saliency value together with the similarity value, giving the superpixel's background probability.
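A small sketch of the accumulation step, under the assumption (consistent with the text, but not confirmed by the missing formulas) that a superpixel's foreground probability is the sum, over every candidate region containing it, of the region's motion saliency times its similarity value; the final max-normalization is an illustrative choice:

```python
import numpy as np

def foreground_probability(region_members, region_saliency, region_similarity, n_sp):
    """Accumulate (motion saliency x similarity value) over all candidate
    regions that contain each superpixel, then normalize to [0, 1]."""
    fg = np.zeros(n_sp)
    for members, sal, sim in zip(region_members, region_saliency, region_similarity):
        for i in members:
            fg[i] += sal * sim
    m = fg.max()
    return fg / m if m > 0 else fg

# 3 candidate regions over 4 superpixels (lists give member superpixel ids)
regions = [[0, 1], [1, 2], [2]]
fg = foreground_probability(regions, [0.9, 0.2, 0.1], [1.0, 0.5, 0.5], 4)
print(fg)  # superpixel 1 appears in the two most salient regions
```

The same accumulation run with the motion-direction values would give the background probability.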
Step three: and constructing a multi-view space-time diagram structure. Firstly, constructing a spatial neighborhood relationship of each node of a graph structure, and then constructing a spatial correlation matrix from three aspects of gray scale, edge and motion; a time correlation matrix between adjacent frame nodes is then constructed. And finally, combining the time correlation matrix and the space correlation matrix to obtain the space-time correlation matrix of different visual angles.
In step three, "build the spatial neighborhood of each graph node" is done as follows: each superpixel is a node of the graph, and the set consisting of its adjacent superpixels and the superpixels adjacent to those is taken as the node's spatial neighborhood.
In step three, "construct the spatial correlation matrices from the three views of gray level, edge and motion" is done as follows: first, the gray similarity of each node is measured from the gray difference between the node and its neighborhood, giving the gray correlation matrix, whose entry for nodes sp_{t,i} and sp_{t,j} is their gray correlation, defined over the neighborhood of node sp_{t,i}.
Then the edge correlation matrix is computed from the gradient along the common border of each pair of adjacent nodes: the stronger the gradient separating two adjacent superpixels, the less likely they belong to the same object. Here bp_{t,i→j} is the set of pixels of superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels of sp_{t,j} adjacent to sp_{t,i}, br_{t,p} and br_{t,q} are the gradient values at pixels p and q, sp_{t,j} ranges over the superpixels directly adjacent to sp_{t,i}, and the resulting entry is the edge correlation of nodes sp_{t,i} and sp_{t,j}.
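A sketch of the gray-correlation matrix described above. The patent's exact similarity kernel is not reproduced in this text, so a Gaussian fall-off on the gray difference is assumed here; the neighborhood structure and all names are illustrative.

```python
import numpy as np

def gray_correlation(sp_gray, neighbors, sigma=10.0):
    """Spatial gray-correlation matrix: nodes are superpixels; each
    neighbor pair gets a similarity that decays with gray difference
    (assumed Gaussian kernel). Non-neighbors stay at zero, which is the
    sparsity that keeps saliency propagation cheap."""
    n = len(sp_gray)
    A = np.zeros((n, n))
    for i, nbrs in neighbors.items():
        for j in nbrs:
            A[i, j] = np.exp(-(sp_gray[i] - sp_gray[j]) ** 2 / (2 * sigma ** 2))
    return A

# three superpixels in a chain: 0-1 similar grays, 1-2 very different
sp_gray = np.array([100.0, 105.0, 220.0])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
A = gray_correlation(sp_gray, neighbors)
print(A[0, 1] > A[1, 2])  # similar neighbors correlate more strongly: True
```

The edge and motion views would fill matrices of the same shape from their own cues, giving the multi-view stack that step four learns to combine.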
In step three, "construct the temporal correlation matrix between nodes of adjacent frames" is done as follows: first, the forward and backward optical flows of two consecutive frames I_t and I_{t+1} are mapped to each other. Here pos_p and pos_q are the position coordinate vectors of pixel p in I_t and pixel q in I_{t+1}; pos_p is mapped into the next frame I_{t+1} using the forward flow f_{t,p}, and pos_q is mapped into the previous frame I_t using the backward flow b_{t+1,q}. Then the motion difference between each pixel and its mapped pixel is computed, yielding the forward and backward motion probabilities. Finally, the forward and backward motion probabilities are accumulated over the overlap between each superpixel and the superpixels of its adjacent frame, measuring their motion consistency and forming the temporal correlation matrix.
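The forward/backward mapping check can be sketched as follows (NumPy, nearest-pixel mapping; the exponential scoring of the round-trip error is an assumed stand-in for the patent's motion-probability formula):

```python
import numpy as np

def forward_backward_consistency(fwd, bwd):
    """For each pixel p in frame t, map pos_p with the forward flow,
    read the backward flow at the landing pixel q, and score how well the
    round trip cancels: |f_{t,p} + b_{t+1,q}| is zero when the flows agree."""
    h, w, _ = fwd.shape
    prob = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            qx = int(round(x + fwd[y, x, 0]))
            qy = int(round(y + fwd[y, x, 1]))
            if 0 <= qx < w and 0 <= qy < h:
                err = fwd[y, x] + bwd[qy, qx]
                prob[y, x] = np.exp(-np.linalg.norm(err))
            # pixels flowing out of the frame keep probability 0
    return prob

# consistent rightward flow everywhere except one corrupted landing pixel
fwd = np.zeros((4, 4, 2)); fwd[..., 0] = 1.0
bwd = np.zeros((4, 4, 2)); bwd[..., 0] = -1.0
bwd[2, 2] = [5.0, 0.0]
p = forward_backward_consistency(fwd, bwd)
print(p[2, 1] < p[0, 0])  # the pixel landing on the bad flow scores lower: True
```

Accumulating these per-pixel probabilities over superpixel overlaps gives the entries of the temporal correlation matrix between the two frames' nodes.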
In step three, "combine the temporal and spatial correlation matrices to obtain the spatio-temporal correlation matrices of the different views" is done as follows: first, the superpixels of two consecutive frames are concatenated into a graph containing N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, yielding spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from the different views, where N_v is the number of views.
step four: and constructing and solving an energy function combining graph self-learning and significance propagation. An energy function containing graph learning, significance propagation and joint learning items is constructed on the basis of a graph structure, and variables in the energy function are sequentially solved by using an alternative optimization method, so that an optimized graph correlation matrix and a significance detection result are obtained.
In step four, "construct an energy function containing graph-learning, saliency-propagation and joint-learning terms on the graph structure" is done as follows: first, the initial saliency vectors of the two frames I_t and I_{t+1} are concatenated to describe the nodes of the graph. Since the times t and t+1 play no role in the optimization of the energy function, the time indices t and t+1 are dropped from the variables to simplify notation, and the energy function is constructed subject to

s.t. W1 = 1, W ≥ 0, η1 = 1, η ≥ 0

The first term of the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, obtained by learning the optimal combination weights η over the multi-view correlation matrices so that their linear combination best fits W. The data term in the second row is the saliency-propagation term: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms of the first row are the joint-learning terms, which use the learned correlation matrix to ensure that nodes with high temporal or spatial consistency keep similar saliency values; these terms couple graph learning and saliency propagation. The third row gives the optimization constraints: every row vector of W must sum to 1 with no negative elements, and the view weights η must likewise sum to 1 and be non-negative.
In step four, "solve the variables in turn by alternating optimization" means that when solving for one variable, the other variables are treated as constants and kept fixed; the solved variable is then updated and the next one is solved, cycling through the foreground/background saliency vectors, W and η.
First, with W and η held fixed, the foreground and background saliency vectors are solved in closed form.
Then, keeping the saliency vectors and η constant, W is computed as follows: for each row vector w_i of W, the energy function reduces to a small constrained minimization problem, which is solved with the Optimization toolbox of a Matlab program to obtain w_i; solving successively for i = 1, ..., N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated; after about 7 repetitions the value of the energy function reaches a steady state. The final foreground and background values are then combined to obtain the final saliency value.
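The alternating-optimization pattern of step four can be illustrated on a stand-in objective. This is NOT the patent's energy function, whose exact form is not reproduced in this text; it only shows the solve-one-variable-with-the-others-fixed cycle and its convergence in a handful of rounds.

```python
def alternate_minimize(a, b, c, iters=7):
    """Alternating optimization on a stand-in quadratic
    E(x, y) = (x - y)^2 + a(x - b)^2 + a(y - c)^2:
    each update zeroes the partial derivative of E in one variable while
    the other is held fixed, mirroring the patent's cycle over the
    saliency vectors, W and eta (which reportedly stabilizes in ~7 rounds)."""
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = (y + a * b) / (1 + a)   # dE/dx = 0 with y fixed
        y = (x + a * c) / (1 + a)   # dE/dy = 0 with x fixed
    return x, y

x, y = alternate_minimize(a=1.0, b=0.0, c=4.0)
print(round(x, 3), round(y, 3))  # approaches the joint minimum (4/3, 8/3)
```

Each sub-problem is convex and easy, which is exactly why the energy function is split this way instead of being minimized jointly.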
Through the steps above, a good saliency detection result is obtained for infrared pedestrian video: pedestrians are highlighted completely and the background is almost entirely suppressed, and the method also has practical application value in other image processing fields such as object segmentation, object tracking and object retrieval.
(3) Advantages of the invention compared with the prior art:
the method provides a candidate region extraction method based on the Boolean diagram, can more completely maintain the edge information and the structural information of the object, and can better highlight the whole remarkable target.
Meanwhile, the method proposes a saliency description method combining motion saliency with the similarity operator, which fully describes the saliency of moving objects in video from both the spatial and the temporal perspective. Compared with previous methods, it can be applied to more complex backgrounds and to difficult scenes such as camera motion.
Finally, the method proposes an optimization model combining graph-structure self-learning and saliency propagation. Unlike previous graph models that use a manually set correlation matrix, the correlation matrix here corrects itself during learning using the saliency information, while the saliency propagation is optimized with the continuously revised graph structure. Compared with previous methods, this yields more robust and more accurate detection results.
Drawings
FIG. 1 is a block diagram of the detection method of the present invention.
Detailed Description
For a better understanding of the technical solutions of the invention, embodiments of the invention are further described below with reference to the accompanying drawings.
The flow of the invention is shown in FIG. 1. The invention provides an infrared video pedestrian saliency detection method combining graph learning and probability propagation, implemented in the following steps:
Step one: generate candidate regions based on Boolean maps;
First, perform superpixel segmentation on each frame of the infrared video: using the SLIC method, cluster adjacent pixels of the t-th frame image I_t that are similar in gray level and structure into a set of irregular, visually meaningful pixel blocks SP_t = {sp_{t,1}, sp_{t,2}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t denote the i-th superpixel of I_t and the total number of superpixels, respectively. The gray level of each superpixel is the mean gray level of its internal pixels:

C_t(sp_{t,i}) = (1 / |sp_{t,i}|) Σ_{p ∈ sp_{t,i}} C_t(p)

where C_t(p) is the gray value of pixel p in the t-th frame and |sp_{t,i}| is the area (pixel count) of superpixel sp_{t,i}.
Then construct the Boolean maps: threshold the superpixel-segmented t-th infrared frame with every integer from 255 down to 0, obtaining a series of binary images that form the Boolean map sequence B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:

B_{t,θ} = ξ(SP_t, θ)

where ξ is the thresholding operation: superpixels of SP_t whose gray level is smaller than the threshold θ are labeled 0, the others are labeled 1, and B_{t,θ} is the Boolean map at threshold θ. From the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions of B_{t,θ} keep growing until they finally merge into an all-white image.
Then concatenate adjacent Boolean maps to build the three-dimensional candidate regions: starting from B_{t,255}, number the connected regions that appear, assigning distinct consecutive integers to all connected regions. When a brand-new region that overlaps no numbered region appears in a later B_{t,θ}, assign it the next new number; when a region overlaps exactly one numbered region, it inherits that region's number; when it overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, the regions sharing the same number across different Boolean map layers form a three-dimensional region; N_r is the number of three-dimensional regions obtained from image I_t, and each three-dimensional region consists of several sub-region layers, where l indexes the layer of each sub-region.
Finally, a similarity operator is constructed to score each sub-region. First, the contrast of each sub-region is computed: the greater the gray contrast between the superpixels inside the sub-region and their adjacent outside superpixels, the more likely the region represents a complete object. The calculation formula is as follows:
where sp_{t,i} is a superpixel inside the sub-region and sp_{t,j} is a superpixel that does not belong to it; δ(·) is an indicator function that takes the value 1 when superpixel sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then calculating each sub-regionThe smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
Next, the gradient information contained at the boundary of each sub-region is computed: a boundary containing more gradient is more likely to enclose a complete object. The calculation formula is as follows:
where br_{t,p} is the response of the gradient map of image frame I_t at pixel p; the remaining symbols denote the set of boundary pixels of the sub-region and the area of the sub-region, respectively.
According to the above rules, the calculation formula of the similarity operator is as follows:
Finally, the similarity operators of all sub-regions within each three-dimensional region are normalized; the calculation formula is as follows:
The resulting similarity operator describes how likely each region is to represent the complete target region.
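Since the exact combination formula is elided in the text, the following sketch assumes a simple product of the three cues (contrast, homogeneity, boundary gradient) followed by sum-normalization within the three-dimensional region:

```python
import numpy as np

def similarity_operator(contrast, homogeneity, boundary_grad):
    """Hedged combination of the three sub-region cues; the patent's exact
    formula is not reproduced here, so a plain product is assumed."""
    return contrast * homogeneity * boundary_grad

# Cue values for three sub-region layers of one 3-D region (made-up numbers).
con = np.array([0.8, 0.5, 0.2])
hom = np.array([0.9, 0.7, 0.4])
grad = np.array([0.6, 0.3, 0.1])

o = similarity_operator(con, hom, grad)
o_norm = o / o.sum()          # normalize over the layers of the region
print(o_norm.round(3))
```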
Step two: calculating the motion significance of each frame image;
firstly, extracting the optical flow field of the video sequence; the LDOF (Large Displacement Optical Flow) method computes, between the current image frame I_t and its next frame I_{t+1}, the forward motion vector F = [Fx_t; Fy_t] in both the horizontal x and vertical y directions. Running LDOF on the video in reverse yields the backward motion vector B = [Bx_t; By_t] from the current frame I_t to its previous frame I_{t-1}.
Then, the motion saliency of each region is computed from the local motion gradients of the image. First, the gradient probability of image frame I_t is computed by summing the gradients of the motion field; the calculation formula is as follows
where the two operators respectively denote the gradient along the horizontal direction and the gradient along the vertical direction.
The gradient probabilities of the superpixels inside each sub-region are then summed to obtain the motion-gradient probability of the sub-region; the calculation formula is as follows:
where Mg_{t,p} is the motion-gradient probability of pixel p, and the result is the sub-region's motion saliency value based on the motion gradient.
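A minimal sketch of the motion-gradient probability, assuming it sums the absolute gradients of both flow components (the exact expression is not shown in the text):

```python
import numpy as np

# Synthetic forward flow field F_t = [Fx; Fy] for a 5x5 frame: static
# background with a small patch moving right (the "pedestrian").
Fx = np.zeros((5, 5))
Fy = np.zeros((5, 5))
Fx[2:4, 2:4] = 2.0

# Assumed form: sum of the magnitudes of the horizontal and vertical
# gradients of both flow components, normalized to [0, 1].
gy_x, gx_x = np.gradient(Fx)   # np.gradient returns (d/dy, d/dx)
gy_y, gx_y = np.gradient(Fy)
Mg = np.abs(gx_x) + np.abs(gy_x) + np.abs(gx_y) + np.abs(gy_y)
Mg = Mg / (Mg.max() + 1e-12)

# Motion-gradient saliency of a sub-region = sum of its pixels' Mg values.
region_mask = np.zeros((5, 5), dtype=bool)
region_mask[2:4, 2:4] = True
print(Mg[region_mask].sum() > Mg[~region_mask].mean())
```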
Secondly, motion saliency based on motion direction is computed by extracting the main directions of the background motion. First, the set of superpixels along the four edges of image I_t is extracted as background superpixels; the K-means clustering method then divides the motion vectors of the background superpixels into K classes, yielding K cluster centers, i.e. the main motion direction of each class. Classes holding less than 1/6 of the total samples are deleted, and the remaining cluster centers serve as the main background motion directions.
Then, the difference between the pixels inside each sub-region and the main background motion directions is computed; the larger this difference, the less likely the sub-region belongs to the background, giving its saliency value. The calculation formula is as follows:
where F_{t,p} is the motion vector of pixel p, and the result is the sub-region's motion saliency value based on motion direction.
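The background-main-direction extraction can be sketched with a minimal K-means (Lloyd's algorithm); the deterministic initialization and the synthetic border-superpixel motion vectors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=50, init=None):
    """Minimal K-means (Lloyd's algorithm) on motion vectors X of shape (n, 2)."""
    centers = (X[:k] if init is None else init).astype(float).copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

# Made-up motion vectors of the four-edge (border) superpixels: mostly
# camera motion near (1, 0) with two outliers near (-3, 2).
bg = np.vstack([rng.normal([1.0, 0.0], 0.05, (20, 2)),
                rng.normal([-3.0, 2.0], 0.05, (2, 2))])
# Deterministic init (one seed per expected cluster) keeps the demo stable.
centers, assign = kmeans(bg, k=2, init=bg[[0, -1]])

# Delete classes holding less than 1/6 of the samples; the remaining
# cluster centers are the main background motion directions.
keep = [j for j in range(2) if (assign == j).sum() / len(bg) >= 1 / 6]
main_dirs = centers[keep]
print(len(main_dirs))  # 1 -- the outlier cluster is discarded
```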
Finally, the motion saliency is combined with the similarity operator to obtain the probability that each superpixel belongs to the foreground/background. First, the motion-gradient saliency values and similarity values of a superpixel over all candidate regions are accumulated to give its foreground probability; the calculation formula is as follows:
Then, the motion-direction saliency values and similarity values of the superpixel over all candidate regions are accumulated to give its background probability; the calculation formula is as follows:
Step three: constructing a multi-view space-time diagram structure;
firstly, constructing the spatial neighborhood relationship of each node of the graph structure: each superpixel serves as a node of the graph, and the union of its adjacent superpixels and the superpixels adjacent to those is taken as the node's spatial neighborhood.
Then constructing a multi-view spatial correlation matrix; firstly, the gray scale difference value of each node and the neighborhood thereof is calculated to measure the gray scale similarity of each node, thereby constructing a gray scale correlation matrixThe calculation formula is as follows:
whereinIs node sp t,i The neighborhood of (a) is determined,is node sp t,i And sp t,j The gray scale dependency of (a).
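A hedged sketch of a gray correlation matrix; the Gaussian kernel and `sigma` are assumptions, since the patent's exact expression is elided:

```python
import numpy as np

# Gray values of 4 superpixel nodes and their (symmetric) spatial neighborhoods.
gray = np.array([100.0, 110.0, 200.0, 205.0])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
sigma = 20.0  # assumed bandwidth

# Gray correlation: Gaussian of the gray difference between neighboring nodes.
A_gray = np.zeros((4, 4))
for i, nbrs in neighbors.items():
    for j in nbrs:
        A_gray[i, j] = np.exp(-(gray[i] - gray[j]) ** 2 / (2 * sigma ** 2))

print(A_gray[0, 1] > A_gray[1, 2])  # similar grays -> stronger correlation
```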
Then the edge correlation matrix is computed from the gradient measured along the common edge of each pair of adjacent nodes: the stronger the gradient information separating adjacent superpixels, the less likely they belong to the same object; the calculation formula is:
where bp_{t,i→j} is the set of pixels in superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels in superpixel sp_{t,j} adjacent to sp_{t,i}, and br_{t,p} and br_{t,q} are the gradient values of pixels p and q, respectively; the remaining symbols denote the set of superpixels directly adjacent to superpixel sp_{t,j} and the edge correlation between nodes sp_{t,i} and sp_{t,j}.
Then the temporal correlation matrix between nodes of adjacent frames is constructed. First, the forward and backward optical flows of two adjacent consecutive frames I_t and I_{t+1} are mapped onto each other; the calculation formula is as follows:
pos_p and pos_q are the position coordinate vectors of pixel p in image I_t and pixel q in image I_{t+1}, respectively; one mapped position is obtained by mapping pos_p with f_{t,p} into the next frame I_{t+1}, the other by mapping pos_q with b_{t+1,q} into the previous frame I_t. The motion difference between each pixel and its mapped pixel is then computed to obtain the forward and backward motion probabilities:
where the first is the forward motion probability and the second the backward motion probability. Finally, the forward and backward motion probabilities over the overlapping parts of a superpixel and the superpixels of its adjacent frame are accumulated to measure their motion consistency, giving the temporal correlation matrix; the calculation formula is as follows:
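The forward/backward mapping and motion-probability step can be sketched as a standard forward-backward flow consistency check (the Gaussian form of the probability is an assumption):

```python
import numpy as np

H, W = 4, 4
# Forward flow f_t (frame t -> t+1) and backward flow b_{t+1} (t+1 -> t):
# uniform motion of +1 pixel in x, so the two flows are exact inverses.
f = np.zeros((H, W, 2)); f[..., 0] = 1.0
b = np.zeros((H, W, 2)); b[..., 0] = -1.0

ys, xs = np.mgrid[0:H, 0:W]
pos = np.stack([xs, ys], axis=-1).astype(float)

fwd_mapped = pos + f                    # pos_p mapped into frame t+1
# Look up the backward flow at the mapped (rounded, clipped) positions.
mx = np.clip(fwd_mapped[..., 0].astype(int), 0, W - 1)
my = np.clip(fwd_mapped[..., 1].astype(int), 0, H - 1)
diff = np.linalg.norm(f + b[my, mx], axis=-1)   # forward-backward residual

# Motion probability: small residual -> consistent motion (assumed form).
prob_f = np.exp(-diff ** 2)
print(prob_f.min())  # 1.0 for perfectly consistent flow
```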
Finally, the temporal correlation matrix is combined with the spatial correlation matrices to obtain spatio-temporal correlation matrices from different views. First, the superpixels of the two consecutive frames are concatenated to obtain a graph with N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, giving spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from different views, N_v being the number of views; the calculation formula is as follows:
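Assembling a spatio-temporal matrix from one spatial view and the temporal matrix can be sketched as a block layout over the concatenated nodes (the block arrangement is an assumption consistent with the description; the values are made up):

```python
import numpy as np

Nt, Nt1 = 3, 2                 # superpixel counts of frames t and t+1
N = Nt + Nt1

# Toy per-view spatial matrices for the two frames and one temporal
# matrix linking them.
S_t = np.eye(Nt)
S_t1 = np.eye(Nt1)
T = np.full((Nt, Nt1), 0.5)

def spatiotemporal(S_t, S_t1, T):
    """Concatenate the two frames' nodes and combine spatial blocks with
    the temporal cross-frame block into one N x N matrix."""
    A = np.zeros((N, N))
    A[:Nt, :Nt] = S_t            # spatial links within frame t
    A[Nt:, Nt:] = S_t1           # spatial links within frame t+1
    A[:Nt, Nt:] = T              # temporal links t -> t+1
    A[Nt:, :Nt] = T.T            # and back
    return A

A = spatiotemporal(S_t, S_t1, T)
print(A.shape, np.allclose(A, A.T))
```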
step four: and constructing and solving an energy function combining graph self-learning and significance propagation.
Firstly, two frames of images I are constructed t And I t+1 The cascaded initial significance vector describes each node of the graph, and the calculation formula is as follows:
then, since the times t and t+1 do not appear in the optimization of the energy function, the time indices t and t+1 are omitted from the variables for brevity, and the energy function is constructed as follows:
s.t. W1 = 1, W ≥ 0_n, η1 = 1, η ≥ 0
where the first term in the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, and the optimal combination weights are learned so that a linear combination of the multi-view correlation matrices fits the best W. The data terms in the second row are the saliency-propagation terms: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms in the first row are joint-learning terms: the learned optimal correlation matrix ensures that nodes with high temporal or spatial consistency keep similar saliency values, and these terms act in both graph learning and saliency propagation. The third row gives the optimization constraints: each row vector of W must sum to 1 and contain no element less than zero, and likewise the view weights η must sum to 1 and be non-negative.
On the basis, the method utilizes an alternative optimization method to sequentially solve the variables in the method; first of all maintainW and eta are constants, calculatingThe energy function is simplified as:
the calculation formula is as follows:
W is then computed while holding the saliency variables and η constant: for each row vector w_i of W, the energy function simplifies to:
This minimization problem is solved with the Optimization toolbox of Matlab to obtain w_i, and solving in turn for i = 1 to N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated. The values solved from the energy function reach a steady state after 7 repetitions, at which point they are combined into the final saliency value; the calculation method is as follows:
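With W and η held fixed, the saliency-propagation part reduces to a quadratic whose minimizer has the standard manifold-regularization closed form sketched below; this is a simplified stand-in for the patent's full energy, which contains additional joint-learning terms:

```python
import numpy as np

# Fixing W, minimizing ||s - s0||^2 + mu * s^T L s over s gives
# s = (I + mu * L)^{-1} s0, where L is the graph Laplacian of W
# (standard label-propagation form; an assumption about the elided energy).
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian of W
s0 = np.array([1.0, 0.0, 0.0])          # initial foreground probability
mu = 1.0

s = np.linalg.solve(np.eye(3) + mu * L, s0)
print(s.round(3))  # saliency spreads from node 0 to its strong neighbor 1
```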
Claims (14)
1. An infrared video pedestrian significance detection method combining graph learning and probability propagation is characterized by comprising the following steps of:
step one: generating candidate regions based on Boolean maps; first, super-pixel segmentation is performed on each frame of the infrared video; then Boolean maps are constructed and the maps at all levels are cascaded to obtain a series of three-dimensional regions; finally, similarity operators are calculated for the sub-regions of each three-dimensional region and normalized;
step two: and calculating the motion significance of each frame image. Firstly, extracting an optical flow field of a video sequence; then, calculating the motion significance of each region based on the local gradient according to the motion gradient of the image; secondly, calculating the motion significance based on the motion direction by extracting the background motion main direction; finally, combining the motion significance with the similarity operator to obtain the probability that each super pixel belongs to the foreground/background;
step three: and constructing a multi-view space-time diagram structure. Firstly, constructing a spatial neighborhood relationship of each node of a graph structure, and then constructing a spatial correlation matrix from three aspects of gray scale, edge and motion; a time correlation matrix between adjacent frame nodes is then constructed. Finally, combining the time correlation matrix with the space correlation matrix to obtain space-time correlation matrixes of different visual angles;
step four: and constructing and solving an energy function combining graph self-learning and significance propagation. An energy function containing graph learning, significance propagation and joint learning items is constructed on the basis of a graph structure, and variables in the energy function are sequentially solved by using an alternative optimization method, so that an optimized graph correlation matrix and a significance detection result are obtained.
2. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "super-pixel segmentation of each frame image of the infrared video" in step one is performed as follows: the SLIC algorithm clusters adjacent pixels with similar gray level and structure in the t-th frame image I_t into a set of irregular pixel blocks with a certain visual significance, SP_t = {sp_{t,1}, ..., sp_{t,N_t}}, where sp_{t,i} and N_t respectively denote the i-th superpixel of I_t and the total number of superpixels; the gray level of each superpixel is the average gray level of its internal pixels, and the calculation formula is as follows:
C_t(sp_{t,i}) = (1/|sp_{t,i}|) Σ_{p∈sp_{t,i}} C_t(p)
where C_t(p) represents the gray value of pixel p in the t-th frame image, and |sp_{t,i}| is the area of superpixel sp_{t,i}.
3. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step one, "building Boolean maps and concatenating the Boolean maps of various levels to obtain a series of three-dimensional regions" is performed as follows: the super-pixel-segmented t-th frame infrared image is thresholded with each integer from 255 down to 0, yielding a series of binary images that form the Boolean map set B_t = {B_{t,255}, B_{t,254}, ..., B_{t,0}}:
B_{t,θ} = ξ(SP_t, θ)
where ξ is the thresholding operation that labels the superpixels of SP_t smaller than threshold θ as 0 and the others as 1, and B_{t,θ} is the Boolean map at threshold θ. According to the gray-level distribution characteristics of infrared images, B_{t,255} is either a completely black image or contains a few isolated white regions; as the threshold decreases, the white regions in B_{t,θ} keep growing until they all merge into a fully white image.
Then, starting from B_{t,255}, the connected regions appearing in it are numbered from 1, all connected regions being assigned distinct consecutive integers. In the subsequent maps B_{t,θ}: when a brand-new region appears that does not overlap any numbered region, it is assigned a new number; when a region overlaps exactly one numbered region, it inherits that region's number; when a region overlaps several numbered regions, it inherits the number of the one with the largest area. Numbering all connected regions of the whole Boolean map sequence by this rule, a series of regions bearing the same number in different Boolean map layers forms a three-dimensional region, N_r being the number of three-dimensional regions obtained for image I_t; each three-dimensional region consists of several sub-region layers, l being the layer in which each sub-region lies.
4. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "calculating similarity operators for sub-regions of each three-dimensional region, and normalizing" in step one is calculated as follows: first, the contrast of each sub-region is computed: the greater the gray contrast between the superpixels inside the sub-region and their adjacent outside superpixels, the more likely the region represents a complete object. The calculation formula is as follows:
where sp_{t,i} is a superpixel inside the sub-region and sp_{t,j} is a superpixel that does not belong to it; δ(·) is an indicator function that takes the value 1 when superpixel sp_{t,j} belongs to the set of superpixels adjacent to sp_{t,i}, and 0 otherwise.
Then calculating each sub-regionThe smaller the difference between the superpixels within a sub-region, the more likely it represents a complete object. The calculation formula is as follows:
then calculating each sub-regionThe gradient information contained at the boundary, the more gradients contained at the boundary are more likely to be complete objects. The calculation formula is as follows:
wherein br t,p Is an image frame I t The response of the gradient map at pixel p,is a sub-regionThe set of boundary pixels of (1).Is a sub-regionThe area of (a).
According to the above rule, the calculation formula of the similarity descriptor is as follows:
and finally, normalizing the similarity operators of all the sub-regions in each three-dimensional region, wherein the calculation formula is as follows:
the resulting similarity operator describes the likelihood that each region can represent the complete target region.
5. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "extracting the optical flow field of the video sequence" in step two means that the LDOF (Large Displacement Optical Flow) method computes, between the current image frame I_t and its next frame I_{t+1}, the forward motion vector F = [Fx_t; Fy_t] in both the horizontal x and vertical y directions, and that running LDOF on the video in reverse yields the backward motion vector B = [Bx_t; By_t] from the current frame I_t to its previous frame I_{t-1}.
6. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step two, "calculating the motion saliency of each region based on the local gradient according to the motion gradient of the image" is calculated as follows: first, the gradient probability of image frame I_t is computed by summing the gradients of the motion field, with the following calculation formula
where the two operators respectively denote the gradient along the horizontal direction and the gradient along the vertical direction.
The gradient probabilities of the superpixels inside each sub-region are then summed to obtain the motion-gradient probability of the sub-region; the calculation formula is as follows:
7. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: in step two, "the motion saliency based on the motion direction is calculated by extracting the background motion principal direction" is performed as follows: first, the set of superpixels along the four edges of image I_t is extracted as background superpixels; the K-means clustering method then divides the motion vectors of the background superpixels into K classes, yielding K cluster centers, i.e. the main motion direction of each class; classes holding less than 1/6 of the total samples are deleted, and the remaining cluster centers serve as the main background motion directions.
Then, the difference between the pixels inside each sub-region and the main background motion directions is computed; the larger this difference, the less likely the sub-region belongs to the background, giving its saliency value. The calculation formula is as follows:
8. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the calculation method for "combining the motion saliency and the similarity operator to obtain the probability of each super pixel belonging to the foreground/background" in step two is as follows: first, the motion-gradient saliency values and similarity values of a superpixel over all candidate regions are accumulated to give its foreground probability, with the following calculation formula:
Then, the motion direction significant value and the similarity value of the superpixel in all the candidate areas are accumulated to obtain the background probability of the superpixel, and the calculation formula is as follows:
9. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "spatial neighborhood relationship of each node of the graph structure is constructed" in step three, which is performed as follows: each super pixel is taken as a node of the graph, and the set of the super pixels adjacent to the super pixel and the super pixels adjacent to the super pixels is taken as a spatial neighborhood of the node.
10. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "spatial correlation matrix is constructed from three aspects of gray scale, edge and motion respectively" in step three is calculated as follows: first, the gray similarity of each node is measured by the gray difference between the node and its neighborhood, giving the gray correlation matrix; the calculation formula is as follows:
where the first symbol denotes the neighborhood of node sp_{t,i}, and the second denotes the gray correlation between nodes sp_{t,i} and sp_{t,j}.
Then the edge correlation matrix is computed from the gradient measured along the common edge of each pair of adjacent nodes: the stronger the gradient information separating adjacent superpixels, the less likely they belong to the same object; the calculation formula is as follows:
where bp_{t,i→j} is the set of pixels in superpixel sp_{t,i} adjacent to sp_{t,j}, bp_{t,j→i} is the set of pixels in superpixel sp_{t,j} adjacent to sp_{t,i}, and br_{t,p} and br_{t,q} are the gradient values of pixels p and q, respectively; the remaining symbols denote the set of superpixels directly adjacent to superpixel sp_{t,j} and the edge correlation between nodes sp_{t,i} and sp_{t,j}.
11. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "construction of the time correlation matrix between adjacent frame nodes" in step three is performed as follows: first, the forward and backward optical flows of two adjacent consecutive frames I_t and I_{t+1} are mapped onto each other; the calculation formula is as follows:
pos_p and pos_q are the position coordinate vectors of pixel p in image I_t and pixel q in image I_{t+1}, respectively; one mapped position is obtained by mapping pos_p with f_{t,p} into the next frame I_{t+1}, the other by mapping pos_q with b_{t+1,q} into the previous frame I_t. The motion difference between each pixel and its mapped pixel is then computed to obtain the forward and backward motion probabilities:
where the first is the forward motion probability and the second the backward motion probability. Finally, the forward and backward motion probabilities over the overlapping parts of a superpixel and the superpixels of its adjacent frame are accumulated to measure their motion consistency, giving the temporal correlation matrix; the calculation formula is as follows:
12. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "combining the temporal correlation matrix and the spatial correlation matrix to obtain the spatio-temporal correlation matrix of different view angles" in step three is performed as follows: first, the superpixels of the two consecutive frames are concatenated to obtain a graph with N = N_t + N_{t+1} nodes; then the gray, edge and motion spatial correlation matrices are each combined with the temporal correlation matrix, giving spatio-temporal correlation matrices that describe the temporal and spatial consistency of the nodes from different views, N_v being the number of views; the calculation formula is as follows:
13. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "constructing an energy function including graph learning, significance propagation and joint learning terms on the basis of the graph structure" in step four is performed as follows: first, the concatenated initial saliency vector of the two frames I_t and I_{t+1} is constructed to describe each node of the graph; the calculation formula is as follows:
Then, since the times t and t+1 do not appear in the optimization of the energy function, the time indices t and t+1 are omitted from the variables for brevity, and the energy function is constructed as follows:
where the first term in the first row is the graph-learning term: W is the optimized spatio-temporal correlation matrix to be learned, and the optimal combination weights are learned so that a linear combination of the multi-view correlation matrices fits the best W. The data terms in the second row are the saliency-propagation terms: the optimized foreground and background saliency probabilities should stay consistent with the initial foreground and background probabilities, respectively. The last two terms in the first row are joint-learning terms: the learned optimal correlation matrix ensures that nodes with high temporal or spatial consistency keep similar saliency values, and these terms act in both graph learning and saliency propagation. The third row gives the optimization constraints: each row vector of W must sum to 1 and contain no element less than zero, and likewise the view weights η must sum to 1 and be non-negative.
14. The infrared video pedestrian significance detection method combining graph learning and probability propagation according to claim 1, characterized in that: the "solving the variables in sequence by using the alternating optimization method" in step four means that when solving one variable, the other variables are treated as constants and kept unchanged, after which the variable is updated and the next variable is solved; the variables in the energy function are solved in turn. First, W and η are held constant and the saliency variables are solved; the calculation formula is as follows:
W is then computed while holding the saliency variables and η constant: for each row vector w_i of W, the energy function simplifies to:
This minimization problem is solved with the Optimization toolbox of Matlab to obtain w_i, and solving in turn for i = 1 to N yields the optimized W.
The minimization over η can likewise be solved with the Optimization toolbox of Matlab.
Finally, W and η in the original energy function are updated to the solved values and the above computation is repeated. The values solved from the energy function reach a steady state after 7 repetitions, at which point they are combined into the final saliency value; the calculation method is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210167951.2A CN114913472A (en) | 2022-02-23 | 2022-02-23 | Infrared video pedestrian significance detection method combining graph learning and probability propagation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114913472A true CN114913472A (en) | 2022-08-16 |
Family
ID=82762817
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117133443A (en) * | 2023-08-29 | 2023-11-28 | 山东大学 | Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator |
CN117133443B (en) * | 2023-08-29 | 2024-03-12 | 山东大学 | Lower limb venous thrombosis ultrasonic auxiliary diagnosis system based on video dynamic operator |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||