CN112465872A - Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization - Google Patents
- Publication number: CN112465872A (application CN202011454593.0A)
- Authority: CN (China)
- Prior art keywords: optical flow, feature, layer, deformation, frame
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/246 (Analysis of motion using feature-based methods, e.g. the tracking of corners or segments)
- G06T5/73
- G06T5/80
- G06T7/13 (Edge detection)
- G06T2207/10016 (Video; Image sequence)
- G06T2207/20016 (Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform)
- G06T2207/20081 (Training; Learning)
- G06T2207/20084 (Artificial neural networks [ANN])
Abstract
The invention discloses an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization. First, any two consecutive frames of an image sequence are input and decomposed by feature pyramid downsampling into multi-resolution feature maps for both frames. In each pyramid layer, the correlation between the first-frame and second-frame features is computed and used to construct an occlusion mask module; the resulting occlusion mask removes edge artifacts from the warped features, optimizing the optical flow at blurred motion edges. The occlusion-constrained optical flow then drives a secondary deformation optimization module, which further refines the motion-edge optical flow to sub-pixel accuracy. The same occlusion masking and secondary deformation are applied to the warped features in every pyramid layer, yielding a residual flow that refines the optical flow; when the bottom pyramid layer is reached, the final optimized optical flow estimate is output. The method achieves higher accuracy and better applicability on image sequences with motion occlusion, large-displacement motion, and the like.
Description
Technical Field
The invention relates to an image sequence processing technology, in particular to an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization.
Background
Optical flow is a method that computes the motion of objects between adjacent frames by finding the correspondence between the previous frame and the current frame, using the temporal changes of pixels in an image sequence and the correlation between adjacent frames. Optical flow estimation over an image sequence recovers the geometry and motion of objects in a scene. In recent years, with the rapid development of deep learning theory and technology, convolutional neural network models have been widely applied to optical flow estimation; because of their notable advantages of high computation speed and high stability, optical flow estimation has gradually become a hotspot in image processing and computer vision research. The results are widely applied in higher-level visual tasks such as object detection, object tracking, action recognition, autonomous driving, and three-dimensional reconstruction.
At present, deep learning is the most commonly adopted approach in image sequence optical flow computation; compared with traditional optical flow estimation based on mathematically derived feature matching and iterative minimization of an energy functional, it estimates optical flow more efficiently, quickly, and accurately. However, because objects in an image sequence undergo motion occlusion or large-displacement motion, optical flow estimation still suffers from motion-edge blurring and motion information lost to large displacements; robustness on image sequences containing non-rigid motion and large displacements remains poor, which limits the application of deep-learning-based optical flow estimation in various fields.
Disclosure of Invention
The invention aims to provide, in view of the defects and shortcomings of the prior art, an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization. The optical flow estimate is refined using the learnable occlusion mask of each pyramid layer and the residual flow obtained by secondary deformation, so as to improve the accuracy and robustness of the pyramid-layered image sequence model in estimating the optical flow at moving-object edges in a scene.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows. An image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization comprises the following steps:
1) inputting any two consecutive frames of an image sequence;
2) performing feature pyramid downsampling and layering on the two selected frames to obtain two five-layer multi-resolution feature maps;
3) first computing the correlation between the two frames' features at the highest pyramid layer, then inputting the correlation into the optical flow estimator to compute an initial optical flow;
4) constructing a learnable occlusion mask optimization module from the correlation. The module comprises five consecutively stacked convolution modules; each convolution module Conv comprises a 3 × 3 convolution, batch normalization, and a LeakyReLU activation. Given the correlation, the optical flow, the context feature, and the first-frame feature of the current pyramid layer, the learnable occlusion mask module outputs a single-channel occlusion mask feature; the number of feature channels decreases layer by layer through the consecutive convolution layers (128, 96, 64, 32, and 1, with no activation after the last layer). The calculation formula is as follows:
in the formula: fⅰ 1、Fⅰ 2Respectively represent the characteristics of the first frame and the second frame of the ith-2, 3,4,5,6 th layer(ii) a Wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame feature is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; representing the inner product of two features; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper-layer optical flow is up-sampled by a factor of two, and the size of the optical flow is doubled as the feature scale is increased by one time; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation is beneficial to reducing the matching distance of the feature space, and the deformation and displacement of the pixel points between frames are weakened; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask;
the mask of the obtained shielding mask characteristicsiAfter upsampling, activating a function, and then performing inner product on the feature channel dimension with the lower pyramid deformation feature, because some useful optical flow information is masked while removing the edge artifact of the deformation feature, the optical flow information missing from the masked feature needs to be compensated by adding the deconvolved upper pyramid cascade feature containing the missing information, and the optimized deformation feature can be obtained; the canonical constraint of the multi-scale context occlusion mask on the second frame deformation feature can be expressed as:
In the formula: warpF̂_i^2 denotes the feature after constraining the warping artifacts with the occlusion mask; upmask_{i+1} denotes the occlusion mask after two-times upsampling; deconv denotes deconvolution of the upper pyramid context cascade feature x_{i+1}; sigmoid denotes the activation function that thresholds mask_i into an occlusion probability mask in (0, 1);
5) For the occlusion mask module, the current pyramid layer's optical flow, correlation, context feature, and first-frame feature are concatenated as input, yielding an occlusion mask feature map. The lower a pixel's gray value in the occlusion mask feature map, the more the pixel tends to be visible in the first frame and occluded in the second; conversely, the higher the gray value, the more the pixel tends to be occluded in the first frame and visible in the second. Applying the learnable occlusion mask as a constraint on the warped features suppresses image motion-edge blurring;
6) constructing a secondary deformation optimization module from the optical flow estimate. The optical flow is computed from the correlation between the first-frame feature and the unwarped feature or the occlusion-masked warped feature; this flow warps the second-frame feature a second time. The flow, the warped feature, and the first-frame feature are concatenated along the feature dimension and passed through five consecutive convolution modules whose channel counts decrease layer by layer (128, 96, 64, 32, and 2, with no activation after the last layer), outputting a two-channel residual flow. The sum of the residual flow and the current layer's optical flow is the final optical flow estimate with optimized motion edges and large-displacement motion. The calculation formula is as follows:
In the formula: flow_j denotes the optical flow of the j-th layer (j = 2, 3, 4, 5, 6) obtained by computing the correlation with the occlusion-masked features (warping errors suppressed) and passing it through the optical flow computation layer; warpF̃_j^2 denotes the second-frame feature after the secondary warp with the first-warp-optimized flow_j; residualflow_j is the j-th layer residual optical flow; feat_j is the j-th layer context cascade feature; finalflow_j is the j-th layer predicted optical flow;
7) For the secondary deformation residual flow optimization module, the first-frame feature, the secondarily warped feature, and the current pyramid layer's flow are concatenated as input, yielding the residual flow. The first warp gives a pixel-level flow estimate and the second a sub-pixel-level one; the secondary residual flow carries rich contour information of moving objects, further compensating the flow field, guiding flow learning, optimizing image motion edges, reducing the matching distance in feature space, and recovering flow information lost to large-displacement motion;
8) The same occlusion mask constraint and secondary-deformation residual flow computation are performed in each pyramid layer, and the sum of the mask-regularized flow and the secondary-deformation residual flow is taken as the motion-edge-optimized flow estimate finalflow_j. When the bottom pyramid layer is reached, the final dense optical flow estimate with motion-edge optimization is output, from which rich motion information and object geometry can be obtained.
The image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization uses the learnable occlusion mask to remove edge artifacts of the warped image caused by motion occlusion, and uses the residual flow produced by secondary deformation to correct blurred motion edges and the optical flow information of moving objects lost to large-displacement motion.
Drawings
FIG. 1 is the first frame of the KITTI2015 training image sequence in an embodiment of the present invention;
FIG. 2 is the second frame of the KITTI2015 training image sequence in an embodiment of the present invention;
FIG. 3 is a block diagram of the pyramid-based hierarchical optical flow estimation model of an embodiment;
FIG. 4 is the occlusion mask feature map of the KITTI2015 training image sequence computed by the embodiment;
FIG. 5 is the residual flow map from the secondary deformation estimation of the KITTI2015 training image sequence computed by the embodiment;
FIG. 6 is the optical flow map of the final optimized image motion edges of the KITTI2015 training image sequence computed by the embodiment.
Detailed Description
The invention will be further described with reference to the accompanying drawings. Referring to FIGS. 1 to 6, an embodiment of the image sequence optical flow computation method based on a learnable occlusion mask and secondary deformation optimization is illustrated below, using an optical flow computation experiment on the KITTI2015 training image sequence:
it comprises the following steps:
firstly, inputting the first frame and the second frame of the KITTI2015 training image sequence (as shown in FIG. 1 and FIG. 2);
secondly, performing feature pyramid downsampling and layering on the input KITTI2015 training image sequence (as shown in FIG. 3); I_t denotes the first frame and I_{t+1} the second frame of the KITTI2015 training image sequence. A convolutional neural network feature pyramid is established, comprising five consecutive convolution modules with channel counts of 128, 64, and 64 for the consecutive convolution layers. The first frame I_t and the second frame I_{t+1} are concatenated and input into the feature pyramid, which extracts pyramid features from the two frames to give two five-layer multi-resolution feature maps; each lower pyramid layer's feature map has twice the resolution of the layer above it;
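As a toy illustration of the pyramid layering, the following self-contained NumPy sketch uses plain 2 × 2 average pooling in place of the patent's learned stride-2 convolutions; the image size and channel count are made up, and only the halving of resolution per level matches the description above:

```python
import numpy as np

def downsample2x(feat: np.ndarray) -> np.ndarray:
    """Halve spatial resolution of an (H, W, C) map by 2x2 average pooling."""
    h, w, c = feat.shape
    return feat[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def feature_pyramid(image: np.ndarray, levels: int = 5):
    """Build a five-level pyramid; index 0 is the finest (bottom) level."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

frame = np.random.rand(64, 96, 3)       # stand-in for one input frame's features
pyr = feature_pyramid(frame)
print([p.shape[:2] for p in pyr])       # [(64, 96), (32, 48), (16, 24), (8, 12), (4, 6)]
```

Each level's resolution is half the one below it, matching the statement that the lower pyramid layer has twice the resolution of the layer above.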
thirdly, computing the correlation of the consecutive frames' features at the highest layer of the feature pyramid, then inputting the correlation into the optical flow estimator to compute the initial optical flow (as shown in FIG. 3);
fourthly, constructing a learnable occlusion mask estimation module from the correlation (shown as a dashed box in FIG. 3). The occlusion mask estimation module comprises five consecutively stacked convolution modules, each comprising a 3 × 3 convolution, batch normalization, and a LeakyReLU activation. The correlation, optical flow, context cascade feature, and first-frame feature of the current pyramid layer are concatenated along the feature channel dimension and input into the consecutive convolution layers of the occlusion mask estimator; the feature channel counts decrease layer by layer (128, 96, 64, 32, and 1, with no activation after the last layer), outputting a single-channel occlusion mask feature; the calculation formula is as follows:
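The formula images did not survive extraction. Read together with the symbol definitions that follow, a plausible reconstruction of the missing equations (an assumption about their exact layout, not the patent's verbatim formulas) is:

```latex
\begin{aligned}
warpF_i^2(x) &= F_i^2\bigl(x + upflow_{i+1}(x)\bigr), \qquad
upflow_{i+1} = 2\,\mathrm{up}_2\bigl(flow_{i+1}\bigr),\\[2pt]
corr_i(x_1, x_2) &= \Bigl\langle \tfrac{F_i^1(x_1)}{\lVert F_i^1(x_1)\rVert_2},\;
\tfrac{warpF_i^2(x_2)}{\lVert warpF_i^2(x_2)\rVert_2} \Bigr\rangle,\qquad
\lVert x_1 - x_2 \rVert_\infty \le d = 4,\\[2pt]
x_i &= \mathrm{cat}\bigl(corr_i,\ upflow_{i+1},\ F_i^1,\ \ldots\bigr),\qquad
flow_i = \mathrm{estimationflow}(x_i),\\[2pt]
mask_i &= \mathrm{Conv}_{1}\circ\mathrm{Conv}_{32}\circ\mathrm{Conv}_{64}\circ
\mathrm{Conv}_{96}\circ\mathrm{Conv}_{128}(x_i).
\end{aligned}
```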
in the formula: fⅰ 1、Fⅰ 2The i-th ═ {2,3,4,5,6} layer characteristics of the first frame and the second frame are represented respectively; wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper layer optical flow is up-sampled by a factor of two, and the vector size of the pyramid upper layer optical flow is also doubled along with the increase of the optical flow scale; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation is beneficial to reducing the matching distance of the feature space, and the deformation and displacement of the pixel points between frames are weakened; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask;
The obtained occlusion mask feature mask_i is upsampled, passed through the activation function, and then multiplied elementwise with the lower pyramid layer's warped feature along the feature channel dimension. Because removing the edge artifacts of the warped feature also masks some useful optical flow information, the information missing from the masked feature is compensated by the deconvolved upper-layer pyramid cascade feature that contains it, giving the optimized warped feature (shown in the occlusion mask estimation module in the dashed box of FIG. 3). The regularization constraint of the multi-scale context occlusion mask on the second-frame warped feature can be expressed as:
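The constraint's equation image is likewise missing; from the symbols explained below, it plausibly takes the form (a reconstruction, not the patent's verbatim formula):

```latex
\widehat{warpF}_i^{\,2} \;=\; \operatorname{sigmoid}\bigl(upmask_{i+1}\bigr)\odot warpF_i^2
\;+\; \operatorname{deconv}\bigl(x_{i+1}\bigr),
\qquad upmask_{i+1} = \mathrm{up}_2\bigl(mask_{i+1}\bigr).
```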
In the formula: warpF̂_i^2 denotes the feature after constraining the warping artifacts with the occlusion mask; upmask_{i+1} denotes the occlusion mask after two-times upsampling; deconv denotes deconvolution of the upper pyramid context cascade feature x_{i+1}; sigmoid denotes the activation function that thresholds mask_i into an occlusion probability mask in (0, 1);
fifthly, in each pyramid layer, the current layer's optical flow, feature-map correlation, context cascade feature, and first-frame feature are concatenated and input into the learnable occlusion mask estimation module; each pyramid layer outputs a learnable occlusion mask feature that regularizes the warped feature of the layer below, continuously correcting flow errors. When the pyramid bottom layer is reached, the occlusion mask feature map of the KITTI2015 training image sequence is obtained (as shown in FIG. 4). In the occlusion mask feature map, the lower a pixel's gray value, the more the pixel tends to be visible in the first frame and occluded in the second; conversely, the higher the gray value, the more the pixel tends to be occluded in the first frame and visible in the second. Applying the learnable occlusion mask as a constraint on the warped features suppresses blurred motion edges in the estimated optical flow;
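Numerically, the gating-plus-compensation step can be sketched in pure NumPy (all shapes are made up): sigmoid gating suppresses warped features at likely-occluded pixels, and the deconvolved upper-layer context term restores the information the gate removed.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def apply_occlusion_mask(warped: np.ndarray, up_mask: np.ndarray,
                         deconv_ctx: np.ndarray) -> np.ndarray:
    """Gate (H, W, C) warped features with a single-channel mask of logits,
    then add the upper-layer context features to compensate masked-out flow info."""
    return sigmoid(up_mask)[..., None] * warped + deconv_ctx

warped = np.ones((4, 4, 8))              # warped second-frame features
mask_logits = np.full((4, 4), -20.0)     # strongly "occluded" everywhere
ctx = 0.5 * np.ones((4, 4, 8))           # stand-in for the deconvolved context
out = apply_occlusion_mask(warped, mask_logits, ctx)
print(np.allclose(out, 0.5, atol=1e-6))  # True: gated to ~0, context restored
```

With large negative logits, the sigmoid gate drives the warped contribution to nearly zero and only the context compensation survives, mirroring the text's description of masking plus compensation.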
sixthly, constructing a secondary deformation residual flow estimation module from the optical flow estimate. The secondary deformation residual flow estimator comprises five consecutive convolution modules whose channel counts decrease layer by layer (128, 96, 64, 32, and 2, with no activation after the last layer). In the first secondary deformation residual flow estimation module (shown as the dashed box at the lower left of FIG. 3), the initial flow computed at the topmost pyramid layer directly warps the second-frame feature through the feature warper; the flow, warped feature, and first-frame feature are then concatenated along the feature channel dimension and input into the secondary deformation residual flow estimator to obtain the residual flow. The sum of the residual flow and the initial flow is taken as the optimized flow estimate of the top pyramid layer. The second-frame feature warped by the topmost optimized flow is input into the learnable occlusion mask estimation module to obtain the regularized warped feature; the correlation between this occlusion-masked warped feature and the first-frame feature is input into the optical flow estimator to compute the mask-constrained optimized flow, which warps the second-frame feature a second time. The flow, warped feature, and first-frame feature are concatenated along the feature channel dimension and input into the second secondary deformation residual flow estimation module (shown as the dashed box at the lower right of FIG. 3) to obtain the secondary residual flow. The sum of the secondary residual flow and the current layer's flow is taken as the final flow estimate with optimized motion edges and large-displacement motion; the calculation formula is as follows:
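The residual-flow formula images did not survive extraction either; consistent with the symbol list that follows, a plausible reconstruction (an assumption, not the patent's verbatim equations) is:

```latex
\begin{aligned}
flow_j &= \mathrm{estimationflow}\bigl(corr(F_j^1,\ \widehat{warpF}_j^{\,2})\bigr),\\
\widetilde{warpF}_j^{\,2}(x) &= F_j^2\bigl(x + flow_j(x)\bigr),\\
residualflow_j &= \mathrm{Conv}_{2}\circ\mathrm{Conv}_{32}\circ\mathrm{Conv}_{64}\circ
\mathrm{Conv}_{96}\circ\mathrm{Conv}_{128}
\bigl(\mathrm{cat}(flow_j,\ \widetilde{warpF}_j^{\,2},\ F_j^1,\ feat_j)\bigr),\\
finalflow_j &= flow_j + residualflow_j.
\end{aligned}
```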
In the formula: flow_j denotes the optical flow estimate of the j-th layer (j = 2, 3, 4, 5, 6), obtained by computing the correlation with the occlusion-mask-constrained features and inputting it into the optical flow estimator; warpF̃_j^2 denotes the second-frame feature after the secondary warp with the first-warp-optimized flow_j; residualflow_j is the j-th layer residual flow; feat_j is the j-th layer context cascade feature; finalflow_j is the j-th layer predicted optical flow;
seventhly, in each pyramid layer, the first-frame feature, the secondarily warped feature, and the current layer's flow are concatenated and input into the secondary deformation residual flow estimation module; each pyramid layer outputs a residual flow that further optimizes the occlusion-mask-constrained flow. When the pyramid bottom layer is reached, the last layer's residual flow estimate is obtained (as shown in FIG. 5). The first warp gives a pixel-level flow estimate and the second a sub-pixel-level one; the secondary residual flow carries rich contour information of moving objects, further compensating the flow field, guiding flow learning, optimizing image motion edges, reducing the matching distance in feature space, and recovering flow information lost to large-displacement motion;
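Both deformation passes are backward warps of the second-frame features by the current flow. A minimal NumPy bilinear sampler, border-clamped (the network itself would presumably use a differentiable grid sampler), looks like:

```python
import numpy as np

def warp(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp feat (H, W, C) by flow (H, W, 2); bilinear, border-clamped."""
    h, w, _ = feat.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)            # sample locations in frame 2
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (x - x0)[..., None], (y - y0)[..., None]
    return ((1 - wx) * (1 - wy) * feat[y0, x0] + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0] + wx * wy * feat[y1, x1])

f = np.arange(12.0).reshape(3, 4, 1)                    # tiny single-channel "feature"
shift = np.zeros((3, 4, 2))
shift[..., 0] = 1.0                                     # sample one pixel to the right
out = warp(f, shift)
print(out[0, 0, 0])                                     # 1.0
```

A flow of (+1, 0) makes every output pixel take the value one column to its right, which is the backward-warping convention the pyramid layers rely on.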
eighthly, the same occlusion mask constraint and secondary-deformation residual flow computation are performed in each pyramid layer, and the sum of the mask-regularized flow and the secondary-deformation residual flow is taken as the motion-edge-optimized flow estimate finalflow_j. When the pyramid bottom layer is reached, the final dense optical flow estimate with motion-edge optimization for the KITTI2015 training image sequence is output (as shown in FIG. 6). The larger a pixel's gray value in the dense flow estimate, the larger its optical flow value and the faster its relative motion; conversely, the smaller the gray value, the smaller the flow value and the slower the relative motion. Rich motion information and object geometry are obtained from the flow estimate, which can be effectively applied to higher-level visual tasks.
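Putting the pieces together, the coarse-to-fine control flow of steps three through eight can be sketched with stand-in estimators. All function names, shapes, and the zero-valued stubs below are illustrative, not the patent's learned modules; only the loop structure (upsample, occlusion-mask refine, re-estimate, add residual) follows the description above.

```python
import numpy as np

def upsample2x_flow(flow: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsample; flow vectors double because the scale doubles."""
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(pyr1, pyr2, estimate_flow, occlusion_refine, residual_refine):
    """Top-down refinement over fine-to-coarse pyramid lists (index 0 = finest).
    The three callables stand in for the patent's learned estimators."""
    flow = estimate_flow(pyr1[-1], pyr2[-1], None)        # initial flow, coarsest level
    for f1, f2 in zip(reversed(pyr1[:-1]), reversed(pyr2[:-1])):
        up = upsample2x_flow(flow)                        # upflow from the layer above
        f2_masked = occlusion_refine(f1, f2, up)          # occlusion-mask constraint
        flow = estimate_flow(f1, f2_masked, up)           # mask-constrained flow
        flow = flow + residual_refine(f1, f2, flow)       # secondary-deformation residual
    return flow

# Zero-valued stubs just to exercise the control flow.
pyr1 = [np.zeros((8, 8, 4)), np.zeros((4, 4, 4)), np.zeros((2, 2, 4))]
pyr2 = [p.copy() for p in pyr1]
ef = lambda f1, f2, up: np.zeros(f1.shape[:2] + (2,))
final = coarse_to_fine(pyr1, pyr2, ef,
                       lambda f1, f2, up: f2,
                       lambda f1, f2, fl: np.zeros_like(fl))
print(final.shape)                                        # (8, 8, 2)
```

The output flow field has the resolution of the finest pyramid level and two channels (horizontal and vertical displacement), matching the dense flow map of FIG. 6.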
The above description is only for the purpose of illustrating the technical solutions of the present invention and not for the purpose of limiting the same, and other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the scope of the claims of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (1)
1. An image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization comprises the following steps:
1) inputting any two consecutive frames of an image sequence;
2) performing feature pyramid downsampling and layering on the two selected frames to obtain two five-layer multi-resolution feature maps;
3) first computing the correlation between the two frames' features at the highest pyramid layer, then inputting the correlation into the optical flow estimator to compute an initial optical flow;
4) constructing a learnable occlusion mask optimization module from the correlation. The module comprises five consecutively stacked convolution modules; each convolution module Conv comprises a 3 × 3 convolution, batch normalization, and a LeakyReLU activation. Given the correlation, the optical flow, the context cascade feature, and the first-frame feature of the current pyramid layer, the learnable occlusion mask module outputs a single-channel occlusion mask feature; the feature channel counts decrease layer by layer through the consecutive convolution layers (128, 96, 64, 32, and 1, with no activation after the last layer); the calculation formula is as follows:
in the formula: fⅰ 1、Fⅰ 2The i-th ═ {2,3,4,5,6} layer characteristics of the first frame and the second frame are represented respectively; wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame feature is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; representing the inner product of two features; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper-layer optical flow is up-sampled by a factor of two, and the size of the optical flow is doubled as the feature scale is increased by one time; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation contributing to a reduction in feature spaceMatching distance, and weakening deformation and displacement of pixel points between frames; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask features;
the obtained occlusion mask feature mask_i is upsampled and passed through an activation function, then combined by inner product along the feature channel dimension with the lower pyramid layer's deformation feature; because removing the edge artifacts of the deformation feature also masks out some useful optical flow information, the deconvolved upper-pyramid cascade feature containing the missing information is added to compensate for the optical flow information lost to masking, yielding the optimized deformation feature; the regularization constraint of the multi-scale context occlusion mask on the second-frame deformation feature can be expressed as:
in the formula: the constrained feature denotes the deformation feature after its artifacts are suppressed by the occlusion mask; upmask_{i+1} denotes the occlusion mask after two-fold upsampling; deconv denotes deconvolution of the upper-pyramid context cascade feature x_{i+1}; sigmoid denotes the activation function that maps the occlusion mask mask_i to an occlusion probability mask with values in (0, 1);
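The masking-plus-compensation step can be sketched as follows. Here the sigmoid-squashed mask multiplies the warped feature channel-wise, and `context_feat` stands in for the already-deconvolved upper-pyramid cascade feature; the function name and the additive form of the compensation are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_occlusion_mask(warped_feat, mask_logits, context_feat):
    """Sketch of the regularization constraint of step 4): the mask logits
    are squashed into an occlusion probability in (0, 1), multiplied into
    the warped second-frame feature to suppress deformation artifacts, and
    the upper-pyramid context feature is added back to compensate for
    useful flow information removed by the mask."""
    prob = sigmoid(mask_logits)        # (H, W, 1) occlusion probability
    constrained = warped_feat * prob   # suppress warping edge artifacts
    return constrained + context_feat  # re-inject masked-out information

rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
out = apply_occlusion_mask(rng.standard_normal((H, W, C)),
                           rng.standard_normal((H, W, 1)),
                           rng.standard_normal((H, W, C)))
print(out.shape)  # (8, 8, 16)
```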
5) feeding the occlusion mask module the superposition of the current pyramid layer's optical flow, correlation, context cascade feature and first-frame feature to obtain the occlusion mask feature map; the lower the gray value of a pixel in the occlusion mask feature map, the more that pixel tends to be visible in the first frame and occluded in the second; conversely, the higher the gray value, the more the pixel tends to be occluded in the first frame and visible in the second; applying the learnable occlusion mask as a constraint on the deformation feature suppresses blurring at image motion edges;
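The 128 → 96 → 64 → 32 → 1 channel schedule of the mask module in steps 4)-5) can be sketched with per-pixel (1 × 1) linear layers. This is an assumption-laden stand-in: the patent specifies 3 × 3 convolutions with batch normalization, both simplified away here; only the channel widths, the Leaky ReLU, and the activation-free last layer follow the claim.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)

def occlusion_mask_head(features, seed=0):
    """Five stacked blocks with channel widths 128, 96, 64, 32, 1.
    1x1 convolutions (per-pixel matmuls) stand in for the learned 3x3
    convolutions; the final layer has no activation, per the claim."""
    rng = np.random.default_rng(seed)
    widths = [128, 96, 64, 32, 1]
    x = features
    for i, w in enumerate(widths):
        kernel = rng.standard_normal((x.shape[-1], w)) * 0.01
        x = x @ kernel
        if i < len(widths) - 1:  # last layer: no activation
            x = leaky_relu(x)
    return x  # single-channel occlusion mask logits

feat = np.random.default_rng(1).standard_normal((8, 8, 96))
mask_logits = occlusion_mask_head(feat)
print(mask_logits.shape)  # (8, 8, 1)
```

The step-6) residual flow head differs only in its last width (2 output channels instead of 1).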
6) constructing a secondary deformation optimization module from the optical flow estimation; computing the optical flow from the correlation between the first-frame feature and the undeformed feature or the occlusion-masked deformation feature, using this optical flow to deform the second-frame feature a second time, superposing the optical flow, the deformation feature and the first-frame feature along the feature dimension, and passing them through five consecutive convolution modules whose channel counts decrease as 128, 96, 64, 32 and 2, the last layer having no activation function and outputting a two-channel residual flow; the sum of the residual flow and the current layer's optical flow is the final optical flow estimation with optimized image motion edges and large-displacement motion; the calculation formula is as follows:
in the formula: flow_j denotes the optical flow estimation at layer j (j = {2,3,4,5,6}), obtained by computing the correlation from the occlusion-masked features with deformation errors suppressed, then passing it through the optical flow calculation layer; the secondary deformation feature denotes the second-frame feature warped a second time with the first-deformation-optimized optical flow flow_j; residualflow_j is the j-th layer residual optical flow; feat_j is the j-th layer context cascade feature; finalflow_j is the predicted optical flow at layer j;
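Both deformations rely on backward warping: sampling the second-frame feature at x + flow(x) so it aligns with the first frame. A bilinear numpy sketch is below; the patent does not spell out the interpolation scheme, so bilinear sampling with border clamping is an assumption (it is the usual choice in pyramid flow networks).

```python
import numpy as np

def backward_warp(feat, flow):
    """Bilinearly sample `feat` at positions displaced by `flow`
    (flow[..., 0] = x displacement, flow[..., 1] = y displacement),
    clamping samples to the image border."""
    h, w = feat.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    return (feat[y0, x0] * (1 - wx) * (1 - wy)
            + feat[y0, x1] * wx * (1 - wy)
            + feat[y1, x0] * (1 - wx) * wy
            + feat[y1, x1] * wx * wy)

feat = np.random.default_rng(0).standard_normal((8, 8, 4))
zero_flow = np.zeros((8, 8, 2))
# Warping with zero flow must return the feature unchanged.
assert np.allclose(backward_warp(feat, zero_flow), feat)
```

The first warp (step 4) gives a pixel-level alignment; the second warp (step 6) refines it to sub-pixel level using the mask-regularized flow.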
7) feeding the secondary deformation residual flow optimization module the superposition of the first-frame feature, the secondarily deformed feature and the current pyramid layer's optical flow to obtain the residual flow; the first deformation yields a pixel-level optical flow estimation and the second a sub-pixel-level one; the secondary residual flow contains rich contour information of moving objects, further compensating the optical flow field, guiding optical flow learning, optimizing image motion edges, reducing the matching distance in feature space, and making up for optical flow information lost to large-displacement motion;
8) performing the same occlusion mask constraint and secondary deformation residual flow calculation at each pyramid layer, taking the sum of the occlusion-mask-regularized optical flow and the secondary deformation residual flow as the motion-edge-optimized optical flow estimation finalflow_j; on reaching the bottom pyramid layer, outputting the final motion-edge-optimized dense optical flow estimation of the image sequence, from which rich motion information and object geometry can be obtained.
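The coarse-to-fine accumulation of step 8) can be sketched as follows, with the per-level flows and residuals given as inputs rather than estimated. Nearest-neighbour upsampling and the doubling of flow magnitude per level follow the `upflow` description in step 4); `coarse_to_fine` is an illustrative name.

```python
import numpy as np

def upsample_flow(flow):
    """Upsample a flow field by a factor of two: nearest-neighbour on the
    grid, vectors doubled because displacements scale with resolution."""
    up = flow.repeat(2, axis=0).repeat(2, axis=1)
    return 2.0 * up

def coarse_to_fine(layer_flows, layer_residuals):
    """Each level's output is the mask-regularized flow plus the secondary
    deformation residual flow; the result seeds the next finer level after
    2x upsampling. Lists are ordered coarsest to finest."""
    final = layer_flows[0] + layer_residuals[0]
    for flow, residual in zip(layer_flows[1:], layer_residuals[1:]):
        final = upsample_flow(final) + flow + residual
    return final

flows = [np.zeros((4 * 2 ** i, 4 * 2 ** i, 2)) for i in range(3)]
residuals = [np.ones((4 * 2 ** i, 4 * 2 ** i, 2)) for i in range(3)]
out = coarse_to_fine(flows, residuals)
print(out.shape)  # (16, 16, 2)
```

With zero flows and unit residuals the accumulated flow is (1·2 + 1)·2 + 1 = 7 everywhere, illustrating how coarse-level displacements are amplified as resolution grows.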
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011454593.0A CN112465872B (en) | 2020-12-10 | 2020-12-10 | Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112465872A true CN112465872A (en) | 2021-03-09 |
CN112465872B CN112465872B (en) | 2022-08-26 |
Family
ID=74801930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011454593.0A Active CN112465872B (en) | 2020-12-10 | 2020-12-10 | Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112465872B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926498A (en) * | 2022-04-26 | 2022-08-19 | 电子科技大学 | Rapid target tracking method based on space-time constraint and learnable feature matching |
CN115937906A (en) * | 2023-02-16 | 2023-04-07 | 武汉图科智能科技有限公司 | Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070092122A1 (en) * | 2005-09-15 | 2007-04-26 | Jiangjian Xiao | Method and system for segment-based optical flow estimation |
US20070258707A1 (en) * | 2006-05-08 | 2007-11-08 | Ramesh Raskar | Method and apparatus for deblurring images |
CN107220596A (en) * | 2017-05-11 | 2017-09-29 | 西安电子科技大学 | Estimation method of human posture based on cascade mechanism for correcting errors |
CN107527358A (en) * | 2017-08-23 | 2017-12-29 | 北京图森未来科技有限公司 | A kind of dense optical flow method of estimation and device |
CN107862706A (en) * | 2017-11-01 | 2018-03-30 | 天津大学 | A kind of improvement optical flow field model algorithm of feature based vector |
CN108010061A (en) * | 2017-12-19 | 2018-05-08 | 湖南丹尼尔智能科技有限公司 | A kind of deep learning light stream method of estimation instructed based on moving boundaries |
US20180324465A1 (en) * | 2017-05-05 | 2018-11-08 | Disney Enterprises, Inc. | Edge-aware spatio-temporal filtering and optical flow estimation in real time |
CN108776971A (en) * | 2018-06-04 | 2018-11-09 | 南昌航空大学 | A kind of variation light stream based on layering nearest-neighbor determines method and system |
CN109086807A (en) * | 2018-07-16 | 2018-12-25 | 哈尔滨工程大学 | A kind of semi-supervised light stream learning method stacking network based on empty convolution |
WO2019005170A1 (en) * | 2017-06-30 | 2019-01-03 | Intel Corporation | Systems, methods, and apparatuses for implementing dynamic learning mask correction for resolution enhancement and optical proximity correction (opc) of lithography masks |
US20190333198A1 (en) * | 2018-04-25 | 2019-10-31 | Adobe Inc. | Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image |
CA3085303A1 (en) * | 2018-05-17 | 2019-11-21 | The United States Of America, Department Of Health And Human Services, National Institutes Of Health | Method and system for automatically generating and analyzing fully quantitative pixel-wise myocardial blood flow and myocardial perfusion reserve maps to detect ischemic heart disease using cardiac perfusion magnetic resonance imaging |
WO2020088766A1 (en) * | 2018-10-31 | 2020-05-07 | Toyota Motor Europe | Methods for optical flow estimation |
CN111311490A (en) * | 2020-01-20 | 2020-06-19 | 陕西师范大学 | Video super-resolution reconstruction method based on multi-frame fusion optical flow |
CN111340844A (en) * | 2020-02-24 | 2020-06-26 | 南昌航空大学 | Multi-scale feature optical flow learning calculation method based on self-attention mechanism |
US20200211206A1 (en) * | 2018-12-27 | 2020-07-02 | Baidu Usa Llc | Joint learning of geometry and motion with three-dimensional holistic understanding |
CN111402292A (en) * | 2020-03-10 | 2020-07-10 | 南昌航空大学 | Image sequence optical flow calculation method based on characteristic deformation error occlusion detection |
CN111462191A (en) * | 2020-04-23 | 2020-07-28 | 武汉大学 | Non-local filter unsupervised optical flow estimation method based on deep learning |
CN111476715A (en) * | 2020-04-03 | 2020-07-31 | 三峡大学 | Lagrange video motion amplification method based on image deformation technology |
CN111582483A (en) * | 2020-05-14 | 2020-08-25 | 哈尔滨工程大学 | Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism |
CN111612825A (en) * | 2020-06-28 | 2020-09-01 | 南昌航空大学 | Image sequence motion occlusion detection method based on optical flow and multi-scale context |
CN111626308A (en) * | 2020-04-22 | 2020-09-04 | 上海交通大学 | Real-time optical flow estimation method based on lightweight convolutional neural network |
Non-Patent Citations (8)
Title |
---|
CHEN CHANG LOY et al.: "LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation", European Conference on Computer Vision * |
DENIS FORTUN et al.: "Optical flow modeling and computation: A survey", Computer Vision and Image Understanding * |
SHENGYU ZHAO et al.: "MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
HE Yan: "Overview of optical flow computation methods for image sequences", Computer Knowledge and Technology (电脑知识与技术) * |
ZHANG Congxuan et al.: "Research progress of deep learning optical flow computation", Acta Electronica Sinica (电子学报) * |
WANG Mingrun: "Research on non-local constraint variational optical flow computation based on occlusion detection", China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库) * |
WANG Feng et al.: "Maritime video stabilization using stationary optical flow estimation", Journal of Image and Graphics (中国图象图形学报) * |
CHEN Zhen et al.: "Sparse-to-dense large displacement optical flow estimation based on deep matching", Acta Automatica Sinica (自动化学报) * |
Also Published As
Publication number | Publication date |
---|---|
CN112465872B (en) | 2022-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111882002B (en) | MSF-AM-based low-illumination target detection method | |
CN111340844B (en) | Multi-scale characteristic optical flow learning calculation method based on self-attention mechanism | |
CN112651973A (en) | Semantic segmentation method based on cascade of feature pyramid attention and mixed attention | |
CN110942471B (en) | Long-term target tracking method based on space-time constraint | |
CN112465872B (en) | Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization | |
CN111696110B (en) | Scene segmentation method and system | |
CN113343778B (en) | Lane line detection method and system based on LaneSegNet | |
CN112001391A (en) | Image feature fusion image semantic segmentation method | |
CN111612825A (en) | Image sequence motion occlusion detection method based on optical flow and multi-scale context | |
CN114724155A (en) | Scene text detection method, system and equipment based on deep convolutional neural network | |
CN111402292A (en) | Image sequence optical flow calculation method based on characteristic deformation error occlusion detection | |
CN116310098A (en) | Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network | |
Cho et al. | Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation | |
CN116681978A (en) | Attention mechanism and multi-scale feature fusion-based saliency target detection method | |
US20230260247A1 (en) | System and method for dual-value attention and instance boundary aware regression in computer vision system | |
CN116758340A (en) | Small target detection method based on super-resolution feature pyramid and attention mechanism | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
Guo et al. | Semantic image segmentation based on SegNetWithCRFs | |
CN115115860A (en) | Image feature point detection matching network based on deep learning | |
CN115620118A (en) | Saliency target detection method based on multi-scale expansion convolutional neural network | |
CN114494284A (en) | Scene analysis model and method based on explicit supervision area relation | |
CN113255459A (en) | Image sequence-based lane line detection method | |
CN113538527A (en) | Efficient lightweight optical flow estimation method | |
CN111986233A (en) | Large-scene minimum target remote sensing video tracking method based on feature self-learning | |
Lee et al. | Where to look: Visual attention estimation in road scene video for safe driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||