CN112465872A - Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization - Google Patents


Info

Publication number: CN112465872A (application CN202011454593.0A; granted as CN112465872B)
Authority: CN (China)
Prior art keywords: optical flow, feature, layer, deformation, frame
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 陈震, 何庭建, 张聪炫, 胡卫明, 黎明, 陈昊, 李凌
Assignee / Original Assignee: Nanchang Hangkong University
Application filed by Nanchang Hangkong University; priority to CN202011454593.0A

Classifications

    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G06T7/00 Image analysis; G06T7/20 Analysis of motion)
    • G06T5/73; G06T5/80
    • G06T7/13 — Edge detection (G06T7/10 Segmentation; edge detection)
    • G06T2207/10016 — Video; image sequence (image acquisition modality)
    • G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]

Abstract

The invention discloses an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization. First, any two consecutive frames of an image sequence are input and decomposed by feature pyramid downsampling into multi-resolution features of the two frames. In each pyramid layer, the correlation between the first-frame and second-frame features is computed and used to construct a learnable occlusion mask module; the resulting occlusion mask removes the edge artifacts of the warped features, optimizing the optical flow at blurred image motion edges. A secondary deformation optimization module is then built from the occlusion-constrained optical flow, and the second warping further refines the motion-edge optical flow to sub-pixel accuracy. The same occlusion masking and secondary deformation are applied to the warped features in every pyramid layer to obtain a residual flow that refines the optical flow; when the bottom pyramid layer is reached, the final optimized optical flow estimation is output. The method achieves higher computational accuracy and better applicability to image sequences containing motion occlusion, large-displacement motion and the like.

Description

Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
Technical Field
The invention relates to an image sequence processing technology, in particular to an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization.
Background
Optical flow computes the motion of objects between adjacent frames by finding the correspondence between the previous frame and the current frame, based on the temporal change of pixels in the image sequence and the correlation between adjacent frames. Optical flow estimation of an image sequence recovers the geometric and motion information of the objects in a scene. In recent years, with the rapid development of deep learning theory and technology, convolutional neural network models have been widely applied to optical flow estimation research; thanks to their remarkable advantages in computation speed and stability, optical flow estimation has gradually become a hotspot in the image processing and computer vision research fields. The research results are widely applied to higher-level vision tasks such as object detection, object tracking, action recognition, autonomous driving and three-dimensional reconstruction.
At present, deep learning is the most commonly adopted approach in image sequence optical flow computation. Compared with traditional techniques that iteratively minimize an energy functional through mathematically derived feature matching, it estimates optical flow more efficiently, quickly and accurately. However, because objects in the scene may be occluded by motion or undergo large displacements, optical flow estimation still suffers from blurred motion edges and from motion information lost to large-displacement motion; robustness on image sequences containing non-rigid motion and large displacement remains poor, which limits the application of deep-learning-based optical flow estimation in various fields.
Disclosure of Invention
To remedy these defects and shortcomings of the prior art, the invention provides an image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization: the optical flow estimate is refined with a learnable occlusion mask in each pyramid layer and with the residual flow obtained from a second warping, improving the accuracy and robustness of the pyramid-layered model in estimating the optical flow at moving object edges in the scene.
In order to achieve the above purpose, the invention adopts the following technical scheme. An image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization, comprising the following steps:
1) inputting any two consecutive frames of an image sequence;
2) performing feature pyramid downsampling on the two selected frames to obtain feature maps of the two frames at five different resolutions;
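The pyramid construction of step 2) can be sketched outside any deep-learning framework. The following minimal numpy example is an illustration only — the patent's pyramid uses learned strided convolutions, whereas this sketch uses 2×2 average pooling — and shows just the halving-resolution structure of the five feature levels:

```python
import numpy as np

def avg_pool2(x):
    """Downsample a (H, W, C) feature map by a factor of two with 2x2 average pooling."""
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2]            # crop to even size before pooling
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def feature_pyramid(image, levels=5):
    """Return `levels` feature maps, each half the resolution of the previous one."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid
```

Each lower pyramid layer has twice the resolution of the layer above it, matching the five-layer layout described in the method.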
3) first computing the correlation between the two frames' features at the highest pyramid layer, then feeding the correlation into an optical flow estimator to compute an initial optical flow;
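The correlation of step 3) is a local cost volume between L2-normalized feature vectors. A minimal numpy sketch under the parameters stated later in the text (maximum displacement d = 4, hence a 9 × 9 search window; normalization over the feature channel) — the function name and memory layout are illustrative, not the patent's implementation:

```python
import numpy as np

def local_correlation(f1, f2, d=4):
    """Cost volume between two (H, W, C) feature maps.

    For each pixel x1 of f1, correlate with every pixel x2 of f2 inside a
    (2d+1) x (2d+1) search window (9 x 9 for d = 4), using the inner product
    of the L2-normalized feature vectors.
    """
    h, w, c = f1.shape
    eps = 1e-8
    n1 = f1 / (np.linalg.norm(f1, axis=2, keepdims=True) + eps)
    n2 = f2 / (np.linalg.norm(f2, axis=2, keepdims=True) + eps)
    pad = np.pad(n2, ((d, d), (d, d), (0, 0)))     # zero-pad so the window fits at borders
    vol = np.zeros((h, w, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = pad[d + dy : d + dy + h, d + dx : d + dx + w]
            vol[:, :, k] = (n1 * shifted).sum(axis=2)   # normalized inner product
            k += 1
    return vol
```

With identical inputs, the zero-displacement channel of the volume is maximal (≈ 1 everywhere), which is the behaviour the flow estimator exploits.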
4) constructing a learnable occlusion mask optimization module from the correlation. The module comprises five consecutively stacked convolution modules; each convolution module Conv contains a 3 × 3 convolution, batch normalization and a LeakyReLU activation. The correlation, the optical flow, the context features and the first-frame features of the current pyramid layer are input into the learnable occlusion mask module, which outputs a single-channel occlusion mask feature; the number of feature channels decreases layer by layer through the consecutive convolution layers (128, 96, 64, 32 and 1), and the last layer has no activation function. The calculation formula is as follows:
corr(x_1, x_2) = ⟨F^1(x_1), F^2(x_2)⟩ / (‖F^1(x_1)‖_2 · ‖F^2(x_2)‖_2),  |x_1 − x_2| ≤ d
upflow_{i+1} = upsample_{×2}(flow_{i+1})
warpF_i^2 = warp_i(F_i^2, upflow_{i+1})
corr_i = corr(F_i^1, warpF_i^2)
x_i = cat(corr_i, upflow_{i+1}, feat_i, F_i^1)
flow_i = estimationflow(x_i)
mask_i = X = Conv_5(Conv_4(⋯ Conv_1(x_i)))
in the formula: f 1、F 2Respectively represent the characteristics of the first frame and the second frame of the ith-2, 3,4,5,6 th layer(ii) a Wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame feature is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; representing the inner product of two features; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper-layer optical flow is up-sampled by a factor of two, and the size of the optical flow is doubled as the feature scale is increased by one time; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation is beneficial to reducing the matching distance of the feature space, and the deformation and displacement of the pixel points between frames are weakened; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask;
the mask of the obtained shielding mask characteristicsiAfter upsampling, activating a function, and then performing inner product on the feature channel dimension with the lower pyramid deformation feature, because some useful optical flow information is masked while removing the edge artifact of the deformation feature, the optical flow information missing from the masked feature needs to be compensated by adding the deconvolved upper pyramid cascade feature containing the missing information, and the optimized deformation feature can be obtained; the canonical constraint of the multi-scale context occlusion mask on the second frame deformation feature can be expressed as:
warpF_i^{2,mask} = sigmoid(upmask_{i+1}) ⊙ warpF_i^2 + deconv(x_{i+1})
in the formula: warpF_i^{2,mask} denotes the warped features after the occlusion mask has constrained the warping artifacts; upmask_{i+1} denotes the occlusion mask after two-fold upsampling; deconv denotes deconvolution of the upper-layer pyramid context feature x_{i+1}; sigmoid denotes the activation function that thresholds the mask mask_i into an occlusion probability mask in (0, 1);
5) for the occlusion mask module, inputting the superposition of the current pyramid layer's optical flow, correlation, context features and first-frame features yields an occlusion mask feature map. The lower the gray value of a pixel in the occlusion mask feature map, the more that pixel tends to be visible in the first frame and occluded in the second; conversely, the higher the gray value, the more the pixel tends to be occluded in the first frame and visible in the second. Applying the learnable occlusion mask as a constraint on the warped features suppresses the blurring of image motion edges;
6) constructing a secondary deformation optimization module from the optical flow estimate: the optical flow is computed from the correlation between the first-frame features and either the unwarped features or the mask-constrained warped features; this flow is used to warp the second-frame features a second time; the optical flow, the warped features and the first-frame features are superposed along the feature dimension and passed through five consecutive convolution modules whose channel counts decrease layer by layer (128, 96, 64, 32 and 2, the last layer without activation), outputting a two-channel residual flow. The sum of the residual flow and the current layer's optical flow is the final optical flow estimate optimized for image motion edges and large-displacement motion. The calculation formula is as follows:
flow_j = estimationflow(corr(F_j^1, warpF_j^{2,mask}))
warpF_j^{2″} = warp_j(F_j^2, flow_j)
residualflow_j = Conv_5(Conv_4(⋯ Conv_1(cat(flow_j, warpF_j^{2″}, F_j^1))))
finalflow_j = flow_j + residualflow_j
in the formula: flow_j denotes the optical flow of layer j = {2, 3, 4, 5, 6}, obtained by computing the correlation with the occlusion-mask-constrained features and passing it through the optical flow computation layer; warpF_j^{2″} denotes the second-frame features after secondary warping with the flow_j optimized by the first warping; residualflow_j is the j-th layer residual optical flow; feat_j is the j-th layer context feature; finalflow_j is the predicted optical flow of layer j;
7) for the secondary deformation residual flow optimization module, inputting the superposition of the first-frame features, the secondarily warped features and the current pyramid layer's optical flow yields the residual flow. The first warping provides pixel-level optical flow estimation and the second warping sub-pixel-level estimation; the secondary residual flow contains rich contour information of the moving objects, further compensates the optical flow field, guides optical flow learning, optimizes the image motion edges, reduces the matching distance in feature space, and makes up the optical flow information missing due to large-displacement motion;
8) applying the same occlusion mask constraint and secondary-warping residual flow computation in every pyramid layer, and taking the sum of the occlusion-mask-regularized optical flow and the secondary-warping residual flow as the motion-edge-optimized optical flow estimate finalflow_j; when the bottom pyramid layer is reached, the final motion-edge-optimized dense optical flow estimate of the image sequence is output, from which rich motion information and the geometric structure of the objects can be obtained.
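The coarse-to-fine scheme of step 8) — upsample the flow by two (doubling its magnitude), refine at the next finer level, repeat until the bottom layer — can be sketched as follows (numpy; `estimate_level` abstracts one full pyramid level of correlation, occlusion masking and secondary deformation, and nearest-neighbour upsampling is an illustrative simplification):

```python
import numpy as np

def upsample_flow2(flow):
    """Upsample a (H, W, 2) flow by two (nearest neighbour) and double its
    magnitude, since pixel displacements scale with resolution."""
    up = flow.repeat(2, axis=0).repeat(2, axis=1)
    return up * 2.0

def coarse_to_fine(pyr1, pyr2, estimate_level):
    """Walk the pyramids (finest-first lists) from the coarsest to the finest
    level, refining the upsampled flow at each level."""
    flow = None
    for f1, f2 in zip(reversed(pyr1), reversed(pyr2)):   # coarsest level first
        h, w, _ = f1.shape
        init = np.zeros((h, w, 2)) if flow is None else upsample_flow2(flow)
        flow = estimate_level(f1, f2, init)
    return flow
```

The doubling of the flow magnitude at every upsampling matches the statement above that the flow vectors double as the feature scale doubles.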
In the image sequence optical flow estimation method based on the learnable occlusion mask and secondary deformation optimization, the learnable occlusion mask removes the edge artifacts that motion occlusion introduces into the warped image, and the residual flow produced by the second warping corrects the blurred image motion edges and the moving-object optical flow information lost to large-displacement motion.
Drawings
FIG. 1 is a first frame image of a KITTI2015 training image sequence according to an embodiment of the present invention;
FIG. 2 is a second frame image of a KITTI2015 training image sequence according to an embodiment of the present invention;
FIG. 3 is a block diagram of the pyramid-based hierarchical optical flow estimation model of an embodiment of the present invention;
FIG. 4 is an occlusion mask feature map of the KITTI2015 training image sequence calculated by an embodiment of the present invention;
FIG. 5 is a residual flow map of the secondary deformation estimation of the KITTI2015 training image sequence calculated by an embodiment of the present invention;
FIG. 6 is an optical flow map of the final optimized image motion edges of the KITTI2015 training image sequence calculated by an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings. Referring to fig. 1 to 6, an embodiment of the invention is illustrated below: an optical flow computation experiment on a KITTI2015 training image sequence using the image sequence optical flow computation method based on the learnable occlusion mask and secondary deformation optimization.
it comprises the following steps:
firstly, inputting the first frame image and the second frame image of a KITTI2015 training image sequence (as shown in FIG. 1 and FIG. 2);
secondly, performing feature pyramid downsampling on the input KITTI2015 training image sequence (as shown in FIG. 3), where I_t denotes the first frame and I_{t+1} the second frame of the sequence. A convolutional neural network feature pyramid is built comprising five consecutive convolution modules, the numbers of channels of the consecutive convolution layers being 128, 64 and 64 respectively. The cascaded first frame I_t and second frame I_{t+1} are input into the feature pyramid, and pyramid feature extraction on the two frames yields feature maps at five different resolutions, the resolution of each lower pyramid layer's feature map being twice that of the layer above;
thirdly, computing the correlation between the consecutive frames' features at the highest layer of the feature pyramid, then feeding the correlation into an optical flow estimator to compute an initial optical flow (as shown in fig. 3);
fourthly, constructing a learnable occlusion mask estimation module from the correlation (shown as a dashed box in fig. 3). The occlusion mask estimation module comprises five consecutively stacked convolution modules, each containing a 3 × 3 convolution, batch normalization and a LeakyReLU activation. The correlation, the optical flow, the context features and the first-frame features of the current pyramid layer are superposed along the feature channel dimension and input into the consecutive convolution layers of the occlusion mask estimator; the number of feature channels decreases layer by layer (128, 96, 64, 32 and 1), the last layer has no activation function, and a single-channel occlusion mask feature is output. The calculation formula is as follows:
corr(x_1, x_2) = ⟨F^1(x_1), F^2(x_2)⟩ / (‖F^1(x_1)‖_2 · ‖F^2(x_2)‖_2),  |x_1 − x_2| ≤ d
upflow_{i+1} = upsample_{×2}(flow_{i+1})
warpF_i^2 = warp_i(F_i^2, upflow_{i+1})
corr_i = corr(F_i^1, warpF_i^2)
x_i = cat(corr_i, upflow_{i+1}, feat_i, F_i^1)
flow_i = estimationflow(x_i)
mask_i = X = Conv_5(Conv_4(⋯ Conv_1(x_i)))
in the formula: f 1、F 2The i-th ═ {2,3,4,5,6} layer characteristics of the first frame and the second frame are represented respectively; wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper layer optical flow is up-sampled by a factor of two, and the vector size of the pyramid upper layer optical flow is also doubled along with the increase of the optical flow scale; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation is beneficial to reducing the matching distance of the feature space, and the deformation and displacement of the pixel points between frames are weakened; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask;
the mask of the obtained shielding mask characteristicsiAfter up-sampling, the data is processed by an activation function and then deformed with the lower pyramidThe features are subjected to inner product on the dimension of the feature channel, and as some useful optical flow information is masked while the edge artifact of the deformation features is removed, the optical flow information with the feature missing after masking needs to be compensated by the deconvolution upper pyramid cascade features containing the missing information, so that the optimized deformation features can be obtained (as shown by a shielding mask estimation module in a dashed frame in fig. 3); the canonical constraint of the multi-scale context occlusion mask on the second frame deformation feature can be expressed as:
warpF_i^{2,mask} = sigmoid(upmask_{i+1}) ⊙ warpF_i^2 + deconv(x_{i+1})
in the formula: warpF_i^{2,mask} denotes the warped features after the occlusion mask has constrained the warping artifacts; upmask_{i+1} denotes the occlusion mask after two-fold upsampling; deconv denotes deconvolution of the upper-layer pyramid context feature x_{i+1}; sigmoid denotes the activation function that thresholds the mask mask_i into an occlusion probability mask in (0, 1);
fifthly, in each pyramid layer, inputting the superposition of the current layer's optical flow, feature map correlation, context features and first-frame features into the learnable occlusion mask estimation module; each pyramid layer outputs a learnable occlusion mask feature used to regularize the warped features of the next lower pyramid layer, continually reducing the optical flow error. When the bottom pyramid layer is reached, the occlusion mask feature map of the KITTI2015 training image sequence is obtained (as shown in FIG. 4). In the occlusion mask feature map, the lower the gray value of a pixel, the more that pixel tends to be visible in the first frame and occluded in the second; conversely, the higher the gray value, the more the pixel tends to be occluded in the first frame and visible in the second. Applying the learnable occlusion mask as a constraint on the warped features suppresses the blurred motion edges of the estimated optical flow;
sixthly, constructing a secondary deformation residual flow estimation module from the optical flow estimate. The secondary deformation residual flow estimator comprises five consecutive convolution modules whose channel counts decrease layer by layer (128, 96, 64, 32 and 2), the last layer having no activation function. In the first secondary deformation residual flow estimation module (shown as the dashed box at the lower left of fig. 3), the second-frame features are warped directly through the feature warper with the initial optical flow computed at the topmost pyramid layer; the optical flow, the warped features and the first-frame features are then superposed along the feature channel dimension and input into the secondary deformation residual flow estimator to obtain a residual flow, and the sum of this residual flow and the initial optical flow serves as the optimized optical flow estimate of the top pyramid layer. The second-frame features warped with the topmost optimized optical flow are input into the learnable occlusion mask estimation module to obtain the regularized warped features; the correlation between the mask-constrained warped features and the first-frame features is input into the optical flow estimator to compute the mask-constraint-optimized optical flow, with which the second-frame features are warped a second time. The optical flow, the warped features and the first-frame features are superposed along the feature channel dimension and input into the second secondary deformation residual flow estimation module (shown as the dashed box at the lower right of fig. 3) to obtain the secondary residual flow; the sum of the secondary residual flow and the current layer's optical flow serves as the final optical flow estimate optimized for image motion edges and large-displacement motion. The calculation formula is as follows:
flow_j = estimationflow(corr(F_j^1, warpF_j^{2,mask}))
warpF_j^{2″} = warp_j(F_j^2, flow_j)
residualflow_j = Conv_5(Conv_4(⋯ Conv_1(cat(flow_j, warpF_j^{2″}, F_j^1))))
finalflow_j = flow_j + residualflow_j
in the formula: flow_j denotes the optical flow of layer j = {2, 3, 4, 5, 6}, obtained by computing the correlation with the occlusion-mask-constrained features and inputting it into the optical flow estimator; warpF_j^{2″} denotes the second-frame features after secondary warping with the flow_j optimized by the first warping; residualflow_j is the j-th layer residual flow; feat_j is the j-th layer context feature; finalflow_j is the predicted optical flow of layer j;
seventhly, in each pyramid layer, the superposition of the first-frame features, the secondarily warped features and the current layer's optical flow is input into the secondary deformation residual flow estimation module, and each pyramid layer outputs a residual flow that further optimizes the occlusion-mask-constrained optical flow; when the bottom pyramid layer is reached, the last layer's residual flow estimate is obtained (as shown in fig. 5). The first warping provides pixel-level optical flow estimation and the second warping sub-pixel-level estimation; the secondary residual flow contains rich contour information of the moving objects, further compensates the optical flow field, guides optical flow learning, optimizes the image motion edges, reduces the matching distance in feature space, and makes up the optical flow information missing due to large-displacement motion;
eighthly, the same occlusion mask constraint and secondary-warping residual flow computation are carried out in every pyramid layer, and the sum of the occlusion-mask-regularized optical flow and the secondary-warping residual flow is taken as the motion-edge-optimized optical flow estimate finalflow_j; when the bottom pyramid layer is reached, the final motion-edge-optimized dense optical flow estimate of the KITTI2015 training image sequence is output (as shown in FIG. 6). The larger the gray value of a pixel in the dense optical flow estimate, the larger its optical flow value and the faster its relative motion; conversely, the smaller the gray value, the smaller the optical flow value and the slower the relative motion. Rich motion information and geometric structure information of the objects are obtained from the optical flow estimate, which can be effectively applied to higher-level vision tasks.
The above description only illustrates the technical solution of the present invention and does not limit it; other modifications or equivalent substitutions made by those skilled in the art to the technical solution of the present invention, without departing from its spirit and scope, shall be covered by the claims of the present invention.

Claims (1)

1. An image sequence optical flow estimation method based on a learnable occlusion mask and secondary deformation optimization, comprising the following steps:
1) inputting any two consecutive frames of an image sequence;
2) performing feature pyramid downsampling on the two selected frames to obtain feature maps of the two frames at five different resolutions;
3) first computing the correlation between the two frames' features at the highest pyramid layer, then feeding the correlation into an optical flow estimator to compute an initial optical flow;
4) constructing a learnable occlusion mask optimization module from the correlation, the module comprising five consecutively stacked convolution modules, each convolution module Conv containing a 3 × 3 convolution, batch normalization and a LeakyReLU activation; the learnable occlusion mask module takes as input the correlation, the optical flow, the context features and the first-frame features of the current pyramid layer and outputs a single-channel occlusion mask feature, the number of feature channels decreasing layer by layer through the consecutive convolution layers (128, 96, 64, 32 and 1), the last layer having no activation function; the calculation formula is as follows:
corr(x_1, x_2) = ⟨F^1(x_1), F^2(x_2)⟩ / (‖F^1(x_1)‖_2 · ‖F^2(x_2)‖_2),  |x_1 − x_2| ≤ d
upflow_{i+1} = upsample_{×2}(flow_{i+1})
warpF_i^2 = warp_i(F_i^2, upflow_{i+1})
corr_i = corr(F_i^1, warpF_i^2)
x_i = cat(corr_i, upflow_{i+1}, feat_i, F_i^1)
flow_i = estimationflow(x_i)
mask_i = X = Conv_5(Conv_4(⋯ Conv_1(x_i)))
in the formula: f 1、F 2The i-th ═ {2,3,4,5,6} layer characteristics of the first frame and the second frame are represented respectively; wherein x1 and x2 respectively represent the coordinates of the characteristic pixel points of the corresponding first frame and the second frame; if d is less than or equal to 4, setting the maximum displacement of the pixel point to be 4 pixels, and calculating the matching correlation degree, wherein the size of a target search window in the second frame feature is 9 multiplied by 9; the method comprises the following steps of (1) solving an L2 norm on a characteristic channel to carry out normalization processing on the correlation degree; representing the inner product of two features; corr represents the correlation degree between two frames of feature maps; upflowi+1Representing the optical flow after the pyramid upper-layer optical flow is up-sampled by a factor of two, and the size of the optical flow is doubled as the feature scale is increased by one time; warpiRepresenting that the feature of a second frame of the current layer is deformed after upsampling by using the (i + 1) th layer of optical flow to obtain a deformation feature warpFi 2The deformation contributing to a reduction in feature spaceMatching distance, and weakening deformation and displacement of pixel points between frames; corriRepresenting the correlation degree between the two frames of feature maps of the ith layer pyramid; cat represents the cascade connection of a plurality of characteristics on the channel dimension to obtain the multi-scale context cascade connection characteristic xi(ii) a The estimationflow represents an optical flow estimator; x represents the feature after convolution of the last convolution module of the continuous stacking convolution; maskiRepresenting the ith layer of block mask features;
the obtained occlusion mask feature mask_i is upsampled, passed through the activation function, and then multiplied with the lower pyramid deformation feature by inner product along the feature channel dimension; because removing the edge artifacts of the deformation feature also masks some useful optical flow information, the deconvolved upper-pyramid cascade feature, which contains the missing information, is added to compensate the masked feature, yielding the optimized deformation feature; the regularization constraint of the multi-scale context occlusion mask on the second frame deformation feature can be expressed as:
warpF2'_i = sigmoid(upmask_{i+1}) ⊙ warpF2_i + deconv(x_{i+1})
in the formula: warpF2'_i denotes the feature after the deformation artifacts are constrained by the occlusion mask; upmask_{i+1} denotes the occlusion mask after two-fold upsampling; deconv denotes deconvolution of the upper-pyramid context cascade feature x_{i+1}; sigmoid denotes the activation function that maps mask_i to an occlusion probability mask with values in (0, 1);
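The occlusion mask constraint on the deformation feature amounts to a sigmoid gate plus a compensation term. A minimal NumPy sketch, in which `comp_feat` stands in for the deconvolved upper-pyramid context cascade feature and the function name is an assumption for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def constrain_warp(warp_f2, upmask, comp_feat):
    """Gate the warped second-frame feature (C, H, W) by the sigmoid of the
    upsampled occlusion mask (H, W), then add a compensation feature that
    stands in for the deconvolved upper-pyramid context cascade feature."""
    gate = sigmoid(upmask)               # occlusion probability in (0, 1)
    return gate[None, :, :] * warp_f2 + comp_feat
```

Where the mask strongly indicates occlusion (large negative logits) the warped feature is suppressed and only the compensation term survives; where the pixel is visible, the warped feature passes through almost unchanged.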
5) the occlusion mask module takes as input the superposition of the current pyramid layer's optical flow, correlation, context cascade feature and first frame feature, and outputs an occlusion mask feature map; the lower the gray value of a pixel point in the occlusion mask feature map, the more that pixel tends to be visible in the first frame and occluded in the second frame; conversely, the higher the gray value, the more that pixel tends to be occluded in the first frame and visible in the second frame; applying the learnable occlusion mask as a constraint on the deformation feature suppresses blurring of image motion edges;
6) constructing a secondary deformation optimization module from the optical flow estimation; the optical flow is computed from the correlation between the first frame feature and either the undeformed feature or the mask-constrained deformation feature, and this flow is then used to warp the second frame feature a second time; the optical flow, the deformation feature and the first frame feature are superposed along the feature dimension and passed through five consecutive convolution modules whose channel numbers decrease to 128, 96, 64, 32 and 2 respectively; the last layer has no activation function and outputs a two-channel residual flow; the sum of the residual flow and the current layer's optical flow is the final optical flow estimate with optimized image motion edges and large-displacement motion; the calculation formula is as follows:
flow_j = estimationflow(cat(corr(F1_j, warpF2'_j), feat_j))
warpF2''_j = warp(F2_j, flow_j)
residualflow_j = Conv(cat(flow_j, warpF2''_j, F1_j))
finalflow_j = flow_j + residualflow_j
in the formula: flow_j denotes the optical flow estimate of the j-th layer, j = {2,3,4,5,6}, obtained by computing the correlation with the occlusion-mask-constrained feature (after the deformation error is suppressed) and passing it through the optical flow computation layer; warpF2''_j denotes the secondary deformation of the second frame feature using the optical flow flow_j optimized by the first deformation; residualflow_j is the j-th layer residual optical flow; feat_j is the j-th level context cascade feature; finalflow_j is the j-th layer predicted optical flow;
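The deformation (warp) operation used in both the first and the secondary deformation is backward sampling of the second-frame feature along the flow. A minimal NumPy sketch with bilinear interpolation; clamping out-of-range samples to the border is an assumed boundary choice, and the final estimate is then simply finalflow = flow + residualflow per the method's notation:

```python
import numpy as np

def backward_warp(feat, flow):
    """Bilinearly sample feat (C, H, W) at position x + flow(x).

    flow has shape (2, H, W): flow[0] is the horizontal (x) component,
    flow[1] the vertical (y) component. Sample positions outside the
    image are clamped to the border (an assumed boundary choice)."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0
    wy = sy - y0
    # weighted sum of the four neighboring feature vectors
    return ((1 - wy) * (1 - wx) * feat[:, y0, x0]
            + (1 - wy) * wx * feat[:, y0, x1]
            + wy * (1 - wx) * feat[:, y1, x0]
            + wy * wx * feat[:, y1, x1])
```

An integer flow reduces to a pure pixel shift (the first, pixel-level deformation); fractional flow values interpolate between neighbors, which is what makes the secondary deformation sub-pixel accurate.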
7) the secondary deformation residual flow optimization module takes as input the superposition of the first frame feature, the secondarily deformed feature and the current pyramid layer's optical flow, and outputs the residual flow; the first deformation yields a pixel-level optical flow estimate, and the second deformation refines it to sub-pixel level; the secondary residual flow contains rich contour information of the moving object, further compensates the optical flow field, guides optical flow learning, optimizes image motion edges, reduces the matching distance in feature space, and makes up for optical flow information lost to large-displacement motion;
8) the same occlusion mask constraint and secondary deformation residual flow computation are carried out in every pyramid layer, and the sum of the mask-regularized optical flow and the secondary deformation residual flow is taken as the motion-edge-optimized optical flow estimate finalflow_j; when the bottom pyramid layer is reached, the final dense optical flow estimate of the image sequence with optimized motion edges is output, from which rich motion information and the geometric structure of objects can be obtained.
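The coarse-to-fine pyramid logic of steps 4)-8) reduces, per layer, to upsampling the coarser flow (doubling both resolution and vector magnitude, the upflow of the formulas) and adding that layer's residual flow. A minimal sketch, with nearest-neighbor upsampling standing in for the method's upsampling and the driver function `coarse_to_fine` being an illustrative assumption:

```python
import numpy as np

def upflow(flow):
    """Upsample a (2, H, W) flow field two-fold (nearest-neighbor here, as
    a simple stand-in) and double the vectors so that displacements match
    the finer resolution."""
    return 2.0 * flow.repeat(2, axis=1).repeat(2, axis=2)

def coarse_to_fine(residuals):
    """residuals[0] is the coarsest-layer flow; each subsequent entry is
    the residual flow of the next finer layer. Each step computes
    upsampled coarser flow + residual, mirroring finalflow_j."""
    flow = residuals[0]
    for res in residuals[1:]:
        flow = upflow(flow) + res
    return flow
```

With five pyramid layers (i = 2..6 in the claims), five such refinement steps run from the coarsest feature scale down to the bottom layer that emits the dense flow.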
CN202011454593.0A 2020-12-10 2020-12-10 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization Active CN112465872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011454593.0A CN112465872B (en) 2020-12-10 2020-12-10 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization

Publications (2)

Publication Number Publication Date
CN112465872A true CN112465872A (en) 2021-03-09
CN112465872B CN112465872B (en) 2022-08-26

Family

ID=74801930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011454593.0A Active CN112465872B (en) 2020-12-10 2020-12-10 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization

Country Status (1)

Country Link
CN (1) CN112465872B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926498A (en) * 2022-04-26 2022-08-19 电子科技大学 Rapid target tracking method based on space-time constraint and learnable feature matching
CN115937906A (en) * 2023-02-16 2023-04-07 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070092122A1 (en) * 2005-09-15 2007-04-26 Jiangjian Xiao Method and system for segment-based optical flow estimation
US20070258707A1 (en) * 2006-05-08 2007-11-08 Ramesh Raskar Method and apparatus for deblurring images
CN107220596A (en) * 2017-05-11 2017-09-29 西安电子科技大学 Estimation method of human posture based on cascade mechanism for correcting errors
CN107527358A (en) * 2017-08-23 2017-12-29 北京图森未来科技有限公司 A kind of dense optical flow method of estimation and device
CN107862706A (en) * 2017-11-01 2018-03-30 天津大学 A kind of improvement optical flow field model algorithm of feature based vector
CN108010061A (en) * 2017-12-19 2018-05-08 湖南丹尼尔智能科技有限公司 A kind of deep learning light stream method of estimation instructed based on moving boundaries
US20180324465A1 (en) * 2017-05-05 2018-11-08 Disney Enterprises, Inc. Edge-aware spatio-temporal filtering and optical flow estimation in real time
CN108776971A (en) * 2018-06-04 2018-11-09 南昌航空大学 A kind of variation light stream based on layering nearest-neighbor determines method and system
CN109086807A (en) * 2018-07-16 2018-12-25 哈尔滨工程大学 A kind of semi-supervised light stream learning method stacking network based on empty convolution
WO2019005170A1 (en) * 2017-06-30 2019-01-03 Intel Corporation Systems, methods, and apparatuses for implementing dynamic learning mask correction for resolution enhancement and optical proximity correction (opc) of lithography masks
US20190333198A1 (en) * 2018-04-25 2019-10-31 Adobe Inc. Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image
CA3085303A1 (en) * 2018-05-17 2019-11-21 The United States Of America, Department Of Health And Human Services, National Institutes Of Health Method and system for automatically generating and analyzing fully quantitative pixel-wise myocardial blood flow and myocardial perfusion reserve maps to detect ischemic heart disease using cardiac perfusion magnetic resonance imaging
WO2020088766A1 (en) * 2018-10-31 2020-05-07 Toyota Motor Europe Methods for optical flow estimation
CN111311490A (en) * 2020-01-20 2020-06-19 陕西师范大学 Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN111340844A (en) * 2020-02-24 2020-06-26 南昌航空大学 Multi-scale feature optical flow learning calculation method based on self-attention mechanism
US20200211206A1 (en) * 2018-12-27 2020-07-02 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
CN111402292A (en) * 2020-03-10 2020-07-10 南昌航空大学 Image sequence optical flow calculation method based on characteristic deformation error occlusion detection
CN111462191A (en) * 2020-04-23 2020-07-28 武汉大学 Non-local filter unsupervised optical flow estimation method based on deep learning
CN111476715A (en) * 2020-04-03 2020-07-31 三峡大学 Lagrange video motion amplification method based on image deformation technology
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN111612825A (en) * 2020-06-28 2020-09-01 南昌航空大学 Image sequence motion occlusion detection method based on optical flow and multi-scale context
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHEN CHANG LOY等: ""LiteFlowNet3:Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation"", 《EUROPEAN CONFERENCE ON COMPUTER VISION》 *
DENIS FORTUN等: ""Optical flow modeling and computation:A survey"", 《COMPUTER VISION AND IMAGE UNDERSTANDING》 *
SHENGYU ZHAO等: ""MaskFlownet:Asymmetric Feature Matching With Learnable occlusion Mask"", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
何燕: ""图像序列光流计算方法概述"", 《电脑知识与技术》 *
张聪炫等: ""深度学习光流计算技术研究进展"", 《电子学报》 *
汪明润: ""基于遮挡检测的非局部约束变分光流计算技术研究"", 《中国优秀硕士学位论文全文数据库》 *
王峰等: ""利用平稳光流估计的海上视频去抖"", 《中国图象图形学报》 *
陈震等: ""基于深度匹配的由稀疏到稠密大位移运动光流估计"", 《自动化学报》 *

Also Published As

Publication number Publication date
CN112465872B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN111882002B (en) MSF-AM-based low-illumination target detection method
CN111340844B (en) Multi-scale characteristic optical flow learning calculation method based on self-attention mechanism
CN112651973A (en) Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN110942471B (en) Long-term target tracking method based on space-time constraint
CN112465872B (en) Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN111696110B (en) Scene segmentation method and system
CN113343778B (en) Lane line detection method and system based on LaneSegNet
CN112001391A (en) Image feature fusion image semantic segmentation method
CN111612825A (en) Image sequence motion occlusion detection method based on optical flow and multi-scale context
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN111402292A (en) Image sequence optical flow calculation method based on characteristic deformation error occlusion detection
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
CN116681978A (en) Attention mechanism and multi-scale feature fusion-based saliency target detection method
US20230260247A1 (en) System and method for dual-value attention and instance boundary aware regression in computer vision system
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Guo et al. Semantic image segmentation based on SegNetWithCRFs
CN115115860A (en) Image feature point detection matching network based on deep learning
CN115620118A (en) Saliency target detection method based on multi-scale expansion convolutional neural network
CN114494284A (en) Scene analysis model and method based on explicit supervision area relation
CN113255459A (en) Image sequence-based lane line detection method
CN113538527A (en) Efficient lightweight optical flow estimation method
CN111986233A (en) Large-scene minimum target remote sensing video tracking method based on feature self-learning
Lee et al. Where to look: Visual attention estimation in road scene video for safe driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant